Retrieval-Augmented Generation is trending

Databricks announces the open-source DBRX model

Posted on April 15th, 2024

Summary

Retrieval-Augmented Generation (RAG) is a popular topic in recent articles, and an overview article appears on the tdwi.org website. RAG is the idea of connecting a large language model to a database so that up-to-date information can be included in the model's responses. It is also seen as having the potential to reduce hallucinations and to improve data protection by segregating the model from sensitive data. RAG is putting vector databases into the spotlight, since these store data in a form (embeddings) that language models can work with directly.

On the technical side, there is an article on the Go programming language community's efforts around large language models. An MIT News article covers an AI model that improves the efficiency of red-teaming. In red-teaming, a group of users manually tries to jailbreak a language model, but the limit of this approach is that team members cannot think of all possible prompts to test; the AI model proposes a wider range of prompts. An article from DevPro Journal highlights a talk from the Wikimedia Foundation on the use of knowledge graphs to address language model concerns like hallucination and processing efficiency.

Among the announcements this week, a Computer Weekly article covers Google's announcements around its generative artificial intelligence technologies. Large corporate users of the Google platform include Mercedes Benz, Goldman Sachs, Uber, Intercontinental Hotels, and NewsCorp. Databricks announced the open-source DBRX model and claims better performance than the Llama and Mistral models. In particular, DBRX uses a mixture-of-experts (MoE) architecture, which optimizes performance by employing many expert models (16 in this case) and assigning each sub-task to the most suitable expert.

On legal aspects, an article from DMR News looks at the relationship between AI-generated content and copyright. Regulation in China is also discussed, including how restrictive measures, such as enforcing socialist values, can hinder Chinese AI companies.

Several articles look at the challenge of adopting AI in organizations. A consensus is emerging around the need for a governance program to handle risks, including the reputational risk that can arise from, say, entrusting customer service tasks to chatbots. Concerning governance, an article from MIT Technology Review emphasizes that organizations need to define and optimize their data strategy, not just to facilitate access to data by the language models they deploy, but also to enable better business decision-making.

Finally, another article from MIT Technology Review presents an interview with OpenAI employees. They talk of their surprise at the success of ChatGPT and how, at the time, information accuracy was perceived as being more important than mitigating toxic content.

1. How GenAI use is evolving for Google Cloud customers

Goldman Sachs, Uber, Mercedes Benz, and Intercontinental Hotels are using Google Cloud’s Gemini and Vertex generative artificial intelligence technologies. Gemini comprises Google’s general-use AI models, analogous to OpenAI's GPT series. Vertex AI, on the other hand, is Google Cloud’s platform for developing and deploying generative systems, allowing customers to create specialized AI entities, known as agents, for particular purposes. Goldman Sachs aims to enhance client experiences and boost developer productivity using these technologies. The tools are also used to summarize public filings, extract sentiments and signals from corporate statements, and gather and analyze information such as earnings reports. Mercedes Benz plans to equip its vehicles with computers using Google’s AI to offer personalized user experiences. Intercontinental Hotels is developing an AI-driven travel planning system, while NewsCorp uses Vertex AI to sift through data and refresh its systems across 30,000 global sources and 2.5 billion news articles daily. Additionally, Google announced Google Vids, an AI-driven application within Google Workspace that enables video content generation from Google Drive files.

2. The inside story of how ChatGPT was built from the people who made it

This article features interviews with OpenAI employees who express their surprise at ChatGPT's enormous success. Their astonishment partly stems from the fact that most of the technology powering ChatGPT isn't new. ChatGPT is essentially a refined version of GPT-3.5, a large language model series released by OpenAI months earlier. GPT-3.5 itself is an enhancement of GPT-3, which debuted in 2020. A key insight from the interviews was OpenAI's prioritization of factual accuracy (the absence of hallucinations) over the mitigation of toxic content. At the time, the model was deemed sufficiently safe compared to other large models like InstructGPT. However, this view has since evolved in light of the vast user base that adopted the chatbot. To combat issues like jailbreaking and the generation of toxic content, researchers employed adversarial training. Additionally, a large user group was tasked with reviewing ChatGPT's prompts and responses.

3. A faster, better way to prevent an AI chatbot from giving toxic responses

Red-teaming is a method where teams of human testers create prompts designed to elicit toxic responses from a model under evaluation. These prompts are then used to train a chatbot to avoid such replies. However, this approach depends on the engineers' ability to anticipate which toxic prompts to test, which is challenging given the vast number of possible prompts. Researchers from the Improbable AI Lab at MIT and the MIT-IBM Watson AI Lab have developed a machine-learning technique that trains a large language model for red-teaming, so that it generates a broader array of prompts that trigger undesirable responses from the chatbot being tested. This red-team model is designed to be curious about the outcomes of its prompts, experimenting with various words, sentence structures, and meanings to maximize the toxicity of the responses it elicits. Additionally, the model's training includes an entropy bonus to promote randomness, encouraging it to explore a diverse set of prompts. This approach allows for more thorough testing against specific standards, such as a company's policy document.
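
To make the reward idea concrete, here is a minimal Python sketch of the kind of reward shaping described: the red-team model earns reward both for eliciting responses that a toxicity classifier scores highly and for keeping its own sampling distributions diverse via an entropy bonus. The function names, the weighting, and the toy distributions are illustrative assumptions, not the researchers' exact formulation.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of a token probability distribution."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log(probs))

def red_team_reward(toxicity_score, token_distributions, entropy_weight=0.05):
    """Combine the toxicity score of the chatbot's reply (from a
    hypothetical classifier, in [0, 1]) with an entropy bonus over the
    red-team model's own sampling distributions."""
    entropy_bonus = float(np.mean([entropy(p) for p in token_distributions]))
    return toxicity_score + entropy_weight * entropy_bonus

# Toy comparison: near-uniform sampling earns a larger bonus than
# confident, repetitive sampling, steering the red team to explore.
diverse = [np.full(50, 1 / 50)] * 3                 # high entropy
peaked = [np.array([0.98] + [0.02 / 49] * 49)] * 3  # low entropy
print(red_team_reward(0.7, diverse))  # larger total reward
print(red_team_reward(0.7, peaked))   # smaller total reward
```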

4. Modernizing data with strategic purpose

This article reports on a survey of 350 senior technology executives on data modernization strategies. Over half of the respondents say their organization is currently implementing a data modernization program. One key finding is that AI is only one driver of data modernization programs; other key drivers remain improved decision-making, regulatory compliance, and operational efficiency. Data strategy is seen as too often siloed from business strategy, and the top data pain points are data quality, timeliness (the time needed to extract quality information), and the challenge of aligning data strategy with the organization's business strategy. Cross-functional teams and DataOps are key levers for improving data quality.

5. Go language shines for AI-powered workloads, survey says

This article presents the results of the 2024 Go Developer Survey, which gathered 6,224 responses. The survey shows that while Go is considered a strong platform for running AI-powered applications, most developers prefer Python for starting such projects. The most commonly developed AI services among respondents are summarization tools, text generation tools, and chatbots. Regarding AI models, 81% of developers use OpenAI's ChatGPT or DALL-E, while 28% use Meta's Llama. As for AI libraries, 9% of respondents use the OpenAI library, 22% use Hugging Face TGI or Candle, and 20% use LangChain.

6. Why the Chinese government is sparing AI from harsh regulations - for now

This article presents an interview with Angela Huyue Zhang, a law professor at Hong Kong University, on China's approach to AI regulation. The 2023 regulations by the Cyberspace Administration of China (CAC) are notably strict regarding freedom of speech and content control. However, Zhang highlights a cyclical pattern in China's tech regulation: initial leniency allowing companies to grow and compete, followed by severe crackdowns impacting profits, and then a relaxation of restrictions. This pattern is often overlooked by Western observers who mainly focus on the crackdowns, not realizing that these regulations can also benefit firms. China aims for technological supremacy and self-sufficiency, with the government playing multiple roles such as policymaker, incubator, investor, research supplier, and customer of AI applications. However, restrictive measures like enforcing socialist values in language models and mandating real-identity verification for users could hinder Chinese companies' ability to innovate and compete globally. According to Zhang, a significant controversy or misuse of AI that threatens social stability might trigger widespread changes in these regulatory approaches.

7. Where We See the Biggest Opportunities in Generative AI

The article summarizes the key challenges facing generative AI: hallucinations (where models generate inaccurate or irrelevant outputs that are not aligned with the intended context or task), lack of interpretability (it can be difficult to discern the reasoning or decision-making processes behind the outputs), resource-intensiveness (extensive datasets and considerable computational power are required), and difficulty maintaining coherence and context in prolonged conversations, mainly due to memory limitations. Correcting inaccuracies or misinformation and updating outdated content in these models increases training expenses and limits accessibility for developers.

The article references a talk by Denny Vrandečić from Wikimedia about using knowledge graphs to provide a structured context and enhance the functionality of these models. Knowledge graphs offer a structured representation of domain-specific knowledge and relationships, which can mitigate many of the above issues by enabling faster query processing, reducing computational and memory overhead, and improving interpretability and explainability through structured and comprehensible knowledge representations.
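
As a rough illustration of this grounding pattern, the sketch below keeps a handful of invented triples in memory, looks up facts about entities mentioned in a question, and prepends them to the prompt as structured context. A production system would query a real graph store instead; the data and helper functions here are hypothetical.

```python
# Tiny in-memory knowledge graph as (subject, predicate, object) triples.
TRIPLES = [
    ("DBRX", "developed_by", "Databricks"),
    ("DBRX", "architecture", "mixture-of-experts"),
    ("Databricks", "acquired", "MosaicML"),
]

def facts_about(entity):
    """Return every triple whose subject or object matches the entity."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def grounded_prompt(question, entities):
    """Prepend structured facts so the model can answer from known data."""
    facts = [f"{s} {p} {o}." for e in entities for (s, p, o) in facts_about(e)]
    context = "\n".join(dict.fromkeys(facts))  # deduplicate, keep order
    return f"Known facts:\n{context}\n\nQuestion: {question}"

print(grounded_prompt("Who built DBRX?", ["DBRX"]))
```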

8. Tackling AI risks: Your reputation is at stake

This article primarily focuses on reputational dangers for organizations employing AI technologies. It raises concerns about how customers' perceptions, and the potential harm of disseminating false or misleading information, can impact an organization. A specific example is a social welfare chatbot designed to communicate with citizens in their native languages, where inaccurate or "hallucinated" information could prevent individuals from accessing necessary resources. The article suggests using retrieval-augmented generation to improve accuracy and reduce misinformation risks in AI models. It also discusses the disruptive potential of generative AI for traditional work practices, emphasizing the importance of supporting teams rather than just individuals. Lastly, it warns against overly rapid adoption and fine-tuning of large language models, pointing out the potential dangers of over-enthusiasm in their application.

9. Biggest problems and best practices for generative AI rollouts

This article describes how organizations are rapidly adopting generative AI to increase productivity and efficiency, but many are not taking a strategic approach to implementing the technology. IT leaders face many hurdles to the effective adoption and scaling of generative AI, including the lack of comprehensive AI governance for risk mitigation and control. According to Gartner, by 2027 more than 50% of enterprises will have implemented a responsible AI governance program to address the risks of generative AI, compared to less than 2% today. Other problems include ineffective prompt engineering, inadequate chunking or retrieval in Retrieval-Augmented Generation (RAG), the complexity involved in fine-tuning an AI model, and the difficulty of finding talented employees. The article cites Gartner as encouraging organizations to put a responsible AI framework at the forefront by defining and publicizing a vision for responsible AI before beginning a generative AI program. The architecture should be modular, scalable, and embedded with governance from the start.

10. Exploring the legal implications of whether generative artificial intelligence infringes copyright

The proliferation of AI-generated content has created debates around authenticity in creative works, and consumers struggle to discern between human and AI creations. AI relies on algorithms and data, whereas humans draw from emotions and experiences; human creativity is driven by intuition and empathy. The impact on copyright law remains a contentious issue. Key elements for copyright include originality, fixation (in a sufficiently permanent medium), and tangible form, but generative AI introduces a shift in the concept of originality by enabling automated content generation. The New York Times has filed a lawsuit against tech companies for copyright infringement. Determining copyright infringement in AI-generated content involves assessing originality and creativity: courts look at whether the content was produced independently or through copying, and factors like substantial similarity play a crucial role in identifying violations by AI tools. Protecting original work from infringement becomes more complex with AI's ability to replicate styles and generate similar content, and creators often face difficulties in proving ownership of their work when AI produces similar pieces. Among the preventive measures cited are regular copyright audits, educating developers about intellectual property laws, content filtering algorithms, metadata tracking (where metadata is added to AI-generated images, allowing people to detect AI content), and data sourcing policies.
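
As a small illustration of the metadata-tracking measure, the sketch below uses Pillow to embed provenance fields in a generated PNG, which detection tools could later read back. The field names are hypothetical; real provenance schemes rely on standards such as C2PA.

```python
from PIL import Image, PngImagePlugin

image = Image.new("RGB", (256, 256))  # stand-in for an AI-generated image

meta = PngImagePlugin.PngInfo()
meta.add_text("ai-generated", "true")           # hypothetical field name
meta.add_text("generator", "example-model-v1")  # hypothetical model name
image.save("generated.png", pnginfo=meta)

# A detection tool can read the same fields back from the file.
print(Image.open("generated.png").text.get("ai-generated"))  # -> true
```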

11. Generative AI: Databricks unveils open source large language model

Databricks has introduced DBRX, a general-purpose large language model that reportedly surpasses other open-source models in performance. According to Databricks, DBRX outperforms well-known models like Llama 2 70B and Mixtral-8x7B across various industry benchmarks, including language understanding, programming, mathematics, and logic. Developed by Mosaic AI and trained using Nvidia DGX Cloud, DBRX employs a mixture-of-experts (MoE) architecture, which optimizes performance by employing many expert models (16 in this case) and assigning each sub-task to the most suitable expert within the architecture. DBRX is open source and available for both research and commercial use on GitHub, Hugging Face, Amazon Web Services (AWS), Google Cloud, and Microsoft Azure. An expert notes a clear trend towards open-source usage in 2024, pointing out that while enterprises are still interested in customizing models, the availability of high-quality open-source models has led many to adopt retrieval-augmented generation (RAG) or to fine-tune existing open-source models.
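
To make the routing idea concrete, here is a minimal numpy sketch of a top-k mixture-of-experts layer. The dimensions are toy values and the experts are untrained random matrices; the choice of 4 active experts out of 16 per token is an assumption for illustration, since the article only states that 16 experts are used.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 16, 4

# Each "expert" is a small feed-forward weight matrix; the router
# scores how well each expert suits the current token.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router                # one routing score per expert
    top = np.argsort(logits)[-top_k:]  # indices of the best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (8,): same dimensionality as the input
```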

12. How RAG Will Usher In the Next Generation of LLMs and Generative AI

The article reviews retrieval-augmented generation (RAG), a technique applied during the inference and output-generation phases of processing requests with large language models. It involves retrieving relevant data from a connected database to enhance the relevance and accuracy of responses. This approach allows AI models to access up-to-date external data, thereby overcoming the limitations of the outdated or incomplete datasets typically used in LLM training. Such data access helps reduce the problem of hallucinated information. To facilitate efficient retrieval, RAG usually includes a pre-processing step in which embeddings of all external documents are created, indexed, and stored. A vector database is commonly used for this purpose because it supports efficient indexing, storage, and retrieval of high-dimensional vector data.
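
The sketch below illustrates that pre-processing and retrieval flow with a toy bag-of-words embedding and an in-memory index; a real deployment would use a trained embedding model and a vector database, and the documents and names here are invented for illustration.

```python
import re
import numpy as np

def embed(text, dim=64):
    """Toy hashed bag-of-words embedding, normalized to unit length;
    a real system would call a trained embedding model here."""
    v = np.zeros(dim)
    for word in re.findall(r"\w+", text.lower()):
        v[hash(word) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

documents = [
    "DBRX is an open-source model released by Databricks in 2024.",
    "Vertex AI is Google Cloud's platform for generative AI agents.",
]
# Pre-processing: embed and index every external document once.
index = np.stack([embed(d) for d in documents])

def retrieve(query, k=1):
    """Return the k documents most similar to the query embedding."""
    scores = index @ embed(query)  # cosine similarity of unit vectors
    return [documents[i] for i in np.argsort(scores)[-k:]]

# Inference: the retrieved text is prepended to the model's prompt.
query = "Who released DBRX?"
context = "\n".join(retrieve(query))
print(f"Context:\n{context}\n\nQuestion: {query}")
```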

RAG also offers advantages in terms of data security and privacy: it allows enterprises to make sensitive data accessible to applications while ensuring that this data is kept separate and protected. Additionally, RAG can be cost-effective, since new data can be added to the vector database without significant reengineering. On the other hand, RAG requires maintaining a vector database environment, and it introduces extra latency in processing prompts due to the additional steps of data retrieval and integration. Moreover, the effectiveness of RAG depends heavily on the quality of the external data sources.