ChatGPT & enterprise knowledge: “How can I create a chatbot for my business unit?”

Porsche AG
#NextLevelGermanEngineering
8 min read · May 15, 2023


Technologies like ChatGPT have great potential to boost creativity among our employees, improve our customer experience and support decision-making processes.

“How can I create a chatbot for my business unit?” This is probably the most common question we as Porsche’s AI research team hear these days. Across the company, technologies like ChatGPT have great potential to boost creativity among our employees, improve our customer experience and support decision-making processes. Above all, we see these tools as a game changer in the way we work with, access and consolidate knowledge within our enterprise. The ubiquitous availability of Pre-trained Large Language Models (PLLMs) such as ChatGPT has dramatically lowered the barriers to this task. However, it is still by no means as easy as one might think. In this article, we briefly explain the underlying mechanics of state-of-the-art PLLMs, discuss how these lead to common pitfalls for knowledge tasks in the enterprise context, and present the Retrieval Augmented Generation Pattern [0] as a possible solution to these issues.

Let’s start where the hype began: ChatGPT. ChatGPT is a PLLM published by OpenAI that performs stunningly well, for instance at answering questions and summarizing texts. If you haven’t done so already, we highly encourage you to visit the freely available website and give it a try! The model passes the Turing test with ease and has revolutionized public opinion on language-generating AI. But how does it work? We won’t bore you with technical details of the underlying mechanics, but will give you just enough information to understand the common pitfalls these models bring.

The illusion of knowledge

ChatGPT is a deep artificial neural network that was first pre-trained for causal language modeling on a large corpus of text data and afterwards fine-tuned with a technique known as ‘reinforcement learning from human feedback’. In the pre-training phase, it is shown a variable number of (sub-)words (referred to as the ‘context’ or ‘prompt’) and learns to predict the most probable next (sub-)word. In doing so, it learns a probability distribution over all the (sub-)words it knows, conditioned on the context. The learned probability distributions encode rules of grammar (e.g., subject–verb–object order) as well as semantics (e.g., ‘snow’ is usually associated with ‘cold’). After the pre-training phase, it could therefore be argued that the model is capable of encoding and generating natural language.
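To make this more tangible, here is a minimal sketch of next-(sub-)word prediction. Since ChatGPT itself is not openly downloadable, we use the freely available GPT-2 model via the Hugging Face transformers library as a stand-in; the mechanics are analogous:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 stands in for ChatGPT here; the principle is the same:
# predict a probability distribution over the next (sub-)word.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Snow is usually very"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, sequence_length, vocab_size)

# Conditional probability distribution over the vocabulary for the next token
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.3f}")
```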

However, so far, there is no way of influencing what exactly the model generates. Therefore, the model is trained to give answers to questions in a subsequent fine-tuning step. During fine-tuning, the model is shown questions and must generate suitable answers to them [3].

It is important to note that, due to the architecture and training paradigm of modern language models, PLLMs are indeed capable of encoding and generating natural language, but have no understanding of the text. Furthermore, they cannot consult a knowledge database while generating answers, hence the output they produce only conveys the illusion of knowledge. Numerous studies have shown that hallucinations (communicating false information) and biases (e.g., discriminating against a group of people) are major issues for PLLMs [4]. Besides that, the integration of static knowledge bases is not trivial.

Misconceptions around fine-tuning

As we explained above, fine-tuning a PLLM is a means to adapt the model from pure language encoding and generation to a related task. However, we observe that it is often misunderstood as a means to incorporate your domain knowledge into the model. In fact, unless you can provide huge amounts of training data, your own domain-specific training data is completely underrepresented compared to the massive corpus the model has seen during the pre-training phase. Therefore, it would be rather optimistic to assume that the model will forget, or better, overwrite the information it has learned during pre-training (encoded as conditional probabilities of sub-words). A more realistic outcome is a model that may provide factually correct answers from time to time but, in reality, often fails. Hence, fine-tuning should be seen as a means to adapt how the model communicates, not what it communicates.

But bear with us! There is still hope to take advantage of PLLMs for tasks that require knowledgeable answers and that must be free from hallucinations or bias. This is where the Retrieval Augmented Generation Pattern comes to the rescue.

Separating knowledge and skill

Illustration of the Retrieval Augmented Generation Pattern, adapted from an open-source project by Redis utilizing OpenAI [2]

The idea behind the Retrieval Augmented Generation Pattern is simple: PLLMs are excellent at encoding and generating natural language. We can therefore set up a pipeline in which we first find documents semantically related to a question and then let the PLLM generate a knowledgeable answer from that information. This allows us to make the most of the power of PLLMs while decoupling answers from the alleged ‘knowledge’ encoded in the model. With Microsoft’s Prometheus model, a related approach is already being used in Bing Chat to provide real-time, factually correct answers [5]. A key difficulty here lies in the orchestration between ‘conversational’ and ‘factual’ responses.
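In code, the pattern boils down to a retrieve-then-generate pipeline. Here is a minimal sketch; the three callables are hypothetical placeholders for the steps detailed in the following sections:

```python
from typing import Callable

def answer_question(
    question: str,
    embed: Callable[[str], list[float]],           # step 1: text -> embedding vector
    retrieve: Callable[[list[float]], list[str]],  # step 2: vector -> relevant documents
    generate: Callable[[str], str],                # step 3: prompt -> generated answer
) -> str:
    """Retrieval Augmented Generation in a nutshell (hypothetical helpers)."""
    query_vector = embed(question)
    documents = retrieve(query_vector)
    prompt = "\n\n".join(documents) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```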

1) Encode questions and documents

First, we need a way to semantically search for documents relating to a question. If a person enters the word ‘motor’ in a question, then documents mentioning the word ‘engine’ should also be found relevant in the subsequent step. Luckily, comparing words and sentences in a semantic sense is already a well-explored area of machine-learning research. For this article, it is sufficient to understand that we can encode words or phrases as vectors, with similar meanings yielding similar vectors. These so-called ‘embedding vectors’, or ‘embeddings’, can easily be generated by Large Language Models.
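As a sketch, such embeddings can be generated with the open-source sentence-transformers library; the model name below is just one common lightweight choice, and hosted APIs such as OpenAI’s embeddings endpoint work analogously:

```python
from sentence_transformers import SentenceTransformer

# 'all-MiniLM-L6-v2' maps text to 384-dimensional embedding vectors;
# any sentence-embedding model could be used instead.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I replace the motor?",              # the question
    "Instructions for servicing the engine.",   # semantically related document
    "Porsche was founded in 1931.",             # unrelated document
]
embeddings = model.encode(sentences)  # NumPy array of shape (3, 384)
```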

2) Compare the encoded entities

In our example, an embedding is generated for each document in the knowledge base and then compared with the embedding of the question. If a document embedding is only a short distance away from the question’s embedding, the document is marked as relevant. This also comes with the additional benefit of full transparency about which knowledge source the model uses to generate the answer in the next step, allowing a reference to it to be provided.
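‘Distance’ is typically measured via cosine similarity between the vectors. A minimal sketch with NumPy, continuing the variables from the sketch above:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; higher means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

question_emb = embeddings[0]  # "How do I replace the motor?"
for text, doc_emb in zip(sentences[1:], embeddings[1:]):
    print(f"{cosine_similarity(question_emb, doc_emb):.2f}  {text}")

# The engine-service sentence scores markedly higher than the unrelated one,
# even though 'motor' and 'engine' are different words.
```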

3) Generate knowledgeable answers

Now that we have encoded the question and found the relevant documents, we still need to extract the correct answer from the documents and return it in natural language. To do so, we can put the question and the relevant documents into the prompt and instruct our PLLM to answer the question. Here, a relatively new discipline named ‘prompt engineering’ is evolving, which focuses on the way in which the prompt is formed from the necessary information. Interestingly, we can even put rule-based instructions for the model into the prompt, allowing us to set the tone of the generated answer or to advise the model not to use any knowledge other than that provided in the prompt. However, since the prompt supports only a limited amount of text, it may be necessary to reduce its size by inserting only the most important paragraphs [2].
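As an illustration, such a prompt might be assembled as follows. The template and the rule-based instruction are our own example, and the model’s compliance with such instructions is not guaranteed:

```python
def build_prompt(question: str, documents: list[str]) -> str:
    """Assemble a grounded prompt from the question and retrieved documents."""
    context = "\n\n".join(documents)
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not contained in the context, reply 'I don't know.'\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string is then sent to the PLLM as its prompt, completing the pipeline sketched earlier.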

The Retrieval Augmented Generation Pattern is very easy to replicate step by step, as shown here in the OpenAI playground [1]. To begin with, the knowledge base is empty. Then, after a question is entered, it is manually populated with the Wikipedia article on the Porsche 918 Spyder.

First open-source projects implement the pattern

We can already see some projects integrating this pattern with convenient-to-use PLLMs such as OpenAI’s GPT as the backbone [1]. The core of such an implementation is a document database like Redis or Elasticsearch that stores raw documents together with their associated embeddings, coupled with a fast vector search. Microsoft is also pursuing this approach, implementing a similar pattern with Azure Cognitive Search and Azure OpenAI [6]; a video worth watching can be found on YouTube. It will be exciting to see variations of this pattern, as the models and underlying document databases allow for a high degree of variability in the enterprise context.
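As a sketch of the storage side, documents and their embeddings might be stored and searched with redis-py and the RediSearch vector extension; the index name, key prefix, dimensionality and placeholder vectors below are illustrative assumptions:

```python
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Index with a text field and a 384-dimensional vector field (illustrative)
schema = (
    TextField("content"),
    VectorField("embedding", "FLAT",
                {"TYPE": "FLOAT32", "DIM": 384, "DISTANCE_METRIC": "COSINE"}),
)
r.ft("docs").create_index(
    schema,
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)

# Store a document together with its embedding as raw bytes
doc_embedding = np.random.rand(384).astype(np.float32)  # placeholder vector
r.hset("doc:1", mapping={
    "content": "Instructions for servicing the engine.",
    "embedding": doc_embedding.tobytes(),
})

# KNN search: the three documents closest to the question embedding
question_embedding = np.random.rand(384).astype(np.float32)  # placeholder vector
query = (Query("*=>[KNN 3 @embedding $vec AS score]")
         .return_fields("content", "score")
         .sort_by("score")
         .dialect(2))
results = r.ft("docs").search(query, query_params={"vec": question_embedding.tobytes()})
for doc in results.docs:
    print(doc.score, doc.content)
```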

The road ahead

The Retrieval Augmented Generation Pattern is an exciting and powerful design pattern that has the potential to become the most influential way to integrate enterprise knowledge in the coming years. It couples the ease of use of Pre-trained Large Language Models with the ability to incorporate domain-specific knowledge from textual documents. And since embeddings are equally applicable to other media such as images and videos, there is great potential for multimodal, domain-specific chatbots in the near future.

Hence, the use cases for vertical and horizontal integration of knowledge are vast and varied, and will likely enable knowledge to flow seamlessly through the entire enterprise. For instance, think of the knowledge of vehicle development engineers being made available to repair workshops through the integration of technical product datasheets. Workshop personnel would feel as if they had a team of expert engineers at their fingertips, with access to detailed information on a vehicle’s specifications and design.

Nevertheless, despite its huge potential, this pattern is still in its infancy. Further research and adoption will be needed to make it accessible and safely usable for a wide range of enterprises. For example, it may still suffer from problems like bias, hallucinations and toxic comments; such problems may surface more subtly here, and are therefore even riskier. Additionally, we see the need to research the systematic adoption of the design pattern into enterprise IT landscapes, as the rapid development in the area of PLLMs will require continuous integration and swift adaptation.

About the authors

Tobias Grosse-Puppendahl is an enterprise architect for data & AI at Porsche AG. He contributes to academic research at the intersection of human-computer interaction and artificial intelligence.
Philipp Hallgarten is a PhD student in Machine Learning Methods for Recommender Systems at Porsche AG and TU Munich.
Lukas Zwaller is a dual-career student at Porsche AG, studying computer science at the DHBW Ravensburg Campus Friedrichshafen.
Christoph Hoellig is a consultant for data science and analytics at MHP — A Porsche Company. He is a freelance lecturer for TU Munich and IU International University of Applied Sciences.
Pouyan Asgharzadeh is an AI portfolio lead at MHP — A Porsche Company.

References

[0] Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models. https://ai.facebook.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/. March 30, 2023

[1] ChatGPT. http://chat.openai.com/. March 30, 2023

[2] Question & Answering using Redis & OpenAI. https://github.com/RedisVentures/redis-openai-qna. March 31, 2023

[3] Introducing ChatGPT. https://openai.com/blog/chatgpt. March 30, 2023

[4] Bang, Yejin, et al. “A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity.” arXiv preprint arXiv:2302.04023 (2023).

[5] Microsoft shares the secret sauce behind its Prometheus model that powers the new Bing. https://www.windowscentral.com/software-apps/microsoft-shares-the-secret-sauce-behind-its-prometheus-model-that-powers-the-new-bing. March 31, 2023

[6] ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search. https://github.com/Azure-Samples/azure-search-openai-demo. April 13, 2023

About this publication: Where innovation meets tradition. There’s more to Porsche than sports cars — we are developing new digital products and services — always with our customers in focus. On our Medium blog, we tell these stories. It’s about our #nextvisions, emerging technologies, and the people that drive our digital journey. If you want to know more, follow us on Twitter, Instagram and LinkedIn.

