Unlocking the potential of large language models: Enhancing chatbots with a GPU-centric vector database
Data Science Summit
Warsaw, Poland
Abstract
In recent years, large language models have captured the public imagination due to their apparent human-like conversational capability and their potential impact on society. Alongside that wave of popularity, issues concerning the accuracy, bias, and misinformation of certain language models have come to light. One way to alleviate some of these issues is to enhance the language model with an abstract vectorized memory, or vector database. Such memory structures are designed for efficient searches over similar instances of subject matter, which can steer the model’s responses toward those of human experts. A caveat of mainstream vector databases is that they are restricted to traditional, serial CPU computing. Here, “nCodex”, a novel GPU-centric vector database that runs similarity searches on the GPU, is introduced. It is shown that this choice of hardware is well suited to the ‘single-instruction, multiple-data’ operations that typical similarity searches require. The case studies presented demonstrate that the performance and response quality of small language models can exceed those of larger language models when supported by the nCodex vector database. Finally, it is discussed how nCodex applies to real business deployments of large language models with high-concurrency, low-latency needs.
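To illustrate why similarity search maps well onto ‘single-instruction, multiple-data’ hardware, the following is a minimal sketch of a brute-force cosine-similarity top-k search on the GPU. It is not nCodex code and makes no claims about nCodex’s internals; the function name, embedding dimensions, and use of PyTorch are illustrative assumptions only.

```python
# Illustrative sketch only (not nCodex): brute-force cosine-similarity top-k search.
# The same multiply-accumulate is applied across every (query, corpus) pair,
# which is exactly the data-parallel pattern GPUs execute efficiently.
import torch

def top_k_similar(queries: torch.Tensor, corpus: torch.Tensor, k: int = 5):
    """Return scores and indices of the k most similar corpus vectors per query."""
    q = torch.nn.functional.normalize(queries, dim=-1)   # (n_queries, dim)
    c = torch.nn.functional.normalize(corpus, dim=-1)    # (n_corpus, dim)
    scores = q @ c.T                                      # (n_queries, n_corpus), one large matmul
    return torch.topk(scores, k, dim=-1)                  # per-query top-k matches

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    corpus = torch.randn(100_000, 768, device=device)     # hypothetical document embeddings
    queries = torch.randn(4, 768, device=device)          # hypothetical query embeddings
    result = top_k_similar(queries, corpus, k=5)
    print(result.indices, result.values)
```

In this sketch the entire search reduces to a single batched matrix multiplication followed by a top-k selection, both of which parallelize across thousands of GPU threads; a serial CPU implementation would instead iterate over vectors one at a time.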
See LinkedIn post.