🔍Build a Search Ecosystem with Vector Databases and LLMs🔎
Searching for information is probably 80 percent of the job in IT.
People spend so much time searching for
- Documents
- Syntax
- Information
- Solutions to errors
- Tutorials
Etc. etc.
Most websites and apps have extremely capable search engines built in, allowing users to perform sophisticated, fuzzy searches across a wide variety of information and data.
Traditionally, search was implemented using tools like Elasticsearch and Apache Solr.
The world was happy. The search engines built on top of them were quite good at second-guessing what the user was actually searching for.
But with time, things need to change. And changing they are.
Gone are the days when traditional implementations using Elasticsearch or Lucene are enough.
What happened to Elasticsearch?
Vector databases and Large Language Models happened.
In this article, I will explain how a sophisticated search ecosystem can be built using the biggest advancement of the past year: Large Language Models.
Vector Databases
A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. The vectors are usually generated by applying some kind of transformation or embedding function to the raw data, such as text, images, audio, or video. The embedding function can be based on various methods, such as machine learning models, word embeddings, or feature extraction algorithms.
The main advantage of a vector database is that it allows for fast and accurate similarity search and retrieval of data based on their vector distance or similarity. This means that instead of using traditional methods of querying databases based on exact matches or predefined criteria, you can use a vector database to find the most similar or relevant data based on their semantic or contextual meaning.
So what’s the game changer in the above description of Vector Databases?
You can use a vector database to find the most similar or relevant data based on their semantic or contextual meaning
Even if you skipped the description of a Vector Database, you cannot ignore the last line. Context matters. It matters a lot.
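To make "semantic or contextual meaning" concrete, here is a minimal sketch of similarity search using the sentence-transformers library (the model name and the sample data are my own illustrative choices). A vector database does exactly this at scale, with an approximate nearest-neighbour index on top.

```python
# A toy semantic search: embed the documents, embed the query,
# rank by cosine similarity. A vector database does the same thing
# at scale with an approximate-nearest-neighbour index.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

docs = [
    "Black regular-fit office trousers, sizes 30 to 40",
    "Blue slim-fit denim jeans for casual wear",
    "Formal white cotton shirt with French cuffs",
]

model = SentenceTransformer("all-MiniLM-L6-v2")          # illustrative model choice
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "pants I can wear to the office"
query_vec = model.encode([query], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalised vectors
scores = doc_vecs @ query_vec
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```

Note that the query never mentions the word "trousers", yet the office trousers come out on top. That is the contextual matching a keyword engine struggles with.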
Large Language Models
Large language models (LLMs) are trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks.
Large Language Models generate information
- In a language you understand
- In a domain you know
- With context
We now know that if I give ChatGPT a prompt and some context, it is quite capable of understanding what I am actually looking for within that context.
Yet again, we encounter the word
CONTEXT
Context Matters
So when we search for information, the context matters. If I want to search for pants on your e-commerce website, I shouldn't have to specify things like
🔘Color
🔘Type
🔘Preferences
And other scientific shit designed to get all the information from me and leave nothing to the imagination.
Now, instead of going through an expansive form, filling in search terms, filters and so on, what if I could simply say
“Find me regular fitting Pants in black color that I can wear to the office”
Isn’t that easier? It provides enough context for a framework using Vector Databases and LLMs to find information for you.
And if I don’t like what the Search Engine recommends, I can just say
“Nah, find me other pants, ignore pants of so and so company”
Today, this kind of sophisticated searching is within our reach. I know because it is my bread and butter these days to design these kinds of search engines.
So what’s under the hood?
Implementing Document Search
Let me walk you through the steps to implement a RAG-based search for information within documents.
There are libraries to process almost every type of document or file out there: DOCX, PPT, PDF, CSV, Excel, and more. All of these documents can be processed.
And if I can process a document and extract text from it, I can search for it as well.
But then you would say — what about images within the documents?
That's trivial as well with OCR packages like Tesseract. But you won't always need it; let me explain.
1. Assumptions
- You have all your documents in cloud storage.
- You have provisioned a vector database.
2. Extraction
You need to write a service to process the documents and extract the following information:
▶️The text: Headlines, subheadings, paragraphs, any content that provides context for the user to search on
▶️The metadata: Which page? Which Sheet? Which Slide? etc.
▶️Images: The images in case you need to read text using OCR
▶️Image metadata: Which Image? What page? Which Slide? etc.
▶️The document’s URL: Storage URL/Website URL/Document store URL
Extraction can be done by creating your own services and deploying them to a Kubernetes cluster, or by writing a data pipeline to extract the text.
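As a sketch of what such a service might do for PDFs, here is a minimal extractor using the pypdf library (the library choice and the record shape are assumptions for illustration; DOCX, PPT and spreadsheets each need their own parser):

```python
# Minimal PDF extraction: one record per page, carrying the text
# plus the metadata we will later store alongside the embedding.
from pypdf import PdfReader

def extract_pdf(path: str, document_url: str) -> list[dict]:
    reader = PdfReader(path)
    records = []
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if not text.strip():
            continue  # image-only page; hand it to OCR if you need it
        records.append({
            "text": text,
            "metadata": {
                "source": document_url,   # storage / website / document-store URL
                "page": page_number,
                "type": "pdf",
            },
        })
    return records
```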
3. Storage
Irrespective of what you choose as the middleware, you will be writing the extracted information to a vector database along with its metadata.
Later, using the context that the user provides, we look up the text stored in the vector DB.
And the vector DB uses that text to find the metadata, especially the document URL.
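Here is a minimal sketch of the storage step, using Chroma as a stand-in vector database (the collection name, the `extract_pdf` helper from the extraction sketch, and the example file and URL are all illustrative; any vector database that stores metadata next to embeddings works the same way):

```python
# Store extracted text and its metadata in a vector database.
# Chroma uses its default embedding function here; in production
# you would pick the embedding model explicitly.
import chromadb

client = chromadb.Client()                      # in-memory, for the sketch
collection = client.get_or_create_collection("documents")

records = extract_pdf("handbook.pdf", "https://storage.example.com/handbook.pdf")

collection.add(
    ids=[f"handbook-{r['metadata']['page']}" for r in records],
    documents=[r["text"] for r in records],
    metadatas=[r["metadata"] for r in records],
)

# Later: a semantic lookup returns both the text and the metadata,
# including the document URL the user can jump to.
hits = collection.query(query_texts=["leave policy for contractors"], n_results=3)
print(hits["metadatas"][0])
```

The key point is that every embedded chunk carries its metadata, so a similarity hit immediately tells you which document, and which page, it came from.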
4. Retrieval
The retrieval step can be represented as follows
The user's query goes to the Retrieval Service, an application responsible for
⏺Getting the top K similar records from the vector database, containing the text (context) and the document metadata, which includes the document link.
⏺Using prompt engineering to ensure that the context retrieved from the vector database is presented to the LLM in such a way that the response generated by the LLM is relevant to the human on the other side. And readable.
⏺Other non-functional requirements like logging, retry mechanisms, etc.
The Retrieval Service would of course have to be a scalable service, capable of handling a high number of requests. So would the service into which the LLM is deployed, although with managed services like Azure OpenAI, AWS Bedrock, etc., scale is handled by the cloud platform.
The Retrieval Service returns the following information to the user
🟠The LLM response — which is a summary of the information retrieved from the most relevant document.
🟢The Document metadata such as the location, when it was created, which team manages it, etc.
🔵The link to the actual document and the location within it from where the relevant information was retrieved. This allows the user to go to the actual document and the actual location from which the information was retrieved.
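Here is a minimal sketch of that retrieval path, reusing the Chroma collection from the storage step and assuming the OpenAI chat API for the LLM (the model name and the prompt wording are illustrative choices):

```python
# Retrieval: fetch the top-K chunks, feed them to the LLM as context,
# and return the summary together with the source metadata and links.
from openai import OpenAI

llm = OpenAI()   # reads OPENAI_API_KEY from the environment
K = 4

def answer(question: str) -> dict:
    # `collection` is the Chroma collection created in the storage sketch
    hits = collection.query(query_texts=[question], n_results=K)
    chunks = hits["documents"][0]
    metadatas = hits["metadatas"][0]

    context = "\n\n".join(
        f"[{m['source']} page {m['page']}]\n{c}" for c, m in zip(chunks, metadatas)
    )
    prompt = (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = llm.chat.completions.create(
        model="gpt-4o-mini",                     # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return {
        "answer": response.choices[0].message.content,
        "sources": metadatas,                    # URLs + page numbers for the user
    }
```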
So let’s summarize the implementation —
The vector database, along with the LLM, is able to provide information that is relevant to the user. Not just the information, but also its origin and context. Why is this important?
Because it allows your system to have a feedback loop from the user.
With usage and over time, your system becomes capable of returning the latest and most relevant results. This is a big win!
Next, we must talk about database search. The norm was always to use Elasticsearch or Apache Solr to make search smarter. None of that is required now. Let's see how and why.
Implementing Database Search
Let me be very clear right from the get-go: expecting LLMs to fire sophisticated SQL queries at databases is fraught with risk. Especially if you think you would be performing CRUD operations.
Don’t do it
Instead, my recommendation would be to have an ORM or GraphQL layer over the database to form an access layer. LLMs perform better at calling APIs with schemas and entities, as the bounds and the context are already known. Plus, you have a protective layer over the database that the LLM has to go through.
Assuming you have a Data Access Layer in place, here is what information retrieval from a database looks like. Let me break it down for you.
1. Retrieval
Assuming that the Data Access Layer is in place, we will leverage the same Retrieval Service to query the vector database as well. These are the steps:
❇️The user submits a prompt that needs to be converted into a database query. Here's an example: select all pants that are black in color, have a waist size of 34 and are priced in the range 3000 to 10000.
❇️The Retrieval service submits this prompt to the Data Access Layer to get the Schemas relevant to the prompt from the Vector Database.
❇️The Data Access layer fetches relevant Schemas from the Vector database and returns the most relevant schema.
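As a sketch of how that schema lookup can work: store a short natural-language description of each schema or entity in the vector database, and let the user's prompt retrieve the most relevant one (the collection name, descriptions and endpoints below are made up for illustration):

```python
# Schemas are just documents in the vector DB: a description to match on,
# plus metadata telling the access layer which entity/API to use.
# `client` is the Chroma client from the storage sketch.
schemas = client.get_or_create_collection("schemas")

schemas.add(
    ids=["products", "orders"],
    documents=[
        "Product catalogue: name, category, colour, size, price, brand",
        "Customer orders: order id, items, status, shipping address, totals",
    ],
    metadatas=[
        {"entity": "Product", "endpoint": "/api/products/search"},
        {"entity": "Order", "endpoint": "/api/orders/search"},
    ],
)

hit = schemas.query(
    query_texts=["black pants, waist 34, priced between 3000 and 10000"],
    n_results=1,
)
print(hit["metadatas"][0][0])   # -> the Product schema and its search endpoint
```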
2. Firing Queries
Now that the Retrieval process is completed, we submit the Schema to the LLM so that it can invoke the relevant API.
NOTE: The API to be called along with how the Schema is to be used becomes part of the prompt engineering
❇️The LLM invokes the API with the relevant schema to hit the database with the actual query.
❇️The response from the database is returned to the Retrieval Service along with the LLM summary response, the query that was fired (for context) and other metadata such as table names and columns. This metadata would, by the way, also be part of the vector database.
❇️The Retrieval Service sends this to the user for feedback.
And that’s it. The user can then make a decision to accept the response or modify their prompt to fire another query.
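A minimal sketch of the query-firing step, using OpenAI function calling so the LLM only fills in the parameters of a whitelisted access-layer API instead of writing SQL (the endpoint, parameter names and the `requests.post` call are hypothetical stand-ins for your Data Access Layer):

```python
# The LLM never touches the database: it only proposes arguments for a
# whitelisted search function, which the access layer then executes.
import json
import requests
from openai import OpenAI

llm = OpenAI()

search_tool = {
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Search the product catalogue via the Data Access Layer",
        "parameters": {
            "type": "object",
            "properties": {
                "colour": {"type": "string"},
                "waist_size": {"type": "integer"},
                "min_price": {"type": "number"},
                "max_price": {"type": "number"},
            },
            "required": ["colour"],
        },
    },
}

user_prompt = "Black pants, waist size 34, priced between 3000 and 10000"
response = llm.chat.completions.create(
    model="gpt-4o-mini",                        # illustrative model choice
    messages=[{"role": "user", "content": user_prompt}],
    tools=[search_tool],
)

call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# The access layer, not the LLM, fires the actual query.
result = requests.post("https://api.example.com/api/products/search", json=args)
print(result.json())
```

The design choice matters: the LLM proposes arguments, the access layer validates and executes them, and the database never sees model-generated SQL.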
Tooling
You can implement a RAG architecture using
🟧Langchain
🟨LlamaIndex
🟩GradientJ
Invoke LLMs using
🟦Ollama
🟪GPT4All
🟫Dify
Also, if you want to use managed LLMs through APIs, then
🟠OpenAI
🟡Azure OpenAI
🟢AWS Bedrock
🔵Cohere
🟣Mistral
Of course, you can also deploy your own LLM locally and use Pipelines along with Retrieval QA chains to implement RAG architectures.
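For example, a local RAG pipeline wired up with LangChain, Chroma and Ollama might look roughly like this (import paths and model names shift between versions, so treat it as a sketch rather than something to copy-paste):

```python
# A local retrieval QA chain: embed document chunks into Chroma, retrieve
# the top matches for a question, and let a locally served model answer.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

texts = ["...your extracted document chunks..."]   # placeholder

vectordb = Chroma.from_texts(texts, embedding=OllamaEmbeddings(model="llama3"))
qa = RetrievalQA.from_chain_type(
    llm=Ollama(model="llama3"),                    # illustrative local model
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke({"query": "What does the leave policy say about contractors?"}))
```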
Challenges
Some obvious challenges with using this kind of a search strategy are
❗️It's expensive, at least if you compare it to traditional setups using Elasticsearch and the like.
❗️It requires a decent knowledge of prompt engineering to get the most out of your LLM.
❗️The solution improves over time. You might have to go through a few iterations just to get it into production.
❗️Responses aren't always reliable, as there is a probability for the LLM to hallucinate. But if you are smart and ensure the LLM has the right context, this can largely be avoided with RAG.
❗️Latency would be an issue, as LLMs take time to compile their responses. Therefore, responses would take longer than with traditional setups.
But these challenges aren't bad enough to make you ignore the benefits of designing such a solution.
The world of technology is evolving faster than it was a year ago. Tried and tested System design is boring today and thinking out of the box is becoming more and more important.
AI is going to eat away at most jobs because of automation. But this doesn’t mean that it will replace everything.
AI must be used as an enabler rather than a replacement for Human thought and action.
Building this kind of search architecture has challenges, but it also has room to automate data mining and data quality. It's the kind of system that can get smarter autonomously instead of through human intervention or bug fixes.
Follow me Ritesh Shergill
for more articles on
👨💻 Tech
👩🎓 Career advice
📲 User Experience
🏆 Leadership
I also do
✅ Career Guidance counselling — https://topmate.io/ritesh_shergill/149890
✅ Mentor Startups as a Fractional CTO — https://topmate.io/ritesh_shergill/193786