The second part of the LangChain RAG Pattern with React, FastAPI, and Cosmos DB Vector Store series continues from Part 1 by creating the FastAPI interface. This part builds a robust and efficient web API with FastAPI that will be consumed by the React web application developed in Part 3. FastAPI lets us handle requests and responses cleanly while maintaining strong performance, and it sets the stage for the full implementation of the LangChain RAG pattern.

In this article
- Prerequisites
- FastAPI: What is it?
- Download the Project
- Setting Up Python Environment
- Perform a Vector Search
- Leverage RAG to Answer Questions
- Walkthrough LangChain Q&A with RAG
Prerequisites
- If you don’t have an Azure subscription, create an Azure free account before you begin.
- Complete part 1 – LangChain RAG with React, FastAPI, Cosmos DB Vector: Part 1
FastAPI: What is it?
FastAPI is a cutting-edge web framework that enables developers to swiftly build APIs using Python 3.8+. Renowned for its remarkable performance, it rivals the speed of NodeJS and Go, thanks to its integration with Starlette and Pydantic. Not only does FastAPI expedite development by up to 300%*, but it also significantly reduces human-induced errors by 40%*. With excellent editor support and comprehensive documentation, FastAPI ensures an intuitive and easy-to-use experience, minimizing code duplication and maximizing productivity.
*estimation based on tests on an internal development team at FastAPI, building production applications
Download the Project
All the code and sample datasets are accessible to you on my GitHub.

Clone the Demo API project (aptly named demo_api) GitHub Repo
The demo_api project is set up to handle web requests and process data transactions against the Azure Cosmos DB for MongoDB vector database and an Azure Storage Account. This design keeps the code organized, making it easier to develop new features and maintain existing functionality.

Setting Up Python Environment
This tutorial uses Python to develop the FastAPI web interface, so Python must be set up on your computer. Python and LangChain are used for the vector search against Azure Cosmos DB for MongoDB vCore, as well as for executing the Q&A RAG chains. Python version 3.11.4 was used throughout the development and testing of this walkthrough.
First, set up your Python virtual environment in the demo_api directory.
python -m venv venv
Activate your environment and install the dependencies using the requirements file in the demo_api directory (the activation command below is for Windows; on macOS/Linux, use source venv/bin/activate):
venv\Scripts\activate
python -m pip install -r requirements.txt
Create a file named '.env' in the demo_api directory to store your environment variables.
VECTORDB_ENDPOINT='https://[your-web-app].azurewebsites.net/'
VECTORDB_API_KEY='**your_key**'
OPENAI_API_KEY="**Your Open AI Key**"
MONGO_CONNECTION_STRING="mongodb+srv:**your connection string from Azure Cosmos DB**"
AZURE_STORAGE_CONNECTION_STRING="**"
AZURE_STORAGE_CONTAINER="images"
| Environment Variable | Description |
|---|---|
| VECTORDB_ENDPOINT | The URL endpoint for Weaviate. You can leave the default value: 'https://[your-web-app].azurewebsites.net/', as Weaviate is not used in this exercise. |
| VECTORDB_API_KEY | The Weaviate authentication api key. You can leave the default value: ‘**your_key**’ as Weaviate is not used in this exercise. |
| OPENAI_API_KEY | The key used to connect to the OpenAI API. If you do not possess an OpenAI API key, you can obtain one by following the guidelines outlined here. |
| MONGO_CONNECTION_STRING | The Connection string for Azure Cosmos DB for MongoDB vCore (see here) |
| AZURE_STORAGE_CONNECTION_STRING | The Connection string for Azure Storage Account (see here) |
| AZURE_STORAGE_CONTAINER | The container name used from part 1, defaults to ‘images‘. |
In the GitHub repository, the setup is arranged to facilitate the transition between different vector storage systems, specifically Azure Cosmos DB for MongoDB and Weaviate. However, a thorough exploration of this capability is slated to be introduced in an upcoming article.
With the environment configured and variables set up, we are ready to start the FastAPI server. Run the following command from the demo_api directory:
python main.py
The FastAPI server launches on the localhost loopback 127.0.0.1, port 8000, by default. You can access the Swagger documentation at http://127.0.0.1:8000/docs
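As a quick sanity check, you can call the root endpoint (defined in main.py later in this article), which simply returns "running". A minimal sketch using the requests package, assuming the server is running locally on port 8000:

import requests

# The demo_api root endpoint returns a plain "running" message.
response = requests.get("http://127.0.0.1:8000/")
print(response.status_code, response.json())  # expected: 200 running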

Perform a Vector Search
With our FastAPI service running locally, we can test a vector search against our vector database using the /search/{query} endpoint. In this scenario, we submit the query "what is a turbofan" and the endpoint returns a list of matching resources.
Click Try It out for /search/{query}.

Enter “what is a turbofan” for the query and click Execute.

The response consists of a JSON array containing resources depicted below.
{resource_id": "b801d68a8a0d4f1dac72dc6c170c395b",
"page_id": "cd3eea19611c423aaaf85e6da691f23d",
"title": "Rocket Propulsion Elements",
"source": "Chapter 1 - Classification (page-2)",
"content":"..text.."}
Leverage RAG to Answer Questions
The retrieved resources (documents) can serve as grounding for an LLM when answering questions. This is the purpose of our Q&A (question and answer) RAG endpoint: /search/qa/{query}. Upon submitting the query "what is a turbofan", the response includes a 'text' value in addition to the relevant documents. This 'text' is the LLM's answer, grounded in the documents returned by the vector search for the query.

The response from our RAG endpoint includes the LLM ‘answer’ to our query stored in the ‘text’ value along with a list of the resources used to support the answer under ‘ResourceCollection’.
{
    "text": "A turbofan is a type of air-breathing engine that uses a fan to compress air and mix it with fuel for combustion. It is a type of ducted engine that is more fuel-efficient than a turbojet. Turbofans are commonly used in commercial aircraft for propulsion.",
    "ResourceCollection": [
        {
            "resource_id": "b801d68a8a0d4f1dac72dc6c170c395b",
            "page_id": "cd3eea19611c423aaaf85e6da691f23d",
            "title": "Rocket Propulsion Elements",
            "source": "Chapter 1 - Classification (page-2)",
            "content": "....text..."
        },
        ...
    ]
}
The endpoint first retrieves documents via a vector search, then prompts the LLM to produce the answer using both the documents and the query.
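The Q&A endpoint can be exercised the same way. A minimal, hypothetical client sketch that prints the grounded answer along with its supporting sources:

import requests
from urllib.parse import quote

query = "what is a turbofan"
result = requests.get(f"http://127.0.0.1:8000/search/qa/{quote(query)}").json()

# 'text' holds the LLM answer; 'ResourceCollection' lists the documents that ground it.
print(result["text"])
for resource in result["ResourceCollection"]:
    print("-", resource["title"], resource["source"])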

Walkthrough LangChain Q&A with RAG
For the RAG endpoints in the API, requests flow from the web search components to the search service and finally to the data components. In our case, the MongoDB data search component connects to Azure Cosmos DB for MongoDB vCore. The layers exchange model objects, and most of the LangChain code resides in the service layer. I implemented this approach to make data sources interchangeable while reusing the same chains.

Web Layer
The Web layer is responsible for routing requests and managing communication with the caller. In the context of the Q&A RAG, we only have two code files: the primary FastAPI application (main.py) and the API router for search (search.py).
main.py
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from web import search, content

app = FastAPI()

origins = ["*"]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

app.include_router(search.router)
app.include_router(content.router)


@app.get("/")
def get() -> str:
    return "running"


if __name__ == "__main__":
    import uvicorn

    uvicorn.run("main:app", reload=True)
This code, main.py, sets up the FastAPI web application with CORS middleware enabled, allowing it to handle requests from different origins. It includes the routers for the search and content-related endpoints and defines a default endpoint ("/") that returns a simple "running" message, which I used as a quick health check during development and testing. Finally, uvicorn runs the FastAPI application with auto-reload enabled, which shortens the edit-and-test loop during development.
web/search.py
from fastapi import APIRouter
from service import search as search
from model.resource import Resource
from model.airesults import AIResults

router = APIRouter(prefix="/search")


@router.get("/{query}")
def get_search(query) -> list[Resource]:
    return search.get_query(query)


@router.get("/summary/{query}")
def get_query_summary(query) -> AIResults:
    return search.get_query_summary(query)


@router.get("/qa/{query}")
def get_query_qa(query) -> AIResults:
    return search.get_qa_from_query(query)
The web search code manages the search-related endpoints. It imports functions from the search service module, along with the models for resources (Resource) and AI results (AIResults). The router is prefixed with "/search" and exposes three GET routes, each serving a distinct purpose.
The first endpoint, /search/{query}, retrieves search results for the provided query. It invokes the get_query function from the search service module, which returns a list of Resource objects representing the matches.
The /search/summary/{query} route fetches a summary of the search results for the provided query. It calls the get_query_summary function from the search service module and returns an AIResults object containing the summarized outcome.
Lastly, the /search/qa/{query} route returns question and answer results using the LangChain RAG pattern, by invoking the get_qa_from_query function from the search service module.
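The Resource and AIResults models live in the repository's model package and are not reproduced in this article. Based on the fields used in the code and in the JSON responses above, a minimal Pydantic-style sketch of their shape might look like the following (the exact definitions in the repository may differ):

from pydantic import BaseModel


class Resource(BaseModel):
    # One matching document chunk returned by the vector search.
    resource_id: str
    page_id: str
    title: str
    source: str
    content: str


class AIResults(BaseModel):
    # The LLM answer ('text') plus the resources used to ground it.
    text: str
    ResourceCollection: list[Resource]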
Service Layer
The service layer is the foundation for our primary business logic. In this particular use case, it is also where the LangChain code lives, making it a central component of the application.
service/search.py
from data.mongodb import search as search
from langchain.docstore.document import Document
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from model.airesults import AIResults
from model.resource import Resource

template: str = """Use the following pieces of context to answer the question at the end.
If none of the pieces of context answer the question, just say you don't know.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
Answer:"""


def get_query(query: str) -> list[Resource]:
    resources, docs = search.similarity_search(query)
    return resources


def get_query_summary(query: str) -> AIResults:
    prompt_template = """Write a summary of the following:
"{text}"
CONCISE SUMMARY:"""
    prompt = PromptTemplate.from_template(prompt_template)
    resources, docs = search.similarity_search(query)
    if len(resources) == 0:
        return AIResults(text="No Documents Found", ResourceCollection=resources)
    llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
    llm_chain = LLMChain(llm=llm, prompt=prompt)
    stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")
    return AIResults(stuff_chain.run(docs), resources)


def get_qa_from_query(query: str) -> AIResults:
    resources, docs = search.similarity_search(query)
    if len(resources) == 0:
        return AIResults(text="No Documents Found", ResourceCollection=resources)
    custom_rag_prompt = PromptTemplate.from_template(template)
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    content = format_docs(docs)
    rag_chain = (
        {"context": lambda x: content, "question": RunnablePassthrough()}
        | custom_rag_prompt
        | llm
        | StrOutputParser()
    )
    return AIResults(text=rag_chain.invoke(query), ResourceCollection=resources)
The service search.py integrates database interactions, language models, and prompt templates to deliver summarization and question-answering based on user queries. It first defines a template string named template that serves as the blueprint for the RAG question-and-answer prompt, with placeholders for the context and the question. The search service comprises three functions:
The first function, get_query(query: str) -> list[Resource], accepts a query string, conducts a vector search in the database for resources similar to the query, and returns a list of matching resources.
The second function, get_query_summary(query: str) -> AIResults, generates a concise summary of the documents related to the given query. It first retrieves relevant documents via a vector search, then uses a StuffDocumentsChain to summarize them.
The final function, get_qa_from_query(query: str) -> AIResults, retrieves documents related to the query and crafts an answer based on them. It uses the global template along with a custom_rag_prompt to pass the documents returned from the vector search into the LLM.
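For reference, these service functions can also be exercised directly from a Python shell, without the web layer, which is handy when debugging chains. This sketch assumes the .env file is configured and the interpreter is started from the demo_api directory:

from service import search

# Runs the vector search plus the RAG chain and returns an AIResults object.
result = search.get_qa_from_query("what is a turbofan")
print(result.text)
for resource in result.ResourceCollection:
    print("-", resource.title, resource.source)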
Data Layer
Finally, we have reached the Data Layer. Although the GitHub repository includes code for both Weaviate and Cosmos DB for MongoDB (the code simply refers to it as MongoDB), our focus in this series is exclusively on connecting to Cosmos DB. In the Data Layer, I use the module as a singleton to handle database-related connections in the init.py file. While this approach is clean, I often contemplate the use of global variables. In addition to the init.py file, there is also the search.py file, which executes the vector search against Cosmos DB.
data/mongodb/init.py
from os import environ
from dotenv import load_dotenv
from pymongo import MongoClient
from pymongo.collection import Collection
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.azure_cosmos_db import AzureCosmosDBVectorSearch

load_dotenv(override=True)

collection: Collection | None = None
vector_store: AzureCosmosDBVectorSearch | None = None


def mongodb_init():
    MONGO_CONNECTION_STRING = environ.get("MONGO_CONNECTION_STRING")
    DB_NAME = "research"
    COLLECTION_NAME = "resources"
    INDEX_NAME = "vectorSearchIndex"

    global collection, vector_store
    client = MongoClient(MONGO_CONNECTION_STRING)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    vector_store = AzureCosmosDBVectorSearch.from_connection_string(
        MONGO_CONNECTION_STRING,
        DB_NAME + "." + COLLECTION_NAME,
        OpenAIEmbeddings(disallowed_special=()),
        index_name=INDEX_NAME,
    )


mongodb_init()
The init.py file first loads environment variables from a .env file using load_dotenv(override=True). It then declares the global variables collection (representing the Cosmos DB for MongoDB collection) and vector_store (representing the Cosmos DB vector search). The MongoClient is initialized with the MongoDB connection string, and the database and collection objects are retrieved. Finally, the connection string, the namespace (database.collection), and an OpenAIEmbeddings instance are passed into the AzureCosmosDBVectorSearch.from_connection_string() method to create the vector store.
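The "module as a singleton" approach works because Python caches imported modules in sys.modules, so mongodb_init() runs only once per process no matter how many modules import from it. A small illustrative sketch, assuming the demo_api package layout described above:

import sys
from data.mongodb import init as first_import
from data.mongodb import init as second_import

# Both names refer to the same cached module object, so the MongoClient and
# AzureCosmosDBVectorSearch created in mongodb_init() were constructed only once.
assert first_import is second_import
assert first_import.vector_store is second_import.vector_store
print("data.mongodb.init" in sys.modules)  # True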
data/mongodb/search.py
from model.resource import Resource
from .init import collection, vector_store
from langchain.docstore.document import Document
from typing import List, Optional, Union


def results_to_model(result: Document) -> Resource:
    return Resource(resource_id=result.metadata["resource_id"],
                    page_id=result.metadata["page_id"],
                    title=result.metadata["title"],
                    source=f"{result.metadata['chapter']} (page-{result.metadata['pagenumber']})",
                    content=result.page_content)


def similarity_search(query: str) -> tuple[list[Resource], list[Document]]:
    docs = vector_store.similarity_search_with_score(query, 4)

    # Cosine Similarity:
    # It measures the cosine of the angle between two vectors in an n-dimensional space.
    # The values of similarity metrics typically range between 0 and 1, with higher values
    # indicating greater similarity between the vectors.
    docs_filters = [doc for doc, score in docs if score >= .75]

    # List the scores for documents
    for doc, score in docs:
        print(score)

    # Print number of documents passing score threshold
    print(len(docs_filters))

    return [results_to_model(document) for document in docs_filters], docs_filters
The data retrieval code accesses the global variables (collection, vector_store) from our singleton module and currently contains two functions, one for data transformation and one for similarity search.
The first function, results_to_model(result: Document) -> Resource, accepts a LangChain Document object and returns a Resource object. It extracts metadata attributes from the Document, such as resource_id, page_id, title, chapter, and page number. This keeps the resources returned by the vector search usable outside of LangChain.
The second function, similarity_search(query: str) -> tuple[list[Resource], list[Document]], conducts a similarity search for the given query. It invokes the similarity_search_with_score method of vector_store to retrieve documents similar to the query, along with their similarity scores. It filters the retrieved documents using a threshold score of 0.75, transforms the filtered documents into Resource objects via results_to_model, and returns both the transformed Resource objects and the original Document objects.
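A quick way to see the score filtering in action is to call the data layer directly (again assuming the .env configuration from earlier); the scores printed inside the function show which documents clear the 0.75 threshold:

from data.mongodb import search

# Returns (list[Resource], list[Document]); similarity scores are printed inside the function.
resources, documents = search.similarity_search("what is a turbofan")
print(f"{len(resources)} resources passed the 0.75 similarity threshold")
for resource in resources:
    print("-", resource.source)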
Congratulations on completing the setup of the web framework! It took significant effort, but it broadens our foundation for the upcoming tasks in Part 3.
