Building an Intelligent Document Retrieval System with Pinecone and LlamaIndex
Author: Boakye I. Ababio (@ibamelch)
In this article, we'll walk through setting up a document retrieval system using Pinecone for vector storage and LlamaIndex for managing the ingestion and querying processes. This step-by-step guide will help you understand how to integrate these tools for efficient and intelligent document searches.
Using LlamaIndex with Pinecone
An accompanying notebook is available for this guide. Note that the notebook requires a paid OpenAI plan to avoid hitting rate limits.
Prerequisites
Before we start, ensure you have the following:
- Python 3.8 or higher installed (required by the pinned library versions below).
- Accounts on Pinecone and OpenAI.
- API keys for both Pinecone and OpenAI.
Step 1: Install the Required Libraries
First, we need to install the necessary Python libraries. Run the following command in a notebook cell (drop the leading ! if you are installing from a terminal):
!pip install -qU \
llama-index==0.9.34 \
"pinecone-client[grpc]"==3.0.0 \
arxiv==2.1.0
Step 2: Set Up API Keys
We need to set up the API keys for Pinecone and OpenAI. You can either set them as environment variables or prompt for input if they are not already set.
import os
from getpass import getpass
pinecone_api_key = os.getenv("PINECONE_API_KEY") or getpass("Enter your Pinecone API Key: ")
openai_api_key = os.getenv("OPENAI_API_KEY") or getpass("Enter your OpenAI API Key: ")
Step 3: Initialize the Ingestion Pipeline
Next, we'll set up the ingestion pipeline using LlamaIndex. This pipeline will handle document parsing and vectorization.
from llama_index.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings import OpenAIEmbedding
from llama_index.ingestion import IngestionPipeline
# This will be the model we use both for Node parsing and for vectorization
embed_model = OpenAIEmbedding(api_key=openai_api_key)
# Define the initial pipeline
pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,
            breakpoint_percentile_threshold=95,
            embed_model=embed_model,
        ),
        embed_model,
    ],
)
Explanation:
- SemanticSplitterNodeParser: splits documents into chunks at semantic breakpoints, i.e. where the embedding similarity between adjacent sentence groups falls below the configured percentile threshold, so each chunk stays topically coherent.
- OpenAIEmbedding: generates embeddings via OpenAI's API; it is used both to find the split points and to vectorize the resulting chunks.
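Before moving on, here is a minimal sketch of how the pipeline could be run. The sample documents are placeholders; in practice you would load your own files (for example with SimpleDirectoryReader). IngestionPipeline can also be given a vector_store (created in the next step) so embedded nodes are upserted into Pinecone directly.
from llama_index import Document
# Placeholder documents for illustration; load your own data in practice.
documents = [
    Document(text="Pinecone is a managed vector database."),
    Document(text="LlamaIndex handles ingestion and querying of documents."),
]
# The splitter chunks each document into nodes and the embedding model
# attaches a vector to every node.
nodes = pipeline.run(documents=documents)
print(f"Produced {len(nodes)} embedded nodes")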
Step 4: Initialize Pinecone
We now set up the Pinecone connection and index. The index is expected to exist already; a sketch for creating one follows the explanation below.
from pinecone.grpc import PineconeGRPC
from pinecone import ServerlessSpec
from llama_index.vector_stores import PineconeVectorStore
# Initialize connection to Pinecone
pc = PineconeGRPC(api_key=pinecone_api_key)
index_name = "anualreport"
# Initialize your index
pinecone_index = pc.Index(index_name)
# Initialize VectorStore
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
Explanation:
- PineconeGRPC: Initializes the gRPC connection to Pinecone.
- Index: Accesses the specified index in Pinecone.
- PineconeVectorStore: Interface for storing and retrieving vectors from Pinecone.
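If the index does not exist yet, it can be created first. The following is a minimal sketch: dimension=1536 matches OpenAI's text-embedding-ada-002 (the default model behind OpenAIEmbedding), and the cloud/region values are placeholders you should adjust for your own project.
# Create the serverless index once, if it is not already there.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # embedding size of text-embedding-ada-002
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),  # placeholder values
    )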
Step 5: Create a VectorStoreIndex
We now create an index from the vector store and set up a retriever to fetch relevant documents.
from llama_index import VectorStoreIndex
from llama_index.retrievers import VectorIndexRetriever
# Due to how LlamaIndex works here, if your OpenAI API key was
# not set as an environment variable before, you have to set it at this point
if not os.getenv('OPENAI_API_KEY'):
os.environ['OPENAI_API_KEY'] = openai_api_key
# Instantiate VectorStoreIndex object from our vector_store object
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
# Grab 5 search results
retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=5)
Explanation:
- VectorStoreIndex: Creates an index from the vector store.
- VectorIndexRetriever: Retrieves the top K similar documents from the index.
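As a side note, the same retriever can be obtained through the index's as_retriever helper; which form you use is a matter of taste.
# Equivalent shortcut: similarity_top_k controls how many nearest
# neighbours are returned per query.
retriever = vector_index.as_retriever(similarity_top_k=5)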
Step 6: Query the Vector Database
Finally, we query the vector database to retrieve the top results for our query.
# Query vector DB
answer = retriever.retrieve('Summary of the Annual Report?')
# Inspect results
print([i.get_content() for i in answer])
Explanation:
- retrieve: Queries the vector store for the most similar documents to the given query.
- get_content: Extracts the content of the retrieved documents.
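To see how relevant each hit is, you can also inspect the similarity score and metadata attached to every result; a small sketch:
# Each result is a NodeWithScore: chunk text plus similarity score and metadata.
for node_with_score in answer:
    print(node_with_score.score, node_with_score.node.metadata)
    print(node_with_score.get_content()[:200])  # preview the first 200 characters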
Step 7: Integrate with LlamaIndex Query Engine
To get a more refined answer using LlamaIndex's query engine, we integrate the retriever with the query engine.
from llama_index.query_engine import RetrieverQueryEngine
# Pass in your retriever from above, which is configured to return the top 5 results
query_engine = RetrieverQueryEngine(retriever=retriever)
# Now you query:
llm_query = query_engine.query('Summary of the Annual Report?')
llm_query.response
Explanation:
- RetrieverQueryEngine: Uses the retriever to fetch results and refine them into a coherent response.
- query: Executes the query and retrieves a formatted response.
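To check the answer and see which chunks it was grounded on, you can print the response together with its source nodes; a short sketch:
# Print the synthesized answer and the retrieved chunks it is based on.
print(llm_query.response)
for source in llm_query.source_nodes:
    print(source.score, source.get_content()[:150])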
Conclusion
By following these steps, you can set up an intelligent document retrieval system using Pinecone and LlamaIndex. This system can efficiently handle large volumes of documents and provide accurate and relevant results for your queries.