
Build a 🧠 Reflective Agentic RAG Workflow using LangGraph, Typesense, Tavily, Ollama and Cohere

40 min read · Sep 21, 2025

Why Traditional RAG Is No Longer Enough

In the rapidly evolving landscape of artificial intelligence, Retrieval-Augmented Generation (RAG) has emerged as a crucial technology for enhancing the factual accuracy and relevance of large language models. Traditional RAG systems retrieve relevant information from external knowledge sources and then use this context to generate more informed responses. However, these systems often suffer from significant limitations: they retrieve documents indiscriminately without evaluating their true relevance, generate responses without verifying their accuracy against sources, and lack mechanisms for self-correction when information is insufficient or misleading.

Enter Self-Reflective RAG (SELF-RAG), a groundbreaking approach that addresses these limitations by incorporating critical self-assessment mechanisms throughout the retrieval and generation process. This paradigm shift represents the next evolution in AI systems that can not only access information but also evaluate its quality, reflect on their own outputs, and continuously improve their responses through iterative refinement.

What Makes Self-RAG Different?

The Power of Self-Reflection

Self-RAG introduces a framework in which the language model learns to retrieve passages adaptively, on demand, and generates reflection tokens to critique both the retrieved documents and its own outputs. These reflection tokens serve as quality-control markers, indicating whether retrieval is needed and evaluating the relevance, support, and completeness of generated responses.

Unlike conventional RAG approaches that retrieve passages indiscriminately, Self-RAG trains a single, arbitrary language model to make intelligent decisions about when to retrieve information, how to process it, and when to generate responses based on its critical assessment of the available knowledge.

Key Components of Self-RAG Systems

  1. Adaptive Retrieval: The system dynamically decides whether retrieval is necessary based on the query’s complexity and its existing knowledge, preventing unnecessary searches for straightforward questions.
  2. Relevance Assessment: Retrieved documents are critically evaluated for their relevance to the query before being used in generation.
  3. Self-Critique: The system examines its own responses for hallucinations, factual accuracy, and completeness relative to the retrieved evidence.
  4. Iterative Refinement: When responses are deemed insufficient or unsupported, the system can initiate additional web searches or retrieval steps to gather more information.

Self-RAG is typically implemented with a framework like LangGraph, which enables stateful, cyclic computational workflows that manage these multi-step processes and adjust dynamically based on real-time assessments.
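As a minimal sketch of what that looks like in LangGraph (node bodies are stubs here; the full implementation appears later in this post), the retrieve → generate → reflect loop reduces to a small state graph with one conditional edge:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    answer: str

def retrieve(state: State) -> dict:
    return {}  # stub: fetch context for state["question"]

def generate(state: State) -> dict:
    # stub: an LLM call that admits uncertainty when the context is thin
    return {"answer": "I cannot answer based on the available context."}

def reflect(state: State) -> dict:
    return {"answer": "Based on Web Search performed ..."}  # stub: web-search fallback

def needs_reflection(state: State) -> str:
    # Route to the reflection node only when the draft answer signals uncertainty
    return "reflect" if "cannot answer" in state["answer"].lower() else END

builder = StateGraph(State)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_node("reflect", reflect)
builder.set_entry_point("retrieve")
builder.add_edge("retrieve", "generate")
builder.add_conditional_edges("generate", needs_reflection, {"reflect": "reflect", END: END})
graph = builder.compile()
print(graph.invoke({"question": "What is Self-RAG?", "answer": ""}))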

LangExtract: Precision Information Extraction

While Self-RAG improves the overall quality of retrieval and generation, tools like LangExtract address another critical challenge: precisely extracting structured information from unstructured text with exact source grounding.

What is LangExtract?

LangExtract is a Gemini-powered Python library developed by Google for programmatically extracting structured, grounded information from unstructured text. It serves as an intelligent layer on top of LLMs, providing the scaffolding needed to turn their language-understanding capabilities into reliable information-extraction systems.

Key Features and Advantages

  1. Exact Source Grounding: LangExtract’s standout feature is its ability to map every extracted entity back to its exact character offsets in the source text. This provides strong traceability, letting users verify where each piece of information originated, which is crucial for production systems where accuracy is paramount.
  2. Smart Chunking Strategies: The library handles the “needle-in-a-haystack” problem in long documents by respecting sentence boundaries, paragraph breaks, and natural text flow. The resulting chunks actually make sense to the LLM, significantly improving extraction quality.
  3. Parallel Processing Capabilities: LangExtract can process multiple chunks simultaneously using the max_workers parameter (up to 10 chunks in parallel), dramatically reducing latency without compromising quality (see the sketch after this list).
  4. Multi-Pass Extraction: The library can run multiple extraction passes independently, leveraging the LLM’s stochastic nature to catch entities that might be missed in a single run. This “second opinion” approach is particularly valuable for critical applications where accuracy matters more than cost.
  5. Interactive Visualization: Instead of staring at raw JSON output, users get an interactive view that shows extracted entities in their original context, functioning like a highlighter that displays exactly what was extracted and where.
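As a rough sketch of how features 3 and 4 are switched on (parameter names follow the LangExtract README; the model id and values are placeholders, and `examples` and `long_document_text` are assumed to be defined as shown later in this post):

import langextract as lx

# Hedged sketch: extraction_passes and max_workers control multi-pass and parallel extraction
result = lx.extract(
    text_or_documents=long_document_text,   # assumed defined elsewhere
    prompt_description="Extract document metadata such as title, author, date, and intent.",
    examples=examples,                       # few-shot ExampleData list, as shown later
    model_id="gemini-2.5-flash",             # placeholder model
    extraction_passes=3,   # independent passes; merged results catch entities a single run misses
    max_workers=10,        # process up to 10 chunks in parallel
)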

Limitations and Considerations

Despite its powerful capabilities, LangExtract has some limitations:

  1. Domain-Specific Challenges: The library excels at identifying standard entities like product names and established business terminology, but may struggle with industry-specific acronyms or specialized jargon.
  2. Context-Dependent Performance: Extraction quality varies with context: terms mentioned in clear, structured sentences are captured cleanly, while those buried in conversational tangents or transcription errors may produce “match_fuzzy” results instead of “match_exact”.
  3. Library Maturity: As a relatively new library, LangExtract’s long-term maintenance and support trajectory should be weighed for production systems requiring stability.
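Here is what metadata extraction with LangExtract looks like in practice, running against a local Ollama model (llama3.2) rather than Gemini: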

# Define the extraction schema for metadata
extraction_class = "document_metadata"  # Custom class for your metadata
attributes = ["title", "author", "date", "intent"]  # Attributes you want to extract

# Provide few-shot examples for accurate extraction. Adjust these based on your document type.
examples = [
    lx.data.ExampleData(
        text="The annual report was authored by Jane Smith in January 2023.",
        extractions=[  # Use a list of Extraction objects
            lx.data.Extraction(
                extraction_class=extraction_class,
                extraction_text="The annual report was authored by Jane Smith in January 2023.",
                attributes={"title": "annual report", "author": "Jane Smith", "date": "January 2023", "intent": "The annual report representing the company annual turnover was presented by Jane Smith"}
            )
        ]
    ),
    lx.data.ExampleData(
        text="Project timeline document by John Doe on 2022-12-15.",
        extractions=[  # Use a list of Extraction objects
            lx.data.Extraction(
                extraction_class=extraction_class,
                extraction_text="Project timeline document by John Doe on 2022-12-15.",
                attributes={"title": "Project timeline document", "author": "John Doe", "date": "2022-12-15", "intent": "This document explains the project timelines in detail"}
            )
        ]
    ),
    # Add more examples to improve accuracy
]

import textwrap

# 1. Define a concise prompt
prompt = textwrap.dedent("""\
    Extract document metadata such as title, author, date, and intent.
    Use exact text for extractions. Do not paraphrase or overlap entities.
    Provide meaningful attributes for each entity to add context.""")

# Extract metadata from all documents in a batch
doc_texts = [doc.page_content for doc in documents]
# print(doc_texts[10])  # Inspect a sample document
results = []

for input_text in doc_texts:
    try:
        result = lx.extract(
            text_or_documents=input_text,
            prompt_description=prompt,
            language_model_type=lx.inference.OllamaLanguageModel,
            examples=examples,
            model_id="llama3.2",
            temperature=0.3,
            model_url="http://localhost:11434")
        print("Extraction successful!")
        print(result)
        results.append(result.extractions[0].attributes)

    except Exception as e:
        print(f"Extraction failed: {e}")

Typesense: High-Performance Search Infrastructure

For any RAG system to function effectively, it requires a robust search engine capable of quickly retrieving relevant information. Typesense has emerged as a popular open-source solution in this space.

Typesense Strengths

  1. Blazing-Fast Performance: Typesense is optimized for speed, typically returning search results in under 50 ms, making it ideal for real-time applications.
  2. Typo Tolerance: The search engine offers built-in typo tolerance out of the box, significantly improving user experience by absorbing common spelling errors.
  3. Ease of Use: Typesense prioritizes developer experience with a simple API, clear semantics, and smart defaults that work well without extensive configuration.
  4. Flexible Deployment: It supports both self-hosting and a cloud-hosted offering, giving flexibility for different infrastructure preferences.
  5. Query-Time Flexibility: Unlike some alternatives, Typesense allows most settings (fields to search, facets, ranking) to be configured at query time rather than requiring predefined index configurations, enabling greater adaptability to different use cases.

Typesense Limitations

  1. Real-Time Indexing Challenges: Some users report that Typesense struggles with real-time indexing under frequent content updates, which can disrupt workflows that require instant retrieval of newly added content.
  2. Analytics Capabilities: The platform lacks extensive built-in analytics compared to more established alternatives, limiting insight into search performance and user behavior.
  3. Ecosystem Maturity: As a newer project than giants like Elasticsearch, Typesense has a smaller ecosystem and fewer third-party integrations, which may require more custom development for specific use cases.
  4. Complex Scenario Configuration: While excellent for standard search requirements, Typesense may need more manual configuration for complex search scenarios than more established alternatives.
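In the LangChain integration, indexing embedded document chunks into Typesense is a single call (the host and API key below are placeholders):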
docsearch = Typesense.from_documents(
    chunks,
    embeddings,
    typesense_client_params={
        "host": "XXXXXXXXXX.a1.typesense.net",  # Use xxx.a1.typesense.net for Typesense Cloud
        "port": "443",         # Use 443 for Typesense Cloud
        "protocol": "https",   # Use https for Typesense Cloud
        "typesense_api_key": "Typesense_API_KEY",
        "typesense_collection_name": "mcp",
    },
)
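To illustrate the query-time flexibility mentioned above, here is a hedged sketch using the raw typesense-python client: the fields to search and the typo budget are chosen per query, not baked into the collection. The field name "text" is an assumption based on what the LangChain integration stores for page content; adjust it to your schema.

import typesense

client = typesense.Client({
    "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
    "api_key": "xyz",  # placeholder
    "connection_timeout_seconds": 60,
})

# Search settings are supplied at query time rather than fixed at index time
results = client.collections["mcp"].documents.search({
    "q": "agentic rag",
    "query_by": "text",  # which field(s) to search, decided per query
    "num_typos": 2,      # built-in typo tolerance
    "per_page": 5,
})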

Competitive Landscape


Note: The above comparison is based on my own analysis and is open to scrutiny.

Technology Stack Details


Technology Stack Mind Map


🏗️ Architectural Overview

The RAG application is built with a layered architecture that separates concerns and allows for modular development. Each layer has specific responsibilities and interacts with adjacent layers through well-defined interfaces.

💻 User Interface Layer

  • Streamlit App: Main web application framework
  • Configuration Panel: Manages API keys and application settings
  • File Uploader: Handles PDF document input from users
  • Query Interface: Provides the Q&A interaction interface

📄 Document Processing Layer

  • DocumentProcessor: Main orchestrator for document processing
  • PDF Loader: Extracts text content from uploaded PDFs
  • Text Splitter: Divides documents into manageable chunks
  • Metadata Extractor: Identifies and extracts document metadata

🧠 AI/ML Layer

  • Ollama Embeddings: Generates vector representations of text
  • ChatGroq LLM: Large language model for response generation
  • Enhanced Retrieval: Multi-stage document retrieval system
  • Agentic RAG: Intelligent workflow for Q&A with reflection

💾 Storage Layer

  • Typesense: High-performance vector database for document storage
  • Collections: Organized storage units for different document types
  • Full-text Search: Text indexing and querying
  • Schema Management: Data structure definition

🔌External Services

  • Ollama: Local LLM server for AI inference
  • Tavily Search: Web search API for external knowledge
  • Cohere Rerank: API for improving retrieval quality

🔄Data Flow

  • Documents flow from UI through processing to storage
  • Queries trigger retrieval and generation processes
  • External services augment core functionality
  • All components are loosely coupled for flexibility

Data Flow Explanation

Document Ingestion

Users upload PDF documents and provide API keys for external services. The system processes these documents through text extraction and metadata enhancement.
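Condensed, the ingestion step is a per-file loader call (the path is a placeholder; the full app writes uploads to a temporary file first):

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("uploaded_report.pdf")  # placeholder path
documents = loader.load()                    # one Document per PDF page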

Processing & Storage

Documents are split into chunks, enhanced with metadata, and converted to vector embeddings. These embeddings are then stored in Typesense for efficient retrieval.
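In code, this stage uses the same splitter and embedding model as the full implementation below:

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama.embeddings import OllamaEmbeddings

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)              # `documents` from the ingestion step
embeddings = OllamaEmbeddings(model="mxbai-embed-large")  # vectors are then stored in Typesense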

Query Processing

When a user submits a query, the system uses enhanced retrieval methods to find relevant document chunks from the vector database.

Answer Generation

The Agentic RAG system generates answers using the LLM. If the system detects uncertainty, it performs web searches to enhance the answer quality.
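Once the graph is compiled (assuming `agentic_rag_graph` is the compiled LangGraph workflow from the implementation below), answering a question is a single invoke call on an initial state:

initial_state = {"question": "What is Agentic RAG?", "context": [],
                 "generation": "", "reflection": "", "messages": []}
result = agentic_rag_graph.invoke(initial_state)
print(result["messages"][-1])  # final answer; a web-search response if reflection was triggered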

🎯 Key Features Highlighted

Agentic Workflow

  • Self-Reflection: Automatically determines when additional web search is needed
  • Multi-Step Processing: Retrieve → Generate → Reflect pipeline
  • Error Recovery: Handles authentication and API failures gracefully
  • Adaptive Learning: Improves responses based on feedback and context

Advanced Retrieval

  • Hybrid Search: Combines vector similarity and keyword-based BM25 (see the sketch after this list)
  • Ensemble Retrieval: Weighted fusion of multiple retrieval methods
  • Reranking: Cohere or Flashrank for improved relevance
  • Contextual Understanding: Semantic analysis of queries and documents
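A condensed sketch of this retrieval stack, with the same weights and models used in the full implementation below (`docsearch` and `chunks` are assumed to exist, and the Cohere key is a placeholder):

from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_cohere import CohereRerank

vector_retriever = docsearch.as_retriever(search_kwargs={"k": 10})  # Typesense vector search
bm25_retriever = BM25Retriever.from_documents(chunks)               # keyword search
bm25_retriever.k = 10

ensemble = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    weights=[0.6, 0.4],  # weighted fusion of vector and BM25 rankings
)
reranker = CohereRerank(model="rerank-english-v3.0", cohere_api_key="YOUR_COHERE_KEY")
retriever = ContextualCompressionRetriever(base_compressor=reranker, base_retriever=ensemble)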

Scalable Architecture

  • Modular Design: Separated concerns with clear component boundaries
  • Configurable: Flexible API key management and service selection
  • Local + Cloud: Hybrid approach with local Ollama and cloud services
  • Extensible Framework: Easy to add new data sources and AI models

Robust Error Handling

  • API Validation: Comprehensive API key checking
  • Graceful Degradation: Falls back to available services (sketched after this list)
  • User Feedback: Clear error messages and status updates
  • Fault Tolerance: Continues operation despite partial failures
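The fallback pattern is the simplest part: try the preferred service and swap in a local alternative on failure, as the full implementation does for reranking (`cohere_api_key` is assumed to hold the user-supplied key):

from langchain_cohere import CohereRerank
from langchain_community.document_compressors import FlashrankRerank

try:
    reranker = CohereRerank(model="rerank-english-v3.0", cohere_api_key=cohere_api_key)
except Exception:
    reranker = FlashrankRerank()  # local reranker; no API key required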

Code Implementation


Install required dependencies

uv pip install streamlit pandas langchain langchain-community langchain-groq \
  langchain-ollama langchain-cohere langchain-tavily langgraph typesense \
  pypdf python-dotenv langextract flashrank
import streamlit as st
import pandas as pd
import time
import os
import tempfile
from typing import List, Dict, Any, TypedDict, Annotated
import hashlib
from datetime import datetime
import json
import uuid
from operator import add
import re

# Import core libraries that are always needed
from pydantic import BaseModel, Field

# Try to import langchain core components needed for type definitions
try:
    from langchain_core.messages import BaseMessage
except ImportError:
    # Create a dummy BaseMessage if langchain is not available
    BaseMessage = object

# Import necessary libraries (you'll need to install these)
try:
    from langchain_community.document_loaders import PyPDFLoader, TextLoader, CSVLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_ollama.embeddings import OllamaEmbeddings
    from langchain_groq import ChatGroq
    from langchain_community.vectorstores import Typesense
    from langchain_core.documents import Document
    from langchain.chains import RetrievalQA
    from langchain.retrievers import EnsembleRetriever
    from langchain_community.retrievers import BM25Retriever
    from langchain.retrievers import ContextualCompressionRetriever
    from langchain.retrievers.document_compressors import CrossEncoderReranker
    from langchain_community.document_compressors import FlashrankRerank
    from langchain_cohere import CohereRerank
    from langchain_tavily import TavilySearch
    from langgraph.graph import StateGraph, END
    from langchain_core.messages import HumanMessage
    from langchain_core.output_parsers import JsonOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    import typesense
    import langextract as lx
    import textwrap
except ImportError as e:
    st.error(f"Missing required libraries. Please install: {e}")

# Page configuration
st.set_page_config(
    page_title="Document Intelligence Hub",
    page_icon="📚",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Custom CSS for professional styling
st.markdown("""
<style>
.main-header {
    font-size: 2.5rem;
    font-weight: 700;
    color: #1f77b4;
    text-align: center;
    margin-bottom: 2rem;
    background: linear-gradient(90deg, #1f77b4, #ff7f0e);
    -webkit-background-clip: text;
    -webkit-text-fill-color: transparent;
}

.section-header {
    font-size: 1.5rem;
    font-weight: 600;
    color: #2c3e50;
    margin-top: 2rem;
    margin-bottom: 1rem;
    padding-bottom: 0.5rem;
    border-bottom: 2px solid #3498db;
}

.status-box {
    padding: 1rem;
    border-radius: 10px;
    margin: 1rem 0;
    border-left: 5px solid #3498db;
    background-color: #f8f9fa;
}

.success-box {
    background-color: #d4edda;
    border-color: #28a745;
    color: #155724;
}

.warning-box {
    background-color: #fff3cd;
    border-color: #ffc107;
    color: #856404;
}

.error-box {
    background-color: #f8d7da;
    border-color: #dc3545;
    color: #721c24;
}

.metric-card {
    background: white;
    padding: 1rem;
    border-radius: 10px;
    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
    border-left: 4px solid #3498db;
}

.progress-container {
    background: #f1f3f4;
    border-radius: 10px;
    padding: 1rem;
    margin: 1rem 0;
}
</style>
""", unsafe_allow_html=True)

# Define Pydantic models for structured output
class crFormat(BaseModel):
    Web_Response: str = Field(..., description="response generated for the question asked based on the information provided.")

# Define state schema for agentic RAG
class AgentState(TypedDict):
    question: str
    context: List[str]
    generation: str
    reflection: str
    messages: Annotated[List[BaseMessage], add]

class DocumentProcessor:
    def __init__(self):
        self.typesense_client = None
        self.vectorstore = None
        self.qa_chain = None
        self.documents = []
        self.chunks = []
        self.collection_name = "documents"
        self.enhanced_docs = []
        self.compression_retriever = None
        self.agentic_rag_graph = None
        self.current_file_collection = None

    def initialize_typesense_client(self, typesense_host: str, typesense_port: str, typesense_api_key: str):
        """Initialize Typesense client"""
        try:
            self.typesense_client = typesense.Client({
                'nodes': [{
                    'host': typesense_host,
                    'port': typesense_port,
                    'protocol': 'http'
                }],
                'api_key': typesense_api_key,
                'connection_timeout_seconds': 60
            })

            # Test connection
            self.typesense_client.collections.retrieve()
            return True
        except Exception as e:
            st.error(f"Error connecting to Typesense: {str(e)}")
            return False

    def initialize_typesense_client_with_protocol(self, typesense_host: str, typesense_port: str, typesense_api_key: str, protocol: str):
        """Initialize Typesense client with custom protocol"""
        try:
            self.typesense_client = typesense.Client({
                'nodes': [{
                    'host': typesense_host,
                    'port': typesense_port,
                    'protocol': protocol
                }],
                'api_key': typesense_api_key,
                'connection_timeout_seconds': 60
            })

            # Test connection
            self.typesense_client.collections.retrieve()
            return True
        except Exception as e:
            st.error(f"Error connecting to Typesense: {str(e)}")
            return False

    def check_collection_exists(self, collection_name: str):
        """Check if a Typesense collection exists - using reference app logic"""
        try:
            collections = self.typesense_client.collections.retrieve()
            collection_names = [col['name'] for col in collections]
            print(f"Collection_Name:{collection_name}")
            return collection_name in collection_names
        except Exception as e:
            error_msg = str(e)
            if "401" in error_msg or "invalid api token" in error_msg.lower():
                st.error("🔑 **Invalid Typesense API Key!** Please check your API key in the sidebar.")
            elif "404" in error_msg or "not found" in error_msg.lower():
                st.warning("📡 **Typesense server not found.** Please check your host and port settings.")
            else:
                st.error(f"❌ **Typesense connection error:** {error_msg}")
            print(f"Error checking collections: {error_msg}")
            return False

    def generate_collection_name_from_file(self, filename: str):
        """Generate a valid Typesense collection name from filename"""
        import re
        # Remove file extension
        name = os.path.splitext(filename)[0]
        # Replace spaces and special characters with underscores
        name = re.sub(r'[^a-zA-Z0-9_]', '_', name)
        # Ensure it starts with a letter
        if name and name[0].isdigit():
            name = f"doc_{name}"
        # Limit length to 64 characters (Typesense limit)
        name = name[:64]
        # Ensure it's not empty
        if not name:
            name = "document"
        return name.lower()

    def get_collection_name_for_files(self, uploaded_files):
        """Generate collection name based on uploaded files"""
        if len(uploaded_files) == 1:
            # Single file - use file name
            return self.generate_collection_name_from_file(uploaded_files[0].name)
        else:
            # Multiple files - create a combined name or use a generic one
            file_names = [self.generate_collection_name_from_file(f.name) for f in uploaded_files[:3]]  # Use first 3 files
            combined_name = "_".join(file_names)
            if len(combined_name) > 60:  # Leave room for potential suffix
                combined_name = combined_name[:60]
            return combined_name or "multi_documents"

    def create_vectorstore_from_documents(self, chunks, embeddings, typesense_host, typesense_port, typesense_api_key, typesense_protocol):
        """Create vectorstore using from_documents - reference app approach"""
        try:
            collection_exists = self.check_collection_exists(self.collection_name)

            if not collection_exists:
                st.info(f"Creating new Typesense collection '{self.collection_name}'")
                docsearch = Typesense.from_documents(
                    chunks,
                    embeddings,
                    typesense_client_params={
                        "host": typesense_host,
                        "port": typesense_port,
                        "protocol": typesense_protocol,
                        "typesense_api_key": typesense_api_key,
                        "typesense_collection_name": self.collection_name,
                    },
                )
            else:
                st.success(f"Using existing Typesense collection '{self.collection_name}'")
                docsearch = Typesense(
                    embedding=embeddings,
                    typesense_client=self.typesense_client,
                    typesense_collection_name=self.collection_name
                )

            return docsearch

        except Exception as e:
            st.error(f"Error creating vectorstore: {str(e)}")
            return None

    def load_document(self, file_path: str, file_type: str):
        """Load document based on file type"""
        try:
            if file_type == "pdf":
                loader = PyPDFLoader(file_path)
            elif file_type == "txt":
                loader = TextLoader(file_path)
            elif file_type == "csv":
                loader = CSVLoader(file_path)
            else:
                raise ValueError(f"Unsupported file type: {file_type}")

            documents = loader.load()
            return documents
        except Exception as e:
            st.error(f"Error loading document: {str(e)}")
            return None

    def chunk_documents(self, documents: List, chunk_size: int = 1000, chunk_overlap: int = 200):
        """Split documents into chunks"""
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            length_function=len,
        )
        chunks = text_splitter.split_documents(documents)
        return chunks

    def extract_metadata(self, documents: List, progress_callback=None):
        """Extract metadata from documents using LangExtract"""
        try:
            # Define the extraction schema for metadata
            extraction_class = "document_metadata"
            attributes = ["title", "author", "date", "intent"]

            # Provide few-shot examples for accurate extraction
            examples = [
                lx.data.ExampleData(
                    text="The annual report was authored by Jane Smith in January 2023.",
                    extractions=[
                        lx.data.Extraction(
                            extraction_class=extraction_class,
                            extraction_text="The annual report was authored by Jane Smith in January 2023.",
                            attributes={"title": "annual report", "author": "Jane Smith", "date": "January 2023", "intent": "The annual report representing the company annual turnover was presented by Jane Smith"}
                        )
                    ]
                ),
                lx.data.ExampleData(
                    text="Project timeline document by John Doe on 2022-12-15.",
                    extractions=[
                        lx.data.Extraction(
                            extraction_class=extraction_class,
                            extraction_text="Project timeline document by John Doe on 2022-12-15.",
                            attributes={"title": "Project timeline document", "author": "John Doe", "date": "2022-12-15", "intent": "This document explains the project timelines in detail"}
                        )
                    ]
                ),
            ]

            # Define a concise prompt
            prompt = textwrap.dedent("""\
                Extract document metadata such as title, author, date, and intent.
                Use exact text for extractions. Do not paraphrase or overlap entities.
                Provide meaningful attributes for each entity to add context.""")

            # Extract metadata from all documents
            doc_texts = [doc.page_content for doc in documents]
            results = []
            total_docs = len(doc_texts)

            for i, input_text in enumerate(doc_texts):
                try:
                    # Run the extraction
                    result = lx.extract(
                        text_or_documents=input_text,
                        prompt_description=prompt,
                        language_model_type=lx.inference.OllamaLanguageModel,
                        examples=examples,
                        model_id="llama3.2",
                        temperature=0.3,
                        model_url="http://localhost:11434"
                    )

                    if result.extractions:
                        metadata = result.extractions[0].attributes
                        results.append(metadata)
                    else:
                        results.append(None)

                    if progress_callback:
                        progress = (i + 1) / total_docs
                        progress_callback(progress)

                except Exception as e:
                    st.warning(f"Metadata extraction failed for document {i+1}: {str(e)}")
                    results.append(None)

                    if progress_callback:
                        progress = (i + 1) / total_docs
                        progress_callback(progress)

            # Enhance documents with extracted metadata
            enhanced_docs = []
            for i, doc in enumerate(documents):
                if i < len(results) and results[i] is not None:
                    metadata = results[i]  # This is a dictionary of extracted attributes
                    doc.metadata.update(metadata)
                enhanced_docs.append(doc)

            return results

        except Exception as e:
            st.error(f"Error in metadata extraction: {str(e)}")
            return []


    def create_typesense_vectorstore(self, chunks: List, typesense_host: str, typesense_port: str, typesense_api_key: str, protocol: str = "https"):
        """Create Typesense vectorstore - reference app approach"""
        try:
            embeddings = OllamaEmbeddings(model="mxbai-embed-large")
            return self.create_vectorstore_from_documents(chunks, embeddings, typesense_host, typesense_port, typesense_api_key, protocol)
        except Exception as e:
            st.error(f"Error creating Typesense vectorstore: {str(e)}")
            return None

    def initialize_docsearch_from_existing(self, collection_name: str = "RAG", typesense_host: str = None, typesense_port: str = None, typesense_api_key: str = None, protocol: str = "https"):
        """Initialize DocSearch from existing collection"""
        try:
            embeddings = OllamaEmbeddings(model="mxbai-embed-large")

            # Create Typesense client
            typesense_client = typesense.Client({
                'nodes': [{
                    'host': typesense_host,
                    'port': typesense_port,
                    'protocol': protocol
                }],
                'api_key': typesense_api_key,
                'connection_timeout_seconds': 60
            })

            # Initialize DocSearch from existing collection
            docsearch = Typesense(
                embedding=embeddings,
                typesense_client=typesense_client,
                typesense_collection_name=collection_name,
            )

            return docsearch
        except Exception as e:
            st.error(f"Error initializing DocSearch from existing collection: {str(e)}")
            return None

    def extract_metadata_with_langextract(self, documents: List, progress_callback=None):
        """Extract metadata from documents using LangExtract"""
        try:
            # Define the extraction schema for metadata
            extraction_class = "document_metadata"
            attributes = ["title", "author", "date", "intent"]

            # Provide few-shot examples for accurate extraction
            examples = [
                lx.data.ExampleData(
                    text="The annual report was authored by Jane Smith in January 2023.",
                    extractions=[
                        lx.data.Extraction(
                            extraction_class=extraction_class,
                            extraction_text="The annual report was authored by Jane Smith in January 2023.",
                            attributes={
                                "title": "annual report",
                                "author": "Jane Smith",
                                "date": "January 2023",
                                "intent": "The annual report representing the company annual turnover was presented by Jane Smith"
                            }
                        )
                    ]
                ),
                lx.data.ExampleData(
                    text="Project timeline document by John Doe on 2022-12-15.",
                    extractions=[
                        lx.data.Extraction(
                            extraction_class=extraction_class,
                            extraction_text="Project timeline document by John Doe on 2022-12-15.",
                            attributes={
                                "title": "Project timeline document",
                                "author": "John Doe",
                                "date": "2022-12-15",
                                "intent": "This document explains the project timelines in detail"
                            }
                        )
                    ]
                )
            ]

            # Define a concise prompt
            prompt = textwrap.dedent("""\
                Extract document metadata such as title, author, date, and intent.
                Use exact text for extractions. Do not paraphrase or overlap entities.
                Provide meaningful attributes for each entity to add context.""")

            # Extract metadata from all documents in a batch
            doc_texts = [doc.page_content for doc in documents]
            results = []

            total_docs = len(doc_texts)
            for i, input_text in enumerate(doc_texts):
                if progress_callback:
                    progress_callback(i / total_docs)

                try:
                    # Run the extraction
                    result = lx.extract(
                        text_or_documents=input_text,
                        prompt_description=prompt,
                        language_model_type=lx.inference.OllamaLanguageModel,
                        examples=examples,
                        model_id="llama3.2",
                        temperature=0.3,
                        model_url="http://localhost:11434"
                    )
                    if result.extractions:
                        results.append(result.extractions[0].attributes)
                    else:
                        results.append(None)

                except Exception as e:
                    st.warning(f"Metadata extraction failed for document {i+1}: {str(e)}")
                    results.append(None)

            # Enhance documents with extracted metadata
            enhanced_docs = []
            for i, doc in enumerate(documents):
                if i < len(results) and results[i] is not None:
                    metadata = results[i]  # This is a dictionary of extracted attributes
                    doc.metadata.update(metadata)
                enhanced_docs.append(doc)

            self.enhanced_docs = enhanced_docs
            return enhanced_docs

        except Exception as e:
            st.error(f"Error in metadata extraction: {str(e)}")
            return documents  # Return original documents if extraction fails

    def setup_enhanced_retrieval(self, chunks: List, docsearch, cohere_api_key: str = None, use_bm25: bool = False):
        """Setup enhanced retrieval with optional BM25, ensemble, and reranking"""
        try:
            # Initialize vector store retriever
            vector_retriever = docsearch.as_retriever(search_kwargs={"k": 10})

            # Setup ensemble retriever (with or without BM25)
            if use_bm25:
                try:
                    # Initialize BM25 retriever for keyword search
                    bm25_retriever = BM25Retriever.from_documents(chunks)
                    bm25_retriever.k = 10

                    # Combine retrievers using EnsembleRetriever
                    ensemble_retriever = EnsembleRetriever(
                        retrievers=[vector_retriever, bm25_retriever],
                        weights=[0.6, 0.4]  # Weighted fusion
                    )
                    st.info("✅ Using Vector + BM25 ensemble retrieval")
                except Exception as e:
                    st.warning(f"BM25 setup failed, using vector-only retrieval: {str(e)}")
                    ensemble_retriever = EnsembleRetriever(
                        retrievers=[vector_retriever],
                        weights=[1.0]
                    )
            else:
                # Vector-only ensemble
                ensemble_retriever = EnsembleRetriever(
                    retrievers=[vector_retriever],
                    weights=[1.0]
                )
                st.info("✅ Using vector-only retrieval")

            # Add reranker for contextual compression
            if cohere_api_key:
                try:
                    reranker = CohereRerank(model="rerank-english-v3.0", cohere_api_key=cohere_api_key)
                    st.info("✅ Using Cohere reranker")
                except Exception as e:
                    st.warning(f"Cohere reranker failed: {str(e)}")
                    reranker = FlashrankRerank()
                    st.info("✅ Using Flashrank reranker (Cohere failed)")
            else:
                reranker = FlashrankRerank()
                st.info("✅ Using Flashrank reranker")

            compression_retriever = ContextualCompressionRetriever(
                base_compressor=reranker,
                base_retriever=ensemble_retriever
            )

            self.compression_retriever = compression_retriever
            return compression_retriever

        except Exception as e:
            st.error(f"Error setting up enhanced retrieval: {str(e)}")
            # Fallback to simple retriever
            simple_retriever = docsearch.as_retriever()
            self.compression_retriever = simple_retriever
            return simple_retriever

    def setup_agentic_rag_graph(self, groq_api_key: str, tavily_api_key: str = None):
        """Setup the streamlined agentic RAG workflow using LangGraph"""
        try:
            # Initialize components
            llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.0, max_tokens=4028, api_key=groq_api_key)
            parser = JsonOutputParser(pydantic_object=crFormat)

            # Initialize web search tool if API key provided
            web_tool = None
            if tavily_api_key:
                try:
                    # Set environment variable for Tavily
                    import os
                    os.environ['TAVILY_API_KEY'] = tavily_api_key
                    web_tool = TavilySearch(max_results=5, topic="general")
                except Exception as e:
                    st.warning(f"⚠️ Tavily search not available: {str(e)}")

            # Define retrieval node - use compression retriever if available, otherwise vector retriever
            def retrieve_node(state: AgentState):
                question = state["question"]
                try:
                    if hasattr(self, 'compression_retriever') and self.compression_retriever:
                        # Use the enhanced retrieval system
                        retrieved_docs = self.compression_retriever.invoke(question)
                        context = [doc.page_content for doc in retrieved_docs]
                    else:
                        # Fallback to vector retriever
                        vector_retriever = self.vectorstore.as_retriever(search_kwargs={"k": 10})
                        retrieved_docs = vector_retriever.invoke(question)
                        context = [doc.page_content for doc in retrieved_docs]

                    return {"context": context}
                except Exception as e:
                    error_msg = str(e)
                    print(f"Retrieval error: {error_msg}")
                    if "401" in error_msg or "invalid api token" in error_msg.lower():
                        return {"context": ["❌ Authentication failed: Invalid Typesense API key. Please update your API key in the sidebar and try again."]}
                    elif "404" in error_msg:
                        return {"context": ["❌ Collection not found. Please check if the collection exists or recreate it."]}
                    else:
                        return {"context": [f"❌ Unable to retrieve documents: {error_msg[:100]}..."]}

            # Define generation node
            def generate_node(state: AgentState):
                context_str = "\n\n".join(state["context"]) if state["context"] else "No context available."
                prompt = f"""
                Answer the question based on the context below.
                If unsure, say "I cannot answer based on the available context."

                Context: {context_str}

                Question: {state["question"]}

                Answer:
                """
                response = llm.invoke(prompt)
                print(f"Response Generated: {response.content}")
                return {"generation": response.content, "messages": [response.content]}

            # Define self-reflection node (agentic critique)
            def reflect_node(state: AgentState):
                if not web_tool:
                    return {"reflection": "Web search not available", "messages": ["I cannot answer based on the available context."]}

                try:
                    print("---------------------------AGENT REFLECTION -------------------------------------")
                    response = web_tool.invoke({"query": state['question']})
                    context = response.get('results', [])
                    print(context)

                    # Create reflection prompt
                    reflection_prompt = ChatPromptTemplate.from_template("""
                    Based on the Query and Content provided, please generate a detailed, verbose response.
                    Stick to the CONTENT provided. Do NOT use your own knowledge.

                    QUERY: {query}
                    CONTENT: {context}

                    {format_instructions}

                    The answer should be clear, detailed, and professional in tone.
                    Start the response with 'Based on Web Search performed'.
                    Also specify the URLs referred.

                    OUTPUT FORMAT:
                    {{"Web_Response":"The answer should be clear, detailed, and professional in tone. Start the response with 'Based on Web Search performed'. Also specify the URLs referred."}}
                    """)

                    # Create the chain
                    critique_chain = reflection_prompt | llm | parser

                    # Invoke the chain
                    output = critique_chain.invoke({
                        "query": state['question'],
                        "context": context,
                        "format_instructions": parser.get_format_instructions()
                    })

                    print(f"CRITIQUE: {output}")
                    return {"reflection": response.get('results', []), "messages": [output["Web_Response"]]}
                except Exception as e:
                    print(f"Reflection error: {str(e)}")
                    return {"reflection": "Web search reflection failed", "messages": ["I cannot answer based on the available context."]}

            # Helper function to determine if reflection is needed
            def needs_reflection(state: AgentState):
                gen = state['generation'].lower()
                print(f"Generated response: {state['generation']}")
                if "uncertain" in gen or "cannot answer" in gen:
                    return "reflect"
                return END

            # Build state graph
            graph_builder = StateGraph(AgentState)
            graph_builder.add_node("retrieve", retrieve_node)
            graph_builder.add_node("generate", generate_node)
            graph_builder.add_node("reflect", reflect_node)
            graph_builder.set_entry_point("retrieve")
            graph_builder.add_edge("retrieve", "generate")
            graph_builder.add_conditional_edges(
                "generate",
                needs_reflection,
                {"reflect": "reflect", "__end__": END}
            )

            # Compile graph
            agentic_rag_graph = graph_builder.compile()
            self.agentic_rag_graph = agentic_rag_graph

            return agentic_rag_graph

        except Exception as e:
            st.error(f"Error setting up agentic RAG graph: {str(e)}")
            return None

    def query_agentic_rag(self, question: str):
        """Query the streamlined agentic RAG system"""
        try:
            if not self.agentic_rag_graph:
                st.error("Agentic RAG graph not initialized")
                return None

            # Initialize state with required fields
            initial_state = {
                "question": question,
                "context": [],
                "generation": "",
                "reflection": "",
                "messages": []
            }

            # Execute the graph
            result = self.agentic_rag_graph.invoke(initial_state)

            # Return the final answer from messages (latest response)
            if result.get("messages") and len(result["messages"]) > 0:
                final_answer = result["messages"][-1]  # Get the latest message
                source_type = "agentic_rag_with_web_search" if result.get("reflection") else "agentic_rag"

                return {
                    "answer": final_answer,
                    "source": source_type,
                    "context": result.get("context", [])
                }
            else:
                # Fallback to generation if no messages
                return {
                    "answer": result.get("generation", "No response generated"),
                    "source": "agentic_rag",
                    "context": result.get("context", [])
                }

        except Exception as e:
            st.error(f"Error querying agentic RAG: {str(e)}")
            return None

    def setup_qa_chain(self, vectorstore, groq_api_key: str):
        """Setup QA chain with Groq LLM"""
        try:
            llm = ChatGroq(
                groq_api_key=groq_api_key,
                model_name="mixtral-8x7b-32768",
                temperature=0.1
            )

            qa_chain = RetrievalQA.from_chain_type(
                llm=llm,
                chain_type="stuff",
                retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
                return_source_documents=True
            )

            return qa_chain
        except Exception as e:
            st.error(f"Error setting up QA chain: {str(e)}")
            return None

def initialize_session_state():
    """Initialize session state variables"""
    if 'processor' not in st.session_state:
        st.session_state.processor = DocumentProcessor()
    if 'indexing_complete' not in st.session_state:
        st.session_state.indexing_complete = False
    if 'chat_history' not in st.session_state:
        st.session_state.chat_history = []
    if 'processing_stats' not in st.session_state:
        st.session_state.processing_stats = {}

def validate_api_keys(groq_key: str, google_key: str, typesense_api_key: str, langextract_key: str = "") -> Dict[str, bool]:
    """Validate API keys"""
    validation_results = {
        'groq': bool(groq_key and len(groq_key) > 10),
        'google': bool(google_key and len(google_key) > 10),
        'typesense': bool(typesense_api_key and len(typesense_api_key) > 5),
        'langextract': bool(langextract_key and len(langextract_key) > 10) if langextract_key else True  # Optional
    }
    return validation_results

def display_processing_progress(stage: str, progress: float, message: str):
    """Display processing progress with animations"""
    col1, col2 = st.columns([3, 1])

    with col1:
        st.markdown(f"**{stage}**")
        progress_bar = st.progress(progress)
        st.caption(message)

    with col2:
        if progress < 1.0:
            st.markdown("🔄 Processing...")
        else:
            st.markdown("✅ Complete!")

# Function to sanitize filenames into valid Typesense collection names
def sanitize_collection_name(filename: str) -> str:
    # Lowercase, remove extension, replace non-alphanumeric with underscores
    name = filename.lower()
    name = re.sub(r'\.pdf$', '', name)       # remove .pdf extension
    name = re.sub(r'[^a-z0-9]+', '_', name)  # replace invalid chars with underscore
    return name

def process_documents_reference_style(documents, collection_name, typesense_client, typesense_host, typesense_port, typesense_protocol, typesense_api_key, cohere_api_key, groq_api_key, tavily_api_key):
    """Process documents following reference app logic"""

    # LangExtract metadata extraction setup
    extraction_class = "document_metadata"
    attributes = ["title", "author", "date", "intent"]
    examples = [
        lx.data.ExampleData(
            text="The annual report was authored by Jane Smith in January 2023.",
            extractions=[
                lx.data.Extraction(
                    extraction_class=extraction_class,
                    extraction_text="The annual report was authored by Jane Smith in January 2023.",
                    attributes={
                        "title": "annual report",
                        "author": "Jane Smith",
                        "date": "January 2023",
                        "intent": "The annual report representing the company annual turnover was presented by Jane Smith"
                    }
                )
            ]
        ),
        lx.data.ExampleData(
            text="Project timeline document by John Doe on 2022-12-15.",
            extractions=[
                lx.data.Extraction(
                    extraction_class=extraction_class,
                    extraction_text="Project timeline document by John Doe on 2022-12-15.",
                    attributes={
                        "title": "Project timeline document",
                        "author": "John Doe",
                        "date": "2022-12-15",
                        "intent": "This document explains the project timelines in detail"
                    }
                )
            ]
        ),
    ]

    prompt = textwrap.dedent("""
        Extract document metadata such as title, author, date, and intent.
        Use exact text for extractions. Do not paraphrase or overlap entities.
        Provide meaningful attributes for each entity to add context.
    """)

    # Extract metadata for each document page
    doc_texts = [doc.page_content for doc in documents]
    results = []
    extraction_errors = []
    with st.spinner("Extracting metadata with LangExtract..."):
        for idx, input_text in enumerate(doc_texts):
            try:
                result = lx.extract(
                    text_or_documents=input_text,
                    prompt_description=prompt,
                    language_model_type=lx.inference.OllamaLanguageModel,
                    examples=examples,
                    model_id="llama3.2",
                    temperature=0.3,
                    model_url="http://localhost:11434"
                )
                print("Extraction successful!")
                print(result)
                results.append(result.extractions[0].attributes)
            except Exception as e:
                extraction_errors.append(f"Page {idx}: {str(e)}")
                results.append({})

    if extraction_errors:
        st.warning(f"Metadata extraction had some errors on {len(extraction_errors)} pages.")
        for err in extraction_errors:
            st.write(err)
    else:
        st.success("Metadata extraction completed successfully.")

    # Enhance documents with extracted metadata
    enhanced_docs = []
    for i, doc in enumerate(documents):
        if i < len(results) and results[i]:
            doc.metadata.update(results[i])
        enhanced_docs.append(doc)

    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = text_splitter.split_documents(enhanced_docs)
    st.info(f"Split documents into {len(chunks)} chunks for indexing.")

    # Initialize Ollama embeddings
    embeddings = OllamaEmbeddings(model="mxbai-embed-large")

    # Create vectorstore - reference app style
    st.info(f"Creating new Typesense collection '{collection_name}'")
    docsearch = Typesense.from_documents(
        chunks,
        embeddings,
        typesense_client_params={
            "host": typesense_host,
            "port": typesense_port,
            "protocol": typesense_protocol,
            "typesense_api_key": typesense_api_key,
            "typesense_collection_name": collection_name,
        },
    )

    # Setup retrievers and reranker - reference app style
    setup_retrievers_and_agentic_rag(docsearch, chunks, cohere_api_key, groq_api_key, tavily_api_key, collection_name)

def setup_existing_collection_reference_style(collection_name, typesense_client, cohere_api_key, groq_api_key, tavily_api_key):
    """Setup existing collection following reference app logic"""
    embeddings = OllamaEmbeddings(model="mxbai-embed-large")
    docsearch = Typesense(
        embedding=embeddings,
        typesense_client=typesense_client,
        typesense_collection_name=collection_name
    )

    # Setup retrievers and agentic RAG for existing collection
    setup_retrievers_and_agentic_rag(docsearch, [], cohere_api_key, groq_api_key, tavily_api_key, collection_name)

def setup_retrievers_and_agentic_rag(docsearch, chunks, cohere_api_key, groq_api_key, tavily_api_key, collection_name):
    """Setup retrievers and agentic RAG - reference app style"""

    # Setup retrievers and reranker: prefer Cohere if a key is provided, else Flashrank
    if cohere_api_key:
        compressor = CohereRerank(model="rerank-english-v3.0", cohere_api_key=cohere_api_key)
    else:
        compressor = FlashrankRerank()

    vector_retriever = docsearch.as_retriever(search_kwargs={"k": 10})

    # Simple ensemble retriever (vector only, like reference app)
    ensemble_retriever = EnsembleRetriever(
        retrievers=[vector_retriever],
        weights=[1]
    )
    # Use the compressor chosen above (Cohere if available, otherwise Flashrank)
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor,
        base_retriever=ensemble_retriever
    )

    # Initialize Tavily Search tool
    if tavily_api_key:
        try:
            # Set environment variable for Tavily
            import os
            os.environ['TAVILY_API_KEY'] = tavily_api_key
            tool = TavilySearch(max_results=5, topic="general")
        except Exception as e:
            st.warning(f"⚠️ Tavily search initialization failed: {str(e)}")
            tool = None
    else:
        tool = None

    # Define agent state and parser
    class crFormat(BaseModel):
        Web_Response: str = Field(..., description="Generated response based on provided info.")

    parser = JsonOutputParser(pydantic_object=crFormat)

    class AgentState(TypedDict):
        question: str
        context: List[str]
        generation: str
        reflection: str
        messages: Annotated[List[BaseMessage], add]

    # Initialize Groq LLM
    llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.0, max_tokens=4028, api_key=groq_api_key)

    # Retrieval node
    def retrieve_node(state: AgentState):
        question = state["question"]
        retrieved_docs = compression_retriever.invoke(question)
        context = [doc.page_content for doc in retrieved_docs]
        return {"context": context}

    # Generation node
    def generate_node(state: AgentState):
        context_str = "\n\n".join(state["context"])
        prompt = f"""
        Answer the question based on the context below.
        If unsure, say "I cannot answer based on the available context."
        Context: {context_str}
        Question: {state["question"]}
        Answer:
        """
        response = llm.invoke(prompt)
        print(f"Response Generated: {response.content}")
        return {"generation": response.content, "messages": [response.content]}

    # Define self-reflection node (agentic critique)
    def reflect_node(state: AgentState):
        if not tool:
            return {"reflection": "Web search not available", "messages": [state["generation"]]}

        print("---------------------------AGENT REFLECTION -------------------------------------")
        response = tool.invoke({"query": state['question']})
        context = response.get('results', [])
        print(context)

        # Create reflection prompt
        reflection_prompt = ChatPromptTemplate.from_template("""
        Based on the Query and Content provided, please generate a detailed, verbose response.
        Stick to the CONTENT provided. Do NOT use your own knowledge.
        QUERY: {query}
        CONTENT: {context}

        {format_instructions}
        The answer should be clear, detailed, and professional in tone. Start the response with 'Based on Web Search performed'.
        Also specify the URLs referred.

        OUTPUT FORMAT:
        {{"Web_Response":"The answer should be clear, detailed, and professional in tone. Start the response with 'Based on Web Search performed'. Also specify the URLs referred."}}
        """)

        # Create the chain
        critique_chain = reflection_prompt | llm | parser

        # Invoke the chain
        output = critique_chain.invoke({
            "query": state['question'],
            "context": context,
            "format_instructions": parser.get_format_instructions()
        })

        print(f"CRITIQUE: {output}")
        return {"reflection": response.get('results', []), "messages": [output["Web_Response"]]}

    # Decide if reflection is needed
    def needs_reflection(state: AgentState):
        gen = state['generation'].lower()
        if "uncertain" in gen or "cannot answer" in gen:
            return "reflect"
        return END

    # Build state graph
    graph_builder = StateGraph(AgentState)
    graph_builder.add_node("retrieve", retrieve_node)
    graph_builder.add_node("generate", generate_node)
    graph_builder.add_node("reflect", reflect_node)
    graph_builder.set_entry_point("retrieve")
    graph_builder.add_edge("retrieve", "generate")
    graph_builder.add_conditional_edges(
        "generate",
        needs_reflection,
        {"reflect": "reflect", "__end__": END}
    )
    agentic_rag_graph = graph_builder.compile()

    # Store in session state for query interface
    st.session_state.agentic_rag_graph = agentic_rag_graph
    st.session_state.collection_name = collection_name

    # Display query interface
    display_query_interface_reference_style()

def display_query_interface_reference_style():
    """Display query interface - reference app style"""
    st.header("❓ Ask Questions about the Uploaded Documents")

    if hasattr(st.session_state, 'collection_name'):
        st.success(f"📚 **Using Collection:** `{st.session_state.collection_name}`")

    query = st.text_input("Enter your question:")

    if query:
        if hasattr(st.session_state, 'agentic_rag_graph'):
            initial_state = {"question": query, "context": [], "generation": "", "reflection": "", "messages": []}
            with st.spinner("Processing your query..."):
                result = st.session_state.agentic_rag_graph.invoke(initial_state)

            st.subheader("Answer:")
            if result.get("messages") and len(result["messages"]) > 0:
                st.write(result["messages"][-1])
            else:
                st.write(result.get("generation", "No response generated"))
        else:
            st.error("Agentic RAG system not initialized")

def main():
    """Main Streamlit app function - reference app structure"""
    # Note: st.set_page_config() is already called once at the top of this script;
    # calling it a second time here would raise a StreamlitAPIException.

    st.title("📄 Document Metadata Extraction & Retrieval-Augmented Generation (RAG)")

    # Sidebar configuration - keep API key inputs
    with st.sidebar:
        st.header("⚙️ Configuration")
        st.subheader("🔑 API Keys")

        # API Key inputs
        groq_api_key = st.text_input(
            "Groq API Key",
            type="password",
            placeholder="Enter your Groq API key",
            help="Required for LLM inference"
        )

        typesense_api_key = st.text_input(
            "Typesense API Key",
            value="",
            type="password",
            placeholder="Enter your valid Typesense API key",
            help="⚠️ Required: Get this from your Typesense Cloud dashboard"
        )

        cohere_api_key = st.text_input(
            "Cohere API Key",
            type="password",
            placeholder="Enter your Cohere API key",
            help="Optional for enhanced reranking (falls back to Flashrank)"
        )

        tavily_api_key = st.text_input(
            "Tavily API Key",
            type="password",
            placeholder="Enter your Tavily API key",
            help="Optional for web search capabilities in agentic RAG"
        )

        st.info("ℹ️ **Metadata Extraction**: Using local Ollama (llama3.2) - no API key required!")

        st.subheader("🔧 Typesense Configuration")

        typesense_host = st.text_input(
            "Typesense Host",
            value="pnfcivy9jst8z0d7p-1.a1.typesense.net",
            placeholder="your-cluster.a1.typesense.net",
            help="Typesense server hostname"
        )

        typesense_port = st.text_input(
            "Typesense Port",
            value="443",
            placeholder="443",
            help="Typesense server port (443 for HTTPS)"
        )

        typesense_protocol = st.selectbox(
            "Protocol",
            options=["https", "http"],
            index=0,
            help="Connection protocol"
        )

    # Reference app logic - direct file upload and processing
    uploaded_files = st.file_uploader("Upload PDF Documents", type=["pdf"], accept_multiple_files=True)

    if uploaded_files:
        # Check required API keys
        if not all([groq_api_key, typesense_api_key]):
            st.warning("⚠️ Please provide all required API keys in the sidebar to continue.")
            return

        st.info(f"{len(uploaded_files)} document(s) uploaded. Processing...")

        # Load documents from uploaded PDFs - reference app style
        documents = []
        for uploaded_file in uploaded_files:
            # Save uploaded file to a temporary file
            with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
                tmp_file.write(uploaded_file.read())
                tmp_file_path = tmp_file.name

            # Initialize PyPDFLoader with the temporary file path
            loader = PyPDFLoader(tmp_file_path)
            docs = loader.load()

            # Add source metadata (filename)
            for d in docs:
                d.metadata["source"] = uploaded_file.name

            documents.extend(docs)
            collection_name = sanitize_collection_name(uploaded_file.name)

        st.success(f"Loaded {len(documents)} pages from uploaded documents.")

        # Initialize Typesense client - reference app style
        try:
            typesense_client = typesense.Client({
                'nodes': [{
                    'host': typesense_host,
                    'port': int(typesense_port),
                    'protocol': typesense_protocol
                }],
                'api_key': typesense_api_key,
                'connection_timeout_seconds': 60
            })
            # Check if collection exists
            collections = typesense_client.collections.retrieve()
            collection_names = [col['name'] for col in collections]
            print(f"Collection_Name:{collection_name}")

            if collection_name not in collection_names:
                # Process documents with metadata extraction - reference app style
                process_documents_reference_style(documents, collection_name, typesense_client, typesense_host, typesense_port, typesense_protocol, typesense_api_key, cohere_api_key, groq_api_key, tavily_api_key)
            else:
                # Use existing collection - reference app style
                st.success(f"Using existing Typesense collection '{collection_name}'")
                setup_existing_collection_reference_style(collection_name, typesense_client, cohere_api_key, groq_api_key, tavily_api_key)

        except Exception as e:
            st.error(f"Error initializing Typesense client: {e}")
            st.stop()
    else:
        st.info("Please upload one or more PDF documents to start.")

if __name__ == "__main__":
    main()

Run the app from the project directory:

streamlit run app.py

UI: Self-Reflective RAG


Response Logs

(RAG_typesense) C:\Users\PLNAYAK\Documents\RAG_typesense>streamlit run app.py

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8502
Network URL: http://192.168.1.2:8502

Collection_Name:ai_agent_architecture_design_patterns
Collection_Name:ai_agent_architecture_design_patterns
Collection_Name:agentic_rag_system_architectures
C:\Users\PLNAYAK\Documents\RAG_typesense\.venv\Lib\site-packages\langextract\inference.py:32: FutureWarning: `langextract.inference.OllamaLanguageModel` is deprecated and will be removed in v2.0.0; use `langextract.providers.ollama.OllamaLanguageModel` instead.
return inference.__getattr__(name)
C:\Users\PLNAYAK\Documents\RAG_typesense\.venv\Lib\site-packages\langextract\__init__.py:55: FutureWarning: 'language_model_type' is deprecated and will be removed in v2.0.0. Use model, config, or model_id parameters instead.
return extract_func(*args, **kwargs)
INFO:absl:Loaded provider plugin: gemini
INFO:absl:Loaded provider plugin: ollama
INFO:absl:Loaded provider plugin: openai
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 1
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_d6f540c1.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='Guide to 7 Popular Agentic RAG SystemArchitectures', char_interval=CharInterval(start_pos=0, end_pos=50), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'Guide to 7 Popular Agentic RAG SystemArchitectures', 'author': 'Dipanjan (DJ)', 'date': None, 'intent': 'This document provides an overview of 7 popular Agentic RAG system architectures'})], text='Guide to 7 Popular Agentic RAG SystemArchitectures\nDipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 1
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_04ba1810.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='Agentic RAG Workflow', char_interval=CharInterval(start_pos=0, end_pos=20), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'Agentic RAG Workflow', 'author': 'Unknown', 'date': 'Unknown', 'intent': 'This document explains the Agentic RAG workflow and its applications'}), Extraction(extraction_class='document_metadata', extraction_text='Agentic RAG is a combination of AI Agents and RAG Systems', char_interval=CharInterval(start_pos=21, end_pos=78), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=2, group_index=1, description=None, attributes={'title': 'Agentic RAG', 'author': 'Unknown', 'date': 'Unknown', 'intent': 'This sentence provides an overview of the Agentic RAG concept'}), Extraction(extraction_class='document_metadata', extraction_text='Agentic RAG Systems have various workflows depending on the use-case', char_interval=CharInterval(start_pos=79, end_pos=147), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=3, group_index=2, description=None, attributes={'title': 'Agentic RAG Systems', 'author': 'Unknown', 'date': 'Unknown', 'intent': 'This sentence explains the flexibility of Agentic RAG Systems'}), Extraction(extraction_class='document_metadata', extraction_text='In this workflow we first create various vector databases based on specific document types and domains', char_interval=CharInterval(start_pos=148, end_pos=250), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=4, group_index=3, description=None, attributes={'title': 'Workflow Step', 'author': 'Unknown', 'date': 'Unknown', 'intent': 'This sentence describes a step in the Agentic RAG workflow'}), Extraction(extraction_class='document_metadata', extraction_text='Based on the user query the LLM will reason and route to the relevant Vector DB', char_interval=CharInterval(start_pos=251, end_pos=330), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=5, group_index=4, description=None, attributes={'title': 'Workflow Step', 'author': 'Unknown', 'date': 'Unknown', 'intent': 'This sentence describes another step in the Agentic RAG workflow'}), Extraction(extraction_class='document_metadata', extraction_text='Context documents are retrieved and the standard RAG flow is executed after that as usual', char_interval=CharInterval(start_pos=331, end_pos=420), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=6, group_index=5, description=None, attributes={'title': 'Workflow Step', 'author': 'Unknown', 'date': 'Unknown', 'intent': 'This sentence describes a final step in the Agentic RAG workflow'}), Extraction(extraction_class='document_metadata', extraction_text='Very useful when you have documents related to different domains, departments', char_interval=CharInterval(start_pos=421, end_pos=498), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=7, group_index=6, description=None, attributes={'title': 'Benefits of Agentic RAG', 'author': 'Unknown', 'date': 'Unknown', 'intent': 'This sentence highlights the benefits of using Agentic RAG'}), Extraction(extraction_class='document_metadata', extraction_text='Source: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)', char_interval=CharInterval(start_pos=499, end_pos=586), 
alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=8, group_index=7, description=None, attributes={'title': 'Reference', 'author': 'Dipanjan (DJ)', 'date': 'Unknown', 'intent': 'This sentence provides a reference for the Agentic RAG concept'})], text='Agentic RAG Workflow\nAgentic RAG is a combination of AI Agents and RAG Systems\nAgentic RAG Systems have various workflows depending on the use-case\nIn this workflow we first create various vector databases based on specific\ndocument types and domains\nBased on the user query the LLM will reason and route to the relevant Vector DB\nContext documents are retrieved and the standard RAG flow is executed after that\nas usual\nVery useful when you have documents related to different domains, departments\nSource: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 2
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_0f29fce5.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='The annual report was authored by Jane Smith in January 2023.', char_interval=None, alignment_status=None, extraction_index=1, group_index=0, description=None, attributes={'title': 'annual report', 'author': 'Jane Smith', 'date': 'January 2023', 'intent': 'The annual report representing the company annual turnover was presented by Jane Smith'}), Extraction(extraction_class='document_metadata', extraction_text='Project timeline document by John Doe on 2022-12-15.', char_interval=None, alignment_status=None, extraction_index=2, group_index=1, description=None, attributes={'title': 'Project timeline document', 'author': 'John Doe', 'date': '2022-12-15', 'intent': 'This document explains the project timelines in detail'}), Extraction(extraction_class='document_metadata', extraction_text='1. Agentic RAG Routers\nAgentic RAG Routers are systems designed to dynamically route user queries to\nappropriate tools or data sources, enhancing the capabilities of Large Language\nModels (LLMs)\nThe primary purpose of such routers is to combine retrieval mechanisms with the\ngenerative strengths of LLMs to deliver accurate and contextually rich responses\nThere are various types of Agentic RAG Routers:\nSingle Agentic RAG Router: One unified agent responsible for all routing,\nretrieval, and decision-making tasks.\nMultiple Agentic RAG Routers: Multiple agents, each handling a specific type\nof task or query. \nMulti-Agentic RAG Routers are useful where the system employs multiple\nretrieval agents, each specializing in a specific type of task. For example:\nRetrieval Agent 1 might handle SQL-based queries.\nRetrieval Agent 2 might focus on semantic searches.\nRetrieval Agent 3 could prioritize recommendations or web searches.', char_interval=CharInterval(start_pos=0, end_pos=929), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=3, group_index=2, description=None, attributes={'title': 'Agentic RAG Routers', 'author': None, 'date': None, 'intent': 'This document explains the concept and types of Agentic RAG Routers, their benefits, and use cases'}), Extraction(extraction_class='document_metadata', extraction_text='Source: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)', char_interval=CharInterval(start_pos=930, end_pos=1017), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': '7 Agentic RAG System Architectures to Build AI Agents', 'author': 'Dipanjan (DJ)', 'date': '', 'intent': 'This source provides information on 7 agentic RAG system architectures for building AI agents'})], text='1. Agentic RAG Routers\nAgentic RAG Routers are systems designed to dynamically route user queries to\nappropriate tools or data sources, enhancing the capabilities of Large Language\nModels (LLMs)\nThe primary purpose of such routers is to combine retrieval mechanisms with the\ngenerative strengths of LLMs to deliver accurate and contextually rich responses\nThere are various types of Agentic RAG Routers:\nSingle Agentic RAG Router: One unified agent responsible for all routing,\nretrieval, and decision-making tasks.\nMultiple Agentic RAG Routers: Multiple agents, each handling a specific type\nof task or query. \nMulti-Agentic RAG Routers are useful where the system employs multiple\nretrieval agents, each specializing in a specific type of task. 
For example:\nRetrieval Agent 1 might handle SQL-based queries.\nRetrieval Agent 2 might focus on semantic searches.\nRetrieval Agent 3 could prioritize recommendations or web searches.\nSource: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 1
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_00423265.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='Query Planning Agentic RAG (Retrieval-Augmented Generation) is a methodology designed to handle complex queries efficiently by leveraging multiple parallelizable subqueries across diverse data sources', char_interval=CharInterval(start_pos=30, end_pos=230), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'Query Planning Agentic RAG (Retrieval-Augmented Generation)', 'author': 'None specified', 'date': 'None specified', 'intent': 'Explains the Query Planning Agentic RAG methodology for handling complex queries'}), Extraction(extraction_class='document_metadata', extraction_text='This approach combines intelligent query division, distributed processing, and response synthesis to deliver accurate and comprehensive results', char_interval=CharInterval(start_pos=231, end_pos=374), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=2, group_index=1, description=None, attributes={'title': 'Query Planning Agentic RAG (Retrieval-Augmented Generation)', 'author': 'None specified', 'date': 'None specified', 'intent': 'Describes the benefits of using Query Planning Agentic RAG for query processing'}), Extraction(extraction_class='document_metadata', extraction_text='The Query Planner is the central component orchestrating the process. It: Interprets the query provided by the user, Generates appropriate prompts for the downstream components, Decide which tools (query engines) to invoke to answer specific parts of the query', char_interval=CharInterval(start_pos=375, end_pos=490), alignment_status=<AlignmentStatus.MATCH_LESSER: 'match_lesser'>, extraction_index=3, group_index=2, description=None, attributes={'title': 'Query Planning Agentic RAG (Retrieval-Augmented Generation)', 'author': 'None specified', 'date': 'None specified', 'intent': 'Explains the role of the Query Planner in the Query Planning Agentic RAG process'}), Extraction(extraction_class='document_metadata', extraction_text='Very useful to break down complex queries step by step to retrieve results for each query to get to the final result', char_interval=CharInterval(start_pos=634, end_pos=750), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=4, group_index=3, description=None, attributes={'title': 'Query Planning Agentic RAG (Retrieval-Augmented Generation)', 'author': 'None specified', 'date': 'None specified', 'intent': 'Highlights the utility of Query Planning Agentic RAG for complex query processing'}), Extraction(extraction_class='document_metadata', extraction_text='Source: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)', char_interval=CharInterval(start_pos=751, end_pos=838), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=5, group_index=4, description=None, attributes={'title': 'Query Planning Agentic RAG (Retrieval-Augmented Generation)', 'author': 'Dipanjan (DJ)', 'date': 'None specified', 'intent': 'Cites the source of the information and credits the author'})], text='2. 
Query Planning Agentic RAG\nQuery Planning Agentic RAG (Retrieval-Augmented Generation) is a methodology\ndesigned to handle complex queries efficiently by leveraging multiple\nparallelizable subqueries across diverse data sources\nThis approach combines intelligent query division, distributed processing, and\nresponse synthesis to deliver accurate and comprehensive results\nThe Query Planner is the central component orchestrating the process. It:\nInterprets the query provided by the user\nGenerates appropriate prompts for the downstream components\nDecide which tools (query engines) to invoke to answer specific parts of the\nquery\nVery useful to break down complex queries step by step to retrieve results for\neach query to get to the final result\nSource: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 2
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_214b9293.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='3. Adaptive RAG Adaptive Retrieval-Augmented Generation (Adaptive RAG) is a method that enhances the flexibility and efficiency of large language models (LLMs) by tailoring the query handling strategy to the complexity of the incoming query', char_interval=CharInterval(start_pos=0, end_pos=241), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'Adaptive RAG', 'author': 'None', 'date': 'None', 'intent': 'Explains the Adaptive RAG method for improving large language models'}), Extraction(extraction_class='document_metadata', extraction_text='Classifier Role: A smaller language model predicts query complexity It is trained using automatically labelled datasets, where the labels are derived from past model outcomes and inherent patterns in the data', char_interval=CharInterval(start_pos=632, end_pos=840), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=2, group_index=1, description=None, attributes={'title': 'Classifier Role', 'author': 'None', 'date': 'None', 'intent': 'Describes the role of a smaller language model in predicting query complexity'}), Extraction(extraction_class='document_metadata', extraction_text='Dynamic Strategy Selection: For simple or straightforward queries, the framework avoids wasting computational resources', char_interval=CharInterval(start_pos=841, end_pos=960), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=3, group_index=2, description=None, attributes={'title': 'Dynamic Strategy Selection', 'author': 'None', 'date': 'None', 'intent': 'Explains how Adaptive RAG avoids wasting resources for simple queries'}), Extraction(extraction_class='document_metadata', extraction_text='For complex queries, it ensures sufficient iterations through multiple retrieval steps', char_interval=CharInterval(start_pos=961, end_pos=1047), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'None', 'author': 'None', 'date': 'None', 'intent': 'Explains the importance of iteration in query processing'})], text='3. Adaptive RAG \nAdaptive Retrieval-Augmented Generation (Adaptive RAG) is a method that\nenhances the flexibility and efficiency of large language models (LLMs) by\ntailoring the query handling strategy to the complexity of the incoming query\nAdaptive RAG is a better version of routing RAG where it dynamically chooses\nbetween different strategies for answering questions—ranging from simple single-\nstep approaches to more complex multi-step or even no-retrieval processes—\nbased on the complexity of the query. \nThis selection is facilitated by a classifier, which analyzes the query’s nature and\ndetermines the optimal approach.\nClassifier Role:\nA smaller language model predicts query complexity\nIt is trained using automatically labelled datasets, where the labels are derived from past model\noutcomes and inherent patterns in the data\nDynamic Strategy Selection:\nFor simple or straightforward queries, the framework avoids wasting computational resources\nFor complex queries, it ensures sufficient iterations through multiple retrieval steps\nSource: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 1
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_483ca67a.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='4. Agentic Corrective RAG The Agentic Corrective RAG Architecture enhances Retrieval-Augmented Generation(RAG) with corrective steps for accurate answers:Query and Initial Retrieval: A user query retrieves context documents from avector database.Document Evaluation:The LLM Grader Prompt evaluates each document’srelevance (yes or no).Decision Node:All Relevant: Directly proceed to generate the answer.Irrelevant Documents: Trigger corrective steps.Query Rephrasing: The LLM Rephrase Prompt rewrites the query for optimized web retrieval.Additional Retrieval: A web search retrieves improved context documents.Response Generation: The RAG Prompt generates an answer using validated context only.Source: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)', char_interval=CharInterval(start_pos=0, end_pos=231), alignment_status=<AlignmentStatus.MATCH_LESSER: 'match_lesser'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'Agentic Corrective RAG', 'author': 'Dipanjan (DJ)', 'date': '', 'intent': 'This document explains the Agentic Corrective RAG Architecture, its components, and how it enhances Retrieval-Augmented Generation with corrective steps for accurate answers'})], text='4. Agentic Corrective RAG \nThe Agentic Corrective RAG Architecture enhances Retrieval-Augmented Generation\n(RAG) with corrective steps for accurate answers:\nQuery and Initial Retrieval: A user query retrieves context documents from a\nvector database.\nDocument Evaluation: The LLM Grader Prompt evaluates each document’s\nrelevance (yes or no).\nDecision Node:\nAll Relevant: Directly proceed to generate the answer.\nIrrelevant Documents: Trigger corrective steps.\nQuery Rephrasing: The LLM Rephrase Prompt rewrites the query for optimized\nweb retrieval.\nAdditional Retrieval: A web search retrieves improved context documents.\nResponse Generation: The RAG Prompt generates an answer using validated\ncontext only.\nSource: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 1
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_a029e46e.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='5. Self-Reflective RAG Self-reflective RAG (Retrieval-Augmented Generation) is an advanced approach that combines the capabilities of retrieval-based methods with generative models while adding an additional layer of self-reflection and logical reasoning.', char_interval=CharInterval(start_pos=0, end_pos=256), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'Self-Reflective RAG', 'author': 'Dipanjan (DJ)', 'date': '', 'intent': 'This document explains the concept and capabilities of Self-Reflective RAG'})], text='5. Self-Reflective RAG \nSelf-reflective RAG (Retrieval-Augmented Generation) is an advanced approach\nthat combines the capabilities of retrieval-based methods with generative models\nwhile adding an additional layer of self-reflection and logical reasoning.\nSelf-reflective RAG helps in retrieval, re-writing questions, discarding irrelevant\nor hallucinated documents and re-try retrieval.\nSource: 7 Agentic RAG System Architectures to Build AI Agents\n Created by: Dipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 1
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_83457cee.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='6. Speculative RAG ', char_interval=CharInterval(start_pos=0, end_pos=18), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'Speculative RAG', 'author': 'unknown', 'date': 'unknown', 'intent': "This document explains the Speculative RAG framework, a smart approach to improve large language models' performance"})], text='6. Speculative RAG \nSpeculative RAG is a smart framework designed to make large language models\n(LLMs) both faster and more accurate when answering questions. It does this by\nsplitting the work between two kinds of language models:\nA small, specialized model that quickly create multiple drafts of possible\nanswers and includes reasoning for each draft (like saying, “This answer is\nbased on this source”).\nA large, general-purpose model that double-checks these drafts and picks the\nbest one as the final response.\nSource: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 1
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_391f17fa.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='7. Self Route Agentic RAG \n\nSelf Route is a design pattern in Agentic RAG systems where Large Language\nModels (LLMs) play an active role in deciding how a query should be processed\n\nThis is a hybrid approach which combines Retrieval-Augmented Generation (RAG)\nand Long Context (LC) LLMs.\n\nKey components of Self Route:\nDecision-making by LLMs: Queries are evaluated to determine if they can be\nanswered with the given retrieved context.\nRouting: If a query is answerable, response is generated immediately.\nOtherwise, it is routed to a long-context model with the full context\ndocuments to generate the response.\n\nEfficiency and Accuracy: This design balances cost-efficiency (avoiding\nunnecessary computation cost and time) and accuracy (leveraging long-\ncontext models only when needed).\n\nSource: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)', char_interval=CharInterval(start_pos=0, end_pos=874), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes=None)], text='7. Self Route Agentic RAG \nSelf Route is a design pattern in Agentic RAG systems where Large Language\nModels (LLMs) play an active role in deciding how a query should be processed\nThis is a hybrid approach which combines Retrieval-Augmented Generation (RAG)\nand Long Context (LC) LLMs.\nKey components of Self Route:\nDecision-making by LLMs: Queries are evaluated to determine if they can be\nanswered with the given retrieved context.\nRouting: If a query is answerable, response is generated immediately.\nOtherwise, it is routed to a long-context model with the full context\ndocuments to generate the response.\nEfficiency and Accuracy: This design balances cost-efficiency (avoiding\nunnecessary computation cost and time) and accuracy (leveraging long-\ncontext models only when needed).\nSource: 7 Agentic RAG System Architectures to Build AI Agents Created by: Dipanjan (DJ)')
INFO:absl:Starting document annotation.
INFO:absl:Processing batch 0 with length 1
INFO:absl:Starting resolver process for input text.
INFO:absl:Starting string parsing.
INFO:absl:Completed parsing of string.
INFO:absl:Starting to extract and order extractions from data.
INFO:absl:Completed extraction and ordering of extractions.
INFO:absl:Starting alignment process for provided chunk text.
INFO:absl:Completed alignment process for the provided source_text.
INFO:absl:Finalizing annotation for document ID doc_4837145e.
INFO:absl:Document annotation completed.
Extraction successful!
AnnotatedDocument(extractions=[Extraction(extraction_class='document_metadata', extraction_text='Detailed Article Check out the detailed article here Created by: Dipanjan (DJ)', char_interval=CharInterval(start_pos=0, end_pos=78), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'title': 'Detailed Article', 'author': 'Dipanjan (DJ)', 'date': None, 'intent': None})], text='Detailed Article\nCheck out the\ndetailed article\nhere\nCreated by: Dipanjan (DJ)')
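Each `AnnotatedDocument` above is the return value of a single LangExtract call over one chunk of the source deck. The extraction code itself isn't shown in the logs, but a call of roughly the following shape would produce output like this; the prompt wording, the few-shot example (the same Jane Smith sentence that also appears, unaligned, in one of the dumps above), and the `gemini-2.5-flash` model id are illustrative assumptions, not the article's exact settings.

```python
import langextract as lx

# Illustrative chunk: the first page of the "7 Agentic RAG System
# Architectures" deck, as seen in the first dump above.
page_text = (
    "Agentic RAG Workflow\n"
    "Agentic RAG is a combination of AI Agents and RAG Systems\n"
    "Source: 7 Agentic RAG System Architectures to Build AI Agents"
)

prompt = (
    "Extract document metadata (title, author, date, intent) for each "
    "logical section, grounded in exact source spans."
)

# Few-shot example guiding the extraction schema.
examples = [
    lx.data.ExampleData(
        text="The annual report was authored by Jane Smith in January 2023.",
        extractions=[
            lx.data.Extraction(
                extraction_class="document_metadata",
                extraction_text="The annual report was authored by Jane Smith in January 2023.",
                attributes={
                    "title": "annual report",
                    "author": "Jane Smith",
                    "date": "January 2023",
                    "intent": "Summarizes the company's annual turnover",
                },
            )
        ],
    )
]

# One call per chunk; each returns an AnnotatedDocument like those above.
result = lx.extract(
    text_or_documents=page_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",  # assumption: any LangExtract-supported model
)

for e in result.extractions:
    print(e.extraction_class, e.char_interval, e.attributes)
```

The `char_interval` and `alignment_status` fields on each `Extraction` are what give LangExtract its exact source grounding: every extracted attribute can be traced back to a character span in the original text, as the dumps above show.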
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
Collection_Name:agentic_rag_system_architectures
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
Response Generated: I cannot answer based on the available context. The question is incomplete, and I don't have enough information to provide a response. Please provide the full question.
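The httpx lines above trace one pass of the core RAG step: the query is embedded through a local Ollama server (POST /api/embed), matched against the Typesense collection agentic_rag_system_architectures, and answered through Groq's OpenAI-compatible chat endpoint. The refusal in the generated response suggests a prompt that forbids answering outside the retrieved context. A minimal sketch of that step follows; the embedding model, the Typesense schema and field names, the local API key, and the Groq model id are all assumptions.

```python
import ollama
import typesense
from groq import Groq

ts = typesense.Client({
    "nodes": [{"host": "localhost", "port": 8108, "protocol": "http"}],
    "api_key": "xyz",  # assumption: local dev key
    "connection_timeout_seconds": 5,
})

def answer(query: str, collection: str = "agentic_rag_system_architectures") -> str:
    # Embed the query with a local Ollama model (the POST /api/embed logged above).
    vec = ollama.embed(model="nomic-embed-text", input=query)["embeddings"][0]

    # Nearest-neighbour search in Typesense; the "embedding" and "text"
    # field names are assumptions about the collection schema.
    hits = ts.multi_search.perform({
        "searches": [{
            "collection": collection,
            "q": "*",
            "vector_query": f"embedding:([{','.join(map(str, vec))}], k:5)",
        }]
    }, {})
    context = "\n\n".join(
        h["document"].get("text", "") for h in hits["results"][0].get("hits", [])
    )

    # Guarded generation: the refusal above implies a prompt along these lines.
    client = Groq()  # reads GROQ_API_KEY from the environment
    chat = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # assumption: any Groq chat model
        messages=[
            {"role": "system", "content":
                "Answer ONLY from the context below. If it is insufficient, say "
                "'I cannot answer based on the available context.'\n\n" + context},
            {"role": "user", "content": query},
        ],
    )
    return chat.choices[0].message.content
```

On the truncated "what" query this guard correctly triggers the refusal, which is presumably what routes the workflow into the reflection branch traced next.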
---------------------------AGENT REFLECTION -------------------------------------
[{'url': 'https://www.merriam-webster.com/dictionary/what', 'title': 'WHAT Definition & Meaning - Merriam-Webster', 'content': "—used as an interrogative expressing inquiry about the identity, nature, or value of an object or matter archaic **:** who sense 1 —used as an interrogative expressing inquiry about the identity of a person you know what —used at the end of a question to express inquiry about additional possibilities —used as an interrogative expressing inquiry about the identity, nature, or value of a person, object, or matter * know something or someone for what it/he/she is * know what it is * what does one know * what do you know * what do you know about that * what's the good word * you-know-what * you know what they say you know *what?*", 'score': 0.55767727, 'raw_content': None}, {'url': 'https://www.dictionary.com/browse/what', 'title': 'WHAT Definition & Meaning - Dictionary.com', 'content': "* + Word of the Day 5. (used interrogatively to request a repetition of words or information not fully understood, usually used in elliptical constructions). He said what everyone expected he would. You know what? (used as an intensifier in exclamatory phrases, often followed by an indefinite article). 1. (used in exclamatory expressions, often followed by a question). 2. *informal*, \xa0a punishment or reprimand (esp in the phrase **give** ( **a person** ) **what for** ) The use of *are* in sentences such as *what we need are more doctors* is common, although many people think *is* should be used: *what we need is more doctors* ## Idioms and Phrases It's high time you told him what's what. ### More idioms and phrases containing *what* ## Word of the Day", 'score': 0.53065056, 'raw_content': None}, {'url': 'https://en.wiktionary.org/wiki/what', 'title': 'what - Wiktionary, the free dictionary', 'content': 'Middle English. Etymology 1. From Old English hwæt, from Proto-West Germanic *hwat, from Proto-Germanic *hwat, from Proto-Indo-European *kʷód.', 'score': 0.4431913, 'raw_content': None}, {'url': 'https://www.youtube.com/watch?v=ejc5zic4q2A', 'title': 'what. (Bo Burnham FULL SHOW HD) - YouTube', 'content': 'what. (Bo Burnham FULL SHOW HD)\nboburnham\n3800000 subscribers\n558914 likes\n26050161 views\n17 Dec 2013\nwhat. I hope you enjoy it.\nbuy the CD here: https://itunes.apple.com/us/album/what./id773753940\n\nor get my poetry book here: http://www.amazon.com/Egghead-Cant-Survive-Ideas-Alone/dp/1455519146\n\nThanks for watching!\n\n', 'score': 0.391802, 'raw_content': None}, {'url': 'https://www.reddit.com/r/RandomThoughts/comments/18hw88q/what_is_the_meaning_of_what/', 'title': "What is the meaning of 'what'? : r/RandomThoughts - Reddit", 'content': "What is the meaning of 'what'? Open menu Open navigationGo to Reddit Home Image 1: r/RandomThoughts icon Go to RandomThoughts Image 3: r/RandomThoughts iconr/RandomThoughtsImage 4: GoldStandardImage 5: GoldStandard What is the meaning of 'what'? What does 'WHAT' mean? Archived post. New comments cannot be posted and votes cannot be cast. Related Answers Section Related Answers Meaning of the word 'what' New to Reddit? Continue with Email Continue With Phone Number Image 6: GoldStandardImage 7: GoldStandard Anyone can view, post, and comment to this community Top Posts * Reddit reReddit: Top posts of December 14, 2023 * * * * Reddit reReddit: Top posts of December 2023 * * * * Reddit reReddit: Top posts of 2023 * * * Image 8", 'score': 0.23077278, 'raw_content': None}]
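The list printed under the AGENT REFLECTION banner is a raw Tavily result set: each entry carries the url, title, content, score, and raw_content fields of Tavily's response format. A call like the following would produce it; the exact query string and max_results value are assumptions (five results are shown above).

```python
import os
from tavily import TavilyClient

tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

# Reflection falls back to the web when the local context cannot support
# an answer; the five dictionaries above have exactly this shape.
response = tavily.search(query="What?", max_results=5)
web_results = response["results"]  # [{url, title, content, score, raw_content}, ...]
```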
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
CRITIQUE: {'Web_Response': "Based on Web Search performed on various online platforms, including https://www.merriam-webster.com/dictionary/what, https://www.dictionary.com/browse/what, https://en.wiktionary.org/wiki/what, https://www.youtube.com/watch?v=ejc5zic4q2A, and https://www.reddit.com/r/RandomThoughts/comments/18hw88q/what_is_the_meaning_of_what/, the term 'what' is a widely used interrogative word in the English language. According to Merriam-Webster, 'what' is used to express inquiry about the identity, nature, or value of an object or matter. It can also be used to express inquiry about the identity of a person. Additionally, 'what' can be used as an intensifier in exclamatory phrases, often followed by an indefinite article. Dictionary.com further explains that 'what' can be used in elliptical constructions to request a repetition of words or information not fully understood. The etymology of 'what' is rooted in Old English, with the word 'hwæt' being derived from Proto-West Germanic *hwat, from Proto-Germanic *hwat, from Proto-Indo-European *kʷód, as stated on Wiktionary. The usage of 'what' is not limited to formal language, as it is also commonly used in informal expressions, such as 'you know what' or 'what's what.' Overall, the meaning and usage of 'what' are multifaceted and context-dependent, reflecting the complexity and nuance of the English language."}
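The CRITIQUE line shows those web findings being condensed into a dictionary keyed by Web_Response via another Groq call. A sketch of such a critique step is below; the JSON-mode flag, the model id, and the prompt wording are assumptions rather than the article's exact code.

```python
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def critique(question: str, web_results: list[dict]) -> dict:
    # Condense the Tavily hits into a single cited paragraph.
    findings = "\n".join(f"- {r['url']}: {r['content'][:500]}" for r in web_results)
    chat = client.chat.completions.create(
        model="llama-3.3-70b-versatile",          # assumption
        response_format={"type": "json_object"},  # assumption: JSON mode
        messages=[
            {"role": "system", "content":
                'Synthesize the findings into JSON of the form '
                '{"Web_Response": "..."}, citing the source URLs inline.'},
            {"role": "user", "content": f"Question: {question}\n\nFindings:\n{findings}"},
        ],
    )
    return json.loads(chat.choices[0].message.content)
```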
Collection_Name:agentic_rag_system_architectures
INFO:httpx:HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.groq.com/openai/v1/chat/completions "HTTP/1.1 200 OK"
Response Generated: Query Planning Agentic RAG is a methodology designed to handle complex queries efficiently by leveraging multiple parallelizable subqueries across diverse data sources. It combines intelligent query division, distributed processing, and response synthesis to deliver accurate and comprehensive results. The Query Planner is the central component that interprets the query, generates prompts, decides which tools to invoke, and breaks down complex queries step by step to retrieve results.
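Read together, the two runs above trace the whole reflective loop: retrieve and generate against Typesense, grade the answer, detour through Tavily and a critique when the context cannot support it (the "what" query), and finish directly when it can (the Query Planning query). A minimal LangGraph wiring of that loop might look like the sketch below; the state fields, node names, and stubbed node bodies are assumptions standing in for the full implementations sketched earlier.

```python
from typing import List, TypedDict
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict):
    question: str
    generation: str
    web_results: List[dict]

def generate(state: RAGState) -> dict:
    # Stub for the guarded Typesense + Groq step sketched earlier.
    return {"generation": "I cannot answer based on the available context."}

def grade(state: RAGState) -> str:
    # Route on the guard phrase; a real grader would be another LLM call.
    return "needs_web" if "cannot answer" in state["generation"] else "useful"

def web_search(state: RAGState) -> dict:
    # Stub for the Tavily call sketched earlier.
    return {"web_results": []}

def critique(state: RAGState) -> dict:
    # Stub for the Web_Response synthesis sketched earlier.
    return {"generation": "Based on Web Search performed ..."}

graph = StateGraph(RAGState)
graph.add_node("generate", generate)
graph.add_node("web_search", web_search)
graph.add_node("critique", critique)
graph.add_edge(START, "generate")
graph.add_conditional_edges("generate", grade, {"useful": END, "needs_web": "web_search"})
graph.add_edge("web_search", "critique")
graph.add_edge("critique", END)

app = graph.compile()
print(app.invoke({"question": "What?"})["generation"])
```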

Conclusion:

Self-Reflective RAG represents a transformative evolution in AI knowledge systems, moving beyond simple retrieval to incorporate critical self-assessment and iterative refinement. When combined with LangExtract’s precise information extraction capabilities and Typesense’s high-performance search infrastructure, these technologies create a powerful framework for building accurate, verifiable, and trustworthy AI applications. Together, they address the fundamental challenges of factual accuracy, source transparency, and response quality that have long plagued conventional RAG systems, paving the way for more reliable and responsible AI implementations across industries.

Written by Plaban Nayak

Machine Learning and Deep Learning enthusiast