🏥 Building an AI-Powered Medical Document Analyzer and Validator

11 min read · Mar 9, 2025

🤖 A Modern Approach to Medical Document Processing

Medical documentation is a critical component of healthcare, but analyzing and validating these documents can be time-consuming and complex. Let’s explore our solution! 🚀

🎯 The Challenge

Healthcare professionals face several challenges when dealing with medical documents:

- ⏰ Time-consuming manual review processes

- ❌ Potential for human error in document analysis

- ✅ Need for consistent validation of diagnoses and treatments

- 🔍 Difficulty in quickly extracting key information

- 📝 Requirement for accurate summarization

💡 Solution: The Medical Document Analyzer

We’ve developed a modern web application that uses advanced AI models to analyze, summarize, and validate medical documents. The application focuses on:

- 🎨 User-friendly interface

- ⚡ Fast processing

- 🎯 Accurate analysis

- ✨ Comprehensive validation

- 📊 Clear presentation of results

🏗️ Technical Architecture

The application is built using a modern tech stack:

🖥️ Frontend

- 🚀 Framework: FastAPI for the backend API

- 🎨 UI: Tailwind CSS + DaisyUI for styling

- ⚡ Interactivity: Alpine.js for reactive components

- 📝 Markdown Rendering: Marked.js for formatted output

⚙️ Backend

🧠 AI Models:

- 🤖 DeepSeek-R1-Distill-Llama-70B for analysis

- 🔄 Mixtral-8x7b for summarization

- 👁️ Mistral OCR for PDF text extraction

- 📊 LangGraph for workflow orchestration

🔄 The Analysis Pipeline

The application processes documents through four key stages:

1. 📄 Document Extraction

- 👁️ OCR processing for PDFs

- ✨ Text extraction and cleaning

- 📋 Format standardization

2. 🔍 Medical Analysis

- 📅 Identification of key dates

- 🏥 Extraction of facility information

- 👨‍⚕️ Healthcare provider details

- 👤 Patient information analysis

- 💊 Medication documentation

3. 📝 Summary Generation

- 🎯 Key findings identification

- 🏷️ Diagnosis compilation

- 💉 Treatment plan summarization

- ⭐ Critical observations highlighting

4. ✅ Diagnosis Validation

- 🔄 Symptom-diagnosis alignment check

- 📊 Treatment appropriateness evaluation

- 💊 Medication review

- ⚠️ Risk assessment

- 💡 Alternative treatment suggestions
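
The four stages above can be pictured as a chain of functions that each read a shared state and add one field to it. The sketch below only illustrates that data flow: the stages are stubs standing in for the OCR and LLM calls, and the names `PipelineState`, `STAGES`, and `run_pipeline` are hypothetical, not part of the application.

```python
from typing import Callable, List, TypedDict


class PipelineState(TypedDict, total=False):
    """Shared state; each stage fills in one more field."""
    file_name: str
    context: str
    analysis_result: str
    summary: str
    validation_result: str


def extract(state: PipelineState) -> PipelineState:
    # Stub: a real stage would run OCR on state["file_name"].
    return {**state, "context": f"text of {state['file_name']}"}


def analyze(state: PipelineState) -> PipelineState:
    # Stub: a real stage would send the extracted text to an LLM.
    return {**state, "analysis_result": f"analysis({state['context']})"}


def summarize(state: PipelineState) -> PipelineState:
    # The summary is built from the analysis, not from the raw text.
    return {**state, "summary": f"summary({state['analysis_result']})"}


def validate(state: PipelineState) -> PipelineState:
    # Validation sees both the analysis and the summary.
    combined = f"{state['analysis_result']} + {state['summary']}"
    return {**state, "validation_result": f"validation({combined})"}


# Stages run strictly in this order; each depends on the previous one.
STAGES: List[Callable[[PipelineState], PipelineState]] = [
    extract, analyze, summarize, validate,
]


def run_pipeline(file_name: str) -> PipelineState:
    state: PipelineState = {"file_name": file_name}
    for stage in STAGES:
        state = stage(state)
    return state
```

Because every stage only adds to the state, the final result still carries the intermediate outputs, which is what lets the UI show analysis, summary, and validation side by side.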

✨ Key Features

1. 🔍 Intelligent Document Processing

- 📄 Support for PDF documents

- 👁️ Advanced OCR capabilities

- 📋 Structured information extraction

2. 📊 Comprehensive Analysis

- 🔍 Detailed medical information extraction

- 📝 Structured formatting of findings

- 🎯 Clear presentation of key details

3. 📝 Smart Summarization

- 💡 Concise yet comprehensive summaries

- 📊 Hierarchical information organization

- ⭐ Focus on critical elements

4. ✅ Validation System

- 🔄 Treatment-diagnosis alignment check

- 💊 Medication appropriateness review

- 💡 Alternative treatment suggestions

- ⚠️ Risk factor identification

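🔒 Security and Privacy

The application is built with security in mind:

- 🔐 Secure file handling

- 🧹 Temporary file cleanup: a cleanup endpoint deletes uploaded PDFs older than 24 hours

- 🚫 No permanent storage of sensitive data: uploads stay in a data directory only until cleanup runs

- 💻 Local processing capabilities (the code below calls the hosted Groq and Mistral APIs; `langchain-ollama` is included for running models locally)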
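
One way to make the temporary-file guarantee explicit is to tie the file's lifetime to the processing call itself. The sketch below is a standard-library illustration of that idea, not the application's code: `process_upload` and its `handler` parameter are hypothetical names, and `handler` stands in for whatever function analyzes the saved PDF.

```python
import os
import tempfile
from typing import Callable


def process_upload(data: bytes, handler: Callable[[str], str]) -> str:
    """Write data to a temp file, run handler on its path, then always delete it."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return handler(path)
    finally:
        # Runs on success and on error, so the upload never lingers on disk.
        if os.path.exists(path):
            os.remove(path)
```

With this shape there is no window in which a failed request leaves a patient document behind, and no separate cleanup job is needed for the normal path.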

🚀 Code Implementation

Folder Structure

Install required dependencies (requirements.txt)

langgraph
langchain
langchain-ollama
langchain-groq
mistralai
python-dotenv
fastapi
python-multipart
uvicorn
jinja2
graphviz

Setup the API Keys (.env)

GROQ_API_KEY='Your API KEY'
MISTRAL_API_KEY='Your API KEY'
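
Since both keys are needed before any document can be processed, it can help to check for them at startup rather than discovering a missing key on the first upload. The following is a minimal sketch of such a check; `missing_keys` and `require_api_keys` are hypothetical helpers, and the key names are the two shown above.

```python
import os
from typing import List, Mapping, Sequence

REQUIRED_KEYS = ("GROQ_API_KEY", "MISTRAL_API_KEY")


def missing_keys(env: Mapping[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the names of required keys that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def require_api_keys(env: Mapping[str, str] = os.environ) -> None:
    """Raise early, with a readable message, if any key is missing."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
```

Calling `require_api_keys()` right after the `.env` file has been loaded would stop the server from starting with an incomplete configuration.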

main.py

from typing import Dict, Any
import base64
import os
from pathlib import Path

from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_groq import ChatGroq
from langchain_ollama import ChatOllama  # not used below; available for running a local model
from langgraph.graph import StateGraph, START, END
from mistralai import Mistral
from typing_extensions import TypedDict

# Load GROQ_API_KEY and MISTRAL_API_KEY from .env
load_dotenv()

# Mistral client, used for OCR
client = Mistral(api_key=os.getenv("MISTRAL_API_KEY"))

# Initialize the Groq-hosted chat models
summary_llm = ChatGroq(
    model_name="mixtral-8x7b-32768",
    temperature=0
)
analyzer_llm = ChatGroq(
    model_name="DeepSeek-R1-Distill-Llama-70B",
    temperature=0.6
)


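def extracttpdf(pdf_name):
    """Run Mistral OCR on a PDF and return its text as markdown."""
    # Upload the PDF to Mistral; the file is closed once the upload returns
    with open(pdf_name, "rb") as pdf_file:
        uploaded_pdf = client.files.upload(
            file={
                "file_name": Path(pdf_name).name,
                "content": pdf_file,
            },
            purpose="ocr"
        )
    # Get a signed URL that the OCR endpoint can read from
    signed_url = client.files.get_signed_url(file_id=uploaded_pdf.id)
    # Run OCR on the uploaded document
    ocr_response = client.ocr.process(
        model="mistral-ocr-latest",
        document={
            "type": "document_url",
            "document_url": signed_url.url,
        }
    )
    # Join the per-page markdown into a single string
    text = "\n\n".join([page.markdown for page in ocr_response.pages])
    return text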

# State passed between the graph nodes
class MedicalAnalysisState(TypedDict):
    file_name: str
    context: str
    analysis_result: str
    summary: str
    validation_result: str


def create_medical_analysis_chain():
    # Define the nodes (agents) in our graph
    def extract_context(state: MedicalAnalysisState):
        print("----------------------------------------------------")
        print("-----------Extracting context from PDF--------------")
        print("----------------------------------------------------")
        pdf_name = state['file_name']
        text = extracttpdf(pdf_name)
        state["context"] = text
        return state

    def analyze_document(state: MedicalAnalysisState):
        print("----------------------------------------------------")
        print("------------Analyzing context from PDF--------------")
        print("----------------------------------------------------")
        document_content = state["context"]

        # Use the Groq-hosted analyzer model for medical analysis
        messages = [
            SystemMessage(content="""You are a medical document analyzer. Extract key information and format it in markdown with the following sections:

### Date of Incident
- Specify the date when the medical incident occurred

### Medical Facility
- Name of the medical center/hospital
- Location details

### Healthcare Providers
- Primary physician
- Other medical staff involved

### Patient Information
- Chief complaints
- Vital signs
- Relevant medical history

### Medications
- Current medications
- New prescriptions
- Dosage information

Please ensure the response is well-formatted in markdown with appropriate headers and bullet points."""),
            HumanMessage(content=document_content)
        ]
        response = analyzer_llm.invoke(messages)

        # DeepSeek-R1 puts its reasoning inside <think>...</think>; keep only the answer
        state["analysis_result"] = response.content.split("</think>")[-1].strip()
        return state

    def generate_summary(state: MedicalAnalysisState):
        print("----------------------------------------------------")
        print("------------Generating summary from PDF-------------")
        print("----------------------------------------------------")
        analysis_result = state["analysis_result"]

        messages = [
            SystemMessage(content="""You are a medical report summarizer. Create a detailed summary in markdown format with the following sections:

### Key Findings
- Main medical issues identified
- Critical observations

### Diagnosis
- Primary diagnosis
- Secondary conditions (if any)

### Treatment Plan
- Recommended procedures
- Medications prescribed
- Follow-up instructions

### Additional Notes
- Important considerations
- Special instructions

Please ensure proper markdown formatting with headers, bullet points, and emphasis where appropriate."""),
            HumanMessage(content=f"Generate a detailed medical summary report based on this analysis: {analysis_result}")
        ]
        response = summary_llm.invoke(messages)

        state["summary"] = response.content
        return state

    def validate_diagnosis(state: MedicalAnalysisState):
        print("----------------------------------------------------")
        print("------------Validating diagnosis from PDF-----------")
        print("----------------------------------------------------")
        analysis_result = state["analysis_result"]
        summary = state["summary"]

        messages = [
            SystemMessage(content="""You are a medical diagnosis validator. Provide your assessment in markdown format with these sections:

### Alignment Analysis
- Evaluate if diagnosis matches symptoms
- Assess treatment appropriateness
- Review medication selections

### Recommendations
- Alternative treatments to consider
- Suggested medication adjustments
- Additional tests if needed

### Risk Assessment
- Potential complications
- Drug interaction concerns
- Follow-up recommendations

Please format your response in clear markdown with appropriate headers and bullet points."""),
            HumanMessage(content=f"""Analysis: {analysis_result}\nSummary: {summary}
Based on the analysis and summary provided, state whether the diagnosis, treatment, and medication are in alignment with the medical complaint.
If they are not in alignment, specify what treatment and medication would have been more appropriate.
""")
        ]
        response = analyzer_llm.invoke(messages)

        # Drop the model's <think> block, as in analyze_document
        state["validation_result"] = response.content.split("</think>")[-1].strip()
        return state

    # Create the graph
    workflow = StateGraph(MedicalAnalysisState)

    # Add nodes
    workflow.add_node("extractor", extract_context)
    workflow.add_node("analyzer", analyze_document)
    workflow.add_node("summarizer", generate_summary)
    workflow.add_node("validator", validate_diagnosis)

    # Define edges: a linear flow from extraction to validation
    workflow.add_edge(START, "extractor")
    workflow.add_edge("extractor", "analyzer")
    workflow.add_edge("analyzer", "summarizer")
    workflow.add_edge("summarizer", "validator")
    workflow.add_edge("validator", END)

    # Compile the graph
    chain = workflow.compile()

    # Generate graph visualization as a base64-encoded PNG for the web page
    graph_png = chain.get_graph().draw_mermaid_png()
    graph_base64 = base64.b64encode(graph_png).decode('utf-8')

    return chain, graph_base64

def process_medical_document(document_path: str) -> Dict[str, Any]:
    """Run the full pipeline on one PDF and return the three markdown reports."""
    # Create the chain and get graph visualization
    chain, graph_viz = create_medical_analysis_chain()
    print(f"Document path: {document_path}")
    # Process the document; the extractor node reads the PDF through OCR
    result = chain.invoke({"file_name": document_path})

    return {
        "analysis": result["analysis_result"],
        "summary": result["summary"],
        "validation": result["validation_result"],
        "graph": graph_viz
    }
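
The analyzer and validator nodes keep only the text after `</think>` because DeepSeek-R1 style models emit their reasoning inside a `<think>...</think>` block before the final answer. A slightly more defensive version of that step might look like the sketch below; `strip_reasoning` is a hypothetical helper, not part of the code above.

```python
def strip_reasoning(text: str, close_tag: str = "</think>") -> str:
    """Return the text after the last close_tag, or the whole text if the tag is absent."""
    _, found, answer = text.rpartition(close_tag)
    # rpartition returns an empty separator when close_tag does not occur.
    return (answer if found else text).strip()
```

Splitting on the last closing tag means a response with no reasoning block passes through unchanged, and stray whitespace around the answer does not reach the markdown renderer.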

FastAPI application (api.py)

from fastapi import FastAPI, UploadFile, File, Request
from fastapi.responses import JSONResponse, HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
import shutil
from pathlib import Path
from datetime import datetime
from .main import process_medical_document, create_medical_analysis_chain

app = FastAPI()

# Create required directories if they don't exist
static_dir = Path("medical_analyzer/static")
data_dir = Path("medical_analyzer/data")
for directory in [static_dir, data_dir]:
    directory.mkdir(parents=True, exist_ok=True)

# Mount static files
app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")

# Setup templates
templates = Jinja2Templates(directory="medical_analyzer/templates")

@app.get("/", response_class=HTMLResponse)
async def home(request: Request):
    # Generate the graph visualization when loading the page
    _, graph_base64 = create_medical_analysis_chain()
    return templates.TemplateResponse(
        "index.html",
        {
            "request": request,
            "graph_base64": graph_base64
        }
    )

@app.post("/analyze-medical-document")
async def analyze_document(file: UploadFile = File(...)):
    try:
        # Validate file extension (filename may be missing on some clients)
        if not (file.filename or "").lower().endswith('.pdf'):
            return JSONResponse(
                status_code=400,
                content={"status": "error", "message": "Only PDF files are supported"}
            )

        # Create a unique filename to avoid overwrites; keep only the base
        # name so a crafted filename cannot point outside data_dir
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        safe_filename = f"{timestamp}_{Path(file.filename).name}"
        file_path = data_dir / safe_filename

        # Save the file
        with open(file_path, "wb") as buffer:
            shutil.copyfileobj(file.file, buffer)

        # Process the document
        result = process_medical_document(str(file_path))

        return JSONResponse(content={
            "status": "success",
            "analysis": result["analysis"],
            "summary": result["summary"],
            "validation": result["validation"],
            "graph": result["graph"]
        })
    except ValueError as e:
        return JSONResponse(
            status_code=400,
            content={"status": "error", "message": str(e)}
        )
    except Exception as e:
        print(f"Error processing document: {str(e)}")  # For debugging
        return JSONResponse(
            status_code=500,
            content={"status": "error", "message": "An error occurred while processing the document"}
        )

# Optional: Cleanup endpoint for maintenance
@app.delete("/cleanup")
async def cleanup_old_files():
    """Clean up files older than 24 hours"""
    try:
        current_time = datetime.now()
        for file_path in data_dir.glob("*.pdf"):
            file_age = current_time - datetime.fromtimestamp(file_path.stat().st_mtime)
            if file_age.days >= 1:  # Files older than 24 hours
                file_path.unlink()
        return JSONResponse(content={"status": "success", "message": "Cleanup completed"})
    except Exception as e:
        return JSONResponse(
            status_code=500,
            content={"status": "error", "message": f"Cleanup failed: {str(e)}"}
        )
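
The upload endpoint builds the stored filename from a timestamp and the client-supplied name. Two uploads of the same name within one second would collide, and a client-supplied name can contain path separators. The sketch below shows one possible way to derive a storage name that avoids both; `safe_upload_name` is a hypothetical helper and the `upload.pdf` fallback is an assumption.

```python
import re
import uuid
from pathlib import PurePosixPath, PureWindowsPath


def safe_upload_name(filename: str) -> str:
    """Build a unique storage name that cannot contain directory parts."""
    # Drop directory components written in either POSIX or Windows style.
    base = PureWindowsPath(PurePosixPath(filename).name).name
    # Keep letters, digits, dot, dash and underscore; replace everything else.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    # A random prefix keeps simultaneous uploads of the same file apart.
    return f"{uuid.uuid4().hex}_{cleaned or 'upload.pdf'}"
```

The result can be joined onto the data directory directly, since it never contains `/`, `\`, or a leading dot.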

index.html

<!DOCTYPE html>
<html lang="en" data-theme="light">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Medical Document Analyzer</title>
<link href="https://cdn.jsdelivr.net/npm/daisyui@4.7.2/dist/full.min.css" rel="stylesheet" type="text/css" />
<script src="https://cdn.tailwindcss.com"></script>
<script defer src="https://unpkg.com/alpinejs@3.x.x/dist/cdn.min.js"></script>
<script>
tailwind.config = {
theme: {
extend: {
fontFamily: {
sans: ['Poppins', 'sans-serif'],
},
},
},
daisyui: {
themes: ["light", "dark", "cupcake", "corporate"],
},
}
</script>
<link href="https://fonts.googleapis.com/css2?family=Poppins:wght@300;400;500;600;700&display=swap" rel="stylesheet">
<!-- Add Marked.js for Markdown rendering -->
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<!-- Add GitHub Markdown CSS -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/github-markdown-css/5.2.0/github-markdown.min.css">

<style>
/* Custom styles for markdown content */
.markdown-body {
box-sizing: border-box;
min-width: 200px;
max-width: 100%;
padding: 1rem;
}

.markdown-body h1,
.markdown-body h2,
.markdown-body h3 {
border-bottom: 1px solid var(--border-color);
padding-bottom: 0.3em;
}

[data-theme="dark"] .markdown-body {
color: #c9d1d9;
background-color: transparent;
}

[data-theme="dark"] .markdown-body h1,
[data-theme="dark"] .markdown-body h2,
[data-theme="dark"] .markdown-body h3 {
border-bottom-color: #30363d;
}
</style>
</head>
<body x-data="{
isUploading: false,
result: null,
errorMessage: null,
isDragging: false,
theme: 'light',
toggleTheme() {
this.theme = this.theme === 'light' ? 'dark' : 'light';
document.documentElement.setAttribute('data-theme', this.theme);
},
async handleSubmit(event) {
this.isUploading = true;
this.result = null;
this.errorMessage = null;

const formData = new FormData(event.target);
try {
const response = await fetch('/analyze-medical-document', {
method: 'POST',
body: formData
});
const data = await response.json();
if (data.status === 'success') {
this.result = data;
document.getElementById('results').scrollIntoView({ behavior: 'smooth' });
} else {
this.errorMessage = data.message;
}
} catch (error) {
this.errorMessage = 'An error occurred while processing the document';
}
this.isUploading = false;
},
// Add markdown rendering function
renderMarkdown(text) {
if (!text) return '';
return marked.parse(text);
}
}">
<!-- Navbar -->
<div class="navbar bg-base-100 shadow-lg">
<div class="navbar-start">
<a class="btn btn-ghost text-xl">MedDoc Analyzer</a>
</div>
<div class="navbar-end">
<label class="swap swap-rotate btn btn-ghost btn-circle">
<input type="checkbox" @change="toggleTheme" />
<svg class="swap-on fill-current w-5 h-5" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M5.64,17l-.71.71a1,1,0,0,0,0,1.41,1,1,0,0,0,1.41,0l.71-.71A1,1,0,0,0,5.64,17ZM5,12a1,1,0,0,0-1-1H3a1,1,0,0,0,0,2H4A1,1,0,0,0,5,12Zm7-7a1,1,0,0,0,1-1V3a1,1,0,0,0-2,0V4A1,1,0,0,0,12,5ZM5.64,7.05a1,1,0,0,0,.7.29,1,1,0,0,0,.71-.29,1,1,0,0,0,0-1.41l-.71-.71A1,1,0,0,0,4.93,6.34Zm12,.29a1,1,0,0,0,.7-.29l.71-.71a1,1,0,1,0-1.41-1.41L17,5.64a1,1,0,0,0,0,1.41A1,1,0,0,0,17.66,7.34ZM21,11H20a1,1,0,0,0,0,2h1a1,1,0,0,0,0-2Zm-9,8a1,1,0,0,0-1,1v1a1,1,0,0,0,2,0V20A1,1,0,0,0,12,19ZM18.36,17A1,1,0,0,0,17,18.36l.71.71a1,1,0,0,0,1.41,0,1,1,0,0,0,0-1.41ZM12,6.5A5.5,5.5,0,1,0,17.5,12,5.51,5.51,0,0,0,12,6.5Zm0,9A3.5,3.5,0,1,1,15.5,12,3.5,3.5,0,0,1,12,15.5Z"/></svg>
<svg class="swap-off fill-current w-5 h-5" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M21.64,13a1,1,0,0,0-1.05-.14,8.05,8.05,0,0,1-3.37.73A8.15,8.15,0,0,1,9.08,5.49a8.59,8.59,0,0,1,.25-2A1,1,0,0,0,8,2.36,10.14,10.14,0,1,0,22,14.05,1,1,0,0,0,21.64,13Zm-9.5,6.69A8.14,8.14,0,0,1,7.08,5.22v.27A10.15,10.15,0,0,0,17.22,15.63a9.79,9.79,0,0,0,2.1-.22A8.11,8.11,0,0,1,12.14,19.73Z"/></svg>
</label>
</div>
</div>

<!-- Hero Section -->
<div class="hero min-h-[40vh] bg-base-200">
<div class="hero-content text-center">
<div class="max-w-md">
<h1 class="text-5xl font-bold">Medical Document Analyzer</h1>
<p class="py-6">Upload your medical documents for instant AI-powered analysis, summary, and validation.</p>
<button class="btn btn-primary" onclick="document.getElementById('upload-section').scrollIntoView({behavior: 'smooth'})">Get Started</button>
</div>
</div>
</div>

<!-- Main Content -->
<div class="container mx-auto px-4 py-8">
<!-- Upload Section -->
<div id="upload-section" class="max-w-xl mx-auto">
<div class="card bg-base-100 shadow-xl">
<div class="card-body">
<h2 class="card-title">Upload Document</h2>
<form @submit.prevent="handleSubmit">
<div class="form-control w-full"
@dragover.prevent="isDragging = true"
@dragleave.prevent="isDragging = false"
@drop.prevent="isDragging = false">
<label class="label">
<span class="label-text">Choose a file or drag it here</span>
</label>
<div class="border-2 border-dashed rounded-lg p-8 text-center transition-all duration-200"
:class="{'border-primary bg-primary/5': isDragging}">
<input type="file" name="file" accept=".pdf" class="file-input file-input-bordered w-full max-w-xs" required />
<p class="mt-2 text-sm text-base-content/70">Supported format: PDF</p>
</div>
</div>
<button type="submit" class="btn btn-primary w-full mt-4" :disabled="isUploading">
<span x-show="!isUploading">Analyze Document</span>
<span x-show="isUploading" class="loading loading-spinner"></span>
<span x-show="isUploading">Processing...</span>
</button>
</form>
</div>
</div>
</div>

<!-- Error Alert -->
<div x-show="errorMessage"
x-transition
class="alert alert-error max-w-xl mx-auto mt-8">
<svg xmlns="http://www.w3.org/2000/svg" class="stroke-current shrink-0 h-6 w-6" fill="none" viewBox="0 0 24 24"><path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M10 14l2-2m0 0l2-2m-2 2l-2-2m2 2l2 2m7-2a9 9 0 11-18 0 9 9 0 0118 0z" /></svg>
<span x-text="errorMessage"></span>
</div>

<!-- Results Section -->
<div id="results" x-show="result"
x-transition:enter="transition ease-out duration-300"
x-transition:enter-start="opacity-0 transform -translate-y-4"
x-transition:enter-end="opacity-100 transform translate-y-0"
class="mt-8 space-y-8 max-w-4xl mx-auto">

<!-- Analysis -->
<div class="card bg-base-100 shadow-xl">
<div class="card-body">
<h2 class="card-title text-primary">Document Analysis</h2>
<div class="divider"></div>
<div class="markdown-body" x-html="renderMarkdown(`## Analysis Results\n\n${result?.analysis}`)"></div>
</div>
</div>

<!-- Summary -->
<div class="card bg-base-100 shadow-xl">
<div class="card-body">
<h2 class="card-title text-secondary">Medical Summary</h2>
<div class="divider"></div>
<div class="markdown-body" x-html="renderMarkdown(`## Medical Summary Report\n\n${result?.summary}`)"></div>
</div>
</div>

<!-- Validation -->
<div class="card bg-base-100 shadow-xl">
<div class="card-body">
<h2 class="card-title text-accent">Diagnosis Validation</h2>
<div class="divider"></div>
<div class="markdown-body" x-html="renderMarkdown(`## Validation Results\n\n${result?.validation}`)"></div>
</div>
</div>
</div>
</div>

<!-- Footer -->
<footer class="footer footer-center p-10 bg-base-300 text-base-content">
<aside>
<p>Copyright © 2024 - All rights reserved by Medical Document Analyzer</p>
</aside>
</footer>
</body>
</html>

UI

🚀 Future Enhancements

We’re planning several improvements:

1. 📁 Support for more document formats

2. 🔄 Enhanced validation algorithms

3. 🔌 Integration with medical databases

4. ⚙️ Customizable analysis parameters

5. 📤 Export capabilities for reports

🎉 Conclusion

The Medical Document Analyzer represents a significant step forward in automating medical document processing. By combining modern web technologies with advanced AI models, we’ve created a tool that can significantly improve the efficiency of medical document analysis while maintaining accuracy and reliability.

The application demonstrates how AI can be practically applied in healthcare settings to reduce manual workload and improve document processing accuracy. As we continue to develop and refine the system, we expect it to become an increasingly valuable tool for healthcare professionals and administrators.

Written by Plaban Nayak

Machine Learning and Deep Learning enthusiast
