Retrieval Enhanced Generative Medical Question & Answering

Plaban Nayak
8 min read · Mar 19, 2023


In general, when performing a question answering task, we get answers that pertain to the context in which the question was asked. However, if we ask more advanced or domain-specific questions to LLMs like COHERE xlarge or google/Flan-t5-xl, these models do not perform well.

So, in order to accurately and efficiently answer questions posed by users based on a large corpus of textual information, we can take two approaches :-

  1. Finetuning LLMs
  2. Retrieval Augmented generation (retrieval-based and generative-based approach).

What is finetuning ?

  • type of transfer learning
  • use a large pretrained model and teach it a new trick with a few hundred or few thousand samples
  • here we are only teaching a new task by tweaking final output

What is retrieval-based approach / semantic search ?

  • search based on meaning of the context
  • next generation database

The above two concepts are very different. Finetuning a Q&A model is like using a hammer to drive a screw through a board placed on your knee. We could do it, but we will surely regret it.

So finetuning teaches the model a new task rather than new knowledge. For the model to learn new knowledge we would have to unfreeze all of the model weights, which is prohibitively expensive, and even then it would not get rid of hallucination, as in the case of ChatGPT.

ChatGPT was finetuned to follow a conversational pattern; no epistemological work was done on it. LLMs on their own do not have a theory of knowledge or mind.

Finetuning vs Semantic Search


Finetuning

  • slow, difficult, expensive
  • prone to hallucination / confabulation
  • teaches a new task, not new information
  • requires constant retraining
  • not scalable
  • does not work for the Question and Answer task

Semantic Search

  • fast, easy, cheap
  • recalls exact information
  • adding new information is easy
  • infinitely scalable
  • solves half of Question and Answer task
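
As a concrete picture of "search based on meaning", here is a minimal sketch in pure Python. The three-dimensional vectors and the names `toy_index` and `toy_query` are made up for illustration; real embeddings come from a model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings: one vector per document.
toy_index = {
    "vertigo": [0.9, 0.1, 0.0],
    "migraine": [0.6, 0.6, 0.1],
    "sprained ankle": [0.0, 0.1, 0.9],
}

def search(query_vector, index, top_k=2):
    """Return the names of the top_k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Stands in for an embedded "feeling dizzy" query.
toy_query = [1.0, 0.2, 0.0]
top_matches = search(toy_query, toy_index)
print(top_matches)
```

Adding new information in this picture only means adding another vector to the index; no model weights change.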

In order to overcome the shortcomings of the finetuning approach we will apply Retrieval Enhanced Generative Question Answering (REGQA).

What is the main use of Retrieval Enhanced Generative Question answering ?

Retrieval Enhanced Generative Question Answering (REGQA) is a type of question answering system that combines the strengths of both retrieval-based and generative-based models.

The main use of REGQA is to accurately and efficiently answer questions posed by users based on a large corpus of textual information. It works by first retrieving relevant information from the corpus and then using a generative model to generate an answer to the question based on the retrieved information.
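
The retrieve-then-generate flow described above can be sketched as follows. This is an illustration under assumptions, not the implementation used later in this article: word overlap stands in for vector search, the generative call is left out, and `retrieve`, `build_prompt` and `toy_corpus` are hypothetical names.

```python
import re

def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, corpus, top_k=1):
    """Rank passages by word overlap with the question (stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, passages):
    """Put the retrieved passages in front of the question for a generative model."""
    context = "\n".join(passages)
    return ("Answer the question using only the context.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

toy_corpus = [
    "Dizziness can be caused by inner ear problems or low blood pressure.",
    "A sprained ankle is treated with rest and ice.",
]
question = "What can cause dizziness?"
passages = retrieve(question, toy_corpus)
prompt = build_prompt(question, passages)
print(prompt)
```

The resulting `prompt` is what would be sent to the generative model, so the answer is grounded in the retrieved passage rather than in the model's memory.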

REGQA systems are particularly useful for tasks such as customer support, where users may have specific questions about a product or service and need a quick and accurate response. They can also be used in educational settings to help students quickly find answers to questions related to a particular topic.

Overall, the main goal of REGQA is to provide accurate and informative answers to questions posed by users, while minimizing the amount of time and effort required to find those answers.

What is the advantage of Retrieval Enhanced Generative Question Answering ?

The advantage of Retrieval Enhanced Generative Question Answering (REGQA) over general question answering using Language Models (LMs) or Transformers is that it combines the strengths of both retrieval-based and generative-based models.

Retrieval-based models are good at identifying and retrieving relevant information from a large corpus of text, while generative-based models are good at generating natural language responses to questions. By combining these two approaches, REGQA systems can produce more accurate and informative answers than LM or Transformer-based models alone.

Another advantage of REGQA is that it can improve the efficiency of the question answering process. Retrieval-based models can quickly identify the most relevant information from a large corpus, which can then be used to generate an answer using a generative-based model. This can be faster than having to generate a response from scratch using a generative model alone.

In addition, REGQA can be more robust to noisy or incomplete input data. Because it relies on retrieving relevant information from a corpus, it can still provide a reasonable answer even if the input data is not complete or accurate. This is particularly useful in real-world scenarios where data can be messy or incomplete.

Overall, the advantage of REGQA is that it combines the strengths of both retrieval-based and generative-based models to provide more accurate, efficient, and robust question answering.

Implementation :

In order to implement Retrieval Enhanced Generative Question Answering we will use the following technology stack

  1. Pinecone : Pinecone is a cloud-based vector database designed for building and deploying machine learning applications that require real-time similarity search. The primary use of the Pinecone database is to store high-dimensional vectors (i.e., numerical representations of data points) and enable fast and efficient similarity search operations on those vectors.
  2. OpenAI Embedding : converts text into its embeddings.
  3. OpenAI ChatGPT API for summarization
  4. Dataset : diseases data from

Install required Libraries

!pip install -qU openai pinecone-client
!pip install -qU transformers

Initialize Pinecone

import pinecone
from tqdm.autonotebook import tqdm

index_name = "openai-medical-symp-text"
pinecone.init(api_key="<<Your Pinecone API Key>>",
              environment="us-west1-gcp")

Initialize OpenAI

import openai
openai.api_key = "Your OpenAI API Key"

Data preprocessing

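import re
import numpy as np
import pandas as pd

url = ""
df = pd.read_csv(url, encoding='utf-8', encoding_errors='ignore')

# Assumption: the original cleanup patterns are not shown in this article.
# These placeholder values collapse runs of whitespace into a single space.
to_replace = r'\s+'
replace_with = ' '

#print(df.text.replace(to_replace, replace_with, regex=True))
df['diagnosis'] = df['diagnosis'].replace(to_replace, replace_with, regex=True)
df['Symptoms'] = df['Symptoms'].replace(to_replace, replace_with, regex=True)
# keep only the columns needed for indexing
new = df[['Disease', 'link', 'Symptoms']].copy()

Processed Data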

Build a helper function to summarize the disease symptoms

This applies wherever the length of the symptoms text exceeds 10,000 characters. It prevents the error that the Pinecone database raises during upsert when the metadata size exceeds the limit of 10240 bytes per vector.
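
Because the limit is stated in bytes while the check is on characters, it can help to measure a metadata record directly. The sketch below is an approximation under an assumption: it treats the size as the UTF-8 length of the JSON-serialized metadata, which may not match exactly how Pinecone counts it. The helper names and example records are hypothetical.

```python
import json

METADATA_LIMIT_BYTES = 10240  # per-vector limit quoted in the text above

def metadata_size_bytes(meta):
    """Approximate size: UTF-8 length of the JSON-serialized metadata."""
    return len(json.dumps(meta, ensure_ascii=False).encode("utf-8"))

def fits_metadata_limit(meta, limit=METADATA_LIMIT_BYTES):
    """True when the approximate size is within the limit."""
    return metadata_size_bytes(meta) <= limit

# Hypothetical records shaped like the metadata built later in this article.
small = {"text": "mild dizziness", "ds": "Vertigo",
         "url": "https://example.com/vertigo"}
large = {"text": "x" * 20000, "ds": "Vertigo",
         "url": "https://example.com/vertigo"}

print(fits_metadata_limit(small), fits_metadata_limit(large))
```

Non-ASCII characters take more than one byte in UTF-8, so a text can be under 10,000 characters and still come close to the byte limit.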

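from transformers import pipeline
import textwrap

# default Hugging Face summarization pipeline
summary = pipeline('summarization')

def reduce_text(x):
    # only texts longer than 10,000 characters are summarized
    if len(x) > 10000:
        final_list = []
        # split the text into chunks of at most 1000 characters
        chunks = textwrap.wrap(x, 1000)
        for chunk in chunks:
            # assumed step: summarize each chunk and collect the result
            final_list.append(summary(chunk)[0]['summary_text'])
        final_summary = " ".join(final_list)
        return final_summary
    return x

new['Symptoms'] = new['Symptoms'].map(reduce_text)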
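
The chunk-then-summarize pattern can be tried without downloading a model by passing the summarizer in as a function. This is a sketch under assumptions: `summarize_in_chunks` and `first_words` are hypothetical names, and `first_words` is a trivial stand-in for a real summarizer.

```python
import textwrap

def summarize_in_chunks(text, summarize, max_chars=10000, chunk_chars=1000):
    """Return text unchanged if short; otherwise summarize it chunk by chunk."""
    if len(text) <= max_chars:
        return text
    # textwrap.wrap breaks on whitespace, so each chunk is at most chunk_chars long
    chunks = textwrap.wrap(text, chunk_chars)
    return " ".join(summarize(chunk) for chunk in chunks)

def first_words(chunk, n=5):
    """Hypothetical stand-in summarizer: keep only the first n words."""
    return " ".join(chunk.split()[:n])

long_text = "word " * 3000   # 15,000 characters, above the threshold
short_text = "brief symptoms"

reduced = summarize_in_chunks(long_text, first_words)
print(len(long_text), len(reduced))
```

Swapping `first_words` for a function that calls the summarization pipeline gives the same behaviour as `reduce_text`.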

Create Embeddings using OpenAI

MODEL = "text-embedding-ada-002"
res = openai.Embedding.create(
    input=new['Symptoms'].values.tolist()[:3],
    engine=MODEL
)
embeds = [record['embedding'] for record in res['data']]
# length of each of the three embeddings: (1536, 1536, 1536)
pinecone_dimension = len(embeds[0])

  • The embeddings have the same dimensionality for every input.

Create Pinecone index if not present

if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=pinecone_dimension, metric='cosine')

  • After the index is created successfully, we can check for it in Pinecone.

Connect to index and view index status

index = pinecone.Index(index_name)
# view index statistics
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

Populate the vector database with embeddings built by OpenAI text-embedding-ada-002

from tqdm.auto import tqdm
import datetime
from time import sleep

batch_size = 32
for i in tqdm(range(0, len(new), batch_size)):
    # find end of each batch
    i_end = min(len(new), i + batch_size)
    # get batch ids
    batch_ids = [str(n) for n in range(i, i_end)]
    # get batch of symptom text
    text_batch = new['Symptoms'].values.tolist()[i:i_end]
    # get batch of source links
    link_batch = new['link'].values.tolist()[i:i_end]
    # get batch of disease names
    dis_batch = new['Disease'].values.tolist()[i:i_end]
    # create embeddings
    res = openai.Embedding.create(input=text_batch, engine=MODEL)
    embeds = [record['embedding'] for record in res['data']]
    # prepare metadata and upsert batch
    meta = [{'text': line, 'ds': desc, 'url': link}
            for line, desc, link in zip(text_batch, dis_batch, link_batch)]
    to_upsert = zip(batch_ids, embeds, meta)
    # upsert to pinecone
    index.upsert(vectors=list(to_upsert))
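
The index arithmetic of the batching loop can be checked in isolation. The helper name `batch_ranges` is hypothetical; the batch size of 32 and the total of 1106 vectors are the values that appear in this article.

```python
def batch_ranges(total, batch_size):
    """Yield (start, end) pairs that cover range(total) in steps of batch_size."""
    for start in range(0, total, batch_size):
        yield start, min(total, start + batch_size)

# A small example: 70 items in batches of 32 gives two full batches and a short one.
ranges = list(batch_ranges(70, 32))
ids = [[str(n) for n in range(start, end)] for start, end in ranges]

# Number of embedding and upsert calls needed for 1106 rows.
num_batches = len(list(batch_ranges(1106, 32)))
print(ranges, num_batches)
```

The `min(...)` on the end index is what keeps the last, shorter batch from running past the end of the data.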

View index status after insert of vector embeddings in the vector database.

{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {'': {'vector_count': 1106}},
'total_vector_count': 1106}

Ask Questions or Query

Convert Search query into embeddings using OpenAI

query = "experiencing dizziness"
res = openai.Embedding.create(input=[query],engine=MODEL)
xq = res['data'][0]['embedding']
#Search for the query vector in the Vector Database and return top 3 matches
res = index.query(xq,top_k=3, include_metadata=True)
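
The query result holds a list of matches, each with an id, a similarity score and the metadata stored at upsert time. The sketch below reads those fields from a hand-written dictionary; `mock_response` and its values are made up for illustration and only mimic the shape of the response, and `list_matches` is a hypothetical helper.

```python
# Hypothetical response shaped like the result of index.query(...).
mock_response = {
    "matches": [
        {"id": "12", "score": 0.91,
         "metadata": {"ds": "Dizziness", "url": "https://example.com/dizziness"}},
        {"id": "48", "score": 0.87,
         "metadata": {"ds": "Vertigo", "url": "https://example.com/vertigo"}},
        {"id": "7", "score": 0.62,
         "metadata": {"ds": "Anemia", "url": "https://example.com/anemia"}},
    ]
}

def list_matches(response, min_score=0.0):
    """Return (disease, score, url) for every match at or above min_score."""
    return [(m["metadata"]["ds"], m["score"], m["metadata"]["url"])
            for m in response["matches"]
            if m["score"] >= min_score]

strong_matches = list_matches(mock_response, min_score=0.8)
for disease, score, url in strong_matches:
    print(f"{score:.2f}  {disease}  {url}")
```

Filtering on the score is optional; it is one way to avoid passing weakly related pages to the summarization step.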

Retrieve the url returned as a part of the data and use the OpenAI ChatGPT API to summarize the content in the url.


# url of the best match
urls = res['matches'][0]['metadata']['url']
messages = [{"role": "system",
             "content": "You are a helpful healthcare assistant."}]
prompt = """Give an extremely engaging and detailed summary based on the context in the below url.

url : <<URL>>
"""
prompt = prompt.replace("<<URL>>", urls)
# assumed step: send the prompt as the user message
messages.append({"role": "user", "content": prompt})

def retrive_summary(messages):
    max_retry = 5
    retry = 0
    while True:
        try:
            chat = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                                messages=messages)
            reply = chat.choices[0].message.content
            return reply
        except Exception as oops:
            retry += 1
            if retry >= max_retry:
                return "Accessing the Completion service error: %s" % oops
            print('Error communicating with OpenAI:', oops)
            # assumed pause before the next attempt
            sleep(1)

print(retrive_summary(messages))

The Mayo Clinic website provides a comprehensive overview of dizziness, including its symptoms, causes, and treatment options. Dizziness is a common condition that can be caused by a variety of factors, including inner ear problems, low blood pressure, medication side effects, and anxiety. The symptoms of dizziness can range from mild lightheadedness to severe vertigo, and can be accompanied by other symptoms such as nausea, vomiting, and difficulty walking.

The website provides detailed information on the various causes of dizziness, including how they affect the body and what symptoms they may produce. It also offers advice on how to prevent dizziness, such as avoiding sudden changes in position, staying hydrated, and managing stress.

In addition, the website provides a range of treatment options for dizziness, including medication, physical therapy, and lifestyle changes. It also offers advice on when to seek medical attention for dizziness, as well as what to expect during a medical evaluation.

Overall, the Mayo Clinic website provides a wealth of information on dizziness that is both informative and easy to understand. It is a valuable resource for anyone experiencing dizziness or seeking to learn more about this common condition.
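
The retry loop in `retrive_summary` can be written as a reusable helper. This is a sketch, not the code used above: `call_with_retry` and `Flaky` are hypothetical names, the exponential backoff is an added design choice, and unlike the original loop it re-raises the last error instead of returning it as a string.

```python
import time

def call_with_retry(func, max_retry=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay, 2*base_delay, ... and try again."""
    for attempt in range(1, max_retry + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retry:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))

class Flaky:
    """Test double that fails a fixed number of times before succeeding."""
    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("temporary failure")
        return "ok"

delays = []                      # record the waits instead of sleeping
flaky = Flaky(failures=2)
result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)
```

With the OpenAI call wrapped in a function that takes no arguments, the same helper could replace the `while True` loop.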

Delete Pinecone Index

When you create an index, it runs as a service until you delete it. Users are billed for running indexes, so we recommend you delete any indexes you’re not using. This will minimize your costs.
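
A guarded delete can be sketched as below. The helper `delete_index_if_present` is a hypothetical name; it only assumes a client object that offers `list_indexes()` and `delete_index(name)`, and `FakeClient` stands in for that client so the logic can be run without an account.

```python
def delete_index_if_present(client, name):
    """Delete index `name` if it exists; return True when a delete was issued."""
    if name in client.list_indexes():
        client.delete_index(name)
        return True
    return False

class FakeClient:
    """In-memory stand-in for a vector database client."""
    def __init__(self, names):
        self.names = list(names)

    def list_indexes(self):
        return list(self.names)

    def delete_index(self, name):
        self.names.remove(name)

fake = FakeClient(["openai-medical-symp-text"])
first = delete_index_if_present(fake, "openai-medical-symp-text")
second = delete_index_if_present(fake, "openai-medical-symp-text")
print(first, second)
```

With the module-level client initialized earlier in this article, the call would be `delete_index_if_present(pinecone, index_name)`.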


