AI Document Search & Question Answering System (RAG)

Built a multimodal AI system for document, audio, and image understanding, with natural language question answering based on retrieval-augmented generation (RAG). The project spans architecture, implementation, and delivery.

Personal Projects | Year: 2026

Project Overview

Objective

Built a multimodal AI system for document, audio, and image understanding with natural language question answering using retrieval-augmented generation (RAG).

Stack

FastAPI, Next.js, PyPDFLoader, SentenceTransformers (bge-m3), Qdrant, Elasticsearch, BLIP, Typhoon OCR, Whisper, GPT-4o-mini, GPT-4.1, GPT-5

Delivery highlights

  • Developed a multimodal AI platform by extending and integrating existing systems, including AI Document Question Answering System with RAG and LLM, Multimodal Semantic Search, AI Meeting Transcription & Q&A, and Text-to-Image Semantic Search, into a unified architecture that enables cross-modal retrieval and context-aware reasoning across documents, audio, and images.
  • Designed and implemented RESTful APIs with FastAPI for file upload, background processing, indexing, and question-answering workflows.
  • Used PyMuPDF for document parsing, the Typhoon OCR API for extracting text from images and scanned PDFs, Whisper for speech-to-text transcription from audio, and BLIP for image captioning.
  • Applied text chunking and generated semantic embeddings with SentenceTransformers (BAAI/bge-m3), stored in Qdrant for vector similarity search.
  • Combined vector search with Elasticsearch keyword-based retrieval in a hybrid search system to improve retrieval accuracy and reduce hallucination.
  • Used selectable large language models (GPT-4o-mini, GPT-4.1, GPT-5) via LangChain to generate context-grounded answers with source attribution.
  • Supported the platform with scalable backend services, background job processing, and persistent storage.
  • Built a modern frontend with Next.js for file upload, semantic search, and interactive knowledge exploration across multiple data sources.
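The chunking and hybrid retrieval steps described above can be illustrated with a small, dependency-free sketch. It is not the project's actual code. The page does not say how chunks are sized or how the Qdrant and Elasticsearch results are merged, so the character-window chunker, the reciprocal rank fusion (RRF) merge, the function names chunk_text and reciprocal_rank_fusion, and the default values (size, overlap, k=60) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: the real system's chunking strategy is not documented.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(text):
            break
    return chunks


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Merge several ranked lists of chunk ids into one scored list.

    Each id receives 1 / (k + rank) from every list it appears in, so ids
    returned by both the vector and the keyword retriever rise to the top.
    RRF is an assumed fusion method, not one confirmed by the project page.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Stand-ins for ids returned by a vector store and a keyword index.
    vector_hits = ["c2", "c1", "c3"]
    keyword_hits = ["c1", "c4", "c2"]
    for doc_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
        print(doc_id, round(score, 5))
```

In this sketch, the fused list would be passed to the language model as context. Rank-based fusion avoids having to normalize cosine similarity scores against keyword relevance scores, which are on different scales.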

Related Projects

AI Document Question Answering System with RAG and LLM

Personal Projects | Year: 2026

Built a PDF upload and natural language QA system with retrieval-augmented generation.

Visual Question Answering System with YOLO, CLIP, ViT, BLIP, BLIP Caption, and LLM

Personal Projects | Year: 2026

Built an end-to-end VQA platform for image upload, scene understanding, and LLM-based answers.

AI Meeting Transcription, Summarization & Q&A System (RAG + LLM)

Personal Projects | Year: 2026

Built an end-to-end system for meeting transcription, summarization, and context-aware Q&A using Whisper, Qdrant, and LLMs, with FastAPI + React for real-time processing and interactive querying.