Project Overview
Objective
Built a FastAPI semantic search service over videos, with clip-level indexing and multilingual query support.
Stack
FastAPI, OpenCV, CLIP (ViT-B/32), PyTorch, FAISS
Delivery highlights
- Built a FastAPI-based video semantic search system that answers natural language queries over video content, such as 'dog running in park' or 'car accident at intersection'.
- Indexing: segments videos into fixed-duration clips, extracts representative frames with OpenCV, generates semantic embeddings using CLIP (ViT-B/32) in PyTorch, and indexes them in FAISS for efficient cosine similarity search.
- Querying: performs multilingual translation of the query when necessary to improve embedding alignment, then matches it against the stored clip embeddings using similarity scoring.
- Results: groups matched segments by video and returns structured JSON responses with video filenames, URLs, and precise timestamp intervals (start_time, end_time), allowing direct navigation to relevant moments without watching the entire video.
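
The flow described above (fixed-duration clips, similarity scoring, grouping by video, JSON output with start_time and end_time) can be sketched in a few lines. This is a minimal illustration, not the project's actual code: small hand-written vectors stand in for CLIP embeddings, a brute-force loop stands in for the FAISS index, and the names `clip_intervals`, `build_index`, `search`, the `/videos/` URL prefix, and the `segments` and `score` fields are all hypothetical. With FAISS, cosine similarity is commonly obtained by L2-normalizing vectors and using an inner-product index; whether the project does this is an assumption.

```python
import json
import math
from collections import defaultdict


def clip_intervals(duration, clip_len):
    """Split [0, duration] into fixed-length (start_time, end_time) intervals."""
    count = math.ceil(duration / clip_len)
    return [(i * clip_len, min((i + 1) * clip_len, duration)) for i in range(count)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(video, duration, clip_len, embeddings):
    """Pair each clip interval with its embedding (toy stand-in for a FAISS index)."""
    return [
        {"video": video, "start_time": start, "end_time": end, "embedding": emb}
        for (start, end), emb in zip(clip_intervals(duration, clip_len), embeddings)
    ]


def search(query_vec, index, top_k=3, base_url="/videos/"):
    """Return the top_k most similar clips, grouped by video."""
    scored = sorted(
        ((cosine(query_vec, entry["embedding"]), entry) for entry in index),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    grouped = defaultdict(list)
    for score, entry in scored:
        grouped[entry["video"]].append(
            {
                "start_time": entry["start_time"],
                "end_time": entry["end_time"],
                "score": round(score, 4),
            }
        )
    return [
        {
            "filename": video,
            "url": base_url + video,
            "segments": sorted(segments, key=lambda s: s["start_time"]),
        }
        for video, segments in grouped.items()
    ]


# Toy data: two 10-second videos split into 5-second clips.
index = build_index("park.mp4", 10.0, 5.0, [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
index += build_index("street.mp4", 10.0, 5.0, [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]])

# Pretend this vector is the embedding of the text query.
results = search([1.0, 0.0, 0.0], index, top_k=2)
print(json.dumps(results, indent=2))
```

In a real deployment the toy vectors would come from the CLIP image and text encoders, and the sorted scan would be replaced by an index lookup; the grouping and response shape would stay much the same.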