Multilingual Semantic Video Event and Action Search Engine and API | Projects

Project Overview

Objective

Built a FastAPI semantic search service over videos, with clip-level indexing and multilingual query support.

Stack

FastAPI, OpenCV, CLIP (ViT-B/32), PyTorch, FAISS

Delivery highlights

Built a FastAPI-based video semantic search system that supports natural language queries over video content. The pipeline segments each video into fixed-duration clips, extracts representative frames with OpenCV, generates semantic embeddings with CLIP (ViT-B/32) in PyTorch, and indexes them in FAISS for efficient cosine similarity search. At query time, the system accepts queries such as 'dog running in park' or 'car accident at intersection', translates them when necessary to improve embedding alignment, scores them against the stored clip embeddings, and groups matched segments by video. It returns structured JSON responses with video filenames, URLs, and precise timestamp intervals (start_time, end_time), so users can jump directly to the relevant moments without watching the entire video.
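The retrieval flow described above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the function names (clip_intervals, normalize, search), the 5-second clip length, the toy 3-dimensional vectors, and the "/videos/..." URL pattern are all assumptions made for illustration. In the described stack, the embeddings would come from CLIP and the nearest-neighbour lookup from FAISS; here plain Python stands in for both, relying on the fact that the inner product of L2-normalized vectors equals their cosine similarity. Query translation is left out.

```python
import math
from collections import defaultdict


def clip_intervals(duration, clip_len):
    """Split [0, duration] into fixed-length (start_time, end_time) clips."""
    count = math.ceil(duration / clip_len)
    return [(i * clip_len, min((i + 1) * clip_len, duration)) for i in range(count)]


def normalize(vec):
    """L2-normalize a vector so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query_vec, index, top_k=3):
    """Score every clip against the query, keep the top_k, group them by video."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, entry["embedding"])), entry)
        for entry in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    grouped = defaultdict(list)
    for score, entry in scored[:top_k]:
        grouped[entry["video"]].append({
            "start_time": entry["start_time"],
            "end_time": entry["end_time"],
            "score": round(score, 4),
        })
    # Hypothetical response shape; the URL pattern is an assumption.
    return [
        {"video": video, "url": f"/videos/{video}", "segments": segments}
        for video, segments in grouped.items()
    ]


# Toy index: one made-up 3-d vector per 5-second clip (stand-ins for CLIP embeddings).
toy_vectors = {
    "park.mp4": [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]],
    "city.mp4": [[0.0, 0.0, 1.0]],
}
index = []
for video, vectors in toy_vectors.items():
    intervals = clip_intervals(len(vectors) * 5.0, 5.0)
    for (start, end), vec in zip(intervals, vectors):
        index.append({
            "video": video,
            "start_time": start,
            "end_time": end,
            "embedding": normalize(vec),
        })

# A made-up query vector standing in for the CLIP text embedding of a query.
results = search([1.0, 0.1, 0.0], index, top_k=2)
```

With this toy data, both of the top two matches come from park.mp4, so the response contains a single video entry with two timestamped segments. Grouping after the top-k cut keeps the response compact when several adjacent clips of the same video match a query.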

3 items

Personal Projects | Year: 2026

Built text-to-video semantic scene retrieval with multilingual query processing.

Personal Projects | Year: 2026

Unified text-to-video and text-to-image search into one cross-modal retrieval platform.

Personal Projects | Year: 2026

Built end-to-end multilingual video analysis with clip-level descriptions and bilingual summaries.
