Multilingual Video Understanding and Event Summarization System (Thai-English Timeline Intelligence)

Project Overview

Objective

Built end-to-end multilingual video analysis with clip-level descriptions and bilingual summaries.

Stack

FastAPI, OpenCV, BLIP-2, FLAN-T5, NLLB-200, React, Tailwind CSS

Delivery highlights

Built an end-to-end AI video analysis system that accepts a video file as input and splits it into temporal segments according to a configurable Seconds Per Clip parameter (e.g., one segment every 4 seconds). The system samples frames at fixed intervals using OpenCV, generates an English description for each segment using BLIP-2, summarizes the overall video using FLAN-T5, and translates the outputs into Thai using NLLB-200. The result is a bilingual timeline and summary (description_en, description_th, summary_en, summary_th).

The system exposes its functionality through FastAPI, which handles video upload and returns structured JSON results. The frontend is built with React for user interaction and timeline visualization, and Tailwind CSS handles UI styling and layout.
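The segmentation step and the shape of the JSON result can be illustrated with a minimal sketch. It is not the project's actual code: the names `Clip`, `segment_video`, `sample_timestamp`, and `build_response`, the `timeline` key, and the choice of the clip midpoint as the sampled frame are all assumptions made for illustration. Only the four field names (description_en, description_th, summary_en, summary_th) come from the description above. The model calls (BLIP-2, FLAN-T5, NLLB-200) and the OpenCV frame read are left out so the example runs with the standard library alone.

```python
from dataclasses import dataclass, asdict
from typing import List
import json


@dataclass
class Clip:
    """One timeline entry (hypothetical structure)."""
    index: int
    start_s: float
    end_s: float
    description_en: str = ""
    description_th: str = ""


def segment_video(duration_s: float, seconds_per_clip: float) -> List[Clip]:
    """Split [0, duration_s) into consecutive clips of seconds_per_clip seconds.

    The final clip is shorter when the duration is not an exact multiple.
    """
    if duration_s <= 0 or seconds_per_clip <= 0:
        raise ValueError("duration_s and seconds_per_clip must be positive")
    clips = []
    index = 0
    while index * seconds_per_clip < duration_s:
        start = index * seconds_per_clip
        end = min(start + seconds_per_clip, duration_s)
        clips.append(Clip(index, start, end))
        index += 1
    return clips


def sample_timestamp(clip: Clip) -> float:
    """Return the clip midpoint as the frame to caption (an assumption)."""
    return (clip.start_s + clip.end_s) / 2


def build_response(clips: List[Clip], summary_en: str, summary_th: str) -> dict:
    """Assemble a JSON-serializable result; the 'timeline' key is hypothetical."""
    return {
        "timeline": [asdict(c) for c in clips],
        "summary_en": summary_en,
        "summary_th": summary_th,
    }


if __name__ == "__main__":
    clips = segment_video(duration_s=10.0, seconds_per_clip=4.0)
    for c in clips:
        # Placeholder text standing in for a captioning model's output.
        c.description_en = f"(caption for frame at {sample_timestamp(c):.1f}s)"
    print(json.dumps(build_response(clips, "", ""), indent=2))
```

With a 10-second video and 4 seconds per clip, this yields three clips covering 0-4 s, 4-8 s, and 8-10 s. In a fuller version, each sampled timestamp would be used to seek and read a frame with OpenCV before captioning, and the dictionary would be returned from the FastAPI upload endpoint.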



Personal Projects, Year: 2026

Built FastAPI semantic search over videos with clip indexing and multilingual query support.

Personal Projects, Year: 2026

Built text-to-video semantic scene retrieval with multilingual query processing.

Personal Projects, Year: 2026

Unified text-to-video and text-to-image search into one cross-modal retrieval platform.
