Project Overview
Objective
Built an end-to-end multilingual video analysis system that produces clip-level descriptions and bilingual (English and Thai) summaries.
Stack
FastAPI, OpenCV, BLIP-2, FLAN-T5, NLLB-200, React, Tailwind CSS
Delivery highlights
- Built an end-to-end AI video analysis system that accepts a video file as input and splits it into temporal segments according to a configurable Seconds Per Clip parameter (e.g., one segment every 4 seconds).
- The pipeline samples frames at fixed intervals with OpenCV, generates segment-level descriptions in English with BLIP-2, summarizes the overall video with FLAN-T5, and translates the outputs into Thai with NLLB-200, producing bilingual timeline descriptions and summaries (description_en, description_th, summary_en, summary_th).
- FastAPI exposes this functionality, handling video upload and returning structured JSON results. The React frontend provides user interaction and timeline visualization, with Tailwind CSS used for UI styling and layout.
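The segmentation step can be illustrated with a minimal sketch. It is not the project's actual code: the names `ClipWindow`, `segment_video`, and `timeline_entry` are hypothetical, as are the `start_sec` and `end_sec` keys, and sampling one frame at the midpoint of each clip is an assumption about how "fixed intervals" might be realized. Only `description_en` and `description_th` come from the output fields listed above. In a real pipeline the frame count and frame rate would likely be read from OpenCV (`cv2.CAP_PROP_FRAME_COUNT`, `cv2.CAP_PROP_FPS`); here they are passed in directly so the sketch runs without a video file.

```python
import math
from dataclasses import dataclass


@dataclass
class ClipWindow:
    """One temporal segment of the video (hypothetical structure)."""
    index: int
    start_sec: float
    end_sec: float
    frame_index: int  # frame sampled to represent this clip


def segment_video(total_frames: int, fps: float, seconds_per_clip: float) -> list[ClipWindow]:
    """Split a video into fixed-length clips and pick one frame per clip.

    Assumption: the representative frame is the one at the clip midpoint.
    """
    if total_frames <= 0 or fps <= 0 or seconds_per_clip <= 0:
        raise ValueError("total_frames, fps and seconds_per_clip must be positive")

    duration = total_frames / fps
    clips: list[ClipWindow] = []
    for i in range(math.ceil(duration / seconds_per_clip)):
        start = i * seconds_per_clip
        if start >= duration:  # guard against floating-point overshoot
            break
        end = min(start + seconds_per_clip, duration)  # last clip may be shorter
        midpoint = (start + end) / 2
        frame_index = min(int(midpoint * fps), total_frames - 1)
        clips.append(ClipWindow(i, start, end, frame_index))
    return clips


def timeline_entry(clip: ClipWindow, description_en: str, description_th: str) -> dict:
    """Build one bilingual timeline record; the time keys are hypothetical."""
    return {
        "start_sec": clip.start_sec,
        "end_sec": clip.end_sec,
        "description_en": description_en,
        "description_th": description_th,
    }


# A 10-second video at 30 fps, cut every 4 seconds, yields three clips.
clips = segment_video(total_frames=300, fps=30.0, seconds_per_clip=4.0)
```

With these inputs the last clip covers only the remaining 2 seconds (8 s to 10 s), which is one way to handle a duration that is not a multiple of Seconds Per Clip; dropping or merging the short tail would be equally reasonable alternatives.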