Phasuwut

Full Stack · AI Engineer · Thailand

© 2026 Phasuwut Chunnapiya

Multimodal Semantic Retrieval (Video and Image Search)

Unified text-to-video and text-to-image search into one cross-modal retrieval platform. Natural-language queries are matched against video keyframes and images in a shared CLIP embedding space, indexed with FAISS and served through a FastAPI backend with a React.js frontend.

Personal Projects · Year: 2026

Project Overview

Objective

Unified text-to-video and text-to-image search into one cross-modal retrieval platform.

Stack

CLIP (ViT-B/32) · FAISS · FastAPI · React.js · Tailwind CSS

Delivery highlights

  • Extended and integrated two previous projects (Text-to-Video Semantic Search and Text-to-Image Semantic Search) into a unified multimodal semantic retrieval platform that searches across both videos and images using natural-language queries.
  • Used CLIP (ViT-B/32), developed by OpenAI, to embed text, video keyframes, and images in the same vector space, enabling cross-modal semantic similarity search with FAISS and accurate timestamp alignment for video playback.
  • Added a Thai → English translation preprocessing step to improve embedding alignment and retrieval accuracy, since CLIP performs more effectively with English text inputs.
  • Developed RESTful APIs with FastAPI that return structured JSON responses containing media_id, timestamp (for videos), similarity score, and media URL.
  • Built a responsive frontend with React.js and Tailwind CSS for real-time result visualization.
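
The retrieval flow in the highlights above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's code: the 3-dimensional vectors stand in for CLIP embeddings (ViT-B/32 produces 512-dimensional ones), the brute-force loop stands in for a FAISS index, and the names IndexedItem, search, ITEMS, and the example URLs are all hypothetical. Only the response fields (media_id, timestamp, score, media_url) come from the description above.

```python
import json
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class IndexedItem:
    """One searchable entry: an image, or a single keyframe of a video."""
    media_id: str
    media_url: str
    embedding: Sequence[float]          # placeholder for a CLIP embedding
    timestamp: Optional[float] = None   # keyframe time in seconds; None for images


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query_embedding: Sequence[float],
           items: Sequence[IndexedItem],
           top_k: int = 3) -> List[Dict[str, object]]:
    """Rank items by cosine similarity to the query and return JSON-ready dicts.

    Assumption: a brute-force scan replaces the FAISS lookup. Images and video
    keyframes share one list because they share one embedding space.
    """
    q = normalize(query_embedding)
    scored = []
    for item in items:
        v = normalize(item.embedding)
        score = sum(a * b for a, b in zip(q, v))
        scored.append((score, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    return [
        {
            "media_id": item.media_id,
            "timestamp": item.timestamp,   # lets a player seek to the matching scene
            "score": round(score, 4),
            "media_url": item.media_url,
        }
        for score, item in scored[:top_k]
    ]


# Toy index: two keyframes from one video and one image (all values made up).
ITEMS = [
    IndexedItem("vid_001", "https://example.com/vid_001.mp4", [0.9, 0.1, 0.0], 12.5),
    IndexedItem("img_007", "https://example.com/img_007.jpg", [0.0, 1.0, 0.0]),
    IndexedItem("vid_001", "https://example.com/vid_001.mp4", [0.1, 0.1, 0.9], 40.0),
]

if __name__ == "__main__":
    # In the described pipeline the query text would first be translated
    # (Thai → English) and embedded with CLIP; here the vector is hand-written.
    print(json.dumps(search([1.0, 0.0, 0.0], ITEMS, top_k=2), indent=2))
```

Storing one entry per keyframe, each carrying its own timestamp, is one simple way to get timestamp alignment: the best-matching keyframe directly tells the frontend where to start playback. Image entries leave timestamp as None, which serializes to null in the JSON response.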

Project Videos

1 item

Demo Video

Related Projects

3 items

Text-to-Video Semantic Search

Personal Projects · Year: 2026

Built text-to-video semantic scene retrieval with multilingual query processing.

Text-to-Image Semantic Search Module

Personal Projects · Year: 2026

Built text-to-image semantic search using CLIP shared embedding space and FAISS indexing.

Multilingual Semantic Video Event and Action Search Engine and API

Personal Projects · Year: 2026

Built FastAPI semantic search over videos with clip indexing and multilingual query support.