Electric Vehicle Charger Socket Semantic Visual Search System with YOLO, CLIP, and FAISS

Project Overview

Objective

Built two-stage semantic retrieval and socket-type refinement for EV charger images.

Stack

CLIP, Custom CLIP, FAISS, YOLO

Delivery highlights

This project extends two previous works, “Electric Vehicle Charger Socket Recognition and Semantic Retrieval Model using OpenCLIP (ViT-B/32)” and “Electric Vehicle Charger Socket Detection”, by combining semantic text understanding and object detection into a two-stage pipeline.

In Stage 1, a natural-language query such as “A red car plugged into an AC Type 2 charging port” is encoded with a pretrained CLIP model and matched against image embeddings through FAISS to retrieve broadly relevant images based on general visual features. Because the pretrained model does not strongly understand specific Electric Vehicle Charger Socket types, it may focus on obvious elements like “a red car”, so the initial results might emphasize car color rather than the exact charger socket.

In Stage 2, refinement is performed in two parts. First, a custom-trained CLIP model specialized for Electric Vehicle Charger Sockets re-ranks the candidate images to prioritize those containing the correct plug type. Second, YOLO object detection is applied to the top-ranked results to explicitly detect and label charger types with bounding boxes and confidence scores. The final output therefore not only matches the general scene but also identifies the specific Electric Vehicle Charger Socket type.
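The control flow of the two stages can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses only the Python standard library, an exact cosine-similarity search stands in for the FAISS index, and hand-written toy vectors stand in for the pretrained and custom CLIP embeddings. The image ids, embedding axes, socket labels other than AC Type 2, detection records, and the `MIN_CONFIDENCE` threshold are all hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_by_similarity(query_vec, image_vecs, candidate_ids, k):
    """Return up to k candidate ids ordered by cosine similarity to the query."""
    q = normalize(query_vec)
    scored = []
    for image_id in candidate_ids:
        v = normalize(image_vecs[image_id])
        scored.append((sum(a * b for a, b in zip(q, v)), image_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:k]]


# Hypothetical "general" embeddings (stand-in for pretrained CLIP).
# Axes: (red car, blue car, charger visible).
general_vecs = {
    0: [1.0, 0.0, 0.4],  # red car, CCS2 plug
    1: [0.8, 0.1, 0.5],  # red car, AC Type 2 plug
    2: [0.0, 1.0, 0.4],  # blue car, AC Type 2 plug
    3: [0.0, 0.3, 0.1],  # distant blue car, CHAdeMO plug
}

# Hypothetical "socket-aware" embeddings (stand-in for the custom CLIP model).
# Axes: (AC Type 2, CCS2, CHAdeMO).
socket_vecs = {
    0: [0.1, 0.9, 0.0],
    1: [0.9, 0.1, 0.0],
    2: [0.8, 0.0, 0.2],
    3: [0.0, 0.1, 0.9],
}

# Hypothetical detector output per image (stand-in for YOLO):
# label, confidence, and bounding box as (x1, y1, x2, y2).
detections = {
    0: [{"label": "CCS2", "confidence": 0.91, "box": (120, 80, 220, 190)}],
    1: [{"label": "AC Type 2", "confidence": 0.88, "box": (140, 95, 230, 200)}],
    2: [{"label": "AC Type 2", "confidence": 0.79, "box": (60, 40, 150, 160)},
        {"label": "CCS2", "confidence": 0.22, "box": (300, 50, 340, 90)}],
    3: [{"label": "CHAdeMO", "confidence": 0.85, "box": (10, 10, 90, 110)}],
}

MIN_CONFIDENCE = 0.5  # assumed threshold for keeping a detection

# Query "A red car plugged into an AC Type 2 charging port", encoded twice.
query_general = [1.0, 0.0, 0.4]  # dominated by "red car"
query_socket = [1.0, 0.0, 0.0]   # dominated by "AC Type 2"

# Stage 1: broad retrieval; the red car with the wrong plug ranks first.
stage1 = rank_by_similarity(query_general, general_vecs, list(general_vecs), k=3)

# Stage 2, part 1: re-rank only the Stage 1 candidates with socket-aware vectors.
reranked = rank_by_similarity(query_socket, socket_vecs, stage1, k=3)

# Stage 2, part 2: attach confident detections to the top-ranked results.
final = [
    {
        "image_id": image_id,
        "detections": [d for d in detections[image_id]
                       if d["confidence"] >= MIN_CONFIDENCE],
    }
    for image_id in reranked[:2]
]

print("stage 1:", stage1)
print("re-ranked:", reranked)
for result in final:
    print(result)
```

With these toy vectors, Stage 1 puts the red car with the CCS2 plug first, and the socket-aware re-ranking moves both AC Type 2 images ahead of it. In a real system the two query vectors would come from the respective text encoders, and the exact search would be replaced by an index lookup.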

Related Projects

Visual Question Answering System with YOLO, CLIP, ViT, BLIP, BLIP Caption, and LLM

Personal Projects | Year: 2026

Built end-to-end VQA platform for image upload, scene understanding, and LLM-based answers.

Multimodal Semantic Search Chat System (FastAPI, Qdrant, CLIP, BLIP, Typhoon OCR)

Personal Projects | Year: 2026

Built a multimodal chat system leveraging vector embeddings and cross-modal retrieval for semantic search across text and images with multilingual support.

Electric Vehicle Charger Socket Recognition and Semantic Retrieval Model using OpenCLIP (ViT-B/32)

Personal Projects | Year: 2026

Built OpenCLIP-based recognition and text-image retrieval for five EV socket classes.
