Project Overview
Objective
Built a two-stage pipeline for EV charger images: semantic retrieval followed by socket-type refinement.
Stack
CLIP, Custom CLIP, FAISS, YOLO
Delivery highlights
- This project extends two previous works, “Electric Vehicle Charger Socket Recognition and Semantic Retrieval Model using OpenCLIP (ViT-B/32)” and “Electric Vehicle Charger Socket Detection”, by combining semantic text understanding and object detection into a two-stage pipeline.
- Stage 1 (retrieval): a natural-language query such as “A red car plugged into an AC Type 2 charging port” is encoded using a pretrained CLIP model and matched against image embeddings through FAISS to retrieve broadly relevant images based on general visual features. Because the pretrained model does not strongly understand specific electric vehicle charger socket types, it may focus on obvious elements such as “a red car”, so the initial results might emphasize car color rather than the exact charger socket.
- Stage 2 (refinement), part 1: a custom-trained CLIP model specialized for electric vehicle charger sockets re-ranks the candidate images to prioritize those containing the correct plug type.
- Stage 2 (refinement), part 2: YOLO object detection is applied to the top-ranked results to explicitly detect and label charger types with bounding boxes and confidence scores.
- As a result, the final output not only matches the general scene but also identifies the specific electric vehicle charger socket type.
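
The control flow of the two-stage pipeline described above can be illustrated with a minimal, dependency-free sketch. This is not the project's actual code: the toy 2-D vectors stand in for CLIP embeddings, the brute-force cosine search stands in for a FAISS index lookup, and `fake_detector` stands in for a YOLO model. All names (`two_stage_search`, `top_k`, the image ids, and the detection values) are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# (label, (x1, y1, x2, y2), confidence), mimicking a detector's output
Detection = Tuple[str, Tuple[int, int, int, int], float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, embeddings: Dict[str, Vector], k: int) -> List[str]:
    """Brute-force nearest-neighbour search (stand-in for a FAISS index)."""
    ranked = sorted(embeddings, key=lambda i: cosine(query, embeddings[i]), reverse=True)
    return ranked[:k]


def two_stage_search(
    general_query: Vector,
    domain_query: Vector,
    general_index: Dict[str, Vector],
    domain_index: Dict[str, Vector],
    detect: Callable[[str], List[Detection]],
    k_retrieve: int,
    k_final: int,
) -> List[Tuple[str, List[Detection]]]:
    # Stage 1: broad retrieval with the general-purpose embeddings.
    candidates = top_k(general_query, general_index, k_retrieve)
    # Stage 2, part 1: re-rank only those candidates with the specialized embeddings.
    candidate_vectors = {i: domain_index[i] for i in candidates}
    reranked = top_k(domain_query, candidate_vectors, k_final)
    # Stage 2, part 2: run the detector on the top-ranked images.
    return [(image_id, detect(image_id)) for image_id in reranked]


# Toy general embeddings (assumed axes: "red car", "some charger visible").
general_index = {
    "red_car_ccs": [0.9, 0.3],
    "red_car_type2": [0.85, 0.35],
    "blue_car_type2": [0.1, 0.5],
}
# Toy specialized embeddings (assumed axes: "AC Type 2", "CCS").
domain_index = {
    "red_car_ccs": [0.1, 0.9],
    "red_car_type2": [0.9, 0.1],
    "blue_car_type2": [0.95, 0.05],
}


def fake_detector(image_id: str) -> List[Detection]:
    """Hypothetical detector stub returning made-up boxes and scores."""
    label = "AC Type 2" if image_id.endswith("type2") else "CCS"
    return [(label, (40, 60, 120, 140), 0.9)]


general_query = [1.0, 0.2]  # query as seen by the general model: mostly "red car"
domain_query = [1.0, 0.0]   # query as seen by the specialized model: "AC Type 2"

stage1 = top_k(general_query, general_index, 2)
results = two_stage_search(
    general_query, domain_query, general_index, domain_index,
    fake_detector, k_retrieve=2, k_final=1,
)
print(stage1)
print(results)
```

In this toy setup, Stage 1 ranks the red car with the wrong socket first because the general embeddings are dominated by car color, and the Stage 2 re-ranking then promotes the red car with the Type 2 socket. In a real system, the two dictionaries would be replaced by embeddings from the pretrained and custom-trained CLIP models, and `top_k` by a FAISS index search.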