Real-Time Object Detection Web Application Using YOLOv8

Technologies Used

Python, YOLOv8, OpenCV, NumPy, Computer Vision

Real-Time Object Detection Web Application Using YOLOv8 and Streamlit

This project showcases a real-time object detection web application built using the YOLO (You Only Look Once) deep learning model and deployed with an interactive Streamlit interface.

The primary goal of this project was to build an end-to-end computer vision solution capable of detecting objects in images, video files, and live webcam streams, all within a clean, user-friendly web interface.

Using the Ultralytics YOLOv8 model, the application runs detection on each input and draws bounding boxes, class labels, and confidence scores on the output. The system is tuned for real-time inference and requires no complex setup from the user.

Key Features

Real-time object detection using YOLOv8
Supports image, video, and live webcam input
Dynamic bounding box rendering with confidence scores
Cached model loading for performance optimization
Clean and interactive Streamlit-based UI
Lightweight deployment-ready architecture

Model Loading with Resource Caching

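import streamlit as st
from ultralytics import YOLO

@st.cache_resource
def load_yolo_v11():
    # Despite the "v11" in the name, this loads the YOLOv8x weights
    model = YOLO("yolov8x.pt")
    return model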

The YOLO model is loaded once and cached using Streamlit’s resource caching to improve performance and prevent redundant model loading during interaction.
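Streamlit reruns the script on each user interaction, which is why the load-once behavior matters. The idea can be illustrated without Streamlit: the sketch below uses `functools.lru_cache` from the standard library as a stand-in for `st.cache_resource`. `load_model`, `FakeModel`, and `load_calls` are hypothetical names for illustration, and no real weights are loaded.

```python
from functools import lru_cache

load_calls = 0  # counts how many times the expensive load actually runs


class FakeModel:
    """Hypothetical stand-in for a loaded YOLO model."""

    def __init__(self, weights_path):
        self.weights_path = weights_path


@lru_cache(maxsize=None)
def load_model(weights_path):
    """Build the model on the first call; later calls return the cached object."""
    global load_calls
    load_calls += 1
    return FakeModel(weights_path)


first = load_model("yolov8x.pt")
second = load_model("yolov8x.pt")  # served from the cache, no second load
```

Both calls return the same object, and the load body runs only once.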

Object Detection Pipeline

def detect_objects_v11(img, model):
    results = model.predict(img)
    return results

This function runs inference with the YOLOv8 model and returns a list of result objects, one per input image, each holding the detected bounding boxes, class IDs, and confidence scores.
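As a rough illustration of working with those results, the sketch below pairs boxes, class IDs, and confidences into plain records. It assumes the per-box values have already been converted to Python lists (for example with `.tolist()` on the result's box tensors) and that `names` maps class IDs to class names. `Detection`, `to_detections`, and `min_conf` are hypothetical names, and the input values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple  # (x1, y1, x2, y2) in integer pixel coordinates
    label: str
    confidence: float


def to_detections(xyxy_rows, class_ids, confidences, names, min_conf=0.0):
    """Pair up per-box values into Detection records, skipping low-confidence boxes."""
    detections = []
    for row, class_id, conf in zip(xyxy_rows, class_ids, confidences):
        if conf < min_conf:
            continue
        box = tuple(int(v) for v in row)  # truncate float coordinates to pixels
        detections.append(Detection(box, names[int(class_id)], float(conf)))
    return detections


# Example values only, not real model output
names = {0: "person", 16: "dog"}
found = to_detections(
    [[10.6, 20.2, 110.9, 220.0], [5.0, 5.0, 50.0, 50.0]],
    [0.0, 16.0],
    [0.91, 0.30],
    names,
    min_conf=0.5,
)
```

With `min_conf=0.5`, only the first box survives the filter.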

Drawing Bounding Boxes and Labels

# boxes is assumed to be results[0].boxes; annotated_frame is a copy of the input frame
for box in boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    label_id = int(box.cls)
    conf = float(box.conf)
    label = model.names[label_id]  # class ID -> class name

    cv2.rectangle(annotated_frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(annotated_frame, f'{label} {conf:.2f}',
                (x1 + 5, y1 + 25),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.9, (0, 0, 0), 2)

The application dynamically renders bounding boxes and overlays class labels with confidence scores for each detected object.
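The label text and its position can be factored into small helpers that are testable without OpenCV. This is a minimal sketch, not the project's code: `format_label` and `label_origin` are hypothetical names, and clamping the baseline to the frame height is an added assumption for boxes near the bottom edge.

```python
def format_label(names, class_id, conf):
    """Build the overlay text: class name plus confidence to two decimals."""
    return f"{names[class_id]} {conf:.2f}"


def label_origin(x1, y1, frame_h, dx=5, dy=25):
    """Return the text origin just inside the box's top-left corner,
    clamped so the baseline never falls below the frame."""
    return x1 + dx, min(y1 + dy, frame_h - 1)


names = {0: "person"}
text = format_label(names, 0, 0.8734)
origin = label_origin(100, 40, frame_h=480)
near_bottom = label_origin(100, 470, frame_h=480)
```

The returned values could then be passed as the text and origin arguments of `cv2.putText`.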

Input Flexibility

Users can choose between:

Uploading an image
Uploading a video file
Using a live webcam feed

Each input type is routed through the same detection pipeline, so detections are produced and rendered consistently regardless of the source.
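One way to picture the shared pipeline is to treat every source as a stream of frames. The sketch below is an assumption about the structure, not the project's actual code: `single_image`, `run_pipeline`, and `fake_detect` are hypothetical names, and strings stand in for image arrays. In a real app the frames would presumably come from an uploaded file or from `cv2.VideoCapture`.

```python
def single_image(img):
    """Wrap one image so it looks like a one-frame stream."""
    yield img


def run_pipeline(frames, detect):
    """Apply the same detect callable to every frame, whatever the source."""
    for frame in frames:
        yield frame, detect(frame)


def fake_detect(frame):
    # Hypothetical detector: just reports which frame it was given
    return f"detections for {frame}"


image_out = list(run_pipeline(single_image("img0"), fake_detect))
video_out = list(run_pipeline(["f0", "f1", "f2"], fake_detect))
```

Because `run_pipeline` accepts any iterable, a still image, a decoded video, and a webcam loop differ only in how the frames are produced.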
