Technologies Used

Python, YOLOv8, OpenCV, NumPy, Computer Vision

Real-Time Object Detection Web Application Using YOLOv8 and Streamlit

This project showcases a real-time object detection web application built using the YOLO (You Only Look Once) deep learning model and deployed with an interactive Streamlit interface.

The primary goal of this project was to build an end-to-end computer vision solution that detects objects in images, video files, and live webcam streams, all within a single, user-friendly web interface.

Using the Ultralytics YOLOv8 model, the application detects objects and draws bounding boxes, class labels, and confidence scores on each image or frame. It is designed for real-time inference, and users interact with it entirely through the web interface, without complex setup on their side.


Key Features

  • Real-time object detection using YOLOv8
  • Supports image, video, and live webcam input
  • Dynamic bounding box rendering with confidence scores
  • Cached model loading for performance optimization
  • Clean and interactive Streamlit-based UI
  • Lightweight deployment-ready architecture

Model Loading with Resource Caching

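import streamlit as st
from ultralytics import YOLO

@st.cache_resource  # keep one model instance alive across Streamlit reruns
def load_yolo_v11():
    # "yolov8x.pt" is the largest YOLOv8 checkpoint; Ultralytics downloads it on first use
    model = YOLO("yolov8x.pt")
    return model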

Streamlit reruns the script on every user interaction, so the loader is wrapped in st.cache_resource: the YOLO model is loaded once, and the same instance is reused on later reruns instead of being reloaded each time.

Object Detection Pipeline

def detect_objects_v11(img, model):
    # Run inference on one image or frame; predict() returns a list of Results objects
    results = model.predict(img)
    return results

This function runs inference with the YOLOv8 model and returns a list of result objects, one per input image. Each result exposes the detected boxes with their coordinates, class IDs, and confidence scores.
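
One way to work with the returned results is to flatten them into plain Python records and drop low-confidence detections. The sketch below is an assumption about how this could be done, not code from the project: to_records and keep_confident are hypothetical helper names, and the 0.5 threshold is an arbitrary default.

```python
def to_records(result, names):
    """Flatten one Ultralytics-style result into plain dictionaries.

    `result` is expected to expose `.boxes`, where each box has `.xyxy`,
    `.cls` and `.conf`; `names` maps class IDs to class names.
    """
    records = []
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        records.append({
            "label": names[int(box.cls[0])],
            "conf": float(box.conf[0]),
            "box": (x1, y1, x2, y2),
        })
    return records


def keep_confident(records, min_conf=0.5):
    """Keep only detections whose confidence is at least `min_conf`."""
    return [r for r in records if r["conf"] >= min_conf]


# Typical use with the functions above (not executed here):
#   results = detect_objects_v11(img, model)
#   records = keep_confident(to_records(results[0], model.names))
```

Filtering can also be done inside the model call, since predict() accepts a conf argument; doing it afterwards keeps the threshold adjustable without rerunning inference.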

Drawing Bounding Boxes and Labels

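# Assumes boxes = results[0].boxes and annotated_frame is a copy of the input frame
for box in boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    label_id = int(box.cls[0])
    conf = float(box.conf[0])
    label = model.names[label_id]  # map the class ID to its class name

    # Green box, with black label text just inside its top-left corner
    cv2.rectangle(annotated_frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(annotated_frame, f'{label} {conf:.2f}',
                (x1 + 5, y1 + 25),
                cv2.FONT_HERSHEY_SIMPLEX,
                0.9, (0, 0, 0), 2)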

The application dynamically renders bounding boxes and overlays class labels with confidence scores for each detected object.
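
The drawing step can be refined with two small helpers: one that gives each class a repeatable colour, and one that positions the label so it stays inside the frame. Both class_color and label_origin are hypothetical names, and the colour constants are arbitrary; this is a sketch of one possible approach rather than what the application does.

```python
def class_color(class_id: int) -> tuple:
    """Map a class ID to a repeatable BGR colour (arbitrary illustrative scheme)."""
    # Different multipliers per channel so neighbouring IDs get distinct colours.
    return (
        (37 * class_id + 80) % 256,
        (17 * class_id + 160) % 256,
        (29 * class_id + 40) % 256,
    )


def label_origin(x1: int, y1: int, text_height: int = 25, pad: int = 5) -> tuple:
    """Return the bottom-left corner for the label text.

    The label goes above the box when there is room; otherwise it falls
    back to just inside the top edge of the box.
    """
    if y1 - pad - text_height >= 0:
        return (x1 + pad, y1 - pad)
    return (x1 + pad, y1 + text_height)


# Possible use inside the drawing loop (not executed here):
#   cv2.rectangle(annotated_frame, (x1, y1), (x2, y2), class_color(label_id), 2)
#   cv2.putText(annotated_frame, text, label_origin(x1, y1), font, 0.9, (0, 0, 0), 2)
```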

Input Flexibility

Users can choose between:

  • Uploading an image
  • Uploading a video file
  • Using a live webcam feed

Each input type follows the same detection pipeline, so detections are produced and drawn the same way regardless of the source.
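
A shared pipeline of this kind can be sketched by treating every input as a sequence of frames: a single image is a one-element sequence, while a video file or webcam yields frames one at a time. The source does not show how the application reads its inputs, so video_frames and run_pipeline below are hypothetical helpers written under that assumption.

```python
from typing import Callable, Iterable, Iterator


def video_frames(source) -> Iterator:
    """Yield frames from a video file path or a camera index using OpenCV."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV

    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of file, or the camera is unavailable
                break
            yield frame
    finally:
        cap.release()


def run_pipeline(frames: Iterable, detect: Callable) -> Iterator:
    """Apply the same detection callable to every frame, whatever the source."""
    for frame in frames:
        yield frame, detect(frame)


# Possible wiring for the three input types (not executed here):
#   detect = lambda frame: detect_objects_v11(frame, model)
#   run_pipeline([img], detect)                       # uploaded image
#   run_pipeline(video_frames("clip.mp4"), detect)    # uploaded video file
#   run_pipeline(video_frames(0), detect)             # default webcam
```

Because run_pipeline is a generator, frames are processed lazily, which suits a webcam stream that has no fixed length.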