Vehicle Recognition with Segmentation Training on a Custom Dataset

1. Introduction

This project demonstrates the implementation of a custom vehicle recognition and segmentation system using YOLOv8 trained on a specially curated vehicle dataset. The system has been designed to identify and segment three distinct vehicle types (cars, pickups, and trucks) using instance segmentation techniques that provide pixel-level precision in vehicle identification.

The project addresses the demand for automated vehicle classification in applications such as traffic monitoring, parking management, toll collection, and intelligent transportation systems. By training a custom YOLOv8 segmentation model on a specialized dataset, the system reaches 99.5% mAP50 for both boxes and masks (Section 7.1) while distinguishing between similar vehicle categories (cars, pickups, and trucks) that typically pose challenges for general-purpose models.

The implementation showcases a complete machine learning workflow, from data collection and preprocessing through model training, evaluation, and deployment. Using Roboflow for dataset management and annotation, and Ultralytics YOLOv8 for model training, the project offers a practical example of custom object segmentation for domain-specific applications.

Core Features:

2. Methodology / Approach

The project follows a structured deep learning workflow for custom vehicle segmentation built on YOLOv8's instance segmentation capabilities. The methodology has three stages: dataset preparation, data augmentation, and fine-tuning of a pre-trained model.

Dataset Preparation: The custom dataset consists of 60 manually selected and annotated vehicle images distributed equally across three classes (cars, pickups, trucks). Images were annotated using Roboflow's segmentation tools, creating precise polygon masks for each vehicle instance.

Data Augmentation: To enhance model generalization and prevent overfitting on the small dataset, multiple augmentation techniques were applied, including horizontal flips, random crops (0-20% zoom), rotation (±10°), grayscale conversion, and blur effects.

Model Training: The YOLOv8x-seg model (extra-large variant) was initialized with pre-trained weights and fine-tuned for 50 epochs on the custom dataset. The training utilized automatic mixed precision (AMP) for faster computation and included mosaic augmentation for the first 40 epochs.

2.1 System Architecture

The system comprises four main components:

  1. Data Collection & Annotation: Manual selection and polygon annotation of vehicle images
  2. Preprocessing & Augmentation: Image standardization and augmentation pipeline
  3. Model Training: Custom YOLOv8x-seg training with transfer learning
  4. Evaluation & Testing: Performance assessment on unseen vehicle images

2.2 Dataset Split

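The 60-image dataset was divided into training (70%), validation (20%), and test (10%) subsets, as shown in the data preparation pipeline in Section 8.1.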

Additionally, 15 completely new images (5 per class) were used for external validation to assess real-world performance.
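
As a quick check of the split arithmetic, the sketch below computes subset sizes for a 70/20/10 split of 60 images (the percentages given in Section 8.1). The helper name `split_counts` is hypothetical, and the counts refer to source images before augmentation; the per-subset counts in the actual Roboflow export are not stated in this document.

```python
def split_counts(total: int, percentages: tuple[int, ...]) -> list[int]:
    """Return integer subset sizes for whole-number percentages summing to 100."""
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    counts = [total * p // 100 for p in percentages]
    # Assign any rounding remainder to the first (training) subset.
    counts[0] += total - sum(counts)
    return counts


train, val, test = split_counts(60, (70, 20, 10))
print(train, val, test)  # 42 12 6
```

Integer percentages are used instead of floats so that the counts do not depend on floating-point rounding.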

3. Mathematical Framework

3.1 Performance Metrics

Precision: Measures the accuracy of positive predictions

$$\text{Precision} = \frac{TP}{TP + FP}$$

where $TP$ denotes true positives and $FP$ denotes false positives.

Recall: Measures the ability to find all positive instances

$$\text{Recall} = \frac{TP}{TP + FN}$$

where $FN$ denotes false negatives.

F1 Score: Harmonic mean of precision and recall

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
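
The three formulas above can be sketched in a few lines of Python. The function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the training run; the zero-denominator handling (returning 0.0) is one common convention, not the only one.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed vehicles.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # 0.80 0.80 0.80
```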

mAP (mean Average Precision): Average precision across all classes at specified IoU thresholds
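
For reference, mAP is commonly defined as follows. This is the usual COCO-style convention rather than a definition given by the original text, and the symbols $p_c(r)$, $N$, and $t$ are introduced here for illustration only:

$$\text{AP}_c = \int_0^1 p_c(r)\,dr, \qquad \text{mAP} = \frac{1}{N}\sum_{c=1}^{N} \text{AP}_c$$

where $p_c(r)$ is the precision of class $c$ as a function of recall and $N$ is the number of classes (three in this project). mAP50 evaluates this quantity at an IoU threshold of 0.50, while mAP50-95 averages it over ten thresholds from 0.50 to 0.95 in steps of 0.05:

$$\text{mAP}_{50\text{-}95} = \frac{1}{10}\sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} \text{mAP}_t$$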

3.2 Training Objective

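The YOLOv8 segmentation loss combines four components: a bounding-box regression loss ($\mathcal{L}_{\text{box}}$), a segmentation (mask) loss ($\mathcal{L}_{\text{seg}}$), a classification loss ($\mathcal{L}_{\text{cls}}$), and a distribution focal loss ($\mathcal{L}_{\text{dfl}}$).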

The total loss function can be expressed as:

$$\mathcal{L}_{\text{total}} = \lambda_{\text{box}} \mathcal{L}_{\text{box}} + \lambda_{\text{seg}} \mathcal{L}_{\text{seg}} + \lambda_{\text{cls}} \mathcal{L}_{\text{cls}} + \lambda_{\text{dfl}} \mathcal{L}_{\text{dfl}}$$

where $\lambda_i$ represents the weight coefficient for each loss component.
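
A minimal sketch of this weighted sum is shown below. The function name `total_loss` is hypothetical, and the component values and weights are placeholders chosen for easy arithmetic; they are not the values used by Ultralytics or observed during this training run.

```python
def total_loss(components: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of loss components: sum over i of lambda_i * L_i."""
    missing = set(components) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in components.items())


# Placeholder values for illustration only, not taken from the training run.
losses = {"box": 0.5, "seg": 1.0, "cls": 0.4, "dfl": 1.0}
lambdas = {"box": 2.0, "seg": 2.0, "cls": 0.5, "dfl": 1.0}
print(f"{total_loss(losses, lambdas):.2f}")  # 4.20
```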

4. Requirements

requirements.txt

ultralytics>=8.0.0
roboflow>=1.0.0
pandas>=1.3.0

5. Installation & Configuration

5.1 Environment Setup

# Clone the repository
git clone https://github.com/kemalkilicaslan/Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset.git
cd Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset

# Install required packages
pip install -r requirements.txt

Note: This project was developed using Google Colab with GPU acceleration (Tesla T4). For local execution, ensure you have:

5.2 Project Structure

Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset
├── input/
│   ├── car1.jpg - car5.jpg
│   ├── pickup1.jpg - pickup5.jpg
│   └── truck1.jpg - truck5.jpg
├── output/
│   └── (segmented prediction images)
├── Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset.ipynb
├── confusion_matrix.png
├── results.png
├── README.md
├── requirements.txt
└── LICENSE

5.3 Required Files

Dataset Access:

Pre-trained Model:

6. Usage / How to Run

6.1 Dataset Download from Roboflow

from roboflow import Roboflow

# Authenticate with your own Roboflow API key
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("kemalkilicaslan-bgq6q").project("vehicle-segmentation-yvbo4")

# Download version 5 of the dataset in YOLOv8 format
dataset = project.version(5).download("yolov8")

6.2 Model Training

CLI:

yolo task=segment mode=train model=yolov8x-seg.pt \
  data="/path/to/vehicle-segmentation-5/data.yaml" \
  epochs=50 imgsz=640

Python:

from ultralytics import YOLO

# Initialize model
model = YOLO('yolov8x-seg.pt')

# Train model
model.train(
    data='/path/to/vehicle-segmentation-5/data.yaml',
    epochs=50,
    imgsz=640,
    task='segment'
)
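
Section 2 mentions automatic mixed precision and mosaic augmentation for the first 40 of 50 epochs. In Ultralytics these correspond to the `amp` and `close_mosaic` training arguments, where `close_mosaic` is the number of final epochs with mosaic disabled. The sketch below makes that relationship explicit; the helper names `build_train_args` and `train` are hypothetical, and the values appear to match the Ultralytics defaults, so the commands above likely relied on them implicitly. Verify against your installed version.

```python
def build_train_args(data_yaml: str, epochs: int = 50, mosaic_epochs: int = 40) -> dict:
    """Build keyword arguments for model.train() (hypothetical helper)."""
    if not 0 <= mosaic_epochs <= epochs:
        raise ValueError("mosaic_epochs must be between 0 and epochs")
    return {
        "data": data_yaml,
        "epochs": epochs,
        "imgsz": 640,
        "amp": True,  # automatic mixed precision
        # close_mosaic = number of final epochs with mosaic disabled
        "close_mosaic": epochs - mosaic_epochs,
    }


def train(data_yaml: str):
    # Imported lazily so build_train_args works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolov8x-seg.pt")
    return model.train(**build_train_args(data_yaml))


args = build_train_args("/path/to/vehicle-segmentation-5/data.yaml")
print(args["close_mosaic"])  # 10
```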

6.3 Model Validation

yolo task=segment mode=val \
  model=/path/to/runs/segment/train/weights/best.pt \
  data="/path/to/vehicle-segmentation-5/data.yaml"
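
Validation can also be run from Python. The sketch below assumes the object returned by `model.val()` exposes box and mask metrics as `metrics.box` and `metrics.seg`, each with `map50` and `map` (mAP50-95) attributes, which is how recent Ultralytics releases report segmentation metrics. The helper names `summarize_metrics` and `validate` are hypothetical.

```python
def summarize_metrics(metrics) -> dict:
    """Collect box and mask mAP values as percentages from a validation result."""
    return {
        "box_map50": round(100 * metrics.box.map50, 1),
        "box_map50_95": round(100 * metrics.box.map, 1),
        "mask_map50": round(100 * metrics.seg.map50, 1),
        "mask_map50_95": round(100 * metrics.seg.map, 1),
    }


def validate(weights: str, data_yaml: str) -> dict:
    # Imported lazily so summarize_metrics works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return summarize_metrics(model.val(data=data_yaml))
```

Calling `validate("/path/to/runs/segment/train/weights/best.pt", "/path/to/vehicle-segmentation-5/data.yaml")` would then return the four headline numbers in one dictionary.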

6.4 Prediction on New Images

CLI:

yolo task=segment mode=predict \
  model=/path/to/runs/segment/train/weights/best.pt \
  conf=0.85 \
  source="/path/to/test/images/*.jpg"

Python:

from ultralytics import YOLO

# Load trained model
model = YOLO('/path/to/best.pt')

# Run prediction
results = model.predict(
    source='/path/to/test/images',
    conf=0.85,
    save=True
)
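
To inspect the predictions programmatically rather than only saving annotated images, each returned result can be summarized as below. This is a sketch that assumes the standard Ultralytics result attributes `names`, `boxes.cls`, and `boxes.conf`; the helper name `summarize_detections` is hypothetical. Polygon outlines should be available separately through `result.masks.xy` when at least one instance is detected.

```python
def summarize_detections(result) -> list[tuple[str, float]]:
    """Return (class name, confidence) pairs for one prediction result."""
    if result.boxes is None:
        return []
    return [
        (result.names[int(cls_id)], round(float(conf), 3))
        for cls_id, conf in zip(result.boxes.cls, result.boxes.conf)
    ]


# Usage with the `results` list returned by model.predict(...):
# for r in results:
#     print(r.path, summarize_detections(r))
```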

7. Application / Results

7.1 Training Results

The model was trained for 50 epochs with the following final metrics:

| Metric | Value |
|---|---|
| Box mAP50 | 99.5% |
| Box mAP50-95 | 90.3% |
| Mask mAP50 | 99.5% |
| Mask mAP50-95 | 87.2% |
| Training Time | 0.308 hours (~18.5 minutes) |

7.2 Class-wise Performance

| Class | Precision | Recall | F1 Score | Box mAP50-95 | Mask mAP50-95 |
|---|---|---|---|---|---|
| Car | 99.1% | 100% | 0.995 | 89.7% | 89.9% |
| Pickup | 90.5% | 100% | 0.950 | 91.9% | 81.5% |
| Truck | 100% | 79.6% | 0.886 | 89.4% | 90.3% |
| Overall | 96.5% | 93.2% | 0.948 | 90.3% | 87.2% |
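
The derived values in the table above can be cross-checked: each F1 score follows from the precision and recall in the same row, and the Overall row matches the unweighted mean of the three classes. Reading the Overall row as a macro average is an inference from the numbers, not something stated by the source. A small sketch of the check:

```python
# Per-class values copied from the table above (percent).
rows = {
    "Car":    {"precision": 99.1,  "recall": 100.0, "box": 89.7, "mask": 89.9},
    "Pickup": {"precision": 90.5,  "recall": 100.0, "box": 91.9, "mask": 81.5},
    "Truck":  {"precision": 100.0, "recall": 79.6,  "box": 89.4, "mask": 90.3},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * precision * recall / (precision + recall)


# F1 per class, converted from percent to a 0-1 score.
f1_scores = {
    name: round(f1(r["precision"], r["recall"]) / 100, 3) for name, r in rows.items()
}
# Unweighted (macro) mean of each column across the three classes.
overall = {
    key: round(sum(r[key] for r in rows.values()) / len(rows), 1)
    for key in ("precision", "recall", "box", "mask")
}
print(f1_scores)
print(overall)
```

Both dictionaries reproduce the tabulated figures to the precision shown.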

7.3 Training Progress

Confusion Matrix: [Figure: confusion_matrix.png]

Training Metrics: [Figure: results.png]

7.4 Prediction Examples

The model was tested on 15 new vehicle images (5 per class) that were not part of the dataset. All test images were successfully segmented with confidence scores above 85%, the `conf` threshold used in Section 6.4, which indicates that the model can classify and segment vehicles in images it has not seen before.

[Figures: 15 segmented prediction images, output_20_0 through output_20_28]

7.5 Performance Analysis

Model Capabilities Demonstrated:

Strengths:

Areas for Improvement:

8. How It Works (Pipeline Overview)

8.1 Data Preparation Pipeline

[Raw Vehicle Images]
          ↓
[Roboflow Annotation] → [Polygon Segmentation Masks]
          ↓
[Dataset Split] → [70% Train | 20% Val | 10% Test]
          ↓
[Preprocessing] → [Auto-orient, Resize to 640×640]
          ↓
[Augmentation] → [Flip, Crop, Rotate, Grayscale, Blur]
          ↓
[YOLO Format Dataset]
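
The final step stores each annotated instance as one text line: a class index followed by the polygon's x and y coordinates normalized to the range 0 to 1. Roboflow produces these files during export, so the sketch below only illustrates the format; the function name `polygon_to_yolo_line` and the example polygon are hypothetical.

```python
def polygon_to_yolo_line(class_id: int, polygon_px, width: int, height: int) -> str:
    """Format one instance as '<class> x1 y1 x2 y2 ...' with coordinates in [0, 1]."""
    coords = []
    for x, y in polygon_px:
        coords.append(f"{x / width:.6f}")
        coords.append(f"{y / height:.6f}")
    return " ".join([str(class_id), *coords])


# Hypothetical triangle in a 640x640 image, class index 0.
print(polygon_to_yolo_line(0, [(64, 128), (320, 64), (576, 512)], 640, 640))
# 0 0.100000 0.200000 0.500000 0.100000 0.900000 0.800000
```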

8.2 Training Pipeline

[Pre-trained YOLOv8x-seg]
          ↓
[Transfer Learning] → [Fine-tune on Custom Dataset]
          ↓
[50 Epochs Training]
          ├── Box Loss Optimization
          ├── Segmentation Loss Minimization
          ├── Classification Accuracy
          └── DFL Loss Reduction
          ↓
[Model Validation] → [Performance Metrics]
          ↓
[Best Model Checkpoint] (.pt file)

8.3 Inference Pipeline

[Input Vehicle Image]
          ↓
[Image Preprocessing] → [Resize, Normalize]
          ↓
[YOLOv8x-seg Model]
          ├── Backbone Feature Extraction
          ├── Neck Feature Fusion
          └── Head Detection & Segmentation
          ↓
[Post-processing]
          ├── Confidence Filtering (>85%)
          ├── Non-Maximum Suppression
          └── Mask Generation
          ↓
[Output: Segmented Vehicle + Class Label + Confidence]
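
The confidence filtering and non-maximum suppression steps in the post-processing stage can be illustrated with a small pure-Python sketch. This is a simplified, class-agnostic greedy NMS on boxes only, not the Ultralytics implementation; the function names and the IoU threshold of 0.5 are illustrative assumptions, while the 0.85 confidence threshold is the one used in Section 6.4.

```python
def iou(a, b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, conf_threshold=0.85, iou_threshold=0.5):
    """detections: list of (box, confidence). Returns the kept detections."""
    # Drop low-confidence detections, then visit the rest from most to least confident.
    candidates = sorted(
        (d for d in detections if d[1] > conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, conf in candidates:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, conf))
    return kept


# Hypothetical detections: two overlapping boxes, one separate box, one weak box.
dets = [
    ((0, 0, 100, 100), 0.95),
    ((5, 5, 105, 105), 0.90),
    ((200, 200, 300, 300), 0.88),
    ((0, 0, 50, 50), 0.40),
]
print(len(filter_and_nms(dets)))  # 2
```

In the example, the second box is suppressed because it overlaps the first with an IoU of about 0.82, and the fourth is removed by the confidence filter.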

9. Tech Stack

9.1 Core Technologies

9.2 Libraries & Dependencies

| Library | Version | Purpose |
|---|---|---|
| ultralytics | 8.3.205 | YOLOv8 implementation and training |
| roboflow | 1.0+ | Dataset management and download |
| torch | 2.8.0 | Deep learning computations |
| opencv-python | 4.12.0 | Image processing |
| pandas | Latest | Data analysis and metrics |
| numpy | 2.0.2 | Numerical computations |

9.3 Model Specifications

YOLOv8x-seg Architecture:

Training Configuration:

10. Dataset Details

10.1 Data Collection

Source Images:

Annotation Method:

10.2 Preprocessing

All images underwent standardized preprocessing:

10.3 Augmentation Techniques

| Technique | Parameters | Purpose |
|---|---|---|
| Horizontal Flip | 50% probability | Increase orientation diversity |
| Random Crop | 0-20% zoom | Simulate varying distances |
| Rotation | ±10 degrees | Handle tilted vehicles |
| Grayscale | 100% conversion | Reduce color dependency |
| Blur | 2px | Simulate motion/focus variations |
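
Geometric augmentations must transform the polygon annotations together with the pixels. Roboflow handles this automatically when generating the dataset; the sketch below only illustrates the idea for a horizontal flip of a normalized polygon and is not Roboflow's implementation. The function name `hflip_polygon` is hypothetical.

```python
def hflip_polygon(coords: list[float]) -> list[float]:
    """Mirror a normalized [x1, y1, x2, y2, ...] polygon about the vertical axis."""
    if len(coords) % 2:
        raise ValueError("expected an even number of coordinates")
    flipped = []
    for i in range(0, len(coords), 2):
        # x becomes 1 - x; y is unchanged by a horizontal flip.
        flipped.extend([1.0 - coords[i], coords[i + 1]])
    return flipped


print(hflip_polygon([0.25, 0.5, 0.75, 0.125]))  # [0.75, 0.5, 0.25, 0.125]
```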

11. License

This project is open source and available under the Apache License 2.0.

12. References

  1. Ultralytics YOLOv8 Documentation.
  2. Roboflow Dataset Management, Detect and Annotate Documentation.

Acknowledgments

Special thanks to:


Note: This project is intended for educational and research purposes. When deploying vehicle recognition systems in production environments, ensure compliance with relevant regulations regarding automated surveillance and data privacy.