1. Introduction

This project demonstrates the implementation of a custom vehicle recognition and segmentation system using YOLOv8 trained on a specially curated vehicle dataset. The system has been designed to identify and segment three distinct vehicle types (cars, pickups, and trucks) using instance segmentation techniques that provide pixel-level precision in vehicle identification.

The project addresses the growing demand for automated vehicle classification in applications such as traffic monitoring, parking management, toll collection systems, and intelligent transportation systems. By training a custom YOLOv8 segmentation model on a specialized dataset, the system reaches 99.5% mAP50 on its validation set when distinguishing between similar vehicle categories (for example, pickups and trucks) that typically pose challenges for general-purpose models.

The implementation showcases a complete machine learning workflow, from data collection and preprocessing through model training, evaluation, and deployment. Using Roboflow for dataset management and annotation, and Ultralytics YOLOv8 for model training, the project offers a practical example of custom object segmentation for domain-specific applications.

Core Features:

  • Custom dataset of 60 images (20 per vehicle class)
  • Data augmentation techniques to enhance model robustness
  • YOLOv8x-seg architecture for instance segmentation
  • 50-epoch training with comprehensive metric tracking
  • High accuracy: 99.5% mAP50 on validation set
  • Pixel-level vehicle segmentation masks

2. Methodology / Approach

The project follows a structured deep learning workflow for custom vehicle segmentation built on YOLOv8's instance segmentation capabilities. The methodology combines dataset preparation, data augmentation, and fine-tuning from pre-trained weights to compensate for the small size of the dataset.

Dataset Preparation: The custom dataset consists of 60 manually selected and annotated vehicle images distributed equally across three classes (cars, pickups, trucks). Images were annotated using Roboflow's segmentation tools, creating precise polygon masks for each vehicle instance.

Data Augmentation: To enhance model generalization and prevent overfitting on the small dataset, multiple augmentation techniques were applied, including horizontal flips, random crops (0-20% zoom), rotation (±10°), grayscale conversion, and blur effects.

Model Training: The YOLOv8x-seg model (extra-large variant) was initialized with pre-trained weights and fine-tuned for 50 epochs on the custom dataset. The training utilized automatic mixed precision (AMP) for faster computation and included mosaic augmentation for the first 40 epochs.

2.1 System Architecture

The system comprises four main components:

  1. Data Collection & Annotation: Manual selection and polygon annotation of vehicle images
  2. Preprocessing & Augmentation: Image standardization and augmentation pipeline
  3. Model Training: Custom YOLOv8x-seg training with transfer learning
  4. Evaluation & Testing: Performance assessment on unseen vehicle images

2.2 Dataset Split

The 60-image dataset was strategically divided:

  • Training Set (70%): 42 images for model learning
  • Validation Set (20%): 12 images for hyperparameter tuning
  • Test Set (10%): 6 images for final evaluation

Additionally, 15 completely new images (5 per class) were used for external validation to assess real-world performance.
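
The 70/20/10 proportions above can be reproduced with a seeded shuffle. The sketch below is illustrative only: the actual split was produced in Roboflow, and the file names, the seed, and the `split_dataset` helper are assumptions made for this example. Note that a plain shuffle does not guarantee an equal number of images per class in each subset.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (10% here)
    return train, val, test


# Hypothetical file names standing in for the 60 annotated images.
images = [f"{cls}_{i:02d}.jpg" for cls in ("car", "pickup", "truck") for i in range(1, 21)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 42 12 6
```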

3. Mathematical Framework

3.1 Performance Metrics

Precision: Measures the accuracy of positive predictions

$$\text{Precision} = \frac{TP}{TP + FP}$$

where $TP$ denotes true positives and $FP$ denotes false positives.

Recall: Measures the ability to find all positive instances

$$\text{Recall} = \frac{TP}{TP + FN}$$

where $FN$ denotes false negatives.

F1 Score: Harmonic mean of precision and recall

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
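
As a small worked example of the three formulas above, the following sketch computes precision, recall, and F1 from raw counts. The counts are hypothetical and are not taken from this project's evaluation; the zero-division guards are a convention chosen for the example.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, no misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=0)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.800 recall=1.000 f1=0.889
```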

mAP (mean Average Precision): Mean over all classes of the average precision (AP), where AP is the area under the precision-recall curve at a given IoU threshold

  • $\text{mAP}_{50}$: mAP at an IoU threshold of 0.50
  • $\text{mAP}_{50-95}$: mAP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05
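
Both mAP variants depend on the IoU (intersection over union) between a prediction and a ground-truth instance. The sketch below computes IoU for two axis-aligned boxes and lists the thresholds at which the pair would count as a match. The boxes are hypothetical; for masks, the same ratio is computed over pixels instead of rectangle areas.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds used by mAP50-95: 0.50, 0.55, ..., 0.95.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Hypothetical prediction and ground-truth boxes.
iou = box_iou((0, 0, 10, 10), (2, 0, 12, 10))
matched_at = [t for t in thresholds if iou >= t]
print(f"IoU={iou:.3f}, counts as a match at thresholds {matched_at}")
# IoU=0.667, counts as a match at thresholds [0.5, 0.55, 0.6, 0.65]
```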

3.2 Training Objective

YOLOv8 segmentation loss function combines:

  • Box Loss: Bounding box regression loss (localization)
  • Segmentation Loss: Mask prediction loss (pixel-level accuracy)
  • Classification Loss: Class prediction loss
  • DFL Loss: Distribution focal loss for box regression

The total loss function can be expressed as:

$$\mathcal{L}_{\text{total}} = \lambda_{\text{box}} \mathcal{L}_{\text{box}} + \lambda_{\text{seg}} \mathcal{L}_{\text{seg}} + \lambda_{\text{cls}} \mathcal{L}_{\text{cls}} + \lambda_{\text{dfl}} \mathcal{L}_{\text{dfl}}$$

where $\lambda_i$ represents the weight coefficient for each loss component.
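
The weighted sum can be sketched in a few lines. The weights and component values below are placeholders chosen for illustration; they are neither the values from this project's training run nor the library's defaults.

```python
def total_loss(components, weights):
    """Weighted sum of named loss components."""
    return sum(weights[name] * value for name, value in components.items())


# Placeholder numbers for illustration only.
weights = {"box": 1.0, "seg": 1.0, "cls": 0.5, "dfl": 1.5}
components = {"box": 0.40, "seg": 0.80, "cls": 0.30, "dfl": 0.90}
print(round(total_loss(components, weights), 3))  # 2.7
```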

4. Requirements

requirements.txt

ultralytics>=8.0.0
roboflow>=1.0.0
pandas>=1.3.0

5. Installation & Configuration

5.1 Environment Setup

# Clone the repository
git clone https://github.com/kemalkilicaslan/Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset.git
cd Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset

# Install required packages
pip install -r requirements.txt

Note: This project was developed using Google Colab with GPU acceleration (Tesla T4). For local execution, ensure you have:

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • At least 8GB RAM

5.2 Project Structure

Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset
├── input/
│   ├── car1.jpg - car5.jpg
│   ├── pickup1.jpg - pickup5.jpg
│   └── truck1.jpg - truck5.jpg
├── output/
│   └── (segmented prediction images)
├── Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset.ipynb
├── confusion_matrix.png
├── results.png
├── README.md
├── requirements.txt
└── LICENSE

5.3 Required Files

Dataset Access:

  • Roboflow API key (required for dataset download)
  • Dataset: vehicle-segmentation-yvbo4 (version 5)
  • Workspace: kemalkilicaslan-bgq6q

Pre-trained Model:

  • yolov8x-seg.pt (automatically downloaded during training)

6. Usage / How to Run

6.1 Dataset Download from Roboflow

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("kemalkilicaslan-bgq6q").project("vehicle-segmentation-yvbo4")
version = project.version(5).download("yolov8")
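
To avoid committing an API key to a notebook or repository, the key can be read from an environment variable instead. This is a minimal sketch: the variable name `ROBOFLOW_API_KEY` and both helper functions are assumptions made for this example, not part of the original notebook.

```python
import os


def get_roboflow_api_key(env_var="ROBOFLOW_API_KEY"):
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def download_dataset():
    """Download version 5 of the dataset in YOLOv8 format."""
    # Imported lazily so the helper above works without the package installed.
    from roboflow import Roboflow

    rf = Roboflow(api_key=get_roboflow_api_key())
    project = rf.workspace("kemalkilicaslan-bgq6q").project("vehicle-segmentation-yvbo4")
    dataset = project.version(5).download("yolov8")
    return dataset.location  # folder that contains data.yaml
```

Calling `download_dataset()` returns the local dataset folder, whose `data.yaml` is the file passed to the training commands in the next section.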

6.2 Model Training

CLI:

yolo task=segment mode=train model=yolov8x-seg.pt \
  data="/path/to/vehicle-segmentation-5/data.yaml" \
  epochs=50 imgsz=640

Python:

from ultralytics import YOLO

# Initialize model
model = YOLO('yolov8x-seg.pt')

# Train model
model.train(
    data='/path/to/vehicle-segmentation-5/data.yaml',
    epochs=50,
    imgsz=640,
    task='segment'
)

6.3 Model Validation

yolo task=segment mode=val \
  model=/path/to/runs/segment/train/weights/best.pt \
  data="/path/to/vehicle-segmentation-5/data.yaml"
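
Validation can also be run from Python. The sketch below assumes the trained weights and `data.yaml` paths shown in the CLI command; the helper names `summarize_metrics` and `validate` are introduced here for illustration only.

```python
def summarize_metrics(metrics):
    """Collect box and mask mAP values from a segmentation validation result."""
    return {
        "box_map50": float(metrics.box.map50),
        "box_map50_95": float(metrics.box.map),
        "mask_map50": float(metrics.seg.map50),
        "mask_map50_95": float(metrics.seg.map),
    }


def validate(weights_path, data_yaml):
    """Run validation with Ultralytics and return the summarized metrics."""
    # Imported lazily; requires `pip install ultralytics`.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    metrics = model.val(data=data_yaml)
    return summarize_metrics(metrics)
```

For example, `validate("/path/to/runs/segment/train/weights/best.pt", "/path/to/vehicle-segmentation-5/data.yaml")` returns a dictionary with the four mAP values as fractions between 0 and 1.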

6.4 Prediction on New Images

CLI:

yolo task=segment mode=predict \
  model=/path/to/runs/segment/train/weights/best.pt \
  conf=0.85 \
  source="/path/to/test/images/*.jpg"

Python:

from ultralytics import YOLO

# Load trained model
model = YOLO('/path/to/best.pt')

# Run prediction
results = model.predict(
    source='/path/to/test/images',
    conf=0.85,
    save=True
)
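
The returned results can be inspected programmatically. The following sketch counts detected instances per class name; it relies on the `boxes.cls` and `names` attributes of each Ultralytics result, and the helper names are assumptions made for this example.

```python
from collections import Counter


def count_detections(results):
    """Count detected instances per class name across prediction results."""
    counts = Counter()
    for r in results:
        for cls_id in r.boxes.cls:  # one class index per detected instance
            counts[r.names[int(cls_id)]] += 1
    return counts


def predict_folder(weights_path, source, conf=0.85):
    """Run prediction on a folder of images and summarize what was found."""
    # Imported lazily; requires `pip install ultralytics`.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    results = model.predict(source=source, conf=conf, save=True)
    return count_detections(results)
```

With `conf=0.85`, only instances whose confidence is at least 0.85 are kept, so the counts reflect high-confidence detections only.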

7. Application / Results

7.1 Training Results

The model was trained for 50 epochs with the following final metrics:

| Metric | Value |
| --- | --- |
| Box mAP50 | 99.5% |
| Box mAP50-95 | 90.3% |
| Mask mAP50 | 99.5% |
| Mask mAP50-95 | 87.2% |
| Training Time | 0.308 hours (~18.5 minutes) |

7.2 Class-wise Performance

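| Class | Precision | Recall | F1 Score | Box mAP50-95 | Mask mAP50-95 |
| --- | --- | --- | --- | --- | --- |
| Car | 99.1% | 100% | 0.995 | 89.7% | 89.9% |
| Pickup | 90.5% | 100% | 0.950 | 91.9% | 81.5% |
| Truck | 100% | 79.6% | 0.886 | 89.4% | 90.3% |
| Overall | 96.5% | 93.2% | 0.948 | 90.3% | 87.2% |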
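
The Overall row is consistent with an unweighted (macro) average of the three per-class rows. The sketch below recomputes it from the per-class values listed above; the dictionary keys are names chosen for this example.

```python
from statistics import mean

# Per-class values copied from the class-wise results (percentages).
per_class = {
    "Car":    {"precision": 99.1, "recall": 100.0, "box_map": 89.7, "mask_map": 89.9},
    "Pickup": {"precision": 90.5, "recall": 100.0, "box_map": 91.9, "mask_map": 81.5},
    "Truck":  {"precision": 100.0, "recall": 79.6, "box_map": 89.4, "mask_map": 90.3},
}

# Macro average: every class contributes equally, regardless of instance count.
overall = {
    metric: round(mean(row[metric] for row in per_class.values()), 1)
    for metric in ("precision", "recall", "box_map", "mask_map")
}
print(overall)
# {'precision': 96.5, 'recall': 93.2, 'box_map': 90.3, 'mask_map': 87.2}
```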

7.3 Training Progress

Confusion Matrix: see confusion_matrix.png

Training Metrics: see results.png

7.4 Prediction Examples

The model was tested on 15 new vehicle images (5 per class) that were not part of the dataset, to evaluate its real-world performance. All test images were segmented with confidence scores above the 0.85 prediction threshold, indicating that the model classifies and segments vehicles reliably across the conditions represented in these images.

[Figures: segmented predictions for the 15 test images; see the output/ folder]

7.5 Performance Analysis

Model Capabilities Demonstrated:

  • Accurate classification of vehicle types across diverse scenarios
  • Precise pixel-level segmentation masks that closely follow vehicle contours
  • Robust performance under various lighting conditions (daylight, shadows, overcast)
  • Effective handling of different vehicle orientations and viewing angles
  • High confidence scores indicating strong model certainty

Strengths:

  • Excellent overall accuracy (99.5% mAP50)
  • Perfect recall for cars and pickups (100%)
  • High precision across all classes (>90%)
  • Robust to various vehicle orientations and environmental conditions
  • Successful generalization to completely unseen test images

Areas for Improvement:

  • Truck recall could be improved (79.6%) - some trucks may be confused with pickups
  • Additional training data for edge cases and rare vehicle configurations
  • Testing on more diverse vehicle models and manufacturers
  • Performance evaluation in challenging weather conditions

8. Tech Stack

8.1 Core Technologies

  • Programming Language: Python 3.12
  • Deep Learning Framework: PyTorch 2.8.0
  • Model Architecture: Ultralytics YOLOv8x-seg
  • Dataset Management: Roboflow
  • Development Environment: Google Colab (Tesla T4 GPU)

8.2 Libraries & Dependencies

| Library | Version | Purpose |
| --- | --- | --- |
| ultralytics | 8.3.205 | YOLOv8 implementation and training |
| roboflow | 1.0+ | Dataset management and download |
| torch | 2.8.0 | Deep learning computations |
| opencv-python | 4.12.0 | Image processing |
| pandas | Latest | Data analysis and metrics |
| numpy | 2.0.2 | Numerical computations |

8.3 Model Specifications

YOLOv8x-seg Architecture:

  • Total Layers: 231 (training) / 125 (inference, fused)
  • Parameters: 71,753,737 (≈71.8M)
  • GFLOPs: 328.8
  • Model Size: 143.9 MB
  • Input Resolution: 640×640 pixels
  • Output Classes: 3 (car, pickup, truck)

Training Configuration:

  • Optimizer: AdamW (lr=0.001429, momentum=0.9)
  • Batch Size: 16
  • Image Size: 640×640
  • Epochs: 50
  • AMP: Enabled (Automatic Mixed Precision)
  • Mosaic Augmentation: Epochs 1-40
  • Warmup Epochs: 3
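
The configuration above can be written out as explicit training arguments. This is a sketch under assumptions: the README's training command only sets `epochs` and `imgsz`, so the remaining entries are inferred from the values listed here, and `optimizer="auto"` assumes that AdamW and its learning rate were selected automatically by the library rather than set by hand.

```python
# Overrides mirroring the configuration listed above. `close_mosaic=10` turns
# mosaic off for the last 10 of 50 epochs, so mosaic runs for epochs 1-40.
TRAIN_OVERRIDES = {
    "epochs": 50,
    "imgsz": 640,
    "batch": 16,
    "amp": True,
    "warmup_epochs": 3,
    "close_mosaic": 10,
    "optimizer": "auto",  # assumption: optimizer and learning rate auto-selected
}


def train(data_yaml, weights="yolov8x-seg.pt"):
    """Fine-tune the pre-trained segmentation model with the overrides above."""
    # Imported lazily; requires `pip install ultralytics`.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_OVERRIDES)
```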

9. Dataset Details

9.1 Data Collection

Source Images:

  • 20 car images (various models and angles)
  • 20 pickup truck images (different makes)
  • 20 truck images (commercial vehicles)

Annotation Method:

  • Manual polygon annotation using Roboflow
  • Pixel-precise segmentation masks
  • Quality-controlled annotations

9.2 Preprocessing

All images underwent standardized preprocessing:

  • Auto-orientation: Correct EXIF orientation
  • Resizing: 640×640 pixels (maintaining aspect ratio)
  • Normalization: Pixel values scaled appropriately
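
One common way to reach a fixed 640×640 input while maintaining aspect ratio is to scale the image to fit and pad the remainder (often called letterboxing). The sketch below computes the resulting size and padding. Which resize mode was actually selected in Roboflow is not stated here, so the padding strategy is an assumption, and `letterbox_dims` is a helper introduced for this example.

```python
def letterbox_dims(width, height, target=640):
    """Scale to fit inside target x target, then pad the short side evenly."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_w, target - new_h
    # Split padding between the two sides; any odd pixel goes right/bottom.
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        "pad_left_right": (pad_x // 2, pad_x - pad_x // 2),
        "pad_top_bottom": (pad_y // 2, pad_y - pad_y // 2),
    }


# A 1280x720 image is halved to 640x360 and padded by 140 px top and bottom.
print(letterbox_dims(1280, 720))
```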

9.3 Augmentation Techniques

| Technique | Parameters | Purpose |
| --- | --- | --- |
| Horizontal Flip | 50% probability | Increase orientation diversity |
| Random Crop | 0-20% zoom | Simulate varying distances |
| Rotation | ±10 degrees | Handle tilted vehicles |
| Grayscale | 100% conversion | Reduce color dependency |
| Blur | 2px | Simulate motion/focus variations |
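
Geometric augmentations such as the horizontal flip must be applied to the polygon annotations as well as to the pixels. Roboflow handles this automatically; the sketch below only illustrates the idea for a polygon with normalized coordinates, and both the helper name and the example polygon are hypothetical.

```python
def hflip_polygon(points):
    """Mirror a polygon with normalized (x, y) vertices about the vertical axis."""
    return [(1.0 - x, y) for x, y in points]


# Hypothetical normalized polygon (values in [0, 1], as in YOLO segmentation labels).
polygon = [(0.10, 0.20), (0.40, 0.20), (0.40, 0.60), (0.10, 0.60)]
flipped = hflip_polygon(polygon)
print(flipped)
```

Only the x coordinates change (x becomes 1 - x), so flipping twice returns the original polygon up to floating-point rounding.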

10. License

This project is open source and available under the Apache License 2.0.

Acknowledgments

Special thanks to:

  • Ultralytics for developing and maintaining the YOLOv8 framework
  • Roboflow for providing excellent dataset management and annotation tools
  • Google Colab for providing free GPU resources for model training

Note: This project is intended for educational and research purposes. When deploying vehicle recognition systems in production environments, ensure compliance with relevant regulations regarding automated surveillance and data privacy.