Kemal Kilicaslan

1. Introduction

This project demonstrates the implementation of a custom vehicle recognition and segmentation system using YOLOv8 trained on a specially curated vehicle dataset. The system has been designed to identify and segment three distinct vehicle types (cars, pickups, and trucks) using instance segmentation techniques that provide pixel-level precision in vehicle identification.

The project addresses the growing demand for automated vehicle classification in applications such as traffic monitoring, parking management, toll collection systems, and intelligent transportation systems. By training a custom YOLOv8 segmentation model on a specialized dataset, the system achieves high accuracy in distinguishing between similar vehicle categories that typically pose challenges for general-purpose models.

The implementation showcases a complete machine learning workflow, from data collection and preprocessing through model training, evaluation, and deployment. Using Roboflow for dataset management and annotation, and Ultralytics YOLOv8 for model training, the project offers a practical example of custom object segmentation for domain-specific applications.

Core Features:

Custom dataset of 60 images (20 per vehicle class)
Data augmentation techniques to enhance model robustness
YOLOv8x-seg architecture for instance segmentation
50-epoch training with comprehensive metric tracking
High accuracy: 99.5% mAP50 on validation set
Pixel-level vehicle segmentation masks

2. Methodology / Approach

The project follows a structured deep learning workflow for custom vehicle segmentation built on YOLOv8's instance segmentation capabilities. The methodology combines careful dataset preparation, data augmentation, and fine-tuning from pre-trained weights.

Dataset Preparation: The custom dataset consists of 60 manually selected and annotated vehicle images distributed equally across three classes (cars, pickups, trucks). Images were annotated using Roboflow's segmentation tools, creating precise polygon masks for each vehicle instance.

Data Augmentation: To enhance model generalization and prevent overfitting on the small dataset, multiple augmentation techniques were applied, including horizontal flips, random crops (0-20% zoom), rotation (±10°), grayscale conversion, and blur effects.

Model Training: The YOLOv8x-seg model (extra-large variant) was initialized with pre-trained weights and fine-tuned for 50 epochs on the custom dataset. The training utilized automatic mixed precision (AMP) for faster computation and included mosaic augmentation for the first 40 epochs.

2.1 System Architecture

The system comprises four main components:

Data Collection & Annotation: Manual selection and polygon annotation of vehicle images
Preprocessing & Augmentation: Image standardization and augmentation pipeline
Model Training: Custom YOLOv8x-seg training with transfer learning
Evaluation & Testing: Performance assessment on unseen vehicle images

2.2 Dataset Split

The 60-image dataset was divided as follows:

Training Set (70%): 42 images for model learning
Validation Set (20%): 12 images for hyperparameter tuning
Test Set (10%): 6 images for final evaluation

Additionally, 15 completely new images (5 per class) were used for external validation to assess real-world performance.
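The split sizes listed above follow directly from the percentages. As a minimal sketch (the helper name `split_counts` is hypothetical and not part of the project's notebook), the arithmetic can be checked like this:

```python
def split_counts(total, fractions):
    """Return the number of images per split for the given fractions."""
    counts = {name: round(total * frac) for name, frac in fractions.items()}
    # Guard against rounding drift so the splits always cover the whole dataset.
    if sum(counts.values()) != total:
        raise ValueError("split fractions do not partition the dataset")
    return counts


counts = split_counts(60, {"train": 0.70, "valid": 0.20, "test": 0.10})
print(counts)  # {'train': 42, 'valid': 12, 'test': 6}
```

Note that the 15 external images are held outside these splits, so they do not appear in the counts.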

3. Mathematical Framework

3.1 Performance Metrics

Precision: Measures the accuracy of positive predictions

$$\text{Precision} = \frac{TP}{TP + FP}$$

where $TP$ denotes true positives and $FP$ denotes false positives.

Recall: Measures the ability to find all positive instances

$$\text{Recall} = \frac{TP}{TP + FN}$$

where $FN$ denotes false negatives.

F1 Score: Harmonic mean of precision and recall

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
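The three formulas above translate directly into code. The sketch below uses made-up counts purely for illustration; they are not taken from this project's results.

```python
def precision(tp, fp):
    """TP / (TP + FP), or 0.0 when there are no positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """TP / (TP + FN), or 0.0 when there are no positive instances."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only.
tp, fp, fn = 8, 2, 1
p = precision(tp, fp)   # 8 / 10
r = recall(tp, fn)      # 8 / 9
f1 = f1_score(p, r)     # equals 2*TP / (2*TP + FP + FN) = 16 / 19
print(round(p, 3), round(r, 3), round(f1, 3))
```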

mAP (mean Average Precision): Mean of the per-class average precision (AP) at specified IoU thresholds

$\text{mAP}_{50}$: Mean AP at an IoU threshold of 0.50
$\text{mAP}_{50\text{-}95}$: Mean AP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05
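Both mAP variants depend on the IoU (intersection over union) between a predicted mask and a ground-truth mask. The following is a small pure-Python sketch of mask IoU on toy data; the masks are hypothetical, and a real evaluation would rely on the Ultralytics validator rather than this helper.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0


# The ten IoU thresholds from 0.50 to 0.95 in steps of 0.05.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Toy 2x4 masks: 6 predicted pixels, 8 true pixels, 6 shared pixels.
pred = [[1, 1, 1, 0],
        [1, 1, 1, 0]]
truth = [[1, 1, 1, 1],
         [1, 1, 1, 1]]

iou = mask_iou(pred, truth)                      # 6 / 8
cleared = [t for t in IOU_THRESHOLDS if iou >= t]
print(iou, cleared)
```

A prediction with IoU 0.75 would count as a match at the looser thresholds but not at 0.80 and above, which is why mAP50-95 is typically lower than mAP50.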

3.2 Training Objective

The YOLOv8 segmentation loss function combines:

Box Loss: Bounding box regression loss (localization)
Segmentation Loss: Mask prediction loss (pixel-level accuracy)
Classification Loss: Class prediction loss
DFL Loss: Distribution focal loss for box regression

The total loss function can be expressed as:

$$\mathcal{L}_{\text{total}} = \lambda_{\text{box}} \mathcal{L}_{\text{box}} + \lambda_{\text{seg}} \mathcal{L}_{\text{seg}} + \lambda_{\text{cls}} \mathcal{L}_{\text{cls}} + \lambda_{\text{dfl}} \mathcal{L}_{\text{dfl}}$$

where $\lambda_i$ represents the weight coefficient for each loss component.
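To make the weighted sum concrete, here is a minimal sketch. The component values and the weights are placeholders chosen for easy arithmetic; they are not the coefficients used in this project or the Ultralytics defaults.

```python
def total_loss(components, weights):
    """Weighted sum of named loss components."""
    missing = set(components) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in components.items())


# Hypothetical per-batch loss values and weights, for illustration only.
components = {"box": 0.5, "seg": 1.0, "cls": 0.4, "dfl": 1.0}
weights = {"box": 2.0, "seg": 2.0, "cls": 0.5, "dfl": 1.5}

loss = total_loss(components, weights)
print(loss)  # about 4.7 (1.0 + 2.0 + 0.2 + 1.5), up to floating-point rounding
```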

4. Requirements

requirements.txt

ultralytics>=8.0.0
roboflow>=1.0.0
pandas>=1.3.0

5. Installation & Configuration

5.1 Environment Setup

# Clone the repository
git clone https://github.com/kemalkilicaslan/Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset.git
cd Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset

# Install required packages
pip install -r requirements.txt

Note: This project was developed using Google Colab with GPU acceleration (Tesla T4). For local execution, ensure you have:

Python 3.8+
CUDA-compatible GPU (recommended)
At least 8GB RAM

5.2 Project Structure

Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset
├── input/
│   ├── car1.jpg - car5.jpg
│   ├── pickup1.jpg - pickup5.jpg
│   └── truck1.jpg - truck5.jpg
├── output/
│   └── (segmented prediction images)
├── Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset.ipynb
├── confusion_matrix.png
├── results.png
├── README.md
├── requirements.txt
└── LICENSE

5.3 Required Files

Dataset Access:

Roboflow API key (required for dataset download)
Dataset: vehicle-segmentation-yvbo4 (version 5)
Workspace: kemalkilicaslan-bgq6q

Pre-trained Model:

yolov8x-seg.pt (automatically downloaded during training)

6. Usage / How to Run

6.1 Dataset Download from Roboflow

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("kemalkilicaslan-bgq6q").project("vehicle-segmentation-yvbo4")
version = project.version(5).download("yolov8")
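After the download it can be useful to confirm how many images ended up in each split. The sketch below assumes the usual Roboflow YOLOv8 export layout (`train/images`, `valid/images`, `test/images` under the dataset folder); the helper name `count_split_images` is hypothetical, and the demo builds a throwaway folder so it runs without the real dataset.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_split_images(dataset_root):
    """Count image files in each split folder of a YOLO-style export."""
    root = Path(dataset_root)
    counts = {}
    for split in ("train", "valid", "test"):
        image_dir = root / split / "images"
        if image_dir.is_dir():
            counts[split] = sum(
                1 for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            counts[split] = 0
    return counts


# Demo on a temporary folder with a made-up layout (3 train, 2 valid, no test).
with tempfile.TemporaryDirectory() as tmp:
    for split, n in (("train", 3), ("valid", 2)):
        image_dir = Path(tmp) / split / "images"
        image_dir.mkdir(parents=True)
        for i in range(n):
            (image_dir / f"img{i}.jpg").touch()
    demo_counts = count_split_images(tmp)

print(demo_counts)  # {'train': 3, 'valid': 2, 'test': 0}
```

With the real download, the same helper could presumably be called as `count_split_images(version.location)`, assuming the object returned by `download()` exposes the dataset folder as `location`. If augmentation was applied at export time, the training count may be larger than the 42 source images.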

6.2 Model Training

CLI:

yolo task=segment mode=train model=yolov8x-seg.pt \
  data="/path/to/vehicle-segmentation-5/data.yaml" \
  epochs=50 imgsz=640

Python:

from ultralytics import YOLO

# Initialize model
model = YOLO('yolov8x-seg.pt')

# Train model
model.train(
    data='/path/to/vehicle-segmentation-5/data.yaml',
    epochs=50,
    imgsz=640,
    task='segment'
)

6.3 Model Validation

yolo task=segment mode=val \
  model=/path/to/runs/segment/train/weights/best.pt \
  data="/path/to/vehicle-segmentation-5/data.yaml"

6.4 Prediction on New Images

CLI:

yolo task=segment mode=predict \
  model=/path/to/runs/segment/train/weights/best.pt \
  conf=0.85 \
  source="/path/to/test/images/*.jpg"

Python:

from ultralytics import YOLO

# Load trained model
model = YOLO('/path/to/best.pt')

# Run prediction
results = model.predict(
    source='/path/to/test/images',
    conf=0.85,
    save=True
)
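The prediction call returns one result object per image. As a hedged sketch of how the detections might be inspected, the helper below pairs class names with confidence scores. The helper name and the stand-in values are hypothetical, and the class index order (car, pickup, truck) is an assumption about how the dataset was exported.

```python
def summarize_detections(names, class_ids, confidences):
    """Pair each detection's class name with its confidence, highest first."""
    rows = [(names[int(c)], float(s)) for c, s in zip(class_ids, confidences)]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-in values so the helper runs without a trained model.
names = {0: "car", 1: "pickup", 2: "truck"}
summary = summarize_detections(names, [2, 0], [0.91, 0.97])
print(summary)  # [('car', 0.97), ('truck', 0.91)]

# With the `results` list from model.predict, the helper could be fed as:
# for r in results:
#     print(summarize_detections(r.names, r.boxes.cls.tolist(), r.boxes.conf.tolist()))
```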

7. Application / Results

7.1 Training Results

The model was trained for 50 epochs with the following final metrics:

Metric	Value
Box mAP50	99.5%
Box mAP50-95	90.3%
Mask mAP50	99.5%
Mask mAP50-95	87.2%
Training Time	0.308 hours (~18.5 minutes)
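Metrics like these are normally logged per epoch during training. The sketch below assumes the training run wrote a `results.csv` file with an `epoch` column and a `metrics/mAP50-95(M)` column, which is how recent Ultralytics versions appear to name the mask metric; both names should be checked against the actual file. The sample text is made up and is not this project's log.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP50-95(M)"):
    """Return (epoch, score) for the row with the highest value of `metric`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    best = None
    for row in reader:
        # Some versions pad column names and values with spaces, so strip both.
        clean = {key.strip(): value.strip() for key, value in row.items()}
        score = float(clean[metric])
        if best is None or score > best[1]:
            best = (int(float(clean["epoch"])), score)
    return best


# Hypothetical excerpt with made-up numbers.
sample = "epoch, metrics/mAP50-95(M)\n1, 0.412\n2, 0.655\n3, 0.640\n"
print(best_epoch(sample))  # (2, 0.655)
```

For a real run, the text could be read with `Path("/path/to/runs/segment/train/results.csv").read_text()` and passed to the same function.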

7.2 Class-wise Performance

Class	Precision	Recall	F1 Score	Box mAP50-95	Mask mAP50-95
Car	99.1%	100%	0.995	89.7%	89.9%
Pickup	90.5%	100%	0.950	91.9%	81.5%
Truck	100%	79.6%	0.886	89.4%	90.3%
Overall	96.5%	93.2%	0.948	90.3%	87.2%
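The Overall row is consistent with an unweighted (macro) average of the three class rows. The short check below recomputes it from the table values. It also shows that the overall F1 of 0.948 corresponds to the F1 of the averaged precision and recall, not to the mean of the per-class F1 scores, which would be about 0.944.

```python
per_class = {
    "car":    {"precision": 99.1,  "recall": 100.0, "box_map": 89.7, "mask_map": 89.9},
    "pickup": {"precision": 90.5,  "recall": 100.0, "box_map": 91.9, "mask_map": 81.5},
    "truck":  {"precision": 100.0, "recall": 79.6,  "box_map": 89.4, "mask_map": 90.3},
}


def macro_average(rows, key):
    """Unweighted mean of one column across all classes."""
    return sum(row[key] for row in rows.values()) / len(rows)


def f1(p, r):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * p * r / (p + r)


overall = {
    key: round(macro_average(per_class, key), 1)
    for key in ("precision", "recall", "box_map", "mask_map")
}
print(overall)  # {'precision': 96.5, 'recall': 93.2, 'box_map': 90.3, 'mask_map': 87.2}

# F1 computed from the averaged precision and recall (values are percentages).
f1_of_averages = round(f1(overall["precision"] / 100, overall["recall"] / 100), 3)

# Mean of the per-class F1 scores, for comparison.
mean_of_f1s = round(
    sum(f1(r["precision"] / 100, r["recall"] / 100) for r in per_class.values()) / 3, 3
)
print(f1_of_averages, mean_of_f1s)  # 0.948 0.944
```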

7.3 Training Progress

Confusion Matrix: see confusion_matrix.png in the repository root.

Training Metrics: see results.png in the repository root.
7.4 Prediction Examples

The model was tested on 15 completely new vehicle images (5 per class) to evaluate its real-world performance. All 15 images were segmented with confidence scores above 85%, in line with the conf=0.85 threshold used for prediction in Section 6.4. On this small sample, the result suggests that the model classifies and segments the three vehicle types reliably across the conditions tested.

7.5 Performance Analysis

Model Capabilities Demonstrated:

Accurate classification of vehicle types across diverse scenarios
Precise pixel-level segmentation masks that closely follow vehicle contours
Robust performance under various lighting conditions (daylight, shadows, overcast)
Effective handling of different vehicle orientations and viewing angles
High confidence scores indicating strong model certainty

Strengths:

Excellent overall accuracy (99.5% mAP50)
Perfect recall for cars and pickups (100%)
High precision across all classes (>90%)
Robust to various vehicle orientations and environmental conditions
Successful generalization to completely unseen test images

Areas for Improvement:

Truck recall could be improved (79.6%) - some trucks may be confused with pickups
Additional training data for edge cases and rare vehicle configurations
Testing on more diverse vehicle models and manufacturers
Performance evaluation in challenging weather conditions

8. Tech Stack

8.1 Core Technologies

Programming Language: Python 3.12
Deep Learning Framework: PyTorch 2.8.0
Model Architecture: Ultralytics YOLOv8x-seg
Dataset Management: Roboflow
Development Environment: Google Colab (Tesla T4 GPU)

8.2 Libraries & Dependencies

Library	Version	Purpose
ultralytics	8.3.205	YOLOv8 implementation and training
roboflow	1.0+	Dataset management and download
torch	2.8.0	Deep learning computations
opencv-python	4.12.0	Image processing
pandas	Latest	Data analysis and metrics
numpy	2.0.2	Numerical computations

8.3 Model Specifications

YOLOv8x-seg Architecture:

Total Layers: 231 (training) / 125 (inference, fused)
Parameters: 71,753,737 (≈71.8M)
GFLOPs: 328.8
Model Size: 143.9 MB
Input Resolution: 640×640 pixels
Output Classes: 3 (car, pickup, truck)

Training Configuration:

Optimizer: AdamW (lr=0.001429, momentum=0.9)
Batch Size: 16
Image Size: 640×640
Epochs: 50
AMP: Enabled (Automatic Mixed Precision)
Mosaic Augmentation: Epochs 1-40
Warmup Epochs: 3
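Two of the values above look like they come from Ultralytics defaults rather than manual settings. The sketch below is a reconstruction under that assumption, not a confirmed description of the training run. When the optimizer is left on automatic selection, Ultralytics appears to pick AdamW with a learning rate of 0.002 × 5 / (4 + number of classes). The `close_mosaic` setting (default 10) disables mosaic augmentation for the final epochs. The function names here are hypothetical.

```python
def auto_adamw_lr(num_classes):
    """Learning-rate heuristic: 0.002 * 5 / (4 + number of classes)."""
    return round(0.002 * 5 / (4 + num_classes), 6)


def mosaic_epochs(total_epochs, close_mosaic=10):
    """Number of initial epochs that still use mosaic augmentation."""
    return max(total_epochs - close_mosaic, 0)


lr = auto_adamw_lr(3)               # three classes: car, pickup, truck
mosaic_until = mosaic_epochs(50)    # 50 epochs, mosaic off for the last 10
print(lr, mosaic_until)  # 0.001429 40
```

Both results agree with the configuration listed above (lr=0.001429, mosaic for epochs 1-40), which supports the assumption without proving it.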

9. Dataset Details

9.1 Data Collection

Source Images:

20 car images (various models and angles)
20 pickup truck images (different makes)
20 truck images (commercial vehicles)

Annotation Method:

Manual polygon annotation using Roboflow
Pixel-precise segmentation masks
Quality-controlled annotations

9.2 Preprocessing

All images underwent standardized preprocessing:

Auto-orientation: Correct EXIF orientation
Resizing: 640×640 pixels (maintaining aspect ratio)
Normalization: Pixel values scaled appropriately

9.3 Augmentation Techniques

Technique	Parameters	Purpose
Horizontal Flip	50% probability	Increase orientation diversity
Random Crop	0-20% zoom	Simulate varying distances
Rotation	±10 degrees	Handle tilted vehicles
Grayscale	100% conversion	Reduce color dependency
Blur	2px	Simulate motion/focus variations
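Geometric augmentations must transform the polygon annotations together with the image. Roboflow handles this at export time; the sketch below only illustrates the idea for a horizontal flip, assuming the YOLO segmentation label format (a class index followed by normalized x y pairs). The function name and the sample label are hypothetical.

```python
def flip_label_horizontally(label_line):
    """Mirror a YOLO segmentation label: each normalized x becomes 1 - x."""
    parts = label_line.split()
    class_id = parts[0]
    coords = [float(value) for value in parts[1:]]
    if len(coords) % 2:
        raise ValueError("polygon needs an even number of coordinates")
    flipped = []
    for x, y in zip(coords[0::2], coords[1::2]):
        flipped += [1.0 - x, y]  # y is unchanged by a horizontal flip
    return " ".join([class_id] + [f"{value:.6g}" for value in flipped])


# Made-up triangle for class 2, with normalized coordinates.
line = "2 0.25 0.5 0.75 0.5 0.5 0.875"
flipped_line = flip_label_horizontally(line)
print(flipped_line)  # 2 0.75 0.5 0.25 0.5 0.5 0.875
```

Flipping twice returns the original label, which is a quick sanity check for this kind of transform.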

10. License

This project is open source and available under the Apache License 2.0.

11. References

Ultralytics YOLOv8 Documentation.
Roboflow Dataset Management, Detect and Annotate Documentation.

Acknowledgments

Special thanks to:

Ultralytics for developing and maintaining the YOLOv8 framework
Roboflow for providing excellent dataset management and annotation tools
Google Colab for providing free GPU resources for model training

Note: This project is intended for educational and research purposes. When deploying vehicle recognition systems in production environments, ensure compliance with relevant regulations regarding automated surveillance and data privacy.
