1. Introduction
This project demonstrates the implementation of a custom vehicle recognition and segmentation system using YOLOv8 trained on a specially curated vehicle dataset. The system has been designed to identify and segment three distinct vehicle types (cars, pickups, and trucks) using instance segmentation techniques that provide pixel-level precision in vehicle identification.
The project addresses the growing demand for automated vehicle classification in applications such as traffic monitoring, parking management, toll collection systems, and intelligent transportation systems. By training a custom YOLOv8 segmentation model on a specialized dataset, the system achieves high accuracy in distinguishing between similar vehicle categories that typically pose challenges for general-purpose models.
The implementation showcases a complete machine learning workflow, from data collection and preprocessing through model training, evaluation, and deployment. Using Roboflow for dataset management and annotation, and Ultralytics YOLOv8 for model training, the project offers a practical example of custom object segmentation for domain-specific applications.
Core Features:
- Custom dataset of 60 images (20 per vehicle class)
- Data augmentation techniques to enhance model robustness
- YOLOv8x-seg architecture for instance segmentation
- 50-epoch training with comprehensive metric tracking
- High accuracy: 99.5% mAP50 on validation set
- Pixel-level vehicle segmentation masks
2. Methodology / Approach
The project follows a structured deep learning workflow for custom vehicle segmentation built on YOLOv8's instance segmentation models. The methodology combines dataset preparation, data augmentation, and fine-tuning of a pre-trained model.
Dataset Preparation: The custom dataset consists of 60 manually selected and annotated vehicle images distributed equally across three classes (cars, pickups, trucks). Images were annotated using Roboflow's segmentation tools, creating precise polygon masks for each vehicle instance.
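As a rough illustration of what the polygon annotations look like on disk, the sketch below parses one label line. It assumes the usual YOLO segmentation text format (a class index followed by normalized `x y` pairs); the function name and the sample coordinates are made up for this example and are not taken from the dataset.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_seg_label(line: str) -> Tuple[int, List[Point]]:
    """Parse one YOLO segmentation label line into (class_id, polygon)."""
    fields = line.split()
    class_id = int(fields[0])
    coords = [float(v) for v in fields[1:]]
    # A polygon needs at least three points, given as x y pairs.
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("expected at least three x y pairs")
    polygon = list(zip(coords[0::2], coords[1::2]))
    return class_id, polygon


# Illustrative line with made-up coordinates (a rectangle-shaped polygon).
class_id, polygon = parse_seg_label("2 0.10 0.20 0.90 0.20 0.90 0.80 0.10 0.80")
print(class_id, len(polygon))
```

Each polygon vertex is expressed as a fraction of image width and height, so the same label stays valid when the image is resized.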
Data Augmentation: To enhance model generalization and prevent overfitting on the small dataset, multiple augmentation techniques were applied, including horizontal flips, random crops (0-20% zoom), rotation (±10°), grayscale conversion, and blur effects.
Model Training: The YOLOv8x-seg model (extra-large variant) was initialized with pre-trained weights and fine-tuned for 50 epochs on the custom dataset. The training utilized automatic mixed precision (AMP) for faster computation and included mosaic augmentation for the first 40 epochs.
2.1 System Architecture
The system comprises four main components:
- Data Collection & Annotation: Manual selection and polygon annotation of vehicle images
- Preprocessing & Augmentation: Image standardization and augmentation pipeline
- Model Training: Custom YOLOv8x-seg training with transfer learning
- Evaluation & Testing: Performance assessment on unseen vehicle images
2.2 Dataset Split
The 60-image dataset was divided as follows:
- Training Set (70%): 42 images for model learning
- Validation Set (20%): 12 images for hyperparameter tuning
- Test Set (10%): 6 images for final evaluation
Additionally, 15 completely new images (5 per class) were used for external validation to assess real-world performance.
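The split sizes above follow directly from the percentages. The sketch below reproduces the arithmetic and shows one way a seeded 70/20/10 split could be drawn; in this project the split was handled through Roboflow, so the helper names and file names here are purely hypothetical.

```python
import random


def split_counts(total, train_pct=70, val_pct=20, test_pct=10):
    """Return (train, val, test) sizes using integer arithmetic."""
    if train_pct + val_pct + test_pct != 100:
        raise ValueError("percentages must sum to 100")
    val = total * val_pct // 100
    test = total * test_pct // 100
    train = total - val - test  # any rounding remainder goes to training
    return train, val, test


def split_items(items, seed=0):
    """Shuffle with a fixed seed and cut into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = split_counts(len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


names = [f"image_{i:02d}.jpg" for i in range(60)]  # hypothetical file names
train, val, test = split_items(names)
print(len(train), len(val), len(test))  # 42 12 6
```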
3. Mathematical Framework
3.1 Performance Metrics
Precision: Measures the accuracy of positive predictions
$$\text{Precision} = \frac{TP}{TP + FP}$$
where $TP$ denotes true positives and $FP$ denotes false positives.
Recall: Measures the ability to find all positive instances
$$\text{Recall} = \frac{TP}{TP + FN}$$
where $FN$ denotes false negatives.
F1 Score: Harmonic mean of precision and recall
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
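The three formulas above can be checked with a few lines of Python. This is a minimal sketch; the function name and the counts in the example are illustrative and do not come from the project's results.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, round(f1, 3))
```

The zero-denominator guards matter on small validation sets, where a class may have no predictions at all.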
mAP (mean Average Precision): Average precision (AP) averaged across all classes at specified IoU thresholds
- $\text{mAP}_{50}$: Mean AP at an IoU threshold of 0.50
- $\text{mAP}_{50-95}$: Mean AP averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05
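To make the role of the IoU threshold concrete, here is a small sketch of box IoU and the ten thresholds used for the 0.50 to 0.95 average. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the example boxes are made up; mask IoU follows the same intersection-over-union idea applied to pixel sets instead of rectangles.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Two made-up boxes that overlap on half of their width.
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
matches = [iou >= t for t in thresholds]
print(round(iou, 3), any(matches))
```

A prediction with IoU of about 0.33 counts as a miss at every threshold, which is why loose masks hurt $\text{mAP}_{50-95}$ much more than $\text{mAP}_{50}$.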
3.2 Training Objective
The YOLOv8 segmentation loss combines four terms:
- Box Loss: Bounding box regression loss (localization)
- Segmentation Loss: Mask prediction loss (pixel-level accuracy)
- Classification Loss: Class prediction loss
- DFL Loss: Distribution focal loss for box regression
The total loss function can be expressed as:
$$\mathcal{L}_{\text{total}} = \lambda_{\text{box}} \mathcal{L}_{\text{box}} + \lambda_{\text{seg}} \mathcal{L}_{\text{seg}} + \lambda_{\text{cls}} \mathcal{L}_{\text{cls}} + \lambda_{\text{dfl}} \mathcal{L}_{\text{dfl}}$$
where $\lambda_i$ represents the weight coefficient for each loss component.
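The weighted sum above can be sketched in a few lines. The loss values and weights below are placeholders chosen for readable arithmetic; they are not the coefficients used by Ultralytics or by this project's training run.

```python
def total_loss(losses, weights):
    """Weighted sum of loss components keyed by name."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Placeholder per-batch loss values and weights (illustrative only).
losses = {"box": 0.5, "seg": 1.0, "cls": 0.4, "dfl": 0.9}
weights = {"box": 2.0, "seg": 2.0, "cls": 0.5, "dfl": 1.0}

print(round(total_loss(losses, weights), 3))  # 4.1
```

Raising a weight makes the optimizer prioritize that term, so a larger $\lambda_{\text{seg}}$ would trade some classification quality for tighter masks.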
4. Requirements
requirements.txt
ultralytics>=8.0.0
roboflow>=1.0.0
pandas>=1.3.0
5. Installation & Configuration
5.1 Environment Setup
# Clone the repository
git clone https://github.com/kemalkilicaslan/Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset.git
cd Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset
# Install required packages
pip install -r requirements.txt
Note: This project was developed using Google Colab with GPU acceleration (Tesla T4). For local execution, ensure you have:
- Python 3.8+
- CUDA-compatible GPU (recommended)
- At least 8GB RAM
5.2 Project Structure
Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset
├── input/
│ ├── car1.jpg - car5.jpg
│ ├── pickup1.jpg - pickup5.jpg
│ └── truck1.jpg - truck5.jpg
├── output/
│ └── (segmented prediction images)
├── Vehicle-Recognition-with-Segmentation-Training-on-a-Custom-Dataset.ipynb
├── confusion_matrix.png
├── results.png
├── README.md
├── requirements.txt
└── LICENSE
5.3 Required Files
Dataset Access:
- Roboflow API key (required for dataset download)
- Dataset: vehicle-segmentation-yvbo4 (version 5)
- Workspace: kemalkilicaslan-bgq6q
Pre-trained Model:
- yolov8x-seg.pt (automatically downloaded during training)
6. Usage / How to Run
6.1 Dataset Download from Roboflow
from roboflow import Roboflow
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("kemalkilicaslan-bgq6q").project("vehicle-segmentation-yvbo4")
# download() returns the downloaded dataset; dataset.location is its local folder
dataset = project.version(5).download("yolov8")
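To avoid committing the API key to a notebook or repository, one option is to read it from an environment variable. The sketch below assumes a variable named `ROBOFLOW_API_KEY`; that name and the helper function are conventions chosen for this example, not something the Roboflow SDK requires.

```python
import os


def get_roboflow_api_key(env_var="ROBOFLOW_API_KEY"):
    """Read the API key from the environment, failing loudly if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set the {env_var} environment variable first")
    return key


# Intended use with the download snippet above (not executed here):
#   rf = Roboflow(api_key=get_roboflow_api_key())
```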
6.2 Model Training
CLI:
yolo task=segment mode=train model=yolov8x-seg.pt \
data="/path/to/vehicle-segmentation-5/data.yaml" \
epochs=50 imgsz=640
Python:
from ultralytics import YOLO
# Initialize model
model = YOLO('yolov8x-seg.pt')
# Train model
model.train(
    data='/path/to/vehicle-segmentation-5/data.yaml',
    epochs=50,
    imgsz=640,
    task='segment'
)
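After training, per-epoch metrics can be inspected to find the best epoch. The sketch below assumes the run directory contains a `results.csv` with a column such as `metrics/mAP50-95(M)` for mask mAP; the column name is an assumption about the Ultralytics log format, and the sample rows are invented. It uses the standard `csv` module, though the same lookup is easy with pandas.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP50-95(M)"):
    """Return (epoch, value) for the row with the highest metric value."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    # Some versions pad the header names with spaces, so strip keys and values.
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
    best = max(rows, key=lambda row: float(row[metric]))
    return int(float(best["epoch"])), float(best[metric])


# Invented sample in the assumed layout; real files have many more columns.
sample = """epoch, metrics/mAP50(M), metrics/mAP50-95(M)
1, 0.71, 0.40
2, 0.90, 0.62
3, 0.88, 0.58
"""
print(best_epoch(sample))  # (2, 0.62)
```

For a real run, pass the file contents instead, for example `open('/path/to/runs/segment/train/results.csv').read()`.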
6.3 Model Validation
yolo task=segment mode=val \
model=/path/to/runs/segment/train/weights/best.pt \
data="/path/to/vehicle-segmentation-5/data.yaml"
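The same validation can be run from Python. The sketch below assumes the object returned by `model.val()` exposes `box` and `seg` metric groups with `map50` and `map` attributes; the helper names are hypothetical, and the stand-in object simply reuses the figures reported in section 7.1 so the summary logic can be shown without a GPU.

```python
from types import SimpleNamespace


def summarize_metrics(metrics):
    """Collect headline box and mask mAP values as percentages."""
    return {
        "box_mAP50": round(100 * metrics.box.map50, 1),
        "box_mAP50-95": round(100 * metrics.box.map, 1),
        "mask_mAP50": round(100 * metrics.seg.map50, 1),
        "mask_mAP50-95": round(100 * metrics.seg.map, 1),
    }


def validate(weights, data_yaml):
    """Run Ultralytics validation and summarize it (needs ultralytics installed)."""
    from ultralytics import YOLO  # imported lazily so the helper above stays standalone

    model = YOLO(weights)
    return summarize_metrics(model.val(data=data_yaml))


# Stand-in with the same attribute layout, filled with the values from section 7.1.
reported = SimpleNamespace(
    box=SimpleNamespace(map50=0.995, map=0.903),
    seg=SimpleNamespace(map50=0.995, map=0.872),
)
print(summarize_metrics(reported))
```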
6.4 Prediction on New Images
CLI:
yolo task=segment mode=predict \
model=/path/to/runs/segment/train/weights/best.pt \
conf=0.85 \
source="/path/to/test/images/*.jpg"
Python:
from ultralytics import YOLO
# Load trained model
model = YOLO('/path/to/best.pt')
# Run prediction
results = model.predict(
    source='/path/to/test/images',
    conf=0.85,
    save=True
)
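The list returned by `model.predict` can be summarized per image. This sketch assumes each result exposes `names`, `boxes.cls`, and `boxes.conf` as in recent Ultralytics releases; the helper names and the class-index mapping in the example are assumptions made for illustration.

```python
from collections import Counter


def count_classes(names, class_ids, confidences, min_conf=0.85):
    """Count detections per class name, keeping those at or above min_conf."""
    counts = Counter()
    for class_id, conf in zip(class_ids, confidences):
        if conf >= min_conf:
            counts[names[int(class_id)]] += 1
    return dict(counts)


def summarize_results(results):
    """Apply count_classes to each result returned by model.predict."""
    return [
        count_classes(r.names, r.boxes.cls.tolist(), r.boxes.conf.tolist())
        for r in results
    ]


# Hypothetical index-to-name mapping and detections for a single image.
names = {0: "car", 1: "pickup", 2: "truck"}
print(count_classes(names, [0.0, 2.0, 2.0], [0.97, 0.91, 0.60]))
```

Since `conf=0.85` is already passed to `predict`, the `min_conf` filter is redundant for that call; it is kept so the helper also works on unfiltered results.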
7. Application / Results
7.1 Training Results
The model was trained for 50 epochs with the following final metrics:
| Metric | Value |
|---|---|
| Box mAP50 | 99.5% |
| Box mAP50-95 | 90.3% |
| Mask mAP50 | 99.5% |
| Mask mAP50-95 | 87.2% |
| Training Time | 0.308 hours (~18.5 minutes) |
7.2 Class-wise Performance
| Class | Precision | Recall | F1 Score | Box mAP50-95 | Mask mAP50-95 |
|---|---|---|---|---|---|
| Car | 99.1% | 100% | 0.995 | 89.7% | 89.9% |
| Pickup | 90.5% | 100% | 0.950 | 91.9% | 81.5% |
| Truck | 100% | 79.6% | 0.886 | 89.4% | 90.3% |
| Overall | 96.5% | 93.2% | 0.948 | 90.3% | 87.2% |
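The Overall row can be reproduced from the per-class rows. The sketch below treats the overall precision, recall, and mAP values as unweighted (macro) means of the three classes and computes the overall F1 from the averaged precision and recall, which is the reading that matches the table; the variable and function names are chosen for this example.

```python
per_class = {
    "Car": {"precision": 99.1, "recall": 100.0, "box": 89.7, "mask": 89.9},
    "Pickup": {"precision": 90.5, "recall": 100.0, "box": 91.9, "mask": 81.5},
    "Truck": {"precision": 100.0, "recall": 79.6, "box": 89.4, "mask": 90.3},
}


def macro_average(rows, key):
    """Unweighted mean of one column across classes, to one decimal place."""
    return round(sum(row[key] for row in rows.values()) / len(rows), 1)


def f1_score(precision_pct, recall_pct):
    """F1 from precision and recall given as percentages."""
    p, r = precision_pct / 100, recall_pct / 100
    return round(2 * p * r / (p + r), 3)


overall = {key: macro_average(per_class, key) for key in ("precision", "recall", "box", "mask")}
overall_f1 = f1_score(overall["precision"], overall["recall"])
class_f1 = {name: f1_score(row["precision"], row["recall"]) for name, row in per_class.items()}

print(overall, overall_f1)
print(class_f1)
```

Averaging the three per-class F1 scores instead would give about 0.944, so the 0.948 in the table is the F1 of the averaged precision and recall.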
7.3 Training Progress
Confusion Matrix: see confusion_matrix.png
Training Metrics: see results.png
7.4 Prediction Examples
The model was tested on 15 completely new vehicle images (5 per class) to evaluate its real-world performance. All test images were segmented with confidence scores above the 0.85 threshold used for prediction, which indicates that the model classifies and segments vehicles reliably on images outside the training data.
7.5 Performance Analysis
Model Capabilities Demonstrated:
- Accurate classification of vehicle types across diverse scenarios
- Precise pixel-level segmentation masks that closely follow vehicle contours
- Robust performance under various lighting conditions (daylight, shadows, overcast)
- Effective handling of different vehicle orientations and viewing angles
- High confidence scores indicating strong model certainty
Strengths:
- Excellent overall accuracy (99.5% mAP50)
- Perfect recall for cars and pickups (100%)
- High precision across all classes (>90%)
- Robust to various vehicle orientations and environmental conditions
- Successful generalization to completely unseen test images
Areas for Improvement:
- Truck recall could be improved (79.6%) - some trucks may be confused with pickups
- Additional training data for edge cases and rare vehicle configurations
- Testing on more diverse vehicle models and manufacturers
- Performance evaluation in challenging weather conditions
8. Tech Stack
8.1 Core Technologies
- Programming Language: Python 3.12
- Deep Learning Framework: PyTorch 2.8.0
- Model Architecture: Ultralytics YOLOv8x-seg
- Dataset Management: Roboflow
- Development Environment: Google Colab (Tesla T4 GPU)
8.2 Libraries & Dependencies
| Library | Version | Purpose |
|---|---|---|
| ultralytics | 8.3.205 | YOLOv8 implementation and training |
| roboflow | 1.0+ | Dataset management and download |
| torch | 2.8.0 | Deep learning computations |
| opencv-python | 4.12.0 | Image processing |
| pandas | Latest | Data analysis and metrics |
| numpy | 2.0.2 | Numerical computations |
8.3 Model Specifications
YOLOv8x-seg Architecture:
- Total Layers: 231 (training) / 125 (inference, fused)
- Parameters: 71,753,737 (≈71.8M)
- GFLOPs: 328.8
- Model Size: 143.9 MB
- Input Resolution: 640×640 pixels
- Output Classes: 3 (car, pickup, truck)
Training Configuration:
- Optimizer: AdamW (lr=0.001429, momentum=0.9)
- Batch Size: 16
- Image Size: 640×640
- Epochs: 50
- AMP: Enabled (Automatic Mixed Precision)
- Mosaic Augmentation: Epochs 1-40
- Warmup Epochs: 3
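The learning rate and mosaic range listed above look like values chosen automatically by the trainer rather than set by hand. The sketch below shows arithmetic that reproduces them, assuming an automatic rule of `0.002 * 5 / (4 + number_of_classes)` for the AdamW learning rate and mosaic being switched off for the last 10 epochs; both rules are my reading of Ultralytics' default behavior, not something stated in this project.

```python
def auto_adamw_lr(num_classes):
    """Learning rate under the assumed automatic rule, rounded to 6 decimals."""
    return round(0.002 * 5 / (4 + num_classes), 6)


def mosaic_epoch_range(total_epochs, close_mosaic=10):
    """First and last epoch (1-based) with mosaic augmentation enabled."""
    last = max(total_epochs - close_mosaic, 0)
    return (1, last)


print(auto_adamw_lr(3))        # 0.001429 for car, pickup, truck
print(mosaic_epoch_range(50))  # (1, 40)
```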
9. Dataset Details
9.1 Data Collection
Source Images:
- 20 car images (various models and angles)
- 20 pickup truck images (different makes)
- 20 truck images (commercial vehicles)
Annotation Method:
- Manual polygon annotation using Roboflow
- Pixel-precise segmentation masks
- Quality-controlled annotations
9.2 Preprocessing
All images underwent standardized preprocessing:
- Auto-orientation: Correct EXIF orientation
- Resizing: 640×640 pixels (maintaining aspect ratio)
- Normalization: Pixel values scaled appropriately
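One common way to reach 640×640 while keeping the aspect ratio is to scale the longer side to 640 and pad the rest. The source does not say which resize mode was used, so the sketch below is only one plausible interpretation; the function name and the example image size are hypothetical.

```python
def letterbox_geometry(width, height, target=640):
    """Scaled size and padding needed to fit an image into a square canvas."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_w, target - new_h
    return {
        "size": (new_w, new_h),
        "pad_left": pad_x // 2,
        "pad_right": pad_x - pad_x // 2,
        "pad_top": pad_y // 2,
        "pad_bottom": pad_y - pad_y // 2,
    }


# A hypothetical 1280x720 photo shrinks to 640x360 with 140 px bars above and below.
geometry = letterbox_geometry(1280, 720)
print(geometry)
```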
9.3 Augmentation Techniques
| Technique | Parameters | Purpose |
|---|---|---|
| Horizontal Flip | 50% probability | Increase orientation diversity |
| Random Crop | 0-20% zoom | Simulate varying distances |
| Rotation | ±10 degrees | Handle tilted vehicles |
| Grayscale | 100% conversion | Reduce color dependency |
| Blur | 2px | Simulate motion/focus variations |
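Geometric augmentations have to be applied to the polygon masks as well as to the pixels. The sketch below shows how a horizontal flip and a small rotation would move normalized polygon vertices, assuming square images so that normalized coordinates rotate without distortion. Roboflow performs these transformations internally; the functions here are illustrative only.

```python
import math


def hflip_polygon(polygon):
    """Mirror normalized (x, y) vertices around the vertical center line."""
    return [(1.0 - x, y) for x, y in polygon]


def rotate_polygon(polygon, degrees, center=(0.5, 0.5)):
    """Rotate vertices around the image center (square images assumed)."""
    angle = math.radians(degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cx, cy = center
    rotated = []
    for x, y in polygon:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return rotated


triangle = [(0.1, 0.2), (0.4, 0.2), (0.4, 0.6)]  # made-up polygon
flipped = hflip_polygon(triangle)
tilted = rotate_polygon(triangle, 10)  # within the ±10° range from the table
print(flipped)
```

Vertices that leave the unit square after rotation would need clipping, which this sketch leaves out.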
10. License
This project is open source and available under the Apache License 2.0.
11. References
- Ultralytics YOLOv8 Documentation.
- Roboflow Dataset Management, Detect and Annotate Documentation.
Acknowledgments
Special thanks to:
- Ultralytics for developing and maintaining the YOLOv8 framework
- Roboflow for providing excellent dataset management and annotation tools
- Google Colab for providing free GPU resources for model training
Note: This project is intended for educational and research purposes. When deploying vehicle recognition systems in production environments, ensure compliance with relevant regulations regarding automated surveillance and data privacy.