1. Introduction
This project implements a real-time multi-class road scene segmentation system using a custom-trained YOLO26x-seg model on the Ultralytics Platform. The system processes road footage and generates pixel-level segmentation masks for five classes organized into two semantic groups: Road Surface Obstacles (pothole, crack, patch) and Traffic Warning Objects (traffic-cone, road-barrier).
Autonomous vehicles and Advanced Driver Assistance Systems (ADAS) rarely fail because of flawed object recognition models — they fail because the perception layer was never trained to recognize the specific elements that compromise drivable surface continuity. Cracks, potholes, patches, traffic cones, and road barriers sit at the exact intersection of vehicle safety and driving comfort, yet they are almost never addressed as a single unified detection problem.
This implementation demonstrates an end-to-end perception layer that unifies surface-level hazards and structural obstacles into a single inference pass, reducing latency and pipeline complexity compared to fragmented multi-model approaches.
Core Features:
- Real-time pixel-level segmentation across 5 road scene classes
- Two semantic groups: Road Surface Obstacles (pothole, crack, patch) and Traffic Warning Objects (traffic-cone, road-barrier)
- Order-aware mask compositing — larger masks painted first, smaller (closer) instances stay visible on top
- Anti-aliased contour outlining via OpenCV findContours + drawContours
- Adaptive label rendering with luma-based text-color selection for legibility on any background
- Mask resolution synchronization to original frame dimensions (nearest-neighbor interpolation)
- Annotated video recording with class + confidence labels per instance
- Custom-trained model: 96.9% mAP50(M), 80.3% mAP50-95(M), 95.7% Precision(M), 92.6% Recall(M)
2. Methodology / Approach
The system runs YOLO26x-seg inference on each video frame, retrieves binary segmentation masks together with class IDs and confidence scores, and renders the result in two layered passes — first a translucent class-colored mask composite, then anti-aliased contours and adaptive labels on top.
2.1 System Architecture
The road scene segmentation pipeline consists of:
- YOLO Segmentation: Detect and segment all instances of pothole, traffic-cone, road-barrier, patch, and crack in each frame
- Mask Resolution Sync: Resize binary masks to the original frame dimensions using nearest-neighbor interpolation when the inference resolution differs
- Order-Aware Mask Compositing: Sort masks by area and paint the largest first, ensuring smaller (typically closer) instances stay visible on top via alpha blending
- Contour Drawing: Compute external contours per mask and draw smooth anti-aliased outlines in the corresponding class color
- Adaptive Label Overlay: Render class + confidence labels at each instance's anchor point with luma-based text-color selection for legibility
- Video Output: Annotated frames written to the output video file
2.2 Processing Pipeline
[Video Input]
↓
[YOLO Segmentation] → [Masks + Class IDs + Confidences]
↓
[Mask Resolution Sync] (Nearest-Neighbor Interpolation)
↓
[Order-Aware Mask Compositing] (Largest-First, Alpha Blending)
↓
[Contour Drawing + Adaptive Label Overlay]
↓
[Video Output]
2.3 Implementation Strategy
The implementation uses the Ultralytics YOLO framework for inference and OpenCV for video processing and annotation. Visualization is performed in two passes to keep the rendered output predictable and legible: first, all masks are alpha-blended onto a copy of the original frame in descending area order so that smaller instances are painted last and remain visible on top. Second, contours and class+confidence labels are drawn over the composited frame using anti-aliased line rendering. Foreground text color is selected automatically using a Rec. 601 luma threshold so that labels stay legible on light, dark, and saturated backgrounds alike.
3. Mathematical Framework
3.1 Mask IoU and Non-Maximum Suppression
For two binary segmentation masks $A$ and $B$, the Intersection over Union is:
$$\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
During inference, Non-Maximum Suppression removes overlapping predictions of the same class when their IoU exceeds the threshold $\tau_{IoU} = 0.9$:
$$\text{Keep}(d_i) = \begin{cases} 1 & \text{if } \text{IoU}(d_i, d_j) \leq \tau_{IoU} \text{ for all } d_j \text{ with higher score} \\ 0 & \text{otherwise} \end{cases}$$
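The mask IoU above translates directly to a NumPy reference implementation (helper name illustrative):

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two binary masks: |A ∩ B| / |A ∪ B|."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # two empty masks: define IoU as 0 to avoid 0/0
    return float(np.logical_and(a, b).sum() / union)
```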
3.2 Alpha-Blended Mask Overlay
Per-class colored masks are composited with the original frame using linear interpolation:
$$I_{out}(x, y) = (1 - \alpha) \cdot I_{overlay}(x, y) + \alpha \cdot I_{frame}(x, y)$$
where $\alpha = 0.4$ controls mask transparency. Masks are painted in descending area order so that smaller (typically closer) instances remain visible on top:
$$\sigma = \text{argsort}\left(\{|M_i|\}_{i=1}^{N}\right)_{\text{descending}}$$
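A worked numeric example of the blend equation and the largest-first ordering (pixel values chosen for illustration only):

```python
import numpy as np

ALPHA = 0.4  # as in Section 3.2: 0.0 = opaque overlay, 1.0 = invisible

# One pixel: class color (overlay) blended with the original frame pixel
overlay_px = np.array([0, 60, 255], dtype=float)   # pothole color (BGR)
frame_px = np.array([100, 100, 100], dtype=float)
blended = (1 - ALPHA) * overlay_px + ALPHA * frame_px  # 0.6 * color + 0.4 * frame

# Largest-first paint order from mask areas |M_i|
areas = np.array([5000, 120, 900])
order = np.argsort(-areas)  # descending: paint index 0, then 2, then 1
```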
3.3 Luma-Based Adaptive Text Color
Foreground text color is selected based on the perceived brightness (Rec. 601 luma) of the background color:
$$Y = 0.299 R + 0.587 G + 0.114 B$$
$$\text{TextColor} = \begin{cases} (0, 0, 0) & \text{if } Y > 160 \text{ (light background)} \\ (255, 255, 255) & \text{otherwise (dark background)} \end{cases}$$
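The selection rule maps directly to code (helper name illustrative; the input is a BGR tuple, as used by OpenCV):

```python
def adaptive_text_color(bgr: tuple) -> tuple:
    """Pick black or white text from the Rec. 601 luma of a BGR background."""
    b, g, r = bgr
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return (0, 0, 0) if luma > 160 else (255, 255, 255)
```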
3.4 Performance Metrics
Segmentation evaluation uses standard mask-based metrics:
$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}$$
$$\text{mAP}_{50} = \frac{1}{N} \sum_{c=1}^{N} AP_c^{50}, \quad \text{mAP}_{50\text{-}95} = \frac{1}{10 N} \sum_{c=1}^{N} \sum_{t \in \mathcal{T}} AP_c^{t}$$
where $\mathcal{T} = \{0.50, 0.55, 0.60, \ldots, 0.95\}$ and $N$ is the number of classes. The $(M)$ suffix denotes mask-based evaluation as opposed to bounding-box-based $(B)$.
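The two aggregations can be checked on hypothetical per-class AP values (the numbers below are invented for illustration, not the model's actual APs):

```python
import numpy as np

# Hypothetical AP for 2 classes at each IoU threshold in T = {0.50, ..., 0.95}
thresholds = np.linspace(0.50, 0.95, 10)
ap = np.array([[0.95, 0.93, 0.90, 0.85, 0.80, 0.72, 0.60, 0.45, 0.30, 0.10],
               [0.90, 0.88, 0.85, 0.80, 0.70, 0.60, 0.50, 0.35, 0.20, 0.05]])
assert thresholds.size == ap.shape[1]  # one AP column per threshold

map50 = ap[:, 0].mean()   # mean over classes of AP at IoU = 0.50
map50_95 = ap.mean()      # mean over classes and all 10 thresholds
```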
4. Dataset
Dataset Name: Road Surface Obstacle and Traffic Warning Object Segmentation
Platform: Ultralytics Platform (Public)
License: CC BY-NC-ND 4.0
Total Images: 23
Total Annotations: 249
Image Format: WEBP
Mean Image Size: 1,784.8 × 1,050 px (Mean AR: 1.82)
Mean File Size: 188.9 KB
Mean Objects per Image: 11.7
Mean Polygon Points per Annotation: 36.4
Split Distribution:
| Split | Images | Percentage |
|---|---|---|
| Train | 16 | 69.6% |
| Validation | 7 | 30.4% |
Class Distribution:
| Index | Class | Annotations | Images |
|---|---|---|---|
| 0 | pothole | 67 (26.9%) | 11 |
| 4 | crack | 67 (26.9%) | 13 |
| 1 | traffic-cone | 59 (23.7%) | 12 |
| 2 | road-barrier | 34 (13.7%) | 9 |
| 3 | patch | 22 (8.8%) | 9 |
| — | Total | 249 | — |

Note: per-class image counts overlap because multiple classes can appear in the same image; the dataset contains 23 unique images in total.
5. Model
Model Name: road-surface-obstacle-and-traffic-warning-object-segmentation.pt
Platform: Ultralytics Platform
License: AGPL-3.0
Architecture: YOLO26x-seg (Ultralytics)
Training Hardware: NVIDIA RTX PRO 6000
Classes: 5 (pothole, traffic-cone, road-barrier, patch, crack)
Model Metrics:
| Metric | Value |
|---|---|
| mAP50(M) | 96.9% |
| mAP50-95(M) | 80.3% |
| Precision(M) | 95.7% |
| Recall(M) | 92.6% |
Training Notes:
- End-to-end training on the Ultralytics Platform — annotation with SAM 3, cloud GPU training, browser-based prediction
- Mask-based metrics (M) reflect pixel-level segmentation quality rather than bounding-box overlap
- The high mAP50(M) value reflects a controlled training distribution; production deployment requires expanding the dataset with diverse lighting, weather, and surface conditions
6. Requirements
opencv-python>=4.8.0
numpy>=1.24.0
ultralytics>=8.0.0
7. Installation & Configuration
7.1 Environment Setup
# Clone the repository
git clone https://github.com/kemalkilicaslan/Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System.git
cd Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System
# Install required packages
pip install -r requirements.txt
7.2 Project Structure
Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System/
├── Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System.py
├── README.md
├── requirements.txt
└── LICENSE
7.3 Required Files
- Custom YOLO Model: road-surface-obstacle-and-traffic-warning-object-segmentation.pt (place in project directory)
- Input Video: Road scene video file (MP4, MOV, AVI)
8. Usage / How to Run
8.1 Basic Execution
python Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System.py
8.2 Configuration Parameters
# Model and inference configuration
MODEL_PATH = "road-surface-obstacle-and-traffic-warning-object-segmentation.pt"
CONFIDENCE_THRESHOLD = 0.6 # Minimum confidence for a detection to be kept
IOU_THRESHOLD = 0.9 # NMS IoU threshold during inference
MASK_ALPHA = 0.4 # Mask transparency: 0.0 = opaque, 1.0 = invisible
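These constants map onto the Ultralytics predict call roughly as follows. This is a sketch, not a full excerpt of the script: it assumes the model file and input video are present in the working directory.

```python
import cv2
from ultralytics import YOLO

MODEL_PATH = "road-surface-obstacle-and-traffic-warning-object-segmentation.pt"
CONFIDENCE_THRESHOLD = 0.6
IOU_THRESHOLD = 0.9

model = YOLO(MODEL_PATH)  # loads the custom-trained weights
capture = cv2.VideoCapture("Road-Surface-Obstacle-and-Traffic-Warning-Object.mp4")
ok, frame = capture.read()
if ok:
    # conf / iou correspond to the thresholds defined above
    results = model(frame, conf=CONFIDENCE_THRESHOLD, iou=IOU_THRESHOLD,
                    verbose=False)
    masks = results[0].masks  # None when no instance passes the filters
```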
8.3 Input / Output
# Update these lines in the script for your video
video_capture = cv2.VideoCapture("Road-Surface-Obstacle-and-Traffic-Warning-Object.mp4")
output_file = "Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation.mp4"
8.4 Controls
- Press q to quit the application during playback
8.5 Class Color Coding
| Group | Class | BGR Color |
|---|---|---|
| Road Surface Obstacles | pothole | (0, 60, 255) |
| Road Surface Obstacles | crack | (10, 35, 10) |
| Road Surface Obstacles | patch | (170, 235, 35) |
| Traffic Warning Objects | traffic-cone | (220, 230, 20) |
| Traffic Warning Objects | road-barrier | (235, 235, 235) |
9. Application / Results
9.1 Input Video
Road Surface Obstacle and Traffic Warning Object:
9.2 Output Video
Road Surface Obstacle and Traffic Warning Object Segmentation:
9.3 Dataset Overview
Dataset & Charts:

Class Distribution:

Dataset Charts 1:

Dataset Charts 2:

9.4 Model Metrics
Training Metrics & Loss Curves:

mAP50(M): 96.9% | mAP50-95(M): 80.3% | Precision(M): 95.7% | Recall(M): 92.6%
10. Tech Stack
10.1 Core Technologies
- Programming Language: Python 3.8+
- Computer Vision: OpenCV 4.8+
- Deep Learning Framework: Ultralytics YOLO 8.0+
- Numerical Computing: NumPy 1.24+
- Training Platform: Ultralytics Platform
10.2 Libraries & Dependencies
| Library | Version | Purpose |
|---|---|---|
| opencv-python | 4.8+ | Video I/O, mask compositing, contour drawing, label rendering |
| ultralytics | 8.0+ | YOLO26x-seg model inference |
| numpy | 1.24+ | Array operations, mask resizing, area-based ordering |
10.3 Algorithm Components
| Component | Method | Purpose |
|---|---|---|
| Object Segmentation | Custom YOLO26x-seg | Detect and segment 5 road scene classes |
| Mask Resolution Sync | Nearest-neighbor interpolation | Align inference masks with frame dimensions |
| Mask Compositing | Largest-first alpha blending | Translucent class-colored mask overlay |
| Contour Drawing | OpenCV findContours + drawContours | Smooth anti-aliased instance outlines |
| Adaptive Text Color | Rec. 601 luma threshold | Legible labels on any background color |
10.4 Detection Parameters
| Parameter | Value | Description |
|---|---|---|
| Confidence Threshold | 0.6 | Minimum confidence for detection retention |
| IoU Threshold | 0.9 | NMS overlap threshold during inference |
| Mask Alpha | 0.4 | Mask blending transparency (0 = opaque, 1 = invisible) |
| Contour Thickness | 2 px | Anti-aliased instance outline width |
| Luma Threshold | 160 | Background brightness cutoff for text-color selection |
11. License
This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Commercial use, modifications, and derivative works are not permitted. Attribution is required upon use.
12. References
- Ultralytics Platform Documentation — Model training, inference, and deployment.
- Ultralytics Platform Road Surface Obstacle and Traffic Warning Object Segmentation System — Dataset annotation, training, and export.
- Ultralytics YOLO26 Documentation — Architecture and segmentation training.
- OpenCV Video I/O and Drawing Functions Documentation.
Acknowledgments
Special thanks to the Ultralytics team for the YOLO26x-seg architecture and the Ultralytics Platform, which was used end-to-end for dataset annotation (SAM 3), cloud GPU training (RTX PRO 6000), and model export. Thanks to the OpenCV community for providing excellent real-time video processing and drawing tools.
Note: This system is designed for research, educational, and ADAS prototyping purposes. Detection accuracy may vary depending on camera angle, lighting, weather conditions, and road surface variation. The dataset is intentionally compact to demonstrate end-to-end workflow on the Ultralytics Platform; for production deployment in autonomous driving or road infrastructure monitoring, the dataset should be expanded with diverse real-world conditions (different lighting, occlusion scenarios, weather, regional surface types) and the model retrained accordingly. The perception layer should be paired with uncertainty signaling so that a higher-level agent can defer decisions when encountering objects outside the training distribution.