1. Introduction
This project implements a real-time multi-class road scene segmentation system using a custom-trained YOLO26x-seg model on the Ultralytics Platform. The system processes road footage and generates pixel-level segmentation masks for five classes organized into two semantic groups: Road Surface Obstacles (pothole, crack, patch) and Traffic Warning Objects (traffic-cone, road-barrier).
Autonomous vehicles and Advanced Driver Assistance Systems (ADAS) rarely fail because of flawed object recognition models — they fail because the perception layer was never trained to recognize the specific elements that compromise drivable surface continuity. Cracks, potholes, patches, traffic cones, and road barriers sit at the exact intersection of vehicle safety and driving comfort, yet they are almost never addressed as a single unified detection problem.
This implementation demonstrates an end-to-end perception layer that unifies surface-level hazards and structural obstacles into a single inference pass, reducing latency and pipeline complexity compared to fragmented multi-model approaches.
Core Features:
- Real-time pixel-level segmentation across 5 road scene classes
- Two semantic groups: Road Surface Obstacles (pothole, crack, patch) and Traffic Warning Objects (traffic-cone, road-barrier)
- Order-aware mask compositing — larger masks painted first, smaller (closer) instances stay visible on top
- Anti-aliased contour outlining via OpenCV findContours + drawContours
- Adaptive label rendering with luma-based text-color selection for legibility on any background
- Mask resolution synchronization to original frame dimensions (nearest-neighbor interpolation)
- Annotated video recording with class + confidence labels per instance
- Custom-trained model: 96.9% mAP50(M), 80.3% mAP50-95(M), 95.7% Precision(M), 92.6% Recall(M)
2. Methodology / Approach
The system runs YOLO26x-seg inference on each video frame, retrieves binary segmentation masks together with class IDs and confidence scores, and renders the result in two layered passes — first a translucent class-colored mask composite, then anti-aliased contours and adaptive labels on top.
2.1 System Architecture
The road scene segmentation pipeline consists of:
- YOLO Segmentation: Detect and segment all instances of pothole, traffic-cone, road-barrier, patch, and crack in each frame
- Mask Resolution Sync: Resize binary masks to the original frame dimensions using nearest-neighbor interpolation when the inference resolution differs
- Order-Aware Mask Compositing: Sort masks by area and paint the largest first, ensuring smaller (typically closer) instances stay visible on top via alpha blending
- Contour Drawing: Compute external contours per mask and draw smooth anti-aliased outlines in the corresponding class color
- Adaptive Label Overlay: Render class + confidence labels at each instance's anchor point with luma-based text-color selection for legibility
- Video Output: Annotated frames written to the output video file
2.2 Processing Pipeline
[Video Input]
↓
[YOLO Segmentation] → [Masks + Class IDs + Confidences]
↓
[Mask Resolution Sync] (Nearest-Neighbor Interpolation)
↓
[Order-Aware Mask Compositing] (Largest-First, Alpha Blending)
↓
[Contour Drawing + Adaptive Label Overlay]
↓
[Video Output]
2.3 Implementation Strategy
The implementation uses the Ultralytics YOLO framework for inference and OpenCV for video processing and annotation. Visualization is performed in two passes to keep the rendered output predictable and legible: first, all masks are alpha-blended onto a copy of the original frame in descending area order so that smaller instances are painted last and remain visible on top. Second, contours and class+confidence labels are drawn over the composited frame using anti-aliased line rendering. Foreground text color is selected automatically using a Rec. 601 luma threshold so that labels stay legible on light, dark, and saturated backgrounds alike.
3. Mathematical Framework
3.1 Mask IoU and Non-Maximum Suppression
For two binary segmentation masks $A$ and $B$, the Intersection over Union is:
$$\text{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
During inference, Non-Maximum Suppression removes overlapping predictions of the same class when their IoU exceeds the threshold $\tau_{IoU} = 0.9$:
$$\text{Keep}(d_i) = \begin{cases} 1 & \text{if } \text{IoU}(d_i, d_j) \leq \tau_{IoU} \text{ for all } d_j \text{ with higher score} \\ 0 & \text{otherwise} \end{cases}$$
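The mask IoU above translates directly to a NumPy reference implementation (helper name illustrative):

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two binary masks: |A ∩ B| / |A ∪ B|."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # two empty masks: define IoU as 0 to avoid 0/0
    return float(np.logical_and(a, b).sum() / union)
```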
3.2 Alpha-Blended Mask Overlay
Per-class colored masks are composited with the original frame using linear interpolation:
$$I_{out}(x, y) = (1 - \alpha) \cdot I_{overlay}(x, y) + \alpha \cdot I_{frame}(x, y)$$
where $\alpha = 0.4$ controls mask transparency. Masks are painted in descending area order so that smaller (typically closer) instances remain visible on top:
$$\sigma = \text{argsort}\left(\{|M_i|\}_{i=1}^{N}\right)_{\text{descending}}$$
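A worked numeric example of the blend equation and the largest-first ordering (pixel values chosen for illustration only):

```python
import numpy as np

ALPHA = 0.4  # as in Section 3.2: 0.0 = opaque overlay, 1.0 = invisible

# One pixel: class color (overlay) blended with the original frame pixel
overlay_px = np.array([0, 60, 255], dtype=float)   # pothole color (BGR)
frame_px = np.array([100, 100, 100], dtype=float)
blended = (1 - ALPHA) * overlay_px + ALPHA * frame_px  # 0.6 * color + 0.4 * frame

# Largest-first paint order from mask areas |M_i|
areas = np.array([5000, 120, 900])
order = np.argsort(-areas)  # descending: paint index 0, then 2, then 1
```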
3.3 Luma-Based Adaptive Text Color
Foreground text color is selected based on the perceived brightness (Rec. 601 luma) of the background color:
$$Y = 0.299 R + 0.587 G + 0.114 B$$
$$\text{TextColor} = \begin{cases} (0, 0, 0) & \text{if } Y > 160 \text{ (light background)} \\ (255, 255, 255) & \text{otherwise (dark background)} \end{cases}$$
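The selection rule maps directly to code (helper name illustrative; the input is a BGR tuple, as used by OpenCV):

```python
def adaptive_text_color(bgr: tuple) -> tuple:
    """Pick black or white text from the Rec. 601 luma of a BGR background."""
    b, g, r = bgr
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return (0, 0, 0) if luma > 160 else (255, 255, 255)
```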
3.4 Performance Metrics
Segmentation evaluation uses standard mask-based metrics:
$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}$$
$$\text{mAP}_{50} = \frac{1}{N} \sum_{c=1}^{N} AP_c^{50}, \quad \text{mAP}_{50\text{-}95} = \frac{1}{10 N} \sum_{c=1}^{N} \sum_{t \in \mathcal{T}} AP_c^{t}$$
where $\mathcal{T} = \{0.50, 0.55, 0.60, \ldots, 0.95\}$ and $N$ is the number of classes. The $(M)$ suffix denotes mask-based evaluation as opposed to bounding-box-based $(B)$.
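The two aggregations can be checked on hypothetical per-class AP values (the numbers below are invented for illustration, not the model's actual APs):

```python
import numpy as np

# Hypothetical AP for 2 classes at each IoU threshold in T = {0.50, ..., 0.95}
thresholds = np.linspace(0.50, 0.95, 10)
ap = np.array([[0.95, 0.93, 0.90, 0.85, 0.80, 0.72, 0.60, 0.45, 0.30, 0.10],
               [0.90, 0.88, 0.85, 0.80, 0.70, 0.60, 0.50, 0.35, 0.20, 0.05]])
assert thresholds.size == ap.shape[1]  # one AP column per threshold

map50 = ap[:, 0].mean()   # mean over classes of AP at IoU = 0.50
map50_95 = ap.mean()      # mean over classes and all 10 thresholds
```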
4. Dataset
Dataset Name: Road Surface Obstacle and Traffic Warning Object Segmentation
Platform: Ultralytics Platform (Public)
License: CC BY-NC-ND 4.0
Total Images: 23
Total Annotations: 249
Image Format: WEBP
Mean Image Size: 1,784.8 × 1,050 px (Mean AR: 1.82)
Mean File Size: 188.9 KB
Mean Objects per Image: 11.7
Mean Polygon Points per Annotation: 36.4
Split Distribution:
| Split | Images | Percentage |
|---|---|---|
| Train | 16 | 69.6% |
| Validation | 7 | 30.4% |
Class Distribution:
| Index | Class | Annotations | Images |
|---|---|---|---|
| 0 | pothole | 67 (26.9%) | 11 |
| 4 | crack | 67 (26.9%) | 13 |
| 1 | traffic-cone | 59 (23.7%) | 12 |
| 2 | road-barrier | 34 (13.7%) | 9 |
| 3 | patch | 22 (8.8%) | 9 |
| — | Total | 249 | — |

Note: per-class image counts overlap because multiple classes can appear in the same image; the dataset contains 23 unique images in total.
5. Model
Model Name: road-surface-obstacle-and-traffic-warning-object-segmentation.pt
Platform: Ultralytics Platform
License: AGPL-3.0
Architecture: YOLO26x-seg (Ultralytics)
Training Hardware: NVIDIA RTX PRO 6000
Classes: 5 (pothole, traffic-cone, road-barrier, patch, crack)
Model Metrics:
| Metric | Value |
|---|---|
| mAP50(M) | 96.9% |
| mAP50-95(M) | 80.3% |
| Precision(M) | 95.7% |
| Recall(M) | 92.6% |
Training Notes:
- End-to-end training on the Ultralytics Platform — annotation with SAM 3, cloud GPU training, browser-based prediction
- Mask-based metrics (M) reflect pixel-level segmentation quality rather than bounding-box overlap
- The high mAP50(M) value reflects a controlled training distribution; production deployment requires expanding the dataset with diverse lighting, weather, and surface conditions
6. Requirements
opencv-python>=4.8.0
numpy>=1.24.0
ultralytics>=8.0.0
7. Installation & Configuration
7.1 Environment Setup
# Clone the repository
git clone https://github.com/kemalkilicaslan/Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System.git
cd Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System
# Install required packages
pip install -r requirements.txt
7.2 Project Structure
Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System/
├── Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System.py
├── README.md
├── requirements.txt
└── LICENSE
7.3 Required Files
- Custom YOLO Model: road-surface-obstacle-and-traffic-warning-object-segmentation.pt (place in project directory)
- Input Video: Road scene video file (MP4, MOV, AVI)
8. Usage / How to Run
8.1 Basic Execution
python Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation-System.py
8.2 Configuration Parameters
# Model and inference configuration
MODEL_PATH = "road-surface-obstacle-and-traffic-warning-object-segmentation.pt"
CONFIDENCE_THRESHOLD = 0.6 # Minimum confidence for a detection to be kept
IOU_THRESHOLD = 0.9 # NMS IoU threshold during inference
MASK_ALPHA = 0.4 # Mask transparency: 0.0 = opaque, 1.0 = invisible
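These constants map onto the Ultralytics predict call roughly as follows. This is a sketch, not a full excerpt of the script: it assumes the model file and input video are present in the working directory.

```python
import cv2
from ultralytics import YOLO

MODEL_PATH = "road-surface-obstacle-and-traffic-warning-object-segmentation.pt"
CONFIDENCE_THRESHOLD = 0.6
IOU_THRESHOLD = 0.9

model = YOLO(MODEL_PATH)  # loads the custom-trained weights
capture = cv2.VideoCapture("Road-Surface-Obstacle-and-Traffic-Warning-Object.mp4")
ok, frame = capture.read()
if ok:
    # conf / iou correspond to the thresholds defined above
    results = model(frame, conf=CONFIDENCE_THRESHOLD, iou=IOU_THRESHOLD,
                    verbose=False)
    masks = results[0].masks  # None when no instance passes the filters
```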
8.3 Input / Output
# Update these lines in the script for your video
video_capture = cv2.VideoCapture("Road-Surface-Obstacle-and-Traffic-Warning-Object.mp4")
output_file = "Road-Surface-Obstacle-and-Traffic-Warning-Object-Segmentation.mp4"
8.4 Controls
- Press q to quit the application during playback
8.5 Class Color Coding
| Group | Class | BGR Color |
|---|---|---|
| Road Surface Obstacles | pothole | (0, 60, 255) |
| Road Surface Obstacles | crack | (10, 35, 10) |
| Road Surface Obstacles | patch | (170, 235, 35) |
| Traffic Warning Objects | traffic-cone | (220, 230, 20) |
| Traffic Warning Objects | road-barrier | (235, 235, 235) |
9. Application / Results
9.1 Input Video
Road Surface Obstacle and Traffic Warning Object:
9.2 Output Video
Road Surface Obstacle and Traffic Warning Object Segmentation:
9.3 Dataset Overview
Dataset & Charts:

Class Distribution:

Dataset Charts 1:

Dataset Charts 2:

9.4 Model Metrics
Training Metrics & Loss Curves:

mAP50(M): 96.9% | mAP50-95(M): 80.3% | Precision(M): 95.7% | Recall(M): 92.6%
10. Tech Stack
10.1 Core Technologies
- Programming Language: Python 3.8+
- Computer Vision: OpenCV 4.8+
- Deep Learning Framework: Ultralytics YOLO 8.0+
- Numerical Computing: NumPy 1.24+
- Training Platform: Ultralytics Platform
10.2 Libraries & Dependencies
| Library | Version | Purpose |
|---|---|---|
| opencv-python | 4.8+ | Video I/O, mask compositing, contour drawing, label rendering |
| ultralytics | 8.0+ | YOLO26x-seg model inference |
| numpy | 1.24+ | Array operations, mask resizing, area-based ordering |
10.3 Algorithm Components
| Component | Method | Purpose |
|---|---|---|
| Object Segmentation | Custom YOLO26x-seg | Detect and segment 5 road scene classes |
| Mask Resolution Sync | Nearest-neighbor interpolation | Align inference masks with frame dimensions |
| Mask Compositing | Largest-first alpha blending | Translucent class-colored mask overlay |
| Contour Drawing | OpenCV findContours + drawContours | Smooth anti-aliased instance outlines |
| Adaptive Text Color | Rec. 601 luma threshold | Legible labels on any background color |
10.4 Detection Parameters
| Parameter | Value | Description |
|---|---|---|
| Confidence Threshold | 0.6 | Minimum confidence for detection retention |
| IoU Threshold | 0.9 | NMS overlap threshold during inference |
| Mask Alpha | 0.4 | Mask blending transparency (0 = opaque, 1 = invisible) |
| Contour Thickness | 2 px | Anti-aliased instance outline width |
| Luma Threshold | 160 | Background brightness cutoff for text-color selection |
11. License
This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Commercial use, modifications, and derivative works are not permitted. Attribution is required upon use.
12. References
- Ultralytics Platform Documentation — Model training, inference, and deployment.
- Ultralytics Platform Road Surface Obstacle and Traffic Warning Object Segmentation System — Dataset annotation, training, and export.
- Ultralytics YOLO26 Documentation — Architecture and segmentation training.
- OpenCV Video I/O and Drawing Functions Documentation.
Acknowledgments
Special thanks to the Ultralytics team for the YOLO26x-seg architecture and the Ultralytics Platform, which was used end-to-end for dataset annotation (SAM 3), cloud GPU training (RTX PRO 6000), and model export. Thanks to the OpenCV community for providing excellent real-time video processing and drawing tools.
Note: This system is designed for research, educational, and ADAS prototyping purposes. Detection accuracy may vary depending on camera angle, lighting, weather conditions, and road surface variation. The dataset is intentionally compact to demonstrate end-to-end workflow on the Ultralytics Platform; for production deployment in autonomous driving or road infrastructure monitoring, the dataset should be expanded with diverse real-world conditions (different lighting, occlusion scenarios, weather, regional surface types) and the model retrained accordingly. The perception layer should be paired with uncertainty signaling so that a higher-level agent can defer decisions when encountering objects outside the training distribution.