This project implements a computer vision system for face detection and person recognition using OpenCV and YOLO (You Only Look Once) deep learning models. Its capabilities range from basic face detection in static images to real-time person identification from webcam streams.
Automated face and person recognition is increasingly needed in domains such as security monitoring, attendance automation, photo editing, and access control. By combining a traditional computer vision method (Haar Cascade classifiers) with a modern deep learning technique (YOLO), the system provides both lightweight face detection and model-based person identification.
The implementation is modular and processes both static images and live video streams. Users can select the module that fits their needs, whether that is simple face detection or model-based person recognition.
Core Features:
The project employs two distinct approaches for computer vision tasks:
Face Detection: Utilizes OpenCV's Haar Cascade classifier (haarcascade_frontalface_default.xml), a machine learning-based approach where a cascade function is trained with positive and negative images. This traditional method is computationally efficient and suitable for basic face detection tasks.
Person Recognition: Uses YOLO deep learning models that are custom-trained to identify specific individuals. YOLO processes the entire image in a single forward pass, which makes it suitable for real-time applications.
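The system is organized into five independent modules, each designed for a specific use case:
- Detect-Faces-in-Photo.py: face detection in a static image
- Detect-Faces-in-Video.py: face detection in a video file
- Real-Time-Face-Detection.py: face detection on a live webcam stream
- Person-Recognition-in-Photo-and-Video.py: person recognition in photos and videos
- Real-Time-Person-Recognition.py: person recognition on a live webcam stream

Each module is implemented as a standalone Python script, allowing flexible deployment based on requirements. The face detection modules use OpenCV's pre-trained Haar Cascade classifier for rapid detection, while the person recognition modules use the Ultralytics YOLO framework with custom-trained models to identify specific individuals. All real-time modules include a graceful exit mechanism (press 'q' to quit) and proper resource cleanup.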
The Haar Cascade classifier detects faces with a sliding-window approach. Each stage of the cascade combines weak classifiers into a weighted vote:
$$F(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} \alpha_i h_i(x) \geq \theta \\ 0 & \text{otherwise} \end{cases}$$
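where:
- $h_i(x)$: the $i$-th weak classifier applied to window $x$
- $\alpha_i$: the weight of the $i$-th weak classifier
- $n$: the number of weak classifiers
- $\theta$: the decision threshold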
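The weighted vote above can be sketched in a few lines of Python. This is only an illustration of the formula: the weak classifiers, weights, and threshold below are hypothetical values, not ones taken from `haarcascade_frontalface_default.xml`.

```python
from typing import Callable, Sequence


def strong_classify(
    x: Sequence[float],
    weak_classifiers: Sequence[Callable[[Sequence[float]], int]],
    alphas: Sequence[float],
    theta: float,
) -> int:
    """Return 1 if the weighted sum of weak-classifier votes reaches theta, else 0."""
    score = sum(alpha * h(x) for alpha, h in zip(alphas, weak_classifiers))
    return 1 if score >= theta else 0


# Hypothetical weak classifiers: each one thresholds a single feature value.
weak = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 0.2 else 0,
    lambda x: 1 if x[2] < 0.9 else 0,
]
alphas = [0.8, 0.5, 0.3]  # hypothetical weights
theta = 0.8               # hypothetical threshold

accepted = strong_classify([0.7, 0.1, 0.5], weak, alphas, theta)   # votes 1, 0, 1
rejected = strong_classify([0.3, 0.3, 0.95], weak, alphas, theta)  # votes 0, 1, 0
```

In the first call the weighted sum is 0.8 + 0.3 = 1.1, which reaches the threshold, so the window is accepted; in the second it is 0.5, so the window is rejected.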
Haar-like features are rectangular features, each calculated as the difference between the pixel sums of adjacent regions:
$$f_{\text{haar}} = \sum_{\text{white region}} I(x,y) - \sum_{\text{black region}} I(x,y)$$
where $I(x,y)$ represents pixel intensity at position $(x,y)$.
The integral image allows fast feature calculation:
$$II(x,y) = \sum_{x' \leq x, y' \leq y} I(x',y')$$
Any rectangular sum can be computed in constant time:
$$\text{Sum} = II(D) + II(A) - II(B) - II(C)$$
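where $A$, $B$, $C$, and $D$ are the top-left, top-right, bottom-left, and bottom-right corner points of the rectangle, so each rectangular sum needs only four lookups.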
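The integral image and the four-lookup rectangle sum can be illustrated with a small pure-Python sketch. It assumes a grayscale image stored as a list of rows and pads the integral image with an extra zero row and column, a common convention that avoids boundary checks; the function names are hypothetical, and OpenCV's internal implementation is different.

```python
def integral_image(img):
    """Build a zero-padded integral image: ii[y][x] is the sum of img[0:y][0:x]."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), using four lookups."""
    a = ii[y][x]          # A: top-left
    b = ii[y][x + w]      # B: top-right
    c = ii[y + h][x]      # C: bottom-left
    d = ii[y + h][x + w]  # D: bottom-right
    return d + a - b - c


def two_rect_haar_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left (white) half minus right (black) half."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return white - black


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
inner = rect_sum(ii, 1, 1, 2, 2)                   # 6 + 7 + 10 + 11
feature = two_rect_haar_feature(ii, 0, 0, 4, 3)    # 33 - 45
```

Once the integral image is built, every rectangle sum costs the same regardless of the rectangle's size, which is what makes evaluating many features per window affordable.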
The original YOLO formulation divides the image into an $S \times S$ grid. Each grid cell predicts bounding boxes, each with a confidence score:
$$\text{Confidence Score} = P(\text{object}) \times \text{IoU}_{\text{pred}}^{\text{truth}}$$
Bounding Box Prediction:
$$\text{bbox} = (x, y, w, h, \text{confidence}, c_1, c_2, ..., c_n)$$
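where:
- $(x, y)$: coordinates of the box center
- $(w, h)$: width and height of the box
- $\text{confidence}$: the confidence score of the box
- $c_1, c_2, ..., c_n$: class probabilities for the $n$ classes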
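As a rough illustration of this layout, the sketch below decodes one prediction vector into corner coordinates, a class label, and a combined score. The function name and the class names are hypothetical, and the vector layout follows the formula above rather than the output format of any particular Ultralytics release.

```python
def decode_prediction(pred, class_names):
    """Split (x, y, w, h, confidence, c_1..c_n) into a corner box, label, and score."""
    x, y, w, h, confidence = pred[:5]
    class_scores = pred[5:]
    # Index of the most probable class.
    best = max(range(len(class_scores)), key=lambda i: class_scores[i])
    # Convert the center/size representation to (x1, y1, x2, y2) corners.
    box = (x - w / 2, y - h / 2, x + w / 2, y + h / 2)
    return {
        "box": box,
        "label": class_names[best],
        "score": confidence * class_scores[best],
    }


# Hypothetical prediction with two classes.
result = decode_prediction(
    [100.0, 80.0, 40.0, 60.0, 0.9, 0.1, 0.8],
    ["person_a", "person_b"],
)
```

Here the box centered at (100, 80) with size 40 by 60 becomes the corners (80, 50, 120, 110), and the label is `person_b` with a score of about 0.72.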
YOLO uses non-maximum suppression (NMS) to eliminate redundant detections, measuring the overlap between boxes with Intersection over Union (IoU):
$$\text{IoU}(box_i, box_j) = \frac{\text{Area}(box_i \cap box_j)}{\text{Area}(box_i \cup box_j)}$$
A box is suppressed when its IoU with a higher-confidence box exceeds the threshold, so only the most confident box in each overlapping group is kept.
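A minimal greedy NMS can be sketched as follows. It assumes boxes in `(x1, y1, x2, y2)` corner format and a hypothetical default threshold of 0.5; production frameworks use vectorized and usually class-aware variants, so treat this as an illustration of the idea only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # [0, 2]
```

The first two boxes overlap with an IoU of 81/119 (about 0.68), so the lower-scoring one is dropped, while the third box does not overlap anything and is kept.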
The YOLO loss function combines localization, confidence, and classification losses:
$$\mathcal{L} = \lambda_{\text{coord}} \mathcal{L}_{\text{box}} + \lambda_{\text{obj}} \mathcal{L}_{\text{obj}} + \lambda_{\text{noobj}} \mathcal{L}_{\text{noobj}} + \lambda_{\text{class}} \mathcal{L}_{\text{class}}$$
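where:
- $\mathcal{L}_{\text{box}}$: localization (bounding box) loss
- $\mathcal{L}_{\text{obj}}$ and $\mathcal{L}_{\text{noobj}}$: confidence losses for predictions that do and do not contain an object
- $\mathcal{L}_{\text{class}}$: classification loss
- $\lambda_{\text{coord}}$, $\lambda_{\text{obj}}$, $\lambda_{\text{noobj}}$, $\lambda_{\text{class}}$: weights that balance the four terms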
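The weighted sum can be written out directly. In this sketch the defaults for `lambda_coord` (5.0) and `lambda_noobj` (0.5) follow the values reported for the original YOLO paper, while the other two weights are set to 1.0 as an assumption; the component losses are hypothetical numbers, and the Ultralytics training code uses its own loss terms and weights.

```python
def total_loss(
    l_box,
    l_obj,
    l_noobj,
    l_class,
    lambda_coord=5.0,   # value from the original YOLO paper
    lambda_obj=1.0,     # assumption for this sketch
    lambda_noobj=0.5,   # value from the original YOLO paper
    lambda_class=1.0,   # assumption for this sketch
):
    """Combine the four component losses into one weighted training loss."""
    return (
        lambda_coord * l_box
        + lambda_obj * l_obj
        + lambda_noobj * l_noobj
        + lambda_class * l_class
    )


# Hypothetical component losses for one batch.
loss = total_loss(l_box=0.2, l_obj=0.5, l_noobj=0.4, l_class=0.3)
```

With these numbers the terms contribute 1.0, 0.5, 0.2, and 0.3, which shows how a large `lambda_coord` makes localization errors dominate the total.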
Precision: Proportion of correct positive predictions
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall: Proportion of actual positives correctly identified
$$\text{Recall} = \frac{TP}{TP + FN}$$
F1 Score: Harmonic mean of precision and recall
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
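These three metrics follow directly from the detection counts. The helper below is a small sketch with a hypothetical name; it returns 0.0 when a denominator would be zero, which is one common convention rather than the only one.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed faces.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```

For these counts the precision is 0.8, the recall is about 0.667, and the F1 score is about 0.727.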
Mean Average Precision (mAP):
$$\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
where $AP_i$ is the average precision for class $i$ and $N$ is the number of classes.
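One way to compute $AP_i$ is as the area under the precision-recall curve of the ranked detections for a class. The sketch below uses all-point interpolation, which is an assumption: evaluation tools differ in the interpolation scheme and in the IoU thresholds used to decide which detections count as true positives. The function names and sample data are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class."""
    if num_ground_truth == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical detections for one class with 4 ground-truth objects.
ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, True],
    num_ground_truth=4,
)
map_value = mean_average_precision({"person_a": ap, "person_b": 1.0})
```

In this example the class AP is 0.625, and averaging it with a second class at 1.0 gives an mAP of 0.8125.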
requirements.txt
numpy>=1.19.0
opencv-python>=4.5.0
ultralytics>=8.0.0
# Clone the repository
git clone https://github.com/kemalkilicaslan/Face-Detection-and-Person-Recognition-System.git
cd Face-Detection-and-Person-Recognition-System
# Install required packages
pip install -r requirements.txt
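After installation, a quick check can confirm that the required packages are importable. The sketch below uses only the standard library; the helper name is hypothetical, and the mapping reflects the fact that the `opencv-python` package is imported as `cv2`.

```python
from importlib.util import find_spec

# Package name (as in requirements.txt) mapped to its import name.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "opencv-python": "cv2",
    "ultralytics": "ultralytics",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

This only checks that the modules can be located; it does not verify the installed versions against the minimums in `requirements.txt`.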
Face-Detection-and-Person-Recognition-System
├── Detect-Faces-in-Photo.py
├── Detect-Faces-in-Video.py
├── Real-Time-Face-Detection.py
├── Real-Time-Person-Recognition.py
├── Person-Recognition-in-Photo-and-Video.py
├── haarcascade_frontalface_default.xml
├── README.md
├── requirements.txt
└── LICENSE
For Face Detection:
- haarcascade_frontalface_default.xml: download it from the OpenCV GitHub repository, or use the copy included in the OpenCV installation

For Person Recognition:
- Custom YOLO models (.pt format):
  - Kemal-Kilicaslan.pt
  - How-I-Met-Your-Mother-Person-Recognition-model.pt

python Detect-Faces-in-Photo.py
Requirements:
- Input: HIMYM.jpg (modify in script)
- Output: HIMYM-faces-detected.jpg

python Detect-Faces-in-Video.py
Requirements:
- Input: HIMYM.mp4 (modify in script)
- Output: HIMYM-faces-detected.mp4

python Real-Time-Face-Detection.py
Controls:
- Press q to quit the application

python Person-Recognition-in-Photo-and-Video.py
Configuration:

python Real-Time-Person-Recognition.py
Requirements:
- Kemal-Kilicaslan.pt
Controls:
- Press q to exit

[Media: input and output images, input and output videos, demo videos]
The figures below are approximate; performance varies with hardware and input resolution:
| Metric | Face Detection | Person Recognition (YOLO) |
|---|---|---|
| Processing Speed | 30+ FPS | 15-30 FPS (CPU), 60+ FPS (GPU) |
| Detection Accuracy | 85-95% | 90-98% (with proper training) |
| False Positive Rate | Low (5-10%) | Very Low (2-5%) |
[Camera/Image Input]
↓
[Convert to Grayscale]
↓
[Apply Histogram Equalization] (optional)
↓
[Haar Cascade Detection]
├─ Integral Image Computation
├─ Sliding Window Search
├─ Multi-scale Detection
└─ Cascade Classifier Evaluation
↓
[Filter False Positives]
├─ Minimum Size Filter
├─ Neighbor Grouping
└─ Confidence Threshold
↓
[Draw Bounding Boxes]
↓
[Display/Save Output]
[Camera/Image Input]
↓
[Image Preprocessing]
├─ Resize to Model Input Size
├─ Normalize Pixel Values
└─ Channel Conversion (RGB)
↓
[YOLO Model Inference]
├─ Backbone Feature Extraction
├─ Neck Feature Fusion
└─ Detection Head Prediction
↓
[Post-Processing]
├─ Confidence Filtering
├─ Non-Maximum Suppression
└─ Class-specific Thresholding
↓
[Annotate with Labels & Confidence]
↓
[Display/Save Output]
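The "Resize to Model Input Size" step is often done with an aspect-preserving resize plus padding (letterboxing), and detections are then mapped back to the original frame. The sketch below shows only that geometry; the function names are hypothetical, the 640-pixel target is an assumption, and inference frameworks typically perform this step internally.

```python
def letterbox_params(width, height, target=640):
    """Return (scale, resized_size, padding) for fitting an image into a square input."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) / 2  # horizontal padding on each side
    pad_y = (target - new_h) / 2  # vertical padding on each side
    return scale, (new_w, new_h), (pad_x, pad_y)


def to_original_coords(x, y, scale, padding):
    """Map a point from model-input coordinates back to the original image."""
    return (x - padding[0]) / scale, (y - padding[1]) / scale


# A 1280x720 frame fitted into a 640x640 input.
scale, resized, padding = letterbox_params(1280, 720)
center = to_original_coords(320, 320, scale, padding)
```

For a 1280x720 frame the scale is 0.5, the resized image is 640x360, and 140 pixels of padding are added above and below; the center of the model input maps back to (640, 360) in the original frame.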
Haar Cascade:
| Library | Version | Purpose |
|---|---|---|
| opencv-python | 4.5+ | Image processing, video capture, and face detection |
| ultralytics | 8.0+ | YOLO model implementation and inference |
| numpy | 1.19+ | Array operations (dependency) |
Haar Cascade Classifier:
haarcascade_frontalface_default.xml
Custom YOLO Models:
This project is open source and available under the Apache License 2.0.
Special thanks to the OpenCV and Ultralytics communities for providing excellent computer vision tools and documentation. Sample images and demonstrations use content from "How I Met Your Mother" for educational purposes only. The Haar Cascade face detector was introduced by Viola and Jones (2001) and made real-time face detection practical. The YOLO architecture continues to evolve; this project uses the Ultralytics YOLO framework (version 8.0 or later).
Note: Ensure you have proper permissions and comply with privacy regulations when using facial recognition technology in production environments. This system is intended for educational and research purposes in controlled settings. Always respect individual privacy rights and obtain appropriate consent before deploying face recognition systems. Consider ethical implications and potential biases in facial recognition technology.