This project implements a deep learning-based facial expression recognition system using the FER-2013 dataset and VGG19 architecture with batch normalization. The system can classify facial expressions into seven distinct emotion categories: Angry, Disgust, Fear, Happy, Neutral, Sad, and Surprise.
The project addresses the growing need for emotion recognition across various domains, including human-computer interaction, mental health monitoring, customer sentiment analysis, and entertainment applications. By leveraging transfer learning with VGG19 and custom classifier heads, the system achieves robust performance on grayscale facial images.
The implementation demonstrates practical applications of deep learning for emotion detection, using data augmentation techniques to improve model generalization and prevent overfitting on the relatively small 48×48 pixel images.
Core Features:
The system employs transfer learning with VGG19, a deep convolutional neural network pre-trained on ImageNet. The architecture is adapted for facial expression recognition by replacing the final classification layers with custom fully-connected layers tailored for 7 emotion classes.
The facial expression recognition pipeline consists of data loading and preprocessing, training-time augmentation, VGG19 feature extraction, a custom classifier head, and softmax-based emotion prediction (see the pipeline diagram below).
The implementation uses PyTorch as the deep learning framework with the timm library for accessing pre-trained models. Data augmentation techniques (horizontal flips, rotations) are applied during training to improve model robustness. The custom classifier includes dropout layers (30% rate) to prevent overfitting, and gradient clipping is employed to stabilize training.
The VGG19 architecture consists of 16 convolutional layers with batch normalization, ReLU activations, and max pooling, followed by global average pooling and a custom classifier head:
\[\text{Input} \to \text{Dropout}(0.3) \to \text{Linear}(4096 \to 512) \to \text{ReLU} \to \text{Dropout}(0.3) \to \text{Linear}(512 \to 7)\]
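A minimal sketch of how such a model might be assembled with timm; the notebook's actual `FaceRecognitionModel` may differ, and the `in_chans=1` grayscale adaptation is an assumption:

```python
import timm
import torch.nn as nn

class FaceRecognitionModel(nn.Module):
    """VGG19-BN backbone from timm plus the custom 7-class head above."""
    def __init__(self, dropout_rate=0.3):
        super().__init__()
        # num_classes=0 strips the ImageNet head so the backbone returns
        # 4096-dim feature vectors; in_chans=1 asks timm to adapt the
        # pretrained weights to grayscale input (an assumption here; the
        # notebook may instead replicate images to three channels)
        self.backbone = timm.create_model(
            'vgg19_bn', pretrained=True, num_classes=0, in_chans=1)
        self.classifier = nn.Sequential(
            nn.Dropout(dropout_rate),
            nn.Linear(4096, 512),
            nn.ReLU(),
            nn.Dropout(dropout_rate),
            nn.Linear(512, 7),
        )

    def forward(self, x):
        return self.classifier(self.backbone(x))
```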
Cross-entropy loss for multi-class classification:
\[\mathcal{L} = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \cdot \log(p_{ic})\]
where \(N\) is the number of samples, \(C = 7\) is the number of emotion classes, \(y_{ic}\) is 1 if sample \(i\) belongs to class \(c\) and 0 otherwise, and \(p_{ic}\) is the predicted probability that sample \(i\) belongs to class \(c\).
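In PyTorch this is typically computed with `nn.CrossEntropyLoss`, which combines log-softmax and negative log-likelihood and averages over the batch (a hypothetical mini-example):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()    # log-softmax + NLL; averages over N by default
logits = torch.randn(16, 7)          # raw (pre-softmax) outputs for a batch of 16
labels = torch.randint(0, 7, (16,))  # integer class targets
loss = criterion(logits, labels)
```

The evaluation metric is top-1 accuracy: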
\[\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[\arg\max(p_i) = y_i]\]
where \(\mathbb{1}\) is the indicator function returning 1 for correct predictions.
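A direct PyTorch translation of this metric (a sketch; the notebook may compute it differently):

```python
import torch

def accuracy(logits, labels):
    # Indicator-function average: 1 when argmax(p_i) == y_i, else 0
    preds = logits.argmax(dim=1)
    return (preds == labels).float().mean().item()
```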
Training transformations: random horizontal flip (p = 0.5), random rotation (±20°), and tensor conversion; test images only undergo tensor conversion (sketched below).
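A possible torchvision implementation of these transforms; the exact notebook pipeline may differ, and the `Grayscale` step is an assumption:

```python
from torchvision import transforms

# Training-time augmentation (matches the pipeline diagram below)
train_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # assumption: single-channel input
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(20),                # rotate within ±20°
    transforms.ToTensor(),
])

# Evaluation: no augmentation, only tensor conversion
test_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
])
```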
requirements.txt:

```
numpy>=1.19.0
matplotlib>=3.3.0
plotly>=5.0.0
torch>=1.9.0
torchvision>=0.10.0
timm>=0.4.12
tqdm>=4.62.0
pillow>=8.0.0
```
```bash
# Clone the repository
git clone https://github.com/kemalkilicaslan/Facial-Expression-Recognition-System.git
cd Facial-Expression-Recognition-System

# Install required packages
pip install -r requirements.txt
```
```
Facial-Expression-Recognition-System
├── test/                                    # Test dataset (7,178 images)
│   ├── angry/
│   ├── disgust/
│   ├── fear/
│   ├── happy/
│   ├── neutral/
│   ├── sad/
│   └── surprise/
├── train/                                   # Training dataset (28,709 images)
│   ├── angry/
│   ├── disgust/
│   ├── fear/
│   ├── happy/
│   ├── neutral/
│   ├── sad/
│   └── surprise/
├── Facial-Expression-Recognition-System.ipynb
├── README.md
├── requirements.txt
└── LICENSE
```
FER-2013 Dataset: 35,887 grayscale 48×48-pixel facial images labeled with the seven emotions above, split into 28,709 training and 7,178 test images.
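Given the class-per-folder layout above, the datasets can be loaded with torchvision's `ImageFolder`; this sketch reuses the transform pipelines from earlier and the batch size from the configuration below:

```python
from torch.utils.data import DataLoader
from torchvision import datasets

# ImageFolder infers the seven emotion classes from the directory names
trainset = datasets.ImageFolder('train', transform=train_transform)
testset = datasets.ImageFolder('test', transform=test_transform)

trainloader = DataLoader(trainset, batch_size=16, shuffle=True)
testloader = DataLoader(testset, batch_size=16, shuffle=False)
```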
Open and run the Jupyter notebook:
```bash
jupyter notebook Facial-Expression-Recognition-System.ipynb
```
Or run cells sequentially in Google Colab after uploading the notebook.
```python
# Hyperparameters (can be modified in the notebook)
lr = 0.0001              # Learning rate
batch_size = 16          # Batch size for training
epochs = 20              # Number of training epochs
device = 'cuda'          # 'cuda' for GPU, 'cpu' for CPU
model_name = 'vgg19_bn'  # Model architecture
dropout_rate = 0.3       # Dropout probability
```
```python
# Initialize model
model = FaceRecognitionModel(dropout_rate=0.3)
model.to(device)

# Initialize optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)

# Train the model, keeping the weights with the lowest test loss
best_test_loss = float('inf')
for epoch in range(epochs):
    train_loss, train_acc = train_func(model, trainloader, optimizer, epoch)
    test_loss, test_acc = eval_func(model, testloader, epoch)

    # Save best model
    if test_loss < best_test_loss:
        best_test_loss = test_loss
        torch.save(model.state_dict(), 'best-weights.pt')

# Load best model
model.load_state_dict(torch.load('best-weights.pt'))

# Make predictions
predict(model, testloader, num_class=7)
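`train_func` and `eval_func` are defined in the notebook; a plausible sketch of the training half, including the gradient clipping mentioned above (max-norm 1.0), might look like this:

```python
import torch.nn as nn
from tqdm import tqdm

criterion = nn.CrossEntropyLoss()

def train_func(model, loader, optimizer, epoch):
    """One training epoch; returns (mean loss, accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in tqdm(loader, desc=f'Epoch {epoch}'):
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        # Clip gradient norm at 1.0 to stabilize training
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen
```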
Training Set Example:
Test Set Example:
Loss and Accuracy Over 20 Epochs:
Angry Expression:
Disgust Expression:
Fear Expression:
Happy Expression:
Neutral Expression:
Sad Expression:
Surprise Expression:
Final Model Performance (Epoch 10 - Best Model):
| Metric | Training Set | Test Set |
|---|---|---|
| Loss | 0.8215 | 0.9448 |
| Accuracy | 70.51% | 66.33% |
Per-Emotion Performance:
| Emotion | Recognition Quality |
|---|---|
| Happy | Excellent (98.41%) |
| Surprise | Excellent (96.73%) |
| Angry | Good (59.37%) |
| Neutral | Moderate (50.31%) |
| Fear | Moderate (37.67%) |
| Sad | Challenging (often misclassified as Fear) |
| Disgust | Challenging (often misclassified as Surprise) |
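One way such per-emotion figures can be computed is a per-class accuracy tally over the test loader; a sketch, assuming the `model` and `device` from the training section:

```python
import torch

@torch.no_grad()
def per_class_accuracy(model, loader, num_class=7):
    # Tally correct predictions separately for each ground-truth class
    correct = torch.zeros(num_class)
    total = torch.zeros(num_class)
    model.eval()
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        for c in range(num_class):
            mask = labels == c
            correct[c] += (preds[mask] == c).sum()
            total[c] += mask.sum()
    return correct / total.clamp(min=1)  # avoid div-by-zero for empty classes
```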
```
[FER-2013 Dataset]
        ↓
[Data Loading & Preprocessing]
 ├── Train: 28,709 images
 └── Test: 7,178 images
        ↓
[Data Augmentation (Training Only)]
 ├── Random Horizontal Flip (p=0.5)
 ├── Random Rotation (±20°)
 └── Tensor Conversion
        ↓
[VGG19 Feature Extraction]
 ├── Convolutional Layers (16 layers)
 ├── Batch Normalization
 ├── ReLU Activation
 └── Max Pooling
        ↓
[Global Average Pooling]
        ↓
[Custom Classifier Head]
 ├── Dropout(0.3)
 ├── Linear(4096 → 512)
 ├── ReLU
 ├── Dropout(0.3)
 └── Linear(512 → 7)
        ↓
[Softmax Activation]
        ↓
[Emotion Classification]
 ├── Angry
 ├── Disgust
 ├── Fear
 ├── Happy
 ├── Neutral
 ├── Sad
 └── Surprise
        ↓
[Prediction Output]
 └── Class probabilities + Visualization
```
| Library | Version | Purpose |
|---|---|---|
| torch | 1.9+ | Deep learning framework |
| torchvision | 0.10+ | Image transformations and datasets |
| timm | 0.4.12+ | Pre-trained model access (VGG19) |
| numpy | 1.19+ | Numerical computations |
| matplotlib | 3.3+ | Static visualization |
| plotly | 5.0+ | Interactive visualization |
| tqdm | 4.62+ | Progress bar for training |
| pillow | 8.0+ | Image processing |
VGG19 with Batch Normalization (`vgg19_bn`): a batch-normalization layer follows every convolution, stabilizing and accelerating training. Training configuration:
| Parameter | Value | Purpose |
|---|---|---|
| Optimizer | AdamW | Improved Adam with weight decay |
| Learning Rate | 0.0001 | Step size for gradient descent |
| Weight Decay | 0.01 | L2 regularization |
| Batch Size | 16 | Mini-batch size |
| Epochs | 20 | Training iterations |
| Dropout Rate | 0.3 | Prevent overfitting |
| Gradient Clipping | 1.0 | Prevent gradient explosion |
This project is open source and available under the Apache License 2.0.
This project uses the FER-2013 dataset created for the Facial Expression Recognition Challenge. Special thanks to the PyTorch and timm communities for providing excellent deep learning tools and pre-trained models. The VGG19 architecture was originally developed by the Visual Geometry Group at the University of Oxford.
Note: This system is designed for research and educational purposes. Facial expression recognition should be used responsibly and ethically, with consideration for privacy, consent, and potential biases in emotion detection. The model's performance varies across different emotions, with some expressions (like "disgust" and "sad") being more challenging to classify accurately.