A comprehensive educational project demonstrating why residual connections matter in deep learning through hands-on implementation and visualization.
This project provides a complete, reproducible implementation that:
- Implements ResNet-18 from scratch in PyTorch
- Compares it against a Plain CNN with the same depth
- Visualizes training dynamics, gradient flow, and attention maps
- Explains the theory behind residual learning
| Model | Best Test Accuracy | Parameters |
|---|---|---|
| ResNet-18 | ~92% | 11.2M |
| Plain CNN-18 | ~82% | 11.2M |
| Improvement | +10 pts | Same |
Key Insight: With identical depth and parameters, ResNet dramatically outperforms the plain network, demonstrating that residual connections solve the optimization problem, not just add capacity.
```
Input x
   │
   ├────────────────────┐
   │                    │
Conv 3×3                │
   │                 Identity
BN + ReLU            Shortcut
   │                    │
Conv 3×3                │
   │                    │
  BN                    │
   │                    │
   └──────── + ─────────┘
            │
          ReLU
            │
  Output y = F(x) + x
```
Instead of learning the target mapping H(x) directly, the network learns the residual F(x) = H(x) - x:
- Easy Identity: if the optimal mapping is the identity, the block only needs to push F(x) → 0
- Gradient Highway: the identity path ensures gradients flow directly to earlier layers
- Additive Learning: each block adds a small correction to its input
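The "easy identity" point can be seen with a minimal pure-Python sketch (illustrative only, not the project's PyTorch code): when the learned residual F is zero, the skip connection makes the block an exact identity map.

```python
def residual_block(x, F):
    """Output of a residual block: y = F(x) + x (elementwise skip connection)."""
    return [f + xi for f, xi in zip(F(x), x)]

# If the learned residual F is zero, the block passes its input through unchanged:
zero_residual = lambda x: [0.0] * len(x)
x = [1.5, -2.0, 0.25]
print(residual_block(x, zero_residual))  # [1.5, -2.0, 0.25]
```

A plain layer would instead have to learn an identity mapping through its weights, which is much harder to optimize.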
```
resnet_project/
│
├── model.py          # ResNet-18 and Plain CNN implementations
├── train.py          # Training script with comparison experiment
├── utils.py          # Training utilities and metrics
├── visualize.py      # Static visualization generation
├── gradcam.py        # Grad-CAM attention visualization
├── app.py            # Interactive Streamlit dashboard
├── requirements.txt  # Python dependencies
└── README.md         # This file
```
```shell
# Clone the repository
git clone https://github.com/imjbassi/ResNet.git
cd ResNet

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```shell
# Train both models and compare (50 epochs, ~30 min on GPU)
python train.py --compare --epochs 50

# Or train individually
python train.py --model resnet --epochs 50
python train.py --model plain --epochs 50
```

```shell
# Generate all visualizations
python visualize.py --results-dir ./results/TIMESTAMP

# Or run with demo data
python visualize.py --demo
```

```shell
streamlit run app.py
```

Compares loss and accuracy between ResNet and Plain CNN over training epochs.
Shows how ResNet's advantage over Plain CNN grows during training.
Shows how gradients propagate through layers - ResNet maintains consistent gradients while Plain CNN suffers from vanishing gradients.
Visual explanation of the residual block structure and skip connections.
Complete training analysis dashboard with all metrics.
```python
# Plain networks degrade with depth
plain_18_layer = 0.82  # accuracy
plain_34_layer = 0.78  # WORSE with more layers!
```

```python
class ResidualBlock(nn.Module):
    def forward(self, x):
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # THE KEY: Skip connection
        out += self.shortcut(x)  # Add input directly
        return F.relu(out)
```

```
∂L/∂x = ∂L/∂y × (∂F(x)/∂x + 1)
                            ↑
                 Always has a path!
```
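The "+1" term can be checked numerically with a scalar analogue (a hedged sketch; the function `F` below is a made-up residual branch, not the project's code): for y = F(x) + x, the derivative dy/dx = F'(x) + 1 stays close to 1 even when F'(x) nearly vanishes.

```python
import math

def F(x):
    return 0.001 * math.tanh(x)  # a residual branch with a tiny slope

def block(x):
    return F(x) + x  # y = F(x) + x

x = 0.7
h = 1e-6
dydx = (block(x + h) - block(x - h)) / (2 * h)  # central-difference estimate
print(dydx > 1.0)  # True: the "+1" keeps the gradient alive
```

In a plain network the gradient is a product of per-layer Jacobians with no "+1" term, so it can shrink geometrically with depth.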
This project teaches:
- Deep Learning Fundamentals
  - Vanishing gradient problem
  - Batch normalization
  - Modern training techniques
- PyTorch Best Practices
  - Model modularization
  - Custom training loops
  - Visualization hooks
- Research Methodology
  - Controlled experiments
  - Ablation studies
  - Results visualization
- Software Engineering
  - Code organization
  - Documentation
  - Reproducibility
Implemented and trained ResNet-18 from scratch on CIFAR-10, demonstrating a 10-percentage-point accuracy improvement over a plain CNN of the same depth through residual connections. Created comprehensive visualizations, including Grad-CAM attention maps, to explain model behavior.
| Parameter | Default | Description |
|---|---|---|
| `--epochs` | 50 | Number of training epochs |
| `--batch-size` | 128 | Batch size for training |
| `--lr` | 0.1 | Initial learning rate |
| `--weight-decay` | 5e-4 | L2 regularization strength |
Cosine annealing learning rate schedule for smooth convergence.
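The cosine schedule can be sketched with the closed-form rule lr(t) = lr_min + ½(lr_max - lr_min)(1 + cos(πt/T_max)); in PyTorch this is presumably handled by `torch.optim.lr_scheduler.CosineAnnealingLR`, and the standalone helper below is only illustrative, using the defaults from the table above.

```python
import math

def cosine_lr(epoch, total_epochs=50, lr_max=0.1, lr_min=0.0):
    """Cosine annealing: smoothly decays lr_max down to lr_min over total_epochs."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(round(cosine_lr(0), 4))   # 0.1  (start of training)
print(round(cosine_lr(25), 4))  # 0.05 (halfway point)
print(round(cosine_lr(50), 4))  # 0.0  (end of training)
```

The gentle slope near the start and end avoids abrupt learning-rate drops, which is why the curve converges smoothly.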
- Deep Residual Learning for Image Recognition. He, K., Zhang, X., Ren, S., & Sun, J. (2016). arXiv:1512.03385
- Grad-CAM: Visual Explanations from Deep Networks. Selvaraju, R. R., et al. (2017). arXiv:1610.02391
Contributions welcome! Please feel free to submit issues and pull requests.
MIT License - feel free to use this code for learning and projects.
- Medical Imaging: Replace CIFAR-10 with X-ray or MRI data
- Transfer Learning: Fine-tune pretrained ResNet on custom dataset
- Architecture Variants: Implement ResNet-34, ResNet-50, ResNeXt
- Efficiency Analysis: Measure FLOPs, inference time, memory usage
- Attention Mechanisms: Add SE blocks or CBAM modules
Built for learning deep learning fundamentals




