Shoe Pair Classification using PyTorch

This project implements and evaluates custom Convolutional Neural Network (CNN) architectures for image similarity detection. The core objective is to classify whether a pair of images belongs to the same shoe or different shoes.

Beyond basic classification, this project explores architectural trade-offs and hardware-constrained optimization, specifically addressing VRAM limitations and memory management efficiency.

🚀 Project Overview

The system processes image pairs of size $224 \times 224 \times 3$. The challenge lies in learning spatial correlations between two distinct images to determine identity. Two primary architectures were benchmarked:

Standard CNN: A modular 4-block architecture utilizing Batch Normalization and Dropout for regularization.
CNNChannel (Optimized): A specialized approach that concatenates the two input images along the channel dimension. Instead of treating the images as separate inputs, the model processes a single 6-channel tensor $X_{input} \in \mathbb{R}^{N \times 6 \times 224 \times 224}$. The first convolutional layer can therefore learn filters that directly compare corresponding spatial locations in the two images, which in this project led to better convergence.
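To make the channel-concatenation idea concrete, here is a minimal sketch. It is not the repository's actual CNNChannel: the class name `ChannelConcatCNN`, the helper `conv_block`, the layer widths, the pooling head, and the placement of Dropout are all assumptions chosen for illustration. Only the 6-channel input and the use of Batch Normalization and Dropout come from the description above.

```python
import torch
import torch.nn as nn


def conv_block(c_in: int, c_out: int) -> nn.Sequential:
    # Conv -> BatchNorm -> ReLU -> 2x2 max-pool (halves height and width).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class ChannelConcatCNN(nn.Module):
    """Illustrative stand-in for CNNChannel; all layer sizes are assumptions."""

    def __init__(self, width: int = 8, p_drop: float = 0.5) -> None:
        super().__init__()
        self.features = nn.Sequential(
            conv_block(6, width),          # 6 = 3 channels per image x 2 images
            conv_block(width, 2 * width),
            conv_block(2 * width, 4 * width),
            conv_block(4 * width, 8 * width),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),       # (N, C, H, W) -> (N, C, 1, 1)
            nn.Flatten(),
            nn.Dropout(p_drop),
            nn.Linear(8 * width, 2),       # two logits: different / same
        )

    def forward(self, first: torch.Tensor, second: torch.Tensor) -> torch.Tensor:
        # Stack the pair along the channel axis: (N, 3, H, W) x 2 -> (N, 6, H, W).
        x = torch.cat([first, second], dim=1)
        return self.head(self.features(x))


model = ChannelConcatCNN().eval()
pair_a = torch.randn(2, 3, 224, 224)
pair_b = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    logits = model(pair_a, pair_b)
print(logits.shape)  # torch.Size([2, 2])
```

Because the very first convolution sees both images at once, each of its filters spans all six channels and can respond to differences between the two shoes at the same pixel location.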

🛠️ Engineering Challenges & Solutions

1. Memory Management (VRAM & OOM)

Training multi-layer CNNs on consumer-grade hardware (e.g., NVIDIA GTX 1660 Ti) ran into significant Out-of-Memory (OOM) errors.

Solution: Implemented dynamic batch-size scaling and efficient data loading through PyTorch DataLoaders.
Optimization: Applied mean-subtraction and normalization as a pre-processing step, so that inputs are zero-centered and on a comparable scale, which helps accelerate convergence.
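One possible reading of "dynamic batch-size scaling" is to halve the batch size whenever a training step runs out of memory. The sketch below shows that pattern in plain Python. The function names `is_oom` and `find_batch_size`, the starting size of 64, and the `fake_step` stand-in are hypothetical and not taken from the repository. It relies on the fact that PyTorch raises CUDA OOM as a `RuntimeError` whose message contains "out of memory".

```python
from typing import Callable


def is_oom(err: RuntimeError) -> bool:
    # CUDA OOM errors in PyTorch are RuntimeErrors mentioning "out of memory".
    return "out of memory" in str(err).lower()


def find_batch_size(
    try_step: Callable[[int], None], start: int = 64, minimum: int = 1
) -> int:
    """Halve the batch size until `try_step` succeeds without an OOM error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if not is_oom(err):
                raise  # unrelated failure: do not hide it
            # In real training code one would typically also call
            # torch.cuda.empty_cache() here before retrying.
            batch_size //= 2
    raise RuntimeError("no batch size down to the minimum fits in memory")


def fake_step(batch_size: int) -> None:
    # Stand-in for one forward/backward pass; pretends 16 is the largest fit.
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory (simulated)")


chosen = find_batch_size(fake_step, start=64)
print(chosen)  # 16
```

The chosen value would then be passed as `batch_size` when constructing the `DataLoader`.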

2. Overfitting & Generalization

Initial tests showed a significant gap between Validation Accuracy (~88%) and Test Accuracy (~70%) in basic models.

Regularization: Integrated Dropout ($p=0.5$) and Batch Normalization across all convolutional blocks to stabilize training.
Early Stopping: Monitored validation loss and halted training once it stopped improving, to prevent the model from memorizing training set noise.
Result: The CNNChannel model proved significantly more stable, achieving an average accuracy of 86.5% on unseen test sets.
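The early-stopping rule above can be sketched as a small helper that counts epochs without improvement. The class name `EarlyStopping`, the `patience` and `min_delta` parameters, and the loss values are illustrative assumptions; the source does not state which criterion or patience was actually used.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        # An epoch counts as an improvement only if it beats the best loss
        # by more than min_delta.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improvement stalls after the third epoch.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 5 0.65
```

In a real training loop, the model weights from the best epoch would usually be saved as a checkpoint so they can be restored after stopping.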

📊 Performance Benchmarking

| Model | Validation Accuracy | Test Accuracy (Male) | Test Accuracy (Female) | Status |
| --- | --- | --- | --- | --- |
| Standard CNN | 84.0% | 71.2% | 75.0% | Overfitted |
| CNNChannel | 88.2% | 86.1% | 87.2% | Optimal |
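The Status column can be read as a generalization gap: validation accuracy minus test accuracy. The short calculation below uses only the figures from the table. Taking an unweighted mean of the two test sets is an assumption, since the sizes of the test sets are not given here.

```python
# Accuracy figures (in percent) copied from the benchmark table.
results = {
    "Standard CNN": {"val": 84.0, "test_male": 71.2, "test_female": 75.0},
    "CNNChannel": {"val": 88.2, "test_male": 86.1, "test_female": 87.2},
}


def generalization_gap(row: dict) -> float:
    # Assumption: both test sets weigh equally in the mean.
    mean_test = (row["test_male"] + row["test_female"]) / 2
    return row["val"] - mean_test


gaps = {name: round(generalization_gap(row), 2) for name, row in results.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap} percentage points")
```

Under that assumption the gap is about 10.9 points for the Standard CNN and about 1.55 points for CNNChannel, which matches the Overfitted and Optimal labels.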

📁 Repository Structure

├── data/                 # Dataset (Ignored by Git)
├── models/
│   ├── shoe_models.py    # PyTorch architectures (CNN & CNNChannel)
│   └── weights/          # Saved model checkpoints (.pk files)
├── notebooks/
│   └── shoe_classification.ipynb # Training loops & Performance Analysis
├── requirements.txt      # Dependency list
└── README.md             # Project documentation
