This project implements and evaluates custom Convolutional Neural Network (CNN) architectures for image similarity detection. The core objective is to classify whether a pair of images shows the same shoe or two different shoes.
Beyond basic classification, this project explores architectural trade-offs and hardware-constrained optimization, specifically addressing VRAM limitations and memory management efficiency.
The system processes image pairs of size 224 × 224 (3 RGB channels per image) and compares two architectures:
- Standard CNN: A modular 4-block architecture that uses Batch Normalization and Dropout for regularization.
- CNNChannel (Optimized): A specialized approach that concatenates the two input images along the channel dimension. Instead of treating the images as separate inputs, the model processes a single 6-channel tensor:
  $$X_{input} \in \mathbb{R}^{N \times 6 \times 224 \times 224}$$
  This allows the initial convolutional layers to learn filters that directly compare corresponding spatial features in the two images, leading to better convergence.
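The channel-stacking idea can be sketched in PyTorch as follows. This is a minimal illustration, not the project's `shoe_models.py`: the helper `conv_block`, the class name `CNNChannelSketch`, the layer widths, kernel sizes, global average pooling, and the two-logit head are all assumptions. The four Conv, BatchNorm, ReLU, MaxPool, Dropout blocks mirror the 4-block design with Batch Normalization and Dropout described above.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int, p_drop: float = 0.5) -> nn.Sequential:
    """Hypothetical block: Conv -> BatchNorm -> ReLU -> MaxPool -> Dropout."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # keeps H and W
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),  # halves H and W
        nn.Dropout(p_drop),
    )


class CNNChannelSketch(nn.Module):
    """Illustrative pair classifier: stack two RGB images into 6 channels."""

    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        chans = [6, *widths]  # 6 = 3 channels per image x 2 images
        self.features = nn.Sequential(
            *[conv_block(chans[i], chans[i + 1]) for i in range(len(widths))]
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(widths[-1], 2)  # two-class logits (same / different)

    def forward(self, img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
        # (N, 3, 224, 224) and (N, 3, 224, 224) -> (N, 6, 224, 224)
        x = torch.cat([img_a, img_b], dim=1)
        x = self.features(x)         # (N, 256, 14, 14) after four 2x poolings
        x = self.pool(x).flatten(1)  # (N, 256)
        return self.head(x)          # (N, 2)
```

Because `torch.cat(..., dim=1)` merges the pair before the first convolution, every first-layer filter sees both images at the same spatial location, which is what lets it learn direct pixel-level comparisons.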
Training on consumer-grade hardware (e.g., an NVIDIA GTX 1660 Ti with 6 GB of VRAM) led to frequent Out-of-Memory (OOM) errors when training the multi-layer CNNs.
- Solution: Implemented dynamic batch-size scaling and efficient data loading through PyTorch `DataLoader`s.
- Optimization: Applied mean subtraction and normalization as a preprocessing step so the first layer receives zero-centered, consistently scaled inputs, which accelerates convergence.
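The README does not specify how dynamic batch-size scaling was implemented. One common approach, sketched here under that assumption, is to start with a large batch and halve it whenever a training step runs out of GPU memory. `find_max_batch_size`, `is_oom_error`, and the `try_step` callback are hypothetical names; the string check relies on PyTorch reporting CUDA OOM as a `RuntimeError` (or a subclass of it) whose message contains "out of memory".

```python
from typing import Callable


def is_oom_error(err: BaseException) -> bool:
    """Heuristic: PyTorch's CUDA OOM is a RuntimeError mentioning 'out of memory'."""
    return isinstance(err, RuntimeError) and "out of memory" in str(err).lower()


def find_max_batch_size(
    try_step: Callable[[int], None],
    start: int = 64,
    minimum: int = 1,
) -> int:
    """Halve the batch size until one training step fits in memory.

    `try_step(batch_size)` is a hypothetical callback that runs a single
    forward/backward pass at that batch size and raises on OOM.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if not is_oom_error(err):
                raise  # unrelated failure: do not mask it
            # In a real loop you would also call torch.cuda.empty_cache() here.
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} fits in memory")
```

For example, if a step only fits at a batch size of 12 or less, starting from 64 the search tries 64, 32 and 16, then returns 8. Halving is coarse; a binary search between the last failing and first passing size would find a larger batch at the cost of more trial steps.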
Initial tests showed a significant gap between validation accuracy (~88%) and test accuracy (~70%) for the basic models.
- Regularization: Integrated Dropout ($p=0.5$) and Batch Normalization across all convolutional blocks to stabilize training.
- Early Stopping: Monitored validation loss to prevent the model from memorizing noise in the training set.
- Result: The CNNChannel model proved significantly more stable, achieving an average accuracy of 86.5% on the unseen test sets.
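Early stopping on validation loss can be expressed as a small stateful helper. This is a sketch: the class name `EarlyStopping` and the `patience`/`min_delta` parameters are illustrative assumptions, not the project's code. It counts consecutive epochs without improvement and signals a stop once that count reaches `patience`.

```python
import math


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs.

    `patience` and `min_delta` defaults are illustrative, not project values.
    """

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0  # improvement: a training loop would save a checkpoint here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With `patience=2` and validation losses `[0.70, 0.55, 0.50, 0.52, 0.51, 0.49]`, `step` returns `True` at epoch 4 (two epochs in a row without beating 0.50), so the checkpoint to keep is the one from epoch 2.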
| Model | Val Accuracy | Test (Male) | Test (Female) | Status |
|---|---|---|---|---|
| Standard CNN | 84.0% | 71.2% | 75.0% | Overfitted |
| CNNChannel | 88.2% | 86.1% | 87.2% | Optimal |
```
├── data/                         # Dataset (ignored by Git)
├── models/
│   ├── shoe_models.py            # PyTorch architectures (CNN & CNNChannel)
│   └── weights/                  # Saved model checkpoints (.pk files)
├── notebooks/
│   └── shoe_classification.ipynb # Training loops & performance analysis
├── requirements.txt              # Dependency list
└── README.md                     # Project documentation
```