Chlsim/ClassifyingShoes

Shoe Pair Classification using PyTorch

This project implements and evaluates custom Convolutional Neural Network (CNN) architectures for image similarity detection. The core objective is to classify whether a pair of images belongs to the same shoe or different shoes.

Beyond basic classification, this project explores architectural trade-offs and hardware-constrained optimization, specifically addressing VRAM limitations and memory management efficiency.

🚀 Project Overview

The system processes pairs of images, each of size $224 \times 224 \times 3$. The challenge lies in learning spatial correlations between the two images in order to decide whether they show the same shoe. Two primary architectures were benchmarked:

  1. Standard CNN: A modular 4-block architecture utilizing Batch Normalization and Dropout for regularization.
  2. CNNChannel (Optimized): A specialized approach that concatenates the two input images along the channel dimension. Instead of treating the images as separate entities, the model processes a single 6-channel tensor:

     $$X_{input} \in \mathbb{R}^{N \times 6 \times 224 \times 224}$$

     This allows the initial convolutional layers to learn filters that directly compare corresponding spatial features between the two images, leading to better convergence.
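The channel-concatenation idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the code in `models/shoe_models.py`: the class name `ChannelPairNet`, the helper `conv_block`, the layer widths, the kernel sizes, and the placement of Dropout are all assumptions made for the example. Only the 6-channel input shape and the use of four blocks with Batch Normalization and Dropout come from the description above.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Conv -> BatchNorm -> ReLU -> 2x2 max-pool; each block halves the spatial size.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class ChannelPairNet(nn.Module):
    """Hypothetical stand-in for CNNChannel; widths and head are assumed."""

    def __init__(self, width: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(6, width),  # 6 input channels: two RGB images stacked
            conv_block(width, 2 * width),
            conv_block(2 * width, 4 * width),
            conv_block(4 * width, 8 * width),
        )
        # Spatial size after four pools: 224 -> 112 -> 56 -> 28 -> 14
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(8 * width * 14 * 14, 2),  # two logits: same / different
        )

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # Stack the pair along the channel axis: (N, 3, H, W) x 2 -> (N, 6, H, W)
        x = torch.cat([left, right], dim=1)
        return self.classifier(self.features(x))


model = ChannelPairNet().eval()
left = torch.randn(2, 3, 224, 224)
right = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    logits = model(left, right)

stacked_shape = tuple(torch.cat([left, right], dim=1).shape)
print(stacked_shape, tuple(logits.shape))
```

Because the very first convolution sees both images at once, each of its filters can weight the same pixel location in the left and right image against each other, which is the comparison behaviour described for CNNChannel.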

🛠️ Engineering Challenges & Solutions

1. Memory Management (VRAM & OOM)

Training multi-layer CNNs on consumer-grade hardware (e.g., an NVIDIA GTX 1660 Ti) caused significant Out-of-Memory (OOM) errors.

  • Solution: Implemented dynamic batch-size scaling and efficient data loading through PyTorch DataLoaders.
  • Optimization: Applied mean-subtraction and normalization as a pre-processing step to accelerate convergence and reduce the computational overhead of the first layer.
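One common way to realize dynamic batch-size scaling is to retry a training step with a halved batch whenever the GPU runs out of memory. The sketch below shows that back-off loop in plain Python. The function name `largest_fitting_batch_size`, the `try_step` callback, and the simulated memory limit are hypothetical and are not taken from this repository; the project may scale its batch size differently.

```python
from typing import Callable


def largest_fitting_batch_size(
    try_step: Callable[[int], None],
    start: int = 64,
    minimum: int = 1,
) -> int:
    """Halve the batch size until one trial step runs without an OOM error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # Assumption: the OOM surfaces as a RuntimeError whose message
            # contains "out of memory", as PyTorch's CUDA OOM errors do.
            if "out of memory" not in str(err).lower():
                raise  # unrelated failure, do not hide it
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {minimum} fits in memory")


def fake_step(batch_size: int) -> None:
    # Stand-in for one forward/backward pass; pretends only 16 pairs fit.
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory (simulated)")


chosen = largest_fitting_batch_size(fake_step, start=64)
print(chosen)
```

In a real training script, `try_step` would build a `DataLoader` with the given `batch_size` and run a single forward and backward pass, and it would usually make sense to call `torch.cuda.empty_cache()` before each retry.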

2. Overfitting & Generalization

Initial tests showed a significant gap between Validation Accuracy (~88%) and Test Accuracy (~70%) in basic models.

  • Regularization: Integrated Dropout ($p=0.5$) and Batch Normalization across all convolutional blocks to stabilize training.
  • Early Stopping: Monitored validation loss to prevent the model from memorizing training set noise.
  • Result: The CNNChannel model proved significantly more stable, achieving 86.1% (male) and 87.2% (female) accuracy on the unseen test sets.
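Early stopping on validation loss can be expressed as a small helper that counts epochs without improvement. The following is a sketch under assumptions: the class name `EarlyStopping`, the `patience` and `min_delta` parameters, and the loss values are illustrative and are not the project's actual settings or results.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: remember it and reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch, for demonstration only.
val_losses = [0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.49]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```

In practice the model weights would be checkpointed whenever `best_loss` improves, so that the best epoch can be restored after the loop stops.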

📊 Performance Benchmarking

| Model | Val Accuracy | Test (Male) | Test (Female) | Status |
|---|---|---|---|---|
| Standard CNN | 84.0% | 71.2% | 75.0% | Overfitted |
| CNNChannel | 88.2% | 86.1% | 87.2% | Optimal |
📁 Repository Structure

├── data/                 # Dataset (Ignored by Git)
├── models/
│   ├── shoe_models.py    # PyTorch architectures (CNN & CNNChannel)
│   └── weights/          # Saved model checkpoints (.pk files)
├── notebooks/
│   └── shoe_classification.ipynb # Training loops & Performance Analysis
├── requirements.txt      # Dependency list
└── README.md             # Project documentation
