I build data products and AI-powered analytics tools that turn complex datasets into clear business insights.
- Building GenAI & NLP systems: trade intent detection, sentiment analysis, and LLM-powered analytics tools
- Developed AI Receipt Tracker: production-style system using LLM vision to parse receipts, categorize spend, and generate analytics dashboards
- Built recommender systems: content-based (TF-IDF + cosine similarity) and collaborative filtering (MovieLens)
- Built fraud detection pipelines using Random Forest & XGBoost with revenue optimization analysis
- Performed customer segmentation on 500K+ retail transactions using RFM modeling, K-Means clustering, and behavioral analytics
- Strong in tree-based ML models: Decision Trees, Random Forest, and XGBoost, with hyperparameter tuning and cross-validation
- Experienced in ML evaluation: RMSE, MAE, Precision@K, Recall@K, clustering validation, and dimensionality reduction (PCA)
Currently exploring LLMs, RAG pipelines, and AI-powered data workflows.
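
For readers unfamiliar with the ranking metrics named above, here is a minimal sketch of Precision@K and Recall@K. It is an illustration only, not code from any of the projects; the function names and the sample item IDs are hypothetical, and it assumes the recommendation list contains no repeated items.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant items (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # nothing to recall; returning 0.0 is a convention chosen here
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical example data: a ranked list and the items the user actually liked.
ranked = ["a", "b", "c", "d"]
liked = {"a", "c", "e"}
p_at_2 = precision_at_k(ranked, liked, 2)
r_at_4 = recall_at_k(ranked, liked, 4)
```

Dividing precision by `k` rather than by the number of items returned is one common convention; it penalizes lists shorter than `k`.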
IBM Certified Data Scientist & Data Analyst
Machine Learning Specialization (Stanford)
Powered by curiosity, coffee, and F1
A production-grade AI-powered receipt parsing and spending tracker built end-to-end
| Feature | Details |
|---|---|
| AI Parsing | GPT-4.1-mini vision + GPT-4o-mini text; reads PDFs and images |
| Database | PostgreSQL (Supabase) + SQLAlchemy ORM with migrations |
| Cloud Storage | AWS S3 for receipt file storage with presigned URLs |
| Retry Logic | Tenacity retry wrapper for rate limits & API timeouts |
| Smart Categorization | 20+ categories with sub-categories + category memory from history |
| Duplicate Detection | SHA-256 file hashing to prevent double uploads |
| Analytics | Plotly dashboards: spend by category, store, and month, plus drill-down |
| Auth | Password-protected with Streamlit session state |
| Cost Tracking | Per-call token logging with monthly API cost monitoring |
| Multi-store Support | Handles Costco, Target, and Walmart with store-specific tax & discount logic |
Tech: Python • Streamlit • OpenAI API • PostgreSQL • SQLAlchemy • AWS S3 • Plotly • pandas • boto3
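
As a rough illustration of the duplicate-detection idea in the table above (SHA-256 file hashing), the sketch below hashes an uploaded file in chunks and checks the digest against previously seen ones. This is not the project's actual code: the function names are hypothetical, and the in-memory `set` is a stand-in for wherever the tracker really stores hashes (presumably the database).

```python
import hashlib
import io
from typing import BinaryIO, Set


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # Reading in chunks keeps memory use flat for large PDFs or images.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(stream: BinaryIO, seen_hashes: Set[str]) -> bool:
    """Return True if this content was seen before; otherwise record it."""
    file_hash = sha256_of_stream(stream)
    if file_hash in seen_hashes:
        return True
    seen_hashes.add(file_hash)
    return False


# Hypothetical usage with in-memory bytes standing in for uploaded files.
seen: Set[str] = set()
first = is_duplicate(io.BytesIO(b"receipt bytes"), seen)   # new content
second = is_duplicate(io.BytesIO(b"receipt bytes"), seen)  # same bytes again
```

Hashing the file bytes catches exact re-uploads only; a re-scanned or re-photographed copy of the same receipt produces different bytes and would need a content-level check.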
| Project | Description | Stack |
|---|---|---|
| CineMatch | Content-based & collaborative filtering recommender with full ML evaluation | Python, TF-IDF, Cosine Similarity, KMeans |
| Credit Card Fraud Detection | Fraud detection with revenue optimization | Python, Random Forest, XGBoost, SHAP |
| Titanic Survival Prediction | Feature engineering + 6-model comparison achieving ~87% accuracy | Decision Tree, XGBoost, SVM, KNN, Random Forest |
| Trade Intent NLP | Buy/Sell intent detection using POS tagging & sentiment analysis | Python, NLP, ML |
| F1 Sentiment Analysis | Sentiment analysis on F1 tweets | Python, NLP, Twitter API |
| E-commerce Customer Segmentation | Customer & product analytics on 500K+ UK retail transactions: K-Means clustering, RFM analysis, revenue seasonality, cancellation trends | Python, K-Means, Seaborn, Pandas |
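
The content-based technique listed for CineMatch (TF-IDF plus cosine similarity) can be sketched in plain Python as follows. This is a simplified illustration, not the project's implementation: the function names and sample descriptions are made up, tokenization is a naive whitespace split, and the smoothed IDF formula is an assumption chosen to mirror a common library default.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for a list of documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (assumed variant): log((1 + N) / (1 + df)) + 1
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index: int, docs: List[str]) -> int:
    """Return the index of the document most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine_similarity(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    return max(scores)[1]


# Made-up movie descriptions for illustration.
movies = ["space adventure robots", "space robots war", "romantic comedy paris"]
best_match = most_similar(0, movies)
```

In this toy data the first two descriptions share terms while the third shares none, so the first movie's closest neighbor is the second.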
Languages
ML & Data
Cloud & Data Engineering
Tools
Visualization & BI