An open and evolving collection of repos exploring how AI, fundamental, and quantitative methods apply to institutional investment research.
Ideas come from experience managing long/short institutional equity portfolios, academic research, and the open-source community. Each repo is both a working tool and a learning exercise. Input and perspectives are welcome.
Created and maintained by a former long/short equity portfolio manager with 20+ years of institutional buy-side experience.
Curiosity compounds. Rigor endures.
Evaluating and improving LLM performance on financial reasoning tasks — building rubrics, adversarial tests, preference data, and multi-agent systems to assess whether AI models can meet institutional-grade investment standards.
investment-workflow-evals — Scoring rubrics for the full institutional workflow (thesis → catalysts → sizing → risk → monitoring → post-mortem). Adversarial variants target LLM failure modes: regime-blind extrapolation, confident nonsense on illiquid names, circular reasoning.
fin-reasoning-eval — 306 finance reasoning problems (valuation, accounting, credit, portfolio math) with difficulty grading and worked solutions.
judgment-under-uncertainty-eval — Evaluates LLM calibration and decision-making under ambiguity in financial contexts.
excel-model-eval (private) — Graph-based structural auditing of LLM-generated Excel models: dependency tracing, circular reference detection, balance sheet consistency, complexity scoring.
institutional-investor-casebook (private) — Case studies testing institutional investment reasoning across strategies and market regimes.
conviction-gradient-framework — Conviction scoring and position sizing framework. Maps qualitative thesis strength to quantitative allocation signals.
multi-agent-investment-committee (private) — Five-agent IC (sector analyst, short analyst, risk manager, macro analyst, PM) on LangGraph. Structured debate, committee memo with sizing. Shapley attribution, 6 portfolio optimizers. Bloomberg/IBKR adapters.
redflag-ex1-analyst — Red-flag detection for analyst notes. Identifies buried assumptions, one-sided risk, stale comps, missing sensitivity analysis. PDF/DOCX ingestion with section-aware parsing.
ls-portfolio-lab — L/S portfolio construction and risk analysis. Attribution, drawdown decomposition, rebalancing, trade impact. Yahoo, Bloomberg, IB providers. Streamlit dashboard.
backtest-lab — Event-driven backtesting with realistic execution modeling. Regime detection (threshold + HMM). Statistical inference (PSR, MinTRL, FDR). Bias guards for lookahead leakage and overfitting.
investment-research-rag (private) — Document ingestion and retrieval for SEC filings, earnings transcripts, equity research. Hybrid search (dense + BM25/RRF), cross-encoder reranking, citation traceability.
fund-tracker-13f (private) — Institutional holdings analysis from SEC 13F filings.
financial-data-providers — Shared market data provider package with adapter pattern. Yahoo, Bloomberg, IBKR. Used by MAIC, backtest-lab, ls-portfolio-lab.
sec-financial-model-builder (private) — Builds professional-grade Excel financial models from SEC EDGAR XBRL data. LLM-assisted concept mapping and narrative generation (Anthropic/Gemini).
Note: some of the repos listed above are private.
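The hybrid search mentioned for investment-research-rag combines dense and BM25 rankings with reciprocal rank fusion (RRF). As a rough illustration only, and not the repo's actual code, RRF can be sketched in a few lines. The function name, the document ids, and the constant `k=60` (a commonly used default) are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retrieval results for one query (ids are made up).
dense_hits = ["10k_risk", "call_q3", "note_a"]
bm25_hits = ["call_q3", "10k_mdna", "10k_risk"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Because RRF uses only rank positions, the dense and BM25 scores never need to be put on a common scale. A cross-encoder reranker would then typically be applied to the top of the fused list.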
- Methods: RLHF preference data; adversarial red teaming; guardrail/safety taxonomy testing.
- Infrastructure: Scoring rubrics; golden answer authoring; domain-specific fine-tuning (SFT).
- Benchmarking: 306-problem finance reasoning benchmark with difficulty grading and multi-model leaderboard; institutional workflow evals covering thesis → sizing → risk → monitoring → post-mortem.
- Model Audit: Graph-based structural auditing of LLM-generated Excel models — dependency tracing, circular reference detection, balance sheet consistency.
- Signal: Preference pairs where domain-expertise signal outweighs stylistic polish.
- Criteria: Transparency of assumptions; quantitative precision; intellectual honesty regarding uncertainty.
- Pipeline: Section-aware 10-K/10-Q ingestion; boilerplate filtering; K-ranking annotation; multi-provider generation (Claude, GPT-4o, Gemini).
- Investment Committee: Five-agent system with structured debate and configurable parameters.
- Reasoning Traces: THINK → PLAN → ACT → REFLECT loop with full trace visibility.
- Output Signal: Directional T-signal (direction × entropy-adjusted confidence) as RL input for downstream portfolio systems.
- Red Teaming: Multi-turn escalation sequences testing safety beyond first-refusal holds. Hypothesis-driven with full conversation path reproducibility.
- Guardrails: Evaluating deterministic filters, semantic classifiers, and system prompt constraints.
- Purple Teaming: Translating red team findings into refined safety taxonomies and targeted SFT/RLHF updates.
- Dual-Use Risk: Calibrating harm severity in financial contexts — distinguishing legitimate analysis from manipulation facilitation.
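The circular reference detection listed under Model Audit above can be illustrated with a depth-first search over a cell dependency graph. This is a minimal sketch under stated assumptions, not the excel-model-eval implementation: the function name, the `deps` mapping format, and the cell labels are hypothetical, and a real audit would first have to extract the graph from workbook formulas.

```python
def find_circular_reference(deps):
    """Return one cycle as a list of cells, or None if the graph is acyclic.

    deps maps each cell to the cells its formula reads from (assumed format).
    Uses recursive DFS; very deep models may need an iterative version.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(cell):
        color[cell] = GRAY
        path.append(cell)
        for precedent in deps.get(cell, ()):
            state = color.get(precedent, WHITE)
            if state == GRAY:
                # precedent is already on the current path: cycle found.
                return path[path.index(precedent):] + [precedent]
            if state == WHITE:
                found = visit(precedent)
                if found:
                    return found
        path.pop()
        color[cell] = BLACK
        return None

    for cell in deps:
        if color.get(cell, WHITE) == WHITE:
            found = visit(cell)
            if found:
                return found
    return None


# Hypothetical model with the classic interest/cash circularity.
model = {
    "IS!NetIncome": ["IS!EBT", "IS!Tax"],
    "IS!EBT": ["IS!EBIT", "IS!Interest"],
    "IS!Interest": ["BS!Debt"],
    "BS!Debt": ["CF!Cash"],
    "CF!Cash": ["IS!NetIncome"],
}

cycle = find_circular_reference(model)
print(" -> ".join(cycle) if cycle else "no circular references")
```

Returning the full path, not just a yes/no flag, keeps the finding traceable to specific cells, which fits the dependency tracing goal named in the same bullet.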
Over 20 years of institutional buy-side experience (PM/Analyst | L/S equity | SAC/Point72, WRC). MBA Finance. MS Analytics & Modeling (ML/Deep Learning). Northwestern. CFA® Charterholder.
Python · PyTorch · Hugging Face (transformers, datasets, evaluate) · Weights & Biases · Braintrust · Promptfoo · LangGraph · Streamlit · pandas · SQL · Git
Local inference on Mac M4 Max (128GB RAM). Lambda Cloud dual-GPU (2× NVIDIA) for larger workloads.
Claude (Anthropic) is the preferred model across all LLM-integrated repos. Multi-agent, evaluation, and generation modules are built around Claude where applicable.
The maintainer strongly supports Anthropic's leadership and their commitment to treating AI safety and moral responsibility with the same rigor as capability.
- Bailey, David H., and Marcos López de Prado. 2014. "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality." Journal of Portfolio Management. SSRN 2460551.
- CHSOFT AG. 2022. Practical Performance Calculation. v4.0.
- Darmanin, Adam. n.d. "Language Model Guided Reinforcement Learning in Quantitative Trading." University of Malta.
- López de Prado, Marcos. 2018. Advances in Financial Machine Learning. Hoboken, NJ: Wiley.
- López de Prado, Marcos. 2020. Machine Learning for Asset Managers. Cambridge: Cambridge University Press.
- López de Prado, Marcos. 2023. Causal Factor Investing: Can Factor Investing Become Scientific? Cambridge: Cambridge University Press.
- Paleologo, Giuseppe A. 2021. Advanced Portfolio Management: A Quant's Guide for Fundamental Investors. Hoboken, NJ: Wiley. (Focus: Chapters 6–8)
- Paleologo, Giuseppe A. 2024. The Elements of Quantitative Investing. Hoboken, NJ: Wiley. (Focus: Sections 3.5, 3.6, 4.4, 4.5, and Chapter 7)
- Ahmed, Nisha Arya. 2022. "Vanishing/Exploding Gradients in Deep Neural Networks." Heartbeat.
- Brownlee, Jason. n.d. Machine Learning Mastery. https://machinelearningmastery.com/.
- Chollet, François. 2021. Deep Learning with Python. 2nd ed. Manning Publications.
- Gao, Hanyao, Gang Kou, et al. 2022. "Machine Learning in Business and Finance: A Literature Review and Research Opportunities." Financial Innovation. DOI: 10.1186/s40854-022-00353-8.
- Géron, Aurélien. 2022. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. 3rd ed. O'Reilly Media.
- Géron, Aurélien. 2023. Hands-On Machine Learning with Scikit-Learn and PyTorch: Concepts, Tools, and Techniques to Build Intelligent Systems. 1st ed. Sebastopol, CA: O'Reilly Media.
- Ha, Vi Q. n.d. "Building an RLHF Pipeline for LLMs: A Beginner-Friendly Tutorial."
- Chivers, Tom. 2024. Everything Is Predictable: How Bayesian Statistics Explain Our World.
- Cromwell, David. n.d. Richard Feynman's Mental Models.
- Dylan, Bob. Thematic evolution and narrative complexity.
- Weir, Bob. Improvisational theory and structural interplay.

