
AlphaForge

Status: experimental

Evolutionary optimization over a typed prompt-bundle surface, scored against point-in-time equity cohorts with deterministic regime-conditional screening and tax-aware exit policies. The system now runs a stable search pipeline, produces historical shadow candidates, and blocks live-persona promotion unless a bundle survives a stricter validation and shadow-readiness gate.

Inspired by Karpathy's autoresearch (autonomous experimentation loop) and Atlas-GIC (multi-agent prompt optimization via Sharpe). Diverges from both: this is a governed IC workflow where structural risk gates — not prompt mutation alone — are the discipline mechanism.

Codex, MCP, Python, SEC EDGAR, FRED, yfinance

The Forge

AlphaForge treats investment research prompts as parameters to be optimized. Each "prompt bundle" is a typed configuration: persona overlays, ranking weights, screener template, guidance style, and tool budgets. The system mutates these variables and scores the output against historical equity cohorts — keeping only the survivors.

# The evolutionary loop
seeds      = load_prompt_bundles()                # 6 philosophy-first seed bundles
universe   = build_universe_as_of(date)           # liquid large/mid-cap PIT subset
regime     = precompute_regime_packet(date)       # once per decision date
shortlist  = screen_equities(universe, regime)    # auto_cycle_map_v1
research   = run_research(shortlist, bundle)      # LLM thesis + filing analysis
scored     = score_cohorts(research, SPY)         # 12m / 36m / 60m after-tax
survivors  = gate(scored, mean_36m > 0)           # complete coverage required
mutations  = diversify_process_hypotheses()       # weights, evidence, screen strictness
shadow     = stage_for_manual_review(candidate)   # never auto-promote live
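To make the "typed configuration" idea concrete, a prompt bundle could be modeled roughly as below. This is a minimal sketch, not AlphaForge's actual schema: the class name, field names, snake_case variant spellings, default values, and the rule that weights sum to 1 are all assumptions. Only the allowed ranges come from the mutation surface described on this page.

```python
from dataclasses import dataclass

# Factor names follow the screener description; the snake_case spellings are assumed.
FACTORS = (
    "quality_resilience",
    "balance_sheet_strength",
    "valuation",
    "capital_allocation",
    "price_confirmation",
)
PM_VARIANTS = ("deployment_discipline", "quality_compounder", "drawdown_guard")
RA_VARIANTS = ("quality_first", "filing_delta_first", "regime_aware")
GUIDANCE_STYLES = ("staged_accumulator", "watchlist_first", "high_conviction_only")


@dataclass(frozen=True)
class PromptBundle:
    """Hypothetical typed bundle; every default here is illustrative."""

    bundle_id: str
    ranking_weights: dict
    screen_template_id: str = "auto_cycle_map_v1"
    pm_prompt_variant: str = "deployment_discipline"
    ra_prompt_variant: str = "quality_first"
    guidance_style: str = "staged_accumulator"
    shortlist_n: int = 12
    portfolio_k: int = 5
    evidence_threshold: int = 3
    tool_budget: int = 5

    def __post_init__(self):
        # Reject anything outside the bounded mutation surface.
        if set(self.ranking_weights) != set(FACTORS):
            raise ValueError("ranking_weights must cover exactly the five factors")
        if abs(sum(self.ranking_weights.values()) - 1.0) > 1e-9:
            raise ValueError("ranking_weights must sum to 1 (assumed convention)")
        checks = [
            (self.pm_prompt_variant, PM_VARIANTS),
            (self.ra_prompt_variant, RA_VARIANTS),
            (self.guidance_style, GUIDANCE_STYLES),
            (self.shortlist_n, (8, 12, 16)),
            (self.portfolio_k, (3, 5)),
            (self.evidence_threshold, (3, 4)),
            (self.tool_budget, (4, 5, 6)),
        ]
        for value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{value!r} is not one of {allowed}")
```

Validating in `__post_init__` means a mutation that steps outside the approved surface fails at construction time rather than partway through a backtest.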

Three Layers

Deterministic Screener

Philosophy-first via auto_cycle_map_v1: a point-in-time (PIT) regime packet is computed once per decision date and mapped into approved internal screen profiles. Inputs are SEC company facts, yfinance prices, and FRED macro series. Candidates are scored on five factors: quality resilience, balance sheet strength, valuation, capital allocation, and price confirmation. Later generations mostly mutate thresholds and evidence discipline, not direct stock-picking rules.
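One plausible reading of the factor scoring is a weighted sum over the five factors, followed by a score floor and a candidate cap. The sketch below assumes exactly that. The function names, the [0, 1] score scale, and the linear combination are assumptions; `min_composite_score` and `max_candidates` are the override names from the mutation surface.

```python
def composite_score(factor_scores, weights):
    """Weighted sum of per-factor scores (assumed to lie in [0, 1])."""
    return sum(weights[name] * factor_scores[name] for name in weights)


def build_shortlist(candidates, weights, min_composite_score=0.55, max_candidates=12):
    """Return (ticker, score) pairs that clear the floor, best first.

    candidates maps ticker -> {factor name -> score}. Ties are broken by
    ticker so the output is deterministic for a given input.
    """
    scored = [(ticker, composite_score(fs, weights)) for ticker, fs in candidates.items()]
    passing = [(ticker, score) for ticker, score in scored if score >= min_composite_score]
    passing.sort(key=lambda pair: (-pair[1], pair[0]))
    return passing[:max_candidates]
```

Because the screen is a pure function of its inputs, rerunning it on the same PIT data and weights reproduces the same shortlist, which is what lets later generations mutate thresholds without touching stock-picking rules.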

LLM Research Worker

Runs in an isolated Codex workspace that exposes only the backtest MCP surface. The worker reads SEC filings, builds regime context from FRED and market data, ranks candidates, selects a portfolio with sell policies, and produces live-guidance actions. Every tool call is taped for reproducibility and audit.
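Taping tool calls could look something like the sketch below: each call is appended to an ordered log, and a digest over the log lets two runs be compared. The `ToolTape` class and its methods are hypothetical and are not the project's tape format. The sketch also assumes that arguments and results are JSON-serializable.

```python
import hashlib
import json


class ToolTape:
    """Hypothetical append-only record of tool calls for replay and audit."""

    def __init__(self):
        self.entries = []

    def call(self, tool_name, fn, **kwargs):
        """Invoke fn(**kwargs), record the call, and return the result."""
        result = fn(**kwargs)
        self.entries.append(
            {"seq": len(self.entries), "tool": tool_name, "args": kwargs, "result": result}
        )
        return result

    def digest(self):
        """SHA-256 over a canonical JSON encoding of the whole tape."""
        payload = json.dumps(self.entries, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two runs that made the same calls in the same order with the same results produce the same digest, so a mismatch flags a non-reproducible worker run without diffing full transcripts.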

Mechanical Scorer

Backtests each portfolio against SPY at 12m/36m/60m horizons with 1x/2x/3x transaction cost assumptions. Scoring is tax-aware: gains qualify for the 20% long-term rate after a 366-day hold. Sell policies are deterministic: trailing stop, 200dma break, or hold-to-horizon. Search survivors must clear positive 36m after-tax excess return, then survive untouched validation dates before they can even enter shadow review.
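The scoring steps above can be sketched as three small functions: a trailing-stop exit, an after-tax return, and the 36m search gate. This is an illustration under stated assumptions, not the project's scorer. The short-term rate (`st_rate`), the base cost, the stop percentage, the choice to leave losses untaxed, and reading "after 366d" as `holding_days >= 366` are all assumptions. Only the 20% long-term rate, the 1x/2x/3x cost multipliers, and the positive mean 36m excess with complete coverage come from the text.

```python
def trailing_stop_exit(prices, stop_pct):
    """Index of the first day price falls stop_pct below its running peak.

    Falls back to the last index (hold-to-horizon) if the stop never fires.
    """
    peak = prices[0]
    for day, price in enumerate(prices):
        peak = max(peak, price)
        if price <= peak * (1.0 - stop_pct):
            return day
    return len(prices) - 1


def after_tax_return(gross_return, holding_days, base_cost=0.0, cost_multiplier=1,
                     lt_rate=0.20, st_rate=0.37):
    """Net-of-cost, after-tax return; st_rate and base_cost are assumed values."""
    net = gross_return - base_cost * cost_multiplier
    if net <= 0:
        return net  # simplification: losses are not tax-adjusted in this sketch
    rate = lt_rate if holding_days >= 366 else st_rate
    return net * (1.0 - rate)


def passes_search_gate(excess_36m_by_date):
    """Require a value for every decision date and a positive mean 36m excess."""
    values = list(excess_36m_by_date.values())
    if not values or any(v is None for v in values):
        return False  # incomplete coverage fails the gate outright
    return sum(values) / len(values) > 0
```

Treating a missing date as a hard failure, rather than averaging over whatever is available, keeps a bundle from passing on a favorable subset of dates.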

Current State

AlphaForge has crossed the infrastructure threshold: the search loop, mutation bookkeeping, tape validation, and validation replay path are now trustworthy. The system has produced its first legitimate historical shadow candidate, but it still has not earned live-persona promotion.

DONE Deterministic screener with 5 regime-conditional templates and bounded mutation surface

DONE Best-effort PIT universe builder from SEC ticker master + historical price/liquidity screens

DONE Tax-aware scorer with deterministic sell policies (trailing stop, 200dma, hold-to-horizon)

DONE Hard search gate requiring positive 36m after-tax excess return before validation

DONE Promotion contract: complete validation coverage required, staged artifacts for manual review

DONE Live per-worker transcripts, taped tool calls, and reproducible batch artifacts

WIP First legitimate historical shadow candidate: valuation_discipline_v1 cleared search and loose validation, but still fails the stricter live-shadow bar

WIP Broader qualification ladder is in place: annual PIT dates from 2008 through 2022 plus frozen-bundle replay before live shadow

WIP Side-effect-free shadow lane exists, but the live packet still lacks full portfolio state, ES, illiquidity, employer concentration, and tax-lot context

TODO Stricter live-persona shadow gate: require stronger per-date validation and non-negative validation Calmar before any bundle can touch the live personas

TODO Survivorship-clean historical universe. The current large/mid-cap builder is liquid and deterministic, but not yet delisting-safe.
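The planned live-persona shadow gate mentions a non-negative validation Calmar and stronger per-date validation. A minimal sketch of such a check follows, using the standard definition of Calmar as annualized return divided by maximum drawdown. The function names are hypothetical, and the `min_hit_rate` threshold (share of validation dates with positive excess) is an assumed stand-in for "stronger per-date validation"; the page does not define that criterion.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def calmar(equity, years):
    """Annualized return divided by max drawdown."""
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    mdd = max_drawdown(equity)
    if mdd == 0:
        return float("inf") if cagr > 0 else 0.0
    return cagr / mdd


def passes_live_shadow_gate(excess_by_date, equity, years, min_hit_rate=0.6):
    """Complete coverage, enough positive dates, and non-negative Calmar."""
    values = list(excess_by_date.values())
    if not values or any(v is None for v in values):
        return False
    hit_rate = sum(1 for v in values if v > 0) / len(values)
    return hit_rate >= min_hit_rate and calmar(equity, years) >= 0
```

A per-date hit rate is stricter than a mean: one outsized year can lift the average excess above zero while most validation dates still lose to the benchmark.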

Mutation Surface

Variable	Range
ranking_weights	5 factors, weight shifts of 0.10 between pairs
screen_template_id	Philosophy-first auto_cycle_map_v1 on live runs; internal profile mapping changes by PIT regime
pm_prompt_variant	3 styles (deployment discipline, quality compounder, drawdown guard)
ra_prompt_variant	3 styles (quality first, filing delta first, regime aware)
guidance_style	3 styles (staged accumulator, watchlist first, high conviction only)
theta_screen_overrides	min_composite_score [0.55-0.70], max_candidates [12-20]
shortlist_n / portfolio_k	[8, 12, 16] / [3, 5]
evidence_threshold / tool_budget	[3, 4] / [4, 5, 6] per name; underfilling is allowed when evidence is thin
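As an illustration of the `ranking_weights` row, the sketch below enumerates every single-step mutation that moves 0.10 of weight from one factor to another. The function names are hypothetical, and the rule that a donor weight may not go negative is an assumption; the table only specifies the 0.10 shift between pairs.

```python
def shift_weight(weights, donor, receiver, step=0.10):
    """Move `step` of weight from donor to receiver, or return None if invalid."""
    if donor == receiver or weights[donor] < step - 1e-12:
        return None  # assumed rule: no weight may drop below zero
    mutated = dict(weights)
    # Rounding keeps repeated shifts from accumulating float noise.
    mutated[donor] = round(mutated[donor] - step, 10)
    mutated[receiver] = round(mutated[receiver] + step, 10)
    return mutated


def weight_neighbours(weights, step=0.10):
    """All bundles reachable by one pairwise weight shift."""
    out = []
    for donor in weights:
        for receiver in weights:
            candidate = shift_weight(weights, donor, receiver, step)
            if candidate is not None:
                out.append(candidate)
    return out
```

With five equal weights of 0.20, each of the 20 ordered factor pairs yields a valid neighbour, so the weight surface alone gives 20 one-step mutations per bundle before any other variable is touched.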

Why "AlphaForge"

The name reflects the iterative, mechanical nature of the optimization loop. Prompt bundles are forged through repeated testing against historical equity cohorts — not generated by a single model pass or hand-tuned by a human analyst. Most bundles are expected to fail. The system is designed around attrition, not genius.

This is not autonomous trading. AlphaForge produces candidate prompt/screener configurations that are staged for manual review and forward shadowing before any live use. The current best bundle is still shadow-only, because historical validation is only marginal and the live packet does not yet certify portfolio risk gates. The human remains the decision-maker; the forge just searches a larger configuration space than manual tuning allows.