AlphaForge
Experimental evolutionary optimization over a typed prompt-bundle surface, scored against point-in-time (PIT) equity cohorts with deterministic regime-conditional screening and tax-aware exit policies. The system now runs a stable search pipeline, produces historical shadow candidates, and blocks live-persona promotion unless a bundle survives a stricter validation and shadow-readiness gate.
Inspired by Karpathy's autoresearch (an autonomous experimentation loop) and Atlas-GIC (multi-agent prompt optimization scored by Sharpe ratio). AlphaForge diverges from both: it is a governed IC workflow in which structural risk gates, not prompt mutation alone, are the discipline mechanism.
The Forge
AlphaForge treats investment research prompts as parameters to be optimized. Each "prompt bundle" is a typed configuration: persona overlays, ranking weights, screener template, guidance style, and tool budgets. The system mutates these variables and scores the output against historical equity cohorts — keeping only the survivors.
```python
# The evolutionary loop (one decision date shown; functions are AlphaForge pipeline stages)
seeds = load_prompt_bundles()                     # 6 philosophy-first seed bundles
universe = build_universe_as_of(date)             # liquid large/mid-cap PIT subset
regime = precompute_regime_packet(date)           # once per decision date
shortlist = screen_equities(universe, regime)     # auto_cycle_map_v1

scored = []
for bundle in seeds:
    research = run_research(shortlist, bundle)    # LLM thesis + filing analysis
    scored.append(score_cohorts(research, "SPY")) # 12m / 36m / 60m after-tax vs SPY

survivors = gate(scored, lambda s: s.mean_36m > 0)   # complete coverage required
mutations = diversify_process_hypotheses(survivors)  # weights, evidence, screen strictness
for candidate in survivors:                          # only after untouched-date validation
    stage_for_manual_review(candidate)               # shadow only; never auto-promote live Three Layers
```
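The "complete coverage required" rule on the survival gate can be sketched as below. This is a minimal illustration, not AlphaForge's `gate`: it assumes each bundle's scores are a mapping from decision date to mean 36m after-tax excess return versus SPY, with `None` where a cohort could not be scored. `passes_gate`, `filter_survivors`, and the toy numbers are hypothetical.

```python
from statistics import mean
from typing import Mapping, Optional

# Hypothetical shape: decision date -> 36m after-tax excess return vs SPY,
# or None when that cohort could not be fully scored.
Scores = Mapping[str, Optional[float]]


def passes_gate(scores: Scores, decision_dates: list[str]) -> bool:
    """Survive only with complete coverage and a positive mean 36m excess return."""
    if not decision_dates:
        return False
    # Complete coverage: a missing or unscored cohort disqualifies the bundle
    # instead of silently shrinking the average.
    if any(scores.get(d) is None for d in decision_dates):
        return False
    return mean(scores[d] for d in decision_dates) > 0


def filter_survivors(population: Mapping[str, Scores], decision_dates: list[str]) -> list[str]:
    """Bundle ids that clear the gate, in input order."""
    return [bid for bid, s in population.items() if passes_gate(s, decision_dates)]


# Toy values for illustration only, not AlphaForge results.
dates = ["2012-01-31", "2014-01-31", "2016-01-31"]
population = {
    "bundle_a": {"2012-01-31": 0.04, "2014-01-31": -0.01, "2016-01-31": 0.02},
    "bundle_b": {"2012-01-31": 0.06, "2014-01-31": None, "2016-01-31": 0.05},
    "bundle_c": {"2012-01-31": -0.03, "2014-01-31": 0.01, "2016-01-31": -0.02},
}
print(filter_survivors(population, dates))  # ['bundle_a']
```

Treating a gap as a hard failure keeps a bundle from surviving simply because its weakest cohorts were never scored; `bundle_b` has the best numbers but is rejected for its missing 2014 cohort.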
Deterministic Screener
Philosophy-first via auto_cycle_map_v1: a PIT regime packet is computed once per decision date and mapped onto approved internal screen profiles. Inputs are SEC company facts, yfinance prices, and FRED macro series. Factor scoring covers quality resilience, balance-sheet strength, valuation, capital allocation, and price confirmation. Later generations mostly mutate thresholds and evidence discipline rather than direct stock-picking rules.
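Factor scoring can be pictured as a weighted composite followed by the `theta_screen_overrides` cut (`min_composite_score`, `max_candidates`). A minimal sketch, assuming each factor is already normalized to [0, 1] per name and the ranking weights sum to 1; the factor keys, `Candidate`, and `screen` are hypothetical, and the regime-to-profile mapping is omitted.

```python
from dataclasses import dataclass

FACTORS = (
    "quality_resilience",
    "balance_sheet_strength",
    "valuation",
    "capital_allocation",
    "price_confirmation",
)


@dataclass
class Candidate:
    ticker: str
    factors: dict[str, float]  # assumed pre-normalized to [0, 1] per factor


def composite(c: Candidate, weights: dict[str, float]) -> float:
    """Weighted sum of the five factor scores."""
    return sum(weights[f] * c.factors[f] for f in FACTORS)


def screen(
    candidates: list[Candidate],
    weights: dict[str, float],
    min_composite_score: float = 0.55,
    max_candidates: int = 12,
) -> list[tuple[str, float]]:
    """Names at or above the threshold, best composite first, capped at max_candidates."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("ranking weights should sum to 1")
    scored = [(c.ticker, composite(c, weights)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= min_composite_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_candidates]


equal_weights = {f: 0.2 for f in FACTORS}
names = [
    Candidate("AAA", dict.fromkeys(FACTORS, 0.8)),
    Candidate("BBB", dict.fromkeys(FACTORS, 0.5)),
    Candidate("CCC", dict.fromkeys(FACTORS, 0.6)),
]
print([t for t, _ in screen(names, equal_weights)])  # ['AAA', 'CCC']
```

Mutating thresholds rather than picks, as described above, corresponds here to changing `min_composite_score`, `max_candidates`, or the weights while the scoring function stays fixed.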
LLM Research Worker
Runs in an isolated Codex workspace that exposes only the backtest MCP surface. The worker reads SEC filings, builds regime context from FRED and market data, ranks candidates, selects a portfolio with sell policies, and produces live-guidance actions. Every tool call is taped (recorded) for reproducibility and audit.
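Taping makes a research run replayable: record each tool call's name, arguments, and result, then on replay check that the same calls occur in the same order and return the same results. A minimal sketch, assuming JSON-serializable arguments and results; `ToolTape`, `record`, `replay`, and the `get_price` stand-in tool are hypothetical and are not the backtest MCP server's API.

```python
import hashlib
import json
from typing import Any, Callable


def _digest(obj: Any) -> str:
    """Stable SHA-256 of a JSON-serializable value (keys sorted)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class ToolTape:
    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def record(self, name: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        """Call a tool and append the call and its result to the tape."""
        result = fn(**kwargs)
        self.entries.append({
            "tool": name, "args": kwargs, "args_sha": _digest(kwargs),
            "result": result, "result_sha": _digest(result),
        })
        return result

    def replay(self, calls: list[tuple[str, dict[str, Any]]],
               tools: dict[str, Callable[..., Any]]) -> bool:
        """Re-run calls and confirm they match the tape call-for-call."""
        if len(calls) != len(self.entries):
            return False
        for (name, kwargs), entry in zip(calls, self.entries):
            if name != entry["tool"] or _digest(kwargs) != entry["args_sha"]:
                return False
            if _digest(tools[name](**kwargs)) != entry["result_sha"]:
                return False
        return True

    def dumps(self) -> str:
        """Serialize the tape as JSON Lines for audit."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)


def get_price(ticker: str, as_of: str) -> float:  # hypothetical stand-in tool
    return {"AAA": 101.5}[ticker]


tape = ToolTape()
tape.record("get_price", get_price, ticker="AAA", as_of="2016-01-29")
ok = tape.replay([("get_price", {"ticker": "AAA", "as_of": "2016-01-29"})],
                 {"get_price": get_price})
print(ok)  # True
```

Storing hashes alongside the raw values lets a validator compare runs cheaply while the raw record stays available for a human audit.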
Mechanical Scorer
Backtests each portfolio against SPY at 12m, 36m, and 60m horizons under 1x, 2x, and 3x transaction-cost assumptions. Scoring is tax-aware: gains on positions held 366 days or longer are taxed at the 20% long-term rate. Sell policies are deterministic: trailing stop, 200-day moving average (200dma) break, or hold-to-horizon. Search survivors must clear a positive mean 36m after-tax excess return with complete cohort coverage, then survive untouched validation dates before they can even enter shadow review.
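A sketch of how one position's deterministic exit and tax-aware return might be computed. Only the 20% long-term rate and the 366-day threshold come from the text above; the 15% trailing stop, the 10 bps per-side cost at the 1x level, the short-term rate, the 200dma definition (simple average of the last 200 closes, including today), and the choice not to tax-adjust losses are all assumptions. `exit_index` and `after_tax_return` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Literal

Policy = Literal["trailing_stop", "dma200_break", "hold_to_horizon"]

LONG_TERM_RATE = 0.20   # from the text: long-term rate after a 366-day hold
LONG_TERM_DAYS = 366
SHORT_TERM_RATE = 0.37  # assumption: short-term rate is not stated in the source
COST_PER_SIDE = 0.001   # assumption: 10 bps per side at the 1x cost level


def exit_index(closes: list[float], policy: Policy, trail: float = 0.15) -> int:
    """Index of the exit close; the last close if no rule fires before the horizon."""
    peak = closes[0]
    for i, px in enumerate(closes):
        peak = max(peak, px)
        if policy == "trailing_stop" and px <= peak * (1 - trail):
            return i
        if policy == "dma200_break" and i >= 199:
            dma200 = sum(closes[i - 199 : i + 1]) / 200  # simple 200-close average
            if px < dma200:
                return i
    return len(closes) - 1  # hold_to_horizon, or no trigger


def after_tax_return(entry: float, exit_: float, hold_days: int, cost_mult: int = 1) -> float:
    """Net return after round-trip costs and capital-gains tax on gains only."""
    cost = COST_PER_SIDE * cost_mult
    gross = (exit_ * (1 - cost)) / (entry * (1 + cost)) - 1
    if gross <= 0:
        return gross  # simplification: losses are not tax-adjusted
    rate = LONG_TERM_RATE if hold_days >= LONG_TERM_DAYS else SHORT_TERM_RATE
    return gross * (1 - rate)


# Sparse toy series: one close every 200 calendar days.
start = date(2015, 1, 2)
closes = [100.0, 120.0, 110.0, 101.0, 130.0]
dates = [start + timedelta(days=200 * i) for i in range(len(closes))]
j = exit_index(closes, "trailing_stop")
net = after_tax_return(closes[0], closes[j], (dates[j] - dates[0]).days, cost_mult=2)
print(j)  # 3
```

Excess return versus SPY would then be the portfolio's net return minus the benchmark's over the same window; how AlphaForge treats the benchmark's costs and taxes is not specified here.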
Current State
AlphaForge has crossed the infrastructure threshold: the search loop, mutation bookkeeping, tape validation, and validation replay path are now trustworthy. The system has produced its first legitimate historical shadow candidate, but it still has not earned live-persona promotion.
Mutation Surface
| Variable | Range |
|---|---|
| ranking_weights | 5 factors, weight shifts of 0.10 between pairs |
| screen_template_id | Philosophy-first auto_cycle_map_v1 on live runs; internal profile mapping changes by PIT regime |
| pm_prompt_variant | 3 styles (deployment discipline, quality compounder, drawdown guard) |
| ra_prompt_variant | 3 styles (quality first, filing delta first, regime aware) |
| guidance_style | 3 styles (staged accumulator, watchlist first, high conviction only) |
| theta_screen_overrides | min_composite_score [0.55-0.70], max_candidates [12-20] |
| shortlist_n / portfolio_k | [8, 12, 16] / [3, 5] |
| evidence_threshold / tool_budget | [3, 4] / [4, 5, 6] per name; underfilling is allowed when evidence is thin |
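The mutation surface maps onto a typed bundle plus a mutation operator that only moves within the listed ranges. A minimal sketch: the `PromptBundle` fields mirror the table, but the class, `mutate`, `shift_weights`, the one-knob-per-mutation rule, and the 0.05 grid for `min_composite_score` are assumptions for illustration.

```python
import random
from dataclasses import dataclass, replace

FACTORS = ("quality_resilience", "balance_sheet_strength", "valuation",
           "capital_allocation", "price_confirmation")
CHOICES = {
    "pm_prompt_variant": ("deployment_discipline", "quality_compounder", "drawdown_guard"),
    "ra_prompt_variant": ("quality_first", "filing_delta_first", "regime_aware"),
    "guidance_style": ("staged_accumulator", "watchlist_first", "high_conviction_only"),
    "shortlist_n": (8, 12, 16),
    "portfolio_k": (3, 5),
    "evidence_threshold": (3, 4),
    "tool_budget": (4, 5, 6),
}


@dataclass(frozen=True)
class PromptBundle:
    ranking_weights: tuple[float, ...] = (0.2, 0.2, 0.2, 0.2, 0.2)  # in FACTORS order
    screen_template_id: str = "auto_cycle_map_v1"  # fixed; regime picks the internal profile
    pm_prompt_variant: str = "deployment_discipline"
    ra_prompt_variant: str = "quality_first"
    guidance_style: str = "staged_accumulator"
    min_composite_score: float = 0.55
    max_candidates: int = 12
    shortlist_n: int = 12
    portfolio_k: int = 5
    evidence_threshold: int = 3
    tool_budget: int = 5


def shift_weights(w: tuple[float, ...], src: int, dst: int, step: float = 0.10) -> tuple[float, ...]:
    """Move `step` of weight from factor src to dst; no-op if src lacks the weight."""
    if w[src] < step - 1e-9:
        return w
    new = list(w)
    new[src] = round(new[src] - step, 2)  # rounding keeps weights on the 0.10 grid
    new[dst] = round(new[dst] + step, 2)
    return tuple(new)


def mutate(b: PromptBundle, rng: random.Random) -> PromptBundle:
    """Return a copy of `b` with one knob perturbed within its listed range."""
    knob = rng.choice(["ranking_weights", "theta", *CHOICES])
    if knob == "ranking_weights":
        src, dst = rng.sample(range(len(FACTORS)), 2)
        return replace(b, ranking_weights=shift_weights(b.ranking_weights, src, dst))
    if knob == "theta":  # theta_screen_overrides treated as one knob
        return replace(b, min_composite_score=rng.choice((0.55, 0.60, 0.65, 0.70)),
                       max_candidates=rng.randint(12, 20))
    options = [v for v in CHOICES[knob] if v != getattr(b, knob)]
    return replace(b, **{knob: rng.choice(options)})


child = mutate(PromptBundle(), random.Random(7))
print(child)
```

Because `PromptBundle` is frozen, every mutation yields a new bundle and the parent stays intact for the mutation bookkeeping.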
Why "AlphaForge"
The name reflects the iterative, mechanical nature of the optimization loop. Prompt bundles are forged through repeated testing against historical equity cohorts — not generated by a single model pass or hand-tuned by a human analyst. Most bundles are expected to fail. The system is designed around attrition, not genius.
This is not autonomous trading. AlphaForge produces candidate prompt/screener configurations that are staged for manual review and forward shadowing before any live use. The current best bundle is still shadow-only, because historical validation is only marginal and the live packet does not yet certify portfolio risk gates. The human remains the decision-maker; the forge just searches a larger configuration space than manual tuning allows.