Methodology · Pundit Scoreboard

1️⃣ Data sources

Every prediction comes from publicly available content by finance YouTubers and Weibo influencers. The ingestion steps are:

Download channel-published audio + Weibo text (internal analysis only; not redistributed)
Transcribe with Whisper or use existing subtitles
Multi-model extraction identifies statements that combine a ticker, direction/target, and horizon

Every prediction card links back to the source video/post + timestamp for independent verification.

2️⃣ What counts as a "stock prediction"

A statement qualifies only if it has ALL of:

Specific instrument (ticker, ETF, index, crypto, commodity)
Direction or price target (long/short/neutral OR an explicit target)
Time horizon (a date or "by year-end" / "within 6 months" etc.)
Verifiability (can be checked against historical market data)
Attribution (the host's own forward call, not a relay of someone else's position)

Excluded:

Vague claims without a ticker ("tech stocks will do well")
Conditional predictions ("if the Fed pivots, then…") — the trigger isn't verifiable
Pure recommendations ("you should buy gold") with no forward claim
Eschatological calls without a horizon ("the dollar will collapse")
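
As a rough illustration, the inclusion test in this section can be written as an all-fields check. This is a minimal sketch, not the scoreboard's actual code: the `Candidate` class, its field names, and the `qualifies` function are hypothetical, and the `verifiable`, `own_call`, and `conditional` flags are assumed to be set by an earlier extraction step.

```python
from dataclasses import dataclass
from typing import Optional

VALID_DIRECTIONS = {"long", "short", "neutral"}


@dataclass
class Candidate:
    """Hypothetical record for one extracted statement."""
    instrument: Optional[str]       # ticker, ETF, index, crypto, or commodity
    direction: Optional[str]        # "long", "short", or "neutral"
    target_price: Optional[float]   # explicit price target, if any
    horizon: Optional[str]          # a date or a phrase such as "by year-end"
    verifiable: bool                # can be checked against market data
    own_call: bool                  # the host's own forward call
    conditional: bool = False       # "if X happens, then ..." phrasing


def qualifies(c: Candidate) -> bool:
    """Return True only if every criterion holds and the call is unconditional."""
    has_direction_or_target = (
        c.direction in VALID_DIRECTIONS or c.target_price is not None
    )
    return all([
        bool(c.instrument),
        has_direction_or_target,
        bool(c.horizon),
        c.verifiable,
        c.own_call,
        not c.conditional,
    ])
```

Under these assumptions, a call such as "NVDA, long, within 6 months" passes, while a statement with no instrument or no horizon is rejected.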

3️⃣ Tiering by horizon

Each prediction is bucketed by holding duration:

Short (<60 days): weekly-to-monthly technical calls
Mid (60-360 days): quarterly-to-annual fundamental / valuation calls
Long (>360 days): multi-year compounders or macro themes

Tiering makes cross-pundit comparison fair: a technician's short-term hit rate is not directly comparable to a value investor's long-term rate.
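
The bucket boundaries above can be sketched as a small function. This assumes the holding duration is measured in calendar days from the publish date to the horizon date; the function names are hypothetical.

```python
from datetime import date


def tier_for_days(days: int) -> str:
    """Map a holding duration in days to the Short / Mid / Long tier."""
    if days < 0:
        raise ValueError("horizon must not precede the publish date")
    if days < 60:
        return "short"
    if days <= 360:
        return "mid"
    return "long"


def horizon_tier(publish: date, horizon: date) -> str:
    """Assumption: duration is counted in calendar days, publish to horizon."""
    return tier_for_days((horizon - publish).days)
```

Note that both 60 and 360 days fall in the Mid tier, matching the ranges listed above.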

4️⃣ Strict model consensus

Every candidate prediction must pass strict multi-model validation before publishing.

Candidate extraction: transcript analysis emits candidate rows with ticker / direction / target / horizon.
Primary validator path: when available, Claude Opus + OpenAI GPT + Vertex Gemini run a two-round strict consensus workflow.
Accepted fallback: when Claude is unavailable, OpenAI GPT + Vertex Gemini vote independently and a candidate is published only if both vote KEEP.
Audit trail: each final batch records its consensus version, such as 2v-strict-gpt-gemini, so fallback-validated data is traceable.

This design reduces single-model bias while keeping the launch pipeline usable when one vendor path is unavailable.
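
The fallback rule can be sketched as follows. Only the fallback path is shown, because the two-round primary workflow is not specified here. The vote keys (`"gpt"`, `"gemini"`), the function name, and the returned record shape are assumptions; the `KEEP` label and the `2v-strict-gpt-gemini` version string come from the text above.

```python
from typing import Mapping

FALLBACK_VALIDATORS = ("gpt", "gemini")        # hypothetical vote keys
FALLBACK_VERSION = "2v-strict-gpt-gemini"      # consensus version named above


def fallback_consensus(votes: Mapping[str, str]) -> dict:
    """Publish only if every fallback validator independently voted KEEP.

    A missing vote is treated as a rejection (an assumption of this sketch).
    """
    publish = all(votes.get(name) == "KEEP" for name in FALLBACK_VALIDATORS)
    return {"publish": publish, "consensus_version": FALLBACK_VERSION}
```

Requiring unanimity means one dissenting or missing vote is enough to hold a candidate back, which is the strict behavior described for the fallback path.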

5️⃣ Automated market verification

When the horizon expires, an automated job pulls price data via yfinance and computes:

Entry price: closing price on the publish date of the video or post
Exit price: closing price on the horizon date
P/L %: (exit − entry) / entry × 100 (sign-flipped for short calls)
Benchmark alpha: P/L difference vs SPY over the same period

"Simulated P/L" is a stylized backtest: each call is treated as a $1 position, opened on the publish date and closed on the horizon date. It is used for accuracy scoring only and is not investment advice.
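
The arithmetic above can be sketched in a few lines. The function names are hypothetical, and prices are passed in directly rather than fetched, so the example runs without network access. It assumes the SPY benchmark is scored as a long position over the same dates. How "neutral" calls are scored is not stated here, so the sketch rejects them.

```python
def pl_pct(entry: float, exit_: float, direction: str = "long") -> float:
    """Percent P/L between two closing prices, sign-flipped for short calls."""
    if entry <= 0:
        raise ValueError("entry price must be positive")
    if direction not in ("long", "short"):
        raise ValueError("only long and short calls are handled in this sketch")
    raw = (exit_ - entry) * 100 / entry
    return -raw if direction == "short" else raw


def alpha_pct(call_pl: float, spy_entry: float, spy_exit: float) -> float:
    """Call P/L minus the P/L of holding SPY long over the same period."""
    return call_pl - pl_pct(spy_entry, spy_exit, "long")


# Worked example: a long call from 100 to 110 while SPY moves from 400 to 420.
call = pl_pct(100.0, 110.0, "long")    # 10% gain on the call
alpha = alpha_pct(call, 400.0, 420.0)  # 10% minus SPY's 5%
simulated_dollars = call / 100         # P/L on the stylized $1 position
```

A short call from 50 to 45 scores the same +10% as the long example, since the sign is flipped.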

6️⃣ Verdict labels

🎯 Hit (Bullseye): direction correct, target reached (or clearly profitable)
🤏 Partial: right direction but magnitude or timing off-target
💸 Miss: wrong direction, or target not reached and horizon passed
🔮 Pending: horizon hasn't expired yet

"Miss" judges the prediction, not the person.

⚠️ Known limitations

Transcription has errors, especially mixed-language speech
LLM judgment on "is this a prediction?" has edge cases — rhetorical phrasing can be miscategorized
"$1 equal-weight" backtest ignores fees, slippage, stops, and rebalancing
Close-price entry/exit may differ from real execution
Sample size grows over time, so early-period coverage is sparse

All of these are open to correction. See 📮 Corrections & Contact.