Technical documentation & layman’s guide to prediction market arbitrage
Arbitrage is the practice of exploiting price differences for the same asset across different markets. In prediction markets like Polymarket, arbitrage arises when a set of mutually exclusive outcomes (exactly one of which will resolve YES) is mispriced, so that their implied probabilities, the YES prices, don’t add up to 1.
Think of it like concert tickets: if a venue has 3 sections and you can buy a ticket to every section for a combined price below the guaranteed payout, you profit no matter which section ends up being “the winner.” In prediction markets, the “sections” are outcomes (candidates, teams, results) and the “ticket price” is the YES share price.
A sports betting analogy: if a sportsbook offers odds on a tennis match where the implied probabilities of both players winning sum to less than 100%, you can bet on both and guarantee a profit. This is exactly what our scanner detects in prediction markets.
1. Single-Condition Arbitrage
The simplest form. A binary market (YES/NO) where the two prices don’t sum to $1.00. Example: YES at $0.45 and NO at $0.50 → total $0.95 → buy both for guaranteed $0.05 profit. Rare on Polymarket because market makers actively correct this.
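A minimal Python sketch of the single-condition check, assuming prices are quoted in dollars per share and ignoring fees; the function name is hypothetical, not the scanner's code.

```python
def binary_arbitrage(yes_price: float, no_price: float) -> float:
    """Guaranteed profit per YES+NO pair, or 0.0 if the pair costs $1.00 or more.

    Exactly one side of a binary market pays $1.00 at resolution, so one YES
    share plus one NO share always returns $1.00. Fees are ignored.
    """
    cost = yes_price + no_price
    if cost >= 1.0:
        return 0.0
    return round(1.0 - cost, 10)  # rounding hides binary floating-point noise


print(binary_arbitrage(0.45, 0.50))  # 0.05
print(binary_arbitrage(0.52, 0.50))  # 0.0
```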
2. Intra-Market Rebalancing (Algorithm 1)
Within a NegRisk event with multiple outcomes, if the sum of all YES prices deviates from 1.00, arbitrage exists. If Σ < 1 → buy all YES (LONG). If Σ > 1 → buy all NO (SHORT). This is the primary algorithm implemented by our scanner.
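A minimal Python sketch of the rebalancing check for one event. The dict-of-prices input, the market ids, and pricing each NO share at 1 minus its YES price are assumptions, not the scanner's actual data model. Because profit per basket equals the deviation before costs, the minDeviation and minProfit filters collapse into a single threshold here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Signal:
    direction: str                        # "LONG" = buy every YES, "SHORT" = buy every NO
    deviation: float                      # |1 - sum of YES prices| = profit per basket before costs
    orders: list[tuple[str, str, float]]  # (market_id, side, assumed price)


def detect_rebalancing(
    yes_prices: dict[str, float],
    min_deviation: float,
    max_position: float = 1.0,
) -> Signal | None:
    """Rebalancing check for one NegRisk event, keyed by (hypothetical) market id."""
    if len(yes_prices) < 2:
        return None
    # Mirrors Algorithm 1's maxPosition filter (the default of 1.0 disables it).
    if any(p > max_position for p in yes_prices.values()):
        return None
    total = sum(yes_prices.values())
    deviation = round(abs(1.0 - total), 10)  # rounding suppresses float noise
    if deviation == 0 or deviation < min_deviation:
        return None
    if total < 1.0:
        orders = [(m, "YES", p) for m, p in yes_prices.items()]
        return Signal("LONG", deviation, orders)
    # SHORT: assume each NO share costs the complement of its YES price.
    orders = [(m, "NO", round(1.0 - p, 10)) for m, p in yes_prices.items()]
    return Signal("SHORT", deviation, orders)


signal = detect_rebalancing({"apple": 0.40, "orange": 0.30, "banana": 0.20}, min_deviation=0.02)
print(signal.direction, signal.deviation)  # LONG 0.1
```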
3. Inter-Market Combinatorial (Algorithm 2)
Across different events that share topics (measured by tag/title Jaccard similarity), similar questions should have similar prices. Large price differences between semantically similar questions across events signal a possible arbitrage.
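A Python sketch of the dependency score D from Definition 3.4 (0.4 weight on tags, 0.6 on title keywords) and the keyword Jaccard similarity that Algorithm 2 also applies to individual questions. The regex tokenizer and the small stopword list are hypothetical stand-ins for whatever keyword extraction the scanner actually uses.

```python
from __future__ import annotations

import re

# Hypothetical minimal stopword list; a production scanner would use a fuller one.
STOPWORDS = {"a", "an", "the", "will", "be", "in", "on", "of", "to", "by", "for", "who", "what"}


def keywords(title: str) -> set[str]:
    """Lower-cased alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", title.lower()) if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets score 0.0 here (a design choice)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dependency_score(tags1: list[str], title1: str, tags2: list[str], title2: str) -> float:
    """D(E1, E2) = 0.4 * J(tags) + 0.6 * J(title keywords)."""
    return 0.4 * jaccard(set(tags1), set(tags2)) + 0.6 * jaccard(keywords(title1), keywords(title2))


d = dependency_score(
    ["politics", "us-election"], "Who will win the 2024 US presidential election?",
    ["politics", "us-election", "polls"], "2024 US presidential election popular vote winner",
)
print(round(d, 3))  # 0.567
```

Here the tags overlap 2 of 3 and the title keywords 4 of 8, giving 0.4 × 2/3 + 0.6 × 0.5.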
From “Unravelling the Probabilistic Forest” (Saguillo et al., 2025)
Definition 3.1 — Event
E = {o₁, o₂, ..., oₙ} where exactly one outcome will resolve to TRUE. In Polymarket, these are NegRisk events.
Definition 3.2 — Market
Definition 3.3 — Arbitrage Condition (Rebalancing)
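For an event E = {o₁, ..., oₙ}, an arbitrage opportunity exists when |1 - Σᵢ p(oᵢ)| > τ, where τ is the minimum deviation threshold. Write Σ for Σᵢ p(oᵢ) and take each NO price as 1 - p(oᵢ).
If Σ < 1: LONG signal → buy all n YES shares (cost = Σ < $1, payout = $1, profit = 1 - Σ)
If Σ > 1: SHORT signal → buy all n NO shares (cost = n - Σ, payout = n - 1 dollars, profit = Σ - 1)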
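The payoff of each basket can be worked out directly. This assumes each NO share costs exactly 1 - p(oᵢ), which real order books only approximate, and uses Σ = Σᵢ p(oᵢ) for an event with n outcomes:

```latex
\begin{aligned}
\text{LONG:}\quad  & \text{cost} = \textstyle\sum_{i=1}^{n} p(o_i) = \Sigma,
                   && \text{payout} = 1,
                   && \text{profit} = 1 - \Sigma \\
\text{SHORT:}\quad & \text{cost} = \textstyle\sum_{i=1}^{n} \bigl(1 - p(o_i)\bigr) = n - \Sigma,
                   && \text{payout} = n - 1,
                   && \text{profit} = (n - 1) - (n - \Sigma) = \Sigma - 1
\end{aligned}
```

In both cases the profit per basket equals |1 - Σ|, the deviation tested against τ. For n = 2 the SHORT basket costs 2 - Σ < 1 and pays exactly 1.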
Definition 3.4 — Dependency Score (Combinatorial)
D(E₁, E₂) = 0.4 × J(tags₁, tags₂) + 0.6 × J(kw₁, kw₂)
where J(A, B) = |A ∩ B| / |A ∪ B| (Jaccard similarity)
kw = extracted title keywords (stopwords removed)
Algorithm 1: Market Rebalancing
function detectRebalancing(events, config):
    for each event E in events:
        if not E.negRisk or |E.markets| < 2: skip
        midpoints ← [getYesMidpoint(m) for m in E.markets]
        if any midpoint > config.maxPosition: skip
        sum ← Σ midpoints
        deviation ← |1 - sum|
        if deviation < config.minDeviation: skip
        if deviation < config.minProfit: skip    // before costs, profit per basket = deviation
        direction ← sum < 1 ? LONG : SHORT
        side ← direction = LONG ? YES : NO
        orders ← [buy one `side` share in m for m in E.markets]
        emit Signal(direction, deviation, orders)
Algorithm 2: Combinatorial Arbitrage
function detectCombinatorial(events, config):
    for each pair (E₁, E₂) of distinct events in events:
        if E₁ and E₂ have no overlapping dates: skip
        score ← D(E₁, E₂)    // dependency score
        if score < config.minJaccardSimilarity: skip
        for each pair (m₁ ∈ E₁, m₂ ∈ E₂):
            qSim ← J(keywords(m₁.question), keywords(m₂.question))
            if qSim < 0.4: skip
            diff ← |price(m₁) - price(m₂)|
            if diff < config.minProfit: skip
            direction ← price(m₁) < price(m₂) ? LONG : SHORT
            emit Signal(direction, diff, cross-market orders)
The scanner is a Next.js application with server-side API routes that proxy requests to Polymarket’s two APIs (Gamma for market discovery, CLOB for order books). The arbitrage engine runs server-side to keep API keys secure and reduce client-side compute.
Gamma API — Public market discovery. Returns events and markets with outcome prices. Behind Cloudflare WAF (requires browser-like headers).
CLOB API — Order book and pricing. Returns bid/ask spreads and midpoint prices for individual tokens.
Arbitrage Engine — Server-side module that runs both algorithms against enriched market data and produces actionable signals.
Statistics from the paper’s analysis of Polymarket data (2020–2024):
Total trading volume analyzed: $39.6M
Arbitrage opportunities found: 4,203
Average profit per opportunity: $0.07
Average time window: 14.2 minutes
NegRisk events scanned: 312
Rebalancing signals: 2,847
Combinatorial signals: 1,356
High-confidence signals: 18%
Transaction costs not modeled
Gas fees on Polygon, CLOB maker/taker fees, and slippage are not deducted from displayed profit margins. Real profits will be lower.
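A rough Python sketch of how costs eat into a displayed margin. The fee rate, the flat per-order gas cost, and one transaction per leg are hypothetical inputs for illustration; current CLOB fee schedules and Polygon gas prices should be looked up rather than taken from here. Slippage is not modeled.

```python
def net_profit(
    deviation: float,
    basket_cost: float,
    baskets: int,
    num_orders: int,
    fee_rate: float,
    gas_per_order: float,
) -> float:
    """Gross arbitrage profit minus proportional fees and a flat per-order gas cost.

    fee_rate and gas_per_order are hypothetical inputs, not Polymarket's actual schedule.
    """
    gross = deviation * baskets              # |1 - Σ| per basket, before costs
    fees = fee_rate * basket_cost * baskets  # assumed proportional to notional traded
    gas = gas_per_order * num_orders         # assumed flat cost per order
    return gross - fees - gas


# 100 baskets of a 2-leg binary arbitrage bought at $0.95, hypothetical 1% fee and $0.01 gas.
print(round(net_profit(0.05, 0.95, 100, 2, 0.01, 0.01), 2))  # 4.03
# A 0.4¢ edge does not survive the same costs.
print(net_profit(0.004, 0.996, 100, 2, 0.01, 0.01) < 0)      # True
```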
Execution risk
Signals assume you can fill all orders at the displayed prices. In practice, order book depth varies and prices move between detection and execution.
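Order book depth can be checked by sweeping the ask side for the size you want. This Python sketch assumes a simplified book of (price, shares) tuples, not the CLOB API's response format, and returns None when the visible book cannot fill the order.

```python
from __future__ import annotations


def fill_cost(asks: list[tuple[float, int]], size: int) -> float | None:
    """Total cost of buying `size` shares by sweeping ask levels from the best price up.

    `asks` holds (price, shares_available) pairs, an assumed simplified book shape.
    Returns None when the visible book is too thin to fill the whole order.
    """
    if size <= 0:
        return 0.0
    remaining = size
    cost = 0.0
    for price, available in sorted(asks):  # cheapest asks fill first
        take = min(remaining, available)
        cost += take * price
        remaining -= take
        if remaining == 0:
            return cost
    return None


total = fill_cost([(0.47, 200), (0.45, 100)], 250)
print(round(total, 4), round(total / 250, 4))  # 115.5 0.462
```

The average fill of $0.462 is almost 1.2¢ above the $0.45 best ask, enough to erase a thin arbitrage.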
Midpoint approximation
We use the order book midpoint (avg of best bid/ask) as the “true price.” This works well for liquid markets but can be misleading for illiquid ones with wide spreads.
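A small Python illustration of the midpoint caveat, assuming hypothetical (best_bid, best_ask) pairs for each outcome's YES token: a basket that looks like a LONG arbitrage at midpoints can cost more than $1 at the asks.

```python
from __future__ import annotations


def basket_costs(books: list[tuple[float, float]]) -> tuple[float, float]:
    """Cost of one YES share per outcome, priced at the midpoint vs. at the best ask.

    Each entry is (best_bid, best_ask) for one outcome's YES token.
    """
    mid_cost = sum((bid + ask) / 2 for bid, ask in books)
    ask_cost = sum(ask for _, ask in books)
    return mid_cost, ask_cost


# Three illiquid outcomes with 8¢ spreads.
mid, ask = basket_costs([(0.36, 0.44), (0.26, 0.34), (0.16, 0.24)])
print(round(mid, 2), round(ask, 2))  # 0.9 1.02
```

At midpoints the basket shows a 10¢ gap; buying at the asks actually loses 2¢ per basket.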
No MEV protection
On-chain arbitrage execution is susceptible to MEV (Miner/Maximal Extractable Value). Front-running bots can capture your arbitrage before your transaction settles.
Combinatorial relies on semantic similarity
The Jaccard-based dependency score is a heuristic. Two events may share keywords but have genuinely different outcomes. Always verify the logical relationship manually.
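A Python example of that failure mode, using a hypothetical regex tokenizer and stopword list: two questions about opposite outcomes share enough keywords to pass Algorithm 2's 0.4 question-similarity threshold, yet a price gap between them is expected rather than arbitrage.

```python
from __future__ import annotations

import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "will", "be", "in", "on", "of", "to", "by", "for"}


def keywords(question: str) -> set[str]:
    """Lower-cased alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", question.lower()) if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, with 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


q1 = "Will the Fed cut rates in March?"
q2 = "Will the Fed raise rates in March?"
print(jaccard(keywords(q1), keywords(q2)))  # 0.6
```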
The Fruit Stand Analogy
Imagine a fruit stand selling apples, oranges, and bananas. A sign says “Exactly one of these fruits will be declared Fruit of the Year.” Each fruit has a price tag representing how likely people think it’ll win.
If Apple is $0.40, Orange is $0.30, and Banana is $0.20, the total is $0.90. Since exactly one MUST win (paying $1.00), you can buy all three for $0.90 and guarantee a $0.10 profit. That gap is arbitrage.
Why Do Mispricings Happen?
No market is perfectly efficient, and prediction markets are especially prone to mispricing. Prices update at different speeds: news about one candidate might instantly move their market while related markets lag behind. Our scanner catches these temporary gaps.
What Does “Confidence” Mean?
High: Tight bid-ask spreads (<2¢). The order book has depth — you can likely execute near the displayed price.
Medium: Moderate spreads (2–5¢). Execution is feasible but expect some slippage.
Low: Wide spreads (>5¢). The displayed arbitrage may disappear once you try to fill orders.
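The tiers above map onto a simple classifier. This Python sketch assumes a multi-leg signal is rated by its widest leg's spread, on the reasoning that one illiquid leg can sink the whole basket; that aggregation rule is an assumption, not necessarily what the scanner does.

```python
from __future__ import annotations


def confidence(books: list[tuple[float, float]]) -> str:
    """Classify a signal by its widest leg's bid-ask spread, in cents.

    `books` holds (best_bid, best_ask) per leg; using the widest leg is an assumption.
    """
    widest = max(round((ask - bid) * 100, 6) for bid, ask in books)  # cents, float noise rounded off
    if widest < 2:
        return "High"
    if widest <= 5:
        return "Medium"
    return "Low"


print(confidence([(0.49, 0.50), (0.29, 0.30)]))  # High
print(confidence([(0.47, 0.50), (0.29, 0.30)]))  # Medium
print(confidence([(0.40, 0.50), (0.29, 0.30)]))  # Low
```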
LONG vs SHORT?
LONG: Prices sum to less than $1.00. Strategy: buy all YES shares. You spend less than $1, and exactly one pays out $1.
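SHORT: Prices sum to more than $1.00. Strategy: buy all NO shares. In an event with n outcomes exactly one wins, so the NO shares of the other n - 1 outcomes each pay $1. If each NO costs $1 minus its YES price, the basket costs n minus the price sum, which is less than n - 1, and the difference (the amount by which the prices exceed $1.00) is locked-in profit. With only two outcomes this reduces to paying less than $1 for a $1 payout.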
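Both strategies can be checked by brute force: enumerate every possible winner and compute the basket's profit. This Python sketch assumes each NO share costs exactly $1 minus its YES price, which real order books only approximate; the function name is hypothetical.

```python
from __future__ import annotations


def basket_pnl(yes_prices: list[float], direction: str) -> list[float]:
    """Profit of a one-share-per-outcome basket for each possible winning outcome.

    direction "LONG" buys every YES; anything else is treated as SHORT (buy every NO).
    Assumes every NO share costs exactly 1 - YES price.
    """
    n = len(yes_prices)
    side_yes = direction == "LONG"
    cost = sum(yes_prices) if side_yes else sum(1.0 - p for p in yes_prices)
    pnl = []
    for winner in range(n):
        # YES pays $1 only for the winner; NO pays $1 for every outcome except the winner.
        payout = sum(1.0 for i in range(n) if (i == winner) == side_yes)
        pnl.append(round(payout - cost, 10))
    return pnl


print(basket_pnl([0.40, 0.30, 0.20], "LONG"))   # [0.1, 0.1, 0.1]
print(basket_pnl([0.40, 0.35, 0.30], "SHORT"))  # [0.05, 0.05, 0.05]
```

The SHORT basket here costs $0.60 + $0.65 + $0.70 = $1.95 and always returns $2.00, whichever outcome wins.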
Based on “Unravelling the Probabilistic Forest: Arbitrage Detection in Prediction Markets” by Saguillo et al., 2025.
This tool is for educational and research purposes. Not financial advice. Always do your own research before trading.