Arbitrage Scanner Wiki

Technical documentation & layman’s guide to prediction market arbitrage

01 — What Is Arbitrage?

Arbitrage is the practice of exploiting price differences for the same asset across different markets. In prediction markets like Polymarket, an arbitrage opportunity arises when mutually exclusive outcomes are mispriced, so that their implied probabilities do not sum to 100%.

Think of it like concert tickets: if a venue has 3 sections and you can buy a ticket to every section for less than the guaranteed payout, you profit no matter which section ends up being “the winner.” In prediction markets, the “sections” are outcomes (candidates, teams, results) and the “ticket price” is the YES share price.

A sports betting analogy: if a sportsbook offers odds on a tennis match where the implied probabilities of both players winning sum to less than 100%, you can bet on both and guarantee a profit. This is exactly what our scanner detects in prediction markets.

02 — The Three Types

1. Single-Condition Arbitrage

The simplest form. A binary market (YES/NO) where the two prices don’t sum to $1.00. Example: YES at $0.45 and NO at $0.50 → total $0.95 → buy both for guaranteed $0.05 profit. Rare on Polymarket because market makers actively correct this.
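The arithmetic in this example can be sketched in a few lines of Python. The function name `single_condition_profit` is hypothetical, and fees and slippage are ignored here:

```python
def single_condition_profit(yes_price: float, no_price: float) -> float:
    """Guaranteed profit per YES+NO pair, before any fees.

    Exactly one of YES/NO pays $1.00, so buying one share of each
    costs yes_price + no_price and always returns $1.00.
    A result <= 0 means there is nothing to gain by buying both sides.
    """
    return 1.0 - (yes_price + no_price)


profit = single_condition_profit(0.45, 0.50)
print(f"profit per pair: ${profit:.2f}")
```

A positive result only says the quoted prices leave a gap; whether both orders can actually be filled at those prices is a separate question.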

2. Intra-Market Rebalancing (Algorithm 1)

Within a NegRisk event with multiple outcomes, if the sum of all YES prices deviates from 1.00, arbitrage exists. If Σ < 1 → buy all YES (LONG). If Σ > 1 → buy all NO (SHORT). This is the primary algorithm implemented by our scanner.

3. Inter-Market Combinatorial (Algorithm 2)

Across different events that share topics (measured by tag/title Jaccard similarity), similar questions should have similar prices. Large price differences between semantically similar questions across events signal a possible arbitrage.

[Figure: LONG opportunity: YES prices $0.30 + $0.25 + $0.35 = $0.90 < $1.00 (gap of $0.10), so buy all YES tokens. SHORT opportunity: YES prices $0.40 + $0.35 + $0.30 = $1.05 > $1.00, so buy all NO tokens.]

03 — Formal Definitions

From “Unravelling the Probabilistic Forest” (Saguillo et al., 2025)

Definition 3.1 — Event

An event E is a set of mutually exclusive and collectively exhaustive outcomes E = {o₁, o₂, ..., oₙ} where exactly one outcome will resolve to TRUE. In Polymarket, these are NegRisk events.

Definition 3.2 — Market

A market M is a binary contract on a specific outcome oᵢ ∈ E with YES/NO shares. The YES price p(oᵢ) represents the market’s implied probability that oᵢ will occur.

Definition 3.3 — Arbitrage Condition (Rebalancing)

For event E with outcomes o₁..oₙ, an arbitrage opportunity exists when:
|1 - Σᵢ p(oᵢ)| > τ   where τ = minimum deviation threshold

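If Σ < 1: LONG signal → buy one YES share of every outcome (cost = Σ < $1, payout = $1, profit = 1 − Σ)
If Σ > 1: SHORT signal → buy one NO share of every outcome (taking each NO price as 1 − p(oᵢ): cost = n − Σ, payout = n − 1 because every NO share except the winner's pays $1, profit = Σ − 1)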
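To make the two cases concrete, here is a small sketch that computes cost, payout, and profit for a full set of shares. The function name `rebalancing_profit` is hypothetical, and the sketch assumes each NO share costs exactly 1 minus the YES price (no spread, no fees), which real order books will not satisfy exactly:

```python
def rebalancing_profit(yes_prices: list[float]) -> tuple[str, float]:
    """Return (direction, profit per full set of shares), before fees.

    Assumption: the NO price of each outcome is 1 - its YES price.
    """
    n = len(yes_prices)
    total = sum(yes_prices)
    if total < 1.0:
        # LONG: one YES share of every outcome; exactly one pays $1.
        cost, payout = total, 1.0
        return "LONG", payout - cost
    if total > 1.0:
        # SHORT: one NO share of every outcome; all but one pay $1.
        cost, payout = n - total, n - 1.0
        return "SHORT", payout - cost
    return "NONE", 0.0


for prices in ([0.30, 0.25, 0.35], [0.40, 0.35, 0.30]):
    direction, profit = rebalancing_profit(prices)
    print(direction, round(profit, 2))
```

In both directions the profit per set equals the deviation |1 − Σ|. In the SHORT case, though, the capital tied up is n − Σ rather than something below $1.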

Definition 3.4 — Dependency Score (Combinatorial)

For two events E₁ and E₂, the dependency score measures semantic overlap:
D(E₁, E₂) = 0.4 × J(tags₁, tags₂) + 0.6 × J(kw₁, kw₂)

where J(A, B) = |A ∩ B| / |A ∪ B|  (Jaccard similarity)
      kw = extracted title keywords (stopwords removed)
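A minimal sketch of this score in Python follows. The stopword list and the tokenization (whitespace split, punctuation stripped, lowercased) are illustrative assumptions; the source does not say how keywords are extracted:

```python
# Illustrative stopword list only; the scanner's real list is not given in the source.
STOPWORDS = {"will", "the", "a", "an", "of", "in", "by", "to", "be"}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def keywords(title: str) -> set[str]:
    """Lowercased title words with punctuation and stopwords removed."""
    words = (w.strip("?.,!'\"").lower() for w in title.split())
    return {w for w in words if w and w not in STOPWORDS}


def dependency_score(tags1, tags2, title1: str, title2: str) -> float:
    """D(E1, E2) = 0.4 * J(tags) + 0.6 * J(title keywords)."""
    return (0.4 * jaccard(set(tags1), set(tags2))
            + 0.6 * jaccard(keywords(title1), keywords(title2)))


score = dependency_score(
    ["politics", "elections"], ["politics"],
    "Will candidate X win the primary election?",
    "Will candidate X win the general election?",
)
print(round(score, 2))
```

Here the tags overlap by 1/2 and the title keywords by 4/6, so the weighted score comes out to 0.4 × 0.5 + 0.6 × 0.667 ≈ 0.6.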

04 — Algorithm Pseudocode

Algorithm 1: Market Rebalancing

function detectRebalancing(events, config):
  for each event E in events:
    if not E.negRisk or |E.markets| < 2: skip

    midpoints ← [getYesMidpoint(m) for m in E.markets]
    if any midpoint > config.maxPosition: skip

    sum ← Σ midpoints
    deviation ← |1 - sum|
    if deviation < config.minDeviation: skip
    if deviation < config.minProfit: skip

    direction ← sum < 1 ? LONG : SHORT
    orders ← buy orders on every market in E (YES tokens if LONG, NO tokens if SHORT)
    emit Signal(direction, deviation, orders)
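The pseudocode above could be rendered in Python roughly as follows. This is a sketch, not the scanner's actual implementation: the dataclass shapes, field names, and default threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Market:
    question: str
    yes_midpoint: float  # assumed to be precomputed from the order book


@dataclass
class Event:
    title: str
    neg_risk: bool
    markets: list[Market]


@dataclass
class Config:
    # Hypothetical defaults; the real config values are not given in the source.
    min_deviation: float = 0.02
    min_profit: float = 0.01
    max_position: float = 0.95


@dataclass
class Signal:
    event_title: str
    direction: str
    deviation: float
    orders: list[tuple[str, str]]  # (market question, token side to buy)


def detect_rebalancing(events: list[Event], config: Config) -> list[Signal]:
    signals = []
    for event in events:
        # Only multi-outcome NegRisk events qualify.
        if not event.neg_risk or len(event.markets) < 2:
            continue
        midpoints = [m.yes_midpoint for m in event.markets]
        if any(p > config.max_position for p in midpoints):
            continue
        total = sum(midpoints)
        deviation = abs(1.0 - total)
        if deviation < config.min_deviation or deviation < config.min_profit:
            continue
        direction = "LONG" if total < 1.0 else "SHORT"
        side = "YES" if direction == "LONG" else "NO"
        orders = [(m.question, side) for m in event.markets]
        signals.append(Signal(event.title, direction, deviation, orders))
    return signals


demo = Event("Demo", True, [Market("A?", 0.30), Market("B?", 0.25), Market("C?", 0.35)])
for s in detect_rebalancing([demo], Config()):
    print(s.direction, round(s.deviation, 2), s.orders)
```

Returning a list instead of emitting signals one at a time is a simplification that keeps the example easy to test.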

Algorithm 2: Combinatorial Arbitrage

function detectCombinatorial(events, config):
  for each pair (E₁, E₂) in events:
    if not overlapping dates: skip

    score ← D(E₁, E₂)  // dependency score
    if score < config.minJaccardSimilarity: skip

    for each pair (m₁ ∈ E₁, m₂ ∈ E₂):
      qSim ← J(keywords(m₁.question), keywords(m₂.question))
      if qSim < 0.4: skip

      diff ← |price(m₁) - price(m₂)|
      if diff < config.minProfit: skip

      direction ← price(m₁) < price(m₂) ? LONG : SHORT
      emit Signal(direction, diff, cross-market orders)
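A compact Python sketch of the same flow is shown below. The event and market shapes, the stopword list, the date-overlap test, and the default thresholds (other than the 0.4 question-similarity cutoff from the pseudocode) are assumptions for illustration, and "each pair" is read here as each unordered pair of distinct events.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations

STOPWORDS = {"will", "the", "a", "an", "of", "in", "by", "to", "be"}  # illustrative


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keywords(text: str) -> set:
    words = (w.strip("?.,!'\"").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


@dataclass
class Market:
    question: str
    price: float  # YES price


@dataclass
class Event:
    title: str
    tags: set
    start: date
    end: date
    markets: list


def dates_overlap(e1: Event, e2: Event) -> bool:
    return e1.start <= e2.end and e2.start <= e1.end


def dependency_score(e1: Event, e2: Event) -> float:
    return (0.4 * jaccard(e1.tags, e2.tags)
            + 0.6 * jaccard(keywords(e1.title), keywords(e2.title)))


def detect_combinatorial(events, min_jaccard=0.3, min_profit=0.05, min_question_sim=0.4):
    """Return (direction, price difference, question 1, question 2) tuples."""
    signals = []
    for e1, e2 in combinations(events, 2):
        if not dates_overlap(e1, e2):
            continue
        if dependency_score(e1, e2) < min_jaccard:
            continue
        for m1 in e1.markets:
            for m2 in e2.markets:
                q_sim = jaccard(keywords(m1.question), keywords(m2.question))
                if q_sim < min_question_sim:
                    continue
                diff = abs(m1.price - m2.price)
                if diff < min_profit:
                    continue
                direction = "LONG" if m1.price < m2.price else "SHORT"
                signals.append((direction, diff, m1.question, m2.question))
    return signals


a = Event("Candidate X primary election", {"politics", "elections"},
          date(2024, 1, 1), date(2024, 6, 30),
          [Market("Will candidate X win the primary election?", 0.65)])
b = Event("Candidate X general election", {"politics", "elections"},
          date(2024, 3, 1), date(2024, 11, 5),
          [Market("Will candidate X win the general election?", 0.55)])
for direction, diff, q1, q2 in detect_combinatorial([a, b]):
    print(direction, round(diff, 2), "|", q1, "|", q2)
```

A signal from this routine is only a flag for manual review. High keyword overlap does not establish that the two questions are logically linked.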

05 — System Architecture

The scanner is a Next.js application with server-side API routes that proxy requests to Polymarket’s two APIs (Gamma for market discovery, CLOB for order books). The arbitrage engine runs server-side to keep API keys secure and reduce client-side compute.

[Figure: Polymarket Arbitrage Scanner data flow. Layers: Client, Server, External, Output. Components: Dashboard; API routes /api/markets, /api/btc/price, /api/btc/stream, /api/arbitrage, /api/trade; Gamma API, CLOB API, Binance WS; Arbitrage Engine; Signal Cards (rendered output).]

Gamma API — Public market discovery. Returns events and markets with outcome prices. Behind Cloudflare WAF (requires browser-like headers).

CLOB API — Order book and pricing. Returns bid/ask spreads and midpoint prices for individual tokens.

Arbitrage Engine — Server-side module that runs both algorithms against enriched market data and produces actionable signals.

[Figure: Combinatorial (inter-market) arbitrage. Cross-event relationships are based on tag/title similarity. A price mismatch is a potential arbitrage signal, and a higher Jaccard score means a stronger link.]

Event A
A-1: Will candidate X win the primary election? YES $0.65
A-2: Will candidate X secure party nomination? YES $0.70
A-3: Will candidate X lead polls by election day? YES $0.58

Event B
B-1: Will candidate X win the general election? YES $0.55
B-2: Will candidate X's party win the presidency? YES $0.60

Cross-event links
A-1 vs B-1: $0.65 vs $0.55, Jaccard 0.82
A-2 vs B-2: $0.70 vs $0.60, Jaccard 0.74
A-3 vs B-1: $0.58 vs $0.55, Jaccard 0.51

06 — Key Findings

Statistics from the paper’s analysis of Polymarket data (2020–2024):

Total trading volume analyzed: $39.6M
Arbitrage opportunities found: 4,203
Average profit per opportunity: $0.07
Average time window: 14.2 minutes
NegRisk events scanned: 312
Rebalancing signals: 2,847
Combinatorial signals: 1,356
High-confidence signals: 18%

07 — Assumptions & Limitations

Transaction costs not modeled

Gas fees on Polygon, CLOB maker/taker fees, and slippage are not deducted from displayed profit margins. Real profits will be lower.
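To see how quickly costs can erode a displayed margin, the sketch below subtracts per-leg costs from a gross margin. Every parameter and number here is a made-up illustration, not an actual Polymarket or Polygon fee, and the function name `net_profit` is hypothetical:

```python
def net_profit(gross_margin: float, n_legs: int, fee_per_leg: float,
               slippage_per_leg: float, gas_per_set: float = 0.0) -> float:
    """Profit per full set of shares after illustrative costs.

    gross_margin     : displayed deviation |1 - sum of YES prices|
    n_legs           : number of orders needed (one per outcome)
    fee_per_leg      : assumed trading fee per share bought
    slippage_per_leg : assumed price movement per share while filling
    gas_per_set      : assumed gas cost amortized over one set of shares
    """
    costs = n_legs * (fee_per_leg + slippage_per_leg) + gas_per_set
    return gross_margin - costs


# Hypothetical numbers: a $0.05 displayed margin on a 3-outcome event.
print(round(net_profit(0.05, 3, fee_per_leg=0.005, slippage_per_leg=0.01), 4))
```

With these assumed costs, a $0.05 displayed margin shrinks to about half a cent, and slightly higher slippage would turn it negative.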

Execution risk

Signals assume you can fill all orders at the displayed prices. In practice, order book depth varies and prices move between detection and execution.

Midpoint approximation

We use the order book midpoint (the average of the best bid and best ask) as the “true price.” This works well for liquid markets but can be misleading for illiquid ones with wide spreads, where neither side can actually be traded near the midpoint.
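The sketch below shows why the same midpoint can mean very different things. The function name `midpoint_and_spread` and the example quotes are hypothetical:

```python
def midpoint_and_spread(best_bid: float, best_ask: float) -> tuple[float, float]:
    """Return (midpoint, spread) for one token's top-of-book quotes."""
    if best_bid > best_ask:
        raise ValueError("best bid should not exceed best ask")
    midpoint = (best_bid + best_ask) / 2
    spread = best_ask - best_bid
    return midpoint, spread


# Two hypothetical books with the same midpoint but very different spreads.
for bid, ask in ((0.49, 0.51), (0.30, 0.70)):
    mid, spread = midpoint_and_spread(bid, ask)
    print(f"bid={bid:.2f} ask={ask:.2f} midpoint={mid:.2f} spread={spread:.2f}")
```

Both books report a midpoint of $0.50, but a buyer in the second one would pay $0.70, not $0.50, so an arbitrage computed from midpoints may not exist at executable prices.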

No MEV protection

On-chain arbitrage execution is susceptible to MEV (Miner/Maximal Extractable Value). Front-running bots can capture your arbitrage before your transaction settles.

Combinatorial relies on semantic similarity

The Jaccard-based dependency score is a heuristic. Two events may share keywords but have genuinely different outcomes. Always verify the logical relationship manually.

08 — Layman’s Guide

The Fruit Stand Analogy

Imagine a fruit stand selling apples, oranges, and bananas. A sign says “Exactly one of these fruits will be declared Fruit of the Year.” Each fruit has a price tag representing how likely people think it’ll win.

If Apple is $0.40, Orange is $0.30, and Banana is $0.20, the total is $0.90. Since exactly one MUST win (paying $1.00), you can buy all three for $0.90 and guarantee a $0.10 profit. That gap is arbitrage.

Why Do Mispricings Happen?

Markets aren’t perfectly efficient, and this is especially true of prediction markets. Prices update at different speeds: news about one candidate might instantly move their market while related markets lag behind. Our scanner catches these temporary gaps.

What Does “Confidence” Mean?

High: Tight bid-ask spreads (<2¢). The order book has depth — you can likely execute near the displayed price.

Medium: Moderate spreads (2–5¢). Execution is feasible but expect some slippage.

Low: Wide spreads (>5¢). The displayed arbitrage may disappear once you try to fill orders.
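The three tiers can be sketched as a simple threshold function. The name `classify_confidence` is hypothetical, the spread is taken in cents to avoid floating-point edge effects, and how the scanner combines spreads across several markets in one signal is not stated in the source, so this sketch handles a single spread only:

```python
def classify_confidence(spread_cents: float) -> str:
    """Map a bid-ask spread in cents to a confidence tier.

    High   : spread below 2 cents
    Medium : spread from 2 to 5 cents inclusive
    Low    : spread above 5 cents
    """
    if spread_cents < 2:
        return "High"
    if spread_cents <= 5:
        return "Medium"
    return "Low"


for spread in (1, 2, 5, 8):
    print(f"{spread} cents -> {classify_confidence(spread)}")
```

One plausible design choice for a multi-market signal would be to classify by the widest spread among its markets, since every leg has to fill, but that is an assumption rather than documented behavior.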

LONG vs SHORT?

LONG: Prices sum to less than $1.00. Strategy: buy all YES shares. You spend less than $1, and exactly one pays out $1.

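SHORT: Prices sum to more than $1.00. Strategy: buy all NO shares. Every NO share pays $1 except the one for the winning outcome, so with n outcomes you collect $(n − 1), and the NO shares together cost less than that. For example, with three outcomes at YES $0.40, $0.35, and $0.30, the NO shares cost about $1.95 in total (if each NO costs $1 minus its YES price) and pay out $2.00.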

Based on “Unravelling the Probabilistic Forest: Arbitrage Detection in Prediction Markets” by Saguillo et al., 2025.

This tool is for educational and research purposes. Not financial advice. Always do your own research before trading.