Data

What data types do you support?

We provide the most comprehensive and granular data on the market, sourced from real-time WebSocket APIs, with complete control over and transparency into how the data is recorded. As a result, the data types and data format we provide (which we call exchange-native data format) can vary per exchange. For example, for BitMEX we store liquidations and chat messages in addition to tick-by-tick trades and order book L2 messages, but not for FTX, which doesn't provide those non-standard data types.

See historical data details to learn about real-time channels captured for each exchange and available via our API as well as other exchange specific data collection and integration details.

We also provide the following normalized data types via our client libs (normalization is done client-side):

  • tick-by-tick trades

  • order book L2 updates

  • order book snapshots (tick-by-tick, 10ms, 100ms, 1s, 10s etc)

  • quotes

  • derivative tick info (open interest, funding rate, mark price, index price)

  • liquidations

  • options summary

  • OHLCV

  • volume/tick based trade bars

Normalized data is also available as downloadable CSV data files.

What does high frequency historical data mean?

We always collect and provide data with the highest granularity that an exchange can offer via its real-time WebSocket feeds. High frequency can mean different things for different exchanges due to limitations of their APIs. For example, for Coinbase Pro it means L3 order book data (market-by-order), for Binance Futures all real-time order book L2 updates, and for Binance Spot order book updates aggregated in 100ms intervals.

How historical raw market data is being sourced?

Raw market data is sourced from exchanges' real-time WebSocket APIs. Where an exchange lacks a WebSocket API for a particular data type, we fall back to polling its REST API periodically, e.g., for Binance Futures open interest data.

Why data source matters and why we use real-time WebSocket feeds as data source vs periodically calling REST endpoints?

Recording exchanges' real-time WebSocket feeds allows us to preserve and provide the most granular data that exchanges' APIs can offer, including data that is simply not available via their REST APIs, such as tick-level order book updates. Historical data sourced from real-time WebSocket feeds matches what you'll see when trading live and can be used to replicate live conditions exactly, even if that means occasional connection drops causing small data gaps, real-time data publishing delays (especially during larger market moves), and duplicated trades or crossed books in some edge cases. We find that trade-off acceptable: even if the data isn't as clean and corrected as data sourced from REST APIs, it gives more insight into market microstructure and unusual exchange behaviors that can't be captured otherwise. A simple example is the latency spikes many exchanges show during periods of increased volatility, when an exchange publishes trade, order book and quote WebSocket messages with larger than usual latency, or skips some of the updates and then returns them in one batch. Querying the REST API would result in a nice, clean trade history, but such data wouldn't fully reflect real, actionable market behavior and would produce unrealistic backtesting results that break down in real-time scenarios.

See market data collection overview for more details.

What L2 order book data can be used for?

L2 data (market-by-price) includes bid and ask orders aggregated by price level and can be used to analyze, among other things:

  • order book imbalance

  • average execution cost

  • average liquidity away from midpoint

  • average spread

  • hidden interest (i.e., iceberg orders)

We provide L2 data in CSV format as incremental order book L2 updates and tick-level order book snapshots (top 25 and top 5 levels), as well as in exchange-native format via the API and client libraries that can perform full order book reconstruction client-side.
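As an illustration of some of the metrics listed above, here is a minimal Python sketch that computes the spread, the midpoint and a simple volume imbalance from a single L2 book state. The function name, the dict-based book representation (price mapped to amount) and the particular imbalance formula are assumptions made for this example; they are not part of our API or client libraries.

```python
from typing import Dict, Tuple


def top_of_book_metrics(
    bids: Dict[float, float], asks: Dict[float, float], depth: int = 5
) -> Tuple[float, float, float]:
    """Return (spread, midpoint, imbalance) for one L2 book state.

    Assumes both sides are non-empty. Imbalance is computed over the top
    `depth` levels as (bid volume - ask volume) / (bid volume + ask volume),
    which is only one of several definitions in use.
    """
    best_bids = sorted(bids.items(), key=lambda level: level[0], reverse=True)[:depth]
    best_asks = sorted(asks.items(), key=lambda level: level[0])[:depth]

    best_bid_price = best_bids[0][0]
    best_ask_price = best_asks[0][0]
    spread = best_ask_price - best_bid_price
    midpoint = (best_ask_price + best_bid_price) / 2

    bid_volume = sum(amount for _, amount in best_bids)
    ask_volume = sum(amount for _, amount in best_asks)
    imbalance = (bid_volume - ask_volume) / (bid_volume + ask_volume)
    return spread, midpoint, imbalance


# Illustrative book state (made-up numbers)
bids = {100.0: 3.0, 99.5: 1.0}
asks = {100.5: 1.0, 101.0: 1.0}
spread, midpoint, imbalance = top_of_book_metrics(bids, asks)
```

A positive imbalance means more resting volume on the bid side within the inspected depth; averaging these values over many book states gives the "average spread" style metrics mentioned above.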

What L3 order book data can be used for?

L3 data (market-by-order) includes every order book order addition, update, cancellation and match and can be used to analyze among other things:

  • order resting time

  • order fill probability

  • order queue dynamics

Historical L3 data is currently available via API for Bitfinex, Coinbase Pro and Bitstamp. The remaining supported exchanges provide L2 data only.

What is the maximum order book depth available for each supported exchange?

We always collect full depth order book data as long as the exchange's WebSocket API supports it. The table below shows the current state of affairs for each supported exchange.

| exchange | order book depth | order book updates frequency |
| --- | --- | --- |
| BitMEX | full order book depth snapshot and updates | real-time |
| Deribit | full order book depth snapshot and updates | real-time |
| Binance USDT Futures | top 1000 levels initial order book snapshot, full depth incremental order book updates | real-time |
| Binance COIN Futures | top 1000 levels initial order book snapshot, full depth incremental order book updates | real-time |
| Binance Spot | top 1000 levels initial order book snapshot, full depth incremental order book updates | 100ms |
| FTX | top 100 levels initial order book snapshot and updates | real-time |
| OKEx Futures | top 400 levels initial order book snapshot and updates | real-time |
| OKEx Swap | top 400 levels initial order book snapshot and updates | real-time |
| OKEx Options | top 400 levels initial order book snapshot and updates | real-time |
| OKEx Spot | top 400 levels initial order book snapshot and updates | real-time |
| Huobi Futures | top 150 levels initial order book snapshot and updates | 30ms |
| Huobi Swap | top 150 levels initial order book snapshot and updates | 30ms |
| Huobi Global | top 150 levels initial order book snapshot and updates | 100ms |
| Bitfinex Derivatives | top 100 levels initial order book snapshot and updates | real-time |
| Bitfinex | top 100 levels initial order book snapshot and updates | real-time |
| Coinbase Pro | full order book depth snapshot and updates | real-time |
| Kraken Futures | full order book depth snapshot and updates | real-time |
| Kraken | top 1000 levels initial order book snapshot and updates | real-time |
| Bitstamp | full order book depth snapshot and updates | real-time |
| Gemini | full order book depth snapshot and updates | real-time |
| Poloniex | full order book depth snapshot and updates | real-time |
| Bybit | top 25 levels initial order book snapshot and updates | real-time |
| Phemex | top 30 levels initial order book snapshot and updates | 20ms |
| FTX US | top 100 levels initial order book snapshot and updates | real-time |
| Binance US | top 1000 levels initial order book snapshot, full depth incremental order book updates | 100ms |
| Gate.io Futures | top 20 levels order book snapshots | unknown |
| Gate.io | top 30 levels order book snapshots | unknown |
| OKCoin | top 400 levels initial order book snapshot and updates | real-time |
| bitFlyer | full order book depth snapshot and updates | real-time |
| HitBTC | full order book depth snapshot and updates | real-time |
| Binance DEX | top 1000 levels initial order book snapshot, full depth incremental order book updates | 100ms |

Do you provide historical options data?

Yes, we provide historical options data for Deribit and OKEx Options. See the options chain CSV data type and the Deribit and OKEx Options exchange details pages.

Do you provide historical futures data?

We cover all leading derivatives exchanges such as BitMEX, Deribit, Binance USDT Futures, Binance COIN Futures, FTX, OKEx, Huobi Futures, Huobi Swap, Bitfinex Derivatives, Bybit and many more.

What is the difference between futures and perpetual swaps contracts?

A futures contract is a contract that has an expiry date (for example, a quarter ahead for quarterly futures). The futures contract price converges to the spot price as the contract approaches its expiration/settlement date. After a futures contract expires, the exchange settles it and replaces it with a new contract for the next period (the next quarter in the example above).

A perpetual swap contract, also commonly called "perp", "swap", "perpetual" or "perpetual future" in crypto exchange nomenclature, is very similar to a futures contract but does not have an expiry date (hence perpetual). To ensure that the perpetual swap contract price stays near the spot price, exchanges employ a mechanism called the funding rate. When the funding rate is positive, longs pay shorts. When the funding rate is negative, shorts pay longs. This mechanism can be quite nuanced and varies between exchanges, so it's best to study each contract specification to learn all the details (funding periods, mark price mechanisms etc.).
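The direction of funding payments can be illustrated with a small Python sketch. It assumes the simplest possible rule, payment equals position notional times funding rate for one funding interval; real exchanges differ in how notional is computed (mark price, contract type) and how often funding is exchanged, so treat the function name and formula as assumptions for illustration only.

```python
def funding_cash_flow(position_notional: float, funding_rate: float) -> float:
    """Signed cash flow for one funding interval under a simplified rule.

    position_notional: positive for a long position, negative for a short one.
    Returns a negative number when the position pays funding and a positive
    number when it receives funding.
    """
    return -position_notional * funding_rate


# Positive funding rate: the long pays, the short receives
long_flow = funding_cash_flow(10_000.0, 0.0001)
short_flow = funding_cash_flow(-10_000.0, 0.0001)

# Negative funding rate: the roles are reversed
long_flow_negative_rate = funding_cash_flow(10_000.0, -0.0001)
```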

See the CSV grouped symbols section if you'd like to download data for all futures or perpetual swaps of a given exchange as a single file instead of one by one for each individual instrument.

Do you provide time based aggregated data as well?

We focus on providing the best possible tick-level historical data for cryptocurrency exchanges. As of now our APIs (both HTTP and CSV datasets) offer access to tick-level data only and do not support time-based aggregated data.

If you're interested in time-based aggregated data (OHLC, interval-based order book snapshots), see our client libs, which provide such capabilities. The caveat is that data aggregation is performed client-side from tick-level data sourced from the API, so it can be a relatively slow process compared to downloading ready-made aggregated data.
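To show what client-side aggregation involves, here is a minimal Python sketch that builds time-based OHLCV bars from tick-level trades. It is not the implementation used by our client libs; the function name and the trade fields used here (timestamp_ms, price, amount) are hypothetical and chosen for brevity.

```python
from typing import Iterable, List, Optional


def trades_to_ohlcv(trades: Iterable[dict], interval_ms: int) -> List[dict]:
    """Aggregate time-ordered trades into OHLCV bars of `interval_ms` length.

    Intervals without any trades produce no bar.
    """
    bars: List[dict] = []
    current: Optional[dict] = None
    for trade in trades:
        # Start of the interval this trade falls into
        open_time = trade["timestamp_ms"] - trade["timestamp_ms"] % interval_ms
        price, amount = trade["price"], trade["amount"]
        if current is None or open_time != current["open_time_ms"]:
            current = {
                "open_time_ms": open_time,
                "open": price,
                "high": price,
                "low": price,
                "close": price,
                "volume": 0.0,
            }
            bars.append(current)
        current["high"] = max(current["high"], price)
        current["low"] = min(current["low"], price)
        current["close"] = price
        current["volume"] += amount
    return bars


# Made-up trades, aggregated into 1-second bars
trades = [
    {"timestamp_ms": 0, "price": 100.0, "amount": 1.0},
    {"timestamp_ms": 400, "price": 102.0, "amount": 2.0},
    {"timestamp_ms": 900, "price": 101.0, "amount": 1.0},
    {"timestamp_ms": 1200, "price": 99.0, "amount": 0.5},
]
bars = trades_to_ohlcv(trades, interval_ms=1000)
```

Every trade has to be read to produce each bar, which is why aggregating from tick-level data is slower than downloading pre-aggregated files.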

Can you record market data for exchange that's not currently supported?

Yes, we're always open to supporting promising new exchanges. Contact us and we'll get back to you to discuss the details.

Do you provide market data in normalized format?

Normalized market data (unified data format for every exchange) is available via our official libraries and downloadable CSV files. Our HTTP API provides data only in exchange-native format.

Do you provide normalized contract amounts for derivatives exchanges in your historical data feeds?

The data we provide has contract amounts exactly as provided by the exchanges' APIs, meaning that in some cases it can be tricky to compare across exchanges due to different contract multipliers (for example OKEx, where each contract has $100 value) or different contract types (linear or inverse). We'll keep it this way, but we're also working on an API that will provide contract multipliers, tick sizes and more for each instrument in a uniform way, making it easy to normalize contract amounts client-side without having to go through the documentation of various exchanges to find this information. Contact us to learn more.
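Until such an API is available, client-side normalization of contract amounts could look roughly like the Python sketch below. The function name and its parameters are hypothetical, and apart from the $100 OKEx contract value mentioned above the numbers are illustrative; always take multipliers and contract types from the exchange's contract specification.

```python
def contracts_to_base_amount(
    contracts: float, multiplier: float, price: float, inverse: bool
) -> float:
    """Convert a number of contracts into an amount of the base currency.

    inverse=True:  `multiplier` is the quote-currency value of one contract
                   (e.g. 100 USD), so the base amount depends on the price.
    inverse=False: `multiplier` is the base-currency size of one contract
                   (linear contract), so the price is not needed.
    """
    if inverse:
        return contracts * multiplier / price
    return contracts * multiplier


# 50 inverse contracts worth $100 each at a price of $10,000 per coin
inverse_amount = contracts_to_base_amount(50, 100, 10_000, inverse=True)

# 50 linear contracts of 0.01 coin each (illustrative multiplier)
linear_amount = contracts_to_base_amount(50, 0.01, 10_000, inverse=False)
```

Both calls describe a position of about half a coin, even though the raw contract counts alone do not reveal that.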

What is a difference between exchange-native and normalized data format?

Cryptocurrency markets are very fragmented and every exchange provides data in its own bespoke data format, which we call exchange-native data format. Our HTTP API and client libs can provide market data in this format, meaning the data you receive is exactly the same as the live data you would have received from the exchanges ("as-is").

See how we collect data in exchange-native format and why it's important.

For example, a BitMEX trade message looks like this:

{"table":"trade","action":"insert","data":[{"timestamp":"2019-06-01T00:03:11.589Z","symbol":"ETHUSD","side":"Sell","size":10,"price":268.7,"tickDirection":"ZeroMinusTick","trdMatchID":"ebc230d9-0b6e-2d5d-f99a-f90109a2b113","grossValue":268700,"homeNotional":0.08555051758063137,"foreignNotional":22.987424073915648}]}

and this is a Deribit trade message:

{"jsonrpc":"2.0","method":"subscription","params":{"channel":"trades.ETH-26JUN20.raw","data":[{"trade_seq":18052,"trade_id":"ETH-10813935","timestamp":1577836825724,"tick_direction":0,"price":132.65,"instrument_name":"ETH-26JUN20","index_price":128.6,"direction":"buy","amount":1.0}]}}

In contrast, normalized data format means the same, unified format across multiple exchanges. We provide normalized data via our client libs (data normalization is performed client-side) as well as via downloadable CSV files.

During data normalization we map the data collected from real-time WebSocket APIs (exchange-native format) to a unified format that is the same across exchanges and therefore easier to work with. We've open sourced all the data mappings from exchange-native to normalized format to make the whole process as transparent as possible.

Sample normalized trade message:

{
  "type": "trade",
  "symbol": "XBTUSD",
  "exchange": "bitmex",
  "id": "282a0445-0e3a-abeb-f403-11003204ea1b",
  "price": 7996,
  "amount": 50,
  "side": "sell",
  "timestamp": "2019-10-23T10:32:49.669Z",
  "localTimestamp": "2019-10-23T10:32:49.740Z"
}
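To make the idea of a mapping concrete, here is a minimal Python sketch that converts the BitMEX trade message shown earlier into the normalized trade shape. It is an illustration only, not the open-sourced mapping itself; the function name is hypothetical and the local_timestamp value passed in is made up, since the arrival time is not part of the exchange message.

```python
import json
from typing import Iterator


def normalize_bitmex_trades(raw_message: str, local_timestamp: str) -> Iterator[dict]:
    """Yield one normalized trade per entry of a BitMEX 'trade' insert message."""
    message = json.loads(raw_message)
    if message.get("table") != "trade" or message.get("action") != "insert":
        return  # not a trade message, nothing to map
    for trade in message["data"]:
        yield {
            "type": "trade",
            "symbol": trade["symbol"],
            "exchange": "bitmex",
            "id": trade["trdMatchID"],
            "price": trade["price"],
            "amount": trade["size"],
            "side": trade["side"].lower(),
            "timestamp": trade["timestamp"],
            "localTimestamp": local_timestamp,
        }


# The BitMEX sample message from above
raw = (
    '{"table":"trade","action":"insert","data":[{"timestamp":"2019-06-01T00:03:11.589Z",'
    '"symbol":"ETHUSD","side":"Sell","size":10,"price":268.7,'
    '"tickDirection":"ZeroMinusTick","trdMatchID":"ebc230d9-0b6e-2d5d-f99a-f90109a2b113",'
    '"grossValue":268700,"homeNotional":0.08555051758063137,'
    '"foreignNotional":22.987424073915648}]}'
)
# Arrival time below is a made-up value for the example
normalized = list(normalize_bitmex_trades(raw, "2019-06-01T00:03:11.650Z"))
```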

We support the following normalized data types via our client libs:

  • tick-by-tick trades

  • order book L2 updates

  • order book snapshots (tick-by-tick, 10ms, 100ms, 1s, 10s etc)

  • quotes

  • derivative tick info (open interest, funding rate, mark price, index price)

  • liquidations

  • OHLCV

  • volume/tick based trade bars

Normalized data is also available as downloadable CSV data files.

What is the channel field used in the HTTP API and client libs replay functions?

When publishing real-time data messages, exchanges always publish them for the subscription topics clients have subscribed to. Those subscription topics are very often called "channels" or "streams" in exchanges' documentation and describe the data type a given message belongs to. For example, BitMEX publishes its trades data via the trade channel and order book L2 updates via the orderBookL2 channel.

Since we collect the data for all the channels described on each exchange's details page (Captured real-time market data channels section), our HTTP API and client libs can filter by those channel names. For example, to get historical trades for BitMEX, the trade channel needs to be provided alongside the requested instrument symbols (via the HTTP API or client lib replay function args).

What time zone is used in the data?

UTC, always.

Is provided raw market data complete?

We're doing our best to provide the most complete and reliable historical raw data API on the market. To do so, among many other things, we utilize highly available Kubernetes clusters on Google Cloud Platform that offer best-in-class availability, networking and monitoring. However, due to exchanges' API downtimes (maintenance, deployments, connection drops etc.) we can experience market data gaps and cannot guarantee 100% data completeness. In rare circumstances, when an exchange's API changes without notice or we hit new, unexpected rate limits, we may also fail to record data during such a period. This happens very rarely and is specific to each exchange. Use the /exchanges/:exchange API endpoint and check the incidentReports field to get the most detailed and up-to-date information on the subject.

How frequently exchanges drop WebSocket connections?

As long as an exchange's WebSocket API is not 'hidden' behind a Cloudflare proxy (causing relatively frequent "CloudFlare WebSocket proxy restarting, Connection reset by peer" errors), connections are stable for the majority of supported exchanges and there are almost no connection drops during the day. In times of higher market volatility some exchanges tend to drop connections more frequently or have larger latency spikes. Overall it's a nuanced matter that changes over time. If you have any questions regarding a particular exchange, please do not hesitate to contact us.

Can historical order books reconstructed from L2 updates be crossed (bid/ask overlap) occasionally?

Although it should never happen in theory, in practice it can happen (very occasionally) due to various crypto exchange bugs and peculiarities, and users have reported such issues.

We track sequence numbers of WebSocket L2 order book messages when collecting the data and restart the connection when a sequence gap is detected, for exchanges that provide those numbers. We observe that a bid/ask overlap can occur even when sequence numbers are in check. In such cases exchanges tend to 'forget' to publish delete messages for the opposite side of the book when publishing a new level for a given side. We validated that hypothesis as follows: for reconstructed order book snapshots that were crossed, we manually removed the overlapping levels on the opposite side (as the exchange didn't publish that 'delete') and then compared the resulting best bid/ask with the quote/ticker feeds (for exchanges that provide those) to check that they match. See sample code that implements that manual level removal logic.
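For readers who want to experiment with this, here is a minimal Python sketch of such manual level removal. It is a standalone illustration written for this answer, not the sample code referenced above; the function name and the dict-based book representation (price mapped to amount, one dict per side) are assumptions.

```python
from typing import Dict


def apply_level_and_uncross(
    bids: Dict[float, float],
    asks: Dict[float, float],
    side: str,
    price: float,
    amount: float,
) -> None:
    """Apply one L2 price level update and drop opposite-side levels it crosses.

    side is "bid" or "ask". An amount of 0 removes the level. After a level is
    added or updated, any opposite-side level that would overlap with it is
    deleted, emulating the 'delete' message the exchange did not publish.
    """
    book, opposite = (bids, asks) if side == "bid" else (asks, bids)
    if amount == 0:
        book.pop(price, None)
        return
    book[price] = amount
    if side == "bid":
        crossed = [p for p in opposite if p <= price]
    else:
        crossed = [p for p in opposite if p >= price]
    for crossed_price in crossed:
        del opposite[crossed_price]


# Made-up book: a new bid at 101.5 crosses the stale ask at 101.0
bids = {100.0: 1.0}
asks = {101.0: 1.0, 102.0: 2.0}
apply_level_and_uncross(bids, asks, "bid", 101.5, 3.0)
```

The linear scan over the opposite side keeps the sketch short; a sorted structure would be a better fit for large books.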

Can exchange publish data with non monotonically increasing timestamps for single data channel?

That shouldn't happen in theory, but we've detected that for some exchanges, when a new connection is established, the first message for a given channel and symbol sometimes has a newer timestamp than the subsequent message, e.g., the order book snapshot has a newer timestamp than the first order book update. This is why the data we provide via API and CSV downloads for given date ranges is based on local timestamps (timestamps of message arrival), which are always monotonically increasing.

Are exchanges publishing duplicated trades data messages?

Some exchanges occasionally publish duplicated trades (trades with the same ids). Since we collect real-time data, we also collect and provide duplicate trades via the API if they were published by the exchanges' real-time WebSocket feeds. Our client libraries can deduplicate such trades when working with normalized data. Similarly, we deduplicate tick-by-tick trades data in downloadable CSV files.
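If you consume exchange-native data directly and need to handle duplicates yourself, an approach along the following lines may help. This Python sketch is an assumption about how deduplication could be done, not a description of what our client libraries do; it keys trades on the exchange, symbol and id fields of the normalized trade format and remembers only a bounded window of recent ids.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Set, Tuple

TradeKey = Tuple[str, str, str]


def deduplicate_trades(trades: Iterable[dict], window: int = 10_000) -> Iterator[dict]:
    """Yield trades, skipping any whose (exchange, symbol, id) was seen recently.

    Only the last `window` keys are remembered, so memory use stays bounded;
    a duplicate arriving later than that would not be detected.
    """
    seen: Set[TradeKey] = set()
    order: Deque[TradeKey] = deque()
    for trade in trades:
        key = (trade["exchange"], trade["symbol"], trade["id"])
        if key in seen:
            continue
        seen.add(key)
        order.append(key)
        if len(order) > window:
            seen.discard(order.popleft())
        yield trade


# Made-up trades: the second entry repeats the id of the first
trades = [
    {"exchange": "bitmex", "symbol": "XBTUSD", "id": "a", "price": 7996, "amount": 50},
    {"exchange": "bitmex", "symbol": "XBTUSD", "id": "a", "price": 7996, "amount": 50},
    {"exchange": "bitmex", "symbol": "XBTUSD", "id": "b", "price": 7997, "amount": 10},
]
unique_trades = list(deduplicate_trades(trades))
```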

How order book data snapshots are provided?

Historical market data available via HTTP API provides order book snapshots at the beginning of each day (00:00 UTC) - see details.

We also provide custom order book snapshots with customizable time intervals, from tick-by-tick and milliseconds to minutes or hours, via client libs, in which case the snapshots are computed client-side from raw data provided via the HTTP API. Snapshots are also available as downloadable CSV files: book_snapshot_25 and book_snapshot_5.

Do you collect order books as snapshots or in streaming mode?

Order books are collected in streaming mode - snapshot at the beginning of each day and then incremental updates. See details.

We also provide custom order book snapshots with customizable time intervals, from tick-by-tick and milliseconds to minutes or hours, via client libs, in which case the snapshots are computed client-side from raw data provided via the HTTP API. Snapshots are also available as downloadable CSV files: book_snapshot_25 and book_snapshot_5.

How incremental_book_L2 CSV dataset is built from real-time data?

Cryptocurrency exchanges' real-time APIs vary a lot, but for L2 order book data they all tend to follow a similar flow. First, when the WebSocket connection is established and the subscription is confirmed, the exchange sends an initial order book snapshot (all existing price levels or the top 'x' levels, depending on the exchange) and then starts streaming 'book update' messages (frequently called deltas as well). Those updates, when applied to the initial snapshot, result in an up-to-date order book state at a given time.

We provide initial L2 snapshots in the incremental_book_L2 dataset at the beginning of each day (00:00 UTC, more details), but also any time an exchange closes its real-time WebSocket connection, see details.

Let's take FTX as an example and start with its snapshot orderbook message (frequently called 'partial' in exchanges' API docs as well). The remaining bid and ask levels were removed from this sample message for the sake of clarity.

{
  "channel": "orderbook",
  "market": "ETH/USD",
  "type": "partial",
  "data": {
    "time": 1601510401.2166328,
    "checksum": 204980439,
    "bids": [
      [359.72, 121.259]
    ],
    "asks": [
      [359.8, 8.101]
    ],
    "action": "partial"
  }
}

Such snapshot message maps to the following rows in CSV file:

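| exchange | symbol | timestamp | local_timestamp | is_snapshot | side | price | amount |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ftx | ETH/USD | 1601510401216632 | 1601510401316432 | true | ask | 359.8 | 8.101 |
| ftx | ETH/USD | 1601510401216632 | 1601510401316432 | true | bid | 359.72 | 121.259 |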

... and here's a sample FTX orderbook update message.

{
  "channel": "orderbook",
  "market": "ETH/USD",
  "type": "update",
  "data": {
    "time": 1601510427.1840546,
    "checksum": 1377242400,
    "bids": [],
    "asks": [
      [360.24, 4.962],
      [361.02, 0]
    ],
    "action": "update"
  }
}

Let's see how it maps to CSV format.

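| exchange | symbol | timestamp | local_timestamp | is_snapshot | side | price | amount |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ftx | ETH/USD | 1601510427184054 | 1601510427204046 | false | ask | 360.24 | 4.962 |
| ftx | ETH/USD | 1601510427184054 | 1601510427204036 | false | ask | 361.02 | 0 |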
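The mapping from an FTX orderbook message to CSV rows can be sketched in Python as follows. This is a simplified illustration rather than the code that builds the dataset: the function name is hypothetical, the local timestamp (message arrival time in microseconds) is passed in because it is not part of the exchange message, and truncating the exchange time to whole microseconds is an assumption inferred from the sample values above.

```python
import json
from decimal import Decimal
from typing import List


def ftx_orderbook_to_rows(raw_message: str, local_timestamp_us: int) -> List[dict]:
    """Flatten one FTX orderbook message into incremental_book_L2-style rows."""
    # Parse floats as Decimal so prices, amounts and time keep their exact digits
    message = json.loads(raw_message, parse_float=Decimal)
    data = message["data"]
    timestamp_us = int(data["time"] * 1_000_000)  # seconds -> microseconds, truncated
    is_snapshot = "true" if message["type"] == "partial" else "false"

    rows = []
    for side, key in (("ask", "asks"), ("bid", "bids")):
        for price, amount in data[key]:
            rows.append(
                {
                    "exchange": "ftx",
                    "symbol": message["market"],
                    "timestamp": timestamp_us,
                    "local_timestamp": local_timestamp_us,
                    "is_snapshot": is_snapshot,
                    "side": side,
                    "price": str(price),
                    "amount": str(amount),
                }
            )
    return rows


# The FTX update message from above
raw_update = (
    '{"channel":"orderbook","market":"ETH/USD","type":"update","data":'
    '{"time":1601510427.1840546,"checksum":1377242400,"bids":[],'
    '"asks":[[360.24,4.962],[361.02,0]],"action":"update"}}'
)
rows = ftx_orderbook_to_rows(raw_update, local_timestamp_us=1601510427204046)
```

Each price level in the message becomes its own row, and all rows of one message carry the same exchange timestamp.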

See this answer if you have doubts about how to reconstruct order book state based on data provided in the incremental_book_L2 dataset.

How can I reconstruct full order book state from incremental_book_L2 CSV dataset?

In order to reconstruct full order book state correctly from incremental_book_L2 data:

  • For each row in the CSV file (iterate in the same order as provided in file):

    • if the local timestamp of the current row (local_timestamp column value) is larger than that of the previous row, the order book state built from all previous rows is consistent and can be read before the current row is applied. Why? The CSV format is flat, with each row representing a single price level update, but most exchanges' real-time feeds publish multiple price level updates in a single WebSocket message, and these need to be processed together before the locally maintained order book state is read. We use the local timestamp value to detect all price level updates belonging to a single 'update' message.

    • if the current row is part of a snapshot (is_snapshot column value set to true) and the previous one was not, reset your local order book state object that tracks price levels for each order book side. It means that either there was a connection restart and the exchange provided a full order book snapshot, or a new day started (each incremental_book_L2 file starts with a snapshot)

    • if the current row's amount is zero (amount column value set to 0), remove that price level (the row's price column) from your local order book state, as it does not exist anymore

    • if the current row's amount is not zero, update the price level in your local order book state with the new value, or add the price level if it does not exist yet. Maintain bids and asks separately (side column value)
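The steps above can be sketched in Python roughly as follows. This is a minimal illustration under a few assumptions: the function name is ours, prices and amounts are parsed as floats for brevity, and the sample rows are the FTX rows from the previous answer with a shared local timestamp for the two update rows.

```python
import csv
import io
from typing import Dict, Iterator, Optional, TextIO, Tuple

BookState = Tuple[int, Dict[float, float], Dict[float, float]]


def replay_incremental_book_l2(csv_file: TextIO) -> Iterator[BookState]:
    """Yield (local_timestamp, bids, asks) each time the book is consistent."""
    bids: Dict[float, float] = {}
    asks: Dict[float, float] = {}
    previous_local_timestamp: Optional[int] = None
    previous_was_snapshot = False

    for row in csv.DictReader(csv_file):
        local_timestamp = int(row["local_timestamp"])
        is_snapshot = row["is_snapshot"] == "true"

        # A larger local timestamp means a new message starts here, so the
        # state built from the previous rows is complete and can be read.
        if previous_local_timestamp is not None and local_timestamp > previous_local_timestamp:
            yield previous_local_timestamp, dict(bids), dict(asks)

        # First row of a snapshot: connection restart or start of a new day
        if is_snapshot and not previous_was_snapshot:
            bids.clear()
            asks.clear()

        side = bids if row["side"] == "bid" else asks
        price, amount = float(row["price"]), float(row["amount"])
        if amount == 0:
            side.pop(price, None)  # level no longer exists
        else:
            side[price] = amount  # add or update level

        previous_local_timestamp = local_timestamp
        previous_was_snapshot = is_snapshot

    # The state after the last row of the file is consistent as well
    if previous_local_timestamp is not None:
        yield previous_local_timestamp, dict(bids), dict(asks)


sample_csv = io.StringIO(
    "exchange,symbol,timestamp,local_timestamp,is_snapshot,side,price,amount\n"
    "ftx,ETH/USD,1601510401216632,1601510401316432,true,ask,359.8,8.101\n"
    "ftx,ETH/USD,1601510401216632,1601510401316432,true,bid,359.72,121.259\n"
    "ftx,ETH/USD,1601510427184054,1601510427204046,false,ask,360.24,4.962\n"
    "ftx,ETH/USD,1601510427184054,1601510427204046,false,ask,361.02,0\n"
)
states = list(replay_incremental_book_l2(sample_csv))
```

Copying the dicts on every yield keeps the example simple; for large files you would more likely read the live dicts in place or sample the state at chosen intervals.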

Alternatively, we also provide top 25 and top 5 levels order book snapshot CSV datasets ready to download.

How CSV datasets are split into the files?

CSV datasets are available in daily intervals, split by exchange, data type and symbol. In addition to standard currency pairs/instrument symbols, each exchange also has special 'grouped' symbols available, depending on whether it supports a given market type: SPOT, FUTURES, OPTIONS and PERPETUALS. This feature is useful if you are interested in, for example, trades or quotes data for all of Deribit's options instruments without having to request data for each symbol separately.

How market data messages are being timestamped?

Each message received via a WebSocket connection is timestamped with 100ns precision using a synchronized clock at arrival time (before any message processing) and stored in ISO 8601 format.

What is the new historical market data delay in relation to real-time?

For API access it's 4 minutes (T - 4min). Downloadable CSV files for a given day are available on the next day around 03:00 UTC.
