Data

What data types do you support?

We strive to capture complete raw market data as published by exchanges' real-time WebSocket feeds. This means the supported data types vary by exchange. For example, for BitMEX we store liquidations and chat messages in addition to tick-by-tick trades and order book L2 messages, whereas FTX doesn't provide those non-standard data types, so they aren't stored for it. See historical data details to find out which real-time channels are captured for each exchange.

We also provide the following normalized data types via our client libs and downloadable CSV data files:

  • tick-by-tick trades

  • order book L2 updates

  • order book snapshots (tick-by-tick, 10ms, 100ms, 1s, 10s etc)

  • quotes

  • derivative tick info (open interest, funding rate, mark price, index price)

  • OHLCV

  • volume/tick-based trade bars
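
As an illustration of the last item, tick-based trade bars can be derived from tick-by-tick trades. The sketch below is a minimal Python example and not the client libs' implementation: `Bar`, `tick_bars` and the sample trades are hypothetical, and only the `price` and `amount` fields of a normalized trade are assumed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float
    trades: int


def tick_bars(trades: Iterable[dict], ticks_per_bar: int) -> Iterator[Bar]:
    """Group consecutive trades into bars of `ticks_per_bar` trades each."""
    bar = None
    for trade in trades:
        price, amount = trade["price"], trade["amount"]
        if bar is None:
            # First trade of a new bar sets open/high/low/close.
            bar = Bar(price, price, price, price, 0.0, 0)
        bar.high = max(bar.high, price)
        bar.low = min(bar.low, price)
        bar.close = price
        bar.volume += amount
        bar.trades += 1
        if bar.trades == ticks_per_bar:
            yield bar
            bar = None
    # A trailing, incomplete bar is dropped in this sketch.


# Hypothetical sample trades (not real market data).
sample_trades = [
    {"price": 7996.0, "amount": 50},
    {"price": 7997.5, "amount": 10},
    {"price": 7995.0, "amount": 25},
    {"price": 7996.5, "amount": 5},
]
bars = list(tick_bars(sample_trades, ticks_per_bar=2))
```

A volume-based bar would follow the same pattern but close the bar once the accumulated `amount` reaches a threshold instead of counting trades.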

What does high-frequency historical data mean?

We always collect and provide data at the finest granularity an exchange offers via its real-time feeds. High frequency means different things for different exchanges: for Coinbase Pro it can mean L3 order book data (market-by-order), while for Binance Futures it means all real-time L2 order book updates.

What can L2 order book data be used for?

Level 2 order book data (aggregated by price level) can be used to analyse, among other things:

  • order book imbalance

  • average execution cost

  • average liquidity away from midpoint

  • average spread

  • hidden interest (i.e., iceberg orders)

We store and provide L2 data at the highest resolution possible.
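
Some of the metrics listed above can be sketched in a few lines of Python. This is an illustrative example only: the function names, the `(price, amount)` tuple layout and the sample book are assumptions, and the imbalance formula shown is one common definition among several.

```python
def l2_metrics(bids, asks, depth=5):
    """Compute basic metrics from one L2 snapshot.

    bids and asks are lists of (price, amount) tuples, best price first.
    """
    best_bid, best_ask = bids[0][0], asks[0][0]
    bid_volume = sum(amount for _, amount in bids[:depth])
    ask_volume = sum(amount for _, amount in asks[:depth])
    return {
        "mid": (best_bid + best_ask) / 2,
        "spread": best_ask - best_bid,
        # +1 means only bids within `depth` levels, -1 means only asks.
        "imbalance": (bid_volume - ask_volume) / (bid_volume + ask_volume),
    }


def average_execution_price(levels, quantity):
    """Average price paid by a market order that walks the given book side."""
    remaining = quantity
    cost = 0.0
    for price, amount in levels:
        take = min(remaining, amount)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("not enough liquidity in the provided levels")
    return cost / quantity


# Hypothetical sample book (not real market data).
bids = [(100.0, 3.0), (99.5, 5.0)]
asks = [(100.5, 1.0), (101.0, 4.0)]

metrics = l2_metrics(bids, asks)
buy_price = average_execution_price(asks, 2.0)
```

Averaging `spread` or `buy_price` over many snapshots gives the average spread and average execution cost for a given order size.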

What can L3 order book data be used for?

Level 3 order book data (all individual orders) can be used to analyse, among other things:

  • order resting time

  • order fill probability

  • order queue dynamics

Historical L3 data is currently available for Coinbase Pro, CoinFLEX and Bitstamp.
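
As a rough illustration of the first item, order resting time can be measured by pairing the event that places an order on the book with the event that removes it. The event schema below (`type`, `order_id`, `timestamp`, with `open` and `done` types) is a simplified, hypothetical one; real exchange-native L3 messages differ per exchange.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a UTC ISO 8601 timestamp such as 2019-10-23T10:32:49.669Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def resting_times(events):
    """Return {order_id: seconds the order rested on the book}."""
    opened = {}
    result = {}
    for event in events:
        order_id = event["order_id"]
        ts = parse_ts(event["timestamp"])
        if event["type"] == "open":
            opened[order_id] = ts
        elif event["type"] == "done" and order_id in opened:
            result[order_id] = (ts - opened.pop(order_id)).total_seconds()
    return result


# Hypothetical L3 events in chronological order (not real market data).
events = [
    {"type": "open", "order_id": "a", "timestamp": "2019-10-23T10:00:00.000Z"},
    {"type": "open", "order_id": "b", "timestamp": "2019-10-23T10:00:00.250Z"},
    {"type": "done", "order_id": "b", "timestamp": "2019-10-23T10:00:00.750Z"},
    {"type": "done", "order_id": "a", "timestamp": "2019-10-23T10:00:01.500Z"},
]
resting = resting_times(events)
```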

Do you provide historical options data?

Yes, we provide historical options data for Deribit and OKEx Options.

Do you provide historical futures data?

We specialize in market data from derivatives exchanges and cover all top venues that trade futures contracts: BitMEX, Deribit, Binance Futures, Huobi DM and more.

Can you record market data for an exchange that's not currently supported?

Yes, we're always open to supporting promising new exchanges. Contact us and we'll get back to you as soon as possible to discuss details.

Do you provide market data in a normalized format?

Normalized market data (unified data format for every exchange) is available via our official libraries and downloadable CSV data files. Our HTTP API provides data only in exchange-native format.

What is the difference between exchange-native and normalized data formats?

Cryptocurrency markets are very fragmented, and every exchange provides data in its own bespoke format, which we call the exchange-native data format. Our HTTP API provides market data in this format.

For example, a BitMEX trade message looks like this:

{"table":"trade","action":"insert","data":[{"timestamp":"2019-06-01T00:03:11.589Z","symbol":"ETHUSD","side":"Sell","size":10,"price":268.7,"tickDirection":"ZeroMinusTick","trdMatchID":"ebc230d9-0b6e-2d5d-f99a-f90109a2b113","grossValue":268700,"homeNotional":0.08555051758063137,"foreignNotional":22.987424073915648}]}

and a Deribit trade message looks like this:

{"jsonrpc":"2.0","method":"subscription","params":{"channel":"trades.ETH-26JUN20.raw","data":[{"trade_seq":18052,"trade_id":"ETH-10813935","timestamp":1577836825724,"tick_direction":0,"price":132.65,"instrument_name":"ETH-26JUN20","index_price":128.6,"direction":"buy","amount":1.0}]}}

In contrast, the normalized data format is the same, unified format across all exchanges. We provide normalized data via our client libs (data normalization is performed client-side) as well as via downloadable CSV data files.

Sample normalized trade message:

{
  "type": "trade",
  "symbol": "XBTUSD",
  "exchange": "bitmex",
  "id": "282a0445-0e3a-abeb-f403-11003204ea1b",
  "price": 7996,
  "amount": 50,
  "side": "sell",
  "timestamp": "2019-10-23T10:32:49.669Z",
  "localTimestamp": "2019-10-23T10:32:49.740Z"
}
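
The mapping from the exchange-native samples above to the normalized trade format can be sketched in Python as follows. This is a hedged illustration, not the client libs' actual code: the function names and the field mapping are assumptions inferred from the samples, the input messages are abridged copies of those samples, and the `local_timestamp` value is made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_iso(ms: int) -> str:
    """Convert epoch milliseconds to an ISO 8601 UTC string."""
    dt = EPOCH + timedelta(milliseconds=ms)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def normalize_bitmex_trades(message: dict, local_timestamp: str) -> list:
    # Assumed mapping: trdMatchID -> id, size -> amount, "Sell" -> "sell".
    return [
        {
            "type": "trade",
            "symbol": t["symbol"],
            "exchange": "bitmex",
            "id": t["trdMatchID"],
            "price": t["price"],
            "amount": t["size"],
            "side": t["side"].lower(),
            "timestamp": t["timestamp"],
            "localTimestamp": local_timestamp,
        }
        for t in message["data"]
    ]


def normalize_deribit_trades(message: dict, local_timestamp: str) -> list:
    # Assumed mapping: trade_id -> id, direction -> side, ms epoch -> ISO 8601.
    return [
        {
            "type": "trade",
            "symbol": t["instrument_name"],
            "exchange": "deribit",
            "id": t["trade_id"],
            "price": t["price"],
            "amount": t["amount"],
            "side": t["direction"],
            "timestamp": ms_to_iso(t["timestamp"]),
            "localTimestamp": local_timestamp,
        }
        for t in message["params"]["data"]
    ]


# Abridged copies of the exchange-native samples shown earlier.
bitmex_message = {"table": "trade", "action": "insert", "data": [{
    "timestamp": "2019-06-01T00:03:11.589Z", "symbol": "ETHUSD",
    "side": "Sell", "size": 10, "price": 268.7,
    "trdMatchID": "ebc230d9-0b6e-2d5d-f99a-f90109a2b113"}]}
deribit_message = {"params": {"channel": "trades.ETH-26JUN20.raw", "data": [{
    "trade_id": "ETH-10813935", "timestamp": 1577836825724, "price": 132.65,
    "instrument_name": "ETH-26JUN20", "direction": "buy", "amount": 1.0}]}}

# The arrival time below is hypothetical.
bitmex_trades = normalize_bitmex_trades(bitmex_message, "2019-06-01T00:03:11.650Z")
deribit_trades = normalize_deribit_trades(deribit_message, "2020-01-01T00:00:25.800Z")
```

After normalization both trades share the same keys, so downstream code does not need exchange-specific branches.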

What time zone is used in the data?

UTC.

How is historical raw market data sourced?

Raw market data is sourced from WebSocket real-time APIs provided by the exchanges. See details.

Is the provided raw market data complete?

We do our best to provide the most complete and reliable historical raw data API on the market. To that end, among many other things, we run a highly available Kubernetes cluster on Google Cloud Platform, which offers best-in-class availability, networking and monitoring. However, exchanges' API downtime (maintenance, deployments etc.) can cause market data gaps, so we cannot guarantee 100% data completeness. We may also fail to record data when an exchange's API changes without notice or when we hit new, unexpected rate limits. This happens very rarely and is specific to each exchange. Use the /exchanges/:exchange API endpoint and check the incidents field for the most detailed and up-to-date information.

See the Live Status Dashboard for the current state of market data collection.
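
Checking the incidents field against a time range of interest could look roughly like the Python sketch below. It works on an already-parsed JSON response and makes no network call. The shape of each incident (`from`, `to` and `details` keys with ISO 8601 timestamps) is an assumption for illustration, as are the function name and the sample data; consult the API reference for the actual schema.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a UTC ISO 8601 timestamp such as 2019-06-01T10:00:00.000Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def incidents_overlapping(exchange_details: dict, start: datetime, end: datetime) -> list:
    """Return incidents whose [from, to) interval overlaps [start, end)."""
    hits = []
    for incident in exchange_details.get("incidents", []):
        # Assumed keys: "from" and "to" (hypothetical schema).
        if parse_ts(incident["from"]) < end and parse_ts(incident["to"]) > start:
            hits.append(incident)
    return hits


# Hypothetical parsed response of /exchanges/:exchange (not real incident data).
exchange_details = {
    "id": "bitmex",
    "incidents": [
        {"from": "2019-06-01T10:00:00.000Z", "to": "2019-06-01T10:30:00.000Z",
         "details": "hypothetical outage"},
        {"from": "2019-07-15T00:00:00.000Z", "to": "2019-07-15T02:00:00.000Z",
         "details": "hypothetical maintenance"},
    ],
}

day_start = datetime(2019, 6, 1, tzinfo=timezone.utc)
day_end = datetime(2019, 6, 2, tzinfo=timezone.utc)
affected = incidents_overlapping(exchange_details, day_start, day_end)
```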

What kind of protocols are used for data collection from exchanges?

We use the WebSocket protocol for real-time data collection and occasionally HTTP REST APIs to fetch initial full order book snapshots for exchanges that do not provide them via WebSocket.

How are order book snapshots provided?

Historical market data available via the HTTP API includes order book snapshots at the beginning of each day (00:00 UTC) - see details.

We also provide custom order book snapshots at configurable time intervals, from tick-by-tick and milliseconds to minutes or hours, via CSV downloads and client libs. With the client libs, custom snapshots are computed client-side from raw data provided via the HTTP API.

Do you collect order books as snapshots or in streaming mode?

Order books are collected in streaming mode: a snapshot at the beginning of each day, followed by incremental delta updates. See details.

We also provide custom order book snapshots at configurable time intervals, from tick-by-tick and milliseconds to minutes or hours, via CSV downloads and client libs. With the client libs, custom snapshots are computed client-side from raw data provided via the HTTP API.
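
To make the idea concrete, the Python sketch below rebuilds a book from an initial snapshot plus delta updates and emits a snapshot at a fixed interval. It is an illustration under stated assumptions, not the client libs' implementation: the message shape (`ts` in milliseconds, `is_snapshot`, `bids`, `asks`) is hypothetical, and an amount of 0 is assumed to mean that the price level is removed.

```python
def apply_levels(side: dict, levels) -> None:
    """Apply (price, amount) updates to one side of the book."""
    for price, amount in levels:
        if amount == 0:
            side.pop(price, None)  # assumed convention: 0 removes the level
        else:
            side[price] = amount


def snapshots_at_interval(messages, interval_ms: int):
    """Yield the book state at every interval boundary.

    Each yielded snapshot reflects all messages with ts strictly before
    the boundary. No snapshot is emitted after the last message.
    """
    bids, asks = {}, {}
    next_emit = None
    for msg in messages:
        while next_emit is not None and msg["ts"] >= next_emit:
            yield {
                "ts": next_emit,
                "bids": sorted(bids.items(), reverse=True),  # best bid first
                "asks": sorted(asks.items()),                # best ask first
            }
            next_emit += interval_ms
        if msg["is_snapshot"]:
            # A full snapshot replaces whatever state was held before.
            bids.clear()
            asks.clear()
        apply_levels(bids, msg["bids"])
        apply_levels(asks, msg["asks"])
        if next_emit is None:
            next_emit = msg["ts"] + interval_ms


# Hypothetical message stream (not real market data).
messages = [
    {"ts": 0, "is_snapshot": True,
     "bids": [(100.0, 2.0), (99.5, 1.0)], "asks": [(100.5, 1.0)]},
    {"ts": 400, "is_snapshot": False,
     "bids": [(100.0, 0)], "asks": [(100.5, 3.0)]},
    {"ts": 1200, "is_snapshot": False, "bids": [(99.0, 4.0)], "asks": []},
    {"ts": 2100, "is_snapshot": False, "bids": [], "asks": [(101.0, 1.0)]},
]
snapshots = list(snapshots_at_interval(messages, interval_ms=1000))
```

Because the state is rebuilt from the raw stream, the interval can be changed freely without re-downloading data.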

How are market data messages timestamped?

Each message received via an exchange's WebSocket API is timestamped at arrival time with 100ns precision using a single clock source and stored in ISO 8601 format.
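
Python's `datetime` keeps only microseconds, so a timestamp with 100ns precision (seven fractional digits) loses its last digit if parsed directly into a `datetime`. One way to keep full precision is to convert to integer nanoseconds, as in the sketch below. The helper name is hypothetical; the first two timestamps come from the normalized trade sample above and the seven-digit value is made up for illustration.

```python
import re
from datetime import datetime, timezone

_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z$")


def iso_to_ns(value: str) -> int:
    """Convert a UTC ISO 8601 timestamp to integer nanoseconds since epoch."""
    match = _TS.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    seconds_part, fraction = match.groups()
    dt = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S")
    seconds = int(dt.replace(tzinfo=timezone.utc).timestamp())
    # Right-pad the fraction to 9 digits so it reads as nanoseconds.
    nanos = int((fraction or "").ljust(9, "0"))
    return seconds * 1_000_000_000 + nanos


# Exchange timestamp vs. arrival timestamp from the normalized trade sample.
latency_ns = iso_to_ns("2019-10-23T10:32:49.740Z") - iso_to_ns("2019-10-23T10:32:49.669Z")

# Hypothetical arrival timestamp with 100ns resolution.
extra_ns = iso_to_ns("2019-10-23T10:32:49.7401234Z") - iso_to_ns("2019-10-23T10:32:49.740Z")
```

Here `latency_ns` corresponds to 71 milliseconds between the exchange timestamp and the local arrival timestamp.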

What is the delay of new historical market data relative to real-time?

4 minutes (T - 4min).