You'll find here per-exchange details about:
historical data availability date ranges — since when the historical data has been collected and is available
captured real-time market data channels (also called streams, subscription topics, tables etc. in exchanges' docs) - available historical raw market data is sourced from the WebSocket real-time APIs provided by the exchanges and can be filtered by channel, e.g. to get historical trades for BitMEX, the trade channel needs to be provided alongside the requested instrument symbols (via HTTP API or client libs function args).
symbols of recorded instruments/currency pairs
incidents - periods for which data is missing for a given exchange due to internal errors
Some exchanges encode the requested symbol in the channel name, e.g. the Deribit trades.BTC-PERPETUAL.100ms channel. This is not the case with our API, as we always treat channel name and symbol as separate inputs. In the Deribit example, the channel name would be trades and the symbol BTC-PERPETUAL. If a channel offers a choice of update frequency (e.g. 100ms vs raw tick by tick), the higher-frequency one is always chosen and recorded.
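The channel/symbol separation described above can be sketched in a few lines. This is only an illustration: the helper name split_exchange_channel is hypothetical and not part of any API, and it assumes a Deribit-style name of the form channel.symbol[.interval].

```python
def split_exchange_channel(native_channel: str) -> tuple[str, str]:
    """Split a Deribit-style native channel name into (channel, symbol).

    Hypothetical helper: assumes the 'channel.symbol[.interval]' layout
    shown in the Deribit example; other exchanges use other layouts.
    """
    parts = native_channel.split(".")
    if len(parts) < 2:
        raise ValueError(f"no symbol encoded in channel name: {native_channel!r}")
    # The optional third part (e.g. '100ms') is the update interval and is dropped.
    return parts[0], parts[1]


channel, symbol = split_exchange_channel("trades.BTC-PERPETUAL.100ms")
print(channel, symbol)
```

The two returned values correspond to the separate channel and symbol inputs mentioned above.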
All market data collection is performed on highly available Google Cloud Platform Kubernetes clusters in London, UK (europe-west2 region) or Tokyo, Japan (asia-northeast1 region). The data center location used for a particular exchange is listed on that exchange's historical data details page.
When an exchange provides a choice of real-time data frequency for specific data types (e.g. order book data), the most granular, non-aggregated data feed is always collected.
Whether a single or multiple WebSocket connections are used to record the full real-time data feed is decided case by case, taking into account exchange API limits and latency, which may be higher or lower when a single connection is used. The strategy used for a particular exchange is described on that exchange's historical data details page.
Each WebSocket connection is always restarted at 00:00 UTC (every 24 hours) in order to receive initial order book snapshots.
Each received message is timestamped at arrival time with 100ns precision using a synchronized clock, and the timestamp is stored in ISO 8601 format.
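Python's datetime only resolves microseconds, so a 100ns-precision timestamp loses its last digit if parsed naively. The sketch below converts such a timestamp to integer nanoseconds instead. The exact string layout (seven fractional digits and a trailing Z) is an assumption made for illustration, and iso_to_epoch_ns is a hypothetical helper name.

```python
from datetime import datetime, timezone


def iso_to_epoch_ns(timestamp: str) -> int:
    """Convert a UTC ISO 8601 timestamp to integer nanoseconds since the Unix epoch.

    Assumed input layout (not confirmed by the text above):
    'YYYY-MM-DDTHH:MM:SS.fffffffZ' with up to nine fractional digits.
    """
    if not timestamp.endswith("Z"):
        raise ValueError("expected a UTC timestamp ending in 'Z'")
    seconds_part, _, fraction = timestamp[:-1].partition(".")
    whole = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    # Right-pad the fraction to nine digits so '1234567' becomes 123456700 ns.
    nanos = int(fraction.ljust(9, "0")[:9]) if fraction else 0
    return int(whole.timestamp()) * 1_000_000_000 + nanos


print(iso_to_epoch_ns("1970-01-01T00:00:01.0000001Z"))
```

Keeping the value as an integer avoids the rounding that a float or a datetime object would introduce at this precision.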
Messages provided by exchanges' WebSocket feeds are stored without any modifications.
Collected historical market data is stored in parallel in two separate geo-distributed storage services.
Checks for new instruments available on a given exchange are performed every minute.
Market data collection services are constantly monitored, both manually and via automated tools (monitoring, alert notifications), and have built-in self-healing capabilities. We also constantly monitor for upcoming exchange API changes and adapt to them beforehand.
There are multiple built-in checks that detect whether the connection to an exchange is healthy during the data collection process, such as:
validating subscription responses - if an exchange does not confirm subscriptions within 20 seconds, the connection is restarted
order book sequence number validation for exchanges that provide them
validating JSON format, as in some unusual circumstances exchanges return data that is invalid JSON
stale connection detection - if no responses are received within a certain period (adjusted per exchange), the connection is most likely stale and gets automatically restarted
detection of an unusually small number of messages received from an exchange in a given time period, which likely means the connection is not healthy, e.g. receiving only 'pings' without data messages
and many more ...
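A few of the checks listed above can be sketched as a small state tracker. This is an illustrative sketch, not the actual collector implementation: the class and field names are hypothetical, only the 20-second subscription timeout comes from the text, and the rule that sequence numbers increase by exactly one is an assumption that does not hold for every exchange.

```python
from dataclasses import dataclass
from typing import Optional

SUBSCRIPTION_TIMEOUT_S = 20.0  # subscription confirmation deadline from the text


@dataclass
class ConnectionHealth:
    connected_at: float          # time the connection was opened, in seconds
    stale_after_s: float         # hypothetical per-exchange staleness threshold
    subscribed: bool = False
    last_message_at: Optional[float] = None
    last_sequence: Optional[int] = None

    def on_subscription_confirmed(self) -> None:
        self.subscribed = True

    def on_message(self, now: float, sequence: Optional[int] = None) -> bool:
        """Record a message; return False if a sequence gap is detected."""
        self.last_message_at = now
        if sequence is None:
            return True
        # Assumption: consecutive messages carry consecutive sequence numbers.
        in_order = self.last_sequence is None or sequence == self.last_sequence + 1
        self.last_sequence = sequence
        return in_order

    def needs_restart(self, now: float) -> bool:
        """True if the subscription timed out or the connection looks stale."""
        if not self.subscribed and now - self.connected_at > SUBSCRIPTION_TIMEOUT_S:
            return True
        reference = (
            self.last_message_at if self.last_message_at is not None else self.connected_at
        )
        return now - reference > self.stale_after_s


health = ConnectionHealth(connected_at=0.0, stale_after_s=30.0)
print(health.needs_restart(now=21.0))  # unconfirmed subscription after 21 s
```

Passing the current time in explicitly, rather than reading a clock inside the class, keeps the checks deterministic and easy to test.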
Any incident caused by us (bugs, network errors etc.) is logged and available via API.
New market data is available with a 4-minute delay relative to real-time (T - 4min).
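As a minimal sketch of what the T - 4min rule means for a client, the snippet below computes the most recent moment for which data should already be available. The function names are hypothetical and only the 4-minute figure comes from the text.

```python
from datetime import datetime, timedelta, timezone

PUBLISH_DELAY = timedelta(minutes=4)  # the T - 4min delay stated above


def latest_available(now: datetime) -> datetime:
    """Most recent timestamp for which data should already be available."""
    return now - PUBLISH_DELAY


def is_available(requested_to: datetime, now: datetime) -> bool:
    """True if a request ending at requested_to falls within the available range."""
    return requested_to <= latest_available(now)


now = datetime(2020, 3, 5, 12, 0, tzinfo=timezone.utc)
print(latest_available(now).isoformat())
```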
Historical market data available via HTTP API provides order book snapshots at the beginning of each day (00:00 UTC) and every time the WebSocket connection was closed while recording the real-time data feed (the connection is restarted and a new snapshot is provided via the fresh connection). This means that, to be sure of receiving initial order book snapshots, one must replay historical data from 00:00 UTC of the given day. It also means that there is a tiny gap in historical data (in the 300-3000ms range, depending on exchange) while re-subscribing to the real-time WebSocket feed (every 24 hours) in order to receive order book snapshots.
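To reconstruct an order book at an arbitrary moment, a client can start replaying at 00:00 UTC of that day so the initial snapshot is included. A minimal sketch of that rounding step follows; replay_start_for is a hypothetical helper name, not part of any client library.

```python
from datetime import datetime, timedelta, timezone


def replay_start_for(target: datetime) -> datetime:
    """Return 00:00 UTC of the day that contains the target moment."""
    if target.tzinfo is None:
        raise ValueError("target must be timezone-aware")
    as_utc = target.astimezone(timezone.utc)
    # Truncate to midnight UTC, where the daily snapshot is guaranteed.
    return as_utc.replace(hour=0, minute=0, second=0, microsecond=0)


target = datetime(2020, 3, 5, 14, 30, tzinfo=timezone.utc)
print(replay_start_for(target).isoformat())
```

Converting to UTC before truncating matters when the target time is given in another time zone, since the local date and the UTC date can differ.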
Some exchanges do not provide initial order book snapshots when subscribing to WebSocket real-time feeds (like Binance, Bitstamp or Coinbase Pro full order book), hence for those there is a 'generated' snapshot available instead (based on REST API call) - details are specific for each exchange and can be found below.