Historical Data Details

Historical market data details for each supported exchange — available symbols, channels, date ranges...

This page describes the data collection specifics for each supported exchange and what is available via the Tardis.dev HTTP API. If you'd rather work with normalized market data (the same data format for every exchange), see the official libraries and CSV data exports.

Here you'll find per-exchange details about:

  • historical data availability date ranges - the date from which historical data has been collected and is available

  • recorded market data channels (also called streams, subscription topics, tables etc. in exchanges' docs) - the available raw historical market data is sourced from the real-time WebSocket APIs provided by the exchanges and can be filtered by channel. For example, to get historical trades for BitMEX, the trade channel must be provided alongside the requested instrument symbols (via the HTTP API or client library function arguments).

  • symbols of recorded instruments/currency pairs

  • incidents - periods during which data is missing for a given exchange due to internal errors
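As a rough illustration of filtering by channel and symbols, the sketch below builds a request URL for BitMEX trades. The base URL, the `from`, `offset` and `filters` parameter names, and the `build_data_feed_url` helper are assumptions made for this example and should be checked against the HTTP API reference. The date and the `XBTUSD` symbol are illustrative. No request is sent.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint and parameter names; verify against the HTTP API reference.
BASE_URL = "https://api.tardis.dev/v1/data-feeds"


def build_data_feed_url(exchange, from_date, channel, symbols, offset=0):
    """Build a URL that asks for one channel and a list of symbols."""
    # Channel and symbols are always separate inputs.
    filters = [{"channel": channel, "symbols": list(symbols)}]
    query = urlencode({
        "from": from_date,
        "offset": offset,
        "filters": json.dumps(filters, separators=(",", ":")),
    })
    return f"{BASE_URL}/{exchange}?{query}"


url = build_data_feed_url("bitmex", "2019-07-01", "trade", ["XBTUSD"])

# Decode the query string again to show what the server would receive.
query_params = parse_qs(urlsplit(url).query)
decoded_filters = json.loads(query_params["filters"][0])
print(url)
print(decoded_filters)
```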

Some exchanges encode the requested symbol in the channel name, e.g. Deribit's trades.BTC-PERPETUAL.100ms channel. This is not the case with our API, as we always treat channel name and symbol as separate inputs. In the Deribit example, the channel name would be trades and the symbol BTC-PERPETUAL. If a channel offers a choice of update frequency (e.g. 100ms vs raw tick-by-tick), the higher-frequency one is always chosen and recorded.

See the Live Status Dashboard for the current state of market data collection.
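The split between channel name and symbol can be sketched as follows. `split_deribit_channel` is a hypothetical helper written for this example, not part of any client library, and it only handles the dot-separated `channel.symbol[.interval]` shape shown above.

```python
from typing import NamedTuple, Optional


class Subscription(NamedTuple):
    channel: str             # e.g. "trades"
    symbol: str              # e.g. "BTC-PERPETUAL"
    interval: Optional[str]  # e.g. "100ms", or None when absent


def split_deribit_channel(native_name: str) -> Subscription:
    """Split an exchange-native 'channel.symbol[.interval]' name into parts."""
    parts = native_name.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected 'channel.symbol', got {native_name!r}")
    # Anything after the symbol is kept together as the interval part.
    interval = ".".join(parts[2:]) or None
    return Subscription(parts[0], parts[1], interval)


sub = split_deribit_channel("trades.BTC-PERPETUAL.100ms")
print(sub.channel, sub.symbol)
```

Only the channel and symbol would be passed to the API. The interval part is not an input, since the higher-frequency feed is the one that is recorded.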

Market data collection overview

  • All market data collection is performed on a highly available Google Cloud Platform Kubernetes cluster in the London region (europe-west2).

  • When an exchange provides a choice of real-time data frequency for specific data types (e.g. order book data), the most granular, non-aggregated data feed is always collected.

  • A single WebSocket connection is used to subscribe to the full real-time data feed of a given exchange and is the source of the collected historical data. We do our best to maintain a stable connection to exchanges' WebSocket APIs when recording the data, but disconnections can happen: on average 1-2 disconnects per day per exchange. See the Live Status Dashboard for more details.

  • Each message received via an exchange's WebSocket API is timestamped at arrival time with 100ns precision using a single clock source and stored in ISO 8601 format.

  • Messages provided by exchanges' WebSocket feeds are stored without any modifications.

  • All exchanges' data recorders use the same synchronized clock, ensuring reliable local timestamps.

  • Collected historical market data is stored in parallel with two separate geo-distributed storage providers.

  • Market data collection services are constantly monitored both manually and via automated tools (StackDriver monitoring, alert notifications etc.) and have built-in self-healing capabilities. We also constantly monitor for upcoming exchange API changes and adapt to them beforehand.

  • Multiple built-in checks detect whether the connection to an exchange is healthy during data collection, such as:

    • validating subscription responses - if an exchange does not confirm subscriptions within 20 seconds, the connection is restarted

    • validating order book sequence numbers for exchanges that provide them

    • validating JSON format, as in some unusual circumstances exchanges return data that is not valid JSON

    • stale connection detection - if no responses are received within a certain period (adjusted per exchange), the connection is most likely stale and is automatically restarted

    • detecting an unusually small number of messages received from an exchange in a given time period, which likely means the connection is not healthy, e.g. receiving only 'pings' without data messages

    • and many more ...

  • Any incident caused by us (bugs, network errors etc.) is logged and available via the API.

  • New market data is available with a 4-minute delay relative to real-time (T - 4min).
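A few of the connection health checks listed above (JSON validation, sequence number validation and stale connection detection) could look roughly like the sketch below. It is a simplified illustration, not the actual recorder code: the `ConnectionHealth` class, the `seq` field name and the 30-second threshold are all hypothetical.

```python
import json
import time
from typing import Optional


class ConnectionHealth:
    """Simplified health checks for a single WebSocket connection."""

    def __init__(self, stale_after_s: float, now: Optional[float] = None):
        self.stale_after_s = stale_after_s
        self.last_message_at = time.monotonic() if now is None else now
        self.last_seq: Optional[int] = None

    def on_message(self, raw: str, now: Optional[float] = None) -> Optional[str]:
        """Return a reason to restart the connection, or None if all is fine."""
        self.last_message_at = time.monotonic() if now is None else now
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            return "invalid JSON"
        # "seq" is a hypothetical field name; real field names differ per exchange.
        seq = message.get("seq") if isinstance(message, dict) else None
        if seq is not None:
            if self.last_seq is not None and seq != self.last_seq + 1:
                return f"sequence gap: expected {self.last_seq + 1}, got {seq}"
            self.last_seq = seq
        return None

    def is_stale(self, now: Optional[float] = None) -> bool:
        """True when no message has arrived within the allowed period."""
        now = time.monotonic() if now is None else now
        return now - self.last_message_at > self.stale_after_s


health = ConnectionHealth(stale_after_s=30, now=0.0)
print(health.on_message('{"seq": 1}', now=1.0))
print(health.on_message('{"seq": 3}', now=2.0))
```

The optional `now` argument only exists so the timing logic can be exercised without waiting. A real recorder would rely on the monotonic clock.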

Collected order book data details

Historical market data available via the HTTP API provides order book snapshots at the beginning of each day (00:00 UTC) and every time the WebSocket connection was closed while recording the real-time data feed (the connection is restarted and a new snapshot is provided via the fresh connection). This means that, to be sure of receiving the initial order book snapshots, one must replay historical data from 00:00 UTC of a given day. It also means that there is a tiny gap in historical data (around 50-300ms, depending on the exchange) while re-subscribing to the real-time WebSocket feed (every 24 hours) in order to receive order book snapshots.
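One practical consequence is that a replay aimed at some point in the middle of a day should start at 00:00 UTC of that day, so that the initial snapshot is included. A minimal sketch of that rounding, using a hypothetical `replay_start_for` helper:

```python
from datetime import datetime, timedelta, timezone


def replay_start_for(ts: datetime) -> datetime:
    """Return 00:00 UTC of the day containing `ts`."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC first, so local offsets cannot shift the day boundary.
    ts_utc = ts.astimezone(timezone.utc)
    return ts_utc.replace(hour=0, minute=0, second=0, microsecond=0)


target = datetime(2020, 3, 1, 14, 30, tzinfo=timezone.utc)
print(replay_start_for(target).isoformat())

# 01:00 at UTC+2 is still the previous day in UTC.
local = datetime(2020, 3, 1, 1, 0, tzinfo=timezone(timedelta(hours=2)))
print(replay_start_for(local).isoformat())
```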

Order book data for each exchange is provided in exactly the same format as the exchange's real-time WebSocket feed messages (exchange-native format), meaning that there is a full order book snapshot just after the connection is successfully established, followed by incremental order book updates.

Some exchanges do not provide initial order book snapshots when subscribing to their real-time WebSocket feeds (e.g. Binance, Bitstamp or the Coinbase Pro full order book), so for those a 'generated' snapshot (based on a REST API call) is available instead. Details are specific to each exchange and can be found below.
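The snapshot-then-updates pattern can be sketched as below. Because the data is exchange-native, every exchange uses its own message shape. The `type`, `bids` and `asks` fields and the rule that an amount of 0 removes a price level are assumptions made for this illustration, and real messages would first need to be mapped to this shape.

```python
def apply_message(book, message):
    """Apply one snapshot or incremental update to `book` in place."""
    if message["type"] == "snapshot":
        # A snapshot replaces whatever state was held before.
        book["bids"].clear()
        book["asks"].clear()
    for side in ("bids", "asks"):
        for price, amount in message.get(side, []):
            if amount == 0:
                book[side].pop(price, None)  # assumed: 0 removes the level
            else:
                book[side][price] = amount


book = {"bids": {}, "asks": {}}
messages = [
    {"type": "snapshot", "bids": [(100.0, 2.0), (99.5, 1.0)], "asks": [(100.5, 3.0)]},
    {"type": "update", "bids": [(99.5, 0), (99.0, 4.0)], "asks": [(100.5, 1.5)]},
]
for message in messages:
    apply_message(book, message)

best_bid = max(book["bids"])
best_ask = min(book["asks"])
print(best_bid, best_ask)
```

Replaying from a snapshot matters because applying updates to an empty or outdated book would yield an incorrect state.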

Per-exchange historical data details