Historical Data Details

Historical market data details for each supported exchange — available symbols, channels, date ranges...

This page describes data collection specifics for each supported exchange and what is available via the Tardis.dev HTTP API. If you'd like to work with normalized market data, see the official libraries and downloadable CSV files.

Here you'll find per-exchange details about:

  • historical data availability date ranges: the date from which historical data has been collected and is available

  • captured real-time market data channels (also called streams, subscription topics, tables, etc. in exchanges' docs). Available historical raw market data is sourced from the WebSocket real-time APIs provided by the exchanges and can be filtered by channel. For example, to get historical trades for BitMEX, the channel trade must be provided alongside the requested instrument symbols (via the HTTP API or client library function arguments).

  • symbols of recorded instruments/currency pairs

  • incidents: periods during which data is missing for a given exchange due to internal errors
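
As an illustration of filtering by channel and symbols, here is a minimal sketch that builds a request URL for BitMEX trades. The base URL, the parameter names (from, offset, filters), the JSON encoding of the filters, the XBTUSD symbol and the date are all assumptions made for this example and are not taken from this page; check the HTTP API reference for the actual request format.

```python
import json
from urllib.parse import urlencode

# Assumed base URL and parameter names; verify against the HTTP API reference.
BASE_URL = "https://api.tardis.dev/v1/data-feeds"


def build_data_feed_url(exchange, from_date, channel, symbols, offset=0):
    """Build a request URL that filters raw historical data by channel and symbols."""
    # Channel and symbols are separate inputs, one filter object per channel.
    filters = [{"channel": channel, "symbols": list(symbols)}]
    query = urlencode({
        "from": from_date,
        "offset": offset,
        # Assumption: filters are passed as a JSON-encoded array.
        "filters": json.dumps(filters, separators=(",", ":")),
    })
    return f"{BASE_URL}/{exchange}?{query}"


# Illustrative values only: XBTUSD and the date are not taken from this page.
url = build_data_feed_url("bitmex", "2019-07-01", "trade", ["XBTUSD"])
print(url)
```

The point of the sketch is the shape of the filter: the channel name (trade) and the list of symbols travel as separate fields rather than as one combined topic string.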

Some exchanges encode the requested symbol in the channel name, e.g. Deribit's trades.BTC-PERPETUAL.100ms channel. This is not the case with our API, as we always treat channel name and symbol as separate inputs. In the Deribit example, the channel name would be trades and the symbol BTC-PERPETUAL. If a channel offers a choice of update frequency (e.g. 100ms vs raw tick-by-tick), the higher-frequency one is always chosen and recorded.
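
The mapping from an exchange-native channel name to separate channel and symbol inputs can be sketched as follows. The helper split_deribit_channel and the ChannelFilter type are hypothetical names introduced for this example, and the code assumes exactly the three-part, dot-separated form shown above.

```python
from typing import NamedTuple


class ChannelFilter(NamedTuple):
    """Channel name and symbol as two separate inputs."""
    channel: str
    symbol: str


def split_deribit_channel(native_name: str) -> ChannelFilter:
    """Split a Deribit-style '<channel>.<symbol>.<interval>' name.

    Hypothetical helper: only handles the three-part form used in the
    example above and rejects anything else.
    """
    parts = native_name.split(".")
    if len(parts) != 3:
        raise ValueError(
            f"expected '<channel>.<symbol>.<interval>', got {native_name!r}"
        )
    channel, symbol, _interval = parts
    # The interval is dropped: it is not part of the channel or the symbol.
    return ChannelFilter(channel=channel, symbol=symbol)


result = split_deribit_channel("trades.BTC-PERPETUAL.100ms")
print(result)  # ChannelFilter(channel='trades', symbol='BTC-PERPETUAL')
```

The interval suffix is discarded on purpose, since the update frequency is not something the caller selects.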

Market data collection overview

  • All market data collection is performed on one of the highly available Google Cloud Platform Kubernetes clusters, located in London, UK (europe-west2 region) or Tokyo, Japan (asia-northeast1 region). The data center location used for a particular exchange is listed on that exchange's historical data details page.

  • When an exchange provides a choice of real-time data frequency for specific data types (e.g. order book data), the most granular, non-aggregated data feed is always collected.

  • Whether a single or multiple WebSocket connections are used to record the full real-time data feed is decided case by case. We take into account exchange API limits and latency, which may be higher or lower when a single connection is used. The strategy used for a particular exchange is described on that exchange's historical data details page.

  • The WebSocket connection is always restarted at 00:00 UTC (every 24 hours) in order to receive initial order book snapshots. Some exchanges (e.g. Binance) also require a connection restart every 24 hours.

  • Each received message is timestamped at arrival time with 100ns precision using a synchronized clock, and the timestamp is stored in ISO 8601 format.

  • Messages provided by exchanges' WebSocket feeds are stored without any modifications.

  • Checks for new instruments available on a given exchange are performed every minute.

  • Market data collection services are constantly monitored, both manually and via automated tools (monitoring, alert notifications), and have built-in self-healing capabilities. We also constantly monitor for upcoming changes to exchanges' APIs and adapt to them beforehand.

  • There are multiple built-in checks that detect whether the connection to an exchange is healthy during data collection, such as:

    • validating subscription responses: if an exchange does not confirm subscriptions within 20 seconds, the connection is restarted

    • validating order book sequence numbers for exchanges that provide them

    • validating JSON format, as in some unusual circumstances exchanges return data that is not valid JSON

    • stale connection detection: if no responses are received within a certain period (adjusted per exchange), the connection is most likely stale and is automatically restarted

    • detecting an unusually small number of messages received from an exchange in a given time period, which likely means the connection is not healthy, e.g. receiving only 'pings' without data messages

    • and many more ...

  • Any incident caused by us (bugs, network errors, etc.) is logged and available via the API.

  • New market data is available with a 4-minute delay relative to real time (T - 4min).
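
Two details from the list above matter when consuming the data: arrival timestamps carry 100ns precision, which is finer than many date-time types can hold, and the newest data lags real time by 4 minutes. The sketch below shows one way to handle both in Python. The exact timestamp layout (a UTC value ending in Z with up to 9 fractional digits) is an assumption, and parse_timestamp_ns and latest_available are hypothetical helper names, not part of any official client.

```python
from datetime import datetime, timedelta, timezone

DATA_DELAY = timedelta(minutes=4)  # "T - 4min" from the list above


def parse_timestamp_ns(ts: str) -> int:
    """Convert an ISO 8601 UTC timestamp to integer nanoseconds since the epoch.

    Assumes the form 'YYYY-MM-DDTHH:MM:SS[.fraction]Z'. Python's datetime
    keeps only microseconds, so the fraction is handled separately to
    preserve 100ns precision.
    """
    if not ts.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z', got {ts!r}")
    whole, _, fraction = ts[:-1].partition(".")
    if len(fraction) > 9 or (fraction and not fraction.isdigit()):
        raise ValueError(f"unsupported fractional seconds in {ts!r}")
    seconds_dt = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    epoch_seconds = int(seconds_dt.replace(tzinfo=timezone.utc).timestamp())
    # Right-pad the fraction to 9 digits so it can be read as nanoseconds.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return epoch_seconds * 1_000_000_000 + nanos


def latest_available(now: datetime) -> datetime:
    """Most recent moment for which historical data should already exist."""
    return now - DATA_DELAY


print(parse_timestamp_ns("1970-01-01T00:00:01.0000001Z"))  # 1000000100
print(latest_available(datetime.now(timezone.utc)).isoformat())
```

Keeping the timestamp as an integer count of nanoseconds avoids the silent truncation to microseconds that would occur if the full string were parsed into a datetime.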

Collected order book data details

Historical market data available via the HTTP API provides order book snapshots at the beginning of each day (00:00 UTC) and every time the WebSocket connection was closed while recording the real-time data feed (the connection is restarted and a new snapshot is provided via the fresh connection). This means that, to be sure of receiving initial order book snapshots, one must replay historical data from 00:00 UTC of a given day. It also means that there is a tiny gap in historical data (roughly 300-3000ms, depending on the exchange) while re-subscribing to the real-time WebSocket feed (every 24 hours) in order to receive order book snapshots.
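
In practice this means that reconstructing the order book at an arbitrary moment requires starting the replay at 00:00 UTC of that day. A minimal sketch of computing that starting point is shown below; replay_start is a hypothetical helper name and the target date is illustrative.

```python
from datetime import datetime, timezone


def replay_start(target: datetime) -> datetime:
    """Return 00:00 UTC of the day containing `target`.

    Replaying from this moment ensures the initial order book snapshot
    for that day is part of the replayed data.
    """
    if target.tzinfo is None:
        raise ValueError("target must be timezone-aware")
    # Convert to UTC first so the day boundary is the UTC one.
    target_utc = target.astimezone(timezone.utc)
    return target_utc.replace(hour=0, minute=0, second=0, microsecond=0)


# Illustrative target: the book state is wanted at 14:30 UTC.
target = datetime(2020, 3, 12, 14, 30, tzinfo=timezone.utc)
start = replay_start(target)
print(start.isoformat())  # 2020-03-12T00:00:00+00:00
```

A consumer would then replay from start, apply order book messages up to target, and ignore output produced before target is reached.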

Some exchanges do not provide initial order book snapshots when subscribing to WebSocket real-time feeds (e.g. Binance, Bitstamp, or the Coinbase Pro full order book), so for those a 'generated' snapshot (based on a REST API call) is available instead. Details are specific to each exchange and are described on the per-exchange historical data details pages.

Per-exchange historical data details

Click any exchange below to see its historical data details: available instruments, captured real-time channels, API access details and market data collection specifics.