Data
Common questions about data types, formats and collection
What data types do you support?
We provide the most comprehensive and granular market data sourced from real-time WebSocket APIs, with complete control and transparency over how the data is recorded.
The following normalized tick-level data types are available via downloadable CSV data files:
book_ticker (best bid/ask from native exchange BBO feeds)
derivative tick info (open interest, funding rate, mark price, index price)
The raw data API, available for Pro and Business subscriptions, provides data in the exchange-native format. See historical data details to learn about the real-time channels captured for each exchange. Each captured channel can be considered a different exchange-specific data type (for example the Binance bookTicker channel, or the BitMEX liquidation channel).
We also provide the following normalized data types via our client libs (normalization is done client-side, using raw data API as a data source):
trades
order book L2 updates
order book snapshots (tick-by-tick, 10ms, 100ms, 1s, 10s etc)
quotes
book ticker (best bid/ask from native BBO feeds)
derivative tick info (open interest, funding rate, mark price, index price)
liquidations
options summary
OHLCV
volume/tick based trade bars
What does high frequency historical data mean?
We always collect and provide data with the highest granularity an exchange can offer via its real-time WS feeds. High frequency can mean different things for different exchanges due to exchange API limitations. For example, for Coinbase Exchange it can mean L3 order book data (market-by-order), for Binance USDS-M Futures all order book L2 real-time updates, and for Binance Spot it means order book updates aggregated in 100ms intervals.
How is historical raw market data sourced?
Raw market data is sourced from exchanges' real-time WebSocket APIs. For cases where an exchange lacks a WebSocket API for a particular data type, we fall back to polling a REST API periodically, e.g., Binance USDS-M Futures open interest data.
See market data collection overview for more details and why data source matters.
Why data source matters — WebSocket feeds vs REST endpoints
Recording exchanges' real-time WebSocket feeds allows us to preserve and provide the most granular data that exchanges' APIs can offer, including data that is not available via REST APIs, like tick-level order book updates.
Historical data sourced from WebSocket real-time feeds matches what you'll see when trading live and can be used to exactly replicate live conditions, even if that means occasional connection drops causing small data gaps, real-time publishing delays during larger market moves, duplicated trades, or crossed books in edge cases. We find that trade-off acceptable: even if the data isn't as clean and corrected as data sourced from REST APIs, it provides more insight into market microstructure and unusual exchange behaviors that otherwise couldn't be captured.
A simple example would be latency spikes for many exchanges during increased volatility periods where exchanges publish trade/order book/quote WebSocket messages with larger-than-usual latency or skip some updates and then return them in one batch. Querying the REST API would result in a clean trade history, but such data wouldn't fully reflect actionable market behavior and could produce unrealistic backtesting results that break in real-time scenarios.
See market data collection overview for more details.
What is the difference between exchange-native and normalized data format?
Cryptocurrency markets are very fragmented and every exchange provides data in its own bespoke format, which we call exchange-native data format.
Our HTTP API and client libs can provide market data in this format, meaning data you receive is exactly the same as the live data you would have received from exchanges ("as-is").
See how we collect data in exchange-native format and why it's important.
For example, a BitMEX trade message looks like this:
{
"table": "trade",
"action": "insert",
"data": [
{
"timestamp": "2019-06-01T00:03:11.589Z",
"symbol": "ETHUSD",
"side": "Sell",
"size": 10,
"price": 268.7,
"tickDirection": "ZeroMinusTick",
"trdMatchID": "ebc230d9-0b6e-2d5d-f99a-f90109a2b113",
"grossValue": 268700,
"homeNotional": 0.08555051758063137,
"foreignNotional": 22.987424073915648
}
]
}
and this is a Deribit trade message:
{
"jsonrpc": "2.0",
"method": "subscription",
"params": {
"channel": "trades.ETH-26JUN20.raw",
"data": [
{
"trade_seq": 18052,
"trade_id": "ETH-10813935",
"timestamp": 1577836825724,
"tick_direction": 0,
"price": 132.65,
"instrument_name": "ETH-26JUN20",
"index_price": 128.6,
"direction": "buy",
"amount": 1.0
}
]
}
}
In contrast, normalized data format means the same, unified format across multiple exchanges. We provide normalized data via our client libs (data normalization is performed client-side) as well as via downloadable CSV files.
In the process of data normalization we map the data collected from real-time WebSocket APIs (exchange-native format) to a normalized format that is easier to work with (one unified data format across multiple exchanges).
We've open sourced all the data mappings from exchange-native to normalized format to make the whole process as transparent as possible.
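As an illustration of that mapping, here's a minimal Python sketch that converts the BitMEX-native trade message shown earlier into a unified trade shape. Field names follow the samples in this FAQ; the real open-sourced mappings in the client libs handle many more message types and edge cases.

```python
def normalize_bitmex_trade(message, local_timestamp):
    """Map one exchange-native BitMEX trade message to normalized trades.

    Illustrative sketch only: the production mappings are open sourced
    and considerably more complete."""
    if message.get("table") != "trade":
        return []
    return [
        {
            "type": "trade",
            "symbol": t["symbol"],
            "exchange": "bitmex",
            "id": t["trdMatchID"],
            "price": t["price"],
            "amount": t["size"],
            "side": t["side"].lower(),  # "Sell" -> "sell"
            "timestamp": t["timestamp"],
            "localTimestamp": local_timestamp,  # arrival time at collection
        }
        for t in message["data"]
    ]

# The BitMEX-native sample from this FAQ (non-mapped fields omitted):
native = {
    "table": "trade",
    "action": "insert",
    "data": [
        {
            "timestamp": "2019-06-01T00:03:11.589Z",
            "symbol": "ETHUSD",
            "side": "Sell",
            "size": 10,
            "price": 268.7,
            "trdMatchID": "ebc230d9-0b6e-2d5d-f99a-f90109a2b113",
        }
    ],
}

normalized = normalize_bitmex_trade(native, "2019-06-01T00:03:11.612Z")
```

A Deribit mapper would do the same against its own field names (`direction`, `amount`, `trade_id`), which is exactly why a unified shape is convenient.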
Sample normalized trade message:
{
"type": "trade",
"symbol": "XBTUSD",
"exchange": "bitmex",
"id": "282a0445-0e3a-abeb-f403-11003204ea1b",
"price": 7996,
"amount": 50,
"side": "sell",
"timestamp": "2019-10-23T10:32:49.669Z",
"localTimestamp": "2019-10-23T10:32:49.740Z"
}
We support the following normalized data types via our client libs:
tick-by-tick trades
order book L2 updates
order book snapshots (tick-by-tick, 10ms, 100ms, 1s, 10s etc)
quotes
book ticker (best bid/ask from native BBO feeds)
derivative tick info (open interest, funding rate, mark price, index price)
liquidations
options summary
OHLCV
volume/tick based trade bars
and downloadable CSV data files:
book_ticker (best bid/ask from native BBO feeds)
derivative tick info (open interest, funding rate, mark price, index price)
Where to find what — quick reference:
Raw replay (HTTP API) — all exchange-native channels in original format. Use this when you need fields or data types not available in normalized format.
Downloadable CSV files (datasets) — normalized data types listed above (trades, incremental_book_L2, book_snapshot, quotes, book_ticker, derivative_ticker, liquidations, options_chain).
Client libs / tardis-machine (getting started) — normalized data types plus additional computed types (trade_bar, book snapshots with custom intervals, OHLCV, etc.), available for both historical replay and real-time streaming.
Not every exchange-native channel has a normalized equivalent. If a data type is available as a raw channel but not listed in normalized types above, it can only be accessed via raw replay.
Do you provide market data in normalized format?
Normalized market data (unified data format for every exchange) is available via our official libraries and downloadable CSV files. Our HTTP API provides data only in exchange-native format.
What is the difference between `book_ticker` and `quote`?
Both provide best bid/ask (BBO) data, but from different sources:
`book_ticker` — sourced from exchanges' native WebSocket BBO channels (e.g., Binance `bookTicker`, Bybit `orderbook.1`). Available via client libraries, tardis-machine, and CSV datasets. See which exchanges support it.
`quote` (alias for `book_snapshot_1_0ms`) — derived from L2 order book data. Available on all exchanges that provide L2 data. Also available as the `quotes` CSV dataset.
Update frequency differs between the two and depends on the exchange. For example, on Binance the native bookTicker stream fires significantly more often than L2-derived quotes, because the exchange publishes a dedicated BBO update on every best-price change. On other exchanges the difference may be smaller or negligible. Check the update rates for your specific exchange before choosing.
When using the replay API, book_ticker can be replayed starting from any point in time since it is a standalone exchange feed. In contrast, quote is derived from L2 order book state, which requires an initial snapshot to reconstruct — snapshots are provided at 00:00 UTC each day (and after each WebSocket reconnect), so replay should start from 00:00 UTC to get accurate quotes.
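Since quote reconstruction depends on the daily snapshot, a small helper can align a requested replay start with the 00:00 UTC boundary. This is an illustrative sketch, not part of any client lib:

```python
from datetime import datetime, timezone

def align_replay_start(requested_start: datetime) -> datetime:
    """Floor a timezone-aware replay start to 00:00 UTC of the same day,
    so the L2-derived quote stream starts from a daily order book snapshot."""
    utc = requested_start.astimezone(timezone.utc)
    return utc.replace(hour=0, minute=0, second=0, microsecond=0)

start = align_replay_start(datetime(2019, 10, 23, 10, 32, tzinfo=timezone.utc))
# start is 2019-10-23 00:00:00 UTC
```

Replaying `book_ticker` needs no such alignment, since it is a standalone feed.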
Does Tardis provide precomputed indicators or OHLCV candles?
Tardis provides raw tick-level market data (trades, order book updates, funding rates, liquidations, etc.) — not precomputed indicators, aggregated Kline/OHLCV candles, or hosted analytics. OHLCV bars and other derived metrics can be computed client-side from our data, for example using trade_bar data type in tardis-machine or client libraries.
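As a sketch of such client-side computation, the following Python function aggregates tick-level trades into fixed-interval OHLCV bars. The trade dict shape (epoch-millisecond `timestamp`, `price`, `amount`) is illustrative, not a library API:

```python
def ohlcv_bars(trades, interval_ms=60_000):
    """Aggregate tick-level trades into time-based OHLCV bars keyed by
    the bar's start time in epoch milliseconds."""
    bars = {}
    for t in trades:
        # Bucket the trade into its bar's start timestamp.
        bucket = t["timestamp"] // interval_ms * interval_ms
        bar = bars.get(bucket)
        if bar is None:
            bars[bucket] = {
                "open": t["price"], "high": t["price"], "low": t["price"],
                "close": t["price"], "volume": t["amount"],
            }
        else:
            bar["high"] = max(bar["high"], t["price"])
            bar["low"] = min(bar["low"], t["price"])
            bar["close"] = t["price"]  # trades arrive in time order
            bar["volume"] += t["amount"]
    return bars

trades = [
    {"timestamp": 0, "price": 100.0, "amount": 1.0},
    {"timestamp": 30_000, "price": 105.0, "amount": 2.0},
    {"timestamp": 61_000, "price": 99.0, "amount": 0.5},
]
bars = ohlcv_bars(trades)  # two 1-minute bars
```

The `trade_bar` data type in tardis-machine and the client libs performs this kind of aggregation for you, including volume- and tick-based variants.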
Do you provide time-based aggregated data as well?
We focus on providing the best possible tick-level historical data for cryptocurrency exchanges. As of now our APIs (both HTTP and CSV datasets) offer access to tick-level data only and do not support time-based aggregated data.
If you're interested in time-based aggregated data (OHLC, interval-based order book snapshots), see our client libs, which provide such capabilities. The caveat is that aggregation is performed client-side from tick-level data sourced from the API, which can be relatively slow compared to ready-to-download aggregated data.
What is the historical market data delay in relation to real-time?
For the raw data replay API, the most recent data available is approximately T-6 minutes relative to the current time.
Downloadable CSV files for a given day are available on the next day around 06:00 UTC — see CSV readiness for details.
What is the `channel` field used in the HTTP API and client libs `replay` functions?
When exchanges publish real-time data messages, they publish them for the subscription topics clients have subscribed to. These subscription topics are often called "channels" or "streams" in exchange documentation and describe the data type a given message belongs to. For example, BitMEX publishes its trade data via the trade channel and its order book L2 updates via the orderBookL2 channel.
Since we collect data for all the channels listed on each exchange's details page (Captured real-time market data channels section), our HTTP API and client libs offer filtering by those channel names. For example, to get historical trades for BitMEX, the channel trade needs to be provided alongside the requested instrument symbols (via HTTP API or client lib replay function args).
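Constructing such a filtered replay request can be sketched as below. The endpoint path and parameter names here follow the general shape of the HTTP replay API and should be treated as assumptions; consult the API reference for the exact contract:

```python
import json
from urllib.parse import urlencode

def build_replay_url(exchange, channel, symbols, date_from, date_to):
    """Build a raw data replay URL filtered by channel and symbols.

    Sketch only: endpoint path and query parameter names are assumptions
    based on this FAQ, not a verified API contract."""
    filters = [{"channel": channel, "symbols": symbols}]
    query = urlencode({
        "from": date_from,
        "to": date_to,
        # Filters are passed as a JSON array of {channel, symbols} objects.
        "filters": json.dumps(filters, separators=(",", ":")),
    })
    return f"https://api.tardis.dev/v1/data-feeds/{exchange}?{query}"

url = build_replay_url("bitmex", "trade", ["XBTUSD"], "2019-06-01", "2019-06-02")
```

The client libs' `replay` functions accept the same channel/symbol filters as arguments, so you rarely need to build URLs by hand.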
How are CSV datasets split into files?
CSV datasets are available in daily intervals split by exchange, data type, and symbol. In addition to standard currency pairs/instrument symbols, each exchange also has special grouped symbols available depending on whether it supports a given market type: SPOT, FUTURES, OPTIONS, and PERPETUALS. That feature is useful if someone is interested in, for example, all Deribit's options instruments trades or quotes data without requesting data for each symbol separately.
How do symbol IDs differ between raw replay and CSV datasets?
The /exchanges/:exchange API returns two separate symbol lists with different ID formats:
`availableSymbols[].id` — used for raw data replay. These are exchange-native symbol IDs as used in WebSocket subscriptions (e.g., `btcusdt` for Binance, `BTC-PERPETUAL` for Deribit).
`datasets.symbols[].id` — used for CSV dataset downloads. These are always uppercased, and URL-unsafe characters (`/`, `:`) are replaced with `-`.
The Instruments Metadata API provides both id (native) and datasetId (CSV) fields per instrument.
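Based on the rules above, deriving a CSV dataset ID from a native symbol ID can be sketched as below; the `datasetId` returned by the Instruments Metadata API remains the authoritative value:

```python
def to_dataset_id(native_symbol_id: str) -> str:
    """Derive the CSV dataset symbol id from an exchange-native id:
    uppercase it and replace URL-unsafe characters ('/', ':') with '-'.

    Illustrative only; prefer the datasetId field from the
    Instruments Metadata API."""
    return native_symbol_id.upper().replace("/", "-").replace(":", "-")

print(to_dataset_id("btcusdt"))        # BTCUSDT
print(to_dataset_id("BTC-PERPETUAL"))  # BTC-PERPETUAL
```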
Does your historical data include delisted or expired instruments?
Yes. Our historical datasets are survivorship-bias-free — they include all instruments that were actively trading at the time of data collection, including those that have since been delisted, expired, or renamed by the exchange.
Note that for some spot exchanges, early historical coverage was limited to high-cap currency pairs only. See individual exchange pages in Historical Data Details for exact coverage boundaries and start dates.
Can the same symbol ID refer to different assets over time?
Yes. Exchanges may reuse symbol identifiers for different assets — for example, delisting a token and later listing a different token under the same symbol. Tardis passes through symbol IDs as provided by the exchange without modification, so the same symbol string may appear across different time periods for different underlying assets. Verify token identity using additional context such as price levels or exchange announcements.
Do you provide historical futures data?
We cover all leading derivatives exchanges such as BitMEX, Deribit, Binance USDS-M Futures, Binance COIN Futures, FTX, OKX Futures, HTX Coin-M Futures, HTX Coin-M Perpetual, Bitfinex Derivatives, Bybit Derivatives and many more.
What is the difference between futures and perpetual swaps contracts?
A futures contract has an expiry date (for example, a quarter ahead for quarterly futures). The futures contract price converges to the spot price as the contract approaches its expiration/settlement date. After a futures contract expires, the exchange settles it and replaces it with a new contract for the next period (the next quarter in our example).
A perpetual swap contract, also commonly called "perp", "swap", "perpetual", or "perpetual future" in crypto exchange nomenclature, is very similar to a futures contract but does not have an expiry date (hence perpetual). To ensure that the perpetual swap contract price stays near the spot price, exchanges employ a mechanism called funding rate. When the funding rate is positive, Longs pay Shorts. When the funding rate is negative, Shorts pay Longs. This mechanism can be quite nuanced and vary between exchanges, so it's best to study each contract specification to learn all the details (funding periods, mark price mechanisms, etc.).
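The direction of funding payments can be illustrated with a toy calculation. This is a deliberate simplification: real funding mechanics involve funding intervals, mark/index prices, and exchange-specific details, so study each contract specification.

```python
def funding_payment(position_value: float, funding_rate: float) -> float:
    """Toy funding payment for the holder of a long position:
    positive result means the long pays, negative means the long receives.
    Simplified illustration; not any exchange's exact formula."""
    return position_value * funding_rate

# Positive rate: longs pay shorts.
print(funding_payment(10_000, 0.0001))
# Negative rate: shorts pay longs (the long receives).
print(funding_payment(10_000, -0.0001))
```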
See the CSV grouped symbols section if you'd like to download data for all futures or perpetual swaps of a given exchange as a single file instead of one by one for each individual instrument.
Do you provide historical options data?
Yes, we provide historical options data for Deribit and OKX Options - see the options chain CSV data type and the Deribit and OKX Options exchange details pages.
Which exchanges support liquidations data type?
Liquidations data is sourced from exchanges' WebSocket APIs when supported, with a fallback to polling REST APIs when the WebSocket API does not support that data type. It can be accessed via the raw data API (by replaying the relevant channels) or as a normalized data type via CSV downloads.
Since 2019-03-30 — WS trades channel (trades with liquidation flag); data available until 2023-10-03 (Deribit removed the liquidation field from public trade subscriptions on that date)
Since 2020-12-17 — WS futures/liquidation channel (before 2021-12-23); WS liquidations channel since 2021-12-23
Since 2020-11-03 — WS liquidation channel (before 2023-04-05); WS allLiquidation channel since 2025-02-25
Binance forceOrder streams push snapshot data at most once per second since April 2021 (no longer real-time individual events). Tardis captures exactly what exchanges publish — liquidation data should not be assumed to contain every individual liquidation event.
Which exchanges support book_ticker data type?
book_ticker provides top of the book (best bid/ask) data captured directly from exchanges' native WebSocket best bid/offer channels. See book_ticker vs quote for how it differs from L2-derived quotes.
Do you provide normalized contract amounts for derivatives?
The data we provide has contract amounts exactly as returned by exchanges' APIs, meaning the amount field may represent contracts, base-asset units, or USD depending on the exchange. This can be tricky when comparing across exchanges due to different contract multipliers (e.g., OKX, where each contract has a $100 value) or different contract types (linear or inverse).
We provide the instruments metadata API that returns contractMultiplier, inverse, contractType, tick sizes, and more for each instrument in a uniform way, allowing you to normalize contract amounts client-side. Use the following formulas for futures and perpetual contracts:
Linear (inverse: false):
base asset volume: amount × contractMultiplier
quote (USD) volume: amount × contractMultiplier × price
Inverse (inverse: true):
base asset volume: amount × contractMultiplier / price
quote (USD) volume: amount × contractMultiplier
These formulas apply to standard futures and perpetual contracts. Quanto contracts (quanto_future, quanto_perpetual) and options require different treatment — consult exchange-specific documentation for their volume calculations.
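Put together, the formulas above can be sketched in Python. The field names (`contractMultiplier`, `inverse`) follow the instruments metadata API described here; the function applies only to standard (non-quanto) futures and perpetuals:

```python
def contract_volumes(amount: float, price: float,
                     contract_multiplier: float, inverse: bool):
    """Convert a raw contract `amount` into (base asset, quote/USD) volumes
    using instruments metadata fields. Standard futures/perpetuals only;
    quanto contracts and options need different treatment."""
    if inverse:
        # Inverse contracts are denominated in quote currency (e.g., USD).
        quote = amount * contract_multiplier
        base = quote / price
    else:
        # Linear contracts are denominated in the base asset.
        base = amount * contract_multiplier
        quote = base * price
    return base, quote

# e.g., an inverse contract with a $100 multiplier at price 20,000:
base, quote = contract_volumes(amount=5, price=20_000,
                               contract_multiplier=100, inverse=True)
# quote is 500.0 USD, base is 0.025 of the base asset
```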
Can you record market data for an exchange that's not currently supported?
Yes, we're always open to supporting new promising exchanges. Contact us and we'll get back to you to discuss the details.
Is the provided raw market data complete?
We're doing our best to provide the most complete and reliable historical raw data API on the market. To do so, among many other things, we utilize highly available Kubernetes clusters on Google Cloud Platform that offer best-in-class availability, networking, and monitoring. However, due to exchanges' API downtimes (maintenance, deployments, connection drops, etc.) we can experience data gaps and cannot guarantee 100% data completeness; we do achieve 99.9% (99.99% on most days), which should be more than enough for most use cases where tick-level data is useful.
In rare circumstances, when an exchange's API changes without notice or we hit new unexpected rate limits, we may also fail to record data during such a period. This happens very rarely and is specific to each exchange. Data gaps caused by exchange outages or collection interruptions are permanent — WebSocket-sourced data cannot be retroactively backfilled. Use the /exchanges/:exchange API endpoint and check the incidentReports field for the most detailed and up-to-date information on that subject.
Can exchange data contain invalid or extreme values?
Yes. Tardis stores and serves exchange payloads exactly as received, without modification or filtering. If an exchange publishes an invalid price, extreme value, or malformed field, it will appear in the data as-is. This preserves full fidelity of the original feed. Apply your own validation and sanitization downstream when consuming raw data.
How are market data messages timestamped?
Each message received via WebSocket connection is timestamped with 100ns precision using synchronized clock at arrival time (before any message processing) and stored in ISO 8601 format. Note that data is collected from different server locations depending on the exchange (see market data collection overview). Local timestamps for exchanges collected from the same server location are directly comparable, but cross-region comparisons (e.g., London vs Tokyo) should not be used for sub-millisecond latency analysis.
How are events ordered when multiple messages share the same timestamp?
Row order in both replay API responses and CSV files reflects the original capture order — the sequence in which messages were received from the exchange WebSocket connection. When multiple events share the same millisecond exchange timestamp, use the row position (or localTimestamp ordering) as the tie-breaker rather than rounding or deduplicating by exchange timestamp.
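For example, a stable sort keyed on (exchange timestamp, localTimestamp) keeps same-timestamp events in capture order; the event shapes below are illustrative:

```python
# Events sharing the same millisecond exchange timestamp (1000) are
# disambiguated by localTimestamp, the time the message was captured.
events = [
    {"timestamp": 1000, "localTimestamp": 1002, "id": "a"},
    {"timestamp": 1000, "localTimestamp": 1001, "id": "b"},
    {"timestamp": 999,  "localTimestamp": 1000, "id": "c"},
]

# localTimestamp acts as the tie-breaker instead of rounding or
# deduplicating by exchange timestamp.
ordered = sorted(events, key=lambda e: (e["timestamp"], e["localTimestamp"]))
print([e["id"] for e in ordered])  # ['c', 'b', 'a']
```

When reading CSV files, row position already encodes the same capture order, so a stable sort on the exchange timestamp alone is equivalent.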
Are trades and order book updates synchronized across channels?
Exchanges publish different data types (trades, order book updates, tickers, etc.) on independent WebSocket channels — often processed by separate backend services or workers. There is no cross-channel ordering guarantee from exchanges. For example, Deribit explicitly documents that cross-instrument timing is "inherently asynchronous" with separate internal workers per currency. Other exchanges (Binance, Bybit, OKX) are silent on cross-channel ordering, which in practice means no guarantee.
This means a trade's exchange timestamp does not guarantee that the order book state at that exact timestamp reflects the pre- or post-trade book. Similarly, different symbols — even on the same channel — may be served by different backend servers and arrive independently.
Tardis preserves the original message arrival order and never reorders events. In historical replay and CSV files, all messages are sorted by localTimestamp (the time we received the message), providing a chronological sequence across data types as observed from our collection servers.
How frequently do exchanges drop WebSocket connections?
As long as an exchange WebSocket API is not hidden behind a Cloudflare proxy (causing relatively frequent "CloudFlare WebSocket proxy restarting, Connection reset by peer" errors), connections are stable for the majority of supported exchanges and there are almost no connection drops during the day. During periods of higher market volatility, some exchanges tend to drop connections more frequently or have larger latency spikes. Overall, it's a nuanced matter that changes over time. If you have any questions regarding a particular exchange, please do not hesitate to contact us.
Do exchanges publish duplicated trade messages?
Some exchanges occasionally publish duplicated trades (trades with the same IDs). Since we collect real-time data, we also collect and provide such duplicate trades via the API if they were published by the exchanges' real-time WebSocket feeds. When working with normalized data, our client libraries can deduplicate such trades; similarly, we deduplicate tick-by-tick trades data in downloadable CSV files.
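A minimal sketch of such deduplication, keyed on symbol and trade ID (illustrative; the client libs' implementation may differ):

```python
def dedupe_trades(trades):
    """Drop trades whose (symbol, id) pair was already seen,
    preserving the original capture order."""
    seen = set()
    out = []
    for t in trades:
        key = (t["symbol"], t["id"])
        if key not in seen:
            seen.add(key)
            out.append(t)
    return out

raw = [
    {"symbol": "XBTUSD", "id": "1", "price": 7996.0},
    {"symbol": "XBTUSD", "id": "1", "price": 7996.0},  # duplicate
    {"symbol": "XBTUSD", "id": "2", "price": 7996.5},
]
clean = dedupe_trades(raw)  # duplicate of id "1" removed
```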
Can timestamps be non-monotonic within a channel?
In theory that shouldn't happen, but we've observed that for some exchanges, when a new connection is established, the first message for a given channel & symbol sometimes has a newer timestamp than subsequent messages, e.g., an order book snapshot with a newer timestamp than the first order book update. This is why we serve data via the API and CSV downloads for given date ranges based on local timestamps (message arrival time), which are always monotonically increasing.