Downloadable CSV files

Quick start

CSV datasets are available via a dedicated datasets API that allows downloading tick-level incremental order book L2 updates, order book snapshots, trades, options chains, quotes, derivative tickers and liquidations data. For ongoing data, CSV datasets for a given day are available on the next day around 06:00 UTC.

CSV datasets are exported from the exchanges' real-time WebSocket feed data that we collect (and also provide via our API as historical data in exchange-native format).

Historical datasets for the first day of each month are available to download without an API key. Our Node.js and Python clients have built-in functions to efficiently download a whole date range of data.

# pip install tardis-dev
# requires Python >=3.6
from tardis_dev import datasets

datasets.download(
    exchange="deribit",
    data_types=[
        "incremental_book_L2",
        "trades",
        "quotes",
        "derivative_ticker",
        "book_snapshot_25",
        "liquidations"
    ],
    from_date="2019-11-01",
    to_date="2019-11-02",
    symbols=["BTC-PERPETUAL", "ETH-PERPETUAL"],
    api_key="YOUR API KEY (optionally)",
)

See the full example for all available download options (download path customization, file name conventions and more).

CSV format details

  • columns delimiter: , (comma)

  • new line marker: \n (LF)

  • decimal mark: . (dot)

  • date time format: microseconds since epoch (https://www.epochconverter.com/)

  • date time timezone: UTC
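
The format rules above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not part of any client library: the helper names `micros_to_datetime` and `read_rows` are hypothetical, and the sample row is made up to stand in for a downloaded file.

```python
import csv
import gzip
import io
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros):
    """Convert microseconds since epoch to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(microseconds=int(micros))


def read_rows(gz_bytes):
    """Yield dict rows from gzip compressed, comma-delimited CSV bytes."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as f:
        yield from csv.DictReader(f, delimiter=",")


# Made-up in-memory sample standing in for a downloaded trades file.
sample = gzip.compress(
    b"exchange,symbol,timestamp,local_timestamp,id,side,price,amount\n"
    b"deribit,BTC-PERPETUAL,1585699200000000,1585699200012345,1,buy,6400.5,10\n"
)

rows = list(read_rows(sample))
first_seen = micros_to_datetime(rows[0]["local_timestamp"])
print(first_seen.isoformat())
```

Integer arithmetic on the microsecond value is used instead of dividing by 1e6 and calling `datetime.fromtimestamp`, which can lose precision through floating point rounding.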

Data types

incremental_book_L2

Incremental order book L2 updates collected from exchanges' real-time WebSocket order book L2 data feeds. The data is as deep and granular as the underlying real-time data source; see FAQ: What is the maximum order book depth available for each supported exchange? for more details.

Exchanges' real-time feeds usually publish updates to multiple order book levels in a single message. If needed, you can identify rows that came from the same message by grouping them by the local_timestamp field.
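
A minimal sketch of that grouping, assuming rows have already been parsed into dicts keyed by column name and are ordered by local_timestamp. The function name `group_by_message` is hypothetical. Note that two distinct messages that happened to arrive in the same microsecond would end up in one group.

```python
from itertools import groupby
from operator import itemgetter


def group_by_message(rows):
    """Group consecutive rows that share the same local_timestamp value."""
    for local_ts, group in groupby(rows, key=itemgetter("local_timestamp")):
        yield local_ts, list(group)


# Made-up rows: the first two share an arrival timestamp.
example_rows = [
    {"local_timestamp": "1585699200012345", "side": "bid", "price": "6400", "amount": "5"},
    {"local_timestamp": "1585699200012345", "side": "ask", "price": "6401", "amount": "3"},
    {"local_timestamp": "1585699200020000", "side": "bid", "price": "6400", "amount": "0"},
]

for local_ts, updates in group_by_message(example_rows):
    print(local_ts, len(updates))
```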

If you have any doubts about how to correctly reconstruct the full order book state from the incremental_book_L2 CSV dataset, please see this answer or contact us.

If you only need order book data for the top 25 or top 5 levels, we provide datasets with snapshots already reconstructed for every update. See book_snapshot_25 and book_snapshot_5.

column name

description

exchange

exchange id, one of https://api.tardis.dev/v1/exchanges ([].id field)

symbol

instrument symbol as provided by exchange (always uppercase)

timestamp

timestamp provided by exchange in microseconds since epoch - if exchange does not provide one local_timestamp value is used as a fallback

local_timestamp

message arrival timestamp in microseconds since epoch

is_snapshot

possible values:

  • true - if update was a part of initial order book snapshot

  • false - if update was not a part of initial order book snapshot

If the last update was not a snapshot and the current one is, the existing order book state must be discarded (all existing levels removed).

side

determines which side of the order book the update belongs to:

  • bid - bid side of the book, buy orders

  • ask - ask side of the book, sell orders

price

price identifying book level being updated

amount

updated price level amount as provided by exchange, not a delta - an amount of 0 indicates that the price level can be removed

Deribit BTC-PERPETUAL incremental order book L2 updates for 2020-04-01
Deribit FUTURES instruments incremental order book L2 updates for 2020-09-01
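
The column semantics above (is_snapshot resets, absolute amounts, an amount of 0 removing a level) can be sketched as a small reconstruction loop. This is an illustrative sketch for rows of a single symbol, not the reference procedure; `rebuild_book` is a hypothetical helper and the rows are made up.

```python
from decimal import Decimal


def rebuild_book(rows):
    """Apply incremental_book_L2 rows in order and return the resulting book.

    The book maps side ("bid" / "ask") to a dict of price -> amount.
    """
    book = {"bid": {}, "ask": {}}
    prev_is_snapshot = False
    for row in rows:
        is_snapshot = row["is_snapshot"] == "true"
        # A snapshot that follows a non-snapshot update starts a fresh book.
        if is_snapshot and not prev_is_snapshot:
            book["bid"].clear()
            book["ask"].clear()
        prev_is_snapshot = is_snapshot

        price = Decimal(row["price"])
        amount = Decimal(row["amount"])
        levels = book[row["side"]]
        if amount == 0:
            levels.pop(price, None)  # level removed
        else:
            levels[price] = amount   # amount is absolute, not a delta
    return book


example_rows = [
    {"is_snapshot": "true", "side": "bid", "price": "100", "amount": "5"},
    {"is_snapshot": "true", "side": "ask", "price": "101", "amount": "3"},
    {"is_snapshot": "false", "side": "bid", "price": "100", "amount": "0"},
    {"is_snapshot": "false", "side": "bid", "price": "99", "amount": "2"},
]
print(rebuild_book(example_rows))
```

For files requested with a grouped symbol, one book per symbol value would be needed; that is left out here to keep the sketch short.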

• book_snapshot_25

Tick-level order book snapshots reconstructed from exchanges' real-time WebSocket order book L2 data feeds. Each row represents the top 25 levels from each side of the limit order book and was recorded every time any of the tracked top 25 bid/ask levels changed.

column name

description

exchange

exchange id, one of https://api.tardis.dev/v1/exchanges ([].id field)

symbol

instrument symbol as provided by exchange (always uppercase)

timestamp

timestamp provided by exchange in microseconds since epoch - if exchange does not provide one local_timestamp value is used as a fallback

local_timestamp

message arrival timestamp in microseconds since epoch

asks[0..24].price

top 25 asks prices in ascending order, empty if there aren't enough price levels available in the order book or provided by the exchange

asks[0..24].amount

top 25 asks amounts, in the same order as the corresponding ask prices (ascending price), empty if there aren't enough price levels available in the order book or provided by the exchange

bids[0..24].price

top 25 bids prices in descending order, empty if there aren't enough price levels available in the order book or provided by the exchange

bids[0..24].amount

top 25 bids amounts, in the same order as the corresponding bid prices (descending price), empty if there aren't enough price levels available in the order book or provided by the exchange

BitMEX XBTUSD top 25 levels order book snapshots for 2020-09-01
Binance USDT Futures BTCUSDT top 25 levels order book snapshots for 2020-09-01
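
A snapshot row can be turned into a list of (price, amount) levels as sketched below. The concrete column names (`asks[0].price`, `asks[0].amount` and so on) are an assumption derived from the `asks[0..24]` notation in the table above, and `parse_levels` is a hypothetical helper; check the header line of a real file before relying on it.

```python
def parse_levels(row, side, depth=25):
    """Return [(price, amount), ...] for one side of a snapshot row.

    `side` is "asks" or "bids". Parsing stops at the first empty cell,
    since missing levels are left empty at the tail of the row.
    """
    levels = []
    for i in range(depth):
        price = row.get(f"{side}[{i}].price")    # assumed column name
        amount = row.get(f"{side}[{i}].amount")  # assumed column name
        if not price or not amount:
            break
        levels.append((float(price), float(amount)))
    return levels


# Made-up row with only one ask level and two bid levels populated.
example_row = {
    "asks[0].price": "101.5", "asks[0].amount": "2",
    "asks[1].price": "", "asks[1].amount": "",
    "bids[0].price": "101.0", "bids[0].amount": "4",
    "bids[1].price": "100.5", "bids[1].amount": "1",
}
print(parse_levels(example_row, "asks"))
print(parse_levels(example_row, "bids"))
```

The same helper works for book_snapshot_5 files by passing `depth=5`.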

• book_snapshot_5

Tick-level order book snapshots reconstructed from exchanges' real-time WebSocket order book L2 data feeds. Each row represents the top 5 levels from each side of the limit order book and was recorded every time any of the tracked top 5 bid/ask levels changed.

column name

description

exchange

exchange id, one of https://api.tardis.dev/v1/exchanges ([].id field)

symbol

instrument symbol as provided by exchange (always uppercase)

timestamp

timestamp provided by exchange in microseconds since epoch - if exchange does not provide one local_timestamp value is used as a fallback

local_timestamp

message arrival timestamp in microseconds since epoch

asks[0..4].price

top 5 asks prices in ascending order, empty if there aren't enough price levels available in the order book or provided by the exchange

asks[0..4].amount

top 5 asks amounts, in the same order as the corresponding ask prices (ascending price), empty if there aren't enough price levels available in the order book or provided by the exchange

bids[0..4].price

top 5 bids prices in descending order, empty if there aren't enough price levels available in the order book or provided by the exchange

bids[0..4].amount

top 5 bids amounts, in the same order as the corresponding bid prices (descending price), empty if there aren't enough price levels available in the order book or provided by the exchange

BitMEX XBTUSD top 5 levels order book snapshots for 2020-09-01
Binance USDT Futures BTCUSDT top 5 levels order book snapshots for 2020-09-01

• trades

Individual trades data collected from exchanges' real-time WebSocket trades data feeds.

column name

description

exchange

exchange id, one of https://api.tardis.dev/v1/exchanges ([].id field)

symbol

instrument symbol as provided by exchange (always uppercase)

timestamp

timestamp provided by exchange in microseconds since epoch - if exchange does not provide one local_timestamp value is used as a fallback

local_timestamp

message arrival timestamp in microseconds since epoch

id

trade id as provided by exchange, empty if exchange does not provide one - different exchanges provide ids as numeric values, GUIDs or other strings, and some do not provide that information at all

side

liquidity taker side (aggressor), possible values:

  • buy - liquidity taker was buying

  • sell - liquidity taker was selling

  • unknown - exchange did not provide that information

price

trade price as provided by exchange

amount

trade amount as provided by exchange

BitMEX XBTUSD trades for 2020-03-01 dataset sample
OKEx Futures FUTURES instruments trades for 2020-03-01 dataset sample
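
As an illustration of working with the trades columns, the sketch below sums traded amount per taker side and computes an amount-weighted average price. `summarize_trades` is a hypothetical helper and the rows are made up. Since amounts are reported as provided by each exchange, the result is only meaningful within a single instrument.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_trades(rows):
    """Return (amount per side, amount-weighted average price or None)."""
    amount_by_side = defaultdict(Decimal)
    weighted_sum = Decimal(0)
    total_amount = Decimal(0)
    for row in rows:
        price = Decimal(row["price"])
        amount = Decimal(row["amount"])
        amount_by_side[row["side"]] += amount  # buy / sell / unknown
        weighted_sum += price * amount
        total_amount += amount
    vwap = weighted_sum / total_amount if total_amount else None
    return dict(amount_by_side), vwap


example_rows = [
    {"side": "buy", "price": "100", "amount": "2"},
    {"side": "sell", "price": "101", "amount": "1"},
    {"side": "unknown", "price": "102", "amount": "1"},
]
print(summarize_trades(example_rows))
```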

• options_chain

Tick-level options summary info (strike prices, expiration dates, open interest, implied volatility, greeks etc.) for all active options instruments collected from exchanges' real-time WebSocket options tickers data feeds. Options chain data is available for Deribit (sourced from ticker channel) and OKEx Options (sourced from option/summary and index/ticker channels).

For options_chain data type only 'OPTIONS' symbol is available (one file per day for all options instruments).

column name

description

exchange

exchange id, one of https://api.tardis.dev/v1/exchanges ([].id field)

symbol

instrument symbol as provided by exchange (always uppercase)

timestamp

ticker timestamp provided by exchange in microseconds since epoch

local_timestamp

ticker message arrival timestamp in microseconds since epoch

type

option type, possible values:

  • put

  • call

strike_price

option strike price

expiration

option expiration date in microseconds since epoch

open_interest

current open interest, empty if exchange does not provide one

last_price

price of the last trade, empty if there weren't any trades yet

bid_price

current best bid price, empty if there aren't any bids

bid_amount

current best bid amount, empty if there aren't any bids

bid_iv

implied volatility for best bid, empty if there aren't any bids

ask_price

current best ask price, empty if there aren't any asks

ask_amount

current best ask amount, empty if there aren't any asks

ask_iv

implied volatility for best ask, empty if there aren't any asks

mark_price

mark price, empty if exchange does not provide one

mark_iv

implied volatility for mark price, empty if exchange does not provide one

underlying_index

underlying index name that option contract is based upon

underlying_price

underlying price, empty if exchange does not provide one

delta

delta value for the option, empty if exchange does not provide one

gamma

gamma value for the option, empty if exchange does not provide one

vega

vega value for the option, empty if exchange does not provide one

theta

theta value for the option, empty if exchange does not provide one

rho

rho value for the option, empty if exchange does not provide one

Deribit options chain for 2020-03-01
OKEx options chain for 2020-03-01
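
Because the options_chain file contains a row per ticker update for every option, one way to view the chain at the end of a file is to keep the last row seen for each symbol. The sketch below assumes rows are already parsed into dicts; `latest_chain` and `expiration_datetime` are hypothetical helpers and the rows and symbol names are made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def latest_chain(rows):
    """Return the most recent row per option symbol (rows in file order)."""
    latest = {}
    for row in rows:
        latest[row["symbol"]] = row
    return latest


def expiration_datetime(row):
    """Convert the expiration column (microseconds since epoch) to UTC."""
    return EPOCH + timedelta(microseconds=int(row["expiration"]))


example_rows = [
    {"symbol": "OPT-A", "type": "call", "expiration": "1585699200000000", "mark_iv": "60"},
    {"symbol": "OPT-B", "type": "put", "expiration": "1585699200000000", "mark_iv": ""},
    {"symbol": "OPT-A", "type": "call", "expiration": "1585699200000000", "mark_iv": "61"},
]

chain = latest_chain(example_rows)
print(chain["OPT-A"]["mark_iv"], expiration_datetime(chain["OPT-A"]))
```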

• quotes

Top of the book (best bid/ask) data reconstructed from exchanges' real-time WebSocket order book L2 data feeds: the best bid/ask is recorded every time the top of the book changes. We deliberately chose this approach over exchanges' native real-time quotes feeds, as those vary a lot between exchanges, can be throttled, are sometimes absent altogether, and are often delayed and published in batches compared with the more granular L2 updates that are the basis for our quotes dataset.

column name

description

exchange

exchange id, one of https://api.tardis.dev/v1/exchanges ([].id field)

symbol

instrument symbol as provided by exchange (always uppercase)

timestamp

timestamp provided by exchange in microseconds since epoch - if exchange does not provide one local_timestamp value is used as a fallback

local_timestamp

message arrival timestamp in microseconds since epoch

ask_amount

best ask amount as provided by exchange, empty if there aren't any asks

ask_price

best ask price as provided by exchange, empty if there aren't any asks

bid_price

best bid price as provided by exchange, empty if there aren't any bids

bid_amount

best bid amount as provided by exchange, empty if there aren't any bids

Huobi DM Swap BTC-USD quotes for 2020-05-01
Deribit OPTIONS instruments quotes for 2020-05-01
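
A short sketch of deriving mid price and spread from a quotes row, taking into account that bid or ask cells are empty when that side of the book has no orders. `mid_and_spread` is a hypothetical helper and the rows are made up.

```python
from decimal import Decimal


def mid_and_spread(row):
    """Return (mid price, spread), or None if either side is empty."""
    if not row["bid_price"] or not row["ask_price"]:
        return None
    bid = Decimal(row["bid_price"])
    ask = Decimal(row["ask_price"])
    return (bid + ask) / 2, ask - bid


two_sided = {"bid_price": "100.0", "bid_amount": "3", "ask_price": "100.5", "ask_amount": "1"}
one_sided = {"bid_price": "100.0", "bid_amount": "3", "ask_price": "", "ask_amount": ""}

print(mid_and_spread(two_sided))
print(mid_and_spread(one_sided))
```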

book_ticker

• derivative_ticker

Derivative instrument ticker info (open interest, funding, mark price, index price) collected from exchanges' real-time WebSocket instruments & tickers data feeds. A row is added to the dataset every time any of the tracked values changes.

column name

description

exchange

exchange id, one of https://api.tardis.dev/v1/exchanges ([].id field)

symbol

instrument symbol as provided by exchange (always uppercase)

timestamp

timestamp provided by exchange in microseconds since epoch - if exchange does not provide one local_timestamp value is used as a fallback

local_timestamp

message arrival timestamp in microseconds since epoch

funding_timestamp

timestamp of the next funding event in microseconds since epoch, empty if exchange does not provide one

funding_rate

funding rate that will take effect on the next funding event at funding timestamp, for some exchanges it's fixed, for others it fluctuates, empty if exchange does not provide one

predicted_funding_rate

predicted funding rate for the funding event following the next (closest) one, empty if exchange does not provide one

open_interest

current open interest, empty if exchange does not provide one

last_price

last instrument price, empty if exchange does not provide one

index_price

index price of the instrument, empty if exchange does not provide one

mark_price

mark price of the instrument, empty if exchange does not provide one

BitMEX ETHUSD derivative ticker for 2019-04-01
FTX PERPETUALS instruments derivative ticker for 2019-04-01
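
Since both timestamp and funding_timestamp are expressed in microseconds since epoch, the time remaining until the next funding event is a simple subtraction. The sketch below is illustrative only; `seconds_to_funding` is a hypothetical helper and the row values are made up.

```python
def seconds_to_funding(row):
    """Return seconds from the ticker timestamp to the next funding event.

    Returns None when the exchange does not provide funding_timestamp.
    """
    if not row["funding_timestamp"]:
        return None
    delta_micros = int(row["funding_timestamp"]) - int(row["timestamp"])
    return delta_micros / 1_000_000


with_funding = {"timestamp": "1554076800000000", "funding_timestamp": "1554091200000000"}
without_funding = {"timestamp": "1554076800000000", "funding_timestamp": ""}

print(seconds_to_funding(with_funding))
print(seconds_to_funding(without_funding))
```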

• liquidations

Liquidations data collected from exchanges' real-time WebSocket data feeds where available.

See details which exchanges support it and since when.

column name

description

exchange

exchange id, one of https://api.tardis.dev/v1/exchanges ([].id field)

symbol

instrument symbol as provided by exchange (always uppercase)

timestamp

timestamp provided by exchange in microseconds since epoch - if exchange does not provide one local_timestamp value is used as a fallback

local_timestamp

message arrival timestamp in microseconds since epoch

id

liquidation id as provided by exchange, empty if exchange does not provide one - different exchanges provide ids as numeric values, GUIDs or other strings, and some do not provide that information at all

side

liquidation side:

  • buy - short position was liquidated

  • sell - long position was liquidated

price

liquidation price as provided by exchange

amount

liquidation amount as provided by exchange

FTX perpetual futures liquidations for 2021-09-01

BitMEX XBTUSD liquidations for 2021-09-01

Grouped symbols

In addition to the standard currency pairs and instrument symbols that can be requested via the CSV datasets API, each exchange has special grouped symbols available, depending on which market types it supports: SPOT, FUTURES, OPTIONS and PERPETUALS. When such a symbol is requested, the downloaded file contains the data for all instruments belonging to the given market type. This is especially useful for options, where specifying each option symbol one by one is tedious: using 'OPTIONS' as a symbol gives data for all options available at a given time.

These special symbols are also listed in the response to the /exchanges/:exchange API call.
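
A file downloaded for a grouped symbol mixes rows of many instruments, distinguished by the symbol column. A minimal sketch of splitting such rows back into per-instrument lists follows; `split_by_symbol` is a hypothetical helper and the rows and instrument names are made up.

```python
from collections import defaultdict


def split_by_symbol(rows):
    """Return a dict mapping each instrument symbol to its rows, in file order."""
    by_symbol = defaultdict(list)
    for row in rows:
        by_symbol[row["symbol"]].append(row)
    return dict(by_symbol)


# Made-up rows as they might appear in a file requested with a grouped symbol.
example_rows = [
    {"symbol": "INSTRUMENT-A", "price": "10"},
    {"symbol": "INSTRUMENT-B", "price": "20"},
    {"symbol": "INSTRUMENT-A", "price": "11"},
]

per_symbol = split_by_symbol(example_rows)
print({symbol: len(rows) for symbol, rows in per_symbol.items()})
```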

Datasets API details

  • all downloadable datasets are gzip compressed

  • historical market data is available in daily intervals (separate file for each day) based on local timestamp (timestamp of message arrival) split by exchange, data type and symbol

  • data for a given day is available on the next day around 6h after 00:00 UTC - exact date until when data is available can be requested via /exchanges/:exchange API call (datasets.exportedUntil), e.g., https://api.tardis.dev/v1/exchanges/ftx

  • datasets are ordered and split into separate daily files by local_timestamp (timestamp of message arrival time)

  • an empty gzip compressed file is returned when no data is available for a given day, symbol and data type, e.g., exchange downtime, very low volume currency pairs etc.

  • if timestamp equals local_timestamp, it means that the exchange didn't provide a timestamp for the message, e.g., BitMEX order book updates

  • cell in CSV file is empty if there's no value for it, e.g., no trade id if a given exchange doesn't provide one

  • datasets are sourced from the Tardis.dev HTTP API, which in turn provides data sourced from exchanges' real-time WebSocket market data feeds (in contrast to REST API endpoints)
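
Two of the points above lend themselves to small defensive checks when processing files. The sketch below is an illustration under stated assumptions: it treats a file as empty when it decompresses to no bytes (a file containing only a header line would need a different check), and the helper names `is_empty_dataset` and `has_exchange_timestamp` are hypothetical.

```python
import gzip


def is_empty_dataset(gz_bytes):
    """Return True if the gzip compressed payload holds no data at all."""
    return gzip.decompress(gz_bytes).strip() == b""


def has_exchange_timestamp(row):
    """Return False when timestamp fell back to local_timestamp."""
    return row["timestamp"] != row["local_timestamp"]


empty_file = gzip.compress(b"")
print(is_empty_dataset(empty_file))
print(has_exchange_timestamp({"timestamp": "1585699200012345", "local_timestamp": "1585699200012345"}))
```

Equal values are only a strong hint of a missing exchange timestamp: an exchange timestamp could in principle coincide with the arrival time.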

Download via client libraries

Historical datasets for the first day of each month are available to download without API key.

# pip install tardis-dev
# requires Python >=3.6
from tardis_dev import datasets, get_exchange_details
import logging

# comment out to disable debug logs
logging.basicConfig(level=logging.DEBUG)

# function used by default if not provided via options
def default_file_name(exchange, data_type, date, symbol, format):
    return f"{exchange}_{data_type}_{date.strftime('%Y-%m-%d')}_{symbol}.{format}.gz"


# customized get filename function - saves data in nested directory structure
def file_name_nested(exchange, data_type, date, symbol, format):
    return f"{exchange}/{data_type}/{date.strftime('%Y-%m-%d')}_{symbol}.{format}.gz"


# returns data available at https://api.tardis.dev/v1/exchanges/deribit
deribit_details = get_exchange_details("deribit")
# print(deribit_details)

datasets.download(
    # one of https://api.tardis.dev/v1/exchanges with supportsDatasets:true - use 'id' value
    exchange="deribit",
    # accepted data types - 'datasets.symbols[].dataTypes' field in https://api.tardis.dev/v1/exchanges/deribit,
    # or get those values from the 'deribit_details["datasets"]["symbols"][]["dataTypes"]' dict above
    data_types=["incremental_book_L2", "trades", "quotes", "derivative_ticker", "book_snapshot_25", "book_snapshot_5", "liquidations"],
    # change date ranges as needed to fetch full month or year for example
    from_date="2019-11-01",
    # to date is non inclusive
    to_date="2019-11-02",
    # accepted values: 'datasets.symbols[].id' field in https://api.tardis.dev/v1/exchanges/deribit
    symbols=["BTC-PERPETUAL", "ETH-PERPETUAL",],
    # (optional) your API key to get access to non sample data as well
    api_key="YOUR API KEY",
    # (optional) path where data will be downloaded into, default dir is './datasets'
    # download_dir="./datasets",
    # (optional) - one can customize downloaded file name/path (flat dir structure, or nested etc) - by default function 'default_file_name' is used
    # get_filename=default_file_name,
    # (optional) file_name_nested will download data to nested directory structure (split by exchange and data type)
    # get_filename=file_name_nested,
)

If you run into a RuntimeError: This event loop is already running error, try the solution from https://github.com/ipython/ipython/issues/11338#issuecomment-646539516 (adding nest_asyncio).

Datasets API reference

GET https://datasets.tardis.dev/v1/:exchange/:dataType/:year/:month/:day/:symbol.csv.gz

Returns gzip compressed CSV dataset for given exchange, data type, date (year, month, day) and symbol.

Path Parameters

Name

Type

Description

exchange

string

one of https://api.tardis.dev/v1/exchanges (field id, only exchanges with "supportsDatasets":true)

dataType

string

one of datasets.symbols[].dataTypes values from https://api.tardis.dev/v1/exchanges/:exchange API response

year

string

year in format YYYY (four-digit year)

month

string

month in format MM (two-digit month of the year)

day

string

day in format DD (two-digit day of the month)

symbol

string

one of datasets.symbols[].id values from https://api.tardis.dev/v1/exchanges/:exchange API response, see details below

Headers

Name

Type

Description

Authorization

string

For authenticated requests provide Authorization header with value: 'Bearer YOUR_API_KEY'. Without API key historical datasets for the first day of each month are available to download.
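
The path parameters and the Authorization header described above can be assembled with the standard library alone, as sketched below. The helper names `dataset_url` and `dataset_request` are hypothetical, and the sketch does not URL-encode the symbol, which may be needed for symbols containing special characters.

```python
from datetime import date
from urllib.request import Request

BASE_URL = "https://datasets.tardis.dev/v1"


def dataset_url(exchange, data_type, day, symbol):
    """Build the dataset URL for one exchange, data type, day and symbol."""
    # %Y/%m/%d yields the four-digit year and two-digit month and day.
    return f"{BASE_URL}/{exchange}/{data_type}/{day:%Y/%m/%d}/{symbol}.csv.gz"


def dataset_request(exchange, data_type, day, symbol, api_key=None):
    """Build a GET request, adding the Authorization header if a key is given."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return Request(dataset_url(exchange, data_type, day, symbol), headers=headers)


request = dataset_request("deribit", "trades", date(2019, 11, 1), "BTC-PERPETUAL")
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the download; the response body is the gzip compressed CSV file.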

Sample requests