# API Reference

## Datasets API details

The Datasets API serves gzip-compressed CSV files, one file per day, split by exchange, data type and symbol.

* all downloadable datasets are gzip compressed
* historical market data is split into daily files (a separate file for each day) by exchange, [data type](https://docs.tardis.dev/downloadable-csv-files/data-types) and symbol; rows are ordered and assigned to a day by [`local_timestamp`](https://docs.tardis.dev/faq/data#how-are-market-data-messages-timestamped) (timestamp of message arrival)
* data for a given day becomes available on the next day, around 6h after 00:00 UTC; the exact date up to which data has been exported can be requested via the [/exchanges/:exchange](https://docs.tardis.dev/api/http-api-reference#exchanges-exchange) API call (`datasets.exportedUntil`), e.g., <https://api.tardis.dev/v1/exchanges/deribit>
* an empty gzip file is returned when no data is available for a given day, symbol and data type, for example during exchange downtime or for very low-volume currency pairs
* if `timestamp` equals `local_timestamp`, the exchange did not provide a timestamp for the message, e.g., BitMEX order book updates
* a cell in the CSV file is empty if there is no value for it, e.g., no trade id if the exchange doesn't provide one
* datasets are sourced from Tardis.dev [HTTP API](https://docs.tardis.dev/api/http-api-reference), which in turn provides data sourced from exchanges' [real-time WebSocket market data feeds](https://docs.tardis.dev/faq/data#why-data-source-matters-websocket-feeds-vs-rest-endpoints) (in contrast to REST API endpoints)
* disconnect events are not included in CSV datasets; they are available via the [raw HTTP API](https://docs.tardis.dev/api/http-api-reference) (as empty lines) and via [normalized replay](https://docs.tardis.dev/node-client/replaying-historical-data#replaynormalized-options-...normalizers) with `withDisconnectMessages` enabled
* dataset download responses for Pro and Business subscriptions (premium network) include an `x-md5` header, but for large files uploaded in chunks this value may not match a full-file MD5 checksum; use file size and successful gzip decompression as the primary integrity checks
* export time varies from day to day, so do not rely on wall-clock time alone: to check the exact export status for an exchange, poll the `datasets.exportedUntil` field of the [`/exchanges/:exchange`](https://docs.tardis.dev/api/http-api-reference#exchanges-exchange) API endpoint
* See ["Data FAQ"](https://docs.tardis.dev/faq/data) regarding [potential order book overlaps](https://docs.tardis.dev/faq/order-books#can-reconstructed-order-books-have-bid-ask-overlap), [non-monotonically increasing exchange timestamps](https://docs.tardis.dev/faq/data#can-timestamps-be-non-monotonic-within-a-channel), [duplicated trade data](https://docs.tardis.dev/faq/data#are-exchanges-publishing-duplicated-trades-data-messages), and more
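The availability check described above can be sketched as a small helper. This is a minimal illustration, not part of the client library: the function names are hypothetical, and it assumes that `datasets.exportedUntil` is an ISO 8601 UTC timestamp string and that it marks an exclusive upper bound (a day counts as exported once the whole UTC day lies before it).

```python
from datetime import date, datetime, timedelta, timezone


def parse_exported_until(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2020-05-02T00:00:00.000Z' (assumed format)."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_day_exported(exported_until: str, day: date) -> bool:
    """Return True when the whole UTC day lies before `exported_until`.

    Assumption: `exported_until` is an exclusive upper bound.
    """
    end_of_day = datetime(day.year, day.month, day.day, tzinfo=timezone.utc) + timedelta(days=1)
    return end_of_day <= parse_exported_until(exported_until)
```

With the Python client, the value would typically come from `get_exchange_details("deribit")["datasets"]["exportedUntil"]`, for example `is_day_exported(exported_until, date(2019, 11, 1))`.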

## Download via client libraries

{% hint style="info" %}
Historical datasets for the first day of each month are available to download without an API key.
{% endhint %}

{% tabs %}
{% tab title="Python" %}

```python
# pip install tardis-dev
# requires Python >=3.9
from tardis_dev import download_datasets, get_exchange_details
import logging
import re

# comment out to disable debug logs
logging.basicConfig(level=logging.DEBUG)

# function used by default if not provided via options
def default_file_name(exchange, data_type, date, symbol, format):
    sanitized_symbol = re.sub(r'[:\\/?*<>|"]', "-", symbol)
    return f"{exchange}_{data_type}_{date.strftime('%Y-%m-%d')}_{sanitized_symbol}.{format}.gz"


# customized get filename function - saves data in nested directory structure
def file_name_nested(exchange, data_type, date, symbol, format):
    sanitized_symbol = re.sub(r'[:\\/?*<>|"]', "-", symbol)
    return f"{exchange}/{data_type}/{date.strftime('%Y-%m-%d')}_{sanitized_symbol}.{format}.gz"


# returns data available at https://api.tardis.dev/v1/exchanges/deribit
deribit_details = get_exchange_details("deribit")
# print(deribit_details)

download_datasets(
    # one of https://api.tardis.dev/v1/exchanges with supportsDatasets:true - use 'id' value
    exchange="deribit",
    # accepted data types - 'datasets.symbols[].dataTypes' field in https://api.tardis.dev/v1/exchanges/deribit,
    # or get those values from 'deribit_details["datasets"]["symbols"][i]["dataTypes"]' above
    data_types=["incremental_book_L2", "trades", "quotes", "derivative_ticker", "book_snapshot_25", "book_snapshot_5", "liquidations"],
    # change date ranges as needed to fetch full month or year for example
    from_date="2019-11-01",
    # to date is non inclusive
    to_date="2019-11-02",
    # accepted values: 'datasets.symbols[].id' field in https://api.tardis.dev/v1/exchanges/deribit
    symbols=["BTC-PERPETUAL", "ETH-PERPETUAL"],
    # (optional) your API key to get access to non sample data as well
    api_key="YOUR_API_KEY",
    # (optional) path where data will be downloaded into, default dir is './datasets'
    # download_dir="./datasets",
    # (optional) - one can customize downloaded file name/path (flat dir structure, or nested etc) - by default function 'default_file_name' is used
    # get_filename=default_file_name,
    # (optional) file_name_nested will download data to nested directory structure (split by exchange and data type)
    # get_filename=file_name_nested,
)
```

{% hint style="info" %}
If you're running into `RuntimeError: download_datasets() cannot be called from a running event loop`, use `download_datasets_async()` instead.
{% endhint %}
{% endtab %}

{% tab title="Node.js" %}

```javascript
// npm install tardis-dev
// requires Node.js v24+

// remove it to disable debug logs
process.env.DEBUG = 'tardis-dev*'

import { downloadDatasets, getExchangeDetails } from 'tardis-dev'

// returns data available at https://api.tardis.dev/v1/exchanges/deribit
const deribitDetails = await getExchangeDetails('deribit')

// console.log(deribitDetails.datasets)

await downloadDatasets({
  exchange: 'deribit', // one of https://api.tardis.dev/v1/exchanges with supportsDatasets:true - use 'id' value
  dataTypes: ['incremental_book_L2', 'trades', 'quotes', 'derivative_ticker', 'book_snapshot_25', 'book_snapshot_5', 'liquidations'], // accepted data types - 'datasets.symbols[].dataTypes' field in https://api.tardis.dev/v1/exchanges/deribit, or get those values from 'deribitDetails.datasets.symbols[].dataTypes' object above
  from: '2019-11-01', // change date ranges as needed to fetch full month or year for example
  to: '2019-11-02', // to date is non inclusive
  symbols: ['BTC-PERPETUAL', 'ETH-PERPETUAL'], // accepted values: 'datasets.symbols[].id' field in https://api.tardis.dev/v1/exchanges/deribit, or `deribitDetails.datasets.symbols[].id` from object above

  apiKey: 'YOUR_API_KEY', // (optional) your API key to get access to non sample data as well
  // downloadDir:'./datasets', // (optional) path where data will be downloaded into, default dir is './datasets'

  // getFilename: getFilenameDefault, // (optional) - one can customize downloaded file name/path (flat dir structure, or nested etc) - by default function 'getFilenameDefault' is used
  // getFilename: getFilenameCustom // (optional) getFilenameCustom will download data to nested directory structure (split by exchange and data type)
})

// replaces characters that are unsafe in file names with '-'
function sanitizeForFilename(value) {
  return value.replace(/[:\\/?*<>|"]/g, '-')
}

// function used by default if not provided via options
function getFilenameDefault({ exchange, dataType, format, date, symbol }) {
  return `${exchange}_${dataType}_${date.toISOString().split('T')[0]}_${sanitizeForFilename(symbol)}.${format}.gz`
}

// customized get filename function - saves data in nested directory structure
function getFilenameCustom({ exchange, dataType, format, date, symbol }) {
  return `${exchange}/${dataType}/${date.toISOString().split('T')[0]}_${sanitizeForFilename(symbol)}.${format}.gz`
}
```

{% endtab %}
{% endtabs %}
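Once a file has been downloaded, it can be read with the Python standard library alone. The sketch below is illustrative only (the helper names are hypothetical): it treats an empty gzip file as "no data for that day", lets decompression errors surface as an integrity signal, and uses the `timestamp` and `local_timestamp` columns described in the details section. Empty cells come back as empty strings.

```python
import csv
import gzip
import io


def read_dataset_rows(raw: bytes) -> list[dict[str, str]]:
    """Decompress a downloaded .csv.gz payload and return its rows as dicts.

    Returns an empty list for an empty payload or an empty gzip file.
    A corrupt or truncated payload raises an exception from gzip.decompress
    (for example gzip.BadGzipFile or EOFError), which callers can treat as a
    failed integrity check.
    """
    if not raw:
        return []
    text = gzip.decompress(raw).decode("utf-8")
    if not text.strip():
        return []
    return list(csv.DictReader(io.StringIO(text)))


def has_exchange_timestamp(row: dict[str, str]) -> bool:
    """False when `timestamp` equals `local_timestamp`, i.e. the exchange sent none."""
    return row["timestamp"] != row["local_timestamp"]
```

For a file on disk, something like `read_dataset_rows(pathlib.Path(path).read_bytes())` would work. For large order book files, streaming with `gzip.open(path, "rt")` is likely a better fit than loading everything into memory.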

## Datasets API reference

<mark style="color:blue;">`GET`</mark> `https://datasets.tardis.dev/v1/:exchange/:dataType/:year/:month/:day/:symbol.csv.gz`

Returns a gzip-compressed CSV dataset for the given exchange, data type, date (year, month, day) and symbol.

#### Path Parameters

| Name     | Type   | Description                                                                                                                |
| -------- | ------ | -------------------------------------------------------------------------------------------------------------------------- |
| exchange | string | one of <https://api.tardis.dev/v1/exchanges> (field `id`, only exchanges with "supportsDatasets":true)                     |
| dataType | string | one of `datasets.symbols[].dataTypes` values from <https://api.tardis.dev/v1/exchanges/:exchange> API response             |
| year     | string | year in format `YYYY` (four-digit year)                                                                                    |
| month    | string | month in format `MM` (two-digit month of the year)                                                                         |
| day      | string | day in format `DD` (two-digit day of the month)                                                                            |
| symbol   | string | one of `datasets.symbols[].id` values from <https://api.tardis.dev/v1/exchanges/:exchange> API response, see details below |
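As a rough sketch of how the path parameters above fit together, the helper below assembles a download URL with a four-digit year and zero-padded month and day. The function and constant names are hypothetical, and the symbol is assumed to already be in its Datasets API form.

```python
from datetime import date

DATASETS_BASE_URL = "https://datasets.tardis.dev/v1"


def dataset_url(exchange: str, data_type: str, day: date, symbol: str) -> str:
    """Build the download URL for one exchange / data type / day / symbol."""
    return (
        f"{DATASETS_BASE_URL}/{exchange}/{data_type}/"
        f"{day.year:04d}/{day.month:02d}/{day.day:02d}/{symbol}.csv.gz"
    )
```

For example, `dataset_url("deribit", "trades", date(2019, 11, 1), "BTC-PERPETUAL")` yields the first URL listed under the sample requests.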

#### Headers

| Name          | Type   | Description                                                                                                                                                                                                        |
| ------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Authorization | string | <p>For authenticated requests, provide the Authorization header with the value '<code>Bearer YOUR\_API\_KEY</code>'.<br>Without an API key, only historical datasets for the first day of each month are available to download.</p> |
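A minimal sketch of attaching the header with the Python standard library follows. The helper name is hypothetical. The request is only constructed here, not sent, and the header is omitted when no key is supplied so that the free first-of-month files can still be requested.

```python
from typing import Optional
from urllib.request import Request


def build_dataset_request(url: str, api_key: Optional[str] = None) -> Request:
    """Create a GET request for a dataset URL, adding the Bearer token if given."""
    request = Request(url, method="GET")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    return request
```

The resulting object can be passed to `urllib.request.urlopen(...)`, with the response body written to a `.csv.gz` file.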

{% tabs %}
{% tab title="200 gzip compressed CSV dataset" %}

```
```

{% endtab %}
{% endtabs %}

* unlike in the [HTTP API](https://docs.tardis.dev/api/http-api-reference), the symbol passed to the Datasets API must always be uppercase and have '/' and ':' characters replaced with '-' so that it is URL-safe
* the list of allowed symbols for each exchange can be requested via the [/exchanges/:exchange](https://docs.tardis.dev/api/http-api-reference#exchanges-exchange) API call, e.g., <https://api.tardis.dev/v1/exchanges/deribit> - `datasets.symbols[].id` field
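The symbol rule above can be expressed as a one-line transformation. This is only a sketch with a hypothetical function name; the `datasets.symbols[].id` values returned by the API remain the authoritative list, and the input symbols in the comments are made-up examples.

```python
def to_dataset_symbol(symbol: str) -> str:
    """Uppercase a symbol and replace '/' and ':' with '-' so it is URL-safe."""
    return symbol.upper().replace("/", "-").replace(":", "-")


# Illustrative inputs only, not taken from any exchange's symbol list:
# to_dataset_symbol("btc/usd")       -> "BTC-USD"
# to_dataset_symbol("BTC-PERPETUAL") -> "BTC-PERPETUAL"
```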

#### Sample requests

{% embed url="https://datasets.tardis.dev/v1/deribit/trades/2019/11/01/BTC-PERPETUAL.csv.gz" %}

{% embed url="https://datasets.tardis.dev/v1/deribit/trades/2019/11/01/OPTIONS.csv.gz" %}

{% embed url="https://datasets.tardis.dev/v1/bitmex/incremental_book_L2/2020/04/01/XBTUSD.csv.gz" %}

{% embed url="https://datasets.tardis.dev/v1/deribit/options_chain/2019/08/01/OPTIONS.csv.gz" %}

{% embed url="https://datasets.tardis.dev/v1/deribit/book_snapshot_25/2020/08/01/BTC-PERPETUAL.csv.gz" %}

{% embed url="https://datasets.tardis.dev/v1/deribit/quotes/2019/08/01/OPTIONS.csv.gz" %}
