This page documents Tardis Machine historical replay APIs in both exchange-native and normalized formats. Real-time streaming is documented on Streaming Real-Time Data, and normalized response schemas are documented on Output Data Types.
Exchange-native market data APIs
Exchange-native market data API endpoints provide historical data in exchange-native format. The main difference between HTTP and WebSocket endpoints is how data is requested:
• HTTP API accepts replay options as a URL-encoded JSON object in the options query string param
• WebSocket API accepts exchanges' specific 'subscribe' messages that define what data will then be "replayed" and sent to the WebSocket client
HTTP GET /replay?options={options}
Returns historical market data messages in exchange-native format for the given replay options query string param. A single streaming HTTP response returns data for the whole requested time period as NDJSON.
In our preliminary benchmarks on AMD Ryzen 7 3700X, 64GB RAM, the HTTP /replay API endpoint was returning ~700 000 messages/s (with data already cached locally).
Each replay request can trigger many parallel HTTP requests to the upstream Tardis API (one per minute of data per channel/symbol combination). Running multiple concurrent replay processes can quickly exceed API rate limits, causing stalls without visible error messages.
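To get a feel for how quickly the upstream request count grows, the arithmetic above can be sketched as follows. This is only a rough estimate under stated assumptions: the helper name `estimate_upstream_requests` is hypothetical, every filter is assumed to list its symbols explicitly, and `from`/`to` are treated as a half-open UTC range.

```python
from datetime import datetime


def estimate_upstream_requests(options):
    """Rough count: one request per minute per channel/symbol combination."""
    start = datetime.fromisoformat(options["from"])
    end = datetime.fromisoformat(options["to"])
    minutes = int((end - start).total_seconds() // 60)
    # each (channel, symbol) pair counts as one combination
    combinations = sum(len(f["symbols"]) for f in options["filters"])
    return minutes * combinations


options = {
    "exchange": "bitmex",
    "from": "2019-10-01",
    "to": "2019-10-02",
    "filters": [
        {"channel": "trade", "symbols": ["XBTUSD", "ETHUSD"]},
        {"channel": "orderBookL2", "symbols": ["XBTUSD", "ETHUSD"]},
    ],
}

# 1440 minutes * 4 channel/symbol combinations
print(estimate_upstream_requests(options))
```

Multiplying this figure by the number of replay processes you plan to run at once gives a quick sanity check before starting them.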
import asyncio
import aiohttp
import json
import urllib.parse


async def replay_via_tardis_machine(replay_options):
    timeout = aiohttp.ClientTimeout(total=0)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # url encode as json object options
        encoded_options = urllib.parse.quote_plus(json.dumps(replay_options))
        # assumes tardis-machine HTTP API running on localhost:8000
        url = f"http://localhost:8000/replay?options={encoded_options}"
        async with session.get(url) as response:
            # otherwise we may get line too long errors
            response.content._high_water = 100_000_000
            # returned data is in NDJSON format http://ndjson.org/
            # each line is separate message JSON encoded
            async for line in response.content:
                yield line


async def run():
    lines = replay_via_tardis_machine(
        {
            "exchange": "bitmex",
            "from": "2019-10-01",
            "to": "2019-10-02",
            "filters": [
                {"channel": "trade", "symbols": ["XBTUSD", "ETHUSD"]},
                {"channel": "orderBookL2", "symbols": ["XBTUSD", "ETHUSD"]},
            ],
        }
    )
    async for line in lines:
        message = json.loads(line)
        # localTimestamp string marks timestamp when message was received
        # message is a message dict as provided by exchange real-time stream
        print(message["localTimestamp"], message["message"])


asyncio.run(run())
Click to see API response in the browser as long as tardis-machine is running on localhost:8000
We're working on providing more samples and dedicated client libraries in different languages. In the meantime, to consume HTTP /replay API responses in your language of choice, you should:
• Provide a URL-encoded JSON options object via the options query string param when sending the HTTP request
• Parse the HTTP response stream line by line as it's returned; buffering the whole response in memory may result in slow performance and memory overflows
| name | type | default | description |
| --- | --- | --- | --- |
| from | string | - | replay period start date (UTC) in ISO 8601 format, e.g., 2019-04-01 |
| to | string | - | replay period end date (UTC) in ISO 8601 format, e.g., 2019-04-02 |
| withDisconnects | boolean (optional) | undefined | when set to true, the response includes empty lines (\n) that mark events when the real-time WebSocket connection used to collect the historical data got disconnected |
| waitWhenDataNotYetAvailable | boolean or number (optional) | undefined | when set to true, waits for data that is not yet available, which is useful when replaying near real-time data. Defaults to a 30-minute offset. When set to a number, specifies the offset in minutes (minimum effective value is 6 minutes) |
| autoCleanup | boolean (optional) | undefined | when set to true, automatically removes cached data from disk after it has been processed. Not safe for concurrent replay (see the autoCleanup concurrency warning) |
When symbols array is empty or omitted in filters, data for all active symbols is returned.
Disconnect markers: An empty line in raw replay (or a disconnect message in normalized replay) indicates that the WebSocket connection used during data collection was interrupted. After a reconnect, exchanges typically re-send initial snapshots, so duplicate snapshot messages are expected following a disconnect marker. Disconnect markers apply to the entire connection, not individual channels.
autoCleanup concurrency: avoid using autoCleanup with concurrent replay — the cache path is keyed by exchange, a hash of the filters, and the calendar day, so any concurrent jobs sharing those three components will conflict (even with non-overlapping time windows within the same day) and cause file-not-found errors. Clean the cache manually after all jobs complete instead.
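Before launching several jobs with autoCleanup, it can help to check whether any two of them would share a cache path. The sketch below is an approximation under stated assumptions: `cache_keys` and `jobs_conflict` are hypothetical helpers, the SHA-1 of the JSON-serialized filters merely stands in for whatever hash tardis-machine actually uses, and `from`/`to` are treated as a half-open UTC range.

```python
import hashlib
import json
from datetime import datetime, time, timedelta


def cache_keys(options):
    """Stand-in for the cache layout: (exchange, filters hash, calendar day)."""
    filters_hash = hashlib.sha1(
        json.dumps(options.get("filters", []), sort_keys=True).encode()
    ).hexdigest()
    start = datetime.fromisoformat(options["from"])
    end = datetime.fromisoformat(options["to"])
    keys = set()
    day = start.date()
    # walk every calendar day touched by the half-open range [from, to)
    while datetime.combine(day, time.min) < end:
        keys.add((options["exchange"], filters_hash, day.isoformat()))
        day += timedelta(days=1)
    return keys


def jobs_conflict(a, b):
    """True when two replay jobs share at least one cache key."""
    return bool(cache_keys(a) & cache_keys(b))


filters = [{"channel": "trade", "symbols": ["XBTUSD"]}]
morning = {"exchange": "bitmex", "from": "2019-10-01T00:00", "to": "2019-10-01T12:00", "filters": filters}
afternoon = {"exchange": "bitmex", "from": "2019-10-01T12:00", "to": "2019-10-02", "filters": filters}
next_day = {"exchange": "bitmex", "from": "2019-10-02", "to": "2019-10-03", "filters": filters}

print(jobs_conflict(morning, afternoon))  # same day, non-overlapping windows
print(jobs_conflict(morning, next_day))   # different calendar days
```

The morning and afternoon jobs do not overlap in time, yet they fall on the same calendar day with the same exchange and filters, which is exactly the case the warning describes.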
Response format
The streamed HTTP response provides data in NDJSON format (newline-delimited JSON). Each response line is a JSON object with a market data message in exchange-native format plus a local timestamp:
• localTimestamp - date when the message was received, in ISO 8601 format
• message - JSON with exactly the same format as provided by the requested exchange's real-time feed
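A single response line can be handled along these lines. This is a minimal sketch: `parse_replay_line` is a hypothetical helper, the sample lines are made up for illustration, and an empty line is treated as a disconnect marker, which only appears when withDisconnects is set to true.

```python
import json


def parse_replay_line(line):
    """Return None for a disconnect marker, else (localTimestamp, message)."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    line = line.strip()
    if not line:
        # empty line: the collecting WebSocket connection was disconnected here
        return None
    envelope = json.loads(line)
    return envelope["localTimestamp"], envelope["message"]


# made-up sample lines, shaped like the NDJSON described above
sample = [
    b'{"localTimestamp":"2019-10-01T00:00:00.000Z","message":{"table":"trade"}}\n',
    b"\n",
]

for raw in sample:
    parsed = parse_replay_line(raw)
    if parsed is None:
        print("disconnect marker")
    else:
        local_timestamp, message = parsed
        print(local_timestamp, message)
```

The timestamp is kept as a string here so that no assumption is made about its sub-second precision.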
Exchanges' WebSocket APIs are designed to publish real-time market data feeds, not historical ones. The Tardis-machine WebSocket /ws-replay API fills that gap and allows "replaying" historical market data from any given past point in time with the same data format and 'subscribe' logic as the exchanges' real-time APIs. In many cases, existing exchange WebSocket clients can connect to this endpoint just by changing the URL and receive historical market data in exchange-native format for the date ranges specified in the URL query string params.
After the connection is established, the client has 2 seconds to send subscription payloads, and then the market data replay starts.
If two clients connect at the same time requesting data for different exchanges and provide the same session key via the query string param, then the data sent to those clients will be synchronized (by local timestamp).
In our preliminary benchmarks on AMD Ryzen 7 3700X, 64GB RAM, the WebSocket /ws-replay API endpoint was sending ~500 000 messages/s (with data already cached locally).
If you already use a WebSocket client that connects to and consumes a real-time exchange market data feed, in most cases you can use it to connect to the /ws-replay API as well, just by changing the URL endpoint, as shown in the example below that uses ccxws.
Query string params
| name | type | default | description |
| --- | --- | --- | --- |
| exchange | string | - | requested exchange id - use /exchanges HTTP API to get the list of valid exchange ids |
| from | string | - | replay period start date (UTC) in ISO 8601 format, e.g., 2019-04-01 |
| to | string | - | replay period end date (UTC) in ISO 8601 format, e.g., 2019-04-02 |
| session | string (optional) | undefined | optional replay session key. When specified and multiple clients use it when connecting at the same time, the data sent to those clients is synchronized (by local timestamp) |
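The query string params above can be assembled as sketched below. The helper `ws_replay_url` and the session key `backtest-1` are hypothetical, `deribit` is used only as an illustrative second exchange id, and the base URL assumes tardis-machine's WebSocket API is listening on localhost:8001 as in the ccxws example on this page.

```python
from urllib.parse import urlencode


def ws_replay_url(exchange, from_date, to_date, session=None,
                  base="ws://localhost:8001/ws-replay"):
    """Build a /ws-replay URL from the documented query string params."""
    params = {"exchange": exchange, "from": from_date, "to": to_date}
    if session is not None:
        # clients that share a session key get data synchronized by local timestamp
        params["session"] = session
    return f"{base}?{urlencode(params)}"


# two clients replaying different exchanges within one synchronized session
bitmex_url = ws_replay_url("bitmex", "2019-10-01", "2019-10-02", session="backtest-1")
deribit_url = ws_replay_url("deribit", "2019-10-01", "2019-10-02", session="backtest-1")
print(bitmex_url)
print(deribit_url)
```

Both clients need to connect at the same time for the shared session key to take effect.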
Normalized market data APIs
Normalized market data API endpoints provide data in unified format across all supported exchanges. Both HTTP /replay-normalized and WebSocket /ws-replay-normalized APIs accept the same replay options payload via query string param. It's mostly a matter of preference when choosing which protocol to use, but WebSocket /ws-replay-normalized API also has its real-time counterpart /ws-stream-normalized, which connects directly to exchanges' real-time WebSocket APIs. This opens the possibility of seamless switching between real-time streaming and historical normalized market data replay.
In our preliminary benchmarks on AMD Ryzen 7 3700X, 64GB RAM, HTTP /replay-normalized API endpoint was returning ~100 000 messages/s and ~50 000 messages/s when order book snapshots were also requested.
Click to see API response in the browser as long as tardis-machine is running on localhost:8000
We're working on providing more samples and dedicated client libraries in different languages. In the meantime, to consume HTTP /replay-normalized API responses in your language of choice, you should:
• Provide URL-encoded JSON options via the options query string param when sending the HTTP request
• Parse the HTTP response stream line by line as it's returned; buffering the whole response in memory may result in slow performance and memory overflows
Options JSON needs to be an object or an array of objects with fields as specified below. If an array is provided, then data requested for multiple exchanges is returned synchronized (by local timestamp).
| name | type | default | description |
| --- | --- | --- | --- |
| exchange | string | - | requested exchange id - use /exchanges HTTP API to get the list of valid exchange ids |
| symbols | string[] (optional) | undefined | optional symbols of requested historical data feed - use /exchanges/:exchange HTTP API to get allowed symbols for the requested exchange |
| from | string | - | replay period start date (UTC) in ISO 8601 format, e.g., 2019-04-01 |
| to | string | - | replay period end date (UTC) in ISO 8601 format, e.g., 2019-04-02 |
| dataTypes | string[] | - | array of normalized data types for which historical data will be returned |
| withDisconnectMessages | boolean (optional) | undefined | when set to true, the response includes disconnect messages that mark events when the real-time WebSocket connection used to collect the historical data got disconnected |
| waitWhenDataNotYetAvailable | boolean or number (optional) | undefined | when set to true, waits for data that is not yet available, which is useful when replaying near real-time data. Defaults to a 30-minute offset. When set to a number, specifies the offset in minutes (minimum effective value is 6 minutes) |
| autoCleanup | boolean (optional) | undefined | when set to true, automatically removes cached data from disk after it has been processed. Not safe for concurrent replay (see the autoCleanup concurrency warning) |
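Passing an array of options for a multi-exchange, synchronized replay could look roughly like this. The helper `replay_normalized_url` is hypothetical, and the `deribit` entry with its `BTC-PERPETUAL` symbol is only an illustrative second exchange; the base URL assumes the HTTP API is running on localhost:8000.

```python
import json
from urllib.parse import quote_plus


def replay_normalized_url(options, base="http://localhost:8000/replay-normalized"):
    """options may be a single dict or a list of dicts (one per exchange)."""
    return f"{base}?options={quote_plus(json.dumps(options))}"


options = [
    {
        "exchange": "bitmex",
        "symbols": ["XBTUSD"],
        "from": "2019-10-01",
        "to": "2019-10-02",
        "dataTypes": ["trade", "book_change"],
    },
    {
        "exchange": "deribit",  # illustrative second exchange
        "symbols": ["BTC-PERPETUAL"],
        "from": "2019-10-01",
        "to": "2019-10-02",
        "dataTypes": ["trade", "book_change"],
    },
]

url = replay_normalized_url(options)
print(url)
```

The same function also accepts a single options object, since only the JSON payload changes between the two forms.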
WebSocket /ws-stream-normalized is the real-time counterpart of this API endpoint. It provides real-time market data in the same format but does not require an API key, because it connects directly to the exchanges' real-time WebSocket APIs.
We're working on providing more samples and dedicated client libraries in different languages. In the meantime, to consume WebSocket /ws-replay-normalized API responses in your language of choice, you should:
• Provide URL-encoded JSON options via the options query string param when connecting to the WebSocket /ws-replay-normalized endpoint
Options JSON needs to be an object or an array of objects with fields as specified below. If an array is provided, then data requested for multiple exchanges is sent synchronized (by local timestamp).
| name | type | default | description |
| --- | --- | --- | --- |
| exchange | string | - | requested exchange id - use /exchanges HTTP API to get the list of valid exchange ids |
| symbols | string[] (optional) | undefined | optional symbols of requested historical data feed - use /exchanges/:exchange HTTP API to get allowed symbols for the requested exchange |
| from | string | - | replay period start date (UTC) in ISO 8601 format, e.g., 2019-04-01 |
| to | string | - | replay period end date (UTC) in ISO 8601 format, e.g., 2019-04-02 |
| dataTypes | string[] | - | array of normalized data types for which historical data will be provided |
| withDisconnectMessages | boolean (optional) | undefined | when set to true, also sends disconnect messages that mark events when the real-time WebSocket connection used to collect the historical data got disconnected |
| waitWhenDataNotYetAvailable | boolean or number (optional) | undefined | when set to true, waits for data that is not yet available, which is useful when replaying near real-time data. Defaults to a 30-minute offset. When set to a number, specifies the offset in minutes (minimum effective value is 6 minutes) |
| autoCleanup | boolean (optional) | undefined | when set to true, automatically removes cached data from disk after it has been processed. Not safe for concurrent replay (see the autoCleanup concurrency warning) |
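The waitWhenDataNotYetAvailable rules can be read as the small function below. This is one interpretation of the description, not tardis-machine's actual code: the helper name is hypothetical, and "minimum effective value" is taken to mean that smaller numbers behave as if 6 had been passed.

```python
def effective_wait_offset_minutes(value):
    """Map a waitWhenDataNotYetAvailable value to the offset assumed to apply."""
    if value is None or value is False:
        return None  # option not enabled, no waiting
    if value is True:
        return 30  # documented default offset in minutes
    # numeric value: offset in minutes, assumed to be clamped to the 6-minute floor
    return max(value, 6)


for candidate in (None, True, 2, 15):
    print(candidate, effective_wait_offset_minutes(candidate))
```

The `True` check comes before the numeric branch because Python treats booleans as integers, so `max(True, 6)` would otherwise return 6 instead of the 30-minute default.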
In our preliminary benchmarks on AMD Ryzen 7 3700X, 64GB RAM, WebSocket /ws-replay-normalized API endpoint was returning ~70 000 messages/s and ~40 000 messages/s when order book snapshots were also requested.
// Node.js example: consuming the HTTP /replay API
// (CommonJS require, so this expects node-fetch v2)
const fetch = require('node-fetch')
const split2 = require('split2')

const serialize = options => {
  return encodeURIComponent(JSON.stringify(options))
}

async function* replayViaTardisMachine(options) {
  // assumes tardis-machine HTTP API running on localhost:8000
  const url = `http://localhost:8000/replay?options=${serialize(options)}`
  const response = await fetch(url)
  // returned data is in NDJSON format http://ndjson.org/
  // each line is separate message JSON encoded
  // split response body stream by new lines
  const lines = response.body.pipe(split2())
  for await (const line of lines) {
    yield line
  }
}

async function run() {
  const options = {
    exchange: 'bitmex',
    from: '2019-10-01',
    to: '2019-10-02',
    filters: [
      {
        channel: 'trade',
        symbols: ['XBTUSD', 'ETHUSD']
      },
      {
        channel: 'orderBookL2',
        symbols: ['XBTUSD', 'ETHUSD']
      }
    ]
  }
  const lines = replayViaTardisMachine(options)
  for await (const line of lines) {
    // localTimestamp string marks timestamp when message was received
    // message is a message object as provided by exchange real-time stream
    const { message, localTimestamp } = JSON.parse(line)
    console.log(message, localTimestamp)
  }
}

run()
// Node.js example: pointing an existing ccxws client at the WebSocket /ws-replay API
const ccxws = require('ccxws')

const BASE_URL = 'ws://localhost:8001/ws-replay'
const WS_REPLAY_URL = `${BASE_URL}?exchange=bitmex&from=2019-10-01&to=2019-10-02`

const bitMEXClient = new ccxws.bitmex()
// only change required for ccxws client is to point it to /ws-replay URL
bitMEXClient._wssPath = WS_REPLAY_URL

const market = {
  id: 'XBTUSD',
  base: 'BTC',
  quote: 'USD'
}

bitMEXClient.on('l2snapshot', snapshot =>
  console.log('snapshot', snapshot.asks.length, snapshot.bids.length)
)
bitMEXClient.on('l2update', update => console.log(update))
bitMEXClient.on('trade', trade => console.log(trade))

bitMEXClient.subscribeTrades(market)
bitMEXClient.subscribeLevel2Updates(market)
# Python example: consuming the HTTP /replay-normalized API
import asyncio
import aiohttp
import json
import urllib.parse


async def replay_normalized_via_tardis_machine(replay_options):
    timeout = aiohttp.ClientTimeout(total=0)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # url encode as json object options
        encoded_options = urllib.parse.quote_plus(json.dumps(replay_options))
        # assumes tardis-machine HTTP API running on localhost:8000
        url = f"http://localhost:8000/replay-normalized?options={encoded_options}"
        async with session.get(url) as response:
            # otherwise we may get line too long errors
            response.content._high_water = 100_000_000
            # returned data is streamed in NDJSON format http://ndjson.org/
            # each line is separate message JSON encoded
            async for line in response.content:
                yield line


async def run():
    lines = replay_normalized_via_tardis_machine(
        {
            "exchange": "bitmex",
            "from": "2019-10-01",
            "to": "2019-10-02",
            "symbols": ["XBTUSD", "ETHUSD"],
            "withDisconnectMessages": True,
            # other available data types examples:
            # 'book_snapshot_10_100ms', 'derivative_ticker', 'quote',
            # 'trade_bar_10ms', 'trade_bar_10s'
            "dataTypes": ["trade", "book_change", "book_snapshot_10_100ms"],
        }
    )
    async for line in lines:
        normalized_message = json.loads(line)
        print(normalized_message)


asyncio.run(run())
// Node.js example: consuming the HTTP /replay-normalized API
// (CommonJS require, so this expects node-fetch v2)
const fetch = require('node-fetch')
const split2 = require('split2')

const serialize = options => {
  return encodeURIComponent(JSON.stringify(options))
}

async function* replayNormalizedViaTardisMachine(options) {
  // assumes tardis-machine HTTP API running on localhost:8000
  const url = `http://localhost:8000/replay-normalized?options=${serialize(
    options
  )}`
  const response = await fetch(url)
  // returned data is in NDJSON format http://ndjson.org/
  // each line is separate message JSON encoded
  // split response body stream by new lines
  const lines = response.body.pipe(split2())
  for await (const line of lines) {
    yield line
  }
}

async function run() {
  const options = {
    exchange: 'bitmex',
    from: '2019-10-01',
    to: '2019-10-02',
    symbols: ['XBTUSD', 'ETHUSD'],
    withDisconnectMessages: true,
    // other available data types examples:
    // 'book_snapshot_10_100ms', 'derivative_ticker', 'quote',
    // 'trade_bar_10ms', 'trade_bar_10s'
    dataTypes: ['trade', 'book_change', 'book_snapshot_10_100ms']
  }
  const lines = replayNormalizedViaTardisMachine(options)
  for await (const line of lines) {
    const normalizedMessage = JSON.parse(line)
    console.log(normalizedMessage)
  }
}

run()