The historical data format is the same as that provided by the real-time Binance WebSocket API, with the addition of local timestamps. If you'd like to work with a normalized data format instead (the same format for each exchange), see the downloadable CSV files or the official client libs, which can perform data normalization client-side.
Tardis-machine is a locally runnable server that exposes an API for efficiently requesting historical market data for whole time periods, in contrast to the HTTP API, which provides data only in minute-by-minute slices.
The Binance depth channel has been recorded with the fastest update speed the API allowed at the time. This means that until 2019-08-30 it was depth (without the @time suffix), with book updates pushed every 1000ms, and after that date it was depth@100ms, with book updates pushed every 100ms (a new API feature).
depthSnapshot - generated channel with full order book snapshots
The Binance real-time WebSocket API does not provide initial order book snapshots. To overcome this, we fetch initial order book snapshots (top 1000 levels) from the REST API and store them together with the rest of the WebSocket messages. Such snapshot messages are marked with "stream":"<symbol>@depthSnapshot" and "generated":true fields.
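As a minimal sketch of how a consumer might tell generated snapshots apart from regular depth updates, the helper below checks only the two marker fields described above. The function name and the sample messages are hypothetical and serve purely as illustration; real messages carry a full "data" payload.

```python
import json


def is_generated_snapshot(message: dict) -> bool:
    """Return True if the message carries both depthSnapshot marker fields."""
    stream = message.get("stream", "")
    return stream.endswith("@depthSnapshot") and message.get("generated") is True


# Hypothetical sample messages; only "stream" and "generated" follow the text above.
raw_messages = [
    '{"stream": "btcusdt@depthSnapshot", "generated": true, "data": {}}',
    '{"stream": "btcusdt@depth@100ms", "data": {}}',
]

for raw in raw_messages:
    message = json.loads(raw)
    kind = "snapshot" if is_generated_snapshot(message) else "update"
    print(message["stream"], kind)
```

A replay loop could use such a check to reset its local order book whenever a snapshot arrives, then apply subsequent depth updates on top of it.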
During data collection, the integrity of incremental order book updates is validated using the sequence numbers provided by the real-time feed (U and u fields). If a missed message is detected, the WebSocket connection is restarted. We also validate that the initial book snapshot fetched from the REST API overlaps with the received depth messages.
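The checks described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the collector's actual implementation: it assumes that U is the first and u the last update ID covered by a depth message, that consecutive messages should satisfy U == previous u + 1, and that the REST snapshot exposes a last update ID (called last_update_id here). The function names are hypothetical.

```python
from typing import Iterable, Optional


def find_first_gap(updates: Iterable[dict]) -> Optional[int]:
    """Return the index of the first update that does not directly
    continue the previous one, or None if the sequence is contiguous."""
    previous_u = None
    for index, update in enumerate(updates):
        # Assumption: each update should start right after the previous one ended.
        if previous_u is not None and update["U"] != previous_u + 1:
            return index
        previous_u = update["u"]
    return None


def snapshot_overlaps(last_update_id: int, update: dict) -> bool:
    """Return True if the update's ID range covers the ID that
    immediately follows the snapshot's last update ID."""
    return update["U"] <= last_update_id + 1 <= update["u"]


contiguous = [{"U": 1, "u": 3}, {"U": 4, "u": 6}, {"U": 7, "u": 7}]
with_gap = [{"U": 1, "u": 3}, {"U": 5, "u": 6}]

print(find_first_gap(contiguous))                # None
print(find_first_gap(with_gap))                  # 1
print(snapshot_overlaps(5, {"U": 4, "u": 8}))    # True
print(snapshot_overlaps(5, {"U": 7, "u": 9}))    # False
```

In a live collector, a non-None result from the gap check would correspond to the point at which the connection is restarted and a fresh snapshot is fetched.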
Market data collection details
Since 2020-05-18, the market data collection infrastructure for Binance has been located in the GCP asia-northeast1 region (Tokyo, Japan); before that, it was located in the GCP europe-west2 region (London, UK).
Real-time market data is captured via multiple WebSocket connections.
Binance servers are located in AWS ap-northeast-1 region (Tokyo, Japan).