BlockSci consists of two major components: the parser and the analysis library. The parser generates the BlockSci data files, and the analysis library is then pointed at the generated data.
To parse the Bitcoin blockchain, BlockSci requires at least 60 GB of RAM (as of July 2020). Smaller blockchains may need less RAM.
GCC 7.2 or above, or Clang 5 or above
CMake 3.11 or above
sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y
sudo apt-get update
sudo apt install cmake libtool autoconf libboost-filesystem-dev libboost-iostreams-dev \
    libboost-serialization-dev libboost-thread-dev libboost-test-dev libssl-dev libjsoncpp-dev \
    libcurl4-openssl-dev libjsoncpp-dev libjsonrpccpp-dev libsnappy-dev zlib1g-dev libbz2-dev \
    liblz4-dev libzstd-dev libjemalloc-dev libsparsehash-dev python3-dev python3-pip

git clone https://github.com/citp/BlockSci.git
cd BlockSci
mkdir release
cd release
CC=gcc-7 CXX=g++-7 cmake -DCMAKE_BUILD_TYPE=Release ..
make
sudo make install

cd ..
CC=gcc-7 CXX=g++-7 sudo -H pip3 install -e blockscipy
To avoid timestamp inconsistencies (pure Python datetime objects are adjusted to the local time zone, whereas the numpy datetime64 objects returned by ranges/iterators and the fluent interface are not), set the system time zone to UTC:
sudo timedatectl set-timezone UTC
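The kind of inconsistency described above can be reproduced with the Python standard library alone. The following is a minimal sketch, not BlockSci code: it converts one Unix timestamp (the Bitcoin genesis block time, used here only as a convenient example value) once without and once with an explicit time zone.

```python
from datetime import datetime, timezone

# Unix timestamp of the Bitcoin genesis block (2009-01-03 18:15:05 UTC),
# chosen purely as an example value.
GENESIS_TS = 1231006505

# Naive conversion: the result depends on the time zone of the machine.
local_naive = datetime.fromtimestamp(GENESIS_TS)

# Explicit UTC conversion: the result is identical on every machine.
utc_aware = datetime.fromtimestamp(GENESIS_TS, tz=timezone.utc)

print("local (naive):", local_naive.isoformat())
print("UTC (aware):  ", utc_aware.isoformat())
```

On a machine whose time zone is already UTC, both lines show the same wall-clock time, which is why setting the system time zone to UTC removes the discrepancy.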
You will also need to increase the open files limit on your system (e.g., to 64000).
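To check whether the current limit is already high enough, the per-process limit can be read with Python's standard `resource` module (Unix only). This is a small sketch; the helper names are illustrative, and 64000 is simply the example value mentioned above.

```python
import resource

REQUIRED_OPEN_FILES = 64000  # example value from the text above


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_sufficient(soft, required=REQUIRED_OPEN_FILES):
    """Check a soft limit against the required number of open files."""
    # RLIM_INFINITY means "no limit", which always suffices.
    return soft == resource.RLIM_INFINITY or soft >= required


soft, hard = open_file_limits()
print(f"soft={soft} hard={hard} sufficient={limit_is_sufficient(soft)}")
```

The limit reported here applies to the Python process and anything it launches, so the check should be run from the same shell or service environment that will run the parser.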
Mac OS 10.15
BlockSci is currently not compatible with Xcode 11.5 and above because of incompatibilities with some dependencies. The instructions below may work with older versions of Xcode.
brew install cmake jsoncpp libjson-rpc-cpp boost openssl jemalloc zstd \
    automake libtool google-sparsehash lz4 python3
sudo xcode-select --reset

git clone https://github.com/citp/BlockSci.git
cd BlockSci
mkdir release
cd release
cmake -DCMAKE_BUILD_TYPE=Release -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl ..
make
sudo make install

cd ..
pip3 install -e blockscipy
Running a full node
The BlockSci parser extracts blockchain data generated by a full node (such as bitcoind or an altcoin node). Thus, to set up BlockSci, you must first run a full node.
The BlockSci parser provides two mechanisms for processing blockchain data: a disk mode and an RPC mode.
Disk mode is optimized for parsing Bitcoin’s data files. It reads blockchain data directly from disk, which is fast, but it does not work on blockchains whose serialization format differs from Bitcoin’s.
RPC mode uses the RPC interface of a cryptocurrency to extract data regarding the blockchain. It works with a variety of cryptocurrencies which have the same general model as Bitcoin, but with minor changes to the serialization format (that break the parser in disk mode). Examples of this are Zcash and Namecoin. To use the parser in RPC mode, your full node must be running with
To set up a full node, please refer to its installation instructions.
Parsing the blockchain
Before you can parse a blockchain, you’ll need to generate a config file.
blocksci_parser <config file> generate-config <coin type> <blocksci data directory> [--max-block <max block>] [--disk <fullnode data directory>] [--rpc <username> <password> [--address <address>] [--port <port>]]
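As an illustration of the synopsis above, the following sketch assembles a disk-mode `generate-config` invocation as an argument list. It only builds and prints the command and does not run it. The helper name, the file paths, and the coin type value `bitcoin` are assumptions made for this example; run `blocksci_parser` to see the coin types your build actually accepts.

```python
import shlex


def generate_config_command(config_file, coin_type, data_dir,
                            disk_dir=None, max_block=None):
    """Build the argument list for a generate-config call (disk mode only)."""
    cmd = ["blocksci_parser", config_file, "generate-config", coin_type, data_dir]
    # Optional arguments follow the order shown in the synopsis.
    if max_block is not None:
        cmd += ["--max-block", str(max_block)]
    if disk_dir is not None:
        cmd += ["--disk", disk_dir]
    return cmd


# Hypothetical paths; adjust them to your own setup.
cmd = generate_config_command(
    "/data/blocksci/config.json",  # config file to create (assumed path)
    "bitcoin",                     # coin type (assumed value)
    "/data/blocksci/parsed",       # BlockSci data directory (assumed path)
    disk_dir="/data/bitcoin",      # full node data directory (assumed path)
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps paths containing spaces intact if the list is later passed to `subprocess.run`.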
BlockSci provides defaults for the most common cryptocurrencies, including Bitcoin, Bitcoin Cash, Litecoin, Dash, Namecoin and ZCash.
These can be selected through the <coin type> field (simply run blocksci_parser to see all the options available).
However, BlockSci does not provide defaults for the data directories, as they differ between operating systems.
You’ll thus need to provide information for either the
--disk or the
After creating the config file, you can parse the blockchain by running:
blocksci_parser <config file> update
BlockSci can be kept up to date with the blockchain by setting up a cron job that periodically runs the parser command. Running the parser should not noticeably impact usage of the analysis library. For Bitcoin, we recommend keeping the parser at least 6 blocks behind the head of the chain, as BlockSci currently does not handle reorganizations. Other cryptocurrencies may require greater safety margins.
You can set BlockSci to stay 6 blocks behind the head of the chain by setting “maxBlockNum”: -6 in your config file and adding
@hourly /usr/local/bin/blocksci_parser <config file> update
to your system crontab.
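The reasoning behind the 6-block margin can be made concrete with a little arithmetic. The sketch below is illustrative only and is not BlockSci code: the function names are hypothetical, and it assumes that a negative maxBlockNum is interpreted as an offset from the current head of the chain, with the resulting height being the last block parsed.

```python
def parse_limit(head_height: int, max_block_num: int = -6) -> int:
    """Height at which parsing would stop for a negative maxBlockNum.

    Assumed semantics: a negative value is an offset from the chain head.
    """
    if max_block_num >= 0:
        raise ValueError("this sketch only covers negative offsets")
    return max(head_height + max_block_num, 0)


def reorg_reaches_parsed_data(reorg_depth: int, max_block_num: int = -6) -> bool:
    """True if a reorganization replacing `reorg_depth` blocks at the tip
    would replace a block that has already been parsed."""
    return reorg_depth > -max_block_num


head = 640_000  # hypothetical chain height
print(parse_limit(head))             # last block parsed under the assumption
print(reorg_reaches_parsed_data(2))  # shallow reorganization
print(reorg_reaches_parsed_data(7))  # deeper than the margin
```

Under these assumptions, only a reorganization deeper than the margin touches parsed data, which is why chains with more frequent or deeper reorganizations call for a larger margin.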
BlockSci provides an optional mempool recorder that records the arrival times of blocks and transactions at your local node. The recorder works by repeatedly polling the RPC interface and observing new transactions as they arrive. This data is accessible directly through the Python API via Tx.timestamp_seen/Tx.time_seen and Block.timestamp_seen/Block.time_seen, which return a timestamp, or None if the transaction or block was not observed. To use the mempool recorder, you need to have continuous incremental updates enabled (see above) and a valid RPC section in your config file.
mempool_recorder <config file>
BlockSci provides a clustering module for applying heuristic-based clustering techniques. We recommend using it through the Python blocksci.cluster module, which provides many customization options.
If you prefer to use the standalone tool, you can run it as follows.
blocksci_clusterer <data location> <cluster output directory> [--overwrite]