Local Setup¶

BlockSci consists of two major components: the parser and the analysis library. The parser generates the BlockSci data files. The analysis library can then be pointed at the generated data.

System requirements¶

To parse the Bitcoin blockchain, BlockSci requires at least 60 GB of RAM (as of July 2020). For smaller blockchains, less RAM may be sufficient.

GCC 7.2 or above, or Clang 5 or above
CMake 3.11 or above
Python 3

Installation¶

Ubuntu 18.04¶

sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y
sudo apt-get update
sudo apt install cmake libtool autoconf libboost-filesystem-dev libboost-iostreams-dev \
libboost-serialization-dev libboost-thread-dev libboost-test-dev libssl-dev libjsoncpp-dev \
libcurl4-openssl-dev libjsonrpccpp-dev libsnappy-dev zlib1g-dev libbz2-dev \
liblz4-dev libzstd-dev libjemalloc-dev libsparsehash-dev python3-dev python3-pip

git clone https://github.com/citp/BlockSci.git
cd BlockSci
mkdir release
cd release
CC=gcc-7 CXX=g++-7 cmake -DCMAKE_BUILD_TYPE=Release ..
make
sudo make install

cd ..
CC=gcc-7 CXX=g++-7 sudo -H pip3 install -e blockscipy

Pure Python datetime objects are adjusted to the local time zone, whereas the numpy datetime64 objects returned by ranges/iterators and the fluent interface are not. To avoid the resulting timestamp inconsistencies, set the system time zone to UTC:

sudo timedatectl set-timezone UTC

You will also need to increase the open files limit on your system (e.g., to 64000).
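As a minimal sketch of one way to do this for the current shell session, the snippet below checks the soft limit and tries to raise it to the 64000 suggested above. The variable names are illustrative, and the change lasts only for the current session.

```shell
# Desired soft limit on open file descriptors (the value suggested above).
TARGET_NOFILE=64000

# Current soft limit for this shell; some systems report "unlimited".
CURRENT_NOFILE=$(ulimit -n)
echo "current open files limit: $CURRENT_NOFILE"

# Try to raise the soft limit for this shell session only.
# This fails if the hard limit is lower than the target.
if [ "$CURRENT_NOFILE" != "unlimited" ] && [ "$CURRENT_NOFILE" -lt "$TARGET_NOFILE" ]; then
    ulimit -n "$TARGET_NOFILE" || echo "could not raise the limit; check the hard limit" >&2
fi
```

To make the change permanent on Ubuntu, one common approach is to add nofile entries to /etc/security/limits.conf and log in again; the exact mechanism depends on how the parser is launched.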

Mac OS 10.15¶

BlockSci is currently not compatible with Xcode 11.5 and above, due to incompatibilities with some dependencies. The instructions below may work with older versions of Xcode.

brew install cmake jsoncpp libjson-rpc-cpp boost openssl jemalloc zstd \
automake libtool google-sparsehash lz4 python3
sudo xcode-select --reset

git clone https://github.com/citp/BlockSci.git
cd BlockSci
mkdir release
cd release
cmake -DCMAKE_BUILD_TYPE=Release -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl ..
make
sudo make install

cd ..
pip3 install -e blockscipy

Running a full node¶

The BlockSci parser extracts blockchain data generated by a full node (such as bitcoind or an altcoin node). Thus, to set up BlockSci, you must first run a full node.

The BlockSci parser provides two mechanisms for processing blockchain data: a disk mode and an RPC mode.

Disk mode is optimized for parsing Bitcoin’s data files. It reads blockchain data directly from disk, which is fast but does not work for blockchains whose serialization format differs from Bitcoin’s.
RPC mode extracts blockchain data through the full node’s RPC interface. It works with a variety of cryptocurrencies that share Bitcoin’s general model but make minor changes to the serialization format (which break the parser in disk mode), such as Zcash and Namecoin. To use the parser in RPC mode, your full node must be running with txindex enabled.

To set up a full node, please refer to its installation instructions.
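For a Bitcoin Core node, the RPC-mode requirements above could look roughly like the following. This is a sketch only: it writes an example configuration to a scratch file for review rather than touching your node's real bitcoin.conf, the credentials are placeholders, and other full nodes may use different option names.

```shell
# Write an example configuration to a scratch file for review.
# Copy the relevant lines into your node's own config file by hand.
EXAMPLE_CONF="$(mktemp)"
cat > "$EXAMPLE_CONF" <<'EOF'
# Build the transaction index (needed for BlockSci's RPC mode).
txindex=1
# Accept JSON-RPC connections.
server=1
# Placeholder credentials; choose your own.
rpcuser=blocksci
rpcpassword=change-me
EOF

# Show the result.
cat "$EXAMPLE_CONF"
```

The RPC username and password chosen here are the values you would later pass to the parser's --rpc option.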

Parsing the blockchain¶

Before you can parse a blockchain, you’ll need to generate a config file.

blocksci_parser <config file> generate-config <coin type> <blocksci data directory> [--max-block <max block>] [--disk <fullnode data directory>] [--rpc <username> <password> [--address <address>] [--port <port>]]

BlockSci provides defaults for the most common cryptocurrencies, including Bitcoin, Bitcoin Cash, Litecoin, Dash, Namecoin and Zcash. These can be selected through the <coin type> field (run blocksci_parser without arguments to see all available options). However, BlockSci does not provide defaults for the data directories, as they differ between operating systems. You’ll thus need to provide information for either the --disk or the --rpc option.

After creating the config file, you can parse the blockchain by running:

blocksci_parser <config file> update
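
Putting the two commands together, a disk-mode setup for Bitcoin might look like the sketch below. The paths are hypothetical, and the coin type name bitcoin is an assumption; check the output of blocksci_parser for the exact names your build accepts. On Linux, Bitcoin Core stores its data in ~/.bitcoin by default; other systems and nodes use different locations.

```shell
# Hypothetical locations; adjust them to your system.
CONFIG_FILE="$HOME/blocksci/bitcoin-config.json"
BLOCKSCI_DATA="$HOME/blocksci/bitcoin-data"
NODE_DATA="$HOME/.bitcoin"

if command -v blocksci_parser >/dev/null 2>&1; then
    # Make sure the output directory exists before the parser writes to it.
    mkdir -p "$BLOCKSCI_DATA"
    # Step 1: write the config file (disk mode; coin type assumed to be "bitcoin").
    blocksci_parser "$CONFIG_FILE" generate-config bitcoin "$BLOCKSCI_DATA" --disk "$NODE_DATA"
    # Step 2: parse the chain using the generated config.
    blocksci_parser "$CONFIG_FILE" update
else
    echo "blocksci_parser not found on PATH; build and install BlockSci first" >&2
fi
```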

Incremental updates¶

BlockSci can be kept up to date with the blockchain by setting up a cronjob to periodically run the parser command. Running parser updates should not noticeably impact usage of the analysis library. For Bitcoin, we recommend keeping the parser at least 6 blocks behind the head of the chain, as BlockSci currently does not handle reorganizations. Other cryptocurrencies may require greater security margins.

You can set BlockSci to stay 6 blocks behind the head of the chain by setting "maxBlockNum": -6 in your config file and adding

@hourly /usr/local/bin/blocksci_parser <config file> update

to your system crontab.

Mempool recorder¶

BlockSci provides an optional mempool recorder that records the arrival times of blocks and transactions at your local node. The recorder works by repeatedly polling the RPC interface and observing new transactions as they arrive. This data is accessible directly through the Python API via Tx.timestamp_seen/Tx.time_seen and Block.timestamp_seen/Block.time_seen, which return a timestamp, or None if the transaction or block was not observed. To use the mempool recorder, you need to have continuous incremental updates enabled (see above) and a valid RPC section in your config file.

mempool_recorder <config file>

Clustering¶

BlockSci provides a clustering module for applying heuristic-based clustering techniques. We recommend using it through the Python blocksci.cluster module, which provides many customization options.

If you prefer to use the standalone tool, you can run it as follows.

blocksci_clusterer <data location> <cluster output directory> [--overwrite]

Multi-chain mode¶

The multi-chain mode described in the paper is not integrated into the latest release of BlockSci. The prototype implementation is available at mplattner/BlockSci.