Tobias Kappé authored
fc9c47df

Hashtable benchmarks

This project contains a set of benchmarks for several hashtable implementations, measuring their speed and memory usage. The code was written to compare the performance of NVRAM and RAM in the context of EOS at the IT-DSS section of CERN openlab.

Building the code

To build the code, create a directory build and run cmake from there, i.e.:

mkdir build
cd build
cmake ../
make

Generating a log file

One can generate a (random) log file using the genlogfile tool. This tool needs a configuration file. The header of this configuration file associates some keys to values. Possible keys are:

  • key_type: specifies the type of the hashtable keys. Can be either longlong (unsigned 64-bit integer) or string.
  • key_size: if key_type is string, this can be used to specify the size in bytes of the keys used.
  • value_size: specifies the size of values in the hash table, in bytes.
  • entries: the maximum number of entries in the hashtable.
  • alphanumeric: set to true when the values (and keys, when applicable) should be alphanumeric instead of random bytes. The default value is false.

The header is followed by an empty line, after which the instructions for load generation follow. Each instruction consists of an operation name and the number of times that operation is to be executed. Possible operations are:

  • set: Adds a random key/value-pair to the hashtable. Note that the key may already exist in the hashtable.
  • get: Retrieves a random value from the hashtable, using a (randomly selected) key that is already in the hashtable.
  • get-random: Looks up a randomly generated key in the hashtable. Chances are this key does not exist.
  • delete and delete-random: Like get and get-random respectively, but delete the key.
  • iterate: Iterates over all entries in the hashtable (in arbitrary order).
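The difference between the plain and the -random variants can be illustrated with a small simulation against a Python dict. This is only a sketch of the semantics described above, not the code used by genlogfile; the function name run_operations, the key_space parameter and the hit/miss counters are hypothetical.

```python
import random


def run_operations(instructions, key_space=2**64, seed=0):
    """Apply (operation, count) pairs to a dict and count hits and misses.

    key_space is a hypothetical parameter: keys are drawn uniformly from
    range(key_space), mimicking unsigned 64-bit (longlong) keys by default.
    """
    rng = random.Random(seed)
    table = {}
    stats = {"hits": 0, "misses": 0, "visited": 0}

    for op, count in instructions:
        for _ in range(count):
            if op == "set":
                # The key may already exist; the old value is then overwritten.
                table[rng.randrange(key_space)] = rng.getrandbits(32)
            elif op in ("get", "delete"):
                # Use a randomly selected key that is already in the table.
                if not table:
                    continue
                key = rng.choice(list(table))
                stats["hits"] += 1
                if op == "delete":
                    del table[key]
            elif op in ("get-random", "delete-random"):
                # Use a freshly generated key, which is probably absent.
                key = rng.randrange(key_space)
                if key in table:
                    stats["hits"] += 1
                    if op == "delete-random":
                        del table[key]
                else:
                    stats["misses"] += 1
            elif op == "iterate":
                # Visit every entry once, in whatever order the dict yields.
                stats["visited"] += sum(1 for _ in table.items())
            else:
                raise ValueError(f"unknown operation: {op}")
    return table, stats
```

With a large key space, get and delete always hit, while get-random and delete-random almost always miss. A small key_space makes collisions on set and hits on the -random operations more frequent.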

An example configuration file is:

key_type longlong
value_size 1000
entries 1000
alphanumeric true

set 50
get 10
delete 20
iterate 2
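To make the file layout concrete, here is a minimal sketch of a reader for this format. It is not the parser used by genlogfile; the function name parse_config is hypothetical, and it assumes that keys and values are separated by whitespace, as in the example above.

```python
EXAMPLE = """\
key_type longlong
value_size 1000
entries 1000
alphanumeric true

set 50
get 10
delete 20
iterate 2
"""


def parse_config(text):
    """Split a configuration into header settings and load instructions."""
    settings = {"alphanumeric": "false"}  # documented default value
    instructions = []
    in_header = True
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            # The first empty line ends the header.
            in_header = False
            continue
        name, value = line.split(None, 1)
        if in_header:
            settings[name] = value
        else:
            instructions.append((name, int(value)))
    return settings, instructions


if __name__ == "__main__":
    settings, instructions = parse_config(EXAMPLE)
    print(settings)
    print(instructions)
```

Header values are kept as strings here; converting value_size or entries to integers is left to the caller.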

To generate a log file, specify the configuration file and the filename of the log file to be written:

./genlogfile log.conf log.bin

Benchmarking

To run a benchmark, invoke one of benchmark-string or benchmark-longlong (depending on the key type) with a log file and the interface to use, for example:

./benchmark-string log.bin ./PersistentHashtableInterface.so

Note that the ./ part is necessary for the executable to be able to find the library. Currently, the following interfaces are available:

  • StdMapInterface.so uses std::map from the STL
  • GoogleDenseMapInterface.so uses google::dense_hash_map from Sparsehash (if the library is found by CMake)
  • GoogleSparseMapInterface.so uses google::sparse_hash_map from Sparsehash (if the library is found by CMake)
  • PersistentHashtableInterface.so uses a C++ wrapper around a pure C hashtable, found in the PersistentHashtable directory.
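When comparing several interfaces on the same log file, it can help to assemble the command lines programmatically. The sketch below only builds the argument lists and does not run anything; the helper names benchmark_command and available_interfaces are hypothetical. It also makes sure the interface carries a directory component. A plausible reason the ./ is needed, assuming the benchmarks load the interface with dlopen (this README does not say so), is that dlopen treats a name as a path only if it contains a slash and otherwise searches the system library path.

```python
import os

# Interface libraries listed above; the two Google ones exist only if
# CMake found Sparsehash at build time.
INTERFACES = [
    "StdMapInterface.so",
    "GoogleDenseMapInterface.so",
    "GoogleSparseMapInterface.so",
    "PersistentHashtableInterface.so",
]


def benchmark_command(key_type, log_file, interface):
    """Return the argument list for one benchmark run."""
    if key_type not in ("string", "longlong"):
        raise ValueError(f"unsupported key type: {key_type}")
    if "/" not in interface:
        # Keep an explicit directory component, as in the example above.
        interface = "./" + interface
    return ["./benchmark-" + key_type, log_file, interface]


def available_interfaces(build_dir="."):
    """Return the listed interfaces that were actually built in build_dir."""
    return [name for name in INTERFACES
            if os.path.exists(os.path.join(build_dir, name))]


if __name__ == "__main__":
    for name in available_interfaces():
        print(" ".join(benchmark_command("string", "log.bin", name)))
```

The resulting lists can be passed to subprocess.run from within the build directory.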

A benchmark will print the user- and kernel-time spent executing the directives from the log file, along with the memory consumed while doing so.
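For readers unfamiliar with these metrics, the following sketch shows how user time, kernel time and peak memory can be obtained for a workload on a Unix system using getrusage. It is an illustration only: this README does not state how the benchmarks collect their numbers, and the names measure and fill_table are hypothetical.

```python
import resource  # Unix-only standard library module


def measure(workload):
    """Run workload() and report CPU time and peak memory of this process."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        # CPU time spent in user space during the workload, in seconds.
        "user_seconds": after.ru_utime - before.ru_utime,
        # CPU time spent in the kernel on behalf of the process, in seconds.
        "kernel_seconds": after.ru_stime - before.ru_stime,
        # Peak resident set size of the process so far
        # (kilobytes on Linux, bytes on macOS).
        "max_rss": after.ru_maxrss,
    }


def fill_table():
    """Stand-in workload: insert 100000 entries into a dict."""
    table = {}
    for i in range(100_000):
        table[i] = b"x" * 100
    return table


if __name__ == "__main__":
    print(measure(fill_table))
```

Note that max_rss is a high-water mark for the whole process, so it includes memory used before the workload started.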

Optionally, a third argument can be provided, like so:

./benchmark-string log.bin ./PersistentHashtableInterface.so fingerprint.bin

This will write a CRC32 fingerprint of the hashtable to fingerprint.bin after each operation. Currently, this is only implemented for PersistentHashtableInterface, as it is meant to be used when validating memory sanity after a sudden crash. Be aware that fingerprinting will noticeably slow down the operations.
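The idea behind per-operation fingerprints can be sketched as follows. This is not the implementation in PersistentHashtableInterface, and the layout of fingerprint.bin is not described here. The function names, the sorted traversal order and the length prefixes are all assumptions made for illustration.

```python
import struct
import zlib


def table_fingerprint(table):
    """CRC32 over all entries of a dict mapping bytes keys to bytes values.

    Entries are visited in sorted key order (an assumption) so that two
    tables with the same contents always yield the same fingerprint.
    """
    crc = 0
    for key in sorted(table):
        value = table[key]
        # Length prefixes keep key and value boundaries unambiguous.
        crc = zlib.crc32(struct.pack("<I", len(key)), crc)
        crc = zlib.crc32(key, crc)
        crc = zlib.crc32(struct.pack("<I", len(value)), crc)
        crc = zlib.crc32(value, crc)
    return crc


def apply_with_fingerprints(operations):
    """Apply (op, key, value) tuples and record a fingerprint after each."""
    table = {}
    trail = []
    for op, key, value in operations:
        if op == "set":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
        trail.append(table_fingerprint(table))
    return trail


def matching_steps(trail, fingerprint):
    """Return the indices of operations after which the table had this fingerprint."""
    return [i for i, recorded in enumerate(trail) if recorded == fingerprint]
```

One possible use after a crash: compute the fingerprint of the recovered table and check that it matches one of the recorded fingerprints, meaning the table is in a state that existed after some completed operation. Recomputing the checksum over the whole table after every operation is also what makes fingerprinting slow.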