[master] BK query sampling support for all relevant queries

Ryunosuke O'Neil requested to merge roneil-BKquery-sampling into master

Implement the following to support deterministic random sampling of BK query results:

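from hashlib import md5


def make_sample_max_hash(fraction):
    """Return the hash threshold for `fraction` as a 32-digit uppercase hex string."""
    upper_limit = (2**128) - 1
    # Clamp so that fraction=1.0 cannot overflow to 33 hex digits after float rounding.
    width = min(int(upper_limit * fraction), upper_limit)
    # Zero-pad to 32 digits: MD5 hex digests are always 32 characters, and
    # string comparison only matches numeric order when both sides have equal length.
    max_hash = f"{width:032X}"
    return max_hash


def do_sampling(sampleable, seed_md5, fraction):
    """Yield the items whose seeded MD5 hash falls below the threshold."""
    max_hash = make_sample_max_hash(fraction)
    for i in sampleable:
        h = md5((i + seed_md5).encode('utf-8')).hexdigest().upper()
        if h < max_hash:
            yield i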
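
A minimal usage sketch of the sampling idea above, to show what "deterministic" means in practice: the same seed and fraction always select the same items, and a smaller fraction selects a subset of a larger one. The file names and the seed string are made up for illustration and are not taken from this merge request. The sketch zero-pads the threshold to 32 hex digits so that the string comparison agrees with numeric ordering of the 128-bit hashes.

```python
from hashlib import md5


def make_sample_max_hash(fraction):
    # Threshold as a 32-digit uppercase hex string (zero-padded, clamped to 2**128 - 1).
    upper_limit = (2**128) - 1
    width = min(int(upper_limit * fraction), upper_limit)
    return f"{width:032X}"


def do_sampling(sampleable, seed_md5, fraction):
    # Keep an item when the MD5 of (item + seed) sorts below the threshold.
    max_hash = make_sample_max_hash(fraction)
    for i in sampleable:
        h = md5((i + seed_md5).encode("utf-8")).hexdigest().upper()
        if h < max_hash:
            yield i


# Hypothetical inputs: invented file names and an arbitrary seed string.
files = [f"/example/path/file_{n:05d}.dst" for n in range(10000)]
seed_md5 = md5(b"my-analysis-seed").hexdigest().upper()

first = list(do_sampling(files, seed_md5, 0.1))
second = list(do_sampling(files, seed_md5, 0.1))
smaller = set(do_sampling(files, seed_md5, 0.05))

print(first == second)          # same seed and fraction give the same sample
print(smaller <= set(first))    # a 5% sample is contained in the 10% sample
print(len(first) / len(files))  # close to 0.1, not exactly 0.1
```

Because each item is accepted or rejected independently of the others, the sampled fraction is only approximately equal to the requested one, and changing the seed gives a different, equally reproducible sample.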

cc @cburr

BEGINRELEASENOTES

*NewOracleBookkeepingDB NEW: BK query sampling support for all relevant queries

ENDRELEASENOTES

Edited by Ryunosuke O'Neil
