
Add basic timing + memory benchmarks for each stage to READMEs

As per @msn's suggestion, it would help users to have some idea of how long each stage takes and how much memory it needs on a particular reference hardware setup (e.g. an A100 40 GB GPU and a standard CPU).
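One possible way to collect these numbers is to wrap each stage in a small helper that records wall-clock time and peak memory. The sketch below is only an illustration. `benchmark_stage` and `example_stage` are hypothetical names and are not part of this repository. It also assumes that peak Python heap usage, as measured by `tracemalloc`, is a useful CPU-side proxy. `tracemalloc` does not see GPU memory or allocations made by native extensions. For GPU numbers, something like `torch.cuda.reset_peak_memory_stats()` followed by `torch.cuda.max_memory_allocated()` would be needed instead.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict, Tuple


def benchmark_stage(
    stage_fn: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Any, Dict[str, float]]:
    """Run one (hypothetical) pipeline stage and report wall time and peak Python heap use.

    Note: tracemalloc only tracks allocations made through Python's allocator,
    so GPU memory and most native-library buffers are NOT included.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = stage_fn(*args, **kwargs)
    finally:
        # Always stop tracing, even if the stage raises.
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    stats = {"wall_time_s": elapsed, "peak_mem_mb": peak_bytes / 1e6}
    return result, stats


if __name__ == "__main__":
    # Hypothetical stand-in for a real stage: build a list, then reduce it.
    def example_stage(n: int) -> int:
        data = [i * i for i in range(n)]
        return sum(data)

    result, stats = benchmark_stage(example_stage, 1_000_000)
    print(result, stats)
```

Running this once per stage on the reference machine would produce one row per stage that could go into a README table. The actual values depend on the hardware and inputs, so they would need to be measured rather than estimated.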