
Possible unpacker speed-up

Summary

Profiling the unpacker (see the Relevant logs section) shows that the most time-consuming operation is geo-tagging the VFAT block results: vfat.py:47(update_key) alone takes about 0.85 s of the 4.62 s total, over 1,005,048 calls. Currently this is done with the following function:

def update_key(self, key):
    return f'{key[:5]}{self.slot}:{self.link}:{self.pos}{key[10:]}'

Here a key is a string such as VFAT:$:!:#:POS, in which the $:!:# placeholder block (key[5:10]) is replaced with the slot, link and position. One possible way to speed this up is to use tuples instead of strings:

(slot, link, position, 'FIELD NAME')
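To get a feel for the difference, the sketch below times both keying schemes on a toy payload. The geo-tag values (slot 2, link 5, position 17), the field names and the tag_str/tag_tuple helpers are hypothetical, and the dict comprehensions only approximate what the real vfat.py code does; actual timings will depend on the machine and Python version.

```python
import timeit

SLOT, LINK, POSITION = 2, 5, 17  # hypothetical geo-tag of one VFAT block
FIELDS = ('POS', 'CRC', 'EC', 'BC')  # hypothetical field names

# Current scheme: string keys carrying a '$:!:#' placeholder to fill in.
str_payload = {f'VFAT:$:!:#:{f}': 0 for f in FIELDS}
# Proposed scheme: the block payload is keyed by bare field names.
tuple_payload = {f: 0 for f in FIELDS}


def tag_str(payload):
    # Splice slot:link:pos into every key, one f-string per field.
    return {f'{k[:5]}{SLOT}:{LINK}:{POSITION}{k[10:]}': v
            for k, v in payload.items()}


def tag_tuple(payload):
    # Build (slot, link, position, 'FIELD NAME') keys, no string formatting.
    return {(SLOT, LINK, POSITION, k): v for k, v in payload.items()}


n = 100_000
t_str = timeit.timeit(lambda: tag_str(str_payload), number=n)
t_tuple = timeit.timeit(lambda: tag_tuple(tuple_payload), number=n)
print(f'string keys: {t_str:.3f} s, tuple keys: {t_tuple:.3f} s ({n} blocks)')
```

Whether the gain matters inside the full unpacker would still need to be confirmed by re-running the profile.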

We have to consider how this would affect selecting DataFrame columns for further analysis (although the column renaming could be mapped and done once for the resulting DataFrame). Other suggestions are very welcome.
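A minimal sketch of the "map once" idea, assuming the unpacker emits per-event dicts keyed by (slot, link, position, 'FIELD NAME') tuples. The legacy_names helper and the sample records are hypothetical; the output names simply mirror the current VFAT:slot:link:pos:FIELD layout. With pandas, the mapping could then be applied to the final DataFrame in a single step, for example df.columns = [names[c] for c in df.columns]; that part is not exercised here.

```python
def legacy_names(columns, prefix='VFAT'):
    """Map tuple column keys to 'VFAT:slot:link:pos:FIELD' strings.

    Called once for the final column set, not once per unpacked block.
    """
    mapping = {}
    for col in columns:
        slot, link, pos, field = col
        mapping[col] = f'{prefix}:{slot}:{link}:{pos}:{field}'
    return mapping


# Hypothetical per-event records produced with tuple keys.
records = [
    {(2, 5, 17, 'POS'): 1, (2, 5, 17, 'CRC'): 0},
    {(2, 5, 17, 'POS'): 3, (2, 6, 4, 'POS'): 2},
]

# Collect the union of columns once, keeping first-seen order.
columns = list(dict.fromkeys(k for rec in records for k in rec))
names = legacy_names(columns)
print(names[(2, 6, 4, 'POS')])  # VFAT:2:6:4:POS
```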

Relevant logs and/or screenshots

 3392054 function calls (3384732 primitive calls) in 4.620 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1005048    0.848    0.000    0.848    0.000 vfat.py:47(update_key)
      209    0.511    0.002    0.513    0.002 {pandas._libs.lib.maybe_convert_objects}
   232650    0.454    0.000    0.828    0.000 generic_block.py:17(unpack_word)
   111672    0.325    0.000    1.173    0.000 vfat.py:54(<dictcomp>)
   214641    0.304    0.000    0.304    0.000 {method 'update' of 'dict' objects}
   232650    0.195    0.000    0.195    0.000 {method 'unpack' of '_cbitstruct.CompiledFormatDict' objects}
     9306    0.163    0.000    2.291    0.000 geb.py:47(unpack)
   223344    0.160    0.000    0.160    0.000 geb.py:44(update_key)
   111672    0.141    0.000    1.726    0.000 vfat.py:50(unpack)
     9306    0.137    0.000    2.870    0.000 amc.py:91(unpack)
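The table above has the layout of Python's cProfile/pstats report sorted by internal time. Assuming that is how it was produced, a comparable report can be generated from a script as sketched below (or with python -m cProfile -s tottime <script>). The update_key and unpack_all functions here are hypothetical stand-ins, not the unpacker's code.

```python
import cProfile
import io
import pstats


def update_key(key, slot=2, link=5, pos=17):
    # Stand-in for vfat.py's update_key; slot/link/pos values are made up.
    return f'{key[:5]}{slot}:{link}:{pos}{key[10:]}'


def unpack_all(n=50_000):
    # Hypothetical driver playing the role of the real unpack call chain.
    return [update_key('VFAT:$:!:#:POS') for _ in range(n)]


profiler = cProfile.Profile()
profiler.enable()
unpack_all()
profiler.disable()

out = io.StringIO()
# 'tottime' sorts by time spent inside each function itself, which pstats
# labels "Ordered by: internal time", as in the log above.
stats = pstats.Stats(profiler, stream=out).sort_stats('tottime')
stats.print_stats(10)  # show only the 10 most expensive entries
report = out.getvalue()
print(report)
```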
Edited by Mykhailo Dalchenko