Structure the analysis scripts
Summary
At the moment, we are using a disparate set of analysis scripts with different data formats and behaviors. We should standardize them in order to simplify their usage and their port to the analysis suite.
The proposal is the following:
-
All executables do only one task and do it well (KISS philosophy). A typical example is the S-bit rate analysis, which can be factored into the following steps:
  - Raw results -> output thresholds (with an option to define the allowed noise levels)
  - Plotting of the raw results (with an option to add the derived thresholds to the plot)
  - Conversion of the output thresholds into VFAT configuration files

These 3 executables are able to run independently and can be linked together via Bash commands/scripts.
-
The Python scripts have the following structure, to ease the port to the analysis framework when the time comes:

```python
def my_helper_function_1():
    pass


def my_helper_function_2():
    pass


def my_main_function():
    pass


def my_tool_1():
    # Create an argument parser
    # Call the functions with the right parameters
    pass


def my_tool_2():
    # Create an argument parser
    # Call the functions with the right parameters
    pass


# EDIT
# if __name__ == "__main__":
#     # Create an argument parser
#     # Call the functions with the right parameters
```

The Python functions are converted into executable programs with the right import in the `pyproject.toml` file. This is an intermediate solution until a better and more uniform CLI system is provided.
-
The input and output paths are given as arguments to the scripts and are not inferred from the location of the scripts themselves or of other files. This makes it easy to run on the same data with different parameters.
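
As an illustration of the two points above (tool functions that only build a parser and call the analysis functions, with explicit input and output paths), here is a minimal sketch. All names in it (`build_parser`, `derive_thresholds`, `threshold_tool`, the `--max-noise` option and its default) are hypothetical and not taken from the existing scripts.

```python
import argparse
from pathlib import Path


def build_parser():
    """Create the argument parser of a hypothetical threshold tool."""
    parser = argparse.ArgumentParser(
        description="Derive thresholds from raw S-bit rate results."
    )
    # Paths are always given explicitly, never inferred from the script location.
    parser.add_argument("input_file", type=Path, help="raw results file")
    parser.add_argument("output_file", type=Path, help="thresholds file to write")
    # Hypothetical option with an arbitrary default value.
    parser.add_argument("--max-noise", type=float, default=100.0,
                        help="allowed noise level")
    return parser


def derive_thresholds(input_file, output_file, max_noise):
    """Placeholder for the real analysis; only describes what would be done."""
    return f"{input_file} -> {output_file} (max noise: {max_noise})"


def threshold_tool(argv=None):
    """Tool function: parse the arguments, then call the analysis function."""
    args = build_parser().parse_args(argv)
    print(derive_thresholds(args.input_file, args.output_file, args.max_noise))
```

Because `threshold_tool` accepts an optional `argv`, it can be called both from an entry point (where `argv` is `None` and `sys.argv` is used) and from tests or other Python code with an explicit list of arguments.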
-
The manipulated files are, at the moment, CSV files, ideally GZIP compressed. They should be manipulated with `pandas`, so that a later change of the dataframe storage format (HDF5?) stays easy.
  - We need to agree on a delimiter. A reduced list of potential characters is `;`, `:`, `|`. `;` has been chosen by @cgalloni in `cmsgemos`; use it in the absence of other proposals.
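
A minimal sketch of what reading and writing such files through `pandas` could look like, assuming the `;` delimiter is retained. The helper names `write_table` and `read_table` are hypothetical; keeping all I/O behind two such functions is one way to localize a future change of storage format.

```python
import pandas as pd

# Delimiter currently proposed in this issue.
DELIMITER = ";"


def write_table(df, path):
    """Write a dataframe as a GZIP-compressed, ';'-delimited CSV file."""
    df.to_csv(path, sep=DELIMITER, index=False, compression="gzip")


def read_table(path):
    """Read back a file produced by write_table."""
    return pd.read_csv(path, sep=DELIMITER, compression="gzip")
```

If the storage format changes later, only these two functions would need to be adapted, and the analysis code working on dataframes would stay untouched.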
-
A line in the data file must be self-consistent. In the S-bit rate example, a line in the output file is enough to figure out to which VFAT in the whole system a threshold must be applied. In this case, it means that the `fed`, `slot`, `optohybrid` and `vfat` must all be defined.
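
Such a requirement could be checked automatically when a file is read or written. The sketch below is only an illustration: the function name `check_self_consistent` is hypothetical, and it assumes that the four fields are stored as columns with exactly these names.

```python
import pandas as pd

# Columns locating a VFAT in the whole system (names assumed from the text above).
LOCATION_COLUMNS = ["fed", "slot", "optohybrid", "vfat"]


def check_self_consistent(df):
    """Raise ValueError if some row cannot be tied to a VFAT on its own."""
    missing = [c for c in LOCATION_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing location columns: {missing}")
    # Every row must have all location fields defined.
    if df[LOCATION_COLUMNS].isna().any().any():
        raise ValueError("some rows have undefined location fields")
```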