diff --git a/docs/source/index.rst b/docs/source/index.rst
index 0feb359fd9a7ed064d4385f21ccf82d7dc1ed24b..76f5c80f497cc38a82d90725ea0a2882a161fe09 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -9,6 +9,25 @@ The main purpose is to use it as dependance of `pyjapcscout` in the control room
 acquired form the control system as parquet files, and then on the user's "GPN" computer for data analysis without the need
 of JAVA or other dependances needed to interact with the control system.
 
+This package provides the following (main) functions. Note that many of those functions are simple wrappers of external functions (from `pandas`, `pyarrow`, `awkward`), sometimes with a few tweaks so that data types and shapes are preserved as far as possible.
+
+- `dict_to_pandas(input_dict)`: Creates a `pandas` dataframe from a (list of) `dict`.
+- `dict_to_awkward(input_dict)`: Creates an `awkward` array from a (list of) `dict`.
+- `dict_to_parquet(input_dict, filename)`: Saves a (list of) `dict` into a `parquet` file. **In order to do so, 2D arrays are split into 1D arrays of 1D arrays.**
+- `dict_to_pickle(input_dict, filename)`: Saves a (list of) `dict` into a `pickle` file.
+- `dict_to_json(input_dict, filename)`: Saves a (list of) `dict` into a `json` file.
+- `json_to_pandas(filename)`: Loads a `pandas` dataframe from a `json` file. This function is of limited interest (because data types/shapes are not preserved), but is provided for convenience.
+- `pandas_to_dict(input_pandas)`: Converts a `pandas` dataframe back into a (list of) `dict`.
+- `awkward_to_dict(input_awkward)`: Converts an `awkward` array back into a (list of) `dict`. **In order to preserve data type/shape, it re-merges 1D arrays of 1D arrays into 2D arrays.**
+- `parquet_to_dict(filename)`: Loads a (list of) `dict` from a `parquet` file. **In order to preserve data type/shape, it re-merges 1D arrays of 1D arrays into 2D arrays.**
+- `pickle_to_dict(filename)`: Loads a (list of) `dict` from a `pickle` file.
+- `pandas_to_awkward(input_pandas)`: Creates an `awkward` array from a `pandas` dataframe.
+- `awkward_to_pandas(input_awkward)`: Creates a `pandas` dataframe from an `awkward` array.
+- `parquet_to_pandas(filename)`: Loads a `parquet` file into a `pandas` dataframe. **Instead of using the method provided by `pandas` (which does not preserve single value types and 2D arrays), it first loads the parquet file as a `dict`, and then converts it into a `pandas` dataframe.**
+- `parquet_to_awkward(filename)`: Loads a `parquet` file into an `awkward` array.
+- `save_dict(dictData, folderPath = None, filename = None, fileFormat='parquet')`: Additional wrapper around a few of the functions above to easily save a `dict` to a file in a supported format (`parquet` and `dict` for the time being).
+- `load_dict(filename, fileFormat='parquet')`: Reads a file assuming a given format and returns its content as a `dict` (which can then be converted to other formats...).
+
 Installation
 ------------
 
diff --git a/docs/source/usage.rst b/docs/source/usage.rst
index 7b7437c54a6d909b2c0cfbd36c8044e60af96088..11689a76aef529261deab480d1dc076abc42f264 100644
--- a/docs/source/usage.rst
+++ b/docs/source/usage.rst
@@ -22,6 +22,7 @@ A simple example of use could be the following:
 The generated dict can now be converted to Awkward Arrays, PyArrow Arrays or pandas DataFrame:
 
 .. testcode::
+
     ds.dict_to_awkward(my_dict)
     ds.dict_to_pyarrow(my_dict)
     ds.dict_to_pandas(my_dict)
@@ -29,6 +30,7 @@ The generated dict can now be converted to Awkward Arrays, PyArrow Arrays or pan
 The generated dict can also be stored to file:
 
 .. testcode::
+
     # Parquet
     ds.dict_to_parquet(my_dict, "my_test_data.parquet")
     my_dict_load = ds.parquet_to_dict("my_test_data.parquet")
@@ -44,6 +46,7 @@ If one decides to store data to JSON, one should know that data type and precisi
 **NOT preserved**!
 
 .. testcode::
+
     # JSON - no exact data preservation
     ds.dict_to_json(my_dict, "my_test_data.json")
     my_pandas_load = ds.json_to_pandas("my_test_data.json")
@@ -54,6 +57,7 @@ If one decides to store data to JSON, one should know that data type and precisi
 The loss of precision is due to the JSON data format, and not to data conversion to pandas
 
 .. testcode::
+
     # note: going to pandas and back to dict does keep the data integrity
     my_pandas = ds.dict_to_pandas(my_dict)
     my_dict_from_pandas = ds.pandas_to_dict(my_pandas)