Commit 6618e999 authored by [ACC] Elena Operation

fixed some doc

parent 1f1e1fe9
@@ -6,7 +6,11 @@ Mainly used as a dependency of `pyjapcscout`, but it can be used for other purposes
## Purpose of this project
The idea is to provide a few sweet functions to go from a nested `dict` of `numpy` arrays to `parquet` (and to `pickle`, and `json`) and back, **preserving** the data types (except for `json`, for which no way back is implemented here!). Preserving the data types is important for the round trip of machine parameter reading, saving and setting.
This package is meant to be simple enough, with very few dependencies, to allow for *home* data analysis without the need for the *CERN TN Network* or *Java* libraries.
The basic data unit (or dataset) is assumed to be a (nested) **dictionary** of **numpy values** and **numpy arrays**.
**Lists** are in principle not allowed (at least not supported) inside a dataset.
On the other hand, **lists** might be used to define a list of datasets (e.g. a list of consecutive acquisitions of accelerator data).
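
For illustration, a minimal sketch of a valid dataset and of a list of datasets (the field names are made up):

```python
import numpy as np

# A single dataset: a (nested) dict of numpy values and numpy arrays.
dataset = {
    "beam_intensity": np.float64(1.2e11),          # numpy scalar value
    "bpm_readings": np.array([0.1, -0.3, 0.25]),   # numpy array
    "magnet": {                                    # nested dicts are fine
        "current": np.float32(153.2),
        "name": np.str_("MQF.52310"),
    },
}

# Plain lists are not supported *inside* a dataset, but a list of
# datasets (e.g. consecutive acquisitions) is fine:
acquisitions = [dataset, dataset]
```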
## Getting started
@@ -287,6 +287,14 @@ def _convert_parrow_data(
def dict_to_pyarrow(input_dict):
    """Convert a dictionary into a PyArrow Table

    Args:
        input_dict (dict): the dictionary to convert

    Returns:
        (PyArrow): the data converted as a PyArrow Table
    """
    my_data_dict_converted = _convert_dict_list(
        input_dict, in_memory=False, split_to_list=False, verbose=False
    )
@@ -296,18 +304,48 @@ def dict_to_pyarrow(input_dict):
def pyarrow_to_parquet(input_pa, filename):
    """Save a given PyArrow Table into a parquet file

    Args:
        input_pa (PyArrow): the PyArrow Table to save
        filename (string): the file name (with its path)
    """
    pq.write_table(input_pa, filename)
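
Combined with `dict_to_pyarrow` above, writing a dataset to disk takes two calls; a minimal sketch (the dict contents and file name are made up):

```python
import numpy as np

# Hypothetical dataset: a nested dict of numpy values/arrays.
my_dict = {"intensity": np.float64(1.2e11), "orbit": np.array([0.1, -0.3, 0.25])}

table = dict_to_pyarrow(my_dict)          # dict -> PyArrow Table
pyarrow_to_parquet(table, "acq.parquet")  # Table -> parquet file on disk
```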
def parquet_to_pyarrow(filename):
    """Load a parquet file as a PyArrow Table

    Args:
        filename (string): the file name (with its path)

    Returns:
        (PyArrow): the loaded file as a PyArrow Table
    """
    return pq.read_table(filename)
def pyarrow_to_dict(input_pa):
    """Convert a PyArrow Table into a dictionary

    Args:
        input_pa (PyArrow): the PyArrow Table to convert

    Returns:
        (dict): the data converted as dict
    """
    return _convert_parrow_data(input_pa)
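
Together with `parquet_to_pyarrow`, this closes the round trip; a sketch continuing the example above (file name made up):

```python
# Load the file written above and recover the original dict.
table = parquet_to_pyarrow("acq.parquet")
restored = pyarrow_to_dict(table)

# If the round trip preserves types as advertised, numpy scalars
# come back as numpy scalars:
assert type(restored["intensity"]) is type(my_dict["intensity"])
```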
def pyarrow_to_pandas(input_pa):
    """Convert a PyArrow Table into a Pandas DataFrame

    Args:
        input_pa (PyArrow): the PyArrow Table to convert

    Returns:
        (DataFrame): the data converted as Pandas DataFrame
    """
    return dict_to_pandas(pyarrow_to_dict(input_pa))
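
And a one-line sketch for getting a `pandas` view of a stored file (file name made up):

```python
# Parquet file -> PyArrow Table -> pandas DataFrame.
df = pyarrow_to_pandas(parquet_to_pyarrow("acq.parquet"))
print(df.dtypes)
```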
@@ -9,6 +9,10 @@ The main purpose is to use it as a dependency of `pyjapcscout` in the control room
acquired from the control system as parquet files, and then on the user's "GPN" computer for data
analysis without the need for JAVA or other dependencies needed to interact with the control system.
The basic data unit (or dataset) is assumed to be a (nested) **dictionary** of **numpy values** and **numpy arrays**.
**Lists** are in principle not allowed (at least not supported) inside a dataset.
On the other hand, **lists** might be used to define a list of datasets (e.g. a list of consecutive acquisitions of accelerator data).
This package provides the following (main) functions. Note that many of those functions are simple wrappers around external functions (from `pandas`, `pyarrow`, `awkward`), but sometimes with some tweaks to make sure data type/shape is (almost) always preserved.
- `dict_to_pandas(input_dict)`: Creates a `pandas` dataframe from a (list of) `dict`.
......