Commit 6618e999 authored by [ACC] Elena Operation

fixed some doc

parent 1f1e1fe9
@@ -6,7 +6,11 @@ Mainly used as a dependency of `pyjapcscout`, but it can be used for other purposes
## Purpose of this project
The idea is to provide a few sweet functions to go from a nested `dict` of `numpy` arrays to `parquet` (and to `pickle`, and `json`) and back, **preserving** the data types (except for `json`, for which no way back is implemented here!). Preserving the data types is important for the round trip of machine parameter reading, saving and setting.
This package is meant to be simple enough, with very few dependencies, to allow for *home* data analysis without the need for the *CERN TN Network* or *Java* libraries.
The basic data unit (or dataset) is assumed to be a (nested) **dictionary** of **numpy values** and **numpy arrays**.
**Lists** are in principle not allowed (at least not supported) inside a dataset.
On the other hand, **lists** might be used to define a list of datasets (e.g. a list of consecutive acquisitions of accelerator data).
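
For illustration, a minimal sketch of a valid dataset and of a list of datasets (the field names are made up):

```python
import numpy as np

# A single dataset: a (nested) dict of numpy values and numpy arrays.
dataset = {
    "beam_intensity": np.float64(1.2e11),          # numpy scalar value
    "bpm_readings": np.array([0.1, -0.3, 0.25]),   # numpy array
    "magnet": {                                    # nested dicts are fine
        "current": np.float32(153.2),
        "name": np.str_("MQF.52310"),
    },
}

# Plain lists are not supported *inside* a dataset, but a list of
# datasets (e.g. consecutive acquisitions) is fine:
acquisitions = [dataset, dataset]
```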
## Getting started
@@ -287,6 +287,14 @@ def _convert_parrow_data(
def dict_to_pyarrow(input_dict):
    """Convert a dictionary into a PyArrow Table

    Args:
        input_dict (dict): the dictionary to convert

    Returns:
        (PyArrow): the data converted as a PyArrow Table
    """
    my_data_dict_converted = _convert_dict_list(
        input_dict, in_memory=False, split_to_list=False, verbose=False
    )
@@ -296,18 +304,48 @@ def dict_to_pyarrow(input_dict):
def pyarrow_to_parquet(input_pa, filename):
    """Save a given PyArrow Table into a parquet file

    Args:
        input_pa (PyArrow): the PyArrow Table to save
        filename (string): the file name (with its path)
    """
    pq.write_table(input_pa, filename)
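
Combined with `dict_to_pyarrow` above, writing a dataset to disk takes two calls; a minimal sketch (the dict contents and file name are made up):

```python
import numpy as np

# Hypothetical dataset: a nested dict of numpy values/arrays.
my_dict = {"intensity": np.float64(1.2e11), "orbit": np.array([0.1, -0.3, 0.25])}

table = dict_to_pyarrow(my_dict)          # dict -> PyArrow Table
pyarrow_to_parquet(table, "acq.parquet")  # Table -> parquet file on disk
```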
def parquet_to_pyarrow(filename):
    """Load a parquet file as a PyArrow Table

    Args:
        filename (string): the file name (with its path)

    Returns:
        (PyArrow): the loaded file as a PyArrow Table
    """
    return pq.read_table(filename)
def pyarrow_to_dict(input_pa):
    """Convert a PyArrow Table into a dictionary

    Args:
        input_pa (PyArrow): the PyArrow Table to convert

    Returns:
        (dict): the data converted as dict
    """
    return _convert_parrow_data(input_pa)
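
Together with `parquet_to_pyarrow`, this closes the round trip; a sketch continuing the example above (file name made up):

```python
# Load the file written above and recover the original dict.
table = parquet_to_pyarrow("acq.parquet")
restored = pyarrow_to_dict(table)

# If the round trip preserves types as advertised, numpy scalars
# come back as numpy scalars:
assert type(restored["intensity"]) is type(my_dict["intensity"])
```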
def pyarrow_to_pandas(input_pa):
    """Convert a PyArrow Table into a Pandas DataFrame

    Args:
        input_pa (PyArrow): the PyArrow Table to convert

    Returns:
        (DataFrame): the data converted as Pandas DataFrame
    """
    return dict_to_pandas(pyarrow_to_dict(input_pa))
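
And a one-line sketch for getting a `pandas` view of a stored file (file name made up):

```python
# Parquet file -> PyArrow Table -> pandas DataFrame.
df = pyarrow_to_pandas(parquet_to_pyarrow("acq.parquet"))
print(df.dtypes)
```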
@@ -9,6 +9,10 @@ The main purpose is to use it as a dependency of `pyjapcscout` in the control room
acquired from the control system as parquet files, and then on the user's "GPN" computer for data
analysis without the need for JAVA or other dependencies needed to interact with the control system.
The basic data unit (or dataset) is assumed to be a (nested) **dictionary** of **numpy values** and **numpy arrays**.
**Lists** are in principle not allowed (at least not supported) inside a dataset.
On the other hand, **lists** might be used to define a list of datasets (e.g. a list of consecutive acquisitions of accelerator data).
This package provides the following (main) functions. Note that many of those functions are simple wrappers around external functions (from `pandas`, `pyarrow`, `awkward`), but sometimes with some tweaks to make sure data type/shape is (almost) always preserved.
- `dict_to_pandas(input_dict)`: Creates a `pandas` dataframe from a (list of) `dict`.
......