Add support for JSON schema validation

Daniel Joseph Antrim requested to merge dantrim/labRemote:add_jsonschema into devel

Description of the MR

This MR adds support for validating the user-provided JSON configuration files against a well-defined schema. It also updates the format of some of the objects that are configured in these user-provided JSON files so that they are well-defined JSON objects with well-defined properties.

The idea is this: we define a set of JSON schemas for the configurable items in labRemote, and at run-time we validate the user-provided JSON configuration files against these schemas. This will not only make it clearer what is configurable for each of the objects in labRemote, but will also make the code safer and more robust. For example:

// pseudocode!
#include <nlohmann/json-schema.hpp>
using nlohmann::json;
using nlohmann::json_schema::json_validator;
...

void loadConfig(json& user_config) {
   json_validator validator;
   validator.set_root_schema(m_schema); // get m_schema somewhere

   validator.validate(user_config); // throws std::exception if fails
   // if we're still here, the user_config is valid!
   m_property0 = user_config["property0"];
   m_property1 = user_config["property1"];
   ...
}

Changes

The updates included so far in this MR are listed below:

1. Define a labRemote configuration schema

JSON Schema is a standard for describing the structure of JSON documents. See the official page here. There are many libraries that can parse JSON schemas and validate JSON files against them. In C++, a good one appears to be the one added in this MR (see below). In Python there is the jsonschema module, which is also very easy to use.

This MR introduces the src/schema directory. At the moment, there are two files in this directory:

  1. An initial labRemote JSON schema.
  2. A version of a labRemote JSON configuration that satisfies the schema. The labRemote JSON configuration in this directory contains the same information as that in src/configs/input-hw.json, but defines its JSON objects in a fixed way using specific properties. See Updates to Configurations. The JSON configuration file in the schema/ directory is temporary during the WIP phase.

2. JSON validator submodule

The MR adds an additional submodule json-schema-validator which is a JSON schema validator that is tailored for nlohmann/json.

What is nice about this validator is that it provides very descriptive errors if a user-provided JSON file fails the schema validation. For example:

./bin/test_laberemote_schema /path/to/schema.json /path/to/labremote_config.json
Validation failed: At /devices/0/hw-type of "Scope" - instance not found in required enum

The above validation failed because the first device (index 0) in the devices list of the provided configuration has an invalid/unsupported value for the hw-type property: "Scope". If we look at the schema file for the devices definition,

"def_device": {
    "type": "object",
    "properties": {
        ...
        "hw-type": { "type": "string", "enum": ["PS"]},
        ...
    }
}

we see that the hw-type field can only take the value "PS".
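
To illustrate how such an error message can point at the exact offending node, here is a minimal Python sketch. It is a hand-rolled stand-in that only understands "properties", "items" and "enum"; it is not the json-schema-validator library, and the schema and configuration below are hypothetical fragments modelled on the excerpt above.

```python
def check_enum(instance, schema, pointer=""):
    """Collect "enum" violations, tracking the JSON-pointer-like path to each node."""
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"At {pointer or '/'} of {instance!r} - value not in enum {schema['enum']}")
    if isinstance(instance, dict):
        # Descend into every property that the schema describes and the instance has.
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors += check_enum(instance[key], subschema, f"{pointer}/{key}")
    if isinstance(instance, list) and "items" in schema:
        # Array elements are addressed by their index in the path.
        for index, item in enumerate(instance):
            errors += check_enum(item, schema["items"], f"{pointer}/{index}")
    return errors


# Hypothetical schema fragment: only "hw-type" is constrained, as in def_device above.
schema = {
    "properties": {
        "devices": {
            "items": {"properties": {"hw-type": {"enum": ["PS"]}}},
        }
    }
}
# Hypothetical configuration: the first device uses an unsupported hw-type.
config = {"devices": [{"name": "scope0", "hw-type": "Scope"}, {"name": "ps0", "hw-type": "PS"}]}

errors = check_enum(config, schema)
for error in errors:
    print("Validation failed:", error)
```

Only the first device is reported, and the path in the message identifies it by its position in the devices array.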

3. Utility to validate JSON configurations against the labRemote schema

The MR introduces the utility src/tools/check_json_schema.cpp which checks a provided labRemote JSON configuration file against the labRemote schema:

./bin/check_json_schema -h
Usage: ./bin/check_json_schema [OPTIONS] <input> [<schema>]

Options:
 -v, --verbose    Increase verbosity of output
Required positional arguments:
 input            JSON file to validate against the provided schema
Optional positional arguments:
 schema           File containing JSON schema specification (if not provided, will use default labRemote schema)

If the provided input file satisfies the schema, the utility is silent by default. If the file does not satisfy the schema, the utility reports an error.

4. Add libUtils/FileUtils

The MR also adds libUtils/FileUtils{.h,.cpp}. The motivation for FileUtils is to provide common functions that operate on paths and files, for example a cross-platform way to check whether a file exists or to create output directories. As we rely on C++11, we cannot use std::filesystem (available from C++17), which is cross-platform by default (and I assume we do not want to bring in the added dependency of boost). So the methods defined here must be written to work on our supported platforms (Linux and macOS).

The reason FileUtils is necessary for this MR is that the various parts of labRemote that consume user-provided JSON configurations need to know where to find the labRemote schema file, which is by default located in src/schema. The methods utils::labremote_dir() and utils::labremote_schema_file() provide the absolute paths to the labRemote/ repository and to the default labRemote JSON schema file, respectively. These methods can be called from any location and return the same paths, as they do not rely on being called from within, e.g., labRemote/build, but instead use system calls to find the location of the currently running labRemote executable. In this way, EquipConf::setHardwareConfig(...) can simply get the full path to the schema file in order to perform the JSON validation:

...
#include <fstream>
#include <string>
#include "FileUtils.h"
#include <nlohmann/json-schema.hpp>
using nlohmann::json;
using nlohmann::json_schema::json_validator;
...
void EquipConf::setHardwareConfig(std::string hardwareConfigFile) {
    ...
    json_validator validator;
    // absolute path to the default schema, independent of where we are called from
    std::string path_to_schema = utils::labremote_schema_file();
    std::ifstream schema_stream(path_to_schema);
    validator.set_root_schema(json::parse(schema_stream));
    ...
}
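
The path logic can be sketched in Python as pure path arithmetic. This is only an illustration under stated assumptions: the executable is assumed to live in labRemote/build/bin/, the schema file name schema.json is hypothetical, and the executable path is passed in explicitly, whereas the real FileUtils methods obtain it through system calls.

```python
from pathlib import PurePosixPath


def labremote_dir(executable_path):
    """Repository root, assuming the executable sits in <repo>/build/bin/."""
    # parents[0] is bin/, parents[1] is build/, parents[2] is the repository root.
    return PurePosixPath(executable_path).parents[2]


def labremote_schema_file(executable_path, schema_name="schema.json"):
    """Default schema location under src/schema (the file name is hypothetical)."""
    return labremote_dir(executable_path) / "src" / "schema" / schema_name


# Hypothetical install location, used only for illustration.
exe = "/home/user/labRemote/build/bin/check_json_schema"
print(labremote_schema_file(exe))
```

Because the result depends only on where the executable is, and not on the current working directory, the same schema path is found no matter where the program is launched from.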

5. Updates to Configurations

This MR also proposes an update to the already existing configuration files. In most cases, the current configuration allows JSON objects that are not well-defined. For example, there is the datasinks property with ill-defined objects:

"datasinks":{
    "Console": { "sinktype": "ConsoleSink" },
    "File": { "sinktype": "CSVSink", "directory" : "myOutputData" }
}

The intent of the above definition is that the datasinks property provides a collection of objects that are all of type DataSink. However, the implementation, as is, defines datasinks as a single object with two properties: Console and File. From the viewpoint of any code parsing or inspecting this JSON, Console and File are independent properties of the datasinks object rather than names of things that share a common type.

The updates in this MR change the above datasinks property of the configuration to be:

"datasinks": [
    {
        "name": "Console",
        "sinktype": "ConsoleSink"
    },
    {
        "name": "File",
        "sinktype": "CSVSink",
        "directory": "myOutputData"
    }
]

Now the datasinks property of the configuration is an array of common items, and it is made explicit that Console and File are not properties, but rather names of objects that are of the same type (DataSink in the above example).

This change makes it possible to create well-defined schemas for each of the types of DataSink that labRemote supports. This is done by having the schema of the datasinks property use JSON schema's "anyOf" keyword, which lets the schema enforce that the datasinks array contains only objects from a specified set. What's more, the objects in this set can have both common properties (e.g. "name") and distinct ones (e.g. "directory").
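
The "anyOf" idea can be sketched in a few lines of Python. This is a hand-rolled illustration that only understands the "required" and "const" keywords, not a real JSON schema validator, and the two sink schemas are assumptions derived from the example above rather than the schemas shipped in this MR.

```python
def matches(item, schema):
    """True if item has every required key and every "const" property agrees."""
    if any(key not in item for key in schema.get("required", [])):
        return False
    for key, rule in schema.get("properties", {}).items():
        if "const" in rule and item.get(key) != rule["const"]:
            return False
    return True


def invalid_sinks(datasinks, any_of):
    """Indices of array items that match none of the allowed sink schemas."""
    return [
        index
        for index, sink in enumerate(datasinks)
        if not any(matches(sink, schema) for schema in any_of)
    ]


# Hypothetical per-type schemas: "name" is common, "directory" is specific to CSVSink.
any_of = [
    {"required": ["name", "sinktype"],
     "properties": {"sinktype": {"const": "ConsoleSink"}}},
    {"required": ["name", "sinktype", "directory"],
     "properties": {"sinktype": {"const": "CSVSink"}}},
]
datasinks = [
    {"name": "Console", "sinktype": "ConsoleSink"},
    {"name": "File", "sinktype": "CSVSink", "directory": "myOutputData"},
    {"name": "Broken", "sinktype": "CSVSink"},  # missing "directory"
]

bad = invalid_sinks(datasinks, any_of)
print("invalid datasinks at indices:", bad)
```

With the old object-style layout this check would not be expressible per item, since Console and File would be distinct property names rather than elements of one array.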

This type of change -- using an array of specified and schema-fied objects instead of ill-defined properties -- is proposed for the devices and channels fields as well. It adds an extra line per object, but it allows us to have a well-defined schema supported for each of our types and makes more explicit the intent of the configuration.

6. Script to Automatically Update your labRemote JSON Configurations

Given the changes mentioned in Updates to Configurations, users will have to update their JSON configuration files from the "old" format to the "new" format. To make this simple, this MR adds a python script scripts/update_config.py that takes as input an old-style labRemote JSON configuration and produces a labRemote JSON configuration file containing the same information, but formatted in the new way.

The script scripts/update_config.py first finds all paths to nodes in the input JSON configuration that need to be updated (i.e. channels, devices, datastreams, datasinks, ...) and converts the JSON blocks under those nodes to the list format described in Updates to Configurations. It is not a problem if there are multiple paths to the sought-after nodes, as is the case for devices in the following configuration,

{
    "labA" : {
        "devices" : {...}
     },
     "labB" : {
        "foo" : {
            "devices" : {...}
        }
     }
}

as scripts/update_config.py will find them all, regardless of how deeply nested they are, and update them accordingly.
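
The conversion described above can be sketched with the Python standard library alone. This is not the actual scripts/update_config.py (which relies on jq); the set of target keys is taken from the list above and may be incomplete, and the function names are hypothetical.

```python
import json

# Node names to convert, as listed above; the real script may handle more.
TARGET_KEYS = {"channels", "devices", "datastreams", "datasinks"}


def to_list(block):
    """Turn {"Console": {...}} into [{"name": "Console", ...}]."""
    return [{"name": name, **props} for name, props in block.items()]


def update(node):
    """Recursively convert old-style object blocks into new-style arrays."""
    if isinstance(node, dict):
        updated = {}
        for key, value in node.items():
            is_old_style = (
                key in TARGET_KEYS
                and isinstance(value, dict)
                and all(isinstance(v, dict) for v in value.values())
            )
            if is_old_style:
                # Convert, then keep descending in case of nested target nodes.
                updated[key] = [update(item) for item in to_list(value)]
            else:
                updated[key] = update(value)
        return updated
    if isinstance(node, list):
        return [update(item) for item in node]
    return node


# Hypothetical old-style configuration with "devices" at two nesting depths.
old = {
    "labA": {"devices": {"PS": {"hw-type": "PS"}}},
    "labB": {"foo": {"devices": {"PS2": {"hw-type": "PS"}}}},
}
new = update(old)
print(json.dumps(new, indent=4))
```

Blocks that are already arrays are left as arrays, so running the sketch on an already updated configuration returns it unchanged.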

Here is how you would update an old-styled configuration named "my-config.json" using this new script inside of a Python virtual environment:

$ python3 -m venv env # use Python >= 3.6
$ source env/bin/activate
{env} $ python -m pip install jq
{env} $ python scripts/update_config.py my-config.json
Storing updated configuration: my-config_updated.json

The script scripts/update_config.py relies on the jq Python module, which should be installed using the pip command as in the above snippet. jq is a very useful tool for manipulating JSON documents (see here for more info).

Todo

  • Finalize labRemote JSON specification for devices
  • Finalize labRemote JSON specification for datastreams
  • Finalize labRemote JSON specification for datasinks
  • Finalize labRemote JSON specification for channels
  • Incorporate JSON validation in existing labRemote code that loads configuration from JSON configuration files
  • Improve schema finding (use cmake to install schema(s) in run-time knowable location)
  • Create script for updating old labRemote configurations
  • Add fields for Keithley auto range etc (!201 (merged))

Related Issues

First mentioned in #64 (closed).

Edited by Daniel Joseph Antrim