Add support for JSON schema validation
Description of the MR
This MR adds support for validating the user-provided JSON configuration files against a well-defined schema. It also updates the format of some of the objects that are configured in these user-provided JSON files so that they are well-defined JSON objects with well-defined properties.
The idea is this: we define a set of JSON schemas for the configurable items in labRemote and then, at run time, validate the user-provided JSON configuration files against them. This not only makes it clearer what is configurable for each of the objects in labRemote, but also makes the code safer and more robust. For example:
// pseudocode!
#include <nlohmann/json-schema.hpp>
using nlohmann::json;
using nlohmann::json_schema::json_validator;
...
void loadConfig(json& user_config) {
    json_validator validator;
    validator.set_root_schema(m_schema); // get m_schema somewhere
    validator.validate(user_config); // throws std::exception if fails
    // if we're still here, the user_config is valid!
    m_property0 = user_config["property0"];
    m_property1 = user_config["property1"];
    ...
}
Changes
The updates included so far in this MR are listed below:
1. Define a labRemote configuration schema
JSON schema is a standardized thing. See the official page here. There are many libraries that can be used to parse JSON schemas and validate JSON files against them. In C++, a good one appears to be the one added in this MR (see below). In Python, there is the jsonschema module, which is also very easy to use.
This MR introduces the src/schema directory. At the moment, there are two files in this directory:
- An initial labRemote JSON schema.
- A version of a labRemote JSON configuration that satisfies the schema.
The labRemote JSON configuration in this directory contains the same information as that in src/configs/input-hw.json but with fixed methods for defining JSON objects using specific properties. See Updates to Configurations. The JSON configuration file in the schema/ directory is temporary during the WIP phase.
2. JSON validator submodule
The MR adds an additional submodule, json-schema-validator, which is a JSON schema validator tailored for nlohmann/json.
What is nice about this validator is that it provides very descriptive errors if a user-provided JSON file fails the schema validation. For example:
./bin/test_laberemote_schema /path/to/schema.json /path/to/labremote_config.json
Validation failed: At /devices/0/hw-type of "Scope" - instance not found in required enum
The above validation failed because the first (index 0) device in the devices list in the provided configuration has an invalid/unsupported value for the hw-type property: "Scope". If we look at the schema file for the devices definition,
"def_device": {
    "type": "object",
    "properties": {
        ...
        "hw-type": { "type": "string", "enum": ["PS"]},
        ...
    }
}
we see that the hw-type field can only take the value "PS".
3. Utility to validate JSON configurations against the labRemote schema
The MR introduces the utility src/tools/check_json_schema.cpp, which checks a provided labRemote JSON configuration file against the labRemote schema:
./bin/check_json_schema -h
Usage: ./bin/check_json_schema [OPTIONS] <input> [<schema>]
Options:
-v, --verbose Increase verbosity of output
Required positional arguments:
input JSON file to validate against the provided schema
Optional positional arguments:
schema File containing JSON schema specification (if not provided, will use default labRemote schema)
If the provided input file satisfies the schema, the utility is silent by default. If the file does not satisfy the schema, the utility reports an error.
4. Add libUtils/FileUtils
The MR also adds libUtils/FileUtils{.h,.cpp}. The motivation for FileUtils is to provide common functions that operate on paths and files: for example, a cross-platform way of checking whether files exist or of making output directories. As we rely on C++11, we cannot use the modern C++ facilities defined in std::filesystem, which are cross-platform by default (and I assume we do not want to bring in the added dependency of boost). So the methods defined here must be written to work on our supported platforms (Linux and macOS).
The reason FileUtils is necessary for this MR is that the various parts of labRemote that consume user-provided JSON configurations need to know where to find the labRemote schema file, which is located in src/schema by default. The methods utils::labremote_dir() and utils::labremote_schema_file() provide the absolute paths to the labRemote/ repository and to the default labRemote JSON schema file, respectively. These methods can be called from any location and provide the same paths, as they do not rely on being within, e.g., labRemote/build, but instead use system calls to find the location of the currently executing labRemote executable. In this way, EquipConf::setHardwareConfig(...) can just get the full path to the schema file in order to perform the JSON validation:
...
#include <fstream>
#include "FileUtils.h"
#include <nlohmann/json-schema.hpp>
using nlohmann::json;
using nlohmann::json_schema::json_validator;
...
void EquipConf::setHardwareConfig(std::string hardwareConfigFile) {
    ...
    json_validator validator;
    // absolute path to the default labRemote schema file
    std::string path_to_schema = utils::labremote_schema_file();
    validator.set_root_schema(json::parse(std::ifstream(path_to_schema, std::ios::in)));
    ...
}
5. Updates to Configurations
This MR also proposes an update to the already existing configuration files. In many places, the current configuration format allows JSON objects that are not well defined. For example, the datasinks property holds ill-defined objects:
"datasinks": {
    "Console": { "sinktype": "ConsoleSink" },
    "File": { "sinktype": "CSVSink", "directory": "myOutputData" }
}
The intent of the above definition is that the datasinks property provides an array of possible objects, all of type DataSink. However, the implementation, as is, defines datasinks as an object with several properties: Console and File. From the viewpoint of any code parsing or inspecting this JSON datasinks object, Console and File are two more or less independent properties of the datasinks object, rather than names of things of a common type.
The updates in this MR change the above datasinks property of the configuration to be:
"datasinks": [
    {
        "name": "Console",
        "sinktype": "ConsoleSink"
    },
    {
        "name": "File",
        "sinktype": "CSVSink",
        "directory": "myOutputData"
    }
]
Now the datasinks property of the configuration is an array of common items, and it is made explicit that Console and File are not properties, but rather names of objects that are of the same type (DataSink in the above example).
The above change makes it possible to create well-defined schemas for each of the various types of DataSink that labRemote supports. This is done by defining the schema of the datasinks property using JSON schema's "anyOf" keyword, which allows the schema to enforce that the datasinks array can contain only a specified set of possible objects. What's more, each of the objects in this specified set can have common properties (e.g. "name") and/or distinct properties (e.g. "directory").
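As a rough illustration of the "anyOf" approach, here is a minimal Python sketch of what a schema for the datasinks array could look like. The schema contents are assumptions made for illustration (in particular the required fields and the use of "additionalProperties"); this is not the schema shipped in src/schema. The sketch uses the Python jsonschema module when it is installed, and the helper name is_valid is hypothetical.

```python
try:
    import jsonschema  # third-party: python -m pip install jsonschema
except ImportError:
    jsonschema = None  # the schema below can still be inspected without it

# Illustrative schema only: each array item must match one of the listed sink types.
DATASINKS_SCHEMA = {
    "type": "array",
    "items": {
        "anyOf": [
            {  # console sink: common "name" property only
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "sinktype": {"enum": ["ConsoleSink"]},
                },
                "required": ["name", "sinktype"],
                "additionalProperties": False,
            },
            {  # CSV sink: common "name" plus the distinct "directory" property
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "sinktype": {"enum": ["CSVSink"]},
                    "directory": {"type": "string"},
                },
                "required": ["name", "sinktype", "directory"],
                "additionalProperties": False,
            },
        ]
    },
}


def is_valid(datasinks):
    """Return True/False, or None if the jsonschema module is not installed."""
    if jsonschema is None:
        return None
    try:
        jsonschema.validate(instance=datasinks, schema=DATASINKS_SCHEMA)
    except jsonschema.ValidationError:
        return False
    return True


good_config = [
    {"name": "Console", "sinktype": "ConsoleSink"},
    {"name": "File", "sinktype": "CSVSink", "directory": "myOutputData"},
]
bad_config = [{"name": "Console", "sinktype": "Scope"}]  # unsupported sinktype

print(is_valid(good_config))
print(is_valid(bad_config))
```

Because every alternative under "anyOf" is a complete object schema, adding support for a new sink type would amount to adding one more entry to that list.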
This type of change -- using an array of specified and schema-fied objects instead of ill-defined properties -- is proposed for the devices and channels fields as well. It adds an extra line per object, but it allows us to have a well-defined schema for each of our types and makes the intent of the configuration more explicit.
6. Script to Automatically Update your labRemote JSON Configurations
Given the changes mentioned in Updates to Configurations, users will have to update their JSON configuration files from the "old" format to the "new" format. To make this simple, this MR adds a Python script, scripts/update_config.py, that takes as input an old-style labRemote JSON configuration and produces a labRemote JSON configuration file containing the same information, but formatted in the new way.
The script scripts/update_config.py first finds all paths to nodes in the input JSON configuration that need to be updated (i.e. channels, devices, datastreams, datasinks, ...) and updates the JSON blocks under those nodes to be list-formatted (as described in Updates to Configurations).
It is not a problem if there are multiple paths to sought-after nodes (channels, devices, datastreams, datasinks, ...), as in the case of devices in the following configuration,
{
    "labA" : {
        "devices" : {...}
    },
    "labB" : {
        "foo" : {
            "devices" : {...}
        }
    }
}
as scripts/update_config.py will find them all, regardless of how deeply nested they are, and update them accordingly.
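The conversion described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the contents of scripts/update_config.py (which relies on jq): it uses simple recursion instead, and the function name update_config, the device names PS1 and PS2, and the exact set of node names are assumptions.

```python
import json

# Node names whose object-style contents should become lists (assumed set).
LIST_NODES = {"channels", "devices", "datastreams", "datasinks"}


def update_config(node):
    """Recursively convert old-style blocks under LIST_NODES into lists."""
    if isinstance(node, list):
        return [update_config(item) for item in node]
    if not isinstance(node, dict):
        return node  # strings, numbers, etc. are kept as-is
    updated = {}
    for key, value in node.items():
        if key in LIST_NODES and isinstance(value, dict):
            entries = []
            for name, body in value.items():
                entry = {"name": name}  # the old property name becomes "name"
                entry.update(update_config(body))  # assumes each entry is an object
                entries.append(entry)
            updated[key] = entries
        else:
            updated[key] = update_config(value)
    return updated


old = {
    "labA": {"devices": {"PS1": {"hw-type": "PS"}}},
    "labB": {"foo": {"devices": {"PS2": {"hw-type": "PS"}}}},
    "datasinks": {"Console": {"sinktype": "ConsoleSink"}},
}
new = update_config(old)
print(json.dumps(new, indent=2))
```

Blocks that are already list-formatted are left unchanged, so running the sketch on an already updated configuration returns the same configuration.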
Here is how you would update an old-styled configuration named "my-config.json" using this new script inside of a Python virtual environment:
$ python3 -m venv env # use Python >= 3.6
$ source env/bin/activate
{env} $ python -m pip install jq
{env} $ python scripts/update_config.py my-config.json
Storing updated configuration: my-config_updated.json
The script scripts/update_config.py relies on the jq Python module, which should be installed using the pip command as in the above snippet. jq is a very useful tool for manipulating JSON documents (see here for more info).
Todo
- Finalize labRemote JSON specification for devices
- Finalize labRemote JSON specification for datastreams
- Finalize labRemote JSON specification for datasinks
- Finalize labRemote JSON specification for channels
- Incorporate JSON validation in existing labRemote code that loads configuration from JSON configuration files
- Improve schema finding (use cmake to install schema(s) in run-time knowable location)
- Create script for updating old labRemote configurations
- Add fields for Keithley auto range etc (!201 (merged))
Related Issues
First mentioned in #64 (closed).