Feature: JSON logging with correct field types

Problem to solve

While testing the current JSON logging (with cta-taped) in Preproduction on Alma9, we noticed that fluentd would not write to the InfluxDB database. Since moving to JSON logging we no longer use our custom log line parser and instead rely on the fluentd-native JSON parser. This results in type conflicts in InfluxDB: most of our old numeric data was stored as float, whereas our JSON currently uses strings for everything.
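
To illustrate the type conflict, here is a minimal sketch using Python's standard `json` module as a stand-in for a generic JSON parser (it is not the fluentd parser itself). The log lines and field names are hypothetical. A generic parser takes each field's type from the JSON token, so quoted numbers arrive as strings:

```python
import json

# Hypothetical log lines; the field names are illustrative only.
all_strings = '{"message": "file written", "fileSize": "1048576", "elapsedTime": "0.42"}'
typed = '{"message": "file written", "fileSize": 1048576, "elapsedTime": 0.42}'


def field_types(line):
    """Return the type name a generic JSON parser assigns to each field."""
    return {key: type(value).__name__ for key, value in json.loads(line).items()}


string_types = field_types(all_strings)  # every field is parsed as a string
typed_types = field_types(typed)         # numeric fields keep numeric types

print(string_types)
print(typed_types)
```

A field stored earlier as float therefore cannot accept the string value produced from the first line without a conversion step in between.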

Wiping the DB and using strings for everything is not an option, as this would break our monitoring and rule out common aggregations such as mean.

As a temporary workaround I have added explicit type conversions in the fluentd config for cta-taped (we can do the frontend too), but this is exceedingly painful to produce and:

- Is not perfect. I could only account for present measurements and the most common fields. It is likely that some were overlooked. Certain events will still fail to be captured.
- It is not future proof: Any new log line parameter in CTA would have to be added in the fluentd config during future updates. Failure to do so will result in having to wipe the DB later on, in order to correct the type assigned in Influx.
- It is fluentd specific, and other logging solutions would need their own workarounds.
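
The future-proofing problem can be sketched as follows. This is not the actual fluentd configuration. It is a hypothetical Python equivalent of an explicit per-field conversion table, with made-up field names, showing that any field missing from the table silently stays a string:

```python
# Hypothetical field-to-type map standing in for the explicit conversions
# in the fluentd config; the field names are made up for illustration.
FIELD_TYPES = {"fileSize": int, "elapsedTime": float}


def convert(record):
    """Convert known fields to their mapped type; unknown fields stay strings."""
    return {key: FIELD_TYPES.get(key, str)(value) for key, value in record.items()}


# "newCounter" represents a log parameter added to CTA after the map was written.
converted = convert({"fileSize": "1048576", "elapsedTime": "0.42", "newCounter": "7"})
print(converted)
```

Here `newCounter` is passed on as the string `"7"`, which is the kind of value that would fix the wrong type in the database until the map is updated.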

Stakeholders

CTA@CERN + external users who are excited for JSON logging.

Proposal

We will work on the assumption that JSON logging is to be used on more recent operating systems, where the default JSON parsers can handle 64-bit integers correctly.

For a later version of the format, if there is a need for it, we could amend it with duplicates of the 64-bit fields, called <field_name>_str, where the value is given as a quoted string.
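
The 64-bit concern and the `_str` fallback can be sketched as below. The field name `byteCount` is hypothetical, and `parse_int=float` is used only to mimic a parser that stores every number as a double. Python's own parser keeps integers exact:

```python
import json

# 2**53 + 1 fits in a signed 64-bit integer but has no exact double representation.
big = 2**53 + 1

# Hypothetical field name, emitted both as a number and as a quoted string.
line = json.dumps({"byteCount": big, "byteCount_str": str(big)})

exact = json.loads(line)                       # integers are kept exact
as_double = json.loads(line, parse_int=float)  # mimics a double-only parser

print(exact["byteCount"])                # exact value
print(int(as_double["byteCount"]))       # rounded to the nearest double
print(int(as_double["byteCount_str"]))   # recovered exactly from the string
```

A consumer whose parser loses precision on the numeric field could still recover the exact value from the quoted duplicate.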

Desired behavior example with printf:

String string -> json "string"
Float float -> json printf("%.2f", float)
- Except for the epoch_time field, as we wanted to keep the higher precision here for debugging purposes.
- Precision can go higher than the proposed 2, but a float that happens to have an integral value must still be written with at least one 0 after the decimal point: <whatever>.0
Int integer -> json printf("%d", integer)
Int64 largeint -> json printf("%" PRId64, largeint)
Boolean bool -> json printf("%s", x ? "true" : "false")
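
The mapping above can be sketched in Python as follows. This is an illustration of the desired output only, not CTA's implementation. The function names, the `float_precision` parameter and the example field names are assumptions, and the higher-precision `epoch_time` exception is not modelled beyond allowing a larger precision:

```python
import json
import math


def format_value(value, float_precision=2):
    """Render one field value as a JSON token following the rules above."""
    if float_precision < 1:
        raise ValueError("at least one digit after the decimal point is required")
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return "%d" % value  # Python ints are unbounded, so int64 values print exactly
    if isinstance(value, float):
        if not math.isfinite(value):
            raise ValueError("NaN and infinity have no JSON representation")
        # Fixed-point output keeps the fractional digits, so 12.0 becomes 12.00.
        return f"{value:.{float_precision}f}"
    if isinstance(value, str):
        return json.dumps(value)  # adds quotes and escapes special characters
    raise TypeError(f"unsupported type: {type(value).__name__}")


def format_record(fields):
    """Join the formatted fields into a single-line JSON object."""
    body = ",".join(json.dumps(key) + ":" + format_value(val) for key, val in fields.items())
    return "{" + body + "}"


# Hypothetical field names, one per type in the list above.
line = format_record({
    "message": "mount done",
    "mountTime": 12.0,
    "fileCount": 3,
    "byteCount": 2**53 + 1,
    "success": True,
})
parsed = json.loads(line)
print(line)
```

Because a float is always written with a fractional part, a parser that infers types from the token sees `12.00` as a float rather than an integer, which keeps the field type stable across log lines.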

Edited Mar 25, 2024 by Julien Leduc