Feature: JSON logging with correct field types
Triggered by Alma9 upgrade testing.
Problem to solve
During testing of the current JSON logging (with cta-taped) in Preproduction on Alma9, it was noticed that fluentd would not write to the InfluxDB database. As we have moved to JSON logging, we no longer use our custom log line parser and instead rely on the fluentd-native JSON parser. This results in type conflicts in InfluxDB, where most of our old numeric data was converted into float, whereas our JSON currently uses string for everything.
Wiping the DB and using string for everything is not an option, as this will break our monitoring and not allow for common operations such as mean, avg, etc.
As a temporary workaround I have added explicit type conversions in the fluentd config for cta-taped (we can do the frontend too), but this is exceedingly painful to produce and:
- Is not perfect. I could only account for present measurements and the most common fields. It is likely that some were overlooked. Certain events will still fail to be captured.
- Is not future-proof: any new log line parameter in CTA would have to be added to the fluentd config during future updates. Failure to do so will result in having to wipe the DB later on in order to correct the type assigned in Influx.
- Is fluentd-specific: other logging solutions would need their own workarounds.
Stakeholders
CTA@CERN + external users who are excited for JSON logging.
Proposal
We will work on the assumption that JSON logging is to be used on more recent operating systems, where the default JSON parsers can handle 64-bit integers correctly.
For a later version of the format, if there is a need for it, we could amend it with duplicates of the 64-bit fields, called <field_name>_str, where the value is given as a quoted string.
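To illustrate why a string duplicate could matter, here is a minimal C++ sketch. It assumes a consumer whose JSON parser stores every number as a double, which is exact only up to 2^53. The function names and the field name fileSize used in the usage note are hypothetical and not taken from the CTA code base.

```cpp
#include <cstdint>
#include <string>

// Returns true if `value` survives a round trip through a double, which is
// how some JSON parsers store every number (assumption of this sketch).
bool survivesDoubleRoundTrip(std::int64_t value) {
  const double asDouble = static_cast<double>(value);
  // Values that round up to 2^63 cannot be cast back to int64 safely.
  if (asDouble >= 9223372036854775808.0) return false;
  return static_cast<std::int64_t>(asDouble) == value;
}

// Emits `"name":value,"name_str":"value"` for a 64-bit field.
// Assumes `name` needs no JSON escaping.
std::string int64FieldWithStrDuplicate(const std::string& name,
                                       std::int64_t value) {
  const std::string digits = std::to_string(value);
  return "\"" + name + "\":" + digits +
         ",\"" + name + "_str\":\"" + digits + "\"";
}
```

For example, survivesDoubleRoundTrip(9007199254740993) (2^53 + 1) returns false, while int64FieldWithStrDuplicate("fileSize", 9007199254740993) keeps the exact digits in the quoted fileSize_str copy.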
Desired behavior example with printf:
- String: string -> JSON "string"
- Float: float -> JSON printf("%.2f", float)
  - Except for the epoch_time field, as we wanted to keep the higher precision here for debugging purposes.
  - Precision can go higher than the proposed 2, but a float that happens to be an integer must be written with at least one 0 after the decimal point: <whatever>.0
- Int: integer -> JSON printf("%d", integer)
- Int64: largeint -> JSON printf("%" PRId64, largeint)
- Boolean: bool -> JSON printf("%s", bool ? "true" : "false")
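The mapping above could be sketched in C++ roughly as follows. This is an illustration under stated assumptions, not the CTA logger implementation: the helper names (jsonString, jsonFloat, jsonInt, jsonBool) are hypothetical, string escaping is reduced to quotes and backslashes, and floats are assumed to be finite.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// String -> quoted JSON string. Only '"' and '\' are escaped in this sketch;
// a real logger would also have to escape control characters.
std::string jsonString(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  out += '"';
  return out;
}

// Float -> fixed-point number with at least one digit after the decimal
// point, so a whole-valued float is still written as e.g. 3.00.
// Assumes `value` is finite (NaN and infinity are not valid JSON).
std::string jsonFloat(double value, int precision = 2) {
  assert(precision >= 1 && precision <= 17);
  char buf[512];  // large enough for any finite double printed with %f
  std::snprintf(buf, sizeof(buf), "%.*f", precision, value);
  return buf;
}

// Int and Int64 -> unquoted JSON integer.
std::string jsonInt(std::int64_t value) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%" PRId64, value);
  return buf;
}

// Boolean -> unquoted JSON literal true / false.
std::string jsonBool(bool value) {
  return value ? "true" : "false";
}
```

With these helpers, jsonFloat(3.0) yields 3.00 rather than 3, so a parser that infers types from the literal would still see a float. A field such as epoch_time could call jsonFloat with a higher precision argument.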