json logging: Problems with field consistency
Summary
- CTA version: 5.10.10.1-1
- OS: AlmaLinux 9.3
- Kernel: 5.14.0-362.8.1.el9_3.x86_64
When cta-taped logs with the json
log format option, the output log may contain:
- Duplicate keys with differing values, such as two
pid
entries with different values - Different value types for the same fields
The first may cause json parsers to use only the last occurrence of the key:value pair supplied.
The second problem is highly problematic for our monitoring, as we rely on InfluxDB which does not like the type for a given key to change within the same measurement.
Steps to reproduce
- Start cta-taped with
--log-format=json
- Observe the logs, for instance those with the message
Tape session finished
What is the current bug behaviour?
1: Duplicate keys in same message
In this log message the pid
key appears twice, once as an integer
and once as a string
:
{"epoch_time":1714133686.199549736,"local_time":"2024-04-26T14:14:46+0200","hostname":"tpsrv600","program":"cta-taped","log_level":"INFO","pid":"78405","tid":"78405","mes
sage":"Tape session finished","drive_name":"SPECTRALIB2-LTO9-F05B5S2","instance":"ctaproduction","sched_backend":"cephProductionUser","tapeVid":"L64171","mountType":"Arch
iveForUser","mountId":"1273386","volReqId":"1273386","vendor":"HP-SONY","vo":"ATLAS","mediaType":"LTO9","tapePool":"r_atlas_raw","logicalLibrary":"SPC2L9","capacityInByte
s":"18000000000000","Error_tapeAlertCleaningRequested":"1","Error_checkingTapeAlert":"1","status":"failure","wasTapeMounted":"0","mountTime":"0.0","positionTime":"0.0","w
aitInstructionsTime":"0.24711","waitFreeMemoryTime":"0.0","waitDataTime":"0.0","waitReportingTime":"33.888602","checksumingTime":"0.0","readWriteTime":"0.0","flushTime":"
0.0","unloadTime":"29.4255","unmountTime":"4.514965","encryptionControlTime":"0.002922","transferTime":"34.135712","totalTime":"68.596611","deliveryTime":"68.59661","drai
ningTime":"0.0","dataVolume":"0","filesCount":"0","headerVolume":"0","payloadTransferSpeedMBps":"0.0","driveTransferSpeedMBps":"0.0","tapeDrive":"SPECTRALIB2-LTO9-F05B5S2
","pid":79549,"exitCode":0,"killSignal":0}
2: Different value types
In this instance of a Tape session finished
log entry, there is only one occurrence of pid
, but it is of type string
.
"epoch_time":1714133686.314822085,"local_time":"2024-04-26T14:14:46+0200","hostname":"tpsrv600","program":"cta-taped","log_level":"INFO","pid":"79891","tid":"79891","message":"In Scheduler::ping(): success.","drive_name":"SPECTRALIB2-LTO9-F05B5S2","instance":"ctaproduction","sched_backend":"cephProductionUser","tapeVid":"L64171","mountType":"ArchiveForUser","mountId":"1273386","volReqId":"1273386","vendor":"HP-SONY","vo":"ATLAS","mediaType":"LTO9","tapePool":"r_atlas_raw","logicalLibrary":"SPC2L9","capacityInBytes":"18000000000000","Error_tapeAlertCleaningRequested":"1","Error_checkingTapeAlert":"1","status":"failure","wasTapeMounted":"0","mountTime":"0.0","positionTime":"0.0","waitInstructionsTime":"0.24711","waitFreeMemoryTime":"0.0","waitDataTime":"0.0","waitReportingTime":"33.888602","checksumingTime":"0.0","readWriteTime":"0.0","flushTime":"0.0","unloadTime":"29.4255","unmountTime":"4.514965","encryptionControlTime":"0.002922","transferTime":"34.135712","totalTime":"68.596611","deliveryTime":"68.59661","drainingTime":"0.0","dataVolume":"0","filesCount":"0","headerVolume":"0","payloadTransferSpeedMBps":"0.0","driveTransferSpeedMBps":"0.0","SubprocessName":"drive:SPECTRALIB2-LTO9-F05B5S2","catalogueTime":0.065133,"schedulerDbTime":0.002657,"checkEnvironmentVariablesTime":3e-05
What is the expected correct behaviour?
- Each key within a log line should appear only once
- A given key should have an associated value of the same type, no matter where in the code it is produced
- If this one is too cumbersome to implement, then having the same key type between log entries of the same type would probably suffice.
- In the monitoring we usually distinguish cta-taped log messages by their
message
field. So for each message regex we are interested in, we create a new measurement in Influx. The type of a value associated with a given key must be consistent within each Influx measurement.
- In the monitoring we usually distinguish cta-taped log messages by their
- If this one is too cumbersome to implement, then having the same key type between log entries of the same type would probably suffice.
Relevant logs and/or screenshots
Possible causes
Edited by Richard Bachmann