Skip to content

json logging: Problems with field consistency

Summary

  • CTA version: 5.10.10.1-1
  • OS: AlmaLinux 9.3
  • Kernel: 5.14.0-362.8.1.el9_3.x86_64

When cta-taped logs with the json log format option, the output log may contain:

  1. Duplicate keys with differing values, such as two pid entries with different values
  2. Different value types for the same fields

The first may cause json parsers to use only the last occurrence of the key:value pair supplied.

The second problem is highly problematic for our monitoring, as we rely on InfluxDB which does not like the type for a given key to change within the same measurement.

Steps to reproduce

  1. Start cta-taped with --log-format=json
  2. Observe the logs, for instance those with the message Tape session finished

What is the current bug behaviour?

1: Duplicate keys in same message

In this log message the pid key appears twice, once as an integer and once as a string:

{"epoch_time":1714133686.199549736,"local_time":"2024-04-26T14:14:46+0200","hostname":"tpsrv600","program":"cta-taped","log_level":"INFO","pid":"78405","tid":"78405","mes
sage":"Tape session finished","drive_name":"SPECTRALIB2-LTO9-F05B5S2","instance":"ctaproduction","sched_backend":"cephProductionUser","tapeVid":"L64171","mountType":"Arch
iveForUser","mountId":"1273386","volReqId":"1273386","vendor":"HP-SONY","vo":"ATLAS","mediaType":"LTO9","tapePool":"r_atlas_raw","logicalLibrary":"SPC2L9","capacityInByte
s":"18000000000000","Error_tapeAlertCleaningRequested":"1","Error_checkingTapeAlert":"1","status":"failure","wasTapeMounted":"0","mountTime":"0.0","positionTime":"0.0","w
aitInstructionsTime":"0.24711","waitFreeMemoryTime":"0.0","waitDataTime":"0.0","waitReportingTime":"33.888602","checksumingTime":"0.0","readWriteTime":"0.0","flushTime":"
0.0","unloadTime":"29.4255","unmountTime":"4.514965","encryptionControlTime":"0.002922","transferTime":"34.135712","totalTime":"68.596611","deliveryTime":"68.59661","drai
ningTime":"0.0","dataVolume":"0","filesCount":"0","headerVolume":"0","payloadTransferSpeedMBps":"0.0","driveTransferSpeedMBps":"0.0","tapeDrive":"SPECTRALIB2-LTO9-F05B5S2
","pid":79549,"exitCode":0,"killSignal":0}

2: Different value types

In this instance of a Tape session finished log entry, there is only one occurrence of pid, but it is of type string.

"epoch_time":1714133686.314822085,"local_time":"2024-04-26T14:14:46+0200","hostname":"tpsrv600","program":"cta-taped","log_level":"INFO","pid":"79891","tid":"79891","message":"In Scheduler::ping(): success.","drive_name":"SPECTRALIB2-LTO9-F05B5S2","instance":"ctaproduction","sched_backend":"cephProductionUser","tapeVid":"L64171","mountType":"ArchiveForUser","mountId":"1273386","volReqId":"1273386","vendor":"HP-SONY","vo":"ATLAS","mediaType":"LTO9","tapePool":"r_atlas_raw","logicalLibrary":"SPC2L9","capacityInBytes":"18000000000000","Error_tapeAlertCleaningRequested":"1","Error_checkingTapeAlert":"1","status":"failure","wasTapeMounted":"0","mountTime":"0.0","positionTime":"0.0","waitInstructionsTime":"0.24711","waitFreeMemoryTime":"0.0","waitDataTime":"0.0","waitReportingTime":"33.888602","checksumingTime":"0.0","readWriteTime":"0.0","flushTime":"0.0","unloadTime":"29.4255","unmountTime":"4.514965","encryptionControlTime":"0.002922","transferTime":"34.135712","totalTime":"68.596611","deliveryTime":"68.59661","drainingTime":"0.0","dataVolume":"0","filesCount":"0","headerVolume":"0","payloadTransferSpeedMBps":"0.0","driveTransferSpeedMBps":"0.0","SubprocessName":"drive:SPECTRALIB2-LTO9-F05B5S2","catalogueTime":0.065133,"schedulerDbTime":0.002657,"checkEnvironmentVariablesTime":3e-05

What is the expected correct behaviour?

  1. Each key within a log line should appear only once
  2. A given key should have an associated value of the same type, no matter where in the code it is produced
    • If this one is too cumbersome to implement, then having the same key type between log entries of the same type would probably suffice.
      • In the monitoring we usually distinguish cta-taped log messages by their message field. So for each message regex we are interested in, we create a new measurement in Influx. The type of a value associated with a given key must be consistent within each Influx measurement.

Relevant logs and/or screenshots

Possible causes

Edited by Richard Bachmann