Add a mechanism to distinguish logs that need to be parsed by operations from logs that do not

Problem

At the moment, several of the logs printed by CTA are being automatically parsed by fluentd in order to extract useful metrics for monitoring/operations.

However, there is no "contract" in place between operators and developers about which log messages are being tracked, nor about their format. Since operators rely on log messages that may change arbitrarily, they are creating an implicit dependency on unstable behaviour.

This is a problem for several reasons:

  • Simple changes (like fixing typos or clarifying language) in log messages can break monitoring tools.
  • Developers feel blocked from improving log clarity.
  • Operators feel blindsided when their tools break silently.
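
To make the failure mode concrete, here is a minimal sketch of how a typo fix can silently break metric extraction. It is written in Python rather than as a fluentd configuration, and the log line, the message text, the `fileSize` field, and the `extract_file_size` helper are all hypothetical; they are not taken from CTA or from the operators' actual parsing rules.

```python
import re

# Hypothetical pattern an operator might write to extract a metric.
# It matches on the exact message text, including the original typo.
PATTERN = re.compile(r'MSG="File succesfully archived" fileSize="(?P<size>\d+)"')


def extract_file_size(line):
    """Return the file size from a log line, or None if the line does not match."""
    match = PATTERN.search(line)
    return int(match.group("size")) if match else None


# Invented log lines: the second one is the same message after a developer
# corrects "succesfully" to "successfully".
before = 'LVL="INFO" MSG="File succesfully archived" fileSize="1024"'
after = 'LVL="INFO" MSG="File successfully archived" fileSize="1024"'

size_before = extract_file_size(before)  # metric is extracted
size_after = extract_file_size(after)    # no match, and no error is raised
```

The second call returns `None` without raising anything, which is why this kind of breakage tends to be noticed only when a dashboard goes quiet.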

Solution

We need a clear mechanism that prevents this problem and addresses the concerns of both operators and developers.

Once the mechanism has been discussed and accepted by both the development and operations teams, it will be implemented in CTA.