Review the use of decode and encode of unicode strings
In general we cannot be sure that the stdout/stderr of the commands we call is valid unicode, so we have to protect all bytes.decode
invocations with errors="surrogateescape"
, and make sure that all the matching str.encode
also use errors="surrogateescape"
(see https://docs.python.org/3/library/codecs.html#error-handlers)
-
@clemenci started a discussion: As discussed on the chat, we cannot be sure that the output of our commands is valid unicode, so it's better to use:
output["stdout"] = result.stdout.decode(errors="surrogateescape") output["stderr"] = result.stderr.decode(errors="surrogateescape")