Modify UniqueIDGenerator to provide deterministic IDs
The tag of UniqueIDGenerator objects (defined at construction time) is a random number determined using the date and time. This means any output is non-deterministic: two independent executions of a sequence that creates this object will produce different outputs. This can cause issues when hashing files, since two otherwise identical files will be reported as different, which might have undesired implications for CI/DIRAC. Some possibilities have been or are being explored:
- Use the TES location of the generator: simple, but leads to problems if we overwrite the generator at that location on a second execution, since the internal counter would be reset.
- Use some metadata from the input file: if we could somehow keep track of (and persist) the history of the file, we could use it to determine an identification number. It would be enough to save the (overall) number of parent files that were processed to create that data file. This way, if you process the data and create the ID generator on a sample that has already been processed, the tag will be different. This does not solve the problem of two generators being constructed in the same execution process, but that can be handled by adding the TES location information to the ID (a combined hash). To date, this is the most appealing solution, but it all depends on how we handle the persistence.
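
The combined hash mentioned in the last option could look roughly like the sketch below. It is only an illustration in Python, not the actual implementation: the function name `deterministic_tag`, the 64-bit tag width, the choice of SHA-256 and the example TES paths are all assumptions, and the number of parent files is assumed to be available from the persisted file metadata.

```python
import hashlib


def deterministic_tag(tes_location: str, n_parent_files: int) -> int:
    """Hypothetical helper: derive a reproducible tag from the TES
    location of the generator and the persisted number of parent files.

    The 64-bit width and the use of SHA-256 are assumptions made for
    this sketch only.
    """
    # Combine both inputs into a single byte string. The separator keeps
    # the location and the counter from running into each other.
    payload = f"{tes_location}:{n_parent_files}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Keep the first 8 bytes of the digest as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


# Same inputs always give the same tag, independently of date and time.
tag_a = deterministic_tag("/Event/Hypothetical/GeneratorA", 3)
# A different TES location in the same job gives a different tag.
tag_b = deterministic_tag("/Event/Hypothetical/GeneratorB", 3)
# Reprocessing (one more parent file in the history) changes the tag.
tag_a_reprocessed = deterministic_tag("/Event/Hypothetical/GeneratorA", 4)
```

With this kind of scheme, two executions on the same input give identical tags, while a second processing step or a second generator at another TES location gets a distinct one. It still relies on the parent-file count being persisted reliably.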