Use compute nodes as the run directory and temporary storage
Summary
We would like to add an option to sim_runner that creates job_commands which use the computing nodes as the run directory and as temporary space for output and log files. Currently, we use fefs (the storage elements) as the run directory, and we also write the output and log files directly to fefs. This approach has two drawbacks:
- (most important) We have observed momentary losses of connection between the CPU nodes and the storage elements. When this happens, Corsika cannot write the event to disk, complains that there is not enough memory/disk space, and crashes. So far this issue has been observed on the La Palma cluster, but it might exist on other clusters as well.
- Speed: a job is expected to be noticeably faster when it runs, writes, and logs on the local node than when it does so on the storage element.
What is the expected correct behavior?
The job creates a temporary directory on the computing node, copies all files needed for the run into it, and writes the output and logs there as well. Once the job is finished, it copies everything to fefs and deletes the temporary directory.
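
The workflow described above could look roughly like the following Python sketch. It is only an illustration and not existing sim_runner code: the function name `run_on_local_node`, its parameters, the `sim_run_` prefix, and the `job.log` file name are all hypothetical, and the actual change may well be implemented as shell lines inside the generated job_commands instead.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_on_local_node(command, input_files, final_dir, local_tmp_root=None):
    """Run `command` in a node-local temporary directory, then copy results to `final_dir`.

    All names here are hypothetical; `final_dir` stands for the target on fefs and
    `local_tmp_root` for the node-local scratch area (system default if None).
    """
    final_dir = Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)

    # 1. Create the temporary run directory on the computing node.
    run_dir = Path(tempfile.mkdtemp(prefix="sim_run_", dir=local_tmp_root))
    try:
        # 2. Copy everything needed for the run into it.
        for src in input_files:
            shutil.copy2(src, run_dir)

        # 3. Run the job there; output and log stay on the node while it runs.
        with open(run_dir / "job.log", "w") as log:
            result = subprocess.run(
                command, cwd=run_dir, stdout=log, stderr=subprocess.STDOUT, check=False
            )

        # 4. Copy everything back to the storage element in one step,
        #    also when the job failed, so that the log is preserved.
        shutil.copytree(run_dir, final_dir, dirs_exist_ok=True)
        return result.returncode
    finally:
        # 5. Always delete the temporary directory on the node.
        shutil.rmtree(run_dir, ignore_errors=True)
```

With this layout the storage element is only touched at the start and at the end of the job, so a short connection loss during the run should no longer affect Corsika. One open design question: the sketch deletes the temporary directory even if the final copy fails, so a real implementation would probably want to retry the copy before cleaning up.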