Write scripts to test resilience under server failures
While the system is actively archiving and retrieving, do the following, then check that no files queued for archival are lost and that the system returns to a coherent state:
- Reboot each type of EOS and CTA server (mgm, fst, CTA front-end and tape server).
- Send SIGKILL (kill -9) to each type of EOS and CTA daemon.
- Disconnect (or disable) network links.
- Gracefully shut down each type of EOS and CTA service (for example, tape servers should cleanly eject mounted tapes).
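
The scenarios above all follow the same pattern: inject a failure, let the system recover, then compare what was submitted for archival with what actually arrived. A minimal sketch of such a harness follows. It is not an existing EOS or CTA tool; `Fault`, `Result`, `run_scenario` and all of the callables passed to it are hypothetical names, and the real inject, recover and listing commands would have to be supplied per site.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Fault:
    """One failure scenario. Both callables are hypothetical hooks."""
    name: str
    inject: Callable[[], None]   # e.g. reboot a host, kill -9 a daemon, drop a link
    recover: Callable[[], None]  # e.g. restart the service, re-enable the link


@dataclass
class Result:
    fault: str
    lost: List[str]   # files submitted for archival but not found afterwards
    coherent: bool    # outcome of the site-specific coherence check

    @property
    def passed(self) -> bool:
        # A scenario passes only if nothing was lost and the system is coherent.
        return not self.lost and self.coherent


def run_scenario(
    fault: Fault,
    submitted: Callable[[], Iterable[str]],  # files handed to the archive queue
    archived: Callable[[], Iterable[str]],   # files confirmed archived
    is_coherent: Callable[[], bool],         # assumed site-specific state check
    wait_for_drain: Callable[[], None],      # blocks until queues are idle again
) -> Result:
    """Inject one fault under load, recover, then verify that no file was lost."""
    fault.inject()
    fault.recover()
    wait_for_drain()
    lost = sorted(set(submitted()) - set(archived()))
    return Result(fault.name, lost, is_coherent())
```

One `Fault` would be defined per bullet and per server or daemon type (reboot, SIGKILL, network disconnect, graceful shutdown), and `run_scenario` called for each while the archive and retrieve workload keeps running. For a graceful shutdown, `is_coherent` could additionally check that no tape was left mounted.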