Post-mortem OTG0149499
https://cern.service-now.com/service-portal?id=outage&n=OTG0149499
Questions/Discussion
- Why wasn't the procedure of leaving 1 week between releases followed?
- How can we prevent this from happening again?
- Written release procedure and OTG guidelines
- Testing pipelines: test the actual release, not only the MR
- Locking dependencies at the patch level
- Improve monitoring: alerting for critical websites, alerting for increased error rates
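
As a starting point for the "alerting for increased error rates" item, the idea could be sketched roughly as below. This is only an illustration, not a description of the current monitoring setup: the class name `ErrorRateAlert`, the window size, the threshold, and the choice to count HTTP 5xx responses as errors are all assumptions made for the example.

```python
from collections import deque


class ErrorRateAlert:
    """Track the most recent responses and flag when too many are errors.

    Hypothetical helper: the name, window size and threshold are
    illustrative assumptions, not agreed settings.
    """

    def __init__(self, window_size=100, threshold=0.05):
        # Sliding window holding True for an error and False otherwise.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, status_code):
        # Assumption: only HTTP 5xx responses count as errors.
        self.window.append(status_code >= 500)

    def error_rate(self):
        # Fraction of errors among the responses currently in the window.
        if not self.window:
            return 0.0
        return sum(self.window) / len(self.window)

    def should_alert(self):
        # Alert only once the window is full, to avoid noise right after startup.
        window_full = len(self.window) == self.window.maxlen
        return window_full and self.error_rate() > self.threshold
```

In practice the status codes would come from access logs or an existing metrics system, and the threshold would need tuning per website, with tighter values for the critical ones.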