OVHcloud Public Cloud Status

Outage ApiServer GRA7 June 5th Post-Mortem
Incident Report for Public Cloud
Resolved
|--------------------------------------------------------------------------------------------------------|

2021-06-05 19:37 ~ 20:03 UTC

Responding to incidents raised by OpsGenie during the on-call period, such as:
- re-installation of a customer's node
- workload rescheduling to spread the load across several admin nodes

|--------------------------------------------------------------------------------------------------------|

2021-06-05 20:05 UTC

Detected that one of our proxies in front of customers' ApiServers was consuming too much CPU.
This proxy is called Pokeflute.
Started recycling Pokeflute.

|--------------------------------------------------------------------------------------------------------|

2021-06-05 20:15 UTC

Received many different alerts symptomatic of a communication problem with customers' ApiServers
Stopped some alerting bots
Started investigation

|--------------------------------------------------------------------------------------------------------|

2021-06-05 20:15 ~ 20:45 UTC

Analyzed logs to understand the situation and identify the root cause

|--------------------------------------------------------------------------------------------------------|

2021-06-05 21:23 ~ 22:20 UTC

Cleaned up the last traces of the outage:
- redeployed some customers' clusters/nodes/components that were in ERROR state
- cleared alerts and restarted all monitoring/alerting bots

|--------------------------------------------------------------------------------------------------------|

2021-06-05 22:20 UTC

End of the incident

|--------------------------------------------------------------------------------------------------------|
Posted Jun 16, 2021 - 15:39 UTC