Additional disk cluster GRA1

OVHcloud Public Cloud Status

Incident Report for Public Cloud

Resolved

• Incident summary: A faulty NVMe drive has to be replaced
• Start time: 14:30 CET 16/10/2017
• Impact: Public Cloud
• Impact type: Degraded performance

Update(s):

Date: 2017-10-17 12:57:40 UTC
Data rebalanced.

Date: 2017-10-17 07:52:53 UTC
95% of the data are rebalanced.

Date: 2017-10-16 22:32:44 UTC
45% of the data are rebalanced.

Date: 2017-10-16 16:34:12 UTC
No issue on the new host.
We'll check rebalance progress during the evening.

Date: 2017-10-16 16:00:08 UTC
Disks from the new hosts are in the cluster, data are rebalancing.

Date: 2017-10-16 15:38:40 UTC
The host crashed a few minutes after boot.
Disk removed from the cluster; we are growing the cluster with another host.

Date: 2017-10-16 15:22:02 UTC
The host whose NVMe was replaced has just crashed. If the issue is not linked to the NVMe, we'll use another host to grow the cluster.

Date: 2017-10-16 15:01:56 UTC
Disks added to the cluster, data are rebalancing.

Date: 2017-10-16 14:42:37 UTC
Task started; the disks will be in the cluster within a few minutes.
The data will then have to rebalance.

Date: 2017-10-16 14:40:51 UTC
Growing the cluster by re-adding the "missing" disks.

Date: 2017-10-16 14:03:03 UTC
The NVMe is being replaced; we'll then be able to re-add the disks to the cluster.

Date: 2017-10-16 13:07:22 UTC
Data that exists in only 2 replicas is now being replicated a third time.

Date: 2017-10-16 13:04:42 UTC
8 of the 12 disks were still running, but 4 OSDs that were still running seemed to slow down the whole cluster.
We removed all the disks from the cluster.

Posted Oct 16, 2017 - 12:50 UTC