FS#47481 — Object Storage / Cloud Archive GRA
Attached to Project: Cloud
Incident | cloud | CLOSED
We are observing an increase in latency and error rate. We are investigating.
Date: Tuesday, 10 November 2020, 17:08
Reason for closing: Done
The latency is due to an increase in traffic on our container servers. We are scaling up.
We added more servers to handle the load. Since 7h45 UTC we have observed better performance. We are still investigating.
As a consequence of FS#47465, one component of the Swift cluster fell behind in executing asynchronous tasks. Running with too much concurrency, these tasks were overwhelming the cluster. We are reducing concurrency to better manage the load.
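To illustrate the idea of capping concurrency while a backlog is drained, here is a minimal Python sketch. It is not the Swift component's actual code: `process_update`, `drain_backlog`, and the `max_concurrency` parameter are hypothetical names, and the `sleep` call only stands in for the real work sent to the cluster.

```python
import asyncio


async def process_update(task_id: int, sem: asyncio.Semaphore, stats: dict) -> None:
    # Hypothetical stand-in for one asynchronous update task.
    async with sem:  # at most max_concurrency tasks get past this point at once
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # placeholder for the request to the cluster
        stats["running"] -= 1
        stats["done"] += 1


async def drain_backlog(n_tasks: int, max_concurrency: int) -> dict:
    # Process n_tasks queued tasks without exceeding max_concurrency in flight.
    sem = asyncio.Semaphore(max_concurrency)
    stats = {"running": 0, "peak": 0, "done": 0}
    await asyncio.gather(*(process_update(i, sem, stats) for i in range(n_tasks)))
    return stats


if __name__ == "__main__":
    # Lowering max_concurrency reduces the load on the cluster but drains the backlog more slowly.
    print(asyncio.run(drain_backlog(n_tasks=50, max_concurrency=4)))
```

Lowering the cap trades recovery speed for a lighter load on the cluster; raising it again once the system is stable drains the backlog faster.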
The load is under control and the backlog of asynchronous tasks is shrinking. We are monitoring closely.
Performance has been back to normal since 04 November, 7h45 UTC.
The listing of some containers may be inconsistent due to the lag in the asynchronous update tasks.
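As a rough picture of why a listing can lag behind the stored data, the toy model below keeps stored objects and the container listing as separate sets, with listing updates queued and applied later. This is a simplified sketch, not Swift's implementation; the `ToyContainer` class and its methods are hypothetical.

```python
from collections import deque


class ToyContainer:
    """Toy model: objects are stored immediately, the listing is updated later."""

    def __init__(self) -> None:
        self.objects = set()    # what is actually stored
        self.listing = set()    # what the container listing reports
        self.pending = deque()  # queued asynchronous listing updates

    def put(self, name: str) -> None:
        # The write succeeds right away; the listing update is only queued.
        self.objects.add(name)
        self.pending.append(name)

    def run_updates(self, limit: int) -> int:
        # Apply up to `limit` queued updates and return how many were applied.
        processed = 0
        while self.pending and processed < limit:
            self.listing.add(self.pending.popleft())
            processed += 1
        return processed

    def missing_from_listing(self) -> set:
        # Objects that exist but are not yet visible in the listing.
        return self.objects - self.listing
```

In this model an object stays readable by name while it is absent from the listing, and the gap closes once the queued updates have been applied.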
Now that the lag has been partially recovered, we are increasing concurrency to recover faster, and we are watching performance closely to prevent any impact.
The infrastructure is working through the backlog of asynchronous tasks. We are monitoring its performance closely.
The consistency lag is decreasing steadily; we continue to monitor.
The consistency lag is still decreasing steadily. At this time, 80% of asynchronous update tasks have been processed.
Processing of asynchronous update tasks has slowed down due to an increase in traffic. We are closely monitoring the infrastructure to avoid any further impact.
Processing of asynchronous update tasks has been back to normal for a few hours.
The consistency lag is still decreasing steadily. At this time, 90% of asynchronous update tasks have been processed.
All asynchronous update tasks have been processed.