OVHcloud Bare Metal Cloud Status

vss-1-6k
Incident Report for Bare Metal Cloud
Resolved
We have a fault on one of the routers at RBX2.

Update(s):

Date: 2008-11-18 12:53:40 UTC
We restored the configuration on router 2 and restarted it.
It picked up all of its ports correctly.

Operation is back to optimal.

We will upgrade the IOS version on the VSS tonight.
A new IOS release, published 7 days ago, probably fixes
this bug.

Date: 2008-11-18 12:18:49 UTC
More precisely, it was the 2nd chassis that crashed. The 1st remained
the master. The 2nd router booted but lost its configuration. We are
going to resynchronize its configuration and restart it.

Oct 11 18:51:23 GMT: %CPU_MONITOR-SW1_SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for 31 seconds [21/1]
Oct 11 18:51:53 GMT: %CPU_MONITOR-SW1_SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for 61 seconds [21/1]
Oct 11 18:52:23 GMT: %CPU_MONITOR-SW1_SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for 91 seconds [21/1]
3w4d: SW1_SP: icc_send_request_internal: ipc_send_rpc_blocked failed, result 6
3w4d: SW1_SP: -Traceback= 40BC1538 40BC16F8 40BC19E0 40B11AE4 40B120D0 40752F58 40752F44
3w4d: SW1_SP: IPC: Message 5046F360 timed out waiting for Ack
3w4d: SW1_SP: IPC: MSG: ptr: 0x5046F360, flags: 0x20101, retries: 21, seq: 0x315149E, refcount: 2, retry: 00:00:00, rpc_result = 0x0, data_buffer = 0x43B39B6C, header = 0x88D8CC8, data = 0x88D8CE8 || HDR: src: 0x10000, dst: 0x3150010, index: 0, seq: 5278, sz: 80, type: 1, flags: 0x404 hi: 0x4128E36, lo: 0x88D8CE8 || DATA: 00 00 00 15 00 00 00 00 00 00 07 D1 00 00 00 02 00 00 00 09
3w4d: SW1_SP: IPC: Send failed: IPC msg timeout MSG: ptr: 0x5046F360, flags: 0x20101, retries: 21, seq: 0x315149E, refcount: 2, retry: 00:00:00, rpc_result = 0x0, data_buffer = 0x43B39B6C, header = 0x88D8CC8, data = 0x88D8CE8 || HDR: src: 0x10000, dst: 0x3150010, index: 0, seq: 5278, sz: 80, type: 1, flags: 0x404 hi: 0x4128E36, lo: 0x88D8CE8 || DATA: 00 00 00 15 00 00 00 00 00 00 07 D1 00 00 00 02 00 00 00 09
3w4d: SW1_SP: -Traceback= 403E6CB0 403EB96C 403EC00C 40405988 40752F58 40752F44
Oct 11 18:52:53 GMT: %C6K_PROCMIB-SW1_SP-3-IPC_TRANSMIT_FAIL: Failed to send process statistics update : error code = timeout
-Traceback= 409A39A4 409A39F4 409A3C00 409A3E60 40752F58 40752F44
Oct 11 18:52:53 GMT: %CPU_MONITOR-SW1_SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for 121 seconds [21/1]
Oct 11 18:53:23 GMT: %CPU_MONITOR-SW1_SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for 151 seconds [21/1]
Oct 11 18:53:54 GMT: %CPU_MONITOR-SW1_SP-3-TIMED_OUT: CPU_MONITOR messages have failed, resetting system [21/1]
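For readers who want to pull the key facts out of an excerpt like the one above, here is a minimal Python sketch. It assumes the line layout seen in this excerpt only (timestamp, `GMT:`, then `%CPU_MONITOR-<unit>-<severity>-<event>:`); the pattern and the helper name `summarize` are illustrative, not taken from Cisco documentation.

```python
import re

# Layout inferred from the excerpt above; treat it as an assumption.
CPU_MONITOR = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) GMT: "
    r"%CPU_MONITOR-(?P<unit>\w+)-(?P<sev>\d)-(?P<event>NOT_HEARD|TIMED_OUT): "
    r"(?P<text>.*)$"
)
SECONDS = re.compile(r"for (\d+) seconds")


def summarize(lines):
    """Return (longest reported silence in seconds, reset timestamp or None)."""
    longest = 0
    reset_at = None
    for line in lines:
        m = CPU_MONITOR.match(line)
        if m is None:
            continue  # skip IPC traces, tracebacks, and other messages
        if m.group("event") == "TIMED_OUT":
            reset_at = m.group("stamp")
        else:
            s = SECONDS.search(m.group("text"))
            if s:
                longest = max(longest, int(s.group(1)))
    return longest, reset_at


sample = [
    "Oct 11 18:51:23 GMT: %CPU_MONITOR-SW1_SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for 31 seconds [21/1]",
    "3w4d: SW1_SP: IPC: Message 5046F360 timed out waiting for Ack",
    "Oct 11 18:53:23 GMT: %CPU_MONITOR-SW1_SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for 151 seconds [21/1]",
    "Oct 11 18:53:54 GMT: %CPU_MONITOR-SW1_SP-3-TIMED_OUT: CPU_MONITOR messages have failed, resetting system [21/1]",
]
print(summarize(sample))  # (151, 'Oct 11 18:53:54')
```

On the sample lines, the last reported silence is 151 seconds and the reset is logged at 18:53:54, matching the sequence shown in the excerpt.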


Date: 2008-11-18 12:07:52 UTC
The router is made up of 2 routers. One of the 2 routers failed.
The 2nd one took over the load. The operation took about 3 minutes:
start 12:58:25
end 13:00:36

We are looking into the failed router.
Posted Nov 18, 2008 - 12:02 UTC
This incident affected: Dedicated Servers || Global Infrastructure (BHS, ERI, GRA, LIM, RBX, SBG, SGP, SYD, WAW).