Clubs PSMT - Major outage - POS - 2024-09-23 – Incident details

Major outage - POS - 2024-09-23

Resolved
Operational
Started about 1 year ago
Lasted about 5 hours

Affected

POS Guatemala

Major outage from 12:10 PM to 12:22 PM, Degraded performance from 12:22 PM to 12:34 PM, Operational from 12:34 PM to 5:03 PM

6307 - Escuintla

Major outage from 12:10 PM to 12:22 PM, Degraded performance from 12:22 PM to 12:34 PM, Operational from 12:34 PM to 5:03 PM

6301 - Mira Flores

Major outage from 12:10 PM to 12:22 PM, Degraded performance from 12:22 PM to 12:34 PM, Operational from 12:34 PM to 5:03 PM

6303 - Pradera

Major outage from 12:10 PM to 12:22 PM, Degraded performance from 12:22 PM to 12:34 PM, Operational from 12:34 PM to 5:03 PM

6305 - San Cristobal

Major outage from 12:10 PM to 12:22 PM, Degraded performance from 12:22 PM to 12:34 PM, Operational from 12:34 PM to 5:03 PM

6304 - Fraijanes

Major outage from 12:10 PM to 12:22 PM, Degraded performance from 12:22 PM to 12:34 PM, Operational from 12:34 PM to 5:03 PM

Updates
  • Resolved

    The incident has been successfully resolved and is now closed.

    Brief description of the problem:

    The AWS Site-to-Site VPN tunnels went down, cutting connectivity between our on-premises infrastructure and the cloud resources hosted on AWS. Applications that depend on this connection lost availability and functionality.

    Current state:

    The AWS tunnels have been restored, and connectivity has been re-established.

    Root Cause:

    The incident was caused by a disruption in the AWS Site-to-Site VPN tunnels, likely due to network misconfiguration, a temporary issue with AWS networking services, or a failure in the VPN endpoints. Further analysis revealed that the automatic failover mechanism did not engage as expected, which prolonged the downtime.
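
As an illustration of the failover concern described above, the following is a minimal sketch of a tunnel redundancy check. It is not the tooling used during this incident. It assumes the input has the shape of the EC2 `DescribeVpnConnections` response (a `VpnConnections` list whose entries carry `VgwTelemetry` items with a `Status` of `UP` or `DOWN`). The connection IDs, the status labels (`HEALTHY`, `NO_REDUNDANCY`, `DOWN`), and the function names are hypothetical.

```python
from typing import Any

# Hypothetical sample shaped like an EC2 DescribeVpnConnections response.
# With boto3 installed, a real response could be fetched with:
#   boto3.client("ec2").describe_vpn_connections()
sample_response: dict[str, Any] = {
    "VpnConnections": [
        {"VpnConnectionId": "vpn-EXAMPLE-1",
         "VgwTelemetry": [{"Status": "UP"}, {"Status": "UP"}]},
        {"VpnConnectionId": "vpn-EXAMPLE-2",
         "VgwTelemetry": [{"Status": "UP"}, {"Status": "DOWN"}]},
        {"VpnConnectionId": "vpn-EXAMPLE-3",
         "VgwTelemetry": [{"Status": "DOWN"}, {"Status": "DOWN"}]},
    ]
}


def classify(vpn_connection: dict[str, Any]) -> str:
    """Label one VPN connection by how many of its tunnels report UP."""
    telemetry = vpn_connection.get("VgwTelemetry", [])
    up = sum(1 for tunnel in telemetry if tunnel.get("Status") == "UP")
    if up == 0:
        return "DOWN"            # no tunnel carries traffic
    if up < len(telemetry):
        return "NO_REDUNDANCY"   # traffic flows, but failover has no target
    return "HEALTHY"


def audit(response: dict[str, Any]) -> dict[str, str]:
    """Map each VPN connection ID in the response to its label."""
    return {
        conn["VpnConnectionId"]: classify(conn)
        for conn in response.get("VpnConnections", [])
    }


if __name__ == "__main__":
    for connection_id, label in audit(sample_response).items():
        print(connection_id, label)
```

Alerting on `NO_REDUNDANCY` as well as `DOWN` is a design choice in this sketch: a connection running on a single tunnel still passes traffic, but any further tunnel failure becomes a full outage, so the loss of redundancy is worth surfacing before that happens.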

    Our systems are back to normal operation. Thank you for your understanding and support throughout this process.

  • Monitoring

    The system is stable, and we continue to monitor it closely for any further impact. Further updates to follow. Thank you for your patience and understanding.

  • Identified

    The issue has been identified and our team is working to resolve it. Further updates to follow. Thank you for your patience and understanding.

  • Investigating

    We are currently investigating an incident affecting our systems that is impacting the login and order processes on the site. Our team is actively working to identify and resolve the issue. Further updates to follow. Thank you for your patience.