Trouble connecting to Roam for customers in the Middle East geographic region

Incident Report for Roam

Postmortem

Summary of Impact

From 1:07 ET until 7:00 ET on June 1, 2023, Roam meetings were unavailable for users in the Middle East geographic region.

Cause

On the night of May 31st, we started to roll out infrastructure in the AWS me-central-1 region (UAE) to improve AV quality and the experience of our users in the Middle East. This infrastructure wasn't meant to be turned on yet, but due to an error it was put into use before it was ready. Once that occurred, any users in the region would have been unable to connect via AV to other users.

Remediation Plan

  1. We updated our infrastructure code to make a similar problem less likely to occur in the future.
  2. We have updated our monitoring and alerting so that we catch issues like this more quickly and reduce the amount of downtime.
  3. We will add client-based fallback to alternate regions in case the closest region isn't working for any reason.
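
As a rough illustration of item 3, client-based fallback can be sketched as trying regions nearest-first and moving on when one does not accept a connection. This is a minimal sketch and not Roam's actual client code: the function name, the probe callback, and the alternate regions listed after me-central-1 are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def connect_with_fallback(
    regions: Sequence[str],
    try_connect: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region that accepts a connection, or None if none do.

    `regions` is assumed to be ordered nearest-first. `try_connect` is a
    hypothetical probe supplied by the caller.
    """
    for region in regions:
        try:
            if try_connect(region):
                return region
        except OSError:
            # Treat a network error as "region unavailable" and try the next one.
            continue
    return None


def fake_probe(region: str) -> bool:
    # Stand-in probe: pretend the closest region is not serving traffic.
    return region != "me-central-1"


# The alternate regions here are illustrative only.
chosen = connect_with_fallback(
    ["me-central-1", "eu-central-1", "us-east-1"], fake_probe
)
```

With this ordering, a client whose closest region is down ends up on the next region in the list instead of failing to connect.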
Posted Jun 01, 2023 - 16:41 EDT

Resolved

Our logs indicate that users in this region are able to connect successfully. We will post a postmortem in the next 24 hours.
Posted Jun 01, 2023 - 07:25 EDT

Identified

This issue has been identified and a remediation is in place. That remediation involves an update to our DNS, which is configured to cache for 5 minutes. However, some DNS systems are configured to ignore the timeouts provided by endpoints and override them with larger values. If you are still having issues after 11:15 AM GMT, please report a bug from the Roam menu or send a Team Roam support chat.
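
To make the caching behavior concrete, the arithmetic can be sketched as follows. This is a simplified model under stated assumptions: a resolver that enforces its own minimum cache time is treated as using whichever is larger, the record's timeout or its own minimum. The function name and the timestamps are hypothetical and are not the actual time of the DNS change.

```python
from datetime import datetime, timedelta, timezone


def latest_stale_time(
    changed_at: datetime,
    record_ttl_s: int,
    resolver_min_ttl_s: int = 0,
) -> datetime:
    """Latest time a resolver may still serve the old record.

    Assumes the resolver cached the old record just before the change and
    uses max(record TTL, its own minimum TTL) as the cache lifetime.
    """
    effective_ttl_s = max(record_ttl_s, resolver_min_ttl_s)
    return changed_at + timedelta(seconds=effective_ttl_s)


# Hypothetical change time, chosen only for illustration.
change = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)

# A resolver that honors the 5-minute timeout.
honoring = latest_stale_time(change, 300)

# A resolver that overrides the timeout with a 1-hour minimum.
overriding = latest_stale_time(change, 300, resolver_min_ttl_s=3600)
```

Under this model, a resolver that honors the 5-minute value stops serving the old record within 5 minutes of the change, while one that overrides it with a larger value can keep serving the old record for that longer period.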
Posted Jun 01, 2023 - 07:14 EDT