Resolved -
AWS has resolved its outage and all of our metrics have looked normal for the last couple of hours, so we are now resolving this incident. Thanks for your patience.
Oct 20, 17:04 MDT
Update -
As AWS service recovery continues, we are able to increase capacity. We are also seeing fewer internal networking errors. At this point, all Pronto services are running with normal response times and error rates. We will continue to closely monitor performance.
Oct 20, 14:35 MDT
Monitoring -
AWS has mitigated some issues, which has allowed us to increase capacity and get Pronto working again. We are still seeing intermittent scaling and network issues related to the AWS outage, but we now have enough capacity that Pronto is usable. Expect some intermittent errors and slowness as we work to increase capacity to our baseline needs for this time of day.
Oct 20, 13:50 MDT
Update -
We continue to investigate. The problem appears to be related to AWS internal networking. We are attempting to find a way to mitigate it. For now, Pronto services are, by and large, unavailable. We apologize for the disruption and will continue to provide updates as we learn more.
Oct 20, 13:22 MDT
Investigating -
Pronto is encountering errors within AWS and is currently experiencing connection errors. We suspect this is related to the AWS outage and are actively investigating.
Oct 20, 12:49 MDT
Identified -
AWS is recovering from a large-scale outage that is affecting Pronto's ability to scale up to meet the normal daily traffic increases. This is causing problems, including delayed push notifications, delayed badge count updates, and degraded overall app performance. Course synchronization is also in a degraded state because many LMS sources are affected by the AWS outage as well. We will continue to follow AWS's recovery closely.
Oct 20, 10:13 MDT