Inky US dashboard services experiencing increased error rates.
Incident Report for Inky
Postmortem

Post incident report:

Start: 8-March-2022 1323 UTC

End: 9-March-2022 1630 UTC

Duration: 27 hr 7 min 

Summary:

The Inky Dashboard experienced intermittent loading issues for some customers.

Root Cause:

Resource usage on the servers was higher than expected. The systems responsible for cleaning up those resources became so busy that they began consuming more resources themselves. This cycle continued until some systems failed when users tried to access the dashboard.

Customer Impact:

Some users were intermittently unable to access the Inky Dashboard.

Mitigation Action:

Each portion of the process was separated onto its own server so that each portion can scale independently and quickly.

Follow-up Items and Preventative Measures:

  1. The entire process has been split so that each step has its own resource pool to draw from. This should prevent any one step from overwhelming the entire process.

  2. Monitoring has also been broken down so that we can quickly spot an issue with any individual step in the process, in addition to monitoring the health of the process as a whole. This should help us catch intermittent issues like this one.

Posted Mar 10, 2022 - 13:30 UTC

Resolved
This incident has been resolved.
Posted Mar 09, 2022 - 20:54 UTC
Monitoring
Inky engineers are monitoring to confirm that normal function has been restored after offloading services to a different node.
Posted Mar 09, 2022 - 16:32 UTC
Investigating
Inky engineers are investigating the issue.
Posted Mar 09, 2022 - 16:20 UTC
This incident affected: Dashboard Services (Dashboard Services US).