Degraded performance on API requests
Incident Report for Gridly
Postmortem

Impact

  • Major incident
  • Degraded performance on api.gridly.com; Gridly was running very slowly.

Timeline

2021-10-04 UTC
  • 01:09 PM - Degraded performance on api.gridly.com
  • 01:35 PM - Enabled maintenance mode for 45 minutes to upgrade the internal database
  • 02:36 PM - API is back to normal.
2021-10-05 UTC
  • 01:50 AM - Partial outage on api.gridly.com; some internal services were down for 2-3 minutes
  • 02:40 AM - Deployed a hotfix to production. API is back to normal.

Root cause analysis (RCA)

  • Since Sep 29, 2021, we had been experiencing unexpected downtime in an internal service related to licensing (plans, seats and subscriptions). At that time, we scaled the service out to improve high availability.
  • Our database was under pressure from high traffic. It kept working, but database operations and response times became very slow, which is why some API endpoints experienced degraded performance.
  • We scaled up the database and upgraded its hardware specification to reduce the workload and the impact.
  • Using performance insights and error tracking, we identified the root cause: blocking operations during task processing.
  • After identifying the root cause, we deployed a hotfix that optimizes some of the asynchronous processing logic.
  • All services are back to normal. We will continue monitoring for this kind of issue over the next few days.
Posted Oct 05, 2021 - 05:01 UTC

Resolved
An infrastructure workaround has been implemented and the service is operating normally. We have identified the cause of the issue and are working towards a resolution. We will provide a post-mortem shortly.
Posted Oct 04, 2021 - 15:19 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 04, 2021 - 14:36 UTC
Update
We will keep maintenance mode enabled for the next 15 minutes. We will provide updates as necessary.
Posted Oct 04, 2021 - 14:16 UTC
Update
We're enabling maintenance mode to work on internal services. This is unplanned maintenance, expected to last less than 30 minutes from now.
Posted Oct 04, 2021 - 13:35 UTC
Investigating
We are currently investigating this issue.
Posted Oct 04, 2021 - 13:09 UTC
This incident affected: API Requests.