01:35 PM - Enabled maintenance mode for 45 minutes to upgrade the internal database.
02:36 PM - API is back to normal.
2021-10-05 UTC
01:50 AM - Partial outage on api.gridly.com; some internal services were down for 2-3 minutes.
02:40 AM - Deployed hotfix to production. API is back to normal.
Root cause analysis (RCA)
Since Sep 29, 2021, we had seen unexpected downtime in an internal service, the license service (plan, seat & subscription). At that time, we scaled out to improve availability.
Our database was under pressure from high traffic. It kept working, but its operations and response times were very slow, which is why some API endpoints showed degraded performance.
We scaled up and upgraded the hardware specification on the database side to help reduce the workload and impact.
From performance insights and error tracking, we identified the root cause: blockers during task processing.
After identifying the root cause, we deployed a hotfix that optimizes some of the async logic.
Everything is back to normal. We will continue monitoring for this kind of issue over the next few days.
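For readers unfamiliar with how a blocker in async task processing can degrade response times, here is a minimal sketch. It is an illustration only: the RCA does not state which language, framework, or code paths were involved, so Python's asyncio and the names `blocking_task`, `heartbeat`, and `ticks_during_work` are all assumptions made for this example. It counts how many times a background heartbeat gets to run while a slow synchronous call is in progress, first when the call runs directly on the event loop and then when it is offloaded to a worker thread.

```python
import asyncio
import time


def blocking_task(seconds: float) -> str:
    # Hypothetical stand-in for a slow synchronous call (for example, a heavy query).
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    # Appends a timestamp after each interval; it can only run when the event loop is free.
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def ticks_during_work(offload: bool) -> int:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks, 0.01, 5))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    if offload:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        await asyncio.to_thread(blocking_task, 0.3)
    else:
        # Run the blocking call directly: the event loop is stalled until it returns.
        blocking_task(0.3)
    end = time.monotonic()
    await hb
    # Count the heartbeat ticks that happened strictly before the work finished.
    return sum(1 for t in ticks if t < end)


if __name__ == "__main__":
    print("ticks while blocked:  ", asyncio.run(ticks_during_work(offload=False)))
    print("ticks while offloaded:", asyncio.run(ticks_during_work(offload=True)))
```

In this sketch the heartbeat makes no progress while the blocking call holds the event loop, and runs normally once the call is offloaded. Whether the actual hotfix took this approach is not stated in the RCA.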
Posted Oct 05, 2021 - 05:01 UTC
Resolved
An infrastructure workaround has been implemented and the service is operating normally. We have identified the cause of the issue and are working towards a resolution. We will provide a post-mortem shortly.
Posted Oct 04, 2021 - 15:19 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 04, 2021 - 14:36 UTC
Update
We will keep maintenance mode enabled for the next 15 minutes. We will provide updates as necessary.
Posted Oct 04, 2021 - 14:16 UTC
Update
We're enabling maintenance mode to work on internal services. This unplanned maintenance will begin in less than 30 minutes.