Degraded API performance and connection timeout
Incident Report for Gridly
Postmortem

Timeline on 2021-08-29 UTC

  • 05:13 AM - Degraded API performance and timeout connections
  • 05:26 AM - Timeout connections are under control. API is back to normal
  • 06:15 AM - Deployed hotfix and improvements to production cluster.

Root cause analysis (RCA)

  • Internal endpoint rotated IP addresses unexpectedly
  • Our edge proxy servers had timeout connections on networking between services because of expired IPs
  • Reload proxy layer for re-init connections to latest IPs
  • Deploy hotfixes & improvements on DNS caching layer for actively update changes from internal endpoints
Posted Aug 30, 2021 - 04:18 UTC

Resolved
We're experiencing an elevated level of API timeout. This issue has been resolved on infrastructure side.
Posted Aug 29, 2021 - 05:15 UTC