After a few hours of monitoring, we can confirm that the issue has been resolved. The degraded performance was caused by a lack of resources on one of our application servers: the node's capacity was quickly exhausted by a temporary increase in traffic.
Unfortunately, an incorrect configuration in our auto-scaling procedure prevented additional resources from being provisioned and the service from scaling up.
The issue has been resolved and should not recur. We have also fixed the auto-scaling configuration, and the new setup is currently being tested.
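For readers curious about the mechanism, the sketch below shows target-tracking-style scale-up logic of the kind our procedure should have applied: when average utilization exceeds a target, the fleet grows proportionally. It is illustrative only; the metric, threshold, and limits are hypothetical, not the actual values from our configuration.

```python
import math

def desired_capacity(current_nodes: int, avg_cpu_pct: float,
                     target_cpu_pct: float = 70.0,
                     max_nodes: int = 10) -> int:
    """Target-tracking scaling: grow the fleet until average
    utilization falls back to the target (hypothetical values)."""
    if avg_cpu_pct <= target_cpu_pct:
        return current_nodes  # within target, no scale-up needed
    # Scale proportionally to the utilization overshoot.
    scaled = math.ceil(current_nodes * avg_cpu_pct / target_cpu_pct)
    return min(scaled, max_nodes)

# Example: 4 nodes at 95% average CPU against a 70% target
# -> ceil(4 * 95 / 70) = 6 nodes requested.
```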
We sincerely apologize for any inconvenience caused by this issue and appreciate your understanding in this matter.
Posted Dec 12, 2019 - 12:21 CET
A fix has been implemented and we are monitoring the results.
Posted Dec 12, 2019 - 07:42 CET
We have identified the source of the issue. Our SRE team has stabilized the service, and we are now working on long-term improvements to prevent such situations in the future.
Posted Dec 12, 2019 - 06:39 CET
Communication with the API is currently impacted by anomalous behavior in one of our European clusters. We are investigating the issue.