We know our districts experienced a significant service degradation beginning on December 18th and extending to the morning of December 20th. More details are available in the real-time event tracking documentation. During this event, Eduphoria immediately engaged response teams to begin mitigating the degradation. The issue was identified quickly, and we are confident in our team's remediation efforts to prevent further exposure to this issue.
It is important to note that no data was compromised or suspected of being compromised during this process. Our incident response plans did not need to be implemented. We considered implementing our Business Continuity Plan; however, we estimated that the issue would be rectified within the timeframe needed to implement the continuity plan.
In summary, several minor issues led to the degradation, including high CPU loads on the front-end servers, higher-than-normal user activity, and the number of database call timeout connections allowed. Initial efforts from the infrastructure teams lessened the issue; however, the engineering teams needed to adjust code bases to fully mitigate the degradation.
Unfortunately, this event occurred at a very inconvenient time in the school year. As a software-as-a-service company, our team does expect some issues of this kind. Our goal is to minimize these events and to respond effectively when they do occur. Moving forward, expect quarterly communication from our leadership team on the status of our service offerings and planned enhancements.
Thank you for your patience and trust as we fine-tune our tools and internal processes. Please feel free to reach out to support or our leadership team with any further questions or comments.
Tim Smith, Co-Founder and CTO
Teal Shalek, COO
Colin McDorman, CEO
Completed and scheduled improvements by our teams since December 18th:
- Improvements to the behavior of queued tasks, nightly tasks, and scheduled tasks. Error rates during task management have been reduced, which reduces the number of retries. Completed 1/31/2019
- Restructuring of the infrastructure to increase performance. New application servers now support higher CPU, memory, and network load on fewer servers, resulting in a decreased number of allowable connections from the application servers to the database servers. Completed 1/30/2019
- Aware views improvement to optimize queries. Initial testing indicates the improvement reduces the time it takes to load data views by nearly 50% in many cases. Scheduled release 2/12/2019
- Database read server improvements to reduce load on write servers. The applications now monitor available read servers more actively. Load balancing between read servers is smarter: the applications look at the active connections in each pool before assigning a task, which helps balance load faster when a new server comes online. Instead of throwing exceptions when a read server goes offline, the applications look for active read servers again and re-assign a valid connection to the task. Scheduled release 2/6/2019
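
For readers who want a concrete picture of the read server behavior described in the last item, the following is a minimal illustrative sketch in Python, not Eduphoria's actual implementation. It assumes a simple "fewest active connections" rule for choosing a pool and treats a ConnectionError as the signal that a read server has gone offline. The names ReadPool, ReadBalancer, and NoReadServerAvailable are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


class NoReadServerAvailable(Exception):
    """Raised only when no read server is left to try."""


@dataclass
class ReadPool:
    """Hypothetical connection pool for one read server."""
    name: str
    online: bool = True
    active: int = 0  # connections currently in use

    def run(self, task):
        # Count the connection as active while the task is running.
        self.active += 1
        try:
            return task(self.name)
        finally:
            self.active -= 1


class ReadBalancer:
    """Assigns each task to the online pool with the fewest active connections."""

    def __init__(self, pools):
        self.pools = list(pools)

    def pick(self, exclude=()):
        candidates = [p for p in self.pools
                      if p.online and p.name not in exclude]
        if not candidates:
            raise NoReadServerAvailable("no active read servers")
        return min(candidates, key=lambda p: p.active)

    def run(self, task):
        failed = set()
        while True:
            pool = self.pick(exclude=failed)
            try:
                return pool.run(task)
            except ConnectionError:
                # Assumption: a ConnectionError means the server went offline.
                # Mark it and re-assign the task instead of surfacing the error.
                pool.online = False
                failed.add(pool.name)
```

In this sketch, a newly added pool starts with zero active connections, so it is chosen first and picks up load quickly. An exception reaches the caller only when every read server has been tried and found offline.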