Slow Query Performance

Incident Report for Keen

Postmortem

We have written a full postmortem on our blog. We're very sorry for the slow queries and any inconvenience. We take the performance of our service very seriously and are working hard to ensure this won't happen again.

In summary:

An optimization layer was turned off for analysis and review.
Several patterns of unexpected and increased usage caused queries to run slowly enough to begin stacking up.
Tuning the processing and storage layers (JVM settings, etc.) was inadequate.

To resolve the issue, we re-enabled the optimization layer, which allowed us to clear the query backlog and restore normal performance.

In response, we've implemented better monitoring around utilization and improved visibility into rate limits, and we are working on better notifications and on gathering more information.

If you'd like more details, please read our full post or contact us!

Posted Dec 10, 2014 - 15:08 PST

Resolved

All systems have returned to normal operation after over 12 hours of monitoring! Thank you for your patience!

Posted Dec 09, 2014 - 07:50 PST

Update

We are continuing to monitor the situation and hope to mark everything as fully operational soon. There have been a few brief periods of performance degradation, which we are still investigating.

Posted Dec 08, 2014 - 16:03 PST

Monitoring

Query performance is continuing to improve now that we have deployed our fixes. We are still monitoring performance and taking additional steps to ensure long-term stability.

Posted Dec 08, 2014 - 12:47 PST

Update

At present, query performance has returned to reasonable levels but is still somewhat slower than normal. We are optimistic that over the next few hours performance will settle back to pre-incident levels. We are continuing to monitor and will update periodically.

Posted Dec 08, 2014 - 12:39 PST

Identified

We have identified the underlying causes of the degraded query performance and are working towards a resolution.

Posted Dec 08, 2014 - 08:21 PST

Update

Queries are currently slow and causing timeouts. We are still investigating and hope to share some good news soon.

Posted Dec 08, 2014 - 08:11 PST

Investigating

Some customers are intermittently experiencing slower-than-normal query performance. We are investigating and currently have no ETA.

Posted Dec 08, 2014 - 03:54 PST