Query Durations are Up for our Dallas data center
Incident Report for Keen
Resolved
All systems are stable. We are humbled by the patience and professionalism of our community. Thank you for reminding us why we do this every day.
Posted Feb 19, 2015 - 20:50 PST
Update
Query response times have improved dramatically, and performance is almost back to steady state. We continue to monitor. We are almost out of the woods. Thank you for your patience.
Posted Feb 19, 2015 - 19:25 PST
Monitoring
Sorry for the delayed update. We have re-enabled deletes, and event writing is working correctly. Queries are still slower than normal but will continue to improve in speed over the next several hours. Thank you for your patience with this.
Posted Feb 19, 2015 - 16:36 PST
Update
Query performance has stabilized, and we continue to work on performance enhancements.
Posted Feb 19, 2015 - 15:11 PST
Update
We are still experiencing query instability for some customers. We continue to work towards a remedy!
Posted Feb 19, 2015 - 13:49 PST
Update
New data is now flowing in efficiently and is ready for querying within minutes. We identified that a chunk of yesterday's data did not get written to disk, and we are very sorry for that; it is our first data loss in 12 months. We are still experiencing query instability for some customers.
Posted Feb 19, 2015 - 12:39 PST
Update
Still experiencing query instability. Data collection is up but delayed. We are making progress on the backlog and are currently adding additional hardware to help with this effort.
Posted Feb 19, 2015 - 11:31 PST
Update
We are disabling delete API calls in our remaining DC as an emergency measure while we continue to work on query durations.
Posted Feb 19, 2015 - 10:49 PST
Update
We're continuing to work on our query durations. Specifically, we're making changes to our post-write optimization systems, which have been disabled and need to work through their backlog.
Posted Feb 19, 2015 - 10:03 PST
Update
We are continuing to work on our write path and are seeing high query durations in one DC. We are disabling delete API calls in one DC as an emergency measure.
Posted Feb 19, 2015 - 09:33 PST
Update
Query durations continue to be high in one of our DCs. We've shifted traffic between our data centers to try to even out the durations. We're also investigating increasingly high query times in our storage layer, which we suspect may be related to changes made to our write path during yesterday's event. We're currently working on that problem and preparing to add more capacity later today!
Posted Feb 19, 2015 - 08:53 PST
Identified
We've deployed some changes to balance the query load more evenly and are working on increasing capacity. Query durations are improving as a result of this effort, but we still have work to do.
Posted Feb 19, 2015 - 07:56 PST
Investigating
We are investigating increased query durations. Not all customers are affected. Writes are not impacted.
Posted Feb 19, 2015 - 06:31 PST
This incident affected: Stream API and Compute API.