The incident was triggered by our S3 streaming pipeline being unable to keep up with bursts in write event volume, which created a backlog of events waiting to be processed and uploaded to S3.
The problem was exacerbated by the fact that once the system is in a degraded state, some events may be uploaded to S3 but not recorded in our bookkeeping system. When a backlog exists, the system therefore checks for duplicate events before uploading new data. This duplicate check is expensive: it requires reading data back from S3 every time new events are written, which placed additional load on an already strained system.
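The interaction between the bookkeeping gap and the expensive duplicate check can be sketched as follows. This is a minimal illustration, not our actual pipeline code: `store` is an in-memory stand-in for S3, `ledger` for the bookkeeping system, and all names are hypothetical.

```python
class BacklogUploader:
    """Sketch of the dedup-before-upload path described above."""

    def __init__(self):
        self.store = {}     # stand-in for S3: event id -> payload
        self.ledger = set() # stand-in for bookkeeping: ids believed uploaded

    def upload(self, event_id, payload, degraded=False):
        # In a degraded state the S3 upload can succeed while the
        # bookkeeping write is lost -- the failure mode from the incident.
        self.store[event_id] = payload
        if not degraded:
            self.ledger.add(event_id)

    def drain_backlog(self, events):
        """Upload backlogged (event_id, payload) pairs, skipping duplicates."""
        uploaded = []
        for event_id, payload in events:
            if event_id in self.ledger:
                continue  # cheap path: bookkeeping already confirms it
            if event_id in self.store:
                # Expensive path: a read back from "S3" is needed to
                # detect the duplicate, since bookkeeping missed it.
                self.ledger.add(event_id)
                continue
            self.upload(event_id, payload)
            uploaded.append(event_id)
        return uploaded
```

For example, an event uploaded during degradation (`degraded=True`) is absent from the ledger, so draining the backlog must fall back to the expensive store lookup to avoid re-uploading it.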
To resolve the incident we added capacity by increasing the number of processes, and changed the system to process a smaller batch of events at a time to prevent resource starvation.
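The smaller-batch change amounts to capping how many events a worker takes in one pass so that a single burst cannot monopolize memory or workers. A minimal sketch, assuming a hypothetical `batch_size` tuning knob:

```python
from itertools import islice

def batched(events, batch_size):
    """Yield fixed-size batches from an event iterable.

    Capping batch_size bounds per-pass resource usage, which is the
    resource-starvation fix described above.
    """
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each worker then processes one batch at a time, yielding back to the scheduler between batches instead of attempting to drain the whole backlog in a single pass.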