
February 2017

S3 Streaming Delayed
The backlog has been processed and S3 streaming is now back to normal. We appreciate your patience and apologize for the inconvenience!
Feb 17, 07:18-09:36 PST
S3 Streaming Delayed
The backlog has been processed and S3 streaming is now back to normal. We appreciate your patience and apologize for the inconvenience! Happy Sunday/Monday!
Feb 12, 15:22-16:10 PST
Explorer Down
Annnnnd Explorer is back to normal. We apologize for the inconvenience!
Feb 6, 09:54-10:18 PST

January 2017

Event write delay
We've processed the backlog, and events should be available for querying. Sorry about the delay!
Jan 26, 11:38-14:14 PST
Elevated Query Durations
The underlying issue with our hosting provider has been resolved, and query durations have returned to normal. We appreciate your patience!
Jan 9, 09:25-12:07 PST
Write Path Issues
The failed host has been restored and is back in rotation. This incident is now resolved. Thanks for your patience.
Jan 8, 15:54-18:12 PST

December 2016

Cached query processing backed up.
Cached queries are humming along, and should continue to do so. We've added additional tracking to help us identify this more quickly if it gets backed up again.
Dec 14, 15:30-19:48 PST

November 2016

S3 Streaming delayed
The S3 streaming backup has been resolved, and all delays should be caught up within the next 30 minutes.
Nov 22, 21:33-22:39 PST
Cached query execution delayed. All other queries are functioning normally.
Cached queries should be performing normally since around 11 pm PST Friday night. Please let us know if you continue to see missing results.
Nov 18, 13:55 - Nov 19, 09:10 PST
Inconsistent query responses
Queries should be returning consistent results now. Again, no data was lost. Please let us know if you continue to see inconsistent results.
Nov 15, 11:01-18:29 PST
Write Events Delayed
Write traffic has returned to normal and all systems are go. Enjoy!
Nov 7, 13:56-16:14 PST
Event Writes Delayed
The backlog of events has cleared. It will take a moment for the status page to reflect this. Thank you for your patience.
Nov 7, 12:16-13:15 PST

October 2016

Event write processing is delayed
Event writes are caught up and processing normally again. Thanks for your patience!
Oct 25, 10:27-13:53 PST
DNS Resolution Issue for api.keen.io
DNS resolution issues have been fully resolved and our service is now back to normal. Thanks for your patience.
Oct 21, 09:32-15:20 PST
Extractions under load. Some queries with extraction analysis type timing out.
We have made further enhancements to our extraction layer and reduced the number of extraction failures to a VERY small number. We will continue to work on making further improvements to our extraction layer and bring this number to 0. Once again, this only impacts customers who run extraction queries; other features of the Keen API, including Data Collection and Queries with other analysis types, are healthy and serving traffic.
Oct 19, 12:38 - Oct 20, 11:27 PST
Event Writes are delayed
All operations are back to normal. Thank you for your patience!
Oct 18, 15:17-17:49 PST
Slow response times for read queries
The analysis API is back to normal, thanks again for your patience!
Oct 17, 08:16-12:40 PST
Timeouts Publishing Events to Keen
This incident has been resolved.
Oct 4, 10:53-16:03 PST
[Scheduled] Write Path Experiment
The maintenance is complete. Thank you for your patience.
Oct 3, 21:59-23:15 PST

September 2016

Event Writes are delayed. Some reads/writes timing out.
All operations are back to normal. Thank you for your patience!
Sep 22, 08:15-13:18 PST

August 2016

Intermittent Network Connectivity Issues
And things are back to normal. It turns out that emergency maintenance was required for some critical network components in one of our data centers, and that the intermittent network connectivity issues we experienced were a result of said maintenance. We apologize for the inconvenience and thank you for your patience!
Aug 23, 13:49-14:17 PST
Slow Queries and delayed writes
All events continue to be written quickly. Thank you again.
Aug 15, 11:46-14:37 PST
Reads are slow; writes delayed
Everything is healthy once again, thank you for your patience.
Aug 13, 10:35-12:34 PST
Website unavailable
This incident has been resolved.
Aug 10, 17:40-18:12 PST
S3 Streaming delayed
Everything continues to look good with our S3 Streaming Service.
Aug 10, 08:27-11:31 PST
Delayed Writes and slower queries
Our event backlog has been processed. No events were lost. We've made some code changes to make our service more resilient in the face of higher load. Thank you.
Aug 9, 08:49-16:57 PST
Occasional slow queries
Our maintenance and observation have been completed. Both APIs appear stable and are performing normally.
Aug 8, 12:10-18:15 PST

July 2016

Data Collection API requests are failing
Everything looks to be stable from our earlier changes. Thank you for bearing with us.
Jul 28, 16:13-16:55 PST
Keen.io is slow to load
We have resolved the database performance issue. Thanks for your patience
Jul 20, 11:22-15:18 PST
Event writes are delayed
This incident has been resolved
Jul 19, 13:54-17:06 PST
www.keen.io slow to load
We've resolved the issue affecting the load times on our website. We will continue to monitor for further issues.
Jul 18, 19:38-20:09 PST
www.keen.io slow to load
This incident has been resolved.
Jul 18, 10:46-12:30 PST
[Scheduled] Write Path Maintenance
The maintenance is complete. Thank you for your patience.
Jul 12, 22:32-22:53 PST
[Scheduled] Write path experiment
Experiment completed; all services back to normal. We got some good data! Thanks for your patience.
Jul 2, 22:02-22:24 PST

June 2016

Events are delayed
All services are up and running normally. Thanks for your patience as we have worked through the issues. The team is going to put together a post mortem and make sure it gets to our status page.
Jun 20, 07:47 - Jun 21, 08:04 PST
Slow Query Times and Event Write Delays
Query response times and event write delays spiked over a 30-minute period but have already recovered. We're looking into the root cause of the spike. We apologize for the temporary slowdown; query response times and event write delays should now be back to normal.
Jun 17, 09:44 PST
Slow Query Times
The issue is resolved and things have been stable for the last hour. We believe the latency increase was caused by increased load to our database layer. Our engineering team is still investigating the root cause and we have some leads. We will update again when we have more details.
Jun 13, 23:30 - Jun 14, 01:48 PST
Inconsistent Query Results
This issue is now resolved. Query results for the affected time period are back to their normal state. If you have any questions about this (or about anything else), as always, please don't hesitate to reach out to us via our standard support channels: https://keen.io/support/. Thanks again for your patience as we resolved this issue.
Jun 9, 08:56 - Jun 12, 13:35 PST
Dropped Connections
The sudden increase in traffic we experienced has subsided and all services have returned to normal. We haven't seen any 502s or 504s in the last 30 minutes. This issue was entirely customer load related, and we are continuing to monitor traffic closely. We appreciate your patience and apologize for the inconvenience.
Jun 7, 10:01-10:46 PST

May 2016

Write and Read APIs Slow
We've implemented a few additional configuration changes that have allowed us to return response times to within their normal ranges. We're still working on the underlying instability in our database cluster and will continue to do so until we're able to address the root cause. We feel the changes we've made today will help mitigate further instability while we do so. We apologize for the delays and thank you again for your patience!
May 19, 09:30-17:20 PST
Collection and Analysis APIs Slow
We experienced significant instability in our database cluster today which caused delays in both our Data Collection and Data Analysis APIs. We've taken steps to mitigate the instability and have stabilized both APIs. While we are stable right now, we're still working to fix the root cause and will continue working until we've remedied the issue completely. All of that being said, we're happy to report that no data was lost as a result of today's instability :) We appreciate your patience and apologize for any inconvenience these delays have caused. Please feel free to reach out with any questions or concerns you might have and we will be happy to help. Have a great evening!
May 18, 08:44-17:02 PST
Connection Issues
Our API request times have stabilized, and service is back to normal. Thanks for your patience during this incident.
May 13, 12:08-18:13 PST
Connection issues
Our load balancers have stabilized. All systems are operating normally.
May 12, 11:42-15:50 PST
Event writes slow and query durations high
S3 streaming is now caught up as well. All systems operating normally.
May 11, 03:58 - May 12, 03:20 PST
Event writes slow
The backlog of events has been processed and all the things have returned to normal. We experienced high latency in our write path which slowed inbound event processing to a crawl for a few hours. Inbound events were still accepted during this time, and we experienced no performance degradation as far as API event ingestion is concerned. Any queries executed during this timeframe were likely off by a small amount, but should all now be returning up-to-date totals. We apologize for the inconvenience and wish you a pleasant start to your day.
May 5, 02:47-04:47 PST

April 2016

Duplicate Events
We wanted to let you know about a small and very unusual issue that occurred earlier this week. On Tuesday, April 26th, we identified a configuration issue in our write path that caused data duplication. Approximately 0.2% of events were duplicated between 04:10 and 17:30 UTC due to the misconfiguration. We noticed the issue at 17:25 UTC, immediately took action to fix the problem, and were able to fix it within minutes.

Based on how most people use our platform, this incident might not be noticeable at all. But if you did notice, you likely saw a small number of additional events during that timeframe which would have inflated your counts slightly. We're currently working on cleaning up the duplicated data and will get the deduplication completed as quickly as possible.

We apologize for the inconvenience and urge you to reach out with any questions or concerns you might have. Thank you as always for your patience, and for your business!
Apr 26
Reduced Performance on Event Writes and Queries
Things have been looking stable for the last couple of hours. We are continuing to investigate the root cause on the engineering side. We will post an update once we have more information.
Apr 18, 09:35-15:31 PST
Slow queries and extractions
Our queries and extractions look healthy now that the misbehaving node is being disciplined. It should not impact our API any longer. Sorry for the trouble.
Apr 15, 05:47-06:36 PST
API and website problems
We are not seeing any further problems with our website or APIs. Thank you for your understanding.
Apr 12, 22:59 - Apr 13, 00:01 PST

March 2016

Slow queries and write delay
Service levels have returned to normal, which means both reads and writes are good to go. We apologize for any inconvenience this may have caused and thank you for your patience. Have a great day!
Mar 30, 06:39-07:56 PST
Brief organization overview page disruption
Usage stats are now restored.
Mar 4, 13:37-15:38 PST
Brief Explorer UI interruption
We just experienced a brief interruption in service for our Explorer UI due to unscheduled maintenance. We have resolved the issue and all is back to normal. We apologize for any inconvenience!
Mar 3, 10:21 PST

February 2016

Slower Extractions
Our adjustment has restored our query/extraction times to expected levels. Thank you for your patience -- and thank you for reading dry status page messages.
Feb 22, 09:35-10:33 PST
Slow queries and write delay
The event backlog has been processed and all hosts in both the query and write paths have returned to normal. Thank you for sticking with us!
Feb 2, 21:52-22:49 PST

January 2016

Write Event Delay
This incident has been resolved.
Jan 20, 11:31-12:34 PST

December 2015

Delay in writing events
The increase in inbound event volume has subsided. Event writes have returned to normal and the backlog has been processed. We apologize again for any inconvenience this delay may have caused!
Dec 3, 04:05-04:44 PST

November 2015

Event writes slow
The write path is no longer experiencing any slowdowns. Thanks for your patience as we resolved this.
Nov 13, 10:45-13:46 PST
Organization Pages Down
We're good to go! Organization pages are back up!
Nov 6, 14:29-14:32 PST
Slow queries and occasional errors
Queries have remained normal. We are following up to prevent this from recurring.
Nov 4, 19:59-22:04 PST
High query durations
We have made use of more capacity to recover response times. Thanks so much for your patience!
Nov 3, 09:05-10:30 PST

October 2015

[Scheduled] Explorer Update
We are done! Saved queries are live! Check out the Explorer!
Oct 26, 10:04-12:45 PST
Concurrency Failures
Our failure rate has now recovered. The team is continuing to invest in improving performance and making sure that we are serving all valid queries.
Oct 21, 08:22-15:29 PST
Slow Reads and Concurrency Limits
The timed-out queries have all but vanished. This was likely a side-effect of decreased load on our systems. We are still looking into a real fix for the underlying issues, not all of which have been identified. We appreciate your patience, and apologize for any difficulty this may cause you.
Oct 20, 08:22-15:57 PST
Elevated error rates
We have fixed the issue. Thanks for your patience!
Oct 13, 14:10-17:03 PST
Elevated error rates
We have moved some traffic and reduced the number of errors on our API. Thanks for your patience!
Oct 6, 09:44-11:10 PST
Write event delay
Write event delays are now at reasonable levels. We will continue to address the longer-term question of how to react to new bursts of load for the future. I guess this is a "victim of your own success" kind of problem, no? Thank you for your patience once more.
Oct 2, 10:33-11:48 PST

September 2015

[Scheduled] Infrastructure upgrade by our hosting provider
The scheduled maintenance has been completed.
Sep 30, 00:00-06:00 PST
Event write delay
Event writes are caught up now. Delays for event visibility in queries should be back to the normal ~10 seconds.
Sep 8, 13:27-14:45 PST
Extreme Query Delays
Query durations are looking somewhat better and are fluctuating less. We will still investigate the underlying issues but for the time being, the analysis API seems stable.
Sep 7, 08:52-12:16 PST
Periodic Long Query Delays
Queries continue to do fine in our Dallas data center. Thank you once more.
Sep 5, 13:43-14:42 PST
Slow Query Times
Query times have normalized. We're working to make slowdown incidents less frequent (or non-existent?) in the future. Thank you for your ongoing patience.
Sep 4, 08:09-12:19 PST
Slow Queries
Everything is looking pretty swell. Thanks everyone.
Sep 3, 08:54-16:13 PST
Slow Query Times
We have fixed the problem in our database in Dallas. If this arises again, we will keep you wonderful people informed and take action again -- kind of like superheroes, really.
Sep 2, 10:38-11:15 PST

August 2015

Delays in writing events and slow queries
Both query times and write event delay are back to normal. We will continue monitoring.
Aug 30, 10:48-11:24 PST
Slow Queries and Extractions
This incident has been resolved.
Aug 26, 13:05-16:16 PST
Write latency for events
We've finished repairing the write event latency problem. Thank you for your patience.
Aug 19, 10:15-17:41 PST
Minor Query Inconsistency
Background processes tackling the temporary data inconsistency are progressing well. Queries might encounter small amounts of inconsistency as the repairs take place over the weekend. We apologize for any confusion this has caused.
Aug 7, 15:59-16:57 PST
Slow queries
This incident has been resolved.
Aug 5, 11:38-13:21 PST

July 2015

Delayed writes
We had a short event backup as a result of a minor storage issue. This issue has been resolved and events are being processed as usual. Sorry for the delay!
Jul 30, 05:34-05:55 PST
Degraded performance - Data Collection and Analysis
We are back to normal response times. Thank you for your patience.
Jul 22, 22:20 - Jul 23, 00:13 PST
Slow Query Duration
We have determined the slowness from today was a result of extra load on our query infrastructure. Thank you for your patience.
Jul 22, 11:40-15:09 PST
Delays in Event Writing
Incoming event availability is restored. We are back to recording the events at normal speed.
Jul 20, 13:49-14:38 PST

June 2015

Slow Query Durations
This incident has been resolved.
Jun 29, 07:43 - Jun 30, 07:44 PST
Slow query responses
This incident has been resolved.
Jun 9, 19:00-19:14 PST
Slow query responses
This incident has been resolved.
Jun 9, 15:20-16:59 PST
Slow queries and elevated number of query timeouts
Backlog is clear. Queries should be performing normally now.
Jun 1, 15:12-20:00 PST

May 2015

[Scheduled] Service Migration
Migration complete! Success! Have a nice day :)
May 27, 12:00-16:35 PST
A small bug in our Keen enrichment add-on for user agent data has been fixed!
We have deployed a small bug fix to Keen's data enrichment add-on for user agent data that might cause you to see some changes to your data. The bug was causing Windows Mobile devices to be misreported as Android devices for the past 6 months. As a result, counts on the number of users across different devices prior to May 26th at 4pm PDT may be off. This happened because our underlying database of devices, OSes, and browsers was out of date! (Ohnoes!) We've since updated the library that provides us (and you) with user agent information.

This is now fixed and all counts on the number of users across devices will be accurate. This requires no changes to how you send us data. Unfortunately, our systems are unable to retroactively update user agent data. If you require this historical data to be updated and accurate, please reach out to us.

We apologize for any inconvenience this may have caused. Please don't hesitate to reach out to us with any other questions. A big thank you to our customers for finding and reporting this bug! <3
May 27, 11:08 PST
Delay in writing events
The event backlog is all clear, and with that, this incident is resolved. We apologize for any inconvenience!
May 4, 11:08-12:20 PST

April 2015

Delay in writing events.
We've completed maintenance and the backlog is clear. All fixed up!
Apr 28, 09:34-09:48 PST
Slow incoming event processing times
This incident has been resolved.
Apr 22, 12:16-16:03 PST
InvalidTimeZoneError being returned for queries with numeric timezones
We've successfully rolled back the change that caused this issue. We sincerely apologize for the inconvenience!
Apr 20, 13:06-13:44 PST
Delay in Writing Events
All is steady; events are ready for query within seconds of capture. Thanks for using Keen IO!
Apr 14, 13:15-14:48 PST
S3 Streaming Uploads are Delayed
The S3 upload backlog has been cleared and S3 streaming is now back to normal. We apologize for any inconvenience this may have caused and as always appreciate your patience!
Apr 13, 05:02-12:36 PST
Delay in writing events.
The backlog of delayed events has been processed. Events are now flowing in a timely fashion.
Apr 12, 10:47-11:07 PST
Interrupted Connections
Load balancers are all fixed! Thank you for your patience as we work through a big list of fixes and improvements!
Apr 2, 14:47-15:18 PST
Query failures for some users and delayed writes for some events.
Everything looks healthy. Sorry for the interruption!
Apr 2, 08:53-09:43 PST
Slow Query Performance
Query load has subsided and average duration has dropped to acceptable levels. Thanks for your patience!
Apr 1, 13:58-17:30 PST

March 2015

Query Service Errors
We're back up and running. Our internal load balancer (HAProxy) got stuck in an ornery state, and it took us a while to realize it was the load balancer instead of the actual services causing the errors. Sorry for the intermittent failures!
Mar 31, 14:42-16:07 PST
Analysis temporarily unavailable
We’ve identified and solved the problem. Analysis queries have returned to normal operation.
Mar 27, 12:10-12:18 PST
[Scheduled] Configuration maintenance
This is complete.
Mar 27, 10:42 PST
[Scheduled] Configuration maintenance
Closing this.
Mar 27, 10:42 PST
Slow query durations
Query performance is much more stable and we've identified further areas for improvement in the coming days. Thanks for your patience!
Mar 25, 11:17-17:06 PST
Slow queries
This issue has been resolved. We are still investigating root cause.
Mar 23, 11:11-11:22 PST
Delay in processing events
The backlog has been cleared and events are being processed normally.
Mar 13, 14:57-15:23 PST
Query Durations are High
Query times have improved after deploying additional capacity.
Mar 9, 10:12-11:06 PST
Brief Delay in Event Writing
The backlog has been processed.
Mar 7, 06:10-06:28 PST
Query Durations are High
Query durations have returned to normal.
Mar 5, 02:19-03:48 PST
Event processing is slow
The backlog is clear, and we are back to normal levels. We've identified the root cause and have a fix in the pipeline to prevent this from happening again.
Mar 4, 17:33-18:43 PST
[Scheduled] Maintenance on our storage layer
The scheduled maintenance has been completed.
Mar 3, 16:15 PST
Query durations are high.
We've made significant adjustments to our query execution this morning, increasing capacity by 30% and distributing query execution across multiple data centers. At present our query durations have leveled out and look much more consistent.
Mar 3, 07:13-08:40 PST
[Scheduled] Storage layer maintenance
The scheduled maintenance has been completed.
Mar 2, 18:07 PST
Query duration increase
Query durations have returned to normal.
Mar 2, 14:54-15:51 PST

February 2015

Unplanned maintenance: Temporary Delay in Writes
We have completed maintenance and all events should be available for querying.
Feb 27, 13:07-13:58 PST
Write backlog
We've replayed the backlog and all events are now available for querying!
Feb 27, 08:22-11:00 PST
Delay in making events available for queries
All the writers have caught up and the backlog has been drained.
Feb 24, 17:18-19:00 PST
High query durations
We've identified a problem with our query durations and have implemented a solution. Query durations should now have returned to normal.
Feb 23, 12:15-13:00 PST
Query Durations are Up for our Dallas data center
All systems are stable. We are humbled by the patience and professionalism of our community. Thank you for reminding us why we do this every day.
Feb 19, 06:31-20:50 PST
Increased query latency
We’ve brought the query latency back to normal levels.
Feb 19, 00:16-03:49 PST
Partial outage writing events
All incoming events are being processed normally. We will continue to keep a close eye out and work on providing more detailed information.
Feb 18, 13:06-20:34 PST
Query Durations are Up
Query durations are back to normal. We are continuing to monitor and are working on improving Query stability.
Feb 17, 17:27-17:41 PST
Datastore servicing
Query durations have returned to normal levels. Now that the incident is resolved, we are continuing to deploy more capacity in an effort to prevent further slowdowns.
Feb 17, 12:55-15:03 PST
Query durations are high.
Query durations have returned to normal.
Feb 17, 11:28-11:37 PST
Query durations are up
We're seeing normal query durations after making adjustments to query scheduling.
Feb 17, 06:42-07:09 PST
Query durations are up
Query durations have returned to normal.
Feb 16, 16:51-18:50 PST
Increased query durations for some customers
Query durations have returned to normal levels across the board for all Keen customers.
Feb 12, 09:22-14:47 PST
Delayed Query Availability of Events
We've completed the maintenance and the time between event writes and their availability for querying has returned to normal.
Feb 9, 12:47-13:08 PST
Quick Maintenance Will Cause Delays in Event Querying Availability
We've completed the maintenance and the small backlog created during this maintenance has been processed.
Feb 9, 08:54-09:28 PST
Delay in Writing Events
We have completed working through the small backlog of delayed events and all events are available for querying.
Feb 4, 08:54-09:01 PST
Slow query responses
Query durations have returned to normal.
Jan 31, 23:10 - Feb 1, 01:37 PST

January 2015

Temporary Read Inconsistency
Queries should be back to normal. If you are experiencing any unexpected null properties, please feel free to contact us and we will assist you.
Jan 30, 10:28-14:17 PST
Queries are failing
Queries should be working normally again. We apologize for the inconvenience.
Jan 30, 01:49-02:13 PST
Write events delayed
The delay in event writes has been fixed. Again, no data was lost.
Jan 28, 17:04-17:17 PST
Slow query responses
Queries have been performing at normal speeds since 18:00 PST. We are continuing to investigate the root cause.
Jan 27, 09:02-19:54 PST
Datacenter Network Outage
We've recovered from a brief network outage in one of our multiple datacenters. We apologize for temporary slow writes, slow queries, and website downtime that impacted some of our customers.
Jan 22, 20:57 PST
Slow query responses
We've identified the cause of the issue and locked it down. Query performance is back to normal and we will be making further improvements to head off this type of outage going forward. We apologize again for the inconvenience of this outage.
Jan 20, 17:25 - Jan 21, 00:33 PST
Slow Query Response Times
Response times on the API have recovered and we are continuing to investigate.
Jan 9, 08:40-10:06 PST
High Query Latency
Query latency has recovered, and our investigation into the root cause is ongoing.
Jan 8, 12:31-13:31 PST
Slow writes
This incident has been resolved.
Jan 6, 11:56-12:13 PST

December 2014

Increase in Query Durations
Query durations have returned to normal. The underlying cause was a failure in a portion of our query queues that caused an increase in latency and a premature failure of some outstanding read requests. The affected services have been disabled and we'll use the metrics we accumulated to diagnose the failure before proceeding. We apologize for the inconvenience!
Dec 30, 14:27-14:44 PST
Slow Query Performance
All systems have returned to normal operation after over 12 hours of monitoring! Thank you for your patience!
Dec 8, 03:54 - Dec 9, 07:50 PST
Poor Query Performance and Increased Timeouts
We've completed isolating specific workloads that were causing problems and things are looking speedy again. We apologize for the unpredictable query performance today.
Dec 7, 18:30-20:18 PST
Query Slowdown
This incident has been resolved.
Dec 7, 13:22-14:24 PST
Query Slowdown
Query speeds are looking normal again. We apologize for the disruption in your queries!
Dec 7, 11:26-11:52 PST
Delayed Writes and Slow Query Performance
We've caught up with the backlog and all events have been written. We're sorry for the trouble and are working on a postmortem.
Dec 4, 10:30-11:34 PST

November 2014

Delay in processing some events.
We have deployed a fix for the issue and all backlogged events have now been processed. This problem began at around 7:00 PM Central time and was cleared at around 8:50 PM Central time. Problem: defensive code in our system blocked a larger-than-normal event, and a configuration change had to be deployed to our event processing systems. This took approximately 30 minutes to track down and another 30 minutes to configure and deploy. We're really sorry for the trouble!
Nov 27, 18:00-19:00 PST
Slow query response
We have resolved the issues affecting query performance and the analysis API should be back to normal!
Nov 24, 09:20-14:41 PST
Intermittent API errors
Unscheduled maintenance on our infrastructure should resolve these intermittent API errors.
Nov 18, 16:56-17:50 PST
Intermittent Service Errors
We are back up and running; an intermittent networking issue caused some instability.
Nov 17, 19:04-20:15 PST
keen.io site is sometimes returning 500 Internal Server Error
We identified and fixed an issue with one of our web servers. All requests should be working properly again.
Nov 5, 12:05-12:12 PST

October 2014

Some events aren't being made available for querying immediately
The issue with delayed events being available for querying has been resolved, along with the underlying root cause. All systems operational!
Oct 10, 02:09-12:37 PST
Increased execution times for some queries.
Query execution has been stable for many hours. We will continue to monitor throughout the week.
Oct 6, 09:15-15:20 PST
Data Center Maintenance Today
This maintenance has been completed. There were no interruptions to service.
Oct 1, 07:51 - Oct 2, 08:36 PST

September 2014

Some events aren't being made available for querying immediately
We've identified the issue as stemming from a bad piece of data in one of our queues. The data originated from a non-customer-facing internal process that has been shut down. We'll fix that process so it can't do that again. In the meantime, we've removed that from blocking progress and everything looks normal now. We'll continue to monitor the situation but this should be resolved. Apologies for the inconvenience.
Sep 22, 14:18-14:41 PST
Slow down in data being made available for querying
As of 5:50 PM Pacific time, the backlog of events was completely cleared. All systems are operating normally right now. Thanks for your patience.
Sep 12, 11:26-18:26 PST

August 2014

Issues recording events.
We've identified the root cause as latency between data centers, along with the components involved. Writes are back to normal and we will be making changes to decrease the impact of cross-DC latency in the future.
Aug 25, 15:04-15:32 PST
Brief slowdowns across the API
Our infrastructure provider experienced a small window of network instability between our production data centers. This occurred from 8:41 PM Pacific time until 8:52 PM Pacific time. Our monitoring notified us of increased API latencies. During investigation we pinpointed that the problem stemmed from packet loss on the internal network. While investigating, the issue resolved itself and API performance returned to normal levels. We're in discussions with our infrastructure provider to understand what happened and we'll continue to monitor our service.
Aug 23, 21:16 PST
Issues recording and querying events
All systems are back to normal. Sorry for the inconvenience!
Aug 13, 11:37-14:02 PST
Delay in new data being ready for querying
We have now caught up with the backlog of events and things are operating normally.
Aug 3, 09:15-10:13 PST

July 2014

Delay in new data being ready for querying
All systems are back to normal. Thanks for your patience!
Jul 30, 10:25 - Jul 31, 10:47 PST
Data not immediately queryable
This incident has been resolved.
Jul 14, 13:46-14:00 PST

June 2014

Data not immediately queryable
This issue has been resolved, and queries are fully up to date. The root cause was a bug in the write path that processes add-ons. A customer had provided an invalid value for an add-on that was not being guarded against and that caused an exception. A fix has been deployed.
Jun 25, 12:58-13:38 PST
Query instability
The instability has been resolved. A writeup of the incident will follow. Thanks for your patience!
Jun 12, 13:58-18:20 PST
Delay in events being available for query.
The delay in the write path has been corrected and events are now being correctly written. No event data was lost. Any events sent during this problem were merely queued for writing. Events written during the period from 7:29a PST to 7:53a PST may take a few minutes to show up for querying but are being written.
Jun 11, 07:49-07:59 PST

May 2014

Delay in writing new events
We've deployed a code fix that resolves this issue. All systems are functioning normally.
May 4, 21:50-22:05 PST

April 2014

Keen.io Website is Unavailable (API is fine)
This incident has been resolved.
Apr 29, 15:17-15:23 PST
Delay in writing new events
This incident has been resolved.
Apr 26, 14:39-15:05 PST
Keen IO and the “Heartbleed” OpenSSL Vulnerability
A security vulnerability in OpenSSL was announced yesterday. Like most other sites on the Internet, Keen IO was vulnerable to this. We patched our systems around 9 PM PST last night and completed a deployment of new keys for our SSL certificates at 2:30 PM PST today. During deployment of the new keys to the Keen IO infrastructure there was a brief outage, caused by a malformed certificate file, from approximately 2:01 PM PST to 2:06 PM PST. We quickly corrected the issue by reverting to the old keys, then deployed the new keys at 2:30 PM PST.
Apr 9, 15:27 PST

March 2014

Slow/Failing queries
Starting at 7:59pm Pacific some customers experienced slow/failing queries. We found and isolated the problem and as of 9:02pm Pacific it is resolved. Data collection was not affected. We apologize for any inconvenience.
Mar 31, 21:24 PST
Query Instability for Some Customers
Everything is back to normal. Queries are functioning normally across all data centers. Apologies for any inconvenience.
Mar 11, 11:02-15:07 PST

February 2014

System instability
The DDoS attack appears to be resolved at this time. If another one comes through we'll set up a new notification.
Feb 11, 20:00-20:55 PST
Intermittent Query Outage
We experienced instability on one of our query clusters, but it's resolved now.
Feb 4, 10:35-11:23 PST

January 2014

Query Performance Issues
We've returned service to normal and are currently investigating which queries caused these issues.
Jan 22, 13:38-15:44 PST
Website timeouts
This incident has been resolved.
Jan 21, 14:36-18:25 PST
New data not available for querying immediately
We've resolved the issue. We were under heavier load than normal and deployed additional resources to handle the load. As always, we're monitoring the situation and will be taking steps to prevent this sort of thing from being customer-impacting in the future.
Jan 21, 07:50-08:32 PST
Subset of queries returning stale data
At this time all maintenance is complete and our systems are operating normally.
Jan 16, 08:48-10:07 PST
Some query problems with newly added collections and event properties
This issue has been resolved for all but a few customers with unique data sets. We're in communication with each of them individually, and this is no longer a concern for new or other existing users.
Nov 21, 13:56 - Jan 8, 14:55 PST
Database Issues
All service has been fully restored.

Impact: Clients experienced elevated (single-digit) error rates from 05:00 PST to 06:40 PST for both data collection and analysis. During a 9-minute window from 05:35 PST to 05:44 PST, error rates spiked as a patch was applied to one of our database clusters. After the patch was applied, error rates dropped effectively to 0, and service was restored.

Cause: The errors originated from a database cluster that we now use mainly to store metadata. This cluster became stuck in a loop of electing then re-electing a primary replica node. We were able to pin down this behavior to a known bug for which there was a fix in a future version.

We sincerely apologize for the inconvenience this incident has caused. We understand the reliability you expect from our service and the trust you place in us. We will do better. A more detailed RCA (root cause analysis) will follow once that analysis is complete. -Josh
Jan 8, 05:14-09:09 PST

December 2013

Query instability
Any customer impact from this incident has been resolved.
Dec 22, 16:40 - Dec 23, 00:09 PST
New data not made available for querying immediately
All systems are functioning normally.
Dec 21, 11:35-12:38 PST
Data not available for querying immediately
This incident has been resolved.
Dec 19, 15:55-15:59 PST
Event indexer behind
The maintenance finished normally over night and all our systems are functioning normally now.
Dec 2, 14:43 - Dec 3, 09:01 PST

November 2013

Event indexer slower than normal (new architecture only)
This has been completely resolved.
Nov 26, 19:50-21:07 PST
Potential DDoS
All services are back online and operating normally. We are very sorry for the inconvenience that this outage has caused. This is completely unacceptable to us, and we're going to be working hard to fully restore your confidence in our service. Part of earning your trust is being transparent about service interruptions like these. What follows is a technical explanation of what happened.

At around 12:00 AM PST we were alerted to a massive spike in open connections to our load balancer - over 100x the normal amount. We believed we were under attack, and began trying to identify the source of the traffic and mitigate it. We were unable to identify a culpable traffic source. Patterns were typical, yet a flood of connections continued to exist. We then turned our attention internally. We performed a rolling restart of our application pools in an attempt to reset connections, but as they came back up they were again instantly saturated with open connections. We inspected the connections and determined they were not malicious. They were authentic requests, but they were not being fulfilled and released properly.

It was at this time we made the very difficult decision to take the API offline. Not difficult from a methodology perspective; the API was effectively not doing work even though it was "up". It was difficult because taking the API offline is an acknowledgement that we're temporarily not capturing events. If the client doesn't have queuing or retry logic built in, it means those events may be unrecoverable. I want to be very clear that this is the worst-case, last-resort situation for us. We go to great lengths to preserve the uptime and integrity of the entire API, and particularly the ingestion side. This includes running hot-hot in multiple data centers and adding redundancy at every layer. Sadly, even these measures did not forestall the need to take the API offline around 1 AM PST.

Instantly, three of our engineers began running diagnostics and searching for the root cause. The root cause was identified as a database deadlock triggered by an atomic update operation. This deadlock was particularly destructive: it not only locked up one database, but all write operations for the entire cluster. Based on our understanding of the DBMS, MongoDB, we didn't believe this scenario was possible, and as a consequence it took us longer to track it down. We will be reaching out to MongoDB to figure out what happened and get a bug filed if applicable.

Starting at 5:30 AM, we removed the code that triggered the destructive operation and cautiously brought the API back online. The API was fully operational as of 7 AM. No data was lost that had already been captured before the outage. However, any 500-level response codes that HTTP clients received during the outage indicate that our API did not store the event. If you had the event in a queue, or can regenerate it, you can resend it at this time; remember to override the keen.timestamp property to when the event actually happened (see the sketch after this entry). If not, the events will ultimately be missing from your collections, and you may need to use filters to exclude the outage period from certain queries.

We sincerely apologize for the inconvenience this has caused to our customers. We take this very seriously and are very committed to meeting your expectations in the future. Here are a few things we are doing right now to make things better:

- We are replacing our current data architecture with one that's far better suited to writing huge streams of events (lock-free) and running large, parallelizable queries. This new architecture, based on Cassandra & Storm, is already serving our largest customers and will be serving all customers soon.
- We will post more frequent updates during outages. We understand that these updates help you decide how to adapt and respond, and we need to do better than we did this time.

If you have any questions about the outage, or our future plans for robustness, please don't hesitate to get in touch. josh at keen io. -Josh
Nov 21, 00:08-11:18 PST
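Editor's note: for readers who buffered events client-side during the outage above, the resend step described in that notice might look something like the following minimal sketch. The project ID, write key, collection name, and event payload are placeholders, and the endpoint and payload shape are assumptions about the standard Keen IO HTTP event-collection API of that era; the only detail taken from the notice itself is overriding keen.timestamp to the time the event actually occurred.

    # Hypothetical resend sketch (placeholder credentials and payload).
    # Assumes the usual Keen IO event-collection endpoint:
    #   POST https://api.keen.io/3.0/projects/<PROJECT_ID>/events/<COLLECTION>
    import requests

    PROJECT_ID = "YOUR_PROJECT_ID"   # placeholder
    WRITE_KEY = "YOUR_WRITE_KEY"     # placeholder
    COLLECTION = "purchases"         # whichever collection you buffered locally

    event = {
        "item": "t-shirt",
        "price": 20.00,
        # Override the default "now" timestamp with the original occurrence
        # time (ISO-8601) so queries over the outage window stay accurate.
        "keen": {"timestamp": "2013-11-21T03:15:00.000Z"},
    }

    url = "https://api.keen.io/3.0/projects/{}/events/{}".format(PROJECT_ID, COLLECTION)
    resp = requests.post(url, json=event, params={"api_key": WRITE_KEY}, timeout=10)
    resp.raise_for_status()  # a 2xx response means the event was accepted for writing
    print(resp.json())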
Some queries are inaccurate due to stale data on one server
This issue has been resolved -- all queries should be up to date and operational! Sorry for the inconvenience!
Nov 20, 10:20-10:39 PST
New architecture query instability
This has been resolved for now but we're still working on an RCA.
Nov 15, 15:21-19:29 PST
Query Instability
Queries are being served normally again for all users.
Nov 11, 13:45-15:02 PST
System Incident
We pinpointed the issue to the way we handle customers deleting events in a certain way. It uncovered a number of issues which we've patched, deployed, and are currently monitoring. Our apologies for the inconvenience. UPDATE: We've posted a post mortem on our blog here: https://keen.io/blog/66171746436/were-sorry-heres-what-happened
Nov 4, 12:34-15:56 PST

October 2013

Data Indexing Slowdown
This incident has been resolved.
Oct 10, 14:18-20:27 PST

September 2013

Hosting Provider Maintenance
Our hosting provider, SoftLayer, had a maintenance window which affected one of our data centers. This maintenance window resulted in some instability of our API. At this time the maintenance window is over and our systems are operating normally.
Sep 26, 09:25 PST

August 2013

No incidents reported for this month.

July 2013

Data Collection API Outage
At 8:12am US/Pacific time, we experienced issues in our datastore that caused our API to reject data for around 15 minutes. We've resolved the issue and will post a write-up of the root cause.
Jul 25, 08:48 PST
Data Collection API Slowdown
At 9:44am US/Pacific time we noticed a slowdown in our Data Collection API. It is due to heavy write load on our service.
Jul 23, 02:44 PST
Service Downtime
Rolling service outage affecting multiple customers.
Jul 18, 19:42 PST