From Wednesday July 16, 8:05 PM UTC through Thursday July 17, 09:42 AM UTC an update to the Native Mobile asset upload service caused 50-70% of all asset management requests to fail due to an issue in routing requests for rs.eu1.fullstory.com and mr.eu1.fullstory.com.
During the incident window, any new Native Mobile build that performed asset uploading or checking would have failed with 404 errors when hitting the necessary endpoint. Because multiple calls are made during the build process, almost all builds were likely impacted during the partial outage. Retries of failed builds would have resulted in the same failures until the underlying issue was resolved.
Additionally, session capture was disrupted for applications using the iOS SDK that perform asset uploads at runtime, as 404 errors during capture cause capture to stop immediately. Sessions in progress were interrupted and new sessions could not be created; the capture data for these sessions is not available. Android applications were not interrupted; however, assets uploaded during these sessions are likely not available during playback.
An update to our asset management service was deployed on July 16 to improve the reliability of asset uploads for Native Mobile builds. To mitigate a source of upload failures, the service was split into two parts so that the asset upload endpoints are handled separately from the resource crawling and fetching functionality. To support this change, a routing update for our public endpoints was necessary to direct asset upload requests to the appropriate service. While the necessary routing changes were programmatically applied in our NA production and staging environments, the update was not applied successfully in our EU production environments. As a result, asset upload requests in the EU failed 50-70% of the time because they were not routed to the appropriate service.
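To illustrate the failure mode described above, the following is a minimal sketch of a post-deploy check that flags environments whose routing table does not send upload traffic to the new service. It is purely illustrative and does not describe Fullstory's actual tooling: the environment names, path prefixes, service names, and the longest-prefix matching rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    path_prefix: str  # hypothetical public path prefix
    service: str      # hypothetical backend service name


def resolve(routes, path):
    """Return the service for the longest matching prefix, or None if no route matches."""
    matches = [r for r in routes if path.startswith(r.path_prefix)]
    if not matches:
        return None
    return max(matches, key=lambda r: len(r.path_prefix)).service


def find_misrouted(environments, path, expected_service):
    """Return the sorted names of environments where `path` does not reach `expected_service`."""
    return sorted(
        name
        for name, routes in environments.items()
        if resolve(routes, path) != expected_service
    )


# Hypothetical routing tables: the upload route was applied everywhere except the EU.
ENVIRONMENTS = {
    "na-prod": [Route("/", "resource-fetch"), Route("/assets/upload", "asset-upload")],
    "staging": [Route("/", "resource-fetch"), Route("/assets/upload", "asset-upload")],
    "eu-prod": [Route("/", "resource-fetch")],
}

misrouted = find_misrouted(ENVIRONMENTS, "/assets/upload", "asset-upload")
print(misrouted)
```

In this sketch the check reports only the environment that is missing the upload route. Running a comparison of this kind against every environment after a routing change is one way a partially applied update could be surfaced before customer traffic is affected.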
The issue was addressed on July 17 at 9:20 AM UTC by manually applying the necessary routing changes. As soon as these changes were completed, all requests were properly routed to the correct asset upload service, and normal functionality was restored.
We are committed to preventing this incident in the future. We’ve completed the following action items:
In addition, we are taking the following steps:
We deeply regret this incident and invite any Fullstory customer who was materially affected to contact support@fullstory.com. We stand ready to fully address all of your concerns.