Delayed data processing and degradation in the Fullstory Web Application

Incident Report for Fullstory

Postmortem

2025/10/22 Fullstory Networking Disruption

On October 22, 2025, the Fullstory platform experienced a 30-minute partial outage of its networking control plane, beginning at 2:44 PM EDT. The incident caused widespread degradation: network requests failed for many applications, customers were logged out, some data capture failed, and indexing and warehouse sync were delayed. The root cause was a resource-pinning misconfiguration in a component of Fullstory's central networking infrastructure. This misconfiguration prevented the component from restarting successfully after an update; as the remaining service replicas became overloaded, the component entered a cycle of crashing and restarting.

Customer Impact

During the incident window, some API requests, Fullstory Analytics app requests, and data capture requests may have returned HTTP errors. Some users logged into Fullstory Analytics were logged out of the application. Affected sessions will have missing data, which can cause gaps or disruptions during playback, webhooks that did not fire, and events that were not exported via the Anywhere: Warehouse feature. Data capture may have been affected for all orgs during the incident window. Please reach out to support@fullstory.com if you have any questions or concerns.

Root Cause

A configuration update required a progressive restart of our networking control plane. Normally this restart completes very quickly without affecting any systems. This time, an unforeseen resource-pinning misconfiguration caused the control plane's CPU usage to spike. As replicas failed, the load on the remaining replicas grew until they failed as well, and the control plane could not recover without intervention.

Resolution

After receiving alerts about the failing networking components, our Core Infrastructure team quickly implemented changes to stop the cascading failures and prevent them from spreading across our interconnected systems. These interventions limited further impact and restored services rapidly.

Process Changes and Prevention

Actions Taken:

  • Stabilized the core networking components
  • Added resiliency to these components to better handle full load during replica loss
  • Fixed the resource-pinning misconfiguration so that networking components can fully use all available resources

We deeply regret this incident and invite any Fullstory customer who was materially affected to contact support@fullstory.com. We stand ready to fully address all of your concerns.

Posted Oct 23, 2025 - 18:29 EDT

Resolved

This issue is resolved. The Fullstory platform, including login, data processing and indexing, webhooks, APIs, and destinations, is fully operational. If you have any follow-up questions, please reach out to support@fullstory.com.
Posted Oct 22, 2025 - 17:35 EDT

Identified

Customers may have experienced delayed data processing and indexing, or degradation of the Fullstory Web application, between 2:46 PM and 2:55 PM ET. Data ingested within this timeframe may be delayed or affected. We have identified the issue and are actively working toward full resolution.
Posted Oct 22, 2025 - 15:42 EDT
This incident affected: Data Capture (Web Capture, Native Mobile Capture) and API, Fullstory Web Application, Destinations.