How to Investigate a Security Breach Using Cockpit and Audit Trail Together

Security breaches are rarely loud; more often, they begin as a subtle spike in CPU usage, an unexpected burst of outbound traffic, or a failed login attempt from an unusual IP. The sooner you can detect these warning signs, the more likely you are to contain the damage, trace the root cause, and respond decisively.

At Scaleway, Cockpit and Audit Trail are designed to work in tandem during these moments: one providing real-time visibility into system behavior, the other delivering comprehensive traceability of user and system actions. When used together, they form a powerful incident response workflow that helps your teams move from detection to resolution with clarity and confidence.

Here’s how that process unfolds.

Spotting the anomaly

Every investigation begins with a signal: a hint that something isn’t right. This might be a service outage, an unexpected performance degradation, or even just a spike in resource consumption during off-hours. That signal is often buried under noise, and missing it can become costly very quickly. That’s where real-time observability with Cockpit becomes essential.

With live dashboards showing CPU usage, memory allocation, and network traffic, your team can quickly spot unusual patterns across infrastructure or applications. Alerts configured for key metrics act as early warning systems, helping your engineers detect anomalies before they escalate into incidents.
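
For illustration, here is a minimal sketch of what such a check could look like when scripted against a Prometheus-compatible Cockpit metrics endpoint. The endpoint URL, the token handling, and the node_exporter-style metric are assumptions to adapt to your own data source settings; in day-to-day operation you would express the same condition as an alert rule in Cockpit rather than polling it by hand.

```python
# Minimal sketch: check for a CPU spike via a Prometheus-compatible Cockpit
# metrics endpoint. URL, token scheme, and metric names are assumptions:
# substitute the values shown for your data source in the Cockpit console.
import requests

COCKPIT_QUERY_URL = "https://<your-datasource-url>/prometheus/api/v1/query"  # placeholder
COCKPIT_TOKEN = "<your-cockpit-token>"  # placeholder

# PromQL: instances whose CPU usage (node_exporter-style metrics) exceeds 90%.
query = (
    '100 - (avg by (instance) '
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90'
)

resp = requests.get(
    COCKPIT_QUERY_URL,
    params={"query": query},
    headers={"Authorization": f"Bearer {COCKPIT_TOKEN}"},  # auth scheme: assumption
    timeout=10,
)
resp.raise_for_status()

# The standard Prometheus HTTP API returns matching series under data.result.
for result in resp.json()["data"]["result"]:
    instance = result["metric"].get("instance", "unknown")
    _, value = result["value"]
    print(f"{instance}: CPU at {float(value):.1f}%")
```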

These signals are more than just numbers: they are the entry point to the investigation.

From suspicion to confirmation

Once abnormal behavior is detected, the next step is to answer a more critical question: was this activity legitimate?

This is where complete visibility into user and system actions becomes invaluable. By examining detailed activity logs using Audit Trail, teams can trace the sequence of events that preceded the anomaly. They might uncover a configuration change made moments before the spike, a newly created virtual machine, or an access token that shouldn’t have been used.

Because these logs are tied to specific users, IP addresses, and API calls — and retained in a tamper-proof format — they provide the accountability needed to determine whether the behavior was authorized or a sign of compromise.
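
As a rough sketch, pulling the Audit Trail events recorded around the anomaly window could look like the following. The API path, query parameters, and response fields shown here are assumptions based on Scaleway’s usual API conventions; verify them against the Audit Trail API reference before relying on them.

```python
# Minimal sketch: list Audit Trail events recorded around the anomaly window.
# Path, parameter names, and response fields are assumptions to verify against
# the Audit Trail API reference.
import requests

SCW_SECRET_KEY = "<your-scaleway-secret-key>"  # placeholder
REGION = "fr-par"

resp = requests.get(
    f"https://api.scaleway.com/audit-trail/v1alpha1/regions/{REGION}/events",
    headers={"X-Auth-Token": SCW_SECRET_KEY},
    params={
        "recorded_after": "2024-05-01T02:00:00Z",   # start of the anomaly window
        "recorded_before": "2024-05-01T03:00:00Z",  # end of the anomaly window
    },
    timeout=10,
)
resp.raise_for_status()

for event in resp.json().get("events", []):
    # Field names are illustrative; the goal is to see who did what, from where.
    print(event.get("recorded_at"), event.get("principal"),
          event.get("method_name"), event.get("source_ip"))
```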

If something doesn’t line up — for example, a resource scaled unexpectedly or an admin role granted without proper authorization — that’s your confirmation: the anomaly isn’t just a blip. It’s a breach!

Mapping the scope of impact

With the breach confirmed, your team must shift into containment and analysis mode. But doing so without full visibility risks missing critical dependencies or overlooking affected systems.

To understand how far the issue has spread, engineers use Cockpit to examine infrastructure performance and usage trends leading up to and during the event. Visual insights help pinpoint when services started behaving abnormally, which components were under stress, and how those impacts evolved over time.

At the same time, historical logs stored in Audit Trail offer a detailed timeline of every action taken in the environment: from login attempts and failed authentications, to changes in IAM roles and resource provisioning. By aligning these logs with the observed performance patterns, it becomes possible to build a complete narrative: who did what, when, and what consequences followed.
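
A simple way to build that narrative is to merge both sources into a single, time-ordered view. The sketch below does exactly that, using illustrative stand-in data in place of real Cockpit metrics and Audit Trail events.

```python
# Minimal sketch: merge metric anomalies and audit events into one ordered timeline.
# The two input lists are illustrative stand-ins for data exported from Cockpit
# dashboards and Audit Trail queries.
from datetime import datetime, timezone

metric_anomalies = [
    {"at": "2024-05-01T02:14:00Z", "what": "CPU on instance web-1 jumped to 95%"},
    {"at": "2024-05-01T02:20:00Z", "what": "Outbound traffic tripled on web-1"},
]
audit_events = [
    {"at": "2024-05-01T02:11:30Z", "what": "API key used from unfamiliar IP"},
    {"at": "2024-05-01T02:12:10Z", "what": "IAM policy updated: admin role granted"},
    {"at": "2024-05-01T02:13:45Z", "what": "New Instance created in fr-par-1"},
]

def parse(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z into an aware UTC datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)

timeline = sorted(
    [("metric", e) for e in metric_anomalies] + [("audit", e) for e in audit_events],
    key=lambda item: parse(item[1]["at"]),
)

for source, entry in timeline:
    print(f"{entry['at']}  [{source:6}]  {entry['what']}")
```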

Taking action

At this point, speed matters. With context in hand, your team can begin mitigation by:

  • Scaling down or isolating affected services
  • Reviewing access permissions and revoking them where necessary
  • Disabling compromised credentials and API keys (see the sketch after this list)
  • Updating firewall rules to block suspect IPs
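
For the credential step, a minimal sketch of revoking flagged API keys through the IAM API might look like this. The API path is an assumption based on Scaleway’s usual conventions, and the key listed is a placeholder; the same revocation can also be performed from the console.

```python
# Minimal sketch: revoke API keys flagged as compromised during the investigation.
# The IAM API path below is an assumption based on Scaleway's API conventions;
# verify it against the IAM API reference before relying on it.
import requests

SCW_SECRET_KEY = "<your-scaleway-secret-key>"        # placeholder: an admin credential
compromised_access_keys = ["SCWXXXXXXXXXXXXXXXXX"]   # placeholder: keys identified via Audit Trail

for access_key in compromised_access_keys:
    resp = requests.delete(
        f"https://api.scaleway.com/iam/v1alpha1/api-keys/{access_key}",
        headers={"X-Auth-Token": SCW_SECRET_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    print(f"Revoked API key {access_key}")
```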

Throughout this process, Cockpit’s monitoring layer lets you verify that recovery actions are having the intended effect: performance stabilizes and no new anomalies appear. Meanwhile, detailed audit records support immediate decision-making and, if needed, form the basis of a report for internal compliance teams or external regulators.

This step isn’t just about fixing what went wrong. It’s about doing so in a way that’s visible, accountable, and defensible.

Learning from the incident

After the dust settles, a well-run incident response ends with a post-mortem: not just to understand what happened, but to ensure it doesn’t happen again.

Here, observability and auditing continue to be key. Performance data collected throughout the event helps teams refine thresholds for future alerting. Unexpected traffic patterns might prompt changes in network policies or API rate limits. Most importantly, access logs provide a factual basis for understanding whether internal controls worked as intended or need revision.
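
One concrete output of that review can be a recalculated alert threshold. The sketch below derives one from illustrative baseline samples; in practice, those samples would come from Cockpit’s metrics for the weeks preceding the incident.

```python
# Minimal sketch: derive a tighter alert threshold from pre-incident baseline samples.
# The sample values are illustrative; in practice they would come from Cockpit's
# metrics for the weeks preceding the incident.
import statistics

baseline_cpu_samples = [22.0, 25.5, 31.2, 28.4, 24.9, 35.1, 27.8, 30.0]  # % CPU, normal operation

mean = statistics.mean(baseline_cpu_samples)
stdev = statistics.pstdev(baseline_cpu_samples)

# A common rule of thumb: alert when usage sits well above the normal band,
# e.g. three standard deviations over the observed mean.
suggested_threshold = round(mean + 3 * stdev, 1)
print(f"Baseline mean {mean:.1f}%, suggested alert threshold {suggested_threshold}%")
```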

Clear documentation of the incident, built from timestamped events and real-time telemetry, is not only useful for internal knowledge-sharing but often essential for proving due diligence in the event of regulatory scrutiny.

This documentation also helps rebuild trust with clients: in the event of a security breach, a clear and transparent explanation of events goes a long way toward assuaging customer concerns.

The case for combined visibility

When responding to a breach, timing is everything. And timing depends on visibility — not just into how your systems behave, but into how your teams and users interact with those systems.

That’s why tools like Cockpit and Audit Trail aren’t just complementary: they’re connected parts of the same story. Together, they help teams:

  • Detect irregular system behavior quickly
  • Trace incidents back to their origin with full accountability
  • Make informed, confident decisions during remediation
  • Document incidents thoroughly for compliance and improvement

Security isn’t just about prevention. It’s about having the right lens to see what’s happening, and the right records to explain how it happened.

Be prepared before the next incident strikes: explore how Scaleway’s observability tools can help you respond faster and smarter.
