Organize your Incident Response Program - pt 1
A well-organized cybersecurity incident response (IR) program can make the difference between a bad day and a bad week in the face of a serious incident. With some extra work up front, you can also set yourself up for solid metrics and reporting, organized alert tuning, and smooth onboarding of new analysts to your security operations center (SOC).
I'm going to go over some practical tips and documents that I've found to increase the efficacy and efficiency of my IR programs.
Priority/Severity Ranking
You should have a system in place for marking how serious an event/incident is. Even a simple 1-5 scale based on gut feeling is better than nothing. A more "mature" option is a set of severity & scope criteria defined in your playbooks. Severity should be calculated when triaging the alert, but it might change as more information is discovered.
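As a minimal sketch of what "calculated severity" could look like, here is one way to derive a 1-5 rating from scope and impact criteria recorded at triage. The criteria names and weights are illustrative assumptions, not a standard scoring model:

```python
# Hypothetical severity calculation: the criteria names and weights below
# are illustrative assumptions, not a standard scoring model.
SCOPE_SCORES = {"single_host": 1, "multiple_hosts": 2, "business_unit": 3, "enterprise": 4}
IMPACT_SCORES = {"none": 0, "service_degraded": 1, "data_at_risk": 2, "data_lost": 3}

def severity(scope: str, impact: str) -> int:
    """Return a 1 (low) to 5 (critical) severity rating."""
    score = SCOPE_SCORES[scope] + IMPACT_SCORES[impact]
    return min(5, max(1, score))

# Initial triage: one host, data possibly exposed -> severity 3
print(severity("single_host", "data_at_risk"))
# Investigation later reveals enterprise-wide data loss -> severity 5
print(severity("enterprise", "data_lost"))
```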
Use Cases
A use case is a document that explains why the IR team is concerned with a certain type of behavior. These are great for organizing your thoughts and help analysts see the bigger picture. A description, attacker objective, data sources for detection, and detection logic can all be helpful here. These can be more or less specific depending on your team's needs, but generally each one covers a large category of potential incidents.
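If you want your use cases to live somewhere version-controlled and machine-readable, a minimal sketch might look like the record below; the field names and the example content are assumptions to show the shape, not a prescribed template.

```python
# Hypothetical use case record; every value here is a made-up example.
use_case = {
    "id": "UC-001",
    "name": "Credential Dumping",
    "description": "Attackers extract credentials from memory to escalate "
                   "privileges or move laterally.",
    "attacker_objective": "Obtain valid credentials for lateral movement.",
    "data_sources": ["EDR process events", "Windows Security logs"],
    "detection_logic": "Non-system processes reading LSASS memory.",
    "related_alerts": ["ALERT-0042"],  # ties to the alert table below
    "playbook": "PB-credential-theft",
}
```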
Playbooks
What to do when you get that alert. This has been well covered by others and you can find lots of good examples on the web. I'll have some links at the bottom of the article.
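For completeness, here is a bare-bones sketch of what a playbook skeleton might contain; the phases and steps are generic assumptions, not a complete response procedure.

```python
# Hypothetical playbook skeleton; phases and steps are generic placeholders.
playbook = {
    "id": "PB-credential-theft",
    "triage": [
        "Confirm the affected host and user account",
        "Assign initial severity and scope",
    ],
    "containment": [
        "Isolate the host via EDR",
        "Disable or reset the affected accounts",
    ],
    "eradication_and_recovery": [
        "Remove persistence mechanisms",
        "Reimage the host if required",
    ],
    "post_incident": [
        "Record timestamps and the true/false positive verdict in the ticket",
        "Feed findings back into alert tuning",
    ],
}
```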
Organize your alerts
Whatever system you have for finding the bad should be well documented. A simple table with some key points will do you wonders.
Here is a breakdown of the table elements (an example entry follows the list):
- ID - a unique ID for this alert
- Description - what this alert looks for, in simple language
- Platform - where this alert comes from (SIEM, IDS, EDR, etc.)
- Query - the exact detection logic; most relevant for SIEM alerts
- Type & Status - covered in more detail below
- Schedule - how often this alert runs, if it is on a schedule
- Use Case - why you are looking for this type of activity
- Playbook - what to do when you see this alert
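As a concrete illustration, a single entry from such a table might look like this; every value is a made-up example showing how the columns fit together.

```python
# One illustrative alert-table entry; all values are made-up examples.
alert = {
    "id": "ALERT-0042",
    "description": "Non-system process reading LSASS memory",
    "platform": "SIEM",
    "query": "process_access events where target_image ends with 'lsass.exe'",
    "type": "High-fidelity",      # see Type below
    "status": "Stable",           # see Status below
    "schedule": "every 15 minutes",
    "use_case": "UC-001",
    "playbook": "PB-credential-theft",
}
```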
Type
Type indicates an alert's specificity: how likely a firing alert is to represent actual malicious behavior. I've boiled this down to a few levels:
- Anomaly - a weird thing happened; it's probably not bad
- Investigate - probably bad, but additional information is needed to make a determination
- High-fidelity - almost certainly bad
Status
Status tracks the maturity of the alert logic itself.
- Experimental - testing feasibility, still in development
- Functional - still tuning; investigate with some care
- Stable - thoroughly vetted; should not change unless something in the environment changes
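If your alert catalog lives in code or a database, the Type and Status levels above map naturally onto enumerations; a small sketch:

```python
from enum import Enum

# Enumerations mirroring the Type and Status levels described above.
class AlertType(Enum):
    ANOMALY = "anomaly"              # weird, probably not bad
    INVESTIGATE = "investigate"      # probably bad, needs more information
    HIGH_FIDELITY = "high-fidelity"  # almost certainly bad

class AlertStatus(Enum):
    EXPERIMENTAL = "experimental"    # testing feasibility, in development
    FUNCTIONAL = "functional"        # still tuning, investigate with some care
    STABLE = "stable"                # thoroughly vetted, rarely changes
```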
Make a ticket
Tickets are love. Tickets are life. This is the only thing I consider non-negotiable. When I say "a ticket" I mean some sort of quasi-permanent record containing information about the event/incident. Any time your team is investigating or responding to something there should be a unique record of that event that says what happened and what you did about it. You will need to find these in the future for analysis. You probably already have this. If you are running your IR program out of a Slack channel... gross.
Tickety Goodness
So, what goes in the ticket? What the incident/event was and what you did about it are key, but these tickets are also going to drive your metrics & measures program and your alert tuning.
The use case should go in there. So should the playbook. That way you are tracking what kinds of events you are seeing, and each ticket carries a quick reference for what the event means and how to respond to it. If you are creating tickets automatically from alerts, you can probably populate these fields from the start.
Scope, severity, and priority will be established by the SOC team during the triage process and possibly updated during investigation.
An alert ID should be tied to every ticket: what alert was this made for? I suggest having documented alert types for things like "support saw a weird thing" & "user reported event."
Every ticket should be marked as a true or false positive. You can have subcategories of these if you want, but a binary decision is probably best to start with.
Timestamps of when a ticket was opened and when it moved through the steps of the playbook can be helpful for measuring parts of SOC performance such as time-to-triage and time-to-resolve.
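Pulling those fields together, a ticket record might be sketched like this; the field names and types are assumptions, and your ticketing system will impose its own schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical ticket schema; field names and types are assumptions.
@dataclass
class IncidentTicket:
    ticket_id: str
    alert_id: str                  # which alert generated this ticket
    use_case: str                  # why we care about this behavior
    playbook: str                  # how to respond
    severity: int                  # set at triage, may be updated later
    scope: str
    true_positive: Optional[bool] = None    # set when the ticket is closed
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    triaged_at: Optional[datetime] = None   # drives time-to-triage
    resolved_at: Optional[datetime] = None  # drives time-to-resolve
    summary: str = ""              # what happened and what you did about it
```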
Putting it Together
What you have now is a single artifact that contains not only everything you need to work the incident today, but also quite a few useful things for the future:
- What happened
- How bad was it
- What that means and why it's bad
- How it should be handled & the actual remediation steps
- Where the alert came from and how it works
- Whether the alert is working properly and identifying the targeted behavior
Thank You
In a future article I will go over additional advice on creating a metrics & measures program for your SOC using this data.
If you have any questions please reach out into the psychic maelstrom and astral project them to me.