From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Ravi Bangoria <ravi.bangoria@amd.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Namhyung Kim <namhyung@kernel.org>,
Sasha Levin <sashal@kernel.org>,
mingo@redhat.com, acme@kernel.org, tglx@kernel.org, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-5.10] perf/amd/ibs: Avoid race between event add and NMI
Date: Mon, 20 Apr 2026 09:08:41 -0400 [thread overview]
Message-ID: <20260420131539.986432-55-sashal@kernel.org> (raw)
In-Reply-To: <20260420131539.986432-1-sashal@kernel.org>
From: Ravi Bangoria <ravi.bangoria@amd.com>
[ Upstream commit 1b044ff3c17e9d7fd93ffc0ba541ccdeb992d7f5 ]
Consider the following race:
--------
o OP_CTL contains stale value: OP_CTL[Val]=1, OP_CTL[En]=0
o A new IBS OP event is being added
o [P]: Process context, [N]: NMI context
[P] perf_ibs_add(event) {
[P] if (test_and_set_bit(IBS_ENABLED, pcpu->state))
[P] return;
[P] /* pcpu->state = IBS_ENABLED */
[P]
[P] pcpu->event = event;
[P]
[P] perf_ibs_start(event) {
[P] set_bit(IBS_STARTED, pcpu->state);
[P] /* pcpu->state = IBS_ENABLED | IBS_STARTED */
[P] clear_bit(IBS_STOPPING, pcpu->state);
[P] /* pcpu->state = IBS_ENABLED | IBS_STARTED */
[N] --> NMI due to genuine FETCH event. perf_ibs_handle_irq()
[N] called for OP PMU as well.
[N]
[N] perf_ibs_handle_irq(perf_ibs) {
[N] event = pcpu->event; /* See line 6 */
[N]
[N] if (!test_bit(IBS_STARTED, pcpu->state)) /* false */
[N] return 0;
[N]
[N] if (WARN_ON_ONCE(!event)) /* false */
[N] goto fail;
[N]
[N] if (!(*buf++ & perf_ibs->valid_mask)) /* false due to stale
[N] * IBS_OP_CTL value */
[N] goto fail;
[N]
[N] ...
[N]
[N] perf_ibs_enable_event() // *Accidentally* enable the event.
[N] }
[N]
[N] /*
[N] * Repeated NMIs may follow due to accidentally enabled IBS OP
[N] * event if the sample period is very low. It could also lead
[N] * to pcpu->state corruption if the event gets throttled due
[N] * to too frequent NMIs.
[N] */
[P] perf_ibs_enable_event();
[P] }
[P] }
--------
We cannot safely clear IBS_{FETCH|OP}_CTL while disabling the event,
because the register might be read again later. So, clear the register
in the enable path - before we update pcpu->state and enable the event.
This guarantees that any NMI that lands in the gap finds Val=0 and
bails out cleanly.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20260216042216.1440-6-ravi.bangoria@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have a thorough understanding. Let me compile the full analysis.
---
## PHASE 1: COMMIT MESSAGE FORENSICS
### Step 1.1: Subject Line
- **Subsystem**: `perf/amd/ibs` (AMD Instruction-Based Sampling perf PMU
driver)
- **Action verb**: "Avoid" (almost always indicates fixing a bug)
- **Summary**: Prevent race between adding a perf event and NMI handling
in IBS
### Step 1.2: Tags
- **Signed-off-by: Ravi Bangoria** - the author, AMD IBS subsystem
expert
- **Signed-off-by: Peter Zijlstra (Intel)** - perf core maintainer,
committed the patch
- **Acked-by: Namhyung Kim** - another perf subsystem maintainer
- **Link**:
`https://patch.msgid.link/20260216042216.1440-6-ravi.bangoria@amd.com`
- patch 5/5 (originally numbered 6 including cover letter) in a series
- **No Fixes: tag** - expected for manual review candidates
- **No Cc: stable tag** - expected
Record: Strong reviewer/maintainer endorsement. No syzbot or bug
reports.
### Step 1.3: Commit Body
The commit body provides an extremely detailed, step-by-step walkthrough
of the race:
1. `OP_CTL` has stale `Val=1, En=0` from a previous IBS sample
2. In process context, `perf_ibs_add()` → `perf_ibs_start()` sets
`IBS_STARTED`
3. Before `perf_ibs_enable_event()` is called, an NMI arrives (from a
FETCH event)
4. NMI handler sees `IBS_STARTED=true`, reads stale `OP_CTL[Val]=1`,
treats it as valid
5. NMI handler calls `perf_ibs_enable_event()`, **accidentally**
enabling the OP event
6. Results in: repeated NMIs, `pcpu->state` corruption, potential
throttle issues
**Failure mode**: NMI storm and state corruption.
### Step 1.4: Hidden Bug Fix Detection
This is explicitly described as a race condition fix - not disguised at
all.
---
## PHASE 2: DIFF ANALYSIS
### Step 2.1: Inventory
- **Files changed**: 1 (`arch/x86/events/amd/ibs.c`)
- **Lines added**: +8 (6 comment lines + 1 blank + 1 code line)
- **Lines removed**: 0
- **Functions modified**: `perf_ibs_start()`
- **Scope**: Single-function surgical fix
### Step 2.2: Code Flow Change
**Before**: After computing `config`, `perf_ibs_start()` sets
`IBS_STARTED`, clears `IBS_STOPPING`, and enables the event. The IBS
control MSR may contain stale `Val=1` data from a previous sample.
**After**: Before setting `IBS_STARTED`, the code calls
`perf_ibs_disable_event(perf_ibs, hwc, 0)` which writes 0 to the IBS
control MSR, clearing `Val` and `En` bits. Any NMI arriving in the
window between this write and the actual event enable will read `Val=0`
and bail out.
### Step 2.3: Bug Mechanism
**Category**: Race condition between process context and NMI context
- The NMI handler (`perf_ibs_handle_irq`) is called for both FETCH and
OP PMUs on every IBS NMI (see `perf_ibs_nmi_handler` at line
1396-1411)
- A stale `Val=1` in the OP_CTL MSR fools the handler into thinking a
valid sample exists
- The handler then re-enables the event prematurely
### Step 2.4: Fix Quality
- **Obviously correct**: A single MSR zero-write before enabling, using
an existing function
- **Minimal/surgical**: 1 functional line + comments
- **Regression risk**: Extremely low - clearing a register before
writing new values into it is safe. The MSR was going to be written
anyway a few instructions later by `perf_ibs_enable_event()`
- **No red flags**: Single function, single file, existing API
---
## PHASE 3: GIT HISTORY INVESTIGATION
### Step 3.1: Blame
The IBS state machine code (set_bit/clear_bit for STARTED/STOPPING) was
introduced by:
- `5a50f529170113` (Peter Zijlstra, 2016-03-16): "Fix race with
IBS_STARTING state"
- `85dc600263c229` (Peter Zijlstra, 2016-03-21): "Fix pmu::stop()
nesting"
Both were merged in kernel 4.7. The race condition has existed since
then — present in **all active stable trees**.
### Step 3.2: No Fixes: tag (expected)
### Step 3.3: File History
Recent changes to `ibs.c` are mostly feature additions (ldlat filtering,
min_period, etc.) and cleanups (MSR rename, syscore changes). This fix
doesn't conflict with any of them.
### Step 3.4: Author
Ravi Bangoria is the AMD IBS subsystem maintainer/expert with 15+
commits to `arch/x86/events/amd/ibs.c`. This fix carries strong
authority.
### Step 3.5: Dependencies
This is patch 5/5 in a series titled "perf/amd/ibs: Assorted fixes":
1. "Account interrupt for discarded samples"
2. "Limit ldlat->l3missonly dependency to Zen5"
3. "Preserve PhyAddrVal bit when clearing PhyAddr MSR"
4. "Avoid calling perf_allow_kernel() from the IBS NMI handler"
5. "Avoid race between event add and NMI" (this patch)
**Critical finding**: This patch is **completely standalone**. It only
adds a call to the existing `perf_ibs_disable_event()` function. It does
not depend on any changes from patches 1-4 (which address unrelated
bugs).
---
## PHASE 4: MAILING LIST RESEARCH
### Step 4.1: Patch Discussion
- Found via patchew.org and spinics.net
- Upstream commit: `1b044ff3c17e9d7fd93ffc0ba541ccdeb992d7f5`
- Merged into tip/perf/core on Feb 27, 2026 by Peter Zijlstra
- v2 series (v1 was at different URL, split from enhancements)
### Step 4.2: Reviewers
- **Namhyung Kim** Acked the entire series
- **Peter Zijlstra** (perf core maintainer) committed it
- All appropriate subsystem maintainers were involved
### Step 4.3-4.5: No specific bug report (author-found via code
analysis), no stable-specific discussion found.
---
## PHASE 5: CODE SEMANTIC ANALYSIS
### Step 5.1: Key Functions
- `perf_ibs_start()` (modified) - called from `perf_ibs_add()` during
event scheduling
- `perf_ibs_disable_event()` (called, not modified) - writes 0 to IBS
control MSR
### Step 5.2: Callers of perf_ibs_start
- `perf_ibs_add()` → `perf_ibs_start()` — called during perf event
scheduling to CPU
- This is a standard perf PMU operation, triggered whenever a perf event
using IBS is scheduled
### Step 5.3-5.4: Call Chain
`perf_event_open()` → context scheduling → `perf_ibs_add()` →
`perf_ibs_start()`. The race is triggerable from userspace by using
`perf record` with IBS events on AMD processors. This is a common usage
path.
### Step 5.5: Similar Patterns
The existing IBS state machine already had two prior race fixes (5a50f52
and 85dc600) for similar NMI vs. process context races. This is a third
variant of the same class of bug.
---
## PHASE 6: STABLE TREE ANALYSIS
### Step 6.1: Buggy Code in Stable
The vulnerable code (`set_bit(IBS_STARTED, ...)` followed by
`perf_ibs_enable_event()` without clearing stale MSR data) has existed
since kernel 4.7 (2016). It exists in **all active stable trees**:
5.4.y, 5.10.y, 5.15.y, 6.1.y, 6.6.y, 6.12.y.
### Step 6.2: Backport Complications
The `perf_ibs_start()` function has seen minor changes over the years
(8b0bed7d in v5.9 added config variable, fa5d0a82 added min_period
check), but the core structure — compute config, set STARTED, enable
event — has remained stable. The MSR accessor names changed (`wrmsrl` →
`wrmsrq`) in newer trees, but the older trees still have
`perf_ibs_disable_event()` with the same signature. The patch should
apply cleanly to recent stable trees, and with trivial adjustments to
older ones.
### Step 6.3: No related fixes already in stable for this specific race.
---
## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT
### Step 7.1: Subsystem
- **Path**: `arch/x86/events/amd/ibs.c`
- **Subsystem**: x86 perf/PMU (AMD IBS)
- **Criticality**: IMPORTANT — AMD IBS is used by all AMD processor
users running perf profiling, which is standard practice for
performance analysis
### Step 7.2: Activity
Actively maintained by Ravi Bangoria at AMD with regular fixes and
enhancements.
---
## PHASE 8: IMPACT AND RISK ASSESSMENT
### Step 8.1: Who is Affected
All users of AMD processors (Zen and later) using `perf` with IBS
events. This includes server workloads, desktop profiling, and CI
systems running performance testing.
### Step 8.2: Trigger Conditions
- Both IBS FETCH and IBS OP events are in use (or hardware has residual
FETCH state)
- NMI from FETCH arrives during OP event initialization
- OP_CTL MSR has stale Val=1 from a previous sample
- More likely with low sample periods (high sampling frequency)
- Triggered from unprivileged perf usage (with appropriate
perf_event_paranoid)
### Step 8.3: Failure Mode
- **NMI storm**: Repeated NMIs can occur if the sample period is very
low
- **State corruption**: `pcpu->state` can become corrupted if the event
gets throttled
- **Severity**: HIGH — NMI storms can make a system unresponsive; state
corruption can lead to further crashes or undefined behavior
### Step 8.4: Risk-Benefit Ratio
- **Benefit**: Prevents NMI storms and state corruption on AMD systems
during IBS profiling — HIGH benefit
- **Risk**: Single function call to an existing function, writing 0 to
an MSR before writing new values — VERY LOW risk
- **Ratio**: Extremely favorable
---
## PHASE 9: FINAL SYNTHESIS
### Step 9.1: Evidence Summary
**FOR backporting:**
- Fixes a concrete race condition between process context and NMI
context
- Can cause NMI storms and state corruption (HIGH severity)
- Single line of functional code + comments (VERY LOW risk)
- Uses existing `perf_ibs_disable_event()` function (no new code paths)
- Self-contained — no dependencies on other patches
- Written by the AMD IBS maintainer, committed by perf core maintainer
- Acked by Namhyung Kim (another perf maintainer)
- Bug exists in all stable trees (since v4.7, 2016)
- Obviously correct — clearing a register before enabling it
**AGAINST backporting:**
- No user-reported bug or syzbot report (found via code analysis)
- Part of a 5-patch series (but confirmed standalone)
- No explicit stable nomination
### Step 9.2: Stable Rules Checklist
1. **Obviously correct and tested?** YES — single MSR write, committed
by PeterZ, Acked by Namhyung
2. **Fixes a real bug?** YES — documented race causing NMI storms and
state corruption
3. **Important issue?** YES — NMI storm = system hang; state corruption
= undefined behavior
4. **Small and contained?** YES — 1 functional line in 1 function in 1
file
5. **No new features or APIs?** Correct — no new features
6. **Can apply to stable trees?** YES — clean apply expected
### Step 9.3: Exception Categories
Not an exception category — this is a standard race condition bug fix.
### Step 9.4: Decision
Strong YES. This is a textbook stable backport candidate: a minimal,
obviously correct fix for a real race condition with severe consequences
(NMI storms, state corruption), written and reviewed by the appropriate
maintainers.
---
## Verification
- [Phase 1] Parsed tags: Signed-off-by (Bangoria, PeterZ), Acked-by
(Namhyung Kim), Link to patchew/lore
- [Phase 2] Diff analysis: 1 functional line added
(`perf_ibs_disable_event(perf_ibs, hwc, 0)`) + 7 comment/blank lines
in `perf_ibs_start()`
- [Phase 2] Verified `perf_ibs_disable_event()` with config=0 writes 0
to the MSR (lines 451-459 of ibs.c)
- [Phase 2] Verified NMI handler checks `valid_mask` at line 1261 —
stale Val=1 bypasses this check
- [Phase 2] Verified NMI handler is called for both FETCH and OP PMUs at
lines 1402-1403
- [Phase 3] git blame: IBS_STARTED state machine introduced by
5a50f529170113 (v4.7, 2016) and refined by 85dc600263c229 (v4.7, 2016)
— present in all stable trees
- [Phase 3] git log --author="Ravi Bangoria": 15+ commits to ibs.c —
confirmed subsystem expert
- [Phase 3] Verified no dependencies on other patches in series — only
uses existing `perf_ibs_disable_event()`
- [Phase 4] patchew.org: confirmed patch 5/5 in "Assorted fixes" series,
all Acked by Namhyung
- [Phase 4] spinics.net: confirmed upstream commit
1b044ff3c17e9d7fd93ffc0ba541ccdeb992d7f5, merged Feb 27, 2026
- [Phase 4] Verified series patches 1-4 are independent bug fixes
(different issues)
- [Phase 5] Verified `perf_ibs_start()` called from `perf_ibs_add()`
(line 568), a standard PMU scheduling path
- [Phase 6] git log v6.6.. and v6.1..: ibs.c has had changes but core
start() structure is preserved
- [Phase 8] Failure mode: NMI storm + state corruption → severity HIGH
- UNVERIFIED: Could not access lore.kernel.org directly due to Anubis
bot protection — used patchew.org and spinics.net mirrors instead
**YES**
arch/x86/events/amd/ibs.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e35132c5448dd..705e9a65c9ac9 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -490,6 +490,14 @@ static void perf_ibs_start(struct perf_event *event, int flags)
}
config |= period >> 4;
+ /*
+ * Reset the IBS_{FETCH|OP}_CTL MSR before updating pcpu->state.
+ * Doing so prevents a race condition in which an NMI due to other
+ * source might accidentally activate the event before we enable
+ * it ourselves.
+ */
+ perf_ibs_disable_event(perf_ibs, hwc, 0);
+
/*
* Set STARTED before enabling the hardware, such that a subsequent NMI
* must observe it.
--
2.53.0
next parent reply other threads:[~2026-04-20 13:17 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260420131539.986432-1-sashal@kernel.org>
2026-04-20 13:08 ` Sasha Levin [this message]
2026-04-20 13:09 ` [PATCH AUTOSEL 7.0-6.18] perf/amd/ibs: Limit ldlat->l3missonly dependency to Zen5 Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260420131539.986432-55-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=acme@kernel.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=patches@lists.linux.dev \
--cc=peterz@infradead.org \
--cc=ravi.bangoria@amd.com \
--cc=stable@vger.kernel.org \
--cc=tglx@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox