From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 01D243A1D05; Mon, 20 Apr 2026 13:15:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776690946; cv=none; b=Mcww6q8SFEw43ooXaZEiqL9bpIlMPGKRMXhd/fDGmy5ldAI0tFZmzY4odKU50kQFVi43yNgjrBxTADeIQL1PCIY0W56rOlsCm3Ih9zIYC67KDXFTaUBzix/K4fMQP3+6W970Dp6ebTjwtRqM1wvt2BM65hHalKhsRzJCT/qU2FQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776690946; c=relaxed/simple; bh=aLB5/vX8UYbLn2INeG6AwfgkHeDCamVDuEQNIBttXY0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=umKHckcJwx/tySI8HRA0U2kglvJOX82EyjGBykJ26o7SkYurVj0iygHK2fiFJWUzqddjVuh2+NI2Nbxf1iJ862R/yiWKI13DmhvIl4eFmiuPG/g32xNWJqplCMw7bdmBmOA28B+fxZxurvFmwmn7I6CimzB/wBC6dMCsEcs9zbU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=aF4iW/VX; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="aF4iW/VX" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 74B8DC2BCB6; Mon, 20 Apr 2026 13:15:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1776690945; bh=aLB5/vX8UYbLn2INeG6AwfgkHeDCamVDuEQNIBttXY0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=aF4iW/VXja2jbN13SiNPdmmBPh5O73bSM7VTgoKu4ShJSg/RhVyt0x9dZ75/Wcwl4 FX3OWdUh++/oweqrNXP8RM8o6DvCEh9aHBD3mX+zBxi/8oBbSgomE2V2EpH6yXgHIS 4rlsri+hkA3XGhzV9JiutNoJvWOb5IwUVkziWtmaCBIP81S55pqZNrolULT6NhlHn/ rTPaJb/eg5FuZiCX5/kDSLGA09aaew802hoTmf1nacBE5Od1L2t+NwdkwP23sGruMQ IGDr8QtS7mR5x3jxEqWy/v+royAvS5FrRHo7Zk2z8j7NTD0OoWF7za4kFAfo6iWwH6 JO5f+OAKa6euw== From: Sasha Levin To: patches@lists.linux.dev, stable@vger.kernel.org Cc: Vincent Donnefort , "Steven Rostedt (Google)" , Sasha Levin , mhiramat@kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org Subject: [PATCH AUTOSEL 7.0-6.12] ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers Date: Mon, 20 Apr 2026 09:07:49 -0400 Message-ID: <20260420131539.986432-3-sashal@kernel.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260420131539.986432-1-sashal@kernel.org> References: <20260420131539.986432-1-sashal@kernel.org> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 7.0 Content-Transfer-Encoding: 8bit From: Vincent Donnefort [ Upstream commit 20ad8b0888be392eb2c4c3654805eb8594952373 ] On CPU hotplug, if it is the first time a trace_buffer sees a CPU, a ring_buffer_per_cpu will be allocated and its corresponding bit toggled in the cpumask. Many readers check this cpumask to know if they can safely read the ring_buffer_per_cpu but they are doing so without memory ordering and may observe the cpumask bit set while having NULL buffer pointer. Enforce the memory read ordering by sending an IPI to all online CPUs. The hotplug path is a slow-path anyway and it saves us from adding read barriers in numerous call sites. Link: https://patch.msgid.link/20260401053659.3458961-1-vdonnefort@google.com Signed-off-by: Vincent Donnefort Suggested-by: Steven Rostedt (Google) Signed-off-by: Steven Rostedt (Google) Signed-off-by: Sasha Levin --- LLM Generated explanations, may be completely bogus: Based on my comprehensive analysis, I have sufficient information to make a decision. Let me now lay out the full analysis following the 9-phase workflow. --- ## PHASE 1: COMMIT MESSAGE FORENSICS **Step 1.1 - Subject Line Parse:** Record: Subsystem=`ring-buffer`, action verb=`Enforce`, summary=`Enforce read ordering of trace_buffer cpumask and buffers`. "Enforce" implies fixing a missing ordering guarantee = bug fix. **Step 1.2 - Tags:** Record: - Link: https://patch.msgid.link/20260401053659.3458961-1- vdonnefort@google.com (lore discussion) - Signed-off-by: Vincent Donnefort (author) - Suggested-by: Steven Rostedt (Google) - tracing maintainer suggested the fix approach - Signed-off-by: Steven Rostedt - applied via maintainer tree - No Fixes: tag - No Cc: stable tag - No Reported-by tag **Step 1.3 - Body Analysis:** Record: Bug described is a memory ordering issue. On CPU hotplug, `trace_rb_cpu_prepare()` allocates a `ring_buffer_per_cpu` and sets the corresponding cpumask bit. Readers on other CPUs check the cpumask first, then access `buffer->buffers[cpu]`. Without read ordering, on weakly-ordered architectures a reader may observe the cpumask bit set while still seeing NULL for `buffer->buffers[cpu]`, causing NULL pointer dereference. Fix uses IPI + barrier trick to force ordering across all CPUs. Language "may observe" indicates defensive/analytical fix rather than reported crash. **Step 1.4 - Hidden Fix Detection:** Record: This IS clearly a bug fix (memory ordering/race fix), though without explicit "fix" in subject. The word "Enforce" indicates adding a missing ordering guarantee - a classic race fix pattern. --- ## PHASE 2: DIFF ANALYSIS **Step 2.1 - Inventory:** Record: Single file `kernel/trace/ring_buffer.c`, +18 -1 lines. New function `rb_cpu_sync()` (6 lines). Modified `trace_rb_cpu_prepare()` (replace single `smp_wmb()` with conditional IPI + wmb). Surgical fix. **Step 2.2 - Code Flow:** Record: - Before: `rb_allocate_cpu_buffer()` -> `smp_wmb()` -> `cpumask_set_cpu()`. Writer-side ordering only. - After: `rb_allocate_cpu_buffer()` -> `on_each_cpu(rb_cpu_sync, ...)` (IPI to all online CPUs, each executes `smp_rmb()`) -> `smp_wmb()` -> `cpumask_set_cpu()`. Forces synchronization on reader CPUs. - `if (unlikely(system_state == SYSTEM_RUNNING))` guard avoids IPI during early boot when IPI infrastructure may not be ready. **Step 2.3 - Bug Mechanism:** Record: Category (b) Synchronization/race condition fix. Specifically: data race between writer (trace_rb_cpu_prepare) and many readers/writers on other CPUs. On weakly-ordered CPUs (ARM, ARM64, PowerPC), readers can observe the cpumask bit set before seeing the newly-allocated buffer pointer, leading to NULL dereference. The IPI mechanism acts as a cross- CPU barrier: after `on_each_cpu()` returns synchronously, every online CPU has executed `smp_rmb()` via the IPI handler. Any subsequent cpumask_set_cpu write becomes visible only after that sync, ensuring any reader observing the bit also observes the buffer pointer. **Step 2.4 - Fix Quality:** Record: Fix is conceptually elegant and correct. Uses a well-known kernel pattern (IPI-as-barrier). Only affects the slow CPU hotplug path. Low regression risk - the IPI is guarded against early-boot execution, and CPU hotplug is inherently rare. The alternative (adding `smp_rmb()` to 30+ callsites) would be much more invasive and error-prone. --- ## PHASE 3: GIT HISTORY **Step 3.1 - Blame:** Record: The `smp_wmb()` + `cpumask_set_cpu` pattern was introduced by commit `b32614c03413f8` "tracing/rb: Convert to hotplug state machine" by Sebastian Andrzej Siewior, dated Nov 27, 2016 (v4.10). This code has been present in all stable trees since v4.10. **Step 3.2 - Fixes: Tag Follow-up:** Record: No Fixes: tag. The buggy code (the missing read barrier on reader side) has been present since 2016. The root cause is systemic - many reader callsites were added over the years without matching smp_rmb(). **Step 3.3 - File History:** Record: `kernel/trace/ring_buffer.c` is actively maintained (578 commits, recent activity). No prerequisite patches needed. Standalone fix (v1 only). **Step 3.4 - Author Context:** Record: Vincent Donnefort is a regular ring-buffer contributor (6+ commits to ring_buffer.c in 2024). Steven Rostedt (tracing maintainer) suggested the approach. Both are highly credentialed. **Step 3.5 - Dependencies:** Record: None. Self-contained fix. `on_each_cpu`, `smp_rmb`, `smp_wmb`, `system_state`/`SYSTEM_RUNNING` are all long-standing kernel primitives available in all stable trees. --- ## PHASE 4: MAILING LIST **Step 4.1 - Patch Discussion:** Record: `b4 dig -c 20ad8b0888be3` returned lore URL https://lore.kernel. org/all/20260401053659.3458961-1-vdonnefort@google.com/. Thread contains only the single patch submission (v1). No review comments, no NAKs, no "Cc: stable" suggestions. Thread mbox has just the submission message. **Step 4.2 - Reviewers:** Record: To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, linux-trace-kernel@vger.kernel.org. Cc: kernel-team@android.com, linux-kernel@vger.kernel.org. All relevant maintainers CC'd. **Step 4.3 - Bug Report:** Record: No bug report linked. No Reported-by. This is a defensive fix based on code analysis. Web fetch of lore was blocked by Anubis bot protection. **Step 4.4 - Related Patches:** Record: `b4 dig -c -a` confirmed only v1 exists. Standalone single patch, not part of a series. **Step 4.5 - Stable Discussion:** Record: Not searched separately; no Cc: stable in the submission indicates the author/maintainer did not explicitly nominate it for stable. --- ## PHASE 5: CODE SEMANTIC ANALYSIS **Step 5.1 - Key Functions:** Record: `trace_rb_cpu_prepare()` (modified), `rb_cpu_sync()` (new). **Step 5.2 - Callers of `trace_rb_cpu_prepare`:** Record: Registered as CPU hotplug callback via `cpuhp_state_add_instance(CPUHP_TRACE_RB_PREPARE, ...)`. Called on CPU online transition for each `trace_buffer` registered. Runs in kernel context on the control CPU (not the target CPU). **Step 5.3 - Readers Affected (impact surface):** Record: `grep cpumask_test_cpu.*buffer->cpumask` in ring_buffer.c shows 30+ callsites that use the pattern `if (!cpumask_test_cpu(cpu, buffer->cpumask)) return; cpu_buffer = buffer->buffers[cpu];`. Key hot paths include: - `ring_buffer_lock_reserve()` line 4716 - the main WRITE path, called from every trace event - `ring_buffer_peek()` line 5874 (reader) - `ring_buffer_record_disable_cpu/enable_cpu` - userspace-reachable via tracefs - `rb_wait()` via userspace read of per-CPU trace_pipe_raw These are user-reachable from tracefs and syscall paths. **Step 5.4 - Bug Reachability:** Record: User-reachable. CPU hotplug is triggered by: - Boot (initial CPU online) - Suspend/resume cycle (all CPUs brought down/up) - `/sys/devices/system/cpu/cpuX/online` toggling by root - VM CPU hotplug on cloud instances Concurrent readers on other CPUs during hotplug could race with the buffer allocation. On weakly-ordered archs (ARM64 very common), the race is realistic. **Step 5.5 - Similar Patterns:** Record: The IPI-as-barrier pattern is used elsewhere in the kernel (e.g., `kernel/time/hrtimer.c:958` `on_each_cpu(retrigger_next_event, NULL, 1)`). It's a well-established idiom. --- ## PHASE 6: STABLE TREE ANALYSIS **Step 6.1 - Buggy Code in Stable:** Record: Confirmed by inspecting archived versions - the exact `smp_wmb(); cpumask_set_cpu(cpu, buffer->cpumask);` pattern exists in v6.1, v6.6, v6.12, v6.18 (all active stable trees) and has been there since v4.10. Many readers lacking `smp_rmb()` are also present across all those versions. **Step 6.2 - Backport Complications:** Record: The patch context is simple and stable across versions. The line numbers differ but the exact 2-line context (`smp_wmb();\n\tcpumask_set_cpu(cpu, buffer->cpumask);`) is unchanged across all examined stable trees. Should apply cleanly with fuzz. **Step 6.3 - Related Fixes in Stable:** Record: No prior related fix found. Searched for memory ordering/barrier commits in kernel/trace/ - no matches targeting this specific ordering issue. --- ## PHASE 7: SUBSYSTEM CONTEXT **Step 7.1 - Subsystem/Criticality:** Record: `kernel/trace/ring_buffer.c` is the core ring buffer backing tracefs/ftrace/perf/BPF. Criticality = IMPORTANT (affects all users of tracing - kernel developers, security/observability tools, distros using ftrace). **Step 7.2 - Subsystem Activity:** Record: Very active - ~578 commits to ring_buffer.c, recent activity in 2024-2026. Well-maintained with careful attention to correctness. --- ## PHASE 8: IMPACT AND RISK **Step 8.1 - Affected Population:** Record: Users of tracing (ftrace, perf, BPF) on systems with CPU hotplug, especially on weakly-ordered architectures (ARM64 - ubiquitous in cloud, mobile, embedded; PowerPC - servers). Any Linux system where tracing is active during suspend/resume is potentially affected. **Step 8.2 - Trigger Conditions:** Record: Requires: 1. CPU hotplug event bringing online a CPU not previously seen by a trace_buffer 2. Concurrent tracing activity on other CPUs 3. Weakly-ordered CPU architecture (or unlucky timing on x86) Trigger is realistic (suspend/resume of any laptop, VM CPU hotplug) but timing-sensitive. Unprivileged users cannot directly trigger CPU hotplug, but tracing (if accessible) can be used to exercise the hot path. System_state transitions occur normally at boot. **Step 8.3 - Failure Severity:** Record: NULL pointer dereference -> kernel oops/panic. CRITICAL severity (system crash). No data corruption but immediate loss of system availability. **Step 8.4 - Risk-Benefit:** Record: - Benefit: MEDIUM-HIGH. Prevents a real kernel crash in a widely-used subsystem on common architectures. - Risk: LOW. 18-line change. IPI only on slow-path (CPU hotplug). SYSTEM_RUNNING guard prevents early-boot issues. Standard kernel idiom. Reviewed by subsystem maintainer. - Ratio: Favorable for backport. --- ## PHASE 9: SYNTHESIS **Step 9.1 - Evidence Summary:** FOR backport: - Real memory-ordering race that can cause NULL dereference (kernel crash) - Small, surgical fix (18 lines, single file) - Buggy pattern present in all active stable trees since v4.10 - Reviewed and authored by the tracing maintainer (Steven Rostedt) - Applies cleanly (context unchanged across stable versions) - Affects commonly-used architecture (ARM64) - Impact path includes hot tracing code called from many paths AGAINST backport: - No Fixes: tag - No Cc: stable tag - No Reported-by - defensive fix based on code analysis, not observed crash - Bug was latent for 10 years without mainstream report - Commit message uses conditional "may observe" **Step 9.2 - Stable Rules Checklist:** 1. Obviously correct and tested? Yes - uses standard IPI-as-barrier idiom, accepted by maintainer 2. Fixes a real bug? Yes - verifiable memory-ordering race 3. Important (crash)? Yes - NULL pointer dereference 4. Small and contained? Yes - 18 lines, single file 5. No new features? Correct - pure ordering fix 6. Can apply to stable? Yes - simple context, unchanged code structure **Step 9.3 - Exception Category:** Not a device-ID/quirk/build/doc/DT fix. Standard bug fix category. **Step 9.4 - Decision:** This is a borderline-positive case. The fix addresses a genuine memory- ordering bug that can cause a kernel crash (NULL dereference) on weakly- ordered architectures. The fix is small, well-reasoned, reviewed by the subsystem maintainer, and very low regression risk. While the absence of Fixes:/Cc:stable tags and concrete reports suggests lower urgency, stable rules explicitly include theoretical races with severe consequences. The tracing subsystem is widely used, and ARM64 (where this bug most readily manifests) is prevalent. The fix applies cleanly to stable trees. --- ## Verification - [Phase 1] Parsed commit message: confirmed no Fixes:, no Cc: stable, no Reported-by; Suggested-by: Steven Rostedt (tracing maintainer) - [Phase 1] Language analysis: "may observe" and "Enforce" indicate defensive race fix based on code analysis - [Phase 2] `git show 20ad8b0888be3 --stat`: verified 18 insertions, 1 deletion, single file - [Phase 2] Diff inspection: verified replacement of `smp_wmb()` with conditional `on_each_cpu + smp_wmb` block; new `rb_cpu_sync` helper with `smp_rmb()` - [Phase 3] `git blame -L 7508,7512 kernel/trace/ring_buffer.c`: confirmed `smp_wmb();/cpumask_set_cpu` pattern introduced by b32614c03413f8 in v4.10 (Nov 2016) - [Phase 3] `git show b32614c03413f8`: confirmed original commit by Sebastian Andrzej Siewior, "tracing/rb: Convert to hotplug state machine" - [Phase 3] `git log --author="Vincent Donnefort" -- kernel/trace/ring_buffer.c`: author has 6+ ring_buffer.c commits, regular contributor - [Phase 4] `b4 dig -c 20ad8b0888be3`: resolved to lore URL, confirmed submission thread - [Phase 4] `b4 dig -c -a`: confirmed only v1 exists, no revisions - [Phase 4] `/tmp/rb_sync_thread.mbox` read: thread has only the single patch submission, no review replies, no stable nomination in discussion - [Phase 5] `grep cpumask_test_cpu.*buffer->cpumask`: confirmed 30+ reader callsites in ring_buffer.c - [Phase 5] Verified `ring_buffer_lock_reserve` (line 4716) uses the pattern - hot write path - [Phase 5] Verified `on_each_cpu(x, NULL, 1)` idiom used elsewhere (kernel/time/hrtimer.c:958) - [Phase 6] `git show v6.6/v6.12/v6.18:kernel/trace/ring_buffer.c`: confirmed identical 2-line context `smp_wmb();\ncpumask_set_cpu(cpu, buffer->cpumask);` present in all major stable trees -> patch will apply cleanly - [Phase 6] `git show v4.10:kernel/trace/ring_buffer.c`: confirmed pattern present at trace_rb_cpu_prepare since v4.10 - [Phase 7] `git log --oneline --since=2024 -- kernel/trace/ring_buffer.c`: confirmed active subsystem with many recent commits - [Phase 8] Confirmed reachability: cpumask checks precede `buffer->buffers[cpu]` dereference in hot write path (`ring_buffer_lock_reserve`) and reader paths - NULL deref is possible if race occurs - UNVERIFIED: Cannot confirm whether this race has actually been observed in production (no Reported-by, no Link to bug tracker, Lore WebFetch blocked by bot protection). Assessment is based on code analysis and consequences of the race, which are severe when it does trigger. **YES** kernel/trace/ring_buffer.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 170170bd83bd9..10d2d0404434d 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -7468,6 +7468,12 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu) return 0; } +static void rb_cpu_sync(void *data) +{ + /* Not really needed, but documents what is happening */ + smp_rmb(); +} + /* * We only allocate new buffers, never free them if the CPU goes down. * If we were to free the buffer, then the user would lose any trace that was in @@ -7506,7 +7512,18 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node) cpu); return -ENOMEM; } - smp_wmb(); + + /* + * Ensure trace_buffer readers observe the newly allocated + * ring_buffer_per_cpu before they check the cpumask. Instead of using a + * read barrier for all readers, send an IPI. + */ + if (unlikely(system_state == SYSTEM_RUNNING)) { + on_each_cpu(rb_cpu_sync, NULL, 1); + /* Not really needed, but documents what is happening */ + smp_wmb(); + } + cpumask_set_cpu(cpu, buffer->cpumask); return 0; } -- 2.53.0