From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Danil Skrebenkov <danil.skrebenkov@cloudbear.ru>,
Andrew Jones <ajones@ventanamicro.com>,
Paul Walmsley <pjw@kernel.org>, Sasha Levin <sashal@kernel.org>,
palmer@dabbelt.com, aou@eecs.berkeley.edu,
alexander.deucher@amd.com, alexandre.f.demers@gmail.com,
linux-riscv@lists.infradead.org
Subject: [PATCH AUTOSEL 6.17-5.10] RISC-V: clear hot-unplugged cores from all task mm_cpumasks to avoid rfence errors
Date: Mon, 27 Oct 2025 20:38:52 -0400
Message-ID: <20251028003940.884625-8-sashal@kernel.org>
In-Reply-To: <20251028003940.884625-1-sashal@kernel.org>
From: Danil Skrebenkov <danil.skrebenkov@cloudbear.ru>
[ Upstream commit ae9e9f3d67dcef7582a4524047b01e33c5185ddb ]
OpenSBI v1.7 adds hart checks for IPI operations. In particular, it
compares the hmask passed as an argument from Linux against the mask
of online harts (on the OpenSBI side). If the two do not match, an
error occurs.
When a CPU goes offline, its bit in cpu_online_mask is explicitly
cleared in __cpu_disable. However, mm_cpumask is not explicitly
cleared. mm_cpumask is used for rfence operations, which call the
OpenSBI RFENCE extension, which in turn uses IPIs to reach remote harts.
If a hart is offline, an error may occur because the mask passed by
Linux does not match the mask of online harts in OpenSBI.
This patch adds explicit clearing of mm_cpumask for the offline hart.
Signed-off-by: Danil Skrebenkov <danil.skrebenkov@cloudbear.ru>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250919132849.31676-1-danil.skrebenkov@cloudbear.ru
[pjw@kernel.org: rewrote subject line for clarity]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Based on my comprehensive analysis using semantic code analysis tools
and examination of the kernel repository, here is my determination:
## **Backport Status: YES**
## Extensive Analysis
### 1. Semantic Tools Used and Findings
**mcp__semcode__find_function & mcp__semcode__find_callers:**
- Located `clear_tasks_mm_cpumask()` in kernel/cpu.c:1241-1268 - a well-
established function that safely clears CPU bits from all process
mm_cpumask fields
- Found that `arch_cpuhp_cleanup_dead_cpu()` is called by
`cpuhp_bp_sync_dead()` in the CPU hotplug core synchronization path
(kernel/cpu.c:361)
- **Critical finding**: ARM, ARM64, PARISC, and PowerPC architectures
already call `clear_tasks_mm_cpumask()` in their
`arch_cpuhp_cleanup_dead_cpu()` implementations - RISC-V was the
outlier missing this call
**mcp__semcode__find_callchain:**
- Traced the execution path: `cpuhp_bp_sync_dead` →
`arch_cpuhp_cleanup_dead_cpu` → `clear_tasks_mm_cpumask`
- Confirmed this is part of the standard CPU hotplug dead-CPU cleanup
sequence
**Impact Analysis via Callers:**
- `sbi_remote_sfence_vma_asid()` (the function affected by stale
mm_cpumask) has 3 direct callers, with `__flush_tlb_range()` being the
main one (arch/riscv/mm/tlbflush.c:118)
- `__flush_tlb_range()` is called by ALL TLB flush operations:
`flush_tlb_mm()`, `flush_tlb_page()`, `flush_tlb_range()`,
`flush_pmd_tlb_range()`, `flush_pud_tlb_range()`, and
`arch_tlbbatch_flush()`
- **User-space exposure**: HIGH - Any memory operations (mmap, munmap,
mprotect, page faults) trigger TLB flushes
### 2. Code Change Analysis
The fix adds exactly **one line** to arch/riscv/kernel/cpu-hotplug.c:
```c
clear_tasks_mm_cpumask(cpu);
```
This is placed in `arch_cpuhp_cleanup_dead_cpu()` right after the CPU is
confirmed dead, matching the pattern used by other architectures.
### 3. Root Cause and Bug Impact
**The Bug:**
When a CPU is hot-unplugged:
1. `__cpu_disable()` clears `cpu_online_mask` (line 39 of cpu-hotplug.c)
2. **BUT** the offline CPU can remain set in the mm_cpumask of any
process that previously ran on it
3. Subsequent TLB flush operations use `mm_cpumask(mm)` to determine
target CPUs
4. This calls `sbi_remote_sfence_vma_asid()` which invokes openSBI's
RFENCE extension with the stale CPU mask
5. **openSBI v1.7+** validates the hart mask against online harts and
**returns an error** if they don't match
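The sequence above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `hart_mask_t`, `model_fw_rfence()` and `model_mask_clear()` are hypothetical names invented for illustration, and the subset check is a simplification of the validation the commit message attributes to OpenSBI v1.7, not OpenSBI or kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per hart, at most 64 harts. */
typedef uint64_t hart_mask_t;

/*
 * Model of the firmware-side check: reject a request that names any hart
 * the firmware does not consider online. Returns 0 on success, -1 on error.
 */
static int model_fw_rfence(hart_mask_t hmask, hart_mask_t fw_online)
{
	return (hmask & ~fw_online) ? -1 : 0;
}

/* Model of clearing a single hart's bit from a mask. */
static hart_mask_t model_mask_clear(hart_mask_t mask, unsigned int hart)
{
	assert(hart < 64);
	return mask & ~((hart_mask_t)1 << hart);
}
```

In this model, clearing hart 3 from the online mask while leaving it set in a process's mask makes `model_fw_rfence()` fail; clearing the same bit from the process's mask as well makes the request succeed again, which is the effect the patch aims for.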
**Consequences:**
- RFENCE operations fail with errors
- TLB flush failures can lead to stale TLB entries
- Potential for data corruption or system instability
- Issue can occur on **any TLB flush** for a process whose mm_cpumask
  still includes the offlined CPU
**Affected Versions:**
- Bug introduced in v6.10 (commit 72b11aa7f8f93, May 2023) when RISC-V
switched to hotplug core state synchronization
- Fix appears in v6.18-rc2
### 4. Why This Should Be Backported
**Meets Stable Tree Criteria:**
✅ **Fixes important bug**: RFENCE errors with openSBI v1.7+ cause TLB
flush failures
✅ **Obviously correct**: Matches established pattern from 4+ other
architectures (ARM, ARM64, PARISC, PowerPC)
✅ **Small and contained**: Single line addition, no side effects
✅ **No new features**: Pure bug fix for CPU hotplug cleanup
✅ **Low regression risk**: Function specifically designed for this
purpose, already tested on multiple architectures
**Additional Justification:**
1. **Architectural correctness**: RISC-V should behave like other
architectures for CPU hotplug
2. **Real-world impact**: Affects any RISC-V system with CPU hotplug +
openSBI v1.7+
3. **High exposure**: User-space memory operations routinely trigger TLB
flushes
4. **No dependencies**: `clear_tasks_mm_cpumask()` already exists in all
kernel versions with CPU hotplug support
5. **Well-understood fix**: The function has extensive documentation
explaining its purpose (kernel/cpu.c:1241)
**Risk Assessment:**
- **Minimal risk**: The fix aligns RISC-V with established behavior
- `clear_tasks_mm_cpumask()` includes safeguards:
WARN_ON(cpu_online(cpu)) check, proper RCU locking
- No changes to core hotplug logic, just adds missing cleanup step
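As a rough illustration of the cleanup step discussed above, the following userspace sketch models walking a task list and dropping a dead CPU from each address-space mask. It is not the kernel implementation: `struct model_mm`, `struct model_task` and `model_clear_tasks_mm_cpumask()` are hypothetical, there is no locking or RCU, and the `assert()` on the online mask only stands in for the `WARN_ON(cpu_online(cpu))` check mentioned above (the kernel warns rather than aborts).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct model_mm {
	uint64_t cpumask;	/* one bit per CPU the mm may have run on */
};

struct model_task {
	struct model_mm *mm;	/* NULL models a task without a user mm */
};

/*
 * Clear 'cpu' from the mask of every task that has an mm.
 * Returns how many masks actually had the bit set.
 */
static size_t model_clear_tasks_mm_cpumask(struct model_task *tasks, size_t n,
					   unsigned int cpu, uint64_t online)
{
	size_t touched = 0;

	assert(cpu < 64);
	/* The CPU must already be offline before its bit is cleared. */
	assert(!(online & ((uint64_t)1 << cpu)));

	for (size_t i = 0; i < n; i++) {
		struct model_mm *mm = tasks[i].mm;

		if (!mm)
			continue;	/* nothing to clear for this task */
		if (mm->cpumask & ((uint64_t)1 << cpu)) {
			mm->cpumask &= ~((uint64_t)1 << cpu);
			touched++;
		}
	}
	return touched;
}
```

The model skips tasks without an mm and leaves masks that never contained the CPU unchanged, so repeated calls for the same CPU are harmless.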
### 5. Why No Stable Tag?
The commit lacks "Cc: stable@vger.kernel.org" and "Fixes:" tags, which
is unfortunate. However, based on:
- The commit message explicitly describing the error condition
- The architectural inconsistency (other arches already do this)
- The real-world failure with openSBI v1.7+
- Review by Andrew Jones (a RISC-V maintainer)
This appears to be an oversight rather than an indication the fix
shouldn't be backported.
### Recommendation
**YES - This commit should be backported to stable kernels v6.10+** as
it fixes a real bug causing TLB flush failures on RISC-V systems with
CPU hotplug enabled when using modern openSBI firmware. The fix is
small, safe, and brings RISC-V in line with other architectures.
arch/riscv/kernel/cpu-hotplug.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/kernel/cpu-hotplug.c b/arch/riscv/kernel/cpu-hotplug.c
index a1e38ecfc8be2..3f50d3dd76c6f 100644
--- a/arch/riscv/kernel/cpu-hotplug.c
+++ b/arch/riscv/kernel/cpu-hotplug.c
@@ -54,6 +54,7 @@ void arch_cpuhp_cleanup_dead_cpu(unsigned int cpu)
pr_notice("CPU%u: off\n", cpu);
+ clear_tasks_mm_cpumask(cpu);
/* Verify from the firmware if the cpu is really stopped*/
if (cpu_ops->cpu_is_stopped)
ret = cpu_ops->cpu_is_stopped(cpu);
--
2.51.0