* [PATCH AUTOSEL 6.19-5.10] s390/purgatory: Add -Wno-default-const-init-unsafe to KBUILD_CFLAGS
From: Sasha Levin @ 2026-02-11 12:30 UTC (permalink / raw)
To: patches, stable
Cc: Heiko Carstens, Sasha Levin, pjw, palmer, aou, nathan, gor, ardb,
samitolvanen, linux-riscv, llvm
From: Heiko Carstens <hca@linux.ibm.com>
[ Upstream commit b4780fe4ddf04b51127a33d705f4a2e224df00fa ]
Add -Wno-default-const-init-unsafe to purgatory KBUILD_CFLAGS, similar
to scripts/Makefile.extrawarn, since clang generates warnings for the
dummy variable in typecheck():
CC arch/s390/purgatory/purgatory.o
arch/s390/include/asm/ptrace.h:221:9: warning: default initialization of an object of type 'typeof (regs->psw)' (aka 'const psw_t') leaves the object uninitialized [-Wdefault-const-init-var-unsafe]
221 | return psw_bits(regs->psw).pstate;
| ^
arch/s390/include/asm/ptrace.h:98:2: note: expanded from macro 'psw_bits'
98 | typecheck(psw_t, __psw); \
| ^
include/linux/typecheck.h:11:12: note: expanded from macro 'typecheck'
11 | typeof(x) __dummy2; \
| ^
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have a thorough understanding. Let me compile my analysis.
## Analysis: s390/purgatory: Add -Wno-default-const-init-unsafe to KBUILD_CFLAGS
### 1. Commit Message Analysis
The commit adds the `-Wno-default-const-init-unsafe` compiler flag to
the s390 purgatory's `KBUILD_CFLAGS`. The commit message clearly
explains the problem: **clang 21+** introduced a new on-by-default
warning (`-Wdefault-const-init-var-unsafe`) that triggers on the
`typecheck()` macro's dummy variable. The warning fires in
`arch/s390/include/asm/ptrace.h:221` via `psw_bits()` -> `typecheck()`
-> `__dummy2` (line 11 of `include/linux/typecheck.h`).
The commit author is Heiko Carstens, the s390 subsystem maintainer.
### 2. Code Change Analysis
The change is exactly **one line** added to
`arch/s390/purgatory/Makefile`:
```
+KBUILD_CFLAGS += $(call cc-option, -Wno-default-const-init-unsafe)
```
This is wrapped in `$(call cc-option, ...)`, which means it's only
applied when the compiler supports the flag, providing backward
compatibility.
### 3. Root Cause: Why s390 Purgatory Needs Its Own Fix
This is the critical technical detail. The s390 purgatory Makefile
**completely replaces** `KBUILD_CFLAGS` from scratch (line 16):
```16:16:arch/s390/purgatory/Makefile
KBUILD_CFLAGS := -std=gnu11 -fms-extensions -fno-strict-aliasing -Wall -Wstrict-prototypes
```
Note the `:=` assignment operator — this discards ALL previously-set
global flags, including the `-Wno-default-const-init-unsafe` that was
already added to `scripts/Makefile.warn` (formerly
`scripts/Makefile.extrawarn`) by commit `d0afcfeb9e381` ("kbuild:
Disable -Wdefault-const-init-unsafe").
In contrast, other purgatory Makefiles (x86, riscv, powerpc) use
`filter-out` patterns like:
```
KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% ...,$(KBUILD_CFLAGS))
```
which **preserve** the global flags (including the warning suppression).
Only s390's purgatory builds from scratch and needs this companion fix.
### 4. Is This a Build Fix?
**Yes, definitively.** With `CONFIG_WERROR=y` (enabled in many distro
configs and CI systems), the clang 21+ warning becomes a build error.
The commit message shows the exact warning output from `CC
arch/s390/purgatory/purgatory.o`. The trigger path is:
- `purgatory.o` includes `asm/ptrace.h`
- `ptrace.h:221` calls `psw_bits(regs->psw).pstate`
- `psw_bits` macro (line 98) calls `typecheck(psw_t, __psw)`
- `typecheck` macro (`include/linux/typecheck.h:11`) declares
`typeof(x) __dummy2;`, an uninitialized const variable
- clang 21+ flags this with `-Wdefault-const-init-var-unsafe`
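
The trigger path above can be reproduced outside the kernel in a few lines of C. What follows is a minimal userspace sketch, not kernel code. The `typecheck()` macro mirrors the shape of the one in `include/linux/typecheck.h` (with `__typeof__` so it builds in strict ISO modes too). `demo_psw_t`, `struct demo_regs`, `demo_user_mode()`, and the bit it tests are hypothetical stand-ins for `psw_t`, `struct pt_regs`, and the helper at `ptrace.h:221`. GNU statement expressions are assumed.

```c
#include <assert.h>

/* Same shape as the kernel's typecheck(): a compile-time type comparison. */
#define typecheck(type, x) \
({	type __dummy; \
	__typeof__(x) __dummy2; \
	(void)(&__dummy == &__dummy2); \
	1; \
})

/* Hypothetical stand-ins for psw_t and struct pt_regs. */
typedef struct {
	unsigned long mask;
	unsigned long addr;
} demo_psw_t;

struct demo_regs {
	demo_psw_t psw;
};

/*
 * regs points to const, so __typeof__(regs->psw) is 'const demo_psw_t'
 * and __dummy2 becomes an uninitialized const object. That declaration
 * is what the new clang diagnostic reports; the object is never read.
 */
static int demo_user_mode(const struct demo_regs *regs)
{
	(void)typecheck(demo_psw_t, regs->psw);
	/* Hypothetical bit test; the real pstate layout is not modelled. */
	return (regs->psw.mask & 1UL) != 0;
}
```

Dropping `const` from the `regs` parameter makes `__dummy2` a plain `demo_psw_t`, which is why the diagnostic only appears in helpers that take a const pointer.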
### 5. Relationship to Parent Commit
The parent commit `d0afcfeb9e381` ("kbuild: Disable
-Wdefault-const-init-unsafe") was explicitly tagged with
`Cc: stable@vger.kernel.org`,
indicating the kernel community considers this warning fix important for
stable. That commit fixed the global build system, but the s390
purgatory was missed because it builds its own CFLAGS from scratch. This
commit is the necessary companion fix.
However, this commit is **self-contained** — it doesn't depend on
`d0afcfeb9e381` being present in stable. The s390 purgatory always
builds its own CFLAGS independently.
### 6. Risk Assessment
- **Lines changed**: 1
- **Files changed**: 1 Makefile
- **Runtime risk**: **Zero**. This is purely a compile-time flag that
suppresses a spurious compiler warning. No runtime behavior change
whatsoever.
- **Backward compatibility**: The `$(call cc-option, ...)` wrapper
ensures the flag is silently ignored on compilers that don't support
it (older clang, all gcc).
- **Side effects**: None. The warning being suppressed
(`-Wdefault-const-init-var-unsafe`) is harmless here: the `typecheck()`
macro's dummy variable is never read; it exists solely for compile-time
type checking.
### 7. Stable Relevance
- The s390 purgatory (kexec code) exists in all stable kernel trees
going back many years.
- The `psw_bits()` / `typecheck()` code path exists in all these trees.
- Users building stable kernels with newer clang + `CONFIG_WERROR=y`
**will** get build failures.
- Build fixes are explicitly listed as appropriate for stable in
`Documentation/process/stable-kernel-rules.rst`.
### 8. Dependency Check
The commit is fully self-contained. The only "dependency" is the s390
purgatory Makefile itself, which has existed unchanged (in terms of the
KBUILD_CFLAGS structure) for many kernel versions. The
`$(call cc-option, ...)` mechanism is available in all stable trees.
### Summary
This is a textbook build fix for stable:
- **One line**, zero risk, zero runtime impact
- Fixes a **real build failure** with newer clang + CONFIG_WERROR
- Self-contained, no dependencies
- Companion to a commit already tagged `Cc: stable@vger.kernel.org`
- Uses `cc-option` for backward compatibility
- Written by the s390 subsystem maintainer
**YES**
arch/s390/purgatory/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/s390/purgatory/Makefile b/arch/s390/purgatory/Makefile
index 0c196a5b194af..61d240a37633d 100644
--- a/arch/s390/purgatory/Makefile
+++ b/arch/s390/purgatory/Makefile
@@ -23,6 +23,7 @@ KBUILD_CFLAGS += -D__DISABLE_EXPORTS
KBUILD_CFLAGS += $(CLANG_FLAGS)
KBUILD_CFLAGS += $(if $(CONFIG_CC_IS_CLANG),-Wno-microsoft-anon-tag)
KBUILD_CFLAGS += $(call cc-option,-fno-PIE)
+KBUILD_CFLAGS += $(call cc-option, -Wno-default-const-init-unsafe)
KBUILD_AFLAGS := $(filter-out -DCC_USING_EXPOLINE,$(KBUILD_AFLAGS))
KBUILD_AFLAGS += -D__DISABLE_EXPORTS
--
2.51.0
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [PATCH AUTOSEL 6.19] ACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn()
From: Sasha Levin @ 2026-02-11 12:30 UTC (permalink / raw)
To: patches, stable
Cc: Yicong Yang, Rafael J. Wysocki, Sasha Levin, rafael, pjw, palmer,
aou, linux-acpi, linux-riscv
From: Yicong Yang <yang.yicong@picoheart.com>
[ Upstream commit 7cf28b3797a81b616bb7eb3e90cf131afc452919 ]
The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a
system workqueue which is not guaranteed to be finished before entering
userspace. This may cause some key devices to be missing when userspace
init task tries to find them. Two issues observed on RISCV platforms:
- Kernel panic because userspace init cannot open a console.
The console device scan is queued by acpi_scan_clear_dep_queue() and
has not finished by the time the userspace init process runs, so no
console is present.
- Entering the rescue shell due to the lack of root devices (PCIe nvme
in our case).
Same reason as above: the PCIe host bridge scan is queued on a system
workqueue and finishes after the init process runs.
The reason is that both devices (console, PCIe host bridge) depend on the
riscv-aplic irqchip to serve their interrupts (console's wired interrupt
and PCI's INTx interrupts). In order to keep the dependency, these
devices are scanned and created after initializing riscv-aplic. The
riscv-aplic is initialized in device_initcall() and a device scan work
is queued via acpi_scan_clear_dep_queue(), which is close to the time
userspace init process is run. Since system_dfl_wq is used in
acpi_scan_clear_dep_queue() with no synchronization, the issues will
happen if userspace init runs before these devices are ready.
The solution is to wait for the queued work to complete before entering
userspace init. One possible way would be to use a dedicated workqueue
instead of system_dfl_wq, and explicitly flush it somewhere in the
initcall stage before entering userspace. Another way is to use
async_schedule_dev_nocall() for scanning these devices. It's designed
for asynchronous initialization and will work in the same way as before
because it's using a dedicated unbound workqueue as well, but the kernel
init code calls async_synchronize_full() right before entering userspace
init which will wait for the work to complete.
Compared to a dedicated workqueue, the second approach is simpler
because the async schedule framework takes care of all of the details.
The ACPI code only needs to focus on its job. A dedicated workqueue for
this could also be redundant because some platforms don't need
acpi_scan_clear_dep_queue() for their device scanning.
Signed-off-by: Yicong Yang <yang.yicong@picoheart.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20260128132848.93638-1-yang.yicong@picoheart.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Detailed Analysis
### 1. COMMIT MESSAGE ANALYSIS
The commit fixes two concrete, severe boot failures on RISC-V platforms
with ACPI:
1. **Kernel panic** - userspace init cannot open a console because the
console device hasn't been scanned yet. The device scan work was
queued via `acpi_scan_clear_dep_queue()` to `system_dfl_wq` (or
`system_unbound_wq` in stable) and hasn't completed by the time
userspace init runs.
2. **Boot failure into rescue shell** - root device (PCIe NVMe via PCIe
host bridge) is missing for the same reason: the scan work is still
queued and not completed.
Both are caused by a race: the deferred device scan (queued by
`acpi_scan_clear_dep_queue()`) is scheduled on a system workqueue with
**no synchronization barrier** before userspace init starts. Devices
that depend on RISC-V APLIC (interrupt controller) are scanned
asynchronously after APLIC initialization in `device_initcall()`, and if
init runs before the workqueue work completes, critical devices are
missing.
The commit message was written by the author (Yicong Yang) and edited by
the ACPI maintainer (Rafael J. Wysocki), who signed off on it. It explains
both the failure mode and the reasoning behind the chosen fix.
### 2. CODE CHANGE ANALYSIS
The change is **small and surgical** (15 insertions, 26 deletions; 11 net
lines removed):
**Before (old code):**
- A `struct acpi_scan_clear_dep_work` wraps `work_struct` +
`acpi_device *`
- `acpi_scan_clear_dep_fn()` is a `work_struct` callback that calls
`acpi_bus_attach()` under `acpi_scan_lock`, then releases the device
reference and frees the wrapper
- `acpi_scan_clear_dep_queue()` allocates the wrapper via `kmalloc()`,
initializes the work, and queues it on
`system_dfl_wq`/`system_unbound_wq`
**After (new code):**
- `acpi_scan_clear_dep_fn()` signature changes to
`(void *dev, async_cookie_t cookie)` - an `async_func_t` callback
- It uses `to_acpi_device(dev)` directly instead of `container_of` on a
wrapper struct
- `acpi_scan_clear_dep_queue()` calls `async_schedule_dev_nocall()`
instead of `queue_work()`
- The `struct acpi_scan_clear_dep_work` wrapper is removed entirely
- No more `kmalloc()` for the wrapper (the async framework handles its
own allocation internally)
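
The difference between the two lookup patterns can be sketched in plain userspace C. This is an illustration only: every `demo_` type and function is a hypothetical stand-in for the kernel's `struct device`, `struct acpi_device`, and `struct work_struct`, and `container_of()` is re-implemented with `offsetof()`. It assumes `to_acpi_device()` is a `container_of()` on the embedded `dev` member.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the kernel structures. */
struct demo_device { int id; };
struct demo_acpi_device { int handle; struct demo_device dev; };
struct demo_work { int pending; };

/* Old pattern: a separately allocated wrapper carries the device pointer. */
struct demo_clear_dep_work {
	struct demo_work work;
	struct demo_acpi_device *adev;
};

static struct demo_acpi_device *demo_old_lookup(struct demo_work *work)
{
	struct demo_clear_dep_work *cdw =
		container_of(work, struct demo_clear_dep_work, work);

	return cdw->adev;
}

/* New pattern: the callback receives the embedded device and steps outward. */
static struct demo_acpi_device *demo_new_lookup(void *dev)
{
	return container_of((struct demo_device *)dev,
			    struct demo_acpi_device, dev);
}
```

Because the new pattern derives the ACPI device from a member it already embeds, no wrapper allocation is needed and there is no allocation-failure path in the caller.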
**Why this fixes the bug:** `async_schedule_dev_nocall()` schedules work
on the async framework's dedicated domain (`async_dfl_domain`). The
critical property is that `kernel_init()` in `init/main.c` calls
`async_synchronize_full()` **before** entering userspace (before
`run_init_process()`):
```1569:1642:init/main.c
static int __ref kernel_init(void *unused)
{
	// ...
	kernel_init_freeable();
	/* need to finish all async __init code before freeing the memory */
	async_synchronize_full();
	// ...
	// <userspace init happens after this point>
```
This guarantees all async-scheduled work (including the device scans)
completes before userspace init starts. The old
`queue_work(system_unbound_wq, ...)` had no such synchronization
barrier.
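
The effect of the barrier can be illustrated with a deliberately simplified, single-threaded toy model. It is not how the kernel's async framework is implemented: the `toy_` names are hypothetical, real work items run concurrently on worker threads, and `async_synchronize_full()` waits for them rather than running them. The model also treats the unsynchronized case pessimistically, as if the race is always lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

typedef void (*toy_func_t)(void *data);

/* A toy queue of deferred work; nothing runs until it is drained. */
struct toy_queue {
	toy_func_t func[TOY_MAX_PENDING];
	void *data[TOY_MAX_PENDING];
	size_t count;
};

static bool toy_schedule(struct toy_queue *q, toy_func_t func, void *data)
{
	if (q->count == TOY_MAX_PENDING)
		return false;
	q->func[q->count] = func;
	q->data[q->count] = data;
	q->count++;
	return true;
}

/* Stands in for the barrier: all pending work is complete on return. */
static void toy_synchronize_full(struct toy_queue *q)
{
	for (size_t i = 0; i < q->count; i++)
		q->func[i](q->data[i]);
	q->count = 0;
}

static void toy_scan_console(void *data)
{
	*(bool *)data = true;	/* the console device now exists */
}

/* Returns whether "init" would find a console at the handoff point. */
static bool toy_boot(bool use_barrier)
{
	struct toy_queue q = { .count = 0 };
	bool console_ready = false;

	toy_schedule(&q, toy_scan_console, &console_ready);
	if (use_barrier)
		toy_synchronize_full(&q);
	return console_ready;
}
```

In this model `toy_boot(true)` always finds the console, while `toy_boot(false)` represents the losing side of the race described in the commit message.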
**Reference counting correctness:** The reference counting is preserved
identically:
- On success: `acpi_scan_clear_dep_fn()` releases the reference via
`acpi_dev_put(adev)`
- On failure: `acpi_scan_clear_dep_queue()` returns `false`, and the
caller `acpi_scan_clear_dep()` releases the reference via
`acpi_dev_put(adev)`
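
The ownership rule in the two bullets above (the callback drops the reference on success, the caller drops it on failure) can be checked with a small userspace sketch. All `rc_` names are hypothetical, the "deferred" callback runs immediately for simplicity, and the `schedule_ok` parameter is an assumption added only to simulate a scheduling failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct acpi_device with a bare counter. */
struct rc_adev {
	int refs;
	bool dep_unmet;
	bool attached;
};

static void rc_get(struct rc_adev *a) { a->refs++; }
static void rc_put(struct rc_adev *a) { a->refs--; }

/* Callback: attach the device, then drop the reference it was handed. */
static void rc_clear_dep_fn(struct rc_adev *a)
{
	a->attached = true;
	rc_put(a);
}

/* Returns true only if the callback has taken over the reference. */
static bool rc_clear_dep_queue(struct rc_adev *a, bool schedule_ok)
{
	if (a->dep_unmet || !schedule_ok)
		return false;
	rc_clear_dep_fn(a);	/* stands in for deferred execution */
	return true;
}

/* Caller: takes a reference and releases it only if queueing failed. */
static void rc_clear_dep(struct rc_adev *a, bool schedule_ok)
{
	rc_get(a);
	if (!rc_clear_dep_queue(a, schedule_ok))
		rc_put(a);
}
```

In both paths the counter returns to its starting value, which is the invariant the patch has to preserve when it swaps the scheduling mechanism.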
### 3. CLASSIFICATION
This is a **real bug fix** for a **race condition** that causes **kernel
panics and boot failures**. It is not a feature, cleanup, or
optimization.
### 4. SCOPE AND RISK ASSESSMENT
- **Files changed:** 1 (`drivers/acpi/scan.c`)
- **Net lines:** Reduced - removes the wrapper struct, simplifies both
functions
- **Subsystem:** ACPI scan, a core subsystem
- **Risk:** LOW. The change replaces one deferred scheduling mechanism
(workqueue) with another (async framework) that has the specific
property of being synchronized before userspace init. The functional
behavior of the callback is identical. The async framework is
well-established and already used extensively in the kernel for device
probing.
- **Could this break something?** Very unlikely. The
`async_schedule_dev_nocall()` function uses an unbound workqueue
internally just like the old code, with the added benefit of the
synchronization barrier. The only behavior change is that work is
guaranteed to complete before userspace init, which is strictly
desirable.
### 5. USER IMPACT
- **Severity:** CRITICAL - kernel panics and inability to boot
- **Affected platforms:** Primarily RISC-V ACPI platforms right now, but
the underlying race could affect any platform using
`acpi_dev_clear_dependencies()` (Intel camera IVSC, INT3472, Surface
devices, ACPI EC, PCI link, GPIO, I2C - 18 different callers)
- **Who benefits:** RISC-V ACPI users are the primary beneficiaries.
Other platforms could theoretically hit this race too under heavy load
at boot time, though it's most likely on RISC-V where interrupt
controller dependency chains are deeper.
### 6. DEPENDENCY CHECK
- **`async_schedule_dev_nocall()`:** Already backported to all active
stable trees (6.1.y, 6.6.y, 6.12.y) - verified by checking commit
history
- **`acpi_scan_clear_dep_queue()` / `acpi_scan_clear_dep_fn()`:**
Present in all active stable trees (introduced before v5.17)
- **`#include <linux/async.h>`:** The new header include is the only
other change needed
- **Backport adjustment:** Stable trees (6.1.y through 6.14.y) use
`system_unbound_wq` instead of `system_dfl_wq`, but since the commit
removes the `queue_work()` call entirely, the diff will need a trivial
context adjustment (the old `queue_work(system_unbound_wq, ...)` line
vs `queue_work(system_dfl_wq, ...)`)
- **RISC-V ACPI dependency IDs:** The RSCV0001/RSCV0002 entries are in
6.12.y but not in 6.6.y or 6.1.y. However, the fix is still valid for
older trees because the same race exists for Intel camera/IVSC/PCI-link
devices that use this code path
### 7. STABILITY INDICATORS
- Authored by Yicong Yang (kernel contributor)
- Reviewed and edited by Rafael J. Wysocki (ACPI maintainer, signed off)
- The approach was explicitly chosen over a dedicated workqueue because
it leverages the existing async synchronization framework which "takes
care of all the details"
- Real-world tested (the bug was observed on actual RISC-V hardware)
### CONCLUSION
This commit fixes a **critical race condition** causing **kernel panics
and boot failures**. The fix is:
- **Small and surgical** (single file, removes complexity)
- **Obviously correct** (leverages the well-established async
synchronization barrier before userspace init)
- **Self-contained** (no dependencies beyond `async_schedule_dev_nocall`
which already exists in all stable trees)
- **Low risk** (functionally equivalent deferred execution, just with
proper synchronization)
- **High impact** (prevents kernel panics and boot failures on RISC-V
ACPI platforms; potentially prevents similar races on other platforms
using the same code path)
The only minor concern is the trivial context difference
(`system_unbound_wq` vs `system_dfl_wq`) for backport to pre-6.19 stable
trees, which requires a trivial adaptation of the diff context but does
not affect the logic (the line is removed, not modified).
**YES**
drivers/acpi/scan.c | 41 +++++++++++++++--------------------------
1 file changed, 15 insertions(+), 26 deletions(-)
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 416d87f9bd107..b78f6be2f9468 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -5,6 +5,7 @@
#define pr_fmt(fmt) "ACPI: " fmt
+#include <linux/async.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>
@@ -2360,46 +2361,34 @@ static int acpi_dev_get_next_consumer_dev_cb(struct acpi_dep_data *dep, void *da
 	return 0;
 }
-struct acpi_scan_clear_dep_work {
-	struct work_struct work;
-	struct acpi_device *adev;
-};
-
-static void acpi_scan_clear_dep_fn(struct work_struct *work)
+static void acpi_scan_clear_dep_fn(void *dev, async_cookie_t cookie)
 {
-	struct acpi_scan_clear_dep_work *cdw;
-
-	cdw = container_of(work, struct acpi_scan_clear_dep_work, work);
+	struct acpi_device *adev = to_acpi_device(dev);
 	acpi_scan_lock_acquire();
-	acpi_bus_attach(cdw->adev, (void *)true);
+	acpi_bus_attach(adev, (void *)true);
 	acpi_scan_lock_release();
-	acpi_dev_put(cdw->adev);
-	kfree(cdw);
+	acpi_dev_put(adev);
 }
 static bool acpi_scan_clear_dep_queue(struct acpi_device *adev)
 {
-	struct acpi_scan_clear_dep_work *cdw;
-
 	if (adev->dep_unmet)
 		return false;
-	cdw = kmalloc(sizeof(*cdw), GFP_KERNEL);
-	if (!cdw)
-		return false;
-
-	cdw->adev = adev;
-	INIT_WORK(&cdw->work, acpi_scan_clear_dep_fn);
 	/*
-	 * Since the work function may block on the lock until the entire
-	 * initial enumeration of devices is complete, put it into the unbound
-	 * workqueue.
+	 * Async schedule the deferred acpi_scan_clear_dep_fn() since:
+	 * - acpi_bus_attach() needs to hold acpi_scan_lock which cannot
+	 * be acquired under acpi_dep_list_lock (held here)
+	 * - the deferred work at boot stage is ensured to be finished
+	 * before userspace init task by the async_synchronize_full()
+	 * barrier
+	 *
+	 * Use _nocall variant since it'll return on failure instead of
+	 * run the function synchronously.
 	 */
-	queue_work(system_dfl_wq, &cdw->work);
-
-	return true;
+	return async_schedule_dev_nocall(acpi_scan_clear_dep_fn, &adev->dev);
 }
 static void acpi_scan_delete_dep_data(struct acpi_dep_data *dep)
--
2.51.0