* [PATCH v7 3/6] mm/memory-failure: report MF_MSG_KERNEL for unrecoverable kernel pages
From: Breno Leitao @ 2026-05-13 15:39 UTC
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
Liam R. Howlett
Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
linux-trace-kernel, kernel-team
In-Reply-To: <20260513-ecc_panic-v7-0-be2e578e61da@debian.org>
The previous patch teaches get_any_page() to return -ENOTRECOVERABLE
for stable unhandlable kernel pages (PG_reserved, slab, vmalloc, page
tables, kernel stacks, ...). memory_failure() still folds every
negative return into MF_MSG_GET_HWPOISON, so callers that want to
react to the unrecoverable cases (a panic option, smarter logging)
cannot tell them apart from transient page-allocator races.
Turn the post-call branch into a switch over the get_hwpoison_page()
return code: map -ENOTRECOVERABLE to MF_MSG_KERNEL and any other
negative return to MF_MSG_GET_HWPOISON. case 0 keeps the existing
free-buddy / kernel-high-order handling and case 1 falls through to
the rest of memory_failure() unchanged.
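As a rough illustration of the mapping described above, here is a minimal userspace sketch. The enum and the function name classify_get_page_result() are hypothetical stand-ins, not kernel symbols; only the four return-code cases come from this patch.

```c
#include <errno.h>

/* Hypothetical stand-ins for the outcomes of the switch in memory_failure(). */
enum mf_outcome_sketch {
	SKETCH_FREE_OR_HIGH_ORDER,	/* res == 0: free-buddy / kernel-high-order path */
	SKETCH_CONTINUE,		/* res == 1: refcount taken, keep going */
	SKETCH_MSG_KERNEL,		/* res == -ENOTRECOVERABLE */
	SKETCH_MSG_GET_HWPOISON,	/* any other return (transient race) */
};

/*
 * Mirror the shape of the new switch: only -ENOTRECOVERABLE is reported
 * as MF_MSG_KERNEL, every other negative errno stays MF_MSG_GET_HWPOISON.
 */
enum mf_outcome_sketch classify_get_page_result(int res)
{
	switch (res) {
	case 0:
		return SKETCH_FREE_OR_HIGH_ORDER;
	case 1:
		return SKETCH_CONTINUE;
	case -ENOTRECOVERABLE:
		return SKETCH_MSG_KERNEL;
	default:
		return SKETCH_MSG_GET_HWPOISON;
	}
}
```

A consumer such as a panic option would then only need to look at the MF_MSG_KERNEL outcome, which is the point of separating it from the transient cases.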
The MF_MSG_KERNEL label and tracepoint string are kept as
"reserved kernel page" to avoid breaking userspace tools that match
on those literals; the enum value still adequately tags the failure
even though it now also covers slab, vmalloc, page tables and kernel
stack pages.
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
mm/memory-failure.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index bae883df3ccb2..4b3a5d4190a07 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2410,7 +2410,8 @@ int memory_failure(unsigned long pfn, int flags)
* that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
*/
res = get_hwpoison_page(p, flags);
- if (!res) {
+ switch (res) {
+ case 0:
if (is_free_buddy_page(p)) {
if (take_page_off_buddy(p)) {
page_ref_inc(p);
@@ -2429,7 +2430,19 @@ int memory_failure(unsigned long pfn, int flags)
res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
}
goto unlock_mutex;
- } else if (res < 0) {
+ case 1:
+ /* Got a refcount on a handlable page. */
+ break;
+ case -ENOTRECOVERABLE:
+ /*
+ * Stable unhandlable kernel-owned page (PG_reserved,
+ * slab, vmalloc, page tables, kernel stacks, ...).
+ * No recovery possible.
+ */
+ res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+ goto unlock_mutex;
+ default:
+ /* Transient lifecycle race with the page allocator. */
res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
goto unlock_mutex;
}
--
2.53.0-Meta
* [PATCH v7 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
From: Breno Leitao @ 2026-05-13 15:39 UTC
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
Liam R. Howlett
Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
linux-trace-kernel, kernel-team
In-Reply-To: <20260513-ecc_panic-v7-0-be2e578e61da@debian.org>
get_any_page() collapses three different failure modes into a single
-EIO return:
* the put_page race in the !count_increased path;
* the HWPoisonHandlable() rejection that bounces out of
__get_hwpoison_page() with -EBUSY and exhausts shake_page() retries;
* the HWPoisonHandlable() rejection that goes through the
count_increased / put_page / shake_page retry loop.
The first is transient (the page is racing with the allocator). The
second can be either transient (a userspace folio briefly off LRU
during migration/compaction) or stable (slab/vmalloc/page-table/
kernel-stack pages). The third describes a stable kernel-owned page
that the count_increased=true caller already held a reference on.
Distinguish them on the return path: keep -EIO for both the put_page
race and the -EBUSY-after-retries branch (shake_page() cannot drag a
folio back from active migration, so we cannot prove the page is
permanently kernel-owned from there), keep -EBUSY for the allocation
race (unchanged), and return -ENOTRECOVERABLE only from the
count_increased-true HWPoisonHandlable() rejection that exhausts its
retries -- the caller's reference is structural evidence that the
page is owned by the kernel.
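The resulting errno policy can be summarised in a small userspace sketch. The enum get_page_failure and errno_after_patch() are hypothetical names used only for illustration; the real logic lives inline in get_any_page().

```c
#include <errno.h>

/* Hypothetical labels for the failure paths discussed in this message. */
enum get_page_failure {
	FAIL_ALLOC_RACE,		/* raced with an allocation */
	FAIL_PUT_PAGE_RACE,		/* raced with put_page, !count_increased */
	FAIL_EBUSY_RETRIES_EXHAUSTED,	/* -EBUSY persisted across shake_page() retries */
	FAIL_HELD_REF_UNHANDLABLE,	/* count_increased, HWPoisonHandlable() rejected */
};

/* Errno returned for each failure path once this patch is applied. */
int errno_after_patch(enum get_page_failure why)
{
	switch (why) {
	case FAIL_ALLOC_RACE:
		return -EBUSY;			/* unchanged */
	case FAIL_HELD_REF_UNHANDLABLE:
		return -ENOTRECOVERABLE;	/* was -EIO */
	case FAIL_PUT_PAGE_RACE:
	case FAIL_EBUSY_RETRIES_EXHAUSTED:
	default:
		return -EIO;			/* possibly transient, keep as is */
	}
}
```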
Extend the unhandlable-page pr_err() to fire for either errno and
update the get_hwpoison_page() kerneldoc.
memory_failure() still folds every negative return into
MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so
this patch is a no-op for users of memory_failure() and only changes
the errno that soft_offline_page() can propagate to its callers. A
follow-up wires the new return code through memory_failure() and
reports MF_MSG_KERNEL for the unrecoverable cases.
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
mm/memory-failure.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 49bcfbd04d213..bae883df3ccb2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1408,6 +1408,15 @@ static int get_any_page(struct page *p, unsigned long flags)
shake_page(p);
goto try_again;
}
+ /*
+ * Return -EIO rather than -ENOTRECOVERABLE: this
+ * branch is also reached for pages that are merely
+ * off-LRU transiently (e.g. a folio in the middle
+ * of migration or compaction), which shake_page()
+ * cannot drag back. The caller cannot prove the
+ * page is permanently kernel-owned from here, so
+ * keep it on the recoverable errno.
+ */
ret = -EIO;
goto out;
}
@@ -1427,10 +1436,10 @@ static int get_any_page(struct page *p, unsigned long flags)
goto try_again;
}
put_page(p);
- ret = -EIO;
+ ret = -ENOTRECOVERABLE;
}
out:
- if (ret == -EIO)
+ if (ret == -EIO || ret == -ENOTRECOVERABLE)
pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
return ret;
@@ -1487,7 +1496,10 @@ static int __get_unpoison_page(struct page *page)
* -EIO for pages on which we can not handle memory errors,
* -EBUSY when get_hwpoison_page() has raced with page lifecycle
* operations like allocation and free,
- * -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ * -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ * -ENOTRECOVERABLE for stable kernel-owned pages the handler
+ * cannot recover (PG_reserved, slab, vmalloc, page tables,
+ * kernel stacks, and similar non-LRU/non-buddy pages).
*/
static int get_hwpoison_page(struct page *p, unsigned long flags)
{
--
2.53.0-Meta
* [PATCH v7 0/6] mm/memory-failure: add panic option for unrecoverable pages
From: Breno Leitao @ 2026-05-13 15:39 UTC
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
Liam R. Howlett
Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
linux-trace-kernel, kernel-team, Lance Yang
A multi-bit ECC error on a kernel-owned page that the memory failure
handler cannot recover is currently swallowed: PG_hwpoison is set, the
event is logged, and the kernel keeps running. The corrupted memory
remains accessible to the kernel and either drives silent data
corruption or surfaces seconds-to-minutes later as an apparently
unrelated crash. In a large fleet that delayed, unattributable crash
turns into significant engineering effort to root-cause; in a kdump
configuration, by the time the crash happens the original error
context (faulting PFN, MCE/GHES record, page state) is long gone.
This series adds an opt-in sysctl,
vm.panic_on_unrecoverable_memory_failure, that converts an
unrecoverable kernel-page hwpoison event into an immediate panic with
a clean dmesg/vmcore that still contains the original failure
context. The default is disabled so existing workloads see no
change.
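For illustration, opting in from a C program could look roughly like the sketch below. It assumes the sysctl appears at the usual /proc/sys location for a vm.* knob and that writing "1" enables it; write_sysctl() is a hypothetical helper, not part of this series.

```c
#include <errno.h>
#include <stdio.h>

/* Assumed procfs path, derived from the sysctl name by the usual convention. */
#define PANIC_SYSCTL_PATH \
	"/proc/sys/vm/panic_on_unrecoverable_memory_failure"

/* Write a value to a sysctl file; returns 0 on success or a negative errno. */
int write_sysctl(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -errno;
	if (fputs(value, f) == EOF) {
		int err = errno ? errno : EIO;

		fclose(f);
		return -err;
	}
	if (fclose(f) == EOF)
		return -errno;
	return 0;
}
```

A caller running as root would use write_sysctl(PANIC_SYSCTL_PATH, "1"); on a kernel without this series the open fails and a negative errno is returned.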
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v7:
- Move the PG_reserved / unhandlable-kernel-page classification into
get_any_page() and surface it via -ENOTRECOVERABLE, per David
Hildenbrand's and Lance Yang's review of v6. This drops the
is_reserved snapshot in memory_failure() and the mf_get_page_status
enum / out-parameter introduced in v6.
- Restructure the post-call branch in memory_failure() as a switch
over the get_hwpoison_page() return code (David).
- Drop the "reserved" qualifier from the MF_MSG_KERNEL label and the
matching tracepoint string; the enum now covers both PG_reserved
pages and other unhandlable kernel pages.
- Squash the former patches 1/4 ("MF_MSG_KERNEL for reserved pages")
and 2/4 ("classify get_any_page() failures by reason") into a
single classification patch; the series is now 3 patches.
- Simplify panic_on_unrecoverable_mf() to a single return statement
(David).
- Link to v6: https://patch.msgid.link/20260511-ecc_panic-v6-0-183012ba7d4b@debian.org
Changes in v6:
- Dropped the selftest given the value was not clear
- Get the status of the failure from get_any_page()
- Small nits from different people/AIs.
- Link to v5: https://patch.msgid.link/20260424-ecc_panic-v5-0-a35f4b50425c@debian.org
Changes in v5:
- Add vm.panic_on_unrecoverable_memory_failure sysctl to panic on
unrecoverable kernel page hwpoison events (reserved pages, refcount-0
non-buddy pages, unknown state), with a recheck to avoid racing with
concurrent buddy allocations. (Miaohe)
- Distinguish reserved pages as MF_MSG_KERNEL in memory_failure(),
document the new sysctl in Documentation/admin-guide/sysctl/vm.rst,
and add a selftest verifying SIGBUS recovery on userspace pages still
works when the sysctl is enabled. (Miaohe)
- Added a selftest
- Link to v4:
https://patch.msgid.link/20260415-ecc_panic-v4-0-2d0277f8f601@debian.org
Changes in v4:
- Drop CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option.
- Split the reserved page classification (MF_MSG_KERNEL) into its own
patch, separate from the panic mechanism.
- Document why the buddy allocator TOCTOU race (between
get_hwpoison_page() and is_free_buddy_page()) cannot cause false
positives: PG_hwpoison is set beforehand and check_new_page() in the
page allocator rejects hwpoisoned pages.
- Document the narrow LRU isolation race window for MF_MSG_UNKNOWN and
its mitigation via identify_page_state()'s two-pass design.
- Explicitly document why MF_MSG_GET_HWPOISON is excluded from the
panic conditions (shared path with transient races and non-reserved
kernel memory).
- Link to v3: https://patch.msgid.link/20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org
Changes in v3:
- Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
as suggested by maintainer.
- Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
- Add documentation for the sysctl and CONFIG option.
- Add code comments documenting the panic condition design rationale and
how the retry mechanism mitigates false positives from buddy allocator
races.
- Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org
Changes in v2:
- Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
instead of MF_MSG_GET_HWPOISON.
- Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
instead of MF_MSG_GET_HWPOISON.
- Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org
To: Miaohe Lin <linmiaohe@huawei.com>
To: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Steven Rostedt <rostedt@goodmis.org>
To: Masami Hiramatsu <mhiramat@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
To: David Hildenbrand <david@kernel.org>
To: Lorenzo Stoakes <ljs@kernel.org>
To: "Liam R. Howlett" <liam@infradead.org>
To: Vlastimil Babka <vbabka@kernel.org>
To: Mike Rapoport <rppt@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
To: Michal Hocko <mhocko@suse.com>
To: Shuah Khan <shuah@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
---
Breno Leitao (6):
mm/memory-failure: drop dead error_states[] entry for reserved pages
mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
mm/memory-failure: report MF_MSG_KERNEL for unrecoverable kernel pages
mm/memory-failure: short-circuit PG_reserved before get_hwpoison_page()
mm/memory-failure: add panic option for unrecoverable pages
Documentation: document panic_on_unrecoverable_memory_failure sysctl
Documentation/admin-guide/sysctl/vm.rst | 80 +++++++++++++++++++++++++++++++
mm/memory-failure.c | 85 +++++++++++++++++++++++++--------
2 files changed, 146 insertions(+), 19 deletions(-)
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260323-ecc_panic-4e473b83087c
Best regards,
--
Breno Leitao <leitao@debian.org>
* [PATCH v7 1/6] mm/memory-failure: drop dead error_states[] entry for reserved pages
From: Breno Leitao @ 2026-05-13 15:39 UTC
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Liam R. Howlett,
Liam R. Howlett
Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest, Breno Leitao,
linux-trace-kernel, kernel-team
In-Reply-To: <20260513-ecc_panic-v7-0-be2e578e61da@debian.org>
The first entry of error_states[],
{ reserved, reserved, MF_MSG_KERNEL, me_kernel },
is unreachable. identify_page_state() has two callers, and neither
one can dispatch a PG_reserved page to me_kernel():
* memory_failure() reaches identify_page_state() only after
get_hwpoison_page() returned 1. get_any_page() reaches that
return only via __get_hwpoison_page(), which gates the refcount
on HWPoisonHandlable(). HWPoisonHandlable() rejects PG_reserved
pages, so they fail with -EBUSY/-EIO long before
identify_page_state() runs.
* try_memory_failure_hugetlb() reaches identify_page_state() on
the MF_HUGETLB_IN_USED branch, but the page is necessarily a
hugetlb folio there. The first table entry that matches a
hugetlb folio is { head, head, MF_MSG_HUGE, me_huge_page }, so
they dispatch to me_huge_page() before the (now-removed)
reserved entry would have matched, regardless of whether
PG_reserved happens to be set on the head page.
me_kernel() never executes and the entry exists only to be matched
against by code that cannot see it.
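The argument relies on the table being scanned in order, with the first entry whose (mask, res) pair matches the page flags winning. A simplified userspace model of that lookup follows; the bit positions, the three-entry table, and all names ending in _sketch are hypothetical and much smaller than the real error_states[].

```c
/* Hypothetical bit positions; the kernel derives these from enum pageflags. */
#define SK_HEAD	(1UL << 0)
#define SK_LRU	(1UL << 1)

enum state_sketch { ST_HUGE, ST_LRU, ST_UNKNOWN };

struct page_state_sketch {
	unsigned long mask;	/* which flag bits to look at */
	unsigned long res;	/* required value of those bits */
	enum state_sketch type;
};

/* Order matters: earlier entries shadow later ones. */
static const struct page_state_sketch states_sketch[] = {
	{ SK_HEAD, SK_HEAD, ST_HUGE },
	{ SK_LRU,  SK_LRU,  ST_LRU },
	{ 0,       0,       ST_UNKNOWN },	/* catch-all, always matches */
};

/* Return the type of the first table entry matching page_flags. */
enum state_sketch identify_sketch(unsigned long page_flags)
{
	const struct page_state_sketch *ps;

	/* The catch-all entry guarantees that this loop terminates. */
	for (ps = states_sketch; ; ps++)
		if ((page_flags & ps->mask) == ps->res)
			return ps->type;
}
```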
Drop the entry, the me_kernel() helper, and the now-unused
"reserved" macro. Leave the MF_MSG_KERNEL enum value in place: it
remains part of the tracepoint and pr_err() string tables, and
follow-on work to classify unrecoverable kernel pages can reuse it
without churning the user-visible enum.
No functional change.
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
mm/memory-failure.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 866c4428ac7ef..49bcfbd04d213 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -992,17 +992,6 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,
return false;
}
-/*
- * Error hit kernel page.
- * Do nothing, try to be lucky and not touch this instead. For a few cases we
- * could be more sophisticated.
- */
-static int me_kernel(struct page_state *ps, struct page *p)
-{
- unlock_page(p);
- return MF_IGNORED;
-}
-
/*
* Page in unknown state. Do nothing.
* This is a catch-all in case we fail to make sense of the page state.
@@ -1211,10 +1200,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
#define mlock (1UL << PG_mlocked)
#define lru (1UL << PG_lru)
#define head (1UL << PG_head)
-#define reserved (1UL << PG_reserved)
static struct page_state error_states[] = {
- { reserved, reserved, MF_MSG_KERNEL, me_kernel },
/*
* free pages are specially detected outside this table:
* PG_buddy pages only make a small fraction of all free pages.
@@ -1246,7 +1233,6 @@ static struct page_state error_states[] = {
#undef mlock
#undef lru
#undef head
-#undef reserved
static void update_per_node_mf_stats(unsigned long pfn,
enum mf_result result)
--
2.53.0-Meta
* Re: [RFC PATCH v3] bpf: introduce TAINT_UNSAFE_BPF for mutating helpers
From: Steven Rostedt @ 2026-05-13 15:23 UTC
To: Alexei Starovoitov
Cc: Aaron Tomlin, Jonathan Corbet, Song Liu, KP Singh, Matt Bobrowski,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eduard,
Kumar Kartikeya Dwivedi, Masami Hiramatsu, Shuah Khan, Jiri Olsa,
Martin KaFai Lau, Yonghong Song, Mathieu Desnoyers, Randy Dunlap,
neelx, sean, chjohnst, steve, mproche, nick.lange,
open list:DOCUMENTATION, LKML, bpf, linux-trace-kernel
In-Reply-To: <CAADnVQL_sWznA+JJLdzP_ZdUgQeO7p-AGnOtx9=fXjH+PnRJBA@mail.gmail.com>
On Wed, 13 May 2026 08:16:07 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> It's impossible to track all modifications.
> See what sched-ext is doing.
> What does it modify? Everything.
What about just having a list of what BPF programs are loaded, what they
may be attached to, and what kfuncs they are calling?
-- Steve
* Re: [RFC v2 0/2] add kconfirm
From: Nicolas Schier @ 2026-05-13 15:22 UTC
To: Julian Braha
Cc: nathan, jani.nikula, akpm, gary, ljs, arnd, gregkh, masahiroy,
ojeda, corbet, qingfang.deng, linux-kernel, rust-for-linux,
linux-doc, linux-kbuild
In-Reply-To: <20260509203808.1142311-1-julianbraha@gmail.com>
On Sat, May 09, 2026 at 09:38:06PM +0100, Julian Braha wrote:
> Hi all,
>
> kconfirm is a tool to detect misusage of Kconfig. It detects dead code,
> constant conditions, and invalid (reverse) ranges. There are also optional
> checks to detect config options that select visible config options, and to
> check for dead links in the help texts.
>
> The full patchset (with the vendored dependencies) is available in my
> linux fork, git branch 'kconfirm_rfc2', and is based on linux v7.1-rc2:
> https://github.com/julianbraha/linux/tree/kconfirm_rfc2
Thanks! I like the idea of having a static analyser for kconfig!
I guess the github branch is expected to work out of the box, but on my arm64
system this fails with:
kconfirm$ make -j8 kconfirm
error: no matching package named `env_logger` found
location searched: crates.io index
required by package `kconfirm-lib v0.9.0 (/data/kbuild/kbuild-fixes/kconfirm/scripts/kconfirm/kconfirm-lib)`
As a reminder, you're using offline mode (--offline) which can sometimes cause surprising resolution failures, if this error is too confusing you may wish to retry without the offline flag.
make[2]: *** [Makefile:17: kconfirm] Error 101
make[1]: *** [kconfirm/Makefile:2244: kconfirm] Error 2
make: *** [Makefile:248: __sub-make] Error 2
[exit code 2]
and if 'kconfirm' does not need a .config file, you want to add 'kconfirm' to
the list of 'no-dot-config-targets' in top-level Makefile.
FTR: the 'kconfirm' and 'kconfirmclean' targets need some love: both do not
really integrate in kbuild, yet: 'kconfirm' is not working with out-of-source
builds (O=...), 'kconfirmclean' should not be required if 'make clean' is
supported correctly, and 'make mrproper' removes the whole scripts/kconfirm
tree due to the change in 'scripts/Makefile'. (Tested?)
>
> The patches sent here with the RFC include everything other than the
> vendored dependencies, including the tool's code, the documentation, and
> the makefile changes.
> Following this discussion:
> https://lore.kernel.org/all/20260405122749.4990dcb538d457769a3276e0@linux-foundation.org/
> in which Andrew brought up the possibility of moving kconfirm in-tree,
> I've prepared this RFC to do so. See also kconfirm's introduction to the
> mailing list:
> https://lore.kernel.org/all/6ec4df6d-1445-48ca-8f54-1d1a83c4716d@gmail.com/
The large number of changes has been mentioned often enough; even if all the
vendored dependencies could be dropped, I am not yet convinced that it is a
good idea to maintain kconfirm in-tree, given its project size.
IMO, we need at least someone who steps up for maintaining kconfirm and
registers in a dedicated MAINTAINERS entry. (My own rust knowledge is not good
enough for appropriate review, I can only offer some initial testing and
frequent use when it is working/integrated.)
Kind regards,
Nicolas
* Re: [RFC PATCH v3] bpf: introduce TAINT_UNSAFE_BPF for mutating helpers
From: Alexei Starovoitov @ 2026-05-13 15:16 UTC
To: Steven Rostedt
Cc: Aaron Tomlin, Jonathan Corbet, Song Liu, KP Singh, Matt Bobrowski,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eduard,
Kumar Kartikeya Dwivedi, Masami Hiramatsu, Shuah Khan, Jiri Olsa,
Martin KaFai Lau, Yonghong Song, Mathieu Desnoyers, Randy Dunlap,
neelx, sean, chjohnst, steve, mproche, nick.lange,
open list:DOCUMENTATION, LKML, bpf, linux-trace-kernel
In-Reply-To: <20260513111331.7bede512@gandalf.local.home>
On Wed, May 13, 2026 at 8:13 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sun, 3 May 2026 21:51:49 +0200
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > Nack.
> >
> > Please stop this spam.
> > We're not doing it. These helpers have been around for a long time.
> > There was no need to taint then. There is no need to taint now.
>
> Hi Alexei,
>
> I'm wondering if there's a way to see what modifications BPF programs are
> doing to the kernel? I try to make it easy to see what modifications ftrace
> has done (like the enabled_functions file), because I like to know how my
> kernel is modified since boot up.
It's impossible to track all modifications.
See what sched-ext is doing.
What does it modify? Everything.
* Re: [RFC PATCH v3] bpf: introduce TAINT_UNSAFE_BPF for mutating helpers
From: Steven Rostedt @ 2026-05-13 15:13 UTC
To: Alexei Starovoitov
Cc: Aaron Tomlin, Jonathan Corbet, Song Liu, KP Singh, Matt Bobrowski,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eduard,
Kumar Kartikeya Dwivedi, Masami Hiramatsu, Shuah Khan, Jiri Olsa,
Martin KaFai Lau, Yonghong Song, Mathieu Desnoyers, Randy Dunlap,
neelx, sean, chjohnst, steve, mproche, nick.lange,
open list:DOCUMENTATION, LKML, bpf, linux-trace-kernel
In-Reply-To: <CAADnVQJ5fatNF4auH+a8E39zWMfja3rm4BM_xGcTnLX8uuCQ9Q@mail.gmail.com>
On Sun, 3 May 2026 21:51:49 +0200
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> Nack.
>
> Please stop this spam.
> We're not doing it. These helpers have been around for a long time.
> There was no need to taint then. There is no need to taint now.
Hi Alexei,
I'm wondering if there's a way to see what modifications BPF programs are
doing to the kernel? I try to make it easy to see what modifications ftrace
has done (like the enabled_functions file), because I like to know how my
kernel is modified since boot up.
Thus, it would be nice to know if BPF is modifying anything in user space
or just what BPF programs are loaded.
Note, I'm agnostic to this change, it just brought up a previous concern of
mine when I read it.
Thanks,
-- Steve
* [PATCH v9 23/23] x86/virt/tdx: Document TDX module update
From: Chao Gao @ 2026-05-13 15:10 UTC
To: kvm, linux-coco, linux-kernel, linux-doc
Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin,
Jonathan Corbet, Shuah Khan
In-Reply-To: <20260513151045.1420990-1-chao.gao@intel.com>
Document TDX module update as a subsection of "TDX Host Kernel Support" to
provide background information and cover key points that developers and
users may need to know, for example:
- update is done in stop_machine() context
- update instructions and results
- update policy and tooling
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
Documentation/arch/x86/tdx.rst | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/Documentation/arch/x86/tdx.rst b/Documentation/arch/x86/tdx.rst
index 1a3b5bac1021..9d2b7db166b5 100644
--- a/Documentation/arch/x86/tdx.rst
+++ b/Documentation/arch/x86/tdx.rst
@@ -73,6 +73,40 @@ initialize::
[..] virt/tdx: TDX-Module initialization failed ...
+TDX module Runtime Update
+-------------------------
+
+The TDX architecture includes a persistent SEAM loader (P-SEAMLDR) that
+runs in SEAM mode separately from the TDX module. The kernel can
+communicate with P-SEAMLDR to perform runtime updates of the TDX module.
+
+During updates, the TDX module becomes unresponsive to other TDX
+operations. To prevent components using TDX (such as KVM) from
+experiencing unexpected errors during updates, updates are performed in
+stop_machine() context.
+
+TDX module updates have complex compatibility requirements; the new module
+must be compatible with the current CPU, P-SEAMLDR, and running TDX module.
+Rather than implementing complex module selection and policy enforcement
+logic in the kernel, userspace is responsible for auditing and selecting
+appropriate updates.
+
+Updates use the standard firmware upload interface. See
+Documentation/driver-api/firmware/fw_upload.rst for detailed instructions.
+
+If an update fails, running TDs may be killed and further TDX operations may
+not be possible until reboot. For detailed error information, see
+Documentation/ABI/testing/sysfs-devices-faux-tdx-host.
+
+Given the risk of losing existing TDs, userspace should verify that the
+update is compatible with the current system and properly validated before
+applying it.
+
+A reference userspace tool that implements necessary checks is available
+at:
+
+ https://github.com/intel/tdx-module-binaries
+
TDX Interaction to Other Kernel Components
------------------------------------------
--
2.52.0
* [PATCH v9 00/23] Runtime TDX module update support
From: Chao Gao @ 2026-05-13 15:09 UTC
To: kvm, linux-coco, x86, linux-kernel, linux-rt-devel, linux-doc
Cc: binbin.wu, dave.hansen, djbw, ira.weiny, kai.huang, kas,
nik.borisov, paulmck, pbonzini, reinette.chatre, rick.p.edgecombe,
sagis, seanjc, tony.lindgren, vannapurve, vishal.l.verma,
yilun.xu, xiaoyao.li, yan.y.zhao, Chao Gao, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, H. Peter Anvin,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jonathan Corbet, Shuah Khan
Hi Dave,
Thanks for your thorough review of v8. This v9 addresses the issues you
pointed out. In particular, it adopts the new tdx_blob format you
suggested, removes module version printing during updates, and reworks
the do-while loop in the update flow to improve readability. It also
adds the two cleanup patches you suggested as patches 1 and 2.
Please take a look at this new version. I hope it can still be merged
for 7.2.
---
(For transparency, note that I used AI tools to help proofread this
cover-letter and commit messages)
This series adds support for runtime TDX module updates that preserve
running TDX guests. It is also available at:
https://github.com/gaochaointel/linux-dev/commits/tdx-module-updates-v9/
== Background ==
Intel TDX isolates Trusted Domains (TDs), or confidential guests, from the
host. A key component of Intel TDX is the TDX module, which enforces
security policies to protect the memory and CPU states of TDs from the
host. However, the TDX module is software that requires updates.
== Problems ==
Currently, the TDX module is loaded by the BIOS at boot time, and the only
way to update it is through a reboot, which results in significant system
downtime. Users expect the TDX module to be updatable at runtime without
disrupting TDX guests.
== Solution ==
On TDX platforms, P-SEAMLDR[1] is a component within the protected SEAM
range. It is loaded by the BIOS and provides the host with functions to
install a TDX module at runtime.
This series implements runtime TDX module updates through the fw_upload
mechanism. That interface is a good fit because TDX module selection is not
a simple "load a known file from disk" problem. The update image to load
depends on module versioning and compatibility rules. fw_upload lets userspace
choose the module explicitly while the kernel provides the update
mechanism.
This design intentionally keeps most update validation/policy in userspace.
The kernel exposes the information userspace needs, such as TDX module
version and P-SEAMLDR information, but userspace is responsible for
understanding the TDX module's versioning and compatibility rules and for
choosing an appropriate update image (see "TDX module versioning" below).
The kernel still enforces the pieces that must be handled in-kernel:
1. Validate the tdx_blob header fields that are not passed through to the
TDX module. This is standard defensive ABI checking: overflow and reserved bits.
2. Make sure no non-update SEAMCALLs are called during the update.
3. Make sure SEAMCALLs are on the right CPU, for any the user has made
available to the kernel.
4. Handle the race between updates and concurrent TD builds by
returning -EBUSY to userspace.
Everything else remains a userspace responsibility.
In the unlikely event the update fails, for example userspace picks an
incompatible update image, or the image is otherwise corrupted, all TDs
will experience SEAMCALL failures and be killed. The recovery of TD
operation from that event requires a reboot.
Given there is no mechanism to quiesce SEAMCALLs, the TDs themselves must
pause execution during an update. The most straightforward way to meet the
'pause TDs while update executes' constraint is to run the update in
stop_machine() context. All other evaluated solutions export more
complexity to KVM, or export more fragility to userspace.
== How to test this series ==
NOTE: This v9 uses a new tdx_blob format. The scripts and module blobs in
https://github.com/intel/tdx-module-binaries have not yet been updated
to match this version. Those updates will be done separately later.
== Other information relevant to Runtime TDX module updates ==
=== TDX module versioning ===
Each TDX module is assigned a version number x.y.z, where x represents the
"major" version, y the "minor" version, and z the "update" version.
Runtime TDX module updates are restricted to Z-stream releases.
Note that Z-stream releases do not necessarily guarantee compatibility. A
new release may not be compatible with all previous versions. To address this,
Intel provides a separate file containing compatibility information, which
specifies the minimum module version required for a particular update. This
information is referenced by the tool to determine if two modules are
compatible.
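A userspace tool's check along these lines might look like the sketch below. The struct, the function names, and the min_required parameter are hypothetical; the actual layout of Intel's compatibility data is not described here, and the sketch makes no claim about whether downgrades are permitted.

```c
#include <stdbool.h>

/* x.y.z as described above: major, minor, update. */
struct tdx_ver_sketch {
	unsigned int x, y, z;
};

/* Three-way comparison: negative, zero, or positive like strcmp(). */
static int ver_cmp_sketch(struct tdx_ver_sketch a, struct tdx_ver_sketch b)
{
	if (a.x != b.x)
		return a.x < b.x ? -1 : 1;
	if (a.y != b.y)
		return a.y < b.y ? -1 : 1;
	if (a.z != b.z)
		return a.z < b.z ? -1 : 1;
	return 0;
}

/*
 * An update is acceptable only if it stays within the same x.y stream and
 * the running module is at least the minimum version the update requires.
 */
bool zstream_update_allowed(struct tdx_ver_sketch cur,
			    struct tdx_ver_sketch next,
			    struct tdx_ver_sketch min_required)
{
	if (cur.x != next.x || cur.y != next.y)
		return false;
	return ver_cmp_sketch(cur, min_required) >= 0;
}
```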
=== TCB Stability ===
Updates change the TCB as viewed by attestation reports. In TDX, there is
a distinction between the launch-time version and the current version:
runtime TDX module updates cause the latter to change, subject to
Z-stream constraints.
The concern that a malicious host may attack confidential VMs by loading
insecure updates was addressed by Alex in [3]. Similarly, the scenario
where some "theoretical paranoid tenant" in the cloud wants to audit
updates and stop trusting the host after updates until audit completion
was also addressed in [4]. Users not in the cloud control the host machine
and can manage updates themselves, so they don't have these concerns.
See more about the implications of current TCB version changes in
attestation as summarized by Dave in [5].
=== TDX module Distribution Model ===
At a high level, Intel publishes all TDX modules on GitHub [2], along
with a mapping_file.json which documents the compatibility information
about each TDX module and a userspace tool to install the TDX module. OS
vendors can package these modules and distribute them. Administrators
install the package and use the tool to select the appropriate TDX module
and install it via the interfaces exposed by this series.
[1]: https://cdrdv2.intel.com/v1/dl/getContent/733584
[2]: https://github.com/intel/tdx-module-binaries
[3]: https://lore.kernel.org/all/665c5ae0-4b7c-4852-8995-255adf7b3a2f@amazon.com/
[4]: https://lore.kernel.org/all/5d1da767-491b-4077-b472-2cc3d73246d6@amazon.com/
[5]: https://lore.kernel.org/all/94d6047e-3b7c-4bc1-819c-85c16ff85abf@intel.com/
Chao Gao (22):
x86/virt/tdx: Consolidate TDX global initialization states
x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h
coco/tdx-host: Introduce a "tdx_host" device
coco/tdx-host: Expose TDX module version
x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs
x86/virt/seamldr: Add a helper to retrieve P-SEAMLDR information
coco/tdx-host: Expose P-SEAMLDR information via sysfs
coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum
coco/tdx-host: Implement firmware upload sysfs ABI for TDX module
updates
x86/virt/seamldr: Allocate and populate a module update request
x86/virt/seamldr: Introduce skeleton for TDX module updates
x86/virt/seamldr: Abort updates after a failed step
x86/virt/seamldr: Shut down the current TDX module
x86/virt/tdx: Reset software states during TDX module shutdown
x86/virt/seamldr: Install a new TDX module
x86/virt/seamldr: Do TDX per-CPU initialization after module
installation
x86/virt/tdx: Restore TDX module state
x86/virt/tdx: Refresh TDX module version after update
x86/virt/tdx: Reject updates during compatibility-sensitive operations
x86/virt/tdx: Enable TDX module runtime updates
coco/tdx-host: Document TDX module update compatibility criteria
x86/virt/tdx: Document TDX module update
Kai Huang (1):
x86/virt/tdx: Move low level SEAMCALL helpers out of <asm/tdx.h>
.../ABI/testing/sysfs-devices-faux-tdx-host | 68 ++++
Documentation/arch/x86/tdx.rst | 34 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/seamldr.h | 37 +++
arch/x86/include/asm/tdx.h | 67 ++--
arch/x86/include/asm/tdx_global_metadata.h | 4 +
arch/x86/include/asm/vmx.h | 1 +
arch/x86/virt/vmx/tdx/Makefile | 2 +-
arch/x86/virt/vmx/tdx/seamcall_internal.h | 109 +++++++
arch/x86/virt/vmx/tdx/seamldr.c | 306 ++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.c | 162 ++++++----
arch/x86/virt/vmx/tdx/tdx.h | 8 +-
arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 17 +-
drivers/virt/coco/Kconfig | 2 +
drivers/virt/coco/Makefile | 1 +
drivers/virt/coco/tdx-host/Kconfig | 12 +
drivers/virt/coco/tdx-host/Makefile | 1 +
drivers/virt/coco/tdx-host/tdx-host.c | 221 +++++++++++++
18 files changed, 940 insertions(+), 113 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-devices-faux-tdx-host
create mode 100644 arch/x86/include/asm/seamldr.h
create mode 100644 arch/x86/virt/vmx/tdx/seamcall_internal.h
create mode 100644 arch/x86/virt/vmx/tdx/seamldr.c
create mode 100644 drivers/virt/coco/tdx-host/Kconfig
create mode 100644 drivers/virt/coco/tdx-host/Makefile
create mode 100644 drivers/virt/coco/tdx-host/tdx-host.c
base-commit: 5209e5bfe5cab593476c3e7754e42c5e47ce36de
--
2.52.0
* Re: [PATCH RFC v4 01/10] dt-bindings: iio: frequency: add ad9910
From: Rodrigo Alencar @ 2026-05-13 15:09 UTC (permalink / raw)
To: Jonathan Cameron, Rodrigo Alencar via B4 Relay
Cc: rodrigo.alencar, linux-iio, devicetree, linux-kernel, linux-doc,
linux-hardening, Lars-Peter Clausen, Michael Hennerich,
David Lechner, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
Kees Cook, Gustavo A. R. Silva
In-Reply-To: <20260512193129.777d62a8@jic23-huawei>
On 26/05/12 07:31PM, Jonathan Cameron wrote:
> On Fri, 08 May 2026 18:00:17 +0100
> Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
>
> > From: Rodrigo Alencar <rodrigo.alencar@analog.com>
> >
> > DT-bindings for AD9910, a 1 GSPS DDS with 14-bit DAC. It includes
> > configurations for clocks, DAC current, reset and basic GPIO control.
>
> I think this is getting close enough now that for next version you should
> drop the RFC (which is probably gating DT binding folk giving it
> a detailed review!)
>
> >
> > Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
>
> > +
> > + adi,dac-output-current-microamp:
> > + minimum: 8640
> > + maximum: 31590
> > + default: 20070
> > + description:
> > + DAC full-scale output current in microamps.
> > +
> Can we use generic dac.yaml defined output-range-microamp? The base will be 0 always but
> that shouldn't matter.
>
Would that be fine even if we do not have those child channel nodes in the device-tree node?
--
Kind regards,
Rodrigo Alencar
* Re: [PATCH v6 2/4] mm/memory-failure: classify get_any_page() failures by reason
From: Breno Leitao @ 2026-05-13 15:07 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
Shuah Khan, Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Liam R. Howlett, linux-mm,
linux-kernel, linux-doc, linux-kselftest, linux-trace-kernel,
kernel-team, Lance Yang
In-Reply-To: <3e5e8fb8-6957-46b7-9777-0ed1bff1d0fb@kernel.org>
On Wed, May 13, 2026 at 01:48:11PM +0200, David Hildenbrand (Arm) wrote:
> > @@ -1441,10 +1456,10 @@ static int get_any_page(struct page *p, unsigned long flags)
> > goto try_again;
> > }
> > put_page(p);
> > - ret = -EIO;
> > + ret = -ENOTRECOVERABLE;
> > }
> > out:
> > - if (ret == -EIO)
> > + if (ret == -EIO || ret == -ENOTRECOVERABLE)
> > pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
> >
> > return ret;
> > @@ -2431,6 +2448,9 @@ int memory_failure(unsigned long pfn, int flags)
> > res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> > }
> > goto unlock_mutex;
> > + } else if (res == -ENOTRECOVERABLE) {
> > + res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> > + goto unlock_mutex;
> > } else if (res < 0) {
> > res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
> > goto unlock_mutex;
>
> That might probably read nicer as
>
> switch (res) {
> case 0: ...
> case 1: ...
> case -ENOTRECOVERABLE: ...
> case ...
> default:
> }
>
> >
> >
> > If that is what you are suggesting, maybe we can create another
> > MF_MSG_RESERVED, and another return value for get_any_page() to track
> > the reserved pages?
>
> I guess "reserved" is really just like most other kernel pages. So I wouldn't
> special-case them here.
>
> Or would there be a good reason?
Not really, treating them as MF_MSG_KERNEL is sufficient for my use
case.
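For reference, a minimal userspace sketch of the switch-style classification suggested above. Only MF_MSG_KERNEL and MF_MSG_GET_HWPOISON are names from the patch; the enum itself, the SKETCH_* values and classify() are hypothetical stand-ins, and the real memory_failure() does more work in each case.

```c
#include <errno.h>

/* Stand-in for the kernel's message types; SKETCH_* values are hypothetical. */
enum mf_msg {
	SKETCH_FREE_OR_HIGH_ORDER,	/* res == 0: free-buddy / high-order handling */
	SKETCH_CONTINUE,		/* res == 1: got a reference, keep going */
	MF_MSG_KERNEL,
	MF_MSG_GET_HWPOISON,
};

/* Map the get_hwpoison_page()-style return code to a message type. */
static enum mf_msg classify(int res)
{
	switch (res) {
	case 0:
		return SKETCH_FREE_OR_HIGH_ORDER;
	case 1:
		return SKETCH_CONTINUE;
	case -ENOTRECOVERABLE:
		/* Stable, unhandlable kernel page. */
		return MF_MSG_KERNEL;
	default:
		/* Assumed to be any other negative return, e.g. -EIO or -EBUSY. */
		return MF_MSG_GET_HWPOISON;
	}
}
```

With this shape, adding another distinguishable failure later would only need one more case label instead of another else-if branch.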
Thank you for the review. I'm digesting all the feedback and will send
out a new revision shortly, where we can continue the discussion.
--breno
* Re: [PATCH v6 2/8] docs/zh_CN: Add acm.rst translation
From: Alex Shi @ 2026-05-13 15:03 UTC (permalink / raw)
To: Kefan Bai, linux-usb, si.yanteng
Cc: gregkh, alexs, dzm91, corbet, skhan, linux-doc, doubled
In-Reply-To: <0ab199e9eafc0f7e312008063059aec4af0c65bc.1778415392.git.baikefan@leap-io-kernel.com>
On 2026/5/10 15:53, Kefan Bai wrote:
> Translate .../usb/acm.rst into Chinese
>
> Update the translation through commit ecefae6db042
> ("docs: usb: rename files to .rst and add them to drivers-api")
>
> Reviewed-by: Yanteng Si<siyanteng@cqsoftware.com.cn>
> Signed-off-by: Kefan Bai<baikefan@leap-io-kernel.com>
> ---
> Documentation/translations/zh_CN/usb/acm.rst | 136 ++++++++++++++++++
> .../translations/zh_CN/usb/index.rst | 2 +-
> 2 files changed, 137 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/translations/zh_CN/usb/acm.rst
>
> diff --git a/Documentation/translations/zh_CN/usb/acm.rst b/Documentation/translations/zh_CN/usb/acm.rst
> new file mode 100644
> index 000000000000..25ec83afd25f
> --- /dev/null
> +++ b/Documentation/translations/zh_CN/usb/acm.rst
> @@ -0,0 +1,136 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: ../disclaimer-zh_CN.rst
> +
> +:Original: Documentation/usb/acm.rst
> +
> +:翻译:
> +
> + 白钶凡 Kefan Bai<baikefan@leap-io-kernel.com>
> +
> +:校译:
> +
> +
> +=======================
> +Linux ACM 驱动 v0.16
> +=======================
> +
> +版权所有 (c) 1999 Vojtech Pavlik<vojtech@suse.cz>
> +
> +由 SuSE 赞助
> +
> +0. 免责声明
> +~~~~~~~~~~~~~
> +本程序是自由软件;你可以在自由软件基金会发布的 GNU 通用公共许可证第 2 版,
> +或者(按你的选择)任何后续版本的条款下重新发布和/或修改它。
> +
> +发布本程序是希望它能发挥作用,但它不附带任何担保;甚至不包括对适销性
> +或特定用途适用性的默示担保。详情见 GNU 通用公共许可证。
> +
Hi Kefan,
Please align the lines above to make them look a bit cleaner. They don't
need to be aligned perfectly, but the current formatting could be
improved. Please apply this same standard to all patches.
Thanks
Alex
> +你应该已经随本程序收到了 GNU 通用公共许可证的副本;
> +如果没有,请致信:Free Software Foundation, Inc., 59
> +Temple Place, Suite 330, Boston, MA 02111-1307 USA。
> +
> +如需联系作者,可发送电子邮件至vojtech@suse.cz,
> +或邮寄至:
> +Vojtech Pavlik, Ucitelska 1576, Prague 8, 182 00, Czech Republic。
> +
> +为方便起见,软件包中已附带 GNU 通用公共许可证第 2 版:见 COPYING 文件。
> +
> +1. 使用方法
> +~~~~~~~~~~~~~
> +``drivers/usb/class/cdc-acm.c`` 驱动可用于符合 USB 通信设备类抽象控制模型
> +(USB CDC ACM)规范的 USB 调制解调器和 USB ISDN 终端适配器。
> +
> +许多调制解调器支持此驱动,以下是我所知道的一些型号:
> +
> + - 3Com OfficeConnect 56k
> + - 3Com Voice FaxModem Pro
> + - 3Com Sportster
> + - MultiTech MultiModem 56k
> + - Zoom 2986L FaxModem
> + - Compaq 56k FaxModem
> + - ELSA Microlink 56k
> +
> +我知道有一款 ISDN 终端适配器可以与 ACM 驱动一起使用:
> +
> + - 3Com USR ISDN Pro TA
> +
> +一些手机也可以通过 USB 连接。我知道以下机型可以正常工作:
> +
> + - SonyEricsson K800i
> +
> +遗憾的是,许多调制解调器和大多数 ISDN TA 都使用专有接口,
> +因此无法与此驱动配合工作。购买前请先确认设备是否符合 ACM 规范。
> +
> +要使用这些调制解调器,需要加载以下模块::
> +
> + usbcore.ko
> + uhci-hcd.ko ohci-hcd.ko or ehci-hcd.ko
> + cdc-acm.ko
> +
> +之后就应该可以访问这些调制解调器了。
> +应当可以使用 ``minicom``、``ppp`` 和 ``mgetty`` 与它们通信。
> +
> +2. 验证驱动是否正常工作
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +第一步是检查 ``/sys/kernel/debug/usb/devices``,其内容应该类似如下::
> +
> + T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
> + B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
> + D: Ver= 1.00 Cls=09(hub ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
> + P: Vendor=0000 ProdID=0000 Rev= 0.00
> + S: Product=USB UHCI Root Hub
> + S: SerialNumber=6800
> +C:* #Ifs= 1 Cfg#= 1 Atr=40 MxPwr= 0mA
> + I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
> + E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=255ms
> + T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
> + D: Ver= 1.00 Cls=02(comm.) Sub=00 Prot=00 MxPS= 8 #Cfgs= 2
> + P: Vendor=04c1 ProdID=008f Rev= 2.07
> + S: Manufacturer=3Com Inc.
> + S: Product=3Com U.S. Robotics Pro ISDN TA
> + S: SerialNumber=UFT53A49BVT7
> + C: #Ifs= 1 Cfg#= 1 Atr=60 MxPwr= 0mA
> + I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=acm
> + E: Ad=85(I) Atr=02(Bulk) MxPS= 64 Ivl= 0ms
> + E: Ad=04(O) Atr=02(Bulk) MxPS= 64 Ivl= 0ms
> + E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=128ms
> +C:* #Ifs= 2 Cfg#= 2 Atr=60 MxPwr= 0mA
> + I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=acm
> + E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=128ms
> + I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=acm
> + E: Ad=85(I) Atr=02(Bulk) MxPS= 64 Ivl= 0ms
> + E: Ad=04(O) Atr=02(Bulk) MxPS= 64 Ivl= 0ms
> +
> +这三行的存在很关键(以及 ``Cls=`` 字段里出现的 ``comm`` 和 ``data`` 类);
> +它说明这是一个 ACM 设备。``Driver=acm`` 表示该设备正在使用 acm 驱动。
> +如果只看到 ``Cls=ff(vend.)``,那就无能为力了:这说明你手上的设备使用的是
> +厂商专有接口::
> +
> + D: Ver= 1.00 Cls=02(comm.) Sub=00 Prot=00 MxPS= 8 #Cfgs= 2
> + I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=01 Driver=acm
> + I: If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=acm
> +
> +在系统日志中应该可以看到::
> +
> + usb.c: USB new device connect, assigned device number 2
> + usb.c: kmalloc IF c7691fa0, numif 1
> + usb.c: kmalloc IF c7b5f3e0, numif 2
> + usb.c: skipped 4 class/vendor specific interface descriptors
> + usb.c: new device strings: Mfr=1, Product=2, SerialNumber=3
> + usb.c: USB device number 2 default language ID 0x409
> + Manufacturer: 3Com Inc.
> + Product: 3Com U.S. Robotics Pro ISDN TA
> + SerialNumber: UFT53A49BVT7
> + acm.c: probing config 1
> + acm.c: probing config 2
> + ttyACM0: USB ACM device
> + acm.c: acm_control_msg: rq: 0x22 val: 0x0 len: 0x0 result: 0
> + acm.c: acm_control_msg: rq: 0x20 val: 0x0 len: 0x7 result: 7
> + usb.c: acm driver claimed interface c7b5f3e0
> + usb.c: acm driver claimed interface c7b5f3f8
> + usb.c: acm driver claimed interface c7691fa0
> +
> +如果以上都正常,请启动 ``minicom``,把它配置为连接 ``ttyACM`` 设备,
> +然后尝试输入 ``at``。如果返回 ``OK``,说明一切工作正常。
> diff --git a/Documentation/translations/zh_CN/usb/index.rst b/Documentation/translations/zh_CN/usb/index.rst
> index 7cfe99a4dc0a..449e8ac2dff0 100644
> --- a/Documentation/translations/zh_CN/usb/index.rst
> +++ b/Documentation/translations/zh_CN/usb/index.rst
> @@ -17,10 +17,10 @@ USB 支持
> .. toctree::
> :maxdepth: 1
>
> + acm
>
> Todolist:
>
> -* acm
> * authorization
> * chipidea
> * dwc3
> --
> 2.54.0
>
* Re: [PATCH 08/12] swap,iomap: simplify iomap_swapfile_iter
From: Darrick J. Wong @ 2026-05-13 14:59 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andrew Morton, Chris Li, Kairui Song, Christian Brauner,
Jens Axboe, David Sterba, Theodore Ts'o, Jaegeuk Kim, Chao Yu,
Trond Myklebust, Anna Schumaker, Namjae Jeon, Hyunchul Lee,
Steve French, Paulo Alcantara, Carlos Maiolino, Damien Le Moal,
Naohiro Aota, linux-xfs, linux-fsdevel, linux-doc, linux-mm,
linux-block, linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs,
linux-cifs
In-Reply-To: <20260513065608.GA2250@lst.de>
On Wed, May 13, 2026 at 08:56:08AM +0200, Christoph Hellwig wrote:
> On Tue, May 12, 2026 at 10:02:04AM -0700, Darrick J. Wong wrote:
> > OH. Now I remember why -- it's to handle contiguous mixed mappings
> > better.
> >
> > Let's say that you have a 1k fsblock filesystem and 4k base pages. You
> > fallocate an 8G swap file and then mkswap it. The first mapping is a 1k
> > written mapping at offset 0 for the swap header, followed by an 8388607k
> > unwritten mapping at offset 1k.
> >
> > The PAGE_SIZE rounding code in iomap_swapfile_add_extent will round the
> > end of that first mapping down to zero and ignore it. The second
> > mapping will be treated as if it were a 8388604k mapping starting at
> > offset 4096. Now the page counts are wrong and the swapon fails.
>
> Do we care about this use case? I guess you did as you implemented
> this, but still?
We do, because mkswap -F uses fallocate nowadays:
$ mkswap -s 4194304 -F a
Setting up swapspace version 1, size = 4 MiB (4190208 bytes)
no label, UUID=bc9746bf-e200-4944-927c-80d83872f1cb
$ filefrag -v a
Filesystem type is: 58465342
File size of a is 4194304 (1024 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 0: 411383552.. 411383552: 1:
1: 1.. 1023: 411383553.. 411384575: 1023: last,unwritten,eof
a: 1 extent found
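To make the arithmetic from the 1k-fsblock example concrete, here is a small userspace sketch of per-mapping PAGE_SIZE rounding. It is not the iomap code: struct mapping, whole_pages() and SKETCH_PAGE_SIZE are hypothetical, and it assumes the unwritten mapping starts right after the 1k header block.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed base page size */

/* A file mapping described by byte offset and byte length. */
struct mapping {
	uint64_t offset;
	uint64_t length;
};

/* Count the pages that lie entirely inside [offset, offset + length). */
static uint64_t whole_pages(struct mapping m)
{
	/* Round the start up and the end down to a page boundary. */
	uint64_t start = (m.offset + SKETCH_PAGE_SIZE - 1) /
			 SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;
	uint64_t end = (m.offset + m.length) /
		       SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;

	return end > start ? (end - start) / SKETCH_PAGE_SIZE : 0;
}
```

Rounded on its own, the 1k header mapping contributes 0 pages and the 8388607k mapping contributes 2097151 pages, one short of the 2097152 pages in an 8G file. That missing page is the mismatch described above.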
> > A more generic solution to this would be to change add_swap_extent to
> > take sector_t addr and length values and use them to construct a bitmap
> > representing contiguous physical space on the bdev, accounting of course
> > for PAGE_SIZE alignment. Except for the swap header page, every other
> > contiguously set page-aligned region in the bitmap gets added to the
> > swap extent map.
>
> You don't even need a bitmap, just do basically the same checks as
> the iomap code when moving to a new swap extent after moving to use
> the sector_t. And it really should anyway, as the current abuse of
> sector_t to store a disk offset in PAGE_SIZE units is pretty gross.
Oh, I meant this to handle the particularly gross case where the fsblock
size is smaller than a base page, but there are a very large number of
file mappings that point to a physically contiguous extent but are not
in logical order:
{.offset=0, .length=1k, .addr=7},
{.offset=1, .length=1k, .addr=6},
{.offset=2, .length=1k, .addr=5},
{.offset=3, .length=1k, .addr=4},
{.offset=4, .length=1k, .addr=3},
{.offset=5, .length=1k, .addr=2},
{.offset=6, .length=1k, .addr=1},
{.offset=7, .length=1k, .addr=0},
That's two pages of swapfile, but with the current layout accumulation
code we "cannot" find either.
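A rough userspace sketch of the bitmap idea for this case, assuming 1k fsblocks, 4k pages and a tiny 16-block device. All names here are hypothetical and this is not proposed kernel code. It only shows that marking physical blocks first, and looking for page-aligned runs afterwards, finds both pages regardless of logical order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_PAGE	4	/* 1k fsblocks, 4k pages */
#define NR_BLOCKS	16	/* tiny hypothetical device */

/* One 1k mapping; offset and addr are both in 1k units. */
struct map1k {
	unsigned int offset;
	unsigned int addr;
};

/* Mark every mapped physical block, ignoring the logical order. */
static void mark_blocks(bool used[NR_BLOCKS], const struct map1k *m, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(m[i].addr < NR_BLOCKS);
		used[m[i].addr] = true;
	}
}

/* Count page-aligned physical pages whose blocks are all marked. */
static unsigned int count_full_pages(const bool used[NR_BLOCKS])
{
	unsigned int pages = 0;

	for (unsigned int p = 0; p < NR_BLOCKS / BLOCKS_PER_PAGE; p++) {
		bool full = true;

		for (unsigned int b = 0; b < BLOCKS_PER_PAGE; b++)
			full = full && used[p * BLOCKS_PER_PAGE + b];
		if (full)
			pages++;
	}
	return pages;
}
```

Feeding the eight reversed mappings listed above into mark_blocks() leaves blocks 0..7 set, so count_full_pages() reports two pages.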
--D
* Re: [PATCH v5 0/2] AMD Promontory 21 xHCI temperature sensor support
From: Mario Limonciello @ 2026-05-13 14:48 UTC (permalink / raw)
To: Jihong Min, Greg Kroah-Hartman, Mathias Nyman
Cc: Guenter Roeck, Jonathan Corbet, Shuah Khan, Basavaraj Natikar,
linux-usb, linux-hwmon, linux-doc, linux-pci, linux-kernel
In-Reply-To: <20260512213910.871859-1-hurryman2212@gmail.com>
On 5/12/26 16:39, Jihong Min wrote:
> Hi,
>
> This series adds temperature monitoring for AMD Promontory 21 (PROM21)
> xHCI PCI functions.
>
> Patch 1 adds a small PROM21-specific xHCI PCI glue driver. USB host
> operation is delegated to the common xhci-pci code, while the PROM21 glue
> publishes an auxiliary device for optional sensor support.
>
> Patch 2 adds an auxiliary-bus hwmon driver that binds to that auxiliary
> device and exposes the PROM21 xHCI temperature value as temp1_input.
>
> The hwmon driver reads the sensor through a vendor index/data register pair
> in the xHCI PCI MMIO BAR. It does not wake the parent PCI device for hwmon
> reads; if the parent is suspended, the read returns -ENODATA.
>
> Changes in v5:
> - Add support for AMD 1022:43fc PROM21 xHCI controllers and document the
> new PCI ID.
> - Make USB_XHCI_PCI_PROM21 depend on X86 and default to USB_XHCI_PCI.
> - Keep the PROM21 PCI glue built-in-only when enabled, while allowing the
> hwmon sensor driver to be built as a separate module.
> - Move PROM21 xHCI PCI device IDs to xhci-pci.h so xhci-pci.c and
> xhci-pci-prom21.c use shared definitions.
> - Pass the parent PCI device, MMIO base, and resource length to the hwmon
> driver through platform data defined in a common header, instead of
> inspecting the parent driver's drvdata from the hwmon driver.
> - Remove the private hwmon mutex and rely on hwmon core serialization for
> this driver's callbacks.
> - Clarify that the driver only serializes its own hwmon callbacks and does
> not synchronize with firmware, SMM, ACPI AML, or other possible users of
> the PROM21 vendor index/data register pair.
> - Use readb() for the temperature data register, validate the value before
> writing the output pointer, and drop the 0xff invalid-value check.
> - Use pm_runtime_put() after successful reads with the parent device active
> so the PM core can re-evaluate the parent device's idle state.
> - Simplify the documentation and use more precise terminology for the
> supported device.
>
> Jihong Min (2):
> usb: xhci-pci: add AMD Promontory 21 PCI glue
> hwmon: add AMD Promontory 21 xHCI temperature sensor support
>
> Documentation/hwmon/index.rst | 1 +
> Documentation/hwmon/prom21-xhci.rst | 101 ++++++++
> drivers/hwmon/Kconfig | 10 +
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/prom21-xhci.c | 238 ++++++++++++++++++
> drivers/usb/host/Kconfig | 20 ++
> drivers/usb/host/Makefile | 1 +
> drivers/usb/host/xhci-pci-prom21.c | 123 +++++++++
> drivers/usb/host/xhci-pci.c | 11 +
> drivers/usb/host/xhci-pci.h | 3 +
> include/linux/platform_data/usb-xhci-prom21.h | 22 ++
> 11 files changed, 531 insertions(+)
> create mode 100644 Documentation/hwmon/prom21-xhci.rst
> create mode 100644 drivers/hwmon/prom21-xhci.c
> create mode 100644 drivers/usb/host/xhci-pci-prom21.c
> create mode 100644 include/linux/platform_data/usb-xhci-prom21.h
>
Thanks for the driver. I think this looks good now, and thank you
especially for documenting the reverse engineering efforts that led to
it. If there are problems in the future, I suppose they will stem from
the calculations that use the magic values to scale numbers.
There isn't a lot that can be done in the event that BIOS is accessing
the same register pairs, but since you identified that this is exactly
how Windows HWInfo64 does it too, this is 'probably' low risk.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Re: [PATCH] crypto: af_alg - Document the deprecation of AF_ALG
From: Jeff Barnes @ 2026-05-13 14:29 UTC (permalink / raw)
To: Ignat Korchagin
Cc: Eric Biggers, Kamran Khan, Andy Lutomirski,
linux-crypto@vger.kernel.org, Herbert Xu,
linux-doc@vger.kernel.org, linux-api@vger.kernel.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
Linus Torvalds
In-Reply-To: <CAOs+rJUA+bz6Y2GKioHnFGFKX_uAP+4LaPRs=ZDgRQoUi4mWkg@mail.gmail.com>
On May 12 2026, at 5:18 pm, Ignat Korchagin <ignat@linux.win> wrote:
> On Mon, May 11, 2026 at 10:38 PM Eric Biggers <ebiggers@kernel.org> wrote:
>>
>> On Mon, May 11, 2026 at 10:03:21PM +0100, Ignat Korchagin wrote:
>> > I don't think fully discounting hardware offloading is beneficial here. HW
>> > accelerators will be produced and without a common interface vendors would
>> > start implementing their own "bespoke" drivers with bespoke userspace
>> > interfaces (we already had such proposals), which in turn may introduce more
>> > attack surface. Yes, AF_ALG needs substantial improvement, but at least it
>> > can be a standardisation point.
>>
>> That isn't the best way to accelerate symmetric crypto anymore though,
>> if it ever was. This has been known for a long time.
>>
>> > > In any case, any hypothetical security benefit provided by AF_ALG would
>> > > have to be *very high* to outweigh the continuous stream of
>> > > vulnerabilities in it. I understand that people using AF_ALG might not
>> > > be familiar with that continuous stream of vulnerabilities, but it would
>> >
>> >
>> > Is it actually that much compared to other features/subsystems, like eBPF or
>> > user namespaces? But we don't rush to deprecate those - instead trying to
>> > harden them and come up with better design.
>>
>> There are plenty of other kernel features with a large attack surface,
>> of course. But they tend to be much more useful than AF_ALG. It's all
>> about weighing benefits vs. risks.
>
> If we divide the number of CVEs in such systems by imaginary units of
> usefulness, I think the ratio is similar.
>
>> When we get to the point where a large number of Linux users *had* to
>> disable AF_ALG as an emergency vulnerability response, and at the same
>> time their systems weren't even using AF_ALG so nothing even broke and
>> they could have just done that to begin with, I think we get a very
>
> Well, there were: cryptsetup, RHEL fips check, so there are some...
cryptsetup does not have a hard dependency on AF_ALG; it is only a
potential consumer of it. AF_ALG provides a broad, hard-to-control
interface, and cryptsetup (and similar tools) are not blockers for its
removal. Removing AF_ALG does not necessarily break cryptsetup usage, and
it does improve FIPS boundary clarity.
>
>> clear idea of which side is heavier for AF_ALG in the real world.
>
> Same thing could be said for unprivileged user namespaces - distros
> even put a custom sysctl to restrict it and no-one noticed.
>
>> The main relevance of AF_ALG to the Linux community is that it allows
>> their systems to be exploited.
>
> To be clear I'm not arguing for the current AF_ALG implementation. I
> agree, the splice zero-copy is... suboptimal (to be soft) and is
> actually not-so-zero copy. But I think it was just added before we had
> more modern approaches like io_uring (have their own can of worms, but
> hey - people adopt it fast).
>
> But I advocate for the usefulness of the concept itself - kernel/OS
> providing crypto services to userspace. As mentioned in other threads,
> other operating systems have it and Linux lags behind. There are use
> cases: common interface for HW accelerators, embedded systems, which
> don't have the space to bring a userspace lib etc. Even non-technical:
> there are environments that just don't want to rely on third-party
> userspace libraries like OpenSSL purely for licensing reasons. And I
> agree, that it is hard to do it right, but we can piggy-back on other
> subsystems (such as io_uring mentioned or other ideas).
>
>> - Eric
>>
>
> Ignat
>
Jeff
* Re: [RFC v2 0/2] add kconfirm
From: Julian Braha @ 2026-05-13 14:12 UTC (permalink / raw)
To: Miguel Ojeda, Jan Engelhardt
Cc: nathan, nsc, jani.nikula, akpm, gary, ljs, arnd, gregkh,
masahiroy, ojeda, corbet, qingfang.deng, linux-kernel,
rust-for-linux, linux-doc, linux-kbuild
In-Reply-To: <851ccd3c-d86a-409e-bd73-f0ef10b85879@gmail.com>
On 5/11/26 00:06, Julian Braha wrote:
>> By the way, another option for that may be using the distribution's
>> registry (e.g. Debian and Fedora provide one through the package
>> manager).
> Unfortunately, it seems that there's no built-in way to fall back for
> other distros:
> https://github.com/rust-lang/cargo/issues/3066
>
> The workaround could be to create various Cargo config.toml files, and
> instruct users that, for example, if they want to use the debian
> packages, they can download their dependencies using:
> `cargo vendor --config debian.toml`
> But I need to test this and confirm first since I don't use any of these
> distros.
As I started testing this approach with debian, I discovered that
the parser crate, nom-kconfig, isn't available in the debian registry. I
will bring this up with the developer of that library. However, it may
take some time to be packaged and made available to users, so I will
soon submit RFC v3 using crates.io for dependency download, but outside
of make, as previously discussed.
- Julian Braha
* Re: [PATCH net-next 1/2] net: ti: icssg: Derive stats array lengths from ARRAY_SIZE
From: David CARLIER @ 2026-05-13 14:07 UTC (permalink / raw)
To: MD Danish Anwar
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Jonathan Corbet, Shuah Khan, Roger Quadros,
Andrew Lunn, Jacob Keller, Meghana Malladi, Kevin Hao,
Vadim Fedorenko, netdev, linux-doc, linux-kernel,
linux-arm-kernel, Vignesh Raghavendra
In-Reply-To: <54fae7f6-fcb9-4520-a79a-569426ab96f1@ti.com>
On Wed, 13 May 2026 at 07:29, MD Danish Anwar <danishanwar@ti.com> wrote:
>
> Hi David
>
> On 12/05/26 3:33 pm, David CARLIER wrote:
> > Hi Danish,
> >
> >
> > On Tue, 12 May 2026 at 10:40, MD Danish Anwar <danishanwar@ti.com> wrote:
> >>
> >> Hi David,
> >>
> >> On 12/05/26 1:28 pm, David CARLIER wrote:
> >>> Hi MD,
> >>>
> >>> On Tue, 12 May 2026 at 07:06, MD Danish Anwar <danishanwar@ti.com> wrote:
> >>>>
> >>>> Replace the manually maintained ICSSG_NUM_MIIG_STATS and
> >>>> ICSSG_NUM_PA_STATS constants with ARRAY_SIZE() expressions derived
> >>>> directly from the corresponding stat descriptor arrays, so that adding
> >>>> new entries to icssg_all_miig_stats[] or icssg_all_pa_stats[] no longer
> >>>> requires a separate update to a numeric constant.
> >>>>
> >>>> To make this self-contained, break the circular include dependency
> >>>> between icssg_stats.h and icssg_prueth.h:
> >>>>
> >>>> - icssg_stats.h previously included icssg_prueth.h (transitively
> >>>> pulling in icssg_switch_map.h and ETH_GSTRING_LEN). Replace that
> >>>> with direct includes of <linux/ethtool.h>, <linux/kernel.h> and
> >>>> "icssg_switch_map.h".
> >>>>
> >>>> - icssg_prueth.h now includes icssg_stats.h, giving it access to
> >>>> the ARRAY_SIZE-based ICSSG_NUM_MIIG_STATS and ICSSG_NUM_PA_STATS
> >>>> before they are used in the prueth_emac struct and ICSSG_NUM_STATS.
> >>>>
> >>>> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
> >>>> ---
> >>>> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 3 +--
> >>>> drivers/net/ethernet/ti/icssg/icssg_stats.h | 7 ++++++-
> >>>> 2 files changed, 7 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> >>>> index df93d15c5b78..e2ccecb0a0dd 100644
> >>>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> >>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
> >>>> @@ -43,6 +43,7 @@
> >>>>
> >>>> #include "icssg_config.h"
> >>>> #include "icss_iep.h"
> >>>> +#include "icssg_stats.h"
> >>>> #include "icssg_switch_map.h"
> >>>>
> >>>> #define PRUETH_MAX_MTU (2000 - ETH_HLEN - ETH_FCS_LEN)
> >>>> @@ -57,8 +58,6 @@
> >>>>
> >>>> #define ICSSG_MAX_RFLOWS 8 /* per slice */
> >>>>
> >>>> -#define ICSSG_NUM_PA_STATS 32
> >>>> -#define ICSSG_NUM_MIIG_STATS 60
> >>>> /* Number of ICSSG related stats */
> >>>> #define ICSSG_NUM_STATS (ICSSG_NUM_MIIG_STATS + ICSSG_NUM_PA_STATS)
> >>>> #define ICSSG_NUM_STANDARD_STATS 31
> >>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/net/ethernet/ti/icssg/icssg_stats.h
> >>>> index 5ec0b38e0c67..b854eb587c1e 100644
> >>>> --- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
> >>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
> >>>> @@ -8,10 +8,15 @@
> >>>> #ifndef __NET_TI_ICSSG_STATS_H
> >>>> #define __NET_TI_ICSSG_STATS_H
> >>>>
> >>>> -#include "icssg_prueth.h"
> >>>> +#include <linux/ethtool.h>
> >>>> +#include <linux/kernel.h>
> >>>> +#include "icssg_switch_map.h"
> >>>>
> >>>> #define STATS_TIME_LIMIT_1G_MS 25000 /* 25 seconds @ 1G */
> >>>>
> >>>> +#define ICSSG_NUM_MIIG_STATS ARRAY_SIZE(icssg_all_miig_stats)
> >>>> +#define ICSSG_NUM_PA_STATS ARRAY_SIZE(icssg_all_pa_stats)
> >>>> +
> >>>> struct miig_stats_regs {
> >>>> /* Rx */
> >>>> u32 rx_packets;
> >>>> --
> >>>> 2.34.1
> >>>>
> >>>
> >>> One thing that caught my eye: icssg_all_miig_stats[] and
> >>> icssg_all_pa_stats[] are 'static const' arrays in icssg_stats.h with
> >>> ETH_GSTRING_LEN name buffers per entry. Right now only icssg_stats.c
> >>> and icssg_ethtool.c pull them in. After this patch icssg_prueth.h
> >>> includes icssg_stats.h, so every .c in the driver (classifier,
> >>> common, config, mii_cfg, queues, switchdev, ...) ends up with its own
> >>> static-const copy of both tables.
> >>>
> >>> Would a static_assert() work for what you're after? Something like:
> >>>
> >>
> >> The ARRAY_SIZE() approach was explicitly requested by a maintainer [1]
> >> while I was adding more stats manually.
> >>
> >> This patch is a direct response to that feedback. static_assert() would
> >> still require updating the numeric constant on every array change. The
> >> goal here is to eliminate the need to manually increment the stats count
> >> whenever new stats are added.
> >>
> >> Your concern about multiple copies of table is noted and valid. Could
> >> you advise on the preferred way to reconcile these two requirements? I
> >> am happy to restructure if there is an approach that satisfies both.
> >>
> >> [1]
> >> https://lore.kernel.org/all/20260112181436.4s5ceywwembn674r@skbuf/#:~:text=Can%27t%20this%20be%20expressed%20as%20ARRAY_SIZE(icssg_all_pa_stats)%3F%20It%20is%20very%0Afragile%20to%20have%20to%20count%20and%20update%20this%20manually.
> >>
> >>
> >>> static const struct icssg_miig_stats icssg_all_miig_stats[] = {
> >>> ...
> >>> };
> >>> static_assert(ARRAY_SIZE(icssg_all_miig_stats) == ICSSG_NUM_MIIG_STATS);
> >>>
> >>> next to each array, keeping the numeric #defines as-is. Then 2/2 fails
> >>> to build the moment a new entry is added without bumping the count,
> >>> which is the case you're guarding against — without touching the
> >>> include graph.
> >>>
> >>> What do you think ?
> >>>
> >>> Cheers.
> >>
> >> --
> >> Thanks and Regards,
> >> Danish
> >>
> >
> >
> > Thanks for digging up the context — fair point, I'd missed Vladimir's
> > earlier ask. Reading it again though, what he calls fragile is the
> > silent miscount, not the keystroke of typing a number. A static_assert
> > turns "forgot to bump" into a build error, which I think gets you
> > there.
> >
>
> Thank you for the suggestion. I think your previous suggestion fits
> better. I believe keeping the arrays in icssg_stats.h is preferable to
> moving them to icssg_stats.c. Here is my reasoning:
>
>
> Your binary-bloat concern was about icssg_prueth.h including
> icssg_stats.h, which would drag the static const tables into every .c
> that includes icssg_prueth.h (~11 translation units). That concern is
> valid, but it is specific to the include direction of the previous
> patch. If we simply revert to the original include graph —
> icssg_stats.h includes icssg_prueth.h, not the other way around —
> only the two files that have always included icssg_stats.h directly
> (icssg_stats.c and icssg_ethtool.c) get a copy of the arrays. No
> regression in binary size compared to the baseline.
>
> > What about moving the two arrays into icssg_stats.c, declaring them
> > extern in the header, and dropping a static_assert next to each
> > definition? Numeric #defines stay where they are, icssg_prueth.h
> > doesn't need to know about icssg_stats.h, and the tables live in one
> > TU instead of every .o in the driver. If the count and the array
> > disagree, you get a compile error on the spot.
> >
>
> Moving the arrays to icssg_stats.c (approach #2) adds extern
> declarations, splits the definition from the static_assert, and is a
> larger restructuring for the same safety guarantee. Keeping the arrays
> in the header with a static_assert immediately after each one is a
> 2-line diff and leaves the code easy to read in one place.
>
> Please let me know if this sounds okay to you. I will send out a v2 soon
> if this approach is fine with you.
Sounds fine by me. Note that I am not a maintainer; I was just "chiming in" ;)
Cheers!
>
> > Probably worth keeping Vladimir on Cc for v2 in case he had something
> > else in mind.
> >
>
> I will CC Vladimir in v2.
>
> --
> Thanks and Regards,
> Danish
>
^ permalink raw reply
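For readers following the thread, the pattern agreed on above (array kept next to its numeric count, with a static_assert tying the two together) can be sketched in plain userspace C. This is only an illustration: the demo_* names, DEMO_NUM_STATS and the table contents are made up, and ARRAY_SIZE is redefined locally as a stand-in for the kernel macro.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-ins for the driver's name length and stats count. */
#define DEMO_NAME_LEN	32
#define DEMO_NUM_STATS	3

struct demo_stat {
	char name[DEMO_NAME_LEN];
	unsigned int offset;
};

static const struct demo_stat demo_all_stats[] = {
	{ "rx_packets", 0x00 },
	{ "tx_packets", 0x04 },
	{ "rx_errors",  0x08 },
};

/* The build breaks here if an entry is added without bumping the count. */
static_assert(ARRAY_SIZE(demo_all_stats) == DEMO_NUM_STATS,
	      "demo_all_stats[] and DEMO_NUM_STATS disagree");

/* Number of entries, derived from the array itself. */
static size_t demo_stat_count(void)
{
	return ARRAY_SIZE(demo_all_stats);
}
```

Adding a fourth entry to demo_all_stats[] without changing DEMO_NUM_STATS turns the silent miscount into a compile error, which is the property discussed in the thread.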
* [PATCH v2] docs/ja_JP: translate more of submitting-patches.rst (no-mime)
From: Akiyoshi Kurita @ 2026-05-13 13:11 UTC (permalink / raw)
To: linux-doc; +Cc: linux-kernel, corbet, akiyks, Akiyoshi Kurita
Translate the "No MIME, no links, no compression, no attachments.
Just plain text" and "Respond to review comments" sections in
Documentation/translations/ja_JP/process/submitting-patches.rst.
Keep the wording close to the English text and wrap lines to match
the style used in the surrounding Japanese translation.
Signed-off-by: Akiyoshi Kurita <weibu@redadmin.org>
---
v2:
- Make the subject unique.
- Reword the no-MIME section title.
- Refer to the untranslated "The canonical patch format" section by name with a TODO.
.../ja_JP/process/submitting-patches.rst | 63 +++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/Documentation/translations/ja_JP/process/submitting-patches.rst b/Documentation/translations/ja_JP/process/submitting-patches.rst
index 928e38a8d34d..165cb3ed94ec 100644
--- a/Documentation/translations/ja_JP/process/submitting-patches.rst
+++ b/Documentation/translations/ja_JP/process/submitting-patches.rst
@@ -292,3 +292,66 @@ MAINTAINERS ファイルに記載されている MAN-PAGES メンテナに
man-pages パッチ、少なくとも変更の通知を送って、情報が
マニュアルページに反映されるようにしてください。ユーザー空間 API の
変更は、linux-api@vger.kernel.org にも Cc してください。
+
+MIME・リンク・圧縮・添付なし、プレーンテキストのみ
+----------------------------------------------------
+
+Linus や他のカーネル開発者は、あなたが投稿する変更を読み、
+コメントできる必要があります。カーネル開発者が標準的な
+メールツールを使ってあなたの変更を「引用」し、コードの特定の
+箇所についてコメントできることが重要です。
+
+このため、すべてのパッチはメール本文中に ``inline`` で投稿すべきです。
+これを行う最も簡単な方法は ``git send-email`` を使うことであり、
+強く推奨されます。``git send-email`` の対話型チュートリアルは
+https://git-send-email.io で利用できます。
+
+``git send-email`` を使わないことを選ぶ場合:
+
+.. warning::
+
+ パッチをコピー&ペーストする場合は、エディタの word-wrap によって
+ パッチが壊れないよう注意してください。
+
+圧縮の有無にかかわらず、パッチを MIME 添付ファイルとして添付しては
+いけません。多くの一般的なメールアプリケーションは、MIME 添付
+ファイルを常にプレーンテキストとして送信するとは限らず、あなたの
+コードにコメントできなくなります。MIME 添付ファイルは Linus が
+処理するのにも少し余分な時間がかかるため、MIME 添付された変更が
+受け入れられる可能性を下げます。
+
+例外: メーラがパッチを壊してしまう場合は、誰かから MIME を使って
+再送するよう求められることがあります。
+
+パッチを変更せずに送信するようメールクライアントを設定するための
+ヒントについては、Documentation/process/email-clients.rst を参照してください。
+
+
+レビューコメントに返答する
+--------------------------
+
+あなたのパッチには、ほぼ確実に、パッチを改善する方法について
+レビューアからコメントが付きます。それは、あなたのメールへの返信という
+形で届きます。それらのコメントには必ず返答してください。レビューアを
+無視することは、こちらも無視されるためのよい方法です。コメントに
+答えるには、単にそのメールへ返信すれば構いません。コード変更に
+つながらないレビューコメントや質問であっても、次のレビューアが状況を
+よりよく理解できるように、ほぼ確実にコメントまたは changelog エントリに
+反映すべきです。
+
+どのような変更を行うのかをレビューアに必ず伝え、時間を割いてくれた
+ことに感謝してください。コードレビューは疲れる、時間のかかる作業であり、
+レビューアが不機嫌になることもあります。そのような場合であっても、
+丁寧に返答し、指摘された問題に対応してください。次の版を送るときは、
+cover letter または個々のパッチに ``patch changelog`` を追加し、前回の
+投稿との差分を説明してください。詳細は原文の該当節
+("The canonical patch format") を参照してください。
+
+.. TODO: Convert to file-local cross-reference when the destination is
+ translated.
+
+あなたのパッチにコメントした人には、パッチの Cc リストに追加して、
+新しい版を知らせてください。
+
+メールクライアントとメーリングリストでの作法についての推奨事項は、
+Documentation/process/email-clients.rst を参照してください。
--
2.47.3
^ permalink raw reply related
* Re: [PATCH v5 07/11] leds: flash: add support for Samsung S2M series PMIC flash LED device
From: Lee Jones @ 2026-05-13 14:00 UTC (permalink / raw)
To: Jacek Anaszewski
Cc: Kaustabh Chakraborty, Pavel Machek, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, MyungJoo Ham, Chanwoo Choi,
Sebastian Reichel, Krzysztof Kozlowski, André Draszik,
Alexandre Belloni, Jonathan Corbet, Shuah Khan, Nam Tran,
Łukasz Lebiedziński, linux-leds, devicetree,
linux-kernel, linux-pm, linux-samsung-soc, linux-rtc, linux-doc
In-Reply-To: <80d85385-f5af-44e3-b9ed-d4489542d4da@gmail.com>
On Thu, 07 May 2026, Jacek Anaszewski wrote:
> Hi Lee,
>
> On 5/7/26 6:46 PM, Lee Jones wrote:
> > On Fri, 24 Apr 2026, Kaustabh Chakraborty wrote:
> >
> > > Add support for flash LEDs in certain Samsung S2M series PMICs.
> > > The device has two channels for LEDs, typically for the back and front
> > > cameras in mobile devices. Both channels can be independently
> > > controlled, and can be operated in torch or flash modes.
> > >
> > > The driver includes initial support for the S2MU005 PMIC flash LEDs.
> > >
> > > Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
> > > ---
> > > drivers/leds/flash/Kconfig | 12 ++
> > > drivers/leds/flash/Makefile | 1 +
> > > drivers/leds/flash/leds-s2m-flash.c | 358 ++++++++++++++++++++++++++++++++++++
> > > 3 files changed, 371 insertions(+)
> > >
> > > diff --git a/drivers/leds/flash/Kconfig b/drivers/leds/flash/Kconfig
> > > index 5e08102a67841..be62e05277429 100644
> > > --- a/drivers/leds/flash/Kconfig
> > > +++ b/drivers/leds/flash/Kconfig
> > > @@ -114,6 +114,18 @@ config LEDS_RT8515
> > > To compile this driver as a module, choose M here: the module
> > > will be called leds-rt8515.
> > > +config LEDS_S2M_FLASH
> > > + tristate "Samsung S2M series PMICs flash/torch LED support"
> > > + depends on LEDS_CLASS
> > > + depends on MFD_SEC_CORE
> > > + depends on V4L2_FLASH_LED_CLASS || !V4L2_FLASH_LED_CLASS
> >
> > The `|| !V4L2_FLASH_LED_CLASS` part of this dependency makes it
> > unconditionally true. Was this intended? Perhaps this dependency can be
> > removed entirely.
> This is there for a reason: it allows building the driver if
> V4L2_FLASH_LED_CLASS is turned off, or building it as a module
> if V4L2_FLASH_LED_CLASS=m. You will get a nice explanation from
> Google AI if you type just
> "V4L2_FLASH_LED_CLASS || !V4L2_FLASH_LED_CLASS".
>
> See e.g. [0], which fixes undefined symbol error by adding this.
>
> [0] https://git.paulk.fr/projects/linux.git/commit/drivers?h=sunxi/cedrus/jpeg-nv16&id=dbeb02a0bc41b9e9b9c05e460890351efecf1352
I see. Thanks for the explanation.
--
Lee Jones
^ permalink raw reply
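The reason "X || !X" is not a no-op in Kconfig is that symbols are tristate (n < m < y), not boolean. The sketch below models the evaluation rules from the Kconfig language documentation ("!" is 2 minus the value, "||" is the maximum) in userspace C; the enum and function names are illustrative only and are not part of any kernel tool.

```c
#include <assert.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "!x": n and y swap, m stays m (2 - x). */
static enum tristate tri_not(enum tristate x)
{
	return (enum tristate)(TRI_Y - x);
}

/* Kconfig "a || b": the larger of the two values. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/*
 * Upper limit that "depends on DEP || !DEP" places on the dependent
 * symbol: y when DEP is n or y, but only m when DEP is m.
 */
static enum tristate max_allowed(enum tristate dep)
{
	return tri_or(dep, tri_not(dep));
}
```

So the dependency only bites in the DEP=m case, where it keeps the driver from being built in while the code it may call lives in a module.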
* Re: [RFC v2 0/2] add kconfirm
From: Julian Braha @ 2026-05-13 13:59 UTC (permalink / raw)
To: Demi Marie Obenour, nathan, nsc
Cc: jani.nikula, akpm, gary, ljs, arnd, gregkh, masahiroy, ojeda,
corbet, qingfang.deng, linux-kernel, rust-for-linux, linux-doc,
linux-kbuild
In-Reply-To: <8fe7c7c8-00f7-4a72-a984-e929f71bec22@gmail.com>
On 5/11/26 05:24, Demi Marie Obenour wrote:
> This adds too many dependencies.
>
> Some suggestions:
>
> - Use system libcurl instead of ureq.
> - Use libc getopt_long instead of clap.
> - Use manual FFI bindings instead of third-party crates.
> - Use the C Kconfig parser instead of a third-party library.
Hi Demi, thanks for going in-depth on the alternatives.
Unfortunately, given the amount of analysis performed on the parse tree,
even just replacing the parser amounts to a wholesale rewrite of the
tool.
I will look into the others though, just to reduce the amount of trust
that kconfirm users need to invest in the Rust ecosystem.
- Julian Braha
^ permalink raw reply
* Re: [PATCH] Documentation: KVM: Document guest-visible compatibility expectations
From: David Woodhouse @ 2026-05-13 13:57 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier
Cc: Jonathan Corbet, Shuah Khan, kvm, linux-doc, linux-kernel,
Sean Christopherson, Jim Mattson, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
Raghavendra Rao Ananta, Eric Auger, Kees Cook, Arnd Bergmann,
Nathan Chancellor, linux-arm-kernel, kvmarm, linux-kselftest
In-Reply-To: <ba08dfe9-932b-40c3-9fdf-fc891d52e1d8@redhat.com>
On Wed, 2026-05-13 at 14:43 +0200, Paolo Bonzini wrote:
> On 5/13/26 11:24, David Woodhouse wrote:
> > On Wed, 2026-05-13 at 09:42 +0100, Marc Zyngier wrote:
> > > If userspace is not a total joke, it will read all the ID registers,
> > > and configure what it wants to see, assuming it is a feature that can
> > > be configured (not everything can, because the architecture itself is
> > > not fully backward compatible).
> > >
> > > Yes, this is buggy at times, because the combinatorial explosion of
> > > CPU capabilities and supported features makes it pretty hard to test
> > > (and really nobody actually does). But overall, it works, and QEMU is
> > > growing an infrastructure to manage it in a "user friendly" way.
> >
> > Yes, that is precisely what I'm asking for. I'm prepared to deal with
> > the fact that KVM/Arm64 is not a stable and mature platform like x86
> > is, and that userspace has to find all the random changes from one
> > version to the next, and explicitly pin things down to be compatible.
> >
> > All I'm asking for is that KVM makes it *possible* to pin things down
> > to the behaviour of previously released Linux/KVM kernels.
> >
> > > But really, this isn't what David is asking. He's demanding "bug for
> > > bug" compatibility. For that, we have two possible cases:
> >
> > No, I am not asking you to meet that bar. I merely observed that x86
> > does and that it would be nice. But we are a *long* way from that.
>
> x86 doesn't do bug-for-bug compatibility, thankfully - we have quirks
> but only 11 of them, or about one per year since we started adding them.
> We only add quirks, generally speaking, when 1) we change the way file
> descriptors are initialized, 2) guests in the wild were relying on it,
> or 3) it prevents restoring state saved from an old kernel. Is there
> anything else?
>
> So you're asking something not really far from this:
>
> > > - this is a behaviour that is not allowed by the architecture: we fix
> > > it for good. We do that on every release. Some minor, some much more
> > > visible. And there is no way we will add this sort of "bring the
> > > bugs back" type of behaviours. Specially when it is really obvious
> > > that no SW can make any reasonable use of the defect. We allow
> > > userspace to keep behaving as before, but the guest will not see a
> > > non-compliant behaviour.
>
> ... where for example
> https://lore.kernel.org/kvm/e03f092dfbb7d391a6bf2797ba01e122ba080bcd.camel@infradead.org/
> is an example of a bug that "no SW can make any reasonable use of".
I actually believe that the focus on ICEBP was triggered by some weird
gaming software's anti-DRM mechanism, and that it *did* affect actual
guests in the wild?
But yeah, *fixing* it should not have any adverse effects. That's the
key.
> > Marc, this is complete nonsense and you should know better.
> > Once a behaviour is present in a released version of Linux/KVM, we
> > can't just declare it "wrong" and unilaterally impose a change in
> > guest-visible behaviour on *running* guests as a side-effect of a
> > kernel upgrade.
> >
> > The criterion for *KVM* to remain compatible is "once it has been in a
> > released version of the kernel". Not "once it is in the architecture".
>
> That is *also* obviously nonsense though, isn't it (see example above)?
> The truth is in the middle, "once it is in the architecture" is likely
> too narrow but "once it is in a Linux release" is way too broad.
How about "once it is in a Linux release and guest visible, and unless
we *know* that changing it in either direction underneath running
guests cannot cause problems".
> And besides, both miss the point of *configurability* which is the basis of
> it all.
Hm, configurability *is* the point, I thought. I'm not asking for the
*default* to remain compatible. I only ask that a VMM *can* ask KVM for
guest-visible things to remain the same as before.
> The main difference between x86 and Arm is the default state at
> creation; x86 defaults to a blank slate, mostly; and when we didn't do
> that, we regretted it later (cue the STUFF_FEATURE_MSRS quirk). It's
> too late to change the behavior for Arm, but I think we can agree that
> patches such as
> https://lore.kernel.org/kvm/20260511113558.3325004-2-dwmw2@infradead.org/
> ("KVM: arm64: vgic: Allow userspace to set IIDR revision 1") are what
> the letter and spirit of this proposal is about.
Yes. That *exact* patch.
> Marc did not mention having to deal with guests in the wild. Let's
> ignore it for now because even defining "guests in the wild" is hard;
> and anyway it's not related to the patch that triggered the discussion.
>
> So we have the third case, "restoring state saved from an old kernel".
> If this case arises, I do believe that Arm will have to deal with it and
> introduce quirks or KVM_GET/SET_REG hacks. Maybe it hasn't happened
> yet, lucky you.
We literally have those mechanisms already. That's exactly what the
revision field in the IIDR is used for:
https://developer.arm.com/documentation/111107/2026-03/External-Registers/GICD-IIDR--Distributor-Implementer-Identification-Register
See commit https://git.kernel.org/torvalds/c/49a1a2c70a7f which adds a
new guest-visible feature in revision 3, but allowed userspace to
restore the old behaviour by setting it to revision 2. (Or at least
intended to; there was a separate bug which stopped it working, which I
already fixed last week.)
All my patch above does, is make it possible to set it to revision 1 as
well. Because https://git.kernel.org/torvalds/c/d53c2c29ae0d previously
changed the behaviour and bumped the default to 2 *without* allowing
userspace to restore the prior behaviour, and we've been carrying a
*revert* of that patch.
So the patch we're arguing about is just making that earlier guest-
visible change optional in precisely the way that is already designed
into KVM, and has been used for the subsequent change.
Why would we *not* accept such a patch?
It's not like I'm trying to upstream something like
https://david.woodhou.se/0001-Allow-writes-via-newly-readonly-PTE-for-buggy-Ubuntu.patch
... but yes, those *are* the lengths we have to go to sometimes to
ensure that when we upgrade the hosting environment, guests which have
worked for years don't suddenly break — however much they DESERVE to :)
> Overall, even if we may disagree about the details, are we really on
> terribly distant grounds, or are we not?
I genuinely have no idea.
On one hand, no we are not terribly distant. All the mechanisms to do
this properly already *exist*, and the fix I'm asking for is not much
more than a one-liner to fix up the previous oversight.
But on the other hand, Marc seems terribly insistent that we SHOULD NOT
restore the behaviour that older KVM offered to guests, and we MUST
change it unconditionally underneath running guests, making these
registers writable on upgrade... and reverting them to read-only for
running guests on a rollback.
And there we do have a very different viewpoint.
^ permalink raw reply
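The mechanism described above, where userspace pins guest-visible behaviour by writing an older revision into the IIDR, can be sketched as follows. This is not the KVM vgic code: the demo_* names and the supported revision range are assumptions for illustration. The only architectural fact relied on is that the Revision field of GICD_IIDR sits in bits [15:12].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICD_IIDR.Revision occupies bits [15:12]. */
#define DEMO_IIDR_REV_SHIFT	12
#define DEMO_IIDR_REV_MASK	(UINT32_C(0xf) << DEMO_IIDR_REV_SHIFT)

/* Hypothetical range of revisions this model knows how to emulate. */
#define DEMO_REV_MIN	1u
#define DEMO_REV_MAX	3u

static uint32_t demo_iidr_revision(uint32_t iidr)
{
	return (iidr & DEMO_IIDR_REV_MASK) >> DEMO_IIDR_REV_SHIFT;
}

/*
 * Accept a userspace write only if it leaves every field other than the
 * revision untouched and selects a revision we can emulate.
 */
static bool demo_iidr_write(uint32_t *iidr, uint32_t val)
{
	uint32_t rev = demo_iidr_revision(val);

	if ((val & ~DEMO_IIDR_REV_MASK) != (*iidr & ~DEMO_IIDR_REV_MASK))
		return false;
	if (rev < DEMO_REV_MIN || rev > DEMO_REV_MAX)
		return false;

	*iidr = val;
	return true;
}
```

In this model, "allowing revision 1" amounts to lowering DEMO_REV_MIN and keeping the older behaviour reachable behind that value; the default can still move forward.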
* Re: [PATCH] docs: leds: uleds: Make the documentation match the code.
From: Lee Jones @ 2026-05-13 13:45 UTC (permalink / raw)
To: Björn Persson
Cc: Pavel Machek, Jonathan Corbet, Shuah Khan, linux-leds, linux-doc,
linux-kernel
In-Reply-To: <20260510214308.09652225@tag.xn--rombobjrn-67a.se>
On Sun, 10 May 2026, Björn Persson wrote:
> Lee Jones wrote:
> > On Fri, 24 Apr 2026, Björn Persson wrote:
> >
> > > Lee Jones wrote:
> > > > On Thu, 02 Apr 2026, Björn Persson wrote:
> > > >
> > > > > +The current brightness is found by reading a whole int from the character
> > > >
> > > > Try not to shorten names in documentation "integer".
> > >
> > > The type is named "int" in C. There are many integer types, but it would
> > > be wrong to try to read a uint16_t or a size_t or any other integer
> > > type. The document needs to use the actual type name to make it clear to
> > > the reader that they must read sizeof(int) bytes.
> >
> > Right, but you're not writing in C.
>
> That's technically true, as I wrote my program in C++. It's far from my
> favorite, but I had to use a language that can include C header files
> and use C types, because /dev/uleds is a very C-centric interface.
>
> If API documentation isn't allowed to name a type, then I withdraw the
> patch. It's pointless to continue. The next programmer will also have to
> read the code to find out what the true API is, like I did.
It's not that it's "not allowed". 90% of my review comments are
suggestions. These are simply for the sake of readability.
Equally, I'm also not up for blackmail or hissy fits. If you
don't want to continue with the review process, no one is going to beg
you.
--
Lee Jones
^ permalink raw reply
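To make the "whole int" point concrete: a reader of the character device has to ask for exactly sizeof(int) bytes and treat anything else as a failure. A minimal sketch, assuming fd is an already opened and configured descriptor (the device setup is omitted, and read_brightness is a made-up helper name):

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Read one brightness value: exactly sizeof(int) bytes in a single read().
 * Returns 0 on success, -1 on error or if fewer bytes were returned.
 */
static int read_brightness(int fd, int *brightness)
{
	ssize_t n;

	do {
		n = read(fd, brightness, sizeof(*brightness));
	} while (n < 0 && errno == EINTR);	/* retry if interrupted */

	return n == (ssize_t)sizeof(*brightness) ? 0 : -1;
}
```

Reading into a uint16_t or a size_t buffer instead would request the wrong number of bytes on most ABIs, which is the mistake the type name in the documentation is meant to prevent.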
* Re: [PATCH 04/23] tick/nohz: Allow runtime changes in full dynticks CPUs
From: Frederic Weisbecker @ 2026-05-13 13:04 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Shuah Khan, Catalin Marinas, Will Deacon,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
Guenter Roeck, Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Zqiang, Anna-Maria Behnsen,
Ingo Molnar, Chen Ridong, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman,
Valentin Schneider, K Prateek Nayak, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, cgroups,
linux-doc, linux-kernel, linux-arm-kernel, linux-hyperv,
linux-hwmon, rcu, netdev, linux-kselftest, Costa Shulyupin,
Qiliang Yuan
In-Reply-To: <87340od7ev.ffs@tglx>
On Tue, Apr 21, 2026 at 10:50:00AM +0200, Thomas Gleixner wrote:
> On Mon, Apr 20 2026 at 23:03, Waiman Long wrote:
> > + /*
> > + * To properly enable/disable nohz_full dynticks for the affected CPUs,
> > + * the new nohz_full CPUs have to be copied to tick_nohz_full_mask and
> > + * ct_cpu_track_user/ct_cpu_untrack_user() will have to be called
> > + * for those CPUs that have their states changed. Those CPUs should be
> > + * in an offline state.
> > + */
> > + for_each_cpu_andnot(cpu, cpumask, tick_nohz_full_mask) {
> > + WARN_ON_ONCE(cpu_online(cpu));
> > + ct_cpu_track_user(cpu);
> > + cpumask_set_cpu(cpu, tick_nohz_full_mask);
> > + }
> > +
> > + for_each_cpu_andnot(cpu, tick_nohz_full_mask, cpumask) {
> > + WARN_ON_ONCE(cpu_online(cpu));
> > + ct_cpu_untrack_user(cpu);
> > + cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> > + }
> > +}
>
> So this writes to tick_nohz_full_mask while other CPUs can access
> it. That's just wrong and I'm not at all interested in the resulting
> KCSAN warnings.
>
> tick_nohz_full_mask needs to become a RCU protected pointer, which is
> updated once the new mask is established in a separately allocated one.
How about just dropping tick_nohz_full_mask, which is just
~housekeeping_cpumask(HK_TYPE_KERNEL_NOISE), which itself is becoming RCU
protected in this patchset?
Thanks.
>
> Thanks,
>
> tglx
>
>
--
Frederic Weisbecker
SUSE Labs
^ permalink raw reply
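The "separately allocated mask, then publish the pointer" scheme suggested above can be approximated in userspace with C11 atomics. This is only a sketch of the publication step: the demo_* names and the 64-bit mask are assumptions, and there is no grace period here. In the kernel the same roles would be played by rcu_assign_pointer(), rcu_dereference() and a deferred free after synchronize_rcu() or via kfree_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 64-CPU mask; the kernel's struct cpumask is larger. */
struct demo_mask {
	uint64_t bits;
};

/* Currently published mask; readers never see it half-updated. */
static _Atomic(struct demo_mask *) demo_full_mask;

/* Reader: one acquire load, then only reads of the immutable snapshot. */
static int demo_cpu_is_full(unsigned int cpu)
{
	struct demo_mask *m =
		atomic_load_explicit(&demo_full_mask, memory_order_acquire);

	assert(cpu < 64);
	return m && ((m->bits >> cpu) & 1u);
}

/*
 * Writer: build the new mask privately, then publish it in one step.
 * The previous mask is handed back through *old; a real implementation
 * must not free it until all readers are done with it.
 */
static int demo_publish_mask(uint64_t bits, struct demo_mask **old)
{
	struct demo_mask *fresh = malloc(sizeof(*fresh));

	if (!fresh)
		return -1;
	fresh->bits = bits;
	*old = atomic_exchange_explicit(&demo_full_mask, fresh,
					memory_order_acq_rel);
	return 0;
}
```

The point of the design is that no reader ever observes the mask while individual bits are being set or cleared, which is what the in-place cpumask_set_cpu()/cpumask_clear_cpu() loop in the quoted patch cannot guarantee.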
* Re: [PATCH RFC] printk: remove BOOT_PRINTK_DELAY
From: Petr Mladek @ 2026-05-13 13:04 UTC (permalink / raw)
To: Andrew Murray
Cc: Jonathan Corbet, Shuah Khan, Russell King, Florian Fainelli,
Ray Jui, Scott Branden, Broadcom internal kernel review list,
Steven Rostedt, John Ogness, Sergey Senozhatsky, Andrew Morton,
Sebastian Andrzej Siewior, Randy Dunlap, Clark Williams,
linux-doc, linux-kernel, linux-arm-kernel, linux-rpi-kernel,
linux-rt-devel, Linus Torvalds
In-Reply-To: <CALqELGxhXO=kzh9bpztd9=Ug9ykPL2NALo9Apq3=Oj6aeiEcKg@mail.gmail.com>
On Wed 2026-05-06 23:37:01, Andrew Murray wrote:
> On Tue, 5 May 2026 at 15:26, Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Tue 2026-05-05 14:45:00, Andrew Murray wrote:
> > > The CONFIG_BOOT_PRINTK_DELAY option enables support for the boot_delay
> > > kernel parameter, which allows a configurable delay to be added before
> > > each and every printk is emitted. This is a DEBUG_KERNEL option that is
> > > helpful for debugging as kernel output can be slowed down during boot
> > > allowing messages to be seen before scrolling off the screen, or to
> > > correlate timing between some physical event and console output.
> > >
> > > However, since the introduction of nbcon and the legacy printer thread for
> > > PREEMPT_RT kernels, printk records are now emitted to the console
> > > asynchronously to the caller of printk and its boot_delay. The delay added
> > > by boot_delay continues to slow down the calling process, but may not have
> > > any impact on the rate at which records are emitted to the console. For
> > > example, if boot_delay is set to 100ms, and the printer thread has a
> > > backlog of more than 100ms, perhaps due to a slow serial console, then the
> > > records will appear to be printed without any delay between them.
> > >
> > > It would be unhelpful to add a delay to the printer thread, and it would
> > > not be possible to disallow selection of CONFIG_BOOT_PRINTK_DELAY at build
> > > time as it's not possible to detect which consoles are nbcon enabled at
> > > build time. Therefore, let's remove this feature.
> >
> > Heh, Randy proposed to remove "boot_delay" a few days ago.
> > This RFC goes even further and removes both "boot_delay" and
> > "printk_delay".
>
> Apologies, I didn't see this. I'll co-ordinate with Randy.
No need to apologize.
> > Honestly, I do not feel comfortable with this. The delay seems to
> > be handy when there is only graphical console. I would suggest
> > to do:
> >
> > 1. Obsolete "boot_delay" with "printk_delay" as
> > proposed in Randy's thread, see
> > https://lore.kernel.org/all/afn2sYKKsqG4QBVX@pathway.suse.cz/
>
> Your suggestion was:
>
> " 1. Add "printk_delay" early_param() which would allow
> to set "printk_delay_msec" via command line."
>
> And I assume the intent is to replicate the functionality of
> boot_delay, by allowing printk_delay to be used to introduce delays
> from early_param time? Thus deprecating boot_delay.
Exactly.
>
> " 2. Modify boot_delay_setup() to set "printk_delay_msec" as well.
> In addition, it might print a message that it has been
> obsoleted by "printk_delay" and will be removed."
>
> Given the intent may be to deprecate boot_delay, I'm not sure that
> setting printk_delay_msec as well would be beneficial, as this would
> extend its functionality to add delays beyond SYSTEM_RUNNING which is
> where boot_delay stops. Unless you mean to use boot_delay as an alias
> to an early_param hook for printk_delay?
I do not think that this is a big problem. As you write below, it is
a debug feature. IMHO, people debugging boot problems won't mind when
the delay continues beyond SYSTEM_RUNNING. And if anyone complains,
then we would at least know that there are people using this feature ;-)
> It seems that there are also differences in behavior between
> printk_delay and boot_delay, with printk_delay unconditionally adding
> delays to all printks, and boot_delay considering the loglevel.
The unconditional delay does not make much sense. I consider it a bug.
> >
> > 2. Move printk_delay() from vprintk_emit() to
> > console_emit_next_record() and nbcon_emit_next_record().
> >
> > For nbcon console, even better would be to use a sleeping
> > wait in nbcon_kthread_func(). But it would need some
> > changes to call it only when a record was really emitted.
> > Also we would need to use the busy wait in
> > __nbcon_atomic_flush_pending_con().
>
> This makes sense.
>
> If the use case (in a post kthread printk thread world), is only
> relevant for graphical consoles, then I do wonder if printk_delay and
> boot_delay can be replaced with a more specific solution? Now that we
> have printk threads, the time in which a printk is presented to the
> user may not relate to when it was created, and I fear people may
> continue to debug issues that rely on that assumption.
>
> I think the most pragmatic solution for now is:
> - Move the printk delay to the point where the printk is actually
> printed (e.g. console_flush_one_record and descendants)
> - Add an early_param to allow for printk_delay_msec to be set
> - Deprecate boot_delay, by using it as an alias for setting
> printk_delay_msec, and include a user message that it is being
> deprecated and that it now extends to beyond boot (which could impact
> performance on non PREEMPT_RT and non nbcon systems)
Sounds good.
> - Update printk_delay function to use the appropriate mechanism to
> delay based on stage of boot and using printk_delay_msec instead of
> boot_delay.
Good point! I thought that mdelay() can be used even for the early
messages because parse_early_param() is called right before
parse_args() in start_kernel() in init/main.c.
But parse_early_param() might be called even earlier, for example,
by setup_arch in arch/x86/kernel/setup.c. And it is called before
+ tsc_early_init()
+ tsc_enable_sched_clock()
+ loops_per_jiffy = get_loops_per_jiffy()
which seems to be used by
+ mdelay()
+ udelay()
+ __const_udelay()
Anyway, it has to be done before printk_delay_msec can be set
via an early parameter.
> If that makes sense I can fashion a patchset.
That would be great.
Best Regards,
Petr
PS: Note that I am traveling the following week so my review might
get delayed.
^ permalink raw reply
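The 100ms example from the RFC description can be checked with a small model: the caller produces one record every delay_ms, and a single console thread needs cost_ms to print each one. The function below is a toy simulation with made-up names, not kernel code. It returns how long the console sat idle before printing the last of n records, which is the pause a person watching the console would actually see.

```c
#include <assert.h>

/*
 * Idle time (ms) the console sees before printing record n - 1, when the
 * caller produces one record every delay_ms and the console needs cost_ms
 * to print each record.
 */
static unsigned long demo_idle_before_last(unsigned long delay_ms,
					   unsigned long cost_ms,
					   unsigned int n)
{
	unsigned long done = 0;	/* when the previous record finished printing */
	unsigned long idle = 0;

	assert(n > 0);
	for (unsigned int i = 0; i < n; i++) {
		/* Record i becomes available after i caller-side delays. */
		unsigned long ready = (unsigned long)i * delay_ms;

		/* The console waits only if it has already caught up. */
		idle = ready > done ? ready - done : 0;
		done = (ready > done ? ready : done) + cost_ms;
	}
	return idle;
}
```

With a 100ms delay and a console that needs 150ms per record, the visible pause is 0: records come out back to back even though every printk() caller was delayed. With a 20ms console the pause is 80ms. This is the argument for moving the delay to the point where records are emitted.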