* Re: [PATCH v3 3/4] docs: clk: include some identifiers to keep documentation up to date
From: Maxime Ripard @ 2026-05-13 5:48 UTC (permalink / raw)
To: Brian Masney
Cc: linux-clk, linux-doc, linux-kernel, Jonathan Corbet,
Maxime Ripard, Michael Turquette, Shuah Khan, Stephen Boyd
In-Reply-To: <20260511-clk-docs-v3-3-ed67e1065809@redhat.com>
On Mon, 11 May 2026 21:35:06 -0400, Brian Masney wrote:
> The clk documentation currently has a separate list of some members of
> struct clk_core and struct clk_ops. Now that all of these structures
> have proper kernel docs, let's go ahead and just include them here via
> the identifiers statement in kerneldoc.
>
>
> [ ... ]
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Thanks!
Maxime
* Re: [PATCH v3 4/4] clk: test: convert constants to use HZ_PER_MHZ
From: Maxime Ripard @ 2026-05-13 5:49 UTC (permalink / raw)
To: Brian Masney
Cc: linux-clk, linux-doc, linux-kernel, Jonathan Corbet,
Maxime Ripard, Michael Turquette, Shuah Khan, Stephen Boyd
In-Reply-To: <20260511-clk-docs-v3-4-ed67e1065809@redhat.com>
On Mon, 11 May 2026 21:35:07 -0400, Brian Masney wrote:
> Convert the DUMMY_CLOCK_* constants over to use HZ_PER_MHZ.
>
> Signed-off-by: Brian Masney <bmasney@redhat.com>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Thanks!
Maxime
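
For readers unfamiliar with the conversion under review, here is a minimal
user-space sketch of what replacing an open-coded "* 1000 * 1000" with
HZ_PER_MHZ looks like. The macro is redefined locally because
<linux/units.h> is a kernel-only header, and the EXAMPLE_RATE_* names and
mhz_to_hz() are hypothetical, not taken from the patch.

```c
/*
 * Local stand-in for the kernel's HZ_PER_MHZ from <linux/units.h>.
 * Redefined here only so the sketch builds outside the kernel tree.
 */
#define HZ_PER_MHZ 1000000UL

/* Hypothetical constants for illustration; not the names used in the patch. */
#define EXAMPLE_RATE_OLD (42 * 1000 * 1000)	/* open-coded multiplier */
#define EXAMPLE_RATE_NEW (42 * HZ_PER_MHZ)	/* unit spelled out */

/* Convert a whole number of MHz to Hz. */
static unsigned long mhz_to_hz(unsigned long mhz)
{
	return mhz * HZ_PER_MHZ;
}
```

Both spellings yield the same value; the second one makes the unit visible
at the point of use and avoids miscounting zeros.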
* Re: [RFC net-next 0/4] devlink: Add boot-time defaults
From: Mark Bloch @ 2026-05-13 5:53 UTC (permalink / raw)
To: Jiri Pirko, Parav Pandit
Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, Andrew Lunn,
David S. Miller, Jonathan Corbet, Shuah Khan, Simon Horman,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Andrew Morton,
Borislav Petkov (AMD), Randy Dunlap, Dave Hansen,
Christian Brauner, Petr Mladek, Peter Zijlstra (Intel),
Thomas Gleixner, Pawan Gupta, Dapeng Mi, Kees Cook, Marco Elver,
Eric Biggers, NBU-Contact-Li Rongqing (EXTERNAL),
Paul E. McKenney, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org
In-Reply-To: <agNy3RF9WCHBPev5@FV6GYCPJ69>
On 12/05/2026 21:35, Jiri Pirko wrote:
> Tue, May 12, 2026 at 05:25:21PM CEST, parav@nvidia.com wrote:
>>
>>
>>> From: Jiri Pirko <jiri@resnulli.us>
>>> Sent: 12 May 2026 07:37 PM
>>>
>>> Tue, May 12, 2026 at 03:48:32PM CEST, parav@nvidia.com wrote:
>>>>
>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>> Sent: 12 May 2026 02:16 PM
>>>>>
>>>>> Mon, May 11, 2026 at 08:21:37PM +0200, parav@nvidia.com wrote:
>>>>>>
>>>>>>> From: Mark Bloch <mbloch@nvidia.com>
>>>>>>> Sent: 10 May 2026 06:02 PM
>>>>>>>
>>>>>>
>>>>>> [..]
>>>>>>
>>>>>>>> I look at it from the perspective that from some CX generation,
>>>>>>>> switchdev mode should be default. So that is a device-based decision.
>>>>>>>> I believe as such it can optionally be permanently configured (nv config)
>>>>>>>> on older device. Why not?
>>>>>>>
>>>>>> Because sometimes switchdev_inactive is needed and sometimes not.
>>>>>> Such knob is not device decision.
>>>>>
>>>>> That is what I would call corner case. In that, user can use userspace
>>>>> configuration to change the mode in runtime.
>>>>>
>>>> Corner vs common depends on users one talks to. :)
>>>> If fw has switchdev (active) as default, and the user needs to run
>>>> switchdev_inactive, it will actually break their switching applications.
>>>
>>> Can you describe the actual breakage please?
>>>
>> Driver default was switchdev so all the traffic is forwarded to the switch,
>> and user didn't have chance to setup the fdb rules.
>> So packets are dropped but user didn't expect the traffic to be forwarded.
>
> User may switch mode to switchdev_inactive early on, before any of the
> representors are created. What's the issue then?
That is the ordering problem I am trying to solve.
On a DPU, the host PF cannot finish loading until the ECPF moves the eswitch to
switchdev/switchdev_inactive. So we need to do that transition during ECPF
driver init, as early as possible. Waiting for userspace means the host PF stays
blocked until userspace is up and has the right logic.
That is not always the case in practice: the driver may be built in, loaded
from an initramfs, or the initramfs may simply not contain the devlink policy
we need.
Also, after talking with Parav, my understanding is that we need to support both
switchdev and switchdev_inactive, since different customers want different boot
behavior. Once we do the transition, the host PF can load and may start sending
packets. At that point the initial mode already matters: in switchdev_inactive
packets are dropped until userspace programs the pipeline; in switchdev they may
reach the FDB before the pipeline is ready.
So I do not think an early userspace transition is equivalent here. The initial
mode needs to be known by the kernel before userspace runs, which is why I am
proposing the devlink= command line default.
Mark
>
>
>>
>> With this RFC, the device would start in the switchdev_inactive.
>> And user's goal is achieved.
>>
>>>>
>>>> So, one needs to invent switchdev_inactive in the FW.
>>>>
>>>> Jakub's suggestion in this RFC is covering both the scenarios uniformly without above problems.
>>>> Single uapi for all the cases, so looks good to me.
>>>>
>>>> Moreover, do not understand how alternative solves such problems.
>>>> i.e. user is unable to configure the fw because driver is not yet loaded/up.
>>>
>>> See my other reply in this thread. I don't think there is a need to
>>> configure anything in FW. If we fix the behaviour in switchdev mode for
>>> non-sriov user and change the default, no fw knob needed. What am I
>>> missing?
>>>
>> If I understood your suggestion right, is it the devlinkd based solution?
>
> The suggestion is to use "switchdev" as default with user configuration
> no matter if it is devlinkd or something else.
>
>
>>
>> If yes, then Mark explained that it has the issue of all drivers to be loaded, followed by user space to start.
* Re: [PATCH 1/2] Doc: deprecated.rst: add strlcat()
From: Heiko Carstens @ 2026-05-13 5:53 UTC (permalink / raw)
To: Manuel Ebner
Cc: Jani Nikula, andy.shevchenko, apw, corbet, dwaipayanray1, joe,
kees, linux-doc, linux-kernel, lukas.bulwahn, skhan, workflows
In-Reply-To: <46d8a5a8c8c77d3de9acfa1c55de2148fb2975c5.camel@mailbox.org>
On Tue, May 12, 2026 at 12:43:54PM +0200, Manuel Ebner wrote:
> On Tue, 2026-05-12 at 11:52 +0300, Jani Nikula wrote:
> > On Sun, 10 May 2026, Manuel Ebner <manuelebner@mailbox.org> wrote:
> > > add strlcat and alternatives
> >
> > You'd think it's the strlcat() definition that needs a comment above it
> > saying it's deprecated. I don't think folks really look at
> > deprecated.rst.
>
> arch/s390/lib/string.c
> lib/string.c
> and
> tools/include/nolibc/string.h
>
> do not mention anything about it being obsolete.
>
> include/linux/fortify-string.h has
>
> /* Defined after fortified strlen() to reuse it. */
> extern size_t __real_strlcat(char *p, const char *q, size_t avail) __RENAME(strlcat);
> /**
> * strlcat - Append a string to an existing string
> * [...]
> * Do not use this function. While FORTIFY_SOURCE tries to avoid
> * read and write overflows, this is only possible when the sizes
> * of @p and @q are known to the compiler. Prefer building the
> * string with formatting, via scnprintf(), seq_buf, or similar.
>
> Should I add this to the former three files?
I'm going to remove s390's implementation of strlcat(), and convert the only
two users of strlcat() in s390 code to something else. No reason to add a
comment there.
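
As a rough illustration of the "prefer building the string with formatting"
advice quoted above, here is a user-space sketch. sketch_scnprintf() is a
hypothetical helper that mimics the kernel's scnprintf() return convention
(bytes actually written, excluding the NUL) on top of vsnprintf();
build_path() is likewise made up for the example and is not code from s390
or lib/string.c.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for the kernel's scnprintf(): returns
 * the number of characters actually stored, excluding the trailing NUL,
 * so the result can never run past the end of the buffer.
 */
static size_t sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0) {
		buf[0] = '\0';
		return 0;
	}
	/* vsnprintf() reports the untruncated length; clamp it. */
	return (size_t)n >= size ? size - 1 : (size_t)n;
}

/* Build "dir/name" piece by piece instead of chaining strlcat() calls. */
static size_t build_path(char *buf, size_t size, const char *dir,
			 const char *name)
{
	size_t len = 0;

	len += sketch_scnprintf(buf + len, size - len, "%s", dir);
	len += sketch_scnprintf(buf + len, size - len, "/%s", name);
	return len;
}
```

Because each call returns what was really written, the running offset stays
inside the buffer even when the output is truncated, which is the property
strlcat() callers usually have to re-derive by hand.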
* Re: [PATCH 05/12] swap: cleanup setup_swap_extents
From: Christoph Hellwig @ 2026-05-13 5:56 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song,
Christian Brauner, Jens Axboe, David Sterba, Theodore Ts'o,
Jaegeuk Kim, Chao Yu, Trond Myklebust, Anna Schumaker,
Namjae Jeon, Hyunchul Lee, Steve French, Paulo Alcantara,
Carlos Maiolino, Damien Le Moal, Naohiro Aota, linux-xfs,
linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512164308.GF9555@frogsfrogsfrogs>
On Tue, May 12, 2026 at 09:43:08AM -0700, Darrick J. Wong wrote:
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 651c1b59ff9f..1b7fc03612f4 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -2783,25 +2783,24 @@ static int setup_swap_extents(struct swap_info_struct *sis,
> > {
> > struct address_space *mapping = swap_file->f_mapping;
> > struct inode *inode = mapping->host;
> > - int ret;
> > + int ret, error = 0;
>
> /me wonders why not reuse ret instead of declaring a new variable?
Because when I wrote this, the setup methods could still return a
positive number of extents value that must not be clobbered. Since
then I added patches before this that removed that, so we can use
the same ret variable.
* Re: [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate
From: Christoph Hellwig @ 2026-05-13 5:58 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song,
Christian Brauner, Jens Axboe, David Sterba, Theodore Ts'o,
Jaegeuk Kim, Chao Yu, Trond Myklebust, Anna Schumaker,
Namjae Jeon, Hyunchul Lee, Steve French, Paulo Alcantara,
Carlos Maiolino, Damien Le Moal, Naohiro Aota, linux-xfs,
linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512170846.GJ9555@frogsfrogsfrogs>
On Tue, May 12, 2026 at 10:08:46AM -0700, Darrick J. Wong wrote:
> > + /* Only one bdev per swap file for now. */
> > + if (!sis->bdev)
> > + sis->bdev = bdev;
> > + else if (bdev != sis->bdev)
> > + return -EINVAL;
>
> Should this return error if the bdev is zoned? AFAICT XFS and zonefs
> already guard against this, but other fses might be more naïve.
Yes, now that the bdev is passed down to add_swap_extent we could
consolidate the check here.
* Re: [PATCH v2 13/14] selftests/mm: add userfaultfd RWP tests
From: Mike Rapoport @ 2026-05-13 6:06 UTC (permalink / raw)
To: Kiryl Shutsemau (Meta)
Cc: akpm, peterx, david, ljs, surenb, vbabka, Liam.Howlett, ziy,
corbet, skhan, seanjc, pbonzini, jthoughton, aarcange, sj,
usama.arif, linux-mm, linux-kernel, linux-doc, linux-kselftest,
kvm, kernel-team
In-Reply-To: <e097db49bd0ada5f3c22f9c98c548c3b8ca24ba7.1778254670.git.kas@kernel.org>
On Fri, May 08, 2026 at 04:55:25PM +0100, Kiryl Shutsemau (Meta) wrote:
> Coverage for UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT:
>
> rwp-async async mode — touch pages, verify permissions are
> auto-restored without a message
> rwp-sync sync mode — access blocks, handler resolves via
> UFFDIO_RWPROTECT
> rwp-pagemap PAGEMAP_SCAN reports still-cold pages via
> inverted PAGE_IS_ACCESSED
> rwp-mprotect RWP survives mprotect(PROT_NONE) ->
> mprotect(PROT_READ|PROT_WRITE) round-trip
> rwp-gup GUP walks through a protnone RWP PTE (pipe
> write/read drives the GUP path)
> rwp-async-toggle UFFDIO_SET_MODE flips between sync and async
> without re-registering
> rwp-close closing the uffd restores page permissions
> rwp-fork RWP survives fork() with EVENT_FORK; child's
> PTEs keep the uffd bit
> rwp-fork-pin RWP survives fork() on an RO-longterm-pinned
> anon page (forces copy_present_page()); child
> read auto-resolves and clears the bit, proving
> PAGE_NONE was in place
> rwp-wp-exclusive register with MODE_WP|MODE_RWP returns -EINVAL
>
> All tests run against anon, shmem, shmem-private, hugetlb, and
> hugetlb-private memory, except rwp-fork-pin which is anon-only —
> copy_present_page() is the private-anon pinned-exclusive fork path.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
> ---
> tools/testing/selftests/mm/uffd-unit-tests.c | 774 +++++++++++++++++++
> 1 file changed, 774 insertions(+)
>
> diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/selftests/mm/uffd-unit-tests.c
> index 6f5e404a446c..a35fb677e4cc 100644
> --- a/tools/testing/selftests/mm/uffd-unit-tests.c
> +++ b/tools/testing/selftests/mm/uffd-unit-tests.c
> @@ -7,6 +7,7 @@
>
> #include "uffd-common.h"
>
> +#include <linux/fs.h>
> #include "../../../../mm/gup_test.h"
>
> #ifdef __NR_userfaultfd
> @@ -167,6 +168,23 @@ static int test_uffd_api(bool use_dev)
> goto out;
> }
>
> + /* Verify returned fd-level ioctls bitmask */
> + {
> + uint64_t expected_ioctls =
can be const uint64_t and declared at the top of the function to avoid
extra indentation here.
> + BIT_ULL(_UFFDIO_REGISTER) |
> + BIT_ULL(_UFFDIO_UNREGISTER) |
> + BIT_ULL(_UFFDIO_API) |
> + BIT_ULL(_UFFDIO_SET_MODE);
> +
> + if ((uffdio_api.ioctls & expected_ioctls) != expected_ioctls) {
> + uffd_test_fail("UFFDIO_API missing expected ioctls: "
> + "got=0x%"PRIx64", expected=0x%"PRIx64,
> + (uint64_t)uffdio_api.ioctls,
> + expected_ioctls);
> + goto out;
> + }
> + }
> +
> /* Test double requests of UFFDIO_API with a random feature set */
> uffdio_api.features = BIT_ULL(0);
> if (ioctl(uffd, UFFDIO_API, &uffdio_api) == 0) {
...
> +static void uffd_rwp_pagemap_test(uffd_global_test_opts_t *gopts,
> + uffd_test_args_t *args)
> +{
> + unsigned long nr_pages = gopts->nr_pages;
> + unsigned long page_size = gopts->page_size;
> + unsigned long p;
> + struct page_region regions[16];
> + struct pm_scan_arg pm_arg;
> + int pagemap_fd;
> + long ret;
...
> + /*
> + * PAGE_IS_ACCESSED is set once the uffd-wp bit has been cleared
> + * (access happened, or the user resolved). Invert it to select
> + * still-protected (cold) pages.
> + */
> + memset(&pm_arg, 0, sizeof(pm_arg));
> + pm_arg.size = sizeof(pm_arg);
> + pm_arg.start = (uint64_t)gopts->area_dst;
> + pm_arg.end = (uint64_t)gopts->area_dst + nr_pages * page_size;
> + pm_arg.vec = (uint64_t)regions;
> + pm_arg.vec_len = 16;
ARRAY_SIZE(regions)?
> + pm_arg.category_mask = PAGE_IS_ACCESSED;
> + pm_arg.category_inverted = PAGE_IS_ACCESSED;
> + pm_arg.return_mask = PAGE_IS_ACCESSED;
> +
> +}
> +
> +/*
> + * Test that RWP protection survives a mprotect(PROT_NONE) ->
> + * mprotect(PROT_READ|PROT_WRITE) round-trip. The uffd-wp bit on a
> + * VM_UFFD_RWP VMA must continue to carry PROT_NONE semantics after
> + * mprotect() changes the base protection; otherwise accesses would
> + * silently succeed and the pagemap bit would stick without a fault
> + * ever clearing it.
> + */
> +static void uffd_rwp_mprotect_test(uffd_global_test_opts_t *gopts,
> + uffd_test_args_t *args)
> +{
> + unsigned long nr_pages = gopts->nr_pages;
> + unsigned long page_size = gopts->page_size;
> + unsigned long p;
> + struct page_region regions[16];
> + struct pm_scan_arg pm_arg;
> + int pagemap_fd;
> + long ret;
...
> + memset(&pm_arg, 0, sizeof(pm_arg));
> + pm_arg.size = sizeof(pm_arg);
> + pm_arg.start = (uint64_t)gopts->area_dst;
> + pm_arg.end = (uint64_t)gopts->area_dst + nr_pages * page_size;
> + pm_arg.vec = (uint64_t)regions;
> + pm_arg.vec_len = 16;
ARRAY_SIZE(regions)?
> + pm_arg.category_mask = PAGE_IS_ACCESSED;
> + pm_arg.category_inverted = PAGE_IS_ACCESSED;
> + pm_arg.return_mask = PAGE_IS_ACCESSED;
> +
> + ret = ioctl(pagemap_fd, PAGEMAP_SCAN, &pm_arg);
> + close(pagemap_fd);
> +
> + if (ret < 0) {
> + uffd_test_fail("PAGEMAP_SCAN failed: %s", strerror(errno));
> + return;
> + }
> + if (ret != 0) {
> + uffd_test_fail("expected no cold pages after mprotect()+touch, got %ld regions",
> + ret);
> + return;
> + }
> +
> + uffd_test_pass();
> +}
> +
> +/*
> + * Test that GUP resolves through protnone PTEs (async mode).
> + * RW-protect pages, then use a pipe to exercise GUP on the RW-protected
> + * memory. write() from RW-protected pages triggers GUP which must fault
> + * through the protnone PTE.
> + */
> +static void uffd_rwp_gup_test(uffd_global_test_opts_t *gopts,
> + uffd_test_args_t *args)
> +{
> + unsigned long page_size = gopts->page_size;
> + char *buf;
> + int pipefd[2];
> +
> + buf = malloc(page_size);
> + if (!buf)
> + err("malloc");
> +
> + /* Populate first page with known content */
> + memset(gopts->area_dst, 0xCD, page_size);
> +
> + if (uffd_register_rwp(gopts->uffd, gopts->area_dst, page_size))
> + err("register failure");
> +
> + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst, page_size, true);
> +
> + if (pipe(pipefd))
> + err("pipe");
> +
> + /*
> + * write() from the RW-protected page into the pipe. This triggers
> + * GUP on the protnone PTE; in async mode the kernel auto-restores
> + * permissions and GUP succeeds. One byte is enough to exercise
> + * the GUP path and avoids any concern about pipe buffer sizing on
> + * large-page archs.
> + */
> + if (write(pipefd[1], gopts->area_dst, 1) != 1) {
> + uffd_test_fail("write from RW-protected page failed: %s",
> + strerror(errno));
> + goto out;
> + }
Sashiko (https://sashiko.dev/#/patchset/cover.1778254670.git.kas%40kernel.org?part=13):
Could this write() implementation be bypassing the intended test
logic?
... the write() call here will trigger standard hardware page
faults during copy_from_user() rather than the intended
get_user_pages() code path.
It also suggests using vmsplice().
> +
> + if (read(pipefd[0], buf, 1) != 1) {
> + uffd_test_fail("read from pipe failed");
> + goto out;
> + }
> +
> + if (buf[0] != (char)0xCD) {
> + uffd_test_fail("content mismatch: got 0x%02x, expected 0xCD",
> + (unsigned char)buf[0]);
> + goto out;
> + }
> +
> + uffd_test_pass();
> +out:
> + close(pipefd[0]);
> + close(pipefd[1]);
> + free(buf);
> +}
> +
> +/*
> + * Test runtime toggle between async and sync modes.
> + * Start in async mode (detection), flip to sync (eviction), verify faults
> + * block, resolve them, flip back to async.
> + */
> +static void uffd_rwp_async_toggle_test(uffd_global_test_opts_t *gopts,
> + uffd_test_args_t *args)
> +{
> + unsigned long nr_pages = gopts->nr_pages;
> + unsigned long page_size = gopts->page_size;
> + struct uffd_args uargs = { };
> + pthread_t uffd_mon;
> + bool started = false;
> + char c = '\0';
> + unsigned long p;
> +
> + uargs.gopts = gopts;
> + uargs.handle_fault = uffd_handle_rwp_fault;
> +
> + /* Populate */
> + for (p = 0; p < nr_pages; p++)
> + memset(gopts->area_dst + p * page_size, p % 255 + 1, page_size);
> +
> + if (uffd_register_rwp(gopts->uffd, gopts->area_dst,
> + nr_pages * page_size))
> + err("register failure");
> +
> + /* Phase 1: async detection — RW-protect, access first half */
> + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
> + nr_pages * page_size, true);
> +
> + for (p = 0; p < nr_pages / 2; p++) {
> + volatile char *page = gopts->area_dst + p * page_size;
> + (void)*page; /* auto-resolves in async mode */
> + }
> +
> + /* Phase 2: flip to sync for eviction */
> + set_async_mode(gopts->uffd, false);
> +
> + /* Start handler — will receive faults for cold pages */
> + if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &uargs))
> + err("uffd_poll_thread create");
> + started = true;
> +
> + /* Access second half (cold pages) — should trigger sync faults */
> + for (p = nr_pages / 2; p < nr_pages; p++) {
> + unsigned char *page = (unsigned char *)gopts->area_dst +
> + p * page_size;
> + if (page[0] != (p % 255 + 1)) {
> + uffd_test_fail("page %lu content mismatch", p);
> + goto out;
> + }
> + }
> +
> + /*
> + * Stop the handler before reading minor_faults: the last fault
> + * resolution rwprotect_range()s before incrementing the counter,
> + * so the main thread can race ahead of the increment. Stopping
> + * here also makes Phase 3 a clean async-only test -- with the
> + * handler still running it would silently resolve any sync fault
> + * the kernel erroneously delivers, masking a regression.
> + */
> + if (write(gopts->pipefd[1], &c, sizeof(c)) != sizeof(c))
> + err("pipe write");
> + if (pthread_join(uffd_mon, NULL))
> + err("join() failed");
> + started = false;
I think 'started' is misleading, would "running_sync_test" be better?
> +
> + if (uargs.minor_faults == 0) {
> + uffd_test_fail("expected sync faults, got 0");
> + goto out;
> + }
And it seems here we can just return and then started is not needed at
all.
> +
> + /* Phase 3: flip back to async */
> + set_async_mode(gopts->uffd, true);
> +
> + /* RW-protect and access again — should auto-resolve */
> + rwprotect_range(gopts->uffd, (uint64_t)gopts->area_dst,
> + nr_pages * page_size, true);
> +
> + for (p = 0; p < nr_pages; p++) {
> + volatile char *page = gopts->area_dst + p * page_size;
> + (void)*page;
> + }
> +
> + uffd_test_pass();
> +out:
> + if (started) {
> + if (write(gopts->pipefd[1], &c, sizeof(c)) != sizeof(c))
> + err("pipe write");
> + if (pthread_join(uffd_mon, NULL))
> + err("join() failed");
> + }
> +}
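
To make the write()-versus-vmsplice() remark on the GUP test concrete, here
is a standalone user-space sketch, not a proposed replacement for the test.
It hands a user page to a pipe with vmsplice(), which references the page
rather than copying from it as write() does, and reads one byte back. It
does not involve userfaultfd or RWP at all; vmsplice_roundtrip() is a
hypothetical helper, and ARRAY_SIZE is defined locally in the spirit of the
ARRAY_SIZE(regions) suggestion.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/* Local definition; the selftests get this from their own headers. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Fill an anonymous page with 'fill', vmsplice() its first byte into a
 * pipe and read it back. Returns the byte read (0..255) or -1 on error.
 */
static int vmsplice_roundtrip(unsigned char fill)
{
	long page_size = sysconf(_SC_PAGESIZE);
	struct iovec iov[1];
	unsigned char out = 0;
	int pipefd[2];
	int ret = -1;
	char *page;

	page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;
	memset(page, fill, page_size);

	if (pipe(pipefd))
		goto out_unmap;

	iov[0].iov_base = page;
	iov[0].iov_len = 1;	/* one byte is enough, as in the test */

	/* The kernel takes a reference on the user page here. */
	if (vmsplice(pipefd[1], iov, ARRAY_SIZE(iov), 0) != 1)
		goto out_close;
	if (read(pipefd[0], &out, 1) != 1)
		goto out_close;
	ret = out;

out_close:
	close(pipefd[0]);
	close(pipefd[1]);
out_unmap:
	munmap(page, page_size);
	return ret;
}
```

Whether this is the right primitive for the RWP selftest is for the patch
author to decide; the sketch only shows the call shape and the cleanup
order.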
--
Sincerely yours,
Mike.
* Re: [PATCH v8 4/4] kunit: Add documentation for warning backtrace suppression API
From: kernel test robot @ 2026-05-13 6:12 UTC (permalink / raw)
To: Albert Esteve, Arnd Bergmann, Brendan Higgins, David Gow,
Rae Moar, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Simona Vetter, Jonathan Corbet, Shuah Khan,
Andrew Morton, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti
Cc: oe-kbuild-all, Linux Memory Management List, linux-kernel,
linux-arch, linux-kselftest, kunit-dev, dri-devel, workflows,
linux-riscv, linux-doc, peterz, Guenter Roeck,
Linux Kernel Functional Testing, Dan Carpenter,
Alessandro Carminati
In-Reply-To: <20260504-kunit_add_support-v8-4-3e5957cdd235@redhat.com>
Hi Alessandro,
kernel test robot noticed the following build errors:
[auto build test ERROR on 80234b5ab240f52fa45d201e899e207b9265ef91]
url: https://github.com/intel-lab-lkp/linux/commits/Albert-Esteve/bug-kunit-Core-support-for-suppressing-warning-backtraces/20260513-043807
base: 80234b5ab240f52fa45d201e899e207b9265ef91
patch link: https://lore.kernel.org/r/20260504-kunit_add_support-v8-4-3e5957cdd235%40redhat.com
patch subject: [PATCH v8 4/4] kunit: Add documentation for warning backtrace suppression API
config: x86_64-rhel-9.4-ltp (https://download.01.org/0day-ci/archive/20260513/202605130826.e6Lyyytr-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260513/202605130826.e6Lyyytr-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605130826.e6Lyyytr-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/gpu/drm/drm_buddy.c:6:
>> include/kunit/test-bug.h:90:15: error: unknown type name 'bool'
90 | static inline bool kunit_is_suppressed_warning(bool count) { return false; }
| ^~~~
include/kunit/test-bug.h:13:1: note: 'bool' is defined in header '<stdbool.h>'; this is probably fixable by adding '#include <stdbool.h>'
12 | #include <linux/stddef.h> /* for NULL */
+++ |+#include <stdbool.h>
13 |
include/kunit/test-bug.h:90:48: error: unknown type name 'bool'
90 | static inline bool kunit_is_suppressed_warning(bool count) { return false; }
| ^~~~
include/kunit/test-bug.h:90:48: note: 'bool' is defined in header '<stdbool.h>'; this is probably fixable by adding '#include <stdbool.h>'
vim +/bool +90 include/kunit/test-bug.h
88
89 static inline struct kunit *kunit_get_current_test(void) { return NULL; }
> 90 static inline bool kunit_is_suppressed_warning(bool count) { return false; }
91
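
One plausible shape of a fix, sketched in user space: a header that uses
bool in its stubs pulls in the definition itself instead of relying on
whichever file includes it first. This is an assumption about the remedy,
not the author's actual change. In the kernel tree the include would
normally be <linux/types.h>, which provides bool; <stdbool.h> is used below
only so the fragment builds on its own.

```c
/*
 * User-space stand-ins: <stdbool.h> plays the role of <linux/types.h>,
 * and <stddef.h> the role of <linux/stddef.h> (for NULL).
 */
#include <stdbool.h>
#include <stddef.h>

struct kunit;	/* opaque here; the stubs only deal in pointers to it */

/* Stubs as quoted in the report for the configuration without KUnit. */
static inline struct kunit *kunit_get_current_test(void) { return NULL; }
static inline bool kunit_is_suppressed_warning(bool count)
{
	(void)count;	/* unused in the stub */
	return false;
}
```

With the type made available by the header itself, a translation unit such
as the drm_buddy.c case in the report can include it first without any
prior includes.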
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 14/14] Documentation/userfaultfd: document RWP working set tracking
From: Mike Rapoport @ 2026-05-13 6:26 UTC (permalink / raw)
To: Kiryl Shutsemau (Meta)
Cc: akpm, peterx, david, ljs, surenb, vbabka, Liam.Howlett, ziy,
corbet, skhan, seanjc, pbonzini, jthoughton, aarcange, sj,
usama.arif, linux-mm, linux-kernel, linux-doc, linux-kselftest,
kvm, kernel-team
In-Reply-To: <0b6f87fd4809245f9eebee73f34e2fb14230330c.1778254670.git.kas@kernel.org>
On Fri, May 08, 2026 at 04:55:26PM +0100, Kiryl Shutsemau (Meta) wrote:
> Add an admin-guide section covering UFFDIO_REGISTER_MODE_RWP:
>
> - sync and async fault models;
> - UFFDIO_RWPROTECT semantics;
> - UFFD_FEATURE_RWP_ASYNC;
> - UFFDIO_SET_MODE runtime mode flips.
>
> It also covers typical VMM working-set-tracking workflow from detection
> loop through sync-mode eviction and back to async.
We'd also need a man page update at some point :)
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6
> ---
> Documentation/admin-guide/mm/userfaultfd.rst | 226 ++++++++++++++++++-
> 1 file changed, 220 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> index 1e533639fd50..5ac4ae3dff1b 100644
> --- a/Documentation/admin-guide/mm/userfaultfd.rst
> +++ b/Documentation/admin-guide/mm/userfaultfd.rst
> @@ -275,16 +275,16 @@ tracking and it can be different in a few ways:
> - Dirty information will not get lost if the pte was zapped due to
> various reasons (e.g. during split of a shmem transparent huge page).
>
> - - Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit
> - set; dirty when uffd-wp bit cleared), it has different semantics on
> - some of the memory operations. For example: ``MADV_DONTNEED`` on
> + - Due to a reverted meaning of soft-dirty (page clean when the uffd bit
> + is set; dirty when the uffd bit is cleared), it has different semantics
> + on some of the memory operations. For example: ``MADV_DONTNEED`` on
> anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as
> - dirtying of memory by dropping uffd-wp bit during the procedure.
> + dirtying of memory by dropping the uffd bit during the procedure.
>
> The user app can collect the "written/dirty" status by looking up the
> -uffd-wp bit for the pages being interested in /proc/pagemap.
> +uffd bit for the pages being interested in /proc/pagemap.
>
> -The page will not be under track of uffd-wp async mode until the page is
> +The page will not be under track of userfaultfd-wp async mode until the page is
> explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode
> flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set. Trying to resolve a page fault
> that was tracked by async mode userfaultfd-wp is invalid.
> @@ -307,6 +307,220 @@ transparent to the guest, we want that same address range to act as if it was
> still poisoned, even though it's on a new physical host which ostensibly
> doesn't have a memory error in the exact same spot.
>
> +Read-Write Protection
> +---------------------
> +
> +``UFFDIO_REGISTER_MODE_RWP`` enables read-write protection tracking on a
> +memory range. It is similar to (but faster than) ``mprotect(PROT_NONE)``
> +combined with a signal handler; unlike ``mprotect(PROT_NONE)``, RWP only
> +traps accesses to *present* PTEs, so accesses to unpopulated addresses in a
> +protected range fall through to the normal missing-page path. It uses the
> +PROT_NONE hinting mechanism (same as NUMA balancing) to make pages
> +inaccessible while keeping them resident in memory. Works on anonymous,
> +shmem, and hugetlbfs memory.
> +
> +This is designed for VM memory managers that need to track the working set
This feature? Or RWP mode?
> +of guest memory for cold page eviction to tiered or remote storage.
> +
> +**Setup:**
> +
> +1. Open a userfaultfd and enable ``UFFD_FEATURE_RWP`` via ``UFFDIO_API``.
> + Optionally request ``UFFD_FEATURE_RWP_ASYNC`` as well — it requires
> + ``UFFD_FEATURE_RWP`` to be set in the same ``UFFDIO_API`` call.
> +
> +2. Register the guest memory range with ``UFFDIO_REGISTER_MODE_RWP``
> + (and ``UFFDIO_REGISTER_MODE_MISSING`` if evicted pages will need to be
> + fetched back from storage).
> +
> +**Feature availability:**
> +
> +RWP is built on top of two kernel primitives: a spare PTE bit owned by
> +userfaultfd (``CONFIG_HAVE_ARCH_USERFAULTFD_WP``) and arch support for
Please spell out architecture.
> +present-but-inaccessible PTEs (``CONFIG_ARCH_HAS_PTE_PROTNONE``). When both
> +are available on a 64-bit kernel, the build selects
> +``CONFIG_USERFAULTFD_RWP=y`` and the ``VM_UFFD_RWP`` VMA flag becomes
> +available.
> +
> +``UFFD_FEATURE_RWP`` and ``UFFD_FEATURE_RWP_ASYNC`` are masked out of the
> +features returned by ``UFFDIO_API`` when the running kernel or architecture
> +cannot support them — for example 32-bit kernels (where ``VM_UFFD_RWP`` is
> +unavailable), kernels built without ``CONFIG_USERFAULTFD_RWP``, and
> +architectures whose ptes cannot carry the uffd bit at runtime (e.g. riscv
> +without the ``SVRSW60T59B`` extension). ``UFFDIO_API`` does not fail;
> +unsupported bits are simply absent from ``uffdio_api.features`` on return.
> +VMMs should inspect the returned ``features`` after ``UFFDIO_API`` and fall
Let's s/VMM/Callers/.
Although RWP is designed for VMMs, it's not limited to them and I expect
other use-cases will be coming along.
> +back to another tracking method when RWP is unavailable.
> +
> +**Protecting and Unprotecting:**
> +
> +Use ``UFFDIO_RWPROTECT`` to protect or unprotect a range, mirroring the
> +``UFFDIO_WRITEPROTECT`` interface::
> +
> + struct uffdio_rwprotect rwp = {
> + .range = { .start = addr, .len = len },
> + .mode = UFFDIO_RWPROTECT_MODE_RWP, /* protect */
> + };
> + ioctl(uffd, UFFDIO_RWPROTECT, &rwp);
> +
> +Setting ``UFFDIO_RWPROTECT_MODE_RWP`` sets PROT_NONE on present PTEs in the
> +range. Pages stay resident and their physical frames are preserved — only
> +access permissions are removed.
> +
> +Clearing ``UFFDIO_RWPROTECT_MODE_RWP`` restores normal VMA permissions and
> +wakes any faulting threads (unless ``UFFDIO_RWPROTECT_MODE_DONTWAKE`` is set).
> +
> +**Scope of protection:**
> +
> +RWP protection is a property of *present* PTEs. ``UFFDIO_RWPROTECT`` only
> +affects entries that are already populated. Unpopulated addresses within
> +the range remain unpopulated; when first accessed they fault through the
> +normal missing path (``do_anonymous_page()``, ``do_swap_page()``,
> +``finish_fault()``) and the resulting PTE is not RWP-protected. To observe
> +the population itself, co-register the range with
> +``UFFDIO_REGISTER_MODE_MISSING``.
> +
> +Protection is preserved across page reclaim: a page swapped out while
> +RWP-protected carries the marker on its swap entry, and swap-in restores
> +the PROT_NONE state so the first access after swap-in still faults. The
> +same applies to pages temporarily replaced by migration entries.
> +
> +Operations that drop the PTE entirely — ``MADV_DONTNEED`` on anonymous
> +memory, hole-punch on shmem, truncation of a file mapping — also drop the
> +RWP marker: the next access re-populates the range without protection.
> +Unlike WP (which persists via ``PTE_MARKER_UFFD_WP``), there is no
> +persistent RWP marker today. The VMM needs to re-arm the range with
s/VMM/User/
> +``UFFDIO_RWPROTECT`` after any operation that explicitly frees PTEs.
> +
> +**Fault Handling:**
> +
> +When a protected page is accessed:
> +
> +- **Sync mode** (default): The faulting thread blocks and a
> + ``UFFD_PAGEFAULT_FLAG_RWP`` message is delivered to the userfaultfd
> + handler. The handler resolves the fault with ``UFFDIO_RWPROTECT``
> + (clearing ``MODE_RWP``), which restores the PTE permissions and wakes
> + the faulting thread.
> +
> +- **Async mode** (``UFFD_FEATURE_RWP_ASYNC``): The kernel automatically
> + restores PTE permissions and the thread continues without blocking. No
> + message is delivered to the handler.
> +
> +**Runtime Mode Switching:**
> +
> +``UFFDIO_SET_MODE`` toggles ``UFFD_FEATURE_RWP_ASYNC`` at runtime, allowing
> +the VMM to switch between lightweight async detection and safe sync
> +eviction without re-registering. The toggle takes ``mmap_write_lock()`` to
> +ensure all in-flight faults complete before the mode change takes effect.
> +
> +**Cold Page Detection with PAGEMAP_SCAN:**
> +
> +RWP-protected PTEs carry the uffd PTE bit; the fault-resolution path
> +clears it. ``PAGEMAP_SCAN`` reports ``PAGE_IS_ACCESSED`` once the bit is
> +clear on a ``VM_UFFD_RWP`` VMA, so inverting it efficiently reports the
> +still-protected (cold) pages::
> +
> + struct pm_scan_arg arg = {
> + .size = sizeof(arg),
> + .start = guest_mem_start,
> + .end = guest_mem_end,
> + .vec = (uint64_t)regions,
> + .vec_len = regions_len,
> + .category_mask = PAGE_IS_ACCESSED,
> + .category_inverted = PAGE_IS_ACCESSED,
> + .return_mask = PAGE_IS_ACCESSED,
> + };
> + long n = ioctl(pagemap_fd, PAGEMAP_SCAN, &arg);
> +
> +The returned ``page_region`` array contains contiguous cold ranges that can
> +then be evicted.
> +
> +**Cleanup:**
> +
> +When the userfaultfd is closed or the range is unregistered, all PROT_NONE
> +PTEs are automatically restored to their normal VMA permissions. This
> +prevents pages from becoming permanently inaccessible.
> +
> +**VMM Working Set Tracking Workflow:**
> +
> +A typical VMM lifecycle for cold page eviction to tiered storage. Two
> +mappings of the same shmem (or hugetlbfs) file are used: ``guest_mem`` is
> +the RWP-registered mapping that vCPUs access through, and ``io_mem`` is a
> +private mapping for VMM-side I/O. Reading ``io_mem`` does not go through
> +the RWP-protected PTEs of ``guest_mem``, so the VMM's own ``pwrite()``
> +never traps on its own ::
> +
> + /* One-time setup */
> + fd = memfd_create("guest", MFD_CLOEXEC);
> + ftruncate(fd, guest_size);
> + guest_mem = mmap(NULL, guest_size, PROT_READ | PROT_WRITE,
> + MAP_SHARED, fd, 0); /* vCPU view, RWP-registered */
> + io_mem = mmap(NULL, guest_size, PROT_READ | PROT_WRITE,
> + MAP_SHARED, fd, 0); /* VMM I/O view, unprotected */
> +
> + uffd = userfaultfd(O_CLOEXEC | O_NONBLOCK);
> + ioctl(uffd, UFFDIO_API, &(struct uffdio_api){
> + .api = UFFD_API,
> + .features = UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC,
> + });
> + ioctl(uffd, UFFDIO_REGISTER, &(struct uffdio_register){
> + .range = { guest_mem, guest_size },
> + .mode = UFFDIO_REGISTER_MODE_RWP |
> + UFFDIO_REGISTER_MODE_MISSING,
> + });
> +
> + /* Tracking loop */
> + while (vm_running) {
> + /* 1. Detection phase (async — no vCPU stalls) */
> + ioctl(uffd, UFFDIO_RWPROTECT, &(struct uffdio_rwprotect){
> + .range = full_range,
> + .mode = UFFDIO_RWPROTECT_MODE_RWP });
> + sleep(tracking_interval);
> +
> + /* 2. Find cold pages (uffd bit still set) */
> + ioctl(pagemap_fd, PAGEMAP_SCAN, &(struct pm_scan_arg){
> + .category_mask = PAGE_IS_ACCESSED,
> + .category_inverted = PAGE_IS_ACCESSED,
> + .return_mask = PAGE_IS_ACCESSED,
> + ...
> + });
> +
> + /* 3. Switch to sync for safe eviction */
> + ioctl(uffd, UFFDIO_SET_MODE,
> + &(struct uffdio_set_mode){
> + .disable = UFFD_FEATURE_RWP_ASYNC });
> +
> + /* 4. Evict cold pages (vCPU faults block on guest_mem) */
> + for each cold range:
> + /* Read from io_mem -- bypasses RWP, no fault. */
> + pwrite(storage_fd, io_mem + cold_offset, len, offset);
> + /* Drop the page from the shared file. */
> + fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> + cold_offset, len);
> + /*
> + * Wake any vCPU blocked on the RWP fault for this range:
> + * fallocate() does not iterate ctx->fault_pending_wqh.
> + */
> + ioctl(uffd, UFFDIO_WAKE, &(struct uffdio_range){
> + .start = (uintptr_t)guest_mem + cold_offset,
> + .len = len });
> +
> + /* 5. Resume async tracking */
> + ioctl(uffd, UFFDIO_SET_MODE,
> + &(struct uffdio_set_mode){
> + .enable = UFFD_FEATURE_RWP_ASYNC });
> + }
> +
> +During step 4, a vCPU that accesses ``guest_mem + cold_offset`` blocks
> +with a ``UFFD_PAGEFAULT_FLAG_RWP`` fault while the eviction is in
> +progress. After ``fallocate()`` punches the page out and ``UFFDIO_WAKE``
> +fires, the vCPU retries the access, faults as ``MISSING``, and the
> +handler resolves it with ``UFFDIO_COPY`` from storage.
> +
> +This workflow targets shmem and hugetlbfs (both support a private
> +``io_mem`` mapping over the same fd). Anonymous-memory backings need a
> +different inner-loop strategy because the VMM has no way to read the
> +page without going through the RWP-protected mapping.
> +
> QEMU/KVM
> ========
>
> --
> 2.51.2
>
--
Sincerely yours,
Mike.
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: ti: icssg: Derive stats array lengths from ARRAY_SIZE
From: MD Danish Anwar @ 2026-05-13 6:29 UTC (permalink / raw)
To: David CARLIER
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Jonathan Corbet, Shuah Khan, Roger Quadros,
Andrew Lunn, Jacob Keller, Meghana Malladi, Kevin Hao,
Vadim Fedorenko, netdev, linux-doc, linux-kernel,
linux-arm-kernel, Vignesh Raghavendra
In-Reply-To: <CA+XhMqzx9CUX5H7q1UqL=heGWLFjZVfyiTx6b45VW=E9t13Fow@mail.gmail.com>
Hi David
On 12/05/26 3:33 pm, David CARLIER wrote:
> Hi Danish,
>
>
> On Tue, 12 May 2026 at 10:40, MD Danish Anwar <danishanwar@ti.com> wrote:
>>
>> Hi David,
>>
>> On 12/05/26 1:28 pm, David CARLIER wrote:
>>> Hi MD,
>>>
>>> On Tue, 12 May 2026 at 07:06, MD Danish Anwar <danishanwar@ti.com> wrote:
>>>>
>>>> Replace the manually maintained ICSSG_NUM_MIIG_STATS and
>>>> ICSSG_NUM_PA_STATS constants with ARRAY_SIZE() expressions derived
>>>> directly from the corresponding stat descriptor arrays, so that adding
>>>> new entries to icssg_all_miig_stats[] or icssg_all_pa_stats[] no longer
>>>> requires a separate update to a numeric constant.
>>>>
>>>> To make this self-contained, break the circular include dependency
>>>> between icssg_stats.h and icssg_prueth.h:
>>>>
>>>> - icssg_stats.h previously included icssg_prueth.h (transitively
>>>> pulling in icssg_switch_map.h and ETH_GSTRING_LEN). Replace that
>>>> with direct includes of <linux/ethtool.h>, <linux/kernel.h> and
>>>> "icssg_switch_map.h".
>>>>
>>>> - icssg_prueth.h now includes icssg_stats.h, giving it access to
>>>> the ARRAY_SIZE-based ICSSG_NUM_MIIG_STATS and ICSSG_NUM_PA_STATS
>>>> before they are used in the prueth_emac struct and ICSSG_NUM_STATS.
>>>>
>>>> Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
>>>> ---
>>>> drivers/net/ethernet/ti/icssg/icssg_prueth.h | 3 +--
>>>> drivers/net/ethernet/ti/icssg/icssg_stats.h | 7 ++++++-
>>>> 2 files changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>>>> index df93d15c5b78..e2ccecb0a0dd 100644
>>>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>>>> @@ -43,6 +43,7 @@
>>>>
>>>> #include "icssg_config.h"
>>>> #include "icss_iep.h"
>>>> +#include "icssg_stats.h"
>>>> #include "icssg_switch_map.h"
>>>>
>>>> #define PRUETH_MAX_MTU (2000 - ETH_HLEN - ETH_FCS_LEN)
>>>> @@ -57,8 +58,6 @@
>>>>
>>>> #define ICSSG_MAX_RFLOWS 8 /* per slice */
>>>>
>>>> -#define ICSSG_NUM_PA_STATS 32
>>>> -#define ICSSG_NUM_MIIG_STATS 60
>>>> /* Number of ICSSG related stats */
>>>> #define ICSSG_NUM_STATS (ICSSG_NUM_MIIG_STATS + ICSSG_NUM_PA_STATS)
>>>> #define ICSSG_NUM_STANDARD_STATS 31
>>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/net/ethernet/ti/icssg/icssg_stats.h
>>>> index 5ec0b38e0c67..b854eb587c1e 100644
>>>> --- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
>>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
>>>> @@ -8,10 +8,15 @@
>>>> #ifndef __NET_TI_ICSSG_STATS_H
>>>> #define __NET_TI_ICSSG_STATS_H
>>>>
>>>> -#include "icssg_prueth.h"
>>>> +#include <linux/ethtool.h>
>>>> +#include <linux/kernel.h>
>>>> +#include "icssg_switch_map.h"
>>>>
>>>> #define STATS_TIME_LIMIT_1G_MS 25000 /* 25 seconds @ 1G */
>>>>
>>>> +#define ICSSG_NUM_MIIG_STATS ARRAY_SIZE(icssg_all_miig_stats)
>>>> +#define ICSSG_NUM_PA_STATS ARRAY_SIZE(icssg_all_pa_stats)
>>>> +
>>>> struct miig_stats_regs {
>>>> /* Rx */
>>>> u32 rx_packets;
>>>> --
>>>> 2.34.1
>>>>
>>>
>>> One thing that caught my eye: icssg_all_miig_stats[] and
>>> icssg_all_pa_stats[] are 'static const' arrays in icssg_stats.h with
>>> ETH_GSTRING_LEN name buffers per entry. Right now only icssg_stats.c
>>> and icssg_ethtool.c pull them in. After this patch icssg_prueth.h
>>> includes icssg_stats.h, so every .c in the driver (classifier,
>>> common, config, mii_cfg, queues, switchdev, ...) ends up with its own
>>> static-const copy of both tables.
>>>
>>> Would a static_assert() work for what you're after? Something like:
>>>
>>
> >> While adding more stats manually, the ARRAY_SIZE() approach was
>> explicitly requested by maintainer [1]:
>>
>> This patch is a direct response to that feedback. static_assert() would
>> still require updating the numeric constant on every array change. The
>> goal here is to eliminate the need of manually incrementing stats count
> >> whenever new stats are added.
>>
>> Your concern about multiple copies of table is noted and valid. Could
>> you advise on the preferred way to reconcile these two requirements? I
>> am happy to restructure if there is an approach that satisfies both.
>>
>> [1]
>> https://lore.kernel.org/all/20260112181436.4s5ceywwembn674r@skbuf/#:~:text=Can%27t%20this%20be%20expressed%20as%20ARRAY_SIZE(icssg_all_pa_stats)%3F%20It%20is%20very%0Afragile%20to%20have%20to%20count%20and%20update%20this%20manually.
>>
>>
>>> static const struct icssg_miig_stats icssg_all_miig_stats[] = {
>>> ...
>>> };
>>> static_assert(ARRAY_SIZE(icssg_all_miig_stats) == ICSSG_NUM_MIIG_STATS);
>>>
>>> next to each array, keeping the numeric #defines as-is. Then 2/2 fails
>>> to build the moment a new entry is added without bumping the count,
>>> which is the case you're guarding against — without touching the
>>> include graph.
>>>
>>> What do you think ?
>>>
>>> Cheers.
>>
>> --
>> Thanks and Regards,
>> Danish
>>
>
>
> Thanks for digging up the context — fair point, I'd missed Vladimir's
> earlier ask. Reading it again though, what he calls fragile is the
> silent miscount, not the keystroke of typing a number. A static_assert
> turns "forgot to bump" into a build error, which I think gets you
> there.
>
Thank you for the suggestion. I think your previous suggestion fits
better. I believe keeping the arrays in icssg_stats.h is preferable to
moving them to icssg_stats.c. Here is my reasoning:
Your binary-bloat concern was about icssg_prueth.h including
icssg_stats.h, which would drag the static const tables into every .c
that includes icssg_prueth.h (~11 translation units). That concern is
valid, but it is specific to the include direction of the previous
patch. If we simply revert to the original include graph —
icssg_stats.h includes icssg_prueth.h, not the other way around —
only the two files that have always included icssg_stats.h directly
(icssg_stats.c and icssg_ethtool.c) get a copy of the arrays. No
regression in binary size compared to the baseline.
> What about moving the two arrays into icssg_stats.c, declaring them
> extern in the header, and dropping a static_assert next to each
> definition? Numeric #defines stay where they are, icssg_prueth.h
> doesn't need to know about icssg_stats.h, and the tables live in one
> TU instead of every .o in the driver. If the count and the array
> disagree, you get a compile error on the spot.
>
Moving the arrays to icssg_stats.c (approach #2) adds extern
declarations, splits the definition from the static_assert, and is a
larger restructuring for the same safety guarantee. Keeping the arrays
in the header with a static_assert immediately after each one is a
2-line diff and leaves the code easy to read in one place.
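
For reference, a minimal userspace sketch of the "static_assert immediately
after each array" idea. The demo_* names, the descriptor layout and the
count of 3 are placeholders, not the driver's definitions, and ARRAY_SIZE()
is re-defined here only so the snippet builds outside the kernel:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE(). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical hand-maintained count, like the numeric #defines. */
#define DEMO_NUM_STATS 3

/* Hypothetical descriptor type; the real driver entries differ. */
struct demo_stat {
	const char *name;
	size_t offset;
};

static const struct demo_stat demo_all_stats[] = {
	{ "rx_packets", 0 },
	{ "rx_bytes", 4 },
	{ "tx_packets", 8 },
};

/* The build fails here if an entry is added without bumping the count. */
_Static_assert(ARRAY_SIZE(demo_all_stats) == DEMO_NUM_STATS,
	       "DEMO_NUM_STATS is out of sync with demo_all_stats[]");

/* Number of descriptors, for callers that size buffers from the table. */
static inline size_t demo_num_stats(void)
{
	return ARRAY_SIZE(demo_all_stats);
}
```

Adding a fourth entry to demo_all_stats[] without changing DEMO_NUM_STATS
turns the silent miscount into a compile error, which is the guarantee
being discussed.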
Please let me know if this sounds okay to you. I will send out a v2 soon
if this approach is fine with you.
> Probably worth keeping Vladimir on Cc for v2 in case he had something
> else in mind.
>
I will CC Vladimir in v2.
--
Thanks and Regards,
Danish
^ permalink raw reply
* Re: [PATCH 08/12] swap,iomap: simplify iomap_swapfile_iter
From: Christoph Hellwig @ 2026-05-13 6:56 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Christoph Hellwig, Andrew Morton, Chris Li, Kairui Song,
Christian Brauner, Jens Axboe, David Sterba, Theodore Ts'o,
Jaegeuk Kim, Chao Yu, Trond Myklebust, Anna Schumaker,
Namjae Jeon, Hyunchul Lee, Steve French, Paulo Alcantara,
Carlos Maiolino, Damien Le Moal, Naohiro Aota, linux-xfs,
linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260512170204.GI9555@frogsfrogsfrogs>
On Tue, May 12, 2026 at 10:02:04AM -0700, Darrick J. Wong wrote:
> OH. Now I remember why -- it's to handle contiguous mixed mappings
> better.
>
> Let's say that you have a 1k fsblock filesystem and 4k base pages. You
> fallocate an 8G swap file and then mkswap it. The first mapping is a 1k
> written mapping at offset 0 for the swap header, followed by an 8388607k
> unwritten mapping at offset 3k.
>
> The PAGE_SIZE rounding code in iomap_swapfile_add_extent will round the
> end of that first mapping down to zero and ignore it. The second
> mapping will be treated as if it were a 8388604k mapping starting at
> offset 4096. Now the page counts are wrong and the swapon fails.
Do we care about this use case? I guess you did, as you implemented
this, but still?
>
> A more generic solution to this would be to change add_swap_extent to
> take sector_t addr and length values and use them to construct a bitmap
> representing contiguous physical space on the bdev, accounting of course
> for PAGE_SIZE alignment. Except for the swap header page, every other
> contiguously set page-aligned region in the bitmap gets added to the
> swap extent map.
You don't even need a bitmap, just do basically the same checks as
the iomap code when moving to a new swap extent after moving to use
the sector_t. And it really should anyway, as the current abuse of
sector_t to store a disk offset in PAGE_SIZE units is pretty gross.
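
The page rounding from the quoted example can be sketched in userspace as
below. This is only an illustration of the arithmetic, not the
iomap_swapfile_add_extent() code; demo_page_align() and DEMO_PAGE_SIZE are
made-up names, and the byte offsets are taken from the example as quoted:

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL /* assumed 4k base pages, as in the example */

struct demo_extent {
	uint64_t start; /* byte offset of the first fully covered page */
	uint64_t len;   /* bytes of fully covered pages, 0 if none */
};

/*
 * Shrink a byte range to the whole pages it fully covers: round the start
 * up and the end down. A range smaller than one page ends up with len 0.
 */
static inline struct demo_extent demo_page_align(uint64_t start, uint64_t len)
{
	uint64_t first = (start + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE *
			 DEMO_PAGE_SIZE;
	uint64_t last = (start + len) / DEMO_PAGE_SIZE * DEMO_PAGE_SIZE;
	struct demo_extent out = { first, last > first ? last - first : 0 };

	return out;
}
```

With these inputs, demo_page_align(0, 1024) has length 0, so the 1k header
mapping is dropped, and demo_page_align(3072, 8388607ULL * 1024) starts at
4096 with a length of 8388604k. That is one 4k page short of the 8G file,
which is the page count mismatch described in the example.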
^ permalink raw reply
* Re: [PATCH v12 02/11] lib: kstrtox: add kstrtoudec64() and kstrtodec64()
From: Rodrigo Alencar @ 2026-05-13 7:14 UTC (permalink / raw)
To: Andy Shevchenko, Rodrigo Alencar
Cc: Andy Shevchenko, Jonathan Cameron, Rodrigo Alencar via B4 Relay,
rodrigo.alencar, linux-kernel, linux-iio, devicetree, linux-doc,
David Lechner, Andy Shevchenko, Lars-Peter Clausen,
Michael Hennerich, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Jonathan Corbet, Andrew Morton, Petr Mladek, Steven Rostedt,
Rasmus Villemoes, Sergey Senozhatsky, Shuah Khan, David Laight
In-Reply-To: <agOKq0iH2CHQ3TIg@ashevche-desk.local>
On 26/05/12 11:16PM, Andy Shevchenko wrote:
> On Tue, May 12, 2026 at 08:39:21PM +0100, Rodrigo Alencar wrote:
> > On 26/05/12 10:08PM, Andy Shevchenko wrote:
> > > On Tue, May 12, 2026 at 07:15:17PM +0100, Rodrigo Alencar wrote:
> > > > On 26/05/12 08:46PM, Andy Shevchenko wrote:
> > > > > On Tue, May 12, 2026 at 06:26:12PM +0100, Rodrigo Alencar wrote:
> > > > > > On 26/05/12 08:13PM, Andy Shevchenko wrote:
> > > > > > > On Tue, May 12, 2026 at 05:35:59PM +0100, Rodrigo Alencar wrote:
> > > > > > > > On 26/05/12 06:21PM, Andy Shevchenko wrote:
> > > > > > > > > On Tue, May 12, 2026 at 6:11 PM Rodrigo Alencar
> > > > > > > > > <455.rodrigo.alencar@gmail.com> wrote:
> > > > > > > > > > On 26/05/12 05:43PM, Andy Shevchenko wrote:
> > > > > > > > > > > On Tue, May 12, 2026 at 03:12:24PM +0100, Rodrigo Alencar wrote:
> > > > > > > > > > > > On 26/05/12 04:48PM, Andy Shevchenko wrote:
> > > > > > > > > > > > > On Tue, May 12, 2026 at 02:21:14PM +0100, Rodrigo Alencar wrote:
> > > > > > > > > > > > > > On 26/05/12 04:12PM, Andy Shevchenko wrote:
> > > > > > > > > > > > > > > On Tue, May 12, 2026 at 12:39:53PM +0100, Jonathan Cameron wrote:
> > > > > > > > > > > > > > > > On Sun, 10 May 2026 13:42:20 +0100
> > > > > > > > > > > > > > > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Add helpers that parses decimal numbers into 64-bit number, i.e., decimal
> > > > > > > > > > > > > > > > > point numbers with pre-defined scale are parsed into a 64-bit value (fixed
> > > > > > > > > > > > > > > > > precision). After the decimal point, digits beyond the specified scale
> > > > > > > > > > > > > > > > > are ignored.
>
> ...
>
> > > > > > > > > > I think we are going in circles here and we could look at the code instead:
> > > > > > > > > > - integer parsing with _parse_integer()
> > > > > > > > > > - overflow check and validation of the return value
> > > > > > > > > > - fractional parsing with _parse_integer_limit()
> > > > > > > > > > - overflow check and validation of the return value
> > > > > > > > >
> > > > > > > > > No, this is not fully true. That's what my whole point is about. The
> > > > > > > > > max_chars parameter limits the input check, then it skips an arbitrary
> > > > > > > > > number of digits and only *then* it checks for \n and \0. What will be
> > > > > > > > > the result of the
> > > > > > > > > 0.00000000000000000000000000000000423 in your case? Whatever scale you
> > > > > > > > > gave it will return 0 without checking on how many digits were
> > > > > > > > > supplied.
> > > > > > > >
> > > > > > > > I suppose that is a valid input and 0 is the expected result there.
> > > > > > > >
> > > > > > > > > All the same for 0.9999999999999999999999999999999000423. My
> > > > > > > > > point is that we should limit this by 19 digits.
> > > > > > > >
> > > > > > > > why we need to limit by 19? Digits beyond the scale carry no value...
> > > > > > >
> > > > > > > ...only if they are all 0:s.
> > > > > >
> > > > > > I thought your concern was on input length.
> > > > >
> > > > > One of, since I think you rose the topic of leading 0:s for integers and
> > > > > I agreed with that which makes sense to have mirrored in fractional part.
> > > > >
> > > > > > > > just like leading zeros to the integer part (which is also accepted by
> > > > > > > > kstrtoull() when parsing with base 10). Not sure why this is invalid input.
> > > > > > >
> > > > > > > See above. I agree on truncating trailing 0:s as it's done for leading ones
> > > > > > > in integer part, but if any of the digit behind 19th is not 0, it's an overflow
> > > > > > > condition (or bad input, depending how strict the rules are).
> > > > > >
> > > > > > stating in the documentation that digits beyond the scale are ignored is not
> > > > > > enough?
> > > > >
> > > > > It's in case we are not for kstrto*() family. My understanding that kstrto*()
> > > > > use strict rules on the input in overflow check.
> > > > >
> > > > > > > > > On top of that, what about -0.9(19 times) ? the fraction should be u64
> > > > > > > > > in this case and it's fine. The sign applies to the combined value.
> > > > > > > >
> > > > > > > > yes, range for signed values are verified later.
> > > > > > >
> > > > > > > > > > - extra scaling and truncation happening outside if needed.
> > > > > > > > >
> > > > > > > > > Right, but the given input may be way too long and still needs more validation.
> > > > > > > >
> > > > > > > > What is the problem with a long input of digits?
> > > > > > > > C compiler does not complain about this when parsing a float value,
> > > > > > > > python does not
> > > > > > > > complain about this when parsing floats or decimals either.
> > > > > > >
> > > > > > > Because there is an exponent limit and for double it's something like 1e307
> > > > > > > IIRC, meaning, try 1024 digits to be sure.
> > > > > > >
> > > > > > > Python most likely uses the library for big numbers, you can't compare it at all with this.
> > > > > >
> > > > > > You would be fine if the truncation loop:
> > > > > >
> > > > > > while (isdigit(*s)) /* truncate */
> > > > > > s++;
> > > > > >
> > > > > > is bounded by (19-scale) iteration count? or it should keep iterating if those are zero?
> > > > >
> > > > > Ideally both.
> > > > >
> > > > > We don't care about the digits in the range of 19-scale and skip all 0:s after
> > > > > that.
> > > > >
> > > > > /* truncate unrequired digits within type limit, i.e. 19 decimal digits */
> > > > > while (isdigit(*s) && "(s - pos_of_dot) is less than 19")
> > > > > s++;
> > > > > while (*s == '0') /* truncate trailing 0:s, it's not a bad input nor overflow */
> > > > > s++;
> > > >
> > > > We could have agreed on something like that since the beginning!
> > >
> > > Yes, but who knew that we go to have this agreement?
> > >
> > > > And I think that changing the logic to something like this would not change a
> > > > thing on the kind of inputs we expect, it will just complicate the code.
> > > > I suppose that kind of kstrto*() rules were never stated anywhere.
> > > >
> > > > |> 20th digit
> > > > Also, 0.00000000000000000001 still sounds like a valid decimal number to me, even
> > > > though it is going to be parsed as 0!
> > >
> > > Hmm... It would mean that testing for 19th/20th digits is not enough... :-(
> > >
> > > > >
> > > > > // Now if it's not \0 nor \n and
> > > > > // a) still a digit consider either overflow or bad input,
> > > > > // b) if not a digit, consider as bad input.
> > > > >
> > > > > In a) I tend to be on par with the other k*() and consider that as overflow.
> > > > >
> > > > > > is that the only concern? Again, the usage of _parse_integer_limit(s, 10, &_frac, scale)
> > > > > > avoids a 64-bit division when checking the rv.
> > > > >
> > > > > I'm not against usage of _parse_integer_limit(), I'm for stricter rules on the input.
> > > > > With the above addressed, I have no more concerns.
> > > >
> > > > Thanks! I will proceed with the requested adjustments.
> > >
> > > But it seems it's not enough as you pointed out!
> > >
> > > So the biggest fraction we may consume in 64-bit (unsigned) value is
> > > 0.18446744073709551615. If we go with one digit less, the whole value
> > > can be
> > >
> > > In [3]: hex(9999999999999999999)
> > > Out[3]: '0x8ac7230489e7ffff'
> > >
> > > So, I don't know how we are supposed to represent values between
> > > -0.9223372036854775808
> > > -0.9999999999999999999
> > > in a signed type as they have bit 63 set.
> > >
> > > The easiest way out is to limit scale to 18 (but still accept 19th digit, and
> > > with check for overflow even 20th up to 0.18446744073709551615). This will need
> > > to run _parse_integer_limit() twice (with given scale and with 20).
> > >
> > > Can you add the respective test cases and see what is currently going on with
> > > them?
> >
> > I can add test cases, but for the signed case the situation is:
> >
> > scale = 0
> > max = 9223372036854775807, min = -9223372036854775808
> > scale = 1
> > max = 922337203685477580.7, min = -922337203685477580.8
> > scale = 2
> > max = 92233720368547758.07, min = -92233720368547758.08
> > ...
> > scale = 18
> > max = 9.223372036854775807, min = -9.223372036854775808
> > scale = 19
> > max = 0.9223372036854775807, min = -0.9223372036854775808
> >
> > anything outside those ranges will give you -ERANGE. Then it depends on the scale used.
>
> Oh, I only now realised that this is sliding window for a single 64-bit signed value!
> I was under impression that you wanted implementation that covers 128-bit signed value
> (with 64 + 64)...
So that was the initial approach with strntoull() with integer and fractional parts
combined in iio core. At that time I realized that we ended up combining them anyways
with:
val64 = (u64)val * MICRO + val2
so why not have val64 already! And all this made me realise that once leading 0s are ok,
scale can be even bigger, e.g.
scale = 20
max = 0.09223372036854775807, min = -0.09223372036854775808
scale = 21
max = 0.009223372036854775807, min = -0.009223372036854775808
It might be a sliding window of 19 digits, but here we trade range for scale, precision
is still fixed at 64-bit. I have a new idea to make things simpler; actually
it would go back to what David pointed out in the past. Let me put this together...
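
To make the range/scale trade-off concrete, here is a small userspace
sketch that renders a raw 64-bit value at a given scale. It is not the
proposed kstrtodec64() code; demo_dec64_to_str() is a hypothetical helper
that only reproduces the max/min table above:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Render a raw signed 64-bit fixed-point value with `scale` fractional
 * digits as a decimal string. Returns 0 on success, -1 if buf is too small.
 */
static inline int demo_dec64_to_str(int64_t raw, unsigned int scale,
				    char *buf, size_t len)
{
	char digits[32];
	/* Unsigned magnitude, so INT64_MIN does not overflow on negation. */
	uint64_t mag = raw < 0 ? 0 - (uint64_t)raw : (uint64_t)raw;
	size_t ndigits = (size_t)snprintf(digits, sizeof(digits),
					  "%" PRIu64, mag);
	size_t int_len = ndigits > scale ? ndigits - scale : 0;
	/* Zeros between the decimal point and the first stored digit. */
	size_t pad = scale > ndigits ? scale - ndigits : 0;
	size_t need = (raw < 0 ? 1 : 0) + (int_len ? int_len : 1) +
		      (scale ? 1 + scale : 0) + 1;
	char *p = buf;

	if (need > len)
		return -1;
	if (raw < 0)
		*p++ = '-';
	if (int_len) {
		memcpy(p, digits, int_len);
		p += int_len;
	} else {
		*p++ = '0';
	}
	if (scale) {
		*p++ = '.';
		memset(p, '0', pad);
		p += pad;
		memcpy(p, digits + int_len, ndigits - int_len);
		p += ndigits - int_len;
	}
	*p = '\0';
	return 0;
}
```

Passing INT64_MAX with scale 19 gives 0.9223372036854775807 and with scale
21 gives 0.009223372036854775807: the same 19 significant digits, shifted
further behind the decimal point as the scale grows.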
> > I am not representing -0.9999999999999999999 as is. The desired scale will have this
> > truncated. It may be -0.9999 or -0.999999 or -0.9. And this is practical for a
> > reasonable scale value... for pico and femto precision you still get a decent range.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
--
Kind regards,
Rodrigo Alencar
^ permalink raw reply
* [PATCH v10 0/4] kunit: Add support for suppressing warning backtraces
From: Albert Esteve @ 2026-05-13 7:30 UTC (permalink / raw)
To: Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Jonathan Corbet, Shuah Khan, Andrew Morton,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti
Cc: linux-kernel, linux-arch, linux-kselftest, kunit-dev, dri-devel,
workflows, linux-riscv, linux-doc, peterz, Alessandro Carminati,
Guenter Roeck, Kees Cook, Albert Esteve,
Linux Kernel Functional Testing, Maíra Canal, Dan Carpenter,
Kees Cook, Simona Vetter
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons:
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address the problem would be to add messages such as
"expected warning backtraces start/end here" to the kernel log.
However, that would again require filter scripts, might result in
missing real problematic warning backtraces triggered while the test
is running, and the irrelevant backtrace(s) would still clog the
kernel log.
Solve the problem by providing a means to suppress warning backtraces
originating from the current kthread while executing test code.
Since each KUnit test runs in its own kthread, this effectively scopes
suppression to the test that enabled it, without requiring any
architecture-specific code.
Overview:
Patch#1 Introduces the suppression infrastructure integrated into
KUnit's hook mechanism.
Patch#2 Adds selftests to validate the functionality.
Patch#3 Demonstrates real-world usage in the DRM subsystem.
Patch#4 Documents the new API and usage guidelines.
Design Notes:
Suppression is integrated into the existing KUnit hooks infrastructure,
reusing the kunit_running static branch for zero overhead
when no tests are running. The implementation lives entirely in the
kunit module; only a static-inline wrapper and a function pointer
slot are added to built-in code.
Suppression is checked at three points in the warning path:
- In `warn_slowpath_fmt()` (kernel/panic.c), for architectures without
__WARN_FLAGS. The check runs before any output, fully suppressing
both message and backtrace.
- In `__warn_printk()` (kernel/panic.c), for architectures that define
__WARN_FLAGS but not their own __WARN_printf (arm64, loongarch,
parisc, powerpc, riscv, sh). The check suppresses the warning message
text that is printed before the trap enters __report_bug().
- In `__report_bug()` (lib/bug.c), for architectures that define
__WARN_FLAGS. The check runs before `__warn()` is called, suppressing
the backtrace and stack dump.
To avoid double-counting on architectures where both `__warn_printk()`
and `__report_bug()` run for the same warning, the hook takes a bool
parameter: true to increment the suppression counter, false to suppress
without counting.
The suppression state is dynamically allocated via kunit_kzalloc() and
tied to the KUnit test lifecycle via `kunit_add_action()`, ensuring
automatic cleanup at test exit. Writer-side access to the global
suppression list is serialized with a spinlock; readers use RCU.
Two API forms are provided:
- kunit_warning_suppress(test) { ... }: scoped blocks with automatic
cleanup. The suppression handle is not accessible outside the block,
so warning counts (if needed) must be checked inside. Multiple
suppression blocks are allowed.
- kunit_start/end_suppress_warning(test): direct functions that return
an explicit handle. Use when the handle needs to be retained, or passed
across helpers. Multiple suppression blocks are allowed.
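
For readers unfamiliar with the scoped form, the general pattern (a
one-iteration for loop whose variable carries a cleanup attribute) can be
sketched in userspace as follows. This is a hypothetical analog only: the
demo_* names are not the KUnit API, and the global counters stand in for
the per-task state used by the series. It relies on the GCC/Clang
__attribute__((cleanup)) extension:

```c
#include <stdio.h>

/* Hypothetical userspace analog; none of these names are the KUnit API. */
static int demo_suppress_depth;   /* > 0 while a scope is active */
static int demo_suppressed_count; /* warnings swallowed so far */

struct demo_scope {
	int done;
};

static inline struct demo_scope demo_scope_enter(void)
{
	struct demo_scope s = { 0 };

	demo_suppress_depth++;
	return s;
}

/* Called automatically when the scope variable goes out of scope. */
static inline void demo_scope_exit(struct demo_scope *s)
{
	(void)s;
	demo_suppress_depth--;
}

/* The body runs exactly once; cleanup runs on any exit from the block. */
#define demo_warning_suppress()                                            \
	for (struct demo_scope _s __attribute__((cleanup(demo_scope_exit))) \
		     = demo_scope_enter();                                  \
	     !_s.done; _s.done = 1)

/* Returns 1 if the warning was printed, 0 if it was suppressed. */
static inline int demo_warn(const char *msg)
{
	if (demo_suppress_depth > 0) {
		demo_suppressed_count++;
		return 0;
	}
	fprintf(stderr, "WARNING: %s\n", msg);
	return 1;
}
```

A call such as demo_warn("expected") inside a demo_warning_suppress() { ... }
block is counted and not printed, while the same call after the block
prints again. An early return or break out of the block still runs the
cleanup function.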
This series is based on the RFC patch and subsequent discussion at
https://patchwork.kernel.org/project/linux-kselftest/patch/02546e59-1afe-4b08-ba81-d94f3b691c9a@moroto.mountain/
and offers a more comprehensive solution to the problem discussed there.
Changes since RFC:
- Introduced CONFIG_KUNIT_SUPPRESS_BACKTRACE
- Minor cleanups and bug fixes
- Added support for all affected architectures
- Added support for counting suppressed warnings
- Added unit tests using those counters
- Added patch to suppress warning backtraces in dev_addr_lists tests
Changes since v1:
- Rebased to v6.9-rc1
- Added Tested-by:, Acked-by:, and Reviewed-by: tags
[I retained those tags since there have been no functional changes]
- Introduced KUNIT_SUPPRESS_BACKTRACE configuration option, enabled by
default.
Changes since v2:
- Rebased to v6.9-rc2
- Added comments to drm warning suppression explaining why it is needed.
- Added patch to move conditional code in arch/sh/include/asm/bug.h
to avoid kerneldoc warning
- Added architecture maintainers to Cc: for architecture specific patches
- No functional changes
Changes since v3:
- Rebased to v6.14-rc6
- Dropped net: "kunit: Suppress lock warning noise at end of dev_addr_lists tests"
since 3db3b62955cd6d73afde05a17d7e8e106695c3b9
- Added __kunit_ and KUNIT_ prefixes.
- Tested on interested architectures.
Changes since v4:
- Rebased to v6.15-rc7
- Dropped all code in __report_bug()
- Moved all checks in WARN*() macros.
- Dropped all architecture specific code.
- Made __kunit_is_suppressed_warning nice to noinstr functions.
Changes since v5:
- Rebased to v7.0-rc3
- Added RCU protection for the suppressed warnings list.
- Added static key and branching optimization.
- Removed custom `strcmp` implementation and reworked
__kunit_is_suppressed_warning() entrypoint function.
Changes since v6:
- Moved suppression checks from WARN*() macros to warn_slowpath_fmt()
and __report_bug().
- Replaced stack-allocated suppression struct with kunit_kzalloc() heap
allocation tied to the KUnit test lifecycle.
- Changed suppression strategy from function-name matching to task-scoped:
all warnings on the current task are suppressed between START and END,
rather than only warnings originating from a specific named function.
- Simplified macro API: removed KUNIT_DECLARE_SUPPRESSED_WARNING(),
the START macro now takes (test) and handles allocation internally.
- Removed static key and branching optimization, as by the time it
was executed, callers are already in warn slowpaths.
- Link to v6: https://lore.kernel.org/r/20260317-kunit_add_support-v6-0-dd22aeb3fe5d@redhat.com
Changes since v7:
- Integrated suppression into existing KUnit hooks infrastructure
- Removed CONFIG_KUNIT_SUPPRESS_BACKTRACE
- Added suppression check in __warn_printk()
- Added spinlock for writer-side RCU protection
- Replaced explicit rcu_read_lock/unlock with guard(rcu)()
- Added scoped API (kunit_warning_suppress) using __cleanup attribute
- Updated DRM patch to use scoped API
- Expanded self-tests: incremental counting, cross-kthread isolation
- Rewrote documentation covering all three API forms with examples
- Link to v7: https://lore.kernel.org/r/20260420-kunit_add_support-v7-0-e8bc6e0f70de@redhat.com
Changes since v8:
- Rebased to v7.1-rc2
- Remove KUNIT_START/END_SUPPRESSED_WARNING() macros
- Add KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT checks to drm tests
- Link to v8: https://lore.kernel.org/r/20260504-kunit_add_support-v8-0-3e5957cdd235@redhat.com
Changes since v9:
- Fix silent false-pass when kunit_start_suppress_warning() returns NULL
- Fix RCU lockdep splat for kunit_is_suppressed_warning() calls
- Move disable_trace_on_warning() in __report_bug()
- Make suppress counter atomic
- Mark helper warn functions in selftest as noinline
- Add kunit_skip() for CONFIG_BUG=n in selftests
- Fix potentially uninitialized data.was_active in kthread selftest
- Add kthread_stop() in kthread selftest early exit
- Initialize scaling_factor to INT_MIN in DRM scaling tests
- Add include for bool in test-bug.h to fix CONFIG_KUNIT=n case
- Link to v9: https://lore.kernel.org/r/20260508-kunit_add_support-v9-0-99df7aa880f6@redhat.com
--
2.34.1
---
To: Brendan Higgins <brendan.higgins@linux.dev>
To: David Gow <david@davidgow.net>
To: Rae Moar <raemoar63@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Paul Walmsley <pjw@kernel.org>
To: Palmer Dabbelt <palmer@dabbelt.com>
To: Albert Ou <aou@eecs.berkeley.edu>
To: Alexandre Ghiti <alex@ghiti.fr>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
To: Maxime Ripard <mripard@kernel.org>
To: Thomas Zimmermann <tzimmermann@suse.de>
To: David Airlie <airlied@gmail.com>
To: Simona Vetter <simona@ffwll.ch>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Cc: linux-riscv@lists.infradead.org
Cc: dri-devel@lists.freedesktop.org
Cc: workflows@vger.kernel.org
Cc: linux-doc@vger.kernel.org
---
Alessandro Carminati (1):
bug/kunit: Core support for suppressing warning backtraces
Guenter Roeck (3):
kunit: Add backtrace suppression self-tests
drm: Suppress intentional warning backtraces in scaling unit tests
kunit: Add documentation for warning backtrace suppression API
Documentation/dev-tools/kunit/usage.rst | 46 +++++++-
drivers/gpu/drm/tests/drm_rect_test.c | 32 +++++-
include/kunit/test-bug.h | 26 +++++
include/kunit/test.h | 98 ++++++++++++++++
kernel/panic.c | 11 ++
lib/bug.c | 14 ++-
lib/kunit/Makefile | 4 +-
lib/kunit/backtrace-suppression-test.c | 196 ++++++++++++++++++++++++++++++++
lib/kunit/bug.c | 119 +++++++++++++++++++
lib/kunit/hooks-impl.h | 2 +
10 files changed, 538 insertions(+), 10 deletions(-)
---
base-commit: 74fe02ce122a6103f207d29fafc8b3a53de6abaf
change-id: 20260312-kunit_add_support-2f35806b19dd
Best regards,
--
Albert Esteve <aesteve@redhat.com>
* [PATCH v10 1/4] bug/kunit: Core support for suppressing warning backtraces
From: Albert Esteve @ 2026-05-13 7:30 UTC (permalink / raw)
To: Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Jonathan Corbet, Shuah Khan, Andrew Morton,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti
Cc: linux-kernel, linux-arch, linux-kselftest, kunit-dev, dri-devel,
workflows, linux-riscv, linux-doc, peterz, Alessandro Carminati,
Guenter Roeck, Kees Cook, Albert Esteve
In-Reply-To: <20260513-kunit_add_support-v10-0-e379d206c8cd@redhat.com>
From: Alessandro Carminati <acarmina@redhat.com>
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons:
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
Solve the problem by providing a means to suppress warning backtraces
originating from the current kthread while executing test code. Since
each KUnit test runs in its own kthread, this effectively scopes
suppression to the test that enabled it. Limit changes to generic code
to the absolute minimum.
Implementation details:
Suppression is integrated into the existing KUnit hooks infrastructure
in test-bug.h, reusing the kunit_running static branch for zero
overhead when no tests are running.
Suppression is checked at three points in the warning path:
- In warn_slowpath_fmt(), the check runs before any output, fully
suppressing both message and backtrace. This covers architectures
without __WARN_FLAGS.
- In __warn_printk(), the check suppresses the warning message text.
This covers architectures that define __WARN_FLAGS but not their own
__WARN_printf (arm64, loongarch, parisc, powerpc, riscv, sh), where
the message is printed before the trap enters __report_bug().
- In __report_bug(), the check runs before __warn() is called,
suppressing the backtrace and stack dump.
To avoid double-counting on architectures where both __warn_printk()
and __report_bug() run for the same warning, kunit_is_suppressed_warning()
takes a bool parameter: true to increment the suppression counter
(used in warn_slowpath_fmt and __report_bug), false to check only
(used in __warn_printk).
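The counting rule above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: the struct, the single-entry "list", the integer task id, and every function name below are hypothetical stand-ins for the real RCU list, task_struct pointer, and atomic counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one active suppression entry. */
struct suppression {
	int task_id;	/* models w->task */
	int counter;	/* models the atomic w->counter */
};

static struct suppression *active;	/* models the list, one entry only */
static int current_task_id = 1;		/* models `current` */

/* Same shape as the real check: match on task, optionally count. */
static bool is_suppressed_warning(bool count)
{
	if (!active || active->task_id != current_task_id)
		return false;
	if (count)
		active->counter++;
	return true;
}

/* Secondary suppression point: hides the message, does not count. */
static bool model_warn_printk(void)
{
	return is_suppressed_warning(false);
}

/* Primary suppression point: hides the backtrace and counts once. */
static bool model_report_bug(void)
{
	return is_suppressed_warning(true);
}

/* One warning traversing both call sites; returns the resulting counter. */
static int model_one_warning(struct suppression *s)
{
	active = s;
	model_warn_printk();
	model_report_bug();
	active = NULL;
	return s->counter;
}
```

In this model each warning passes through two check points but bumps the counter only once, and an entry owned by a different task is never matched.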
The suppression state is dynamically allocated via kunit_kzalloc() and
tied to the KUnit test lifecycle via kunit_add_action(), ensuring
automatic cleanup at test exit. Writer-side access to the global
suppression list is serialized with a spinlock; readers use RCU.
Two API forms are provided:
- kunit_warning_suppress(test) { ... }: scoped, uses __cleanup for
automatic teardown on scope exit, kunit_add_action() as safety net
for abnormal exits (e.g. kthread_exit from failed assertions).
Suppression handle is only accessible inside the block.
- kunit_start/end_suppress_warning(test): direct functions returning
an explicit handle, for retaining the handle within the test,
or for cross-function usage.
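The scoped form relies on a one-iteration for loop whose loop variable carries a cleanup handler. A minimal userspace sketch of that pattern follows, assuming a compiler that supports the GCC/Clang cleanup attribute; the names scope_handle, scope_start(), scope_end(), and scoped() are hypothetical and only mirror the shape of the real macro.

```c
#include <stddef.h>

struct scope_handle {
	int active;	/* 1 while the scope is open */
	int ends;	/* how many times the end function ran */
};

static struct scope_handle *scope_start(struct scope_handle *h)
{
	h->active = 1;
	return h;
}

static void scope_end(struct scope_handle *h)
{
	if (!h)
		return;
	h->active = 0;
	h->ends++;
}

/* Runs when the loop variable goes out of scope, e.g. on break or return. */
static void scope_auto_cleanup(struct scope_handle **hp)
{
	if (*hp)
		scope_end(*hp);
}

/*
 * One-iteration for loop: on normal exit the increment clause ends the
 * scope and clears the handle, so the cleanup handler then sees NULL.
 */
#define scoped(h)							\
	for (struct scope_handle *scope_it_				\
		__attribute__((cleanup(scope_auto_cleanup))) =		\
		scope_start(h);						\
	     scope_it_;							\
	     scope_end(scope_it_), scope_it_ = NULL)

/* Normal exit: the body runs once and falls off the end of the block. */
static int run_normal(struct scope_handle *h)
{
	int seen_active = 0;

	scoped(h) {
		seen_active = h->active;
	}
	return seen_active;
}

/* Early exit: return skips the increment clause, cleanup still fires. */
static int run_early_return(struct scope_handle *h)
{
	scoped(h) {
		return h->active;
	}
	return -1;
}
```

Either way the end function runs exactly once per scope. The third path described above (kthread_exit() from a failed assertion) has no userspace equivalent here; in the patch it is covered by kunit_add_action().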
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
include/kunit/test-bug.h | 26 +++++++++++
include/kunit/test.h | 98 ++++++++++++++++++++++++++++++++++++++
kernel/panic.c | 11 +++++
lib/bug.c | 14 +++++-
lib/kunit/Makefile | 3 +-
lib/kunit/bug.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++
lib/kunit/hooks-impl.h | 2 +
7 files changed, 270 insertions(+), 3 deletions(-)
diff --git a/include/kunit/test-bug.h b/include/kunit/test-bug.h
index 47aa8f21ccce8..99869029fc686 100644
--- a/include/kunit/test-bug.h
+++ b/include/kunit/test-bug.h
@@ -10,6 +10,7 @@
#define _KUNIT_TEST_BUG_H
#include <linux/stddef.h> /* for NULL */
+#include <linux/types.h> /* for bool */
#if IS_ENABLED(CONFIG_KUNIT)
@@ -23,6 +24,7 @@ DECLARE_STATIC_KEY_FALSE(kunit_running);
extern struct kunit_hooks_table {
__printf(3, 4) void (*fail_current_test)(const char*, int, const char*, ...);
void *(*get_static_stub_address)(struct kunit *test, void *real_fn_addr);
+ bool (*is_suppressed_warning)(bool count);
} kunit_hooks;
/**
@@ -60,9 +62,33 @@ static inline struct kunit *kunit_get_current_test(void)
} \
} while (0)
+/**
+ * kunit_is_suppressed_warning() - Check if warnings are being suppressed
+ * by the current KUnit test.
+ * @count: if true, increment the suppression counter on match.
+ *
+ * Returns true if the current task has active warning suppression.
+ * Uses the kunit_running static branch for zero overhead when no tests run.
+ *
+ * A single WARN*() may traverse multiple call sites in the warning path
+ * (e.g., __warn_printk() and __report_bug()). Pass @count = true at the
+ * primary suppression point to count each warning exactly once, and
+ * @count = false at secondary points to suppress output without
+ * inflating the count.
+ */
+static inline bool kunit_is_suppressed_warning(bool count)
+{
+ if (!static_branch_unlikely(&kunit_running))
+ return false;
+
+ return kunit_hooks.is_suppressed_warning &&
+ kunit_hooks.is_suppressed_warning(count);
+}
+
#else
static inline struct kunit *kunit_get_current_test(void) { return NULL; }
+static inline bool kunit_is_suppressed_warning(bool count) { return false; }
#define kunit_fail_current_test(fmt, ...) do {} while (0)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 9cd1594ab697d..be71612f61655 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -1795,4 +1795,102 @@ do { \
// include resource.h themselves if they need it.
#include <kunit/resource.h>
+/*
+ * Warning backtrace suppression API.
+ *
+ * Suppresses WARN*() backtraces on the current task while active. Two forms
+ * are provided:
+ *
+ * - Scoped: kunit_warning_suppress(test) { ... }
+ * Suppression is active for the duration of the block. On normal exit,
+ * the for-loop increment deactivates suppression. On early exit (break,
+ * return, goto), the __cleanup attribute fires. On kthread_exit() (e.g.,
+ * a failed KUnit assertion), kunit_add_action() cleans up at test
+ * teardown. The suppression handle is only accessible inside the block,
+ * so warning counts must be checked before the block exits.
+ *
+ * - Direct: kunit_start_suppress_warning() / kunit_end_suppress_warning()
+ * The underlying functions, returning an explicit handle pointer. Use
+ * when the handle needs to be retained (e.g., for post-suppression
+ * count checks) or passed across helper functions.
+ */
+struct kunit_suppressed_warning;
+
+struct kunit_suppressed_warning *
+kunit_start_suppress_warning(struct kunit *test);
+void kunit_end_suppress_warning(struct kunit *test,
+ struct kunit_suppressed_warning *w);
+int kunit_suppressed_warning_count(struct kunit_suppressed_warning *w);
+void __kunit_suppress_auto_cleanup(struct kunit_suppressed_warning **wp);
+bool kunit_has_active_suppress_warning(void);
+
+/**
+ * kunit_warning_suppress() - Suppress WARN*() backtraces for the duration
+ * of a block.
+ * @test: The test context object.
+ *
+ * Scoped form of the suppression API. Suppression starts when the block is
+ * entered and ends automatically when the block exits through any path. See
+ * the section comment above for the cleanup guarantees on each exit path.
+ * Fails the test if suppression is already active; nesting is not supported.
+ *
+ * The warning count can be checked inside the block via
+ * KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(). The handle is not accessible
+ * after the block exits.
+ *
+ * Example::
+ *
+ * kunit_warning_suppress(test) {
+ * trigger_warning();
+ * KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 1);
+ * }
+ */
+#define kunit_warning_suppress(test) \
+ for (struct kunit_suppressed_warning *__kunit_suppress \
+ __cleanup(__kunit_suppress_auto_cleanup) = \
+ kunit_start_suppress_warning(test); \
+ __kunit_suppress; \
+ kunit_end_suppress_warning(test, __kunit_suppress), \
+ __kunit_suppress = NULL)
+
+/**
+ * KUNIT_SUPPRESSED_WARNING_COUNT() - Returns the suppressed warning count.
+ *
+ * Returns the number of WARN*() calls suppressed since the current
+ * suppression block started, or 0 if the handle is NULL. Usable inside a
+ * kunit_warning_suppress() block.
+ */
+#define KUNIT_SUPPRESSED_WARNING_COUNT() \
+ kunit_suppressed_warning_count(__kunit_suppress)
+
+/**
+ * KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT() - Sets an expectation that the
+ * suppressed warning count equals
+ * @expected.
+ * @test: The test context object.
+ * @expected: an expression that evaluates to the expected warning count.
+ *
+ * Sets an expectation that the number of suppressed WARN*() calls equals
+ * @expected. This is semantically equivalent to
+ * KUNIT_EXPECT_EQ(@test, KUNIT_SUPPRESSED_WARNING_COUNT(), @expected).
+ * See KUNIT_EXPECT_EQ() for more information.
+ */
+#define KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, expected) \
+ KUNIT_EXPECT_EQ(test, KUNIT_SUPPRESSED_WARNING_COUNT(), expected)
+
+/**
+ * KUNIT_ASSERT_SUPPRESSED_WARNING_COUNT() - Sets an assertion that the
+ * suppressed warning count equals
+ * @expected.
+ * @test: The test context object.
+ * @expected: an expression that evaluates to the expected warning count.
+ *
+ * Sets an assertion that the number of suppressed WARN*() calls equals
+ * @expected. This is the same as KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(),
+ * except it causes an assertion failure (see KUNIT_ASSERT_TRUE()) when the
+ * assertion is not met.
+ */
+#define KUNIT_ASSERT_SUPPRESSED_WARNING_COUNT(test, expected) \
+ KUNIT_ASSERT_EQ(test, KUNIT_SUPPRESSED_WARNING_COUNT(), expected)
+
#endif /* _KUNIT_TEST_H */
diff --git a/kernel/panic.c b/kernel/panic.c
index 20feada5319d4..213725b612aa1 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -39,6 +39,7 @@
#include <linux/sys_info.h>
#include <trace/events/error_report.h>
#include <asm/sections.h>
+#include <kunit/test-bug.h>
#define PANIC_TIMER_STEP 100
#define PANIC_BLINK_SPD 18
@@ -1124,6 +1125,11 @@ void warn_slowpath_fmt(const char *file, int line, unsigned taint,
bool rcu = warn_rcu_enter();
struct warn_args args;
+ if (kunit_is_suppressed_warning(true)) {
+ warn_rcu_exit(rcu);
+ return;
+ }
+
pr_warn(CUT_HERE);
if (!fmt) {
@@ -1146,6 +1152,11 @@ void __warn_printk(const char *fmt, ...)
bool rcu = warn_rcu_enter();
va_list args;
+ if (kunit_is_suppressed_warning(false)) {
+ warn_rcu_exit(rcu);
+ return;
+ }
+
pr_warn(CUT_HERE);
va_start(args, fmt);
diff --git a/lib/bug.c b/lib/bug.c
index 224f4cfa4aa31..d99e369bc1103 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -48,6 +48,7 @@
#include <linux/rculist.h>
#include <linux/ftrace.h>
#include <linux/context_tracking.h>
+#include <kunit/test-bug.h>
extern struct bug_entry __start___bug_table[], __stop___bug_table[];
@@ -209,8 +210,6 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
return BUG_TRAP_TYPE_NONE;
}
- disable_trace_on_warning();
-
bug_get_file_line(bug, &file, &line);
fmt = bug_get_format(bug);
@@ -220,6 +219,17 @@ static enum bug_trap_type __report_bug(struct bug_entry *bug, unsigned long buga
no_cut = bug->flags & BUGFLAG_NO_CUT_HERE;
has_args = bug->flags & BUGFLAG_ARGS;
+#ifdef CONFIG_KUNIT
+ /*
+ * Before the once logic so suppressed warnings do not consume
+ * the single-fire budget of WARN_ON_ONCE().
+ */
+ if (warning && kunit_is_suppressed_warning(true))
+ return BUG_TRAP_TYPE_WARN;
+#endif
+
+ disable_trace_on_warning();
+
if (warning && once) {
if (done)
return BUG_TRAP_TYPE_WARN;
diff --git a/lib/kunit/Makefile b/lib/kunit/Makefile
index 656f1fa35abcc..4592f9d0aa8dd 100644
--- a/lib/kunit/Makefile
+++ b/lib/kunit/Makefile
@@ -10,7 +10,8 @@ kunit-objs += test.o \
executor.o \
attributes.o \
device.o \
- platform.o
+ platform.o \
+ bug.o
ifeq ($(CONFIG_KUNIT_DEBUGFS),y)
kunit-objs += debugfs.o
diff --git a/lib/kunit/bug.c b/lib/kunit/bug.c
new file mode 100644
index 0000000000000..cdfcbfe80b5df
--- /dev/null
+++ b/lib/kunit/bug.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit helpers for backtrace suppression
+ *
+ * Copyright (C) 2025 Alessandro Carminati <acarmina@redhat.com>
+ * Copyright (C) 2024 Guenter Roeck <linux@roeck-us.net>
+ */
+
+#include <kunit/resource.h>
+#include <linux/export.h>
+#include <linux/rculist.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+
+#include "hooks-impl.h"
+
+struct kunit_suppressed_warning {
+ struct list_head node;
+ struct task_struct *task;
+ struct kunit *test;
+ atomic_t counter;
+};
+
+static LIST_HEAD(suppressed_warnings);
+static DEFINE_SPINLOCK(suppressed_warnings_lock);
+
+static void kunit_suppress_warning_remove(struct kunit_suppressed_warning *w)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&suppressed_warnings_lock, flags);
+ list_del_rcu(&w->node);
+ spin_unlock_irqrestore(&suppressed_warnings_lock, flags);
+ synchronize_rcu(); /* Wait for readers to finish */
+}
+
+KUNIT_DEFINE_ACTION_WRAPPER(kunit_suppress_warning_cleanup,
+ kunit_suppress_warning_remove,
+ struct kunit_suppressed_warning *);
+
+bool kunit_has_active_suppress_warning(void)
+{
+ return __kunit_is_suppressed_warning_impl(false);
+}
+EXPORT_SYMBOL_GPL(kunit_has_active_suppress_warning);
+
+struct kunit_suppressed_warning *
+kunit_start_suppress_warning(struct kunit *test)
+{
+ struct kunit_suppressed_warning *w;
+ unsigned long flags;
+ int ret;
+
+ if (kunit_has_active_suppress_warning()) {
+ KUNIT_FAIL(test, "Another suppression block is already active");
+ return NULL;
+ }
+
+ w = kunit_kzalloc(test, sizeof(*w), GFP_KERNEL);
+ if (!w) {
+ KUNIT_FAIL(test, "Failed to allocate suppression handle.");
+ return NULL;
+ }
+
+ w->task = current;
+ w->test = test;
+
+ spin_lock_irqsave(&suppressed_warnings_lock, flags);
+ list_add_rcu(&w->node, &suppressed_warnings);
+ spin_unlock_irqrestore(&suppressed_warnings_lock, flags);
+
+ ret = kunit_add_action_or_reset(test,
+ kunit_suppress_warning_cleanup, w);
+ if (ret) {
+ KUNIT_FAIL(test, "Failed to add suppression cleanup action.");
+ return NULL;
+ }
+
+ return w;
+}
+EXPORT_SYMBOL_GPL(kunit_start_suppress_warning);
+
+void kunit_end_suppress_warning(struct kunit *test,
+ struct kunit_suppressed_warning *w)
+{
+ if (!w)
+ return;
+ kunit_release_action(test, kunit_suppress_warning_cleanup, w);
+}
+EXPORT_SYMBOL_GPL(kunit_end_suppress_warning);
+
+void __kunit_suppress_auto_cleanup(struct kunit_suppressed_warning **wp)
+{
+ if (*wp)
+ kunit_end_suppress_warning((*wp)->test, *wp);
+}
+EXPORT_SYMBOL_GPL(__kunit_suppress_auto_cleanup);
+
+int kunit_suppressed_warning_count(struct kunit_suppressed_warning *w)
+{
+ return w ? atomic_read(&w->counter) : 0;
+}
+EXPORT_SYMBOL_GPL(kunit_suppressed_warning_count);
+
+bool __kunit_is_suppressed_warning_impl(bool count)
+{
+ struct kunit_suppressed_warning *w;
+
+ guard(rcu)();
+ list_for_each_entry_rcu(w, &suppressed_warnings, node) {
+ if (w->task == current) {
+ if (count)
+ atomic_inc(&w->counter);
+ return true;
+ }
+ }
+
+ return false;
+}
diff --git a/lib/kunit/hooks-impl.h b/lib/kunit/hooks-impl.h
index 4e71b2d0143ba..d8720f2616925 100644
--- a/lib/kunit/hooks-impl.h
+++ b/lib/kunit/hooks-impl.h
@@ -19,6 +19,7 @@ void __printf(3, 4) __kunit_fail_current_test_impl(const char *file,
int line,
const char *fmt, ...);
void *__kunit_get_static_stub_address_impl(struct kunit *test, void *real_fn_addr);
+bool __kunit_is_suppressed_warning_impl(bool count);
/* Code to set all of the function pointers. */
static inline void kunit_install_hooks(void)
@@ -26,6 +27,7 @@ static inline void kunit_install_hooks(void)
/* Install the KUnit hook functions. */
kunit_hooks.fail_current_test = __kunit_fail_current_test_impl;
kunit_hooks.get_static_stub_address = __kunit_get_static_stub_address_impl;
+ kunit_hooks.is_suppressed_warning = __kunit_is_suppressed_warning_impl;
}
#endif /* _KUNIT_HOOKS_IMPL_H */
--
2.53.0
* [PATCH v10 2/4] kunit: Add backtrace suppression self-tests
From: Albert Esteve @ 2026-05-13 7:30 UTC (permalink / raw)
To: Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Jonathan Corbet, Shuah Khan, Andrew Morton,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti
Cc: linux-kernel, linux-arch, linux-kselftest, kunit-dev, dri-devel,
workflows, linux-riscv, linux-doc, peterz, Guenter Roeck,
Linux Kernel Functional Testing, Alessandro Carminati,
Albert Esteve, Dan Carpenter, Kees Cook
In-Reply-To: <20260513-kunit_add_support-v10-0-e379d206c8cd@redhat.com>
From: Guenter Roeck <linux@roeck-us.net>
Add unit tests to verify that warning backtrace suppression works.
Tests cover both API forms:
- Scoped: kunit_warning_suppress() with in-block count verification
and post-block inactivity check.
- Direct functions: kunit_start/end_suppress_warning() with
sequential independent suppression blocks and per-block counts.
Furthermore, tests verify incremental warning counting, that
kunit_has_active_suppress_warning() transitions correctly around
suppression boundaries, and that suppression active in the test
kthread does not leak to a separate kthread.
If backtrace suppression does _not_ work, the unit tests will likely
trigger unsuppressed backtraces, which should actually help to get
the affected architectures / platforms fixed.
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
lib/kunit/Makefile | 1 +
lib/kunit/backtrace-suppression-test.c | 196 +++++++++++++++++++++++++++++++++
2 files changed, 197 insertions(+)
diff --git a/lib/kunit/Makefile b/lib/kunit/Makefile
index 4592f9d0aa8dd..2e8a6b71a2ab0 100644
--- a/lib/kunit/Makefile
+++ b/lib/kunit/Makefile
@@ -22,6 +22,7 @@ obj-$(if $(CONFIG_KUNIT),y) += hooks.o
obj-$(CONFIG_KUNIT_TEST) += kunit-test.o
obj-$(CONFIG_KUNIT_TEST) += platform-test.o
+obj-$(CONFIG_KUNIT_TEST) += backtrace-suppression-test.o
# string-stream-test compiles built-in only.
ifeq ($(CONFIG_KUNIT_TEST),y)
diff --git a/lib/kunit/backtrace-suppression-test.c b/lib/kunit/backtrace-suppression-test.c
new file mode 100644
index 0000000000000..831a60f3521fa
--- /dev/null
+++ b/lib/kunit/backtrace-suppression-test.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test for suppressing warning tracebacks.
+ *
+ * Copyright (C) 2024, Guenter Roeck
+ * Author: Guenter Roeck <linux@roeck-us.net>
+ */
+
+#include <kunit/test.h>
+#include <linux/bug.h>
+#include <linux/completion.h>
+#include <linux/kthread.h>
+
+static void backtrace_suppression_test_warn_direct(struct kunit *test)
+{
+ if (!IS_ENABLED(CONFIG_BUG))
+ kunit_skip(test, "requires CONFIG_BUG");
+
+ kunit_warning_suppress(test) {
+ WARN(1, "This backtrace should be suppressed");
+ /*
+ * Count must be checked inside the scope; the handle
+ * is not accessible after the block exits.
+ */
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 1);
+ }
+ KUNIT_EXPECT_FALSE(test, kunit_has_active_suppress_warning());
+}
+
+static noinline void trigger_backtrace_warn(void)
+{
+ WARN(1, "This backtrace should be suppressed");
+}
+
+static void backtrace_suppression_test_warn_indirect(struct kunit *test)
+{
+ if (!IS_ENABLED(CONFIG_BUG))
+ kunit_skip(test, "requires CONFIG_BUG");
+
+ kunit_warning_suppress(test) {
+ trigger_backtrace_warn();
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 1);
+ }
+}
+
+static void backtrace_suppression_test_warn_multi(struct kunit *test)
+{
+ if (!IS_ENABLED(CONFIG_BUG))
+ kunit_skip(test, "requires CONFIG_BUG");
+
+ kunit_warning_suppress(test) {
+ WARN(1, "This backtrace should be suppressed");
+ trigger_backtrace_warn();
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 2);
+ }
+}
+
+static void backtrace_suppression_test_warn_on_direct(struct kunit *test)
+{
+ if (!IS_ENABLED(CONFIG_BUG))
+ kunit_skip(test, "requires CONFIG_BUG");
+ if (!IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE) && !IS_ENABLED(CONFIG_KALLSYMS))
+ kunit_skip(test, "requires CONFIG_DEBUG_BUGVERBOSE or CONFIG_KALLSYMS");
+
+ kunit_warning_suppress(test) {
+ WARN_ON(1);
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 1);
+ }
+}
+
+static noinline void trigger_backtrace_warn_on(void)
+{
+ WARN_ON(1);
+}
+
+static void backtrace_suppression_test_warn_on_indirect(struct kunit *test)
+{
+ if (!IS_ENABLED(CONFIG_BUG))
+ kunit_skip(test, "requires CONFIG_BUG");
+ if (!IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE))
+ kunit_skip(test, "requires CONFIG_DEBUG_BUGVERBOSE");
+
+ kunit_warning_suppress(test) {
+ trigger_backtrace_warn_on();
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 1);
+ }
+}
+
+static void backtrace_suppression_test_count(struct kunit *test)
+{
+ if (!IS_ENABLED(CONFIG_BUG))
+ kunit_skip(test, "requires CONFIG_BUG");
+
+ kunit_warning_suppress(test) {
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 0);
+
+ WARN(1, "suppressed");
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 1);
+
+ WARN(1, "suppressed again");
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 2);
+ }
+}
+
+static void backtrace_suppression_test_active_state(struct kunit *test)
+{
+ KUNIT_EXPECT_FALSE(test, kunit_has_active_suppress_warning());
+
+ kunit_warning_suppress(test) {
+ KUNIT_EXPECT_TRUE(test, kunit_has_active_suppress_warning());
+ }
+
+ KUNIT_EXPECT_FALSE(test, kunit_has_active_suppress_warning());
+
+ kunit_warning_suppress(test) {
+ KUNIT_EXPECT_TRUE(test, kunit_has_active_suppress_warning());
+ }
+
+ KUNIT_EXPECT_FALSE(test, kunit_has_active_suppress_warning());
+}
+
+static void backtrace_suppression_test_multi_scope(struct kunit *test)
+{
+ struct kunit_suppressed_warning *sw1, *sw2;
+
+ if (!IS_ENABLED(CONFIG_BUG))
+ kunit_skip(test, "requires CONFIG_BUG");
+ if (!IS_ENABLED(CONFIG_DEBUG_BUGVERBOSE))
+ kunit_skip(test, "requires CONFIG_DEBUG_BUGVERBOSE");
+
+ sw1 = kunit_start_suppress_warning(test);
+ trigger_backtrace_warn_on();
+ WARN(1, "suppressed by sw1");
+ kunit_end_suppress_warning(test, sw1);
+
+ sw2 = kunit_start_suppress_warning(test);
+ WARN(1, "suppressed by sw2");
+ kunit_end_suppress_warning(test, sw2);
+
+ KUNIT_EXPECT_EQ(test, kunit_suppressed_warning_count(sw1), 2);
+ KUNIT_EXPECT_EQ(test, kunit_suppressed_warning_count(sw2), 1);
+}
+
+struct cross_kthread_data {
+ bool was_active;
+ struct completion done;
+};
+
+static int cross_kthread_fn(void *data)
+{
+ struct cross_kthread_data *d = data;
+
+ d->was_active = kunit_has_active_suppress_warning();
+ complete(&d->done);
+ return 0;
+}
+
+static void backtrace_suppression_test_cross_kthread(struct kunit *test)
+{
+ struct cross_kthread_data data;
+ struct task_struct *task;
+
+ data.was_active = false;
+ init_completion(&data.done);
+
+ kunit_warning_suppress(test) {
+ task = kthread_run(cross_kthread_fn, &data, "kunit-cross-test");
+ KUNIT_ASSERT_FALSE(test, IS_ERR(task));
+ wait_for_completion(&data.done);
+ kthread_stop(task);
+ }
+
+ KUNIT_EXPECT_FALSE(test, data.was_active);
+}
+
+static struct kunit_case backtrace_suppression_test_cases[] = {
+ KUNIT_CASE(backtrace_suppression_test_warn_direct),
+ KUNIT_CASE(backtrace_suppression_test_warn_indirect),
+ KUNIT_CASE(backtrace_suppression_test_warn_multi),
+ KUNIT_CASE(backtrace_suppression_test_warn_on_direct),
+ KUNIT_CASE(backtrace_suppression_test_warn_on_indirect),
+ KUNIT_CASE(backtrace_suppression_test_count),
+ KUNIT_CASE(backtrace_suppression_test_active_state),
+ KUNIT_CASE(backtrace_suppression_test_multi_scope),
+ KUNIT_CASE(backtrace_suppression_test_cross_kthread),
+ {}
+};
+
+static struct kunit_suite backtrace_suppression_test_suite = {
+ .name = "backtrace-suppression-test",
+ .test_cases = backtrace_suppression_test_cases,
+};
+kunit_test_suites(&backtrace_suppression_test_suite);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("KUnit test to verify warning backtrace suppression");
--
2.53.0
* [PATCH v10 3/4] drm: Suppress intentional warning backtraces in scaling unit tests
From: Albert Esteve @ 2026-05-13 7:30 UTC (permalink / raw)
To: Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Jonathan Corbet, Shuah Khan, Andrew Morton,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti
Cc: linux-kernel, linux-arch, linux-kselftest, kunit-dev, dri-devel,
workflows, linux-riscv, linux-doc, peterz, Guenter Roeck,
Linux Kernel Functional Testing, Maíra Canal,
Alessandro Carminati, Albert Esteve, Dan Carpenter, Simona Vetter
In-Reply-To: <20260513-kunit_add_support-v10-0-e379d206c8cd@redhat.com>
From: Guenter Roeck <linux@roeck-us.net>
The drm_test_rect_calc_hscale and drm_test_rect_calc_vscale unit tests
intentionally trigger warning backtraces by providing bad parameters to
the tested functions. What is tested is the return value, not the existence
of a warning backtrace. Suppress the backtraces to avoid clogging the
kernel log and distracting from real problems. Additionally, the
suppression API makes it possible to verify that a warning was actually
triggered without parsing kernel logs, which also stay clean.
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Maíra Canal <mcanal@igalia.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
Acked-by: David Gow <david@davidgow.net>
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
drivers/gpu/drm/tests/drm_rect_test.c | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/tests/drm_rect_test.c b/drivers/gpu/drm/tests/drm_rect_test.c
index 17e1f34b76101..ccc741b6191ff 100644
--- a/drivers/gpu/drm/tests/drm_rect_test.c
+++ b/drivers/gpu/drm/tests/drm_rect_test.c
@@ -10,6 +10,7 @@
#include <drm/drm_rect.h>
#include <drm/drm_mode.h>
+#include <linux/limits.h>
#include <linux/string_helpers.h>
#include <linux/errno.h>
@@ -407,10 +408,20 @@ KUNIT_ARRAY_PARAM(drm_rect_scale, drm_rect_scale_cases, drm_rect_scale_case_desc
static void drm_test_rect_calc_hscale(struct kunit *test)
{
const struct drm_rect_scale_case *params = test->param_value;
- int scaling_factor;
+ int expected_warnings = params->expected_scaling_factor == -EINVAL;
+ int scaling_factor = INT_MIN;
- scaling_factor = drm_rect_calc_hscale(&params->src, &params->dst,
- params->min_range, params->max_range);
+ /*
+ * drm_rect_calc_hscale() generates a warning backtrace whenever bad
+ * parameters are passed to it. This affects unit tests with -EINVAL
+ * error code in expected_scaling_factor.
+ */
+ kunit_warning_suppress(test) {
+ scaling_factor = drm_rect_calc_hscale(&params->src, &params->dst,
+ params->min_range,
+ params->max_range);
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, expected_warnings);
+ }
KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);
}
@@ -418,10 +429,19 @@ static void drm_test_rect_calc_hscale(struct kunit *test)
static void drm_test_rect_calc_vscale(struct kunit *test)
{
const struct drm_rect_scale_case *params = test->param_value;
- int scaling_factor;
+ int expected_warnings = params->expected_scaling_factor == -EINVAL;
+ int scaling_factor = INT_MIN;
- scaling_factor = drm_rect_calc_vscale(&params->src, &params->dst,
- params->min_range, params->max_range);
+ /*
+ * drm_rect_calc_vscale() generates a warning backtrace whenever bad
+ * parameters are passed to it. This affects unit tests with -EINVAL
+ * error code in expected_scaling_factor.
+ */
+ kunit_warning_suppress(test) {
+ scaling_factor = drm_rect_calc_vscale(&params->src, &params->dst,
+ params->min_range, params->max_range);
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, expected_warnings);
+ }
KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);
}
--
2.53.0
* [PATCH v10 4/4] kunit: Add documentation for warning backtrace suppression API
From: Albert Esteve @ 2026-05-13 7:30 UTC (permalink / raw)
To: Arnd Bergmann, Brendan Higgins, David Gow, Rae Moar,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Jonathan Corbet, Shuah Khan, Andrew Morton,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti
Cc: linux-kernel, linux-arch, linux-kselftest, kunit-dev, dri-devel,
workflows, linux-riscv, linux-doc, peterz, Guenter Roeck,
Linux Kernel Functional Testing, Alessandro Carminati,
Albert Esteve, Dan Carpenter, Kees Cook
In-Reply-To: <20260513-kunit_add_support-v10-0-e379d206c8cd@redhat.com>
From: Guenter Roeck <linux@roeck-us.net>
Document API functions for suppressing warning backtraces.
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
Reviewed-by: David Gow <david@davidgow.net>
Signed-off-by: Albert Esteve <aesteve@redhat.com>
---
Documentation/dev-tools/kunit/usage.rst | 46 ++++++++++++++++++++++++++++++++-
1 file changed, 45 insertions(+), 1 deletion(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index ebd06f5ea4550..1c78dfff94e8a 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -157,6 +157,50 @@ Alternatively, one can take full control over the error message by using
if (some_setup_function())
KUNIT_FAIL(test, "Failed to setup thing for testing");
+Suppressing warning backtraces
+------------------------------
+
+Some unit tests trigger warning backtraces either intentionally or as a side
+effect. Such backtraces are normally undesirable since they distract from
+the actual test and may result in the impression that there is a problem.
+
+Backtraces can be suppressed with **task-scoped suppression**: while
+suppression is active on the current task, the backtrace and stack dump from
+``WARN*()``, ``WARN_ON*()``, and related macros on that task are suppressed.
+Two API forms are available.
+
+- Scoped suppression is the simplest form. Wrap the code that triggers
+ warnings in a ``kunit_warning_suppress()`` block:
+
+.. code-block:: c
+
+ static void some_test(struct kunit *test)
+ {
+ kunit_warning_suppress(test) {
+ trigger_backtrace();
+ KUNIT_EXPECT_SUPPRESSED_WARNING_COUNT(test, 1);
+ }
+ }
+
+.. note::
+ The warning count must be checked inside the block; the suppression handle
+ is not accessible after the block exits.
+
+- Direct functions return an explicit handle pointer. Use them when the handle
+ needs to be retained or passed across helper functions:
+
+.. code-block:: c
+
+ static void some_test(struct kunit *test)
+ {
+ struct kunit_suppressed_warning *w;
+
+ w = kunit_start_suppress_warning(test);
+ trigger_backtrace();
+ kunit_end_suppress_warning(test, w);
+
+ KUNIT_EXPECT_EQ(test, kunit_suppressed_warning_count(w), 1);
+ }
Test Suites
~~~~~~~~~~~
@@ -1211,4 +1255,4 @@ For example:
dev_managed_string = devm_kstrdup(fake_device, "Hello, World!");
// Everything is cleaned up automatically when the test ends.
- }
\ No newline at end of file
+ }
--
2.53.0
^ permalink raw reply related
* Re: [PATCH v7 07/20] KVM: arm64: Set up FGT for Partitioned PMU
From: Oliver Upton @ 2026-05-13 7:34 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, Alexandru Elisei, Paolo Bonzini, Jonathan Corbet,
Russell King, Catalin Marinas, Will Deacon, Marc Zyngier,
Oliver Upton, Mingwei Zhang, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Mark Rutland, Shuah Khan, Ganapatrao Kulkarni,
James Clark, linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest
In-Reply-To: <20260504211813.1804997-8-coltonlewis@google.com>
On Mon, May 04, 2026 at 09:18:00PM +0000, Colton Lewis wrote:
> +static void __compute_hdfgrtr(struct kvm_vcpu *vcpu)
> +{
> + __compute_fgt(vcpu, HDFGRTR_EL2);
> +
> + *vcpu_fgt(vcpu, HDFGRTR_EL2) |=
> + HDFGRTR_EL2_PMOVS
> + | HDFGRTR_EL2_PMCCFILTR_EL0
> + | HDFGRTR_EL2_PMEVTYPERn_EL0
> + | HDFGRTR_EL2_PMCEIDn_EL0
> + | HDFGRTR_EL2_PMMIR_EL1;
> +}
> +
I've given this feedback at least twice already...
Operators go on the preceding line in the case of line continuations.
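To illustrate the layout, here is a standalone sketch. The bit names are hypothetical stand-ins, not the real HDFGRTR_EL2 field definitions:

```c
#include <stdint.h>

/* Hypothetical stand-in bits, not the real HDFGRTR_EL2 fields. */
#define DEMO_PMOVS	(UINT64_C(1) << 0)
#define DEMO_PMCCFILTR	(UINT64_C(1) << 1)
#define DEMO_PMEVTYPERn	(UINT64_C(1) << 2)

/* Each continued line ends with the operator instead of starting with it. */
static uint64_t demo_trap_mask(uint64_t base)
{
	return base |
	       DEMO_PMOVS |
	       DEMO_PMCCFILTR |
	       DEMO_PMEVTYPERn;
}
```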
> +
> +/**
> + * kvm_pmu_is_partitioned() - Determine if given PMU is partitioned
> + * @pmu: Pointer to arm_pmu struct
> + *
> + * Determine if given PMU is partitioned by looking at hpmn field. The
> + * PMU is partitioned if this field is less than the number of
> + * counters in the system.
> + *
> + * Return: True if the PMU is partitioned, false otherwise
> + */
> +bool kvm_pmu_is_partitioned(struct arm_pmu *pmu)
> +{
> + if (!pmu)
> + return false;
> +
> + return pmu->max_guest_counters >= 0 &&
> + pmu->max_guest_counters <= *host_data_ptr(nr_event_counters);
> +}
> +
> +/**
> + * kvm_vcpu_pmu_is_partitioned() - Determine if given VCPU has a partitioned PMU
> + * @vcpu: Pointer to kvm_vcpu struct
> + *
> + * Determine if given VCPU has a partitioned PMU by extracting that
> + * field and passing it to :c:func:`kvm_pmu_is_partitioned`
> + *
> + * Return: True if the VCPU PMU is partitioned, false otherwise
> + */
> +bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu)
> +{
> + return kvm_pmu_is_partitioned(vcpu->kvm->arch.arm_pmu) &&
> + false;
> +}
Ok, I'm thoroughly confused about these predicates.
Whether or not a vCPU is using a partitioned PMU is a per-VM property.
This is separate from whether or not the backing arm_pmu has a range of
available counters for the guest to use.
It is entirely possible that a VM *isn't* using the partitioned PMU
feature (i.e. backed with perf events) yet the supporting arm_pmu has a
guest counter range.
> +#if !defined(__KVM_NVHE_HYPERVISOR__)
> +bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu);
> +bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu);
> +#else
> +static inline bool kvm_vcpu_pmu_is_partitioned(struct kvm_vcpu *vcpu)
> +{
> + return false;
> +}
> +
> +static inline bool kvm_vcpu_pmu_use_fgt(struct kvm_vcpu *vcpu)
> +{
> + return false;
> +}
> +#endif
> +
Don't use ifdeffery for this. Aim to have a single definition and rely
on has_vhe() to do the rest of the work.
Thanks,
Oliver
^ permalink raw reply
* Re: [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate
From: Damien Le Moal @ 2026-05-13 7:44 UTC (permalink / raw)
To: Christoph Hellwig, Darrick J. Wong
Cc: Andrew Morton, Chris Li, Kairui Song, Christian Brauner,
Jens Axboe, David Sterba, Theodore Ts'o, Jaegeuk Kim, Chao Yu,
Trond Myklebust, Anna Schumaker, Namjae Jeon, Hyunchul Lee,
Steve French, Paulo Alcantara, Carlos Maiolino, Naohiro Aota,
linux-xfs, linux-fsdevel, linux-doc, linux-mm, linux-block,
linux-btrfs, linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <20260513055806.GC1236@lst.de>
On 5/13/26 14:58, Christoph Hellwig wrote:
> On Tue, May 12, 2026 at 10:08:46AM -0700, Darrick J. Wong wrote:
>>> + /* Only one bdev per swap file for now. */
>>> + if (!sis->bdev)
>>> + sis->bdev = bdev;
>>> + else if (bdev != sis->bdev)
>>> + return -EINVAL;
>>
>> Should this return error if the bdev is zoned? AFAICT XFS and zonefs
>> already guard against this, but other fses might be more naïve.
>
> Yes, now that the bdev is passed down to add_swap_extent we could
> consolidate the check here.
Hmmm... With zonefs, swap files can be created on top of conventional zone
files. So enforcing "no swap on zoned device" here would break that.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply
* Re: [PATCH v7 08/20] KVM: arm64: Add Partitioned PMU register trap handlers
From: Oliver Upton @ 2026-05-13 7:45 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, Alexandru Elisei, Paolo Bonzini, Jonathan Corbet,
Russell King, Catalin Marinas, Will Deacon, Marc Zyngier,
Oliver Upton, Mingwei Zhang, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Mark Rutland, Shuah Khan, Ganapatrao Kulkarni,
James Clark, linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest
In-Reply-To: <20260504211813.1804997-9-coltonlewis@google.com>
On Mon, May 04, 2026 at 09:18:01PM +0000, Colton Lewis wrote:
> We may want a partitioned PMU but not have FEAT_FGT to untrap the
> specific registers that would normally be untrapped. Add handling for
> those trapped register accesses that does the right thing if the PMU
> is partitioned.
>
> For registers that shouldn't be written to hardware because they
> require special handling (PMEVTYPER and PMOVS), write to the virtual
> register. A later patch will ensure these are handled correctly at
> vcpu_load time.
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
I'd prefer an approach that provides a single accessor helper that takes
a vcpu_sysreg enum as an argument and internally handles the dispatch
between partitioned and emulated PMUs. That goes for all of the PMU
sysregs.
This will help you reuse some of the PMU emulation code that you'll still
need for things like nested...
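Roughly the shape I have in mind, as a standalone sketch only. Every name below is hypothetical (the real thing would key off vcpu_sysreg and the actual PMU state), and the PMOVS special case just mirrors what the commit message says about registers that must not be written to hardware:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical indices standing in for the PMU entries of vcpu_sysreg. */
enum demo_sysreg { DEMO_PMCNTENSET, DEMO_PMOVSSET, DEMO_NR_REGS };

struct demo_vcpu {
	bool partitioned;		/* VM uses the partitioned PMU */
	uint64_t shadow[DEMO_NR_REGS];	/* in-memory (virtual) register copy */
	uint64_t hw[DEMO_NR_REGS];	/* stands in for hardware state */
};

/* The one place that decides where a register lives. */
static bool demo_reg_in_hw(const struct demo_vcpu *vcpu, enum demo_sysreg reg)
{
	/* PMOVS needs special handling, so it always stays virtual. */
	return vcpu->partitioned && reg != DEMO_PMOVSSET;
}

/* Single read accessor: callers never test the PMU flavour themselves. */
static uint64_t demo_pmu_read(const struct demo_vcpu *vcpu,
			      enum demo_sysreg reg)
{
	return demo_reg_in_hw(vcpu, reg) ? vcpu->hw[reg] : vcpu->shadow[reg];
}

/* Single write accessor with the same dispatch. */
static void demo_pmu_write(struct demo_vcpu *vcpu, enum demo_sysreg reg,
			   uint64_t val)
{
	if (demo_reg_in_hw(vcpu, reg))
		vcpu->hw[reg] = val;
	else
		vcpu->shadow[reg] = val;
}
```

With this shape the trap handlers call one helper per access, and the emulated path stays reachable for any user that cannot touch hardware.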
Thanks,
Oliver
^ permalink raw reply
* Re: [PATCH 09/12] swap: push down setting sis->bdev into ->swap_activate
From: Christoph Hellwig @ 2026-05-13 7:46 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Darrick J. Wong, Andrew Morton, Chris Li,
Kairui Song, Christian Brauner, Jens Axboe, David Sterba,
Theodore Ts'o, Jaegeuk Kim, Chao Yu, Trond Myklebust,
Anna Schumaker, Namjae Jeon, Hyunchul Lee, Steve French,
Paulo Alcantara, Carlos Maiolino, Naohiro Aota, linux-xfs,
linux-fsdevel, linux-doc, linux-mm, linux-block, linux-btrfs,
linux-ext4, linux-f2fs-devel, linux-nfs, linux-cifs
In-Reply-To: <acd6428b-a352-4f7b-a349-b2c9e341fd87@kernel.org>
On Wed, May 13, 2026 at 04:44:53PM +0900, Damien Le Moal wrote:
> Hmmm... With zonefs, swap files can be created on top of conventional zone
> files. So enforcing "no swap on zoned device" here would break that.
We can check that none of the extents fall onto sequential zones instead
of just devices.
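As a rough standalone sketch of such a per-extent check (the types and names here are hypothetical, and it assumes uniformly sized zones with a per-zone type table; it is not the block layer API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_zone_type { DEMO_ZONE_CONVENTIONAL, DEMO_ZONE_SEQUENTIAL };

/* Hypothetical device model: equally sized zones plus a type per zone. */
struct demo_zoned_dev {
	uint64_t zone_sectors;			/* sectors per zone, non-zero */
	size_t nr_zones;
	const enum demo_zone_type *types;	/* one entry per zone */
};

/* True if [start, start + nr_sectors) touches only conventional zones. */
static bool demo_extent_is_conventional(const struct demo_zoned_dev *dev,
					uint64_t start, uint64_t nr_sectors)
{
	uint64_t first, last, z;

	if (!nr_sectors)
		return true;

	first = start / dev->zone_sectors;
	last = (start + nr_sectors - 1) / dev->zone_sectors;
	if (last >= dev->nr_zones)
		return false;	/* extent runs past the device */

	for (z = first; z <= last; z++)
		if (dev->types[z] == DEMO_ZONE_SEQUENTIAL)
			return false;

	return true;
}
```

A swap extent would be rejected as soon as any zone it overlaps is sequential, while extents that sit entirely on conventional zones keep working.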
I still wonder why you bother with swap to zonefs at all, though.
^ permalink raw reply
* Re: [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
From: David Hildenbrand (Arm) @ 2026-05-13 7:53 UTC (permalink / raw)
To: Breno Leitao
Cc: Miaohe Lin, Naoya Horiguchi, Andrew Morton, Jonathan Corbet,
Shuah Khan, Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Liam R. Howlett, linux-mm,
linux-kernel, linux-doc, linux-kselftest, linux-trace-kernel,
kernel-team, Lance Yang
In-Reply-To: <agMj4ukhj1PkXXrN@gmail.com>
On 5/12/26 15:04, Breno Leitao wrote:
> On Tue, May 12, 2026 at 10:17:00AM +0200, David Hildenbrand (Arm) wrote:
>>> @@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
>>> unsigned long page_flags;
>>> bool retry = true;
>>> int hugetlb = 0;
>>> + bool is_reserved;
>>>
>>> if (!sysctl_memory_failure_recovery)
>>> panic("Memory failure on page %lx", pfn);
>>> @@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
>>> * In fact it's dangerous to directly bump up page count from 0,
>>> * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>>> */
>>> + /*
>>> + * Pages with PG_reserved set are not currently managed by the
>>> + * page allocator (memblock-reserved memory, driver reservations,
>>> + * etc.), so classify them as kernel-owned for reporting.
>>> + *
>>> + * Sample the flag before get_hwpoison_page(): in the
>>> + * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
>>> + * reference before returning -EIO, after which page->flags may
>>> + * have been reset by the allocator.
>>> + */
>>> + is_reserved = PageReserved(p);
>>> +
>>> res = get_hwpoison_page(p, flags);
>>> if (!res) {
>>> if (is_free_buddy_page(p)) {
>>> @@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
>>> }
>>> goto unlock_mutex;
>>> } else if (res < 0) {
>>> - res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>>> + if (is_reserved)
>>> + res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>>> + else
>>> + res = action_result(pfn, MF_MSG_GET_HWPOISON,
>>> + MF_IGNORED);
>>> goto unlock_mutex;
>>> }
>>>
>>>
>>
>> It's a bit odd that we need this handling when we already have handling for
>> reserved pages in error_states[].
>>
>> HWPoisonHandlable() would always essentially reject PG_reserved pages. So
>> __get_hwpoison_page() ... would always fail? Making
>> get_hwpoison_page()->get_any_page() always fail?
>>
>> But then, we never call identify_page_state()? And never call me_kernel()?
>
> From what I read, it seems that error_states[0] = { reserved, reserved, MF_MSG_KERNEL, me_kernel }
> has been effectively dead code on the hwpoison-from-MCE path for a
> while.
>
> My v6 patch relabels the failure-path output to match what me_kernel() would
> have reported anyway.
>
>> This all looks very odd.
>>
>> Why would you even want to call get_hwpoison_page() in the first place if you
>> find PageReserved?
>
> Are you suggesting we should call the page action as soon as we detect the page
> is reserved and get out?
>
> Something as:
>
> if (PageReserved(p)) {
> res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> goto unlock_mutex;
> }
>
> res = get_hwpoison_page(p, flags);
Or you combine this patch with the other patch and simply let
get_hwpoison_page() check that, and return an appropriate error code for
unhandlable pages that you can process here?
Like, maybe, returning -EIO directly?
res = get_hwpoison_page(p, flags);
switch (res) {
case 0: /* Success */
...
break;
case -EIO: /* Unhandlable kernel page. */
...
break;
case -EBUSY: /* Race, try again? */
...
break;
case ...
}
You can add more return codes as you see fit.
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
From: David Hildenbrand (Arm) @ 2026-05-13 7:53 UTC (permalink / raw)
To: jane.chu, Breno Leitao, Miaohe Lin, Naoya Horiguchi,
Andrew Morton, Jonathan Corbet, Shuah Khan, Lorenzo Stoakes,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Shuah Khan, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Liam R. Howlett
Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
linux-trace-kernel, kernel-team, Lance Yang
In-Reply-To: <816e3d8e-22d2-49a4-92ae-981568f38792@oracle.com>
On 5/12/26 19:58, jane.chu@oracle.com wrote:
>
>
> On 5/12/2026 1:17 AM, David Hildenbrand (Arm) wrote:
>> On 5/11/26 17:38, Breno Leitao wrote:
>>> When get_hwpoison_page() returns a negative value, distinguish
>>> reserved pages from other failure cases by reporting MF_MSG_KERNEL
>>> instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
>>> and should be classified accordingly for proper handling.
>>>
>>> Sample PG_reserved before the get_hwpoison_page() call. In the
>>> MF_COUNT_INCREASED path get_any_page() can drop the caller's
>>> reference before returning -EIO, after which the underlying page may
>>> have been freed and reallocated with page->flags reset; reading
>>> PageReserved(p) at that point would observe stale or unrelated state.
>>> The pre-call snapshot reflects what the page actually was at the
>>> time of the failure event.
>>>
>>> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
>>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>> ---
>>> mm/memory-failure.c | 19 ++++++++++++++++++-
>>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 866c4428ac7ef..f112fb27a8ff6 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
>>> unsigned long page_flags;
>>> bool retry = true;
>>> int hugetlb = 0;
>>> + bool is_reserved;
>>> if (!sysctl_memory_failure_recovery)
>>> panic("Memory failure on page %lx", pfn);
>>> @@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
>>> * In fact it's dangerous to directly bump up page count from 0,
>>> * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>>> */
>>> + /*
>>> + * Pages with PG_reserved set are not currently managed by the
>>> + * page allocator (memblock-reserved memory, driver reservations,
>>> + * etc.), so classify them as kernel-owned for reporting.
>>> + *
>>> + * Sample the flag before get_hwpoison_page(): in the
>>> + * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
>>> + * reference before returning -EIO, after which page->flags may
>>> + * have been reset by the allocator.
>>> + */
>>> + is_reserved = PageReserved(p);
>>> +
>>> res = get_hwpoison_page(p, flags);
>>> if (!res) {
>>> if (is_free_buddy_page(p)) {
>>> @@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
>>> }
>>> goto unlock_mutex;
>>> } else if (res < 0) {
>>> - res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>>> + if (is_reserved)
>>> + res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>>> + else
>>> + res = action_result(pfn, MF_MSG_GET_HWPOISON,
>>> + MF_IGNORED);
>>> goto unlock_mutex;
>>> }
>>>
>>
>> It's a bit odd that we need this handling when we already have handling for
>> reserved pages in error_states[].
>>
>> HWPoisonHandlable() would always essentially reject PG_reserved pages. So
>> __get_hwpoison_page() ... would always fail? Making
>> get_hwpoison_page()->get_any_page() always fail?
>>
>> But then, we never call identify_page_state()? And never call me_kernel()?
>>
>> This all looks very odd.
>>
>> Why would you even want to call get_hwpoison_page() in the first place if you
>> find PageReserved?
>>
>
> Ah, good point!
> It seems to me that all unhandable pages should head out to identify_page_state:
>
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2411,6 +2411,10 @@ int memory_failure(unsigned long pfn, int flags)
> * In fact it's dangerous to directly bump up page count from 0,
> * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
> */
> +
> + if (!HWPoisonHandlable(page, flags)
> + goto identify_page_state;
> +
> res = get_hwpoison_page(p, flags);
> if (!res) {
> if (is_free_buddy_page(p)) {
That's one option, or we just let get_hwpoison_page() return clearer error
codes, let it take care of checking PageReserved, and process the error codes
returned by get_hwpoison_page() in a better way.
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH v6 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages
From: David Hildenbrand (Arm) @ 2026-05-13 7:54 UTC (permalink / raw)
To: Lance Yang
Cc: leitao, linmiaohe, nao.horiguchi, akpm, corbet, skhan, ljs,
vbabka, rppt, surenb, mhocko, shuah, rostedt, mhiramat,
mathieu.desnoyers, liam, linux-mm, linux-kernel, linux-doc,
linux-kselftest, linux-trace-kernel, kernel-team
In-Reply-To: <20260512124837.38883-1-lance.yang@linux.dev>
On 5/12/26 14:48, Lance Yang wrote:
>
> On Tue, May 12, 2026 at 10:17:00AM +0200, David Hildenbrand (Arm) wrote:
>> On 5/11/26 17:38, Breno Leitao wrote:
>>> When get_hwpoison_page() returns a negative value, distinguish
>>> reserved pages from other failure cases by reporting MF_MSG_KERNEL
>>> instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel
>>> and should be classified accordingly for proper handling.
>>>
>>> Sample PG_reserved before the get_hwpoison_page() call. In the
>>> MF_COUNT_INCREASED path get_any_page() can drop the caller's
>>> reference before returning -EIO, after which the underlying page may
>>> have been freed and reallocated with page->flags reset; reading
>>> PageReserved(p) at that point would observe stale or unrelated state.
>>> The pre-call snapshot reflects what the page actually was at the
>>> time of the failure event.
>>>
>>> Acked-by: Miaohe Lin <linmiaohe@huawei.com>
>>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>> ---
>>> mm/memory-failure.c | 19 ++++++++++++++++++-
>>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 866c4428ac7ef..f112fb27a8ff6 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -2348,6 +2348,7 @@ int memory_failure(unsigned long pfn, int flags)
>>> unsigned long page_flags;
>>> bool retry = true;
>>> int hugetlb = 0;
>>> + bool is_reserved;
>>>
>>> if (!sysctl_memory_failure_recovery)
>>> panic("Memory failure on page %lx", pfn);
>>> @@ -2411,6 +2412,18 @@ int memory_failure(unsigned long pfn, int flags)
>>> * In fact it's dangerous to directly bump up page count from 0,
>>> * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>>> */
>>> + /*
>>> + * Pages with PG_reserved set are not currently managed by the
>>> + * page allocator (memblock-reserved memory, driver reservations,
>>> + * etc.), so classify them as kernel-owned for reporting.
>>> + *
>>> + * Sample the flag before get_hwpoison_page(): in the
>>> + * MF_COUNT_INCREASED path, get_any_page() can drop the caller's
>>> + * reference before returning -EIO, after which page->flags may
>>> + * have been reset by the allocator.
>>> + */
>>> + is_reserved = PageReserved(p);
>>> +
>>> res = get_hwpoison_page(p, flags);
>>> if (!res) {
>>> if (is_free_buddy_page(p)) {
>>> @@ -2432,7 +2445,11 @@ int memory_failure(unsigned long pfn, int flags)
>>> }
>>> goto unlock_mutex;
>>> } else if (res < 0) {
>>> - res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>>> + if (is_reserved)
>>> + res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>>> + else
>>> + res = action_result(pfn, MF_MSG_GET_HWPOISON,
>>> + MF_IGNORED);
>>> goto unlock_mutex;
>>> }
>>>
>>>
>>
>> It's a bit odd that we need this handling when we already have handling for
>> reserved pages in error_states[].
>>
>> HWPoisonHandlable() would always essentially reject PG_reserved pages. So
>> __get_hwpoison_page() ... would always fail? Making
>> get_hwpoison_page()->get_any_page() always fail?
>>
>> But then, we never call identify_page_state()? And never call me_kernel()?
>
> Looks like we never get that far ...
Right, likely that should be removed+cleaned up then.
--
Cheers,
David
^ permalink raw reply
* Re: [PATCH v7 09/20] KVM: arm64: Set up MDCR_EL2 to handle a Partitioned PMU
From: Oliver Upton @ 2026-05-13 7:57 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, Alexandru Elisei, Paolo Bonzini, Jonathan Corbet,
Russell King, Catalin Marinas, Will Deacon, Marc Zyngier,
Oliver Upton, Mingwei Zhang, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Mark Rutland, Shuah Khan, Ganapatrao Kulkarni,
James Clark, linux-doc, linux-kernel, linux-arm-kernel, kvmarm,
linux-perf-users, linux-kselftest
In-Reply-To: <20260504211813.1804997-10-coltonlewis@google.com>
On Mon, May 04, 2026 at 09:18:02PM +0000, Colton Lewis wrote:
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 3ad6b7c6e4ba7..0ab89c91e19cb 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -36,20 +36,43 @@ static int cpu_has_spe(u64 dfr0)
> */
> static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
> {
> + int hpmn = kvm_pmu_hpmn(vcpu);
> +
> preempt_disable();
>
> /*
> * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
> * to disable guest access to the profiling and trace buffers
> */
> - vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN,
> - *host_data_ptr(nr_event_counters));
> +
> + vcpu->arch.mdcr_el2 = FIELD_PREP(MDCR_EL2_HPMN, hpmn);
> vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> MDCR_EL2_TPMS |
> MDCR_EL2_TTRF |
> MDCR_EL2_TPMCR |
> MDCR_EL2_TDRA |
> - MDCR_EL2_TDOSA);
> + MDCR_EL2_TDOSA |
> + MDCR_EL2_HPME);
> +
> + if (kvm_vcpu_pmu_is_partitioned(vcpu)) {
> + /*
> + * Filtering these should be redundant because we trap
> + * all the TYPER and FILTR registers anyway and ensure
> + * they filter EL2, but set the bits if they are here.
> + */
> + if (is_pmuv3p1(read_pmuver()))
> + vcpu->arch.mdcr_el2 |= MDCR_EL2_HPMD;
> + if (is_pmuv3p5(read_pmuver()))
> + vcpu->arch.mdcr_el2 |= MDCR_EL2_HCCD;
Neither of these controls is of any consequence on unsupported
hardware (RES0). Set them unconditionally?
> + /*
> + * Take out the coarse grain traps if we are using
> + * fine grain traps.
> + */
> + if (kvm_vcpu_pmu_use_fgt(vcpu))
I think open coding the check here would actually improve readability.
if (cpus_have_final_cap(ARM64_HAS_FGT) &&
(cpus_have_final_cap(ARM64_HAS_HPMN0) ||
vcpu->kvm->arch.nr_pmu_counters != 0))
vcpu->arch.mdcr_el2 &= ~(MDCR_EL2_TPM | MDCR_EL2_TPMCR);
> +
> +/**
> + * kvm_pmu_hpmn() - Calculate HPMN field value
> + * @vcpu: Pointer to struct kvm_vcpu
> + *
> + * Calculate the appropriate value to set for MDCR_EL2.HPMN. If
> + * partitioned, this is the number of counters set for the guest if
> + * supported, falling back to max_guest_counters if needed. If we are not
> + * partitioned or can't set the implied HPMN value, fall back to the
> + * host value.
> + *
> + * Return: A valid HPMN value
> + */
> +u8 kvm_pmu_hpmn(struct kvm_vcpu *vcpu)
> +{
> + u8 nr_guest_cntr = vcpu->kvm->arch.nr_pmu_counters;
> +
> + if (kvm_vcpu_pmu_is_partitioned(vcpu)
> + && !vcpu_on_unsupported_cpu(vcpu)
> + && (cpus_have_final_cap(ARM64_HAS_HPMN0) || nr_guest_cntr > 0))
> + return nr_guest_cntr;
> +
> + return *host_data_ptr(nr_event_counters);
> +}
This helper isn't helpful. Just open code it in the place where we are
computing MDCR_EL2.
> @@ -542,6 +542,13 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm)
> if (cpus_have_final_cap(ARM64_WORKAROUND_PMUV3_IMPDEF_TRAPS))
> return 1;
>
> + /*
> + * If partitioned then we are limited by the max counters in
> + * the guest partition.
> + */
> + if (kvm_pmu_is_partitioned(arm_pmu))
> + return arm_pmu->max_guest_counters;
> +
Ok, this is exactly what I was getting at earlier. What about a VM with
an emulated PMU? It should use cntr_mask calculation, not the guest
range.
Thanks,
Oliver
^ permalink raw reply