* Re: [PATCH v8 36/46] KVM: selftests: Test that truncation does not change shared/private status
From: Fuad Tabba @ 2026-06-25 7:03 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-36-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Add a test to verify that deallocating a page in a guest memfd region via
> fallocate() with FALLOC_FL_PUNCH_HOLE does not alter the shared or private
> status of the corresponding memory range.
>
> When a page backing a guest memfd mapping is deallocated, e.g., by punching
> a hole or truncating the file, and then subsequently faulted back in, the
> new page must inherit the correct shared/private status tracked by
> guest_memfd.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../selftests/kvm/x86/guest_memfd_conversions_test.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index 0b024fb7227f0..f03af2c46426f 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -10,6 +10,7 @@
> #include <linux/sizes.h>
>
> #include "kvm_util.h"
> +#include "kvm_syscalls.h"
> #include "kselftest_harness.h"
> #include "test_util.h"
> #include "ucall_common.h"
> @@ -309,6 +310,19 @@ GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED(unallocated_folios, 8)
> test_convert_to_shared(t, i, 'B', 'C', 'D');
> }
>
> +/* Truncation should not affect shared/private status. */
> +GMEM_CONVERSION_TEST_INIT_SHARED(truncate)
> +{
> + host_do_rmw(t->mem, 0, 0, 'A');
> + kvm_fallocate(t->gmem_fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, page_size);
> + host_do_rmw(t->mem, 0, 0, 'A');
> +
> + test_convert_to_private(t, 0, 'A', 'B');
> +
> + kvm_fallocate(t->gmem_fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, page_size);
> + test_private(t, 0, 0, 'A');
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 37/46] KVM: selftests: Test that shared/private status is consistent across processes
From: Fuad Tabba @ 2026-06-25 7:14 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-37-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Add a test to verify that a guest_memfd's shared/private status is
> consistent across processes, and that any shared pages previously mapped in
> any process are unmapped from all processes.
>
> The test forks a child process after creating the shared guest_memfd
> region so that the second process exists alongside the main process for the
> entire test.
>
> The processes then take turns to access memory to check that the
> shared/private status is consistent across processes.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
Two things below, otherwise:
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> .../kvm/x86/guest_memfd_conversions_test.c | 118 +++++++++++++++++++++
> 1 file changed, 118 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index f03af2c46426f..99b0023609670 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -2,6 +2,8 @@
> /*
> * Copyright (c) 2024, Google LLC.
> */
> +#include <pthread.h>
> +#include <time.h>
> #include <sys/mman.h>
> #include <unistd.h>
nit: include order
>
> @@ -323,6 +325,122 @@ GMEM_CONVERSION_TEST_INIT_SHARED(truncate)
> test_private(t, 0, 0, 'A');
> }
>
> +/* Test that shared/private memory protections work and are seen from any process. */
> +GMEM_CONVERSION_TEST_INIT_SHARED(forked_accesses)
> +{
> + enum test_state {
> + STATE_INIT,
> + STATE_CHECK_SHARED,
> + STATE_DONE_CHECKING_SHARED,
> + STATE_CHECK_PRIVATE,
> + STATE_DONE_CHECKING_PRIVATE,
> + };
> +
> + struct sync_state {
> + pthread_mutex_t mutex;
> + pthread_cond_t cond;
> + enum test_state step;
> + } *sync;
> +
> + pthread_mutexattr_t mattr;
> + pthread_condattr_t cattr;
> + pid_t child_pid, parent_pid;
> + int status;
> +
> + sync = kvm_mmap(sizeof(*sync), PROT_READ | PROT_WRITE,
> + MAP_SHARED | MAP_ANONYMOUS, -1);
> +
> + pthread_mutexattr_init(&mattr);
> + pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
> + pthread_mutex_init(&sync->mutex, &mattr);
> + pthread_mutexattr_destroy(&mattr);
> +
> + pthread_condattr_init(&cattr);
> + pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
> + pthread_cond_init(&sync->cond, &cattr);
> + pthread_condattr_destroy(&cattr);
> +
> + sync->step = STATE_INIT;
> +
> +#define TEST_STATE_AWAIT(__state) \
> + do { \
> + pthread_mutex_lock(&sync->mutex); \
> + while (sync->step != (__state)) { \
> + struct timespec ts, stop; \
> + int ret; \
> + \
> + clock_gettime(CLOCK_REALTIME, &ts); \
> + stop = timespec_add_ns(ts, 100 * 1000000UL); \
> + \
> + ret = pthread_cond_timedwait(&sync->cond, &sync->mutex, &stop); \
> + if (ret == ETIMEDOUT) { \
> + bool alive = (child_pid == 0) ? \
> + (getppid() == parent_pid) : \
> + (waitpid(child_pid, NULL, WNOHANG) == 0); \
Not sure it's worth it, but if you want to silence Sashiko, waitid
with WNOWAIT might be the way to go (not tested, just from looking at
the man page). This is though very unlikely, mentioning it since
Sashiko complained.
> + TEST_ASSERT(alive, "Other process exited prematurely"); \
> + } else { \
> + TEST_ASSERT(!ret, "pthread_cond_timedwait failed"); \
> + } \
> + } \
> + pthread_mutex_unlock(&sync->mutex); \
> + } while (0)
> +
> +#define TEST_STATE_SET(__state) \
> + do { \
> + pthread_mutex_lock(&sync->mutex); \
> + sync->step = (__state); \
> + pthread_cond_broadcast(&sync->cond); \
> + pthread_mutex_unlock(&sync->mutex); \
> + } while (0)
> +
> + parent_pid = getpid();
> + child_pid = fork();
> + TEST_ASSERT(child_pid != -1, "fork failed");
> +
> + if (child_pid == 0) {
> + const char inconsequential = 0xdd;
> +
> + TEST_STATE_AWAIT(STATE_CHECK_SHARED);
> +
> + /*
> + * This maps the pages into the child process as well, and tests
> + * that the conversion process will unmap the guest_memfd memory
> + * from all processes.
> + */
> + host_do_rmw(t->mem, 0, 0xB, 0xC);
> +
> + TEST_STATE_SET(STATE_DONE_CHECKING_SHARED);
> + TEST_STATE_AWAIT(STATE_CHECK_PRIVATE);
> +
> + TEST_EXPECT_SIGBUS(READ_ONCE(t->mem[0]));
> + TEST_EXPECT_SIGBUS(WRITE_ONCE(t->mem[0], inconsequential));
> +
> + TEST_STATE_SET(STATE_DONE_CHECKING_PRIVATE);
> + exit(0);
> + }
> +
> + test_shared(t, 0, 0, 0xA, 0xB);
> +
> + TEST_STATE_SET(STATE_CHECK_SHARED);
> + TEST_STATE_AWAIT(STATE_DONE_CHECKING_SHARED);
> +
> + test_convert_to_private(t, 0, 0xC, 0xD);
> +
> + TEST_STATE_SET(STATE_CHECK_PRIVATE);
> + TEST_STATE_AWAIT(STATE_DONE_CHECKING_PRIVATE);
> +
> + TEST_ASSERT_EQ(waitpid(child_pid, &status, 0), child_pid);
> + TEST_ASSERT(WIFEXITED(status) && WEXITSTATUS(status) == 0,
> + "Child exited with unexpected status");
> +
> + pthread_mutex_destroy(&sync->mutex);
> + pthread_cond_destroy(&sync->cond);
> + kvm_munmap(sync, sizeof(*sync));
> +
> +#undef TEST_STATE_SET
> +#undef TEST_STATE_AWAIT
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 38/46] KVM: selftests: Add helpers to pin pages with CONFIG_GUP_TEST
From: Fuad Tabba @ 2026-06-25 7:40 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-38-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Add helper functions to allow KVM selftests to pin memory using
> CONFIG_GUP_TEST. This is useful for testing scenarios where some page has
> an increased refcount. such as in guest_memfd in-place conversion tests.
>
> The helpers open /sys/kernel/debug/gup_test and invoke the
> PIN_LONGTERM_TEST_START and PIN_LONGTERM_TEST_STOP ioctls. Since this
> functionality depends on the kernel being built with CONFIG_GUP_TEST,
> provide stub implementations that trigger a test failure if the
> configuration is missing.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
nit below, otherwise:
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/include/kvm_util.h | 3 +++
> tools/testing/selftests/kvm/lib/kvm_util.c | 23 +++++++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 323d06b5699ec..79ab64ac8b869 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -1195,6 +1195,9 @@ static inline int pin_self_to_any_cpu(void)
> return pin_task_to_any_cpu(pthread_self());
> }
>
> +void pin_pages(void *vaddr, uint64_t size);
> +void unpin_pages(void);
> +
> void kvm_print_vcpu_pinning_help(void);
> void kvm_parse_vcpu_pinning(const char *pcpus_string, u32 vcpu_to_pcpu[],
> int nr_vcpus);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index b73817f7bc803..524ef97d634bf 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -18,6 +18,8 @@
> #include <unistd.h>
> #include <linux/kernel.h>
>
> +#include "../../../../mm/gup_test.h"
> +
> #define KVM_UTIL_MIN_PFN 2
>
> u32 guest_random_seed;
> @@ -639,6 +641,27 @@ int __pin_task_to_cpu(pthread_t task, int cpu)
> return pthread_setaffinity_np(task, sizeof(cpuset), &cpuset);
> }
>
> +static int gup_test_fd = -1;
> +
> +void pin_pages(void *vaddr, uint64_t size)
> +{
> + const struct pin_longterm_test args = {
> + .addr = (uint64_t)vaddr,
> + .size = size,
> + .flags = PIN_LONGTERM_TEST_FLAG_USE_WRITE,
> + };
> +
> + gup_test_fd = __open_path_or_exit("/sys/kernel/debug/gup_test", O_RDWR,
> + "Is CONFIG_GUP_TEST enabled?");
nit: should you close this/reset it to -1 after the tests?
> +
> + TEST_ASSERT_EQ(ioctl(gup_test_fd, PIN_LONGTERM_TEST_START, &args), 0);
> +}
> +
> +void unpin_pages(void)
> +{
> + TEST_ASSERT_EQ(ioctl(gup_test_fd, PIN_LONGTERM_TEST_STOP), 0);
> +}
> +
> static u32 parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
> {
> u32 pcpu = atoi_non_negative("CPU number", cpu_str);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v3 0/2] tracing: Remove trace_printk.h from kernel.h
From: Sebastian Andrzej Siewior @ 2026-06-25 7:56 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov
In-Reply-To: <20260624081806.120105649@kernel.org>
On 2026-06-24 04:18:06 [-0400], Steven Rostedt wrote:
> Remove trace_printk.h by creating a trace_controls.h for those places that
> need access to tracing prototypes like tracing_off() and for the places that
> need trace_printk() directly, to have it included directly.
That sounds reasonable. Thank you for doing it.
Sebastian
^ permalink raw reply
* Re: [PATCH v8 39/46] KVM: selftests: Test conversion with elevated page refcount
From: Fuad Tabba @ 2026-06-25 8:04 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-39-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Add a selftest to verify that converting a shared guest_memfd page to a
> private page fails if the page has an elevated reference count.
>
> When KVM converts a shared page to a private one, it expects the page to
> have a reference count equal to the reference counts taken by the
> filemap. If another kernel subsystem holds a reference to the page, the
> conversion must be aborted.
>
> The test asserts that both bulk and single-page conversion attempts
> correctly fail with EAGAIN for the pinned page. After the page is unpinned,
> the test verifies that subsequent conversions succeed.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Not sure Sashiko's concern is worth it.
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../kvm/x86/guest_memfd_conversions_test.c | 56 ++++++++++++++++++++++
> 1 file changed, 56 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index 99b0023609670..4ebbd29029526 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -441,6 +441,62 @@ GMEM_CONVERSION_TEST_INIT_SHARED(forked_accesses)
> #undef TEST_STATE_AWAIT
> }
>
> +static void test_convert_to_private_fails(test_data_t *t, u64 pgoff,
> + size_t nr_pages,
> + u64 expected_error_offset)
> +{
> + /* +1 to make it anything but expected_error_offset. */
> + u64 error_offset = expected_error_offset + 1;
> + u64 offset = pgoff * page_size;
> + int ret;
> +
> + do {
> + ret = __gmem_set_private(t->gmem_fd, offset,
> + nr_pages * page_size, &error_offset);
> + } while (ret == -1 && errno == EINTR);
> + TEST_ASSERT(ret == -1 && errno == EAGAIN,
> + "Wanted EAGAIN on page %lu, got %d (ret = %d)", pgoff,
> + errno, ret);
> + TEST_ASSERT_EQ(error_offset, expected_error_offset);
> +}
> +
> +GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED(elevated_refcount, 4)
> +{
> + int i;
> +
> + pin_pages(t->mem + test_page * page_size, page_size);
> +
> + for (i = 0; i < nr_pages; i++)
> + test_shared(t, i, 0, 'A', 'B');
> +
> + /*
> + * Converting in bulk should fail as long any page in the range has
> + * unexpected refcounts.
> + */
> + test_convert_to_private_fails(t, 0, nr_pages, test_page * page_size);
> +
> + for (i = 0; i < nr_pages; i++) {
> + /*
> + * Converting page-wise should also fail as long any page in the
> + * range has unexpected refcounts.
> + */
> + if (i == test_page)
> + test_convert_to_private_fails(t, i, 1, test_page * page_size);
> + else
> + test_convert_to_private(t, i, 'B', 'C');
> + }
> +
> + unpin_pages();
> +
> + gmem_set_private(t->gmem_fd, 0, nr_pages * page_size);
> +
> + for (i = 0; i < nr_pages; i++) {
> + char expected = i == test_page ? 'B' : 'C';
> +
> + test_private(t, i, expected, 'D');
> + }
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v4 09/13] verification/rvgen: Delete __parse_constraint()
From: Gabriele Monaco @ 2026-06-25 8:21 UTC (permalink / raw)
To: Nam Cao
Cc: Steven Rostedt, Wander Lairson Costa, linux-trace-kernel,
linux-kernel
In-Reply-To: <b22a5a3822fe53afb8e2cf1df623a0e4c9ed5f49.1781847583.git.namcao@linutronix.de>
On Fri, 2026-06-19 at 07:52 +0200, Nam Cao wrote:
> All previous users of self.invariants and self.guards have been
> converted
> to the Lark parser, delete __parse_constraints() and its associates.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
This one was missing the
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
The series looks ready for inclusion to me, thanks!
Gabriele
> ---
> tools/verification/rvgen/rvgen/dot2k.py | 67 ++---------------------
> --
> 1 file changed, 4 insertions(+), 63 deletions(-)
>
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index 4ea1ecc55c80..f1f5fa297adb 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -177,7 +177,6 @@ class ha2k(dot2k):
> if not self.is_hybrid_automata():
> raise AutomataError("Detected deterministic automaton,
> use the 'da' class")
> self.trace_h = self._read_template_file("trace_hybrid.h")
> - self.__parse_constraints()
> self.has_invariant = False
> self.has_guard = False
> for state in self._states:
> @@ -308,64 +307,6 @@ class ha2k(dot2k):
> separator = "\n\t\t " if sum(len(r) for r in rules) >
> 80 else " "
> return ["res = " + separator.join(rules) + ";"]
>
> - def __validate_constraint(self, key: tuple[int, int] | int,
> constr: str,
> - rule, reset) -> None:
> - # event constrains are tuples and allow both rules and reset
> - # state constraints are only used for expirations (e.g.
> clk<N)
> - if self.is_event_constraint(key):
> - if not rule and not reset:
> - raise AutomataError("Unrecognised event constraint "
> -
> f"({self.states[key[0]]}/{self.events[key[1]]}: {constr})")
> - if rule and (rule["env"] in self.env_types and
> - rule["env"] not in self.env_stored):
> - raise AutomataError("Clocks in hybrid automata
> always require a storage"
> - f" ({rule["env"]})")
> - else:
> - if not rule:
> - raise AutomataError("Unrecognised state constraint "
> - f"({self.states[key]}:
> {constr})")
> - if rule["env"] not in self.env_stored:
> - raise AutomataError("State constraints always
> require a storage "
> - f"({rule["env"]})")
> - if rule["op"] not in ["<", "<="]:
> - raise AutomataError("State constraints must be clock
> expirations like"
> - f" clk<N ({rule.string})")
> -
> - def __parse_constraints(self) -> None:
> - self.guards: dict[_EventConstraintKey, str] = {}
> - self.invariants: dict[_StateConstraintKey, str] = {}
> - for key, constraint in self.constraints.items():
> - rules = []
> - resets = []
> - for c, sep in self._split_constraint_expr(constraint):
> - rule = self.constraint_rule.search(c)
> - reset = self.constraint_reset.search(c)
> - self.__validate_constraint(key, c, rule, reset)
> - if rule:
> - value = rule["val"]
> - value_len = len(rule["val"])
> - unit = None
> - if rule.groupdict().get("unit"):
> - value_len += len(rule["unit"])
> - unit = rule["unit"]
> - c = c[:-(value_len)]
> - value = self.__adjust_value(value, unit)
> - if self.is_event_constraint(key):
> - c = self.__parse_single_constraint(rule,
> value)
> - if sep:
> - c += f" {sep}"
> - else:
> - c = self.__parse_timer_constraint(rule,
> value)
> - rules.append(c)
> - if reset:
> - c = f"ha_reset_env(ha_mon,
> {reset["env"]}{self.enum_suffix}, time_ns)"
> - resets.append(c)
> - if self.is_event_constraint(key):
> - res = self.__format_guard_rules(rules) + resets
> - self.guards[key] = ";".join(res)
> - else:
> - self.invariants[key] = rules[0]
> -
> def __fill_verify_invariants_func(self) -> list[str]:
> if not self.has_invariant:
> return []
> @@ -490,15 +431,15 @@ f"""static bool ha_verify_constraint(struct
> ha_monitor *ha_mon,
> \t\t\t\t enum {self.enum_states_def} next_state, u64 time_ns)
> {{""")
>
> - if self.invariants:
> + if self.has_invariant:
> buff.append("\tif (!ha_verify_invariants(ha_mon,
> curr_state, "
> "event, next_state, time_ns))\n\t\treturn
> false;\n")
>
> - if self.guards:
> + if self.has_guard:
> buff.append("\tif (!ha_verify_guards(ha_mon, curr_state,
> event, "
> "next_state, time_ns))\n\t\treturn
> false;\n")
>
> - if self.invariants:
> + if self.has_invariant:
> buff.append("\tha_setup_invariants(ha_mon, curr_state,
> event, next_state, time_ns);\n")
>
> buff.append("\treturn true;\n}\n")
> @@ -575,7 +516,7 @@ f"""static bool ha_verify_constraint(struct
> ha_monitor *ha_mon,
> return self.__fill_hybrid_get_reset_functions() +
> self.__fill_constr_func()
>
> def _fill_timer_type(self) -> list:
> - if self.invariants:
> + if self.has_invariant:
> return [
> "/* XXX: If the monitor has several instances,
> consider HA_TIMER_WHEEL */",
> "#define HA_TIMER_TYPE HA_TIMER_HRTIMER"
^ permalink raw reply
* Re: [PATCH v8 40/46] KVM: selftests: Reset shared memory after hole-punching
From: Fuad Tabba @ 2026-06-25 8:46 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-40-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> private_mem_conversions_test used to reset the shared memory that was used
> for the test to an initial pattern at the end of each test iteration. Then,
> it would punch out the pages, which would zero memory.
>
> Without in-place conversion, the resetting would write shared memory, and
> hole-punching will zero private memory, hence resetting the test to the
> state at the beginning of the for loop.
>
> With in-place conversion, resetting writes memory as shared, and
> hole-punching zeroes the same physical memory, hence undoing the reset
> done before the hole punch.
>
> Move the resetting after the hole-punching, and reset the entire
> PER_CPU_DATA_SIZE instead of just the tested range.
>
> With in-place conversion, this zeroes and then resets the same physical
> memory. Without in-place conversion, the private memory is zeroed, and the
> shared memory is reset to init_p.
>
> This is sufficient since at each test stage, the memory is assumed to start
> as shared, and private memory is always assumed to start zeroed. Conversion
> zeroes memory, so the future test stages will work as expected.
>
> Fixes: 43f623f350ce1 ("KVM: selftests: Add x86-only selftest for private memory conversions")
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/x86/private_mem_conversions_test.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> index 861baff201e78..289ad10063fca 100644
> --- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> @@ -202,15 +202,18 @@ static void guest_test_explicit_conversion(u64 base_gpa, bool do_fallocate)
> guest_sync_shared(gpa, size, p3, p4);
> memcmp_g(gpa, p4, size);
>
> - /* Reset the shared memory back to the initial pattern. */
> - memset((void *)gpa, init_p, size);
> -
> /*
> * Free (via PUNCH_HOLE) *all* private memory so that the next
> * iteration starts from a clean slate, e.g. with respect to
> * whether or not there are pages/folios in guest_mem.
> */
> guest_map_shared(base_gpa, PER_CPU_DATA_SIZE, true);
> +
> + /*
> + * Hole-punching above zeroed private memory. Reset shared
> + * memory in preparation for the next GUEST_STAGE.
> + */
> + memset((void *)base_gpa, init_p, PER_CPU_DATA_SIZE);
> }
> }
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 41/46] KVM: selftests: Provide function to look up guest_memfd details from gpa
From: Fuad Tabba @ 2026-06-25 8:58 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-41-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Introduce a new helper, kvm_gpa_to_guest_memfd(), to find the
> guest_memfd-related details of a memory region that contains a given guest
> physical address (GPA).
>
> The function returns the file descriptor for the memfd, the offset into
> the file that corresponds to the GPA, and the number of bytes remaining
> in the region from that GPA.
>
> kvm_gpa_to_guest_memfd() was factored out from vm_guest_mem_fallocate();
> refactor vm_guest_mem_fallocate() to use the new helper.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/include/kvm_util.h | 3 +++
> tools/testing/selftests/kvm/lib/kvm_util.c | 37 ++++++++++++++++----------
> 2 files changed, 26 insertions(+), 14 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 79ab64ac8b869..3a6b1fa7f26ef 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -428,6 +428,9 @@ static inline void vm_enable_cap(struct kvm_vm *vm, u32 cap, u64 arg0)
> vm_ioctl(vm, KVM_ENABLE_CAP, &enable_cap);
> }
>
> +int kvm_gpa_to_guest_memfd(struct kvm_vm *vm, gpa_t gpa, off_t *fd_offset,
> + size_t *nr_bytes);
> +
> /*
> * KVM_SET_MEMORY_ATTRIBUTES{,2} overwrites _all_ attributes. These
> * flows need significant enhancements to support multiple attributes.
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 524ef97d634bf..0b2256ea65ff9 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1305,27 +1305,20 @@ void vm_guest_mem_fallocate(struct kvm_vm *vm, u64 base, u64 size,
> bool punch_hole)
> {
> const int mode = FALLOC_FL_KEEP_SIZE | (punch_hole ? FALLOC_FL_PUNCH_HOLE : 0);
> - struct userspace_mem_region *region;
> u64 end = base + size;
> - gpa_t gpa, len;
> off_t fd_offset;
> - int ret;
> + int fd, ret;
> + size_t len;
> + gpa_t gpa;
>
> for (gpa = base; gpa < end; gpa += len) {
> - u64 offset;
> -
> - region = userspace_mem_region_find(vm, gpa, gpa);
> - TEST_ASSERT(region && region->region.flags & KVM_MEM_GUEST_MEMFD,
> - "Private memory region not found for GPA 0x%lx", gpa);
> + fd = kvm_gpa_to_guest_memfd(vm, gpa, &fd_offset, &len);
> + len = min(end - gpa, len);
>
> - offset = gpa - region->region.guest_phys_addr;
> - fd_offset = region->region.guest_memfd_offset + offset;
> - len = min_t(u64, end - gpa, region->region.memory_size - offset);
> -
> - ret = fallocate(region->region.guest_memfd, mode, fd_offset, len);
> + ret = fallocate(fd, mode, fd_offset, len);
> TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = %lu), fd = %d, mode = %x, offset = %lx",
> punch_hole ? "punch hole" : "allocate", gpa, len,
> - region->region.guest_memfd, mode, fd_offset);
> + fd, mode, fd_offset);
> }
> }
>
> @@ -1662,6 +1655,22 @@ void *addr_gpa2alias(struct kvm_vm *vm, gpa_t gpa)
> return (void *) ((uintptr_t) region->host_alias + offset);
> }
>
> +int kvm_gpa_to_guest_memfd(struct kvm_vm *vm, gpa_t gpa, off_t *fd_offset,
> + size_t *nr_bytes)
> +{
> + struct userspace_mem_region *region;
> + gpa_t gpa_offset;
> +
> + region = userspace_mem_region_find(vm, gpa, gpa);
> + TEST_ASSERT(region && region->region.flags & KVM_MEM_GUEST_MEMFD,
> + "guest_memfd memory region not found for GPA 0x%lx", gpa);
> +
> + gpa_offset = gpa - region->region.guest_phys_addr;
> + *fd_offset = region->region.guest_memfd_offset + gpa_offset;
> + *nr_bytes = region->region.memory_size - gpa_offset;
> + return region->region.guest_memfd;
> +}
> +
> /* Create an interrupt controller chip for the specified VM. */
> void vm_create_irqchip(struct kvm_vm *vm)
> {
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 42/46] KVM: selftests: Provide common function to set memory attributes
From: Fuad Tabba @ 2026-06-25 9:09 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-42-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Introduce vm_mem_set_memory_attributes(), which handles setting of memory
> attributes for a range of guest physical addresses, regardless of whether
> the attributes should be set via guest_memfd or via the memory attributes
> at the VM level.
>
> Refactor existing vm_mem_set_{shared,private} functions to use the new
> function. Opportunistically update the size parameter to use size_t instead
> of u64.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/include/kvm_util.h | 46 +++++++++++++++++++-------
> 1 file changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 3a6b1fa7f26ef..db1442da21bb1 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -454,18 +454,6 @@ static inline void vm_set_memory_attributes(struct kvm_vm *vm, gpa_t gpa,
> vm_ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, &attr);
> }
>
> -static inline void vm_mem_set_private(struct kvm_vm *vm, gpa_t gpa,
> - u64 size)
> -{
> - vm_set_memory_attributes(vm, gpa, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> -}
> -
> -static inline void vm_mem_set_shared(struct kvm_vm *vm, gpa_t gpa,
> - u64 size)
> -{
> - vm_set_memory_attributes(vm, gpa, size, 0);
> -}
> -
> static inline int __gmem_set_memory_attributes(int fd, u64 offset,
> size_t size, u64 attributes,
> u64 *error_offset)
> @@ -532,6 +520,40 @@ static inline void gmem_set_shared(int fd, u64 offset, size_t size)
> gmem_set_memory_attributes(fd, offset, size, 0);
> }
>
> +static inline void vm_mem_set_memory_attributes(struct kvm_vm *vm, gpa_t gpa,
> + size_t size, u64 attrs)
> +{
> + if (kvm_has_gmem_attributes) {
> + gpa_t end = gpa + size;
> + off_t fd_offset;
> + gpa_t addr;
> + size_t len;
> + int fd;
> +
> + for (addr = gpa; addr < end; addr += len) {
> + fd = kvm_gpa_to_guest_memfd(vm, addr, &fd_offset, &len);
> + len = min(end - addr, len);
> +
> + gmem_set_memory_attributes(fd, fd_offset, len, attrs);
> + }
> + } else {
> + vm_set_memory_attributes(vm, gpa, size, attrs);
> + }
> +}
> +
> +static inline void vm_mem_set_private(struct kvm_vm *vm, gpa_t gpa,
> + size_t size)
> +{
> + vm_mem_set_memory_attributes(vm, gpa, size,
> + KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +}
> +
> +static inline void vm_mem_set_shared(struct kvm_vm *vm, gpa_t gpa,
> + size_t size)
> +{
> + vm_mem_set_memory_attributes(vm, gpa, size, 0);
> +}
> +
> void vm_guest_mem_fallocate(struct kvm_vm *vm, gpa_t gpa, u64 size,
> bool punch_hole);
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 43/46] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot
From: Fuad Tabba @ 2026-06-25 9:20 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-43-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Check that a valid fd provided to mmap() must be accompanied by MAP_SHARED.
>
> With an invalid fd (usually used for anonymous mappings), there are no
> constraints on mmap() flags.
>
> Add this check to make sure that when a guest_memfd is used as region->fd,
> the flag provided to mmap() will include MAP_SHARED.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> [Rephrase assertion message.]
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/lib/kvm_util.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 0b2256ea65ff9..6b304e8a0e0d5 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1110,6 +1110,9 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
> src_type == VM_MEM_SRC_SHARED_HUGETLB);
> }
>
> + TEST_ASSERT(region->fd == -1 || backing_src_is_shared(src_type),
> + "A valid fd provided to mmap() must be accompanied by MAP_SHARED.");
> +
> region->mmap_start = __kvm_mmap(region->mmap_size, PROT_READ | PROT_WRITE,
> vm_mem_backing_src_alias(src_type)->flag,
> region->fd, mmap_offset);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 44/46] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe
From: Fuad Tabba @ 2026-06-25 9:30 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-44-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> The TEST_EXPECT_SIGBUS macro is not thread-safe as it uses a global
> sigjmp_buf and installs a global SIGBUS signal handler. If multiple threads
> execute the macro concurrently, they will race on installing the signal
> handler and stomp on other threads' jump buffers, leading to incorrect test
> behavior.
>
> Make TEST_EXPECT_SIGBUS thread-safe with the following changes:
>
> Share the KVM tests' global signal handler. sigaction() applies to all
> threads; without sharing a global signal handler, one thread may have
> removed the signal handler that another thread added, hence leading to
> unexpected signals.
>
> The alternative of layering signal handlers was considered, but calling
> sigaction() within TEST_EXPECT_SIGBUS() necessarily creates a race. To
> avoid adding new setup and teardown routines to do sigaction() and keep
> usage of TEST_EXPECT_SIGBUS() simple, share the KVM tests' global signal
> handler.
>
> Opportunistically rename report_unexpected_signal to
> catchall_signal_handler.
>
> To continue to only expect SIGBUS within specific regions of code, use a
> thread-specific variable, expecting_sigbus, to replace installing and
> removing signal handlers.
>
> Make the execution environment for the thread, sigjmp_buf, a
> thread-specific variable.
>
> As part of TEST_EXPECT_SIGBUS(), assert the prerequisite for this setup,
> that the current signal handler is the catchall_signal_handler.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/include/test_util.h | 32 +++++++++++++------------
> tools/testing/selftests/kvm/lib/kvm_util.c | 18 ++++++++++----
> tools/testing/selftests/kvm/lib/test_util.c | 7 ------
> 3 files changed, 30 insertions(+), 27 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
> index 51287fac8138a..bd75162ec868d 100644
> --- a/tools/testing/selftests/kvm/include/test_util.h
> +++ b/tools/testing/selftests/kvm/include/test_util.h
> @@ -82,21 +82,23 @@ do { \
> __builtin_unreachable(); \
> } while (0)
>
> -extern sigjmp_buf expect_sigbus_jmpbuf;
> -void expect_sigbus_handler(int signum);
> -
> -#define TEST_EXPECT_SIGBUS(action) \
> -do { \
> - struct sigaction sa_old, sa_new = { \
> - .sa_handler = expect_sigbus_handler, \
> - }; \
> - \
> - sigaction(SIGBUS, &sa_new, &sa_old); \
> - if (sigsetjmp(expect_sigbus_jmpbuf, 1) == 0) { \
> - action; \
> - TEST_FAIL("'%s' should have triggered SIGBUS", #action); \
> - } \
> - sigaction(SIGBUS, &sa_old, NULL); \
> +extern __thread sigjmp_buf expect_sigbus_jmpbuf;
> +extern __thread volatile sig_atomic_t expecting_sigbus;
> +extern void catchall_signal_handler(int signum);
> +
> +#define TEST_EXPECT_SIGBUS(action) \
> +do { \
> + struct sigaction __sa = {}; \
> + \
> + TEST_ASSERT_EQ(sigaction(SIGBUS, NULL, &__sa), 0); \
> + TEST_ASSERT_EQ(__sa.sa_handler, &catchall_signal_handler); \
> + \
> + expecting_sigbus = true; \
> + if (sigsetjmp(expect_sigbus_jmpbuf, 1) == 0) { \
> + action; \
> + TEST_FAIL("'%s' should have triggered SIGBUS", #action);\
> + } \
> + expecting_sigbus = false; \
> } while (0)
>
> size_t parse_size(const char *size);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 6b304e8a0e0d5..b4f104436875b 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -2292,13 +2292,20 @@ __weak void kvm_selftest_arch_init(void)
> {
> }
>
> -static void report_unexpected_signal(int signum)
> +__thread sigjmp_buf expect_sigbus_jmpbuf;
> +__thread volatile sig_atomic_t expecting_sigbus;
> +
> +void catchall_signal_handler(int signum)
> {
> + switch (signum) {
> + case SIGBUS: {
> + if (expecting_sigbus)
> + siglongjmp(expect_sigbus_jmpbuf, 1);
> +
> + TEST_FAIL("Unexpected SIGBUS (%d)\n", signum);
> + }
> #define KVM_CASE_SIGNUM(sig) \
> case sig: TEST_FAIL("Unexpected " #sig " (%d)\n", signum)
> -
> - switch (signum) {
> - KVM_CASE_SIGNUM(SIGBUS);
> KVM_CASE_SIGNUM(SIGSEGV);
> KVM_CASE_SIGNUM(SIGILL);
> KVM_CASE_SIGNUM(SIGFPE);
> @@ -2310,12 +2317,13 @@ static void report_unexpected_signal(int signum)
> void __attribute((constructor)) kvm_selftest_init(void)
> {
> struct sigaction sig_sa = {
> - .sa_handler = report_unexpected_signal,
> + .sa_handler = catchall_signal_handler,
> };
>
> /* Tell stdout not to buffer its content. */
> setbuf(stdout, NULL);
>
> + expecting_sigbus = false;
> sigaction(SIGBUS, &sig_sa, NULL);
> sigaction(SIGSEGV, &sig_sa, NULL);
> sigaction(SIGILL, &sig_sa, NULL);
> diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
> index bab1bd2b775b6..30eb701e4becd 100644
> --- a/tools/testing/selftests/kvm/lib/test_util.c
> +++ b/tools/testing/selftests/kvm/lib/test_util.c
> @@ -18,13 +18,6 @@
>
> #include "test_util.h"
>
> -sigjmp_buf expect_sigbus_jmpbuf;
> -
> -void __attribute__((used)) expect_sigbus_handler(int signum)
> -{
> - siglongjmp(expect_sigbus_jmpbuf, 1);
> -}
> -
> /*
> * Random number generator that is usable from guest code. This is the
> * Park-Miller LCG using standard constants.
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 45/46] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd
From: Fuad Tabba @ 2026-06-25 9:43 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-45-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Update the private memory conversions selftest to also test conversions
> that are done "in-place" via per-guest_memfd memory attributes. In-place
> conversions require the host to be able to mmap() the guest_memfd so that
> the host and guest can share the same backing physical memory.
>
> This includes several updates, that are conditioned on the system
> supporting per-guest_memfd attributes (kvm_has_gmem_attributes):
>
> 1. Set up guest_memfd requesting MMAP and INIT_SHARED.
>
> 2. With in-place conversions, the host's mapping points directly to the
> guest's memory. When the guest converts a region to private, host access
> to that region is blocked. Update the test to expect a SIGBUS when
> attempting to access the host virtual address (HVA) of private memory.
>
> 3. Use vm_mem_set_memory_attributes(), which chooses how to set memory
> attributes based on whether kvm_has_gmem_attributes.
>
> Restrict the test to using VM_MEM_SRC_SHMEM because guest_memfd's required
> mmap() flags and page sizes happens to align with those of
> VM_MEM_SRC_SHMEM. As long as VM_MEM_SRC_SHMEM is used for src_type,
> vm_mem_add() works as intended.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../kvm/x86/private_mem_conversions_test.c | 44 ++++++++++++++++++----
> 1 file changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> index 289ad10063fca..4308c67952310 100644
> --- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> @@ -306,9 +306,12 @@ static void handle_exit_hypercall(struct kvm_vcpu *vcpu)
> if (do_fallocate)
> vm_guest_mem_fallocate(vm, gpa, size, map_shared);
>
> - if (set_attributes)
> - vm_set_memory_attributes(vm, gpa, size,
> - map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE);
> + if (set_attributes) {
> + u64 attrs = map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +
> + vm_mem_set_memory_attributes(vm, gpa, size, attrs);
> + }
> +
> run->hypercall.ret = 0;
> }
>
> @@ -352,8 +355,20 @@ static void *__test_mem_conversions(void *__vcpu)
> size_t nr_bytes = min_t(size_t, vm->page_size, size - i);
> u8 *hva = addr_gpa2hva(vm, gpa + i);
>
> - /* In all cases, the host should observe the shared data. */
> - memcmp_h(hva, gpa + i, uc.args[3], nr_bytes);
> + /*
> + * When using per-guest_memfd memory attributes,
> + * i.e. in-place conversion, host accesses will
> + * point at guest memory and should SIGBUS when
> + * guest memory is private. When using per-VM
> + * attributes, i.e. separate backing for shared
> + * vs. private, the host should always observe
> + * the shared data.
> + */
> + if (kvm_has_gmem_attributes &&
> + uc.args[0] == SYNC_PRIVATE)
> + TEST_EXPECT_SIGBUS(READ_ONCE(*hva));
> + else
> + memcmp_h(hva, gpa + i, uc.args[3], nr_bytes);
>
> /* For shared, write the new pattern to guest memory. */
> if (uc.args[0] == SYNC_SHARED)
> @@ -382,6 +397,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, u32 nr_v
> const size_t slot_size = memfd_size / nr_memslots;
> struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
> pthread_t threads[KVM_MAX_VCPUS];
> + u64 gmem_flags;
> struct kvm_vm *vm;
> int memfd, i;
>
> @@ -397,12 +413,17 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, u32 nr_v
>
> vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));
>
> - memfd = vm_create_guest_memfd(vm, memfd_size, 0);
> + if (kvm_has_gmem_attributes)
> + gmem_flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
> + else
> + gmem_flags = 0;
> +
> + memfd = vm_create_guest_memfd(vm, memfd_size, gmem_flags);
>
> for (i = 0; i < nr_memslots; i++)
> vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
> BASE_DATA_SLOT + i, slot_size / vm->page_size,
> - KVM_MEM_GUEST_MEMFD, memfd, slot_size * i, 0);
> + KVM_MEM_GUEST_MEMFD, memfd, slot_size * i, gmem_flags);
>
> for (i = 0; i < nr_vcpus; i++) {
> gpa_t gpa = BASE_DATA_GPA + i * per_cpu_size;
> @@ -452,17 +473,24 @@ static void usage(const char *cmd)
>
> int main(int argc, char *argv[])
> {
> - enum vm_mem_backing_src_type src_type = DEFAULT_VM_MEM_SRC;
> + enum vm_mem_backing_src_type src_type;
> u32 nr_memslots = 1;
> u32 nr_vcpus = 1;
> int opt;
>
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> + src_type = kvm_has_gmem_attributes ? VM_MEM_SRC_SHMEM :
> + DEFAULT_VM_MEM_SRC;
> +
> while ((opt = getopt(argc, argv, "hm:s:n:")) != -1) {
> switch (opt) {
> case 's':
> src_type = parse_backing_src_type(optarg);
> + TEST_ASSERT(!kvm_has_gmem_attributes ||
> + src_type == VM_MEM_SRC_SHMEM,
> + "Testing in-place conversions, only %s mem_type supported\n",
> + vm_mem_backing_src_alias(VM_MEM_SRC_SHMEM)->name);
> break;
> case 'n':
> nr_vcpus = atoi_positive("nr_vcpus", optarg);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 46/46] KVM: selftests: Update private memory exits test to work with per-gmem attributes
From: Fuad Tabba @ 2026-06-25 9:56 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-46-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Skip setting memory to private in the private memory exits test when using
> per-gmem memory attributes, as memory is initialized to private by default
> for guest_memfd, and using vm_mem_set_private() on a guest_memfd instance
> requires creating guest_memfd with GUEST_MEMFD_FLAG_MMAP (which is totally
> doable, but would need to be conditional and is ultimately unnecessary).
>
> Expect an emulated MMIO instead of a memory fault exit when attributes are
> per-gmem, as deleting the memslot effectively drops the private status,
> i.e. the GPA becomes shared and thus supports emulated MMIO.
>
> Skip the "memslot not private" test entirely, as private vs. shared state
> for x86 software-protected VMs comes from the memory attributes themselves,
> and so when doing in-place conversions there can never be a disconnect
> between the expected and actual states.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../selftests/kvm/x86/private_mem_kvm_exits_test.c | 36 ++++++++++++++++++----
> 1 file changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c b/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
> index 10db9fe6d9063..70ed16066c63e 100644
> --- a/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
> +++ b/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
> @@ -62,8 +62,9 @@ static void test_private_access_memslot_deleted(void)
>
> virt_map(vm, EXITS_TEST_GVA, EXITS_TEST_GPA, EXITS_TEST_NPAGES);
>
> - /* Request to access page privately */
> - vm_mem_set_private(vm, EXITS_TEST_GPA, EXITS_TEST_SIZE);
> + /* Request to access page privately. */
> + if (!kvm_has_gmem_attributes)
> + vm_mem_set_private(vm, EXITS_TEST_GPA, EXITS_TEST_SIZE);
>
> pthread_create(&vm_thread, NULL,
> (void *(*)(void *))run_vcpu_get_exit_reason,
> @@ -74,10 +75,26 @@ static void test_private_access_memslot_deleted(void)
> pthread_join(vm_thread, &thread_return);
> exit_reason = (u32)(u64)thread_return;
>
> - TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MEMORY_FAULT);
> - TEST_ASSERT_EQ(vcpu->run->memory_fault.flags, KVM_MEMORY_EXIT_FLAG_PRIVATE);
> - TEST_ASSERT_EQ(vcpu->run->memory_fault.gpa, EXITS_TEST_GPA);
> - TEST_ASSERT_EQ(vcpu->run->memory_fault.size, EXITS_TEST_SIZE);
> + /*
> + * If attributes are tracked per-gmem, deleting the memslot that points
> + * at the gmem instance effectively makes the memory shared, and so the
> + * read should trigger emulated MMIO.
> + *
> + * If attributes are tracked per-VM, deleting the memslot shouldn't
> + * affect the private attribute, and so KVM should generate a memory
> + * fault exit (emulated MMIO on private GPAs is disallowed).
> + */
> + if (kvm_has_gmem_attributes) {
> + TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MMIO);
> + TEST_ASSERT_EQ(vcpu->run->mmio.phys_addr, EXITS_TEST_GPA);
> + TEST_ASSERT_EQ(vcpu->run->mmio.len, sizeof(u64));
> + TEST_ASSERT_EQ(vcpu->run->mmio.is_write, false);
> + } else {
> + TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MEMORY_FAULT);
> + TEST_ASSERT_EQ(vcpu->run->memory_fault.flags, KVM_MEMORY_EXIT_FLAG_PRIVATE);
> + TEST_ASSERT_EQ(vcpu->run->memory_fault.gpa, EXITS_TEST_GPA);
> + TEST_ASSERT_EQ(vcpu->run->memory_fault.size, EXITS_TEST_SIZE);
> + }
>
> kvm_vm_free(vm);
> }
> @@ -88,6 +105,13 @@ static void test_private_access_memslot_not_private(void)
> struct kvm_vcpu *vcpu;
> u32 exit_reason;
>
> + /*
> + * Accessing non-private memory as private with a software-protected VM
> + * isn't possible when doing in-place conversions.
> + */
> + if (kvm_has_gmem_attributes)
> + return;
> +
> vm = vm_create_shape_with_one_vcpu(protected_vm_shape, &vcpu,
> guest_repeatedly_read);
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH 6.6.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-25 10:41 UTC (permalink / raw)
To: stable
Cc: Sasha Levin, Bjoern Doebel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel, linux-kernel, Mathieu Desnoyers,
David Howells
In-Reply-To: <20260624122258.2476991-1-doebel@amazon.de>
> [PATCH 6.6.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
Queued for 6.6, thanks! 6.1 is queued too; the 5.15 and 5.10 versions
need a respin - see my replies on those threads.
--
Thanks,
Sasha
^ permalink raw reply
* Re: [PATCH 6.1.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-25 10:41 UTC (permalink / raw)
To: stable
Cc: Sasha Levin, Bjoern Doebel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel, linux-kernel, Mathieu Desnoyers,
David Howells
In-Reply-To: <20260624122328.2477272-1-doebel@amazon.de>
> [PATCH 6.1.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
Queued for 6.1, thanks! (6.6 is queued too; 5.15/5.10 need a respin -
see those threads.)
--
Thanks,
Sasha
^ permalink raw reply
* Re: [PATCH 5.15.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-25 10:41 UTC (permalink / raw)
To: stable
Cc: Sasha Levin, Bjoern Doebel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel, linux-kernel, Mathieu Desnoyers,
David Howells
In-Reply-To: <20260624122351.2477592-1-doebel@amazon.de>
> [PATCH 5.15.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
I had to drop this one for 5.15. The upstream guard(raw_spinlock_irqsave)
conversion in ring_buffer_read_start() introduces a new
-Wdeclaration-after-statement warning on 5.15 (the guard variable ends up after
a statement), which the build flags as an
error there.
Could you respin a warning-free version for 5.15 (and 5.10, which has the same
problem)? E.g. hoisting the declaration or keeping the explicit
raw_spin_lock/unlock instead of guard() on these older trees. 6.6 and 6.1 are
already queued.
--
Thanks,
Sasha
^ permalink raw reply
* Re: [PATCH 5.10.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-25 10:41 UTC (permalink / raw)
To: stable
Cc: Sasha Levin, Bjoern Doebel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel, linux-kernel, Mathieu Desnoyers,
David Howells
In-Reply-To: <20260624122413.2477871-1-doebel@amazon.de>
> [PATCH 5.10.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
Same as the 5.15 one - I had to drop this for 5.10. The
guard(raw_spinlock_irqsave) conversion triggers a new
-Wdeclaration-after-statement warning in ring_buffer_read_start() on this tree.
Please respin a warning-free version for 5.10/5.15. 6.6 and 6.1 are queued.
--
Thanks,
Sasha
^ permalink raw reply
* [PATCH v4 0/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Steven Rostedt @ 2026-06-25 10:40 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
Remove trace_printk.h by creating a trace_controls.h for those places that
need access to tracing prototypes like tracing_off() and for the places that
need trace_printk() directly, to have it included directly.
Changse since v3: https://lore.kernel.org/all/20260624081806.120105649@kernel.org/
- Always include trace_controls.h in rcu.h (kernel test robot)
There are other configs that may include tracing_off() in rcu.h besides
the one that had the include of trace_controls.h. Just always include
it in that header to be safe.
Steven Rostedt (2):
tracing: Move non-trace_printk prototypes into trace_controls.h
tracing: Remove trace_printk.h from kernel.h
----
arch/powerpc/kvm/book3s_xics.c | 1 +
arch/powerpc/xmon/xmon.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/machine_kexec.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
drivers/gpu/drm/i915/i915_gem.h | 2 ++
drivers/hwtracing/stm/dummy_stm.c | 1 +
drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
drivers/tty/sysrq.c | 1 +
drivers/usb/early/xhci-dbc.c | 1 +
fs/ext4/inline.c | 1 +
include/linux/ftrace.h | 2 ++
include/linux/kernel.h | 1 -
include/linux/sunrpc/debug.h | 1 +
include/linux/trace_controls.h | 54 ++++++++++++++++++++++++++++++++
include/linux/trace_printk.h | 56 ++--------------------------------
kernel/debug/debug_core.c | 1 +
kernel/panic.c | 1 +
kernel/rcu/rcu.h | 1 +
kernel/rcu/rcutorture.c | 1 +
kernel/trace/ring_buffer_benchmark.c | 1 +
kernel/trace/trace.h | 1 +
kernel/trace/trace_benchmark.c | 1 +
lib/sys_info.c | 1 +
samples/fprobe/fprobe_example.c | 1 +
samples/ftrace/ftrace-direct-too.c | 1 -
samples/trace_printk/trace-printk.c | 1 +
27 files changed, 82 insertions(+), 55 deletions(-)
create mode 100644 include/linux/trace_controls.h
^ permalink raw reply
* [PATCH v4 2/2] tracing: Remove trace_printk.h from kernel.h
From: Steven Rostedt @ 2026-06-25 10:40 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
In-Reply-To: <20260625104007.041432666@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
There have been complaints about trace_printk.h causing more build time
for being in kernel.h if it changes. There is also an effort to clean up
kernel.h to have it not include unneeded header files. Move trace_printk.h
out of kernel.h and place it in the headers and C files that use it.
Link: https://lore.kernel.org/all/CAHk-=wikCBeVFjVXiY4o-oepdbjAoir5+TcAgtL12c4u1TpZLQ@mail.gmail.com/
Suggested-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
arch/powerpc/kvm/book3s_xics.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
drivers/gpu/drm/i915/i915_gem.h | 1 +
drivers/hwtracing/stm/dummy_stm.c | 1 +
drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
drivers/usb/early/xhci-dbc.c | 1 +
fs/ext4/inline.c | 1 +
include/linux/ftrace.h | 2 ++
include/linux/kernel.h | 1 -
include/linux/sunrpc/debug.h | 1 +
include/linux/trace_printk.h | 5 +++--
kernel/trace/ring_buffer_benchmark.c | 1 +
samples/fprobe/fprobe_example.c | 1 +
samples/ftrace/ftrace-direct-too.c | 1 -
samples/trace_printk/trace-printk.c | 1 +
15 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 74a44fa702b0..ef5eb596a56e 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -26,6 +26,7 @@
#if 1
#define XICS_DBG(fmt...) do { } while (0)
#else
+#include <linux/trace_printk.h>
#define XICS_DBG(fmt...) trace_printk(fmt)
#endif
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index b54ee4f25af1..f6f223090760 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -35,6 +35,7 @@
#define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GTT)
+#include <linux/trace_printk.h>
#define GTT_TRACE(...) trace_printk(__VA_ARGS__)
#else
#define GTT_TRACE(...)
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 1da8fb61c09e..f490052e8964 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -117,6 +117,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
#include <linux/trace_controls.h>
+#include <linux/trace_printk.h>
#define GEM_TRACE(...) trace_printk(__VA_ARGS__)
#define GEM_TRACE_ERR(...) do { \
pr_err(__VA_ARGS__); \
diff --git a/drivers/hwtracing/stm/dummy_stm.c b/drivers/hwtracing/stm/dummy_stm.c
index 38528ffdc0b3..7c5e48ebfb9f 100644
--- a/drivers/hwtracing/stm/dummy_stm.c
+++ b/drivers/hwtracing/stm/dummy_stm.c
@@ -8,6 +8,7 @@
*/
#undef DEBUG
+#include <linux/trace_printk.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
diff --git a/drivers/infiniband/hw/hfi1/trace_dbg.h b/drivers/infiniband/hw/hfi1/trace_dbg.h
index 58304b91380f..30df5e246586 100644
--- a/drivers/infiniband/hw/hfi1/trace_dbg.h
+++ b/drivers/infiniband/hw/hfi1/trace_dbg.h
@@ -103,6 +103,7 @@ __hfi1_trace_def(IOCTL);
*/
#ifdef HFI1_EARLY_DBG
+#include <linux/trace_printk.h>
#define hfi1_dbg_early(fmt, ...) \
trace_printk(fmt, ##__VA_ARGS__)
#else
diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index 41118bba9197..955c73bd601f 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -30,6 +30,7 @@ static struct xdbc_state xdbc;
static bool early_console_keep;
#ifdef XDBC_TRACE
+#include <linux/trace_printk.h>
#define xdbc_trace trace_printk
#else
static inline void xdbc_trace(const char *fmt, ...) { }
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 8045e4ff270c..0eff4a0c6a6c 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -934,6 +934,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
}
#ifdef INLINE_DIR_DEBUG
+#include <linux/trace_printk.h>
void ext4_show_inline_dir(struct inode *dir, struct buffer_head *bh,
void *inline_start, int inline_size)
{
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 02bc5027523a..b5336a81e619 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -8,6 +8,8 @@
#define _LINUX_FTRACE_H
#include <linux/trace_recursion.h>
+#include <linux/trace_controls.h>
+#include <linux/trace_printk.h>
#include <linux/trace_clock.h>
#include <linux/jump_label.h>
#include <linux/kallsyms.h>
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e5570a16cbb1..e87a40fbd152 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -31,7 +31,6 @@
#include <linux/build_bug.h>
#include <linux/sprintf.h>
#include <linux/static_call_types.h>
-#include <linux/trace_printk.h>
#include <linux/util_macros.h>
#include <linux/wordpart.h>
diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index ab61bed2f7af..7524f5d82fba 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -29,6 +29,7 @@ extern unsigned int nlm_debug;
# define ifdebug(fac) if (unlikely(rpc_debug & RPCDBG_##fac))
# if IS_ENABLED(CONFIG_SUNRPC_DEBUG_TRACE)
+# include <linux/trace_printk.h>
# define __sunrpc_printk(fmt, ...) trace_printk(fmt, ##__VA_ARGS__)
# else
# define __sunrpc_printk(fmt, ...) printk(KERN_DEFAULT fmt, ##__VA_ARGS__)
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
index a488ea9e9f85..74ce4f8995c4 100644
--- a/include/linux/trace_printk.h
+++ b/include/linux/trace_printk.h
@@ -1,11 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_TRACE_PRINTK_H
#define _LINUX_TRACE_PRINTK_H
+#if !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO)
-#include <linux/compiler_attributes.h>
#include <linux/instruction_pointer.h>
#include <linux/stddef.h>
#include <linux/stringify.h>
+#include <linux/stdarg.h>
#ifdef CONFIG_TRACING
static inline __printf(1, 2)
@@ -147,5 +148,5 @@ ftrace_vprintk(const char *fmt, va_list ap)
return 0;
}
#endif /* CONFIG_TRACING */
-
+#endif /* !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO) */
#endif
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 593e3b59e42e..2bb25caebb75 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -5,6 +5,7 @@
* Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
*/
#include <linux/ring_buffer.h>
+#include <linux/trace_printk.h>
#include <linux/completion.h>
#include <linux/kthread.h>
#include <uapi/linux/sched/types.h>
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index bfe98ce826f3..de81b9b4ca7d 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -12,6 +12,7 @@
#define pr_fmt(fmt) "%s: " fmt, __func__
+#include <linux/trace_printk.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fprobe.h>
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index bf2411aa6fd7..159190f4103f 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -1,6 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/module.h>
-
#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
diff --git a/samples/trace_printk/trace-printk.c b/samples/trace_printk/trace-printk.c
index cfc159580263..ff37aeb8523e 100644
--- a/samples/trace_printk/trace-printk.c
+++ b/samples/trace_printk/trace-printk.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/trace_printk.h>
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/irq_work.h>
--
2.53.0
^ permalink raw reply related
* [PATCH v4 1/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Steven Rostedt @ 2026-06-25 10:40 UTC (permalink / raw)
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
In-Reply-To: <20260625104007.041432666@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
Remove the prototypes of the code that is not associated with
trace_printk() from trace_printk.h.
These control functions as well as ftrace_dump() and trace_dump_stack()
are used in cases where things go wrong. The main use case is to do a
trace_dump_stack(); tracing_off(); ftrace_dump(); in a place that detected
that something went wrong, whereas, trace_printk() is added to normal code
during debugging and removed before committing upstream. The dump code is
fine to keep in production.
Suggested-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
Changes since v3: https://patch.msgid.link/20260624081948.147764194@kernel.org
- Move include out of #if statement in rcu.h
kernel test robot found other configs that could require the
control functions in rcu.h. Just always include it in that file.
arch/powerpc/xmon/xmon.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/machine_kexec.c | 1 +
drivers/gpu/drm/i915/i915_gem.h | 1 +
drivers/tty/sysrq.c | 1 +
include/linux/trace_controls.h | 54 ++++++++++++++++++++++++++++++++
include/linux/trace_printk.h | 51 ------------------------------
kernel/debug/debug_core.c | 1 +
kernel/panic.c | 1 +
kernel/rcu/rcu.h | 1 +
kernel/rcu/rcutorture.c | 1 +
kernel/trace/trace.h | 1 +
kernel/trace/trace_benchmark.c | 1 +
lib/sys_info.c | 1 +
14 files changed, 66 insertions(+), 51 deletions(-)
create mode 100644 include/linux/trace_controls.h
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index cb3a3244ae6f..2135f319e0dd 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -27,6 +27,7 @@
#include <linux/highmem.h>
#include <linux/security.h>
#include <linux/debugfs.h>
+#include <linux/trace_controls.h>
#include <asm/ptrace.h>
#include <asm/smp.h>
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 3c346b02ceb9..baac66cc4de4 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -22,6 +22,7 @@
#include <linux/debug_locks.h>
#include <linux/vmalloc.h>
#include <linux/secure_boot.h>
+#include <linux/trace_controls.h>
#include <asm/asm-extable.h>
#include <asm/machine.h>
#include <asm/diag.h>
diff --git a/arch/s390/kernel/machine_kexec.c b/arch/s390/kernel/machine_kexec.c
index baeb3dcfc1c8..33f9a89eb3ad 100644
--- a/arch/s390/kernel/machine_kexec.c
+++ b/arch/s390/kernel/machine_kexec.c
@@ -12,6 +12,7 @@
#include <linux/delay.h>
#include <linux/reboot.h>
#include <linux/ftrace.h>
+#include <linux/trace_controls.h>
#include <linux/debug_locks.h>
#include <linux/cpufeature.h>
#include <asm/guarded_storage.h>
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 20b3cb29cfff..1da8fb61c09e 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -116,6 +116,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
#endif
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
+#include <linux/trace_controls.h>
#define GEM_TRACE(...) trace_printk(__VA_ARGS__)
#define GEM_TRACE_ERR(...) do { \
pr_err(__VA_ARGS__); \
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index c2e4b31b699a..d3f72dc430b8 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -324,6 +324,7 @@ static const struct sysrq_key_op sysrq_showstate_blocked_op = {
};
#ifdef CONFIG_TRACING
+#include <linux/trace_controls.h>
#include <linux/ftrace.h>
static void sysrq_ftrace_dump(u8 key)
diff --git a/include/linux/trace_controls.h b/include/linux/trace_controls.h
new file mode 100644
index 000000000000..995b97e963b4
--- /dev/null
+++ b/include/linux/trace_controls.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TRACE_CONTROLS_H
+#define _LINUX_TRACE_CONTROLS_H
+
+
+/*
+ * General tracing related utility functions - trace_printk(),
+ * tracing_on/tracing_off and tracing_start()/tracing_stop
+ *
+ * Use tracing_on/tracing_off when you want to quickly turn on or off
+ * tracing. It simply enables or disables the recording of the trace events.
+ * This also corresponds to the user space /sys/kernel/tracing/tracing_on
+ * file, which gives a means for the kernel and userspace to interact.
+ * Place a tracing_off() in the kernel where you want tracing to end.
+ * From user space, examine the trace, and then echo 1 > tracing_on
+ * to continue tracing.
+ *
+ * tracing_stop/tracing_start has slightly more overhead. It is used
+ * by things like suspend to ram where disabling the recording of the
+ * trace is not enough, but tracing must actually stop because things
+ * like calling smp_processor_id() may crash the system.
+ *
+ * Most likely, you want to use tracing_on/tracing_off.
+ */
+enum ftrace_dump_mode {
+ DUMP_NONE,
+ DUMP_ALL,
+ DUMP_ORIG,
+ DUMP_PARAM,
+};
+
+#ifdef CONFIG_TRACING
+void tracing_on(void);
+void tracing_off(void);
+int tracing_is_on(void);
+void tracing_snapshot(void);
+void tracing_snapshot_alloc(void);
+void tracing_start(void);
+void tracing_stop(void);
+void trace_dump_stack(int skip);
+void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
+#else
+static inline void tracing_start(void) { }
+static inline void tracing_stop(void) { }
+static inline void tracing_on(void) { }
+static inline void tracing_off(void) { }
+static inline int tracing_is_on(void) { return 0; }
+static inline void tracing_snapshot(void) { }
+static inline void tracing_snapshot_alloc(void) { }
+static inline void trace_dump_stack(int skip) { }
+static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
+#endif
+
+#endif /* _LINUX_TRACE_CONTROLS_H */
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
index 3d54f440dccf..a488ea9e9f85 100644
--- a/include/linux/trace_printk.h
+++ b/include/linux/trace_printk.h
@@ -7,43 +7,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
-/*
- * General tracing related utility functions - trace_printk(),
- * tracing_on/tracing_off and tracing_start()/tracing_stop
- *
- * Use tracing_on/tracing_off when you want to quickly turn on or off
- * tracing. It simply enables or disables the recording of the trace events.
- * This also corresponds to the user space /sys/kernel/tracing/tracing_on
- * file, which gives a means for the kernel and userspace to interact.
- * Place a tracing_off() in the kernel where you want tracing to end.
- * From user space, examine the trace, and then echo 1 > tracing_on
- * to continue tracing.
- *
- * tracing_stop/tracing_start has slightly more overhead. It is used
- * by things like suspend to ram where disabling the recording of the
- * trace is not enough, but tracing must actually stop because things
- * like calling smp_processor_id() may crash the system.
- *
- * Most likely, you want to use tracing_on/tracing_off.
- */
-
-enum ftrace_dump_mode {
- DUMP_NONE,
- DUMP_ALL,
- DUMP_ORIG,
- DUMP_PARAM,
-};
-
#ifdef CONFIG_TRACING
-void tracing_on(void);
-void tracing_off(void);
-int tracing_is_on(void);
-void tracing_snapshot(void);
-void tracing_snapshot_alloc(void);
-
-extern void tracing_start(void);
-extern void tracing_stop(void);
-
static inline __printf(1, 2)
void ____trace_printk_check_format(const char *fmt, ...)
{
@@ -149,8 +113,6 @@ int __trace_printk(unsigned long ip, const char *fmt, ...);
extern int __trace_bputs(unsigned long ip, const char *str);
extern int __trace_puts(unsigned long ip, const char *str);
-extern void trace_dump_stack(int skip);
-
/*
* The double __builtin_constant_p is because gcc will give us an error
* if we try to allocate the static variable to fmt if it is not a
@@ -173,19 +135,7 @@ __ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap);
extern __printf(2, 0) int
__ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
-
-extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
#else
-static inline void tracing_start(void) { }
-static inline void tracing_stop(void) { }
-static inline void trace_dump_stack(int skip) { }
-
-static inline void tracing_on(void) { }
-static inline void tracing_off(void) { }
-static inline int tracing_is_on(void) { return 0; }
-static inline void tracing_snapshot(void) { }
-static inline void tracing_snapshot_alloc(void) { }
-
static inline __printf(1, 2)
int trace_printk(const char *fmt, ...)
{
@@ -196,7 +146,6 @@ ftrace_vprintk(const char *fmt, va_list ap)
{
return 0;
}
-static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
#endif /* CONFIG_TRACING */
#endif
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index b276504c1c6b..f9c83a470c98 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -27,6 +27,7 @@
#define pr_fmt(fmt) "KGDB: " fmt
+#include <linux/trace_controls.h>
#include <linux/pid_namespace.h>
#include <linux/clocksource.h>
#include <linux/serial_core.h>
diff --git a/kernel/panic.c b/kernel/panic.c
index 213725b612aa..1415e910371d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -9,6 +9,7 @@
* This function is used through-out the kernel (including mm and fs)
* to indicate a major problem.
*/
+#include <linux/trace_controls.h>
#include <linux/debug_locks.h>
#include <linux/sched/debug.h>
#include <linux/interrupt.h>
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index fa6d30ce73d1..735a80df0b30 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <trace/events/rcu.h>
+#include <linux/trace_controls.h>
/*
* Grace-period counter management.
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 882a158ada7b..76bf0184b267 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -39,6 +39,7 @@
#include <linux/srcu.h>
#include <linux/slab.h>
#include <linux/trace_clock.h>
+#include <linux/trace_controls.h>
#include <asm/byteorder.h>
#include <linux/torture.h>
#include <linux/vmalloc.h>
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..2537c33ddd49 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/once_lite.h>
#include <linux/ftrace_regs.h>
+#include <linux/trace_controls.h>
#include <linux/llist.h>
#include "pid_list.h"
diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
index e19c32f2a938..69cc39008c36 100644
--- a/kernel/trace/trace_benchmark.c
+++ b/kernel/trace/trace_benchmark.c
@@ -3,6 +3,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/trace_clock.h>
+#include <linux/trace_controls.h>
#define CREATE_TRACE_POINTS
#include "trace_benchmark.h"
diff --git a/lib/sys_info.c b/lib/sys_info.c
index f32a06ec9ed4..e3c9ca05601b 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -8,6 +8,7 @@
#include <linux/ftrace.h>
#include <linux/nmi.h>
#include <linux/sched/debug.h>
+#include <linux/trace_controls.h>
#include <linux/string.h>
#include <linux/sysctl.h>
--
2.53.0
^ permalink raw reply related
* Re: [PATCH v8 24/46] KVM: guest_memfd: Make in-place conversion the default
From: Yan Zhao @ 2026-06-25 10:57 UTC (permalink / raw)
To: Sean Christopherson, Ackerley Tng, aik, andrew.jones, binbin.wu,
brauner, chao.p.peng, david, jmattson, jthoughton, michael.roth,
oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
shivankg, steven.price, tabba, willy, wyihan, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen, Yuanchu Xie,
Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt, Kiryl Shutsemau,
Baoquan He, Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <ajyJhZcgfYFtGfS2@yzhao56-desk.sh.intel.com>
On Thu, Jun 25, 2026 at 09:51:01AM +0800, Yan Zhao wrote:
> On Wed, Jun 24, 2026 at 05:41:58PM -0700, Sean Christopherson wrote:
> > On Wed, Jun 24, 2026, Ackerley Tng wrote:
> > > Yan Zhao <yan.y.zhao@intel.com> writes:
> > > > With gmem_in_place_conversion=true, userspace can create guest_memfd without the
> > > > MMAP flag. In such cases, shared memory is allocated from different backends.
> > > > This means this module parameter only enables per-gmem memory attribute and does
> > > > not guarantee that gmem in-place conversion will actually occur.
> >
> > KVM module params are pretty much always about what KVM supports, not what is
> > guaranteed to happen.
> >
> > - enable_mmio_caching doesn't guarantee there will actually be MMIO SPTEs,
> > because maybe the guest never accesses emulated MMIO.
> > - enable_pmu doesn't guarantee VMs will get a PMU, because userspace may elect
> > not to advertise one.
> > - and so on and so forth...
> >
> > Yes, there's a small mental jump to get from "KVM supports in-place conversion"
> > to "I need to set memory attributes on the guest_memfd instance, not the VM",
> > but I don't see that as a big hurdle, certainly not in the long term. And once
> > the VMM code is written, I really do think most people are going to care about
> > whether or not KVM supports in-place conversion, not where PRIVATE is tracked.
> Sorry, I just saw this mail after posting my reply in [1].
>
> I'm ok with gmem_in_place_conversion=true just means KVM supports in-place
> conversion, while we can still create VMs with shared memory not from gmem.
Or what about "allow_gmem_in_place_conversion" ?
> Though it still feels a bit odd to require TDX huge pages to depend on
> gmem_in_place_conversion=true when shared memory is not currently allocated from
> gmem, it should become more natural over time once gmem supports in-place
> conversions for huge page.
>
> [1] https://lore.kernel.org/all/ajyCn0PnFtQK+Nka@yzhao56-desk.sh.intel.com
>
>
> > > > To avoid confusion, could we rename this module parameter to something more
> > > > accurate, such as gmem_memory_attribute?
> > >
> > > I asked Sean about this after getting some fixes off list. Sean said
> > > gmem_in_place_conversion is named for a host admin to use, and something
> > > like gmem_memory_attributes is too much implementation details for the
> > > admin.
> > >
> > > Sean, would you reconsider since Yan also asked? If the admin compiled
> > > the kernel knowing what CONFIG_KVM_VM_MEMORY_ATTRIBUTES means, then the
> > > admin would also be able to use a param like gmem_memory_attributes?
> >
> > No, because it's not all memory attributes, it's very specifically the PRIVATE
> > attribute that will get moved to guest_memfd. I don't want to pick a name that
> > will become stale and confusing when RWX attributes come along. The RWX bits
> > will be per-VM, while PRIVATE will be per-guest_memfd.
^ permalink raw reply
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Jani Nikula @ 2026-06-25 11:00 UTC (permalink / raw)
To: Kaitao Cheng, David Laight, Christian König,
David Hildenbrand (Arm), Alexei Starovoitov
Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
Alexander Viro, Christian Brauner, Daniel Borkmann,
Andrii Nakryiko, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
Paul E. McKenney, Shakeel Butt, David Howells, Simona Vetter,
Randy Dunlap, Luca Ceresoli, Philipp Stanner, linux-block,
linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel, io-uring,
audit, bpf, netdev, dri-devel, linux-perf-users,
linux-trace-kernel, kexec, live-patching, linux-modules,
linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
damon, llvm, Kaitao Cheng, Muchun Song
In-Reply-To: <0ed6b5c3-e955-46e2-9fc6-075a0dfd1c4f@linux.dev>
On Thu, 25 Jun 2026, Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> 在 2026/6/24 22:23, David Laight 写道:
>> On Wed, 24 Jun 2026 15:23:47 +0200
>> Christian König <christian.koenig@amd.com> wrote:
>>> On 6/24/26 15:14, Kaitao Cheng wrote:
>>>> 在 2026/6/22 16:42, David Laight 写道:
>>>>> On Mon, 22 Jun 2026 12:05:31 +0800
>>>>> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>>>>
>>>>>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>>
>>>>>> The list_for_each*_safe() helpers are used when the loop body may
>>>>>> remove the current entry. Their API exposes the temporary cursor at
>>>>>> every call site, even though most users only need it for the iterator
>>>>>> implementation and never reference it in the loop body.
>>>>>>
>>>>>> Add *_mutable() variants for list and hlist iteration. The new helpers
>>>>>> support both forms: callers may keep passing an explicit temporary cursor
>>>>>> when they need to inspect or reset it, or omit it and let the helper use
>>>>>> a unique internal cursor.
>>>>>
>>>>> I'm not really sure 'mutable' means anything either.
>>>>> It is possible to make it valid for the loop body (or even other threads)
>>>>> to delete arbitrary list items - but that needs significant extra overheads.
>>>>>
>>>>> It might be worth doing something that doesn't need the extra variable,
>>>>> but there is little point doing all the churn just to rename things.
>>>>>
>>>>>>
>>>>>> This makes call sites that only mutate the list through the current entry
>>>>>> less noisy, while keeping the existing *_safe() helpers available for
>>>>>> compatibility.
>>>>>>
>>>>>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>> ---
>>>>>> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
>>>>>> 1 file changed, 231 insertions(+), 38 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/list.h b/include/linux/list.h
>>>>>> index 09d979976b3b..1081def7cea9 100644
>>>>>> --- a/include/linux/list.h
>>>>>> +++ b/include/linux/list.h
>>>>>> @@ -7,6 +7,7 @@
>>>>>> #include <linux/stddef.h>
>>>>>> #include <linux/poison.h>
>>>>>> #include <linux/const.h>
>>>>>> +#include <linux/args.h>
>>>>>>
>>>>>> #include <asm/barrier.h>
>>>>>>
>>>>>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
>>>>>> #define list_for_each_prev(pos, head) \
>>>>>> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>>>>>>
>>>>>> -/**
>>>>>> - * list_for_each_safe - iterate over a list safe against removal of list entry
>>>>>> - * @pos: the &struct list_head to use as a loop cursor.
>>>>>> - * @n: another &struct list_head to use as temporary storage
>>>>>> - * @head: the head for your list.
>>>>>> +/*
>>>>>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
>>>>>> */
>>>>>> #define list_for_each_safe(pos, n, head) \
>>>>>> for (pos = (head)->next, n = pos->next; \
>>>>>> !list_is_head(pos, (head)); \
>>>>>> pos = n, n = pos->next)
>>>>>>
>>>>>> +#define __list_for_each_mutable_internal(pos, tmp, head) \
>>>>>> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
>>>>>
>>>>> Use auto
>>>>>
>>>>>> + !list_is_head(pos, (head)); \
>>>>>> + pos = tmp, tmp = pos->next)
>>>>>> +
>>>>>> +#define __list_for_each_mutable1(pos, head) \
>>>>>> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
>>>>>> +
>>>>>> +#define __list_for_each_mutable2(pos, next, head) \
>>>>>> + list_for_each_safe(pos, next, head)
>>>>>> +
>>>>>> /**
>>>>>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
>>>>>> + * list_for_each_mutable - iterate over a list safe against entry removal
>>>>>> * @pos: the &struct list_head to use as a loop cursor.
>>>>>> - * @n: another &struct list_head to use as temporary storage
>>>>>> - * @head: the head for your list.
>>>>>> + * @...: either (head) or (next, head)
>>>>>> + *
>>>>>> + * next: another &struct list_head to use as optional temporary storage.
>>>>>> + * The temporary cursor is internal unless explicitly supplied by
>>>>>> + * the caller.
>>>>>> + * head: the head for your list.
>>>>>> + */
>>>>>> +#define list_for_each_mutable(pos, ...) \
>>>>>> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
>>>>>> + (pos, __VA_ARGS__)
>>>>>
>>>>> The variable argument count logic really just slows down compilation.
>>>>> Maybe there aren't enough copies of this code to make that significant.
>>>>> But just because you can do it doesn't mean it is a gooD idea.
>>>>> I'm also not sure it really adds anything to the readability.
>>>>>
>>>>> And, it you are going to make the middle argument optional there is
>>>>> no need to change the macro name.
>>>>
>>>> Christian König and Jani Nikula also disagree with the variadic-argument
>>>> implementation approach. If we abandon that method, it means we will
>>>> inevitably need to add some new macros. If mutable is not a good name,
>>>> suggestions for better alternatives would be welcome; coming up with a
>>>> suitable name is indeed rather tricky.
>>>
>>> I don't think you need to add a new macro for the specific use case that people want to modify the next element of the iteration.
>>>
>>> If I remember your numbers correctly that is a really corner case and keeping using the existing *_safe() macros for that sounds perfectly fine to me.
>>
>> IIRC currently you have a choice of either:
>> define Item that can't be deleted
>> list_for_each() The current item.
>> list_for_each_safe() The next item.
>> There is also likely to be code that updates the variables to allow
>> for other scenarios.
>>
>> Note that if increase a reference count and release a lock then list_for_each()
>> is likely safer than list_for_each_safe() :-)
>>
>> list.h has 9 variants of the 'safe' loop.
>> The bloat of another 9 is getting excessive.
>>
>> It has to be said that this is one of my least favourite type of list...
>
> Hi Christian König, David Laight, Jani Nikula, David Hildenbrand,
> Andy Shevchenko, Alexei Starovoitov
>
> For ease of discussion, I need to summarize the currently possible
> approaches and briefly describe their respective pros and cons,
> using the list_for_each_entry* interfaces as examples.
>
> 1. Add list_for_each_entry_mutable, while keeping list_for_each_entry
> and list_for_each_entry_safe unchanged. list_for_each_entry_mutable
> would be used specifically for safe deletion scenarios that do not
> need to expose the temporary cursor externally. The code can refer to
> the v1 version.
>
> Pros: Does not depend on immediate per-subsystem adaptation and can be
> merged directly.
> Cons: Requires adding a whole set of mutable interfaces, which makes the
> code somewhat redundant.
Seems fine, and the original _safe naming is ambiguous anyway.
> 2. Directly optimize away the temporary cursor in list_for_each_entry_safe
> and define it inside the loop instead, changing the interface from four
> arguments to three.
>
> Pros: Does not add redundant interfaces.
> Cons: (1) Users need to manually update special cases that use the
> traversal variable of list_for_each_entry_safe, the new
> list_for_each_entry_safe would no longer apply there and would
> need to be open-coded.
> (2) Because the macro arguments changes, all list_for_each_entry_safe
> callers would need to be modified and merged together, making it
> difficult to merge such a large amount of code at once.
This won't fly because there are literally thousands of
list_for_each_entry_safe() users.
> 3. Use a variadic macro approach to optimize list_for_each_entry_safe,
> so that it supports both three and four arguments.
>
> Pros: (1) Does not add redundant interfaces.
> (2) Does not depend on immediate per-subsystem adaptation and can
> be merged directly.
> Cons: (1) Increases compile time.
> (2) Makes the interface harder for users to use.
Basically I'm against any variadic macro tricks where the optional
argument is not the last argument. That's just way too surprising, and
goes against common practice in just about all other languages.
> 4. Optimize list_for_each_entry by defining the temporary cursor internally,
> making it compatible with the functionality of list_for_each_entry_safe.
> The code can refer to the v2 version.
>
> Pros: (1) Does not add redundant interfaces.
> (2) The number of externally visible arguments of list_for_each_entry
> remains unchanged, still three.
> Cons: (1) list_for_each_entry and list_for_each_entry_safe would be merged
> into one, and list_for_each_entry_safe would gradually be deprecated.
> (2) Users need to manually update special cases that use the traversal
> variable of list_for_each_entry, the new list_for_each_entry would no
> longer apply there and would need to be open-coded. There are 15 such
> cases in total.
This sounds good to me, though I take it there's some code size increase
and/or performance penalty?
Maybe the 15 cases are questionable anyway?
> 5. Use a variadic macro approach to optimize list_for_each_entry, so that
> it supports both three and four arguments.
>
> Pros: (1) Does not add redundant interfaces.
> (2) Does not depend on immediate per-subsystem adaptation and can be
> merged directly.
> Cons: (1) Increases compile time.
> (2) list_for_each_entry and list_for_each_entry_safe would be merged
> into one, and list_for_each_entry_safe would gradually be deprecated.
Please don't do the macro tricks.
> 6. Make no changes, keep the current logic unchanged, and close the current
> email discussion.
I like hiding the temporary stuff when possible.
BR,
Jani.
--
Jani Nikula, Intel
^ permalink raw reply
* Re: [PATCH v4 0/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Jani Nikula @ 2026-06-25 11:05 UTC (permalink / raw)
To: Steven Rostedt, linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
In-Reply-To: <20260625104007.041432666@kernel.org>
On Thu, 25 Jun 2026, Steven Rostedt <rostedt@kernel.org> wrote:
> Remove trace_printk.h by creating a trace_controls.h for those places that
> need access to tracing prototypes like tracing_off() and for the places that
> need trace_printk() directly, to have it included directly.
>
> Changse since v3: https://lore.kernel.org/all/20260624081806.120105649@kernel.org/
>
> - Always include trace_controls.h in rcu.h (kernel test robot)
>
> There are other configs that may include tracing_off() in rcu.h besides
> the one that had the include of trace_controls.h. Just always include
> it in that header to be safe.
>
> Steven Rostedt (2):
> tracing: Move non-trace_printk prototypes into trace_controls.h
> tracing: Remove trace_printk.h from kernel.h
>
> ----
> arch/powerpc/kvm/book3s_xics.c | 1 +
> arch/powerpc/xmon/xmon.c | 1 +
> arch/s390/kernel/ipl.c | 1 +
> arch/s390/kernel/machine_kexec.c | 1 +
> drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
> drivers/gpu/drm/i915/i915_gem.h | 2 ++
For the i915 parts,
Acked-by: Jani Nikula <jani.nikula@intel.com>
for merging via whichever tree.
> drivers/hwtracing/stm/dummy_stm.c | 1 +
> drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
> drivers/tty/sysrq.c | 1 +
> drivers/usb/early/xhci-dbc.c | 1 +
> fs/ext4/inline.c | 1 +
> include/linux/ftrace.h | 2 ++
> include/linux/kernel.h | 1 -
> include/linux/sunrpc/debug.h | 1 +
> include/linux/trace_controls.h | 54 ++++++++++++++++++++++++++++++++
> include/linux/trace_printk.h | 56 ++--------------------------------
> kernel/debug/debug_core.c | 1 +
> kernel/panic.c | 1 +
> kernel/rcu/rcu.h | 1 +
> kernel/rcu/rcutorture.c | 1 +
> kernel/trace/ring_buffer_benchmark.c | 1 +
> kernel/trace/trace.h | 1 +
> kernel/trace/trace_benchmark.c | 1 +
> lib/sys_info.c | 1 +
> samples/fprobe/fprobe_example.c | 1 +
> samples/ftrace/ftrace-direct-too.c | 1 -
> samples/trace_printk/trace-printk.c | 1 +
> 27 files changed, 82 insertions(+), 55 deletions(-)
> create mode 100644 include/linux/trace_controls.h
--
Jani Nikula, Intel
^ permalink raw reply
* Re: [PATCHv4 02/13] uprobes/x86: Remove struct uprobe_trampoline object
From: Oleg Nesterov @ 2026-06-25 11:19 UTC (permalink / raw)
To: Jiri Olsa
Cc: Peter Zijlstra, Ingo Molnar, Masami Hiramatsu, Andrii Nakryiko,
bpf, linux-trace-kernel
In-Reply-To: <ajvrZ_mMXnKrWf7h@redhat.com>
On 06/24, Oleg Nesterov wrote:
>
> Perhaps we can later optimize this code a bit? I mean something like
>
> start_reachable = ...;
> end_reachable = ...;
>
> VMA_ITERATOR(vmi, mm, start_reachable);
>
> for_each_vma(vmi, vma) {
> if (!vma_is_special_mapping(...))
> continue;
> if (vma->vm_start > end_reachable)
> break;
> return vma;
> }
I have looked into include/linux/mm.h, we have for_each_vma_range(). So I guess
for_each_vma_range(vmi, vma, end_reachable) {
if (vma_is_special_mapping(...))
return vma;
}
should work.
Oleg.
^ permalink raw reply
* Re: [PATCH v9 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
From: Miaohe Lin @ 2026-06-25 12:02 UTC (permalink / raw)
To: Breno Leitao
Cc: linux-mm, linux-kernel, linux-doc, linux-kselftest,
linux-trace-kernel, kernel-team, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Shuah Khan, Naoya Horiguchi,
Jonathan Corbet, Shuah Khan, Liam R. Howlett, lance.yang,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
In-Reply-To: <20260609-ecc_panic-v9-2-432a74002e74@debian.org>
On 2026/6/9 18:56, Breno Leitao wrote:
> get_any_page() collapses every HWPoisonHandlable() rejection into a
> single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page()
> -> retry path. That is correct for the transient case (a userspace
> folio briefly off LRU during migration or compaction, which a later
> shake can drag back), but wrong for stable kernel-owned pages: slab,
> page-table, large-kmalloc and PG_reserved pages will never become
> HWPoisonHandlable(), so the retry loop is wasted work and the final
> -EIO loses the "this is structurally unrecoverable" information.
> memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the
> panic-on-unrecoverable sysctl deliberately does not act on.
>
> Introduce HWPoisonKernelOwned(), a small predicate that positively
> identifies pages the hwpoison handler cannot recover from:
>
> HWPoisonKernelOwned(p, flags) :=
> !(MF_SOFT_OFFLINE && page_has_movable_ops(p)) &&
> (PageReserved(p) ||
> PageSlab(head) || PageTable(head) || PageLargeKmalloc(head))
>
> where head = compound_head(p).
>
> PG_reserved is a per-page flag (PF_NO_COMPOUND) and is tested on the
> page directly. The slab, page-table and large-kmalloc page-type bits
> are only stored on the head page, so those tests resolve the compound
> head first, then re-read compound_head(page) afterwards: a concurrent
> split or compound free that moves head invalidates the just-read flags
> and the loop retries. The lookup still takes no refcount, mirroring
> the rest of get_any_page(); the recheck closes the common split race,
> and a residual free->alloc->free in the same window can only mis-tag
> a genuinely poisoned page, never reclassify a handlable one.
>
> The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors the
> same exception in HWPoisonHandlable(): soft-offline is allowed to
> migrate movable_ops pages even though they are not on the LRU, and
> we must not pre-empt that with an unrecoverable verdict.
>
> The list is intentionally not exhaustive. vmalloc and kernel-stack
> pages, for example, do not carry a page_type bit and would need a
> different oracle; they keep going through the existing retry path
> unchanged. This is the smallest set we can identify with certainty
> by page type.
>
> Wire the helper into the top of get_any_page() to short-circuit
> those pages before the retry loop runs. On a hit, drop the caller's
> MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE
> straight away. Pages outside the helper's positive list still take
> the existing retry path and return -EIO, leaving operator-visible
> behaviour for those cases unchanged.
>
> Extend the unhandlable-page pr_err() to fire for either errno and
> update the get_hwpoison_page() kerneldoc to document the new return.
>
> memory_failure() still folds every negative return into
> MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so
> this patch on its own only changes the errno that soft_offline_page()
> can propagate to its callers. A follow-up wires -ENOTRECOVERABLE
> through memory_failure() and reports MF_MSG_KERNEL for the
> unrecoverable cases, which is what the
> panic_on_unrecoverable_memory_failure sysctl observes.
>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Suggested-by: Lance Yang <lance.yang@linux.dev>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> mm/memory-failure.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index f4d3e6e20e13..eed9de387694 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1325,6 +1325,46 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
> return PageLRU(page) || is_free_buddy_page(page);
> }
>
> +/*
> + * Positive identification of pages the hwpoison handler cannot recover.
> + * These page types are owned by kernel internals (no userspace mapping
> + * to unmap, no file mapping to invalidate, no migration target), so the
> + * shake_page() / retry loop in get_any_page() can never turn them into
> + * something HWPoisonHandlable() will accept. Short-circuit them to
> + * -ENOTRECOVERABLE so callers can panic on operator request instead of
> + * spinning through retries that exit as a transient-looking -EIO.
> + *
> + * The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors
> + * HWPoisonHandlable(): soft-offline is allowed to migrate movable_ops
> + * pages even though they are not on the LRU.
> + */
> +static inline bool HWPoisonKernelOwned(struct page *page, unsigned long flags)
> +{
> + struct page *head;
> +
> + if ((flags & MF_SOFT_OFFLINE) && page_has_movable_ops(page))
> + return false;
> +
> + /* PG_reserved is a per-page flag, never set on a compound page. */
> + if (PageReserved(page))
> + return true;
> +
> + /*
> + * Page-type bits live only on the head page, so resolve any tail
> + * first. The check takes no refcount; recheck the head afterwards
> + * so a concurrent split or compound free cannot leave us trusting
> + * a stale view. A free->alloc->free in the same window is still
> + * possible but closing it would require taking a reference here.
> + */
> +retry:
> + head = compound_head(page);
> + if (!(PageSlab(head) || PageTable(head) || PageLargeKmalloc(head)))
> + return false;
> + if (head != compound_head(page))
> + goto retry;
Looks good to me with one comment: should we write above as something like below:
bool kernel_owned;
retry:
head = compound_head(page);
kernel_owned = PageSlab(head) || PageTable(head) || PageLargeKmalloc(head);
if (head != compound_head(page))
goto retry;
I.e. we should always check whether compound_head has changed, regardless of whether
the page is owned by the kernel, so we can obtain a relatively stable result?
Thanks.
.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox