* [PATCH v6 36/43] KVM: selftests: Reset shared memory after hole-punching
From: Ackerley Tng via B4 Relay @ 2026-05-07 20:22 UTC (permalink / raw)
To: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco, Ackerley Tng
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
From: Ackerley Tng <ackerleytng@google.com>
private_mem_conversions_test used to reset the shared memory that was used
for the test to an initial pattern at the end of each test iteration. Then,
it would punch out the pages, which would zero memory.
Without in-place conversion, the resetting would write shared memory, and
hole-punching will zero private memory, hence resetting the test to the
state at the beginning of the for loop.
With in-place conversion, resetting writes memory as shared, and
hole-punching zeroes the same physical memory, hence undoing the reset
done before the hole punch.
Move the resetting after the hole-punching, and reset the entire
PER_CPU_DATA_SIZE instead of just the tested range.
With in-place conversion, this zeroes and then resets the same physical
memory. Without in-place conversion, the private memory is zeroed, and the
shared memory is reset to init_p.
This is sufficient since at each test stage, the memory is assumed to start
as shared, and private memory is always assumed to start zeroed. Conversion
zeroes memory, so the future test stages will work as expected.
Fixes: 43f623f350ce1 ("KVM: selftests: Add x86-only selftest for private memory conversions")
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/x86/private_mem_conversions_test.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index 861baff201e78..289ad10063fca 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -202,15 +202,18 @@ static void guest_test_explicit_conversion(u64 base_gpa, bool do_fallocate)
guest_sync_shared(gpa, size, p3, p4);
memcmp_g(gpa, p4, size);
- /* Reset the shared memory back to the initial pattern. */
- memset((void *)gpa, init_p, size);
-
/*
* Free (via PUNCH_HOLE) *all* private memory so that the next
* iteration starts from a clean slate, e.g. with respect to
* whether or not there are pages/folios in guest_mem.
*/
guest_map_shared(base_gpa, PER_CPU_DATA_SIZE, true);
+
+ /*
+ * Hole-punching above zeroed private memory. Reset shared
+ * memory in preparation for the next GUEST_STAGE.
+ */
+ memset((void *)base_gpa, init_p, PER_CPU_DATA_SIZE);
}
}
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [PATCH v6 37/43] KVM: selftests: Provide function to look up guest_memfd details from gpa
From: Ackerley Tng via B4 Relay @ 2026-05-07 20:22 UTC (permalink / raw)
To: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco, Ackerley Tng
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
From: Ackerley Tng <ackerleytng@google.com>
Introduce a new helper, kvm_gpa_to_guest_memfd(), to find the
guest_memfd-related details of a memory region that contains a given guest
physical address (GPA).
The function returns the file descriptor for the memfd, the offset into
the file that corresponds to the GPA, and the number of bytes remaining
in the region from that GPA.
kvm_gpa_to_guest_memfd() was factored out from vm_guest_mem_fallocate();
refactor vm_guest_mem_fallocate() to use the new helper.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/kvm_util.h | 3 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 37 ++++++++++++++++----------
2 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index f7c49739e97b4..4ad184ce3df68 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -428,6 +428,9 @@ static inline void vm_enable_cap(struct kvm_vm *vm, u32 cap, u64 arg0)
vm_ioctl(vm, KVM_ENABLE_CAP, &enable_cap);
}
+int kvm_gpa_to_guest_memfd(struct kvm_vm *vm, gpa_t gpa, off_t *fd_offset,
+ size_t *nr_bytes);
+
/*
* KVM_SET_MEMORY_ATTRIBUTES{,2} overwrites _all_ attributes. These
* flows need significant enhancements to support multiple attributes.
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 5e34593ad79c4..8ff09e179ff5c 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1283,27 +1283,20 @@ void vm_guest_mem_fallocate(struct kvm_vm *vm, u64 base, u64 size,
bool punch_hole)
{
const int mode = FALLOC_FL_KEEP_SIZE | (punch_hole ? FALLOC_FL_PUNCH_HOLE : 0);
- struct userspace_mem_region *region;
u64 end = base + size;
- gpa_t gpa, len;
off_t fd_offset;
- int ret;
+ int fd, ret;
+ size_t len;
+ gpa_t gpa;
for (gpa = base; gpa < end; gpa += len) {
- u64 offset;
-
- region = userspace_mem_region_find(vm, gpa, gpa);
- TEST_ASSERT(region && region->region.flags & KVM_MEM_GUEST_MEMFD,
- "Private memory region not found for GPA 0x%lx", gpa);
+ fd = kvm_gpa_to_guest_memfd(vm, gpa, &fd_offset, &len);
+ len = min(end - gpa, len);
- offset = gpa - region->region.guest_phys_addr;
- fd_offset = region->region.guest_memfd_offset + offset;
- len = min_t(u64, end - gpa, region->region.memory_size - offset);
-
- ret = fallocate(region->region.guest_memfd, mode, fd_offset, len);
+ ret = fallocate(fd, mode, fd_offset, len);
TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = %lu), fd = %d, mode = %x, offset = %lx",
punch_hole ? "punch hole" : "allocate", gpa, len,
- region->region.guest_memfd, mode, fd_offset);
+ fd, mode, fd_offset);
}
}
@@ -1640,6 +1633,22 @@ void *addr_gpa2alias(struct kvm_vm *vm, gpa_t gpa)
return (void *) ((uintptr_t) region->host_alias + offset);
}
+int kvm_gpa_to_guest_memfd(struct kvm_vm *vm, gpa_t gpa, off_t *fd_offset,
+ size_t *nr_bytes)
+{
+ struct userspace_mem_region *region;
+ gpa_t gpa_offset;
+
+ region = userspace_mem_region_find(vm, gpa, gpa);
+ TEST_ASSERT(region && region->region.flags & KVM_MEM_GUEST_MEMFD,
+ "guest_memfd memory region not found for GPA 0x%lx", gpa);
+
+ gpa_offset = gpa - region->region.guest_phys_addr;
+ *fd_offset = region->region.guest_memfd_offset + gpa_offset;
+ *nr_bytes = region->region.memory_size - gpa_offset;
+ return region->region.guest_memfd;
+}
+
/* Create an interrupt controller chip for the specified VM. */
void vm_create_irqchip(struct kvm_vm *vm)
{
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [PATCH v6 39/43] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot
From: Ackerley Tng via B4 Relay @ 2026-05-07 20:22 UTC (permalink / raw)
To: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco, Ackerley Tng
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
From: Sean Christopherson <seanjc@google.com>
Check that a valid fd provided to mmap() must be accompanied by MAP_SHARED.
With an invalid fd (usually used for anonymous mappings), there are no
constraints on mmap() flags.
Add this check to make sure that when a guest_memfd is used as region->fd,
the flag provided to mmap() will include MAP_SHARED.
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Rephrase assertion message.]
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 8ff09e179ff5c..efad3e6b19f4b 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1088,6 +1088,9 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
src_type == VM_MEM_SRC_SHARED_HUGETLB);
}
+ TEST_ASSERT(region->fd == -1 || backing_src_is_shared(src_type),
+ "A valid fd provided to mmap() must be accompanied by MAP_SHARED.");
+
region->mmap_start = __kvm_mmap(region->mmap_size, PROT_READ | PROT_WRITE,
vm_mem_backing_src_alias(src_type)->flag,
region->fd, mmap_offset);
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [PATCH v6 40/43] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe
From: Ackerley Tng via B4 Relay @ 2026-05-07 20:22 UTC (permalink / raw)
To: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco, Ackerley Tng
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
From: Ackerley Tng <ackerleytng@google.com>
The TEST_EXPECT_SIGBUS macro is not thread-safe as it uses a global
sigjmp_buf and installs a global SIGBUS signal handler. If multiple threads
execute the macro concurrently, they will race on installing the signal
handler and stomp on other threads' jump buffers, leading to incorrect test
behavior.
Make TEST_EXPECT_SIGBUS thread-safe with the following changes:
Share the KVM tests' global signal handler. sigaction() applies to all
threads; without sharing a global signal handler, one thread may have
removed the signal handler that another thread added, hence leading to
unexpected signals.
The alternative of layering signal handlers was considered, but calling
sigaction() within TEST_EXPECT_SIGBUS() necessarily creates a race. To
avoid adding new setup and teardown routines to do sigaction() and keep
usage of TEST_EXPECT_SIGBUS() simple, share the KVM tests' global signal
handler.
Opportunistically rename report_unexpected_signal to
catchall_signal_handler.
To continue to only expect SIGBUS within specific regions of code, use a
thread-specific variable, expecting_sigbus, to replace installing and
removing signal handlers.
Make the execution environment for the thread, sigjmp_buf, a
thread-specific variable.
As part of TEST_EXPECT_SIGBUS(), assert the prerequisite for this setup,
that the current signal handler is the catchall_signal_handler.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/include/test_util.h | 32 +++++++++++++------------
tools/testing/selftests/kvm/lib/kvm_util.c | 18 ++++++++++----
tools/testing/selftests/kvm/lib/test_util.c | 7 ------
3 files changed, 30 insertions(+), 27 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index c280c3233f502..c9ba4e010f0b8 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -82,21 +82,23 @@ do { \
__builtin_unreachable(); \
} while (0)
-extern sigjmp_buf expect_sigbus_jmpbuf;
-void expect_sigbus_handler(int signum);
-
-#define TEST_EXPECT_SIGBUS(action) \
-do { \
- struct sigaction sa_old, sa_new = { \
- .sa_handler = expect_sigbus_handler, \
- }; \
- \
- sigaction(SIGBUS, &sa_new, &sa_old); \
- if (sigsetjmp(expect_sigbus_jmpbuf, 1) == 0) { \
- action; \
- TEST_FAIL("'%s' should have triggered SIGBUS", #action); \
- } \
- sigaction(SIGBUS, &sa_old, NULL); \
+extern __thread sigjmp_buf expect_sigbus_jmpbuf;
+extern __thread volatile sig_atomic_t expecting_sigbus;
+extern void catchall_signal_handler(int signum);
+
+#define TEST_EXPECT_SIGBUS(action) \
+do { \
+ struct sigaction __sa = {}; \
+ \
+ TEST_ASSERT_EQ(sigaction(SIGBUS, NULL, &__sa), 0); \
+ TEST_ASSERT_EQ(__sa.sa_handler, &catchall_signal_handler); \
+ \
+ expecting_sigbus = true; \
+ if (sigsetjmp(expect_sigbus_jmpbuf, 1) == 0) { \
+ action; \
+ TEST_FAIL("'%s' should have triggered SIGBUS", #action);\
+ } \
+ expecting_sigbus = false; \
} while (0)
size_t parse_size(const char *size);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index efad3e6b19f4b..d1befa3f4b305 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2270,13 +2270,20 @@ __weak void kvm_selftest_arch_init(void)
{
}
-static void report_unexpected_signal(int signum)
+__thread sigjmp_buf expect_sigbus_jmpbuf;
+__thread volatile sig_atomic_t expecting_sigbus;
+
+void catchall_signal_handler(int signum)
{
+ switch (signum) {
+ case SIGBUS: {
+ if (expecting_sigbus)
+ siglongjmp(expect_sigbus_jmpbuf, 1);
+
+ TEST_FAIL("Unexpected SIGBUS (%d)\n", signum);
+ }
#define KVM_CASE_SIGNUM(sig) \
case sig: TEST_FAIL("Unexpected " #sig " (%d)\n", signum)
-
- switch (signum) {
- KVM_CASE_SIGNUM(SIGBUS);
KVM_CASE_SIGNUM(SIGSEGV);
KVM_CASE_SIGNUM(SIGILL);
KVM_CASE_SIGNUM(SIGFPE);
@@ -2288,12 +2295,13 @@ static void report_unexpected_signal(int signum)
void __attribute((constructor)) kvm_selftest_init(void)
{
struct sigaction sig_sa = {
- .sa_handler = report_unexpected_signal,
+ .sa_handler = catchall_signal_handler,
};
/* Tell stdout not to buffer its content. */
setbuf(stdout, NULL);
+ expecting_sigbus = false;
sigaction(SIGBUS, &sig_sa, NULL);
sigaction(SIGSEGV, &sig_sa, NULL);
sigaction(SIGILL, &sig_sa, NULL);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index bab1bd2b775b6..30eb701e4becd 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -18,13 +18,6 @@
#include "test_util.h"
-sigjmp_buf expect_sigbus_jmpbuf;
-
-void __attribute__((used)) expect_sigbus_handler(int signum)
-{
- siglongjmp(expect_sigbus_jmpbuf, 1);
-}
-
/*
* Random number generator that is usable from guest code. This is the
* Park-Miller LCG using standard constants.
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [PATCH v6 38/43] KVM: selftests: Provide common function to set memory attributes
From: Ackerley Tng via B4 Relay @ 2026-05-07 20:22 UTC (permalink / raw)
To: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco, Ackerley Tng
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
From: Sean Christopherson <seanjc@google.com>
Introduce vm_mem_set_memory_attributes(), which handles setting of memory
attributes for a range of guest physical addresses, regardless of whether
the attributes should be set via guest_memfd or via the memory attributes
at the VM level.
Refactor existing vm_mem_set_{shared,private} functions to use the new
function. Opportunistically update the size parameter to use size_t instead
of u64.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/include/kvm_util.h | 46 +++++++++++++++++++-------
1 file changed, 34 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 4ad184ce3df68..dc70c6da63fa9 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -454,18 +454,6 @@ static inline void vm_set_memory_attributes(struct kvm_vm *vm, gpa_t gpa,
vm_ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, &attr);
}
-static inline void vm_mem_set_private(struct kvm_vm *vm, gpa_t gpa,
- u64 size)
-{
- vm_set_memory_attributes(vm, gpa, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
-}
-
-static inline void vm_mem_set_shared(struct kvm_vm *vm, gpa_t gpa,
- u64 size)
-{
- vm_set_memory_attributes(vm, gpa, size, 0);
-}
-
static inline int __gmem_set_memory_attributes(int fd, loff_t offset,
size_t size, u64 attributes,
loff_t *error_offset)
@@ -534,6 +522,40 @@ static inline void gmem_set_shared(int fd, loff_t offset, size_t size)
gmem_set_memory_attributes(fd, offset, size, 0);
}
+static inline void vm_mem_set_memory_attributes(struct kvm_vm *vm, gpa_t gpa,
+ size_t size, u64 attrs)
+{
+ if (kvm_has_gmem_attributes) {
+ gpa_t end = gpa + size;
+ off_t fd_offset;
+ gpa_t addr;
+ size_t len;
+ int fd;
+
+ for (addr = gpa; addr < end; addr += len) {
+ fd = kvm_gpa_to_guest_memfd(vm, addr, &fd_offset, &len);
+ len = min(end - addr, len);
+
+ gmem_set_memory_attributes(fd, fd_offset, len, attrs);
+ }
+ } else {
+ vm_set_memory_attributes(vm, gpa, size, attrs);
+ }
+}
+
+static inline void vm_mem_set_private(struct kvm_vm *vm, gpa_t gpa,
+ size_t size)
+{
+ vm_mem_set_memory_attributes(vm, gpa, size,
+ KVM_MEMORY_ATTRIBUTE_PRIVATE);
+}
+
+static inline void vm_mem_set_shared(struct kvm_vm *vm, gpa_t gpa,
+ size_t size)
+{
+ vm_mem_set_memory_attributes(vm, gpa, size, 0);
+}
+
void vm_guest_mem_fallocate(struct kvm_vm *vm, gpa_t gpa, u64 size,
bool punch_hole);
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [PATCH v6 41/43] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd
From: Ackerley Tng via B4 Relay @ 2026-05-07 20:23 UTC (permalink / raw)
To: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco, Ackerley Tng
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
From: Ackerley Tng <ackerleytng@google.com>
Update the private memory conversions selftest to also test conversions
that are done "in-place" via per-guest_memfd memory attributes. In-place
conversions require the host to be able to mmap() the guest_memfd so that
the host and guest can share the same backing physical memory.
This includes several updates, that are conditioned on the system
supporting per-guest_memfd attributes (kvm_has_gmem_attributes):
1. Set up guest_memfd requesting MMAP and INIT_SHARED.
2. With in-place conversions, the host's mapping points directly to the
guest's memory. When the guest converts a region to private, host access
to that region is blocked. Update the test to expect a SIGBUS when
attempting to access the host virtual address (HVA) of private memory.
3. Use vm_mem_set_memory_attributes(), which chooses how to set memory
attributes based on whether kvm_has_gmem_attributes.
Restrict the test to using VM_MEM_SRC_SHMEM because guest_memfd's required
mmap() flags and page sizes happens to align with those of
VM_MEM_SRC_SHMEM. As long as VM_MEM_SRC_SHMEM is used for src_type,
vm_mem_add() works as intended.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../kvm/x86/private_mem_conversions_test.c | 44 ++++++++++++++++++----
1 file changed, 36 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index 289ad10063fca..4308c67952310 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -306,9 +306,12 @@ static void handle_exit_hypercall(struct kvm_vcpu *vcpu)
if (do_fallocate)
vm_guest_mem_fallocate(vm, gpa, size, map_shared);
- if (set_attributes)
- vm_set_memory_attributes(vm, gpa, size,
- map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE);
+ if (set_attributes) {
+ u64 attrs = map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE;
+
+ vm_mem_set_memory_attributes(vm, gpa, size, attrs);
+ }
+
run->hypercall.ret = 0;
}
@@ -352,8 +355,20 @@ static void *__test_mem_conversions(void *__vcpu)
size_t nr_bytes = min_t(size_t, vm->page_size, size - i);
u8 *hva = addr_gpa2hva(vm, gpa + i);
- /* In all cases, the host should observe the shared data. */
- memcmp_h(hva, gpa + i, uc.args[3], nr_bytes);
+ /*
+ * When using per-guest_memfd memory attributes,
+ * i.e. in-place conversion, host accesses will
+ * point at guest memory and should SIGBUS when
+ * guest memory is private. When using per-VM
+ * attributes, i.e. separate backing for shared
+ * vs. private, the host should always observe
+ * the shared data.
+ */
+ if (kvm_has_gmem_attributes &&
+ uc.args[0] == SYNC_PRIVATE)
+ TEST_EXPECT_SIGBUS(READ_ONCE(*hva));
+ else
+ memcmp_h(hva, gpa + i, uc.args[3], nr_bytes);
/* For shared, write the new pattern to guest memory. */
if (uc.args[0] == SYNC_SHARED)
@@ -382,6 +397,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, u32 nr_v
const size_t slot_size = memfd_size / nr_memslots;
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
pthread_t threads[KVM_MAX_VCPUS];
+ u64 gmem_flags;
struct kvm_vm *vm;
int memfd, i;
@@ -397,12 +413,17 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, u32 nr_v
vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));
- memfd = vm_create_guest_memfd(vm, memfd_size, 0);
+ if (kvm_has_gmem_attributes)
+ gmem_flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
+ else
+ gmem_flags = 0;
+
+ memfd = vm_create_guest_memfd(vm, memfd_size, gmem_flags);
for (i = 0; i < nr_memslots; i++)
vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
BASE_DATA_SLOT + i, slot_size / vm->page_size,
- KVM_MEM_GUEST_MEMFD, memfd, slot_size * i, 0);
+ KVM_MEM_GUEST_MEMFD, memfd, slot_size * i, gmem_flags);
for (i = 0; i < nr_vcpus; i++) {
gpa_t gpa = BASE_DATA_GPA + i * per_cpu_size;
@@ -452,17 +473,24 @@ static void usage(const char *cmd)
int main(int argc, char *argv[])
{
- enum vm_mem_backing_src_type src_type = DEFAULT_VM_MEM_SRC;
+ enum vm_mem_backing_src_type src_type;
u32 nr_memslots = 1;
u32 nr_vcpus = 1;
int opt;
TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
+ src_type = kvm_has_gmem_attributes ? VM_MEM_SRC_SHMEM :
+ DEFAULT_VM_MEM_SRC;
+
while ((opt = getopt(argc, argv, "hm:s:n:")) != -1) {
switch (opt) {
case 's':
src_type = parse_backing_src_type(optarg);
+ TEST_ASSERT(!kvm_has_gmem_attributes ||
+ src_type == VM_MEM_SRC_SHMEM,
+ "Testing in-place conversions, only %s mem_type supported\n",
+ vm_mem_backing_src_alias(VM_MEM_SRC_SHMEM)->name);
break;
case 'n':
nr_vcpus = atoi_positive("nr_vcpus", optarg);
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [PATCH v6 43/43] KVM: selftests: Update private memory exits test to work with per-gmem attributes
From: Ackerley Tng via B4 Relay @ 2026-05-07 20:23 UTC (permalink / raw)
To: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco, Ackerley Tng
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
From: Sean Christopherson <seanjc@google.com>
Skip setting memory to private in the private memory exits test when using
per-gmem memory attributes, as memory is initialized to private by default
for guest_memfd, and using vm_mem_set_private() on a guest_memfd instance
requires creating guest_memfd with GUEST_MEMFD_FLAG_MMAP (which is totally
doable, but would need to be conditional and is ultimately unnecessary).
Expect an emulated MMIO instead of a memory fault exit when attributes are
per-gmem, as deleting the memslot effectively drops the private status,
i.e. the GPA becomes shared and thus supports emulated MMIO.
Skip the "memslot not private" test entirely, as private vs. shared state
for x86 software-protected VMs comes from the memory attributes themselves,
and so when doing in-place conversions there can never be a disconnect
between the expected and actual states.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
.../selftests/kvm/x86/private_mem_kvm_exits_test.c | 36 ++++++++++++++++++----
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c b/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
index 10db9fe6d9063..70ed16066c63e 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
@@ -62,8 +62,9 @@ static void test_private_access_memslot_deleted(void)
virt_map(vm, EXITS_TEST_GVA, EXITS_TEST_GPA, EXITS_TEST_NPAGES);
- /* Request to access page privately */
- vm_mem_set_private(vm, EXITS_TEST_GPA, EXITS_TEST_SIZE);
+ /* Request to access page privately. */
+ if (!kvm_has_gmem_attributes)
+ vm_mem_set_private(vm, EXITS_TEST_GPA, EXITS_TEST_SIZE);
pthread_create(&vm_thread, NULL,
(void *(*)(void *))run_vcpu_get_exit_reason,
@@ -74,10 +75,26 @@ static void test_private_access_memslot_deleted(void)
pthread_join(vm_thread, &thread_return);
exit_reason = (u32)(u64)thread_return;
- TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MEMORY_FAULT);
- TEST_ASSERT_EQ(vcpu->run->memory_fault.flags, KVM_MEMORY_EXIT_FLAG_PRIVATE);
- TEST_ASSERT_EQ(vcpu->run->memory_fault.gpa, EXITS_TEST_GPA);
- TEST_ASSERT_EQ(vcpu->run->memory_fault.size, EXITS_TEST_SIZE);
+ /*
+ * If attributes are tracked per-gmem, deleting the memslot that points
+ * at the gmem instance effectively makes the memory shared, and so the
+ * read should trigger emulated MMIO.
+ *
+ * If attributes are tracked per-VM, deleting the memslot shouldn't
+ * affect the private attribute, and so KVM should generate a memory
+ * fault exit (emulated MMIO on private GPAs is disallowed).
+ */
+ if (kvm_has_gmem_attributes) {
+ TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MMIO);
+ TEST_ASSERT_EQ(vcpu->run->mmio.phys_addr, EXITS_TEST_GPA);
+ TEST_ASSERT_EQ(vcpu->run->mmio.len, sizeof(u64));
+ TEST_ASSERT_EQ(vcpu->run->mmio.is_write, false);
+ } else {
+ TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MEMORY_FAULT);
+ TEST_ASSERT_EQ(vcpu->run->memory_fault.flags, KVM_MEMORY_EXIT_FLAG_PRIVATE);
+ TEST_ASSERT_EQ(vcpu->run->memory_fault.gpa, EXITS_TEST_GPA);
+ TEST_ASSERT_EQ(vcpu->run->memory_fault.size, EXITS_TEST_SIZE);
+ }
kvm_vm_free(vm);
}
@@ -88,6 +105,13 @@ static void test_private_access_memslot_not_private(void)
struct kvm_vcpu *vcpu;
u32 exit_reason;
+ /*
+ * Accessing non-private memory as private with a software-protected VM
+ * isn't possible when doing in-place conversions.
+ */
+ if (kvm_has_gmem_attributes)
+ return;
+
vm = vm_create_shape_with_one_vcpu(protected_vm_shape, &vcpu,
guest_repeatedly_read);
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [PATCH v6 42/43] KVM: selftests: Add script to exercise private_mem_conversions_test
From: Ackerley Tng via B4 Relay @ 2026-05-07 20:23 UTC (permalink / raw)
To: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
ira.weiny, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
pratyush, suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Sean Christopherson, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
linux-mm, linux-coco, Ackerley Tng
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
From: Ackerley Tng <ackerleytng@google.com>
Add a wrapper script to simplify running the private_mem_conversions_test
with a variety of configurations. Manually invoking the test for all
supported memory backing source types is tedious.
The script automatically detects the availability of 2MB and 1GB hugepages
and builds a list of source types to test. It then iterates through the
list, running the test for each type with both a single memslot and
multiple memslots.
This makes it easier to get comprehensive test coverage across different
memory configurations.
Add and use a helper program in C to be able to read
KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES as defined in header files and then
issue the ioctl to read the KVM CAP.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 4 +
.../selftests/kvm/kvm_has_gmem_attributes.c | 17 +++
.../kvm/x86/private_mem_conversions_test.sh | 128 +++++++++++++++++++++
3 files changed, 149 insertions(+)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 6232881be500a..e5769268936a7 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -54,6 +54,7 @@ LIBKVM_loongarch += lib/loongarch/exception.S
# Non-compiled test targets
TEST_PROGS_x86 += x86/nx_huge_pages_test.sh
+TEST_PROGS_x86 += x86/private_mem_conversions_test.sh
# Compiled test targets valid on all architectures with libkvm support
TEST_GEN_PROGS_COMMON = demand_paging_test
@@ -67,6 +68,8 @@ TEST_GEN_PROGS_COMMON += set_memory_region_test
TEST_GEN_PROGS_COMMON += memslot_modification_stress_test
TEST_GEN_PROGS_COMMON += memslot_perf_test
+TEST_GEN_PROGS_EXTENDED_COMMON += kvm_has_gmem_attributes
+
# Compiled test targets
TEST_GEN_PROGS_x86 = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_x86 += x86/cpuid_test
@@ -245,6 +248,7 @@ SPLIT_TESTS += get-reg-list
TEST_PROGS += $(TEST_PROGS_$(ARCH))
TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(ARCH))
+TEST_GEN_PROGS_EXTENDED += $(TEST_GEN_PROGS_EXTENDED_COMMON)
TEST_GEN_PROGS_EXTENDED += $(TEST_GEN_PROGS_EXTENDED_$(ARCH))
LIBKVM += $(LIBKVM_$(ARCH))
diff --git a/tools/testing/selftests/kvm/kvm_has_gmem_attributes.c b/tools/testing/selftests/kvm/kvm_has_gmem_attributes.c
new file mode 100644
index 0000000000000..4f361349412fb
--- /dev/null
+++ b/tools/testing/selftests/kvm/kvm_has_gmem_attributes.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Utility to check if KVM supports guest_memfd attributes.
+ *
+ * Copyright (C) 2025, Google LLC.
+ */
+
+#include <stdio.h>
+
+#include "kvm_util.h"
+
+int main(void)
+{
+ printf("%u\n", kvm_check_cap(KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES) > 0);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
new file mode 100755
index 0000000000000..7179a4fcdd498
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
@@ -0,0 +1,128 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Wrapper script which runs different test setups of
+# private_mem_conversions_test.
+#
+# Copyright (C) 2025, Google LLC.
+
+NUM_VCPUS_TO_TEST=4
+NUM_MEMSLOTS_TO_TEST=$NUM_VCPUS_TO_TEST
+
+# Required pages are based on the test setup in the C code.
+REQUIRED_NUM_2M_HUGEPAGES=$((1024 * NUM_VCPUS_TO_TEST))
+REQUIRED_NUM_1G_HUGEPAGES=$((2 * NUM_VCPUS_TO_TEST))
+
+get_hugepage_count() {
+ local page_size_kb=$1
+ local path="/sys/kernel/mm/hugepages/hugepages-${page_size_kb}kB/nr_hugepages"
+ if [ -f "$path" ]; then
+ cat "$path"
+ else
+ echo 0
+ fi
+}
+
+get_default_hugepage_size_in_kb() {
+ local size=$(grep "Hugepagesize:" /proc/meminfo | awk '{print $2}')
+ echo "$size"
+}
+
+run_tests() {
+ local executable_path=$1
+ local src_type=$2
+ local num_memslots=$3
+ local num_vcpus=$4
+
+ echo "$executable_path -s $src_type -m $num_memslots -n $num_vcpus"
+ "$executable_path" -s "$src_type" -m "$num_memslots" -n "$num_vcpus"
+}
+
+script_dir=$(dirname "$(realpath "$0")")
+test_executable="${script_dir}/private_mem_conversions_test"
+kvm_has_gmem_attributes_tool="${script_dir}/../kvm_has_gmem_attributes"
+
+if [ ! -f "$test_executable" ]; then
+ echo "Error: Test executable not found at '$test_executable'" >&2
+ exit 1
+fi
+
+if [ ! -f "$kvm_has_gmem_attributes_tool" ]; then
+ echo "Error: kvm_has_gmem_attributes utility not found at '$kvm_has_gmem_attributes_tool'" >&2
+ exit 1
+fi
+
+kvm_has_gmem_attributes=$("$kvm_has_gmem_attributes_tool" | tail -n1)
+
+if [ "$kvm_has_gmem_attributes" -eq 1 ]; then
+ backing_src_types=("shmem")
+else
+ hugepage_2mb_count=$(get_hugepage_count 2048)
+ hugepage_2mb_enabled=$((hugepage_2mb_count >= REQUIRED_NUM_2M_HUGEPAGES))
+ hugepage_1gb_count=$(get_hugepage_count 1048576)
+ hugepage_1gb_enabled=$((hugepage_1gb_count >= REQUIRED_NUM_1G_HUGEPAGES))
+
+ default_hugepage_size_kb=$(get_default_hugepage_size_in_kb)
+ hugepage_default_enabled=0
+ if [ "$default_hugepage_size_kb" -eq 2048 ]; then
+ hugepage_default_enabled=$hugepage_2mb_enabled
+ elif [ "$default_hugepage_size_kb" -eq 1048576 ]; then
+ hugepage_default_enabled=$hugepage_1gb_enabled
+ fi
+
+ backing_src_types=("anonymous" "anonymous_thp")
+
+ if [ "$hugepage_default_enabled" -eq 1 ]; then
+ backing_src_types+=("anonymous_hugetlb")
+ else
+ echo "skipping anonymous_hugetlb backing source type"
+ fi
+
+ if [ "$hugepage_2mb_enabled" -eq 1 ]; then
+ backing_src_types+=("anonymous_hugetlb_2mb")
+ else
+ echo "skipping anonymous_hugetlb_2mb backing source type"
+ fi
+
+ if [ "$hugepage_1gb_enabled" -eq 1 ]; then
+ backing_src_types+=("anonymous_hugetlb_1gb")
+ else
+ echo "skipping anonymous_hugetlb_1gb backing source type"
+ fi
+
+ backing_src_types+=("shmem")
+
+ if [ "$hugepage_default_enabled" -eq 1 ]; then
+ backing_src_types+=("shared_hugetlb")
+ else
+ echo "skipping shared_hugetlb backing source type"
+ fi
+fi
+
+return_code=0
+for i in "${!backing_src_types[@]}"; do
+ src_type=${backing_src_types[$i]}
+ if [ "$i" -gt 0 ]; then
+ echo
+ fi
+
+ if ! run_tests "$test_executable" "$src_type" 1 1; then
+ return_code=$?
+ echo "Test failed for source type '$src_type'. Arguments: -s $src_type -m 1 -n 1" >&2
+ break
+ fi
+
+ if ! run_tests "$test_executable" "$src_type" 1 "$NUM_VCPUS_TO_TEST"; then
+ return_code=$?
+ echo "Test failed for source type '$src_type'. Arguments: -s $src_type -m 1 -n $NUM_VCPUS_TO_TEST" >&2
+ break
+ fi
+
+ if ! run_tests "$test_executable" "$src_type" "$NUM_MEMSLOTS_TO_TEST" "$NUM_VCPUS_TO_TEST"; then
+ return_code=$?
+ echo "Test failed for source type '$src_type'. Arguments: -s $src_type -m $NUM_MEMSLOTS_TO_TEST -n $NUM_VCPUS_TO_TEST" >&2
+ break
+ fi
+done
+
+exit "$return_code"
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [POC PATCH 0/5] guest_memfd in-place conversion selftests for SNP
From: Ackerley Tng @ 2026-05-07 20:34 UTC (permalink / raw)
To: devnull+ackerleytng.google.com
Cc: ackerleytng, aik, akpm, andrew.jones, aneesh.kumar, axelrasmussen,
baohua, bhe, binbin.wu, bp, brauner, chao.p.peng, chrisl, corbet,
dave.hansen, david, forkloop, hpa, ira.weiny, jgg, jmattson,
jthoughton, kas, kasong, kvm, liam, linux-coco, linux-doc,
linux-kernel, linux-kselftest, linux-mm, linux-trace-kernel,
mathieu.desnoyers, mhiramat, michael.roth, mingo, nphamcs, oupton,
pankaj.gupta, pbonzini, pratyush, qi.zheng, qperret,
rick.p.edgecombe, rientjes, rostedt, seanjc, shakeel.butt,
shikemeng, shivankg, shuah, skhan, steven.price, suzuki.poulose,
tabba, tglx, vannapurve, vbabka, weixugc, willy, wyihan, x86,
yan.y.zhao, youngjun.park, yuanchu
In-Reply-To: <20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4@google.com>
With these POC patches, I was able to test the set memory
attributes/conversion ioctls with SNP.
After allowing src_addr to be NULL, SNP_LAUNCH_UPDATE can accept NULL
for source address and the SNP VM runs fine. :)
Ackerley Tng (5):
KVM: selftests: Initialize guest_memfd with INIT_SHARED
KVM: selftests: Use guest_memfd memory contents in-place for SNP
launch update
KVM: selftests: Make guest_code_xsave more friendly
KVM: selftests: Allow specifying CoCo-privateness while mapping a page
KVM: selftests: Test conversions for SNP
.../selftests/kvm/include/x86/processor.h | 2 +
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +-
.../testing/selftests/kvm/lib/x86/processor.c | 13 +-
tools/testing/selftests/kvm/lib/x86/sev.c | 3 +-
.../selftests/kvm/x86/sev_smoke_test.c | 222 +++++++++++++++++-
5 files changed, 234 insertions(+), 18 deletions(-)
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply
* [POC PATCH 1/5] KVM: selftests: Initialize guest_memfd with INIT_SHARED
From: Ackerley Tng @ 2026-05-07 20:34 UTC (permalink / raw)
To: devnull+ackerleytng.google.com
Cc: ackerleytng, aik, akpm, andrew.jones, aneesh.kumar, axelrasmussen,
baohua, bhe, binbin.wu, bp, brauner, chao.p.peng, chrisl, corbet,
dave.hansen, david, forkloop, hpa, ira.weiny, jgg, jmattson,
jthoughton, kas, kasong, kvm, liam, linux-coco, linux-doc,
linux-kernel, linux-kselftest, linux-mm, linux-trace-kernel,
mathieu.desnoyers, mhiramat, michael.roth, mingo, nphamcs, oupton,
pankaj.gupta, pbonzini, pratyush, qi.zheng, qperret,
rick.p.edgecombe, rientjes, rostedt, seanjc, shakeel.butt,
shikemeng, shivankg, shuah, skhan, steven.price, suzuki.poulose,
tabba, tglx, vannapurve, vbabka, weixugc, willy, wyihan, x86,
yan.y.zhao, youngjun.park, yuanchu, Sagi Shahar
In-Reply-To: <cover.1778185936.git.ackerleytng@google.com>
Initialize guest_memfd with INIT_SHARED for VM types that require
guest_memfd.
Memory in the first memslot is used by the selftest framework to load
code, page tables, interrupt descriptor tables, and basically everything
the selftest needs to run. The selftest framework sets all of these up
assuming that the memory in the memslot can be written to from the
host. Align with that behavior by initializing guest_memfd as shared so
that all the writes from the host are permitted.
guest_memfd memory can later be marked private if necessary by CoCo
platform-specific initialization functions.
Suggested-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index d1befa3f4b305..a377e5f333116 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -483,8 +483,10 @@ struct kvm_vm *__vm_create(struct vm_shape shape, u32 nr_runnable_vcpus,
{
u64 nr_pages = vm_nr_pages_required(shape.mode, nr_runnable_vcpus,
nr_extra_pages);
+ enum vm_mem_backing_src_type src_type;
struct userspace_mem_region *slot0;
struct kvm_vm *vm;
+ u64 gmem_flags;
int i, flags;
kvm_set_files_rlimit(nr_runnable_vcpus);
@@ -502,7 +504,15 @@ struct kvm_vm *__vm_create(struct vm_shape shape, u32 nr_runnable_vcpus,
if (is_guest_memfd_required(shape))
flags |= KVM_MEM_GUEST_MEMFD;
- vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, flags);
+ gmem_flags = 0;
+ src_type = VM_MEM_SRC_ANONYMOUS;
+ if (is_guest_memfd_required(shape) && kvm_has_gmem_attributes) {
+ src_type = VM_MEM_SRC_SHMEM;
+ gmem_flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
+ }
+
+ vm_mem_add(vm, src_type, 0, 0, nr_pages, flags, -1, 0, gmem_flags);
+
for (i = 0; i < NR_MEM_REGIONS; i++)
vm->memslots[i] = 0;
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [POC PATCH 2/5] KVM: selftests: Use guest_memfd memory contents in-place for SNP launch update
From: Ackerley Tng @ 2026-05-07 20:34 UTC (permalink / raw)
To: devnull+ackerleytng.google.com
Cc: ackerleytng, aik, akpm, andrew.jones, aneesh.kumar, axelrasmussen,
baohua, bhe, binbin.wu, bp, brauner, chao.p.peng, chrisl, corbet,
dave.hansen, david, forkloop, hpa, ira.weiny, jgg, jmattson,
jthoughton, kas, kasong, kvm, liam, linux-coco, linux-doc,
linux-kernel, linux-kselftest, linux-mm, linux-trace-kernel,
mathieu.desnoyers, mhiramat, michael.roth, mingo, nphamcs, oupton,
pankaj.gupta, pbonzini, pratyush, qi.zheng, qperret,
rick.p.edgecombe, rientjes, rostedt, seanjc, shakeel.butt,
shikemeng, shivankg, shuah, skhan, steven.price, suzuki.poulose,
tabba, tglx, vannapurve, vbabka, weixugc, willy, wyihan, x86,
yan.y.zhao, youngjun.park, yuanchu
In-Reply-To: <cover.1778185936.git.ackerleytng@google.com>
Update the SEV-SNP launch update flow to utilize guest_memfd in-place
conversion.
Include the KVM_SET_MEMORY_ATTRIBUTES2_PRESERVE flag when setting memory
attributes to private. This is permitted before the SNP VM is finalized.
In snp_launch_update_data, pass 0 as the host virtual address. This
instructs the kernel to perform the launch update using the guest_memfd
backing the guest physical address rather than a userspace-provided
buffer.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/lib/x86/sev.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/sev.c b/tools/testing/selftests/kvm/lib/x86/sev.c
index 93f9169034617..074ab0eff1e27 100644
--- a/tools/testing/selftests/kvm/lib/x86/sev.c
+++ b/tools/testing/selftests/kvm/lib/x86/sev.c
@@ -37,8 +37,7 @@ static void encrypt_region(struct kvm_vm *vm, struct userspace_mem_region *regio
if (is_sev_snp_vm(vm))
snp_launch_update_data(vm, gpa_base + offset,
- (u64)addr_gpa2hva(vm, gpa_base + offset),
- size, page_type);
+ 0, size, page_type);
else
sev_launch_update_data(vm, gpa_base + offset, size);
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [POC PATCH 3/5] KVM: selftests: Make guest_code_xsave more friendly
From: Ackerley Tng @ 2026-05-07 20:34 UTC (permalink / raw)
To: devnull+ackerleytng.google.com
Cc: ackerleytng, aik, akpm, andrew.jones, aneesh.kumar, axelrasmussen,
baohua, bhe, binbin.wu, bp, brauner, chao.p.peng, chrisl, corbet,
dave.hansen, david, forkloop, hpa, ira.weiny, jgg, jmattson,
jthoughton, kas, kasong, kvm, liam, linux-coco, linux-doc,
linux-kernel, linux-kselftest, linux-mm, linux-trace-kernel,
mathieu.desnoyers, mhiramat, michael.roth, mingo, nphamcs, oupton,
pankaj.gupta, pbonzini, pratyush, qi.zheng, qperret,
rick.p.edgecombe, rientjes, rostedt, seanjc, shakeel.butt,
shikemeng, shivankg, shuah, skhan, steven.price, suzuki.poulose,
tabba, tglx, vannapurve, vbabka, weixugc, willy, wyihan, x86,
yan.y.zhao, youngjun.park, yuanchu
In-Reply-To: <cover.1778185936.git.ackerleytng@google.com>
The original implementation of guest_code_xsave makes a jmp to
guest_sev_es_code in inline assembly. When code that uses guest_sev_es_code
is removed, guest_sev_es_code will be optimized out, leading to a linking
error since guest_code_xsave still tries to jmp to guest_sev_es_code.
Rewrite guest_code_xsave() to instead make a call, in C, to
guest_sev_es_code(), so that usage of guest_sev_es_code() is made known to
the compiler.
This rewriting also gives a name to the xsave inline assembly, improving
readability.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
.../selftests/kvm/x86/sev_smoke_test.c | 24 +++++++++++++------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/sev_smoke_test.c b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
index 1a49ee3915864..8b859adf4cf6f 100644
--- a/tools/testing/selftests/kvm/x86/sev_smoke_test.c
+++ b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
@@ -80,13 +80,23 @@ static void guest_sev_code(void)
GUEST_DONE();
}
-/* Stash state passed via VMSA before any compiled code runs. */
-extern void guest_code_xsave(void);
-asm("guest_code_xsave:\n"
- "mov $" __stringify(XFEATURE_MASK_X87_AVX) ", %eax\n"
- "xor %edx, %edx\n"
- "xsave (%rdi)\n"
- "jmp guest_sev_es_code");
+static void xsave_all_registers(void *addr)
+{
+ __asm__ __volatile__(
+ "mov $" __stringify(XFEATURE_MASK_X87_AVX) ", %eax\n"
+ "xor %edx, %edx\n"
+ "xsave (%0)"
+ :
+ : "r"(addr)
+ : "eax", "edx", "memory"
+ );
+}
+
+static void guest_code_xsave(void *vmsa_gva)
+{
+ xsave_all_registers(vmsa_gva);
+ guest_sev_es_code();
+}
static void compare_xsave(u8 *from_host, u8 *from_guest)
{
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [POC PATCH 4/5] KVM: selftests: Allow specifying CoCo-privateness while mapping a page
From: Ackerley Tng @ 2026-05-07 20:34 UTC (permalink / raw)
To: devnull+ackerleytng.google.com
Cc: ackerleytng, aik, akpm, andrew.jones, aneesh.kumar, axelrasmussen,
baohua, bhe, binbin.wu, bp, brauner, chao.p.peng, chrisl, corbet,
dave.hansen, david, forkloop, hpa, ira.weiny, jgg, jmattson,
jthoughton, kas, kasong, kvm, liam, linux-coco, linux-doc,
linux-kernel, linux-kselftest, linux-mm, linux-trace-kernel,
mathieu.desnoyers, mhiramat, michael.roth, mingo, nphamcs, oupton,
pankaj.gupta, pbonzini, pratyush, qi.zheng, qperret,
rick.p.edgecombe, rientjes, rostedt, seanjc, shakeel.butt,
shikemeng, shivankg, shuah, skhan, steven.price, suzuki.poulose,
tabba, tglx, vannapurve, vbabka, weixugc, willy, wyihan, x86,
yan.y.zhao, youngjun.park, yuanchu
In-Reply-To: <cover.1778185936.git.ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
tools/testing/selftests/kvm/include/x86/processor.h | 2 ++
tools/testing/selftests/kvm/lib/x86/processor.c | 13 ++++++++++---
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 77f576ee7789d..683f21452db58 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1507,6 +1507,8 @@ enum pg_level {
void tdp_mmu_init(struct kvm_vm *vm, int pgtable_levels,
struct pte_masks *pte_masks);
+void ___virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
+ gpa_t gpa, int level, bool private);
void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
gpa_t gpa, int level);
void virt_map_level(struct kvm_vm *vm, gva_t gva, gpa_t gpa,
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index b51467d70f6e7..02781194f51a2 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -256,8 +256,8 @@ static u64 *virt_create_upper_pte(struct kvm_vm *vm,
return pte;
}
-void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
- gpa_t gpa, int level)
+void ___virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
+ gpa_t gpa, int level, bool private)
{
const u64 pg_size = PG_LEVEL_SIZE(level);
u64 *pte = &mmu->pgd;
@@ -309,12 +309,19 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
* Neither SEV nor TDX supports shared page tables, so only the final
* leaf PTE needs manually set the C/S-bit.
*/
- if (vm_is_gpa_protected(vm, gpa))
+ if (private)
*pte |= PTE_C_BIT_MASK(mmu);
else
*pte |= PTE_S_BIT_MASK(mmu);
}
+void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, gva_t gva,
+ gpa_t gpa, int level)
+{
+ ___virt_pg_map(vm, mmu, gva, gpa, level,
+ vm_is_gpa_protected(vm, gpa));
+}
+
void virt_arch_pg_map(struct kvm_vm *vm, gva_t gva, gpa_t gpa)
{
__virt_pg_map(vm, &vm->mmu, gva, gpa, PG_LEVEL_4K);
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* [POC PATCH 5/5] KVM: selftests: Test conversions for SNP
From: Ackerley Tng @ 2026-05-07 20:34 UTC (permalink / raw)
To: devnull+ackerleytng.google.com
Cc: ackerleytng, aik, akpm, andrew.jones, aneesh.kumar, axelrasmussen,
baohua, bhe, binbin.wu, bp, brauner, chao.p.peng, chrisl, corbet,
dave.hansen, david, forkloop, hpa, ira.weiny, jgg, jmattson,
jthoughton, kas, kasong, kvm, liam, linux-coco, linux-doc,
linux-kernel, linux-kselftest, linux-mm, linux-trace-kernel,
mathieu.desnoyers, mhiramat, michael.roth, mingo, nphamcs, oupton,
pankaj.gupta, pbonzini, pratyush, qi.zheng, qperret,
rick.p.edgecombe, rientjes, rostedt, seanjc, shakeel.butt,
shikemeng, shivankg, shuah, skhan, steven.price, suzuki.poulose,
tabba, tglx, vannapurve, vbabka, weixugc, willy, wyihan, x86,
yan.y.zhao, youngjun.park, yuanchu
In-Reply-To: <cover.1778185936.git.ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
.../selftests/kvm/x86/sev_smoke_test.c | 198 +++++++++++++++++-
1 file changed, 193 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/sev_smoke_test.c b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
index 8b859adf4cf6f..8869cca748879 100644
--- a/tools/testing/selftests/kvm/x86/sev_smoke_test.c
+++ b/tools/testing/selftests/kvm/x86/sev_smoke_test.c
@@ -253,17 +253,205 @@ static void test_sev_smoke(void *guest, u32 type, u64 policy)
}
}
+#define GHCB_MSR_REG_GPA_REQ 0x012
+#define GHCB_MSR_REG_GPA_REQ_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)((v) & GENMASK_ULL(51, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_REG_GPA_REQ)
+
+#define GHCB_MSR_REG_GPA_RESP 0x013
+#define GHCB_MSR_REG_GPA_RESP_VAL(v) \
+ /* GHCBData[63:12] */ \
+ (((u64)(v) & GENMASK_ULL(63, 12)) >> 12)
+
+#define GHCB_DATA_LOW 12
+#define GHCB_MSR_INFO_MASK (BIT_ULL(GHCB_DATA_LOW) - 1)
+#define GHCB_RESP_CODE(v) ((v) & GHCB_MSR_INFO_MASK)
+
+/*
+ * SNP Page State Change Operation
+ *
+ * GHCBData[55:52] - Page operation:
+ * 0x0001 Page assignment, Private
+ * 0x0002 Page assignment, Shared
+ */
+enum psc_op {
+ SNP_PAGE_STATE_PRIVATE = 1,
+ SNP_PAGE_STATE_SHARED,
+};
+
+#define GHCB_MSR_PSC_REQ 0x014
+#define GHCB_MSR_PSC_REQ_GFN(gfn, op) \
+ /* GHCBData[55:52] */ \
+ (((u64)((op) & 0xf) << 52) | \
+ /* GHCBData[51:12] */ \
+ ((u64)((gfn) & GENMASK_ULL(39, 0)) << 12) | \
+ /* GHCBData[11:0] */ \
+ GHCB_MSR_PSC_REQ)
+
+#define GHCB_MSR_PSC_RESP 0x015
+#define GHCB_MSR_PSC_RESP_VAL(val) \
+ /* GHCBData[63:32] */ \
+ (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
+
+static u64 ghcb_gpa;
+static void snp_register_ghcb(void)
+{
+ u64 ghcb_pfn = ghcb_gpa >> PAGE_SHIFT;
+ u64 val;
+
+ GUEST_ASSERT(ghcb_gpa);
+
+ wrmsr(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_REG_GPA_REQ_VAL(ghcb_gpa >> PAGE_SHIFT));
+ vmgexit();
+
+ val = rdmsr(MSR_AMD64_SEV_ES_GHCB);
+ GUEST_ASSERT_EQ(GHCB_RESP_CODE(val), GHCB_MSR_REG_GPA_RESP);
+ GUEST_ASSERT_EQ(GHCB_MSR_REG_GPA_RESP_VAL(val), ghcb_pfn);
+}
+
+static void snp_page_state_change(u64 gpa, enum psc_op op)
+{
+ u64 val;
+
+ wrmsr(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_PSC_REQ_GFN(gpa >> PAGE_SHIFT, op));
+ vmgexit();
+
+ val = rdmsr(MSR_AMD64_SEV_ES_GHCB);
+ GUEST_ASSERT_EQ(GHCB_RESP_CODE(val), GHCB_MSR_PSC_RESP);
+ GUEST_ASSERT_EQ(GHCB_MSR_PSC_RESP_VAL(val), 0);
+}
+
+#define RMP_PG_SIZE_4K 0
+static inline void pvalidate(void *vaddr, bool validate)
+{
+ bool no_rmpupdate;
+ int rc;
+
+ /* "pvalidate" mnemonic support in binutils 2.36 and newer */
+ asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
+ : "=@ccc"(no_rmpupdate), "=a"(rc)
+ : "a"(vaddr), "c"(RMP_PG_SIZE_4K), "d"(validate)
+ : "memory", "cc");
+
+ GUEST_ASSERT(!no_rmpupdate);
+ GUEST_ASSERT_EQ(rc, 0);
+}
+
+#define CONVERSION_TEST_VALUE_SHARED_1 0xab
+#define CONVERSION_TEST_VALUE_SHARED_2 0xcd
+#define CONVERSION_TEST_VALUE_PRIVATE 0xef
+#define CONVERSION_TEST_VALUE_SHARED_3 0xbc
+#define CONVERSION_TEST_VALUE_SHARED_4 0xde
+static void guest_code_conversion(u8 *test_shared_gva, u8 *test_private_gva, u64 test_gpa)
+{
+ snp_register_ghcb();
+
+ GUEST_ASSERT_EQ(READ_ONCE(*test_shared_gva), CONVERSION_TEST_VALUE_SHARED_1);
+ WRITE_ONCE(*test_shared_gva, CONVERSION_TEST_VALUE_SHARED_2);
+
+ snp_page_state_change(test_gpa, SNP_PAGE_STATE_PRIVATE);
+ pvalidate(test_private_gva, true);
+
+ WRITE_ONCE(*test_private_gva, CONVERSION_TEST_VALUE_PRIVATE);
+ GUEST_ASSERT_EQ(READ_ONCE(*test_private_gva), CONVERSION_TEST_VALUE_PRIVATE);
+
+ pvalidate(test_private_gva, false);
+ snp_page_state_change(test_gpa, SNP_PAGE_STATE_SHARED);
+
+ GUEST_ASSERT_EQ(READ_ONCE(*test_shared_gva), CONVERSION_TEST_VALUE_SHARED_3);
+ WRITE_ONCE(*test_shared_gva, CONVERSION_TEST_VALUE_SHARED_4);
+
+ wrmsr(MSR_AMD64_SEV_ES_GHCB, GHCB_MSR_TERM_REQ);
+ vmgexit();
+}
+
+static void test_conversion(u64 policy)
+{
+ gva_t test_private_gva;
+ gva_t test_shared_gva;
+ struct kvm_vcpu *vcpu;
+ gva_t ghcb_gva;
+ gpa_t test_gpa;
+ struct kvm_vm *vm;
+ void *ghcb_hva;
+ void *test_hva;
+
+ vm = vm_sev_create_with_one_vcpu(KVM_X86_SNP_VM, guest_code_conversion, &vcpu);
+
+ ghcb_gva = vm_alloc_shared(vm, PAGE_SIZE, KVM_UTIL_MIN_VADDR,
+ MEM_REGION_TEST_DATA);
+ ghcb_hva = addr_gva2hva(vm, ghcb_gva);
+ ghcb_gpa = addr_gva2gpa(vm, ghcb_gva);
+ sync_global_to_guest(vm, ghcb_gpa);
+
+ test_shared_gva = vm_alloc_shared(vm, PAGE_SIZE, KVM_UTIL_MIN_VADDR,
+ MEM_REGION_TEST_DATA);
+ test_hva = addr_gva2hva(vm, test_shared_gva);
+ test_gpa = addr_gva2gpa(vm, test_shared_gva);
+
+ test_private_gva = vm_unused_gva_gap(vm, PAGE_SIZE, KVM_UTIL_MIN_VADDR);
+ ___virt_pg_map(vm, &vm->mmu, test_private_gva, test_gpa, PG_SIZE_4K, true);
+
+ vcpu_args_set(vcpu, 3, test_shared_gva, test_private_gva, test_gpa);
+
+ vm_sev_launch(vm, policy, NULL);
+
+ WRITE_ONCE(*(u8 *)test_hva, CONVERSION_TEST_VALUE_SHARED_1);
+
+ fprintf(stderr, "ghcb_hva=%p ghcb_gpa=%lx ghcb_gva=%lx\n", ghcb_hva, ghcb_gpa, ghcb_gva);
+ fprintf(stderr, "test_hva=%p test_gpa=%lx test_private_gva=%lx test_shared_gva=%lx\n", test_hva, test_gpa, test_private_gva, test_shared_gva);
+
+ vcpu_run(vcpu);
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_HYPERCALL);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.nr, KVM_HC_MAP_GPA_RANGE);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[0], test_gpa);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[1], 1);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[2], KVM_MAP_GPA_RANGE_ENCRYPTED | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
+
+ vm_mem_set_private(vm, test_gpa, PAGE_SIZE);
+
+ vcpu_run(vcpu);
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_HYPERCALL);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.nr, KVM_HC_MAP_GPA_RANGE);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[0], test_gpa);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[1], 1);
+ TEST_ASSERT_EQ(vcpu->run->hypercall.args[2], KVM_MAP_GPA_RANGE_DECRYPTED | KVM_MAP_GPA_RANGE_PAGE_SZ_4K);
+
+ vm_mem_set_shared(vm, test_gpa, PAGE_SIZE);
+
+ fprintf(stderr, "test_hva contents = %x\n", READ_ONCE(*(u8 *)test_hva));
+
+ WRITE_ONCE(*(u8 *)test_hva, CONVERSION_TEST_VALUE_SHARED_3);
+ TEST_ASSERT_EQ(*(u8 *)test_hva, CONVERSION_TEST_VALUE_SHARED_3);
+
+ vcpu_run(vcpu);
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_SYSTEM_EVENT);
+ TEST_ASSERT_EQ(vcpu->run->system_event.type, KVM_SYSTEM_EVENT_SEV_TERM);
+ TEST_ASSERT_EQ(vcpu->run->system_event.ndata, 1);
+ TEST_ASSERT_EQ(vcpu->run->system_event.data[0], GHCB_MSR_TERM_REQ);
+
+ TEST_ASSERT_EQ(*(u8 *)test_hva, CONVERSION_TEST_VALUE_SHARED_4);
+}
+
int main(int argc, char *argv[])
{
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SEV));
- test_sev_smoke(guest_sev_code, KVM_X86_SEV_VM, 0);
+ // test_sev_smoke(guest_sev_code, KVM_X86_SEV_VM, 0);
+
+ // if (kvm_cpu_has(X86_FEATURE_SEV_ES))
+ // test_sev_smoke(guest_sev_es_code, KVM_X86_SEV_ES_VM, SEV_POLICY_ES);
- if (kvm_cpu_has(X86_FEATURE_SEV_ES))
- test_sev_smoke(guest_sev_es_code, KVM_X86_SEV_ES_VM, SEV_POLICY_ES);
+ if (kvm_cpu_has(X86_FEATURE_SEV_SNP)) {
+ test_conversion(snp_default_policy());
- if (kvm_cpu_has(X86_FEATURE_SEV_SNP))
- test_sev_smoke(guest_snp_code, KVM_X86_SNP_VM, snp_default_policy());
+ // test_sev_smoke(guest_snp_code, KVM_X86_SNP_VM, snp_default_policy());
+ }
return 0;
}
--
2.54.0.563.g4f69b47b94-goog
^ permalink raw reply related
* Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
From: Jose E. Marchesi @ 2026-05-07 12:37 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, mhiramat, mathieu.desnoyers,
jremus, jpoimboe, peterz, mingo, jolsa, acme, namhyung, tglx,
andrii, indu.bhagat, beaub, torvalds, akpm, fweimer, kees,
codonell, sam, dylanbhatch, bp, dave.hansen, david, hpa,
Liam.Howlett, lorenzo.stoakes, mhocko, rppt, surenb, vbabka, hca,
gor
In-Reply-To: <20260429114355.6c712e6a@gandalf.local.home>
> +/**
> + * sys_stacktrace_setup - register an address for user space stacktrace walking.
> + * @op: Type of operation to perform
> + * @addr_start: The virtual address of the stacktrace information
> + * @addr_length: The length of the stacktrace information
> + * @text_start: The virtual address of the text that @addr_start represents
> + * @text_length: The length of teh text
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Currently only sframes are supported, but in the future, this may be used
> + * to tell the kernel about JIT code which will most likely have a different
> + * format.
> + *
> + * The type command may be extended and parameters may be used for other
> + * purposes.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, addr_start,
> + unsigned long, addr_length, unsigned long, text_start,
> + unsigned long, text_length)
> +{
> + switch (op) {
> + case STACKTRACE_REGISTER_SFRAME:
> + return sframe_add_section(addr_start, addr_start + addr_length,
> + text_start, text_start+text_length);
> + case STACKTRACE_UNREGISTER_SFRAME:
> + return sframe_remove_section(addr_start);
> + }
> + return -EINVAL;
> +}
FWIW passing start and end of both the tracing data and the text segment
it covers seems reasonable to me. This covers the case in which the
same tracing data describes several text segments, which can happen with
SFrame and other similar formats.
> diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
> index 7a42b32b6577..54a99cffeec4 100644
> --- a/scripts/syscall.tbl
> +++ b/scripts/syscall.tbl
> @@ -412,3 +412,4 @@
> 469 common file_setattr sys_file_setattr
> 470 common listns sys_listns
> 471 common rseq_slice_yield sys_rseq_slice_yield
> +472 common stacktrace_setup sys_stacktrace_setup
^ permalink raw reply
* Re: [PATCH] test_kprobes: clear kprobes between test runs
From: Masami Hiramatsu @ 2026-05-08 0:33 UTC (permalink / raw)
To: Martin Kaiser
Cc: Naveen N Rao, Steven Rostedt, linux-trace-kernel, linux-kernel
In-Reply-To: <20260507134615.1010905-1-martin@kaiser.cx>
On Thu, 7 May 2026 15:44:33 +0200
Martin Kaiser <martin@kaiser.cx> wrote:
> Running the kprobes sanity tests twice makes all tests fail and
> eventually crashes the kernel.
>
> [root@martin-riscv-1 ~]# echo 1 > /sys/kernel/debug/kunit/kprobes_test/run
> ...
> # Totals: pass:5 fail:0 skip:0 total:5
> ok 1 kprobes_test
> [root@martin-riscv-1 ~]# echo 1 > /sys/kernel/debug/kunit/kprobes_test/run
> ...
> # test_kprobe: EXPECTATION FAILED at lib/tests/test_kprobes.c:64
> Expected 0 == register_kprobe(&kp), but
> register_kprobe(&kp) == -22 (0xffffffffffffffea)
> ...
> Unable to handle kernel paging request ...
Oops, good catch! Thanks for fixing!
>
> The testsuite defines several kprobes and kretprobes as static variables
> that are preserved across test runs.
>
> After register_kprobe and unregister_kprobe, a kprobe contains some
> leftover data that must be cleared before the kprobe can be registered
> again. The tests are setting symbol_name to define the probe location.
> Address and flags must be cleared.
>
> The existing code clears some of the probes between subsequent tests, but
> not between two test runs. The leftover data from a previous test run
> makes the registrations fail in the next run.
>
> Move the cleanups for all kprobes into kprobes_test_init, this function
> is called before each single test (including the first test of a test
> run).
Yeah, this looks good to me. Let me pick it.
>
> Signed-off-by: Martin Kaiser <martin@kaiser.cx>
> ---
> lib/tests/test_kprobes.c | 29 ++++++++++++++++++-----------
> 1 file changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/lib/tests/test_kprobes.c b/lib/tests/test_kprobes.c
> index b7582010125c..06e729e4de05 100644
> --- a/lib/tests/test_kprobes.c
> +++ b/lib/tests/test_kprobes.c
> @@ -12,6 +12,12 @@
>
> #define div_factor 3
>
> +#define KP_CLEAR(_kp) \
> +do { \
> + (_kp).addr = NULL; \
> + (_kp).flags = 0; \
> +} while (0)
> +
> static u32 rand1, preh_val, posth_val;
> static u32 (*target)(u32 value);
> static u32 (*recursed_target)(u32 value);
> @@ -125,10 +131,6 @@ static void test_kprobes(struct kunit *test)
>
> current_test = test;
>
> - /* addr and flags should be cleard for reusing kprobe. */
> - kp.addr = NULL;
> - kp.flags = 0;
> -
> KUNIT_EXPECT_EQ(test, 0, register_kprobes(kps, 2));
> preh_val = 0;
> posth_val = 0;
> @@ -226,9 +228,6 @@ static void test_kretprobes(struct kunit *test)
> struct kretprobe *rps[2] = {&rp, &rp2};
>
> current_test = test;
> - /* addr and flags should be cleard for reusing kprobe. */
> - rp.kp.addr = NULL;
> - rp.kp.flags = 0;
> KUNIT_EXPECT_EQ(test, 0, register_kretprobes(rps, 2));
>
> krph_val = 0;
> @@ -290,8 +289,6 @@ static void test_stacktrace_on_kretprobe(struct kunit *test)
> unsigned long myretaddr = (unsigned long)__builtin_return_address(0);
>
> current_test = test;
> - rp3.kp.addr = NULL;
> - rp3.kp.flags = 0;
>
> /*
> * Run the stacktrace_driver() to record correct return address in
> @@ -352,8 +349,6 @@ static void test_stacktrace_on_nested_kretprobe(struct kunit *test)
> struct kretprobe *rps[2] = {&rp3, &rp4};
>
> current_test = test;
> - rp3.kp.addr = NULL;
> - rp3.kp.flags = 0;
>
> //KUNIT_ASSERT_NE(test, myretaddr, stacktrace_driver());
>
> @@ -367,6 +362,18 @@ static void test_stacktrace_on_nested_kretprobe(struct kunit *test)
>
> static int kprobes_test_init(struct kunit *test)
> {
> + KP_CLEAR(kp);
> + KP_CLEAR(kp2);
> + KP_CLEAR(kp_missed);
> +#ifdef CONFIG_KRETPROBES
> + KP_CLEAR(rp.kp);
> + KP_CLEAR(rp2.kp);
> +#ifdef CONFIG_ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
> + KP_CLEAR(rp3.kp);
> + KP_CLEAR(rp4.kp);
> +#endif
> +#endif
> +
> target = kprobe_target;
> target2 = kprobe_target2;
> recursed_target = kprobe_recursed_target;
> --
> 2.43.7
>
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply
* Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
From: Steven Rostedt @ 2026-05-08 1:57 UTC (permalink / raw)
To: Jose E. Marchesi
Cc: linux-kernel, linux-trace-kernel, mhiramat, mathieu.desnoyers,
jremus, jpoimboe, peterz, mingo, jolsa, acme, namhyung, tglx,
andrii, indu.bhagat, beaub, torvalds, akpm, fweimer, kees,
codonell, sam, dylanbhatch, bp, dave.hansen, david, hpa,
Liam.Howlett, lorenzo.stoakes, mhocko, rppt, surenb, vbabka, hca,
gor
In-Reply-To: <87zf2bl7jj.fsf@gnu.org>
On Thu, 07 May 2026 14:37:36 +0200
"Jose E. Marchesi" <jemarch@gnu.org> wrote:
> FWIW passing start and end of both the tracing data and the text segment
> it covers seems reasonable to me. This covers the case in which the
> same tracing data describes several text segments, which can happen with
> SFrame and other similar formats.
Just so I understand you. You are suggesting to pass in the end address
instead of the length?
-- Steve
^ permalink raw reply
* Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
From: Jose E. Marchesi @ 2026-05-08 7:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, mhiramat, mathieu.desnoyers,
jremus, jpoimboe, peterz, mingo, jolsa, acme, namhyung, tglx,
andrii, indu.bhagat, beaub, torvalds, akpm, fweimer, kees,
codonell, sam, dylanbhatch, bp, dave.hansen, david, hpa,
Liam.Howlett, lorenzo.stoakes, mhocko, rppt, surenb, vbabka, hca,
gor
In-Reply-To: <20260507215751.0c286695@fedora>
> On Thu, 07 May 2026 14:37:36 +0200
> "Jose E. Marchesi" <jemarch@gnu.org> wrote:
>
>> FWIW passing start and end of both the tracing data and the text segment
>> it covers seems reasonable to me. This covers the case in which the
>> same tracing data describes several text segments, which can happen with
>> SFrame and other similar formats.
>
> Just so I understand you. You are suggesting to pass in the end address
> instead of the length?
No no, was talking conceptually. Denoting the end of the region by its
length is better. I see no reason to do otherwise in this case.
^ permalink raw reply
* Re: [RFC][PATCH] unwind: Add stacktrace_setup system call
From: Jens Remus @ 2026-05-08 7:46 UTC (permalink / raw)
To: Steven Rostedt, LKML, Linux Trace Kernel
Cc: Masami Hiramatsu, Mathieu Desnoyers, Josh Poimboeuf,
Peter Zijlstra, Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Andrii Nakryiko, Indu Bhagat,
Jose E. Marchesi, Beau Belgrave, Linus Torvalds, Andrew Morton,
Florian Weimer, Kees Cook, Carlos O'Donell, Sam James,
Dylan Hatch, Borislav Petkov, Dave Hansen, David Hildenbrand,
H. Peter Anvin, Liam R. Howlett, Lorenzo Stoakes, Michal Hocko,
Mike Rapoport, Suren Baghdasaryan, Vlastimil Babka,
Heiko Carstens, Vasily Gorbik
In-Reply-To: <20260429114355.6c712e6a@gandalf.local.home>
On 4/29/2026 5:43 PM, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
>
> [
> This is an RFC that adds a system call for dynamic linkers to use to
> tell the kernel where the sframe sections are when it loads dynamic
> libraries.
>
> It is built on top of Jens's sframe implementation for v3:
>
> https://lore.kernel.org/linux-trace-kernel/20260127150554.2760964-1-jremus@linux.ibm.com/
>
> I have a repo with that code that this applies on top of here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git sframe/core
>
>
> The name of the system call is "stacktrace_setup", but I'm not attached
> to this name. If anyone can think of a better name I'm happy to take
> suggestions.
>
> This patch is just to get the conversation going and the final result
> may be much different. I tested this with the attached program which is a
> major hack. I built glibc with sframe v3 support and I used readelf to
> find the sframe size and location of glibc.
>
> readelf -e /work/usr/lib/libc.so.6 | grep sframe
> [19] .sframe GNU_SFRAME 00000000001d3fc0 001d3fc0
>
> Then I wrote a program that takes the above location and size of the
> .sframe section in libc as parameters, scans /proc/self/maps to find
> where it loaded libc and then calls this new system call with a pointer
> to the location of the sframe along with its size, as well as where the
> libc text is located.
>
> It then spins for 2 seconds, calls the system call again to remove the
> sframe section it loaded, and spins for another 2 seconds.
>
> I ran perf record --call-graph fp,defer on the program and looked for
> the do_spin() function.
>
> With sframe loaded:
>
> sframe-test 1350 1396.333593: 202366 cpu/cycles/P:
> 7fdf0ec38a44 [unknown] ([vdso])
> 5621a6b97243 get_time+0x19 (/work/c/sframe-test)
> 5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
> 5621a6b975cd main+0xd4 (/work/c/sframe-test)
> 7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
> 7fdf0ea26d05 __libc_start_main@@GLIBC_2.34+0x85 (/work/usr/lib/libc.so.6)
> 5621a6b97131 _start+0x21 (/work/c/sframe-test)
>
> After it unloads the sframe:
>
> sframe-test 1350 1400.332902: 657582 cpu/cycles/P:
> 7fdf0ec38a5e [unknown] ([vdso])
> 5621a6b97243 get_time+0x19 (/work/c/sframe-test)
> 5621a6b9727f do_spin+0x1f (/work/c/sframe-test)
> 5621a6b97602 main+0x109 (/work/c/sframe-test)
> 7fdf0ea26bda __libc_start_call_main+0x6a (/work/usr/lib/libc.so.6)
>
> As you can see, with the sframe loaded, it was able to walk further up
> the libc library.
>
> Again, this is just an RFC, but I would like to get agreement on the
> system call so that we can then update the dynamic linker to do this
> instead of using my hack ;-)
> ]
>
> Add a system call that can be used by dynamic linkers to tell the kernel
> where the sframe section is in memory for libraries it loads.
>
> The system call stacktrace_setup takes 5 parameters:
>
> op - the type of operation to perform
> addr_start - The virtual address of the sframe section
> addr_length - The length of the sframe section
> text_start - the text section the sframe represents
> test_length - the length of the section
>
> The current op values are:
>
> STACKTRACE_REGISTER_SFRAME - This registers the sframe
> STACKTRACE_UNREGISTER_SFRAME - This removes the sframe
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
LGTM. Some comments/questions below.
> diff --git a/include/uapi/linux/stacktrace.h b/include/uapi/linux/stacktrace.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_STACKTRACE_H
> +#define _UAPI_LINUX_STACKTRACE_H
> +
> +enum stacktrace_setup_types {
> + STACKTRACE_REGISTER_SFRAME = 1,
> + STACKTRACE_UNREGISTER_SFRAME = 2,
> +};
> +
> +#endif /* _UAPI_LINUX_STACKTRACE_H */
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
Having the syscall live in kernel/unwind/sframe.c means it is only
available if config option HAVE_UNWIND_USER_SFRAME is selected (which
triggers sframe.o to be built and linked into the kernel), which makes
sense as long as it only implements sframe-specific functionality.
I suppose it could be moved elsewhere if non-sframe use cases would
arise in the future?
Would Dylan need to guard it when introducing HAVE_UNWIND_KERNEL_SFRAME?
Provided the syscall fails with -ENOSYS if not implemented (e.g. when
HAVE_UNWIND_USER_SFRAME is not enabled) the dummy implementations of
sframe_add_section() and sframe_remove_section() in linux/sframe.h also
return -ENOSYS, so the user observable behavior would be the same and
it would not matter. Do you agree?
> @@ -12,8 +12,10 @@
> #include <linux/mm.h>
> #include <linux/string_helpers.h>
> #include <linux/sframe.h>
> +#include <linux/syscalls.h>
> #include <asm/unwind_user_sframe.h>
> #include <linux/unwind_user_types.h>
> +#include <uapi/linux/stacktrace.h>
>
> #include "sframe.h"
> #include "sframe_debug.h"
> @@ -838,3 +840,38 @@ void sframe_free_mm(struct mm_struct *mm)
>
> mtree_destroy(&mm->sframe_mt);
> }
> +
> +/**
> + * sys_stacktrace_setup - register an address for user space stacktrace walking.
> + * @op: Type of operation to perform
> + * @addr_start: The virtual address of the stacktrace information
> + * @addr_length: The length of the stacktrace information
> + * @text_start: The virtual address of the text that @addr_start represents
> + * @text_length: The length of teh text
> + *
> + * This system call is used by dynamic library utilities to inform the kernel
> + * of meta data that it loaded that can be used by the kernel to know how
> + * to stack walk the given text locations.
> + *
> + * Currently only sframes are supported, but in the future, this may be used
> + * to tell the kernel about JIT code which will most likely have a different
> + * format.
> + *
> + * The type command may be extended and parameters may be used for other
> + * purposes.
> + *
> + * Return: 0 if successful, otherwise a negative error.
> + */
> +SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, addr_start,
> + unsigned long, addr_length, unsigned long, text_start,
> + unsigned long, text_length)
Would it make sense to keep the parameters generic from start, similar
to how it is done in prctl()? Or can this be changed later, if the need
arises?
SYSCALL_DEFINE5(stacktrace_setup, int, op, unsigned long, arg2,
unsigned long, arg3, unsigned long, arg4, unsigned long, arg5)
> +{
> + switch (op) {
> + case STACKTRACE_REGISTER_SFRAME:
> + return sframe_add_section(addr_start, addr_start + addr_length,
> + text_start, text_start+text_length);
Nit:
text_start, text_start + text_length);
> + case STACKTRACE_UNREGISTER_SFRAME:
> + return sframe_remove_section(addr_start);
> + }
> + return -EINVAL;
> +}
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
^ permalink raw reply
* Re: [PATCH v14 05/19] unwind_user/sframe: Add support for reading .sframe contents
From: Jens Remus @ 2026-05-08 10:50 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Josh Poimboeuf, Indu Bhagat,
Dylan Hatch
Cc: bpf, linux-mm, linux-trace-kernel, x86, Namhyung Kim,
Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave, Florian Weimer,
Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google), Peter Zijlstra,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
In-Reply-To: <20260505121718.3572346-6-jremus@linux.ibm.com>
On 5/5/2026 2:17 PM, Jens Remus wrote:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
>
> In preparation for using sframe to unwind user space stacks, add an
> sframe_find() interface for finding the sframe information associated
> with a given text address.
>
> For performance, use user_read_access_begin() and the corresponding
> unsafe_*() accessors. Note that use of pr_debug() in uaccess-enabled
> regions would break noinstr validation, so there aren't any debug
> messages yet. That will be added in a subsequent commit.
>
> Link: https://lore.kernel.org/all/77c0d1ec143bf2a53d66c4ecb190e7e0a576fbfd.1737511963.git.jpoimboe@kernel.org/
> Link: https://lore.kernel.org/all/b35ca3a3-8de5-4d32-8d30-d4e562f6b0de@linux.ibm.com/
>
> [ Jens Remus: Add initial support for SFrame V3 (limited to regular
> FDEs). Add support for PC-relative FDE function start offset. Simplify
> logic by using an internal FDE representation. Rename struct sframe_fre
> to sframe_fre_internal to align with struct sframe_fde_internal.
> Cleanup includes. Fix checkpatch errors "spaces required around that
> ':'". ]
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> +static __always_inline int __read_fde(struct sframe_section *sec,
> + unsigned int fde_num,
> + struct sframe_fde_internal *fde)
> +{
> + unsigned long fde_addr, fda_addr, func_addr;
unsigned long fde_addr, fda_addr, func_start, func_end;
> + struct sframe_fde_v3 _fde;
> + struct sframe_fda_v3 _fda;
> +
> + fde_addr = sec->fdes_start + (fde_num * sizeof(struct sframe_fde_v3));
> + unsafe_copy_from_user(&_fde, (void __user *)fde_addr,
> + sizeof(struct sframe_fde_v3), Efault);
> +
> + func_addr = fde_addr + _fde.func_start_off;
> + if (func_addr < sec->text_start || func_addr >= sec->text_end)
> + return -EINVAL;
func_start = fde_addr + _fde.func_start_off;
func_end = func_start + _fde.func_size;
if (func_start < sec->text_start || func_end > sec->text_end)
return -EINVAL;
This would validate that the whole function described by the FDE is
within the text section and not only the function start.
Note that, unrelated to above change, this check in general causes
sframe_validate_section() to fail, if one sframe section covers more
than one text section (unrelated to whether it is actually registered
for multiple text sections), for instance in case of Dylan's sframe
kernel stacktracer on arm64. Should the check therefore be made
conditional on whether __read_fde() is called from __find_fde() or
sframe_validate_section()? Or shall we drop this check as it does not
provide that much benefit during normal stacktracing use:
- sframe_find() obtains the struct sframe_section *sec from the
mm->sframe_mt based on IP. So IP must be within sec->text_start and
sec->text_end.
- __find_fde() only returns a FDE, if the IP is within
[fde->func_addr, fde->func_addr + fde->func_size[.
Dropping the check would allow the function start/end to be outside the
text section.
> +
> + fda_addr = sec->fres_start + _fde.fres_off;
> + if (fda_addr + sizeof(struct sframe_fda_v3) > sec->fres_end)
> + return -EINVAL;
> + unsafe_copy_from_user(&_fda, (void __user *)fda_addr,
> + sizeof(struct sframe_fda_v3), Efault);
Can unsafe_copy_from_user() be used for unaligned fda_addr, at least
on x86-64, s390 64-bit, and amr64?
Do the FDE type, FDE PC type, and FRE type values need to be validated
here as well?
unsigned char fde_type = SFRAME_V3_FDE_TYPE(_fda.info2);
unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(_fda.info);
unsigned char fre_type = SFRAME_V3_FDE_FRE_TYPE(_fda.info);
The FDE type would get validatd by __read_fre_datawords(), which is
called after __read_fde(), if the read FDE is the one of interest.
So that does not neccessarily need to be checked here. Do you agree?
The FDE PC type is currently not checked for supported values anywhere.
That one would make sense to be checked here:
if (fde_pctype != SFRAME_FDE_PCTYPE_INC &&
fde_pctype != SFRAME_FDE_PCTYPE_MASK)
return -EINVAL;
The FRE type would get validated by __read_fre(), which is called
somewhere down the line after __read_fde(). So that does not
neccesarily need to be checked here. Do you agree?
> +
> + fde->func_addr = func_addr;
fde->func_addr = func_start;
> + fde->func_size = _fde.func_size;
> + fde->fda_off = _fde.fres_off;
> + fde->fres_off = _fde.fres_off + sizeof(struct sframe_fda_v3);
> + fde->fres_num = _fda.fres_num;
> + fde->info = _fda.info;
> + fde->info2 = _fda.info2;
> + fde->rep_size = _fda.rep_size;
> +
> + return 0;
> +
> +Efault:
> + return -EFAULT;
> +}
> +static __always_inline int __read_fre(struct sframe_section *sec,
> + struct sframe_fde_internal *fde,
> + unsigned long fre_addr,
> + struct sframe_fre_internal *fre)
> +{
> + unsigned char fde_type = SFRAME_V3_FDE_TYPE(fde->info2);
> + unsigned char fde_pctype = SFRAME_V3_FDE_PCTYPE(fde->info);
> + unsigned char fre_type = SFRAME_V3_FDE_FRE_TYPE(fde->info);
> + unsigned char dataword_count, dataword_size;
> + s32 cfa_off, ra_off, fp_off;
> + unsigned long cur = fre_addr;
> + unsigned char addr_size;
> + u32 ip_off;
> + u8 info;
> +
> + addr_size = fre_type_to_size(fre_type);
> + if (!addr_size)
> + return -EFAULT;
> +
> + if (fre_addr + addr_size + 1 > sec->fres_end)
> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(ip_off, cur, addr_size, Efault);
> + if (fde_pctype == SFRAME_FDE_PCTYPE_INC && ip_off > fde->func_size)
if ((fde_pctype == SFRAME_FDE_PCTYPE_INC && ip_off >= fde->func_size) ||
(fde_pctype == SFRAME_FDE_PCTYPE_MASK && ip_off >= fde->rep_size))
For PCTYPE_INC the FRE IP offset must be less than the FDE function size.
For PCTYPE_MASK the FRE IP offset must be less than the FDE repetition size.
> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(info, cur, 1, Efault);
> + dataword_count = SFRAME_V3_FRE_DATAWORD_COUNT(info);
> + dataword_size = dataword_size_enum_to_size(SFRAME_V3_FRE_DATAWORD_SIZE(info));
> + if (!dataword_count || !dataword_size)
> + return -EFAULT;
> +
> + if (cur + (dataword_count * dataword_size) > sec->fres_end)
> + return -EFAULT;
> +
> + /* TODO: Support for flexible FDEs not implemented yet. */
> + if (fde_type != SFRAME_FDE_TYPE_DEFAULT)
> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(cfa_off, cur, dataword_size, Efault);
> + dataword_count--;
> +
> + ra_off = sec->ra_off;
> + if (!ra_off) {
> + if (!dataword_count--)
> + return -EFAULT;
> +
> + UNSAFE_GET_USER_INC(ra_off, cur, dataword_size, Efault);
> + }
> +
> + fp_off = sec->fp_off;
> + if (!fp_off && dataword_count) {
> + dataword_count--;
> + UNSAFE_GET_USER_INC(fp_off, cur, dataword_size, Efault);
> + }
> +
> + if (dataword_count)
> + return -EFAULT;
> +
> + fre->size = addr_size + 1 + (dataword_count * dataword_size);
> + fre->ip_off = ip_off;
> + fre->cfa_off = cfa_off;
> + fre->ra_off = ra_off;
> + fre->fp_off = fp_off;
> + fre->info = info;
> +
> + return 0;
> +
> +Efault:
> + return -EFAULT;
> +}
Thanks and regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
^ permalink raw reply
* Re: [PATCH v14 12/19] unwind_user/sframe: Add .sframe validation option
From: Jens Remus @ 2026-05-08 10:51 UTC (permalink / raw)
To: Steven Rostedt, Josh Poimboeuf, Indu Bhagat, Dylan Hatch
Cc: bpf, linux-kernel, linux-mm, linux-trace-kernel, x86,
Namhyung Kim, Andrii Nakryiko, Jose E. Marchesi, Beau Belgrave,
Florian Weimer, Carlos O'Donell, Masami Hiramatsu, Jiri Olsa,
Arnaldo Carvalho de Melo, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Heiko Carstens, Vasily Gorbik,
Ilya Leoshkevich, Steven Rostedt (Google), Peter Zijlstra,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Mathieu Desnoyers, Kees Cook, Sam James
In-Reply-To: <20260505121718.3572346-13-jremus@linux.ibm.com>
On 5/5/2026 2:17 PM, Jens Remus wrote:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
>
> Add a debug feature to validate all .sframe sections when first loading
> the file rather than on demand.
>
> [ Jens Remus: Add support for SFrame V3. Add support for PC-relative
> FDE function start offset. Adjust to rename of struct sframe_fre to
> sframe_fre_internal. Use %#x/%#lx format specifiers. ]
> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c
> +static int safe_read_fde(struct sframe_section *sec,
> + unsigned int fde_num, struct sframe_fde_internal *fde)
> +{
> + int ret;
> +
> + if (!user_read_access_begin((void __user *)sec->sframe_start,
> + sec->sframe_end - sec->sframe_start))
> + return -EFAULT;
> + ret = __read_fde(sec, fde_num, fde);
> + user_read_access_end();
> + return ret;
> +}
> +static int sframe_validate_section(struct sframe_section *sec)
> +{
> + unsigned long prev_ip = 0;
> + unsigned int i;
> +
> + for (i = 0; i < sec->num_fdes; i++) {
> + struct sframe_fre_internal *fre, *prev_fre = NULL;
> + unsigned long ip, fre_addr;
> + struct sframe_fde_internal fde;
> + struct sframe_fre_internal fres[2];
> + bool which = false;
> + unsigned int j;
> + int ret;
> +
> + ret = safe_read_fde(sec, i, &fde);
Iterating over all FDEs may cause __read_fde() and thus safe_read_fde()
to fail if one sframe section covers multiple text sections (regardless
of whether it is also registered for multiple text sections), as
__read_fde() checks whether the read FDE function start address is
within [sec->text_start, sec->text_end[.
See my related comments in my reply to [PATCH v14 05/19] unwind_user/
sframe: Add support for reading .sframe contents.
> + if (ret) {
> + dbg_sec("safe_read_fde(%u) failed\n", i);
> + return ret;
> + }
> +
Regards,
Jens
--
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com
IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/
^ permalink raw reply
* Re: [PATCH 7.2 v16 03/13] mm/khugepaged: rework max_ptes_* handling with helper functions
From: Lance Yang @ 2026-05-08 11:11 UTC (permalink / raw)
To: npache
Cc: linux-doc, linux-kernel, linux-mm, linux-trace-kernel, aarcange,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, Liam.Howlett, ljs, mathieu.desnoyers, matthew.brost,
mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
zokeefe
In-Reply-To: <20260419185750.260784-4-npache@redhat.com>
On Sun, Apr 19, 2026 at 12:57:40PM -0600, Nico Pache wrote:
>The following cleanup reworks all the max_ptes_* handling into helper
>functions. This increases the code readability and will later be used to
>implement the mTHP handling of these variables.
>
>With these changes we abstract all the madvise_collapse() special casing
>(dont respect the sysctls) away from the functions that utilize them. And
>will later in this series to cleanly restrict mTHP collapses behaviors.
>
>Suggested-by: David Hildenbrand <david@kernel.org>
>Signed-off-by: Nico Pache <npache@redhat.com>
>---
Nice. It should all be an equivalence-preserving refactor.
With David's suggested changes:
Reviewed-by: Lance Yang <lance.yang@linux.dev>
^ permalink raw reply
* [RFC PATCH] trace: Introduce a new filter_pred "caller"
From: Chen Jun @ 2026-05-08 12:26 UTC (permalink / raw)
To: rostedt, mhiramat, mathieu.desnoyers, linux-kernel,
linux-trace-kernel
Cc: chenjun102
Low-level functions have many call paths, and sometimes
we only care about the calls on a specific call path.
Add a new filter to filter based on the call stack.
Usage:
1. echo 'caller=="$function_name"' > events/../filter
Only support OP_EQ and OP_NE
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
include/linux/trace_events.h | 1 +
kernel/trace/trace.h | 3 ++-
kernel/trace/trace_events.c | 1 +
kernel/trace/trace_events_filter.c | 40 ++++++++++++++++++++++++++++--
4 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 40a43a4c7caf..1f109669a391 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -851,6 +851,7 @@ enum {
FILTER_COMM,
FILTER_CPU,
FILTER_STACKTRACE,
+ FILTER_CALLER,
};
extern int trace_event_raw_init(struct trace_event_call *call);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..4e4b92ce264f 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1825,7 +1825,8 @@ static inline bool is_string_field(struct ftrace_event_field *field)
field->filter_type == FILTER_RDYN_STRING ||
field->filter_type == FILTER_STATIC_STRING ||
field->filter_type == FILTER_PTR_STRING ||
- field->filter_type == FILTER_COMM;
+ field->filter_type == FILTER_COMM ||
+ field->filter_type == FILTER_CALLER;
}
static inline bool is_function_field(struct ftrace_event_field *field)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c46e623e7e0d..6d220d7eec73 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -199,6 +199,7 @@ static int trace_define_generic_fields(void)
__generic_field(char *, comm, FILTER_COMM);
__generic_field(char *, stacktrace, FILTER_STACKTRACE);
__generic_field(char *, STACKTRACE, FILTER_STACKTRACE);
+ __generic_field(char *, caller, FILTER_CALLER);
return ret;
}
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 609325f57942..1cf040065abe 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -72,6 +72,7 @@ enum filter_pred_fn {
FILTER_PRED_FN_CPUMASK,
FILTER_PRED_FN_CPUMASK_CPU,
FILTER_PRED_FN_FUNCTION,
+ FILTER_PRED_FN_CALLER,
FILTER_PRED_FN_,
FILTER_PRED_TEST_VISITED,
};
@@ -1009,6 +1010,21 @@ static int filter_pred_function(struct filter_pred *pred, void *event)
return pred->op == OP_EQ ? ret : !ret;
}
+/* Filter predicate for caller. */
+static int filter_pred_caller(struct filter_pred *pred, void *event)
+{
+ unsigned long entries[32];
+ unsigned int nr_entries;
+ int i;
+
+ nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+ for (i = 0; i < nr_entries ; i++)
+ if (pred->val <= entries[i] && entries[i] < pred->val2)
+ return !pred->not;
+
+ return pred->not;
+}
+
/*
* regex_match_foo - Basic regex callbacks
*
@@ -1617,6 +1633,8 @@ static int filter_pred_fn_call(struct filter_pred *pred, void *event)
return filter_pred_cpumask_cpu(pred, event);
case FILTER_PRED_FN_FUNCTION:
return filter_pred_function(pred, event);
+ case FILTER_PRED_FN_CALLER:
+ return filter_pred_caller(pred, event);
case FILTER_PRED_TEST_VISITED:
return test_pred_visited_fn(pred, event);
default:
@@ -2002,10 +2020,28 @@ static int parse_pred(const char *str, void *data,
} else if (field->filter_type == FILTER_DYN_STRING) {
pred->fn_num = FILTER_PRED_FN_STRLOC;
- } else if (field->filter_type == FILTER_RDYN_STRING)
+ } else if (field->filter_type == FILTER_RDYN_STRING) {
pred->fn_num = FILTER_PRED_FN_STRRELLOC;
- else {
+ } else if (field->filter_type == FILTER_CALLER) {
+ unsigned long caller;
+
+ if (op == OP_GLOB)
+ goto err_free;
+ pred->fn_num = FILTER_PRED_FN_CALLER;
+ caller = kallsyms_lookup_name(pred->regex->pattern);
+ if (!caller) {
+ parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
+ goto err_free;
+ }
+ /* Now find the function start and end address */
+ if (!kallsyms_lookup_size_offset(caller, &size, &offset)) {
+ parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
+ goto err_free;
+ }
+ pred->val = caller - offset;
+ pred->val2 = pred->val + size;
+ } else {
if (!ustring_per_cpu) {
/* Once allocated, keep it around for good */
ustring_per_cpu = alloc_percpu(struct ustring_buffer);
--
2.22.0
^ permalink raw reply related
* [PATCH 0/2] tools/bootconfig: render kernel.* subtree as a cmdline string
From: Breno Leitao @ 2026-05-08 13:55 UTC (permalink / raw)
To: Masami Hiramatsu, Andrew Morton
Cc: linux-kernel, linux-trace-kernel, paulmck, oss, Breno Leitao,
kernel-team
Add a bootconfig -> kernel cmdline rendering capability shared between
the kernel parser library and the userspace tools/bootconfig binary.
The new userspace mode "tools/bootconfig -C <file>" walks a bootconfig
file's "kernel" subtree and prints it as a flat, space-separated
cmdline string suitable for direct use as (or appending to) a kernel
command line.
This series prepares tools/bootconfig and lib/bootconfig.c for an
upcoming feature that lets the kernel build render an embedded
bootconfig file's "kernel" subtree to a flat cmdline string and embed
it in the kernel image.
The follow-up series (sent separately) wires this into setup_arch() so
early_param() handlers see values supplied via CONFIG_BOOT_CONFIG_EMBED_FILE,
following Masami suggestion in [1]
These two patches are pure groundwork. They add no kernel feature,
change no runtime behavior, and are useful on their own (the new
"tools/bootconfig -C" mode lets anyone render a .bootconfig file to
a cmdline string from the shell).
Landing them independently lets the follow-up series focus on the
kernel-side plumbing without dragging the refactor and tool addition
through the same review cycle.
Patch 1 lifts xbc_snprint_cmdline() from init/main.c into
lib/bootconfig.c so both the kernel runtime path
(xbc_make_cmdline -> extra_command_line) and the userspace tool can
share a single renderer.
- tools/bootconfig already compiles lib/bootconfig.c directly, so no
new shared-code mechanism is introduced.
Patch 2 adds a -C option to tools/bootconfig that walks the "kernel"
subtree of a bootconfig file and prints it as a flat, space-separated
cmdline string. Missing or empty kernel.* produces empty output and
exits 0.
- This is the renderer the kernel build will invoke.
Once this lands, the follow up patches will use it in the following way:
1) Render at build time.
The kernel build invokes the userspace bootconfig tool — using the -C mode
prep added — to convert the embedded bootconfig file into a flat kernel cmdline
string.
2) Bake the string into the kernel image.
A small assembly stub embeds the rendered file into the kernel's discardable
read-only init data, bracketed by two markers so the runtime can locate it.
3) Add a runtime helper to consume it.
A new helper in the shared bootconfig source — sitting next to the renderer
prep moved there — prepends the embedded blob to a cmdline buffer, panicking
rather than truncating if it overflows. The public header declares it with a
no-op stub when the feature is off.
4) Plumb it at architecture early setup.
The arch's early setup calls the helper after the existing builtin-cmdline
merge but before early-param parsing, so values from the embedded bootconfig
influence early-param handlers (console, log level, memory overrides) right
when architecture setup runs — not later in
Background: the v1 attempt at this feature moved bootconfig parsing
before setup_arch() with ~96KB of static __initdata buffers [1].
Masami suggested doing the conversion at build time instead [2], which
avoids the early-boot allocator dance and the start_kernel() reordering.
This series, plus the follow-up, aims to implement that approach.
[1] https://lore.kernel.org/all/20260415-bootconfig_earlier-v1-0-cf160175de5e@debian.org/
[2] https://lore.kernel.org/all/20260417104436.ece29fd5e2cb7a59c8cf8ac1@kernel.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Breno Leitao (2):
bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c
tools/bootconfig: render kernel.* subtree as cmdline string with -C
include/linux/bootconfig.h | 3 +++
init/main.c | 45 ----------------------------------
lib/bootconfig.c | 56 +++++++++++++++++++++++++++++++++++++++++++
tools/bootconfig/main.c | 60 +++++++++++++++++++++++++++++++++++++++-------
4 files changed, 111 insertions(+), 53 deletions(-)
---
base-commit: 17c7841d09ee7d33557fd075562d9289b6018c90
change-id: 20260508-bootconfig_using_tools-cfa7aa9d6a5a
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply
* [PATCH 1/2] bootconfig: move xbc_snprint_cmdline() to lib/bootconfig.c
From: Breno Leitao @ 2026-05-08 13:55 UTC (permalink / raw)
To: Masami Hiramatsu, Andrew Morton
Cc: linux-kernel, linux-trace-kernel, paulmck, oss, Breno Leitao,
kernel-team
In-Reply-To: <20260508-bootconfig_using_tools-v1-0-1132219aa773@debian.org>
Move xbc_snprint_cmdline() from init/main.c to lib/bootconfig.c so the
function (and its xbc_namebuf scratch buffer) becomes part of the shared
parser library. tools/bootconfig already compiles lib/bootconfig.c
directly, which lets a follow-up patch reuse the same renderer in the
userspace tool to convert a bootconfig file into a flat cmdline string
at build time.
No functional change.
Signed-off-by: Breno Leitao <leitao@debian.org>
---
include/linux/bootconfig.h | 3 +++
init/main.c | 45 -------------------------------------
lib/bootconfig.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+), 45 deletions(-)
diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h
index 692a5acc2ffc4..1c7f3b74ffcf3 100644
--- a/include/linux/bootconfig.h
+++ b/include/linux/bootconfig.h
@@ -265,6 +265,9 @@ static inline struct xbc_node * __init xbc_node_get_subkey(struct xbc_node *node
int __init xbc_node_compose_key_after(struct xbc_node *root,
struct xbc_node *node, char *buf, size_t size);
+/* Render key/value pairs under @root as a flat cmdline string */
+int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root);
+
/**
* xbc_node_compose_key() - Compose full key string of the XBC node
* @node: An XBC node.
diff --git a/init/main.c b/init/main.c
index 96f93bb06c490..e363232b428b4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -324,51 +324,6 @@ static void * __init get_boot_config_from_initrd(size_t *_size)
#ifdef CONFIG_BOOT_CONFIG
-static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
-
-#define rest(dst, end) ((end) > (dst) ? (end) - (dst) : 0)
-
-static int __init xbc_snprint_cmdline(char *buf, size_t size,
- struct xbc_node *root)
-{
- struct xbc_node *knode, *vnode;
- char *end = buf + size;
- const char *val, *q;
- int ret;
-
- xbc_node_for_each_key_value(root, knode, val) {
- ret = xbc_node_compose_key_after(root, knode,
- xbc_namebuf, XBC_KEYLEN_MAX);
- if (ret < 0)
- return ret;
-
- vnode = xbc_node_get_child(knode);
- if (!vnode) {
- ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
- if (ret < 0)
- return ret;
- buf += ret;
- continue;
- }
- xbc_array_for_each_value(vnode, val) {
- /*
- * For prettier and more readable /proc/cmdline, only
- * quote the value when necessary, i.e. when it contains
- * whitespace.
- */
- q = strpbrk(val, " \t\r\n") ? "\"" : "";
- ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
- xbc_namebuf, q, val, q);
- if (ret < 0)
- return ret;
- buf += ret;
- }
- }
-
- return buf - (end - size);
-}
-#undef rest
-
/* Make an extra command line under given key word */
static char * __init xbc_make_cmdline(const char *key)
{
diff --git a/lib/bootconfig.c b/lib/bootconfig.c
index c470b93d5dbc2..f445b7703fdd9 100644
--- a/lib/bootconfig.c
+++ b/lib/bootconfig.c
@@ -408,6 +408,62 @@ const char * __init xbc_node_find_next_key_value(struct xbc_node *root,
return ""; /* No value key */
}
+static char xbc_namebuf[XBC_KEYLEN_MAX] __initdata;
+
+#define rest(dst, end) ((end) > (dst) ? (end) - (dst) : 0)
+
+/**
+ * xbc_snprint_cmdline() - Render bootconfig keys under @root as a cmdline string
+ * @buf: Destination buffer (may be NULL when @size is 0 to query the length)
+ * @size: Size of @buf in bytes
+ * @root: Subtree root whose key=value pairs should be rendered
+ *
+ * Walk all key/value pairs under @root and emit them as a space-separated
+ * cmdline string into @buf. Values containing whitespace are quoted with
+ * double quotes. Returns the number of bytes that would be written if @buf
+ * were large enough (matching snprintf semantics), or a negative errno on
+ * failure.
+ */
+int __init xbc_snprint_cmdline(char *buf, size_t size, struct xbc_node *root)
+{
+ struct xbc_node *knode, *vnode;
+ char *end = buf + size;
+ const char *val, *q;
+ int ret;
+
+ xbc_node_for_each_key_value(root, knode, val) {
+ ret = xbc_node_compose_key_after(root, knode,
+ xbc_namebuf, XBC_KEYLEN_MAX);
+ if (ret < 0)
+ return ret;
+
+ vnode = xbc_node_get_child(knode);
+ if (!vnode) {
+ ret = snprintf(buf, rest(buf, end), "%s ", xbc_namebuf);
+ if (ret < 0)
+ return ret;
+ buf += ret;
+ continue;
+ }
+ xbc_array_for_each_value(vnode, val) {
+ /*
+ * For prettier and more readable /proc/cmdline, only
+ * quote the value when necessary, i.e. when it contains
+ * whitespace.
+ */
+ q = strpbrk(val, " \t\r\n") ? "\"" : "";
+ ret = snprintf(buf, rest(buf, end), "%s=%s%s%s ",
+ xbc_namebuf, q, val, q);
+ if (ret < 0)
+ return ret;
+ buf += ret;
+ }
+ }
+
+ return buf - (end - size);
+}
+#undef rest
+
/* XBC parse and tree build */
static int __init xbc_init_node(struct xbc_node *node, char *data, uint16_t flag)
--
2.53.0-Meta
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox