* Re: [PATCH v8 24/46] KVM: guest_memfd: Make in-place conversion the default
From: Yan Zhao @ 2026-06-25 10:57 UTC
To: Sean Christopherson, Ackerley Tng, aik, andrew.jones, binbin.wu,
brauner, chao.p.peng, david, jmattson, jthoughton, michael.roth,
oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
shivankg, steven.price, tabba, willy, wyihan, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen, Yuanchu Xie,
Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt, Kiryl Shutsemau,
Baoquan He, Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <ajyJhZcgfYFtGfS2@yzhao56-desk.sh.intel.com>
On Thu, Jun 25, 2026 at 09:51:01AM +0800, Yan Zhao wrote:
> On Wed, Jun 24, 2026 at 05:41:58PM -0700, Sean Christopherson wrote:
> > On Wed, Jun 24, 2026, Ackerley Tng wrote:
> > > Yan Zhao <yan.y.zhao@intel.com> writes:
> > > > With gmem_in_place_conversion=true, userspace can create guest_memfd without the
> > > > MMAP flag. In such cases, shared memory is allocated from different backends.
> > > > This means this module parameter only enables per-gmem memory attributes and does
> > > > not guarantee that gmem in-place conversion will actually occur.
> >
> > KVM module params are pretty much always about what KVM supports, not what is
> > guaranteed to happen.
> >
> > - enable_mmio_caching doesn't guarantee there will actually be MMIO SPTEs,
> > because maybe the guest never accesses emulated MMIO.
> > - enable_pmu doesn't guarantee VMs will get a PMU, because userspace may elect
> > not to advertise one.
> > - and so on and so forth...
> >
> > Yes, there's a small mental jump to get from "KVM supports in-place conversion"
> > to "I need to set memory attributes on the guest_memfd instance, not the VM",
> > but I don't see that as a big hurdle, certainly not in the long term. And once
> > the VMM code is written, I really do think most people are going to care about
> > whether or not KVM supports in-place conversion, not where PRIVATE is tracked.
> Sorry, I just saw this mail after posting my reply in [1].
>
> I'm OK with gmem_in_place_conversion=true meaning only that KVM supports in-place
> conversion, while we can still create VMs with shared memory not from gmem.
Or what about "allow_gmem_in_place_conversion"?
> Though it still feels a bit odd to require TDX huge pages to depend on
> gmem_in_place_conversion=true when shared memory is not currently allocated from
> gmem, it should become more natural over time once gmem supports in-place
> conversions for huge pages.
>
> [1] https://lore.kernel.org/all/ajyCn0PnFtQK+Nka@yzhao56-desk.sh.intel.com
>
>
> > > > To avoid confusion, could we rename this module parameter to something more
> > > > accurate, such as gmem_memory_attribute?
> > >
> > > I asked Sean about this after getting some fixes off list. Sean said
> > > gmem_in_place_conversion is named for a host admin to use, and something
> > > like gmem_memory_attributes is too much implementation details for the
> > > admin.
> > >
> > > Sean, would you reconsider since Yan also asked? If the admin compiled
> > > the kernel knowing what CONFIG_KVM_VM_MEMORY_ATTRIBUTES means, then the
> > > admin would also be able to use a param like gmem_memory_attributes?
> >
> > No, because it's not all memory attributes, it's very specifically the PRIVATE
> > attribute that will get moved to guest_memfd. I don't want to pick a name that
> > will become stale and confusing when RWX attributes come along. The RWX bits
> > will be per-VM, while PRIVATE will be per-guest_memfd.
* [PATCH v4 1/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Steven Rostedt @ 2026-06-25 10:40 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
In-Reply-To: <20260625104007.041432666@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
Remove the prototypes of the code that is not associated with
trace_printk() from trace_printk.h.
These control functions as well as ftrace_dump() and trace_dump_stack()
are used in cases where things go wrong. The main use case is to do a
trace_dump_stack(); tracing_off(); ftrace_dump(); in a place that detected
that something went wrong, whereas trace_printk() is added to normal code
during debugging and removed before committing upstream. The dump code is
fine to keep in production.
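
As a rough illustration of that use case (a sketch only, not code from this
patch): the stubs below copy the !CONFIG_TRACING branch of the new
trace_controls.h so the snippet builds outside the kernel, where in-tree code
would simply include <linux/trace_controls.h>. check_queue_depth(),
QUEUE_DEPTH_LIMIT and the selftest helper are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins copied from the !CONFIG_TRACING branch of trace_controls.h. */
enum ftrace_dump_mode {
	DUMP_NONE,
	DUMP_ALL,
	DUMP_ORIG,
	DUMP_PARAM,
};

static inline void tracing_off(void) { }
static inline void trace_dump_stack(int skip) { (void)skip; }
static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
{
	(void)oops_dump_mode;
}

#define QUEUE_DEPTH_LIMIT 64	/* hypothetical sanity limit */

/*
 * Hypothetical sanity check. Returns true when the state looks sane.
 * On failure it follows the sequence named in the commit message. With
 * the stubs above the three calls are no-ops; with CONFIG_TRACING they
 * record a stack trace in the trace buffer, stop recording, and dump
 * the buffer.
 */
static bool check_queue_depth(int depth)
{
	if (depth <= QUEUE_DEPTH_LIMIT)
		return true;

	trace_dump_stack(0);
	tracing_off();
	ftrace_dump(DUMP_ALL);
	return false;
}

/* Exercise both the passing and the failing path. */
static void trace_on_failure_selftest(void)
{
	assert(check_queue_depth(10));
	assert(!check_queue_depth(QUEUE_DEPTH_LIMIT + 1));
}
```

Unlike a trace_printk() call, a check of this shape could stay in production
code, which is the reason for giving these prototypes their own header.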
Suggested-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
Changes since v3: https://patch.msgid.link/20260624081948.147764194@kernel.org
- Move include out of #if statement in rcu.h
kernel test robot found other configs that could require the
control functions in rcu.h. Just always include it in that file.
arch/powerpc/xmon/xmon.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/machine_kexec.c | 1 +
drivers/gpu/drm/i915/i915_gem.h | 1 +
drivers/tty/sysrq.c | 1 +
include/linux/trace_controls.h | 54 ++++++++++++++++++++++++++++++++
include/linux/trace_printk.h | 51 ------------------------------
kernel/debug/debug_core.c | 1 +
kernel/panic.c | 1 +
kernel/rcu/rcu.h | 1 +
kernel/rcu/rcutorture.c | 1 +
kernel/trace/trace.h | 1 +
kernel/trace/trace_benchmark.c | 1 +
lib/sys_info.c | 1 +
14 files changed, 66 insertions(+), 51 deletions(-)
create mode 100644 include/linux/trace_controls.h
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index cb3a3244ae6f..2135f319e0dd 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -27,6 +27,7 @@
#include <linux/highmem.h>
#include <linux/security.h>
#include <linux/debugfs.h>
+#include <linux/trace_controls.h>
#include <asm/ptrace.h>
#include <asm/smp.h>
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 3c346b02ceb9..baac66cc4de4 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -22,6 +22,7 @@
#include <linux/debug_locks.h>
#include <linux/vmalloc.h>
#include <linux/secure_boot.h>
+#include <linux/trace_controls.h>
#include <asm/asm-extable.h>
#include <asm/machine.h>
#include <asm/diag.h>
diff --git a/arch/s390/kernel/machine_kexec.c b/arch/s390/kernel/machine_kexec.c
index baeb3dcfc1c8..33f9a89eb3ad 100644
--- a/arch/s390/kernel/machine_kexec.c
+++ b/arch/s390/kernel/machine_kexec.c
@@ -12,6 +12,7 @@
#include <linux/delay.h>
#include <linux/reboot.h>
#include <linux/ftrace.h>
+#include <linux/trace_controls.h>
#include <linux/debug_locks.h>
#include <linux/cpufeature.h>
#include <asm/guarded_storage.h>
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 20b3cb29cfff..1da8fb61c09e 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -116,6 +116,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
#endif
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
+#include <linux/trace_controls.h>
#define GEM_TRACE(...) trace_printk(__VA_ARGS__)
#define GEM_TRACE_ERR(...) do { \
pr_err(__VA_ARGS__); \
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index c2e4b31b699a..d3f72dc430b8 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -324,6 +324,7 @@ static const struct sysrq_key_op sysrq_showstate_blocked_op = {
};
#ifdef CONFIG_TRACING
+#include <linux/trace_controls.h>
#include <linux/ftrace.h>
static void sysrq_ftrace_dump(u8 key)
diff --git a/include/linux/trace_controls.h b/include/linux/trace_controls.h
new file mode 100644
index 000000000000..995b97e963b4
--- /dev/null
+++ b/include/linux/trace_controls.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TRACE_CONTROLS_H
+#define _LINUX_TRACE_CONTROLS_H
+
+
+/*
+ * General tracing related utility functions - trace_printk(),
+ * tracing_on/tracing_off and tracing_start()/tracing_stop
+ *
+ * Use tracing_on/tracing_off when you want to quickly turn on or off
+ * tracing. It simply enables or disables the recording of the trace events.
+ * This also corresponds to the user space /sys/kernel/tracing/tracing_on
+ * file, which gives a means for the kernel and userspace to interact.
+ * Place a tracing_off() in the kernel where you want tracing to end.
+ * From user space, examine the trace, and then echo 1 > tracing_on
+ * to continue tracing.
+ *
+ * tracing_stop/tracing_start has slightly more overhead. It is used
+ * by things like suspend to ram where disabling the recording of the
+ * trace is not enough, but tracing must actually stop because things
+ * like calling smp_processor_id() may crash the system.
+ *
+ * Most likely, you want to use tracing_on/tracing_off.
+ */
+enum ftrace_dump_mode {
+ DUMP_NONE,
+ DUMP_ALL,
+ DUMP_ORIG,
+ DUMP_PARAM,
+};
+
+#ifdef CONFIG_TRACING
+void tracing_on(void);
+void tracing_off(void);
+int tracing_is_on(void);
+void tracing_snapshot(void);
+void tracing_snapshot_alloc(void);
+void tracing_start(void);
+void tracing_stop(void);
+void trace_dump_stack(int skip);
+void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
+#else
+static inline void tracing_start(void) { }
+static inline void tracing_stop(void) { }
+static inline void tracing_on(void) { }
+static inline void tracing_off(void) { }
+static inline int tracing_is_on(void) { return 0; }
+static inline void tracing_snapshot(void) { }
+static inline void tracing_snapshot_alloc(void) { }
+static inline void trace_dump_stack(int skip) { }
+static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
+#endif
+
+#endif /* _LINUX_TRACE_CONTROLS_H */
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
index 3d54f440dccf..a488ea9e9f85 100644
--- a/include/linux/trace_printk.h
+++ b/include/linux/trace_printk.h
@@ -7,43 +7,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
-/*
- * General tracing related utility functions - trace_printk(),
- * tracing_on/tracing_off and tracing_start()/tracing_stop
- *
- * Use tracing_on/tracing_off when you want to quickly turn on or off
- * tracing. It simply enables or disables the recording of the trace events.
- * This also corresponds to the user space /sys/kernel/tracing/tracing_on
- * file, which gives a means for the kernel and userspace to interact.
- * Place a tracing_off() in the kernel where you want tracing to end.
- * From user space, examine the trace, and then echo 1 > tracing_on
- * to continue tracing.
- *
- * tracing_stop/tracing_start has slightly more overhead. It is used
- * by things like suspend to ram where disabling the recording of the
- * trace is not enough, but tracing must actually stop because things
- * like calling smp_processor_id() may crash the system.
- *
- * Most likely, you want to use tracing_on/tracing_off.
- */
-
-enum ftrace_dump_mode {
- DUMP_NONE,
- DUMP_ALL,
- DUMP_ORIG,
- DUMP_PARAM,
-};
-
#ifdef CONFIG_TRACING
-void tracing_on(void);
-void tracing_off(void);
-int tracing_is_on(void);
-void tracing_snapshot(void);
-void tracing_snapshot_alloc(void);
-
-extern void tracing_start(void);
-extern void tracing_stop(void);
-
static inline __printf(1, 2)
void ____trace_printk_check_format(const char *fmt, ...)
{
@@ -149,8 +113,6 @@ int __trace_printk(unsigned long ip, const char *fmt, ...);
extern int __trace_bputs(unsigned long ip, const char *str);
extern int __trace_puts(unsigned long ip, const char *str);
-extern void trace_dump_stack(int skip);
-
/*
* The double __builtin_constant_p is because gcc will give us an error
* if we try to allocate the static variable to fmt if it is not a
@@ -173,19 +135,7 @@ __ftrace_vbprintk(unsigned long ip, const char *fmt, va_list ap);
extern __printf(2, 0) int
__ftrace_vprintk(unsigned long ip, const char *fmt, va_list ap);
-
-extern void ftrace_dump(enum ftrace_dump_mode oops_dump_mode);
#else
-static inline void tracing_start(void) { }
-static inline void tracing_stop(void) { }
-static inline void trace_dump_stack(int skip) { }
-
-static inline void tracing_on(void) { }
-static inline void tracing_off(void) { }
-static inline int tracing_is_on(void) { return 0; }
-static inline void tracing_snapshot(void) { }
-static inline void tracing_snapshot_alloc(void) { }
-
static inline __printf(1, 2)
int trace_printk(const char *fmt, ...)
{
@@ -196,7 +146,6 @@ ftrace_vprintk(const char *fmt, va_list ap)
{
return 0;
}
-static inline void ftrace_dump(enum ftrace_dump_mode oops_dump_mode) { }
#endif /* CONFIG_TRACING */
#endif
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index b276504c1c6b..f9c83a470c98 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -27,6 +27,7 @@
#define pr_fmt(fmt) "KGDB: " fmt
+#include <linux/trace_controls.h>
#include <linux/pid_namespace.h>
#include <linux/clocksource.h>
#include <linux/serial_core.h>
diff --git a/kernel/panic.c b/kernel/panic.c
index 213725b612aa..1415e910371d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -9,6 +9,7 @@
* This function is used through-out the kernel (including mm and fs)
* to indicate a major problem.
*/
+#include <linux/trace_controls.h>
#include <linux/debug_locks.h>
#include <linux/sched/debug.h>
#include <linux/interrupt.h>
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index fa6d30ce73d1..735a80df0b30 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <trace/events/rcu.h>
+#include <linux/trace_controls.h>
/*
* Grace-period counter management.
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 882a158ada7b..76bf0184b267 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -39,6 +39,7 @@
#include <linux/srcu.h>
#include <linux/slab.h>
#include <linux/trace_clock.h>
+#include <linux/trace_controls.h>
#include <asm/byteorder.h>
#include <linux/torture.h>
#include <linux/vmalloc.h>
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 80fe152af1dd..2537c33ddd49 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/once_lite.h>
#include <linux/ftrace_regs.h>
+#include <linux/trace_controls.h>
#include <linux/llist.h>
#include "pid_list.h"
diff --git a/kernel/trace/trace_benchmark.c b/kernel/trace/trace_benchmark.c
index e19c32f2a938..69cc39008c36 100644
--- a/kernel/trace/trace_benchmark.c
+++ b/kernel/trace/trace_benchmark.c
@@ -3,6 +3,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/trace_clock.h>
+#include <linux/trace_controls.h>
#define CREATE_TRACE_POINTS
#include "trace_benchmark.h"
diff --git a/lib/sys_info.c b/lib/sys_info.c
index f32a06ec9ed4..e3c9ca05601b 100644
--- a/lib/sys_info.c
+++ b/lib/sys_info.c
@@ -8,6 +8,7 @@
#include <linux/ftrace.h>
#include <linux/nmi.h>
#include <linux/sched/debug.h>
+#include <linux/trace_controls.h>
#include <linux/string.h>
#include <linux/sysctl.h>
--
2.53.0
* [PATCH v4 2/2] tracing: Remove trace_printk.h from kernel.h
From: Steven Rostedt @ 2026-06-25 10:40 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
In-Reply-To: <20260625104007.041432666@kernel.org>
From: Steven Rostedt <rostedt@goodmis.org>
There have been complaints that having trace_printk.h in kernel.h increases
build time whenever it changes. There is also an effort to clean up kernel.h
so that it does not include unneeded header files. Move trace_printk.h out of
kernel.h and include it in the headers and C files that use it.
Link: https://lore.kernel.org/all/CAHk-=wikCBeVFjVXiY4o-oepdbjAoir5+TcAgtL12c4u1TpZLQ@mail.gmail.com/
Suggested-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
arch/powerpc/kvm/book3s_xics.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
drivers/gpu/drm/i915/i915_gem.h | 1 +
drivers/hwtracing/stm/dummy_stm.c | 1 +
drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
drivers/usb/early/xhci-dbc.c | 1 +
fs/ext4/inline.c | 1 +
include/linux/ftrace.h | 2 ++
include/linux/kernel.h | 1 -
include/linux/sunrpc/debug.h | 1 +
include/linux/trace_printk.h | 5 +++--
kernel/trace/ring_buffer_benchmark.c | 1 +
samples/fprobe/fprobe_example.c | 1 +
samples/ftrace/ftrace-direct-too.c | 1 -
samples/trace_printk/trace-printk.c | 1 +
15 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 74a44fa702b0..ef5eb596a56e 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -26,6 +26,7 @@
#if 1
#define XICS_DBG(fmt...) do { } while (0)
#else
+#include <linux/trace_printk.h>
#define XICS_DBG(fmt...) trace_printk(fmt)
#endif
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index b54ee4f25af1..f6f223090760 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -35,6 +35,7 @@
#define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GTT)
+#include <linux/trace_printk.h>
#define GTT_TRACE(...) trace_printk(__VA_ARGS__)
#else
#define GTT_TRACE(...)
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 1da8fb61c09e..f490052e8964 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -117,6 +117,7 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file);
#if IS_ENABLED(CONFIG_DRM_I915_TRACE_GEM)
#include <linux/trace_controls.h>
+#include <linux/trace_printk.h>
#define GEM_TRACE(...) trace_printk(__VA_ARGS__)
#define GEM_TRACE_ERR(...) do { \
pr_err(__VA_ARGS__); \
diff --git a/drivers/hwtracing/stm/dummy_stm.c b/drivers/hwtracing/stm/dummy_stm.c
index 38528ffdc0b3..7c5e48ebfb9f 100644
--- a/drivers/hwtracing/stm/dummy_stm.c
+++ b/drivers/hwtracing/stm/dummy_stm.c
@@ -8,6 +8,7 @@
*/
#undef DEBUG
+#include <linux/trace_printk.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
diff --git a/drivers/infiniband/hw/hfi1/trace_dbg.h b/drivers/infiniband/hw/hfi1/trace_dbg.h
index 58304b91380f..30df5e246586 100644
--- a/drivers/infiniband/hw/hfi1/trace_dbg.h
+++ b/drivers/infiniband/hw/hfi1/trace_dbg.h
@@ -103,6 +103,7 @@ __hfi1_trace_def(IOCTL);
*/
#ifdef HFI1_EARLY_DBG
+#include <linux/trace_printk.h>
#define hfi1_dbg_early(fmt, ...) \
trace_printk(fmt, ##__VA_ARGS__)
#else
diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index 41118bba9197..955c73bd601f 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -30,6 +30,7 @@ static struct xdbc_state xdbc;
static bool early_console_keep;
#ifdef XDBC_TRACE
+#include <linux/trace_printk.h>
#define xdbc_trace trace_printk
#else
static inline void xdbc_trace(const char *fmt, ...) { }
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 8045e4ff270c..0eff4a0c6a6c 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -934,6 +934,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
}
#ifdef INLINE_DIR_DEBUG
+#include <linux/trace_printk.h>
void ext4_show_inline_dir(struct inode *dir, struct buffer_head *bh,
void *inline_start, int inline_size)
{
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 02bc5027523a..b5336a81e619 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -8,6 +8,8 @@
#define _LINUX_FTRACE_H
#include <linux/trace_recursion.h>
+#include <linux/trace_controls.h>
+#include <linux/trace_printk.h>
#include <linux/trace_clock.h>
#include <linux/jump_label.h>
#include <linux/kallsyms.h>
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e5570a16cbb1..e87a40fbd152 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -31,7 +31,6 @@
#include <linux/build_bug.h>
#include <linux/sprintf.h>
#include <linux/static_call_types.h>
-#include <linux/trace_printk.h>
#include <linux/util_macros.h>
#include <linux/wordpart.h>
diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index ab61bed2f7af..7524f5d82fba 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -29,6 +29,7 @@ extern unsigned int nlm_debug;
# define ifdebug(fac) if (unlikely(rpc_debug & RPCDBG_##fac))
# if IS_ENABLED(CONFIG_SUNRPC_DEBUG_TRACE)
+# include <linux/trace_printk.h>
# define __sunrpc_printk(fmt, ...) trace_printk(fmt, ##__VA_ARGS__)
# else
# define __sunrpc_printk(fmt, ...) printk(KERN_DEFAULT fmt, ##__VA_ARGS__)
diff --git a/include/linux/trace_printk.h b/include/linux/trace_printk.h
index a488ea9e9f85..74ce4f8995c4 100644
--- a/include/linux/trace_printk.h
+++ b/include/linux/trace_printk.h
@@ -1,11 +1,12 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_TRACE_PRINTK_H
#define _LINUX_TRACE_PRINTK_H
+#if !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO)
-#include <linux/compiler_attributes.h>
#include <linux/instruction_pointer.h>
#include <linux/stddef.h>
#include <linux/stringify.h>
+#include <linux/stdarg.h>
#ifdef CONFIG_TRACING
static inline __printf(1, 2)
@@ -147,5 +148,5 @@ ftrace_vprintk(const char *fmt, va_list ap)
return 0;
}
#endif /* CONFIG_TRACING */
-
+#endif /* !defined(__ASSEMBLY__) && !defined(__GENKSYMS__) && !defined(BUILD_VDSO) */
#endif
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 593e3b59e42e..2bb25caebb75 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -5,6 +5,7 @@
* Copyright (C) 2009 Steven Rostedt <srostedt@redhat.com>
*/
#include <linux/ring_buffer.h>
+#include <linux/trace_printk.h>
#include <linux/completion.h>
#include <linux/kthread.h>
#include <uapi/linux/sched/types.h>
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index bfe98ce826f3..de81b9b4ca7d 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -12,6 +12,7 @@
#define pr_fmt(fmt) "%s: " fmt, __func__
+#include <linux/trace_printk.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fprobe.h>
diff --git a/samples/ftrace/ftrace-direct-too.c b/samples/ftrace/ftrace-direct-too.c
index bf2411aa6fd7..159190f4103f 100644
--- a/samples/ftrace/ftrace-direct-too.c
+++ b/samples/ftrace/ftrace-direct-too.c
@@ -1,6 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/module.h>
-
#include <linux/mm.h> /* for handle_mm_fault() */
#include <linux/ftrace.h>
#if !defined(CONFIG_ARM64) && !defined(CONFIG_PPC32)
diff --git a/samples/trace_printk/trace-printk.c b/samples/trace_printk/trace-printk.c
index cfc159580263..ff37aeb8523e 100644
--- a/samples/trace_printk/trace-printk.c
+++ b/samples/trace_printk/trace-printk.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/trace_printk.h>
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/irq_work.h>
--
2.53.0
* [PATCH v4 0/2] tracing: Move non-trace_printk prototypes into trace_controls.h
From: Steven Rostedt @ 2026-06-25 10:40 UTC
To: linux-kernel, linux-trace-kernel
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
Linus Torvalds, Sebastian Andrzej Siewior, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov,
linux-doc, linux-kbuild, linuxppc-dev, dri-devel, linux-stm32,
linux-arm-kernel, linux-rdma, linux-usb, linux-ext4, linux-nfs,
kvm, intel-gfx
Remove trace_printk.h from kernel.h by creating a trace_controls.h for those
places that need access to tracing prototypes like tracing_off(), and by
including trace_printk.h directly in the places that use trace_printk().
Changes since v3: https://lore.kernel.org/all/20260624081806.120105649@kernel.org/
- Always include trace_controls.h in rcu.h (kernel test robot)
There are other configs that may include tracing_off() in rcu.h besides
the one that had the include of trace_controls.h. Just always include
it in that header to be safe.
Steven Rostedt (2):
tracing: Move non-trace_printk prototypes into trace_controls.h
tracing: Remove trace_printk.h from kernel.h
----
arch/powerpc/kvm/book3s_xics.c | 1 +
arch/powerpc/xmon/xmon.c | 1 +
arch/s390/kernel/ipl.c | 1 +
arch/s390/kernel/machine_kexec.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 1 +
drivers/gpu/drm/i915/i915_gem.h | 2 ++
drivers/hwtracing/stm/dummy_stm.c | 1 +
drivers/infiniband/hw/hfi1/trace_dbg.h | 1 +
drivers/tty/sysrq.c | 1 +
drivers/usb/early/xhci-dbc.c | 1 +
fs/ext4/inline.c | 1 +
include/linux/ftrace.h | 2 ++
include/linux/kernel.h | 1 -
include/linux/sunrpc/debug.h | 1 +
include/linux/trace_controls.h | 54 ++++++++++++++++++++++++++++++++
include/linux/trace_printk.h | 56 ++--------------------------------
kernel/debug/debug_core.c | 1 +
kernel/panic.c | 1 +
kernel/rcu/rcu.h | 1 +
kernel/rcu/rcutorture.c | 1 +
kernel/trace/ring_buffer_benchmark.c | 1 +
kernel/trace/trace.h | 1 +
kernel/trace/trace_benchmark.c | 1 +
lib/sys_info.c | 1 +
samples/fprobe/fprobe_example.c | 1 +
samples/ftrace/ftrace-direct-too.c | 1 -
samples/trace_printk/trace-printk.c | 1 +
27 files changed, 82 insertions(+), 55 deletions(-)
create mode 100644 include/linux/trace_controls.h
* Re: [PATCH 5.10.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-25 10:41 UTC
To: stable
Cc: Sasha Levin, Bjoern Doebel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel, linux-kernel, Mathieu Desnoyers,
David Howells
In-Reply-To: <20260624122413.2477871-1-doebel@amazon.de>
> [PATCH 5.10.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
Same as the 5.15 one - I had to drop this for 5.10. The
guard(raw_spinlock_irqsave) conversion triggers a new
-Wdeclaration-after-statement warning in ring_buffer_read_start() on this tree.
Please respin a warning-free version for 5.10/5.15. 6.6 and 6.1 are queued.
--
Thanks,
Sasha
* Re: [PATCH 5.15.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-25 10:41 UTC
To: stable
Cc: Sasha Levin, Bjoern Doebel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel, linux-kernel, Mathieu Desnoyers,
David Howells
In-Reply-To: <20260624122351.2477592-1-doebel@amazon.de>
> [PATCH 5.15.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
I had to drop this one for 5.15. The upstream guard(raw_spinlock_irqsave)
conversion in ring_buffer_read_start() introduces a new
-Wdeclaration-after-statement warning on 5.15 (the guard variable ends up after
a statement), which the build flags as an error there.
Could you respin a warning-free version for 5.15 (and 5.10, which has the same
problem)? E.g. hoisting the declaration or keeping the explicit
raw_spin_lock/unlock instead of guard() on these older trees. 6.6 and 6.1 are
already queued.
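
A rough userspace sketch of the two options, in case it helps. It assumes
nothing about the real ring_buffer_read_start() body: struct fake_lock,
fake_guard() and the read_start_*() functions are hypothetical stand-ins, and
the cleanup attribute only imitates, in simplified form, what guard() does.

```c
#include <assert.h>

/* Hypothetical stand-in for a raw spinlock: only counts lock depth. */
struct fake_lock {
	int held;
};

static void fake_lock_acquire(struct fake_lock *lock) { lock->held++; }
static void fake_lock_release(struct fake_lock *lock) { lock->held--; }

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void fake_guard_exit(struct fake_lock **lockp)
{
	fake_lock_release(*lockp);
}

/* Simplified imitation of guard(): a declaration with a cleanup handler. */
#define fake_guard(lock)						\
	struct fake_lock *guard_var					\
		__attribute__((cleanup(fake_guard_exit), unused)) =	\
		(fake_lock_acquire(lock), (lock))

/*
 * Option 1: hoist the declaration. Opening a new block makes the guard
 * the first thing in its scope, so no declaration follows a statement.
 */
static int read_start_hoisted(struct fake_lock *lock, int *iter)
{
	if (!iter)
		return -1;
	{
		fake_guard(lock);
		*iter = 0;	/* reset while the lock is held */
	}
	return 0;
}

/* Option 2: explicit lock/unlock calls, no scoped variable at all. */
static int read_start_explicit(struct fake_lock *lock, int *iter)
{
	if (!iter)
		return -1;
	fake_lock_acquire(lock);
	*iter = 0;
	fake_lock_release(lock);
	return 0;
}

/* Both variants reset the iterator and leave the lock released. */
static void read_start_selftest(void)
{
	struct fake_lock lock = { 0 };
	int iter = 7;

	assert(read_start_hoisted(&lock, &iter) == 0);
	assert(iter == 0 && lock.held == 0);

	iter = 7;
	assert(read_start_explicit(&lock, &iter) == 0);
	assert(iter == 0 && lock.held == 0);
}
```

Either shape should build cleanly with -Wdeclaration-after-statement; the
explicit variant is probably the smaller delta against the older trees.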
--
Thanks,
Sasha
* Re: [PATCH 6.1.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-25 10:41 UTC
To: stable
Cc: Sasha Levin, Bjoern Doebel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel, linux-kernel, Mathieu Desnoyers,
David Howells
In-Reply-To: <20260624122328.2477272-1-doebel@amazon.de>
> [PATCH 6.1.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
Queued for 6.1, thanks! (6.6 is queued too; 5.15/5.10 need a respin -
see those threads.)
--
Thanks,
Sasha
* Re: [PATCH 6.6.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
From: Sasha Levin @ 2026-06-25 10:41 UTC
To: stable
Cc: Sasha Levin, Bjoern Doebel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel, linux-kernel, Mathieu Desnoyers,
David Howells
In-Reply-To: <20260624122258.2476991-1-doebel@amazon.de>
> [PATCH 6.6.y] ring-buffer: Remove ring_buffer_read_prepare_sync()
Queued for 6.6, thanks! 6.1 is queued too; the 5.15 and 5.10 versions
need a respin - see my replies on those threads.
--
Thanks,
Sasha
* Re: [PATCH v8 46/46] KVM: selftests: Update private memory exits test to work with per-gmem attributes
From: Fuad Tabba @ 2026-06-25 9:56 UTC
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-46-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Skip setting memory to private in the private memory exits test when using
> per-gmem memory attributes, as memory is initialized to private by default
> for guest_memfd, and using vm_mem_set_private() on a guest_memfd instance
> requires creating guest_memfd with GUEST_MEMFD_FLAG_MMAP (which is totally
> doable, but would need to be conditional and is ultimately unnecessary).
>
> Expect an emulated MMIO instead of a memory fault exit when attributes are
> per-gmem, as deleting the memslot effectively drops the private status,
> i.e. the GPA becomes shared and thus supports emulated MMIO.
>
> Skip the "memslot not private" test entirely, as private vs. shared state
> for x86 software-protected VMs comes from the memory attributes themselves,
> and so when doing in-place conversions there can never be a disconnect
> between the expected and actual states.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../selftests/kvm/x86/private_mem_kvm_exits_test.c | 36 ++++++++++++++++++----
> 1 file changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c b/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
> index 10db9fe6d9063..70ed16066c63e 100644
> --- a/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
> +++ b/tools/testing/selftests/kvm/x86/private_mem_kvm_exits_test.c
> @@ -62,8 +62,9 @@ static void test_private_access_memslot_deleted(void)
>
> virt_map(vm, EXITS_TEST_GVA, EXITS_TEST_GPA, EXITS_TEST_NPAGES);
>
> - /* Request to access page privately */
> - vm_mem_set_private(vm, EXITS_TEST_GPA, EXITS_TEST_SIZE);
> + /* Request to access page privately. */
> + if (!kvm_has_gmem_attributes)
> + vm_mem_set_private(vm, EXITS_TEST_GPA, EXITS_TEST_SIZE);
>
> pthread_create(&vm_thread, NULL,
> (void *(*)(void *))run_vcpu_get_exit_reason,
> @@ -74,10 +75,26 @@ static void test_private_access_memslot_deleted(void)
> pthread_join(vm_thread, &thread_return);
> exit_reason = (u32)(u64)thread_return;
>
> - TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MEMORY_FAULT);
> - TEST_ASSERT_EQ(vcpu->run->memory_fault.flags, KVM_MEMORY_EXIT_FLAG_PRIVATE);
> - TEST_ASSERT_EQ(vcpu->run->memory_fault.gpa, EXITS_TEST_GPA);
> - TEST_ASSERT_EQ(vcpu->run->memory_fault.size, EXITS_TEST_SIZE);
> + /*
> + * If attributes are tracked per-gmem, deleting the memslot that points
> + * at the gmem instance effectively makes the memory shared, and so the
> + * read should trigger emulated MMIO.
> + *
> + * If attributes are tracked per-VM, deleting the memslot shouldn't
> + * affect the private attribute, and so KVM should generate a memory
> + * fault exit (emulated MMIO on private GPAs is disallowed).
> + */
> + if (kvm_has_gmem_attributes) {
> + TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MMIO);
> + TEST_ASSERT_EQ(vcpu->run->mmio.phys_addr, EXITS_TEST_GPA);
> + TEST_ASSERT_EQ(vcpu->run->mmio.len, sizeof(u64));
> + TEST_ASSERT_EQ(vcpu->run->mmio.is_write, false);
> + } else {
> + TEST_ASSERT_EQ(exit_reason, KVM_EXIT_MEMORY_FAULT);
> + TEST_ASSERT_EQ(vcpu->run->memory_fault.flags, KVM_MEMORY_EXIT_FLAG_PRIVATE);
> + TEST_ASSERT_EQ(vcpu->run->memory_fault.gpa, EXITS_TEST_GPA);
> + TEST_ASSERT_EQ(vcpu->run->memory_fault.size, EXITS_TEST_SIZE);
> + }
>
> kvm_vm_free(vm);
> }
> @@ -88,6 +105,13 @@ static void test_private_access_memslot_not_private(void)
> struct kvm_vcpu *vcpu;
> u32 exit_reason;
>
> + /*
> + * Accessing non-private memory as private with a software-protected VM
> + * isn't possible when doing in-place conversions.
> + */
> + if (kvm_has_gmem_attributes)
> + return;
> +
> vm = vm_create_shape_with_one_vcpu(protected_vm_shape, &vcpu,
> guest_repeatedly_read);
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
* Re: [PATCH v8 45/46] KVM: selftests: Update private_mem_conversions_test to mmap() guest_memfd
From: Fuad Tabba @ 2026-06-25 9:43 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-45-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Update the private memory conversions selftest to also test conversions
> that are done "in-place" via per-guest_memfd memory attributes. In-place
> conversions require the host to be able to mmap() the guest_memfd so that
> the host and guest can share the same backing physical memory.
>
> This includes several updates that are conditioned on the system
> supporting per-guest_memfd attributes (kvm_has_gmem_attributes):
>
> 1. Set up guest_memfd requesting MMAP and INIT_SHARED.
>
> 2. With in-place conversions, the host's mapping points directly to the
> guest's memory. When the guest converts a region to private, host access
> to that region is blocked. Update the test to expect a SIGBUS when
> attempting to access the host virtual address (HVA) of private memory.
>
> 3. Use vm_mem_set_memory_attributes(), which chooses how to set memory
> attributes based on whether kvm_has_gmem_attributes.
>
> Restrict the test to using VM_MEM_SRC_SHMEM because guest_memfd's required
> mmap() flags and page sizes happen to align with those of
> VM_MEM_SRC_SHMEM. As long as VM_MEM_SRC_SHMEM is used for src_type,
> vm_mem_add() works as intended.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../kvm/x86/private_mem_conversions_test.c | 44 ++++++++++++++++++----
> 1 file changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> index 289ad10063fca..4308c67952310 100644
> --- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> @@ -306,9 +306,12 @@ static void handle_exit_hypercall(struct kvm_vcpu *vcpu)
> if (do_fallocate)
> vm_guest_mem_fallocate(vm, gpa, size, map_shared);
>
> - if (set_attributes)
> - vm_set_memory_attributes(vm, gpa, size,
> - map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE);
> + if (set_attributes) {
> + u64 attrs = map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE;
> +
> + vm_mem_set_memory_attributes(vm, gpa, size, attrs);
> + }
> +
> run->hypercall.ret = 0;
> }
>
> @@ -352,8 +355,20 @@ static void *__test_mem_conversions(void *__vcpu)
> size_t nr_bytes = min_t(size_t, vm->page_size, size - i);
> u8 *hva = addr_gpa2hva(vm, gpa + i);
>
> - /* In all cases, the host should observe the shared data. */
> - memcmp_h(hva, gpa + i, uc.args[3], nr_bytes);
> + /*
> + * When using per-guest_memfd memory attributes,
> + * i.e. in-place conversion, host accesses will
> + * point at guest memory and should SIGBUS when
> + * guest memory is private. When using per-VM
> + * attributes, i.e. separate backing for shared
> + * vs. private, the host should always observe
> + * the shared data.
> + */
> + if (kvm_has_gmem_attributes &&
> + uc.args[0] == SYNC_PRIVATE)
> + TEST_EXPECT_SIGBUS(READ_ONCE(*hva));
> + else
> + memcmp_h(hva, gpa + i, uc.args[3], nr_bytes);
>
> /* For shared, write the new pattern to guest memory. */
> if (uc.args[0] == SYNC_SHARED)
> @@ -382,6 +397,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, u32 nr_v
> const size_t slot_size = memfd_size / nr_memslots;
> struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
> pthread_t threads[KVM_MAX_VCPUS];
> + u64 gmem_flags;
> struct kvm_vm *vm;
> int memfd, i;
>
> @@ -397,12 +413,17 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, u32 nr_v
>
> vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));
>
> - memfd = vm_create_guest_memfd(vm, memfd_size, 0);
> + if (kvm_has_gmem_attributes)
> + gmem_flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED;
> + else
> + gmem_flags = 0;
> +
> + memfd = vm_create_guest_memfd(vm, memfd_size, gmem_flags);
>
> for (i = 0; i < nr_memslots; i++)
> vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
> BASE_DATA_SLOT + i, slot_size / vm->page_size,
> - KVM_MEM_GUEST_MEMFD, memfd, slot_size * i, 0);
> + KVM_MEM_GUEST_MEMFD, memfd, slot_size * i, gmem_flags);
>
> for (i = 0; i < nr_vcpus; i++) {
> gpa_t gpa = BASE_DATA_GPA + i * per_cpu_size;
> @@ -452,17 +473,24 @@ static void usage(const char *cmd)
>
> int main(int argc, char *argv[])
> {
> - enum vm_mem_backing_src_type src_type = DEFAULT_VM_MEM_SRC;
> + enum vm_mem_backing_src_type src_type;
> u32 nr_memslots = 1;
> u32 nr_vcpus = 1;
> int opt;
>
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> + src_type = kvm_has_gmem_attributes ? VM_MEM_SRC_SHMEM :
> + DEFAULT_VM_MEM_SRC;
> +
> while ((opt = getopt(argc, argv, "hm:s:n:")) != -1) {
> switch (opt) {
> case 's':
> src_type = parse_backing_src_type(optarg);
> + TEST_ASSERT(!kvm_has_gmem_attributes ||
> + src_type == VM_MEM_SRC_SHMEM,
> + "Testing in-place conversions, only %s mem_type supported\n",
> + vm_mem_backing_src_alias(VM_MEM_SRC_SHMEM)->name);
> break;
> case 'n':
> nr_vcpus = atoi_positive("nr_vcpus", optarg);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
* Re: [PATCH v8 44/46] KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe
From: Fuad Tabba @ 2026-06-25 9:30 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-44-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> The TEST_EXPECT_SIGBUS macro is not thread-safe as it uses a global
> sigjmp_buf and installs a global SIGBUS signal handler. If multiple threads
> execute the macro concurrently, they will race on installing the signal
> handler and stomp on other threads' jump buffers, leading to incorrect test
> behavior.
>
> Make TEST_EXPECT_SIGBUS thread-safe with the following changes:
>
> Share the KVM tests' global signal handler. sigaction() applies to all
> threads; without sharing a global signal handler, one thread may have
> removed the signal handler that another thread added, hence leading to
> unexpected signals.
>
> The alternative of layering signal handlers was considered, but calling
> sigaction() within TEST_EXPECT_SIGBUS() necessarily creates a race. To
> avoid adding new setup and teardown routines to do sigaction() and keep
> usage of TEST_EXPECT_SIGBUS() simple, share the KVM tests' global signal
> handler.
>
> Opportunistically rename report_unexpected_signal to
> catchall_signal_handler.
>
> To continue to only expect SIGBUS within specific regions of code, use a
> thread-specific variable, expecting_sigbus, to replace installing and
> removing signal handlers.
>
> Make the execution environment for the thread, sigjmp_buf, a
> thread-specific variable.
>
> As part of TEST_EXPECT_SIGBUS(), assert the prerequisite for this setup,
> that the current signal handler is the catchall_signal_handler.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/include/test_util.h | 32 +++++++++++++------------
> tools/testing/selftests/kvm/lib/kvm_util.c | 18 ++++++++++----
> tools/testing/selftests/kvm/lib/test_util.c | 7 ------
> 3 files changed, 30 insertions(+), 27 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
> index 51287fac8138a..bd75162ec868d 100644
> --- a/tools/testing/selftests/kvm/include/test_util.h
> +++ b/tools/testing/selftests/kvm/include/test_util.h
> @@ -82,21 +82,23 @@ do { \
> __builtin_unreachable(); \
> } while (0)
>
> -extern sigjmp_buf expect_sigbus_jmpbuf;
> -void expect_sigbus_handler(int signum);
> -
> -#define TEST_EXPECT_SIGBUS(action) \
> -do { \
> - struct sigaction sa_old, sa_new = { \
> - .sa_handler = expect_sigbus_handler, \
> - }; \
> - \
> - sigaction(SIGBUS, &sa_new, &sa_old); \
> - if (sigsetjmp(expect_sigbus_jmpbuf, 1) == 0) { \
> - action; \
> - TEST_FAIL("'%s' should have triggered SIGBUS", #action); \
> - } \
> - sigaction(SIGBUS, &sa_old, NULL); \
> +extern __thread sigjmp_buf expect_sigbus_jmpbuf;
> +extern __thread volatile sig_atomic_t expecting_sigbus;
> +extern void catchall_signal_handler(int signum);
> +
> +#define TEST_EXPECT_SIGBUS(action) \
> +do { \
> + struct sigaction __sa = {}; \
> + \
> + TEST_ASSERT_EQ(sigaction(SIGBUS, NULL, &__sa), 0); \
> + TEST_ASSERT_EQ(__sa.sa_handler, &catchall_signal_handler); \
> + \
> + expecting_sigbus = true; \
> + if (sigsetjmp(expect_sigbus_jmpbuf, 1) == 0) { \
> + action; \
> + TEST_FAIL("'%s' should have triggered SIGBUS", #action);\
> + } \
> + expecting_sigbus = false; \
> } while (0)
>
> size_t parse_size(const char *size);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 6b304e8a0e0d5..b4f104436875b 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -2292,13 +2292,20 @@ __weak void kvm_selftest_arch_init(void)
> {
> }
>
> -static void report_unexpected_signal(int signum)
> +__thread sigjmp_buf expect_sigbus_jmpbuf;
> +__thread volatile sig_atomic_t expecting_sigbus;
> +
> +void catchall_signal_handler(int signum)
> {
> + switch (signum) {
> + case SIGBUS: {
> + if (expecting_sigbus)
> + siglongjmp(expect_sigbus_jmpbuf, 1);
> +
> + TEST_FAIL("Unexpected SIGBUS (%d)\n", signum);
> + }
> #define KVM_CASE_SIGNUM(sig) \
> case sig: TEST_FAIL("Unexpected " #sig " (%d)\n", signum)
> -
> - switch (signum) {
> - KVM_CASE_SIGNUM(SIGBUS);
> KVM_CASE_SIGNUM(SIGSEGV);
> KVM_CASE_SIGNUM(SIGILL);
> KVM_CASE_SIGNUM(SIGFPE);
> @@ -2310,12 +2317,13 @@ static void report_unexpected_signal(int signum)
> void __attribute((constructor)) kvm_selftest_init(void)
> {
> struct sigaction sig_sa = {
> - .sa_handler = report_unexpected_signal,
> + .sa_handler = catchall_signal_handler,
> };
>
> /* Tell stdout not to buffer its content. */
> setbuf(stdout, NULL);
>
> + expecting_sigbus = false;
> sigaction(SIGBUS, &sig_sa, NULL);
> sigaction(SIGSEGV, &sig_sa, NULL);
> sigaction(SIGILL, &sig_sa, NULL);
> diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
> index bab1bd2b775b6..30eb701e4becd 100644
> --- a/tools/testing/selftests/kvm/lib/test_util.c
> +++ b/tools/testing/selftests/kvm/lib/test_util.c
> @@ -18,13 +18,6 @@
>
> #include "test_util.h"
>
> -sigjmp_buf expect_sigbus_jmpbuf;
> -
> -void __attribute__((used)) expect_sigbus_handler(int signum)
> -{
> - siglongjmp(expect_sigbus_jmpbuf, 1);
> -}
> -
> /*
> * Random number generator that is usable from guest code. This is the
> * Park-Miller LCG using standard constants.
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
* Re: [PATCH v8 43/46] KVM: selftests: Check fd/flags provided to mmap() when setting up memslot
From: Fuad Tabba @ 2026-06-25 9:20 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-43-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Assert that a valid fd provided to mmap() is accompanied by MAP_SHARED.
>
> With an invalid fd (usually used for anonymous mappings), there are no
> constraints on mmap() flags.
>
> Add this check to make sure that when a guest_memfd is used as region->fd,
> the flags provided to mmap() include MAP_SHARED.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> [Rephrase assertion message.]
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/lib/kvm_util.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 0b2256ea65ff9..6b304e8a0e0d5 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1110,6 +1110,9 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
> src_type == VM_MEM_SRC_SHARED_HUGETLB);
> }
>
> + TEST_ASSERT(region->fd == -1 || backing_src_is_shared(src_type),
> + "A valid fd provided to mmap() must be accompanied by MAP_SHARED.");
> +
> region->mmap_start = __kvm_mmap(region->mmap_size, PROT_READ | PROT_WRITE,
> vm_mem_backing_src_alias(src_type)->flag,
> region->fd, mmap_offset);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
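For background on why a file-backed mapping needs MAP_SHARED here, the following is a small sketch using a plain memfd_create() file rather than a guest_memfd. The helpers mmap_flags_ok() and write_visible_through_fd() are hypothetical names, and the predicate only mirrors the shape of the new assertion (the selftest itself checks backing_src_is_shared(src_type), not the raw flags).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Same shape as the new assertion: no fd, or a shared mapping. */
static bool mmap_flags_ok(int fd, int flags)
{
	return fd == -1 || (flags & MAP_SHARED) != 0;
}

/*
 * Maps a fresh memfd with @map_flags, writes one byte through the mapping
 * and reads it back through the fd. Returns 1 if the write is visible via
 * the fd, 0 if it is not, and -1 on setup failure.
 */
static int write_visible_through_fd(int map_flags)
{
	const size_t size = 4096;
	char byte = 0;
	char *mem;
	ssize_t n;
	int fd;

	fd = memfd_create("mmap-sketch", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}

	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, map_flags, fd, 0);
	if (mem == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* With MAP_PRIVATE this write lands in a private copy-on-write page. */
	mem[0] = 0x5a;
	n = pread(fd, &byte, 1, 0);

	munmap(mem, size);
	close(fd);
	return n == 1 ? byte == 0x5a : -1;
}
```

With MAP_PRIVATE the write never reaches the file's pages, which is why a private mapping of an fd cannot give host and guest a common view of the same memory.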
* Re: [PATCH v8 42/46] KVM: selftests: Provide common function to set memory attributes
From: Fuad Tabba @ 2026-06-25 9:09 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-42-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Introduce vm_mem_set_memory_attributes(), which handles setting of memory
> attributes for a range of guest physical addresses, regardless of whether
> the attributes should be set via guest_memfd or via the memory attributes
> at the VM level.
>
> Refactor existing vm_mem_set_{shared,private} functions to use the new
> function. Opportunistically update the size parameter to use size_t instead
> of u64.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/include/kvm_util.h | 46 +++++++++++++++++++-------
> 1 file changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 3a6b1fa7f26ef..db1442da21bb1 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -454,18 +454,6 @@ static inline void vm_set_memory_attributes(struct kvm_vm *vm, gpa_t gpa,
> vm_ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, &attr);
> }
>
> -static inline void vm_mem_set_private(struct kvm_vm *vm, gpa_t gpa,
> - u64 size)
> -{
> - vm_set_memory_attributes(vm, gpa, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> -}
> -
> -static inline void vm_mem_set_shared(struct kvm_vm *vm, gpa_t gpa,
> - u64 size)
> -{
> - vm_set_memory_attributes(vm, gpa, size, 0);
> -}
> -
> static inline int __gmem_set_memory_attributes(int fd, u64 offset,
> size_t size, u64 attributes,
> u64 *error_offset)
> @@ -532,6 +520,40 @@ static inline void gmem_set_shared(int fd, u64 offset, size_t size)
> gmem_set_memory_attributes(fd, offset, size, 0);
> }
>
> +static inline void vm_mem_set_memory_attributes(struct kvm_vm *vm, gpa_t gpa,
> + size_t size, u64 attrs)
> +{
> + if (kvm_has_gmem_attributes) {
> + gpa_t end = gpa + size;
> + off_t fd_offset;
> + gpa_t addr;
> + size_t len;
> + int fd;
> +
> + for (addr = gpa; addr < end; addr += len) {
> + fd = kvm_gpa_to_guest_memfd(vm, addr, &fd_offset, &len);
> + len = min(end - addr, len);
> +
> + gmem_set_memory_attributes(fd, fd_offset, len, attrs);
> + }
> + } else {
> + vm_set_memory_attributes(vm, gpa, size, attrs);
> + }
> +}
> +
> +static inline void vm_mem_set_private(struct kvm_vm *vm, gpa_t gpa,
> + size_t size)
> +{
> + vm_mem_set_memory_attributes(vm, gpa, size,
> + KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +}
> +
> +static inline void vm_mem_set_shared(struct kvm_vm *vm, gpa_t gpa,
> + size_t size)
> +{
> + vm_mem_set_memory_attributes(vm, gpa, size, 0);
> +}
> +
> void vm_guest_mem_fallocate(struct kvm_vm *vm, gpa_t gpa, u64 size,
> bool punch_hole);
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
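The per-gmem branch of vm_mem_set_memory_attributes() walks the GPA range one memslot at a time and clamps each step to the end of the request. A standalone sketch of that walk is below, assuming a simplified, hypothetical struct slot in place of the selftests' memory regions. find_slot() and split_by_slot() are illustrative names, and recording chunk lengths stands in for the gmem_set_memory_attributes() call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memslot backed by one guest_memfd. */
struct slot {
	uint64_t gpa;
	uint64_t size;
};

/* Returns the slot containing @gpa, or NULL if none does. */
static const struct slot *find_slot(const struct slot *slots, size_t nr_slots,
				    uint64_t gpa)
{
	for (size_t i = 0; i < nr_slots; i++) {
		if (gpa >= slots[i].gpa && gpa - slots[i].gpa < slots[i].size)
			return &slots[i];
	}
	return NULL;
}

/*
 * Splits [gpa, gpa + size) into per-slot chunks and stores each chunk's
 * length in @lens. Returns the number of chunks, or 0 if part of the range
 * has no slot or @lens is too small.
 */
static size_t split_by_slot(const struct slot *slots, size_t nr_slots,
			    uint64_t gpa, uint64_t size,
			    uint64_t *lens, size_t max_lens)
{
	uint64_t end = gpa + size;
	uint64_t addr, len;
	size_t n = 0;

	for (addr = gpa; addr < end; addr += len) {
		const struct slot *s = find_slot(slots, nr_slots, addr);

		if (!s || n == max_lens)
			return 0;

		/* Bytes left in this slot, clamped to the end of the request. */
		len = s->gpa + s->size - addr;
		if (len > end - addr)
			len = end - addr;

		lens[n++] = len;
	}
	return n;
}
```

The clamp is what allows a single call to span several memslots: each iteration handles at most the remainder of the current slot, and the final iteration stops exactly at the end of the requested range.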
* Re: [PATCH v8 41/46] KVM: selftests: Provide function to look up guest_memfd details from gpa
From: Fuad Tabba @ 2026-06-25 8:58 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-41-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Introduce a new helper, kvm_gpa_to_guest_memfd(), to find the
> guest_memfd-related details of a memory region that contains a given guest
> physical address (GPA).
>
> The function returns the file descriptor for the memfd, the offset into
> the file that corresponds to the GPA, and the number of bytes remaining
> in the region from that GPA.
>
> kvm_gpa_to_guest_memfd() was factored out from vm_guest_mem_fallocate();
> refactor vm_guest_mem_fallocate() to use the new helper.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/include/kvm_util.h | 3 +++
> tools/testing/selftests/kvm/lib/kvm_util.c | 37 ++++++++++++++++----------
> 2 files changed, 26 insertions(+), 14 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 79ab64ac8b869..3a6b1fa7f26ef 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -428,6 +428,9 @@ static inline void vm_enable_cap(struct kvm_vm *vm, u32 cap, u64 arg0)
> vm_ioctl(vm, KVM_ENABLE_CAP, &enable_cap);
> }
>
> +int kvm_gpa_to_guest_memfd(struct kvm_vm *vm, gpa_t gpa, off_t *fd_offset,
> + size_t *nr_bytes);
> +
> /*
> * KVM_SET_MEMORY_ATTRIBUTES{,2} overwrites _all_ attributes. These
> * flows need significant enhancements to support multiple attributes.
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 524ef97d634bf..0b2256ea65ff9 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1305,27 +1305,20 @@ void vm_guest_mem_fallocate(struct kvm_vm *vm, u64 base, u64 size,
> bool punch_hole)
> {
> const int mode = FALLOC_FL_KEEP_SIZE | (punch_hole ? FALLOC_FL_PUNCH_HOLE : 0);
> - struct userspace_mem_region *region;
> u64 end = base + size;
> - gpa_t gpa, len;
> off_t fd_offset;
> - int ret;
> + int fd, ret;
> + size_t len;
> + gpa_t gpa;
>
> for (gpa = base; gpa < end; gpa += len) {
> - u64 offset;
> -
> - region = userspace_mem_region_find(vm, gpa, gpa);
> - TEST_ASSERT(region && region->region.flags & KVM_MEM_GUEST_MEMFD,
> - "Private memory region not found for GPA 0x%lx", gpa);
> + fd = kvm_gpa_to_guest_memfd(vm, gpa, &fd_offset, &len);
> + len = min(end - gpa, len);
>
> - offset = gpa - region->region.guest_phys_addr;
> - fd_offset = region->region.guest_memfd_offset + offset;
> - len = min_t(u64, end - gpa, region->region.memory_size - offset);
> -
> - ret = fallocate(region->region.guest_memfd, mode, fd_offset, len);
> + ret = fallocate(fd, mode, fd_offset, len);
> TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = %lu), fd = %d, mode = %x, offset = %lx",
> punch_hole ? "punch hole" : "allocate", gpa, len,
> - region->region.guest_memfd, mode, fd_offset);
> + fd, mode, fd_offset);
> }
> }
>
> @@ -1662,6 +1655,22 @@ void *addr_gpa2alias(struct kvm_vm *vm, gpa_t gpa)
> return (void *) ((uintptr_t) region->host_alias + offset);
> }
>
> +int kvm_gpa_to_guest_memfd(struct kvm_vm *vm, gpa_t gpa, off_t *fd_offset,
> + size_t *nr_bytes)
> +{
> + struct userspace_mem_region *region;
> + gpa_t gpa_offset;
> +
> + region = userspace_mem_region_find(vm, gpa, gpa);
> + TEST_ASSERT(region && region->region.flags & KVM_MEM_GUEST_MEMFD,
> + "guest_memfd memory region not found for GPA 0x%lx", gpa);
> +
> + gpa_offset = gpa - region->region.guest_phys_addr;
> + *fd_offset = region->region.guest_memfd_offset + gpa_offset;
> + *nr_bytes = region->region.memory_size - gpa_offset;
> + return region->region.guest_memfd;
> +}
> +
> /* Create an interrupt controller chip for the specified VM. */
> void vm_create_irqchip(struct kvm_vm *vm)
> {
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
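The arithmetic in kvm_gpa_to_guest_memfd() can be tried in isolation. The sketch below assumes a hypothetical struct mem_region that carries only the fields the helper reads, and a hypothetical gpa_to_fd() that returns -1 where the selftest would assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down view of a guest_memfd-backed memory region. */
struct mem_region {
	uint64_t gpa;       /* first guest physical address of the region */
	size_t size;        /* region size in bytes */
	int fd;             /* backing guest_memfd */
	off_t fd_offset;    /* offset of the region's start within the fd */
};

/*
 * Looks up the region containing @gpa. On success, returns the fd and
 * fills in the file offset for @gpa and the bytes left in the region
 * from @gpa onward. Returns -1 if no region contains @gpa.
 */
static int gpa_to_fd(const struct mem_region *regions, size_t nr_regions,
		     uint64_t gpa, off_t *fd_offset, size_t *nr_bytes)
{
	for (size_t i = 0; i < nr_regions; i++) {
		const struct mem_region *r = &regions[i];
		uint64_t gpa_offset;

		if (gpa < r->gpa || gpa - r->gpa >= r->size)
			continue;

		gpa_offset = gpa - r->gpa;
		*fd_offset = r->fd_offset + (off_t)gpa_offset;
		*nr_bytes = r->size - gpa_offset;
		return r->fd;
	}
	return -1;
}
```

Returning the remaining byte count along with the fd and offset lets callers such as vm_guest_mem_fallocate() clamp each operation to the region without inspecting region internals themselves.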
* Re: [PATCH v8 40/46] KVM: selftests: Reset shared memory after hole-punching
From: Fuad Tabba @ 2026-06-25 8:46 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-40-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> private_mem_conversions_test used to reset the shared memory that was used
> for the test to an initial pattern at the end of each test iteration. Then,
> it would punch out the pages, which would zero memory.
>
> Without in-place conversion, the resetting would write shared memory, and
> hole-punching would zero private memory, hence resetting the test to the
> state at the beginning of the for loop.
>
> With in-place conversion, resetting writes memory as shared, and
> hole-punching zeroes the same physical memory, hence undoing the reset
> done before the hole punch.
>
> Move the resetting after the hole-punching, and reset the entire
> PER_CPU_DATA_SIZE instead of just the tested range.
>
> With in-place conversion, this zeroes and then resets the same physical
> memory. Without in-place conversion, the private memory is zeroed, and the
> shared memory is reset to init_p.
>
> This is sufficient since at each test stage, the memory is assumed to start
> as shared, and private memory is always assumed to start zeroed. Conversion
> zeroes memory, so the future test stages will work as expected.
>
> Fixes: 43f623f350ce1 ("KVM: selftests: Add x86-only selftest for private memory conversions")
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/x86/private_mem_conversions_test.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> index 861baff201e78..289ad10063fca 100644
> --- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
> @@ -202,15 +202,18 @@ static void guest_test_explicit_conversion(u64 base_gpa, bool do_fallocate)
> guest_sync_shared(gpa, size, p3, p4);
> memcmp_g(gpa, p4, size);
>
> - /* Reset the shared memory back to the initial pattern. */
> - memset((void *)gpa, init_p, size);
> -
> /*
> * Free (via PUNCH_HOLE) *all* private memory so that the next
> * iteration starts from a clean slate, e.g. with respect to
> * whether or not there are pages/folios in guest_mem.
> */
> guest_map_shared(base_gpa, PER_CPU_DATA_SIZE, true);
> +
> + /*
> + * Hole-punching above zeroed private memory. Reset shared
> + * memory in preparation for the next GUEST_STAGE.
> + */
> + memset((void *)base_gpa, init_p, PER_CPU_DATA_SIZE);
> }
> }
>
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
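The ordering argument in the commit message can be checked with a toy model. This sketch assumes nothing about KVM: struct toy_mem, private_view(), punch_hole(), reset_shared() and shared_byte_after_cleanup() are all hypothetical, and "in-place" is modeled simply as the private view aliasing the shared buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_BYTES 16

/* Toy backing store: with in_place set, both views alias the same bytes. */
struct toy_mem {
	unsigned char shared[NR_BYTES];
	unsigned char priv[NR_BYTES];
	bool in_place;
};

static unsigned char *private_view(struct toy_mem *m)
{
	return m->in_place ? m->shared : m->priv;
}

/* Models PUNCH_HOLE: the private backing reads as zero afterwards. */
static void punch_hole(struct toy_mem *m)
{
	memset(private_view(m), 0, NR_BYTES);
}

/* Models the guest's memset() of shared memory to the initial pattern. */
static void reset_shared(struct toy_mem *m, unsigned char init_p)
{
	memset(m->shared, init_p, NR_BYTES);
}

/* Returns the first shared byte after end-of-iteration cleanup. */
static unsigned char shared_byte_after_cleanup(bool in_place, bool reset_first,
					       unsigned char init_p)
{
	struct toy_mem m = { .in_place = in_place };

	/* Leftover pattern from the test iteration. */
	memset(m.shared, 0xff, NR_BYTES);

	if (reset_first) {
		reset_shared(&m, init_p);
		punch_hole(&m);
	} else {
		punch_hole(&m);
		reset_shared(&m, init_p);
	}
	return m.shared[0];
}
```

In this model only the combination of in-place backing and reset-before-punch leaves shared memory zeroed instead of holding the initial pattern, which matches the failure the patch describes.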
* Re: [PATCH v4 09/13] verification/rvgen: Delete __parse_constraint()
From: Gabriele Monaco @ 2026-06-25 8:21 UTC (permalink / raw)
To: Nam Cao
Cc: Steven Rostedt, Wander Lairson Costa, linux-trace-kernel,
linux-kernel
In-Reply-To: <b22a5a3822fe53afb8e2cf1df623a0e4c9ed5f49.1781847583.git.namcao@linutronix.de>
On Fri, 2026-06-19 at 07:52 +0200, Nam Cao wrote:
> All previous users of self.invariants and self.guards have been converted
> to the Lark parser. Delete __parse_constraints() and its associates.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
This one was missing the
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
The series looks ready for inclusion to me, thanks!
Gabriele
> ---
> tools/verification/rvgen/rvgen/dot2k.py | 67 ++-------------------------
> 1 file changed, 4 insertions(+), 63 deletions(-)
>
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py b/tools/verification/rvgen/rvgen/dot2k.py
> index 4ea1ecc55c80..f1f5fa297adb 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -177,7 +177,6 @@ class ha2k(dot2k):
> if not self.is_hybrid_automata():
> raise AutomataError("Detected deterministic automaton, use the 'da' class")
> self.trace_h = self._read_template_file("trace_hybrid.h")
> - self.__parse_constraints()
> self.has_invariant = False
> self.has_guard = False
> for state in self._states:
> @@ -308,64 +307,6 @@ class ha2k(dot2k):
> separator = "\n\t\t " if sum(len(r) for r in rules) > 80 else " "
> return ["res = " + separator.join(rules) + ";"]
>
> - def __validate_constraint(self, key: tuple[int, int] | int, constr: str,
> - rule, reset) -> None:
> - # event constrains are tuples and allow both rules and reset
> - # state constraints are only used for expirations (e.g. clk<N)
> - if self.is_event_constraint(key):
> - if not rule and not reset:
> - raise AutomataError("Unrecognised event constraint "
> - f"({self.states[key[0]]}/{self.events[key[1]]}: {constr})")
> - if rule and (rule["env"] in self.env_types and
> - rule["env"] not in self.env_stored):
> - raise AutomataError("Clocks in hybrid automata always require a storage"
> - f" ({rule["env"]})")
> - else:
> - if not rule:
> - raise AutomataError("Unrecognised state constraint "
> - f"({self.states[key]}: {constr})")
> - if rule["env"] not in self.env_stored:
> - raise AutomataError("State constraints always require a storage "
> - f"({rule["env"]})")
> - if rule["op"] not in ["<", "<="]:
> - raise AutomataError("State constraints must be clock expirations like"
> - f" clk<N ({rule.string})")
> -
> - def __parse_constraints(self) -> None:
> - self.guards: dict[_EventConstraintKey, str] = {}
> - self.invariants: dict[_StateConstraintKey, str] = {}
> - for key, constraint in self.constraints.items():
> - rules = []
> - resets = []
> - for c, sep in self._split_constraint_expr(constraint):
> - rule = self.constraint_rule.search(c)
> - reset = self.constraint_reset.search(c)
> - self.__validate_constraint(key, c, rule, reset)
> - if rule:
> - value = rule["val"]
> - value_len = len(rule["val"])
> - unit = None
> - if rule.groupdict().get("unit"):
> - value_len += len(rule["unit"])
> - unit = rule["unit"]
> - c = c[:-(value_len)]
> - value = self.__adjust_value(value, unit)
> - if self.is_event_constraint(key):
> - c = self.__parse_single_constraint(rule, value)
> - if sep:
> - c += f" {sep}"
> - else:
> - c = self.__parse_timer_constraint(rule, value)
> - rules.append(c)
> - if reset:
> - c = f"ha_reset_env(ha_mon, {reset["env"]}{self.enum_suffix}, time_ns)"
> - resets.append(c)
> - if self.is_event_constraint(key):
> - res = self.__format_guard_rules(rules) + resets
> - self.guards[key] = ";".join(res)
> - else:
> - self.invariants[key] = rules[0]
> -
> def __fill_verify_invariants_func(self) -> list[str]:
> if not self.has_invariant:
> return []
> @@ -490,15 +431,15 @@ f"""static bool ha_verify_constraint(struct
> ha_monitor *ha_mon,
> \t\t\t\t enum {self.enum_states_def} next_state, u64 time_ns)
> {{""")
>
> - if self.invariants:
> + if self.has_invariant:
> buff.append("\tif (!ha_verify_invariants(ha_mon,
> curr_state, "
> "event, next_state, time_ns))\n\t\treturn
> false;\n")
>
> - if self.guards:
> + if self.has_guard:
> buff.append("\tif (!ha_verify_guards(ha_mon, curr_state,
> event, "
> "next_state, time_ns))\n\t\treturn
> false;\n")
>
> - if self.invariants:
> + if self.has_invariant:
> buff.append("\tha_setup_invariants(ha_mon, curr_state,
> event, next_state, time_ns);\n")
>
> buff.append("\treturn true;\n}\n")
> @@ -575,7 +516,7 @@ f"""static bool ha_verify_constraint(struct
> ha_monitor *ha_mon,
> return self.__fill_hybrid_get_reset_functions() +
> self.__fill_constr_func()
>
> def _fill_timer_type(self) -> list:
> - if self.invariants:
> + if self.has_invariant:
> return [
> "/* XXX: If the monitor has several instances,
> consider HA_TIMER_WHEEL */",
> "#define HA_TIMER_TYPE HA_TIMER_HRTIMER"
* Re: [PATCH v8 39/46] KVM: selftests: Test conversion with elevated page refcount
From: Fuad Tabba @ 2026-06-25 8:04 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-39-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Add a selftest to verify that converting a shared guest_memfd page to a
> private page fails if the page has an elevated reference count.
>
> When KVM converts a shared page to a private one, it expects the page to
> have a reference count equal to the reference counts taken by the
> filemap. If another kernel subsystem holds a reference to the page, the
> conversion must be aborted.
>
> The test asserts that both bulk and single-page conversion attempts
> correctly fail with EAGAIN for the pinned page. After the page is unpinned,
> the test verifies that subsequent conversions succeed.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Not sure Sashiko's concern is worth addressing.
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../kvm/x86/guest_memfd_conversions_test.c | 56 ++++++++++++++++++++++
> 1 file changed, 56 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index 99b0023609670..4ebbd29029526 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -441,6 +441,62 @@ GMEM_CONVERSION_TEST_INIT_SHARED(forked_accesses)
> #undef TEST_STATE_AWAIT
> }
>
> +static void test_convert_to_private_fails(test_data_t *t, u64 pgoff,
> + size_t nr_pages,
> + u64 expected_error_offset)
> +{
> + /* +1 to make it anything but expected_error_offset. */
> + u64 error_offset = expected_error_offset + 1;
> + u64 offset = pgoff * page_size;
> + int ret;
> +
> + do {
> + ret = __gmem_set_private(t->gmem_fd, offset,
> + nr_pages * page_size, &error_offset);
> + } while (ret == -1 && errno == EINTR);
> + TEST_ASSERT(ret == -1 && errno == EAGAIN,
> + "Wanted EAGAIN on page %lu, got %d (ret = %d)", pgoff,
> + errno, ret);
> + TEST_ASSERT_EQ(error_offset, expected_error_offset);
> +}
> +
> +GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED(elevated_refcount, 4)
> +{
> + int i;
> +
> + pin_pages(t->mem + test_page * page_size, page_size);
> +
> + for (i = 0; i < nr_pages; i++)
> + test_shared(t, i, 0, 'A', 'B');
> +
> + /*
> + * Converting in bulk should fail as long as any page in the range has
> + * unexpected refcounts.
> + */
> + test_convert_to_private_fails(t, 0, nr_pages, test_page * page_size);
> +
> + for (i = 0; i < nr_pages; i++) {
> + /*
> + * Converting page-wise should also fail as long as any page in the
> + * range has unexpected refcounts.
> + */
> + if (i == test_page)
> + test_convert_to_private_fails(t, i, 1, test_page * page_size);
> + else
> + test_convert_to_private(t, i, 'B', 'C');
> + }
> +
> + unpin_pages();
> +
> + gmem_set_private(t->gmem_fd, 0, nr_pages * page_size);
> +
> + for (i = 0; i < nr_pages; i++) {
> + char expected = i == test_page ? 'B' : 'C';
> +
> + test_private(t, i, expected, 'D');
> + }
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
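For readers following along, the retry-on-EINTR pattern in
test_convert_to_private_fails() can be sketched in isolation. This is only a
sketch, not the selftest code: fake_convert() and eintr_left are hypothetical
stand-ins for __gmem_set_private() and for signal interruptions, and the
"pinned" page is simulated with a plain offset comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: number of upcoming calls that should fail with EINTR. */
static int eintr_left;

/*
 * Hypothetical stand-in for the conversion call. Fails with EINTR while
 * eintr_left > 0, then fails with EAGAIN if the pinned page lies inside
 * [offset, offset + len), reporting its offset through error_offset.
 */
static int fake_convert(uint64_t offset, uint64_t len, uint64_t pinned,
			uint64_t *error_offset)
{
	if (eintr_left > 0) {
		eintr_left--;
		errno = EINTR;
		return -1;
	}
	if (pinned >= offset && pinned < offset + len) {
		*error_offset = pinned;
		errno = EAGAIN;
		return -1;
	}
	return 0;
}

/* Retry while interrupted; return 0 on success, else the final errno. */
static int convert_retry_eintr(uint64_t offset, uint64_t len, uint64_t pinned,
			       uint64_t *error_offset)
{
	int ret;

	do {
		ret = fake_convert(offset, len, pinned, error_offset);
	} while (ret == -1 && errno == EINTR);

	return ret == 0 ? 0 : errno;
}
```

As in the patch, the caller seeds error_offset with a value other than the
expected one, so an untouched output is distinguishable from a correct report.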
* Re: [PATCH v3 0/2] tracing: Remove trace_printk.h from kernel.h
From: Sebastian Andrzej Siewior @ 2026-06-25 7:56 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, linux-trace-kernel, Masami Hiramatsu, Mark Rutland,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds, John Ogness,
Thomas Gleixner, Peter Zijlstra, Julia Lawall, Yury Norov
In-Reply-To: <20260624081806.120105649@kernel.org>
On 2026-06-24 04:18:06 [-0400], Steven Rostedt wrote:
> Remove trace_printk.h by creating a trace_controls.h for those places that
> need access to tracing prototypes like tracing_off() and for the places that
> need trace_printk() directly, to have it included directly.
That sounds reasonable. Thank you for doing it.
Sebastian
* Re: [PATCH v8 38/46] KVM: selftests: Add helpers to pin pages with CONFIG_GUP_TEST
From: Fuad Tabba @ 2026-06-25 7:40 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-38-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Add helper functions to allow KVM selftests to pin memory using
> CONFIG_GUP_TEST. This is useful for testing scenarios where some page has
> an increased refcount, such as in guest_memfd in-place conversion tests.
>
> The helpers open /sys/kernel/debug/gup_test and invoke the
> PIN_LONGTERM_TEST_START and PIN_LONGTERM_TEST_STOP ioctls. Since this
> functionality depends on the kernel being built with CONFIG_GUP_TEST,
> provide stub implementations that trigger a test failure if the
> configuration is missing.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
nit below, otherwise:
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> tools/testing/selftests/kvm/include/kvm_util.h | 3 +++
> tools/testing/selftests/kvm/lib/kvm_util.c | 23 +++++++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 323d06b5699ec..79ab64ac8b869 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -1195,6 +1195,9 @@ static inline int pin_self_to_any_cpu(void)
> return pin_task_to_any_cpu(pthread_self());
> }
>
> +void pin_pages(void *vaddr, uint64_t size);
> +void unpin_pages(void);
> +
> void kvm_print_vcpu_pinning_help(void);
> void kvm_parse_vcpu_pinning(const char *pcpus_string, u32 vcpu_to_pcpu[],
> int nr_vcpus);
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index b73817f7bc803..524ef97d634bf 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -18,6 +18,8 @@
> #include <unistd.h>
> #include <linux/kernel.h>
>
> +#include "../../../../mm/gup_test.h"
> +
> #define KVM_UTIL_MIN_PFN 2
>
> u32 guest_random_seed;
> @@ -639,6 +641,27 @@ int __pin_task_to_cpu(pthread_t task, int cpu)
> return pthread_setaffinity_np(task, sizeof(cpuset), &cpuset);
> }
>
> +static int gup_test_fd = -1;
> +
> +void pin_pages(void *vaddr, uint64_t size)
> +{
> + const struct pin_longterm_test args = {
> + .addr = (uint64_t)vaddr,
> + .size = size,
> + .flags = PIN_LONGTERM_TEST_FLAG_USE_WRITE,
> + };
> +
> + gup_test_fd = __open_path_or_exit("/sys/kernel/debug/gup_test", O_RDWR,
> + "Is CONFIG_GUP_TEST enabled?");
nit: should you close this/reset it to -1 after the tests?
> +
> + TEST_ASSERT_EQ(ioctl(gup_test_fd, PIN_LONGTERM_TEST_START, &args), 0);
> +}
> +
> +void unpin_pages(void)
> +{
> + TEST_ASSERT_EQ(ioctl(gup_test_fd, PIN_LONGTERM_TEST_STOP), 0);
> +}
> +
> static u32 parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
> {
> u32 pcpu = atoi_non_negative("CPU number", cpu_str);
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
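To make the nit about closing the descriptor concrete, here is a rough sketch
of the open-once / close-and-reset pattern. It is an illustration under
assumptions, not a proposed replacement: res_fd, res_acquire() and
res_release() are hypothetical names, the path is whatever the caller passes
(the patch uses /sys/kernel/debug/gup_test), and the PIN_LONGTERM_TEST_*
ioctls are left out entirely.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical cached descriptor, -1 while nothing is open. */
static int res_fd = -1;

/* Open the resource on first use; later calls reuse the descriptor. */
static int res_acquire(const char *path)
{
	if (res_fd < 0)
		res_fd = open(path, O_RDWR);
	return res_fd;
}

/* Close the descriptor and reset it so a stale fd is never reused. */
static void res_release(void)
{
	if (res_fd < 0)
		return;
	close(res_fd);
	res_fd = -1;
}
```

With this shape, calling the acquire helper twice does not leak a descriptor,
and calling the release helper twice is harmless.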
* Re: [PATCH v8 37/46] KVM: selftests: Test that shared/private status is consistent across processes
From: Fuad Tabba @ 2026-06-25 7:14 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-37-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Add a test to verify that a guest_memfd's shared/private status is
> consistent across processes, and that any shared pages previously mapped in
> any process are unmapped from all processes.
>
> The test forks a child process after creating the shared guest_memfd
> region so that the second process exists alongside the main process for the
> entire test.
>
> The processes then take turns to access memory to check that the
> shared/private status is consistent across processes.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
Two things below, otherwise:
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> .../kvm/x86/guest_memfd_conversions_test.c | 118 +++++++++++++++++++++
> 1 file changed, 118 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index f03af2c46426f..99b0023609670 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -2,6 +2,8 @@
> /*
> * Copyright (c) 2024, Google LLC.
> */
> +#include <pthread.h>
> +#include <time.h>
> #include <sys/mman.h>
> #include <unistd.h>
nit: include order
>
> @@ -323,6 +325,122 @@ GMEM_CONVERSION_TEST_INIT_SHARED(truncate)
> test_private(t, 0, 0, 'A');
> }
>
> +/* Test that shared/private memory protections work and are seen from any process. */
> +GMEM_CONVERSION_TEST_INIT_SHARED(forked_accesses)
> +{
> + enum test_state {
> + STATE_INIT,
> + STATE_CHECK_SHARED,
> + STATE_DONE_CHECKING_SHARED,
> + STATE_CHECK_PRIVATE,
> + STATE_DONE_CHECKING_PRIVATE,
> + };
> +
> + struct sync_state {
> + pthread_mutex_t mutex;
> + pthread_cond_t cond;
> + enum test_state step;
> + } *sync;
> +
> + pthread_mutexattr_t mattr;
> + pthread_condattr_t cattr;
> + pid_t child_pid, parent_pid;
> + int status;
> +
> + sync = kvm_mmap(sizeof(*sync), PROT_READ | PROT_WRITE,
> + MAP_SHARED | MAP_ANONYMOUS, -1);
> +
> + pthread_mutexattr_init(&mattr);
> + pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
> + pthread_mutex_init(&sync->mutex, &mattr);
> + pthread_mutexattr_destroy(&mattr);
> +
> + pthread_condattr_init(&cattr);
> + pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
> + pthread_cond_init(&sync->cond, &cattr);
> + pthread_condattr_destroy(&cattr);
> +
> + sync->step = STATE_INIT;
> +
> +#define TEST_STATE_AWAIT(__state) \
> + do { \
> + pthread_mutex_lock(&sync->mutex); \
> + while (sync->step != (__state)) { \
> + struct timespec ts, stop; \
> + int ret; \
> + \
> + clock_gettime(CLOCK_REALTIME, &ts); \
> + stop = timespec_add_ns(ts, 100 * 1000000UL); \
> + \
> + ret = pthread_cond_timedwait(&sync->cond, &sync->mutex, &stop); \
> + if (ret == ETIMEDOUT) { \
> + bool alive = (child_pid == 0) ? \
> + (getppid() == parent_pid) : \
> + (waitpid(child_pid, NULL, WNOHANG) == 0); \
Not sure it's worth it, but if you want to silence Sashiko, waitid()
with WNOWAIT might be the way to go (not tested, just from looking at
the man page). The scenario is very unlikely, though; I'm mentioning it
only because Sashiko complained.
> + TEST_ASSERT(alive, "Other process exited prematurely"); \
> + } else { \
> + TEST_ASSERT(!ret, "pthread_cond_timedwait failed"); \
> + } \
> + } \
> + pthread_mutex_unlock(&sync->mutex); \
> + } while (0)
> +
> +#define TEST_STATE_SET(__state) \
> + do { \
> + pthread_mutex_lock(&sync->mutex); \
> + sync->step = (__state); \
> + pthread_cond_broadcast(&sync->cond); \
> + pthread_mutex_unlock(&sync->mutex); \
> + } while (0)
> +
> + parent_pid = getpid();
> + child_pid = fork();
> + TEST_ASSERT(child_pid != -1, "fork failed");
> +
> + if (child_pid == 0) {
> + const char inconsequential = 0xdd;
> +
> + TEST_STATE_AWAIT(STATE_CHECK_SHARED);
> +
> + /*
> + * This maps the pages into the child process as well, and tests
> + * that the conversion process will unmap the guest_memfd memory
> + * from all processes.
> + */
> + host_do_rmw(t->mem, 0, 0xB, 0xC);
> +
> + TEST_STATE_SET(STATE_DONE_CHECKING_SHARED);
> + TEST_STATE_AWAIT(STATE_CHECK_PRIVATE);
> +
> + TEST_EXPECT_SIGBUS(READ_ONCE(t->mem[0]));
> + TEST_EXPECT_SIGBUS(WRITE_ONCE(t->mem[0], inconsequential));
> +
> + TEST_STATE_SET(STATE_DONE_CHECKING_PRIVATE);
> + exit(0);
> + }
> +
> + test_shared(t, 0, 0, 0xA, 0xB);
> +
> + TEST_STATE_SET(STATE_CHECK_SHARED);
> + TEST_STATE_AWAIT(STATE_DONE_CHECKING_SHARED);
> +
> + test_convert_to_private(t, 0, 0xC, 0xD);
> +
> + TEST_STATE_SET(STATE_CHECK_PRIVATE);
> + TEST_STATE_AWAIT(STATE_DONE_CHECKING_PRIVATE);
> +
> + TEST_ASSERT_EQ(waitpid(child_pid, &status, 0), child_pid);
> + TEST_ASSERT(WIFEXITED(status) && WEXITSTATUS(status) == 0,
> + "Child exited with unexpected status");
> +
> + pthread_mutex_destroy(&sync->mutex);
> + pthread_cond_destroy(&sync->cond);
> + kvm_munmap(sync, sizeof(*sync));
> +
> +#undef TEST_STATE_SET
> +#undef TEST_STATE_AWAIT
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
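For reference, the waitid()/WNOWAIT idea mentioned above could look roughly
like the following. This is an untested-in-the-selftest sketch based on the
man page: child_alive() is a hypothetical helper name, and it is not wired
into TEST_STATE_AWAIT. The point it illustrates is that WNOWAIT leaves an
exited child in a waitable state, so a later waitpid() can still collect its
exit status.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: return true if the child has not exited yet.
 * WNOHANG makes the call non-blocking, WNOWAIT leaves the child unreaped.
 */
static bool child_alive(pid_t pid)
{
	/* Zero si_pid first so "no state change" can be told apart. */
	siginfo_t info = { 0 };

	if (waitid(P_PID, pid, &info, WEXITED | WNOHANG | WNOWAIT) != 0)
		return false;	/* e.g. ECHILD: nothing left to wait for */

	return info.si_pid == 0;
}
```

In contrast, waitpid(pid, NULL, WNOHANG) reaps the child once it has exited,
after which a second waitpid() on the same pid fails with ECHILD.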
* Re: [PATCH v8 36/46] KVM: selftests: Test that truncation does not change shared/private status
From: Fuad Tabba @ 2026-06-25 7:03 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-36-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Add a test to verify that deallocating a page in a guest memfd region via
> fallocate() with FALLOC_FL_PUNCH_HOLE does not alter the shared or private
> status of the corresponding memory range.
>
> When a page backing a guest memfd mapping is deallocated, e.g., by punching
> a hole or truncating the file, and then subsequently faulted back in, the
> new page must inherit the correct shared/private status tracked by
> guest_memfd.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../selftests/kvm/x86/guest_memfd_conversions_test.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index 0b024fb7227f0..f03af2c46426f 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -10,6 +10,7 @@
> #include <linux/sizes.h>
>
> #include "kvm_util.h"
> +#include "kvm_syscalls.h"
> #include "kselftest_harness.h"
> #include "test_util.h"
> #include "ucall_common.h"
> @@ -309,6 +310,19 @@ GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED(unallocated_folios, 8)
> test_convert_to_shared(t, i, 'B', 'C', 'D');
> }
>
> +/* Truncation should not affect shared/private status. */
> +GMEM_CONVERSION_TEST_INIT_SHARED(truncate)
> +{
> + host_do_rmw(t->mem, 0, 0, 'A');
> + kvm_fallocate(t->gmem_fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, page_size);
> + host_do_rmw(t->mem, 0, 0, 'A');
> +
> + test_convert_to_private(t, 0, 'A', 'B');
> +
> + kvm_fallocate(t->gmem_fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, page_size);
> + test_private(t, 0, 0, 'A');
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
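As a side illustration of the hole-punching step, the sketch below shows the
plain fallocate() behaviour the test builds on: after
FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, the range reads back as zeroes.
Note the assumptions: it uses an ordinary memfd from memfd_create(), not a
guest_memfd, so it says nothing about shared/private status, and
punch_and_reread() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write val to byte 0 of a fresh memfd of the given size, punch out
 * [0, size), and return the byte read back afterwards (-1 on error).
 */
static int punch_and_reread(size_t size, char val)
{
	char out = val;
	int ret = -1;
	int fd = memfd_create("punch-demo", 0);

	if (fd < 0)
		return -1;

	if (ftruncate(fd, size) == 0 &&
	    pwrite(fd, &val, 1, 0) == 1 &&
	    fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
		      0, size) == 0 &&
	    pread(fd, &out, 1, 0) == 1)
		ret = (unsigned char)out;

	close(fd);
	return ret;
}
```

This mirrors the host_do_rmw(t->mem, 0, 0, 'A') call after the first punch in
the patch, which likewise expects to find zero before writing 'A' again.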
* Re: [PATCH v8 35/46] KVM: selftests: Convert with allocated folios in different layouts
From: Fuad Tabba @ 2026-06-25 7:03 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-35-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Add a guest_memfd selftest to verify that memory conversions work
> correctly with allocated folios in different layouts.
>
> By iterating through which pages are initially faulted, the test covers
> various layouts of contiguous allocated and unallocated regions, exercising
> conversion with different range layouts.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../kvm/x86/guest_memfd_conversions_test.c | 30 ++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index b43ac196330f1..0b024fb7227f0 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -279,6 +279,36 @@ GMEM_CONVERSION_TEST_INIT_PRIVATE(before_allocation_private)
> test_convert_to_shared(t, 0, 0, 'A', 'B');
> }
>
> +/*
> + * Test that when some of the folios in the conversion range are allocated,
> + * conversion requests are handled correctly in guest_memfd. Vary the ranges
> + * allocated before conversion, using test_page, to cover various layouts of
> + * contiguous allocated and unallocated regions.
> + */
> +GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED(unallocated_folios, 8)
> +{
> + const int second_page_to_fault = 4;
> + int i;
> +
> + /*
> + * Fault 2 of the pages to test filemap range operations except when
> + * test_page == second_page_to_fault.
> + */
> + host_do_rmw(t->mem, test_page, 0, 'A');
> + if (test_page != second_page_to_fault)
> + host_do_rmw(t->mem, second_page_to_fault, 0, 'A');
> +
> + gmem_set_private(t->gmem_fd, 0, nr_pages * page_size);
> + for (i = 0; i < nr_pages; ++i) {
> + char expected = (i == test_page || i == second_page_to_fault) ? 'A' : 0;
> +
> + test_private(t, i, expected, 'B');
> + }
> +
> + for (i = 0; i < nr_pages; ++i)
> + test_convert_to_shared(t, i, 'B', 'C', 'D');
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
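To visualise which layouts the loop over test_page produces, here is a small
sketch that prints the faulted-in pages as a string. It is illustrative only:
faulted_layout() is a hypothetical helper, 'A' marks a page the test would
have faulted in before conversion, and '.' marks an unallocated one.

```c
#include <assert.h>
#include <string.h>

/*
 * Fill buf (at least nr_pages + 1 bytes) with 'A' for each page that is
 * faulted in before conversion and '.' for each page that is not.
 */
static void faulted_layout(int nr_pages, int test_page, int second_page,
			   char *buf)
{
	int i;

	for (i = 0; i < nr_pages; i++)
		buf[i] = (i == test_page || i == second_page) ? 'A' : '.';
	buf[nr_pages] = '\0';
}
```

With nr_pages = 8 and second_page = 4 as in the patch, test_page = 3 gives two
adjacent allocated pages, test_page = 4 gives a single allocated page, and the
other values give two allocated pages separated by unallocated ones.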
* Re: [PATCH v8 34/46] KVM: selftests: Test conversion before allocation
From: Fuad Tabba @ 2026-06-25 7:00 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-34-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> Add two test cases to the guest_memfd conversions selftest to cover
> the scenario where a conversion is requested before any memory has been
> allocated in the guest_memfd region.
>
> The KVM_SET_MEMORY_ATTRIBUTES2 ioctl can be called on a memory region at
> any time. If the guest has not yet faulted in any pages for that region,
> the kernel must record the conversion request and apply the requested state
> when the pages are eventually allocated.
>
> The new tests cover both conversion directions.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../selftests/kvm/x86/guest_memfd_conversions_test.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index 8e17d5c08aeb8..b43ac196330f1 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -265,6 +265,20 @@ GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED(indexing, 4)
> #undef combine
> }
>
> +/*
> + * Test that even if there are no folios yet, conversion requests are recorded
> + * in guest_memfd.
> + */
> +GMEM_CONVERSION_TEST_INIT_SHARED(before_allocation_shared)
> +{
> + test_convert_to_private(t, 0, 0, 'A');
> +}
> +
> +GMEM_CONVERSION_TEST_INIT_PRIVATE(before_allocation_private)
> +{
> + test_convert_to_shared(t, 0, 0, 'A', 'B');
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
* Re: [PATCH v8 33/46] KVM: selftests: Test conversion precision in guest_memfd
From: Fuad Tabba @ 2026-06-25 6:57 UTC (permalink / raw)
To: ackerleytng
Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
rick.p.edgecombe, rientjes, shivankg, steven.price, willy, wyihan,
yan.y.zhao, forkloop, pratyush, suzuki.poulose, aneesh.kumar,
liam, Paolo Bonzini, Sean Christopherson, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
Jonathan Corbet, Shuah Khan, Shuah Khan, Vishal Annapurve,
Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park,
Qi Zheng, Shakeel Butt, Kiryl Shutsemau, Baoquan He,
Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <20260618-gmem-inplace-conversion-v8-33-9d2959357853@google.com>
On Fri, 19 Jun 2026 at 01:32, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Ackerley Tng <ackerleytng@google.com>
>
> The existing guest_memfd conversion tests only use single-page memory
> regions. This provides no coverage for multi-page guest_memfd objects,
> specifically whether KVM correctly handles the page index for conversion
> operations. An incorrect implementation could, for example, always operate
> on the first page regardless of the index provided.
>
> Add a new test case to verify that conversions between private and shared
> memory correctly target the specified page within a multi-page guest_memfd.
>
> This test also verifies the precision of memory conversions by converting a
> single page and then iterating through all other pages to ensure they remain
> in their original state.
>
> To support this test, add a new GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED
> macro that handles setting up and tearing down the VM for each page
> iteration. The teardown logic is adjusted to prevent a double-free in this
> new scenario.
>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> ---
> .../kvm/x86/guest_memfd_conversions_test.c | 66 ++++++++++++++++++++++
> 1 file changed, 66 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> index 5b070d3374eae..8e17d5c08aeb8 100644
> --- a/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> +++ b/tools/testing/selftests/kvm/x86/guest_memfd_conversions_test.c
> @@ -61,8 +61,13 @@ static void gmem_conversions_do_setup(test_data_t *t, int nr_pages,
>
> static void gmem_conversions_do_teardown(test_data_t *t)
> {
> + /* Use NULL to avoid second free in FIXTURE_TEARDOWN (multipage tests). */
> + if (!t->vcpu)
> + return;
> +
> /* No need to close gmem_fd, it's owned by the VM structure. */
> kvm_vm_free(t->vcpu->vm);
> + t->vcpu = NULL;
> }
>
> FIXTURE_TEARDOWN(gmem_conversions)
> @@ -101,6 +106,29 @@ static void __gmem_conversions_##test(test_data_t *t, int nr_pages) \
> #define GMEM_CONVERSION_TEST_INIT_SHARED(test) \
> __GMEM_CONVERSION_TEST_INIT_SHARED(test, 1)
>
> +/*
> + * Repeats test over nr_pages in a guest_memfd of size nr_pages, providing each
> + * test iteration with test_page, the index of the page under test in
> + * guest_memfd. test_page takes values 0..(nr_pages - 1) inclusive.
> + */
> +#define GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED(test, __nr_pages) \
> +static void __gmem_conversions_multipage_##test(test_data_t *t, int nr_pages, \
> + const int test_page); \
> + \
> +TEST_F(gmem_conversions, test) \
> +{ \
> + const u64 flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED; \
> + int i; \
> + \
> + for (i = 0; i < __nr_pages; ++i) { \
> + gmem_conversions_do_setup(self, __nr_pages, flags); \
> + __gmem_conversions_multipage_##test(self, __nr_pages, i); \
> + gmem_conversions_do_teardown(self); \
> + } \
> +} \
> +static void __gmem_conversions_multipage_##test(test_data_t *t, int nr_pages, \
> + const int test_page)
> +
> struct guest_check_data {
> void *mem;
> char expected_val;
> @@ -199,6 +227,44 @@ GMEM_CONVERSION_TEST_INIT_SHARED(init_shared)
> test_convert_to_shared(t, 0, 'C', 'D', 'E');
> }
>
> +GMEM_CONVERSION_MULTIPAGE_TEST_INIT_SHARED(indexing, 4)
> +{
> + int i;
> +
> + /* Get a char that varies with both i and n. */
> +#define combine(x, n) ((x << 4) + (n))
> +#define i_(n) (combine(i, n))
> +#define t_(n) (combine(test_page, n))
> +
> + /*
> + * Start with the highest index, to catch any errors when, perhaps, the
> + * first page is returned even for the last index.
> + */
> + for (i = nr_pages - 1; i >= 0; --i)
> + test_shared(t, i, 0, i_(0), i_(2));
> +
> + test_convert_to_private(t, test_page, t_(2), t_(3));
> +
> + for (i = 0; i < nr_pages; ++i) {
> + if (i == test_page)
> + test_private(t, test_page, t_(3), t_(4));
> + else
> + test_shared(t, i, i_(2), i_(3), i_(4));
> + }
> +
> + test_convert_to_shared(t, test_page, t_(4), t_(5), t_(6));
> +
> + for (i = 0; i < nr_pages; ++i) {
> + char expected = i == test_page ? t_(6) : i_(4);
> +
> + test_shared(t, i, expected, i_(7), i_(8));
> + }
> +
> +#undef t_
> +#undef i_
> +#undef combine
> +}
> +
> int main(int argc, char *argv[])
> {
> TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
>
> --
> 2.55.0.rc0.738.g0c8ab3ebcc-goog
>
>
^ permalink raw reply
* Re: [PATCH v8 15/46] KVM: guest_memfd: Call arch invalidate hooks on conversion
From: Fuad Tabba @ 2026-06-25 6:48 UTC (permalink / raw)
To: Ackerley Tng
Cc: Sean Christopherson, aik, andrew.jones, binbin.wu, brauner,
chao.p.peng, david, jmattson, jthoughton, michael.roth, oupton,
pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen, Yuanchu Xie,
Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt, Kiryl Shutsemau,
Baoquan He, Jason Gunthorpe, Vlastimil Babka, kvm, linux-kernel,
linux-trace-kernel, linux-doc, linux-kselftest, linux-mm,
linux-coco
In-Reply-To: <CAEvNRgGX3GkazCWM=6y9YLgn=YemXuG==Oo+L58cac1Fd86_TQ@mail.gmail.com>
On Wed, 24 Jun 2026 at 18:46, Ackerley Tng <ackerleytng@google.com> wrote:
>
> Sean Christopherson <seanjc@google.com> writes:
>
> > On Fri, Jun 19, 2026, Fuad Tabba wrote:
> >> On Fri, 19 Jun 2026 at 01:31, Ackerley Tng via B4 Relay
> >> <devnull+ackerleytng.google.com@kernel.org> wrote:
> >> >
> >> > From: Ackerley Tng <ackerleytng@google.com>
> >> >
> >> > When memory in guest_memfd is converted from private to shared, the
> >> > platform-specific state associated with the guest-private pages must be
> >> > invalidated or cleaned up.
> >> >
> >> > Iterate over the folios in the affected range and call the
> >> > kvm_arch_gmem_invalidate() hook for each PFN range. This allows
> >> > architectures to perform necessary teardown, such as updating hardware
> >> > metadata or encryption states, before the pages are transitioned to the
> >> > shared state.
> >> >
> >> > Invoke this helper after indicating to KVM's mmu code that an invalidation
> >> > is in progress to stop in-flight page faults from succeeding.
> >> >
> >> > Reviewed-by: Fuad Tabba <tabba@google.com>
> >> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >>
> >> Coming back to this after working through the arm64/pKVM side. My
> >> Reviewed-by here is from the previous round and the patch hasn't
> >> changed, but I missed an implication for arm64.
> >>
> >> kvm_arch_gmem_invalidate() is now called from two paths with the same
> >> (start, end) signature: folio teardown (kvm_gmem_free_folio) and
> >> private->shared conversion (here). For SNP/TDX that's fine, conversion is
> >> destructive anyway. For pKVM the two need opposite content semantics:
> >> conversion must preserve the page in place (same physical page, the point
> >> of in-place conversion without encryption), while teardown must scrub it
> >> before returning it to the host.
> >>
> >> The hook gets only a pfn range with no indication of which caller it's
> >> serving, so arm64 can't give the two paths the behaviour they need. It
> >> would help to signal intent on the conversion path: a reason/flag, a
> >> separate hook, or not routing non-destructive conversion through the
> >> teardown hook.
> >>
> >> arm64 isn't here yet, so this isn't urgent, but the hook is gaining a
> >> second caller now, and it's cheaper to leave room for the distinction
> >> than to change a generic contract other arches depend on later.
> >
> > Crud. It may not be urgent for arm64, but it's urgent for other reasons that
> > I "can't" describe in detail at the moment, and even if that weren't the case, I
> > think we should clean things up now. More below.
> >
> >> > virt/kvm/guest_memfd.c | 41 +++++++++++++++++++++++++++++++++++++++++
> >> > 1 file changed, 41 insertions(+)
> >> >
> >> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> >> > index 433f79047b9d1..3c94442bc8131 100644
> >> > --- a/virt/kvm/guest_memfd.c
> >> > +++ b/virt/kvm/guest_memfd.c
> >> > @@ -607,6 +607,42 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
> >> > return safe;
> >> > }
> >> >
> >> > +#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> >> > +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
> >
> > Not your fault, but kvm_arch_gmem_invalidate() is badly misnamed. It's not
> > "invalidating" anything, it's much more of a "free" callback, as SNP uses it to
> > put physical pages back into a shared state when a maybe-private folio is freed.
> >
> > As Fuad points out, (ab)using that hook for the private=>shared conversion case
> > "works", but not broadly. And it makes the bad name worse, because it's called
> > from code that _is_ doing true invalidations. For pKVM, it may not even need to
> > do anything invalidation-like.
> >
>
> Thanks, I also didn't like the naming of kvm_gmem_invalidate(),
> especially when conversions also call
> kvm_gmem_invalidate_{start,end}() and those do different things.
>
> > To avoid a conflict with patches that are going to have priority over this series,
> > to set the stage for arm64 support, and to avoid bleeding vendor details
> > into guest_memfd, as if they are core guest_memfd behavior (only SNP needs the
> > "invalidation" on this specific transition), I think we should add an arch hook
> > to do conversions straightaway.
> >
> > Unless there's a clever option I'm missing, it'll mean adding yet another
> > HAVE_KVM_ARCH_GMEM_XXX flag? Hmm, especially because IIUC, arm64/pKVM doesn't
> > need a callback for this case, only the free_folio case.
> >
> >> > +{
> >> > + struct folio_batch fbatch;
> >> > + pgoff_t next = start;
> >> > + int i;
> >> > +
> >> > + folio_batch_init(&fbatch);
> >> > + while (filemap_get_folios(inode->i_mapping, &next, end - 1, &fbatch)) {
> >> > + for (i = 0; i < folio_batch_count(&fbatch); ++i) {
> >> > + struct folio *folio = fbatch.folios[i];
> >> > + pgoff_t start_index, end_index;
> >> > + kvm_pfn_t start_pfn, end_pfn;
> >> > +
> >> > + start_index = max(start, folio->index);
> >> > + end_index = min(end, folio_next_index(folio));
> >> > + /*
> >> > + * end_index is either in folio or points to
> >> > + * the first page of the next folio. Hence,
> >> > + * all pages in range [start_index, end_index)
> >> > + * are contiguous.
> >> > + */
> >> > + start_pfn = folio_file_pfn(folio, start_index);
> >> > + end_pfn = start_pfn + end_index - start_index;
> >> > +
> >> > + kvm_arch_gmem_invalidate(start_pfn, end_pfn);
> >> > + }
> >> > +
> >> > + folio_batch_release(&fbatch);
> >> > + cond_resched();
> >> > + }
> >> > +}
> >> > +#else
> >> > +static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {}
> >> > +#endif
> >> > +
> >> > static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> >> > size_t nr_pages, uint64_t attrs,
> >> > pgoff_t *err_index)
> >> > @@ -647,7 +683,12 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> >> > */
> >> >
> >> > kvm_gmem_invalidate_start(inode, start, end);
> >> > +
> >> > + if (!to_private)
> >> > + kvm_gmem_invalidate(inode, start, end);
> >
> > E.g. instead make this something like this?
> >
> > kvm_gmem_set_pfn_attributes(...)
> >
> > Hrm, though that wastes folio lookups in the to_private case. So maybe just this,
> > assuming pKVM doesn't need to take additional action on conversions?
> >
> > if (!to_private)
> > kvm_gmem_make_shared(...)
> >
> > Actually, if we do that, then we don't need a separate arch hook, just a separate
> > config. It'll still bleed SNP details into guest_memfd, but it'll at least be
> > done in a way that's more explicitly arch specific (and it's no different than
> > what we already do for PREPARE...).
> >
>
> pKVM needs arch guest_memfd lifecycle functions that
>
> + for conversion, do nothing,
> + for teardown, reset page state (IIUC it'll be reset to
> PKVM_PAGE_OWNED (by the host))
>
> So I think we need different functions for those two stages in the
> lifecycle of a page with guest_memfd? What if we have
Yes, the split is what I was after. One PFN-range hook for both
teardown and private->shared conversion can't tell them apart, and for
pKVM the two want opposite content semantics.
Two configs rather than one is right, since the needs are independent.
pKVM wants teardown but not conversion.
>
> CONFIG_HAVE_KVM_ARCH_GMEM_SET_PFN_ATTRIBUTES, which gates
>
> + kvm_gmem_should_set_pfn_attributes(attributes) and
> .gmem_should_set_pfn_attributes
> + kvm_gmem_set_pfn_attributes(start_pfn, end_pfn, attributes) and
> .gmem_set_pfn_attributes
>
> CONFIG_HAVE_KVM_ARCH_GMEM_TEARDOWN, which gates
>
> + kvm_gmem_teardown() and .gmem_teardown
>
> SNP:
>
> + .gmem_should_set_pfn_attributes = sev_gmem_should_set_pfn_attributes,
> and sev_gmem_should_set_pfn_attributes returns !is_private
> + Rename .gmem_invalidate and sev_gmem_invalidate to *set_pfn_attributes
> + .gmem_teardown = sev_gmem_set_pfn_attributes
>
> TDX:
>
> + Disable CONFIG_HAVE_KVM_ARCH_GMEM_SET_PFN_ATTRIBUTES
> + Disable CONFIG_HAVE_KVM_ARCH_GMEM_TEARDOWN
>
> pKVM:
>
> + Disable CONFIG_HAVE_KVM_ARCH_GMEM_SET_PFN_ATTRIBUTES
> + .gmem_teardown = pkvm_gmem_set_pfn_attributes
Right for pKVM:
- teardown is not a no-op: it scrubs the page and resets the host
state to PKVM_PAGE_OWNED before the page returns to the host. Your
"reset to PKVM_PAGE_OWNED" reading is correct.
- the arch conversion hook is a no-op, so disabling SET_PFN_ATTRIBUTES
is correct. Conversions in pKVM are guest-initiated: the
share/unshare hypercall does the stage-2 and page-state transition
at EL2. The host still runs the generic conversion path (safety
check, attribute update) and accepts the conversion, but EL2 has
already done the transition, so there is nothing arch-specific left
for a hook to do. The page is preserved in place (no scrub).
If pKVM does turn out to need a step on conversion, it stays
non-destructive either way, and it can opt in later without touching
a contract others depend on.
Folding the direction check behind .gmem_should_set_pfn_attributes is
a good cleanup: it keeps the !to_private check out of generic gmem.
On naming: gmem_teardown is better. gmem_set_pfn_attributes reads a
bit close to KVM_SET_MEMORY_ATTRIBUTES, but naming is hard. :)
>
> Suzuki, does this work for ARM CCA?
>
> This way,
>
> + The if (is_private) check doesn't leak SNP details into guest_memfd
> + .gmem_make_shared doesn't stick out without a .gmem_make_private
> + .gmem_set_pfn_attributes, .gmem_prepare and .gmem_teardown are aligned
> conceptually as lifecycle hooks
>
> + I think the private/shared check for prepare can also be folded into
> preparation.
> + Preparation perhaps doesn't need a should_prepare equivalent since
> there's no iteration and getting the gfn is just doing some math?
> + In another patch series?
Agreed, separate series.
Thank you Ackerley!
/fuad
>
> > E.g. this? There will still be a looming rename conflict, but that's easy enough
> > to handle.
> >
> > diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
> > index 9ce5be7843f2..8aead0abd788 100644
> > --- virt/kvm/guest_memfd.c
> > +++ virt/kvm/guest_memfd.c
> > @@ -648,8 +648,8 @@ static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
> > return safe;
> > }
> >
> > -#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
> > -static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
> > +#ifdef CONFIG_KVM_ARCH_GMEM_FREE_ON_SHARED_CONVERSION
> > +static void kvm_gmem_make_shared(struct inode *inode, pgoff_t start, pgoff_t end)
> > {
> > struct folio_batch fbatch;
> > pgoff_t next = start;
> > @@ -681,7 +681,7 @@ static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end)
> > }
> > }
> > #else
> > -static void kvm_gmem_invalidate(struct inode *inode, pgoff_t start, pgoff_t end) {}
> > +static void kvm_gmem_make_shared(struct inode *inode, pgoff_t start, pgoff_t end) { }
> > #endif
> >
> > static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> > @@ -729,7 +729,7 @@ static int __kvm_gmem_set_attributes(struct inode *inode, pgoff_t start,
> > kvm_gmem_invalidate_start(inode, start, end);
> >
> > if (!to_private)
> > - kvm_gmem_invalidate(inode, start, end);
> > + kvm_gmem_make_shared(inode, start, end);
> >
> > mas_store_prealloc(&mas, xa_mk_value(attrs));
^ permalink raw reply