* [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
@ 2025-10-16 17:28 Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file" Sean Christopherson
` (13 more replies)
0 siblings, 14 replies; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
Miguel, you got pulled in due to a one-line change to add a new iterator
macros in .clang-format.
Shivank's series to add support for NUMA-aware memory placement in
guest_memfd. Based on kvm-x86/next.
Note, Ackerley pointed out that we should probably have testing for the
cpuset_do_page_mem_spread() behavior. I 100% agree, but I'm also a-ok
merging without those tests.
v13:
- Collect reviews.
- Add kvm_gmem_for_each_file to .clang-format. [Ackerley]
- Fix typos. [Ackerley]
- Don't use get_task_policy() if the guest_memfd doesn't have its own
policy, so that cpuset_do_page_mem_spread() works. [Ackerley]
- Fix goofs in the changelogs related to numaif.h. [Ackerley]
v12:
- https://lore.kernel.org/all/20251007221420.344669-1-seanjc@google.com
- Add missing functionality to KVM selftests' existing numaif.h instead of
linking to libnuma (which appears to have caveats with -static).
- Add KVM_SYSCALL_DEFINE() infrastructure to reduce the boilerplate needed
to wrap syscalls and/or to assert that a syscall succeeds.
- Rename kvm_gmem to gmem_file, and use gmem_inode for the inode structure.
- Track flags in a gmem_inode field instead of using i_private.
- Add comments to call out subtleties in the mempolicy code (e.g. that
returning NULL for vm_operations_struct.get_policy() is important for ABI
reasons).
- Improve debugability of guest_memfd_test (I kept generating SIGBUS when
tweaking the tests).
- Test mbind() with private memory (sadly, verifying placement with
move_pages() doesn't work due to the dependency on valid page tables).
- V11: Rebase on kvm-next, remove RFC tag, use Ackerley's latest patch
and fix a rcu race bug during kvm module unload.
- V10: Rebase on top of Fuad's V17. Use latest guest_memfd inode patch
from Ackerley (with David's review comments). Use newer kmem_cache_create()
API variant with arg parameter (Vlastimil)
- v9: Rebase on top of Fuad's V13 and incorporate review comments
- v8: Rebase on top of Fuad's V12: Host mmaping for guest_memfd memory.
- v7: Use inodes to store NUMA policy instead of file [0].
- v4-v6: Current approach using shared_policy support and vm_ops (based on
suggestions from David [1] and guest_memfd bi-weekly upstream
call discussion [2]).
- v3: Introduced fbind() syscall for VMM memory-placement configuration.
- v1,v2: Extended the KVM_CREATE_GUEST_MEMFD IOCTL to pass mempolicy.
[0] https://lore.kernel.org/all/diqzbjumm167.fsf@ackerleytng-ctop.c.googlers.com
[1] https://lore.kernel.org/all/6fbef654-36e2-4be5-906e-2a648a845278@redhat.com
[2] https://lore.kernel.org/all/2b77e055-98ac-43a1-a7ad-9f9065d7f38f@amd.com
Ackerley Tng (1):
KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes
Sean Christopherson (7):
KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file"
KVM: guest_memfd: Add macro to iterate over gmem_files for a
mapping/inode
KVM: selftests: Define wrappers for common syscalls to assert success
KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE
by default
KVM: selftests: Add additional equivalents to libnuma APIs in KVM's
numaif.h
KVM: selftests: Use proper uAPI headers to pick up mempolicy.h
definitions
KVM: guest_memfd: Add gmem_inode.flags field instead of using
i_private
Shivank Garg (4):
KVM: guest_memfd: Add slab-allocated inode cache
KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
KVM: selftests: Add helpers to probe for NUMA support, and multi-node
systems
KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support
.clang-format | 1 +
include/uapi/linux/magic.h | 1 +
tools/testing/selftests/kvm/arm64/vgic_irq.c | 2 +-
.../testing/selftests/kvm/guest_memfd_test.c | 98 ++++++
.../selftests/kvm/include/kvm_syscalls.h | 81 +++++
.../testing/selftests/kvm/include/kvm_util.h | 29 +-
tools/testing/selftests/kvm/include/numaif.h | 110 +++---
.../selftests/kvm/kvm_binary_stats_test.c | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 55 +--
.../kvm/x86/private_mem_conversions_test.c | 9 +-
.../selftests/kvm/x86/xapic_ipi_test.c | 5 +-
virt/kvm/guest_memfd.c | 331 ++++++++++++++----
virt/kvm/kvm_main.c | 7 +-
virt/kvm/kvm_mm.h | 9 +-
14 files changed, 565 insertions(+), 177 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/kvm_syscalls.h
base-commit: f222788458c8a7753d43befef2769cd282dc008e
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v13 01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file"
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-17 9:31 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode Sean Christopherson
` (12 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
Rename the "kvm_gmem" structure to "gmem_file" in anticipation of using
dedicated guest_memfd inodes instead of anonyomous inodes, at which point
the "kvm_gmem" nomenclature becomes quite misleading. In guest_memfd,
inodes are effectively the raw underlying physical storage, and will be
used to track properties of the physical memory, while each gmem file is
effectively a single VM's view of that storage, and is used to track assets
specific to its associated VM, e.g. memslots=>gmem bindings.
Using "kvm_gmem" suggests that the per-VM/per-file structures are _the_
guest_memfd instance, which almost the exact opposite of reality.
Opportunistically rename local variables from "gmem" to "f", again to
avoid confusion once guest_memfd specific inodes come along.
No functional change intended.
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/guest_memfd.c | 98 +++++++++++++++++++++++-------------------
1 file changed, 53 insertions(+), 45 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 1e4af29159ea..2989c5fe426f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,7 +7,16 @@
#include "kvm_mm.h"
-struct kvm_gmem {
+/*
+ * A guest_memfd instance can be associated multiple VMs, each with its own
+ * "view" of the underlying physical memory.
+ *
+ * The gmem's inode is effectively the raw underlying physical storage, and is
+ * used to track properties of the physical memory, while each gmem file is
+ * effectively a single VM's view of that storage, and is used to track assets
+ * specific to its associated VM, e.g. memslots=>gmem bindings.
+ */
+struct gmem_file {
struct kvm *kvm;
struct xarray bindings;
struct list_head entry;
@@ -115,16 +124,16 @@ static enum kvm_gfn_range_filter kvm_gmem_get_invalidate_filter(struct inode *in
return KVM_FILTER_PRIVATE;
}
-static void __kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
+static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
pgoff_t end,
enum kvm_gfn_range_filter attr_filter)
{
bool flush = false, found_memslot = false;
struct kvm_memory_slot *slot;
- struct kvm *kvm = gmem->kvm;
+ struct kvm *kvm = f->kvm;
unsigned long index;
- xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
+ xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
pgoff_t pgoff = slot->gmem.pgoff;
struct kvm_gfn_range gfn_range = {
@@ -157,20 +166,20 @@ static void kvm_gmem_invalidate_begin(struct inode *inode, pgoff_t start,
{
struct list_head *gmem_list = &inode->i_mapping->i_private_list;
enum kvm_gfn_range_filter attr_filter;
- struct kvm_gmem *gmem;
+ struct gmem_file *f;
attr_filter = kvm_gmem_get_invalidate_filter(inode);
- list_for_each_entry(gmem, gmem_list, entry)
- __kvm_gmem_invalidate_begin(gmem, start, end, attr_filter);
+ list_for_each_entry(f, gmem_list, entry)
+ __kvm_gmem_invalidate_begin(f, start, end, attr_filter);
}
-static void __kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
+static void __kvm_gmem_invalidate_end(struct gmem_file *f, pgoff_t start,
pgoff_t end)
{
- struct kvm *kvm = gmem->kvm;
+ struct kvm *kvm = f->kvm;
- if (xa_find(&gmem->bindings, &start, end - 1, XA_PRESENT)) {
+ if (xa_find(&f->bindings, &start, end - 1, XA_PRESENT)) {
KVM_MMU_LOCK(kvm);
kvm_mmu_invalidate_end(kvm);
KVM_MMU_UNLOCK(kvm);
@@ -181,10 +190,10 @@ static void kvm_gmem_invalidate_end(struct inode *inode, pgoff_t start,
pgoff_t end)
{
struct list_head *gmem_list = &inode->i_mapping->i_private_list;
- struct kvm_gmem *gmem;
+ struct gmem_file *f;
- list_for_each_entry(gmem, gmem_list, entry)
- __kvm_gmem_invalidate_end(gmem, start, end);
+ list_for_each_entry(f, gmem_list, entry)
+ __kvm_gmem_invalidate_end(f, start, end);
}
static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
@@ -282,9 +291,9 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
static int kvm_gmem_release(struct inode *inode, struct file *file)
{
- struct kvm_gmem *gmem = file->private_data;
+ struct gmem_file *f = file->private_data;
struct kvm_memory_slot *slot;
- struct kvm *kvm = gmem->kvm;
+ struct kvm *kvm = f->kvm;
unsigned long index;
/*
@@ -304,7 +313,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
filemap_invalidate_lock(inode->i_mapping);
- xa_for_each(&gmem->bindings, index, slot)
+ xa_for_each(&f->bindings, index, slot)
WRITE_ONCE(slot->gmem.file, NULL);
/*
@@ -312,18 +321,18 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
* Zap all SPTEs pointed at by this file. Do not free the backing
* memory, as its lifetime is associated with the inode, not the file.
*/
- __kvm_gmem_invalidate_begin(gmem, 0, -1ul,
+ __kvm_gmem_invalidate_begin(f, 0, -1ul,
kvm_gmem_get_invalidate_filter(inode));
- __kvm_gmem_invalidate_end(gmem, 0, -1ul);
+ __kvm_gmem_invalidate_end(f, 0, -1ul);
- list_del(&gmem->entry);
+ list_del(&f->entry);
filemap_invalidate_unlock(inode->i_mapping);
mutex_unlock(&kvm->slots_lock);
- xa_destroy(&gmem->bindings);
- kfree(gmem);
+ xa_destroy(&f->bindings);
+ kfree(f);
kvm_put_kvm(kvm);
@@ -491,7 +500,7 @@ bool __weak kvm_arch_supports_gmem_init_shared(struct kvm *kvm)
static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
{
const char *anon_name = "[kvm-gmem]";
- struct kvm_gmem *gmem;
+ struct gmem_file *f;
struct inode *inode;
struct file *file;
int fd, err;
@@ -500,14 +509,13 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
if (fd < 0)
return fd;
- gmem = kzalloc(sizeof(*gmem), GFP_KERNEL);
- if (!gmem) {
+ f = kzalloc(sizeof(*f), GFP_KERNEL);
+ if (!f) {
err = -ENOMEM;
goto err_fd;
}
- file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem,
- O_RDWR, NULL);
+ file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, f, O_RDWR, NULL);
if (IS_ERR(file)) {
err = PTR_ERR(file);
goto err_gmem;
@@ -529,15 +537,15 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
kvm_get_kvm(kvm);
- gmem->kvm = kvm;
- xa_init(&gmem->bindings);
- list_add(&gmem->entry, &inode->i_mapping->i_private_list);
+ f->kvm = kvm;
+ xa_init(&f->bindings);
+ list_add(&f->entry, &inode->i_mapping->i_private_list);
fd_install(fd, file);
return fd;
err_gmem:
- kfree(gmem);
+ kfree(f);
err_fd:
put_unused_fd(fd);
return err;
@@ -562,7 +570,7 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
{
loff_t size = slot->npages << PAGE_SHIFT;
unsigned long start, end;
- struct kvm_gmem *gmem;
+ struct gmem_file *f;
struct inode *inode;
struct file *file;
int r = -EINVAL;
@@ -576,8 +584,8 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
if (file->f_op != &kvm_gmem_fops)
goto err;
- gmem = file->private_data;
- if (gmem->kvm != kvm)
+ f = file->private_data;
+ if (f->kvm != kvm)
goto err;
inode = file_inode(file);
@@ -591,8 +599,8 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
start = offset >> PAGE_SHIFT;
end = start + slot->npages;
- if (!xa_empty(&gmem->bindings) &&
- xa_find(&gmem->bindings, &start, end - 1, XA_PRESENT)) {
+ if (!xa_empty(&f->bindings) &&
+ xa_find(&f->bindings, &start, end - 1, XA_PRESENT)) {
filemap_invalidate_unlock(inode->i_mapping);
goto err;
}
@@ -607,7 +615,7 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
if (kvm_gmem_supports_mmap(inode))
slot->flags |= KVM_MEMSLOT_GMEM_ONLY;
- xa_store_range(&gmem->bindings, start, end - 1, slot, GFP_KERNEL);
+ xa_store_range(&f->bindings, start, end - 1, slot, GFP_KERNEL);
filemap_invalidate_unlock(inode->i_mapping);
/*
@@ -625,7 +633,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
{
unsigned long start = slot->gmem.pgoff;
unsigned long end = start + slot->npages;
- struct kvm_gmem *gmem;
+ struct gmem_file *f;
struct file *file;
/*
@@ -636,10 +644,10 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
if (!file)
return;
- gmem = file->private_data;
+ f = file->private_data;
filemap_invalidate_lock(file->f_mapping);
- xa_store_range(&gmem->bindings, start, end - 1, NULL, GFP_KERNEL);
+ xa_store_range(&f->bindings, start, end - 1, NULL, GFP_KERNEL);
/*
* synchronize_srcu(&kvm->srcu) ensured that kvm_gmem_get_pfn()
@@ -657,17 +665,17 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
pgoff_t index, kvm_pfn_t *pfn,
bool *is_prepared, int *max_order)
{
- struct file *gmem_file = READ_ONCE(slot->gmem.file);
- struct kvm_gmem *gmem = file->private_data;
+ struct file *slot_file = READ_ONCE(slot->gmem.file);
+ struct gmem_file *f = file->private_data;
struct folio *folio;
- if (file != gmem_file) {
- WARN_ON_ONCE(gmem_file);
+ if (file != slot_file) {
+ WARN_ON_ONCE(slot_file);
return ERR_PTR(-EFAULT);
}
- if (xa_load(&gmem->bindings, index) != slot) {
- WARN_ON_ONCE(xa_load(&gmem->bindings, index));
+ if (xa_load(&f->bindings, index) != slot) {
+ WARN_ON_ONCE(xa_load(&f->bindings, index));
return ERR_PTR(-EIO);
}
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file" Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-17 9:34 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 03/12] KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes Sean Christopherson
` (11 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
Add a kvm_gmem_for_each_file() to make it more obvious that KVM is
iterating over guest_memfd _files_, not guest_memfd instances, as could
be assumed given the name "gmem_list".
No functional change intended.
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.clang-format | 1 +
virt/kvm/guest_memfd.c | 9 +++++----
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/.clang-format b/.clang-format
index f371a13b4d19..220cda77a8f1 100644
--- a/.clang-format
+++ b/.clang-format
@@ -540,6 +540,7 @@ ForEachMacros:
- 'kvm_for_each_memslot'
- 'kvm_for_each_memslot_in_gfn_range'
- 'kvm_for_each_vcpu'
+ - 'kvm_gmem_for_each_file'
- 'libbpf_nla_for_each_attr'
- 'list_for_each'
- 'list_for_each_codec'
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 2989c5fe426f..5cce20ff418d 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -22,6 +22,9 @@ struct gmem_file {
struct list_head entry;
};
+#define kvm_gmem_for_each_file(f, mapping) \
+ list_for_each_entry(f, &(mapping)->i_private_list, entry)
+
/**
* folio_file_pfn - like folio_file_page, but return a pfn.
* @folio: The folio which contains this index.
@@ -164,13 +167,12 @@ static void __kvm_gmem_invalidate_begin(struct gmem_file *f, pgoff_t start,
static void kvm_gmem_invalidate_begin(struct inode *inode, pgoff_t start,
pgoff_t end)
{
- struct list_head *gmem_list = &inode->i_mapping->i_private_list;
enum kvm_gfn_range_filter attr_filter;
struct gmem_file *f;
attr_filter = kvm_gmem_get_invalidate_filter(inode);
- list_for_each_entry(f, gmem_list, entry)
+ kvm_gmem_for_each_file(f, inode->i_mapping)
__kvm_gmem_invalidate_begin(f, start, end, attr_filter);
}
@@ -189,10 +191,9 @@ static void __kvm_gmem_invalidate_end(struct gmem_file *f, pgoff_t start,
static void kvm_gmem_invalidate_end(struct inode *inode, pgoff_t start,
pgoff_t end)
{
- struct list_head *gmem_list = &inode->i_mapping->i_private_list;
struct gmem_file *f;
- list_for_each_entry(f, gmem_list, entry)
+ kvm_gmem_for_each_file(f, inode->i_mapping)
__kvm_gmem_invalidate_end(f, start, end);
}
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 03/12] KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file" Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode cache Sean Christopherson
` (10 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
From: Ackerley Tng <ackerleytng@google.com>
guest_memfd's inode represents memory the guest_memfd is
providing. guest_memfd's file represents a struct kvm's view of that
memory.
Using a custom inode allows customization of the inode teardown
process via callbacks. For example, ->evict_inode() allows
customization of the truncation process on file close, and
->destroy_inode() and ->free_inode() allow customization of the inode
freeing process.
Customizing the truncation process allows flexibility in management of
guest_memfd memory and customization of the inode freeing process
allows proper cleanup of memory metadata stored on the inode.
Memory metadata is more appropriately stored on the inode (as opposed
to the file), since the metadata is for the memory and is not unique
to a specific binding and struct kvm.
Acked-by: David Hildenbrand <david@redhat.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
[sean: drop helpers, open code logic in __kvm_gmem_create()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/uapi/linux/magic.h | 1 +
virt/kvm/guest_memfd.c | 82 +++++++++++++++++++++++++++++++-------
virt/kvm/kvm_main.c | 7 +++-
virt/kvm/kvm_mm.h | 9 +++--
4 files changed, 80 insertions(+), 19 deletions(-)
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index bb575f3ab45e..638ca21b7a90 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -103,5 +103,6 @@
#define DEVMEM_MAGIC 0x454d444d /* "DMEM" */
#define SECRETMEM_MAGIC 0x5345434d /* "SECM" */
#define PID_FS_MAGIC 0x50494446 /* "PIDF" */
+#define GUEST_MEMFD_MAGIC 0x474d454d /* "GMEM" */
#endif /* __LINUX_MAGIC_H__ */
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 5cce20ff418d..ce04fc85e631 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -1,12 +1,16 @@
// SPDX-License-Identifier: GPL-2.0
+#include <linux/anon_inodes.h>
#include <linux/backing-dev.h>
#include <linux/falloc.h>
+#include <linux/fs.h>
#include <linux/kvm_host.h>
+#include <linux/pseudo_fs.h>
#include <linux/pagemap.h>
-#include <linux/anon_inodes.h>
#include "kvm_mm.h"
+static struct vfsmount *kvm_gmem_mnt;
+
/*
* A guest_memfd instance can be associated multiple VMs, each with its own
* "view" of the underlying physical memory.
@@ -424,11 +428,6 @@ static struct file_operations kvm_gmem_fops = {
.fallocate = kvm_gmem_fallocate,
};
-void kvm_gmem_init(struct module *module)
-{
- kvm_gmem_fops.owner = module;
-}
-
static int kvm_gmem_migrate_folio(struct address_space *mapping,
struct folio *dst, struct folio *src,
enum migrate_mode mode)
@@ -500,7 +499,7 @@ bool __weak kvm_arch_supports_gmem_init_shared(struct kvm *kvm)
static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
{
- const char *anon_name = "[kvm-gmem]";
+ static const char *name = "[kvm-gmem]";
struct gmem_file *f;
struct inode *inode;
struct file *file;
@@ -516,16 +515,17 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
goto err_fd;
}
- file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, f, O_RDWR, NULL);
- if (IS_ERR(file)) {
- err = PTR_ERR(file);
+ /* __fput() will take care of fops_put(). */
+ if (!fops_get(&kvm_gmem_fops)) {
+ err = -ENOENT;
goto err_gmem;
}
- file->f_flags |= O_LARGEFILE;
-
- inode = file->f_inode;
- WARN_ON(file->f_mapping != inode->i_mapping);
+ inode = anon_inode_make_secure_inode(kvm_gmem_mnt->mnt_sb, name, NULL);
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
+ goto err_fops;
+ }
inode->i_private = (void *)(unsigned long)flags;
inode->i_op = &kvm_gmem_iops;
@@ -537,6 +537,15 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
/* Unmovable mappings are supposed to be marked unevictable as well. */
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
+ file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ goto err_inode;
+ }
+
+ file->f_flags |= O_LARGEFILE;
+ file->private_data = f;
+
kvm_get_kvm(kvm);
f->kvm = kvm;
xa_init(&f->bindings);
@@ -545,6 +554,10 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
fd_install(fd, file);
return fd;
+err_inode:
+ iput(inode);
+err_fops:
+ fops_put(&kvm_gmem_fops);
err_gmem:
kfree(f);
err_fd:
@@ -816,3 +829,44 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_populate);
#endif
+
+static int kvm_gmem_init_fs_context(struct fs_context *fc)
+{
+ if (!init_pseudo(fc, GUEST_MEMFD_MAGIC))
+ return -ENOMEM;
+
+ fc->s_iflags |= SB_I_NOEXEC;
+ fc->s_iflags |= SB_I_NODEV;
+
+ return 0;
+}
+
+static struct file_system_type kvm_gmem_fs = {
+ .name = "guest_memfd",
+ .init_fs_context = kvm_gmem_init_fs_context,
+ .kill_sb = kill_anon_super,
+};
+
+static int kvm_gmem_init_mount(void)
+{
+ kvm_gmem_mnt = kern_mount(&kvm_gmem_fs);
+
+ if (IS_ERR(kvm_gmem_mnt))
+ return PTR_ERR(kvm_gmem_mnt);
+
+ kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
+ return 0;
+}
+
+int kvm_gmem_init(struct module *module)
+{
+ kvm_gmem_fops.owner = module;
+
+ return kvm_gmem_init_mount();
+}
+
+void kvm_gmem_exit(void)
+{
+ kern_unmount(kvm_gmem_mnt);
+ kvm_gmem_mnt = NULL;
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b7a0ae2a7b20..4845e5739436 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6517,7 +6517,9 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
if (WARN_ON_ONCE(r))
goto err_vfio;
- kvm_gmem_init(module);
+ r = kvm_gmem_init(module);
+ if (r)
+ goto err_gmem;
r = kvm_init_virtualization();
if (r)
@@ -6538,6 +6540,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
err_register:
kvm_uninit_virtualization();
err_virt:
+ kvm_gmem_exit();
+err_gmem:
kvm_vfio_ops_exit();
err_vfio:
kvm_async_pf_deinit();
@@ -6569,6 +6573,7 @@ void kvm_exit(void)
for_each_possible_cpu(cpu)
free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
kmem_cache_destroy(kvm_vcpu_cache);
+ kvm_gmem_exit();
kvm_vfio_ops_exit();
kvm_async_pf_deinit();
kvm_irqfd_exit();
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 31defb08ccba..9fcc5d5b7f8d 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -68,17 +68,18 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
#endif /* HAVE_KVM_PFNCACHE */
#ifdef CONFIG_KVM_GUEST_MEMFD
-void kvm_gmem_init(struct module *module);
+int kvm_gmem_init(struct module *module);
+void kvm_gmem_exit(void);
int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned int fd, loff_t offset);
void kvm_gmem_unbind(struct kvm_memory_slot *slot);
#else
-static inline void kvm_gmem_init(struct module *module)
+static inline int kvm_gmem_init(struct module *module)
{
-
+ return 0;
}
-
+static inline void kvm_gmem_exit(void) {};
static inline int kvm_gmem_bind(struct kvm *kvm,
struct kvm_memory_slot *slot,
unsigned int fd, loff_t offset)
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode cache
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (2 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 03/12] KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-27 11:06 ` Vlastimil Babka
2025-10-16 17:28 ` [PATCH v13 05/12] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy Sean Christopherson
` (9 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
From: Shivank Garg <shivankg@amd.com>
Add a dedicated gmem_inode structure and a slab-allocated inode cache for
guest memory backing, similar to how shmem handles inodes.
This adds the necessary allocation/destruction functions and prepares
for upcoming guest_memfd NUMA policy support changes. Using a dedicated
structure will also allow for additional cleanups, e.g. to track flags in
gmem_inode instead of i_private.
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
[sean: s/kvm_gmem_inode_info/gmem_inode, name init_once()]
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/guest_memfd.c | 77 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 76 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index ce04fc85e631..88fd812f0f31 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -26,6 +26,15 @@ struct gmem_file {
struct list_head entry;
};
+struct gmem_inode {
+ struct inode vfs_inode;
+};
+
+static __always_inline struct gmem_inode *GMEM_I(struct inode *inode)
+{
+ return container_of(inode, struct gmem_inode, vfs_inode);
+}
+
#define kvm_gmem_for_each_file(f, mapping) \
list_for_each_entry(f, &(mapping)->i_private_list, entry)
@@ -830,13 +839,61 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_gmem_populate);
#endif
+static struct kmem_cache *kvm_gmem_inode_cachep;
+
+static void kvm_gmem_init_inode_once(void *__gi)
+{
+ struct gmem_inode *gi = __gi;
+
+ /*
+ * Note! Don't initialize the inode with anything specific to the
+ * guest_memfd instance, or that might be specific to how the inode is
+ * used (from the VFS-layer's perspective). This hook is called only
+ * during the initial slab allocation, i.e. only fields/state that are
+ * idempotent across _all_ use of the inode _object_ can be initialized
+ * at this time!
+ */
+ inode_init_once(&gi->vfs_inode);
+}
+
+static struct inode *kvm_gmem_alloc_inode(struct super_block *sb)
+{
+ struct gmem_inode *gi;
+
+ gi = alloc_inode_sb(sb, kvm_gmem_inode_cachep, GFP_KERNEL);
+ if (!gi)
+ return NULL;
+
+ return &gi->vfs_inode;
+}
+
+static void kvm_gmem_destroy_inode(struct inode *inode)
+{
+}
+
+static void kvm_gmem_free_inode(struct inode *inode)
+{
+ kmem_cache_free(kvm_gmem_inode_cachep, GMEM_I(inode));
+}
+
+static const struct super_operations kvm_gmem_super_operations = {
+ .statfs = simple_statfs,
+ .alloc_inode = kvm_gmem_alloc_inode,
+ .destroy_inode = kvm_gmem_destroy_inode,
+ .free_inode = kvm_gmem_free_inode,
+};
+
static int kvm_gmem_init_fs_context(struct fs_context *fc)
{
+ struct pseudo_fs_context *ctx;
+
if (!init_pseudo(fc, GUEST_MEMFD_MAGIC))
return -ENOMEM;
fc->s_iflags |= SB_I_NOEXEC;
fc->s_iflags |= SB_I_NODEV;
+ ctx = fc->fs_private;
+ ctx->ops = &kvm_gmem_super_operations;
return 0;
}
@@ -860,13 +917,31 @@ static int kvm_gmem_init_mount(void)
int kvm_gmem_init(struct module *module)
{
+ struct kmem_cache_args args = {
+ .align = 0,
+ .ctor = kvm_gmem_init_inode_once,
+ };
+ int ret;
+
kvm_gmem_fops.owner = module;
+ kvm_gmem_inode_cachep = kmem_cache_create("kvm_gmem_inode_cache",
+ sizeof(struct gmem_inode),
+ &args, SLAB_ACCOUNT);
+ if (!kvm_gmem_inode_cachep)
+ return -ENOMEM;
- return kvm_gmem_init_mount();
+ ret = kvm_gmem_init_mount();
+ if (ret) {
+ kmem_cache_destroy(kvm_gmem_inode_cachep);
+ return ret;
+ }
+ return 0;
}
void kvm_gmem_exit(void)
{
kern_unmount(kvm_gmem_mnt);
kvm_gmem_mnt = NULL;
+ rcu_barrier();
+ kmem_cache_destroy(kvm_gmem_inode_cachep);
}
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 05/12] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (3 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode cache Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 06/12] KVM: selftests: Define wrappers for common syscalls to assert success Sean Christopherson
` (8 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
From: Shivank Garg <shivankg@amd.com>
Previously, guest-memfd allocations followed local NUMA node id in absence
of process mempolicy, resulting in arbitrary memory allocation.
Moreover, mbind() couldn't be used by the VMM as guest memory wasn't
mapped into userspace when allocation occurred.
Enable NUMA policy support by implementing vm_ops for guest-memfd mmap
operation. This allows the VMM to use mmap()+mbind() to set the desired
NUMA policy for a range of memory, and provides fine-grained control over
guest memory allocation across NUMA nodes.
Note, using mmap()+mbind() works even for PRIVATE memory, as mbind()
doesn't require the memory to be faulted in. However, get_mempolicy()
and other paths that require the userspace page tables to be populated
may return incorrect information for PRIVATE memory (though under the hood,
KVM+guest_memfd will still behave correctly).
Store the policy in the inode structure, gmem_inode, as a shared memory
policy, so that the policy is a property of the physical memory itself,
i.e. not bound to the VMA. In guest_memfd, KVM is the primary MMU and any
VMAs are secondary, i.e. using mbind() on a VMA to set policy is a means
to an end, e.g. to avoid having to add a file-based equivalent to mbind().
Similarly, retrieve the policy via mpol_shared_policy_lookup(), not
get_vma_policy(), even when allocating to fault in memory for userspace
mappings, so that the policy stored in gmem_inode is always the source of
true.
Apply policy changes only to future allocations, i.e. do not migrate
existing memory in the guest_memfd instance. This matches mbind(2)'s
default behavior, which affects only new allocations unless overridden
with MPOL_MF_MOVE/MPOL_MF_MOVE_ALL flags (which are not supported by
guest_memfd as guest_memfd memory is unmovable).
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Link: https://lore.kernel.org/all/e9d43abc-bcdb-4f9f-9ad7-5644f714de19@amd.com
[sean: fold in fixup (see Link above), massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/guest_memfd.c | 58 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 56 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 88fd812f0f31..4463643bd0a2 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
#include <linux/falloc.h>
#include <linux/fs.h>
#include <linux/kvm_host.h>
+#include <linux/mempolicy.h>
#include <linux/pseudo_fs.h>
#include <linux/pagemap.h>
@@ -27,6 +28,7 @@ struct gmem_file {
};
struct gmem_inode {
+ struct shared_policy policy;
struct inode vfs_inode;
};
@@ -129,7 +131,25 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
{
/* TODO: Support huge pages. */
- return filemap_grab_folio(inode->i_mapping, index);
+ struct mempolicy *policy;
+ struct folio *folio;
+
+ /*
+ * Fast-path: See if folio is already present in mapping to avoid
+ * policy_lookup.
+ */
+ folio = __filemap_get_folio(inode->i_mapping, index,
+ FGP_LOCK | FGP_ACCESSED, 0);
+ if (!IS_ERR(folio))
+ return folio;
+
+ policy = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, index);
+ folio = __filemap_get_folio_mpol(inode->i_mapping, index,
+ FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
+ mapping_gfp_mask(inode->i_mapping), policy);
+ mpol_cond_put(policy);
+
+ return folio;
}
static enum kvm_gfn_range_filter kvm_gmem_get_invalidate_filter(struct inode *inode)
@@ -411,8 +431,40 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
return ret;
}
+#ifdef CONFIG_NUMA
+static int kvm_gmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
+{
+ struct inode *inode = file_inode(vma->vm_file);
+
+ return mpol_set_shared_policy(&GMEM_I(inode)->policy, vma, mpol);
+}
+
+static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
+ unsigned long addr, pgoff_t *pgoff)
+{
+ struct inode *inode = file_inode(vma->vm_file);
+
+ *pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+
+ /*
+ * Return the memory policy for this index, or NULL if none is set.
+ *
+ * Returning NULL, e.g. instead of the current task's memory policy, is
+ * important for the .get_policy kernel ABI: it indicates that no
+ * explicit policy has been set via mbind() for this memory. The caller
+ * can then replace NULL with the default memory policy instead of the
+ * current task's memory policy.
+ */
+ return mpol_shared_policy_lookup(&GMEM_I(inode)->policy, *pgoff);
+}
+#endif /* CONFIG_NUMA */
+
static const struct vm_operations_struct kvm_gmem_vm_ops = {
- .fault = kvm_gmem_fault_user_mapping,
+ .fault = kvm_gmem_fault_user_mapping,
+#ifdef CONFIG_NUMA
+ .get_policy = kvm_gmem_get_policy,
+ .set_policy = kvm_gmem_set_policy,
+#endif
};
static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
@@ -864,11 +916,13 @@ static struct inode *kvm_gmem_alloc_inode(struct super_block *sb)
if (!gi)
return NULL;
+ mpol_shared_policy_init(&gi->policy, NULL);
return &gi->vfs_inode;
}
static void kvm_gmem_destroy_inode(struct inode *inode)
{
+ mpol_free_shared_policy(&GMEM_I(inode)->policy);
}
static void kvm_gmem_free_inode(struct inode *inode)
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 06/12] KVM: selftests: Define wrappers for common syscalls to assert success
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (4 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 05/12] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-17 9:51 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default Sean Christopherson
` (7 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
Add kvm_<sycall> wrappers for munmap(), close(), fallocate(), and
ftruncate() to cut down on boilerplate code when a sycall is expected
to succeed, and to make it easier for developers to remember to assert
success.
Implement and use a macro framework similar to the kernel's SYSCALL_DEFINE
infrastructure to further cut down on boilerplate code, and to drastically
reduce the probability of typos as the kernel's syscall definitions can be
copy+paste almost verbatim.
Provide macros to build the raw <sycall>() wrappers as well, e.g. to
replace hand-coded wrappers (NUMA) or pure open-coded calls.
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/arm64/vgic_irq.c | 2 +-
.../selftests/kvm/include/kvm_syscalls.h | 81 +++++++++++++++++++
.../testing/selftests/kvm/include/kvm_util.h | 29 +------
.../selftests/kvm/kvm_binary_stats_test.c | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 31 ++-----
.../kvm/x86/private_mem_conversions_test.c | 9 +--
6 files changed, 96 insertions(+), 60 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/kvm_syscalls.h
diff --git a/tools/testing/selftests/kvm/arm64/vgic_irq.c b/tools/testing/selftests/kvm/arm64/vgic_irq.c
index 6338f5bbdb70..8d7758f12280 100644
--- a/tools/testing/selftests/kvm/arm64/vgic_irq.c
+++ b/tools/testing/selftests/kvm/arm64/vgic_irq.c
@@ -636,7 +636,7 @@ static void kvm_routing_and_irqfd_check(struct kvm_vm *vm,
}
for (f = 0, i = intid; i < (uint64_t)intid + num; i++, f++)
- close(fd[f]);
+ kvm_close(fd[f]);
}
/* handles the valid case: intid=0xffffffff num=1 */
diff --git a/tools/testing/selftests/kvm/include/kvm_syscalls.h b/tools/testing/selftests/kvm/include/kvm_syscalls.h
new file mode 100644
index 000000000000..d4e613162bba
--- /dev/null
+++ b/tools/testing/selftests/kvm/include/kvm_syscalls.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTEST_KVM_SYSCALLS_H
+#define SELFTEST_KVM_SYSCALLS_H
+
+#include <sys/syscall.h>
+
+#define MAP_ARGS0(m,...)
+#define MAP_ARGS1(m,t,a,...) m(t,a)
+#define MAP_ARGS2(m,t,a,...) m(t,a), MAP_ARGS1(m,__VA_ARGS__)
+#define MAP_ARGS3(m,t,a,...) m(t,a), MAP_ARGS2(m,__VA_ARGS__)
+#define MAP_ARGS4(m,t,a,...) m(t,a), MAP_ARGS3(m,__VA_ARGS__)
+#define MAP_ARGS5(m,t,a,...) m(t,a), MAP_ARGS4(m,__VA_ARGS__)
+#define MAP_ARGS6(m,t,a,...) m(t,a), MAP_ARGS5(m,__VA_ARGS__)
+#define MAP_ARGS(n,...) MAP_ARGS##n(__VA_ARGS__)
+
+#define __DECLARE_ARGS(t, a) t a
+#define __UNPACK_ARGS(t, a) a
+
+#define DECLARE_ARGS(nr_args, args...) MAP_ARGS(nr_args, __DECLARE_ARGS, args)
+#define UNPACK_ARGS(nr_args, args...) MAP_ARGS(nr_args, __UNPACK_ARGS, args)
+
+#define __KVM_SYSCALL_ERROR(_name, _ret) \
+ "%s failed, rc: %i errno: %i (%s)", (_name), (_ret), errno, strerror(errno)
+
+/* Define a kvm_<syscall>() API to assert success. */
+#define __KVM_SYSCALL_DEFINE(name, nr_args, args...) \
+static inline void kvm_##name(DECLARE_ARGS(nr_args, args)) \
+{ \
+ int r; \
+ \
+ r = name(UNPACK_ARGS(nr_args, args)); \
+ TEST_ASSERT(!r, __KVM_SYSCALL_ERROR(#name, r)); \
+}
+
+/*
+ * Macro to define syscall APIs, either because KVM selftests doesn't link to
+ * the standard library, e.g. libnuma, or because there is no library that yet
+ * provides the syscall. These
+ */
+#define KVM_SYSCALL_DEFINE(name, nr_args, args...) \
+static inline long name(DECLARE_ARGS(nr_args, args)) \
+{ \
+ return syscall(__NR_##name, UNPACK_ARGS(nr_args, args)); \
+} \
+__KVM_SYSCALL_DEFINE(name, nr_args, args)
+
+/*
+ * Special case mmap(), as KVM selftest rarely/never specific an address,
+ * rarely specify an offset, and because the unique return code requires
+ * special handling anyways.
+ */
+static inline void *__kvm_mmap(size_t size, int prot, int flags, int fd,
+ off_t offset)
+{
+ void *mem;
+
+ mem = mmap(NULL, size, prot, flags, fd, offset);
+ TEST_ASSERT(mem != MAP_FAILED, __KVM_SYSCALL_ERROR("mmap()",
+ (int)(unsigned long)MAP_FAILED));
+ return mem;
+}
+
+static inline void *kvm_mmap(size_t size, int prot, int flags, int fd)
+{
+ return __kvm_mmap(size, prot, flags, fd, 0);
+}
+
+static inline int kvm_dup(int fd)
+{
+ int new_fd = dup(fd);
+
+ TEST_ASSERT(new_fd >= 0, __KVM_SYSCALL_ERROR("dup()", new_fd));
+ return new_fd;
+}
+
+__KVM_SYSCALL_DEFINE(munmap, 2, void *, mem, size_t, size);
+__KVM_SYSCALL_DEFINE(close, 1, int, fd);
+__KVM_SYSCALL_DEFINE(fallocate, 4, int, fd, int, mode, loff_t, offset, loff_t, len);
+__KVM_SYSCALL_DEFINE(ftruncate, 2, unsigned int, fd, off_t, length);
+
+#endif /* SELFTEST_KVM_SYSCALLS_H */
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index ee60dbf5208a..c610169933ef 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -23,6 +23,7 @@
#include <pthread.h>
+#include "kvm_syscalls.h"
#include "kvm_util_arch.h"
#include "kvm_util_types.h"
#include "sparsebit.h"
@@ -283,34 +284,6 @@ static inline bool kvm_has_cap(long cap)
return kvm_check_cap(cap);
}
-#define __KVM_SYSCALL_ERROR(_name, _ret) \
- "%s failed, rc: %i errno: %i (%s)", (_name), (_ret), errno, strerror(errno)
-
-static inline void *__kvm_mmap(size_t size, int prot, int flags, int fd,
- off_t offset)
-{
- void *mem;
-
- mem = mmap(NULL, size, prot, flags, fd, offset);
- TEST_ASSERT(mem != MAP_FAILED, __KVM_SYSCALL_ERROR("mmap()",
- (int)(unsigned long)MAP_FAILED));
-
- return mem;
-}
-
-static inline void *kvm_mmap(size_t size, int prot, int flags, int fd)
-{
- return __kvm_mmap(size, prot, flags, fd, 0);
-}
-
-static inline void kvm_munmap(void *mem, size_t size)
-{
- int ret;
-
- ret = munmap(mem, size);
- TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
-}
-
/*
* Use the "inner", double-underscore macro when reporting errors from within
* other macros so that the name of ioctl() and not its literal numeric value
diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index f02355c3c4c2..b7dbde9c0843 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -239,14 +239,14 @@ int main(int argc, char *argv[])
* single stats file works and doesn't cause explosions.
*/
vm_stats_fds = vm_get_stats_fd(vms[i]);
- stats_test(dup(vm_stats_fds));
+ stats_test(kvm_dup(vm_stats_fds));
/* Verify userspace can instantiate multiple stats files. */
stats_test(vm_get_stats_fd(vms[i]));
for (j = 0; j < max_vcpu; ++j) {
vcpu_stats_fds[j] = vcpu_get_stats_fd(vcpus[i * max_vcpu + j]);
- stats_test(dup(vcpu_stats_fds[j]));
+ stats_test(kvm_dup(vcpu_stats_fds[j]));
stats_test(vcpu_get_stats_fd(vcpus[i * max_vcpu + j]));
}
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 83a721be7ec5..8b60b767224b 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -704,8 +704,6 @@ userspace_mem_region_find(struct kvm_vm *vm, uint64_t start, uint64_t end)
static void kvm_stats_release(struct kvm_binary_stats *stats)
{
- int ret;
-
if (stats->fd < 0)
return;
@@ -714,8 +712,7 @@ static void kvm_stats_release(struct kvm_binary_stats *stats)
stats->desc = NULL;
}
- ret = close(stats->fd);
- TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("close()", ret));
+ kvm_close(stats->fd);
stats->fd = -1;
}
@@ -738,8 +735,6 @@ __weak void vcpu_arch_free(struct kvm_vcpu *vcpu)
*/
static void vm_vcpu_rm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
{
- int ret;
-
if (vcpu->dirty_gfns) {
kvm_munmap(vcpu->dirty_gfns, vm->dirty_ring_size);
vcpu->dirty_gfns = NULL;
@@ -747,9 +742,7 @@ static void vm_vcpu_rm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
kvm_munmap(vcpu->run, vcpu_mmap_sz());
- ret = close(vcpu->fd);
- TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("close()", ret));
-
+ kvm_close(vcpu->fd);
kvm_stats_release(&vcpu->stats);
list_del(&vcpu->list);
@@ -761,16 +754,12 @@ static void vm_vcpu_rm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
void kvm_vm_release(struct kvm_vm *vmp)
{
struct kvm_vcpu *vcpu, *tmp;
- int ret;
list_for_each_entry_safe(vcpu, tmp, &vmp->vcpus, list)
vm_vcpu_rm(vmp, vcpu);
- ret = close(vmp->fd);
- TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("close()", ret));
-
- ret = close(vmp->kvm_fd);
- TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("close()", ret));
+ kvm_close(vmp->fd);
+ kvm_close(vmp->kvm_fd);
/* Free cached stats metadata and close FD */
kvm_stats_release(&vmp->stats);
@@ -828,7 +817,7 @@ void kvm_vm_free(struct kvm_vm *vmp)
int kvm_memfd_alloc(size_t size, bool hugepages)
{
int memfd_flags = MFD_CLOEXEC;
- int fd, r;
+ int fd;
if (hugepages)
memfd_flags |= MFD_HUGETLB;
@@ -836,11 +825,8 @@ int kvm_memfd_alloc(size_t size, bool hugepages)
fd = memfd_create("kvm_selftest", memfd_flags);
TEST_ASSERT(fd != -1, __KVM_SYSCALL_ERROR("memfd_create()", fd));
- r = ftruncate(fd, size);
- TEST_ASSERT(!r, __KVM_SYSCALL_ERROR("ftruncate()", r));
-
- r = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, size);
- TEST_ASSERT(!r, __KVM_SYSCALL_ERROR("fallocate()", r));
+ kvm_ftruncate(fd, size);
+ kvm_fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, size);
return fd;
}
@@ -1084,8 +1070,7 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
* needing to track if the fd is owned by the framework
* or by the caller.
*/
- guest_memfd = dup(guest_memfd);
- TEST_ASSERT(guest_memfd >= 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd));
+ guest_memfd = kvm_dup(guest_memfd);
}
region->region.guest_memfd = guest_memfd;
diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index 82a8d88b5338..1969f4ab9b28 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -380,7 +380,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
pthread_t threads[KVM_MAX_VCPUS];
struct kvm_vm *vm;
- int memfd, i, r;
+ int memfd, i;
const struct vm_shape shape = {
.mode = VM_MODE_DEFAULT,
@@ -428,11 +428,8 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
* should prevent the VM from being fully destroyed until the last
* reference to the guest_memfd is also put.
*/
- r = fallocate(memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, memfd_size);
- TEST_ASSERT(!r, __KVM_SYSCALL_ERROR("fallocate()", r));
-
- r = fallocate(memfd, FALLOC_FL_KEEP_SIZE, 0, memfd_size);
- TEST_ASSERT(!r, __KVM_SYSCALL_ERROR("fallocate()", r));
+ kvm_fallocate(memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, memfd_size);
+ kvm_fallocate(memfd, FALLOC_FL_KEEP_SIZE, 0, memfd_size);
close(memfd);
}
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (5 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 06/12] KVM: selftests: Define wrappers for common syscalls to assert success Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-17 9:53 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h Sean Christopherson
` (6 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
Register handlers for signals for all selftests that are likely happen due
to test (or kernel) bugs, and explicitly fail tests on unexpected signals
so that users get a stack trace, i.e. don't have to go spelunking to do
basic triage.
Register the handlers as early as possible, to catch as many unexpected
signals as possible, and also so that the common code doesn't clobber a
handler that's installed by test (or arch) code.
Tested-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 24 ++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 8b60b767224b..0c3a6a40d1a9 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2290,11 +2290,35 @@ __weak void kvm_selftest_arch_init(void)
{
}
+static void report_unexpected_signal(int signum)
+{
+#define KVM_CASE_SIGNUM(sig) \
+ case sig: TEST_FAIL("Unexpected " #sig " (%d)\n", signum)
+
+ switch (signum) {
+ KVM_CASE_SIGNUM(SIGBUS);
+ KVM_CASE_SIGNUM(SIGSEGV);
+ KVM_CASE_SIGNUM(SIGILL);
+ KVM_CASE_SIGNUM(SIGFPE);
+ default:
+ TEST_FAIL("Unexpected signal %d\n", signum);
+ }
+}
+
void __attribute((constructor)) kvm_selftest_init(void)
{
+ struct sigaction sig_sa = {
+ .sa_handler = report_unexpected_signal,
+ };
+
/* Tell stdout not to buffer its content. */
setbuf(stdout, NULL);
+ sigaction(SIGBUS, &sig_sa, NULL);
+ sigaction(SIGSEGV, &sig_sa, NULL);
+ sigaction(SIGILL, &sig_sa, NULL);
+ sigaction(SIGFPE, &sig_sa, NULL);
+
guest_random_seed = last_guest_seed = random();
pr_info("Random seed: 0x%x\n", guest_random_seed);
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (6 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-17 10:12 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions Sean Christopherson
` (5 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
Add APIs for all syscalls defined in the kernel's mm/mempolicy.c to match
those that would be provided by linking to libnuma. Opportunistically use
the recently inroduced KVM_SYSCALL_DEFINE() builders to take care of the
boilerplate, and to fix a flaw where the two existing wrappers would
generate multiple symbols if numaif.h were to be included multiple times.
Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/numaif.h | 36 +++++++++++--------
.../selftests/kvm/x86/xapic_ipi_test.c | 5 ++-
2 files changed, 23 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/numaif.h b/tools/testing/selftests/kvm/include/numaif.h
index b020547403fd..aaa4ac174890 100644
--- a/tools/testing/selftests/kvm/include/numaif.h
+++ b/tools/testing/selftests/kvm/include/numaif.h
@@ -13,23 +13,29 @@
#ifndef SELFTEST_KVM_NUMAIF_H
#define SELFTEST_KVM_NUMAIF_H
-#define __NR_get_mempolicy 239
-#define __NR_migrate_pages 256
+#include <linux/mempolicy.h>
-/* System calls */
-long get_mempolicy(int *policy, const unsigned long *nmask,
- unsigned long maxnode, void *addr, int flags)
-{
- return syscall(__NR_get_mempolicy, policy, nmask,
- maxnode, addr, flags);
-}
+#include "kvm_syscalls.h"
-long migrate_pages(int pid, unsigned long maxnode,
- const unsigned long *frommask,
- const unsigned long *tomask)
-{
- return syscall(__NR_migrate_pages, pid, maxnode, frommask, tomask);
-}
+KVM_SYSCALL_DEFINE(get_mempolicy, 5, int *, policy, const unsigned long *, nmask,
+ unsigned long, maxnode, void *, addr, int, flags);
+
+KVM_SYSCALL_DEFINE(set_mempolicy, 3, int, mode, const unsigned long *, nmask,
+ unsigned long, maxnode);
+
+KVM_SYSCALL_DEFINE(set_mempolicy_home_node, 4, unsigned long, start,
+ unsigned long, len, unsigned long, home_node,
+ unsigned long, flags);
+
+KVM_SYSCALL_DEFINE(migrate_pages, 4, int, pid, unsigned long, maxnode,
+ const unsigned long *, frommask, const unsigned long *, tomask);
+
+KVM_SYSCALL_DEFINE(move_pages, 6, int, pid, unsigned long, count, void *, pages,
+ const int *, nodes, int *, status, int, flags);
+
+KVM_SYSCALL_DEFINE(mbind, 6, void *, addr, unsigned long, size, int, mode,
+ const unsigned long *, nodemask, unsigned long, maxnode,
+ unsigned int, flags);
/* Policies */
#define MPOL_DEFAULT 0
diff --git a/tools/testing/selftests/kvm/x86/xapic_ipi_test.c b/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
index 35cb9de54a82..ae4a4b6c05ca 100644
--- a/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
+++ b/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
@@ -256,7 +256,7 @@ void do_migrations(struct test_data_page *data, int run_secs, int delay_usecs,
int nodes = 0;
time_t start_time, last_update, now;
time_t interval_secs = 1;
- int i, r;
+ int i;
int from, to;
unsigned long bit;
uint64_t hlt_count;
@@ -267,9 +267,8 @@ void do_migrations(struct test_data_page *data, int run_secs, int delay_usecs,
delay_usecs);
/* Get set of first 64 numa nodes available */
- r = get_mempolicy(NULL, &nodemask, sizeof(nodemask) * 8,
+ kvm_get_mempolicy(NULL, &nodemask, sizeof(nodemask) * 8,
0, MPOL_F_MEMS_ALLOWED);
- TEST_ASSERT(r == 0, "get_mempolicy failed errno=%d", errno);
fprintf(stderr, "Numa nodes found amongst first %lu possible nodes "
"(each 1-bit indicates node is present): %#lx\n",
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (7 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-17 10:35 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 10/12] KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems Sean Christopherson
` (4 subsequent siblings)
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
Drop the KVM's re-definitions of MPOL_xxx flags in numaif.h as they are
defined by the already-included, kernel-provided mempolicy.h. The only
reason the duplicate definitions don't cause compiler warnings is because
they are identical, but only on x86-64! The syscall numbers in particular
are subtly x86_64-specific, i.e. will cause problems if/when numaif.h is
used outsize of x86.
Opportunistically clean up the file comment as the license information is
covered by the SPDX header, the path is superfluous, and as above the
comment about the contents is flat out wrong.
Fixes: 346b59f220a2 ("KVM: selftests: Add missing header file needed by xAPIC IPI tests")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/numaif.h | 32 +-------------------
1 file changed, 1 insertion(+), 31 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/numaif.h b/tools/testing/selftests/kvm/include/numaif.h
index aaa4ac174890..1554003c40a1 100644
--- a/tools/testing/selftests/kvm/include/numaif.h
+++ b/tools/testing/selftests/kvm/include/numaif.h
@@ -1,14 +1,5 @@
/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * tools/testing/selftests/kvm/include/numaif.h
- *
- * Copyright (C) 2020, Google LLC.
- *
- * This work is licensed under the terms of the GNU GPL, version 2.
- *
- * Header file that provides access to NUMA API functions not explicitly
- * exported to user space.
- */
+/* Copyright (C) 2020, Google LLC. */
#ifndef SELFTEST_KVM_NUMAIF_H
#define SELFTEST_KVM_NUMAIF_H
@@ -37,25 +28,4 @@ KVM_SYSCALL_DEFINE(mbind, 6, void *, addr, unsigned long, size, int, mode,
const unsigned long *, nodemask, unsigned long, maxnode,
unsigned int, flags);
-/* Policies */
-#define MPOL_DEFAULT 0
-#define MPOL_PREFERRED 1
-#define MPOL_BIND 2
-#define MPOL_INTERLEAVE 3
-
-#define MPOL_MAX MPOL_INTERLEAVE
-
-/* Flags for get_mem_policy */
-#define MPOL_F_NODE (1<<0) /* return next il node or node of address */
- /* Warning: MPOL_F_NODE is unsupported and
- * subject to change. Don't use.
- */
-#define MPOL_F_ADDR (1<<1) /* look up vma using address */
-#define MPOL_F_MEMS_ALLOWED (1<<2) /* query nodes allowed in cpuset */
-
-/* Flags for mbind */
-#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
-#define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform to mapping */
-#define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */
-
#endif /* SELFTEST_KVM_NUMAIF_H */
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 10/12] KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (8 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 11/12] KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support Sean Christopherson
` (3 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
From: Shivank Garg <shivankg@amd.com>
Add NUMA helpers to probe for support/availability and to check if the
test is running on a multi-node system. The APIs will be used to verify
guest_memfd NUMA support.
Signed-off-by: Shivank Garg <shivankg@amd.com>
[sean: land helpers in numaif.h, add comments, tweak names]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/numaif.h | 52 ++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/numaif.h b/tools/testing/selftests/kvm/include/numaif.h
index 1554003c40a1..29572a6d789c 100644
--- a/tools/testing/selftests/kvm/include/numaif.h
+++ b/tools/testing/selftests/kvm/include/numaif.h
@@ -4,6 +4,8 @@
#ifndef SELFTEST_KVM_NUMAIF_H
#define SELFTEST_KVM_NUMAIF_H
+#include <dirent.h>
+
#include <linux/mempolicy.h>
#include "kvm_syscalls.h"
@@ -28,4 +30,54 @@ KVM_SYSCALL_DEFINE(mbind, 6, void *, addr, unsigned long, size, int, mode,
const unsigned long *, nodemask, unsigned long, maxnode,
unsigned int, flags);
+static inline int get_max_numa_node(void)
+{
+ struct dirent *de;
+ int max_node = 0;
+ DIR *d;
+
+ /*
+ * Assume there's a single node if the kernel doesn't support NUMA,
+ * or if no nodes are found.
+ */
+ d = opendir("/sys/devices/system/node");
+ if (!d)
+ return 0;
+
+ while ((de = readdir(d)) != NULL) {
+ int node_id;
+ char *endptr;
+
+ if (strncmp(de->d_name, "node", 4) != 0)
+ continue;
+
+ node_id = strtol(de->d_name + 4, &endptr, 10);
+ if (*endptr != '\0')
+ continue;
+
+ if (node_id > max_node)
+ max_node = node_id;
+ }
+ closedir(d);
+
+ return max_node;
+}
+
+static bool is_numa_available(void)
+{
+ /*
+ * Probe for NUMA by doing a dummy get_mempolicy(). If the syscall
+ * fails with ENOSYS, then the kernel was built without NUMA support.
+ * if the syscall fails with EPERM, then the process/user lacks the
+ * necessary capabilities (CAP_SYS_NICE).
+ */
+ return !get_mempolicy(NULL, NULL, 0, NULL, 0) ||
+ (errno != ENOSYS && errno != EPERM);
+}
+
+static inline bool is_multi_numa_node_system(void)
+{
+ return is_numa_available() && get_max_numa_node() >= 1;
+}
+
#endif /* SELFTEST_KVM_NUMAIF_H */
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 11/12] KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (9 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 10/12] KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private Sean Christopherson
` (2 subsequent siblings)
13 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
From: Shivank Garg <shivankg@amd.com>
Add tests for NUMA memory policy binding and NUMA aware allocation in
guest_memfd. This extends the existing selftests by adding proper
validation for:
- KVM GMEM set_policy and get_policy() vm_ops functionality using
mbind() and get_mempolicy()
- NUMA policy application before and after memory allocation
Run the NUMA mbind() test with and without INIT_SHARED, as KVM should allow
doing mbind(), madvise(), etc. on guest-private memory, e.g. so that
userspace can set NUMA policy for CoCo VMs.
Run the NUMA allocation test only for INIT_SHARED, i.e. if the host can't
fault-in memory (via direct access, madvise(), etc.) as move_pages()
returns -ENOENT if the page hasn't been faulted in (walks the host page
tables to find the associated folio)
[sean: don't skip entire test when running on non-NUMA system, test mbind()
with private memory, provide more info in assert messages]
Signed-off-by: Shivank Garg <shivankg@amd.com>
Tested-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../testing/selftests/kvm/guest_memfd_test.c | 98 +++++++++++++++++++
1 file changed, 98 insertions(+)
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index e7d9aeb418d3..618c937f3c90 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -19,6 +19,7 @@
#include <sys/stat.h>
#include "kvm_util.h"
+#include "numaif.h"
#include "test_util.h"
#include "ucall_common.h"
@@ -75,6 +76,101 @@ static void test_mmap_supported(int fd, size_t total_size)
kvm_munmap(mem, total_size);
}
+static void test_mbind(int fd, size_t total_size)
+{
+ const unsigned long nodemask_0 = 1; /* nid: 0 */
+ unsigned long nodemask = 0;
+ unsigned long maxnode = 8;
+ int policy;
+ char *mem;
+ int ret;
+
+ if (!is_multi_numa_node_system())
+ return;
+
+ mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
+
+ /* Test MPOL_INTERLEAVE policy */
+ kvm_mbind(mem, page_size * 2, MPOL_INTERLEAVE, &nodemask_0, maxnode, 0);
+ kvm_get_mempolicy(&policy, &nodemask, maxnode, mem, MPOL_F_ADDR);
+ TEST_ASSERT(policy == MPOL_INTERLEAVE && nodemask == nodemask_0,
+ "Wanted MPOL_INTERLEAVE (%u) and nodemask 0x%lx, got %u and 0x%lx",
+ MPOL_INTERLEAVE, nodemask_0, policy, nodemask);
+
+ /* Test basic MPOL_BIND policy */
+ kvm_mbind(mem + page_size * 2, page_size * 2, MPOL_BIND, &nodemask_0, maxnode, 0);
+ kvm_get_mempolicy(&policy, &nodemask, maxnode, mem + page_size * 2, MPOL_F_ADDR);
+ TEST_ASSERT(policy == MPOL_BIND && nodemask == nodemask_0,
+ "Wanted MPOL_BIND (%u) and nodemask 0x%lx, got %u and 0x%lx",
+ MPOL_BIND, nodemask_0, policy, nodemask);
+
+ /* Test MPOL_DEFAULT policy */
+ kvm_mbind(mem, total_size, MPOL_DEFAULT, NULL, 0, 0);
+ kvm_get_mempolicy(&policy, &nodemask, maxnode, mem, MPOL_F_ADDR);
+ TEST_ASSERT(policy == MPOL_DEFAULT && !nodemask,
+ "Wanted MPOL_DEFAULT (%u) and nodemask 0x0, got %u and 0x%lx",
+ MPOL_DEFAULT, policy, nodemask);
+
+ /* Test with invalid policy */
+ ret = mbind(mem, page_size, 999, &nodemask_0, maxnode, 0);
+ TEST_ASSERT(ret == -1 && errno == EINVAL,
+ "mbind with invalid policy should fail with EINVAL");
+
+ kvm_munmap(mem, total_size);
+}
+
+static void test_numa_allocation(int fd, size_t total_size)
+{
+ unsigned long node0_mask = 1; /* Node 0 */
+ unsigned long node1_mask = 2; /* Node 1 */
+ unsigned long maxnode = 8;
+ void *pages[4];
+ int status[4];
+ char *mem;
+ int i;
+
+ if (!is_multi_numa_node_system())
+ return;
+
+ mem = kvm_mmap(total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd);
+
+ for (i = 0; i < 4; i++)
+ pages[i] = (char *)mem + page_size * i;
+
+ /* Set NUMA policy after allocation */
+ memset(mem, 0xaa, page_size);
+ kvm_mbind(pages[0], page_size, MPOL_BIND, &node0_mask, maxnode, 0);
+ kvm_fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, page_size);
+
+ /* Set NUMA policy before allocation */
+ kvm_mbind(pages[0], page_size * 2, MPOL_BIND, &node1_mask, maxnode, 0);
+ kvm_mbind(pages[2], page_size * 2, MPOL_BIND, &node0_mask, maxnode, 0);
+ memset(mem, 0xaa, total_size);
+
+ /* Validate if pages are allocated on specified NUMA nodes */
+ kvm_move_pages(0, 4, pages, NULL, status, 0);
+ TEST_ASSERT(status[0] == 1, "Expected page 0 on node 1, got it on node %d", status[0]);
+ TEST_ASSERT(status[1] == 1, "Expected page 1 on node 1, got it on node %d", status[1]);
+ TEST_ASSERT(status[2] == 0, "Expected page 2 on node 0, got it on node %d", status[2]);
+ TEST_ASSERT(status[3] == 0, "Expected page 3 on node 0, got it on node %d", status[3]);
+
+ /* Punch hole for all pages */
+ kvm_fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, total_size);
+
+ /* Change NUMA policy nodes and reallocate */
+ kvm_mbind(pages[0], page_size * 2, MPOL_BIND, &node0_mask, maxnode, 0);
+ kvm_mbind(pages[2], page_size * 2, MPOL_BIND, &node1_mask, maxnode, 0);
+ memset(mem, 0xaa, total_size);
+
+ kvm_move_pages(0, 4, pages, NULL, status, 0);
+ TEST_ASSERT(status[0] == 0, "Expected page 0 on node 0, got it on node %d", status[0]);
+ TEST_ASSERT(status[1] == 0, "Expected page 1 on node 0, got it on node %d", status[1]);
+ TEST_ASSERT(status[2] == 1, "Expected page 2 on node 1, got it on node %d", status[2]);
+ TEST_ASSERT(status[3] == 1, "Expected page 3 on node 1, got it on node %d", status[3]);
+
+ kvm_munmap(mem, total_size);
+}
+
static void test_fault_sigbus(int fd, size_t accessible_size, size_t map_size)
{
const char val = 0xaa;
@@ -273,11 +369,13 @@ static void __test_guest_memfd(struct kvm_vm *vm, uint64_t flags)
if (flags & GUEST_MEMFD_FLAG_INIT_SHARED) {
gmem_test(mmap_supported, vm, flags);
gmem_test(fault_overflow, vm, flags);
+ gmem_test(numa_allocation, vm, flags);
} else {
gmem_test(fault_private, vm, flags);
}
gmem_test(mmap_cow, vm, flags);
+ gmem_test(mbind, vm, flags);
} else {
gmem_test(mmap_not_supported, vm, flags);
}
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v13 12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (10 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 11/12] KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support Sean Christopherson
@ 2025-10-16 17:28 ` Sean Christopherson
2025-10-17 11:01 ` Garg, Shivank
2025-10-16 18:08 ` [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Miguel Ojeda
2025-10-20 16:33 ` Sean Christopherson
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 17:28 UTC (permalink / raw)
To: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
Sean Christopherson
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
Track a guest_memfd instance's flags in gmem_inode instead of burying them
in i_private. Burning an extra 8 bytes per inode is well worth the added
clarity provided by explicit tracking.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/guest_memfd.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4463643bd0a2..20f6e7fab58d 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -30,6 +30,8 @@ struct gmem_file {
struct gmem_inode {
struct shared_policy policy;
struct inode vfs_inode;
+
+ u64 flags;
};
static __always_inline struct gmem_inode *GMEM_I(struct inode *inode)
@@ -154,7 +156,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
static enum kvm_gfn_range_filter kvm_gmem_get_invalidate_filter(struct inode *inode)
{
- if ((u64)inode->i_private & GUEST_MEMFD_FLAG_INIT_SHARED)
+ if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
return KVM_FILTER_SHARED;
return KVM_FILTER_PRIVATE;
@@ -385,9 +387,7 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
static bool kvm_gmem_supports_mmap(struct inode *inode)
{
- const u64 flags = (u64)inode->i_private;
-
- return flags & GUEST_MEMFD_FLAG_MMAP;
+ return GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_MMAP;
}
static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
@@ -399,7 +399,7 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
return VM_FAULT_SIGBUS;
- if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_INIT_SHARED))
+ if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
return VM_FAULT_SIGBUS;
folio = kvm_gmem_get_folio(inode, vmf->pgoff);
@@ -588,7 +588,6 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
goto err_fops;
}
- inode->i_private = (void *)(unsigned long)flags;
inode->i_op = &kvm_gmem_iops;
inode->i_mapping->a_ops = &kvm_gmem_aops;
inode->i_mode |= S_IFREG;
@@ -598,6 +597,8 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
/* Unmovable mappings are supposed to be marked unevictable as well. */
WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
+ GMEM_I(inode)->flags = flags;
+
file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
if (IS_ERR(file)) {
err = PTR_ERR(file);
@@ -917,6 +918,8 @@ static struct inode *kvm_gmem_alloc_inode(struct super_block *sb)
return NULL;
mpol_shared_policy_init(&gi->policy, NULL);
+
+ gi->flags = 0;
return &gi->vfs_inode;
}
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (11 preceding siblings ...)
2025-10-16 17:28 ` [PATCH v13 12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private Sean Christopherson
@ 2025-10-16 18:08 ` Miguel Ojeda
2025-10-16 20:28 ` Sean Christopherson
2025-10-20 16:33 ` Sean Christopherson
13 siblings, 1 reply; 31+ messages in thread
From: Miguel Ojeda @ 2025-10-16 18:08 UTC (permalink / raw)
To: Sean Christopherson
Cc: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
On Thu, Oct 16, 2025 at 7:30 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Miguel, you got pulled in due to a one-line change to add a new iterator
> macros in .clang-format.
Thanks!
The macro is not in `include/`, right? That means that, currently,
when I rerun the command to update the list it will go away.
If that is correct, and you want to have it in the list, then we
should add e.g. `virt/` there or similar, or we could have a few
separate lines at the top that are independent of the ones generated
by the command.
Cheers,
Miguel
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
2025-10-16 18:08 ` [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Miguel Ojeda
@ 2025-10-16 20:28 ` Sean Christopherson
2025-10-16 21:07 ` Miguel Ojeda
0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-16 20:28 UTC (permalink / raw)
To: Miguel Ojeda
Cc: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
On Thu, Oct 16, 2025, Miguel Ojeda wrote:
> On Thu, Oct 16, 2025 at 7:30 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > Miguel, you got pulled in due to a one-line change to add a new iterator
> > macros in .clang-format.
>
> Thanks!
>
> The macro is not in `include/`, right? That means that, currently,
> when I rerun the command to update the list it will go away.
Oh, I take it .clang-format is auto-generated? Is it a "formal" script, or do
you literally just run the grep command in the comment?
# Taken from:
# git grep -h '^#define [^[:space:]]*for_each[^[:space:]]*(' include/ tools/ \
# | sed "s,^#define \([^[:space:]]*for_each[^[:space:]]*\)(.*$, - '\1'," \
# | LC_ALL=C sort -u
> If that is correct, and you want to have it in the list,
I don't think I care if it's in the list? I honestly don't know for sure, because
it's entirely possible I'm consuming .clang-format without knowing it. I added
the entry based on someone else's request.
Ackerley?
> then we should add e.g. `virt/` there or similar, or we could have a few
> separate lines at the top that are independent of the ones generated
> by the command.
Is it possible, and sensible, to have per-subsystem .clang-format files? KVM
(virt/kvm) and KVM x86 (arch/x86/kvm) both have has several for_each macros,
pretty much all of which are more interesting than kvm_gmem_for_each_file().
Adding arch/x86/kvm to the "script" in .clang-format feels wrong.
E.g.
$ git grep -h '^#define [^[:space:]]*for_each[^[:space:]]*(' arch/x86/kvm/ | \
sed "s,^#define \([^[:space:]]*for_each[^[:space:]]*\)(.*$, - '\1'," | \
LC_ALL=C sort -u
- '__for_each_rmap_spte'
- '__for_each_tdp_mmu_root'
- '__for_each_tdp_mmu_root_yield_safe'
- 'for_each_gfn_valid_sp_with_gptes'
- 'for_each_rmap_spte'
- 'for_each_rmap_spte_lockless'
- 'for_each_shadow_entry'
- 'for_each_shadow_entry_lockless'
- 'for_each_shadow_entry_using_root'
- 'for_each_slot_rmap_range'
- 'for_each_sp'
- 'for_each_tdp_mmu_root_rcu'
- 'for_each_tdp_mmu_root_yield_safe'
- 'for_each_tdp_pte'
- 'for_each_tdp_pte_min_level'
- 'for_each_tdp_pte_min_level_all'
- 'for_each_valid_sp'
- 'for_each_valid_tdp_mmu_root'
- 'for_each_valid_tdp_mmu_root_yield_safe'
- 'kvm_for_each_pmc'
- 'tdp_root_for_each_leaf_pte'
- 'tdp_root_for_each_pte'
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
2025-10-16 20:28 ` Sean Christopherson
@ 2025-10-16 21:07 ` Miguel Ojeda
2025-10-16 23:57 ` Ackerley Tng
0 siblings, 1 reply; 31+ messages in thread
From: Miguel Ojeda @ 2025-10-16 21:07 UTC (permalink / raw)
To: Sean Christopherson
Cc: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
On Thu, Oct 16, 2025 at 10:28 PM Sean Christopherson <seanjc@google.com> wrote:
>
> Oh, I take it .clang-format is auto-generated? Is it a "formal" script, or do
> you literally just run the grep command in the comment?
I just run it and copy-paste the results there from time to time.
Yeah, a very low-tech solution :)
> I don't think I care if it's in the list? I honestly don't know for sure, because
> it's entirely possible I'm consuming .clang-format without knowing it. I added
> the entry based on someone else's request.
>
> Ackerley?
If you are not relying on it, then please just skip it, yeah.
> Is it possible, and sensible, to have per-subsystem .clang-format files? KVM
> (virt/kvm) and KVM x86 (arch/x86/kvm) both have has several for_each macros,
> pretty much all of which are more interesting than kvm_gmem_for_each_file().
There is `InheritParentConfig` nowadays, but from a quick look I don't
see it supports merging lists.
So to do something fancier, we would do need something like we did for
rust-analyzer, i.e. a `make` target or similar that would generate it.
Otherwise, we can just add extra macros at the top meanwhile.
What we did last time is just to add `tools/` to that command --
increasing coverage is not an issue (I just started with `include/`
originally to be a bit conservative and avoid a huge list until we
knew the tool would be used).
Cheers,
Miguel
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
2025-10-16 21:07 ` Miguel Ojeda
@ 2025-10-16 23:57 ` Ackerley Tng
2025-10-17 0:09 ` Miguel Ojeda
0 siblings, 1 reply; 31+ messages in thread
From: Ackerley Tng @ 2025-10-16 23:57 UTC (permalink / raw)
To: Miguel Ojeda, Sean Christopherson
Cc: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
linux-arm-kernel, kvmarm, kvm, linux-kernel, Shivank Garg,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> writes:
> On Thu, Oct 16, 2025 at 10:28 PM Sean Christopherson <seanjc@google.com> wrote:
>>
>> Oh, I take it .clang-format is auto-generated? Is it a "formal" script, or do
>> you literally just run the grep command in the comment?
>
> I just run it and copy-paste the results there from time to time.
> Yeah, a very low-tech solution :)
>
I assumed someone was doing this from time to time, and I ran the grep
command in .clang-format but IIUC it only reads tools/ and include/
(which doesn't cover this new macro) and so I thought the "automation"
would miss this new macro, hence I suggested to manually add the macro.
Using the command on virt/ would pick it up. Would it be better to add
"virt/" to the "automation" + update .clang-format while we're at it?
$ git grep -h '^#define [^[:space:]]*for_each[^[:space:]]*(' virt/ | sed "s,^#define \([^[:space:]]*for_each[^[:space:]]*\)(.*$, - '\1'," | LC_ALL=C sort -u
- 'kvm_for_each_memslot_in_hva_range'
- 'kvm_gmem_for_each_file'
>> I don't think I care if it's in the list? I honestly don't know for sure, because
>> it's entirely possible I'm consuming .clang-format without knowing it. I added
>> the entry based on someone else's request.
>>
>> Ackerley?
>
> If you are not relying on it, then please just skip it, yeah.
>
I'm using it, I believe clangd (my lsp server) uses it to reflow correctly.
>> Is it possible, and sensible, to have per-subsystem .clang-format files? KVM
>> (virt/kvm) and KVM x86 (arch/x86/kvm) both have has several for_each macros,
>> pretty much all of which are more interesting than kvm_gmem_for_each_file().
>
> There is `InheritParentConfig` nowadays, but from a quick look I don't
> see it supports merging lists.
>
> So to do something fancier, we would do need something like we did for
> rust-analyzer, i.e. a `make` target or similar that would generate it.
>
> Otherwise, we can just add extra macros at the top meanwhile.
>
> What we did last time is just to add `tools/` to that command --
> increasing coverage is not an issue (I just started with `include/`
> originally to be a bit conservative and avoid a huge list until we
> knew the tool would be used).
>
> Cheers,
> Miguel
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
2025-10-16 23:57 ` Ackerley Tng
@ 2025-10-17 0:09 ` Miguel Ojeda
2025-10-17 16:49 ` Sean Christopherson
0 siblings, 1 reply; 31+ messages in thread
From: Miguel Ojeda @ 2025-10-17 0:09 UTC (permalink / raw)
To: Ackerley Tng
Cc: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, linux-kernel,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
On Fri, Oct 17, 2025 at 1:57 AM Ackerley Tng <ackerleytng@google.com> wrote:
>
> Using the command on virt/ would pick it up. Would it be better to add
> "virt/" to the "automation" + update .clang-format while we're at it?
Yeah, that is what I was suggesting if you rely on it (and if the
maintainers of the relevant folders are OK with it).
Cheers,
Miguel
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file"
2025-10-16 17:28 ` [PATCH v13 01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file" Sean Christopherson
@ 2025-10-17 9:31 ` Garg, Shivank
0 siblings, 0 replies; 31+ messages in thread
From: Garg, Shivank @ 2025-10-17 9:31 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On 10/16/2025 10:58 PM, Sean Christopherson wrote:
> Rename the "kvm_gmem" structure to "gmem_file" in anticipation of using
> dedicated guest_memfd inodes instead of anonyomous inodes, at which point
> the "kvm_gmem" nomenclature becomes quite misleading. In guest_memfd,
> inodes are effectively the raw underlying physical storage, and will be
> used to track properties of the physical memory, while each gmem file is
> effectively a single VM's view of that storage, and is used to track assets
> specific to its associated VM, e.g. memslots=>gmem bindings.
>
> Using "kvm_gmem" suggests that the per-VM/per-file structures are _the_
> guest_memfd instance, which almost the exact opposite of reality.
>
> Opportunistically rename local variables from "gmem" to "f", again to
> avoid confusion once guest_memfd specific inodes come along.
>
> No functional change intended.
>
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> Tested-by: Ackerley Tng <ackerleytng@google.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/guest_memfd.c | 98 +++++++++++++++++++++++-------------------
> 1 file changed, 53 insertions(+), 45 deletions(-)
>
Tested-by: Shivank Garg <shivankg@amd.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode
2025-10-16 17:28 ` [PATCH v13 02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode Sean Christopherson
@ 2025-10-17 9:34 ` Garg, Shivank
0 siblings, 0 replies; 31+ messages in thread
From: Garg, Shivank @ 2025-10-17 9:34 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On 10/16/2025 10:58 PM, Sean Christopherson wrote:
> Add a kvm_gmem_for_each_file() to make it more obvious that KVM is
> iterating over guest_memfd _files_, not guest_memfd instances, as could
> be assumed given the name "gmem_list".
>
> No functional change intended.
>
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> Tested-by: Ackerley Tng <ackerleytng@google.com>
> Reviewed-by: Shivank Garg <shivankg@amd.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> .clang-format | 1 +
> virt/kvm/guest_memfd.c | 9 +++++----
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
Tested-by: Shivank Garg <shivankg@amd.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 06/12] KVM: selftests: Define wrappers for common syscalls to assert success
2025-10-16 17:28 ` [PATCH v13 06/12] KVM: selftests: Define wrappers for common syscalls to assert success Sean Christopherson
@ 2025-10-17 9:51 ` Garg, Shivank
0 siblings, 0 replies; 31+ messages in thread
From: Garg, Shivank @ 2025-10-17 9:51 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On 10/16/2025 10:58 PM, Sean Christopherson wrote:
> Add kvm_<sycall> wrappers for munmap(), close(), fallocate(), and
> ftruncate() to cut down on boilerplate code when a sycall is expected
> to succeed, and to make it easier for developers to remember to assert
> success.
>
> Implement and use a macro framework similar to the kernel's SYSCALL_DEFINE
> infrastructure to further cut down on boilerplate code, and to drastically
> reduce the probability of typos as the kernel's syscall definitions can be
> copy+paste almost verbatim.
>
> Provide macros to build the raw <sycall>() wrappers as well, e.g. to
> replace hand-coded wrappers (NUMA) or pure open-coded calls.
>
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> Tested-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> tools/testing/selftests/kvm/arm64/vgic_irq.c | 2 +-
> .../selftests/kvm/include/kvm_syscalls.h | 81 +++++++++++++++++++
> .../testing/selftests/kvm/include/kvm_util.h | 29 +------
> .../selftests/kvm/kvm_binary_stats_test.c | 4 +-
> tools/testing/selftests/kvm/lib/kvm_util.c | 31 ++-----
> .../kvm/x86/private_mem_conversions_test.c | 9 +--
> 6 files changed, 96 insertions(+), 60 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/include/kvm_syscalls.h
>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default
2025-10-16 17:28 ` [PATCH v13 07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default Sean Christopherson
@ 2025-10-17 9:53 ` Garg, Shivank
0 siblings, 0 replies; 31+ messages in thread
From: Garg, Shivank @ 2025-10-17 9:53 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On 10/16/2025 10:58 PM, Sean Christopherson wrote:
> Register handlers for signals for all selftests that are likely happen due
> to test (or kernel) bugs, and explicitly fail tests on unexpected signals
> so that users get a stack trace, i.e. don't have to go spelunking to do
> basic triage.
>
> Register the handlers as early as possible, to catch as many unexpected
> signals as possible, and also so that the common code doesn't clobber a
> handler that's installed by test (or arch) code.
>
> Tested-by: Ackerley Tng <ackerleytng@google.com>
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> tools/testing/selftests/kvm/lib/kvm_util.c | 24 ++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 8b60b767224b..0c3a6a40d1a9 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -2290,11 +2290,35 @@ __weak void kvm_selftest_arch_init(void)
> {
> }
>
> +static void report_unexpected_signal(int signum)
> +{
> +#define KVM_CASE_SIGNUM(sig) \
> + case sig: TEST_FAIL("Unexpected " #sig " (%d)\n", signum)
> +
> + switch (signum) {
> + KVM_CASE_SIGNUM(SIGBUS);
> + KVM_CASE_SIGNUM(SIGSEGV);
> + KVM_CASE_SIGNUM(SIGILL);
> + KVM_CASE_SIGNUM(SIGFPE);
> + default:
> + TEST_FAIL("Unexpected signal %d\n", signum);
> + }
> +}
> +
> void __attribute((constructor)) kvm_selftest_init(void)
> {
> + struct sigaction sig_sa = {
> + .sa_handler = report_unexpected_signal,
> + };
> +
> /* Tell stdout not to buffer its content. */
> setbuf(stdout, NULL);
>
> + sigaction(SIGBUS, &sig_sa, NULL);
> + sigaction(SIGSEGV, &sig_sa, NULL);
> + sigaction(SIGILL, &sig_sa, NULL);
> + sigaction(SIGFPE, &sig_sa, NULL);
> +
> guest_random_seed = last_guest_seed = random();
> pr_info("Random seed: 0x%x\n", guest_random_seed);
>
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h
2025-10-16 17:28 ` [PATCH v13 08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h Sean Christopherson
@ 2025-10-17 10:12 ` Garg, Shivank
0 siblings, 0 replies; 31+ messages in thread
From: Garg, Shivank @ 2025-10-17 10:12 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On 10/16/2025 10:58 PM, Sean Christopherson wrote:
> Add APIs for all syscalls defined in the kernel's mm/mempolicy.c to match
> those that would be provided by linking to libnuma. Opportunistically use
> the recently inroduced KVM_SYSCALL_DEFINE() builders to take care of the
> boilerplate, and to fix a flaw where the two existing wrappers would
> generate multiple symbols if numaif.h were to be included multiple times.
>
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> Tested-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> tools/testing/selftests/kvm/include/numaif.h | 36 +++++++++++--------
> .../selftests/kvm/x86/xapic_ipi_test.c | 5 ++-
> 2 files changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/numaif.h b/tools/testing/selftests/kvm/include/numaif.h
> index b020547403fd..aaa4ac174890 100644
> --- a/tools/testing/selftests/kvm/include/numaif.h
> +++ b/tools/testing/selftests/kvm/include/numaif.h
> @@ -13,23 +13,29 @@
> #ifndef SELFTEST_KVM_NUMAIF_H
> #define SELFTEST_KVM_NUMAIF_H
>
> -#define __NR_get_mempolicy 239
> -#define __NR_migrate_pages 256
> +#include <linux/mempolicy.h>
>
> -/* System calls */
> -long get_mempolicy(int *policy, const unsigned long *nmask,
> - unsigned long maxnode, void *addr, int flags)
> -{
> - return syscall(__NR_get_mempolicy, policy, nmask,
> - maxnode, addr, flags);
> -}
> +#include "kvm_syscalls.h"
>
> -long migrate_pages(int pid, unsigned long maxnode,
> - const unsigned long *frommask,
> - const unsigned long *tomask)
> -{
> - return syscall(__NR_migrate_pages, pid, maxnode, frommask, tomask);
> -}
> +KVM_SYSCALL_DEFINE(get_mempolicy, 5, int *, policy, const unsigned long *, nmask,
> + unsigned long, maxnode, void *, addr, int, flags);
> +
> +KVM_SYSCALL_DEFINE(set_mempolicy, 3, int, mode, const unsigned long *, nmask,
> + unsigned long, maxnode);
> +
> +KVM_SYSCALL_DEFINE(set_mempolicy_home_node, 4, unsigned long, start,
> + unsigned long, len, unsigned long, home_node,
> + unsigned long, flags);
set_mempolicy_home_node() has no user in this series,
but adding it for API completeness (or future use) looks reasonable to me.
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
> +
> +KVM_SYSCALL_DEFINE(migrate_pages, 4, int, pid, unsigned long, maxnode,
> + const unsigned long *, frommask, const unsigned long *, tomask);
> +
> +KVM_SYSCALL_DEFINE(move_pages, 6, int, pid, unsigned long, count, void *, pages,
> + const int *, nodes, int *, status, int, flags);
> +
> +KVM_SYSCALL_DEFINE(mbind, 6, void *, addr, unsigned long, size, int, mode,
> + const unsigned long *, nodemask, unsigned long, maxnode,
> + unsigned int, flags);
>
> /* Policies */
> #define MPOL_DEFAULT 0
> diff --git a/tools/testing/selftests/kvm/x86/xapic_ipi_test.c b/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
> index 35cb9de54a82..ae4a4b6c05ca 100644
> --- a/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
> +++ b/tools/testing/selftests/kvm/x86/xapic_ipi_test.c
> @@ -256,7 +256,7 @@ void do_migrations(struct test_data_page *data, int run_secs, int delay_usecs,
> int nodes = 0;
> time_t start_time, last_update, now;
> time_t interval_secs = 1;
> - int i, r;
> + int i;
> int from, to;
> unsigned long bit;
> uint64_t hlt_count;
> @@ -267,9 +267,8 @@ void do_migrations(struct test_data_page *data, int run_secs, int delay_usecs,
> delay_usecs);
>
> /* Get set of first 64 numa nodes available */
> - r = get_mempolicy(NULL, &nodemask, sizeof(nodemask) * 8,
> + kvm_get_mempolicy(NULL, &nodemask, sizeof(nodemask) * 8,
> 0, MPOL_F_MEMS_ALLOWED);
> - TEST_ASSERT(r == 0, "get_mempolicy failed errno=%d", errno);
>
> fprintf(stderr, "Numa nodes found amongst first %lu possible nodes "
> "(each 1-bit indicates node is present): %#lx\n",
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions
2025-10-16 17:28 ` [PATCH v13 09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions Sean Christopherson
@ 2025-10-17 10:35 ` Garg, Shivank
2025-10-17 16:18 ` Sean Christopherson
0 siblings, 1 reply; 31+ messages in thread
From: Garg, Shivank @ 2025-10-17 10:35 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On 10/16/2025 10:58 PM, Sean Christopherson wrote:
> Drop the KVM's re-definitions of MPOL_xxx flags in numaif.h as they are
> defined by the already-included, kernel-provided mempolicy.h. The only
> reason the duplicate definitions don't cause compiler warnings is because
> they are identical, but only on x86-64! The syscall numbers in particular
> are subtly x86_64-specific, i.e. will cause problems if/when numaif.h is
> used outsize of x86.
>
> Opportunistically clean up the file comment as the license information is
> covered by the SPDX header, the path is superfluous, and as above the
> comment about the contents is flat out wrong.
>
> Fixes: 346b59f220a2 ("KVM: selftests: Add missing header file needed by xAPIC IPI tests")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> tools/testing/selftests/kvm/include/numaif.h | 32 +-------------------
> 1 file changed, 1 insertion(+), 31 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/numaif.h b/tools/testing/selftests/kvm/include/numaif.h
> index aaa4ac174890..1554003c40a1 100644
> --- a/tools/testing/selftests/kvm/include/numaif.h
> +++ b/tools/testing/selftests/kvm/include/numaif.h
> @@ -1,14 +1,5 @@
> /* SPDX-License-Identifier: GPL-2.0-only */
> -/*
> - * tools/testing/selftests/kvm/include/numaif.h
> - *
> - * Copyright (C) 2020, Google LLC.
> - *
> - * This work is licensed under the terms of the GNU GPL, version 2.
> - *
> - * Header file that provides access to NUMA API functions not explicitly
> - * exported to user space.
> - */
> +/* Copyright (C) 2020, Google LLC. */
Given this file got a complete overhaul in this series, Should the copyright be 2020, 2025?
Not entirely sure what the rules are for this.
LGTM!
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
>
> #ifndef SELFTEST_KVM_NUMAIF_H
> #define SELFTEST_KVM_NUMAIF_H
> @@ -37,25 +28,4 @@ KVM_SYSCALL_DEFINE(mbind, 6, void *, addr, unsigned long, size, int, mode,
> const unsigned long *, nodemask, unsigned long, maxnode,
> unsigned int, flags);
>
> -/* Policies */
> -#define MPOL_DEFAULT 0
> -#define MPOL_PREFERRED 1
> -#define MPOL_BIND 2
> -#define MPOL_INTERLEAVE 3
> -
> -#define MPOL_MAX MPOL_INTERLEAVE
> -
> -/* Flags for get_mem_policy */
> -#define MPOL_F_NODE (1<<0) /* return next il node or node of address */
> - /* Warning: MPOL_F_NODE is unsupported and
> - * subject to change. Don't use.
> - */
> -#define MPOL_F_ADDR (1<<1) /* look up vma using address */
> -#define MPOL_F_MEMS_ALLOWED (1<<2) /* query nodes allowed in cpuset */
> -
> -/* Flags for mbind */
> -#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
> -#define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform to mapping */
> -#define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */
> -
> #endif /* SELFTEST_KVM_NUMAIF_H */
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private
2025-10-16 17:28 ` [PATCH v13 12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private Sean Christopherson
@ 2025-10-17 11:01 ` Garg, Shivank
0 siblings, 0 replies; 31+ messages in thread
From: Garg, Shivank @ 2025-10-17 11:01 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On 10/16/2025 10:58 PM, Sean Christopherson wrote:
> Track a guest_memfd instance's flags in gmem_inode instead of burying them
> in i_private. Burning an extra 8 bytes per inode is well worth the added
> clarity provided by explicit tracking.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/guest_memfd.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 4463643bd0a2..20f6e7fab58d 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -30,6 +30,8 @@ struct gmem_file {
> struct gmem_inode {
> struct shared_policy policy;
> struct inode vfs_inode;
> +
> + u64 flags;
> };
>
> static __always_inline struct gmem_inode *GMEM_I(struct inode *inode)
> @@ -154,7 +156,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>
> static enum kvm_gfn_range_filter kvm_gmem_get_invalidate_filter(struct inode *inode)
> {
> - if ((u64)inode->i_private & GUEST_MEMFD_FLAG_INIT_SHARED)
> + if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
> return KVM_FILTER_SHARED;
>
> return KVM_FILTER_PRIVATE;
> @@ -385,9 +387,7 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
>
> static bool kvm_gmem_supports_mmap(struct inode *inode)
> {
> - const u64 flags = (u64)inode->i_private;
> -
> - return flags & GUEST_MEMFD_FLAG_MMAP;
> + return GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_MMAP;
> }
>
> static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
> @@ -399,7 +399,7 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
> if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
> return VM_FAULT_SIGBUS;
>
> - if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_INIT_SHARED))
> + if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
> return VM_FAULT_SIGBUS;
>
> folio = kvm_gmem_get_folio(inode, vmf->pgoff);
> @@ -588,7 +588,6 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> goto err_fops;
> }
>
> - inode->i_private = (void *)(unsigned long)flags;
> inode->i_op = &kvm_gmem_iops;
> inode->i_mapping->a_ops = &kvm_gmem_aops;
> inode->i_mode |= S_IFREG;
> @@ -598,6 +597,8 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> /* Unmovable mappings are supposed to be marked unevictable as well. */
> WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
>
> + GMEM_I(inode)->flags = flags;
> +
> file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops);
> if (IS_ERR(file)) {
> err = PTR_ERR(file);
> @@ -917,6 +918,8 @@ static struct inode *kvm_gmem_alloc_inode(struct super_block *sb)
> return NULL;
>
> mpol_shared_policy_init(&gi->policy, NULL);
> +
> + gi->flags = 0;
I initially wondered why these are initialized in kvm_gmem_alloc_inode()
instead of kvm_gmem_init_inode_once().
Your comment in kvm_gmem_init_inode_once() made clear to me.. The ctor runs
once per slab object, so gi->flags wouldn't reset on inode realloction.
Initializing in the alloc_inode() ensures proper reset on every allocation,
Makes sense!
Thanks for the detailed comment!
Reviewed-by: Shivank Garg <shivankg@amd.com>
Tested-by: Shivank Garg <shivankg@amd.com>
> return &gi->vfs_inode;
> }
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions
2025-10-17 10:35 ` Garg, Shivank
@ 2025-10-17 16:18 ` Sean Christopherson
0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2025-10-17 16:18 UTC (permalink / raw)
To: Shivank Garg
Cc: Miguel Ojeda, Marc Zyngier, Oliver Upton, Paolo Bonzini,
linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On Fri, Oct 17, 2025, Shivank Garg wrote:
>
>
> On 10/16/2025 10:58 PM, Sean Christopherson wrote:
> > Drop the KVM's re-definitions of MPOL_xxx flags in numaif.h as they are
> > defined by the already-included, kernel-provided mempolicy.h. The only
> > reason the duplicate definitions don't cause compiler warnings is because
> > they are identical, but only on x86-64! The syscall numbers in particular
> > are subtly x86_64-specific, i.e. will cause problems if/when numaif.h is
> > used outsize of x86.
> >
> > Opportunistically clean up the file comment as the license information is
> > covered by the SPDX header, the path is superfluous, and as above the
> > comment about the contents is flat out wrong.
> >
> > Fixes: 346b59f220a2 ("KVM: selftests: Add missing header file needed by xAPIC IPI tests")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > tools/testing/selftests/kvm/include/numaif.h | 32 +-------------------
> > 1 file changed, 1 insertion(+), 31 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/include/numaif.h b/tools/testing/selftests/kvm/include/numaif.h
> > index aaa4ac174890..1554003c40a1 100644
> > --- a/tools/testing/selftests/kvm/include/numaif.h
> > +++ b/tools/testing/selftests/kvm/include/numaif.h
> > @@ -1,14 +1,5 @@
> > /* SPDX-License-Identifier: GPL-2.0-only */
> > -/*
> > - * tools/testing/selftests/kvm/include/numaif.h
> > - *
> > - * Copyright (C) 2020, Google LLC.
> > - *
> > - * This work is licensed under the terms of the GNU GPL, version 2.
> > - *
> > - * Header file that provides access to NUMA API functions not explicitly
> > - * exported to user space.
> > - */
> > +/* Copyright (C) 2020, Google LLC. */
>
> Given this file got a complete overhaul in this series, Should the copyright be 2020, 2025?
> Not entirely sure what the rules are for this.
Me either. I just figure I can't really go totally wrong by doing nothing :-)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
2025-10-17 0:09 ` Miguel Ojeda
@ 2025-10-17 16:49 ` Sean Christopherson
0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2025-10-17 16:49 UTC (permalink / raw)
To: Miguel Ojeda
Cc: Ackerley Tng, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini, linux-arm-kernel, kvmarm, kvm, linux-kernel,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
On Fri, Oct 17, 2025, Miguel Ojeda wrote:
> On Fri, Oct 17, 2025 at 1:57 AM Ackerley Tng <ackerleytng@google.com> wrote:
> >
> > Using the command on virt/ would pick it up. Would it be better to add
> > "virt/" to the "automation" + update .clang-format while we're at it?
>
> Yeah, that is what I was suggesting if you rely on it (and if the
> maintainers of the relevant folders are OK with it).
Hmm, my vote would be to go all-or-nothing for KVM (x86), i.e. include everything
in KVM, or explicitly filter out KVM. I don't see how auto-formatting can be
useful if it's wildly inconsistent, e.g. if it works for some KVM for-each macros,
but clobbers others.
And I'm leaning towards filtering out KVM, because I'm not sure I want to encourage
use of auto-formatting. I can definitely see how it's useful, but so much of the
auto-formatting is just _awful_.
E.g. I ran it on a few KVM files and it generated changes like this
- intel_pmu_enable_fixed_counter_bits(pmu, INTEL_FIXED_0_KERNEL |
- INTEL_FIXED_0_USER |
- INTEL_FIXED_0_ENABLE_PMI);
+ intel_pmu_enable_fixed_counter_bits(
+ pmu, INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER |
+ INTEL_FIXED_0_ENABLE_PMI);
and
- intel_pmu_enable_fixed_counter_bits(pmu, ICL_FIXED_0_ADAPTIVE);
+ intel_pmu_enable_fixed_counter_bits(
+ pmu, ICL_FIXED_0_ADAPTIVE);
There are definitely plenty of good changes as well, but overall I find the results
to be very net negative. That's obviously highly subjective, and maybe there's
settings in clangd I can tweak to make things more to my liking, but my initial
reaction is that I don't want to actively encourage use of auto-formatting in KVM.
I think no matter what, any decision should be in a separate, dedicated patch/thread.
So for this series, I'll drop the .clang-format change when applying, assuming
nothing else pops that requires a new version.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
` (12 preceding siblings ...)
2025-10-16 18:08 ` [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Miguel Ojeda
@ 2025-10-20 16:33 ` Sean Christopherson
2025-10-21 5:59 ` Garg, Shivank
13 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2025-10-20 16:33 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra,
Vlastimil Babka
On Thu, 16 Oct 2025 10:28:41 -0700, Sean Christopherson wrote:
> Miguel, you got pulled in due to a one-line change to add a new iterator
> macros in .clang-format.
>
> Shivank's series to add support for NUMA-aware memory placement in
> guest_memfd. Based on kvm-x86/next.
>
> Note, Ackerley pointed out that we should probably have testing for the
> cpuset_do_page_mem_spread() behavior. I 100% agree, but I'm also a-ok
> merging without those tests.
>
> [...]
Applied to kvm-x86 gmem, sans the .clang-format change. Thanks!
[01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file"
https://github.com/kvm-x86/linux/commit/497b1dfbcacf
[02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode
https://github.com/kvm-x86/linux/commit/392dd9d9488a
[03/12] KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes
https://github.com/kvm-x86/linux/commit/a63ca4236e67
[04/12] KVM: guest_memfd: Add slab-allocated inode cache
https://github.com/kvm-x86/linux/commit/f609e89ae893
[05/12] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
https://github.com/kvm-x86/linux/commit/ed1ffa810bd6
[06/12] KVM: selftests: Define wrappers for common syscalls to assert success
https://github.com/kvm-x86/linux/commit/3223560c93eb
[07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default
https://github.com/kvm-x86/linux/commit/29dc539d74ab
[08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h
https://github.com/kvm-x86/linux/commit/2189d78269c5
[09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions
https://github.com/kvm-x86/linux/commit/fe7baebb99de
[10/12] KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems
https://github.com/kvm-x86/linux/commit/e698e89b3ed1
[11/12] KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support
https://github.com/kvm-x86/linux/commit/38ccc50ac037
[12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private
https://github.com/kvm-x86/linux/commit/e66438bb81c4
--
https://github.com/kvm-x86/linux/tree/next
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support
2025-10-20 16:33 ` Sean Christopherson
@ 2025-10-21 5:59 ` Garg, Shivank
0 siblings, 0 replies; 31+ messages in thread
From: Garg, Shivank @ 2025-10-21 5:59 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra, Vlastimil Babka
On 10/20/2025 10:03 PM, Sean Christopherson wrote:
> On Thu, 16 Oct 2025 10:28:41 -0700, Sean Christopherson wrote:
>> Miguel, you got pulled in due to a one-line change to add a new iterator
>> macros in .clang-format.
>>
>> Shivank's series to add support for NUMA-aware memory placement in
>> guest_memfd. Based on kvm-x86/next.
>>
>> Note, Ackerley pointed out that we should probably have testing for the
>> cpuset_do_page_mem_spread() behavior. I 100% agree, but I'm also a-ok
>> merging without those tests.
>>
>> [...]
>
> Applied to kvm-x86 gmem, sans the .clang-format change. Thanks!
>
> [01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file"
> https://github.com/kvm-x86/linux/commit/497b1dfbcacf
> [02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode
> https://github.com/kvm-x86/linux/commit/392dd9d9488a
> [03/12] KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes
> https://github.com/kvm-x86/linux/commit/a63ca4236e67
> [04/12] KVM: guest_memfd: Add slab-allocated inode cache
> https://github.com/kvm-x86/linux/commit/f609e89ae893
> [05/12] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy
> https://github.com/kvm-x86/linux/commit/ed1ffa810bd6
> [06/12] KVM: selftests: Define wrappers for common syscalls to assert success
> https://github.com/kvm-x86/linux/commit/3223560c93eb
> [07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default
> https://github.com/kvm-x86/linux/commit/29dc539d74ab
> [08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h
> https://github.com/kvm-x86/linux/commit/2189d78269c5
> [09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions
> https://github.com/kvm-x86/linux/commit/fe7baebb99de
> [10/12] KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems
> https://github.com/kvm-x86/linux/commit/e698e89b3ed1
> [11/12] KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support
> https://github.com/kvm-x86/linux/commit/38ccc50ac037
> [12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private
> https://github.com/kvm-x86/linux/commit/e66438bb81c4
>
> --
> https://github.com/kvm-x86/linux/tree/next
Hi Sean,
Thank you for handling all the changes in v12-v13. I appreciate you taking this on,
especially the refactoring, improving selftests and code clarity.
Thanks to everyone who provided support, reviews, and suggestions throughout the series.
Best regards,
Shivank
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode cache
2025-10-16 17:28 ` [PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode cache Sean Christopherson
@ 2025-10-27 11:06 ` Vlastimil Babka
2025-10-27 12:25 ` Garg, Shivank
0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2025-10-27 11:06 UTC (permalink / raw)
To: Sean Christopherson, Miguel Ojeda, Marc Zyngier, Oliver Upton,
Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
Shivank Garg, David Hildenbrand, Fuad Tabba, Ashish Kalra
On 10/16/25 19:28, Sean Christopherson wrote:
> From: Shivank Garg <shivankg@amd.com>
>
> Add a dedicated gmem_inode structure and a slab-allocated inode cache for
> guest memory backing, similar to how shmem handles inodes.
>
> This adds the necessary allocation/destruction functions and prepares
> for upcoming guest_memfd NUMA policy support changes. Using a dedicated
> structure will also allow for additional cleanups, e.g. to track flags in
> gmem_inode instead of i_private.
>
> Signed-off-by: Shivank Garg <shivankg@amd.com>
> Tested-by: Ashish Kalra <ashish.kalra@amd.com>
> [sean: s/kvm_gmem_inode_info/gmem_inode, name init_once()]
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> Tested-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Some nits below, not critical unless there's resubmit for other reasons:
> @@ -860,13 +917,31 @@ static int kvm_gmem_init_mount(void)
>
> int kvm_gmem_init(struct module *module)
> {
> + struct kmem_cache_args args = {
> + .align = 0,
This seems unnecessary as it's implicit.
> + .ctor = kvm_gmem_init_inode_once,
> + };
> + int ret;
> +
> kvm_gmem_fops.owner = module;
> + kvm_gmem_inode_cachep = kmem_cache_create("kvm_gmem_inode_cache",
> + sizeof(struct gmem_inode),
> + &args, SLAB_ACCOUNT);
> + if (!kvm_gmem_inode_cachep)
> + return -ENOMEM;
>
> - return kvm_gmem_init_mount();
> + ret = kvm_gmem_init_mount();
> + if (ret) {
> + kmem_cache_destroy(kvm_gmem_inode_cachep);
> + return ret;
> + }
> + return 0;
> }
>
> void kvm_gmem_exit(void)
> {
> kern_unmount(kvm_gmem_mnt);
> kvm_gmem_mnt = NULL;
> + rcu_barrier();
Is it because VFS can do call_rcu() with something that ends up with
kvm_gmem_free_inode()? Because nothing in this patch does that directly,
maybe worth a comment?
> + kmem_cache_destroy(kvm_gmem_inode_cachep);
> }
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode cache
2025-10-27 11:06 ` Vlastimil Babka
@ 2025-10-27 12:25 ` Garg, Shivank
0 siblings, 0 replies; 31+ messages in thread
From: Garg, Shivank @ 2025-10-27 12:25 UTC (permalink / raw)
To: Vlastimil Babka, Sean Christopherson, Miguel Ojeda, Marc Zyngier,
Oliver Upton, Paolo Bonzini
Cc: linux-arm-kernel, kvmarm, kvm, linux-kernel, Ackerley Tng,
David Hildenbrand, Fuad Tabba, Ashish Kalra
On 10/27/2025 4:36 PM, Vlastimil Babka wrote:
> On 10/16/25 19:28, Sean Christopherson wrote:
>> From: Shivank Garg <shivankg@amd.com>
>>
>> Add a dedicated gmem_inode structure and a slab-allocated inode cache for
>> guest memory backing, similar to how shmem handles inodes.
>>
>> This adds the necessary allocation/destruction functions and prepares
>> for upcoming guest_memfd NUMA policy support changes. Using a dedicated
>> structure will also allow for additional cleanups, e.g. to track flags in
>> gmem_inode instead of i_private.
>>
>> Signed-off-by: Shivank Garg <shivankg@amd.com>
>> Tested-by: Ashish Kalra <ashish.kalra@amd.com>
>> [sean: s/kvm_gmem_inode_info/gmem_inode, name init_once()]
>> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
>> Tested-by: Ackerley Tng <ackerleytng@google.com>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> Some nits below, not critical unless there's resubmit for other reasons:
Hi Vlastimil,
Thank you for the review.
>
>> @@ -860,13 +917,31 @@ static int kvm_gmem_init_mount(void)
>>
>> int kvm_gmem_init(struct module *module)
>> {
>> + struct kmem_cache_args args = {
>> + .align = 0,
>
> This seems unnecessary as it's implicit.
Ack
>> + .ctor = kvm_gmem_init_inode_once,
>> + };
>> + int ret;
>> +
>> kvm_gmem_fops.owner = module;
>> + kvm_gmem_inode_cachep = kmem_cache_create("kvm_gmem_inode_cache",
>> + sizeof(struct gmem_inode),
>> + &args, SLAB_ACCOUNT);
>> + if (!kvm_gmem_inode_cachep)
>> + return -ENOMEM;
>>
>> - return kvm_gmem_init_mount();
>> + ret = kvm_gmem_init_mount();
>> + if (ret) {
>> + kmem_cache_destroy(kvm_gmem_inode_cachep);
>> + return ret;
>> + }
>> + return 0;
>> }
>>
>> void kvm_gmem_exit(void)
>> {
>> kern_unmount(kvm_gmem_mnt);
>> kvm_gmem_mnt = NULL;
>> + rcu_barrier();
>
> Is it because VFS can do call_rcu() with something that ends up with
> kvm_gmem_free_inode()? Because nothing in this patch does that directly,
> maybe worth a comment?
Yes, exactly. I discovered this race condition while debugging a bug that
occurred during kvm_amd module unload after running gmem backed VM.
More details here:
https://lore.kernel.org/linux-mm/e7f7703d-fe76-4ab2-bef4-8d4c54da03ad@amd.com
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 427c0acee9d7..e1f69747fc84 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -969,7 +969,6 @@ static int kvm_gmem_init_mount(void)
int kvm_gmem_init(struct module *module)
{
struct kmem_cache_args args = {
- .align = 0,
.ctor = kvm_gmem_init_inode_once,
};
int ret;
@@ -993,6 +992,15 @@ void kvm_gmem_exit(void)
{
kern_unmount(kvm_gmem_mnt);
kvm_gmem_mnt = NULL;
+
+ /*
+ * Wait for all pending RCU callbacks to complete before destroying
+ * the inode cache. The VFS layer use call_rcu() during inode
+ * eviction (via evict_inodes() -> destroy_inode() -> call_rcu()),
+ * which eventually calls kvm_gmem_free_inode().
+ * We must ensure all such callbacks have finished before
+ * kmem_cache_destroy() to avoid issues with the kmem cache.
+ */
rcu_barrier();
kmem_cache_destroy(kvm_gmem_inode_cachep);
}
>
>> + kmem_cache_destroy(kvm_gmem_inode_cachep);
>> }
>
^ permalink raw reply related [flat|nested] 31+ messages in thread
end of thread, other threads:[~2025-10-27 12:25 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-16 17:28 [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 01/12] KVM: guest_memfd: Rename "struct kvm_gmem" to "struct gmem_file" Sean Christopherson
2025-10-17 9:31 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 02/12] KVM: guest_memfd: Add macro to iterate over gmem_files for a mapping/inode Sean Christopherson
2025-10-17 9:34 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 03/12] KVM: guest_memfd: Use guest mem inodes instead of anonymous inodes Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 04/12] KVM: guest_memfd: Add slab-allocated inode cache Sean Christopherson
2025-10-27 11:06 ` Vlastimil Babka
2025-10-27 12:25 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 05/12] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 06/12] KVM: selftests: Define wrappers for common syscalls to assert success Sean Christopherson
2025-10-17 9:51 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 07/12] KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by default Sean Christopherson
2025-10-17 9:53 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 08/12] KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.h Sean Christopherson
2025-10-17 10:12 ` Garg, Shivank
2025-10-16 17:28 ` [PATCH v13 09/12] KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitions Sean Christopherson
2025-10-17 10:35 ` Garg, Shivank
2025-10-17 16:18 ` Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 10/12] KVM: selftests: Add helpers to probe for NUMA support, and multi-node systems Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 11/12] KVM: selftests: Add guest_memfd tests for mmap and NUMA policy support Sean Christopherson
2025-10-16 17:28 ` [PATCH v13 12/12] KVM: guest_memfd: Add gmem_inode.flags field instead of using i_private Sean Christopherson
2025-10-17 11:01 ` Garg, Shivank
2025-10-16 18:08 ` [PATCH v13 00/12] KVM: guest_memfd: Add NUMA mempolicy support Miguel Ojeda
2025-10-16 20:28 ` Sean Christopherson
2025-10-16 21:07 ` Miguel Ojeda
2025-10-16 23:57 ` Ackerley Tng
2025-10-17 0:09 ` Miguel Ojeda
2025-10-17 16:49 ` Sean Christopherson
2025-10-20 16:33 ` Sean Christopherson
2025-10-21 5:59 ` Garg, Shivank
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).