[PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts
@ 2025-09-12 22:25 David Matlack
  2025-09-12 22:25 ` [PATCH 1/2] KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests David Matlack
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: David Matlack @ 2025-09-12 22:25 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: Alex Williamson, kvm, David Matlack

This series can be found on GitHub:

  https://github.com/dmatlack/linux/tree/kvm/selftests/vfio_pci_irq_test/v1

I'm sending this series out a little early. VFIO selftests has landed in
vfio/next[1] and is slated for 6.18, but has not yet been merged by
Linus. But I wanted to start the discussion of linking the VFIO
selftests library into KVM selftests so we can test VFIO-KVM
interactions.

To demostrate testing VFIO+KVM interactions, patch 2 contains a test to
exercise delivering VFIO device interrupts to vCPUs. This test is
heavily based on Sean's vfio_irq_test.c [2].

Running selftests with VFIO devices is slightly different than typical
KVM selftests since it requires a PCI device bound to vfio-pci. The VFIO
selftests have a helper script for setting up a device:

  $ ./run.sh -s -d 0000:6a:01.0
  + echo "vfio-pci" > /sys/bus/pci/devices/0000:6a:01.0/driver_override
  + echo "0000:6a:01.0" > /sys/bus/pci/drivers/vfio-pci/bind

  Dropping into /bin/bash with VFIO_SELFTESTS_BDF=0000:6a:01.0

The test can then be run and it will detect the $VFIO_SELFTESTS_BDF
environment variable.

  $ tools/testing/selftests/kvm/vfio_pci_irq_test
  $ tools/testing/selftests/kvm/vfio_pci_irq_test -v64 -d -x

You can also pass the BDF directly to the test on the command-line (i.e.
using run.sh and the environment variable is not a required).

  $ tools/testing/selftests/kvm/vfio_pci_irq_test 0000:6a:01.0
  $ tools/testing/selftests/kvm/vfio_pci_irq_test -v64 -d -x 0000:6a:01.0

In order to test with real MSIs generated by the device (-d option), the
device must also have a supported driver in
tools/testing/selftests/vfio/lib/drivers/. At the moment we only have
drivers for Intel CBDMA and Intel DSA devices, so I haven't been able to
test with -d on AMD yet.

Currently this test only supports x86_64, but it should be portable to
at least ARM in the future, so I optimistically placed it in the
top-level KVM selftests directory.

[1] https://github.com/awilliam/linux-vfio/tree/next
[2] https://lore.kernel.org/kvm/20250404193923.1413163-68-seanjc@google.com/

Cc: Sean Christopherson <seanjc@google.com>

David Matlack (2):
  KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests
  KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs

 tools/testing/selftests/kvm/Makefile.kvm      |   6 +-
 .../testing/selftests/kvm/vfio_pci_irq_test.c | 507 ++++++++++++++++++
 2 files changed, 512 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/vfio_pci_irq_test.c

base-commit: 093458c58f830d0a713fab0de037df5f0ce24fef
prerequisite-patch-id: 72dce9cd586ac36ea378354735d9fabe2f3c445e
prerequisite-patch-id: a8c7ccfd91ce3208f328e8af7b25c83bff8d464d
-- 
2.51.0.384.g4c02a37b29-goog

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests
  2025-09-12 22:25 [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts David Matlack
@ 2025-09-12 22:25 ` David Matlack
  2025-09-12 22:25 ` [PATCH 2/2] KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs David Matlack
  2025-10-27 15:47 ` [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts Sean Christopherson
  2 siblings, 0 replies; 8+ messages in thread
From: David Matlack @ 2025-09-12 22:25 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: Alex Williamson, kvm, David Matlack

Include libvfio.mk into the KVM selftests Makefile and link it into all
KVM selftests by adding it to LIBKVM_OBJS.

Note that KVM selftests build their own copy of sefltests/vfio/lib and
the resulting object files are placed in $(OUTPUT)/lib. This allows the
KVM and VFIO selftests to apply different CFLAGS when building without
conflicting with each other.

Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 2f7a0ed61452..ac283eddb66c 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -228,6 +228,7 @@ OVERRIDE_TARGETS = 1
 # which causes the environment variable to override the makefile).
 include ../lib.mk
 include ../cgroup/lib/libcgroup.mk
+include ../vfio/lib/libvfio.mk
 
 INSTALL_HDR_PATH = $(top_srcdir)/usr
 LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/
@@ -281,7 +282,9 @@ LIBKVM_S := $(filter %.S,$(LIBKVM))
 LIBKVM_C_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_C))
 LIBKVM_S_OBJ := $(patsubst %.S, $(OUTPUT)/%.o, $(LIBKVM_S))
 LIBKVM_STRING_OBJ := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBKVM_STRING))
-LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(LIBKVM_STRING_OBJ) $(LIBCGROUP_O)
+LIBKVM_OBJS = $(LIBKVM_C_OBJ) $(LIBKVM_S_OBJ) $(LIBKVM_STRING_OBJ)
+LIBKVM_OBJS += $(LIBCGROUP_O)
+LIBKVM_OBJS += $(LIBVFIO_O)
 SPLIT_TEST_GEN_PROGS := $(patsubst %, $(OUTPUT)/%, $(SPLIT_TESTS))
 SPLIT_TEST_GEN_OBJ := $(patsubst %, $(OUTPUT)/$(SRCARCH)/%.o, $(SPLIT_TESTS))
 
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs
  2025-09-12 22:25 [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts David Matlack
  2025-09-12 22:25 ` [PATCH 1/2] KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests David Matlack
@ 2025-09-12 22:25 ` David Matlack
  2025-10-27 16:52   ` Sean Christopherson
  2025-10-27 15:47 ` [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts Sean Christopherson
  2 siblings, 1 reply; 8+ messages in thread
From: David Matlack @ 2025-09-12 22:25 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: Alex Williamson, kvm, David Matlack

Add a new selftest called vfio_pci_irq_test that routes and delivers an
MSI from a vfio-pci device into a guest. This test builds on top of the
VFIO selftests library, which provides helpers for interacting with VFIO
devices and drivers for generating interrupts with specific devices.

This test sets up a configurable number of vCPUs in separate threads
that all spin in guest-mode or halt. Then the test round robin routes
the device's interrupt to different CPUs, triggers it, and then verifies
the guest received it. The test supports several options to enable
affinitizing the host IRQ handler to different CPUs, pinning vCPU
threads to different CPUs, and more.

This test also measure and reports the number of times the device IRQ
was handled by the host. This can be used to confirm whether
device-posted interrupts are working as expected.

Running this test requires a PCI device bound to the vfio-pci driver,
and then passing the BDF of the device to the test, e.g.:

  $ ./vfio_pci_irq_test 0000:6a:01.0

To run the test with real device-sent MSIs (-d option), the PCI device
must also have a supported driver in
tools/testing/selftests/vfio/lib/drivers/.

This test only supports x86_64 for now, but can be ported to other
architectures in the future.

Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvm/20250404193923.1413163-68-seanjc@google.com/
Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/vfio_pci_irq_test.c | 507 ++++++++++++++++++
 2 files changed, 508 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/vfio_pci_irq_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index ac283eddb66c..fc1fb91a6810 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -148,6 +148,7 @@ TEST_GEN_PROGS_x86 += rseq_test
 TEST_GEN_PROGS_x86 += steal_time
 TEST_GEN_PROGS_x86 += system_counter_offset_test
 TEST_GEN_PROGS_x86 += pre_fault_memory_test
+TEST_GEN_PROGS_x86 += vfio_pci_irq_test
 
 # Compiled outputs used by test targets
 TEST_GEN_PROGS_EXTENDED_x86 += x86/nx_huge_pages_test
diff --git a/tools/testing/selftests/kvm/vfio_pci_irq_test.c b/tools/testing/selftests/kvm/vfio_pci_irq_test.c
new file mode 100644
index 000000000000..ed6baa8f9d74
--- /dev/null
+++ b/tools/testing/selftests/kvm/vfio_pci_irq_test.c
@@ -0,0 +1,507 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "kvm_util.h"
+#include "test_util.h"
+#include "apic.h"
+#include "processor.h"
+
+#include <pthread.h>
+#include <ctype.h>
+#include <time.h>
+#include <linux/vfio.h>
+#include <linux/sizes.h>
+#include <sys/sysinfo.h>
+
+#include <vfio_util.h>
+
+static bool x2apic = true;
+static bool done;
+static bool block;
+
+static bool guest_ready_for_irqs[KVM_MAX_VCPUS];
+static bool guest_received_irq[KVM_MAX_VCPUS];
+static bool guest_received_nmi[KVM_MAX_VCPUS];
+
+static pid_t vcpu_tids[KVM_MAX_VCPUS];
+
+#define TIMEOUT_NS (2ULL * 1000 * 1000 * 1000)
+
+#define READ_FROM_GUEST(_vm, _variable) ({		\
+	sync_global_from_guest(_vm, _variable);		\
+	READ_ONCE(_variable);				\
+})
+
+#define WRITE_TO_GUEST(_vm, _variable, _value) do {	\
+	WRITE_ONCE(_variable, _value);			\
+	sync_global_to_guest(_vm, _variable);		\
+} while (0)
+
+static u32 guest_get_vcpu_id(void)
+{
+	if (x2apic)
+		return x2apic_read_reg(APIC_ID);
+	else
+		return xapic_read_reg(APIC_ID) >> 24;
+}
+
+static void guest_enable_interrupts(void)
+{
+	if (x2apic)
+		x2apic_enable();
+	else
+		xapic_enable();
+
+	sti_nop();
+}
+
+static void guest_irq_handler(struct ex_regs *regs)
+{
+	WRITE_ONCE(guest_received_irq[guest_get_vcpu_id()], true);
+
+	if (x2apic)
+		x2apic_write_reg(APIC_EOI, 0);
+	else
+		xapic_write_reg(APIC_EOI, 0);
+}
+
+static void guest_nmi_handler(struct ex_regs *regs)
+{
+	WRITE_ONCE(guest_received_nmi[guest_get_vcpu_id()], true);
+}
+
+static void guest_code(void)
+{
+	guest_enable_interrupts();
+	WRITE_ONCE(guest_ready_for_irqs[guest_get_vcpu_id()], true);
+
+	while (!READ_ONCE(done)) {
+		if (block)
+			hlt();
+	}
+
+	GUEST_DONE();
+}
+
+static void *vcpu_thread_main(void *arg)
+{
+	struct kvm_vcpu *vcpu = arg;
+	struct ucall uc;
+
+	WRITE_ONCE(vcpu_tids[vcpu->id], syscall(__NR_gettid));
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_EQ(UCALL_DONE, get_ucall(vcpu, &uc));
+
+	return NULL;
+}
+
+static int get_cpu(struct kvm_vcpu *vcpu)
+{
+	pid_t tid = vcpu_tids[vcpu->id];
+	cpu_set_t cpus;
+	int cpu = -1;
+	int i, ret;
+
+	ret = sched_getaffinity(tid, sizeof(cpus), &cpus);
+	TEST_ASSERT(ret == 0, "sched_getaffinity() failed");
+
+	for (i = 0; i < get_nprocs(); i++) {
+		if (!CPU_ISSET(i, &cpus))
+			continue;
+
+		if (cpu != -1) {
+			cpu = i;
+		} else {
+			/* vCPU is pinned to multiple CPUs */
+			return -1;
+		}
+	}
+
+	return cpu;
+}
+
+static void pin_vcpu_threads(int nr_vcpus, int start_cpu, cpu_set_t *available_cpus)
+{
+	const size_t size = sizeof(cpu_set_t);
+	int nr_cpus, cpu, vcpu_index = 0;
+	cpu_set_t target_cpu;
+	int r;
+
+	nr_cpus = get_nprocs();
+	CPU_ZERO(&target_cpu);
+
+	for (cpu = start_cpu;; cpu = (cpu + 1) % nr_cpus) {
+		if (vcpu_index == nr_vcpus)
+			break;
+
+		if (!CPU_ISSET(cpu, available_cpus))
+			continue;
+
+		CPU_SET(cpu, &target_cpu);
+
+		r = sched_setaffinity(vcpu_tids[vcpu_index], size, &target_cpu);
+		TEST_ASSERT(r == 0, "sched_setaffinity() failed (cpu: %d)", cpu);
+
+		CPU_CLR(cpu, &target_cpu);
+
+		vcpu_index++;
+	}
+}
+
+static FILE *open_proc_interrupts(void)
+{
+	FILE *fp;
+
+	fp = fopen("/proc/interrupts", "r");
+	TEST_ASSERT(fp, "fopen(/proc/interrupts) failed");
+
+	return fp;
+}
+
+static int get_irq_number(const char *device_bdf, int msi)
+{
+	char search_string[64];
+	char line[4096];
+	int irq = -1;
+	FILE *fp;
+
+	fp = open_proc_interrupts();
+
+	snprintf(search_string, sizeof(search_string), "vfio-msix[%d]", msi);
+
+	while (fgets(line, sizeof(line), fp)) {
+		if (strstr(line, device_bdf) && strstr(line, search_string)) {
+			TEST_ASSERT_EQ(1, sscanf(line, "%d:", &irq));
+			break;
+		}
+	}
+
+	fclose(fp);
+
+	TEST_ASSERT(irq != -1, "Failed to locate IRQ for %s %s", device_bdf, search_string);
+	return irq;
+}
+
+static int parse_interrupt_count(char *token)
+{
+	char *c;
+
+	for (c = token; *c; c++) {
+		if (!isdigit(*c))
+			return 0;
+	}
+
+	return atoi_non_negative("interrupt count", token);
+}
+
+static u64 __get_irq_count(const char *search_string)
+{
+	u64 total_count = 0;
+	char line[4096];
+	FILE *fp;
+
+	fp = open_proc_interrupts();
+
+	while (fgets(line, sizeof(line), fp)) {
+		char *token = strtok(line, " ");
+
+		if (strcmp(token, search_string))
+			continue;
+
+		while ((token = strtok(NULL, " ")))
+			total_count += parse_interrupt_count(token);
+
+		break;
+	}
+
+	fclose(fp);
+	return total_count;
+}
+
+static u64 get_irq_count(int irq)
+{
+	char search_string[32];
+
+	snprintf(search_string, sizeof(search_string), "%d:", irq);
+	return __get_irq_count(search_string);
+}
+
+static void kvm_clear_gsi_routes(struct kvm_vm *vm)
+{
+	struct kvm_irq_routing routes = {};
+
+	vm_ioctl(vm, KVM_SET_GSI_ROUTING, &routes);
+}
+
+static void kvm_route_msi(struct kvm_vm *vm, u32 gsi, struct kvm_vcpu *vcpu,
+			  u8 vector, bool do_nmi)
+{
+	u8 buf[sizeof(struct kvm_irq_routing) + sizeof(struct kvm_irq_routing_entry)] = {};
+	struct kvm_irq_routing *routes = (void *)&buf;
+
+	routes->nr = 1;
+	routes->entries[0].gsi = gsi;
+	routes->entries[0].type = KVM_IRQ_ROUTING_MSI;
+	routes->entries[0].u.msi.address_lo = 0xFEE00000 | (vcpu->id << 12);
+	routes->entries[0].u.msi.data = do_nmi ? NMI_VECTOR | (4 << 8) : vector;
+
+	vm_ioctl(vm, KVM_SET_GSI_ROUTING, routes);
+}
+
+static int setup_msi(struct vfio_pci_device *device, bool use_device_msi)
+{
+	const int flags = MAP_SHARED | MAP_ANONYMOUS;
+	const int prot = PROT_READ | PROT_WRITE;
+	struct vfio_dma_region *region;
+
+	if (use_device_msi) {
+		/* A driver is required to generate an MSI. */
+		TEST_REQUIRE(device->driver.ops);
+
+		/* Set up a DMA-able region for the driver to use. */
+		region = &device->driver.region;
+		region->iova = 0;
+		region->size = SZ_2M;
+		region->vaddr = mmap(NULL, region->size, prot, flags, -1, 0);
+		TEST_ASSERT(region->vaddr != MAP_FAILED, "mmap() failed\n");
+		vfio_pci_dma_map(device, region);
+
+		vfio_pci_driver_init(device);
+
+		return device->driver.msi;
+	}
+
+	TEST_REQUIRE(device->msix_info.count > 0);
+	vfio_pci_msix_enable(device, 0, 1);
+	return 0;
+}
+
+static void send_msi(struct vfio_pci_device *device, bool use_device_msi, int msi)
+{
+	if (use_device_msi) {
+		TEST_ASSERT_EQ(msi, device->driver.msi);
+		vfio_pci_driver_send_msi(device);
+	} else {
+		vfio_pci_irq_trigger(device, VFIO_PCI_MSIX_IRQ_INDEX, msi);
+	}
+}
+
+static void help(const char *name)
+{
+	printf("Usage: %s [-a] [-b] [-d] [-e] [-h] [-i nr_irqs] [-n] [-p] [-v nr_vcpus] [-x] segment:bus:device.function\n",
+	       name);
+	printf("\n");
+	printf("  -a: Randomly affinitize the device IRQ to different CPUs\n"
+	       "      throughout the test.\n");
+	printf("  -b: Block vCPUs (e.g. HLT) instead of spinning in guest-mode\n");
+	printf("  -d: Use the device to trigger the IRQ instead of emulating\n"
+	       "      it with an eventfd write.\n");
+	printf("  -e: Destroy and recreate KVM's GSI routing table in between\n"
+	       "      some interrupts.\n");
+	printf("  -i: The number of IRQs to generate during the test.\n");
+	printf("  -n: Route some of the device interrupts to be delivered as\n"
+	       "      an NMI into the guest.\n");
+	printf("  -p: Pin vCPU threads to random pCPUs throughout the test.\n");
+	printf("  -v: Set the number of vCPUs that the test should create.\n"
+	       "      Interrupts will be round-robined among vCPUs.\n");
+	printf("  -x: Use xAPIC mode instead of x2APIC mode in the guest.\n");
+	printf("\n");
+	exit(KSFT_FAIL);
+}
+
+int main(int argc, char **argv)
+{
+	/* Random non-reserved vector and GSI to use for the device IRQ */
+	const u8 vector = 0xe0;
+	const u32 gsi = 32;
+
+	/* Test configuration (overridable by command line flags). */
+	bool use_device_msi = false, irq_affinity = false, pin_vcpus = false;
+	bool empty = false, nmi = false;
+	int nr_irqs = 1000;
+	int nr_vcpus = 1;
+
+	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+	pthread_t vcpu_threads[KVM_MAX_VCPUS];
+	u64 irq_count, pin_count, piw_count;
+	struct vfio_pci_device *device;
+	cpu_set_t available_cpus;
+	const char *device_bdf;
+	FILE *irq_affinity_fp;
+	int i, j, c, msi, irq;
+	struct kvm_vm *vm;
+	int irq_cpu;
+	int ret;
+
+	device_bdf = vfio_selftests_get_bdf(&argc, argv);
+
+	while ((c = getopt(argc, argv, "abdehi:npv:x")) != -1) {
+		switch (c) {
+		case 'a':
+			irq_affinity = true;
+			break;
+		case 'b':
+			block = true;
+			break;
+		case 'd':
+			use_device_msi = true;
+			break;
+		case 'e':
+			empty = true;
+			break;
+		case 'i':
+			nr_irqs = atoi_positive("Number of IRQs", optarg);
+			break;
+		case 'n':
+			nmi = true;
+			break;
+		case 'p':
+			pin_vcpus = true;
+			break;
+		case 'v':
+			nr_vcpus = atoi_positive("nr_vcpus", optarg);
+			break;
+		case 'x':
+			x2apic = false;
+			break;
+		case 'h':
+		default:
+			help(argv[0]);
+		}
+	}
+
+	vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus);
+	vm_install_exception_handler(vm, vector, guest_irq_handler);
+	vm_install_exception_handler(vm, NMI_VECTOR, guest_nmi_handler);
+
+	if (!x2apic)
+		virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
+
+	device = vfio_pci_device_init(device_bdf, default_iommu_mode);
+	msi = setup_msi(device, use_device_msi);
+	irq = get_irq_number(device_bdf, msi);
+
+	irq_count = get_irq_count(irq);
+	pin_count = __get_irq_count("PIN:");
+	piw_count = __get_irq_count("PIW:");
+
+	printf("%s %s MSI-X[%d] (IRQ-%d) %d times\n",
+	       use_device_msi ? "Triggering" : "Notifying the eventfd for",
+	       device_bdf, msi, irq, nr_irqs);
+
+	kvm_assign_irqfd(vm, gsi, device->msi_eventfds[msi]);
+
+	sync_global_to_guest(vm, x2apic);
+	sync_global_to_guest(vm, block);
+
+	for (i = 0; i < nr_vcpus; i++)
+		pthread_create(&vcpu_threads[i], NULL, vcpu_thread_main, vcpus[i]);
+
+	for (i = 0; i < nr_vcpus; i++) {
+		struct kvm_vcpu *vcpu = vcpus[i];
+
+		while (!READ_FROM_GUEST(vm, guest_ready_for_irqs[vcpu->id]))
+			continue;
+	}
+
+	if (pin_vcpus) {
+		ret = sched_getaffinity(vcpu_tids[0], sizeof(available_cpus), &available_cpus);
+		TEST_ASSERT(ret == 0, "sched_getaffinity() failed");
+
+		if (nr_vcpus > CPU_COUNT(&available_cpus)) {
+			printf("There are more vCPUs than pCPUs; refusing to pin.\n");
+			pin_vcpus = false;
+		}
+	}
+
+	if (irq_affinity) {
+		char path[PATH_MAX];
+
+		snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
+		irq_affinity_fp = fopen(path, "w");
+		TEST_ASSERT(irq_affinity_fp, "fopen(%s) failed", path);
+	}
+
+	/* Set a consistent seed so that test are repeatable. */
+	srand(0);
+
+	for (i = 0; i < nr_irqs; i++) {
+		struct kvm_vcpu *vcpu = vcpus[i % nr_vcpus];
+		const bool do_nmi = nmi && (i & BIT(2));
+		const bool do_empty = empty && (i & BIT(3));
+		struct timespec start;
+
+		if (do_empty)
+			kvm_clear_gsi_routes(vm);
+
+		kvm_route_msi(vm, gsi, vcpu, vector, do_nmi);
+
+		if (irq_affinity && vcpu->id == 0) {
+			irq_cpu = rand() % get_nprocs();
+
+			ret = fprintf(irq_affinity_fp, "%d\n", irq_cpu);
+			TEST_ASSERT(ret > 0, "Failed to affinitize IRQ-%d to CPU %d", irq, irq_cpu);
+		}
+
+		if (pin_vcpus && vcpu->id == 0)
+			pin_vcpu_threads(nr_vcpus, rand() % get_nprocs(), &available_cpus);
+
+		for (j = 0; j < nr_vcpus; j++) {
+			TEST_ASSERT_EQ(READ_FROM_GUEST(vm, guest_received_irq[vcpu->id]), false);
+			TEST_ASSERT_EQ(READ_FROM_GUEST(vm, guest_received_nmi[vcpu->id]), false);
+		}
+
+		send_msi(device, use_device_msi, msi);
+
+		clock_gettime(CLOCK_MONOTONIC, &start);
+		for (;;) {
+			if (do_nmi && READ_FROM_GUEST(vm, guest_received_nmi[vcpu->id]))
+				break;
+
+			if (!do_nmi && READ_FROM_GUEST(vm, guest_received_irq[vcpu->id]))
+				break;
+
+			if (timespec_to_ns(timespec_elapsed(start)) > TIMEOUT_NS) {
+				printf("Timeout waiting for interrupt!\n");
+				printf("  vCPU: %d\n", vcpu->id);
+				printf("  do_nmi: %d\n", do_nmi);
+				printf("  do_empty: %d\n", do_empty);
+				if (irq_affinity)
+					printf("  irq_cpu: %d\n", irq_cpu);
+				if (pin_vcpus)
+					printf("  vcpu_cpu: %d\n", get_cpu(vcpu));
+
+				TEST_FAIL("vCPU never received IRQ!\n");
+			}
+		}
+
+		if (do_nmi)
+			WRITE_TO_GUEST(vm, guest_received_nmi[vcpu->id], false);
+		else
+			WRITE_TO_GUEST(vm, guest_received_irq[vcpu->id], false);
+	}
+
+	WRITE_TO_GUEST(vm, done, true);
+
+	for (i = 0; i < nr_vcpus; i++) {
+		if (block) {
+			kvm_route_msi(vm, gsi, vcpus[i], vector, false);
+			send_msi(device, false, msi);
+		}
+
+		pthread_join(vcpu_threads[i], NULL);
+	}
+
+	if (irq_affinity)
+		fclose(irq_affinity_fp);
+
+	printf("Host interrupts handled:\n");
+	printf("  IRQ-%d: %lu\n", irq, get_irq_count(irq) - irq_count);
+	printf("  Posted-interrupt notification events: %lu\n",
+	       __get_irq_count("PIN:") - pin_count);
+	printf("  Posted-interrupt wakeup events: %lu\n",
+	       __get_irq_count("PIW:") - piw_count);
+
+	vfio_pci_device_cleanup(device);
+
+	return 0;
+}
-- 
2.51.0.384.g4c02a37b29-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts
  2025-09-12 22:25 [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts David Matlack
  2025-09-12 22:25 ` [PATCH 1/2] KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests David Matlack
  2025-09-12 22:25 ` [PATCH 2/2] KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs David Matlack
@ 2025-10-27 15:47 ` Sean Christopherson
  2025-10-27 16:03   ` David Matlack
  2 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-10-27 15:47 UTC (permalink / raw)
  To: David Matlack; +Cc: Paolo Bonzini, Alex Williamson, kvm

On Fri, Sep 12, 2025, David Matlack wrote:
> This series can be found on GitHub:
> 
>   https://github.com/dmatlack/linux/tree/kvm/selftests/vfio_pci_irq_test/v1

...

> David Matlack (2):
>   KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests
>   KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs
> 
>  tools/testing/selftests/kvm/Makefile.kvm      |   6 +-
>  .../testing/selftests/kvm/vfio_pci_irq_test.c | 507 ++++++++++++++++++
>  2 files changed, 512 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/kvm/vfio_pci_irq_test.c
> 
> 
> base-commit: 093458c58f830d0a713fab0de037df5f0ce24fef
> prerequisite-patch-id: 72dce9cd586ac36ea378354735d9fabe2f3c445e
> prerequisite-patch-id: a8c7ccfd91ce3208f328e8af7b25c83bff8d464d

Please don't base series on prerequisite patches unless there is a hard dependency.

  ffdc6a9d6d9eb20c855404e2c09b6b2ea25b4a04 KVM: selftests: Rename $(ARCH_DIR) to $(SRCARCH)
  9dc0c1189dfa1f4eef3856445fa72c9fb1e14d1c Revert "KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIR"

By all means, do testing on top of such patches if that's what your environment
effectively dictates, but rebase/drop such prereqs before posting, as they add
noise and friction.  E.g. these patches don't apply cleanly on kvm-x86/next due
to the prereqs.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts
  2025-10-27 15:47 ` [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts Sean Christopherson
@ 2025-10-27 16:03   ` David Matlack
  0 siblings, 0 replies; 8+ messages in thread
From: David Matlack @ 2025-10-27 16:03 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Alex Williamson, kvm

On Mon, Oct 27, 2025 at 8:47 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Sep 12, 2025, David Matlack wrote:
> > This series can be found on GitHub:
> >
> >   https://github.com/dmatlack/linux/tree/kvm/selftests/vfio_pci_irq_test/v1
>
> ...
>
> > David Matlack (2):
> >   KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests
> >   KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs
> >
> >  tools/testing/selftests/kvm/Makefile.kvm      |   6 +-
> >  .../testing/selftests/kvm/vfio_pci_irq_test.c | 507 ++++++++++++++++++
> >  2 files changed, 512 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/testing/selftests/kvm/vfio_pci_irq_test.c
> >
> >
> > base-commit: 093458c58f830d0a713fab0de037df5f0ce24fef
> > prerequisite-patch-id: 72dce9cd586ac36ea378354735d9fabe2f3c445e
> > prerequisite-patch-id: a8c7ccfd91ce3208f328e8af7b25c83bff8d464d
>
> Please don't base series on prerequisite patches unless there is a hard dependency.
>
>   ffdc6a9d6d9eb20c855404e2c09b6b2ea25b4a04 KVM: selftests: Rename $(ARCH_DIR) to $(SRCARCH)
>   9dc0c1189dfa1f4eef3856445fa72c9fb1e14d1c Revert "KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIR"
>
> By all means, do testing on top of such patches if that's what your environment
> effectively dictates, but rebase/drop such prereqs before posting, as they add
> noise and friction.  E.g. these patches don't apply cleanly on kvm-x86/next due
> to the prereqs.

Makes sense. I will send patches applied without prereqs specific to
my environment next time.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs
  2025-09-12 22:25 ` [PATCH 2/2] KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs David Matlack
@ 2025-10-27 16:52   ` Sean Christopherson
  2025-10-27 17:46     ` David Matlack
  0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-10-27 16:52 UTC (permalink / raw)
  To: David Matlack; +Cc: Paolo Bonzini, Alex Williamson, kvm

On Fri, Sep 12, 2025, David Matlack wrote: 
> Add a new selftest called vfio_pci_irq_test that routes and delivers an
> MSI from a vfio-pci device into a guest. This test builds on top of the
> VFIO selftests library, which provides helpers for interacting with VFIO
> devices and drivers for generating interrupts with specific devices.
> 
> This test sets up a configurable number of vCPUs in separate threads
> that all spin in guest-mode or halt. Then the test round robin routes
> the device's interrupt to different CPUs, triggers it, and then verifies
> the guest received it. The test supports several options to enable
> affinitizing the host IRQ handler to different CPUs, pinning vCPU
> threads to different CPUs, and more.
> 
> This test also measure and reports the number of times the device IRQ
> was handled by the host. This can be used to confirm whether
> device-posted interrupts are working as expected.
> 
> Running this test requires a PCI device bound to the vfio-pci driver,
> and then passing the BDF of the device to the test, e.g.:
> 
>   $ ./vfio_pci_irq_test 0000:6a:01.0
> 
> To run the test with real device-sent MSIs (-d option), the PCI device
> must also have a supported driver in
> tools/testing/selftests/vfio/lib/drivers/.
> 
> This test only supports x86_64 for now, but can be ported to other
> architectures in the future.

Can it though?  There are bits and pieces that can be reused, but this test is
x86 through and through.

> Suggested-by: Sean Christopherson <seanjc@google.com>
> Link: https://lore.kernel.org/kvm/20250404193923.1413163-68-seanjc@google.com/
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  tools/testing/selftests/kvm/Makefile.kvm      |   1 +
>  .../testing/selftests/kvm/vfio_pci_irq_test.c | 507 ++++++++++++++++++

Please break this into multiple patches, e.g. a "basic" patch and and then roughly
one per "advanced" command line option.

There is a _lot_ going on here, with very little documentation, e.g. the changelog
just says:

  The test supports several options to enable affinitizing the host IRQ handler
  to different CPUs, pinning vCPU threads to different CPUs, and more. 

which is rather useless for readers that aren't already familiar with much of
what is being tested, and why.

>  2 files changed, 508 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/vfio_pci_irq_test.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index ac283eddb66c..fc1fb91a6810 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -148,6 +148,7 @@ TEST_GEN_PROGS_x86 += rseq_test
>  TEST_GEN_PROGS_x86 += steal_time
>  TEST_GEN_PROGS_x86 += system_counter_offset_test
>  TEST_GEN_PROGS_x86 += pre_fault_memory_test
> +TEST_GEN_PROGS_x86 += vfio_pci_irq_test
>  
>  # Compiled outputs used by test targets
>  TEST_GEN_PROGS_EXTENDED_x86 += x86/nx_huge_pages_test
> diff --git a/tools/testing/selftests/kvm/vfio_pci_irq_test.c b/tools/testing/selftests/kvm/vfio_pci_irq_test.c
> new file mode 100644
> index 000000000000..ed6baa8f9d74
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/vfio_pci_irq_test.c
> @@ -0,0 +1,507 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "kvm_util.h"
> +#include "test_util.h"
> +#include "apic.h"
> +#include "processor.h"
> +
> +#include <pthread.h>
> +#include <ctype.h>
> +#include <time.h>
> +#include <linux/vfio.h>
> +#include <linux/sizes.h>
> +#include <sys/sysinfo.h>
> +
> +#include <vfio_util.h>
> +
> +static bool x2apic = true;
> +static bool done;
> +static bool block;
> +
> +static bool guest_ready_for_irqs[KVM_MAX_VCPUS];
> +static bool guest_received_irq[KVM_MAX_VCPUS];
> +static bool guest_received_nmi[KVM_MAX_VCPUS];
> +
> +static pid_t vcpu_tids[KVM_MAX_VCPUS];

s/tids/pids?

> +#define TIMEOUT_NS (2ULL * 1000 * 1000 * 1000)

The timeout should be configurable via command line.

> +#define READ_FROM_GUEST(_vm, _variable) ({		\
> +	sync_global_from_guest(_vm, _variable);		\
> +	READ_ONCE(_variable);				\
> +})
> +
> +#define WRITE_TO_GUEST(_vm, _variable, _value) do {	\
> +	WRITE_ONCE(_variable, _value);			\
> +	sync_global_to_guest(_vm, _variable);		\
> +} while (0)

These belong in a separate patch, and in tools/testing/selftests/kvm/include/kvm_util.h.

> +
> +static u32 guest_get_vcpu_id(void)
> +{
> +	if (x2apic)
> +		return x2apic_read_reg(APIC_ID);
> +	else
> +		return xapic_read_reg(APIC_ID) >> 24;
> +}
> +
> +static void guest_enable_interrupts(void)

This is a misleading name, e.g. I would expect it to _just_ be sti_nop().  If the
intent is to provide a hook for a potential non-x86 implementation, then it should
probably be split into guest_arch_test_setup() and guest_arch_irq_enable() or so
(to align with the kernel's local_irq_{dis,en}able()).

As is, I would just omit the helper.

> +{
> +	if (x2apic)
> +		x2apic_enable();
> +	else
> +		xapic_enable();
> +
> +	sti_nop();
> +}
> +
> +static void guest_irq_handler(struct ex_regs *regs)
> +{
> +	WRITE_ONCE(guest_received_irq[guest_get_vcpu_id()], true);

Hmm, using APID ID works, but I don't like the hidden dependency on the library
using ascending IDs starting from '0'.  This would also be a good opportunity to
improve the core infrastructure.

Rather than use APIC IDs, vm_arch_vcpu_add() could set MSR_TSC_AUX if RDTSCP or
RDPID is supported, and then make guest_get_vcpu_id() a common API that uses
RDPID or RDTSCP (prefer RDPID when possible).

Then tests that want to use guest_get_vcpu_id() can do a TEST_REQUIRE().  RDTSCP
exists at least as far back as Haswell, so I don't think it's unreasonable to
start depending on MSR_TSC_AUX.

Hmm, or we could steal an idea from the kernel and use the LDT to stash info in
a segment limit as a fallback.

> +
> +	if (x2apic)
> +		x2apic_write_reg(APIC_EOI, 0);
> +	else
> +		xapic_write_reg(APIC_EOI, 0);
> +}
> +
> +static void guest_nmi_handler(struct ex_regs *regs)
> +{
> +	WRITE_ONCE(guest_received_nmi[guest_get_vcpu_id()], true);
> +}
> +
> +static void guest_code(void)
> +{
> +	guest_enable_interrupts();
> +	WRITE_ONCE(guest_ready_for_irqs[guest_get_vcpu_id()], true);
> +
> +	while (!READ_ONCE(done)) {
> +		if (block)
> +			hlt();

		else
			cpu_relax();

> +	}
> +
> +	GUEST_DONE();
> +}
> +
> +static void *vcpu_thread_main(void *arg)
> +{
> +	struct kvm_vcpu *vcpu = arg;
> +	struct ucall uc;
> +
> +	WRITE_ONCE(vcpu_tids[vcpu->id], syscall(__NR_gettid));

Please add wrapper in tools/testing/selftests/kvm/include/kvm_syscalls.h.

> +
> +	vcpu_run(vcpu);
> +	TEST_ASSERT_EQ(UCALL_DONE, get_ucall(vcpu, &uc));
> +
> +	return NULL;
> +}
> +
> +static int get_cpu(struct kvm_vcpu *vcpu)
> +{
> +	pid_t tid = vcpu_tids[vcpu->id];
> +	cpu_set_t cpus;
> +	int cpu = -1;
> +	int i, ret;
> +
> +	ret = sched_getaffinity(tid, sizeof(cpus), &cpus);
> +	TEST_ASSERT(ret == 0, "sched_getaffinity() failed");
> +
> +	for (i = 0; i < get_nprocs(); i++) {
> +		if (!CPU_ISSET(i, &cpus))
> +			continue;
> +
> +		if (cpu != -1) {
> +			cpu = i;
> +		} else {
> +			/* vCPU is pinned to multiple CPUs */
> +			return -1;
> +		}
> +	}
> +
> +	return cpu;
> +}
> +
> +static void pin_vcpu_threads(int nr_vcpus, int start_cpu, cpu_set_t *available_cpus)
> +{
> +	const size_t size = sizeof(cpu_set_t);
> +	int nr_cpus, cpu, vcpu_index = 0;
> +	cpu_set_t target_cpu;
> +	int r;
> +
> +	nr_cpus = get_nprocs();

Generally speaking, KVM selftests try to avoid affining tasks to CPUs that are
outside of the original affinity list.  See various usage of sched_getaffinity().


> +	CPU_ZERO(&target_cpu);
> +
> +	for (cpu = start_cpu;; cpu = (cpu + 1) % nr_cpus) {
> +		if (vcpu_index == nr_vcpus)
> +			break;
> +
> +		if (!CPU_ISSET(cpu, available_cpus))
> +			continue;
> +
> +		CPU_SET(cpu, &target_cpu);
> +
> +		r = sched_setaffinity(vcpu_tids[vcpu_index], size, &target_cpu);
> +		TEST_ASSERT(r == 0, "sched_setaffinity() failed (cpu: %d)", cpu);
> +
> +		CPU_CLR(cpu, &target_cpu);
> +
> +		vcpu_index++;
> +	}
> +}
> +
> +static FILE *open_proc_interrupts(void)
> +{
> +	FILE *fp;
> +
> +	fp = fopen("/proc/interrupts", "r");
> +	TEST_ASSERT(fp, "fopen(/proc/interrupts) failed");

open_path_or_exit()?

> +
> +	return fp;
> +}

And all of these /proc helpers belong in library code.

> +static int get_irq_number(const char *device_bdf, int msi)
> +{
> +	char search_string[64];
> +	char line[4096];
> +	int irq = -1;
> +	FILE *fp;
> +
> +	fp = open_proc_interrupts();
> +
> +	snprintf(search_string, sizeof(search_string), "vfio-msix[%d]", msi);
> +
> +	while (fgets(line, sizeof(line), fp)) {
> +		if (strstr(line, device_bdf) && strstr(line, search_string)) {
> +			TEST_ASSERT_EQ(1, sscanf(line, "%d:", &irq));
> +			break;
> +		}
> +	}
> +
> +	fclose(fp);
> +
> +	TEST_ASSERT(irq != -1, "Failed to locate IRQ for %s %s", device_bdf, search_string);
> +	return irq;
> +}
> +
> +static int parse_interrupt_count(char *token)
> +{
> +	char *c;
> +
> +	for (c = token; *c; c++) {
> +		if (!isdigit(*c))
> +			return 0;
> +	}
> +
> +	return atoi_non_negative("interrupt count", token);
> +}
> +
> +static u64 __get_irq_count(const char *search_string)
> +{
> +	u64 total_count = 0;
> +	char line[4096];
> +	FILE *fp;
> +
> +	fp = open_proc_interrupts();
> +
> +	while (fgets(line, sizeof(line), fp)) {
> +		char *token = strtok(line, " ");
> +
> +		if (strcmp(token, search_string))
> +			continue;
> +
> +		while ((token = strtok(NULL, " ")))
> +			total_count += parse_interrupt_count(token);
> +
> +		break;
> +	}
> +
> +	fclose(fp);
> +	return total_count;
> +}
> +
> +static u64 get_irq_count(int irq)
> +{
> +	char search_string[32];
> +
> +	snprintf(search_string, sizeof(search_string), "%d:", irq);
> +	return __get_irq_count(search_string);
> +}
> +
> +static void kvm_clear_gsi_routes(struct kvm_vm *vm)
> +{
> +	struct kvm_irq_routing routes = {};
> +
> +	vm_ioctl(vm, KVM_SET_GSI_ROUTING, &routes);
> +}
> +
> +static void kvm_route_msi(struct kvm_vm *vm, u32 gsi, struct kvm_vcpu *vcpu,
> +			  u8 vector, bool do_nmi)
> +{
> +	u8 buf[sizeof(struct kvm_irq_routing) + sizeof(struct kvm_irq_routing_entry)] = {};
> +	struct kvm_irq_routing *routes = (void *)&buf;
> +
> +	routes->nr = 1;
> +	routes->entries[0].gsi = gsi;
> +	routes->entries[0].type = KVM_IRQ_ROUTING_MSI;
> +	routes->entries[0].u.msi.address_lo = 0xFEE00000 | (vcpu->id << 12);
> +	routes->entries[0].u.msi.data = do_nmi ? NMI_VECTOR | (4 << 8) : vector;
> +
> +	vm_ioctl(vm, KVM_SET_GSI_ROUTING, routes);
> +}
> +
> +static int setup_msi(struct vfio_pci_device *device, bool use_device_msi)
> +{
> +	const int flags = MAP_SHARED | MAP_ANONYMOUS;
> +	const int prot = PROT_READ | PROT_WRITE;
> +	struct vfio_dma_region *region;
> +
> +	if (use_device_msi) {
> +		/* A driver is required to generate an MSI. */
> +		TEST_REQUIRE(device->driver.ops);
> +
> +		/* Set up a DMA-able region for the driver to use. */

Why?

> +		region = &device->driver.region;
> +		region->iova = 0;
> +		region->size = SZ_2M;
> +		region->vaddr = mmap(NULL, region->size, prot, flags, -1, 0);

kvm_mmap()

> +		TEST_ASSERT(region->vaddr != MAP_FAILED, "mmap() failed\n");
> +		vfio_pci_dma_map(device, region);
> +
> +		vfio_pci_driver_init(device);
> +
> +		return device->driver.msi;
> +	}
> +
> +	TEST_REQUIRE(device->msix_info.count > 0);
> +	vfio_pci_msix_enable(device, 0, 1);
> +	return 0;
> +}
> +
> +static void send_msi(struct vfio_pci_device *device, bool use_device_msi, int msi)

IMO, this helper does more harm than good.  There is only one real user, the
second call unconditionally passes %false for @use_device_msi.

If you drop the helper, than there should be no need to assert that the MSI is
the device MSI on *every* send via device.

> +{
> +	if (use_device_msi) {
> +		TEST_ASSERT_EQ(msi, device->driver.msi);
> +		vfio_pci_driver_send_msi(device);
> +	} else {
> +		vfio_pci_irq_trigger(device, VFIO_PCI_MSIX_IRQ_INDEX, msi);
> +	}
> +}
> +
> +static void help(const char *name)
> +{
> +	printf("Usage: %s [-a] [-b] [-d] [-e] [-h] [-i nr_irqs] [-n] [-p] [-v nr_vcpus] [-x] segment:bus:device.function\n",
> +	       name);
> +	printf("\n");
> +	printf("  -a: Randomly affinitize the device IRQ to different CPUs\n"
> +	       "      throughout the test.\n");
> +	printf("  -b: Block vCPUs (e.g. HLT) instead of spinning in guest-mode\n");
> +	printf("  -d: Use the device to trigger the IRQ instead of emulating\n"
> +	       "      it with an eventfd write.\n");
> +	printf("  -e: Destroy and recreate KVM's GSI routing table in between\n"
> +	       "      some interrupts.\n");
> +	printf("  -i: The number of IRQs to generate during the test.\n");
> +	printf("  -n: Route some of the device interrupts to be delivered as\n"
> +	       "      an NMI into the guest.\n");
> +	printf("  -p: Pin vCPU threads to random pCPUs throughout the test.\n");
> +	printf("  -v: Set the number of vCPUs that the test should create.\n"
> +	       "      Interrupts will be round-robined among vCPUs.\n");
> +	printf("  -x: Use xAPIC mode instead of x2APIC mode in the guest.\n");
> +	printf("\n");
> +	exit(KSFT_FAIL);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	/* Random non-reserved vector and GSI to use for the device IRQ */
> +	const u8 vector = 0xe0;

s/random/arbitrary

Why not make it truly random?

> +	const u32 gsi = 32;

Ditto here.

> +	/* Test configuration (overridable by command line flags). */
> +	bool use_device_msi = false, irq_affinity = false, pin_vcpus = false;
> +	bool empty = false, nmi = false;
> +	int nr_irqs = 1000;
> +	int nr_vcpus = 1;
> +
> +	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
> +	pthread_t vcpu_threads[KVM_MAX_VCPUS];
> +	u64 irq_count, pin_count, piw_count;
> +	struct vfio_pci_device *device;
> +	cpu_set_t available_cpus;
> +	const char *device_bdf;
> +	FILE *irq_affinity_fp;
> +	int i, j, c, msi, irq;
> +	struct kvm_vm *vm;
> +	int irq_cpu;
> +	int ret;
> +
> +	device_bdf = vfio_selftests_get_bdf(&argc, argv);
> +
> +	while ((c = getopt(argc, argv, "abdehi:npv:x")) != -1) {
> +		switch (c) {
> +		case 'a':
> +			irq_affinity = true;
> +			break;
> +		case 'b':
> +			block = true;
> +			break;
> +		case 'd':
> +			use_device_msi = true;
> +			break;
> +		case 'e':
> +			empty = true;
> +			break;
> +		case 'i':
> +			nr_irqs = atoi_positive("Number of IRQs", optarg);
> +			break;
> +		case 'n':
> +			nmi = true;
> +			break;
> +		case 'p':
> +			pin_vcpus = true;
> +			break;
> +		case 'v':
> +			nr_vcpus = atoi_positive("nr_vcpus", optarg);
> +			break;
> +		case 'x':
> +			x2apic = false;
> +			break;
> +		case 'h':
> +		default:
> +			help(argv[0]);
> +		}
> +	}
> +
> +	vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus);
> +	vm_install_exception_handler(vm, vector, guest_irq_handler);
> +	vm_install_exception_handler(vm, NMI_VECTOR, guest_nmi_handler);
> +
> +	if (!x2apic)
> +		virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
> +
> +	device = vfio_pci_device_init(device_bdf, default_iommu_mode);
> +	msi = setup_msi(device, use_device_msi);
> +	irq = get_irq_number(device_bdf, msi);
> +
> +	irq_count = get_irq_count(irq);
> +	pin_count = __get_irq_count("PIN:");
> +	piw_count = __get_irq_count("PIW:");

This is obviously very Intel specific information.  If you're going to print the
posted IRQ info, then the test should also print e.g. AMD GALogIntr events.

> +	printf("%s %s MSI-X[%d] (IRQ-%d) %d times\n",
> +	       use_device_msi ? "Triggering" : "Notifying the eventfd for",
> +	       device_bdf, msi, irq, nr_irqs);
> +
> +	kvm_assign_irqfd(vm, gsi, device->msi_eventfds[msi]);
> +
> +	sync_global_to_guest(vm, x2apic);
> +	sync_global_to_guest(vm, block);
> +
> +	for (i = 0; i < nr_vcpus; i++)
> +		pthread_create(&vcpu_threads[i], NULL, vcpu_thread_main, vcpus[i]);
> +
> +	for (i = 0; i < nr_vcpus; i++) {
> +		struct kvm_vcpu *vcpu = vcpus[i];
> +
> +		while (!READ_FROM_GUEST(vm, guest_ready_for_irqs[vcpu->id]))
> +			continue;
> +	}
> +
> +	if (pin_vcpus) {
> +		ret = sched_getaffinity(vcpu_tids[0], sizeof(available_cpus), &available_cpus);
> +		TEST_ASSERT(ret == 0, "sched_getaffinity() failed");

!ret

Though this is another syscall that deserves a wrapper in kvm_syscalls.h.

> +
> +		if (nr_vcpus > CPU_COUNT(&available_cpus)) {
> +			printf("There are more vCPUs than pCPUs; refusing to pin.\n");
> +			pin_vcpus = false;

Why is this not an assertion?  Alternatively, why not double/triple/quadruple up
as needed?

> +		}
> +	}
> +
> +	if (irq_affinity) {
> +		char path[PATH_MAX];
> +
> +		snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
> +		irq_affinity_fp = fopen(path, "w");
> +		TEST_ASSERT(irq_affinity_fp, "fopen(%s) failed", path);

More code that belongs in the library.

> +	}
> +
> +	/* Set a consistent seed so that test are repeatable. */
> +	srand(0);

We should really figure out a solution for reproducible random numbers in the
host.  Ah, and kvm_selftest_init()'s handling of guest random seeds is flawed,
because it does random() without srand() and so AFAICT, gets the same seed every
time.  E.g. seems like we want something like this, but with a way to override
"random_seed" from a test.

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 5744643d9ec3..0118fd2ba56b 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2310,6 +2310,7 @@ void __attribute((constructor)) kvm_selftest_init(void)
        struct sigaction sig_sa = {
                .sa_handler = report_unexpected_signal,
        };
+       int random_seed;
 
        /* Tell stdout not to buffer its content. */
        setbuf(stdout, NULL);
@@ -2319,8 +2320,13 @@ void __attribute((constructor)) kvm_selftest_init(void)
        sigaction(SIGILL, &sig_sa, NULL);
        sigaction(SIGFPE, &sig_sa, NULL);
 
+       random_seed = time(NULL);
+       srand(random_seed);
+
        guest_random_seed = last_guest_seed = random();
-       pr_info("Random seed: 0x%x\n", guest_random_seed);
+
+       pr_info("Guest random seed: 0x%x (srand: 0x%x)\n",
+               guest_random_seed, random_seed);
 
        kvm_selftest_arch_init();
 }


> +	for (i = 0; i < nr_irqs; i++) {
> +		struct kvm_vcpu *vcpu = vcpus[i % nr_vcpus];
> +		const bool do_nmi = nmi && (i & BIT(2));
> +		const bool do_empty = empty && (i & BIT(3));
> +		struct timespec start;
> +
> +		if (do_empty)
> +			kvm_clear_gsi_routes(vm);
> +
> +		kvm_route_msi(vm, gsi, vcpu, vector, do_nmi);
> +
> +		if (irq_affinity && vcpu->id == 0) {

Please add comments explaining why affinity related stuff is only applied to
vCPU0.  I could probably figure it out, but I really shouldn't have to.

> +			irq_cpu = rand() % get_nprocs();
> +
> +			ret = fprintf(irq_affinity_fp, "%d\n", irq_cpu);
> +			TEST_ASSERT(ret > 0, "Failed to affinitize IRQ-%d to CPU %d", irq, irq_cpu);



> +		}
> +
> +		if (pin_vcpus && vcpu->id == 0)
> +			pin_vcpu_threads(nr_vcpus, rand() % get_nprocs(), &available_cpus);
> +
> +		for (j = 0; j < nr_vcpus; j++) {
> +			TEST_ASSERT_EQ(READ_FROM_GUEST(vm, guest_received_irq[vcpu->id]), false);
> +			TEST_ASSERT_EQ(READ_FROM_GUEST(vm, guest_received_nmi[vcpu->id]), false);

These won't generate helpful assert messages.

> +		}
> +
> +		send_msi(device, use_device_msi, msi);
> +
> +		clock_gettime(CLOCK_MONOTONIC, &start);
> +		for (;;) {
> +			if (do_nmi && READ_FROM_GUEST(vm, guest_received_nmi[vcpu->id]))
> +				break;
> +
> +			if (!do_nmi && READ_FROM_GUEST(vm, guest_received_irq[vcpu->id]))
> +				break;

		received_irq = do_nmi ? &guest_received_nmi[vcpu->id] :
					&guest_received_irq[vcpu->id];
		while (!READ_FROM_GUEST(vm, *received_irq))
			if (timespec_to_ns(timespec_elapsed(start)) > TIMEOUT_NS) {
				...
			}	

			cpu_relax();
		}

> +
> +			if (timespec_to_ns(timespec_elapsed(start)) > TIMEOUT_NS) {
> +				printf("Timeout waiting for interrupt!\n");
> +				printf("  vCPU: %d\n", vcpu->id);
> +				printf("  do_nmi: %d\n", do_nmi);
> +				printf("  do_empty: %d\n", do_empty);
> +				if (irq_affinity)
> +					printf("  irq_cpu: %d\n", irq_cpu);

vfio_pci_irq_test.c: In function ‘main’:
vfio_pci_irq_test.c:469:41: error: ‘irq_cpu’ may be used uninitialized [-Werror=maybe-uninitialized]
  469 |                                         printf("  irq_cpu: %d\n", irq_cpu);
      |                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vfio_pci_irq_test.c:332:13: note: ‘irq_cpu’ was declared here
  332 |         int irq_cpu;
      |             ^~~~~~~

> +				if (pin_vcpus)
> +					printf("  vcpu_cpu: %d\n", get_cpu(vcpu));
> +
> +				TEST_FAIL("vCPU never received IRQ!\n");
> +			}
> +		}

		TEST_ASSERT(guest_received_irq[vcpu->id] !=
			    guest_received_nmi[vcpu->id],
			    "blah blah blah");

		WRITE_TO_GUEST(vm, *received_irq, false);

> +		if (do_nmi)
> +			WRITE_TO_GUEST(vm, guest_received_nmi[vcpu->id], false);
> +		else
> +			WRITE_TO_GUEST(vm, guest_received_irq[vcpu->id], false);
> +	}
> +
> +	WRITE_TO_GUEST(vm, done, true);
> +
> +	for (i = 0; i < nr_vcpus; i++) {
> +		if (block) {
> +			kvm_route_msi(vm, gsi, vcpus[i], vector, false);
> +			send_msi(device, false, msi);
> +		}
> +
> +		pthread_join(vcpu_threads[i], NULL);
> +	}
> +
> +	if (irq_affinity)
> +		fclose(irq_affinity_fp);
> +
> +	printf("Host interrupts handled:\n");
> +	printf("  IRQ-%d: %lu\n", irq, get_irq_count(irq) - irq_count);
> +	printf("  Posted-interrupt notification events: %lu\n",
> +	       __get_irq_count("PIN:") - pin_count);
> +	printf("  Posted-interrupt wakeup events: %lu\n",
> +	       __get_irq_count("PIW:") - piw_count);
> +
> +	vfio_pci_device_cleanup(device);
> +
> +	return 0;
> +}
> -- 
> 2.51.0.384.g4c02a37b29-goog
> 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs
  2025-10-27 16:52   ` Sean Christopherson
@ 2025-10-27 17:46     ` David Matlack
  2025-10-27 18:50       ` Sean Christopherson
  0 siblings, 1 reply; 8+ messages in thread
From: David Matlack @ 2025-10-27 17:46 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Alex Williamson, kvm

On Mon, Oct 27, 2025 at 9:52 AM Sean Christopherson <seanjc@google.com> wrote:

Thank you for the feedback!

> On Fri, Sep 12, 2025, David Matlack wrote:
> >
> > This test only supports x86_64 for now, but can be ported to other
> > architectures in the future.
>
> Can it though?  There are bits and pieces that can be reused, but this test is
> x86 through and through.

Delivering MSIs from a vfio-pci device into a guest is not an
x86-specific thing. But I think you could make the argument that it
will be simpler for each architecture to have their own version of
this test, with some shared code, rather than the other way around.

> >  tools/testing/selftests/kvm/Makefile.kvm      |   1 +
> >  .../testing/selftests/kvm/vfio_pci_irq_test.c | 507 ++++++++++++++++++
>
> Please break this into multiple patches, e.g. a "basic" patch and and then roughly
> one per "advanced" command line option.

Will do.

> > +static pid_t vcpu_tids[KVM_MAX_VCPUS];
>
> s/tids/pids?

This stores output from gettid(), so tids felt more appropriate.

> > +#define TIMEOUT_NS (2ULL * 1000 * 1000 * 1000)
>
> The timeout should be configurable via command line.

Will do.

> > +#define READ_FROM_GUEST(_vm, _variable) ({           \
> > +     sync_global_from_guest(_vm, _variable);         \
> > +     READ_ONCE(_variable);                           \
> > +})
> > +
> > +#define WRITE_TO_GUEST(_vm, _variable, _value) do {  \
> > +     WRITE_ONCE(_variable, _value);                  \
> > +     sync_global_to_guest(_vm, _variable);           \
> > +} while (0)
>
> These belong in a separate patch, and in tools/testing/selftests/kvm/include/kvm_util.h.

Will do.

> > +static void guest_enable_interrupts(void)
>
> This is a misleading name, e.g. I would expect it to _just_ be sti_nop().  If the
> intent is to provide a hook for a potential non-x86 implementation, then it should
> probably be split into guest_arch_test_setup() and guest_arch_irq_enable() or so
> (to align with the kernel's local_irq_{dis,en}able()).
>
> As is, I would just omit the helper.

Will do.

> > +static void guest_irq_handler(struct ex_regs *regs)
> > +{
> > +     WRITE_ONCE(guest_received_irq[guest_get_vcpu_id()], true);
>
> Hmm, using APID ID works, but I don't like the hidden dependency on the library
> using ascending IDs starting from '0'.  This would also be a good opportunity to
> improve the core infrastructure.

What dependency are you referring to? I think the only requirement is
vcpu->id == APIC ID. Are you saying tests should not make that
assumption?

> > +     WRITE_ONCE(vcpu_tids[vcpu->id], syscall(__NR_gettid));
>
> Please add wrapper in tools/testing/selftests/kvm/include/kvm_syscalls.h.

Will do.

> > +static void pin_vcpu_threads(int nr_vcpus, int start_cpu, cpu_set_t *available_cpus)
> > +{
> > +     const size_t size = sizeof(cpu_set_t);
> > +     int nr_cpus, cpu, vcpu_index = 0;
> > +     cpu_set_t target_cpu;
> > +     int r;
> > +
> > +     nr_cpus = get_nprocs();
>
> Generally speaking, KVM selftests try to avoid affining tasks to CPUs that are
> outside of the original affinity list.  See various usage of sched_getaffinity().

available_cpus is initialized by calling sched_getaffinity().

> > +static FILE *open_proc_interrupts(void)
> > +{
> > +     FILE *fp;
> > +
> > +     fp = fopen("/proc/interrupts", "r");
> > +     TEST_ASSERT(fp, "fopen(/proc/interrupts) failed");
>
> open_path_or_exit()?

I guess I'll have to rework this code to use fds instead of FILE?

>
> > +
> > +     return fp;
> > +}
>
> And all of these /proc helpers belong in library code.

Will do.

> > +static int setup_msi(struct vfio_pci_device *device, bool use_device_msi)
> > +{
> > +     const int flags = MAP_SHARED | MAP_ANONYMOUS;
> > +     const int prot = PROT_READ | PROT_WRITE;
> > +     struct vfio_dma_region *region;
> > +
> > +     if (use_device_msi) {
> > +             /* A driver is required to generate an MSI. */
> > +             TEST_REQUIRE(device->driver.ops);
> > +
> > +             /* Set up a DMA-able region for the driver to use. */
>
> Why?

I will extend the comment to explain.

Each driver needs DMA-able memory so that it can post commands to the
device to get the device to perform actions like sending an MSI and
doing a memcpy.

You might wonder why tests have to worry about setting up this region
and why the driver doesn't just do it automatically. That is because
some tests will want to control how the memory is mapped in the host
(e.g. Live Update tests want to use memfds so it can persist them
across Live Update) and how it is mapped into the IOMMU.

That being said, I think it would be worth adding a helper to set up a
default mapping for drivers for tests that don't care.

>
> > +             region = &device->driver.region;
> > +             region->iova = 0;
> > +             region->size = SZ_2M;
> > +             region->vaddr = mmap(NULL, region->size, prot, flags, -1, 0);
>
> kvm_mmap()

Will do.

> > +static void send_msi(struct vfio_pci_device *device, bool use_device_msi, int msi)
>
> IMO, this helper does more harm than good.  There is only one real user, the
> second call unconditionally passes %false for @use_device_msi.
>
> If you drop the helper, than there should be no need to assert that the MSI is
> the device MSI on *every* send via device.

Will do.

> > +int main(int argc, char **argv)
> > +{
> > +     /* Random non-reserved vector and GSI to use for the device IRQ */
> > +     const u8 vector = 0xe0;
>
> s/random/arbitrary
>
> Why not make it truly random?

Only because there's already a lot going on in this test. Do you think
it's worth randomizing these?

> > +     irq_count = get_irq_count(irq);
> > +     pin_count = __get_irq_count("PIN:");
> > +     piw_count = __get_irq_count("PIW:");
>
> This is obviously very Intel specific information.  If you're going to print the
> posted IRQ info, then the test should also print e.g. AMD GALogIntr events.

I saw PIN and PIW in /proc/interrupts when I tested on AMD hosts,
that's why I included them both by default.

I can look into adding GALogIntr if you want.

> > +     if (pin_vcpus) {
> > +             ret = sched_getaffinity(vcpu_tids[0], sizeof(available_cpus), &available_cpus);
> > +             TEST_ASSERT(ret == 0, "sched_getaffinity() failed");
>
> !ret
>
> Though this is another syscall that deserves a wrapper in kvm_syscalls.h.

Will do.

> > +             if (nr_vcpus > CPU_COUNT(&available_cpus)) {
> > +                     printf("There are more vCPUs than pCPUs; refusing to pin.\n");
> > +                     pin_vcpus = false;
>
> Why is this not an assertion?  Alternatively, why not double/triple/quadruple up
> as needed?

I don't recall why I added this... I can probably just drop it in the
next version. I'll report back here if I remember why when working on
v2.

> > +     if (irq_affinity) {
> > +             char path[PATH_MAX];
> > +
> > +             snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
> > +             irq_affinity_fp = fopen(path, "w");
> > +             TEST_ASSERT(irq_affinity_fp, "fopen(%s) failed", path);
>
> More code that belongs in the library.

Will do.

> > +     /* Set a consistent seed so that test are repeatable. */
> > +     srand(0);
>
> We should really figure out a solution for reproducible random numbers in the
> host.  Ah, and kvm_selftest_init()'s handling of guest random seeds is flawed,
> because it does random() without srand() and so AFAICT, gets the same seed every
> time.  E.g. seems like we want something like this, but with a way to override
> "random_seed" from a test.
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 5744643d9ec3..0118fd2ba56b 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -2310,6 +2310,7 @@ void __attribute((constructor)) kvm_selftest_init(void)
>         struct sigaction sig_sa = {
>                 .sa_handler = report_unexpected_signal,
>         };
> +       int random_seed;
>
>         /* Tell stdout not to buffer its content. */
>         setbuf(stdout, NULL);
> @@ -2319,8 +2320,13 @@ void __attribute((constructor)) kvm_selftest_init(void)
>         sigaction(SIGILL, &sig_sa, NULL);
>         sigaction(SIGFPE, &sig_sa, NULL);
>
> +       random_seed = time(NULL);
> +       srand(random_seed);
> +
>         guest_random_seed = last_guest_seed = random();
> -       pr_info("Random seed: 0x%x\n", guest_random_seed);
> +
> +       pr_info("Guest random seed: 0x%x (srand: 0x%x)\n",
> +               guest_random_seed, random_seed);
>
>         kvm_selftest_arch_init();
>  }

Just to make sure I understand: You are proposing using the current
time as the seed and printing it to the console. That way each run
uses a different random seed and we get broader test coverage. Then if
someone wants to reproduce a test result, there would be some way for
them to override the seed via cmdline? That sounds reasonable to me, I
can take a look at adding that in the next version.

> > +             if (irq_affinity && vcpu->id == 0) {
>
> Please add comments explaining why affinity related stuff is only applied to
> vCPU0.  I could probably figure it out, but I really shouldn't have to.

Will do.

> > +             for (j = 0; j < nr_vcpus; j++) {
> > +                     TEST_ASSERT_EQ(READ_FROM_GUEST(vm, guest_received_irq[vcpu->id]), false);
> > +                     TEST_ASSERT_EQ(READ_FROM_GUEST(vm, guest_received_nmi[vcpu->id]), false);
>
> These won't generate helpful assert messages.

Good point. I'll improve these in the next version.

> > +             for (;;) {
> > +                     if (do_nmi && READ_FROM_GUEST(vm, guest_received_nmi[vcpu->id]))
> > +                             break;
> > +
> > +                     if (!do_nmi && READ_FROM_GUEST(vm, guest_received_irq[vcpu->id]))
> > +                             break;
>
>                 received_irq = do_nmi ? &guest_received_nmi[vcpu->id] :
>                                         &guest_received_irq[vcpu->id];
>                 while (!READ_FROM_GUEST(vm, *received_irq))
>                         if (timespec_to_ns(timespec_elapsed(start)) > TIMEOUT_NS) {
>                                 ...
>                         }
>
>                         cpu_relax();
>                 }

LGTM

> > +
> > +                     if (timespec_to_ns(timespec_elapsed(start)) > TIMEOUT_NS) {
> > +                             printf("Timeout waiting for interrupt!\n");
> > +                             printf("  vCPU: %d\n", vcpu->id);
> > +                             printf("  do_nmi: %d\n", do_nmi);
> > +                             printf("  do_empty: %d\n", do_empty);
> > +                             if (irq_affinity)
> > +                                     printf("  irq_cpu: %d\n", irq_cpu>
> > +             if (do_nmi)
> > +                     WRITE_TO_GUEST(vm, guest_received_nmi[vcpu->id], false);
> > +             else
> > +                     WRITE_TO_GUEST(vm, guest_received_irq[vcpu->id], false);
> > +     }
> > +
> > +     WRITE_TO_GUEST(vm, done, true);
> > +
> > +     for (i = 0; i < nr_vcpus; i++) {
> > +             if (block) {
> > +                     kvm_route_msi(vm, gsi, vcpus[i], vector, false);
> > +                     send_msi(device, false, msi);
> > +             }
> > +
> > +             pthread_join(vcpu_threads[i], NULL);
> > +     }
> > +
> > +     if (irq_affinity)
> > +             fclose(irq_affinity_fp);
> > +
> > +     printf("Host interrupts handled:\n");
> > +     printf("  IRQ-%d: %lu\n", irq, get_irq_count(irq) - irq_count);
> > +     printf("  Posted-interrupt notification events: %lu\n",
> > +            __get_irq_count("PIN:") - pin_count);
> > +     printf("  Posted-interrupt wakeup events: %lu\n",
> > +            __get_irq_count("PIW:") - piw_count);
> > +
> > +     vfio_pci_device_cleanup(device);
> > +
> > +     return 0;
> > +}
> > --
> > 2.51.0.384.g4c02a37b29-goog
> >);
>
> vfio_pci_irq_test.c: In function ‘main’:
> vfio_pci_irq_test.c:469:41: error: ‘irq_cpu’ may be used uninitialized [-Werror=maybe-uninitialized]
>   469 |                                         printf("  irq_cpu: %d\n", irq_cpu);
>       |                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> vfio_pci_irq_test.c:332:13: note: ‘irq_cpu’ was declared here
>   332 |         int irq_cpu;
>       |             ^~~~~~~

Ack, will fix.

> > +                             if (pin_vcpus)
> > +                                     printf("  vcpu_cpu: %d\n", get_cpu(vcpu));
> > +
> > +                             TEST_FAIL("vCPU never received IRQ!\n");
> > +                     }
> > +             }
>
>                 TEST_ASSERT(guest_received_irq[vcpu->id] !=
>                             guest_received_nmi[vcpu->id],
>                             "blah blah blah");
>
>                 WRITE_TO_GUEST(vm, *received_irq, false);

LGTM

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs
  2025-10-27 17:46     ` David Matlack
@ 2025-10-27 18:50       ` Sean Christopherson
  0 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2025-10-27 18:50 UTC (permalink / raw)
  To: David Matlack; +Cc: Paolo Bonzini, Alex Williamson, kvm

On Mon, Oct 27, 2025, David Matlack wrote:
> On Mon, Oct 27, 2025 at 9:52 AM Sean Christopherson <seanjc@google.com> wrote:
> 
> Thank you for the feedback!
> 
> > On Fri, Sep 12, 2025, David Matlack wrote:
> > >
> > > This test only supports x86_64 for now, but can be ported to other
> > > architectures in the future.
> >
> > Can it though?  There are bits and pieces that can be reused, but this test is
> > x86 through and through.
> 
> Delivering MSIs from a vfio-pci device into a guest is not an
> x86-specific thing. But I think you could make the argument that it
> will be simpler for each architecture to have their own version of
> this test, with some shared code, rather than the other way around.

Right, the concept is x86, but the code as written is very x86 centric.

> > > +static void guest_irq_handler(struct ex_regs *regs)
> > > +{
> > > +     WRITE_ONCE(guest_received_irq[guest_get_vcpu_id()], true);
> >
> > Hmm, using APID ID works, but I don't like the hidden dependency on the library
> > using ascending IDs starting from '0'.  This would also be a good opportunity to
> > improve the core infrastructure.
> 
> What dependency are you referring to? I think the only requirement is
> vcpu->id == APIC ID

Yep, this one.

> Are you saying tests should not make that assumption?

Ya, ideally, they would not.

> > > +     WRITE_ONCE(vcpu_tids[vcpu->id], syscall(__NR_gettid));
> >
> > Please add wrapper in tools/testing/selftests/kvm/include/kvm_syscalls.h.
> 
> Will do.
> 
> > > +static void pin_vcpu_threads(int nr_vcpus, int start_cpu, cpu_set_t *available_cpus)
> > > +{
> > > +     const size_t size = sizeof(cpu_set_t);
> > > +     int nr_cpus, cpu, vcpu_index = 0;
> > > +     cpu_set_t target_cpu;
> > > +     int r;
> > > +
> > > +     nr_cpus = get_nprocs();
> >
> > Generally speaking, KVM selftests try to avoid affining tasks to CPUs that are
> > outside of the original affinity list.  See various usage of sched_getaffinity().
> 
> available_cpus is initialized by calling sched_getaffinity().

Ah, I missed that in the for-loop.
 
> > > +static FILE *open_proc_interrupts(void)
> > > +{
> > > +     FILE *fp;
> > > +
> > > +     fp = fopen("/proc/interrupts", "r");
> > > +     TEST_ASSERT(fp, "fopen(/proc/interrupts) failed");
> >
> > open_path_or_exit()?
> 
> I guess I'll have to rework this code to use fds instead of FILE?

Hmm, not necessarily.  E.g. maybe open_file_or_exit()?  I don't have a strong
preference (and have put zero thought into what would work well).
 
> > > +int main(int argc, char **argv)
> > > +{
> > > +     /* Random non-reserved vector and GSI to use for the device IRQ */
> > > +     const u8 vector = 0xe0;
> >
> > s/random/arbitrary
> >
> > Why not make it truly random?
> 
> Only because there's already a lot going on in this test. Do you think
> it's worth randomizing these?

Probably?  But without a command line option, to keep this somewhat less crazy?

> > > +     irq_count = get_irq_count(irq);
> > > +     pin_count = __get_irq_count("PIN:");
> > > +     piw_count = __get_irq_count("PIW:");
> >
> > This is obviously very Intel specific information.  If you're going to print the
> > posted IRQ info, then the test should also print e.g. AMD GALogIntr events.
> 
> I saw PIN and PIW in /proc/interrupts when I tested on AMD hosts,

The kernel really should key off CONFIG_KVM_INTEL={m,y}, not CONFIG_KVM (though
in practice it doesn't matter all that much).

> that's why I included them both by default.

I think it makes sense to only print PIN+PIW on Intel, otherwise it's pure noise.

> I can look into adding GALogIntr if you want.

PIN is midly interesting.  PIW and GALogIntr are much more interesting from a
coverage perspective.  E.g. if the deltas for PIW and GALogIntr are '0', then
the test likely isn't exercising the blocking path.

> > > +     /* Set a consistent seed so that test are repeatable. */
> > > +     srand(0);
> >
> > We should really figure out a solution for reproducible random numbers in the
> > host.  Ah, and kvm_selftest_init()'s handling of guest random seeds is flawed,
> > because it does random() without srand() and so AFAICT, gets the same seed every
> > time.  E.g. seems like we want something like this, but with a way to override
> > "random_seed" from a test.
> >
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> > index 5744643d9ec3..0118fd2ba56b 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> > @@ -2310,6 +2310,7 @@ void __attribute((constructor)) kvm_selftest_init(void)
> >         struct sigaction sig_sa = {
> >                 .sa_handler = report_unexpected_signal,
> >         };
> > +       int random_seed;
> >
> >         /* Tell stdout not to buffer its content. */
> >         setbuf(stdout, NULL);
> > @@ -2319,8 +2320,13 @@ void __attribute((constructor)) kvm_selftest_init(void)
> >         sigaction(SIGILL, &sig_sa, NULL);
> >         sigaction(SIGFPE, &sig_sa, NULL);
> >
> > +       random_seed = time(NULL);
> > +       srand(random_seed);
> > +
> >         guest_random_seed = last_guest_seed = random();
> > -       pr_info("Random seed: 0x%x\n", guest_random_seed);
> > +
> > +       pr_info("Guest random seed: 0x%x (srand: 0x%x)\n",
> > +               guest_random_seed, random_seed);
> >
> >         kvm_selftest_arch_init();
> >  }
> 
> Just to make sure I understand: You are proposing using the current
> time as the seed

Or anything that's somewhat random.  I don't know what glibc is doing under the
hood, so it's entirely possible/likely there's a much better method.

> and printing it to the console. That way each run uses a different random
> seed and we get broader test coverage. Then if someone wants to reproduce a
> test result, there would be some way for them to override the seed via
> cmdline? That sounds reasonable to me, I can take a look at adding that in
> the next version.

Yep.  That was the intent with the guest_random_seed, I just didn't implement it
very well :-)

Regarding reproducibility, one idea would be to have kvm_selftest_init() pull the
seed from an environment variable, e.g. so that reproducing with a specific seed
doesn't require hacking or test support.  I'm not entirely sure I like the idea
of using environment variables though...

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-10-27 18:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-12 22:25 [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts David Matlack
2025-09-12 22:25 ` [PATCH 1/2] KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests David Matlack
2025-09-12 22:25 ` [PATCH 2/2] KVM: selftests: Add a test for vfio-pci device IRQ delivery to vCPUs David Matlack
2025-10-27 16:52   ` Sean Christopherson
2025-10-27 17:46     ` David Matlack
2025-10-27 18:50       ` Sean Christopherson
2025-10-27 15:47 ` [PATCH 0/2] KVM: selftests: Link with VFIO selftests lib and test device interrupts Sean Christopherson
2025-10-27 16:03   ` David Matlack

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox