All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Matlack <dmatlack@google.com>
To: Aaron Lewis <aaronlewis@google.com>
Cc: alex.williamson@redhat.com, bhelgaas@google.com,
	vipinsh@google.com, kvm@vger.kernel.org, seanjc@google.com,
	jrhilke@google.com
Subject: Re: [RFC PATCH 2/3] vfio: selftests: Introduce the selftest vfio_flr_test
Date: Fri, 27 Jun 2025 22:37:19 +0000	[thread overview]
Message-ID: <aF8dH06DsMpWCLti@google.com> (raw)
In-Reply-To: <20250626180424.632628-3-aaronlewis@google.com>

On 2025-06-26 06:04 PM, Aaron Lewis wrote:
> Introduce the VFIO selftest "vfio_flr_test" as a way of demonstrating a
> latency issue that occurs when multiple devices are reset at the
> same time.  To reproduce this issue simply run the selftest, e.g.
> 
>   $ VFIO_BDF_1=0000:17:0c.1 VFIO_BDF_2=0000:17:0c.2 ./vfio_flr_test
>   [0x7f61bb888700] '0000:17:0c.2' initialized in 108.6ms.
>   [0x7f61bc089700] '0000:17:0c.1' initialized in 212.3ms.
> 
> When the devices are initialized in the test, despite that happening on
> separate threads, one of the devices will take >200ms.  Initializing a
> device requires a FLR (function level reset), so the device taking
> ~100ms to initialize is expected, however, when a second devices is
> initialized in parallel the desired behavior would be that the FLR
> happens concurrently.  Unfortunately, due to a lock in
> vfio_df_group_open() that does not happen.
> 
> As for how the selftest works, it requires two BDF's be passed to it as
> environment variables as shown in the example above:
>  - VFIO_BDF_1
>  - VFIO_BDF_2
> 
> It then initializes each of them on separate threads and reports how long
> they took.
> 
> The Thread ID is included in the debug prints to show what thread they
> are executing from, and to show that this is a parallel operation.
> 
> Some of the prints are commented out to allow the focus to be on the
> issue at hand, however, they can be useful for debugging so they were
> left in.
> 
> Signed-off-by: Aaron Lewis <aaronlewis@google.com>
> ---
>  tools/testing/selftests/vfio/Makefile        |   1 +
>  tools/testing/selftests/vfio/vfio_flr_test.c | 120 +++++++++++++++++++
>  2 files changed, 121 insertions(+)
>  create mode 100644 tools/testing/selftests/vfio/vfio_flr_test.c
> 
> diff --git a/tools/testing/selftests/vfio/Makefile b/tools/testing/selftests/vfio/Makefile
> index 324ba0175a33..57635883f2c2 100644
> --- a/tools/testing/selftests/vfio/Makefile
> +++ b/tools/testing/selftests/vfio/Makefile
> @@ -1,5 +1,6 @@
>  CFLAGS = $(KHDR_INCLUDES)
>  TEST_GEN_PROGS += vfio_dma_mapping_test
> +TEST_GEN_PROGS += vfio_flr_test
>  TEST_GEN_PROGS += vfio_iommufd_setup_test
>  TEST_GEN_PROGS += vfio_pci_device_test
>  TEST_GEN_PROGS += vfio_pci_driver_test
> diff --git a/tools/testing/selftests/vfio/vfio_flr_test.c b/tools/testing/selftests/vfio/vfio_flr_test.c
> new file mode 100644
> index 000000000000..5522f6d256dc
> --- /dev/null
> +++ b/tools/testing/selftests/vfio/vfio_flr_test.c

I don't think vfio_flr_test.c is the right name for this test. FLR is
something that happens in the kernel during one specific ioctl() deep
within vfio_pci_device_init(). And it also might not happen at all (if
the device being used does not support FLR).

What this test actually does it measure the time it takes to do
vfio_pci_device_init() in parallel across multiple threads.

How about vfio_pci_device_init_perf_test.c?

> @@ -0,0 +1,120 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <fcntl.h>
> +#include <pthread.h>
> +#include <stdlib.h>
> +
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +
> +#include <linux/limits.h>
> +#include <linux/pci_regs.h>
> +#include <linux/sizes.h>
> +#include <linux/vfio.h>
> +
> +#include <vfio_util.h>
> +
> +#define NR_THREADS 2
> +
> +double freq_ms = .0;
> +
> +static bool is_bdf(const char *str)
> +{
> +	unsigned int s, b, d, f;
> +	int length, count;
> +
> +	count = sscanf(str, "%4x:%2x:%2x.%2x%n", &s, &b, &d, &f, &length);
> +	return count == 4 && length == strlen(str);
> +}
> +
> +static const char *vfio_get_bdf(const char *env_var)
> +{
> +	char *bdf;
> +
> +	bdf = getenv(env_var);
> +	if (bdf) {
> +		VFIO_ASSERT_TRUE(is_bdf(bdf), "Invalid BDF: %s\n", bdf);
> +		return bdf;
> +	}
> +
> +	return NULL;
> +}

Please update the library to support multiple BDFs rather than creating
a custom solution for this test.

In the library we have support for getting a single BDF from either the
command line or the environment variable $VFIO_SELFTESTS_BDF via:

  const char *device_bdf = vfio_selftests_get_bdf(&argc, argv);

Add a variant of this that supports looking for and returning multiple
BDFs and use it here, e.g.

  const char *device_bdfs[2];

  vfio_selftests_get_bdfs(&argc, argv, device_bdfs, 2);

Then we can reimplement vfio_selftests_get_bdf() on top like this:

const char *vfio_selftests_get_bdf(int *argc, char *argv[])
{
	const char *bdf;

	vfio_selftests_get_bdfs(argc, argv, &bdf, 1);
	return bdf;
}

The new function should support getting the BDFs from either the command
line or $VFIO_SELFTESTS_BDF, just like the current function.

For bonus points we could even make vfio_selftests_get_bdfs() support
any number of BDFs by dynamically allocating the array and returning the
number it found on the command line or in the env var. That way your
test can support running with any number of devices (not just 2).

Once you do this then you can run your test just like any other VFIO
selftests, e.g.:

   $ ./run.sh -d 0000:17:0c.1 -d 0000:17:0c.2 -- ./vfio_flr_test

Or like this:

   $ ./run.sh -d 0000:17:0c.1 -d 0000:17:0c.2 -s
   $ ./vfio_flr_test
   $ exit


> +
> +static inline uint64_t rdtsc(void)
> +{
> +	uint32_t eax, edx;
> +	uint64_t tsc_val;
> +	/*
> +	 * The lfence is to wait (on Intel CPUs) until all previous
> +	 * instructions have been executed. If software requires RDTSC to be
> +	 * executed prior to execution of any subsequent instruction, it can
> +	 * execute LFENCE immediately after RDTSC
> +	 */
> +	__asm__ __volatile__("lfence; rdtsc; lfence" : "=a"(eax), "=d"(edx));
> +	tsc_val = ((uint64_t)edx) << 32 | eax;
> +	return tsc_val;
> +}
> +
> +static void init_freq_ms(void)
> +{
> +	uint64_t tsc_start, tsc_end;
> +
> +	tsc_start = rdtsc();
> +	sleep(1);
> +	tsc_end = rdtsc();
> +	freq_ms = (double)(tsc_end - tsc_start) / 1000.0;
> +	// printf("[0x%lx] freq_ms = %.03lf\n", pthread_self(), freq_ms);
> +}
> +
> +static double now_ms(void)
> +{
> +	return (double)rdtsc() / freq_ms;
> +}
> +
> +static double time_since_last_ms(double *last_ms)
> +{
> +	double now = now_ms();
> +	double duration = now - *last_ms;
> +
> +	*last_ms = now;
> +	return duration;
> +}

Please use clock_gettime(CLOCK_MONOTONIC) to measure elapsed time in a
platform-independent way.

> +
> +static void *flr_test_thread(void *arg)
> +{
> +	const char *device_bdf = (const char *)arg;
> +	struct vfio_pci_device *device;
> +	double last = now_ms();

Should we put a simple spin-barrier here to ensure the threads do
vfio_pci_device_init() at the same time (at least as best as we can
guarantee from userspace)?

> +
> +	device = vfio_pci_device_init(device_bdf, default_iommu_mode);
> +	printf("[0x%lx] '%s' initialized in %.1lfms.\n",
> +	       pthread_self(), device_bdf, time_since_last_ms(&last));

I think we need a barrier here to make sure nothing in
vfio_pci_device_cleanup() interferes with the performance of a
still-running vfio_pci_device_init().

> +
> +	vfio_pci_device_cleanup(device);
> +	// printf("[0x%lx] '%s' cleaned up in %.1lfms\n",
> +	//        pthread_self(), device_bdf, time_since_last_ms(&last));
> +
> +	return NULL;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	const char *device_bdfs[NR_THREADS];
> +	pthread_t threads[NR_THREADS];
> +	int i;
> +
> +	init_freq_ms();
> +
> +	device_bdfs[0] = vfio_get_bdf("VFIO_BDF_1");
> +	device_bdfs[1] = vfio_get_bdf("VFIO_BDF_2");
> +
> +	// printf("[0x%lx] nr_threads = '%d'\n", pthread_self(), NR_THREADS);
> +
> +	for (i = 0; i < NR_THREADS; i++)
> +		pthread_create(&threads[i], NULL, flr_test_thread,
> +			       (void *)device_bdfs[i]);
> +
> +	for (i = 0; i < NR_THREADS; i++)
> +		pthread_join(threads[i], NULL);
> +
> +	return 0;
> +}
> -- 
> 2.50.0.727.gbf7dc18ff4-goog
> 

  reply	other threads:[~2025-06-27 22:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-26 18:04 [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Aaron Lewis
2025-06-26 18:04 ` [RFC PATCH 1/3] vfio: selftests: Allow run.sh to bind to more than one device Aaron Lewis
2025-06-27 22:19   ` David Matlack
2025-06-26 18:04 ` [RFC PATCH 2/3] vfio: selftests: Introduce the selftest vfio_flr_test Aaron Lewis
2025-06-27 22:37   ` David Matlack [this message]
2025-06-26 18:04 ` [RFC PATCH 3/3] vfio: selftests: Include a BPF script to pair with " Aaron Lewis
2025-06-27 22:51   ` David Matlack
2025-06-26 19:26 ` [RFC PATCH 0/3] vfio: selftests: Add VFIO selftest to demontrate a latency issue Alex Williamson
2025-06-26 20:56   ` Jason Gunthorpe
2025-06-26 21:58     ` Aaron Lewis
2025-06-26 22:10       ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aF8dH06DsMpWCLti@google.com \
    --to=dmatlack@google.com \
    --cc=aaronlewis@google.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=jrhilke@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=seanjc@google.com \
    --cc=vipinsh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.