public inbox for kvm@vger.kernel.org
From: David Matlack <dmatlack@google.com>
To: Alex Williamson <alex.williamson@nvidia.com>
Cc: alex@shazbot.org, shuah@kernel.org, Rubin Du <rubind@nvidia.com>,
	kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v13 4/4] vfio: selftests: Add NVIDIA Falcon driver for DMA testing
Date: Fri, 10 Apr 2026 22:19:59 +0000	[thread overview]
Message-ID: <adl3j0piontmOQZS@google.com> (raw)
In-Reply-To: <20260408225459.3088623-5-alex.williamson@nvidia.com>

On 2026-04-08 04:54 PM, Alex Williamson wrote:
> From: Rubin Du <rubind@nvidia.com>
> 
> Add a new VFIO PCI driver for NVIDIA GPUs that enables DMA testing
> via the Falcon (Fast Logic Controller) microcontrollers. This driver
> extracts and adapts the DMA test functionality from NVIDIA's
> gpu-admin-tools project and integrates it into the existing VFIO
> selftest framework.
> 
> Falcons are general-purpose microcontrollers present on NVIDIA GPUs
> that can perform DMA operations between system memory and device
> memory. By leveraging Falcon DMA, this driver allows NVIDIA GPUs to
> be tested alongside Intel IOAT and DSA devices using the same
> selftest infrastructure.
> 
> The driver is named 'nv_falcon' to reflect that it specifically
> controls the Falcon microcontrollers for DMA operations, rather
> than exposing general GPU functionality.
> 
> Reference implementation:
> https://github.com/NVIDIA/gpu-admin-tools
> 
> Signed-off-by: Rubin Du <rubind@nvidia.com>
> Signed-off-by: Alex Williamson <alex.williamson@nvidia.com>

Overall looks close, my comments are mostly nits.

I'm hoping to find a machine with one of the supported GPUs next week
as well so I can test out the new driver.

> +++ b/tools/testing/selftests/vfio/lib/drivers/nv_falcon/hw.h
> @@ -0,0 +1,350 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
> + */
> +#ifndef _NV_FALCON_HW_H_
> +#define _NV_FALCON_HW_H_

Please add #include <linux/types.h> here to avoid depending on header
include ordering.

> +static int gpu_poll_register(struct vfio_pci_device *device,
> +			     const char *name, u32 offset,
> +			     u32 expected, u32 mask, u32 timeout_ms)
> +{
> +	struct gpu_device *gpu = to_gpu_device(device);
> +	struct timespec start, now;
> +	u64 elapsed_ms;
> +	u32 value;
> +
> +	clock_gettime(CLOCK_MONOTONIC, &start);
> +
> +	for (;;) {
> +		value = gpu_read32(gpu, offset);
> +		if ((value & mask) == expected)
> +			return 0;
> +
> +		clock_gettime(CLOCK_MONOTONIC, &now);
> +		elapsed_ms = (now.tv_sec - start.tv_sec) * 1000
> +			     + (now.tv_nsec - start.tv_nsec) / 1000000;

Create a helper to avoid duplicating the elapsed_ms computation below
in fsp_poll_queue().

> +static int fsp_init(struct vfio_pci_device *device)
> +{
> +	int ret;
> +
> +	ret = gpu_poll_register(device, "fsp_boot_complete",
> +				NV_FSP_BOOT_COMPLETE_OFFSET,
> +				NV_FSP_BOOT_COMPLETE_MASK, 0xffffffff, 5000);

Sashiko is wondering if the mask should be something other than
0xffffffff:

  https://sashiko.dev/#/patchset/20260408225459.3088623-1-alex.williamson%40nvidia.com?part=4

> +static void falcon_select_core_falcon(struct vfio_pci_device *device)
> +{
> +	struct gpu_device *gpu = to_gpu_device(device);
> +	const struct falcon *falcon = gpu->falcon;
> +	u32 core_select_reg = falcon->base_page + NV_FALCON_CORE_SELECT_OFFSET;
> +	u32 core_select;
> +
> +	/* Read current value */

nit: Unnecessary comment.

> +static int nv_falcon_dma_init(struct vfio_pci_device *device)
> +{
[...]
> +	if (!falcon->no_outside_reset) {
> +		ret = falcon_reset(device);
> +		if (ret)
> +			return ret;
> +	}

falcon_enable() is called right before nv_falcon_dma_init(). Calling
falcon_reset() here will do falcon_disable() and then falcon_enable()
again. So it seems unnecessary?

> +static int nv_falcon_dma(struct vfio_pci_device *device,
> +			     u64 address,
> +			     u32 size_encoding,
> +			     bool write)

nit: Align with open parenthesis.

> +static int nv_falcon_memcpy_chunk(struct vfio_pci_device *device,
> +				iova_t src,
> +				iova_t dst,
> +				u32 size_encoding)

nit: Align with open parenthesis.

> +{
> +	int ret;
> +
> +	ret = nv_falcon_dma(device, src, size_encoding, false);
> +	if (ret) {
> +		dev_err(device, "Failed DMA read (src=0x%lx, encoding=%u)\n",
> +			src, size_encoding);

nit: Move this and the next dev_err() into nv_falcon_dma().

> +		return ret;
> +	}
> +
> +	ret = nv_falcon_dma(device, dst, size_encoding, true);
> +	if (ret) {
> +		dev_err(device, "Failed DMA write (dst=0x%lx, encoding=%u)\n",
> +			dst, size_encoding);
> +		return ret;
> +	}
> +
> +	return ret;
> +}
> +
> +static int nv_falcon_probe(struct vfio_pci_device *device)
> +{
> +	enum gpu_arch gpu_arch;
> +	u32 pmc_boot_0;
> +	void *bar0;
> +	int i;
> +
> +	if (vfio_pci_config_readw(device, PCI_VENDOR_ID) != PCI_VENDOR_ID_NVIDIA)
> +		return -ENODEV;
> +
> +	if (vfio_pci_config_readw(device, PCI_CLASS_DEVICE) >> 8 !=
> +	    PCI_BASE_CLASS_DISPLAY)
> +		return -ENODEV;
> +
> +	/* Get BAR0 pointer for reading GPU registers */
> +	bar0 = device->bars[0].vaddr;
> +	if (!bar0)
> +		return -ENODEV;
> +
> +	/* Read PMC_BOOT_0 register from BAR0 to identify GPU */
> +	pmc_boot_0 = readl(bar0 + NV_PMC_BOOT_0);
> +
> +	/* Look up GPU architecture to verify this is a supported GPU */
> +	gpu_arch = nv_gpu_arch_lookup(pmc_boot_0);
> +	if (gpu_arch == GPU_ARCH_UNKNOWN) {
> +		dev_err(device, "Unsupported GPU architecture for PMC_BOOT_0: 0x%x\n",
> +			pmc_boot_0);
> +		return -ENODEV;
> +	}
> +
> +	/* Check verified GPU map */
> +	for (i = 0; i < VERIFIED_GPU_MAP_SIZE; i++) {
> +		if (verified_gpu_map[i] == pmc_boot_0)
> +			return 0;
> +	}
> +
> +	dev_info(device, "Unvalidated GPU: PMC_BOOT_0: 0x%x, possibly not supported\n",
> +		pmc_boot_0);

nit: Align with open parenthesis.

> +
> +	return 0;
> +}
> +
> +static void nv_falcon_init(struct vfio_pci_device *device)
> +{
> +	struct gpu_device *gpu = to_gpu_device(device);
> +	const struct gpu_properties *props;
> +	enum gpu_arch gpu_arch;
> +	u32 pmc_boot_0;
> +	int ret;
> +
> +	VFIO_ASSERT_GE(device->driver.region.size, sizeof(*gpu));
> +
> +	/* Read PMC_BOOT_0 register from BAR0 to identify GPU */
> +	pmc_boot_0 = readl(device->bars[0].vaddr + NV_PMC_BOOT_0);
> +
> +	/* Look up GPU architecture */
> +	gpu_arch = nv_gpu_arch_lookup(pmc_boot_0);

nit: gpu->arch can just be used to avoid needing extra local variable.

> +	VFIO_ASSERT_NE(gpu_arch, GPU_ARCH_UNKNOWN,
> +		       "Unsupported GPU architecture (PMC_BOOT_0=0x%x)\n", pmc_boot_0);

This assert should already be guaranteed by probe().

> +
> +	props = &gpu_properties_map[gpu_arch];
> +
> +	/* Populate GPU structure */
> +	gpu->arch = gpu_arch;
> +	gpu->bar0 = device->bars[0].vaddr;
> +	gpu->is_memory_clear_supported = props->memory_clear_supported;
> +	gpu->falcon = &falcon_map[props->falcon_type];
> +	gpu->pmc_enable_mask = props->pmc_enable_mask;
> +
> +	ret = falcon_enable(device);
> +	VFIO_ASSERT_EQ(ret, 0, "Failed to enable falcon: %d\n", ret);
> +
> +	/* Initialize falcon for DMA */
> +	ret = nv_falcon_dma_init(device);
> +	VFIO_ASSERT_EQ(ret, 0, "Failed to initialize falcon DMA: %d\n", ret);

Since the init() and remove() callbacks return void, propagating errors
from functions that are only reachable from there is unnecessary.
Consider using VFIO_ASSERT*() and making more functions return void.

Consider this an optional suggestion since it will be a lot of churn
and I'm not sure it will be a net improvement.

> +
> +	device->driver.max_memcpy_size = NV_FALCON_DMA_MAX_TRANSFER_SIZE;
> +	device->driver.max_memcpy_count = NV_FALCON_DMA_MAX_TRANSFER_COUNT;
> +}
> +
> +static void nv_falcon_remove(struct vfio_pci_device *device)
> +{
> +	falcon_disable(device);
> +	vfio_pci_cmd_clear(device, PCI_COMMAND_MASTER);
> +}
> +
> +/*
> + * Falcon DMA can only process one transfer at a time,
> + * so the actual work is deferred to memcpy_wait() to conform to the
> + * memcpy_start()/memcpy_wait() contract.
> + */
> +static void nv_falcon_memcpy_start(struct vfio_pci_device *device,
> +				iova_t src, iova_t dst, u64 size, u64 count)

nit: Align with open parenthesis.

> +{
> +	struct gpu_device *gpu = to_gpu_device(device);
> +
> +	VFIO_ASSERT_EQ(gpu->memcpy_count, 0);

gpu->memcpy_count can be removed. The caller
vfio_pci_driver_memcpy_start() guarantees no memcpys are in-progress. I
guess this was copied from the DSA driver, but that needs to track the
count because that determines what type of completion it needs to wait
on in memcpy_wait().

nv_falcon only supports a single copy, so no need to track the count
across start/wait.

> +static int nv_falcon_memcpy_wait(struct vfio_pci_device *device)
> +{
> +	struct gpu_device *gpu = to_gpu_device(device);
> +	iova_t src = gpu->memcpy_src;
> +	iova_t dst = gpu->memcpy_dst;
> +	u64 remaining = gpu->memcpy_size;
> +	int ret = 0;
> +
> +	VFIO_ASSERT_EQ(gpu->memcpy_count, 1);
> +
> +	/*
> +	 * Falcon DMA supports power-of-2 transfer sizes in [4, 256] and
> +	 * cannot cross 256-byte block boundaries.  Decompose the request
> +	 * into the largest valid chunk at each step.
> +	 */
> +	while (remaining) {
> +		u64 chunk = rounddown_pow_of_two(remaining);
> +
> +		chunk = min(chunk, dma_block_remain(src));
> +		chunk = min(chunk, dma_block_remain(dst));
> +
> +		ret = nv_falcon_memcpy_chunk(device, src, dst,
> +					     size_to_dma_encoding(chunk));

nit: Suggest moving size_to_dma_encoding() all the way down into
nv_falcon_dma() to keep this function simple and avoid the line wrap;
it's an implementation detail of how the copy operation gets posted to
the device.


Thread overview: 7+ messages
2026-04-08 22:54 [PATCH v13 0/4] vfio: selftest: Add NVIDIA GPU Falcon DMA test driver Alex Williamson
2026-04-08 22:54 ` [PATCH v13 1/4] vfio: selftests: Add memcpy chunking to vfio_pci_driver_memcpy() Alex Williamson
2026-04-08 22:54 ` [PATCH v13 2/4] vfio: selftests: Add generic PCI command register helpers Alex Williamson
2026-04-10 21:27   ` David Matlack
2026-04-08 22:54 ` [PATCH v13 3/4] vfio: selftests: Allow drivers without send_msi() support Alex Williamson
2026-04-08 22:54 ` [PATCH v13 4/4] vfio: selftests: Add NVIDIA Falcon driver for DMA testing Alex Williamson
2026-04-10 22:19   ` David Matlack [this message]
