From: Alex Williamson <alex@shazbot.org>
To: David Matlack <dmatlack@google.com>
Cc: Rubin Du <rubind@nvidia.com>, Shuah Khan <shuah@kernel.org>,
kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-kernel@vger.kernel.org, alex@shazbot.org
Subject: Re: [PATCH v12 4/4] selftests/vfio: Add NVIDIA Falcon driver for DMA testing
Date: Tue, 7 Apr 2026 16:49:15 -0600 [thread overview]
Message-ID: <20260407164915.08c2de1f@shazbot.org> (raw)
In-Reply-To: <adRF7clGId7Wuiik@google.com>
On Mon, 6 Apr 2026 23:46:53 +0000
David Matlack <dmatlack@google.com> wrote:
> On 2026-04-03 04:44 PM, Rubin Du wrote:
> > Add a new VFIO PCI driver for NVIDIA GPUs that enables DMA testing
> > via the Falcon (Fast Logic Controller) microcontrollers. This driver
> > extracts and adapts the DMA test functionality from NVIDIA's
> > gpu-admin-tools project and integrates it into the existing VFIO
> > selftest framework.
> >
> > Falcons are general-purpose microcontrollers present on NVIDIA GPUs
> > that can perform DMA operations between system memory and device
> > memory. By leveraging Falcon DMA, this driver allows NVIDIA GPUs to
> > be tested alongside Intel IOAT and DSA devices using the same
> > selftest infrastructure.
> >
> > The driver is named 'nv_falcon' to reflect that it specifically
> > controls the Falcon microcontrollers for DMA operations, rather
> > than exposing general GPU functionality.
> >
> > Reference implementation:
> > https://github.com/NVIDIA/gpu-admin-tools
> >
> > Signed-off-by: Rubin Du <rubind@nvidia.com>
>
> Will anyone from Nvidia be able to help review the correctness of the
> driver?
You can tell by the versioning snafu that we've seen a bunch of
versions of this internally, so we think it's correct. We also have
the reference implementation in the above link if anyone else wants to
compare.
> > +static int gpu_poll_register(struct vfio_pci_device *device,
> > + const char *name, u32 offset,
> > + u32 expected, u32 mask, u32 timeout_ms)
> > +{
> > + struct gpu_device *gpu = to_gpu_device(device);
> > + struct timespec start, now;
> > + u64 elapsed_ms;
> > + u32 value;
> > +
> > + clock_gettime(CLOCK_MONOTONIC, &start);
> > +
> > + for (;;) {
> > + value = gpu_read32(gpu, offset);
> > + if ((value & mask) == expected)
> > + return 0;
> > +
> > + clock_gettime(CLOCK_MONOTONIC, &now);
> > + elapsed_ms = (now.tv_sec - start.tv_sec) * 1000
> > + + (now.tv_nsec - start.tv_nsec) / 1000000;
> > +
> > + if (elapsed_ms >= timeout_ms)
> > + break;
> > +
> > + usleep(1000);
> > + }
> > +
> > + dev_err(device,
> > + "Timeout polling %s (0x%x): value=0x%x expected=0x%x mask=0x%x after %llu ms\n",
> > + name, offset, value, expected, mask,
> > + (unsigned long long)elapsed_ms);
>
> nit: You can replace ll with l in the format string and drop the cast to
> unsigned long long.
>
> We only support VFIO selftests on 64-bit architectures, and that matches
> what existing printfs for u64 use in VFIO selftests.
Yup
> > +static bool fsp_check_ofa_dma_support(struct vfio_pci_device *device)
> > +{
> > + struct gpu_device *gpu = to_gpu_device(device);
> > + u32 val = gpu_read32(gpu, NV_OFA_DMA_SUPPORT_CHECK_REG);
> > +
> > + return (val >> 16) != 0xbadf;
> > +}
> > +
> > +static int size_to_dma_encoding(u64 size)
> > +{
> > + size = min_t(u64, size, NV_FALCON_DMA_MAX_TRANSFER_SIZE);
>
> It should be impossible for size to be greater than
> NV_FALCON_DMA_MAX_TRANSFER_SIZE. This should be dropped or converted
> into a VFIO_ASSERT_LE().
Ok, I think the below tests can also become asserts.
> > +
> > + if (!size || (size & 0x3))
> > + return -EINVAL;
> > +
> > + return ffs(size) - 3;
>
> Sashiko pointed out that this can lead to partial memcpys:
>
> . If a non-power-of-two size is passed (for example, 24 bytes), ffs(size) will
> . return the lowest set bit (bit 4, yielding an encoding for 8 bytes).
> .
> . Because nv_gpu_memcpy_wait() performs exactly one chunk transfer and does not
> . loop over the remainder, could this silently drop the remaining bytes and
> . cause a partial data copy? Should this check if the size is a power of 2, or
> . should the caller loop to handle remainders?
>
> https://sashiko.dev/#/patchset/20260403234444.350867-1-rubind%40nvidia.com?part=4
>
> I think this nuance needs to be handled within the nv_falcon driver.
> vfio_pci_driver_memcpy_start() just guarantees that size <=
> NV_FALCON_DMA_MAX_TRANSFER_SIZE.
>
> Perhaps this is why you had the loop in an earlier version of the
> patchset, in which case it was my mistake to ask you to remove it!
>
> When you add the loop back please add a comment to the loop to explain
> that it is necessary since the region needs to be sliced up into
> power-of-2 sizes for Falcon.
Yes, shouldn't have been removed. We can't do arbitrary sizes, only
power-of-2 down to 4-bytes. We can make this more self documenting and
assert on memcpy_start for the size alignment requirement.
> > +const struct vfio_pci_driver_ops nv_falcon_ops = {
> > + .name = "nv_falcon",
> > + .probe = nv_gpu_probe,
> > + .init = nv_gpu_init,
> > + .remove = nv_gpu_remove,
> > + .memcpy_start = nv_gpu_memcpy_start,
> > + .memcpy_wait = nv_gpu_memcpy_wait,
> > +};
>
> Any particular reason these functions are named nv_gpu_*() instead of
> nv_falcon_*()
Yes, these can be made more consistent. Thanks,
Alex
prev parent reply other threads:[~2026-04-07 22:49 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-03 23:44 [PATCH v12 0/4] selftests/vfio: Add NVIDIA GPU Falcon DMA test driver Rubin Du
2026-04-03 23:44 ` [PATCH v12 1/4] selftests/vfio: Add memcpy chunking to vfio_pci_driver_memcpy() Rubin Du
2026-04-06 23:08 ` David Matlack
2026-04-03 23:44 ` [PATCH v12 2/4] selftests/vfio: Add generic PCI command register helpers Rubin Du
2026-04-06 23:10 ` David Matlack
2026-04-06 23:19 ` David Matlack
2026-04-03 23:44 ` [PATCH v12 3/4] selftests/vfio: Allow drivers without send_msi() support Rubin Du
2026-04-06 23:12 ` David Matlack
2026-04-03 23:44 ` [PATCH v12 4/4] selftests/vfio: Add NVIDIA Falcon driver for DMA testing Rubin Du
2026-04-06 23:46 ` David Matlack
2026-04-07 22:49 ` Alex Williamson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260407164915.08c2de1f@shazbot.org \
--to=alex@shazbot.org \
--cc=dmatlack@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=rubind@nvidia.com \
--cc=shuah@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.