From: Alex Williamson <alex@shazbot.org>
To: David Matlack <dmatlack@google.com>
Cc: Rubin Du <rubind@nvidia.com>, Shuah Khan <shuah@kernel.org>,
kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-kernel@vger.kernel.org, alex@shazbot.org
Subject: Re: [PATCH v12 4/4] selftests/vfio: Add NVIDIA Falcon driver for DMA testing
Date: Tue, 7 Apr 2026 16:49:15 -0600 [thread overview]
Message-ID: <20260407164915.08c2de1f@shazbot.org> (raw)
In-Reply-To: <adRF7clGId7Wuiik@google.com>
On Mon, 6 Apr 2026 23:46:53 +0000
David Matlack <dmatlack@google.com> wrote:
> On 2026-04-03 04:44 PM, Rubin Du wrote:
> > Add a new VFIO PCI driver for NVIDIA GPUs that enables DMA testing
> > via the Falcon (Fast Logic Controller) microcontrollers. This driver
> > extracts and adapts the DMA test functionality from NVIDIA's
> > gpu-admin-tools project and integrates it into the existing VFIO
> > selftest framework.
> >
> > Falcons are general-purpose microcontrollers present on NVIDIA GPUs
> > that can perform DMA operations between system memory and device
> > memory. By leveraging Falcon DMA, this driver allows NVIDIA GPUs to
> > be tested alongside Intel IOAT and DSA devices using the same
> > selftest infrastructure.
> >
> > The driver is named 'nv_falcon' to reflect that it specifically
> > controls the Falcon microcontrollers for DMA operations, rather
> > than exposing general GPU functionality.
> >
> > Reference implementation:
> > https://github.com/NVIDIA/gpu-admin-tools
> >
> > Signed-off-by: Rubin Du <rubind@nvidia.com>
>
> Will anyone from Nvidia be able to help review the correctness of the
> driver?
You can tell by the versioning snafu that we've seen a bunch of
versions of this internally, so we think it's correct. We also have
the reference implementation in the above link if anyone else wants to
compare.
> > +static int gpu_poll_register(struct vfio_pci_device *device,
> > + const char *name, u32 offset,
> > + u32 expected, u32 mask, u32 timeout_ms)
> > +{
> > + struct gpu_device *gpu = to_gpu_device(device);
> > + struct timespec start, now;
> > + u64 elapsed_ms;
> > + u32 value;
> > +
> > + clock_gettime(CLOCK_MONOTONIC, &start);
> > +
> > + for (;;) {
> > + value = gpu_read32(gpu, offset);
> > + if ((value & mask) == expected)
> > + return 0;
> > +
> > + clock_gettime(CLOCK_MONOTONIC, &now);
> > + elapsed_ms = (now.tv_sec - start.tv_sec) * 1000
> > + + (now.tv_nsec - start.tv_nsec) / 1000000;
> > +
> > + if (elapsed_ms >= timeout_ms)
> > + break;
> > +
> > + usleep(1000);
> > + }
> > +
> > + dev_err(device,
> > + "Timeout polling %s (0x%x): value=0x%x expected=0x%x mask=0x%x after %llu ms\n",
> > + name, offset, value, expected, mask,
> > + (unsigned long long)elapsed_ms);
>
> nit: You can replace ll with l in the format string and drop the cast to
> unsigned long long.
>
> We only support VFIO selftests on 64-bit architectures, and that matches
> what existing printfs for u64 use in VFIO selftests.
Yup
> > +static bool fsp_check_ofa_dma_support(struct vfio_pci_device *device)
> > +{
> > + struct gpu_device *gpu = to_gpu_device(device);
> > + u32 val = gpu_read32(gpu, NV_OFA_DMA_SUPPORT_CHECK_REG);
> > +
> > + return (val >> 16) != 0xbadf;
> > +}
> > +
> > +static int size_to_dma_encoding(u64 size)
> > +{
> > + size = min_t(u64, size, NV_FALCON_DMA_MAX_TRANSFER_SIZE);
>
> It should be impossible for size to be greater than
> NV_FALCON_DMA_MAX_TRANSFER_SIZE. This should be dropped or converted
> into a VFIO_ASSERT_LE().
Ok, I think the below tests can also become asserts.
> > +
> > + if (!size || (size & 0x3))
> > + return -EINVAL;
> > +
> > + return ffs(size) - 3;
>
> Sashiko pointed out that this can lead to partial memcpys:
>
> . If a non-power-of-two size is passed (for example, 24 bytes), ffs(size) will
> . return the lowest set bit (bit 4, yielding an encoding for 8 bytes).
> .
> . Because nv_gpu_memcpy_wait() performs exactly one chunk transfer and does not
> . loop over the remainder, could this silently drop the remaining bytes and
> . cause a partial data copy? Should this check if the size is a power of 2, or
> . should the caller loop to handle remainders?
>
> https://sashiko.dev/#/patchset/20260403234444.350867-1-rubind%40nvidia.com?part=4
>
> I think this nuance needs to be handled within the nv_falcon driver.
> vfio_pci_driver_memcpy_start() just guarantees that size <=
> NV_FALCON_DMA_MAX_TRANSFER_SIZE.
>
> Perhaps this is why you had the loop in an earlier version of the
> patchset, in which case it was my mistake to ask you to remove it!
>
> When you add the loop back please add a comment to the loop to explain
> that it is necessary since the region needs to be sliced up into
> power-of-2 sizes for Falcon.
Yes, shouldn't have been removed. We can't do arbitrary sizes, only
power-of-2 down to 4-bytes. We can make this more self documenting and
assert on memcpy_start for the size alignment requirement.
> > +const struct vfio_pci_driver_ops nv_falcon_ops = {
> > + .name = "nv_falcon",
> > + .probe = nv_gpu_probe,
> > + .init = nv_gpu_init,
> > + .remove = nv_gpu_remove,
> > + .memcpy_start = nv_gpu_memcpy_start,
> > + .memcpy_wait = nv_gpu_memcpy_wait,
> > +};
>
> Any particular reason these functions are named nv_gpu_*() instead of
> nv_falcon_*()
Yes, these can be made more consistent. Thanks,
Alex
prev parent reply other threads:[~2026-04-07 22:49 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-03 23:44 [PATCH v12 0/4] selftests/vfio: Add NVIDIA GPU Falcon DMA test driver Rubin Du
2026-04-03 23:44 ` [PATCH v12 1/4] selftests/vfio: Add memcpy chunking to vfio_pci_driver_memcpy() Rubin Du
2026-04-06 23:08 ` David Matlack
2026-04-03 23:44 ` [PATCH v12 2/4] selftests/vfio: Add generic PCI command register helpers Rubin Du
2026-04-06 23:10 ` David Matlack
2026-04-06 23:19 ` David Matlack
2026-04-03 23:44 ` [PATCH v12 3/4] selftests/vfio: Allow drivers without send_msi() support Rubin Du
2026-04-06 23:12 ` David Matlack
2026-04-03 23:44 ` [PATCH v12 4/4] selftests/vfio: Add NVIDIA Falcon driver for DMA testing Rubin Du
2026-04-06 23:46 ` David Matlack
2026-04-07 22:49 ` Alex Williamson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260407164915.08c2de1f@shazbot.org \
--to=alex@shazbot.org \
--cc=dmatlack@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=rubind@nvidia.com \
--cc=shuah@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox