Re: [PATCH v7 1/1] vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC

Linux PCI subsystem development

From: sashiko-bot@kernel.org
To: "Ankit Agrawal" <ankita@nvidia.com>
Cc: linux-pci@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v7 1/1] vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC
Date: Thu, 28 May 2026 10:22:14 +0000
Message-ID: <20260528102214.D2D221F000E9@smtp.kernel.org>
In-Reply-To: <20260528093840.4018-1-ankita@nvidia.com>

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:
- [High] TOCTOU race in lockless device readiness checks causes unexpected -EAGAIN or VM_FAULT_SIGBUS errors instead of waiting for the device.
- [High] The 256-second device readiness polling uses uninterruptible `msleep()` loops, which can trigger hung task panics and render user tasks unkillable.
--

commit a85819b2171b67b275d15e9b87eb7447b4dd63a4
Author: Ankit Agrawal <ankita@nvidia.com>

vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC

Adds a CXL DVSEC-based readiness check for Blackwell-Next GPUs in addition
to the existing legacy BAR0 polling. The device readiness is determined by
polling the Memory_Active bit outside the memory lock for up to 256 seconds.

> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index fa056b69f899a..04fcc0d088f54 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c

[ ... ]

> +static int nvgrace_gpu_wait_device_ready_cxl(struct nvgrace_gpu_pci_core_device *nvdev)
> +{
> +	unsigned long deadline = jiffies + msecs_to_jiffies(POLL_QUANTUM_MS);
> +	bool active_phase = false;
> +	u32 status;
> +	int ret;
> +
> +	for (;;) {

[ ... ]

> +		if (time_after(jiffies, deadline))
> +			return -ETIME;
> +
> +		msleep(POLL_QUANTUM_MS);
> +	}
> +}

[Severity: High]
Will this uninterruptible sleep trigger hung task panics?

Since the CXL device readiness wait can take up to 256 seconds and msleep()
places the task in the TASK_UNINTERRUPTIBLE state, processes accessing the
device during a reset will become unresponsive and immune to signals
(including SIGKILL) for more than 4 minutes.

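The default hung_task_timeout_secs is 120 seconds, so a task that stays
blocked in TASK_UNINTERRUPTIBLE for longer than that without being scheduled
can trigger the hung task detector, which causes a kernel panic on systems
with hung_task_panic enabled. Whether it fires here depends on how long each
msleep(POLL_QUANTUM_MS) interval is, but the task is unkillable for the whole
wait either way. Would it be better to use an interruptible sleep and abort
on pending fatal signals?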
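
As a rough illustration of the suggestion above, the loop could check for a
fatal signal on every iteration and bail out with -EINTR. The sketch below is
a userspace model of that control flow only, not driver code: struct fake_dev
and the *_stub helpers are hypothetical stand-ins (in the driver they would
correspond to the DVSEC read, fatal_signal_pending(current) and an
interruptible or killable sleep), and the POLL_QUANTUM_MS value is assumed
because the patch hunk defining it is not quoted here.

```c
#include <errno.h>
#include <stdbool.h>

#define POLL_QUANTUM_MS 100U           /* assumed value, not taken from the patch */
#define POLL_TIMEOUT_MS (256U * 1000U) /* 256 s budget from the commit message */

/* Hypothetical model of the device and of the calling task. */
struct fake_dev {
	unsigned int ready_after_ms; /* Memory_Active reads as set from this time on */
	unsigned int kill_at_ms;     /* fatal signal arrives at this time; 0 = never */
	unsigned int now_ms;         /* simulated clock */
};

/* Stand-in for reading the Memory_Active bit from the CXL DVSEC. */
static bool dev_memory_active_stub(const struct fake_dev *d)
{
	return d->now_ms >= d->ready_after_ms;
}

/* Stand-in for fatal_signal_pending(current). */
static bool fatal_signal_pending_stub(const struct fake_dev *d)
{
	return d->kill_at_ms && d->now_ms >= d->kill_at_ms;
}

/* Stand-in for a killable sleep of one poll quantum. */
static void sleep_quantum_stub(struct fake_dev *d)
{
	d->now_ms += POLL_QUANTUM_MS;
}

/* Poll for readiness, but give up early if the task has been killed. */
static int wait_device_ready_killable(struct fake_dev *d)
{
	unsigned int deadline = d->now_ms + POLL_TIMEOUT_MS;

	for (;;) {
		if (dev_memory_active_stub(d))
			return 0;
		if (fatal_signal_pending_stub(d))
			return -EINTR;
		if (d->now_ms >= deadline)
			return -ETIME;
		sleep_quantum_stub(d);
	}
}
```

With this shape a SIGKILL ends the wait within one poll quantum, and the
caller has to decide how to translate -EINTR for the fault path.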

[ ... ]

> @@ -313,6 +399,21 @@ static vm_fault_t nvgrace_gpu_vfio_pci_huge_fault(struct vm_fault *vmf,
>  	pfn = PHYS_PFN(memregion->memphys) + addr_to_pgoff(vma, addr);
>  
>  	if (is_aligned_for_order(vma, addr, pfn, order)) {
> +		/*
> +		 * Exit early under memory_lock to avoid a potentially lengthy
> +		 * device readiness wait on a runtime-suspended device. Any
> +		 * race after the lock is dropped is benign as the re-check
> +		 * inside the scoped guard below catches it.
> +		 */
> +		scoped_guard(rwsem_read, &vdev->memory_lock) {
> +			if (vdev->pm_runtime_engaged)
> +				return VM_FAULT_SIGBUS;
> +		}
> +
> +		if (nvdev->cxl_dvsec && READ_ONCE(nvdev->reset_done) &&
> +		    nvgrace_gpu_wait_device_ready_cxl(nvdev))
> +			return VM_FAULT_SIGBUS;

[Severity: High]
Can this lockless check race with a concurrent device reset and bypass the
intended polling mechanism?

If a concurrent device reset finishes and sets reset_done = true right after
this lockless check evaluates to false, the thread proceeds without polling
and blocks waiting for memory_lock.

Once the lock is acquired, the thread calls nvgrace_gpu_check_device_ready()
which observes reset_done == true and invokes the fallback
nvgrace_gpu_test_device_ready_cxl() under the lock. Because this CXL test is
non-blocking, it immediately returns -EAGAIN, causing
nvgrace_gpu_vfio_pci_huge_fault() to return VM_FAULT_SIGBUS. The faulting
user application then receives SIGBUS instead of waiting for the device.
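
One possible way to close this window, sketched here under stated
assumptions, is to treat -EAGAIN from the locked check as "raced with a
reset": drop the lock, wait outside it, and retry. The code below is a
userspace model of that retry logic, not the driver's implementation. struct
fake_dev, lock_memory(), unlock_memory(), wait_ready(), check_ready_locked()
and MAX_FAULT_RETRIES are all hypothetical names, and the race is simulated
by a flag that flips reset_done just before the lock is taken.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_FAULT_RETRIES 3 /* hypothetical bound on retries */

enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* Hypothetical model of the device state touched by the fault path. */
struct fake_dev {
	bool reset_done;   /* set by the reset path, cleared once ready is seen */
	bool ready;        /* what the non-blocking readiness test would report */
	bool race_pending; /* a reset completes just before the lock is taken */
	int waits;         /* how many times the blocking wait ran */
};

/* Stand-in for taking memory_lock; also injects the simulated race. */
static void lock_memory(struct fake_dev *d)
{
	if (d->race_pending) {
		d->race_pending = false;
		d->reset_done = true;
		d->ready = false;
	}
}

static void unlock_memory(struct fake_dev *d)
{
	(void)d;
}

/* Stand-in for the blocking wait done outside the lock. */
static int wait_ready(struct fake_dev *d)
{
	d->waits++;
	d->ready = true;
	return 0;
}

/* Stand-in for the non-blocking check done under the lock. */
static int check_ready_locked(struct fake_dev *d)
{
	if (!d->reset_done)
		return 0;
	if (!d->ready)
		return -EAGAIN;
	d->reset_done = false;
	return 0;
}

static enum fault_result fault_with_retry(struct fake_dev *d)
{
	int attempt, ret;

	for (attempt = 0; attempt < MAX_FAULT_RETRIES; attempt++) {
		/* Lockless check: may be stale by the time the lock is held. */
		if (d->reset_done && wait_ready(d))
			return FAULT_SIGBUS;

		lock_memory(d);
		ret = check_ready_locked(d);
		/* The PFN mapping would be inserted here while the lock is held. */
		unlock_memory(d);

		/* -EAGAIN means a reset slipped in: go back and wait. */
		if (ret != -EAGAIN)
			return ret ? FAULT_SIGBUS : FAULT_OK;
	}
	return FAULT_SIGBUS;
}
```

In this model a reset that completes between the lockless check and the
locked check costs one extra pass through the wait rather than an immediate
SIGBUS.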

> +
>  		scoped_guard(rwsem_read, &vdev->memory_lock) {
>  			if (vdev->pm_runtime_engaged ||
>  			    nvgrace_gpu_check_device_ready(nvdev))

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260528093840.4018-1-ankita@nvidia.com?part=1


Thread overview: 5+ messages
2026-05-28  9:38 [PATCH v7 1/1] vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC Ankit Agrawal
2026-05-28 10:22 ` sashiko-bot [this message]
2026-05-28 17:56 ` Alex Williamson
2026-05-29 11:27   ` Ankit Agrawal
2026-06-08 15:47   ` Bjorn Helgaas
