Re: [PATCH v8 1/1] vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC

From: sashiko-bot@kernel.org
To: "Ankit Agrawal" <ankita@nvidia.com>
Cc: linux-pci@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v8 1/1] vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC
Date: Tue, 02 Jun 2026 06:50:59 +0000
Message-ID: <20260602065100.48B2D1F00893@smtp.kernel.org>
In-Reply-To: <20260602063015.3915-1-ankita@nvidia.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [High] Read and write operations spuriously fail with -EAGAIN if they race with a device reset due to missing retry logic.
--

commit 2aea957cd945c5158a2106e110886e0b017eeb9a
Author: Ankit Agrawal <ankita@nvidia.com>

    vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC

This commit introduces a CXL DVSEC-based readiness check for Blackwell-Next
GPUs. It adds lockless polling followed by an in-lock non-blocking check,
and updates the probe, fault, read, and write paths to use the new flow.

> diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> index fa056b69f899a..4829ae29cff04 100644
> --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
[ ... ]
> @@ -712,6 +825,12 @@ nvgrace_gpu_read_mem(struct nvgrace_gpu_pci_core_device *nvdev,
>  	else
>  		mem_count = min(count, memregion->memlength - (size_t)offset);
>  
> +	if (nvdev->cxl_dvsec && READ_ONCE(nvdev->reset_done)) {
> +		ret = nvgrace_gpu_wait_device_ready_cxl(nvdev);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	scoped_guard(rwsem_read, &vdev->memory_lock) {
>  		ret = nvgrace_gpu_check_device_ready(nvdev);
>  		if (ret)

[Severity: High]
If a reset races between the lockless wait and the in-lock check, could this
code return -EAGAIN to userspace?

The page fault handler was updated with a retry loop to handle the -EAGAIN
returned by nvgrace_gpu_check_device_ready() when a concurrent reset happens.
However, in nvgrace_gpu_read_mem(), if nvgrace_gpu_check_device_ready()
returns -EAGAIN, that error is propagated directly to the caller.

Applications do not expect -EAGAIN from a blocking file descriptor (POSIX
ties it to O_NONBLOCK), so a spurious -EAGAIN here may cause unexpected
application failures.

Should nvgrace_gpu_read_mem() include the same retry loop around the wait
and lock acquisition as the fault handler?
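
As a rough illustration of the retry shape being asked about, here is a
minimal userspace sketch. It is not the driver code: struct fake_dev,
wait_device_ready(), check_device_ready() and read_mem_with_retry() are
hypothetical stand-ins for the lockless wait, the in-lock check and the read
path, and the actual loop in the fault handler may look different.

```c
#include <errno.h>

/* Hypothetical model of the device state; not the kernel structure. */
struct fake_dev {
	int pending_resets;	/* resets that will race with the in-lock check */
	int wait_calls;		/* number of lockless waits performed */
	int check_calls;	/* number of in-lock checks performed */
};

/* Stand-in for the lockless, blocking wait: 0 once the device is ready. */
static int wait_device_ready(struct fake_dev *dev)
{
	dev->wait_calls++;
	return 0;
}

/* Stand-in for the in-lock, non-blocking check: -EAGAIN if a reset raced. */
static int check_device_ready(struct fake_dev *dev)
{
	dev->check_calls++;
	if (dev->pending_resets > 0) {
		dev->pending_resets--;
		return -EAGAIN;
	}
	return 0;
}

/* Retry the wait plus check pair instead of returning -EAGAIN to the caller. */
static int read_mem_with_retry(struct fake_dev *dev)
{
	int ret;

	for (;;) {
		ret = wait_device_ready(dev);
		if (ret)
			return ret;	/* real failure, e.g. a timeout */

		/* In the driver this check would run under memory_lock. */
		ret = check_device_ready(dev);
		if (ret != -EAGAIN)
			return ret;	/* ready (0) or a hard error */

		/* A reset raced in: the lock is dropped, so wait again. */
	}
}
```

With two racing resets, the sketch performs three wait/check rounds and
returns 0. Any bound on the total wait would have to come from the blocking
wait itself, which is assumed here to fail with its own error on timeout.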

>  			return ret;
[ ... ]
> @@ -846,6 +965,12 @@ nvgrace_gpu_write_mem(struct nvgrace_gpu_pci_core_device *nvdev,
>  	 */
>  	mem_count = min(count, memregion->memlength - (size_t)offset);
>  
> +	if (nvdev->cxl_dvsec && READ_ONCE(nvdev->reset_done)) {
> +		ret = nvgrace_gpu_wait_device_ready_cxl(nvdev);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	scoped_guard(rwsem_read, &vdev->memory_lock) {
>  		ret = nvgrace_gpu_check_device_ready(nvdev);
>  		if (ret)

[Severity: High]
Does nvgrace_gpu_write_mem() have the same missing retry logic?

Similar to the read path, a racing reset could cause this to return a
spurious -EAGAIN to userspace.

>  			return ret;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260602063015.3915-1-ankita@nvidia.com?part=1

Thread overview: 8+ messages
2026-06-02  6:30 [PATCH v8 1/1] vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC Ankit Agrawal
2026-06-02  6:50 ` sashiko-bot [this message]
2026-06-02 12:29 ` Ankit Agrawal
2026-06-02 18:22 ` Alex Williamson
2026-06-03  6:55 ` Tian, Kevin
2026-06-03  8:17 ` Cédric Le Goater
2026-06-03  9:17   ` Ankit Agrawal
2026-06-05 19:26 ` Alex Williamson
