Re: [PATCH v4 3/3] vfio/nvgrace-gpu: Check the HBM training and C2C link status

From: Alex Williamson <alex.williamson@redhat.com>
To: Ankit Agrawal <ankita@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	"shameerali.kolothum.thodi@huawei.com"
	<shameerali.kolothum.thodi@huawei.com>,
	"kevin.tian@intel.com" <kevin.tian@intel.com>,
	Zhi Wang <zhiw@nvidia.com>, Aniket Agashe <aniketa@nvidia.com>,
	Neo Jia <cjia@nvidia.com>, Kirti Wankhede <kwankhede@nvidia.com>,
	"Tarun Gupta (SW-GPU)" <targupta@nvidia.com>,
	Vikram Sethi <vsethi@nvidia.com>,
	Andy Currid <acurrid@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Dan Williams <danw@nvidia.com>,
	"Anuj Aggarwal (SW-GPU)" <anuaggarwal@nvidia.com>,
	Matt Ochs <mochs@nvidia.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 3/3] vfio/nvgrace-gpu: Check the HBM training and C2C link status
Date: Sun, 19 Jan 2025 20:12:32 -0700
Message-ID: <20250119201232.04af85b2.alex.williamson@redhat.com>
In-Reply-To: <SA1PR12MB7199DB6748D147F434404629B0E72@SA1PR12MB7199.namprd12.prod.outlook.com>

On Mon, 20 Jan 2025 02:24:14 +0000
Ankit Agrawal <ankita@nvidia.com> wrote:

> >> +EXPORT_SYMBOL_GPL(vfio_pci_memory_lock_and_enable);
> >>
> >>  void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, u16 cmd)
> >>  {
> >>       pci_write_config_word(vdev->pdev, PCI_COMMAND, cmd);
> >>       up_write(&vdev->memory_lock);
> >>  }
> >> +EXPORT_SYMBOL_GPL(vfio_pci_memory_unlock_and_restore);
> >>
> >>  static unsigned long vma_to_pfn(struct vm_area_struct *vma)
> >>  {  
> >
> > The access is happening before the device is exposed to the user. The
> > above are for handling conditions where there may be races with user
> > access, so this is totally unnecessary.
> 
> Right. What I could do to reuse the code is to take out the part
> related to locking/unlocking as new functions and export that.
> The current vfio_pci_memory_lock_and_enable() would take the lock
> and call the new function. Same for vfio_pci_memory_unlock_and_restore().
> The nvgrace module could also call that new function. Does that sound
> reasonable?

No, this is standard PCI driver stuff, everything you need is already
there.  Probably pci_enable_device() and some variant of
pci_request_regions().

> > Does this delay even need to happen in the probe function, or could it
> > happen in the open_device callback?  That would still be before user
> > access, but if we expect it to generally work, it would allow the
> > training to happen in the background up until the user tries to open
> > the device.  Thanks,
> >
> > Alex  
> 
> The thought process is that since it is purely bare metal coming to a proper
> state during boot, the nvgrace module should probably wait for the startup
> to complete during probe() instead of delaying until open() time.

If the driver is statically loaded, that might mean you're willing to
stall boot for up to 30s.  In practice is this ever actually going to
fail?  Thanks,

Alex
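
[Editorial sketch, not part of the original message.] The probe-versus-open
trade-off discussed above can be illustrated with a small userspace
simulation in C. It is only a sketch under stated assumptions: every name in
it (fake_dev, fake_probe, fake_open_device, fake_wait_ready) is hypothetical
and is not the nvgrace-gpu or vfio-pci API, the status read is simulated, and
elapsed time is modeled as a poll count rather than the real 30s budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device whose memory finishes training
 * some time after boot. Not a real kernel structure. */
struct fake_dev {
	unsigned int polls_until_ready;	/* simulated hardware latency */
	unsigned int polls_done;	/* number of status reads so far */
	bool ready_cached;		/* set once readiness was observed */
};

/* Simulated status read: reports ready once enough reads have happened. */
static bool fake_status_ready(struct fake_dev *dev)
{
	return dev->polls_done++ >= dev->polls_until_ready;
}

/* Poll for readiness with a bounded number of attempts.
 * Returns 0 on success, -1 on timeout (a real driver would likely
 * return -ETIMEDOUT and sleep between reads). */
static int fake_wait_ready(struct fake_dev *dev, unsigned int max_polls)
{
	assert(dev != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_status_ready(dev))
			return 0;
	}
	return -1;
}

/* probe(): does no waiting, so loading the driver never stalls boot. */
static int fake_probe(struct fake_dev *dev)
{
	assert(dev != NULL);

	dev->polls_done = 0;
	dev->ready_cached = false;
	return 0;
}

/* open_device(): the first open waits for readiness; this is still
 * before any user access. Later opens reuse the cached result. */
static int fake_open_device(struct fake_dev *dev, unsigned int max_polls)
{
	assert(dev != NULL);

	if (dev->ready_cached)
		return 0;
	if (fake_wait_ready(dev, max_polls))
		return -1;
	dev->ready_cached = true;
	return 0;
}
```

In this arrangement the cost of waiting is paid only by the first caller that
opens the device, and a device that never becomes ready fails the open rather
than delaying driver load.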

Thread overview: 16+ messages
2025-01-17 23:37 [PATCH v4 0/3] vfio/nvgrace-gpu: Enable grace blackwell boards ankita
2025-01-17 23:37 ` [PATCH v4 1/3] vfio/nvgrace-gpu: Read dvsec register to determine need for uncached resmem ankita
2025-01-20  7:09   ` Tian, Kevin
2025-01-20 17:01     ` Ankit Agrawal
2025-01-21  1:20       ` Tian, Kevin
2025-01-17 23:37 ` [PATCH v4 2/3] vfio/nvgrace-gpu: Expose the blackwell device PF BAR1 to the VM ankita
2025-01-20  7:29   ` Tian, Kevin
2025-01-20 17:13     ` Ankit Agrawal
2025-01-17 23:37 ` [PATCH v4 3/3] vfio/nvgrace-gpu: Check the HBM training and C2C link status ankita
2025-01-18  1:52   ` Alex Williamson
2025-01-20  2:24     ` Ankit Agrawal
2025-01-20  3:12       ` Alex Williamson [this message]
2025-01-20  3:22         ` Alex Williamson
2025-01-20  3:35           ` Ankit Agrawal
2025-01-20  7:04           ` Tian, Kevin
2025-01-20 13:12           ` Jason Gunthorpe