Linux PCI subsystem development
From: Alex Williamson <alex.williamson@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jim Harris <jim.harris@samsung.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"ben@nvidia.com" <ben@nvidia.com>
Subject: Re: Locking between vfio hot-remove and pci sysfs sriov_numvfs
Date: Fri, 8 Dec 2023 11:12:15 -0700
Message-ID: <20231208111215.5a47090e.alex.williamson@redhat.com>
In-Reply-To: <20231208180157.GR2692119@nvidia.com>

On Fri, 8 Dec 2023 14:01:57 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Fri, Dec 08, 2023 at 05:59:17PM +0000, Jim Harris wrote:
> > On Fri, Dec 08, 2023 at 01:41:09PM -0400, Jason Gunthorpe wrote:  
> > > On Fri, Dec 08, 2023 at 05:38:51PM +0000, Jim Harris wrote:  
> > > > On Thu, Dec 07, 2023 at 07:48:10PM -0400, Jason Gunthorpe wrote:  
> > > > > 
> > > > > The mechanism of waiting in remove for userspace is inherently flawed,
> > > > > it can never work fully correctly. :( I've hit this many times.
> > > > > 
> > > > > Upon remove VFIO should immediately remove itself and leave behind a
> > > > > non-functional file descriptor. Userspace should catch up eventually
> > > > > and see it is toast.  
> > > > 
> > > > One nice aspect of the current design is that vfio will leave the BARs
> > > > mapped until userspace releases the vfio handle. It avoids some rather
> > > > nasty hacks for handling SIGBUS errors in the fast path (i.e. writing
> > > > NVMe doorbells) where we cannot try to check for device removal on
> > > > every MMIO write. Would your proposal immediately yank the BARs, without
> > > > waiting for userspace to respond? This is mostly for my curiosity - SPDK
> > > > already has these hacks implemented, so I don't think it would be
> > > > affected by this kind of change in behavior.  
> > > 
> > > What we did in RDMA was map a dummy page to the BARs so the sigbus was
> > > avoided. But in that case RDMA knows the BAR memory is used only for
> > > doorbell write so this is a reasonable thing to do.  
> > 
> > Yeah, this is exactly what SPDK (and DPDK) does today.  
> 
> To be clear, I mean we did it in the kernel.
> 
> When the device driver is removed we zap all the VMAs and install a
> fault handler that installs the dummy page instead of raising SIGBUS.
> 
> The application doesn't do anything, and this is how SPDK will already
> be supporting device hot unplug of the RDMA drivers.

But I think you can only do that in the kernel because you understand
that the device uses those pages only for doorbells; it's not a
general-purpose solution, right?

Perhaps a variant driver could do something similar for NVMe device
doorbell pages, but a device-agnostic driver like vfio-pci would need
to SIGBUS on access or else we risk significant data integrity issues.
Thanks,

Alex
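
For readers following the thread, here is a minimal userspace sketch of
the kind of SIGBUS workaround Jim describes above: on a fault inside a
mapped BAR range, the handler replaces the range with anonymous memory
so the faulting store can complete. This is an illustration only, not
SPDK's or DPDK's actual code and not the in-kernel RDMA approach. All
names (bar_guard_install, bar_guard_tripped, demo_hot_remove) are
hypothetical, and hot-remove is simulated by truncating a file-backed
mapping rather than by unplugging a real device.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one guarded "BAR" mapping. */
static void *bar_base;
static size_t bar_len;
static volatile sig_atomic_t bar_dead;

/*
 * On SIGBUS inside the guarded range, swap in anonymous memory so the
 * faulting access can be restarted. mmap() is not on POSIX's list of
 * async-signal-safe functions; this sketch assumes Linux, where it is
 * a plain system call.
 */
static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)bar_base;

    (void)sig;
    (void)ctx;
    if (addr < base || addr >= base + bar_len)
        _exit(1);                       /* not our mapping: give up */
    if (mmap(bar_base, bar_len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
        _exit(1);
    bar_dead = 1;                       /* let the slow path notice */
}

/* Register the range to guard and install the handler. */
int bar_guard_install(void *base, size_t len)
{
    struct sigaction sa;

    bar_base = base;
    bar_len = len;
    bar_dead = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

int bar_guard_tripped(void)
{
    return bar_dead;
}

/* Returns 1 if a write after simulated removal was absorbed by the dummy page. */
int demo_hot_remove(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/fake-bar-XXXXXX";
    int fd = mkstemp(path);
    volatile uint32_t *doorbell;
    int ok;

    assert(fd >= 0);
    unlink(path);
    assert(ftruncate(fd, (off_t)page) == 0);
    doorbell = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    assert(doorbell != MAP_FAILED);
    assert(bar_guard_install((void *)doorbell, page) == 0);

    doorbell[0] = 1;                    /* "device" present: normal write */
    assert(ftruncate(fd, 0) == 0);      /* simulate removal: backing is gone */
    doorbell[0] = 2;                    /* faults; handler swaps in dummy page */

    ok = bar_guard_tripped() && doorbell[0] == 2;
    munmap((void *)doorbell, page);
    close(fd);
    return ok;
}
```

Calling demo_hot_remove() from a main() exercises the path. Note that
after the swap every access is silently served from ordinary memory,
which is tolerable only when the range is known to hold write-only
doorbells. That is the data integrity concern raised above for a
device-agnostic driver, and the reason the sketch records a flag for a
slow path to check.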



