From: Bjorn Helgaas <helgaas@kernel.org>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org, Lukas Wunner <lukas@wunner.de>,
Keith Busch <kbusch@kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Parav Pandit <parav@nvidia.com>,
virtualization@lists.linux.dev, stefanha@redhat.com,
alok.a.tiwari@oracle.com, linux-pci@vger.kernel.org
Subject: Re: [PATCH RFC v5 1/5] pci: report surprise removal event
Date: Mon, 14 Jul 2025 16:13:51 -0500 [thread overview]
Message-ID: <20250714211351.GA2418892@bhelgaas> (raw)
In-Reply-To: <20250714022128-mutt-send-email-mst@kernel.org>
On Mon, Jul 14, 2025 at 02:26:19AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 09, 2025 at 06:38:20PM -0500, Bjorn Helgaas wrote:
> > On Wed, Jul 09, 2025 at 04:55:26PM -0400, Michael S. Tsirkin wrote:
> > > At the moment, in case of a surprise removal, the regular remove
> > > callback is invoked, exclusively. This works well, because mostly, the
> > > cleanup would be the same.
> > >
> > > However, there's a race: imagine device removal was initiated by a user
> > > action, such as driver unbind, and it in turn initiated some cleanup and
> > > is now waiting for an interrupt from the device. If the device is now
> > > surprise-removed, that never arrives and the remove callback hangs
> > > forever.
> > >
> > > For example, this was reported for virtio-blk:
> > >
> > > 1. the graceful removal is ongoing in the remove() callback, where disk
> > > deletion del_gendisk() is ongoing, which waits for the requests +to
> > > complete,
> > >
> > > 2. Now few requests are yet to complete, and surprise removal started.
> > >
> > > At this point, virtio block driver will not get notified by the driver
> > > core layer, because it is likely serializing remove() happening by
> > > +user/driver unload and PCI hotplug driver-initiated device removal. So
> > > vblk driver doesn't know that device is removed, block layer is waiting
> > > for requests completions to arrive which it never gets. So
> > > del_gendisk() gets stuck.
> > >
> > > Drivers can artificially add timeouts to handle that, but it can be
> > > flaky.
> > >
> > > Instead, let's add a way for the driver to be notified about the
> > > disconnect. It can then do any necessary cleanup, knowing that the
> > > device is inactive.
> >
> > This relies on somebody (typically pciehp, I guess) calling
> > pci_dev_set_disconnected() when a surprise remove happens.
> >
> > Do you think it would be practical for the driver's .remove() method
> > to recognize that the device may stop responding at any point, even if
> > no hotplug driver is present to call pci_dev_set_disconnected()?
> >
> > Waiting forever for an interrupt seems kind of vulnerable in general.
> > Maybe "artificially adding timeouts" is alluding to *not* waiting
> > forever for interrupts? That doesn't seem artificial to me because
> > it's just a fact of life that devices can disappear at arbitrary
> > times.
> >
> > It seems a little fragile to me to depend on some other part of the
> > system to notice the surprise removal and tell you about it or
> > schedule your work function. I think it would be more robust for the
> > driver to check directly, i.e., assume writes to the device may be
> > lost, check for PCI_POSSIBLE_ERROR() after reads from the device, and
> > never wait for an interrupt without a timeout.
>
> virtio is ... kind of special, in that users already take it for
> granted that having a device as long as they want to respond
> still does not lead to errors and data loss.
>
> Makes it a bit harder as our timeout would have to
> check for presence and retry, we can't just fail as a
> normal hardware device does.
Sorry, I don't know enough about virtio to follow what you said about
"having a device as long as they want to respond".
We started with a graceful remove. That must mean the user no longer
needs the device.
> And there's the overhead thing - poking at the device a lot
> puts a high load on the host.
Checking for PCI_POSSIBLE_ERROR() doesn't touch the device. If you
did a config read already, and the result happened to be ~0, *then* we
have the problem of figuring out whether the actual data from the
device was ~0, or if the read failed and the Root Complex synthesized
the ~0. In many cases a driver knows that ~0 is not a possible
register value. Otherwise it might have to read another register that
is known not to be ~0.
Bjorn
next prev parent reply other threads:[~2025-07-14 21:13 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1752094439.git.mst@redhat.com>
2025-07-09 20:55 ` [PATCH RFC v5 1/5] pci: report surprise removal event Michael S. Tsirkin
2025-07-09 23:38 ` Bjorn Helgaas
2025-07-09 23:55 ` Keith Busch
2025-07-14 6:17 ` Michael S. Tsirkin
2025-07-14 6:26 ` Michael S. Tsirkin
2025-07-14 21:13 ` Bjorn Helgaas [this message]
2025-07-15 6:28 ` Michael S. Tsirkin
2025-07-16 22:29 ` Bjorn Helgaas
2025-07-17 15:15 ` Michael S. Tsirkin
2025-07-14 6:11 ` Lukas Wunner
2025-07-14 6:18 ` Michael S. Tsirkin
2025-07-14 6:54 ` Michael S. Tsirkin
2025-07-17 15:11 ` Michael S. Tsirkin
2025-07-17 20:12 ` Lukas Wunner
2025-07-17 23:31 ` Michael S. Tsirkin
2025-07-18 4:35 ` Lukas Wunner
2025-07-18 8:40 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250714211351.GA2418892@bhelgaas \
--to=helgaas@kernel.org \
--cc=alok.a.tiwari@oracle.com \
--cc=bhelgaas@google.com \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mst@redhat.com \
--cc=parav@nvidia.com \
--cc=stefanha@redhat.com \
--cc=virtualization@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox