linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: JD Zheng <jiandong.zheng@broadcom.com>,
	linux-pci@vger.kernel.org, keith.busch@intel.com,
	bcm-kernel-feedback-list@broadcom.com,
	Lukas Wunner <lukas@wunner.de>
Subject: Re: SSD surprise removal leads to long wait inside pci_dev_wait() and FLR 65s timeout
Date: Sun, 2 Jun 2019 19:40:11 -0600	[thread overview]
Message-ID: <20190602194011.51ceaa23@x1.home> (raw)
In-Reply-To: <20190603004414.GA189360@google.com>

On Sun, 2 Jun 2019 19:44:14 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Alex, Lukas]
> 
> On Fri, May 31, 2019 at 09:55:20AM -0700, JD Zheng wrote:
> > Hello,
> > 
> > I am running DPDK 18.11+SPDK 19.04 with v5.1 kernel. DPDK/SPDK uses SSD vfio
> > devices and after running SPDK's nvmf_tgt, unplugging a SSD cause kernel to
> > print out following:
> > [  105.426952] vfio-pci 0000:04:00.0: not ready 2047ms after FLR; waiting
> > [  107.698953] vfio-pci 0000:04:00.0: not ready 4095ms after FLR; waiting
> > [  112.050960] vfio-pci 0000:04:00.0: not ready 8191ms after FLR; waiting
> > [  120.498953] vfio-pci 0000:04:00.0: not ready 16383ms after FLR; waiting
> > [  138.418957] vfio-pci 0000:04:00.0: not ready 32767ms after FLR; waiting
> > [  173.234953] vfio-pci 0000:04:00.0: not ready 65535ms after FLR; giving up
> > 
> > Looks like it is a PCI hotplug racing condition between DPDK's
> > eal-intr-thread thread and kernel's pciehp thread. And it causes lockup in
> > pci_dev_wait() at kernel side.
> > 
> > When SSD is removed, eal-intr-thread immediately receives
> > RTE_INTR_HANDLE_ALARM and handler calls rte_pci_detach_dev() and at kernel
> > side vfio_pci_release() is triggered to release this vfio device, which
> > calls pci_try_reset_function(), then _pci_reset_function_locked().
> > pci_try_reset_function acquires the device lock but
> > _pci_reset_function_locked() doesn't return, therefore lock is NOT released.

To what extent does vfio-pci need to learn about surprise hotplug?  My
expectation is that the current state of the code would only support
cooperative hotplug.  When a device is surprise removed, what backs a
user's mmaps?  AIUI, we don't have a revoke interface to invalidate
these.  We should probably start with an RFE or some development effort
to harden vfio-pci for surprise hotplug, it's not surprising it doesn't
just work TBH.  Thanks,

Alex


> > Inside _pci_reset_function_locked(), pcie_has_flr(), pci_pm_reset(), etc.
> > call pci_dev_wait() at the end but it doesn't return and print out above
> > message until 65s timeout.
> > 
> > At kernel pciehp side, it also detects the removal but doesn't run
> > immediately as it is configured as "pciehp.pciehp_poll_time=5". So a couple
> > of seconds later, it calls pciehp_unconfigure_device -> pci_walk_bus ->
> > pci_dev_set_disconnected. pci_dev_set_disconnected() couldn't get the device
> > lock and is stuck too because the lock is hold by eal-intr-thread.
> > 
> > The first issue is in pci_dev_wait(). It calls pci_read_config_dword() and
> > only when id is not all ones, it can return. But when SSD is physically
> > removed, id retrieved is always all ones therefore, it has to wait for FLR
> > 65s timeout to return.
> > 
> > I did the following to check return value of pci_read_config_dword() to fix
> > this:
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4439,7 +4439,11 @@ static int pci_dev_wait(struct pci_dev *dev, char
> > *reset_type, int timeout)
> > 
> >                 msleep(delay);
> >                 delay *= 2;
> > -               pci_read_config_dword(dev, PCI_COMMAND, &id);
> > +               if (pci_read_config_dword(dev, PCI_COMMAND, &id) ==
> > +                   PCIBIOS_DEVICE_NOT_FOUND) {
> > +                       pci_info(dev, "device disconnected\n");
> > +                       return -ENODEV;
> > +               }
> >         }
> > 
> >         if (delay > 1000)
> > 
> > The second issue is that due to lock up described above, the
> > pci_dev_set_disconnected() is stuck and pci_read_config_dword() won't return
> > PCIBIOS_DEVICE_NOT_FOUND.
> > 
> > I didn't find a easy way to fix it. Maybe use device lock in
> > pci_dev_set_disconnected() is too coarse and we need a finer device
> > err_state lock?
> > 
> > BTW, pci_dev_set_disconnected wasn't using device lock until this change
> > a6bd101b8f.
> > 
> > Any suggestions to fix this problem?  
> 
> Would you mind opening a report at https://bugzilla.kernel.org and
> attaching the complete dmesg log and "lspci -vv" output?
> 
> Out of curiosity, why do you use "pciehp.pciehp_poll_time=5"?
> 
> Bjorn


  reply	other threads:[~2019-06-03  1:40 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-31 16:55 SSD surprise removal leads to long wait inside pci_dev_wait() and FLR 65s timeout JD Zheng
2019-06-03  0:44 ` Bjorn Helgaas
2019-06-03  1:40   ` Alex Williamson [this message]
2019-06-03 21:17     ` JD Zheng
2019-06-03 22:40       ` Alex Williamson
2019-06-03 23:05       ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190602194011.51ceaa23@x1.home \
    --to=alex.williamson@redhat.com \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=helgaas@kernel.org \
    --cc=jiandong.zheng@broadcom.com \
    --cc=keith.busch@intel.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).