From: Keith Busch <keith.busch@intel.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Alex_Gagniuc@Dellteam.com, linux-pci@vger.kernel.org,
bhelgaas@google.com, Austin.Bolen@dell.com
Subject: Re: PCI: hotplug: Erroneous removal of hotplug PCI devices
Date: Wed, 23 Jan 2019 12:47:27 -0700
Message-ID: <20190123194727.GB8193@localhost.localdomain>
In-Reply-To: <20190123192829.qjxjhsmi7avasjnh@wunner.de>

On Wed, Jan 23, 2019 at 08:28:29PM +0100, Lukas Wunner wrote:
> On Wed, Jan 23, 2019 at 12:09:46PM -0700, Keith Busch wrote:
> > On Wed, Jan 23, 2019 at 08:07:23PM +0100, Lukas Wunner wrote:
> > > On Wed, Jan 23, 2019 at 07:54:20PM +0100, Lukas Wunner wrote:
> > > > So I don't see a perfect solution. What device are we talking about
> > > > anyway? 400 ms is a *long* time.
> > >
> > > Also, how exactly does this issue manifest itself: Is it just an
> > > annoyance that the slot is brought up/down/up or does it not work
> > > at all?
> >
> > Yeah, there is an nvme driver bug that hits a dead lock if you bring
> > a very quick add-remove sequence. The nvme remove tries to delete IO
> > resources before the async probe side set them up, so the driver doesn't
> > actually see that they're invalid. I have a proposed fix, but waiting to
> > here if it is successful.
> >
> > bz: https://bugzilla.kernel.org/show_bug.cgi?id=202081
>
> Hm, there's no full dmesg output attached, so it's not possible to
> tell what the topology looks like and what the vendor/device ID of
> 0000:b0:04.0 is.
>
> Also, there's only a card present / link up sequence visible in the
> abridged dmesg output which has a 4 usec delay, but no link up / card
> present sequence with a 400 msec delay?

Yeah, not easy to follow, and some of the discussion happened off the bz.

Link Change:

  [ 838.784541] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up

Presence Detect Change +400msec:

  [ 839.183506] pciehp 0000:b0:04.0:pcie204: Slot(178): Card not present

In between these two entries, nvme starts setting up the controller it
detected on the link up. The "not present" side tries to remove the same
nvme device, but fails to invalidate the IO resources because it races
with probe before probe has even set them up. That leaves probe unable to
complete IO a moment later because its IRQ resources were disabled.

Meanwhile, the blk-mq timeout handler can't do anything because the
device state is disconnected, so it believes the removal side is handling
things. What a mess...

We can fix it, I just want to hear whether Alex can confirm that the
proposed fix works.
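
For anyone trying to follow the ordering described above, here is a rough
model of it in plain C. This is only a sketch under stated assumptions: the
struct, the flags and every function name are hypothetical and do not
correspond to the actual nvme driver or blk-mq code. It replays the steps
sequentially (remove first, then the rest of probe) rather than with real
threads, so the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a controller; not a real nvme structure. */
struct model_ctrl {
	bool io_queues_ready;	/* set by probe once IO resources exist */
	bool irq_enabled;	/* cleared by the remove path */
	bool disconnected;	/* set by the remove path */
};

/* Remove path: it can only tear down the IO resources it can see. */
static void model_remove(struct model_ctrl *c)
{
	c->disconnected = true;
	c->irq_enabled = false;
	if (c->io_queues_ready)	/* false when probe has not got this far */
		c->io_queues_ready = false;
}

/* Async probe carrying on after remove has already run. */
static void model_probe_setup_io(struct model_ctrl *c)
{
	c->io_queues_ready = true;
}

/* IO issued by probe completes only if queues exist and IRQs work. */
static bool model_io_can_complete(const struct model_ctrl *c)
{
	return c->io_queues_ready && c->irq_enabled;
}

/* The timeout handler defers to the remove path once disconnected. */
static bool model_timeout_recovers(const struct model_ctrl *c)
{
	return !c->disconnected;
}

/* Replays "remove before probe set up IO"; true means nobody recovers. */
bool model_race_is_stuck(void)
{
	struct model_ctrl c = {
		.io_queues_ready = false,
		.irq_enabled = true,
		.disconnected = false,
	};

	model_remove(&c);
	assert(!c.io_queues_ready);	/* remove found nothing to invalidate */
	model_probe_setup_io(&c);

	return !model_io_can_complete(&c) && !model_timeout_recovers(&c);
}
```

In this model model_race_is_stuck() returns true: probe has live IO queues
but no IRQs, and the timeout path stands down because the state is already
disconnected.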