From: mr.nuke.me@gmail.com (Alex G.)
Subject: IRQ/nvme_pci_complete_rq: NULL pointer dereference yet again
Date: Fri, 6 Apr 2018 14:08:38 -0500
Message-ID: <e1218a86-2183-91c9-5a0a-a1c331b5b5f3@gmail.com>
In-Reply-To: <20180406180445.GL10098@localhost.localdomain>
On 04/06/2018 01:04 PM, Keith Busch wrote:
> On Fri, Apr 06, 2018 at 12:46:06PM -0500, Alex G. wrote:
>> On 04/06/2018 12:16 PM, Scott Bauer wrote:
>>> You're using AER inject, right?
>>
>> No. I'm causing the errors in hardware with hot-unplug.
>
> I think Scott's still on the right track for this particular sighting.
> The AER handler looks unsafe under changing topologies. It might need run
> under pci_lock_rescan_remove() before walking the bus to prevent races
> with the surprise removal, but it's not clear to me yet if holding that
> lock is okay to do in this context.
I think we have three mechanisms that can remove a device: nvme timeout,
Link Down interrupt, and AER.
Link Down comes 20-60ms after the link actually dies. During that window
nvme keeps queueing IO, which can cause a flood of PCIe errors, which in
turn trigger AER handling. I suspect there is a massive race condition
somewhere, but I don't yet have convincing evidence to prove it.
> This however does not appear to resemble your previous sightings. In your
> previous sightings, it looks like something has lost track of commands,
> and we're freeing the resources with them a second time.
I think they might be related.
Alex