Linux-NVME Archive on lore.kernel.org
From: mr.nuke.me@gmail.com (Alex G.)
Subject: IRQ/nvme_pci_complete_rq: NULL pointer dereference yet again
Date: Fri, 6 Apr 2018 14:08:38 -0500
Message-ID: <e1218a86-2183-91c9-5a0a-a1c331b5b5f3@gmail.com>
In-Reply-To: <20180406180445.GL10098@localhost.localdomain>

On 04/06/2018 01:04 PM, Keith Busch wrote:
> On Fri, Apr 06, 2018 at 12:46:06PM -0500, Alex G. wrote:
>> On 04/06/2018 12:16 PM, Scott Bauer wrote:
>>> You're using AER inject, right?
>>
>> No. I'm causing the errors in hardware with hot-unplug.
> 
> I think Scott's still on the right track for this particular sighting.
> The AER handler looks unsafe under changing topologies. It might need to run
> under pci_lock_rescan_remove() before walking the bus to prevent races
> with the surprise removal, but it's not clear to me yet if holding that
> lock is okay to do in this context.
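
For illustration, the locking idea in the quoted paragraph can be sketched
in userspace C. This is an analogy only, not kernel code: struct toy_bus,
toy_bus_add(), toy_bus_remove() and toy_bus_walk_count() are hypothetical
names, and a pthread mutex stands in for the lock taken by
pci_lock_rescan_remove(). The sketch only shows that a walker holding the
lock cannot trip over a device that is being unlinked; it does not settle
whether taking the real lock is safe from the AER handler's context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PCI device on a bus. */
struct toy_dev {
	int id;
	struct toy_dev *next;
};

/* Hypothetical stand-in for a bus; the mutex plays the rescan/remove lock. */
struct toy_bus {
	pthread_mutex_t lock;
	struct toy_dev *devices;
};

static void toy_bus_init(struct toy_bus *bus)
{
	pthread_mutex_init(&bus->lock, NULL);
	bus->devices = NULL;
}

/* Add a device at the head of the list; returns 0 on success, -1 on OOM. */
static int toy_bus_add(struct toy_bus *bus, int id)
{
	struct toy_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return -1;
	dev->id = id;
	pthread_mutex_lock(&bus->lock);
	dev->next = bus->devices;
	bus->devices = dev;
	pthread_mutex_unlock(&bus->lock);
	return 0;
}

/* "Surprise removal": unlink under the lock. Returns 1 if the id was found. */
static int toy_bus_remove(struct toy_bus *bus, int id)
{
	struct toy_dev **pp, *victim = NULL;
	int found;

	pthread_mutex_lock(&bus->lock);
	for (pp = &bus->devices; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&bus->lock);

	found = (victim != NULL);
	free(victim);		/* free(NULL) is a no-op */
	return found;
}

/* "Error handler": walk the whole list under the same lock and count it. */
static int toy_bus_walk_count(struct toy_bus *bus)
{
	struct toy_dev *dev;
	int n = 0;

	pthread_mutex_lock(&bus->lock);
	for (dev = bus->devices; dev; dev = dev->next)
		n++;
	pthread_mutex_unlock(&bus->lock);
	return n;
}
```

Because toy_bus_walk_count() and toy_bus_remove() take the same mutex, a
walk sees the list either before or after a removal, never in between.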

I think we have three mechanisms that can remove a device: nvme timeout,
Link Down interrupt, and AER.
The Link Down interrupt arrives 20-60ms after the link actually dies.
During that window nvme keeps queueing IO, which can cause a flood of PCIe
errors, which in turn triggers AER handling. I suspect there is a massive
race condition somewhere among these paths, but I don't yet have convincing
evidence to prove it.
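
To make the concern concrete, here is a minimal userspace sketch in C of
three removal paths converging on one device, with an atomic flag so that
only the first caller performs the teardown. Everything in it is
hypothetical (struct toy_ctrl, toy_ctrl_remove(), the enum names); it is
not how the nvme or PCI code is written, only an illustration of the kind
of "claim once" guard that keeps resources from being released twice when
several paths race.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical labels for the three removal paths named above. */
enum toy_remover {
	TOY_NVME_TIMEOUT,
	TOY_LINK_DOWN,
	TOY_AER,
};

/* Hypothetical controller state. */
struct toy_ctrl {
	atomic_int removed;		/* 0 = alive, 1 = teardown claimed */
	atomic_int teardown_runs;	/* times the teardown body actually ran */
	int winner;			/* which path claimed the teardown */
};

static void toy_ctrl_init(struct toy_ctrl *c)
{
	atomic_init(&c->removed, 0);
	atomic_init(&c->teardown_runs, 0);
	c->winner = -1;
}

/*
 * Try to remove the controller. Returns 1 if this caller did the
 * teardown, 0 if another path had already claimed it.
 */
static int toy_ctrl_remove(struct toy_ctrl *c, enum toy_remover who)
{
	int expected = 0;

	/* Only one caller can flip removed from 0 to 1. */
	if (!atomic_compare_exchange_strong(&c->removed, &expected, 1))
		return 0;

	/* Teardown body: in this sketch it only records that it ran. */
	c->winner = (int)who;
	atomic_fetch_add(&c->teardown_runs, 1);
	return 1;
}
```

Whichever path calls toy_ctrl_remove() first wins, and later callers back
off without touching the resources again, even if the calls come from
concurrent threads.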

> This however does not appear to resemble your previous sightings. In your
> previous sightings, it looks like something has lost track of commands,
> and we're freeing the resources with them a second time.

I think they might be related.

Alex
