linux-nvme.lists.infradead.org archive mirror
From: krisman@linux.vnet.ibm.com (Gabriel Krisman Bertazi)
Subject: Oops when completing request on the wrong queue
Date: Tue, 23 Aug 2016 17:54:03 -0300
Message-ID: <87a8g39pg4.fsf@linux.vnet.ibm.com>
In-Reply-To: <87shu0hfye.fsf@linux.vnet.ibm.com> (Gabriel Krisman Bertazi's message of "Fri, 19 Aug 2016 13:38:17 -0300")

Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> writes:

>> Can you share what you ran to online/offline CPUs? I can't reproduce
>> this here.
>
> I was using the ppc64_cpu tool, which shouldn't do anything more than
> write to sysfs, but I just reproduced it with the script below.
>
> Note that this is ppc64le.  I don't have an x86 machine at hand to
> attempt to reproduce right now, but I'll look for one and see how it goes.

Hi,

Any luck reproducing it?  We were initially reproducing with a
proprietary stress test, but I tried a generated fio job file combined
with the SMT script I shared earlier, and I could reproduce the crash
consistently in less than 10 minutes of execution.  This was still on
ppc64le, though; I haven't been able to get my hands on NVMe on x86 yet.

Here are the job file I used and the smt.sh script, in case you want to
give them a try:

jobfile: http://krisman.be/k/nvmejob.fio
smt.sh:  http://krisman.be/k/smt.sh

Still, the trigger consistently seems to be a heavy IO load combined
with CPU addition/removal.

Let me share my progress from the last couple of days, in the hope that
it rings a bell for you.

First, I verified that when we hit the BUG_ON in nvme_queue_rq, the
request_queue's freeze_depth is 0, which points away from a fault in the
freeze/unfreeze mechanism.  If a request were escaping and going through
the block layer during a freeze, we'd see freeze_depth >= 1.  Before
that, I had also tried keeping the q_usage_counter in atomic mode, in
case of a bug in the percpu refcount.  No luck; the BUG_ON was still
hit.
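
For concreteness, the freeze_depth check is essentially the snippet
below, dropped into nvme_queue_rq() where the BUG_ON fires.  It's an
illustration rather than the exact diff I'm running: "req" and "hctx"
are the usual queue_rq arguments, mq_freeze_depth is the freeze counter
in struct request_queue, and the condition is just a stand-in for
whatever check catches the wrong-queue case.

	/*
	 * Illustration only: when the wrong-queue condition trips, also
	 * dump the queue's freeze depth.  A request sneaking through the
	 * block layer during a freeze would show up as >= 1 here; so far
	 * it is always 0.
	 */
	if (req->q != hctx->queue) {
		pr_err("nvme: rq %p seen on the wrong queue, freeze_depth=%d\n",
		       req, atomic_read(&req->q->mq_freeze_depth));
		BUG();
	}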

Also, I don't see anything special about the request that reaches the
BUG_ON.  It's a REQ_TYPE_FS request and, at least the last time I
reproduced it, it was a READ that came from the stress-test task through
submit_bio.  So nothing remarkable about it either, as far as I can see.

I'm still thinking about a case in which the mapping gets screwed up,
where a ctx would appear in two hctxs' bitmaps after a remap, or where a
ctx got remapped to another hctx.  I'm still learning my way through the
cpumap code, so I'm not sure it's a real possibility, but I'm not
convinced it isn't.  Some preliminary tests don't suggest this is the
case at play, but I want to spend a little more time on this theory
(maybe for lack of better ideas :)
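
To make that concrete, the kind of sanity check I have in mind looks
like the sketch below.  It's untested and written against my reading of
the blk-mq headers, so take the iterators and fields as assumptions; it
would also have to live under block/, since struct blk_mq_ctx is
private to it.

/* Untested sketch: verify that no software ctx is claimed by more than
 * one hardware context after a remap.  Every ctx is per-cpu, so its cpu
 * number identifies it. */
static void blk_mq_check_ctx_map(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	struct blk_mq_ctx *ctx;
	unsigned int i, j;
	cpumask_var_t seen;

	if (!zalloc_cpumask_var(&seen, GFP_KERNEL))
		return;

	queue_for_each_hw_ctx(q, hctx, i) {
		hctx_for_each_ctx(hctx, ctx, j) {
			WARN(cpumask_test_cpu(ctx->cpu, seen),
			     "ctx of cpu %u mapped to more than one hctx\n",
			     ctx->cpu);
			cpumask_set_cpu(ctx->cpu, seen);
		}
	}

	free_cpumask_var(seen);
}

The idea would be to call something like this right after the remap and
let it scream during the hotplug stress, which should either confirm or
kill the theory.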

On a side note, probably unrelated to this crash, it also got me
thinking about the current usefulness of blk_mq_hctx_notify.  Since the
CPU is dead, no more requests would be coming through its ctx.  I think
we could force a queue run in blk_mq_queue_reinit_notify, before
remapping, which would cause the hctx to fetch the remaining requests
from that dead ctx (since it's not unmapped yet).  This way, we could
maintain a single hotplug notification hook and simplify the hotplug
path.  I haven't written code for it yet, but I'll see if I can come up
with something and send it to the list.
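
Just to illustrate what I mean (again untested, and the list/field names
below come from my reading of the current blk-mq code, so treat them as
assumptions), it might boil down to a couple of lines early in
blk_mq_queue_reinit_notify(), on the CPU_DEAD path:

	/*
	 * Untested sketch: before the freeze and remap, run the hardware
	 * queues of every registered queue so that requests still parked
	 * in the dead CPU's ctx are dispatched while that ctx is still
	 * mapped to an hctx.
	 */
	list_for_each_entry(q, &all_q_list, all_q_node)
		blk_mq_run_hw_queues(q, false);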

-- 
Gabriel Krisman Bertazi


Thread overview: 16+ messages
2016-08-10  4:04 Oops when completing request on the wrong queue Gabriel Krisman Bertazi
2016-08-11 17:16 ` Keith Busch
2016-08-11 18:10   ` Gabriel Krisman Bertazi
2016-08-19 13:28 ` Gabriel Krisman Bertazi
2016-08-19 14:13   ` Jens Axboe
2016-08-19 15:51     ` Jens Axboe
2016-08-19 16:38       ` Gabriel Krisman Bertazi
2016-08-23 20:54         ` Gabriel Krisman Bertazi [this message]
2016-08-23 21:11           ` Jens Axboe
2016-08-23 21:14             ` Jens Axboe
2016-08-23 22:49               ` Keith Busch
2016-08-24 18:34               ` Jens Axboe
2016-08-24 20:36                 ` Jens Axboe
2016-08-29 18:06                   ` Gabriel Krisman Bertazi
2016-08-29 18:40                     ` Jens Axboe
2016-09-05 12:02                       ` Gabriel Krisman Bertazi
