public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Luis Chamberlain <mcgrof@kernel.org>
To: Davidlohr Bueso <dave@stgolabs.net>,
	Klaus Jensen <its@irrelevant.dk>,
	"Paul E. McKenney" <paulmck@kernel.org>
Cc: shinichiro.kawasaki@wdc.com, Klaus Jensen <its@irrelevant.dk>,
	linux-block@vger.kernel.org, Pankaj Raghav <pankydev8@gmail.com>,
	Pankaj Raghav <p.raghav@samsung.com>,
	Adam Manzanares <a.manzanares@samsung.com>
Subject: Re: blktests with zbd/006 ZNS triggers a possible false positive RCU stall
Date: Fri, 15 Apr 2022 10:33:00 -0700	[thread overview]
Message-ID: <YlmsTCBLg6hB2eNn@bombadil.infradead.org> (raw)
In-Reply-To: <20220415010945.wvyztmss7rfqnlog@offworld>

On Thu, Apr 14, 2022 at 06:09:45PM -0700, Davidlohr Bueso wrote:
> No idea, however, why this would happen when using qemu as opposed to
> nbd.

Sorry I keep saying nbd often when I mean null_blk, but yeah, same concept,
in that I meant an nvme controller is involved *with* the RCU splat.

The qemu nvme driver and qemu controller would be one code run time surface
to conside here, the two consecutive nvme controller IO timeouts seem
to be telling of a backlog of some sort:

[493305.769531] run blktests zbd/006 at 2022-04-14 20:03:55
[493336.979482] nvme nvme9: I/O 192 QID 5 timeout, aborting
[493336.981666] nvme nvme9: Abort status: 0x0
[493367.699440] nvme nvme9: I/O 192 QID 5 timeout, reset controller
<-- rcu possible stall and NMIs-->
[493426.706021] nvme nvme9: 8/0/0 default/read/poll queues

So that seems to be pattern here. First puzzle is *why* we end up with
the IO timeout on the qemu nvme controller. And for this reason I CC'd
Klaus, in case he might have any ideas.

The first "timeout, aborting" happens when we have a first timeout of
a request for IO tag 192 sent to the controller and it just times out.
This seems to happen about 30 seconds after the test zbd/006 starts.

The "reset controller" will happen if we already issued the IO command
with tag 192 before and had already timed out before and hadn't been returned
to the driver. Resetting the controller is the stop gap measure.

The nvme_timeout() is just shared blk_mq_ops between nvme_mq_admin_ops
and nvme_mq_ops.

Inspecting the logs of running test zbd/005 and zbd/006 in a loop I can
see that not all nvme resets of the controller end up in the RCU stall splat
though.

The timeout of an IO happens about after 30 seconds the test starts, then
the reset 30 seconds after that first timeout, then the RCU stall about
30 seconds after. For instance follow these now mapped times:

Apr 13 23:02:33 linux517-blktests-zbd unknown: run blktests zbd/006 at 2022-04-13 23:02:33
30 seconds go by ...
Apr 13 23:03:05 linux517-blktests-zbd kernel: nvme nvme9: I/O 651 QID 3 timeout, aborting
Apr 13 23:03:05 linux517-blktests-zbd kernel: nvme nvme9: Abort status: 0x0
30 seconds go by ...
Apr 13 23:03:36 linux517-blktests-zbd kernel: nvme nvme9: I/O 651 QID 3 timeout, reset controller
30 seconds go by ...
Apr 13 23:04:04 linux517-blktests-zbd kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:

So the reset happens and yet we stay put. The reset work for the PCI
nvme controller is on nvme_reset_work() drivers/nvme/host/pci.c and
I don't see that issue'ing stuff back up the admin queue, however,
the first IO timeout did send an admin abort, on the same file:

drivers/nvme/host/pci.c
static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
{
	...
	dev_warn(nvmeq->dev->ctrl.device,                                       
                "I/O %d QID %d timeout, aborting\n",                            
                 req->tag, nvmeq->qid);                                         
                                                                                
        abort_req = blk_mq_alloc_request(dev->ctrl.admin_q, nvme_req_op(&cmd),  
                                         BLK_MQ_REQ_NOWAIT);                    
        if (IS_ERR(abort_req)) {                                                
                atomic_inc(&dev->ctrl.abort_limit);                             
                return BLK_EH_RESET_TIMER;                                      
        }                                                                       
        nvme_init_request(abort_req, &cmd);                                     
                                                                                
        abort_req->end_io_data = NULL;                                          
        blk_execute_rq_nowait(abort_req, false, abort_endio);  
	...
}

So this is all seems by design. What's puzzling is why it takes
about 30 seconds to get to the possible RCU stall splat *after* we
reset the nvme controller. It make me wonder if we left something
lingering in a black hole somewhere after our first reset. Is it
right to gather from the RCU splat that whatever it was, it was stuck
in a long spin lock?

  Luis

  parent reply	other threads:[~2022-04-15 17:33 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-14 22:02 blktests with zbd/006 ZNS triggers a possible false positive RCU stall Luis Chamberlain
2022-04-15  1:09 ` Davidlohr Bueso
2022-04-15  3:54   ` Paul E. McKenney
2022-04-15  4:30     ` Davidlohr Bueso
2022-04-15 17:35       ` Luis Chamberlain
2022-04-15 17:33   ` Luis Chamberlain [this message]
2022-04-15 17:42     ` Paul E. McKenney
2022-04-20  5:54 ` Shinichiro Kawasaki
2022-04-21 18:00   ` Luis Chamberlain
2022-04-27  5:08     ` Shinichiro Kawasaki
2022-04-27  5:42       ` Luis Chamberlain
2022-04-27  7:41       ` Klaus Jensen
2022-04-27  8:39         ` Damien Le Moal
2022-04-27  8:55           ` Klaus Jensen
2022-04-27  8:53         ` Shinichiro Kawasaki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YlmsTCBLg6hB2eNn@bombadil.infradead.org \
    --to=mcgrof@kernel.org \
    --cc=a.manzanares@samsung.com \
    --cc=dave@stgolabs.net \
    --cc=its@irrelevant.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=pankydev8@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=shinichiro.kawasaki@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox