From: Keith Busch <keith.busch@intel.com>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "hch@lst.de" <hch@lst.de>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: [RFC PATCH] blk-mq: Fix lost request during timeout
Date: Mon, 18 Sep 2017 21:55:33 -0400 [thread overview]
Message-ID: <20170919015533.GC4671@localhost.localdomain> (raw)
In-Reply-To: <1505776477.12701.28.camel@wdc.com>
On Mon, Sep 18, 2017 at 11:14:38PM +0000, Bart Van Assche wrote:
> On Mon, 2017-09-18 at 19:08 -0400, Keith Busch wrote:
> > On Mon, Sep 18, 2017 at 10:53:12PM +0000, Bart Van Assche wrote:
> > > Are you sure that scenario can happen? The blk-mq core calls test_and_set_bit()
> > > for the REQ_ATOM_COMPLETE flag before any completion or timeout handler is
> > > called. See also blk_mark_rq_complete(). This avoids that the .complete() and
> > > .timeout() functions run concurrently.
> >
> > Indeed that prevents .complete from running concurrently with the
> > timeout handler, but scsi_mq_done and nvme_handle_cqe are not .complete
> > callbacks. These are the LLD functions that call blk_mq_complete_request
> > well before .complete. If the driver calls blk_mq_complete_request on
> > a request that blk-mq is timing out, the request is lost because blk-mq
> > already called blk_mark_rq_complete. Nothing prevents these LLD functions
> > from running at the same time as the timeout handler.
>
> Can you explain how you define "request is lost"?
Sure, what I mean by "lost" is when nothing will call
__blk_mq_complete_request, which is required to make forward progress on
that request.
If a driver calls blk_mq_complete_request on a request being checked by
by the timeout handler, blk-mq will return immediately instead of
making progress toward completion since blk-mq already set
REQ_ATOM_COMPLETE while running the timeout hanlder.
> If a timeout occurs for a
> SCSI request then scsi_times_out() calls scsi_abort_command() (if no
> .eh_timed_out() callback has been defined by the LLD). It is the responsibility
> of the SCSI LLD to call .scsi_done() before its .eh_abort_handler() returns
> SUCCESS. If .eh_abort_handler() returns a value other than SUCCESS then the
> SCSI core will escalate the error further until .scsi_done() has been called for
> the command that timed out. See also scsi_abort_eh_cmnd(). So I think what you
> wrote is not correct for the SCSI core and a properly implemented SCSI LLD.
Once blk-mq's blk_mq_check_expired calls blk_mark_rq_complete, an
LLD calling blk_mq_complete_request does absolutely nothing because
REQ_ATOM_COMPLETE is already set.
There's nothing stopping scsi_mq_done from running at the same time as
blk-mq's timeout handler.
It doesn't matter what path .scsi_done is called now. That will just
call scsi_mq_done -> blk_mq_complete_request, and since
REQ_ATOM_COMPLETE is already set, the command won't be completed.
The only way to complete that request now is if the timeout
handler returns BLK_EH_HANDLED, but the scsi-mq abort path returns
BLK_EH_NOT_HANDLED on success (very few drivers actually return
BLK_EH_HANDLED).
After the timeout handler runs, it's once again possible for the driver to
complete the request if it returned BLK_EH_RESET_TIMER, but if the driver
already completed the command during the timeout handler, how is the
driver supposed to know it needs to complete the request a second time?
next prev parent reply other threads:[~2017-09-19 1:55 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-18 22:03 [RFC PATCH] blk-mq: Fix lost request during timeout Keith Busch
2017-09-18 22:07 ` Bart Van Assche
2017-09-18 22:39 ` Keith Busch
2017-09-18 22:53 ` Bart Van Assche
2017-09-18 23:08 ` Keith Busch
2017-09-18 23:14 ` Bart Van Assche
2017-09-19 1:55 ` Keith Busch [this message]
2017-09-19 15:05 ` Bart Van Assche
2017-09-19 4:16 ` Ming Lei
2017-09-19 15:07 ` Keith Busch
2017-09-19 15:18 ` Bart Van Assche
2017-09-19 16:39 ` Keith Busch
2017-09-19 15:22 ` Ming Lei
2017-09-19 16:00 ` Keith Busch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170919015533.GC4671@localhost.localdomain \
--to=keith.busch@intel.com \
--cc=Bart.VanAssche@wdc.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox