From: Keith Busch <keith.busch@intel.com>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "hch@lst.de" <hch@lst.de>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: [RFC PATCH] blk-mq: Fix lost request during timeout
Date: Mon, 18 Sep 2017 18:39:00 -0400
Message-ID: <20170918223900.GA4671@localhost.localdomain>
In-Reply-To: <1505772477.12701.6.camel@wdc.com>

On Mon, Sep 18, 2017 at 10:07:58PM +0000, Bart Van Assche wrote:
> On Mon, 2017-09-18 at 18:03 -0400, Keith Busch wrote:
> > I think we've always known it's possible to lose a request during timeout
> > handling, but just accepted that possibility. It seems to be causing
> > problems, though, leading to unnecessary error escalation and IO failures.
> >
> > The possibility arises when the block layer marks the request complete
> > prior to running the timeout handler. If that request happens to complete
> > while the handler is running, the request will be lost, inevitably
> > triggering a second timeout.
> >
> > This patch attempts to shorten the window for this race condition by
> > clearing the started flag when the driver completes a request. The block
> > layer's timeout handler will then complete the command if it observes
> > the started flag is no longer set.
> >
> > Note it's possible to lose the command even with this patch. It's just
> > less likely to happen.
>
> Hello Keith,
>
> Are you sure the root cause of this race condition is in the blk-mq core?
> I've never observed such behavior in any of my numerous scsi-mq tests (which
> trigger timeouts). Are you sure the race you observed is not caused by a
> blk_mq_reinit_tagset() call, a function that is only used by the NVMe driver
> and not by any other block driver?

Hi Bart,

The nvme driver's use of blk_mq_reinit_tagset() only happens during
controller initialisation, but I'm seeing lost commands well after that
during normal and stable running.

The window is pretty narrow to hit, but I'm pretty sure this is what's
happening. For nvme, this occurs when nvme_timeout() runs concurrently
with nvme_handle_cqe() for the same struct request. For scsi-mq,
the same situation may arise if scsi_mq_done() runs concurrently with
scsi_times_out().

I don't really like the proposed "fix" though, since it only makes the
race less likely, but I didn't see a way to close it completely without
introducing locks. If someone knows of a better way, that would be awesome.

Thanks,
Keith
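
The interleaving described above can be sketched in userspace C. This is
only an illustration under stated assumptions, not kernel code and not the
actual patch: struct sim_request, its three fields, and every sim_*()
function are hypothetical stand-ins. The "complete" flag models the block
layer marking the request complete before the timeout handler runs, and
"started" models the started flag that the RFC clears from the driver's
completion path. The steps run sequentially so the ordering is
deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request; field names are assumptions. */
struct sim_request {
	atomic_bool complete;	/* "marked complete" claim bit */
	atomic_bool started;	/* "started" flag */
	bool ended;		/* true once the request was really completed */
};

static void sim_start(struct sim_request *rq)
{
	atomic_init(&rq->complete, false);
	atomic_init(&rq->started, true);
	rq->ended = false;
}

/* Timeout, step 1: claim the request before the driver's handler runs. */
static bool sim_timeout_begin(struct sim_request *rq)
{
	return !atomic_exchange(&rq->complete, true);
}

/* Driver completion (think nvme_handle_cqe() or scsi_mq_done()). */
static void sim_driver_complete(struct sim_request *rq, bool patched)
{
	if (patched)
		atomic_store(&rq->started, false);
	if (!atomic_exchange(&rq->complete, true))
		rq->ended = true;
	/* else: the timeout path holds the claim, so this completion is dropped */
}

/* Timeout, step 2: the handler asked for a timer reset. */
static void sim_timeout_end(struct sim_request *rq, bool patched)
{
	if (patched && !atomic_load(&rq->started)) {
		rq->ended = true;	/* driver finished meanwhile: complete here */
		return;
	}
	atomic_store(&rq->complete, false);	/* release claim, stay in flight */
}

/* Completion lands between the two timeout steps; returns rq.ended. */
static bool sim_race(bool patched)
{
	struct sim_request rq;
	bool claimed;

	sim_start(&rq);
	claimed = sim_timeout_begin(&rq);
	assert(claimed);	/* a fresh request has no other claimant */
	(void)claimed;
	sim_driver_complete(&rq, patched);
	sim_timeout_end(&rq, patched);
	return rq.ended;
}
```

With patched == false the completion is dropped and the request stays in
flight until a second timeout. With patched == true the timeout path notices
the cleared started flag and completes the request itself. In this sketch a
completion arriving after the started check in sim_timeout_end() but before
the claim is released would still be lost, which is why the change only
narrows the window.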