From: "tj@kernel.org" <tj@kernel.org>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"israelr@mellanox.com" <israelr@mellanox.com>,
"sagi@grimberg.me" <sagi@grimberg.me>, "hch@lst.de" <hch@lst.de>,
"martin@lichtvoll.de" <martin@lichtvoll.de>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
"axboe@kernel.dk" <axboe@kernel.dk>,
"ming.lei@redhat.com" <ming.lei@redhat.com>,
"maxg@mellanox.com" <maxg@mellanox.com>
Subject: Re: [PATCH v4] blk-mq: Fix race conditions in request timeout handling
Date: Wed, 11 Apr 2018 07:16:43 -0700
Message-ID: <20180411141643.GF793541@devbig577.frc2.facebook.com>
In-Reply-To: <ac0d5904dbcd55e190df318a66a9d7d51c56f3ae.camel@wdc.com>
Hello, Bart.
On Wed, Apr 11, 2018 at 12:50:51PM +0000, Bart Van Assche wrote:
> Thank you for having shared this patch. It looks interesting to me. What I
> know about the blk-mq timeout handling is as follows:
> * Nobody who has reviewed the blk-mq timeout handling code with this patch
> applied has reported any shortcomings for that code.
> * However, several people have reported kernel crashes that disappear when
> the blk-mq timeout code is reworked. I'm referring to "nvme-rdma corrupts
> memory upon timeout"
> (http://lists.infradead.org/pipermail/linux-nvme/2018-February/015848.html)
> and also to a "RIP: scsi_times_out+0x17" crash during boot
> (https://bugzilla.kernel.org/show_bug.cgi?id=199077).
>
> So we have the choice between two approaches:
> (1) apply the patch from your previous e-mail and root-cause and fix the
> crashes referred to above.
> (2) apply a patch that makes the crashes reported against v4.16 disappear and
> remove the atomic instructions introduced by such a patch at a later time.
>
> Since crashes have been reported for kernel v4.16 I think we should follow
> approach (2). That will remove the time pressure from root-causing and fixing
> the crashes reported for the NVMeOF initiator and SCSI initiator drivers.
So, it really bothers me how blindly we're going about this. It isn't
an insurmountable emergency that forces us to adopt whatever solution
happens to have passed a couple of tests this minute. We can debug and
root-cause this properly and pick the right solution. We even have the
two most likely causes already analysed and patches proposed, one of
them months ago. If we wanna change the handover model, let's do that
because the new one is better, not because of vague fear.
Thanks.
--
tejun