* [PATCH] blk-mq: Fix a race between resetting the timer and completion handling
@ 2018-02-01 22:49 Bart Van Assche
2018-02-05 21:06 ` Tejun Heo
0 siblings, 1 reply; 4+ messages in thread
From: Bart Van Assche @ 2018-02-01 22:49 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche, Tejun Heo
While injecting errors from the target side I noticed many kinds of
weird behavior at the initiator side, including the call stack shown
below and CPU soft lockup complaints. This patch seems to address all
these issues. This patch not only avoids that a completion can
succeed while the timer is being reset but also shrinks the size of
struct request. The performance impact of this patch should be
minimal: the only change that affects performance of the hot path is
the change of a preempt_disable() / preempt_enable() pair in
blk_mq_start_request() into a local_irq_disable() /
local_irq_enable() pair.
This patch fixes the following kernel crash:
BUG: unable to handle kernel NULL pointer dereference at (null)
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 151 Comm: kworker/2:1H Tainted: G W 4.15.0-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:scsi_times_out+0x17/0x2c0 [scsi_mod]
Call Trace:
blk_mq_terminate_expired+0x42/0x80
bt_iter+0x3d/0x50
blk_mq_queue_tag_busy_iter+0xe9/0x200
blk_mq_timeout_work+0x181/0x2e0
process_one_work+0x21c/0x6d0
worker_thread+0x35/0x380
kthread+0x117/0x130
ret_from_fork+0x24/0x30
Fixes: 1d9bd5161ba3 ("blk-mq: replace timeout synchronization with a RCU and generation based scheme")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
---
block/blk-core.c | 1 -
block/blk-mq.c | 29 +++++++++++++++--------------
include/linux/blkdev.h | 12 ++----------
3 files changed, 17 insertions(+), 25 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index d0d104268f1a..a08355e1dd2a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -127,7 +127,6 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
set_start_time_ns(rq);
rq->part = NULL;
seqcount_init(&rq->gstate_seq);
- u64_stats_init(&rq->aborted_gstate_sync);
}
EXPORT_SYMBOL(blk_rq_init);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index df93102e2149..372f0aeb693f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -592,9 +592,9 @@ static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
* while updating.
*/
local_irq_save(flags);
- u64_stats_update_begin(&rq->aborted_gstate_sync);
+ write_seqcount_begin(&rq->gstate_seq);
rq->aborted_gstate = gstate;
- u64_stats_update_end(&rq->aborted_gstate_sync);
+ write_seqcount_end(&rq->gstate_seq);
local_irq_restore(flags);
}
@@ -603,10 +603,13 @@ static u64 blk_mq_rq_aborted_gstate(struct request *rq)
unsigned int start;
u64 aborted_gstate;
- do {
- start = u64_stats_fetch_begin(&rq->aborted_gstate_sync);
+ while (true) {
+ start = read_seqcount_begin(&rq->gstate_seq);
aborted_gstate = rq->aborted_gstate;
- } while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start));
+ if (!read_seqcount_retry(&rq->gstate_seq, start))
+ break;
+ cond_resched();
+ }
return aborted_gstate;
}
@@ -679,14 +682,14 @@ void blk_mq_start_request(struct request *rq)
* @rq->deadline it reads together under @rq->gstate_seq is
* guaranteed to be the matching one.
*/
- preempt_disable();
+ local_irq_disable();
write_seqcount_begin(&rq->gstate_seq);
blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
blk_add_timer(rq);
write_seqcount_end(&rq->gstate_seq);
- preempt_enable();
+ local_irq_enable();
if (q->dma_drain_size && blk_rq_bytes(rq)) {
/*
@@ -831,13 +834,12 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
__blk_mq_complete_request(req);
break;
case BLK_EH_RESET_TIMER:
- /*
- * As nothing prevents from completion happening while
- * ->aborted_gstate is set, this may lead to ignored
- * completions and further spurious timeouts.
- */
- blk_mq_rq_update_aborted_gstate(req, 0);
+ local_irq_disable();
+ write_seqcount_begin(&req->gstate_seq);
blk_add_timer(req);
+ req->aborted_gstate = 0;
+ write_seqcount_end(&req->gstate_seq);
+ local_irq_enable();
break;
case BLK_EH_NOT_HANDLED:
break;
@@ -2074,7 +2076,6 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
}
seqcount_init(&rq->gstate_seq);
- u64_stats_init(&rq->aborted_gstate_sync);
return 0;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4f3df807cf8f..3dcebc046c2e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -231,19 +231,11 @@ struct request {
* generation number which is monotonically incremented and used to
* distinguish the reuse instances.
*
- * ->gstate_seq allows updates to ->gstate and other fields
- * (currently ->deadline) during request start to be read
- * atomically from the timeout path, so that it can operate on a
- * coherent set of information.
+ * ->gstate_seq serializes accesses of ->gstate, ->aborted_gstate and
+ * ->__deadline.
*/
seqcount_t gstate_seq;
u64 gstate;
-
- /*
- * ->aborted_gstate is used by the timeout to claim a specific
- * recycle instance of this request. See blk_mq_timeout_work().
- */
- struct u64_stats_sync aborted_gstate_sync;
u64 aborted_gstate;
/* access through blk_rq_set_deadline, blk_rq_deadline */
--
2.16.0
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] blk-mq: Fix a race between resetting the timer and completion handling
2018-02-01 22:49 [PATCH] blk-mq: Fix a race between resetting the timer and completion handling Bart Van Assche
@ 2018-02-05 21:06 ` Tejun Heo
2018-02-05 21:33 ` Bart Van Assche
0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2018-02-05 21:06 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Christoph Hellwig
Hello, Bart.
Thanks a lot for testing and fixing the issues but I'm a bit confused
by the patch. Maybe we can split patch a bit more? There seem to be
three things going on,
1. Changing preemption protection to irq protection in issue path.
2. Merge of aborted_gstate_sync and gstate_seq.
3. Updates to blk_mq_rq_timed_out().
Are all three changes necessary for stability?
> @@ -831,13 +834,12 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
> __blk_mq_complete_request(req);
> break;
> case BLK_EH_RESET_TIMER:
> - /*
> - * As nothing prevents from completion happening while
> - * ->aborted_gstate is set, this may lead to ignored
> - * completions and further spurious timeouts.
> - */
> - blk_mq_rq_update_aborted_gstate(req, 0);
> + local_irq_disable();
> + write_seqcount_begin(&req->gstate_seq);
> blk_add_timer(req);
> + req->aborted_gstate = 0;
> + write_seqcount_end(&req->gstate_seq);
> + local_irq_enable();
> break;
So, this is #3 and I'm not sure how adding gstate_seq protection gets
rid of the race condition mentioned in the comment. It's still the
same that nothing is protecting against racing w/ completion.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] blk-mq: Fix a race between resetting the timer and completion handling
2018-02-05 21:06 ` Tejun Heo
@ 2018-02-05 21:33 ` Bart Van Assche
2018-02-05 21:37 ` tj
0 siblings, 1 reply; 4+ messages in thread
From: Bart Van Assche @ 2018-02-05 21:33 UTC (permalink / raw)
To: tj@kernel.org; +Cc: hch@lst.de, linux-block@vger.kernel.org, axboe@kernel.dk
T24gTW9uLCAyMDE4LTAyLTA1IGF0IDEzOjA2IC0wODAwLCBUZWp1biBIZW8gd3JvdGU6DQo+IFRo
YW5rcyBhIGxvdCBmb3IgdGVzdGluZyBhbmQgZml4aW5nIHRoZSBpc3N1ZXMgYnV0IEknbSBhIGJp
dCBjb25mdXNlZA0KPiBieSB0aGUgcGF0Y2guICBNYXliZSB3ZSBjYW4gc3BsaXQgcGF0Y2ggYSBi
aXQgbW9yZT8gIFRoZXJlIHNlZW0gdG8gYmUNCj4gdGhyZWUgdGhpbmdzIGdvaW5nIG9uLA0KPiAN
Cj4gMS4gQ2hhbmdpbmcgcHJlZW1wdGlvbiBwcm90ZWN0aW9uIHRvIGlycSBwcm90ZWN0aW9uIGlu
IGlzc3VlIHBhdGguDQo+IA0KPiAyLiBNZXJnZSBvZiBhYm9ydGVkX2dzdGF0ZV9zeW5jIGFuZCBn
c3RhdGVfc2VxLg0KPiANCj4gMy4gVXBkYXRlcyB0byBibGtfbXFfcnFfdGltZWRfb3V0KCkuDQo+
IA0KPiBBcmUgYWxsIHRocmVlIGNoYW5nZXMgbmVjZXNzYXJ5IGZvciBzdGFiaWxpdHk/DQoNCkhl
bGxvIFRlanVuLA0KDQpNeSBnb2FsIHdpdGggdGhpcyBwYXRjaCBpcyB0byBmaXggdGhlIHJhY2Ug
YmV0d2VlbiByZXNldHRpbmcgdGhlIHRpbWVyIGFuZA0KdGhlIGNvbXBsZXRpb24gcGF0aC4gSGVu
Y2UgY2hhbmdlICgzKS4gQ2hhbmdlcyAoMSkgYW5kICgyKSBhcmUgbmVlZGVkIHRvDQptYWtlIHRo
ZSBjaGFuZ2VzIGluIGJsa19tcV9ycV90aW1lZF9vdXQoKSB3b3JrLg0KDQo+ID4gQEAgLTgzMSwx
MyArODM0LDEyIEBAIHN0YXRpYyB2b2lkIGJsa19tcV9ycV90aW1lZF9vdXQoc3RydWN0IHJlcXVl
c3QgKnJlcSwgYm9vbCByZXNlcnZlZCkNCj4gPiAgCQlfX2Jsa19tcV9jb21wbGV0ZV9yZXF1ZXN0
KHJlcSk7DQo+ID4gIAkJYnJlYWs7DQo+ID4gIAljYXNlIEJMS19FSF9SRVNFVF9USU1FUjoNCj4g
PiAtCQkvKg0KPiA+IC0JCSAqIEFzIG5vdGhpbmcgcHJldmVudHMgZnJvbSBjb21wbGV0aW9uIGhh
cHBlbmluZyB3aGlsZQ0KPiA+IC0JCSAqIC0+YWJvcnRlZF9nc3RhdGUgaXMgc2V0LCB0aGlzIG1h
eSBsZWFkIHRvIGlnbm9yZWQNCj4gPiAtCQkgKiBjb21wbGV0aW9ucyBhbmQgZnVydGhlciBzcHVy
aW91cyB0aW1lb3V0cy4NCj4gPiAtCQkgKi8NCj4gPiAtCQlibGtfbXFfcnFfdXBkYXRlX2Fib3J0
ZWRfZ3N0YXRlKHJlcSwgMCk7DQo+ID4gKwkJbG9jYWxfaXJxX2Rpc2FibGUoKTsNCj4gPiArCQl3
cml0ZV9zZXFjb3VudF9iZWdpbigmcmVxLT5nc3RhdGVfc2VxKTsNCj4gPiAgCQlibGtfYWRkX3Rp
bWVyKHJlcSk7DQo+ID4gKwkJcmVxLT5hYm9ydGVkX2dzdGF0ZSA9IDA7DQo+ID4gKwkJd3JpdGVf
c2VxY291bnRfZW5kKCZyZXEtPmdzdGF0ZV9zZXEpOw0KPiA+ICsJCWxvY2FsX2lycV9lbmFibGUo
KTsNCj4gPiAgCQlicmVhazsNCj4gDQo+IFNvLCB0aGlzIGlzICMzIGFuZCBJJ20gbm90IHN1cmUg
aG93IGFkZGluZyBnc3RhdGVfc2VxIHByb3RlY3Rpb24gZ2V0cw0KPiByaWQgb2YgdGhlIHJhY2Ug
Y29uZGl0aW9uIG1lbnRpb25lZCBpbiB0aGUgY29tbWVudC4gIEl0J3Mgc3RpbGwgdGhlDQo+IHNh
bWUgdGhhdCBub3RoaW5nIGlzIHByb3RlY3RpbmcgYWdhaW5zdCByYWNpbmcgdy8gY29tcGxldGlv
bi4NCg0KSSB0aGluayB5b3UgYXJlIHJpZ2h0LiBJIHdpbGwgc2VlIHdoZXRoZXIgSSBjYW4gcmV3
b3JrIHRoaXMgcGF0Y2ggdG8gYWRkcmVzcw0KdGhhdCByYWNlLg0KDQpUaGFua3MsDQoNCkJhcnQu
DQoNCg0K
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] blk-mq: Fix a race between resetting the timer and completion handling
2018-02-05 21:33 ` Bart Van Assche
@ 2018-02-05 21:37 ` tj
0 siblings, 0 replies; 4+ messages in thread
From: tj @ 2018-02-05 21:37 UTC (permalink / raw)
To: Bart Van Assche; +Cc: hch@lst.de, linux-block@vger.kernel.org, axboe@kernel.dk
Hello, Bart.
On Mon, Feb 05, 2018 at 09:33:03PM +0000, Bart Van Assche wrote:
> My goal with this patch is to fix the race between resetting the timer and
> the completion path. Hence change (3). Changes (1) and (2) are needed to
> make the changes in blk_mq_rq_timed_out() work.
Ah, I see. That makes sense. Can I ask you to elaborate the scenario
you were fixing?
> > > @@ -831,13 +834,12 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
> > > __blk_mq_complete_request(req);
> > > break;
> > > case BLK_EH_RESET_TIMER:
> > > - /*
> > > - * As nothing prevents from completion happening while
> > > - * ->aborted_gstate is set, this may lead to ignored
> > > - * completions and further spurious timeouts.
> > > - */
> > > - blk_mq_rq_update_aborted_gstate(req, 0);
> > > + local_irq_disable();
> > > + write_seqcount_begin(&req->gstate_seq);
> > > blk_add_timer(req);
> > > + req->aborted_gstate = 0;
> > > + write_seqcount_end(&req->gstate_seq);
> > > + local_irq_enable();
> > > break;
> >
> > So, this is #3 and I'm not sure how adding gstate_seq protection gets
> > rid of the race condition mentioned in the comment. It's still the
> > same that nothing is protecting against racing w/ completion.
>
> I think you are right. I will see whether I can rework this patch to address
> that race.
That race is harmless and has always been there tho. It only happens
when the actual completion coincides with timeout expiring, which is
very unlikely, and the only downside is that the completion gets lost
and the request will get timed out down the line. It'd of course be
better to close the race window but code simplicity likely is an
important trade-off factor here.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-02-05 21:37 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-01 22:49 [PATCH] blk-mq: Fix a race between resetting the timer and completion handling Bart Van Assche
2018-02-05 21:06 ` Tejun Heo
2018-02-05 21:33 ` Bart Van Assche
2018-02-05 21:37 ` tj
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox