Date: Tue, 17 Sep 2019 15:37:58 +0800
From: Ming Lei
To: Yufen Yu
Cc: axboe@kernel.dk, linux-block@vger.kernel.org, hch@infradead.org, keith.busch@intel.com
Subject: Re: [PATCH v2] block: fix null pointer dereference in blk_mq_rq_timed_out()
Message-ID: <20190917073757.GA15374@ming.t460p>
References: <20190917070312.711-1-yuyufen@huawei.com>
In-Reply-To: <20190917070312.711-1-yuyufen@huawei.com>
On Tue, Sep 17, 2019 at 03:03:12PM +0800, Yufen Yu wrote:
> We got a NULL pointer dereference in blk_mq_rq_timed_out()
> as follows:
>
> [ 108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 108.827059] PGD 0 P4D 0
> [ 108.827313] Oops: 0000 [#1] SMP PTI
> [ 108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
> [ 108.829503] Workqueue: kblockd blk_mq_timeout_work
> [ 108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
> [ 108.838191] Call Trace:
> [ 108.838406] bt_iter+0x74/0x80
> [ 108.838665] blk_mq_queue_tag_busy_iter+0x204/0x450
> [ 108.839074] ? __switch_to_asm+0x34/0x70
> [ 108.839405] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.839823] ? blk_mq_stop_hw_queue+0x40/0x40
> [ 108.840273] ? syscall_return_via_sysret+0xf/0x7f
> [ 108.840732] blk_mq_timeout_work+0x74/0x200
> [ 108.841151] process_one_work+0x297/0x680
> [ 108.841550] worker_thread+0x29c/0x6f0
> [ 108.841926] ? rescuer_thread+0x580/0x580
> [ 108.842344] kthread+0x16a/0x1a0
> [ 108.842666] ? kthread_flush_work+0x170/0x170
> [ 108.843100] ret_from_fork+0x35/0x40
>
> The bug is caused by a race between the timeout handler and the
> completion of a flush request.
>
> When the timeout handler blk_mq_rq_timed_out() tries to read
> 'req->q->mq_ops', 'req' has already completed and been reinitialized
> by the next flush request, which calls blk_rq_init() to zero 'req'.
>
> After commit 12f5b93145 ("blk-mq: Remove generation seqeunce"),
> the lifetime of normal requests is protected by a refcount. Only when
> 'rq->ref' drops to zero can the request really be freed. Thus, these
> requests cannot be reused before the timeout handler finishes.
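The lifetime rule described above (a request is freed only by whoever drops the last reference, so a request pinned by the timeout path cannot be recycled underneath it) can be modelled in user space. This is a minimal sketch using C11 atomics, not kernel code: `struct fake_rq`, `ref_dec_and_test()`, `ref_inc_not_zero()`, `put_rq()` and `replay_timeout_vs_completion()` are hypothetical helpers that only mimic the kernel's `refcount_t` operations, and the race is replayed sequentially rather than with real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request: only the lifetime matters here. */
struct fake_rq {
	atomic_int ref;   /* plays the role of rq->ref */
	bool freed;       /* set where __blk_mq_free_request() would run */
};

/* Like refcount_dec_and_test(): true only for the caller dropping the last ref. */
static bool ref_dec_and_test(atomic_int *ref)
{
	int old = atomic_fetch_sub(ref, 1);

	assert(old > 0);  /* an underflow would mean a use-after-free */
	return old == 1;
}

/* Like refcount_inc_not_zero(): pin the request unless it is already dead. */
static bool ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

static void put_rq(struct fake_rq *rq)
{
	if (ref_dec_and_test(&rq->ref))
		rq->freed = true;
}

/*
 * Sequential replay: the timeout worker pins the request, the driver
 * completes it, and only the worker's final put frees it. Returns true
 * if the request was still alive while the timeout handler used it.
 */
static bool replay_timeout_vs_completion(struct fake_rq *rq)
{
	bool alive_in_handler;

	atomic_init(&rq->ref, 1);   /* reference held by the in-flight request */
	rq->freed = false;

	if (!ref_inc_not_zero(&rq->ref))   /* timeout path pins rq */
		return false;

	put_rq(rq);                        /* normal completion drops its ref */
	alive_in_handler = !rq->freed;     /* handler may still read rq safely */

	put_rq(rq);                        /* timeout path's final put frees rq */
	return alive_in_handler;
}
```

Swapping the two `put_rq()` calls does not change the outcome: whichever caller drops the last reference frees the request, which is the guarantee the flush request was missing.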
>
> However, the flush request has its own .end_io, and rq->end_io() is
> still called even if 'rq->ref' doesn't drop to zero. After that,
> 'flush_rq' can be reused by the next flush request, resulting in the
> NULL pointer dereference.
>
> We fix this problem by covering the flush request with 'rq->ref'.
> If the refcount does not drop to zero, flush_end_io() returns early
> and waits for the last holder to call it again. To record the request
> status, we add a new field 'rq_status' to struct blk_flush_queue,
> which is used in flush_end_io().
>
> Cc: Ming Lei
> Cc: Christoph Hellwig
> Cc: Keith Busch
> Signed-off-by: Yufen Yu
> ---
>  block/blk-flush.c | 8 ++++++++
>  block/blk-mq.c    | 7 +++++--
>  block/blk.h       | 6 ++++++
>  3 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index aedd9320e605..f3ef6ce05c78 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -212,6 +212,14 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
>  	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
>  	struct blk_mq_hw_ctx *hctx;
>
> +	if (!refcount_dec_and_test(&flush_rq->ref)) {
> +		fq->rq_status = error;
> +		return;
> +	}
> +
> +	if (fq->rq_status != BLK_STS_OK)
> +		error = fq->rq_status;
> +
>  	/* release the tag's ownership to the req cloned from */
>  	spin_lock_irqsave(&fq->mq_flush_lock, flags);
>  	hctx = flush_rq->mq_hctx;
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0835f4d8d42e..3d2b2c2e9cdf 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -905,9 +905,12 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
>  	 */
>  	if (blk_mq_req_expired(rq, next))
>  		blk_mq_rq_timed_out(rq, reserved);
> -	if (refcount_dec_and_test(&rq->ref))
> -		__blk_mq_free_request(rq);
>
> +	if (is_flush_rq(rq, hctx)) {
> +		rq->end_io(rq, 0);
> +	} else if (refcount_dec_and_test(&rq->ref)) {
> +		__blk_mq_free_request(rq);
> +	}

The above two pairs of '{}' can be removed.
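The deferred-completion scheme in the flush_end_io() hunk above (a caller that does not drop the last reference only stashes its status in fq->rq_status; whoever drops the last reference runs the real completion with that status) can be sketched the same way. This is a hypothetical user-space model, not the kernel code: `struct fake_fq`, `struct fake_flush_rq`, `fake_flush_end_io()`, `replay_flush_timeout_race()` and the `FAKE_STS_*` codes are invented for illustration, and the sketch assumes the timeout worker holds one extra reference on the flush request when it calls `rq->end_io(rq, 0)`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical status codes standing in for blk_status_t / BLK_STS_*. */
enum fake_status { FAKE_STS_OK = 0, FAKE_STS_IOERR = 1 };

/* Hypothetical stand-ins for the flush request and its flush queue. */
struct fake_flush_rq {
	atomic_int ref;
	int completions;                /* how often the real end_io body ran */
	enum fake_status final_status;  /* status that body completed with */
};

struct fake_fq {
	enum fake_status rq_status;     /* mirrors the new fq->rq_status field */
	struct fake_flush_rq flush_rq;
};

/* Modelled on the patched flush_end_io(): only the last holder finishes. */
static void fake_flush_end_io(struct fake_fq *fq, enum fake_status error)
{
	struct fake_flush_rq *rq = &fq->flush_rq;

	if (atomic_fetch_sub(&rq->ref, 1) != 1) {
		/* Someone else still holds rq: remember the status and bail. */
		fq->rq_status = error;
		return;
	}

	if (fq->rq_status != FAKE_STS_OK)
		error = fq->rq_status;

	/* The real completion work (tag release, requeueing, ...) goes here. */
	rq->completions++;
	rq->final_status = error;
}

/*
 * Sequential replay: the timeout worker has pinned the flush request,
 * the driver completes it with an error, and the worker then calls
 * end_io with status 0 instead of freeing the request.
 */
static void replay_flush_timeout_race(struct fake_fq *fq)
{
	fq->rq_status = FAKE_STS_OK;
	atomic_init(&fq->flush_rq.ref, 2);    /* in-flight I/O + timeout worker */
	fq->flush_rq.completions = 0;
	fq->flush_rq.final_status = FAKE_STS_OK;

	fake_flush_end_io(fq, FAKE_STS_IOERR);  /* driver completion: deferred */
	assert(fq->flush_rq.completions == 0);  /* rq not handed on yet */

	fake_flush_end_io(fq, FAKE_STS_OK);     /* timeout path: last holder */
}
```

Because only the last holder runs the completion body, the flush request cannot be handed to the next flush while the timeout handler is still using it, and the driver's error is not lost when the timeout path finishes with status 0.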
>  	return true;
>  }
>
> diff --git a/block/blk.h b/block/blk.h
> index de6b2e146d6e..128bb53622ff 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -30,6 +30,7 @@ struct blk_flush_queue {
>  	 */
>  	struct request *orig_rq;
>  	spinlock_t mq_flush_lock;
> +	blk_status_t rq_status;
>  };
>
>  extern struct kmem_cache *blk_requestq_cachep;
> @@ -47,6 +48,11 @@ static inline void __blk_get_queue(struct request_queue *q)
>  	kobject_get(&q->kobj);
>  }
>
> +static inline bool
> +is_flush_rq(struct request *req, struct blk_mq_hw_ctx *hctx) {
> +	return hctx->fq->flush_rq == req;
> +}

We usually don't put '{' at the end of the function name line.

Once the above patch style comments are addressed, feel free to add:

Reviewed-by: Ming Lei

thanks,
Ming