From: Ming Lei <ming.lei@redhat.com>
To: chengming.zhou@linux.dev
Cc: axboe@kernel.dk, kch@nvidia.com, dhowells@redhat.com,
damien.lemoal@opensource.wdc.com, bvanassche@acm.org,
nj.shetty@samsung.com, kbusch@kernel.org,
zhouchengming@bytedance.com, akinobu.mita@gmail.com,
shinichiro.kawasaki@wdc.com, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] null_blk: fix poll request timeout handling
Date: Tue, 15 Aug 2023 16:31:02 +0800 [thread overview]
Message-ID: <ZNs3xtOzH4+blVCF@fedora> (raw)
In-Reply-To: <20230815060443.660263-1-chengming.zhou@linux.dev>
On Tue, Aug 15, 2023 at 02:04:42PM +0800, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> When doing io_uring benchmark on /dev/nullb0, it's easy to crash the
> kernel if poll requests timeout triggered, as reported by David. [1]
Just be curious, how is the timeout triggered when running
"./fio/t/io_uring -r20 /dev/nullb0"?
David mentioned that the issue is triggered in 6.5-rc1, maybe one
regression?
>
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> Workqueue: kblockd blk_mq_timeout_work
> RIP: 0010:null_timeout_rq+0x4e/0x91
> Call Trace:
> ? __die_body+0x1a/0x5c
> ? page_fault_oops+0x6f/0x9c
> ? kernelmode_fixup_or_oops+0xc6/0xd6
> ? __bad_area_nosemaphore+0x44/0x1eb
> ? exc_page_fault+0xe2/0xf4
> ? asm_exc_page_fault+0x22/0x30
> ? null_timeout_rq+0x4e/0x91
> blk_mq_handle_expired+0x31/0x4b
> bt_iter+0x68/0x84
> ? bt_tags_iter+0x81/0x81
> __sbitmap_for_each_set.constprop.0+0xb0/0xf2
> ? __blk_mq_complete_request_remote+0xf/0xf
> bt_for_each+0x46/0x64
> ? __blk_mq_complete_request_remote+0xf/0xf
> ? percpu_ref_get_many+0xc/0x2a
> blk_mq_queue_tag_busy_iter+0x14d/0x18e
> blk_mq_timeout_work+0x95/0x127
> process_one_work+0x185/0x263
> worker_thread+0x1b5/0x227
> ? rescuer_thread+0x287/0x287
> kthread+0xfa/0x102
> ? kthread_complete_and_exit+0x1b/0x1b
> ret_from_fork+0x22/0x30
>
> This is indeed a race problem between null_timeout_rq() and null_poll().
>
> null_poll() null_timeout_rq()
> spin_lock(&nq->poll_lock)
> list_splice_init(&nq->poll_list, &list)
> spin_unlock(&nq->poll_lock)
>
> while (!list_empty(&list))
> req = list_first_entry()
> list_del_init()
> ...
> blk_mq_add_to_batch()
> // req->rq_next = NULL
> spin_lock(&nq->poll_lock)
>
> // rq->queuelist->next == NULL
> list_del_init(&rq->queuelist)
>
> spin_unlock(&nq->poll_lock)
>
> What's worse is that we don't call blk_mq_complete_request_remote()
> before blk_mq_add_to_batch(), so these completed requests have wrong
> rq->state == MQ_RQ_IN_FLIGHT. We can easily check this using bpftrace:
>
> ```
> bpftrace -e 'kretfunc:null_blk:null_poll {
> $iob=(struct io_comp_batch *)args->iob;
> @[$iob->req_list->state]=count();
> }'
>
> @[1]: 51708
> ```
>
> Fix these problems by setting requests state to MQ_RQ_COMPLETE under
> nq->poll_lock protection, in which null_timeout_rq() can safely detect
> this race and early return.
>
> [1] https://lore.kernel.org/all/3893581.1691785261@warthog.procyon.org.uk/
>
> Fixes: 0a593fbbc245 ("null_blk: poll queue support")
> Reported-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
> drivers/block/null_blk/main.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 864013019d6b..968090935eb2 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1643,9 +1643,12 @@ static int null_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
> struct nullb_queue *nq = hctx->driver_data;
> LIST_HEAD(list);
> int nr = 0;
> + struct request *rq;
>
> spin_lock(&nq->poll_lock);
> list_splice_init(&nq->poll_list, &list);
> + list_for_each_entry(rq, &list, queuelist)
> + blk_mq_set_request_complete(rq);
> spin_unlock(&nq->poll_lock);
>
> while (!list_empty(&list)) {
> @@ -1671,16 +1674,21 @@ static enum blk_eh_timer_return null_timeout_rq(struct request *rq)
> struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
> struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
>
> - pr_info("rq %p timed out\n", rq);
> -
> if (hctx->type == HCTX_TYPE_POLL) {
> struct nullb_queue *nq = hctx->driver_data;
>
> spin_lock(&nq->poll_lock);
> + /* The request may have completed meanwhile. */
> + if (blk_mq_request_completed(rq)) {
> + spin_unlock(&nq->poll_lock);
> + return BLK_EH_DONE;
> + }
> list_del_init(&rq->queuelist);
> spin_unlock(&nq->poll_lock);
> }
I think null_process_cmd() is needed for un-completed request.
Thanks,
Ming
next prev parent reply other threads:[~2023-08-15 8:32 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-15 6:04 [PATCH] null_blk: fix poll request timeout handling chengming.zhou
2023-08-15 8:19 ` David Howells
2023-08-16 9:15 ` Chengming Zhou
2023-08-15 8:31 ` Ming Lei [this message]
2023-08-15 9:45 ` Chengming Zhou
2023-08-15 12:18 ` Ming Lei
2023-08-17 9:02 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZNs3xtOzH4+blVCF@fedora \
--to=ming.lei@redhat.com \
--cc=akinobu.mita@gmail.com \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=chengming.zhou@linux.dev \
--cc=damien.lemoal@opensource.wdc.com \
--cc=dhowells@redhat.com \
--cc=kbusch@kernel.org \
--cc=kch@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nj.shetty@samsung.com \
--cc=shinichiro.kawasaki@wdc.com \
--cc=zhouchengming@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.