From: Chengming Zhou <chengming.zhou@linux.dev>
To: Friedrich Weber <f.weber@proxmox.com>,
axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de,
bvanassche@acm.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
zhouchengming@bytedance.com
Subject: Re: [PATCH v4 4/4] blk-flush: reuse rq queuelist in flush state machine
Date: Wed, 29 May 2024 16:50:02 +0800 [thread overview]
Message-ID: <b73d3891-9a52-4f0c-b154-5a6d6117c697@linux.dev> (raw)
In-Reply-To: <acc28f2c-0e72-409d-bb61-791ef62ddfd4@proxmox.com>
On 2024/5/28 22:40, Friedrich Weber wrote:
> On 28/05/2024 11:09, Chengming Zhou wrote:
>> On 2024/5/28 16:42, Friedrich Weber wrote:
>>> Hope I did this correctly. With this, the reproducer triggered a BUG
>>> pretty quickly, see [0]. If I can provide anything else, just let me know.
>>
>> Thanks for your patience, it's correct. Then how about this fix?
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index d98654869615..b2ec5c4c738f 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -485,6 +485,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
>> if (data->nr_tags > 1) {
>> rq = __blk_mq_alloc_requests_batch(data);
>> if (rq) {
>> + INIT_LIST_HEAD(&rq->queuelist);
>> blk_mq_rq_time_init(rq, alloc_time_ns);
>> return rq;
>> }
>>
>
> Nice, seems like with this patch applied on top of 6.9, the reproducer
> does not trigger crashes anymore for me! Thanks!
Good news. Thanks.
>
> To verify that the reproducer hits the new INIT_LIST_HEAD, I added debug
> prints before/after:
[...]
> And indeed, I see quite some printouts where rq->queuelist.next differs
> before/after the request, e.g.
>
> May 28 16:31:25 reproflushfull kernel: before init: list:
> 000000001e0a144f next: 00000000aaa2e372 prev: 000000001e0a144f
> May 28 16:31:25 reproflushfull kernel: after init: list:
> 000000001e0a144f next: 000000001e0a144f prev: 000000001e0a144f
> May 28 16:31:26 reproflushfull kernel: before init: list:
> 000000001e0a144f next: 00000000aaa2e372 prev: 000000001e0a144f
> May 28 16:31:26 reproflushfull kernel: after init: list:
> 000000001e0a144f next: 000000001e0a144f prev: 000000001e0a144f
>
> I know very little about the block layer, but if I understand the commit
> message of the original 81ada09cc25e correctly, it's expected that
> queuelist needs to be reinitialized at some places. I'm just a little
Yes, because we use list_move_tail() in the flush sequences. Maybe we can
just use list_add_tail() so we don't need the queuelist initialized. It
should be ok since rq can't be on any list when PREFLUSH or POSTFLUSH,
so there isn't any move actually.
But now I'm concerned that rq->queuelist maybe changed by driver after
request end? Don't know if this is a reasonable behavior, or I'm afraid
that the safest way is to revert the last patch. Want to know what Jens,
Ming and Christoph think?
> confused to see the same pointer 00000000aaa2e372 in two subsequent
> "before init" printouts for the same queuelist 000000001e0a144f. Is this
> expected too?
Not sure, but it seems possible. This is a rq_list in the plug cache,
000000001e0a144f is a PREFLUSH request, it will be freed after request end.
Then next time we again allocate it and put it in the plug cache,
just before 00000000aaa2e372 again. The reason why block doesn't use
00000000aaa2e372 maybe it's from a different queue or hardware queue.
But these are just my guess.
>
> Also, just out of interest: Can you estimate whether this issue is
> specific to software RAID setups, or could similar NULL pointer
> dereferences also happen in setups without software RAID?
I think it can also happen without software RAID.
Thanks.
next prev parent reply other threads:[~2024-05-29 8:50 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-17 4:00 [PATCH v4 0/4] blk-mq: optimize flush and request size chengming.zhou
2023-07-17 4:00 ` [PATCH v4 1/4] blk-mq: use percpu csd to remote complete instead of per-rq csd chengming.zhou
2023-07-17 4:00 ` [PATCH v4 2/4] blk-flush: fix rq->flush.seq for post-flush requests chengming.zhou
2023-07-17 4:00 ` [PATCH v4 3/4] blk-flush: count inflight flush_data requests chengming.zhou
2023-07-17 4:00 ` [PATCH v4 4/4] blk-flush: reuse rq queuelist in flush state machine chengming.zhou
2024-05-24 16:07 ` Friedrich Weber
2024-05-27 5:09 ` Chengming Zhou
2024-05-27 16:04 ` Friedrich Weber
2024-05-27 23:34 ` Chengming Zhou
2024-05-27 23:50 ` Chengming Zhou
2024-05-28 0:12 ` Chengming Zhou
2024-05-28 8:42 ` Friedrich Weber
2024-05-28 9:09 ` Chengming Zhou
2024-05-28 14:40 ` Friedrich Weber
2024-05-29 8:50 ` Chengming Zhou [this message]
2024-05-31 6:17 ` Christoph Hellwig
2024-05-31 8:16 ` Chengming Zhou
2023-07-17 14:18 ` [PATCH v4 0/4] blk-mq: optimize flush and request size Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b73d3891-9a52-4f0c-b154-5a6d6117c697@linux.dev \
--to=chengming.zhou@linux.dev \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=f.weber@proxmox.com \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=zhouchengming@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox