linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chengming Zhou <chengming.zhou@linux.dev>
To: Friedrich Weber <f.weber@proxmox.com>,
	axboe@kernel.dk, ming.lei@redhat.com, hch@lst.de,
	bvanassche@acm.org
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	zhouchengming@bytedance.com
Subject: Re: [PATCH v4 4/4] blk-flush: reuse rq queuelist in flush state machine
Date: Tue, 28 May 2024 08:12:07 +0800	[thread overview]
Message-ID: <87f495c2-7504-4d22-b355-608b13c456cd@linux.dev> (raw)
In-Reply-To: <8b1400e6-b35e-486b-8ea0-de76270267c0@linux.dev>

On 2024/5/28 07:50, Chengming Zhou wrote:
> On 2024/5/28 07:34, Chengming Zhou wrote:
>> On 2024/5/28 00:04, Friedrich Weber wrote:
>>> Hi Chengming,
>>>
>>> Thank you for taking a look at this!
>>>
>>> On 27/05/2024 07:09, Chengming Zhou wrote:
>>>>> I've used this reproducer for a bisect, which produced
>>>>>
>>>>>  81ada09cc25e (blk-flush: reuse rq queuelist in flush state machine)
>>>>>
>>>>> as the first commit with which I can reproduce the crashes. I'm not 100%
>>>>> sure it is this one because the reproducer is a bit flaky. But it does
>>>>> sound plausible, as the commit is included in our 6.8 kernel, and
>>>>> touches `queuelist` which is AFAICT where blk_flush_complete_seq
>>>>> dereferences the NULL pointer.
>>>>
>>>> Ok, it will be better that I can reproduce it locally, will try later.
>>>
>>> Interestingly, so far I haven't been able to reproduce the crash when
>>> generating IO on the host itself, I only got crashes when generating IO
>>> in a QEMU VM.
>>>
>>> The reproducer in more detail:
>>
>> Thanks for these details, I will try to setup and reproduce when I back to work.
>>
>>>
>>> - Compile Linux 6.9 with CONFIG_FAULT_INJECTION,
>> [...]
>>>>
>>>> BUG shows it panic on 0000000000000008, not sure what it's accessing then,
>>>> does it means rq->queuelist.next == 0 or something? Could you use add2line
>>>> to show the exact source code line that panic? I use blk_flush_complete_seq+0x296/0x2e0
>>>> and get block/blk-flush.c:190, which is "fq->flush_data_in_flight++;",
>>>> obviously fq can't be NULL. (I'm using the v6.9 kernel)
>>>
>>> Sorry for the confusion, the crash dump was from a kernel compiled at
>>> 81ada09cc25e -- with 6.9, the offset seems to be different. See [2] for
>>> a kernel 6.9 crash dump.
>>>
>>> I don't know too much about kernel debugging, but I tried to get
>>> something useful out of addr2line:
>>>
>>> # addr2line -f -e /usr/lib/debug/vmlinux-6.9.0-debug2
>>> blk_flush_complete_seq+0x291/0x2d0
>>> __list_del
>>> /[...]./include/linux/list.h:195
>>>
>>> I tried to find the relevant portions in `objdump -SD blk-flush.o`, see
>>> [3]. If I'm not mistaken, blk_flush_complete_seq+0x291 should point to
>>>
>>> 351:   48 89 4f 08             mov    %rcx,0x8(%rdi)
>>>
>>> To me this looks like part of
>>>
>>> 	list_move_tail(&rq->queuelist, pending);
>>>
>>> What do you think?
>>
>> Yeah, it seems correct, so the rq->queuelist.next == NULL. It can't be NULL
>> if went through REQ_FSEQ_POSTFLUSH, so it must be REQ_FSEQ_PREFLUSH. It means
>> we allocated a request but its queuelist is not initialized or corrupted?
>>
>> Anyway, I will use below changes for debugging when reproduce, and you could
>> also try this to see if we could get something useful. :)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 3b4df8e5ac9e..6e3a6cd7739d 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2989,6 +2989,8 @@ void blk_mq_submit_bio(struct bio *bio)
>>                 blk_mq_use_cached_rq(rq, plug, bio);
>>         }
>>
>> +       BUG_ON(rq->queuelist.next == NULL);
>> +
>>         trace_block_getrq(bio);
>>
>>         rq_qos_track(q, rq, bio);
>> @@ -3006,6 +3008,8 @@ void blk_mq_submit_bio(struct bio *bio)
>>         if (bio_zone_write_plugging(bio))
>>                 blk_zone_write_plug_init_request(rq);
>>
>> +       BUG_ON(rq->queuelist.next == NULL);
>> +
>>         if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
>>                 return;
>>
> 
> Ah, I forgot to change to your kernel version, then should be:
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d98654869615..908fdfb62132 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2963,6 +2963,8 @@ void blk_mq_submit_bio(struct bio *bio)
>                         return;
>         }
> 
> +       BUG_ON(rq->queuelist.next == NULL);
> +
>         trace_block_getrq(bio);
> 
>         rq_qos_track(q, rq, bio);
> @@ -2977,6 +2979,8 @@ void blk_mq_submit_bio(struct bio *bio)
>                 return;
>         }
> 
> +       BUG_ON(rq->queuelist.next == NULL);
> +
>         if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
>                 return;
> 

Another possibility is that drivers may change rq->queuelist even after
rq->end_io(). So add two more BUG_ON() to detect this:

diff --git a/block/blk-flush.c b/block/blk-flush.c
index e73dc22d05c1..0eb684a468e5 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -179,7 +179,10 @@ static void blk_flush_complete_seq(struct request *rq,

        switch (seq) {
        case REQ_FSEQ_PREFLUSH:
+               BUG_ON(rq->queuelist.next == NULL);
+               fallthrough;
        case REQ_FSEQ_POSTFLUSH:
+               BUG_ON(rq->queuelist.next == NULL);
                /* queue for flush */
                if (list_empty(pending))
                        fq->flush_pending_since = jiffies;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d98654869615..908fdfb62132 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2963,6 +2963,8 @@ void blk_mq_submit_bio(struct bio *bio)
                        return;
        }

+       BUG_ON(rq->queuelist.next == NULL);
+
        trace_block_getrq(bio);

        rq_qos_track(q, rq, bio);
@@ -2977,6 +2979,8 @@ void blk_mq_submit_bio(struct bio *bio)
                return;
        }

+       BUG_ON(rq->queuelist.next == NULL);
+
        if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
                return;

  reply	other threads:[~2024-05-28  0:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-17  4:00 [PATCH v4 0/4] blk-mq: optimize flush and request size chengming.zhou
2023-07-17  4:00 ` [PATCH v4 1/4] blk-mq: use percpu csd to remote complete instead of per-rq csd chengming.zhou
2023-07-17  4:00 ` [PATCH v4 2/4] blk-flush: fix rq->flush.seq for post-flush requests chengming.zhou
2023-07-17  4:00 ` [PATCH v4 3/4] blk-flush: count inflight flush_data requests chengming.zhou
2023-07-17  4:00 ` [PATCH v4 4/4] blk-flush: reuse rq queuelist in flush state machine chengming.zhou
2024-05-24 16:07   ` Friedrich Weber
2024-05-27  5:09     ` Chengming Zhou
2024-05-27 16:04       ` Friedrich Weber
2024-05-27 23:34         ` Chengming Zhou
2024-05-27 23:50           ` Chengming Zhou
2024-05-28  0:12             ` Chengming Zhou [this message]
2024-05-28  8:42               ` Friedrich Weber
2024-05-28  9:09                 ` Chengming Zhou
2024-05-28 14:40                   ` Friedrich Weber
2024-05-29  8:50                     ` Chengming Zhou
2024-05-31  6:17                       ` Christoph Hellwig
2024-05-31  8:16                         ` Chengming Zhou
2023-07-17 14:18 ` [PATCH v4 0/4] blk-mq: optimize flush and request size Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87f495c2-7504-4d22-b355-608b13c456cd@linux.dev \
    --to=chengming.zhou@linux.dev \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=f.weber@proxmox.com \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=zhouchengming@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).