linux-block.vger.kernel.org archive mirror
From: Sagi Grimberg <sagi@grimberg.me>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Yi Zhang <yi.zhang@redhat.com>,
	Damien Le Moal <dlemoal@kernel.org>
Cc: Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	linux-block <linux-block@vger.kernel.org>,
	"open list:NVM EXPRESS DRIVER" <linux-nvme@lists.infradead.org>
Subject: Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
Date: Fri, 3 May 2024 14:01:52 +0300
Message-ID: <1ceb71ce-c4fb-419a-8800-8ebbbe1706fe@grimberg.me>
In-Reply-To: <76c17ab2-b3a2-491c-a6b3-7bd39d6d5229@wdc.com>



On 03/05/2024 13:32, Johannes Thumshirn wrote:
> On 03.05.24 09:59, Sagi Grimberg wrote:
>>
>> On 4/30/24 17:17, Yi Zhang wrote:
>>> On Tue, Apr 30, 2024 at 2:17 PM Johannes Thumshirn
>>> <Johannes.Thumshirn@wdc.com> wrote:
>>>> On 30.04.24 00:18, Chaitanya Kulkarni wrote:
>>>>> On 4/29/24 07:35, Johannes Thumshirn wrote:
>>>>>> On 23.04.24 15:18, Yi Zhang wrote:
>>>>>>> Hi
>>>>>>> I found this issue on the latest linux-block/for-next by blktests
>>>>>>> nvme/tcp nvme/012, please help check it and let me know if you need
>>>>>>> any info/testing for it, thanks.
>>>>>>>
>>>>>>> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47
>>>>>>> [ 1873.761900] loop0: detected capacity change from 0 to 2097152
>>>>>>> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>>>>>> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>>>>>>> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem
>>>>>>> blktests-subsystem-1 for NQN
>>>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>>>>>>> [ 1874.243423] nvme nvme0: creating 48 I/O queues.
>>>>>>> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues.
>>>>>>> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr
>>>>>>> 127.0.0.1:4420, hostnqn:
>>>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
>>>>> [...]
>>>>>
>>>>>>> [  326.827260] run blktests nvme/012 at 2024-04-29 16:28:31
>>>>>>> [  327.475957] loop0: detected capacity change from 0 to 2097152
>>>>>>> [  327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>>>>>> [  327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
>>>>>>> [  327.872343] nvmet: creating nvm controller 1 for subsystem
>>>>>>> blktests-subsystem-1 for NQN
>>>>>>> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
>>>>>>> [  327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full
>>>>>>> support of multi-port devices.
>>>>> It seems like you don't have multipath enabled; that is one
>>>>> difference I can see between the above log posted by Yi and your log.
>>>> Yup, but even with multipath enabled I can't get the bug to trigger :(
>>> It's not a 100% reproducible issue; I tried on another server of
>>> mine and it cannot be reproduced there.
>> Looking at the trace, I think I can see the issue here. In the test
>> case, nvme-mpath fails the request upon submission because the queue
>> is not live, and because it is an mpath request, it is failed over
>> using nvme_failover_request(), which steals the bios from the request
>> to its private requeue list.
>>
>> The bisected patch introduces a req->bio dereference on a flush
>> request that has no bios (they were stolen by the failover sequence).
>> Whether it reproduces seems to depend on where in the flush sequence
>> the request completion is called.
>>
>> I am unsure whether simply making the dereference conditional is the
>> correct fix or not... Damien?
>> --
>> diff --git a/block/blk-flush.c b/block/blk-flush.c
>> index 2f58ae018464..c17cf8ed8113 100644
>> --- a/block/blk-flush.c
>> +++ b/block/blk-flush.c
>> @@ -130,7 +130,8 @@ static void blk_flush_restore_request(struct request *rq)
>>             * original @rq->bio.  Restore it.
>>             */
>>            rq->bio = rq->biotail;
>> -       rq->__sector = rq->bio->bi_iter.bi_sector;
>> +       if (rq->bio)
>> +               rq->__sector = rq->bio->bi_iter.bi_sector;
>>
>>            /* make @rq a normal request */
>>            rq->rq_flags &= ~RQF_FLUSH_SEQ;
>> --
>>
>
> This is something Damien added to his patch series. I just wonder why I
> couldn't reproduce the failure, even with nvme-mpath enabled. I tried
> both nvme-tcp and nvme-loop without any problems.

Not exactly sure.

From what I can see, blk_flush_complete_seq() only calls
blk_flush_restore_request() (and thus panics) for error != 0. If that
is the case, any request whose bios were stolen must panic.
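To make the failure mode concrete, here is a minimal user-space sketch. It is a simplified model, not kernel code: struct mock_bio, struct mock_request, mock_steal_bios() and the two mock_restore_request_*() functions are hypothetical stand-ins for struct bio, struct request, the bio stealing done on failover (in mainline nvme_failover_request() uses blk_steal_bios(), which clears both rq->bio and rq->biotail) and blk_flush_restore_request(). It assumes, following the comment in blk_flush_restore_request(), that on the normal flush path only rq->bio has been cleared while rq->biotail still points at the original bio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct bio and struct request. */
struct mock_bio {
	unsigned long long sector;	/* models bio->bi_iter.bi_sector */
	struct mock_bio *next;		/* models bio->bi_next */
};

struct mock_request {
	struct mock_bio *bio;		/* first bio; may be NULL once flush data completed */
	struct mock_bio *biotail;	/* last bio; normally still the original bio */
	unsigned long long sector;	/* models rq->__sector */
	bool flush_seq;			/* models RQF_FLUSH_SEQ */
};

/*
 * Models the failover path moving the bios to a private requeue list:
 * the request survives, but it no longer owns any bio, so both
 * rq->bio and rq->biotail end up NULL.
 */
static struct mock_bio *mock_steal_bios(struct mock_request *rq)
{
	struct mock_bio *stolen = rq->bio;

	if (rq->bio) {
		rq->bio = NULL;
		rq->biotail = NULL;
	}
	return stolen;
}

/* Models the current restore: it assumes rq->biotail is always valid. */
static void mock_restore_request_unguarded(struct mock_request *rq)
{
	rq->bio = rq->biotail;
	/* The kernel dereferences directly; assert() stands in for the oops. */
	assert(rq->bio != NULL);
	rq->sector = rq->bio->sector;
	rq->flush_seq = false;
}

/* Models the restore with the NULL check from the proposed diff. */
static void mock_restore_request_guarded(struct mock_request *rq)
{
	rq->bio = rq->biotail;
	if (rq->bio)
		rq->sector = rq->bio->sector;
	/* make rq a normal request again */
	rq->flush_seq = false;
}
```

In this model the unguarded restore works on the normal flush path but trips the assertion once the bios have been stolen, while the guarded variant simply leaves the sector untouched. Whether skipping the sector restore is semantically right is the open question raised in the quoted mail.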

However, nvme-mpath always ends a stolen request with error = 0.

It seems that there is code in flush_end_io() that may override the
request's error status, but I cannot see it in the trace...
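The reasoning above reduces to a small decision table. The sketch below is a hypothetical model built only from the assumptions stated in this mail, not from the real blk_flush_complete_seq() state machine: the restore step is reached only for error != 0, nvme-mpath completes a request whose bios it stole with status 0, and the unguarded restore only faults when the bios are gone. All names are illustrative; mock_apply_override() merely stands in for the possible flush_end_io() status override that does not show up in the trace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of how a flush-sequence request is completed. */
struct mock_completion {
	bool bios_stolen;	/* nvme-mpath failover took the request's bios */
	int error;		/* status the flush machinery ends up seeing */
};

/* Assumption from the analysis: the restore step only runs for error != 0. */
static bool mock_restore_reached(const struct mock_completion *c)
{
	return c->error != 0;
}

/* The unguarded restore only faults on a request that has no bios left. */
static bool mock_would_oops(const struct mock_completion *c)
{
	return mock_restore_reached(c) && c->bios_stolen;
}

/*
 * Hypothetical hook for any code (e.g. in flush_end_io()) that might
 * replace the completion status before the flush sequence sees it.
 * An override of 0 means "leave the status alone".
 */
static int mock_apply_override(int error, int override)
{
	return override != 0 ? override : error;
}

/* Models nvme-mpath completing a request whose bios it stole: always 0. */
static struct mock_completion mock_mpath_failover_completion(void)
{
	struct mock_completion c = { .bios_stolen = true, .error = 0 };

	return c;
}
```

Under these assumptions a failed-over request on its own never reaches the faulting restore; it only does so if some path turns its 0 status into an error, which would point the search at such an override rather than at nvme-mpath itself.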


Thread overview: 11+ messages
2024-04-23 13:17 [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 Yi Zhang
2024-04-26  8:30 ` [bug report][bisected] " Yi Zhang
2024-04-29 14:35 ` [bug report] " Johannes Thumshirn
2024-04-29 22:18   ` Chaitanya Kulkarni
2024-04-30  6:16     ` Johannes Thumshirn
2024-04-30 14:17       ` Yi Zhang
2024-05-03  7:59         ` Sagi Grimberg
2024-05-03 10:32           ` Johannes Thumshirn
2024-05-03 11:01             ` Sagi Grimberg [this message]
2024-05-03 21:14               ` Chaitanya Kulkarni
2024-05-09  6:15                 ` Yi Zhang
