From: Nilay Shroff <nilay@linux.ibm.com>
To: "Shin'ichiro Kawasaki" <shinichiro.kawasaki@wdc.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
nbd@other.debian.org, linux-rdma@vger.kernel.org
Subject: Re: blktests failures with v7.1-rc1 kernel
Date: Fri, 29 May 2026 11:22:55 +0530
Message-ID: <6734f050-d660-4d82-b59e-bef28ff332bc@linux.ibm.com>
In-Reply-To: <ahfQFHuVx2G7OFLE@shinmob>

On 5/28/26 10:54 AM, Shin'ichiro Kawasaki wrote:
> On May 25, 2026 / 18:14, Nilay Shroff wrote:
>> hi Shinichiro,
>>
>> On 4/28/26 2:43 PM, Shin'ichiro Kawasaki wrote:
> [...]
>>> #1: nvme/005,063 (tcp transport)
>>>
>>> The test cases nvme/005 and 063 fail for tcp transport due to the lockdep
>>> WARN related to the three locks q->q_usage_counter, q->elevator_lock and
>>> set->srcu. The failure was first reported for nvme/063 with the v6.16-rc1
>>> kernel [2].
>>>
>>> Chaitanya provided a fix patch (thanks!), and it is queued for v7.1-rcX tags
>>> [3]. However, nvme/005 and 063 still fail even when I apply the fix patch to
>>> v7.1-rc1 kernel. The call traces of the lockdep WARN are different between
>>> "v7.1-rc1" kernel [4] and "v7.1-rc1+the fix patch" kernel [5]. I guess that
>>> there exist two lockdep problems with similar symptoms and patch [3] fixed
>>> one of them. I guess that one problem still remains.
>>>
>>> [2]https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>> [3]https://lore.kernel.org/all/20260413171628.6204-1-kch@nvidia.com/
>>
>>
>> I looked into this lockdep warning, and it seems that Chaitanya's patch indeed fixes the
>> original issue reported in [4]. However, the new warning reported in [5] appears to be a
>> separate lockdep splat and, from what I can tell, likely a false positive. There are two
>> reasons why I think so:
>>
>> 1. The lockdep report suggests that thread #1 is sending data over a TCP socket while
>> another thread #2 is still in the process of establishing that same socket connection.
>> In practice, this should not be possible because request dispatch over the socket can
>> only happen after the connection setup has completed successfully.
>>
>> 2. The warning also suggests that while thread #0 is deleting the gendisk and unregistering
>> the corresponding request queue, another thread #5 is concurrently attempting to change
>> the queue elevator. However, once gendisk deletion starts, elevator switching is already
>> inhibited for that queue (see disable_elv_switch()), so the reported locking scenario
>> should not be reachable in practice.
>>
>> Based on the above, I suspect this is a lockdep false positive caused by dependency tracking
>> across different queue/socket lifecycle phases. We may need to suppress lock dependency tracking
>> in some of these paths to avoid the false warning.
>
> Hi Nilay, thank you very much for looking into this. It is good to know that
> Chaitanya's patch fixed one problem, and the other problem looks like a false-
> positive.
>
> To confirm that "lockdep false positive caused by dependency tracking across
> different queue/socket lifecycle phases", I created the patch attached. It
> uses dynamic lockdep keys for the sockets of nvme-tcp controllers. With this
> patch, the WARN at nvme/005 disappears! I think this indicates that your
> suspicion is correct. I will do some more testing and post the patch.

Thanks for working on the patch! I reviewed it and the changes look good to me.
I agree assigning a unique lockdep key to each nvmf-tcp socket is the right
solution.
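
For readers following the thread, the effect of per-socket keys can be
illustrated with a toy user-space model. This is only a sketch: it is not
kernel code and not the attached patch. The class ToyLockdep, the function
reports_cycle() and the two lock chains are hypothetical simplifications, and
it assumes the two recorded chains come from two different sockets in
different lifecycle phases.

```python
from collections import defaultdict


class ToyLockdep:
    """Toy lock-order validator that tracks dependencies between lock classes."""

    def __init__(self):
        # class -> set of classes that were acquired while holding it
        self.edges = defaultdict(set)

    def _reachable(self, src, dst):
        # Depth-first search over the recorded dependency graph.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire_chain(self, *classes):
        """Record a nested acquisition order; return True if it closes a cycle."""
        cycle = False
        for held, new in zip(classes, classes[1:]):
            if self._reachable(new, held):
                cycle = True  # a path new -> ... -> held was recorded earlier
            self.edges[held].add(new)
        return cycle


def reports_cycle(per_socket_keys):
    """Replay two hypothetical chains that involve two different sockets."""
    dep = ToyLockdep()

    def sk_class(sock_id):
        # Shared class: every socket maps to one node in the graph.
        # Per-socket key: every socket gets its own node.
        return f"sk_lock#{sock_id}" if per_socket_keys else "sk_lock"

    # Hypothetical chain 1: socket 1 is connected and I/O is dispatched on it
    # while the queue usage reference is held.
    first = dep.acquire_chain("q_usage_counter", sk_class(1))
    # Hypothetical chain 2: socket 2 is still being set up and, in this
    # simplified model, ends up depending on the queue usage reference.
    second = dep.acquire_chain(sk_class(2), "q_usage_counter")
    return first or second


if __name__ == "__main__":
    print("shared class reports cycle:", reports_cycle(per_socket_keys=False))
    print("per-socket keys report cycle:", reports_cycle(per_socket_keys=True))
```

With one shared class, the model merges both sockets into a single node and
reports a cycle, even though no single socket can be in both phases at once.
With a separate key per socket, the two chains no longer connect.
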
Thanks,
--Nilay