Linux-NVME Archive on lore.kernel.org
From: Nilay Shroff <nilay@linux.ibm.com>
To: "Shin'ichiro Kawasaki" <shinichiro.kawasaki@wdc.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	nbd@other.debian.org, linux-rdma@vger.kernel.org
Subject: Re: blktests failures with v7.1-rc1 kernel
Date: Fri, 29 May 2026 11:22:55 +0530
Message-ID: <6734f050-d660-4d82-b59e-bef28ff332bc@linux.ibm.com>
In-Reply-To: <ahfQFHuVx2G7OFLE@shinmob>

On 5/28/26 10:54 AM, Shin'ichiro Kawasaki wrote:
> On May 25, 2026 / 18:14, Nilay Shroff wrote:
>> hi Shinichiro,
>>
>> On 4/28/26 2:43 PM, Shin'ichiro Kawasaki wrote:
> [...]
>>> #1: nvme/005,063 (tcp transport)
>>>
>>>       The test cases nvme/005 and 063 fail for tcp transport due to a lockdep
>>>       WARN related to the three locks q->q_usage_counter, q->elevator_lock and
>>>       set->srcu. The failure was first reported for nvme/063 on the v6.16-rc1
>>>       kernel [2].
>>>
>>>       Chaitanya provided a fix patch (thanks!), and it is queued for v7.1-rcX tags
>>>       [3]. However, nvme/005 and 063 still fail even when I apply the fix patch to
>>>       the v7.1-rc1 kernel. The call traces of the lockdep WARN differ between the
>>>       "v7.1-rc1" kernel [4] and the "v7.1-rc1+the fix patch" kernel [5]. I guess that
>>>       there are two lockdep problems with similar symptoms, and that patch [3] fixed
>>>       one of them while the other still remains.
>>>
>>>       [2]https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>       [3]https://lore.kernel.org/all/20260413171628.6204-1-kch@nvidia.com/
>>
>>
>> I looked into this lockdep warning, and it seems that Chaitanya's patch indeed fixes the
>> original issue reported in [4]. However, the new warning reported in [5] appears to be a
>> separate lockdep splat and, from what I can tell, likely a false positive. There are two
>> reasons why I think so:
>>
>> 1. The lockdep report suggests that thread #1 is sending data over a TCP socket while
>>     another thread #2 is still in the process of establishing that same socket connection.
>>     In practice, this should not be possible because request dispatch over the socket can
>>     only happen after the connection setup has completed successfully.
>>
>> 2. The warning also suggests that while thread #0 is deleting the gendisk and unregistering
>>     the corresponding request queue, another thread #5 is concurrently attempting to change
>>     the queue elevator. However, once gendisk deletion starts, elevator switching is already
>>     inhibited for that queue (see disable_elv_switch()), so the reported locking scenario
>>     should not be reachable in practice.
>>
>> Based on the above, I suspect this is a lockdep false positive caused by dependency tracking
>> across different queue/socket lifecycle phases. We may need to suppress lock dependency tracking
>> in some of these paths to avoid the false warning.
> 
> Hi Nilay, thank you very much for looking into this. It is good to know that
> Chaitanya's patch fixed one problem, and that the other problem looks like a
> false positive.
> 
> To confirm the "lockdep false positive caused by dependency tracking across
> different queue/socket lifecycle phases", I created the attached patch. It
> uses dynamic lockdep keys for the sockets of nvme-tcp controllers. With this
> patch, the WARN at nvme/005 disappears! I think this indicates that your
> suspicion is correct. I will do some more testing and post the patch.

Thanks for working on the patch! I reviewed it and the changes look good to me.
I agree that assigning a unique lockdep key to each nvme-tcp socket is the right
solution.
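
To illustrate why separate keys can make this kind of report go away, here is a
minimal Python sketch of class-based lock-order tracking. It is a toy model
under stated assumptions, not lockdep's actual implementation and not the
attached patch: the names LockdepSketch, reports_cycle, "sk_lock" and "q_lock"
are hypothetical, and the two acquisition orders are simplified stand-ins for
the setup-phase and I/O-phase paths discussed above.

```python
from collections import defaultdict


class LockdepSketch:
    """Toy model: tracks ordering between lock *keys* (classes), not instances."""

    def __init__(self):
        # key -> set of keys that were acquired while this key was held
        self.edges = defaultdict(set)

    def record(self, held_key, acquired_key):
        """Record 'acquired_key taken while holding held_key'.

        Returns True if the new dependency closes a cycle in the graph.
        """
        closes_cycle = self._reaches(acquired_key, held_key)
        self.edges[held_key].add(acquired_key)
        return closes_cycle

    def _reaches(self, src, dst):
        # Depth-first search over the recorded dependency edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False


def reports_cycle(per_socket_keys):
    """Replay two hypothetical acquisition orders and report whether a cycle is seen."""
    tracker = LockdepSketch()

    def sock_key(name):
        # Assumption: with per-socket keys, each socket gets its own lock class.
        return f"sk_lock:{name}" if per_socket_keys else "sk_lock"

    # Setup phase on socket "a": a queue-side lock is taken under the socket lock.
    first = tracker.record(sock_key("a"), "q_lock")
    # I/O phase on socket "b": the socket lock is taken under the queue-side lock.
    second = tracker.record("q_lock", sock_key("b"))
    return first or second


print(reports_cycle(per_socket_keys=False))
print(reports_cycle(per_socket_keys=True))
```

With a single shared key, the two orders look like sk_lock -> q_lock and
q_lock -> sk_lock, so the tracker sees a cycle even though the two paths never
involve the same socket in the same phase. With one key per socket, the second
dependency points at a different class and no cycle is formed.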

Thanks,
--Nilay



Thread overview: 4+ messages
2026-04-28  9:13 blktests failures with v7.1-rc1 kernel Shin'ichiro Kawasaki
2026-05-25 12:44 ` Nilay Shroff
2026-05-28  5:24   ` Shin'ichiro Kawasaki
2026-05-29  5:52     ` Nilay Shroff [this message]