Re: blktests failures with v7.1-rc1 kernel


From: Nilay Shroff <nilay@linux.ibm.com>
To: "Shin'ichiro Kawasaki" <shinichiro.kawasaki@wdc.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	nbd@other.debian.org, linux-rdma@vger.kernel.org
Subject: Re: blktests failures with v7.1-rc1 kernel
Date: Fri, 29 May 2026 11:22:55 +0530
Message-ID: <6734f050-d660-4d82-b59e-bef28ff332bc@linux.ibm.com>
In-Reply-To: <ahfQFHuVx2G7OFLE@shinmob>

On 5/28/26 10:54 AM, Shin'ichiro Kawasaki wrote:
> On May 25, 2026 / 18:14, Nilay Shroff wrote:
>> hi Shinichiro,
>>
>> On 4/28/26 2:43 PM, Shin'ichiro Kawasaki wrote:
> [...]
>>> #1: nvme/005,063 (tcp transport)
>>>
>>>       The test cases nvme/005 and 063 fail for tcp transport due to the lockdep
>>>       WARN related to the three locks q->q_usage_counter, q->elevator_lock and
>>>       set->srcu. The failure was first reported for nvme/063 with the v6.16-rc1
>>>       kernel [2].
>>>
>>>       Chaitanya provided a fix patch (thanks!), and it is queued for v7.1-rcX tags
>>>       [3]. However, nvme/005 and 063 still fail even when I apply the fix patch to
>>>       the v7.1-rc1 kernel. The call traces of the lockdep WARN differ between the
>>>       "v7.1-rc1" kernel [4] and the "v7.1-rc1+the fix patch" kernel [5]. I guess
>>>       that there are two lockdep problems with similar symptoms, that patch [3]
>>>       fixed one of them, and that one problem still remains.
>>>
>>>       [2] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>       [3] https://lore.kernel.org/all/20260413171628.6204-1-kch@nvidia.com/
>>
>>
>> I looked into this lockdep warning, and it seems that Chaitanya's patch indeed fixes the
>> original issue reported in [4]. However, the new warning reported in [5] appears to be a
>> separate lockdep splat and, from what I can tell, likely a false positive. There are two
>> reasons why I think so:
>>
>> 1. The lockdep report suggests that thread #1 is sending data over a TCP socket while
>>     another thread #2 is still in the process of establishing that same socket connection.
>>     In practice, this should not be possible because request dispatch over the socket can
>>     only happen after the connection setup has completed successfully.
>>
>> 2. The warning also suggests that while thread #0 is deleting the gendisk and unregistering
>>     the corresponding request queue, another thread #5 is concurrently attempting to change
>>     the queue elevator. However, once gendisk deletion starts, elevator switching is already
>>     inhibited for that queue (see disable_elv_switch()), so the reported locking scenario
>>     should not be reachable in practice.
>>
>> Based on the above, I suspect this is a lockdep false positive caused by dependency tracking
>> across different queue/socket lifecycle phases. We may need to suppress lock dependency tracking
>> in some of these paths to avoid the false warning.
> 
> Hi Nilay, thank you very much for looking into this. It is good to know that
> Chaitanya's patch fixed one problem, and that the other problem looks like a
> false positive.
> 
> To confirm the "lockdep false positive caused by dependency tracking across
> different queue/socket lifecycle phases" theory, I created the attached patch. It
> uses dynamic lockdep keys for the sockets of nvme-tcp controllers. With this
> patch, the WARN at nvme/005 disappears! I think this indicates that your
> suspicion is correct. I will do some more testing and post the patch.

Thanks for working on the patch! I reviewed it and the changes look good to me.
I agree that assigning a unique lockdep key to each nvmf-tcp socket is the right
solution.
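
For anyone following the thread, here is a rough userspace sketch of why a
shared lock class can produce this kind of report and why per-socket keys make
it go away. It is a toy model only: it is neither the kernel's lockdep nor the
code from the attached patch. ToyLockdep, the class names "sk_lock",
"sk_lock#A" and "sk_lock#B", and the two lock orders are all assumptions made
up for illustration; the real dependency chains in the splat are longer.

```python
from collections import defaultdict


class ToyLockdep:
    """Toy model of class-based lock-order tracking (NOT the kernel's lockdep)."""

    def __init__(self):
        # lock class -> set of classes that were acquired while it was held
        self.edges = defaultdict(set)

    def _reachable(self, src, dst):
        """Return True if dst can be reached from src in the recorded order graph."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire_chain(self, chain):
        """Record locks taken in the given order; return True if a cycle is found."""
        warned = False
        for held, nxt in zip(chain, chain[1:]):
            # Taking nxt while holding held is suspicious if an earlier
            # observation already ordered nxt before held.
            if self._reachable(nxt, held):
                warned = True
            self.edges[held].add(nxt)
        return warned


def report_for(per_socket_keys):
    """Replay two hypothetical paths that touch two *different* sockets A and B."""
    tracker = ToyLockdep()
    sock_a = "sk_lock#A" if per_socket_keys else "sk_lock"
    sock_b = "sk_lock#B" if per_socket_keys else "sk_lock"
    # Assumed I/O path on connected socket A: queue usage first, then the socket.
    first = tracker.acquire_chain(["q_usage_counter", sock_a])
    # Assumed setup path on socket B: the socket first, then the queue side.
    second = tracker.acquire_chain([sock_b, "q_usage_counter"])
    return first or second
```

With one class shared by all sockets, report_for(False) flags a cycle even
though the two paths never touch the same socket. With one key per socket,
report_for(True) stays quiet because the two orders no longer share a node.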

Thanks,
--Nilay

