From: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"linux-nvme@lists.infradead.org"
<linux-nvme@lists.infradead.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
nbd@other.debian.org, linux-rdma@vger.kernel.org
Subject: Re: blktests failures with v7.1-rc1 kernel
Date: Thu, 28 May 2026 14:24:42 +0900 [thread overview]
Message-ID: <ahfQFHuVx2G7OFLE@shinmob> (raw)
In-Reply-To: <c4ddc101-184a-4e4f-82ca-c3123bce5e34@linux.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 2972 bytes --]
On May 25, 2026 / 18:14, Nilay Shroff wrote:
> hi Shinichiro,
>
> On 4/28/26 2:43 PM, Shin'ichiro Kawasaki wrote:
[...]
> > #1: nvme/005,063 (tcp transport)
> >
> > The test cases nvme/005 and 063 fail for tcp transport due to the lockdep
> > WARN related to the three locks q->q_usage_counter, q->elevator_lock and
> > set->srcu. The failure was reported first time for nvme/063 and v6.16-rc1
> > kernel [2].
> >
> > Chaitanya provided a fix patch (thanks!), and it is queued for v7.1-rcX tags
> > [3]. However, nvme/005 and 063 still fail even when I apply the fix patch to
> > v7.1-rc1 kernel. The call traces of the lockdep WARN are different between
> > "v7.1-rc1" kernel [4] and "v7.1-rc1+the fix patch" kernel [5]. I guess that
> > there exist two lockdep problems with similar symptoms and patch [3] fixed
> > one of them. I guess that still one problem is left.
> >
> > [2]https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> > [3]https://lore.kernel.org/all/20260413171628.6204-1-kch@nvidia.com/
>
>
> I looked into this lockdep warning, and it seems that Chaitanya's patch indeed fixes the
> original issue reported in [4]. However, the new warning reported in [5] appears to be a
> separate lockdep splat and, from what I can tell, likely a false positive. There are two
> reasons why I think so:
>
> 1. The lockdep report suggests that thread #1 is sending data over a TCP socket while
> another thread #2 is still in the process of establishing that same socket connection.
> In practice, this should not be possible because request dispatch over the socket can
> only happen after the connection setup has completed successfully.
>
> 2. The warning also suggests that while thread #0 is deleting the gendisk and unregistering
> the corresponding request queue, another thread #5 is concurrently attempting to change
> the queue elevator. However, once gendisk deletion starts, elevator switching is already
> inhibited for that queue (see disable_elv_switch()), so the reported locking scenario
> should not be reachable in practice.
>
> Based on the above, I suspect this is a lockdep false positive caused by dependency tracking
> across different queue/socket lifecycle phases. We may need to suppress lock dependency tracking
> in some of these paths to avoid the false warning.
Hi Nilay, thank you very much looking into this. It is good to know that
Chaitanya's patch fixed one problem, and the other problem looks like a false-
positive.
To confirm that "lockdep false positive caused by dependency tracking across
different queue/socket lifecycle phases", I created the patch attached. It
uses dynamic lockdep keys for the sockets of nvme-tcp controllers. With this
patch, the WARN at nvme/005 disappears! I think this indicates that your
suspect is correct. I will do some more testing and post the patch.
[-- Attachment #2: 0001-nvme-tcp-lockdep-use-dynamic-lockdep-keys.patch --]
[-- Type: text/plain, Size: 5676 bytes --]
From 74ae2157712e872711663ebb6cedbb4b0fc8c92a Mon Sep 17 00:00:00 2001
From: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Date: Thu, 28 May 2026 13:52:48 +0900
Subject: [PATCH] nvme-tcp: lockdep: use dynamic lockdep keys
When NVMe-TCP controller setup and teardown are repeated with lockdep
enabled, lockdep reports false positives for the following locks:
1) &q->elevator_lock : IO scheduler change context
2) &q->q_usage_counter(io) : SCSI disk probe context
3) fs_reclaim : CPU hotplug bring-up context
4) cpu_hotplug_lock : socket establishment context
5) sk_lock-AF_INET-NVME : MQ sched dispatch context for the socket
6) set->srcu : NVMe controller delete context
This is a false positive because lockdep confuses lock 4) (socket
establishment) with lock 5) (socket in use) for different socket
instances. The locks belong to different sockets, but lockdep treats
them as the same due to shared static lockdep keys.
Fix this by using dynamically allocated lockdep keys per socket instance
instead of static keys nvme_tcp_sk_key[] and nvme_tcp_slock_key[]. Add
nvme_tcp_sk_key and nvme_tcp_slock_key fields to struct nvme_tcp_queue
and pass them to sock_lock_init_class_and_name() for proper lockdep
tracking. Move nvme_tcp_reclassify_socket() after struct nvme_tcp_queue
definition to avoid "too early" reference compiler errors.
Suggested-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/nvme/host/tcp.c | 88 +++++++++++++++++++++++------------------
1 file changed, 49 insertions(+), 39 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 15d36d6a728e..51d496f414a1 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -56,44 +56,6 @@ MODULE_PARM_DESC(tls_handshake_timeout,
static atomic_t nvme_tcp_cpu_queues[NR_CPUS];
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-/* lockdep can detect a circular dependency of the form
- * sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
- * because dependencies are tracked for both nvme-tcp and user contexts. Using
- * a separate class prevents lockdep from conflating nvme-tcp socket use with
- * user-space socket API use.
- */
-static struct lock_class_key nvme_tcp_sk_key[2];
-static struct lock_class_key nvme_tcp_slock_key[2];
-
-static void nvme_tcp_reclassify_socket(struct socket *sock)
-{
- struct sock *sk = sock->sk;
-
- if (WARN_ON_ONCE(!sock_allow_reclassification(sk)))
- return;
-
- switch (sk->sk_family) {
- case AF_INET:
- sock_lock_init_class_and_name(sk, "slock-AF_INET-NVME",
- &nvme_tcp_slock_key[0],
- "sk_lock-AF_INET-NVME",
- &nvme_tcp_sk_key[0]);
- break;
- case AF_INET6:
- sock_lock_init_class_and_name(sk, "slock-AF_INET6-NVME",
- &nvme_tcp_slock_key[1],
- "sk_lock-AF_INET6-NVME",
- &nvme_tcp_sk_key[1]);
- break;
- default:
- WARN_ON_ONCE(1);
- }
-}
-#else
-static void nvme_tcp_reclassify_socket(struct socket *sock) { }
-#endif
-
enum nvme_tcp_send_state {
NVME_TCP_SEND_CMD_PDU = 0,
NVME_TCP_SEND_H2C_PDU,
@@ -180,6 +142,11 @@ struct nvme_tcp_queue {
void (*state_change)(struct sock *);
void (*data_ready)(struct sock *);
void (*write_space)(struct sock *);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lock_class_key nvme_tcp_sk_key;
+ struct lock_class_key nvme_tcp_slock_key;
+#endif
};
struct nvme_tcp_ctrl {
@@ -207,6 +174,44 @@ static const struct blk_mq_ops nvme_tcp_mq_ops;
static const struct blk_mq_ops nvme_tcp_admin_mq_ops;
static int nvme_tcp_try_send(struct nvme_tcp_queue *queue);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/* lockdep can detect a circular dependency of the form
+ * sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
+ * because dependencies are tracked for both nvme-tcp and user contexts. Using
+ * a separate class prevents lockdep from conflating nvme-tcp socket use with
+ * user-space socket API use.
+ */
+static void nvme_tcp_reclassify_socket(struct nvme_tcp_queue *queue)
+{
+ struct sock *sk = queue->sock->sk;
+
+ lockdep_register_key(&queue->nvme_tcp_sk_key);
+ lockdep_register_key(&queue->nvme_tcp_slock_key);
+
+ if (WARN_ON_ONCE(!sock_allow_reclassification(sk)))
+ return;
+
+ switch (sk->sk_family) {
+ case AF_INET:
+ sock_lock_init_class_and_name(sk, "slock-AF_INET-NVME",
+ &queue->nvme_tcp_slock_key,
+ "sk_lock-AF_INET-NVME",
+ &queue->nvme_tcp_sk_key);
+ break;
+ case AF_INET6:
+ sock_lock_init_class_and_name(sk, "slock-AF_INET6-NVME",
+ &queue->nvme_tcp_slock_key,
+ "sk_lock-AF_INET6-NVME",
+ &queue->nvme_tcp_sk_key);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+#else
+static void nvme_tcp_reclassify_socket(struct nvme_tcp_queue *queue) { }
+#endif
+
static inline struct nvme_tcp_ctrl *to_tcp_ctrl(struct nvme_ctrl *ctrl)
{
return container_of(ctrl, struct nvme_tcp_ctrl, ctrl);
@@ -1468,6 +1473,11 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
kfree(queue->pdu);
mutex_destroy(&queue->send_mutex);
mutex_destroy(&queue->queue_lock);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ lockdep_unregister_key(&queue->nvme_tcp_sk_key);
+ lockdep_unregister_key(&queue->nvme_tcp_slock_key);
+#endif
}
static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue)
@@ -1813,7 +1823,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
}
sk_net_refcnt_upgrade(queue->sock->sk);
- nvme_tcp_reclassify_socket(queue->sock);
+ nvme_tcp_reclassify_socket(queue);
/* Single syn retry */
tcp_sock_set_syncnt(queue->sock->sk, 1);
--
2.54.0
prev parent reply other threads:[~2026-05-28 5:24 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 9:13 blktests failures with v7.1-rc1 kernel Shin'ichiro Kawasaki
2026-05-25 12:44 ` Nilay Shroff
2026-05-28 5:24 ` Shin'ichiro Kawasaki [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahfQFHuVx2G7OFLE@shinmob \
--to=shinichiro.kawasaki@wdc.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=linux-rdma@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=nbd@other.debian.org \
--cc=nilay@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox