* blktests failures with v6.17-rc1 kernel
@ 2025-08-13 10:50 Shinichiro Kawasaki
From: Shinichiro Kawasaki @ 2025-08-13 10:50 UTC (permalink / raw)
To: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
Hi all,
I ran the latest blktests (git hash: a08678c0cf2d) with the v6.17-rc1 kernel and
observed the 6 failures listed below. Compared with the previous report for the
v6.16 kernel [1], 2 of the failures are new.
[1] https://lore.kernel.org/linux-block/g4svtamuk3jhhhzb52reoj3nj2agi4ws7fwyc45vca5uykjkb4@glfr4emapv7n/
List of failures
================
#1: block/005
#2: nvme/041 (fc transport)
#3: nvme/050 (new)
#4: nvme/061 (fc transport)
#5: nvme/063 (tcp transport)
#6: scsi/006 (IDE/PATA device)(new)
Failure description
===================
#1: block/005
When the test case is run with an NVMe device as TEST_DEV, the kernel reports
a lockdep WARN related to the three locks q->q_usage_counter, fs_reclaim and
cpu_hotplug_lock. Refer to the report for the v6.16-rc1 kernel [2]. Nilay
posted a candidate fix patch [3].
[2] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
[3] https://lore.kernel.org/linux-block/20250805171749.3448694-1-nilay@linux.ibm.com/
#2: nvme/041 (fc transport)
The test case nvme/041 fails for fc transport. Refer to the report for the
v6.12 kernel [4].
[4] https://lore.kernel.org/linux-nvme/6crydkodszx5vq4ieox3jjpwkxtu7mhbohypy24awlo5w7f4k6@to3dcng24rd4/
#3: nvme/050 (new)
The test case fails with the message below:
nvme/050 => nvme0n1 (test nvme-pci timeout with fio jobs) [failed]
runtime 90.974s ... 90.912s
--- tests/nvme/050.out 2024-09-20 11:20:26.422380826 +0900
+++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/050.out.bad 2025-08-13 11:12:54.655610693 +0900
@@ -1,2 +1,3 @@
Running nvme/050
Test complete
+Failed to restore /dev/nvme0n1
This needs further debugging.
#4: nvme/061 (fc transport)
The test case nvme/061 sometimes fails for fc transport due to a WARN and
refcount message "refcount_t: underflow; use-after-free." Refer to the
report for the v6.15 kernel [5].
[5] https://lore.kernel.org/linux-block/2xsfqvnntjx5iiir7wghhebmnugmpfluv6ef22mghojgk6gilr@mvjscqxroqqk/
#5: nvme/063 (tcp transport)
The test case nvme/063 fails for tcp transport due to the lockdep WARN
related to the three locks q->q_usage_counter, q->elevator_lock and
set->srcu. Refer to the report for the v6.16-rc1 kernel [2].
#6: scsi/006 (IDE/PATA device)(new)
When the test case scsi/006 is run for QEMU IDE/PATA devices, it disables
the devices and causes I/O errors. In the worst case, it makes the test
system hang. Damien posted a candidate fix patch [6].
[6] https://lore.kernel.org/linux-ide/20250813092707.447479-1-dlemoal@kernel.org/T/#u
* Re: blktests failures with v6.17-rc1 kernel
From: Daniel Wagner @ 2025-08-27 10:10 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
On Wed, Aug 13, 2025 at 10:50:34AM +0000, Shinichiro Kawasaki wrote:
> #4: nvme/061 (fc transport)
>
> The test case nvme/061 sometimes fails for fc transport due to a WARN and
> refcount message "refcount_t: underflow; use-after-free." Refer to the
> report for the v6.15 kernel [5].
>
> [5]
> https://lore.kernel.org/linux-block/2xsfqvnntjx5iiir7wghhebmnugmpfluv6ef22mghojgk6gilr@mvjscqxroqqk/
This one might be fixed with
https://lore.kernel.org/linux-nvme/20250821-fix-nvmet-fc-v1-1-3349da4f416e@kernel.org/
* Re: blktests failures with v6.17-rc1 kernel
From: Shinichiro Kawasaki @ 2025-08-28 5:55 UTC (permalink / raw)
To: Daniel Wagner
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
On Aug 27, 2025 / 12:10, Daniel Wagner wrote:
> On Wed, Aug 13, 2025 at 10:50:34AM +0000, Shinichiro Kawasaki wrote:
> > #4: nvme/061 (fc transport)
> >
> > The test case nvme/061 sometimes fails for fc transport due to a WARN and
> > refcount message "refcount_t: underflow; use-after-free." Refer to the
> > report for the v6.15 kernel [5].
> >
> > [5]
> > https://lore.kernel.org/linux-block/2xsfqvnntjx5iiir7wghhebmnugmpfluv6ef22mghojgk6gilr@mvjscqxroqqk/
>
> This one might be fixed with
>
> https://lore.kernel.org/linux-nvme/20250821-fix-nvmet-fc-v1-1-3349da4f416e@kernel.org/
I applied this patch on top of the v6.17-rc3 kernel, but I still observe the
refcount WARN at nvme/061.
That said, I like the patch. This week, I noticed that nvme/030 hangs with the
fc transport. The hang is rare, but it can be reproduced reliably by repeating
the test case. I tried the fix patch, and it avoided this hang :)
Thanks for the fix!
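(For reference, repeating a test case like this can be scripted. A minimal bash
sketch, not part of the original report; `repeat_until_failure` is a made-up
helper name, and the commented `./check nvme/061` call assumes blktests' usual
runner, invoked from a blktests checkout:)

```shell
#!/usr/bin/env bash
# Repeat a command until it fails or a maximum number of iterations is
# reached; report the iteration on which the first failure occurred.
repeat_until_failure() {
    local max=$1; shift
    local i
    for ((i = 1; i <= max; i++)); do
        if ! "$@" > /dev/null 2>&1; then
            echo "failed on iteration $i"
            return 1
        fi
    done
    echo "no failure in $max iterations"
    return 0
}

# Hypothetical invocation from a blktests checkout:
# repeat_until_failure 300 ./check nvme/061
```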
* Re: blktests failures with v6.17-rc1 kernel
From: Daniel Wagner @ 2025-08-28 11:33 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]
On Thu, Aug 28, 2025 at 05:55:06AM +0000, Shinichiro Kawasaki wrote:
> On Aug 27, 2025 / 12:10, Daniel Wagner wrote:
> > On Wed, Aug 13, 2025 at 10:50:34AM +0000, Shinichiro Kawasaki wrote:
> > > #4: nvme/061 (fc transport)
> > >
> > > The test case nvme/061 sometimes fails for fc transport due to a WARN and
> > > refcount message "refcount_t: underflow; use-after-free." Refer to the
> > > report for the v6.15 kernel [5].
> > >
> > > [5]
> > > https://lore.kernel.org/linux-block/2xsfqvnntjx5iiir7wghhebmnugmpfluv6ef22mghojgk6gilr@mvjscqxroqqk/
> >
> > This one might be fixed with
> >
> > https://lore.kernel.org/linux-nvme/20250821-fix-nvmet-fc-v1-1-3349da4f416e@kernel.org/
>
> I applied this patch on top of the v6.17-rc3 kernel, but I still observe the
> refcount WARN at nvme/061.
Thanks for testing; I was able to reproduce it as well. The problem is
that an association can be scheduled for deletion twice.
Would you mind giving the attached patch a try? It fixes the problem I
was able to reproduce.
> That said, I like the patch. This week, I noticed that nvme/030 hangs with the
> fc transport. The hang is rare, but it can be reproduced reliably by repeating
> the test case. I tried the fix patch, and it avoided this hang :)
> Thanks for the fix!
Ah, nice, so at least that one is fixed by the first patch :)
[-- Attachment #2: 0001-nvmet-fc-avoid-scheduling-association-deletion-twice.patch --]
[-- Type: text/plain, Size: 2196 bytes --]
From b0db044f5e828d5c12c368fecd17327f7a6e854d Mon Sep 17 00:00:00 2001
From: Daniel Wagner <wagi@kernel.org>
Date: Thu, 28 Aug 2025 13:18:21 +0200
Subject: [PATCH] nvmet-fc: avoid scheduling association deletion twice
When forcefully shutting down a port via the configfs interface,
nvmet_port_subsys_drop_link() first calls nvmet_port_del_ctrls() and
then nvmet_disable_port(). Both functions will eventually schedule all
remaining associations for deletion.
The current implementation checks whether an association is about to be
removed, but only after the work item has already been scheduled. As a
result, it is possible for the first scheduled work item to free all
resources, and then for the same work item to be scheduled again for
deletion.
Because the association list is an RCU list, it is not possible to take
a lock and remove the list entry directly, so it cannot be looked up
again. Instead, a flag (terminating) must be used to determine whether
the association is already in the process of being deleted.
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
drivers/nvme/target/fc.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 6725c34dd7c9..7d84527d5a43 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1075,6 +1075,14 @@ nvmet_fc_delete_assoc_work(struct work_struct *work)
static void
nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
{
+ int terminating;
+
+ terminating = atomic_xchg(&assoc->terminating, 1);
+
+ /* if already terminating, do nothing */
+ if (terminating)
+ return;
+
nvmet_fc_tgtport_get(assoc->tgtport);
if (!queue_work(nvmet_wq, &assoc->del_work))
nvmet_fc_tgtport_put(assoc->tgtport);
@@ -1202,13 +1210,7 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
{
struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
unsigned long flags;
- int i, terminating;
-
- terminating = atomic_xchg(&assoc->terminating, 1);
-
- /* if already terminating, do nothing */
- if (terminating)
- return;
+ int i;
spin_lock_irqsave(&tgtport->lock, flags);
list_del_rcu(&assoc->a_list);
--
2.51.0
* Re: blktests failures with v6.17-rc1 kernel
From: Shinichiro Kawasaki @ 2025-08-30 13:15 UTC (permalink / raw)
To: Daniel Wagner
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
On Aug 28, 2025 / 13:33, Daniel Wagner wrote:
> On Thu, Aug 28, 2025 at 05:55:06AM +0000, Shinichiro Kawasaki wrote:
> > On Aug 27, 2025 / 12:10, Daniel Wagner wrote:
> > > On Wed, Aug 13, 2025 at 10:50:34AM +0000, Shinichiro Kawasaki wrote:
> > > > #4: nvme/061 (fc transport)
> > > >
> > > > The test case nvme/061 sometimes fails for fc transport due to a WARN and
> > > > refcount message "refcount_t: underflow; use-after-free." Refer to the
> > > > report for the v6.15 kernel [5].
> > > >
> > > > [5]
> > > > https://lore.kernel.org/linux-block/2xsfqvnntjx5iiir7wghhebmnugmpfluv6ef22mghojgk6gilr@mvjscqxroqqk/
> > >
> > > This one might be fixed with
> > >
> > > https://lore.kernel.org/linux-nvme/20250821-fix-nvmet-fc-v1-1-3349da4f416e@kernel.org/
> >
> > I applied this patch on top of the v6.17-rc3 kernel, but I still observe the
> > refcount WARN at nvme/061.
>
> Thanks for testing; I was able to reproduce it as well. The problem is
> that an association can be scheduled for deletion twice.
>
> Would you mind giving the attached patch a try? It fixes the problem I
> was able to reproduce.
Thanks for the effort. I applied the attached patch to the v6.17-rc3 kernel and
repeated nvme/061. It avoided the WARN and the refcount_t message. This looks
good.
However, unfortunately, I observed a different failure symptom: a KASAN
slab-use-after-free [*]. I'm not sure whether the fix patch unveiled this KASAN
or introduced it. The failure is reproduced reliably on my test systems, but it
requires repeating nvme/061 a few hundred times to recreate it.
[*]
Aug 29 15:25:58 testnode1 unknown: run blktests nvme/061 at 2025-08-29 15:25:58
Aug 29 15:25:58 testnode1 kernel: loop0: detected capacity change from 0 to 2097152
Aug 29 15:25:58 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Aug 29 15:25:58 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:25:58 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:25:58 testnode1 kernel: nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Aug 29 15:25:58 testnode1 kernel: nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
Aug 29 15:25:58 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connect complete
Aug 29 15:25:58 testnode1 kernel: nvme nvme2: NVME-FC{0}: new ctrl: NQN "blktests-subsystem-1", hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
Aug 29 15:25:58 testnode1 kernel: nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:25:58 testnode1 kernel: (NULL device *): {0:1} Association created
Aug 29 15:25:58 testnode1 kernel: nvmet: Created discovery controller 2 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:344673d9-1587-47b5-813b-4c4060f39163.
Aug 29 15:25:58 testnode1 kernel: nvme nvme3: NVME-FC{1}: controller connect complete
Aug 29 15:25:58 testnode1 kernel: nvme nvme3: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:344673d9-1587-47b5-813b-4c4060f39163
Aug 29 15:25:58 testnode1 kernel: nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:25:58 testnode1 kernel: (NULL device *): {0:1} Association deleted
Aug 29 15:25:58 testnode1 kernel: (NULL device *): {0:1} Association freed
Aug 29 15:25:58 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: resetting controller
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme2n1: I/O Cmd(0x2) @ LBA 1801032, 8 blocks, I/O Error (sct 0x3 / sc 0x70)
Aug 29 15:25:59 testnode1 kernel: recoverable transport error, dev nvme2n1, sector 1801032 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 2
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme2n1: I/O Cmd(0x2) @ LBA 1143192, 8 blocks, I/O Error (sct 0x3 / sc 0x70)
Aug 29 15:25:59 testnode1 kernel: recoverable transport error, dev nvme2n1, sector 1143192 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 2
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error -107
Aug 29 15:25:59 testnode1 kernel: nvme2n1: I/O Cmd(0x2) @ LBA 1582360, 8 blocks, I/O Error (sct 0x3 / sc 0x70)
Aug 29 15:25:59 testnode1 kernel: nvme2n1: I/O Cmd(0x2) @ LBA 927240, 8 blocks, I/O Error (sct 0x3 / sc 0x70)
Aug 29 15:25:59 testnode1 kernel: recoverable transport error, dev nvme2n1, sector 1582360 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 2
Aug 29 15:25:59 testnode1 kernel: recoverable transport error, dev nvme2n1, sector 927240 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 2
Aug 29 15:25:59 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:25:59 testnode1 kernel: (NULL device *): queue 0 connect admin queue failed (-111).
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: reset: Reconnect attempt failed (-111)
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: Reconnect attempt in 1 seconds
Aug 29 15:25:59 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:25:59 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:25:59 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
Aug 29 15:25:59 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
Aug 29 15:26:00 testnode1 kernel: nvme nvme2: long keepalive RTT (4294953920 ms)
Aug 29 15:26:00 testnode1 kernel: nvme nvme2: failed nvme_keep_alive_end_io error=4
Aug 29 15:26:00 testnode1 kernel: loop1: detected capacity change from 0 to 2097152
Aug 29 15:26:00 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Aug 29 15:26:00 testnode1 kernel: nvme nvme2: NVME-FC{0}: connectivity re-established. Attempting reconnect
Aug 29 15:26:00 testnode1 kernel: nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:00 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:00 testnode1 kernel: nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:a0f31071-324f-4782-a409-53919ea33737.
Aug 29 15:26:00 testnode1 kernel: nvme nvme3: NVME-FC{1}: controller connect complete
Aug 29 15:26:00 testnode1 kernel: nvme nvme3: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:a0f31071-324f-4782-a409-53919ea33737
Aug 29 15:26:00 testnode1 kernel: nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:00 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:00 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:00 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:26:00 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:00 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:00 testnode1 kernel: nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Aug 29 15:26:00 testnode1 kernel: nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
Aug 29 15:26:00 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connect complete
Aug 29 15:26:01 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:01 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
Aug 29 15:26:01 testnode1 kernel: nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
Aug 29 15:26:01 testnode1 kernel: nvme nvme2: NVME-FC{0}: resetting controller
Aug 29 15:26:01 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:01 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
Aug 29 15:26:01 testnode1 kernel: nvme nvme2: long keepalive RTT (4294955248 ms)
Aug 29 15:26:01 testnode1 kernel: nvme nvme2: failed nvme_keep_alive_end_io error=4
Aug 29 15:26:01 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
Aug 29 15:26:01 testnode1 kernel: loop2: detected capacity change from 0 to 2097152
Aug 29 15:26:01 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Aug 29 15:26:01 testnode1 kernel: nvme nvme2: NVME-FC{0}: connectivity re-established. Attempting reconnect
Aug 29 15:26:01 testnode1 kernel: nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:01 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:01 testnode1 kernel: nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:726c2d00-6cc9-4c7f-86fb-6d1b6f5fe2a4.
Aug 29 15:26:01 testnode1 kernel: nvme nvme3: NVME-FC{1}: controller connect complete
Aug 29 15:26:01 testnode1 kernel: nvme nvme3: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:726c2d00-6cc9-4c7f-86fb-6d1b6f5fe2a4
Aug 29 15:26:01 testnode1 kernel: nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:01 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:01 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:01 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:02 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:02 testnode1 kernel: nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connect complete
Aug 29 15:26:02 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
Aug 29 15:26:02 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: resetting controller
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: transport association event: Disconnect Association LS received
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: resetting controller
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
Aug 29 15:26:02 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
Aug 29 15:26:02 testnode1 kernel: loop3: detected capacity change from 0 to 2097152
Aug 29 15:26:02 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: long keepalive RTT (4294956800 ms)
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: failed nvme_keep_alive_end_io error=4
Aug 29 15:26:02 testnode1 kernel: nvme nvme2: NVME-FC{0}: connectivity re-established. Attempting reconnect
Aug 29 15:26:03 testnode1 kernel: nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:03 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:03 testnode1 kernel: nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:2e58093d-b298-46c0-9950-e5a72e9b3cd6.
Aug 29 15:26:03 testnode1 kernel: nvme nvme3: NVME-FC{1}: controller connect complete
Aug 29 15:26:03 testnode1 kernel: nvme nvme3: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:2e58093d-b298-46c0-9950-e5a72e9b3cd6
Aug 29 15:26:03 testnode1 kernel: nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:03 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:03 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:03 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:26:03 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:03 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:03 testnode1 kernel: nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Aug 29 15:26:03 testnode1 kernel: nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
Aug 29 15:26:03 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connect complete
Aug 29 15:26:04 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:04 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
Aug 29 15:26:04 testnode1 kernel: nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
Aug 29 15:26:04 testnode1 kernel: nvme nvme2: NVME-FC{0}: resetting controller
Aug 29 15:26:04 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:04 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:04 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
Aug 29 15:26:04 testnode1 kernel: (NULL device *): queue 0 connect admin queue failed (-111).
Aug 29 15:26:04 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
Aug 29 15:26:04 testnode1 kernel: nvme nvme2: long keepalive RTT (4294958193 ms)
Aug 29 15:26:04 testnode1 kernel: loop4: detected capacity change from 0 to 2097152
Aug 29 15:26:04 testnode1 kernel: nvme nvme2: failed nvme_keep_alive_end_io error=4
Aug 29 15:26:04 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Aug 29 15:26:04 testnode1 kernel: nvme nvme2: NVME-FC{0}: connectivity re-established. Attempting reconnect
Aug 29 15:26:04 testnode1 kernel: nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:04 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:04 testnode1 kernel: nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:81b7c2e6-bc71-4b3d-af4e-3a0b4bc3cbd8.
Aug 29 15:26:04 testnode1 kernel: nvme nvme3: NVME-FC{1}: controller connect complete
Aug 29 15:26:04 testnode1 kernel: nvme nvme3: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:81b7c2e6-bc71-4b3d-af4e-3a0b4bc3cbd8
Aug 29 15:26:04 testnode1 kernel: nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:04 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:04 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:04 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:05 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:05 testnode1 kernel: nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connect complete
Aug 29 15:26:05 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: NVME-FC{0}: resetting controller
Aug 29 15:26:05 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:05 testnode1 kernel: (NULL device *): queue 0 connect admin queue failed (-111).
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
Aug 29 15:26:05 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: long keepalive RTT (4294959609 ms)
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: failed nvme_keep_alive_end_io error=4
Aug 29 15:26:05 testnode1 kernel: loop5: detected capacity change from 0 to 2097152
Aug 29 15:26:05 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Aug 29 15:26:05 testnode1 kernel: nvme nvme2: NVME-FC{0}: connectivity re-established. Attempting reconnect
Aug 29 15:26:06 testnode1 kernel: nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:06 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:06 testnode1 kernel: nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:af34c46a-972a-4e88-b1ca-61ac0d58478c.
Aug 29 15:26:06 testnode1 kernel: nvme nvme3: NVME-FC{1}: controller connect complete
Aug 29 15:26:06 testnode1 kernel: nvme nvme3: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:af34c46a-972a-4e88-b1ca-61ac0d58478c
Aug 29 15:26:06 testnode1 kernel: nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:06 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:06 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:06 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:26:06 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:06 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:06 testnode1 kernel: nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Aug 29 15:26:06 testnode1 kernel: nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
Aug 29 15:26:06 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connect complete
Aug 29 15:26:06 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: resetting controller
Aug 29 15:26:07 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:07 testnode1 kernel: (NULL device *): queue 0 connect admin queue failed (-111).
Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
Aug 29 15:26:07 testnode1 kernel: ==================================================================
Aug 29 15:26:07 testnode1 kernel: BUG: KASAN: slab-use-after-free in fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
Aug 29 15:26:07 testnode1 kernel: Write of size 8 at addr ffff8881145fa700 by task kworker/u16:9/95
Aug 29 15:26:07 testnode1 kernel:
Aug 29 15:26:07 testnode1 kernel: CPU: 0 UID: 0 PID: 95 Comm: kworker/u16:9 Not tainted 6.17.0-rc3+ #356 PREEMPT(voluntary)
Aug 29 15:26:07 testnode1 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014
Aug 29 15:26:07 testnode1 kernel: Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
Aug 29 15:26:07 testnode1 kernel: Call Trace:
Aug 29 15:26:07 testnode1 kernel: <TASK>
Aug 29 15:26:07 testnode1 kernel: dump_stack_lvl+0x6a/0x90
Aug 29 15:26:07 testnode1 kernel: ? fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
Aug 29 15:26:07 testnode1 kernel: print_report+0x170/0x4f3
Aug 29 15:26:07 testnode1 kernel: ? __virt_addr_valid+0x22e/0x4e0
Aug 29 15:26:07 testnode1 kernel: ? fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
Aug 29 15:26:07 testnode1 kernel: kasan_report+0xad/0x170
Aug 29 15:26:07 testnode1 kernel: ? fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
Aug 29 15:26:07 testnode1 kernel: fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
Aug 29 15:26:07 testnode1 kernel: nvme_fc_ctlr_inactive_on_rport.isra.0+0x1b1/0x210 [nvme_fc]
Aug 29 15:26:07 testnode1 kernel: nvme_fc_connect_ctrl_work.cold+0x33f/0x348e [nvme_fc]
Aug 29 15:26:07 testnode1 kernel: ? lock_acquire+0x170/0x310
Aug 29 15:26:07 testnode1 kernel: ? __pfx_nvme_fc_connect_ctrl_work+0x10/0x10 [nvme_fc]
Aug 29 15:26:07 testnode1 kernel: ? lock_acquire+0x180/0x310
Aug 29 15:26:07 testnode1 kernel: ? process_one_work+0x722/0x14b0
Aug 29 15:26:07 testnode1 kernel: ? lock_release+0x1ad/0x300
Aug 29 15:26:07 testnode1 kernel: ? rcu_is_watching+0x11/0xb0
Aug 29 15:26:07 testnode1 kernel: process_one_work+0x868/0x14b0
Aug 29 15:26:07 testnode1 kernel: ? __pfx_process_one_work+0x10/0x10
Aug 29 15:26:07 testnode1 kernel: ? lock_acquire+0x170/0x310
Aug 29 15:26:07 testnode1 kernel: ? assign_work+0x156/0x390
Aug 29 15:26:07 testnode1 kernel: worker_thread+0x5ee/0xfd0
Aug 29 15:26:07 testnode1 kernel: ? __pfx_worker_thread+0x10/0x10
Aug 29 15:26:07 testnode1 kernel: kthread+0x3af/0x770
Aug 29 15:26:07 testnode1 kernel: ? lock_acquire+0x180/0x310
Aug 29 15:26:07 testnode1 kernel: ? __pfx_kthread+0x10/0x10
Aug 29 15:26:07 testnode1 kernel: ? ret_from_fork+0x1d/0x4e0
Aug 29 15:26:07 testnode1 kernel: ? lock_release+0x1ad/0x300
Aug 29 15:26:07 testnode1 kernel: ? rcu_is_watching+0x11/0xb0
Aug 29 15:26:07 testnode1 kernel: ? __pfx_kthread+0x10/0x10
Aug 29 15:26:07 testnode1 kernel: ret_from_fork+0x3be/0x4e0
Aug 29 15:26:07 testnode1 kernel: ? __pfx_kthread+0x10/0x10
Aug 29 15:26:07 testnode1 kernel: ? __pfx_kthread+0x10/0x10
Aug 29 15:26:07 testnode1 kernel: ret_from_fork_asm+0x1a/0x30
Aug 29 15:26:07 testnode1 kernel: </TASK>
Aug 29 15:26:07 testnode1 kernel:
Aug 29 15:26:07 testnode1 kernel: Allocated by task 14561:
Aug 29 15:26:07 testnode1 kernel: kasan_save_stack+0x2c/0x50
Aug 29 15:26:07 testnode1 kernel: kasan_save_track+0x10/0x30
Aug 29 15:26:07 testnode1 kernel: __kasan_kmalloc+0x96/0xb0
Aug 29 15:26:07 testnode1 kernel: fcloop_alloc_nport.isra.0+0xdb/0x910 [nvme_fcloop]
Aug 29 15:26:07 testnode1 kernel: fcloop_create_target_port+0xa6/0x5a0 [nvme_fcloop]
Aug 29 15:26:07 testnode1 kernel: kernfs_fop_write_iter+0x39a/0x5a0
Aug 29 15:26:07 testnode1 kernel: vfs_write+0x523/0xf80
Aug 29 15:26:07 testnode1 kernel: ksys_write+0xfb/0x200
Aug 29 15:26:07 testnode1 kernel: do_syscall_64+0x94/0x3d0
Aug 29 15:26:07 testnode1 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 29 15:26:07 testnode1 kernel:
Aug 29 15:26:07 testnode1 kernel: Freed by task 14126:
Aug 29 15:26:07 testnode1 kernel: kasan_save_stack+0x2c/0x50
Aug 29 15:26:07 testnode1 kernel: kasan_save_track+0x10/0x30
Aug 29 15:26:07 testnode1 kernel: kasan_save_free_info+0x37/0x70
Aug 29 15:26:07 testnode1 kernel: __kasan_slab_free+0x5f/0x70
Aug 29 15:26:07 testnode1 kernel: kfree+0x13a/0x4c0
Aug 29 15:26:07 testnode1 kernel: fcloop_delete_remote_port+0x238/0x390 [nvme_fcloop]
Aug 29 15:26:07 testnode1 kernel: kernfs_fop_write_iter+0x39a/0x5a0
Aug 29 15:26:07 testnode1 kernel: vfs_write+0x523/0xf80
Aug 29 15:26:07 testnode1 kernel: ksys_write+0xfb/0x200
Aug 29 15:26:07 testnode1 kernel: do_syscall_64+0x94/0x3d0
Aug 29 15:26:07 testnode1 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 29 15:26:07 testnode1 kernel:
Aug 29 15:26:07 testnode1 kernel: The buggy address belongs to the object at ffff8881145fa700
which belongs to the cache kmalloc-96 of size 96
Aug 29 15:26:07 testnode1 kernel: The buggy address is located 0 bytes inside of
freed 96-byte region [ffff8881145fa700, ffff8881145fa760)
Aug 29 15:26:07 testnode1 kernel:
Aug 29 15:26:07 testnode1 kernel: The buggy address belongs to the physical page:
Aug 29 15:26:07 testnode1 kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1145fa
Aug 29 15:26:07 testnode1 kernel: flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
Aug 29 15:26:07 testnode1 kernel: page_type: f5(slab)
Aug 29 15:26:07 testnode1 kernel: raw: 0017ffffc0000000 ffff888100042280 ffffea0004792400 dead000000000002
Aug 29 15:26:07 testnode1 kernel: raw: 0000000000000000 0000000080200020 00000000f5000000 0000000000000000
Aug 29 15:26:07 testnode1 kernel: page dumped because: kasan: bad access detected
Aug 29 15:26:07 testnode1 kernel:
Aug 29 15:26:07 testnode1 kernel: Memory state around the buggy address:
Aug 29 15:26:07 testnode1 kernel: ffff8881145fa600: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
Aug 29 15:26:07 testnode1 kernel: ffff8881145fa680: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
Aug 29 15:26:07 testnode1 kernel: >ffff8881145fa700: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
Aug 29 15:26:07 testnode1 kernel: ^
Aug 29 15:26:07 testnode1 kernel: ffff8881145fa780: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
Aug 29 15:26:07 testnode1 kernel: ffff8881145fa800: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
Aug 29 15:26:07 testnode1 kernel: ==================================================================
Aug 29 15:26:07 testnode1 kernel: Disabling lock debugging due to kernel taint
Aug 29 15:26:07 testnode1 kernel: nvme nvme2: long keepalive RTT (4294961016 ms)
Aug 29 15:26:07 testnode1 kernel: nvme nvme2: failed nvme_keep_alive_end_io error=4
Aug 29 15:26:07 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
Aug 29 15:26:07 testnode1 kernel: loop6: detected capacity change from 0 to 2097152
Aug 29 15:26:07 testnode1 kernel: nvmet: adding nsid 1 to subsystem blktests-subsystem-1
Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: connectivity re-established. Attempting reconnect
Aug 29 15:26:07 testnode1 kernel: nvme nvme3: NVME-FC{1}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:07 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:07 testnode1 kernel: nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:f1cd867c-f41b-4d5f-ae46-6bc6cefca7ab.
Aug 29 15:26:07 testnode1 kernel: nvme nvme3: NVME-FC{1}: controller connect complete
Aug 29 15:26:07 testnode1 kernel: nvme nvme3: NVME-FC{1}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", hostnqn: nqn.2014-08.org.nvmexpress:uuid:f1cd867c-f41b-4d5f-ae46-6bc6cefca7ab
Aug 29 15:26:07 testnode1 kernel: nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
Aug 29 15:26:07 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:07 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:07 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:26:08 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
Aug 29 15:26:08 testnode1 kernel: (NULL device *): {0:0} Association created
Aug 29 15:26:08 testnode1 kernel: nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
Aug 29 15:26:08 testnode1 kernel: nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
Aug 29 15:26:08 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connect complete
Aug 29 15:26:08 testnode1 kernel: nvme nvme2: Removing ctrl: NQN "blktests-subsystem-1"
Aug 29 15:26:08 testnode1 kernel: (NULL device *): {0:0} Association deleted
Aug 29 15:26:08 testnode1 kernel: (NULL device *): {0:0} Association freed
Aug 29 15:26:08 testnode1 kernel: (NULL device *): Disconnect LS failed: No Association
Aug 29 15:26:08 testnode1 kernel: nvme_fc: nvme_fc_create_ctrl: nn-0x10001100ab000001:pn-0x20001100ab000001 - nn-0x10001100aa000001:pn-0x20001100aa000001 combination not found
* Re: blktests failures with v6.17-rc1 kernel
2025-08-30 13:15 ` Shinichiro Kawasaki
@ 2025-09-01 8:34 ` Daniel Wagner
2025-09-01 9:02 ` Daniel Wagner
0 siblings, 1 reply; 9+ messages in thread
From: Daniel Wagner @ 2025-09-01 8:34 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
On Sat, Aug 30, 2025 at 01:15:48PM +0000, Shinichiro Kawasaki wrote:
> On Aug 28, 2025 / 13:33, Daniel Wagner wrote:
> > Would you mind giving the attached patch a try? It fixes the problem I
> > was able to reproduce.
>
Thanks for the effort. I applied the attached patch to the v6.17-rc3 kernel and
repeated nvme/061. It avoided the WARN and the refcount_t message. This looks
good.
Glad to hear this!
> However, unfortunately, I observed a different failure symptom: a KASAN
> slab-use-after-free [*]. I'm not sure whether the fix patch unveiled this
> KASAN splat or introduced it. The failure reproduces reliably on my test
> systems, but it takes a few hundred repetitions of nvme/061 to recreate it.
I am not surprised that more bugs are popping up. Maybe this one was
hidden by the previous one. Anyway, let's have a look.
> Aug 29 15:26:06 testnode1 kernel: nvme nvme2: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
Do you happen to know if this is necessary to reproduce? After looking
at it, I don't think it matters.
> Aug 29 15:26:06 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connect complete
> Aug 29 15:26:06 testnode1 kernel: (NULL device *): {0:0} Association deleted
> Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: io failed due to lldd error 6
> Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: transport association event: transport detected io error
> Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: resetting controller
> Aug 29 15:26:07 testnode1 kernel: (NULL device *): {0:0} Association freed
> Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: create association : host wwpn 0x20001100aa000001 rport wwpn 0x20001100ab000001: NQN "blktests-subsystem-1"
> Aug 29 15:26:07 testnode1 kernel: (NULL device *): queue 0 connect admin queue failed (-111).
> Aug 29 15:26:07 testnode1 kernel: nvme nvme2: NVME-FC{0}: controller connectivity lost. Awaiting Reconnect
> Aug 29 15:26:07 testnode1 kernel: ==================================================================
> Aug 29 15:26:07 testnode1 kernel: BUG: KASAN: slab-use-after-free in fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
> Aug 29 15:26:07 testnode1 kernel: Write of size 8 at addr ffff8881145fa700 by task kworker/u16:9/95
> Aug 29 15:26:07 testnode1 kernel:
> Aug 29 15:26:07 testnode1 kernel: CPU: 0 UID: 0 PID: 95 Comm: kworker/u16:9 Not tainted 6.17.0-rc3+ #356 PREEMPT(voluntary)
> Aug 29 15:26:07 testnode1 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-4.fc42 04/01/2014
> Aug 29 15:26:07 testnode1 kernel: Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
> Aug 29 15:26:07 testnode1 kernel: Call Trace:
> Aug 29 15:26:07 testnode1 kernel: <TASK>
> Aug 29 15:26:07 testnode1 kernel: dump_stack_lvl+0x6a/0x90
> Aug 29 15:26:07 testnode1 kernel: ? fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
> Aug 29 15:26:07 testnode1 kernel: print_report+0x170/0x4f3
> Aug 29 15:26:07 testnode1 kernel: ? __virt_addr_valid+0x22e/0x4e0
> Aug 29 15:26:07 testnode1 kernel: ? fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
> Aug 29 15:26:07 testnode1 kernel: kasan_report+0xad/0x170
> Aug 29 15:26:07 testnode1 kernel: ? fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
> Aug 29 15:26:07 testnode1 kernel: fcloop_remoteport_delete+0x150/0x190 [nvme_fcloop]
> Aug 29 15:26:07 testnode1 kernel: nvme_fc_ctlr_inactive_on_rport.isra.0+0x1b1/0x210 [nvme_fc]
> Aug 29 15:26:07 testnode1 kernel: nvme_fc_connect_ctrl_work.cold+0x33f/0x348e [nvme_fc]
> Aug 29 15:26:07 testnode1 kernel: ? lock_acquire+0x170/0x310
> Aug 29 15:26:07 testnode1 kernel: ? __pfx_nvme_fc_connect_ctrl_work+0x10/0x10 [nvme_fc]
> Aug 29 15:26:07 testnode1 kernel: ? lock_acquire+0x180/0x310
> Aug 29 15:26:07 testnode1 kernel: ? process_one_work+0x722/0x14b0
> Aug 29 15:26:07 testnode1 kernel: ? lock_release+0x1ad/0x300
> Aug 29 15:26:07 testnode1 kernel: ? rcu_is_watching+0x11/0xb0
> Aug 29 15:26:07 testnode1 kernel: process_one_work+0x868/0x14b0
> Aug 29 15:26:07 testnode1 kernel: ? __pfx_process_one_work+0x10/0x10
> Aug 29 15:26:07 testnode1 kernel: ? lock_acquire+0x170/0x310
> Aug 29 15:26:07 testnode1 kernel: ? assign_work+0x156/0x390
> Aug 29 15:26:07 testnode1 kernel: worker_thread+0x5ee/0xfd0
> Aug 29 15:26:07 testnode1 kernel: ? __pfx_worker_thread+0x10/0x10
> Aug 29 15:26:07 testnode1 kernel: kthread+0x3af/0x770
> Aug 29 15:26:07 testnode1 kernel: ? lock_acquire+0x180/0x310
> Aug 29 15:26:07 testnode1 kernel: ? __pfx_kthread+0x10/0x10
> Aug 29 15:26:07 testnode1 kernel: ? ret_from_fork+0x1d/0x4e0
> Aug 29 15:26:07 testnode1 kernel: ? lock_release+0x1ad/0x300
> Aug 29 15:26:07 testnode1 kernel: ? rcu_is_watching+0x11/0xb0
> Aug 29 15:26:07 testnode1 kernel: ? __pfx_kthread+0x10/0x10
> Aug 29 15:26:07 testnode1 kernel: ret_from_fork+0x3be/0x4e0
> Aug 29 15:26:07 testnode1 kernel: ? __pfx_kthread+0x10/0x10
> Aug 29 15:26:07 testnode1 kernel: ? __pfx_kthread+0x10/0x10
> Aug 29 15:26:07 testnode1 kernel: ret_from_fork_asm+0x1a/0x30
> Aug 29 15:26:07 testnode1 kernel: </TASK>
> Aug 29 15:26:07 testnode1 kernel:
> Aug 29 15:26:07 testnode1 kernel: Allocated by task 14561:
> Aug 29 15:26:07 testnode1 kernel: kasan_save_stack+0x2c/0x50
> Aug 29 15:26:07 testnode1 kernel: kasan_save_track+0x10/0x30
> Aug 29 15:26:07 testnode1 kernel: __kasan_kmalloc+0x96/0xb0
> Aug 29 15:26:07 testnode1 kernel: fcloop_alloc_nport.isra.0+0xdb/0x910 [nvme_fcloop]
> Aug 29 15:26:07 testnode1 kernel: fcloop_create_target_port+0xa6/0x5a0 [nvme_fcloop]
> Aug 29 15:26:07 testnode1 kernel: kernfs_fop_write_iter+0x39a/0x5a0
> Aug 29 15:26:07 testnode1 kernel: vfs_write+0x523/0xf80
> Aug 29 15:26:07 testnode1 kernel: ksys_write+0xfb/0x200
> Aug 29 15:26:07 testnode1 kernel: do_syscall_64+0x94/0x3d0
> Aug 29 15:26:07 testnode1 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
> Aug 29 15:26:07 testnode1 kernel:
> Aug 29 15:26:07 testnode1 kernel: Freed by task 14126:
> Aug 29 15:26:07 testnode1 kernel: kasan_save_stack+0x2c/0x50
> Aug 29 15:26:07 testnode1 kernel: kasan_save_track+0x10/0x30
> Aug 29 15:26:07 testnode1 kernel: kasan_save_free_info+0x37/0x70
> Aug 29 15:26:07 testnode1 kernel: __kasan_slab_free+0x5f/0x70
> Aug 29 15:26:07 testnode1 kernel: kfree+0x13a/0x4c0
> Aug 29 15:26:07 testnode1 kernel: fcloop_delete_remote_port+0x238/0x390 [nvme_fcloop]
> Aug 29 15:26:07 testnode1 kernel: kernfs_fop_write_iter+0x39a/0x5a0
> Aug 29 15:26:07 testnode1 kernel: vfs_write+0x523/0xf80
> Aug 29 15:26:07 testnode1 kernel: ksys_write+0xfb/0x200
> Aug 29 15:26:07 testnode1 kernel: do_syscall_64+0x94/0x3d0
> Aug 29 15:26:07 testnode1 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
> Aug 29 15:26:07 testnode1 kernel:
> Aug 29 15:26:07 testnode1 kernel: The buggy address belongs to the object at ffff8881145fa700
> which belongs to the cache kmalloc-96 of size 96
> Aug 29 15:26:07 testnode1 kernel: The buggy address is located 0 bytes inside of
> freed 96-byte region [ffff8881145fa700, ffff8881145fa760)
> Aug 29 15:26:07 testnode1 kernel:
> Aug 29 15:26:07 testnode1 kernel: The buggy address belongs to the physical page:
> Aug 29 15:26:07 testnode1 kernel: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1145fa
> Aug 29 15:26:07 testnode1 kernel: flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
> Aug 29 15:26:07 testnode1 kernel: page_type: f5(slab)
> Aug 29 15:26:07 testnode1 kernel: raw: 0017ffffc0000000 ffff888100042280 ffffea0004792400 dead000000000002
> Aug 29 15:26:07 testnode1 kernel: raw: 0000000000000000 0000000080200020 00000000f5000000 0000000000000000
> Aug 29 15:26:07 testnode1 kernel: page dumped because: kasan: bad access detected
> Aug 29 15:26:07 testnode1 kernel:
> Aug 29 15:26:07 testnode1 kernel: Memory state around the buggy address:
> Aug 29 15:26:07 testnode1 kernel: ffff8881145fa600: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
> Aug 29 15:26:07 testnode1 kernel: ffff8881145fa680: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
> Aug 29 15:26:07 testnode1 kernel: >ffff8881145fa700: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
> Aug 29 15:26:07 testnode1 kernel: ^
> Aug 29 15:26:07 testnode1 kernel: ffff8881145fa780: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
> Aug 29 15:26:07 testnode1 kernel: ffff8881145fa800: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
> Aug 29 15:26:07 testnode1 kernel: ==================================================================
The test removes the ports while the host driver is about to
reconnect, and the driver then accesses a stale pointer.
nvme_fc_create_association is calling nvme_fc_ctlr_inactive_on_rport in
the error path. The problem is that nvme_fc_create_association gets halfway
through the setup and then fails. In the cleanup path
dev_warn(ctrl->ctrl.device,
"NVME-FC{%d}: create_assoc failed, assoc_id %llx ret %d\n",
ctrl->cnum, ctrl->association_id, ret);
is issued and then nvme_fc_ctlr_inactive_on_rport is called. And there
is the log message above, so it's clear the error path is taken.
But the thing is, fcloop is not supposed to remove the ports while the
host driver is still using them. So there is a race window where fcloop
can remove the ports while the host is inside nvme_fc_create_association.
The race window is between entering nvme_fc_create_association and
nvme_fc_ctlr_active_on_rport marking the rport active.
* Re: blktests failures with v6.17-rc1 kernel
2025-09-01 8:34 ` Daniel Wagner
@ 2025-09-01 9:02 ` Daniel Wagner
2025-09-02 6:00 ` Shinichiro Kawasaki
0 siblings, 1 reply; 9+ messages in thread
From: Daniel Wagner @ 2025-09-01 9:02 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
On Mon, Sep 01, 2025 at 10:34:23AM +0200, Daniel Wagner wrote:
> The test removes the ports while the host driver is about to
> reconnect, and the driver then accesses a stale pointer.
>
> nvme_fc_create_association is calling nvme_fc_ctlr_inactive_on_rport in
> the error path. The problem is that nvme_fc_create_association gets halfway
> through the setup and then fails. In the cleanup path
>
> dev_warn(ctrl->ctrl.device,
> "NVME-FC{%d}: create_assoc failed, assoc_id %llx ret %d\n",
> ctrl->cnum, ctrl->association_id, ret);
>
> is issued and then nvme_fc_ctlr_inactive_on_rport is called. And there
> is the log message above, so it's clear the error path is taken.
>
> But the thing is, fcloop is not supposed to remove the ports while the
> host driver is still using them. So there is a race window where fcloop
> can remove the ports while the host is inside nvme_fc_create_association.
>
> The race window is between entering nvme_fc_create_association and
> nvme_fc_ctlr_active_on_rport marking the rport active.
I think the problem is that nvme_fc_create_association is not holding
the rport locks when checking the port_state and marking the rport
active. This races with nvme_fc_unregister_remoteport.
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 3e12d4683ac7..03987f497a5b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3032,11 +3032,17 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
++ctrl->ctrl.nr_reconnects;
- if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
+ spin_lock_irqsave(&ctrl->rport->lock, flags);
+ if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE) {
+ spin_unlock_irqrestore(&ctrl->rport->lock, flags);
return -ENODEV;
+ }
- if (nvme_fc_ctlr_active_on_rport(ctrl))
+ if (nvme_fc_ctlr_active_on_rport(ctrl)) {
+ spin_unlock_irqrestore(&ctrl->rport->lock, flags);
return -ENOTUNIQ;
+ }
+ spin_unlock_irqrestore(&ctrl->rport->lock, flags);
dev_info(ctrl->ctrl.device,
"NVME-FC{%d}: create association : host wwpn 0x%016llx "
I'll try to reproduce it and see if this patch makes a difference.
* Re: blktests failures with v6.17-rc1 kernel
2025-09-01 9:02 ` Daniel Wagner
@ 2025-09-02 6:00 ` Shinichiro Kawasaki
2025-09-02 7:44 ` Daniel Wagner
0 siblings, 1 reply; 9+ messages in thread
From: Shinichiro Kawasaki @ 2025-09-02 6:00 UTC (permalink / raw)
To: Daniel Wagner
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
On Sep 01, 2025 / 11:02, Daniel Wagner wrote:
> On Mon, Sep 01, 2025 at 10:34:23AM +0200, Daniel Wagner wrote:
> > The test removes the ports while the host driver is about to
> > reconnect, and the driver then accesses a stale pointer.
> >
> > nvme_fc_create_association is calling nvme_fc_ctlr_inactive_on_rport in
> > the error path. The problem is that nvme_fc_create_association gets half
> > through the setup and then fails. In the cleanup path
> >
> > dev_warn(ctrl->ctrl.device,
> > "NVME-FC{%d}: create_assoc failed, assoc_id %llx ret %d\n",
> > ctrl->cnum, ctrl->association_id, ret);
> >
> > is issued and then nvme_fc_ctlr_inactive_on_rport is called. And there
> > is the log message above, so it's clear the error path is taken.
> >
> > But the thing is, fcloop is not supposed to remove the ports while the
> > host driver is still using them. So there is a race window where fcloop
> > can remove the ports while the host is inside nvme_fc_create_association.
> >
> > The race window is between entering nvme_fc_create_association and
> > nvme_fc_ctlr_active_on_rport marking the rport active.
>
> I think the problem is that nvme_fc_create_association is not holding
> the rport locks when checking the port_state and marking the rport
> active. This races with nvme_fc_unregister_remoteport.
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 3e12d4683ac7..03987f497a5b 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -3032,11 +3032,17 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>
> ++ctrl->ctrl.nr_reconnects;
>
> - if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
> + spin_lock_irqsave(&ctrl->rport->lock, flags);
> + if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE) {
> + spin_unlock_irqrestore(&ctrl->rport->lock, flags);
> return -ENODEV;
> + }
>
> - if (nvme_fc_ctlr_active_on_rport(ctrl))
> + if (nvme_fc_ctlr_active_on_rport(ctrl)) {
> + spin_unlock_irqrestore(&ctrl->rport->lock, flags);
> return -ENOTUNIQ;
> + }
> + spin_unlock_irqrestore(&ctrl->rport->lock, flags);
>
> dev_info(ctrl->ctrl.device,
> "NVME-FC{%d}: create association : host wwpn 0x%016llx "
>
> I'll try to reproduce it and see if this patch makes a difference.
I applied the fix patch above together with the previous fix patch on top of
v6.17-rc3, then repeated nvme/061 with the fc transport hundreds of times. I
did not observe the KASAN slab-use-after-free. The fix patch appears to work
well. Thanks!
* Re: blktests failures with v6.17-rc1 kernel
2025-09-02 6:00 ` Shinichiro Kawasaki
@ 2025-09-02 7:44 ` Daniel Wagner
0 siblings, 0 replies; 9+ messages in thread
From: Daniel Wagner @ 2025-09-02 7:44 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, nbd@other.debian.org,
linux-rdma@vger.kernel.org
Hi Shinichiro,
On Tue, Sep 02, 2025 at 06:00:17AM +0000, Shinichiro Kawasaki wrote:
> > I'll try to reproduce it and see if this patch makes a difference.
>
> I applied the fix patch above together with the previous fix patch on top of
> v6.17-rc3, then repeated nvme/061 with the fc transport hundreds of times. I
> did not observe the KASAN slab-use-after-free. The fix patch appears to work
> well. Thanks!
Thanks for testing. I was not able to trigger the bug myself. Let me clean up
the patch and post it.
Thanks,
Daniel
Thread overview: 9+ messages
2025-08-13 10:50 blktests failures with v6.17-rc1 kernel Shinichiro Kawasaki
2025-08-27 10:10 ` Daniel Wagner
2025-08-28 5:55 ` Shinichiro Kawasaki
2025-08-28 11:33 ` Daniel Wagner
2025-08-30 13:15 ` Shinichiro Kawasaki
2025-09-01 8:34 ` Daniel Wagner
2025-09-01 9:02 ` Daniel Wagner
2025-09-02 6:00 ` Shinichiro Kawasaki
2025-09-02 7:44 ` Daniel Wagner