* Re: nvme-6.3: blktests nvme/003 null pointer for fc transport
2023-02-01 8:10 ` Daniel Wagner
@ 2023-02-01 8:24 ` Hannes Reinecke
2023-02-20 16:04 ` Daniel Wagner
1 sibling, 0 replies; 5+ messages in thread
From: Hannes Reinecke @ 2023-02-01 8:24 UTC (permalink / raw)
To: Daniel Wagner, James Smart
Cc: linux-nvme@lists.infradead.org, Shinichiro Kawasaki
On 2/1/23 09:10, Daniel Wagner wrote:
> On Tue, Jan 31, 2023 at 03:27:20PM -0800, James Smart wrote:
>> On 1/30/2023 3:15 AM, Daniel Wagner wrote:
>>> Hi,
>>>
>>> Hannes added support to blktests to enable the FC transport, and as a result
>>> blktests is able to crash the kernel. Haven't started to look into it yet.
>>>
>>> run blktests nvme/003 at 2023-01-30 12:08:28
>>> nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>>> BUG: kernel NULL pointer dereference, address: 0000000000000010
>>> #PF: supervisor read access in kernel mode
>>> #PF: error_code(0x0000) - not-present page
>>> PGD 0 P4D 0
>>> Oops: 0000 [#1] PREEMPT SMP NOPTI
>>> CPU: 3 PID: 14200 Comm: nvme Not tainted 6.2.0-rc4+ #1 c9afca57e89c2aa3234427c2cd6cd4dd2984ae8c
>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>>> RIP: 0010:nvme_alloc_admin_tag_set+0x38/0x120 [nvme_core]
>>
>> bingo...
>>
>> The cleanup patch "nvme-fc: use the tagset alloc/free helpers" introduced
>> it.
>>
>> See the fix patch "nvme-fc: Fix initialization order" about 10 days ago that
>> corrects it.
>
> Good call! This problem is gone. Now I am running into the next problem:
>
>
> run blktests nvme/003 at 2023-02-01 09:01:58
> nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> debugfs: Directory 'nvme0' with parent '/' already present!
> nvme0: failed to create debugfs attr
> nvme nvme0: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
> (NULL device *): {0:0} Association created
> nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:2cd2c43b-a90a-45c1-a8cd-86b33ab273b5.
> nvme nvme0: NVME-FC{0}: controller connect complete
> nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> nvme nvme1: NVME-FC{1}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
> (NULL device *): {0:1} Association created
> nvmet: creating nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:2cd2c43b-a90a-45c1-a8cd-86b33ab273b5.
> nvme nvme1: NVME-FC{1}: controller connect complete
> nvme nvme1: NVME-FC{1}: new ctrl: NQN "blktests-subsystem-1"
> nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> nvme nvme1: rescanning namespaces.
> (NULL device *): {0:0} Association deleted
> (NULL device *): {0:0} Association freed
> (NULL device *): Disconnect LS failed: No Association
> nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
> nvme nvme1: NVME-FC{1}: transport association event: transport detected io error
> nvme nvme1: NVME-FC{1}: resetting controller
> debugfs: Directory 'nvme0' with parent '/' already present!
> nvme0: failed to create debugfs attr
> nvme nvme0: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
> (NULL device *): {0:0} Association created
> nvmet: connect request for invalid subsystem nqn.2014-08.org.nvmexpress.discovery!
> nvme nvme0: Connect Invalid Data Parameter, subsysnqn "nqn.2014-08.org.nvmexpress.discovery"
> nvme nvme0: NVME-FC{0}: reset: Reconnect attempt failed (16770)
> nvme nvme0: NVME-FC{0}: reconnect failure
> nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> nvme nvme1: NVME-FC{1}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
> (NULL device *): {0:2} Association created
> nvmet: connect request for invalid subsystem blktests-subsystem-1!
> nvme nvme1: Connect Invalid Data Parameter, subsysnqn "blktests-subsystem-1"
> nvme nvme1: NVME-FC{1}: reset: Reconnect attempt failed (16770)
> nvme nvme1: NVME-FC{1}: reconnect failure
> nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> (NULL device *): {0:1} Association deleted
> (NULL device *): {0:1} Association freed
> (NULL device *): Disconnect LS failed: No Association
> (NULL device *): {0:0} Association deleted
> (NULL device *): {0:0} Association freed
> (NULL device *): Disconnect LS failed: No Association
> (NULL device *): {0:2} Association deleted
> (NULL device *): {0:2} Association freed
> ===========================================
> WARNING: possible recursive locking detected
> 6.2.0-rc4+ #1 Not tainted
> --------------------------------------------
> kworker/3:13/21736 is trying to acquire lock:
> ffff93fe60e251e8 ((work_completion)(&tport->ls_work)){+.+.}-{0:0}, at: __flush_work+0x38/0xc0
>
> but task is already holding lock:
> ffffa693c1a4fe68 ((work_completion)(&tport->ls_work)){+.+.}-{0:0}, at: process_one_work+0x236/0x410
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock((work_completion)(&tport->ls_work));
> lock((work_completion)(&tport->ls_work));
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by kworker/3:13/21736:
> #0: ffff93fe9882ff48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x210/0x410
> #1: ffffa693c1a4fe68 ((work_completion)(&tport->ls_work)){+.+.}-{0:0}, at: process_one_work+0x236/0x410
>
> stack backtrace:
> CPU: 3 PID: 21736 Comm: kworker/3:13 Not tainted 6.2.0-rc4+ #1 c9afca57e89c2aa3234427c2cd6cd4dd2984ae8c
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> Workqueue: nvmet-wq fcloop_tport_lsrqst_work [nvme_fcloop]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5a/0x80
> validate_chain+0x1e02/0x2730
> ? lockdep_unlock+0x51/0xc0
> ? lockdep_unlock+0x51/0xc0
> ? validate_chain+0x7c8/0x2730
> ? validate_chain+0x7c8/0x2730
> ? lock_release+0xf7/0x400
> ? __lock_acquire+0x8f7/0xbc0
> ? lock_is_held_type+0x8b/0x110
> __lock_acquire+0x8f7/0xbc0
> ? lock_is_held_type+0x8b/0x110
> lock_acquire+0xb0/0x180
> ? __flush_work+0x38/0xc0
> ? lock_is_held_type+0x8b/0x110
> __flush_work+0x57/0xc0
> ? __flush_work+0x38/0xc0
> ? lockdep_hardirqs_on_prepare+0x16e/0x220
> ? _raw_spin_unlock_irqrestore+0x32/0x50
> ? __free_pages_ok+0x289/0x4a0
> fcloop_targetport_delete+0x15/0xa0 [nvme_fcloop 2f0f07bf6746e7b4ec6dff8d226e919dac60e0f1]
> nvmet_fc_tgtport_put+0x132/0x190 [nvmet_fc cec46c808c67bf2a4fefe925da2828f0b3f2334a]
> nvmet_fc_disconnect_assoc_done+0x7c/0xa0 [nvmet_fc cec46c808c67bf2a4fefe925da2828f0b3f2334a]
> fcloop_tport_lsrqst_work+0x6c/0xa0 [nvme_fcloop 2f0f07bf6746e7b4ec6dff8d226e919dac60e0f1]
> process_one_work+0x26a/0x410
> worker_thread+0x21d/0x490
> ? rcu_lock_release+0x20/0x20
> kthread+0xe1/0x100
> ? kthread_blkcg+0x30/0x30
> ret_from_fork+0x1f/0x30
> </TASK>
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port
>
>
> The last two messages never stop.
And this is why I wanted to have these tests :-)
In my original test I had this:
+ _remove_nvmet_subsystem_from_port "${port1}" "${subsys}"
+ _remove_nvmet_subsystem_from_port "${port2}" "${subsys}"
+ _remove_nvmet_subsystem "${subsys}"
+ _remove_nvmet_anagroup "${port1}" "${ag1}"
+ _remove_nvmet_port "${port1}"
+ _remove_nvmet_anagroup "${port2}" "${ag2}"
+ _remove_nvmet_port "${port2}"
+
+ losetup -d "$loop_dev"
+ rm "$TMPDIR/img"
+
+ # Fixme: need to wait for RCU grace period
+ sleep 5
+
+ _nvme_fcloop_del_rport "${host_wwnn1}" "${host_wwpn1}" \
+ "${remote_wwnn1}" "${remote_wwpn1}"
The problem is a subtle workqueue inversion with fcloop. I'll have to dig
out the details here.
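For illustration, here is a minimal sketch of the pattern the lockdep splat
above is flagging (all names are made up, this is not the fcloop code): a work
item running on a workqueue ends up calling flush_work() on a work item of the
same lockdep class, in the worst case itself. That matches the trace, where
fcloop_tport_lsrqst_work (the handler for tport->ls_work) drops the last
tgtport reference and fcloop_targetport_delete() then flushes tport->ls_work.

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;
static struct work_struct demo_ls_work;

/*
 * Runs on demo_wq. Calling flush_work() on a work item of the same
 * lockdep class as the one currently executing (here: the very same
 * work) makes lockdep report "possible recursive locking detected"
 * on (work_completion)(&demo_ls_work), just like the splat above.
 */
static void demo_ls_work_fn(struct work_struct *work)
{
	flush_work(&demo_ls_work);	/* triggers the lockdep report */
}

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo-wq", 0, 0);
	if (!demo_wq)
		return -ENOMEM;

	INIT_WORK(&demo_ls_work, demo_ls_work_fn);
	queue_work(demo_wq, &demo_ls_work);
	return 0;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

Loading such a module reproduces the same (work_completion) recursion report,
so the fcloop cleanup path needs to get the flush out of the put/delete chain.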
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
^ permalink raw reply [flat|nested] 5+ messages in thread

* Re: nvme-6.3: blktests nvme/003 null pointer for fc transport
2023-02-01 8:10 ` Daniel Wagner
2023-02-01 8:24 ` Hannes Reinecke
@ 2023-02-20 16:04 ` Daniel Wagner
1 sibling, 0 replies; 5+ messages in thread
From: Daniel Wagner @ 2023-02-20 16:04 UTC (permalink / raw)
To: James Smart
Cc: linux-nvme@lists.infradead.org, Shinichiro Kawasaki,
Hannes Reinecke, Ming Lei
On Wed, Feb 01, 2023 at 09:10:03AM +0100, Daniel Wagner wrote:
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port
> nvme_fcloop: fcloop_exit: Failed deleting target port
> nvme_fcloop: fcloop_exit: Failed deleting remote port
I've retested with v6.2.0 with
--- a/drivers/nvme/target/fcloop.c
+++ b/drivers/nvme/target/fcloop.c
@@ -1607,6 +1607,11 @@ static void __exit fcloop_exit(void)
tport = __unlink_target_port(nport);
rport = __unlink_remote_port(nport);
+ if (!tport && !rport) {
+ pr_warn("%s: target port and remote port have already been freed\n", __func__);
+ break;
+ }
+
spin_unlock_irqrestore(&fcloop_lock, flags);
ret = __targetport_unreg(nport, tport);
but I was not able to trigger it anymore. My guess is
0cab4404874f ("nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association")
fixed the problem.
Though I ran into a lockdep warning:
================================
WARNING: inconsistent lock state
6.2.0+ #4 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
ksoftirqd/1/23 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff88817e63a160 (&fq->mq_flush_lock){?.-.}-{2:2}, at: flush_end_io+0x69/0x460
{IN-HARDIRQ-W} state was registered at:
lock_acquire+0x128/0x270
_raw_spin_lock_irqsave+0x55/0x90
flush_end_io+0x69/0x460
__blk_mq_end_request+0x16c/0x190
virtblk_done+0x12e/0x220 [virtio_blk]
vring_interrupt+0xc0/0x140
__handle_irq_event_percpu+0x10c/0x230
handle_irq_event+0x4e/0xb0
handle_edge_irq+0x14a/0x3f0
__common_interrupt+0x8a/0x120
common_interrupt+0x7b/0x90
asm_common_interrupt+0x22/0x40
native_safe_halt+0xb/0x10
default_idle+0xa/0x10
default_idle_call+0x70/0xc0
do_idle+0x185/0x450
cpu_startup_entry+0x25/0x30
start_secondary+0xbd/0xc0
secondary_startup_64_no_verify+0xe6/0xeb
irq event stamp: 781843
hardirqs last enabled at (781842): [<ffffffff8e69f6a1>] seqcount_lockdep_reader_access+0x91/0xa0
hardirqs last disabled at (781843): [<ffffffff8f99ca0c>] _raw_spin_lock_irqsave+0x2c/0x90
softirqs last enabled at (781834): [<ffffffff8e52b143>] run_ksoftirqd+0x33/0x70
softirqs last disabled at (781839): [<ffffffff8e52b143>] run_ksoftirqd+0x33/0x70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&fq->mq_flush_lock);
<Interrupt>
lock(&fq->mq_flush_lock);
*** DEADLOCK ***
1 lock held by ksoftirqd/1/23:
#0: ffff88817e63a160 (&fq->mq_flush_lock){?.-.}-{2:2}, at: flush_end_io+0x69/0x460
stack backtrace:
CPU: 1 PID: 23 Comm: ksoftirqd/1 Kdump: loaded Not tainted 6.2.0+ #4 7ac3eece8c11633fa2815efb5a20b1c5b55011f5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x80
print_usage_bug+0x6db/0x6f0
? print_lockdep_off+0x30/0x30
? pvclock_clocksource_read+0xec/0x180
? kvm_sched_clock_read+0x14/0x40
? sched_clock_cpu+0x6f/0x1f0
? rcu_read_lock_sched_held+0x34/0x70
? trace_irq_disable+0x47/0x110
? lockdep_hardirqs_on_prepare+0x17e/0x2e0
? mod_find+0x100/0x120
? nvmet_bio_done+0x37/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
? nvmet_bio_done+0x37/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
? __module_address+0xe1/0x100
? ftrace_graph_ret_addr+0x1d/0xa0
? orc_find+0x164/0x4f0
? pvclock_clocksource_read+0xec/0x180
? __update_load_avg_cfs_rq+0x34a/0x500
? kvm_sched_clock_read+0x14/0x40
? pvclock_clocksource_read+0xec/0x180
? kvm_sched_clock_read+0x14/0x40
? sched_clock_cpu+0x6f/0x1f0
? pvclock_clocksource_read+0xec/0x180
? mod_find+0x100/0x120
? nvmet_bio_done+0x37/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
? nvmet_bio_done+0x37/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
? __module_address+0xe1/0x100
? ftrace_graph_ret_addr+0x1d/0xa0
? orc_find+0x164/0x4f0
? unwind_next_frame+0xd55/0xd90
? ret_from_fork+0x1f/0x30
? stack_trace_save+0x130/0x130
mark_lock_irq+0x7d0/0x7e0
? save_trace+0x4c0/0x4c0
? stack_trace_save+0xe0/0x130
? stack_trace_snprint+0xe0/0xe0
? lock_is_held_type+0x8b/0x110
? fcloop_fcp_op+0x92/0x310 [nvme_fcloop 103a43b51b136ed1b9f0be7fbf6a4787d9fa52e2]
? save_trace+0x447/0x4c0
mark_lock+0x19c/0x220
lockdep_hardirqs_on_prepare+0x169/0x2e0
trace_hardirqs_on+0x34/0x40
_raw_spin_unlock_irq+0x24/0x40
fcloop_fcp_op+0x92/0x310 [nvme_fcloop 103a43b51b136ed1b9f0be7fbf6a4787d9fa52e2]
? fcloop_h2t_xmt_ls_rsp+0x150/0x150 [nvme_fcloop 103a43b51b136ed1b9f0be7fbf6a4787d9fa52e2]
nvmet_fc_fcp_nvme_cmd_done+0x325/0x480 [nvmet_fc 14b6afa92b89f97b59495a77b43c19ce2c6642ff]
__nvmet_req_complete+0x20e/0x270 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
? nvmet_req_complete+0x30/0x30 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
nvmet_req_complete+0xe/0x30 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
nvmet_bio_done+0x38/0x180 [nvmet b6cfd0efddc3f6e687219ea59b5a92b951bb0ddd]
blk_update_request+0x343/0x6a0
blk_mq_end_request+0x22/0x40
blk_flush_complete_seq+0x22b/0x530
flush_end_io+0x424/0x460
__blk_mq_end_request+0x16c/0x190
blk_done_softirq+0x95/0xb0
__do_softirq+0x14b/0x369
? run_ksoftirqd+0x33/0x70
? ksoftirqd_should_run+0x20/0x20
run_ksoftirqd+0x33/0x70
smpboot_thread_fn+0x294/0x390
? cpu_report_death+0x170/0x170
kthread+0x16d/0x190
? kthread_blkcg+0x60/0x60
ret_from_fork+0x1f/0x30
</TASK>
It looks similar to the reports/fixes from here:
b3c6a5997541 ("block: Fix a lockdep complaint triggered by request queue flushing")
fb01a2932e81 ("blk-mq: add new API of blk_mq_hctx_set_fq_lock_class")
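For comparison, this is roughly the pattern nvme-loop uses with the API from
fb01a2932e81 to give each hw queue's flush queue its own lockdep class (sketch
only, names are made up; whether the fcloop/nvmet-fc path wants the same
treatment is an open question):

#include <linux/blk-mq.h>

static struct lock_class_key demo_hctx_fq_lock_key;

static int demo_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
			  unsigned int hctx_idx)
{
	/*
	 * Put this hctx's flush queue mq_flush_lock into its own
	 * lockdep class, so the end_io recursion into the stacked
	 * device's flush path is not reported as a self-deadlock.
	 */
	blk_mq_hctx_set_fq_lock_class(hctx, &demo_hctx_fq_lock_key);
	return 0;
}

The function would be wired up as .init_hctx in the stacked driver's
blk_mq_ops, which is how nvme-loop applies it.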
Daniel
^ permalink raw reply [flat|nested] 5+ messages in thread