linux-rdma.vger.kernel.org archive mirror
* workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
@ 2024-12-13  9:40 Honggang LI
  2024-12-13 12:01 ` Bernard Metzler
  2024-12-13 18:55 ` Zhu Yanjun
  0 siblings, 2 replies; 7+ messages in thread
From: Honggang LI @ 2024-12-13  9:40 UTC (permalink / raw)
  To: linux-nvme, linux-rdma

It is 100% reproducible. The NVMEoRDMA client side is running RXE.
To reproduce it, have the client side repeatedly connect to and
disconnect from the NVMEoRDMA target.
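The exact reproducer script was not included in the report; a minimal sketch of such a connect/disconnect loop might look like the following. The subsystem NQN, target address, and iteration count are hypothetical placeholders, not values from this report, and by default the sketch only prints the nvme-cli commands instead of running them:

```python
# Hypothetical reproducer sketch for the connect/disconnect loop described
# above. NQN, ADDR, and the iteration count are placeholders. With
# DRY_RUN = True (the default) the commands are collected and printed;
# set DRY_RUN = False to actually run them (requires nvme-cli and a
# reachable NVMe-oF/RDMA target).
import subprocess

NQN = "nqn.2024-12.example:testsubsys"  # hypothetical subsystem NQN
ADDR = "192.168.1.10"                   # hypothetical target IP
DRY_RUN = True

def run(*cmd):
    """Print the command in dry-run mode, otherwise execute it."""
    line = " ".join(cmd)
    if DRY_RUN:
        print(line)
    else:
        subprocess.run(cmd, check=True)
    return line

commands = []
for _ in range(100):
    commands.append(run("nvme", "connect", "-t", "rdma",
                        "-a", ADDR, "-s", "4420", "-n", NQN))
    commands.append(run("nvme", "disconnect", "-n", NQN))
```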

[ 685.757357] ------------[ cut here ]------------
[ 685.758725] workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
[ 685.758809] WARNING: CPU: 16 PID: 1897 at kernel/workqueue.c:2966 check_flush_dependency+0x11f/0x140
[ 685.762880] Modules linked in: nvmet_rdma nvmet nvme_keyring tcm_loop target_core_user uio target_core_pscsi target_core_file target_core_iblock rpcrdma qrtr rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod rfkill ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp sunrpc kvm_intel kvm irqbypass binfmt_misc rapl intel_cstate irdma ipmi_ssif i40e iTCO_wdt intel_pmc_bxt iTCO_vendor_support ib_uverbs acpi_ipmi intel_uncore joydev ipmi_si pcspkr mxm_wmi ib_core mei_me ipmi_devintf i2c_i801 mei i2c_smbus lpc_ich ioatdma ipmi_msghandler loop dm_multipath nfnetlink zram ice crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme isci nvme_core ghash_clmulni_intel sha512_ssse3 igb sha256_ssse3 libsas sha1_ssse3 nvme_auth mgag200 scsi_transport_sas dca gnss i2c_algo_bit wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
[ 685.773891] CPU: 16 PID: 1897 Comm: kworker/16:2 Kdump: loaded Tainted: G S 6.8.4-300.patched.fc40.x86_64 #1
[ 685.775267] Hardware name: Sugon I620-G10/X9DR3-F, BIOS 3.0b 07/22/2014
[ 685.776627] Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
[ 685.777993] RIP: 0010:check_flush_dependency+0x11f/0x140
[ 685.779331] Code: 8b 45 18 48 8d b2 b0 00 00 00 49 89 e8 48 8d 8b b0 00 00 00 48 c7 c7 28 fe b1 b2 c6 05 4f 97 59 02 01 48 89 c2 e8 a1 91 fd ff <0f> 0b e9 fc fe ff ff 80 3d 3a 97 59 02 00 75 93 e9 2a ff ff ff 66
[ 685.782050] RSP: 0018:ffffb31348793cc8 EFLAGS: 00010082
[ 685.783398] RAX: 0000000000000000 RBX: ffff96c705754800 RCX: 0000000000000027
[ 685.784744] RDX: ffff96ce5fca18c8 RSI: 0000000000000001 RDI: ffff96ce5fca18c0
[ 685.786077] RBP: ffffffffc0d217f0 R08: 0000000000000000 R09: ffffb31348793b38
[ 685.787390] R10: ffffffffb3516808 R11: 0000000000000003 R12: ffff96c70d2aa8c0
[ 685.788688] R13: ffff96c7043c6a80 R14: 0000000000000001 R15: ffff96c704147400
[ 685.789970] FS: 0000000000000000(0000) GS:ffff96ce5fc80000(0000) knlGS:0000000000000000
[ 685.791239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 685.792495] CR2: 00007f8207151000 CR3: 0000000d15422006 CR4: 00000000001706f0
[ 685.793745] Call Trace:
[ 685.794973] <TASK>
[ 685.796179] ? check_flush_dependency+0x11f/0x140
[ 685.797382] ? __warn+0x81/0x130
[ 685.798563] ? check_flush_dependency+0x11f/0x140
[ 685.799732] ? report_bug+0x16f/0x1a0
[ 685.800882] ? handle_bug+0x3c/0x80
[ 685.802003] ? exc_invalid_op+0x17/0x70
[ 685.803107] ? asm_exc_invalid_op+0x1a/0x20
[ 685.804200] ? __pfx_irdma_flush_worker+0x10/0x10 [irdma]
[ 685.805315] ? check_flush_dependency+0x11f/0x140
[ 685.806373] ? check_flush_dependency+0x11f/0x140
[ 685.807407] __flush_work.isra.0+0x10d/0x290
[ 685.808420] __cancel_work_timer+0x103/0x1a0
[ 685.809418] irdma_destroy_qp+0xd4/0x180 [irdma]
[ 685.810437] ib_destroy_qp_user+0x93/0x1a0 [ib_core]
[ 685.811474] nvmet_rdma_free_queue+0x35/0xc0 [nvmet_rdma]
[ 685.812437] nvmet_rdma_release_queue_work+0x1d/0x50 [nvmet_rdma]
[ 685.813385] process_one_work+0x170/0x330
[ 685.814300] worker_thread+0x280/0x3d0
[ 685.815201] ? __pfx_worker_thread+0x10/0x10
[ 685.816090] kthread+0xe8/0x120
[ 685.816956] ? __pfx_kthread+0x10/0x10
[ 685.817801] ret_from_fork+0x34/0x50
[ 685.818633] ? __pfx_kthread+0x10/0x10
[ 685.819439] ret_from_fork_asm+0x1b/0x30
[ 685.820232] </TASK>
[ 685.820994] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 685.821749] CPU: 16 PID: 1897 Comm: kworker/16:2 Kdump: loaded Tainted: G S 6.8.4-300.patched.fc40.x86_64 #1
[ 685.822513] Hardware name: Sugon I620-G10/X9DR3-F, BIOS 3.0b 07/22/2014
[ 685.823259] Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
[ 685.824002] Call Trace:
[ 685.824706] <TASK>
[ 685.825386] dump_stack_lvl+0x4d/0x70
[ 685.826060] panic+0x33e/0x370
[ 685.826724] ? check_flush_dependency+0x11f/0x140
[ 685.827383] check_panic_on_warn+0x44/0x60
[ 685.828021] __warn+0x8d/0x130
[ 685.828629] ? check_flush_dependency+0x11f/0x140
[ 685.829229] report_bug+0x16f/0x1a0
[ 685.829819] handle_bug+0x3c/0x80
[ 685.830396] exc_invalid_op+0x17/0x70
[ 685.830972] asm_exc_invalid_op+0x1a/0x20
[ 685.831548] RIP: 0010:check_flush_dependency+0x11f/0x140
[ 685.832129] Code: 8b 45 18 48 8d b2 b0 00 00 00 49 89 e8 48 8d 8b b0 00 00 00 48 c7 c7 28 fe b1 b2 c6 05 4f 97 59 02 01 48 89 c2 e8 a1 91 fd ff <0f> 0b e9 fc fe ff ff 80 3d 3a 97 59 02 00 75 93 e9 2a ff ff ff 66
[ 685.833341] RSP: 0018:ffffb31348793cc8 EFLAGS: 00010082
[ 685.833954] RAX: 0000000000000000 RBX: ffff96c705754800 RCX: 0000000000000027
[ 685.834569] RDX: ffff96ce5fca18c8 RSI: 0000000000000001 RDI: ffff96ce5fca18c0
[ 685.835196] RBP: ffffffffc0d217f0 R08: 0000000000000000 R09: ffffb31348793b38
[ 685.835823] R10: ffffffffb3516808 R11: 0000000000000003 R12: ffff96c70d2aa8c0
[ 685.836450] R13: ffff96c7043c6a80 R14: 0000000000000001 R15: ffff96c704147400
[ 685.837079] ? __pfx_irdma_flush_worker+0x10/0x10 [irdma]
[ 685.837755] ? check_flush_dependency+0x11f/0x140
[ 685.838394] __flush_work.isra.0+0x10d/0x290
[ 685.839037] __cancel_work_timer+0x103/0x1a0
[ 685.839679] irdma_destroy_qp+0xd4/0x180 [irdma]
[ 685.840354] ib_destroy_qp_user+0x93/0x1a0 [ib_core]
[ 685.841049] nvmet_rdma_free_queue+0x35/0xc0 [nvmet_rdma]
[ 685.841707] nvmet_rdma_release_queue_work+0x1d/0x50 [nvmet_rdma]
[ 685.842367] process_one_work+0x170/0x330
[ 685.843020] worker_thread+0x280/0x3d0
[ 685.843670] ? __pfx_worker_thread+0x10/0x10
[ 685.844316] kthread+0xe8/0x120
[ 685.844955] ? __pfx_kthread+0x10/0x10
[ 685.845590] ret_from_fork+0x34/0x50
[ 685.846223] ? __pfx_kthread+0x10/0x10
[ 685.846853] ret_from_fork_asm+0x1b/0x30
[ 685.847485] </TASK> 
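Context not spelled out in the trace: check_flush_dependency() warns because a work item on a WQ_MEM_RECLAIM workqueue (nvmet-wq) is waiting on work from a workqueue that lacks the flag (irdma-cleanup-wq). A reclaim-capable queue must guarantee forward progress under memory pressure, so waiting on a queue without that guarantee can deadlock. A rough userspace model of the rule (illustrative only; the flag value and structures are simplified, this is not kernel code):

```python
# Simplified model of the rule enforced by kernel/workqueue.c's
# check_flush_dependency(): a WQ_MEM_RECLAIM workqueue has a rescuer
# thread and must make forward progress under memory pressure, so it
# must never flush (wait on) a workqueue that lacks the flag.
WQ_MEM_RECLAIM = 1 << 3  # illustrative flag value


class Workqueue:
    def __init__(self, name, flags=0):
        self.name = name
        self.flags = flags


def check_flush_dependency(current_wq, target_wq):
    """Return a warning string if a work item running on current_wq
    flushing target_wq violates the reclaim rule, else None."""
    if (current_wq.flags & WQ_MEM_RECLAIM) and not (target_wq.flags & WQ_MEM_RECLAIM):
        return (f"workqueue: WQ_MEM_RECLAIM {current_wq.name} is flushing "
                f"!WQ_MEM_RECLAIM {target_wq.name}")
    return None


nvmet_wq = Workqueue("nvmet-wq", flags=WQ_MEM_RECLAIM)
cleanup_wq = Workqueue("irdma-cleanup-wq")  # no WQ_MEM_RECLAIM before the fix
print(check_flush_dependency(nvmet_wq, cleanup_wq))
```

Adding WQ_MEM_RECLAIM to irdma-cleanup-wq (the fix proposed later in this thread) makes the check pass, since both queues then guarantee forward progress under reclaim.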



* RE:  workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
  2024-12-13  9:40 workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma] Honggang LI
@ 2024-12-13 12:01 ` Bernard Metzler
  2024-12-13 12:16   ` Bernard Metzler
  2024-12-13 18:55 ` Zhu Yanjun
  1 sibling, 1 reply; 7+ messages in thread
From: Bernard Metzler @ 2024-12-13 12:01 UTC (permalink / raw)
  To: Honggang LI, linux-nvme@lists.infradead.org,
	linux-rdma@vger.kernel.org

RXE? There are irdma calls on the stack?

Hmmm.



> -----Original Message-----
> From: Honggang LI <honggangli@163.com>
> Sent: Friday, December 13, 2024 10:41 AM
> To: linux-nvme@lists.infradead.org; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] workqueue: WQ_MEM_RECLAIM nvmet-
> wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM
> irdma-cleanup-wq:irdma_flush_worker [irdma]
> 
> It is 100% reproducible. The NVMEoRDMA client side is running RXE.
> To reproduce it, have the client side repeatedly connect to and
> disconnect from the NVMEoRDMA target.
> 



* RE:  workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
  2024-12-13 12:01 ` Bernard Metzler
@ 2024-12-13 12:16   ` Bernard Metzler
  0 siblings, 0 replies; 7+ messages in thread
From: Bernard Metzler @ 2024-12-13 12:16 UTC (permalink / raw)
  To: Bernard Metzler, Honggang LI, linux-nvme@lists.infradead.org,
	linux-rdma@vger.kernel.org



> -----Original Message-----
> From: Bernard Metzler <BMT@zurich.ibm.com>
> Sent: Friday, December 13, 2024 1:01 PM
> To: Honggang LI <honggangli@163.com>; linux-nvme@lists.infradead.org;
> linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] RE: workqueue: WQ_MEM_RECLAIM nvmet-
> wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM
> irdma-cleanup-wq:irdma_flush_worker [irdma]
> 
> RXE? There are irdma calls on the stack?
> 
> Hmmm.

Sorry for the noise. Yes, it's the target side.
> 
> 
> 
> > -----Original Message-----
> > From: Honggang LI <honggangli@163.com>
> > Sent: Friday, December 13, 2024 10:41 AM
> > To: linux-nvme@lists.infradead.org; linux-rdma@vger.kernel.org
> > Subject: [EXTERNAL] workqueue: WQ_MEM_RECLAIM nvmet-
> > wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM
> > irdma-cleanup-wq:irdma_flush_worker [irdma]
> >
> > It is 100% reproducible. The NVMEoRDMA client side is running RXE.
> > To reproduce it, have the client side repeatedly connect to and
> > disconnect from the NVMEoRDMA target.
> >



* Re: workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
  2024-12-13  9:40 workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma] Honggang LI
  2024-12-13 12:01 ` Bernard Metzler
@ 2024-12-13 18:55 ` Zhu Yanjun
  2024-12-13 19:30   ` Zhu Yanjun
  1 sibling, 1 reply; 7+ messages in thread
From: Zhu Yanjun @ 2024-12-13 18:55 UTC (permalink / raw)
  To: Honggang LI, linux-nvme, linux-rdma

On 2024/12/13 10:40, Honggang LI wrote:
> It is 100% reproducible. The NVMEoRDMA client side is running RXE.
> To reproduce it, have the client side repeatedly connect to and
> disconnect from the NVMEoRDMA target.
> 
> [ 685.757357] ------------[ cut here ]------------
> [ 685.758725] workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
> [ 685.758809] WARNING: CPU: 16 PID: 1897 at kernel/workqueue.c:2966 check_flush_dependency+0x11f/0x140
> [ 685.762880] Modules linked in: nvmet_rdma nvmet nvme_keyring tcm_loop target_core_user uio target_core_pscsi target_core_file target_core_iblock rpcrdma qrtr rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod rfkill ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp sunrpc kvm_intel kvm irqbypass binfmt_misc rapl intel_cstate irdma ipmi_ssif i40e iTCO_wdt intel_pmc_bxt iTCO_vendor_support ib_uverbs acpi_ipmi intel_uncore joydev ipmi_si pcspkr mxm_wmi ib_core mei_me ipmi_devintf i2c_i801 mei i2c_smbus lpc_ich ioatdma ipmi_msghandler loop dm_multipath nfnetlink zram ice crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme isci nvme_core ghash_clmulni_intel sha512_ssse3 igb sha256_ssse3 libsas sha1_ssse3 nvme_auth mgag200 scsi_transport_sas dca gnss i2c_algo_bit wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
> [ 685.773891] CPU: 16 PID: 1897 Comm: kworker/16:2 Kdump: loaded Tainted: G S 6.8.4-300.patched.fc40.x86_64 #1
> [ 685.775267] Hardware name: Sugon I620-G10/X9DR3-F, BIOS 3.0b 07/22/2014
> [ 685.776627] Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
> [ 685.777993] RIP: 0010:check_flush_dependency+0x11f/0x140

Maybe it is related to this line. What is the line above it?

Zhu Yanjun




* Re: workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
  2024-12-13 18:55 ` Zhu Yanjun
@ 2024-12-13 19:30   ` Zhu Yanjun
  2024-12-14  2:26     ` Honggang LI
  0 siblings, 1 reply; 7+ messages in thread
From: Zhu Yanjun @ 2024-12-13 19:30 UTC (permalink / raw)
  To: Honggang LI, linux-nvme, linux-rdma



On 2024/12/13 19:55, Zhu Yanjun wrote:
> On 2024/12/13 10:40, Honggang LI wrote:
>> It is 100% reproducible. The NVMEoRDMA client side is running RXE.
>> To reproduce it, have the client side repeatedly connect to and
>> disconnect from the NVMEoRDMA target.
>>
>> [ 685.757357] ------------[ cut here ]------------
>> [ 685.758725] workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]

I looked into this problem; it appears to be a known issue.
Can you apply the following patch and run the tests again?

diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
index ad50b77282f8..31501ff9f282 100644
--- a/drivers/infiniband/hw/irdma/hw.c
+++ b/drivers/infiniband/hw/irdma/hw.c
@@ -1872,7 +1872,7 @@ int irdma_rt_init_hw(struct irdma_device *iwdev,
                  * free cq bufs
                  */
                 iwdev->cleanup_wq = alloc_workqueue("irdma-cleanup-wq",
-                                       WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE);
+                                       WQ_UNBOUND|WQ_MEM_RECLAIM, WQ_UNBOUND_MAX_ACTIVE);
                 if (!iwdev->cleanup_wq)
                         return -ENOMEM;
                 irdma_get_used_rsrc(iwdev);

Zhu Yanjun

>> [ 685.758809] WARNING: CPU: 16 PID: 1897 at kernel/workqueue.c:2966 
>> check_flush_dependency+0x11f/0x140
>> [ 685.762880] Modules linked in: nvmet_rdma nvmet nvme_keyring 
>> tcm_loop target_core_user uio target_core_pscsi target_core_file 
>> target_core_iblock rpcrdma qrtr rdma_ucm ib_srpt ib_isert 
>> iscsi_target_mod target_core_mod rfkill ib_iser libiscsi 
>> scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr 
>> intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp 
>> coretemp sunrpc kvm_intel kvm irqbypass binfmt_misc rapl intel_cstate 
>> irdma ipmi_ssif i40e iTCO_wdt intel_pmc_bxt iTCO_vendor_support 
>> ib_uverbs acpi_ipmi intel_uncore joydev ipmi_si pcspkr mxm_wmi ib_core 
>> mei_me ipmi_devintf i2c_i801 mei i2c_smbus lpc_ich ioatdma 
>> ipmi_msghandler loop dm_multipath nfnetlink zram ice crct10dif_pclmul 
>> crc32_pclmul crc32c_intel polyval_clmulni polyval_generic nvme isci 
>> nvme_core ghash_clmulni_intel sha512_ssse3 igb sha256_ssse3 libsas 
>> sha1_ssse3 nvme_auth mgag200 scsi_transport_sas dca gnss i2c_algo_bit 
>> wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse
>> [ 685.773891] CPU: 16 PID: 1897 Comm: kworker/16:2 Kdump: loaded 
>> Tainted: G S 6.8.4-300.patched.fc40.x86_64 #1
>> [ 685.775267] Hardware name: Sugon I620-G10/X9DR3-F, BIOS 3.0b 07/22/2014
>> [ 685.776627] Workqueue: nvmet-wq nvmet_rdma_release_queue_work 
>> [nvmet_rdma]
>> [ 685.777993] RIP: 0010:check_flush_dependency+0x11f/0x140
> 
> Maybe it is related to this line. What is the line above it?
> 
> Zhu Yanjun
> 
>> [ 685.779331] Code: 8b 45 18 48 8d b2 b0 00 00 00 49 89 e8 48 8d 8b b0 
>> 00 00 00 48 c7 c7 28 fe b1 b2 c6 05 4f 97 59 02 01 48 89 c2 e8 a1 91 
>> fd ff <0f> 0b e9 fc fe ff ff 80 3d 3a 97 59 02 00 75 93 e9 2a ff ff ff 66
>> [ 685.782050] RSP: 0018:ffffb31348793cc8 EFLAGS: 00010082
>> [ 685.783398] RAX: 0000000000000000 RBX: ffff96c705754800 RCX: 
>> 0000000000000027
>> [ 685.784744] RDX: ffff96ce5fca18c8 RSI: 0000000000000001 RDI: 
>> ffff96ce5fca18c0
>> [ 685.786077] RBP: ffffffffc0d217f0 R08: 0000000000000000 R09: 
>> ffffb31348793b38
>> [ 685.787390] R10: ffffffffb3516808 R11: 0000000000000003 R12: 
>> ffff96c70d2aa8c0
>> [ 685.788688] R13: ffff96c7043c6a80 R14: 0000000000000001 R15: 
>> ffff96c704147400
>> [ 685.789970] FS: 0000000000000000(0000) GS:ffff96ce5fc80000(0000) 
>> knlGS:0000000000000000
>> [ 685.791239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 685.792495] CR2: 00007f8207151000 CR3: 0000000d15422006 CR4: 
>> 00000000001706f0
>> [ 685.793745] Call Trace:
>> [ 685.794973] <TASK>
>> [ 685.796179] ? check_flush_dependency+0x11f/0x140
>> [ 685.797382] ? __warn+0x81/0x130
>> [ 685.798563] ? check_flush_dependency+0x11f/0x140
>> [ 685.799732] ? report_bug+0x16f/0x1a0
>> [ 685.800882] ? handle_bug+0x3c/0x80
>> [ 685.802003] ? exc_invalid_op+0x17/0x70
>> [ 685.803107] ? asm_exc_invalid_op+0x1a/0x20
>> [ 685.804200] ? __pfx_irdma_flush_worker+0x10/0x10 [irdma]
>> [ 685.805315] ? check_flush_dependency+0x11f/0x140
>> [ 685.806373] ? check_flush_dependency+0x11f/0x140
>> [ 685.807407] __flush_work.isra.0+0x10d/0x290
>> [ 685.808420] __cancel_work_timer+0x103/0x1a0
>> [ 685.809418] irdma_destroy_qp+0xd4/0x180 [irdma]
>> [ 685.810437] ib_destroy_qp_user+0x93/0x1a0 [ib_core]
>> [ 685.811474] nvmet_rdma_free_queue+0x35/0xc0 [nvmet_rdma]
>> [ 685.812437] nvmet_rdma_release_queue_work+0x1d/0x50 [nvmet_rdma]
>> [ 685.813385] process_one_work+0x170/0x330
>> [ 685.814300] worker_thread+0x280/0x3d0
>> [ 685.815201] ? __pfx_worker_thread+0x10/0x10
>> [ 685.816090] kthread+0xe8/0x120
>> [ 685.816956] ? __pfx_kthread+0x10/0x10
>> [ 685.817801] ret_from_fork+0x34/0x50
>> [ 685.818633] ? __pfx_kthread+0x10/0x10
>> [ 685.819439] ret_from_fork_asm+0x1b/0x30
>> [ 685.820232] </TASK>
>> [ 685.820994] Kernel panic - not syncing: kernel: panic_on_warn set ...
>> [ 685.821749] CPU: 16 PID: 1897 Comm: kworker/16:2 Kdump: loaded 
>> Tainted: G S 6.8.4-300.patched.fc40.x86_64 #1
>> [ 685.822513] Hardware name: Sugon I620-G10/X9DR3-F, BIOS 3.0b 07/22/2014
>> [ 685.823259] Workqueue: nvmet-wq nvmet_rdma_release_queue_work 
>> [nvmet_rdma]
>> [ 685.824002] Call Trace:
>> [ 685.824706] <TASK>
>> [ 685.825386] dump_stack_lvl+0x4d/0x70
>> [ 685.826060] panic+0x33e/0x370
>> [ 685.826724] ? check_flush_dependency+0x11f/0x140
>> [ 685.827383] check_panic_on_warn+0x44/0x60
>> [ 685.828021] __warn+0x8d/0x130
>> [ 685.828629] ? check_flush_dependency+0x11f/0x140
>> [ 685.829229] report_bug+0x16f/0x1a0
>> [ 685.829819] handle_bug+0x3c/0x80
>> [ 685.830396] exc_invalid_op+0x17/0x70
>> [ 685.830972] asm_exc_invalid_op+0x1a/0x20
>> [ 685.831548] RIP: 0010:check_flush_dependency+0x11f/0x140
>> [ 685.832129] Code: 8b 45 18 48 8d b2 b0 00 00 00 49 89 e8 48 8d 8b b0 
>> 00 00 00 48 c7 c7 28 fe b1 b2 c6 05 4f 97 59 02 01 48 89 c2 e8 a1 91 
>> fd ff <0f> 0b e9 fc fe ff ff 80 3d 3a 97 59 02 00 75 93 e9 2a ff ff ff 66
>> [ 685.833341] RSP: 0018:ffffb31348793cc8 EFLAGS: 00010082
>> [ 685.833954] RAX: 0000000000000000 RBX: ffff96c705754800 RCX: 
>> 0000000000000027
>> [ 685.834569] RDX: ffff96ce5fca18c8 RSI: 0000000000000001 RDI: 
>> ffff96ce5fca18c0
>> [ 685.835196] RBP: ffffffffc0d217f0 R08: 0000000000000000 R09: 
>> ffffb31348793b38
>> [ 685.835823] R10: ffffffffb3516808 R11: 0000000000000003 R12: 
>> ffff96c70d2aa8c0
>> [ 685.836450] R13: ffff96c7043c6a80 R14: 0000000000000001 R15: 
>> ffff96c704147400
>> [ 685.837079] ? __pfx_irdma_flush_worker+0x10/0x10 [irdma]
>> [ 685.837755] ? check_flush_dependency+0x11f/0x140
>> [ 685.838394] __flush_work.isra.0+0x10d/0x290
>> [ 685.839037] __cancel_work_timer+0x103/0x1a0
>> [ 685.839679] irdma_destroy_qp+0xd4/0x180 [irdma]
>> [ 685.840354] ib_destroy_qp_user+0x93/0x1a0 [ib_core]
>> [ 685.841049] nvmet_rdma_free_queue+0x35/0xc0 [nvmet_rdma]
>> [ 685.841707] nvmet_rdma_release_queue_work+0x1d/0x50 [nvmet_rdma]
>> [ 685.842367] process_one_work+0x170/0x330
>> [ 685.843020] worker_thread+0x280/0x3d0
>> [ 685.843670] ? __pfx_worker_thread+0x10/0x10
>> [ 685.844316] kthread+0xe8/0x120
>> [ 685.844955] ? __pfx_kthread+0x10/0x10
>> [ 685.845590] ret_from_fork+0x34/0x50
>> [ 685.846223] ? __pfx_kthread+0x10/0x10
>> [ 685.846853] ret_from_fork_asm+0x1b/0x30
>> [ 685.847485] </TASK>
>>
> 

-- 
Best Regards,
Yanjun.Zhu


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
  2024-12-13 19:30   ` Zhu Yanjun
@ 2024-12-14  2:26     ` Honggang LI
  2024-12-14  9:36       ` Zhu Yanjun
  0 siblings, 1 reply; 7+ messages in thread
From: Honggang LI @ 2024-12-14  2:26 UTC (permalink / raw)
  To: Zhu Yanjun; +Cc: linux-nvme, linux-rdma

On Fri, Dec 13, 2024 at 08:30:01PM +0100, Zhu Yanjun wrote:
> I looked into this problem. It seems to be a known one.
> Can you apply the following patch and run the tests again?
> 
> diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
> index ad50b77282f8..31501ff9f282 100644
> --- a/drivers/infiniband/hw/irdma/hw.c
> +++ b/drivers/infiniband/hw/irdma/hw.c
> @@ -1872,7 +1872,7 @@ int irdma_rt_init_hw(struct irdma_device *iwdev,
>                  * free cq bufs
>                  */
>                 iwdev->cleanup_wq = alloc_workqueue("irdma-cleanup-wq",
> -                                       WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE);
> +                                       WQ_UNBOUND|WQ_MEM_RECLAIM, WQ_UNBOUND_MAX_ACTIVE);

After adding the WQ_MEM_RECLAIM flag, the warning message is gone. However,
it may reintroduce the issue that was fixed by commit 2cc7d150550.

thanks

commit 2cc7d150550cc981aceedf008f5459193282425c
Author: Sindhu Devale <sindhu.devale@intel.com>
Date:   Tue Apr 23 11:27:17 2024 -0700

    i40e: Do not use WQ_MEM_RECLAIM flag for workqueue


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma]
  2024-12-14  2:26     ` Honggang LI
@ 2024-12-14  9:36       ` Zhu Yanjun
  0 siblings, 0 replies; 7+ messages in thread
From: Zhu Yanjun @ 2024-12-14  9:36 UTC (permalink / raw)
  To: Honggang LI, Ismail, Mustafa; +Cc: linux-nvme, linux-rdma

On 2024/12/14 3:26, Honggang LI wrote:
> On Fri, Dec 13, 2024 at 08:30:01PM +0100, Zhu Yanjun wrote:
>> I looked into this problem. It seems to be a known one.
>> Can you apply the following patch and run the tests again?
>>
>> diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
>> index ad50b77282f8..31501ff9f282 100644
>> --- a/drivers/infiniband/hw/irdma/hw.c
>> +++ b/drivers/infiniband/hw/irdma/hw.c
>> @@ -1872,7 +1872,7 @@ int irdma_rt_init_hw(struct irdma_device *iwdev,
>>                   * free cq bufs
>>                   */
>>                  iwdev->cleanup_wq = alloc_workqueue("irdma-cleanup-wq",
>> -                                       WQ_UNBOUND, WQ_UNBOUND_MAX_ACTIVE);
>> +                                       WQ_UNBOUND|WQ_MEM_RECLAIM, WQ_UNBOUND_MAX_ACTIVE);
> 
> After adding the WQ_MEM_RECLAIM flag, the warning message is gone. However,
> it may reintroduce the issue that was fixed by commit 2cc7d150550.
> 
> thanks
> 
> commit 2cc7d150550cc981aceedf008f5459193282425c
> Author: Sindhu Devale <sindhu.devale@intel.com>
> Date:   Tue Apr 23 11:27:17 2024 -0700
> 
>      i40e: Do not use WQ_MEM_RECLAIM flag for workqueue

I read the commit log carefully. If I understand it correctly, the
WQ_MEM_RECLAIM flag was used in i40e but not in i40iw, and the fix was
to remove the flag from i40e.

"
     Issue reported by customer during SRIOV testing, call trace:
     When both i40e and the i40iw driver are loaded, a warning
     in check_flush_dependency is being triggered. This seems
     to be because of the i40e driver workqueue is allocated with
     the WQ_MEM_RECLAIM flag, and the i40iw one is not.

     Similar error was encountered on ice too and it was fixed by
     removing the flag. Do the same for i40e too.
"
I do not have an E810 or i40e device, so I cannot look into the issue
reported by the customer during SRIOV testing.
Thus, let the Intel engineers continue to handle this problem.
@Mustafa Ismail

Zhu Yanjun

> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-12-14  9:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-13  9:40 workqueue: WQ_MEM_RECLAIM nvmet-wq:nvmet_rdma_release_queue_work [nvmet_rdma] is flushing !WQ_MEM_RECLAIM irdma-cleanup-wq:irdma_flush_worker [irdma] Honggang LI
2024-12-13 12:01 ` Bernard Metzler
2024-12-13 12:16   ` Bernard Metzler
2024-12-13 18:55 ` Zhu Yanjun
2024-12-13 19:30   ` Zhu Yanjun
2024-12-14  2:26     ` Honggang LI
2024-12-14  9:36       ` Zhu Yanjun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).