* [PATCH v2 1/1] RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency
@ 2024-08-20 11:33 Zhu Yanjun
2024-08-22 23:30 ` Bart Van Assche
2024-08-23 14:47 ` Jason Gunthorpe
0 siblings, 2 replies; 3+ messages in thread
From: Zhu Yanjun @ 2024-08-20 11:33 UTC (permalink / raw)
To: bvanassche, leon, linux-rdma, shinichiro.kawasaki, oliver.sang,
jgg
Cc: Zhu Yanjun
In the commit aee2424246f9 ("RDMA/iwcm: Fix a use-after-free related to
destroying CM IDs"), the function flush_workqueue is invoked to flush
the work queue iwcm_wq.
But at that time, the work queue iwcm_wq was created via the function
alloc_ordered_workqueue without the flag WQ_MEM_RECLAIM.
Because the current process is trying to flush the whole iwcm_wq, if
iwcm_wq doesn't have the flag WQ_MEM_RECLAIM, verify that the current
process is not reclaiming memory or running on a workqueue which
doesn't have the flag WQ_MEM_RECLAIM as that can break forward-progress
guarantee leading to a deadlock.
The call trace is as below:
"
[ 125.350876][ T1430] Call Trace:
[ 125.356281][ T1430] <TASK>
[ 125.361285][ T1430] ? __warn (kernel/panic.c:693)
[ 125.367640][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.375689][ T1430] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 125.382505][ T1430] ? handle_bug (arch/x86/kernel/traps.c:239)
[ 125.388987][ T1430] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
[ 125.395831][ T1430] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621)
[ 125.403125][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.410984][ T1430] ? check_flush_dependency (kernel/workqueue.c:3706 (discriminator 9))
[ 125.418764][ T1430] __flush_workqueue (kernel/workqueue.c:3970)
[ 125.426021][ T1430] ? __pfx___might_resched (kernel/sched/core.c:10151)
[ 125.433431][ T1430] ? destroy_cm_id (drivers/infiniband/core/iwcm.c:375) iw_cm
[ 125.440844][ T2411] /usr/bin/wget -q --timeout=3600 --tries=1 --local-encoding=UTF-8 http://internal-lkp-server:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/lkp-spr-2sp1/blktests-1SSD-rdma-nvme-group-01-debian-12-x86_64-20240206.cgz-aee2424246f9-20240809-69442-1dktaed-4.yaml&job_state=running -O /dev/null
[ 125.441209][ T1430] ? __pfx___flush_workqueue (kernel/workqueue.c:3910)
[ 125.441215][ T2411]
[ 125.473900][ T1430] ? _raw_spin_lock_irqsave (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 125.473909][ T1430] ? __pfx__raw_spin_lock_irqsave (kernel/locking/spinlock.c:161)
[ 125.480265][ T2411] target ucode: 0x2b0004b1
[ 125.482537][ T1430] _destroy_id (drivers/infiniband/core/cma.c:2044) rdma_cm
[ 125.488511][ T2411]
[ 125.495072][ T1430] nvme_rdma_free_queue (drivers/nvme/host/rdma.c:656 drivers/nvme/host/rdma.c:650) nvme_rdma
[ 125.500747][ T2411] LKP: stdout: 2876: current_version: 2b0004b1, target_version: 2b0004b1
[ 125.505827][ T1430] nvme_rdma_reset_ctrl_work (drivers/nvme/host/rdma.c:2180) nvme_rdma
[ 125.505831][ T1430] process_one_work (kernel/workqueue.c:3231)
[ 125.508377][ T2411]
[ 125.515122][ T1430] worker_thread (kernel/workqueue.c:3306 kernel/workqueue.c:3393)
[ 125.515127][ T1430] ? __pfx_worker_thread (kernel/workqueue.c:3339)
[ 125.524642][ T2411] check_nr_cpu
[ 125.531837][ T1430] kthread (kernel/kthread.c:389)
[ 125.537327][ T2411]
[ 125.539864][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.545392][ T2411] CPU(s): 224
[ 125.550628][ T1430] ret_from_fork (arch/x86/kernel/process.c:147)
[ 125.554342][ T2411]
[ 125.558840][ T1430] ? __pfx_kthread (kernel/kthread.c:342)
[ 125.558844][ T1430] ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
[ 125.561843][ T2411] On-line CPU(s) list: 0-223
[ 125.566487][ T1430] </TASK>
[ 125.566488][ T1430] ---[ end trace 0000000000000000 ]---
"
Fixes: aee2424246f9 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
V1 -> V2: Modify commit logs based on Bart and Jason' suggestions
---
drivers/infiniband/core/iwcm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 1a6339f3a63f..7e3a55349e10 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -1182,7 +1182,7 @@ static int __init iw_cm_init(void)
if (ret)
return ret;
- iwcm_wq = alloc_ordered_workqueue("iw_cm_wq", 0);
+ iwcm_wq = alloc_ordered_workqueue("iw_cm_wq", WQ_MEM_RECLAIM);
if (!iwcm_wq)
goto err_alloc;
--
2.34.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2 1/1] RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency
2024-08-20 11:33 [PATCH v2 1/1] RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency Zhu Yanjun
@ 2024-08-22 23:30 ` Bart Van Assche
2024-08-23 14:47 ` Jason Gunthorpe
1 sibling, 0 replies; 3+ messages in thread
From: Bart Van Assche @ 2024-08-22 23:30 UTC (permalink / raw)
To: Zhu Yanjun, leon, linux-rdma, shinichiro.kawasaki, oliver.sang,
jgg
On 8/20/24 4:33 AM, Zhu Yanjun wrote:
> diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
> index 1a6339f3a63f..7e3a55349e10 100644
> --- a/drivers/infiniband/core/iwcm.c
> +++ b/drivers/infiniband/core/iwcm.c
> @@ -1182,7 +1182,7 @@ static int __init iw_cm_init(void)
> if (ret)
> return ret;
>
> - iwcm_wq = alloc_ordered_workqueue("iw_cm_wq", 0);
> + iwcm_wq = alloc_ordered_workqueue("iw_cm_wq", WQ_MEM_RECLAIM);
> if (!iwcm_wq)
> goto err_alloc;
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2 1/1] RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency
2024-08-20 11:33 [PATCH v2 1/1] RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency Zhu Yanjun
2024-08-22 23:30 ` Bart Van Assche
@ 2024-08-23 14:47 ` Jason Gunthorpe
1 sibling, 0 replies; 3+ messages in thread
From: Jason Gunthorpe @ 2024-08-23 14:47 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: bvanassche, leon, linux-rdma, shinichiro.kawasaki, oliver.sang
On Tue, Aug 20, 2024 at 01:33:36PM +0200, Zhu Yanjun wrote:
> In the commit aee2424246f9 ("RDMA/iwcm: Fix a use-after-free related to
> destroying CM IDs"), the function flush_workqueue is invoked to flush
> the work queue iwcm_wq.
..
> Fixes: aee2424246f9 ("RDMA/iwcm: Fix a use-after-free related to destroying CM IDs")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> ---
> V1 -> V2: Modify commit logs based on Bart and Jason' suggestions
> ---
> drivers/infiniband/core/iwcm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Applied to for-next, thanks
Jason
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-08-23 14:47 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-20 11:33 [PATCH v2 1/1] RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency Zhu Yanjun
2024-08-22 23:30 ` Bart Van Assche
2024-08-23 14:47 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).