* [PATCH 0/5] Fix potential issues for siw
@ 2023-07-27 14:03 Guoqing Jiang
2023-07-27 14:03 ` [PATCH 1/5] RDMA/siw: Set siw_cm_wq to NULL after it is destroyed Guoqing Jiang
` (6 more replies)
0 siblings, 7 replies; 15+ messages in thread
From: Guoqing Jiang @ 2023-07-27 14:03 UTC (permalink / raw)
To: bmt, jgg, leon; +Cc: linux-rdma
Hi,
Several issues appeared if we rmmod siw module after failed to insert
the module (with manual change like below).
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -577,6 +577,7 @@ static __init int siw_init_module(void)
if (rv)
goto out_error;
+ goto out_error;
rdma_link_register(&siw_link_ops);
Basically, these issues are double free, use before initalization or
null pointer dereference. For more details, pls review the individual
patch.
Thanks,
Guoqing
Guoqing Jiang (5):
RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
RDMA/siw: Initialize siw_link_ops.list
RDMA/siw: Set siw_crypto_shash to NULL after it is freed
RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread
drivers/infiniband/sw/siw/siw_cm.c | 4 +++-
drivers/infiniband/sw/siw/siw_main.c | 7 ++++++-
drivers/infiniband/sw/siw/siw_qp_tx.c | 7 ++++++-
3 files changed, 15 insertions(+), 3 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/5] RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
2023-07-27 14:03 [PATCH 0/5] Fix potential issues for siw Guoqing Jiang
@ 2023-07-27 14:03 ` Guoqing Jiang
2023-07-27 14:03 ` [PATCH 2/5] RDMA/siw: Ensure siw_destroy_cpulist can be called more than once Guoqing Jiang
` (5 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2023-07-27 14:03 UTC (permalink / raw)
To: bmt, jgg, leon; +Cc: linux-rdma
In case siw module can't be inserted successfully, after that remove the module
from kernel, then both siw_cm_exit and the failure path in siw_init_module call
siw_cm_exit, which cause below issue.
[ 73.561312] BUG: unable to handle page fault for address: 000000040000004c
[ 73.561317] #PF: supervisor read access in kernel mode
[ 73.561319] #PF: error_code(0x0000) - not-present page
[ 73.561320] PGD 0 P4D 0
[ 73.561322] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 73.561324] CPU: 1 PID: 1693 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16
[ 73.561326] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 73.561327] RIP: 0010:device_del+0x22/0x3d0
...
[ 73.561347] Call Trace:
[ 73.561348] <TASK>
[ 73.561350] ? show_regs+0x72/0x90
[ 73.561353] ? __die+0x25/0x80
[ 73.561355] ? page_fault_oops+0x154/0x4d0
[ 73.561357] ? lockdep_unlock+0x63/0xe0
[ 73.561361] ? do_user_addr_fault+0x381/0x8d0
[ 73.561363] ? rcu_is_watching+0x13/0x70
[ 73.561365] ? exc_page_fault+0x87/0x240
[ 73.561369] ? asm_exc_page_fault+0x27/0x30
[ 73.561373] ? device_del+0x22/0x3d0
[ 73.561374] ? __this_cpu_preempt_check+0x13/0x20
[ 73.561377] device_unregister+0x18/0x70
[ 73.561378] destroy_workqueue+0x33/0x2d0
[ 73.561381] siw_cm_exit+0x1a/0x30 [siw]
[ 73.561387] siw_exit_module+0x96/0x5a0 [siw]
So we need to set the workqueue to NULL after it is destroyed.
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
drivers/infiniband/sw/siw/siw_cm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c
index da530c0404da..758ac8a22f7a 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -1958,6 +1958,8 @@ int siw_cm_init(void)
void siw_cm_exit(void)
{
- if (siw_cm_wq)
+ if (siw_cm_wq) {
destroy_workqueue(siw_cm_wq);
+ siw_cm_wq = NULL;
+ }
}
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/5] RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
2023-07-27 14:03 [PATCH 0/5] Fix potential issues for siw Guoqing Jiang
2023-07-27 14:03 ` [PATCH 1/5] RDMA/siw: Set siw_cm_wq to NULL after it is destroyed Guoqing Jiang
@ 2023-07-27 14:03 ` Guoqing Jiang
2023-07-27 14:03 ` [PATCH 3/5] RDMA/siw: Initialize siw_link_ops.list Guoqing Jiang
` (4 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2023-07-27 14:03 UTC (permalink / raw)
To: bmt, jgg, leon; +Cc: linux-rdma
In case siw module can't be inserted successfully, then if remove
the module from kernel, then both siw_cm_exit and the failure path
in siw_init_module call siw_destroy_cpulist. Let's set tx_valid_cpus
and num_nodes to prevent double free issues.
[ 32.197293] general protection fault, probably for non-canonical address 0xb4965e5a58a488: 0000 [#1] PREEMPT SMP NOPTI
[ 32.197300] CPU: 0 PID: 1676 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16
[ 32.197304] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 32.197306] RIP: 0010:kfree+0x62/0x150
...
[ 32.197339] Call Trace:
[ 32.197341] <TASK>
[ 32.197343] ? show_regs+0x72/0x90
[ 32.197348] ? die_addr+0x38/0xb0
[ 32.197351] ? exc_general_protection+0x1bf/0x4a0
[ 32.197357] ? asm_exc_general_protection+0x27/0x30
[ 32.197362] ? kfree+0x62/0x150
[ 32.197366] siw_exit_module+0xb8/0x590 [siw]
[ 32.197376] __do_sys_delete_module.constprop.0+0x18f/0x300
So let's set tx_valid_cpus and num_nodes to prevent the issue.
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
drivers/infiniband/sw/siw/siw_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 65b5cda5457b..b3547253c099 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -178,6 +178,8 @@ static void siw_destroy_cpulist(void)
kfree(siw_cpu_info.tx_valid_cpus[i++]);
kfree(siw_cpu_info.tx_valid_cpus);
+ siw_cpu_info.tx_valid_cpus = NULL;
+ siw_cpu_info.num_nodes = 0;
}
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/5] RDMA/siw: Initialize siw_link_ops.list
2023-07-27 14:03 [PATCH 0/5] Fix potential issues for siw Guoqing Jiang
2023-07-27 14:03 ` [PATCH 1/5] RDMA/siw: Set siw_cm_wq to NULL after it is destroyed Guoqing Jiang
2023-07-27 14:03 ` [PATCH 2/5] RDMA/siw: Ensure siw_destroy_cpulist can be called more than once Guoqing Jiang
@ 2023-07-27 14:03 ` Guoqing Jiang
2023-07-27 14:03 ` [PATCH 4/5] RDMA/siw: Set siw_crypto_shash to NULL after it is freed Guoqing Jiang
` (3 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2023-07-27 14:03 UTC (permalink / raw)
To: bmt, jgg, leon; +Cc: linux-rdma
In case siw module can't be inserted successfully, then if remove
the module from kernel, then siw_cm_exit can trigger below trace
because siw_link_ops.list is still NULL since rdma_link_register
is not called. So we need to init the list earlier.
[ 45.306864] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 45.306869] #PF: supervisor write access in kernel mode
[ 45.306871] #PF: error_code(0x0002) - not-present page
[ 45.306872] PGD 0 P4D 0
[ 45.306874] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 45.306876] CPU: 1 PID: 1742 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16
[ 45.306879] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 45.306880] RIP: 0010:rdma_link_unregister+0x27/0x60 [ib_core]
...
[ 45.306916] Call Trace:
[ 45.306917] <TASK>
[ 45.306918] ? show_regs+0x72/0x90
[ 45.306922] ? __die+0x25/0x80
[ 45.306924] ? page_fault_oops+0x154/0x4d0
[ 45.306927] ? __this_cpu_preempt_check+0x13/0x20
[ 45.306929] ? lockdep_unlock+0x63/0xe0
[ 45.306933] ? do_user_addr_fault+0x381/0x8d0
[ 45.306934] ? rcu_is_watching+0x13/0x70
[ 45.306937] ? exc_page_fault+0x87/0x240
[ 45.306940] ? asm_exc_page_fault+0x27/0x30
[ 45.306944] ? rdma_link_unregister+0x27/0x60 [ib_core]
[ 45.306956] ? rdma_link_unregister+0x19/0x60 [ib_core]
[ 45.306967] siw_exit_module+0x87/0x590 [siw]
[ 45.306973] __do_sys_delete_module.constprop.0+0x18f/0x300
[ 45.306975] ? syscall_enter_from_user_mode+0x21/0x70
[ 45.306977] ? __this_cpu_preempt_check+0x13/0x20
[ 45.306978] ? lockdep_hardirqs_on+0x86/0x120
[ 45.306980] __x64_sys_delete_module+0x12/0x20
[ 45.306982] do_syscall_64+0x5c/0x90
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
drivers/infiniband/sw/siw/siw_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index b3547253c099..6709ed0de3a4 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -526,6 +526,7 @@ static int siw_newlink(const char *basedev_name, struct net_device *netdev)
}
static struct rdma_link_ops siw_link_ops = {
+ .list = LIST_HEAD_INIT(siw_link_ops.list),
.type = "siw",
.newlink = siw_newlink,
};
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/5] RDMA/siw: Set siw_crypto_shash to NULL after it is freed
2023-07-27 14:03 [PATCH 0/5] Fix potential issues for siw Guoqing Jiang
` (2 preceding siblings ...)
2023-07-27 14:03 ` [PATCH 3/5] RDMA/siw: Initialize siw_link_ops.list Guoqing Jiang
@ 2023-07-27 14:03 ` Guoqing Jiang
2023-07-27 14:03 ` [PATCH 5/5] RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread Guoqing Jiang
` (2 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2023-07-27 14:03 UTC (permalink / raw)
To: bmt, jgg, leon; +Cc: linux-rdma
In case siw module can't be inserted successfully, then remove the
module from kernel, which means both siw_cm_exit and the failure
path in siw_init_module call crypto_free_shash. We can see below
call trace appears.
[ 72.349344] ------------[ cut here ]------------
[ 72.349348] refcount_t: underflow; use-after-free.
[ 72.349386] WARNING: CPU: 1 PID: 1737 at lib/refcount.c:28 refcount_warn_saturate+0xfb/0x150
...
[ 72.349469] RIP: 0010:refcount_warn_saturate+0xfb/0x150
...
[ 72.349487] Call Trace:
[ 72.349488] <TASK>
[ 72.349490] ? show_regs+0x72/0x90
[ 72.349493] ? refcount_warn_saturate+0xfb/0x150
[ 72.349495] ? __warn+0x8d/0x1a0
[ 72.349498] ? refcount_warn_saturate+0xfb/0x150
[ 72.349500] ? report_bug+0x1f9/0x250
[ 72.349505] ? handle_bug+0x46/0x90
[ 72.349508] ? exc_invalid_op+0x19/0x80
[ 72.349511] ? asm_exc_invalid_op+0x1b/0x20
[ 72.349517] ? refcount_warn_saturate+0xfb/0x150
[ 72.349519] ? refcount_warn_saturate+0xfb/0x150
[ 72.349521] crypto_destroy_tfm+0x9b/0xe0
[ 72.349525] siw_exit_module+0xf6/0x590 [siw]
So we need to set siw_crypto_shash to null in the failure path of
siw_init_module.
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
drivers/infiniband/sw/siw/siw_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 6709ed0de3a4..f8549d01887f 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -589,8 +589,10 @@ static __init int siw_init_module(void)
siw_tx_thread[nr_cpu] = NULL;
}
}
- if (siw_crypto_shash)
+ if (siw_crypto_shash) {
crypto_free_shash(siw_crypto_shash);
+ siw_crypto_shash = NULL;
+ }
pr_info("SoftIWARP attach failed. Error: %d\n", rv);
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 5/5] RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread
2023-07-27 14:03 [PATCH 0/5] Fix potential issues for siw Guoqing Jiang
` (3 preceding siblings ...)
2023-07-27 14:03 ` [PATCH 4/5] RDMA/siw: Set siw_crypto_shash to NULL after it is freed Guoqing Jiang
@ 2023-07-27 14:03 ` Guoqing Jiang
2023-07-27 17:17 ` [PATCH 0/5] Fix potential issues for siw Bernard Metzler
2023-08-09 19:04 ` Jason Gunthorpe
6 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2023-07-27 14:03 UTC (permalink / raw)
To: bmt, jgg, leon; +Cc: linux-rdma
In case siw module can't be inserted successfully, and if the kthread
(siw_run_sq) is not run which means wait_queue_head (tx_task->waiting)
is not initialized. Then siw_stop_tx_thread is called from siw_init_module,
so below trace appeared.
kernel: BUG: spinlock bad magic on CPU#0, modprobe/2073
kernel: lock: 0xffff88babbd380e8, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
kernel: CPU: 0 PID: 2073 Comm: modprobe Tainted: G OE 6.5.0-rc3+ #16
kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
kernel: Call Trace:
kernel: <TASK>
kernel: dump_stack_lvl+0x77/0xd0
kernel: dump_stack+0x10/0x20
kernel: spin_bug+0xa5/0xd0
kernel: do_raw_spin_lock+0x90/0xd0
kernel: _raw_spin_lock_irqsave+0x56/0x80
kernel: ? __wake_up_common_lock+0x63/0xd0
kernel: __wake_up_common_lock+0x63/0xd0
kernel: __wake_up+0x13/0x30
kernel: siw_stop_tx_thread+0x49/0x70 [siw]
kernel: siw_init_module+0x15b/0xff0 [siw]
kernel: ? __pfx_siw_init_module+0x10/0x10 [siw]
kernel: do_one_initcall+0x60/0x390
...
kernel: </TASK>
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
To prevent the issue, add 'running' to tx_task_t, which is set to after
siw_run_sq is triggered. Then only wake up waitqueue after it is true.
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
drivers/infiniband/sw/siw/siw_qp_tx.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 7c7a51d36d0c..70acc4cd553f 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -1204,14 +1204,18 @@ static void siw_sq_resume(struct siw_qp *qp)
struct tx_task_t {
struct llist_head active;
wait_queue_head_t waiting;
+ bool running;
};
static DEFINE_PER_CPU(struct tx_task_t, siw_tx_task_g);
void siw_stop_tx_thread(int nr_cpu)
{
+ struct tx_task_t *tx_task = &per_cpu(siw_tx_task_g, nr_cpu);
+
kthread_stop(siw_tx_thread[nr_cpu]);
- wake_up(&per_cpu(siw_tx_task_g, nr_cpu).waiting);
+ if (tx_task->running)
+ wake_up(&per_cpu(siw_tx_task_g, nr_cpu).waiting);
}
int siw_run_sq(void *data)
@@ -1223,6 +1227,7 @@ int siw_run_sq(void *data)
init_llist_head(&tx_task->active);
init_waitqueue_head(&tx_task->waiting);
+ tx_task->running = true;
while (1) {
struct llist_node *fifo_list = NULL;
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* RE: [PATCH 0/5] Fix potential issues for siw
2023-07-27 14:03 [PATCH 0/5] Fix potential issues for siw Guoqing Jiang
` (4 preceding siblings ...)
2023-07-27 14:03 ` [PATCH 5/5] RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread Guoqing Jiang
@ 2023-07-27 17:17 ` Bernard Metzler
2023-07-27 17:29 ` Jason Gunthorpe
2023-08-09 19:04 ` Jason Gunthorpe
6 siblings, 1 reply; 15+ messages in thread
From: Bernard Metzler @ 2023-07-27 17:17 UTC (permalink / raw)
To: Guoqing Jiang, jgg@ziepe.ca, leon@kernel.org; +Cc: linux-rdma@vger.kernel.org
> -----Original Message-----
> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> Sent: Thursday, 27 July 2023 16:04
> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
> Cc: linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
>
> Hi,
>
> Several issues appeared if we rmmod siw module after failed to insert
> the module (with manual change like below).
>
> --- a/drivers/infiniband/sw/siw/siw_main.c
> +++ b/drivers/infiniband/sw/siw/siw_main.c
> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> if (rv)
> goto out_error;
>
> + goto out_error;
> rdma_link_register(&siw_link_ops);
>
> Basically, these issues are double free, use before initalization or
> null pointer dereference. For more details, pls review the individual
> patch.
>
> Thanks,
> Guoqing
Hi Guoqing,
very good catch, thank you. I was under the wrong assumption a
module is not loaded if the init_module() returns a value.
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
>
> Guoqing Jiang (5):
> RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
> RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
> RDMA/siw: Initialize siw_link_ops.list
> RDMA/siw: Set siw_crypto_shash to NULL after it is freed
> RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread
>
> drivers/infiniband/sw/siw/siw_cm.c | 4 +++-
> drivers/infiniband/sw/siw/siw_main.c | 7 ++++++-
> drivers/infiniband/sw/siw/siw_qp_tx.c | 7 ++++++-
> 3 files changed, 15 insertions(+), 3 deletions(-)
>
> --
> 2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/5] Fix potential issues for siw
2023-07-27 17:17 ` [PATCH 0/5] Fix potential issues for siw Bernard Metzler
@ 2023-07-27 17:29 ` Jason Gunthorpe
2023-07-27 18:15 ` Bart Van Assche
2023-07-28 1:16 ` Guoqing Jiang
0 siblings, 2 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2023-07-27 17:29 UTC (permalink / raw)
To: Bernard Metzler
Cc: Guoqing Jiang, leon@kernel.org, linux-rdma@vger.kernel.org
On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
>
>
> > -----Original Message-----
> > From: Guoqing Jiang <guoqing.jiang@linux.dev>
> > Sent: Thursday, 27 July 2023 16:04
> > To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
> > Cc: linux-rdma@vger.kernel.org
> > Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
> >
> > Hi,
> >
> > Several issues appeared if we rmmod siw module after failed to insert
> > the module (with manual change like below).
> >
> > --- a/drivers/infiniband/sw/siw/siw_main.c
> > +++ b/drivers/infiniband/sw/siw/siw_main.c
> > @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> > if (rv)
> > goto out_error;
> >
> > + goto out_error;
> > rdma_link_register(&siw_link_ops);
> >
> > Basically, these issues are double free, use before initalization or
> > null pointer dereference. For more details, pls review the individual
> > patch.
> >
> > Thanks,
> > Guoqing
>
> Hi Guoqing,
>
> very good catch, thank you. I was under the wrong assumption a
> module is not loaded if the init_module() returns a value.
I think that is actually true, isn't it? I'm confused?
Jason
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/5] Fix potential issues for siw
2023-07-27 17:29 ` Jason Gunthorpe
@ 2023-07-27 18:15 ` Bart Van Assche
2023-07-28 1:16 ` Guoqing Jiang
1 sibling, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2023-07-27 18:15 UTC (permalink / raw)
To: Jason Gunthorpe, Bernard Metzler
Cc: Guoqing Jiang, leon@kernel.org, linux-rdma@vger.kernel.org
On 7/27/23 10:29, Jason Gunthorpe wrote:
> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
>> very good catch, thank you. I was under the wrong assumption a
>> module is not loaded if the init_module() returns a value.
>
> I think that is actually true, isn't it? I'm confused?
My understanding is also that module loading should fail if
init_module() returns an error code. From kernel/module/main.c:
/* Start the module */
if (mod->init != NULL)
ret = do_one_initcall(mod->init);
if (ret < 0) {
goto fail_free_freeinit;
}
if (ret > 0) {
pr_warn("%s: '%s'->init suspiciously returned %d, it "
"should follow 0/-E convention\n"
"%s: loading module anyway...\n",
__func__, mod->name, ret, __func__);
dump_stack();
}
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/5] Fix potential issues for siw
2023-07-27 17:29 ` Jason Gunthorpe
2023-07-27 18:15 ` Bart Van Assche
@ 2023-07-28 1:16 ` Guoqing Jiang
2023-07-28 2:29 ` Guoqing Jiang
2023-07-28 9:36 ` Bernard Metzler
1 sibling, 2 replies; 15+ messages in thread
From: Guoqing Jiang @ 2023-07-28 1:16 UTC (permalink / raw)
To: Jason Gunthorpe, Bernard Metzler
Cc: leon@kernel.org, linux-rdma@vger.kernel.org
On 7/28/23 01:29, Jason Gunthorpe wrote:
> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
>>
>>> -----Original Message-----
>>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
>>> Sent: Thursday, 27 July 2023 16:04
>>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
>>> Cc: linux-rdma@vger.kernel.org
>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
>>>
>>> Hi,
>>>
>>> Several issues appeared if we rmmod siw module after failed to insert
>>> the module (with manual change like below).
>>>
>>> --- a/drivers/infiniband/sw/siw/siw_main.c
>>> +++ b/drivers/infiniband/sw/siw/siw_main.c
>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
>>> if (rv)
>>> goto out_error;
>>>
>>> + goto out_error;
>>> rdma_link_register(&siw_link_ops);
>>>
>>> Basically, these issues are double free, use before initalization or
>>> null pointer dereference. For more details, pls review the individual
>>> patch.
>>>
>>> Thanks,
>>> Guoqing
>> Hi Guoqing,
>>
>> very good catch, thank you. I was under the wrong assumption a
>> module is not loaded if the init_module() returns a value.
> I think that is actually true, isn't it? I'm confused?
Yes, you are right. Since rv is still 0, so the module appears in the
kernel. Not sure if some tool could inject err like this. Feel free to
ignore this.
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/5] Fix potential issues for siw
2023-07-28 1:16 ` Guoqing Jiang
@ 2023-07-28 2:29 ` Guoqing Jiang
2023-07-28 11:10 ` Bernard Metzler
2023-07-28 9:36 ` Bernard Metzler
1 sibling, 1 reply; 15+ messages in thread
From: Guoqing Jiang @ 2023-07-28 2:29 UTC (permalink / raw)
To: Jason Gunthorpe, Bernard Metzler
Cc: leon@kernel.org, linux-rdma@vger.kernel.org
On 7/28/23 09:16, Guoqing Jiang wrote:
>
>
> On 7/28/23 01:29, Jason Gunthorpe wrote:
>> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
>>>
>>>> -----Original Message-----
>>>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
>>>> Sent: Thursday, 27 July 2023 16:04
>>>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca;
>>>> leon@kernel.org
>>>> Cc: linux-rdma@vger.kernel.org
>>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
>>>>
>>>> Hi,
>>>>
>>>> Several issues appeared if we rmmod siw module after failed to insert
>>>> the module (with manual change like below).
>>>>
>>>> --- a/drivers/infiniband/sw/siw/siw_main.c
>>>> +++ b/drivers/infiniband/sw/siw/siw_main.c
>>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
>>>> if (rv)
>>>> goto out_error;
>>>>
>>>> + goto out_error;
>>>> rdma_link_register(&siw_link_ops);
>>>>
>>>> Basically, these issues are double free, use before initalization or
>>>> null pointer dereference. For more details, pls review the individual
>>>> patch.
>>>>
>>>> Thanks,
>>>> Guoqing
>>> Hi Guoqing,
>>>
>>> very good catch, thank you. I was under the wrong assumption a
>>> module is not loaded if the init_module() returns a value.
>> I think that is actually true, isn't it? I'm confused?
>
> Yes, you are right. Since rv is still 0, so the module appears in the
> kernel. Not sure if some tool could inject err like this. Feel free to
> ignore this.
The below trace happened if I run a stress test with load siw module and
unload siw in a loop, which should be fixed by patch 5, so I think we
need to apply it, what do you think?
[ 414.537961] BUG: spinlock bad magic on CPU#0, modprobe/3722
[ 414.537965] lock: 0xffff9d847bc380e8, .magic: 00000000, .owner:
<none>/-1, .owner_cpu: 0
[ 414.537969] CPU: 0 PID: 3722 Comm: modprobe Tainted: G OE
6.5.0-rc3+ #16
[ 414.537971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 414.537973] Call Trace:
[ 414.537973] <TASK>
[ 414.537975] dump_stack_lvl+0x77/0xd0
[ 414.537979] dump_stack+0x10/0x20
[ 414.537981] spin_bug+0xa5/0xd0
[ 414.537984] do_raw_spin_lock+0x90/0xd0
[ 414.537985] _raw_spin_lock_irqsave+0x56/0x80
[ 414.537988] ? __wake_up_common_lock+0x63/0xd0
[ 414.537990] __wake_up_common_lock+0x63/0xd0
[ 414.537992] __wake_up+0x13/0x30
[ 414.537994] siw_stop_tx_thread+0x49/0x70 [siw]
[ 414.538000] siw_exit_module+0x30/0x620 [siw]
[ 414.538006] __do_sys_delete_module.constprop.0+0x18f/0x300
[ 414.538008] ? syscall_enter_from_user_mode+0x21/0x70
[ 414.538010] ? __this_cpu_preempt_check+0x13/0x20
[ 414.538012] ? lockdep_hardirqs_on+0x86/0x120
[ 414.538014] __x64_sys_delete_module+0x12/0x20
[ 414.538016] do_syscall_64+0x5c/0x90
[ 414.538019] ? do_syscall_64+0x69/0x90
[ 414.538020] ? __this_cpu_preempt_check+0x13/0x20
[ 414.538022] ? lockdep_hardirqs_on+0x86/0x120
[ 414.538024] ? syscall_exit_to_user_mode+0x37/0x50
[ 414.538025] ? do_syscall_64+0x69/0x90
[ 414.538026] ? syscall_exit_to_user_mode+0x37/0x50
[ 414.538027] ? do_syscall_64+0x69/0x90
[ 414.538029] ? syscall_exit_to_user_mode+0x37/0x50
[ 414.538030] ? do_syscall_64+0x69/0x90
[ 414.538032] ? irqentry_exit_to_user_mode+0x25/0x30
[ 414.538033] ? irqentry_exit+0x77/0xb0
[ 414.538034] ? exc_page_fault+0xae/0x240
[ 414.538036] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 414.538038] RIP: 0033:0x7f177eb26c9b
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH 0/5] Fix potential issues for siw
2023-07-28 1:16 ` Guoqing Jiang
2023-07-28 2:29 ` Guoqing Jiang
@ 2023-07-28 9:36 ` Bernard Metzler
1 sibling, 0 replies; 15+ messages in thread
From: Bernard Metzler @ 2023-07-28 9:36 UTC (permalink / raw)
To: Guoqing Jiang, Jason Gunthorpe
Cc: leon@kernel.org, linux-rdma@vger.kernel.org
> -----Original Message-----
> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> Sent: Friday, 28 July 2023 03:16
> To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>
> Cc: leon@kernel.org; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] Re: [PATCH 0/5] Fix potential issues for siw
>
>
>
> On 7/28/23 01:29, Jason Gunthorpe wrote:
> > On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
> >>
> >>> -----Original Message-----
> >>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> >>> Sent: Thursday, 27 July 2023 16:04
> >>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca; leon@kernel.org
> >>> Cc: linux-rdma@vger.kernel.org
> >>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
> >>>
> >>> Hi,
> >>>
> >>> Several issues appeared if we rmmod siw module after failed to insert
> >>> the module (with manual change like below).
> >>>
> >>> --- a/drivers/infiniband/sw/siw/siw_main.c
> >>> +++ b/drivers/infiniband/sw/siw/siw_main.c
> >>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> >>> if (rv)
> >>> goto out_error;
> >>>
> >>> + goto out_error;
> >>> rdma_link_register(&siw_link_ops);
> >>>
> >>> Basically, these issues are double free, use before initalization or
> >>> null pointer dereference. For more details, pls review the individual
> >>> patch.
> >>>
> >>> Thanks,
> >>> Guoqing
> >> Hi Guoqing,
> >>
> >> very good catch, thank you. I was under the wrong assumption a
> >> module is not loaded if the init_module() returns a value.
> > I think that is actually true, isn't it? I'm confused?
>
> Yes, you are right. Since rv is still 0, so the module appears in the
> kernel. Not sure if some tool could inject err like this. Feel free to
> ignore this.
>
Right I came to the same conclusion today 😉 -
if you set a return value in you test the module is not loaded.
> Thanks,
> Guoqing
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [PATCH 0/5] Fix potential issues for siw
2023-07-28 2:29 ` Guoqing Jiang
@ 2023-07-28 11:10 ` Bernard Metzler
0 siblings, 0 replies; 15+ messages in thread
From: Bernard Metzler @ 2023-07-28 11:10 UTC (permalink / raw)
To: Guoqing Jiang, Jason Gunthorpe
Cc: leon@kernel.org, linux-rdma@vger.kernel.org
> -----Original Message-----
> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> Sent: Friday, 28 July 2023 04:29
> To: Jason Gunthorpe <jgg@ziepe.ca>; Bernard Metzler <BMT@zurich.ibm.com>
> Cc: leon@kernel.org; linux-rdma@vger.kernel.org
> Subject: [EXTERNAL] Re: [PATCH 0/5] Fix potential issues for siw
>
>
>
> On 7/28/23 09:16, Guoqing Jiang wrote:
> >
> >
> > On 7/28/23 01:29, Jason Gunthorpe wrote:
> >> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Guoqing Jiang <guoqing.jiang@linux.dev>
> >>>> Sent: Thursday, 27 July 2023 16:04
> >>>> To: Bernard Metzler <BMT@zurich.ibm.com>; jgg@ziepe.ca;
> >>>> leon@kernel.org
> >>>> Cc: linux-rdma@vger.kernel.org
> >>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
> >>>>
> >>>> Hi,
> >>>>
> >>>> Several issues appeared if we rmmod siw module after failed to insert
> >>>> the module (with manual change like below).
> >>>>
> >>>> --- a/drivers/infiniband/sw/siw/siw_main.c
> >>>> +++ b/drivers/infiniband/sw/siw/siw_main.c
> >>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> >>>> if (rv)
> >>>> goto out_error;
> >>>>
> >>>> + goto out_error;
> >>>> rdma_link_register(&siw_link_ops);
> >>>>
> >>>> Basically, these issues are double free, use before initalization or
> >>>> null pointer dereference. For more details, pls review the individual
> >>>> patch.
> >>>>
> >>>> Thanks,
> >>>> Guoqing
> >>> Hi Guoqing,
> >>>
> >>> very good catch, thank you. I was under the wrong assumption a
> >>> module is not loaded if the init_module() returns a value.
> >> I think that is actually true, isn't it? I'm confused?
> >
> > Yes, you are right. Since rv is still 0, so the module appears in the
> > kernel. Not sure if some tool could inject err like this. Feel free to
> > ignore this.
>
> The below trace happened if I run a stress test with load siw module and
> unload siw in a loop, which should be fixed by patch 5, so I think we
> need to apply it, what do you think?
I see. makes sense -- you are removing the module while some tx
threads did not get a chance to init their state.
Why don't we better cleanly initialize the tasks state before
spawning it? Your extra task parameter 'running' smells like
it could lead to a similar issue - at module removal, the
task may have not get a chance to set 'running' true, so it is not
stopped and finally left alone.
Working on an according patch...
Thanks,
Bernard.
>
> [ 414.537961] BUG: spinlock bad magic on CPU#0, modprobe/3722
> [ 414.537965] lock: 0xffff9d847bc380e8, .magic: 00000000, .owner:
> <none>/-1, .owner_cpu: 0
> [ 414.537969] CPU: 0 PID: 3722 Comm: modprobe Tainted: G OE
> 6.5.0-rc3+ #16
> [ 414.537971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
> [ 414.537973] Call Trace:
> [ 414.537973] <TASK>
> [ 414.537975] dump_stack_lvl+0x77/0xd0
> [ 414.537979] dump_stack+0x10/0x20
> [ 414.537981] spin_bug+0xa5/0xd0
> [ 414.537984] do_raw_spin_lock+0x90/0xd0
> [ 414.537985] _raw_spin_lock_irqsave+0x56/0x80
> [ 414.537988] ? __wake_up_common_lock+0x63/0xd0
> [ 414.537990] __wake_up_common_lock+0x63/0xd0
> [ 414.537992] __wake_up+0x13/0x30
> [ 414.537994] siw_stop_tx_thread+0x49/0x70 [siw]
> [ 414.538000] siw_exit_module+0x30/0x620 [siw]
> [ 414.538006] __do_sys_delete_module.constprop.0+0x18f/0x300
> [ 414.538008] ? syscall_enter_from_user_mode+0x21/0x70
> [ 414.538010] ? __this_cpu_preempt_check+0x13/0x20
> [ 414.538012] ? lockdep_hardirqs_on+0x86/0x120
> [ 414.538014] __x64_sys_delete_module+0x12/0x20
> [ 414.538016] do_syscall_64+0x5c/0x90
> [ 414.538019] ? do_syscall_64+0x69/0x90
> [ 414.538020] ? __this_cpu_preempt_check+0x13/0x20
> [ 414.538022] ? lockdep_hardirqs_on+0x86/0x120
> [ 414.538024] ? syscall_exit_to_user_mode+0x37/0x50
> [ 414.538025] ? do_syscall_64+0x69/0x90
> [ 414.538026] ? syscall_exit_to_user_mode+0x37/0x50
> [ 414.538027] ? do_syscall_64+0x69/0x90
> [ 414.538029] ? syscall_exit_to_user_mode+0x37/0x50
> [ 414.538030] ? do_syscall_64+0x69/0x90
> [ 414.538032] ? irqentry_exit_to_user_mode+0x25/0x30
> [ 414.538033] ? irqentry_exit+0x77/0xb0
> [ 414.538034] ? exc_page_fault+0xae/0x240
> [ 414.538036] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 414.538038] RIP: 0033:0x7f177eb26c9b
>
> Thanks,
> Guoqing
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/5] Fix potential issues for siw
2023-07-27 14:03 [PATCH 0/5] Fix potential issues for siw Guoqing Jiang
` (5 preceding siblings ...)
2023-07-27 17:17 ` [PATCH 0/5] Fix potential issues for siw Bernard Metzler
@ 2023-08-09 19:04 ` Jason Gunthorpe
2023-08-10 1:14 ` Guoqing Jiang
6 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2023-08-09 19:04 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: bmt, leon, linux-rdma
On Thu, Jul 27, 2023 at 10:03:44PM +0800, Guoqing Jiang wrote:
> Hi,
>
> Several issues appeared if we rmmod siw module after failed to insert
> the module (with manual change like below).
>
> --- a/drivers/infiniband/sw/siw/siw_main.c
> +++ b/drivers/infiniband/sw/siw/siw_main.c
> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> if (rv)
> goto out_error;
>
> + goto out_error;
> rdma_link_register(&siw_link_ops);
>
> Basically, these issues are double free, use before initalization or
> null pointer dereference. For more details, pls review the individual
> patch.
>
> Thanks,
> Guoqing
>
> Guoqing Jiang (5):
> RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
> RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
> RDMA/siw: Initialize siw_link_ops.list
> RDMA/siw: Set siw_crypto_shash to NULL after it is freed
> RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread
What is the status of this series? Bernards fix for some of this was
already merged and this doesn't apply, so regardless it needs
resending.
Jason
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/5] Fix potential issues for siw
2023-08-09 19:04 ` Jason Gunthorpe
@ 2023-08-10 1:14 ` Guoqing Jiang
0 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2023-08-10 1:14 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: bmt, leon, linux-rdma
On 8/10/23 03:04, Jason Gunthorpe wrote:
> On Thu, Jul 27, 2023 at 10:03:44PM +0800, Guoqing Jiang wrote:
>> Hi,
>>
>> Several issues appeared if we rmmod siw module after failed to insert
>> the module (with manual change like below).
>>
>> --- a/drivers/infiniband/sw/siw/siw_main.c
>> +++ b/drivers/infiniband/sw/siw/siw_main.c
>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
>> if (rv)
>> goto out_error;
>>
>> + goto out_error;
>> rdma_link_register(&siw_link_ops);
>>
>> Basically, these issues are double free, use before initalization or
>> null pointer dereference. For more details, pls review the individual
>> patch.
>>
>> Thanks,
>> Guoqing
>>
>> Guoqing Jiang (5):
>> RDMA/siw: Set siw_cm_wq to NULL after it is destroyed
>> RDMA/siw: Ensure siw_destroy_cpulist can be called more than once
>> RDMA/siw: Initialize siw_link_ops.list
>> RDMA/siw: Set siw_crypto_shash to NULL after it is freed
>> RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread
> What is the status of this series? Bernards fix for some of this was
> already merged and this doesn't apply, so regardless it needs
> resending.
After Bernard's fix was already merged, and other patches are not needed
since it is impossible to trigger them in real life. So pls just ignore this
series.
Thanks,
Guoqing
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-08-10 1:14 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-27 14:03 [PATCH 0/5] Fix potential issues for siw Guoqing Jiang
2023-07-27 14:03 ` [PATCH 1/5] RDMA/siw: Set siw_cm_wq to NULL after it is destroyed Guoqing Jiang
2023-07-27 14:03 ` [PATCH 2/5] RDMA/siw: Ensure siw_destroy_cpulist can be called more than once Guoqing Jiang
2023-07-27 14:03 ` [PATCH 3/5] RDMA/siw: Initialize siw_link_ops.list Guoqing Jiang
2023-07-27 14:03 ` [PATCH 4/5] RDMA/siw: Set siw_crypto_shash to NULL after it is freed Guoqing Jiang
2023-07-27 14:03 ` [PATCH 5/5] RDMA/siw: Don't call wake_up unconditionally in siw_stop_tx_thread Guoqing Jiang
2023-07-27 17:17 ` [PATCH 0/5] Fix potential issues for siw Bernard Metzler
2023-07-27 17:29 ` Jason Gunthorpe
2023-07-27 18:15 ` Bart Van Assche
2023-07-28 1:16 ` Guoqing Jiang
2023-07-28 2:29 ` Guoqing Jiang
2023-07-28 11:10 ` Bernard Metzler
2023-07-28 9:36 ` Bernard Metzler
2023-08-09 19:04 ` Jason Gunthorpe
2023-08-10 1:14 ` Guoqing Jiang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).