* [PATCH] RDMA/cma: Fix null pointer issue
@ 2018-02-01 6:50 Guanglei Li
From: Guanglei Li @ 2018-02-01 6:50 UTC
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, jgg-VPRAkNaXOzVWk0Htik3J/w
Cc: guanglei.li-QHcLZuEGTsvQT0dZR+AlfA,
junxiao.bi-QHcLZuEGTsvQT0dZR+AlfA,
honglei.wang-QHcLZuEGTsvQT0dZR+AlfA
Scenario:
1. A port goes down and failover is triggered.
2. An application issues the rds_bind syscall.
PID: 47039 TASK: ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6"
#0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
#1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
#2 [ffff898e35f15b30] oops_end at ffffffff8150f518
#3 [ffff898e35f15b60] no_context at ffffffff8104854c
#4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
#5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
#6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
#7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
[exception RIP: unknown or invalid address]
RIP: 0000000000000000 RSP: ffff898e35f15dc8 RFLAGS: 00010282
RAX: 00000000fffffffe RBX: ffff889b77f6fc00 RCX:ffffffff81c99d88
RDX: 0000000000000000 RSI: ffff896019ee08e8 RDI:ffff889b77f6fc00
RBP: ffff898e35f15df0 R8: ffff896019ee08c8 R9:0000000000000000
R10: 0000000000000400 R11: 0000000000000000 R12:ffff896019ee08c0
R13: ffff889b77f6fe68 R14: ffffffff81c99d80 R15: ffffffffa022a1e0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
#9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
#10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
#11 [ffff898e35f15ee8] kthread at ffffffff81090fe6
PID: 45659 TASK: ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap"
#0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
#1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
#2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
#3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
#4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
#5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
#6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
#7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
#8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670
Race condition:

PID: 45659                                     PID: 47039
rds_ib_laddr_check
/* create id_priv with a NULL event_handler */
rdma_create_id
rdma_bind_addr
cma_acquire_dev
/* add id_priv to cma_dev->id_list */
cma_attach_to_dev
                                               cma_ndev_work_handler
                                               /* event_handler is NULL */
                                               id_priv->id.event_handler

Signed-off-by: Guanglei Li <guanglei.li-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Honglei Wang <honglei.wang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/core/cma.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index e66963c..d9ca943 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2431,9 +2431,12 @@ static void cma_ndev_work_handler(struct work_struct *_work)
 	    id_priv->state == RDMA_CM_DEVICE_REMOVAL)
 		goto out;
 
-	if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
-		cma_exch(id_priv, RDMA_CM_DESTROYING);
-		destroy = 1;
+	/*event_handler is null when create cm id by calling rds_ib_laddr_check*/
+	if (id_priv->id.event_handler) {
+		if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
+			cma_exch(id_priv, RDMA_CM_DESTROYING);
+			destroy = 1;
+		}
 	}
 
 out:
--
2.7.4
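
The guard added by the patch can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct model_id`, `struct model_event`, `model_dispatch()` and `model_reject_all()` are hypothetical stand-ins for `struct rdma_cm_id`, the queued event, `cma_ndev_work_handler()` and a consumer callback. It only shows that a dispatcher which tests the function pointer first survives an ID created with a NULL handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's types. */
struct model_event { int type; };
struct model_id;
typedef int (*model_handler_t)(struct model_id *id,
			       const struct model_event *ev);

struct model_id {
	model_handler_t event_handler;	/* may be NULL, as for an ID created with no handler */
	int destroying;			/* set when the handler asks for destruction */
};

/* Returns 1 if the ID was marked for destruction, 0 otherwise. */
int model_dispatch(struct model_id *id, const struct model_event *ev)
{
	assert(id != NULL && ev != NULL);

	/* Guard: only call through the pointer when a handler is registered. */
	if (id->event_handler && id->event_handler(id, ev)) {
		id->destroying = 1;
		return 1;
	}
	return 0;
}

/* Example handler: a nonzero return asks the dispatcher to destroy the ID. */
int model_reject_all(struct model_id *id, const struct model_event *ev)
{
	(void)id;
	(void)ev;
	return 1;
}
```

Calling `model_dispatch()` on an ID whose `event_handler` is NULL simply returns 0; without the guard the same call would jump to address 0, which matches the `RIP: 0000000000000000` in the crash dump above.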
* Re: [PATCH] RDMA/cma: Fix null pointer issue
@ 2018-02-01 7:28 yanjzhu
From: yanjzhu @ 2018-02-01 7:28 UTC
To: Guanglei Li, dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, jgg-VPRAkNaXOzVWk0Htik3J/w
Cc: junxiao.bi-QHcLZuEGTsvQT0dZR+AlfA,
honglei.wang-QHcLZuEGTsvQT0dZR+AlfA

On 02/01/2018 02:50 PM, Guanglei Li wrote:
> [...]
> @@ -2431,9 +2431,12 @@ static void cma_ndev_work_handler(struct work_struct *_work)
>  	    id_priv->state == RDMA_CM_DEVICE_REMOVAL)
>  		goto out;
>
> -	if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
> -		cma_exch(id_priv, RDMA_CM_DESTROYING);
> -		destroy = 1;
> +	/*event_handler is null when create cm id by calling rds_ib_laddr_check*/

The comment should be formatted as /* xxxxx */.

Zhu Yanjun

> +	if (id_priv->id.event_handler) {
> +		if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
> +			cma_exch(id_priv, RDMA_CM_DESTROYING);
> +			destroy = 1;
> +		}
>  	}
>
>  out:

* Re: [PATCH] RDMA/cma: Fix null pointer issue
@ 2018-02-01 7:47 Leon Romanovsky
From: Leon Romanovsky @ 2018-02-01 7:47 UTC
To: Guanglei Li
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, jgg-VPRAkNaXOzVWk0Htik3J/w,
junxiao.bi-QHcLZuEGTsvQT0dZR+AlfA,
honglei.wang-QHcLZuEGTsvQT0dZR+AlfA

On Thu, Feb 01, 2018 at 02:50:42PM +0800, Guanglei Li wrote:
> [...]
> @@ -2431,9 +2431,12 @@ static void cma_ndev_work_handler(struct work_struct *_work)
>  	    id_priv->state == RDMA_CM_DEVICE_REMOVAL)
>  		goto out;
>
> -	if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
> -		cma_exch(id_priv, RDMA_CM_DESTROYING);
> -		destroy = 1;
> +	/*event_handler is null when create cm id by calling rds_ib_laddr_check*/
> +	if (id_priv->id.event_handler) {
> +		if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
> +			cma_exch(id_priv, RDMA_CM_DESTROYING);
> +			destroy = 1;
> +		}
>  	}

The analysis looks correct to me, but the solution less so.

static int rds_ib_laddr_check(struct net *net, __be32 addr)
<...>
    cm_id = rdma_create_id(&init_net, NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
                                      ^^^^ this part looks suspicious

I would say that this is a misuse of the API and not a CMA "issue",
especially given that you already have rds_rdma_cm_event_handler()
with all the proper locking.

Thanks

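
The alternative Leon points at, having the caller register a handler rather than teaching the dispatcher to tolerate NULL, can be sketched in userspace as well. Everything below is an assumption made for illustration: `struct sketch_id`, `sketch_init_id()`, `sketch_dispatch()` and `sketch_probe_handler()` are hypothetical names, and rejecting a NULL handler at creation time is one possible way to enforce the contract, not something the thread or the kernel API prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct sketch_event { int type; };
struct sketch_id;
typedef int (*sketch_handler_t)(struct sketch_id *id,
				const struct sketch_event *ev);

struct sketch_id {
	sketch_handler_t event_handler;	/* never NULL after a successful init */
	int events_seen;
};

/* Do-nothing handler that a probing caller could register instead of NULL. */
int sketch_probe_handler(struct sketch_id *id, const struct sketch_event *ev)
{
	(void)ev;
	id->events_seen++;
	return 0;	/* zero: keep the ID alive */
}

/* Assumption: a missing handler is treated as caller error at creation. */
int sketch_init_id(struct sketch_id *id, sketch_handler_t handler)
{
	assert(id != NULL);
	if (!handler)
		return -1;
	id->event_handler = handler;
	id->events_seen = 0;
	return 0;
}

/* No NULL check here: sketch_init_id() guarantees a handler is present. */
int sketch_dispatch(struct sketch_id *id, const struct sketch_event *ev)
{
	return id->event_handler(id, ev);
}
```

With this arrangement the dispatcher stays unconditional and the check moves to the point where the ID is created, which is the layering Leon's reply argues for.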