* [PATCH for-rc] RDMA/cma: fix race between addr_handler and resolve_route
@ 2020-04-03 18:43 Håkon Bugge
From: Håkon Bugge @ 2020-04-03 18:43 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: linux-rdma, george.kennedy

A syzkaller test hits a NULL pointer dereference in
rdma_resolve_route():

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
Modules linked in:
CPU: 0 PID: 7185 Comm: syz-executor670 Not tainted 4.14.147 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8880a7c4c0c0 task.stack: ffff888084398000
RIP: 0010:rdma_cap_ib_sa include/rdma/ib_verbs.h:2682 [inline]
RIP: 0010:rdma_resolve_route+0x11f/0x2bf0 drivers/infiniband/core/cma.c:2678
RSP: 0018:ffff88808439fa20 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000029 RSI: 0000000000000001 RDI: 0000000000000148
RBP: ffff88808439fbc0 R08: ffff8880a7c4c0c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88809f540dc0
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88809f540f78
FS:  00007f03a8778700(0000) GS:ffff8880aee00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f03a8777e78 CR3: 000000008afd9000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  ucma_resolve_route+0xab/0x100 drivers/infiniband/core/ucma.c:740
  ucma_write+0x231/0x310 drivers/infiniband/core/ucma.c:1672
  __vfs_write+0x105/0x6b0 fs/read_write.c:480
  vfs_write+0x198/0x500 fs/read_write.c:544
  SYSC_write fs/read_write.c:590 [inline]
  SyS_write+0xfd/0x230 fs/read_write.c:582
  do_syscall_64+0x1e8/0x640 arch/x86/entry/common.c:292
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x446a49
RSP: 002b:00007f03a8777db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000006dbc48 RCX: 0000000000446a49
RDX: 0000000000000010 RSI: 0000000020000200 RDI: 0000000000000004
RBP: 00000000006dbc40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc4c
R13: 00007ffea0dde3ef R14: 00007f03a87789c0 R15: 0000000000000000
Code: 00 48 c1 ea 03 80 3c 02 00 0f 85 ea 24 00 00 48 b8 00 00 00 00 00 fc
ff df 4d 8b 34 24 49 8d be 48 01 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00
0f 85 d1 24 00 00 48 b8 00 00 00 00 00 fc ff df 41
RIP: rdma_cap_ib_sa include/rdma/ib_verbs.h:2682 [inline] RSP: ffff88808439fa20
RIP: rdma_resolve_route+0x11f/0x2bf0 drivers/infiniband/core/cma.c:2678 RSP: ffff88808439fa20

The crash is caused by a race between the CM state updates in
addr_handler() and rdma_resolve_route().

The syzkaller program executes the following sequence:

1. rdma_create_id()
2. rdma_resolve_ip()
3. rdma_resolve_route()

Please note the lack of rdma_get_cm_event() between 2 and 3 above. The
following happens:

Thread 1                      CM_ID state                    Thread 2
rdma_create_id()                     IDLE
rdma_resolve_ip()              ADDR_QUERY
                                                       addr_handler()
                            ADDR_RESOLVED                 (set state)
rdma_resolve_route()
  (check state and set to)    ROUTE_QUERY
                                           --> cma_cm_event_handler()
                               DESTROYING             (returns error)
                                                --> rdma_destroy_id()
--> rdma_cap_ib_sa()
(but cm_id has been destroyed)

I see two solutions: a) let addr_handler() change the state only once,
to one of ADDR_RESOLVED, ADDR_BOUND, or DESTROYING, or b) add mutex
locking to rdma_resolve_route(). Solution b) requires addr_handler() to
relinquish the lock when calling the event handler, as the event
handler typically calls rdma_resolve_route() before returning.

The b) solution is implemented herein.

Reported-by: syzbot+69226cc89d87fd4f8f40@syzkaller.appspotmail.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Cc: stable@vger.kernel.org
---
 drivers/infiniband/core/cma.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2dec3a02ab9f..45f26bc0fbfe 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2979,8 +2979,11 @@ int rdma_resolve_route(struct rdma_cm_id *id, unsigned long timeout_ms)
 	int ret;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ROUTE_QUERY))
+	mutex_lock(&id_priv->handler_mutex);
+	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED, RDMA_CM_ROUTE_QUERY)) {
+		mutex_unlock(&id_priv->handler_mutex);
 		return -EINVAL;
+	}
 
 	atomic_inc(&id_priv->refcount);
 	if (rdma_cap_ib_sa(id->device, id->port_num))
@@ -2995,9 +2998,12 @@ int rdma_resolve_route(struct rdma_cm_id *id, unsigned long timeout_ms)
 	if (ret)
 		goto err;
 
+	mutex_unlock(&id_priv->handler_mutex);
 	return 0;
 err:
 	cma_comp_exch(id_priv, RDMA_CM_ROUTE_QUERY, RDMA_CM_ADDR_RESOLVED);
+	mutex_unlock(&id_priv->handler_mutex);
+
 	cma_deref_id(id_priv);
 	return ret;
 }
@@ -3085,6 +3091,7 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 	struct rdma_cm_event event = {};
 	struct sockaddr *addr;
 	struct sockaddr_storage old_addr;
+	int ret;
 
 	mutex_lock(&id_priv->handler_mutex);
 	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY,
@@ -3119,7 +3126,11 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 	} else
 		event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
 
-	if (cma_cm_event_handler(id_priv, &event)) {
+	mutex_unlock(&id_priv->handler_mutex);
+	ret = cma_cm_event_handler(id_priv, &event);
+	mutex_lock(&id_priv->handler_mutex);
+
+	if (ret) {
 		cma_exch(id_priv, RDMA_CM_DESTROYING);
 		mutex_unlock(&id_priv->handler_mutex);
 		rdma_destroy_id(&id_priv->id);
-- 
2.20.1


* general protection fault in rdma_resolve_route
@ 2018-04-19 16:04 syzbot
From: syzbot @ 2018-04-19 16:04 UTC
  To: danielj, dasaratharaman.chandramouli, dledford, jgg, leon,
	linux-kernel, linux-rdma, monis, parav, swise, syzkaller-bugs

Hello,

syzbot hit the following crash on upstream commit
a27fc14219f2e3c4a46ba9177b04d9b52c875532 (Mon Apr 16 21:07:39 2018 +0000)
Merge branch 'parisc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=17c13600b3977aa8ef7f

So far this crash happened 2 times on upstream.
Unfortunately, I don't have any reproducer for this crash yet.
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6198183931674624
Kernel config: https://syzkaller.appspot.com/x/.config?id=-5914490758943236750
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+17c13600b3977aa8ef7f@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 750 Comm: syz-executor4 Not tainted 4.17.0-rc1+ #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rdma_cap_ib_sa include/rdma/ib_verbs.h:2840 [inline]
RIP: 0010:rdma_resolve_route+0x134/0x2160 drivers/infiniband/core/cma.c:2668
RSP: 0018:ffff8801b3e87850 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8801abf92c00 RCX: 0000000000000029
RDX: dffffc0000000000 RSI: 0000000000000004 RDI: 0000000000000148
RBP: ffff8801b3e87a00 R08: ffffed00357f25e5 R09: ffffed00357f25e4
R10: ffffed00357f25e4 R11: ffff8801abf92f23 R12: 1ffff100367d0f12
R13: dffffc0000000000 R14: ffff8801abf92db8 R15: 0000000000000000
FS:  00007f673e752700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000a3eab8 CR3: 00000001b10e7000 CR4: 00000000001426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  ucma_resolve_route+0x179/0x1c0 drivers/infiniband/core/ucma.c:741
  ucma_write+0x328/0x410 drivers/infiniband/core/ucma.c:1664
  __vfs_write+0x10b/0x880 fs/read_write.c:485
  vfs_write+0x1f8/0x560 fs/read_write.c:549
  ksys_write+0xf9/0x250 fs/read_write.c:598
  __do_sys_write fs/read_write.c:610 [inline]
  __se_sys_write fs/read_write.c:607 [inline]
  __x64_sys_write+0x73/0xb0 fs/read_write.c:607
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455329
RSP: 002b:00007f673e751c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f673e7526d4 RCX: 0000000000455329
RDX: 0000000000000010 RSI: 0000000020000100 RDI: 0000000000000014
RBP: 000000000072c010 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000006c3 R14: 00000000006fd2e8 R15: 0000000000000002
Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 14 1c 00 00 48 ba 00 00 00 00 00  
fc ff df 48 8b 03 48 8d b8 48 01 00 00 48 89 f9 48 c1 e9 03 <80> 3c 11 00  
0f 85 d7 1b 00 00 45 0f b6 ef 49 c1 e5 04 4c 03 a8
RIP: rdma_cap_ib_sa include/rdma/ib_verbs.h:2840 [inline] RSP: ffff8801b3e87850
RIP: rdma_resolve_route+0x134/0x2160 drivers/infiniband/core/cma.c:2668 RSP: ffff8801b3e87850
---[ end trace c34c2fb6aeff4a19 ]---


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.
Note: all commands must start from beginning of the line in the email body.


Thread overview: 18+ messages
2020-04-03 18:43 [PATCH for-rc] RDMA/cma: fix race between addr_handler and resolve_route Håkon Bugge
2020-04-03 18:57 ` Jason Gunthorpe
2020-04-03 19:07   ` Håkon Bugge
2020-04-03 19:36     ` Jason Gunthorpe
2020-04-06 17:00       ` Håkon Bugge
2020-04-06 17:31         ` Jason Gunthorpe
2020-04-06 18:02           ` Håkon Bugge
2020-04-06 18:10             ` Jason Gunthorpe
2020-04-14 10:34               ` Håkon Bugge
2020-04-14 12:50                 ` Jason Gunthorpe
2020-04-14 13:57                   ` Håkon Bugge
2020-04-14 16:11                     ` Jason Gunthorpe
2020-04-16 13:33                       ` Håkon Bugge
2020-04-16 18:55                         ` Jason Gunthorpe
2021-04-28  6:03   ` general protection fault in rdma_resolve_route syzbot
  -- strict thread matches above, loose matches on Subject: below --
2018-04-19 16:04 syzbot
2018-04-19 16:12 ` Parav Pandit
2018-04-19 16:23   ` Jason Gunthorpe
