* general protection fault in can_rx_register
@ 2020-01-17 13:46 syzbot
2020-01-17 20:02 ` Oliver Hartkopp
0 siblings, 1 reply; 18+ messages in thread
From: syzbot @ 2020-01-17 13:46 UTC (permalink / raw)
To: davem, dev.kurt, linux-can, linux-kernel, mkl, netdev, o.rempel,
socketcan, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: f5ae2ea6 Fix built-in early-load Intel microcode alignment
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3
dashboard link: https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/
c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000
The bug was bisected to:
commit 9868b5d44f3df9dd75247acd23dddff0a42f79be
Author: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Date: Mon Oct 8 09:48:33 2018 +0000
can: introduce CAN_REQUIRED_SIZE macro
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000
final crash: https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000
console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com
Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro")
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c
89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00
00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
raw_enable_filters net/can/raw.c:189 [inline]
raw_enable_allfilters net/can/raw.c:255 [inline]
raw_bind+0x326/0x1230 net/can/raw.c:428
__sys_bind+0x2bd/0x3a0 net/socket.c:1649
__do_sys_bind net/socket.c:1660 [inline]
__se_sys_bind net/socket.c:1658 [inline]
__x64_sys_bind+0x7a/0x90 net/socket.c:1658
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446ba9
Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9
RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000
R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c
R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c
Modules linked in:
---[ end trace 0dedabb13ca8e7d7 ]---
RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline]
RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476
Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c
89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00
00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89
RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0
RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94
R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690
FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: general protection fault in can_rx_register 2020-01-17 13:46 general protection fault in can_rx_register syzbot @ 2020-01-17 20:02 ` Oliver Hartkopp 2020-01-20 9:11 ` Kurt Van Dijck 0 siblings, 1 reply; 18+ messages in thread From: Oliver Hartkopp @ 2020-01-17 20:02 UTC (permalink / raw) To: dev.kurt, mkl, o.rempel Cc: syzbot, davem, linux-can, linux-kernel, netdev, syzkaller-bugs Hi Marc, Oleksij, Kurt, On 17/01/2020 14.46, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit: f5ae2ea6 Fix built-in early-load Intel microcode alignment > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3 > dashboard link: > https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953 > compiler: clang version 10.0.0 > (https://github.com/llvm/llvm-project/ > c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000 > > The bug was bisected to: > > commit 9868b5d44f3df9dd75247acd23dddff0a42f79be > Author: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> > Date: Mon Oct 8 09:48:33 2018 +0000 > > can: introduce CAN_REQUIRED_SIZE macro > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000 > final crash: https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000 > console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000 > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com > Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro") > > kasan: CONFIG_KASAN_INLINE enabled > kasan: GPF could be caused by NULL-ptr deref or user memory access > general protection fault: 0000 [#1] PREEMPT SMP KASAN > CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline] > RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476 include/linux/rculist.h:528 is struct hlist_node *first = h->first; which would mean that 'h' must be NULL. But the h parameter is rcv_list from rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); Which can not return NULL - at least when dev_rcv_lists is a proper pointer to the dev_rcv_lists provided by can_dev_rcv_lists_find(). So either dev->ml_priv is NULL in the case of having a CAN interface (here vxcan) or we have not allocated net->can.rx_alldev_list in can_pernet_init() properly (which would lead to an -ENOMEM which is reported to whom?). Hm. I'm lost. Any ideas? Regards, Oliver > Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa > 4c 89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c > 05 00 00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89 > RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202 > RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0 > RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94 > R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008 > R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690 > FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > raw_enable_filters net/can/raw.c:189 [inline] > raw_enable_allfilters net/can/raw.c:255 [inline] > raw_bind+0x326/0x1230 net/can/raw.c:428 > __sys_bind+0x2bd/0x3a0 net/socket.c:1649 > __do_sys_bind net/socket.c:1660 [inline] > __se_sys_bind net/socket.c:1658 [inline] > __x64_sys_bind+0x7a/0x90 net/socket.c:1658 > do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x446ba9 > Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 > f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 > f0 ff ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 > RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9 > RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003 > RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000 > R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c > R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c > Modules linked in: > ---[ end trace 0dedabb13ca8e7d7 ]--- > RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline] > RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476 > Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa > 4c 89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c > 05 00 00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89 > RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202 > RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0 > RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94 > R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008 > R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690 > FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > --- > This bug is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this bug report. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > For information about bisection process see: > https://goo.gl/tpsmEJ#bisection > syzbot can test patches for this bug, for details see: > https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-17 20:02 ` Oliver Hartkopp @ 2020-01-20 9:11 ` Kurt Van Dijck 2020-01-20 9:22 ` Dmitry Vyukov 0 siblings, 1 reply; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-20 9:11 UTC (permalink / raw) To: Oliver Hartkopp Cc: mkl, o.rempel, syzbot, davem, linux-can, linux-kernel, netdev, syzkaller-bugs If bisect was right with this: > >The bug was bisected to: > > > >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be > >Author: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> > >Date: Mon Oct 8 09:48:33 2018 +0000 > > > > can: introduce CAN_REQUIRED_SIZE macro Then I'd start looking in malformed sockaddr_can data instead. Is this code what triggers the bug? > >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000 Kind regards, Kurt On vr, 17 jan 2020 21:02:48 +0100, Oliver Hartkopp wrote: > Hi Marc, Oleksij, Kurt, > > On 17/01/2020 14.46, syzbot wrote: > >Hello, > > > >syzbot found the following crash on: > > > >HEAD commit: f5ae2ea6 Fix built-in early-load Intel microcode alignment > >git tree: upstream > >console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000 > >kernel config: https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3 > >dashboard link: > >https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953 > >compiler: clang version 10.0.0 > >(https://github.com/llvm/llvm-project/ > >c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > >syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000 > >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000 > > > >The bug was bisected to: > > > >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be > >Author: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> > >Date: Mon Oct 8 09:48:33 2018 +0000 > > > > can: introduce CAN_REQUIRED_SIZE macro > > > >bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000 > >final crash: https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000 > >console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000 > > > >IMPORTANT: if you fix the bug, please add the following tag to the commit: > >Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com > >Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro") > > > >kasan: CONFIG_KASAN_INLINE enabled > >kasan: GPF could be caused by NULL-ptr deref or user memory access > >general protection fault: 0000 [#1] PREEMPT SMP KASAN > >CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0 > >Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > >Google 01/01/2011 > >RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline] > >RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476 > > include/linux/rculist.h:528 is > > struct hlist_node *first = h->first; > > which would mean that 'h' must be NULL. > > But the h parameter is rcv_list from > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); > > Which can not return NULL - at least when dev_rcv_lists is a proper pointer > to the dev_rcv_lists provided by can_dev_rcv_lists_find(). > > So either dev->ml_priv is NULL in the case of having a CAN interface (here > vxcan) or we have not allocated net->can.rx_alldev_list in can_pernet_init() > properly (which would lead to an -ENOMEM which is reported to whom?). > > Hm. I'm lost. Any ideas? > > Regards, > Oliver > > > >Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c > >89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00 > >00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89 > >RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202 > >RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100 > >RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0 > >RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94 > >R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008 > >R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690 > >FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) > >knlGS:0000000000000000 > >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0 > >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >Call Trace: > > raw_enable_filters net/can/raw.c:189 [inline] > > raw_enable_allfilters net/can/raw.c:255 [inline] > > raw_bind+0x326/0x1230 net/can/raw.c:428 > > __sys_bind+0x2bd/0x3a0 net/socket.c:1649 > > __do_sys_bind net/socket.c:1660 [inline] > > __se_sys_bind net/socket.c:1658 [inline] > > __x64_sys_bind+0x7a/0x90 net/socket.c:1658 > > do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > >RIP: 0033:0x446ba9 > >Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 > >48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff > >ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00 > >RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 > >RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9 > >RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003 > >RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000 > >R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c > >R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c > >Modules linked in: > >---[ end trace 0dedabb13ca8e7d7 ]--- > >RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline] > >RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476 > >Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c > >89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00 > >00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89 > >RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202 > >RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100 > >RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0 > >RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94 > >R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008 > >R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690 > >FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) > >knlGS:0000000000000000 > >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0 > >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > >--- > >This bug is generated by a bot. It may contain errors. > >See https://goo.gl/tpsmEJ for more information about syzbot. > >syzbot engineers can be reached at syzkaller@googlegroups.com. > > > >syzbot will keep track of this bug report. See: > >https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > >For information about bisection process see: > >https://goo.gl/tpsmEJ#bisection > >syzbot can test patches for this bug, for details see: > >https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-20 9:11 ` Kurt Van Dijck @ 2020-01-20 9:22 ` Dmitry Vyukov 2020-01-20 22:02 ` Oliver Hartkopp 0 siblings, 1 reply; 18+ messages in thread From: Dmitry Vyukov @ 2020-01-20 9:22 UTC (permalink / raw) To: Oliver Hartkopp, Marc Kleine-Budde, o.rempel, syzbot, David Miller, linux-can, LKML, netdev, syzkaller-bugs On Mon, Jan 20, 2020 at 10:11 AM Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> wrote: > > If bisect was right with this: > > > >The bug was bisected to: > > > > > >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be > > >Author: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> > > >Date: Mon Oct 8 09:48:33 2018 +0000 > > > > > > can: introduce CAN_REQUIRED_SIZE macro > > Then I'd start looking in malformed sockaddr_can data instead. > > Is this code what triggers the bug? > > >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000 yes > Kind regards, > Kurt > > On vr, 17 jan 2020 21:02:48 +0100, Oliver Hartkopp wrote: > > Hi Marc, Oleksij, Kurt, > > > > On 17/01/2020 14.46, syzbot wrote: > > >Hello, > > > > > >syzbot found the following crash on: > > > > > >HEAD commit: f5ae2ea6 Fix built-in early-load Intel microcode alignment > > >git tree: upstream > > >console output: https://syzkaller.appspot.com/x/log.txt?x=1033df15e00000 > > >kernel config: https://syzkaller.appspot.com/x/.config?x=cfbb8fa33f49f9f3 > > >dashboard link: > > >https://syzkaller.appspot.com/bug?extid=c3ea30e1e2485573f953 > > >compiler: clang version 10.0.0 > > >(https://github.com/llvm/llvm-project/ > > >c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > >syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13204f15e00000 > > >C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000 > > > > > >The bug was bisected to: > > > > > >commit 9868b5d44f3df9dd75247acd23dddff0a42f79be > > >Author: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be> > > >Date: Mon Oct 8 09:48:33 2018 +0000 > > > > > > can: introduce CAN_REQUIRED_SIZE macro > > > > > >bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=129bfdb9e00000 > > >final crash: https://syzkaller.appspot.com/x/report.txt?x=119bfdb9e00000 > > >console output: https://syzkaller.appspot.com/x/log.txt?x=169bfdb9e00000 > > > > > >IMPORTANT: if you fix the bug, please add the following tag to the commit: > > >Reported-by: syzbot+c3ea30e1e2485573f953@syzkaller.appspotmail.com > > >Fixes: 9868b5d44f3d ("can: introduce CAN_REQUIRED_SIZE macro") > > > > > >kasan: CONFIG_KASAN_INLINE enabled > > >kasan: GPF could be caused by NULL-ptr deref or user memory access > > >general protection fault: 0000 [#1] PREEMPT SMP KASAN > > >CPU: 0 PID: 9593 Comm: syz-executor302 Not tainted 5.5.0-rc6-syzkaller #0 > > >Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > >Google 01/01/2011 > > >RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline] > > >RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476 > > > > include/linux/rculist.h:528 is > > > > struct hlist_node *first = h->first; > > > > which would mean that 'h' must be NULL. > > > > But the h parameter is rcv_list from > > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); > > > > Which can not return NULL - at least when dev_rcv_lists is a proper pointer > > to the dev_rcv_lists provided by can_dev_rcv_lists_find(). > > > > So either dev->ml_priv is NULL in the case of having a CAN interface (here > > vxcan) or we have not allocated net->can.rx_alldev_list in can_pernet_init() > > properly (which would lead to an -ENOMEM which is reported to whom?). > > > > Hm. I'm lost. Any ideas? > > > > Regards, > > Oliver > > > > > > >Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c > > >89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00 > > >00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89 > > >RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202 > > >RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100 > > >RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0 > > >RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94 > > >R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008 > > >R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690 > > >FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) > > >knlGS:0000000000000000 > > >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > >CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0 > > >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > >Call Trace: > > > raw_enable_filters net/can/raw.c:189 [inline] > > > raw_enable_allfilters net/can/raw.c:255 [inline] > > > raw_bind+0x326/0x1230 net/can/raw.c:428 > > > __sys_bind+0x2bd/0x3a0 net/socket.c:1649 > > > __do_sys_bind net/socket.c:1660 [inline] > > > __se_sys_bind net/socket.c:1658 [inline] > > > __x64_sys_bind+0x7a/0x90 net/socket.c:1658 > > > do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294 > > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > >RIP: 0033:0x446ba9 > > >Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 > > >48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff > > >ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00 > > >RSP: 002b:00007fb132f25d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 > > >RAX: ffffffffffffffda RBX: 00000000006dbc88 RCX: 0000000000446ba9 > > >RDX: 0000000000000008 RSI: 0000000020000180 RDI: 0000000000000003 > > >RBP: 00000000006dbc80 R08: 00007fb132f26700 R09: 0000000000000000 > > >R10: 00007fb132f26700 R11: 0000000000000246 R12: 00000000006dbc8c > > >R13: 0000000000000000 R14: 0000000000000000 R15: 068500100000003c > > >Modules linked in: > > >---[ end trace 0dedabb13ca8e7d7 ]--- > > >RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline] > > >RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476 > > >Code: 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 89 22 8a fa 4c > > >89 33 4d 89 e5 49 c1 ed 03 48 b8 00 00 00 00 00 fc ff df <41> 80 7c 05 00 > > >00 74 08 4c 89 e7 e8 c5 21 8a fa 4d 8b 34 24 4c 89 > > >RSP: 0018:ffffc90003e27d00 EFLAGS: 00010202 > > >RAX: dffffc0000000000 RBX: ffff8880a77336c8 RCX: ffff88809306a100 > > >RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a77336c0 > > >RBP: ffffc90003e27d58 R08: ffffffff87289cd6 R09: fffff520007c4f94 > > >R10: fffff520007c4f94 R11: 0000000000000000 R12: 0000000000000008 > > >R13: 0000000000000001 R14: ffff88809fbcf000 R15: ffff8880a7733690 > > >FS: 00007fb132f26700(0000) GS:ffff8880aec00000(0000) > > >knlGS:0000000000000000 > > >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > >CR2: 000000000178f590 CR3: 00000000996d6000 CR4: 00000000001406f0 > > >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > >DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > > > > > > > >--- > > >This bug is generated by a bot. It may contain errors. > > >See https://goo.gl/tpsmEJ for more information about syzbot. > > >syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > >syzbot will keep track of this bug report. See: > > >https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > >For information about bisection process see: > > >https://goo.gl/tpsmEJ#bisection > > >syzbot can test patches for this bug, for details see: > > >https://goo.gl/tpsmEJ#testing-patches > > -- > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20200120091146.GD11138%40x1.vandijck-laurijssen.be. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-20 9:22 ` Dmitry Vyukov @ 2020-01-20 22:02 ` Oliver Hartkopp 2020-01-20 22:35 ` Oliver Hartkopp 0 siblings, 1 reply; 18+ messages in thread From: Oliver Hartkopp @ 2020-01-20 22:02 UTC (permalink / raw) To: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, David Miller, linux-can, LKML, netdev, syzkaller-bugs Hi all, On 20/01/2020 10.22, Dmitry Vyukov wrote: >> Is this code what triggers the bug? >>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=138f5db9e00000 > > yes > (..) >>>> RIP: 0010:hlist_add_head_rcu include/linux/rculist.h:528 [inline] >>>> RIP: 0010:can_rx_register+0x43b/0x600 net/can/af_can.c:476 >>> >>> include/linux/rculist.h:528 is >>> >>> struct hlist_node *first = h->first; >>> >>> which would mean that 'h' must be NULL. >>> >>> But the h parameter is rcv_list from >>> rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); >>> >>> Which can not return NULL - at least when dev_rcv_lists is a proper pointer >>> to the dev_rcv_lists provided by can_dev_rcv_lists_find(). >>> >>> So either dev->ml_priv is NULL in the case of having a CAN interface (here >>> vxcan) ... Added some code to check whether dev->ml_priv is NULL: ~/linux$ git diff diff --git a/net/can/af_can.c b/net/can/af_can.c index 128d37a4c2e0..6fb4ae4c359e 100644 --- a/net/can/af_can.c +++ b/net/can/af_can.c @@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id, spin_lock_bh(&net->can.rcvlists_lock); dev_rcv_lists = can_dev_rcv_lists_find(net, dev); + if (!dev_rcv_lists) { + pr_err("dev_rcv_lists == NULL! %p\n", dev); + goto out_unlock; + } rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); rcv->can_id = can_id; @@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id, rcv_lists_stats->rcv_entries++; rcv_lists_stats->rcv_entries_max = max(rcv_lists_stats->rcv_entries_max, rcv_lists_stats->rcv_entries); +out_unlock: spin_unlock_bh(&net->can.rcvlists_lock); return err; And the output (after some time) is: [ 758.505841] netlink: 'crash': attribute type 1 has an invalid length. [ 758.508045] bond7148: (slave vxcan1): The slave device specified does not support setting the MAC address [ 758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu [ 758.532025] bond10413: (slave vxcan1): The slave device specified does not support setting the MAC address [ 758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu [ 758.532254] dev_rcv_lists == NULL! 000000006b9d257f [ 758.547392] netlink: 'crash': attribute type 1 has an invalid length. [ 758.549310] bond7145: (slave vxcan1): The slave device specified does not support setting the MAC address [ 758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu [ 758.550464] netlink: 'crash': attribute type 1 has an invalid length. [ 758.552301] bond7146: (slave vxcan1): The slave device specified does not support setting the MAC address So we can see that we get a ml_priv pointer which is NULL which should not be possible due to this: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743 Btw. the variable 'size' is set two times at the top of alloc_candev_mqs() depending on echo_skb_max. This looks wrong. Best regards, Oliver ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-20 22:02 ` Oliver Hartkopp @ 2020-01-20 22:35 ` Oliver Hartkopp 2020-01-21 8:30 ` Kurt Van Dijck 2020-01-21 19:22 ` Kurt Van Dijck 0 siblings, 2 replies; 18+ messages in thread From: Oliver Hartkopp @ 2020-01-20 22:35 UTC (permalink / raw) To: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, David Miller, linux-can, LKML, netdev, syzkaller-bugs, Kurt Van Dijck Answering myself ... On 20/01/2020 23.02, Oliver Hartkopp wrote: > > Added some code to check whether dev->ml_priv is NULL: > > ~/linux$ git diff > diff --git a/net/can/af_can.c b/net/can/af_can.c > index 128d37a4c2e0..6fb4ae4c359e 100644 > --- a/net/can/af_can.c > +++ b/net/can/af_can.c > @@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct > net_device *dev, canid_t can_id, > spin_lock_bh(&net->can.rcvlists_lock); > > dev_rcv_lists = can_dev_rcv_lists_find(net, dev); > + if (!dev_rcv_lists) { > + pr_err("dev_rcv_lists == NULL! %p\n", dev); > + goto out_unlock; > + } > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); > > rcv->can_id = can_id; > @@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct > net_device *dev, canid_t can_id, > rcv_lists_stats->rcv_entries++; > rcv_lists_stats->rcv_entries_max = > max(rcv_lists_stats->rcv_entries_max, > > rcv_lists_stats->rcv_entries); > +out_unlock: > spin_unlock_bh(&net->can.rcvlists_lock); > > return err; > > And the output (after some time) is: > > [ 758.505841] netlink: 'crash': attribute type 1 has an invalid length. > [ 758.508045] bond7148: (slave vxcan1): The slave device specified does > not support setting the MAC address > [ 758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu > [ 758.532025] bond10413: (slave vxcan1): The slave device specified > does not support setting the MAC address > [ 758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu > [ 758.532254] dev_rcv_lists == NULL! 000000006b9d257f > [ 758.547392] netlink: 'crash': attribute type 1 has an invalid length. > [ 758.549310] bond7145: (slave vxcan1): The slave device specified does > not support setting the MAC address > [ 758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu > [ 758.550464] netlink: 'crash': attribute type 1 has an invalid length. > [ 758.552301] bond7146: (slave vxcan1): The slave device specified does > not support setting the MAC address > > So we can see that we get a ml_priv pointer which is NULL which should > not be possible due to this: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743 This reference doesn't point to the right code as vxcan has its own handling do assign ml_priv in vxcan.c . > Btw. the variable 'size' is set two times at the top of > alloc_candev_mqs() depending on echo_skb_max. This looks wrong. No. It looks right as I did not get behind the ALIGN() macro at first sight. But it is still open why dev->ml_priv is not set correctly in vxcan.c as all the settings for .priv_size and in vxcan_setup look fine. Best regards, Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-20 22:35 ` Oliver Hartkopp @ 2020-01-21 8:30 ` Kurt Van Dijck 2020-01-21 8:35 ` Kurt Van Dijck 2020-01-21 18:54 ` Kurt Van Dijck 2020-01-21 19:22 ` Kurt Van Dijck 1 sibling, 2 replies; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-21 8:30 UTC (permalink / raw) To: Oliver Hartkopp Cc: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, David Miller, linux-can, LKML, netdev, syzkaller-bugs On ma, 20 jan 2020 23:35:16 +0100, Oliver Hartkopp wrote: > Answering myself ... > > On 20/01/2020 23.02, Oliver Hartkopp wrote: > > > > >Added some code to check whether dev->ml_priv is NULL: > > > >~/linux$ git diff > >diff --git a/net/can/af_can.c b/net/can/af_can.c > >index 128d37a4c2e0..6fb4ae4c359e 100644 > >--- a/net/can/af_can.c > >+++ b/net/can/af_can.c > >@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct > >net_device *dev, canid_t can_id, > > spin_lock_bh(&net->can.rcvlists_lock); > > > > dev_rcv_lists = can_dev_rcv_lists_find(net, dev); > >+ if (!dev_rcv_lists) { > >+ pr_err("dev_rcv_lists == NULL! %p\n", dev); > >+ goto out_unlock; > >+ } > > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); > > > > rcv->can_id = can_id; > >@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device > >*dev, canid_t can_id, > > rcv_lists_stats->rcv_entries++; > > rcv_lists_stats->rcv_entries_max = > >max(rcv_lists_stats->rcv_entries_max, > > > >rcv_lists_stats->rcv_entries); > >+out_unlock: > > spin_unlock_bh(&net->can.rcvlists_lock); > > > > return err; > > > >And the output (after some time) is: > > > >[ 758.505841] netlink: 'crash': attribute type 1 has an invalid length. > >[ 758.508045] bond7148: (slave vxcan1): The slave device specified does > >not support setting the MAC address > >[ 758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu > >[ 758.532025] bond10413: (slave vxcan1): The slave device specified does > >not support setting the MAC address > >[ 758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu > >[ 758.532254] dev_rcv_lists == NULL! 000000006b9d257f > >[ 758.547392] netlink: 'crash': attribute type 1 has an invalid length. > >[ 758.549310] bond7145: (slave vxcan1): The slave device specified does > >not support setting the MAC address > >[ 758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu > >[ 758.550464] netlink: 'crash': attribute type 1 has an invalid length. > >[ 758.552301] bond7146: (slave vxcan1): The slave device specified does > >not support setting the MAC address > > > >So we can see that we get a ml_priv pointer which is NULL which should not > >be possible due to this: > > > >https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743 > > This reference doesn't point to the right code as vxcan has its own handling > do assign ml_priv in vxcan.c . > > >Btw. the variable 'size' is set two times at the top of alloc_candev_mqs() > >depending on echo_skb_max. This looks wrong. > > No. It looks right as I did not get behind the ALIGN() macro at first sight. > > But it is still open why dev->ml_priv is not set correctly in vxcan.c as all > the settings for .priv_size and in vxcan_setup look fine. Maybe I got completely lost: Shouldn't can_ml_priv and vxcan_priv not be similar? Where is the dev_rcv_lists in the vxcan case? > > Best regards, > Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-21 8:30 ` Kurt Van Dijck @ 2020-01-21 8:35 ` Kurt Van Dijck 2020-01-21 18:54 ` Kurt Van Dijck 1 sibling, 0 replies; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-21 8:35 UTC (permalink / raw) To: Oliver Hartkopp, Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, David Miller, linux-can, LKML, netdev, syzkaller-bugs On di, 21 jan 2020 09:30:35 +0100, Kurt Van Dijck wrote: > On ma, 20 jan 2020 23:35:16 +0100, Oliver Hartkopp wrote: > > Answering myself ... > > > > On 20/01/2020 23.02, Oliver Hartkopp wrote: > > > > > > > >Added some code to check whether dev->ml_priv is NULL: > > > > > >~/linux$ git diff > > >diff --git a/net/can/af_can.c b/net/can/af_can.c > > >index 128d37a4c2e0..6fb4ae4c359e 100644 > > >--- a/net/can/af_can.c > > >+++ b/net/can/af_can.c > > >@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct > > >net_device *dev, canid_t can_id, > > > spin_lock_bh(&net->can.rcvlists_lock); > > > > > > dev_rcv_lists = can_dev_rcv_lists_find(net, dev); > > >+ if (!dev_rcv_lists) { > > >+ pr_err("dev_rcv_lists == NULL! %p\n", dev); > > >+ goto out_unlock; > > >+ } > > > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); > > > > > > rcv->can_id = can_id; > > >@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device > > >*dev, canid_t can_id, > > > rcv_lists_stats->rcv_entries++; > > > rcv_lists_stats->rcv_entries_max = > > >max(rcv_lists_stats->rcv_entries_max, > > > > > >rcv_lists_stats->rcv_entries); > > >+out_unlock: > > > spin_unlock_bh(&net->can.rcvlists_lock); > > > > > > return err; > > > > > >And the output (after some time) is: > > > > > >[ 758.505841] netlink: 'crash': attribute type 1 has an invalid length. > > >[ 758.508045] bond7148: (slave vxcan1): The slave device specified does > > >not support setting the MAC address > > >[ 758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu > > >[ 758.532025] bond10413: (slave vxcan1): The slave device specified does > > >not support setting the MAC address > > >[ 758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu > > >[ 758.532254] dev_rcv_lists == NULL! 000000006b9d257f > > >[ 758.547392] netlink: 'crash': attribute type 1 has an invalid length. > > >[ 758.549310] bond7145: (slave vxcan1): The slave device specified does > > >not support setting the MAC address > > >[ 758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu > > >[ 758.550464] netlink: 'crash': attribute type 1 has an invalid length. > > >[ 758.552301] bond7146: (slave vxcan1): The slave device specified does > > >not support setting the MAC address > > > > > >So we can see that we get a ml_priv pointer which is NULL which should not > > >be possible due to this: > > > > > >https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743 > > > > This reference doesn't point to the right code as vxcan has its own handling > > do assign ml_priv in vxcan.c . > > > > >Btw. the variable 'size' is set two times at the top of alloc_candev_mqs() > > >depending on echo_skb_max. This looks wrong. > > > > No. It looks right as I did not get behind the ALIGN() macro at first sight. > > > > But it is still open why dev->ml_priv is not set correctly in vxcan.c as all > > the settings for .priv_size and in vxcan_setup look fine. > > Maybe I got completely lost: > Shouldn't can_ml_priv and vxcan_priv not be similar? > Where is the dev_rcv_lists in the vxcan case? IMHO, net/can/af_can.c:306 is wrong in the vxcan case. > > > > > Best regards, > > Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-21 8:30 ` Kurt Van Dijck 2020-01-21 8:35 ` Kurt Van Dijck @ 2020-01-21 18:54 ` Kurt Van Dijck 2020-01-21 19:28 ` Oliver Hartkopp 1 sibling, 1 reply; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-21 18:54 UTC (permalink / raw) To: Oliver Hartkopp, Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, David Miller, linux-can, LKML, netdev, syzkaller-bugs On di, 21 jan 2020 09:30:35 +0100, Kurt Van Dijck wrote: > On ma, 20 jan 2020 23:35:16 +0100, Oliver Hartkopp wrote: > > Answering myself ... > > > > On 20/01/2020 23.02, Oliver Hartkopp wrote: > > > > > > > >Added some code to check whether dev->ml_priv is NULL: > > > > > >~/linux$ git diff > > >diff --git a/net/can/af_can.c b/net/can/af_can.c > > >index 128d37a4c2e0..6fb4ae4c359e 100644 > > >--- a/net/can/af_can.c > > >+++ b/net/can/af_can.c > > >@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct > > >net_device *dev, canid_t can_id, > > > spin_lock_bh(&net->can.rcvlists_lock); > > > > > > dev_rcv_lists = can_dev_rcv_lists_find(net, dev); > > >+ if (!dev_rcv_lists) { > > >+ pr_err("dev_rcv_lists == NULL! %p\n", dev); > > >+ goto out_unlock; > > >+ } > > > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); > > > > > > rcv->can_id = can_id; > > >@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device > > >*dev, canid_t can_id, > > > rcv_lists_stats->rcv_entries++; > > > rcv_lists_stats->rcv_entries_max = > > >max(rcv_lists_stats->rcv_entries_max, > > > > > >rcv_lists_stats->rcv_entries); > > >+out_unlock: > > > spin_unlock_bh(&net->can.rcvlists_lock); > > > > > > return err; > > > > > >And the output (after some time) is: > > > > > >[ 758.505841] netlink: 'crash': attribute type 1 has an invalid length. > > >[ 758.508045] bond7148: (slave vxcan1): The slave device specified does > > >not support setting the MAC address > > >[ 758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu > > >[ 758.532025] bond10413: (slave vxcan1): The slave device specified does > > >not support setting the MAC address > > >[ 758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu > > >[ 758.532254] dev_rcv_lists == NULL! 000000006b9d257f > > >[ 758.547392] netlink: 'crash': attribute type 1 has an invalid length. > > >[ 758.549310] bond7145: (slave vxcan1): The slave device specified does > > >not support setting the MAC address > > >[ 758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu > > >[ 758.550464] netlink: 'crash': attribute type 1 has an invalid length. > > >[ 758.552301] bond7146: (slave vxcan1): The slave device specified does > > >not support setting the MAC address > > > > > >So we can see that we get a ml_priv pointer which is NULL which should not > > >be possible due to this: > > > > > >https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743 > > > > This reference doesn't point to the right code as vxcan has its own handling > > do assign ml_priv in vxcan.c . > > > > >Btw. the variable 'size' is set two times at the top of alloc_candev_mqs() > > >depending on echo_skb_max. This looks wrong. > > > > No. It looks right as I did not get behind the ALIGN() macro at first sight. > > > > But it is still open why dev->ml_priv is not set correctly in vxcan.c as all > > the settings for .priv_size and in vxcan_setup look fine. > > Maybe I got completely lost: > Shouldn't can_ml_priv and vxcan_priv not be similar? > Where is the dev_rcv_lists in the vxcan case? I indeed got completely lost. vxcan_priv & can_ml_priv form together the private part. I continue looking > > > > > Best regards, > > Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-21 18:54 ` Kurt Van Dijck @ 2020-01-21 19:28 ` Oliver Hartkopp 2020-01-21 19:47 ` Kurt Van Dijck 0 siblings, 1 reply; 18+ messages in thread From: Oliver Hartkopp @ 2020-01-21 19:28 UTC (permalink / raw) To: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, David Miller, linux-can, LKML, netdev, syzkaller-bugs, Kurt Van Dijck Hi Kurt, On 21/01/2020 19.54, Kurt Van Dijck wrote: > On di, 21 jan 2020 09:30:35 +0100, Kurt Van Dijck wrote: >> On ma, 20 jan 2020 23:35:16 +0100, Oliver Hartkopp wrote: >>> But it is still open why dev->ml_priv is not set correctly in vxcan.c as all >>> the settings for .priv_size and in vxcan_setup look fine. >> >> Maybe I got completely lost: >> Shouldn't can_ml_priv and vxcan_priv not be similar? >> Where is the dev_rcv_lists in the vxcan case? > > I indeed got completely lost. vxcan_priv & can_ml_priv form together the > private part. I continue looking I added some more debug output: @@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct net_device *dev, canid_t can_id, spin_lock_bh(&net->can.rcvlists_lock); dev_rcv_lists = can_dev_rcv_lists_find(net, dev); + if (!dev_rcv_lists) { + pr_err("dev_rcv_lists == NULL! %p (%s)\n", dev, dev->name); + goto out_unlock; + } rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); rcv->can_id = can_id; and the output becomes: [ 1814.644087] bond5130: (slave vxcan1): The slave device specified does not support setting the MAC address [ 1814.644106] bond5130: (slave vxcan1): Error -22 calling dev_set_mtu [ 1814.648867] bond5128: (slave vxcan1): The slave device specified does not support setting the MAC address [ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu [ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128) [ 1814.696420] bond5129: (slave vxcan1): The slave device specified does not support setting the MAC address [ 1814.696438] bond5129: (slave vxcan1): Error -22 calling dev_set_mtu So it's not the vxcan1 netdev that causes the issue but (sporadically!!) the bonding netdev. Interesting enough that the bonding device bond5128 obviously passes the if (dev && dev->type != ARPHRD_CAN) return -ENODEV; test. ?!? Regards, Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-21 19:28 ` Oliver Hartkopp @ 2020-01-21 19:47 ` Kurt Van Dijck 0 siblings, 0 replies; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-21 19:47 UTC (permalink / raw) To: Oliver Hartkopp Cc: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, David Miller, linux-can, LKML, netdev, syzkaller-bugs On di, 21 jan 2020 20:28:51 +0100, Oliver Hartkopp wrote: > Hi Kurt, > > On 21/01/2020 19.54, Kurt Van Dijck wrote: > >On di, 21 jan 2020 09:30:35 +0100, Kurt Van Dijck wrote: > >>On ma, 20 jan 2020 23:35:16 +0100, Oliver Hartkopp wrote: > > > >>>But it is still open why dev->ml_priv is not set correctly in vxcan.c as all > >>>the settings for .priv_size and in vxcan_setup look fine. > >> > >>Maybe I got completely lost: > >>Shouldn't can_ml_priv and vxcan_priv not be similar? > >>Where is the dev_rcv_lists in the vxcan case? > > > >I indeed got completely lost. vxcan_priv & can_ml_priv form together the > >private part. I continue looking > > I added some more debug output: > > @@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct net_device > *dev, canid_t can_id, > spin_lock_bh(&net->can.rcvlists_lock); > > dev_rcv_lists = can_dev_rcv_lists_find(net, dev); > + if (!dev_rcv_lists) { > + pr_err("dev_rcv_lists == NULL! %p (%s)\n", dev, dev->name); > + goto out_unlock; > + } > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); > > rcv->can_id = can_id; > > > and the output becomes: > > [ 1814.644087] bond5130: (slave vxcan1): The slave device specified does not > support setting the MAC address > [ 1814.644106] bond5130: (slave vxcan1): Error -22 calling dev_set_mtu > [ 1814.648867] bond5128: (slave vxcan1): The slave device specified does not > support setting the MAC address > [ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu > [ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128) > [ 1814.696420] bond5129: (slave vxcan1): The slave device specified does not > support setting the MAC address > [ 1814.696438] bond5129: (slave vxcan1): Error -22 calling dev_set_mtu > > So it's not the vxcan1 netdev that causes the issue but (sporadically!!) the > bonding netdev. > > Interesting enough that the bonding device bond5128 obviously passes the > > if (dev && dev->type != ARPHRD_CAN) > return -ENODEV; > test. > > ?!? Did you consider my hypothesis I sent you (at 20h22 tonight)? I don't personally understand all the locks around networking, but your observation acks my theory of race condition. > > Regards, > Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-20 22:35 ` Oliver Hartkopp 2020-01-21 8:30 ` Kurt Van Dijck @ 2020-01-21 19:22 ` Kurt Van Dijck 2020-01-21 20:03 ` Oliver Hartkopp 1 sibling, 1 reply; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-21 19:22 UTC (permalink / raw) To: Oliver Hartkopp Cc: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, linux-can Oliver, I decreased the CC list a bit, as I'm more like thinking in the wild now: Since the problems happens only rarely, and with vxcan, I assume not vcan, I started to think to locking issues. 1. What surprised me a bit is 'rtnl_dereference()' calls, without rcu_read_lock() around it? is that supposed to be ok? 2. is it possible to call vxcan_dellink in between the 2 rcu_assign_pointer() calls in vxcan_newlink(), resulting in a dead end, i.e. one end is referenced, but already deleted? I'd expect a kind of rcu_write_lock around the cross-linking at least. It still puzzles me how this bisected to CAN_REQUIRED_SIZE() macro commit. Kurt On ma, 20 jan 2020 23:35:16 +0100, Oliver Hartkopp wrote: > > Answering myself ... > > On 20/01/2020 23.02, Oliver Hartkopp wrote: > > > > >Added some code to check whether dev->ml_priv is NULL: > > > >~/linux$ git diff > >diff --git a/net/can/af_can.c b/net/can/af_can.c > >index 128d37a4c2e0..6fb4ae4c359e 100644 > >--- a/net/can/af_can.c > >+++ b/net/can/af_can.c > >@@ -463,6 +463,10 @@ int can_rx_register(struct net *net, struct > >net_device *dev, canid_t can_id, > > spin_lock_bh(&net->can.rcvlists_lock); > > > > dev_rcv_lists = can_dev_rcv_lists_find(net, dev); > >+ if (!dev_rcv_lists) { > >+ pr_err("dev_rcv_lists == NULL! %p\n", dev); > >+ goto out_unlock; > >+ } > > rcv_list = can_rcv_list_find(&can_id, &mask, dev_rcv_lists); > > > > rcv->can_id = can_id; > >@@ -479,6 +483,7 @@ int can_rx_register(struct net *net, struct net_device > >*dev, canid_t can_id, > > rcv_lists_stats->rcv_entries++; > > rcv_lists_stats->rcv_entries_max = > >max(rcv_lists_stats->rcv_entries_max, > > > >rcv_lists_stats->rcv_entries); > >+out_unlock: > > spin_unlock_bh(&net->can.rcvlists_lock); > > > > return err; > > > >And the output (after some time) is: > > > >[ 758.505841] netlink: 'crash': attribute type 1 has an invalid length. > >[ 758.508045] bond7148: (slave vxcan1): The slave device specified does > >not support setting the MAC address > >[ 758.508057] bond7148: (slave vxcan1): Error -22 calling dev_set_mtu > >[ 758.532025] bond10413: (slave vxcan1): The slave device specified does > >not support setting the MAC address > >[ 758.532043] bond10413: (slave vxcan1): Error -22 calling dev_set_mtu > >[ 758.532254] dev_rcv_lists == NULL! 000000006b9d257f > >[ 758.547392] netlink: 'crash': attribute type 1 has an invalid length. > >[ 758.549310] bond7145: (slave vxcan1): The slave device specified does > >not support setting the MAC address > >[ 758.549313] bond7145: (slave vxcan1): Error -22 calling dev_set_mtu > >[ 758.550464] netlink: 'crash': attribute type 1 has an invalid length. > >[ 758.552301] bond7146: (slave vxcan1): The slave device specified does > >not support setting the MAC address > > > >So we can see that we get a ml_priv pointer which is NULL which should not > >be possible due to this: > > > >https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/can/dev.c#n743 > > This reference doesn't point to the right code as vxcan has its own handling > do assign ml_priv in vxcan.c . > > >Btw. the variable 'size' is set two times at the top of alloc_candev_mqs() > >depending on echo_skb_max. This looks wrong. > > No. It looks right as I did not get behind the ALIGN() macro at first sight. > > But it is still open why dev->ml_priv is not set correctly in vxcan.c as all > the settings for .priv_size and in vxcan_setup look fine. > > Best regards, > Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-21 19:22 ` Kurt Van Dijck @ 2020-01-21 20:03 ` Oliver Hartkopp 2020-01-21 20:34 ` Kurt Van Dijck 2020-01-21 20:39 ` Kurt Van Dijck 0 siblings, 2 replies; 18+ messages in thread From: Oliver Hartkopp @ 2020-01-21 20:03 UTC (permalink / raw) To: Kurt Van Dijck Cc: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, linux-can Hi Kurt, On 21/01/2020 20.22, Kurt Van Dijck wrote: > I decreased the CC list a bit, as I'm more like thinking in the wild > now: Good choice! > Since the problems happens only rarely, and with vxcan, I assume not > vcan, I started to think to locking issues. > > 1. What surprised me a bit is 'rtnl_dereference()' calls, without > rcu_read_lock() around it? is that supposed to be ok? Don't know. This code was just copied from veth.c But veth doesn't fiddle with ml_priv like us. > 2. is it possible to call vxcan_dellink in between the 2 > rcu_assign_pointer() calls in vxcan_newlink(), resulting in a dead end, > i.e. one end is referenced, but already deleted? > I'd expect a kind of rcu_write_lock around the cross-linking at least. Will look into this too. When there's a comment this might be a justification for doing "hacky" things ;-) > It still puzzles me how this bisected to CAN_REQUIRED_SIZE() macro > commit. Yes. That's really weird as is just has an impact on the socket API. But it should not have any impact on the CAN_RAW socket they are using. Hm - in fact is has an impact on the socket API. Before the patch the user space was able to send 16 bytes to the CAN_RAW socket. Now it's only 8 byte as you reduced the required size. Regards, Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-21 20:03 ` Oliver Hartkopp @ 2020-01-21 20:34 ` Kurt Van Dijck 2020-01-21 20:39 ` Kurt Van Dijck 1 sibling, 0 replies; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-21 20:34 UTC (permalink / raw) To: Oliver Hartkopp Cc: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, linux-can On di, 21 jan 2020 21:03:42 +0100, Oliver Hartkopp wrote: > Hi Kurt, ... > > Yes. That's really weird as is just has an impact on the socket API. But it > should not have any impact on the CAN_RAW socket they are using. > > Hm - in fact is has an impact on the socket API. > Before the patch the user space was able to send 16 bytes to the CAN_RAW > socket. Now it's only 8 byte as you reduced the required size. Nack, the CAN_REQUIRED_SIZE is applied on struct sockaddr_can only, to guard against breaking ABI with the upcoming addition of j1939 fields. But where in the past, the tp members where necessary but unused, they are now optional and still unused, in both raw_bind and raw_sendmsg. But this does not propagate to sk_buff, so I could not find how it changes anything that worked before into a crash. > > Regards, > Oliver > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-21 20:03 ` Oliver Hartkopp 2020-01-21 20:34 ` Kurt Van Dijck @ 2020-01-21 20:39 ` Kurt Van Dijck 2020-01-24 18:43 ` Oliver Hartkopp 1 sibling, 1 reply; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-21 20:39 UTC (permalink / raw) To: Oliver Hartkopp Cc: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, linux-can Oliver, On di, 21 jan 2020 21:03:42 +0100, Oliver Hartkopp wrote: > Hi Kurt, > > On 21/01/2020 20.22, Kurt Van Dijck wrote: > > >I decreased the CC list a bit, as I'm more like thinking in the wild > >now: > > Good choice! > > >Since the problems happens only rarely, and with vxcan, I assume not > >vcan, I started to think to locking issues. > > > >1. What surprised me a bit is 'rtnl_dereference()' calls, without > >rcu_read_lock() around it? is that supposed to be ok? > > Don't know. This code was just copied from veth.c > > But veth doesn't fiddle with ml_priv like us. > > >2. is it possible to call vxcan_dellink in between the 2 > >rcu_assign_pointer() calls in vxcan_newlink(), resulting in a dead end, > >i.e. one end is referenced, but already deleted? > >I'd expect a kind of rcu_write_lock around the cross-linking at least. > > Will look into this too. When there's a comment this might be a > justification for doing "hacky" things ;-) Maybe move the crosslinking to before the register, then they're inaccessible from userspace. and a little comment, indeed. Kurt ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-21 20:39 ` Kurt Van Dijck @ 2020-01-24 18:43 ` Oliver Hartkopp 2020-01-24 19:05 ` Kurt Van Dijck 0 siblings, 1 reply; 18+ messages in thread From: Oliver Hartkopp @ 2020-01-24 18:43 UTC (permalink / raw) To: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, linux-can Hi Kurt, Dmitry, On 21/01/2020 21.39, Kurt Van Dijck wrote: > Maybe move the crosslinking to before the register, then they're > inaccessible from userspace. I think I found the problem: [ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu [ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128) The bonding netdev bond5128 enslaved the vxcan1 netdev. As vxcan1 is a CAN netdev with ARPHRD_CAN the bonding process executes if (slave_dev->type != ARPHRD_ETHER) bond_setup_by_slave(bond_dev, slave_dev); in bond_enslave() in .../bonding/bond_main.c Which does this: static void bond_setup_by_slave(struct net_device *bond_dev, struct net_device *slave_dev) { bond_dev->header_ops = slave_dev->header_ops; bond_dev->type = slave_dev->type; bond_dev->hard_header_len = slave_dev->hard_header_len; bond_dev->addr_len = slave_dev->addr_len; memcpy(bond_dev->broadcast, slave_dev->broadcast, slave_dev->addr_len); } So bond5128 becomes an ARPHDR_CAN interface BUT without having a netdev_priv() space which contains our lovely can_ml_priv structure with the dev_rcv_lists for the CAN filters. I was able to confirm the bisected commit but the crashes still were pure luck IMO. can_rx_register() accesses netdev_priv() of the bonding device - but there are no CAN filters. BAM! So we need to make sure that ARPHDR_CAN dev->type can not be enslaved by the bonding driver. Best regards, Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-24 18:43 ` Oliver Hartkopp @ 2020-01-24 19:05 ` Kurt Van Dijck 2020-01-24 21:04 ` Oliver Hartkopp 0 siblings, 1 reply; 18+ messages in thread From: Kurt Van Dijck @ 2020-01-24 19:05 UTC (permalink / raw) To: Oliver Hartkopp Cc: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, linux-can On vr, 24 jan 2020 19:43:23 +0100, Oliver Hartkopp wrote: > Hi Kurt, Dmitry, > > On 21/01/2020 21.39, Kurt Van Dijck wrote: > > >Maybe move the crosslinking to before the register, then they're > >inaccessible from userspace. > > I think I found the problem: Well done! > > [ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu > [ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128) > > The bonding netdev bond5128 enslaved the vxcan1 netdev. As vxcan1 is a CAN > netdev with ARPHRD_CAN the bonding process executes > You were able to make the syscalls comprehensible then? > if (slave_dev->type != ARPHRD_ETHER) > bond_setup_by_slave(bond_dev, slave_dev); > > in bond_enslave() in .../bonding/bond_main.c > > Which does this: > > static void bond_setup_by_slave(struct net_device *bond_dev, > struct net_device *slave_dev) > { > bond_dev->header_ops = slave_dev->header_ops; > > bond_dev->type = slave_dev->type; > bond_dev->hard_header_len = slave_dev->hard_header_len; > bond_dev->addr_len = slave_dev->addr_len; > > memcpy(bond_dev->broadcast, slave_dev->broadcast, > slave_dev->addr_len); > } > > So bond5128 becomes an ARPHDR_CAN interface BUT without having a > netdev_priv() space which contains our lovely can_ml_priv structure with the > dev_rcv_lists for the CAN filters. > > I was able to confirm the bisected commit but the crashes still were pure > luck IMO. > > can_rx_register() accesses netdev_priv() of the bonding device - but there > are no CAN filters. BAM! > > So we need to make sure that ARPHDR_CAN dev->type can not be enslaved by the > bonding driver. This implies modifying bond_main.c, right? > > Best regards, > Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: general protection fault in can_rx_register 2020-01-24 19:05 ` Kurt Van Dijck @ 2020-01-24 21:04 ` Oliver Hartkopp 0 siblings, 0 replies; 18+ messages in thread From: Oliver Hartkopp @ 2020-01-24 21:04 UTC (permalink / raw) To: Dmitry Vyukov, Marc Kleine-Budde, o.rempel, syzbot, linux-can Hi Kurt, On 24/01/2020 20.05, Kurt Van Dijck wrote: > On vr, 24 jan 2020 19:43:23 +0100, Oliver Hartkopp wrote: >> Hi Kurt, Dmitry, >> >> On 21/01/2020 21.39, Kurt Van Dijck wrote: >> >>> Maybe move the crosslinking to before the register, then they're >>> inaccessible from userspace. >> >> I think I found the problem: > > Well done! > >> >> [ 1814.648904] bond5128: (slave vxcan1): Error -22 calling dev_set_mtu >> [ 1814.649124] dev_rcv_lists == NULL! 000000008e41fb06 (bond5128) >> >> The bonding netdev bond5128 enslaved the vxcan1 netdev. As vxcan1 is a CAN >> netdev with ARPHRD_CAN the bonding process executes >> > > You were able to make the syscalls comprehensible then? Not really. I was just digging into "what bonding CAN interfaces" probably means to us :-) The fact that we are only handling ARPHDR_CAN interfaces *and* the dev_rcv_lists have not been available finally lead to the problem. >> if (slave_dev->type != ARPHRD_ETHER) >> bond_setup_by_slave(bond_dev, slave_dev); >> >> in bond_enslave() in .../bonding/bond_main.c >> >> Which does this: >> >> static void bond_setup_by_slave(struct net_device *bond_dev, >> struct net_device *slave_dev) >> { >> bond_dev->header_ops = slave_dev->header_ops; >> >> bond_dev->type = slave_dev->type; >> bond_dev->hard_header_len = slave_dev->hard_header_len; >> bond_dev->addr_len = slave_dev->addr_len; >> >> memcpy(bond_dev->broadcast, slave_dev->broadcast, >> slave_dev->addr_len); >> } >> >> So bond5128 becomes an ARPHDR_CAN interface BUT without having a >> netdev_priv() space which contains our lovely can_ml_priv structure with the >> dev_rcv_lists for the CAN filters. >> >> I was able to confirm the bisected commit but the crashes still were pure >> luck IMO. >> >> can_rx_register() accesses netdev_priv() of the bonding device - but there >> are no CAN filters. BAM! >> >> So we need to make sure that ARPHDR_CAN dev->type can not be enslaved by the >> bonding driver. > > This implies modifying bond_main.c, right? I think so. But I wanted to have this discussed on the mailing list before preparing a patch. Best, Oliver ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2020-01-24 21:04 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-01-17 13:46 general protection fault in can_rx_register syzbot 2020-01-17 20:02 ` Oliver Hartkopp 2020-01-20 9:11 ` Kurt Van Dijck 2020-01-20 9:22 ` Dmitry Vyukov 2020-01-20 22:02 ` Oliver Hartkopp 2020-01-20 22:35 ` Oliver Hartkopp 2020-01-21 8:30 ` Kurt Van Dijck 2020-01-21 8:35 ` Kurt Van Dijck 2020-01-21 18:54 ` Kurt Van Dijck 2020-01-21 19:28 ` Oliver Hartkopp 2020-01-21 19:47 ` Kurt Van Dijck 2020-01-21 19:22 ` Kurt Van Dijck 2020-01-21 20:03 ` Oliver Hartkopp 2020-01-21 20:34 ` Kurt Van Dijck 2020-01-21 20:39 ` Kurt Van Dijck 2020-01-24 18:43 ` Oliver Hartkopp 2020-01-24 19:05 ` Kurt Van Dijck 2020-01-24 21:04 ` Oliver Hartkopp
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.