* [syzbot] [mptcp?] general protection fault in proc_scheduler
From: syzbot @ 2025-01-02 14:12 UTC
To: davem, edumazet, geliang, horms, kuba, linux-kernel, martineau, matttbe, mptcp, netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f
dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz
kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
 __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
 __kernel_write+0xf6/0x140 fs/read_write.c:632
 do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
 acct_pin_kill+0x2d/0x100 kernel/acct.c:192
 pin_kill+0x194/0x7c0 fs/fs_pin.c:44
 mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
 cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
 task_work_run+0x14e/0x250 kernel/task_work.c:239
 exit_task_work include/linux/task_work.h:43 [inline]
 do_exit+0xad8/0x2d70 kernel/exit.c:938
 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
 get_signal+0x2576/0x2610 kernel/signal.c:3017
 arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fee3cb87a6a
Code: Unable to access opcode bytes at 0x7fee3cb87a40.
RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess), 1 bytes skipped:
   0:   42 80 3c 38 00          cmpb   $0x0,(%rax,%r15,1)
   5:   0f 85 fe 02 00 00       jne    0x309
   b:   4d 8b a4 24 08 09 00    mov    0x908(%r12),%r12
  12:   00
  13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  1a:   fc ff df
  1d:   49 8d 7c 24 28          lea    0x28(%r12),%rdi
  22:   48 89 fa                mov    %rdi,%rdx
  25:   48 c1 ea 03             shr    $0x3,%rdx
* 29:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
  2d:   0f 85 cc 02 00 00       jne    0x2ff
  33:   4d 8b 7c 24 28          mov    0x28(%r12),%r15
  38:   48                      rex.W
  39:   8d                      .byte 0x8d
  3a:   84 24 c8                test   %ah,(%rax,%rcx,8)

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup
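
The faulting address in the Oops line can be cross-checked against the KASAN range with a little arithmetic. Generic KASAN on x86-64 maps each 8-byte granule to one shadow byte at (addr >> 3) + 0xdffffc0000000000, which matches the "shr $0x3" and the movabs constant in the disassembly. With R12 = 0 and RDI = 0x28 (offset 0x28 from a NULL pointer), the shadow lookup lands on the non-canonical address 0xdffffc0000000005. The following is a minimal userspace sketch in C; the two helper function names are illustrative and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as they appear in the oops: the movabs immediate and shr $0x3. */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Shadow byte address that is checked before 'addr' is accessed. */
static inline uint64_t kasan_shadow_addr(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Inverse mapping: first byte of the 8-byte granule a shadow byte covers. */
static inline uint64_t kasan_granule_start(uint64_t shadow)
{
    assert(shadow >= KASAN_SHADOW_OFFSET);
    return (shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}
```

Calling kasan_shadow_addr(0x28) gives 0xdffffc0000000005, and kasan_granule_start() of that value gives 0x28, the start of the [0x28-0x2f] range that KASAN prints.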
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
From: Eric Dumazet @ 2025-01-02 15:21 UTC
To: syzbot, Al Viro
Cc: davem, geliang, horms, kuba, linux-kernel, martineau, matttbe, mptcp, netdev, pabeni, syzkaller-bugs

On Thu, Jan 2, 2025 at 3:12 PM syzbot
<syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> (...)

I thought acct(2) was only allowing regular files.

acct_on() indeed has :

        if (!S_ISREG(file_inode(file)->i_mode)) {
                kfree(acct);
                filp_close(file, NULL);
                return -EACCES;
        }

It seems there are other ways to call do_acct_process() targeting a sysfs file ?
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
From: Matthieu Baerts @ 2025-01-04 18:38 UTC
To: Eric Dumazet
Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot, Al Viro

Hi Eric,

Thank you for the bug report!

On 02/01/2025 16:21, Eric Dumazet wrote:
> On Thu, Jan 2, 2025 at 3:12 PM syzbot
> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:

(...)

> I thought acct(2) was only allowing regular files.
>
> acct_on() indeed has :
>
>         if (!S_ISREG(file_inode(file)->i_mode)) {
>                 kfree(acct);
>                 filp_close(file, NULL);
>                 return -EACCES;
>         }
>
> It seems there are other ways to call do_acct_process() targeting a sysfs file ?

Just to be sure I'm not misunderstanding your comment: do you mean that
here, the issue is *not* in MPTCP code where we get the 'struct net'
pointer via 'current->nsproxy->net_ns', but in the FS part, right?

Here, we have an issue because 'current->nsproxy' is NULL, but is it
normal? Or should we simply exit with an error if it is the case because
we are in an exiting phase?

I'm just a bit confused, because it looks like 'net' is retrieved from
different places elsewhere when dealing with sysfs: some get it from
'current' like us, some assign 'net' to 'table->extra2', others get it
from 'table->data' (via a container_of()), etc. Maybe we should not use
'current->nsproxy->net_ns' here then?

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
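
One of the options mentioned above, getting from 'table->data' back to the enclosing per-namespace object via container_of(), can be sketched in userspace C. This is only an illustration of the pointer arithmetic: 'struct fake_net', 'struct fake_pernet' and net_from_data() are hypothetical stand-ins, not the MPTCP structures, and the macro is a simplified rendition of the kernel's container_of().

```c
#include <stddef.h>

/* Simplified userspace rendition of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct net. */
struct fake_net {
    int id;
};

/* Hypothetical per-namespace state that owns the sysctl-backed buffer. */
struct fake_pernet {
    struct fake_net *net;
    char scheduler[16];
};

/*
 * 'data' is assumed to point at fake_pernet::scheduler, as a sysctl table
 * entry's data pointer would. No reference to the calling task is needed.
 */
static inline struct fake_net *net_from_data(void *data)
{
    struct fake_pernet *pernet =
        container_of(data, struct fake_pernet, scheduler);

    return pernet->net;
}
```

Given a 'struct fake_pernet p', net_from_data(p.scheduler) returns p.net regardless of which task performs the lookup, which is the property that matters when the write arrives from an exiting task.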
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
From: Eric Dumazet @ 2025-01-04 18:53 UTC
To: Matthieu Baerts
Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot, Al Viro

On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Eric,
>
> Thank you for the bug report!
>
> On 02/01/2025 16:21, Eric Dumazet wrote:
> > On Thu, Jan 2, 2025 at 3:12 PM syzbot
> > <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
>
> (...)
>
> > I thought acct(2) was only allowing regular files.
> >
> > acct_on() indeed has :
> >
> >         if (!S_ISREG(file_inode(file)->i_mode)) {
> >                 kfree(acct);
> >                 filp_close(file, NULL);
> >                 return -EACCES;
> >         }
> >
> > It seems there are other ways to call do_acct_process() targeting a sysfs file ?
>
> Just to be sure I'm not misunderstanding your comment: do you mean that
> here, the issue is *not* in MPTCP code where we get the 'struct net'
> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
>
> Here, we have an issue because 'current->nsproxy' is NULL, but is it
> normal? Or should we simply exit with an error if it is the case because
> we are in an exiting phase?
>
> I'm just a bit confused, because it looks like 'net' is retrieved from
> different places elsewhere when dealing with sysfs: some get it from
> 'current' like us, some assign 'net' to 'table->extra2', others get it
> from 'table->data' (via a container_of()), etc. Maybe we should not use
> 'current->nsproxy->net_ns' here then?

I do think this is a bug in process accounting, not in networking.

It might make sense to output a record on a regular file, but probably
not on any other files.

diff --git a/kernel/acct.c b/kernel/acct.c
index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
         const struct cred *orig_cred;
         struct file *file = acct->file;

+        if (S_ISREG(file_inode(file)->i_mode))
+                return;
+
         /*
          * Accounting records are not subject to resource limits.
          */
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
From: Al Viro @ 2025-01-04 19:00 UTC
To: Eric Dumazet
Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sat, Jan 04, 2025 at 07:53:22PM +0100, Eric Dumazet wrote:

> I do think this is a bug in process accounting, not in networking.
>
> It might make sense to output a record on a regular file, but probably
> not on any other files.
>
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>         const struct cred *orig_cred;
>         struct file *file = acct->file;
>
> +       if (S_ISREG(file_inode(file)->i_mode))
> +               return;

... won't help, since the file in question *is* a regular file. IOW, it's
a wrong predicate here.
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
From: Matthieu Baerts @ 2025-01-04 19:11 UTC
To: Al Viro, Eric Dumazet
Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

Hi Al, Eric,

On 04/01/2025 20:00, Al Viro wrote:
> On Sat, Jan 04, 2025 at 07:53:22PM +0100, Eric Dumazet wrote:
>
>> I do think this is a bug in process accounting, not in networking.
>>
>> It might make sense to output a record on a regular file, but probably
>> not on any other files.
>>
>> diff --git a/kernel/acct.c b/kernel/acct.c
>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604 100644
>> --- a/kernel/acct.c
>> +++ b/kernel/acct.c
>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>>         const struct cred *orig_cred;
>>         struct file *file = acct->file;
>>
>> +       if (S_ISREG(file_inode(file)->i_mode))
>> +               return;
>
> ... won't help, since the file in question *is* a regular file. IOW, it's
> a wrong predicate here.

On my side, it looks like I'm not able to reproduce the issue with this
patch. Without it, it is very easy to reproduce it. (But I don't know if
there are other consequences that would avoid the issue to happen: when
looking at the logs, with the patch, I don't have heaps of "Process
accounting resumed" messages that I had before.)

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 19:11 ` Matthieu Baerts
@ 2025-01-04 20:21 ` Al Viro
  2025-01-05 8:32 ` Eric Dumazet
  2025-01-05 17:03 ` Matthieu Baerts
  0 siblings, 2 replies; 22+ messages in thread
From: Al Viro @ 2025-01-04 20:21 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Eric Dumazet, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sat, Jan 04, 2025 at 08:11:49PM +0100, Matthieu Baerts wrote:
> >> +       if (S_ISREG(file_inode(file)->i_mode))
                    ^^^^^^^^^
> >> +               return;
> >
> > ... won't help, since the file in question *is* a regular file.  IOW, it's
> > a wrong predicate here.
>
> On my side, it looks like I'm not able to reproduce the issue with this
> patch. Without it, it is very easy to reproduce it. (But I don't know if
> there are other consequences that would avoid the issue to happen: when
> looking at the logs, with the patch, I don't have heaps of "Process
> accounting resumed" messages that I had before.)

Unsurprisingly so, since it rejects all regular files due to a typo;
fix that and you'll see that the oops is still there.

The real issue (and the one that affects more than just this scenario) is
the use of current->nsproxy->net_ns to get to the damn thing.

Why not something like

static int proc_scheduler(const struct ctl_table *ctl, int write,
			  void *buffer, size_t *lenp, loff_t *ppos)
{
	/* per-netns scheduler name, stored in ->data at registration */
	char (*data)[MPTCP_SCHED_NAME_MAX] = ctl->data;
	char val[MPTCP_SCHED_NAME_MAX];
	struct ctl_table tbl = {
		.data = val,
		.maxlen = MPTCP_SCHED_NAME_MAX,
	};
	struct mptcp_sched_ops *sched;
	int ret;

	strscpy(val, *data, MPTCP_SCHED_NAME_MAX);

	ret = proc_dostring(&tbl, write, buffer, lenp, ppos);
	if (write && ret == 0) {
		/* only accept the name of a registered scheduler */
		rcu_read_lock();
		sched = mptcp_sched_find(val);
		if (sched)
			strscpy(*data, val, MPTCP_SCHED_NAME_MAX);
		else
			ret = -ENOENT;
		rcu_read_unlock();
	}
	return ret;
}

seeing that the data object you really want to access is
mptcp_get_pernet(net)->scheduler and you have that pointer
stored in table->data at the registration time?
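To make the suggestion above concrete outside the kernel, here is a minimal userspace sketch. Everything in it is a hypothetical stand-in (struct pernet_model, struct ctl_model, the known_scheds list, the function names); none of it is kernel API. The only point it illustrates is that the write path takes its target from the pointer stored in ->data at registration time, so it never has to look at the calling task.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SCHED_NAME_MAX 16

/* Hypothetical stand-in for the per-netns MPTCP state. */
struct pernet_model {
	char scheduler[SCHED_NAME_MAX];
};

/* Hypothetical stand-in for struct ctl_table: only the fields used here. */
struct ctl_model {
	void *data;	/* set once, at registration time */
	size_t maxlen;
};

/* Illustrative list of "registered" scheduler names. */
static const char *const known_scheds[] = { "default", "demo" };

static int sched_is_known(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(known_scheds) / sizeof(known_scheds[0]); i++)
		if (strcmp(known_scheds[i], name) == 0)
			return 1;
	return 0;
}

/* Registration: point ->data at this namespace's own buffer. */
static void register_sched_ctl(struct ctl_model *ctl,
			       struct pernet_model *pernet)
{
	ctl->data = pernet->scheduler;
	ctl->maxlen = sizeof(pernet->scheduler);
}

/*
 * Write path: the target comes from ctl->data, never from the calling
 * task, so the result does not depend on who is writing or when.
 */
static int sched_ctl_write(const struct ctl_model *ctl, const char *val)
{
	char *target = ctl->data;

	assert(target != NULL && ctl->maxlen > 0);
	if (strlen(val) >= ctl->maxlen)
		return -EINVAL;
	if (!sched_is_known(val))
		return -ENOENT;
	snprintf(target, ctl->maxlen, "%s", val);
	return 0;
}
```

In this model a write through the entry always lands in the namespace the entry was registered for, and an unknown name leaves the stored value untouched.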
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 20:21 ` Al Viro
@ 2025-01-05 8:32 ` Eric Dumazet
  2025-01-05 11:29 ` Al Viro
  2025-01-05 17:03 ` Matthieu Baerts
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2025-01-05 8:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sat, Jan 4, 2025 at 9:21 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Jan 04, 2025 at 08:11:49PM +0100, Matthieu Baerts wrote:
> > >> +       if (S_ISREG(file_inode(file)->i_mode))
>                     ^^^^^^^^^
> > >> +               return;
> > >
> > > ... won't help, since the file in question *is* a regular file.  IOW, it's
> > > a wrong predicate here.
> >
> > On my side, it looks like I'm not able to reproduce the issue with this
> > patch. Without it, it is very easy to reproduce it. (But I don't know if
> > there are other consequences that would avoid the issue to happen: when
> > looking at the logs, with the patch, I don't have heaps of "Process
> > accounting resumed" messages that I had before.)
>
> Unsurprisingly so, since it rejects all regular files due to a typo;
> fix that and you'll see that the oops is still there.
>
> The real issue (and the one that affects more than just this scenario) is
> the use of current->nsproxy->net_ns to get to the damn thing.

According to grep, we have many other places directly reading
current->nsproxy->net_ns

For instance in net/sctp/sysctl.c

Should we change them all ?

Perhaps an alternative would be to add a generic check in
proc_sys_call_handler()

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 27a283d85a6e7df1a7edbfb513ce75832363e2e6..84968b10ce86e7fd88c6e3c43f52b601394b056f 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -576,6 +576,8 @@ static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
 	error = -EINVAL;
 	if (!table->proc_handler)
 		goto out;
+	if (unlikely(current->flags & PF_EXITING))
+		goto out;

 	/* don't even try if the size is too large */
 	error = -ENOMEM;

Thanks.
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 8:32 ` Eric Dumazet
@ 2025-01-05 11:29 ` Al Viro
  2025-01-05 16:52 ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-01-05 11:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 05, 2025 at 09:32:36AM +0100, Eric Dumazet wrote:

> According to grep, we have many other places directly reading
> current->nsproxy->net_ns
> For instance in net/sctp/sysctl.c
> Should we change them all ?

Depends - do you want their contents match the netns of opener (as,
AFAICS, for ipv4 sysctls) or that of the reader?
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 11:29 ` Al Viro
@ 2025-01-05 16:52 ` Eric Dumazet
  2025-01-05 17:03 ` Matthieu Baerts
  2025-01-05 19:54 ` Al Viro
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2025-01-05 16:52 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 5, 2025 at 12:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Jan 05, 2025 at 09:32:36AM +0100, Eric Dumazet wrote:
>
> > According to grep, we have many other places directly reading
> > current->nsproxy->net_ns
> > For instance in net/sctp/sysctl.c
> > Should we change them all ?
>
> Depends - do you want their contents match the netns of opener (as,
> AFAICS, for ipv4 sysctls) or that of the reader?

I am only worried that a malicious user could crash the host with
current kernels,
not about this MPTCP crash, but all unaware users of current->nsproxy
in sysctl handlers.

Back to MPTCP :

Using the convention used in other mptcp sysctls like (enabled,
add_addr_timeout,
checksum_enabled, allow_join_initial_addr_port...) is better for consistency.
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 16:52 ` Eric Dumazet
@ 2025-01-05 17:03 ` Matthieu Baerts
  2025-01-05 19:54 ` Al Viro
  1 sibling, 0 replies; 22+ messages in thread
From: Matthieu Baerts @ 2025-01-05 17:03 UTC (permalink / raw)
  To: Eric Dumazet, Al Viro
  Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

Hi Eric,

On 05/01/2025 17:52, Eric Dumazet wrote:
> On Sun, Jan 5, 2025 at 12:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> On Sun, Jan 05, 2025 at 09:32:36AM +0100, Eric Dumazet wrote:
>>
>>> According to grep, we have many other places directly reading
>>> current->nsproxy->net_ns
>>> For instance in net/sctp/sysctl.c
>>> Should we change them all ?
>>
>> Depends - do you want their contents match the netns of opener (as,
>> AFAICS, for ipv4 sysctls) or that of the reader?
>
> I am only worried that a malicious user could crash the host with
> current kernels,
> not about this MPTCP crash, but all unaware users of current->nsproxy
> in sysctl handlers.
>
> Back to MPTCP :
>
> Using the convention used in other mptcp sysctls like (enabled,
> add_addr_timeout,
> checksum_enabled, allow_join_initial_addr_port...) is better for consistency.

Indeed, I can do the modifications to stop using current->nsproxy in
MPTCP. I can do the same in SCTP.

Do you plan to send your patch modifying proc_sysctl.c? It is just to
know if I should mark my patches as fixes, and split them to ease the
backports -- each helper using current->nsproxy has been introduced in
different commits -- or if I can send them to net-next instead.

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 16:52 ` Eric Dumazet
  2025-01-05 17:03 ` Matthieu Baerts
@ 2025-01-05 19:54 ` Al Viro
  2025-01-05 20:50 ` Al Viro
  1 sibling, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-01-05 19:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 05, 2025 at 05:52:19PM +0100, Eric Dumazet wrote:
> On Sun, Jan 5, 2025 at 12:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Sun, Jan 05, 2025 at 09:32:36AM +0100, Eric Dumazet wrote:
> >
> > > According to grep, we have many other places directly reading
> > > current->nsproxy->net_ns
> > > For instance in net/sctp/sysctl.c
> > > Should we change them all ?
> >
> > Depends - do you want their contents match the netns of opener (as,
> > AFAICS, for ipv4 sysctls) or that of the reader?
>
> I am only worried that a malicious user could crash the host with
> current kernels,
> not about this MPTCP crash, but all unaware users of current->nsproxy
> in sysctl handlers.

I don't hate your mitigation in proc_sysctl.c, but IMO there are
two problems mixed here - one is that we probably should have
access to per-netns sysctl table act on the netns it had been
created for, which may not coincide with reader's/writer's netns
and another is that access to current->nsproxy->net_ns would simply
oops if attempted when current->nsproxy had been dropped.

So I suspect that current->nsproxy->net_ns shouldn't be used in
per-netns sysctls for consistency sake (note that it can get more
serious than just consistency, if you have e.g. a spinlock taken
in something hanging off current netns to protect access to
something table->data points to).

As for the mitigation in fs/proc/proc_sysctl.c... might be useful,
if it comes with a clear comment about the reasons it's there.
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 19:54 ` Al Viro
@ 2025-01-05 20:50 ` Al Viro
  2025-01-05 21:11 ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-01-05 20:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 05, 2025 at 07:54:34PM +0000, Al Viro wrote:

> So I suspect that current->nsproxy->net_ns shouldn't be used in
> per-netns sysctls for consistency sake (note that it can get more
> serious than just consistency, if you have e.g. a spinlock taken
> in something hanging off current netns to protect access to
> something table->data points to).
>
> As for the mitigation in fs/proc/proc_sysctl.c... might be useful,
> if it comes with a clear comment about the reasons it's there.

FWIW, looks like we have two such in mptcp (with sysctls next to those
definitely accessing the netns of opener rather than reader/writer),
two in rds (both inconsistent on the write side -

	struct net *net = current->nsproxy->net_ns;
	int err;

	err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos);
	if (err < 0) {
		pr_warn("Invalid input. Must be >= %d\n",
			*(int *)(ctl->extra1));
		return err;
	}
	if (write)
		rds_tcp_sysctl_reset(net);

will modify ctl->data, which points to &rtn->{snd,rcv}buf_size, with
rtn == net_generic(net, rds_tcp_netid) and net being for opener's netns
and then call rds_tcp_sysctl_reset(net) with net being the writer's netns)
and 6 in sctp.  At least some of sctp ones are also inconsistent on the
write side; e.g.

static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write,
				void *buffer, size_t *lenp, loff_t *ppos)
{
	struct net *net = current->nsproxy->net_ns;
	unsigned int min = *(unsigned int *) ctl->extra1;
	unsigned int max = *(unsigned int *) ctl->extra2;
	struct ctl_table tbl;
	int ret, new_value;

	memset(&tbl, 0, sizeof(struct ctl_table));
	tbl.maxlen = sizeof(unsigned int);

	if (write)
		tbl.data = &new_value;
	else
		tbl.data = &net->sctp.rto_min;

	ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
	if (write && ret == 0) {
		if (new_value > max || new_value < min)
			return -EINVAL;

		net->sctp.rto_min = new_value;
	}

	return ret;
}

has max taken from ctl->extra2, which is &net->sctp.rto_max of the
opener's netns, but the value capped by that is stored into
net->sctp.rto_min of *writer's* netns.  So the logic that is supposed
to prevent rto_min > rto_max can be bypassed; no idea how much can that
escalate to, but it's clearly not what the code intends.

So I'd rather document the "don't assume that current->nsproxy->net_ns
will point to the same netns this ctl is for" and fix those 10 instances -
at least some smell seriously fishy.  It's not just the acct(2) weirdness
and the damage may be worse than an oops...
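The opener/writer mismatch described above can be reproduced in a small userspace model. This is only a sketch under stated assumptions: struct sctp_model, struct rto_min_ctl and both write helpers are hypothetical names invented for illustration, and the "mixed" helper merely imitates the shape of the handler quoted above (bound read through the table entry, store into the caller's namespace).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-netns SCTP settings. */
struct sctp_model {
	unsigned int rto_min;
	unsigned int rto_max;
};

/* Hypothetical table entry, bound to one namespace at registration. */
struct rto_min_ctl {
	unsigned int *data;	/* &owner->rto_min */
	unsigned int *extra2;	/* &owner->rto_max, the upper bound */
};

static void rto_min_ctl_init(struct rto_min_ctl *ctl, struct sctp_model *owner)
{
	assert(owner != NULL);
	ctl->data = &owner->rto_min;
	ctl->extra2 = &owner->rto_max;
}

/*
 * Imitates the criticised pattern: the bound comes from the entry's
 * owner, but the store goes to whichever namespace the caller is in.
 */
static int rto_min_write_mixed(const struct rto_min_ctl *ctl,
			       struct sctp_model *caller_ns, unsigned int val)
{
	if (val > *ctl->extra2)
		return -EINVAL;
	caller_ns->rto_min = val;
	return 0;
}

/* Consistent variant: bound and store both go through the entry. */
static int rto_min_write_owner(const struct rto_min_ctl *ctl, unsigned int val)
{
	if (val > *ctl->extra2)
		return -EINVAL;
	*ctl->data = val;
	return 0;
}
```

With an opener namespace whose rto_max is large and a writer namespace whose rto_max is small, the mixed helper accepts a value that leaves the writer with rto_min > rto_max, while the consistent helper can only ever touch the namespace whose bound it checked.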
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 20:50 ` Al Viro
@ 2025-01-05 21:11 ` Al Viro
  0 siblings, 0 replies; 22+ messages in thread
From: Al Viro @ 2025-01-05 21:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 05, 2025 at 08:50:56PM +0000, Al Viro wrote:

> has max taken from ctl->extra2, which is &net->sctp.rto_max of the
> opener's netns, but the value capped by that is stored into
> net->sctp.rto_min of *writer's* netns.  So the logic that is supposed
> to prevent rto_min > rto_max can be bypassed; no idea how much can that
> escalate to, but it's clearly not what the code intends.

Speaking of which, the logic that tries to maintain rto_min <= rto_max
is broken in another way.  There's no exclusion in those suckers.  IOW,
if we have set rto_min to 1 and rto_max to 10000, two processes can try
to write 1000 to rto_min and 10 to rto_max resp., with successful
validations done against the original state in both, followed by actual
stores.  Result is rto_min == 1000 and rto_max == 10, which is probably
not what one wants there...

IOW, the validation and stores should be atomic; the same goes for another
pair (pf_retrans <= ps_retrans).  Again, I've no idea how severe it is,
but result seems to be at least contrary to expectation of the code
authors...
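One way to read "the validation and stores should be atomic" is sketched below in userspace C. It is an illustration only: struct rto_cfg and the two setters are hypothetical, and a pthread mutex stands in for whatever exclusion the kernel code would actually use. Each setter checks the bound and stores the value inside the same critical section, so a check can never be performed against a state that another writer is about to change.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical pair of settings that must keep rto_min <= rto_max. */
struct rto_cfg {
	pthread_mutex_t lock;
	unsigned int rto_min;
	unsigned int rto_max;
};

/* Validate and store rto_min under one lock. */
static int rto_cfg_set_min(struct rto_cfg *cfg, unsigned int val)
{
	int ret = 0;

	pthread_mutex_lock(&cfg->lock);
	if (val > cfg->rto_max)
		ret = -EINVAL;
	else
		cfg->rto_min = val;
	assert(cfg->rto_min <= cfg->rto_max);	/* invariant holds on exit */
	pthread_mutex_unlock(&cfg->lock);
	return ret;
}

/* Validate and store rto_max under the same lock. */
static int rto_cfg_set_max(struct rto_cfg *cfg, unsigned int val)
{
	int ret = 0;

	pthread_mutex_lock(&cfg->lock);
	if (val < cfg->rto_min)
		ret = -EINVAL;
	else
		cfg->rto_max = val;
	assert(cfg->rto_min <= cfg->rto_max);	/* invariant holds on exit */
	pthread_mutex_unlock(&cfg->lock);
	return ret;
}
```

Taking the scenario from the message above (rto_min 1, rto_max 10000, one writer storing 1000 to rto_min and another storing 10 to rto_max): with both setters serialized on the same lock, whichever write runs second is validated against the state left by the first and is rejected, so the pair cannot end up inverted.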
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 20:21 ` Al Viro
  2025-01-05 8:32 ` Eric Dumazet
@ 2025-01-05 17:03 ` Matthieu Baerts
  1 sibling, 0 replies; 22+ messages in thread
From: Matthieu Baerts @ 2025-01-05 17:03 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric Dumazet, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

Hi Al,

On 04/01/2025 21:21, Al Viro wrote:
> The real issue (and the one that affects more than just this scenario) is
> the use of current->nsproxy->net_ns to get to the damn thing.
>
> Why not something like

(...)

> seeing that the data object you really want to access is
> mptcp_get_pernet(net)->scheduler and you have that pointer
> stored in table->data at the registration time?

Good point, thank you for the suggestion! :)

I will do this modification.

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler 2025-01-04 18:53 ` Eric Dumazet 2025-01-04 19:00 ` Al Viro @ 2025-01-04 19:11 ` Matthieu Baerts 2025-01-06 13:32 ` Joel Granados 2025-01-04 20:09 ` Al Viro 2 siblings, 1 reply; 22+ messages in thread From: Matthieu Baerts @ 2025-01-04 19:11 UTC (permalink / raw) To: Eric Dumazet Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot, Al Viro, Joel Granados Hi Eric, (+cc Joel) Thank you for your reply! On 04/01/2025 19:53, Eric Dumazet wrote: > On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote: >> >> Hi Eric, >> >> Thank you for the bug report! >> >> On 02/01/2025 16:21, Eric Dumazet wrote: >>> On Thu, Jan 2, 2025 at 3:12 PM syzbot >>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote: >>>> >>>> Hello, >>>> >>>> syzbot found the following issue on: >>>> >>>> HEAD commit: ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g.. >>>> git tree: upstream >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000 >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1 >>>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000 >>>> >>>> Downloadable assets: >>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz >>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz >>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz >>>> >>>> IMPORTANT: if you fix the issue, please add the following tag to the commit: >>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com >>>> >>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] 
PREEMPT SMP KASAN PTI >>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] >>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0 >>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 >>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 >>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 >>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 >>>> >>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 >>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 >>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 >>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 >>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 >>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Call Trace: >>>> <TASK> >>>> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601 >>>> __kernel_write_iter+0x318/0xa80 fs/read_write.c:612 >>>> __kernel_write+0xf6/0x140 fs/read_write.c:632 >>>> do_acct_process+0xcb0/0x14a0 kernel/acct.c:539 >>>> acct_pin_kill+0x2d/0x100 kernel/acct.c:192 >>>> pin_kill+0x194/0x7c0 fs/fs_pin.c:44 >>>> mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81 >>>> cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366 >>>> task_work_run+0x14e/0x250 kernel/task_work.c:239 >>>> exit_task_work include/linux/task_work.h:43 [inline] >>>> do_exit+0xad8/0x2d70 kernel/exit.c:938 >>>> do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 >>>> get_signal+0x2576/0x2610 
kernel/signal.c:3017 >>>> arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 >>>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline] >>>> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] >>>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] >>>> syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 >>>> do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 >>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f >>>> RIP: 0033:0x7fee3cb87a6a >>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40. >>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 >>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a >>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 >>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7 >>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500 >>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000 >>>> </TASK> >>>> Modules linked in: >>>> ---[ end trace 0000000000000000 ]--- >>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 >>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 >>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 >>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 >>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 >>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 >>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 >>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 >>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 >>>> DR0: 0000000000000000 DR1: 0000000000000000 
DR2: 0000000000000000 >>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> ---------------- >>>> Code disassembly (best guess), 1 bytes skipped: >>>> 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) >>>> 5: 0f 85 fe 02 00 00 jne 0x309 >>>> b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12 >>>> 12: 00 >>>> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax >>>> 1a: fc ff df >>>> 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi >>>> 22: 48 89 fa mov %rdi,%rdx >>>> 25: 48 c1 ea 03 shr $0x3,%rdx >>>> * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction >>>> 2d: 0f 85 cc 02 00 00 jne 0x2ff >>>> 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15 >>>> 38: 48 rex.W >>>> 39: 8d .byte 0x8d >>>> 3a: 84 24 c8 test %ah,(%rax,%rcx,8) >> >> (...) >> >>> I thought acct(2) was only allowing regular files. >>> >>> acct_on() indeed has : >>> >>> if (!S_ISREG(file_inode(file)->i_mode)) { >>> kfree(acct); >>> filp_close(file, NULL); >>> return -EACCES; >>> } >>> >>> It seems there are other ways to call do_acct_process() targeting a sysfs file ? >> >> Just to be sure I'm not misunderstanding your comment: do you mean that >> here, the issue is *not* in MPTCP code where we get the 'struct net' >> pointer via 'current->nsproxy->net_ns', but in the FS part, right? >> >> Here, we have an issue because 'current->nsproxy' is NULL, but is it >> normal? Or should we simply exit with an error if it is the case because >> we are in an exiting phase? >> >> I'm just a bit confused, because it looks like 'net' is retrieved from >> different places elsewhere when dealing with sysfs: some get it from >> 'current' like us, some assign 'net' to 'table->extra2', others get it >> from 'table->data' (via a container_of()), etc. Maybe we should not use >> 'current->nsproxy->net_ns' here then? > > I do think this is a bug in process accounting, not in networking. > > It might make sense to output a record on a regular file, but probably > not on any other files. 
> > diff --git a/kernel/acct.c b/kernel/acct.c > index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604 > 100644 > --- a/kernel/acct.c > +++ b/kernel/acct.c > @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct) > const struct cred *orig_cred; > struct file *file = acct->file; > > + if (S_ISREG(file_inode(file)->i_mode)) > + return; > + > /* > * Accounting records are not subject to resource limits. > */ OK, thank you, that's clearer. So this is then more a question for Joel, right? Do you plan to send this patch to him? #syz set subsystems: fs Cheers, Matt -- Sponsored by the NGI0 Core fund. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Re: [syzbot] [mptcp?] general protection fault in proc_scheduler 2025-01-04 19:11 ` Matthieu Baerts @ 2025-01-06 13:32 ` Joel Granados 2025-01-06 14:27 ` Matthieu Baerts 0 siblings, 1 reply; 22+ messages in thread From: Joel Granados @ 2025-01-06 13:32 UTC (permalink / raw) To: Matthieu Baerts Cc: Eric Dumazet, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot, Al Viro On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote: > Hi Eric, > > (+cc Joel) > > Thank you for your reply! > > On 04/01/2025 19:53, Eric Dumazet wrote: > > On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote: > >> > >> Hi Eric, > >> > >> Thank you for the bug report! > >> > >> On 02/01/2025 16:21, Eric Dumazet wrote: > >>> On Thu, Jan 2, 2025 at 3:12 PM syzbot > >>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote: > >>>> > >>>> Hello, > >>>> > >>>> syzbot found the following issue on: > >>>> > >>>> HEAD commit: ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g.. 
> >>>> git tree: upstream > >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000 > >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f > >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1 > >>>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 > >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000 > >>>> > >>>> Downloadable assets: > >>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz > >>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz > >>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz > >>>> > >>>> IMPORTANT: if you fix the issue, please add the following tag to the commit: > >>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com > >>>> > >>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI > >>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] > >>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0 > >>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 > >>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 > >>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 > >>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 > >>>> > >>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 > >>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 > >>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 > >>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 > >>>> R13: ffffc90003477620 R14: 
ffffc90003477710 R15: dffffc0000000000 > >>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 > >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 > >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>> Call Trace: > >>>> <TASK> > >>>> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601 > >>>> __kernel_write_iter+0x318/0xa80 fs/read_write.c:612 > >>>> __kernel_write+0xf6/0x140 fs/read_write.c:632 > >>>> do_acct_process+0xcb0/0x14a0 kernel/acct.c:539 > >>>> acct_pin_kill+0x2d/0x100 kernel/acct.c:192 > >>>> pin_kill+0x194/0x7c0 fs/fs_pin.c:44 > >>>> mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81 > >>>> cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366 > >>>> task_work_run+0x14e/0x250 kernel/task_work.c:239 > >>>> exit_task_work include/linux/task_work.h:43 [inline] > >>>> do_exit+0xad8/0x2d70 kernel/exit.c:938 > >>>> do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 > >>>> get_signal+0x2576/0x2610 kernel/signal.c:3017 > >>>> arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 > >>>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline] > >>>> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] > >>>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] > >>>> syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 > >>>> do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 > >>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f > >>>> RIP: 0033:0x7fee3cb87a6a > >>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40. 
> >>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 > >>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a > >>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 > >>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7 > >>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500 > >>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000 > >>>> </TASK> > >>>> Modules linked in: > >>>> ---[ end trace 0000000000000000 ]--- > >>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 > >>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 > >>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 > >>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 > >>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 > >>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 > >>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 > >>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 > >>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 > >>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 > >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>> ---------------- > >>>> Code disassembly (best guess), 1 bytes skipped: > >>>> 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) > >>>> 5: 0f 85 fe 02 00 00 jne 0x309 > >>>> b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12 > >>>> 12: 00 > >>>> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax > >>>> 1a: fc ff df > >>>> 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi > >>>> 22: 48 89 fa mov %rdi,%rdx > >>>> 25: 48 c1 ea 03 shr 
$0x3,%rdx > >>>> * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction > >>>> 2d: 0f 85 cc 02 00 00 jne 0x2ff > >>>> 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15 > >>>> 38: 48 rex.W > >>>> 39: 8d .byte 0x8d > >>>> 3a: 84 24 c8 test %ah,(%rax,%rcx,8) > >> > >> (...) > >> > >>> I thought acct(2) was only allowing regular files. > >>> > >>> acct_on() indeed has : > >>> > >>> if (!S_ISREG(file_inode(file)->i_mode)) { > >>> kfree(acct); > >>> filp_close(file, NULL); > >>> return -EACCES; > >>> } > >>> > >>> It seems there are other ways to call do_acct_process() targeting a sysfs file ? If this is the case, can you point me to the place where this happens? > >> > >> Just to be sure I'm not misunderstanding your comment: do you mean that > >> here, the issue is *not* in MPTCP code where we get the 'struct net' > >> pointer via 'current->nsproxy->net_ns', but in the FS part, right? > >> > >> Here, we have an issue because 'current->nsproxy' is NULL, but is it > >> normal? Or should we simply exit with an error if it is the case because > >> we are in an exiting phase? > >> > >> I'm just a bit confused, because it looks like 'net' is retrieved from > >> different places elsewhere when dealing with sysfs: some get it from > >> 'current' like us, some assign 'net' to 'table->extra2', others get it > >> from 'table->data' (via a container_of()), etc. Maybe we should not use > >> 'current->nsproxy->net_ns' here then? > > > > I do think this is a bug in process accounting, not in networking. > > > > It might make sense to output a record on a regular file, but probably > > not on any other files. It for sure does not make sense to output a record on a sysctl file that has a maxlen of just 3*sizeof(int) (kernel/acct.c:79). 
> >
> > diff --git a/kernel/acct.c b/kernel/acct.c
> > index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
> > 100644
> > --- a/kernel/acct.c
> > +++ b/kernel/acct.c
> > @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
> >         const struct cred *orig_cred;
> >         struct file *file = acct->file;
> >
> > +       if (!S_ISREG(file_inode(file)->i_mode))
> > +               return;
> > +
This does not seem to handle the actual culprit, which is: why is the
sysctl file being used for the accounting?

> >         /*
> >          * Accounting records are not subject to resource limits.
> >          */
>
> OK, thank you, that's clearer.
>
> So this is then more a question for Joel, right?
>
> Do you plan to send this patch to him?
>
> #syz set subsystems: fs
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
>

So what is happening is that:
1. The accounting file is set to a sysctl file.
2. And when accounting tries to write to this file, you get the
   behaviour explained in this mail?

Please correct me if I have misread the situation.

Best
--
Joel Granados

^ permalink raw reply [flat|nested] 22+ messages in thread
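A possible answer to the question above, offered as a sketch rather than a confirmed diagnosis: entries under /proc/sys are normally reported with the regular-file type bit set, so an S_ISREG() test such as the one quoted from acct_on() would not by itself reject them. The small userspace helper below checks this with stat(). The function name is illustrative and not taken from the kernel.

```c
#include <sys/stat.h>

/*
 * Returns 1 if 'path' is reported as a regular file, 0 if it is some
 * other file type, and -1 if stat() fails (for example when /proc is
 * not mounted or the path does not exist).
 */
static int is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	return S_ISREG(st.st_mode) ? 1 : 0;
}
```

Calling is_regular_file("/proc/sys/net/mptcp/scheduler") on a kernel built with MPTCP is expected to return 1, whereas a directory such as "/" returns 0.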
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler 2025-01-06 13:32 ` Joel Granados @ 2025-01-06 14:27 ` Matthieu Baerts 2025-01-06 15:27 ` Eric Dumazet 2025-01-08 14:37 ` Joel Granados 0 siblings, 2 replies; 22+ messages in thread From: Matthieu Baerts @ 2025-01-06 14:27 UTC (permalink / raw) To: Joel Granados, Eric Dumazet, Al Viro Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot Hi Joel, Eric, Al, On 06/01/2025 14:32, Joel Granados wrote: > On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote: >> Hi Eric, >> >> (+cc Joel) >> >> Thank you for your reply! >> >> On 04/01/2025 19:53, Eric Dumazet wrote: >>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote: >>>> >>>> Hi Eric, >>>> >>>> Thank you for the bug report! >>>> >>>> On 02/01/2025 16:21, Eric Dumazet wrote: >>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot >>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote: >>>>>> >>>>>> Hello, >>>>>> >>>>>> syzbot found the following issue on: >>>>>> >>>>>> HEAD commit: ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g.. 
>>>>>> git tree: upstream >>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000 >>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f >>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1 >>>>>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 >>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000 >>>>>> >>>>>> Downloadable assets: >>>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz >>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz >>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz >>>>>> >>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit: >>>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com >>>>>> >>>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI >>>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] >>>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0 >>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 >>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 >>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 >>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 >>>>>> >>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 >>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 >>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 >>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 >>>>>> R13: ffffc90003477620 R14: 
ffffc90003477710 R15: dffffc0000000000 >>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>> Call Trace: >>>>>> <TASK> >>>>>> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601 >>>>>> __kernel_write_iter+0x318/0xa80 fs/read_write.c:612 >>>>>> __kernel_write+0xf6/0x140 fs/read_write.c:632 >>>>>> do_acct_process+0xcb0/0x14a0 kernel/acct.c:539 >>>>>> acct_pin_kill+0x2d/0x100 kernel/acct.c:192 >>>>>> pin_kill+0x194/0x7c0 fs/fs_pin.c:44 >>>>>> mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81 >>>>>> cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366 >>>>>> task_work_run+0x14e/0x250 kernel/task_work.c:239 >>>>>> exit_task_work include/linux/task_work.h:43 [inline] >>>>>> do_exit+0xad8/0x2d70 kernel/exit.c:938 >>>>>> do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 >>>>>> get_signal+0x2576/0x2610 kernel/signal.c:3017 >>>>>> arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 >>>>>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline] >>>>>> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] >>>>>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] >>>>>> syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 >>>>>> do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 >>>>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f >>>>>> RIP: 0033:0x7fee3cb87a6a >>>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40. 
>>>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 >>>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a >>>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 >>>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7 >>>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500 >>>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000 >>>>>> </TASK> >>>>>> Modules linked in: >>>>>> ---[ end trace 0000000000000000 ]--- >>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 >>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 >>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 >>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 >>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 >>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 >>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 >>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 >>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>> ---------------- >>>>>> Code disassembly (best guess), 1 bytes skipped: >>>>>> 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) >>>>>> 5: 0f 85 fe 02 00 00 jne 0x309 >>>>>> b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12 >>>>>> 12: 00 >>>>>> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax >>>>>> 1a: fc ff df >>>>>> 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi >>>>>> 22: 48 89 fa mov %rdi,%rdx >>>>>> 25: 48 c1 ea 03 shr 
$0x3,%rdx >>>>>> * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction >>>>>> 2d: 0f 85 cc 02 00 00 jne 0x2ff >>>>>> 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15 >>>>>> 38: 48 rex.W >>>>>> 39: 8d .byte 0x8d >>>>>> 3a: 84 24 c8 test %ah,(%rax,%rcx,8) >>>> >>>> (...) >>>> >>>>> I thought acct(2) was only allowing regular files. >>>>> >>>>> acct_on() indeed has : >>>>> >>>>> if (!S_ISREG(file_inode(file)->i_mode)) { >>>>> kfree(acct); >>>>> filp_close(file, NULL); >>>>> return -EACCES; >>>>> } >>>>> >>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ? > If this is the case, can you point me to the place where this happens? > >>>> >>>> Just to be sure I'm not misunderstanding your comment: do you mean that >>>> here, the issue is *not* in MPTCP code where we get the 'struct net' >>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right? >>>> >>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it >>>> normal? Or should we simply exit with an error if it is the case because >>>> we are in an exiting phase? >>>> >>>> I'm just a bit confused, because it looks like 'net' is retrieved from >>>> different places elsewhere when dealing with sysfs: some get it from >>>> 'current' like us, some assign 'net' to 'table->extra2', others get it >>>> from 'table->data' (via a container_of()), etc. Maybe we should not use >>>> 'current->nsproxy->net_ns' here then? >>> >>> I do think this is a bug in process accounting, not in networking. >>> >>> It might make sense to output a record on a regular file, but probably >>> not on any other files. > It for sure does not make sense to output a record on a sysctl file that > has a maxlen of just 3*sizeof(int) (kernel/acct.c:79). 
>
>>>
>>> diff --git a/kernel/acct.c b/kernel/acct.c
>>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
>>> 100644
>>> --- a/kernel/acct.c
>>> +++ b/kernel/acct.c
>>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>>>         const struct cred *orig_cred;
>>>         struct file *file = acct->file;
>>>
>>> +       if (!S_ISREG(file_inode(file)->i_mode))
>>> +               return;
>>> +
> This does not seem to handle the actual culprit, which is: why is the
> sysctl file being used for the accounting?
>
>>>         /*
>>>          * Accounting records are not subject to resource limits.
>>>          */
>>
>> OK, thank you, that's clearer.
>>
>> So this is then more a question for Joel, right?
>>
>> Do you plan to send this patch to him?
>>
>> #syz set subsystems: fs
>>
>> Cheers,
>> Matt
>> --
>> Sponsored by the NGI0 Core fund.
>>
>
> So what is happening is that:
> 1. The accounting file is set to a sysctl file.
> 2. And when accounting tries to write to this file, you get the
>    behaviour explained in this mail?
>
> Please correct me if I have misread the situation.

@Joel: Thank you for your reply!

I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
can jump in.

What I can say is that the original issue has been found by syzbot, and
the reproducer [1] shows that 3 syscalls have been used:
- openat('/proc/sys/net/mptcp/scheduler')
- mprotect()
- acct()

Please also note that the conversation continued in a sub-thread where
you are not in the Cc list, see [2]. In short, Eric suggested another
patch only for sysfs, and Al recommended dropping the use of
'current->nsproxy'.

On my side, I'm looking at dropping the use of 'current->nsproxy' in
sysctl callbacks. I guess such patches will be seen as fixes, except if
Eric's new patch is enough for stable?

[1] https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000
[2] https://lore.kernel.org/netdev/67769ecb.050a0220.3a8527.003f.GAE@google.com/T/#m862d0913ebfcec5e462a9c33b47bc3f6440a2900

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.

^ permalink raw reply [flat|nested] 22+ messages in thread
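The alternative mentioned in the thread, getting per-namespace state from 'table->data' via container_of() instead of from 'current->nsproxy', can be sketched in userspace as follows. This is a simplified model under stated assumptions: struct pernet_model, struct ctl_table_model and set_scheduler() are hypothetical names, not the MPTCP code, and the container_of() macro here is a plain userspace stand-in for the kernel helper.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define SCHED_NAME_MAX 16

/* Hypothetical per-namespace state; field names are illustrative. */
struct pernet_model {
	int enabled;
	char scheduler[SCHED_NAME_MAX];
};

/* Hypothetical, heavily reduced table entry: only the 'data' pointer. */
struct ctl_table_model {
	void *data;
};

/*
 * Handler sketch: recover the enclosing per-namespace structure from
 * table->data, which was fixed when the table was set up, without
 * consulting any "current task" state.
 * Returns 0 on success, -1 if the name does not fit.
 */
static int set_scheduler(const struct ctl_table_model *table, const char *name)
{
	struct pernet_model *pernet =
		container_of(table->data, struct pernet_model, scheduler);

	if (strlen(name) >= SCHED_NAME_MAX)
		return -1;

	strcpy(pernet->scheduler, name);
	return 0;
}
```

Because the handler only follows the pointer stored in the table, it behaves the same regardless of which task performs the write, including an exiting task whose namespace pointers have already been cleared.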
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler 2025-01-06 14:27 ` Matthieu Baerts @ 2025-01-06 15:27 ` Eric Dumazet 2025-01-06 15:34 ` Matthieu Baerts 2025-01-08 14:37 ` Joel Granados 1 sibling, 1 reply; 22+ messages in thread From: Eric Dumazet @ 2025-01-06 15:27 UTC (permalink / raw) To: Matthieu Baerts Cc: Joel Granados, Al Viro, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot On Mon, Jan 6, 2025 at 3:27 PM Matthieu Baerts <matttbe@kernel.org> wrote: > > Hi Joel, Eric, Al, > > On 06/01/2025 14:32, Joel Granados wrote: > > On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote: > >> Hi Eric, > >> > >> (+cc Joel) > >> > >> Thank you for your reply! > >> > >> On 04/01/2025 19:53, Eric Dumazet wrote: > >>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote: > >>>> > >>>> Hi Eric, > >>>> > >>>> Thank you for the bug report! > >>>> > >>>> On 02/01/2025 16:21, Eric Dumazet wrote: > >>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot > >>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote: > >>>>>> > >>>>>> Hello, > >>>>>> > >>>>>> syzbot found the following issue on: > >>>>>> > >>>>>> HEAD commit: ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g.. 
> >>>>>> git tree: upstream > >>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000 > >>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f > >>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1 > >>>>>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 > >>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000 > >>>>>> > >>>>>> Downloadable assets: > >>>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz > >>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz > >>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz > >>>>>> > >>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit: > >>>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com > >>>>>> > >>>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI > >>>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] > >>>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0 > >>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 > >>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 > >>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 > >>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 > >>>>>> > >>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 > >>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 > >>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 > >>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 
0000000000000000 > >>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 > >>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 > >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 > >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>> Call Trace: > >>>>>> <TASK> > >>>>>> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601 > >>>>>> __kernel_write_iter+0x318/0xa80 fs/read_write.c:612 > >>>>>> __kernel_write+0xf6/0x140 fs/read_write.c:632 > >>>>>> do_acct_process+0xcb0/0x14a0 kernel/acct.c:539 > >>>>>> acct_pin_kill+0x2d/0x100 kernel/acct.c:192 > >>>>>> pin_kill+0x194/0x7c0 fs/fs_pin.c:44 > >>>>>> mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81 > >>>>>> cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366 > >>>>>> task_work_run+0x14e/0x250 kernel/task_work.c:239 > >>>>>> exit_task_work include/linux/task_work.h:43 [inline] > >>>>>> do_exit+0xad8/0x2d70 kernel/exit.c:938 > >>>>>> do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 > >>>>>> get_signal+0x2576/0x2610 kernel/signal.c:3017 > >>>>>> arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 > >>>>>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline] > >>>>>> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] > >>>>>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] > >>>>>> syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 > >>>>>> do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 > >>>>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f > >>>>>> RIP: 0033:0x7fee3cb87a6a > >>>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40. 
> >>>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 > >>>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a > >>>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 > >>>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7 > >>>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500 > >>>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000 > >>>>>> </TASK> > >>>>>> Modules linked in: > >>>>>> ---[ end trace 0000000000000000 ]--- > >>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 > >>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 > >>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 > >>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 > >>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 > >>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 > >>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 > >>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 > >>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 > >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 > >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>> ---------------- > >>>>>> Code disassembly (best guess), 1 bytes skipped: > >>>>>> 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) > >>>>>> 5: 0f 85 fe 02 00 00 jne 0x309 > >>>>>> b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12 > >>>>>> 12: 00 > >>>>>> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax > >>>>>> 1a: fc ff df > >>>>>> 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi > 
>>>>>> 22: 48 89 fa mov %rdi,%rdx > >>>>>> 25: 48 c1 ea 03 shr $0x3,%rdx > >>>>>> * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction > >>>>>> 2d: 0f 85 cc 02 00 00 jne 0x2ff > >>>>>> 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15 > >>>>>> 38: 48 rex.W > >>>>>> 39: 8d .byte 0x8d > >>>>>> 3a: 84 24 c8 test %ah,(%rax,%rcx,8) > >>>> > >>>> (...) > >>>> > >>>>> I thought acct(2) was only allowing regular files. > >>>>> > >>>>> acct_on() indeed has : > >>>>> > >>>>> if (!S_ISREG(file_inode(file)->i_mode)) { > >>>>> kfree(acct); > >>>>> filp_close(file, NULL); > >>>>> return -EACCES; > >>>>> } > >>>>> > >>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ? > > If this is the case, can you point me to the place where this happens? > > > >>>> > >>>> Just to be sure I'm not misunderstanding your comment: do you mean that > >>>> here, the issue is *not* in MPTCP code where we get the 'struct net' > >>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right? > >>>> > >>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it > >>>> normal? Or should we simply exit with an error if it is the case because > >>>> we are in an exiting phase? > >>>> > >>>> I'm just a bit confused, because it looks like 'net' is retrieved from > >>>> different places elsewhere when dealing with sysfs: some get it from > >>>> 'current' like us, some assign 'net' to 'table->extra2', others get it > >>>> from 'table->data' (via a container_of()), etc. Maybe we should not use > >>>> 'current->nsproxy->net_ns' here then? > >>> > >>> I do think this is a bug in process accounting, not in networking. > >>> > >>> It might make sense to output a record on a regular file, but probably > >>> not on any other files. > > It for sure does not make sense to output a record on a sysctl file that > > has a maxlen of just 3*sizeof(int) (kernel/acct.c:79). 
> >
> >>>
> >>> diff --git a/kernel/acct.c b/kernel/acct.c
> >>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
> >>> 100644
> >>> --- a/kernel/acct.c
> >>> +++ b/kernel/acct.c
> >>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
> >>>         const struct cred *orig_cred;
> >>>         struct file *file = acct->file;
> >>>
> >>> +       if (!S_ISREG(file_inode(file)->i_mode))
> >>> +               return;
> >>> +
> > This does not seem to handle the actual culprit, which is: why is the
> > sysctl file being used for the accounting?
> >
> >>>         /*
> >>>          * Accounting records are not subject to resource limits.
> >>>          */
> >>
> >> OK, thank you, that's clearer.
> >>
> >> So this is then more a question for Joel, right?
> >>
> >> Do you plan to send this patch to him?
> >>
> >> #syz set subsystems: fs
> >>
> >> Cheers,
> >> Matt
> >> --
> >> Sponsored by the NGI0 Core fund.
> >>
> >
> > So what is happening is that:
> > 1. The accounting file is set to a sysctl file.
> > 2. And when accounting tries to write to this file, you get the
> >    behaviour explained in this mail?
> >
> > Please correct me if I have misread the situation.
>
> @Joel: Thank you for your reply!
>
> I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
> can jump in.
>
> What I can say is that the original issue has been found by syzbot, and
> the reproducer [1] shows that 3 syscalls have been used:
> - openat('/proc/sys/net/mptcp/scheduler')
> - mprotect()
> - acct()
>
> Please also note that the conversation continued in a sub-thread where
> you are not in the Cc list, see [2]. In short, Eric suggested another
> patch only for sysfs, and Al recommended dropping the use of
> 'current->nsproxy'.
>
> On my side, I'm looking at dropping the use of 'current->nsproxy' in
> sysctl callbacks. I guess such patches will be seen as fixes, except if
> Eric's new patch is enough for stable?

It might be less risky in terms of backports to patch mptcp and others.

I.e., just use Al's suggestion.

Thanks!

^ permalink raw reply [flat|nested] 22+ messages in thread
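The faulting address in the quoted oops can be reproduced by hand from the disassembly: the code shifts the target address right by 3 and adds the constant held in RAX. A minimal sketch of that arithmetic, assuming the shadow offset 0xdffffc0000000000 and the 8-byte granule visible in this report (macro and function names are made up for illustration):

```c
#include <stdint.h>

/* Constant seen in RAX in the register dump; assumed shadow offset. */
#define SHADOW_OFFSET_MODEL 0xdffffc0000000000ULL
/* One shadow byte covers 8 bytes of memory, hence 'shr $0x3'. */
#define SHADOW_SCALE_SHIFT_MODEL 3

/* Shadow byte address checked before accessing 'addr'. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT_MODEL) + SHADOW_OFFSET_MODEL;
}

/* First address of the 8-byte granule covered by a shadow byte. */
static uint64_t granule_start(uint64_t shadow)
{
	return (shadow - SHADOW_OFFSET_MODEL) << SHADOW_SCALE_SHIFT_MODEL;
}
```

With addr = 0x28 (the value in RDI), shadow_addr() gives 0xdffffc0000000005, the non-canonical address in the oops, and granule_start() on that value gives back 0x28, the start of the range KASAN reports. This is consistent with a field at offset 0x28 being read through a NULL pointer in R12.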
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler 2025-01-06 15:27 ` Eric Dumazet @ 2025-01-06 15:34 ` Matthieu Baerts 0 siblings, 0 replies; 22+ messages in thread From: Matthieu Baerts @ 2025-01-06 15:34 UTC (permalink / raw) To: Eric Dumazet Cc: Joel Granados, Al Viro, davem, geliang, horms, kuba, linux-kernel, martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot Hi Eric, Thank you for your reply! On 06/01/2025 16:27, Eric Dumazet wrote: > On Mon, Jan 6, 2025 at 3:27 PM Matthieu Baerts <matttbe@kernel.org> wrote: >> >> Hi Joel, Eric, Al, >> >> On 06/01/2025 14:32, Joel Granados wrote: >>> On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote: >>>> Hi Eric, >>>> >>>> (+cc Joel) >>>> >>>> Thank you for your reply! >>>> >>>> On 04/01/2025 19:53, Eric Dumazet wrote: >>>>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote: >>>>>> >>>>>> Hi Eric, >>>>>> >>>>>> Thank you for the bug report! >>>>>> >>>>>> On 02/01/2025 16:21, Eric Dumazet wrote: >>>>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot >>>>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote: >>>>>>>> >>>>>>>> Hello, >>>>>>>> >>>>>>>> syzbot found the following issue on: >>>>>>>> >>>>>>>> HEAD commit: ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g.. 
>>>>>>>> git tree: upstream >>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000 >>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f >>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1 >>>>>>>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 >>>>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000 >>>>>>>> >>>>>>>> Downloadable assets: >>>>>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz >>>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz >>>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz >>>>>>>> >>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit: >>>>>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com >>>>>>>> >>>>>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI >>>>>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] >>>>>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0 >>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 >>>>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 >>>>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 >>>>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 >>>>>>>> >>>>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 >>>>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 >>>>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 >>>>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 
0000000000000000 >>>>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 >>>>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 >>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 >>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>>> Call Trace: >>>>>>>> <TASK> >>>>>>>> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601 >>>>>>>> __kernel_write_iter+0x318/0xa80 fs/read_write.c:612 >>>>>>>> __kernel_write+0xf6/0x140 fs/read_write.c:632 >>>>>>>> do_acct_process+0xcb0/0x14a0 kernel/acct.c:539 >>>>>>>> acct_pin_kill+0x2d/0x100 kernel/acct.c:192 >>>>>>>> pin_kill+0x194/0x7c0 fs/fs_pin.c:44 >>>>>>>> mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81 >>>>>>>> cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366 >>>>>>>> task_work_run+0x14e/0x250 kernel/task_work.c:239 >>>>>>>> exit_task_work include/linux/task_work.h:43 [inline] >>>>>>>> do_exit+0xad8/0x2d70 kernel/exit.c:938 >>>>>>>> do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 >>>>>>>> get_signal+0x2576/0x2610 kernel/signal.c:3017 >>>>>>>> arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 >>>>>>>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline] >>>>>>>> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] >>>>>>>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] >>>>>>>> syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 >>>>>>>> do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 >>>>>>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f >>>>>>>> RIP: 0033:0x7fee3cb87a6a >>>>>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40. 
>>>>>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 >>>>>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a >>>>>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 >>>>>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7 >>>>>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500 >>>>>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000 >>>>>>>> </TASK> >>>>>>>> Modules linked in: >>>>>>>> ---[ end trace 0000000000000000 ]--- >>>>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 >>>>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 >>>>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 >>>>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 >>>>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 >>>>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 >>>>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 >>>>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 >>>>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 >>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 >>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>>> ---------------- >>>>>>>> Code disassembly (best guess), 1 bytes skipped: >>>>>>>> 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) >>>>>>>> 5: 0f 85 fe 02 00 00 jne 0x309 >>>>>>>> b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12 >>>>>>>> 12: 00 >>>>>>>> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax >>>>>>>> 1a: fc ff df >>>>>>>> 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi 
>>>>>>>> 22: 48 89 fa mov %rdi,%rdx >>>>>>>> 25: 48 c1 ea 03 shr $0x3,%rdx >>>>>>>> * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction >>>>>>>> 2d: 0f 85 cc 02 00 00 jne 0x2ff >>>>>>>> 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15 >>>>>>>> 38: 48 rex.W >>>>>>>> 39: 8d .byte 0x8d >>>>>>>> 3a: 84 24 c8 test %ah,(%rax,%rcx,8) >>>>>> >>>>>> (...) >>>>>> >>>>>>> I thought acct(2) was only allowing regular files. >>>>>>> >>>>>>> acct_on() indeed has : >>>>>>> >>>>>>> if (!S_ISREG(file_inode(file)->i_mode)) { >>>>>>> kfree(acct); >>>>>>> filp_close(file, NULL); >>>>>>> return -EACCES; >>>>>>> } >>>>>>> >>>>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ? >>> If this is the case, can you point me to the place where this happens? >>> >>>>>> >>>>>> Just to be sure I'm not misunderstanding your comment: do you mean that >>>>>> here, the issue is *not* in MPTCP code where we get the 'struct net' >>>>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right? >>>>>> >>>>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it >>>>>> normal? Or should we simply exit with an error if it is the case because >>>>>> we are in an exiting phase? >>>>>> >>>>>> I'm just a bit confused, because it looks like 'net' is retrieved from >>>>>> different places elsewhere when dealing with sysfs: some get it from >>>>>> 'current' like us, some assign 'net' to 'table->extra2', others get it >>>>>> from 'table->data' (via a container_of()), etc. Maybe we should not use >>>>>> 'current->nsproxy->net_ns' here then? >>>>> >>>>> I do think this is a bug in process accounting, not in networking. >>>>> >>>>> It might make sense to output a record on a regular file, but probably >>>>> not on any other files. >>> It for sure does not make sense to output a record on a sysctl file that >>> has a maxlen of just 3*sizeof(int) (kernel/acct.c:79). 
>>>
>>>>>
>>>>> diff --git a/kernel/acct.c b/kernel/acct.c
>>>>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
>>>>> 100644
>>>>> --- a/kernel/acct.c
>>>>> +++ b/kernel/acct.c
>>>>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>>>>>         const struct cred *orig_cred;
>>>>>         struct file *file = acct->file;
>>>>>
>>>>> +       if (!S_ISREG(file_inode(file)->i_mode))
>>>>> +               return;
>>>>> +
>>> This does not seem to handle the actual culprit, which is: why is the
>>> sysctl file being used for the accounting?
>>>
>>>>>         /*
>>>>>          * Accounting records are not subject to resource limits.
>>>>>          */
>>>>
>>>> OK, thank you, that's clearer.
>>>>
>>>> So this is then more a question for Joel, right?
>>>>
>>>> Do you plan to send this patch to him?
>>>>
>>>> #syz set subsystems: fs
>>>>
>>>> Cheers,
>>>> Matt
>>>> --
>>>> Sponsored by the NGI0 Core fund.
>>>>
>>>
>>> So what is happening is that:
>>> 1. The accounting file is set to a sysctl file.
>>> 2. And when accounting tries to write to this file, you get the
>>>    behaviour explained in this mail?
>>>
>>> Please correct me if I have misread the situation.
>>
>> @Joel: Thank you for your reply!
>>
>> I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
>> can jump in.
>>
>> What I can say is that the original issue has been found by syzbot, and
>> the reproducer [1] shows that 3 syscalls have been used:
>> - openat('/proc/sys/net/mptcp/scheduler')
>> - mprotect()
>> - acct()
>>
>> Please also note that the conversation continued in a sub-thread where
>> you are not in the Cc list, see [2]. In short, Eric suggested another
>> patch only for sysfs, and Al recommended dropping the use of
>> 'current->nsproxy'.
>>
>> On my side, I'm looking at dropping the use of 'current->nsproxy' in
>> sysctl callbacks. I guess such patches will be seen as fixes, except if
>> Eric's new patch is enough for stable?
>
> It might be less risky in terms of backports to patch mptcp and others.
>
> I.e. just use Al's suggestion.

Thank you, will do!

In fact, I already modified the kernel on my side, but it is hard for me
to validate that for the moment: it is nice to have many trees around,
but less when they fall on cables :)

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.

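One of the alternatives mentioned earlier in this thread to reading 'current->nsproxy->net_ns' is to recover the owning per-netns structure from 'table->data' via container_of(). The following is only a userspace sketch of that idea: the macro is re-stated locally, and 'struct demo_pernet', 'struct demo_ctl_table' and 'pernet_from_table()' are hypothetical stand-ins, not the kernel's actual types or the patch that was eventually sent.

```c
#include <stddef.h>   /* offsetof */

/* Local re-statement of the container_of idea for userspace; the kernel
 * ships its own macro with extra type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-namespace state that owns the sysctl-backed buffer. */
struct demo_pernet {
	int id;
	char scheduler[16];
};

/* Hypothetical, heavily reduced table entry: only the data pointer. */
struct demo_ctl_table {
	void *data;   /* assumed to point at demo_pernet.scheduler */
};

/* Recover the owner from the table entry itself, so the handler does not
 * depend on which task (or exiting task) happens to be running it. */
struct demo_pernet *pernet_from_table(const struct demo_ctl_table *table)
{
	return container_of(table->data, struct demo_pernet, scheduler);
}
```

The point of the design is that the lookup depends only on how the table entry was registered, so it would still work when the handler runs from a context where 'current->nsproxy' is NULL.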
* Re: Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-06 14:27 ` Matthieu Baerts
  2025-01-06 15:27 ` Eric Dumazet
@ 2025-01-08 14:37 ` Joel Granados
  1 sibling, 0 replies; 22+ messages in thread
From: Joel Granados @ 2025-01-08 14:37 UTC
To: Matthieu Baerts
Cc: Eric Dumazet, Al Viro, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Mon, Jan 06, 2025 at 03:27:47PM +0100, Matthieu Baerts wrote:
> Hi Joel, Eric, Al,
>
> On 06/01/2025 14:32, Joel Granados wrote:
> > On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote:
> >> Hi Eric,
> >>
> >> (+cc Joel)
> >>
> >> Thank you for your reply!
> >>
> >> On 04/01/2025 19:53, Eric Dumazet wrote:
> >>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
> >>>>
> >>>> Hi Eric,
> >>>>
> >>>> Thank you for the bug report!
> >>>>
> >>>> On 02/01/2025 16:21, Eric Dumazet wrote:
> >>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot
> >>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> syzbot found the following issue on:
> >>>>>>
> >>>>>> HEAD commit: ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
> >>>>>> git tree: upstream > >>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000 > >>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f > >>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1 > >>>>>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40 > >>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000 > >>>>>> > >>>>>> Downloadable assets: > >>>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz > >>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz > >>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz > >>>>>> > >>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit: > >>>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com > >>>>>> > >>>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI > >>>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] > >>>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0 > >>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 > >>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 > >>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 > >>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 > >>>>>> > >>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 > >>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 > >>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 > >>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 
0000000000000000 > >>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 > >>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 > >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 > >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>> Call Trace: > >>>>>> <TASK> > >>>>>> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601 > >>>>>> __kernel_write_iter+0x318/0xa80 fs/read_write.c:612 > >>>>>> __kernel_write+0xf6/0x140 fs/read_write.c:632 > >>>>>> do_acct_process+0xcb0/0x14a0 kernel/acct.c:539 > >>>>>> acct_pin_kill+0x2d/0x100 kernel/acct.c:192 > >>>>>> pin_kill+0x194/0x7c0 fs/fs_pin.c:44 > >>>>>> mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81 > >>>>>> cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366 > >>>>>> task_work_run+0x14e/0x250 kernel/task_work.c:239 > >>>>>> exit_task_work include/linux/task_work.h:43 [inline] > >>>>>> do_exit+0xad8/0x2d70 kernel/exit.c:938 > >>>>>> do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 > >>>>>> get_signal+0x2576/0x2610 kernel/signal.c:3017 > >>>>>> arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 > >>>>>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline] > >>>>>> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] > >>>>>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] > >>>>>> syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 > >>>>>> do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 > >>>>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f > >>>>>> RIP: 0033:0x7fee3cb87a6a > >>>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40. 
> >>>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 > >>>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a > >>>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 > >>>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7 > >>>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500 > >>>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000 > >>>>>> </TASK> > >>>>>> Modules linked in: > >>>>>> ---[ end trace 0000000000000000 ]--- > >>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 > >>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 > >>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 > >>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 > >>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 > >>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 > >>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 > >>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 > >>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 > >>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 > >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>> ---------------- > >>>>>> Code disassembly (best guess), 1 bytes skipped: > >>>>>> 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) > >>>>>> 5: 0f 85 fe 02 00 00 jne 0x309 > >>>>>> b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12 > >>>>>> 12: 00 > >>>>>> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax > >>>>>> 1a: fc ff df > >>>>>> 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi > 
>>>>>> 22: 48 89 fa mov %rdi,%rdx > >>>>>> 25: 48 c1 ea 03 shr $0x3,%rdx > >>>>>> * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction > >>>>>> 2d: 0f 85 cc 02 00 00 jne 0x2ff > >>>>>> 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15 > >>>>>> 38: 48 rex.W > >>>>>> 39: 8d .byte 0x8d > >>>>>> 3a: 84 24 c8 test %ah,(%rax,%rcx,8) > >>>> > >>>> (...) > >>>> > >>>>> I thought acct(2) was only allowing regular files. > >>>>> > >>>>> acct_on() indeed has : > >>>>> > >>>>> if (!S_ISREG(file_inode(file)->i_mode)) { > >>>>> kfree(acct); > >>>>> filp_close(file, NULL); > >>>>> return -EACCES; > >>>>> } > >>>>> > >>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ? > > If this is the case, can you point me to the place where this happens? > > > >>>> > >>>> Just to be sure I'm not misunderstanding your comment: do you mean that > >>>> here, the issue is *not* in MPTCP code where we get the 'struct net' > >>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right? > >>>> > >>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it > >>>> normal? Or should we simply exit with an error if it is the case because > >>>> we are in an exiting phase? > >>>> > >>>> I'm just a bit confused, because it looks like 'net' is retrieved from > >>>> different places elsewhere when dealing with sysfs: some get it from > >>>> 'current' like us, some assign 'net' to 'table->extra2', others get it > >>>> from 'table->data' (via a container_of()), etc. Maybe we should not use > >>>> 'current->nsproxy->net_ns' here then? > >>> > >>> I do think this is a bug in process accounting, not in networking. > >>> > >>> It might make sense to output a record on a regular file, but probably > >>> not on any other files. > > It for sure does not make sense to output a record on a sysctl file that > > has a maxlen of just 3*sizeof(int) (kernel/acct.c:79). 
> >
> >>>
> >>> diff --git a/kernel/acct.c b/kernel/acct.c
> >>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
> >>> 100644
> >>> --- a/kernel/acct.c
> >>> +++ b/kernel/acct.c
> >>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
> >>>         const struct cred *orig_cred;
> >>>         struct file *file = acct->file;
> >>>
> >>> +       if (S_ISREG(file_inode(file)->i_mode))
> >>> +               return;
> >>> +
> > This does not seem to handle the actual culprit, which is: why is
> > the sysctl file being used for the accounting?
> >
> >>>         /*
> >>>          * Accounting records are not subject to resource limits.
> >>>          */
> >>
> >> OK, thank you, that's clearer.
> >>
> >> So this is then more a question for Joel, right?
> >>
> >> Do you plan to send this patch to him?
> >>
> >> #syz set subsystems: fs
> >>
> >> Cheers,
> >> Matt
> >> --
> >> Sponsored by the NGI0 Core fund.
> >>
> >
> > So what is happening is that:
> > 1. The accounting file is set to a non-sysctl file.
> > 2. And when accounting tries to write to this file, you get the
> > behaviour explained in this mail?
> >
> > Please correct me if I have misread the situation.
>
> @Joel: Thank you for your reply!
>
> I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
> can jump in.
>
> What I can say is that the original issue has been found by syzbot, and
> the reproducer [1] shows that 3 syscalls have been used:
> - openat('/proc/sys/net/mptcp/scheduler')
> - mprotect()
> - acct()
>
> Please also note that the conversation continued in a sub-thread where
> you are not in the Cc list, see [2]. In short, Eric suggested another
> patch only for sysfs, and Al recommended dropping the use of
> 'current->nsproxy'.

Perfect. Thx for the summary. I'll remove this thread from my radar as it
seems that a fix has already been found.

Best

--
Joel Granados

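For readers following the quoted oops: the faulting address 0xdffffc0000000005 can be reproduced from the register dump and disassembly alone. The instrumented code shifts the address about to be read (RDI = 0x28, computed as 0x28(%r12) with R12 = 0) right by 3 and adds the constant loaded into RAX. The sketch below only restates that arithmetic; the constant and shift are read off the quoted disassembly rather than taken from kernel headers, and 'shadow_addr()' is a hypothetical helper name.

```c
#include <stdint.h>

/* Values as they appear in the quoted disassembly. */
#define SHADOW_OFFSET 0xdffffc0000000000ULL  /* movabs $0xdffffc0000000000,%rax */
#define SHADOW_SHIFT  3                      /* shr $0x3,%rdx */

/* Address checked by "cmpb $0x0,(%rdx,%rax,1)" for a given target address. */
uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SHIFT) + SHADOW_OFFSET;
}
```

With a shift of 3, every address from 0x28 to 0x2f maps to the same check byte, which matches the range printed in the "KASAN: null-ptr-deref in range" line of the report.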
* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 18:53 ` Eric Dumazet
  2025-01-04 19:00 ` Al Viro
  2025-01-04 19:11 ` Matthieu Baerts
@ 2025-01-04 20:09 ` Al Viro
  2 siblings, 0 replies; 22+ messages in thread
From: Al Viro @ 2025-01-04 20:09 UTC
To: Eric Dumazet
Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sat, Jan 04, 2025 at 07:53:22PM +0100, Eric Dumazet wrote:

> I do think this is a bug in process accounting, not in networking.
>
> It might make sense to output a record on a regular file, but probably
> not on any other files.
>
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
> 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>         const struct cred *orig_cred;
>         struct file *file = acct->file;
>
> +       if (S_ISREG(file_inode(file)->i_mode))
> +               return;

Wait, what? OK, that will stop attempts to write there - or to any other
regular file. If you modify that to if (!S_ISREG(...)) you seem to have
intended, it won't break the normal behaviour but it won't help with sysctls.

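The inversion Al points out can be checked outside the kernel with the same POSIX macro. This is a minimal userspace sketch, not kernel code: 'skip_as_posted()' and 'skip_as_intended()' are hypothetical helpers, and a plain mode_t stands in for 'file_inode(file)->i_mode'.

```c
#define _XOPEN_SOURCE 700   /* make S_IFREG / S_IFCHR visible in strict modes */
#include <stdbool.h>
#include <sys/stat.h>       /* S_ISREG, S_IFREG, S_IFCHR, mode_t */

/* Check as posted in the quoted diff: returns true when do_acct_process()
 * would bail out early. Regular files are the ones that get skipped. */
bool skip_as_posted(mode_t mode)
{
	return S_ISREG(mode);
}

/* Negated form Al assumes was intended: only non-regular files are skipped. */
bool skip_as_intended(mode_t mode)
{
	return !S_ISREG(mode);
}
```

With the check as posted, a mode of S_IFREG | 0644 is skipped and a character device is not, which is the opposite of the stated goal. With the negated form, any file whose inode reports a regular-file mode still goes through, which is consistent with Al's remark that it would not help with sysctls.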
end of thread, other threads:[~2025-01-08 14:37 UTC | newest]

Thread overview: 22+ messages
2025-01-02 14:12 [syzbot] [mptcp?] general protection fault in proc_scheduler syzbot
2025-01-02 15:21 ` Eric Dumazet
2025-01-04 18:38 ` Matthieu Baerts
2025-01-04 18:53 ` Eric Dumazet
2025-01-04 19:00 ` Al Viro
2025-01-04 19:11 ` Matthieu Baerts
2025-01-04 20:21 ` Al Viro
2025-01-05  8:32 ` Eric Dumazet
2025-01-05 11:29 ` Al Viro
2025-01-05 16:52 ` Eric Dumazet
2025-01-05 17:03 ` Matthieu Baerts
2025-01-05 19:54 ` Al Viro
2025-01-05 20:50 ` Al Viro
2025-01-05 21:11 ` Al Viro
2025-01-05 17:03 ` Matthieu Baerts
2025-01-04 19:11 ` Matthieu Baerts
2025-01-06 13:32 ` Joel Granados
2025-01-06 14:27 ` Matthieu Baerts
2025-01-06 15:27 ` Eric Dumazet
2025-01-06 15:34 ` Matthieu Baerts
2025-01-08 14:37 ` Joel Granados
2025-01-04 20:09 ` Al Viro