[syzbot] [mptcp?] general protection fault in proc

netdev.vger.kernel.org archive mirror

* [syzbot] [mptcp?] general protection fault in proc_scheduler
@ 2025-01-02 14:12 syzbot
  2025-01-02 15:21 ` Eric Dumazet
From: syzbot @ 2025-01-02 14:12 UTC
  To: davem, edumazet, geliang, horms, kuba, linux-kernel, martineau,
	matttbe, mptcp, netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f
dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz
kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206

RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
 __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
 __kernel_write+0xf6/0x140 fs/read_write.c:632
 do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
 acct_pin_kill+0x2d/0x100 kernel/acct.c:192
 pin_kill+0x194/0x7c0 fs/fs_pin.c:44
 mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
 cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
 task_work_run+0x14e/0x250 kernel/task_work.c:239
 exit_task_work include/linux/task_work.h:43 [inline]
 do_exit+0xad8/0x2d70 kernel/exit.c:938
 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
 get_signal+0x2576/0x2610 kernel/signal.c:3017
 arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fee3cb87a6a
Code: Unable to access opcode bytes at 0x7fee3cb87a40.
RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess), 1 bytes skipped:
   0:	42 80 3c 38 00       	cmpb   $0x0,(%rax,%r15,1)
   5:	0f 85 fe 02 00 00    	jne    0x309
   b:	4d 8b a4 24 08 09 00 	mov    0x908(%r12),%r12
  12:	00
  13:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  1a:	fc ff df
  1d:	49 8d 7c 24 28       	lea    0x28(%r12),%rdi
  22:	48 89 fa             	mov    %rdi,%rdx
  25:	48 c1 ea 03          	shr    $0x3,%rdx
* 29:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
  2d:	0f 85 cc 02 00 00    	jne    0x2ff
  33:	4d 8b 7c 24 28       	mov    0x28(%r12),%r15
  38:	48                   	rex.W
  39:	8d                   	.byte 0x8d
  3a:	84 24 c8             	test   %ah,(%rax,%rcx,8)
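
The faulting address in the Oops line can be derived from the register dump. A minimal userspace sketch, assuming the x86-64 generic KASAN layout (8 bytes of memory per shadow byte, shadow offset 0xdffffc0000000000, which is the value loaded into RAX by the movabs above); the helper names are illustrative, not the kernel's:

```c
#include <stdint.h>

/* Assumed x86-64 generic KASAN constants: one shadow byte per 8 bytes. */
#define SHADOW_SCALE_SHIFT 3
#define SHADOW_OFFSET 0xdffffc0000000000ULL

/* Address of the shadow byte that is checked before 'addr' is accessed. */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* First byte of the 8-byte granule described by a shadow address. */
static uint64_t shadow_to_mem(uint64_t shadow)
{
	return (shadow - SHADOW_OFFSET) << SHADOW_SCALE_SHIFT;
}
```

With R12 = 0, the lea at offset 0x1d yields RDI = 0x28, and mem_to_shadow(0x28) is 0xdffffc0000000005: the non-canonical address in the Oops line. Mapping that shadow byte back gives the 8-byte range 0x28-0x2f that KASAN reports.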


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-02 14:12 [syzbot] [mptcp?] general protection fault in proc_scheduler syzbot
@ 2025-01-02 15:21 ` Eric Dumazet
  2025-01-04 18:38   ` Matthieu Baerts
From: Eric Dumazet @ 2025-01-02 15:21 UTC
  To: syzbot, Al Viro
  Cc: davem, geliang, horms, kuba, linux-kernel, martineau, matttbe,
	mptcp, netdev, pabeni, syzkaller-bugs

On Thu, Jan 2, 2025 at 3:12 PM syzbot
<syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
> (...)

I thought acct(2) was only allowing regular files.

acct_on() indeed has:

if (!S_ISREG(file_inode(file)->i_mode)) {
    kfree(acct);
    filp_close(file, NULL);
    return -EACCES;
}

It seems there are other ways to call do_acct_process() targeting a sysctl file?


* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-02 15:21 ` Eric Dumazet
@ 2025-01-04 18:38   ` Matthieu Baerts
  2025-01-04 18:53     ` Eric Dumazet
From: Matthieu Baerts @ 2025-01-04 18:38 UTC
  To: Eric Dumazet
  Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp,
	netdev, pabeni, syzkaller-bugs, syzbot, Al Viro

Hi Eric,

Thank you for the bug report!

On 02/01/2025 16:21, Eric Dumazet wrote:
> On Thu, Jan 2, 2025 at 3:12 PM syzbot
> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:

(...)

> I thought acct(2) was only allowing regular files.
> 
> acct_on() indeed has :
> 
> if (!S_ISREG(file_inode(file)->i_mode)) {
>     kfree(acct);
>     filp_close(file, NULL);
>     return -EACCES;
> }
> 
> It seems there are other ways to call do_acct_process() targeting a sysctl file?

Just to be sure I'm not misunderstanding your comment: do you mean that
here, the issue is *not* in MPTCP code where we get the 'struct net'
pointer via 'current->nsproxy->net_ns', but in the FS part, right?

Here, we have an issue because 'current->nsproxy' is NULL, but is it
normal? Or should we simply exit with an error if it is the case because
we are in an exiting phase?

I'm just a bit confused, because it looks like 'net' is retrieved from
different places elsewhere when dealing with sysctl: some get it from
'current' like us, some assign 'net' to 'table->extra2', others get it
from 'table->data' (via a container_of()), etc. Maybe we should not use
'current->nsproxy->net_ns' here then?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
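
The two ways of finding 'net' mentioned above can be contrasted in a small userspace sketch. Every type and function name below is a hypothetical stand-in (none of them is a kernel structure or the MPTCP code); container_of() is re-implemented with offsetof() only for illustration:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins, not the real kernel structures. */
struct fake_net { int id; };
struct fake_nsproxy { struct fake_net *net_ns; };
struct fake_task { struct fake_nsproxy *nsproxy; };

struct fake_pernet {
	struct fake_net *net;
	char scheduler[16];
};

struct fake_ctl_table { void *data; };

/* Pattern 1: depends on the calling task, whose nsproxy may be NULL. */
static struct fake_net *net_from_task(const struct fake_task *tsk)
{
	if (!tsk->nsproxy)
		return NULL;
	return tsk->nsproxy->net_ns;
}

/* Pattern 2: derived from the table entry, independent of the caller. */
static struct fake_net *net_from_table(const struct fake_ctl_table *table)
{
	struct fake_pernet *pernet =
		container_of(table->data, struct fake_pernet, scheduler);

	return pernet->net;
}
```

The second helper never looks at the calling task, so it gives the same answer when the handler runs from an exiting task whose nsproxy is already NULL, as in the trace above.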



* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 18:38   ` Matthieu Baerts
@ 2025-01-04 18:53     ` Eric Dumazet
  2025-01-04 19:00       ` Al Viro
                         ` (2 more replies)
From: Eric Dumazet @ 2025-01-04 18:53 UTC
  To: Matthieu Baerts
  Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp,
	netdev, pabeni, syzkaller-bugs, syzbot, Al Viro

On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Eric,
>
> Thank you for the bug report!
>
> On 02/01/2025 16:21, Eric Dumazet wrote:
> > On Thu, Jan 2, 2025 at 3:12 PM syzbot
> > <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
>
> (...)
>
> > I thought acct(2) was only allowing regular files.
> >
> > acct_on() indeed has :
> >
> > if (!S_ISREG(file_inode(file)->i_mode)) {
> >     kfree(acct);
> >     filp_close(file, NULL);
> >     return -EACCES;
> > }
> >
> > It seems there are other ways to call do_acct_process() targeting a sysctl file?
>
> Just to be sure I'm not misunderstanding your comment: do you mean that
> here, the issue is *not* in MPTCP code where we get the 'struct net'
> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
>
> Here, we have an issue because 'current->nsproxy' is NULL, but is it
> normal? Or should we simply exit with an error if it is the case because
> we are in an exiting phase?
>
> I'm just a bit confused, because it looks like 'net' is retrieved from
> different places elsewhere when dealing with sysctl: some get it from
> 'current' like us, some assign 'net' to 'table->extra2', others get it
> from 'table->data' (via a container_of()), etc. Maybe we should not use
> 'current->nsproxy->net_ns' here then?

I do think this is a bug in process accounting, not in networking.

It might make sense to output a record on a regular file, but probably
not on any other files.

diff --git a/kernel/acct.c b/kernel/acct.c
index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
        const struct cred *orig_cred;
        struct file *file = acct->file;

+       if (S_ISREG(file_inode(file)->i_mode))
+               return;
+
        /*
         * Accounting records are not subject to resource limits.
         */


* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 18:53     ` Eric Dumazet
@ 2025-01-04 19:00       ` Al Viro
  2025-01-04 19:11         ` Matthieu Baerts
  2025-01-04 19:11       ` Matthieu Baerts
  2025-01-04 20:09       ` Al Viro
From: Al Viro @ 2025-01-04 19:00 UTC
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sat, Jan 04, 2025 at 07:53:22PM +0100, Eric Dumazet wrote:

> I do think this is a bug in process accounting, not in networking.
> 
> It might make sense to output a record on a regular file, but probably
> not on any other files.
> 
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>         const struct cred *orig_cred;
>         struct file *file = acct->file;
> 
> +       if (S_ISREG(file_inode(file)->i_mode))
> +               return;

... won't help, since the file in question *is* a regular file.  IOW, it's
a wrong predicate here.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 19:00       ` Al Viro
@ 2025-01-04 19:11         ` Matthieu Baerts
  2025-01-04 20:21           ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: Matthieu Baerts @ 2025-01-04 19:11 UTC (permalink / raw)
  To: Al Viro, Eric Dumazet
  Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp,
	netdev, pabeni, syzkaller-bugs, syzbot

Hi Al, Eric,

On 04/01/2025 20:00, Al Viro wrote:
> On Sat, Jan 04, 2025 at 07:53:22PM +0100, Eric Dumazet wrote:
> 
>> I do think this is a bug in process accounting, not in networking.
>>
>> It might make sense to output a record on a regular file, but probably
>> not on any other files.
>>
>> diff --git a/kernel/acct.c b/kernel/acct.c
>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
>> 100644
>> --- a/kernel/acct.c
>> +++ b/kernel/acct.c
>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>>         const struct cred *orig_cred;
>>         struct file *file = acct->file;
>>
>> +       if (S_ISREG(file_inode(file)->i_mode))
>> +               return;
> 
> ... won't help, since the file in question *is* a regular file.  IOW, it's
> a wrong predicate here.

On my side, it looks like I'm not able to reproduce the issue with this
patch. Without it, it is very easy to reproduce. (But I don't know if
there are other consequences that would prevent the issue from happening:
when looking at the logs, with the patch, I don't have the heaps of
"Process accounting resumed" messages that I had before.)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 18:53     ` Eric Dumazet
  2025-01-04 19:00       ` Al Viro
@ 2025-01-04 19:11       ` Matthieu Baerts
  2025-01-06 13:32         ` Joel Granados
  2025-01-04 20:09       ` Al Viro
  2 siblings, 1 reply; 22+ messages in thread
From: Matthieu Baerts @ 2025-01-04 19:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp,
	netdev, pabeni, syzkaller-bugs, syzbot, Al Viro, Joel Granados

Hi Eric,

(+cc Joel)

Thank you for your reply!

On 04/01/2025 19:53, Eric Dumazet wrote:
> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>>
>> Hi Eric,
>>
>> Thank you for the bug report!
>>
>> On 02/01/2025 16:21, Eric Dumazet wrote:
>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot
>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:    ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
>>>> git tree:       upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000
>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1
>>>> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000
>>>>
>>>> Downloadable assets:
>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz
>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz
>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
>>>>
>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
>>>>
>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
>>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Call Trace:
>>>>  <TASK>
>>>>  proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
>>>>  __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
>>>>  __kernel_write+0xf6/0x140 fs/read_write.c:632
>>>>  do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
>>>>  acct_pin_kill+0x2d/0x100 kernel/acct.c:192
>>>>  pin_kill+0x194/0x7c0 fs/fs_pin.c:44
>>>>  mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
>>>>  cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
>>>>  task_work_run+0x14e/0x250 kernel/task_work.c:239
>>>>  exit_task_work include/linux/task_work.h:43 [inline]
>>>>  do_exit+0xad8/0x2d70 kernel/exit.c:938
>>>>  do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
>>>>  get_signal+0x2576/0x2610 kernel/signal.c:3017
>>>>  arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
>>>>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>>>>  exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
>>>>  __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>>>>  syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
>>>>  do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
>>>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>> RIP: 0033:0x7fee3cb87a6a
>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40.
>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
>>>>  </TASK>
>>>> Modules linked in:
>>>> ---[ end trace 0000000000000000 ]---
>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
>>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> ----------------
>>>> Code disassembly (best guess), 1 bytes skipped:
>>>>    0:   42 80 3c 38 00          cmpb   $0x0,(%rax,%r15,1)
>>>>    5:   0f 85 fe 02 00 00       jne    0x309
>>>>    b:   4d 8b a4 24 08 09 00    mov    0x908(%r12),%r12
>>>>   12:   00
>>>>   13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>>>>   1a:   fc ff df
>>>>   1d:   49 8d 7c 24 28          lea    0x28(%r12),%rdi
>>>>   22:   48 89 fa                mov    %rdi,%rdx
>>>>   25:   48 c1 ea 03             shr    $0x3,%rdx
>>>> * 29:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
>>>>   2d:   0f 85 cc 02 00 00       jne    0x2ff
>>>>   33:   4d 8b 7c 24 28          mov    0x28(%r12),%r15
>>>>   38:   48                      rex.W
>>>>   39:   8d                      .byte 0x8d
>>>>   3a:   84 24 c8                test   %ah,(%rax,%rcx,8)
>>
>> (...)
>>
>>> I thought acct(2) was only allowing regular files.
>>>
>>> acct_on() indeed has :
>>>
>>> if (!S_ISREG(file_inode(file)->i_mode)) {
>>>     kfree(acct);
>>>     filp_close(file, NULL);
>>>     return -EACCES;
>>> }
>>>
>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ?
>>
>> Just to be sure I'm not misunderstanding your comment: do you mean that
>> here, the issue is *not* in MPTCP code where we get the 'struct net'
>> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
>>
>> Here, we have an issue because 'current->nsproxy' is NULL, but is it
>> normal? Or should we simply exit with an error if it is the case because
>> we are in an exiting phase?
>>
>> I'm just a bit confused, because it looks like 'net' is retrieved from
>> different places elsewhere when dealing with sysfs: some get it from
>> 'current' like us, some assign 'net' to 'table->extra2', others get it
>> from 'table->data' (via a container_of()), etc. Maybe we should not use
>> 'current->nsproxy->net_ns' here then?
> 
> I do think this is a bug in process accounting, not in networking.
> 
> It might make sense to output a record on a regular file, but probably
> not on any other files.
> 
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
> 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>         const struct cred *orig_cred;
>         struct file *file = acct->file;
> 
> +       if (S_ISREG(file_inode(file)->i_mode))
> +               return;
> +
>         /*
>          * Accounting records are not subject to resource limits.
>          */

OK, thank you, that's clearer.

So this is then more a question for Joel, right?

Do you plan to send this patch to him?

#syz set subsystems: fs

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 18:53     ` Eric Dumazet
  2025-01-04 19:00       ` Al Viro
  2025-01-04 19:11       ` Matthieu Baerts
@ 2025-01-04 20:09       ` Al Viro
  2 siblings, 0 replies; 22+ messages in thread
From: Al Viro @ 2025-01-04 20:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sat, Jan 04, 2025 at 07:53:22PM +0100, Eric Dumazet wrote:
> I do think this is a bug in process accounting, not in networking.
> 
> It might make sense to output a record on a regular file, but probably
> not on any other files.
> 
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
> 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>         const struct cred *orig_cred;
>         struct file *file = acct->file;
> 
> +       if (S_ISREG(file_inode(file)->i_mode))
> +               return;

Wait, what?  OK, that will stop attempts to write there - or to any
other regular file.

If you modify that to
	if (!S_ISREG(...))
as you seem to have intended, it won't break the normal behaviour, but it
won't help with sysctls either.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 19:11         ` Matthieu Baerts
@ 2025-01-04 20:21           ` Al Viro
  2025-01-05  8:32             ` Eric Dumazet
  2025-01-05 17:03             ` Matthieu Baerts
  0 siblings, 2 replies; 22+ messages in thread
From: Al Viro @ 2025-01-04 20:21 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Eric Dumazet, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sat, Jan 04, 2025 at 08:11:49PM +0100, Matthieu Baerts wrote:
> >> +       if (S_ISREG(file_inode(file)->i_mode))
		^^^^^^^^^
> >> +               return;
> > 
> > ... won't help, since the file in question *is* a regular file.  IOW, it's
> > a wrong predicate here.
> 
> On my side, it looks like I'm not able to reproduce the issue with this
> patch. Without it, it is very easy to reproduce it. (But I don't know if
> there are other consequences that would avoid the issue to happen: when
> looking at the logs, with the patch, I don't have heaps of "Process
> accounting resumed" messages that I had before.)

Unsurprisingly so, since it rejects all regular files due to a typo;
fix that and you'll see that the oops is still there.

The real issue (and the one that affects more than just this scenario) is
the use of current->nsproxy->net_ns to get to the damn thing.

Why not something like
static int proc_scheduler(const struct ctl_table *ctl, int write,
                          void *buffer, size_t *lenp, loff_t *ppos)
{
        /* per-netns scheduler name, reached via ->data, not via current */
        char (*data)[MPTCP_SCHED_NAME_MAX] = ctl->data;
        char val[MPTCP_SCHED_NAME_MAX];
        struct ctl_table tbl = {
                .data = val,
                .maxlen = MPTCP_SCHED_NAME_MAX,
        };
        struct mptcp_sched_ops *sched;
        int ret;

        strscpy(val, *data, MPTCP_SCHED_NAME_MAX);

        ret = proc_dostring(&tbl, write, buffer, lenp, ppos);
        if (write && ret == 0) {
                rcu_read_lock();
                sched = mptcp_sched_find(val);
                if (sched)
                        strscpy(*data, val, MPTCP_SCHED_NAME_MAX);
                else
                        ret = -ENOENT;
                rcu_read_unlock();
        }
        return ret;
}

seeing that the data object you really want to access is
mptcp_get_pernet(net)->scheduler and you have that pointer
stored in table->data at the registration time?
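
To illustrate why going through the table pointer sidesteps the oops,
here is a small userspace model. It is a sketch under stated assumptions:
struct task, struct nsproxy, struct pernet, struct ctl and both setter
functions are hypothetical stand-ins, not the kernel's types, and the
-EFAULT return merely marks the spot where the kernel would dereference
a NULL nsproxy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SCHED_NAME_MAX 16   /* stand-in for MPTCP_SCHED_NAME_MAX */

struct pernet  { char scheduler[SCHED_NAME_MAX]; };
struct nsproxy { struct pernet *net_ns; };
struct task    { struct nsproxy *nsproxy; };  /* NULL once the task is exiting */
struct ctl     { void *data; };               /* set once, at registration */

/* Lookup through the calling task: breaks when nsproxy is already gone. */
static int set_sched_via_current(const struct task *cur, const char *val)
{
        if (!cur->nsproxy)
                return -EFAULT;  /* the kernel would oops here instead */
        if (strlen(val) >= SCHED_NAME_MAX)
                return -EINVAL;
        strcpy(cur->nsproxy->net_ns->scheduler, val);
        return 0;
}

/* Lookup through the table: independent of the caller's state. */
static int set_sched_via_table(const struct ctl *ctl, const char *val)
{
        char (*data)[SCHED_NAME_MAX] = ctl->data;

        if (strlen(val) >= SCHED_NAME_MAX)
                return -EINVAL;
        strcpy(*data, val);
        return 0;
}

/* Hypothetical self-check: an exiting task writes the sysctl. */
static void check_exiting_writer(void)
{
        struct pernet pn = { "default" };
        struct ctl ctl = { &pn.scheduler };
        struct task exiting = { NULL };

        assert(set_sched_via_current(&exiting, "bpf") == -EFAULT);
        assert(set_sched_via_table(&ctl, "bpf") == 0);
        assert(strcmp(pn.scheduler, "bpf") == 0);
}
```

The table-based variant also always touches the object the sysctl was
registered for, whichever task happens to perform the write.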

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 20:21           ` Al Viro
@ 2025-01-05  8:32             ` Eric Dumazet
  2025-01-05 11:29               ` Al Viro
  2025-01-05 17:03             ` Matthieu Baerts
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2025-01-05  8:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sat, Jan 4, 2025 at 9:21 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sat, Jan 04, 2025 at 08:11:49PM +0100, Matthieu Baerts wrote:
> > >> +       if (S_ISREG(file_inode(file)->i_mode))
>                 ^^^^^^^^^
> > >> +               return;
> > >
> > > ... won't help, since the file in question *is* a regular file.  IOW, it's
> > > a wrong predicate here.
> >
> > On my side, it looks like I'm not able to reproduce the issue with this
> > patch. Without it, it is very easy to reproduce it. (But I don't know if
> > there are other consequences that would avoid the issue to happen: when
> > looking at the logs, with the patch, I don't have heaps of "Process
> > accounting resumed" messages that I had before.)
>
> Unsurprisingly so, since it rejects all regular files due to a typo;
> fix that and you'll see that the oops is still there.
>
> The real issue (and the one that affects more than just this scenario) is
> the use of current->nsproxy->net_ns to get to the damn thing.

According to grep, we have many other places directly reading
current->nsproxy->net_ns, for instance in net/sctp/sysctl.c.
Should we change them all?

Perhaps an alternative would be to add a generic check in
proc_sys_call_handler()

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 27a283d85a6e7df1a7edbfb513ce75832363e2e6..84968b10ce86e7fd88c6e3c43f52b601394b056f 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -576,6 +576,8 @@ static ssize_t proc_sys_call_handler(struct kiocb *iocb, struct iov_iter *iter,
        error = -EINVAL;
        if (!table->proc_handler)
                goto out;
+       if (unlikely(current->flags & PF_EXITING))
+               goto out;

        /* don't even try if the size is too large */
        error = -ENOMEM;


Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05  8:32             ` Eric Dumazet
@ 2025-01-05 11:29               ` Al Viro
  2025-01-05 16:52                 ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-01-05 11:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 05, 2025 at 09:32:36AM +0100, Eric Dumazet wrote:

> According to grep, we have many other places directly reading
> current->nsproxy->net_ns
> For instance in net/sctp/sysctl.c
> Should we change them all ?

Depends - do you want their contents to match the netns of the opener (as,
AFAICS, for ipv4 sysctls) or that of the reader?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 11:29               ` Al Viro
@ 2025-01-05 16:52                 ` Eric Dumazet
  2025-01-05 17:03                   ` Matthieu Baerts
  2025-01-05 19:54                   ` Al Viro
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2025-01-05 16:52 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 5, 2025 at 12:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Sun, Jan 05, 2025 at 09:32:36AM +0100, Eric Dumazet wrote:
>
> > According to grep, we have many other places directly reading
> > current->nsproxy->net_ns
> > For instance in net/sctp/sysctl.c
> > Should we change them all ?
>
> Depends - do you want their contents match the netns of opener (as,
> AFAICS, for ipv4 sysctls) or that of the reader?

I am only worried that a malicious user could crash the host with
current kernels,
not about this MPTCP crash, but all unaware users of current->nsproxy
in sysctl handlers.

Back to MPTCP :

Using the convention used in other mptcp sysctls like (enabled,
add_addr_timeout,
checksum_enabled, allow_join_initial_addr_port...) is better for consistency.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 16:52                 ` Eric Dumazet
@ 2025-01-05 17:03                   ` Matthieu Baerts
  2025-01-05 19:54                   ` Al Viro
  1 sibling, 0 replies; 22+ messages in thread
From: Matthieu Baerts @ 2025-01-05 17:03 UTC (permalink / raw)
  To: Eric Dumazet, Al Viro
  Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp,
	netdev, pabeni, syzkaller-bugs, syzbot

Hi Eric,

On 05/01/2025 17:52, Eric Dumazet wrote:
> On Sun, Jan 5, 2025 at 12:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> On Sun, Jan 05, 2025 at 09:32:36AM +0100, Eric Dumazet wrote:
>>
>>> According to grep, we have many other places directly reading
>>> current->nsproxy->net_ns
>>> For instance in net/sctp/sysctl.c
>>> Should we change them all ?
>>
>> Depends - do you want their contents match the netns of opener (as,
>> AFAICS, for ipv4 sysctls) or that of the reader?
> 
> I am only worried that a malicious user could crash the host with
> current kernels,
> not about this MPTCP crash, but all unaware users of current->nsproxy
> in sysctl handlers.
> 
> Back to MPTCP :
> 
> Using the convention used in other mptcp sysctls like (enabled,
> add_addr_timeout,
> checksum_enabled, allow_join_initial_addr_port...) is better for consistency.

Indeed, I can do the modifications to stop using current->nsproxy in
MPTCP. I can do the same in SCTP.

Do you plan to send your patch modifying proc_sysctl.c? It is just to
know if I should mark my patches as fixes, and split them to ease the
backports -- each helper using current->nsproxy has been introduced in
different commits -- or if I can send them to net-next instead.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 20:21           ` Al Viro
  2025-01-05  8:32             ` Eric Dumazet
@ 2025-01-05 17:03             ` Matthieu Baerts
  1 sibling, 0 replies; 22+ messages in thread
From: Matthieu Baerts @ 2025-01-05 17:03 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric Dumazet, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

Hi Al,

On 04/01/2025 21:21, Al Viro wrote:
> The real issue (and the one that affects more than just this scenario) is
> the use of current->nsproxy->net_ns to get to the damn thing.
> 
> Why not something like

(...)

> seeing that the data object you really want to access is
> mptcp_get_pernet(net)->scheduler and you have that pointer
> stored in table->data at the registration time?

Good point, thank you for the suggestion! :)

I will do this modification.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 16:52                 ` Eric Dumazet
  2025-01-05 17:03                   ` Matthieu Baerts
@ 2025-01-05 19:54                   ` Al Viro
  2025-01-05 20:50                     ` Al Viro
  1 sibling, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-01-05 19:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 05, 2025 at 05:52:19PM +0100, Eric Dumazet wrote:
> On Sun, Jan 5, 2025 at 12:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Sun, Jan 05, 2025 at 09:32:36AM +0100, Eric Dumazet wrote:
> >
> > > According to grep, we have many other places directly reading
> > > current->nsproxy->net_ns
> > > For instance in net/sctp/sysctl.c
> > > Should we change them all ?
> >
> > Depends - do you want their contents match the netns of opener (as,
> > AFAICS, for ipv4 sysctls) or that of the reader?
> 
> I am only worried that a malicious user could crash the host with
> current kernels,
> not about this MPTCP crash, but all unaware users of current->nsproxy
> in sysctl handlers.

I don't hate your mitigation in proc_sysctl.c, but IMO there are two
problems mixed here - one is that access to a per-netns sysctl table
should probably act on the netns it had been created for, which may not
coincide with the reader's/writer's netns, and another is that
access to current->nsproxy->net_ns would simply oops if attempted when
current->nsproxy had been dropped.

So I suspect that current->nsproxy->net_ns shouldn't be used in
per-netns sysctls for consistency's sake (note that it can get more
serious than just consistency, if you have e.g. a spinlock taken
in something hanging off current netns to protect access to
something table->data points to).

As for the mitigation in fs/proc/proc_sysctl.c... might be useful,
if it comes with a clear comment about the reasons it's there.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 19:54                   ` Al Viro
@ 2025-01-05 20:50                     ` Al Viro
  2025-01-05 21:11                       ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2025-01-05 20:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 05, 2025 at 07:54:34PM +0000, Al Viro wrote:

> So I suspect that current->nsproxy->net_ns shouldn't be used in
> per-netns sysctls for consistency sake (note that it can get more
> serious than just consistency, if you have e.g. a spinlock taken
> in something hanging off current netns to protect access to
> something table->data points to).
> 
> As for the mitigation in fs/proc/proc_sysctl.c... might be useful,
> if it comes with a clear comment about the reasons it's there.

FWIW, looks like we have two such in mptcp (with sysctls next to
those definitely accessing the netns of opener rather than reader/writer),
two in rds (both inconsistent on the write side -
        struct net *net = current->nsproxy->net_ns;
        int err;

        err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos);
        if (err < 0) {
                pr_warn("Invalid input. Must be >= %d\n",
                        *(int *)(ctl->extra1));
                return err;
        }
        if (write)
                rds_tcp_sysctl_reset(net);
will modify ctl->data, which points to &rtn->{snd,rcv}buf_size, with
rtn == net_generic(net, rds_tcp_netid) and net being for opener's netns
and then call rds_tcp_sysctl_reset(net) with net being the writer's
netns) and 6 in sctp.  At least some of sctp ones are also inconsistent
on the write side; e.g.
static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write,
                                void *buffer, size_t *lenp, loff_t *ppos)
{
        struct net *net = current->nsproxy->net_ns;
        unsigned int min = *(unsigned int *) ctl->extra1;
        unsigned int max = *(unsigned int *) ctl->extra2;
        struct ctl_table tbl;
        int ret, new_value;

        memset(&tbl, 0, sizeof(struct ctl_table));
        tbl.maxlen = sizeof(unsigned int);

        if (write)
                tbl.data = &new_value;
        else
                tbl.data = &net->sctp.rto_min;

        ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
        if (write && ret == 0) {
                if (new_value > max || new_value < min)
                        return -EINVAL;

                net->sctp.rto_min = new_value;
        }

        return ret;
}
has max taken from ctl->extra2, which is &net->sctp.rto_max of the
opener's netns, but the value capped by that is stored into
net->sctp.rto_min of *writer's* netns.  So the logic that is supposed
to prevent rto_min > rto_max can be bypassed; no idea how much that can
escalate to, but it's clearly not what the code intends.

So I'd rather document the "don't assume that current->nsproxy->net_ns will
point to the same netns this ctl is for" rule and fix those 10 instances - at
least some smell seriously fishy.  It's not just the acct(2) weirdness and
the damage may be worse than an oops...
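
The rto_min bypass can be modelled in userspace. This is only a sketch:
struct netns, struct ctl and both write helpers are hypothetical
simplifications of the sctp handler quoted above, and the numeric values
are invented. The "mixed" helper takes its bounds from the table (the
opener's netns) but stores into the writer's netns, as the quoted
handler does; the "table" helper stores through ctl->data instead.

```c
#include <assert.h>
#include <errno.h>

struct netns { int rto_min, rto_max; };

struct ctl {
        int *data;          /* &opener->rto_min */
        const int *extra1;  /* lower bound */
        const int *extra2;  /* &opener->rto_max */
};

/* Bounds from the opener's netns, store into the writer's netns. */
static int write_rto_min_mixed(const struct ctl *ctl, struct netns *writer,
                               int v)
{
        if (v > *ctl->extra2 || v < *ctl->extra1)
                return -EINVAL;
        writer->rto_min = v;
        return 0;
}

/* Bounds and store both go through the table. */
static int write_rto_min_table(const struct ctl *ctl, int v)
{
        if (v > *ctl->extra2 || v < *ctl->extra1)
                return -EINVAL;
        *ctl->data = v;
        return 0;
}

/* Hypothetical self-check with invented values. */
static void check_bypass(void)
{
        static const int lower = 1;
        struct netns opener = { 1, 60000 };
        struct netns writer = { 1, 10 };
        struct ctl ctl = { &opener.rto_min, &lower, &opener.rto_max };

        /* Validated against opener's rto_max, stored in writer's netns. */
        assert(write_rto_min_mixed(&ctl, &writer, 1000) == 0);
        assert(writer.rto_min > writer.rto_max);   /* invariant broken */

        /* Table-only access keeps the opener's invariant intact. */
        assert(write_rto_min_table(&ctl, 1000) == 0);
        assert(opener.rto_min <= opener.rto_max);
}
```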

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-05 20:50                     ` Al Viro
@ 2025-01-05 21:11                       ` Al Viro
  0 siblings, 0 replies; 22+ messages in thread
From: Al Viro @ 2025-01-05 21:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Matthieu Baerts, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Sun, Jan 05, 2025 at 08:50:56PM +0000, Al Viro wrote:

> has max taken from ctl->extra2, which is &net->sctp.rto_max of the
> opener's netns, but the value capped by that is stored into
> net->sctp.rto_min of *writer's* netns.  So the logic that is supposed
> to prevent rto_min > rto_max can be bypassed; no idea how much that can
> escalate to, but it's clearly not what the code intends.

Speaking of which, the logic that tries to maintain rto_min <= rto_max is
broken in another way.  There's no exclusion in those suckers.  IOW, if
we have set rto_min to 1 and rto_max to 10000, two processes can try to
write 1000 to rto_min and 10 to rto_max resp., with successful validations
done against the original state in both, followed by actual stores.
Result is rto_min == 1000 and rto_max == 10, which is probably not what
one wants there...

IOW, the validation and stores should be atomic; the same goes for another
pair (pf_retrans <= ps_retrans).  Again, I've no idea how severe it is,
but result seems to be at least contrary to expectation of the code
authors...
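
One way to make validation and store atomic is to do both under a single
lock. The following userspace sketch uses a pthread mutex purely for
illustration; struct rto, rto_set_min() and rto_set_max() are
hypothetical names, the lower bound of 1 is an assumption, and nothing
here claims to be how the sctp code is or will be written.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct rto {
        pthread_mutex_t lock;  /* covers both fields and both checks */
        int min, max;
};

/* Check against the current max and store, as one critical section. */
static int rto_set_min(struct rto *r, int v)
{
        int ret = 0;

        pthread_mutex_lock(&r->lock);
        if (v < 1 || v > r->max)
                ret = -EINVAL;
        else
                r->min = v;
        pthread_mutex_unlock(&r->lock);
        return ret;
}

/* Check against the current min and store, as one critical section. */
static int rto_set_max(struct rto *r, int v)
{
        int ret = 0;

        pthread_mutex_lock(&r->lock);
        if (v < r->min)
                ret = -EINVAL;
        else
                r->max = v;
        pthread_mutex_unlock(&r->lock);
        return ret;
}

/* Hypothetical self-check: the two writes from the example, in sequence. */
static void check_rto_order(void)
{
        struct rto r = { PTHREAD_MUTEX_INITIALIZER, 1, 10000 };

        assert(rto_set_min(&r, 1000) == 0);
        /* The second writer now sees min == 1000, so max = 10 is refused. */
        assert(rto_set_max(&r, 10) == -EINVAL);
        assert(r.min <= r.max);
}
```

With the lock held across check and store, whichever writer runs second
is validated against the first writer's result, so the two writes from
the example above cannot both succeed.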

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-04 19:11       ` Matthieu Baerts
@ 2025-01-06 13:32         ` Joel Granados
  2025-01-06 14:27           ` Matthieu Baerts
  0 siblings, 1 reply; 22+ messages in thread
From: Joel Granados @ 2025-01-06 13:32 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Eric Dumazet, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot, Al Viro

On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote:
> Hi Eric,
> 
> (+cc Joel)
> 
> Thank you for your reply!
> 
> On 04/01/2025 19:53, Eric Dumazet wrote:
> > On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
> >>
> >> Hi Eric,
> >>
> >> Thank you for the bug report!
> >>
> >> On 02/01/2025 16:21, Eric Dumazet wrote:
> >>> On Thu, Jan 2, 2025 at 3:12 PM syzbot
> >>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> syzbot found the following issue on:
> >>>>
> >>>> HEAD commit:    ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
> >>>> git tree:       upstream
> >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000
> >>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f
> >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1
> >>>> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> >>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000
> >>>>
> >>>> Downloadable assets:
> >>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz
> >>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz
> >>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz
> >>>>
> >>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
> >>>>
> >>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
> >>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
> >>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
> >>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> >>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
> >>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
> >>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
> >>>>
> >>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
> >>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
> >>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
> >>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
> >>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
> >>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
> >>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
> >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>> Call Trace:
> >>>>  <TASK>
> >>>>  proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
> >>>>  __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
> >>>>  __kernel_write+0xf6/0x140 fs/read_write.c:632
> >>>>  do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
> >>>>  acct_pin_kill+0x2d/0x100 kernel/acct.c:192
> >>>>  pin_kill+0x194/0x7c0 fs/fs_pin.c:44
> >>>>  mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
> >>>>  cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
> >>>>  task_work_run+0x14e/0x250 kernel/task_work.c:239
> >>>>  exit_task_work include/linux/task_work.h:43 [inline]
> >>>>  do_exit+0xad8/0x2d70 kernel/exit.c:938
> >>>>  do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
> >>>>  get_signal+0x2576/0x2610 kernel/signal.c:3017
> >>>>  arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
> >>>>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
> >>>>  exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
> >>>>  __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
> >>>>  syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
> >>>>  do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
> >>>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >>>> RIP: 0033:0x7fee3cb87a6a
> >>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40.
> >>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
> >>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
> >>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
> >>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
> >>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
> >>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
> >>>>  </TASK>
> >>>> Modules linked in:
> >>>> ---[ end trace 0000000000000000 ]---
> >>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
> >>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
> >>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
> >>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
> >>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
> >>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
> >>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
> >>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
> >>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
> >>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
> >>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>> ----------------
> >>>> Code disassembly (best guess), 1 bytes skipped:
> >>>>    0:   42 80 3c 38 00          cmpb   $0x0,(%rax,%r15,1)
> >>>>    5:   0f 85 fe 02 00 00       jne    0x309
> >>>>    b:   4d 8b a4 24 08 09 00    mov    0x908(%r12),%r12
> >>>>   12:   00
> >>>>   13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
> >>>>   1a:   fc ff df
> >>>>   1d:   49 8d 7c 24 28          lea    0x28(%r12),%rdi
> >>>>   22:   48 89 fa                mov    %rdi,%rdx
> >>>>   25:   48 c1 ea 03             shr    $0x3,%rdx
> >>>> * 29:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
> >>>>   2d:   0f 85 cc 02 00 00       jne    0x2ff
> >>>>   33:   4d 8b 7c 24 28          mov    0x28(%r12),%r15
> >>>>   38:   48                      rex.W
> >>>>   39:   8d                      .byte 0x8d
> >>>>   3a:   84 24 c8                test   %ah,(%rax,%rcx,8)
> >>
> >> (...)
> >>
> >>> I thought acct(2) was only allowing regular files.
> >>>
> >>> acct_on() indeed has :
> >>>
> >>> if (!S_ISREG(file_inode(file)->i_mode)) {
> >>>     kfree(acct);
> >>>     filp_close(file, NULL);
> >>>     return -EACCES;
> >>> }
> >>>
> >>> It seems there are other ways to call do_acct_process() targeting a sysfs file ?
If this is the case, can you point me to the place where this happens?

> >>
> >> Just to be sure I'm not misunderstanding your comment: do you mean that
> >> here, the issue is *not* in MPTCP code where we get the 'struct net'
> >> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
> >>
> >> Here, we have an issue because 'current->nsproxy' is NULL, but is it
> >> normal? Or should we simply exit with an error if it is the case because
> >> we are in an exiting phase?
> >>
> >> I'm just a bit confused, because it looks like 'net' is retrieved from
> >> different places elsewhere when dealing with sysfs: some get it from
> >> 'current' like us, some assign 'net' to 'table->extra2', others get it
> >> from 'table->data' (via a container_of()), etc. Maybe we should not use
> >> 'current->nsproxy->net_ns' here then?
> > 
> > I do think this is a bug in process accounting, not in networking.
> > 
> > It might make sense to output a record on a regular file, but probably
> > not on any other files.
It for sure does not make sense to output a record on a sysctl file that
has a maxlen of just 3*sizeof(int) (kernel/acct.c:79).

> > 
> > diff --git a/kernel/acct.c b/kernel/acct.c
> > index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
> > 100644
> > --- a/kernel/acct.c
> > +++ b/kernel/acct.c
> > @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
> >         const struct cred *orig_cred;
> >         struct file *file = acct->file;
> > 
> > +       if (S_ISREG(file_inode(file)->i_mode))
> > +               return;
> > +
This does not seem to handle the actual culprit, which is: why is the
sysctl file being used for the accounting?

> >         /*
> >          * Accounting records are not subject to resource limits.
> >          */
> 
> OK, thank you, that's clearer.
> 
> So this is then more a question for Joel, right?
> 
> Do you plan to send this patch to him?
> 
> #syz set subsystems: fs
> 
> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
> 

So what is happening is that:
1. The accounting file is set to a non-sysctl file.
2. And when accounting tries to write to this file, you get the
   behaviour explained in this mail?

Please correct me if I have misread the situation.

Best


-- 

Joel Granados


* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-06 13:32         ` Joel Granados
@ 2025-01-06 14:27           ` Matthieu Baerts
  2025-01-06 15:27             ` Eric Dumazet
  2025-01-08 14:37             ` Joel Granados
  0 siblings, 2 replies; 22+ messages in thread
From: Matthieu Baerts @ 2025-01-06 14:27 UTC (permalink / raw)
  To: Joel Granados, Eric Dumazet, Al Viro
  Cc: davem, geliang, horms, kuba, linux-kernel, martineau, mptcp,
	netdev, pabeni, syzkaller-bugs, syzbot

Hi Joel, Eric, Al,

On 06/01/2025 14:32, Joel Granados wrote:
> On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote:
>> Hi Eric,
>>
>> (+cc Joel)
>>
>> Thank you for your reply!
>>
>> On 04/01/2025 19:53, Eric Dumazet wrote:
>>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> Thank you for the bug report!
>>>>
>>>> On 02/01/2025 16:21, Eric Dumazet wrote:
>>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot
>>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> syzbot found the following issue on:
>>>>>>
>>>>>> HEAD commit:    ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
>>>>>> git tree:       upstream
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000
>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1
>>>>>> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000
>>>>>>
>>>>>> Downloadable assets:
>>>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz
>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz
>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
>>>>>>
>>>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
>>>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
>>>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
>>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
>>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
>>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
>>>>>>
>>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
>>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
>>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
>>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
>>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
>>>>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> Call Trace:
>>>>>>  <TASK>
>>>>>>  proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
>>>>>>  __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
>>>>>>  __kernel_write+0xf6/0x140 fs/read_write.c:632
>>>>>>  do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
>>>>>>  acct_pin_kill+0x2d/0x100 kernel/acct.c:192
>>>>>>  pin_kill+0x194/0x7c0 fs/fs_pin.c:44
>>>>>>  mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
>>>>>>  cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
>>>>>>  task_work_run+0x14e/0x250 kernel/task_work.c:239
>>>>>>  exit_task_work include/linux/task_work.h:43 [inline]
>>>>>>  do_exit+0xad8/0x2d70 kernel/exit.c:938
>>>>>>  do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
>>>>>>  get_signal+0x2576/0x2610 kernel/signal.c:3017
>>>>>>  arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
>>>>>>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>>>>>>  exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
>>>>>>  __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>>>>>>  syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
>>>>>>  do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
>>>>>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>>>> RIP: 0033:0x7fee3cb87a6a
>>>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40.
>>>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
>>>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
>>>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
>>>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
>>>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
>>>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
>>>>>>  </TASK>
>>>>>> Modules linked in:
>>>>>> ---[ end trace 0000000000000000 ]---
>>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
>>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
>>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
>>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
>>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
>>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
>>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
>>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
>>>>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> ----------------
>>>>>> Code disassembly (best guess), 1 bytes skipped:
>>>>>>    0:   42 80 3c 38 00          cmpb   $0x0,(%rax,%r15,1)
>>>>>>    5:   0f 85 fe 02 00 00       jne    0x309
>>>>>>    b:   4d 8b a4 24 08 09 00    mov    0x908(%r12),%r12
>>>>>>   12:   00
>>>>>>   13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>>>>>>   1a:   fc ff df
>>>>>>   1d:   49 8d 7c 24 28          lea    0x28(%r12),%rdi
>>>>>>   22:   48 89 fa                mov    %rdi,%rdx
>>>>>>   25:   48 c1 ea 03             shr    $0x3,%rdx
>>>>>> * 29:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
>>>>>>   2d:   0f 85 cc 02 00 00       jne    0x2ff
>>>>>>   33:   4d 8b 7c 24 28          mov    0x28(%r12),%r15
>>>>>>   38:   48                      rex.W
>>>>>>   39:   8d                      .byte 0x8d
>>>>>>   3a:   84 24 c8                test   %ah,(%rax,%rcx,8)
>>>>
>>>> (...)
>>>>
>>>>> I thought acct(2) was only allowing regular files.
>>>>>
>>>>> acct_on() indeed has :
>>>>>
>>>>> if (!S_ISREG(file_inode(file)->i_mode)) {
>>>>>     kfree(acct);
>>>>>     filp_close(file, NULL);
>>>>>     return -EACCES;
>>>>> }
>>>>>
>>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ?
> If this is the case, can you point me to the place where this happens?
> 
>>>>
>>>> Just to be sure I'm not misunderstanding your comment: do you mean that
>>>> here, the issue is *not* in MPTCP code where we get the 'struct net'
>>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
>>>>
>>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it
>>>> normal? Or should we simply exit with an error if it is the case because
>>>> we are in an exiting phase?
>>>>
>>>> I'm just a bit confused, because it looks like 'net' is retrieved from
>>>> different places elsewhere when dealing with sysfs: some get it from
>>>> 'current' like us, some assign 'net' to 'table->extra2', others get it
>>>> from 'table->data' (via a container_of()), etc. Maybe we should not use
>>>> 'current->nsproxy->net_ns' here then?
>>>
>>> I do think this is a bug in process accounting, not in networking.
>>>
>>> It might make sense to output a record on a regular file, but probably
>>> not on any other files.
> It for sure does not make sense to output a record on a sysctl file that
> has a maxlen of just 3*sizeof(int) (kernel/acct.c:79).
> 
>>>
>>> diff --git a/kernel/acct.c b/kernel/acct.c
>>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
>>> 100644
>>> --- a/kernel/acct.c
>>> +++ b/kernel/acct.c
>>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>>>         const struct cred *orig_cred;
>>>         struct file *file = acct->file;
>>>
>>> +       if (S_ISREG(file_inode(file)->i_mode))
>>> +               return;
>>> +
> This does not seem to handle the actual culprit, which is: why is the
> sysctl file being used for the accounting?
> 
>>>         /*
>>>          * Accounting records are not subject to resource limits.
>>>          */
>>
>> OK, thank you, that's clearer.
>>
>> So this is then more a question for Joel, right?
>>
>> Do you plan to send this patch to him?
>>
>> #syz set subsystems: fs
>>
>> Cheers,
>> Matt
>> -- 
>> Sponsored by the NGI0 Core fund.
>>
> 
> So what is happening is that:
> 1. The accounting file is set to a non-sysctl file.
> 2. And when accounting tries to write to this file, you get the
>    behaviour explained in this mail?
> 
> Please correct me if I have misread the situation.

@Joel: Thank you for your reply!

I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
can jump in.

What I can say is that the original issue has been found by syzbot, and
the reproducer [1] shows that 3 syscalls have been used:
- openat('/proc/sys/net/mptcp/scheduler')
- mprotect()
- acct()

Please also note that the conversation continued in a sub-thread where
you are not in the Cc list, see [2]. In short, Eric suggested another
patch only for sysfs, and Al recommended dropping the use of
'current->nsproxy'.

On my side, I'm looking at dropping the use of 'current->nsproxy' in
sysctl callbacks. I guess such patches will be seen as fixes, except if
Eric's new patch is enough for stable?

[1] https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000
[2]
https://lore.kernel.org/netdev/67769ecb.050a0220.3a8527.003f.GAE@google.com/T/#m862d0913ebfcec5e462a9c33b47bc3f6440a2900

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-06 14:27           ` Matthieu Baerts
@ 2025-01-06 15:27             ` Eric Dumazet
  2025-01-06 15:34               ` Matthieu Baerts
  2025-01-08 14:37             ` Joel Granados
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2025-01-06 15:27 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Joel Granados, Al Viro, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Mon, Jan 6, 2025 at 3:27 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Joel, Eric, Al,
>
> On 06/01/2025 14:32, Joel Granados wrote:
> > On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote:
> >> Hi Eric,
> >>
> >> (+cc Joel)
> >>
> >> Thank you for your reply!
> >>
> >> On 04/01/2025 19:53, Eric Dumazet wrote:
> >>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
> >>>>
> >>>> Hi Eric,
> >>>>
> >>>> Thank you for the bug report!
> >>>>
> >>>> On 02/01/2025 16:21, Eric Dumazet wrote:
> >>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot
> >>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> syzbot found the following issue on:
> >>>>>>
> >>>>>> HEAD commit:    ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
> >>>>>> git tree:       upstream
> >>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000
> >>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f
> >>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1
> >>>>>> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> >>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000
> >>>>>>
> >>>>>> Downloadable assets:
> >>>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz
> >>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz
> >>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz
> >>>>>>
> >>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
> >>>>>>
> >>>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
> >>>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
> >>>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
> >>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> >>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
> >>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
> >>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
> >>>>>>
> >>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
> >>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
> >>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
> >>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
> >>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
> >>>>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
> >>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
> >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>> Call Trace:
> >>>>>>  <TASK>
> >>>>>>  proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
> >>>>>>  __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
> >>>>>>  __kernel_write+0xf6/0x140 fs/read_write.c:632
> >>>>>>  do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
> >>>>>>  acct_pin_kill+0x2d/0x100 kernel/acct.c:192
> >>>>>>  pin_kill+0x194/0x7c0 fs/fs_pin.c:44
> >>>>>>  mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
> >>>>>>  cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
> >>>>>>  task_work_run+0x14e/0x250 kernel/task_work.c:239
> >>>>>>  exit_task_work include/linux/task_work.h:43 [inline]
> >>>>>>  do_exit+0xad8/0x2d70 kernel/exit.c:938
> >>>>>>  do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
> >>>>>>  get_signal+0x2576/0x2610 kernel/signal.c:3017
> >>>>>>  arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
> >>>>>>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
> >>>>>>  exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
> >>>>>>  __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
> >>>>>>  syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
> >>>>>>  do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
> >>>>>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >>>>>> RIP: 0033:0x7fee3cb87a6a
> >>>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40.
> >>>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
> >>>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
> >>>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
> >>>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
> >>>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
> >>>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
> >>>>>>  </TASK>
> >>>>>> Modules linked in:
> >>>>>> ---[ end trace 0000000000000000 ]---
> >>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
> >>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
> >>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
> >>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
> >>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
> >>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
> >>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
> >>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
> >>>>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
> >>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
> >>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>> ----------------
> >>>>>> Code disassembly (best guess), 1 bytes skipped:
> >>>>>>    0:   42 80 3c 38 00          cmpb   $0x0,(%rax,%r15,1)
> >>>>>>    5:   0f 85 fe 02 00 00       jne    0x309
> >>>>>>    b:   4d 8b a4 24 08 09 00    mov    0x908(%r12),%r12
> >>>>>>   12:   00
> >>>>>>   13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
> >>>>>>   1a:   fc ff df
> >>>>>>   1d:   49 8d 7c 24 28          lea    0x28(%r12),%rdi
> >>>>>>   22:   48 89 fa                mov    %rdi,%rdx
> >>>>>>   25:   48 c1 ea 03             shr    $0x3,%rdx
> >>>>>> * 29:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
> >>>>>>   2d:   0f 85 cc 02 00 00       jne    0x2ff
> >>>>>>   33:   4d 8b 7c 24 28          mov    0x28(%r12),%r15
> >>>>>>   38:   48                      rex.W
> >>>>>>   39:   8d                      .byte 0x8d
> >>>>>>   3a:   84 24 c8                test   %ah,(%rax,%rcx,8)
> >>>>
> >>>> (...)
> >>>>
> >>>>> I thought acct(2) was only allowing regular files.
> >>>>>
> >>>>> acct_on() indeed has :
> >>>>>
> >>>>> if (!S_ISREG(file_inode(file)->i_mode)) {
> >>>>>     kfree(acct);
> >>>>>     filp_close(file, NULL);
> >>>>>     return -EACCES;
> >>>>> }
> >>>>>
> >>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ?
> > If this is the case, can you point me to the place where this happens?
> >
> >>>>
> >>>> Just to be sure I'm not misunderstanding your comment: do you mean that
> >>>> here, the issue is *not* in MPTCP code where we get the 'struct net'
> >>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
> >>>>
> >>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it
> >>>> normal? Or should we simply exit with an error if it is the case because
> >>>> we are in an exiting phase?
> >>>>
> >>>> I'm just a bit confused, because it looks like 'net' is retrieved from
> >>>> different places elsewhere when dealing with sysfs: some get it from
> >>>> 'current' like us, some assign 'net' to 'table->extra2', others get it
> >>>> from 'table->data' (via a container_of()), etc. Maybe we should not use
> >>>> 'current->nsproxy->net_ns' here then?
> >>>
> >>> I do think this is a bug in process accounting, not in networking.
> >>>
> >>> It might make sense to output a record on a regular file, but probably
> >>> not on any other files.
> > It for sure does not make sense to output a record on a sysctl file that
> > has a maxlen of just 3*sizeof(int) (kernel/acct.c:79).
> >
> >>>
> >>> diff --git a/kernel/acct.c b/kernel/acct.c
> >>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
> >>> 100644
> >>> --- a/kernel/acct.c
> >>> +++ b/kernel/acct.c
> >>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
> >>>         const struct cred *orig_cred;
> >>>         struct file *file = acct->file;
> >>>
> >>> +       if (S_ISREG(file_inode(file)->i_mode))
> >>> +               return;
> >>> +
> > This does not seem to handle the actual culprit, which is: why is the
> > sysctl file being used for the accounting?
> >
> >>>         /*
> >>>          * Accounting records are not subject to resource limits.
> >>>          */
> >>
> >> OK, thank you, that's clearer.
> >>
> >> So this is then more a question for Joel, right?
> >>
> >> Do you plan to send this patch to him?
> >>
> >> #syz set subsystems: fs
> >>
> >> Cheers,
> >> Matt
> >> --
> >> Sponsored by the NGI0 Core fund.
> >>
> >
> > So what is happening is that:
> > 1. The accounting file is set to a non-sysctl file.
> > 2. And when accounting tries to write to this file, you get the
> >    behaviour explained in this mail?
> >
> > Please correct me if I have misread the situation.
>
> @Joel: Thank you for your reply!
>
> I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
> can jump in.
>
> What I can say is that the original issue has been found by syzbot, and
> the reproducer [1] shows that 3 syscalls have been used:
> - openat('/proc/sys/net/mptcp/scheduler')
> - mprotect()
> - acct()
>
> Please also note that the conversation continued in a sub-thread where
> you are not in the Cc list, see [2]. In short, Eric suggested another
> patch only for sysfs, and Al recommended dropping the use of
> 'current->nsproxy'.
>
> On my side, I'm looking at dropping the use of 'current->nsproxy' in
> sysctl callbacks. I guess such patches will be seen as fixes, except if
> Eric's new patch is enough for stable?

It might be less risky in terms of backports to patch mptcp and others.

I.e., just use Al's suggestion.

Thanks !


* Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-06 15:27             ` Eric Dumazet
@ 2025-01-06 15:34               ` Matthieu Baerts
  0 siblings, 0 replies; 22+ messages in thread
From: Matthieu Baerts @ 2025-01-06 15:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Joel Granados, Al Viro, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

Hi Eric,

Thank you for your reply!

On 06/01/2025 16:27, Eric Dumazet wrote:
> On Mon, Jan 6, 2025 at 3:27 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>>
>> Hi Joel, Eric, Al,
>>
>> On 06/01/2025 14:32, Joel Granados wrote:
>>> On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote:
>>>> Hi Eric,
>>>>
>>>> (+cc Joel)
>>>>
>>>> Thank you for your reply!
>>>>
>>>> On 04/01/2025 19:53, Eric Dumazet wrote:
>>>>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> Thank you for the bug report!
>>>>>>
>>>>>> On 02/01/2025 16:21, Eric Dumazet wrote:
>>>>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot
>>>>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> syzbot found the following issue on:
>>>>>>>>
>>>>>>>> HEAD commit:    ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
>>>>>>>> git tree:       upstream
>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000
>>>>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f
>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1
>>>>>>>> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000
>>>>>>>>
>>>>>>>> Downloadable assets:
>>>>>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz
>>>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz
>>>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz
>>>>>>>>
>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
>>>>>>>>
>>>>>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
>>>>>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
>>>>>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
>>>>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
>>>>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
>>>>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
>>>>>>>>
>>>>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
>>>>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
>>>>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
>>>>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
>>>>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
>>>>>>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> Call Trace:
>>>>>>>>  <TASK>
>>>>>>>>  proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
>>>>>>>>  __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
>>>>>>>>  __kernel_write+0xf6/0x140 fs/read_write.c:632
>>>>>>>>  do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
>>>>>>>>  acct_pin_kill+0x2d/0x100 kernel/acct.c:192
>>>>>>>>  pin_kill+0x194/0x7c0 fs/fs_pin.c:44
>>>>>>>>  mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
>>>>>>>>  cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
>>>>>>>>  task_work_run+0x14e/0x250 kernel/task_work.c:239
>>>>>>>>  exit_task_work include/linux/task_work.h:43 [inline]
>>>>>>>>  do_exit+0xad8/0x2d70 kernel/exit.c:938
>>>>>>>>  do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
>>>>>>>>  get_signal+0x2576/0x2610 kernel/signal.c:3017
>>>>>>>>  arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
>>>>>>>>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>>>>>>>>  exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
>>>>>>>>  __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>>>>>>>>  syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
>>>>>>>>  do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
>>>>>>>>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>>>>>> RIP: 0033:0x7fee3cb87a6a
>>>>>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40.
>>>>>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
>>>>>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
>>>>>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
>>>>>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
>>>>>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
>>>>>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
>>>>>>>>  </TASK>
>>>>>>>> Modules linked in:
>>>>>>>> ---[ end trace 0000000000000000 ]---
>>>>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
>>>>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
>>>>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
>>>>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
>>>>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
>>>>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
>>>>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
>>>>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
>>>>>>>> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> ----------------
>>>>>>>> Code disassembly (best guess), 1 bytes skipped:
>>>>>>>>    0:   42 80 3c 38 00          cmpb   $0x0,(%rax,%r15,1)
>>>>>>>>    5:   0f 85 fe 02 00 00       jne    0x309
>>>>>>>>    b:   4d 8b a4 24 08 09 00    mov    0x908(%r12),%r12
>>>>>>>>   12:   00
>>>>>>>>   13:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>>>>>>>>   1a:   fc ff df
>>>>>>>>   1d:   49 8d 7c 24 28          lea    0x28(%r12),%rdi
>>>>>>>>   22:   48 89 fa                mov    %rdi,%rdx
>>>>>>>>   25:   48 c1 ea 03             shr    $0x3,%rdx
>>>>>>>> * 29:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1) <-- trapping instruction
>>>>>>>>   2d:   0f 85 cc 02 00 00       jne    0x2ff
>>>>>>>>   33:   4d 8b 7c 24 28          mov    0x28(%r12),%r15
>>>>>>>>   38:   48                      rex.W
>>>>>>>>   39:   8d                      .byte 0x8d
>>>>>>>>   3a:   84 24 c8                test   %ah,(%rax,%rcx,8)
>>>>>>
>>>>>> (...)
>>>>>>
>>>>>>> I thought acct(2) was only allowing regular files.
>>>>>>>
>>>>>>> acct_on() indeed has :
>>>>>>>
>>>>>>> if (!S_ISREG(file_inode(file)->i_mode)) {
>>>>>>>     kfree(acct);
>>>>>>>     filp_close(file, NULL);
>>>>>>>     return -EACCES;
>>>>>>> }
>>>>>>>
>>>>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ?
>>> If this is the case, can you point me to the place where this happens?
>>>
>>>>>>
>>>>>> Just to be sure I'm not misunderstanding your comment: do you mean that
>>>>>> here, the issue is *not* in MPTCP code where we get the 'struct net'
>>>>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
>>>>>>
>>>>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it
>>>>>> normal? Or should we simply exit with an error if it is the case because
>>>>>> we are in an exiting phase?
>>>>>>
>>>>>> I'm just a bit confused, because it looks like 'net' is retrieved from
>>>>>> different places elsewhere when dealing with sysfs: some get it from
>>>>>> 'current' like us, some assign 'net' to 'table->extra2', others get it
>>>>>> from 'table->data' (via a container_of()), etc. Maybe we should not use
>>>>>> 'current->nsproxy->net_ns' here then?
>>>>>
>>>>> I do think this is a bug in process accounting, not in networking.
>>>>>
>>>>> It might make sense to output a record on a regular file, but probably
>>>>> not on any other files.
>>> It for sure does not make sense to output a record on a sysctl file that
>>> has a maxlen of just 3*sizeof(int) (kernel/acct.c:79).
>>>
>>>>>
>>>>> diff --git a/kernel/acct.c b/kernel/acct.c
>>>>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604 100644
>>>>> --- a/kernel/acct.c
>>>>> +++ b/kernel/acct.c
>>>>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>>>>>         const struct cred *orig_cred;
>>>>>         struct file *file = acct->file;
>>>>>
>>>>> +       if (!S_ISREG(file_inode(file)->i_mode))
>>>>> +               return;
>>>>> +
>>> This does not seem to address the actual culprit: why is the sysctl file
>>> being used for accounting in the first place?
>>>
>>>>>         /*
>>>>>          * Accounting records are not subject to resource limits.
>>>>>          */
>>>>
>>>> OK, thank you, that's clearer.
>>>>
>>>> So this is then more a question for Joel, right?
>>>>
>>>> Do you plan to send this patch to him?
>>>>
>>>> #syz set subsystems: fs
>>>>
>>>> Cheers,
>>>> Matt
>>>> --
>>>> Sponsored by the NGI0 Core fund.
>>>>
>>>
>>> So what is happening is that:
>>> 1. The accounting file is set to a non-sysctl file.
>>> 2. And when accounting tries to write to this file, you get the
>>>    behaviour explained in this mail?
>>>
>>> Please correct me if I have misread the situation.
>>
>> @Joel: Thank you for your reply!
>>
>> I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
>> can jump in.
>>
>> What I can say is that the original issue has been found by syzbot, and
>> the reproducer [1] shows that 3 syscalls have been used:
>> - openat('/proc/sys/net/mptcp/scheduler')
>> - mprotect()
>> - acct()
>>
>> Please also note that the conversation continued in a sub-thread where
>> you are not in the Cc list, see [2]. In short, Eric suggested another
>> patch only for sysfs, and Al recommended dropping the use of
>> 'current->nsproxy'.
>>
>> On my side, I'm looking at dropping the use of 'current->nsproxy' in
>> sysctl callbacks. I guess such patches will be seen as fixes, except if
>> Eric's new patch is enough for stable?
> 
> It might be less risky in terms of backports to patch mptcp and others.
> 
> I.e., just use Al's suggestion.

Thank you, will do! In fact, I already modified the kernel on my side,
but it is hard for me to validate that for the moment: it is nice to
have many trees around, but less when they fall on cables :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.



* Re: Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
  2025-01-06 14:27           ` Matthieu Baerts
  2025-01-06 15:27             ` Eric Dumazet
@ 2025-01-08 14:37             ` Joel Granados
  1 sibling, 0 replies; 22+ messages in thread
From: Joel Granados @ 2025-01-08 14:37 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Eric Dumazet, Al Viro, davem, geliang, horms, kuba, linux-kernel,
	martineau, mptcp, netdev, pabeni, syzkaller-bugs, syzbot

On Mon, Jan 06, 2025 at 03:27:47PM +0100, Matthieu Baerts wrote:
> Hi Joel, Eric, Al,
> 
> On 06/01/2025 14:32, Joel Granados wrote:
> > On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote:
> >> Hi Eric,
> >>
> >> (+cc Joel)
> >>
> >> Thank you for your reply!
> >>
> >> On 04/01/2025 19:53, Eric Dumazet wrote:
> >>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
> >>>>
> >>>> Hi Eric,
> >>>>
> >>>> Thank you for the bug report!
> >>>>
> >>>> On 02/01/2025 16:21, Eric Dumazet wrote:
> >>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot
> >>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
> >>>>>>
> >>>>
> >>>> (...)
> >>>>
> >>>>> I thought acct(2) was only allowing regular files.
> >>>>>
> >>>>> acct_on() indeed has :
> >>>>>
> >>>>> if (!S_ISREG(file_inode(file)->i_mode)) {
> >>>>>     kfree(acct);
> >>>>>     filp_close(file, NULL);
> >>>>>     return -EACCES;
> >>>>> }
> >>>>>
> >>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ?
> > If this is the case, can you point me to the place where this happens?
> > 
> >>>>
> >>>> Just to be sure I'm not misunderstanding your comment: do you mean that
> >>>> here, the issue is *not* in MPTCP code where we get the 'struct net'
> >>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
> >>>>
> >>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it
> >>>> normal? Or should we simply exit with an error if it is the case because
> >>>> we are in an exiting phase?
> >>>>
> >>>> I'm just a bit confused, because it looks like 'net' is retrieved from
> >>>> different places elsewhere when dealing with sysfs: some get it from
> >>>> 'current' like us, some assign 'net' to 'table->extra2', others get it
> >>>> from 'table->data' (via a container_of()), etc. Maybe we should not use
> >>>> 'current->nsproxy->net_ns' here then?
> >>>
> >>> I do think this is a bug in process accounting, not in networking.
> >>>
> >>> It might make sense to output a record on a regular file, but probably
> >>> not on any other files.
> > It for sure does not make sense to output a record on a sysctl file that
> > has a maxlen of just 3*sizeof(int) (kernel/acct.c:79).
> > 
> >>>
> >>> diff --git a/kernel/acct.c b/kernel/acct.c
> >>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604 100644
> >>> --- a/kernel/acct.c
> >>> +++ b/kernel/acct.c
> >>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
> >>>         const struct cred *orig_cred;
> >>>         struct file *file = acct->file;
> >>>
> >>> +       if (!S_ISREG(file_inode(file)->i_mode))
> >>> +               return;
> >>> +
> > This does not seem to address the actual culprit: why is the sysctl file
> > being used for accounting in the first place?
> > 
> >>>         /*
> >>>          * Accounting records are not subject to resource limits.
> >>>          */
> >>
> >> OK, thank you, that's clearer.
> >>
> >> So this is then more a question for Joel, right?
> >>
> >> Do you plan to send this patch to him?
> >>
> >> #syz set subsystems: fs
> >>
> >> Cheers,
> >> Matt
> >> -- 
> >> Sponsored by the NGI0 Core fund.
> >>
> > 
> > So what is happening is that:
> > 1. The accounting file is set to a non-sysctl file.
> > 2. And when accounting tries to write to this file, you get the
> >    behaviour explained in this mail?
> > 
> > Please correct me if I have misread the situation.
> 
> @Joel: Thank you for your reply!
> 
> I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
> can jump in.
> 
> What I can say is that the original issue has been found by syzbot, and
> the reproducer [1] shows that 3 syscalls have been used:
> - openat('/proc/sys/net/mptcp/scheduler')
> - mprotect()
> - acct()
> 
> Please also note that the conversation continued in a sub-thread where
> you are not in the Cc list, see [2]. In short, Eric suggested another
> patch only for sysfs, and Al recommended dropping the use of
> 'current->nsproxy'.
Perfect. Thanks for the summary. I'll remove this thread from my radar, as
it seems that a fix has already been found.

Best

-- 

Joel Granados



Thread overview: 22+ messages
2025-01-02 14:12 [syzbot] [mptcp?] general protection fault in proc_scheduler syzbot
2025-01-02 15:21 ` Eric Dumazet
2025-01-04 18:38   ` Matthieu Baerts
2025-01-04 18:53     ` Eric Dumazet
2025-01-04 19:00       ` Al Viro
2025-01-04 19:11         ` Matthieu Baerts
2025-01-04 20:21           ` Al Viro
2025-01-05  8:32             ` Eric Dumazet
2025-01-05 11:29               ` Al Viro
2025-01-05 16:52                 ` Eric Dumazet
2025-01-05 17:03                   ` Matthieu Baerts
2025-01-05 19:54                   ` Al Viro
2025-01-05 20:50                     ` Al Viro
2025-01-05 21:11                       ` Al Viro
2025-01-05 17:03             ` Matthieu Baerts
2025-01-04 19:11       ` Matthieu Baerts
2025-01-06 13:32         ` Joel Granados
2025-01-06 14:27           ` Matthieu Baerts
2025-01-06 15:27             ` Eric Dumazet
2025-01-06 15:34               ` Matthieu Baerts
2025-01-08 14:37             ` Joel Granados
2025-01-04 20:09       ` Al Viro
