From: Matthieu Baerts <matttbe@kernel.org>
To: Eric Dumazet <edumazet@google.com>
Cc: Joel Granados <joel.granados@kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
davem@davemloft.net, geliang@kernel.org, horms@kernel.org,
kuba@kernel.org, linux-kernel@vger.kernel.org,
martineau@kernel.org, mptcp@lists.linux.dev,
netdev@vger.kernel.org, pabeni@redhat.com,
syzkaller-bugs@googlegroups.com,
syzbot <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com>
Subject: Re: [syzbot] [mptcp?] general protection fault in proc_scheduler
Date: Mon, 6 Jan 2025 16:34:38 +0100 [thread overview]
Message-ID: <2588cccc-e1dd-418b-81be-38d11e383019@kernel.org> (raw)
In-Reply-To: <CANn89i+1gPTc7wmTgdbwyB19tKAk7j8ZR6j6z7hVhtwAXXD8-A@mail.gmail.com>
Hi Eric,
Thank you for your reply!
On 06/01/2025 16:27, Eric Dumazet wrote:
> On Mon, Jan 6, 2025 at 3:27 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>>
>> Hi Joel, Eric, Al,
>>
>> On 06/01/2025 14:32, Joel Granados wrote:
>>> On Sat, Jan 04, 2025 at 08:11:52PM +0100, Matthieu Baerts wrote:
>>>> Hi Eric,
>>>>
>>>> (+cc Joel)
>>>>
>>>> Thank you for your reply!
>>>>
>>>> On 04/01/2025 19:53, Eric Dumazet wrote:
>>>>> On Sat, Jan 4, 2025 at 7:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> Thank you for the bug report!
>>>>>>
>>>>>> On 02/01/2025 16:21, Eric Dumazet wrote:
>>>>>>> On Thu, Jan 2, 2025 at 3:12 PM syzbot
>>>>>>> <syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> syzbot found the following issue on:
>>>>>>>>
>>>>>>>> HEAD commit: ccb98ccef0e5 Merge tag 'platform-drivers-x86-v6.13-4' of g..
>>>>>>>> git tree: upstream
>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=128f6ac4580000
>>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=86dd15278dbfe19f
>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e364f774c6f57f2c86d1
>>>>>>>> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>>>>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1245eaf8580000
>>>>>>>>
>>>>>>>> Downloadable assets:
>>>>>>>> disk image: https://storage.googleapis.com/syzbot-assets/d24eb225cff7/disk-ccb98cce.raw.xz
>>>>>>>> vmlinux: https://storage.googleapis.com/syzbot-assets/dd81532f8240/vmlinux-ccb98cce.xz
>>>>>>>> kernel image: https://storage.googleapis.com/syzbot-assets/18b08e4bbf40/bzImage-ccb98cce.xz
>>>>>>>>
>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>>>> Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
>>>>>>>>
>>>>>>>> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
>>>>>>>> KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
>>>>>>>> CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
>>>>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
>>>>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
>>>>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
>>>>>>>>
>>>>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
>>>>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
>>>>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
>>>>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
>>>>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
>>>>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> Call Trace:
>>>>>>>> <TASK>
>>>>>>>> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
>>>>>>>> __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
>>>>>>>> __kernel_write+0xf6/0x140 fs/read_write.c:632
>>>>>>>> do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
>>>>>>>> acct_pin_kill+0x2d/0x100 kernel/acct.c:192
>>>>>>>> pin_kill+0x194/0x7c0 fs/fs_pin.c:44
>>>>>>>> mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
>>>>>>>> cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
>>>>>>>> task_work_run+0x14e/0x250 kernel/task_work.c:239
>>>>>>>> exit_task_work include/linux/task_work.h:43 [inline]
>>>>>>>> do_exit+0xad8/0x2d70 kernel/exit.c:938
>>>>>>>> do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
>>>>>>>> get_signal+0x2576/0x2610 kernel/signal.c:3017
>>>>>>>> arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
>>>>>>>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>>>>>>>> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
>>>>>>>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>>>>>>>> syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
>>>>>>>> do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
>>>>>>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>>>>>> RIP: 0033:0x7fee3cb87a6a
>>>>>>>> Code: Unable to access opcode bytes at 0x7fee3cb87a40.
>>>>>>>> RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
>>>>>>>> RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
>>>>>>>> RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
>>>>>>>> RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
>>>>>>>> R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
>>>>>>>> R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
>>>>>>>> </TASK>
>>>>>>>> Modules linked in:
>>>>>>>> ---[ end trace 0000000000000000 ]---
>>>>>>>> RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
>>>>>>>> Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
>>>>>>>> RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
>>>>>>>> RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
>>>>>>>> RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
>>>>>>>> RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
>>>>>>>> R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
>>>>>>>> R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
>>>>>>>> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> ----------------
>>>>>>>> Code disassembly (best guess), 1 bytes skipped:
>>>>>>>> 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1)
>>>>>>>> 5: 0f 85 fe 02 00 00 jne 0x309
>>>>>>>> b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12
>>>>>>>> 12: 00
>>>>>>>> 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
>>>>>>>> 1a: fc ff df
>>>>>>>> 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi
>>>>>>>> 22: 48 89 fa mov %rdi,%rdx
>>>>>>>> 25: 48 c1 ea 03 shr $0x3,%rdx
>>>>>>>> * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
>>>>>>>> 2d: 0f 85 cc 02 00 00 jne 0x2ff
>>>>>>>> 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15
>>>>>>>> 38: 48 rex.W
>>>>>>>> 39: 8d .byte 0x8d
>>>>>>>> 3a: 84 24 c8 test %ah,(%rax,%rcx,8)
>>>>>>
>>>>>> (...)
>>>>>>
>>>>>>> I thought acct(2) was only allowing regular files.
>>>>>>>
>>>>>>> acct_on() indeed has :
>>>>>>>
>>>>>>> if (!S_ISREG(file_inode(file)->i_mode)) {
>>>>>>> kfree(acct);
>>>>>>> filp_close(file, NULL);
>>>>>>> return -EACCES;
>>>>>>> }
>>>>>>>
>>>>>>> It seems there are other ways to call do_acct_process() targeting a sysfs file ?
>>> If this is the case, can you point me to the place where this happens?
>>>
>>>>>>
>>>>>> Just to be sure I'm not misunderstanding your comment: do you mean that
>>>>>> here, the issue is *not* in MPTCP code where we get the 'struct net'
>>>>>> pointer via 'current->nsproxy->net_ns', but in the FS part, right?
>>>>>>
>>>>>> Here, we have an issue because 'current->nsproxy' is NULL, but is it
>>>>>> normal? Or should we simply exit with an error if it is the case because
>>>>>> we are in an exiting phase?
>>>>>>
>>>>>> I'm just a bit confused, because it looks like 'net' is retrieved from
>>>>>> different places elsewhere when dealing with sysfs: some get it from
>>>>>> 'current' like us, some assign 'net' to 'table->extra2', others get it
>>>>>> from 'table->data' (via a container_of()), etc. Maybe we should not use
>>>>>> 'current->nsproxy->net_ns' here then?
>>>>>
>>>>> I do think this is a bug in process accounting, not in networking.
>>>>>
>>>>> It might make sense to output a record on a regular file, but probably
>>>>> not on any other files.
>>> It for sure does not make sense to output a record on a sysctl file that
>>> has a maxlen of just 3*sizeof(int) (kernel/acct.c:79).
>>>
>>>>>
>>>>> diff --git a/kernel/acct.c b/kernel/acct.c
>>>>> index 179848ad33e978a557ce695a0d6020aa169177c6..a211305cb930f6860d02de7f45ebd260ae03a604
>>>>> 100644
>>>>> --- a/kernel/acct.c
>>>>> +++ b/kernel/acct.c
>>>>> @@ -495,6 +495,9 @@ static void do_acct_process(struct bsd_acct_struct *acct)
>>>>> const struct cred *orig_cred;
>>>>> struct file *file = acct->file;
>>>>>
>>>>> + if (S_ISREG(file_inode(file)->i_mode))
>>>>> + return;
>>>>> +
>>> This seems like it does not handle the actual culprit which is. Why is
>>> the sysctl file being used for the accounting.
>>>
>>>>> /*
>>>>> * Accounting records are not subject to resource limits.
>>>>> */
>>>>
>>>> OK, thank you, that's clearer.
>>>>
>>>> So this is then more a question for Joel, right?
>>>>
>>>> Do you plan to send this patch to him?
>>>>
>>>> #syz set subsystems: fs
>>>>
>>>> Cheers,
>>>> Matt
>>>> --
>>>> Sponsored by the NGI0 Core fund.
>>>>
>>>
>>> So what is happening is that:
>>> 1. The accounting file is set to a non-sysctl file.
>>> 2. And when accounting tries to write to this file, you get the
>>> behaviour explained in this mail?
>>>
>>> Please correct me if I have miss-read the situation.
>>
>> @Joel: Thank you for your reply!
>>
>> I'm sorry, I'm not sure whether I can help here. I hope Eric and/or Al
>> can jump in.
>>
>> What I can say is that the original issue has been found by syzbot, and
>> the reproducer [1] shows that 3 syscalls have been used:
>> - openat('/proc/sys/net/mptcp/scheduler')
>> - mprotect()
>> - acct()
>>
>> Please also note that the conversation continued in a sub-tread where
>> you are not in the Cc list, see [2]. In short, Eric suggested another
>> patch only for sysfs, and Al recommended dropping the use of
>> 'current->nsproxy'.
>>
>> On my side, I'm looking at dropping the use of 'current->nsproxy' in
>> sysctl callbacks. I guess such patches will be seen as fixes, except if
>> Eric's new patch is enough for stable?
>
> It might be less risky in terms of backports to patch mptcp and others.
>
> Ie just use Al suggestion.
Thank you, will do! In fact, I already modified the kernel on my side,
but it is hard for me to validate that for the moment: it is nice to
have many trees around, but less when they fall on cables :)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
next prev parent reply other threads:[~2025-01-06 15:34 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-02 14:12 [syzbot] [mptcp?] general protection fault in proc_scheduler syzbot
2025-01-02 15:21 ` Eric Dumazet
2025-01-04 18:38 ` Matthieu Baerts
2025-01-04 18:53 ` Eric Dumazet
2025-01-04 19:00 ` Al Viro
2025-01-04 19:11 ` Matthieu Baerts
2025-01-04 20:21 ` Al Viro
2025-01-05 8:32 ` Eric Dumazet
2025-01-05 11:29 ` Al Viro
2025-01-05 16:52 ` Eric Dumazet
2025-01-05 17:03 ` Matthieu Baerts
2025-01-05 19:54 ` Al Viro
2025-01-05 20:50 ` Al Viro
2025-01-05 21:11 ` Al Viro
2025-01-05 17:03 ` Matthieu Baerts
2025-01-04 19:11 ` Matthieu Baerts
2025-01-06 13:32 ` Joel Granados
2025-01-06 14:27 ` Matthieu Baerts
2025-01-06 15:27 ` Eric Dumazet
2025-01-06 15:34 ` Matthieu Baerts [this message]
2025-01-08 14:37 ` Joel Granados
2025-01-04 20:09 ` Al Viro
2025-01-03 10:32 ` Hillf Danton
2025-01-03 10:52 ` syzbot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2588cccc-e1dd-418b-81be-38d11e383019@kernel.org \
--to=matttbe@kernel.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=geliang@kernel.org \
--cc=horms@kernel.org \
--cc=joel.granados@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martineau@kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.