* [syzbot] [kernel?] WARNING in signal_wake_up_state
@ 2024-01-09 18:18 syzbot
2024-01-09 19:05 ` Linus Torvalds
2024-09-23 3:12 ` syzbot
0 siblings, 2 replies; 6+ messages in thread
From: syzbot @ 2024-01-09 18:18 UTC (permalink / raw)
To: ebiederm, linux-kernel, luto, michael.christie, mst, peterz,
syzkaller-bugs, tglx, torvalds
Hello,
syzbot found the following issue on:
HEAD commit: 610a9b8f49fb Linux 6.7-rc8
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=134dee09e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=56c2c781bb4ee18
dashboard link: https://syzkaller.appspot.com/bug?extid=c6d438f2d77f96cae7c2
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10223829e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1121aeb5e80000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/1e10270bc146/disk-610a9b8f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c6066a38235d/vmlinux-610a9b8f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e7df7096082d/bzImage-610a9b8f.xz
The issue was bisected to:
commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
Author: Mike Christie <michael.christie@oracle.com>
Date: Thu Jun 1 18:32:32 2023 +0000
fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15ff657ee80000
final oops: https://syzkaller.appspot.com/x/report.txt?x=17ff657ee80000
console output: https://syzkaller.appspot.com/x/log.txt?x=13ff657ee80000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com
Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
process 'syz-executor189' launched '/dev/fd/4' with NULL argv: empty string added
process 'memfd:��n�dR\x04i5\x02��ም[@8��\x1f 9I\x7f\x15\x1d�=��\'L�Ҏ�)JtTDq�ρ��1� �\x10>�\�\x17L�ϑ�M�\x02^T*' started with executable stack
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
Modules linked in:
CPU: 1 PID: 5069 Comm: 4 Not tainted 6.7.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
Code: 00 be ff ff ff ff 48 8d 78 18 e8 31 6c 2e 09 31 ff 41 89 c4 89 c6 e8 55 e8 35 00 45 85 e4 0f 85 62 ff ff ff e8 d7 ec 35 00 90 <0f> 0b 90 e9 54 ff ff ff 48 c7 c7 38 71 19 8f e8 12 96 8c 00 e9 2d
RSP: 0018:ffffc900039979f0 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff888020380000 RCX: ffffffff8151856b
RDX: ffff888023c40000 RSI: ffffffff81518579 RDI: 0000000000000005
RBP: 0000000000000108 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: ffff888020380000 R15: ffff888023c40000
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000b7000000 CR3: 00000000288f3000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
signal_wake_up include/linux/sched/signal.h:448 [inline]
zap_process fs/coredump.c:373 [inline]
zap_threads fs/coredump.c:392 [inline]
coredump_wait fs/coredump.c:410 [inline]
do_coredump+0x784/0x3f70 fs/coredump.c:571
get_signal+0x242f/0x2790 kernel/signal.c:2890
arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x121/0x240 kernel/entry/common.c:204
irqentry_exit_to_user_mode+0xa/0x40 kernel/entry/common.c:309
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
RIP: 0023:0xb7000000
Code: Unable to access opcode bytes at 0xb6ffffd6.
RSP: 002b:00000000ff8cdad0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-09 18:18 [syzbot] [kernel?] WARNING in signal_wake_up_state syzbot
@ 2024-01-09 19:05 ` Linus Torvalds
  2024-01-10 16:03   ` Oleg Nesterov
  2024-01-10 16:11   ` Eric W. Biederman
  2024-09-23  3:12 ` syzbot
  1 sibling, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2024-01-09 19:05 UTC (permalink / raw)
To: syzbot, Oleg Nesterov, Eric W. Biederman
Cc: linux-kernel, luto, michael.christie, mst, peterz, syzkaller-bugs, tglx

Oleg/Eric, can you make any sense of this?

On Tue, 9 Jan 2024 at 10:18, syzbot
<syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
>
> The issue was bisected to:
>
> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080

Hmm. This smells more like a "that triggers the problem" than a cause.

Because the warning itself is

> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771

That's

	lockdep_assert_held(&t->sighand->siglock);

at the top of the function, with the call trace being

> signal_wake_up include/linux/sched/signal.h:448 [inline]

just a wrapper setting 'state'.

> zap_process fs/coredump.c:373 [inline]

That's zap_process() that does a

	for_each_thread(start, t) {

and then does a

	signal_wake_up(t, 1);

on each thread.

> zap_threads fs/coredump.c:392 [inline]

And this is zap_threads(), which does

	spin_lock_irq(&tsk->sighand->siglock);
	...
	nr = zap_process(tsk, exit_code);

Strange. The sighand->siglock is definitely taken.

The for_each_thread() must be hitting a thread with a different
sighand, but it's basically a

	list_for_each_entry_rcu(..)

walking over the tsk->signal->thread_head list.

But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
we always require that CLONE_SIGHAND is also set:

	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
		return ERR_PTR(-EINVAL);

so we most definitely should have the same ->sighand if we have the
same ->signal. And that's true very much for that vhost_task_create()
case too.

So as far as I can see, that bisected commit does add a new case of
threaded signal handling, but in no way explains the problem.

Is there some odd exit race? The thread is removed with

	list_del_rcu(&p->thread_node);

in __exit_signal -> __unhash_process(), and despite the RCU
annotations, all these parts seem to hold the right locks too (ie
sighand->siglock is held by __exit_signal too), so I don't even see
any delayed de-allocation issue or anything like that.

Thus bringing in Eric/Oleg to see if they see something I miss.

Original email at
https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
for your pleasure.

               Linus
* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-09 19:05 ` Linus Torvalds
@ 2024-01-10 16:03   ` Oleg Nesterov
  2024-01-10 16:11   ` Eric W. Biederman
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2024-01-10 16:03 UTC (permalink / raw)
To: Linus Torvalds
Cc: syzbot, Eric W. Biederman, linux-kernel, luto, michael.christie,
	mst, peterz, syzkaller-bugs, tglx

On 01/09, Linus Torvalds wrote:
>
> Oleg/Eric, can you make any sense of this?
>
> On Tue, 9 Jan 2024 at 10:18, syzbot
> <syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
> >
> > The issue was bisected to:
> >
> > commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>
> Hmm. This smells more like a "that triggers the problem" than a cause.
>
> Because the warning itself is
>
> > WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>
> That's
>
>	lockdep_assert_held(&t->sighand->siglock);

I have a fever, possibly I am totally confused, but this commit added

+	/* Don't require de_thread to wait for the vhost_worker */
+	if ((t->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)
+		count++;

into zap_other_threads().

So it seems the caller can do unshare_sighand() before the vhost thread
exits and actually unshare ->sighand because oldsighand->count > 1.
This is already very wrong (plus it seems this breaks the
signal->notify_count logic).

IIRC I even tried to argue with this change... not sure.

And this can explain the warning: this task can start the coredump
after exec and hit vhost_worker with the old sighand != current->sighand.

Oleg.
* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-09 19:05 ` Linus Torvalds
  2024-01-10 16:03   ` Oleg Nesterov
@ 2024-01-10 16:11   ` Eric W. Biederman
  2024-01-11 17:20     ` Mike Christie
  1 sibling, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2024-01-10 16:11 UTC (permalink / raw)
To: Linus Torvalds
Cc: syzbot, Oleg Nesterov, linux-kernel, luto, michael.christie, mst,
	peterz, syzkaller-bugs, tglx

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Oleg/Eric, can you make any sense of this?
>
> On Tue, 9 Jan 2024 at 10:18, syzbot
> <syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
>>
>> The issue was bisected to:
>>
>> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>
> Hmm. This smells more like a "that triggers the problem" than a cause.
>
> Because the warning itself is
>
>> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>
> That's
>
>	lockdep_assert_held(&t->sighand->siglock);
>
> at the top of the function, with the call trace being
>
>> signal_wake_up include/linux/sched/signal.h:448 [inline]
>
> just a wrapper setting 'state'.
>
>> zap_process fs/coredump.c:373 [inline]
>
> That's zap_process() that does a
>
>	for_each_thread(start, t) {
>
> and then does a
>
>	signal_wake_up(t, 1);
>
> on each thread.
>
>> zap_threads fs/coredump.c:392 [inline]
>
> And this is zap_threads(), which does
>
>	spin_lock_irq(&tsk->sighand->siglock);
>	...
>	nr = zap_process(tsk, exit_code);
>
> Strange. The sighand->siglock is definitely taken.
>
> The for_each_thread() must be hitting a thread with a different
> sighand, but it's basically a
>
>	list_for_each_entry_rcu(..)
>
> walking over the tsk->signal->thread_head list.
>
> But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
> we always require that CLONE_SIGHAND is also set:
>
>	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
>		return ERR_PTR(-EINVAL);
>
> so we most definitely should have the same ->sighand if we have the
> same ->signal. And that's true very much for that vhost_task_create()
> case too.
>
> So as far as I can see, that bisected commit does add a new case of
> threaded signal handling, but in no way explains the problem.
>
> Is there some odd exit race? The thread is removed with
>
>	list_del_rcu(&p->thread_node);
>
> in __exit_signal -> __unhash_process(), and despite the RCU
> annotations, all these parts seem to hold the right locks too (ie
> sighand->siglock is held by __exit_signal too), so I don't even see
> any delayed de-allocation issue or anything like that.
>
> Thus bringing in Eric/Oleg to see if they see something I miss.

I expect this would take going through the rest of the reproducer
to see what is going on.

Hmm.  The reproducer is in something other than C:

> # https://syzkaller.appspot.com/bug?id=b7640dae2467568f05425b289a1f004faa2dc292
> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false}
> r0 = openat$vnet(0xffffffffffffff9c, &(0x7f0000000040), 0x2, 0x0)
> ioctl$int_in(r0, 0x40000000af01, 0x0)
> r1 = memfd_create(&(0x7f0000000400)='\xa3\x9fn\xb4dR\x04i5\x02\xac\xce\xe1\x88\x9d[@8\xd7\xce\x1f 9I\x7f\x15\x1d\x93=\xb5\xe7\\\'L\xe6\xd2\x8e\xbc)JtTDq\x81\xcf\x81\xba\xe51\xf5 \xc8\x10>\xc9\\\x85\x17L\xbf\xcf\x91\xdfM\xf3\x02^T*\x00\x02\xb9~B\x9f\xacl\x1d3\x06o\xf8\x16H\xaa*\x02\xf7\xfb\x06\xf1\x83\x92\xa8\xc2\xcb\xae\xb0\xb4\x93\xb8\x04\xf1\x99\xc2yY+\xd9y\x8a\xd5b\xe8\"q\x1b0)\xccm\xacz\xc1\xadd\x9b6a\xf3\xdds\xbb\x88\xff\b\x85\xb3s\x00\x0e\xbcfvi\x85\xfc.|\xd4h\xec\x82o\x8e\x93\x11\xc1\xd4\xae\x05\x17=\xd9R\xd0\xd4\x90\xcf\x9b\xdc\xaeV\x88\x94\x9f\xe3\xefqi\xed\xa8w\xbe\xd0\xd0-tBl\x9e+\xd3\xed\xce\x9f\x83\x86\xf9\x12\x16Ts\x80\x13]C\xfb`\xc2`\xf7\x1a\x00\x00\x00\x00\x00\x00\x00k\xae\xcb\x1a.\xc2\x8f\xd1x4]PZ\x9e\xd5Y\xf0L\xa4\xbc\x84\xf6\x04L\xff0\x8b\\*\xf9,\xb6\r\x97\xedy\xe0\x8a\xe2\x8ck\xc6S\xc3g\xb9\x1a\xf8\x8f \x9d\x00u7\xd8\'\xf1E\xa4(Q\x80Fy\xb5\xe4q\xc9\xff \xd8\x9d\xad\x11\xf8m\xd3\xbc\x9e\x10D\x7f!\xca\x0ev\x15h$\x01\xdd\xe5\xce\xf8*\xb3\x01\x85\a\xe4qv&\x9c\xac\x9aN~o\xe5\x89\xd5\a\x9f\f\x1f\xc2e/\x8d\x1e\n\xd0_\xbd!^\xa46\xb8j\xc0x\n\xdb\xe1\xa3\xd6\xae;\r\x92@\xa5I\x88Z1F\xf0\x1at\t\xd0\x8a\x04m\x06\xf3BL\xffS\x9eY\xf4\xb0U \xf8\xd00\x88y\xebX\x92\xd5\xbb\xa1h7\xf3\xe0\x0f\xbd\x02\xe4%\xf9\xb1\x87\x8aM\xfeG\xb2L\xbd\x92-\xcd\x1f\xf4\xe1,\xb7G|\xec\"\xa2\xab\xf6\x84\xe0\xcf1\x9a', 0x0)
> write$binfmt_elf32(r1, &(0x7f0000000140)=ANY=[@ANYBLOB="7f454c466000002ed8e4f97765ce27b90300060000000000000000b738000000000035f4c38422a3bc8220000500000004020300b300000000002a002400b3d7c52ebf31a8d5c8c3c6cb00000009e500d5ffffff05ffffff03004f9ef4"], 0xd8)
> execveat(r1, &(0x7f0000000000)='\x00', 0x0, 0x0, 0x1000)

If I read that correctly it is intending to put an elf32 executable into
a memfd and then execute it.
Exec will possibly unshare SIGHAND struct if there is still a reference
to it from somewhere else to ensure the new process has a clean one.

But all of the other threads should get shutdown by de_thread before any
of that happens.  And de_thread should take care of the weird non-leader
execve case as well.  So after that point the process really should
be single threaded.  Which is why de_thread is the point of no return.

That whole interrupt comes in, and a fatal signal is processed
scenario is confusing.

Hmm.  That weird vnet ioctl at the beginning must be what is starting
the vhost logic.  So I guess it makes sense if the signal is received
by the magic vhost thread.

Perhaps there is some weird vhost logic where the thread lingers.

Ugh.  I seem to remember a whole conversation about the vhost logic
(immediately after it was merged) and how it had a bug where it exited
differently from everything else.  I remember people figuring it was
immediately ok, after the code was merged, and because everything had to
be performed as root, and no one actually uses the vhost logic like
that.  It has been long enough I thought that would have been sorted
out by now.

Looking back to refresh my memory at the original conversation:
https://lore.kernel.org/all/20230601183232.8384-1-michael.christie@oracle.com/

The bisect is 100% correct, and this was a known issue with that code at
the time it was merged.

I will let someone else take it from here.

Eric
* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-10 16:11   ` Eric W. Biederman
@ 2024-01-11 17:20     ` Mike Christie
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Christie @ 2024-01-11 17:20 UTC (permalink / raw)
To: Eric W. Biederman, Linus Torvalds
Cc: syzbot, Oleg Nesterov, linux-kernel, luto, mst, peterz,
	syzkaller-bugs, tglx

On 1/10/24 10:11 AM, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> Oleg/Eric, can you make any sense of this?
>>
>> On Tue, 9 Jan 2024 at 10:18, syzbot
>> <syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
>>>
>>> The issue was bisected to:
>>>
>>> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>>
>> Hmm. This smells more like a "that triggers the problem" than a cause.
>>
>> Because the warning itself is
>>
>>> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>>
>> That's
>>
>>	lockdep_assert_held(&t->sighand->siglock);
>>
>> at the top of the function, with the call trace being
>>
>>> signal_wake_up include/linux/sched/signal.h:448 [inline]
>>
>> just a wrapper setting 'state'.
>>
>>> zap_process fs/coredump.c:373 [inline]
>>
>> That's zap_process() that does a
>>
>>	for_each_thread(start, t) {
>>
>> and then does a
>>
>>	signal_wake_up(t, 1);
>>
>> on each thread.
>>
>>> zap_threads fs/coredump.c:392 [inline]
>>
>> And this is zap_threads(), which does
>>
>>	spin_lock_irq(&tsk->sighand->siglock);
>>	...
>>	nr = zap_process(tsk, exit_code);
>>
>> Strange. The sighand->siglock is definitely taken.
>>
>> The for_each_thread() must be hitting a thread with a different
>> sighand, but it's basically a
>>
>>	list_for_each_entry_rcu(..)
>>
>> walking over the tsk->signal->thread_head list.
>>
>> But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
>> we always require that CLONE_SIGHAND is also set:
>>
>>	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
>>		return ERR_PTR(-EINVAL);
>>
>> so we most definitely should have the same ->sighand if we have the
>> same ->signal. And that's true very much for that vhost_task_create()
>> case too.
>>
>> So as far as I can see, that bisected commit does add a new case of
>> threaded signal handling, but in no way explains the problem.
>>
>> Is there some odd exit race? The thread is removed with
>>
>>	list_del_rcu(&p->thread_node);
>>
>> in __exit_signal -> __unhash_process(), and despite the RCU
>> annotations, all these parts seem to hold the right locks too (ie
>> sighand->siglock is held by __exit_signal too), so I don't even see
>> any delayed de-allocation issue or anything like that.
>>
>> Thus bringing in Eric/Oleg to see if they see something I miss.
>
> I expect this would take going through the rest of the reproducer
> to see what is going on.
>
> Hmm.  The reproducer is in something other than C:
>
>> # https://syzkaller.appspot.com/bug?id=b7640dae2467568f05425b289a1f004faa2dc292
>> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
>> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false}
>> r0 = openat$vnet(0xffffffffffffff9c, &(0x7f0000000040), 0x2, 0x0)
>> ioctl$int_in(r0, 0x40000000af01, 0x0)
>> r1 = memfd_create(&(0x7f0000000400)='\xa3\x9fn\xb4dR\x04i5\x02\xac\xce\xe1\x88\x9d[@8\xd7\xce\x1f 9I\x7f\x15\x1d\x93=\xb5\xe7\\\'L\xe6\xd2\x8e\xbc)JtTDq\x81\xcf\x81\xba\xe51\xf5 \xc8\x10>\xc9\\\x85\x17L\xbf\xcf\x91\xdfM\xf3\x02^T*\x00\x02\xb9~B\x9f\xacl\x1d3\x06o\xf8\x16H\xaa*\x02\xf7\xfb\x06\xf1\x83\x92\xa8\xc2\xcb\xae\xb0\xb4\x93\xb8\x04\xf1\x99\xc2yY+\xd9y\x8a\xd5b\xe8\"q\x1b0)\xccm\xacz\xc1\xadd\x9b6a\xf3\xdds\xbb\x88\xff\b\x85\xb3s\x00\x0e\xbcfvi\x85\xfc.|\xd4h\xec\x82o\x8e\x93\x11\xc1\xd4\xae\x05\x17=\xd9R\xd0\xd4\x90\xcf\x9b\xdc\xaeV\x88\x94\x9f\xe3\xefqi\xed\xa8w\xbe\xd0\xd0-tBl\x9e+\xd3\xed\xce\x9f\x83\x86\xf9\x12\x16Ts\x80\x13]C\xfb`\xc2`\xf7\x1a\x00\x00\x00\x00\x00\x00\x00k\xae\xcb\x1a.\xc2\x8f\xd1x4]PZ\x9e\xd5Y\xf0L\xa4\xbc\x84\xf6\x04L\xff0\x8b\\*\xf9,\xb6\r\x97\xedy\xe0\x8a\xe2\x8ck\xc6S\xc3g\xb9\x1a\xf8\x8f \x9d\x00u7\xd8\'\xf1E\xa4(Q\x80Fy\xb5\xe4q\xc9\xff \xd8\x9d\xad\x11\xf8m\xd3\xbc\x9e\x10D\x7f!\xca\x0ev\x15h$\x01\xdd\xe5\xce\xf8*\xb3\x01\x85\a\xe4qv&\x9c\xac\x9aN~o\xe5\x89\xd5\a\x9f\f\x1f\xc2e/\x8d\x1e\n\xd0_\xbd!^\xa46\xb8j\xc0x\n\xdb\xe1\xa3\xd6\xae;\r\x92@\xa5I\x88Z1F\xf0\x1at\t\xd0\x8a\x04m\x06\xf3BL\xffS\x9eY\xf4\xb0U \xf8\xd00\x88y\xebX\x92\xd5\xbb\xa1h7\xf3\xe0\x0f\xbd\x02\xe4%\xf9\xb1\x87\x8aM\xfeG\xb2L\xbd\x92-\xcd\x1f\xf4\xe1,\xb7G|\xec\"\xa2\xab\xf6\x84\xe0\xcf1\x9a', 0x0)
>> write$binfmt_elf32(r1, &(0x7f0000000140)=ANY=[@ANYBLOB="7f454c466000002ed8e4f97765ce27b90300060000000000000000b738000000000035f4c38422a3bc8220000500000004020300b300000000002a002400b3d7c52ebf31a8d5c8c3c6cb00000009e500d5ffffff05ffffff03004f9ef4"], 0xd8)
>> execveat(r1, &(0x7f0000000000)='\x00', 0x0, 0x0, 0x1000)
>
> If I read that correctly it is intending to put an elf32 executable into
> a memfd and then execute it.
> Exec will possibly unshare SIGHAND struct if there is still a reference
> to it from somewhere else to ensure the new process has a clean one.
>
> But all of the other threads should get shutdown by de_thread before any
> of that happens.  And de_thread should take care of the weird non-leader
> execve case as well.  So after that point the process really should
> be single threaded.  Which is why de_thread is the point of no return.
>
> That whole interrupt comes in, and a fatal signal is processed
> scenario is confusing.
>
> Hmm.  That weird vnet ioctl at the beginning must be what is starting
> the vhost logic.  So I guess it makes sense if the signal is received
> by the magic vhost thread.
>
> Perhaps there is some weird vhost logic where the thread lingers.
>
> Ugh.  I seem to remember a whole conversation about the vhost logic
> (immediately after it was merged) and how it had a bug where it exited
> differently from everything else.  I remember people figuring it was

The vhost code just wanted to wait for running IO from the vhost thread
context before exiting instead of working like io_uring which would wait
from a secondary thread.

> immediately ok, after the code was merged, and because everything had to
> be performed as root, and no one actually uses the vhost logic like
> that.  It has been long enough I thought that would have been sorted
> out by now.
>
> Looking back to refresh my memory at the original conversation:
> https://lore.kernel.org/all/20230601183232.8384-1-michael.christie@oracle.com/
>
> The bisect is 100% correct, and this was a known issue with that code at
> the time it was merged.
>

I have patches for this that I've been testing. Will post to the virt
list for the next feature window.
* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-09 18:18 [syzbot] [kernel?] WARNING in signal_wake_up_state syzbot
  2024-01-09 19:05 ` Linus Torvalds
@ 2024-09-23  3:12 ` syzbot
  1 sibling, 0 replies; 6+ messages in thread
From: syzbot @ 2024-09-23 3:12 UTC (permalink / raw)
To: akpm, brauner, ebiederm, jack, linux-fsdevel, linux-kernel, luto,
	mhocko, michael.christie, mst, oleg, pasha.tatashin, peterz,
	syzkaller-bugs, tandersen, tglx, torvalds, viro

syzbot suspects this issue was fixed by commit:

commit 240a1853b4d2bce51e5cac9ba65cd646152ab6d6
Author: Mike Christie <michael.christie@oracle.com>
Date:   Sat Mar 16 00:47:07 2024 +0000

    kernel: Remove signal hacks for vhost_tasks

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13212480580000
start commit:   e88c4cfcb7b8 Merge tag 'for-6.9-rc5-tag' of git://git.kern..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=98d5a8e00ed1044a
dashboard link: https://syzkaller.appspot.com/bug?extid=c6d438f2d77f96cae7c2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=152442ef180000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=138c3d30980000

If the result looks correct, please mark the issue as fixed by replying
with:

#syz fix: kernel: Remove signal hacks for vhost_tasks

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
end of thread, other threads:[~2024-09-23  3:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2024-01-09 18:18 [syzbot] [kernel?] WARNING in signal_wake_up_state syzbot
2024-01-09 19:05 ` Linus Torvalds
2024-01-10 16:03   ` Oleg Nesterov
2024-01-10 16:11   ` Eric W. Biederman
2024-01-11 17:20     ` Mike Christie
2024-09-23  3:12 ` syzbot