* [syzbot] [kernel?] WARNING in signal_wake_up_state
@ 2024-01-09 18:18 syzbot
2024-01-09 19:05 ` Linus Torvalds
2024-09-23 3:12 ` syzbot
0 siblings, 2 replies; 6+ messages in thread
From: syzbot @ 2024-01-09 18:18 UTC (permalink / raw)
To: ebiederm, linux-kernel, luto, michael.christie, mst, peterz,
syzkaller-bugs, tglx, torvalds
Hello,
syzbot found the following issue on:
HEAD commit: 610a9b8f49fb Linux 6.7-rc8
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=134dee09e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=56c2c781bb4ee18
dashboard link: https://syzkaller.appspot.com/bug?extid=c6d438f2d77f96cae7c2
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10223829e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1121aeb5e80000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/1e10270bc146/disk-610a9b8f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c6066a38235d/vmlinux-610a9b8f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e7df7096082d/bzImage-610a9b8f.xz
The issue was bisected to:
commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
Author: Mike Christie <michael.christie@oracle.com>
Date: Thu Jun 1 18:32:32 2023 +0000
fork, vhost: Use CLONE_THREAD to fix freezer/ps regression
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15ff657ee80000
final oops: https://syzkaller.appspot.com/x/report.txt?x=17ff657ee80000
console output: https://syzkaller.appspot.com/x/log.txt?x=13ff657ee80000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com
Fixes: f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
process 'syz-executor189' launched '/dev/fd/4' with NULL argv: empty string added
process 'memfd:��n�dR\x04i5\x02��ም[@8��\x1f 9I\x7f\x15\x1d�=��\'L�Ҏ�)JtTDq�ρ��1� �\x10>�\�\x17L�ϑ�M�\x02^T*' started with executable stack
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
Modules linked in:
CPU: 1 PID: 5069 Comm: 4 Not tainted 6.7.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
Code: 00 be ff ff ff ff 48 8d 78 18 e8 31 6c 2e 09 31 ff 41 89 c4 89 c6 e8 55 e8 35 00 45 85 e4 0f 85 62 ff ff ff e8 d7 ec 35 00 90 <0f> 0b 90 e9 54 ff ff ff 48 c7 c7 38 71 19 8f e8 12 96 8c 00 e9 2d
RSP: 0018:ffffc900039979f0 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff888020380000 RCX: ffffffff8151856b
RDX: ffff888023c40000 RSI: ffffffff81518579 RDI: 0000000000000005
RBP: 0000000000000108 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: ffff888020380000 R15: ffff888023c40000
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000b7000000 CR3: 00000000288f3000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
signal_wake_up include/linux/sched/signal.h:448 [inline]
zap_process fs/coredump.c:373 [inline]
zap_threads fs/coredump.c:392 [inline]
coredump_wait fs/coredump.c:410 [inline]
do_coredump+0x784/0x3f70 fs/coredump.c:571
get_signal+0x242f/0x2790 kernel/signal.c:2890
arch_do_signal_or_restart+0x90/0x7f0 arch/x86/kernel/signal.c:309
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x121/0x240 kernel/entry/common.c:204
irqentry_exit_to_user_mode+0xa/0x40 kernel/entry/common.c:309
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
RIP: 0023:0xb7000000
Code: Unable to access opcode bytes at 0xb6ffffd6.
RSP: 002b:00000000ff8cdad0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-09 18:18 [syzbot] [kernel?] WARNING in signal_wake_up_state syzbot
@ 2024-01-09 19:05 ` Linus Torvalds
  2024-01-10 16:03   ` Oleg Nesterov
  2024-01-10 16:11   ` Eric W. Biederman
  2024-09-23  3:12 ` syzbot
  1 sibling, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2024-01-09 19:05 UTC (permalink / raw)
To: syzbot, Oleg Nesterov, Eric W. Biederman
Cc: linux-kernel, luto, michael.christie, mst, peterz, syzkaller-bugs, tglx

Oleg/Eric, can you make any sense of this?

On Tue, 9 Jan 2024 at 10:18, syzbot
<syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
>
> The issue was bisected to:
>
> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080

Hmm. This smells more like a "that triggers the problem" than a cause.

Because the warning itself is

> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771

That's

	lockdep_assert_held(&t->sighand->siglock);

at the top of the function, with the call trace being

> signal_wake_up include/linux/sched/signal.h:448 [inline]

just a wrapper setting 'state'.

> zap_process fs/coredump.c:373 [inline]

That's zap_process() that does a

	for_each_thread(start, t) {

and then does a

	signal_wake_up(t, 1);

on each thread.

> zap_threads fs/coredump.c:392 [inline]

And this is zap_threads(), which does

	spin_lock_irq(&tsk->sighand->siglock);
	...
	nr = zap_process(tsk, exit_code);

Strange. The sighand->siglock is definitely taken.

The for_each_thread() must be hitting a thread with a different
sighand, but it's basically a

	list_for_each_entry_rcu(..)

walking over the tsk->signal->thread_head list.

But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
we always require that CLONE_SIGHAND is also set:

	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
		return ERR_PTR(-EINVAL);

so we most definitely should have the same ->sighand if we have the
same ->signal. And that's true very much for that vhost_task_create()
case too.

So as far as I can see, that bisected commit does add a new case of
threaded signal handling, but in no way explains the problem.

Is there some odd exit race? The thread is removed with

	list_del_rcu(&p->thread_node);

in __exit_signal -> __unhash_process(), and despite the RCU
annotations, all these parts seem to hold the right locks too (ie
sighand->siglock is held by __exit_signal too), so I don't even see
any delayed de-allocation issue or anything like that.

Thus bringing in Eric/Oleg to see if they see something I miss.

Original email at
https://lore.kernel.org/all/000000000000a41b82060e875721@google.com/
for your pleasure.

               Linus
* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-09 19:05 ` Linus Torvalds
@ 2024-01-10 16:03   ` Oleg Nesterov
  2024-01-10 16:11   ` Eric W. Biederman
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2024-01-10 16:03 UTC (permalink / raw)
To: Linus Torvalds
Cc: syzbot, Eric W. Biederman, linux-kernel, luto, michael.christie,
	mst, peterz, syzkaller-bugs, tglx

On 01/09, Linus Torvalds wrote:
>
> Oleg/Eric, can you make any sense of this?
>
> On Tue, 9 Jan 2024 at 10:18, syzbot
> <syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
> >
> > The issue was bisected to:
> >
> > commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>
> Hmm. This smells more like a "that triggers the problem" than a cause.
>
> Because the warning itself is
>
> > WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>
> That's
>
>	lockdep_assert_held(&t->sighand->siglock);

I have a fever, possibly I am totally confused, but this commit added

+	/* Don't require de_thread to wait for the vhost_worker */
+	if ((t->flags & (PF_IO_WORKER | PF_USER_WORKER)) != PF_USER_WORKER)
+		count++;

into zap_other_threads().

So it seems the caller can do unshare_sighand() before the vhost thread
exits and actually unshare ->sighand because oldsighand->count > 1.
This is already very wrong (plus it seems this breaks the
signal->notify_count logic).

IIRC I even tried to argue with this change... not sure.

And this can explain the warning: this task can start the coredump
after exec and hit vhost_worker with the old sighand != current->sighand.

Oleg.
* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-09 19:05 ` Linus Torvalds
  2024-01-10 16:03   ` Oleg Nesterov
@ 2024-01-10 16:11   ` Eric W. Biederman
  2024-01-11 17:20     ` Mike Christie
  1 sibling, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2024-01-10 16:11 UTC (permalink / raw)
To: Linus Torvalds
Cc: syzbot, Oleg Nesterov, linux-kernel, luto, michael.christie, mst,
	peterz, syzkaller-bugs, tglx

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Oleg/Eric, can you make any sense of this?
>
> On Tue, 9 Jan 2024 at 10:18, syzbot
> <syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
>>
>> The issue was bisected to:
>>
>> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>
> Hmm. This smells more like a "that triggers the problem" than a cause.
>
> Because the warning itself is
>
>> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>
> That's
>
>	lockdep_assert_held(&t->sighand->siglock);
>
> at the top of the function, with the call trace being
>
>> signal_wake_up include/linux/sched/signal.h:448 [inline]
>
> just a wrapper setting 'state'.
>
>> zap_process fs/coredump.c:373 [inline]
>
> That's zap_process() that does a
>
>	for_each_thread(start, t) {
>
> and then does a
>
>	signal_wake_up(t, 1);
>
> on each thread.
>
>> zap_threads fs/coredump.c:392 [inline]
>
> And this is zap_threads(), which does
>
>	spin_lock_irq(&tsk->sighand->siglock);
>	...
>	nr = zap_process(tsk, exit_code);
>
> Strange. The sighand->siglock is definitely taken.
>
> The for_each_thread() must be hitting a thread with a different
> sighand, but it's basically a
>
>	list_for_each_entry_rcu(..)
>
> walking over the tsk->signal->thread_head list.
>
> But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
> we always require that CLONE_SIGHAND is also set:
>
>	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
>		return ERR_PTR(-EINVAL);
>
> so we most definitely should have the same ->sighand if we have the
> same ->signal. And that's true very much for that vhost_task_create()
> case too.
>
> So as far as I can see, that bisected commit does add a new case of
> threaded signal handling, but in no way explains the problem.
>
> Is there some odd exit race? The thread is removed with
>
>	list_del_rcu(&p->thread_node);
>
> in __exit_signal -> __unhash_process(), and despite the RCU
> annotations, all these parts seem to hold the right locks too (ie
> sighand->siglock is held by __exit_signal too), so I don't even see
> any delayed de-allocation issue or anything like that.
>
> Thus bringing in Eric/Oleg to see if they see something I miss.

I expect this would take going through the rest of the reproducer
to see what is going on.

Hmm.  The reproducer is in something other than C:

> # https://syzkaller.appspot.com/bug?id=b7640dae2467568f05425b289a1f004faa2dc292
> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false}
> r0 = openat$vnet(0xffffffffffffff9c, &(0x7f0000000040), 0x2, 0x0)
> ioctl$int_in(r0, 0x40000000af01, 0x0)
> r1 = memfd_create(&(0x7f0000000400)='\xa3\x9fn\xb4dR\x04i5\x02\xac\xce\xe1\x88\x9d[@8\xd7\xce\x1f 9I\x7f\x15\x1d\x93=\xb5\xe7\\\'L\xe6\xd2\x8e\xbc)JtTDq\x81\xcf\x81\xba\xe51\xf5 \xc8\x10>\xc9\\\x85\x17L\xbf\xcf\x91\xdfM\xf3\x02^T*\x00\x02\xb9~B\x9f\xacl\x1d3\x06o\xf8\x16H\xaa*\x02\xf7\xfb\x06\xf1\x83\x92\xa8\xc2\xcb\xae\xb0\xb4\x93\xb8\x04\xf1\x99\xc2yY+\xd9y\x8a\xd5b\xe8\"q\x1b0)\xccm\xacz\xc1\xadd\x9b6a\xf3\xdds\xbb\x88\xff\b\x85\xb3s\x00\x0e\xbcfvi\x85\xfc.|\xd4h\xec\x82o\x8e\x93\x11\xc1\xd4\xae\x05\x17=\xd9R\xd0\xd4\x90\xcf\x9b\xdc\xaeV\x88\x94\x9f\xe3\xefqi\xed\xa8w\xbe\xd0\xd0-tBl\x9e+\xd3\xed\xce\x9f\x83\x86\xf9\x12\x16Ts\x80\x13]C\xfb`\xc2`\xf7\x1a\x00\x00\x00\x00\x00\x00\x00k\xae\xcb\x1a.\xc2\x8f\xd1x4]PZ\x9e\xd5Y\xf0L\xa4\xbc\x84\xf6\x04L\xff0\x8b\\*\xf9,\xb6\r\x97\xedy\xe0\x8a\xe2\x8ck\xc6S\xc3g\xb9\x1a\xf8\x8f \x9d\x00u7\xd8\'\xf1E\xa4(Q\x80Fy\xb5\xe4q\xc9\xff \xd8\x9d\xad\x11\xf8m\xd3\xbc\x9e\x10D\x7f!\xca\x0ev\x15h$\x01\xdd\xe5\xce\xf8*\xb3\x01\x85\a\xe4qv&\x9c\xac\x9aN~o\xe5\x89\xd5\a\x9f\f\x1f\xc2e/\x8d\x1e\n\xd0_\xbd!^\xa46\xb8j\xc0x\n\xdb\xe1\xa3\xd6\xae;\r\x92@\xa5I\x88Z1F\xf0\x1at\t\xd0\x8a\x04m\x06\xf3BL\xffS\x9eY\xf4\xb0U \xf8\xd00\x88y\xebX\x92\xd5\xbb\xa1h7\xf3\xe0\x0f\xbd\x02\xe4%\xf9\xb1\x87\x8aM\xfeG\xb2L\xbd\x92-\xcd\x1f\xf4\xe1,\xb7G|\xec\"\xa2\xab\xf6\x84\xe0\xcf1\x9a', 0x0)
> write$binfmt_elf32(r1, &(0x7f0000000140)=ANY=[@ANYBLOB="7f454c466000002ed8e4f97765ce27b90300060000000000000000b738000000000035f4c38422a3bc8220000500000004020300b300000000002a002400b3d7c52ebf31a8d5c8c3c6cb00000009e500d5ffffff05ffffff03004f9ef4"], 0xd8)
> execveat(r1, &(0x7f0000000000)='\x00', 0x0, 0x0, 0x1000)

If I read that correctly it is intending to put an elf32 executable into
a memfd and then execute it.
Exec will possibly unshare SIGHAND struct if there is still a reference
to it from somewhere else to ensure the new process has a clean one.

But all of the other threads should get shutdown by de_thread before any
of that happens.  And de_thread should take care of the weird non-leader
execve case as well.  So after that point the process really should
be single threaded.  Which is why de_thread is the point of no return.

That whole interrupt comes in, and a fatal signal is processed
scenario is confusing.

Hmm.  That weird vnet ioctl at the beginning must be what is starting
the vhost logic.  So I guess it makes sense if the signal is received
by the magic vhost thread.

Perhaps there is some weird vhost logic where the thread lingers.

Ugh.  I seem to remember a whole conversation about the vhost logic
(immediately after it was merged) and how it had a bug where it exited
differently from everything else.  I remember people figuring it was
immediately ok, after the code was merged, and because everything had to
be performed as root, and no one actually uses the vhost logic like
that.  It has been long enough I thought that would have been sorted
out by now.

Looking back to refresh my memory at the original conversation:
https://lore.kernel.org/all/20230601183232.8384-1-michael.christie@oracle.com/

The bisect is 100% correct, and this was a known issue with that code at
the time it was merged.

I will let someone else take it from here.

Eric
* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-10 16:11   ` Eric W. Biederman
@ 2024-01-11 17:20     ` Mike Christie
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Christie @ 2024-01-11 17:20 UTC (permalink / raw)
To: Eric W. Biederman, Linus Torvalds
Cc: syzbot, Oleg Nesterov, linux-kernel, luto, mst, peterz,
	syzkaller-bugs, tglx

On 1/10/24 10:11 AM, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> Oleg/Eric, can you make any sense of this?
>>
>> On Tue, 9 Jan 2024 at 10:18, syzbot
>> <syzbot+c6d438f2d77f96cae7c2@syzkaller.appspotmail.com> wrote:
>>>
>>> The issue was bisected to:
>>>
>>> commit f9010dbdce911ee1f1af1398a24b1f9f992e0080
>>
>> Hmm. This smells more like a "that triggers the problem" than a cause.
>>
>> Because the warning itself is
>>
>>> WARNING: CPU: 1 PID: 5069 at kernel/signal.c:771 signal_wake_up_state+0xfa/0x120 kernel/signal.c:771
>>
>> That's
>>
>>	lockdep_assert_held(&t->sighand->siglock);
>>
>> at the top of the function, with the call trace being
>>
>>> signal_wake_up include/linux/sched/signal.h:448 [inline]
>>
>> just a wrapper setting 'state'.
>>
>>> zap_process fs/coredump.c:373 [inline]
>>
>> That's zap_process() that does a
>>
>>	for_each_thread(start, t) {
>>
>> and then does a
>>
>>	signal_wake_up(t, 1);
>>
>> on each thread.
>>
>>> zap_threads fs/coredump.c:392 [inline]
>>
>> And this is zap_threads(), which does
>>
>>	spin_lock_irq(&tsk->sighand->siglock);
>>	...
>>	nr = zap_process(tsk, exit_code);
>>
>> Strange. The sighand->siglock is definitely taken.
>>
>> The for_each_thread() must be hitting a thread with a different
>> sighand, but it's basically a
>>
>>	list_for_each_entry_rcu(..)
>>
>> walking over the tsk->signal->thread_head list.
>>
>> But if CLONE_THREAD is set (so that we share that 'tsk->signal'), then
>> we always require that CLONE_SIGHAND is also set:
>>
>>	if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
>>		return ERR_PTR(-EINVAL);
>>
>> so we most definitely should have the same ->sighand if we have the
>> same ->signal. And that's true very much for that vhost_task_create()
>> case too.
>>
>> So as far as I can see, that bisected commit does add a new case of
>> threaded signal handling, but in no way explains the problem.
>>
>> Is there some odd exit race? The thread is removed with
>>
>>	list_del_rcu(&p->thread_node);
>>
>> in __exit_signal -> __unhash_process(), and despite the RCU
>> annotations, all these parts seem to hold the right locks too (ie
>> sighand->siglock is held by __exit_signal too), so I don't even see
>> any delayed de-allocation issue or anything like that.
>>
>> Thus bringing in Eric/Oleg to see if they see something I miss.
>
> I expect this would take going through the rest of the reproducer
> to see what is going on.
>
> Hmm.  The reproducer is in something other than C:
>
>> # https://syzkaller.appspot.com/bug?id=b7640dae2467568f05425b289a1f004faa2dc292
>> # See https://goo.gl/kgGztJ for information about syzkaller reproducers.
>> #{"repeat":true,"procs":1,"slowdown":1,"sandbox":"","sandbox_arg":0,"close_fds":false}
>> r0 = openat$vnet(0xffffffffffffff9c, &(0x7f0000000040), 0x2, 0x0)
>> ioctl$int_in(r0, 0x40000000af01, 0x0)
>> r1 = memfd_create(&(0x7f0000000400)='\xa3\x9fn\xb4dR\x04i5\x02\xac\xce\xe1\x88\x9d[@8\xd7\xce\x1f 9I\x7f\x15\x1d\x93=\xb5\xe7\\\'L\xe6\xd2\x8e\xbc)JtTDq\x81\xcf\x81\xba\xe51\xf5 \xc8\x10>\xc9\\\x85\x17L\xbf\xcf\x91\xdfM\xf3\x02^T*\x00\x02\xb9~B\x9f\xacl\x1d3\x06o\xf8\x16H\xaa*\x02\xf7\xfb\x06\xf1\x83\x92\xa8\xc2\xcb\xae\xb0\xb4\x93\xb8\x04\xf1\x99\xc2yY+\xd9y\x8a\xd5b\xe8\"q\x1b0)\xccm\xacz\xc1\xadd\x9b6a\xf3\xdds\xbb\x88\xff\b\x85\xb3s\x00\x0e\xbcfvi\x85\xfc.|\xd4h\xec\x82o\x8e\x93\x11\xc1\xd4\xae\x05\x17=\xd9R\xd0\xd4\x90\xcf\x9b\xdc\xaeV\x88\x94\x9f\xe3\xefqi\xed\xa8w\xbe\xd0\xd0-tBl\x9e+\xd3\xed\xce\x9f\x83\x86\xf9\x12\x16Ts\x80\x13]C\xfb`\xc2`\xf7\x1a\x00\x00\x00\x00\x00\x00\x00k\xae\xcb\x1a.\xc2\x8f\xd1x4]PZ\x9e\xd5Y\xf0L\xa4\xbc\x84\xf6\x04L\xff0\x8b\\*\xf9,\xb6\r\x97\xedy\xe0\x8a\xe2\x8ck\xc6S\xc3g\xb9\x1a\xf8\x8f \x9d\x00u7\xd8\'\xf1E\xa4(Q\x80Fy\xb5\xe4q\xc9\xff \xd8\x9d\xad\x11\xf8m\xd3\xbc\x9e\x10D\x7f!\xca\x0ev\x15h$\x01\xdd\xe5\xce\xf8*\xb3\x01\x85\a\xe4qv&\x9c\xac\x9aN~o\xe5\x89\xd5\a\x9f\f\x1f\xc2e/\x8d\x1e\n\xd0_\xbd!^\xa46\xb8j\xc0x\n\xdb\xe1\xa3\xd6\xae;\r\x92@\xa5I\x88Z1F\xf0\x1at\t\xd0\x8a\x04m\x06\xf3BL\xffS\x9eY\xf4\xb0U \xf8\xd00\x88y\xebX\x92\xd5\xbb\xa1h7\xf3\xe0\x0f\xbd\x02\xe4%\xf9\xb1\x87\x8aM\xfeG\xb2L\xbd\x92-\xcd\x1f\xf4\xe1,\xb7G|\xec\"\xa2\xab\xf6\x84\xe0\xcf1\x9a', 0x0)
>> write$binfmt_elf32(r1, &(0x7f0000000140)=ANY=[@ANYBLOB="7f454c466000002ed8e4f97765ce27b90300060000000000000000b738000000000035f4c38422a3bc8220000500000004020300b300000000002a002400b3d7c52ebf31a8d5c8c3c6cb00000009e500d5ffffff05ffffff03004f9ef4"], 0xd8)
>> execveat(r1, &(0x7f0000000000)='\x00', 0x0, 0x0, 0x1000)
>
> If I read that correctly it is intending to put an elf32 executable into
> a memfd and then execute it.
> Exec will possibly unshare SIGHAND struct if there is still a reference
> to it from somewhere else to ensure the new process has a clean one.
>
> But all of the other threads should get shutdown by de_thread before any
> of that happens.  And de_thread should take care of the weird non-leader
> execve case as well.  So after that point the process really should
> be single threaded.  Which is why de_thread is the point of no return.
>
> That whole interrupt comes in, and a fatal signal is processed
> scenario is confusing.
>
> Hmm.  That weird vnet ioctl at the beginning must be what is starting
> the vhost logic.  So I guess it makes sense if the signal is received
> by the magic vhost thread.
>
> Perhaps there is some weird vhost logic where the thread lingers.
>
> Ugh.  I seem to remember a whole conversation about the vhost logic
> (immediately after it was merged) and how it had a bug where it exited
> differently from everything else.  I remember people figuring it was

The vhost code just wanted to wait for running IO from the vhost thread
context before exiting instead of working like io_uring which would wait
from a secondary thread.

> immediately ok, after the code was merged, and because everything had to
> be performed as root, and no one actually uses the vhost logic like
> that.  It has been long enough I thought that would have been sorted
> out by now.
>
> Looking back to refresh my memory at the original conversation:
> https://lore.kernel.org/all/20230601183232.8384-1-michael.christie@oracle.com/
>
> The bisect is 100% correct, and this was a known issue with that code at
> the time it was merged.
>

I have patches for this that I've been testing. Will post to the virt
list for the next feature window.
* Re: [syzbot] [kernel?] WARNING in signal_wake_up_state
  2024-01-09 18:18 [syzbot] [kernel?] WARNING in signal_wake_up_state syzbot
  2024-01-09 19:05 ` Linus Torvalds
@ 2024-09-23  3:12 ` syzbot
  1 sibling, 0 replies; 6+ messages in thread
From: syzbot @ 2024-09-23 3:12 UTC (permalink / raw)
To: akpm, brauner, ebiederm, jack, linux-fsdevel, linux-kernel, luto,
	mhocko, michael.christie, mst, oleg, pasha.tatashin, peterz,
	syzkaller-bugs, tandersen, tglx, torvalds, viro

syzbot suspects this issue was fixed by commit:

commit 240a1853b4d2bce51e5cac9ba65cd646152ab6d6
Author: Mike Christie <michael.christie@oracle.com>
Date:   Sat Mar 16 00:47:07 2024 +0000

    kernel: Remove signal hacks for vhost_tasks

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13212480580000
start commit:   e88c4cfcb7b8 Merge tag 'for-6.9-rc5-tag' of git://git.kern..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=98d5a8e00ed1044a
dashboard link: https://syzkaller.appspot.com/bug?extid=c6d438f2d77f96cae7c2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=152442ef180000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=138c3d30980000

If the result looks correct, please mark the issue as fixed by replying
with:

#syz fix: kernel: Remove signal hacks for vhost_tasks

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
end of thread, other threads:[~2024-09-23  3:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2024-01-09 18:18 [syzbot] [kernel?] WARNING in signal_wake_up_state syzbot
2024-01-09 19:05 ` Linus Torvalds
2024-01-10 16:03   ` Oleg Nesterov
2024-01-10 16:11   ` Eric W. Biederman
2024-01-11 17:20     ` Mike Christie
2024-09-23  3:12 ` syzbot