From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail-it1-f196.google.com ([209.85.166.196]:39905 "EHLO
        mail-it1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1726060AbeKCE2W (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Sat, 3 Nov 2018 00:28:22 -0400
Received: by mail-it1-f196.google.com with SMTP id m15so4615532itl.4
        for <linux-fsdevel@vger.kernel.org>; Fri, 02 Nov 2018 12:19:59 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <727110bb-0154-e5df-4b2f-e965e3b98c62@i-love.sakura.ne.jp>
References: <0000000000002f5541057143a85e@google.com> <eb295f49-78b3-62f5-84a8-221671c84634@I-love.SAKURA.ne.jp>
 <CACT4Y+bWR8s4_Chd33cjZ_qbTDyiciAQfmYP8fDv+r3Ar7X8iw@mail.gmail.com>
 <CACT4Y+bM1tfisT+7=ZChHRZKZkddbWp0kNi5q46EqOYBSptXtg@mail.gmail.com>
 <0adc592b-d4a3-f6da-3c5c-22490f641eb9@i-love.sakura.ne.jp>
 <CACT4Y+YqDFNrdtk+aet9UVBxyvxs+O85YLnMYL3FvoPckBx-Mg@mail.gmail.com>
 <fe20c22a-028b-b2a3-7714-c90b7fe51cbf@i-love.sakura.ne.jp>
 <CACT4Y+YShmryp_RJErUrjP=9GJ3YcRPLXyQP8XKoA9AW114Ygg@mail.gmail.com> <727110bb-0154-e5df-4b2f-e965e3b98c62@i-love.sakura.ne.jp>
From: Dmitry Vyukov <dvyukov@google.com>
Date: Fri, 2 Nov 2018 20:19:38 +0100
Message-ID: <CACT4Y+Z-jwUJsfisdevXZdReWYgNnLELhgDtkVH53GC_mZqEPw@mail.gmail.com>
Subject: Re: INFO: task hung in grab_super
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Eric Van Hensbergen <ericvh@gmail.com>,
        Ron Minnich <rminnich@sandia.gov>,
        Latchesar Ionkov <lucho@ionkov.net>,
        v9fs-developer@lists.sourceforge.net,
        syzbot <syzbot+f425456ea8aa16b40d20@syzkaller.appspotmail.com>,
        linux-fsdevel <linux-fsdevel@vger.kernel.org>,
        LKML <linux-kernel@vger.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
        Al Viro <viro@zeniv.linux.org.uk>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Wed, Jul 18, 2018 at 4:17 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> On 2018/07/18 23:11, Dmitry Vyukov wrote:
>> On Wed, Jul 18, 2018 at 3:35 PM, Tetsuo Handa
>> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>>>>> This seems to be related to 9p. After rerunning the log I got:
>>>>>>
>>>>>> root@syzkaller:~# ps afxu | grep syz
>>>>>> root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_ [syz-executor] <defunct>
>>>>>> root@syzkaller:~# cat /proc/18253/task/*/stack
>>>>>> [<0>] p9_client_rpc+0x3a2/0x1400
>>>>>> [<0>] p9_client_flush+0x134/0x2a0
>>>>>> [<0>] p9_client_rpc+0x122c/0x1400
>>>>>> [<0>] p9_client_create+0xc56/0x16af
>>>>>> [<0>] v9fs_session_init+0x21a/0x1a80
>>>>>> [<0>] v9fs_mount+0x7c/0x900
>>>>>> [<0>] mount_fs+0xae/0x328
>>>>>> [<0>] vfs_kern_mount.part.34+0xdc/0x4e0
>>>>>> [<0>] do_mount+0x581/0x30e0
>>>>>> [<0>] ksys_mount+0x12d/0x140
>>>>>> [<0>] __x64_sys_mount+0xbe/0x150
>>>>>> [<0>] do_syscall_64+0x1b9/0x820
>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>> [<0>] 0xffffffffffffffff
>>>>>>
>>>>>> There is a bunch of hangs in 9p, so let's do:
>>>>>>
>>>>>> #syz dup: INFO: task hung in flush_work
>>>>>>
>>>>> Then, is dumping all threads when khungtaskd fires a candidate
>>>>> for CONFIG_DEBUG_AID_FOR_SYZBOT=y path?
>>>>
>>>> Perhaps would be useful. But maybe only tasks that are blocked for
>>>> more than timeout/2? and/or unkillable tasks? killable tasks are not a
>>>> problem.
>>>
>>> TASK_KILLABLE waiters are not reported by khungtaskd, are they?
>>>
>>>   /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
>>>   if (t->state == TASK_UNINTERRUPTIBLE)
>>>     check_hung_task(t, timeout);
>>>
>>> And TASK_KILLABLE waiters can become a problem because
>>>
>>>>
>>>> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
>>>> killed along with the whole process?
>>>>
>>>
>>> wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
>>> But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
>>> it ignores SIGKILL by retrying the loop...
>>>
>>>   again:
>>>     err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
>>>     if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
>>>       sigpending = 1;
>>>       clear_thread_flag(TIF_SIGPENDING);
>>>       goto again;
>>>     }
>>>
>>> I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).
>>
>>
>> I guess that's the problem, right? SIGKILL-ed task must not ignore
>> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
>> in 9p.
>
> Did you check /proc/18253/task/*/stack after manually sending SIGKILL?

Yes:

root@syzkaller:~# ps afxu | grep syz
root     18253  0.0  0.0      0     0 ttyS0    Zl   10:16   0:00  \_ [syz-executor] <defunct>
root@syzkaller:~# cat /proc/18253/task/*/stack
[<0>] p9_client_rpc+0x3a2/0x1400
[<0>] p9_client_flush+0x134/0x2a0
[<0>] p9_client_rpc+0x122c/0x1400
[<0>] p9_client_create+0xc56/0x16af
[<0>] v9fs_session_init+0x21a/0x1a80
[<0>] v9fs_mount+0x7c/0x900
[<0>] mount_fs+0xae/0x328
[<0>] vfs_kern_mount.part.34+0xdc/0x4e0
[<0>] do_mount+0x581/0x30e0
[<0>] ksys_mount+0x12d/0x140
[<0>] __x64_sys_mount+0xbe/0x150
[<0>] do_syscall_64+0x1b9/0x820
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

> I mean, who (i.e. you or syzkaller programs) is sending a signal (not limited
> to SIGKILL but any signal) that makes TASK_KILLABLE waiters to wake up?

Both. syzkaller always SIGKILLs the test process after a timeout and
expects it to go away. I also sent SIGKILL manually after that, but it
made no difference.