From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752045AbcIAWtS (ORCPT );
	Thu, 1 Sep 2016 18:49:18 -0400
Received: from merlin.infradead.org ([205.233.59.134]:50248 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751011AbcIAWtR (ORCPT );
	Thu, 1 Sep 2016 18:49:17 -0400
Date: Thu, 1 Sep 2016 20:04:24 +0200
From: Peter Zijlstra 
To: Dmitry Vyukov 
Cc: eparis@redhat.com, LKML , Ingo Molnar , syzkaller ,
	Kostya Serebryany , Alexander Potapenko 
Subject: Re: fanotify: unkillable hanged processes
Message-ID: <20160901180424.GC10153@twins.programming.kicks-ass.net>
References: 
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: 
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 01, 2016 at 07:53:44PM +0200, Dmitry Vyukov wrote:
> Hello,
>
> The following program:
>
> https://gist.githubusercontent.com/dvyukov/0952eeac71069b46b3fe0e28bd1a02bf/raw/396b9dcce2636cecab1a4161c15d3f066e6ef639/gistfile1.txt
>
> if run in a parallel loop, creates unkillable hung processes:
>
> -bash-4.3# ps afxu | grep a.out
> root      4489  0.0  0.0   8868   340 ttyp1  S+  17:19  0:00  \_ grep a.out
> root      4423  0.0  0.0      0     0 ttyp0  D   17:16  0:00  [a.out]
> root      4424  0.0  0.0      0     0 ttyp0  D   17:16  0:00  [a.out]
> root      4425  0.0  0.0      0     0 ttyp0  D   17:16  0:00  [a.out]
> root      4470  0.0  0.1   7016  2316 ttyp0  D   17:16  0:00  ./stress ./a.out
>
> This looks like a classic deadlock, but LOCKDEP is silent, so
> +LOCKDEP maintainers.

fanotify_get_response() uses wait_event(), which is an asymmetric API and
therefore cannot be included in block-on chains. That is, while it blocks,
it doesn't know whom it blocks on, etc.
> Most of the processes are hung at (from /proc/pid/stack):
>
> [] __synchronize_srcu+0x248/0x380 kernel/rcu/srcu.c:448
> [] synchronize_srcu+0x1e/0x40 kernel/rcu/srcu.c:492
> [] fsnotify_mark_destroy_list+0x107/0x370 fs/notify/mark.c:551
> [] fsnotify_destroy_group+0x1e/0xc0 fs/notify/group.c:57
> [] fanotify_release+0x20b/0x2d0 fs/notify/fanotify/fanotify_user.c:392
> [] __fput+0x28c/0x780 fs/file_table.c:208
> [] ____fput+0x15/0x20 fs/file_table.c:244
> [] task_work_run+0xf3/0x170 kernel/task_work.c:116
> [< inline >] exit_task_work ./include/linux/task_work.h:21
> [] do_exit+0x868/0x2e70 kernel/exit.c:828
> [] do_group_exit+0x108/0x330 kernel/exit.c:958
> [< inline >] SYSC_exit_group kernel/exit.c:969
> [] SyS_exit_group+0x1d/0x20 kernel/exit.c:967
> [] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:208
> [] 0xffffffffffffffff
>
> One process holds the srcu lock:
>
> stress  D ffffffff86dfe09a 28024  4470  1 0x00000004
>  ffff880064d9c440 0000000000000000 ffff8800686ab0c0 ffff88006cecc080
>  ffff88006d322bd8 ffff8800653f7630 ffffffff86dfe09a ffff8800653f76e0
>  0000000000000282 ffff88006d323568 ffff88006d323540 ffff880064d9c448
> Call Trace:
> [] schedule+0x97/0x1c0 kernel/sched/core.c:3414
> [< inline >] fanotify_get_response fs/notify/fanotify/fanotify.c:70
> [] fanotify_handle_event+0x537/0x830 fs/notify/fanotify/fanotify.c:233
> [< inline >] send_to_group fs/notify/fsnotify.c:179
> [] fsnotify+0x73d/0x1020 fs/notify/fsnotify.c:275
> [< inline >] fsnotify_perm ./include/linux/fsnotify.h:55
> [] security_file_open+0x151/0x190 security/security.c:887
> [] do_dentry_open+0x2ab/0xd30 fs/open.c:736
> [] vfs_open+0x105/0x220 fs/open.c:860
> [< inline >] do_last fs/namei.c:3374
> [] path_openat+0x12f9/0x2ab0 fs/namei.c:3497
> [] do_filp_open+0x18c/0x250 fs/namei.c:3532
> [] do_open_execat+0xe8/0x4d0 fs/exec.c:818
> [] do_execveat_common.isra.35+0x71f/0x1d80 fs/exec.c:1679
> [< inline >] do_execve fs/exec.c:1783
> [< inline >] SYSC_execve fs/exec.c:1864
> [] SyS_execve+0x42/0x50 fs/exec.c:1859
> [] do_syscall_64+0x1df/0x640 arch/x86/entry/common.c:288
> [] entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:249
>
>
> [  467.548485] Showing all locks held in the system:
> [  467.548981] 2 locks held by bash/4044:
> [  467.549313]  #0:  (&tty->ldisc_sem){.+.+.+}, at: [] ldsem_down_read+0x37/0x40
> [  467.550076]  #1:  (&ldata->atomic_read_lock){+.+...}, at: [] n_tty_read+0x1e5/0x1860
> [  467.550897] 3 locks held by bash/4062:
> [  467.551210]  #0:  (sb_writers#5){.+.+.+}, at: [] vfs_write+0x3a4/0x4e0
> [  467.551923]  #1:  (rcu_read_lock){......}, at: [] __handle_sysrq+0x0/0x4d0
> [  467.552655]  #2:  (tasklist_lock){.+.+..}, at: [] debug_show_all_locks+0x74/0x290
> [  467.553454] 2 locks held by stress/4470:
> [  467.553754]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [] prepare_bprm_creds+0x53/0x110
> [  467.554590]  #1:  (&fsnotify_mark_srcu){......}, at: [] fsnotify+0x1cd/0x1020

So not quite enough information: what is bash/4044's stack trace? Because
that is holding the fanotify srcu reference.

But it looks like a scenario where everyone is waiting for SRCU to
complete, while the task holding up SRCU completion is waiting for
something else. That said, I'm not at all familiar with fanotify.