From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-db5eur01on0135.outbound.protection.outlook.com ([104.47.2.135]:18080
	"EHLO EUR01-DB5-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1759198AbcHEAP4 (ORCPT ); Thu, 4 Aug 2016 20:15:56 -0400
Date: Wed, 3 Aug 2016 20:36:48 +0300
From: Cyrill Gorcunov
To: Stanislav Kinsburskiy
CC: , , , , , , , ,
Subject: Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine
Message-ID: <20160803173648.GA10543@uranus>
References: <20160803165412.22407.47399.stgit@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
In-Reply-To: <20160803165412.22407.47399.stgit@localhost.localdomain>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Wed, Aug 03, 2016 at 08:54:50PM +0400, Stanislav Kinsburskiy wrote:
> Otherwise freezer cgroup state might never become "FROZEN".
>
> Here is a deadlock scheme for 2 processes in one freezer cgroup, which is
> freezing:
>
> CPU 0                                   CPU 1
> --------                                --------
> do_last
> inode_lock(dir->d_inode)
> vfs_create
> nfs_create
> ...
> __rpc_execute
> rpc_wait_bit_killable
> __refrigerator
>                                         do_last
>                                         inode_lock(dir->d_inode)
>
> So, the problem is that one process takes directory inode mutex, executes
> creation request and goes to refrigerator.
> Another one waits till directory lock is released, remains "thawed" and thus
> freezer cgroup state never becomes "FROZEN".
>
> Notes:
> 1) Interesting, that this is not a pure deadlock: one can thaw cgroup and then
>    freeze it again.
> 2) The issue was introduced by commit d310310cbff18ec385c6ab4d58f33b100192a96a.
> 3) This patch is not aimed to fix the issue, but to show the problem root.
>    Looks like this problem might be applicable to other hunks from the commit,
>    mentioned above.
>
> Signed-off-by: Stanislav Kinsburskiy

I think it's worth adding backtrace as well

---
=== pid: 708987 === (file_read)
[] __refrigerator+0x5b/0x190
[] rpc_wait_bit_killable+0x66/0x80 [sunrpc]
[] __rpc_execute+0x154/0x420 [sunrpc]
[] rpc_execute+0x5e/0xa0 [sunrpc]
[] rpc_run_task+0x70/0x90 [sunrpc]
[] rpc_call_sync+0x50/0xc0 [sunrpc]
[] nfs3_rpc_wrapper.constprop.10+0x6b/0xb0 [nfsv3]
[] nfs3_proc_setattr+0xbf/0x140 [nfsv3]
[] nfs3_proc_create+0x1a3/0x220 [nfsv3]
[] nfs_create+0x83/0x150 [nfs]
[] vfs_create+0x8c/0x110
[] do_last+0xc0d/0x11d0
[] path_openat+0xc2/0x460
[] do_filp_open+0x4b/0xb0
[] do_sys_open+0xf3/0x1f0
[] SyS_open+0x1e/0x20
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

=== pid: 708988 === (file_read)
[] do_last+0x283/0x11d0
[] path_openat+0xc2/0x460
[] do_filp_open+0x4b/0xb0
[] do_sys_open+0xf3/0x1f0
[] SyS_open+0x1e/0x20
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff
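
For anyone who wants to play with the scenario outside the kernel, the two
traces can be reduced to a toy model along the following lines. This is only
a sketch under heavy assumptions: Task, freeze_cgroup and the
rpc_wait_freezable switch are names made up for illustration, not kernel
interfaces, and the model knows about nothing except one directory lock and
one RPC wait per task.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a process in the freezing cgroup."""
    pid: int
    wants_lock: bool = True   # still has to take the directory inode lock
    in_rpc: bool = False      # holds the lock and waits for the RPC reply
    frozen: bool = False      # sits in the refrigerator


def freeze_cgroup(tasks: List[Task], rpc_wait_freezable: bool,
                  passes: int = 10) -> str:
    """Run a few freezing passes and report the resulting cgroup state.

    rpc_wait_freezable=True models a task that may enter the refrigerator
    while waiting for the RPC reply, i.e. while still holding the lock.
    """
    owner: Optional[Task] = None  # current holder of the directory lock
    for _ in range(passes):
        for t in tasks:
            if t.frozen:
                continue
            if t.wants_lock:
                if owner is None:
                    # Lock is free: take it and issue the create request.
                    owner, t.wants_lock, t.in_rpc = t, False, True
                # Otherwise the task sleeps on the lock and cannot freeze.
                continue
            if t.in_rpc:
                if rpc_wait_freezable:
                    # Freezes inside the RPC wait, lock is NOT released.
                    t.frozen = True
                else:
                    # Completes the request first and drops the lock.
                    t.in_rpc = False
                    owner = None
                continue
            # No lock held any more: freezing here is harmless.
            t.frozen = True
    return "FROZEN" if all(t.frozen for t in tasks) else "FREEZING"
```

With the two pids from the traces, freeze_cgroup([Task(708987), Task(708988)],
rpc_wait_freezable=True) stays at "FREEZING", since 708988 never gets the
lock, while the same call with rpc_wait_freezable=False reaches "FROZEN".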