From mboxrd@z Thu Jan 1 00:00:00 1970
From: Petr Mladek
Subject: Re: [PATCH] capabilities: add capability cgroup controller
Date: Fri, 8 Jul 2016 11:13:32 +0200
Message-ID: <20160708091332.GD3556@pathway.suse.cz>
References: <20160624172447.GA3262@mtj.duckdns.org>
 <47890d79-0891-dd13-4f60-e7e5f1f3fed3@gmail.com>
 <20160627145457.GA26980@mail.hallyn.com>
 <58938c8b-aca6-a5b8-9533-58e78d878e85@gmail.com>
 <20160627194941.GA31843@mail.hallyn.com>
 <218f2bef-5e5e-89c4-154b-24dc49c82c31@gmail.com>
 <20160707091645.GG3238@pathway.suse.cz>
Mime-Version: 1.0
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-doc-owner@vger.kernel.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Topi Miettinen
Cc: "Serge E. Hallyn", "Eric W. Biederman", Tejun Heo, lkml,
 luto@kernel.org, Kees Cook, Jonathan Corbet, Li Zefan, Johannes Weiner,
 Serge Hallyn, James Morris, Andrew Morton, David Howells,
 David Woodhouse, Ard Biesheuvel, "Paul E. McKenney",
 "open list:DOCUMENTATION", "open list:CONTROL GROUP (CGROUP)",
 "open list:CAPABILITIES"

On Thu 2016-07-07 20:27:13, Topi Miettinen wrote:
> On 07/07/16 09:16, Petr Mladek wrote:
> > On Sun 2016-07-03 15:08:07, Topi Miettinen wrote:
> >> The attached patch would make any uses of capabilities generate audit
> >> messages. It works for simple tests as you can see from the commit
> >> message, but unfortunately the call to audit_cgroup_list() deadlocks the
> >> system when booting a full blown OS. There's no deadlock when the call
> >> is removed.
> >>
> >> I guess that in some cases, cgroup_mutex and/or css_set_lock could be
> >> already held earlier before entering audit_cgroup_list(). Holding the
> >> locks is however required by task_cgroup_from_root(). Is there any way
> >> to avoid this? For example, only print some kind of cgroup ID numbers
> >> (are there unique and stable IDs, available without locks?) for those
> >> cgroups where the task is registered in the audit message?
> >
> > I am not sure if anyone know what really happens here. I suggest to
> > enable lockdep. It might detect possible deadlock even before it
> > really happens, see Documentation/locking/lockdep-design.txt
> >
> > It can be enabled by
> >
> >    CONFIG_PROVE_LOCKING=y
> >
> > It depends on
> >
> >    CONFIG_DEBUG_KERNEL=y
> >
> > and maybe some more options, see lib/Kconfig.debug
>
> Thanks a lot! I caught this stack dump:
>
> starting version 230
> [ 3.416647] ------------[ cut here ]------------
> [ 3.417310] WARNING: CPU: 0 PID: 95 at
> /home/topi/d/linux.git/kernel/locking/lockdep.c:2871
> lockdep_trace_alloc+0xb4/0xc0
> [ 3.417605] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> [ 3.417923] Modules linked in:
> [ 3.418288] CPU: 0 PID: 95 Comm: systemd-udevd Not tainted 4.7.0-rc5+ #97
> [ 3.418444] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Debian-1.8.2-1 04/01/2014
> [ 3.418726] 0000000000000086 000000007970f3b0 ffff88000016fb00
> ffffffff813c9c45
> [ 3.418993] ffff88000016fb50 0000000000000000 ffff88000016fb40
> ffffffff81091e9b
> [ 3.419176] 00000b3705e2c798 0000000000000046 0000000000000410
> 00000000ffffffff
> [ 3.419374] Call Trace:
> [ 3.419511] [] dump_stack+0x67/0x92
> [ 3.419644] [] __warn+0xcb/0xf0
> [ 3.419745] [] warn_slowpath_fmt+0x5f/0x80
> [ 3.419868] [] lockdep_trace_alloc+0xb4/0xc0
> [ 3.419988] [] kmem_cache_alloc_node+0x42/0x600
> [ 3.420156] [] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [ 3.420170] [] __alloc_skb+0x5b/0x1d0
> [ 3.420170] [] audit_log_start+0x29b/0x480
> [ 3.420170] [] ? __lock_task_sighand+0x95/0x270
> [ 3.420170] [] audit_log_cap_use+0x39/0xf0
> [ 3.420170] [] ns_capable+0x45/0x70
> [ 3.420170] [] capable+0x17/0x20
> [ 3.420170] [] oom_score_adj_write+0x150/0x2f0
> [ 3.420170] [] __vfs_write+0x37/0x160
> [ 3.420170] [] ? update_fast_ctr+0x17/0x30
> [ 3.420170] [] ? percpu_down_read+0x49/0x90
> [ 3.420170] [] ? __sb_start_write+0xb7/0xf0
> [ 3.420170] [] ? __sb_start_write+0xb7/0xf0
> [ 3.420170] [] vfs_write+0xb8/0x1b0
> [ 3.420170] [] ? __fget_light+0x66/0x90
> [ 3.420170] [] SyS_write+0x58/0xc0
> [ 3.420170] [] do_syscall_64+0x5c/0x300
> [ 3.420170] [] entry_SYSCALL64_slow_path+0x25/0x25
> [ 3.420170] ---[ end trace fb586899fb556a5e ]---
> [ 3.447922] random: systemd-udevd urandom read with 3 bits of entropy
> available
> [ 4.014078] clocksource: Switched to clocksource tsc
> Begin: Loading essential drivers ... done.
>
> This is with qemu and the boot continues normally. With real computer,
> there's no such output and system just seems to freeze.
>
> Could it be possible that the deadlock happens because there's some IO
> towards /sys/fs/cgroup, which causes a capability check and that in turn
> causes locking problems when we try to print cgroup list?

The above warning is printed by the code from
kernel/locking/lockdep.c:2871

static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
{
[...]
	/* We're only interested __GFP_FS allocations for now */
	if (!(gfp_mask & __GFP_FS))
		return;

	/*
	 * Oi! Can't be having __GFP_FS allocations with IRQs disabled.
	 */
	if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
		return;

The backtrace shows that your new audit_log_cap_use() is called
from vfs_write(). You might try to use audit_log_start() with
GFP_NOFS instead of GFP_KERNEL.

Note that this is rather intuitive advice. I still need to learn a lot
about memory management and kernel in general to be more sure about
a correct solution.

Best Regards,
Petr
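
To illustrate the check quoted above, here is a small userspace model in C. It is only a sketch: the SKETCH_* bit values and the helper names lockdep_alloc_would_warn() and sketch_selfcheck() are made up for illustration and are not the kernel's definitions. It only mirrors the two tests from the lockdep.c excerpt, showing why a GFP_KERNEL allocation (which includes __GFP_FS) trips the warning when IRQs are disabled, while a GFP_NOFS allocation passes this particular check.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real ones live in include/linux/gfp.h. */
#define SKETCH_GFP_RECLAIM 0x1u
#define SKETCH_GFP_IO      0x2u
#define SKETCH_GFP_FS      0x4u

/* GFP_KERNEL includes __GFP_FS; GFP_NOFS is the same mask without it. */
#define SKETCH_GFP_KERNEL (SKETCH_GFP_RECLAIM | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOFS   (SKETCH_GFP_RECLAIM | SKETCH_GFP_IO)

/*
 * Hypothetical model of the two checks from the excerpt: returns true
 * when DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) would fire.
 */
static bool lockdep_alloc_would_warn(unsigned int gfp_mask, bool irqs_disabled)
{
	/* Only __GFP_FS allocations are of interest. */
	if (!(gfp_mask & SKETCH_GFP_FS))
		return false;

	/* A __GFP_FS allocation with IRQs disabled triggers the warning. */
	return irqs_disabled;
}

/* Small self-check covering the three interesting cases. */
static void sketch_selfcheck(void)
{
	assert(lockdep_alloc_would_warn(SKETCH_GFP_KERNEL, true));
	assert(!lockdep_alloc_would_warn(SKETCH_GFP_KERNEL, false));
	assert(!lockdep_alloc_would_warn(SKETCH_GFP_NOFS, true));
}
```

Calling sketch_selfcheck() from a test program exercises all three cases. The model covers nothing beyond this one lockdep test; it does not say whether an allocation that may sleep is otherwise safe in the calling context.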