From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
Subject: Re: memcg creates an unkillable task in 3.11-rc2
Date: Thu, 26 Sep 2013 17:35:43 -0700
Message-ID: <87y56jnlsw.fsf@xmission.com>
In-Reply-To: (Fabio Kung's message of "Thu, 26 Sep 2013 16:41:19 -0700")
To: Fabio Kung
Cc: Glauber Costa, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Michal Hocko, Johannes Weiner, Tejun Heo, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Linus Torvalds, kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

Fabio Kung writes:

> On Tue, Jul 30, 2013 at 9:28 AM, Eric W. Biederman
> wrote:
>>
>> ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:
>>
>> Ok.  I have been trying for an hour and I have not been able to
>> reproduce the weird hang with the memcg, and it used to be something I
>> could reproduce trivially.  So it appears the patch below is the fix.
>>
>> After I sleep I will see if I can turn it into a proper patch.
>
> Contributing another data point: I am seeing similar issues with
> unkillable tasks inside LXC containers on a vanilla 3.8.11 kernel.
> The stacks from zombie tasks look like this:
>
> # cat /proc/12499/stack
> [] __mem_cgroup_try_charge+0xa96/0xbf0
> [] __mem_cgroup_try_charge_swapin+0xab/0xd0
> [] mem_cgroup_try_charge_swapin+0x5d/0x70
> [] handle_pte_fault+0x315/0xac0
> [] handle_mm_fault+0x271/0x3d0
> [] __do_page_fault+0x20b/0x4c0
> [] do_page_fault+0xe/0x10
> [] page_fault+0x28/0x30
> [] mm_release+0x127/0x140
> [] do_exit+0x171/0xa70
> [] do_group_exit+0x55/0xd0
> [] get_signal_to_deliver+0x23f/0x5d0
> [] do_signal+0x42/0x600
> [] do_notify_resume+0x88/0xc0
> [] int_signal+0x12/0x17
> [] 0xffffffffffffffff
>
> Same symptoms that Eric described: a race condition in memcg when
> there is a page fault and the process is exiting.
>
> I went ahead and reproduced the bug described earlier here on the same
> 3.8.11 kernel, also using the Mesos framework
> (http://mesos.apache.org/) memory ballooning tests. The call trace
> from zombie tasks in this case looks very similar:
>
> # cat /proc/22827/stack
> [] __mem_cgroup_try_charge+0xaf0/0xbf0
> [] __mem_cgroup_try_charge_swapin+0xab/0xd0
> [] mem_cgroup_try_charge_swapin+0x5d/0x70
> [] handle_pte_fault+0x315/0xac0
> [] handle_mm_fault+0x271/0x3d0
> [] __do_page_fault+0x20b/0x4c0
> [] do_page_fault+0xe/0x10
> [] page_fault+0x28/0x30
> [] mm_release+0x127/0x140
> [] do_exit+0x171/0xa70
> [] do_group_exit+0x55/0xd0
> [] get_signal_to_deliver+0x23f/0x5d0
> [] do_signal+0x42/0x600
> [] do_notify_resume+0x88/0xc0
> [] int_signal+0x12/0x17
> [] 0xffffffffffffffff
>
> Then I applied Eric's patch below, and I can't reproduce the problem
> anymore. Before the patch it was very easy to reproduce with some
> extra memory pressure from other processes in the instance (increasing
> the probability of page faults while processes were exiting).
>
> We also tried a vanilla 3.11.1 kernel, and we could reproduce the bug
> on it pretty easily.

There are some significant fixes in 3.12-rcX.
I haven't had a chance to look at them in detail yet, but they look
very promising.

Eric