From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: memcg creates an unkillable task in 3.11-rc2
Date: Tue, 12 Nov 2013 17:00:15 +0100
Message-ID: <20131112160015.GE6049@dhcp22.suse.cz>
References: <8761vui4cr.fsf@xmission.com>
 <20130729075939.GA4678@dhcp22.suse.cz>
 <87ehahg312.fsf@xmission.com>
 <20130729095109.GB4678@dhcp22.suse.cz>
 <20130729161026.GD22605@mtj.dyndns.org>
 <87r4eh70yg.fsf@xmission.com>
 <51F71DE2.4020102@huawei.com>
 <87ppu0a298.fsf_-_@tw-ebiederman.twitter.com>
 <87ppu03td7.fsf@tw-ebiederman.twitter.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Fabio Kung
Cc: Glauber Costa,
 containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 "Eric W. Biederman",
 Johannes Weiner,
 Tejun Heo,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Linus Torvalds,
 kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

On Thu 26-09-13 16:41:19, Fabio Kung wrote:
> On Tue, Jul 30, 2013 at 9:28 AM, Eric W. Biederman
> wrote:
> >
> > ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:
> >
> > Ok. I have been trying for an hour and I have not been able to
> > reproduce the weird hang with the memcg, and it used to be something I
> > could reproduce trivially. So it appears the patch below is the fix.
> >
> > After I sleep I will see if I can turn it into a proper patch.
> >
>
> Contributing with another data point: I am seeing similar issues with
> un-killable tasks inside LXC containers on a vanilla 3.8.11 kernel.
> The stack from zombie tasks look like this:
>
> # cat /proc/12499/stack
> [] __mem_cgroup_try_charge+0xa96/0xbf0
> [] __mem_cgroup_try_charge_swapin+0xab/0xd0
> [] mem_cgroup_try_charge_swapin+0x5d/0x70
> [] handle_pte_fault+0x315/0xac0
> [] handle_mm_fault+0x271/0x3d0
> [] __do_page_fault+0x20b/0x4c0
> [] do_page_fault+0xe/0x10
> [] page_fault+0x28/0x30
> [] mm_release+0x127/0x140
> [] do_exit+0x171/0xa70
> [] do_group_exit+0x55/0xd0
> [] get_signal_to_deliver+0x23f/0x5d0
> [] do_signal+0x42/0x600
> [] do_notify_resume+0x88/0xc0
> [] int_signal+0x12/0x17
> [] 0xffffffffffffffff
>
> Same symptoms that Eric described: a race condition in memcg when
> there is a page fault and the process is exiting.
>
> I went ahead and reproduced the bug described earlier here on the same
> 3.8.11 kernel, also using the Mesos framework
> (http://mesos.apache.org/) memory Ballooning tests. The call trace
> from zombie tasks in this case look very similar:
>
> # cat /proc/22827/stack
> [] __mem_cgroup_try_charge+0xaf0/0xbf0
> [] __mem_cgroup_try_charge_swapin+0xab/0xd0
> [] mem_cgroup_try_charge_swapin+0x5d/0x70
> [] handle_pte_fault+0x315/0xac0
> [] handle_mm_fault+0x271/0x3d0
> [] __do_page_fault+0x20b/0x4c0
> [] do_page_fault+0xe/0x10
> [] page_fault+0x28/0x30
> [] mm_release+0x127/0x140
> [] do_exit+0x171/0xa70
> [] do_group_exit+0x55/0xd0
> [] get_signal_to_deliver+0x23f/0x5d0
> [] do_signal+0x42/0x600
> [] do_notify_resume+0x88/0xc0
> [] int_signal+0x12/0x17
> [] 0xffffffffffffffff
>
> Then, I applied Eric's patch below, and I can't reproduce the problem
> anymore. Before the patch, it was very easy to reproduce it with some
> extra memory pressure from other processes in the instance (increasing
> the probability of page faults when processes are exiting).

Could you try to reproduce with the patch posted earlier in the thread,
please?
https://lkml.org/lkml/2013/7/31/94

Eric had some concerns about the patch
(https://lkml.org/lkml/2013/7/31/603) but I wasn't quite sure whether
the issue he raised exists. As I tried to explain in the follow-up
answer, the race shouldn't exist, and the thread basically died at that
point.

The memcg handling was reworked considerably since then by Johannes
(merged in 3.12), and it has moved outside of the memcg charging path.
I still think that the rework hasn't fixed this particular bug and we
still need a fix. And I would prefer if we simply set TIF_MEMDIE after
we wake up from the sleep.

> We also tried a vanilla 3.11.1 kernel, and we could reproduce the bug
> on it pretty easily.

-- 
Michal Hocko
SUSE Labs