From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
Subject: Re: memcg creates an unkillable task in 3.11-rc2
Date: Thu, 26 Sep 2013 17:35:43 -0700
Message-ID: <87y56jnlsw.fsf@xmission.com>
References: <20130723174711.GE21100@mtj.dyndns.org>
 <8761vui4cr.fsf@xmission.com>
 <20130729075939.GA4678@dhcp22.suse.cz>
 <87ehahg312.fsf@xmission.com>
 <20130729095109.GB4678@dhcp22.suse.cz>
 <20130729161026.GD22605@mtj.dyndns.org>
 <87r4eh70yg.fsf@xmission.com>
 <51F71DE2.4020102@huawei.com>
 <87ppu0a298.fsf_-_@tw-ebiederman.twitter.com>
 <87ppu03td7.fsf@tw-ebiederman.twitter.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: (Fabio Kung's message of "Thu, 26 Sep 2013 16:41:19 -0700")
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Fabio Kung
Cc: Glauber Costa,
 containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Michal Hocko, Johannes Weiner, Tejun Heo,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Linus Torvalds,
 kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

Fabio Kung writes:

> On Tue, Jul 30, 2013 at 9:28 AM, Eric W. Biederman wrote:
>>
>> ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:
>>
>> Ok. I have been trying for an hour and I have not been able to
>> reproduce the weird hang with the memcg, and it used to be something I
>> could reproduce trivially. So it appears the patch below is the fix.
>>
>> After I sleep I will see if I can turn it into a proper patch.
>
>
> Contributing with another data point: I am seeing similar issues with
> un-killable tasks inside LXC containers on a vanilla 3.8.11 kernel.
> The stack from zombie tasks look like this:
>
> # cat /proc/12499/stack
> [] __mem_cgroup_try_charge+0xa96/0xbf0
> [] __mem_cgroup_try_charge_swapin+0xab/0xd0
> [] mem_cgroup_try_charge_swapin+0x5d/0x70
> [] handle_pte_fault+0x315/0xac0
> [] handle_mm_fault+0x271/0x3d0
> [] __do_page_fault+0x20b/0x4c0
> [] do_page_fault+0xe/0x10
> [] page_fault+0x28/0x30
> [] mm_release+0x127/0x140
> [] do_exit+0x171/0xa70
> [] do_group_exit+0x55/0xd0
> [] get_signal_to_deliver+0x23f/0x5d0
> [] do_signal+0x42/0x600
> [] do_notify_resume+0x88/0xc0
> [] int_signal+0x12/0x17
> [] 0xffffffffffffffff
>
> Same symptoms that Eric described: a race condition in memcg when
> there is a page fault and the process is exiting.
>
> I went ahead and reproduced the bug described earlier here on the same
> 3.8.11 kernel, also using the Mesos framework
> (http://mesos.apache.org/) memory Ballooning tests. The call trace
> from zombie tasks in this case look very similar:
>
> # cat /proc/22827/stack
> [] __mem_cgroup_try_charge+0xaf0/0xbf0
> [] __mem_cgroup_try_charge_swapin+0xab/0xd0
> [] mem_cgroup_try_charge_swapin+0x5d/0x70
> [] handle_pte_fault+0x315/0xac0
> [] handle_mm_fault+0x271/0x3d0
> [] __do_page_fault+0x20b/0x4c0
> [] do_page_fault+0xe/0x10
> [] page_fault+0x28/0x30
> [] mm_release+0x127/0x140
> [] do_exit+0x171/0xa70
> [] do_group_exit+0x55/0xd0
> [] get_signal_to_deliver+0x23f/0x5d0
> [] do_signal+0x42/0x600
> [] do_notify_resume+0x88/0xc0
> [] int_signal+0x12/0x17
> [] 0xffffffffffffffff
>
> Then, I applied Eric's patch below, and I can't reproduce the problem
> anymore. Before the patch, it was very easy to reproduce it with some
> extra memory pressure from other processes in the instance (increasing
> the probability of page faults when processes are exiting).
>
> We also tried a vanilla 3.11.1 kernel, and we could reproduce the bug
> on it pretty easily.

There are some significant fixes in 3.12-rcX.
I haven't had a chance to look at them in detail yet but they look very
promising.

Eric
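
A minimal sketch, not taken from the thread, of how the signature in the two traces quoted above could be checked mechanically: the innermost frame is __mem_cgroup_try_charge, and page_fault appears above do_exit, meaning the task faulted while it was already exiting. The helper names parse_frames and looks_like_memcg_exit_hang are hypothetical, and the check assumes the usual "[<addr>] symbol+0xoff/0xsize" layout of /proc/<pid>/stack, with the address allowed to be missing as it is in the traces above.

```python
import re

# One line of /proc/<pid>/stack, e.g.
#   [<ffffffff8116b0a6>] __mem_cgroup_try_charge+0xa96/0xbf0
# The address inside the brackets is optional so that traces with the
# addresses stripped ("[] symbol+0x.../0x...") are accepted as well.
FRAME_RE = re.compile(
    r"^\[(?:<[0-9a-fA-F]*>)?\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(stack_text):
    """Return the symbol names in the trace, innermost frame first."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(1))
    return frames


def looks_like_memcg_exit_hang(stack_text):
    """True if the trace shows a memcg charge from a page fault taken in do_exit."""
    frames = parse_frames(stack_text)
    if not frames or frames[0] != "__mem_cgroup_try_charge":
        return False
    try:
        fault = frames.index("page_fault")
        exiting = frames.index("do_exit")
    except ValueError:
        return False
    # Frames are listed innermost first, so the fault must come before do_exit.
    return fault < exiting
```

Reading /proc/<pid>/stack normally requires root, so the functions take the file contents as a string and leave it to the caller to decide how to collect them.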