From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
To: Fabio Kung <fabio.kung-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Glauber Costa <glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"Eric W. Biederman"
	<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Subject: Re: memcg creates an unkillable task in 3.11-rc2
Date: Tue, 12 Nov 2013 17:00:15 +0100
Message-ID: <20131112160015.GE6049@dhcp22.suse.cz>
In-Reply-To: <CAHyO6Z33pUJ1_MjPO2OeUY_+ZRmc1niPiFm5DzGVDokm5vb4rw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thu 26-09-13 16:41:19, Fabio Kung wrote:
> On Tue, Jul 30, 2013 at 9:28 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> >
> > ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) writes:
> >
> > Ok.  I have been trying for an hour and I have not been able to
> > reproduce the weird hang with the memcg, and it used to be something I
> > could reproduce trivially.  So it appears the patch below is the fix.
> >
> > After I sleep I will see if I can turn it into a proper patch.
> 
> 
> Contributing with another data point: I am seeing similar issues with
> un-killable tasks inside LXC containers on a vanilla 3.8.11 kernel.
> The stacks from zombie tasks look like this:
> 
> # cat /proc/12499/stack
> [<ffffffff81186226>] __mem_cgroup_try_charge+0xa96/0xbf0
> [<ffffffff8118670b>] __mem_cgroup_try_charge_swapin+0xab/0xd0
> [<ffffffff8118678d>] mem_cgroup_try_charge_swapin+0x5d/0x70
> [<ffffffff811524f5>] handle_pte_fault+0x315/0xac0
> [<ffffffff81152f11>] handle_mm_fault+0x271/0x3d0
> [<ffffffff815bbf3b>] __do_page_fault+0x20b/0x4c0
> [<ffffffff815bc1fe>] do_page_fault+0xe/0x10
> [<ffffffff815b8718>] page_fault+0x28/0x30
> [<ffffffff81056327>] mm_release+0x127/0x140
> [<ffffffff8105ece1>] do_exit+0x171/0xa70
> [<ffffffff8105f635>] do_group_exit+0x55/0xd0
> [<ffffffff8106fa8f>] get_signal_to_deliver+0x23f/0x5d0
> [<ffffffff81014402>] do_signal+0x42/0x600
> [<ffffffff81014a48>] do_notify_resume+0x88/0xc0
> [<ffffffff815c0b92>] int_signal+0x12/0x17
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Same symptoms that Eric described: a race condition in memcg when
> there is a page fault and the process is exiting.
> 
> I went ahead and reproduced the bug described earlier here on the same
> 3.8.11 kernel, also using the Mesos framework
> (http://mesos.apache.org/) memory ballooning tests. The call traces
> from zombie tasks in this case look very similar:
> 
> # cat /proc/22827/stack
> [<ffffffff81186280>] __mem_cgroup_try_charge+0xaf0/0xbf0
> [<ffffffff8118670b>] __mem_cgroup_try_charge_swapin+0xab/0xd0
> [<ffffffff8118678d>] mem_cgroup_try_charge_swapin+0x5d/0x70
> [<ffffffff811524f5>] handle_pte_fault+0x315/0xac0
> [<ffffffff81152f11>] handle_mm_fault+0x271/0x3d0
> [<ffffffff815bbf3b>] __do_page_fault+0x20b/0x4c0
> [<ffffffff815bc1fe>] do_page_fault+0xe/0x10
> [<ffffffff815b8718>] page_fault+0x28/0x30
> [<ffffffff81056327>] mm_release+0x127/0x140
> [<ffffffff8105ece1>] do_exit+0x171/0xa70
> [<ffffffff8105f635>] do_group_exit+0x55/0xd0
> [<ffffffff8106fa8f>] get_signal_to_deliver+0x23f/0x5d0
> [<ffffffff81014402>] do_signal+0x42/0x600
> [<ffffffff81014a48>] do_notify_resume+0x88/0xc0
> [<ffffffff815c0b92>] int_signal+0x12/0x17
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Then, I applied Eric's patch below, and I can't reproduce the problem
> anymore. Before the patch, it was very easy to reproduce it with some
> extra memory pressure from other processes in the instance (increasing
> the probability of page faults when processes are exiting).

Could you try to reproduce with the patch posted earlier in the thread,
please? https://lkml.org/lkml/2013/7/31/94

Eric had some concerns about the patch (https://lkml.org/lkml/2013/7/31/603)
but I wasn't quite sure whether the issue he raised exists. As I tried
to explain in my follow-up answer, the race shouldn't exist, and the
thread basically died at that point.

The memcg OOM handling has been reworked considerably since then by
Johannes - merged in 3.12 - and it has moved outside of the memcg
charging path. I still think that the rework hasn't fixed this
particular bug and that we still need a fix. I would prefer that we
simply set TIF_MEMDIE after we wake up from the sleep.
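
For retesting, something along these lines might help spot tasks stuck
with the signature from the traces quoted above. It is only a rough
sketch, not part of any patch: it assumes /proc/<pid>/stack is readable
(CONFIG_STACKTRACE, typically root only) and that the symbol names match
the 3.8.11 traces. The helper names (parse_symbols,
faulted_in_exit_charge, scan_proc) are made up for illustration.

```python
import re
from pathlib import Path

# Frame format as printed in /proc/<pid>/stack: "[<addr>] symbol+0xoff/0xlen".
# The terminating "[<ffffffffffffffff>] 0xffffffffffffffff" line has no
# symbol and is deliberately not matched.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_symbols(stack_text):
    """Return the symbol names of a stack dump, innermost frame first."""
    return [m.group(1) for m in FRAME_RE.finditer(stack_text)]


def faulted_in_exit_charge(stack_text):
    """True if the dump shows a memcg charge reached via a page fault
    that was taken inside mm_release(), as in the quoted traces."""
    syms = parse_symbols(stack_text)
    try:
        charge = syms.index("__mem_cgroup_try_charge")
        fault = syms.index("page_fault")
        release = syms.index("mm_release")
    except ValueError:
        return False
    # Innermost first: the charge sits above the fault, which sits above
    # mm_release().
    return charge < fault < release


def scan_proc(proc_root="/proc"):
    """Return the sorted pids whose current stack matches the signature."""
    hits = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            text = (entry / "stack").read_text()
        except OSError:
            # Task exited meanwhile, or no permission to read its stack.
            continue
        if faulted_in_exit_charge(text):
            hits.append(int(entry.name))
    return sorted(hits)


if __name__ == "__main__":
    for pid in scan_proc():
        print(pid)
```

A single sample only shows where a task is at that moment, so a pid
would have to keep showing up across repeated runs before it is worth
calling stuck.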

> We also tried a vanilla 3.11.1 kernel, and we could reproduce the bug
> on it pretty easily.
-- 
Michal Hocko
SUSE Labs
