From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Cc: "Eric W. Biederman"
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
Glauber Costa <glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Subject: Re: memcg creates an unkillable task in 3.11-rc2
Date: Wed, 31 Jul 2013 08:10:52 -0400
Message-ID: <20130731121052.GK715@cmpxchg.org>
In-Reply-To: <20130731073726.GC30514-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
On Wed, Jul 31, 2013 at 09:37:26AM +0200, Michal Hocko wrote:
> [I am CCing David here as well]
>
> On Tue 30-07-13 09:37:46, Eric W. Biederman wrote:
> > Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:
> >
> > > On Tue 30-07-13 01:19:31, Eric W. Biederman wrote:
> > > [...]
> > >> Hmm. Looking farther I see what is going on. And it has nothing to do
> > >> with the freezer. (I have commented out that code and reproduced it
> > >> without the freezer to be doubly certain).
> > >>
> > >>
> > >> On the exit path exit_robust_list is triggering a page fault to fault a
> > >> page back in. Which since we have no memory causes the exit path
> > >> to get stuck in mem_cgroup_handle_oom.
> > >
> > > Hmm, interesting. I assume the exit is caused by the SIGKILL, right?
> > > If yes, then why it hasn't coughed early in __mem_cgroup_try_charge
> >
> > Interesting question. This isn't the primary thread but we do send
> > SIGKILL to the secondary threads as well.
> >
> > We definitely need those checks on both paths making my change valid.
> >
> > Oh. Duh! This is after we act on SIGKILL so SIGKILL is no longer
> > pending.
>
> Very well spotted Eric! What do you think about the following patch?
> I would have to check since when the exit path could trigger the fault
> but I guess this is worth stable backport.
> ---
> From 411408558f2858328ea25e69567e9a53a8314032 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Date: Wed, 31 Jul 2013 08:48:54 +0200
> Subject: [PATCH] memcg: Do not hang on OOM when killed by userspace OOM
>
> Eric has reported that he can see task(s) stuck in memcg OOM handler
> regularly. The only way out is to
> echo 0 > $GROUP/memory.oom_control
>
> His use case is:
> - Set up a hierarchy with memory and the freezer
> (disable kernel oom and have a process watch for oom).
> - In that memory cgroup add a process with one thread per cpu.
> - In one thread slowly allocate memory, once per second (I think it is 16M
> of ram), and mlock and dirty it (just to force the pages into ram and stay there).
> - When oom is achieved, loop:
> * attempt to freeze all of the tasks.
> * if frozen, send every task SIGKILL, unfreeze, remove the directory in
> cgroupfs.
>
> Eric has then pinpointed the issue to be memcg specific.
>
> All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
> Those that have received fatal signal will bypass the charge and should
> continue on their way out. The tricky part is that the exit path might
> trigger a page fault (e.g. exit_robust_list) and thus a memcg charge
> while its memcg is still under OOM because nobody has released any
> charges. Unlike with the in-kernel OOM handler the exiting task doesn't
> get TIF_MEMDIE set so it doesn't shortcut charges and falls to the
> memcg OOM again without any way out of it as there are no fatal signals
> pending anymore.
>
> This patch sets the TIF_MEMDIE flag proactively in mem_cgroup_handle_oom
> if the memcg OOM is disabled and the task is woken up with a fatal signal
> pending. This means that any further charges will be bypassed early in
> __mem_cgroup_try_charge and the task will finally have a chance to exit.
>
> Strictly speaking we might also mark a task which hasn't been killed by
> the userspace OOM handler but this is not harmful as the task is going away
> anyway and the under-oom group would like to see it go as soon as possible.
>
> Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Debugged-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
Looks good to me, FWIW.
Acked-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
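
For readers following the thread, the hang and the fix described in the
changelog above can be illustrated with a small userspace toy model. This is
only a sketch under stated assumptions, not kernel code: the Task class and
the functions handle_oom, charge and kill_while_under_oom are hypothetical
names made up for illustration, and the two boolean fields merely stand in
for the pending fatal signal and the TIF_MEMDIE flag.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Toy stand-ins for the pending-fatal-signal state and TIF_MEMDIE.
    fatal_signal_pending: bool = False
    tif_memdie: bool = False


def handle_oom(task: Task, patched: bool) -> None:
    """Toy model of the memcg OOM wait with the kernel OOM killer disabled.

    With the patch, a task that is woken with a fatal signal pending is
    marked with tif_memdie so that later charges can be bypassed.
    """
    if patched and task.fatal_signal_pending:
        task.tif_memdie = True


def charge(task: Task, under_oom: bool, patched: bool) -> str:
    """Toy model of a charge attempt: 'bypass', 'charged' or 'stuck'."""
    # Early bypass, loosely modeled on the checks in the charge path.
    if task.tif_memdie or task.fatal_signal_pending:
        return "bypass"
    if not under_oom:
        return "charged"
    handle_oom(task, patched)
    # Nobody has released charges and no fatal signal is pending, so in
    # this model the task has no way out of the OOM wait.
    return "stuck"


def kill_while_under_oom(patched: bool) -> tuple:
    task = Task()
    # 1. The userspace OOM handler sends SIGKILL while the task waits.
    task.fatal_signal_pending = True
    handle_oom(task, patched)
    first = charge(task, under_oom=True, patched=patched)
    # 2. The signal is acted upon, so it is no longer pending on exit.
    task.fatal_signal_pending = False
    # 3. The exit path faults a page back in (e.g. exit_robust_list).
    second = charge(task, under_oom=True, patched=patched)
    return first, second


if __name__ == "__main__":
    print("unpatched:", kill_while_under_oom(patched=False))
    print("patched:  ", kill_while_under_oom(patched=True))
```

In this model the unpatched run bypasses the first charge but gets stuck on
the second one issued from the exit path, while the patched run bypasses
both because the flag survives after the signal has been consumed.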