From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Glauber Costa <glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Subject: Re: memcg creates an unkillable task in 3.11-rc2
Date: Thu, 1 Aug 2013 11:06:20 +0200
Message-ID: <20130801090620.GA5198@dhcp22.suse.cz>
In-Reply-To: <87zjt2tm9f.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

On Wed 31-07-13 15:09:16, Eric W. Biederman wrote:
> Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:
>
> > [I am CCing David here as well]
> >
> > On Tue 30-07-13 09:37:46, Eric W. Biederman wrote:
> >> Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org> writes:
> >>
> >> > On Tue 30-07-13 01:19:31, Eric W. Biederman wrote:
> >> > [...]
> >> >> Hmm. Looking farther I see what is going on. And it has nothing to do
> >> >> with the freezer. (I have commented out that code and reproduced it
> >> >> without the freezer to be doubly certain).
> >> >>
> >> >>
> >> >> On the exit path exit_robust_list is triggering a page fault to fault a
> >> >> page back in. Which since we have no memory causes the exit path
> >> >> to get stuck in mem_cgroup_handle_oom.
> >> >
> >> > Hmm, interesting. I assume the exit is caused by the SIGKILL, right?
> >> > If yes, then why hasn't it been caught early in __mem_cgroup_try_charge?
> >>
> >> Interesting question. This isn't the primary thread but we do send
> >> SIGKILL to the secondary threads as well.
> >>
> >> We definitely need those checks on both paths making my change valid.
> >>
> >> Oh. Duh! This is after we act on SIGKILL so SIGKILL is no longer
> >> pending.
> >
> > Very well spotted Eric! What do you think about the following patch?
> > I would have to check since when the exit path could trigger the fault
> > but I guess this is worth stable backport.
>
> It doesn't have a prayer of working.

So it hasn't passed your test?

> You leave open the race of a fatal signal being received before we go to
> sleep.

If a fatal signal is received before we go to sleep, then schedule()
should keep the task on the runqueue, no?

static void __sched __schedule(void)
{
[...]
	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
		if (unlikely(signal_pending_state(prev->state, prev))) {
			prev->state = TASK_RUNNING;
		} else {

So it should get a timeslice eventually: mem_cgroup_handle_oom sees
fatal_signal_pending and sets TIF_MEMDIE, the task bypasses the charge,
gets to signal handling, starts exiting, faults in, gets to the charge
and bails out in __mem_cgroup_try_charge because it sees TIF_MEMDIE.

Or what am I missing?

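The scheduler argument above can be sketched as a small userspace model.
This is not kernel code: the model_* names, the struct and the numeric
state values are made up for illustration, PREEMPT_ACTIVE is ignored, and
it assumes the OOM waiter sleeps in a killable state (wake-kill plus
uninterruptible).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state bits; the numeric values are assumptions of this model. */
enum {
	MODEL_RUNNING         = 0x00,
	MODEL_INTERRUPTIBLE   = 0x01,
	MODEL_UNINTERRUPTIBLE = 0x02,
	MODEL_WAKEKILL        = 0x80,
	MODEL_KILLABLE        = MODEL_WAKEKILL | MODEL_UNINTERRUPTIBLE,
};

/* Hypothetical stand-in for the few task fields the check looks at. */
struct model_task {
	long state;
	bool signal_pending;
	bool fatal_signal_pending;
};

/* Modeled on the idea behind signal_pending_state(): only interruptible
 * or wake-kill sleeps react to signals, and a wake-kill sleep reacts to
 * fatal signals only. */
static bool model_signal_pending_state(long state, const struct model_task *p)
{
	if (!(state & (MODEL_INTERRUPTIBLE | MODEL_WAKEKILL)))
		return false;
	if (!p->signal_pending)
		return false;
	return (state & MODEL_INTERRUPTIBLE) || p->fatal_signal_pending;
}

/* Returns true if the task stays runnable across a schedule() call. */
static bool model_schedule_keeps_runnable(struct model_task *prev)
{
	assert(prev);
	if (prev->state == MODEL_RUNNING)
		return true;
	if (model_signal_pending_state(prev->state, prev)) {
		prev->state = MODEL_RUNNING;	/* not taken off the runqueue */
		return true;
	}
	return false;	/* deactivated; sleeps until somebody wakes it */
}
```

In this model a killable sleeper with a fatal signal already pending is
put back to running, while a plain uninterruptible sleeper, or a killable
one with only a non-fatal signal pending, goes to sleep.
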
> You don't handle a task that has processed the fatal signal and is in
> PF_EXITING. Which is what I experienced.
>
> From earlier comments about my code not being early enough I thought I
> was going to see a patch in __mem_cgroup_try_charge so that the bypass
> case will kick in also for tasks in PF_EXITING.

This shouldn't be necessary because TIF_MEMDIE was set for the killed
task. I was playing with PF_EXITING there as well, but TIF_MEMDIE sounds
like a more appropriate solution.

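A rough userspace sketch of why the flag matters on the exit path. This
is a toy model under stated assumptions, not the memcg code: the oom_*
and model_* names are hypothetical, the "patched" switch only exists to
compare both behaviours, and sleeping on the OOM waitqueue is reduced to
a CHARGE_WAIT return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the task and memcg state discussed here. */
struct oom_task {
	bool fatal_signal_pending;	/* SIGKILL queued, not yet acted on */
	bool memdie;			/* models TIF_MEMDIE */
	bool exiting;			/* models PF_EXITING */
};

struct oom_memcg {
	bool under_oom;			/* nobody has released any charges */
	bool oom_kill_disable;		/* in-kernel OOM killer switched off */
};

enum charge_result { CHARGE_OK, CHARGE_BYPASS, CHARGE_WAIT };

/* Early check in the charge path: dying tasks skip the charge. */
static enum charge_result model_try_charge(const struct oom_memcg *cg,
					   const struct oom_task *t)
{
	assert(cg && t);
	if (t->memdie || t->fatal_signal_pending)
		return CHARGE_BYPASS;
	if (!cg->under_oom)
		return CHARGE_OK;
	return CHARGE_WAIT;	/* would sleep on the OOM waitqueue */
}

/* Check done after the OOM waiter wakes up. Returns true to retry the
 * charge, false to bypass it. With patched set, a task killed from
 * userspace is marked so that later charges bypass as well. */
static bool model_oom_wakeup(const struct oom_memcg *cg, struct oom_task *t,
			     bool patched)
{
	assert(cg && t);
	if (t->memdie)
		return false;
	if (t->fatal_signal_pending) {
		if (patched && cg->oom_kill_disable)
			t->memdie = true;
		return false;
	}
	return true;
}

/* Acting on SIGKILL: the signal is no longer pending, the task exits. */
static void model_act_on_sigkill(struct oom_task *t)
{
	assert(t);
	t->fatal_signal_pending = false;
	t->exiting = true;
}
```

With patched set, a task that was woken with the signal pending keeps
bypassing charges after it has acted on the signal. Without it, the same
task ends up in CHARGE_WAIT again on the first exit-path fault. A task
that never went through the wake-up check with the signal pending is not
covered by the flag in this model either.
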
> Your change actually addresses things later in the code path than mine
> does.
>
> I do like your summary of the problem.
>
> Eric
>
> > ---
> > From 411408558f2858328ea25e69567e9a53a8314032 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> > Date: Wed, 31 Jul 2013 08:48:54 +0200
> > Subject: [PATCH] memcg: Do not hang on OOM when killed by userspace OOM
> >
> > Eric has reported that he can see task(s) stuck in memcg OOM handler
> > regularly. The only way out is to
> > echo 0 > $GROUP/memory.oom_control
> >
> > His usecase is:
> > - Setup a hierarchy with memory and the freezer
> > (disable kernel oom and have a process watch for oom).
> > - In that memory cgroup add a process with one thread per cpu.
> > - In one thread, slowly allocate memory (I think 16M of RAM once per
> > second) and mlock and dirty it (just to force the pages into RAM and
> > keep them there).
> > - When oom is achieved loop:
> > * attempt to freeze all of the tasks.
> > * if frozen send every task SIGKILL, unfreeze, remove the directory in
> > cgroupfs.
> >
> > Eric has then pinpointed the issue to be memcg specific.
> >
> > All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
> > Those that have received fatal signal will bypass the charge and should
> > continue on their way out. The tricky part is that the exit path might
> > trigger a page fault (e.g. exit_robust_list) and thus a memcg charge
> > while its memcg is still under OOM because nobody has released any
> > charges. Unlike with the in-kernel OOM handler the exiting task doesn't
> > get TIF_MEMDIE set so it doesn't shortcut charges and falls to the
> > memcg OOM again without any way out of it as there are no fatal signals
> > pending anymore.
> >
> > This patch sets the TIF_MEMDIE flag proactively in mem_cgroup_handle_oom
> > if the memcg OOM killer is disabled and the task is woken up with a fatal
> > signal pending. This means that any further charges will be bypassed early
> > in __mem_cgroup_try_charge and the task will finally have a chance to exit.
> >
> > Strictly speaking we might also mark a task which hasn't been killed by
> > the userspace OOM handler, but this is not harmful as the task is going
> > away anyway and the under-OOM group would like to see it go as soon as
> > possible.
> >
> > Reported-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> > Debugged-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> > ---
> > mm/memcontrol.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index d12ca6f..d4103b0 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2235,8 +2235,19 @@ static bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask,
> >
> >  	mem_cgroup_unmark_under_oom(memcg);
> >
> > -	if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current))
> > +	if (test_thread_flag(TIF_MEMDIE))
> >  		return false;
> > +
> > +	/*
> > +	 * Userspace OOM killer might have killed this task but
> > +	 * there is no way it could have set TIF_MEMDIE as well
> > +	 * so we have to set it manually.
> > +	 */
> > +	if (fatal_signal_pending(current)) {
> > +		if (memcg->oom_kill_disable)
> > +			set_thread_flag(TIF_MEMDIE);
> > +		return false;
> > +	}
> >  	/* Give chance to dying process */
> >  	schedule_timeout_uninterruptible(1);
> >  	return true;
> > --
> > 1.8.3.2
--
Michal Hocko
SUSE Labs