From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
Linus Torvalds
<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
kent.overstreet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
Glauber Costa <glommer-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Subject: Re: memcg creates an unkillable task in 3.2-rc2
Date: Mon, 29 Jul 2013 11:06:24 -0700
Message-ID: <8738qx1brz.fsf@xmission.com>
In-Reply-To: <20130729172046.GI22605-9pTldWuhBndy/B6EtB590w@public.gmane.org> (Tejun Heo's message of "Mon, 29 Jul 2013 13:20:46 -0400")
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> writes:
> Hey, Eric.
>
> On Mon, Jul 29, 2013 at 10:03:35AM -0700, Eric W. Biederman wrote:
>> So this is not a simple matter of a frozen task not dying when SIGKILL
>> is received. For the most part not dying when SIGKILL is received seems
>> like correct behavior for a frozen task. Certainly it is correct
>> behavior for any other signal.
>>
>> The issue is that the tasks don't freeze or that when thawed the SIGKILL
>> is still ignored. It seems a wake up is being missed in there somewhere.
>
> That's actually interesting and shouldn't be happening. Can you
> please provide more data as to what's going on while freezing? It's
> likely that the problem is not caused by freezer per-se, the task
> might be stuck elsewhere and just fails to reach the freezing point.
Barring some infrastructure noise, what is happening is:

- Set up a hierarchy with memory and the freezer
  (disable the kernel oom killer and have a process watch for oom).
- In that memory cgroup add a process with one thread per cpu.
- In one thread slowly allocate ram, once per second (I think it is
  16M at a time), and mlock and dirty it (just to force the pages into
  ram and keep them there).
- When oom is reached, loop:
  * attempt to freeze all of the tasks.
  * if frozen, send every task SIGKILL, unfreeze, and remove the
    directory in cgroupfs.

The log message I am seeing says that the freezing fails.

So I don't actually know what is delivering SIGKILL. It may be that the
oom situation is triggering it while we are attempting to freeze the tasks.
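
The freeze, kill, and thaw part of the loop described above can be
sketched roughly as follows. This is a minimal sketch and not the actual
reproducer: it assumes a cgroup v1 directory with the freezer attached,
reads pids from cgroup.procs, and the function names, the timeout, and
the injectable `kill` parameter are all hypothetical.

```python
import os
import signal
import time


def write_control(cgroup_dir, name, value):
    """Write a value to one control file in the cgroup directory."""
    with open(os.path.join(cgroup_dir, name), "w") as f:
        f.write(value)


def read_control(cgroup_dir, name):
    """Read one control file and strip surrounding whitespace."""
    with open(os.path.join(cgroup_dir, name)) as f:
        return f.read().strip()


def try_freeze(cgroup_dir, timeout=5.0, poll=0.1):
    """Request FROZEN and poll freezer.state until it is reached or we time out."""
    write_control(cgroup_dir, "freezer.state", "FROZEN")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if read_control(cgroup_dir, "freezer.state") == "FROZEN":
            return True
        time.sleep(poll)
    return False  # freezing failed (state may still read FREEZING)


def list_pids(cgroup_dir):
    """Return the process ids listed in cgroup.procs."""
    return [int(p) for p in read_control(cgroup_dir, "cgroup.procs").split()]


def kill_frozen_group(cgroup_dir, kill=os.kill):
    """Freeze the group, SIGKILL every process in it, then thaw it.

    Returns False without sending any signal if the freeze did not complete.
    """
    if not try_freeze(cgroup_dir):
        return False
    for pid in list_pids(cgroup_dir):
        kill(pid, signal.SIGKILL)
    write_control(cgroup_dir, "freezer.state", "THAWED")
    return True
```

Once the processes have exited, the caller would remove the group with
os.rmdir(cgroup_dir). A False return corresponds to the "freezing fails"
case in the log.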
> Would it be possible for memcg and freezer to deadlock? Note that
> while freezing is in progress, some tasks will enter freezer earlier
> than others (of course) and won't respond to anything. If memcg adds
> wait dependency among the tasks being frozen, it'll surely deadlock.
There may be a livelock. But I have been able to unstick the processes
by simply echo 0 > memory.oom_control.
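
The echo into memory.oom_control above could be scripted roughly like
this. It is a minimal sketch assuming the cgroup v1 memory controller,
where memory.oom_control reads back `oom_kill_disable` and `under_oom`
lines and writing 0 re-enables the kernel oom killer for the group. The
helper names are hypothetical.

```python
import os


def parse_oom_control(text):
    """Parse the 'key value' lines read from memory.oom_control into a dict."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            fields[parts[0]] = int(parts[1])
    return fields


def set_oom_kill_disable(cgroup_dir, disable):
    """Write 1 to disable the kernel oom killer for the group, 0 to re-enable it."""
    with open(os.path.join(cgroup_dir, "memory.oom_control"), "w") as f:
        f.write("1" if disable else "0")


def read_oom_control(cgroup_dir):
    """Read and parse memory.oom_control for the given cgroup directory."""
    with open(os.path.join(cgroup_dir, "memory.oom_control")) as f:
        return parse_oom_control(f.read())
```

Calling set_oom_kill_disable(path, False) is the equivalent of the echo
above; read_oom_control(path) can then be checked for under_oom.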
There may be a race where we haven't yet hit whatever is causing the
freezer to fail, a task is frozen, and SIGKILL is delivered. The wakeup is
ignored and then the unfreezing doesn't deliver it?
I need to explore some more.
>> A single unified hierarchy is a really nasty idea for the same set of
>> reasons. You have to recompile to disable a controller to see if that
>> controller's bugs are what are causing problems on your production
>> system. Compiles or even just a reboot is a very heavy hammer to ask
>> people to use when they are triaging a problem.
>
> For Nth time, unified hierarchy doesn't mean all controllers are
> enabled on all hierarchies or that controllers can't be bound and
> unbound dynamically. Except for the removal of orthogonal
> hierarchies, things actually become a lot more dynamic.
Interesting. So by unified hierarchy you just mean that the same
directory structure must exist for all mounts of cgroupfs?
If that is not it we can wait until Plumbers and hash it out in person.
All I am really concerned about right now is the ability to easily toss
out questionable controllers/subsystems without having to recompile or
reboot.
>> I am also seeing what looks like a leak somewhere in the cgroup code as
>> well. After some runs of the same reproducer I get into a state where,
>> after everything is cleaned up, all of the control groups have been
>> removed, and the cgroup filesystem is unmounted, I can mount a cgroup
>> filesystem with that same combination of subsystems, but I can't mount
>> a cgroup filesystem with any of those subsystems in any other
>> combination. So I am guessing that the superblock from the original
>> mounting is still lingering for some reason.
>
> Hmmm... yeah, if there are cgroups with refs remaining, that'd happen.
> Note that AFAIU memcg keeps the cgroups hanging around until all the
> pages are gone from it, so it could just be that it's still draining
> which may take a long time. Maybe dropping cache would work?
Good suggestion. I will have to play with that.
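
One way to try that could look like the sketch below: check /proc/cgroups
to see whether the subsystems are still bound to a hierarchy after the
unmount, drop caches, and check again. This is only a sketch under
assumptions. The function names are hypothetical, and it relies on the
usual four-column /proc/cgroups layout and on writing 3 to
/proc/sys/vm/drop_caches (which needs root).

```python
import os


def parse_proc_cgroups(text):
    """Return {subsys: (hierarchy_id, num_cgroups, enabled)} from /proc/cgroups text."""
    table = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the header and blank lines
        name, hierarchy, num_cgroups, enabled = line.split()
        table[name] = (int(hierarchy), int(num_cgroups), enabled == "1")
    return table


def still_bound(table, subsystems):
    """Subsystems that remain attached to a hierarchy (nonzero hierarchy id)."""
    return [s for s in subsystems if s in table and table[s][0] != 0]


def drop_caches(path="/proc/sys/vm/drop_caches", sync_first=True):
    """Ask the kernel to drop page cache, dentries, and inodes."""
    if sync_first:
        os.sync()  # flush dirty pages first so more of the cache can be freed
    with open(path, "w") as f:
        f.write("3\n")
```

If still_bound() keeps reporting memory and freezer after the unmount,
and the list empties some time after drop_caches(), that would point at
the draining explanation rather than a leak.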
I also saw one case where somehow the directory that described one
of these weird blocking tasks was removed.
Eric