From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cyphar-gVpy/LI/lHzQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
kernel-team-b10kYP2dOMg@public.gmane.org
Subject: [PATCHSET cgroup/for-4.4] cgroup: make zombies retain cgroup membership and fix pids controller
Date: Fri, 9 Oct 2015 23:29:27 -0400
Message-ID: <1444447781-16182-1-git-send-email-tj@kernel.org>
Hello,
cgroup currently disassociates a task from its cgroups on exit and
reassigns it to the root cgroup. This behavior turns out to be
problematic for several reasons.
* Resources can't be tracked for zombies. This breaks the pids
controller as zombies escape resource restriction. A cgroup can
easily go way above its limits by creating a bunch of zombies.
* It's difficult to tell where zombies came from. /proc/PID/cgroup
gets reset to / on exit, so given a zombie there is no easy way to
tell which cgroup it came from.
* It creates extra work for controllers for no reason. The cpu and
perf_events controllers implement exit callbacks to switch the
exiting task's membership to root when just leaving it as-is is
enough.
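To make the first point concrete, here is a toy model in Python, not
kernel code. PidsCgroup, spawn_zombies() and the uncharge_on_exit flag
are hypothetical names made up for illustration. The only thing taken
from the description above is that the charge is currently dropped at
exit, while the zombie keeps holding a PID until it is reaped.

```python
class PidsCgroup:
    """Toy model of a pids-limited cgroup (hypothetical, not a kernel structure)."""

    def __init__(self, limit, uncharge_on_exit):
        self.limit = limit
        # True models the current behavior: the task leaves the cgroup at exit.
        # False models keeping the zombie charged until it is reaped.
        self.uncharge_on_exit = uncharge_on_exit
        self.charged = 0   # what the controller believes is in use
        self.live = 0      # running tasks
        self.zombies = 0   # exited but not yet reaped; each still holds a PID

    def fork(self):
        if self.charged >= self.limit:
            return False   # fork rejected by the limit
        self.charged += 1
        self.live += 1
        return True

    def exit_task(self):
        self.live -= 1
        self.zombies += 1
        if self.uncharge_on_exit:
            self.charged -= 1

    def reap(self):
        self.zombies -= 1
        if not self.uncharge_on_exit:
            self.charged -= 1

    def pids_in_use(self):
        return self.live + self.zombies


def spawn_zombies(cg, attempts):
    """Try to fork `attempts` children that exit at once and are never reaped."""
    for _ in range(attempts):
        if cg.fork():
            cg.exit_task()
    return cg.pids_in_use()
```

With uncharge_on_exit=True every fork succeeds because the charge is
already gone by the time the next fork happens, so the number of PIDs
held is bounded only by the number of attempts. With
uncharge_on_exit=False the forks start failing once the limit is hit.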
Unfortunately, fixing this involves opening a few cans of worms.
* Decoupling tasks being on a css_set from its reference counting so
that a css_set can be pinned w/o tasks being on it. This also
requires decoupling css_set existence from whether a cgroup is
populated, so that pinning a css_set doesn't confuse populated state
tracking and populated state can be used to decide whether certain
operations are allowed.
* Making css task iteration drop css_set_rwsem between iteration steps
so that internal locking is not exposed to iterator users and
css_set_rwsem can be converted to a spinlock which can be grabbed
from the task free path.
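The first item can be pictured with a small Python model. This is only
a sketch under simplifying assumptions: a single cgroup with no
hierarchy, and class and method names (Cgroup, CssSet, add_task,
remove_task) that are invented here and only loosely echo the kernel's.
The point it shows is that the reference count and the task list are
tracked separately, and that populated state follows the task list
alone.

```python
class Cgroup:
    """Toy cgroup; counts css_sets that currently have tasks (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.nr_populated_csets = 0

    def is_populated(self):
        return self.nr_populated_csets > 0


class CssSet:
    """Toy css_set whose lifetime (refcount) is independent of its task list."""

    def __init__(self, cgroup):
        self.cgroup = cgroup
        self.refcount = 1      # the creator's reference
        self.tasks = set()

    def get(self):
        # Pin the css_set, e.g. on behalf of a zombie that is no longer listed.
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        return self.refcount == 0   # True means the css_set could be freed

    def add_task(self, task):
        if not self.tasks:
            self.cgroup.nr_populated_csets += 1   # empty -> populated
        self.tasks.add(task)

    def remove_task(self, task):
        self.tasks.remove(task)
        if not self.tasks:
            self.cgroup.nr_populated_csets -= 1   # populated -> empty
```

In this model a css_set pinned via get() but with an empty task list
leaves is_populated() false, so an "is this cgroup empty?" check is not
confused by references held for exited tasks.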
Besides fixing the pids controller, this patchset doesn't change the
visible behavior on traditional hierarchies. On the default
hierarchy, a zombie now reports the cgroup it belonged to at the time
of exit in /proc/PID/cgroup. If the cgroup gets removed before the
task is reaped, " (deleted)" is appended to the reported path.
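For userspace tools that read /proc/PID/cgroup, handling the new suffix
could look roughly like the following Python sketch. It assumes the
usual three-field "hierarchy-ID:controller-list:path" line format of
that file. parse_cgroup_line() is a hypothetical helper, not an
existing API, and it cannot distinguish a removed cgroup from a live
one whose name happens to end in " (deleted)".

```python
DELETED_SUFFIX = " (deleted)"


def parse_cgroup_line(line):
    """Split one /proc/PID/cgroup line into (hierarchy_id, controllers, path, deleted)."""
    # Split on the first two colons only, since the path may contain colons.
    hierarchy_id, controller_field, path = line.rstrip("\n").split(":", 2)
    controllers = controller_field.split(",") if controller_field else []
    deleted = path.endswith(DELETED_SUFFIX)
    if deleted:
        path = path[: -len(DELETED_SUFFIX)]
    return int(hierarchy_id), controllers, path, deleted
```

For example, the line "0::/test/child (deleted)" would come back with
path "/test/child" and the deleted flag set.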
This patchset contains the following 14 patches.
0001-cgroup-remove-an-unused-parameter-from-cgroup_task_m.patch
0002-cgroup-make-cgroup-nr_populated-count-the-number-of-.patch
0003-cgroup-replace-cgroup_has_tasks-with-cgroup_is_popul.patch
0004-cgroup-move-check_for_release-invocation.patch
0005-cgroup-relocate-cgroup_-try-get-put.patch
0006-cgroup-make-css_sets-pin-the-associated-cgroups.patch
0007-cgroup-make-cgroup_destroy_locked-test-cgroup_is_pop.patch
0008-cgroup-keep-css_set-and-task-lists-in-chronological-.patch
0009-cgroup-factor-out-css_set_move_task.patch
0010-cgroup-reorganize-css_task_iter-functions.patch
0011-cgroup-don-t-hold-css_set_rwsem-across-css-task-iter.patch
0012-cgroup-make-css_set_rwsem-a-spinlock-and-rename-it-t.patch
0013-cgroup-keep-zombies-associated-with-their-original-c.patch
0014-cgroup-add-cgroup_subsys-free-method-and-use-it-to-f.patch
0001-0007 decouple populated state tracking from css_set existence and
allow css_sets to be pinned without tasks on them.
0008-0012 update the css_set task iterator to not hold the lock across
iteration steps and replace css_set_rwsem with a spinlock.
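The idea of not holding the lock across iteration steps can be sketched
in Python as follows. This is a toy model and not the patches' actual
implementation: TaskList, TaskIter and their methods are hypothetical
names, and the fix-up of cursors on removal is just one possible way to
keep an iterator valid while the lock is dropped between steps.

```python
import threading


class TaskList:
    """Toy task list whose iterators take the lock per step only."""

    def __init__(self, tasks=()):
        self._lock = threading.Lock()
        self._tasks = list(tasks)
        self._iters = []  # active iterators, so removal can fix up cursors

    def remove(self, task):
        with self._lock:
            idx = self._tasks.index(task)
            del self._tasks[idx]
            # A cursor past the removed slot shifts down by one. A cursor
            # at the removed slot now refers to the successor as-is.
            for it in self._iters:
                if it._pos > idx:
                    it._pos -= 1

    def iterate(self):
        it = TaskIter(self)
        with self._lock:
            self._iters.append(it)
        return it


class TaskIter:
    def __init__(self, owner):
        self._owner = owner
        self._pos = 0  # index of the next task to return

    def next(self):
        """Return the next task or None; the lock is held only inside this call."""
        owner = self._owner
        with owner._lock:
            if self._pos >= len(owner._tasks):
                return None
            task = owner._tasks[self._pos]
            self._pos += 1
            return task

    def end(self):
        # Unregister so later removals stop touching this iterator.
        with self._owner._lock:
            self._owner._iters.remove(self)
```

Because the caller never runs with the lock held, it is free to sleep
or to modify the list between next() calls, and the lock itself can be
a short-held one.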
0013 makes zombies keep their cgroup associations. 0014 introduces
the ->free() method and fixes the pids controller.
The patchset is pretty lightly tested and I need to verify that the
corner cases behave as expected.
This patchset is on top of cgroup/for-4.4 a3e72739b7a7 ("cgroup: fix
too early usage of static_branch_disable()") and available in the
following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-zombies
diffstat follows. Thanks.
 Documentation/cgroups/cgroups.txt           |    4
 Documentation/cgroups/unified-hierarchy.txt |    4
 include/linux/cgroup-defs.h                 |   16
 include/linux/cgroup.h                      |   14
 kernel/cgroup.c                             |  522 +++++++++++++++++-----------
 kernel/cgroup_pids.c                        |    8
 kernel/cpuset.c                             |    2
 kernel/events/core.c                        |   16
 kernel/fork.c                               |    1
 kernel/sched/core.c                         |   16
 mm/memcontrol.c                             |    2
 11 files changed, 354 insertions(+), 251 deletions(-)
--
tejun