From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cyphar-gVpy/LI/lHzQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
kernel-team-b10kYP2dOMg@public.gmane.org
Subject: [PATCHSET cgroup/for-4.4] cgroup: make zombies retain cgroup membership and fix pids controller
Date: Fri, 9 Oct 2015 23:29:27 -0400
Message-ID: <1444447781-16182-1-git-send-email-tj@kernel.org>
Hello,
cgroup currently disassociates a task from its cgroups on exit and
reassigns it to the root cgroup. This behavior turns out to be
problematic for several reasons.
* Resources can't be tracked for zombies. This breaks pids controller
as zombies escape resource restriction. A cgroup can easily go way
above its limits by creating a bunch of zombies.
* It's difficult to tell where zombies came from. /proc/PID/cgroup
gets reset to / on exit, so given a zombie there is no easy way to
tell which cgroup it belonged to.
* It creates extra work for controllers for no reason. cpu and
perf_events controllers implement exit callbacks to switch the
exiting task's membership to root when just leaving it as-is is
enough.
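The zombie case in the first point can be reproduced from userspace. The following is a minimal sketch, not part of the patchset: spawn_zombies() and reap_all() are hypothetical helper names, and the code only assumes POSIX fork(), waitid() and waitpid(). Each child exits at once and is left unreaped, so it remains a zombie; with the behavior described above, such tasks would already have been moved to the root cgroup and would no longer count against the originating cgroup's pids limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork n children that exit immediately and leave
 * them unreaped, so each one stays a zombie.  Returns how many zombies
 * were created and stores their pids in pids[].
 */
static int spawn_zombies(pid_t *pids, int n)
{
	int created = 0;

	assert(pids && n >= 0);
	for (int i = 0; i < n; i++) {
		siginfo_t info;
		pid_t pid = fork();

		if (pid < 0)
			break;		/* fork refused, e.g. a limit was hit */
		if (pid == 0)
			_exit(0);	/* child: exit right away */

		pids[created++] = pid;
		/* WNOWAIT: wait for the exit but do not reap the child */
		if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) != 0)
			break;
	}
	return created;
}

/* Hypothetical helper: reap the zombies created above. */
static int reap_all(const pid_t *pids, int n)
{
	int reaped = 0;

	for (int i = 0; i < n; i++)
		if (waitpid(pids[i], NULL, 0) == pids[i])
			reaped++;
	return reaped;
}
```

Between spawn_zombies() and reap_all(), /proc/PID/cgroup of each stored pid can be inspected to see which cgroup the zombie is reported in.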
Unfortunately, fixing this involves opening a few cans of worms.
* Decoupling tasks being on a css_set from its reference counting so
that css_set can be pinned w/o tasks being on it and decoupling
css_set existence from whether a cgroup is populated so that pinning
a css_set doesn't confuse populated state tracking and populated
state can be used to decide whether certain operations are allowed.
* Making css task iteration drop css_set_rwsem between iteration steps
so that internal locking is not exposed to iterator users and
css_set_rwsem can be converted to a spinlock which can be grabbed
from the task free path.
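The first point can be illustrated with a toy model. This is only a sketch of the idea under simplified assumptions, not the kernel's data structures: every toy_* name is hypothetical, there is no locking, and a cgroup is reduced to a single counter. The reference count pins the set, while a separate task count drives the populated state, so an exited but unreaped task can keep the set alive without making the cgroup look populated.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not kernel definitions. */
struct toy_cgroup {
	int nr_populated_csets;		/* sets that currently have tasks */
};

struct toy_css_set {
	int refcnt;			/* pins the set, independent of tasks */
	int nr_tasks;			/* live tasks linked on the set */
	struct toy_cgroup *cgrp;
};

static bool toy_cgroup_is_populated(const struct toy_cgroup *cgrp)
{
	return cgrp->nr_populated_csets > 0;
}

/* A live task joins: take a reference and maybe flip to populated. */
static void toy_task_attach(struct toy_css_set *cset)
{
	cset->refcnt++;
	if (cset->nr_tasks++ == 0)
		cset->cgrp->nr_populated_csets++;
}

/* The task exits: it leaves the task list but keeps its reference. */
static void toy_task_exit(struct toy_css_set *cset)
{
	assert(cset->nr_tasks > 0);
	if (--cset->nr_tasks == 0)
		cset->cgrp->nr_populated_csets--;
}

/* The task is finally freed: only now is the reference dropped. */
static void toy_task_free(struct toy_css_set *cset)
{
	assert(cset->refcnt > 0);
	cset->refcnt--;
}
```

In this model, the window between toy_task_exit() and toy_task_free() corresponds to the zombie state: the set is still pinned, but the cgroup already counts as unpopulated.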
After this patchset, besides the pids controller being fixed, the
visible behavior isn't changed on traditional hierarchies. On the
default hierarchy, a zombie reports the cgroup it belonged to at the
time of exit in /proc/PID/cgroup. If the cgroup gets removed before
the task is reaped, " (deleted)" is appended to the reported path.
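A tool reading /proc/PID/cgroup may want to separate the path from the " (deleted)" marker. The following is a rough sketch of one way to do that; parse_cgroup_line() is a hypothetical helper, and it assumes the usual "ID:controllers:path" line layout. Note that it cannot distinguish a removed cgroup from a live one whose name happens to end in " (deleted)".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DELETED_SUFFIX " (deleted)"

/*
 * Hypothetical helper: given one line of /proc/PID/cgroup, assumed to
 * look like "ID:controllers:path", copy the path into out and report
 * whether the " (deleted)" suffix was present.  Returns false if the
 * line is malformed or the path does not fit into out.
 */
static bool parse_cgroup_line(const char *line, char *out, size_t outlen,
			      bool *deleted)
{
	const size_t slen = strlen(DELETED_SUFFIX);
	const char *p;
	size_t len;

	assert(line && out && deleted && outlen > 0);

	/* skip the "ID:" and "controllers:" fields */
	p = strchr(line, ':');
	if (!p)
		return false;
	p = strchr(p + 1, ':');
	if (!p)
		return false;
	p++;

	len = strcspn(p, "\n");		/* path length without the newline */
	*deleted = len >= slen &&
		   strncmp(p + len - slen, DELETED_SUFFIX, slen) == 0;
	if (*deleted)
		len -= slen;
	if (len >= outlen)
		return false;

	memcpy(out, p, len);
	out[len] = '\0';
	return true;
}
```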
This patchset contains the following 14 patches.
0001-cgroup-remove-an-unused-parameter-from-cgroup_task_m.patch
0002-cgroup-make-cgroup-nr_populated-count-the-number-of-.patch
0003-cgroup-replace-cgroup_has_tasks-with-cgroup_is_popul.patch
0004-cgroup-move-check_for_release-invocation.patch
0005-cgroup-relocate-cgroup_-try-get-put.patch
0006-cgroup-make-css_sets-pin-the-associated-cgroups.patch
0007-cgroup-make-cgroup_destroy_locked-test-cgroup_is_pop.patch
0008-cgroup-keep-css_set-and-task-lists-in-chronological-.patch
0009-cgroup-factor-out-css_set_move_task.patch
0010-cgroup-reorganize-css_task_iter-functions.patch
0011-cgroup-don-t-hold-css_set_rwsem-across-css-task-iter.patch
0012-cgroup-make-css_set_rwsem-a-spinlock-and-rename-it-t.patch
0013-cgroup-keep-zombies-associated-with-their-original-c.patch
0014-cgroup-add-cgroup_subsys-free-method-and-use-it-to-f.patch
0001-0007 decouple populated state tracking from css_set existence and
allow css_sets to be pinned without tasks on them.
0008-0012 update the css task iterator to not hold the lock across
iteration steps and replace css_set_rwsem with a spinlock.
0013 makes zombies keep their cgroup associations. 0014 introduces
the ->free() method and uses it to fix the pids controller.
The patchset is pretty lightly tested and I need to verify that the
corner cases behave as expected.
This patchset is on top of cgroup/for-4.4 a3e72739b7a7 ("cgroup: fix
too early usage of static_branch_disable()") and available in the
following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-zombies
diffstat follows. Thanks.
 Documentation/cgroups/cgroups.txt           |   4
 Documentation/cgroups/unified-hierarchy.txt |   4
 include/linux/cgroup-defs.h                 |  16
 include/linux/cgroup.h                      |  14
 kernel/cgroup.c                             | 522 +++++++++++++++++-----------
 kernel/cgroup_pids.c                        |   8
 kernel/cpuset.c                             |   2
 kernel/events/core.c                        |  16
 kernel/fork.c                               |   1
 kernel/sched/core.c                         |  16
 mm/memcontrol.c                             |   2
 11 files changed, 354 insertions(+), 251 deletions(-)
--
tejun