All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
	hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cyphar-gVpy/LI/lHzQT0dZR+AlfA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	kernel-team-b10kYP2dOMg@public.gmane.org
Subject: [PATCHSET cgroup/for-4.4] cgroup: make zombies retain cgroup membership and fix pids controller
Date: Fri,  9 Oct 2015 23:29:27 -0400	[thread overview]
Message-ID: <1444447781-16182-1-git-send-email-tj@kernel.org> (raw)

Hello,

cgroup currently disassociates a task from its cgroups on exit and
reassigns it to the root cgroup.  This behavior turns out to be
problematic for several reasons.

* Resources can't be tracked for zombies.  This breaks pids controller
  as zombies escape resource restriction.  A cgroup can easily go way
  above its limits by creating a bunch of zombies.

* It's difficult to tell where zombies came from.  /proc/PID/cgroup
  gets reset to / on exit so given a zombie it's difficult to tell
  from which cgroup the zombie came from.

* It creates an extra work for controllers for no reason.  cpu and
  perf_events controllers implement exit callbacks to switch the
  exiting task's membership to root when just leaving it as-is is
  enough.

Unfortunately, fixing this involves opening a few cans of worms.

* Decoupling tasks being on a css_set from its reference counting so
  that css_set can be pinned w/o tasks being on it and decoupling
  css_set existence from whether a cgroup is populated so that pinning
  a css_set doesn't confuse populated state tracking and populated
  state can be used to decide whether certain operations are allowed.

* Making css task iteration drop css_set_rwsem between iteration steps
  so that internal locking is not exposed to iterator users and
  css_set_rwsem can be converted to a spinlock which can be grabbed
  from task free path.

After this patchset, besides pids controller being fixed, the visible
behavior isn't changed on traditional hierarchies but on the default
hierarchy a zombie reports its cgroup at the time of exit in
/proc/PID/cgroup.  If the cgroup gets removed before the task is
reaped, " (deleted)" is appended to the reported path.

This patchset contains the following 14 patches.

 0001-cgroup-remove-an-unused-parameter-from-cgroup_task_m.patch
 0002-cgroup-make-cgroup-nr_populated-count-the-number-of-.patch
 0003-cgroup-replace-cgroup_has_tasks-with-cgroup_is_popul.patch
 0004-cgroup-move-check_for_release-invocation.patch
 0005-cgroup-relocate-cgroup_-try-get-put.patch
 0006-cgroup-make-css_sets-pin-the-associated-cgroups.patch
 0007-cgroup-make-cgroup_destroy_locked-test-cgroup_is_pop.patch
 0008-cgroup-keep-css_set-and-task-lists-in-chronological-.patch
 0009-cgroup-factor-out-css_set_move_task.patch
 0010-cgroup-reorganize-css_task_iter-functions.patch
 0011-cgroup-don-t-hold-css_set_rwsem-across-css-task-iter.patch
 0012-cgroup-make-css_set_rwsem-a-spinlock-and-rename-it-t.patch
 0013-cgroup-keep-zombies-associated-with-their-original-c.patch
 0014-cgroup-add-cgroup_subsys-free-method-and-use-it-to-f.patch

0001-0007 decouple populated state tracking from css_set existence and
allows css_sets to be pinned without tasks on them.

0008-0012 update css_set task iterator to not hold lock across
iteration steps and replace css_set_rwsem with a spinlock.

0013 makes zombies keep their cgroup associations.  0014 introduces
->exit() method and fixes pids controller.

The patchset is pretty lightly tested and I need to verify that the
corner cases behave as expected.

This patchset is on top of cgroup/for-4.4 a3e72739b7a7 ("cgroup: fix
too early usage of static_branch_disable()") and available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-zombies

diffstat follows.  Thanks.

 Documentation/cgroups/cgroups.txt           |    4 
 Documentation/cgroups/unified-hierarchy.txt |    4 
 include/linux/cgroup-defs.h                 |   16 
 include/linux/cgroup.h                      |   14 
 kernel/cgroup.c                             |  522 +++++++++++++++++-----------
 kernel/cgroup_pids.c                        |    8 
 kernel/cpuset.c                             |    2 
 kernel/events/core.c                        |   16 
 kernel/fork.c                               |    1 
 kernel/sched/core.c                         |   16 
 mm/memcontrol.c                             |    2 
 11 files changed, 354 insertions(+), 251 deletions(-)

--
tejun

WARNING: multiple messages have this Message-ID (diff)
From: Tejun Heo <tj@kernel.org>
To: lizefan@huawei.com, hannes@cmpxchg.org
Cc: cgroups@vger.kernel.org, cyphar@cyphar.com,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCHSET cgroup/for-4.4] cgroup: make zombies retain cgroup membership and fix pids controller
Date: Fri,  9 Oct 2015 23:29:27 -0400	[thread overview]
Message-ID: <1444447781-16182-1-git-send-email-tj@kernel.org> (raw)

Hello,

cgroup currently disassociates a task from its cgroups on exit and
reassigns it to the root cgroup.  This behavior turns out to be
problematic for several reasons.

* Resources can't be tracked for zombies.  This breaks pids controller
  as zombies escape resource restriction.  A cgroup can easily go way
  above its limits by creating a bunch of zombies.

* It's difficult to tell where zombies came from.  /proc/PID/cgroup
  gets reset to / on exit so given a zombie it's difficult to tell
  from which cgroup the zombie came from.

* It creates an extra work for controllers for no reason.  cpu and
  perf_events controllers implement exit callbacks to switch the
  exiting task's membership to root when just leaving it as-is is
  enough.

Unfortunately, fixing this involves opening a few cans of worms.

* Decoupling tasks being on a css_set from its reference counting so
  that css_set can be pinned w/o tasks being on it and decoupling
  css_set existence from whether a cgroup is populated so that pinning
  a css_set doesn't confuse populated state tracking and populated
  state can be used to decide whether certain operations are allowed.

* Making css task iteration drop css_set_rwsem between iteration steps
  so that internal locking is not exposed to iterator users and
  css_set_rwsem can be converted to a spinlock which can be grabbed
  from task free path.

After this patchset, besides pids controller being fixed, the visible
behavior isn't changed on traditional hierarchies but on the default
hierarchy a zombie reports its cgroup at the time of exit in
/proc/PID/cgroup.  If the cgroup gets removed before the task is
reaped, " (deleted)" is appended to the reported path.

This patchset contains the following 14 patches.

 0001-cgroup-remove-an-unused-parameter-from-cgroup_task_m.patch
 0002-cgroup-make-cgroup-nr_populated-count-the-number-of-.patch
 0003-cgroup-replace-cgroup_has_tasks-with-cgroup_is_popul.patch
 0004-cgroup-move-check_for_release-invocation.patch
 0005-cgroup-relocate-cgroup_-try-get-put.patch
 0006-cgroup-make-css_sets-pin-the-associated-cgroups.patch
 0007-cgroup-make-cgroup_destroy_locked-test-cgroup_is_pop.patch
 0008-cgroup-keep-css_set-and-task-lists-in-chronological-.patch
 0009-cgroup-factor-out-css_set_move_task.patch
 0010-cgroup-reorganize-css_task_iter-functions.patch
 0011-cgroup-don-t-hold-css_set_rwsem-across-css-task-iter.patch
 0012-cgroup-make-css_set_rwsem-a-spinlock-and-rename-it-t.patch
 0013-cgroup-keep-zombies-associated-with-their-original-c.patch
 0014-cgroup-add-cgroup_subsys-free-method-and-use-it-to-f.patch

0001-0007 decouple populated state tracking from css_set existence and
allows css_sets to be pinned without tasks on them.

0008-0012 update css_set task iterator to not hold lock across
iteration steps and replace css_set_rwsem with a spinlock.

0013 makes zombies keep their cgroup associations.  0014 introduces
->exit() method and fixes pids controller.

The patchset is pretty lightly tested and I need to verify that the
corner cases behave as expected.

This patchset is on top of cgroup/for-4.4 a3e72739b7a7 ("cgroup: fix
too early usage of static_branch_disable()") and available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-zombies

diffstat follows.  Thanks.

 Documentation/cgroups/cgroups.txt           |    4 
 Documentation/cgroups/unified-hierarchy.txt |    4 
 include/linux/cgroup-defs.h                 |   16 
 include/linux/cgroup.h                      |   14 
 kernel/cgroup.c                             |  522 +++++++++++++++++-----------
 kernel/cgroup_pids.c                        |    8 
 kernel/cpuset.c                             |    2 
 kernel/events/core.c                        |   16 
 kernel/fork.c                               |    1 
 kernel/sched/core.c                         |   16 
 mm/memcontrol.c                             |    2 
 11 files changed, 354 insertions(+), 251 deletions(-)

--
tejun

             reply	other threads:[~2015-10-10  3:29 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-10  3:29 Tejun Heo [this message]
2015-10-10  3:29 ` [PATCHSET cgroup/for-4.4] cgroup: make zombies retain cgroup membership and fix pids controller Tejun Heo
2015-10-10  3:29 ` [PATCH 02/14] cgroup: make cgroup->nr_populated count the number of populated css_sets Tejun Heo
2015-10-10  3:29 ` [PATCH 04/14] cgroup: move check_for_release() invocation Tejun Heo
2015-10-10  3:29 ` [PATCH 06/14] cgroup: make css_sets pin the associated cgroups Tejun Heo
     [not found]   ` <1444447781-16182-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-10-15  1:34     ` [PATCH v2 " Tejun Heo
2015-10-15  1:34       ` Tejun Heo
2015-10-10  3:29 ` [PATCH 07/14] cgroup: make cgroup_destroy_locked() test cgroup_is_populated() Tejun Heo
2015-10-10  3:29 ` [PATCH 08/14] cgroup: keep css_set and task lists in chronological order Tejun Heo
2015-10-10  3:29 ` [PATCH 09/14] cgroup: factor out css_set_move_task() Tejun Heo
     [not found] ` <1444447781-16182-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-10-10  3:29   ` [PATCH 01/14] cgroup: remove an unused parameter from cgroup_task_migrate() Tejun Heo
2015-10-10  3:29     ` Tejun Heo
2015-10-10  3:29   ` [PATCH 03/14] cgroup: replace cgroup_has_tasks() with cgroup_is_populated() Tejun Heo
2015-10-10  3:29     ` Tejun Heo
2015-10-10  3:29   ` [PATCH 05/14] cgroup: relocate cgroup_[try]get/put() Tejun Heo
2015-10-10  3:29     ` Tejun Heo
2015-10-10  3:29   ` [PATCH 10/14] cgroup: reorganize css_task_iter functions Tejun Heo
2015-10-10  3:29     ` Tejun Heo
2015-10-11 13:30   ` [PATCH 11/14] cgroup: don't hold css_set_rwsem across css task iteration Tejun Heo
2015-10-11 13:30     ` Tejun Heo
2015-10-15  1:35     ` [PATCH v2 " Tejun Heo
2015-10-15  1:35       ` Tejun Heo
2015-10-11 13:30   ` [PATCH 14/14] cgroup: add cgroup_subsys->free() method and use it to fix pids controller Tejun Heo
2015-10-11 13:30     ` Tejun Heo
     [not found]     ` <1444570210-15640-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-10-12 10:29       ` Aleksa Sarai
2015-10-12 10:29         ` Aleksa Sarai
2015-10-12 15:25         ` Tejun Heo
2015-10-11 13:30 ` [PATCH 12/14] cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock Tejun Heo
2015-10-11 13:30 ` [PATCH 13/14] cgroup: keep zombies associated with their original cgroups Tejun Heo
     [not found]   ` <1444570210-15640-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2015-10-12 17:44     ` [PATCH v2 " Tejun Heo
2015-10-12 17:44       ` Tejun Heo
2015-10-15  1:38 ` [PATCHSET cgroup/for-4.4] cgroup: make zombies retain cgroup membership and fix pids controller Tejun Heo
     [not found]   ` <20151015013809.GC20884-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2015-10-15 20:41     ` Tejun Heo
2015-10-15 20:41       ` Tejun Heo
     [not found]       ` <20151015204114.GA3788-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2015-10-19  8:48         ` Zefan Li
2015-10-19  8:48           ` Zefan Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1444447781-16182-1-git-send-email-tj@kernel.org \
    --to=tj-dgejt+ai2ygdnm+yrofe0a@public.gmane.org \
    --cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=cyphar-gVpy/LI/lHzQT0dZR+AlfA@public.gmane.org \
    --cc=hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org \
    --cc=kernel-team-b10kYP2dOMg@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.