cgroups.vger.kernel.org archive mirror
* [PATCHSET cgroup-for-3.14] cgroup: restructure pidlist handling
@ 2013-11-24 22:11 Tejun Heo
From: Tejun Heo @ 2013-11-24 22:11 UTC
  To: lizefan-hv44wF8Li93QT0dZR+AlfA
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

pidlist handling is quite elaborate.  The pidlist files - "tasks" and
"cgroup.procs" - guarantee that the result is sorted, and a task can
be associated with different pids depending on namespaces, with no
inherent order among them.  It is therefore impossible to give a
certain order to tasks of a cgroup and then just iterate through them.

Instead, we end up creating tables of the relevant ids and then sorting
them before serving them out for reads.  As those tables can be huge,
we also implement logic to share those tables if the id type and
namespace match, which in turn involves reference counting those
tables and synchronizing accesses to them.
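
To make the cost concrete, here is a minimal userspace sketch of the
"collect, sort, then serve" step described above.  It is not the code
in kernel/cgroup.c; build_sorted_table() and cmp_pid() are hypothetical
names, and the duplicate removal is an assumption about what a
per-process listing would need when several threads map to the same id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Comparison callback for qsort(): ascending numeric order. */
static int cmp_pid(const void *a, const void *b)
{
	pid_t x = *(const pid_t *)a;
	pid_t y = *(const pid_t *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sort @ids in place and squeeze out duplicates.
 * Returns the number of unique entries left at the front of the array.
 */
static size_t build_sorted_table(pid_t *ids, size_t n)
{
	size_t src, dst;

	if (n == 0)
		return 0;

	qsort(ids, n, sizeof(*ids), cmp_pid);

	/* Keep an entry only if it differs from the last one kept. */
	for (src = 1, dst = 1; src < n; src++)
		if (ids[src] != ids[dst - 1])
			ids[dst++] = ids[src];

	assert(dst <= n);
	return dst;
}
```

The whole table has to exist before the first id can be returned, which
is why sharing and reference counting become attractive once the tables
get large.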

What could have been a simple iteration through the member tasks
became this unnecessary hunk of complexity because it, for some
reason, wanted to guarantee sorted output, which is extremely unusual
for this type of interface.

The refcnting is done from open() and release() callbacks, which
kernfs doesn't expose.  This patchset updates pidlist handling so that
pidlists are managed from seq_file operations proper.  As the duration
between the paired start and stop denotes a single read invocation and
we don't want to reload pidlist for each instance of consecutive read
calls, pidlist is released with a time delay.  This also bounds how
stale the output of read calls can be.  This makes refcnting
unnecessary - locking is simplified and refcnting is dropped.
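
The delayed release can be modeled roughly as below.  This is only a
userspace sketch under stated assumptions: struct pidlist_cache, the
cache_*() helpers and RELEASE_DELAY are hypothetical, time is passed in
explicitly, and expiry is checked lazily on the next start rather than
by a timer or work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RELEASE_DELAY 1	/* hypothetical delay, in arbitrary time units */

struct pidlist_cache {
	bool loaded;		/* is a table currently cached? */
	long expires;		/* time at which the table should be dropped */
	unsigned int loads;	/* how many times the table was (re)built */
};

/* Drop the table if its release deadline has passed. */
static void cache_expire(struct pidlist_cache *c, long now)
{
	if (c->loaded && now >= c->expires)
		c->loaded = false;
}

/* start(): reuse the cached table if still present, else rebuild it. */
static void cache_start(struct pidlist_cache *c, long now)
{
	cache_expire(c, now);
	if (!c->loaded) {
		c->loaded = true;	/* stands in for collecting and sorting ids */
		c->loads++;
	}
}

/* stop(): do not free right away, just (re)arm the release deadline. */
static void cache_stop(struct pidlist_cache *c, long now)
{
	assert(c->loaded);
	c->expires = now + RELEASE_DELAY;
}
```

Back-to-back start/stop pairs inside the delay window share one load,
and a reader never sees data older than the delay plus the duration of
its own read, without any use count being kept.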

In the long term, we want to do away with pidlist and make this a
simple iteration over member tasks.  The last patch scrambles the sort
order of "cgroup.procs" if sane_behavior, so that the sorted
expectation is broken in the new interface and we can eventually drop
pidlist logic.
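
As an illustration of how a sort order can be scrambled while the
table machinery stays intact, the sketch below sorts by a transformed
key instead of the id itself.  The specific mapping (swapping every
pair of adjacent bits) is an assumption chosen because it is one-to-one
and cheap; this cover letter does not say which mapping the patch uses,
and scramble() and sort_scrambled() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Swap every pair of adjacent bits: a one-to-one, self-inverse mapping. */
static uint32_t scramble(uint32_t id)
{
	uint32_t even = id & 0x55555555u;	/* bits 0, 2, 4, ... */
	uint32_t odd = id & 0xAAAAAAAAu;	/* bits 1, 3, 5, ... */

	return (even << 1) | (odd >> 1);
}

/* Comparison callback for qsort(): order by scrambled key, not by id. */
static int cmp_scrambled(const void *a, const void *b)
{
	uint32_t x = scramble(*(const uint32_t *)a);
	uint32_t y = scramble(*(const uint32_t *)b);

	return (x > y) - (x < y);
}

/* Sort @ids so that the output is deterministic but not numerically ordered. */
static void sort_scrambled(uint32_t *ids, size_t n)
{
	assert(ids != NULL || n == 0);
	qsort(ids, n, sizeof(*ids), cmp_scrambled);
}
```

With the ids 1, 2, 3, 4 the keys are 2, 1, 3, 8, so the output order is
2, 1, 3, 4.  Because the mapping is one-to-one, equal ids still end up
adjacent, so duplicate removal keeps working on the scrambled table.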

This patchset contains the following nine patches.

 0001-cgroup-don-t-skip-seq_open-on-write-only-opens-on-pi.patch
 0002-cgroup-remove-cftype-release.patch
 0003-cgroup-implement-delayed-destruction-for-cgroup_pidl.patch
 0004-cgroup-introduce-struct-cgroup_pidlist_open_file.patch
 0005-cgroup-refactor-cgroup_pidlist_find.patch
 0006-cgroup-remove-cgroup_pidlist-rwsem.patch
 0007-cgroup-load-and-release-pidlists-from-seq_file-start.patch
 0008-cgroup-remove-cgroup_pidlist-use_count.patch
 0009-cgroup-don-t-guarantee-cgroup.procs-is-sorted-if-san.patch

0001-0002 are prep patches.

0003-0008 restructure pidlist handling so that it's managed from
seq_file operations.

0009 scrambles sort order of cgroup.procs if sane_behavior.

This patchset is on top of cgroup/for-3.14 edab95103d3a ("cgroup:
Merge branch 'memcg_event' into for-3.14") and available in the
following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-pidlist

diffstat follows.

 include/linux/cgroup.h |    5 
 kernel/cgroup.c        |  310 +++++++++++++++++++++++++++++++------------------
 2 files changed, 204 insertions(+), 111 deletions(-)

Thanks.

--
tejun

Thread overview: 16+ messages
2013-11-24 22:11 [PATCHSET cgroup-for-3.14] cgroup: restructure pidlist handling Tejun Heo
     [not found] ` <1385331096-7918-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-11-24 22:11   ` [PATCH 1/9] cgroup: don't skip seq_open on write only opens on pidlist files Tejun Heo
2013-11-24 22:11   ` [PATCH 2/9] cgroup: remove cftype->release() Tejun Heo
2013-11-24 22:11   ` [PATCH 3/9] cgroup: implement delayed destruction for cgroup_pidlist Tejun Heo
2013-11-24 22:11   ` [PATCH 4/9] cgroup: introduce struct cgroup_pidlist_open_file Tejun Heo
     [not found]     ` <1385331096-7918-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-11-29  1:03       ` Li Zefan
2013-11-29 15:44       ` [PATCH v2 " Tejun Heo
2013-11-24 22:11   ` [PATCH 5/9] cgroup: refactor cgroup_pidlist_find() Tejun Heo
2013-11-24 22:11   ` [PATCH 6/9] cgroup: remove cgroup_pidlist->rwsem Tejun Heo
2013-11-24 22:11   ` [PATCH 7/9] cgroup: load and release pidlists from seq_file start and stop respectively Tejun Heo
     [not found]     ` <1385331096-7918-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2013-11-29 15:45       ` [PATCH v2 " Tejun Heo
2013-11-24 22:11   ` [PATCH 8/9] cgroup: remove cgroup_pidlist->use_count Tejun Heo
2013-11-24 22:11   ` [PATCH 9/9] cgroup: don't guarantee cgroup.procs is sorted if sane_behavior Tejun Heo
2013-11-27 23:23   ` [PATCHSET cgroup-for-3.14] cgroup: restructure pidlist handling Tejun Heo
2013-11-29  1:03   ` Li Zefan
     [not found]     ` <5297E7E2.8080404-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-11-29 15:46       ` Tejun Heo
