Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* Re: [PATCH 02/11] PM: extend PM QoS with per-device wake-up constraints
From: mark gross @ 2011-08-26 16:54 UTC (permalink / raw)
  To: MyungJoo Ham; +Cc: markgross, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <CAJ0PZbRCvoh8-rJAR92Ak_JPk4FyR0dp_OeKOUkWtD5AdFsPCw@mail.gmail.com>

On Fri, Aug 26, 2011 at 11:25:13AM +0900, MyungJoo Ham wrote:
> On Fri, Jul 1, 2011 at 12:11 AM,  <jean.pihet@newoldbits.com> wrote:
> > @@ -129,19 +146,19 @@ static const struct file_operations pm_qos_power_fops = {
> >  /* unlocked internal variant */
> >  static inline int pm_qos_get_value(struct pm_qos_object *o)
> >  {
> > -       if (plist_head_empty(&o->requests))
> > +       if (plist_head_empty(o->requests))
> >                return o->default_value;
> >
> >        switch (o->type) {
> > -       case PM_QOS_MIN:
> > -               return plist_first(&o->requests)->prio;
> > +               case PM_QOS_MIN:
> > +                       return plist_first(o->requests)->prio;
> >
> > -       case PM_QOS_MAX:
> > -               return plist_last(&o->requests)->prio;
> > +               case PM_QOS_MAX:
> > +                       return plist_last(o->requests)->prio;
> >
> > -       default:
> > -               /* runtime check for not using enum */
> > -               BUG();
> > +               default:
> > +                       /* runtime check for not using enum */
> > +                       BUG();
> >        }
> >  }
> 
> Hello,
> 
> 
> Sorry to jump in this late, but, I've got a question in choosing QoS
> value from the list with pm_qos_get_value function and pm_qos_type.
> 
> For QoS objects such as network throughput, wouldn't "PM_QOS_ADD" be
> more appropriate than PM_QOS_MAX?
> If A is requesting 10MB/s on NIC-X and B is requesting 5MB/s on NIC-X,
> I guess PM QOS should say NIC-X that it needs to provide 15MB/s, not
> 10MB/s. Or, are we assuming that A and B will never put streams at the
> same time?

This was brought up a few years back as well.  The reason we kept the
aggregate QoS for network Throughput as a max instead of a sum was that
there where already a number of interfaces for network shaping and we
didn't have a good answer to the problem of what to do when the
throughput qos requested exceeds hardware capabilities.  

Now that I have more experience with handsets I'm not sure it makes any
practical difference to sum or max the throughput qos requests.  So if
someone would like it to aggregate throughputs by summation of requests
that could be done.  

It turns out its the latency requests that are really used.

The network qos parameters also have a problem with discriminating
between what physical interface the QoS is for.  This limitation in the
ABI is partially addressed with some of the work that Jean Pihet's
patches are doing.

--mark

^ permalink raw reply

* Re: [PATCH 02/11] PM: extend PM QoS with per-device wake-up constraints
From: mark gross @ 2011-08-26 16:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: markgross, Mark Brown, Linux PM mailing list, linux-omap,
	Jean Pihet
In-Reply-To: <201108251617.31959.rjw@sisk.pl>

On Thu, Aug 25, 2011 at 04:17:31PM +0200, Rafael J. Wysocki wrote:
> On Thursday, August 25, 2011, Mark Brown wrote:
> > On Tue, Aug 23, 2011 at 11:31:54PM +0200, Rafael J. Wysocki wrote:
> > 
> > > Perhaps.  Still, that requires some policy to be put into drivers,
> > > which I don't think is entirely correct.
> > 
> > So, I don't really have the bandwidth to discuss this properly for at
> > least the next week or so - is it possible to find some time in the
> > power track at LPC for this?
> 
> I think so, I'm going to add PM QoS to the agenda anyway.
>
FWIW I'll be there too.

--mark

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Paul Menage @ 2011-08-26 15:54 UTC (permalink / raw)
  To: Matt Helsley
  Cc: fweisbec, containers, lizf, linux-kernel, James Morris,
	Daisuke Nishimura, Tejun Heo, linux-pm, kamezawa.hiroyu
In-Reply-To: <20110826023818.GC3457@count0.beaverton.ibm.com>

On Thu, Aug 25, 2011 at 7:38 PM, Matt Helsley <matthltc@us.ibm.com> wrote:
> On Fri, Aug 26, 2011 at 12:43:09AM +0200, Tejun Heo wrote:
>> Currently, there's no way to pass multiple tasks to cgroup_subsys
>> methods necessitating the need for separate per-process and per-task
>> methods.  This patch introduces cgroup_taskset which can be used to
>> pass multiple tasks and their associated cgroups to cgroup_subsys
>> methods.
>
> This will be the third iterator-ish pattern in the cgroup code.
> It's not your fault but it does seem a bit much to have:

I agree with this sentiment in principle (in fact it was one of the
first things that I thought when I saw this patch) but I think that
merging them is a non-trivial process, and so shouldn't hold up the
introduction of this patch set.

Changing the cgroup_scanner interface to be control-loop rather than
callback (for the processed tasks, at least - testing tasks will have
to be a callback still, I think) shouldn't be too hard.

The important differences between the iterators then are how they
generate a stable set of tasks. If we made the basic callback
operations be:

start - do any prep work
fill - return some tasks to process, or none if the iterator is exhausted
end - undo any prep work

then the user API functions would look something like:

iter_start(iter *i) {
  i->start();
  i->avail_task_count = i->avail_task_pos = 0;
}

struct task_and_cgroup *iter_next(iter *i) {
  if (i->avail_task_pos >= i->avail_task_count) {
    i->avail_task_pos = 0;
    i->fill(&i->avail_tasks, &i->avail_task_count);
    if (!i->avail_tasks)
      return NULL;
    }
  }
  return i->avail_tasks + i->avail_task_pos++;
}

iter_end(iter *i) {
  i->end();
}

then for the three iteration abstractions:

- cgroup_iter_start/next/end:

creation API: taskset_from_cgroup(cgroup *cg)

start - takes the lock
fill - returns the next task in the cgroup, or NULL if exhausted
end - releases the lock

- cgroup_taskset

creation API: taskset_from_array(task_and_cgroup *tg, int count)

start - no-op
fill - returns a pointer to the array used to set up the taskset
end - no-op

- cgroup_scanner

creation API: atomic_taskset_from_array(cgroup, test_fn, optional_heap);

start - allocate an array for storing the heap used to build the next
batch of tasks (if not provided by creator)
fill - takes css_set lock and makes a pass over the cgroup to find the
lowest (by start-time) set of unprocessed tasks that match the test_fn
predicate, releases css_set_lock
end - frees the array (if not provided by the creator)

I think the progression of patches would be:

- convert the current user of cgroup_scanner to a control-loop API
rather than a callback API (for process_task) just to verify that this
fits in the existing model

- convert the initial implementation of cgroup_taskset to use the new
internal start/fill/end API

- convert the other two iterators (in separate patches) to use the new
internal and external APIs.

Paul

^ permalink raw reply

* Re: [PATCHSET] cgroup: introduce cgroup_taskset and consolidate subsys methods, take#2
From: Paul Menage @ 2011-08-26 14:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: fweisbec, containers, Li Zefan, linux-kernel, linux-pm,
	Andrew Morton, kamezawa.hiroyu
In-Reply-To: <20110826140900.GF2632@htj.dyndns.org>

On Fri, Aug 26, 2011 at 7:09 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Fri, Aug 26, 2011 at 04:05:39PM +0800, Li Zefan wrote:
>> It's fine for me to route those patches through pm-freezer, since this
>> work is initiated and needed for the work on freezer.
>
> Awesome.  Paul, what do you think?  Rafael, if Paul agrees, can you
> please take in this series?
>

Sure - but can you make sure that there's nothing in -mm that
conflicts with this patchset? If there is, then maybe cherry-pick
those into pm-freezer too, and fix them up?

Paul

^ permalink raw reply

* Re: [PATCHSET] cgroup: introduce cgroup_taskset and consolidate subsys methods, take#2
From: Tejun Heo @ 2011-08-26 14:09 UTC (permalink / raw)
  To: Li Zefan
  Cc: fweisbec, containers, linux-kernel, linux-pm, paul,
	kamezawa.hiroyu
In-Reply-To: <4E5753D3.9080500@cn.fujitsu.com>

Hello,

On Fri, Aug 26, 2011 at 04:05:39PM +0800, Li Zefan wrote:
> It's fine for me to route those patches through pm-freezer, since this
> work is initiated and needed for the work on freezer.

Awesome.  Paul, what do you think?  Rafael, if Paul agrees, can you
please take in this series?

Thanks.

-- 
tejun

^ permalink raw reply

* Issue: Runtime API usage in wake-up device irq_handler during wakeup from system-wide-suspend.
From: Govindraj.R @ 2011-08-26 13:01 UTC (permalink / raw)
  To: Rafael J. Wysocki, Paul Walmsley, Kevin Hilman
  Cc: Tero Kristo, Felipe Balbi, Partha Basak, Govindraj R,
	Keshava Munegowda, linux-pm, linux-omap

Hello,

During system_wide_suspend pm runtime is disabled.
I.e. __pm_runtime_disable is called from __device_suspend.
Now, if a wakeup interrupt is triggered and the wakeup device irq handler
is called even before device_resume and pm_runtime_enable happens,
the device irq_handler proceeds to enable clock with runtime API to
handle wakeup event.

Wouldn't this result in system wide abort since the pm_runtime is not enabled
yet from dpm_resume?
As we end up accessing regs after doing runtime get_sync.

Looks like this scenario is not handled currently.
Or Am I missing something here?

--
Thanks,
Govindraj.R

^ permalink raw reply

* Re: [PATCHSET] cgroup: introduce cgroup_taskset and consolidate subsys methods, take#2
From: Li Zefan @ 2011-08-26  8:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: fweisbec, containers, linux-kernel, linux-pm, paul,
	kamezawa.hiroyu
In-Reply-To: <1314312192-26885-1-git-send-email-tj@kernel.org>

Tejun Heo wrote:
> Hello,
> 
> This is the second take of cgroup_taskset patchset.  This patchset
> introduces cgroup_taskset along with some accessors and iterator,
> updates methods to use it, consolidates usages and drops superflous
> methods.  For more info, please read the head message and discussions
> from the last take[L].
> 
> Changes from the last take[L] are mostly minor.
> 
> * Acked-by's added
> * patch contamination fixed
> * local variable renamed
> * documentation updated
> 
> It contains the following six patches.
> 
>  0001-cgroup-subsys-attach_task-should-be-called-after-mig.patch
>  0002-cgroup-improve-old-cgroup-handling-in-cgroup_attach_.patch
>  0003-cgroup-introduce-cgroup_taskset-and-use-it-in-subsys.patch
>  0004-cgroup-don-t-use-subsys-can_attach_task-or-attach_ta.patch
>  0005-cgroup-cpuset-don-t-use-ss-pre_attach.patch
>  0006-cgroup-kill-subsys-can_attach_task-pre_attach-and-at.patch

This is really nice.

Acked-by: Li Zefan <lizf@cn.fujitsu.com>

> 
> and is based on the current linux-pm/pm-freezer (7b5b95b3f5 "freezer:
> remove should_send_signal() and update frozen()"), and available in
> the following git tree.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer
> 
> Any ideas on how to route these patches?  If we float these in -mm,
> further freezer patches would have to be in -mm too which in turn will
> force job control patches depending on them to -mm too.  It would be
> really nice if we can find a stable git branch to host these.  Routing
> through pm-freezer might not be such a bad idea either, I think.
> 

It's fine for me to route those patches through pm-freezer, since this
work is initiated and needed for the work on freezer.

> Thank you.
> 
>  Documentation/cgroups/cgroups.txt |   51 +++------
>  block/blk-cgroup.c                |   45 +++++---
>  include/linux/cgroup.h            |   31 ++++-
>  kernel/cgroup.c                   |  200 ++++++++++++++++++++++++--------------
>  kernel/cgroup_freezer.c           |   16 ---
>  kernel/cpuset.c                   |  105 +++++++++----------
>  kernel/events/core.c              |   13 +-
>  kernel/sched.c                    |   31 +++--
>  mm/memcontrol.c                   |   16 +--
>  security/device_cgroup.c          |    7 -
>  10 files changed, 294 insertions(+), 221 deletions(-)

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Tejun Heo @ 2011-08-26  6:40 UTC (permalink / raw)
  To: Matt Helsley
  Cc: fweisbec, containers, lizf, linux-kernel, James Morris,
	Daisuke Nishimura, linux-pm, paul, kamezawa.hiroyu
In-Reply-To: <20110826023818.GC3457@count0.beaverton.ibm.com>

Hello, Matt.

On Thu, Aug 25, 2011 at 07:38:18PM -0700, Matt Helsley wrote:
> On Fri, Aug 26, 2011 at 12:43:09AM +0200, Tejun Heo wrote:
> > Currently, there's no way to pass multiple tasks to cgroup_subsys
> > methods necessitating the need for separate per-process and per-task
> > methods.  This patch introduces cgroup_taskset which can be used to
> > pass multiple tasks and their associated cgroups to cgroup_subsys
> > methods.
> 
> This will be the third iterator-ish pattern in the cgroup code.
> It's not your fault but it does seem a bit much to have:
...
> Is there a sane way to merge all this?

If merging isn't complicated and simplifies the code and its users,
definitely but I don't think that will be the case here.  If there are
different targets to iterate, having matching iterators isn't a weird
thing after all.  I personally am not too enthusiastic about mixing
locking with iterator (the implied read_lock in the cgroup iterator)
or iteration-by-callback but that's a different issue.

> Perhaps we could drop the iterator interfaces in 1) and 2) and replace
> it with:
> 
> 		tset cgroup_taskset_from_cgroup(cgroup)
> 
> which would grab the css set lock to construct the tset from the given
> cgroup -- essentially inlining the current iterator code into a single
> function.

* It doesn't need to walk the whole cgroup.

* It requires stable set of tasks (ie. atomicity w.r.t. the thread
  group) with sleepable context which is currently achieved by a rwsem
  in the task to protect against fork and getting an extra reference
  to the tasks in the set.

I don't think widening the scope would achieve much (why do we want to
pass in unrelated tasks to subsystems as a single bundle?) while
introducing unnecessary scalability limitation (moving out a single
process ends up being an operation on the whole cgroup).

I don't see how it would be better but, if you do, please go ahead.
Consolidating stuff is (almost) always a good idea. :)

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 6/6] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
From: KAMEZAWA Hiroyuki @ 2011-08-26  4:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: fweisbec, containers, lizf, linux-kernel, linux-pm, paul
In-Reply-To: <1314312192-26885-7-git-send-email-tj@kernel.org>

On Fri, 26 Aug 2011 00:43:12 +0200
Tejun Heo <tj@kernel.org> wrote:

> These three methods are no longer used.  Kill them.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply

* Re: [PATCH 5/6] cgroup, cpuset: don't use ss->pre_attach()
From: KAMEZAWA Hiroyuki @ 2011-08-26  4:19 UTC (permalink / raw)
  To: Tejun Heo; +Cc: fweisbec, containers, lizf, linux-kernel, linux-pm, paul
In-Reply-To: <1314312192-26885-6-git-send-email-tj@kernel.org>

On Fri, 26 Aug 2011 00:43:11 +0200
Tejun Heo <tj@kernel.org> wrote:

> ->pre_attach() is supposed to be called before migration, which is
> observed during process migration but task migration does it the other
> way around.  The only ->pre_attach() user is cpuset which can do the
> same operaitons in ->can_attach().  Collapse cpuset_pre_attach() into
> cpuset_can_attach().
> 
> -v2: Patch contamination from later patch removed.  Spotted by Paul
>      Menage.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply

* Re: [PATCH 4/6] cgroup: don't use subsys->can_attach_task() or ->attach_task()
From: KAMEZAWA Hiroyuki @ 2011-08-26  4:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, fweisbec, containers, Ingo Molnar, lizf,
	linux-kernel, James Morris, Daisuke Nishimura, linux-pm, paul
In-Reply-To: <1314312192-26885-5-git-send-email-tj@kernel.org>

On Fri, 26 Aug 2011 00:43:10 +0200
Tejun Heo <tj@kernel.org> wrote:

> Now that subsys->can_attach() and attach() take @tset instead of
> @task, they can handle per-task operations.  Convert
> ->can_attach_task() and ->attach_task() users to use ->can_attach()
> and attach() instead.  Most converions are straight-forward.
> Noteworthy changes are,
> 
> * In cgroup_freezer, remove unnecessary NULL assignments to unused
>   methods.  It's useless and very prone to get out of sync, which
>   already happened.
> 
> * In cpuset, PF_THREAD_BOUND test is checked for each task.  This
>   doesn't make any practical difference but is conceptually cleaner.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: James Morris <jmorris@namei.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: KAMEZAWA Hiroyuki @ 2011-08-26  4:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: fweisbec, containers, lizf, linux-kernel, James Morris,
	Daisuke Nishimura, linux-pm, paul
In-Reply-To: <1314312192-26885-4-git-send-email-tj@kernel.org>

On Fri, 26 Aug 2011 00:43:09 +0200
Tejun Heo <tj@kernel.org> wrote:

> Currently, there's no way to pass multiple tasks to cgroup_subsys
> methods necessitating the need for separate per-process and per-task
> methods.  This patch introduces cgroup_taskset which can be used to
> pass multiple tasks and their associated cgroups to cgroup_subsys
> methods.
> 
> Three methods - can_attach(), cancel_attach() and attach() - are
> converted to use cgroup_taskset.  This unifies passed parameters so
> that all methods have access to all information.  Conversions in this
> patchset are identical and don't introduce any behavior change.
> 
> -v2: documentation updated as per Paul Menage's suggestion.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: James Morris <jmorris@namei.org>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply

* Re: [PATCH 2/6] cgroup: improve old cgroup handling in cgroup_attach_proc()
From: KAMEZAWA Hiroyuki @ 2011-08-26  4:13 UTC (permalink / raw)
  To: Tejun Heo; +Cc: fweisbec, containers, lizf, linux-kernel, linux-pm, paul
In-Reply-To: <1314312192-26885-3-git-send-email-tj@kernel.org>

On Fri, 26 Aug 2011 00:43:08 +0200
Tejun Heo <tj@kernel.org> wrote:

> cgroup_attach_proc() behaves differently from cgroup_attach_task() in
> the following aspects.
> 
> * All hooks are invoked even if no task is actually being moved.
> 
> * ->can_attach_task() is called for all tasks in the group whether the
>   new cgrp is different from the current cgrp or not; however,
>   ->attach_task() is skipped if new equals new.  This makes the calls
>   asymmetric.
> 
> This patch improves old cgroup handling in cgroup_attach_proc() by
> looking up the current cgroup at the head, recording it in the flex
> array along with the task itself, and using it to remove the above two
> differences.  This will also ease further changes.
> 
> -v2: nr_todo renamed to nr_migrating_tasks as per Paul Menage's
>      suggestion.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Acked-by: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

^ permalink raw reply

* Re: [PATCH 1/6] cgroup: subsys->attach_task() should be called after migration
From: KAMEZAWA Hiroyuki @ 2011-08-26  4:12 UTC (permalink / raw)
  To: Tejun Heo; +Cc: fweisbec, containers, lizf, linux-kernel, linux-pm, paul
In-Reply-To: <1314312192-26885-2-git-send-email-tj@kernel.org>

On Fri, 26 Aug 2011 00:43:07 +0200
Tejun Heo <tj@kernel.org> wrote:

> cgroup_attach_task() calls subsys->attach_task() after
> cgroup_task_migrate(); however, cgroup_attach_proc() calls it before
> migration.  This actually affects some of the users.  Update
> cgroup_attach_proc() such that ->attach_task() is called after
> migration.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>'

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@Jp.fujitsu.com>

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Matt Helsley @ 2011-08-26  2:48 UTC (permalink / raw)
  To: Matt Helsley
  Cc: fweisbec, paul, lizf, linux-kernel, James Morris,
	Daisuke Nishimura, Tejun Heo, linux-pm, containers,
	kamezawa.hiroyu
In-Reply-To: <20110826023818.GC3457@count0.beaverton.ibm.com>

On Thu, Aug 25, 2011 at 07:38:18PM -0700, Matt Helsley wrote:
> On Fri, Aug 26, 2011 at 12:43:09AM +0200, Tejun Heo wrote:
> > Currently, there's no way to pass multiple tasks to cgroup_subsys
> > methods necessitating the need for separate per-process and per-task
> > methods.  This patch introduces cgroup_taskset which can be used to
> > pass multiple tasks and their associated cgroups to cgroup_subsys
> > methods.
> 
> This will be the third iterator-ish pattern in the cgroup code.
> It's not your fault but it does seem a bit much to have:
> 
> 	1) When we need to iterate over all tasks in the cgroup and
> 	don't mind holding the css set lock:
> 		void cgroup_iter_start(cgroup, iterator)
> 		task cgroup_iter_next(cgroup, iterator)
> 		void cgroup_iter_end(cgroup, iterator)
> 
> 	2) For subsystem methods when we're iterating over a subset of
> 	tasks that may or may not be in the cgroup (e.g. for can_attach)
> 	-- use cgroup_tasksets:
> 		task cgroup_taskset_first(tset)
> 		task cgroup_taskset_next(tset)
> 
> 	3) An iterator over all the tasks which doesn't hold the css set
> 	lock:
> 		struct cgroup_scanner {
> 			struct cgroup *cg;
> 			int (*test_task)(struct task_struct *p,
> 					struct cgroup_scanner *scan);
> 			void (*process_task)(struct task_struct *p,
> 					struct cgroup_scanner *scan);
> 			struct ptr_heap *heap;
> 			void *data;
> 		};
> 	This is only used in cpuset code so far. Are other cgroup
> 	patches planning on making use of it?
> 
> Is there a sane way to merge all this?
> 
> Perhaps we could drop the iterator interfaces in 1) and 2) and replace

Oops! I meant to suggest dropping interfaces 1) and 3).

Cheers,
	-Matt

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Matt Helsley @ 2011-08-26  2:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: fweisbec, containers, lizf, linux-kernel, James Morris,
	Daisuke Nishimura, linux-pm, paul, kamezawa.hiroyu
In-Reply-To: <1314312192-26885-4-git-send-email-tj@kernel.org>

On Fri, Aug 26, 2011 at 12:43:09AM +0200, Tejun Heo wrote:
> Currently, there's no way to pass multiple tasks to cgroup_subsys
> methods necessitating the need for separate per-process and per-task
> methods.  This patch introduces cgroup_taskset which can be used to
> pass multiple tasks and their associated cgroups to cgroup_subsys
> methods.

This will be the third iterator-ish pattern in the cgroup code.
It's not your fault but it does seem a bit much to have:

	1) When we need to iterate over all tasks in the cgroup and
	don't mind holding the css set lock:
		void cgroup_iter_start(cgroup, iterator)
		task cgroup_iter_next(cgroup, iterator)
		void cgroup_iter_end(cgroup, iterator)

	2) For subsystem methods when we're iterating over a subset of
	tasks that may or may not be in the cgroup (e.g. for can_attach)
	-- use cgroup_tasksets:
		task cgroup_taskset_first(tset)
		task cgroup_taskset_next(tset)

	3) An iterator over all the tasks which doesn't hold the css set
	lock:
		struct cgroup_scanner {
			struct cgroup *cg;
			int (*test_task)(struct task_struct *p,
					struct cgroup_scanner *scan);
			void (*process_task)(struct task_struct *p,
					struct cgroup_scanner *scan);
			struct ptr_heap *heap;
			void *data;
		};
	This is only used in cpuset code so far. Are other cgroup
	patches planning on making use of it?

Is there a sane way to merge all this?

Perhaps we could drop the iterator interfaces in 1) and 2) and replace
it with:

		tset cgroup_taskset_from_cgroup(cgroup)

which would grab the css set lock to construct the tset from the given
cgroup -- essentially inlining the current iterator code into a single
function.

Of course this would eliminate the restriction that all the tasks in the
taskset are part of the same thread group; that would only be true for
those passed as parameters to the subsystem methods.

Cheers,
	-Matt Helsley

^ permalink raw reply

* Re: [PATCH 02/11] PM: extend PM QoS with per-device wake-up constraints
From: MyungJoo Ham @ 2011-08-26  2:25 UTC (permalink / raw)
  To: jean.pihet; +Cc: markgross, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1309446685-17502-3-git-send-email-j-pihet@ti.com>

On Fri, Jul 1, 2011 at 12:11 AM,  <jean.pihet@newoldbits.com> wrote:
> @@ -129,19 +146,19 @@ static const struct file_operations pm_qos_power_fops = {
>  /* unlocked internal variant */
>  static inline int pm_qos_get_value(struct pm_qos_object *o)
>  {
> -       if (plist_head_empty(&o->requests))
> +       if (plist_head_empty(o->requests))
>                return o->default_value;
>
>        switch (o->type) {
> -       case PM_QOS_MIN:
> -               return plist_first(&o->requests)->prio;
> +               case PM_QOS_MIN:
> +                       return plist_first(o->requests)->prio;
>
> -       case PM_QOS_MAX:
> -               return plist_last(&o->requests)->prio;
> +               case PM_QOS_MAX:
> +                       return plist_last(o->requests)->prio;
>
> -       default:
> -               /* runtime check for not using enum */
> -               BUG();
> +               default:
> +                       /* runtime check for not using enum */
> +                       BUG();
>        }
>  }

Hello,


Sorry to jump in this late, but, I've got a question in choosing QoS
value from the list with pm_qos_get_value function and pm_qos_type.

For QoS objects such as network throughput, wouldn't "PM_QOS_ADD" be
more appropriate than PM_QOS_MAX?
If A is requesting 10MB/s on NIC-X and B is requesting 5MB/s on NIC-X,
I guess PM QOS should say NIC-X that it needs to provide 15MB/s, not
10MB/s. Or, are we assuming that A and B will never put streams at the
same time?


Thanks,
MyungJoo

-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab, DMC Business, Samsung Electronics

^ permalink raw reply

* Re: [PATCHSET] cgroup: introduce cgroup_taskset and consolidate subsys methods, take#2
From: Tejun Heo @ 2011-08-25 22:48 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, linux-kernel, linux-pm, kamezawa.hiroyu
In-Reply-To: <1314312192-26885-1-git-send-email-tj@kernel.org>

On Fri, Aug 26, 2011 at 12:43:06AM +0200, Tejun Heo wrote:
> and is based on the current linux-pm/pm-freezer (7b5b95b3f5 "freezer:
> remove should_send_signal() and update frozen()"), and available in
> the following git tree.
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer

Oops, that should have been

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cgroup_taskset

The above hasn't been properly mirrored yet (it should soon).  The
commit ID should be 53fd922e120c4ba0b65e388c2cda3c858f92db7c.
master.korg contains the master copy.

 ssh://master.kernel.org/pub/scm/linux/kernel/git/tj/misc.git cgroup_taskset

Thanks.

-- 
tejun

^ permalink raw reply

* [PATCH 6/6] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
From: Tejun Heo @ 2011-08-25 22:43 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, linux-kernel, Tejun Heo, linux-pm,
	kamezawa.hiroyu
In-Reply-To: <1314312192-26885-1-git-send-email-tj@kernel.org>

These three methods are no longer used.  Kill them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
 Documentation/cgroups/cgroups.txt |   20 --------------
 include/linux/cgroup.h            |    3 --
 kernel/cgroup.c                   |   53 +++---------------------------------
 3 files changed, 5 insertions(+), 71 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index bf5d6c9..eb1b609 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -615,13 +615,6 @@ fork. If this method returns 0 (success) then this should remain valid
 while the caller holds cgroup_mutex and it is ensured that either
 attach() or cancel_attach() will be called in future.
 
-int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk);
-(cgroup_mutex held by caller)
-
-As can_attach, but for operations that must be run once per task to be
-attached (possibly many when using cgroup_attach_proc). Called after
-can_attach.
-
 void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 		   struct cgroup_taskset *tset)
 (cgroup_mutex held by caller)
@@ -632,12 +625,6 @@ function, so that the subsystem can implement a rollback. If not, not necessary.
 This will be called only about subsystems whose can_attach() operation have
 succeeded. The parameters are identical to can_attach().
 
-void pre_attach(struct cgroup *cgrp);
-(cgroup_mutex held by caller)
-
-For any non-per-thread attachment work that needs to happen before
-attach_task. Needed by cpuset.
-
 void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 	    struct cgroup_taskset *tset)
 (cgroup_mutex held by caller)
@@ -646,13 +633,6 @@ Called after the task has been attached to the cgroup, to allow any
 post-attachment activity that requires memory allocations or blocking.
 The parameters are identical to can_attach().
 
-void attach_task(struct cgroup *cgrp, struct task_struct *tsk);
-(cgroup_mutex held by caller)
-
-As attach, but for operations that must be run once per task to be attached,
-like can_attach_task. Called before attach. Currently does not support any
-subsystem that might need the old_cgrp for every thread in the group.
-
 void fork(struct cgroup_subsy *ss, struct task_struct *task)
 
 Called when a task is forked into a cgroup.
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 2470c8e..5659d37 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -490,11 +490,8 @@ struct cgroup_subsys {
 	void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			  struct cgroup_taskset *tset);
-	int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
 	void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			      struct cgroup_taskset *tset);
-	void (*pre_attach)(struct cgroup *cgrp);
-	void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
 	void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 		       struct cgroup_taskset *tset);
 	void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f3c7f7a..ce765ec 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1926,13 +1926,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 				goto out;
 			}
 		}
-		if (ss->can_attach_task) {
-			retval = ss->can_attach_task(cgrp, tsk);
-			if (retval) {
-				failed_ss = ss;
-				goto out;
-			}
-		}
 	}
 
 	retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, false);
@@ -1940,10 +1933,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 		goto out;
 
 	for_each_subsys(root, ss) {
-		if (ss->pre_attach)
-			ss->pre_attach(cgrp);
-		if (ss->attach_task)
-			ss->attach_task(cgrp, tsk);
 		if (ss->attach)
 			ss->attach(ss, cgrp, &tset);
 	}
@@ -2075,7 +2064,6 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 {
 	int retval, i, group_size, nr_migrating_tasks;
 	struct cgroup_subsys *ss, *failed_ss = NULL;
-	bool cancel_failed_ss = false;
 	/* guaranteed to be initialized later, but the compiler needs this */
 	struct css_set *oldcg;
 	struct cgroupfs_root *root = cgrp->root;
@@ -2166,21 +2154,6 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 				goto out_cancel_attach;
 			}
 		}
-		/* a callback to be run on every thread in the threadgroup. */
-		if (ss->can_attach_task) {
-			/* run on each task in the threadgroup. */
-			for (i = 0; i < group_size; i++) {
-				tc = flex_array_get(group, i);
-				if (tc->cgrp == cgrp)
-					continue;
-				retval = ss->can_attach_task(cgrp, tc->task);
-				if (retval) {
-					failed_ss = ss;
-					cancel_failed_ss = true;
-					goto out_cancel_attach;
-				}
-			}
-		}
 	}
 
 	/*
@@ -2217,15 +2190,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	}
 
 	/*
-	 * step 3: now that we're guaranteed success wrt the css_sets, proceed
-	 * to move all tasks to the new cgroup, calling ss->attach_task for each
-	 * one along the way. there are no failure cases after here, so this is
-	 * the commit point.
+	 * step 3: now that we're guaranteed success wrt the css_sets,
+	 * proceed to move all tasks to the new cgroup.  There are no
+	 * failure cases after here, so this is the commit point.
 	 */
-	for_each_subsys(root, ss) {
-		if (ss->pre_attach)
-			ss->pre_attach(cgrp);
-	}
 	for (i = 0; i < group_size; i++) {
 		tc = flex_array_get(group, i);
 		/* leave current thread as it is if it's already there */
@@ -2235,19 +2203,11 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 		/* if the thread is PF_EXITING, it can just get skipped. */
 		retval = cgroup_task_migrate(cgrp, tc->cgrp, tc->task, true);
 		BUG_ON(retval != 0 && retval != -ESRCH);
-
-		/* attach each task to each subsystem */
-		for_each_subsys(root, ss) {
-			if (ss->attach_task)
-				ss->attach_task(cgrp, tc->task);
-		}
 	}
 	/* nothing is sensitive to fork() after this point. */
 
 	/*
-	 * step 4: do expensive, non-thread-specific subsystem callbacks.
-	 * TODO: if ever a subsystem needs to know the oldcgrp for each task
-	 * being moved, this call will need to be reworked to communicate that.
+	 * step 4: do subsystem attach callbacks.
 	 */
 	for_each_subsys(root, ss) {
 		if (ss->attach)
@@ -2271,11 +2231,8 @@ out_cancel_attach:
 	/* same deal as in cgroup_attach_task */
 	if (retval) {
 		for_each_subsys(root, ss) {
-			if (ss == failed_ss) {
-				if (cancel_failed_ss && ss->cancel_attach)
-					ss->cancel_attach(ss, cgrp, &tset);
+			if (ss == failed_ss)
 				break;
-			}
 			if (ss->cancel_attach)
 				ss->cancel_attach(ss, cgrp, &tset);
 		}
-- 
1.7.6

^ permalink raw reply related

* [PATCH 5/6] cgroup, cpuset: don't use ss->pre_attach()
From: Tejun Heo @ 2011-08-25 22:43 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, linux-kernel, Tejun Heo, linux-pm,
	kamezawa.hiroyu
In-Reply-To: <1314312192-26885-1-git-send-email-tj@kernel.org>

->pre_attach() is supposed to be called before migration, which is
observed during process migration but task migration does it the other
way around.  The only ->pre_attach() user is cpuset which can do the
same operaitons in ->can_attach().  Collapse cpuset_pre_attach() into
cpuset_can_attach().

-v2: Patch contamination from later patch removed.  Spotted by Paul
     Menage.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/cpuset.c |   29 ++++++++++++-----------------
 1 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 472ddd6..f0b8df3 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1367,6 +1367,15 @@ static int fmeter_getrate(struct fmeter *fmp)
 	return val;
 }
 
+/*
+ * Protected by cgroup_lock. The nodemasks must be stored globally because
+ * dynamically allocating them is not allowed in can_attach, and they must
+ * persist until attach.
+ */
+static cpumask_var_t cpus_attach;
+static nodemask_t cpuset_attach_nodemask_from;
+static nodemask_t cpuset_attach_nodemask_to;
+
 /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
 static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			     struct cgroup_taskset *tset)
@@ -1393,29 +1402,16 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 		if ((ret = security_task_setscheduler(task)))
 			return ret;
 	}
-	return 0;
-}
-
-/*
- * Protected by cgroup_lock. The nodemasks must be stored globally because
- * dynamically allocating them is not allowed in pre_attach, and they must
- * persist among pre_attach, and attach.
- */
-static cpumask_var_t cpus_attach;
-static nodemask_t cpuset_attach_nodemask_from;
-static nodemask_t cpuset_attach_nodemask_to;
-
-/* Set-up work for before attaching each task. */
-static void cpuset_pre_attach(struct cgroup *cont)
-{
-	struct cpuset *cs = cgroup_cs(cont);
 
+	/* prepare for attach */
 	if (cs == &top_cpuset)
 		cpumask_copy(cpus_attach, cpu_possible_mask);
 	else
 		guarantee_online_cpus(cs, cpus_attach);
 
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
+
+	return 0;
 }
 
 static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
@@ -1901,7 +1897,6 @@ struct cgroup_subsys cpuset_subsys = {
 	.create = cpuset_create,
 	.destroy = cpuset_destroy,
 	.can_attach = cpuset_can_attach,
-	.pre_attach = cpuset_pre_attach,
 	.attach = cpuset_attach,
 	.populate = cpuset_populate,
 	.post_clone = cpuset_post_clone,
-- 
1.7.6

^ permalink raw reply related

* [PATCH 4/6] cgroup: don't use subsys->can_attach_task() or ->attach_task()
From: Tejun Heo @ 2011-08-25 22:43 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: Peter Zijlstra, fweisbec, containers, Ingo Molnar,
	Daisuke Nishimura, linux-kernel, James Morris, Tejun Heo,
	linux-pm, kamezawa.hiroyu
In-Reply-To: <1314312192-26885-1-git-send-email-tj@kernel.org>

Now that subsys->can_attach() and attach() take @tset instead of
@task, they can handle per-task operations.  Convert
->can_attach_task() and ->attach_task() users to use ->can_attach()
and attach() instead.  Most converions are straight-forward.
Noteworthy changes are,

* In cgroup_freezer, remove unnecessary NULL assignments to unused
  methods.  It's useless and very prone to get out of sync, which
  already happened.

* In cpuset, PF_THREAD_BOUND test is checked for each task.  This
  doesn't make any practical difference but is conceptually cleaner.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: James Morris <jmorris@namei.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 block/blk-cgroup.c      |   45 +++++++++++++++++++-----------
 kernel/cgroup_freezer.c |   14 +++-------
 kernel/cpuset.c         |   70 +++++++++++++++++++++-------------------------
 kernel/events/core.c    |   13 +++++---
 kernel/sched.c          |   31 +++++++++++++--------
 5 files changed, 91 insertions(+), 82 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index bcaf16e..99e0bd4 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -30,8 +30,10 @@ EXPORT_SYMBOL_GPL(blkio_root_cgroup);
 
 static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
 						  struct cgroup *);
-static int blkiocg_can_attach_task(struct cgroup *, struct task_struct *);
-static void blkiocg_attach_task(struct cgroup *, struct task_struct *);
+static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
+			      struct cgroup_taskset *);
+static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
+			   struct cgroup_taskset *);
 static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
 static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
 
@@ -44,8 +46,8 @@ static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
 struct cgroup_subsys blkio_subsys = {
 	.name = "blkio",
 	.create = blkiocg_create,
-	.can_attach_task = blkiocg_can_attach_task,
-	.attach_task = blkiocg_attach_task,
+	.can_attach = blkiocg_can_attach,
+	.attach = blkiocg_attach,
 	.destroy = blkiocg_destroy,
 	.populate = blkiocg_populate,
 #ifdef CONFIG_BLK_CGROUP
@@ -1614,30 +1616,39 @@ done:
  * of the main cic data structures.  For now we allow a task to change
  * its cgroup only if it's the only owner of its ioc.
  */
-static int blkiocg_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
+static int blkiocg_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			      struct cgroup_taskset *tset)
 {
+	struct task_struct *task;
 	struct io_context *ioc;
 	int ret = 0;
 
 	/* task_lock() is needed to avoid races with exit_io_context() */
-	task_lock(tsk);
-	ioc = tsk->io_context;
-	if (ioc && atomic_read(&ioc->nr_tasks) > 1)
-		ret = -EINVAL;
-	task_unlock(tsk);
-
+	cgroup_taskset_for_each(task, cgrp, tset) {
+		task_lock(task);
+		ioc = task->io_context;
+		if (ioc && atomic_read(&ioc->nr_tasks) > 1)
+			ret = -EINVAL;
+		task_unlock(task);
+		if (ret)
+			break;
+	}
 	return ret;
 }
 
-static void blkiocg_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
+static void blkiocg_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			   struct cgroup_taskset *tset)
 {
+	struct task_struct *task;
 	struct io_context *ioc;
 
-	task_lock(tsk);
-	ioc = tsk->io_context;
-	if (ioc)
-		ioc->cgroup_changed = 1;
-	task_unlock(tsk);
+	cgroup_taskset_for_each(task, cgrp, tset) {
+		task_lock(task);
+		ioc = task->io_context;
+		if (ioc)
+			ioc->cgroup_changed = 1;
+		task_unlock(task);
+	}
 }
 
 void blkio_policy_register(struct blkio_policy_type *blkiop)
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index a2b0082..2cb5e72 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -162,10 +162,14 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
 			      struct cgroup_taskset *tset)
 {
 	struct freezer *freezer;
+	struct task_struct *task;
 
 	/*
 	 * Anything frozen can't move or be moved to/from.
 	 */
+	cgroup_taskset_for_each(task, new_cgroup, tset)
+		if (cgroup_freezing(task))
+			return -EBUSY;
 
 	freezer = cgroup_freezer(new_cgroup);
 	if (freezer->state != CGROUP_THAWED)
@@ -174,11 +178,6 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
 	return 0;
 }
 
-static int freezer_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
-{
-	return cgroup_freezing(tsk) ? -EBUSY : 0;
-}
-
 static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 {
 	struct freezer *freezer;
@@ -374,10 +373,5 @@ struct cgroup_subsys freezer_subsys = {
 	.populate	= freezer_populate,
 	.subsys_id	= freezer_subsys_id,
 	.can_attach	= freezer_can_attach,
-	.can_attach_task = freezer_can_attach_task,
-	.pre_attach	= NULL,
-	.attach_task	= NULL,
-	.attach		= NULL,
 	.fork		= freezer_fork,
-	.exit		= NULL,
 };
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 2e5825b..472ddd6 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1372,33 +1372,34 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			     struct cgroup_taskset *tset)
 {
 	struct cpuset *cs = cgroup_cs(cgrp);
+	struct task_struct *task;
+	int ret;
 
 	if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))
 		return -ENOSPC;
 
-	/*
-	 * Kthreads bound to specific cpus cannot be moved to a new cpuset; we
-	 * cannot change their cpu affinity and isolating such threads by their
-	 * set of allowed nodes is unnecessary.  Thus, cpusets are not
-	 * applicable for such threads.  This prevents checking for success of
-	 * set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may
-	 * be changed.
-	 */
-	if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND)
-		return -EINVAL;
-
+	cgroup_taskset_for_each(task, cgrp, tset) {
+		/*
+		 * Kthreads bound to specific cpus cannot be moved to a new
+		 * cpuset; we cannot change their cpu affinity and
+		 * isolating such threads by their set of allowed nodes is
+		 * unnecessary.  Thus, cpusets are not applicable for such
+		 * threads.  This prevents checking for success of
+		 * set_cpus_allowed_ptr() on all attached tasks before
+		 * cpus_allowed may be changed.
+		 */
+		if (task->flags & PF_THREAD_BOUND)
+			return -EINVAL;
+		if ((ret = security_task_setscheduler(task)))
+			return ret;
+	}
 	return 0;
 }
 
-static int cpuset_can_attach_task(struct cgroup *cgrp, struct task_struct *task)
-{
-	return security_task_setscheduler(task);
-}
-
 /*
  * Protected by cgroup_lock. The nodemasks must be stored globally because
  * dynamically allocating them is not allowed in pre_attach, and they must
- * persist among pre_attach, attach_task, and attach.
+ * persist among pre_attach, and attach.
  */
 static cpumask_var_t cpus_attach;
 static nodemask_t cpuset_attach_nodemask_from;
@@ -1417,39 +1418,34 @@ static void cpuset_pre_attach(struct cgroup *cont)
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 }
 
-/* Per-thread attachment work. */
-static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk)
-{
-	int err;
-	struct cpuset *cs = cgroup_cs(cont);
-
-	/*
-	 * can_attach beforehand should guarantee that this doesn't fail.
-	 * TODO: have a better way to handle failure here
-	 */
-	err = set_cpus_allowed_ptr(tsk, cpus_attach);
-	WARN_ON_ONCE(err);
-
-	cpuset_change_task_nodemask(tsk, &cpuset_attach_nodemask_to);
-	cpuset_update_task_spread_flag(cs, tsk);
-}
-
 static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			  struct cgroup_taskset *tset)
 {
 	struct mm_struct *mm;
-	struct task_struct *tsk = cgroup_taskset_first(tset);
+	struct task_struct *task;
+	struct task_struct *leader = cgroup_taskset_first(tset);
 	struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset);
 	struct cpuset *cs = cgroup_cs(cgrp);
 	struct cpuset *oldcs = cgroup_cs(oldcgrp);
 
+	cgroup_taskset_for_each(task, cgrp, tset) {
+		/*
+		 * can_attach beforehand should guarantee that this doesn't
+		 * fail.  TODO: have a better way to handle failure here
+		 */
+		WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
+
+		cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
+		cpuset_update_task_spread_flag(cs, task);
+	}
+
 	/*
 	 * Change mm, possibly for multiple threads in a threadgroup. This is
 	 * expensive and may sleep.
 	 */
 	cpuset_attach_nodemask_from = oldcs->mems_allowed;
 	cpuset_attach_nodemask_to = cs->mems_allowed;
-	mm = get_task_mm(tsk);
+	mm = get_task_mm(leader);
 	if (mm) {
 		mpol_rebind_mm(mm, &cpuset_attach_nodemask_to);
 		if (is_memory_migrate(cs))
@@ -1905,9 +1901,7 @@ struct cgroup_subsys cpuset_subsys = {
 	.create = cpuset_create,
 	.destroy = cpuset_destroy,
 	.can_attach = cpuset_can_attach,
-	.can_attach_task = cpuset_can_attach_task,
 	.pre_attach = cpuset_pre_attach,
-	.attach_task = cpuset_attach_task,
 	.attach = cpuset_attach,
 	.populate = cpuset_populate,
 	.post_clone = cpuset_post_clone,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b8785e2..95e189d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7000,10 +7000,13 @@ static int __perf_cgroup_move(void *info)
 	return 0;
 }
 
-static void
-perf_cgroup_attach_task(struct cgroup *cgrp, struct task_struct *task)
+static void perf_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			       struct cgroup_taskset *tset)
 {
-	task_function_call(task, __perf_cgroup_move, task);
+	struct task_struct *task;
+
+	cgroup_taskset_for_each(task, cgrp, tset)
+		task_function_call(task, __perf_cgroup_move, task);
 }
 
 static void perf_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
@@ -7017,7 +7020,7 @@ static void perf_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
 	if (!(task->flags & PF_EXITING))
 		return;
 
-	perf_cgroup_attach_task(cgrp, task);
+	task_function_call(task, __perf_cgroup_move, task);
 }
 
 struct cgroup_subsys perf_subsys = {
@@ -7026,6 +7029,6 @@ struct cgroup_subsys perf_subsys = {
 	.create		= perf_cgroup_create,
 	.destroy	= perf_cgroup_destroy,
 	.exit		= perf_cgroup_exit,
-	.attach_task	= perf_cgroup_attach_task,
+	.attach		= perf_cgroup_attach,
 };
 #endif /* CONFIG_CGROUP_PERF */
diff --git a/kernel/sched.c b/kernel/sched.c
index ccacdbd..dd7e460 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -8966,24 +8966,31 @@ cpu_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
 	sched_destroy_group(tg);
 }
 
-static int
-cpu_cgroup_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
+static int cpu_cgroup_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+				 struct cgroup_taskset *tset)
 {
+	struct task_struct *task;
+
+	cgroup_taskset_for_each(task, cgrp, tset) {
 #ifdef CONFIG_RT_GROUP_SCHED
-	if (!sched_rt_can_attach(cgroup_tg(cgrp), tsk))
-		return -EINVAL;
+		if (!sched_rt_can_attach(cgroup_tg(cgrp), task))
+			return -EINVAL;
 #else
-	/* We don't support RT-tasks being in separate groups */
-	if (tsk->sched_class != &fair_sched_class)
-		return -EINVAL;
+		/* We don't support RT-tasks being in separate groups */
+		if (task->sched_class != &fair_sched_class)
+			return -EINVAL;
 #endif
+	}
 	return 0;
 }
 
-static void
-cpu_cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
+static void cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			      struct cgroup_taskset *tset)
 {
-	sched_move_task(tsk);
+	struct task_struct *task;
+
+	cgroup_taskset_for_each(task, cgrp, tset)
+		sched_move_task(task);
 }
 
 static void
@@ -9071,8 +9078,8 @@ struct cgroup_subsys cpu_cgroup_subsys = {
 	.name		= "cpu",
 	.create		= cpu_cgroup_create,
 	.destroy	= cpu_cgroup_destroy,
-	.can_attach_task = cpu_cgroup_can_attach_task,
-	.attach_task	= cpu_cgroup_attach_task,
+	.can_attach	= cpu_cgroup_can_attach,
+	.attach		= cpu_cgroup_attach,
 	.exit		= cpu_cgroup_exit,
 	.populate	= cpu_cgroup_populate,
 	.subsys_id	= cpu_cgroup_subsys_id,
-- 
1.7.6

^ permalink raw reply related

* [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Tejun Heo @ 2011-08-25 22:43 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, Daisuke Nishimura, linux-kernel,
	James Morris, Tejun Heo, linux-pm, kamezawa.hiroyu
In-Reply-To: <1314312192-26885-1-git-send-email-tj@kernel.org>

Currently, there's no way to pass multiple tasks to cgroup_subsys
methods necessitating the need for separate per-process and per-task
methods.  This patch introduces cgroup_taskset which can be used to
pass multiple tasks and their associated cgroups to cgroup_subsys
methods.

Three methods - can_attach(), cancel_attach() and attach() - are
converted to use cgroup_taskset.  This unifies passed parameters so
that all methods have access to all information.  Conversions in this
patchset are identical and don't introduce any behavior change.

-v2: documentation updated as per Paul Menage's suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: James Morris <jmorris@namei.org>
---
 Documentation/cgroups/cgroups.txt |   31 ++++++++----
 include/linux/cgroup.h            |   28 +++++++++-
 kernel/cgroup.c                   |   99 +++++++++++++++++++++++++++++++++----
 kernel/cgroup_freezer.c           |    2 +-
 kernel/cpuset.c                   |   18 ++++---
 mm/memcontrol.c                   |   16 +++---
 security/device_cgroup.c          |    7 ++-
 7 files changed, 158 insertions(+), 43 deletions(-)

diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index cd67e90..bf5d6c9 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -594,15 +594,25 @@ rmdir() will fail with it. From this behavior, pre_destroy() can be
 called multiple times against a cgroup.
 
 int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
-	       struct task_struct *task)
+	       struct cgroup_taskset *tset)
 (cgroup_mutex held by caller)
 
-Called prior to moving a task into a cgroup; if the subsystem
-returns an error, this will abort the attach operation.  If a NULL
-task is passed, then a successful result indicates that *any*
-unspecified task can be moved into the cgroup. Note that this isn't
-called on a fork. If this method returns 0 (success) then this should
-remain valid while the caller holds cgroup_mutex and it is ensured that either
+Called prior to moving one or more tasks into a cgroup; if the
+subsystem returns an error, this will abort the attach operation.
+@tset contains the tasks to be attached and is guaranteed to have at
+least one task in it.
+
+If there are multiple tasks in the taskset, then:
+  - it's guaranteed that all are from the same thread group
+  - @tset contains all tasks from the thread group whether or not
+    they're switching cgroups
+  - the first task is the leader
+
+Each @tset entry also contains the task's old cgroup and tasks which
+aren't switching cgroup can be skipped easily using the
+cgroup_taskset_for_each() iterator. Note that this isn't called on a
+fork. If this method returns 0 (success) then this should remain valid
+while the caller holds cgroup_mutex and it is ensured that either
 attach() or cancel_attach() will be called in future.
 
 int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk);
@@ -613,14 +623,14 @@ attached (possibly many when using cgroup_attach_proc). Called after
 can_attach.
 
 void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
-	       struct task_struct *task, bool threadgroup)
+		   struct cgroup_taskset *tset)
 (cgroup_mutex held by caller)
 
 Called when a task attach operation has failed after can_attach() has succeeded.
 A subsystem whose can_attach() has some side-effects should provide this
 function, so that the subsystem can implement a rollback. If not, not necessary.
 This will be called only about subsystems whose can_attach() operation have
-succeeded.
+succeeded. The parameters are identical to can_attach().
 
 void pre_attach(struct cgroup *cgrp);
 (cgroup_mutex held by caller)
@@ -629,11 +639,12 @@ For any non-per-thread attachment work that needs to happen before
 attach_task. Needed by cpuset.
 
 void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
-	    struct cgroup *old_cgrp, struct task_struct *task)
+	    struct cgroup_taskset *tset)
 (cgroup_mutex held by caller)
 
 Called after the task has been attached to the cgroup, to allow any
 post-attachment activity that requires memory allocations or blocking.
+The parameters are identical to can_attach().
 
 void attach_task(struct cgroup *cgrp, struct task_struct *tsk);
 (cgroup_mutex held by caller)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index da7e4bc..2470c8e 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -457,6 +457,28 @@ void cgroup_exclude_rmdir(struct cgroup_subsys_state *css);
 void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css);
 
 /*
+ * Control Group taskset, used to pass around set of tasks to cgroup_subsys
+ * methods.
+ */
+struct cgroup_taskset;
+struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset);
+struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset);
+struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset);
+int cgroup_taskset_size(struct cgroup_taskset *tset);
+
+/**
+ * cgroup_taskset_for_each - iterate cgroup_taskset
+ * @task: the loop cursor
+ * @skip_cgrp: skip if task's cgroup matches this, %NULL to iterate through all
+ * @tset: taskset to iterate
+ */
+#define cgroup_taskset_for_each(task, skip_cgrp, tset)			\
+	for ((task) = cgroup_taskset_first((tset)); (task);		\
+	     (task) = cgroup_taskset_next((tset)))			\
+		if (!(skip_cgrp) ||					\
+		    cgroup_taskset_cur_cgroup((tset)) != (skip_cgrp))
+
+/*
  * Control Group subsystem type.
  * See Documentation/cgroups/cgroups.txt for details
  */
@@ -467,14 +489,14 @@ struct cgroup_subsys {
 	int (*pre_destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
 	int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
-			  struct task_struct *tsk);
+			  struct cgroup_taskset *tset);
 	int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
 	void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
-			      struct task_struct *tsk);
+			      struct cgroup_taskset *tset);
 	void (*pre_attach)(struct cgroup *cgrp);
 	void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
 	void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
-		       struct cgroup *old_cgrp, struct task_struct *tsk);
+		       struct cgroup_taskset *tset);
 	void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
 	void (*exit)(struct cgroup_subsys *ss, struct cgroup *cgrp,
 			struct cgroup *old_cgrp, struct task_struct *task);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 8a47380..f3c7f7a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1739,11 +1739,85 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 }
 EXPORT_SYMBOL_GPL(cgroup_path);
 
+/*
+ * Control Group taskset
+ */
 struct task_and_cgroup {
 	struct task_struct	*task;
 	struct cgroup		*cgrp;
 };
 
+struct cgroup_taskset {
+	struct task_and_cgroup	single;
+	struct flex_array	*tc_array;
+	int			tc_array_len;
+	int			idx;
+	struct cgroup		*cur_cgrp;
+};
+
+/**
+ * cgroup_taskset_first - reset taskset and return the first task
+ * @tset: taskset of interest
+ *
+ * @tset iteration is initialized and the first task is returned.
+ */
+struct task_struct *cgroup_taskset_first(struct cgroup_taskset *tset)
+{
+	if (tset->tc_array) {
+		tset->idx = 0;
+		return cgroup_taskset_next(tset);
+	} else {
+		tset->cur_cgrp = tset->single.cgrp;
+		return tset->single.task;
+	}
+}
+EXPORT_SYMBOL_GPL(cgroup_taskset_first);
+
+/**
+ * cgroup_taskset_next - iterate to the next task in taskset
+ * @tset: taskset of interest
+ *
+ * Return the next task in @tset.  Iteration must have been initialized
+ * with cgroup_taskset_first().
+ */
+struct task_struct *cgroup_taskset_next(struct cgroup_taskset *tset)
+{
+	struct task_and_cgroup *tc;
+
+	if (!tset->tc_array || tset->idx >= tset->tc_array_len)
+		return NULL;
+
+	tc = flex_array_get(tset->tc_array, tset->idx++);
+	tset->cur_cgrp = tc->cgrp;
+	return tc->task;
+}
+EXPORT_SYMBOL_GPL(cgroup_taskset_next);
+
+/**
+ * cgroup_taskset_cur_cgroup - return the matching cgroup for the current task
+ * @tset: taskset of interest
+ *
+ * Return the cgroup for the current (last returned) task of @tset.  This
+ * function must be preceded by either cgroup_taskset_first() or
+ * cgroup_taskset_next().
+ */
+struct cgroup *cgroup_taskset_cur_cgroup(struct cgroup_taskset *tset)
+{
+	return tset->cur_cgrp;
+}
+EXPORT_SYMBOL_GPL(cgroup_taskset_cur_cgroup);
+
+/**
+ * cgroup_taskset_size - return the number of tasks in taskset
+ * @tset: taskset of interest
+ */
+int cgroup_taskset_size(struct cgroup_taskset *tset)
+{
+	return tset->tc_array ? tset->tc_array_len : 1;
+}
+EXPORT_SYMBOL_GPL(cgroup_taskset_size);
+
+
 /*
  * cgroup_task_migrate - move a task from one cgroup to another.
  *
@@ -1828,15 +1902,19 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 	struct cgroup_subsys *ss, *failed_ss = NULL;
 	struct cgroup *oldcgrp;
 	struct cgroupfs_root *root = cgrp->root;
+	struct cgroup_taskset tset = { };
 
 	/* Nothing to do if the task is already in that cgroup */
 	oldcgrp = task_cgroup_from_root(tsk, root);
 	if (cgrp == oldcgrp)
 		return 0;
 
+	tset.single.task = tsk;
+	tset.single.cgrp = oldcgrp;
+
 	for_each_subsys(root, ss) {
 		if (ss->can_attach) {
-			retval = ss->can_attach(ss, cgrp, tsk);
+			retval = ss->can_attach(ss, cgrp, &tset);
 			if (retval) {
 				/*
 				 * Remember on which subsystem the can_attach()
@@ -1867,7 +1945,7 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
 		if (ss->attach_task)
 			ss->attach_task(cgrp, tsk);
 		if (ss->attach)
-			ss->attach(ss, cgrp, oldcgrp, tsk);
+			ss->attach(ss, cgrp, &tset);
 	}
 
 	synchronize_rcu();
@@ -1889,7 +1967,7 @@ out:
 				 */
 				break;
 			if (ss->cancel_attach)
-				ss->cancel_attach(ss, cgrp, tsk);
+				ss->cancel_attach(ss, cgrp, &tset);
 		}
 	}
 	return retval;
@@ -2005,6 +2083,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	struct task_struct *tsk;
 	struct task_and_cgroup *tc;
 	struct flex_array *group;
+	struct cgroup_taskset tset = { };
 	/*
 	 * we need to make sure we have css_sets for all the tasks we're
 	 * going to move -before- we actually start moving them, so that in
@@ -2067,6 +2146,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	} while_each_thread(leader, tsk);
 	/* remember the number of threads in the array for later. */
 	group_size = i;
+	tset.tc_array = group;
+	tset.tc_array_len = group_size;
 	rcu_read_unlock();
 
 	/* methods shouldn't be called if no task is actually migrating */
@@ -2079,7 +2160,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	 */
 	for_each_subsys(root, ss) {
 		if (ss->can_attach) {
-			retval = ss->can_attach(ss, cgrp, leader);
+			retval = ss->can_attach(ss, cgrp, &tset);
 			if (retval) {
 				failed_ss = ss;
 				goto out_cancel_attach;
@@ -2169,10 +2250,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	 * being moved, this call will need to be reworked to communicate that.
 	 */
 	for_each_subsys(root, ss) {
-		if (ss->attach) {
-			tc = flex_array_get(group, 0);
-			ss->attach(ss, cgrp, tc->cgrp, tc->task);
-		}
+		if (ss->attach)
+			ss->attach(ss, cgrp, &tset);
 	}
 
 	/*
@@ -2194,11 +2273,11 @@ out_cancel_attach:
 		for_each_subsys(root, ss) {
 			if (ss == failed_ss) {
 				if (cancel_failed_ss && ss->cancel_attach)
-					ss->cancel_attach(ss, cgrp, leader);
+					ss->cancel_attach(ss, cgrp, &tset);
 				break;
 			}
 			if (ss->cancel_attach)
-				ss->cancel_attach(ss, cgrp, leader);
+				ss->cancel_attach(ss, cgrp, &tset);
 		}
 	}
 out_put_tasks:
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 4e82525..a2b0082 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -159,7 +159,7 @@ static void freezer_destroy(struct cgroup_subsys *ss,
  */
 static int freezer_can_attach(struct cgroup_subsys *ss,
 			      struct cgroup *new_cgroup,
-			      struct task_struct *task)
+			      struct cgroup_taskset *tset)
 {
 	struct freezer *freezer;
 
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 10131fd..2e5825b 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1368,10 +1368,10 @@ static int fmeter_getrate(struct fmeter *fmp)
 }
 
 /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
-static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
-			     struct task_struct *tsk)
+static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			     struct cgroup_taskset *tset)
 {
-	struct cpuset *cs = cgroup_cs(cont);
+	struct cpuset *cs = cgroup_cs(cgrp);
 
 	if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))
 		return -ENOSPC;
@@ -1384,7 +1384,7 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
 	 * set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may
 	 * be changed.
 	 */
-	if (tsk->flags & PF_THREAD_BOUND)
+	if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND)
 		return -EINVAL;
 
 	return 0;
@@ -1434,12 +1434,14 @@ static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk)
 	cpuset_update_task_spread_flag(cs, tsk);
 }
 
-static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cont,
-			  struct cgroup *oldcont, struct task_struct *tsk)
+static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+			  struct cgroup_taskset *tset)
 {
 	struct mm_struct *mm;
-	struct cpuset *cs = cgroup_cs(cont);
-	struct cpuset *oldcs = cgroup_cs(oldcont);
+	struct task_struct *tsk = cgroup_taskset_first(tset);
+	struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset);
+	struct cpuset *cs = cgroup_cs(cgrp);
+	struct cpuset *oldcs = cgroup_cs(oldcgrp);
 
 	/*
 	 * Change mm, possibly for multiple threads in a threadgroup. This is
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 930de94..b2802cc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5460,8 +5460,9 @@ static void mem_cgroup_clear_mc(void)
 
 static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 				struct cgroup *cgroup,
-				struct task_struct *p)
+				struct cgroup_taskset *tset)
 {
+	struct task_struct *p = cgroup_taskset_first(tset);
 	int ret = 0;
 	struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
 
@@ -5499,7 +5500,7 @@ static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 
 static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
 				struct cgroup *cgroup,
-				struct task_struct *p)
+				struct cgroup_taskset *tset)
 {
 	mem_cgroup_clear_mc();
 }
@@ -5616,9 +5617,9 @@ retry:
 
 static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 				struct cgroup *cont,
-				struct cgroup *old_cont,
-				struct task_struct *p)
+				struct cgroup_taskset *tset)
 {
+	struct task_struct *p = cgroup_taskset_first(tset);
 	struct mm_struct *mm = get_task_mm(p);
 
 	if (mm) {
@@ -5633,19 +5634,18 @@ static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 #else	/* !CONFIG_MMU */
 static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
 				struct cgroup *cgroup,
-				struct task_struct *p)
+				struct cgroup_taskset *tset)
 {
 	return 0;
 }
 static void mem_cgroup_cancel_attach(struct cgroup_subsys *ss,
 				struct cgroup *cgroup,
-				struct task_struct *p)
+				struct cgroup_taskset *tset)
 {
 }
 static void mem_cgroup_move_task(struct cgroup_subsys *ss,
 				struct cgroup *cont,
-				struct cgroup *old_cont,
-				struct task_struct *p)
+				struct cgroup_taskset *tset)
 {
 }
 #endif
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 4450fbe..8b5b5d8 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -62,11 +62,12 @@ static inline struct dev_cgroup *task_devcgroup(struct task_struct *task)
 struct cgroup_subsys devices_subsys;
 
 static int devcgroup_can_attach(struct cgroup_subsys *ss,
-		struct cgroup *new_cgroup, struct task_struct *task)
+			struct cgroup *new_cgrp, struct cgroup_taskset *set)
 {
-	if (current != task && !capable(CAP_SYS_ADMIN))
-			return -EPERM;
+	struct task_struct *task = cgroup_taskset_first(set);
 
+	if (current != task && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
 	return 0;
 }
 
-- 
1.7.6

^ permalink raw reply related

* [PATCH 2/6] cgroup: improve old cgroup handling in cgroup_attach_proc()
From: Tejun Heo @ 2011-08-25 22:43 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, linux-kernel, Tejun Heo, linux-pm,
	kamezawa.hiroyu
In-Reply-To: <1314312192-26885-1-git-send-email-tj@kernel.org>

cgroup_attach_proc() behaves differently from cgroup_attach_task() in
the following aspects.

* All hooks are invoked even if no task is actually being moved.

* ->can_attach_task() is called for all tasks in the group whether the
  new cgrp is different from the current cgrp or not; however,
  ->attach_task() is skipped if new equals new.  This makes the calls
  asymmetric.

This patch improves old cgroup handling in cgroup_attach_proc() by
looking up the current cgroup at the head, recording it in the flex
array along with the task itself, and using it to remove the above two
differences.  This will also ease further changes.

-v2: nr_todo renamed to nr_migrating_tasks as per Paul Menage's
     suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/cgroup.c |   70 ++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 44 insertions(+), 26 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a606fa2..8a47380 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1739,6 +1739,11 @@ int cgroup_path(const struct cgroup *cgrp, char *buf, int buflen)
 }
 EXPORT_SYMBOL_GPL(cgroup_path);
 
+struct task_and_cgroup {
+	struct task_struct	*task;
+	struct cgroup		*cgrp;
+};
+
 /*
  * cgroup_task_migrate - move a task from one cgroup to another.
  *
@@ -1990,15 +1995,15 @@ static int css_set_prefetch(struct cgroup *cgrp, struct css_set *cg,
  */
 int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 {
-	int retval, i, group_size;
+	int retval, i, group_size, nr_migrating_tasks;
 	struct cgroup_subsys *ss, *failed_ss = NULL;
 	bool cancel_failed_ss = false;
 	/* guaranteed to be initialized later, but the compiler needs this */
-	struct cgroup *oldcgrp = NULL;
 	struct css_set *oldcg;
 	struct cgroupfs_root *root = cgrp->root;
 	/* threadgroup list cursor and array */
 	struct task_struct *tsk;
+	struct task_and_cgroup *tc;
 	struct flex_array *group;
 	/*
 	 * we need to make sure we have css_sets for all the tasks we're
@@ -2017,8 +2022,7 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	 */
 	group_size = get_nr_threads(leader);
 	/* flex_array supports very large thread-groups better than kmalloc. */
-	group = flex_array_alloc(sizeof(struct task_struct *), group_size,
-				 GFP_KERNEL);
+	group = flex_array_alloc(sizeof(*tc), group_size, GFP_KERNEL);
 	if (!group)
 		return -ENOMEM;
 	/* pre-allocate to guarantee space while iterating in rcu read-side. */
@@ -2042,8 +2046,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	}
 	/* take a reference on each task in the group to go in the array. */
 	tsk = leader;
-	i = 0;
+	i = nr_migrating_tasks = 0;
 	do {
+		struct task_and_cgroup ent;
+
 		/* as per above, nr_threads may decrease, but not increase. */
 		BUG_ON(i >= group_size);
 		get_task_struct(tsk);
@@ -2051,14 +2057,23 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 		 * saying GFP_ATOMIC has no effect here because we did prealloc
 		 * earlier, but it's good form to communicate our expectations.
 		 */
-		retval = flex_array_put_ptr(group, i, tsk, GFP_ATOMIC);
+		ent.task = tsk;
+		ent.cgrp = task_cgroup_from_root(tsk, root);
+		retval = flex_array_put(group, i, &ent, GFP_ATOMIC);
 		BUG_ON(retval != 0);
 		i++;
+		if (ent.cgrp != cgrp)
+			nr_migrating_tasks++;
 	} while_each_thread(leader, tsk);
 	/* remember the number of threads in the array for later. */
 	group_size = i;
 	rcu_read_unlock();
 
+	/* methods shouldn't be called if no task is actually migrating */
+	retval = 0;
+	if (!nr_migrating_tasks)
+		goto out_put_tasks;
+
 	/*
 	 * step 1: check that we can legitimately attach to the cgroup.
 	 */
@@ -2074,8 +2089,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 		if (ss->can_attach_task) {
 			/* run on each task in the threadgroup. */
 			for (i = 0; i < group_size; i++) {
-				tsk = flex_array_get_ptr(group, i);
-				retval = ss->can_attach_task(cgrp, tsk);
+				tc = flex_array_get(group, i);
+				if (tc->cgrp == cgrp)
+					continue;
+				retval = ss->can_attach_task(cgrp, tc->task);
 				if (retval) {
 					failed_ss = ss;
 					cancel_failed_ss = true;
@@ -2091,23 +2108,22 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	 */
 	INIT_LIST_HEAD(&newcg_list);
 	for (i = 0; i < group_size; i++) {
-		tsk = flex_array_get_ptr(group, i);
+		tc = flex_array_get(group, i);
 		/* nothing to do if this task is already in the cgroup */
-		oldcgrp = task_cgroup_from_root(tsk, root);
-		if (cgrp == oldcgrp)
+		if (tc->cgrp == cgrp)
 			continue;
 		/* get old css_set pointer */
-		task_lock(tsk);
-		if (tsk->flags & PF_EXITING) {
+		task_lock(tc->task);
+		if (tc->task->flags & PF_EXITING) {
 			/* ignore this task if it's going away */
-			task_unlock(tsk);
+			task_unlock(tc->task);
 			continue;
 		}
-		oldcg = tsk->cgroups;
+		oldcg = tc->task->cgroups;
 		get_css_set(oldcg);
-		task_unlock(tsk);
+		task_unlock(tc->task);
 		/* see if the new one for us is already in the list? */
-		if (css_set_check_fetched(cgrp, tsk, oldcg, &newcg_list)) {
+		if (css_set_check_fetched(cgrp, tc->task, oldcg, &newcg_list)) {
 			/* was already there, nothing to do. */
 			put_css_set(oldcg);
 		} else {
@@ -2130,20 +2146,19 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 			ss->pre_attach(cgrp);
 	}
 	for (i = 0; i < group_size; i++) {
-		tsk = flex_array_get_ptr(group, i);
+		tc = flex_array_get(group, i);
 		/* leave current thread as it is if it's already there */
-		oldcgrp = task_cgroup_from_root(tsk, root);
-		if (cgrp == oldcgrp)
+		if (tc->cgrp == cgrp)
 			continue;
 
 		/* if the thread is PF_EXITING, it can just get skipped. */
-		retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
+		retval = cgroup_task_migrate(cgrp, tc->cgrp, tc->task, true);
 		BUG_ON(retval != 0 && retval != -ESRCH);
 
 		/* attach each task to each subsystem */
 		for_each_subsys(root, ss) {
 			if (ss->attach_task)
-				ss->attach_task(cgrp, tsk);
+				ss->attach_task(cgrp, tc->task);
 		}
 	}
 	/* nothing is sensitive to fork() after this point. */
@@ -2154,8 +2169,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 	 * being moved, this call will need to be reworked to communicate that.
 	 */
 	for_each_subsys(root, ss) {
-		if (ss->attach)
-			ss->attach(ss, cgrp, oldcgrp, leader);
+		if (ss->attach) {
+			tc = flex_array_get(group, 0);
+			ss->attach(ss, cgrp, tc->cgrp, tc->task);
+		}
 	}
 
 	/*
@@ -2184,10 +2201,11 @@ out_cancel_attach:
 				ss->cancel_attach(ss, cgrp, leader);
 		}
 	}
+out_put_tasks:
 	/* clean up the array of referenced threads in the group. */
 	for (i = 0; i < group_size; i++) {
-		tsk = flex_array_get_ptr(group, i);
-		put_task_struct(tsk);
+		tc = flex_array_get(group, i);
+		put_task_struct(tc->task);
 	}
 out_free_group_list:
 	flex_array_free(group);
-- 
1.7.6

^ permalink raw reply related

* [PATCH 1/6] cgroup: subsys->attach_task() should be called after migration
From: Tejun Heo @ 2011-08-25 22:43 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, linux-kernel, Tejun Heo, linux-pm,
	kamezawa.hiroyu
In-Reply-To: <1314312192-26885-1-git-send-email-tj@kernel.org>

cgroup_attach_task() calls subsys->attach_task() after
cgroup_task_migrate(); however, cgroup_attach_proc() calls it before
migration.  This actually affects some of the users.  Update
cgroup_attach_proc() such that ->attach_task() is called after
migration.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/cgroup.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1d2b6ce..a606fa2 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2135,14 +2135,16 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
 		oldcgrp = task_cgroup_from_root(tsk, root);
 		if (cgrp == oldcgrp)
 			continue;
+
+		/* if the thread is PF_EXITING, it can just get skipped. */
+		retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
+		BUG_ON(retval != 0 && retval != -ESRCH);
+
 		/* attach each task to each subsystem */
 		for_each_subsys(root, ss) {
 			if (ss->attach_task)
 				ss->attach_task(cgrp, tsk);
 		}
-		/* if the thread is PF_EXITING, it can just get skipped. */
-		retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
-		BUG_ON(retval != 0 && retval != -ESRCH);
 	}
 	/* nothing is sensitive to fork() after this point. */
 
-- 
1.7.6

^ permalink raw reply related

* [PATCHSET] cgroup: introduce cgroup_taskset and consolidate subsys methods, take#2
From: Tejun Heo @ 2011-08-25 22:43 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, linux-kernel, linux-pm, kamezawa.hiroyu

Hello,

This is the second take of cgroup_taskset patchset.  This patchset
introduces cgroup_taskset along with some accessors and iterator,
updates methods to use it, consolidates usages and drops superflous
methods.  For more info, please read the head message and discussions
from the last take[L].

Changes from the last take[L] are mostly minor.

* Acked-by's added
* patch contamination fixed
* local variable renamed
* documentation updated

It contains the following six patches.

 0001-cgroup-subsys-attach_task-should-be-called-after-mig.patch
 0002-cgroup-improve-old-cgroup-handling-in-cgroup_attach_.patch
 0003-cgroup-introduce-cgroup_taskset-and-use-it-in-subsys.patch
 0004-cgroup-don-t-use-subsys-can_attach_task-or-attach_ta.patch
 0005-cgroup-cpuset-don-t-use-ss-pre_attach.patch
 0006-cgroup-kill-subsys-can_attach_task-pre_attach-and-at.patch

and is based on the current linux-pm/pm-freezer (7b5b95b3f5 "freezer:
remove should_send_signal() and update frozen()"), and available in
the following git tree.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer

Any ideas on how to route these patches?  If we float these in -mm,
further freezer patches would have to be in -mm too which in turn will
force job control patches depending on them to -mm too.  It would be
really nice if we can find a stable git branch to host these.  Routing
through pm-freezer might not be such a bad idea either, I think.

Thank you.

 Documentation/cgroups/cgroups.txt |   51 +++------
 block/blk-cgroup.c                |   45 +++++---
 include/linux/cgroup.h            |   31 ++++-
 kernel/cgroup.c                   |  200 ++++++++++++++++++++++++--------------
 kernel/cgroup_freezer.c           |   16 ---
 kernel/cpuset.c                   |  105 +++++++++----------
 kernel/events/core.c              |   13 +-
 kernel/sched.c                    |   31 +++--
 mm/memcontrol.c                   |   16 +--
 security/device_cgroup.c          |    7 -
 10 files changed, 294 insertions(+), 221 deletions(-)

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel/1183130

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox