Linux Power Management development
 help / color / mirror / Atom feed
* [PATCHSET] cgroup: introduce cgroup_taskset and consolidate subsys methods, take#2
From: Tejun Heo @ 2011-08-25 22:43 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, linux-kernel, linux-pm, kamezawa.hiroyu

Hello,

This is the second take of cgroup_taskset patchset.  This patchset
introduces cgroup_taskset along with some accessors and iterator,
updates methods to use it, consolidates usages and drops superflous
methods.  For more info, please read the head message and discussions
from the last take[L].

Changes from the last take[L] are mostly minor.

* Acked-by's added
* patch contamination fixed
* local variable renamed
* documentation updated

It contains the following six patches.

 0001-cgroup-subsys-attach_task-should-be-called-after-mig.patch
 0002-cgroup-improve-old-cgroup-handling-in-cgroup_attach_.patch
 0003-cgroup-introduce-cgroup_taskset-and-use-it-in-subsys.patch
 0004-cgroup-don-t-use-subsys-can_attach_task-or-attach_ta.patch
 0005-cgroup-cpuset-don-t-use-ss-pre_attach.patch
 0006-cgroup-kill-subsys-can_attach_task-pre_attach-and-at.patch

and is based on the current linux-pm/pm-freezer (7b5b95b3f5 "freezer:
remove should_send_signal() and update frozen()"), and available in
the following git tree.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer

Any ideas on how to route these patches?  If we float these in -mm,
further freezer patches would have to be in -mm too which in turn will
force job control patches depending on them to -mm too.  It would be
really nice if we can find a stable git branch to host these.  Routing
through pm-freezer might not be such a bad idea either, I think.

Thank you.

 Documentation/cgroups/cgroups.txt |   51 +++------
 block/blk-cgroup.c                |   45 +++++---
 include/linux/cgroup.h            |   31 ++++-
 kernel/cgroup.c                   |  200 ++++++++++++++++++++++++--------------
 kernel/cgroup_freezer.c           |   16 ---
 kernel/cpuset.c                   |  105 +++++++++----------
 kernel/events/core.c              |   13 +-
 kernel/sched.c                    |   31 +++--
 mm/memcontrol.c                   |   16 +--
 security/device_cgroup.c          |    7 -
 10 files changed, 294 insertions(+), 221 deletions(-)

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel/1183130

^ permalink raw reply

* Re: [PATCH 02/11] PM: extend PM QoS with per-device wake-up constraints
From: Rafael J. Wysocki @ 2011-08-25 14:49 UTC (permalink / raw)
  To: Jean Pihet
  Cc: markgross, Mark Brown, Linux PM mailing list, linux-omap,
	Jean Pihet
In-Reply-To: <CAORVsuW7M_i6s85X5NFWv=DEeoARqFo47cyrrqq87djJ3nUFDQ@mail.gmail.com>

On Thursday, August 25, 2011, Jean Pihet wrote:
> On Thu, Aug 25, 2011 at 4:17 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Thursday, August 25, 2011, Mark Brown wrote:
> >> On Tue, Aug 23, 2011 at 11:31:54PM +0200, Rafael J. Wysocki wrote:
> >>
> >> > Perhaps.  Still, that requires some policy to be put into drivers,
> >> > which I don't think is entirely correct.
> >>
> >> So, I don't really have the bandwidth to discuss this properly for at
> >> least the next week or so - is it possible to find some time in the
> >> power track at LPC for this?
> >
> > I think so, I'm going to add PM QoS to the agenda anyway.
> Unfortunately I cannot attend the conference this time.

Sorry to hear that.

> Can you please keep me in the loop of the discussions?

Sure, we will.

> I am glad to help implementing the code for the user space API.

Great, thanks!

^ permalink raw reply

* Re: [PATCH 02/11] PM: extend PM QoS with per-device wake-up constraints
From: Jean Pihet @ 2011-08-25 14:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: markgross, Mark Brown, Linux PM mailing list, linux-omap,
	Jean Pihet
In-Reply-To: <201108251617.31959.rjw@sisk.pl>

On Thu, Aug 25, 2011 at 4:17 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Thursday, August 25, 2011, Mark Brown wrote:
>> On Tue, Aug 23, 2011 at 11:31:54PM +0200, Rafael J. Wysocki wrote:
>>
>> > Perhaps.  Still, that requires some policy to be put into drivers,
>> > which I don't think is entirely correct.
>>
>> So, I don't really have the bandwidth to discuss this properly for at
>> least the next week or so - is it possible to find some time in the
>> power track at LPC for this?
>
> I think so, I'm going to add PM QoS to the agenda anyway.
Unfortunately I cannot attend the conference this time.
Can you please keep me in the loop of the discussions? I am glad to
help implementing the code for the user space API.

>
> Thanks,
> Rafael

Thanks,
Jean

> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH] fuse: Freeze client on suspend when request sent to userspace
From: Miklos Szeredi @ 2011-08-25 14:20 UTC (permalink / raw)
  To: Todd Poynor
  Cc: fuse-devel, linux-fsdevel, Linux PM mailing list, linux-kernel
In-Reply-To: <1314230364-776-1-git-send-email-toddpoynor@google.com>

Todd Poynor <toddpoynor@google.com> writes:

> Suspend attempts can abort when the FUSE daemon is already frozen
> and a client is waiting uninterruptibly for a response, causing
> freezing of tasks to fail.
>
> Use the freeze-friendly wait API, but disregard other signals.
>
> Signed-off-by: Todd Poynor <toddpoynor@google.com>
> ---
> Have seen reports in which repeated suspend attempts were aborted
> due to a task waiting uninterruptibly in this function, but
> have only reproduced this artificially, by causing the daemon to
> sleep.  Only modified the normal request path, not request aborts
> and such, under the assumption that these should be rare and
> should make progress upon resume.  Certain apps that read or
> write a lot of data on the filesystem may apparently run into
> this case rather frequently.


Yes, this is a well known problem with no (known) trivial solutions.

Problem with the patch is, it's going to solve only a subset of the
suspend aborts.  It will allow any operation sent to userspace to
freeze, which is fine.  But imagine that such a request is holding a VFS
lock (e.g. write(2) is hoding i_mutex) and another operation is waiting
on that lock.  That one still won't be freezable and will prevent
suspend to proceed.

I'm not sure it's really worth fixing only a subset of the problematic
cases.  We should definitely be thinking about solving it properly, in
all cases.

Thanks,
Miklos

>
>  fs/fuse/dev.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 168a80f..bded2e5 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -19,6 +19,7 @@
>  #include <linux/pipe_fs_i.h>
>  #include <linux/swap.h>
>  #include <linux/splice.h>
> +#include <linux/freezer.h>
>  
>  MODULE_ALIAS_MISCDEV(FUSE_MINOR);
>  MODULE_ALIAS("devname:fuse");
> @@ -383,7 +384,10 @@ __acquires(fc->lock)
>  	 * Wait it out.
>  	 */
>  	spin_unlock(&fc->lock);
> -	wait_event(req->waitq, req->state == FUSE_REQ_FINISHED);
> +
> +	while (req->state != FUSE_REQ_FINISHED)
> +		wait_event_freezable(req->waitq,
> +				     req->state == FUSE_REQ_FINISHED);
>  	spin_lock(&fc->lock);
>  
>  	if (!req->aborted)

^ permalink raw reply

* Re: [PATCH 02/11] PM: extend PM QoS with per-device wake-up constraints
From: Rafael J. Wysocki @ 2011-08-25 14:17 UTC (permalink / raw)
  To: Mark Brown; +Cc: Linux PM mailing list, linux-omap, Jean Pihet, markgross
In-Reply-To: <20110825103847.GD31154@opensource.wolfsonmicro.com>

On Thursday, August 25, 2011, Mark Brown wrote:
> On Tue, Aug 23, 2011 at 11:31:54PM +0200, Rafael J. Wysocki wrote:
> 
> > Perhaps.  Still, that requires some policy to be put into drivers,
> > which I don't think is entirely correct.
> 
> So, I don't really have the bandwidth to discuss this properly for at
> least the next week or so - is it possible to find some time in the
> power track at LPC for this?

I think so, I'm going to add PM QoS to the agenda anyway.

Thanks,
Rafael

^ permalink raw reply

* Re: [PATCH 1/1] OMAP: omap_device: only override _noirq methods, not normal suspend/resume
From: Rafael J. Wysocki @ 2011-08-25 14:12 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: linux-pm, linux-omap, linux-arm-kernel
In-Reply-To: <1314229437-31836-2-git-send-email-khilman@ti.com>

On Thursday, August 25, 2011, Kevin Hilman wrote:
> commit c03f007a8bf0e092caeb6856a5c8a850df10b974 (OMAP: PM:
> omap_device: add system PM methods for PM domain handling) mistakenly
> used SET_SYSTEM_SLEEP_PM_OPS() when trying to configure custom methods
> for the PM domains noirq methods.  Fix that by setting only the
> suspend_noirq and resume_noirq methods with custom versions.
> 
> Note that all other PM domain methods (including the "normal"
> suspend/resume methods) are populated using USE_PLATFORM_PM_SLEEP_OPS,
> which configures them all to the default subsystem (platform_bus)
> methods.
> 
> Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> Signed-off-by: Kevin Hilman <khilman@ti.com>

Applied to linux-pm/pm-fixes, will be pushed for 3.1, thanks!

Rafael


> ---
>  arch/arm/plat-omap/omap_device.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm/plat-omap/omap_device.c b/arch/arm/plat-omap/omap_device.c
> index b6b4097..9a6a538 100644
> --- a/arch/arm/plat-omap/omap_device.c
> +++ b/arch/arm/plat-omap/omap_device.c
> @@ -622,7 +622,8 @@ static struct dev_pm_domain omap_device_pm_domain = {
>  		SET_RUNTIME_PM_OPS(_od_runtime_suspend, _od_runtime_resume,
>  				   _od_runtime_idle)
>  		USE_PLATFORM_PM_SLEEP_OPS
> -		SET_SYSTEM_SLEEP_PM_OPS(_od_suspend_noirq, _od_resume_noirq)
> +		.suspend_noirq = _od_suspend_noirq,
> +		.resume_noirq = _od_resume_noirq,
>  	}
>  };
>  
> 

^ permalink raw reply

* Re: [PATCH 02/11] PM: extend PM QoS with per-device wake-up constraints
From: Mark Brown @ 2011-08-25 10:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM mailing list, linux-omap, Jean Pihet, markgross
In-Reply-To: <201108232331.54844.rjw@sisk.pl>

On Tue, Aug 23, 2011 at 11:31:54PM +0200, Rafael J. Wysocki wrote:

> Perhaps.  Still, that requires some policy to be put into drivers,
> which I don't think is entirely correct.

So, I don't really have the bandwidth to discuss this properly for at
least the next week or so - is it possible to find some time in the
power track at LPC for this?

^ permalink raw reply

* Re: [PATCH 6/6] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
From: Paul Menage @ 2011-08-25  9:45 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-pm, lizf, linux-kernel
In-Reply-To: <1314138000-2049-7-git-send-email-tj@kernel.org>

On Tue, Aug 23, 2011 at 3:20 PM, Tejun Heo <tj@kernel.org> wrote:
> These three methods are no longer used.  Kill them.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Acked-by: Paul Menage <paul@paulmenage.org>

Overall, I think this patch set is a great improvement in code
simplicity and clarity.

Thanks,
Paul

> Cc: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>
> ---
>  include/linux/cgroup.h |    3 --
>  kernel/cgroup.c        |   53 ++++-------------------------------------------
>  2 files changed, 5 insertions(+), 51 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 2470c8e..5659d37 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -490,11 +490,8 @@ struct cgroup_subsys {
>        void (*destroy)(struct cgroup_subsys *ss, struct cgroup *cgrp);
>        int (*can_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
>                          struct cgroup_taskset *tset);
> -       int (*can_attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
>        void (*cancel_attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
>                              struct cgroup_taskset *tset);
> -       void (*pre_attach)(struct cgroup *cgrp);
> -       void (*attach_task)(struct cgroup *cgrp, struct task_struct *tsk);
>        void (*attach)(struct cgroup_subsys *ss, struct cgroup *cgrp,
>                       struct cgroup_taskset *tset);
>        void (*fork)(struct cgroup_subsys *ss, struct task_struct *task);
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 474674b..374a4cb 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1926,13 +1926,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
>                                goto out;
>                        }
>                }
> -               if (ss->can_attach_task) {
> -                       retval = ss->can_attach_task(cgrp, tsk);
> -                       if (retval) {
> -                               failed_ss = ss;
> -                               goto out;
> -                       }
> -               }
>        }
>
>        retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, false);
> @@ -1940,10 +1933,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
>                goto out;
>
>        for_each_subsys(root, ss) {
> -               if (ss->pre_attach)
> -                       ss->pre_attach(cgrp);
> -               if (ss->attach_task)
> -                       ss->attach_task(cgrp, tsk);
>                if (ss->attach)
>                        ss->attach(ss, cgrp, &tset);
>        }
> @@ -2075,7 +2064,6 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
>  {
>        int retval, i, group_size, nr_todo;
>        struct cgroup_subsys *ss, *failed_ss = NULL;
> -       bool cancel_failed_ss = false;
>        /* guaranteed to be initialized later, but the compiler needs this */
>        struct css_set *oldcg;
>        struct cgroupfs_root *root = cgrp->root;
> @@ -2166,21 +2154,6 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
>                                goto out_cancel_attach;
>                        }
>                }
> -               /* a callback to be run on every thread in the threadgroup. */
> -               if (ss->can_attach_task) {
> -                       /* run on each task in the threadgroup. */
> -                       for (i = 0; i < group_size; i++) {
> -                               tc = flex_array_get(group, i);
> -                               if (tc->cgrp == cgrp)
> -                                       continue;
> -                               retval = ss->can_attach_task(cgrp, tc->task);
> -                               if (retval) {
> -                                       failed_ss = ss;
> -                                       cancel_failed_ss = true;
> -                                       goto out_cancel_attach;
> -                               }
> -                       }
> -               }
>        }
>
>        /*
> @@ -2217,15 +2190,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
>        }
>
>        /*
> -        * step 3: now that we're guaranteed success wrt the css_sets, proceed
> -        * to move all tasks to the new cgroup, calling ss->attach_task for each
> -        * one along the way. there are no failure cases after here, so this is
> -        * the commit point.
> +        * step 3: now that we're guaranteed success wrt the css_sets,
> +        * proceed to move all tasks to the new cgroup.  There are no
> +        * failure cases after here, so this is the commit point.
>         */
> -       for_each_subsys(root, ss) {
> -               if (ss->pre_attach)
> -                       ss->pre_attach(cgrp);
> -       }
>        for (i = 0; i < group_size; i++) {
>                tc = flex_array_get(group, i);
>                /* leave current thread as it is if it's already there */
> @@ -2235,19 +2203,11 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
>                /* if the thread is PF_EXITING, it can just get skipped. */
>                retval = cgroup_task_migrate(cgrp, tc->cgrp, tc->task, true);
>                BUG_ON(retval != 0 && retval != -ESRCH);
> -
> -               /* attach each task to each subsystem */
> -               for_each_subsys(root, ss) {
> -                       if (ss->attach_task)
> -                               ss->attach_task(cgrp, tc->task);
> -               }
>        }
>        /* nothing is sensitive to fork() after this point. */
>
>        /*
> -        * step 4: do expensive, non-thread-specific subsystem callbacks.
> -        * TODO: if ever a subsystem needs to know the oldcgrp for each task
> -        * being moved, this call will need to be reworked to communicate that.
> +        * step 4: do subsystem attach callbacks.
>         */
>        for_each_subsys(root, ss) {
>                if (ss->attach)
> @@ -2271,11 +2231,8 @@ out_cancel_attach:
>        /* same deal as in cgroup_attach_task */
>        if (retval) {
>                for_each_subsys(root, ss) {
> -                       if (ss == failed_ss) {
> -                               if (cancel_failed_ss && ss->cancel_attach)
> -                                       ss->cancel_attach(ss, cgrp, &tset);
> +                       if (ss == failed_ss)
>                                break;
> -                       }
>                        if (ss->cancel_attach)
>                                ss->cancel_attach(ss, cgrp, &tset);
>                }
> --
> 1.7.6
>
>

^ permalink raw reply

* Re: [PATCH 2/6] cgroup: improve old cgroup handling in cgroup_attach_proc()
From: Tejun Heo @ 2011-08-25  9:44 UTC (permalink / raw)
  To: Paul Menage; +Cc: containers, linux-pm, lizf, linux-kernel
In-Reply-To: <CALdu-PBr-tu1qzScvncr-N4EpPaQ7sTdHf28GhEv_MZLbo1eSg@mail.gmail.com>

On Thu, Aug 25, 2011 at 02:42:46AM -0700, Paul Menage wrote:
> > +       int retval, i, group_size, nr_todo;
> 
> I'd be inclined to call "nr_todo" something like "nr_migrating_tasks"
> for clarity.

Will update.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 2/6] cgroup: improve old cgroup handling in cgroup_attach_proc()
From: Paul Menage @ 2011-08-25  9:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-pm, lizf, linux-kernel
In-Reply-To: <1314138000-2049-3-git-send-email-tj@kernel.org>

On Tue, Aug 23, 2011 at 3:19 PM, Tejun Heo <tj@kernel.org> wrote:
> cgroup_attach_proc() behaves differently from cgroup_attach_task() in
> the following aspects.
>
> * All hooks are invoked even if no task is actually being moved.
>
> * ->can_attach_task() is called for all tasks in the group whether the
>  new cgrp is different from the current cgrp or not; however,
>  ->attach_task() is skipped if new equals new.  This makes the calls
>  asymmetric.
>
> This patch improves old cgroup handling in cgroup_attach_proc() by
> looking up the current cgroup at the head, recording it in the flex
> array along with the task itself, and using it to remove the above two
> differences.  This will also ease further changes.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Acked-by: Paul Menage <paul@paulmenage.org>

With the later cgroup_taskset changes making use of the same flex
array, I guess I agree that leaving all the tasks in the array makes
sense.

> +       int retval, i, group_size, nr_todo;

I'd be inclined to call "nr_todo" something like "nr_migrating_tasks"
for clarity.

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Paul Menage @ 2011-08-25  9:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daisuke Nishimura, containers, lizf, linux-kernel, James Morris,
	linux-pm, KAMEZAWA Hiroyuki
In-Reply-To: <1314138000-2049-4-git-send-email-tj@kernel.org>

On Tue, Aug 23, 2011 at 3:19 PM, Tejun Heo <tj@kernel.org> wrote:
> +Called prior to moving one or more tasks into a cgroup; if the
> +subsystem returns an error, this will abort the attach operation.
> +@tset contains the tasks to be attached and is guaranteed to have at
> +least one task in it. If there are multiple, it's guaranteed that all
> +are from the same thread group, @tset contains all tasks from the
> +group whether they're actually switching cgroup or not, and the first
> +task is the leader.

I'd reformat this a bit for clarity:

If there are multiple tasks in the taskset, then:
  - it's guaranteed that all are from the same thread group
  - @tset contains all tasks from the thread group whether or not
they're switching cgroup
  - the first task is the leader

Acked-by: Paul Menage <paul@paulmenage.org>

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Paul Menage @ 2011-08-25  9:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daisuke Nishimura, containers, lizf, linux-kernel, James Morris,
	linux-pm, KAMEZAWA Hiroyuki
In-Reply-To: <20110825092045.GG3286@htj.dyndns.org>

On Thu, Aug 25, 2011 at 2:20 AM, Tejun Heo <tj@kernel.org> wrote:
>
> I think having separate ->attach() and ->attach_task() is inherently
> broken.  Look at the memcg discussion I had in this thread for
> reference and as soon as we need to do something across the tasks
> being migrated, iteration-by-callback becomes very painful.
> e.g. let's say memcg wants to find the mm->owner and wants to print
> warning or fail if that doesn't work out.  How would that be
> implemented if it's iterating by callback.

OK, fair point. See my other email for patch comments.

Paul

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Tejun Heo @ 2011-08-25  9:20 UTC (permalink / raw)
  To: Paul Menage
  Cc: Daisuke Nishimura, containers, lizf, linux-kernel, James Morris,
	linux-pm, KAMEZAWA Hiroyuki
In-Reply-To: <CALdu-PDAgqeRJt5vqTB9wddwz70Yn+Jf-Pb0dDKDBD_q37tHQg@mail.gmail.com>

Hello,

On Thu, Aug 25, 2011 at 02:14:12AM -0700, Paul Menage wrote:
> The general idea of passing consistent information to all *attach
> methods seems good, but isn't it simpler to just fix up the various
> method signatures?

I think having separate ->attach() and ->attach_task() is inherently
broken.  Look at the memcg discussion I had in this thread for
reference and as soon as we need to do something across the tasks
being migrated, iteration-by-callback becomes very painful.
e.g. let's say memcg wants to find the mm->owner and wants to print
warning or fail if that doesn't work out.  How would that be
implemented if it's iterating by callback.

> The whole point of having *attach() and *attach_task() was to minimize
> the amount of boilerplate (in this case, iterating across a new
> cgroup_taskset abstraction) in the subsystems, leaving that to the
> cgroups framework.

Yeah, I agree with making things easier for subsystems but I violently
disagree that iteration-by-callback is helpful in any way.  If
control-loop style iterator is at all possible, it's almost always
better to go that way.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Paul Menage @ 2011-08-25  9:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daisuke Nishimura, containers, lizf, linux-kernel, James Morris,
	linux-pm, KAMEZAWA Hiroyuki
In-Reply-To: <1314138000-2049-4-git-send-email-tj@kernel.org>

On Tue, Aug 23, 2011 at 3:19 PM, Tejun Heo <tj@kernel.org> wrote:
> Currently, there's no way to pass multiple tasks to cgroup_subsys
> methods necessitating the need for separate per-process and per-task
> methods.  This patch introduces cgroup_taskset which can be used to
> pass multiple tasks and their associated cgroups to cgroup_subsys
> methods.
>
> Three methods - can_attach(), cancel_attach() and attach() - are
> converted to use cgroup_taskset.  This unifies passed parameters so
> that all methods have access to all information.  Conversions in this
> patchset are identical and don't introduce any behavior change.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>

The general idea of passing consistent information to all *attach
methods seems good, but isn't it simpler to just fix up the various
method signatures?

The whole point of having *attach() and *attach_task() was to minimize
the amount of boilerplate (in this case, iterating across a new
cgroup_taskset abstraction) in the subsystems, leaving that to the
cgroups framework.

Paul

^ permalink raw reply

* Re: [PATCH 4/6] cgroup: don't use subsys->can_attach_task() or ->attach_task()
From: Tejun Heo @ 2011-08-25  9:12 UTC (permalink / raw)
  To: Paul Menage
  Cc: Peter Zijlstra, Daisuke Nishimura, containers, Ingo Molnar, lizf,
	linux-kernel, James Morris, linux-pm, KAMEZAWA Hiroyuki
In-Reply-To: <CALdu-PCc2RzedXubReF9huamL6W+5qGCfXNNvqS2yUk3QTHRng@mail.gmail.com>

Hello, Paul.

On Thu, Aug 25, 2011 at 02:07:35AM -0700, Paul Menage wrote:
> Doesn't the other part of this patch set, that avoids calling the
> *attach() methods for tasks that aren't moving, eliminate the need for
> the usage of skip_cgrp here (and elsewhere)? When do we actually need
> to pass a non-NULL skip_cgrp to cgroup_taskset_for_each()?

If any task is moving ->*attach() should be called.  Whether the @tset
passed in should contain tasks which aren't changing cgroups is
debatable.  The operation is guaranteed to be for an entire thread
group and it makes sense to make at least the leader always available
even if it's not moving.  Given that the operation is defined to be
per-thread-group, I think it's better to pass in the whole thread
group with an easy way to skip the ones which aren't moving.  For
example, memcg seems to need to find the mm->owner and it's possible
that the mm->owner might not be changing cgroups.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 4/6] cgroup: don't use subsys->can_attach_task() or ->attach_task()
From: Paul Menage @ 2011-08-25  9:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Peter Zijlstra, Daisuke Nishimura, containers, Ingo Molnar, lizf,
	linux-kernel, James Morris, linux-pm, KAMEZAWA Hiroyuki
In-Reply-To: <1314138000-2049-5-git-send-email-tj@kernel.org>

On Tue, Aug 23, 2011 at 3:19 PM, Tejun Heo <tj@kernel.org> wrote:
> Now that subsys->can_attach() and attach() take @tset instead of
> @task, they can handle per-task operations.  Convert
> ->can_attach_task() and ->attach_task() users to use ->can_attach()
> and attach() instead.  Most converions are straight-forward.
> Noteworthy changes are,
>
> * In cgroup_freezer, remove unnecessary NULL assignments to unused
>  methods.  It's useless and very prone to get out of sync, which
>  already happened.
>
> * In cpuset, PF_THREAD_BOUND test is checked for each task.  This
>  doesn't make any practical difference but is conceptually cleaner.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: James Morris <jmorris@namei.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  block/blk-cgroup.c      |   45 +++++++++++++++++++-----------
>  kernel/cgroup_freezer.c |   14 +++-------
>  kernel/cpuset.c         |   70 +++++++++++++++++++++-------------------------
>  kernel/events/core.c    |   13 +++++---
>  kernel/sched.c          |   31 +++++++++++++--------
>  5 files changed, 91 insertions(+), 82 deletions(-)
>
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index bcaf16e..99e0bd4 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -30,8 +30,10 @@ EXPORT_SYMBOL_GPL(blkio_root_cgroup);
>
>  static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *,
>                                                  struct cgroup *);
> -static int blkiocg_can_attach_task(struct cgroup *, struct task_struct *);
> -static void blkiocg_attach_task(struct cgroup *, struct task_struct *);
> +static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *,
> +                             struct cgroup_taskset *);
> +static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *,
> +                          struct cgroup_taskset *);
>  static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *);
>  static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
>
> @@ -44,8 +46,8 @@ static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *);
>  struct cgroup_subsys blkio_subsys = {
>        .name = "blkio",
>        .create = blkiocg_create,
> -       .can_attach_task = blkiocg_can_attach_task,
> -       .attach_task = blkiocg_attach_task,
> +       .can_attach = blkiocg_can_attach,
> +       .attach = blkiocg_attach,
>        .destroy = blkiocg_destroy,
>        .populate = blkiocg_populate,
>  #ifdef CONFIG_BLK_CGROUP
> @@ -1614,30 +1616,39 @@ done:
>  * of the main cic data structures.  For now we allow a task to change
>  * its cgroup only if it's the only owner of its ioc.
>  */
> -static int blkiocg_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
> +static int blkiocg_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +                             struct cgroup_taskset *tset)
>  {
> +       struct task_struct *task;
>        struct io_context *ioc;
>        int ret = 0;
>
>        /* task_lock() is needed to avoid races with exit_io_context() */
> -       task_lock(tsk);
> -       ioc = tsk->io_context;
> -       if (ioc && atomic_read(&ioc->nr_tasks) > 1)
> -               ret = -EINVAL;
> -       task_unlock(tsk);
> -
> +       cgroup_taskset_for_each(task, cgrp, tset) {
> +               task_lock(task);
> +               ioc = task->io_context;
> +               if (ioc && atomic_read(&ioc->nr_tasks) > 1)
> +                       ret = -EINVAL;
> +               task_unlock(task);
> +               if (ret)
> +                       break;
> +       }

Doesn't the other part of this patch set, that avoids calling the
*attach() methods for tasks that aren't moving, eliminate the need for
the usage of skip_cgrp here (and elsewhere)? When do we actually need
to pass a non-NULL skip_cgrp to cgroup_taskset_for_each()?

Paul

>        return ret;
>  }
>
> -static void blkiocg_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
> +static void blkiocg_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +                          struct cgroup_taskset *tset)
>  {
> +       struct task_struct *task;
>        struct io_context *ioc;
>
> -       task_lock(tsk);
> -       ioc = tsk->io_context;
> -       if (ioc)
> -               ioc->cgroup_changed = 1;
> -       task_unlock(tsk);
> +       cgroup_taskset_for_each(task, cgrp, tset) {
> +               task_lock(task);
> +               ioc = task->io_context;
> +               if (ioc)
> +                       ioc->cgroup_changed = 1;
> +               task_unlock(task);
> +       }
>  }
>
>  void blkio_policy_register(struct blkio_policy_type *blkiop)
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index a2b0082..2cb5e72 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -162,10 +162,14 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
>                              struct cgroup_taskset *tset)
>  {
>        struct freezer *freezer;
> +       struct task_struct *task;
>
>        /*
>         * Anything frozen can't move or be moved to/from.
>         */
> +       cgroup_taskset_for_each(task, new_cgroup, tset)
> +               if (cgroup_freezing(task))
> +                       return -EBUSY;
>
>        freezer = cgroup_freezer(new_cgroup);
>        if (freezer->state != CGROUP_THAWED)
> @@ -174,11 +178,6 @@ static int freezer_can_attach(struct cgroup_subsys *ss,
>        return 0;
>  }
>
> -static int freezer_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
> -{
> -       return cgroup_freezing(tsk) ? -EBUSY : 0;
> -}
> -
>  static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
>  {
>        struct freezer *freezer;
> @@ -374,10 +373,5 @@ struct cgroup_subsys freezer_subsys = {
>        .populate       = freezer_populate,
>        .subsys_id      = freezer_subsys_id,
>        .can_attach     = freezer_can_attach,
> -       .can_attach_task = freezer_can_attach_task,
> -       .pre_attach     = NULL,
> -       .attach_task    = NULL,
> -       .attach         = NULL,
>        .fork           = freezer_fork,
> -       .exit           = NULL,
>  };
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 2e5825b..472ddd6 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1372,33 +1372,34 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
>                             struct cgroup_taskset *tset)
>  {
>        struct cpuset *cs = cgroup_cs(cgrp);
> +       struct task_struct *task;
> +       int ret;
>
>        if (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))
>                return -ENOSPC;
>
> -       /*
> -        * Kthreads bound to specific cpus cannot be moved to a new cpuset; we
> -        * cannot change their cpu affinity and isolating such threads by their
> -        * set of allowed nodes is unnecessary.  Thus, cpusets are not
> -        * applicable for such threads.  This prevents checking for success of
> -        * set_cpus_allowed_ptr() on all attached tasks before cpus_allowed may
> -        * be changed.
> -        */
> -       if (cgroup_taskset_first(tset)->flags & PF_THREAD_BOUND)
> -               return -EINVAL;
> -
> +       cgroup_taskset_for_each(task, cgrp, tset) {
> +               /*
> +                * Kthreads bound to specific cpus cannot be moved to a new
> +                * cpuset; we cannot change their cpu affinity and
> +                * isolating such threads by their set of allowed nodes is
> +                * unnecessary.  Thus, cpusets are not applicable for such
> +                * threads.  This prevents checking for success of
> +                * set_cpus_allowed_ptr() on all attached tasks before
> +                * cpus_allowed may be changed.
> +                */
> +               if (task->flags & PF_THREAD_BOUND)
> +                       return -EINVAL;
> +               if ((ret = security_task_setscheduler(task)))
> +                       return ret;
> +       }
>        return 0;
>  }
>
> -static int cpuset_can_attach_task(struct cgroup *cgrp, struct task_struct *task)
> -{
> -       return security_task_setscheduler(task);
> -}
> -
>  /*
>  * Protected by cgroup_lock. The nodemasks must be stored globally because
>  * dynamically allocating them is not allowed in pre_attach, and they must
> - * persist among pre_attach, attach_task, and attach.
> + * persist among pre_attach, and attach.
>  */
>  static cpumask_var_t cpus_attach;
>  static nodemask_t cpuset_attach_nodemask_from;
> @@ -1417,39 +1418,34 @@ static void cpuset_pre_attach(struct cgroup *cont)
>        guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
>  }
>
> -/* Per-thread attachment work. */
> -static void cpuset_attach_task(struct cgroup *cont, struct task_struct *tsk)
> -{
> -       int err;
> -       struct cpuset *cs = cgroup_cs(cont);
> -
> -       /*
> -        * can_attach beforehand should guarantee that this doesn't fail.
> -        * TODO: have a better way to handle failure here
> -        */
> -       err = set_cpus_allowed_ptr(tsk, cpus_attach);
> -       WARN_ON_ONCE(err);
> -
> -       cpuset_change_task_nodemask(tsk, &cpuset_attach_nodemask_to);
> -       cpuset_update_task_spread_flag(cs, tsk);
> -}
> -
>  static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
>                          struct cgroup_taskset *tset)
>  {
>        struct mm_struct *mm;
> -       struct task_struct *tsk = cgroup_taskset_first(tset);
> +       struct task_struct *task;
> +       struct task_struct *leader = cgroup_taskset_first(tset);
>        struct cgroup *oldcgrp = cgroup_taskset_cur_cgroup(tset);
>        struct cpuset *cs = cgroup_cs(cgrp);
>        struct cpuset *oldcs = cgroup_cs(oldcgrp);
>
> +       cgroup_taskset_for_each(task, cgrp, tset) {
> +               /*
> +                * can_attach beforehand should guarantee that this doesn't
> +                * fail.  TODO: have a better way to handle failure here
> +                */
> +               WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
> +
> +               cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
> +               cpuset_update_task_spread_flag(cs, task);
> +       }
> +
>        /*
>         * Change mm, possibly for multiple threads in a threadgroup. This is
>         * expensive and may sleep.
>         */
>        cpuset_attach_nodemask_from = oldcs->mems_allowed;
>        cpuset_attach_nodemask_to = cs->mems_allowed;
> -       mm = get_task_mm(tsk);
> +       mm = get_task_mm(leader);
>        if (mm) {
>                mpol_rebind_mm(mm, &cpuset_attach_nodemask_to);
>                if (is_memory_migrate(cs))
> @@ -1905,9 +1901,7 @@ struct cgroup_subsys cpuset_subsys = {
>        .create = cpuset_create,
>        .destroy = cpuset_destroy,
>        .can_attach = cpuset_can_attach,
> -       .can_attach_task = cpuset_can_attach_task,
>        .pre_attach = cpuset_pre_attach,
> -       .attach_task = cpuset_attach_task,
>        .attach = cpuset_attach,
>        .populate = cpuset_populate,
>        .post_clone = cpuset_post_clone,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b8785e2..95e189d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7000,10 +7000,13 @@ static int __perf_cgroup_move(void *info)
>        return 0;
>  }
>
> -static void
> -perf_cgroup_attach_task(struct cgroup *cgrp, struct task_struct *task)
> +static void perf_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +                              struct cgroup_taskset *tset)
>  {
> -       task_function_call(task, __perf_cgroup_move, task);
> +       struct task_struct *task;
> +
> +       cgroup_taskset_for_each(task, cgrp, tset)
> +               task_function_call(task, __perf_cgroup_move, task);
>  }
>
>  static void perf_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
> @@ -7017,7 +7020,7 @@ static void perf_cgroup_exit(struct cgroup_subsys *ss, struct cgroup *cgrp,
>        if (!(task->flags & PF_EXITING))
>                return;
>
> -       perf_cgroup_attach_task(cgrp, task);
> +       task_function_call(task, __perf_cgroup_move, task);
>  }
>
>  struct cgroup_subsys perf_subsys = {
> @@ -7026,6 +7029,6 @@ struct cgroup_subsys perf_subsys = {
>        .create         = perf_cgroup_create,
>        .destroy        = perf_cgroup_destroy,
>        .exit           = perf_cgroup_exit,
> -       .attach_task    = perf_cgroup_attach_task,
> +       .attach         = perf_cgroup_attach,
>  };
>  #endif /* CONFIG_CGROUP_PERF */
> diff --git a/kernel/sched.c b/kernel/sched.c
> index ccacdbd..dd7e460 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -8966,24 +8966,31 @@ cpu_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp)
>        sched_destroy_group(tg);
>  }
>
> -static int
> -cpu_cgroup_can_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
> +static int cpu_cgroup_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +                                struct cgroup_taskset *tset)
>  {
> +       struct task_struct *task;
> +
> +       cgroup_taskset_for_each(task, cgrp, tset) {
>  #ifdef CONFIG_RT_GROUP_SCHED
> -       if (!sched_rt_can_attach(cgroup_tg(cgrp), tsk))
> -               return -EINVAL;
> +               if (!sched_rt_can_attach(cgroup_tg(cgrp), task))
> +                       return -EINVAL;
>  #else
> -       /* We don't support RT-tasks being in separate groups */
> -       if (tsk->sched_class != &fair_sched_class)
> -               return -EINVAL;
> +               /* We don't support RT-tasks being in separate groups */
> +               if (task->sched_class != &fair_sched_class)
> +                       return -EINVAL;
>  #endif
> +       }
>        return 0;
>  }
>
> -static void
> -cpu_cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
> +static void cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> +                             struct cgroup_taskset *tset)
>  {
> -       sched_move_task(tsk);
> +       struct task_struct *task;
> +
> +       cgroup_taskset_for_each(task, cgrp, tset)
> +               sched_move_task(task);
>  }
>
>  static void
> @@ -9071,8 +9078,8 @@ struct cgroup_subsys cpu_cgroup_subsys = {
>        .name           = "cpu",
>        .create         = cpu_cgroup_create,
>        .destroy        = cpu_cgroup_destroy,
> -       .can_attach_task = cpu_cgroup_can_attach_task,
> -       .attach_task    = cpu_cgroup_attach_task,
> +       .can_attach     = cpu_cgroup_can_attach,
> +       .attach         = cpu_cgroup_attach,
>        .exit           = cpu_cgroup_exit,
>        .populate       = cpu_cgroup_populate,
>        .subsys_id      = cpu_cgroup_subsys_id,
> --
> 1.7.6
>
>

^ permalink raw reply

* Re: [PATCH 5/6] cgroup, cpuset: don't use ss->pre_attach()
From: Tejun Heo @ 2011-08-25  9:06 UTC (permalink / raw)
  To: Paul Menage; +Cc: containers, linux-pm, lizf, linux-kernel
In-Reply-To: <CALdu-PD5EbFJBRHf-iehPwb6vyJTYUTWZniihARFDZ7xRZ8_nQ@mail.gmail.com>

On Thu, Aug 25, 2011 at 01:53:57AM -0700, Paul Menage wrote:
> On Tue, Aug 23, 2011 at 3:19 PM, Tejun Heo <tj@kernel.org> wrote:
> > ->pre_attach() is supposed to be called before migration, which is
> > observed during process migration but task migration does it the other
> > way around.  The only ->pre_attach() user is cpuset which can do the
> > same operaitons in ->can_attach().  Collapse cpuset_pre_attach() into
> > cpuset_can_attach().
> >
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> 
> Acked-by: Paul Menage <paul@paulmenage.org>
> 
> Code looks good, but I think that the some of the Documentation
> changes slipped in here by mistake.

Oops, indeed.  Will relocate them.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 2/6] cgroup: improve old cgroup handling in cgroup_attach_proc()
From: Tejun Heo @ 2011-08-25  9:03 UTC (permalink / raw)
  To: Paul Menage; +Cc: containers, linux-pm, lizf, linux-kernel
In-Reply-To: <CALdu-PAj1ZUmB2ixxA6yeppB8MerBGk1cSeQadobH0H4cRSe7Q@mail.gmail.com>

Hello,

On Thu, Aug 25, 2011 at 01:51:39AM -0700, Paul Menage wrote:
> In particular, making the group flex-array entries contain both a task
> and a cgroup appears to be only necessary in order to skip tasks where
> new_cgroup == old_cgroup. Can't we get the same effect by simply
> leaving all such tasks out of the flex-array in the first place?

In general, the interface *should* give full information to subsys
methods at each stage including the old cgroup each task is migrating
from and the new cgroup; otherwise, they soon end up developing weird
acrobatics to work around shortcomings in the interface or being
outright buggy, so let's please look past the fixes which are
necessary immediately and think about what a proper interface should
look like.

I mean, seriously, why did ->attach_task() take @new_cgroup when it's
called after migration happened while ->attach() had access to the old
cgroup of the last iterated task in the group?  What the hell does
that even mean?

And, why is this a big change?  The big part is change of interface
but that we need to do anyway.  This one is just adding an entry to
the flex array.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 5/6] cgroup, cpuset: don't use ss->pre_attach()
From: Paul Menage @ 2011-08-25  8:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-pm, lizf, linux-kernel
In-Reply-To: <1314138000-2049-6-git-send-email-tj@kernel.org>

On Tue, Aug 23, 2011 at 3:19 PM, Tejun Heo <tj@kernel.org> wrote:
> ->pre_attach() is supposed to be called before migration, which is
> observed during process migration but task migration does it the other
> way around.  The only ->pre_attach() user is cpuset which can do the
> same operaitons in ->can_attach().  Collapse cpuset_pre_attach() into
> cpuset_can_attach().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Acked-by: Paul Menage <paul@paulmenage.org>

Code looks good, but I think that the some of the Documentation
changes slipped in here by mistake.

Paul

> Cc: Paul Menage <paul@paulmenage.org>
> Cc: Li Zefan <lizf@cn.fujitsu.com>
> ---
>  Documentation/cgroups/cgroups.txt |   20 --------------------
>  kernel/cpuset.c                   |   29 ++++++++++++-----------------
>  2 files changed, 12 insertions(+), 37 deletions(-)
>
> diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
> index 2eee7cf..afb7cde 100644
> --- a/Documentation/cgroups/cgroups.txt
> +++ b/Documentation/cgroups/cgroups.txt
> @@ -610,13 +610,6 @@ called on a fork. If this method returns 0 (success) then this should
>  remain valid while the caller holds cgroup_mutex and it is ensured
>  that either attach() or cancel_attach() will be called in future.
>
> -int can_attach_task(struct cgroup *cgrp, struct task_struct *tsk);
> -(cgroup_mutex held by caller)
> -
> -As can_attach, but for operations that must be run once per task to be
> -attached (possibly many when using cgroup_attach_proc). Called after
> -can_attach.
> -
>  void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
>                   struct cgroup_taskset *tset)
>  (cgroup_mutex held by caller)
> @@ -627,12 +620,6 @@ function, so that the subsystem can implement a rollback. If not, not necessary.
>  This will be called only about subsystems whose can_attach() operation have
>  succeeded. The parameters are identical to can_attach().
>
> -void pre_attach(struct cgroup *cgrp);
> -(cgroup_mutex held by caller)
> -
> -For any non-per-thread attachment work that needs to happen before
> -attach_task. Needed by cpuset.
> -
>  void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
>            struct cgroup_taskset *tset)
>  (cgroup_mutex held by caller)
> @@ -641,13 +628,6 @@ Called after the task has been attached to the cgroup, to allow any
>  post-attachment activity that requires memory allocations or blocking.
>  The parameters are identical to can_attach().
>
> -void attach_task(struct cgroup *cgrp, struct task_struct *tsk);
> -(cgroup_mutex held by caller)
> -
> -As attach, but for operations that must be run once per task to be attached,
> -like can_attach_task. Called before attach. Currently does not support any
> -subsystem that might need the old_cgrp for every thread in the group.
> -
>  void fork(struct cgroup_subsy *ss, struct task_struct *task)
>
>  Called when a task is forked into a cgroup.
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 472ddd6..f0b8df3 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1367,6 +1367,15 @@ static int fmeter_getrate(struct fmeter *fmp)
>        return val;
>  }
>
> +/*
> + * Protected by cgroup_lock. The nodemasks must be stored globally because
> + * dynamically allocating them is not allowed in can_attach, and they must
> + * persist until attach.
> + */
> +static cpumask_var_t cpus_attach;
> +static nodemask_t cpuset_attach_nodemask_from;
> +static nodemask_t cpuset_attach_nodemask_to;
> +
>  /* Called by cgroups to determine if a cpuset is usable; cgroup_mutex held */
>  static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
>                             struct cgroup_taskset *tset)
> @@ -1393,29 +1402,16 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
>                if ((ret = security_task_setscheduler(task)))
>                        return ret;
>        }
> -       return 0;
> -}
> -
> -/*
> - * Protected by cgroup_lock. The nodemasks must be stored globally because
> - * dynamically allocating them is not allowed in pre_attach, and they must
> - * persist among pre_attach, and attach.
> - */
> -static cpumask_var_t cpus_attach;
> -static nodemask_t cpuset_attach_nodemask_from;
> -static nodemask_t cpuset_attach_nodemask_to;
> -
> -/* Set-up work for before attaching each task. */
> -static void cpuset_pre_attach(struct cgroup *cont)
> -{
> -       struct cpuset *cs = cgroup_cs(cont);
>
> +       /* prepare for attach */
>        if (cs == &top_cpuset)
>                cpumask_copy(cpus_attach, cpu_possible_mask);
>        else
>                guarantee_online_cpus(cs, cpus_attach);
>
>        guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
> +
> +       return 0;
>  }
>
>  static void cpuset_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> @@ -1901,7 +1897,6 @@ struct cgroup_subsys cpuset_subsys = {
>        .create = cpuset_create,
>        .destroy = cpuset_destroy,
>        .can_attach = cpuset_can_attach,
> -       .pre_attach = cpuset_pre_attach,
>        .attach = cpuset_attach,
>        .populate = cpuset_populate,
>        .post_clone = cpuset_post_clone,
> --
> 1.7.6
>
>

^ permalink raw reply

* Re: [PATCH v2] Add regulator driver for the bq2407x family of charger ICs.
From: Heiko Stübner @ 2011-08-25  8:51 UTC (permalink / raw)
  To: MyungJoo Ham; +Cc: linux-pm, Mark Brown, Liam Girdwood
In-Reply-To: <CAJ0PZbQRGCHAQ1HxDmbf6uN8iz3CCcqjOVRtc63Cjn17EXak1A@mail.gmail.com>

Hi MyungJoo

Am Donnerstag, 25. August 2011, 08:55:29 schrieb MyungJoo Ham:
> On Thu, Aug 25, 2011 at 3:48 AM, Heiko Stübner <heiko@sntech.de> wrote:
> > This driver controls a TI bq2407x charger attached via GPIOs.
> > The provided current regulator can enable/disable charging and
> > select between 100 mA, 500 mA and a machine specific current limit.
> > 
> > Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> 
> This looks like a bq24022 driver + max-current-mode ("USB standby"
> seems not implemented in the driver). Wouldn't it be possible to patch
> bq24022 driver so that the bq24022 driver becomes compatible with this
> bq2407x?
> 
> 2407x's EN1 = 24022's iset2
> 2407x's nce = 24022's nce
> 
> I think you may simply let the driver ignore EN2 (and max-current
> mode) if the supplied EN2 in pdata is NULL, then, the driver will be
> compatible for both.

Thanks for adding a new perspective on this. The way you put it, it seems 
quite easy to do it this way and integrate it into the bq24022 driver.

So I will give it shot this evening and will see were it leads me.

Thanks
Heiko

^ permalink raw reply

* Re: [PATCH 2/6] cgroup: improve old cgroup handling in cgroup_attach_proc()
From: Paul Menage @ 2011-08-25  8:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-pm, lizf, linux-kernel
In-Reply-To: <1314138000-2049-3-git-send-email-tj@kernel.org>

On Tue, Aug 23, 2011 at 3:19 PM, Tejun Heo <tj@kernel.org> wrote:
> cgroup_attach_proc() behaves differently from cgroup_attach_task() in
> the following aspects.
>
> * All hooks are invoked even if no task is actually being moved.
>
> * ->can_attach_task() is called for all tasks in the group whether the
>  new cgrp is different from the current cgrp or not; however,
>  ->attach_task() is skipped if new equals new.  This makes the calls
>  asymmetric.
>
> This patch improves old cgroup handling in cgroup_attach_proc() by
> looking up the current cgroup at the head, recording it in the flex
> array along with the task itself, and using it to remove the above two
> differences.  This will also ease further changes.

While I'm all in favour of making things more consistent, do we need
such a big change?

In particular, making the group flex-array entries contain both a task
and a cgroup appears to be only necessary in order to skip tasks where
new_cgroup == old_cgroup. Can't we get the same effect by simply
leaving all such tasks out of the flex-array in the first place?

Paul

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Tejun Heo @ 2011-08-25  8:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, containers, lizf, linux-kernel, James Morris,
	linux-pm, paul
In-Reply-To: <20110825172140.eb34809f.kamezawa.hiroyu@jp.fujitsu.com>

Hello,

On Thu, Aug 25, 2011 at 10:21 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> Hmmm... the above is basically identity transformation of the existing
>> code.  If the above is broken, the current code is broken too.  Is it?
>>
> Current code is not broken.

Trust me.  If the posted code is broken, the current code is too. It
is an IDENTITY transformation.

> mem_cgroup_can_attach(....., task) need to do real job only when task->mm->owner
> == task. In this modification, you pass a set of task at once.

Before the change, cgroup would migrate multiple tasks all the same
but memcg wouldn't have noticed it unless it opted in explicitly using
[can_]attach_task(). When multiple tasks were moving, [can_]attach()
would only be called with the leader whether the leader actually is
changing cgroup or not. The interface sucked and it wasn't properly
documented but that's what was happening. The interface change is just
making the breakage obvious - +1 for the new interface. :)

> So, mem_cgroup_can_attach() need to check _all_ tasks in tset rather than a
> first task in tset. please scan and find mm->owner.
>
> Anyway, if you merge this onto mm-tree first, I think I can have time to
> write a fix up if complicated.

As for specific merging order, it shouldn't matter all that much but
if you wanna backport fixes for -stable, maybe it would make more
sense to sequence the fix before this change.

Thank you.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: KAMEZAWA Hiroyuki @ 2011-08-25  8:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daisuke Nishimura, containers, lizf, linux-kernel, James Morris,
	linux-pm, paul
In-Reply-To: <CAOS58YPM=cuWjAF+VJ4QJ8bnRcVtaDCVXBJCpdWg+2=2GmnKrA@mail.gmail.com>

On Thu, 25 Aug 2011 10:40:06 +0200
Tejun Heo <tj@kernel.org> wrote:

> Hello,
> 
> On Thu, Aug 25, 2011 at 10:21 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >> Hmmm... the above is basically identity transformation of the existing
> >> code.  If the above is broken, the current code is broken too.  Is it?
> >>
> > Current code is not broken.
> 
> Trust me.  If the posted code is broken, the current code is too. It
> is an IDENTITY transformation.
> 
> > mem_cgroup_can_attach(....., task) need to do real job only when task->mm->owner
> > == task. In this modification, you pass a set of task at once.
> 
> Before the change, cgroup would migrate multiple tasks all the same
> but memcg wouldn't have noticed it unless it opted in explicitly using
> [can_]attach_task(). When multiple tasks were moving, [can_]attach()
> would only be called with the leader whether the leader actually is
> changing cgroup or not. The interface sucked and it wasn't properly
> documented but that's what was happening. The interface change is just
> making the breakage obvious - +1 for the new interface. :)
> 
Thank you for clarification. Ok, current code is broken.


> > So, mem_cgroup_can_attach() need to check _all_ tasks in tset rather than a
> > first task in tset. please scan and find mm->owner.
> >
> > Anyway, if you merge this onto mm-tree first, I think I can have time to
> > write a fix up if complicated.
> 
> As for specific merging order, it shouldn't matter all that much but
> if you wanna backport fixes for -stable, maybe it would make more
> sense to sequence the fix before this change.
> 
> Thank you.

Sure. IIUC, the case thread_leader != mm->owner is uncommon. 
I'll consider a fix onto your fix, first.

I'll cosinder a fix for stable tree if someone requests.

Thanks,
-Kame

_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: KAMEZAWA Hiroyuki @ 2011-08-25  8:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Daisuke Nishimura, containers, lizf, linux-kernel, James Morris,
	linux-pm, paul
In-Reply-To: <20110825082049.GC3286@htj.dyndns.org>

On Thu, 25 Aug 2011 10:20:49 +0200
Tejun Heo <tj@kernel.org> wrote:

> Hello, KAMEZAWA.
> 

> > > @@ -5460,8 +5460,9 @@ static void mem_cgroup_clear_mc(void)
> > >  
> > >  static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
> > >  				struct cgroup *cgroup,
> > > -				struct task_struct *p)
> > > +				struct cgroup_taskset *tset)
> > >  {
> > > +	struct task_struct *p = cgroup_taskset_first(tset);
> > >  	int ret = 0;
> > >  	struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
> > >  
> > 
> > Ah..hmm. I think this doesn't work as expected for memcg.
> > Maybe code like this will be required.
> 
> Hmmm... the above is basically identity transformation of the existing
> code.  If the above is broken, the current code is broken too.  Is it?
> 
Current code is not broken.

mem_cgroup_can_attach(....., task) need to do real job only when task->mm->owner
== task. In this modification, you pass a set of task at once.
So, mem_cgroup_can_attach() need to check _all_ tasks in tset rather than a
first task in tset. please scan and find mm->owner.

Anyway, if you merge this onto mm-tree first, I think I can have time to
write a fix up if complicated.

Thanks,
-Kame

^ permalink raw reply

* Re: [PATCH 3/6] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
From: Tejun Heo @ 2011-08-25  8:20 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, containers, lizf, linux-kernel, James Morris,
	linux-pm, paul
In-Reply-To: <20110825093958.75b95bd8.kamezawa.hiroyu@jp.fujitsu.com>

Hello, KAMEZAWA.

On Thu, Aug 25, 2011 at 09:39:58AM +0900, KAMEZAWA Hiroyuki wrote:
> > +Called prior to moving one or more tasks into a cgroup; if the
> > +subsystem returns an error, this will abort the attach operation.
> > +@tset contains the tasks to be attached and is guaranteed to have at
> > +least one task in it. If there are multiple, it's guaranteed that all
> > +are from the same thread group,
> 
> 
> Do this, "If there are multiple, it's guaranteed that all
> are from the same thread group ", means the 'tset' contains
> only one mm_struct ?

Yes, CLONE_THREAD requires CLONE_SIGHAND which in turn requires
CLONE_VM, so they'll all have the same mm.

> And is it guaranteed that any task in tset will not be freed while
> subsystem routine runs ?

Yeap, that one is guaranteed.  It might die but the the task_struct
itself won't be released.  However, I think it might be bette to block
task exits during migration too.

> > @@ -5460,8 +5460,9 @@ static void mem_cgroup_clear_mc(void)
> >  
> >  static int mem_cgroup_can_attach(struct cgroup_subsys *ss,
> >  				struct cgroup *cgroup,
> > -				struct task_struct *p)
> > +				struct cgroup_taskset *tset)
> >  {
> > +	struct task_struct *p = cgroup_taskset_first(tset);
> >  	int ret = 0;
> >  	struct mem_cgroup *mem = mem_cgroup_from_cont(cgroup);
> >  
> 
> Ah..hmm. I think this doesn't work as expected for memcg.
> Maybe code like this will be required.

Hmmm... the above is basically identity transformation of the existing
code.  If the above is broken, the current code is broken too.  Is it?

Thanks.

-- 
tejun

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox