From: Paul Menage <menage@google.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ben Blum <bblum@andrew.cmu.edu>,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org, akpm@linux-foundation.org,
ebiederm@xmission.com, lizf@cn.fujitsu.com, matthltc@us.ibm.com,
oleg@redhat.com
Subject: Re: [PATCH v4 2/2] cgroups: make procs file writable
Date: Tue, 3 Aug 2010 21:30:00 -0700 [thread overview]
Message-ID: <AANLkTikMofFGHSwF2QrdcAsit+hU6ihndhK5cod8duwS@mail.gmail.com> (raw)
In-Reply-To: <20100804100811.199d73ba.kamezawa.hiroyu@jp.fujitsu.com>
Hi Ben, Kame,
Sorry for the delay in getting to look at this,
On Tue, Aug 3, 2010 at 6:08 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
>>
>> for_each_subsys(root, ss) {
>> if (ss->attach)
>> ss->attach(ss, cgrp, oldcgrp, tsk, false);
>> }
>> - set_bit(CGRP_RELEASABLE, &oldcgrp->flags);
>> +
>
> Hmm. By this, we call ss->attach(ss, cgrp, oldcgrp, tsk, false) after
> marking CGRP_RELEASABLE+synchronize_rcu() to oldcgroup...is it safe ?
> And why move it before attach() ?
>
Marking as releasable should be fine - the only time this is cleared
is when you write to notify_on_release.
I think that the put_css_set(oldcg) and synchronize_rcu() are safe, by
the following logic:
- we took cgroup_lock in cgroup_tasks_write()
- after this point, oldcgrp was initialized from the task's cgroup;
therefore oldcgrp still existed at this point
- even if all the threads (including the one being operated on) exit
(and hence leave oldcgrp) while we're doing the attach, we're holding
cgroup_lock so no-one else can delete oldcgrp
So it's certainly possible that the task in question has exited by the
time we call the subsys attach methods, but oldcgrp should still be
alive.
Whether we need an additional synchronize_rcu() after the attach()
calls is harder to determine - I guess it's better to be safe than
sorry, unless people are seeing specific performance issues with this.
I think the css_set_check_fetched() function needs more comments
explaining its behaviour and what its return value indicates.
>> + /*
>> + * we need to make sure we have css_sets for all the tasks we're
>> + * going to move -before- we actually start moving them, so that in
>> + * case we get an ENOMEM we can bail out before making any changes.
More than that - even if we don't get an ENOMEM, we can't safely sleep
in the RCU section, so we'd either have to do all memory allocations
atomically (which would be bad and unreliable) or else we avoid the
need to allocate in the RCU section (which is the choice taken here).
>> + */
>> + struct list_head newcg_list;
>> + struct cg_list_entry *cg_entry, *temp_nobe;
>> +
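The "prefetch everything before touching anything" idea in that comment can be sketched in plain userspace C (hypothetical names and a toy list type, not the kernel API):

```c
#include <errno.h>
#include <stdlib.h>

struct entry { struct entry *next; void *payload; };

/* Phase 1: allocate everything we will need, outside any
 * non-sleepable section.  On ENOMEM, free what we already
 * got and bail before any task has been moved. */
static int prefetch_all(struct entry **list, int n)
{
    for (int i = 0; i < n; i++) {
        struct entry *e = malloc(sizeof(*e));
        if (!e) {
            while (*list) {
                struct entry *t = *list;
                *list = t->next;
                free(t);
            }
            return -ENOMEM;
        }
        e->payload = NULL;
        e->next = *list;
        *list = e;
    }
    return 0;   /* Phase 2 (the actual migration) can now run
                 * without allocating, and hence without sleeping. */
}
```

The point is that phase 2 becomes infallible with respect to memory: either we bail before making any changes, or we have everything we need up front.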
>> + /* check that we can legitimately attach to the cgroup. */
>> + for_each_subsys(root, ss) {
>> + if (ss->can_attach) {
>> + retval = ss->can_attach(ss, cgrp, leader, true);
>> + if (retval) {
>> + failed_ss = ss;
>> + goto out;
>> + }
>> + }
>> + }
>
> Then, we cannot do attach limitation control per thread ? (This just check leader.)
> Is it ok for all subsys ?
By passing "true" as the "threadgroup" parameter to can_attach(),
we're letting the subsystem decide if it needs to do per-thread
checks. For most subsystems, calling them once for each thread would
be unnecessary.
>
>
>> +
>> + /*
>> + * step 1: make sure css_sets exist for all threads to be migrated.
>> + * we use find_css_set, which allocates a new one if necessary.
>> + */
>> + INIT_LIST_HEAD(&newcg_list);
>> + oldcgrp = task_cgroup_from_root(leader, root);
>> + if (cgrp != oldcgrp) {
>> + /* get old css_set */
>> + task_lock(leader);
>> + if (leader->flags & PF_EXITING) {
>> + task_unlock(leader);
>> + goto prefetch_loop;
>> + }
> Why do we continue here ? not -ESRCH ?
>
It's possible that some threads from the process are exiting while
we're trying to move the entire process. As long as we move at least
one thread, we shouldn't care if some of its threads are exiting.
Which means that after we've done the prefetch loop, we should
probably check that the newcg_list isn't empty, and return -ESRCH in
that case.
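That post-loop check would be something like this sketch (toy list type, hypothetical helper name):

```c
#include <errno.h>
#include <stddef.h>

struct node { struct node *next; };

/* After prefetching css_sets for all non-exiting threads, an
 * empty newcg_list means every thread was already exiting:
 * report "no such process" rather than silently succeeding. */
static int check_any_thread_left(const struct node *newcg_list)
{
    if (newcg_list == NULL)
        return -ESRCH;
    return 0;
}
```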
>
>> + oldcg = leader->cgroups;
>> + get_css_set(oldcg);
>> + task_unlock(leader);
>> + /* acquire new one */
/* acquire a new css_set for the leader */
>> + * if we need to fetch a new css_set for this task, we must exit the
>> + * rcu_read section because allocating it can sleep. afterwards, we'll
>> + * need to restart iteration on the threadgroup list - the whole thing
>> + * will be O(nm) in the number of threads and css_sets; as the typical
>> + * case has only one css_set for all of them, usually O(n). which ones
Maybe better to say "in the worst case this is O(n^2) on the number of
threads; however, in the vast majority of cases all the threads will
be in the same cgroups as the leader and we'll make just a single pass
through the list with no additional allocations needed".
>
> It's going away but seems to exist for a while....then, "continue" is safe
> for keeping consistency ?
Yes, because we don't sleep so the RCU list is still valid.
>> + /* see if the new one for us is already in the list? */
/* See if we already have an appropriate css_set for this thread */
>> + /* begin iteration again. */
/* Since we may have slept in css_set_prefetch(), the RCU list is no
longer valid, so we must begin the iteration again; any threads that
we've previously processed will pass the css_set_check_fetched() test
on subsequent iterations since we hold cgroup_lock, so we're
guaranteed to make progress. */
> Does this function work well if the process has 10000+ threads ?
In general there'll only be one cgroup so it'll be a single pass
through the list.
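The restart-on-sleep pattern being discussed can be modelled in userspace C (toy types; the "fetched" flag stands in for css_set_check_fetched()):

```c
#include <stdbool.h>
#include <stddef.h>

struct task { struct task *next; bool fetched; };

/* Whenever handling a task would require sleeping (allocating
 * a new css_set), we would drop the RCU read lock, allocate,
 * and start the iteration over.  Already-handled tasks pass the
 * fetched check immediately, so each restart makes progress; if
 * nothing needs allocating, the loop completes in one pass. */
static int count_passes(struct task *head)
{
    int passes = 0;
restart:
    passes++;
    for (struct task *t = head; t; t = t->next) {
        if (t->fetched)
            continue;        /* cheap check, no sleeping */
        t->fetched = true;   /* would sleep here, then restart */
        goto restart;
    }
    return passes;
}
```

With n tasks all needing a fetch this does n+1 passes (the O(n^2) worst case on total work); with everything already fetched it is one pass, which matches the common single-css_set case.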
>
> How about this logic ?
> ==
>
> /* At first, find out necessary things */
> rcu_read_lock();
> list_for_each_entry_rcu() {
> oldcgrp = task_cgroup_from_root(tsk, root);
> if (oldcgrp == cgrp)
> continue;
> task_lock(task);
> if (task->flags & PF_EXITING) {
> task_unlock(task);
> continue;
> }
> oldcg = tsk->cgroups;
> get_css_set(oldcg);
> task_unlock(task);
> read_lock(&css_set_lock);
>       newcg = find_existing_css_set(oldcgrp, cgrp, template);
>       if (newcg) {
>               remember_this_newcg(newcg, &found_cg_array);
>               put_css_set(oldcg);
>       } else
> remember_need_to_allocate(oldcg, &need_to_allocate_array);
The problem with this is that remember_need_to_allocate() will itself
need to allocate memory in order to allow need_to_allocate_array to
expand arbitrarily. Which can't be done without GFP_ATOMIC or else
sleeping in the RCU section, neither of which are good.
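The difficulty shows up as soon as you sketch what remember_need_to_allocate() would have to do (hypothetical names, userspace C):

```c
#include <stdlib.h>

struct growable { void **items; size_t len, cap; };

/* Growing the array may itself allocate.  In the kernel this
 * would happen inside the RCU read section, forcing either
 * GFP_ATOMIC (unreliable) or sleeping (forbidden). */
static int remember_need_to_allocate(struct growable *g, void *oldcg)
{
    if (g->len == g->cap) {
        size_t ncap = g->cap ? g->cap * 2 : 4;
        void **ni = realloc(g->items, ncap * sizeof(*ni));
        if (!ni)
            return -1;       /* ENOMEM mid-scan */
        g->items = ni;
        g->cap = ncap;
    }
    g->items[g->len++] = oldcg;
    return 0;
}
```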
>> +list_teardown:
>> + /* clean up the list of prefetched css_sets. */
>> + list_for_each_entry_safe(cg_entry, temp_nobe, &newcg_list, links) {
>> + list_del(&cg_entry->links);
>> + put_css_set(cg_entry->cg);
>> + kfree(cg_entry);
>> + }
I wonder if we might need a synchronize_rcu() here?
>> --- a/kernel/cpuset.c
>> +++ b/kernel/cpuset.c
>> @@ -1404,6 +1404,10 @@ static int cpuset_can_attach(struct cgroup_subsys *ss, struct cgroup *cont,
>> struct task_struct *c;
>>
>> rcu_read_lock();
>> + if (!thread_group_leader(tsk)) {
>> + rcu_read_unlock();
>> + return -EAGAIN;
>> + }
Why are you adding this requirement, here and in sched.c? (ns_cgroup.c
doesn't matter since it's being deleted).
Paul