public inbox for linux-kernel@vger.kernel.org
From: "Serge E. Hallyn" <serue@us.ibm.com>
To: Benjamin Blum <bblum@google.com>
Cc: linux-kernel@vger.kernel.org,
	containers@lists.linux-foundation.org, akpm@linux-foundation.org,
	lizf@cn.fujitsu.com, menage@google.com
Subject: Re: [PATCH 6/6] Makes procs file writable to move all threads by tgid at once
Date: Mon, 3 Aug 2009 13:55:56 -0500	[thread overview]
Message-ID: <20090803185556.GA8469@us.ibm.com> (raw)
In-Reply-To: <2f86c2480908031113y525b6cbdhe418b8a0364c7760@mail.gmail.com>

Quoting Benjamin Blum (bblum@google.com):
> > On Mon, Aug 3, 2009 at 1:54 PM, Serge E. Hallyn <serue@us.ibm.com> wrote:
> > Quoting Ben Blum (bblum@google.com):
> > ...
> >> +static int cgroup_task_migrate(struct cgroup *cgrp, struct cgroup *oldcgrp,
> >> +                            struct task_struct *tsk, int guarantee)
> >> +{
> >> +     struct css_set *oldcg;
> >> +     struct css_set *newcg;
> >> +
> >> +     /*
> >> +      * get old css_set. we need to take task_lock and refcount it, because
> >> +      * an exiting task can change its css_set to init_css_set and drop its
> >> +      * old one without taking cgroup_mutex.
> >> +      */
> >> +     task_lock(tsk);
> >> +     oldcg = tsk->cgroups;
> >> +     get_css_set(oldcg);
> >> +     task_unlock(tsk);
> >> +     /*
> >> +      * locate or allocate a new css_set for this task. 'guarantee' tells
> >> +      * us whether or not we are sure that a new css_set already exists;
> >> +      * in that case, we are not allowed to fail, as we won't need malloc.
> >> +      */
> >> +     if (guarantee) {
> >> +             /*
> >> +              * our caller promises us that the css_set we want already
> >> +              * exists, so we use find_existing_css_set directly.
> >> +              */
> >> +             struct cgroup_subsys_state *template[CGROUP_SUBSYS_COUNT];
> >> +             read_lock(&css_set_lock);
> >> +             newcg = find_existing_css_set(oldcg, cgrp, template);
> >> +             BUG_ON(!newcg);
> >> +             get_css_set(newcg);
> >> +             read_unlock(&css_set_lock);
> >> +     } else {
> >> +             might_sleep();
> >
> > So cgroup_task_migrate() might sleep, but
> >
> > ...
> >
> >
> >> +     down_write(&leader->cgroup_fork_mutex);
> >> +     rcu_read_lock();
> >> +     list_for_each_entry_rcu(tsk, &leader->thread_group, thread_group) {
> >> +             /* leave current thread as it is if it's already there */
> >> +             oldcgrp = task_cgroup(tsk, subsys_id);
> >> +             if (cgrp == oldcgrp)
> >> +                     continue;
> >> +             /* we don't care whether these threads are exiting */
> >> +             retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, 1);
> >
> > Here it is called under rcu_read_lock().
> 
> You'll notice the fourth argument, which tells cgroup_task_migrate

Hmmm, in my defense one would notice it more readily if the caller used a
meaningful #define instead of '1'.

> whether the css_set is guaranteed or not. If we say we've already got
> it covered, the might_sleep section doesn't happen.
> 
> >> -void cgroup_fork(struct task_struct *child)
> >> +void cgroup_fork(struct task_struct *child, int clone_flags)
> >>  {
> >> +     if (clone_flags & CLONE_THREAD)
> >> +             down_read(&current->group_leader->cgroup_fork_mutex);
> >> +     else
> >> +             init_rwsem(&child->cgroup_fork_mutex);
> >
> > I'm also worried about the overhead here on what should be a
> > fast case, CLONE_THREAD.  Have you done any benchmarking of
> > one thread spawning a bunch of others?
> 
> Should be strictly better as this is making the rwsem local to the
> threadgroup - at least in comparison to the previous edition of this
> patch which had it as a global lock.
> 
> > What *exactly* is it we are protecting with cgroup_fork_mutex?
> > 'fork' (as the name implies) is not a good answer, since we should be
> > protecting data, not code.  If it is solely tsk->cgroups, then perhaps
> > we should in fact try switching to (s?)rcu.  Then cgroup_fork() could
> > just do rcu_read_lock, while cgroup_task_migrate() would make the change
> > under a spinlock (to protect against concurrent cgroup_task_migrate()s),
> > and using rcu_assign_pointer to let cgroup_fork() see consistent data
> > either before or after the update...  That might mean that any checks done
> > before completing the migrate which involve the # of tasks might become
> > invalidated before the migration completes?  Seems acceptable (since
> > it'll be a small overcharge at most and can be quickly remedied).
> 
> You'll notice where the rwsem is released - not until cgroup_post_fork
> or cgroup_fork_failed. It doesn't just protect the tsk->cgroups
> pointer, but rather guarantees atomicity between adjusting
> tsk->cgroups and attaching it to the cgroups lists with respect to the
> critical section in attach_proc. If you've a better name for the lock
> for such a race condition, do suggest.

No, the name is pretty accurate - it's the lock itself I'm objecting
to.  Maybe it's the best we can do, though.

thanks,
-serge


Thread overview: 50+ messages
2009-07-31  1:51 [PATCH v2 0/6] CGroups: cgroup memberlist enhancement+fix Ben Blum
2009-07-31  1:51 ` [PATCH 1/6] Adds a read-only "procs" file similar to "tasks" that shows only unique tgids Ben Blum
2009-07-31  1:51 ` [PATCH 2/6] Ensures correct concurrent opening/reading of pidlists across pid namespaces Ben Blum
2009-07-31  1:51 ` [PATCH 3/6] Quick vmalloc vs kmalloc fix to the case where array size is too large Ben Blum
2009-07-31  1:51 ` [PATCH 4/6] Changes css_set freeing mechanism to be under RCU Ben Blum
2009-07-31  1:51 ` [PATCH 5/6] Lets ss->can_attach and ss->attach do whole threadgroups at a time Ben Blum
2009-08-03  2:22   ` Li Zefan
2009-08-04  0:35     ` Benjamin Blum
2009-07-31  1:51 ` [PATCH 6/6] Makes procs file writable to move all threads by tgid at once Ben Blum
2009-08-03  3:00   ` Li Zefan
2009-08-04  0:56     ` Benjamin Blum
2009-08-04  1:05       ` Paul Menage
2009-08-04  1:11         ` Benjamin Blum
2009-08-04  1:09       ` Li Zefan
2009-08-04  1:19         ` Benjamin Blum
2009-08-04  1:45           ` Li Zefan
2009-08-04  1:55             ` Paul Menage
2009-08-03 17:54   ` Serge E. Hallyn
2009-08-03 18:07     ` Paul Menage
2009-08-03 18:13     ` Benjamin Blum
2009-08-03 18:55       ` Serge E. Hallyn [this message]
2009-08-03 19:45         ` Serge E. Hallyn
2009-08-03 19:55           ` Paul Menage
2009-08-04 14:01             ` Serge E. Hallyn
2009-08-04 21:40             ` Matt Helsley
2009-08-04 18:48           ` Paul Menage
2009-08-04 19:01             ` Serge E. Hallyn
2009-08-04 19:14             ` Benjamin Blum
2009-08-04 19:28               ` Paul Menage
2009-08-05 10:20                 ` Louis Rilling
2009-08-05 16:11                   ` Paul Menage
2009-08-05 16:42                     ` Louis Rilling
2009-08-05 16:53                       ` Peter Zijlstra
2009-08-06  0:01                       ` Benjamin Blum
2009-08-06  9:58                         ` Louis Rilling
2009-08-06 10:04                           ` Louis Rilling
2009-08-06 10:28                           ` Paul Menage
2009-08-06 10:34                             ` Peter Zijlstra
2009-08-06 10:42                               ` Paul Menage
2009-08-06 11:02                                 ` Peter Zijlstra
2009-08-06 11:24                                   ` Paul Menage
2009-08-06 11:39                                     ` Peter Zijlstra
2009-08-06 15:19                                       ` Paul E. McKenney
2009-08-06 15:24                                         ` Peter Zijlstra
2009-08-06 15:37                                           ` Paul E. McKenney
2009-08-06 11:24                             ` Louis Rilling
2009-08-06 11:40                               ` Paul Menage
2009-08-06 14:54                                 ` Louis Rilling
2009-08-08  1:41                                 ` Benjamin Blum
2009-08-08  1:51                                   ` Benjamin Blum
