Re: [RFC][PATCH v2] cgroups: Run subsystem fork callback from cgroup_post_fork()

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Frederic Weisbecker <fweisbec@gmail.com>
To: Li Zefan <lizf@cn.fujitsu.com>
Cc: Mandeep Singh Baines <msb@chromium.org>,
	Tejun Heo <tj@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC][PATCH v2] cgroups: Run subsystem fork callback from cgroup_post_fork()
Date: Sun, 4 Mar 2012 14:53:03 +0100	[thread overview]
Message-ID: <20120304135300.GA18143@somewhere.redhat.com> (raw)
In-Reply-To: <4F4EEB0F.2000504@cn.fujitsu.com>

On Thu, Mar 01, 2012 at 11:20:47AM +0800, Li Zefan wrote:
> 于 2012年03月01日 00:21, Frederic Weisbecker 写道:
> > On Wed, Feb 29, 2012 at 07:55:00AM -0800, Mandeep Singh Baines wrote:
> >> Frederic Weisbecker (fweisbec@gmail.com) wrote:
> >>> When a user freezes a cgroup, the freezer sets the subsystem state
> >>> to CGROUP_FREEZING and then iterates over the tasks in the cgroup links.
> >>>
> >>> But there is a possible race here, although unlikely, if a task
> >>> forks and the parent is preempted between write_unlock(tasklist_lock)
> >>> and cgroup_post_fork(). If we freeze the cgroup while the parent
> >>
> >> So what if you moved cgroup_post_forks() a few lines up to be
> >> inside the tasklist_lock?
> > 
> > It won't work. Consider this scenario:
> > 
> > CPU 0                                     CPU 1
> > 
> >                                        cgroup_fork_callbacks()
> >                                        write_lock(tasklist_lock)
> > try_to_freeze_cgroup() {               add child to task list etc...
> > 	cgroup_iter_start()
> >         freeze tasks                        
> >         cgroup_iter_end()
> > }                                      cgroup_post_fork()
> >                                        write_unlock(tasklist_lock)
> > 
> > If this is not the first time we call cgroup_iter_start(), we won't go
> > through the whole tasklist, we simply iterate through the css set task links.
> > 
> > Plus we try to avoid anything under tasklist_lock when possible.
> > 
> 
> Your patch won't close the race I'm afraid.
> 
> // state will be set to FREEZING
> echo FROZEN > /cgroup/sub/freezer.state
>                                           write_lock(tasklist_lock)
>                                           add child to task list ...
> 					  write_unlock(tasklist_lock)
> // state will be updated to FROZEN
> cat /cgroup/sub/freezer.state
> 					  cgroup_post_fork()
> 					  ->freezer_fork()
> 
> freezer_fork() will freeze the task only if the cgroup is in FREEZING
> state, and will BUG if the state is FROZEN.

Good point!

> 
> We can fix freezer_fork(), but seems that requires we hold cgroup_mutex
> in that function(), which we don't like at all. Not to say your
> task_counter stuff..
> 
> At this moment I don't see a solution without tasklist_lock involved,
> any better idea?

Ok, everything would be _much_ simpler if we were adding the css set task
link unconditionally.

Unfortunately this means acquiring a (global) lock unconditionally and
doing a list_add() on every fork. Although the lock could perhaps
be made per css set.

An idea could be to start the css set linking as soon as we create
the first subdirectory of a freezer cgroup. The root css set can't be
freezed and when we create the subdirectory we have no task there yet.
Due to the threadgroup_lock() that follows, none of the tasks that will be
attached there can be stuck in the middle of a fork so we are fine against
their css_set links: we know that when we attach a task to the cgroup of that
subdir, its css set link is set and we won't have any of the race we are describing.

How does that sound?

next prev parent reply	other threads:[~2012-03-04 13:53 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-24  4:23 [RFC][PATCH] cgroups: Run subsystem fork callback from cgroup_post_fork() Frederic Weisbecker
2012-02-24  4:32 ` Frederic Weisbecker
2012-02-27 17:02 ` [RFC][PATCH v2] " Frederic Weisbecker
2012-02-29 15:55   ` Mandeep Singh Baines
2012-02-29 16:21     ` Frederic Weisbecker
2012-03-01  3:20       ` Li Zefan
2012-03-04 13:53         ` Frederic Weisbecker [this message]
2012-03-07  9:22           ` Li Zefan
2012-03-08 15:53             ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120304135300.GA18143@somewhere.redhat.com \
    --to=fweisbec@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=msb@chromium.org \
    --cc=oleg@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).