From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
rjw-KKrjLPT3xs0@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set
Date: Wed, 17 Oct 2012 16:28:59 +0800 [thread overview]
Message-ID: <507E6C4B.6000704@huawei.com> (raw)
In-Reply-To: <1350426526-14254-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
On 2012/10/17 6:28, Tejun Heo wrote:
> cgroup core has a bug which violates a basic rule about event
> notifications - when a new entity needs to be added, you add that to
> the notification list first and then make the new entity conform to
> the current state. If done in the reverse order, an event happening
> inbetween will be lost.
>
> cgroup_subsys->fork() is invoked way before the new task is added to
> the css_set. Currently, cgroup_freezer is the only user of ->fork()
> and uses it to make new tasks conform to the current state of the
> freezer. If FROZEN state is requested while fork is in progress
> between cgroup_fork_callbacks() and cgroup_post_fork(), the child
> could escape freezing - the cgroup isn't frozen when ->fork() is
> called and the freezer couldn't see the new task on the css_set.
>
> This patch moves cgroup_subsys->fork() invocation to
> cgroup_post_fork() after the new task is added to the css_set.
> cgroup_fork_callbacks() is removed.
>
> Because now a task may be migrated during cgroup_subsys->fork(),
> freezer_fork() is updated so that it adheres to the usual RCU locking
> and the rather pointless comment on why locking can be different there
> is removed (if it doesn't make anything simpler, why even bother?).
>
I don't think rcu read section is sufficient. It guarantees the data you're
accessing is valid, but the data can be new or can be old.
So a case below is possible:
in freezer_fork():
rcu_read_lock();
freezer = task_freezer(task);
move task from freezer to freezer2
which is in FREEZING/FROZEN state
freezer is in THAWED state,
nothing to do.
rcu_read_unlock();
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
> include/linux/cgroup.h | 1 -
> kernel/cgroup.c | 62 ++++++++++++++++++++++------------------------
> kernel/cgroup_freezer.c | 13 +++-------
> kernel/fork.c | 9 +------
> 4 files changed, 35 insertions(+), 50 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index f8a030c..4cd1d0f 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -34,7 +34,6 @@ extern int cgroup_lock_is_held(void);
> extern bool cgroup_lock_live_group(struct cgroup *cgrp);
> extern void cgroup_unlock(void);
> extern void cgroup_fork(struct task_struct *p);
> -extern void cgroup_fork_callbacks(struct task_struct *p);
> extern void cgroup_post_fork(struct task_struct *p);
> extern void cgroup_exit(struct task_struct *p, int run_callbacks);
> extern int cgroupstats_build(struct cgroupstats *stats,
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 13774b3..b7a0171 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4844,44 +4844,19 @@ void cgroup_fork(struct task_struct *child)
> }
>
> /**
> - * cgroup_fork_callbacks - run fork callbacks
> - * @child: the new task
> - *
> - * Called on a new task very soon before adding it to the
> - * tasklist. No need to take any locks since no-one can
> - * be operating on this task.
> - */
> -void cgroup_fork_callbacks(struct task_struct *child)
> -{
> - if (need_forkexit_callback) {
> - int i;
> - for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> - struct cgroup_subsys *ss = subsys[i];
> -
> - /*
> - * forkexit callbacks are only supported for
> - * builtin subsystems.
> - */
> - if (!ss || ss->module)
> - continue;
> -
> - if (ss->fork)
> - ss->fork(child);
> - }
> - }
> -}
> -
> -/**
> * cgroup_post_fork - called on a new task after adding it to the task list
> * @child: the task in question
> *
> - * Adds the task to the list running through its css_set if necessary.
> - * Has to be after the task is visible on the task list in case we race
> - * with the first call to cgroup_iter_start() - to guarantee that the
> - * new task ends up on its list.
> + * Adds the task to the list running through its css_set if necessary and
> + * call the subsystem fork() callbacks. Has to be after the task is
> + * visible on the task list in case we race with the first call to
> + * cgroup_iter_start() - to guarantee that the new task ends up on its
> + * list.
> */
> void cgroup_post_fork(struct task_struct *child)
> {
> + int i;
> +
> /*
> * use_task_css_set_links is set to 1 before we walk the tasklist
> * under the tasklist_lock and we read it here after we added the child
> @@ -4910,7 +4885,30 @@ void cgroup_post_fork(struct task_struct *child)
> }
> write_unlock(&css_set_lock);
> }
> +
> + /*
> + * Call ss->fork(). This must happen after @child is linked on
> + * css_set; otherwise, @child might change state between ->fork()
> + * and addition to css_set.
> + */
> + if (need_forkexit_callback) {
> + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> + struct cgroup_subsys *ss = subsys[i];
> +
> + /*
> + * fork/exit callbacks are supported only for
> + * builtin subsystems and we don't need further
> + * synchronization as they never go away.
> + */
> + if (!ss || ss->module)
> + continue;
> +
> + if (ss->fork)
> + ss->fork(child);
> + }
> + }
> }
> +
> /**
> * cgroup_exit - detach cgroup from exiting task
> * @tsk: pointer to task_struct of exiting process
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index b1724ce..12bfedb 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -186,23 +186,15 @@ static void freezer_fork(struct task_struct *task)
> {
> struct freezer *freezer;
>
> - /*
> - * No lock is needed, since the task isn't on tasklist yet,
> - * so it can't be moved to another cgroup, which means the
> - * freezer won't be removed and will be valid during this
> - * function call. Nevertheless, apply RCU read-side critical
> - * section to suppress RCU lockdep false positives.
> - */
> rcu_read_lock();
> freezer = task_freezer(task);
> - rcu_read_unlock();
>
> /*
> * The root cgroup is non-freezable, so we can skip the
> * following check.
> */
> if (!freezer->css.cgroup->parent)
> - return;
> + goto out;
>
> spin_lock_irq(&freezer->lock);
> BUG_ON(freezer->state == CGROUP_FROZEN);
> @@ -210,7 +202,10 @@ static void freezer_fork(struct task_struct *task)
> /* Locking avoids race with FREEZING -> THAWED transitions. */
> if (freezer->state == CGROUP_FREEZING)
> freeze_task(task);
> +
> spin_unlock_irq(&freezer->lock);
> +out:
> + rcu_read_unlock();
> }
>
> /*
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 8b20ab7..acc4cb6 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1135,7 +1135,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> {
> int retval;
> struct task_struct *p;
> - int cgroup_callbacks_done = 0;
>
> if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
> return ERR_PTR(-EINVAL);
> @@ -1393,12 +1392,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> INIT_LIST_HEAD(&p->thread_group);
> p->task_works = NULL;
>
> - /* Now that the task is set up, run cgroup callbacks if
> - * necessary. We need to run them before the task is visible
> - * on the tasklist. */
> - cgroup_fork_callbacks(p);
> - cgroup_callbacks_done = 1;
> -
> /* Need tasklist lock for parent etc handling! */
> write_lock_irq(&tasklist_lock);
>
> @@ -1503,7 +1496,7 @@ bad_fork_cleanup_cgroup:
> #endif
> if (clone_flags & CLONE_THREAD)
> threadgroup_change_end(current);
> - cgroup_exit(p, cgroup_callbacks_done);
> + cgroup_exit(p, 0);
> delayacct_tsk_free(p);
> module_put(task_thread_info(p)->exec_domain->module);
> bad_fork_cleanup_count:
>
next prev parent reply other threads:[~2012-10-17 8:28 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1350426526-14254-1-git-send-email-tj@kernel.org>
[not found] ` <1350426526-14254-1-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-16 22:28 ` [PATCH 1/7] cgroup: cgroup_subsys->fork() should be called after the task is added to css_set Tejun Heo
[not found] ` <1350426526-14254-2-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-17 8:28 ` Li Zefan [this message]
[not found] ` <507E6C4B.6000704-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2012-10-18 1:25 ` Li Zefan
2012-10-21 19:11 ` Oleg Nesterov
[not found] ` <20121021191141.GA26218-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-21 19:22 ` Tejun Heo
[not found] ` <20121021192222.GB5951@atj.dyndns.org>
[not found] ` <20121021192222.GB5951-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
2012-10-22 18:04 ` Oleg Nesterov
[not found] ` <20121022180445.GB21553-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-22 21:16 ` Tejun Heo
[not found] ` <20121022211631.GE5951-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
2012-10-23 15:51 ` Oleg Nesterov
[not found] ` <20121023155128.GB16201-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-24 19:04 ` Tejun Heo
[not found] ` <20121024190458.GB12182-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
2012-10-25 17:42 ` Oleg Nesterov
2012-12-20 5:25 ` Herton Ronaldo Krzesinski
2012-12-28 21:22 ` [PATCH] cgroup: remove unused dummy cgroup_fork_callbacks() Tejun Heo
2012-10-16 22:28 ` [PATCH 2/7] freezer: add missing mb's to freezer_count() and freezer_should_skip() Tejun Heo
[not found] ` <1350426526-14254-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-22 17:44 ` Oleg Nesterov
[not found] ` <20121022174404.GA21553-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-22 21:13 ` Tejun Heo
[not found] ` <20121022211317.GD5951-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
2012-10-23 15:39 ` Oleg Nesterov
[not found] ` <20121023153919.GA16201-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-24 18:57 ` Tejun Heo
[not found] ` <20121024185710.GA12182@atj.dyndns.org>
[not found] ` <20121024185710.GA12182-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
2012-10-25 16:39 ` [PATCH 0/1] (Was: freezer: add missing mb's to freezer_count() and freezer_should_skip()) Oleg Nesterov
[not found] ` <20121025163941.GA3801-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-25 16:39 ` [PATCH 1/1] freezer: change ptrace_stop/do_signal_stop to use freezable_schedule() Oleg Nesterov
[not found] ` <20121025163959.GB3801-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-25 17:18 ` Tejun Heo
[not found] ` <20121025171812.GE11442@htj.dyndns.org>
[not found] ` <20121025171812.GE11442-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-25 17:34 ` Oleg Nesterov
[not found] ` <20121025173433.GA7650@redhat.com>
[not found] ` <20121025173433.GA7650-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-25 17:36 ` Tejun Heo
[not found] ` <20121025173632.GI11442-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-26 17:45 ` [PATCH v2 0/1] " Oleg Nesterov
[not found] ` <20121026174545.GA21639-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-26 17:46 ` [PATCH v2 1/1] " Oleg Nesterov
[not found] ` <20121026174606.GB21639-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-26 17:52 ` Tejun Heo
[not found] ` <20121026175258.GV11442-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-26 18:01 ` Oleg Nesterov
[not found] ` <20121026180149.GA22421@redhat.com>
[not found] ` <20121026180149.GA22421-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-26 21:14 ` Rafael J. Wysocki
[not found] ` <2566006.UzAQbpOjNQ@vostro.rjw.lan>
[not found] ` <2566006.UzAQbpOjNQ-sKB8Sp2ER+y1GS7QM15AGw@public.gmane.org>
2012-10-26 21:29 ` Rafael J. Wysocki
[not found] ` <2718983.vORnrfWdbE-sKB8Sp2ER+y1GS7QM15AGw@public.gmane.org>
2012-10-26 21:29 ` Tejun Heo
[not found] ` <20121026212909.GW11442@htj.dyndns.org>
[not found] ` <20121026212909.GW11442-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-28 0:16 ` Rafael J. Wysocki
2012-10-27 22:22 ` Ben Hutchings
[not found] ` <1351376558.21585.1.camel@deadeye.wl.decadent.org.uk>
[not found] ` <1351376558.21585.1.camel-nDn/Rdv9kqW9Jme8/bJn5UCKIB8iOfG2tUK59QYPAWc@public.gmane.org>
2012-10-28 13:45 ` Oleg Nesterov
2012-10-16 22:28 ` [PATCH 3/7] cgroup_freezer: make it official that writes to freezer.state don't fail Tejun Heo
2012-10-16 22:28 ` [PATCH 4/7] cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks Tejun Heo
[not found] ` <1350426526-14254-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-22 18:34 ` Oleg Nesterov
[not found] ` <20121022183453.GA24687@redhat.com>
[not found] ` <20121022183453.GA24687-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-22 21:18 ` Tejun Heo
[not found] ` <20121022211822.GF5951-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
2012-10-23 15:55 ` Oleg Nesterov
[not found] ` <20121023155533.GC16201@redhat.com>
[not found] ` <20121023155533.GC16201-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-24 19:06 ` Tejun Heo
[not found] ` <20121024190651.GC12182-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
2012-10-25 17:12 ` [PATCH 0/1] (Was: cgroup_freezer: don't stall transition to FROZEN for PF_NOFREEZE or PF_FREEZER_SKIP tasks) Oleg Nesterov
[not found] ` <20121025171236.GA6776-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-25 17:12 ` [PATCH 1/1] freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD Oleg Nesterov
[not found] ` <20121025171256.GB6776-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-25 17:20 ` Tejun Heo
[not found] ` <20121025172016.GF11442@htj.dyndns.org>
[not found] ` <20121025172016.GF11442-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2012-10-25 17:37 ` Oleg Nesterov
[not found] ` <20121025173756.GB7650-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-25 17:37 ` Tejun Heo
[not found] ` <CAOS58YPAVVr=itauGD9eTpfRLSBLuM8Bpyuq9AP73MDr8dPmiQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-10-25 20:13 ` Rafael J. Wysocki
2012-10-16 22:28 ` [PATCH 5/7] cgroup_freezer: allow moving tasks in and out of a frozen cgroup Tejun Heo
2012-10-16 22:28 ` [PATCH 6/7] cgroup_freezer: prepare update_if_frozen() for locking change Tejun Heo
2012-10-16 22:28 ` [PATCH 7/7] cgroup_freezer: don't use cgroup_lock_live_group() Tejun Heo
2012-10-17 19:16 ` [PATCHSET cgroup/for-3.8] cgroup_freezer: allow migration regardless of freezer state and update locking Matt Helsley
[not found] ` <20121017191606.GA6223-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2012-10-18 21:14 ` Tejun Heo
[not found] ` <20121018211434.GI13370@google.com>
[not found] ` <20121018211434.GI13370-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-18 22:21 ` Matt Helsley
[not found] ` <20121018222155.GB6223-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2012-10-18 22:35 ` Tejun Heo
[not found] ` <20121018223517.GQ13370-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-18 23:47 ` Matt Helsley
[not found] ` <20121018234726.GC6223-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2012-10-19 0:01 ` Tejun Heo
[not found] ` <20121019000153.GZ13370@google.com>
[not found] ` <20121019000153.GZ13370-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-19 1:29 ` Matt Helsley
[not found] ` <20121019012945.GD6223@us.ibm.com>
[not found] ` <20121019012945.GD6223-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2012-10-19 20:02 ` Tejun Heo
2012-10-19 16:54 ` Rafael J. Wysocki
[not found] ` <2424755.Pg0O5tTD3k@vostro.rjw.lan>
[not found] ` <2424755.Pg0O5tTD3k-sKB8Sp2ER+y1GS7QM15AGw@public.gmane.org>
2012-10-19 20:04 ` Tejun Heo
[not found] ` <20121019200421.GO13370@google.com>
[not found] ` <20121019200421.GO13370-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2012-10-21 19:18 ` Oleg Nesterov
[not found] ` <20121021191853.GB26218-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-21 19:24 ` Tejun Heo
[not found] ` <1350426526-14254-6-git-send-email-tj@kernel.org>
[not found] ` <1350426526-14254-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2012-10-22 19:25 ` [PATCH 5/7] cgroup_freezer: allow moving tasks in and out of a frozen cgroup Oleg Nesterov
[not found] ` <20121022192506.GA27163@redhat.com>
[not found] ` <20121022192506.GA27163-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2012-10-22 21:25 ` Tejun Heo
[not found] ` <20121022212505.GG5951@atj.dyndns.org>
[not found] ` <20121022212505.GG5951-OlzNCW9NnSVy/B6EtB590w@public.gmane.org>
2012-10-23 16:14 ` Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=507E6C4B.6000704@huawei.com \
--to=lizefan-hv44wf8li93qt0dzr+alfa@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=rjw-KKrjLPT3xs0@public.gmane.org \
--cc=stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox