[PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()

public inbox for linux-kernel@vger.kernel.org

From: Oleg Nesterov <oleg@redhat.com>
To: Aleksa Sarai <cyphar@cyphar.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Li Zefan <lizefan@huawei.com>, Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()
Date: Wed, 25 Nov 2015 17:34:27 +0100
Message-ID: <20151125163427.GA8139@redhat.com>

Hello,

I am trying to backport cgroup_pids.c and I can't understand pids_fork(),
which does

	css = task_get_css(task, pids_cgrp_id);
	pids = css_pids(css);

	/*
	 * If the association has changed, we have to revert and reapply the
	 * charge/uncharge on the wrong hierarchy to the current one. Since
	 * the association can only change due to an organisation event, its
	 * okay for us to ignore the limit in this case.
	 */
	if (pids != old_pids) {
		pids_uncharge(old_pids, 1);
		pids_charge(pids, 1);
	}

But if the association has changed, hasn't pids_can_attach(), which moved
the child into another cgroup, already called pids_uncharge(old_pids) too?

IOW, suppose that the new child is moved right before cgroup_post_fork() does

	for_each_subsys_which(...)
		ss->fork(child);

doesn't this mean that by the time ss->fork() returns we have done the same sequence

		pids_uncharge(old_pids, 1);
		pids_charge(pids, 1);

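twice, once in pids_can_attach() and once in pids_fork()? Note that
threadgroup_change_begin/end() is only called with CLONE_THREAD, so nothing
excludes a concurrent migration during a plain fork(), and we can actually hit
the WARN_ON() in pids_cancel().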
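A toy, single-threaded replay of that interleaving may make the double move easier to see. It is a sketch, not kernel code: struct pids_model, struct task_model, model_charge(), model_uncharge(), model_migrate() and model_pids_fork() are hypothetical stand-ins that keep only a per-cgroup counter (no hierarchy, no limit, no atomics), and the "warned" flag assumes pids_cancel() complains whenever the counter goes negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pids_cgroup: only the charge counter. */
struct pids_model {
	long counter;
	bool warned;   /* set where pids_cancel() would WARN on underflow */
};

struct task_model {
	struct pids_model *cgrp;   /* the task's current pids association */
};

static void model_charge(struct pids_model *p)
{
	p->counter++;
}

static void model_uncharge(struct pids_model *p)
{
	if (--p->counter < 0)
		p->warned = true;
}

/* pids_can_attach(): charge the destination, uncharge the task's
 * current cgroup; the core then switches the association. */
static void model_migrate(struct task_model *t, struct pids_model *dst)
{
	assert(t->cgrp != dst);
	model_charge(dst);
	model_uncharge(t->cgrp);
	t->cgrp = dst;
}

/* The pids_fork() fixup quoted above; old is what pids_can_fork() charged. */
static void model_pids_fork(struct task_model *child, struct pids_model *old)
{
	if (child->cgrp != old) {
		model_uncharge(old);
		model_charge(child->cgrp);
	}
}

/* A plain fork() (no CLONE_THREAD, so no threadgroup_change_begin())
 * whose child is migrated from A to B just before ss->fork(child). */
static void run_race(struct pids_model *a, struct pids_model *b)
{
	struct task_model parent = { a }, child;

	a->counter = 1;              /* the parent is charged to A */
	b->counter = 0;
	a->warned = b->warned = false;

	model_charge(a);             /* pids_can_fork(): A = 2 */
	child.cgrp = parent.cgrp;    /* cgroup_post_fork() links child into A */
	model_migrate(&child, b);    /* migration: A = 1, B = 1 (correct) */
	model_pids_fork(&child, a);  /* ss->fork(): A = 0, B = 2 (wrong) */
	model_uncharge(parent.cgrp); /* parent exits: A = -1, underflow */
}
```

After run_race(), A ends at -1 with the warning flag set while B holds two charges for a single task: the child's charge was moved twice.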

However, AFAICS we can't simply remove this uncharge/charge. We need it for
the case when the parent was moved to another cgroup before cgroup_post_fork(),
and then css_set_move_task() moves the child into the parent's new css_set.
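The same kind of toy model can show why the fixup is still needed in that case. Again a sketch under stated assumptions: struct pids_model and run_parent_moved() are hypothetical, each cgroup is reduced to a bare counter, and the "fixup" flag toggles the uncharge/charge that pids_fork() performs when the association has changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pids_cgroup: only the charge counter. */
struct pids_model {
	long counter;
};

/* The parent (not the child) is migrated from A to B between
 * pids_can_fork() and cgroup_post_fork(). fixup selects whether the
 * pids_fork() uncharge/charge quoted above is performed. */
static void run_parent_moved(struct pids_model *a, struct pids_model *b,
			     bool fixup)
{
	struct pids_model *parent = a, *child, *charged;

	a->counter = 1;              /* the parent is charged to A */
	b->counter = 0;

	a->counter++;                /* pids_can_fork(): charge A for the child */
	charged = a;                 /* remembered, like old_css in priv */

	b->counter++;                /* pids_can_attach() for the parent only; */
	parent->counter--;           /* the child is not visible to it yet */
	parent = b;

	child = parent;              /* cgroup_post_fork(): child joins B */
	if (fixup && child != charged) {
		charged->counter--;  /* pids_fork(): pids_uncharge(old_pids, 1) */
		child->counter++;    /* pids_fork(): pids_charge(pids, 1) */
	}

	/* The total is right either way; only the per-cgroup split differs. */
	assert(a->counter + b->counter == 2);
}
```

With the fixup both charges end up in B, where both tasks live; without it A keeps a stale charge for a task that is actually in B, so the fixup can only go away if this window is closed.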



I know almost nothing about cgroups, so perhaps I missed something; please
correct me.

If I am right, how about the patch below? percpu_down_read() is cheap, so
taking cgroup_threadgroup_rwsem on every fork should not hurt, and we can
simplify cgroup_pids after this change.

Also, we can probably unify cgroup_threadgroup_rwsem and dup_mmap_sem.
Note that we take cgroup_threadgroup_rwsem for reading if CLONE_THREAD;
otherwise we take another percpu-rwsem, dup_mmap_sem, in dup_mmap().

Or am I totally confused?

Oleg.

--- x/kernel/fork.c
+++ x/kernel/fork.c
@@ -1368,8 +1368,7 @@ static struct task_struct *copy_process(
 	p->real_start_time = ktime_get_boot_ns();
 	p->io_context = NULL;
 	p->audit_context = NULL;
-	if (clone_flags & CLONE_THREAD)
-		threadgroup_change_begin(current);
+	threadgroup_change_begin(current);
 	cgroup_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_dup(p->mempolicy);
@@ -1610,8 +1609,7 @@ static struct task_struct *copy_process(
 
 	proc_fork_connector(p);
 	cgroup_post_fork(p, cgrp_ss_priv);
-	if (clone_flags & CLONE_THREAD)
-		threadgroup_change_end(current);
+	threadgroup_change_end(current);
 	perf_event_fork(p);
 
 	trace_task_newtask(p, clone_flags);
--- x/kernel/cgroup_pids.c
+++ x/kernel/cgroup_pids.c
@@ -243,27 +243,10 @@ static void pids_cancel_fork(struct task
 
 static void pids_fork(struct task_struct *task, void *priv)
 {
-	struct cgroup_subsys_state *css;
-	struct cgroup_subsys_state *old_css = priv;
-	struct pids_cgroup *pids;
-	struct pids_cgroup *old_pids = css_pids(old_css);
-
-	css = task_get_css(task, pids_cgrp_id);
-	pids = css_pids(css);
-
-	/*
-	 * If the association has changed, we have to revert and reapply the
-	 * charge/uncharge on the wrong hierarchy to the current one. Since
-	 * the association can only change due to an organisation event, its
-	 * okay for us to ignore the limit in this case.
-	 */
-	if (pids != old_pids) {
-		pids_uncharge(old_pids, 1);
-		pids_charge(pids, 1);
-	}
+	struct cgroup_subsys_state *css = priv;
 
+	WARN_ON(task_css(task, pids_cgrp_id) != css);
 	css_put(css);
-	css_put(old_css);
 }
 
 static void pids_free(struct task_struct *task)
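A rough model of what the patch above is meant to buy: the fork path holds cgroup_threadgroup_rwsem for reading from threadgroup_change_begin() to threadgroup_change_end() unconditionally, and (as assumed here, following the mail) migration takes it for writing. The model is single-threaded, so struct rwsem_model, try_down_write_model(), migrate_model() and run_patched() are hypothetical stand-ins, and a failed try-lock stands for the writer sleeping until the fork finishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for cgroup_threadgroup_rwsem. */
struct rwsem_model {
	int readers;
	bool writer;
};

static void down_read_model(struct rwsem_model *s)
{
	assert(!s->writer);
	s->readers++;
}

static void up_read_model(struct rwsem_model *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static bool try_down_write_model(struct rwsem_model *s)
{
	if (s->readers > 0 || s->writer)
		return false;
	s->writer = true;
	return true;
}

struct pids_model {
	long counter;
};

/* Migration under the write lock: pids_can_attach() moves one charge.
 * Returns false if the writer would have had to wait. */
static bool migrate_model(struct rwsem_model *s, struct pids_model **task,
			  struct pids_model *dst)
{
	if (!try_down_write_model(s))
		return false;
	dst->counter++;
	(*task)->counter--;
	*task = dst;
	s->writer = false;                  /* up_write */
	return true;
}

struct outcome {
	bool blocked_during_fork, warned, migrated_after;
	long a, b;
};

/* A plain fork() with threadgroup_change_begin/end() taken
 * unconditionally, as in the patch. */
static struct outcome run_patched(void)
{
	struct rwsem_model sem = { 0, false };
	struct pids_model a = { 1 }, b = { 0 };   /* parent charged to A */
	struct pids_model *child;
	struct outcome o;

	down_read_model(&sem);              /* threadgroup_change_begin() */
	a.counter++;                        /* pids_can_fork(): charge A */
	child = &a;                         /* cgroup_post_fork() links child into A */
	o.blocked_during_fork = !migrate_model(&sem, &child, &b);
	o.warned = (child != &a);           /* the WARN_ON() added to pids_fork() */
	up_read_model(&sem);                /* threadgroup_change_end() */

	o.migrated_after = migrate_model(&sem, &child, &b);
	o.a = a.counter;                    /* parent stays in A */
	o.b = b.counter;                    /* child now in B */
	return o;
}
```

In this model the migration cannot land inside the fork window, the WARN_ON() condition never holds, and once the fork is done the child's charge moves exactly once, leaving one charge in each cgroup.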


Thread overview: 7+ messages
2015-11-25 16:34 Oleg Nesterov [this message]
2015-11-25 19:51 ` [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate() Tejun Heo
2015-11-25 19:54   ` Tejun Heo
2015-11-25 20:40     ` Tejun Heo
2015-11-26 16:01       ` Oleg Nesterov
2015-11-26 15:36     ` Oleg Nesterov
2015-11-26 23:35       ` Aleksa Sarai
