From: Peter Zijlstra <peterz@infradead.org>
To: Mathias Krause <minipli@grsecurity.net>
Cc: "Vincent Guittot" <vincent.guittot@linaro.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Benjamin Segall" <bsegall@google.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Mel Gorman" <mgorman@suse.de>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	"Valentin Schneider" <Valentin.Schneider@arm.com>,
	linux-kernel@vger.kernel.org, "Odin Ugedal" <odin@uged.al>,
	"Kevin Tanguy" <kevin.tanguy@corp.ovh.com>,
	"Brad Spengler" <spender@grsecurity.net>
Subject: Re: [PATCH] sched/fair: Prevent dead task groups from regaining cfs_rq's
Date: Sat, 6 Nov 2021 11:48:54 +0100
Message-ID: <20211106104854.GU174703@worktop.programming.kicks-ass.net>
In-Reply-To: <20211105162914.215420-1-minipli@grsecurity.net>

On Fri, Nov 05, 2021 at 05:29:14PM +0100, Mathias Krause wrote:
> > Looks like it needs to be the kfree_rcu() one in this case. I'll prepare
> > a patch.
> 
> Testing the below patch right now. Looking good so far. Will prepare a
> proper patch later, if we all can agree that this covers all cases.
> 
> But the basic idea is to defer the kfree()'s to after the next RCU GP,
> which also means we need to free the tg object itself later. Slightly
> ugly. :/

How's this then?

---
diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index 2067080bb235..8629b37d118e 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -31,7 +31,7 @@ static inline void autogroup_destroy(struct kref *kref)
 	ag->tg->rt_se = NULL;
 	ag->tg->rt_rq = NULL;
 #endif
-	sched_offline_group(ag->tg);
+	sched_release_group(ag->tg);
 	sched_destroy_group(ag->tg);
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9cb81ef8acc8..22528bd61ba5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9715,6 +9715,21 @@ static void sched_free_group(struct task_group *tg)
 	kmem_cache_free(task_group_cache, tg);
 }
 
+static void sched_free_group_rcu(struct rcu_head *rcu)
+{
+	sched_free_group(container_of(rcu, struct task_group, rcu));
+}
+
+static void sched_unregister_group(struct task_group *tg)
+{
+	unregister_fair_sched_group(tg);
+	/*
+	 * We have to wait for yet another RCU grace period to expire, as
+	 * print_cfs_stats() might run concurrently.
+	 */
+	call_rcu(&tg->rcu, sched_free_group_rcu);
+}
+
 /* allocate runqueue etc for a new task group */
 struct task_group *sched_create_group(struct task_group *parent)
 {
@@ -9758,25 +9773,35 @@ void sched_online_group(struct task_group *tg, struct task_group *parent)
 }
 
 /* rcu callback to free various structures associated with a task group */
-static void sched_free_group_rcu(struct rcu_head *rhp)
+static void sched_unregister_group_rcu(struct rcu_head *rhp)
 {
 	/* Now it should be safe to free those cfs_rqs: */
-	sched_free_group(container_of(rhp, struct task_group, rcu));
+	sched_unregister_group(container_of(rhp, struct task_group, rcu));
 }
 
 void sched_destroy_group(struct task_group *tg)
 {
 	/* Wait for possible concurrent references to cfs_rqs complete: */
-	call_rcu(&tg->rcu, sched_free_group_rcu);
+	call_rcu(&tg->rcu, sched_unregister_group_rcu);
 }
 
-void sched_offline_group(struct task_group *tg)
+void sched_release_group(struct task_group *tg)
 {
 	unsigned long flags;
 
-	/* End participation in shares distribution: */
-	unregister_fair_sched_group(tg);
-
+	/*
+	 * Unlink first, to prevent walk_tg_tree_from() from finding us (via
+	 * sched_cfs_period_timer()).
+	 *
+	 * For this to be effective, we have to wait for all pending users of
+	 * this task group to leave their RCU critical section to ensure no new
+	 * user will see our dying task group any more. Specifically ensure
+	 * that tg_unthrottle_up() won't add decayed cfs_rq's to it.
+	 *
+	 * We therefore defer calling unregister_fair_sched_group() to
+	 * sched_unregister_group() which is guaranteed to get called only after the
+	 * current RCU grace period has expired.
+	 */
 	spin_lock_irqsave(&task_group_lock, flags);
 	list_del_rcu(&tg->list);
 	list_del_rcu(&tg->siblings);
@@ -9895,7 +9920,7 @@ static void cpu_cgroup_css_released(struct cgroup_subsys_state *css)
 {
 	struct task_group *tg = css_tg(css);
 
-	sched_offline_group(tg);
+	sched_release_group(tg);
 }
 
 static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
@@ -9905,7 +9930,7 @@ static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
 	/*
 	 * Relies on the RCU grace period between css_released() and this.
 	 */
-	sched_free_group(tg);
+	sched_unregister_group(tg);
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f0b249ec581d..20038274c57b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -504,7 +504,7 @@ extern struct task_group *sched_create_group(struct task_group *parent);
 extern void sched_online_group(struct task_group *tg,
 			       struct task_group *parent);
 extern void sched_destroy_group(struct task_group *tg);
-extern void sched_offline_group(struct task_group *tg);
+extern void sched_release_group(struct task_group *tg);
 
 extern void sched_move_task(struct task_struct *tsk);
 


Thread overview: 40+ messages
2021-10-11 17:22 [PATCH] sched/fair: Use rq->lock when checking cfs_rq list presence Michal Koutný
2021-10-11 19:12 ` Odin Ugedal
2021-10-12 18:32   ` Tao Zhou
2021-10-13 18:52     ` Odin Ugedal
2021-10-13 14:39   ` Michal Koutný
2021-10-13 18:45     ` Odin Ugedal
2021-10-13  7:57 ` Vincent Guittot
2021-10-13 14:26   ` Michal Koutný
2021-11-02 16:02     ` task_group unthrottling and removal race (was Re: [PATCH] sched/fair: Use rq->lock when checking cfs_rq list) presence Michal Koutný
2021-11-02 20:20       ` Odin Ugedal
2021-11-03  9:51       ` Mathias Krause
2021-11-03 10:51         ` Mathias Krause
2021-11-03 11:10           ` Michal Koutný
2021-11-03 14:16             ` Mathias Krause
2021-11-03 19:06               ` [PATCH] sched/fair: Prevent dead task groups from regaining cfs_rq's Mathias Krause
2021-11-03 22:03                 ` Benjamin Segall
2021-11-04  8:50                   ` Vincent Guittot
2021-11-04 15:13                     ` Mathias Krause
2021-11-04 16:49                       ` Vincent Guittot
2021-11-04 17:37                         ` Mathias Krause
2021-11-05 14:25                           ` Vincent Guittot
2021-11-05 14:44                             ` Mathias Krause
2021-11-05 16:29                               ` Mathias Krause
2021-11-05 16:58                                 ` Peter Zijlstra
2021-11-05 17:14                                   ` Mathias Krause
2021-11-05 17:27                                     ` Peter Zijlstra
2021-11-05 17:40                                       ` Mathias Krause
2021-11-06 10:48                                 ` Peter Zijlstra [this message]
2021-11-08 10:27                                   ` Mathias Krause
2021-11-08 11:40                                     ` Peter Zijlstra
2021-11-08 15:06                                       ` Mathias Krause
2021-11-10 15:14                                         ` Vincent Guittot
2021-11-09 18:47                                       ` Michal Koutný
2021-11-10 15:17                                           ` Vincent Guittot
2021-11-04 20:46                       ` Benjamin Segall
2021-11-04 18:49                 ` Michal Koutný
2021-11-05 14:55                   ` Mathias Krause
2021-11-05 14:58                 ` Peter Zijlstra
