public inbox for cgroups@vger.kernel.org
* [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched
@ 2024-12-16 20:12 Michal Koutný
  2024-12-16 20:12 ` [RFC PATCH 1/9] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions Michal Koutný
                   ` (10 more replies)
  0 siblings, 11 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:12 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

Although RT_GROUP_SCHED is only available on cgroup v1, there are still
some users of this feature. General purpose distros (e.g. [1][2][3][4])
cannot enable CONFIG_RT_GROUP_SCHED easily:
- it prevents creation of RT tasks unless RT runtime is determined
  and distributed across the cgroup tree,
- grouping of RT threads is not the desired default on such
  systems,
- it prevents use of cgroup v2 with RT tasks.

This changeset defers the decision whether to use
CONFIG_RT_GROUP_SCHED until boot time.
By default, RT groups are available as before, but the user can
pass the rt_group_sched=0 kernel cmdline parameter, which disables the
grouping so that the kernel behaves as with !CONFIG_RT_GROUP_SCHED (with
some runtime overhead).

The series is organized as follows:

1) generic ifdefs cleanup, no functional changes,
2) preparing root_task_group to be used in places that take shortcuts in
   the case of !CONFIG_RT_GROUP_SCHED,
3) boot cmdline option that controls cgroup (v1) attributes,
4) conditional bypass of non-root task groups,
5) checks and comments refresh.

The crux is these two patches:
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED

Further notes:
- it is not a sched_feat() flag because that can be flipped at any time,
- runtime disablement is not implemented as an infinite per-cgroup RT limit
  since that would still employ group scheduling, unlike
  !CONFIG_RT_GROUP_SCHED.

RFC notes:
- there remain two variants of various functions, one for
  CONFIG_RT_GROUP_SCHED and one for !CONFIG_RT_GROUP_SCHED; those could be
  folded into one, with runtime-evaluated guards in the folded functions
  (I haven't posted that yet due to unclear performance
  benefit),
- I noticed some lockdep issues around rt_runtime_lock, but those are also
  present in an unpatched kernel (and they seem to have been present for a
  long time without complications).

[1] Debian (https://salsa.debian.org/kernel-team/linux/-/blob/debian/latest/debian/config/kernelarch-x86/config)
[2] ArchLinux (https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config)
[3] Fedora (https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel-x86_64-fedora.config)
[4] openSUSE TW (https://github.com/SUSE/kernel-source/blob/stable/config/x86_64/default)

Michal Koutný (9):
  sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions
  sched: Remove unneeded macro wrap
  sched: Always initialize rt_rq's task_group
  sched: Add commandline option for RT_GROUP_SCHED toggling
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED
  sched: Do not construct nor expose RT_GROUP_SCHED structures if
    disabled
  sched: Add RT_GROUP WARN checks for non-root task_groups
  sched: Add annotations to RT_GROUP_SCHED fields

 .../admin-guide/kernel-parameters.txt         |  5 ++
 init/Kconfig                                  | 11 +++
 kernel/sched/core.c                           | 69 +++++++++++++++----
 kernel/sched/rt.c                             | 51 +++++++++-----
 kernel/sched/sched.h                          | 34 +++++++--
 kernel/sched/syscalls.c                       |  5 +-
 6 files changed, 137 insertions(+), 38 deletions(-)


base-commit: f92f4749861b06fed908d336b4dee1326003291b
-- 
2.47.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/9] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
@ 2024-12-16 20:12 ` Michal Koutný
  2024-12-16 20:12 ` [RFC PATCH 2/9] sched: Remove unneeded macro wrap Michal Koutný
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:12 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

Convert the blocks guarded by macros to regular code so that the RT
group code gets more compile-time validation. The reasoning is given in
Documentation/process/coding-style.rst, section 21) Conditional Compilation.
No functional change is expected.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
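Reader's note (not part of the posted patch): the conversion described in
the commit message above relies on IS_ENABLED() turning a Kconfig symbol
into a compile-time constant, so the guarded code is always parsed and
type-checked and the dead branch is then discarded by the compiler. Below is
a minimal user-space sketch of that idea. The SKETCH_* macros, the
CONFIG_SKETCH_* symbols and both helper functions are hypothetical, and the
macro only imitates the kernel's behavior for symbols defined to 1.

```c
#include <stdbool.h>

/* Hypothetical stand-in for Kconfig output: enabled options are defined to 1. */
#define CONFIG_SKETCH_ON 1
/* CONFIG_SKETCH_OFF is intentionally left undefined. */

/*
 * Simplified imitation of the IS_ENABLED() trick: if the option expands
 * to 1, the placeholder inserts an extra "0," argument and the second
 * argument becomes 1; otherwise the second argument is 0.
 */
#define SKETCH_PLACEHOLDER_1 0,
#define SKETCH_SECOND(ignored, val, ...) val
#define SKETCH_DEFINED3(arg1_or_junk) SKETCH_SECOND(arg1_or_junk 1, 0, 0)
#define SKETCH_DEFINED2(val) SKETCH_DEFINED3(SKETCH_PLACEHOLDER_##val)
#define SKETCH_IS_ENABLED(option) SKETCH_DEFINED2(option)

/* Option enabled: the early return is live for non-top queues. */
static bool skip_cpupri_update(bool is_top_queue)
{
	if (SKETCH_IS_ENABLED(CONFIG_SKETCH_ON) && !is_top_queue)
		return true;
	return false;
}

/* Option disabled: the condition is constant false but still compiled. */
static bool skip_cpupri_update_off(bool is_top_queue)
{
	if (SKETCH_IS_ENABLED(CONFIG_SKETCH_OFF) && !is_top_queue)
		return true;
	return false;
}
```

Unlike an #ifdef block, a typo inside either branch would be reported by
the compiler regardless of which configuration is being built.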
 kernel/sched/rt.c       | 10 ++++------
 kernel/sched/syscalls.c |  2 +-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index bd66a46b06aca..6ea46c7219634 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1068,13 +1068,12 @@ inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
 {
 	struct rq *rq = rq_of_rt_rq(rt_rq);
 
-#ifdef CONFIG_RT_GROUP_SCHED
 	/*
 	 * Change rq's cpupri only if rt_rq is the top queue.
 	 */
-	if (&rq->rt != rt_rq)
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && &rq->rt != rt_rq)
 		return;
-#endif
+
 	if (rq->online && prio < prev_prio)
 		cpupri_set(&rq->rd->cpupri, rq->cpu, prio);
 }
@@ -1084,13 +1083,12 @@ dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
 {
 	struct rq *rq = rq_of_rt_rq(rt_rq);
 
-#ifdef CONFIG_RT_GROUP_SCHED
 	/*
 	 * Change rq's cpupri only if rt_rq is the top queue.
 	 */
-	if (&rq->rt != rt_rq)
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && &rq->rt != rt_rq)
 		return;
-#endif
+
 	if (rq->online && rt_rq->highest_prio.curr != prev_prio)
 		cpupri_set(&rq->rd->cpupri, rq->cpu, rt_rq->highest_prio.curr);
 }
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index ff0e5ab4e37cb..77d0d4a2b68da 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -650,7 +650,7 @@ int __sched_setscheduler(struct task_struct *p,
 			retval = -EPERM;
 			goto unlock;
 		}
-#endif
+#endif /* CONFIG_RT_GROUP_SCHED */
 #ifdef CONFIG_SMP
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 				!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 2/9] sched: Remove unneeded macro wrap
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
  2024-12-16 20:12 ` [RFC PATCH 1/9] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions Michal Koutný
@ 2024-12-16 20:12 ` Michal Koutný
  2024-12-16 20:12 ` [RFC PATCH 3/9] sched: Always initialize rt_rq's task_group Michal Koutný
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:12 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

rt_entity_is_task() has separate definitions for CONFIG_RT_GROUP_SCHED and
!CONFIG_RT_GROUP_SCHED, so it can be used unconditionally. No functional
change intended.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/rt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6ea46c7219634..1940301c40f7d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1257,11 +1257,9 @@ static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_arr
 static inline struct sched_statistics *
 __schedstats_from_rt_se(struct sched_rt_entity *rt_se)
 {
-#ifdef CONFIG_RT_GROUP_SCHED
 	/* schedstats is not supported for rt group. */
 	if (!rt_entity_is_task(rt_se))
 		return NULL;
-#endif
 
 	return &rt_task_of(rt_se)->stats;
 }
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 3/9] sched: Always initialize rt_rq's task_group
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
  2024-12-16 20:12 ` [RFC PATCH 1/9] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions Michal Koutný
  2024-12-16 20:12 ` [RFC PATCH 2/9] sched: Remove unneeded macro wrap Michal Koutný
@ 2024-12-16 20:12 ` Michal Koutný
  2024-12-16 20:13 ` [RFC PATCH 4/9] sched: Add commandline option for RT_GROUP_SCHED toggling Michal Koutný
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:12 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

rt_rq->tg may be NULL, which denotes the root task_group.
Store the pointer to root_task_group directly so that callers may use
rt_rq->tg homogeneously.

root_task_group always exists with CONFIG_CGROUP_SCHED, and
CONFIG_RT_GROUP_SCHED depends on that.

This changes the root-level rt_rq's default limit from infinity to the
value of the (originally) global RT throttling.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/rt.c    | 7 ++-----
 kernel/sched/sched.h | 2 ++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 1940301c40f7d..41fed8865cb09 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -89,6 +89,7 @@ void init_rt_rq(struct rt_rq *rt_rq)
 	rt_rq->rt_throttled = 0;
 	rt_rq->rt_runtime = 0;
 	raw_spin_lock_init(&rt_rq->rt_runtime_lock);
+	rt_rq->tg = &root_task_group;
 #endif
 }
 
@@ -484,9 +485,6 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
 
 static inline u64 sched_rt_runtime(struct rt_rq *rt_rq)
 {
-	if (!rt_rq->tg)
-		return RUNTIME_INF;
-
 	return rt_rq->rt_runtime;
 }
 
@@ -1156,8 +1154,7 @@ inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	if (rt_se_boosted(rt_se))
 		rt_rq->rt_nr_boosted++;
 
-	if (rt_rq->tg)
-		start_rt_bandwidth(&rt_rq->tg->rt_bandwidth);
+	start_rt_bandwidth(&rt_rq->tg->rt_bandwidth);
 }
 
 static void
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 76f5f53a645fc..38325bd32a0e0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -820,6 +820,8 @@ struct rt_rq {
 	unsigned int		rt_nr_boosted;
 
 	struct rq		*rq;
+#endif
+#ifdef CONFIG_CGROUP_SCHED
 	struct task_group	*tg;
 #endif
 };
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 4/9] sched: Add commandline option for RT_GROUP_SCHED toggling
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (2 preceding siblings ...)
  2024-12-16 20:12 ` [RFC PATCH 3/9] sched: Always initialize rt_rq's task_group Michal Koutný
@ 2024-12-16 20:13 ` Michal Koutný
  2024-12-16 20:13 ` [RFC PATCH 5/9] sched: Skip non-root task_groups with disabled RT_GROUP_SCHED Michal Koutný
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:13 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

Add only a simple implementation with a static key wrapper; it will be
wired in by later patches.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
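Reader's note (not part of the posted patch): the parameter handler in this
patch accepts only the integers 0 and 1 and otherwise keeps the compiled-in
default. A rough user-space sketch of that parsing logic follows. It assumes
strtol() as a stand-in for kstrtol() and a plain bool in place of the static
key; the sketch_* names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the static key; the kernel patches code instead. */
static bool sketch_rt_group_sched = true;

/*
 * Parse the value of a "rt_group_sched=" style argument.
 * Only 0 and 1 are accepted; any other input leaves the default untouched.
 * Returns 0 on success, -1 on rejected input.
 */
static int sketch_setup_rt_group_sched(const char *str)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(str, &end, 0);
	if (errno || end == str || *end != '\0' || val < 0 || val > 1)
		return -1;

	sketch_rt_group_sched = (val != 0);
	return 0;
}
```

Rejecting out-of-range values instead of clamping them keeps a mistyped
command line from silently changing scheduler behavior.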
 .../admin-guide/kernel-parameters.txt         |  5 ++++
 init/Kconfig                                  | 11 ++++++++
 kernel/sched/core.c                           | 25 +++++++++++++++++++
 kernel/sched/sched.h                          | 17 +++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3872bc6ec49d6..1c890c9ad8716 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5937,6 +5937,11 @@
 			Memory area to be used by remote processor image,
 			managed by CMA.
 
+	rt_group_sched=	[KNL] Enable or disable SCHED_RR/FIFO group scheduling
+			when CONFIG_RT_GROUP_SCHED=y. Defaults to
+			!CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED.
+			Format: <bool>
+
 	rw		[KNL] Mount root device read-write on boot
 
 	S		[KNL] Run init in single mode
diff --git a/init/Kconfig b/init/Kconfig
index a20e6efd3f0fb..7823e5ac0311d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1076,6 +1076,17 @@ config RT_GROUP_SCHED
 	  realtime bandwidth for them.
 	  See Documentation/scheduler/sched-rt-group.rst for more information.
 
+config RT_GROUP_SCHED_DEFAULT_DISABLED
+	bool "Require boot parameter to enable group scheduling for SCHED_RR/FIFO"
+	depends on RT_GROUP_SCHED
+	default n
+	help
+	  When set, the RT group scheduling is disabled by default. The option
+	  is in inverted form so that mere RT_GROUP_SCHED enables the group
+	  scheduling.
+
+	  Say N if unsure.
+
 config EXT_GROUP_SCHED
 	bool
 	depends on SCHED_CLASS_EXT && CGROUP_SCHED
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c6d8232ad9eea..47898f895a5a3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9844,6 +9844,31 @@ static struct cftype cpu_legacy_files[] = {
 	{ }	/* Terminate */
 };
 
+#ifdef CONFIG_RT_GROUP_SCHED
+# ifdef RT_GROUP_SCHED_DEFAULT_DISABLED
+DEFINE_STATIC_KEY_FALSE(rt_group_sched);
+# else
+DEFINE_STATIC_KEY_TRUE(rt_group_sched);
+# endif
+
+static int __init setup_rt_group_sched(char *str)
+{
+	long val;
+
+	if (kstrtol(str, 0, &val) || val < 0 || val > 1) {
+		pr_warn("Unable to set rt_group_sched\n");
+		return 1;
+	}
+	if (val)
+		static_branch_enable(&rt_group_sched);
+	else
+		static_branch_disable(&rt_group_sched);
+
+	return 1;
+}
+__setup("rt_group_sched=", setup_rt_group_sched);
+#endif /* CONFIG_RT_GROUP_SCHED */
+
 static int cpu_extra_stat_show(struct seq_file *sf,
 			       struct cgroup_subsys_state *css)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 38325bd32a0e0..1c457dc1472a3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1501,6 +1501,23 @@ static inline bool sched_group_cookie_match(struct rq *rq,
 }
 
 #endif /* !CONFIG_SCHED_CORE */
+#ifdef CONFIG_RT_GROUP_SCHED
+# ifdef RT_GROUP_SCHED_DEFAULT_DISABLED
+DECLARE_STATIC_KEY_FALSE(rt_group_sched);
+static inline bool rt_group_sched_enabled(void)
+{
+	return static_branch_unlikely(&rt_group_sched);
+}
+# else
+DECLARE_STATIC_KEY_TRUE(rt_group_sched);
+static inline bool rt_group_sched_enabled(void)
+{
+	return static_branch_likely(&rt_group_sched);
+}
+# endif /* RT_GROUP_SCHED_DEFAULT_DISABLED */
+#else
+# define rt_group_sched_enabled()	false
+#endif /* CONFIG_RT_GROUP_SCHED */
 
 static inline void lockdep_assert_rq_held(struct rq *rq)
 {
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 5/9] sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (3 preceding siblings ...)
  2024-12-16 20:13 ` [RFC PATCH 4/9] sched: Add commandline option for RT_GROUP_SCHED toggling Michal Koutný
@ 2024-12-16 20:13 ` Michal Koutný
  2024-12-16 20:13 ` [RFC PATCH 6/9] sched: Bypass bandwidth checks with runtime " Michal Koutný
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:13 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

First, we want to prevent placement of RT tasks on non-root rt_rqs. We
achieve this in the task migration code, which falls back to
root_task_group's rt_rq.

Second, when RT_GROUP is disabled, we want to work only with
root_task_group's rt_rq when iterating over all "real" rt_rqs. To achieve
this, we keep root_task_group as the first entry on the task_groups list
and break out early.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
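Reader's note (not part of the posted patch): the iteration scheme from the
commit message above can be modelled in user space as a list whose head is
always the root group, with the "next" helper returning NULL as soon as
grouping is disabled. The sketch below is a simplified model under that
assumption; struct sketch_tg and all sketch_* names are hypothetical and no
RCU or per-CPU data is involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified task group: a singly linked list node. */
struct sketch_tg {
	int id;
	struct sketch_tg *next;
};

/* Root is always the list head; later groups are appended at the tail. */
static struct sketch_tg sketch_child_b = { .id = 2, .next = NULL };
static struct sketch_tg sketch_child_a = { .id = 1, .next = &sketch_child_b };
static struct sketch_tg sketch_root = { .id = 0, .next = &sketch_child_a };

/* With grouping disabled, iteration ends right after the root group. */
static struct sketch_tg *sketch_next_tg(struct sketch_tg *tg, bool enabled)
{
	if (!enabled)
		return NULL;
	return tg->next;
}

/* Count the groups a for_each_rt_rq()-like walk would visit. */
static int sketch_count_visited(bool enabled)
{
	int visited = 0;

	for (struct sketch_tg *it = &sketch_root; it; it = sketch_next_tg(it, enabled))
		visited++;
	return visited;
}
```

Starting the walk at the root and appending new groups at the tail is what
lets the disabled case stop after exactly one element.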
 kernel/sched/core.c  | 2 +-
 kernel/sched/rt.c    | 9 ++++++---
 kernel/sched/sched.h | 7 +++++++
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 47898f895a5a3..dfd2778622b8b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8958,7 +8958,7 @@ void sched_online_group(struct task_group *tg, struct task_group *parent)
 	unsigned long flags;
 
 	spin_lock_irqsave(&task_group_lock, flags);
-	list_add_rcu(&tg->list, &task_groups);
+	list_add_tail_rcu(&tg->list, &task_groups);
 
 	/* Root should already exist: */
 	WARN_ON(!parent);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 41fed8865cb09..923ec978ff756 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -497,6 +497,9 @@ typedef struct task_group *rt_rq_iter_t;
 
 static inline struct task_group *next_task_group(struct task_group *tg)
 {
+	if (!rt_group_sched_enabled())
+		return NULL;
+
 	do {
 		tg = list_entry_rcu(tg->list.next,
 			typeof(struct task_group), list);
@@ -509,9 +512,9 @@ static inline struct task_group *next_task_group(struct task_group *tg)
 }
 
 #define for_each_rt_rq(rt_rq, iter, rq)					\
-	for (iter = container_of(&task_groups, typeof(*iter), list);	\
-		(iter = next_task_group(iter)) &&			\
-		(rt_rq = iter->rt_rq[cpu_of(rq)]);)
+	for (iter = &root_task_group;					\
+		iter && (rt_rq = iter->rt_rq[cpu_of(rq)]);		\
+		iter = next_task_group(iter))
 
 #define for_each_sched_rt_entity(rt_se) \
 	for (; rt_se; rt_se = rt_se->parent)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1c457dc1472a3..d8d28c3d1ac5f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2148,6 +2148,13 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
+	/*
+	 * p->rt.rt_rq is NULL initially and it is easier to assign
+	 * root_task_group's rt_rq than switching in rt_rq_of_se()
+	 * Clobbers tg(!)
+	 */
+	if (!rt_group_sched_enabled())
+		tg = &root_task_group;
 	p->rt.rt_rq  = tg->rt_rq[cpu];
 	p->rt.parent = tg->rt_se[cpu];
 #endif
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 6/9] sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (4 preceding siblings ...)
  2024-12-16 20:13 ` [RFC PATCH 5/9] sched: Skip non-root task_groups with disabled RT_GROUP_SCHED Michal Koutný
@ 2024-12-16 20:13 ` Michal Koutný
  2024-12-16 20:13 ` [RFC PATCH 7/9] sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled Michal Koutný
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:13 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

When RT_GROUPs are compiled in but not exposed, their bandwidth cannot
be configured (and it is not initialized for non-root task_groups either).
Therefore bypass any checks of task vs task_group bandwidth.

This will achieve behavior very similar to setups that have
!CONFIG_RT_GROUP_SCHED and attach the cpu controller to the cgroup v2
hierarchy. (On a related note, this may allow having RT tasks with
CONFIG_RT_GROUP_SCHED and the cgroup v2 hierarchy.)

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
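Reader's note (not part of the posted patch): the bypass described in the
commit message above boils down to adding the runtime switch as the first
term of the existing bandwidth condition. A condensed sketch of that
predicate follows; sketch_rt_can_attach() and its parameters are
hypothetical and collapse the kernel's task and task_group lookups into
plain arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An RT task is refused only when grouping is active and the target
 * group has no RT runtime assigned; with grouping disabled the group's
 * (uninitialized) bandwidth is never consulted.
 */
static bool sketch_rt_can_attach(bool group_sched_enabled, bool task_is_rt,
				 uint64_t group_rt_runtime)
{
	if (group_sched_enabled && task_is_rt && group_rt_runtime == 0)
		return false;
	return true;
}
```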
 kernel/sched/core.c     | 6 +++++-
 kernel/sched/rt.c       | 2 +-
 kernel/sched/syscalls.c | 3 ++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dfd2778622b8b..6e21e0885557d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9158,11 +9158,15 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 	struct task_struct *task;
 	struct cgroup_subsys_state *css;
 
+	if (!rt_group_sched_enabled())
+		goto scx_check;
+
 	cgroup_taskset_for_each(task, css, tset) {
 		if (!sched_rt_can_attach(css_tg(css), task))
 			return -EINVAL;
 	}
-#endif
+scx_check:
+#endif /* CONFIG_RT_GROUP_SCHED */
 	return scx_cgroup_can_attach(tset);
 }
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 923ec978ff756..161d91f7479b4 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2866,7 +2866,7 @@ static int sched_rt_global_constraints(void)
 int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
 		return 0;
 
 	return 1;
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 77d0d4a2b68da..80382f5d53a44 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -644,7 +644,8 @@ int __sched_setscheduler(struct task_struct *p,
 		 * Do not allow real-time tasks into groups that have no runtime
 		 * assigned.
 		 */
-		if (rt_bandwidth_enabled() && rt_policy(policy) &&
+		if (rt_group_sched_enabled() &&
+				rt_bandwidth_enabled() && rt_policy(policy) &&
 				task_group(p)->rt_bandwidth.rt_runtime == 0 &&
 				!task_group_is_autogroup(task_group(p))) {
 			retval = -EPERM;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 7/9] sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (5 preceding siblings ...)
  2024-12-16 20:13 ` [RFC PATCH 6/9] sched: Bypass bandwidth checks with runtime " Michal Koutný
@ 2024-12-16 20:13 ` Michal Koutný
  2024-12-16 20:13 ` [RFC PATCH 8/9] sched: Add RT_GROUP WARN checks for non-root task_groups Michal Koutný
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:13 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

Thanks to the kernel cmdline being available early, before any
cgroup hierarchy exists, we can achieve the goal of disabling
RT_GROUP_SCHED at boot time by simply skipping any creation (and
destruction) of RT_GROUP data and its exposure via RT attributes.

We can do this thanks to the previously placed runtime guards, which
redirect all operations to root_task_group's data when RT_GROUP_SCHED
is disabled.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/core.c | 36 ++++++++++++++++++++++++------------
 kernel/sched/rt.c   |  9 +++++++++
 2 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6e21e0885557d..300a1a83e1a3c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9819,18 +9819,6 @@ static struct cftype cpu_legacy_files[] = {
 		.seq_show = cpu_cfs_local_stat_show,
 	},
 #endif
-#ifdef CONFIG_RT_GROUP_SCHED
-	{
-		.name = "rt_runtime_us",
-		.read_s64 = cpu_rt_runtime_read,
-		.write_s64 = cpu_rt_runtime_write,
-	},
-	{
-		.name = "rt_period_us",
-		.read_u64 = cpu_rt_period_read_uint,
-		.write_u64 = cpu_rt_period_write_uint,
-	},
-#endif
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 	{
 		.name = "uclamp.min",
@@ -9849,6 +9837,20 @@ static struct cftype cpu_legacy_files[] = {
 };
 
 #ifdef CONFIG_RT_GROUP_SCHED
+static struct cftype rt_group_files[] = {
+	{
+		.name = "rt_runtime_us",
+		.read_s64 = cpu_rt_runtime_read,
+		.write_s64 = cpu_rt_runtime_write,
+	},
+	{
+		.name = "rt_period_us",
+		.read_u64 = cpu_rt_period_read_uint,
+		.write_u64 = cpu_rt_period_write_uint,
+	},
+	{ }	/* Terminate */
+};
+
 # ifdef RT_GROUP_SCHED_DEFAULT_DISABLED
 DEFINE_STATIC_KEY_FALSE(rt_group_sched);
 # else
@@ -9871,6 +9873,16 @@ static int __init setup_rt_group_sched(char *str)
 	return 1;
 }
 __setup("rt_group_sched=", setup_rt_group_sched);
+
+static int __init cpu_rt_group_init(void)
+{
+	if (!rt_group_sched_enabled())
+		return 0;
+
+	WARN_ON(cgroup_add_legacy_cftypes(&cpu_cgrp_subsys, rt_group_files));
+	return 0;
+}
+subsys_initcall(cpu_rt_group_init);
 #endif /* CONFIG_RT_GROUP_SCHED */
 
 static int cpu_extra_stat_show(struct seq_file *sf,
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 161d91f7479b4..db7cdc82003bd 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -195,6 +195,9 @@ static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 
 void unregister_rt_sched_group(struct task_group *tg)
 {
+	if (!rt_group_sched_enabled())
+		return;
+
 	if (tg->rt_se)
 		destroy_rt_bandwidth(&tg->rt_bandwidth);
 }
@@ -203,6 +206,9 @@ void free_rt_sched_group(struct task_group *tg)
 {
 	int i;
 
+	if (!rt_group_sched_enabled())
+		return;
+
 	for_each_possible_cpu(i) {
 		if (tg->rt_rq)
 			kfree(tg->rt_rq[i]);
@@ -247,6 +253,9 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 	struct sched_rt_entity *rt_se;
 	int i;
 
+	if (!rt_group_sched_enabled())
+		return 1;
+
 	tg->rt_rq = kcalloc(nr_cpu_ids, sizeof(rt_rq), GFP_KERNEL);
 	if (!tg->rt_rq)
 		goto err;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 8/9] sched: Add RT_GROUP WARN checks for non-root task_groups
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (6 preceding siblings ...)
  2024-12-16 20:13 ` [RFC PATCH 7/9] sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled Michal Koutný
@ 2024-12-16 20:13 ` Michal Koutný
  2024-12-16 20:13 ` [RFC PATCH 9/9] sched: Add annotations to RT_GROUP_SCHED fields Michal Koutný
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:13 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

With CONFIG_RT_GROUP_SCHED enabled but RT_GROUPs disabled at runtime, we
expect only the root task_group to exist and all sched_rt_entities
to be queued on the root's rt_rq.

If we encounter a non-root RT_GROUP, something went wrong.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/rt.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index db7cdc82003bd..deacd46e27823 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -178,11 +178,14 @@ static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 
 static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq)
 {
+	/* Cannot fold with non-CONFIG_RT_GROUP_SCHED version, layout */
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 	return rt_rq->rq;
 }
 
 static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se)
 {
+	WARN_ON(!rt_group_sched_enabled() && rt_se->rt_rq->tg != &root_task_group);
 	return rt_se->rt_rq;
 }
 
@@ -190,6 +193,7 @@ static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 {
 	struct rt_rq *rt_rq = rt_se->rt_rq;
 
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 	return rt_rq->rq;
 }
 
@@ -506,8 +510,10 @@ typedef struct task_group *rt_rq_iter_t;
 
 static inline struct task_group *next_task_group(struct task_group *tg)
 {
-	if (!rt_group_sched_enabled())
+	if (!rt_group_sched_enabled()) {
+		WARN_ON(tg != &root_task_group);
 		return NULL;
+	}
 
 	do {
 		tg = list_entry_rcu(tg->list.next,
@@ -2609,8 +2615,9 @@ static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
 	struct rt_rq *rt_rq;
 
-#ifdef CONFIG_RT_GROUP_SCHED
+#ifdef CONFIG_RT_GROUP_SCHED // XXX maybe add task_rt_rq(), see also sched_rt_period_rt_rq
 	rt_rq = task_group(p)->rt_rq[cpu];
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 #else
 	rt_rq = &cpu_rq(cpu)->rt;
 #endif
@@ -2720,6 +2727,9 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	    tg->rt_bandwidth.rt_runtime && tg_has_rt_tasks(tg))
 		return -EBUSY;
 
+	if (WARN_ON(!rt_group_sched_enabled() && tg != &root_task_group))
+		return -EBUSY;
+
 	total = to_ratio(period, runtime);
 
 	/*
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 9/9] sched: Add annotations to RT_GROUP_SCHED fields
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (7 preceding siblings ...)
  2024-12-16 20:13 ` [RFC PATCH 8/9] sched: Add RT_GROUP WARN checks for non-root task_groups Michal Koutný
@ 2024-12-16 20:13 ` Michal Koutný
  2025-01-07 19:28 ` [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
  2025-01-07 19:41 ` Peter Zijlstra
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2024-12-16 20:13 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

Update comments to ease RT throttling understanding.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/sched.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d8d28c3d1ac5f..5c32c23915810 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -812,17 +812,17 @@ struct rt_rq {
 
 #ifdef CONFIG_RT_GROUP_SCHED
 	int			rt_throttled;
-	u64			rt_time;
-	u64			rt_runtime;
+	u64			rt_time; /* consumed RT time, goes up in update_curr_rt */
+	u64			rt_runtime; /* allotted RT time, "slice" from rt_bandwidth, RT sharing/balancing */
 	/* Nests inside the rq lock: */
 	raw_spinlock_t		rt_runtime_lock;
 
 	unsigned int		rt_nr_boosted;
 
-	struct rq		*rq;
+	struct rq		*rq; /* this is always top-level rq, cache? */
 #endif
 #ifdef CONFIG_CGROUP_SCHED
-	struct task_group	*tg;
+	struct task_group	*tg; /* this tg has "this" rt_rq on given CPU for runnable entities */
 #endif
 };
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (8 preceding siblings ...)
  2024-12-16 20:13 ` [RFC PATCH 9/9] sched: Add annotations to RT_GROUP_SCHED fields Michal Koutný
@ 2025-01-07 19:28 ` Michal Koutný
  2025-01-07 19:41 ` Peter Zijlstra
  10 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2025-01-07 19:28 UTC (permalink / raw)
  To: cgroups, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

On Mon, Dec 16, 2024 at 09:12:56PM +0100, Michal Koutný <mkoutny@suse.com> wrote:
> The series is organized as follows:

(I saw no replies; this may have slipped through over the turn of the
year.)

> RFC notes:

So I wonder if there are any initial comments on this change.

Thanks,
Michal

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched
  2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (9 preceding siblings ...)
  2025-01-07 19:28 ` [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
@ 2025-01-07 19:41 ` Peter Zijlstra
  2025-01-10 10:04   ` Michal Koutný
  10 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2025-01-07 19:41 UTC (permalink / raw)
  To: Michal Koutný
  Cc: cgroups, linux-kernel, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

On Mon, Dec 16, 2024 at 09:12:56PM +0100, Michal Koutný wrote:
> Despite RT_GROUP_SCHED is only available on cgroup v1, there are still
> some users of this feature. General purpose distros (e.g. [1][2][3][4])
> cannot enable CONFIG_RT_GROUP_SCHED easily:

We all hate this thing and want it to go away. So not being able to use
it is a pro from where I'm at.

Sadly the replacement isn't there yet either, which makes it all really
difficult.


* Re: [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched
  2025-01-07 19:41 ` Peter Zijlstra
@ 2025-01-10 10:04   ` Michal Koutný
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Koutný @ 2025-01-10 10:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: cgroups, linux-kernel, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker


On Tue, Jan 07, 2025 at 08:41:06PM +0100, Peter Zijlstra <peterz@infradead.org> wrote:
> We all hate this thing and want it to go away. So not being able to use
> it is a pro from where I'm at.

I understand, and to some extent I am not a fan of it either (we
disabled it in SUSE quite some time ago). I'd consider the remaining
existing users legacy.

> Sadly the replacement isn't there yet either, which makes it all really
> difficult.

Exactly. Thus the runtime switch is meant as a bridge for general
purpose distros where a kernel is shipped pre-configured (i.e. one
config where the default is non-grouped, so as not to hinder the
majority use cases).
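A minimal userspace sketch of the boot-time decision discussed here, assuming the parameter spelling rt_group_sched=0 from the cover letter and its stated default (groups available unless disabled). The helper name rt_group_sched_from_cmdline is hypothetical. The mechanism the series actually uses inside the kernel is not shown in this part of the thread, so this only illustrates the parsing idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: report whether RT grouping stays enabled for a
 * given kernel command line. The last valid occurrence wins.
 */
static bool rt_group_sched_from_cmdline(const char *cmdline)
{
	static const char key[] = "rt_group_sched=";
	bool enabled = true;	/* assumed default: groups available */
	const char *p = cmdline;

	assert(cmdline != NULL);

	while ((p = strstr(p, key)) != NULL) {
		/* Accept the key only at the start of a space-separated token. */
		bool at_token_start = (p == cmdline) || (p[-1] == ' ');

		p += sizeof(key) - 1;	/* skip past "rt_group_sched=" */
		if (at_token_start) {
			if (*p == '0')
				enabled = false;
			else if (*p == '1')
				enabled = true;
		}
	}

	return enabled;
}
```

With this sketch, "quiet rt_group_sched=0" yields false, a command line without the parameter yields true, and an unrelated token such as "no_rt_group_sched=0" is ignored.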

Considering the legacy use cases on distribution kernels, do you oppose
the chosen approach? I can work on changes if you have comments on the
implementation itself.

Thanks,
Michal




Thread overview: 13+ messages
2024-12-16 20:12 [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
2024-12-16 20:12 ` [RFC PATCH 1/9] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions Michal Koutný
2024-12-16 20:12 ` [RFC PATCH 2/9] sched: Remove unneeed macro wrap Michal Koutný
2024-12-16 20:12 ` [RFC PATCH 3/9] sched: Always initialize rt_rq's task_group Michal Koutný
2024-12-16 20:13 ` [RFC PATCH 4/9] sched: Add commadline option for RT_GROUP_SCHED toggling Michal Koutný
2024-12-16 20:13 ` [RFC PATCH 5/9] sched: Skip non-root task_groups with disabled RT_GROUP_SCHED Michal Koutný
2024-12-16 20:13 ` [RFC PATCH 6/9] sched: Bypass bandwitdh checks with runtime " Michal Koutný
2024-12-16 20:13 ` [RFC PATCH 7/9] sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled Michal Koutný
2024-12-16 20:13 ` [RFC PATCH 8/9] sched: Add RT_GROUP WARN checks for non-root task_groups Michal Koutný
2024-12-16 20:13 ` [RFC PATCH 9/9] sched: Add annotations to RT_GROUP_SCHED fields Michal Koutný
2025-01-07 19:28 ` [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched Michal Koutný
2025-01-07 19:41 ` Peter Zijlstra
2025-01-10 10:04   ` Michal Koutný
