cgroups.vger.kernel.org archive mirror
* [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
@ 2025-03-10 17:04 Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 01/10] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions Michal Koutný
                   ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

Although RT_GROUP_SCHED is only available on cgroup v1, there are still
some (v1-bound) users of this feature. General-purpose distros (e.g.
[1][2][3][4]) cannot easily enable CONFIG_RT_GROUP_SCHED:
- it prevents creation of RT tasks unless RT runtime is determined and
  distributed into the cgroup tree,
- grouping of RT threads is not what is desired by default on such
  systems,
- it prevents use of cgroup v2 with RT tasks.

This changeset defers the decision whether to have CONFIG_RT_GROUP_SCHED
or not until boot time.
By default RT groups are available as before, but the user can pass the
rt_group_sched=0 kernel cmdline parameter to disable the grouping; the
behavior then matches !CONFIG_RT_GROUP_SCHED (with a certain runtime
overhead).
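
For illustration, a hypothetical way to disable the grouping on a distro
kernel built with CONFIG_RT_GROUP_SCHED=y (the bootloader syntax and paths
below are only an example, not part of this series):

  # /etc/default/grub
  GRUB_CMDLINE_LINUX_DEFAULT="... rt_group_sched=0"

  # after regenerating the bootloader config and rebooting
  $ grep -o 'rt_group_sched=[01]' /proc/cmdline
  rt_group_sched=0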

The series is organized as follows:

1) generic ifdefs cleanup, no functional changes,
2) preparing root_task_group to be used in places that take shortcuts in
   the case of !CONFIG_RT_GROUP_SCHED,
3) boot cmdline option that controls cgroup (v1) attributes,
4) conditional bypass of non-root task groups,
5) checks and comments refresh.

The crux is in the patches:
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED

Further notes:
- it is not a sched_feat() flag because that can be flipped at any time
- runtime disablement is not implemented as an infinite per-cgroup RT limit
  since that would still employ group scheduling, unlike
  !CONFIG_RT_GROUP_SCHED
- there remain two variants of various functions for
  CONFIG_RT_GROUP_SCHED and !CONFIG_RT_GROUP_SCHED; those could be
  folded into one, with runtime-evaluated guards in the folded functions
  (I haven't posted that yet due to unclear performance benefit)
- I noticed some lockdep issues over rt_runtime_lock but those are also
  present in an unpatched kernel (and they seem to have been there for a
  long time with CONFIG_RT_GROUP_SCHED)

Changes from RFC (https://lore.kernel.org/r/20241216201305.19761-1-mkoutny@suse.com/):
- fix macro CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED invocation
- rebase on torvalds/master

Changes from v1 (https://lore.kernel.org/all/20250210151239.50055-1-mkoutny@suse.com/):
- add runtime deprecation warning

[1] Debian (https://salsa.debian.org/kernel-team/linux/-/blob/debian/latest/debian/config/kernelarch-x86/config)
[2] ArchLinux (https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config)
[3] Fedora (https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel-x86_64-fedora.config)
[4] openSUSE TW (https://github.com/SUSE/kernel-source/blob/stable/config/x86_64/default)

Michal Koutný (10):
  sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions
  sched: Remove unneeded macro wrap
  sched: Always initialize rt_rq's task_group
  sched: Add cmdline option for RT_GROUP_SCHED toggling
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED
  sched: Do not construct nor expose RT_GROUP_SCHED structures if
    disabled
  sched: Add RT_GROUP WARN checks for non-root task_groups
  sched: Add annotations to RT_GROUP_SCHED fields
  sched: Add deprecation warning for users of RT_GROUP_SCHED

 .../admin-guide/kernel-parameters.txt         |  5 ++
 init/Kconfig                                  | 11 +++
 kernel/sched/core.c                           | 70 +++++++++++++++----
 kernel/sched/rt.c                             | 51 +++++++++-----
 kernel/sched/sched.h                          | 34 +++++++--
 kernel/sched/syscalls.c                       |  5 +-
 6 files changed, 138 insertions(+), 38 deletions(-)


base-commit: 69e858e0b8b2ea07759e995aa383e8780d9d140c
-- 
2.48.1



* [PATCH v2 01/10] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 02/10] sched: Remove unneeded macro wrap Michal Koutný
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

Convert the blocks guarded by macros to regular code so that the RT
group code gets more compile validation. The reasoning is in
Documentation/process/coding-style.rst, section 21) Conditional Compilation.
With that, no functional change is expected.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/rt.c       | 10 ++++------
 kernel/sched/syscalls.c |  2 +-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4b8e33c615b12..3116745be304b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1068,13 +1068,12 @@ inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
 {
 	struct rq *rq = rq_of_rt_rq(rt_rq);
 
-#ifdef CONFIG_RT_GROUP_SCHED
 	/*
 	 * Change rq's cpupri only if rt_rq is the top queue.
 	 */
-	if (&rq->rt != rt_rq)
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && &rq->rt != rt_rq)
 		return;
-#endif
+
 	if (rq->online && prio < prev_prio)
 		cpupri_set(&rq->rd->cpupri, rq->cpu, prio);
 }
@@ -1084,13 +1083,12 @@ dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
 {
 	struct rq *rq = rq_of_rt_rq(rt_rq);
 
-#ifdef CONFIG_RT_GROUP_SCHED
 	/*
 	 * Change rq's cpupri only if rt_rq is the top queue.
 	 */
-	if (&rq->rt != rt_rq)
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && &rq->rt != rt_rq)
 		return;
-#endif
+
 	if (rq->online && rt_rq->highest_prio.curr != prev_prio)
 		cpupri_set(&rq->rd->cpupri, rq->cpu, rt_rq->highest_prio.curr);
 }
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 456d339be98fb..8629a87628ebf 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -640,7 +640,7 @@ int __sched_setscheduler(struct task_struct *p,
 			retval = -EPERM;
 			goto unlock;
 		}
-#endif
+#endif /* CONFIG_RT_GROUP_SCHED */
 #ifdef CONFIG_SMP
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 				!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
-- 
2.48.1



* [PATCH v2 02/10] sched: Remove unneeded macro wrap
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 01/10] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 03/10] sched: Always initialize rt_rq's task_group Michal Koutný
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

rt_entity_is_task() has split definitions based on CONFIG_RT_GROUP_SCHED,
therefore we can always use it. No functional change intended.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/rt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 3116745be304b..17b1fd0bac1d9 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1257,11 +1257,9 @@ static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_arr
 static inline struct sched_statistics *
 __schedstats_from_rt_se(struct sched_rt_entity *rt_se)
 {
-#ifdef CONFIG_RT_GROUP_SCHED
 	/* schedstats is not supported for rt group. */
 	if (!rt_entity_is_task(rt_se))
 		return NULL;
-#endif
 
 	return &rt_task_of(rt_se)->stats;
 }
-- 
2.48.1



* [PATCH v2 03/10] sched: Always initialize rt_rq's task_group
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 01/10] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 02/10] sched: Remove unneeded macro wrap Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 04/10] sched: Add cmdline option for RT_GROUP_SCHED toggling Michal Koutný
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

rt_rq->tg may be NULL, which denotes the root task_group.
Store the pointer to root_task_group directly so that callers may use
rt_rq->tg homogeneously.

root_task_group always exists with CONFIG_CGROUP_SCHED, and
CONFIG_RT_GROUP_SCHED depends on that.

This changes the root level rt_rq's default limit from infinity to the
value of the (originally global) RT throttling.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/rt.c    | 7 ++-----
 kernel/sched/sched.h | 2 ++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 17b1fd0bac1d9..dabb26b438e88 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -89,6 +89,7 @@ void init_rt_rq(struct rt_rq *rt_rq)
 	rt_rq->rt_throttled = 0;
 	rt_rq->rt_runtime = 0;
 	raw_spin_lock_init(&rt_rq->rt_runtime_lock);
+	rt_rq->tg = &root_task_group;
 #endif
 }
 
@@ -484,9 +485,6 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
 
 static inline u64 sched_rt_runtime(struct rt_rq *rt_rq)
 {
-	if (!rt_rq->tg)
-		return RUNTIME_INF;
-
 	return rt_rq->rt_runtime;
 }
 
@@ -1156,8 +1154,7 @@ inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	if (rt_se_boosted(rt_se))
 		rt_rq->rt_nr_boosted++;
 
-	if (rt_rq->tg)
-		start_rt_bandwidth(&rt_rq->tg->rt_bandwidth);
+	start_rt_bandwidth(&rt_rq->tg->rt_bandwidth);
 }
 
 static void
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 38e0e323dda26..4453e79ff65a3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -827,6 +827,8 @@ struct rt_rq {
 	unsigned int		rt_nr_boosted;
 
 	struct rq		*rq;
+#endif
+#ifdef CONFIG_CGROUP_SCHED
 	struct task_group	*tg;
 #endif
 };
-- 
2.48.1



* [PATCH v2 04/10] sched: Add cmdline option for RT_GROUP_SCHED toggling
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (2 preceding siblings ...)
  2025-03-10 17:04 ` [PATCH v2 03/10] sched: Always initialize rt_rq's task_group Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 05/10] sched: Skip non-root task_groups with disabled RT_GROUP_SCHED Michal Koutný
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

Only a simple implementation with a static key wrapper; it will be wired
in later.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 ++++
 init/Kconfig                                  | 11 ++++++++
 kernel/sched/core.c                           | 25 +++++++++++++++++++
 kernel/sched/sched.h                          | 17 +++++++++++++
 4 files changed, 58 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb8752b42ec85..6f734c57e6ce2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6235,6 +6235,11 @@
 			Memory area to be used by remote processor image,
 			managed by CMA.
 
+	rt_group_sched=	[KNL] Enable or disable SCHED_RR/FIFO group scheduling
+			when CONFIG_RT_GROUP_SCHED=y. Defaults to
+			!CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED.
+			Format: <bool>
+
 	rw		[KNL] Mount root device read-write on boot
 
 	S		[KNL] Run init in single mode
diff --git a/init/Kconfig b/init/Kconfig
index 4dbc059d2de5c..5461e232d325a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1079,6 +1079,17 @@ config RT_GROUP_SCHED
 	  realtime bandwidth for them.
 	  See Documentation/scheduler/sched-rt-group.rst for more information.
 
+config RT_GROUP_SCHED_DEFAULT_DISABLED
+	bool "Require boot parameter to enable group scheduling for SCHED_RR/FIFO"
+	depends on RT_GROUP_SCHED
+	default n
+	help
+	  When set, the RT group scheduling is disabled by default. The option
+	  is in inverted form so that mere RT_GROUP_SCHED enables the group
+	  scheduling.
+
+	  Say N if unsure.
+
 config EXT_GROUP_SCHED
 	bool
 	depends on SCHED_CLASS_EXT && CGROUP_SCHED
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 165c90ba64ea9..e6e072e618a00 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9852,6 +9852,31 @@ static struct cftype cpu_legacy_files[] = {
 	{ }	/* Terminate */
 };
 
+#ifdef CONFIG_RT_GROUP_SCHED
+# ifdef CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED
+DEFINE_STATIC_KEY_FALSE(rt_group_sched);
+# else
+DEFINE_STATIC_KEY_TRUE(rt_group_sched);
+# endif
+
+static int __init setup_rt_group_sched(char *str)
+{
+	long val;
+
+	if (kstrtol(str, 0, &val) || val < 0 || val > 1) {
+		pr_warn("Unable to set rt_group_sched\n");
+		return 1;
+	}
+	if (val)
+		static_branch_enable(&rt_group_sched);
+	else
+		static_branch_disable(&rt_group_sched);
+
+	return 1;
+}
+__setup("rt_group_sched=", setup_rt_group_sched);
+#endif /* CONFIG_RT_GROUP_SCHED */
+
 static int cpu_extra_stat_show(struct seq_file *sf,
 			       struct cgroup_subsys_state *css)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4453e79ff65a3..e4f6c0b1a3163 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1508,6 +1508,23 @@ static inline bool sched_group_cookie_match(struct rq *rq,
 }
 
 #endif /* !CONFIG_SCHED_CORE */
+#ifdef CONFIG_RT_GROUP_SCHED
+# ifdef CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED
+DECLARE_STATIC_KEY_FALSE(rt_group_sched);
+static inline bool rt_group_sched_enabled(void)
+{
+	return static_branch_unlikely(&rt_group_sched);
+}
+# else
+DECLARE_STATIC_KEY_TRUE(rt_group_sched);
+static inline bool rt_group_sched_enabled(void)
+{
+	return static_branch_likely(&rt_group_sched);
+}
+# endif /* CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED */
+#else
+# define rt_group_sched_enabled()	false
+#endif /* CONFIG_RT_GROUP_SCHED */
 
 static inline void lockdep_assert_rq_held(struct rq *rq)
 {
-- 
2.48.1



* [PATCH v2 05/10] sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (3 preceding siblings ...)
  2025-03-10 17:04 ` [PATCH v2 04/10] sched: Add cmdline option for RT_GROUP_SCHED toggling Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 06/10] sched: Bypass bandwidth checks with runtime " Michal Koutný
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

First, we want to prevent placement of RT tasks on non-root rt_rqs; we
achieve this in the task migration code, which then falls back to
root_task_group's rt_rq.

Second, we want to work only with root_task_group's rt_rq when iterating
all "real" rt_rqs while RT_GROUP is disabled. To achieve this we keep
root_task_group as the first entry on the task_groups list and break out
quickly.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/core.c  | 2 +-
 kernel/sched/rt.c    | 9 ++++++---
 kernel/sched/sched.h | 7 +++++++
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e6e072e618a00..5b67b4704a5ed 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8966,7 +8966,7 @@ void sched_online_group(struct task_group *tg, struct task_group *parent)
 	unsigned long flags;
 
 	spin_lock_irqsave(&task_group_lock, flags);
-	list_add_rcu(&tg->list, &task_groups);
+	list_add_tail_rcu(&tg->list, &task_groups);
 
 	/* Root should already exist: */
 	WARN_ON(!parent);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index dabb26b438e88..a427c3f560b71 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -497,6 +497,9 @@ typedef struct task_group *rt_rq_iter_t;
 
 static inline struct task_group *next_task_group(struct task_group *tg)
 {
+	if (!rt_group_sched_enabled())
+		return NULL;
+
 	do {
 		tg = list_entry_rcu(tg->list.next,
 			typeof(struct task_group), list);
@@ -509,9 +512,9 @@ static inline struct task_group *next_task_group(struct task_group *tg)
 }
 
 #define for_each_rt_rq(rt_rq, iter, rq)					\
-	for (iter = container_of(&task_groups, typeof(*iter), list);	\
-		(iter = next_task_group(iter)) &&			\
-		(rt_rq = iter->rt_rq[cpu_of(rq)]);)
+	for (iter = &root_task_group;					\
+		iter && (rt_rq = iter->rt_rq[cpu_of(rq)]);		\
+		iter = next_task_group(iter))
 
 #define for_each_sched_rt_entity(rt_se) \
 	for (; rt_se; rt_se = rt_se->parent)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e4f6c0b1a3163..4548048dbcb8f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2187,6 +2187,13 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
+	/*
+	 * p->rt.rt_rq is NULL initially and it is easier to assign
+	 * root_task_group's rt_rq than switching in rt_rq_of_se()
+	 * Clobbers tg(!)
+	 */
+	if (!rt_group_sched_enabled())
+		tg = &root_task_group;
 	p->rt.rt_rq  = tg->rt_rq[cpu];
 	p->rt.parent = tg->rt_se[cpu];
 #endif
-- 
2.48.1



* [PATCH v2 06/10] sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (4 preceding siblings ...)
  2025-03-10 17:04 ` [PATCH v2 05/10] sched: Skip non-root task_groups with disabled RT_GROUP_SCHED Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-04-02 12:02   ` Peter Zijlstra
  2025-03-10 17:04 ` [PATCH v2 07/10] sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled Michal Koutný
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

When RT_GROUPs are compiled but not exposed, their bandwidth cannot
be configured (and it is not initialized for non-root task_groups either).
Therefore bypass any checks of task vs task_group bandwidth.

This will achieve behavior very similar to setups that have
!CONFIG_RT_GROUP_SCHED and attach the cpu controller to the cgroup v2
hierarchy. (On a related note, this may allow having RT tasks with
CONFIG_RT_GROUP_SCHED and a cgroup v2 hierarchy.)
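
As an illustrative sanity check (hypothetical paths; assumes a kernel
booted with rt_group_sched=0 and the cpu controller enabled in the v2
hierarchy):

  $ mkdir /sys/fs/cgroup/rtdemo
  $ chrt -f 10 sleep 600 &
  $ echo $! > /sys/fs/cgroup/rtdemo/cgroup.procs  # not rejected by the bandwidth check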

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/core.c     | 6 +++++-
 kernel/sched/rt.c       | 2 +-
 kernel/sched/syscalls.c | 3 ++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5b67b4704a5ed..a418e7bc6a123 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9166,11 +9166,15 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 	struct task_struct *task;
 	struct cgroup_subsys_state *css;
 
+	if (!rt_group_sched_enabled())
+		goto scx_check;
+
 	cgroup_taskset_for_each(task, css, tset) {
 		if (!sched_rt_can_attach(css_tg(css), task))
 			return -EINVAL;
 	}
-#endif
+scx_check:
+#endif /* CONFIG_RT_GROUP_SCHED */
 	return scx_cgroup_can_attach(tset);
 }
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index a427c3f560b71..f25fe2862a7df 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2866,7 +2866,7 @@ static int sched_rt_global_constraints(void)
 int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
 		return 0;
 
 	return 1;
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 8629a87628ebf..7b1689af9ff1e 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -634,7 +634,8 @@ int __sched_setscheduler(struct task_struct *p,
 		 * Do not allow real-time tasks into groups that have no runtime
 		 * assigned.
 		 */
-		if (rt_bandwidth_enabled() && rt_policy(policy) &&
+		if (rt_group_sched_enabled() &&
+				rt_bandwidth_enabled() && rt_policy(policy) &&
 				task_group(p)->rt_bandwidth.rt_runtime == 0 &&
 				!task_group_is_autogroup(task_group(p))) {
 			retval = -EPERM;
-- 
2.48.1



* [PATCH v2 07/10] sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (5 preceding siblings ...)
  2025-03-10 17:04 ` [PATCH v2 06/10] sched: Bypass bandwidth checks with runtime " Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 08/10] sched: Add RT_GROUP WARN checks for non-root task_groups Michal Koutný
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

Thanks to the kernel cmdline being available early, before any
cgroup hierarchy exists, we can achieve the RT_GROUP_SCHED boot-time
disabling goal by simply skipping any creation (and destruction) of
RT_GROUP data and its exposure via RT attributes.

We can do this thanks to the previously placed runtime guards that
redirect all operations to root_task_group's data when RT_GROUP_SCHED is
disabled.
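
For example (hypothetical mount point; assumes a cgroup v1 cpu hierarchy
on a kernel booted with rt_group_sched=0):

  $ ls /sys/fs/cgroup/cpu/cpu.rt_*
  ls: cannot access '/sys/fs/cgroup/cpu/cpu.rt_*': No such file or directory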

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/core.c | 36 ++++++++++++++++++++++++------------
 kernel/sched/rt.c   |  9 +++++++++
 2 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a418e7bc6a123..4b2d9ec0c1f23 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9827,18 +9827,6 @@ static struct cftype cpu_legacy_files[] = {
 		.seq_show = cpu_cfs_local_stat_show,
 	},
 #endif
-#ifdef CONFIG_RT_GROUP_SCHED
-	{
-		.name = "rt_runtime_us",
-		.read_s64 = cpu_rt_runtime_read,
-		.write_s64 = cpu_rt_runtime_write,
-	},
-	{
-		.name = "rt_period_us",
-		.read_u64 = cpu_rt_period_read_uint,
-		.write_u64 = cpu_rt_period_write_uint,
-	},
-#endif
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 	{
 		.name = "uclamp.min",
@@ -9857,6 +9845,20 @@ static struct cftype cpu_legacy_files[] = {
 };
 
 #ifdef CONFIG_RT_GROUP_SCHED
+static struct cftype rt_group_files[] = {
+	{
+		.name = "rt_runtime_us",
+		.read_s64 = cpu_rt_runtime_read,
+		.write_s64 = cpu_rt_runtime_write,
+	},
+	{
+		.name = "rt_period_us",
+		.read_u64 = cpu_rt_period_read_uint,
+		.write_u64 = cpu_rt_period_write_uint,
+	},
+	{ }	/* Terminate */
+};
+
 # ifdef CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED
 DEFINE_STATIC_KEY_FALSE(rt_group_sched);
 # else
@@ -9879,6 +9881,16 @@ static int __init setup_rt_group_sched(char *str)
 	return 1;
 }
 __setup("rt_group_sched=", setup_rt_group_sched);
+
+static int __init cpu_rt_group_init(void)
+{
+	if (!rt_group_sched_enabled())
+		return 0;
+
+	WARN_ON(cgroup_add_legacy_cftypes(&cpu_cgrp_subsys, rt_group_files));
+	return 0;
+}
+subsys_initcall(cpu_rt_group_init);
 #endif /* CONFIG_RT_GROUP_SCHED */
 
 static int cpu_extra_stat_show(struct seq_file *sf,
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index f25fe2862a7df..1633b49b2ce26 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -195,6 +195,9 @@ static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 
 void unregister_rt_sched_group(struct task_group *tg)
 {
+	if (!rt_group_sched_enabled())
+		return;
+
 	if (tg->rt_se)
 		destroy_rt_bandwidth(&tg->rt_bandwidth);
 }
@@ -203,6 +206,9 @@ void free_rt_sched_group(struct task_group *tg)
 {
 	int i;
 
+	if (!rt_group_sched_enabled())
+		return;
+
 	for_each_possible_cpu(i) {
 		if (tg->rt_rq)
 			kfree(tg->rt_rq[i]);
@@ -247,6 +253,9 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 	struct sched_rt_entity *rt_se;
 	int i;
 
+	if (!rt_group_sched_enabled())
+		return 1;
+
 	tg->rt_rq = kcalloc(nr_cpu_ids, sizeof(rt_rq), GFP_KERNEL);
 	if (!tg->rt_rq)
 		goto err;
-- 
2.48.1



* [PATCH v2 08/10] sched: Add RT_GROUP WARN checks for non-root task_groups
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (6 preceding siblings ...)
  2025-03-10 17:04 ` [PATCH v2 07/10] sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 09/10] sched: Add annotations to RT_GROUP_SCHED fields Michal Koutný
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

With CONFIG_RT_GROUP_SCHED but RT_GROUPs disabled at runtime, we expect
only the root task_group to exist and all RT scheduling entities
(sched_rt_entity) to be queued on the root's rt_rq.

If we encounter a non-root RT_GROUP, something went wrong.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/rt.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 1633b49b2ce26..d0acfc112d68e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -178,11 +178,14 @@ static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 
 static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq)
 {
+	/* Cannot fold with non-CONFIG_RT_GROUP_SCHED version, layout */
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 	return rt_rq->rq;
 }
 
 static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se)
 {
+	WARN_ON(!rt_group_sched_enabled() && rt_se->rt_rq->tg != &root_task_group);
 	return rt_se->rt_rq;
 }
 
@@ -190,6 +193,7 @@ static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 {
 	struct rt_rq *rt_rq = rt_se->rt_rq;
 
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 	return rt_rq->rq;
 }
 
@@ -506,8 +510,10 @@ typedef struct task_group *rt_rq_iter_t;
 
 static inline struct task_group *next_task_group(struct task_group *tg)
 {
-	if (!rt_group_sched_enabled())
+	if (!rt_group_sched_enabled()) {
+		WARN_ON(tg != &root_task_group);
 		return NULL;
+	}
 
 	do {
 		tg = list_entry_rcu(tg->list.next,
@@ -2609,8 +2615,9 @@ static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
 	struct rt_rq *rt_rq;
 
-#ifdef CONFIG_RT_GROUP_SCHED
+#ifdef CONFIG_RT_GROUP_SCHED // XXX maybe add task_rt_rq(), see also sched_rt_period_rt_rq
 	rt_rq = task_group(p)->rt_rq[cpu];
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
 #else
 	rt_rq = &cpu_rq(cpu)->rt;
 #endif
@@ -2720,6 +2727,9 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	    tg->rt_bandwidth.rt_runtime && tg_has_rt_tasks(tg))
 		return -EBUSY;
 
+	if (WARN_ON(!rt_group_sched_enabled() && tg != &root_task_group))
+		return -EBUSY;
+
 	total = to_ratio(period, runtime);
 
 	/*
-- 
2.48.1



* [PATCH v2 09/10] sched: Add annotations to RT_GROUP_SCHED fields
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (7 preceding siblings ...)
  2025-03-10 17:04 ` [PATCH v2 08/10] sched: Add RT_GROUP WARN checks for non-root task_groups Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-03-10 17:04 ` [PATCH v2 10/10] sched: Add deprecation warning for users of RT_GROUP_SCHED Michal Koutný
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

Update comments to ease understanding of RT throttling.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/sched.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4548048dbcb8f..51feefef65c66 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -819,17 +819,17 @@ struct rt_rq {
 
 #ifdef CONFIG_RT_GROUP_SCHED
 	int			rt_throttled;
-	u64			rt_time;
-	u64			rt_runtime;
+	u64			rt_time; /* consumed RT time, goes up in update_curr_rt */
+	u64			rt_runtime; /* allotted RT time, "slice" from rt_bandwidth, RT sharing/balancing */
 	/* Nests inside the rq lock: */
 	raw_spinlock_t		rt_runtime_lock;
 
 	unsigned int		rt_nr_boosted;
 
-	struct rq		*rq;
+	struct rq		*rq; /* this is always top-level rq, cache? */
 #endif
 #ifdef CONFIG_CGROUP_SCHED
-	struct task_group	*tg;
+	struct task_group	*tg; /* this tg has "this" rt_rq on given CPU for runnable entities */
 #endif
 };
 
-- 
2.48.1



* [PATCH v2 10/10] sched: Add deprecation warning for users of RT_GROUP_SCHED
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (8 preceding siblings ...)
  2025-03-10 17:04 ` [PATCH v2 09/10] sched: Add annotations to RT_GROUP_SCHED fields Michal Koutný
@ 2025-03-10 17:04 ` Michal Koutný
  2025-04-17 12:13   ` Michal Koutný
  2025-03-24 18:10 ` [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
  2025-04-01 11:05 ` Peter Zijlstra
  11 siblings, 1 reply; 19+ messages in thread
From: Michal Koutný @ 2025-03-10 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker, Michal Koutný

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4b2d9ec0c1f23..6866355046d21 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9748,6 +9748,7 @@ static int cpu_cfs_local_stat_show(struct seq_file *sf, void *v)
 static int cpu_rt_runtime_write(struct cgroup_subsys_state *css,
 				struct cftype *cft, s64 val)
 {
+	pr_warn_once("RT_GROUP throttling is deprecated, use global sched_rt_runtime_us and deadline tasks.\n");
 	return sched_group_set_rt_runtime(css_tg(css), val);
 }
 
-- 
2.48.1



* Re: [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (9 preceding siblings ...)
  2025-03-10 17:04 ` [PATCH v2 10/10] sched: Add deprecation warning for users of RT_GROUP_SCHED Michal Koutný
@ 2025-03-24 18:10 ` Michal Koutný
  2025-04-01 11:05 ` Peter Zijlstra
  11 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-03-24 18:10 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker


Hello.

On Mon, Mar 10, 2025 at 06:04:32PM +0100, Michal Koutný <mkoutny@suse.com> wrote:
...
> Changes from v1 (https://lore.kernel.org/all/20250210151239.50055-1-mkoutny@suse.com/)
> - add runtime deprecation warning

Peter, has this addition made the boot-time configurability less
dreadful (until legacy users can migrate to something better)?

Thanks,
Michal



* Re: [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
  2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
                   ` (10 preceding siblings ...)
  2025-03-24 18:10 ` [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
@ 2025-04-01 11:05 ` Peter Zijlstra
  2025-04-03 12:17   ` Michal Koutný
  11 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2025-04-01 11:05 UTC (permalink / raw)
  To: Michal Koutný
  Cc: cgroups, linux-kernel, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

On Mon, Mar 10, 2025 at 06:04:32PM +0100, Michal Koutný wrote:
> Although RT_GROUP_SCHED is only available on cgroup v1, there are still
> some (v1-bound) users of this feature. General-purpose distros (e.g.
> [1][2][3][4]) cannot easily enable CONFIG_RT_GROUP_SCHED:
> - it prevents creation of RT tasks unless RT runtime is determined and
>   distributed into the cgroup tree,
> - grouping of RT threads is not what is desired by default on such
>   systems,
> - it prevents use of cgroup v2 with RT tasks.
> 
> This changeset defers the decision whether to have CONFIG_RT_GROUP_SCHED
> or not until boot time.
> By default RT groups are available as before, but the user can pass the
> rt_group_sched=0 kernel cmdline parameter to disable the grouping; the
> behavior then matches !CONFIG_RT_GROUP_SCHED (with a certain runtime
> overhead).
> 
> The series is organized as follows:

Right, so at OSPM we had a proposal for a cgroup-v2 variant of all this
that's based on deadline servers. And I am hoping we can eventually
either fully deprecate the v1 thing or re-implement it sufficiently
close without breaking the interface.

But this is purely about enabling cgroup-v1 usage, right?

You mention some overhead of having this on; is that measured and in
the patches?

Anyway, I'll go have a peek now, finally :-)


* Re: [PATCH v2 06/10] sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED
  2025-03-10 17:04 ` [PATCH v2 06/10] sched: Bypass bandwitdh checks with runtime " Michal Koutný
@ 2025-04-02 12:02   ` Peter Zijlstra
  2025-04-03 12:20     ` Michal Koutný
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2025-04-02 12:02 UTC (permalink / raw)
  To: Michal Koutný
  Cc: cgroups, linux-kernel, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

On Mon, Mar 10, 2025 at 06:04:38PM +0100, Michal Koutný wrote:
> When RT_GROUPs are compiled but not exposed, their bandwidth cannot
> be configured (and it is not initialized for non-root task_groups either).
> Therefore bypass any checks of task vs task_group bandwidth.
> 
> This will achieve behavior very similar to setups that have
> !CONFIG_RT_GROUP_SCHED and attach the cpu controller to the cgroup v2
> hierarchy. (On a related note, this may allow having RT tasks with
> CONFIG_RT_GROUP_SCHED and a cgroup v2 hierarchy.)

Can we make it so that cgroup-v2 is explicitly disallowed for now? As I
said earlier, we're looking at a new implementation with an incompatible
interface.


* Re: [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
  2025-04-01 11:05 ` Peter Zijlstra
@ 2025-04-03 12:17   ` Michal Koutný
  2025-04-07  5:57     ` Juri Lelli
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Koutný @ 2025-04-03 12:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: cgroups, linux-kernel, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker

On Tue, Apr 01, 2025 at 01:05:08PM +0200, Peter Zijlstra <peterz@infradead.org> wrote:
> > By default RT groups are available as before, but the user can pass the
> > rt_group_sched=0 kernel cmdline parameter to disable the grouping; the
> > behavior then matches !CONFIG_RT_GROUP_SCHED (with a certain runtime
> > overhead).
> > 
> ...
> 
> Right, so at OSPM we had a proposal for a cgroup-v2 variant of all this
> that's based on deadline servers.

Interesting, are there any slides or recording available?

> And I am hoping we can eventually either fully deprecate the v1 thing
> or re-implement it sufficiently close without breaking the interface.

I converged on discouraging rt_groups for these reasons:
1) They aren't an RT guarantee for workloads
  - especially when it's possible to configure different periods
2) They don't provide containment of RT tasks
  - an RT task throttled in a group may hold a shared resource and thus its
    issues propagate to RT tasks in different groups
3) The allocation model [1] is difficult to configure
  - to honor delegation and a reasonable default
  - for illustration, another allocation-model resource is cpuset cpus,
    whose abstraction in cgroup v2 is quite sophisticated

Based on that, I'm not a proponent of any RT groups support in cgroup v2
(I'd need to see a use case where it could be justified). IIUC, the
deadline servers could help with 1).

> But this is purely about enabling cgroup-v1 usage, right?

Yes, users need to explicitly be on cgroup v1 (IOW they're stuck on v1
because of reliance on RT groups).

> You meantion some overhead of having this on, is that measured and in
> the patches?

I expect the most affected would be RT task users who go from
!CONFIG_RT_GROUP_SCHED to CONFIG_RT_GROUP_SCHED with
CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED. That's my perception from the
code I touched, but I haven't measured anything. Would this be
an interesting datum?

Thanks,
Michal

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#allocations


* Re: [PATCH v2 06/10] sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED
  2025-04-02 12:02   ` Peter Zijlstra
@ 2025-04-03 12:20     ` Michal Koutný
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-04-03 12:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: cgroups, linux-kernel, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Frederic Weisbecker


On Wed, Apr 02, 2025 at 02:02:21PM +0200, Peter Zijlstra <peterz@infradead.org> wrote:
> Can we make it so that cgroup-v2 is explicitly disallowed for now? As I
> said earlier, we're looking at a new implemention with a incompatible
> interface.

I meant here that
- rt_group_sched=0 -> cgroup v2 works but there's no RT group scheduling
- rt_group_sched=1 -> cgroup v2 doesn't work (RT tasks are prohibited in
                      non-root groups)

I.e. there is no new functionality for cgroup v2; it is just possible
to switch to RT group scheduling (with v1) without recompiling the
kernel.

Michal



* Re: [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
  2025-04-03 12:17   ` Michal Koutný
@ 2025-04-07  5:57     ` Juri Lelli
  0 siblings, 0 replies; 19+ messages in thread
From: Juri Lelli @ 2025-04-07  5:57 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Peter Zijlstra, cgroups, linux-kernel, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider, Frederic Weisbecker

Hi Michal,

On 03/04/25 14:17, Michal Koutný wrote:
> On Tue, Apr 01, 2025 at 01:05:08PM +0200, Peter Zijlstra <peterz@infradead.org> wrote:
> > > By default RT groups are available as before, but the user can pass the
> > > rt_group_sched=0 kernel cmdline parameter to disable the grouping; the
> > > behavior then matches !CONFIG_RT_GROUP_SCHED (with a certain runtime
> > > overhead).
> > > 
> > ...
> > 
> > Right, so at OSPM we had a proposal for a cgroup-v2 variant of all this
> > that's based on deadline servers.
> 
> Interesting, are there any slides or recording available?

Yes, here (freshly uploaded :)

https://youtu.be/1-s8YU3Rzts?si=c4H0jZl4_5bq8pI9

Best,
Juri



* Re: [PATCH v2 10/10] sched: Add deprecation warning for users of RT_GROUP_SCHED
  2025-03-10 17:04 ` [PATCH v2 10/10] sched: Add deprecation warning for users of RT_GROUP_SCHED Michal Koutný
@ 2025-04-17 12:13   ` Michal Koutný
  2025-05-02 14:48     ` Michal Koutný
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Koutný @ 2025-04-17 12:13 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker


Hello.

On Mon, Mar 10, 2025 at 06:04:42PM +0100, Michal Koutný <mkoutny@suse.com> wrote:
...
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
...
>  static int cpu_rt_runtime_write(struct cgroup_subsys_state *css,
>                                 struct cftype *cft, s64 val)
> {
> +	pr_warn_once("RT_GROUP throttling is deprecated, use global sched_rt_runtime_us and deadline tasks.\n");

I just noticed that this patch isn't picked up together with the rest of
the series in tip/sched/core.
Did it slip through the cracks (being the last one) or is that intentional
for some reason?

Thanks,
Michal



* Re: [PATCH v2 10/10] sched: Add deprecation warning for users of RT_GROUP_SCHED
  2025-04-17 12:13   ` Michal Koutný
@ 2025-05-02 14:48     ` Michal Koutný
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Koutný @ 2025-05-02 14:48 UTC (permalink / raw)
  To: Peter Zijlstra, cgroups, linux-kernel
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	Frederic Weisbecker


Hello Peter.

On Thu, Apr 17, 2025 at 02:13:25PM +0200, Michal Koutný <mkoutny@suse.com> wrote:
> On Mon, Mar 10, 2025 at 06:04:42PM +0100, Michal Koutný <mkoutny@suse.com> wrote:
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> ...
> >  static int cpu_rt_runtime_write(struct cgroup_subsys_state *css,
> >                                 struct cftype *cft, s64 val)
> > {
> > +	pr_warn_once("RT_GROUP throttling is deprecated, use global sched_rt_runtime_us and deadline tasks.\n");
> 
> I just noticed that this patch isn't picked together with the rest of
> the series in tip/sched/core.
> Did it slip through the cracks (as the last one) or is that intentional
> for some reason?

I'm still wondering about this, so that users get the right (or no)
message.



Thread overview: 19+ messages
2025-03-10 17:04 [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
2025-03-10 17:04 ` [PATCH v2 01/10] sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions Michal Koutný
2025-03-10 17:04 ` [PATCH v2 02/10] sched: Remove unneeded macro wrap Michal Koutný
2025-03-10 17:04 ` [PATCH v2 03/10] sched: Always initialize rt_rq's task_group Michal Koutný
2025-03-10 17:04 ` [PATCH v2 04/10] sched: Add cmdline option for RT_GROUP_SCHED toggling Michal Koutný
2025-03-10 17:04 ` [PATCH v2 05/10] sched: Skip non-root task_groups with disabled RT_GROUP_SCHED Michal Koutný
2025-03-10 17:04 ` [PATCH v2 06/10] sched: Bypass bandwidth checks with runtime " Michal Koutný
2025-04-02 12:02   ` Peter Zijlstra
2025-04-03 12:20     ` Michal Koutný
2025-03-10 17:04 ` [PATCH v2 07/10] sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled Michal Koutný
2025-03-10 17:04 ` [PATCH v2 08/10] sched: Add RT_GROUP WARN checks for non-root task_groups Michal Koutný
2025-03-10 17:04 ` [PATCH v2 09/10] sched: Add annotations to RT_GROUP_SCHED fields Michal Koutný
2025-03-10 17:04 ` [PATCH v2 10/10] sched: Add deprecation warning for users of RT_GROUP_SCHED Michal Koutný
2025-04-17 12:13   ` Michal Koutný
2025-05-02 14:48     ` Michal Koutný
2025-03-24 18:10 ` [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched Michal Koutný
2025-04-01 11:05 ` Peter Zijlstra
2025-04-03 12:17   ` Michal Koutný
2025-04-07  5:57     ` Juri Lelli
