* [PATCH v3 0/3] cgroup, docs: cpu controller interaction with various scheduling policies
@ 2025-05-22 2:08 Shashank Balaji
2025-05-22 2:08 ` [PATCH v3 1/3] cgroup, docs: convert space indentation to tab indentation Shashank Balaji
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Shashank Balaji @ 2025-05-22 2:08 UTC
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet
Cc: cgroups, linux-doc, linux-kernel, Shinya Takumi, Shashank Balaji
The cgroup v2 cpu controller interface files interact with processes
differently based on their scheduling policy and the underlying
scheduler used (fair-class vs. BPF scheduler). This patchset
documents these differences.
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
Changes in v3:
- Refer to sched-ext.rst for fair-class vs. BPF scheduler instead of repeating
the details in cgroup-v2.rst
- Link to v2: https://lore.kernel.org/r/20250520-rt-and-cpu-controller-doc-v2-0-70a2b6a1b703@sony.com
Changes in v2:
- Expanded scope from only RT processes to all scheduling policies
- Link to v1: https://lore.kernel.org/all/20250305-rt-and-cpu-controller-doc-v1-0-7b6a6f5ff43d@sony.com/
---
Shashank Balaji (3):
cgroup, docs: convert space indentation to tab indentation
sched_ext, docs: convert mentions of "CFS" to "fair-class scheduler"
cgroup, docs: cpu controller's interaction with various scheduling policies
Documentation/admin-guide/cgroup-v2.rst | 77 ++++++++++++++++++++++++---------
Documentation/scheduler/sched-ext.rst | 8 ++--
2 files changed, 60 insertions(+), 25 deletions(-)
---
base-commit: 036ee8a17bd046d7a350de0aae152307a061cc46
change-id: 20250226-rt-and-cpu-controller-doc-8a8aac572f3e
Best regards,
--
Shashank Balaji <shashank.mahadasyam@sony.com>
* [PATCH v3 1/3] cgroup, docs: convert space indentation to tab indentation
From: Shashank Balaji @ 2025-05-22 2:08 UTC
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet
Cc: cgroups, linux-doc, linux-kernel, Shinya Takumi, Shashank Balaji
The paragraphs on cpu.uclamp.{min,max} are space indented. Convert them to
tab indentation to make them uniform with the other paragraphs.
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
Documentation/admin-guide/cgroup-v2.rst | 36 +++++++++++++++++----------------
1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1a16ce68a4d7f6f8c9070be89c4975dbfa79077e..226fc7f9212eafcbf83c81f5b08391f215c1d894 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1162,30 +1162,32 @@ All time durations are in microseconds.
:ref:`Documentation/accounting/psi.rst <psi>` for details.
cpu.uclamp.min
- A read-write single value file which exists on non-root cgroups.
- The default is "0", i.e. no utilization boosting.
+ A read-write single value file which exists on non-root cgroups.
+ The default is "0", i.e. no utilization boosting.
- The requested minimum utilization (protection) as a percentage
- rational number, e.g. 12.34 for 12.34%.
+ The requested minimum utilization (protection) as a percentage
+ rational number, e.g. 12.34 for 12.34%.
- This interface allows reading and setting minimum utilization clamp
- values similar to the sched_setattr(2). This minimum utilization
- value is used to clamp the task specific minimum utilization clamp.
+ This interface allows reading and setting minimum utilization clamp
+ values similar to the sched_setattr(2). This minimum utilization
+ value is used to clamp the task specific minimum utilization clamp,
+ including those of realtime processes.
- The requested minimum utilization (protection) is always capped by
- the current value for the maximum utilization (limit), i.e.
- `cpu.uclamp.max`.
+ The requested minimum utilization (protection) is always capped by
+ the current value for the maximum utilization (limit), i.e.
+ `cpu.uclamp.max`.
cpu.uclamp.max
- A read-write single value file which exists on non-root cgroups.
- The default is "max". i.e. no utilization capping
+ A read-write single value file which exists on non-root cgroups.
+ The default is "max". i.e. no utilization capping
- The requested maximum utilization (limit) as a percentage rational
- number, e.g. 98.76 for 98.76%.
+ The requested maximum utilization (limit) as a percentage rational
+ number, e.g. 98.76 for 98.76%.
- This interface allows reading and setting maximum utilization clamp
- values similar to the sched_setattr(2). This maximum utilization
- value is used to clamp the task specific maximum utilization clamp.
+ This interface allows reading and setting maximum utilization clamp
+ values similar to the sched_setattr(2). This maximum utilization
+ value is used to clamp the task specific maximum utilization clamp,
+ including those of realtime processes.
cpu.idle
A read-write single value file which exists on non-root cgroups.
--
2.43.0
* [PATCH v3 2/3] sched_ext, docs: convert mentions of "CFS" to "fair-class scheduler"
From: Shashank Balaji @ 2025-05-22 2:08 UTC
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet
Cc: cgroups, linux-doc, linux-kernel, Shinya Takumi, Shashank Balaji
Mentions of CFS are stale since the fair-class scheduler is implemented using
EEVDF. So, convert such mentions to "fair-class scheduler" to stay
algorithm-name agnostic.
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
Documentation/scheduler/sched-ext.rst | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 0b2654e2164b8e6139db19fc8b68e6c5c289503d..ceca6f8966eeeb5f029a9ae41c039d67c1db7be8 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -47,8 +47,8 @@ options should be enabled to use sched_ext:
sched_ext is used only when the BPF scheduler is loaded and running.
If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
-treated as ``SCHED_NORMAL`` and scheduled by CFS until the BPF scheduler is
-loaded.
+treated as ``SCHED_NORMAL`` and scheduled by the fair-class scheduler until the
+BPF scheduler is loaded.
When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set
in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
@@ -57,11 +57,11 @@ in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
-``SCHED_IDLE`` policies are scheduled by CFS.
+``SCHED_IDLE`` policies are scheduled by the fair-class scheduler.
Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
-BPF scheduler and reverts all tasks back to CFS.
+BPF scheduler and reverts all tasks back to the fair-class scheduler.
.. code-block:: none
--
2.43.0
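The rules in the sched-ext.rst text touched by this patch, for deciding which scheduler picks up a task, can be sketched in Python. This is only an illustration under stated assumptions: `Policy`, `scheduler_for`, and the returned labels are hypothetical names, not kernel interfaces, and `SCHED_FIFO`, `SCHED_RR`, and `SCHED_DEADLINE` are simply labelled "other" because neither the fair-class scheduler nor a BPF scheduler handles them.

```python
from enum import Enum


class Policy(Enum):
    """Scheduling policies named in the thread (illustrative enum, not a kernel API)."""
    NORMAL = "SCHED_NORMAL"
    BATCH = "SCHED_BATCH"
    IDLE = "SCHED_IDLE"
    EXT = "SCHED_EXT"
    FIFO = "SCHED_FIFO"
    RR = "SCHED_RR"
    DEADLINE = "SCHED_DEADLINE"


def scheduler_for(policy: Policy, bpf_loaded: bool, switch_partial: bool = False) -> str:
    """Return "fair", "sched_ext", or "other" for a task with the given policy.

    bpf_loaded:     whether a BPF scheduler is currently loaded and running.
    switch_partial: whether SCX_OPS_SWITCH_PARTIAL is set in ops->flags.
    """
    # Realtime and deadline policies are outside both schedulers discussed here.
    if policy in (Policy.FIFO, Policy.RR, Policy.DEADLINE):
        return "other"
    # Without a loaded BPF scheduler, SCHED_EXT is treated as SCHED_NORMAL.
    if not bpf_loaded:
        return "fair"
    # With a BPF scheduler loaded, SCHED_EXT tasks always go to sched_ext.
    if policy is Policy.EXT:
        return "sched_ext"
    # SCHED_NORMAL/BATCH/IDLE stay on the fair class only in partial mode.
    return "fair" if switch_partial else "sched_ext"
```

For example, `scheduler_for(Policy.BATCH, bpf_loaded=True, switch_partial=True)` yields `"fair"`, while the same call without `switch_partial` yields `"sched_ext"`.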
* [PATCH v3 3/3] cgroup, docs: cpu controller's interaction with various scheduling policies
From: Shashank Balaji @ 2025-05-22 2:08 UTC
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet
Cc: cgroups, linux-doc, linux-kernel, Shinya Takumi, Shashank Balaji
The cpu controller interface files account for or affect processes
differently based on their scheduling policy, and the underlying
scheduler used (fair-class vs. BPF scheduler). Document these
differences.
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
Documentation/admin-guide/cgroup-v2.rst | 41 +++++++++++++++++++++++++++++----
1 file changed, 37 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 226fc7f9212eafcbf83c81f5b08391f215c1d894..f6dc95608d239d586b482154c4367baaf5614fb6 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1095,19 +1095,34 @@ realtime processes irrespective of CONFIG_RT_GROUP_SCHED.
CPU Interface Files
~~~~~~~~~~~~~~~~~~~
-All time durations are in microseconds.
+The interaction of a process with the cpu controller depends on its scheduling
+policy and the underlying scheduler. From the point of view of the cpu controller,
+processes can be categorized as follows:
+
+* Processes under the fair-class scheduler
+* Processes under a BPF scheduler with the ``cgroup_set_weight`` callback
+* Everything else: ``SCHED_{FIFO,RR,DEADLINE}`` and processes under a BPF scheduler
+ without the ``cgroup_set_weight`` callback
+
+For details on when a process is under the fair-class scheduler or a BPF scheduler,
+check out :ref:`Documentation/scheduler/sched-ext.rst <sched-ext>`.
+
+For each of the following interface files, the above categories
+will be referred to. All time durations are in microseconds.
cpu.stat
A read-only flat-keyed file.
This file exists whether the controller is enabled or not.
- It always reports the following three stats:
+ It always reports the following three stats, which account for all the
+ processes in the cgroup:
- usage_usec
- user_usec
- system_usec
- and the following five when the controller is enabled:
+ and the following five when the controller is enabled, which account for
+ only the processes under the fair-class scheduler:
- nr_periods
- nr_throttled
@@ -1125,6 +1140,10 @@ All time durations are in microseconds.
If the cgroup has been configured to be SCHED_IDLE (cpu.idle = 1),
then the weight will show as a 0.
+ This file affects only processes under the fair-class scheduler and a BPF
+ scheduler with the ``cgroup_set_weight`` callback depending on what the
+ callback actually does.
+
cpu.weight.nice
A read-write single value file which exists on non-root
cgroups. The default is "0".
@@ -1137,6 +1156,10 @@ All time durations are in microseconds.
granularity is coarser for the nice values, the read value is
the closest approximation of the current weight.
+ This file affects only processes under the fair-class scheduler and a BPF
+ scheduler with the ``cgroup_set_weight`` callback depending on what the
+ callback actually does.
+
cpu.max
A read-write two value file which exists on non-root cgroups.
The default is "max 100000".
@@ -1149,18 +1172,24 @@ All time durations are in microseconds.
$PERIOD duration. "max" for $MAX indicates no limit. If only
one number is written, $MAX is updated.
+ This file affects only processes under the fair-class scheduler.
+
cpu.max.burst
A read-write single value file which exists on non-root
cgroups. The default is "0".
The burst in the range [0, $MAX].
+ This file affects only processes under the fair-class scheduler.
+
cpu.pressure
A read-write nested-keyed file.
Shows pressure stall information for CPU. See
:ref:`Documentation/accounting/psi.rst <psi>` for details.
+ This file accounts for all the processes in the cgroup.
+
cpu.uclamp.min
A read-write single value file which exists on non-root cgroups.
The default is "0", i.e. no utilization boosting.
@@ -1177,6 +1206,8 @@ All time durations are in microseconds.
the current value for the maximum utilization (limit), i.e.
`cpu.uclamp.max`.
+ This file affects all the processes in the cgroup.
+
cpu.uclamp.max
A read-write single value file which exists on non-root cgroups.
The default is "max". i.e. no utilization capping
@@ -1189,6 +1220,8 @@ All time durations are in microseconds.
value is used to clamp the task specific maximum utilization clamp,
including those of realtime processes.
+ This file affects all the processes in the cgroup.
+
cpu.idle
A read-write single value file which exists on non-root cgroups.
The default is 0.
@@ -1199,7 +1232,7 @@ All time durations are in microseconds.
own relative priorities, but the cgroup itself will be treated as
very low priority relative to its peers.
-
+ This file affects only processes under the fair-class scheduler.
Memory
------
--
2.43.0
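The per-file scoping that this patch documents can be summarised as a small lookup table. The Python sketch below is a reading aid only: the category constants, the `cpu.stat:base` and `cpu.stat:bandwidth` keys (standing for the three always-present stats and the five controller-enabled stats), and the `categorize` and `affects` helpers are all hypothetical names introduced for illustration. For `cpu.weight` and `cpu.weight.nice`, the actual effect under a BPF scheduler still depends on what its `cgroup_set_weight` callback does.

```python
# The three process categories from the patch (labels are illustrative).
FAIR = "fair-class"
BPF_WEIGHT = "bpf-with-cgroup_set_weight"
OTHER = "everything-else"  # SCHED_{FIFO,RR,DEADLINE}, BPF without the callback

ALL = frozenset({FAIR, BPF_WEIGHT, OTHER})
FAIR_ONLY = frozenset({FAIR})

# Which categories each interface file accounts for or affects.
SCOPE = {
    "cpu.stat:base": ALL,             # usage_usec, user_usec, system_usec
    "cpu.stat:bandwidth": FAIR_ONLY,  # nr_periods, nr_throttled, ...
    "cpu.weight": frozenset({FAIR, BPF_WEIGHT}),       # BPF: callback-dependent
    "cpu.weight.nice": frozenset({FAIR, BPF_WEIGHT}),  # BPF: callback-dependent
    "cpu.max": FAIR_ONLY,
    "cpu.max.burst": FAIR_ONLY,
    "cpu.pressure": ALL,
    "cpu.uclamp.min": ALL,
    "cpu.uclamp.max": ALL,
    "cpu.idle": FAIR_ONLY,
}


def categorize(scheduler: str, has_cgroup_set_weight: bool = False) -> str:
    """Map a scheduler label ("fair", "sched_ext", or anything else) to a category."""
    if scheduler == "fair":
        return FAIR
    if scheduler == "sched_ext" and has_cgroup_set_weight:
        return BPF_WEIGHT
    return OTHER


def affects(interface_file: str, category: str) -> bool:
    """Return True if the file accounts for or affects processes in the category."""
    return category in SCOPE[interface_file]
```

With this table, `affects("cpu.max", categorize("sched_ext", True))` is `False`, which matches the patch's statement that `cpu.max` affects only processes under the fair-class scheduler.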
* Re: [PATCH v3 3/3] cgroup, docs: cpu controller's interaction with various scheduling policies
From: Shashank Balaji @ 2025-05-22 2:16 UTC
To: Tejun Heo, Johannes Weiner, Michal Koutný, Jonathan Corbet
Cc: cgroups, linux-doc, linux-kernel, Shinya Takumi
Hi,
On Thu, May 22, 2025 at 11:08:14AM +0900, Shashank Balaji wrote:
> +* Processes under the fair-class scheduler
> +* Processes under a BPF scheduler with the ``cgroup_set_weight`` callback
> +* Everything else: ``SCHED_{FIFO,RR,DEADLINE}`` and processes under a BPF scheduler
> + without the ``cgroup_set_weight`` callback
Though `cgroup_set_weight` is referred to here, CONFIG_EXT_GROUP_SCHED
is not yet documented in sched-ext.rst. But I don't understand it well
enough to add that documentation myself.
Thanks
Regards,
Shashank
* Re: [PATCH v3 1/3] cgroup, docs: convert space indentation to tab indentation
From: Tejun Heo @ 2025-05-22 19:07 UTC
To: Shashank Balaji
Cc: Johannes Weiner, Michal Koutný, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Shinya Takumi
On Thu, May 22, 2025 at 11:08:12AM +0900, Shashank Balaji wrote:
> The paragraphs on cpu.uclamp.{min,max} are space indented. Convert them to
> tab indentation to make them uniform with the other paragraphs.
>
> Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Applied to cgroup/for-6.16.
Thanks.
--
tejun
* Re: [PATCH v3 2/3] sched_ext, docs: convert mentions of "CFS" to "fair-class scheduler"
From: Tejun Heo @ 2025-05-22 19:09 UTC
To: Shashank Balaji
Cc: Johannes Weiner, Michal Koutný, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Shinya Takumi
On Thu, May 22, 2025 at 11:08:13AM +0900, Shashank Balaji wrote:
> Mentions of CFS are stale since the fair-class scheduler is implemented using
> EEVDF. So, convert such mentions to "fair-class scheduler" to stay
> algorithm-name agnostic.
>
> Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Applied to sched_ext/for-6.16.
Thanks.
--
tejun
* Re: [PATCH v3 3/3] cgroup, docs: cpu controller's interaction with various scheduling policies
From: Tejun Heo @ 2025-05-22 19:11 UTC
To: Shashank Balaji
Cc: Johannes Weiner, Michal Koutný, Jonathan Corbet, cgroups,
linux-doc, linux-kernel, Shinya Takumi
On Thu, May 22, 2025 at 11:08:14AM +0900, Shashank Balaji wrote:
> The cpu controller interface files account for or affect processes
> differently based on their scheduling policy, and the underlying
> scheduler used (fair-class vs. BPF scheduler). Document these
> differences
>
> Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Applied to cgroup/for-6.16.
Thanks.
--
tejun