public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters
@ 2026-03-19  5:30 zhidao su
  2026-03-19  5:30 ` [PATCH 2/2] selftests/sched_ext: Return non-zero exit code on test failure zhidao su
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: zhidao su @ 2026-03-19  5:30 UTC
  To: sched-ext
  Cc: linux-kernel, tj, void, arighi, changwoo, peterz, mingo,
	zhidao su

Two categories of sched_ext diagnostics are currently undocumented:

1. Per-scheduler events sysfs file
   Each active BPF scheduler exposes a set of diagnostic counters at
   /sys/kernel/sched_ext/<name>/events.  These counters are defined
   (with detailed comments) in kernel/sched/ext_internal.h but have
   no corresponding documentation in sched-ext.rst.  BPF scheduler
   developers must read kernel source to understand what each counter
   means.

   Add a description of the events file, an example of its output, and
   a brief explanation of every counter.

2. Module parameters
   kernel/sched/ext.c registers two parameters under the sched_ext.
   prefix (slice_bypass_us, bypass_lb_intv_us) via module_param_cb()
   with MODULE_PARM_DESC() strings, but sched-ext.rst makes no mention
   of them.  Users who need to tune bypass-mode behavior have no
   in-tree documentation to consult.

   Add a "Module Parameters" section documenting both knobs: their
   default values, valid ranges (taken from the set_*() validators in
   ext.c), and the note from the source that they are primarily for
   debugging.

No functional changes.

Signed-off-by: zhidao su <suzhidao@xiaomi.com>
---
 Documentation/scheduler/sched-ext.rst | 68 +++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index f4f7d8f4f9e4..6fc7e720a956 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -93,6 +93,55 @@ scheduler has been loaded):
     # cat /sys/kernel/sched_ext/enable_seq
     1
 
+Each running scheduler also exposes a per-scheduler ``events`` file under
+``/sys/kernel/sched_ext/<scheduler-name>/events`` that tracks diagnostic
+counters. Each counter occupies one ``name value`` line:
+
+.. code-block:: none
+
+    # cat /sys/kernel/sched_ext/simple/events
+    SCX_EV_SELECT_CPU_FALLBACK 0
+    SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE 0
+    SCX_EV_DISPATCH_KEEP_LAST 123
+    SCX_EV_ENQ_SKIP_EXITING 0
+    SCX_EV_ENQ_SKIP_MIGRATION_DISABLED 0
+    SCX_EV_REENQ_IMMED 0
+    SCX_EV_REENQ_LOCAL_REPEAT 0
+    SCX_EV_REFILL_SLICE_DFL 456789
+    SCX_EV_BYPASS_DURATION 0
+    SCX_EV_BYPASS_DISPATCH 0
+    SCX_EV_BYPASS_ACTIVATE 0
+    SCX_EV_INSERT_NOT_OWNED 0
+    SCX_EV_SUB_BYPASS_DISPATCH 0
+
+The counters are described in ``kernel/sched/ext_internal.h``; briefly:
+
+* ``SCX_EV_SELECT_CPU_FALLBACK``: ops.select_cpu() returned a CPU unusable by
+  the task and the core scheduler silently picked a fallback CPU.
+* ``SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE``: a local-DSQ dispatch was redirected
+  to the global DSQ because the target CPU went offline.
+* ``SCX_EV_DISPATCH_KEEP_LAST``: a task continued running because no other
+  task was available (only when ``SCX_OPS_ENQ_LAST`` is not set).
+* ``SCX_EV_ENQ_SKIP_EXITING``: an exiting task was dispatched to the local DSQ
+  directly, bypassing ops.enqueue() (only when ``SCX_OPS_ENQ_EXITING`` is not set).
+* ``SCX_EV_ENQ_SKIP_MIGRATION_DISABLED``: a migration-disabled task was
+  dispatched to its local DSQ directly (only when
+  ``SCX_OPS_ENQ_MIGRATION_DISABLED`` is not set).
+* ``SCX_EV_REENQ_IMMED``: a task dispatched with ``SCX_ENQ_IMMED`` was
+  re-enqueued because the target CPU was not available for immediate execution.
+* ``SCX_EV_REENQ_LOCAL_REPEAT``: a reenqueue of the local DSQ triggered
+  another reenqueue; recurring counts indicate incorrect ``SCX_ENQ_REENQ``
+  handling in the BPF scheduler.
+* ``SCX_EV_REFILL_SLICE_DFL``: a task's time slice was refilled with the
+  default value (``SCX_SLICE_DFL``).
+* ``SCX_EV_BYPASS_DURATION``: total nanoseconds spent in bypass mode.
+* ``SCX_EV_BYPASS_DISPATCH``: number of tasks dispatched while in bypass mode.
+* ``SCX_EV_BYPASS_ACTIVATE``: number of times bypass mode was activated.
+* ``SCX_EV_INSERT_NOT_OWNED``: attempted to insert a task into a DSQ not owned
+  by this scheduler; such attempts are silently ignored.
+* ``SCX_EV_SUB_BYPASS_DISPATCH``: tasks dispatched from sub-scheduler bypass
+  DSQs (only relevant with ``CONFIG_EXT_SUB_SCHED``).
+
 ``tools/sched_ext/scx_show_state.py`` is a drgn script which shows more
 detailed information:
 
@@ -441,6 +490,25 @@ Where to Look
     scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order;
     all others are scheduled in user space by a simple vruntime scheduler.
 
+Module Parameters
+=================
+
+sched_ext exposes two module parameters under the ``sched_ext.`` prefix that
+control bypass-mode behaviour. These knobs are primarily for debugging; there
+is usually no reason to change them during normal operation. They can be read
+and written at runtime (mode 0600) via
+``/sys/module/sched_ext/parameters/``.
+
+``sched_ext.slice_bypass_us`` (default: 5000 µs)
+    The time slice assigned to all tasks when the scheduler is in bypass mode,
+    i.e. during BPF scheduler load, unload, and error recovery. Valid range is
+    100 µs to 100 ms.
+
+``sched_ext.bypass_lb_intv_us`` (default: 500000 µs)
+    The interval at which the bypass-mode load balancer redistributes tasks
+    across CPUs. Set to 0 to disable load balancing during bypass mode. Valid
+    range is 0 to 10 s.
+
 ABI Instability
 ===============
 
-- 
2.43.0
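
The ``name value`` layout described in the patch above is simple enough to
parse from a script. The following is a minimal sketch, not part of the
patch: the helper names (parse_events, read_events, nonzero) are made up for
illustration, and the sysfs path is left to the caller because the exact
location depends on the kernel in use.

```python
from pathlib import Path
from typing import Dict


def parse_events(text: str) -> Dict[str, int]:
    """Parse ``name value`` lines into a {name: value} dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, value = parts
        counters[name] = int(value)
    return counters


def read_events(path: str) -> Dict[str, int]:
    """Read an events file; the path is supplied by the caller (assumption)."""
    return parse_events(Path(path).read_text())


def nonzero(counters: Dict[str, int]) -> Dict[str, int]:
    """Keep only the counters that have fired at least once."""
    return {name: value for name, value in counters.items() if value}
```

A typical use would be calling read_events() twice and comparing the two
dicts to see which counters moved in between.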



* [PATCH 2/2] selftests/sched_ext: Return non-zero exit code on test failure
  2026-03-19  5:30 [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters zhidao su
@ 2026-03-19  5:30 ` zhidao su
  2026-03-19  6:49   ` Andrea Righi
  2026-03-19  6:45 ` [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters Andrea Righi
  2026-03-21 18:42 ` Tejun Heo
  2 siblings, 1 reply; 5+ messages in thread
From: zhidao su @ 2026-03-19  5:30 UTC
  To: sched-ext
  Cc: linux-kernel, tj, void, arighi, changwoo, peterz, mingo,
	zhidao su

runner.c always returned 0 regardless of test results.  The kselftest
framework (tools/testing/selftests/kselftest/runner.sh) invokes the runner
binary and treats a non-zero exit code as a test failure; with the old
code, failed sched_ext tests were silently hidden from the parent harness
even though individual "not ok" TAP lines were emitted.

Return 1 when at least one test failed, 0 when all tests passed or were
skipped.

Signed-off-by: zhidao su <suzhidao@xiaomi.com>
---
 tools/testing/selftests/sched_ext/runner.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/sched_ext/runner.c b/tools/testing/selftests/sched_ext/runner.c
index 37ad56c3eb29..4c68efa1512a 100644
--- a/tools/testing/selftests/sched_ext/runner.c
+++ b/tools/testing/selftests/sched_ext/runner.c
@@ -217,7 +217,7 @@ int main(int argc, char **argv)
 			printf("  - %s\n", failed_tests[i]);
 	}
 
-	return 0;
+	return failed > 0 ? 1 : 0;
 }
 
 void scx_test_register(struct scx_test *test)
-- 
2.43.0
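
The effect of the one-line change above can be illustrated outside of C. The
sketch below is an assumption-laden model, not the kselftest code: exit_code()
mirrors the ``failed > 0 ? 1 : 0`` expression, run_child() starts a child
process that exits with that status, and harness_verdict() is a simplified
stand-in for a parent harness that treats any non-zero status as a failure.

```python
import subprocess
import sys


def exit_code(failed: int) -> int:
    """Mirror of the patched return value: 1 if any test failed, else 0."""
    return 1 if failed > 0 else 0


def run_child(failed: int) -> int:
    """Run a child that exits like the patched runner; return its status."""
    code = exit_code(failed)
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({code})"])
    return child.returncode


def harness_verdict(returncode: int) -> str:
    """Simplified parent harness: non-zero exit status means failure."""
    return "ok" if returncode == 0 else "not ok"
```

With the old ``return 0``, the child status would be 0 for any number of
failures, so a harness of this shape would always report "ok".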



* Re: [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters
  2026-03-19  5:30 [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters zhidao su
  2026-03-19  5:30 ` [PATCH 2/2] selftests/sched_ext: Return non-zero exit code on test failure zhidao su
@ 2026-03-19  6:45 ` Andrea Righi
  2026-03-21 18:42 ` Tejun Heo
  2 siblings, 0 replies; 5+ messages in thread
From: Andrea Righi @ 2026-03-19  6:45 UTC
  To: zhidao su
  Cc: sched-ext, linux-kernel, tj, void, changwoo, peterz, mingo,
	zhidao su

Hi,

On Thu, Mar 19, 2026 at 01:30:25PM +0800, zhidao su wrote:
> Two categories of sched_ext diagnostics are currently undocumented:
> 
> 1. Per-scheduler events sysfs file
>    Each active BPF scheduler exposes a set of diagnostic counters at
>    /sys/kernel/sched_ext/<name>/events.  These counters are defined
>    (with detailed comments) in kernel/sched/ext_internal.h but have
>    no corresponding documentation in sched-ext.rst.  BPF scheduler
>    developers must read kernel source to understand what each counter
>    means.
> 
>    Add a description of the events file, an example of its output, and
>    a brief explanation of every counter.
> 
> 2. Module parameters
>    kernel/sched/ext.c registers two parameters under the sched_ext.
>    prefix (slice_bypass_us, bypass_lb_intv_us) via module_param_cb()
>    with MODULE_PARM_DESC() strings, but sched-ext.rst makes no mention
>    of them.  Users who need to tune bypass-mode behavior have no
>    in-tree documentation to consult.
> 
>    Add a "Module Parameters" section documenting both knobs: their
>    default values, valid ranges (taken from the set_*() validators in
>    ext.c), and the note from the source that they are primarily for
>    debugging.
> 
> No functional changes.
> 
> Signed-off-by: zhidao su <suzhidao@xiaomi.com>

Thanks for documenting this. Comments below.

> ---
>  Documentation/scheduler/sched-ext.rst | 68 +++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
> index f4f7d8f4f9e4..6fc7e720a956 100644
> --- a/Documentation/scheduler/sched-ext.rst
> +++ b/Documentation/scheduler/sched-ext.rst
> @@ -93,6 +93,55 @@ scheduler has been loaded):
>      # cat /sys/kernel/sched_ext/enable_seq
>      1
>  
> +Each running scheduler also exposes a per-scheduler ``events`` file under
> +``/sys/kernel/sched_ext/<scheduler-name>/events`` that tracks diagnostic

The right path is /sys/kernel/sched_ext/root/events. And now that we have
sub-schedulers, it's probably worth mentioning that the sub-scheduler events
file is located at /sys/kernel/sched_ext/root/sub/sub-<cgroup_id>/events.

> +counters. Each counter occupies one ``name value`` line:
> +
> +.. code-block:: none
> +
> +    # cat /sys/kernel/sched_ext/simple/events
> +    SCX_EV_SELECT_CPU_FALLBACK 0
> +    SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE 0
> +    SCX_EV_DISPATCH_KEEP_LAST 123
> +    SCX_EV_ENQ_SKIP_EXITING 0
> +    SCX_EV_ENQ_SKIP_MIGRATION_DISABLED 0
> +    SCX_EV_REENQ_IMMED 0
> +    SCX_EV_REENQ_LOCAL_REPEAT 0
> +    SCX_EV_REFILL_SLICE_DFL 456789
> +    SCX_EV_BYPASS_DURATION 0
> +    SCX_EV_BYPASS_DISPATCH 0
> +    SCX_EV_BYPASS_ACTIVATE 0
> +    SCX_EV_INSERT_NOT_OWNED 0
> +    SCX_EV_SUB_BYPASS_DISPATCH 0
> +
> +The counters are described in ``kernel/sched/ext_internal.h``; briefly:
> +
> +* ``SCX_EV_SELECT_CPU_FALLBACK``: ops.select_cpu() returned a CPU unusable by
> +  the task and the core scheduler silently picked a fallback CPU.
> +* ``SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE``: a local-DSQ dispatch was redirected
> +  to the global DSQ because the target CPU went offline.
> +* ``SCX_EV_DISPATCH_KEEP_LAST``: a task continued running because no other
> +  task was available (only when ``SCX_OPS_ENQ_LAST`` is not set).
> +* ``SCX_EV_ENQ_SKIP_EXITING``: an exiting task was dispatched to the local DSQ
> +  directly, bypassing ops.enqueue() (only when ``SCX_OPS_ENQ_EXITING`` is not set).
> +* ``SCX_EV_ENQ_SKIP_MIGRATION_DISABLED``: a migration-disabled task was
> +  dispatched to its local DSQ directly (only when
> +  ``SCX_OPS_ENQ_MIGRATION_DISABLED`` is not set).
> +* ``SCX_EV_REENQ_IMMED``: a task dispatched with ``SCX_ENQ_IMMED`` was
> +  re-enqueued because the target CPU was not available for immediate execution.
> +* ``SCX_EV_REENQ_LOCAL_REPEAT``: a reenqueue of the local DSQ triggered
> +  another reenqueue; recurring counts indicate incorrect ``SCX_ENQ_REENQ``
> +  handling in the BPF scheduler.
> +* ``SCX_EV_REFILL_SLICE_DFL``: a task's time slice was refilled with the
> +  default value (``SCX_SLICE_DFL``).
> +* ``SCX_EV_BYPASS_DURATION``: total nanoseconds spent in bypass mode.
> +* ``SCX_EV_BYPASS_DISPATCH``: number of tasks dispatched while in bypass mode.
> +* ``SCX_EV_BYPASS_ACTIVATE``: number of times bypass mode was activated.
> +* ``SCX_EV_INSERT_NOT_OWNED``: attempted to insert a task into a DSQ not owned
> +  by this scheduler; such attempts are silently ignored.
> +* ``SCX_EV_SUB_BYPASS_DISPATCH``: tasks dispatched from sub-scheduler bypass
> +  DSQs (only relevant with ``CONFIG_EXT_SUB_SCHED``).
> +
>  ``tools/sched_ext/scx_show_state.py`` is a drgn script which shows more
>  detailed information:
>  
> @@ -441,6 +490,25 @@ Where to Look
>      scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order;
>      all others are scheduled in user space by a simple vruntime scheduler.
>  
> +Module Parameters
> +=================
> +
> +sched_ext exposes two module parameters under the ``sched_ext.`` prefix that

Maybe ``sched_ext.`` namespace?

> +control bypass-mode behaviour. These knobs are primarily for debugging; there
> +is usually no reason to change them during normal operation. They can be read
> +and written at runtime (mode 0600) via
> +``/sys/module/sched_ext/parameters/``.
> +
> +``sched_ext.slice_bypass_us`` (default: 5000 µs)
> +    The time slice assigned to all tasks when the scheduler is in bypass mode,
> +    i.e. during BPF scheduler load, unload, and error recovery. Valid range is
> +    100 µs to 100 ms.
> +
> +``sched_ext.bypass_lb_intv_us`` (default: 500000 µs)
> +    The interval at which the bypass-mode load balancer redistributes tasks
> +    across CPUs. Set to 0 to disable load balancing during bypass mode. Valid
> +    range is 0 to 10 s.
> +
>  ABI Instability
>  ===============
>  
> -- 
> 2.43.0
> 

Thanks,
-Andrea
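
The defaults and valid ranges quoted in the "Module Parameters" hunk above
can be written down as a small table and checked from a script. This is a
hedged sketch: the PARAMS table only restates the numbers from the patch text
(converted to microseconds), and check_param() and read_param() are
hypothetical helper names. Reading the files is expected to need root, since
the patch describes them as mode 0600.

```python
from pathlib import Path

# Values restated from the patch text, all in microseconds.
PARAMS = {
    "slice_bypass_us": {"default": 5000, "min": 100, "max": 100_000},
    "bypass_lb_intv_us": {"default": 500_000, "min": 0, "max": 10_000_000},
}


def check_param(name: str, value_us: int) -> bool:
    """Return True if value_us lies within the documented range for name."""
    spec = PARAMS[name]
    return spec["min"] <= value_us <= spec["max"]


def read_param(name: str, base: str = "/sys/module/sched_ext/parameters") -> int:
    """Read the current value of a parameter from sysfs (usually needs root)."""
    return int(Path(base, name).read_text().strip())
```

For example, check_param("bypass_lb_intv_us", 0) is True, matching the note
that 0 disables load balancing during bypass mode.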


* Re: [PATCH 2/2] selftests/sched_ext: Return non-zero exit code on test failure
  2026-03-19  5:30 ` [PATCH 2/2] selftests/sched_ext: Return non-zero exit code on test failure zhidao su
@ 2026-03-19  6:49   ` Andrea Righi
  0 siblings, 0 replies; 5+ messages in thread
From: Andrea Righi @ 2026-03-19  6:49 UTC
  To: zhidao su
  Cc: sched-ext, linux-kernel, tj, void, changwoo, peterz, mingo,
	zhidao su

On Thu, Mar 19, 2026 at 01:30:26PM +0800, zhidao su wrote:
> runner.c always returned 0 regardless of test results.  The kselftest
> framework (tools/testing/selftests/kselftest/runner.sh) invokes the runner
> binary and treats a non-zero exit code as a test failure; with the old
> code, failed sched_ext tests were silently hidden from the parent harness
> even though individual "not ok" TAP lines were emitted.
> 
> Return 1 when at least one test failed, 0 when all tests passed or were
> skipped.
> 
> Signed-off-by: zhidao su <suzhidao@xiaomi.com>

Makes sense.

Acked-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> ---
>  tools/testing/selftests/sched_ext/runner.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/sched_ext/runner.c b/tools/testing/selftests/sched_ext/runner.c
> index 37ad56c3eb29..4c68efa1512a 100644
> --- a/tools/testing/selftests/sched_ext/runner.c
> +++ b/tools/testing/selftests/sched_ext/runner.c
> @@ -217,7 +217,7 @@ int main(int argc, char **argv)
>  			printf("  - %s\n", failed_tests[i]);
>  	}
>  
> -	return 0;
> +	return failed > 0 ? 1 : 0;
>  }
>  
>  void scx_test_register(struct scx_test *test)
> -- 
> 2.43.0
> 


* Re: [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters
  2026-03-19  5:30 [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters zhidao su
  2026-03-19  5:30 ` [PATCH 2/2] selftests/sched_ext: Return non-zero exit code on test failure zhidao su
  2026-03-19  6:45 ` [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters Andrea Righi
@ 2026-03-21 18:42 ` Tejun Heo
  2 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2026-03-21 18:42 UTC
  To: zhidao su, sched-ext
  Cc: linux-kernel, David Vernet, Andrea Righi, Changwoo Min,
	Peter Zijlstra, Ingo Molnar, Emil Tsalapatis

Hello,

Applied 1-2 to sched_ext/for-7.1 with the following modification
to 1/2:

  Fixed SCX_EV_INSERT_NOT_OWNED description - the check is whether
  the task (not the DSQ) is owned by the scheduler.

Thanks.

--
tejun

