From: Andrea Righi <arighi@nvidia.com>
To: zhidao su <soolaugust@gmail.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
tj@kernel.org, void@manifault.com, changwoo@igalia.com,
peterz@infradead.org, mingo@redhat.com,
zhidao su <suzhidao@xiaomi.com>
Subject: Re: [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters
Date: Thu, 19 Mar 2026 07:45:29 +0100
Message-ID: <abubiaubvpc2GqNH@gpd4>
In-Reply-To: <20260319053026.447892-1-suzhidao@xiaomi.com>
Hi,
On Thu, Mar 19, 2026 at 01:30:25PM +0800, zhidao su wrote:
> Two categories of sched_ext diagnostics are currently undocumented:
>
> 1. Per-scheduler events sysfs file
> Each active BPF scheduler exposes a set of diagnostic counters at
> /sys/kernel/sched_ext/<name>/events. These counters are defined
> (with detailed comments) in kernel/sched/ext_internal.h but have
> no corresponding documentation in sched-ext.rst. BPF scheduler
> developers must read kernel source to understand what each counter
> means.
>
> Add a description of the events file, an example of its output, and
> a brief explanation of every counter.
>
> 2. Module parameters
> kernel/sched/ext.c registers two parameters under the sched_ext.
> prefix (slice_bypass_us, bypass_lb_intv_us) via module_param_cb()
> with MODULE_PARM_DESC() strings, but sched-ext.rst makes no mention
> of them. Users who need to tune bypass-mode behavior have no
> in-tree documentation to consult.
>
> Add a "Module Parameters" section documenting both knobs: their
> default values, valid ranges (taken from the set_*() validators in
> ext.c), and the note from the source that they are primarily for
> debugging.
>
> No functional changes.
>
> Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Thanks for documenting this. Comments below.
> ---
> Documentation/scheduler/sched-ext.rst | 68 +++++++++++++++++++++++++++
> 1 file changed, 68 insertions(+)
>
> diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
> index f4f7d8f4f9e4..6fc7e720a956 100644
> --- a/Documentation/scheduler/sched-ext.rst
> +++ b/Documentation/scheduler/sched-ext.rst
> @@ -93,6 +93,55 @@ scheduler has been loaded):
> # cat /sys/kernel/sched_ext/enable_seq
> 1
>
> +Each running scheduler also exposes a per-scheduler ``events`` file under
> +``/sys/kernel/sched_ext/<scheduler-name>/events`` that tracks diagnostic
The right path is /sys/kernel/sched_ext/root/events. Now that we have
sub-schedulers, it's probably also worth mentioning that a sub-scheduler's
events file is located at /sys/kernel/sched_ext/root/sub/sub-<cgroup_id>/events.
> +counters. Each counter occupies one ``name value`` line:
> +
> +.. code-block:: none
> +
> + # cat /sys/kernel/sched_ext/simple/events
> + SCX_EV_SELECT_CPU_FALLBACK 0
> + SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE 0
> + SCX_EV_DISPATCH_KEEP_LAST 123
> + SCX_EV_ENQ_SKIP_EXITING 0
> + SCX_EV_ENQ_SKIP_MIGRATION_DISABLED 0
> + SCX_EV_REENQ_IMMED 0
> + SCX_EV_REENQ_LOCAL_REPEAT 0
> + SCX_EV_REFILL_SLICE_DFL 456789
> + SCX_EV_BYPASS_DURATION 0
> + SCX_EV_BYPASS_DISPATCH 0
> + SCX_EV_BYPASS_ACTIVATE 0
> + SCX_EV_INSERT_NOT_OWNED 0
> + SCX_EV_SUB_BYPASS_DISPATCH 0
> +
> +The counters are described in ``kernel/sched/ext_internal.h``; briefly:
> +
> +* ``SCX_EV_SELECT_CPU_FALLBACK``: ops.select_cpu() returned a CPU unusable by
> + the task and the core scheduler silently picked a fallback CPU.
> +* ``SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE``: a local-DSQ dispatch was redirected
> + to the global DSQ because the target CPU went offline.
> +* ``SCX_EV_DISPATCH_KEEP_LAST``: a task continued running because no other
> + task was available (only when ``SCX_OPS_ENQ_LAST`` is not set).
> +* ``SCX_EV_ENQ_SKIP_EXITING``: an exiting task was dispatched to the local DSQ
> + directly, bypassing ops.enqueue() (only when ``SCX_OPS_ENQ_EXITING`` is not set).
> +* ``SCX_EV_ENQ_SKIP_MIGRATION_DISABLED``: a migration-disabled task was
> + dispatched to its local DSQ directly (only when
> + ``SCX_OPS_ENQ_MIGRATION_DISABLED`` is not set).
> +* ``SCX_EV_REENQ_IMMED``: a task dispatched with ``SCX_ENQ_IMMED`` was
> + re-enqueued because the target CPU was not available for immediate execution.
> +* ``SCX_EV_REENQ_LOCAL_REPEAT``: a reenqueue of the local DSQ triggered
> + another reenqueue; recurring counts indicate incorrect ``SCX_ENQ_REENQ``
> + handling in the BPF scheduler.
> +* ``SCX_EV_REFILL_SLICE_DFL``: a task's time slice was refilled with the
> + default value (``SCX_SLICE_DFL``).
> +* ``SCX_EV_BYPASS_DURATION``: total nanoseconds spent in bypass mode.
> +* ``SCX_EV_BYPASS_DISPATCH``: number of tasks dispatched while in bypass mode.
> +* ``SCX_EV_BYPASS_ACTIVATE``: number of times bypass mode was activated.
> +* ``SCX_EV_INSERT_NOT_OWNED``: attempted to insert a task into a DSQ not owned
> + by this scheduler; such attempts are silently ignored.
> +* ``SCX_EV_SUB_BYPASS_DISPATCH``: tasks dispatched from sub-scheduler bypass
> + DSQs (only relevant with ``CONFIG_EXT_SUB_SCHED``).
> +
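For anyone who wants to consume this file from a script, the ``name value``
format described above is easy to parse. What follows is a minimal Python
sketch, not part of the patch. The helper names (parse_events, nonzero,
sub_events_path) are made up for illustration, and the paths are the ones
given in the review comment above, so they only exist on a kernel that
provides them and while a scheduler is loaded.

```python
import os

# Path from the review comment above (assumption: a scheduler is loaded).
ROOT_EVENTS = "/sys/kernel/sched_ext/root/events"


def sub_events_path(cgroup_id):
    """Events file of a sub-scheduler, following the layout quoted above."""
    return f"/sys/kernel/sched_ext/root/sub/sub-{cgroup_id}/events"


def parse_events(text):
    """Parse ``name value`` lines into a dict, skipping blank lines."""
    counters = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split()
        counters[name] = int(value)
    return counters


def nonzero(counters):
    """Keep only counters that have fired at least once."""
    return {name: value for name, value in counters.items() if value}


if __name__ == "__main__":
    # Only touches sysfs if the file is actually there.
    if os.path.exists(ROOT_EVENTS):
        with open(ROOT_EVENTS) as f:
            for name, value in nonzero(parse_events(f.read())).items():
                print(name, value)
```

Filtering out zero counters is just a convenience here: most of them
stay at zero with a well-behaved scheduler, so the non-zero ones are
usually the interesting part.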
> ``tools/sched_ext/scx_show_state.py`` is a drgn script which shows more
> detailed information:
>
> @@ -441,6 +490,25 @@ Where to Look
> scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order;
> all others are scheduled in user space by a simple vruntime scheduler.
>
> +Module Parameters
> +=================
> +
> +sched_ext exposes two module parameters under the ``sched_ext.`` prefix that
Maybe say "``sched_ext.`` namespace" instead of "prefix"?
> +control bypass-mode behaviour. These knobs are primarily for debugging; there
> +is usually no reason to change them during normal operation. They can be read
> +and written at runtime (mode 0600) via
> +``/sys/module/sched_ext/parameters/``.
> +
> +``sched_ext.slice_bypass_us`` (default: 5000 µs)
> + The time slice assigned to all tasks when the scheduler is in bypass mode,
> + i.e. during BPF scheduler load, unload, and error recovery. Valid range is
> + 100 µs to 100 ms.
> +
> +``sched_ext.bypass_lb_intv_us`` (default: 500000 µs)
> + The interval at which the bypass-mode load balancer redistributes tasks
> + across CPUs. Set to 0 to disable load balancing during bypass mode. Valid
> + range is 0 to 10 s.
> +
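To make the documented ranges concrete, here is a small Python sketch
that checks a value against them before writing it. It is not part of the
patch. The bounds and defaults are copied from the text quoted above and
have not been re-checked against the set_*() validators in ext.c, and
the helper names (check_param, write_param) are hypothetical. Writing
needs root, since the files are mode 0600.

```python
PARAM_DIR = "/sys/module/sched_ext/parameters"

# (min, max, default) in microseconds, as stated in the quoted patch text.
# Assumption: these match the kernel's own validators.
PARAMS = {
    "slice_bypass_us": (100, 100_000, 5_000),          # 100 us .. 100 ms
    "bypass_lb_intv_us": (0, 10_000_000, 500_000),     # 0 (off) .. 10 s
}


def check_param(name, value_us):
    """Return the sysfs path for ``name`` if ``value_us`` is in range."""
    low, high, _default = PARAMS[name]
    if not low <= value_us <= high:
        raise ValueError(f"{name}={value_us} outside [{low}, {high}] us")
    return f"{PARAM_DIR}/{name}"


def write_param(name, value_us):
    """Validate locally, then write the value (requires root)."""
    path = check_param(name, value_us)
    with open(path, "w") as f:
        f.write(str(value_us))
```

For example, check_param("bypass_lb_intv_us", 0) is accepted (load
balancing disabled), while check_param("slice_bypass_us", 50) raises
ValueError because it is below the documented 100 us minimum.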
> ABI Instability
> ===============
>
> --
> 2.43.0
>
Thanks,
-Andrea