From: Andrea Righi <arighi@nvidia.com>
To: zhidao su <soolaugust@gmail.com>
Cc: sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
tj@kernel.org, void@manifault.com, changwoo@igalia.com,
peterz@infradead.org, mingo@redhat.com,
zhidao su <suzhidao@xiaomi.com>
Subject: Re: [PATCH 1/2] sched_ext: Documentation: Document events sysfs file and module parameters
Date: Thu, 19 Mar 2026 07:45:29 +0100
Message-ID: <abubiaubvpc2GqNH@gpd4>
In-Reply-To: <20260319053026.447892-1-suzhidao@xiaomi.com>
Hi,
On Thu, Mar 19, 2026 at 01:30:25PM +0800, zhidao su wrote:
> Two categories of sched_ext diagnostics are currently undocumented:
>
> 1. Per-scheduler events sysfs file
> Each active BPF scheduler exposes a set of diagnostic counters at
> /sys/kernel/sched_ext/<name>/events. These counters are defined
> (with detailed comments) in kernel/sched/ext_internal.h but have
> no corresponding documentation in sched-ext.rst. BPF scheduler
> developers must read kernel source to understand what each counter
> means.
>
> Add a description of the events file, an example of its output, and
> a brief explanation of every counter.
>
> 2. Module parameters
> kernel/sched/ext.c registers two parameters under the sched_ext.
> prefix (slice_bypass_us, bypass_lb_intv_us) via module_param_cb()
> with MODULE_PARM_DESC() strings, but sched-ext.rst makes no mention
> of them. Users who need to tune bypass-mode behavior have no
> in-tree documentation to consult.
>
> Add a "Module Parameters" section documenting both knobs: their
> default values, valid ranges (taken from the set_*() validators in
> ext.c), and the note from the source that they are primarily for
> debugging.
>
> No functional changes.
>
> Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Thanks for documenting this. Comments below.
> ---
> Documentation/scheduler/sched-ext.rst | 68 +++++++++++++++++++++++++++
> 1 file changed, 68 insertions(+)
>
> diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
> index f4f7d8f4f9e4..6fc7e720a956 100644
> --- a/Documentation/scheduler/sched-ext.rst
> +++ b/Documentation/scheduler/sched-ext.rst
> @@ -93,6 +93,55 @@ scheduler has been loaded):
> # cat /sys/kernel/sched_ext/enable_seq
> 1
>
> +Each running scheduler also exposes a per-scheduler ``events`` file under
> +``/sys/kernel/sched_ext/<scheduler-name>/events`` that tracks diagnostic
The right path is /sys/kernel/sched_ext/root/events. Now that we have
sub-schedulers, it's probably also worth mentioning that a sub-scheduler's
events file is located at /sys/kernel/sched_ext/root/sub/sub-<cgroup_id>/events.
> +counters. Each counter occupies one ``name value`` line:
> +
> +.. code-block:: none
> +
> + # cat /sys/kernel/sched_ext/simple/events
> + SCX_EV_SELECT_CPU_FALLBACK 0
> + SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE 0
> + SCX_EV_DISPATCH_KEEP_LAST 123
> + SCX_EV_ENQ_SKIP_EXITING 0
> + SCX_EV_ENQ_SKIP_MIGRATION_DISABLED 0
> + SCX_EV_REENQ_IMMED 0
> + SCX_EV_REENQ_LOCAL_REPEAT 0
> + SCX_EV_REFILL_SLICE_DFL 456789
> + SCX_EV_BYPASS_DURATION 0
> + SCX_EV_BYPASS_DISPATCH 0
> + SCX_EV_BYPASS_ACTIVATE 0
> + SCX_EV_INSERT_NOT_OWNED 0
> + SCX_EV_SUB_BYPASS_DISPATCH 0
> +
> +The counters are described in ``kernel/sched/ext_internal.h``; briefly:
> +
> +* ``SCX_EV_SELECT_CPU_FALLBACK``: ops.select_cpu() returned a CPU unusable by
> + the task and the core scheduler silently picked a fallback CPU.
> +* ``SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE``: a local-DSQ dispatch was redirected
> + to the global DSQ because the target CPU went offline.
> +* ``SCX_EV_DISPATCH_KEEP_LAST``: a task continued running because no other
> + task was available (only when ``SCX_OPS_ENQ_LAST`` is not set).
> +* ``SCX_EV_ENQ_SKIP_EXITING``: an exiting task was dispatched to the local DSQ
> + directly, bypassing ops.enqueue() (only when ``SCX_OPS_ENQ_EXITING`` is not set).
> +* ``SCX_EV_ENQ_SKIP_MIGRATION_DISABLED``: a migration-disabled task was
> + dispatched to its local DSQ directly (only when
> + ``SCX_OPS_ENQ_MIGRATION_DISABLED`` is not set).
> +* ``SCX_EV_REENQ_IMMED``: a task dispatched with ``SCX_ENQ_IMMED`` was
> + re-enqueued because the target CPU was not available for immediate execution.
> +* ``SCX_EV_REENQ_LOCAL_REPEAT``: a reenqueue of the local DSQ triggered
> + another reenqueue; recurring counts indicate incorrect ``SCX_ENQ_REENQ``
> + handling in the BPF scheduler.
> +* ``SCX_EV_REFILL_SLICE_DFL``: a task's time slice was refilled with the
> + default value (``SCX_SLICE_DFL``).
> +* ``SCX_EV_BYPASS_DURATION``: total nanoseconds spent in bypass mode.
> +* ``SCX_EV_BYPASS_DISPATCH``: number of tasks dispatched while in bypass mode.
> +* ``SCX_EV_BYPASS_ACTIVATE``: number of times bypass mode was activated.
> +* ``SCX_EV_INSERT_NOT_OWNED``: attempted to insert a task into a DSQ not owned
> + by this scheduler; such attempts are silently ignored.
> +* ``SCX_EV_SUB_BYPASS_DISPATCH``: tasks dispatched from sub-scheduler bypass
> + DSQs (only relevant with ``CONFIG_EXT_SUB_SCHED``).
> +
> ``tools/sched_ext/scx_show_state.py`` is a drgn script which shows more
> detailed information:
>
> @@ -441,6 +490,25 @@ Where to Look
> scheduling. Tasks with CPU affinity are direct-dispatched in FIFO order;
> all others are scheduled in user space by a simple vruntime scheduler.
>
> +Module Parameters
> +=================
> +
> +sched_ext exposes two module parameters under the ``sched_ext.`` prefix that
Maybe say ``sched_ext.`` namespace instead of prefix?
> +control bypass-mode behaviour. These knobs are primarily for debugging; there
> +is usually no reason to change them during normal operation. They can be read
> +and written at runtime (mode 0600) via
> +``/sys/module/sched_ext/parameters/``.
> +
> +``sched_ext.slice_bypass_us`` (default: 5000 µs)
> + The time slice assigned to all tasks when the scheduler is in bypass mode,
> + i.e. during BPF scheduler load, unload, and error recovery. Valid range is
> + 100 µs to 100 ms.
> +
> +``sched_ext.bypass_lb_intv_us`` (default: 500000 µs)
> + The interval at which the bypass-mode load balancer redistributes tasks
> + across CPUs. Set to 0 to disable load balancing during bypass mode. Valid
> + range is 0 to 10 s.
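As a quick sanity check on the documented ranges, a small untested sketch
follows. The bounds are copied from the text quoted above, not re-verified
against the set_*() validators in ext.c, and PARAM_RANGES_US and
check_param() are hypothetical names used only for this example:

```python
# Valid ranges in microseconds, as stated in the quoted documentation:
# slice_bypass_us: 100 us to 100 ms; bypass_lb_intv_us: 0 to 10 s.
PARAM_RANGES_US = {
    "slice_bypass_us": (100, 100_000),
    "bypass_lb_intv_us": (0, 10_000_000),
}


def check_param(name, value_us):
    """Return True if value_us lies within the documented range for name."""
    low, high = PARAM_RANGES_US[name]
    return low <= value_us <= high
```

Both documented defaults (5000 and 500000) fall inside these ranges, and 0
is accepted for bypass_lb_intv_us, which matches the "set to 0 to disable"
note.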
> +
> ABI Instability
> ===============
>
> --
> 2.43.0
>
Thanks,
-Andrea