public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>,
	Andrea Righi <arighi@nvidia.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: Emil Tsalapatis <emil@etsalapatis.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	Tejun Heo <tj@kernel.org>
Subject: [PATCHSET v4 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops
Date: Wed, 29 Apr 2026 08:21:14 -1000	[thread overview]
Message-ID: <20260429182131.1780125-1-tj@kernel.org> (raw)

Hello,

v4:
- cmask: bump CMASK_CAS_TRIES to (1U << 23) so abort fires only after
  seconds of real spinning, not on plausible contention. The kfunc
  slow-path Changwoo suggested would let BPF loops keep banging a
  contended cacheline indefinitely - on multi-socket SPRs that path
  can stall the machine into hard lockups, so failing hard is the
  right behavior. A follow-up patch will add a kfunc to bail the BPF
  CAS loops immediately when sch->aborting is set. Switch
  __builtin_ctzll() to the ctzll() wrapper for clang compat.
- cid-qmap-port: cid-shard handling was wired against a future kfunc
  signature that didn't make it into v3, leaving the snapshot broken.
  Drop the shard test plumbing for v4, match the 2-arg
  scx_bpf_cid_override(), bound nr_cpu_ids for the verifier, and
  rename mode 3 from bad-mono to bad-range. (Changwoo, Andrea)
- Rebased over the exit_cpu plumbing in for-7.2:
    scx-error-header: scx_exit() and scx_verror() are macros now;
    move both plus the underlying __scx_exit() / scx_vexit()
    declarations to ext_internal.h.
    cid-struct-ops: dump_cpu callsite shifted into scx_dump_cpu()
    helper; the scx_cpu_arg() wrap moved with it.

v3: https://lore.kernel.org/r/20260428203545.181052-1-tj@kernel.org
v2: https://lore.kernel.org/r/20260424172721.3458520-1-tj@kernel.org
v1: https://lore.kernel.org/r/20260421071945.3110084-1-tj@kernel.org

This patchset introduces topological CPU IDs (cids) - dense,
topology-ordered cpu identifiers - and an alternative cid-form struct_ops
type that lets BPF schedulers operate in cid space directly.

Key pieces:

- cid space: scx_cid_init() walks nodes * LLCs * cores * threads and packs
  a dense cid mapping. The mapping can be overridden via
  scx_bpf_cid_override(). See "Topological CPU IDs" in ext_cid.h for the
  model.

- cmask: a base-windowed bitmap over cid space. Kernel and BPF helpers with
  identical semantics. Used by scx_qmap for per-task affinity and idle-cid
  tracking; meant to be the substrate for sub-sched cid allocation.

- bpf_sched_ext_ops_cid: a parallel struct_ops type whose callbacks take
  cids/cmasks instead of cpus/cpumasks. Kernel translates at the boundary
  via scx_cpu_arg() / scx_cpu_ret(); the two struct types share offsets up
  through @priv (verified by BUILD_BUG_ON) so the union view in scx_sched
  works without function-pointer casts. Sub-sched support is tied to
  cid-form: validate_ops() rejects cpu-form sub-scheds and cpu-form roots
  that expose sub_attach / sub_detach.

- cid-form kfuncs: scx_bpf_kick_cid, scx_bpf_cidperf_{cap,cur,set},
  scx_bpf_cid_curr, scx_bpf_task_cid, scx_bpf_this_cid,
  scx_bpf_nr_{cids,online_cids}, scx_bpf_cid_to_cpu, scx_bpf_cpu_to_cid.
  A cid-form program may not call cpu-only kfuncs (enforced at verifier
  load via scx_kfunc_context_filter); the reverse is intentionally
  permissive to ease migration.

- scx_qmap port: scx_qmap is converted to cid-form. It uses the cmask-based
  idle picker, per-task cid-space cpus_allowed, and cid-form kfuncs
  throughout. Sub-sched dispatching via scx_bpf_sub_dispatch() continues to
  work.

v4 re-tested on the 16-cpu QEMU VM with the v3 cut (only the 17 cid
patches applied): basic load + stress, cid-override modes
(shuffle/bad-dup/bad-range), and three enable/disable cycles all clean.
No BUG/WARNING/panic in the dump.

Based on sched_ext/for-7.2 (ee8391ba1164).

  0001-sched_ext-Add-ext_types.h-for-early-subsystem-wide-d.patch
  0002-sched_ext-Rename-ops_cpu_valid-to-scx_cpu_valid-and-.patch
  0003-sched_ext-Move-scx_exit-scx_error-and-friends-to-ext.patch
  0004-sched_ext-Shift-scx_kick_cpu-validity-check-to-scx_b.patch
  0005-sched_ext-Relocate-cpu_acquire-cpu_release-to-end-of.patch
  0006-sched_ext-Make-scx_enable-take-scx_enable_cmd.patch
  0007-sched_ext-Add-topological-CPU-IDs-cids.patch
  0008-sched_ext-Add-scx_bpf_cid_override-kfunc.patch
  0009-tools-sched_ext-Add-struct_size-helpers-to-common.bp.patch
  0010-sched_ext-Add-cmask-a-base-windowed-bitmap-over-cid-.patch
  0011-sched_ext-Add-cid-form-kfunc-wrappers-alongside-cpu-.patch
  0012-sched_ext-Add-bpf_sched_ext_ops_cid-struct_ops-type.patch
  0013-sched_ext-Forbid-cpu-form-kfuncs-from-cid-form-sched.patch
  0014-tools-sched_ext-scx_qmap-Restart-on-hotplug-instead-.patch
  0015-tools-sched_ext-scx_qmap-Add-cmask-based-idle-tracki.patch
  0016-tools-sched_ext-scx_qmap-Port-to-cid-form-struct_ops.patch
  0017-sched_ext-Require-cid-form-struct_ops-for-sub-sched-.patch

Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-cid-v4

 kernel/sched/build_policy.c              |   3 +
 kernel/sched/ext.c                       | 660 ++++++++++++++++++++++++++----
 kernel/sched/ext_cid.c                   | 409 +++++++++++++++++++
 kernel/sched/ext_cid.h                   | 164 ++++++++
 kernel/sched/ext_idle.c                  |   8 +-
 kernel/sched/ext_internal.h              | 209 +++++++---
 kernel/sched/ext_types.h                 | 104 +++++
 tools/sched_ext/include/scx/cid.bpf.h    | 666 +++++++++++++++++++++++++++++++
 tools/sched_ext/include/scx/common.bpf.h |  23 ++
 tools/sched_ext/include/scx/compat.bpf.h |  24 ++
 tools/sched_ext/scx_qmap.bpf.c           | 350 +++++++++-------
 tools/sched_ext/scx_qmap.c               |  57 ++-
 tools/sched_ext/scx_qmap.h               |   2 +-
 13 files changed, 2387 insertions(+), 292 deletions(-)

Thanks.

--
tejun

             reply	other threads:[~2026-04-29 18:21 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-29 18:21 Tejun Heo [this message]
2026-04-29 18:21 ` [PATCH 01/17] sched_ext: Add ext_types.h for early subsystem-wide defs Tejun Heo
2026-04-29 18:21 ` [PATCH 02/17] sched_ext: Rename ops_cpu_valid() to scx_cpu_valid() and expose it Tejun Heo
2026-04-29 18:21 ` [PATCH 03/17] sched_ext: Move scx_exit(), scx_error() and friends to ext_internal.h Tejun Heo
2026-04-29 18:21 ` [PATCH 04/17] sched_ext: Shift scx_kick_cpu() validity check to scx_bpf_kick_cpu() Tejun Heo
2026-04-29 18:21 ` [PATCH 05/17] sched_ext: Relocate cpu_acquire/cpu_release to end of struct sched_ext_ops Tejun Heo
2026-04-29 18:21 ` [PATCH 06/17] sched_ext: Make scx_enable() take scx_enable_cmd Tejun Heo
2026-04-29 18:21 ` [PATCH 07/17] sched_ext: Add topological CPU IDs (cids) Tejun Heo
2026-04-29 18:21 ` [PATCH 08/17] sched_ext: Add scx_bpf_cid_override() kfunc Tejun Heo
2026-04-29 18:21 ` [PATCH 09/17] tools/sched_ext: Add struct_size() helpers to common.bpf.h Tejun Heo
2026-04-29 18:21 ` [PATCH 10/17] sched_ext: Add cmask, a base-windowed bitmap over cid space Tejun Heo
2026-04-29 18:21 ` [PATCH 11/17] sched_ext: Add cid-form kfunc wrappers alongside cpu-form Tejun Heo
2026-04-29 18:21 ` [PATCH 12/17] sched_ext: Add bpf_sched_ext_ops_cid struct_ops type Tejun Heo
2026-04-29 18:21 ` [PATCH 13/17] sched_ext: Forbid cpu-form kfuncs from cid-form schedulers Tejun Heo
2026-04-29 18:21 ` [PATCH 14/17] tools/sched_ext: scx_qmap: Restart on hotplug instead of cpu_online/offline Tejun Heo
2026-04-29 18:21 ` [PATCH 15/17] tools/sched_ext: scx_qmap: Add cmask-based idle tracking and cid-based idle pick Tejun Heo
2026-04-29 18:21 ` [PATCH 16/17] tools/sched_ext: scx_qmap: Port to cid-form struct_ops Tejun Heo
2026-04-29 18:21 ` [PATCH 17/17] sched_ext: Require cid-form struct_ops for sub-sched support Tejun Heo
2026-04-29 18:29 ` [PATCHSET v4 sched_ext/for-7.2] sched_ext: Topological CPU IDs and cid-form struct_ops Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260429182131.1780125-1-tj@kernel.org \
    --to=tj@kernel.org \
    --cc=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=emil@etsalapatis.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox