* [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test
@ 2026-03-06 12:49 zhidao su
0 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 12:49 UTC (permalink / raw)
To: tj, sched-ext, linux-kernel
Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao
From: Su Zhidao <suzhidao@xiaomi.com>
Add a test that verifies the sched_ext bypass mechanism does not
prevent tasks from running to completion.
The test attaches a minimal global FIFO scheduler, spawns worker
processes that complete a fixed computation, detaches the scheduler
(which triggers bypass mode while workers are still active), and
verifies all workers complete successfully under bypass mode.
This exercises the scheduler attach/detach lifecycle and verifies
that bypass mode (activated during unregistration to guarantee
forward progress) does not stall running tasks.
Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
tools/testing/selftests/sched_ext/Makefile | 1 +
.../testing/selftests/sched_ext/bypass.bpf.c | 32 ++++++
tools/testing/selftests/sched_ext/bypass.c | 105 ++++++++++++++++++
3 files changed, 138 insertions(+)
create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
create mode 100644 tools/testing/selftests/sched_ext/bypass.c
diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index a3bbe2c7911b..5fb6278d3f97 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -162,6 +162,7 @@ endef
all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
auto-test-targets := \
+ bypass \
create_dsq \
dequeue \
enq_last_no_enq_fails \
diff --git a/tools/testing/selftests/sched_ext/bypass.bpf.c b/tools/testing/selftests/sched_ext/bypass.bpf.c
new file mode 100644
index 000000000000..cb37c8df6834
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.bpf.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF scheduler for bypass mode operational test.
+ *
+ * Implements a minimal global FIFO scheduler. The userspace side
+ * attaches this scheduler, runs worker tasks to completion, and
+ * verifies that tasks complete successfully.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+UEI_DEFINE(uei);
+
+void BPF_STRUCT_OPS(bypass_enqueue, struct task_struct *p, u64 enq_flags)
+{
+ scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
+}
+
+void BPF_STRUCT_OPS(bypass_exit, struct scx_exit_info *ei)
+{
+ UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops bypass_ops = {
+ .enqueue = (void *)bypass_enqueue,
+ .exit = (void *)bypass_exit,
+ .name = "bypass_test",
+};
diff --git a/tools/testing/selftests/sched_ext/bypass.c b/tools/testing/selftests/sched_ext/bypass.c
new file mode 100644
index 000000000000..952f09d76bdb
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Verify the sched_ext bypass mechanism: spawn worker tasks and ensure
+ * they run to completion while a BPF scheduler is active.
+ *
+ * The bypass mechanism (activated on scheduler unregistration) must
+ * guarantee forward progress. This test verifies that worker tasks
+ * complete successfully when the scheduler is detached.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/wait.h>
+#include <bpf/bpf.h>
+#include <scx/common.h>
+#include "scx_test.h"
+#include "bypass.bpf.skel.h"
+
+#define NUM_BYPASS_WORKERS 4
+
+static void worker_fn(void)
+{
+ volatile int sum = 0;
+ int i;
+
+ /*
+ * Do enough work to still be running when bpf_link__destroy()
+ * is called, ensuring tasks are active during bypass mode.
+ */
+ for (i = 0; i < 10000000; i++)
+ sum += i;
+}
+
+static enum scx_test_status setup(void **ctx)
+{
+ struct bypass *skel;
+
+ skel = bypass__open();
+ SCX_FAIL_IF(!skel, "Failed to open bypass skel");
+ SCX_ENUM_INIT(skel);
+ SCX_FAIL_IF(bypass__load(skel), "Failed to load bypass skel");
+
+ *ctx = skel;
+ return SCX_TEST_PASS;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+ struct bypass *skel = ctx;
+ struct bpf_link *link;
+ pid_t pids[NUM_BYPASS_WORKERS];
+ int i, status;
+
+ link = bpf_map__attach_struct_ops(skel->maps.bypass_ops);
+ SCX_FAIL_IF(!link, "Failed to attach bypass scheduler");
+
+ /*
+ * Spawn worker processes. These must complete successfully
+ * even as the scheduler is active and then detached (which
+ * triggers bypass mode).
+ */
+ for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+ pids[i] = fork();
+ SCX_FAIL_IF(pids[i] < 0, "fork() failed for worker %d", i);
+
+ if (pids[i] == 0) {
+ worker_fn();
+ _exit(0);
+ }
+ }
+
+ /*
+ * Detach the scheduler while workers are still running. This
+ * triggers bypass mode, which must guarantee forward progress
+ * for all active tasks.
+ */
+ bpf_link__destroy(link);
+
+ /* Workers must complete successfully under bypass mode */
+ for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+ SCX_FAIL_IF(waitpid(pids[i], &status, 0) != pids[i],
+ "waitpid failed for worker %d", i);
+ SCX_FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status) != 0,
+ "Worker %d did not exit cleanly", i);
+ }
+
+ SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
+
+ return SCX_TEST_PASS;
+}
+
+static void cleanup(void *ctx)
+{
+ bypass__destroy(ctx);
+}
+
+struct scx_test bypass_test = {
+ .name = "bypass",
+ .description = "Verify tasks complete during bypass mode",
+ .setup = setup,
+ .run = run,
+ .cleanup = cleanup,
+};
+REGISTER_SCX_TEST(&bypass_test)
--
2.43.0
* [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest
@ 2026-03-06 14:03 zhidao su
2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
To: tj, sched-ext, linux-kernel
Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao
From: Su Zhidao <suzhidao@xiaomi.com>
This series does a small cleanup pass on the sched_ext bypass code path
and adds a selftest for the bypass mechanism.
Patch 1 removes SCX_OPS_HAS_CGROUP_WEIGHT, which was marked deprecated
in 6.15 with a "will be removed on 6.18" comment. We are now past that
point.
Patches 2-3 improve the bypass code in ext.c: add inline comments
explaining the bypass depth counter semantics and the dequeue/enqueue
re-queue loop, and replace rcu_dereference_all() with the more precise
rcu_dereference_bh() in scx_bypass_lb_timerfn() which runs in softirq
context.
Patch 4 adds a selftest that verifies forward progress under bypass
mode: worker processes are spawned while the scheduler is active, then
bpf_link__destroy() is called (triggering bypass), and the test confirms
all workers complete successfully.
Patch 5 adds a comment to the scx_bypass_depth declaration noting its
planned migration into struct scx_sched.
Tested on 6.18.7 with CONFIG_SCHED_CLASS_EXT=y; all existing selftests
pass.
Su Zhidao (5):
sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag
sched_ext: Add comments to scx_bypass() for bypass depth semantics
sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn()
sched_ext/selftests: Add bypass mode operational test
sched_ext: Document scx_bypass_depth migration path
kernel/sched/ext.c | 29 ++++-
kernel/sched/ext_internal.h | 8 +-
.../sched_ext/include/scx/enum_defs.autogen.h | 1 -
tools/sched_ext/scx_flatcg.bpf.c | 2 +-
tools/testing/selftests/sched_ext/Makefile | 1 +
.../testing/selftests/sched_ext/bypass.bpf.c | 32 ++++++
tools/testing/selftests/sched_ext/bypass.c | 105 ++++++++++++++++++
7 files changed, 165 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
create mode 100644 tools/testing/selftests/sched_ext/bypass.c
--
2.43.0
* [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag
2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
@ 2026-03-06 14:03 ` zhidao su
2026-03-06 14:03 ` [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics zhidao su
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
To: tj, sched-ext, linux-kernel
Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao
From: Su Zhidao <suzhidao@xiaomi.com>
SCX_OPS_HAS_CGROUP_WEIGHT was deprecated in 6.15 with a comment
'will be removed on 6.18'. Now that we are at 6.18, remove it.
The flag was a no-op and only triggered a pr_warn() on use. Remove
the flag definition and the warning, and update scx_flatcg, which was
the last in-tree user.
Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
kernel/sched/ext.c | 3 ---
kernel/sched/ext_internal.h | 8 +-------
tools/sched_ext/include/scx/enum_defs.autogen.h | 1 -
tools/sched_ext/scx_flatcg.bpf.c | 2 +-
4 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c4ccd685259f..56ff5874af94 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5061,9 +5061,6 @@ static int validate_ops(struct scx_sched *sch, const struct sched_ext_ops *ops)
return -EINVAL;
}
- if (ops->flags & SCX_OPS_HAS_CGROUP_WEIGHT)
- pr_warn("SCX_OPS_HAS_CGROUP_WEIGHT is deprecated and a noop\n");
-
if (ops->cpu_acquire || ops->cpu_release)
pr_warn("ops->cpu_acquire/release() are deprecated, use sched_switch TP instead\n");
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index bd26811fea99..3c86c53e1975 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -174,19 +174,13 @@ enum scx_ops_flags {
*/
SCX_OPS_BUILTIN_IDLE_PER_NODE = 1LLU << 6,
- /*
- * CPU cgroup support flags
- */
- SCX_OPS_HAS_CGROUP_WEIGHT = 1LLU << 16, /* DEPRECATED, will be removed on 6.18 */
-
SCX_OPS_ALL_FLAGS = SCX_OPS_KEEP_BUILTIN_IDLE |
SCX_OPS_ENQ_LAST |
SCX_OPS_ENQ_EXITING |
SCX_OPS_ENQ_MIGRATION_DISABLED |
SCX_OPS_ALLOW_QUEUED_WAKEUP |
SCX_OPS_SWITCH_PARTIAL |
- SCX_OPS_BUILTIN_IDLE_PER_NODE |
- SCX_OPS_HAS_CGROUP_WEIGHT,
+ SCX_OPS_BUILTIN_IDLE_PER_NODE,
/* high 8 bits are internal, don't include in SCX_OPS_ALL_FLAGS */
__SCX_OPS_INTERNAL_MASK = 0xffLLU << 56,
diff --git a/tools/sched_ext/include/scx/enum_defs.autogen.h b/tools/sched_ext/include/scx/enum_defs.autogen.h
index dcc945304760..80c885f781ba 100644
--- a/tools/sched_ext/include/scx/enum_defs.autogen.h
+++ b/tools/sched_ext/include/scx/enum_defs.autogen.h
@@ -91,7 +91,6 @@
#define HAVE_SCX_OPS_SWITCH_PARTIAL
#define HAVE_SCX_OPS_ENQ_MIGRATION_DISABLED
#define HAVE_SCX_OPS_ALLOW_QUEUED_WAKEUP
-#define HAVE_SCX_OPS_HAS_CGROUP_WEIGHT
#define HAVE_SCX_OPS_ALL_FLAGS
#define HAVE_SCX_OPSS_NONE
#define HAVE_SCX_OPSS_QUEUEING
diff --git a/tools/sched_ext/scx_flatcg.bpf.c b/tools/sched_ext/scx_flatcg.bpf.c
index 0e785cff0f24..a8a9234bb41e 100644
--- a/tools/sched_ext/scx_flatcg.bpf.c
+++ b/tools/sched_ext/scx_flatcg.bpf.c
@@ -960,5 +960,5 @@ SCX_OPS_DEFINE(flatcg_ops,
.cgroup_move = (void *)fcg_cgroup_move,
.init = (void *)fcg_init,
.exit = (void *)fcg_exit,
- .flags = SCX_OPS_HAS_CGROUP_WEIGHT | SCX_OPS_ENQ_EXITING,
+ .flags = SCX_OPS_ENQ_EXITING,
.name = "flatcg");
--
2.43.0
* [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics
2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
@ 2026-03-06 14:03 ` zhidao su
2026-03-06 14:03 ` [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn() zhidao su
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
To: tj, sched-ext, linux-kernel
Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao
From: Su Zhidao <suzhidao@xiaomi.com>
The bypass depth counter (scx_bypass_depth) uses WRITE_ONCE/READ_ONCE
to communicate that it can be observed locklessly from IRQ context, even
though modifications are serialized by bypass_lock. The existing code did
not explain this pattern or the re-queue loop's role in propagating the
bypass state change to all CPUs.
Add inline comments to clarify:
- Why bypass_depth uses WRITE_ONCE/READ_ONCE despite lock protection
- How the dequeue/enqueue cycle propagates bypass state to all per-CPU queues
Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
kernel/sched/ext.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 56ff5874af94..053d99c58802 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4229,6 +4229,14 @@ static void scx_bypass(bool bypass)
if (bypass) {
u32 intv_us;
+ /*
+ * Increment bypass depth. Only the first caller (depth 0->1)
+ * needs to set up the bypass state; subsequent callers just
+ * increment the counter and return. The depth counter is
+ * protected by bypass_lock but READ_ONCE/WRITE_ONCE are used
+ * to communicate that the value can be observed locklessly
+ * (e.g., from scx_bypass_lb_timerfn() in softirq context).
+ */
WRITE_ONCE(scx_bypass_depth, scx_bypass_depth + 1);
WARN_ON_ONCE(scx_bypass_depth <= 0);
if (scx_bypass_depth != 1)
@@ -4263,6 +4271,10 @@ static void scx_bypass(bool bypass)
*
* This function can't trust the scheduler and thus can't use
* cpus_read_lock(). Walk all possible CPUs instead of online.
+ *
+ * The dequeue/enqueue cycle forces tasks through the updated code
+ * paths: in bypass mode, do_enqueue_task() routes to the per-CPU
+ * bypass DSQ instead of calling ops.enqueue().
*/
for_each_possible_cpu(cpu) {
struct rq *rq = cpu_rq(cpu);
--
2.43.0
* [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn()
2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
2026-03-06 14:03 ` [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics zhidao su
@ 2026-03-06 14:03 ` zhidao su
2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
To: tj, sched-ext, linux-kernel
Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao
From: Su Zhidao <suzhidao@xiaomi.com>
scx_bypass_lb_timerfn() runs in softirq (BH) context, so
rcu_dereference_bh() is the correct RCU accessor. The previous
rcu_dereference_all() suppressed all sparse warnings and masked
potential RCU context issues.
Add a comment noting this is a transitional state: when
multi-scheduler support lands, the bypass LB timer will become
per-scheduler and the global scx_root reference will be removed.
Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
kernel/sched/ext.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 053d99c58802..c269e489902c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4170,7 +4170,13 @@ static void scx_bypass_lb_timerfn(struct timer_list *timer)
int node;
u32 intv_us;
- sch = rcu_dereference_all(scx_root);
+ /*
+ * scx_bypass_lb_timer is a global timer that fires in softirq
+ * context while bypass mode is active. Use rcu_dereference_bh()
+ * matching the BH context. When multi-scheduler support lands,
+ * this timer will become per-scheduler instance.
+ */
+ sch = rcu_dereference_bh(scx_root);
if (unlikely(!sch) || !READ_ONCE(scx_bypass_depth))
return;
--
2.43.0
* [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test
2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
` (2 preceding siblings ...)
2026-03-06 14:03 ` [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn() zhidao su
@ 2026-03-06 14:03 ` zhidao su
2026-03-06 15:02 ` Andrea Righi
2026-03-06 14:03 ` [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path zhidao su
2026-03-06 15:02 ` [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest Andrea Righi
5 siblings, 1 reply; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
To: tj, sched-ext, linux-kernel
Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao
From: Su Zhidao <suzhidao@xiaomi.com>
Add a test that verifies the sched_ext bypass mechanism does not
prevent tasks from running to completion.
The test attaches a minimal global FIFO scheduler, spawns worker
processes that complete a fixed computation, detaches the scheduler
(which triggers bypass mode while workers are still active), and
verifies all workers complete successfully under bypass mode.
This exercises the scheduler attach/detach lifecycle and verifies
that bypass mode (activated during unregistration to guarantee
forward progress) does not stall running tasks.
Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
tools/testing/selftests/sched_ext/Makefile | 1 +
.../testing/selftests/sched_ext/bypass.bpf.c | 32 ++++++
tools/testing/selftests/sched_ext/bypass.c | 105 ++++++++++++++++++
3 files changed, 138 insertions(+)
create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
create mode 100644 tools/testing/selftests/sched_ext/bypass.c
diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index a3bbe2c7911b..5fb6278d3f97 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -162,6 +162,7 @@ endef
all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
auto-test-targets := \
+ bypass \
create_dsq \
dequeue \
enq_last_no_enq_fails \
diff --git a/tools/testing/selftests/sched_ext/bypass.bpf.c b/tools/testing/selftests/sched_ext/bypass.bpf.c
new file mode 100644
index 000000000000..cb37c8df6834
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.bpf.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF scheduler for bypass mode operational test.
+ *
+ * Implements a minimal global FIFO scheduler. The userspace side
+ * attaches this scheduler, runs worker tasks to completion, and
+ * verifies that tasks complete successfully.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+UEI_DEFINE(uei);
+
+void BPF_STRUCT_OPS(bypass_enqueue, struct task_struct *p, u64 enq_flags)
+{
+ scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
+}
+
+void BPF_STRUCT_OPS(bypass_exit, struct scx_exit_info *ei)
+{
+ UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops bypass_ops = {
+ .enqueue = (void *)bypass_enqueue,
+ .exit = (void *)bypass_exit,
+ .name = "bypass_test",
+};
diff --git a/tools/testing/selftests/sched_ext/bypass.c b/tools/testing/selftests/sched_ext/bypass.c
new file mode 100644
index 000000000000..952f09d76bdb
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Verify the sched_ext bypass mechanism: spawn worker tasks and ensure
+ * they run to completion while a BPF scheduler is active.
+ *
+ * The bypass mechanism (activated on scheduler unregistration) must
+ * guarantee forward progress. This test verifies that worker tasks
+ * complete successfully when the scheduler is detached.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/wait.h>
+#include <bpf/bpf.h>
+#include <scx/common.h>
+#include "scx_test.h"
+#include "bypass.bpf.skel.h"
+
+#define NUM_BYPASS_WORKERS 4
+
+static void worker_fn(void)
+{
+ volatile int sum = 0;
+ int i;
+
+ /*
+ * Do enough work to still be running when bpf_link__destroy()
+ * is called, ensuring tasks are active during bypass mode.
+ */
+ for (i = 0; i < 10000000; i++)
+ sum += i;
+}
+
+static enum scx_test_status setup(void **ctx)
+{
+ struct bypass *skel;
+
+ skel = bypass__open();
+ SCX_FAIL_IF(!skel, "Failed to open bypass skel");
+ SCX_ENUM_INIT(skel);
+ SCX_FAIL_IF(bypass__load(skel), "Failed to load bypass skel");
+
+ *ctx = skel;
+ return SCX_TEST_PASS;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+ struct bypass *skel = ctx;
+ struct bpf_link *link;
+ pid_t pids[NUM_BYPASS_WORKERS];
+ int i, status;
+
+ link = bpf_map__attach_struct_ops(skel->maps.bypass_ops);
+ SCX_FAIL_IF(!link, "Failed to attach bypass scheduler");
+
+ /*
+ * Spawn worker processes. These must complete successfully
+ * even as the scheduler is active and then detached (which
+ * triggers bypass mode).
+ */
+ for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+ pids[i] = fork();
+ SCX_FAIL_IF(pids[i] < 0, "fork() failed for worker %d", i);
+
+ if (pids[i] == 0) {
+ worker_fn();
+ _exit(0);
+ }
+ }
+
+ /*
+ * Detach the scheduler while workers are still running. This
+ * triggers bypass mode, which must guarantee forward progress
+ * for all active tasks.
+ */
+ bpf_link__destroy(link);
+
+ /* Workers must complete successfully under bypass mode */
+ for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+ SCX_FAIL_IF(waitpid(pids[i], &status, 0) != pids[i],
+ "waitpid failed for worker %d", i);
+ SCX_FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status) != 0,
+ "Worker %d did not exit cleanly", i);
+ }
+
+ SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
+
+ return SCX_TEST_PASS;
+}
+
+static void cleanup(void *ctx)
+{
+ bypass__destroy(ctx);
+}
+
+struct scx_test bypass_test = {
+ .name = "bypass",
+ .description = "Verify tasks complete during bypass mode",
+ .setup = setup,
+ .run = run,
+ .cleanup = cleanup,
+};
+REGISTER_SCX_TEST(&bypass_test)
--
2.43.0
* [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path
2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
` (3 preceding siblings ...)
2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
@ 2026-03-06 14:03 ` zhidao su
2026-03-06 15:02 ` [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest Andrea Righi
5 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
To: tj, sched-ext, linux-kernel
Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao
From: Su Zhidao <suzhidao@xiaomi.com>
scx_bypass_depth is a global counter that will be moved into
struct scx_sched when multi-scheduler support lands. Add a comment
explaining why READ_ONCE/WRITE_ONCE are used despite bypass_lock
serialization: modifications are serialized by the lock, but the
value can be observed locklessly from softirq context (e.g., in
scx_bypass_lb_timerfn()).
Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
kernel/sched/ext.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c269e489902c..b1e5a95682c1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -33,6 +33,12 @@ static DEFINE_MUTEX(scx_enable_mutex);
DEFINE_STATIC_KEY_FALSE(__scx_enabled);
DEFINE_STATIC_PERCPU_RWSEM(scx_fork_rwsem);
static atomic_t scx_enable_state_var = ATOMIC_INIT(SCX_DISABLED);
+/*
+ * Counts the number of active bypass requests. Protected by bypass_lock
+ * inside scx_bypass(), but read locklessly (e.g., from
+ * scx_bypass_lb_timerfn() in softirq context) using READ_ONCE(). Will
+ * be moved into struct scx_sched when multi-scheduler support lands.
+ */
static int scx_bypass_depth;
static cpumask_var_t scx_bypass_lb_donee_cpumask;
static cpumask_var_t scx_bypass_lb_resched_cpumask;
--
2.43.0
* Re: [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test
2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
@ 2026-03-06 15:02 ` Andrea Righi
0 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2026-03-06 15:02 UTC (permalink / raw)
To: zhidao su
Cc: tj, sched-ext, linux-kernel, void, changwoo, linux-kselftest,
Su Zhidao
Hi,
On Fri, Mar 06, 2026 at 10:03:24PM +0800, zhidao su wrote:
> From: Su Zhidao <suzhidao@xiaomi.com>
>
> Add a test that verifies the sched_ext bypass mechanism does not
> prevent tasks from running to completion.
>
> The test attaches a minimal global FIFO scheduler, spawns worker
> processes that complete a fixed computation, detaches the scheduler
> (which triggers bypass mode while workers are still active), and
> verifies all workers complete successfully under bypass mode.
>
> This exercises the scheduler attach/detach lifecycle and verifies
> that bypass mode (activated during unregistration to guarantee
> forward progress) does not stall running tasks.
I'm not sure this selftest adds much value. Implicitly we're already
testing the validity of bypass in the other sched_ext kselftests: if a task
is missed or gets stuck due to bypass mode, we would trigger a soft lockup,
a hung task timeout, or something similar.
>
> Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
> ---
> tools/testing/selftests/sched_ext/Makefile | 1 +
> .../testing/selftests/sched_ext/bypass.bpf.c | 32 ++++++
> tools/testing/selftests/sched_ext/bypass.c | 105 ++++++++++++++++++
> 3 files changed, 138 insertions(+)
> create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
> create mode 100644 tools/testing/selftests/sched_ext/bypass.c
>
> diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
> index a3bbe2c7911b..5fb6278d3f97 100644
> --- a/tools/testing/selftests/sched_ext/Makefile
> +++ b/tools/testing/selftests/sched_ext/Makefile
> @@ -162,6 +162,7 @@ endef
> all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
>
> auto-test-targets := \
> + bypass \
> create_dsq \
> dequeue \
> enq_last_no_enq_fails \
> diff --git a/tools/testing/selftests/sched_ext/bypass.bpf.c b/tools/testing/selftests/sched_ext/bypass.bpf.c
> new file mode 100644
> index 000000000000..cb37c8df6834
> --- /dev/null
> +++ b/tools/testing/selftests/sched_ext/bypass.bpf.c
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * BPF scheduler for bypass mode operational test.
> + *
> + * Implements a minimal global FIFO scheduler. The userspace side
> + * attaches this scheduler, runs worker tasks to completion, and
> + * verifies that tasks complete successfully.
> + *
> + * Copyright (c) 2026 Xiaomi Corporation.
> + */
> +#include <scx/common.bpf.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +UEI_DEFINE(uei);
> +
> +void BPF_STRUCT_OPS(bypass_enqueue, struct task_struct *p, u64 enq_flags)
> +{
> + scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
> +}
We could also remove bypass_enqueue() and sched_ext core will do exactly
the same (implicitly enqueue to SCX_DSQ_GLOBAL).
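In other words (an untested sketch of what the reduced ops could look
like, keeping the rest of bypass.bpf.c as-is; the core's default
enqueue path dispatches to the global DSQ when ops.enqueue is absent):

```c
SEC(".struct_ops.link")
struct sched_ext_ops bypass_ops = {
	/* no .enqueue: the sched_ext core falls back to SCX_DSQ_GLOBAL */
	.exit = (void *)bypass_exit,
	.name = "bypass_test",
};
```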
> +
> +void BPF_STRUCT_OPS(bypass_exit, struct scx_exit_info *ei)
> +{
> + UEI_RECORD(uei, ei);
> +}
> +
> +SEC(".struct_ops.link")
> +struct sched_ext_ops bypass_ops = {
> + .enqueue = (void *)bypass_enqueue,
> + .exit = (void *)bypass_exit,
> + .name = "bypass_test",
> +};
> diff --git a/tools/testing/selftests/sched_ext/bypass.c b/tools/testing/selftests/sched_ext/bypass.c
> new file mode 100644
> index 000000000000..952f09d76bdb
> --- /dev/null
> +++ b/tools/testing/selftests/sched_ext/bypass.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Verify the sched_ext bypass mechanism: spawn worker tasks and ensure
> + * they run to completion while a BPF scheduler is active.
> + *
> + * The bypass mechanism (activated on scheduler unregistration) must
> + * guarantee forward progress. This test verifies that worker tasks
> + * complete successfully when the scheduler is detached.
> + *
> + * Copyright (c) 2026 Xiaomi Corporation.
> + */
> +#define _GNU_SOURCE
> +#include <unistd.h>
> +#include <sys/wait.h>
> +#include <bpf/bpf.h>
> +#include <scx/common.h>
> +#include "scx_test.h"
> +#include "bypass.bpf.skel.h"
> +
> +#define NUM_BYPASS_WORKERS 4
> +
> +static void worker_fn(void)
> +{
> + volatile int sum = 0;
> + int i;
> +
> + /*
> + * Do enough work to still be running when bpf_link__destroy()
> + * is called, ensuring tasks are active during bypass mode.
> + */
> + for (i = 0; i < 10000000; i++)
> + sum += i;
> +}
> +
> +static enum scx_test_status setup(void **ctx)
> +{
> + struct bypass *skel;
> +
> + skel = bypass__open();
> + SCX_FAIL_IF(!skel, "Failed to open bypass skel");
> + SCX_ENUM_INIT(skel);
> + SCX_FAIL_IF(bypass__load(skel), "Failed to load bypass skel");
> +
> + *ctx = skel;
> + return SCX_TEST_PASS;
> +}
> +
> +static enum scx_test_status run(void *ctx)
> +{
> + struct bypass *skel = ctx;
> + struct bpf_link *link;
> + pid_t pids[NUM_BYPASS_WORKERS];
> + int i, status;
> +
> + link = bpf_map__attach_struct_ops(skel->maps.bypass_ops);
> + SCX_FAIL_IF(!link, "Failed to attach bypass scheduler");
> +
> + /*
> + * Spawn worker processes. These must complete successfully
> + * even as the scheduler is active and then detached (which
> + * triggers bypass mode).
> + */
> + for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
> + pids[i] = fork();
> + SCX_FAIL_IF(pids[i] < 0, "fork() failed for worker %d", i);
> +
> + if (pids[i] == 0) {
> + worker_fn();
> + _exit(0);
> + }
> + }
There's no synchronization with the parent, so on a fast system the workers
may even finish the loop before the parent ever detaches the scheduler.
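One way to close that race (a userspace sketch only, not part of the
patch; the function name `run_workers_with_gate()` and the worker count
are made up for illustration) is to gate the workers on a pipe read, so
they cannot start their work until the parent has done the detach:

```c
#include <assert.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 4

/*
 * Fork NWORKERS children that block on a pipe read before doing any
 * work. The parent releases them only after its detach step, so every
 * worker is guaranteed to be alive across the detach and to run its
 * computation afterwards. Returns the number of clean worker exits,
 * or -1 on setup failure.
 */
static int run_workers_with_gate(void)
{
	int gate[2];
	pid_t pids[NWORKERS];
	int i, clean = 0;

	if (pipe(gate))
		return -1;

	for (i = 0; i < NWORKERS; i++) {
		pids[i] = fork();
		if (pids[i] < 0)
			return -1;
		if (pids[i] == 0) {
			char byte;

			close(gate[1]);
			/* Block until the parent opens the gate. */
			if (read(gate[0], &byte, 1) != 1)
				_exit(1);
			/* ... actual worker computation would go here ... */
			_exit(0);
		}
	}
	close(gate[0]);

	/* <-- bpf_link__destroy() would go here, before the gate opens */

	/* One byte per worker; each child consumes exactly one. */
	for (i = 0; i < NWORKERS; i++)
		if (write(gate[1], "g", 1) != 1)
			return -1;
	close(gate[1]);

	for (i = 0; i < NWORKERS; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			clean++;
	}
	return clean;
}
```

With this shape the workers are blocked (hence still alive) at the
moment of detach and do all of their computation under bypass mode.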
> +
> + /*
> + * Detach the scheduler while workers are still running. This
> + * triggers bypass mode, which must guarantee forward progress
> + * for all active tasks.
> + */
> + bpf_link__destroy(link);
> +
> + /* Workers must complete successfully under bypass mode */
> + for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
> + SCX_FAIL_IF(waitpid(pids[i], &status, 0) != pids[i],
> + "waitpid failed for worker %d", i);
> + SCX_FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status) != 0,
> + "Worker %d did not exit cleanly", i);
> + }
> +
> + SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
> +
> + return SCX_TEST_PASS;
> +}
> +
> +static void cleanup(void *ctx)
> +{
> + bypass__destroy(ctx);
> +}
> +
> +struct scx_test bypass_test = {
> + .name = "bypass",
> + .description = "Verify tasks complete during bypass mode",
> + .setup = setup,
> + .run = run,
> + .cleanup = cleanup,
> +};
> +REGISTER_SCX_TEST(&bypass_test)
> --
> 2.43.0
>
Thanks,
-Andrea
* Re: [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest
2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
` (4 preceding siblings ...)
2026-03-06 14:03 ` [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path zhidao su
@ 2026-03-06 15:02 ` Andrea Righi
5 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2026-03-06 15:02 UTC (permalink / raw)
To: zhidao su
Cc: tj, sched-ext, linux-kernel, void, changwoo, linux-kselftest,
Su Zhidao
Hi,
On Fri, Mar 06, 2026 at 10:03:20PM +0800, zhidao su wrote:
> From: Su Zhidao <suzhidao@xiaomi.com>
>
> This series does a small cleanup pass on the sched_ext bypass code path
> and adds a selftest for the bypass mechanism.
>
> Patch 1 removes SCX_OPS_HAS_CGROUP_WEIGHT, which was marked deprecated
> in 6.15 with a "will be removed on 6.18" comment. We are now past that
> point.
See:
https://lore.kernel.org/all/20260306073110.229595-1-zhaomzhao@126.com/
>
> Patches 2-3 improve the bypass code in ext.c: add inline comments
> explaining the bypass depth counter semantics and the dequeue/enqueue
> re-queue loop, and replace rcu_dereference_all() with the more precise
> rcu_dereference_bh() in scx_bypass_lb_timerfn() which runs in softirq
> context.
These patches don't really improve code, they just add comments. Which is
nice, it's good to improve documentation, but documentation should help
better understand the high-level semantics, or clarify non-obvious
implementation details. In this case you're just commenting on how the
specific code works, which should already be clear enough just by looking
at the code IMHO.
>
> Patch 4 adds a selftest that verifies forward progress under bypass
> mode: worker processes are spawned while the scheduler is active, then
> bpf_link__destroy() is called (triggering bypass), and the test confirms
> all workers complete successfully.
Already commented on the patch.
>
> Patch 5 adds a comment to the scx_bypass_depth declaration noting its
> planned migration into struct scx_sched.
Ditto about documentation.
Thanks,
-Andrea
end of thread, other threads:[~2026-03-06 15:02 UTC | newest]
Thread overview: 9+ messages
2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
2026-03-06 14:03 ` [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics zhidao su
2026-03-06 14:03 ` [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn() zhidao su
2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
2026-03-06 15:02 ` Andrea Righi
2026-03-06 14:03 ` [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path zhidao su
2026-03-06 15:02 ` [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest Andrea Righi
-- strict thread matches above, loose matches on Subject: below --
2026-03-06 12:49 [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su