public inbox for sched-ext@lists.linux.dev
* [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test
@ 2026-03-06 12:49 zhidao su
  0 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 12:49 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

Add a test that verifies the sched_ext bypass mechanism does not
prevent tasks from running to completion.

The test attaches a minimal global FIFO scheduler, spawns worker
processes that complete a fixed computation, detaches the scheduler
(which triggers bypass mode while workers are still active), and
verifies all workers complete successfully under bypass mode.

This exercises the scheduler attach/detach lifecycle and verifies
that bypass mode (activated during unregistration to guarantee
forward progress) does not stall running tasks.

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../testing/selftests/sched_ext/bypass.bpf.c  |  32 ++++++
 tools/testing/selftests/sched_ext/bypass.c    | 105 ++++++++++++++++++
 3 files changed, 138 insertions(+)
 create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/bypass.c

diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index a3bbe2c7911b..5fb6278d3f97 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -162,6 +162,7 @@ endef
 all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
 
 auto-test-targets :=			\
+	bypass				\
 	create_dsq			\
 	dequeue				\
 	enq_last_no_enq_fails		\
diff --git a/tools/testing/selftests/sched_ext/bypass.bpf.c b/tools/testing/selftests/sched_ext/bypass.bpf.c
new file mode 100644
index 000000000000..cb37c8df6834
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.bpf.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF scheduler for bypass mode operational test.
+ *
+ * Implements a minimal global FIFO scheduler. The userspace side
+ * attaches this scheduler, runs worker tasks to completion, and
+ * verifies that tasks complete successfully.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+UEI_DEFINE(uei);
+
+void BPF_STRUCT_OPS(bypass_enqueue, struct task_struct *p, u64 enq_flags)
+{
+	scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
+}
+
+void BPF_STRUCT_OPS(bypass_exit, struct scx_exit_info *ei)
+{
+	UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops bypass_ops = {
+	.enqueue		= (void *)bypass_enqueue,
+	.exit			= (void *)bypass_exit,
+	.name			= "bypass_test",
+};
diff --git a/tools/testing/selftests/sched_ext/bypass.c b/tools/testing/selftests/sched_ext/bypass.c
new file mode 100644
index 000000000000..952f09d76bdb
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Verify the sched_ext bypass mechanism: spawn worker tasks and ensure
+ * they run to completion while a BPF scheduler is active.
+ *
+ * The bypass mechanism (activated on scheduler unregistration) must
+ * guarantee forward progress. This test verifies that worker tasks
+ * complete successfully when the scheduler is detached.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/wait.h>
+#include <bpf/bpf.h>
+#include <scx/common.h>
+#include "scx_test.h"
+#include "bypass.bpf.skel.h"
+
+#define NUM_BYPASS_WORKERS 4
+
+static void worker_fn(void)
+{
+	volatile int sum = 0;
+	int i;
+
+	/*
+	 * Do enough work to still be running when bpf_link__destroy()
+	 * is called, ensuring tasks are active during bypass mode.
+	 */
+	for (i = 0; i < 10000000; i++)
+		sum += i;
+}
+
+static enum scx_test_status setup(void **ctx)
+{
+	struct bypass *skel;
+
+	skel = bypass__open();
+	SCX_FAIL_IF(!skel, "Failed to open bypass skel");
+	SCX_ENUM_INIT(skel);
+	SCX_FAIL_IF(bypass__load(skel), "Failed to load bypass skel");
+
+	*ctx = skel;
+	return SCX_TEST_PASS;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+	struct bypass *skel = ctx;
+	struct bpf_link *link;
+	pid_t pids[NUM_BYPASS_WORKERS];
+	int i, status;
+
+	link = bpf_map__attach_struct_ops(skel->maps.bypass_ops);
+	SCX_FAIL_IF(!link, "Failed to attach bypass scheduler");
+
+	/*
+	 * Spawn worker processes. These must complete successfully
+	 * even as the scheduler is active and then detached (which
+	 * triggers bypass mode).
+	 */
+	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+		pids[i] = fork();
+		SCX_FAIL_IF(pids[i] < 0, "fork() failed for worker %d", i);
+
+		if (pids[i] == 0) {
+			worker_fn();
+			_exit(0);
+		}
+	}
+
+	/*
+	 * Detach the scheduler while workers are still running. This
+	 * triggers bypass mode, which must guarantee forward progress
+	 * for all active tasks.
+	 */
+	bpf_link__destroy(link);
+
+	/* Workers must complete successfully under bypass mode */
+	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+		SCX_FAIL_IF(waitpid(pids[i], &status, 0) != pids[i],
+			    "waitpid failed for worker %d", i);
+		SCX_FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status) != 0,
+			    "Worker %d did not exit cleanly", i);
+	}
+
+	SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
+
+	return SCX_TEST_PASS;
+}
+
+static void cleanup(void *ctx)
+{
+	bypass__destroy(ctx);
+}
+
+struct scx_test bypass_test = {
+	.name		= "bypass",
+	.description	= "Verify tasks complete during bypass mode",
+	.setup		= setup,
+	.run		= run,
+	.cleanup	= cleanup,
+};
+REGISTER_SCX_TEST(&bypass_test)
-- 
2.43.0



* [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest
@ 2026-03-06 14:03 zhidao su
  2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

This series does a small cleanup pass on the sched_ext bypass code path
and adds a selftest for the bypass mechanism.

Patch 1 removes SCX_OPS_HAS_CGROUP_WEIGHT, which was marked deprecated
in 6.15 with a "will be removed on 6.18" comment. We are now past that
point.

Patches 2-3 improve the bypass code in ext.c: add inline comments
explaining the bypass depth counter semantics and the dequeue/enqueue
re-queue loop, and replace rcu_dereference_all() with the more precise
rcu_dereference_bh() in scx_bypass_lb_timerfn() which runs in softirq
context.

Patch 4 adds a selftest that verifies forward progress under bypass
mode: worker processes are spawned while the scheduler is active, then
bpf_link__destroy() is called (triggering bypass), and the test confirms
all workers complete successfully.

Patch 5 adds a comment to the scx_bypass_depth declaration noting its
planned migration into struct scx_sched.

Tested on 6.18.7 with CONFIG_SCHED_CLASS_EXT=y; all existing selftests
pass.

Su Zhidao (5):
  sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag
  sched_ext: Add comments to scx_bypass() for bypass depth semantics
  sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn()
  sched_ext/selftests: Add bypass mode operational test
  sched_ext: Document scx_bypass_depth migration path

 kernel/sched/ext.c                            |  29 ++++-
 kernel/sched/ext_internal.h                   |   8 +-
 .../sched_ext/include/scx/enum_defs.autogen.h |   1 -
 tools/sched_ext/scx_flatcg.bpf.c              |   2 +-
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../testing/selftests/sched_ext/bypass.bpf.c  |  32 ++++++
 tools/testing/selftests/sched_ext/bypass.c    | 105 ++++++++++++++++++
 7 files changed, 165 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/bypass.c

-- 
2.43.0



* [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 14:03 ` [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics zhidao su
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

SCX_OPS_HAS_CGROUP_WEIGHT was deprecated in 6.15 with a comment
'will be removed on 6.18'. Now that we are at 6.18, remove it.

The flag was a no-op and only triggered a pr_warn() on use. Remove
the flag definition, the warning, and update scx_flatcg which was
the last in-tree user.

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 kernel/sched/ext.c                              | 3 ---
 kernel/sched/ext_internal.h                     | 8 +-------
 tools/sched_ext/include/scx/enum_defs.autogen.h | 1 -
 tools/sched_ext/scx_flatcg.bpf.c                | 2 +-
 4 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c4ccd685259f..56ff5874af94 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5061,9 +5061,6 @@ static int validate_ops(struct scx_sched *sch, const struct sched_ext_ops *ops)
 		return -EINVAL;
 	}
 
-	if (ops->flags & SCX_OPS_HAS_CGROUP_WEIGHT)
-		pr_warn("SCX_OPS_HAS_CGROUP_WEIGHT is deprecated and a noop\n");
-
 	if (ops->cpu_acquire || ops->cpu_release)
 		pr_warn("ops->cpu_acquire/release() are deprecated, use sched_switch TP instead\n");
 
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index bd26811fea99..3c86c53e1975 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -174,19 +174,13 @@ enum scx_ops_flags {
 	 */
 	SCX_OPS_BUILTIN_IDLE_PER_NODE	= 1LLU << 6,
 
-	/*
-	 * CPU cgroup support flags
-	 */
-	SCX_OPS_HAS_CGROUP_WEIGHT	= 1LLU << 16,	/* DEPRECATED, will be removed on 6.18 */
-
 	SCX_OPS_ALL_FLAGS		= SCX_OPS_KEEP_BUILTIN_IDLE |
 					  SCX_OPS_ENQ_LAST |
 					  SCX_OPS_ENQ_EXITING |
 					  SCX_OPS_ENQ_MIGRATION_DISABLED |
 					  SCX_OPS_ALLOW_QUEUED_WAKEUP |
 					  SCX_OPS_SWITCH_PARTIAL |
-					  SCX_OPS_BUILTIN_IDLE_PER_NODE |
-					  SCX_OPS_HAS_CGROUP_WEIGHT,
+					  SCX_OPS_BUILTIN_IDLE_PER_NODE,
 
 	/* high 8 bits are internal, don't include in SCX_OPS_ALL_FLAGS */
 	__SCX_OPS_INTERNAL_MASK		= 0xffLLU << 56,
diff --git a/tools/sched_ext/include/scx/enum_defs.autogen.h b/tools/sched_ext/include/scx/enum_defs.autogen.h
index dcc945304760..80c885f781ba 100644
--- a/tools/sched_ext/include/scx/enum_defs.autogen.h
+++ b/tools/sched_ext/include/scx/enum_defs.autogen.h
@@ -91,7 +91,6 @@
 #define HAVE_SCX_OPS_SWITCH_PARTIAL
 #define HAVE_SCX_OPS_ENQ_MIGRATION_DISABLED
 #define HAVE_SCX_OPS_ALLOW_QUEUED_WAKEUP
-#define HAVE_SCX_OPS_HAS_CGROUP_WEIGHT
 #define HAVE_SCX_OPS_ALL_FLAGS
 #define HAVE_SCX_OPSS_NONE
 #define HAVE_SCX_OPSS_QUEUEING
diff --git a/tools/sched_ext/scx_flatcg.bpf.c b/tools/sched_ext/scx_flatcg.bpf.c
index 0e785cff0f24..a8a9234bb41e 100644
--- a/tools/sched_ext/scx_flatcg.bpf.c
+++ b/tools/sched_ext/scx_flatcg.bpf.c
@@ -960,5 +960,5 @@ SCX_OPS_DEFINE(flatcg_ops,
 	       .cgroup_move		= (void *)fcg_cgroup_move,
 	       .init			= (void *)fcg_init,
 	       .exit			= (void *)fcg_exit,
-	       .flags			= SCX_OPS_HAS_CGROUP_WEIGHT | SCX_OPS_ENQ_EXITING,
+	       .flags			= SCX_OPS_ENQ_EXITING,
 	       .name			= "flatcg");
-- 
2.43.0



* [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
  2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 14:03 ` [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn() zhidao su
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

The bypass depth counter (scx_bypass_depth) uses WRITE_ONCE/READ_ONCE
to communicate that it can be observed locklessly from softirq context, even
though modifications are serialized by bypass_lock. The existing code did
not explain this pattern or the re-queue loop's role in propagating the
bypass state change to all CPUs.

Add inline comments to clarify:
- Why bypass_depth uses WRITE_ONCE/READ_ONCE despite lock protection
- How the dequeue/enqueue cycle propagates bypass state to all per-CPU queues

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 kernel/sched/ext.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 56ff5874af94..053d99c58802 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4229,6 +4229,14 @@ static void scx_bypass(bool bypass)
 	if (bypass) {
 		u32 intv_us;
 
+		/*
+		 * Increment bypass depth. Only the first caller (depth 0->1)
+		 * needs to set up the bypass state; subsequent callers just
+		 * increment the counter and return. The depth counter is
+		 * protected by bypass_lock but READ_ONCE/WRITE_ONCE are used
+		 * to communicate that the value can be observed locklessly
+		 * (e.g., from scx_bypass_lb_timerfn() in softirq context).
+		 */
 		WRITE_ONCE(scx_bypass_depth, scx_bypass_depth + 1);
 		WARN_ON_ONCE(scx_bypass_depth <= 0);
 		if (scx_bypass_depth != 1)
@@ -4263,6 +4271,10 @@ static void scx_bypass(bool bypass)
 	 *
 	 * This function can't trust the scheduler and thus can't use
 	 * cpus_read_lock(). Walk all possible CPUs instead of online.
+	 *
+	 * The dequeue/enqueue cycle forces tasks through the updated code
+	 * paths: in bypass mode, do_enqueue_task() routes to the per-CPU
+	 * bypass DSQ instead of calling ops.enqueue().
 	 */
 	for_each_possible_cpu(cpu) {
 		struct rq *rq = cpu_rq(cpu);
-- 
2.43.0



* [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn()
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
  2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
  2026-03-06 14:03 ` [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

scx_bypass_lb_timerfn() runs in softirq (BH) context, so
rcu_dereference_bh() is the correct RCU accessor. The
rcu_dereference_all() it replaces suppresses all sparse warnings and
can mask RCU context issues.

Add a comment noting this is a transitional state: when
multi-scheduler support lands, the bypass LB timer will become
per-scheduler and the global scx_root reference will be removed.

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 kernel/sched/ext.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 053d99c58802..c269e489902c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4170,7 +4170,13 @@ static void scx_bypass_lb_timerfn(struct timer_list *timer)
 	int node;
 	u32 intv_us;
 
-	sch = rcu_dereference_all(scx_root);
+	/*
+	 * scx_bypass_lb_timer is a global timer that fires in softirq
+	 * context while bypass mode is active. Use rcu_dereference_bh()
+	 * matching the BH context. When multi-scheduler support lands,
+	 * this timer will become per-scheduler instance.
+	 */
+	sch = rcu_dereference_bh(scx_root);
 	if (unlikely(!sch) || !READ_ONCE(scx_bypass_depth))
 		return;
 
-- 
2.43.0



* [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
                   ` (2 preceding siblings ...)
  2026-03-06 14:03 ` [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn() zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 15:02   ` Andrea Righi
  2026-03-06 14:03 ` [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path zhidao su
  2026-03-06 15:02 ` [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest Andrea Righi
  5 siblings, 1 reply; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

Add a test that verifies the sched_ext bypass mechanism does not
prevent tasks from running to completion.

The test attaches a minimal global FIFO scheduler, spawns worker
processes that complete a fixed computation, detaches the scheduler
(which triggers bypass mode while workers are still active), and
verifies all workers complete successfully under bypass mode.

This exercises the scheduler attach/detach lifecycle and verifies
that bypass mode (activated during unregistration to guarantee
forward progress) does not stall running tasks.

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../testing/selftests/sched_ext/bypass.bpf.c  |  32 ++++++
 tools/testing/selftests/sched_ext/bypass.c    | 105 ++++++++++++++++++
 3 files changed, 138 insertions(+)
 create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/bypass.c

diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index a3bbe2c7911b..5fb6278d3f97 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -162,6 +162,7 @@ endef
 all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
 
 auto-test-targets :=			\
+	bypass				\
 	create_dsq			\
 	dequeue				\
 	enq_last_no_enq_fails		\
diff --git a/tools/testing/selftests/sched_ext/bypass.bpf.c b/tools/testing/selftests/sched_ext/bypass.bpf.c
new file mode 100644
index 000000000000..cb37c8df6834
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.bpf.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF scheduler for bypass mode operational test.
+ *
+ * Implements a minimal global FIFO scheduler. The userspace side
+ * attaches this scheduler, runs worker tasks to completion, and
+ * verifies that tasks complete successfully.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+UEI_DEFINE(uei);
+
+void BPF_STRUCT_OPS(bypass_enqueue, struct task_struct *p, u64 enq_flags)
+{
+	scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
+}
+
+void BPF_STRUCT_OPS(bypass_exit, struct scx_exit_info *ei)
+{
+	UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops bypass_ops = {
+	.enqueue		= (void *)bypass_enqueue,
+	.exit			= (void *)bypass_exit,
+	.name			= "bypass_test",
+};
diff --git a/tools/testing/selftests/sched_ext/bypass.c b/tools/testing/selftests/sched_ext/bypass.c
new file mode 100644
index 000000000000..952f09d76bdb
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Verify the sched_ext bypass mechanism: spawn worker tasks and ensure
+ * they run to completion while a BPF scheduler is active.
+ *
+ * The bypass mechanism (activated on scheduler unregistration) must
+ * guarantee forward progress. This test verifies that worker tasks
+ * complete successfully when the scheduler is detached.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/wait.h>
+#include <bpf/bpf.h>
+#include <scx/common.h>
+#include "scx_test.h"
+#include "bypass.bpf.skel.h"
+
+#define NUM_BYPASS_WORKERS 4
+
+static void worker_fn(void)
+{
+	volatile int sum = 0;
+	int i;
+
+	/*
+	 * Do enough work to still be running when bpf_link__destroy()
+	 * is called, ensuring tasks are active during bypass mode.
+	 */
+	for (i = 0; i < 10000000; i++)
+		sum += i;
+}
+
+static enum scx_test_status setup(void **ctx)
+{
+	struct bypass *skel;
+
+	skel = bypass__open();
+	SCX_FAIL_IF(!skel, "Failed to open bypass skel");
+	SCX_ENUM_INIT(skel);
+	SCX_FAIL_IF(bypass__load(skel), "Failed to load bypass skel");
+
+	*ctx = skel;
+	return SCX_TEST_PASS;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+	struct bypass *skel = ctx;
+	struct bpf_link *link;
+	pid_t pids[NUM_BYPASS_WORKERS];
+	int i, status;
+
+	link = bpf_map__attach_struct_ops(skel->maps.bypass_ops);
+	SCX_FAIL_IF(!link, "Failed to attach bypass scheduler");
+
+	/*
+	 * Spawn worker processes. These must complete successfully
+	 * even as the scheduler is active and then detached (which
+	 * triggers bypass mode).
+	 */
+	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+		pids[i] = fork();
+		SCX_FAIL_IF(pids[i] < 0, "fork() failed for worker %d", i);
+
+		if (pids[i] == 0) {
+			worker_fn();
+			_exit(0);
+		}
+	}
+
+	/*
+	 * Detach the scheduler while workers are still running. This
+	 * triggers bypass mode, which must guarantee forward progress
+	 * for all active tasks.
+	 */
+	bpf_link__destroy(link);
+
+	/* Workers must complete successfully under bypass mode */
+	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+		SCX_FAIL_IF(waitpid(pids[i], &status, 0) != pids[i],
+			    "waitpid failed for worker %d", i);
+		SCX_FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status) != 0,
+			    "Worker %d did not exit cleanly", i);
+	}
+
+	SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
+
+	return SCX_TEST_PASS;
+}
+
+static void cleanup(void *ctx)
+{
+	bypass__destroy(ctx);
+}
+
+struct scx_test bypass_test = {
+	.name		= "bypass",
+	.description	= "Verify tasks complete during bypass mode",
+	.setup		= setup,
+	.run		= run,
+	.cleanup	= cleanup,
+};
+REGISTER_SCX_TEST(&bypass_test)
-- 
2.43.0



* [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
                   ` (3 preceding siblings ...)
  2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 15:02 ` [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest Andrea Righi
  5 siblings, 0 replies; 9+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

scx_bypass_depth is a global counter that will be moved into
struct scx_sched when multi-scheduler support lands. Add a comment
explaining why READ_ONCE/WRITE_ONCE are used despite bypass_lock
serialization: modifications are serialized by the lock, but the
value can be observed locklessly from softirq context (e.g., in
scx_bypass_lb_timerfn()).

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 kernel/sched/ext.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c269e489902c..b1e5a95682c1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -33,6 +33,12 @@ static DEFINE_MUTEX(scx_enable_mutex);
 DEFINE_STATIC_KEY_FALSE(__scx_enabled);
 DEFINE_STATIC_PERCPU_RWSEM(scx_fork_rwsem);
 static atomic_t scx_enable_state_var = ATOMIC_INIT(SCX_DISABLED);
+/*
+ * Counts the number of active bypass requests. Protected by bypass_lock
+ * inside scx_bypass(), but read locklessly (e.g., from
+ * scx_bypass_lb_timerfn() in softirq context) using READ_ONCE(). Will
+ * be moved into struct scx_sched when multi-scheduler support lands.
+ */
 static int scx_bypass_depth;
 static cpumask_var_t scx_bypass_lb_donee_cpumask;
 static cpumask_var_t scx_bypass_lb_resched_cpumask;
-- 
2.43.0



* Re: [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test
  2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
@ 2026-03-06 15:02   ` Andrea Righi
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2026-03-06 15:02 UTC (permalink / raw)
  To: zhidao su
  Cc: tj, sched-ext, linux-kernel, void, changwoo, linux-kselftest,
	Su Zhidao

Hi,

On Fri, Mar 06, 2026 at 10:03:24PM +0800, zhidao su wrote:
> From: Su Zhidao <suzhidao@xiaomi.com>
> 
> Add a test that verifies the sched_ext bypass mechanism does not
> prevent tasks from running to completion.
> 
> The test attaches a minimal global FIFO scheduler, spawns worker
> processes that complete a fixed computation, detaches the scheduler
> (which triggers bypass mode while workers are still active), and
> verifies all workers complete successfully under bypass mode.
> 
> This exercises the scheduler attach/detach lifecycle and verifies
> that bypass mode (activated during unregistration to guarantee
> forward progress) does not stall running tasks.

I'm not sure this selftest adds much value. Implicitly we're already
testing the validity of bypass in the other sched_ext kselftests: if a task
is missed or gets stuck due to bypass mode, we would trigger a soft lockup,
a hung task timeout, or something similar.

> 
> Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
> ---
>  tools/testing/selftests/sched_ext/Makefile    |   1 +
>  .../testing/selftests/sched_ext/bypass.bpf.c  |  32 ++++++
>  tools/testing/selftests/sched_ext/bypass.c    | 105 ++++++++++++++++++
>  3 files changed, 138 insertions(+)
>  create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
>  create mode 100644 tools/testing/selftests/sched_ext/bypass.c
> 
> diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
> index a3bbe2c7911b..5fb6278d3f97 100644
> --- a/tools/testing/selftests/sched_ext/Makefile
> +++ b/tools/testing/selftests/sched_ext/Makefile
> @@ -162,6 +162,7 @@ endef
>  all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
>  
>  auto-test-targets :=			\
> +	bypass				\
>  	create_dsq			\
>  	dequeue				\
>  	enq_last_no_enq_fails		\
> diff --git a/tools/testing/selftests/sched_ext/bypass.bpf.c b/tools/testing/selftests/sched_ext/bypass.bpf.c
> new file mode 100644
> index 000000000000..cb37c8df6834
> --- /dev/null
> +++ b/tools/testing/selftests/sched_ext/bypass.bpf.c
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * BPF scheduler for bypass mode operational test.
> + *
> + * Implements a minimal global FIFO scheduler. The userspace side
> + * attaches this scheduler, runs worker tasks to completion, and
> + * verifies that tasks complete successfully.
> + *
> + * Copyright (c) 2026 Xiaomi Corporation.
> + */
> +#include <scx/common.bpf.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +UEI_DEFINE(uei);
> +
> +void BPF_STRUCT_OPS(bypass_enqueue, struct task_struct *p, u64 enq_flags)
> +{
> +	scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
> +}

We could also remove bypass_enqueue() and sched_ext core will do exactly
the same (implicitly enqueue to SCX_DSQ_GLOBAL).
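Roughly, i.e. an untested sketch against the struct_ops layout in this patch:

```c
/* No .enqueue: the sched_ext core falls back to inserting tasks
 * into SCX_DSQ_GLOBAL, which is exactly what bypass_enqueue()
 * did by hand. */
SEC(".struct_ops.link")
struct sched_ext_ops bypass_ops = {
	.exit			= (void *)bypass_exit,
	.name			= "bypass_test",
};
```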

> +
> +void BPF_STRUCT_OPS(bypass_exit, struct scx_exit_info *ei)
> +{
> +	UEI_RECORD(uei, ei);
> +}
> +
> +SEC(".struct_ops.link")
> +struct sched_ext_ops bypass_ops = {
> +	.enqueue		= (void *)bypass_enqueue,
> +	.exit			= (void *)bypass_exit,
> +	.name			= "bypass_test",
> +};
> diff --git a/tools/testing/selftests/sched_ext/bypass.c b/tools/testing/selftests/sched_ext/bypass.c
> new file mode 100644
> index 000000000000..952f09d76bdb
> --- /dev/null
> +++ b/tools/testing/selftests/sched_ext/bypass.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Verify the sched_ext bypass mechanism: spawn worker tasks and ensure
> + * they run to completion while a BPF scheduler is active.
> + *
> + * The bypass mechanism (activated on scheduler unregistration) must
> + * guarantee forward progress. This test verifies that worker tasks
> + * complete successfully when the scheduler is detached.
> + *
> + * Copyright (c) 2026 Xiaomi Corporation.
> + */
> +#define _GNU_SOURCE
> +#include <unistd.h>
> +#include <sys/wait.h>
> +#include <bpf/bpf.h>
> +#include <scx/common.h>
> +#include "scx_test.h"
> +#include "bypass.bpf.skel.h"
> +
> +#define NUM_BYPASS_WORKERS 4
> +
> +static void worker_fn(void)
> +{
> +	volatile int sum = 0;
> +	int i;
> +
> +	/*
> +	 * Do enough work to still be running when bpf_link__destroy()
> +	 * is called, ensuring tasks are active during bypass mode.
> +	 */
> +	for (i = 0; i < 10000000; i++)
> +		sum += i;
> +}
> +
> +static enum scx_test_status setup(void **ctx)
> +{
> +	struct bypass *skel;
> +
> +	skel = bypass__open();
> +	SCX_FAIL_IF(!skel, "Failed to open bypass skel");
> +	SCX_ENUM_INIT(skel);
> +	SCX_FAIL_IF(bypass__load(skel), "Failed to load bypass skel");
> +
> +	*ctx = skel;
> +	return SCX_TEST_PASS;
> +}
> +
> +static enum scx_test_status run(void *ctx)
> +{
> +	struct bypass *skel = ctx;
> +	struct bpf_link *link;
> +	pid_t pids[NUM_BYPASS_WORKERS];
> +	int i, status;
> +
> +	link = bpf_map__attach_struct_ops(skel->maps.bypass_ops);
> +	SCX_FAIL_IF(!link, "Failed to attach bypass scheduler");
> +
> +	/*
> +	 * Spawn worker processes. These must complete successfully
> +	 * even as the scheduler is active and then detached (which
> +	 * triggers bypass mode).
> +	 */
> +	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
> +		pids[i] = fork();
> +		SCX_FAIL_IF(pids[i] < 0, "fork() failed for worker %d", i);
> +
> +		if (pids[i] == 0) {
> +			worker_fn();
> +			_exit(0);
> +		}
> +	}

There's no synchronization with the parent, so on a fast system the workers
may even finish the loop before the parent ever detaches the scheduler.
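A sketch of one possible handshake (hypothetical helper names, untested against the selftest harness): each worker writes a readiness byte to a pipe before starting its computation, and the parent waits for all of them before detaching. Note this only guarantees the workers have started, not that they are still mid-loop when bypass engages:

```c
#include <unistd.h>

/* Hypothetical readiness pipe for the test; ready_fds would be
 * set up with pipe() in run() before the fork loop. */
static int ready_fds[2];

/* Called by each child just before worker_fn(). */
static void worker_signal_ready(void)
{
	char c = 'r';

	if (write(ready_fds[1], &c, 1) != 1)
		_exit(1);
}

/* Called by the parent before bpf_link__destroy(link).
 * Returns the number of workers that checked in. */
static int wait_for_workers(int nr)
{
	char c;
	int seen = 0;

	while (seen < nr && read(ready_fds[0], &c, 1) == 1)
		seen++;
	return seen;
}
```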

> +
> +	/*
> +	 * Detach the scheduler while workers are still running. This
> +	 * triggers bypass mode, which must guarantee forward progress
> +	 * for all active tasks.
> +	 */
> +	bpf_link__destroy(link);
> +
> +	/* Workers must complete successfully under bypass mode */
> +	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
> +		SCX_FAIL_IF(waitpid(pids[i], &status, 0) != pids[i],
> +			    "waitpid failed for worker %d", i);
> +		SCX_FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status) != 0,
> +			    "Worker %d did not exit cleanly", i);
> +	}
> +
> +	SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
> +
> +	return SCX_TEST_PASS;
> +}
> +
> +static void cleanup(void *ctx)
> +{
> +	bypass__destroy(ctx);
> +}
> +
> +struct scx_test bypass_test = {
> +	.name		= "bypass",
> +	.description	= "Verify tasks complete during bypass mode",
> +	.setup		= setup,
> +	.run		= run,
> +	.cleanup	= cleanup,
> +};
> +REGISTER_SCX_TEST(&bypass_test)
> -- 
> 2.43.0
> 

Thanks,
-Andrea


* Re: [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
                   ` (4 preceding siblings ...)
  2026-03-06 14:03 ` [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path zhidao su
@ 2026-03-06 15:02 ` Andrea Righi
  5 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2026-03-06 15:02 UTC (permalink / raw)
  To: zhidao su
  Cc: tj, sched-ext, linux-kernel, void, changwoo, linux-kselftest,
	Su Zhidao

Hi,

On Fri, Mar 06, 2026 at 10:03:20PM +0800, zhidao su wrote:
> From: Su Zhidao <suzhidao@xiaomi.com>
> 
> This series does a small cleanup pass on the sched_ext bypass code path
> and adds a selftest for the bypass mechanism.
> 
> Patch 1 removes SCX_OPS_HAS_CGROUP_WEIGHT, which was marked deprecated
> in 6.15 with a "will be removed on 6.18" comment. We are now past that
> point.

See:
https://lore.kernel.org/all/20260306073110.229595-1-zhaomzhao@126.com/

> 
> Patches 2-3 improve the bypass code in ext.c: add inline comments
> explaining the bypass depth counter semantics and the dequeue/enqueue
> re-queue loop, and replace rcu_dereference_all() with the more precise
> rcu_dereference_bh() in scx_bypass_lb_timerfn() which runs in softirq
> context.

These patches don't really improve the code, they just add comments. Which
is nice, it's good to improve documentation, but documentation should help
better understand the high-level semantics, or clarify non-obvious
implementation details. In this case you're just commenting on how the
specific code works, which should already be clear enough just by looking
at the code IMHO.

> 
> Patch 4 adds a selftest that verifies forward progress under bypass
> mode: worker processes are spawned while the scheduler is active, then
> bpf_link__destroy() is called (triggering bypass), and the test confirms
> all workers complete successfully.

Already commented on the patch.

> 
> Patch 5 adds a comment to the scx_bypass_depth declaration noting its
> planned migration into struct scx_sched.

Ditto about documentation.

Thanks,
-Andrea


end of thread, other threads:[~2026-03-06 15:02 UTC | newest]

Thread overview: 9+ messages
2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
2026-03-06 14:03 ` [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics zhidao su
2026-03-06 14:03 ` [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn() zhidao su
2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
2026-03-06 15:02   ` Andrea Righi
2026-03-06 14:03 ` [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path zhidao su
2026-03-06 15:02 ` [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest Andrea Righi
  -- strict thread matches above, loose matches on Subject: below --
2026-03-06 12:49 [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
