* [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest
@ 2026-03-06 14:03 zhidao su
  2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

This series does a small cleanup pass on the sched_ext bypass code path
and adds a selftest for the bypass mechanism.

Patch 1 removes SCX_OPS_HAS_CGROUP_WEIGHT, which was marked deprecated
in 6.15 with a "will be removed on 6.18" comment. We are now past that
point.

Patches 2-3 improve the bypass code in ext.c: add inline comments
explaining the bypass depth counter semantics and the dequeue/enqueue
re-queue loop, and replace rcu_dereference_all() with the more precise
rcu_dereference_bh() in scx_bypass_lb_timerfn() which runs in softirq
context.

Patch 4 adds a selftest that verifies forward progress under bypass
mode: worker processes are spawned while the scheduler is active, then
bpf_link__destroy() is called (triggering bypass), and the test confirms
all workers complete successfully.

Patch 5 adds a comment to the scx_bypass_depth declaration noting its
planned migration into struct scx_sched.

Tested on 6.18.7 with CONFIG_SCHED_CLASS_EXT=y; all existing selftests
pass.

Su Zhidao (5):
  sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag
  sched_ext: Add comments to scx_bypass() for bypass depth semantics
  sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn()
  sched_ext/selftests: Add bypass mode operational test
  sched_ext: Document scx_bypass_depth migration path

 kernel/sched/ext.c                            |  29 ++++-
 kernel/sched/ext_internal.h                   |   8 +-
 .../sched_ext/include/scx/enum_defs.autogen.h |   1 -
 tools/sched_ext/scx_flatcg.bpf.c              |   2 +-
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../testing/selftests/sched_ext/bypass.bpf.c  |  32 ++++++
 tools/testing/selftests/sched_ext/bypass.c    | 105 ++++++++++++++++++
 7 files changed, 165 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/bypass.c

-- 
2.43.0



* [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 14:03 ` [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics zhidao su
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

SCX_OPS_HAS_CGROUP_WEIGHT was deprecated in 6.15 with a comment
'will be removed on 6.18'. Now that we are at 6.18, remove it.

The flag was a no-op and only triggered a pr_warn() on use. Remove
the flag definition, the warning, and update scx_flatcg which was
the last in-tree user.
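
To illustrate what dropping the bit from SCX_OPS_ALL_FLAGS changes, here
is a minimal userspace sketch. The two bit positions come from the
ext_internal.h hunk below; the DEMO_* names, the reduced two-flag masks,
and the helper are hypothetical and only model a generic "flags outside
the known mask" check. This patch does not show whether or where the
kernel performs such a check.

```c
#include <stdint.h>

/* Bit positions taken from the ext_internal.h hunk in this patch. */
#define DEMO_BUILTIN_IDLE_PER_NODE	(1ULL << 6)
#define DEMO_HAS_CGROUP_WEIGHT		(1ULL << 16)	/* removed by this patch */

/* Hypothetical reduced masks: only the two flags above are modelled. */
#define DEMO_ALL_FLAGS_OLD	(DEMO_BUILTIN_IDLE_PER_NODE | DEMO_HAS_CGROUP_WEIGHT)
#define DEMO_ALL_FLAGS_NEW	(DEMO_BUILTIN_IDLE_PER_NODE)

/* Return the bits of @flags that fall outside @allowed (0: all known). */
static uint64_t demo_unknown_flags(uint64_t flags, uint64_t allowed)
{
	return flags & ~allowed;
}
```

With the old mask, a scheduler passing bit 16 has no unknown bits; with
the new mask, the same value reports bit 16 as unknown. Any scheduler
that still sets the flag would therefore need the same one-line update
that scx_flatcg gets here.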

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 kernel/sched/ext.c                              | 3 ---
 kernel/sched/ext_internal.h                     | 8 +-------
 tools/sched_ext/include/scx/enum_defs.autogen.h | 1 -
 tools/sched_ext/scx_flatcg.bpf.c                | 2 +-
 4 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c4ccd685259f..56ff5874af94 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5061,9 +5061,6 @@ static int validate_ops(struct scx_sched *sch, const struct sched_ext_ops *ops)
 		return -EINVAL;
 	}
 
-	if (ops->flags & SCX_OPS_HAS_CGROUP_WEIGHT)
-		pr_warn("SCX_OPS_HAS_CGROUP_WEIGHT is deprecated and a noop\n");
-
 	if (ops->cpu_acquire || ops->cpu_release)
 		pr_warn("ops->cpu_acquire/release() are deprecated, use sched_switch TP instead\n");
 
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index bd26811fea99..3c86c53e1975 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -174,19 +174,13 @@ enum scx_ops_flags {
 	 */
 	SCX_OPS_BUILTIN_IDLE_PER_NODE	= 1LLU << 6,
 
-	/*
-	 * CPU cgroup support flags
-	 */
-	SCX_OPS_HAS_CGROUP_WEIGHT	= 1LLU << 16,	/* DEPRECATED, will be removed on 6.18 */
-
 	SCX_OPS_ALL_FLAGS		= SCX_OPS_KEEP_BUILTIN_IDLE |
 					  SCX_OPS_ENQ_LAST |
 					  SCX_OPS_ENQ_EXITING |
 					  SCX_OPS_ENQ_MIGRATION_DISABLED |
 					  SCX_OPS_ALLOW_QUEUED_WAKEUP |
 					  SCX_OPS_SWITCH_PARTIAL |
-					  SCX_OPS_BUILTIN_IDLE_PER_NODE |
-					  SCX_OPS_HAS_CGROUP_WEIGHT,
+					  SCX_OPS_BUILTIN_IDLE_PER_NODE,
 
 	/* high 8 bits are internal, don't include in SCX_OPS_ALL_FLAGS */
 	__SCX_OPS_INTERNAL_MASK		= 0xffLLU << 56,
diff --git a/tools/sched_ext/include/scx/enum_defs.autogen.h b/tools/sched_ext/include/scx/enum_defs.autogen.h
index dcc945304760..80c885f781ba 100644
--- a/tools/sched_ext/include/scx/enum_defs.autogen.h
+++ b/tools/sched_ext/include/scx/enum_defs.autogen.h
@@ -91,7 +91,6 @@
 #define HAVE_SCX_OPS_SWITCH_PARTIAL
 #define HAVE_SCX_OPS_ENQ_MIGRATION_DISABLED
 #define HAVE_SCX_OPS_ALLOW_QUEUED_WAKEUP
-#define HAVE_SCX_OPS_HAS_CGROUP_WEIGHT
 #define HAVE_SCX_OPS_ALL_FLAGS
 #define HAVE_SCX_OPSS_NONE
 #define HAVE_SCX_OPSS_QUEUEING
diff --git a/tools/sched_ext/scx_flatcg.bpf.c b/tools/sched_ext/scx_flatcg.bpf.c
index 0e785cff0f24..a8a9234bb41e 100644
--- a/tools/sched_ext/scx_flatcg.bpf.c
+++ b/tools/sched_ext/scx_flatcg.bpf.c
@@ -960,5 +960,5 @@ SCX_OPS_DEFINE(flatcg_ops,
 	       .cgroup_move		= (void *)fcg_cgroup_move,
 	       .init			= (void *)fcg_init,
 	       .exit			= (void *)fcg_exit,
-	       .flags			= SCX_OPS_HAS_CGROUP_WEIGHT | SCX_OPS_ENQ_EXITING,
+	       .flags			= SCX_OPS_ENQ_EXITING,
 	       .name			= "flatcg");
-- 
2.43.0



* [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
  2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 14:03 ` [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn() zhidao su
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

The bypass depth counter (scx_bypass_depth) uses WRITE_ONCE/READ_ONCE
to communicate that it can be observed locklessly from IRQ context, even
though modifications are serialized by bypass_lock. The existing code did
not explain this pattern or the re-queue loop's role in propagating the
bypass state change to all CPUs.

Add inline comments to clarify:
- Why bypass_depth uses WRITE_ONCE/READ_ONCE despite lock protection
- How the dequeue/enqueue cycle propagates bypass state to all per-CPU queues
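
The locking pattern described above can be sketched in userspace as
follows. This is an illustration under assumptions, not the ext.c code:
the demo_* names are hypothetical, a small atomic_flag spinlock stands in
for bypass_lock, the READ_ONCE/WRITE_ONCE macros are simplified
volatile-cast versions of the kernel ones, and the exit path is assumed
to mirror the enter path.

```c
#include <stdatomic.h>

/* Simplified userspace stand-ins for the kernel macros (GCC/Clang). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for bypass_lock: serializes all writers. */
static atomic_flag demo_bypass_lock = ATOMIC_FLAG_INIT;
static int demo_bypass_depth;

static void demo_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_bypass_lock,
						 memory_order_acquire))
		;
}

static void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_bypass_lock, memory_order_release);
}

/* Returns 1 if this call made the 0->1 transition (does the setup). */
static int demo_bypass_enter(void)
{
	int first;

	demo_lock();
	/* Plain reads are fine under the lock; the store is marked. */
	WRITE_ONCE(demo_bypass_depth, demo_bypass_depth + 1);
	first = (demo_bypass_depth == 1);
	demo_unlock();
	return first;
}

/* Returns 1 if this call made the 1->0 transition (does the teardown). */
static int demo_bypass_exit(void)
{
	int last;

	demo_lock();
	WRITE_ONCE(demo_bypass_depth, demo_bypass_depth - 1);
	last = (demo_bypass_depth == 0);
	demo_unlock();
	return last;
}

/* Lockless observer, analogous to the check in the timer callback. */
static int demo_bypass_active(void)
{
	return READ_ONCE(demo_bypass_depth) != 0;
}
```

Nested enter calls only bump the counter, and only the outermost
enter/exit pair reports a transition. The observer never takes the lock,
which is why the store side is marked as well.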

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 kernel/sched/ext.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 56ff5874af94..053d99c58802 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4229,6 +4229,14 @@ static void scx_bypass(bool bypass)
 	if (bypass) {
 		u32 intv_us;
 
+		/*
+		 * Increment bypass depth. Only the first caller (depth 0->1)
+		 * needs to set up the bypass state; subsequent callers just
+		 * increment the counter and return. The depth counter is
+		 * protected by bypass_lock but READ_ONCE/WRITE_ONCE are used
+		 * to communicate that the value can be observed locklessly
+		 * (e.g., from scx_bypass_lb_timerfn() in softirq context).
+		 */
 		WRITE_ONCE(scx_bypass_depth, scx_bypass_depth + 1);
 		WARN_ON_ONCE(scx_bypass_depth <= 0);
 		if (scx_bypass_depth != 1)
@@ -4263,6 +4271,10 @@ static void scx_bypass(bool bypass)
 	 *
 	 * This function can't trust the scheduler and thus can't use
 	 * cpus_read_lock(). Walk all possible CPUs instead of online.
+	 *
+	 * The dequeue/enqueue cycle forces tasks through the updated code
+	 * paths: in bypass mode, do_enqueue_task() routes to the per-CPU
+	 * bypass DSQ instead of calling ops.enqueue().
 	 */
 	for_each_possible_cpu(cpu) {
 		struct rq *rq = cpu_rq(cpu);
-- 
2.43.0



* [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn()
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
  2026-03-06 14:03 ` [PATCH 1/5] sched_ext: Remove deprecated SCX_OPS_HAS_CGROUP_WEIGHT flag zhidao su
  2026-03-06 14:03 ` [PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

scx_bypass_lb_timerfn() runs in softirq (BH) context, so
rcu_dereference_bh() is the correct RCU accessor. The previous
rcu_dereference_all() suppresses all sparse warnings and masks
potential RCU context issues.

Add a comment noting this is a transitional state: when
multi-scheduler support lands, the bypass LB timer will become
per-scheduler and the global scx_root reference will be removed.

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 kernel/sched/ext.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 053d99c58802..c269e489902c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4170,7 +4170,13 @@ static void scx_bypass_lb_timerfn(struct timer_list *timer)
 	int node;
 	u32 intv_us;
 
-	sch = rcu_dereference_all(scx_root);
+	/*
+	 * scx_bypass_lb_timer is a global timer that fires in softirq
+	 * context while bypass mode is active. Use rcu_dereference_bh()
+	 * matching the BH context. When multi-scheduler support lands,
+	 * this timer will become per-scheduler instance.
+	 */
+	sch = rcu_dereference_bh(scx_root);
 	if (unlikely(!sch) || !READ_ONCE(scx_bypass_depth))
 		return;
 
-- 
2.43.0



* [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
                   ` (2 preceding siblings ...)
  2026-03-06 14:03 ` [PATCH 3/5] sched_ext: Use rcu_dereference_bh() in scx_bypass_lb_timerfn() zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 15:02   ` Andrea Righi
  2026-03-06 14:03 ` [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path zhidao su
  2026-03-06 15:02 ` [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest Andrea Righi
  5 siblings, 1 reply; 8+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

Add a test that verifies the sched_ext bypass mechanism does not
prevent tasks from running to completion.

The test attaches a minimal global FIFO scheduler, spawns worker
processes that complete a fixed computation, detaches the scheduler
(which triggers bypass mode while workers are still active), and
verifies all workers complete successfully under bypass mode.

This exercises the scheduler attach/detach lifecycle and verifies
that bypass mode (activated during unregistration to guarantee
forward progress) does not stall running tasks.

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../testing/selftests/sched_ext/bypass.bpf.c  |  32 ++++++
 tools/testing/selftests/sched_ext/bypass.c    | 105 ++++++++++++++++++
 3 files changed, 138 insertions(+)
 create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/bypass.c

diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index a3bbe2c7911b..5fb6278d3f97 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -162,6 +162,7 @@ endef
 all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
 
 auto-test-targets :=			\
+	bypass				\
 	create_dsq			\
 	dequeue				\
 	enq_last_no_enq_fails		\
diff --git a/tools/testing/selftests/sched_ext/bypass.bpf.c b/tools/testing/selftests/sched_ext/bypass.bpf.c
new file mode 100644
index 000000000000..cb37c8df6834
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.bpf.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF scheduler for bypass mode operational test.
+ *
+ * Implements a minimal global FIFO scheduler. The userspace side
+ * attaches this scheduler, runs worker tasks to completion, and
+ * verifies that tasks complete successfully.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+UEI_DEFINE(uei);
+
+void BPF_STRUCT_OPS(bypass_enqueue, struct task_struct *p, u64 enq_flags)
+{
+	scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
+}
+
+void BPF_STRUCT_OPS(bypass_exit, struct scx_exit_info *ei)
+{
+	UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops bypass_ops = {
+	.enqueue		= (void *)bypass_enqueue,
+	.exit			= (void *)bypass_exit,
+	.name			= "bypass_test",
+};
diff --git a/tools/testing/selftests/sched_ext/bypass.c b/tools/testing/selftests/sched_ext/bypass.c
new file mode 100644
index 000000000000..952f09d76bdb
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/bypass.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Verify the sched_ext bypass mechanism: spawn worker tasks and ensure
+ * they run to completion while a BPF scheduler is active.
+ *
+ * The bypass mechanism (activated on scheduler unregistration) must
+ * guarantee forward progress. This test verifies that worker tasks
+ * complete successfully when the scheduler is detached.
+ *
+ * Copyright (c) 2026 Xiaomi Corporation.
+ */
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/wait.h>
+#include <bpf/bpf.h>
+#include <scx/common.h>
+#include "scx_test.h"
+#include "bypass.bpf.skel.h"
+
+#define NUM_BYPASS_WORKERS 4
+
+static void worker_fn(void)
+{
+	volatile int sum = 0;
+	int i;
+
+	/*
+	 * Do enough work to still be running when bpf_link__destroy()
+	 * is called, ensuring tasks are active during bypass mode.
+	 */
+	for (i = 0; i < 10000000; i++)
+		sum += i;
+}
+
+static enum scx_test_status setup(void **ctx)
+{
+	struct bypass *skel;
+
+	skel = bypass__open();
+	SCX_FAIL_IF(!skel, "Failed to open bypass skel");
+	SCX_ENUM_INIT(skel);
+	SCX_FAIL_IF(bypass__load(skel), "Failed to load bypass skel");
+
+	*ctx = skel;
+	return SCX_TEST_PASS;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+	struct bypass *skel = ctx;
+	struct bpf_link *link;
+	pid_t pids[NUM_BYPASS_WORKERS];
+	int i, status;
+
+	link = bpf_map__attach_struct_ops(skel->maps.bypass_ops);
+	SCX_FAIL_IF(!link, "Failed to attach bypass scheduler");
+
+	/*
+	 * Spawn worker processes. These must complete successfully
+	 * even as the scheduler is active and then detached (which
+	 * triggers bypass mode).
+	 */
+	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+		pids[i] = fork();
+		SCX_FAIL_IF(pids[i] < 0, "fork() failed for worker %d", i);
+
+		if (pids[i] == 0) {
+			worker_fn();
+			_exit(0);
+		}
+	}
+
+	/*
+	 * Detach the scheduler while workers are still running. This
+	 * triggers bypass mode, which must guarantee forward progress
+	 * for all active tasks.
+	 */
+	bpf_link__destroy(link);
+
+	/* Workers must complete successfully under bypass mode */
+	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
+		SCX_FAIL_IF(waitpid(pids[i], &status, 0) != pids[i],
+			    "waitpid failed for worker %d", i);
+		SCX_FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status) != 0,
+			    "Worker %d did not exit cleanly", i);
+	}
+
+	SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
+
+	return SCX_TEST_PASS;
+}
+
+static void cleanup(void *ctx)
+{
+	bypass__destroy(ctx);
+}
+
+struct scx_test bypass_test = {
+	.name		= "bypass",
+	.description	= "Verify tasks complete during bypass mode",
+	.setup		= setup,
+	.run		= run,
+	.cleanup	= cleanup,
+};
+REGISTER_SCX_TEST(&bypass_test)
-- 
2.43.0



* [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
                   ` (3 preceding siblings ...)
  2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
@ 2026-03-06 14:03 ` zhidao su
  2026-03-06 15:02 ` [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest Andrea Righi
  5 siblings, 0 replies; 8+ messages in thread
From: zhidao su @ 2026-03-06 14:03 UTC (permalink / raw)
  To: tj, sched-ext, linux-kernel
  Cc: void, arighi, changwoo, linux-kselftest, Su Zhidao

From: Su Zhidao <suzhidao@xiaomi.com>

scx_bypass_depth is a global counter that will be moved into
struct scx_sched when multi-scheduler support lands. Add a comment
explaining why READ_ONCE/WRITE_ONCE are used despite bypass_lock
serialization: modifications are serialized by the lock, but the
value can be observed locklessly from softirq context (e.g., in
scx_bypass_lb_timerfn()).

Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
---
 kernel/sched/ext.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c269e489902c..b1e5a95682c1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -33,6 +33,12 @@ static DEFINE_MUTEX(scx_enable_mutex);
 DEFINE_STATIC_KEY_FALSE(__scx_enabled);
 DEFINE_STATIC_PERCPU_RWSEM(scx_fork_rwsem);
 static atomic_t scx_enable_state_var = ATOMIC_INIT(SCX_DISABLED);
+/*
+ * Counts the number of active bypass requests. Protected by bypass_lock
+ * inside scx_bypass(), but read locklessly (e.g., from
+ * scx_bypass_lb_timerfn() in softirq context) using READ_ONCE(). Will
+ * be moved into struct scx_sched when multi-scheduler support lands.
+ */
 static int scx_bypass_depth;
 static cpumask_var_t scx_bypass_lb_donee_cpumask;
 static cpumask_var_t scx_bypass_lb_resched_cpumask;
-- 
2.43.0



* Re: [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test
  2026-03-06 14:03 ` [PATCH 4/5] sched_ext/selftests: Add bypass mode operational test zhidao su
@ 2026-03-06 15:02   ` Andrea Righi
  0 siblings, 0 replies; 8+ messages in thread
From: Andrea Righi @ 2026-03-06 15:02 UTC (permalink / raw)
  To: zhidao su
  Cc: tj, sched-ext, linux-kernel, void, changwoo, linux-kselftest,
	Su Zhidao

Hi,

On Fri, Mar 06, 2026 at 10:03:24PM +0800, zhidao su wrote:
> From: Su Zhidao <suzhidao@xiaomi.com>
> 
> Add a test that verifies the sched_ext bypass mechanism does not
> prevent tasks from running to completion.
> 
> The test attaches a minimal global FIFO scheduler, spawns worker
> processes that complete a fixed computation, detaches the scheduler
> (which triggers bypass mode while workers are still active), and
> verifies all workers complete successfully under bypass mode.
> 
> This exercises the scheduler attach/detach lifecycle and verifies
> that bypass mode (activated during unregistration to guarantee
> forward progress) does not stall running tasks.

I'm not sure this selftest adds much value. Implicitly we're already
testing the validity of bypass in the other sched_ext kselftests: if a task
is missed or gets stuck due to bypass mode, we would trigger a soft lockup,
a hung task timeout, or something similar.

> 
> Signed-off-by: Su Zhidao <suzhidao@xiaomi.com>
> ---
>  tools/testing/selftests/sched_ext/Makefile    |   1 +
>  .../testing/selftests/sched_ext/bypass.bpf.c  |  32 ++++++
>  tools/testing/selftests/sched_ext/bypass.c    | 105 ++++++++++++++++++
>  3 files changed, 138 insertions(+)
>  create mode 100644 tools/testing/selftests/sched_ext/bypass.bpf.c
>  create mode 100644 tools/testing/selftests/sched_ext/bypass.c
> 
> diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
> index a3bbe2c7911b..5fb6278d3f97 100644
> --- a/tools/testing/selftests/sched_ext/Makefile
> +++ b/tools/testing/selftests/sched_ext/Makefile
> @@ -162,6 +162,7 @@ endef
>  all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
>  
>  auto-test-targets :=			\
> +	bypass				\
>  	create_dsq			\
>  	dequeue				\
>  	enq_last_no_enq_fails		\
> diff --git a/tools/testing/selftests/sched_ext/bypass.bpf.c b/tools/testing/selftests/sched_ext/bypass.bpf.c
> new file mode 100644
> index 000000000000..cb37c8df6834
> --- /dev/null
> +++ b/tools/testing/selftests/sched_ext/bypass.bpf.c
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * BPF scheduler for bypass mode operational test.
> + *
> + * Implements a minimal global FIFO scheduler. The userspace side
> + * attaches this scheduler, runs worker tasks to completion, and
> + * verifies that tasks complete successfully.
> + *
> + * Copyright (c) 2026 Xiaomi Corporation.
> + */
> +#include <scx/common.bpf.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +UEI_DEFINE(uei);
> +
> +void BPF_STRUCT_OPS(bypass_enqueue, struct task_struct *p, u64 enq_flags)
> +{
> +	scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
> +}

We could also remove bypass_enqueue(), and the sched_ext core would do
exactly the same thing (implicitly enqueue to SCX_DSQ_GLOBAL).

> +
> +void BPF_STRUCT_OPS(bypass_exit, struct scx_exit_info *ei)
> +{
> +	UEI_RECORD(uei, ei);
> +}
> +
> +SEC(".struct_ops.link")
> +struct sched_ext_ops bypass_ops = {
> +	.enqueue		= (void *)bypass_enqueue,
> +	.exit			= (void *)bypass_exit,
> +	.name			= "bypass_test",
> +};
> diff --git a/tools/testing/selftests/sched_ext/bypass.c b/tools/testing/selftests/sched_ext/bypass.c
> new file mode 100644
> index 000000000000..952f09d76bdb
> --- /dev/null
> +++ b/tools/testing/selftests/sched_ext/bypass.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Verify the sched_ext bypass mechanism: spawn worker tasks and ensure
> + * they run to completion while a BPF scheduler is active.
> + *
> + * The bypass mechanism (activated on scheduler unregistration) must
> + * guarantee forward progress. This test verifies that worker tasks
> + * complete successfully when the scheduler is detached.
> + *
> + * Copyright (c) 2026 Xiaomi Corporation.
> + */
> +#define _GNU_SOURCE
> +#include <unistd.h>
> +#include <sys/wait.h>
> +#include <bpf/bpf.h>
> +#include <scx/common.h>
> +#include "scx_test.h"
> +#include "bypass.bpf.skel.h"
> +
> +#define NUM_BYPASS_WORKERS 4
> +
> +static void worker_fn(void)
> +{
> +	volatile int sum = 0;
> +	int i;
> +
> +	/*
> +	 * Do enough work to still be running when bpf_link__destroy()
> +	 * is called, ensuring tasks are active during bypass mode.
> +	 */
> +	for (i = 0; i < 10000000; i++)
> +		sum += i;
> +}
> +
> +static enum scx_test_status setup(void **ctx)
> +{
> +	struct bypass *skel;
> +
> +	skel = bypass__open();
> +	SCX_FAIL_IF(!skel, "Failed to open bypass skel");
> +	SCX_ENUM_INIT(skel);
> +	SCX_FAIL_IF(bypass__load(skel), "Failed to load bypass skel");
> +
> +	*ctx = skel;
> +	return SCX_TEST_PASS;
> +}
> +
> +static enum scx_test_status run(void *ctx)
> +{
> +	struct bypass *skel = ctx;
> +	struct bpf_link *link;
> +	pid_t pids[NUM_BYPASS_WORKERS];
> +	int i, status;
> +
> +	link = bpf_map__attach_struct_ops(skel->maps.bypass_ops);
> +	SCX_FAIL_IF(!link, "Failed to attach bypass scheduler");
> +
> +	/*
> +	 * Spawn worker processes. These must complete successfully
> +	 * even as the scheduler is active and then detached (which
> +	 * triggers bypass mode).
> +	 */
> +	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
> +		pids[i] = fork();
> +		SCX_FAIL_IF(pids[i] < 0, "fork() failed for worker %d", i);
> +
> +		if (pids[i] == 0) {
> +			worker_fn();
> +			_exit(0);
> +		}
> +	}

There's no synchronization with the parent, so on a fast system the workers
may even finish the loop before the parent ever detaches the scheduler.
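
One possible way to close that race, as a rough sketch only: each worker
checks in over a pipe and then keeps spinning on a shared flag, so the
parent knows all workers are runnable before it detaches and releases
them. The helper name, the DEMO_MAX_WORKERS limit, and the detach
callback are hypothetical; in the selftest the callback would wrap
bpf_link__destroy(link).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS */
#endif
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_MAX_WORKERS 64

/* Returns 0 if all @nr workers exited cleanly, -1 otherwise. */
static int run_synced_workers(int nr, void (*detach)(void))
{
	pid_t pids[DEMO_MAX_WORKERS];
	atomic_int *stop;
	int ready[2], started = 0, i, status, ret = 0;
	char c = 0;

	if (nr < 1 || nr > DEMO_MAX_WORKERS)
		return -1;

	/* Flag shared with the children; anonymous memory starts zeroed. */
	stop = mmap(NULL, sizeof(*stop), PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (stop == MAP_FAILED)
		return -1;
	atomic_init(stop, 0);

	if (pipe(ready)) {
		munmap(stop, sizeof(*stop));
		return -1;
	}

	for (i = 0; i < nr; i++) {
		pids[i] = fork();
		if (pids[i] < 0) {
			ret = -1;
			break;
		}
		if (pids[i] == 0) {
			close(ready[0]);
			/* Announce that this worker is up and running. */
			if (write(ready[1], &c, 1) != 1)
				_exit(1);
			close(ready[1]);
			/* Stay runnable until the parent releases us. */
			while (!atomic_load(stop))
				;
			_exit(0);
		}
		started++;
	}
	close(ready[1]);

	/* Wait until every started worker has checked in. */
	for (i = 0; ret == 0 && i < started; i++)
		if (read(ready[0], &c, 1) != 1)
			ret = -1;
	close(ready[0]);

	/* Workers are known to be spinning: detach, then release them. */
	if (ret == 0 && detach)
		detach();
	atomic_store(stop, 1);

	for (i = 0; i < started; i++) {
		if (waitpid(pids[i], &status, 0) != pids[i] ||
		    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
			ret = -1;
	}
	munmap(stop, sizeof(*stop));
	return ret;
}
```

Passing NULL as the callback just exercises the handshake. Spinning
rather than sleeping keeps the workers on the runqueue while the detach
happens, which is the state the test is meant to cover.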

> +
> +	/*
> +	 * Detach the scheduler while workers are still running. This
> +	 * triggers bypass mode, which must guarantee forward progress
> +	 * for all active tasks.
> +	 */
> +	bpf_link__destroy(link);
> +
> +	/* Workers must complete successfully under bypass mode */
> +	for (i = 0; i < NUM_BYPASS_WORKERS; i++) {
> +		SCX_FAIL_IF(waitpid(pids[i], &status, 0) != pids[i],
> +			    "waitpid failed for worker %d", i);
> +		SCX_FAIL_IF(!WIFEXITED(status) || WEXITSTATUS(status) != 0,
> +			    "Worker %d did not exit cleanly", i);
> +	}
> +
> +	SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
> +
> +	return SCX_TEST_PASS;
> +}
> +
> +static void cleanup(void *ctx)
> +{
> +	bypass__destroy(ctx);
> +}
> +
> +struct scx_test bypass_test = {
> +	.name		= "bypass",
> +	.description	= "Verify tasks complete during bypass mode",
> +	.setup		= setup,
> +	.run		= run,
> +	.cleanup	= cleanup,
> +};
> +REGISTER_SCX_TEST(&bypass_test)
> -- 
> 2.43.0
> 

Thanks,
-Andrea


* Re: [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest
  2026-03-06 14:03 [PATCH 0/5] sched_ext: bypass state machine cleanup and selftest zhidao su
                   ` (4 preceding siblings ...)
  2026-03-06 14:03 ` [PATCH 5/5] sched_ext: Document scx_bypass_depth migration path zhidao su
@ 2026-03-06 15:02 ` Andrea Righi
  5 siblings, 0 replies; 8+ messages in thread
From: Andrea Righi @ 2026-03-06 15:02 UTC (permalink / raw)
  To: zhidao su
  Cc: tj, sched-ext, linux-kernel, void, changwoo, linux-kselftest,
	Su Zhidao

Hi,

On Fri, Mar 06, 2026 at 10:03:20PM +0800, zhidao su wrote:
> From: Su Zhidao <suzhidao@xiaomi.com>
> 
> This series does a small cleanup pass on the sched_ext bypass code path
> and adds a selftest for the bypass mechanism.
> 
> Patch 1 removes SCX_OPS_HAS_CGROUP_WEIGHT, which was marked deprecated
> in 6.15 with a "will be removed on 6.18" comment. We are now past that
> point.

See:
https://lore.kernel.org/all/20260306073110.229595-1-zhaomzhao@126.com/

> 
> Patches 2-3 improve the bypass code in ext.c: add inline comments
> explaining the bypass depth counter semantics and the dequeue/enqueue
> re-queue loop, and replace rcu_dereference_all() with the more precise
> rcu_dereference_bh() in scx_bypass_lb_timerfn() which runs in softirq
> context.

These patches don't really improve the code; they just add comments. That
is nice, and it's good to improve documentation, but documentation should
help readers better understand the high-level semantics or clarify
non-obvious implementation details. In this case you're just describing
how the specific code works, which should already be clear enough from
looking at the code IMHO.

> 
> Patch 4 adds a selftest that verifies forward progress under bypass
> mode: worker processes are spawned while the scheduler is active, then
> bpf_link__destroy() is called (triggering bypass), and the test confirms
> all workers complete successfully.

Already commented on the patch.

> 
> Patch 5 adds a comment to the scx_bypass_depth declaration noting its
> planned migration into struct scx_sched.

Ditto about documentation.

Thanks,
-Andrea

