* [PATCH v3 00/10] Add a deadline server for sched_ext tasks
@ 2025-06-13 5:17 Joel Fernandes
2025-06-13 5:17 ` [PATCH v3 10/10] selftests/sched_ext: Add test for sched_ext dl_server Joel Fernandes
2025-06-13 17:35 ` [PATCH v3 00/10] Add a deadline server for sched_ext tasks Joel Fernandes
0 siblings, 2 replies; 5+ messages in thread
From: Joel Fernandes @ 2025-06-13 5:17 UTC (permalink / raw)
To: linux-kernel
Cc: Joel Fernandes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Tejun Heo, David Vernet,
Andrea Righi, Changwoo Min, bpf
sched_ext tasks currently are starved by RT hoggers especially since RT
throttling was replaced by deadline servers to boost only CFS tasks. Several
users in the community have reported issues with RT stalling sched_ext tasks.
Add a sched_ext deadline server as well so that sched_ext tasks are also
boosted and do not suffer starvation.
A kselftest is also provided to verify the starvation issues are now fixed.
Btw, there is still something funky going on with CPU hotplug and the
relinquish patch. Sometimes the sched_ext's hotplug self-test locks up
(./runner -t hotplug). Reverting that patch fixes it, so I am suspecting
something is off in dl_server_remove_params() when it is being called on
offline CPUs.
v2->v3:
- Removed code duplication in debugfs. Made ext interface separate.
- Fixed issue where rq_lock_irqsave was not used in the relinquish patch.
- Fixed running bw accounting issue in dl_server_remove_params.
Link to v1: https://lore.kernel.org/all/20250315022158.2354454-1-joelagnelf@nvidia.com/
Link to v2: https://lore.kernel.org/all/20250602180110.816225-1-joelagnelf@nvidia.com/
Andrea Righi (1):
selftests/sched_ext: Add test for sched_ext dl_server
Joel Fernandes (9):
sched/debug: Fix updating of ppos on server write ops
sched/debug: Stop and start server based on if it was active
sched/deadline: Clear the defer params
sched: Add support to pick functions to take rf
sched: Add a server arg to dl_server_update_idle_time()
sched/ext: Add a DL server for sched_ext tasks
sched/debug: Add support to change sched_ext server params
sched/deadline: Add support to remove DL server bandwidth
sched/ext: Relinquish DL server reservations when not needed
include/linux/sched.h | 2 +-
kernel/sched/core.c | 19 +-
kernel/sched/deadline.c | 78 +++++--
kernel/sched/debug.c | 171 +++++++++++---
kernel/sched/ext.c | 108 ++++++++-
kernel/sched/fair.c | 15 +-
kernel/sched/idle.c | 4 +-
kernel/sched/rt.c | 2 +-
kernel/sched/sched.h | 13 +-
kernel/sched/stop_task.c | 2 +-
tools/testing/selftests/sched_ext/Makefile | 1 +
.../selftests/sched_ext/rt_stall.bpf.c | 23 ++
tools/testing/selftests/sched_ext/rt_stall.c | 213 ++++++++++++++++++
13 files changed, 579 insertions(+), 72 deletions(-)
create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c
create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c
--
2.34.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 10/10] selftests/sched_ext: Add test for sched_ext dl_server
2025-06-13 5:17 [PATCH v3 00/10] Add a deadline server for sched_ext tasks Joel Fernandes
@ 2025-06-13 5:17 ` Joel Fernandes
2025-06-13 17:35 ` [PATCH v3 00/10] Add a deadline server for sched_ext tasks Joel Fernandes
1 sibling, 0 replies; 5+ messages in thread
From: Joel Fernandes @ 2025-06-13 5:17 UTC (permalink / raw)
To: linux-kernel, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
Shuah Khan
Cc: Joel Fernandes, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, linux-kselftest, bpf
From: Andrea Righi <arighi@nvidia.com>
Add a selftest to validate the correct behavior of the deadline server
for the ext_sched_class.
[ Joel: Replaced occurrences of CFS in the test with EXT. ]
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
tools/testing/selftests/sched_ext/Makefile | 1 +
.../selftests/sched_ext/rt_stall.bpf.c | 23 ++
tools/testing/selftests/sched_ext/rt_stall.c | 213 ++++++++++++++++++
3 files changed, 237 insertions(+)
create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c
create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c
diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index f4531327b8e7..dcc803eeab39 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -181,6 +181,7 @@ auto-test-targets := \
select_cpu_dispatch_bad_dsq \
select_cpu_dispatch_dbl_dsp \
select_cpu_vtime \
+ rt_stall \
test_example \
testcase-targets := $(addsuffix .o,$(addprefix $(SCXOBJ_DIR)/,$(auto-test-targets)))
diff --git a/tools/testing/selftests/sched_ext/rt_stall.bpf.c b/tools/testing/selftests/sched_ext/rt_stall.bpf.c
new file mode 100644
index 000000000000..80086779dd1e
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/rt_stall.bpf.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * A scheduler that verifies whether RT tasks can stall SCHED_EXT tasks.
+ *
+ * Copyright (c) 2025 NVIDIA Corporation.
+ */
+
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+UEI_DEFINE(uei);
+
+void BPF_STRUCT_OPS(rt_stall_exit, struct scx_exit_info *ei)
+{
+ UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops rt_stall_ops = {
+ .exit = (void *)rt_stall_exit,
+ .name = "rt_stall",
+};
diff --git a/tools/testing/selftests/sched_ext/rt_stall.c b/tools/testing/selftests/sched_ext/rt_stall.c
new file mode 100644
index 000000000000..d4cb545ebfd8
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/rt_stall.c
@@ -0,0 +1,213 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2025 NVIDIA Corporation.
+ */
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sched.h>
+#include <sys/prctl.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <linux/sched.h>
+#include <signal.h>
+#include <bpf/bpf.h>
+#include <scx/common.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include "rt_stall.bpf.skel.h"
+#include "scx_test.h"
+#include "../kselftest.h"
+
+#define CORE_ID 0 /* CPU to pin tasks to */
+#define RUN_TIME 5 /* How long to run the test in seconds */
+
+/* Simple busy-wait function for test tasks */
+static void process_func(void)
+{
+ while (1) {
+ /* Busy wait */
+ for (volatile unsigned long i = 0; i < 10000000UL; i++);
+ }
+}
+
+/* Set CPU affinity to a specific core */
+static void set_affinity(int cpu)
+{
+ cpu_set_t mask;
+
+ CPU_ZERO(&mask);
+ CPU_SET(cpu, &mask);
+ if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
+ perror("sched_setaffinity");
+ exit(EXIT_FAILURE);
+ }
+}
+
+/* Set task scheduling policy and priority */
+static void set_sched(int policy, int priority)
+{
+ struct sched_param param;
+
+ param.sched_priority = priority;
+ if (sched_setscheduler(0, policy, &param) != 0) {
+ perror("sched_setscheduler");
+ exit(EXIT_FAILURE);
+ }
+}
+
+/* Get process runtime from /proc/<pid>/stat */
+static float get_process_runtime(int pid)
+{
+ char path[256];
+ FILE *file;
+ long utime, stime;
+ int fields;
+
+ snprintf(path, sizeof(path), "/proc/%d/stat", pid);
+ file = fopen(path, "r");
+ if (file == NULL) {
+ perror("Failed to open stat file");
+ return -1;
+ }
+
+ /* Skip the first 13 fields and read the 14th and 15th */
+ fields = fscanf(file,
+ "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
+ &utime, &stime);
+ fclose(file);
+
+ if (fields != 2) {
+ fprintf(stderr, "Failed to read stat file\n");
+ return -1;
+ }
+
+ /* Calculate the total time spent in the process */
+ long total_time = utime + stime;
+ long ticks_per_second = sysconf(_SC_CLK_TCK);
+ float runtime_seconds = total_time * 1.0 / ticks_per_second;
+
+ return runtime_seconds;
+}
+
+static enum scx_test_status setup(void **ctx)
+{
+ struct rt_stall *skel;
+
+ skel = rt_stall__open();
+ SCX_FAIL_IF(!skel, "Failed to open");
+ SCX_ENUM_INIT(skel);
+ SCX_FAIL_IF(rt_stall__load(skel), "Failed to load skel");
+
+ *ctx = skel;
+
+ return SCX_TEST_PASS;
+}
+
+static bool sched_stress_test(void)
+{
+ float cfs_runtime, rt_runtime;
+ int cfs_pid, rt_pid;
+ float expected_min_ratio = 0.04; /* 4% */
+
+ ksft_print_header();
+ ksft_set_plan(1);
+
+ /* Create and set up an EXT task */
+ cfs_pid = fork();
+ if (cfs_pid == 0) {
+ set_affinity(CORE_ID);
+ process_func();
+ exit(0);
+ } else if (cfs_pid < 0) {
+ perror("fork for EXT task");
+ ksft_exit_fail();
+ }
+
+ /* Create an RT task */
+ rt_pid = fork();
+ if (rt_pid == 0) {
+ set_affinity(CORE_ID);
+ set_sched(SCHED_FIFO, 50);
+ process_func();
+ exit(0);
+ } else if (rt_pid < 0) {
+ perror("fork for RT task");
+ ksft_exit_fail();
+ }
+
+ /* Let the processes run for the specified time */
+ sleep(RUN_TIME);
+
+ /* Get runtime for the EXT task */
+ cfs_runtime = get_process_runtime(cfs_pid);
+ if (cfs_runtime != -1)
+ ksft_print_msg("Runtime of EXT task (PID %d) is %f seconds\n", cfs_pid, cfs_runtime);
+ else
+ ksft_exit_fail_msg("Error getting runtime for EXT task (PID %d)\n", cfs_pid);
+
+ /* Get runtime for the RT task */
+ rt_runtime = get_process_runtime(rt_pid);
+ if (rt_runtime != -1)
+ ksft_print_msg("Runtime of RT task (PID %d) is %f seconds\n", rt_pid, rt_runtime);
+ else
+ ksft_exit_fail_msg("Error getting runtime for RT task (PID %d)\n", rt_pid);
+
+ /* Kill the processes */
+ kill(cfs_pid, SIGKILL);
+ kill(rt_pid, SIGKILL);
+ waitpid(cfs_pid, NULL, 0);
+ waitpid(rt_pid, NULL, 0);
+
+ /* Verify that the scx task got enough runtime */
+ float actual_ratio = cfs_runtime / (cfs_runtime + rt_runtime);
+ ksft_print_msg("EXT task got %.2f%% of total runtime\n", actual_ratio * 100);
+
+ if (actual_ratio >= expected_min_ratio) {
+ ksft_test_result_pass("PASS: EXT task got at least %.2f%% of runtime\n",
+ expected_min_ratio * 100);
+ return true;
+ } else {
+ ksft_test_result_fail("FAIL: EXT task got less than %.2f%% of runtime\n",
+ expected_min_ratio * 100);
+ return false;
+ }
+}
+
+static enum scx_test_status run(void *ctx)
+{
+ struct rt_stall *skel = ctx;
+ struct bpf_link *link;
+ bool res;
+
+ link = bpf_map__attach_struct_ops(skel->maps.rt_stall_ops);
+ SCX_FAIL_IF(!link, "Failed to attach scheduler");
+
+ res = sched_stress_test();
+
+ SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_NONE));
+ bpf_link__destroy(link);
+
+ if (!res)
+ ksft_exit_fail();
+
+ return SCX_TEST_PASS;
+}
+
+static void cleanup(void *ctx)
+{
+ struct rt_stall *skel = ctx;
+
+ rt_stall__destroy(skel);
+}
+
+struct scx_test rt_stall = {
+ .name = "rt_stall",
+ .description = "Verify that RT tasks cannot stall SCHED_EXT tasks",
+ .setup = setup,
+ .run = run,
+ .cleanup = cleanup,
+};
+REGISTER_SCX_TEST(&rt_stall)
--
2.34.1
* Re: [PATCH v3 00/10] Add a deadline server for sched_ext tasks
2025-06-13 5:17 [PATCH v3 00/10] Add a deadline server for sched_ext tasks Joel Fernandes
2025-06-13 5:17 ` [PATCH v3 10/10] selftests/sched_ext: Add test for sched_ext dl_server Joel Fernandes
@ 2025-06-13 17:35 ` Joel Fernandes
2025-06-13 18:05 ` Joel Fernandes
1 sibling, 1 reply; 5+ messages in thread
From: Joel Fernandes @ 2025-06-13 17:35 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, David Vernet, Andrea Righi,
Changwoo Min, bpf
On 6/13/2025 1:17 AM, Joel Fernandes wrote:
> sched_ext tasks currently are starved by RT hoggers especially since RT
> throttling was replaced by deadline servers to boost only CFS tasks. Several
> users in the community have reported issues with RT stalling sched_ext tasks.
> Add a sched_ext deadline server as well so that sched_ext tasks are also
> boosted and do not suffer starvation.
>
> A kselftest is also provided to verify the starvation issues are now fixed.
>
> Btw, there is still something funky going on with CPU hotplug and the
> relinquish patch. Sometimes the sched_ext's hotplug self-test locks up
> (./runner -t hotplug). Reverting that patch fixes it, so I am suspecting
> something is off in dl_server_remove_params() when it is being called on
> offline CPUs.
I think I got somewhere here with this sched_ext hotplug test but still not
there yet. Juri, Andrea, Tejun, can you take a look at the below when you get a
chance?
In the hotplug test, when the CPU is brought online, I see the following warning
fire [1]. Basically, dl_server_apply_params() fails with -EBUSY due to overflow
checks.
@@ -1657,8 +1657,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
u64 runtime = 50 * NSEC_PER_MSEC;
u64 period = 1000 * NSEC_PER_MSEC;
- dl_server_apply_params(dl_se, runtime, period, 1);
-
+ WARN_ON_ONCE(dl_server_apply_params(dl_se, runtime, period, 1));
dl_se->dl_server = 1;
dl_se->dl_defer = 1;
setup_new_dl_entity(dl_se);
I dug deeper, and it seems CPU 1 was previously brought offline and then online
before the warning happened during *that onlining*. During the onlining,
enqueue_task_scx() -> dl_server_start() was called but dl_server_apply_params()
returned -EBUSY.
In dl_server_apply_params() -> __dl_overflow(), it appears dl_bw_cpus()=0 and
cap=0. That is really odd and probably the reason for the warning. Is that
because the CPU was offlined earlier and is not yet attached to the root domain?
The problem also comes down to why this happens only when my
dl_server_remove_params() is called, and not otherwise, and why on earth
dl_bw_cpus() is returning 0. There are at least 2 other CPUs online at the time.
Anyway, other than this mystery, I fixed all the other bandwidth-related
warnings due to dl_server_remove_params(); the updated patch is linked below [2].
[1] Warning:
[ 11.878005] DL server bandwidth overflow on CPU 1: dl_b->bw=996147, cap=0,
total_bw=0, old_bw=0, new_bw=52428, dl_bw_cpus=0
[ 11.878356] ------------[ cut here ]------------
[ 11.878528] WARNING: CPU: 0 PID: 145 at
kernel/sched/deadline.c:1670 dl_server_start+0x96/0xa0
[ 11.879400] Sched_ext: hotplug_cbs (enabled+all), task: runnable_at=+0ms
[ 11.879404] RIP: 0010:dl_server_start+0x96/0xa0
[ 11.879732] Code: 53 10 75 1d 49 8b 86 10 0c 00 00 48 8b
[ 11.882510] Call Trace:
[ 11.882592] <TASK>
[ 11.882685] enqueue_task_scx+0x190/0x280
[ 11.882802] ttwu_do_activate+0xaa/0x2a0
[ 11.882925] try_to_wake_up+0x371/0x600
[ 11.883047] cpuhp_bringup_ap+0xd6/0x170
[ 11.883172] cpuhp_invoke_callback+0x142/0x540
[ 11.883327] _cpu_up+0x15b/0x270
[ 11.883450] cpu_up+0x52/0xb0
[ 11.883576] cpu_subsys_online+0x32/0x120
[ 11.883704] online_store+0x98/0x130
[ 11.883824] kernfs_fop_write_iter+0xeb/0x170
[ 11.883972] vfs_write+0x2c7/0x430
[ 11.884091] ksys_write+0x70/0xe0
[ 11.884209] do_syscall_64+0xd6/0x250
[ 11.884327] ? clear_bhb_loop+0x40/0x90
[ 11.884443] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[2]: Updated patch "sched/ext: Relinquish DL server reservations when not needed":
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=sched/scx-dlserver-boost-rebase&id=56581c2a6bb8e78593df80ad47520a8399055eae
thanks,
- Joel
>
> v2->v3:
> - Removed code duplication in debugfs. Made ext interface separate.
> - Fixed issue where rq_lock_irqsave was not used in the relinquish patch.
> - Fixed running bw accounting issue in dl_server_remove_params.
>
> Link to v1: https://lore.kernel.org/all/20250315022158.2354454-1-joelagnelf@nvidia.com/
> Link to v2: https://lore.kernel.org/all/20250602180110.816225-1-joelagnelf@nvidia.com/
>
> Andrea Righi (1):
> selftests/sched_ext: Add test for sched_ext dl_server
>
> Joel Fernandes (9):
> sched/debug: Fix updating of ppos on server write ops
> sched/debug: Stop and start server based on if it was active
> sched/deadline: Clear the defer params
> sched: Add support to pick functions to take rf
> sched: Add a server arg to dl_server_update_idle_time()
> sched/ext: Add a DL server for sched_ext tasks
> sched/debug: Add support to change sched_ext server params
> sched/deadline: Add support to remove DL server bandwidth
> sched/ext: Relinquish DL server reservations when not needed
>
> include/linux/sched.h | 2 +-
> kernel/sched/core.c | 19 +-
> kernel/sched/deadline.c | 78 +++++--
> kernel/sched/debug.c | 171 +++++++++++---
> kernel/sched/ext.c | 108 ++++++++-
> kernel/sched/fair.c | 15 +-
> kernel/sched/idle.c | 4 +-
> kernel/sched/rt.c | 2 +-
> kernel/sched/sched.h | 13 +-
> kernel/sched/stop_task.c | 2 +-
> tools/testing/selftests/sched_ext/Makefile | 1 +
> .../selftests/sched_ext/rt_stall.bpf.c | 23 ++
> tools/testing/selftests/sched_ext/rt_stall.c | 213 ++++++++++++++++++
> 13 files changed, 579 insertions(+), 72 deletions(-)
> create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c
> create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c
>
* Re: [PATCH v3 00/10] Add a deadline server for sched_ext tasks
2025-06-13 17:35 ` [PATCH v3 00/10] Add a deadline server for sched_ext tasks Joel Fernandes
@ 2025-06-13 18:05 ` Joel Fernandes
2025-06-13 22:44 ` Andrea Righi
0 siblings, 1 reply; 5+ messages in thread
From: Joel Fernandes @ 2025-06-13 18:05 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, David Vernet, Andrea Righi,
Changwoo Min, bpf
On 6/13/2025 1:35 PM, Joel Fernandes wrote:
>
>
> On 6/13/2025 1:17 AM, Joel Fernandes wrote:
>> sched_ext tasks currently are starved by RT hoggers especially since RT
>> throttling was replaced by deadline servers to boost only CFS tasks. Several
>> users in the community have reported issues with RT stalling sched_ext tasks.
>> Add a sched_ext deadline server as well so that sched_ext tasks are also
>> boosted and do not suffer starvation.
>>
>> A kselftest is also provided to verify the starvation issues are now fixed.
>>
>> Btw, there is still something funky going on with CPU hotplug and the
>> relinquish patch. Sometimes the sched_ext's hotplug self-test locks up
>> (./runner -t hotplug). Reverting that patch fixes it, so I am suspecting
>> something is off in dl_server_remove_params() when it is being called on
>> offline CPUs.
>
> I think I got somewhere here with this sched_ext hotplug test but still not
> there yet. Juri, Andrea, Tejun, can you take a look at the below when you get a
> chance?
The following patch makes the sched_ext hotplug test reliably pass for me now.
Thoughts?
From: Joel Fernandes <joelagnelf@nvidia.com>
Subject: [PATCH] sched/deadline: Prevent setting server as started if params
couldn't be applied
In the following call trace, dl_server_apply_params() fails because
dl_bw_cpus() is 0 during CPU onlining:
[ 11.878356] ------------[ cut here ]------------
[ 11.882592] <TASK>
[ 11.882685] enqueue_task_scx+0x190/0x280
[ 11.882802] ttwu_do_activate+0xaa/0x2a0
[ 11.882925] try_to_wake_up+0x371/0x600
[ 11.883047] cpuhp_bringup_ap+0xd6/0x170
[ 11.883172] cpuhp_invoke_callback+0x142/0x540
[ 11.883327] _cpu_up+0x15b/0x270
[ 11.883450] cpu_up+0x52/0xb0
[ 11.883576] cpu_subsys_online+0x32/0x120
[ 11.883704] online_store+0x98/0x130
[ 11.883824] kernfs_fop_write_iter+0xeb/0x170
[ 11.883972] vfs_write+0x2c7/0x430
[ 11.884091] ksys_write+0x70/0xe0
[ 11.884209] do_syscall_64+0xd6/0x250
[ 11.884327] ? clear_bhb_loop+0x40/0x90
[ 11.884443] entry_SYSCALL_64_after_hwframe+0x77/0x7f
It seems too early to start the server. Simply defer starting the server to
the next enqueue if dl_server_apply_params() returns an error. In any case,
we should not pretend the server started; doing so seems to mess up the
sched_ext CPU hotplug test.
With this, the sched_ext hotplug test reliably passes.
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
kernel/sched/deadline.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f0cd1dbca4b8..8dd0c6d71489 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1657,8 +1657,8 @@ void dl_server_start(struct sched_dl_entity *dl_se)
u64 runtime = 50 * NSEC_PER_MSEC;
u64 period = 1000 * NSEC_PER_MSEC;
- dl_server_apply_params(dl_se, runtime, period, 1);
-
+ if (dl_server_apply_params(dl_se, runtime, period, 1))
+ return;
dl_se->dl_server = 1;
dl_se->dl_defer = 1;
setup_new_dl_entity(dl_se);
@@ -1675,7 +1675,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
void dl_server_stop(struct sched_dl_entity *dl_se)
{
- if (!dl_se->dl_runtime)
+ if (!dl_se->dl_runtime || !dl_se->dl_server_active)
return;
dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
* Re: [PATCH v3 00/10] Add a deadline server for sched_ext tasks
2025-06-13 18:05 ` Joel Fernandes
@ 2025-06-13 22:44 ` Andrea Righi
0 siblings, 0 replies; 5+ messages in thread
From: Andrea Righi @ 2025-06-13 22:44 UTC (permalink / raw)
To: Joel Fernandes
Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Tejun Heo, David Vernet,
Changwoo Min, bpf
Hi Joel,
On Fri, Jun 13, 2025 at 02:05:03PM -0400, Joel Fernandes wrote:
>
>
> On 6/13/2025 1:35 PM, Joel Fernandes wrote:
> >
> >
> > On 6/13/2025 1:17 AM, Joel Fernandes wrote:
> >> sched_ext tasks currently are starved by RT hoggers especially since RT
> >> throttling was replaced by deadline servers to boost only CFS tasks. Several
> >> users in the community have reported issues with RT stalling sched_ext tasks.
> >> Add a sched_ext deadline server as well so that sched_ext tasks are also
> >> boosted and do not suffer starvation.
> >>
> >> A kselftest is also provided to verify the starvation issues are now fixed.
> >>
> >> Btw, there is still something funky going on with CPU hotplug and the
> >> relinquish patch. Sometimes the sched_ext's hotplug self-test locks up
> >> (./runner -t hotplug). Reverting that patch fixes it, so I am suspecting
> >> something is off in dl_server_remove_params() when it is being called on
> >> offline CPUs.
> >
> > I think I got somewhere here with this sched_ext hotplug test but still not
> > there yet. Juri, Andrea, Tejun, can you take a look at the below when you get a
> > chance?
>
> The following patch makes the sched_ext hotplug test reliably pass for me now.
> Thoughts?
For me it gets stuck here, when the hotplug test tries to bring the CPU
offline:
TEST: hotplug
DESCRIPTION: Verify hotplug behavior
OUTPUT:
[ 5.042497] smpboot: CPU 1 is now offline
[ 5.069691] sched_ext: BPF scheduler "hotplug_cbs" enabled
[ 5.108705] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 5.149484] sched_ext: BPF scheduler "hotplug_cbs" disabled (unregistered from BPF)
EXIT: unregistered from BPF (hotplug event detected (1 going online))
[ 5.204500] sched_ext: BPF scheduler "hotplug_cbs" enabled
Failed to bring CPU offline (Device or resource busy)
However, if I don't stop rq->fair_server in the scx_switching_all case,
everything seems to work (though I still don't understand why).
I didn't have much time to look at this today, I'll investigate more
tomorrow.
-Andrea
>
> From: Joel Fernandes <joelagnelf@nvidia.com>
> Subject: [PATCH] sched/deadline: Prevent setting server as started if params
> couldn't be applied
>
> In the following call trace, dl_server_apply_params() fails because
> dl_bw_cpus() is 0 during CPU onlining:
>
> [ 11.878356] ------------[ cut here ]------------
> [ 11.882592] <TASK>
> [ 11.882685] enqueue_task_scx+0x190/0x280
> [ 11.882802] ttwu_do_activate+0xaa/0x2a0
> [ 11.882925] try_to_wake_up+0x371/0x600
> [ 11.883047] cpuhp_bringup_ap+0xd6/0x170
>
> [ 11.883172] cpuhp_invoke_callback+0x142/0x540
>
> [ 11.883327] _cpu_up+0x15b/0x270
> [ 11.883450] cpu_up+0x52/0xb0
> [ 11.883576] cpu_subsys_online+0x32/0x120
> [ 11.883704] online_store+0x98/0x130
> [ 11.883824] kernfs_fop_write_iter+0xeb/0x170
> [ 11.883972] vfs_write+0x2c7/0x430
>
> [ 11.884091] ksys_write+0x70/0xe0
> [ 11.884209] do_syscall_64+0xd6/0x250
> [ 11.884327] ? clear_bhb_loop+0x40/0x90
>
> [ 11.884443] entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> It seems too early to start the server. Simply defer the starting of the
> server to the next enqueue if dl_server_apply_params() returns an error.
> In any case, we should not pretend the server started; doing so seems
> to mess up the sched_ext CPU hotplug test.
>
> With this, the sched_ext hotplug test reliably passes.
>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
> kernel/sched/deadline.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index f0cd1dbca4b8..8dd0c6d71489 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1657,8 +1657,8 @@ void dl_server_start(struct sched_dl_entity *dl_se)
> u64 runtime = 50 * NSEC_PER_MSEC;
> u64 period = 1000 * NSEC_PER_MSEC;
>
> - dl_server_apply_params(dl_se, runtime, period, 1);
> -
> + if (dl_server_apply_params(dl_se, runtime, period, 1))
> + return;
> dl_se->dl_server = 1;
> dl_se->dl_defer = 1;
> setup_new_dl_entity(dl_se);
> @@ -1675,7 +1675,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
>
> void dl_server_stop(struct sched_dl_entity *dl_se)
> {
> - if (!dl_se->dl_runtime)
> + if (!dl_se->dl_runtime || !dl_se->dl_server_active)
> return;
>
> dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);