public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers
@ 2024-09-27 23:46 Tejun Heo
  2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Tejun Heo @ 2024-09-27 23:46 UTC (permalink / raw)
  To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext

During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
migration is disabled. sched_ext schedulers may perform operations such as
direct dispatch from ->select_task_rq() path and it is useful for them to
know whether ->select_task_rq() was skipped in the ->enqueue_task() path.

Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
and end up assuming incorrectly that ->select_task_rq() was called for tasks
that are bound to a single CPU or migration disabled.

This patchset adds ENQUEUE_RQ_SELECTED to indicate whether
->select_task_rq() was called to ->enqueue_task() and propagate that to
sched_ext schedulers so that they can reliably detect whether
->select_task_rq() was skipped.

Peter, please let me know how 0001-0002 should be routed. If they get
applied to sched/urgent (when it opens), I'll pull it into
sched_ext/for-6.12-fixes and apply 0003 on top. If you'd prefer, I can take
all three through sched_ext/for-6.12-fixes too.

This patchset contains the following patches:

 0001-sched-core-Make-select_task_rq-take-the-pointer-to-w.patch
 0002-sched-core-Add-ENQUEUE_RQ_SELECTED-to-indicate-wheth.patch
 0003-sched_ext-scx_qmap-Add-and-use-SCX_ENQ_CPU_SELECTED.patch

 0001-0002 are sched/core patches to add ENQUEUE_RQ_SELECTED.

 0003 makes sched_ext propagate the flag as SCX_ENQ_CPU_SELECTED and fix
 scx_qmap using the new flag.

and can also be found in the following git branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-ENQUEUE_RQ_SELECTED

diffstat follows. Thanks.

 kernel/sched/core.c            |   21 ++++++++++++++-------
 kernel/sched/ext.c             |    1 +
 kernel/sched/sched.h           |    3 +++
 tools/sched_ext/scx_qmap.bpf.c |    4 ++--
 4 files changed, 20 insertions(+), 9 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value
  2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
@ 2024-09-27 23:46 ` Tejun Heo
  2024-09-28  0:38   ` David Vernet
  2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2024-09-27 23:46 UTC (permalink / raw)
  To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext, Tejun Heo

This will be used to allow select_task_rq() to indicate whether
->select_task_rq() was called by modifying *wake_flags.

This makes try_to_wake_up() call all functions that take wake_flags with
WF_TTWU set. Previously, only select_task_rq() was. Using the same flags is
more consistent, and, as the flag is only tested by ->select_task_rq()
implementations, it doesn't cause any behavior differences.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 43e453ab7e20..e70b57a5693e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3518,12 +3518,12 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_ptr is stable.
  */
 static inline
-int select_task_rq(struct task_struct *p, int cpu, int wake_flags)
+int select_task_rq(struct task_struct *p, int cpu, int *wake_flags)
 {
 	lockdep_assert_held(&p->pi_lock);
 
 	if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
-		cpu = p->sched_class->select_task_rq(p, cpu, wake_flags);
+		cpu = p->sched_class->select_task_rq(p, cpu, *wake_flags);
 	else
 		cpu = cpumask_any(p->cpus_ptr);
 
@@ -4120,6 +4120,8 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	guard(preempt)();
 	int cpu, success = 0;
 
+	wake_flags |= WF_TTWU;
+
 	if (p == current) {
 		/*
 		 * We're waking current, this means 'p->on_rq' and 'task_cpu(p)
@@ -4252,7 +4254,7 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 */
 		smp_cond_load_acquire(&p->on_cpu, !VAL);
 
-		cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
+		cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
 		if (task_cpu(p) != cpu) {
 			if (p->in_iowait) {
 				delayacct_blkio_end(p);
@@ -4793,6 +4795,7 @@ void wake_up_new_task(struct task_struct *p)
 {
 	struct rq_flags rf;
 	struct rq *rq;
+	int wake_flags = WF_FORK;
 
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 	WRITE_ONCE(p->__state, TASK_RUNNING);
@@ -4807,7 +4810,7 @@ void wake_up_new_task(struct task_struct *p)
 	 */
 	p->recent_used_cpu = task_cpu(p);
 	rseq_migrate(p);
-	__set_task_cpu(p, select_task_rq(p, task_cpu(p), WF_FORK));
+	__set_task_cpu(p, select_task_rq(p, task_cpu(p), &wake_flags));
 #endif
 	rq = __task_rq_lock(p, &rf);
 	update_rq_clock(rq);
@@ -4815,7 +4818,7 @@ void wake_up_new_task(struct task_struct *p)
 
 	activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
 	trace_sched_wakeup_new(p);
-	wakeup_preempt(rq, p, WF_FORK);
+	wakeup_preempt(rq, p, wake_flags);
 #ifdef CONFIG_SMP
 	if (p->sched_class->task_woken) {
 		/*
-- 
2.46.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
  2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
  2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
@ 2024-09-27 23:46 ` Tejun Heo
  2024-09-28  0:38   ` David Vernet
  2024-10-01 20:12   ` Tejun Heo
  2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
  2024-10-07 20:20 ` [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
  3 siblings, 2 replies; 14+ messages in thread
From: Tejun Heo @ 2024-09-27 23:46 UTC (permalink / raw)
  To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext, Tejun Heo

During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
migration is disabled. sched_ext schedulers may perform operations such as
direct dispatch from ->select_task_rq() path and it is useful for them to
know whether ->select_task_rq() was skipped in the ->enqueue_task() path.

Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
and end up assuming incorrectly that ->select_task_rq() was called for tasks
that are bound to a single CPU or migration disabled.

Make select_task_rq() indicate whether ->select_task_rq() was called by
setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
to ENQUEUE_RQ_SELECTED for ->enqueue_task().

This will be used by sched_ext to fix ->select_task_rq() skip detection.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c  | 8 ++++++--
 kernel/sched/sched.h | 3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e70b57a5693e..aeb595514461 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3522,10 +3522,12 @@ int select_task_rq(struct task_struct *p, int cpu, int *wake_flags)
 {
 	lockdep_assert_held(&p->pi_lock);
 
-	if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
+	if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p)) {
 		cpu = p->sched_class->select_task_rq(p, cpu, *wake_flags);
-	else
+		*wake_flags |= WF_RQ_SELECTED;
+	} else {
 		cpu = cpumask_any(p->cpus_ptr);
+	}
 
 	/*
 	 * In order not to call set_task_cpu() on a blocking task we need
@@ -3659,6 +3661,8 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		rq->nr_uninterruptible--;
 
 #ifdef CONFIG_SMP
+	if (wake_flags & WF_RQ_SELECTED)
+		en_flags |= ENQUEUE_RQ_SELECTED;
 	if (wake_flags & WF_MIGRATED)
 		en_flags |= ENQUEUE_MIGRATED;
 	else
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b1c3588a8f00..6085ef50febf 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2292,6 +2292,7 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 #define WF_SYNC			0x10 /* Waker goes to sleep after wakeup */
 #define WF_MIGRATED		0x20 /* Internal use, task got migrated */
 #define WF_CURRENT_CPU		0x40 /* Prefer to move the wakee to the current CPU. */
+#define WF_RQ_SELECTED		0x80 /* ->select_task_rq() was called */
 
 #ifdef CONFIG_SMP
 static_assert(WF_EXEC == SD_BALANCE_EXEC);
@@ -2334,6 +2335,7 @@ extern const u32		sched_prio_to_wmult[40];
  * ENQUEUE_HEAD      - place at front of runqueue (tail if not specified)
  * ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline)
  * ENQUEUE_MIGRATED  - the task was migrated during wakeup
+ * ENQUEUE_RQ_SELECTED - ->select_task_rq() was called
  *
  */
 
@@ -2360,6 +2362,7 @@ extern const u32		sched_prio_to_wmult[40];
 #define ENQUEUE_INITIAL		0x80
 #define ENQUEUE_MIGRATING	0x100
 #define ENQUEUE_DELAYED		0x200
+#define ENQUEUE_RQ_SELECTED	0x400
 
 #define RETRY_TASK		((void *)-1UL)
 
-- 
2.46.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
  2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
  2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
  2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
@ 2024-09-27 23:46 ` Tejun Heo
  2024-09-28  0:39   ` David Vernet
  2024-09-28 13:21   ` Andrea Righi
  2024-10-07 20:20 ` [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
  3 siblings, 2 replies; 14+ messages in thread
From: Tejun Heo @ 2024-09-27 23:46 UTC (permalink / raw)
  To: void, peterz, mingo
  Cc: kernel-team, linux-kernel, sched-ext, Tejun Heo, Daniel Hodges,
	Changwoo Min, Andrea Righi, Dan Schatzberg

scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
tell whether ops.select_cpu() was called. This is incorrect as
ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
incorrectly skipping direct dispatch for tasks that are bound to a single
CPU.

sched core has been udpated to specify ENQUEUE_RQ_SELECTED if
->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
scx_qmap to test it instead of SCX_ENQ_WAKEUP.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 kernel/sched/ext.c             | 1 +
 tools/sched_ext/scx_qmap.bpf.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c09e3dc38c34..9f00c8b629f1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -691,6 +691,7 @@ enum scx_enq_flags {
 	/* expose select ENQUEUE_* flags as enums */
 	SCX_ENQ_WAKEUP		= ENQUEUE_WAKEUP,
 	SCX_ENQ_HEAD		= ENQUEUE_HEAD,
+	SCX_ENQ_CPU_SELECTED	= ENQUEUE_RQ_SELECTED,
 
 	/* high 32bits are SCX specific */
 
diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
index 83c8f54c1e31..588b7dce44fa 100644
--- a/tools/sched_ext/scx_qmap.bpf.c
+++ b/tools/sched_ext/scx_qmap.bpf.c
@@ -230,8 +230,8 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
 		return;
 	}
 
-	/* if !WAKEUP, select_cpu() wasn't called, try direct dispatch */
-	if (!(enq_flags & SCX_ENQ_WAKEUP) &&
+	/* if select_cpu() wasn't called, try direct dispatch */
+	if (!(enq_flags & SCX_ENQ_CPU_SELECTED) &&
 	    (cpu = pick_direct_dispatch_cpu(p, scx_bpf_task_cpu(p))) >= 0) {
 		__sync_fetch_and_add(&nr_ddsp_from_enq, 1);
 		scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | cpu, slice_ns, enq_flags);
-- 
2.46.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value
  2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
@ 2024-09-28  0:38   ` David Vernet
  0 siblings, 0 replies; 14+ messages in thread
From: David Vernet @ 2024-09-28  0:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: peterz, mingo, kernel-team, linux-kernel, sched-ext

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]

On Fri, Sep 27, 2024 at 01:46:11PM -1000, Tejun Heo wrote:
> This will be used to allow select_task_rq() to indicate whether
> ->select_task_rq() was called by modifying *wake_flags.
> 
> This makes try_to_wake_up() call all functions that take wake_flags with
> WF_TTWU set. Previously, only select_task_rq() was. Using the same flags is
> more consistent, and, as the flag is only tested by ->select_task_rq()
> implementations, it doesn't cause any behavior differences.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Acked-by: David Vernet <void@manifault.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
  2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
@ 2024-09-28  0:38   ` David Vernet
  2024-10-01 20:12   ` Tejun Heo
  1 sibling, 0 replies; 14+ messages in thread
From: David Vernet @ 2024-09-28  0:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: peterz, mingo, kernel-team, linux-kernel, sched-ext

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

On Fri, Sep 27, 2024 at 01:46:12PM -1000, Tejun Heo wrote:
> During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> migration is disabled. sched_ext schedulers may perform operations such as
> direct dispatch from ->select_task_rq() path and it is useful for them to
> know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
> 
> Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> and end up assuming incorrectly that ->select_task_rq() was called for tasks
> that are bound to a single CPU or migration disabled.
> 
> Make select_task_rq() indicate whether ->select_task_rq() was called by
> setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
> to ENQUEUE_RQ_SELECTED for ->enqueue_task().
> 
> This will be used by sched_ext to fix ->select_task_rq() skip detection.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Acked-by: David Vernet <void@manifault.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
  2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
@ 2024-09-28  0:39   ` David Vernet
  2024-09-28 13:21   ` Andrea Righi
  1 sibling, 0 replies; 14+ messages in thread
From: David Vernet @ 2024-09-28  0:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: peterz, mingo, kernel-team, linux-kernel, sched-ext,
	Daniel Hodges, Changwoo Min, Andrea Righi, Dan Schatzberg

[-- Attachment #1: Type: text/plain, Size: 673 bytes --]

On Fri, Sep 27, 2024 at 01:46:13PM -1000, Tejun Heo wrote:
> scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
> tell whether ops.select_cpu() was called. This is incorrect as
> ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
> incorrectly skipping direct dispatch for tasks that are bound to a single
> CPU.
> 
> sched core has been udpated to specify ENQUEUE_RQ_SELECTED if

s/udpated/updated

> ->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
> scx_qmap to test it instead of SCX_ENQ_WAKEUP.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Acked-by: David Vernet <void@manifault.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
  2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
  2024-09-28  0:39   ` David Vernet
@ 2024-09-28 13:21   ` Andrea Righi
  2024-09-28 16:52     ` Tejun Heo
  1 sibling, 1 reply; 14+ messages in thread
From: Andrea Righi @ 2024-09-28 13:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: void, peterz, mingo, kernel-team, linux-kernel, sched-ext,
	Daniel Hodges, Changwoo Min, Dan Schatzberg

On Fri, Sep 27, 2024 at 01:46:13PM -1000, Tejun Heo wrote:
> scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
> tell whether ops.select_cpu() was called. This is incorrect as
> ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
> incorrectly skipping direct dispatch for tasks that are bound to a single
> CPU.
> 
> sched core has been udpated to specify ENQUEUE_RQ_SELECTED if
> ->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
> scx_qmap to test it instead of SCX_ENQ_WAKEUP.

Even if it's quite convenient to have the SCX_ENQ_CPU_SELECTED flag
provided by kernel, I was wondering if we could just delegate this whole
logic to BPF and avoid introducing this extra flag in the kernel.

In theory we could track when ops.select_cpu() is called by setting a
flag in the BPF task context (BPF_MAP_TYPE_TASK_STORAGE). Specifically,
the flag could be set in ops.select_cpu() and cleared in ops.stopping().
Then, when ops.enqueue() is called, we can check the flag to determine
whether ops.select_cpu() was skipped or not.

Since most of the scx schedulers already implement their own task
context, this shouldn't add too much complexity/overhead to the BPF
code, it'd be fully backward-compatible and it doesn't depend on the
particular kernel logic that calls ->select_task_rq(). WDYT?

Thanks,
-Andrea

> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: David Vernet <void@manifault.com>
> Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
> Cc: Changwoo Min <multics69@gmail.com>
> Cc: Andrea Righi <andrea.righi@linux.dev>
> Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
> ---
>  kernel/sched/ext.c             | 1 +
>  tools/sched_ext/scx_qmap.bpf.c | 4 ++--
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index c09e3dc38c34..9f00c8b629f1 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -691,6 +691,7 @@ enum scx_enq_flags {
>  	/* expose select ENQUEUE_* flags as enums */
>  	SCX_ENQ_WAKEUP		= ENQUEUE_WAKEUP,
>  	SCX_ENQ_HEAD		= ENQUEUE_HEAD,
> +	SCX_ENQ_CPU_SELECTED	= ENQUEUE_RQ_SELECTED,
>  
>  	/* high 32bits are SCX specific */
>  
> diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
> index 83c8f54c1e31..588b7dce44fa 100644
> --- a/tools/sched_ext/scx_qmap.bpf.c
> +++ b/tools/sched_ext/scx_qmap.bpf.c
> @@ -230,8 +230,8 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
>  		return;
>  	}
>  
> -	/* if !WAKEUP, select_cpu() wasn't called, try direct dispatch */
> -	if (!(enq_flags & SCX_ENQ_WAKEUP) &&
> +	/* if select_cpu() wasn't called, try direct dispatch */
> +	if (!(enq_flags & SCX_ENQ_CPU_SELECTED) &&
>  	    (cpu = pick_direct_dispatch_cpu(p, scx_bpf_task_cpu(p))) >= 0) {
>  		__sync_fetch_and_add(&nr_ddsp_from_enq, 1);
>  		scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | cpu, slice_ns, enq_flags);
> -- 
> 2.46.2
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
  2024-09-28 13:21   ` Andrea Righi
@ 2024-09-28 16:52     ` Tejun Heo
  0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2024-09-28 16:52 UTC (permalink / raw)
  To: Andrea Righi
  Cc: void, peterz, mingo, kernel-team, linux-kernel, sched-ext,
	Daniel Hodges, Changwoo Min, Dan Schatzberg

Hello,

On Sat, Sep 28, 2024 at 03:21:46PM +0200, Andrea Righi wrote:
...
> > sched core has been udpated to specify ENQUEUE_RQ_SELECTED if
> > ->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
> > scx_qmap to test it instead of SCX_ENQ_WAKEUP.
> 
> Even if it's quite convenient to have the SCX_ENQ_CPU_SELECTED flag
> provided by kernel, I was wondering if we could just delegate this whole
> logic to BPF and avoid introducing this extra flag in the kernel.
> 
> In theory we could track when ops.select_cpu() is called by setting a
> flag in the BPF task context (BPF_MAP_TYPE_TASK_STORAGE). Specifically,
> the flag could be set in ops.select_cpu() and cleared in ops.stopping().
> Then, when ops.enqueue() is called, we can check the flag to determine
> whether ops.select_cpu() was skipped or not.
> 
> Since most of the scx schedulers already implement their own task
> context, this shouldn't add too much complexity/overhead to the BPF
> code, it'd be fully backward-compatible and it doesn't depend on the
> particular kernel logic that calls ->select_task_rq(). WDYT?

Yeah, that would work too and probably what we should do to work around on
older kernels, but also it's a relatively obvious hole in the API and we
don't lose anything by updating the kernel to indicate the state.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
  2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
  2024-09-28  0:38   ` David Vernet
@ 2024-10-01 20:12   ` Tejun Heo
  2024-10-04 20:14     ` Tejun Heo
  1 sibling, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2024-10-01 20:12 UTC (permalink / raw)
  To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext

On Fri, Sep 27, 2024 at 01:46:12PM -1000, Tejun Heo wrote:
> During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> migration is disabled. sched_ext schedulers may perform operations such as
> direct dispatch from ->select_task_rq() path and it is useful for them to
> know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
> 
> Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> and end up assuming incorrectly that ->select_task_rq() was called for tasks
> that are bound to a single CPU or migration disabled.
> 
> Make select_task_rq() indicate whether ->select_task_rq() was called by
> setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
> to ENQUEUE_RQ_SELECTED for ->enqueue_task().
> 
> This will be used by sched_ext to fix ->select_task_rq() skip detection.

Peter, ping on patches 1-2. If they look okay, I can route them through
sched_ext/for-6.12-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
  2024-10-01 20:12   ` Tejun Heo
@ 2024-10-04 20:14     ` Tejun Heo
  2024-10-05  9:38       ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2024-10-04 20:14 UTC (permalink / raw)
  To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext

On Tue, Oct 01, 2024 at 10:12:53AM -1000, Tejun Heo wrote:
> On Fri, Sep 27, 2024 at 01:46:12PM -1000, Tejun Heo wrote:
> > During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> > migration is disabled. sched_ext schedulers may perform operations such as
> > direct dispatch from ->select_task_rq() path and it is useful for them to
> > know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
> > 
> > Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> > and end up assuming incorrectly that ->select_task_rq() was called for tasks
> > that are bound to a single CPU or migration disabled.
> > 
> > Make select_task_rq() indicate whether ->select_task_rq() was called by
> > setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
> > to ENQUEUE_RQ_SELECTED for ->enqueue_task().
> > 
> > This will be used by sched_ext to fix ->select_task_rq() skip detection.
> 
> Peter, ping on patches 1-2. If they look okay, I can route them through
> sched_ext/for-6.12-fixes.

As the core changes are on the trivial side, I'll wait till early next week
and route them through sched_ext/for-6.12-fixes. Please holler for any
concerns.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
  2024-10-04 20:14     ` Tejun Heo
@ 2024-10-05  9:38       ` Peter Zijlstra
  2024-10-07 16:44         ` Tejun Heo
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2024-10-05  9:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: void, mingo, kernel-team, linux-kernel, sched-ext

On Fri, Oct 04, 2024 at 10:14:20AM -1000, Tejun Heo wrote:
> On Tue, Oct 01, 2024 at 10:12:53AM -1000, Tejun Heo wrote:
> > On Fri, Sep 27, 2024 at 01:46:12PM -1000, Tejun Heo wrote:
> > > During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> > > migration is disabled. sched_ext schedulers may perform operations such as
> > > direct dispatch from ->select_task_rq() path and it is useful for them to
> > > know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
> > > 
> > > Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> > > and end up assuming incorrectly that ->select_task_rq() was called for tasks
> > > that are bound to a single CPU or migration disabled.
> > > 
> > > Make select_task_rq() indicate whether ->select_task_rq() was called by
> > > setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
> > > to ENQUEUE_RQ_SELECTED for ->enqueue_task().
> > > 
> > > This will be used by sched_ext to fix ->select_task_rq() skip detection.
> > 
> > Peter, ping on patches 1-2. If they look okay, I can route them through
> > sched_ext/for-6.12-fixes.
> 
> As the core changes are on the trivial side, I'll wait till early next week
> and route them through sched_ext/for-6.12-fixes. Please holler for any
> concerns.

Right, I've been busy chasing merge window fallout, and this didn't seem
that urgent. I'm not exactly liking it, but it isn't too horrible. So
yeah, you can take it I suppose.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
  2024-10-05  9:38       ` Peter Zijlstra
@ 2024-10-07 16:44         ` Tejun Heo
  0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2024-10-07 16:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: void, mingo, kernel-team, linux-kernel, sched-ext

On Sat, Oct 05, 2024 at 11:38:04AM +0200, Peter Zijlstra wrote:
> Right, I've been busy chasing merge window fallout, and this didn't seem
> that urgent. I'm not exactly liking it, but it isn't too horrible. So
> yeah, you can take it I suppose.

Thanks for taking a look. Will route them through sched_ext/for-6.12-fixes.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers
  2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
                   ` (2 preceding siblings ...)
  2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
@ 2024-10-07 20:20 ` Tejun Heo
  3 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2024-10-07 20:20 UTC (permalink / raw)
  To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext

On Fri, Sep 27, 2024 at 01:46:10PM -1000, Tejun Heo wrote:
> During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> migration is disabled. sched_ext schedulers may perform operations such as
> direct dispatch from ->select_task_rq() path and it is useful for them to
> know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
> 
> Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> and end up assuming incorrectly that ->select_task_rq() was called for tasks
> that are bound to a single CPU or migration disabled.
> 
> This patchset adds ENQUEUE_RQ_SELECTED to indicate whether
> ->select_task_rq() was called to ->enqueue_task() and propagate that to
> sched_ext schedulers so that they can reliably detect whether
> ->select_task_rq() was skipped.
> 
> Peter, please let me know how 0001-0002 should be routed. If they get
> applied to sched/urgent (when it opens), I'll pull it into
> sched_ext/for-6.12-fixes and apply 0003 on top. If you'd prefer, I can take
> all three through sched_ext/for-6.12-fixes too.
> 
> This patchset contains the following patches:
> 
>  0001-sched-core-Make-select_task_rq-take-the-pointer-to-w.patch
>  0002-sched-core-Add-ENQUEUE_RQ_SELECTED-to-indicate-wheth.patch
>  0003-sched_ext-scx_qmap-Add-and-use-SCX_ENQ_CPU_SELECTED.patch
> 
>  0001-0002 are sched/core patches to add ENQUEUE_RQ_SELECTED.
> 
>  0003 makes sched_ext propagate the flag as SCX_ENQ_CPU_SELECTED and fix
>  scx_qmap using the new flag.

Applied 1-3 to sched_ext/for-6.12-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-10-07 20:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
2024-09-28  0:38   ` David Vernet
2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
2024-09-28  0:38   ` David Vernet
2024-10-01 20:12   ` Tejun Heo
2024-10-04 20:14     ` Tejun Heo
2024-10-05  9:38       ` Peter Zijlstra
2024-10-07 16:44         ` Tejun Heo
2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
2024-09-28  0:39   ` David Vernet
2024-09-28 13:21   ` Andrea Righi
2024-09-28 16:52     ` Tejun Heo
2024-10-07 20:20 ` [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox