* [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers
@ 2024-09-27 23:46 Tejun Heo
2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Tejun Heo @ 2024-09-27 23:46 UTC (permalink / raw)
To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext
During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
migration is disabled. sched_ext schedulers may perform operations such as
direct dispatch from ->select_task_rq() path and it is useful for them to
know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
and end up assuming incorrectly that ->select_task_rq() was called for tasks
that are bound to a single CPU or migration disabled.
This patchset adds ENQUEUE_RQ_SELECTED to indicate whether
->select_task_rq() was called to ->enqueue_task() and propagate that to
sched_ext schedulers so that they can reliably detect whether
->select_task_rq() was skipped.
Peter, please let me know how 0001-0002 should be routed. If they get
applied to sched/urgent (when it opens), I'll pull it into
sched_ext/for-6.12-fixes and apply 0003 on top. If you'd prefer, I can take
all three through sched_ext/for-6.12-fixes too.
This patchset contains the following patches:
0001-sched-core-Make-select_task_rq-take-the-pointer-to-w.patch
0002-sched-core-Add-ENQUEUE_RQ_SELECTED-to-indicate-wheth.patch
0003-sched_ext-scx_qmap-Add-and-use-SCX_ENQ_CPU_SELECTED.patch
0001-0002 are sched/core patches to add ENQUEUE_RQ_SELECTED.
0003 makes sched_ext propagate the flag as SCX_ENQ_CPU_SELECTED and fix
scx_qmap using the new flag.
and can also be found in the following git branch:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-ENQUEUE_RQ_SELECTED
diffstat follows. Thanks.
kernel/sched/core.c | 21 ++++++++++++++-------
kernel/sched/ext.c | 1 +
kernel/sched/sched.h | 3 +++
tools/sched_ext/scx_qmap.bpf.c | 4 ++--
4 files changed, 20 insertions(+), 9 deletions(-)
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value
2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
@ 2024-09-27 23:46 ` Tejun Heo
2024-09-28 0:38 ` David Vernet
2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2024-09-27 23:46 UTC (permalink / raw)
To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext, Tejun Heo
This will be used to allow select_task_rq() to indicate whether
->select_task_rq() was called by modifying *wake_flags.
This makes try_to_wake_up() call all functions that take wake_flags with
WF_TTWU set. Previously, only select_task_rq() was. Using the same flags is
more consistent, and, as the flag is only tested by ->select_task_rq()
implementations, it doesn't cause any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/core.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 43e453ab7e20..e70b57a5693e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3518,12 +3518,12 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
* The caller (fork, wakeup) owns p->pi_lock, ->cpus_ptr is stable.
*/
static inline
-int select_task_rq(struct task_struct *p, int cpu, int wake_flags)
+int select_task_rq(struct task_struct *p, int cpu, int *wake_flags)
{
lockdep_assert_held(&p->pi_lock);
if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
- cpu = p->sched_class->select_task_rq(p, cpu, wake_flags);
+ cpu = p->sched_class->select_task_rq(p, cpu, *wake_flags);
else
cpu = cpumask_any(p->cpus_ptr);
@@ -4120,6 +4120,8 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
guard(preempt)();
int cpu, success = 0;
+ wake_flags |= WF_TTWU;
+
if (p == current) {
/*
* We're waking current, this means 'p->on_rq' and 'task_cpu(p)
@@ -4252,7 +4254,7 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
*/
smp_cond_load_acquire(&p->on_cpu, !VAL);
- cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
+ cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
if (task_cpu(p) != cpu) {
if (p->in_iowait) {
delayacct_blkio_end(p);
@@ -4793,6 +4795,7 @@ void wake_up_new_task(struct task_struct *p)
{
struct rq_flags rf;
struct rq *rq;
+ int wake_flags = WF_FORK;
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
WRITE_ONCE(p->__state, TASK_RUNNING);
@@ -4807,7 +4810,7 @@ void wake_up_new_task(struct task_struct *p)
*/
p->recent_used_cpu = task_cpu(p);
rseq_migrate(p);
- __set_task_cpu(p, select_task_rq(p, task_cpu(p), WF_FORK));
+ __set_task_cpu(p, select_task_rq(p, task_cpu(p), &wake_flags));
#endif
rq = __task_rq_lock(p, &rf);
update_rq_clock(rq);
@@ -4815,7 +4818,7 @@ void wake_up_new_task(struct task_struct *p)
activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
trace_sched_wakeup_new(p);
- wakeup_preempt(rq, p, WF_FORK);
+ wakeup_preempt(rq, p, wake_flags);
#ifdef CONFIG_SMP
if (p->sched_class->task_woken) {
/*
--
2.46.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
@ 2024-09-27 23:46 ` Tejun Heo
2024-09-28 0:38 ` David Vernet
2024-10-01 20:12 ` Tejun Heo
2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
2024-10-07 20:20 ` [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
3 siblings, 2 replies; 14+ messages in thread
From: Tejun Heo @ 2024-09-27 23:46 UTC (permalink / raw)
To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext, Tejun Heo
During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
migration is disabled. sched_ext schedulers may perform operations such as
direct dispatch from ->select_task_rq() path and it is useful for them to
know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
and end up assuming incorrectly that ->select_task_rq() was called for tasks
that are bound to a single CPU or migration disabled.
Make select_task_rq() indicate whether ->select_task_rq() was called by
setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
to ENQUEUE_RQ_SELECTED for ->enqueue_task().
This will be used by sched_ext to fix ->select_task_rq() skip detection.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/core.c | 8 ++++++--
kernel/sched/sched.h | 3 +++
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e70b57a5693e..aeb595514461 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3522,10 +3522,12 @@ int select_task_rq(struct task_struct *p, int cpu, int *wake_flags)
{
lockdep_assert_held(&p->pi_lock);
- if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
+ if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p)) {
cpu = p->sched_class->select_task_rq(p, cpu, *wake_flags);
- else
+ *wake_flags |= WF_RQ_SELECTED;
+ } else {
cpu = cpumask_any(p->cpus_ptr);
+ }
/*
* In order not to call set_task_cpu() on a blocking task we need
@@ -3659,6 +3661,8 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
rq->nr_uninterruptible--;
#ifdef CONFIG_SMP
+ if (wake_flags & WF_RQ_SELECTED)
+ en_flags |= ENQUEUE_RQ_SELECTED;
if (wake_flags & WF_MIGRATED)
en_flags |= ENQUEUE_MIGRATED;
else
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b1c3588a8f00..6085ef50febf 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2292,6 +2292,7 @@ static inline int task_on_rq_migrating(struct task_struct *p)
#define WF_SYNC 0x10 /* Waker goes to sleep after wakeup */
#define WF_MIGRATED 0x20 /* Internal use, task got migrated */
#define WF_CURRENT_CPU 0x40 /* Prefer to move the wakee to the current CPU. */
+#define WF_RQ_SELECTED 0x80 /* ->select_task_rq() was called */
#ifdef CONFIG_SMP
static_assert(WF_EXEC == SD_BALANCE_EXEC);
@@ -2334,6 +2335,7 @@ extern const u32 sched_prio_to_wmult[40];
* ENQUEUE_HEAD - place at front of runqueue (tail if not specified)
* ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline)
* ENQUEUE_MIGRATED - the task was migrated during wakeup
+ * ENQUEUE_RQ_SELECTED - ->select_task_rq() was called
*
*/
@@ -2360,6 +2362,7 @@ extern const u32 sched_prio_to_wmult[40];
#define ENQUEUE_INITIAL 0x80
#define ENQUEUE_MIGRATING 0x100
#define ENQUEUE_DELAYED 0x200
+#define ENQUEUE_RQ_SELECTED 0x400
#define RETRY_TASK ((void *)-1UL)
--
2.46.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
@ 2024-09-27 23:46 ` Tejun Heo
2024-09-28 0:39 ` David Vernet
2024-09-28 13:21 ` Andrea Righi
2024-10-07 20:20 ` [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
3 siblings, 2 replies; 14+ messages in thread
From: Tejun Heo @ 2024-09-27 23:46 UTC (permalink / raw)
To: void, peterz, mingo
Cc: kernel-team, linux-kernel, sched-ext, Tejun Heo, Daniel Hodges,
Changwoo Min, Andrea Righi, Dan Schatzberg
scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
tell whether ops.select_cpu() was called. This is incorrect as
ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
incorrectly skipping direct dispatch for tasks that are bound to a single
CPU.
sched core has been udpated to specify ENQUEUE_RQ_SELECTED if
->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
scx_qmap to test it instead of SCX_ENQ_WAKEUP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
---
kernel/sched/ext.c | 1 +
tools/sched_ext/scx_qmap.bpf.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index c09e3dc38c34..9f00c8b629f1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -691,6 +691,7 @@ enum scx_enq_flags {
/* expose select ENQUEUE_* flags as enums */
SCX_ENQ_WAKEUP = ENQUEUE_WAKEUP,
SCX_ENQ_HEAD = ENQUEUE_HEAD,
+ SCX_ENQ_CPU_SELECTED = ENQUEUE_RQ_SELECTED,
/* high 32bits are SCX specific */
diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
index 83c8f54c1e31..588b7dce44fa 100644
--- a/tools/sched_ext/scx_qmap.bpf.c
+++ b/tools/sched_ext/scx_qmap.bpf.c
@@ -230,8 +230,8 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
return;
}
- /* if !WAKEUP, select_cpu() wasn't called, try direct dispatch */
- if (!(enq_flags & SCX_ENQ_WAKEUP) &&
+ /* if select_cpu() wasn't called, try direct dispatch */
+ if (!(enq_flags & SCX_ENQ_CPU_SELECTED) &&
(cpu = pick_direct_dispatch_cpu(p, scx_bpf_task_cpu(p))) >= 0) {
__sync_fetch_and_add(&nr_ddsp_from_enq, 1);
scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | cpu, slice_ns, enq_flags);
--
2.46.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value
2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
@ 2024-09-28 0:38 ` David Vernet
0 siblings, 0 replies; 14+ messages in thread
From: David Vernet @ 2024-09-28 0:38 UTC (permalink / raw)
To: Tejun Heo; +Cc: peterz, mingo, kernel-team, linux-kernel, sched-ext
[-- Attachment #1: Type: text/plain, Size: 577 bytes --]
On Fri, Sep 27, 2024 at 01:46:11PM -1000, Tejun Heo wrote:
> This will be used to allow select_task_rq() to indicate whether
> ->select_task_rq() was called by modifying *wake_flags.
>
> This makes try_to_wake_up() call all functions that take wake_flags with
> WF_TTWU set. Previously, only select_task_rq() was. Using the same flags is
> more consistent, and, as the flag is only tested by ->select_task_rq()
> implementations, it doesn't cause any behavior differences.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
@ 2024-09-28 0:38 ` David Vernet
2024-10-01 20:12 ` Tejun Heo
1 sibling, 0 replies; 14+ messages in thread
From: David Vernet @ 2024-09-28 0:38 UTC (permalink / raw)
To: Tejun Heo; +Cc: peterz, mingo, kernel-team, linux-kernel, sched-ext
[-- Attachment #1: Type: text/plain, Size: 970 bytes --]
On Fri, Sep 27, 2024 at 01:46:12PM -1000, Tejun Heo wrote:
> During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> migration is disabled. sched_ext schedulers may perform operations such as
> direct dispatch from ->select_task_rq() path and it is useful for them to
> know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
>
> Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> and end up assuming incorrectly that ->select_task_rq() was called for tasks
> that are bound to a single CPU or migration disabled.
>
> Make select_task_rq() indicate whether ->select_task_rq() was called by
> setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
> to ENQUEUE_RQ_SELECTED for ->enqueue_task().
>
> This will be used by sched_ext to fix ->select_task_rq() skip detection.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
@ 2024-09-28 0:39 ` David Vernet
2024-09-28 13:21 ` Andrea Righi
1 sibling, 0 replies; 14+ messages in thread
From: David Vernet @ 2024-09-28 0:39 UTC (permalink / raw)
To: Tejun Heo
Cc: peterz, mingo, kernel-team, linux-kernel, sched-ext,
Daniel Hodges, Changwoo Min, Andrea Righi, Dan Schatzberg
[-- Attachment #1: Type: text/plain, Size: 673 bytes --]
On Fri, Sep 27, 2024 at 01:46:13PM -1000, Tejun Heo wrote:
> scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
> tell whether ops.select_cpu() was called. This is incorrect as
> ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
> incorrectly skipping direct dispatch for tasks that are bound to a single
> CPU.
>
> sched core has been udpated to specify ENQUEUE_RQ_SELECTED if
s/udpated/updated
> ->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
> scx_qmap to test it instead of SCX_ENQ_WAKEUP.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
2024-09-28 0:39 ` David Vernet
@ 2024-09-28 13:21 ` Andrea Righi
2024-09-28 16:52 ` Tejun Heo
1 sibling, 1 reply; 14+ messages in thread
From: Andrea Righi @ 2024-09-28 13:21 UTC (permalink / raw)
To: Tejun Heo
Cc: void, peterz, mingo, kernel-team, linux-kernel, sched-ext,
Daniel Hodges, Changwoo Min, Dan Schatzberg
On Fri, Sep 27, 2024 at 01:46:13PM -1000, Tejun Heo wrote:
> scx_qmap and other schedulers in the SCX repo are using SCX_ENQ_WAKEUP to
> tell whether ops.select_cpu() was called. This is incorrect as
> ops.select_cpu() can be skipped in the wakeup path and leads to e.g.
> incorrectly skipping direct dispatch for tasks that are bound to a single
> CPU.
>
> sched core has been udpated to specify ENQUEUE_RQ_SELECTED if
> ->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
> scx_qmap to test it instead of SCX_ENQ_WAKEUP.
Even if it's quite convenient to have the SCX_ENQ_CPU_SELECTED flag
provided by kernel, I was wondering if we could just delegate this whole
logic to BPF and avoid introducing this extra flag in the kernel.
In theory we could track when ops.select_cpu() is called by setting a
flag in the BPF task context (BPF_MAP_TYPE_TASK_STORAGE). Specifically,
the flag could be set in ops.select_cpu() and cleared in ops.stopping().
Then, when ops.enqueue() is called, we can check the flag to determine
whether ops.select_cpu() was skipped or not.
Since most of the scx schedulers already implement their own task
context, this shouldn't add too much complexity/overhead to the BPF
code, it'd be fully backward-compatible and it doesn't depend on the
particular kernel logic that calls ->select_task_rq(). WDYT?
Thanks,
-Andrea
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: David Vernet <void@manifault.com>
> Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
> Cc: Changwoo Min <multics69@gmail.com>
> Cc: Andrea Righi <andrea.righi@linux.dev>
> Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
> ---
> kernel/sched/ext.c | 1 +
> tools/sched_ext/scx_qmap.bpf.c | 4 ++--
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index c09e3dc38c34..9f00c8b629f1 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -691,6 +691,7 @@ enum scx_enq_flags {
> /* expose select ENQUEUE_* flags as enums */
> SCX_ENQ_WAKEUP = ENQUEUE_WAKEUP,
> SCX_ENQ_HEAD = ENQUEUE_HEAD,
> + SCX_ENQ_CPU_SELECTED = ENQUEUE_RQ_SELECTED,
>
> /* high 32bits are SCX specific */
>
> diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
> index 83c8f54c1e31..588b7dce44fa 100644
> --- a/tools/sched_ext/scx_qmap.bpf.c
> +++ b/tools/sched_ext/scx_qmap.bpf.c
> @@ -230,8 +230,8 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags)
> return;
> }
>
> - /* if !WAKEUP, select_cpu() wasn't called, try direct dispatch */
> - if (!(enq_flags & SCX_ENQ_WAKEUP) &&
> + /* if select_cpu() wasn't called, try direct dispatch */
> + if (!(enq_flags & SCX_ENQ_CPU_SELECTED) &&
> (cpu = pick_direct_dispatch_cpu(p, scx_bpf_task_cpu(p))) >= 0) {
> __sync_fetch_and_add(&nr_ddsp_from_enq, 1);
> scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | cpu, slice_ns, enq_flags);
> --
> 2.46.2
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
2024-09-28 13:21 ` Andrea Righi
@ 2024-09-28 16:52 ` Tejun Heo
0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2024-09-28 16:52 UTC (permalink / raw)
To: Andrea Righi
Cc: void, peterz, mingo, kernel-team, linux-kernel, sched-ext,
Daniel Hodges, Changwoo Min, Dan Schatzberg
Hello,
On Sat, Sep 28, 2024 at 03:21:46PM +0200, Andrea Righi wrote:
...
> > sched core has been udpated to specify ENQUEUE_RQ_SELECTED if
> > ->select_task_rq() was called. Map it to SCX_ENQ_CPU_SELECTED and update
> > scx_qmap to test it instead of SCX_ENQ_WAKEUP.
>
> Even if it's quite convenient to have the SCX_ENQ_CPU_SELECTED flag
> provided by kernel, I was wondering if we could just delegate this whole
> logic to BPF and avoid introducing this extra flag in the kernel.
>
> In theory we could track when ops.select_cpu() is called by setting a
> flag in the BPF task context (BPF_MAP_TYPE_TASK_STORAGE). Specifically,
> the flag could be set in ops.select_cpu() and cleared in ops.stopping().
> Then, when ops.enqueue() is called, we can check the flag to determine
> whether ops.select_cpu() was skipped or not.
>
> Since most of the scx schedulers already implement their own task
> context, this shouldn't add too much complexity/overhead to the BPF
> code, it'd be fully backward-compatible and it doesn't depend on the
> particular kernel logic that calls ->select_task_rq(). WDYT?
Yeah, that would work too and probably what we should do to work around on
older kernels, but also it's a relatively obvious hole in the API and we
don't lose anything by updating the kernel to indicate the state.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
2024-09-28 0:38 ` David Vernet
@ 2024-10-01 20:12 ` Tejun Heo
2024-10-04 20:14 ` Tejun Heo
1 sibling, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2024-10-01 20:12 UTC (permalink / raw)
To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext
On Fri, Sep 27, 2024 at 01:46:12PM -1000, Tejun Heo wrote:
> During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> migration is disabled. sched_ext schedulers may perform operations such as
> direct dispatch from ->select_task_rq() path and it is useful for them to
> know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
>
> Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> and end up assuming incorrectly that ->select_task_rq() was called for tasks
> that are bound to a single CPU or migration disabled.
>
> Make select_task_rq() indicate whether ->select_task_rq() was called by
> setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
> to ENQUEUE_RQ_SELECTED for ->enqueue_task().
>
> This will be used by sched_ext to fix ->select_task_rq() skip detection.
Peter, ping on patches 1-2. If they look okay, I can route them through
sched_ext/for-6.12-fixes.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
2024-10-01 20:12 ` Tejun Heo
@ 2024-10-04 20:14 ` Tejun Heo
2024-10-05 9:38 ` Peter Zijlstra
0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2024-10-04 20:14 UTC (permalink / raw)
To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext
On Tue, Oct 01, 2024 at 10:12:53AM -1000, Tejun Heo wrote:
> On Fri, Sep 27, 2024 at 01:46:12PM -1000, Tejun Heo wrote:
> > During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> > migration is disabled. sched_ext schedulers may perform operations such as
> > direct dispatch from ->select_task_rq() path and it is useful for them to
> > know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
> >
> > Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> > and end up assuming incorrectly that ->select_task_rq() was called for tasks
> > that are bound to a single CPU or migration disabled.
> >
> > Make select_task_rq() indicate whether ->select_task_rq() was called by
> > setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
> > to ENQUEUE_RQ_SELECTED for ->enqueue_task().
> >
> > This will be used by sched_ext to fix ->select_task_rq() skip detection.
>
> Peter, ping on patches 1-2. If they look okay, I can route them through
> sched_ext/for-6.12-fixes.
As the core changes are on the trivial side, I'll wait till early next week
and route them through sched_ext/for-6.12-fixes. Please holler for any
concerns.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
2024-10-04 20:14 ` Tejun Heo
@ 2024-10-05 9:38 ` Peter Zijlstra
2024-10-07 16:44 ` Tejun Heo
0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2024-10-05 9:38 UTC (permalink / raw)
To: Tejun Heo; +Cc: void, mingo, kernel-team, linux-kernel, sched-ext
On Fri, Oct 04, 2024 at 10:14:20AM -1000, Tejun Heo wrote:
> On Tue, Oct 01, 2024 at 10:12:53AM -1000, Tejun Heo wrote:
> > On Fri, Sep 27, 2024 at 01:46:12PM -1000, Tejun Heo wrote:
> > > During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> > > migration is disabled. sched_ext schedulers may perform operations such as
> > > direct dispatch from ->select_task_rq() path and it is useful for them to
> > > know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
> > >
> > > Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> > > and end up assuming incorrectly that ->select_task_rq() was called for tasks
> > > that are bound to a single CPU or migration disabled.
> > >
> > > Make select_task_rq() indicate whether ->select_task_rq() was called by
> > > setting WF_RQ_SELECTED in *wake_flags and make ttwu_do_activate() map that
> > > to ENQUEUE_RQ_SELECTED for ->enqueue_task().
> > >
> > > This will be used by sched_ext to fix ->select_task_rq() skip detection.
> >
> > Peter, ping on patches 1-2. If they look okay, I can route them through
> > sched_ext/for-6.12-fixes.
>
> As the core changes are on the trivial side, I'll wait till early next week
> and route them through sched_ext/for-6.12-fixes. Please holler for any
> concerns.
Right, I've been busy chasing merge window fallout, and this didn't seem
that urgent. I'm not exactly liking it, but it isn't too horrible. So
yeah, you can take it I suppose.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
2024-10-05 9:38 ` Peter Zijlstra
@ 2024-10-07 16:44 ` Tejun Heo
0 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2024-10-07 16:44 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: void, mingo, kernel-team, linux-kernel, sched-ext
On Sat, Oct 05, 2024 at 11:38:04AM +0200, Peter Zijlstra wrote:
> Right, I've been busy chasing merge window fallout, and this didn't seem
> that urgent. I'm not exactly liking it, but it isn't too horrible. So
> yeah, you can take it I suppose.
Thanks for taking a look. Will route them through sched_ext/for-6.12-fixes.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers
2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
` (2 preceding siblings ...)
2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
@ 2024-10-07 20:20 ` Tejun Heo
3 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2024-10-07 20:20 UTC (permalink / raw)
To: void, peterz, mingo; +Cc: kernel-team, linux-kernel, sched-ext
On Fri, Sep 27, 2024 at 01:46:10PM -1000, Tejun Heo wrote:
> During ttwu, ->select_task_rq() can be skipped if only one CPU is allowed or
> migration is disabled. sched_ext schedulers may perform operations such as
> direct dispatch from ->select_task_rq() path and it is useful for them to
> know whether ->select_task_rq() was skipped in the ->enqueue_task() path.
>
> Currently, sched_ext schedulers are using ENQUEUE_WAKEUP for this purpose
> and end up assuming incorrectly that ->select_task_rq() was called for tasks
> that are bound to a single CPU or migration disabled.
>
> This patchset adds ENQUEUE_RQ_SELECTED to indicate whether
> ->select_task_rq() was called to ->enqueue_task() and propagate that to
> sched_ext schedulers so that they can reliably detect whether
> ->select_task_rq() was skipped.
>
> Peter, please let me know how 0001-0002 should be routed. If they get
> applied to sched/urgent (when it opens), I'll pull it into
> sched_ext/for-6.12-fixes and apply 0003 on top. If you'd prefer, I can take
> all three through sched_ext/for-6.12-fixes too.
>
> This patchset contains the following patches:
>
> 0001-sched-core-Make-select_task_rq-take-the-pointer-to-w.patch
> 0002-sched-core-Add-ENQUEUE_RQ_SELECTED-to-indicate-wheth.patch
> 0003-sched_ext-scx_qmap-Add-and-use-SCX_ENQ_CPU_SELECTED.patch
>
> 0001-0002 are sched/core patches to add ENQUEUE_RQ_SELECTED.
>
> 0003 makes sched_ext propagate the flag as SCX_ENQ_CPU_SELECTED and fix
> scx_qmap using the new flag.
Applied 1-3 to sched_ext/for-6.12-fixes.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-10-07 20:20 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-27 23:46 [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
2024-09-27 23:46 ` [PATCH 1/3] sched/core: Make select_task_rq() take the pointer to wake_flags instead of value Tejun Heo
2024-09-28 0:38 ` David Vernet
2024-09-27 23:46 ` [PATCH 2/3] sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called Tejun Heo
2024-09-28 0:38 ` David Vernet
2024-10-01 20:12 ` Tejun Heo
2024-10-04 20:14 ` Tejun Heo
2024-10-05 9:38 ` Peter Zijlstra
2024-10-07 16:44 ` Tejun Heo
2024-09-27 23:46 ` [PATCH 3/3] sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED Tejun Heo
2024-09-28 0:39 ` David Vernet
2024-09-28 13:21 ` Andrea Righi
2024-09-28 16:52 ` Tejun Heo
2024-10-07 20:20 ` [PATCHSET sched/urgent, sched_ext/for-6.12-fixes] sched/core, sched_ext: Add ENQUEUE_RQ_SELECTED to fix ->select_task_rq() skip detection in sched_ext schedulers Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox