From: Peter Zijlstra <peterz@infradead.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: John Stultz <jstultz@google.com>,
LKML <linux-kernel@vger.kernel.org>,
zhidao su <suzhidao@xiaomi.com>, Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>,
Xuewen Yan <xuewen.yan94@gmail.com>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
soolaugust@gmail.com, kernel-team@android.com
Subject: Re: [RFC][PATCH 2/2] sched: proxy-exec: Add allow/prevent_migration hooks in the sched classes for proxy_tag_curr
Date: Thu, 5 Mar 2026 15:46:18 +0100
Message-ID: <20260305144618.GC2277644@noisy.programming.kicks-ass.net>
In-Reply-To: <d5ac8ea9-e9e3-4888-b9a2-c316d6209b70@amd.com>
On Thu, Mar 05, 2026 at 09:32:05AM +0530, K Prateek Nayak wrote:
> Hello Peter,
>
> On 3/4/2026 6:48 PM, Peter Zijlstra wrote:
> > +static inline void set_proxy_task(struct task_struct *p)
> > {
> > - if (!sched_proxy_exec())
> > - return;
> > - /*
> > - * pick_next_task() calls set_next_task() on the chosen task
> > - * at some point, which ensures it is not push/pullable.
> > - * However, the chosen/donor task *and* the mutex owner form an
> > - * atomic pair wrt push/pull.
> > - *
> > - * Make sure owner we run is not pushable. Unfortunately we can
> > - * only deal with that by means of a dequeue/enqueue cycle. :-/
> > - */
> > - dequeue_task(rq, owner, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
> > - enqueue_task(rq, owner, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
> > + WARN_ON_ONCE(p->migration_flags & MDF_PROXY);
> > + p->migration_flags |= MDF_PROXY;
> > + p->migration_disabled++;
> > +}
> > +
> > +static inline void put_proxy_task(struct task_struct *p)
> > +{
> > + WARN_ON_ONCE(!(p->migration_flags & MDF_PROXY));
> > + p->migration_flags &= ~MDF_PROXY;
> > + p->migration_disabled--;
>
> Note: I'm not too familiar with the set_affinity bits so my
> understanding might be all wrong but ...
>
> Doesn't the set_affinity bits have a completion based wait for tasks
> that are migration disabled? If we have a case like:
>
>     P0                                  P1
>     ==                                  ==
>
>   migrate_disable()
>     p->migration_disabled++; /* 1 */
>   ...
>   mutex_lock()
>   ...
>
>   /* preempted */
>   /* proxied */
>   set_proxy_task(P0)
>     p->migration_disabled++; /* 2 */
>   ...
>   /* Continues running */
>                                         set_cpus_allowed_ptr(P0)
>                                           /* Task CPU not in the new mask. */
>                                           affine_move_task()
>                                             /*
>                                              * blocks as per the comment
>                                              * above affine_move_task().
>                                              */
>   migrate_enable()
>     if (p->migration_disabled > 1)
>       p->migration_disabled--; /* 1 */
>       return;
>   ...
>   mutex_unlock();
>   /* Goes into schedule. */
>   put_proxy_task(P0)
>     p->migration_disabled--; /* 0 */
>
>   /* !!! Who does the migration + wakeup? !!! */
>
>
> Isn't it up to the last migrate_enable() (or in this case,
> put_proxy_task()) to schedule in the stopper and push the prev to
> another CPU? Or is it handled in some other way?

Indeed; so I think we can fix that by doing something like the below.
Have actual migrate_{dis,en}able {inc,dec}rement by 2 and have
this proxy thing {inc,dec} by 1.

Then have migrate_enable() ignore the low bit, such that 3->1 does the
slow-path and issues the completion.

So we only have the low bit set for the current proxy task; IOW any
non-current task will have it clear.
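
The counting scheme above can be sketched as a small userspace model. This is only an illustration, not kernel code: `struct task`, the `model_*()` helpers and the `slow_path_ran` flag are all made up for the example, and the model merely records that the outermost enable took the slow path. How the kernel turns that into the actual completion is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of task_struct. */
struct task {
	unsigned short migration_disabled; /* bit 0: proxy pin; rest: 2 * nesting */
	bool slow_path_ran;                /* set when the outermost enable ran */
};

/* migrate_disable() counts in steps of 2, leaving bit 0 alone. */
static void model_migrate_disable(struct task *p)
{
	p->migration_disabled += 2;
}

/* Returns true when the slow path was taken (2->0 or 3->1). */
static bool model_migrate_enable(struct task *p)
{
	assert(p->migration_disabled >= 2);

	if (p->migration_disabled > 3) {   /* still nested: 4->2, 5->3, ... */
		p->migration_disabled -= 2;
		return false;
	}

	p->migration_disabled -= 2;        /* outermost enable */
	p->slow_path_ran = true;
	return true;
}

/* The proxy pin only ever toggles bit 0. */
static void model_set_proxy(struct task *p)
{
	assert(!(p->migration_disabled & 1));
	p->migration_disabled++;
}

static void model_put_proxy(struct task *p)
{
	assert(p->migration_disabled & 1);
	p->migration_disabled--;
}

/* Mirrors the "& ~1" test: true only for a real migrate_disable() section. */
static bool model_user_pinned(const struct task *p)
{
	return (p->migration_disabled & ~1) != 0;
}

/* Replays the P0 side of the quoted timeline under the new encoding. */
bool replay_quoted_timeline(void)
{
	struct task p0 = { 0 };
	bool slow;

	model_migrate_disable(&p0);        /* 0 -> 2 */
	model_set_proxy(&p0);              /* 2 -> 3 */
	slow = model_migrate_enable(&p0);  /* 3 -> 1, slow path */
	if (model_user_pinned(&p0))        /* only the proxy bit is left */
		return false;
	model_put_proxy(&p0);              /* 1 -> 0 */

	return slow && p0.slow_path_ran && p0.migration_disabled == 0;
}
```

In this model the 3->1 transition is the one that takes the slow path, so the final put_proxy_task() step is left with nothing to complete.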

Then there are 3 sites that do the completion:

 - migrate_cpu_stop()
 - affine_move_task(); (1) when the mask fits the current location
 - affine_move_task(); (2) when the mask doesn't fit and the task is not
   running

migrate_cpu_stop() is safe, because the stopper task can neither block
nor be the owner of a lock and must thus exist outside of PE;
furthermore, if it is running, no other task is running and thus the
target task cannot have the low bit set.

affine_move_task() case (1) is obviously fine.

affine_move_task() case (2) is also fine, because similar to
migrate_cpu_stop() the target task is found to not be running, and
therefore it cannot have the low bit set.

*lightbulb*... but, doesn't that mean that we don't need any of this at
all, and could simply make sure RT/DL refuse to migrate task_on_cpu(p) ?
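
A rough userspace sketch of that alternative, purely to illustrate the idea: `struct model_task` and `model_can_push()` are hypothetical names, the `on_cpu` field stands in for what task_on_cpu(p) would report, and this is not the actual RT/DL push/pull code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields a push/pull filter would look at. */
struct model_task {
	bool on_cpu;                       /* models task_on_cpu(p) */
	int nr_cpus_allowed;               /* size of the affinity mask */
	unsigned short migration_disabled; /* non-zero inside migrate_disable() */
};

/*
 * Hypothetical filter: a task that is executing on a CPU right now is
 * never a push/pull candidate, so no extra proxy pin is needed to keep
 * the running mutex owner in place.
 */
static bool model_can_push(const struct model_task *p)
{
	if (p->on_cpu)
		return false;
	if (p->migration_disabled)
		return false;
	return p->nr_cpus_allowed > 1;
}
```

With a check of this shape the counter would not need to carry a proxy bit at all, since the running owner is rejected before the migration-disabled state is even consulted.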
---
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2412,8 +2412,8 @@ static inline void __migrate_enable(void
return;
#endif
- if (p->migration_disabled > 1) {
- p->migration_disabled--;
+ if (p->migration_disabled > 3) {
+ p->migration_disabled -= 2;
return;
}
@@ -2430,7 +2430,7 @@ static inline void __migrate_enable(void
* select_fallback_rq) get confused.
*/
barrier();
- p->migration_disabled = 0;
+ p->migration_disabled -= 2;
this_rq_pinned()--;
}
@@ -2445,13 +2445,13 @@ static inline void __migrate_disable(voi
*/
WARN_ON_ONCE((s16)p->migration_disabled < 0);
#endif
- p->migration_disabled++;
+ p->migration_disabled += 2;
return;
}
guard(preempt)();
this_rq_pinned()++;
- p->migration_disabled = 1;
+ p->migration_disabled += 2;
}
#else /* !COMPILE_OFFSETS */
static inline void __migrate_disable(void) { }
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2410,11 +2410,7 @@ static void migrate_disable_switch(struc
.flags = SCA_MIGRATE_DISABLE,
};
- if (likely(!p->migration_disabled))
- return;
-
- if ((p->migration_flags & MDF_PROXY) &&
- p->migration_disabled == 1)
+ if (likely(!(p->migration_disabled & ~1)))
return;
if (p->cpus_ptr != &p->cpus_mask)
@@ -6758,15 +6754,13 @@ find_proxy_task(struct rq *rq, struct ta
static inline void set_proxy_task(struct task_struct *p)
{
- WARN_ON_ONCE(p->migration_flags & MDF_PROXY);
- p->migration_flags |= MDF_PROXY;
+ WARN_ON_ONCE(p->migration_disabled & 1);
p->migration_disabled++;
}
static inline void put_proxy_task(struct task_struct *p)
{
- WARN_ON_ONCE(!(p->migration_flags & MDF_PROXY));
- p->migration_flags &= ~MDF_PROXY;
+ WARN_ON_ONCE(!(p->migration_disabled & 1));
p->migration_disabled--;
}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1368,7 +1368,6 @@ static inline int cpu_of(struct rq *rq)
}
#define MDF_PUSH 0x01
-#define MDF_PROXY 0x02
static inline bool is_migration_disabled(struct task_struct *p)
{