From: Yury Norov <ynorov@nvidia.com>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, tglx@linutronix.de,
yury.norov@gmail.com, gregkh@linuxfoundation.org,
pbonzini@redhat.com, seanjc@google.com, kprateek.nayak@amd.com,
vschneid@redhat.com, iii@linux.ibm.com, huschle@linux.ibm.com,
rostedt@goodmis.org, dietmar.eggemann@arm.com, mgorman@suse.de,
bsegall@google.com, maddy@linux.ibm.com, srikar@linux.ibm.com,
hdanton@sina.com, chleroy@kernel.org, vineeth@bitbyteword.org,
joelagnelf@nvidia.com
Subject: Re: [PATCH v2 05/17] sched/core: allow only preferred CPUs in is_cpu_allowed
Date: Wed, 8 Apr 2026 14:09:04 -0400 [thread overview]
Message-ID: <adaZwCOUrozqkNej@yury> (raw)
In-Reply-To: <1430f070-a0cb-4c16-a96d-99acb5e53ab1@linux.ibm.com>
On Wed, Apr 08, 2026 at 06:26:20PM +0530, Shrikanth Hegde wrote:
> Hi Yury.
>
> On 4/8/26 6:35 AM, Yury Norov wrote:
> > On Wed, Apr 08, 2026 at 12:49:38AM +0530, Shrikanth Hegde wrote:
> > >
> > > When possible, choose a preferred CPUs to pick.
> > >
> > > Push task mechanism uses stopper thread which going to call
> > > select_fallback_rq and use this mechanism to pick only a preferred CPU.
> > >
> > > When task is affined only to non-preferred CPUs it should continue to
> > > run there. Detect that by checking if cpus_ptr and cpu_preferred_mask
> > > interesect or not.
> > >
> > > Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> > > ---
> > > kernel/sched/core.c | 17 ++++++++++++++---
> > > kernel/sched/sched.h | 12 ++++++++++++
> > > 2 files changed, 26 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 7ea05a7a717b..336e7c694eb7 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -2463,9 +2463,16 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
> > > if (is_migration_disabled(p))
> > > return cpu_online(cpu);
> > > - /* Non kernel threads are not allowed during either online or offline. */
> > > - if (!(p->flags & PF_KTHREAD))
> > > - return cpu_active(cpu);
> > > + /*
> > > + * Non kernel threads are not allowed during either online or offline.
> > > + * Ensure it is a preferred CPU to avoid further contention
> > > + */
> > > + if (!(p->flags & PF_KTHREAD)) {
> > > + if (!cpu_active(cpu))
> > > + return false;
> > > + if (!cpu_preferred(cpu) && task_can_run_on_preferred_cpu(p))
> > > + return false;
> > > + }
> > > /* KTHREAD_IS_PER_CPU is always allowed. */
> > > if (kthread_is_per_cpu(p))
> > > @@ -2475,6 +2482,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
> > > if (cpu_dying(cpu))
> > > return false;
> > > + /* Try on preferred CPU first */
> > > + if (!cpu_preferred(cpu) && task_can_run_on_preferred_cpu(p))
> > > + return false;
> >
>
> First one was regular tasks, this is for unbound kernel threads.
> Both will need. No?
>
> > You repeat this for the 2nd time. The cpu_preferred() call should go
> > inside task_can_run_on_preferred_cpu().
>
> I want to keep this check for cpu_preferred() first. the reason being
> it is inexpensive since it is bit check. Only if it fails, then one should
> bother about task_can_run_on_preferred_cpu which is O(N) as you said.
>
> I am using the task_can_run_on_preferred_cpu in push task mechanism too.
> PATCH 10/17. I get there only on non-preferred CPU.
> So can I keep as is?
It's more a matter of taste. I just don't like copy-pasting the same
logic. Maybe add something like:
task_is_preferred_on_cpu(task, cpu);
And use it here?
> > And can you please pick some shorter name?
> >
>
> task_has_preferred_cpus?
Yep, it's shorter.
> > > /* But are allowed during online. */
> > > return cpu_online(cpu);
> > > }
> > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > > index 88e0c93b9e21..7271af2ca64f 100644
> > > --- a/kernel/sched/sched.h
> > > +++ b/kernel/sched/sched.h
> > > @@ -4130,4 +4130,16 @@ DEFINE_CLASS_IS_UNCONDITIONAL(sched_change)
> > > #include "ext.h"
> > > +#ifdef CONFIG_PARAVIRT
> > > +static inline bool task_can_run_on_preferred_cpu(struct task_struct *p)
> > > +{
> > > + return cpumask_intersects(p->cpus_ptr, cpu_preferred_mask);
> >
> > This makes is_cpu_allowed() O(N). Even if CONFIG_PARAVIRT is enabled,
> > I think some people would prefer to avoid this. Also, select_fallback_rq()
> > calls it in a loop, and this makes it O(N^2).
> >
> >
> > /* Any allowed, online CPU? */
> > for_each_cpu(dest_cpu, p->cpus_ptr) {
> > if (!is_cpu_allowed(p, dest_cpu))
> > continue;
> >
> > goto out;
> > }
> >
> > You can keep it O(N):
> > for_each_cpu_and(dest_cpu, p->cpus_ptr, cpu_preferred_mask) {
> > ...
> > }
>
> This would leave tasks which has affinity only on non-preferred CPUs without a CPU.
>
> That breaks below case,
> 600 CPUs, high steal time and hence preferred is 0-399.
> In that state, user does "taskset -c 500 <stress-ng>", that task ends up
> going to preferred CPUs since its affinity gets reset in the switch
> block later in select_fallback_rq.
>
> >
> > Not sure how critical that path is, but this looks suspicious.
> >
>
> Fair point. But we come here only if cpu_preferred(cpu) == false.
>
> When that happens, we expect the task running on will get preempted
> to a preferred CPUs. If it migrated to a preferred cpu, then on-wards
> it wont suffer the additional overhead of task_can_run_on_preferred_cpu
>
> case where it would be O(N^2) is for tasks which have explicit affinity
> on non-preferred CPUs. I see that as rare case.
>
> For the majority of the cases, it should still be O(N) since cpu_preferred(cpu)
> will be true.
Please describe it in the comment.
> Here is the benchmark data on system where there is 0 steal time.
> It is dedicated LPAR(VM).
>
> | Test | Baseline | NO STEAL | %diff to base| STEAL | %diff to base |
> | | | MONITOR | | MONITOR | |
> |---------------------------------|----------|----------|--------------|-----------|---------------|
> | HackBench Process 20 groups | 2.70 | 2.67 | +1.11% | 2.66 | +1.48% |
> | HackBench Process 40 groups | 5.31 | 5.26 | +0.94% | 5.30 | +0.19% |
> | HackBench Process 60 groups | 7.95 | 7.82 | +1.64% | 7.90 | +0.63% |
> | HackBench thread 10 Time | 1.61 | 1.59 | +1.24% | 1.56 | +3.11% |
> | HackBench thread 20 Time | 2.82 | 2.83 | -0.35% | 2.80 | +0.71% |
> | HackBench Process(Pipe) 20 Time | 1.51 | 1.50 | +0.66% | 1.47 | +2.65% |
> | HackBench Process(Pipe) 40 Time | 2.78 | 2.69 | +3.24% | 2.69 | +3.24% |
> | HackBench Process(Pipe) 60 Time | 3.73 | 3.76 | -0.80% | 3.64 | +2.41% |
> | HackBench thread(Pipe) 10 Time | 0.94 | 0.91 | +3.19% | 0.91 | +3.19% |
> | HackBench thread(Pipe) 20 Time | 1.58 | 1.58 | -0.00% | 1.59 | -0.63% |
>
>
>
>
> > > +}
> > > +#else
> > > +static inline bool task_can_run_on_preferred_cpu(struct task_struct *p)
> > > +{
> > > + return true;
> > > +}
> > > +#endif
> >
> > Same comment as in patch 3. I believe, it's worth to declare cpu_preferred_mask
> > unrelated to CONFIG_PARAVIRT, so that you'll not have to spread this
> > ifdefery around.
> >
>
> Yes.
>
> > > +
> > > #endif /* _KERNEL_SCHED_SCHED_H */
> > > --
> > > 2.47.3
next prev parent reply other threads:[~2026-04-08 18:09 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-07 19:19 [PATCH v2 00/17] sched/paravirt: Introduce cpu_preferred_mask and steal-driven vCPU backoff Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 01/17] sched/debug: Remove unused schedstats Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 02/17] sched/docs: Document cpu_preferred_mask and Preferred CPU concept Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 03/17] cpumask: Introduce cpu_preferred_mask Shrikanth Hegde
2026-04-07 20:27 ` Yury Norov
2026-04-08 9:16 ` Shrikanth Hegde
2026-04-08 17:57 ` Yury Norov
2026-05-05 4:07 ` Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 04/17] sysfs: Add preferred CPU file Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 05/17] sched/core: allow only preferred CPUs in is_cpu_allowed Shrikanth Hegde
2026-04-08 1:05 ` Yury Norov
2026-04-08 12:56 ` Shrikanth Hegde
2026-04-08 18:09 ` Yury Norov [this message]
2026-04-07 19:19 ` [PATCH v2 06/17] sched/fair: Select preferred CPU at wakeup when possible Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 07/17] sched/fair: load balance only among preferred CPUs Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 08/17] sched/rt: Select a preferred CPU for wakeup and pulling rt task Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 09/17] sched/core: Keep tick on non-preferred CPUs until tasks are out Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 10/17] sched/core: Push current task from non preferred CPU Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 11/17] sched/debug: Add migration stats due to non preferred CPUs Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 12/17] sched/feature: Add STEAL_MONITOR feature Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 13/17] sched/core: Introduce a simple steal monitor Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 14/17] sched/core: Compute steal values at regular intervals Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 15/17] sched/core: Handle steal values and mark CPUs as preferred Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 16/17] sched/core: Mark the direction of steal values to avoid oscillations Shrikanth Hegde
2026-04-07 19:19 ` [PATCH v2 17/17] sched/debug: Add debug knobs for steal monitor Shrikanth Hegde
2026-04-07 19:50 ` [PATCH v2 00/17] sched/paravirt: Introduce cpu_preferred_mask and steal-driven vCPU backoff Shrikanth Hegde
2026-04-08 10:14 ` Hillf Danton
2026-04-08 13:49 ` Shrikanth Hegde
2026-04-09 5:15 ` Hillf Danton
2026-04-09 10:27 ` Shrikanth Hegde
2026-04-10 9:47 ` Shrikanth Hegde
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=adaZwCOUrozqkNej@yury \
--to=ynorov@nvidia.com \
--cc=bsegall@google.com \
--cc=chleroy@kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=gregkh@linuxfoundation.org \
--cc=hdanton@sina.com \
--cc=huschle@linux.ibm.com \
--cc=iii@linux.ibm.com \
--cc=joelagnelf@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maddy@linux.ibm.com \
--cc=mgorman@suse.de \
--cc=mingo@kernel.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=seanjc@google.com \
--cc=srikar@linux.ibm.com \
--cc=sshegde@linux.ibm.com \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
--cc=vineeth@bitbyteword.org \
--cc=vschneid@redhat.com \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.