Re: [PATCH v5 06/24] sched/core: allow only preferred CPUs in is_cpu_allowed

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: Yury Norov <yury.norov@gmail.com>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>,
	linux-kernel@vger.kernel.org, mingo@kernel.org,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, kprateek.nayak@amd.com,
	iii@linux.ibm.com, corbet@lwn.net, tglx@kernel.org,
	gregkh@linuxfoundation.org, pbonzini@redhat.com,
	seanjc@google.com, vschneid@redhat.com, huschle@linux.ibm.com,
	rostedt@goodmis.org, dietmar.eggemann@arm.com,
	maddy@linux.ibm.com, srikar@linux.ibm.com, hdanton@sina.com,
	chleroy@kernel.org, vineeth@bitbyteword.org, frederic@kernel.org,
	arighi@nvidia.com, pauld@redhat.com, christian.loehle@arm.com,
	tj@kernel.org, tommaso.cucinotta@gmail.com, maz@kernel.org,
	rafael@kernel.org, rdunlap@infradead.org, kernellwp@gmail.com,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH v5 06/24] sched/core: allow only preferred CPUs in is_cpu_allowed
Date: Mon, 29 Jun 2026 00:09:35 -0400	[thread overview]
Message-ID: <akHv_4l0izJb6TgZ@yury> (raw)
In-Reply-To: <1ace8847-db42-49a8-8d0f-6eeead1c360f@linux.ibm.com>

On Sat, Jun 27, 2026 at 12:13:49AM +0530, Shrikanth Hegde wrote:
> Hi Yury.
> 
> On 6/26/26 6:55 PM, Shrikanth Hegde wrote:
> > Hi Yury. Thanks for going through the patches.
> > 
> 
> [...]
> 
> > > So, you've got 3 options to declare the status: self-explaining enum,
> > > self-explaining #defines, and this random numbers explained in
> > > comment. The latter option is the worst to me.
> > 
> > ok. I will define the enums.
> > 
> > > 
> > > And you didn't provide any benchmark advocating this caching
> > > optimization.
> 
> 
> I did below to see. Made interval as 100ms.
> Ran ./hackbench 30 process 30000 loops in both the VM at the same time.
> Values are average of 5 runs.
> 
> With optimization:
> 13.6 seconds
> 
> Without optimization:
> 13.8 seconds

And what's the p-value for them?

...
 
> > If we move to local variable then this won;t be necessary,
> > just enum's would be enough (I think). Let me go stare at it.
> 
> I have made it use the local variable instead. There maybe better names
> for variable, put something quickly to check the idea.
> Effectively this PATCH 6 becomes:
> 
> Does this seems better?
> Please let me know your comments.

I think, the below is too massive change for optimization of an
optimization for a particular, not too common config. And the whole
improvement is ~1%, assuming it's statistically important...

Just as said on previous round. Please order your series such that the
core logic goes first, and all sorts of complications, like this
optimization, are appended at the end.

> ---
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9e16946c9d62..fafedd52611f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2498,8 +2498,10 @@ static inline bool rq_has_pinned_tasks(struct rq *rq)
>   * Per-CPU kthreads are allowed to run on !active && online CPUs, see
>   * __set_cpus_allowed_ptr() and select_fallback_rq().
>   */
> -static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
> +static inline bool is_cpu_allowed(struct task_struct *p, int cpu, int cached)
>  {
> +       bool task_check_preferred_cpu;
> +
>         /* When not in the task's cpumask, no point in looking further. */
>         if (!task_allowed_on_cpu(p, cpu))
>                 return false;
> @@ -2508,9 +2510,24 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>         if (is_migration_disabled(p))
>                 return cpu_online(cpu);
> +       /*
> +        * This is essential to maintain user affinities when preferred
> +        * CPUs change. A task pinned on non-preferred CPU should continue
> +        * to run there, since this is non-user triggered.
> +        *
> +        * If CPU is non-preferred and task can run on other CPUs which are
> +        * currently preferred, then choose those other CPUs instead.
> +        * Overhead is minimal when CPU is preferred.
> +        */
> +       task_check_preferred_cpu = !cpu_preferred(cpu) &&
> +                                  task_has_preferred_cpus(p, cached);
> +
>         /* Non kernel threads are not allowed during either online or offline. */
> -       if (!(p->flags & PF_KTHREAD))
> +       if (!(p->flags & PF_KTHREAD)) {
> +               if (task_check_preferred_cpu)
> +                       return false;
>                 return cpu_active(cpu);
> +       }
>         /* KTHREAD_IS_PER_CPU is always allowed. */
>         if (kthread_is_per_cpu(p))
> @@ -2520,6 +2537,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>         if (cpu_dying(cpu))
>                 return false;
> +       /* Try on preferred CPU first if possible*/
> +       if (task_check_preferred_cpu)
> +               return false;
> +
>         /* But are allowed during online. */
>         return cpu_online(cpu);
>  }
> @@ -2595,7 +2616,7 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>         __must_hold(__rq_lockp(rq))
>  {
>         /* Affinity changed (again). */
> -       if (!is_cpu_allowed(p, dest_cpu))
> +       if (!is_cpu_allowed(p, dest_cpu, NO_CACHED_VAL))
>                 return rq;

This thing I really dislike. The unrelated code should not be
affected. You can make it less visually invasive with:
        
        #define is_cpu_allowed(p, cpu) __is_cpu_allowed(p, cpu, NO_CACHED_VAL)

Please reconsider your code to have the changes better localized.

Thanks,
Yury

>         rq = move_queued_task(rq, rf, p, dest_cpu);
> @@ -3547,7 +3568,15 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>         int nid = cpu_to_node(cpu);
>         const struct cpumask *nodemask = NULL;
>         enum { cpuset, possible, fail } state = cpuset;
> -       int dest_cpu;
> +       int dest_cpu, has_preferred_cpu;
> +
> +       /*
> +        * Cache the value whether task's affinity spans preferred CPUs.
> +        * This helps to avoid repeating the same for each CPU
> +        * later in the loop.
> +        */
> +       has_preferred_cpu = task_has_preferred_cpus(p, NO_CACHED_VAL) ?
> +                           TASK_HAS_PREFERRED_CPUS : TASK_NO_PREFERRED_CPUS;
>         /*
>          * If the node that the CPU is on has been offlined, cpu_to_node()
> @@ -3559,7 +3588,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>                 /* Look for allowed, online CPU in same node. */
>                 for_each_cpu(dest_cpu, nodemask) {
> -                       if (is_cpu_allowed(p, dest_cpu))
> +                       if (is_cpu_allowed(p, dest_cpu, has_preferred_cpu))
>                                 return dest_cpu;
>                 }
>         }
> @@ -3567,7 +3596,7 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>         for (;;) {
>                 /* Any allowed, online CPU? */
>                 for_each_cpu(dest_cpu, p->cpus_ptr) {
> -                       if (!is_cpu_allowed(p, dest_cpu))
> +                       if (!is_cpu_allowed(p, dest_cpu, has_preferred_cpu))
>                                 continue;
>                         goto out;
> @@ -3632,7 +3661,7 @@ int select_task_rq(struct task_struct *p, int cpu, int *wake_flags)
>          * [ this allows ->select_task() to simply return task_cpu(p) and
>          *   not worry about this generic constraint ]
>          */
> -       if (unlikely(!is_cpu_allowed(p, cpu)))
> +       if (unlikely(!is_cpu_allowed(p, cpu, NO_CACHED_VAL)))
>                 cpu = select_fallback_rq(task_cpu(p), p);
>         return cpu;
> @@ -6467,7 +6496,7 @@ static bool try_steal_cookie(int this, int that)
>                 if (p == src->core_pick || p == src->curr)
>                         goto next;
> -               if (!is_cpu_allowed(p, this))
> +               if (!is_cpu_allowed(p, this, NO_CACHED_VAL))
>                         goto next;
>                 if (p->core_occupation > dst->idle->core_occupation)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index c7c2dea65edd..949c044702c1 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -4213,4 +4213,32 @@ DEFINE_CLASS_IS_UNCONDITIONAL(sched_change)
>  #include "ext.h"
> +enum task_preferred_cached {
> +       TASK_NO_PREFERRED_CPUS = -1,
> +       NO_CACHED_VAL,
> +       TASK_HAS_PREFERRED_CPUS,
> +};
> +
> +/*
> + * Value is cached when called via select_fallback_rq().
> + *
> + * TASK_NO_PREFERRED_CPUS : Cached and preferred CPUs exists in task's
> + *                          affinity.
> + * NO_CACHED_VAL:           Not cached and need to evaluate.
> + * TASK_HAS_PREFERRED_CPUS: Cached and preferred CPU doesn't exits
> + *                          task's affinity
> + *
> + * Only affects FAIR task.
> + */
> +static inline bool task_has_preferred_cpus(struct task_struct *p, int cached)
> +{
> +       /* Only FAIR tasks honor preferred CPU state */
> +       if (unlikely(p->sched_class != &fair_sched_class))
> +               return false;
> +
> +       if (cached)
> +               return cached > 0;
> +       else
> +               return cpumask_intersects(p->cpus_ptr, cpu_preferred_mask);
> +}
>  #endif /* _KERNEL_SCHED_SCHED_H */

next prev parent reply	other threads:[~2026-06-29  4:09 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 12:46 [PATCH v5 00/24] sched: Introduce cpu_preferred_mask and steal-driven vCPU backoff Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 01/24] sched/debug: Remove unused schedstats Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 02/24] sched/docs: Document cpu_preferred_mask and Preferred CPU concept Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 03/24] kconfig: Provide PREFERRED_CPU option Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 04/24] cpumask: Introduce cpu_preferred_mask Shrikanth Hegde
2026-06-26  9:34   ` Peter Zijlstra
2026-06-26 13:37     ` Shrikanth Hegde
2026-06-26  9:39   ` Peter Zijlstra
2026-06-26  9:41     ` Peter Zijlstra
2026-06-26 13:09       ` Shrikanth Hegde
2026-06-26 13:18         ` Yury Norov
2026-06-26 13:27           ` Shrikanth Hegde
2026-06-26 12:40     ` Yury Norov
2026-06-26 13:18       ` Shrikanth Hegde
2026-06-26 18:51     ` Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 05/24] sysfs: Add preferred CPU file Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 06/24] sched/core: allow only preferred CPUs in is_cpu_allowed Shrikanth Hegde
2026-06-26 13:06   ` Yury Norov
2026-06-26 13:25     ` Shrikanth Hegde
2026-06-26 18:43       ` Shrikanth Hegde
2026-06-29  4:09         ` Yury Norov [this message]
2026-06-29  4:14           ` Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 07/24] sched/fair: Select preferred CPU at wakeup when possible Shrikanth Hegde
2026-06-26  9:59   ` Peter Zijlstra
2026-06-26 13:17     ` Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 08/24] sched/fair: load balance only among preferred CPUs Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 09/24] sched/fair: Pull the load on preferred CPU Shrikanth Hegde
2026-06-26 10:00   ` Peter Zijlstra
2026-06-26 13:35     ` Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 10/24] sched/core: Keep tick on non-preferred CPUs until tasks are out Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 11/24] sched/core: Push current task from non preferred CPU Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 12/24] sched/debug: Add migration stats due to non preferred CPUs Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 13/24] virt/steal_monitor: Add documentation Shrikanth Hegde
2026-06-25 17:00   ` Randy Dunlap
2026-06-26  4:30     ` Shrikanth Hegde
2026-06-26  9:28   ` Peter Zijlstra
2026-06-26 14:05     ` Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 14/24] virt: Introduce steal monitor driver Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 15/24] virt/steal_monitor: Restore to active on module disable Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 16/24] virt/steal_monitor: Define steal_monitor structure Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 17/24] virt/steal_monitor: Add control knobs for handling steal values Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 18/24] virt/steal_monitor: Compute work at regular intervals Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 19/24] virt/steal_monitor: Provide default method to get systemwide steal time Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 20/24] virt/steal_monitor: Provide default method to inc/dec preferred CPUs Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 21/24] virt/steal_monitor: Provide default method to get num of CPUs for steal ratio Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 22/24] virt/steal_monitor: Act on steal values at regular intervals Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 23/24] virt/steal_monitor: Add direction control Shrikanth Hegde
2026-06-25 12:46 ` [PATCH v5 24/24] virt/steal_monitor: Add design check of preferred subset of active Shrikanth Hegde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=akHv_4l0izJb6TgZ@yury \
    --to=yury.norov@gmail.com \
    --cc=arighi@nvidia.com \
    --cc=chleroy@kernel.org \
    --cc=christian.loehle@arm.com \
    --cc=corbet@lwn.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=frederic@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hdanton@sina.com \
    --cc=huschle@linux.ibm.com \
    --cc=iii@linux.ibm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernellwp@gmail.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maddy@linux.ibm.com \
    --cc=maz@kernel.org \
    --cc=mingo@kernel.org \
    --cc=pauld@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=seanjc@google.com \
    --cc=srikar@linux.ibm.com \
    --cc=sshegde@linux.ibm.com \
    --cc=tglx@kernel.org \
    --cc=tj@kernel.org \
    --cc=tommaso.cucinotta@gmail.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vineeth@bitbyteword.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox