From: Yury Norov <yury.norov@gmail.com>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, yury.norov@gmail.com,
	kprateek.nayak@amd.com, iii@linux.ibm.com, tglx@kernel.org,
	gregkh@linuxfoundation.org, pbonzini@redhat.com,
	seanjc@google.com, vschneid@redhat.com, huschle@linux.ibm.com,
	rostedt@goodmis.org, dietmar.eggemann@arm.com, mgorman@suse.de,
	bsegall@google.com, maddy@linux.ibm.com, srikar@linux.ibm.com,
	hdanton@sina.com, chleroy@kernel.org, vineeth@bitbyteword.org,
	frederic@kernel.org, arighi@nvidia.com, pauld@redhat.com,
	christian.loehle@arm.com, tj@kernel.org,
	tommaso.cucinotta@gmail.com, maz@kernel.org, rafael@kernel.org
Subject: Re: [PATCH v4 15/20] sched/core: Compute steal values at regular intervals
Date: Thu, 18 Jun 2026 00:04:25 -0400
Message-ID: <ajNuSb5_MmT7IULg@yury>
In-Reply-To: <20260617174139.155540-16-sshegde@linux.ibm.com>

On Wed, Jun 17, 2026 at 11:11:34PM +0530, Shrikanth Hegde wrote:
> Kick off the work to compute the steal time at regular intervals.
> Gated by the steal monitor static key check to avoid any overhead
> when it's disabled.
> 
> The sampling period can be changed at runtime using steal_mon/sampling_period.
> The default is 1000 milliseconds, i.e. 1 second.
> 
> This work is done by first active housekeeping CPU only. Hence it won't
> need any complicated synchronization.
> 
> Now that sched_steal_mon_enabled(), which is a static branch, is available,
> add it to hot paths such as wakeup and load balance.
> This makes them effectively a nop when the feature is disabled.
> 
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> v3->v4:
> - Add static key check in hot paths. Could be split into a separate
>   patch. Let me know if that's better.
> 
>  include/linux/sched.h |  2 ++
>  kernel/sched/core.c   | 28 +++++++++++++++++++++++++++-
>  kernel/sched/debug.c  |  1 +
>  kernel/sched/fair.c   |  3 ++-
>  kernel/sched/sched.h  | 10 +++++++++-
>  5 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ce6bc8a22eb1..5b15353ed7ef 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2527,5 +2527,7 @@ struct steal_monitor_t {
>  	unsigned int high_threshold;
>  	unsigned int sampling_period_ms;
>  };
> +
> +extern struct steal_monitor_t steal_mon;
>  #endif
>  #endif
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cc48632dd42d..f1a91021e357 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5793,7 +5793,7 @@ void sched_tick(void)
>  	unsigned long hw_pressure;
>  	u64 resched_latency;
>  
> -	if (!cpu_preferred(cpu))
> +	if (sched_steal_mon_enabled() && !cpu_preferred(cpu))
>  		sched_push_current_non_preferred_cpu(rq);

It looks like a CPU can be non-preferred only if the steal monitor is
enabled. To implement that properly, you need to mark all active CPUs
as preferred when the steal monitor is disabled. That way you don't
need to complicate the condition.

>  
>  	if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE))
> @@ -5834,6 +5834,9 @@ void sched_tick(void)
>  		rq->idle_balance = idle_cpu(cpu);
>  		sched_balance_trigger(rq);
>  	}
> +
> +	if (sched_steal_mon_enabled())
> +		sched_trigger_steal_computation(cpu);
>  }
>  
>  #ifdef CONFIG_NO_HZ_FULL
> @@ -11407,4 +11410,27 @@ void sched_steal_detection_work(struct work_struct *work)
>  	now = ktime_get();
>  	sm->prev_time = now;
>  }
> +
> +void sched_trigger_steal_computation(int cpu)
> +{
> +	int first_hk_cpu = cpumask_first_and(housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
> +					     cpu_active_mask);
> +	ktime_t now;
> +
> +	/* Done by first active housekeeping CPU only */
> +	if (likely(cpu != first_hk_cpu))
> +		return;
> +
> +	/*
> +	 * Since everything is updated by the first housekeeping CPU,
> +	 * there is no need for complex synchronization.
> +	 */
> +	now = ktime_get();
> +
> +	/* Default is once per second */
> +	if (likely(ktime_ms_delta(now, steal_mon.prev_time) < steal_mon.sampling_period_ms))
> +		return;
> +
> +	schedule_work_on(first_hk_cpu, &steal_mon.work);

I think there should be a better way to schedule work at a regular
interval...

Maybe steal_mon.work could schedule itself? It would be scheduled the
first time when the steal monitor is enabled, and after that it would
just reschedule itself. This way you'll avoid polluting sched_tick().
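
To make the control flow concrete, here is a minimal userspace model of
a self-rearming work item. It is only a sketch, not kernel code: every
model_* name is hypothetical, the virtual clock stands in for the
workqueue, and the sample counter stands in for the steal computation.
In the kernel I would expect a struct delayed_work together with
schedule_delayed_work_on() and cancel_delayed_work_sync() to play these
roles, but that mapping is an assumption and not something taken from
this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: one pending "delayed work" slot on a virtual clock. */
struct model_dwork {
	bool pending;		/* is a run queued? */
	uint64_t expires_ms;	/* virtual time at which it should run */
};

struct model_steal_mon {
	bool enabled;
	unsigned int sampling_period_ms;
	unsigned int samples;	/* number of times the work has run */
	struct model_dwork work;
};

/* Queue one run, sampling_period_ms after now_ms. */
static void model_arm(struct model_steal_mon *sm, uint64_t now_ms)
{
	sm->work.pending = true;
	sm->work.expires_ms = now_ms + sm->sampling_period_ms;
}

/* Work handler: take a sample, then re-arm itself while still enabled. */
static void model_steal_work(struct model_steal_mon *sm, uint64_t now_ms)
{
	sm->samples++;		/* stands in for the steal computation */
	if (sm->enabled)
		model_arm(sm, now_ms);
}

/* Enabling the monitor queues the first run; nothing else ever does. */
static void model_enable(struct model_steal_mon *sm, uint64_t now_ms)
{
	sm->enabled = true;
	model_arm(sm, now_ms);
}

/* Disabling clears the flag first, then cancels any pending run. */
static void model_disable(struct model_steal_mon *sm)
{
	sm->enabled = false;
	sm->work.pending = false;
}

/* Stand-in for the workqueue: step virtual time and run expired work. */
static void model_advance(struct model_steal_mon *sm,
			  uint64_t from_ms, uint64_t to_ms)
{
	for (uint64_t t = from_ms + 1; t <= to_ms; t++) {
		if (sm->work.pending && t >= sm->work.expires_ms) {
			sm->work.pending = false;
			model_steal_work(sm, t);
		}
	}
}
```

Note that nothing in this model is driven from a tick handler: the only
entry points are the enable and disable paths, and the period is picked
up again on each re-arm, so a runtime change to the sampling period
would take effect from the next run.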


> +}
>  #endif
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 2d62858f9cc0..55b8beb42574 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -649,6 +649,7 @@ static ssize_t sched_sm_en_write(struct file *filp, const char __user *ubuf,
>  		static_branch_enable(&__sched_sm_enable);
>  	} else if (!sched_sm_wr_enable && orig) {
>  		static_branch_disable(&__sched_sm_enable);
> +		cancel_work_sync(&steal_mon.work);
>  		cpumask_copy(&__cpu_preferred_mask, cpu_active_mask);
>  	}
>  
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3f3c7f0ca489..b02a414ffaae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13292,7 +13292,8 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>  	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>  
>  	/* Spread load among preferred CPUs */
> -	cpumask_and(cpus, cpus, cpu_preferred_mask);
> +	if (sched_steal_mon_enabled())
> +		cpumask_and(cpus, cpus, cpu_preferred_mask);

Again, if you do cpumask_copy(preferred, active) when the steal
monitor is disabled, you don't need to complicate the core logic here
and there.
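
A small userspace model of the argument, assuming the preferred mask is
always a subset of the active mask. The model_* names are hypothetical
and a uint64_t stands in for a cpumask (so at most 64 CPUs); this is a
sketch of the invariant, not the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model: one bit per CPU, CPU numbers must be below 64. */
struct model_masks {
	uint64_t active;
	uint64_t preferred;	/* assumed invariant: subset of active */
};

/* What the disable path would do: every active CPU becomes preferred. */
static void model_monitor_disable(struct model_masks *m)
{
	m->preferred = m->active;
}

/* Unconditional per-CPU check, with no "monitor enabled" test in front. */
static bool model_cpu_preferred(const struct model_masks *m, unsigned int cpu)
{
	return (m->preferred >> cpu) & 1;
}

/* Unconditional narrowing of the load-balance candidates. */
static uint64_t model_balance_cpus(const struct model_masks *m,
				   uint64_t domain_span)
{
	uint64_t cpus = domain_span & m->active;

	return cpus & m->preferred;
}
```

Once preferred equals active, the second AND in model_balance_cpus()
cannot remove any CPU, and model_cpu_preferred() is true for every
active CPU, so the callers behave as if the feature did not exist
without having to test whether the monitor is enabled.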

>  
>  	schedstat_inc(sd->lb_count[idle]);
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 984da3827f19..f3814099cc0b 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1060,6 +1060,7 @@ struct root_domain {
>  	struct perf_domain __rcu *pd;
>  };
>  
> +static inline bool sched_steal_mon_enabled(void);
>  extern void init_defrootdomain(void);
>  extern int sched_init_domains(const struct cpumask *cpu_map);
>  extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
> @@ -1436,7 +1437,7 @@ static inline bool available_idle_cpu(int cpu)
>  	if (!idle_rq(cpu_rq(cpu)))
>  		return 0;
>  
> -	if (!cpu_preferred(cpu))
> +	if (sched_steal_mon_enabled() && !cpu_preferred(cpu))
>  		return 0;
>  
>  	if (vcpu_is_preempted(cpu))
> @@ -4243,8 +4244,15 @@ DECLARE_STATIC_KEY_FALSE(__sched_sm_enable);
>  void sched_init_steal_monitor(void);
>  void sched_steal_detection_work(struct work_struct *work);
>  void sched_push_current_non_preferred_cpu(struct rq *rq);
> +void sched_trigger_steal_computation(int cpu);
> +static inline bool sched_steal_mon_enabled(void)
> +{
> +	return static_branch_unlikely(&__sched_sm_enable);
> +}
>  #else	/* !CONFIG_PREFERRED_CPU */
>  static inline void sched_push_current_non_preferred_cpu(struct rq *rq) { }
>  static inline void sched_init_steal_monitor(void) { }
> +static inline void sched_trigger_steal_computation(int cpu) { }
> +static inline bool sched_steal_mon_enabled(void) { return false; }
>  #endif
>  #endif /* _KERNEL_SCHED_SCHED_H */
> -- 
> 2.47.3

