From: Yury Norov <yury.norov@gmail.com>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, yury.norov@gmail.com,
kprateek.nayak@amd.com, iii@linux.ibm.com, tglx@kernel.org,
gregkh@linuxfoundation.org, pbonzini@redhat.com,
seanjc@google.com, vschneid@redhat.com, huschle@linux.ibm.com,
rostedt@goodmis.org, dietmar.eggemann@arm.com, mgorman@suse.de,
bsegall@google.com, maddy@linux.ibm.com, srikar@linux.ibm.com,
hdanton@sina.com, chleroy@kernel.org, vineeth@bitbyteword.org,
frederic@kernel.org, arighi@nvidia.com, pauld@redhat.com,
christian.loehle@arm.com, tj@kernel.org,
tommaso.cucinotta@gmail.com, maz@kernel.org, rafael@kernel.org
Subject: Re: [PATCH v4 14/20] sched/core: Introduce a simple steal monitor
Date: Thu, 18 Jun 2026 00:30:50 -0400
Message-ID: <ajN0erWZ-Bx8_Jtv@yury>
In-Reply-To: <20260617174139.155540-15-sshegde@linux.ibm.com>
On Wed, Jun 17, 2026 at 11:11:33PM +0530, Shrikanth Hegde wrote:
> Start with a simple steal monitor.
>
> It is meant to look at steal time and make the decision to
> reduce/increase the preferred CPUs.
>
> It has
> - work function to execute the steal time calculations and decision
> making periodically.
> - low and high thresholds for steal time.
> - sampling period to control the frequency of steal time calculations.
> - cache the previous decision to avoid oscillations

This monitor is just one of many possible implementations, right? I don't
think it should live in the core scheduler files; it should be a module.

> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> v3->v4:
> - Drop tmp_mask
>
> include/linux/sched.h | 11 +++++++++++
> kernel/sched/core.c | 23 +++++++++++++++++++++++
> kernel/sched/sched.h | 3 +++
> 3 files changed, 37 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5f523782ca28..ce6bc8a22eb1 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2517,4 +2517,15 @@ extern void migrate_enable(void);
>
> DEFINE_LOCK_GUARD_0(migrate, migrate_disable(), migrate_enable())
>
> +#ifdef CONFIG_PREFERRED_CPU
> +struct steal_monitor_t {
> + struct work_struct work;
> + ktime_t prev_time;
> + u64 prev_steal;
> + int previous_decision;
> + unsigned int low_threshold;
> + unsigned int high_threshold;
> + unsigned int sampling_period_ms;
> +};
> +#endif
> #endif
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 24d4abc74241..cc48632dd42d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9138,6 +9138,8 @@ void __init sched_init(void)
>
> preempt_dynamic_init();
>
> + sched_init_steal_monitor();
> +
> scheduler_running = 1;
> }
>
> @@ -11384,4 +11386,25 @@ void sched_push_current_non_preferred_cpu(struct rq *rq)
> stop_one_cpu_nowait(rq->cpu, sched_non_preferred_cpu_push_stop,
> push_task, this_cpu_ptr(&npc_push_task_work));
> }
> +
> +struct steal_monitor_t steal_mon;
> +
> +void sched_init_steal_monitor(void)
> +{
> + INIT_WORK(&steal_mon.work, sched_steal_detection_work);
> + steal_mon.low_threshold = 200; /* 2% steal time */
> + steal_mon.high_threshold = 500; /* 5% steal time */
> + steal_mon.sampling_period_ms = 1000; /* once per second */
> +}
> +
> +/* This is only a skeleton. Subsequent patches introduce more of it */
> +void sched_steal_detection_work(struct work_struct *work)
> +{
> + struct steal_monitor_t *sm = container_of(work, struct steal_monitor_t, work);
> + ktime_t now;
> +
> + /* Update the prev_time for next iteration*/
> + now = ktime_get();
> + sm->prev_time = now;
> +}
> #endif
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 9cb006c21090..984da3827f19 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -4240,8 +4240,11 @@ static inline bool task_has_preferred_cpus(struct task_struct *p)
> #ifdef CONFIG_PREFERRED_CPU
> DECLARE_STATIC_KEY_FALSE(__sched_sm_enable);
>
> +void sched_init_steal_monitor(void);
> +void sched_steal_detection_work(struct work_struct *work);
> void sched_push_current_non_preferred_cpu(struct rq *rq);
> #else /* !CONFIG_PREFERRED_CPU */
> static inline void sched_push_current_non_preferred_cpu(struct rq *rq) { }
> +static inline void sched_init_steal_monitor(void) { }
> #endif
> #endif /* _KERNEL_SCHED_SCHED_H */
> --
> 2.47.3
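
For readers following the thread: the defaults in the quoted patch (200 for
2%, 500 for 5%) suggest that steal is expressed in hundredths of a percent of
the sampling window. Below is a minimal user-space sketch of that arithmetic
and of a two-threshold decision. The names steal_thresholds, steal_ratio,
steal_decide and the enum values are hypothetical and do not come from the
patch, which is only a skeleton and leaves the real calculation to later
patches in the series.

```c
#include <stdint.h>

/* Hypothetical decision values; the patch only stores an int. */
enum steal_decision {
	STEAL_REDUCE = -1,	/* steal is high: shrink the preferred CPU set */
	STEAL_KEEP = 0,		/* steal is between the thresholds: no change */
	STEAL_INCREASE = 1,	/* steal is low: grow the preferred CPU set */
};

/* User-space stand-in for the threshold fields of struct steal_monitor_t. */
struct steal_thresholds {
	unsigned int low;	/* e.g. 200 == 2% */
	unsigned int high;	/* e.g. 500 == 5% */
};

/*
 * Steal as a share of the sampling window, in hundredths of a percent.
 * Assumes both deltas are in the same unit (nanoseconds here) and that
 * delta_steal * 10000 fits in 64 bits.
 */
static unsigned int steal_ratio(uint64_t delta_steal, uint64_t delta_time)
{
	if (delta_time == 0)
		return 0;	/* no time elapsed, nothing to report */
	return (unsigned int)(delta_steal * 10000 / delta_time);
}

/* Two-threshold decision: the band between low and high means "keep". */
static enum steal_decision steal_decide(const struct steal_thresholds *t,
					unsigned int ratio)
{
	if (ratio > t->high)
		return STEAL_REDUCE;
	if (ratio < t->low)
		return STEAL_INCREASE;
	return STEAL_KEEP;
}
```

With the default thresholds, 30 ms of steal in a 1000 ms window gives a ratio
of 300, which falls between low and high and leaves the preferred set
unchanged. How the cached previous decision feeds into this is not shown in
this patch, so the sketch leaves it out.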