From: Yury Norov
Date: Thu, 18 Jun 2026 00:04:25 -0400
To: Shrikanth Hegde
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, peterz@infradead.org,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, yury.norov@gmail.com,
 kprateek.nayak@amd.com, iii@linux.ibm.com, tglx@kernel.org,
 gregkh@linuxfoundation.org, pbonzini@redhat.com, seanjc@google.com,
 vschneid@redhat.com, huschle@linux.ibm.com, rostedt@goodmis.org,
 dietmar.eggemann@arm.com, mgorman@suse.de, bsegall@google.com,
 maddy@linux.ibm.com, srikar@linux.ibm.com, hdanton@sina.com,
 chleroy@kernel.org, vineeth@bitbyteword.org, frederic@kernel.org,
 arighi@nvidia.com, pauld@redhat.com, christian.loehle@arm.com,
 tj@kernel.org, tommaso.cucinotta@gmail.com, maz@kernel.org,
 rafael@kernel.org
Subject: Re: [PATCH v4 15/20] sched/core: Compute steal values at regular intervals
References: <20260617174139.155540-1-sshegde@linux.ibm.com>
 <20260617174139.155540-16-sshegde@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260617174139.155540-16-sshegde@linux.ibm.com>

On Wed, Jun 17, 2026 at 11:11:34PM +0530, Shrikanth Hegde wrote:
> Kick off the work to compute the steal time at regular interval.
> Gated with steal monitor enabled static key check to avoid any overhead
> when its disabled.
>
> The sampling period can changed at runtime using steal_mon/sampling_period.
> By default is 1000 milliseconds. I.e. 1 second
>
> This work is done by first active housekeeping CPU only. Hence it won't
> need any complicated synchronization.
>
> Now, that sched_steal_mon_enabled() is available which is a static branch,
> add this to hotpath such as wakeup and load balance.
> This will make them effectively nop when the feature is disabled.
>
> Signed-off-by: Shrikanth Hegde
> ---
> v3->v4:
> - Add static key check in hotpaths. Could be split into a separate
>   patch. Let me know if thats better.
>
>  include/linux/sched.h |  2 ++
>  kernel/sched/core.c   | 28 +++++++++++++++++++++++++++-
>  kernel/sched/debug.c  |  1 +
>  kernel/sched/fair.c   |  3 ++-
>  kernel/sched/sched.h  | 10 +++++++++-
>  5 files changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ce6bc8a22eb1..5b15353ed7ef 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2527,5 +2527,7 @@ struct steal_monitor_t {
>  	unsigned int high_threshold;
>  	unsigned int sampling_period_ms;
>  };
> +
> +extern struct steal_monitor_t steal_mon;
>  #endif
>  #endif
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cc48632dd42d..f1a91021e357 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5793,7 +5793,7 @@ void sched_tick(void)
>  	unsigned long hw_pressure;
>  	u64 resched_latency;
>
> -	if (!cpu_preferred(cpu))
> +	if (sched_steal_mon_enabled() && !cpu_preferred(cpu))
>  		sched_push_current_non_preferred_cpu(rq);

It looks like a CPU can be non-preferred only while the steal monitor is
enabled. To implement that properly, you need to mark all active CPUs as
preferred when the steal monitor is disabled. That way you don't need to
complicate the condition here.

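To illustrate the invariant I have in mind, here is an untested userspace
model, not kernel code. Every name in it (active_mask, preferred_mask,
steal_mon_set_enabled() and so on) is made up for the sketch; the 64-bit
masks only stand in for cpu_active_mask and __cpu_preferred_mask, and it
ignores hotplug and any concurrency. The point is that if disabling always
resets preferred to active, the tick-path check needs no enable test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for cpu_active_mask and __cpu_preferred_mask. */
static uint64_t active_mask;
static uint64_t preferred_mask;
static bool steal_mon_on;

static bool cpu_preferred_model(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (preferred_mask & (UINT64_C(1) << cpu)) != 0;
}

/* Only the monitor clears preferred bits, and only while it is enabled. */
static void steal_mon_mark_non_preferred(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	if (steal_mon_on)
		preferred_mask &= ~(UINT64_C(1) << cpu);
}

static void steal_mon_set_enabled(bool on)
{
	steal_mon_on = on;
	/* Models cpumask_copy(preferred, active) on the disable path. */
	if (!on)
		preferred_mask = active_mask;
}

/* Tick-path check: no "is the monitor enabled?" test is needed here. */
static bool tick_should_push(int cpu)
{
	return !cpu_preferred_model(cpu);
}
```

With this invariant, tick_should_push() can only return true for an active
CPU while the monitor is enabled, which is the behaviour the extra
sched_steal_mon_enabled() check is trying to get. Whether the static branch
is still wanted purely for hot-path cost is a separate question.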
>
>  	if (housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE))
> @@ -5834,6 +5834,9 @@ void sched_tick(void)
>  		rq->idle_balance = idle_cpu(cpu);
>  		sched_balance_trigger(rq);
>  	}
> +
> +	if (sched_steal_mon_enabled())
> +		sched_trigger_steal_computation(cpu);
>  }
>
>  #ifdef CONFIG_NO_HZ_FULL
> @@ -11407,4 +11410,27 @@ void sched_steal_detection_work(struct work_struct *work)
>  	now = ktime_get();
>  	sm->prev_time = now;
>  }
> +
> +void sched_trigger_steal_computation(int cpu)
> +{
> +	int first_hk_cpu = cpumask_first_and(housekeeping_cpumask(HK_TYPE_KERNEL_NOISE),
> +					     cpu_active_mask);
> +	ktime_t now;
> +
> +	/* Done by first active housekeeping CPU only */
> +	if (likely(cpu != first_hk_cpu))
> +		return;
> +
> +	/*
> +	 * Since everything is updated by first housekeeping CPU,
> +	 * There is no need for complex syncronization.
> +	 */
> +	now = ktime_get();
> +
> +	/* Default is once per second */
> +	if (likely(ktime_ms_delta(now, steal_mon.prev_time) < steal_mon.sampling_period_ms))
> +		return;
> +
> +	schedule_work_on(first_hk_cpu, &steal_mon.work);

I think there should be a better way to run a work item at a regular
interval. Maybe steal_mon.work could reschedule itself? It would be
scheduled for the first time when the steal monitor is enabled, and after
that it would just re-arm itself. That way you avoid polluting
sched_tick().

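Roughly what I mean, as an untested userspace model with a simulated
clock. All names here (model_dwork, model_schedule(), mon_enable() and
friends) are invented for the sketch and are not from the patch. In the
kernel I would expect this to map onto a struct delayed_work armed with
schedule_delayed_work_on() and stopped with cancel_delayed_work_sync(),
but that mapping is my assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a single delayed work item. */
struct model_dwork {
	bool pending;
	uint64_t expires_ms;
	void (*fn)(struct model_dwork *w);
};

static uint64_t now_ms;			/* simulated clock */
static bool mon_enabled;
static unsigned int sampling_period_ms = 1000;
static unsigned int samples_taken;

static void model_schedule(struct model_dwork *w, unsigned int delay_ms)
{
	assert(delay_ms > 0);		/* a zero period would spin forever */
	w->expires_ms = now_ms + delay_ms;
	w->pending = true;
}

/* The work samples once, then re-arms itself while the monitor is on. */
static void steal_work_fn(struct model_dwork *w)
{
	samples_taken++;
	if (mon_enabled)
		model_schedule(w, sampling_period_ms);
}

static struct model_dwork steal_work = { .fn = steal_work_fn };

static void mon_enable(void)
{
	mon_enabled = true;
	model_schedule(&steal_work, sampling_period_ms);
}

static void mon_disable(void)
{
	mon_enabled = false;
	steal_work.pending = false;	/* models cancelling the work */
}

/* Advance the simulated clock and run the work each time it expires. */
static void model_advance(uint64_t ms)
{
	uint64_t end = now_ms + ms;

	while (steal_work.pending && steal_work.expires_ms <= end) {
		now_ms = steal_work.expires_ms;
		steal_work.pending = false;
		steal_work.fn(&steal_work);
	}
	now_ms = end;
}
```

Nothing in the tick path is involved: the enable path arms the work once,
the work keeps itself going, and the disable path cancels it. A runtime
change of the sampling period would take effect at the next re-arm.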
> +}
>  #endif
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 2d62858f9cc0..55b8beb42574 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -649,6 +649,7 @@ static ssize_t sched_sm_en_write(struct file *filp, const char __user *ubuf,
>  		static_branch_enable(&__sched_sm_enable);
>  	} else if (!sched_sm_wr_enable && orig) {
>  		static_branch_disable(&__sched_sm_enable);
> +		cancel_work_sync(&steal_mon.work);
>  		cpumask_copy(&__cpu_preferred_mask, cpu_active_mask);
>  	}
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3f3c7f0ca489..b02a414ffaae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13292,7 +13292,8 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>  	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>
>  	/* Spread load among preferred CPUs */
> -	cpumask_and(cpus, cpus, cpu_preferred_mask);
> +	if (sched_steal_mon_enabled())
> +		cpumask_and(cpus, cpus, cpu_preferred_mask);

Again, if you do cpumask_copy(preferred, active) when the steal monitor is
disabled, you don't need to complicate the core logic here and there.

>
>  	schedstat_inc(sd->lb_count[idle]);
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 984da3827f19..f3814099cc0b 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1060,6 +1060,7 @@ struct root_domain {
>  	struct perf_domain __rcu *pd;
>  };
>
> +static inline bool sched_steal_mon_enabled(void);
>  extern void init_defrootdomain(void);
>  extern int sched_init_domains(const struct cpumask *cpu_map);
>  extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
> @@ -1436,7 +1437,7 @@ static inline bool available_idle_cpu(int cpu)
>  	if (!idle_rq(cpu_rq(cpu)))
>  		return 0;
>
> -	if (!cpu_preferred(cpu))
> +	if (sched_steal_mon_enabled() && !cpu_preferred(cpu))
>  		return 0;
>
>  	if (vcpu_is_preempted(cpu))
> @@ -4243,8 +4244,15 @@ DECLARE_STATIC_KEY_FALSE(__sched_sm_enable);
>  void sched_init_steal_monitor(void);
>  void sched_steal_detection_work(struct work_struct *work);
>  void sched_push_current_non_preferred_cpu(struct rq *rq);
> +void sched_trigger_steal_computation(int cpu);
> +static inline bool sched_steal_mon_enabled(void)
> +{
> +	return static_branch_unlikely(&__sched_sm_enable);
> +}
>  #else /* !CONFIG_PREFERRED_CPU */
>  static inline void sched_push_current_non_preferred_cpu(struct rq *rq) { }
>  static inline void sched_init_steal_monitor(void) { }
> +static inline void sched_trigger_steal_computation(int cpu) { }
> +static inline bool sched_steal_mon_enabled(void) { return false; }
>  #endif
>  #endif /* _KERNEL_SCHED_SCHED_H */
> --
> 2.47.3