From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Cc: sshegde@linux.ibm.com, mingo@redhat.com, peterz@infradead.org,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
tglx@linutronix.de, yury.norov@gmail.com, maddy@linux.ibm.com,
srikar@linux.ibm.com, gregkh@linuxfoundation.org,
pbonzini@redhat.com, seanjc@google.com, kprateek.nayak@amd.com,
vschneid@redhat.com, iii@linux.ibm.com, huschle@linux.ibm.com,
rostedt@goodmis.org, dietmar.eggemann@arm.com,
christophe.leroy@csgroup.eu
Subject: [RFC PATCH v4 15/17] powerpc: add debugfs file for controlling handling on steal values
Date: Wed, 19 Nov 2025 11:50:58 +0530
Message-ID: <20251119062100.1112520-16-sshegde@linux.ibm.com>
In-Reply-To: <20251119062100.1112520-1-sshegde@linux.ibm.com>
Since the low and high thresholds for steal time can vary from system
to system, make these values tunable.
Values are to be given as the expected percentage * 100, i.e. if one
wants 8% steal time to count as high, one should specify 800 as the
high threshold. The same computation holds for the low threshold.
Provide one more tunable to control how often the steal time computation
is done. By default it is 1 second. If one thinks that's too aggressive,
one can increase it. The max value is 10 seconds, since one should act
relatively fast based on steal values.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
arch/powerpc/platforms/pseries/lpar.c | 94 ++++++++++++++++++++++++---
1 file changed, 86 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index c16d97e1a1fe..090e5c48243b 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -662,7 +662,8 @@ machine_device_initcall(pseries, vcpudispatch_stats_procfs_init);
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
#define STEAL_MULTIPLE 10000
-#define PURR_UPDATE_TB NSEC_PER_SEC
+static int steal_check_freq = 1;
+#define PURR_UPDATE_TB (steal_check_freq * NSEC_PER_SEC)
static bool should_cpu_process_steal(int cpu)
{
@@ -2106,9 +2107,6 @@ void pseries_init_ec_vp_cores(void)
available_cores = max(entitled_cores, virtual_procs);
}
-#define STEAL_RATIO_HIGH 400
-#define STEAL_RATIO_LOW 150
-
/*
* [0]<----------->[EC]---->{AC}-->[VP]
* EC == Entitled Cores. Guaranteed number of cores by hypervsior.
@@ -2120,6 +2118,9 @@ void pseries_init_ec_vp_cores(void)
* If steal time is low, increase Available Cores
*/
+static unsigned int steal_ratio_high = 400;
+static unsigned int steal_ratio_low = 150;
+
void update_soft_entitlement(unsigned long steal_ratio)
{
static int prev_direction;
@@ -2128,7 +2129,7 @@ void update_soft_entitlement(unsigned long steal_ratio)
if (!entitled_cores)
return;
- if (steal_ratio >= STEAL_RATIO_HIGH && prev_direction > 0) {
+ if (steal_ratio >= steal_ratio_high && prev_direction > 0) {
/*
* System entitlement was reduced earlier but we continue to
* see steal time. Reduce entitlement further.
@@ -2145,7 +2146,7 @@ void update_soft_entitlement(unsigned long steal_ratio)
}
available_cores--;
- } else if (steal_ratio <= STEAL_RATIO_LOW && prev_direction < 0) {
+ } else if (steal_ratio <= steal_ratio_low && prev_direction < 0) {
/*
* System entitlement was increased but we continue to see
* less steal time. Increase entitlement further.
@@ -2160,13 +2161,90 @@ void update_soft_entitlement(unsigned long steal_ratio)
available_cores++;
}
- if (steal_ratio >= STEAL_RATIO_HIGH)
+ if (steal_ratio >= steal_ratio_high)
prev_direction = 1;
- else if (steal_ratio <= STEAL_RATIO_LOW)
+ else if (steal_ratio <= steal_ratio_low)
prev_direction = -1;
else
prev_direction = 0;
}
+
+/*
+ * Any value above this set threshold will reduce the available cores
+ * Value can't be more than 100% and can't be less than low threshold value
+ * Specifying 500 means 5% steal time
+ */
+
+static int pv_steal_ratio_high_set(void *data, u64 val)
+{
+	if (val > 10000 || val < steal_ratio_low)
+		return -EINVAL;
+
+	steal_ratio_high = val;
+	return 0;
+}
+
+static int pv_steal_ratio_high_get(void *data, u64 *val)
+{
+	*val = steal_ratio_high;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_pv_steal_ratio_high, pv_steal_ratio_high_get,
+			pv_steal_ratio_high_set, "%llu\n");
+
+static int pv_steal_ratio_low_set(void *data, u64 val)
+{
+	if (val < 1 || val > steal_ratio_high)
+		return -EINVAL;
+
+	steal_ratio_low = val;
+	return 0;
+}
+
+static int pv_steal_ratio_low_get(void *data, u64 *val)
+{
+	*val = steal_ratio_low;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_pv_steal_ratio_low, pv_steal_ratio_low_get,
+			pv_steal_ratio_low_set, "%llu\n");
+
+static int pv_steal_check_freq_set(void *data, u64 val)
+{
+	if (val < 1 || val > 10)
+		return -EINVAL;
+
+	steal_check_freq = val;
+	return 0;
+}
+
+static int pv_steal_check_freq_get(void *data, u64 *val)
+{
+	*val = steal_check_freq;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_pv_steal_check_freq, pv_steal_check_freq_get,
+			pv_steal_check_freq_set, "%llu\n");
+
+static int __init steal_debugfs_init(void)
+{
+	if (!is_shared_processor() || is_kvm_guest())
+		return 0;
+
+	debugfs_create_file("steal_ratio_high", 0600, arch_debugfs_dir,
+			    NULL, &fops_pv_steal_ratio_high);
+	debugfs_create_file("steal_ratio_low", 0600, arch_debugfs_dir,
+			    NULL, &fops_pv_steal_ratio_low);
+	debugfs_create_file("steal_check_frequency", 0600, arch_debugfs_dir,
+			    NULL, &fops_pv_steal_check_freq);
+
+	return 0;
+}
+
+machine_arch_initcall(pseries, steal_debugfs_init);
#else
void pseries_init_ec_vp_cores(void) { return; }
void update_soft_entitlement(unsigned long steal_ratio) { return; }
--
2.47.3