LinuxPPC-Dev Archive on lore.kernel.org
From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Cc: sshegde@linux.ibm.com, mingo@redhat.com, peterz@infradead.org,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	tglx@linutronix.de, yury.norov@gmail.com, maddy@linux.ibm.com,
	srikar@linux.ibm.com, gregkh@linuxfoundation.org,
	pbonzini@redhat.com, seanjc@google.com, kprateek.nayak@amd.com,
	vschneid@redhat.com, iii@linux.ibm.com, huschle@linux.ibm.com,
	rostedt@goodmis.org, dietmar.eggemann@arm.com,
	christophe.leroy@csgroup.eu
Subject: [RFC PATCH v4 14/17] powerpc: process steal values at fixed intervals
Date: Wed, 19 Nov 2025 11:50:57 +0530
Message-ID: <20251119062100.1112520-15-sshegde@linux.ibm.com>
In-Reply-To: <20251119062100.1112520-1-sshegde@linux.ibm.com>

Process steal time at regular intervals (at most once per second). The
steal time summed across all online vCPUs is divided by the elapsed time,
scaled by the number of online CPUs, to get the steal ratio.

Only the first online CPU does this work, which reduces races. This is
done only on SPLPAR (non-KVM guest) and assumes PowerVM is the
hypervisor.
 
Originally-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 59 +++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 825b5b4e2b43..c16d97e1a1fe 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -660,10 +660,58 @@ static int __init vcpudispatch_stats_procfs_init(void)
 machine_device_initcall(pseries, vcpudispatch_stats_procfs_init);
 
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+
+#define STEAL_MULTIPLE 10000
+#define PURR_UPDATE_TB NSEC_PER_SEC
+
+static bool should_cpu_process_steal(int cpu)
+{
+	if (cpu == cpumask_first(cpu_online_mask))
+		return true;
+
+	return false;
+}
+
+static void process_steal(int cpu)
+{
+	static unsigned long next_tb_ns, prev_steal;
+	unsigned long steal_ratio, delta_tb;
+	unsigned long tb_ns = tb_to_ns(mftb());
+	unsigned long steal = 0;
+	unsigned int i;
+
+	if (!should_cpu_process_steal(cpu))
+		return;
+
+	if (tb_ns < next_tb_ns)
+		return;
+
+	for_each_online_cpu(i) {
+		struct lppaca *lppaca = &lppaca_of(i);
+
+		steal += be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb));
+		steal += be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb));
+	}
+
+	steal = tb_to_ns(steal);
+
+	if (next_tb_ns && prev_steal) {
+		delta_tb = max(tb_ns - (next_tb_ns - PURR_UPDATE_TB), 1);
+		steal_ratio = (steal - prev_steal) * STEAL_MULTIPLE;
+		steal_ratio /= (delta_tb * num_online_cpus());
+		update_soft_entitlement(steal_ratio);
+	}
+
+	next_tb_ns = tb_ns + PURR_UPDATE_TB;
+	prev_steal = steal;
+}
+
 u64 pseries_paravirt_steal_clock(int cpu)
 {
 	struct lppaca *lppaca = &lppaca_of(cpu);
 
+	if (is_shared_processor() && !is_kvm_guest())
+		process_steal(cpu);
 	/*
 	 * VPA steal time counters are reported at TB frequency. Hence do a
 	 * conversion to ns before returning
@@ -2061,6 +2109,17 @@ void pseries_init_ec_vp_cores(void)
 #define STEAL_RATIO_HIGH 400
 #define STEAL_RATIO_LOW  150
 
+/*
+ * [0]<----------->[EC]---->{AC}-->[VP]
+ * EC == Entitled Cores. Number of cores guaranteed by the hypervisor.
+ * VP == Virtual Processors. Total number of cores. When there is overcommit,
+ * this will be higher than EC.
+ * AC == Available Cores. Varies between EC <-> VP.
+ *
+ * If steal time is high, reduce Available Cores.
+ * If steal time is low, increase Available Cores.
+ */
+
 void update_soft_entitlement(unsigned long steal_ratio)
 {
 	static int prev_direction;
-- 
2.47.3



Thread overview: 25+ messages
2025-11-19  6:20 [RFC PATCH v4 00/17] Paravirt CPUs and push task for less vCPU preemption Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 01/17] sched/docs: Document cpu_paravirt_mask and Paravirt CPU concept Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 02/17] cpumask: Introduce cpu_paravirt_mask Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 03/17] sched/core: Dont allow to use CPU marked as paravirt Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 04/17] sched/debug: Remove unused schedstats Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v5 05/17] sched/fair: Add paravirt movements for proc sched file Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 06/17] sched/fair: Pass current cpu in select_idle_sibling Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 07/17] sched/fair: Don't consider paravirt CPUs for wakeup and load balance Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 08/17] sched/rt: Don't select paravirt CPU for wakeup and push/pull rt task Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 09/17] sched/core: Add support for nohz_full CPUs Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 10/17] sched/core: Push current task from paravirt CPU Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 11/17] sysfs: Add paravirt CPU file Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 12/17] powerpc: method to initialize ec and vp cores Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 13/17] powerpc: enable/disable paravirt CPUs based on steal time Shrikanth Hegde
2025-11-19  6:20 ` Shrikanth Hegde [this message]
2025-11-19  6:20 ` [RFC PATCH v4 15/17] powerpc: add debugfs file for controlling handling on steal values Shrikanth Hegde
2025-11-19  6:20 ` [HELPER PATCH 1] sysfs: Provide write method for paravirt Shrikanth Hegde
2025-11-19  7:42   ` Greg KH
2025-11-19  8:08     ` Shrikanth Hegde
2025-11-19  8:20       ` Christophe Leroy
2025-11-19 10:01         ` Shrikanth Hegde
2025-11-19  8:23       ` Greg KH
2025-11-19  9:56         ` Shrikanth Hegde
2025-11-19  6:21 ` [HELPER PATCH 2] helper: disable arch handling if paravirt file being written Shrikanth Hegde
2025-11-19 12:53 ` [RFC PATCH v4 00/17] Paravirt CPUs and push task for less vCPU preemption Shrikanth Hegde
