From: Srikar Dronamraju <srikar@linux.ibm.com>
To: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Peter Zijlstra <peterz@infradead.org>
Cc: Ben Segall <bsegall@google.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Mel Gorman <mgorman@suse.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	Srikar Dronamraju <srikar@linux.ibm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Swapnil Sapkal <swapnil.sapkal@amd.com>,
	Thomas Huth <thuth@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	virtualization@lists.linux.dev,
	Yicong Yang <yangyicong@hisilicon.com>,
	Ilya Leoshkevich <iii@linux.ibm.com>
Subject: [PATCH 12/17] pseries/smp: Trigger softoffline based on steal metrics
Date: Thu,  4 Dec 2025 23:24:00 +0530
Message-ID: <20251204175405.1511340-13-srikar@linux.ibm.com>
In-Reply-To: <20251204175405.1511340-1-srikar@linux.ibm.com>

Based on the steal metrics, update the number of CPUs that need to be
soft onlined/offlined. If the LPAR continues to see steal above the
given higher threshold, then continue to offline more CPUs. This results
in more CPUs of the active cores being used, and the LPAR should see
less vCPU preemption; the steal metrics should then keep dropping over
the following intervals. If the LPAR continues to see steal below the
lower threshold, then continue to online more cores. To avoid ping-pong
behaviour, online/offline a core only if the steal trend is seen for at
least 2 intervals.
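
As a rough sketch (the actual implementation is trigger_softoffline()
in the diff below; act_offline_core() and act_online_core() here are
hypothetical stand-ins for the per-CPU work scheduling), the decision is
a small hysteresis step keyed on the previous interval's trend:

/* Illustrative sketch only, not part of the patch. */
static int prev_direction;

static void steal_decision(unsigned long steal_ratio)
{
	if (steal_ratio >= STEAL_RATIO_HIGH && prev_direction > 0)
		act_offline_core();	/* high steal two intervals in a row */
	else if (steal_ratio <= STEAL_RATIO_LOW && prev_direction < 0)
		act_online_core();	/* low steal two intervals in a row */

	/* Remember this interval's trend for the next decision. */
	if (steal_ratio >= STEAL_RATIO_HIGH)
		prev_direction = 1;
	else if (steal_ratio <= STEAL_RATIO_LOW)
		prev_direction = -1;
	else
		prev_direction = 0;
}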

A PowerVM environment schedules at a core granularity. Hence it is
preferable to soft online/offline an entire core. Onlining or offlining
only a few CPUs of a core would neither reduce steal nor use the
resources efficiently.

A Shared LPAR in a PowerVM environment will have cores interleaved
across multiple NUMA nodes. Hence choosing the last active core to
offline and the first inactive core to online will most likely keep
NUMA balanced. A more intelligent approach to selecting cores to
online/offline may be needed in the future.
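
A minimal sketch of that core selection, assuming the generic cpumask
helpers the patch itself uses (cpumask_last(), cpumask_first(),
cpumask_andnot(), cpu_sibling_mask()); act_on_cpu() and the scratch mask
are illustrative placeholders, and the real code additionally skips the
core the current CPU is running on:

/* Illustrative sketch only, not part of the patch. */
static void pick_core(bool offline, struct cpumask *scratch)
{
	int cpu, i;

	if (offline) {
		/*
		 * Offline the last active core; with cores interleaved
		 * across nodes this tends to keep NUMA balanced.
		 */
		cpu = cpumask_last(cpu_active_mask);
	} else {
		/* Online the first core that is online but inactive. */
		cpumask_andnot(scratch, cpu_online_mask, cpu_active_mask);
		if (cpumask_empty(scratch))
			return;
		cpu = cpumask_first(scratch);
	}

	/* Act on the whole core, i.e. every sibling thread of that CPU. */
	for_each_cpu(i, cpu_sibling_mask(cpu))
		act_on_cpu(i, offline);	/* hypothetical per-CPU action */
}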

Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c    |  3 --
 arch/powerpc/platforms/pseries/pseries.h |  3 ++
 arch/powerpc/platforms/pseries/smp.c     | 57 ++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index f8e049ac9364..f5caf1137707 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -662,9 +662,6 @@ machine_device_initcall(pseries, vcpudispatch_stats_procfs_init);
 #define STEAL_MULTIPLE (STEAL_RATIO * STEAL_RATIO)
 #define PURR_UPDATE_TB tb_ticks_per_sec
 
-static void trigger_softoffline(unsigned long steal_ratio)
-{
-}
 
 static bool should_cpu_process_steal(int cpu)
 {
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 68cf25152870..2527c2049e74 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -119,6 +119,9 @@ int dlpar_workqueue_init(void);
 
 extern u32 pseries_security_flavor;
 void pseries_setup_security_mitigations(void);
+#ifdef CONFIG_PPC_SPLPAR
+void trigger_softoffline(unsigned long steal_ratio);
+#endif
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
 void pseries_lpar_read_hblkrm_characteristics(void);
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index ec1af13670f2..4c83749018d0 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -51,6 +51,9 @@
  * interface by prom_hold_cpus and is spinning on secondary_hold_spinloop.
  */
 static cpumask_var_t of_spin_mask;
+#ifdef CONFIG_PPC_SPLPAR
+static cpumask_var_t cpus;
+#endif
 
 /* Query where a cpu is now.  Return codes #defined in plpar_wrappers.h */
 int smp_query_cpu_stopped(unsigned int pcpu)
@@ -277,6 +280,14 @@ static __init void pSeries_smp_probe(void)
 }
 
 #ifdef CONFIG_PPC_SPLPAR
+/*
+ * Upper threshold above which steal should be curtailed by soft
+ * offlining CPUs, and lower threshold below which work is allowed to
+ * spread out to more cores by soft onlining CPUs.
+ */
+#define STEAL_RATIO_HIGH (10 * STEAL_RATIO)
+#define STEAL_RATIO_LOW (5 * STEAL_RATIO)
+
 static unsigned int max_virtual_cores __read_mostly;
 static unsigned int entitled_cores __read_mostly;
 static unsigned int available_cores;
@@ -311,6 +322,49 @@ static unsigned int pseries_num_available_cores(void)
 
 	return available_cores;
 }
+
+void trigger_softoffline(unsigned long steal_ratio)
+{
+	int currcpu = smp_processor_id();
+	static int prev_direction;
+	int cpu, i;
+
+	if (steal_ratio >= STEAL_RATIO_HIGH && prev_direction > 0) {
+		/*
+		 * System entitlement was reduced earlier but we continue to
+		 * see steal time. Reduce entitlement further.
+		 */
+		cpu = cpumask_last(cpu_active_mask);
+		for_each_cpu_andnot(i, cpu_sibling_mask(cpu), cpu_sibling_mask(currcpu)) {
+			struct offline_worker *worker = &per_cpu(offline_workers, i);
+
+			worker->offline = 1;
+			schedule_work_on(i, &worker->work);
+		}
+	} else if (steal_ratio <= STEAL_RATIO_LOW && prev_direction < 0) {
+		/*
+		 * System entitlement was increased but we continue to see
+		 * less steal time. Increase entitlement further.
+		 */
+		cpumask_andnot(cpus, cpu_online_mask, cpu_active_mask);
+		if (cpumask_empty(cpus))
+			return;
+
+		cpu = cpumask_first(cpus);
+		for_each_cpu_andnot(i, cpu_sibling_mask(cpu), cpu_sibling_mask(currcpu)) {
+			struct offline_worker *worker = &per_cpu(offline_workers, i);
+
+			worker->offline = 0;
+			schedule_work_on(i, &worker->work);
+		}
+	}
+	if (steal_ratio >= STEAL_RATIO_HIGH)
+		prev_direction = 1;
+	else if (steal_ratio <= STEAL_RATIO_LOW)
+		prev_direction = -1;
+	else
+		prev_direction = 0;
+}
 #endif
 
 static struct smp_ops_t pseries_smp_ops = {
@@ -336,6 +390,9 @@ void __init smp_init_pseries(void)
 	smp_ops = &pseries_smp_ops;
 
 	alloc_bootmem_cpumask_var(&of_spin_mask);
+#ifdef CONFIG_PPC_SPLPAR
+	alloc_bootmem_cpumask_var(&cpus);
+#endif
 
 	/*
 	 * Mark threads which are still spinning in hold loops
-- 
2.43.7



Thread overview: 22+ messages
2025-12-04 17:53 [PATCH 00/17] Steal time based dynamic CPU resource management Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 01/17] sched/fair: Enable group_asym_packing in find_idlest_group Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 02/17] powerpc/lpar: Reorder steal accounting calculation Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 03/17] pseries/lpar: Process steal metrics Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 04/17] powerpc/smp: Add num_available_cores callback for smp_ops Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 05/17] pseries/smp: Query and set entitlements Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 06/17] powerpc/smp: Delay processing steal time at boot Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 07/17] sched/core: Set balance_callback only if CPU is dying Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 08/17] sched/core: Implement CPU soft offline/online Srikar Dronamraju
2025-12-05 16:03   ` Peter Zijlstra
2025-12-05 18:54     ` Srikar Dronamraju
2025-12-05 16:07   ` Peter Zijlstra
2025-12-05 18:57     ` Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 09/17] powerpc/smp: Implement arch_scale_cpu_capacity for shared LPARs Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 10/17] powerpc/smp: Define arch_update_cpu_topology " Srikar Dronamraju
2025-12-04 17:53 ` [PATCH 11/17] pseries/smp: Create soft offline infrastructure for Powerpc " Srikar Dronamraju
2025-12-04 17:54 ` Srikar Dronamraju [this message]
2025-12-04 17:54 ` [PATCH 13/17] pseries/smp: Account cores when triggering softoffline Srikar Dronamraju
2025-12-04 17:54 ` [PATCH 14/17] powerpc/smp: Assume preempt if CPU is inactive Srikar Dronamraju
2025-12-04 17:54 ` [PATCH 15/17] pseries/hotplug: Update available_cores on a dlpar event Srikar Dronamraju
2025-12-04 17:54 ` [PATCH 16/17] pseries/smp: Allow users to override steal thresholds Srikar Dronamraju
2025-12-04 17:54 ` [PATCH 17/17] pseries/lpar: Add debug interface to set steal interval Srikar Dronamraju
