Re: [PATCH 00/17] Paravirt CPUs and push task for less vCPU preemption

linuxppc-dev.lists.ozlabs.org archive mirror

From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, tglx@linutronix.de,
	yury.norov@gmail.com, maddy@linux.ibm.com, srikar@linux.ibm.com,
	gregkh@linuxfoundation.org, pbonzini@redhat.com,
	seanjc@google.com, kprateek.nayak@amd.com, vschneid@redhat.com,
	iii@linux.ibm.com, huschle@linux.ibm.com, rostedt@goodmis.org,
	dietmar.eggemann@arm.com, christophe.leroy@csgroup.eu
Subject: Re: [PATCH 00/17] Paravirt CPUs and push task for less vCPU preemption
Date: Thu, 27 Nov 2025 16:14:56 +0530
Message-ID: <52811199-742a-40fc-8bf2-9cb4397afda4@linux.ibm.com>
In-Reply-To: <20251119124449.1149616-1-sshegde@linux.ibm.com>



On 11/19/25 6:14 PM, Shrikanth Hegde wrote:
> Detailed problem statement and some of the implementation choices were
> discussed earlier[1].


Performance data on x86 and PowerPC:

++++++++++++++++++++++++++++++++++++++++++++++++
PowerPC: LPAR(VM) running on PowerVM hypervisor
++++++++++++++++++++++++++++++++++++++++++++++++

Host: 126 cores available in pool.
VM1: 96VP/64EC - 768 CPUs
VM2: 72VP/48EC - 576 CPUs
(VP = Virtual Processor cores, EC = Entitled Cores)
steal_check_frequency:1
steal_ratio_high:400
steal_ratio_low:150
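The three tunables above suggest a high/low threshold (hysteresis) scheme. The sketch below is a hypothetical illustration only, not the code in the series. It assumes that the steal ratio is sampled once per steal_check_frequency interval, that it is compared directly against steal_ratio_high/steal_ratio_low in the same units, that one core is marked paravirt per interval above the high threshold and one is unmarked below the low threshold, and that a floor of usable cores is kept (assumed here to be the entitled cores, 64 for VM1). Since marking is per core, each step moves threads_per_core CPUs: 768 CPUs / 96 VP = 8 for VM1 (576 / 72 = 8 for VM2). All struct and function names are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical controller state; these names are illustrative and are
 * not taken from the patch series.
 */
struct steal_ctl {
	unsigned int ratio_high;	/* e.g. steal_ratio_high:400 */
	unsigned int ratio_low;		/* e.g. steal_ratio_low:150 */
	unsigned int total_cores;	/* VP cores in the VM, e.g. 96 */
	unsigned int min_cores;		/* assumed floor of usable cores, e.g. 64 EC */
	unsigned int paravirt_cores;	/* cores currently marked paravirt */
};

/*
 * Assumed to run once per steal_check_frequency interval with the latest
 * steal ratio. Between ratio_low and ratio_high nothing changes (hysteresis),
 * so the mask does not flap when steal hovers around a single threshold.
 * Returns the updated number of paravirt cores.
 */
static unsigned int steal_ctl_update(struct steal_ctl *c, unsigned int steal_ratio)
{
	assert(c->ratio_low < c->ratio_high);

	if (steal_ratio > c->ratio_high &&
	    c->total_cores - c->paravirt_cores > c->min_cores)
		c->paravirt_cores++;	/* high steal: take one more core away */
	else if (steal_ratio < c->ratio_low && c->paravirt_cores > 0)
		c->paravirt_cores--;	/* low steal: give one core back */

	return c->paravirt_cores;
}

/* Marking is per core, so the CPU count moves in SMT-sized steps. */
static unsigned int paravirt_cpus(const struct steal_ctl *c,
				  unsigned int threads_per_core)
{
	return c->paravirt_cores * threads_per_core;
}
```

With VM1's numbers and the assumed 64-core floor, this sketch would mark at most 32 cores (256 CPUs) paravirt. The gap between the two thresholds is the design point: a single threshold would toggle cores on every sample whose ratio lands near it.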

Scenarios:
Scenario 1: (Major improvement)
VM1 is running daytrader[1] and VM2 is running stress-ng --cpu=$(nproc)
Note: Large gains. With upstream, steal time was around 15%; with the series it drops
to about 3%. Further tuning could reduce it more.

                      upstream   +series
daytrader throughput        1x      1.7x   <<- 70% gain

-----------
Scenario 2: (improves when thread_count < num_cpus)
VM1 is running schbench and VM2 is running stress-ng --cpu=$(nproc)
Note: Values are averages of 5 runs. Percentiles are wakeup latencies in usec (lower is better).

schbench -t 400   upstream   +series
50.0th:              18.00     16.60
90.0th:             174.00     46.80
99.0th:            3197.60    928.80
99.9th:            6203.20   4539.20
average rps:      39665.61  42334.65

schbench -t 600   upstream   +series
50.0th:              23.80     19.80
90.0th:             917.20    439.00
99.0th:            5582.40   3869.60
99.9th:            8982.40   6574.40
average rps:      39541.00  40018.11

-----------
Scenario 3: (Improves, except Process(Pipe) 40 and 50 groups)
VM1 is running hackbench and VM2 is running stress-ng --cpu=$(nproc)
Note: Values are averages of 10 runs with 20000 loops (time in seconds, lower is better).

hackbench                 upstream   +series
Process 10 groups             2.84      2.62
Process 20 groups             5.39      4.48
Process 30 groups             7.51      6.29
Process 40 groups             9.88      7.42
Process 50 groups            12.46      9.54
Process 60 groups            14.76     12.09
thread  10 groups             2.93      2.70
thread  20 groups             5.79      4.78
Process(Pipe) 10 groups       2.31      2.18
Process(Pipe) 20 groups       3.32      3.26
Process(Pipe) 30 groups       4.19      4.14
Process(Pipe) 40 groups       5.18      5.53
Process(Pipe) 50 groups       6.57      6.80
Process(Pipe) 60 groups       8.21      8.13
thread(Pipe)  10 groups       2.42      2.24
thread(Pipe)  20 groups       3.62      3.42
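To make the hackbench rows easier to compare, the relative change per row can be computed as (series - upstream) / upstream. Because hackbench reports time, a negative value means the series is faster. This small helper is only an arithmetic aid; the rows are copied from the table above, and the helper and struct names are illustrative.

```c
#include <stdio.h>

/* Relative change in percent; for hackbench time, negative = improvement. */
static double pct_change(double upstream, double series)
{
	return (series - upstream) / upstream * 100.0;
}

struct row {
	const char *name;
	double upstream;	/* seconds */
	double series;		/* seconds */
};

/* A few rows from the hackbench table: two gains, two regressions. */
static const struct row rows[] = {
	{ "Process 40 groups",        9.88, 7.42 },
	{ "Process 50 groups",       12.46, 9.54 },
	{ "Process(Pipe) 40 groups",  5.18, 5.53 },
	{ "Process(Pipe) 50 groups",  6.57, 6.80 },
};

static void print_changes(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-24s %+6.1f%%\n", rows[i].name,
		       pct_change(rows[i].upstream, rows[i].series));
}
```

This gives about -24.9% for Process 40 groups and -23.4% for Process 50 groups, versus +6.8% and +3.5% for the Process(Pipe) 40 and 50 group rows. The same formula applied to daytrader (1x to 1.7x) gives the 70% gain quoted in Scenario 1.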

-----------
Notes:

These numbers may be overly favorable: VM2 runs constantly and has some CPUs marked
paravirt whenever there is steal time, and the chosen thresholds may also have played a role.
I plan to run the same workloads (i.e. hackbench and schbench) on both VMs and compare the behavior.

VM1's CPUs are distributed equally across nodes, while VM2's are not. Since CPUs are marked
paravirt based on core count, some nodes on VM2 may have been left unused, which could have
given VM1 a boost, especially for daytrader.

[1]: Daytrader is a real-world benchmark that simulates stock trading.
https://www.ibm.com/docs/en/linux-on-systems?topic=descriptions-daytrader-benchmark-application
https://cwiki.apache.org/confluence/display/GMOxDOC12/Daytrader

TODO: Get numbers with very high concurrency of hackbench/schbench.

+++++++++++++++++++++++++++++++++
x86_64: Laptop running KVM guests
+++++++++++++++++++++++++++++++++
Host: 8 CPUs.
Two VMs, each started with -smp 8.
-----------
Scenario 1:
Both VMs are running hackbench with 10 process groups and 10000 loops.
Values are averages of 3 runs. Steal time close to 50% was seen with
upstream, so CPUs 4-7 were marked paravirt by writing to the sysfs file.
Since the laptop has many host tasks running, there will still be some steal time.
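The string "4-7" follows the usual kernel cpulist syntax (comma-separated CPUs and ranges, as read from files such as /sys/devices/system/cpu/online). Assuming the paravirt sysfs file accepts the same format (its exact path is not given here), the hypothetical user-space parser below shows how such a string maps to a CPU bitmask. It is an illustration limited to 64 CPUs, not the kernel's cpulist_parse().

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a cpulist such as "4-7" or "0,2,4-7" into a
 * 64-bit mask. Returns 0 on success, -1 on malformed input or CPU >= 64.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo = strtoul(s, &end, 10);
		unsigned long hi = lo;

		if (end == s)
			return -1;		/* expected a CPU number */
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo > hi || hi >= 64)
			return -1;		/* reversed range or out of bounds */
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			m |= UINT64_C(1) << cpu;
		s = end;
		if (*s == ',')
			s++;			/* next element */
		else if (*s && *s != '\n')
			return -1;		/* unexpected character */
	}
	*mask = m;
	return 0;
}
```

For example, "4-7" yields 0xF0 (CPUs 4, 5, 6 and 7), leaving CPUs 0-3 as the usable set in this experiment.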

hackbench 10 groups   upstream   +series (4-7 marked as paravirt)
time (seconds)              58     54.42

Note: Running with 5 groups also shows an improvement, but at very high concurrency (40 groups) the series regresses.

-----------
Scenario 2:
Both VMs are running schbench. Values are averages of 2 runs.
"schbench -t 4 -r 30 -i 30" (50th-99th percentile latencies improve, 99.9th regresses, and rps is slightly lower)

wakeup latencies  upstream   +series (4-7 marked as paravirt)
50.0th:               25.5      13.5
90.0th:               70.0      30.0
99.0th:             2588.0    1992.0
99.9th:             3844.0    6032.0
average rps:           338       326

"schbench -t 8 -r 30 -i 30" (major rps degradation)
wakeup latencies  upstream   +series (4-7 marked as paravirt)
50.0th:               15.0      11.5
90.0th:             1630.0    2844.0
99.0th:             4314.0    6624.0
99.9th:             8572.0   10896.0
average rps:           393     240.5

Higher thread counts also regress. The cause still needs investigation; it may be too many
context switches, since the number of threads is high relative to the CPUs available.
