linuxppc-dev.lists.ozlabs.org archive mirror
From: Sean Christopherson <seanjc@google.com>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	 vincent.guittot@linaro.org, tglx@linutronix.de,
	yury.norov@gmail.com,  maddy@linux.ibm.com,
	linux-kernel@vger.kernel.org,  linuxppc-dev@lists.ozlabs.org,
	gregkh@linuxfoundation.org,  vschneid@redhat.com,
	iii@linux.ibm.com, huschle@linux.ibm.com,  rostedt@goodmis.org,
	dietmar.eggemann@arm.com, vineeth@bitbyteword.org,
	 jgross@suse.com, pbonzini@redhat.com
Subject: Re: [RFC PATCH v3 00/10] paravirt CPUs and push task for less vCPU preemption
Date: Mon, 20 Oct 2025 07:32:56 -0700
Message-ID: <aPZIGCFk-Rnlc1yT@google.com>
In-Reply-To: <20250910174210.1969750-1-sshegde@linux.ibm.com>

On Wed, Sep 10, 2025, Shrikanth Hegde wrote:
> tl;dr
> 
> This is a follow-up of [1] with a few fixes, addressing review comments.
> Upgraded it from RFC to RFC PATCH.
> Please review.
> 
> [1]: v2 - https://lore.kernel.org/all/20250625191108.1646208-1-sshegde@linux.ibm.com/
> 
> v2 -> v3:
> - Renamed to paravirt CPUs

There are myriad uses of "paravirt" throughout Linux and related environments,
and none of them mean "oversubscribed" or "contended".  I assume Hillf's comments
triggered the rename from "avoid CPUs", but IMO "avoid" is at least somewhat
accurate; "paravirt" is wildly misleading.

> - Folded the changes under CONFIG_PARAVIRT.
> - Fixed the crash due to work_buf corruption while using
>   stop_one_cpu_nowait.
> - Added sysfs documentation.
> - Copied most of __balance_push_cpu_stop into a new function; this allows
>   moving the code out of CONFIG_HOTPLUG_CPU.
> - Moved some code as suggested.
> 
> -----------------
> ::Detailed info:: 
> -----------------
> Problem statement 
> 
> vCPU - Virtual CPUs - CPU in VM world.
> pCPU - Physical CPUs - CPU in baremetal world.
> 
> A hypervisor schedules vCPUs on pCPUs. It has to give each vCPU some
> cycles and be fair. When there are more vCPU requests than pCPUs, the
> hypervisor has to preempt some vCPUs in order to run others. This is
> called vCPU preemption.
> 
> Take two VMs. When the hypervisor preempts a vCPU of VM1 to run a vCPU of
> VM2, it has to save/restore the VM context. If instead the VMs coordinate
> with each other and request a limited number of vCPUs, this overhead is
> avoided and context switching happens within a vCPU (less expensive). Even
> when the hypervisor preempts one vCPU to run another within the same VM, it
> is still more expensive than task preemption within the vCPU. So the basic
> aim is to avoid vCPU preemption.
> 
> To achieve this, introduce the "Paravirt CPU" concept: vCPUs that the
> workload should preferably avoid at this moment. (The vCPUs stay online; we
> don't want the overhead of a sched domain rebuild, and hotplug takes a lot
> of time too.)
> 
> When there is contention, don't use paravirt CPUs.
> When there is no contention, use all vCPUs. 

...

> ------------
> Open issues: 
> 
> - Deriving the hint from steal time is still a challenge. Some work is
>   underway to address it.
> 
> - Consider KVM and other hypervisors and how they could derive the hint.
>   Need inputs from the community.

Bluntly, this series is never going to land, at least not in a form that's remotely
close to what is proposed here.  This is an incredibly simplistic way of handling
overcommit, and AFAICT there's no line of sight to supporting more complex scenarios.

I.e. I don't see a path to resolving all these "todos" in the changelog from the
last patch:

 : Ideally, the hint would come from the hypervisor. It would be more
 : accurate since it has knowledge of all SPLPARs deployed in the system.
 : 
 : Until the hint from the underlying hypervisor arrives, another idea is to
 : approximate the hint from steal time. There is some ongoing work, but it
 : is not there yet due to challenges around limits and convergence.
 : 
 : Until that happens, there is a need for a debugfs file which could be used
 : to set/unset the hint. The interface currently takes a number starting from
 : which CPUs will be marked as paravirt. It could be changed to one that takes
 : a cpumask (list of CPUs) in the future.

I see Vineeth and Steven are on the Cc.  Argh, and you even commented on their
first RFC[1], where it was made quite clear that sprinkling one-off "hints"
throughout the kernel wasn't a viable approach.

I don't know the current status of the ChromeOS work, but there was agreement in
principle that the bulk of paravirt scheduling should not need to touch the kernel
(host or guest)[2].

[1] https://lore.kernel.org/all/20231214024727.3503870-1-vineeth@bitbyteword.org
[2] https://lore.kernel.org/all/ZjJf27yn-vkdB32X@google.com


Thread overview: 33+ messages
2025-09-10 17:42 [RFC PATCH v3 00/10] paravirt CPUs and push task for less vCPU preemption Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 01/10] sched/docs: Document cpu_paravirt_mask and Paravirt CPU concept Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 02/10] cpumask: Introduce cpu_paravirt_mask Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 03/10] sched: Static key to check paravirt cpu push Shrikanth Hegde
2025-09-11  1:53   ` Yury Norov
2025-09-11 14:37     ` Shrikanth Hegde
2025-09-11 15:29       ` Yury Norov
2025-09-10 17:42 ` [RFC PATCH v3 04/10] sched/core: Dont allow to use CPU marked as paravirt Shrikanth Hegde
2025-09-11  5:16   ` K Prateek Nayak
2025-09-11 14:44     ` Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance Shrikanth Hegde
2025-09-11  5:23   ` K Prateek Nayak
2025-09-11 15:56     ` Shrikanth Hegde
2025-09-11 16:55       ` K Prateek Nayak
2025-11-08 12:04     ` Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 06/10] sched/rt: Don't select paravirt CPU for wakeup and push/pull rt task Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 07/10] sched/core: Push current task from paravirt CPU Shrikanth Hegde
2025-09-11  5:40   ` K Prateek Nayak
2025-09-11 16:52     ` Shrikanth Hegde
2025-09-11 17:06       ` K Prateek Nayak
2025-09-12  5:22         ` Shrikanth Hegde
2025-09-12  8:48           ` K Prateek Nayak
2025-09-12 12:49             ` Shrikanth Hegde
2025-11-10  4:54     ` Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 08/10] sysfs: Add paravirt CPU file Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 09/10] powerpc: Add debug file for set/unset paravirt CPUs Shrikanth Hegde
2025-09-10 17:42 ` [HELPER PATCH] sysfs: Provide write method for paravirt Shrikanth Hegde
2025-10-20 14:32 ` Sean Christopherson [this message]
2025-10-20 15:05   ` [RFC PATCH v3 00/10] paravirt CPUs and push task for less vCPU preemption Paolo Bonzini
2025-10-23  4:03     ` Shrikanth Hegde
2025-10-21  6:10   ` Shrikanth Hegde
2025-10-22 18:46     ` Sean Christopherson
2025-10-30 17:43       ` Shrikanth Hegde