public inbox for kvm@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Shivam Kumar <shivam.kumar1@nutanix.com>
Cc: maz@kernel.org, pbonzini@redhat.com, james.morse@arm.com,
	 suzuki.poulose@arm.com, oliver.upton@linux.dev,
	yuzenghui@huawei.com,  catalin.marinas@arm.com,
	aravind.retnakaran@nutanix.com,  carl.waldspurger@nutanix.com,
	david.vrabel@nutanix.com, david@redhat.com,  will@kernel.org,
	kvm@vger.kernel.org,  Shaju Abraham <shaju.abraham@nutanix.com>,
	Manish Mishra <manish.mishra@nutanix.com>,
	 Anurag Madnawat <anurag.madnawat@nutanix.com>
Subject: Re: [PATCH v10 2/3] KVM: x86: Dirty quota-based throttling of vcpus
Date: Tue, 16 Apr 2024 10:44:18 -0700	[thread overview]
Message-ID: <Zh648kuOwZMucG0h@google.com> (raw)
In-Reply-To: <20240221195125.102479-3-shivam.kumar1@nutanix.com>

On Wed, Feb 21, 2024, Shivam Kumar wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2d6cdeab1f8a..fa0b3853ee31 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3397,8 +3397,12 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
>  	if (!try_cmpxchg64(sptep, &old_spte, new_spte))
>  		return false;
>  
> -	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
> +	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
> +		struct kvm_mmu_page *sp = sptep_to_sp(sptep);
> +
> +		update_dirty_quota(vcpu->kvm, (1L << SPTE_LEVEL_SHIFT(sp->role.level)));
>  		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);

Forcing KVM to manually call update_dirty_quota() whenever mark_page_dirty_in_slot()
is invoked is not maintainable, as we inevitably will forget to update the quota
and probably not notice.  We've already had bugs escape where KVM fails to mark
gfns dirty, and those flows are much more testable.

Stepping back, I feel like this series has gone off the rails a bit.
 
I understand Marc's objections to the uAPI not differentiating between page sizes,
but simply updating the quota based on KVM's page size is also flawed.  E.g. if
the guest is backed with 1GiB pages, odds are very good that the dirty quotas are
going to be completely out of whack due to the first vCPU that writes a given 1GiB
region being charged with the entire 1GiB page.

And without a way to trigger detection of writes, e.g. by enabling PML or write-
protecting memory, I don't see how userspace can build anything on the "bytes
dirtied" information.

In v7[*], Marc was specifically objecting to the proposed API being presented as
a general purpose API when in reality the API was heavily reliant on dirty
logging being enabled.

 : My earlier comments still stand: the proposed API is not usable as a
 : general purpose memory-tracking API because it counts faults instead
 : of memory, making it inadequate except for the most trivial cases.
 : And I cannot believe you were serious when you mentioned that you were
 : happy to make that the API.

To avoid going in circles, I think we need to first agree on the scope of the uAPI.
Specifically, do we want to shoot for a generic write-tracking API, or do we want
something that is explicitly tied to dirty logging?


Marc,

If we figured out a clean-ish way to tie the "gfns dirtied" information to
dirty logging, i.e. didn't misconstrue the counts as generally useful data, would
that be acceptable?  While I like the idea of a generic solution, I don't see a
path to an implementation that isn't deeply flawed without basically doing dirty
logging, i.e. without forcing the use of non-huge pages and write-protecting memory
to intercept "new" writes based on input from userspace.

[*] https://lore.kernel.org/all/20221113170507.208810-2-shivam.kumar1@nutanix.com


Thread overview: 14+ messages
2024-02-21 19:51 [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
2024-02-21 19:51 ` [PATCH v10 1/3] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
2024-02-22  2:00   ` Anish Moorthy
2024-04-16 16:52     ` Sean Christopherson
2024-04-16 16:59   ` Sean Christopherson
2024-04-18 10:36     ` Shivam Kumar
2024-02-21 19:51 ` [PATCH v10 2/3] KVM: x86: Dirty " Shivam Kumar
2024-04-16 17:44   ` Sean Christopherson [this message]
2024-02-21 19:51 ` [PATCH v10 3/3] KVM: arm64: " Shivam Kumar
2024-03-21  5:48 ` [PATCH v10 0/3] Per-vCPU dirty quota-based throttling Shivam Kumar
2024-04-04  9:19   ` Marc Zyngier
2024-04-18 10:46     ` Shivam Kumar
2024-04-16 17:44 ` Sean Christopherson
2024-04-18 10:42   ` Shivam Kumar
