All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Shivam Kumar <shivam.kumar1@nutanix.com>
Cc: pbonzini@redhat.com, seanjc@google.com, james.morse@arm.com,
	borntraeger@linux.ibm.com, david@redhat.com, kvm@vger.kernel.org,
	Shaju Abraham <shaju.abraham@nutanix.com>,
	Manish Mishra <manish.mishra@nutanix.com>,
	Anurag Madnawat <anurag.madnawat@nutanix.com>
Subject: Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus
Date: Tue, 22 Nov 2022 17:46:04 +0000	[thread overview]
Message-ID: <86mt8iopb7.wl-maz@kernel.org> (raw)
In-Reply-To: <18b66b42-0bb4-4b32-e92c-3dce61d8e6a4@nutanix.com>

On Fri, 18 Nov 2022 09:47:50 +0000,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> 
> 
> 
> On 18/11/22 12:56 am, Marc Zyngier wrote:
> > On Sun, 13 Nov 2022 17:05:06 +0000,
> > Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
> >> 
> >> +    count: the current count of pages dirtied by the VCPU, can be
> >> +    skewed based on the size of the pages accessed by each vCPU.
> > 
> > How can userspace make a decision on the amount of dirtying this
> > represent if this doesn't represent a number of base pages? Or are you
> > saying that this only counts the number of permission faults that have
> > dirtied pages?
> 
> Yes, this only counts the number of permission faults that have
> dirtied pages.

So how can userspace consistently set a quota of dirtied memory? This
has to account for the size that has been faulted, because that's all
userspace can reason about. Remember that at least on arm64, we're
dealing with 3 different base page sizes, and many more large page
sizes.

> 
> > 
> >> +    quota: the observed dirty quota just before the exit to
> >> userspace.
> > 
> > You are defining the quota in terms of quota. -ENOCLUE.
> 
> I am defining the "quota" member of the dirty_quota_exit struct in
> terms of "dirty quota" which is already defined in the commit
> message.

Which nobody will see. This is supposed to be a self contained
documentation.

> 
> > 
> >> +
> >> +The userspace can design a strategy to allocate the overall scope of dirtying
> >> +for the VM among the vcpus. Based on the strategy and the current state of dirty
> >> +quota throttling, the userspace can make a decision to either update (increase)
> >> +the quota or to put the VCPU to sleep for some time.
> > 
> > This looks like something out of 1984 (Newspeak anyone)? Can't you
> > just say that userspace is responsible for allocating the quota and
> > manage the resulting throttling effect?
> 
> We didn't intend to sound like the Party or the Big Brother. We
> started working on the linux and QEMU patches at the same time and got
> tempted into exposing the details of how we were using this feature in
> QEMU for throttling. I can get rid of the details if it helps.

I think the details are meaningless, and this should stick to the API,
not the way the API could be used.

> 
> >> +	/*
> >> +	 * Number of pages the vCPU is allowed to have dirtied over its entire
> >> +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
> >> +	 * is reached/exceeded.
> >> +	 */
> >> +	__u64 dirty_quota;
> > 
> > How are dirty_quota and dirty_quota_exit.quota related?
> > 
> 
> dirty_quota_exit.quota is the dirty quota at the time of the exit. We
> are capturing it for userspace's reference because dirty quota can be
> updated anytime.

Shouldn't that be described here?

> 
> >> @@ -48,6 +48,7 @@ config KVM
> >>   	select KVM_VFIO
> >>   	select SRCU
> >>   	select INTERVAL_TREE
> >> +	select HAVE_KVM_DIRTY_QUOTA
> > 
> > Why isn't this part of the x86 patch?
> 
> Ack. Thanks.
> 
> >>    * Architecture-independent vcpu->requests bit members
> >> - * Bits 3-7 are reserved for more arch-independent bits.
> >> + * Bits 5-7 are reserved for more arch-independent bits.
> >>    */
> >>   #define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >>   #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> >>   #define KVM_REQ_UNBLOCK           2
> >> +#define KVM_REQ_DIRTY_QUOTA_EXIT  4
> > 
> > Where is 3? Why reserve two bits when only one is used?
> 
> Ack. 3 was in use when I was working on the patchset. Missed this in
> my last code walkthrough before sending the patchset. Thanks.
> 
> >>   +static bool kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu
> >> *vcpu)
> >> +{
> >> +#ifdef CONFIG_HAVE_KVM_DIRTY_QUOTA
> >> +	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
> >> +
> >> +	return dirty_quota && (vcpu->stat.generic.pages_dirtied >= dirty_quota);
> >> +#else
> >> +	return false;
> >> +#endif
> > 
> > If you introduce additional #ifdefery here, why are the additional
> > fields in the vcpu structure unconditional?
> 
> pages_dirtied can be a useful information even if dirty quota
> throttling is not used. So, I kept it unconditional based on
> feedback.

Useful for whom? This creates an ABI for all architectures, and this
needs buy-in from everyone. Personally, I think it is a pretty useless
stat.

And while we're talking about pages_dirtied, I really dislike the
WARN_ON in mark_page_dirty_in_slot(). A counter has rolled over?
Shock, horror...

> 
> CC: Sean
> 
> I can add #ifdefery in the vcpu run struct for dirty_quota.
> 
> >>   		else
> >>   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> >> +
> >> +		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
> >> +			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);
> > 
> > This is broken in the light of new dirty-tracking code queued for
> > 6.2. Specifically, you absolutely can end-up here *without* a vcpu on
> > arm64. You just have to snapshot the ITS state to observe the fireworks.
> 
> Could you please point me to the patchset which is in queue?

The patches are in -next, and you can look at the branch here[1].
Please also refer to the discussion on the list, as a lot of what was
discussed there does apply here.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/dirty-ring

-- 
Without deviation from the norm, progress is not possible.

  reply	other threads:[~2022-11-22 17:49 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-13 17:05 [PATCH v7 0/4] KVM: Dirty quota-based throttling Shivam Kumar
2022-11-13 17:05 ` [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus Shivam Kumar
2022-11-14 23:29   ` Yunhong Jiang
2022-11-15  4:48     ` Shivam Kumar
2022-11-17 19:26   ` Marc Zyngier
2022-11-18  9:47     ` Shivam Kumar
2022-11-22 17:46       ` Marc Zyngier [this message]
2022-12-06  6:22         ` Shivam Kumar
2022-12-07 16:44           ` Marc Zyngier
2022-12-07 19:53             ` Sean Christopherson
2022-12-08  7:30               ` Shivam Kumar
2022-12-25 16:50                 ` Shivam Kumar
2022-12-26 10:07                   ` Marc Zyngier
2023-01-07 17:24                     ` Shivam Kumar
2023-01-07 21:44                       ` Marc Zyngier
2023-01-14 13:07                         ` Shivam Kumar
2023-01-15  9:56                           ` Marc Zyngier
2023-01-15 14:50                             ` Shivam Kumar
2023-01-15 19:13                               ` Marc Zyngier
2023-01-29 22:00                             ` Shivam Kumar
2023-02-11  6:52                               ` Shivam Kumar
2023-02-12 17:09                                 ` Marc Zyngier
2023-02-12 17:54                                   ` Shivam Kumar
2023-02-12 18:02                                     ` Marc Zyngier
2023-02-12 17:56                               ` Shivam Kumar
2022-12-08  7:20             ` Shivam Kumar
2022-11-25 10:52   ` kernel test robot
2022-11-13 17:05 ` [PATCH v7 2/4] KVM: x86: Dirty " Shivam Kumar
2022-11-15  0:16   ` Yunhong Jiang
2022-11-15  4:55     ` Shivam Kumar
2022-11-15  6:45       ` Yunhong Jiang
2022-11-18  8:51         ` Shivam Kumar
2022-11-13 17:05 ` [PATCH v7 3/4] KVM: arm64: " Shivam Kumar
2022-11-15  0:27   ` Yunhong Jiang
2022-11-15  5:10     ` Shivam Kumar
2022-11-17 20:44   ` Marc Zyngier
2022-11-18  8:56     ` Shivam Kumar
2022-11-13 17:05 ` [PATCH v7 4/4] KVM: selftests: Add selftests for dirty quota throttling Shivam Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86mt8iopb7.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=anurag.madnawat@nutanix.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=james.morse@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=manish.mishra@nutanix.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shaju.abraham@nutanix.com \
    --cc=shivam.kumar1@nutanix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.