From: Marc Zyngier <maz@kernel.org>
To: Shivam Kumar <shivam.kumar1@nutanix.com>
Cc: pbonzini@redhat.com, seanjc@google.com, james.morse@arm.com,
borntraeger@linux.ibm.com, david@redhat.com, kvm@vger.kernel.org,
Shaju Abraham <shaju.abraham@nutanix.com>,
Manish Mishra <manish.mishra@nutanix.com>,
Anurag Madnawat <anurag.madnawat@nutanix.com>
Subject: Re: [PATCH v4 1/4] KVM: Implement dirty quota-based throttling of vcpus
Date: Tue, 24 May 2022 08:11:56 +0100 [thread overview]
Message-ID: <87h75fmmkj.wl-maz@kernel.org> (raw)
In-Reply-To: <20220521202937.184189-2-shivam.kumar1@nutanix.com>
On Sat, 21 May 2022 21:29:36 +0100,
Shivam Kumar <shivam.kumar1@nutanix.com> wrote:
>
> Define variables to track and throttle memory dirtying for every vcpu.
>
> dirty_count: Number of pages the vcpu has dirtied since its creation,
> while dirty logging is enabled.
> dirty_quota: Number of pages the vcpu is allowed to dirty. To dirty
> more, it needs to request more quota by exiting to
> userspace.
>
> Implement the flow for throttling based on dirty quota.
>
> i) Increment dirty_count for the vcpu whenever it dirties a page.
> ii) Exit to userspace whenever the dirty quota is exhausted (i.e. dirty
> count equals/exceeds dirty quota) to request more dirty quota.
>
> Suggested-by: Shaju Abraham <shaju.abraham@nutanix.com>
> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
> Co-developed-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Anurag Madnawat <anurag.madnawat@nutanix.com>
> Signed-off-by: Shivam Kumar <shivam.kumar1@nutanix.com>
> ---
> Documentation/virt/kvm/api.rst | 32 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/mmu/spte.c | 4 ++--
> arch/x86/kvm/vmx/vmx.c | 3 +++
> arch/x86/kvm/x86.c | 4 ++++
Please split the x86 support in its own patch.
> include/linux/kvm_host.h | 15 +++++++++++++++
> include/linux/kvm_types.h | 1 +
> include/uapi/linux/kvm.h | 12 ++++++++++++
> virt/kvm/kvm_main.c | 7 ++++++-
> 8 files changed, 75 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 9f3172376ec3..a9317ed31d06 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6125,6 +6125,24 @@ array field represents return values. The userspace should update the return
> values of SBI call before resuming the VCPU. For more details on RISC-V SBI
> spec refer, https://github.com/riscv/riscv-sbi-doc.
>
> +::
> +
> + /* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
> + struct {
> + __u64 count;
> + __u64 quota;
> + } dirty_quota_exit;
> +If exit reason is KVM_EXIT_DIRTY_QUOTA_EXHAUSTED, it indicates that the VCPU has
> +exhausted its dirty quota. The 'dirty_quota_exit' member of kvm_run structure
> +makes the following information available to the userspace:
> + 'count' field: the current count of pages dirtied by the VCPU, which can
> + be skewed based on the size of the pages accessed by each vCPU.
> + 'quota' field: the observed dirty quota just before the exit to userspace.
> +The userspace can design a strategy to allocate the overall scope of dirtying
> +for the VM among the vcpus. Based on the strategy and the current state of dirty
> +quota throttling, the userspace can make a decision to either update (increase)
> +the quota or to put the VCPU to sleep for some time.
> +
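As an illustration of the strategy this paragraph alludes to (my sketch, not
part of the patch; all names and the fixed-grant policy are hypothetical),
userspace could decide on each exit whether to extend the quota or throttle:

```c
#include <stdint.h>

/* Hypothetical mirror of the dirty_quota_exit payload. */
struct dirty_quota_exit_info {
	uint64_t count;	/* pages dirtied so far (monotonic over vCPU lifetime) */
	uint64_t quota;	/* quota observed at exit time */
};

/*
 * Decide the new quota after a KVM_EXIT_DIRTY_QUOTA_EXHAUSTED exit.
 * Returns the value userspace would write back into run->dirty_quota,
 * or 0 to signal "sleep the vCPU instead of granting more pages".
 */
static uint64_t next_dirty_quota(const struct dirty_quota_exit_info *e,
				 uint64_t grant, uint64_t budget_left)
{
	if (budget_left == 0)
		return 0;	/* throttle: VM-wide budget exhausted */
	if (grant > budget_left)
		grant = budget_left;
	/* The quota is lifetime-cumulative, so extend from the current count. */
	return e->count + grant;
}
```

Note that because pages_dirtied is monotonic, the new quota has to be set
relative to 'count', not reset to the size of the grant.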
> ::
>
> /* Fix the size of the union. */
> @@ -6159,6 +6177,20 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>
> ::
>
> + /*
> + * Number of pages the vCPU is allowed to have dirtied over its entire
> + * lifetime. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
> + * is reached/exceeded.
> + */
> + __u64 dirty_quota;
> +Please note that enforcing the quota is best effort, as the guest may dirty
> +multiple pages before KVM can recheck the quota. However, unless KVM is using
> +a hardware-based dirty ring buffer, e.g. Intel's Page Modification Logging,
> +KVM will detect quota exhaustion within a handful of dirtied pages. If a
> +hardware ring buffer is used, the overrun is bounded by the size of the buffer
> +(512 entries for PML).
> +
> +::
> };
>
>
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 73cfe62fdad1..01f0d2a04796 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -182,9 +182,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
> get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
>
> - if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
> + if (spte & PT_WRITABLE_MASK) {
> /* Enforced by kvm_mmu_hugepage_adjust. */
> - WARN_ON(level > PG_LEVEL_4K);
> + WARN_ON(level > PG_LEVEL_4K && kvm_slot_dirty_track_enabled(slot));
> mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
> }
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b730d799c26e..5cbe4992692a 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5507,6 +5507,9 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
> */
> if (__xfer_to_guest_mode_work_pending())
> return 1;
> +
> + if (!kvm_vcpu_check_dirty_quota(vcpu))
> + return 0;
> }
>
> return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb4029660bd9..0b35b8cc0274 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10257,6 +10257,10 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> vcpu->arch.l1tf_flush_l1d = true;
>
> for (;;) {
> + r = kvm_vcpu_check_dirty_quota(vcpu);
I really wonder why this has to be checked on each and every run. Why
isn't this a request set by the core code instead?
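i.e. mark_page_dirty_in_slot() could post something like
kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu) when the quota trips, and
the existing request-processing path in vcpu_enter_guest() would pick it up.
The request machinery is just an atomic per-vCPU bitmap; roughly (a
userspace model of the kernel helpers for illustration, not the actual
implementation, and the request number is made up):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define KVM_REQ_DIRTY_QUOTA_EXIT 5	/* hypothetical request number */

struct vcpu_model {
	atomic_ulong requests;		/* one bit per pending request */
};

static void make_request(struct vcpu_model *v, int req)
{
	atomic_fetch_or(&v->requests, 1UL << req);
}

/* Fires once per posted request, clearing it (cf. kvm_check_request()). */
static bool check_request(struct vcpu_model *v, int req)
{
	unsigned long mask = 1UL << req;

	if (!(atomic_load(&v->requests) & mask))
		return false;
	return atomic_fetch_and(&v->requests, ~mask) & mask;
}
```

With that shape, the hot run loop only pays for a request check it already
performs anyway, instead of re-reading run->dirty_quota on every iteration.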
> + if (!r)
> + break;
> +
> if (kvm_vcpu_running(vcpu)) {
> r = vcpu_enter_guest(vcpu);
> } else {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f11039944c08..ca1ac970a6cf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -530,6 +530,21 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
> return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
> }
>
> +static inline int kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
Why is this inline instead of a normal call?
> +{
> + struct kvm_run *run = vcpu->run;
> + u64 dirty_quota = READ_ONCE(run->dirty_quota);
> + u64 pages_dirtied = vcpu->stat.generic.pages_dirtied;
> +
> + if (!dirty_quota || (pages_dirtied < dirty_quota))
> + return 1;
What happens when pages_dirtied becomes large and dirty_quota has to
wrap to allow further progress?
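For context on how far away that wrap actually is (my back-of-envelope
arithmetic, not a claim that the code handles it): pages_dirtied is a
monotonically increasing u64, so userspace must keep raising dirty_quota
above it, and the counter itself only wraps after ~2^64 pages:

```c
#include <stdint.h>

/*
 * Rough estimate of how long a u64 pages_dirtied counter lasts before
 * wrapping, at a given sustained dirty rate in pages per second.
 */
static uint64_t years_until_u64_wrap(uint64_t pages_per_sec)
{
	const uint64_t secs_per_year = 365ULL * 24 * 3600;

	return UINT64_MAX / pages_per_sec / secs_per_year;
}
```

Even at an absurd 10^9 dirtied pages per second that is on the order of 584
years, so the practical issue is less the counter wrapping and more that
userspace has to treat the quota as cumulative rather than per-epoch.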
> +
> + run->exit_reason = KVM_EXIT_DIRTY_QUOTA_EXHAUSTED;
> + run->dirty_quota_exit.count = pages_dirtied;
> + run->dirty_quota_exit.quota = dirty_quota;
> + return 0;
> +}
> +
> /*
> * Some of the bitops functions do not support too long bitmaps.
> * This number must be determined not to exceed such limits.
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index dceac12c1ce5..7f42486b0405 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -106,6 +106,7 @@ struct kvm_vcpu_stat_generic {
> u64 halt_poll_fail_hist[HALT_POLL_HIST_COUNT];
> u64 halt_wait_hist[HALT_POLL_HIST_COUNT];
> u64 blocking;
> + u64 pages_dirtied;
> };
>
> #define KVM_STATS_NAME_SIZE 48
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 507ee1f2aa96..1d9531efe1fb 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -270,6 +270,7 @@ struct kvm_xen_exit {
> #define KVM_EXIT_X86_BUS_LOCK 33
> #define KVM_EXIT_XEN 34
> #define KVM_EXIT_RISCV_SBI 35
> +#define KVM_EXIT_DIRTY_QUOTA_EXHAUSTED 36
>
> /* For KVM_EXIT_INTERNAL_ERROR */
> /* Emulate instruction failed. */
> @@ -487,6 +488,11 @@ struct kvm_run {
> unsigned long args[6];
> unsigned long ret[2];
> } riscv_sbi;
> + /* KVM_EXIT_DIRTY_QUOTA_EXHAUSTED */
> + struct {
> + __u64 count;
> + __u64 quota;
> + } dirty_quota_exit;
> /* Fix the size of the union. */
> char padding[256];
> };
> @@ -508,6 +514,12 @@ struct kvm_run {
> struct kvm_sync_regs regs;
> char padding[SYNC_REGS_SIZE_BYTES];
> } s;
> + /*
> + * Number of pages the vCPU is allowed to have dirtied over its entire
> + * liftime. KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the
lifetime
> + * quota is reached/exceeded.
> + */
> + __u64 dirty_quota;
How is the feature detected from userspace? I don't see how it can
make use of it without knowing about it in the first place.
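Presumably this wants a capability, say KVM_CAP_DIRTY_QUOTA (name and number
hypothetical, not in this series), that userspace probes with the existing
KVM_CHECK_EXTENSION ioctl before ever touching run->dirty_quota, along these
lines (KVM_CHECK_EXTENSION's value is inlined from <linux/kvm.h> so the
sketch is self-contained):

```c
#include <sys/ioctl.h>

#define KVMIO			0xAE
#define KVM_CHECK_EXTENSION	_IO(KVMIO, 0x03)	/* as in <linux/kvm.h> */
#define KVM_CAP_DIRTY_QUOTA	158			/* hypothetical cap number */

/* Returns > 0 if supported, 0 if not, < 0 on ioctl failure. */
static int dirty_quota_supported(int kvm_fd)
{
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_QUOTA);
}
```

The fd comes from open("/dev/kvm", O_RDWR); only a positive return would
green-light writing run->dirty_quota.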
> };
>
> /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 0afc016cc54d..041ab464405d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3163,7 +3163,12 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> return;
> #endif
>
> - if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> + if (!memslot)
> + return;
> +
> + vcpu->stat.generic.pages_dirtied++;
> +
> + if (kvm_slot_dirty_track_enabled(memslot)) {
> unsigned long rel_gfn = gfn - memslot->base_gfn;
> u32 slot = (memslot->as_id << 16) | memslot->id;
>
Thanks,
M.
--
Without deviation from the norm, progress is not possible.