From: Sean Christopherson <seanjc@google.com>
To: maobibo <maobibo@loongson.cn>
Cc: David Matlack <dmatlack@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
Tianrui Zhao <zhaotianrui@loongson.cn>,
Huacai Chen <chenhuacai@kernel.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Anup Patel <anup@brainfault.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>
Subject: Re: [PATCH v2] KVM: Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG
Date: Wed, 3 Apr 2024 10:20:02 -0700 [thread overview]
Message-ID: <Zg2PwgxNlGqA9T3b@google.com> (raw)
In-Reply-To: <cb793d79-f476-3134-23b7-dc43801b133e@loongson.cn>
On Wed, Apr 03, 2024, maobibo wrote:
>
> > On 2024/4/3 5:36 AM, David Matlack wrote:
> > Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG to avoid
> > blocking other threads (e.g. vCPUs taking page faults) for too long.
> >
> > Specifically, change kvm_clear_dirty_log_protect() to acquire/release
> > mmu_lock only when calling kvm_arch_mmu_enable_log_dirty_pt_masked(),
> > rather than around the entire for loop. This ensures that KVM will only
> > hold mmu_lock for the time it takes the architecture-specific code to
> > process up to 64 pages, rather than holding mmu_lock for log->num_pages,
> > which is controllable by userspace. This also avoids holding mmu_lock
> > when processing parts of the dirty_bitmap that are zero (i.e. when there
> > is nothing to clear).
> >
> > Moving the acquire/release points for mmu_lock should be safe since
> > dirty_bitmap_buffer is already protected by slots_lock, and dirty_bitmap
> > is already accessed with atomic_long_fetch_andnot(). And at least on x86
> > holding mmu_lock doesn't even serialize access to the memslot dirty
> > bitmap, as vCPUs can call mark_page_dirty_in_slot() without holding
> > mmu_lock.
> >
> > This change eliminates dips in guest performance during live migration
> > in a 160 vCPU VM when userspace is issuing CLEAR ioctls (tested with
> > 1GiB and 8GiB CLEARs). Userspace could issue finer-grained CLEARs, which
> Frequently dropping and reacquiring mmu_lock will cause the userspace
> migration process issuing CLEAR ioctls to contend with the 160 vCPUs, and
> migration may become slower.

Only if vCPUs actually acquire mmu_lock. E.g. on x86, KVM fixes/handles faults
due to write-protection for dirty logging without acquiring mmu_lock. So for x86,
taking faults that need to acquire mmu_lock while dirty logging should be fairly
uncommon (and if vCPUs are taking lots of faults, performance is likely going to
be bad no matter what).
> In theory, the priority of the userspace migration thread should be higher
> than that of the vCPU threads.

That's very debatable. And it's not an apples-to-apples comparison, because
CLEAR_DIRTY_LOG can hold mmu_lock for a very long time, probably orders of
magnitude longer than a vCPU will hold mmu_lock when handling a page fault.
And on x86 and ARM, page faults can be resolved while holding mmu_lock for read.
As a result, holding mmu_lock for write in CLEAR_DIRTY_LOG is effectively more
costly than holding it in vCPUs.
> Dropping and reacquiring mmu_lock every 64 pages may be a little too small a
> batch; in general it would be one huge page's worth. However, it should be
> decided by the framework maintainer :)

We could tweak the batching, but my very strong preference would be to do that
only as a last resort, i.e. if and only if some magic batch number provides waaay
better performance in all scenarios.
Maintaining code with arbitrary magic numbers is a pain, e.g. KVM x86's MMU has
arbitrary batching in kvm_zap_obsolete_pages(), and the justification for the
number is extremely handwavy (paraphrasing the changelog):
Zap at least 10 shadow pages before releasing mmu_lock to reduce the
overhead associated with re-acquiring the lock.
Note: "10" is an arbitrary number, speculated to be high enough so
that a vCPU isn't stuck zapping obsolete pages for an extended period,
but small enough so that other vCPUs aren't starved waiting for
mmu_lock.
I.e. we're stuck with someone's best guess from years ago without any real idea
if 10 is a good number, let alone optimal.
Obviously that doesn't mean 64 pages is optimal, but at least it's not arbitrary;
it's just an artifact of how KVM processes the bitmap.
To be clear, I'm not opposed to per-arch behavior, nor do I think x86 should
dictate how all other architectures should behave. But I would strongly prefer
to avoid per-arch behavior unless it's actually necessary (doubly so for batching).
In other words, if we do need per-arch behavior, e.g. because aggressively dropping
mmu_lock causes performance issues on other architectures that need to take mmu_lock
for write to handle faults, I would prefer to have the arch knob control whether
the lock+unlock is outside versus inside the loop, not control an arbitrary batch
size.