From: David Matlack <dmatlack@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: maobibo <maobibo@loongson.cn>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
Tianrui Zhao <zhaotianrui@loongson.cn>,
Huacai Chen <chenhuacai@kernel.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Anup Patel <anup@brainfault.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>
Subject: Re: [PATCH v2] KVM: Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG
Date: Mon, 15 Apr 2024 10:20:34 -0700 [thread overview]
Message-ID: <Zh1h4gfOpImWHQsC@google.com> (raw)
In-Reply-To: <CALzav=coESqsXnLbX2emiO_P12WrPZh9WutxF6JWWqwX-6RFDg@mail.gmail.com>
On 2024-04-12 09:14 AM, David Matlack wrote:
> On Thu, Apr 4, 2024 at 11:17 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Apr 04, 2024, David Matlack wrote:
> > > > I don't love the idea of adding more arch specific MMU behavior (going the wrong
> > > > direction), but it doesn't seem like an unreasonable approach in this case.
> > >
> > > I wonder if this is being overly cautious.
> >
> > Probably. "Lazy" is another word for it ;-)
> >
> > > I would expect only more benefit on architectures that more aggressively take
> > > the mmu_lock on vCPU threads during faults. The more lock acquisition on vCPU
> > > threads, the more this patch will help reduce vCPU starvation during
> > > CLEAR_DIRTY_LOG.
> > >
> > > Hm, perhaps testing with ept=N (which will use the write-lock for even
> > > dirty logging faults) would be a way to increase confidence in the
> > > effect on other architectures?
> >
> > Turning off the TDP MMU would be more representative, just manually disable the
> > fast-path, e.g.
>
> Good idea. I'm actually throwing in some writable module parameters
> too to make it easy to toggle between configurations.
>
> I'll report back when I have some data.
tl;dr
* My patch likely _will_ regress migration performance on other architectures.
Thank you Bibo and Sean for keeping me honest here.
* I suspect the original issue my patch is trying to fix is actually specific
to the way the TDP MMU does eager page splitting and a more targeted fix is
warranted.
---
To evaluate my patch I tested on x86 with different mmu_lock configurations
to simulate other architectures.
Config 1: tdp_mmu=Y fast_page_fault_read_lock=N eager_page_split=Y
Config 2: tdp_mmu=Y fast_page_fault_read_lock=Y eager_page_split=Y
Config 3: tdp_mmu=N fast_page_fault_read_lock=N eager_page_split=N
Note: "fast_page_fault_read_lock" is a non-upstream parameter I added to
add a read_lock/unlock() in fast_page_fault().
Config 1 is vanilla KVM/x86. Config 2 emulates KVM/arm64. Config 3 emulates
LoongArch if LoongArch added support for lockless write-protection fault
handling.
The test I ran was a Live Migration of a 16VCPU 64GB VM running an aggressive
write-heavy workload. To compare runs I evaluated 3 metrics:
* Duration of pre-copy.
* Amount of dirty memory going into post-copy.
* Total CPU usage of CLEAR_DIRTY_LOG.
The following table shows how each metric changed after adding my patch to drop
mmu_lock during CLEAR_DIRTY_LOG.
| Precopy Duration | Post-Copy Dirty | CLEAR_DIRTY_LOG CPU
---------|------------------|-----------------|---------------------
Config 1 | -1% | -1% | +6%
Config 2 | -1% | +1% | +123%
Config 3 | +32% | +158% | +5200%
Config 2 and 3 both show regressions, with Config 3 severely regressed in all 3
dimensions.
Given these regressions, I started rethinking the original issue this patch is
trying to fix.
The dips in guest performance during CLEAR_DIRTY_LOG occurred during the first
pre-copy pass but not during subsequent passes. One thing unique to the first
pass is eager page splitting.
Ah ha, a theory! The TDP MMU allocates shadow pages while holding the mmu_lock
during eager page splitting and only drops the lock if need_resched=True or a
GFP_NOWAIT allocation fails. If neither occurs, CLEAR_DIRTY_LOG could potential
hold mmu_lock in write-mode for a long time.
Second, the host platform where we saw the dips has nx_huge_pages=Y. I suspect
the long CLEAR_DIRTY_LOG calls are blocking vCPUs taking steady-state
faults for NX Huge Pages, causing the dips in performance.
This theory also explains why we (Google) haven't seen similar drops in guest
performance when using !manual-protect, as in that configuration the TDP MMU
does eager page splitting under the read-lock instead of write-lock.
If this is all true, then a better / more targeted fix for this issue would be
to drop mmu_lock in the TDP MMU eager page splitting path. For example, we
could limit the "allocate under lock" behavior to only when the read-lock is
held, e.g.
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7dfdc49a6ade..ea34f8232d97 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1472,9 +1472,11 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
* If this allocation fails we drop the lock and retry with reclaim
* allowed.
*/
- sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT, nid);
- if (sp)
- return sp;
+ if (shared) {
+ sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT, nid);
+ if (sp)
+ return sp;
+ }
rcu_read_unlock();
I checked the KVM/arm64 eager page splitting code, and it drops the mmu_lock to
allocate page tables. So I suspect no fix is needed there and this is, in fact,
purely and x86-specific issue.
next prev parent reply other threads:[~2024-04-15 17:20 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-02 21:36 [PATCH v2] KVM: Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG David Matlack
2024-04-03 1:50 ` maobibo
2024-04-03 17:20 ` Sean Christopherson
2024-04-04 16:29 ` David Matlack
2024-04-04 17:10 ` Sean Christopherson
2024-04-04 18:12 ` David Matlack
2024-04-04 18:17 ` Sean Christopherson
2024-04-12 16:14 ` David Matlack
2024-04-15 17:20 ` David Matlack [this message]
2024-04-15 20:00 ` Sean Christopherson
2024-04-18 18:50 ` David Matlack
2024-04-18 19:39 ` Sean Christopherson
2024-04-07 2:25 ` maobibo
2024-04-12 16:12 ` David Matlack
2024-04-15 1:21 ` maobibo
2024-04-07 1:36 ` maobibo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zh1h4gfOpImWHQsC@google.com \
--to=dmatlack@google.com \
--cc=anup@brainfault.org \
--cc=borntraeger@linux.ibm.com \
--cc=chenhuacai@kernel.org \
--cc=frankja@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=maobibo@loongson.cn \
--cc=maz@kernel.org \
--cc=mpe@ellerman.id.au \
--cc=oliver.upton@linux.dev \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=zhaotianrui@loongson.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.