From: David Matlack <dmatlack@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: maobibo <maobibo@loongson.cn>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
Oliver Upton <oliver.upton@linux.dev>,
Tianrui Zhao <zhaotianrui@loongson.cn>,
Huacai Chen <chenhuacai@kernel.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Anup Patel <anup@brainfault.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>
Subject: Re: [PATCH v2] KVM: Aggressively drop and reacquire mmu_lock during CLEAR_DIRTY_LOG
Date: Mon, 15 Apr 2024 10:20:34 -0700
Message-ID: <Zh1h4gfOpImWHQsC@google.com>
In-Reply-To: <CALzav=coESqsXnLbX2emiO_P12WrPZh9WutxF6JWWqwX-6RFDg@mail.gmail.com>
On 2024-04-12 09:14 AM, David Matlack wrote:
> On Thu, Apr 4, 2024 at 11:17 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Apr 04, 2024, David Matlack wrote:
> > > > I don't love the idea of adding more arch specific MMU behavior (going the wrong
> > > > direction), but it doesn't seem like an unreasonable approach in this case.
> > >
> > > I wonder if this is being overly cautious.
> >
> > Probably. "Lazy" is another word for it ;-)
> >
> > > I would expect only more benefit on architectures that more aggressively take
> > > the mmu_lock on vCPU threads during faults. The more lock acquisition on vCPU
> > > threads, the more this patch will help reduce vCPU starvation during
> > > CLEAR_DIRTY_LOG.
> > >
> > > Hm, perhaps testing with ept=N (which will use the write-lock for even
> > > dirty logging faults) would be a way to increase confidence in the
> > > effect on other architectures?
> >
> > Turning off the TDP MMU would be more representative, just manually disable the
> > fast-path, e.g.
>
> Good idea. I'm actually throwing in some writable module parameters
> too to make it easy to toggle between configurations.
>
> I'll report back when I have some data.
tl;dr
* My patch likely _will_ regress migration performance on other architectures.
Thank you Bibo and Sean for keeping me honest here.
* I suspect the original issue my patch is trying to fix is actually specific
to the way the TDP MMU does eager page splitting and a more targeted fix is
warranted.
---
To evaluate my patch I tested on x86 with different mmu_lock configurations
to simulate other architectures.
Config 1: tdp_mmu=Y fast_page_fault_read_lock=N eager_page_split=Y
Config 2: tdp_mmu=Y fast_page_fault_read_lock=Y eager_page_split=Y
Config 3: tdp_mmu=N fast_page_fault_read_lock=N eager_page_split=N
Note: "fast_page_fault_read_lock" is a non-upstream parameter I added to
add a read_lock/unlock() in fast_page_fault().
Config 1 is vanilla KVM/x86. Config 2 emulates KVM/arm64. Config 3 emulates
what LoongArch would look like if it added support for lockless
write-protection fault handling. A sketch of the knob follows below.
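For reference, a minimal sketch of what that knob could look like; the actual
patch isn't included here, so the declaration and placement are illustrative,
and __fast_page_fault() is a hypothetical split-out of the existing lockless
body:

/* Hypothetical sketch: 0644 makes the parameter writable via sysfs. */
static bool fast_page_fault_read_lock;
module_param(fast_page_fault_read_lock, bool, 0644);

static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
	int ret;

	/* Emulate arm64-style read-lock fault handling for the experiment. */
	if (fast_page_fault_read_lock)
		read_lock(&vcpu->kvm->mmu_lock);

	ret = __fast_page_fault(vcpu, fault);

	if (fast_page_fault_read_lock)
		read_unlock(&vcpu->kvm->mmu_lock);

	return ret;
}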
The test I ran was a live migration of a 16-vCPU, 64 GB VM running an
aggressive write-heavy workload. To compare runs I evaluated 3 metrics:
* Duration of pre-copy.
* Amount of dirty memory going into post-copy.
* Total CPU usage of CLEAR_DIRTY_LOG.
The following table shows how each metric changed after adding my patch to drop
mmu_lock during CLEAR_DIRTY_LOG.
         | Precopy Duration | Post-Copy Dirty | CLEAR_DIRTY_LOG CPU
---------|------------------|-----------------|---------------------
Config 1 | -1%              | -1%             | +6%
Config 2 | -1%              | +1%             | +123%
Config 3 | +32%             | +158%           | +5200%
Configs 2 and 3 both show regressions, with Config 3 severely
dimensions.
Given these regressions, I started rethinking the original issue this patch is
trying to fix.
The dips in guest performance during CLEAR_DIRTY_LOG occurred during the first
pre-copy pass but not during subsequent passes. One thing unique to the first
pass is eager page splitting.
Ah ha, a theory! During eager page splitting, the TDP MMU allocates shadow
pages while holding mmu_lock, and only drops the lock if need_resched() is
true or a GFP_NOWAIT allocation fails. If neither occurs, CLEAR_DIRTY_LOG
could potentially hold mmu_lock in write-mode for a long time.
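To make that concrete, the allocation path has roughly the following shape
(hand-condensed from tdp_mmu_alloc_sp_for_split(); exact signatures, and the
nid argument, vary by tree, so treat this as a paraphrase, not verbatim code):

/* Condensed paraphrase of the split-SP allocation path. */
static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
						       struct tdp_iter *iter,
						       bool shared, int nid)
{
	struct kvm_mmu_page *sp;

	/*
	 * Try a no-reclaim allocation first so mmu_lock can stay held. When
	 * this keeps succeeding, a split-heavy CLEAR_DIRTY_LOG never drops
	 * the lock (absent need_resched()).
	 */
	sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT, nid);
	if (sp)
		return sp;

	/* Only on failure: drop RCU and mmu_lock, retry with reclaim allowed. */
	rcu_read_unlock();
	if (shared)
		read_unlock(&kvm->mmu_lock);
	else
		write_unlock(&kvm->mmu_lock);

	iter->yielded = true;
	sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT, nid);

	/* Reacquire in the same mode before resuming the walk. */
	if (shared)
		read_lock(&kvm->mmu_lock);
	else
		write_lock(&kvm->mmu_lock);
	rcu_read_lock();

	return sp;
}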
Second, the host platform where we saw the dips has nx_huge_pages=Y. I suspect
the long CLEAR_DIRTY_LOG calls are blocking vCPUs taking steady-state
faults for NX Huge Pages, causing the dips in performance.
This theory also explains why we (Google) haven't seen similar drops in guest
performance when using !manual-protect, as in that configuration the TDP MMU
does eager page splitting under the read-lock instead of the write-lock.
If this is all true, then a better / more targeted fix for this issue would be
to drop mmu_lock in the TDP MMU eager page splitting path, e.g. by limiting
the "allocate under lock" behavior to only when the read-lock is held:
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7dfdc49a6ade..ea34f8232d97 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1472,9 +1472,11 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 	 * If this allocation fails we drop the lock and retry with reclaim
 	 * allowed.
 	 */
-	sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT, nid);
-	if (sp)
-		return sp;
+	if (shared) {
+		sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT, nid);
+		if (sp)
+			return sp;
+	}
 
 	rcu_read_unlock();
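The effect being: when splitting under the write-lock, KVM would always drop
mmu_lock and allocate with reclaim allowed, bounding write-lock hold times;
the GFP_NOWAIT "allocate under lock" fast path would remain only under the
read-lock, where vCPUs can still make progress in parallel.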
I checked the KVM/arm64 eager page splitting code, and it drops the mmu_lock to
allocate page tables. So I suspect no fix is needed there and this is, in fact,
purely an x86-specific issue.
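For comparison, the arm64 splitting path (in arch/arm64/kvm/mmu.c) has
roughly this shape; the helper names below are invented for illustration and
are not the actual arm64 function names:

/* Illustrative sketch only; helper names are made up, not arm64's. */
while (addr < end) {
	if (split_cache_is_low(cache)) {
		write_unlock(&kvm->mmu_lock);
		/* Sleepable allocation; vCPUs can take faults meanwhile. */
		topup_split_cache(cache, GFP_KERNEL_ACCOUNT);
		write_lock(&kvm->mmu_lock);
		continue;
	}
	addr += split_one_huge_page(kvm, addr, cache);
}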