From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>,
Alexandru Elisei <alexandru.elisei@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Quentin Perret <qperret@google.com>,
Ricardo Koller <ricarkol@google.com>,
Reiji Watanabe <reijiw@google.com>,
David Matlack <dmatlack@google.com>,
Ben Gardon <bgardon@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Gavin Shan <gshan@redhat.com>, Peter Xu <peterx@redhat.com>,
Sean Christopherson <seanjc@google.com>,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org
Subject: Re: [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling
Date: Tue, 06 Sep 2022 11:00:09 +0100
Message-ID: <87o7vsvn4m.wl-maz@kernel.org>
In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev>

On Tue, 30 Aug 2022 20:41:18 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Presently KVM only takes a read lock for stage 2 faults if it believes
> the fault can be fixed by relaxing permissions on a PTE (write unprotect
> for dirty logging). Otherwise, stage 2 faults grab the write lock, which
> predictably can pile up all the vCPUs in a sufficiently large VM.
>
> Like the TDP MMU for x86, this series loosens the locking around
> manipulations of the stage 2 page tables to allow parallel faults. RCU
> and atomics are exploited to safely build/destroy the stage 2 page
> tables in light of multiple software observers.
>
> Patches 1-2 are a cleanup to the way we collapse page tables, with the
> added benefit of narrowing the window of time a range of memory is
> unmapped.
>
> Patches 3-7 are minor cleanups and refactorings to the way KVM reads
> PTEs and traverses the stage 2 page tables to make it amenable to
> concurrent modification.
>
> Patches 8-9 use RCU to punt page table cleanup out of the vCPU fault
> path, which should also improve fault latency a bit.
>
> Patches 10-14 implement the meat of this series, extending the
> 'break-before-make' sequence with atomics to realize locking on PTEs.
> Effectively a cmpxchg() is used to 'break' a PTE, thereby serializing
> changes to a given PTE.
>
> Finally, patch 15 flips the switch on all the new code and starts
> grabbing the read side of the MMU lock for stage 2 faults.
>
> Applies to 6.0-rc3. Tested with KVM selftests and benchmarked with
> dirty_log_perf_test, scaling from 1 to 48 vCPUs with 4GB of memory per
> vCPU backed by THP.
>
> ./dirty_log_perf_test -s anonymous_thp -m 2 -b 4G -v ${NR_VCPUS}
>
> Time to dirty memory:
>
> +-------+---------+------------------+
> | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
> +-------+---------+------------------+
> |     1 |   0.89s |            0.92s |
> |     2 |   1.13s |            1.18s |
> |     4 |   2.42s |            1.25s |
> |     8 |   5.03s |            1.36s |
> |    16 |   8.84s |            2.09s |
> |    32 |  19.60s |            4.47s |
> |    48 |  31.39s |            6.22s |
> +-------+---------+------------------+
>
> It is also worth mentioning that the time to populate memory has
> improved:
>
> +-------+---------+------------------+
> | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
> +-------+---------+------------------+
> |     1 |   0.19s |            0.18s |
> |     2 |   0.25s |            0.21s |
> |     4 |   0.38s |            0.32s |
> |     8 |   0.64s |            0.40s |
> |    16 |   1.22s |            0.54s |
> |    32 |   2.50s |            1.03s |
> |    48 |   3.88s |            1.52s |
> +-------+---------+------------------+
>
> RFC: https://lore.kernel.org/kvmarm/20220415215901.1737897-1-oupton@google.com/
>
> RFC -> v1:
>  - Factored out page table teardown from kvm_pgtable_stage2_map()
>  - Use the RCU callback to tear down a subtree, instead of scheduling a
>    callback for every individual table page.
>  - Reorganized series to (hopefully) avoid intermediate breakage.
>  - Dropped the use of page headers, instead stuffing KVM metadata into
>    page::private directly
> Oliver Upton (14):
>   KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
>   KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
>   KVM: arm64: Directly read owner id field in stage2_pte_is_counted()
>   KVM: arm64: Read the PTE once per visit
>   KVM: arm64: Split init and set for table PTE
>   KVM: arm64: Return next table from map callbacks
>   KVM: arm64: Document behavior of pgtable visitor callback
>   KVM: arm64: Protect page table traversal with RCU
>   KVM: arm64: Free removed stage-2 tables in RCU callback
>   KVM: arm64: Atomically update stage 2 leaf attributes in parallel
>     walks
>   KVM: arm64: Make changes block->table to leaf PTEs parallel-aware
>   KVM: arm64: Make leaf->leaf PTE changes parallel-aware
>   KVM: arm64: Make table->block changes parallel-aware
>   KVM: arm64: Handle stage-2 faults in parallel
>
>  arch/arm64/include/asm/kvm_pgtable.h  |  59 ++++-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c |   7 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c       |   4 +-
>  arch/arm64/kvm/hyp/pgtable.c          | 360 ++++++++++++++++----------
>  arch/arm64/kvm/mmu.c                  |  65 +++--
>  5 files changed, 325 insertions(+), 170 deletions(-)

This fails to build on -rc4:

MODPOST vmlinux.symvers
MODINFO modules.builtin.modinfo
GEN modules.builtin
CC .vmlinux.export.o
LD .tmp_vmlinux.kallsyms1
ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: ID map text too big or misaligned
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_walk':
(.hyp.text+0xdc0c): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdc1c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_get_leaf':
(.hyp.text+0xdc80): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdc90): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_map':
(.hyp.text+0xddb0): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xddc0): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_unmap':
(.hyp.text+0xde44): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xde50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_destroy':
(.hyp.text+0xdf40): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdf50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_map':
(.hyp.text+0xe16c): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe17c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_set_owner':
(.hyp.text+0xe264): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe274): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_unmap':
(.hyp.text+0xe2d4): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe2e4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_flush':
(.hyp.text+0xe5b4): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe5c4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_destroy':
(.hyp.text+0xe6f0): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe700): undefined reference to `__kvm_nvhe___rcu_read_unlock'
make[3]: *** [Makefile:1169: vmlinux] Error 1
make[2]: *** [debian/rules:7: build-arch] Error 2

as this drags the RCU read-lock into EL2, and that's not going to
work... The following fixes it, but I wonder how you tested it.

Thanks,

	M.

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index dc839db86a1a..adf170122daf 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -580,7 +580,7 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
  */
 enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
 
-#if defined(__KVM_NVHE_HYPERVISOR___)
+#if defined(__KVM_NVHE_HYPERVISOR__)
 
 static inline void kvm_pgtable_walk_begin(void) {}
 static inline void kvm_pgtable_walk_end(void) {}
--
Without deviation from the norm, progress is not possible.