From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvm@vger.kernel.org, Will Deacon <will@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Ben Gardon <bgardon@google.com>,
David Matlack <dmatlack@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
kvmarm@lists.cs.columbia.edu,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 00/14] KVM: arm64: Parallel stage-2 fault handling
Date: Tue, 06 Sep 2022 11:00:09 +0100
Message-ID: <87o7vsvn4m.wl-maz@kernel.org>
In-Reply-To: <20220830194132.962932-1-oliver.upton@linux.dev>
On Tue, 30 Aug 2022 20:41:18 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Presently KVM only takes a read lock for stage 2 faults if it believes
> the fault can be fixed by relaxing permissions on a PTE (write unprotect
> for dirty logging). Otherwise, stage 2 faults grab the write lock, which
> predictably can pile up all the vCPUs in a sufficiently large VM.
>
> Like the TDP MMU for x86, this series loosens the locking around
> manipulations of the stage 2 page tables to allow parallel faults. RCU
> and atomics are exploited to safely build/destroy the stage 2 page
> tables in light of multiple software observers.
>
> Patches 1-2 are a cleanup to the way we collapse page tables, with the
> added benefit of narrowing the window of time a range of memory is
> unmapped.
>
> Patches 3-7 are minor cleanups and refactorings to the way KVM reads
> PTEs and traverses the stage 2 page tables to make it amenable to
> concurrent modification.
>
> Patches 8-9 use RCU to punt page table cleanup out of the vCPU fault
> path, which should also improve fault latency a bit.
>
> Patches 10-14 implement the meat of this series, extending the
> 'break-before-make' sequence with atomics to realize locking on PTEs.
> Effectively a cmpxchg() is used to 'break' a PTE, thereby serializing
> changes to a given PTE.
>
> Finally, patch 15 flips the switch on all the new code and starts
> grabbing the read side of the MMU lock for stage 2 faults.
>
> Applies to 6.0-rc3. Tested with KVM selftests and benchmarked with
> dirty_log_perf_test, scaling from 1 to 48 vCPUs with 4GB of memory per
> vCPU backed by THP.
>
> ./dirty_log_perf_test -s anonymous_thp -m 2 -b 4G -v ${NR_VCPUS}
>
> Time to dirty memory:
>
> +-------+---------+------------------+
> | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
> +-------+---------+------------------+
> | 1 | 0.89s | 0.92s |
> | 2 | 1.13s | 1.18s |
> | 4 | 2.42s | 1.25s |
> | 8 | 5.03s | 1.36s |
> | 16 | 8.84s | 2.09s |
> | 32 | 19.60s | 4.47s |
> | 48 | 31.39s | 6.22s |
> +-------+---------+------------------+
>
> It is also worth mentioning that the time to populate memory has
> improved:
>
> +-------+---------+------------------+
> | vCPUs | 6.0-rc3 | 6.0-rc3 + series |
> +-------+---------+------------------+
> | 1 | 0.19s | 0.18s |
> | 2 | 0.25s | 0.21s |
> | 4 | 0.38s | 0.32s |
> | 8 | 0.64s | 0.40s |
> | 16 | 1.22s | 0.54s |
> | 32 | 2.50s | 1.03s |
> | 48 | 3.88s | 1.52s |
> +-------+---------+------------------+
>
> RFC: https://lore.kernel.org/kvmarm/20220415215901.1737897-1-oupton@google.com/
>
> RFC -> v1:
> - Factored out page table teardown from kvm_pgtable_stage2_map()
> - Use the RCU callback to tear down a subtree, instead of scheduling a
> callback for every individual table page.
> - Reorganized series to (hopefully) avoid intermediate breakage.
> - Dropped the use of page headers, instead stuffing KVM metadata into
> page::private directly
>
> Oliver Upton (14):
> KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees
> KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make
> KVM: arm64: Directly read owner id field in stage2_pte_is_counted()
> KVM: arm64: Read the PTE once per visit
> KVM: arm64: Split init and set for table PTE
> KVM: arm64: Return next table from map callbacks
> KVM: arm64: Document behavior of pgtable visitor callback
> KVM: arm64: Protect page table traversal with RCU
> KVM: arm64: Free removed stage-2 tables in RCU callback
> KVM: arm64: Atomically update stage 2 leaf attributes in parallel
> walks
> KVM: arm64: Make changes block->table to leaf PTEs parallel-aware
> KVM: arm64: Make leaf->leaf PTE changes parallel-aware
> KVM: arm64: Make table->block changes parallel-aware
> KVM: arm64: Handle stage-2 faults in parallel
>
> arch/arm64/include/asm/kvm_pgtable.h | 59 ++++-
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 7 +-
> arch/arm64/kvm/hyp/nvhe/setup.c | 4 +-
> arch/arm64/kvm/hyp/pgtable.c | 360 ++++++++++++++++----------
> arch/arm64/kvm/mmu.c | 65 +++--
> 5 files changed, 325 insertions(+), 170 deletions(-)
This fails to build on -rc4:
MODPOST vmlinux.symvers
MODINFO modules.builtin.modinfo
GEN modules.builtin
CC .vmlinux.export.o
LD .tmp_vmlinux.kallsyms1
ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: ID map text too big or misaligned
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_walk':
(.hyp.text+0xdc0c): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdc1c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_get_leaf':
(.hyp.text+0xdc80): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdc90): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_map':
(.hyp.text+0xddb0): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xddc0): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_unmap':
(.hyp.text+0xde44): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xde50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_hyp_destroy':
(.hyp.text+0xdf40): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xdf50): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_map':
(.hyp.text+0xe16c): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe17c): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_set_owner':
(.hyp.text+0xe264): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe274): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_unmap':
(.hyp.text+0xe2d4): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe2e4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_flush':
(.hyp.text+0xe5b4): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe5c4): undefined reference to `__kvm_nvhe___rcu_read_unlock'
ld: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.o: in function `__kvm_nvhe_kvm_pgtable_stage2_destroy':
(.hyp.text+0xe6f0): undefined reference to `__kvm_nvhe___rcu_read_lock'
ld: (.hyp.text+0xe700): undefined reference to `__kvm_nvhe___rcu_read_unlock'
make[3]: *** [Makefile:1169: vmlinux] Error 1
make[2]: *** [debian/rules:7: build-arch] Error 2

as this drags the RCU read-lock into EL2, and that's not going to
work... The following fixes it, but I wonder how you tested it.

Thanks,

	M.

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index dc839db86a1a..adf170122daf 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -580,7 +580,7 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
  */
 enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
 
-#if defined(__KVM_NVHE_HYPERVISOR___)
+#if defined(__KVM_NVHE_HYPERVISOR__)
 
 static inline void kvm_pgtable_walk_begin(void) {}
 static inline void kvm_pgtable_walk_end(void) {}
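
For anyone wondering why a stray underscore ends up as a link failure
rather than a compile error: defined() on a misspelled macro is simply
false, so the preprocessor quietly picks the other branch. Below is a
minimal standalone sketch of that effect. DEMO_HYP_BUILD and the
walk_begin_*() helpers are made-up names for illustration, not the
kernel's; the return values merely stand in for "no-op stub" versus
"takes the RCU read lock", which is what the link errors above suggest
the non-hyp branch does.

```c
/* Hypothetical stand-in for a macro the build system defines for hyp code. */
#define DEMO_HYP_BUILD 1

/*
 * Misspelled guard: DEMO_HYP_BUILD_ (one underscore too many) is never
 * defined, so defined() evaluates to 0 and the #else branch is compiled.
 * No diagnostic is emitted for this.
 */
#if defined(DEMO_HYP_BUILD_)
static inline int walk_begin_typo(void) { return 0; }	/* no-op stub */
#else
static inline int walk_begin_typo(void) { return 1; }	/* "takes the lock" */
#endif

/* Correctly spelled guard: the stub is selected as intended. */
#if defined(DEMO_HYP_BUILD)
static inline int walk_begin_fixed(void) { return 0; }	/* no-op stub */
#else
static inline int walk_begin_fixed(void) { return 1; }	/* "takes the lock" */
#endif
```

With the misspelled guard the "lock" variant is what gets built into the
hyp object, and the failure only shows up once the linker goes looking
for the symbol it references.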
--
Without deviation from the norm, progress is not possible.