linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] KVM: arm64: Don't split hugepages outside of MMU write lock
@ 2022-04-01 19:46 Oliver Upton
  2022-04-02  1:06 ` Reiji Watanabe
  2022-04-06 13:22 ` Marc Zyngier
  0 siblings, 2 replies; 3+ messages in thread
From: Oliver Upton @ 2022-04-01 19:46 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Peter Shier, Ricardo Koller,
	Reiji Watanabe, Oliver Upton, Jing Zhang

It is possible to take a stage-2 permission fault on a page larger than
PAGE_SIZE. For example, when running a guest backed by 2M HugeTLB, KVM
eagerly maps at the largest possible block size. When dirty logging is
enabled on a memslot, KVM does *not* eagerly split these 2M stage-2
mappings and instead clears the write bit on the pte.

Since dirty logging is always performed at PAGE_SIZE granularity, KVM
lazily splits these 2M block mappings down to PAGE_SIZE in the stage-2
fault handler. This operation must be done under the write lock. Since
commit f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission
relaxation during dirty logging"), the stage-2 fault handler
conditionally takes the read lock on permission faults with dirty
logging enabled. To that end, it is possible to split a 2M block mapping
while only holding the read lock.

The problem is demonstrated by running kvm_page_table_test with 2M
anonymous HugeTLB, which splats like so:

  WARNING: CPU: 5 PID: 15276 at arch/arm64/kvm/hyp/pgtable.c:153 stage2_map_walk_leaf+0x124/0x158

  [...]

  Call trace:
  stage2_map_walk_leaf+0x124/0x158
  stage2_map_walker+0x5c/0xf0
  __kvm_pgtable_walk+0x100/0x1d4
  __kvm_pgtable_walk+0x140/0x1d4
  __kvm_pgtable_walk+0x140/0x1d4
  kvm_pgtable_walk+0xa0/0xf8
  kvm_pgtable_stage2_map+0x15c/0x198
  user_mem_abort+0x56c/0x838
  kvm_handle_guest_abort+0x1fc/0x2a4
  handle_exit+0xa4/0x120
  kvm_arch_vcpu_ioctl_run+0x200/0x448
  kvm_vcpu_ioctl+0x588/0x664
  __arm64_sys_ioctl+0x9c/0xd4
  invoke_syscall+0x4c/0x144
  el0_svc_common+0xc4/0x190
  do_el0_svc+0x30/0x8c
  el0_svc+0x28/0xcc
  el0t_64_sync_handler+0x84/0xe4
  el0t_64_sync+0x1a4/0x1a8

Fix the issue by only acquiring the read lock if the guest faulted on a
PAGE_SIZE granule w/ dirty logging enabled. Add a WARN to catch locking
bugs in future changes.

Fixes: f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation during dirty logging")
Cc: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---

Applies cleanly to kvmarm/fixes at the following commit:

  8872d9b3e35a ("KVM: arm64: Drop unneeded minor version check from PSCI v1.x handler")

Tested the patch by running KVM selftests. Additionally, I did 10
iterations of the kvm_page_table_test with 2M anon HugeTLB memory.

It is expected that this patch will cause fault serialization in the
pathological case where all vCPUs are faulting on the same granule of
memory, as every vCPU will attempt to acquire the write lock. The only
safe way to cure this contention is to dissolve pages eagerly outside of
the stage-2 fault handler (like x86).

v1: https://lore.kernel.org/kvmarm/20220331213844.2894006-1-oupton@google.com/

v1 -> v2:
 - Drop impossible check for !use_read_lock before
   kvm_pgtable_stage2_map() (Reiji)
 - Codify the requirement to hold the write lock to call
   kvm_pgtable_stage2_map() with a WARN

 arch/arm64/kvm/mmu.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0d19259454d8..53ae2c0640bc 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1079,7 +1079,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	bool logging_perm_fault = false;
+	bool use_read_lock = false;
 	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
@@ -1114,7 +1114,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (logging_active) {
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
-		logging_perm_fault = (fault_status == FSC_PERM && write_fault);
+		use_read_lock = (fault_status == FSC_PERM && write_fault &&
+				 fault_granule == PAGE_SIZE);
 	} else {
 		vma_shift = get_vma_page_shift(vma, hva);
 	}
@@ -1218,7 +1219,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * logging dirty logging, only acquire read lock for permission
 	 * relaxation.
 	 */
-	if (logging_perm_fault)
+	if (use_read_lock)
 		read_lock(&kvm->mmu_lock);
 	else
 		write_lock(&kvm->mmu_lock);
@@ -1268,6 +1269,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (fault_status == FSC_PERM && vma_pagesize == fault_granule) {
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
+		WARN_ONCE(use_read_lock, "Attempted stage-2 map outside of write lock\n");
+
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
 					     memcache);
@@ -1280,7 +1283,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 out_unlock:
-	if (logging_perm_fault)
+	if (use_read_lock)
 		read_unlock(&kvm->mmu_lock);
 	else
 		write_unlock(&kvm->mmu_lock);
-- 
2.35.1.1094.g7c7d902a7c-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] KVM: arm64: Don't split hugepages outside of MMU write lock
  2022-04-01 19:46 [PATCH v2] KVM: arm64: Don't split hugepages outside of MMU write lock Oliver Upton
@ 2022-04-02  1:06 ` Reiji Watanabe
  2022-04-06 13:22 ` Marc Zyngier
  1 sibling, 0 replies; 3+ messages in thread
From: Reiji Watanabe @ 2022-04-02  1:06 UTC (permalink / raw)
  To: Oliver Upton, kvmarm
  Cc: kvm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, linux-arm-kernel, Peter Shier, Ricardo Koller,
	Jing Zhang

On 4/1/22 12:46 PM, Oliver Upton wrote:
> It is possible to take a stage-2 permission fault on a page larger than
> PAGE_SIZE. For example, when running a guest backed by 2M HugeTLB, KVM
> eagerly maps at the largest possible block size. When dirty logging is
> enabled on a memslot, KVM does *not* eagerly split these 2M stage-2
> mappings and instead clears the write bit on the pte.
> 
> Since dirty logging is always performed at PAGE_SIZE granularity, KVM
> lazily splits these 2M block mappings down to PAGE_SIZE in the stage-2
> fault handler. This operation must be done under the write lock. Since
> commit f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission
> relaxation during dirty logging"), the stage-2 fault handler
> conditionally takes the read lock on permission faults with dirty
> logging enabled. To that end, it is possible to split a 2M block mapping
> while only holding the read lock.
> 
> The problem is demonstrated by running kvm_page_table_test with 2M
> anonymous HugeTLB, which splats like so:
> 
>    WARNING: CPU: 5 PID: 15276 at arch/arm64/kvm/hyp/pgtable.c:153 stage2_map_walk_leaf+0x124/0x158
> 
>    [...]
> 
>    Call trace:
>    stage2_map_walk_leaf+0x124/0x158
>    stage2_map_walker+0x5c/0xf0
>    __kvm_pgtable_walk+0x100/0x1d4
>    __kvm_pgtable_walk+0x140/0x1d4
>    __kvm_pgtable_walk+0x140/0x1d4
>    kvm_pgtable_walk+0xa0/0xf8
>    kvm_pgtable_stage2_map+0x15c/0x198
>    user_mem_abort+0x56c/0x838
>    kvm_handle_guest_abort+0x1fc/0x2a4
>    handle_exit+0xa4/0x120
>    kvm_arch_vcpu_ioctl_run+0x200/0x448
>    kvm_vcpu_ioctl+0x588/0x664
>    __arm64_sys_ioctl+0x9c/0xd4
>    invoke_syscall+0x4c/0x144
>    el0_svc_common+0xc4/0x190
>    do_el0_svc+0x30/0x8c
>    el0_svc+0x28/0xcc
>    el0t_64_sync_handler+0x84/0xe4
>    el0t_64_sync+0x1a4/0x1a8
> 
> Fix the issue by only acquiring the read lock if the guest faulted on a
> PAGE_SIZE granule w/ dirty logging enabled. Add a WARN to catch locking
> bugs in future changes.
> 
> Fixes: f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation during dirty logging")
> Cc: Jing Zhang <jingzhangos@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>

Reviewed-by: Reiji Watanabe <reijiw@google.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] KVM: arm64: Don't split hugepages outside of MMU write lock
  2022-04-01 19:46 [PATCH v2] KVM: arm64: Don't split hugepages outside of MMU write lock Oliver Upton
  2022-04-02  1:06 ` Reiji Watanabe
@ 2022-04-06 13:22 ` Marc Zyngier
  1 sibling, 0 replies; 3+ messages in thread
From: Marc Zyngier @ 2022-04-06 13:22 UTC (permalink / raw)
  To: kvmarm, Oliver Upton
  Cc: Reiji Watanabe, Alexandru Elisei, Peter Shier, linux-arm-kernel,
	Jing Zhang, kvm, Suzuki K Poulose, James Morse, Ricardo Koller

On Fri, 1 Apr 2022 19:46:52 +0000, Oliver Upton wrote:
> It is possible to take a stage-2 permission fault on a page larger than
> PAGE_SIZE. For example, when running a guest backed by 2M HugeTLB, KVM
> eagerly maps at the largest possible block size. When dirty logging is
> enabled on a memslot, KVM does *not* eagerly split these 2M stage-2
> mappings and instead clears the write bit on the pte.
> 
> Since dirty logging is always performed at PAGE_SIZE granularity, KVM
> lazily splits these 2M block mappings down to PAGE_SIZE in the stage-2
> fault handler. This operation must be done under the write lock. Since
> commit f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission
> relaxation during dirty logging"), the stage-2 fault handler
> conditionally takes the read lock on permission faults with dirty
> logging enabled. To that end, it is possible to split a 2M block mapping
> while only holding the read lock.
> 
> [...]

Applied to fixes, thanks!

[1/1] KVM: arm64: Don't split hugepages outside of MMU write lock
      commit: f587661f21eb9a38af52488bbe54ce61a64dfae8

Cheers,

	M.
-- 
Without deviation from the norm, progress is not possible.



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-04-06 13:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-04-01 19:46 [PATCH v2] KVM: arm64: Don't split hugepages outside of MMU write lock Oliver Upton
2022-04-02  1:06 ` Reiji Watanabe
2022-04-06 13:22 ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).