kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Set up KVM_EXIT_MEMORY_FAULTs when arm64/x86 stage-2 fault handlers fail
@ 2024-08-09 20:51 Anish Moorthy
  2024-08-09 20:51 ` [PATCH v2 1/3] KVM: Documentation: Clarify docs for KVM_CAP_MEMORY_FAULT_INFO Anish Moorthy
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Anish Moorthy @ 2024-08-09 20:51 UTC (permalink / raw)
  To: seanjc, oliver.upton, kvm, kvmarm; +Cc: jthoughton, amoorthy, rananta

v2:
  - x86 patch unchanged
  - Arm patches merged [Oliver]
  - Do memory fault exits in *all* EFAULT cases of user_mem_abort()
    [Oliver]
  - Add patch to tweak docs for KVM_CAP_MEMORY_FAULT_INFO

v1: https://lore.kernel.org/kvm/20240215235405.368539-1-amoorthy@google.com/

Anish Moorthy (3):
  KVM: Documentation: Clarify docs for KVM_CAP_MEMORY_FAULT_INFO
  KVM: x86: Do a KVM_MEMORY_FAULT EXIT when stage-2 fault handler
    EFAULTs
  KVM: arm64: Perform memory fault exits when stage-2 handler EFAULTs

 Documentation/virt/kvm/api.rst |  7 ++++---
 arch/arm64/kvm/arm.c           |  1 +
 arch/arm64/kvm/mmu.c           | 11 ++++++++++-
 arch/x86/kvm/mmu/mmu.c         |  1 +
 4 files changed, 16 insertions(+), 4 deletions(-)


base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
-- 
2.46.0.76.ge559c4bf1a-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/3] KVM: Documentation: Clarify docs for KVM_CAP_MEMORY_FAULT_INFO
  2024-08-09 20:51 [PATCH v2 0/3] Set up KVM_EXIT_MEMORY_FAULTs when arm64/x86 stage-2 fault handlers fail Anish Moorthy
@ 2024-08-09 20:51 ` Anish Moorthy
  2024-08-16 20:53   ` Sean Christopherson
  2024-08-09 20:51 ` [PATCH v2 2/3] KVM: x86: Do a KVM_MEMORY_FAULT EXIT when stage-2 fault handler EFAULTs Anish Moorthy
  2024-08-09 20:51 ` [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 " Anish Moorthy
  2 siblings, 1 reply; 11+ messages in thread
From: Anish Moorthy @ 2024-08-09 20:51 UTC (permalink / raw)
  To: seanjc, oliver.upton, kvm, kvmarm; +Cc: jthoughton, amoorthy, rananta

The initial paragraph of the documentation here makes it sound like a
KVM_EXIT_MEMORY_FAULT will always accompany an EFAULT from KVM_RUN, but
that's not a guarantee.

Also, define zero to be a special value for the "size" field. This
allows memory faults exits to be set up in spots where KVM_RUN must
EFAULT, but is not able to supply an accurate size.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 Documentation/virt/kvm/api.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 8e5dad80b337..c5ce7944005c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7073,7 +7073,8 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
 
 KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
 could not be resolved by KVM.  The 'gpa' and 'size' (in bytes) describe the
-guest physical address range [gpa, gpa + size) of the fault.  The 'flags' field
+guest physical address range [gpa, gpa + size) of the fault: when zero, it
+indicates that the size of the fault could not be determined. The 'flags' field
 describes properties of the faulting access that are likely pertinent:
 
  - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
@@ -8131,7 +8132,7 @@ unavailable to host or other VMs.
 :Architectures: x86
 :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
 
-The presence of this capability indicates that KVM_RUN will fill
+The presence of this capability indicates that KVM_RUN *may* fill
 kvm_run.memory_fault if KVM cannot resolve a guest page fault VM-Exit, e.g. if
 there is a valid memslot but no backing VMA for the corresponding host virtual
 address.
-- 
2.46.0.76.ge559c4bf1a-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] KVM: x86: Do a KVM_MEMORY_FAULT EXIT when stage-2 fault handler EFAULTs
  2024-08-09 20:51 [PATCH v2 0/3] Set up KVM_EXIT_MEMORY_FAULTs when arm64/x86 stage-2 fault handlers fail Anish Moorthy
  2024-08-09 20:51 ` [PATCH v2 1/3] KVM: Documentation: Clarify docs for KVM_CAP_MEMORY_FAULT_INFO Anish Moorthy
@ 2024-08-09 20:51 ` Anish Moorthy
  2024-08-16 20:57   ` Sean Christopherson
  2024-08-09 20:51 ` [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 " Anish Moorthy
  2 siblings, 1 reply; 11+ messages in thread
From: Anish Moorthy @ 2024-08-09 20:51 UTC (permalink / raw)
  To: seanjc, oliver.upton, kvm, kvmarm; +Cc: jthoughton, amoorthy, rananta

Right now userspace just gets a bare EFAULT when the stage-2 fault
handler fails to fault in the relevant page. Set up a memory fault exit
when this happens, which at the very least eases debugging and might
also let userspace decide on/take some specific action other than
crashing the VM.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 901be9e420a4..c22c807696ae 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3264,6 +3264,7 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
 		return RET_PF_RETRY;
 	}
 
+	kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 	return -EFAULT;
 }
 
-- 
2.46.0.76.ge559c4bf1a-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 handler EFAULTs
  2024-08-09 20:51 [PATCH v2 0/3] Set up KVM_EXIT_MEMORY_FAULTs when arm64/x86 stage-2 fault handlers fail Anish Moorthy
  2024-08-09 20:51 ` [PATCH v2 1/3] KVM: Documentation: Clarify docs for KVM_CAP_MEMORY_FAULT_INFO Anish Moorthy
  2024-08-09 20:51 ` [PATCH v2 2/3] KVM: x86: Do a KVM_MEMORY_FAULT EXIT when stage-2 fault handler EFAULTs Anish Moorthy
@ 2024-08-09 20:51 ` Anish Moorthy
  2024-08-12  7:51   ` Aneesh Kumar K.V
  2024-08-16 21:22   ` Sean Christopherson
  2 siblings, 2 replies; 11+ messages in thread
From: Anish Moorthy @ 2024-08-09 20:51 UTC (permalink / raw)
  To: seanjc, oliver.upton, kvm, kvmarm; +Cc: jthoughton, amoorthy, rananta

Right now userspace just gets a bare EFAULT when the stage-2 fault
handler fails to fault in the relevant page. Set up a
KVM_EXIT_MEMORY_FAULT whenever this happens, which at the very least
eases debugging and might also let userspace decide on/take some
specific action other than crashing the VM.

In some cases, user_mem_abort() EFAULTs before the size of the fault is
calculated: return 0 in these cases to indicate that the fault is of
unknown size.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
---
 Documentation/virt/kvm/api.rst |  2 +-
 arch/arm64/kvm/arm.c           |  1 +
 arch/arm64/kvm/mmu.c           | 11 ++++++++++-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c5ce7944005c..7b321fefcb3e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8129,7 +8129,7 @@ unavailable to host or other VMs.
 7.34 KVM_CAP_MEMORY_FAULT_INFO
 ------------------------------
 
-:Architectures: x86
+:Architectures: arm64, x86
 :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
 
 The presence of this capability indicates that KVM_RUN *may* fill
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a7ca776b51ec..4121b5a43b9c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -335,6 +335,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_SYSTEM_SUSPEND:
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_COUNTER_OFFSET:
+	case KVM_CAP_MEMORY_FAULT_INFO:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6981b1bc0946..c97199d1feac 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1448,6 +1448,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	if (fault_is_perm && !write_fault && !exec_fault) {
 		kvm_err("Unexpected L2 read permission error\n");
+		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,
+					      write_fault, exec_fault, false);
 		return -EFAULT;
 	}
 
@@ -1473,6 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (unlikely(!vma)) {
 		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
 		mmap_read_unlock(current->mm);
+		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,
+					      write_fault, exec_fault, false);
 		return -EFAULT;
 	}
 
@@ -1568,8 +1572,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
 	}
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(pfn)) {
+		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, vma_pagesize,
+					      write_fault, exec_fault, false);
 		return -EFAULT;
+	}
 
 	if (kvm_is_device_pfn(pfn)) {
 		/*
@@ -1643,6 +1650,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (mte_allowed) {
 			sanitise_mte_tags(kvm, pfn, vma_pagesize);
 		} else {
+			kvm_prepare_memory_fault_exit(vcpu, fault_ipa, vma_pagesize,
+						      write_fault, exec_fault, false);
 			ret = -EFAULT;
 			goto out_unlock;
 		}
-- 
2.46.0.76.ge559c4bf1a-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 handler EFAULTs
  2024-08-09 20:51 ` [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 " Anish Moorthy
@ 2024-08-12  7:51   ` Aneesh Kumar K.V
  2024-08-13 14:26     ` Sean Christopherson
  2024-08-16 21:22   ` Sean Christopherson
  1 sibling, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2024-08-12  7:51 UTC (permalink / raw)
  To: Anish Moorthy, seanjc, oliver.upton, kvm, kvmarm
  Cc: jthoughton, amoorthy, rananta

Anish Moorthy <amoorthy@google.com> writes:

> Right now userspace just gets a bare EFAULT when the stage-2 fault
> handler fails to fault in the relevant page. Set up a
> KVM_EXIT_MEMORY_FAULT whenever this happens, which at the very least
> eases debugging and might also let userspace decide on/take some
> specific action other than crashing the VM.
>
> In some cases, user_mem_abort() EFAULTs before the size of the fault is
> calculated: return 0 in these cases to indicate that the fault is of
> unknown size.
>

VMMs are now converting private memory to shared or vice-versa on vcpu
exit due to memory fault. This change will require VMM track each page's
private/shared state so that they can now handle an exit fault on a
shared memory where the fault happened due to reasons other than
conversion. Should we make it easy by adding additional flag bits to
indicate the fault was due to attribute and access type mismatch?


> Signed-off-by: Anish Moorthy <amoorthy@google.com>
> ---
>  Documentation/virt/kvm/api.rst |  2 +-
>  arch/arm64/kvm/arm.c           |  1 +
>  arch/arm64/kvm/mmu.c           | 11 ++++++++++-
>  3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c5ce7944005c..7b321fefcb3e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8129,7 +8129,7 @@ unavailable to host or other VMs.
>  7.34 KVM_CAP_MEMORY_FAULT_INFO
>  ------------------------------
>  
> -:Architectures: x86
> +:Architectures: arm64, x86
>  :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
>  
>  The presence of this capability indicates that KVM_RUN *may* fill
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index a7ca776b51ec..4121b5a43b9c 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -335,6 +335,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_ARM_SYSTEM_SUSPEND:
>  	case KVM_CAP_IRQFD_RESAMPLE:
>  	case KVM_CAP_COUNTER_OFFSET:
> +	case KVM_CAP_MEMORY_FAULT_INFO:
>  		r = 1;
>  		break;
>  	case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 6981b1bc0946..c97199d1feac 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1448,6 +1448,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	if (fault_is_perm && !write_fault && !exec_fault) {
>  		kvm_err("Unexpected L2 read permission error\n");
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,
> +					      write_fault, exec_fault, false);
>  		return -EFAULT;
>  	}
>  
> @@ -1473,6 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (unlikely(!vma)) {
>  		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
>  		mmap_read_unlock(current->mm);
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,
> +					      write_fault, exec_fault, false);
>  		return -EFAULT;
>  	}
>  
> @@ -1568,8 +1572,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		kvm_send_hwpoison_signal(hva, vma_shift);
>  		return 0;
>  	}
> -	if (is_error_noslot_pfn(pfn))
> +	if (is_error_noslot_pfn(pfn)) {
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, vma_pagesize,
> +					      write_fault, exec_fault, false);
>  		return -EFAULT;
> +	}
>  
>  	if (kvm_is_device_pfn(pfn)) {
>  		/*
> @@ -1643,6 +1650,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		if (mte_allowed) {
>  			sanitise_mte_tags(kvm, pfn, vma_pagesize);
>  		} else {
> +			kvm_prepare_memory_fault_exit(vcpu, fault_ipa, vma_pagesize,
> +						      write_fault, exec_fault, false);
>  			ret = -EFAULT;
>  			goto out_unlock;
>  		}
> -- 
> 2.46.0.76.ge559c4bf1a-goog

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 handler EFAULTs
  2024-08-12  7:51   ` Aneesh Kumar K.V
@ 2024-08-13 14:26     ` Sean Christopherson
  2024-08-14  8:02       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2024-08-13 14:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Anish Moorthy, oliver.upton, kvm, kvmarm, jthoughton, rananta

On Mon, Aug 12, 2024, Aneesh Kumar K.V wrote:
> Anish Moorthy <amoorthy@google.com> writes:
> 
> > Right now userspace just gets a bare EFAULT when the stage-2 fault
> > handler fails to fault in the relevant page. Set up a
> > KVM_EXIT_MEMORY_FAULT whenever this happens, which at the very least
> > eases debugging and might also let userspace decide on/take some
> > specific action other than crashing the VM.
> >
> > In some cases, user_mem_abort() EFAULTs before the size of the fault is
> > calculated: return 0 in these cases to indicate that the fault is of
> > unknown size.
> >
> 
> VMMs are now converting private memory to shared or vice-versa on vcpu
> exit due to memory fault. This change will require VMM track each page's
> private/shared state so that they can now handle an exit fault on a
> shared memory where the fault happened due to reasons other than
> conversion.

I don't see how filling kvm_run.memory_fault in more locations changes anything.
The userspace exits are inherently racy, e.g. userspace may have already converted
the page to the appropriate state, thus making KVM's exit spurious.  So either
the VMM already tracks state, or the VMM blindly converts to shared/private.

> Should we make it easy by adding additional flag bits to
> indicate the fault was due to attribute and access type mismatch?

Like above, describing _why_ an exit occurred is problematic when an exit races
with a "fix" from userspace.  It's also problematic when there are multiple
possible faults, e.g. if the guest attempts to write to private memory, but
userspace has the memory mapped as read-only, shared (contrived, but possible).
Describing only the fault that KVM's see means the vCPU will encounter multiple
faults, and userspace will end up getting multiple exits

Instead, KVM should describe the access that led to the fault, as planned in the
original series[1][2].  Userpace can then get the page into the correct state
straightaway, or take punitive action if the guest is misbehaving.

	if (is_write)
		vcpu->run->memory_fault.flags |= KVM_MEMORY_FAULT_FLAG_WRITE;
	else if (is_exec)
		vcpu->run->memory_fault.flags |= KVM_MEMORY_FAULT_FLAG_EXEC;
	else
		vcpu->run->memory_fault.flags |= KVM_MEMORY_FAULT_FLAG_READ;

That said, I'm a little hesitant to capture RWX information without a use case,
mainly because it will require a new capability for userspace to be able to rely
on the information.  In hindsight, it probably would have been better to capture
RWX information in the initial implementation.  Doh.

[1] https://lore.kernel.org/all/ZIn6VQSebTRN1jtX@google.com
[2] https://lore.kernel.org/all/ZR4N8cwzTMDanPUY@google.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 handler EFAULTs
  2024-08-13 14:26     ` Sean Christopherson
@ 2024-08-14  8:02       ` Aneesh Kumar K.V
  2024-08-14 14:49         ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Aneesh Kumar K.V @ 2024-08-14  8:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Anish Moorthy, oliver.upton, kvm, kvmarm, jthoughton, rananta

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 12, 2024, Aneesh Kumar K.V wrote:
>> Anish Moorthy <amoorthy@google.com> writes:
>>
>> > Right now userspace just gets a bare EFAULT when the stage-2 fault
>> > handler fails to fault in the relevant page. Set up a
>> > KVM_EXIT_MEMORY_FAULT whenever this happens, which at the very least
>> > eases debugging and might also let userspace decide on/take some
>> > specific action other than crashing the VM.
>> >
>> > In some cases, user_mem_abort() EFAULTs before the size of the fault is
>> > calculated: return 0 in these cases to indicate that the fault is of
>> > unknown size.
>> >
>>
>> VMMs are now converting private memory to shared or vice-versa on vcpu
>> exit due to memory fault. This change will require VMM track each page's
>> private/shared state so that they can now handle an exit fault on a
>> shared memory where the fault happened due to reasons other than
>> conversion.
>
> I don't see how filling kvm_run.memory_fault in more locations changes anything.
> The userspace exits are inherently racy, e.g. userspace may have already converted
> the page to the appropriate state, thus making KVM's exit spurious.  So either
> the VMM already tracks state, or the VMM blindly converts to shared/private.
>

I might be missing some details here. The change is adding exit_reason =
KVM_EXIT_MEMORY_FAULT to code path which would earlier result in VMM
panics?

For ex:

@@ -1473,6 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
	if (unlikely(!vma)) {
		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
		mmap_read_unlock(current->mm);
+		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,
+					      write_fault, exec_fault, false);
		return -EFAULT;
	}


VMMs handle this with code as below

static bool handle_memoryfault(struct kvm_cpu *vcpu)
{
....
        return true;
}

bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
{
	switch (vcpu->kvm_run->exit_reason) {
        ...
	case KVM_EXIT_MEMORY_FAULT:
		return handle_memoryfault(vcpu);
	}

	return false;
}

and the caller did

		ret = kvm_cpu__handle_exit(cpu);
		if (!ret)
			goto panic_kvm;
		break;


This change will break those VMMs isn't? ie, we will not panic after
this change?

-aneesh

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 handler EFAULTs
  2024-08-14  8:02       ` Aneesh Kumar K.V
@ 2024-08-14 14:49         ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-08-14 14:49 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Anish Moorthy, oliver.upton, kvm, kvmarm, jthoughton, rananta

On Wed, Aug 14, 2024, Aneesh Kumar K.V wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Mon, Aug 12, 2024, Aneesh Kumar K.V wrote:
> >> Anish Moorthy <amoorthy@google.com> writes:
> >>
> >> > Right now userspace just gets a bare EFAULT when the stage-2 fault
> >> > handler fails to fault in the relevant page. Set up a
> >> > KVM_EXIT_MEMORY_FAULT whenever this happens, which at the very least
> >> > eases debugging and might also let userspace decide on/take some
> >> > specific action other than crashing the VM.
> >> >
> >> > In some cases, user_mem_abort() EFAULTs before the size of the fault is
> >> > calculated: return 0 in these cases to indicate that the fault is of
> >> > unknown size.
> >> >
> >>
> >> VMMs are now converting private memory to shared or vice-versa on vcpu
> >> exit due to memory fault. This change will require VMM track each page's
> >> private/shared state so that they can now handle an exit fault on a
> >> shared memory where the fault happened due to reasons other than
> >> conversion.
> >
> > I don't see how filling kvm_run.memory_fault in more locations changes anything.
> > The userspace exits are inherently racy, e.g. userspace may have already converted
> > the page to the appropriate state, thus making KVM's exit spurious.  So either
> > the VMM already tracks state, or the VMM blindly converts to shared/private.
> >
> 
> I might be missing some details here. The change is adding exit_reason =
> KVM_EXIT_MEMORY_FAULT to code path which would earlier result in VMM
> panics?
> 
> For ex:
> 
> @@ -1473,6 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> 	if (unlikely(!vma)) {
> 		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> 		mmap_read_unlock(current->mm);
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,
> +					      write_fault, exec_fault, false);
> 		return -EFAULT;
> 	}
> 
> 
> VMMs handle this with code as below
> 
> static bool handle_memoryfault(struct kvm_cpu *vcpu)
> {
> ....
>         return true;
> }
> 
> bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
> {
> 	switch (vcpu->kvm_run->exit_reason) {
>         ...
> 	case KVM_EXIT_MEMORY_FAULT:
> 		return handle_memoryfault(vcpu);
> 	}
> 
> 	return false;
> }
> 
> and the caller did
> 
> 		ret = kvm_cpu__handle_exit(cpu);
> 		if (!ret)
> 			goto panic_kvm;
> 		break;
> 
> 
> This change will break those VMMs isn't? ie, we will not panic after
> this change?

If the VMM unconditionally resumes the guest on errno=EFAULT, that's a VMM bug.
handle_memoryfault() needs to have some amount of checking to verify that it can
actually resolve the fault that was reported, given the gfn and metadata.  In
practice, that means panicking on any gfn that's not associated with a memslot
that has KVM_MEM_GUEST_MEMFD, because prior to this series, it's impossible for
userspace to resolve any faults besides implict shared<=>private conversions.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] KVM: Documentation: Clarify docs for KVM_CAP_MEMORY_FAULT_INFO
  2024-08-09 20:51 ` [PATCH v2 1/3] KVM: Documentation: Clarify docs for KVM_CAP_MEMORY_FAULT_INFO Anish Moorthy
@ 2024-08-16 20:53   ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-08-16 20:53 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: oliver.upton, kvm, kvmarm, jthoughton, rananta

On Fri, Aug 09, 2024, Anish Moorthy wrote:
> The initial paragraph of the documentation here makes it sound like a
> KVM_EXIT_MEMORY_FAULT will always accompany an EFAULT from KVM_RUN, but
> that's not a guarantee.
> 
> Also, define zero to be a special value for the "size" field. This
> allows memory faults exits to be set up in spots where KVM_RUN must
> EFAULT, but is not able to supply an accurate size.
> 
> Signed-off-by: Anish Moorthy <amoorthy@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 8e5dad80b337..c5ce7944005c 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7073,7 +7073,8 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
>  
>  KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
>  could not be resolved by KVM.  The 'gpa' and 'size' (in bytes) describe the
> -guest physical address range [gpa, gpa + size) of the fault.  The 'flags' field
> +guest physical address range [gpa, gpa + size) of the fault: when zero, it
> +indicates that the size of the fault could not be determined. The 'flags' field
>  describes properties of the faulting access that are likely pertinent:
>  
>   - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
> @@ -8131,7 +8132,7 @@ unavailable to host or other VMs.
>  :Architectures: x86
>  :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
>  
> -The presence of this capability indicates that KVM_RUN will fill
> +The presence of this capability indicates that KVM_RUN *may* fill

I would prefer to fix KVM than to change the documentation.  The "will fill" is
specifically scoped to guest page fault VM-Exits, so it should be a fully solvable
problem.  I don't want to leave wriggle room for KVM, because then it will be
quite difficult for userspace to do anything useful with memory_fault.

E.g. for x86, convert all -EFAULTs that are returned when KVM is hosed to -EIO
and KVM_BUG_ON, and then there's only one -EFAULT that doesn't fill memory_fault.
Completely untested...

---
 arch/x86/kvm/mmu/mmu.c         | 13 +++++++------
 arch/x86/kvm/mmu/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 928cf84778b0..cb4e3a1041ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3225,8 +3225,8 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 					     fault->req_level >= it.level);
 	}
 
-	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
+	if (KVM_BUG_ON(it.level != fault->goal_level, vcpu->kvm))
+		return -EIO;
 
 	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
 			   base_gfn, fault->pfn, fault);
@@ -3264,6 +3264,7 @@ static int kvm_handle_error_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
 		return RET_PF_RETRY;
 	}
 
+	kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
 	return -EFAULT;
 }
 
@@ -4597,8 +4598,8 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 
 #ifndef CONFIG_X86_64
 	/* A 64-bit CR2 should be impossible on 32-bit KVM. */
-	if (WARN_ON_ONCE(fault_address >> 32))
-		return -EFAULT;
+	if (KVM_BUG_ON(fault_address >> 32, vcpu->kvm))
+		return -EIO;
 #endif
 	/*
 	 * Legacy #PF exception only have a 32-bit error code.  Simply drop the
@@ -5988,8 +5989,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 
 	r = RET_PF_INVALID;
 	if (unlikely(error_code & PFERR_RSVD_MASK)) {
-		if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
-			return -EFAULT;
+		if (KVM_BUG_ON(error_code & PFERR_PRIVATE_ACCESS, vcpu->kvm))
+			return -EIO;
 
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
 		if (r == RET_PF_EMULATE)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 69941cebb3a8..4f4704c65c40 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -745,8 +745,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 					     fault->req_level >= it.level);
 	}
 
-	if (WARN_ON_ONCE(it.level != fault->goal_level))
-		return -EFAULT;
+	if (KVM_BUG_ON(it.level != fault->goal_level, vcpu->kvm))
+		return -EIO;
 
 	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, gw->pte_access,
 			   base_gfn, fault->pfn, fault);

base-commit: 12ac7b9981ff30f0deffe6331bb742c71b279300
-- 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] KVM: x86: Do a KVM_MEMORY_FAULT EXIT when stage-2 fault handler EFAULTs
  2024-08-09 20:51 ` [PATCH v2 2/3] KVM: x86: Do a KVM_MEMORY_FAULT EXIT when stage-2 fault handler EFAULTs Anish Moorthy
@ 2024-08-16 20:57   ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-08-16 20:57 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: oliver.upton, kvm, kvmarm, jthoughton, rananta

On Fri, Aug 09, 2024, Anish Moorthy wrote:
> Right now userspace just gets a bare EFAULT when the stage-2 fault
> handler fails to fault in the relevant page. Set up a memory fault exit
> when this happens, which at the very least eases debugging and might
> also let userspace decide on/take some specific action other than
> crashing the VM.

Heh, most of the way there (from my hack-a-patch response), just need to add
the KVM_BUG_ON() + -EIO conversions.

Can you send x86 and arm64 as separate series for v3?  E.g. for x86, just this
patch and the -EIO changes.  I'm pretty sure the docs updates can go in the arm64
series (I need to send another response to that patch).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 handler EFAULTs
  2024-08-09 20:51 ` [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 " Anish Moorthy
  2024-08-12  7:51   ` Aneesh Kumar K.V
@ 2024-08-16 21:22   ` Sean Christopherson
  1 sibling, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2024-08-16 21:22 UTC (permalink / raw)
  To: Anish Moorthy; +Cc: oliver.upton, kvm, kvmarm, jthoughton, rananta

On Fri, Aug 09, 2024, Anish Moorthy wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 6981b1bc0946..c97199d1feac 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1448,6 +1448,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	if (fault_is_perm && !write_fault && !exec_fault) {
>  		kvm_err("Unexpected L2 read permission error\n");
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,

In this case, KVM has the fault granule, can't we just use that instead of
reporting '0'?

> +					      write_fault, exec_fault, false);
>  		return -EFAULT;
>  	}
>  
> @@ -1473,6 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (unlikely(!vma)) {
>  		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
>  		mmap_read_unlock(current->mm);
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, 0,

Why can't KVM use the minimum possible page (granule?) size.  It's _always_ legal
for KVM to map at a smaller granularity than the primary MMU, thus it's always
accurate to report KVM's minimum size.

In fact, I would argue that it's inaccurate to report anything larger, because
there's no way for KVM to know if the badness extends to an entire hugepage.
E.g. even in the MTE case below, reporting the vma _page size_ is weird.  IIUC,
the problem exists with the entire vma, not some random (huge)page in the vma.

> +					      write_fault, exec_fault, false);
>  		return -EFAULT;
>  	}
>  
> @@ -1568,8 +1572,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		kvm_send_hwpoison_signal(hva, vma_shift);
>  		return 0;
>  	}
> -	if (is_error_noslot_pfn(pfn))
> +	if (is_error_noslot_pfn(pfn)) {
> +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, vma_pagesize,
> +					      write_fault, exec_fault, false);
>  		return -EFAULT;

Shouldn't this be:

	if (KVM_BUG_ON(is_error_noslot_pfn(pfn), vcpu->kvm))
		return -EIO;

Emulated MMIO is suppposed to be handled in kvm_handle_guest_abort():

	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {o
		...

		/*
		 * The IPA is reported as [MAX:12], so we need to
		 * complement it with the bottom 12 bits from the
		 * faulting VA. This is always 12 bits, irrespective
		 * of the page size.
		 */
		ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
		ret = io_mem_abort(vcpu, ipa);
		goto out_unlock;
	}

And the memslot itself is stable, e.g. it can't disappear, and it can't have its
flags toggled.  KVM specifically does all modifications to memslots on unreachable
structures so that a memslot cannot change once it has been retrieved from the
memslots tree. 
	/*
	 * Mark the current slot INVALID.  As with all memslot modifications,
	 * this must be done on an unreachable slot to avoid modifying the
	 * current slot in the active tree.
	 */
	kvm_copy_memslot(invalid_slot, old);
	invalid_slot->flags |= KVM_MEMSLOT_INVALID;
	kvm_replace_memslot(kvm, old, invalid_slot);

And if KVM were indeed re-retrieving the memslot from kvm->memslots, then the
appropriate behavior would be

	if (is_error_noslot_pfn(pfn))
		return -EAGAIN;

so that KVM retries the fault.  It's perfectly legal to delete a memslot at any
time, with the rather large caveat that if bad things happen to the guest, it's
userspace responsibility to deal with the fallout.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-08-16 21:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-09 20:51 [PATCH v2 0/3] Set up KVM_EXIT_MEMORY_FAULTs when arm64/x86 stage-2 fault handlers fail Anish Moorthy
2024-08-09 20:51 ` [PATCH v2 1/3] KVM: Documentation: Clarify docs for KVM_CAP_MEMORY_FAULT_INFO Anish Moorthy
2024-08-16 20:53   ` Sean Christopherson
2024-08-09 20:51 ` [PATCH v2 2/3] KVM: x86: Do a KVM_MEMORY_FAULT EXIT when stage-2 fault handler EFAULTs Anish Moorthy
2024-08-16 20:57   ` Sean Christopherson
2024-08-09 20:51 ` [PATCH v2 3/3] KVM: arm64: Perform memory fault exits when stage-2 " Anish Moorthy
2024-08-12  7:51   ` Aneesh Kumar K.V
2024-08-13 14:26     ` Sean Christopherson
2024-08-14  8:02       ` Aneesh Kumar K.V
2024-08-14 14:49         ` Sean Christopherson
2024-08-16 21:22   ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).