linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] KVM: arm64: handle the translation table walk RAS error
  2017-11-29 20:48 [PATCH] KVM: arm64: handle the translation table walk RAS error Dongjiu Geng
@ 2017-11-29 13:22 ` Christoffer Dall
  2017-11-30 11:32   ` gengdongjiu
  0 siblings, 1 reply; 3+ messages in thread
From: Christoffer Dall @ 2017-11-29 13:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 30, 2017 at 04:48:44AM +0800, Dongjiu Geng wrote:
> There are two types of RAS Synchronous External Abort (SEA).
> One is raised on a memory access and is handled by the host
> APEI driver. The other is raised on a translation table walk;
> in essence, it is a hardware memory error in a stage 1 or
> stage 2 page table.
> 
> For a guest stage 1 translation table error, if the host APEI
> driver handles it, the driver unmaps the page that backs the
> stage 1 page table. After the switch back to the guest, the
> guest reuses this page table and generates a stage 2 data
> abort, and KVM delivers SIGBUS to user space. User space
> injects the error into the guest. While handling the abort,
> the guest may use the same stage 1 page table again, but the
> host APEI driver has already unmapped it, so another stage 2
> data abort is generated. This leads to an endless loop.

Why does it lead to a loop? If the host has marked a page as unusable,
shouldn't the guest stage 1 page table be backed by a different page
when the fault happens on stage 2?

> 
> For a guest stage 2 translation table error, the host APEI
> driver takes no action when it handles the error.
> 
> For these reasons, inject this Synchronous External Abort
> directly into the guest and let the guest handle it, for
> example by killing the guest application or panicking the
> guest OS.

I don't see why we need to distinguish between what caused a memory
access error, a direct access or a page table walk, in terms of how the
host/guest interaction works here.

What is the fundamental difference?

Thanks,
-Christoffer

> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |  2 ++
>  virt/kvm/arm/mmu.c               | 14 ++++++++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 1188272..b8cb67a 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -217,6 +217,8 @@
>  #define FSC_SECC_TTW2	(0x1e)
>  #define FSC_SECC_TTW3	(0x1f)
>  
> +#define FSC_SEA_TTW    FSC_SEA_TTW0
> +
>  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>  #define HPFAR_MASK	(~UL(0xf))
>  
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index b36945d..6eab82d 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1484,8 +1484,18 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	/* Synchronous External Abort? */
>  	if (kvm_vcpu_dabt_isextabt(vcpu)) {
>  		/*
> -		 * For RAS the host kernel may handle this abort.
> -		 * There is no need to pass the error into the guest.
> +		 * For RAS translation table walk abort, pass the error
> +		 * into the guest.
> +		 */
> +		if (fault_status == FSC_SEA_TTW) {
> +			kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> +			return 1;
> +		}
> +
> +		/*
> +		 * For RAS normal memory access abort, the host kernel may
> +		 * handle this abort. There is no need to pass the error into
> +		 * the guest.
>  		 */
>  		if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
>  			return 1;
> -- 
> 1.9.1
> 
> 


* [PATCH] KVM: arm64: handle the translation table walk RAS error
@ 2017-11-29 20:48 Dongjiu Geng
  2017-11-29 13:22 ` Christoffer Dall
  0 siblings, 1 reply; 3+ messages in thread
From: Dongjiu Geng @ 2017-11-29 20:48 UTC (permalink / raw)
  To: linux-arm-kernel

There are two types of RAS Synchronous External Abort (SEA).
One is raised on a memory access and is handled by the host
APEI driver. The other is raised on a translation table walk;
in essence, it is a hardware memory error in a stage 1 or
stage 2 page table.

For a guest stage 1 translation table error, if the host APEI
driver handles it, the driver unmaps the page that backs the
stage 1 page table. After the switch back to the guest, the
guest reuses this page table and generates a stage 2 data
abort, and KVM delivers SIGBUS to user space. User space
injects the error into the guest. While handling the abort,
the guest may use the same stage 1 page table again, but the
host APEI driver has already unmapped it, so another stage 2
data abort is generated. This leads to an endless loop.

For a guest stage 2 translation table error, the host APEI
driver takes no action when it handles the error.

For these reasons, inject this Synchronous External Abort
directly into the guest and let the guest handle it, for
example by killing the guest application or panicking the
guest OS.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
 arch/arm64/include/asm/kvm_arm.h |  2 ++
 virt/kvm/arm/mmu.c               | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 1188272..b8cb67a 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -217,6 +217,8 @@
 #define FSC_SECC_TTW2	(0x1e)
 #define FSC_SECC_TTW3	(0x1f)
 
+#define FSC_SEA_TTW    FSC_SEA_TTW0
+
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK	(~UL(0xf))
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index b36945d..6eab82d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1484,8 +1484,18 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	/* Synchronous External Abort? */
 	if (kvm_vcpu_dabt_isextabt(vcpu)) {
 		/*
-		 * For RAS the host kernel may handle this abort.
-		 * There is no need to pass the error into the guest.
+		 * For RAS translation table walk abort, pass the error
+		 * into the guest.
+		 */
+		if (fault_status == FSC_SEA_TTW) {
+			kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+			return 1;
+		}
+
+		/*
+		 * For RAS normal memory access abort, the host kernel may
+		 * handle this abort. There is no need to pass the error into
+		 * the guest.
 		 */
 		if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
 			return 1;
-- 
1.9.1
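
The branch order introduced by the patch can be modelled in user space.
The sketch below is not kernel code. It assumes that fault_status in
kvm_handle_guest_abort() is the fault status code with its two level bits
masked off (the ESR_ELx_FSC_TYPE mask), which would explain why aliasing
FSC_SEA_TTW to FSC_SEA_TTW0 is enough to match a table-walk abort at any
level. The enum sea_route labels and the helpers is_ext_abort() and
route_abort() are hypothetical names used only for illustration; the
numeric encodings follow the architectural DFSC field.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural fault status encodings (ESR_ELx.DFSC, bits [5:0]). */
#define ESR_ELx_FSC       0x3fu /* full fault status code */
#define ESR_ELx_FSC_TYPE  0x3cu /* same field with the two level bits cleared */
#define FSC_SEA           0x10u /* SEA on a memory access */
#define FSC_SEA_TTW0      0x14u /* SEA on a table walk, level 0 (0x15..0x17: levels 1..3) */
#define FSC_SEA_TTW       FSC_SEA_TTW0 /* alias added by the patch */

/* Assumption: with the level bits masked, level 3 collapses onto level 0. */
static_assert((0x17u & ESR_ELx_FSC_TYPE) == FSC_SEA_TTW, "level bits masked");

/* Hypothetical labels for where the abort is sent; not kernel names. */
enum sea_route { ROUTE_NOT_EXTABT, ROUTE_INJECT_GUEST, ROUTE_HOST_APEI };

/* Model of the external-abort check: SEA and parity/ECC codes only. */
static int is_ext_abort(uint32_t esr)
{
	switch (esr & ESR_ELx_FSC) {
	case 0x10: case 0x14: case 0x15: case 0x16: case 0x17:
	case 0x18: case 0x1c: case 0x1d: case 0x1e: case 0x1f:
		return 1;
	default:
		return 0;
	}
}

/* Hypothetical model of the patched branch order. */
static enum sea_route route_abort(uint32_t esr)
{
	uint32_t fault_status = esr & ESR_ELx_FSC_TYPE;

	if (!is_ext_abort(esr))
		return ROUTE_NOT_EXTABT;

	/* Table-walk SEA: the patch injects it into the guest first. */
	if (fault_status == FSC_SEA_TTW)
		return ROUTE_INJECT_GUEST;

	/* Everything else is offered to the host handler, as before. */
	return ROUTE_HOST_APEI;
}
```

Under that assumption, the codes 0x14 through 0x17 all take the injection
path, while 0x10 and the parity/ECC codes still reach the host handler.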


* [PATCH] KVM: arm64: handle the translation table walk RAS error
  2017-11-29 13:22 ` Christoffer Dall
@ 2017-11-30 11:32   ` gengdongjiu
  0 siblings, 0 replies; 3+ messages in thread
From: gengdongjiu @ 2017-11-30 11:32 UTC (permalink / raw)
  To: linux-arm-kernel


On 2017/11/29 21:22, Christoffer Dall wrote:
>> the guest may use the same stage 1 page table again, but the
>> host APEI driver has already unmapped it, so another stage 2
>> data abort is generated. This leads to an endless loop.
> Why does it lead to a loop? If the host has marked a page as unusable,
> shouldn't the guest stage 1 page table be backed by a different page
> when the fault happens on stage 2?

Thanks a lot for the question and reply. I will do more testing to confirm it.
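
The two outcomes under discussion can be contrasted with a toy model. It
is only a sketch of the reasoning, not of KVM behaviour: struct toy_vm,
rounds_until_progress() and MAX_ROUNDS are hypothetical, and which of the
two assumptions holds on real hardware is exactly what remains to be
confirmed by testing.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ROUNDS 8 /* arbitrary cap so the "loop" case terminates */
static_assert(MAX_ROUNDS > 1, "need at least one retry to observe progress");

/* Toy state: is the IPA holding the guest stage 1 table mapped at stage 2? */
struct toy_vm {
	bool s1_table_mapped;
};

/*
 * Returns the number of stage 2 aborts taken before the guest can walk
 * its stage 1 table again, or -1 if it is still faulting after
 * MAX_ROUNDS attempts.
 */
static int rounds_until_progress(bool remap_on_stage2_fault)
{
	/* Starting point: host APEI handling has unmapped the bad page. */
	struct toy_vm vm = { .s1_table_mapped = false };

	for (int aborts = 0; aborts < MAX_ROUNDS; aborts++) {
		if (vm.s1_table_mapped)
			return aborts;

		/* The guest touches its stage 1 table: stage 2 data abort. */
		if (remap_on_stage2_fault)
			vm.s1_table_mapped = true; /* backed by a different page */
	}
	return -1;
}
```

With remap_on_stage2_fault set, the model makes progress after a single
abort; without it, the guest faults until the cap is reached, which is
the loop described in the commit message.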

> 

