linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: James Morse <james.morse@arm.com>
To: Dongjiu Geng <gengdongjiu@huawei.com>
Cc: wuquanming@huawei.com, linux-doc@vger.kernel.org,
	kvm@vger.kernel.org, marc.zyngier@arm.com,
	catalin.marinas@arm.com, corbet@lwn.net, rjw@rjwysocki.net,
	linux@armlinux.org.uk, linuxarm@huawei.com,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	bp@alien8.de, linux-arm-kernel@lists.infradead.org,
	huangshaoyu@huawei.com, pbonzini@redhat.com,
	kvmarm@lists.cs.columbia.edu, devel@acpica.org
Subject: Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
Date: Tue, 14 Nov 2017 16:00:52 +0000	[thread overview]
Message-ID: <5A0B1334.7060500@arm.com> (raw)
In-Reply-To: <1510343650-23659-8-git-send-email-gengdongjiu@huawei.com>

Hi Dongjiu Geng,

On 10/11/17 19:54, Dongjiu Geng wrote:
> If it is not RAS SError, directly inject virtual SError,
> which will keep the old way. If it is RAS SError, firstly
> let host ACPI module to handle it.

> For the ACPI handling,
> if the error address is invalid, APEI driver will not
> identify the address to hwpoison memory and can not notify
> guest to do the recovery.

The guest can't do any recover either. There is no recovery you can do without
some information about what the error is.

This is your memory corruption at an unknown address? We should reboot.

(I agree memory_failure.c's::me_kernel() is ignoring kernel errors, we should
try and fix this. It makes some sense for polled or irq notifications, but not
SEA/SEI).


> In order to safe, KVM continues
> categorizing errors and handle it separately.

> If the RAS error is not propagated, let host user space to
> handle it. 

No. Host user space should not know anything about the kernel or platform RAS
support. Doing so creates an ABI link between EL3 firmware and Qemu. This is
totally unmaintainable.

This thing needs to be portable. The kernel should handle the error, and report
any symptoms to user-space. e.g. 'this memory is gone'.

We shouldn't special case KVM.


> The reason is that sometimes we can only kill the
> guest effected application instead of panic whose guest OS.
> Host user space specifies a valid ESR and inject virtual
> SError, guest can just kill the current application if the
> non-consumed error coming from guest application.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Quanming Wu <wuquanming@huawei.com>

The last Signed-off-by should match the person posting the patch. It's a chain
of custody for GPL-signoff purposes, not a 'partially-written-by'. If you want
to credit Quanming Wu you can add CC and they can Ack/Review your patch.


> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 7debb74..1afdc87 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -178,6 +179,66 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>  	return arm_exit_handlers[hsr_ec];
>  }
>  
> +/**
> + * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
> + * @vcpu:	the VCPU pointer
> + *
> + * For RAS SError interrupt, firstly let host kernel handle it.
> + * If the AET is [ESR_ELx_AET_UER], then let user space handle it,
> + */
> +static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +	unsigned int esr = kvm_vcpu_get_hsr(vcpu);
> +	bool impdef_syndrome =  esr & ESR_ELx_ISV;	/* aka IDS */
> +	unsigned int aet = esr & ESR_ELx_AET;
> +
> +	/*
> +	 * This is not RAS SError
> +	 */
> +	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
> +		kvm_inject_vabt(vcpu);
> +		return 1;
> +	}

> +	/* The host kernel may handle this abort. */
> +	handle_guest_sei();

This has to claim the SError as a notification. If APEI claims the error, KVM
doesn't need to do anything more. You ignore its return code.


> +
> +	/*
> +	 * In below two conditions, it will directly inject the
> +	 * virtual SError:
> +	 * 1. The Syndrome is IMPLEMENTATION DEFINED
> +	 * 2. It is Uncategorized SEI
> +	 */
> +	if (impdef_syndrome ||
> +		((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
> +		kvm_inject_vabt(vcpu);
> +		return 1;
> +	}
> +
> +	switch (aet) {
> +	case ESR_ELx_AET_CE:	/* corrected error */
> +	case ESR_ELx_AET_UEO:	/* restartable error, not yet consumed */
> +		return 1;	/* continue processing the guest exit */

> +	case ESR_ELx_AET_UER:	/* The error has not been propagated */
> +		/*
> +		 * Userspace only handle the guest SError Interrupt(SEI) if the
> +		 * error has not been propagated
> +		 */
> +		run->exit_reason = KVM_EXIT_EXCEPTION;
> +		run->ex.exception = ESR_ELx_EC_SERROR;
> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
> +		return 0;

We should not pass RAS notifications to user space. The kernel either handles
them, or it panics(). User space shouldn't even know if the kernel supports RAS
until it gets an MCEERR signal.

You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.

If we get a RAS SError and there are no CPER records or values in the ERR nodes,
we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)


> +	default:
> +		/*
> +		 * Until now, the CPU supports RAS and SEI is fatal, or host
> +		 * does not support to handle the SError.
> +		 */
> +		panic("This Asynchronous SError interrupt is dangerous, panic");
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>   * proper exit to userspace.



James

  reply	other threads:[~2017-11-14 16:00 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-10 19:54 [PATCH v8 0/7] Support RAS virtualization in KVM Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 1/7] arm64: cpufeature: Detect CPU RAS Extentions Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 2/7] KVM: arm64: Save ESR_EL2 on guest SError Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 3/7] acpi: apei: Add SEI notification type support for ARMv8 Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 4/7] KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization Dongjiu Geng
2017-11-14 16:00   ` James Morse [this message]
2017-11-15 11:29     ` gengdongjiu
2017-12-06 10:26     ` gengdongjiu
2017-12-06 19:04       ` James Morse
2017-12-07  6:37         ` gengdongjiu
2017-12-15  3:30           ` gengdongjiu
2018-01-12 18:05             ` James Morse
2018-01-15  8:33               ` Christoffer Dall
2018-01-16 11:19                 ` gengdongjiu
2018-01-21  3:10                 ` gengdongjiu
2018-01-21  2:45               ` gengdongjiu
2018-01-22 19:32                 ` James Morse
2017-12-15 18:52           ` James Morse
2017-12-16  3:44             ` gengdongjiu
2018-01-22 19:36               ` James Morse
2017-12-16  4:47     ` gengdongjiu
2018-01-12 18:05       ` James Morse
2018-01-16 11:22         ` gengdongjiu
2018-01-21  2:54         ` gengdongjiu
2017-11-14 16:00 ` [PATCH v8 0/7] Support RAS virtualization in KVM James Morse
2017-11-15 11:06   ` gengdongjiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5A0B1334.7060500@arm.com \
    --to=james.morse@arm.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=devel@acpica.org \
    --cc=gengdongjiu@huawei.com \
    --cc=huangshaoyu@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxarm@huawei.com \
    --cc=marc.zyngier@arm.com \
    --cc=pbonzini@redhat.com \
    --cc=rjw@rjwysocki.net \
    --cc=wuquanming@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).