Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: paulus@ozlabs.org, linuxppc-dev@ozlabs.org,
	kvm-ppc@vger.kernel.org, mahesh@linux.vnet.ibm.com,
	michaele@au1.ibm.com, agraf@suse.de, kvm@vger.kernel.org
Subject: Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception
Date: Thu, 12 Nov 2015 10:48:34 +0530	[thread overview]
Message-ID: <5644212A.2000804@linux.vnet.ibm.com> (raw)
In-Reply-To: <20151112033437.GI5852@voom.redhat.com>



On Thursday 12 November 2015 09:04 AM, David Gibson wrote:
> On Wed, Nov 11, 2015 at 10:28:46PM +0530, Aravinda Prasad wrote:
>> This patch modifies KVM to cause a guest exit with
>> KVM_EXIT_NMI instead of immediately delivering a 0x200
>> interrupt to guest upon machine check exception in
>> guest address. Exiting the guest enables QEMU to build
>> error log and deliver machine check exception to guest
>> OS (either via guest OS registered machine check
>> handler or via 0x200 guest OS interrupt vector).
>>
>> This approach simplifies the delivering of machine
>> check exception to guest OS compared to the earlier approach
>> of KVM directly invoking 0x200 guest interrupt vector.
>> In the earlier approach QEMU patched the 0x200 interrupt
>> vector during boot. The patched code at 0x200 issued a
>> private hcall to pass the control to QEMU to build the
>> error log.
>>
>> This design/approach is based on the feedback for the
>> QEMU patches to handle machine check exception. Details
>> of earlier approach of handling machine check exception
>> in QEMU and related discussions can be found at:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
>>
>> Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kvm/book3s_hv.c            |   12 +++---------
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   31 +++++++++++--------------------
>>  2 files changed, 14 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 2280497..1b1dff0 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -859,15 +859,9 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>  		r = RESUME_GUEST;
>>  		break;
>>  	case BOOK3S_INTERRUPT_MACHINE_CHECK:
>> -		/*
>> -		 * Deliver a machine check interrupt to the guest.
>> -		 * We have to do this, even if the host has handled the
>> -		 * machine check, because machine checks use SRR0/1 and
>> -		 * the interrupt might have trashed guest state in them.
>> -		 */
>> -		kvmppc_book3s_queue_irqprio(vcpu,
>> -					    BOOK3S_INTERRUPT_MACHINE_CHECK);
>> -		r = RESUME_GUEST;
>> +		/* Exit to guest with KVM_EXIT_NMI as exit reason */
>> +		run->exit_reason = KVM_EXIT_NMI;
>> +		r = RESUME_HOST;
>>  		break;
>>  	case BOOK3S_INTERRUPT_PROGRAM:
>>  	{
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index b98889e..672b4f6 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -147,7 +147,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	addi	r1, r1, 112
>>  	ld	r7, HSTATE_HOST_MSR(r13)
>>  
>> -	cmpwi	cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>  	beq	11f
>>  	cmpwi	cr2, r12, BOOK3S_INTERRUPT_HMI
>> @@ -160,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	mtmsrd	r6, 1			/* Clear RI in MSR */
>>  	mtsrr0	r8
>>  	mtsrr1	r7
>> -	beq	cr1, 13f		/* machine check */
>>  	RFI
>>  
>>  	/* On POWER7, we have external interrupts set to use HSRR0/1 */
>> @@ -168,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>  	mtspr	SPRN_HSRR1, r7
>>  	ba	0x500
>>  
>> -13:	b	machine_check_fwnmi
>> -
> 
> I don't quite understand the 3 hunks above.  The rest seems to change
> the delivery of MCs as I'd expect, but the above seems to change their
> detection and the reason for that isn't obvious to me.

We need to do guest exit here and hence we continue with RFI or else if
we branch to machine_check_fwnmi then the control will flow to
opal_recover_mce routine without causing the guest to exit with NMI exit
code. And I think, opal_recover_mce() is used to recover from UEs in
host user/kernel space and does not have a check for UEs belonging to
guest. Hence if we branch to machine_check_fwnmi we end up running into
panic() in opal_machine_check() if UE belonged to guest.

Regards,
Aravinda

> 
> 
>>  14:	mtspr	SPRN_HSRR0, r8
>>  	mtspr	SPRN_HSRR1, r7
>>  	b	hmi_exception_after_realmode
>> @@ -2381,24 +2377,19 @@ machine_check_realmode:
>>  	ld	r9, HSTATE_KVM_VCPU(r13)
>>  	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>  	/*
>> -	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
>> -	 * machine check interrupt (set HSRR0 to 0x200). And for handled
>> -	 * errors (no-fatal), just go back to guest execution with current
>> -	 * HSRR0 instead of exiting guest. This new approach will inject
>> -	 * machine check to guest for fatal error causing guest to crash.
>> -	 *
>> -	 * The old code used to return to host for unhandled errors which
>> -	 * was causing guest to hang with soft lockups inside guest and
>> -	 * makes it difficult to recover guest instance.
>> +	 * Deliver unhandled/fatal (e.g. UE) MCE errors to guest
>> +	 * by exiting the guest with KVM_EXIT_NMI exit reason (exit
>> +	 * reason set later based on trap). For handled errors
>> +	 * (no-fatal), go back to guest execution with current HSRR0
>> +	 * instead of exiting the guest. This approach will cause
>> +	 * the guest to exit in case of fatal machine check error.
>>  	 */
>> -	ld	r10, VCPU_PC(r9)
>> +	bne	2f	/* Continue guest execution? */
>> +	/* If not, exit the guest. SRR0/1 are already set */
>> +	b	mc_cont
>> +2:	ld	r10, VCPU_PC(r9)
>>  	ld	r11, VCPU_MSR(r9)
>> -	bne	2f	/* Continue guest execution. */
>> -	/* If not, deliver a machine check.  SRR0/1 are already set */
>> -	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
>> -	ld	r11, VCPU_MSR(r9)
>> -	bl	kvmppc_msr_interrupt
>> -2:	b	fast_interrupt_c_return
>> +	b	fast_interrupt_c_return
>>  
>>  /*
>>   * Check the reason we woke from nap, and take appropriate action.
>>
> 

-- 
Regards,
Aravinda

     prev parent reply	other threads:[~2015-11-12  5:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-11 16:58 [PATCH] KVM: PPC: Exit guest upon fatal machine check exception Aravinda Prasad
2015-11-12  2:24 ` Daniel Axtens
2015-11-12  3:38   ` David Gibson
2015-11-12  4:32     ` Aravinda Prasad
2015-11-12  4:43       ` David Gibson
2015-11-12 17:52         ` Aravinda Prasad
2015-11-13  1:50           ` David Gibson
2015-11-13  6:26             ` Aravinda Prasad
2015-11-13  7:38               ` Thomas Huth
2015-11-13 11:25                 ` Aravinda Prasad
2015-11-12  4:58     ` Daniel Axtens
2015-11-12 17:22       ` Aravinda Prasad
2015-11-12 21:37         ` Daniel Axtens
2015-11-13  4:58           ` Aravinda Prasad
2015-11-12  3:34 ` David Gibson
2015-11-12  5:18   ` Aravinda Prasad [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5644212A.2000804@linux.vnet.ibm.com \
    --to=aravinda@linux.vnet.ibm.com \
    --cc=agraf@suse.de \
    --cc=david@gibson.dropbear.id.au \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=mahesh@linux.vnet.ibm.com \
    --cc=michaele@au1.ibm.com \
    --cc=paulus@ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).