From: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
To: Paul Mackerras <paulus@ozlabs.org>
Cc: kvm@vger.kernel.org, gleb@kernel.org, mahesh@linux.vnet.ibm.com,
agraf@suse.de, kvm-ppc@vger.kernel.org, linuxppc-dev@ozlabs.org,
pbonzini@redhat.com, david@gibson.dropbear.id.au
Subject: Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
Date: Mon, 25 Jan 2016 08:51:30 +0000 [thread overview]
Message-ID: <56A5DF42.70908@linux.vnet.ibm.com> (raw)
In-Reply-To: <20160123212448.GC11916@fergus.ozlabs.ibm.com>
On Sunday 24 January 2016 02:54 AM, Paul Mackerras wrote:
> On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
>>> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>>>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>>>> exit reasons upon a machine check exception (MCE) in
>>>> the guest address space if the KVM_CAP_PPC_FWNMI
>>>> capability is enabled (instead of delivering 0x200
>>>> interrupt to guest). This enables QEMU to build error
>>>> log and deliver machine check exception to guest via
>>>> guest registered machine check handler.
>>>>
>>>> This approach simplifies the delivering of machine
>>>> check exception to guest OS compared to the earlier
>>>> approach of KVM directly invoking 0x200 guest interrupt
>>>> vector. In the earlier approach QEMU was enhanced to
>>>> patch the 0x200 interrupt vector during boot. The
>>>> patched code at 0x200 issued a private hcall to pass
>>>> the control to QEMU to build the error log.
>>>>
>>>> This design/approach is based on the feedback for the
>>>> QEMU patches to handle machine check exception. Details
>>>> of earlier approach of handling machine check exception
>>>> in QEMU and related discussions can be found at:
>>>
>>> [snip]
>>>
>>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>> stb r0, HSTATE_HWTHREAD_REQ(r13)
>>>>
>>>> /*
>>>> - * For external and machine check interrupts, we need
>>>> - * to call the Linux handler to process the interrupt.
>>>> - * We do that by jumping to absolute address 0x500 for
>>>> - * external interrupts, or the machine_check_fwnmi label
>>>> - * for machine checks (since firmware might have patched
>>>> - * the vector area at 0x200). The [h]rfid at the end of the
>>>> - * handler will return to the book3s_hv_interrupts.S code.
>>>> - * For other interrupts we do the rfid to get back
>>>> - * to the book3s_hv_interrupts.S code here.
>>>> + * For external interrupts we need to call the Linux
>>>> + * handler to process the interrupt. We do that by jumping
>>>> + * to absolute address 0x500 for external interrupts.
>>>> + * The [h]rfid at the end of the handler will return to
>>>> + * the book3s_hv_interrupts.S code. For other interrupts
>>>> + * we do the rfid to get back to the book3s_hv_interrupts.S
>>>> + * code here.
>>>> */
>>>> ld r8, 112+PPC_LR_STKOFF(r1)
>>>> addi r1, r1, 112
>>>> ld r7, HSTATE_HOST_MSR(r13)
>>>>
>>>> - cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>>> cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL
>>>> beq 11f
>>>> cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL
>>>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>> mtmsrd r6, 1 /* Clear RI in MSR */
>>>> mtsrr0 r8
>>>> mtsrr1 r7
>>>> - beq cr1, 13f /* machine check */
>>>> RFI
>>>>
>>>> /* On POWER7, we have external interrupts set to use HSRR0/1 */
>>>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>> mtspr SPRN_HSRR1, r7
>>>> ba 0x500
>>>>
>>>> -13: b machine_check_fwnmi
>>>> -
>>>
>>> So, what you're disabling here is the host-side handling of the
>>> machine check after completing the guest->host switch. This has
>>> nothing to do with how the machine check gets communicated to the
>>> guest.
>>>
>>> Now, part of the host-side machine check handling has already
>>> happened, but I thought there was more that was done in host kernel
>>> virtual mode. If this change really is needed then I would want an
>>> ack from Mahesh that this is correct, and it will need to be explained
>>> in detail in the patch description.
>>
>> If we don't do that we will end up running into
>> panic() in opal_machine_check() if UE belonged to guest.
>>
>> Details in this link:
>> http://marc.info/?l=kvm-ppc&m\x144730552720044&w=2
>
> Well maybe the panic call needs to be changed. But the way you have
> it, we *never* get to opal_machine_check for any machine check
> interrupt, and I have a hard time believing that is correct.
If I am not wrong, earlier, even without this patch we never get to
opal_machine_check() from this place for the machine check happening
inside the guest (irrespective of whether the machine check is recovered
or not). Because, we jump to fast_interrupt_c_return() from
machine_check_realmode without causing the guest to exit.
With this patch I call mc_cont from machine_check_realmode and want the
guest to exit with NMI exit code. However, we don't want to call
opal_machine_check(), because if we call opal_machine_check() we don't
end up with NMI exit for the guest.
Regards,
Aravinda
>
> Paul.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
--
Regards,
Aravinda
WARNING: multiple messages have this Message-ID (diff)
From: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
To: Paul Mackerras <paulus@ozlabs.org>
Cc: kvm@vger.kernel.org, gleb@kernel.org, mahesh@linux.vnet.ibm.com,
agraf@suse.de, kvm-ppc@vger.kernel.org, linuxppc-dev@ozlabs.org,
pbonzini@redhat.com, david@gibson.dropbear.id.au
Subject: Re: [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled
Date: Mon, 25 Jan 2016 14:09:30 +0530 [thread overview]
Message-ID: <56A5DF42.70908@linux.vnet.ibm.com> (raw)
In-Reply-To: <20160123212448.GC11916@fergus.ozlabs.ibm.com>
On Sunday 24 January 2016 02:54 AM, Paul Mackerras wrote:
> On Sat, Jan 23, 2016 at 06:23:35PM +0530, Aravinda Prasad wrote:
>>
>>
>> On Saturday 23 January 2016 03:58 PM, Paul Mackerras wrote:
>>> On Wed, Jan 13, 2016 at 12:38:09PM +0530, Aravinda Prasad wrote:
>>>> Enhance KVM to cause a guest exit with KVM_EXIT_NMI
>>>> exit reasons upon a machine check exception (MCE) in
>>>> the guest address space if the KVM_CAP_PPC_FWNMI
>>>> capability is enabled (instead of delivering 0x200
>>>> interrupt to guest). This enables QEMU to build error
>>>> log and deliver machine check exception to guest via
>>>> guest registered machine check handler.
>>>>
>>>> This approach simplifies the delivering of machine
>>>> check exception to guest OS compared to the earlier
>>>> approach of KVM directly invoking 0x200 guest interrupt
>>>> vector. In the earlier approach QEMU was enhanced to
>>>> patch the 0x200 interrupt vector during boot. The
>>>> patched code at 0x200 issued a private hcall to pass
>>>> the control to QEMU to build the error log.
>>>>
>>>> This design/approach is based on the feedback for the
>>>> QEMU patches to handle machine check exception. Details
>>>> of earlier approach of handling machine check exception
>>>> in QEMU and related discussions can be found at:
>>>
>>> [snip]
>>>
>>>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>>>> @@ -133,21 +133,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>> stb r0, HSTATE_HWTHREAD_REQ(r13)
>>>>
>>>> /*
>>>> - * For external and machine check interrupts, we need
>>>> - * to call the Linux handler to process the interrupt.
>>>> - * We do that by jumping to absolute address 0x500 for
>>>> - * external interrupts, or the machine_check_fwnmi label
>>>> - * for machine checks (since firmware might have patched
>>>> - * the vector area at 0x200). The [h]rfid at the end of the
>>>> - * handler will return to the book3s_hv_interrupts.S code.
>>>> - * For other interrupts we do the rfid to get back
>>>> - * to the book3s_hv_interrupts.S code here.
>>>> + * For external interrupts we need to call the Linux
>>>> + * handler to process the interrupt. We do that by jumping
>>>> + * to absolute address 0x500 for external interrupts.
>>>> + * The [h]rfid at the end of the handler will return to
>>>> + * the book3s_hv_interrupts.S code. For other interrupts
>>>> + * we do the rfid to get back to the book3s_hv_interrupts.S
>>>> + * code here.
>>>> */
>>>> ld r8, 112+PPC_LR_STKOFF(r1)
>>>> addi r1, r1, 112
>>>> ld r7, HSTATE_HOST_MSR(r13)
>>>>
>>>> - cmpwi cr1, r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>>> cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL
>>>> beq 11f
>>>> cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL
>>>> @@ -162,7 +159,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>> mtmsrd r6, 1 /* Clear RI in MSR */
>>>> mtsrr0 r8
>>>> mtsrr1 r7
>>>> - beq cr1, 13f /* machine check */
>>>> RFI
>>>>
>>>> /* On POWER7, we have external interrupts set to use HSRR0/1 */
>>>> @@ -170,8 +166,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>>>> mtspr SPRN_HSRR1, r7
>>>> ba 0x500
>>>>
>>>> -13: b machine_check_fwnmi
>>>> -
>>>
>>> So, what you're disabling here is the host-side handling of the
>>> machine check after completing the guest->host switch. This has
>>> nothing to do with how the machine check gets communicated to the
>>> guest.
>>>
>>> Now, part of the host-side machine check handling has already
>>> happened, but I thought there was more that was done in host kernel
>>> virtual mode. If this change really is needed then I would want an
>>> ack from Mahesh that this is correct, and it will need to be explained
>>> in detail in the patch description.
>>
>> If we don't do that we will end up running into
>> panic() in opal_machine_check() if UE belonged to guest.
>>
>> Details in this link:
>> http://marc.info/?l=kvm-ppc&m=144730552720044&w=2
>
> Well maybe the panic call needs to be changed. But the way you have
> it, we *never* get to opal_machine_check for any machine check
> interrupt, and I have a hard time believing that is correct.
If I am not wrong, earlier, even without this patch we never get to
opal_machine_check() from this place for the machine check happening
inside the guest (irrespective of whether the machine check is recovered
or not). Because, we jump to fast_interrupt_c_return() from
machine_check_realmode without causing the guest to exit.
With this patch I call mc_cont from machine_check_realmode and want the
guest to exit with NMI exit code. However, we don't want to call
opal_machine_check(), because if we call opal_machine_check() we don't
end up with NMI exit for the guest.
Regards,
Aravinda
>
> Paul.
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
--
Regards,
Aravinda
next prev parent reply other threads:[~2016-01-25 8:51 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-13 7:07 [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour Aravinda Prasad
2016-01-13 7:19 ` Aravinda Prasad
2016-01-13 7:08 ` [PATCH v3 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled Aravinda Prasad
2016-01-13 7:20 ` Aravinda Prasad
2016-01-14 0:06 ` David Gibson
2016-01-14 0:06 ` David Gibson
2016-01-23 10:28 ` Paul Mackerras
2016-01-23 10:28 ` Paul Mackerras
2016-01-23 12:53 ` Aravinda Prasad
2016-01-23 12:53 ` Aravinda Prasad
2016-01-23 21:24 ` Paul Mackerras
2016-01-23 21:24 ` Paul Mackerras
2016-01-23 21:24 ` Paul Mackerras
2016-01-25 8:39 ` Aravinda Prasad [this message]
2016-01-25 8:51 ` Aravinda Prasad
2016-01-27 5:02 ` Mahesh Jagannath Salgaonkar
2016-01-27 5:14 ` Mahesh Jagannath Salgaonkar
2016-01-27 5:02 ` Mahesh Jagannath Salgaonkar
2016-06-20 5:18 ` Paul Mackerras
2016-06-20 5:18 ` Paul Mackerras
2016-06-20 5:18 ` Paul Mackerras
2016-06-21 21:01 ` Aravinda Prasad
2016-06-21 21:13 ` Aravinda Prasad
2016-01-14 0:02 ` [PATCH v3 1/2] KVM: PPC: New capability to control MCE behaviour David Gibson
2016-01-14 0:02 ` David Gibson
2016-01-14 0:05 ` David Gibson
2016-01-14 0:05 ` David Gibson
2016-01-23 10:20 ` Paul Mackerras
2016-01-23 10:20 ` Paul Mackerras
2016-01-23 12:28 ` Aravinda Prasad
2016-01-23 12:40 ` Aravinda Prasad
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56A5DF42.70908@linux.vnet.ibm.com \
--to=aravinda@linux.vnet.ibm.com \
--cc=agraf@suse.de \
--cc=david@gibson.dropbear.id.au \
--cc=gleb@kernel.org \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linuxppc-dev@ozlabs.org \
--cc=mahesh@linux.vnet.ibm.com \
--cc=paulus@ozlabs.org \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.