Re: [Qemu-devel] [PATCH 4/4] target-ppc: Handle NMI guest exit

From: Thomas Huth <thuth@redhat.com>
To: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: benh@au1.ibm.com, aik@ozlabs.ru, agraf@suse.de,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org, paulus@samba.org,
	sam.bobroff@au1.ibm.com, david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [PATCH 4/4] target-ppc: Handle NMI guest exit
Date: Mon, 16 Nov 2015 11:41:46 +0100
Message-ID: <5649B2EA.6040108@redhat.com>
In-Reply-To: <5649AAFE.5030002@linux.vnet.ibm.com>

On 16/11/15 11:07, Aravinda Prasad wrote:
> 
> 
> On Monday 16 November 2015 01:22 PM, Thomas Huth wrote:
>> On 12/11/15 19:49, Aravinda Prasad wrote:
>>>
>>> On Thursday 12 November 2015 03:10 PM, Thomas Huth wrote:
>> ...
>>>> Also LoPAPR talks about 'subsequent processors report "fatal error
>>>> previously reported"', so maybe the other processors should report that
>>>> condition in this case?
>>>
>>> I feel the guest kernel is responsible for that. Or does that mean that
>>> QEMU should report the same error that the first processor encountered
>>> for subsequent processors? In that case, what if the error encountered
>>> by the first processor was recovered?
>>
>> I simply referred to this text in LoPAPR:
>>
>>  Multiple processors of the same OS image may experience
>>  fatal events at, or about, the same time. The first processor
>>  to enter the machine check handling firmware reports
>>  the fatal error. Subsequent processors serialize waiting for the
>>  first processor to issue the ibm,nmi-interlock call. These
>>  subsequent processors report "fatal error previously reported".
> 
> Yes, I asked this because I am not clear what "fatal error previously
> reported" means as described in PAPR.

Looking at "Table 137. RTAS Event Return Format (Fixed Part)" in
LoPAPR, there is an "ALREADY_REPORTED" severity - I assume this is what
is meant by the cited paragraph?

>> Is there code in the host kernel already that takes care of this (I
>> haven't checked)? If so, how does the host kernel know that the event
>> happened "at or about the same time" since you're checking at the QEMU
>> side for the mutex condition?
> 
> I don't think the host kernel takes care of this; it simply forwards
> such errors to QEMU via NMI exit. I feel the time referred to by "at or
> about the same time" is the interval between the invocation of the
> registered machine check handler and the corresponding interlock call
> issued by the guest, which QEMU knows and which is protected by a mutex.

I agree, that makes sense.
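
To make that window concrete, here is a minimal single-threaded sketch of
the serialization described in the LoPAPR text. It is not the code from the
patch: every name in it (fwnmi_window, fwnmi_machine_check, fwnmi_interlock,
the MC_SEV_* values) is made up for illustration, the severity numbers are
not the real RTAS encoding, and it assumes at most 64 CPUs. Real code would
block the waiting vCPU threads on a mutex or condition variable instead of
recording them in a bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical severities; the values are NOT the LoPAPR encoding. */
enum mc_severity {
    MC_SEV_NONE = 0,          /* nothing to report yet, CPU must wait */
    MC_SEV_FATAL,             /* first CPU reports the real error */
    MC_SEV_ALREADY_REPORTED   /* what a released waiter would report */
};

/*
 * One window per guest: it opens when the registered machine check
 * handler is invoked and closes when the guest calls ibm,nmi-interlock.
 */
struct fwnmi_window {
    bool     open;     /* a handler invocation is outstanding */
    int      owner;    /* CPU currently running the handler */
    uint64_t waiters;  /* bit n set: CPU n is queued behind the owner */
};

/*
 * A CPU took a machine check. Returns the severity to report right now,
 * or MC_SEV_NONE if the CPU has to wait for the interlock call.
 */
enum mc_severity fwnmi_machine_check(struct fwnmi_window *w, int cpu)
{
    if (!w->open) {
        w->open = true;
        w->owner = cpu;
        return MC_SEV_FATAL;
    }
    w->waiters |= UINT64_C(1) << cpu;
    return MC_SEV_NONE;
}

/*
 * The guest issued ibm,nmi-interlock. Closes the window and, if any CPU
 * is waiting, hands the window to the lowest-numbered waiter. Returns
 * that CPU (it would report MC_SEV_ALREADY_REPORTED), or -1 if none.
 */
int fwnmi_interlock(struct fwnmi_window *w)
{
    w->open = false;
    for (int cpu = 0; cpu < 64; cpu++) {
        uint64_t bit = UINT64_C(1) << cpu;
        if (w->waiters & bit) {
            w->waiters &= ~bit;
            w->open = true;
            w->owner = cpu;
            return cpu;
        }
    }
    return -1;
}
```

With this model, "at or about the same time" simply means "while
w->open is true", which is exactly the interval QEMU can observe.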

>>>> And of course you've also got to check that the same CPU is not getting
>>>> multiple NMIs before the interlock function has been called again.
>>>
>>> I think it is good to check that. However, shouldn't the guest enable ME
>>> until it calls interlock function?
>>
>> First, the hypervisor should never trust the guest to do the right
>> things. Second, LoPAPR says "the OS permanently relinquishes to firmware
>> the Machine State Register's Machine Check Enable bit", and Paul also
>> said something similar in another mail to this thread, so I think you
>> really have to check this in QEMU instead.
> 
> Hmm. ok. Since ME is always set when running in the guest (assuming the
> guest is not disabling it), we cannot check the ME bit to figure out
> whether the same CPU is getting UEs before interlock is called. One way
> is to record the CPU ID upon such an error and check, before invoking the
> registered machine check handler, whether that CPU has a pending
> interlock call. Terminate the guest if there is a pending interlock call
> for that CPU rather than causing the guest to trigger recursive machine
> check errors.
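
The bookkeeping proposed above could look roughly like the following
sketch. It is only an illustration under stated assumptions: nmi_tracker,
nmi_on_exit and nmi_on_interlock are hypothetical names, not existing QEMU
functions, and a 64-bit mask limits it to 64 CPUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the NMI exit path should do for this CPU (hypothetical). */
enum nmi_action {
    NMI_DELIVER,        /* invoke the guest's registered handler */
    NMI_GUEST_CRASHED   /* same CPU faulted again before interlock */
};

struct nmi_tracker {
    uint64_t pending_interlock;  /* bit n set: CPU n owes an interlock call */
};

/* Called on an NMI exit for 'cpu', before the handler is invoked. */
enum nmi_action nmi_on_exit(struct nmi_tracker *t, unsigned cpu)
{
    uint64_t bit = UINT64_C(1) << cpu;

    if (t->pending_interlock & bit) {
        /* Delivering again would only recurse inside the guest. */
        return NMI_GUEST_CRASHED;
    }
    t->pending_interlock |= bit;
    return NMI_DELIVER;
}

/*
 * Called when 'cpu' issues ibm,nmi-interlock. Returns false for a
 * spurious call, since the guest cannot be trusted to call it only
 * when a handler invocation is actually outstanding.
 */
bool nmi_on_interlock(struct nmi_tracker *t, unsigned cpu)
{
    uint64_t bit = UINT64_C(1) << cpu;

    if (!(t->pending_interlock & bit)) {
        return false;
    }
    t->pending_interlock &= ~bit;
    return true;
}
```

Note that the check does not look at MSR[ME] at all; it relies only on
state that the hypervisor side maintains itself.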

Do we have some kind of checkstop state emulation in QEMU (sorry, I
haven't checked yet)? If yes, it might be nicer to use that and set the
guest state to PANIC instead of exiting QEMU directly - i.e. to do
something similar to the guest_panicked() function in
target-s390x/kvm.c. That way the management layer (libvirt) can decide
on its own whether to terminate the guest, reboot it, or keep it in the
crashed state for further analysis.
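
As a rough model of that split of responsibilities (a sketch only: the
types and functions below are invented for illustration and are neither
the s390x guest_panicked() code nor any libvirt API), the emulator just
records the panicked state and signals it, and the policy decision lives
on the management side:

```c
#include <stdbool.h>

enum vm_run_state { VM_RUNNING, VM_GUEST_PANICKED };

/* Hypothetical management-layer policies for a crashed guest. */
enum crash_policy { CRASH_PRESERVE, CRASH_RESTART, CRASH_DESTROY };

struct vm {
    enum vm_run_state state;
    unsigned panic_events;  /* stand-in for an event sent to management */
};

/* Emulator side: stop the guest and report it, but do not exit. */
void vm_guest_panicked(struct vm *vm)
{
    vm->state = VM_GUEST_PANICKED;
    vm->panic_events++;
}

/*
 * Management side: apply the configured policy. Returns true if the
 * VM should be kept around, false if it should be torn down.
 */
bool mgmt_handle_panic(struct vm *vm, enum crash_policy policy)
{
    if (vm->state != VM_GUEST_PANICKED) {
        return true;
    }
    switch (policy) {
    case CRASH_RESTART:
        vm->state = VM_RUNNING;  /* reset and resume the guest */
        return true;
    case CRASH_DESTROY:
        return false;
    case CRASH_PRESERVE:
    default:
        return true;             /* keep crashed state for analysis */
    }
}
```

Calling exit() from the emulator would hard-code the CRASH_DESTROY case
and take the other two options away from the management layer.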

 Thomas
