From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:40405)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1VC5R2-0002FL-Tk
	for qemu-devel@nongnu.org; Wed, 21 Aug 2013 06:14:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1VC5Qw-00023U-Rq
	for qemu-devel@nongnu.org; Wed, 21 Aug 2013 06:14:40 -0400
Received: from mx1.redhat.com ([209.132.183.28]:18895)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1VC5Qw-00023N-KR
	for qemu-devel@nongnu.org; Wed, 21 Aug 2013 06:14:34 -0400
Date: Wed, 21 Aug 2013 13:16:17 +0300
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20130821101617.GB4757@redhat.com>
References: <1376233843-19410-1-git-send-email-marcel.a@redhat.com>
	<520B2B8D.8070401@redhat.com>
	<1377072197.1888.35.camel@localhost.localdomain>
	<521477CF.4010703@redhat.com> <20130821094237.GA4757@redhat.com>
	<52148F88.5000509@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <52148F88.5000509@redhat.com>
Subject: Re: [Qemu-devel] [PATCH for-1.6 V2 0/2] pvpanic: Separate pvpanic
 from machine type
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: aliguori@us.ibm.com, gleb@redhat.com, Marcel Apfelbaum <marcel.a@redhat.com>, hutao@cn.fujitsu.com, qemu-devel@nongnu.org, Ronen Hod <rhod@redhat.com>, kraxel@redhat.com, afaerber@suse.de, vrozenfe@redhat.com

On Wed, Aug 21, 2013 at 11:59:36AM +0200, Paolo Bonzini wrote:
> On 21/08/2013 11:42, Michael S. Tsirkin wrote:
> > On Wed, Aug 21, 2013 at 10:18:23AM +0200, Paolo Bonzini wrote:
> >> On 21/08/2013 10:03, Marcel Apfelbaum wrote:
> >>> On Wed, 2013-08-14 at 10:02 +0300, Ronen Hod wrote:
> >>>> How about adding a flag that tells QEMU whether to pause or reboot the guest
> >>>> after the panic?
> >>>> We cannot assume that we always have a management layer that takes care
> >>>> of this.
> >>>> One example is Microsoft's WHQL that deliberately generates a BSOD, and then
> >>>> examines the dump files.
> >>> After this patch the pvpanic is not part of the global devices anymore so just
> >>> don't enable it if you want to reboot on BSOD.
> >>> In my opinion "reboot after panic" equals "run without pvpanic device"
> >>
> >> This is not entirely possible, since "reboot after panic" is a guest
> >> setting while "run without pvpanic device" is a host setting (that the
> >> guest administrator may not even have access to: Ronen's case is a good
> >> example of this, because the "administrator" there is the WHQL harness).
> >>
> >> However, I think this is a driver problem.  The driver should just probe
> >> the "reboot after panic" setting and not issue the outb to the pvpanic port.
> > 
> > This might or might not be possible, depending on the OS.
> > What exactly is gained by doing vmstop on outb of pvpanic?
> 
> Because events are edge-triggered, and can be lost if management dies at
> the wrong time, each event that QEMU sends must go together with a way
> for management to poll the state.
> 
> For panic, the way to poll the state is "info status".  This matches
> what we do for watchdogs, for example.  Management can issue "info
> status" to learn of the panic state, even if it happens while management
> itself is not running:
> 
>      libvirtd                 QEMU                  guest
>   ---------------------------------------------------------------
>      stops
>                                                  <- pvpanic outb
>                               emits panic event
>                               (no one receives it)
>      starts
>      info status ->
>                               <- PANICKED
> 
> 
> Because there is only one running state, this means the VM has to be
> stopped.
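
To check that I follow the argument, here is a minimal sketch of the
"edge event plus pollable state" pattern described above. It is not QEMU
code: Vm, guest_panic() and query_status() are hypothetical names, and
query_status() merely stands in for what "info status" reports.

```c
#include <stdbool.h>

/* Hypothetical model, not QEMU code. */
typedef enum { VM_RUNNING, VM_PANICKED } VmState;

typedef struct {
    VmState state;            /* level: survives a missed notification */
    bool listener_connected;  /* is management attached right now? */
    int events_delivered;     /* edge notifications actually received */
} Vm;

/* Guest writes to the panic port: latch the state, then try to notify. */
static void guest_panic(Vm *vm)
{
    vm->state = VM_PANICKED;        /* pollable, like "info status" */
    if (vm->listener_connected) {
        vm->events_delivered++;     /* edge: lost if nobody is listening */
    }
}

/* What management does when it (re)starts: poll rather than trust events. */
static VmState query_status(const Vm *vm)
{
    return vm->state;
}
```

If guest_panic() runs while listener_connected is false, no event is
delivered, yet a later query_status() still returns VM_PANICKED. The
state only stays visible that long because nothing moves the VM out of
it in the meantime, which is the reason given for stopping.
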
> 
> But actually, fixing the driver would only be required if pvpanic were
> mandatory.
> 
> Now that pvpanic is optional, "reboot after panic" can also be fixed in
> libvirt.  Let's remove the "must reset after panic" limitation; then
> libvirt can simply issue a "continue" itself after receiving the panicked
> event (or after seeing that the guest is in the panicked state).  The
> panicked event will never be sent unless management explicitly requests
> it (with "-device pvpanic"), so backwards compatibility is preserved.
> 
> The pause will still happen if management was stopped, but that's a fair
> compromise IMHO.
> 
> It will also mean that "reboot after panic" will be broken in 1.6.0,
> unfortunately.  Perhaps we can have a quick 1.6.1 release with this patch:
> 
> diff --git a/vl.c b/vl.c
> index 25b8f2f..25e890a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -685,8 +685,7 @@ int runstate_is_running(void)
>  bool runstate_needs_reset(void)
>  {
>      return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
> -        runstate_check(RUN_STATE_SHUTDOWN) ||
> -        runstate_check(RUN_STATE_GUEST_PANICKED);
> +        runstate_check(RUN_STATE_SHUTDOWN);
>  }
> 
>  StatusInfo *qmp_query_status(Error **errp)
> 
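
For reference, a standalone sketch of what the hunk above changes in
practice, assuming (as I understand it) that the "cont" path refuses to
resume whenever runstate_needs_reset() returns true. The enum is a
reduced stand-in for the real RunState, and current_run_state and
try_continue() are hypothetical names, not the actual vl.c or monitor
code.

```c
#include <stdbool.h>

/* Reduced stand-in for QEMU's RunState; only the states discussed here. */
typedef enum {
    RUN_STATE_RUNNING,
    RUN_STATE_INTERNAL_ERROR,
    RUN_STATE_SHUTDOWN,
    RUN_STATE_GUEST_PANICKED
} RunState;

static RunState current_run_state = RUN_STATE_RUNNING;

static bool runstate_check(RunState state)
{
    return current_run_state == state;
}

/* Patched version: GUEST_PANICKED no longer forces a reset. */
static bool runstate_needs_reset(void)
{
    return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
        runstate_check(RUN_STATE_SHUTDOWN);
}

/* Hypothetical stand-in for the "cont" command handler. */
static bool try_continue(void)
{
    if (runstate_needs_reset()) {
        return false;               /* management must reset first */
    }
    current_run_state = RUN_STATE_RUNNING;
    return true;
}
```

With the patched check, try_continue() succeeds from
RUN_STATE_GUEST_PANICKED and still fails from RUN_STATE_SHUTDOWN, which
is what lets management resume a panicked guest without a reset.
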
> 
> By the way, this means two things:
> 
> - I am now sold on the idea that explicitly enabling pvpanic is the
> right thing to do;
> 
> - on the other hand this is the proof that the change was not fully
> understood, and rushing it in 1.6 was the wrong thing to do.
> 
> Paolo

You mean 1.5.
pvpanic was a builtin in 1.5 and that was clearly the wrong thing to do.
We fixed that in 1.6, thankfully.

> > We want a notification about the panic but
> > adding yet another way to halt seems kind of useless.
> > Why not let VM continue? If it wants to stop it
> > can always call halt.
>