From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:40405)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1VC5R2-0002FL-Tk for qemu-devel@nongnu.org;
	Wed, 21 Aug 2013 06:14:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1VC5Qw-00023U-Rq for qemu-devel@nongnu.org;
	Wed, 21 Aug 2013 06:14:40 -0400
Received: from mx1.redhat.com ([209.132.183.28]:18895)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1VC5Qw-00023N-KR for qemu-devel@nongnu.org;
	Wed, 21 Aug 2013 06:14:34 -0400
Date: Wed, 21 Aug 2013 13:16:17 +0300
From: "Michael S. Tsirkin"
Message-ID: <20130821101617.GB4757@redhat.com>
References: <1376233843-19410-1-git-send-email-marcel.a@redhat.com>
	<520B2B8D.8070401@redhat.com>
	<1377072197.1888.35.camel@localhost.localdomain>
	<521477CF.4010703@redhat.com>
	<20130821094237.GA4757@redhat.com>
	<52148F88.5000509@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <52148F88.5000509@redhat.com>
Subject: Re: [Qemu-devel] [PATCH for-1.6 V2 0/2] pvpanic: Separate pvpanic from machine type
To: Paolo Bonzini
Cc: aliguori@us.ibm.com, gleb@redhat.com, Marcel Apfelbaum,
	hutao@cn.fujitsu.com, qemu-devel@nongnu.org, Ronen Hod,
	kraxel@redhat.com, afaerber@suse.de, vrozenfe@redhat.com

On Wed, Aug 21, 2013 at 11:59:36AM +0200, Paolo Bonzini wrote:
> On 21/08/2013 11:42, Michael S. Tsirkin wrote:
> > On Wed, Aug 21, 2013 at 10:18:23AM +0200, Paolo Bonzini wrote:
> >> On 21/08/2013 10:03, Marcel Apfelbaum wrote:
> >>> On Wed, 2013-08-14 at 10:02 +0300, Ronen Hod wrote:
> >>>> How about adding a flag that tells QEMU whether to pause or reboot the guest
> >>>> after the panic?
> >>>> We cannot assume that we always have a management layer that takes care
> >>>> of this.
> >>>> One example is Microsoft's WHQL that deliberately generates a BSOD, and then
> >>>> examines the dump files.
> >>> After this patch the pvpanic is not part of the global devices anymore so just
> >>> don't enable it if you want to reboot on BSOD.
> >>> In my opinion "reboot after panic" equals "run without pvpanic device"
> >>
> >> This is not entirely possible, since "reboot after panic" is a guest
> >> setting while "run without pvpanic device" is a host setting (that the
> >> guest administrator may not even have access to: Ronen's case is a good
> >> example of this, because the "administrator" there is the WHQL harness).
> >>
> >> However, I think this is a driver problem.  The driver should just probe
> >> the "reboot after panic" setting and not issue the outb to the pvpanic port.
> >
> > This might or might not be possible on different OS-es.
> > What exactly is gained by doing vmstop on outb of pvpanic?
>
> Because events are edge-triggered, and can be lost if management dies at
> the wrong time, each event that QEMU sends must go together with a way
> for management to poll the state.
>
> For panic, the way to poll the state is "info status".  This matches
> what we do for watchdogs, for example.  Management can issue "info
> status" to learn of the panic state, even if it happens while management
> itself is not running:
>
>     libvirtd                QEMU                       guest
>     ---------------------------------------------------------------
>     stops
>                                                        <- pvpanic outb
>                             emits panic event
>                             (no one receives it)
>     starts
>     info status ->
>                             <- PANICKED
>
>
> Because there is only one running state, this means the VM has to be
> stopped.
>
> But actually, fixing the driver would only be required if pvpanic were
> mandatory.
>
> Now that pvpanic is optional, "reboot after panic" can also be fixed in
> libvirt.
> Let's remove the "must reset after panic" limitation; then,
> libvirt can simply do itself a "continue" after receiving the panicked
> event (or after seeing that the guest is in panicked state).  The
> panicked event will never be sent unless management explicitly requests
> it (with "-device pvpanic"), so backwards compatibility is preserved.
>
> The pause will still happen if management was stopped, but that's a fair
> compromise IMHO.
>
> It will mean also that "reboot after panic" will be broken in 1.6.0,
> unfortunately.  Perhaps we can have a quick 1.6.1 release with this patch:
>
> diff --git a/vl.c b/vl.c
> index 25b8f2f..25e890a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -685,8 +685,7 @@ int runstate_is_running(void)
>  bool runstate_needs_reset(void)
>  {
>      return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
> -        runstate_check(RUN_STATE_SHUTDOWN) ||
> -        runstate_check(RUN_STATE_GUEST_PANICKED);
> +        runstate_check(RUN_STATE_SHUTDOWN);
>  }
>
>  StatusInfo *qmp_query_status(Error **errp)
>
>
> By the way, this means two things:
>
> - I am now sold on the idea that explicitly enabling of pvpanic is the
> right thing to do;
>
> - on the other hand this is the proof that the change was not fully
> understood, and rushing it in 1.6 was the wrong thing to do.
>
> Paolo

You mean 1.5. pvpanic was a builtin in 1.5 and that was clearly the
wrong thing to do. We fixed that in 1.6, thankfully.

> > We want a notification about the panic but
> > adding yet another way to halt seems kind of useless.
> > Why not let VM continue? If it wants to stop it
> > can always call halt.
>
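
To make the event-versus-state argument above concrete, here is a minimal
sketch in C. It is a toy model only, not QEMU or libvirt code: the Vm struct
and the functions guest_panic, needs_reset, mgmt_poll_is_panicked and
mgmt_continue are hypothetical names invented for illustration. It assumes
the quoted vl.c patch is applied, so that the panicked state no longer
requires a reset before "continue".

```c
#include <stdbool.h>

/* Hypothetical, simplified model; these are NOT QEMU's real definitions. */
typedef enum {
    RUN_STATE_RUNNING,
    RUN_STATE_INTERNAL_ERROR,
    RUN_STATE_SHUTDOWN,
    RUN_STATE_GUEST_PANICKED
} RunState;

typedef struct {
    RunState state;        /* latched state: can be polled at any time */
    bool mgmt_connected;   /* is management listening right now? */
    int events_delivered;  /* events that actually reached management */
} Vm;

/* Guest writes to the pvpanic port: latch the state, then try to notify.
 * The event is edge-triggered, so it is simply lost if nobody listens. */
void guest_panic(Vm *vm)
{
    vm->state = RUN_STATE_GUEST_PANICKED;
    if (vm->mgmt_connected) {
        vm->events_delivered++;
    }
}

/* Models runstate_needs_reset() with the quoted patch applied:
 * the panicked state is no longer in the list. */
bool needs_reset(const Vm *vm)
{
    return vm->state == RUN_STATE_INTERNAL_ERROR ||
           vm->state == RUN_STATE_SHUTDOWN;
}

/* Models management polling the state (the "info status" step). */
bool mgmt_poll_is_panicked(const Vm *vm)
{
    return vm->state == RUN_STATE_GUEST_PANICKED;
}

/* Models "continue": refused while a reset is required.
 * Returns true if the VM was resumed. */
bool mgmt_continue(Vm *vm)
{
    if (needs_reset(vm)) {
        return false;
    }
    vm->state = RUN_STATE_RUNNING;
    return true;
}
```

In this model, calling guest_panic() while mgmt_connected is false delivers
no event, yet mgmt_poll_is_panicked() still reports the panic once
management is back, and mgmt_continue() can then resume the guest. That is
the sequence from the libvirtd/QEMU/guest diagram, followed by the
libvirt-side "continue" that the patch would permit.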