From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48718)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1VCAZB-0002WC-QF for qemu-devel@nongnu.org;
	Wed, 21 Aug 2013 11:43:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1VCAYy-00045p-FN for qemu-devel@nongnu.org;
	Wed, 21 Aug 2013 11:43:25 -0400
Date: Wed, 21 Aug 2013 18:44:53 +0300
From: "Michael S. Tsirkin"
Message-ID: <20130821154453.GB10984@redhat.com>
References: <1377086477-19553-1-git-send-email-pbonzini@redhat.com>
	<5214DB87.6010305@redhat.com> <5214DD8B.2020803@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5214DD8B.2020803@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state
To: Paolo Bonzini
Cc: pkrempa@redhat.com, marcel.a@redhat.com, libvir-list@redhat.com,
	qemu-stable@nongnu.org, qemu-devel@nongnu.org, lcapitulino@redhat.com,
	rhod@redhat.com, kraxel@redhat.com, anthony@codemonkey.ws,
	hutao@cn.fujitsu.com, lersek@redhat.com, afaerber@suse.de

On Wed, Aug 21, 2013 at 05:32:27PM +0200, Paolo Bonzini wrote:
> On 21/08/2013 17:23, Eric Blake wrote:
> >> Upon learning of a panic, management (if configured to do so) can pick a
> >> variety of behaviors: leave the VM paused, reset it, destroy it. In
> >> addition to all of these behaviors, it is possible dumping the VM core
> >> from the host.
> >
> > s/possible dumping/possible to dump/
> >
> > and yes, libvirt wants to do just that, as one of its
> > mappings, since it could do the same for Xen.
> >
> >>
> >> However, right now, the panicked state is irreversible, and can only be
> >> exited by resetting the machine. This means that any policy decision
> >> is entirely in the hands of the host.
> >> In particular there is no way to use the "reboot on panic" option
> >> together with pvpanic.
> >>
> >> This patch makes the panicked state reversible (and removes various
> >> workarounds that were there because of the state being irreversible).
> >> With this change, management has a wider set of possible policies: it
> >> can just log the crash and leave policy to the guest, it can leave the
> >> VM paused. In particular, the "log the crash and continue" is implemented
> >> simply by sending a "cont" as soon as management learns about the panic.
> >> Management could also implement the "irreversible paused state" itself.
> >> And again, all such actions can be coupled with dumping the VM core.
> >
> > Yes, this makes sense.
> >
> >>
> >> Unfortunately we cannot change the behavior of 1.6.0. Thus, even if
> >> it uses "-device pvpanic", management should check for "cont" failures.
> >> If "cont" fails, management can then log that the VM remained paused
> >> and urge the administrator to update QEMU.
> >
> > Is that the best we can do? Is there any sort of QMP introspection that
> > libvirt can do, where we can know UP FRONT what level of panic support
> > is provided by the qemu binary and the machine type being run in that
> > binary?
>
> No, this is not possible unfortunately. The only possibility that comes
> to mind would be to rename the pvpanic device, e.g. to "isa-pvpanic",
> and forget about "-device pvpanic" on 1.6.x. A hack, I know.
>
> To support 1.5, libvirt should simply be ready to react to unanticipated
> GUEST_PANICKED events. reboot-on-panic will simply be broken for 1.5
> and Linux 3.10+ guests. :(

Let's just fix the bugs in 1.6.X.
I don't think libvirt needs to work around all qemu bugs.

For 1.5.X it might be possible to backport -device pvpanic there.
We need to make sure cross-version migration works.

> >> +++ b/vl.c
> >> @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = {
> >>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
> >>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
> >>
> >> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
> >> +    { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
> >
> > Is 'cont' the only viable way to escape PANICKED, or is it also
> > reasonable to support 'stop' as a way to transition from PANICKED to
> > PAUSED? That is, management may want to make the state reversible but
> > still leave the guest paused, so this patch may be incomplete.
>
> No, there is no way to move from PANICKED to PAUSED. Libvirt has its
> own statuses (PAUSED, CRASHED etc.) and substatuses. You don't really
> care about the QEMU state: both the PAUSED_PANICKED and CRASHED_PANICKED
> substatuses map to QEMU's GUEST_PANICKED state. Simply, libvirt will
> not allow a "virsh resume" for preserve, and will
> allow it for a hypothetical new pause element.
>
> BTW, any chance "coredump-destroy" and "coredump-restart" can be
> preserved just for backwards compatibility, and a new coredump='yes/no'
> attribute introduced instead? Because coredump-pause and
> coredump-preserve would make just as much sense.
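
For anyone following the vl.c hunk quoted above, here is a minimal C sketch of how a transition table of this kind can be expanded into a lookup matrix and consulted. It is an illustration only, not QEMU's code: the names RunStateSketch, TransitionSketch, transitions_init() and transition_allowed() are hypothetical, and the table lists only an assumed subset of states relevant to this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down state set; QEMU's real enum has more members. */
typedef enum {
    RS_RUNNING,
    RS_PAUSED,
    RS_GUEST_PANICKED,
    RS_MAX
} RunStateSketch;

typedef struct {
    RunStateSketch from;
    RunStateSketch to;
} TransitionSketch;

/* Assumed subset of transitions, shaped like the table after the patch. */
static const TransitionSketch transitions_def[] = {
    { RS_RUNNING,        RS_PAUSED },
    { RS_RUNNING,        RS_GUEST_PANICKED },
    { RS_PAUSED,         RS_RUNNING },
    /* The entry the patch introduces: "cont" can leave the panicked state. */
    { RS_GUEST_PANICKED, RS_RUNNING },
};

/* valid[from][to] is true when the transition is listed in the table. */
static bool valid[RS_MAX][RS_MAX];

/* Expand the flat table into the lookup matrix. Call once at startup. */
void transitions_init(void)
{
    size_t n = sizeof(transitions_def) / sizeof(transitions_def[0]);

    for (size_t i = 0; i < n; i++) {
        const TransitionSketch *t = &transitions_def[i];

        assert(t->from < RS_MAX && t->to < RS_MAX);
        valid[t->from][t->to] = true;
    }
}

/* Return true if moving from 'from' to 'to' is listed in the table. */
bool transition_allowed(RunStateSketch from, RunStateSketch to)
{
    assert(from < RS_MAX && to < RS_MAX);
    return valid[from][to];
}
```

In this sketch, after transitions_init(), transition_allowed(RS_GUEST_PANICKED, RS_RUNNING) returns true, which corresponds to "cont" succeeding from the panicked state, while transition_allowed(RS_GUEST_PANICKED, RS_PAUSED) returns false, matching the statement that there is no way to move from PANICKED to PAUSED.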