qemu-devel.nongnu.org archive mirror
From: Laszlo Ersek <lersek@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: pkrempa@redhat.com, mst@redhat.com, libvir-list@redhat.com,
	hutao@cn.fujitsu.com, marcel.a@redhat.com,
	qemu-stable@nongnu.org, qemu-devel@nongnu.org, rhod@redhat.com,
	kraxel@redhat.com, anthony@codemonkey.ws, lcapitulino@redhat.com,
	afaerber@suse.de
Subject: Re: [Qemu-devel] [PATCH] vl: allow "cont" from panicked state
Date: Wed, 21 Aug 2013 14:42:38 +0200
Message-ID: <5214B5BE.1030906@redhat.com>
In-Reply-To: <1377086477-19553-1-git-send-email-pbonzini@redhat.com>

one question below

On 08/21/13 14:01, Paolo Bonzini wrote:
> After reporting the GUEST_PANICKED monitor event, QEMU stops the VM.
> The reason for this is that events are edge-triggered, and can be lost if
> management dies at the wrong time.  Stopping a panicked VM lets management
> know of a panic even if it has crashed; management can learn about the
> panic when it restarts and queries running QEMU processes.  The downside
> is of course that the VM will be paused while management is not running,
> but that is acceptable if it only happens with explicit "-device pvpanic".
> 
> Upon learning of a panic, management (if configured to do so) can pick a
> variety of behaviors: leave the VM paused, reset it, destroy it.  In
> addition to all of these behaviors, it is possible to dump the VM core
> from the host.
> 
> However, right now, the panicked state is irreversible, and can only be
> exited by resetting the machine.  This means that any policy decision
> is entirely in the hands of the host.  In particular there is no way to
> use the "reboot on panic" option together with pvpanic.
> 
> This patch makes the panicked state reversible (and removes various
> workarounds that were there because of the state being irreversible).
> With this change, management has a wider set of possible policies: it
> can just log the crash and leave policy to the guest, or it can leave the
> VM paused.  In particular, the "log the crash and continue" policy is
> implemented simply by sending a "cont" as soon as management learns about
> the panic.
> Management could also implement the "irreversible paused state" itself.
> And again, all such actions can be coupled with dumping the VM core.
> 
> Unfortunately we cannot change the behavior of 1.6.0.  Thus, even if
> it uses "-device pvpanic", management should check for "cont" failures.
> If "cont" fails, management can then log that the VM remained paused
> and urge the administrator to update QEMU.
> 
> I suggest that this patch be included in a 1.6.1 release as soon as
> possible, and perhaps in the 1.5 branch too.
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  gdbstub.c | 3 ---
>  vl.c      | 6 ++----
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index 35ca7c2..747e67d 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -372,9 +372,6 @@ static inline void gdb_continue(GDBState *s)
>  #ifdef CONFIG_USER_ONLY
>      s->running_state = 1;
>  #else
> -    if (runstate_check(RUN_STATE_GUEST_PANICKED)) {
> -        runstate_set(RUN_STATE_DEBUG);
> -    }

Undoes bc7d0e66. Makes sense -- what we're allowing now is laxer
(includes the above).

>      if (!runstate_needs_reset()) {
>          vm_start();
>      }
> diff --git a/vl.c b/vl.c
> index 25b8f2f..818d99e 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = {
>      { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
>      { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
>  
> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },

I don't understand why this used to be here.

So, why? (*)

> +    { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },

This is the "cont" we care about -- it should allow the guest to kexec
or reboot (i.e., the panic notifier will return).

>      { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
> -    { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },

Undoes bc7d0e66. OK.

>  
>      { RUN_STATE_MAX, RUN_STATE_MAX },
>  };
> @@ -685,8 +684,7 @@ int runstate_is_running(void)
>  bool runstate_needs_reset(void)
>  {
>      return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
> -        runstate_check(RUN_STATE_SHUTDOWN) ||
> -        runstate_check(RUN_STATE_GUEST_PANICKED);
> +        runstate_check(RUN_STATE_SHUTDOWN);
>  }
>  
>  StatusInfo *qmp_query_status(Error **errp)
> 

This last hunk in effect reverts the runstate_needs_reset() changes of
the initial pvpanic commit ede085b3.

(*) Hm, I think I understand why. main_loop_should_exit(), when a reset
was requested *and* runstate_needs_reset() evaluated to true, used to
set the runstate to PAUSED -- I guess temporarily.

Since PANICKED was included in runstate_needs_reset(), this generic code
could request a transition from PANICKED to PAUSED (**). As PANICKED is
being removed from runstate_needs_reset(), the PANICKED->PAUSED
transition is not required any longer.

(**) I don't know why the generic code moves to PAUSED temporarily (from
INTERNAL_ERROR and SHUTDOWN), but I'll just accept that as status quo.
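
To make the reasoning above concrete, here is a minimal standalone sketch
of the post-patch behavior. It is not QEMU code: every sketch_* / SKETCH_*
name is hypothetical, the table contains only the transitions discussed in
this mail, and sketch_cont() is just my assumption of how a "cont" request
interacts with the needs-reset check and the transition table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of run states; names only mirror the patch. */
typedef enum {
    SKETCH_RUNNING,
    SKETCH_PAUSED,
    SKETCH_DEBUG,
    SKETCH_FINISH_MIGRATE,
    SKETCH_INTERNAL_ERROR,
    SKETCH_SHUTDOWN,
    SKETCH_GUEST_PANICKED,
    SKETCH_MAX
} SketchRunState;

typedef struct {
    SketchRunState from;
    SketchRunState to;
} SketchTransition;

/*
 * Only the transitions discussed in this mail, as they look after the
 * patch: PANICKED -> PAUSED and PANICKED -> DEBUG are gone, and
 * PANICKED -> RUNNING is new.  The table ends with a MAX/MAX sentinel.
 */
static const SketchTransition sketch_transitions[] = {
    { SKETCH_INTERNAL_ERROR, SKETCH_PAUSED },
    { SKETCH_SHUTDOWN, SKETCH_PAUSED },
    { SKETCH_GUEST_PANICKED, SKETCH_RUNNING },
    { SKETCH_GUEST_PANICKED, SKETCH_FINISH_MIGRATE },
    { SKETCH_MAX, SKETCH_MAX },
};

/* Linear scan of the table up to the sentinel entry. */
static bool sketch_transition_valid(SketchRunState from, SketchRunState to)
{
    for (size_t i = 0; sketch_transitions[i].from != SKETCH_MAX; i++) {
        if (sketch_transitions[i].from == from &&
            sketch_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}

/* After the patch, GUEST_PANICKED no longer counts as "needs reset". */
static bool sketch_needs_reset(SketchRunState state)
{
    return state == SKETCH_INTERNAL_ERROR || state == SKETCH_SHUTDOWN;
}

/*
 * Assumed model of "cont": refuse if the state needs a reset or if the
 * table has no transition to RUNNING; otherwise resume the guest.
 */
static bool sketch_cont(SketchRunState *state)
{
    assert(state != NULL);
    if (sketch_needs_reset(*state) ||
        !sketch_transition_valid(*state, SKETCH_RUNNING)) {
        return false;
    }
    *state = SKETCH_RUNNING;
    return true;
}
```

In this model, sketch_cont() on a state of SKETCH_GUEST_PANICKED succeeds
and leaves the state at SKETCH_RUNNING, while the same call on
SKETCH_SHUTDOWN is refused. Because the panicked state is no longer in
sketch_needs_reset(), nothing ever asks for PANICKED -> PAUSED, which is
why that table entry can go away.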

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

(Note that my R-b is mostly worthless: similarly to the ACPI table move,
I've been happily acking patches with opposite goals here, and that
seriously questions whether my review adds any value (beyond the lowest
technical level).)

Laszlo

