* [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM
@ 2022-02-07 3:26 Peter Xu
2022-02-07 9:08 ` Paolo Bonzini
0 siblings, 1 reply; 4+ messages in thread
From: Peter Xu @ 2022-02-07 3:26 UTC (permalink / raw)
To: qemu-devel
Cc: Juan Quintela, David Hildenbrand, Hyman Huang,
Dr . David Alan Gilbert, peterx, Philippe Mathieu-Daudé,
Leonardo Bras Soares Passos, Paolo Bonzini
QEMU can now easily crash when two migrations are carried out consecutively
with the VM stopped:
(qemu) migrate -d exec:cat>out
(qemu) migrate_cancel
(qemu) migrate -d exec:cat>out
[crash] ../softmmu/memory.c:2782: memory_global_dirty_log_start: Assertion
`!(global_dirty_tracking & flags)' failed.
It's because the memory API provides a way to postpone stopping dirty logging
if the VM is stopped, deferring it until the next VM start. That mechanism was
added in 2017 with commit 1931076077 ("migration: optimize the downtime",
2017-08-01).
However, the recent work that turned dirty tracking into a bitmask broke it,
namely commit 63b41db4bc ("memory: make global_dirty_tracking a bitmask",
2021-11-01).
The fix proposed in this patch contains two things:
(1) Instead of passing the flags through the postponed-stop hook, add a
global variable (alongside the current vmstate_change variable) to record
which flags dirty tracking should be stopped with.
(2) When starting dirty tracking, instead of removing the vmstate hook
directly, also execute the postponed stop process, so that all starts
and stops are paired.
This pairing was overlooked in the bitmask-ify work in 2021.
With the postponed stop flags available, a smarter thing to do when starting
dirty logging with a specific flag would be to ignore the start if that flag
is contained in the postponed-stop flags. However, that would slightly
complicate the code, and it shouldn't be a major use case for QEMU.
Considering that this fix should also be copied to stable, keep it simple
while fixing the crash.
Cc: Hyman Huang <huangy81@chinatelecom.cn>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044818
Fixes: 63b41db4bc ("memory: make global_dirty_tracking a bitmask")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
softmmu/memory.c | 53 ++++++++++++++++++++++++++++++++++--------------
1 file changed, 38 insertions(+), 15 deletions(-)
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 678dc62f06..cbb9b241ea 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2790,16 +2790,25 @@ void memory_global_after_dirty_log_sync(void)
MEMORY_LISTENER_CALL_GLOBAL(log_global_after_sync, Forward);
}
+/*
+ * Dirty-log stop flags that are postponed because the VM is stopped. They
+ * should only be used within the vmstate_change hook.
+ */
+static unsigned int postponed_stop_flags;
static VMChangeStateEntry *vmstate_change;
+static void memory_global_dirty_log_stop_postponed_run(void);
void memory_global_dirty_log_start(unsigned int flags)
{
unsigned int old_flags = global_dirty_tracking;
- if (vmstate_change) {
- qemu_del_vm_change_state_handler(vmstate_change);
- vmstate_change = NULL;
- }
+ /*
+ * If there are postponed dirty log stop flags, run them first, so that
+ * starts and stops are always paired. A smarter thing would be to skip the
+ * start if the same flag has been postponed for stop, but that shouldn't
+ * matter much in major user scenarios, so keep the code simple for now.
+ */
+ memory_global_dirty_log_stop_postponed_run();
assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
assert(!(global_dirty_tracking & flags));
@@ -2830,29 +2839,43 @@ static void memory_global_dirty_log_do_stop(unsigned int flags)
}
}
+/*
+ * Execute the postponed dirty log stop operations, if any, then reset
+ * everything (including the flags and the vmstate change hook).
+ */
+static void memory_global_dirty_log_stop_postponed_run(void)
+{
+ if (!vmstate_change) {
+ /* It means nothing is postponed */
+ return;
+ }
+
+ memory_global_dirty_log_do_stop(postponed_stop_flags);
+ postponed_stop_flags = 0;
+ qemu_del_vm_change_state_handler(vmstate_change);
+ vmstate_change = NULL;
+}
+
static void memory_vm_change_state_handler(void *opaque, bool running,
RunState state)
{
- unsigned int flags = (unsigned int)(uintptr_t)opaque;
if (running) {
- memory_global_dirty_log_do_stop(flags);
-
- if (vmstate_change) {
- qemu_del_vm_change_state_handler(vmstate_change);
- vmstate_change = NULL;
- }
+ memory_global_dirty_log_stop_postponed_run();
}
}
void memory_global_dirty_log_stop(unsigned int flags)
{
if (!runstate_is_running()) {
+ /* Postpone the dirty log stop, e.g., until the next time the VM starts */
if (vmstate_change) {
- return;
+ /* Batch with previous postponed flags */
+ postponed_stop_flags |= flags;
+ } else {
+ postponed_stop_flags = flags;
+ vmstate_change = qemu_add_vm_change_state_handler(
+ memory_vm_change_state_handler, NULL);
}
- vmstate_change = qemu_add_vm_change_state_handler(
- memory_vm_change_state_handler,
- (void *)(uintptr_t)flags);
return;
}
--
2.32.0
* Re: [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM
2022-02-07 3:26 [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM Peter Xu
@ 2022-02-07 9:08 ` Paolo Bonzini
2022-02-07 10:36 ` Peter Xu
0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2022-02-07 9:08 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: David Hildenbrand, Juan Quintela, Hyman Huang,
Philippe Mathieu-Daudé, Dr . David Alan Gilbert,
Leonardo Bras Soares Passos
On 2/7/22 04:26, Peter Xu wrote:
> QEMU can now easily crash when two migrations are carried out
> consecutively with the VM stopped:
>
> (qemu) migrate -d exec:cat>out
> (qemu) migrate_cancel
> (qemu) migrate -d exec:cat>out
> [crash] ../softmmu/memory.c:2782: memory_global_dirty_log_start: Assertion
> `!(global_dirty_tracking & flags)' failed.
>
> It's because the memory API provides a way to postpone stopping dirty
> logging if the VM is stopped, deferring it until the next VM start. That
> mechanism was added in 2017 with commit 1931076077 ("migration: optimize
> the downtime", 2017-08-01).
>
> However, the recent work that turned dirty tracking into a bitmask broke
> it, namely commit 63b41db4bc ("memory: make global_dirty_tracking a
> bitmask", 2021-11-01).
>
> The fix proposed in this patch contains two things:
>
> (1) Instead of passing the flags through the postponed-stop hook, add a
> global variable (alongside the current vmstate_change variable) to
> record which flags dirty tracking should be stopped with.
>
> (2) When starting dirty tracking, instead of removing the vmstate hook
> directly, also execute the postponed stop process, so that all starts
> and stops are paired.
>
> This pairing was overlooked in the bitmask-ify work in 2021.
>
> With the postponed stop flags available, a smarter thing to do when
> starting dirty logging with a specific flag would be to ignore the start
> if that flag is contained in the postponed-stop flags. However, that
> would slightly complicate the code, and it shouldn't be a major use case
> for QEMU. Considering that this fix should also be copied to stable,
> keep it simple while fixing the crash.
>
> Cc: Hyman Huang <huangy81@chinatelecom.cn>
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044818
> Fixes: 63b41db4bc ("memory: make global_dirty_tracking a bitmask")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> softmmu/memory.c | 53 ++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 38 insertions(+), 15 deletions(-)
>
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 678dc62f06..cbb9b241ea 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -2790,16 +2790,25 @@ void memory_global_after_dirty_log_sync(void)
> MEMORY_LISTENER_CALL_GLOBAL(log_global_after_sync, Forward);
> }
>
> +/*
> + * Dirty-log stop flags that are postponed because the VM is stopped. They
> + * should only be used within the vmstate_change hook.
> + */
> +static unsigned int postponed_stop_flags;
> static VMChangeStateEntry *vmstate_change;
> +static void memory_global_dirty_log_stop_postponed_run(void);
>
> void memory_global_dirty_log_start(unsigned int flags)
> {
> unsigned int old_flags = global_dirty_tracking;
>
> - if (vmstate_change) {
> - qemu_del_vm_change_state_handler(vmstate_change);
> - vmstate_change = NULL;
> - }
> + /*
> + * If there are postponed dirty log stop flags, run them first, so that
> + * starts and stops are always paired. A smarter thing would be to skip the
> + * start if the same flag has been postponed for stop, but that shouldn't
> + * matter much in major user scenarios, so keep the code simple for now.
> + */
> + memory_global_dirty_log_stop_postponed_run();
I think this should be as easy as doing this here:

    postponed_stop_flags &= ~flags;
    memory_global_dirty_log_stop_postponed_run();
    flags &= ~global_dirty_tracking;
    if (!flags) {
        return;
    }

plus adding an "if (postponed_stop_flags)" check in
memory_global_dirty_log_stop_postponed_run()?
Paolo
* Re: [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM
2022-02-07 9:08 ` Paolo Bonzini
@ 2022-02-07 10:36 ` Peter Xu
2022-02-08 8:11 ` Paolo Bonzini
0 siblings, 1 reply; 4+ messages in thread
From: Peter Xu @ 2022-02-07 10:36 UTC (permalink / raw)
To: Paolo Bonzini
Cc: David Hildenbrand, Juan Quintela, Hyman Huang,
Philippe Mathieu-Daudé, qemu-devel,
Leonardo Bras Soares Passos, Dr . David Alan Gilbert
On Mon, Feb 07, 2022 at 10:08:44AM +0100, Paolo Bonzini wrote:
> > void memory_global_dirty_log_start(unsigned int flags)
> > {
> > unsigned int old_flags = global_dirty_tracking;
> > - if (vmstate_change) {
> > - qemu_del_vm_change_state_handler(vmstate_change);
> > - vmstate_change = NULL;
> > - }
> > + /*
> > + * If there are postponed dirty log stop flags, run them first, so that
> > + * starts and stops are always paired. A smarter thing would be to skip the
> > + * start if the same flag has been postponed for stop, but that shouldn't
> > + * matter much in major user scenarios, so keep the code simple for now.
> > + */
> > + memory_global_dirty_log_stop_postponed_run();
>
> I think this should be as easy as doing this here:
>
>     postponed_stop_flags &= ~flags;
>     memory_global_dirty_log_stop_postponed_run();
>     flags &= ~global_dirty_tracking;
>     if (!flags) {
>         return;
>     }
>
> plus adding an "if (postponed_stop_flags)" check in
> memory_global_dirty_log_stop_postponed_run()?
Yeah, I can do that. Though the latter "if (!flags)" check will also start
to allow nesting of memory_global_dirty_log_start() calls, and it'll make
this assert useless:

    assert(!(global_dirty_tracking & flags));

I'll probably drop it too, then.

Curious: do we have any real case of nested calls starting dirty logging?
I always thought there was none, but I could be missing something.
I'll post a v2 soon. Thanks for the quick review!
--
Peter Xu
* Re: [PATCH] memory: Fix qemu crash on starting dirty log twice with stopped VM
2022-02-07 10:36 ` Peter Xu
@ 2022-02-08 8:11 ` Paolo Bonzini
0 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2022-02-08 8:11 UTC (permalink / raw)
To: Peter Xu
Cc: Juan Quintela, David Hildenbrand, Hyman Huang, qemu-devel,
Philippe Mathieu-Daudé, Leonardo Bras Soares Passos,
Dr . David Alan Gilbert
On 2/7/22 11:36, Peter Xu wrote:
> Yeah, I can do that. Though the latter "if (!flags)" check will also start
> to allow nesting of memory_global_dirty_log_start() calls, and it'll make
> this assert useless:
>
>     assert(!(global_dirty_tracking & flags));
>
> I'll probably drop it too, then.
>
> Curious: do we have any real case of nested calls starting dirty logging?
> I always thought there was none, but I could be missing something.
I don't think so, but I think there's no disadvantage in allowing it.
Paolo