qemu-devel.nongnu.org archive mirror
* [PULL 0/8] Next patches
@ 2022-11-21 12:58 Juan Quintela
  2022-11-21 12:59 ` [PULL 1/8] migration/channel-block: fix return value for qio_channel_block_{readv, writev} Juan Quintela
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Juan Quintela @ 2022-11-21 12:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini

The following changes since commit a082fab9d259473a9d5d53307cf83b1223301181:

  Merge tag 'pull-ppc-20221117' of https://gitlab.com/danielhb/qemu into staging (2022-11-17 12:39:38 -0500)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/next-pull-request

for you to fetch changes up to b5280437a7f49cf617cdd99bbbe2c7bd1652408b:

  migration: Block migration comment or code is wrong (2022-11-21 11:58:10 +0100)

----------------------------------------------------------------
Migration PULL request (take 3)

Hi

Drop everything that is not a bug fix:
- fixes by Peter Xu
- fix comment on block creation (me)
- fix return values from qio_channel_block_{readv,writev}()

Please, apply.

(take 1)
It includes:
- Leonardo fix for zero_copy flush
- Fiona fix for return value of readv/writev
- Peter Xu cleanups
- Peter Xu preempt patches
- Patches ready from zero page (me)
- AVX2 support (ling)
- fix for slow networking and reordering of first packets (manish)

Please, apply.

----------------------------------------------------------------

Fiona Ebner (1):
  migration/channel-block: fix return value for
    qio_channel_block_{readv,writev}

Juan Quintela (1):
  migration: Block migration comment or code is wrong

Leonardo Bras (1):
  migration/multifd/zero-copy: Create helper function for flushing

Peter Xu (5):
  migration: Fix possible infinite loop of ram save process
  migration: Fix race on qemu_file_shutdown()
  migration: Disallow postcopy preempt to be used with compress
  migration: Use non-atomic ops for clear log bitmap
  migration: Disable multifd explicitly with compression

 include/exec/ram_addr.h   | 11 +++++-----
 include/exec/ramblock.h   |  3 +++
 include/qemu/bitmap.h     |  1 +
 migration/block.c         |  4 ++--
 migration/channel-block.c |  6 ++++--
 migration/migration.c     | 18 ++++++++++++++++
 migration/multifd.c       | 30 ++++++++++++++++----------
 migration/qemu-file.c     | 27 ++++++++++++++++++++---
 migration/ram.c           | 27 ++++++++++++++---------
 util/bitmap.c             | 45 +++++++++++++++++++++++++++++++++++++++
 10 files changed, 139 insertions(+), 33 deletions(-)

-- 
2.38.1




* [PULL 1/8] migration/channel-block: fix return value for qio_channel_block_{readv, writev}
From: Juan Quintela @ 2022-11-21 12:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini, Fiona Ebner

From: Fiona Ebner <f.ebner@proxmox.com>

Return -1 from qio_channel_block_{readv,writev}() in the error case. The
documentation in include/io/channel.h states that -1 or
QIO_CHANNEL_ERR_BLOCK should be returned upon error. Simply passing along
the return value from the bdrv-functions has the potential to confuse the
call sites. Non-blocking mode is not implemented currently, so -1 it is.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/channel-block.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/channel-block.c b/migration/channel-block.c
index c55c8c93ce..f4ab53acdb 100644
--- a/migration/channel-block.c
+++ b/migration/channel-block.c
@@ -62,7 +62,8 @@ qio_channel_block_readv(QIOChannel *ioc,
     qemu_iovec_init_external(&qiov, (struct iovec *)iov, niov);
     ret = bdrv_readv_vmstate(bioc->bs, &qiov, bioc->offset);
     if (ret < 0) {
-        return ret;
+        error_setg_errno(errp, -ret, "bdrv_readv_vmstate failed");
+        return -1;
     }
 
     bioc->offset += qiov.size;
@@ -86,7 +87,8 @@ qio_channel_block_writev(QIOChannel *ioc,
     qemu_iovec_init_external(&qiov, (struct iovec *)iov, niov);
     ret = bdrv_writev_vmstate(bioc->bs, &qiov, bioc->offset);
     if (ret < 0) {
-        return ret;
+        error_setg_errno(errp, -ret, "bdrv_writev_vmstate failed");
+        return -1;
     }
 
     bioc->offset += qiov.size;
-- 
2.38.1
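
As a rough illustration of the convention that the commit message of
patch 1/8 describes, here is a minimal, self-contained C sketch. It does
not use QEMU's real Error or QIOChannel types: SketchError,
sketch_backend_read() and sketch_channel_read() are hypothetical names
invented for this example. The only point it shows is that a negative
errno from the backend is turned into an error message plus a plain -1.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an error object: a message and a "set" flag. */
typedef struct {
    char msg[128];
    int is_set;
} SketchError;

/*
 * Hypothetical backend: returns the number of bytes transferred, or a
 * negative errno value when fail_with is non-zero.
 */
static long sketch_backend_read(int fail_with)
{
    assert(fail_with >= 0);
    return fail_with ? -(long)fail_with : 512;
}

/*
 * Channel-style wrapper: on failure, describe the errno in *errp (if the
 * caller supplied one) and return exactly -1, so that callers never see
 * a raw negative errno value.
 */
static long sketch_channel_read(int fail_with, SketchError *errp)
{
    long ret = sketch_backend_read(fail_with);

    if (ret < 0) {
        if (errp) {
            snprintf(errp->msg, sizeof(errp->msg),
                     "backend read failed: %s", strerror((int)-ret));
            errp->is_set = 1;
        }
        return -1;
    }
    return ret;
}
```

A caller would test the result against -1 only and read the details from
the error object, which is the behaviour the commit message asks for.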




* [PULL 2/8] migration/multifd/zero-copy: Create helper function for flushing
From: Juan Quintela @ 2022-11-21 12:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini, Leonardo Bras

From: Leonardo Bras <leobras@redhat.com>

Move flushing code from multifd_send_sync_main() to a new helper, and call
it in multifd_send_sync_main().

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/multifd.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 586ddc9d65..509bbbe3bf 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -566,6 +566,23 @@ void multifd_save_cleanup(void)
     multifd_send_state = NULL;
 }
 
+static int multifd_zero_copy_flush(QIOChannel *c)
+{
+    int ret;
+    Error *err = NULL;
+
+    ret = qio_channel_flush(c, &err);
+    if (ret < 0) {
+        error_report_err(err);
+        return -1;
+    }
+    if (ret == 1) {
+        dirty_sync_missed_zero_copy();
+    }
+
+    return ret;
+}
+
 int multifd_send_sync_main(QEMUFile *f)
 {
     int i;
@@ -616,17 +633,8 @@ int multifd_send_sync_main(QEMUFile *f)
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_post(&p->sem);
 
-        if (flush_zero_copy && p->c) {
-            int ret;
-            Error *err = NULL;
-
-            ret = qio_channel_flush(p->c, &err);
-            if (ret < 0) {
-                error_report_err(err);
-                return -1;
-            } else if (ret == 1) {
-                dirty_sync_missed_zero_copy();
-            }
+        if (flush_zero_copy && p->c && (multifd_zero_copy_flush(p->c) < 0)) {
+            return -1;
         }
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
-- 
2.38.1




* [PULL 3/8] migration: Fix possible infinite loop of ram save process
From: Juan Quintela @ 2022-11-21 12:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini

From: Peter Xu <peterx@redhat.com>

When starting the RAM saving procedure (especially at the completion
phase), always set last_seen_block to non-NULL to make sure we can always
correctly detect the case where "we've migrated all the dirty pages".

This guarantees that both last_seen_block and pss.block are valid before
the loop starts.

See the comment in the code for some details.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc..1d42414ecc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2546,14 +2546,22 @@ static int ram_find_and_save_block(RAMState *rs)
         return pages;
     }
 
+    /*
+     * Always keep last_seen_block/last_page valid during this procedure,
+     * because find_dirty_block() relies on these values (e.g., we compare
+     * last_seen_block with pss.block to see whether we searched all the
+     * ramblocks) to detect the completion of migration.  Having NULL value
+     * of last_seen_block can conditionally cause below loop to run forever.
+     */
+    if (!rs->last_seen_block) {
+        rs->last_seen_block = QLIST_FIRST_RCU(&ram_list.blocks);
+        rs->last_page = 0;
+    }
+
     pss.block = rs->last_seen_block;
     pss.page = rs->last_page;
     pss.complete_round = false;
 
-    if (!pss.block) {
-        pss.block = QLIST_FIRST_RCU(&ram_list.blocks);
-    }
-
     do {
         again = true;
         found = get_queued_page(rs, &pss);
-- 
2.38.1
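
The termination problem described in patch 3/8 can be modelled with a
small, self-contained C sketch. This is a simplification under stated
assumptions, not the code in migration/ram.c: blocks are array indices,
SKETCH_NO_BLOCK plays the role of a NULL last_seen_block, only the block
(not the page) is compared, and a max_steps bound is added so that the
broken case can be observed instead of hanging. All names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NBLOCKS   3
#define SKETCH_NO_BLOCK  (-1)   /* plays the role of a NULL block pointer */
#define SKETCH_BOUND_HIT (-2)   /* the unbounded loop would still be running */

/*
 * Walk the blocks circularly, looking for a dirty one.  The search is
 * declared finished once a complete round has brought the cursor back
 * to last_seen.  If last_seen is SKETCH_NO_BLOCK that comparison can
 * never succeed, so with no dirty block the walk only stops at the bound.
 */
static int sketch_find_dirty(const bool dirty[SKETCH_NBLOCKS],
                             int last_seen, int max_steps)
{
    int cur = (last_seen == SKETCH_NO_BLOCK) ? 0 : last_seen;
    bool complete_round = false;

    assert(max_steps > 0);

    for (int step = 0; step < max_steps; step++) {
        if (complete_round && cur == last_seen) {
            return SKETCH_NO_BLOCK;   /* everything searched, nothing dirty */
        }
        if (dirty[cur]) {
            return cur;
        }
        cur++;
        if (cur == SKETCH_NBLOCKS) {
            cur = 0;                  /* wrapped around the block list */
            complete_round = true;
        }
    }
    return SKETCH_BOUND_HIT;
}

/* The idea of the fix: make last_seen valid before the search starts. */
static int sketch_find_dirty_fixed(const bool dirty[SKETCH_NBLOCKS],
                                   int last_seen, int max_steps)
{
    if (last_seen == SKETCH_NO_BLOCK) {
        last_seen = 0;
    }
    return sketch_find_dirty(dirty, last_seen, max_steps);
}
```

With no dirty block and an invalid last_seen, the first function only
returns because of the artificial bound, while the second one detects the
completed round.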




* [PULL 4/8] migration: Fix race on qemu_file_shutdown()
From: Juan Quintela @ 2022-11-21 12:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini, Daniel P . Berrange

From: Peter Xu <peterx@redhat.com>

In qemu_file_shutdown(), there's a possible race with the current order of
operations.  There are two major things to do:

  (1) Do real shutdown() (e.g. shutdown() syscall on socket)
  (2) Update qemufile's last_error

We must do (2) before (1), otherwise there can be a race condition like:

      page receiver                     other thread
      -------------                     ------------
      qemu_get_buffer()
                                        do shutdown()
        returns 0 (buffer all zero)
        (meanwhile we didn't check this retcode)
      try to detect IO error
        last_error==NULL, IO okay
      install ALL-ZERO page
                                        set last_error
      --> guest crash!

To fix this, we could also check the retval of qemu_get_buffer(), but not
all APIs can be properly checked and ultimately we still need to go back
to qemu_file_get_error().  E.g. qemu_get_byte() doesn't return an error.

Maybe some day a rework of the qemufile API is really needed, but for now
keep using qemu_file_get_error() and fix it by not allowing that race
condition to happen.  Here shutdown() is indeed special because the
last_error is emulated.  For real -EIO errors it'll always be set when
e.g. a sendmsg() error triggers, so we won't miss those; only shutdown()
is a bit tricky here.

Cc: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/qemu-file.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 4f400c2e52..2d5f74ffc2 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -79,6 +79,30 @@ int qemu_file_shutdown(QEMUFile *f)
     int ret = 0;
 
     f->shutdown = true;
+
+    /*
+     * We must set qemufile error before the real shutdown(), otherwise
+     * there can be a race window where we thought IO all went through
+     * (because last_error==NULL) but actually IO has already stopped.
+     *
+     * Without correct ordering, the race can happen like this:
+     *
+     *      page receiver                     other thread
+     *      -------------                     ------------
+     *      qemu_get_buffer()
+     *                                        do shutdown()
+     *        returns 0 (buffer all zero)
+     *        (we didn't check this retcode)
+     *      try to detect IO error
+     *        last_error==NULL, IO okay
+     *      install ALL-ZERO page
+     *                                        set last_error
+     *      --> guest crash!
+     */
+    if (!f->last_error) {
+        qemu_file_set_error(f, -EIO);
+    }
+
     if (!qio_channel_has_feature(f->ioc,
                                  QIO_CHANNEL_FEATURE_SHUTDOWN)) {
         return -ENOSYS;
@@ -88,9 +112,6 @@ int qemu_file_shutdown(QEMUFile *f)
         ret = -EIO;
     }
 
-    if (!f->last_error) {
-        qemu_file_set_error(f, -EIO);
-    }
     return ret;
 }
 
-- 
2.38.1
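
The ordering argument of patch 4/8 can be illustrated with a
deterministic, single-threaded C sketch. It is only a model under stated
assumptions: SketchFile and the sketch_* functions are hypothetical, the
"other thread" is replaced by a callback that runs exactly in the window
between the two steps of the shutdown, and the reader only looks at
last_error, as in the diagram in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, much reduced model of a migration file. */
typedef struct {
    bool channel_open;   /* false once the simulated shutdown() happened */
    int last_error;      /* 0 means "no error recorded" */
} SketchFile;

/* Callback run between the two shutdown steps, modelling the reader. */
typedef void (*sketch_hook)(SketchFile *f, void *opaque);

/*
 * Reader model: a read on a closed channel yields no data, and the only
 * check performed afterwards is last_error.  Returns true if the reader
 * would wrongly treat an empty buffer as a valid page.
 */
static bool sketch_reader_accepts_bad_page(const SketchFile *f)
{
    bool got_data = f->channel_open;
    bool io_looks_ok = (f->last_error == 0);

    return !got_data && io_looks_ok;
}

/* Hook that stores the reader's verdict in the bool that opaque points to. */
static void sketch_record_reader(SketchFile *f, void *opaque)
{
    *(bool *)opaque = sketch_reader_accepts_bad_page(f);
}

/* Old order: real shutdown first, error recorded afterwards. */
static void sketch_shutdown_old(SketchFile *f, sketch_hook hook, void *opaque)
{
    assert(f && hook);
    f->channel_open = false;
    hook(f, opaque);               /* reader runs inside the race window */
    if (!f->last_error) {
        f->last_error = -EIO;
    }
}

/* New order: error recorded first, real shutdown afterwards. */
static void sketch_shutdown_new(SketchFile *f, sketch_hook hook, void *opaque)
{
    assert(f && hook);
    if (!f->last_error) {
        f->last_error = -EIO;
    }
    hook(f, opaque);               /* same window, error already visible */
    f->channel_open = false;
}
```

In the old order the hook observes a closed channel with no error set; in
the new order there is no point at which that combination is visible.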




* [PULL 5/8] migration: Disallow postcopy preempt to be used with compress
From: Juan Quintela @ 2022-11-21 12:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini

From: Peter Xu <peterx@redhat.com>

Preempt mode requires the ability to assign a channel to each page, while
the compression logic currently assigns pages to different compress
threads/local channels, so the two are potentially incompatible.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 739bb683f3..f3ed77a7d0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1337,6 +1337,17 @@ static bool migrate_caps_check(bool *cap_list,
             error_setg(errp, "Postcopy preempt requires postcopy-ram");
             return false;
         }
+
+        /*
+         * Preempt mode requires urgent pages to be sent in separate
+         * channel, OTOH compression logic will disorder all pages into
+         * different compression channels, which is not compatible with the
+         * preempt assumptions on channel assignments.
+         */
+        if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+            error_setg(errp, "Postcopy preempt not compatible with compress");
+            return false;
+        }
     }
 
     return true;
-- 
2.38.1




* [PULL 6/8] migration: Use non-atomic ops for clear log bitmap
From: Juan Quintela @ 2022-11-21 12:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini

From: Peter Xu <peterx@redhat.com>

Since we already have bitmap_mutex to protect both the dirty bitmap and
the clear log bitmap, we don't need atomic operations to set/clear/test
the clear log bitmap.  Switch all ops from the atomic to the non-atomic
versions, and touch up the comments to show which lock is in charge.

Introduce a non-atomic version of bitmap_test_and_clear_atomic(), mostly
the same as the atomic version but simplified in a few places, e.g. by
dropping the "old_bits" variable and the explicit memory barriers.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 include/exec/ram_addr.h | 11 +++++-----
 include/exec/ramblock.h |  3 +++
 include/qemu/bitmap.h   |  1 +
 util/bitmap.c           | 45 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 1500680458..f4fb6a2111 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -42,7 +42,8 @@ static inline long clear_bmap_size(uint64_t pages, uint8_t shift)
 }
 
 /**
- * clear_bmap_set: set clear bitmap for the page range
+ * clear_bmap_set: set clear bitmap for the page range.  Must be with
+ * bitmap_mutex held.
  *
  * @rb: the ramblock to operate on
  * @start: the start page number
@@ -55,12 +56,12 @@ static inline void clear_bmap_set(RAMBlock *rb, uint64_t start,
 {
     uint8_t shift = rb->clear_bmap_shift;
 
-    bitmap_set_atomic(rb->clear_bmap, start >> shift,
-                      clear_bmap_size(npages, shift));
+    bitmap_set(rb->clear_bmap, start >> shift, clear_bmap_size(npages, shift));
 }
 
 /**
- * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set
+ * clear_bmap_test_and_clear: test clear bitmap for the page, clear if set.
+ * Must be with bitmap_mutex held.
  *
  * @rb: the ramblock to operate on
  * @page: the page number to check
@@ -71,7 +72,7 @@ static inline bool clear_bmap_test_and_clear(RAMBlock *rb, uint64_t page)
 {
     uint8_t shift = rb->clear_bmap_shift;
 
-    return bitmap_test_and_clear_atomic(rb->clear_bmap, page >> shift, 1);
+    return bitmap_test_and_clear(rb->clear_bmap, page >> shift, 1);
 }
 
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 6cbedf9e0c..adc03df59c 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -53,6 +53,9 @@ struct RAMBlock {
      * and split clearing of dirty bitmap on the remote node (e.g.,
      * KVM).  The bitmap will be set only when doing global sync.
      *
+     * It is only used during src side of ram migration, and it is
+     * protected by the global ram_state.bitmap_mutex.
+     *
      * NOTE: this bitmap is different comparing to the other bitmaps
      * in that one bit can represent multiple guest pages (which is
      * decided by the `clear_bmap_shift' variable below).  On
diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 82a1d2f41f..3ccb00865f 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -253,6 +253,7 @@ void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
 bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr);
+bool bitmap_test_and_clear(unsigned long *map, long start, long nr);
 void bitmap_copy_and_clear_atomic(unsigned long *dst, unsigned long *src,
                                   long nr);
 unsigned long bitmap_find_next_zero_area(unsigned long *map,
diff --git a/util/bitmap.c b/util/bitmap.c
index f81d8057a7..8d12e90a5a 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -240,6 +240,51 @@ void bitmap_clear(unsigned long *map, long start, long nr)
     }
 }
 
+bool bitmap_test_and_clear(unsigned long *map, long start, long nr)
+{
+    unsigned long *p = map + BIT_WORD(start);
+    const long size = start + nr;
+    int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
+    unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
+    bool dirty = false;
+
+    assert(start >= 0 && nr >= 0);
+
+    /* First word */
+    if (nr - bits_to_clear > 0) {
+        if ((*p) & mask_to_clear) {
+            dirty = true;
+        }
+        *p &= ~mask_to_clear;
+        nr -= bits_to_clear;
+        bits_to_clear = BITS_PER_LONG;
+        p++;
+    }
+
+    /* Full words */
+    if (bits_to_clear == BITS_PER_LONG) {
+        while (nr >= BITS_PER_LONG) {
+            if (*p) {
+                dirty = true;
+                *p = 0;
+            }
+            nr -= BITS_PER_LONG;
+            p++;
+        }
+    }
+
+    /* Last word */
+    if (nr) {
+        mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
+        if ((*p) & mask_to_clear) {
+            dirty = true;
+        }
+        *p &= ~mask_to_clear;
+    }
+
+    return dirty;
+}
+
 bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr)
 {
     unsigned long *p = map + BIT_WORD(start);
-- 
2.38.1
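
For readers who want to experiment with the semantics of a non-atomic
"test and clear" over a bit range, as discussed in patch 6/8, here is a
self-contained C sketch. It is not the function from util/bitmap.c: it is
restructured as a single loop over the words touched by the range, the
macro and function names are hypothetical, and the caller is assumed to
hold whatever lock protects the bitmap (the role of bitmap_mutex in the
patch).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SK_BITS_PER_LONG ((long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Clear bits [start, start + nr) in map and report whether any of them
 * was set.  Non-atomic: the caller must serialise access to the bitmap.
 */
static bool sketch_test_and_clear(unsigned long *map, long start, long nr)
{
    long end = start + nr;          /* one past the last bit to clear */
    bool dirty = false;

    assert(start >= 0 && nr >= 0);

    while (start < end) {
        long word = start / SK_BITS_PER_LONG;
        long word_end = (word + 1) * SK_BITS_PER_LONG;
        long chunk_end = end < word_end ? end : word_end;
        /* Bits from 'start' up to the top of this word. */
        unsigned long mask = ~0UL << (start % SK_BITS_PER_LONG);

        if (chunk_end < word_end) {
            /* Range ends inside this word: drop the bits above it. */
            mask &= ~0UL >> (SK_BITS_PER_LONG - chunk_end % SK_BITS_PER_LONG);
        }
        if (map[word] & mask) {
            dirty = true;
        }
        map[word] &= ~mask;
        start = chunk_end;          /* next iteration starts word-aligned */
    }
    return dirty;
}
```

Because every iteration after the first starts on a word boundary, the
mask for later words is rebuilt from scratch, so a range that spans
several words clears only the bits inside [start, start + nr).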




* [PULL 7/8] migration: Disable multifd explicitly with compression
From: Juan Quintela @ 2022-11-21 12:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini

From: Peter Xu <peterx@redhat.com>

The multifd thread model does not work for compression, so explicitly
disable it.

Note that previously, even if both were enabled, nothing would go wrong,
because the compression code has higher priority, so the multifd feature
was just ignored.  Now we fail earlier, at config time, so the user
should be better aware of the consequence.

Note that there is a slight chance of breaking existing users, but let's
assume they're not the majority and not serious users, or they would have
found already that multifd is not working.

With that, we can safely drop the check in ram_save_target_page() for
using multifd: when multifd=on, compression=off, so the removed check on
save_page_use_compression() would always return false.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.c |  7 +++++++
 migration/ram.c       | 11 +++++------
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f3ed77a7d0..f485eea5fb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1350,6 +1350,13 @@ static bool migrate_caps_check(bool *cap_list,
         }
     }
 
+    if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+        if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+            error_setg(errp, "Multifd is not compatible with compress");
+            return false;
+        }
+    }
+
     return true;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 1d42414ecc..1338e47665 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2305,13 +2305,12 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
     }
 
     /*
-     * Do not use multifd for:
-     * 1. Compression as the first page in the new block should be posted out
-     *    before sending the compressed page
-     * 2. In postcopy as one whole host page should be placed
+     * Do not use multifd in postcopy as one whole host page should be
+     * placed.  Meanwhile postcopy requires atomic update of pages, so even
+     * if host page size == guest page size the dest guest during run may
+     * still see partially copied pages which is data corruption.
      */
-    if (!save_page_use_compression(rs) && migrate_use_multifd()
-        && !migration_in_postcopy()) {
+    if (migrate_use_multifd() && !migration_in_postcopy()) {
         return ram_save_multifd_page(rs, block, offset);
     }
 
-- 
2.38.1
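
The capability checks added in patches 5/8 and 7/8 amount to a small
compatibility table. The following self-contained C sketch shows the same
rules outside of QEMU. The enum, the function name and the convention of
returning the error message (or NULL on success) are assumptions made
for this example; only the three message strings are taken from the
diffs above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability indices, not QEMU's MIGRATION_CAPABILITY_*. */
enum sketch_cap {
    SKETCH_CAP_POSTCOPY_RAM,
    SKETCH_CAP_POSTCOPY_PREEMPT,
    SKETCH_CAP_COMPRESS,
    SKETCH_CAP_MULTIFD,
    SKETCH_CAP_COUNT
};

/*
 * Return NULL if the combination of capabilities is acceptable, or a
 * message describing the first conflict found.
 */
static const char *sketch_caps_check(const bool caps[SKETCH_CAP_COUNT])
{
    assert(caps != NULL);

    if (caps[SKETCH_CAP_POSTCOPY_PREEMPT]) {
        if (!caps[SKETCH_CAP_POSTCOPY_RAM]) {
            return "Postcopy preempt requires postcopy-ram";
        }
        /* Preempt needs control over which channel carries each page. */
        if (caps[SKETCH_CAP_COMPRESS]) {
            return "Postcopy preempt not compatible with compress";
        }
    }

    /* Reject at configuration time instead of silently ignoring multifd. */
    if (caps[SKETCH_CAP_MULTIFD] && caps[SKETCH_CAP_COMPRESS]) {
        return "Multifd is not compatible with compress";
    }

    return NULL;
}
```

Rejecting the combination up front is what lets the sending path stop
re-checking for compression before choosing multifd.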




* [PULL 8/8] migration: Block migration comment or code is wrong
From: Juan Quintela @ 2022-11-21 12:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini

And it appears that what is wrong is the code. During bulk stage we
need to make sure that some block is dirty, but no games with
max_size at all.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 migration/block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 3577c815a9..4347da1526 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -880,8 +880,8 @@ static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
     blk_mig_unlock();
 
     /* Report at least one block pending during bulk phase */
-    if (pending <= max_size && !block_mig_state.bulk_completed) {
-        pending = max_size + BLK_MIG_BLOCK_SIZE;
+    if (!pending && !block_mig_state.bulk_completed) {
+        pending = BLK_MIG_BLOCK_SIZE;
     }
 
     trace_migration_block_save_pending(pending);
-- 
2.38.1
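
To make the behavioural change of patch 8/8 concrete, the two versions of
the adjustment can be compared side by side in a self-contained C sketch.
The function names are hypothetical, and SKETCH_BLOCK_SIZE is an assumed
value standing in for BLK_MIG_BLOCK_SIZE; the two conditions are copied
from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE (1ULL << 20)   /* assumed stand-in value */

/* Old rule: during the bulk phase, always report more than max_size. */
static uint64_t sketch_pending_old(uint64_t pending, uint64_t max_size,
                                   bool bulk_completed)
{
    if (pending <= max_size && !bulk_completed) {
        pending = max_size + SKETCH_BLOCK_SIZE;
    }
    return pending;
}

/* New rule: during the bulk phase, only make sure pending is not zero. */
static uint64_t sketch_pending_new(uint64_t pending, bool bulk_completed)
{
    if (!pending && !bulk_completed) {
        pending = SKETCH_BLOCK_SIZE;
    }
    return pending;
}
```

For example, with 100 bytes pending, max_size of 1000 and the bulk phase
still running, the old rule reports 1000 + SKETCH_BLOCK_SIZE while the
new rule reports the real 100; with nothing pending, the new rule reports
one block.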




* Re: [PULL 0/8] Next patches
From: Stefan Hajnoczi @ 2022-11-21 15:54 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Stefan Hajnoczi, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé, Fam Zheng, Juan Quintela, qemu-block,
	David Hildenbrand, Peter Xu, Paolo Bonzini


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any user-visible changes.

