[RFC PATCH 0/5] migration: fast snapshot load


* [RFC PATCH 0/5] migration: fast snapshot load
@ 2026-06-18  3:20 Aadeshveer Singh
  2026-06-18  3:20 ` [RFC PATCH 1/5] migration: add RAM Block fields and helpers for " Aadeshveer Singh
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Aadeshveer Singh @ 2026-06-18  3:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, farosas, pbonzini, philmd, lvivier, ayoub,
	Aadeshveer Singh

This RFC implements a "fast snapshot load" mechanism to significantly
reduce the perceived resume time of a VM from a snapshot file.

Currently, resuming a VM from a snapshot file requires loading all RAM
pages into the QEMU instance before execution begins. This extension
allows the user to run the VM nearly instantly by loading only the
required device states up front and loading RAM pages lazily, by
trapping access to pages that have not yet been loaded.

Using the Linux userfaultfd syscall, a fault thread catches all page
faults caused by the guest and loads in the pages required to keep
the VM running. Concurrently, an eager background thread iteratively
loads all remaining pages into RAM so the guest does not have to
depend on the fault thread indefinitely.

Much of the code is reused from postcopy (fault handling) and precopy
(reading the mapped-ram file). The implementation revolves around two
threads: the fault thread and the eager load thread. The fault thread,
as the name suggests, catches guest page faults and serves them using
userfaultfd. The postcopy fault thread is reused, but instead of
requesting a page from the source it loads the page directly by reading
from the file. So that the guest does not depend on the fault thread
indefinitely, the eager load thread loads the entire RAM sequentially;
after iterating through all of RAM it signals the fault thread to exit
and calls cleanup.

To prevent a page from being loaded twice (when the eager load thread
is loading it and the fault thread also tries to serve a fault on the
same page), a bitmap called pending_bmap tracks the pages that are still
pending, i.e. not yet claimed by any thread. Atomic operations on this
bitmap coordinate the two threads so that each page is loaded exactly
once.
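
The claim protocol described above can be sketched in isolation. This
is a minimal illustration, not the code from the series: it assumes C11
atomics and pthreads in place of QEMU's bitmap helpers, and the names
pending, loads, claim_page, and run_race are hypothetical. Two threads
sweep the same page range from opposite ends, and the atomic
test-and-clear lets only one of them "load" each page.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NPAGES 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS ((NPAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-ins for pending_bmap and a per-page load counter. */
static _Atomic unsigned long pending[NWORDS];
static atomic_int loads[NPAGES];

/* Atomically clear the page's bit; only the thread that cleared it wins. */
static int claim_page(size_t page)
{
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    unsigned long old;

    assert(page < NPAGES);
    old = atomic_fetch_and(&pending[page / BITS_PER_WORD], ~mask);
    return (old & mask) != 0;
}

static void load_page(size_t page)
{
    if (claim_page(page)) {
        /* A real loader would read the page from the file here. */
        atomic_fetch_add(&loads[page], 1);
    }
}

/* Eager-thread stand-in: sweep forward over every page. */
static void *eager_sweep(void *arg)
{
    (void)arg;
    for (size_t p = 0; p < NPAGES; p++) {
        load_page(p);
    }
    return NULL;
}

/* Fault-thread stand-in: sweep backward so the two threads collide. */
static void *fault_sweep(void *arg)
{
    (void)arg;
    for (size_t p = NPAGES; p-- > 0;) {
        load_page(p);
    }
    return NULL;
}

/* Returns the number of pages that were not loaded exactly once. */
static int run_race(void)
{
    pthread_t a, b;
    int bad = 0;

    for (size_t w = 0; w < NWORDS; w++) {
        atomic_store(&pending[w], ~0UL);    /* every page starts pending */
    }
    for (size_t p = 0; p < NPAGES; p++) {
        atomic_store(&loads[p], 0);
    }
    pthread_create(&a, NULL, eager_sweep, NULL);
    pthread_create(&b, NULL, fault_sweep, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    for (size_t p = 0; p < NPAGES; p++) {
        bad += atomic_load(&loads[p]) != 1;
    }
    return bad;
}
```

Calling run_race() should return 0 regardless of how the two sweeps
interleave, because the fetch-and-clear hands each bit to one caller.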

This series was tested with a minimal Debian 13 system and with Fedora
44 KDE; snapshots of both load successfully with no errors.

Next Steps:
- Add tests (qtest and unit tests)
- Add support for postcopy-blocktime
- Update documentation

Future direction:
- Add support for hugepages
- Add support for multifd
- Add support for vhost-user

Aadeshveer Singh (5):
  migration: add RAM Block fields and helpers for fast snapshot load
  migration: add support for fault thread to load pages from disk
  migration: add eager load thread for fast snapshot load
  migration: write up code to run fast snapshot load in
    qemu_loadvm_state
  migration/tests: remove capability conflict test
    postcopy-ram+mapped-ram

 include/system/ramblock.h          |   8 ++
 migration/migration.c              |  10 +-
 migration/migration.h              |   5 +
 migration/options.c                |  11 +-
 migration/options.h                |   1 +
 migration/postcopy-ram.c           | 167 ++++++++++++++++++++++++++---
 migration/postcopy-ram.h           |   2 +
 migration/qemu-file.c              |  10 +-
 migration/ram.c                    |  61 +++++++++--
 migration/savevm.c                 |  52 ++++++++-
 migration/savevm.h                 |   2 +
 migration/trace-events             |   2 +
 tests/qtest/migration/misc-tests.c |  52 ---------
 13 files changed, 283 insertions(+), 100 deletions(-)

-- 
2.54.0


* [RFC PATCH 1/5] migration: add RAM Block fields and helpers for fast snapshot load
  2026-06-18  3:20 [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
@ 2026-06-18  3:20 ` Aadeshveer Singh
  2026-06-22 16:23   ` Peter Xu
  2026-06-18  3:20 ` [RFC PATCH 2/5] migration: add support for fault thread to load pages from disk Aadeshveer Singh
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Aadeshveer Singh @ 2026-06-18  3:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, farosas, pbonzini, philmd, lvivier, ayoub,
	Aadeshveer Singh

Add two fields per RAMBlock:

- nonzeropages: Mirrors the mapped-ram bitmap, recording which pages
  are present in the file and which are zero.
- pending_bmap: Tracks which pages have not yet been claimed for
  loading by any thread, so that the threads can coordinate.

Both fields are allocated and initialized in ram_load_setup() and freed
in ram_load_cleanup(). nonzeropages is populated in
parse_ramblock_mapped_ram(), which eliminates the temporary bitmap.
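
As a rough illustration of the sizing involved, the sketch below
computes the number of pages in a block (length shifted right by the
page bits), the number of unsigned longs needed to hold one bit per
page, and a bitmap with every page initially marked pending. It is
plain C rather than QEMU's bitmap API, and the helper names
(bits_to_longs, block_pages, pending_bitmap_new, bit_is_set) are
assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold nbits bits. */
static size_t bits_to_longs(size_t nbits)
{
    return (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/* Number of target pages covered by a block of max_length bytes. */
static size_t block_pages(size_t max_length, unsigned page_bits)
{
    return max_length >> page_bits;
}

/* Allocate a bitmap with the first nbits bits set: all pages pending. */
static unsigned long *pending_bitmap_new(size_t nbits)
{
    unsigned long *map;

    assert(nbits > 0);
    map = calloc(bits_to_longs(nbits), sizeof(unsigned long));
    assert(map);
    for (size_t i = 0; i < nbits; i++) {
        map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
    }
    return map;
}

/* Test a single bit of the bitmap. */
static int bit_is_set(const unsigned long *map, size_t bit)
{
    return (int)((map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL);
}
```

For example, a 1 MiB block with 4 KiB pages (page_bits of 12) has 256
pages, so each bitmap needs bits_to_longs(256) words. Bits beyond the
last page stay clear, which matters when the bitmap is later scanned
word by word.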

Change ram_load() to use ram_load_precopy() in the fast snapshot load
case.

Also add migrate_fast_snapshot_load(), which returns true when both the
postcopy-ram and mapped-ram capabilities are set.

Update qemu_get_buffer_at() to no longer set an error on the QEMUFile,
which makes it thread safe. All callers of qemu_get_buffer_at() handle
errors themselves.

Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
---
 include/system/ramblock.h |  8 +++++
 migration/options.c       |  5 ++++
 migration/options.h       |  1 +
 migration/qemu-file.c     | 10 +------
 migration/ram.c           | 61 ++++++++++++++++++++++++++++++++-------
 5 files changed, 65 insertions(+), 20 deletions(-)

diff --git a/include/system/ramblock.h b/include/system/ramblock.h
index 4435f8d55f..73275d0459 100644
--- a/include/system/ramblock.h
+++ b/include/system/ramblock.h
@@ -60,6 +60,14 @@ struct RAMBlock {
 
     /* Bitmap of already received pages.  Only used on destination side. */
     unsigned long *receivedmap;
+    /* Bitmap of non-zero pages. Used for fast snapshot load. */
+    unsigned long *nonzeropages;
+    /*
+     * Bitmap for pages that are yet to be read from disk. It is required for
+     * fault thread and eager thread to keep note of which pages are currently
+     * being read. Used by fast snapshot load.
+     */
+    unsigned long *pending_bmap;
 
     /*
      * bitmap to track already cleared dirty bitmap.  When the bit is
diff --git a/migration/options.c b/migration/options.c
index 5cbfd29099..5f80dd5b42 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -467,6 +467,11 @@ bool migrate_rdma(void)
     return s->rdma_migration;
 }
 
+bool migrate_fast_snapshot_load(void)
+{
+    return migrate_mapped_ram() && migrate_postcopy_ram();
+}
+
 typedef enum WriteTrackingSupport {
     WT_SUPPORT_UNKNOWN = 0,
     WT_SUPPORT_ABSENT,
diff --git a/migration/options.h b/migration/options.h
index b46221998a..a81ca40d23 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -54,6 +54,7 @@ bool migrate_multifd_flush_after_each_section(void);
 bool migrate_postcopy(void);
 bool migrate_rdma(void);
 bool migrate_tls(void);
+bool migrate_fast_snapshot_load(void);
 
 /* capabilities helpers */
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index d5a48115bd..602ece1b74 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -553,17 +553,9 @@ void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
 size_t qemu_get_buffer_at(QEMUFile *f, uint8_t *buf, size_t buflen,
                           off_t pos)
 {
-    Error *err = NULL;
-
-    if (f->last_error) {
-        return 0;
-    }
-
-    if (qio_channel_pread_all(f->ioc, buf, buflen, pos, &err) < 0) {
-        qemu_file_set_error_obj(f, -EIO, err);
+    if (qio_channel_pread_all(f->ioc, buf, buflen, pos, NULL) < 0) {
         return 0;
     }
-
     return buflen;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index fc38ffbf8a..c2bacf3dfc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -252,6 +252,31 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque)
     return ret;
 }
 
+static void ramblock_non_zero_map_init(void)
+{
+    RAMBlock *rb;
+
+    RAMBLOCK_FOREACH_NOT_IGNORED(rb)
+    {
+        assert(!rb->nonzeropages);
+        size_t size = rb->max_length >> qemu_target_page_bits();
+        rb->nonzeropages = bitmap_new(size);
+    }
+}
+
+static void ramblock_pending_bmap_init(void)
+{
+    RAMBlock *rb;
+
+    RAMBLOCK_FOREACH_NOT_IGNORED(rb)
+    {
+        assert(!rb->pending_bmap);
+        size_t size = rb->max_length >> qemu_target_page_bits();
+        rb->pending_bmap = bitmap_new(size);
+        bitmap_set(rb->pending_bmap, 0, size);
+    }
+}
+
 static void ramblock_recv_map_init(void)
 {
     RAMBlock *rb;
@@ -3749,6 +3774,12 @@ static int ram_load_setup(QEMUFile *f, void *opaque, Error **errp)
 {
     xbzrle_load_setup();
     ramblock_recv_map_init();
+    if (migrate_mapped_ram()) {
+        ramblock_non_zero_map_init();
+    }
+    if (migrate_fast_snapshot_load()) {
+        ramblock_pending_bmap_init();
+    }
 
     return 0;
 }
@@ -3768,6 +3799,10 @@ static int ram_load_cleanup(void *opaque)
     RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
         g_free(rb->receivedmap);
         rb->receivedmap = NULL;
+        g_free(rb->pending_bmap);
+        rb->pending_bmap = NULL;
+        g_free(rb->nonzeropages);
+        rb->nonzeropages = NULL;
     }
 
     return 0;
@@ -4102,7 +4137,7 @@ static bool read_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
             host = host_from_ram_block_offset(block, offset);
             if (!host) {
                 error_setg(errp, "page outside of ramblock %s range",
-                           block->idstr);
+                            block->idstr);
                 return false;
             }
 
@@ -4110,10 +4145,10 @@ static bool read_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
 
             if (migrate_multifd()) {
                 read = ram_load_multifd_pages(host, size,
-                                              block->pages_offset + offset);
+                                                block->pages_offset + offset);
             } else {
                 read = qemu_get_buffer_at(f, host, size,
-                                          block->pages_offset + offset);
+                                            block->pages_offset + offset);
             }
 
             if (!read) {
@@ -4142,7 +4177,6 @@ err:
 static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
                                       ram_addr_t length, Error **errp)
 {
-    g_autofree unsigned long *bitmap = NULL;
     MappedRamHeader header;
     size_t bitmap_size;
     long num_pages;
@@ -4174,15 +4208,18 @@ static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
     num_pages = length / header.page_size;
     bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
 
-    bitmap = g_malloc0(bitmap_size);
-    if (qemu_get_buffer_at(f, (uint8_t *)bitmap, bitmap_size,
+    if (qemu_get_buffer_at(f, (uint8_t *)block->nonzeropages, bitmap_size,
                            header.bitmap_offset) != bitmap_size) {
         error_setg(errp, "Error reading dirty bitmap");
         return;
     }
 
-    if (!read_ramblock_mapped_ram(f, block, num_pages, bitmap, errp)) {
-        return;
+    if (!migrate_fast_snapshot_load()) {
+        /* Do not load RAM during setup for fast snapshot load */
+        if (!read_ramblock_mapped_ram(f, block, num_pages, block->nonzeropages,
+                                      errp)) {
+            return;
+        }
     }
 
     /* Skip pages array */
@@ -4460,9 +4497,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     static uint64_t seq_iter;
     /*
      * If system is running in postcopy mode, page inserts to host memory must
-     * be atomic
+     * be atomic. However, fast snapshot load uses the mapped ram precopy like
+     * path to read block headers and populating bitmaps.
      */
-    bool postcopy_running = postcopy_is_running();
+    bool load_using_postcopy =
+        postcopy_is_running() && !migrate_fast_snapshot_load();
 
     seq_iter++;
 
@@ -4478,7 +4517,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
      */
     trace_ram_load_start();
     WITH_RCU_READ_LOCK_GUARD() {
-        if (postcopy_running) {
+        if (load_using_postcopy) {
             /*
              * Note!  Here RAM_CHANNEL_PRECOPY is the precopy channel of
              * postcopy migration, we have another RAM_CHANNEL_POSTCOPY to
-- 
2.54.0




* [RFC PATCH 2/5] migration: add support for fault thread to load pages from disk
  2026-06-18  3:20 [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
  2026-06-18  3:20 ` [RFC PATCH 1/5] migration: add RAM Block fields and helpers for " Aadeshveer Singh
@ 2026-06-18  3:20 ` Aadeshveer Singh
  2026-06-22 18:32   ` Peter Xu
  2026-06-18  3:20 ` [RFC PATCH 3/5] migration: add eager load thread for fast snapshot load Aadeshveer Singh
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Aadeshveer Singh @ 2026-06-18  3:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, farosas, pbonzini, philmd, lvivier, ayoub,
	Aadeshveer Singh

In fast snapshot load we want to serve faults as soon as possible, so
pages are loaded directly from the file instead of being requested from
a source.

Add postcopy_mapped_ram_load_page(), which serves a single page fault
by reading the snapshot file. It uses bitmap_test_and_clear_atomic() on
pending_bmap to coordinate between threads so that each page is loaded
exactly once. Non-zero pages are read with qemu_get_buffer_at() into a
temporary page (so the page can be placed atomically), which is then
placed with postcopy_place_page(). Zero pages are placed directly with
postcopy_place_page_zero().
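
The per-page decision can be sketched outside QEMU as follows. This is
an illustration under stated assumptions, not the function from the
patch: it uses POSIX pread() on a plain file descriptor, memset() and
memcpy() stand in for postcopy_place_page_zero() and
postcopy_place_page(), and load_one_page, page_is_nonzero, and
demo_load are hypothetical names. Unlike the patch, which asserts on a
short read, the sketch returns -1.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_BYTES 4096
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static int page_is_nonzero(const unsigned long *nonzero, size_t page)
{
    return (int)((nonzero[page / BITS_PER_LONG] >>
                  (page % BITS_PER_LONG)) & 1UL);
}

/* Fill dst with one page. Returns 0 on success, -1 on a short read. */
static int load_one_page(int fd, off_t pages_offset,
                         const unsigned long *nonzero, size_t page,
                         unsigned char *dst)
{
    unsigned char tmp[PAGE_BYTES];
    off_t pos = pages_offset + (off_t)page * PAGE_BYTES;
    size_t done = 0;

    assert(nonzero && dst);
    if (!page_is_nonzero(nonzero, page)) {
        memset(dst, 0, PAGE_BYTES);     /* zero page: nothing to read */
        return 0;
    }
    while (done < PAGE_BYTES) {
        ssize_t n = pread(fd, tmp + done, PAGE_BYTES - done,
                          pos + (off_t)done);
        if (n <= 0) {
            return -1;                  /* EINTR retry omitted */
        }
        done += (size_t)n;
    }
    memcpy(dst, tmp, PAGE_BYTES);       /* place the fully read page */
    return 0;
}

/* Build a throwaway file with data only in page 1, then load pages 0..2. */
static int demo_load(void)
{
    unsigned char page[PAGE_BYTES], dst[PAGE_BYTES];
    unsigned long nonzero[1] = { (1UL << 1) | (1UL << 2) };
    const off_t pages_offset = 512;     /* arbitrary offset for the demo */
    FILE *f = tmpfile();
    int fd, ok;

    if (!f) {
        return -1;
    }
    fd = fileno(f);
    memset(page, 0xAB, sizeof(page));
    if (pwrite(fd, page, sizeof(page), pages_offset + PAGE_BYTES)
        != PAGE_BYTES) {
        fclose(f);
        return -1;
    }
    memset(dst, 0xFF, sizeof(dst));
    ok = load_one_page(fd, pages_offset, nonzero, 0, dst) == 0 &&
         dst[0] == 0 && dst[PAGE_BYTES - 1] == 0;
    ok = ok && load_one_page(fd, pages_offset, nonzero, 1, dst) == 0 &&
         dst[0] == 0xAB && dst[PAGE_BYTES - 1] == 0xAB;
    /* Page 2 is marked non-zero but lies past the end of the file. */
    ok = ok && load_one_page(fd, pages_offset, nonzero, 2, dst) == -1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Reading into a temporary buffer first means the destination never holds
a partially read page, which is the same reason the patch reads into a
temporary page before placing it.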

Update postcopy_ram_fault_thread() to call
postcopy_mapped_ram_load_page() instead of requesting the page from the
source in the fast snapshot load case. The to_src_file check is bypassed
in that case because there is no source.

Allocate another channel in postcopy_temp_pages_setup() (as in the
preempt case) so that the fault thread and the eager thread can load
pages independently.

If reading a required page fails, abort with an assert: a disk failure
is critical and the VM cannot be recovered.

Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
---
 migration/postcopy-ram.c | 92 ++++++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 17 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index f5ef93f193..1ec20a07dd 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -949,6 +949,53 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
                        pagesize);
 }
 
+/*
+ * Load the page at @rb_offset in @rb to host address @haddr. Used by the
+ * fault thread and the eager thread in the fast snapshot load case.
+ * @rb_offset: offset of the page in the RAMBlock; @haddr: base of the page
+ * to load; @channel: identifies the calling thread and selects its temp
+ * page. Returns 0 on success.
+ */
+static int postcopy_mapped_ram_load_page(MigrationIncomingState *mis,
+                                         RAMBlock *rb, ram_addr_t rb_offset,
+                                         uint64_t haddr, int channel)
+{
+    int ret = 0;
+    unsigned long page;
+    void *host = (void *)haddr;
+    void *place_source = mis->postcopy_tmp_pages[channel].tmp_huge_page;
+    size_t read;
+
+    page = rb_offset >> TARGET_PAGE_BITS;
+
+    if (bitmap_test_and_clear_atomic(rb->pending_bmap, page, 1)) {
+        if (test_bit(page, rb->nonzeropages)) {
+            /*
+             * qemu_get_buffer_at uses preadv which is thread safe we do not
+             * need different channels
+             */
+            read = qemu_get_buffer_at(mis->from_src_file, place_source,
+                                      TARGET_PAGE_SIZE,
+                                      rb->pages_offset + rb_offset);
+
+            g_assert(read == TARGET_PAGE_SIZE);
+
+            ret = postcopy_place_page(mis, host, place_source, rb);
+            if (ret) {
+                return ret;
+            }
+
+        } else {
+            /* zero page */
+            ret = postcopy_place_page_zero(mis, host, rb);
+            if (ret) {
+                return ret;
+            }
+        }
+    }
+    return ret;
+}
+
 /*
  * NOTE: @tid is only used when postcopy-blocktime feature is enabled, and
  * also optional: when zero is provided, the fault accounting will be ignored.
@@ -1320,11 +1367,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
             break;
         }
 
-        if (!mis->to_src_file) {
+        if (!migrate_fast_snapshot_load() && !mis->to_src_file) {
             /*
-             * Possibly someone tells us that the return path is
-             * broken already using the event. We should hold until
-             * the channel is rebuilt.
+             * Fast snapshot load has no to src file or in other case someone
+             * possibly tells us that the return path is broken already using
+             * the event. We should hold until the channel is rebuilt.
              */
             postcopy_pause_fault_thread(mis);
         }
@@ -1387,18 +1434,26 @@ static void *postcopy_ram_fault_thread(void *opaque)
                                                 qemu_ram_get_idstr(rb),
                                                 rb_offset,
                                                 msg.arg.pagefault.feat.ptid);
+
+            if (migrate_fast_snapshot_load()) {
+                if (postcopy_mapped_ram_load_page(
+                        mis, rb, rb_offset, msg.arg.pagefault.address, 1)) {
+                    break;
+                }
+            } else {
 retry:
-            /*
-             * Send the request to the source - we want to request one
-             * of our host page sizes (which is >= TPS)
-             */
-            ret = postcopy_request_page(mis, rb, rb_offset,
-                                        msg.arg.pagefault.address,
-                                        msg.arg.pagefault.feat.ptid);
-            if (ret) {
-                /* May be network failure, try to wait for recovery */
-                postcopy_pause_fault_thread(mis);
-                goto retry;
+                /*
+                 * Send the request to the source - we want to request one
+                 * of our host page sizes (which is >= TPS)
+                 */
+                ret = postcopy_request_page(mis, rb, rb_offset,
+                                            msg.arg.pagefault.address,
+                                            msg.arg.pagefault.feat.ptid);
+                if (ret) {
+                    /* May be network failure, try to wait for recovery */
+                    postcopy_pause_fault_thread(mis);
+                    goto retry;
+                }
             }
         }
 
@@ -1471,8 +1526,11 @@ static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
     unsigned i, channels;
     void *temp_page;
 
-    if (migrate_postcopy_preempt()) {
-        /* If preemption enabled, need extra channel for urgent requests */
+    if (migrate_postcopy_preempt() || migrate_fast_snapshot_load()) {
+        /*
+         * If preemption enabled or it is fast snapshot load, need extra channel
+         * for urgent requests/faults
+         */
         mis->postcopy_channels = RAM_CHANNEL_MAX;
     } else {
         /* Both precopy/postcopy on the same channel */
-- 
2.54.0




* [RFC PATCH 3/5] migration: add eager load thread for fast snapshot load
  2026-06-18  3:20 [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
  2026-06-18  3:20 ` [RFC PATCH 1/5] migration: add RAM Block fields and helpers for " Aadeshveer Singh
  2026-06-18  3:20 ` [RFC PATCH 2/5] migration: add support for fault thread to load pages from disk Aadeshveer Singh
@ 2026-06-18  3:20 ` Aadeshveer Singh
  2026-06-22 18:50   ` Peter Xu
  2026-06-18  3:20 ` [RFC PATCH 4/5] migration: write up code to run fast snapshot load in qemu_loadvm_state Aadeshveer Singh
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Aadeshveer Singh @ 2026-06-18  3:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, farosas, pbonzini, philmd, lvivier, ayoub,
	Aadeshveer Singh

In fast snapshot load, a thread is needed to actively load pages
alongside the fault path so that the guest does not depend on the fault
thread indefinitely.

Add postcopy_ram_eager_load_thread() for the eager thread, which
iterates over all non-ignored blocks and calls ram_block_load_eager()
on each. ram_block_load_eager() then loads every page using
postcopy_mapped_ram_load_page() with a different channel; that function
skips pages already loaded by the fault thread. On completion the
thread schedules postcopy_ram_eager_load_bh() to destroy the incoming
migration state and set the states to completed/end.

Add postcopy_ram_eager_load_setup() to create the thread, and add the
joining logic to postcopy_incoming_cleanup().

Add tracepoints for entry to and exit from the eager load thread.

Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
---
 migration/migration.h    |  5 +++
 migration/postcopy-ram.c | 75 ++++++++++++++++++++++++++++++++++++++++
 migration/postcopy-ram.h |  2 ++
 migration/trace-events   |  2 ++
 4 files changed, 84 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index 841f49b215..7bb54a6584 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -42,6 +42,7 @@
 #define  MIGRATION_THREAD_DST_FAULT         "mig/dst/fault"
 #define  MIGRATION_THREAD_DST_LISTEN        "mig/dst/listen"
 #define  MIGRATION_THREAD_DST_PREEMPT       "mig/dst/preempt"
+#define  MIGRATION_THREAD_DST_EAGER         "mig/dst/eager"
 
 struct PostcopyBlocktimeContext;
 typedef struct ThreadPool ThreadPool;
@@ -120,6 +121,10 @@ struct MigrationIncomingState {
     bool           have_listen_thread;
     QemuThread     listen_thread;
 
+    /* Thread to load pages eagerly in fast snapshot load case */
+    bool have_eager_load_thread;
+    QemuThread eager_load_thread;
+
     /* For the kernel to send us notifications */
     int       userfault_fd;
     /* To notify the fault_thread to wake, e.g., when need to quit */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 1ec20a07dd..0ee294a381 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2289,9 +2289,84 @@ int postcopy_incoming_cleanup(MigrationIncomingState *mis)
         mis->have_listen_thread = false;
     }
 
+    if (mis->have_eager_load_thread) {
+        qemu_thread_join(&mis->eager_load_thread);
+        mis->have_eager_load_thread = false;
+    }
+
     if (migrate_postcopy_ram()) {
         rc = postcopy_ram_incoming_cleanup(mis);
     }
 
     return rc;
 }
+
+/*
+ * Called by postcopy_ram_eager_load_thread over all blocks to load in all the
+ * pending pages of given ram block
+ */
+static int ram_block_load_eager(RAMBlock *rb, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    void *host = qemu_ram_get_host_addr(rb);
+    void *target;
+    int ret = 0;
+
+    for (ram_addr_t page_loc = 0; page_loc < rb->used_length;
+         page_loc += TARGET_PAGE_SIZE) {
+        target = (uint8_t *)host + page_loc;
+        ret = postcopy_mapped_ram_load_page(mis, rb, page_loc, (uint64_t)target,
+                                            0);
+        if (ret) {
+            break;
+        }
+    }
+    return ret;
+}
+
+/*
+ * Bottom half for fast snapshot load, scheduled by eager load thread
+ */
+static void postcopy_ram_eager_load_bh(void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    postcopy_state_set(POSTCOPY_INCOMING_END);
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_COMPLETED);
+    migration_incoming_state_destroy();
+}
+
+/*
+ * Used by fast snapshot load to eagerly load in all pages of RAM and schedule
+ * cleanup after entire RAM is loaded
+ */
+static void *postcopy_ram_eager_load_thread(void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    trace_postcopy_ram_eager_load_thread_entry();
+    rcu_register_thread();
+    qemu_event_set(&mis->thread_sync_event);
+
+    if (foreach_not_ignored_block(ram_block_load_eager, mis)) {
+        error_report("ram_block_load_eager failed");
+    }
+
+    migration_bh_schedule(postcopy_ram_eager_load_bh, mis);
+
+    rcu_unregister_thread();
+    trace_postcopy_ram_eager_load_thread_exit();
+    return NULL;
+}
+
+/*
+ * Create thread for eager loading in fast snapshot load case
+ */
+int postcopy_ram_eager_load_setup(MigrationIncomingState *mis)
+{
+    postcopy_thread_create(
+        mis, &mis->eager_load_thread, MIGRATION_THREAD_DST_EAGER,
+        postcopy_ram_eager_load_thread, QEMU_THREAD_JOINABLE);
+    mis->have_eager_load_thread = true;
+    return 0;
+}
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index a080dd65a7..b3ba42e447 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -202,4 +202,6 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
 int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp);
 int postcopy_incoming_cleanup(MigrationIncomingState *mis);
 
+int postcopy_ram_eager_load_setup(MigrationIncomingState *mis);
+
 #endif
diff --git a/migration/trace-events b/migration/trace-events
index de99d976ab..38f11e1e9f 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -314,6 +314,8 @@ postcopy_blocktime_tid_cpu_map(int cpu, uint32_t tid) "cpu: %d, tid: %u"
 postcopy_blocktime_begin(uint64_t addr, uint64_t time, int cpu, bool exists) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", cpu: %d, exist: %d"
 postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu, int affected_non_cpus) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d, affected_non_cpus: %d"
 postcopy_blocktime_end_one(int cpu, uint8_t left_faults) "cpu: %d, left_faults: %" PRIu8
+postcopy_ram_eager_load_thread_entry(void) ""
+postcopy_ram_eager_load_thread_exit(void) ""
 
 # exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
2.54.0




* [RFC PATCH 4/5] migration: write up code to run fast snapshot load in qemu_loadvm_state
  2026-06-18  3:20 [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
                   ` (2 preceding siblings ...)
  2026-06-18  3:20 ` [RFC PATCH 3/5] migration: add eager load thread for fast snapshot load Aadeshveer Singh
@ 2026-06-18  3:20 ` Aadeshveer Singh
  2026-06-22 19:16   ` Peter Xu
  2026-06-18  3:20 ` [RFC PATCH 5/5] migration/tests: remove capability conflict test postcopy-ram+mapped-ram Aadeshveer Singh
  2026-06-19 13:18 ` [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
  5 siblings, 1 reply; 13+ messages in thread
From: Aadeshveer Singh @ 2026-06-18  3:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, farosas, pbonzini, philmd, lvivier, ayoub,
	Aadeshveer Singh

When both mapped-ram and postcopy-ram are set, divert from
qemu_loadvm_state() to run fast snapshot load.

Initialize the postcopy RAM state and register the RAMBlocks with
userfaultfd via ram_postcopy_incoming_init() and
postcopy_ram_incoming_setup(). Launch the fault thread before the VM
starts, to serve faults from emulated hardware that needs to read RAM
(such as vapic devices). Populate the bitmaps and offset tables while
reading the file in qemu_loadvm_state_main(). Then call
qemu_loadvm_state_postcopy(), which starts the VM using
loadvm_postcopy_handle_run_bh() and launches the eager load thread.

Skip scheduling process_incoming_migration_bh() in
process_incoming_migration_co() for fast snapshot load, because state
cleanup is handled by the eager load thread on completion.

Skip setting the migration status to ACTIVE in
process_incoming_migration_co() and set it to POSTCOPY_DEVICE in
qemu_loadvm_state() itself.

Remove the capability check that rejected mapped-ram and postcopy-ram
being set simultaneously, as this combination now corresponds to fast
snapshot load. The corresponding test is updated in the following
patch.
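
The status transitions in this series (SETUP to POSTCOPY_DEVICE to
POSTCOPY_ACTIVE to COMPLETED) can be modelled with a small sketch. It
assumes that a state change only takes effect when the current state
matches the expected old state, in the style of a compare-and-swap; the
enum values and the set_state and run_fast_snapshot_load helpers are
hypothetical and are not QEMU's API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical subset of the incoming migration states. */
enum {
    ST_SETUP,
    ST_ACTIVE,
    ST_POSTCOPY_DEVICE,
    ST_POSTCOPY_ACTIVE,
    ST_COMPLETED,
};

/* Move *state from old_state to new_state only if it equals old_state. */
static int set_state(atomic_int *state, int old_state, int new_state)
{
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}

/* Apply the three transitions in order; returns how many took effect. */
static int run_fast_snapshot_load(atomic_int *state)
{
    int steps = 0;

    /* Before loading device state. */
    steps += set_state(state, ST_SETUP, ST_POSTCOPY_DEVICE);
    /* When the VM is started and the eager thread is launched. */
    steps += set_state(state, ST_POSTCOPY_DEVICE, ST_POSTCOPY_ACTIVE);
    /* In the bottom half scheduled once all RAM is loaded. */
    steps += set_state(state, ST_POSTCOPY_ACTIVE, ST_COMPLETED);
    return steps;
}
```

Under this model, starting from ST_SETUP all three transitions apply,
whereas a state that had already moved to ST_ACTIVE would match none of
them. That is one way to read why the ACTIVE transition is skipped for
fast snapshot load.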

Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
---
 migration/migration.c | 10 ++++++---
 migration/options.c   |  6 -----
 migration/savevm.c    | 52 +++++++++++++++++++++++++++++++++++++++++--
 migration/savevm.h    |  2 ++
 4 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 074d3f2c69..e1ac310e20 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -756,8 +756,10 @@ process_incoming_migration_co(void *opaque)
 
     mis->largest_page_size = qemu_ram_pagesize_largest();
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
-    migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
-                      MIGRATION_STATUS_ACTIVE);
+    if (!migrate_fast_snapshot_load()) {
+        migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_ACTIVE);
+    }
 
     mis->loadvm_co = qemu_coroutine_self();
     ret = qemu_loadvm_state(mis->from_src_file, &local_err);
@@ -786,7 +788,9 @@ process_incoming_migration_co(void *opaque)
         colo_incoming_co();
     }
 
-    migration_bh_schedule(process_incoming_migration_bh, mis);
+    if (!migrate_fast_snapshot_load()) {
+        migration_bh_schedule(process_incoming_migration_bh, mis);
+    }
     goto out;
 
 fail:
diff --git a/migration/options.c b/migration/options.c
index 5f80dd5b42..3f447cf7b2 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -732,12 +732,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
                        "Mapped-ram migration is incompatible with xbzrle");
             return false;
         }
-
-        if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
-            error_setg(errp,
-                       "Mapped-ram migration is incompatible with postcopy");
-            return false;
-        }
     }
 
     /*
diff --git a/migration/savevm.c b/migration/savevm.c
index 23adaf9dd9..f10cc3c2fc 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2959,6 +2959,32 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     return true;
 }
 
+/*
+ * Starts the VM and launches the eager thread for fast snapshot load
+ */
+int qemu_loadvm_state_postcopy(QEMUFile *f, MigrationIncomingState *mis,
+                               Error **errp)
+{
+    ERRP_GUARD();
+    int ret = 0;
+
+    postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
+
+    migration_bh_schedule(loadvm_postcopy_handle_run_bh, mis);
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+    ret = postcopy_ram_eager_load_setup(mis);
+    if (ret) {
+        error_prepend(errp,
+                      "Failed to setup eager load for fast snapshot load: ");
+        return ret;
+    }
+
+    return ret;
+}
+
 int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis,
                            Error **errp)
 {
@@ -3067,8 +3093,30 @@ int qemu_loadvm_state(QEMUFile *f, Error **errp)
 
     cpu_synchronize_all_pre_loadvm();
 
-    ret = qemu_loadvm_state_main(f, mis, errp);
-    qemu_event_set(&mis->main_thread_load_event);
+    if (migrate_fast_snapshot_load()) {
+        migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_POSTCOPY_DEVICE);
+
+        if (ram_postcopy_incoming_init(mis, errp)) {
+            return -EINVAL;
+        }
+
+        postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
+        if (postcopy_ram_incoming_setup(mis)) {
+            return -EINVAL;
+        }
+
+        ret = qemu_loadvm_state_main(f, mis, errp);
+
+        qemu_event_set(&mis->main_thread_load_event);
+
+        if (ret == 0) {
+            ret = qemu_loadvm_state_postcopy(f, mis, errp);
+        }
+    } else {
+        ret = qemu_loadvm_state_main(f, mis, errp);
+        qemu_event_set(&mis->main_thread_load_event);
+    }
 
     trace_qemu_loadvm_state_post_main(ret);
 
diff --git a/migration/savevm.h b/migration/savevm.h
index 96fdf96d4e..9656acd7fe 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,6 +67,8 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
 int qemu_save_device_state(QEMUFile *f, Error **errp);
 int qemu_loadvm_state(QEMUFile *f, Error **errp);
 void qemu_loadvm_state_cleanup(MigrationIncomingState *mis);
+int qemu_loadvm_state_postcopy(QEMUFile *f, MigrationIncomingState *mis,
+                               Error **errp);
 int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis,
                            Error **errp);
 int qemu_load_device_state(QEMUFile *f, Error **errp);
-- 
2.54.0




* [RFC PATCH 5/5] migration/tests: remove capability conflict test postcopy-ram+mapped-ram
  2026-06-18  3:20 [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
                   ` (3 preceding siblings ...)
  2026-06-18  3:20 ` [RFC PATCH 4/5] migration: write up code to run fast snapshot load in qemu_loadvm_state Aadeshveer Singh
@ 2026-06-18  3:20 ` Aadeshveer Singh
  2026-06-22 18:51   ` Peter Xu
  2026-06-19 13:18 ` [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
  5 siblings, 1 reply; 13+ messages in thread
From: Aadeshveer Singh @ 2026-06-18  3:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: peterx, farosas, pbonzini, philmd, lvivier, ayoub,
	Aadeshveer Singh

Remove the test test_validate_caps_pair, which asserted that the
postcopy-ram and mapped-ram capabilities cannot be active together. The
new fast snapshot load feature is exactly this pair of capabilities
active together. With the previous patches in this series, this
combination is now supported and functional.

No replacement test for fast snapshot load is included in this RFC. A
test exercising the full save/load flow will be added in a follow-up.

Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
---
 tests/qtest/migration/misc-tests.c | 52 ------------------------------
 1 file changed, 52 deletions(-)

diff --git a/tests/qtest/migration/misc-tests.c b/tests/qtest/migration/misc-tests.c
index ec6d438cdc..4e0deb7f18 100644
--- a/tests/qtest/migration/misc-tests.c
+++ b/tests/qtest/migration/misc-tests.c
@@ -201,55 +201,6 @@ static void do_test_validate_uri_channel(MigrateCommon *args)
     migrate_end(from, to, false);
 }
 
-static void validate_caps_pair(QTestState *from,
-                               const char *first_capability,
-                               const char *second_capability,
-                               const char *expected_error)
-{
-    QDict *rsp;
-    const char *error_desc;
-
-    migrate_set_capability(from, first_capability, true);
-
-    rsp = qtest_qmp_assert_failure_ref(
-        from,
-        "{ 'execute': 'migrate-set-capabilities',"
-        "  'arguments': { 'capabilities': [ { "
-        "      'capability': %s, 'state': true } ] } }",
-        second_capability);
-
-    error_desc = qdict_get_str(rsp, "desc");
-    g_assert_cmpstr(error_desc, ==, expected_error);
-    qobject_unref(rsp);
-
-    migrate_set_capability(from, first_capability, false);
-}
-
-static void test_validate_caps_pair(char *test_path, MigrateCommon *args)
-{
-    g_autofree char *serial_path = g_strconcat(tmpfs, "/src_serial", NULL);
-    g_autofree char *cap_pair = g_path_get_basename(test_path);
-    QTestState *from, *to;
-
-    args->start.hide_stderr = true;
-    args->start.only_source = true;
-
-    if (migrate_start(&from, &to, &args->start)) {
-        return;
-    }
-
-    if (g_str_equal(cap_pair, "mapped_ram_postcopy")) {
-        const char *error =
-            "Mapped-ram migration is incompatible with postcopy";
-
-        validate_caps_pair(from, "mapped-ram", "postcopy-ram", error);
-        validate_caps_pair(from, "postcopy-ram", "mapped-ram", error);
-    }
-
-    qtest_quit(from);
-    unlink(serial_path);
-}
-
 static void test_validate_uri_channels_both_set(char *name, MigrateCommon *args)
 {
     args->uri = "tcp:127.0.0.1:0",
@@ -309,7 +260,4 @@ void migration_test_add_misc(MigrationTestEnv *env)
                        test_validate_uri_channels_both_set);
     migration_test_add("/migration/validate_uri/channels/none_set",
                        test_validate_uri_channels_none_set);
-    migration_test_add_suffix("/migration/validate_caps/",
-                              "mapped_ram_postcopy",
-                              test_validate_caps_pair);
 }
-- 
2.54.0




* Re: [RFC PATCH 0/5] migration: fast snapshot load
  2026-06-18  3:20 [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
                   ` (4 preceding siblings ...)
  2026-06-18  3:20 ` [RFC PATCH 5/5] migration/tests: remove capability conflict test postcopy-ram+mapped-ram Aadeshveer Singh
@ 2026-06-19 13:18 ` Aadeshveer Singh
  2026-06-22 19:19   ` Peter Xu
  5 siblings, 1 reply; 13+ messages in thread
From: Aadeshveer Singh @ 2026-06-19 13:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, pbonzini, philmd, lvivier, ayoub


[-- Attachment #1.1: Type: text/plain, Size: 5532 bytes --]

Hi Everyone,

Adding this small patch on top to enable blocktime support. I will include
it directly when I repost the v2 RFC.
In cold cache tests, most faults lie in the [64 us - 128 us] range (most
likely pages loaded from disk), and a secondary peak is around [4 us - 8
us] (likely directly served zero pages).
Per-vCPU blocktime is a fraction of a second (97 to 128 ms in the output
below) and offers a visible ergonomic improvement over precopy migration
for snapshot loads, which previously required a few seconds to load the
guest.

Test context:
- Guest: Fedora 44 (4 core, 16 GB RAM)
- Host: Fedora 43
- Host CPU: Intel ultra9 185H(22 cores)
- Host RAM: 32 GB DDR5x
- Host Drive: Gen 4 NVMe SSD
- Acceleration: KVM enabled

Cold Cache Output:
  Postcopy Blocktime (ms): 0
  Postcopy vCPU Blocktime (ms):
  [128, 117, 111, 97]
  Postcopy Latency (ns): 153571
  Postcopy non-vCPU Latencies (ns): 149885
  Postcopy vCPU Latencies (ns):
  [234229, 225562, 195604, 209678]
  Postcopy Latency Distribution:
    [     1 us -     2 us ]:         25
    [     2 us -     4 us ]:        370
    [     4 us -     8 us ]:       3938
    [     8 us -    16 us ]:       3599
    [    16 us -    32 us ]:        520
    [    32 us -    64 us ]:        192
    [    64 us -   128 us ]:      18643
    [   128 us -   256 us ]:       6199
    [   256 us -   512 us ]:       2768
    [   512 us -     1 ms ]:       1096
    [     1 ms -     2 ms ]:        486
    [     2 ms -     4 ms ]:         78
    [     4 ms -     8 ms ]:          5
    [     8 ms -    16 ms ]:          0
    [    16 ms -    32 ms ]:          0
    [    32 ms -    65 ms ]:          0
    [    65 ms -   131 ms ]:          0
    [   131 ms -   262 ms ]:          0
    [   262 ms -   524 ms ]:          0
    [   524 ms -    1 sec ]:          0
    [    1 sec -    2 sec ]:          0
    [    2 sec -    4 sec ]:          0
    [    4 sec -    8 sec ]:          0
    [    8 sec -   16 sec ]:          0
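
A note on reading the distribution above: the bucket edges look like powers
of two in microseconds (for example, 32 ms - 65 ms is 2^15 us to 2^16 us).
Below is a minimal sketch of that bucketing, assuming this layout.
demo_latency_bucket() is a hypothetical helper written for illustration, not
the function QEMU uses, and the count of 24 buckets is simply the number of
rows printed above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a latency in nanoseconds to the index of a
 * power-of-two bucket measured in microseconds, so that bucket i covers
 * [2^i us, 2^(i+1) us).  Sub-microsecond values fall into bucket 0 and
 * values past the last bucket are clamped to it.
 */
static unsigned demo_latency_bucket(uint64_t latency_ns, unsigned nr_buckets)
{
    uint64_t us = latency_ns / 1000;
    unsigned idx = 0;

    /* Zero buckets can only be a caller bug. */
    assert(nr_buckets > 0);

    /* Integer log2: count how often the value can be halved. */
    while (us > 1) {
        us >>= 1;
        idx++;
    }

    return idx < nr_buckets ? idx : nr_buckets - 1;
}
```

Under this assumption, the reported mean latency of 153571 ns maps to
bucket 7, that is [128 us - 256 us], one bucket above the most populated
range.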

Thank you,
Aadeshveer

On Thu, Jun 18, 2026 at 8:50 AM Aadeshveer Singh <aadeshveer07@gmail.com>
wrote:

> This RFC implements a "fast snapshot load" mechanism to significantly
> reduce the perceived resume time of a VM from a snapshot file.
>
> Currently, resuming a VM from a snapshot file requires loading all RAM
> pages into the QEMU instance before execution begins. This extension
> allows the user to run the VM nearly instantly by loading only the
> required device states up front and loading RAM pages lazily, by
> trapping access to pages that have not yet been loaded.
>
> Using the Linux userfaultfd syscall, a fault thread catches all page
> faults caused by the guest and loads in the pages required to keep
> the VM running. Concurrently, an eager background thread iteratively
> loads all remaining pages into RAM so the guest does not have to
> depend on the fault thread indefinitely.
>
> Much of code is reused from postcopy for fault handling and precopy
> for reading mapped ram file. Implementation revolves around two
> threads named the fault thread and eager load thread. Fault thread as
> name suggests catches page faults by the guest and serves them using
> userfaultfd. Postcopy fault thread is reused but instead of requesting
> source for a page it loads the page directly by reading form file. In
> order to remove the dependency of guest on fault thread indefinitely
> the eager load thread loads in the entire RAM sequentially, and after
> iterating through the entire RAM signals fault thread to exit and
> calls cleanup.
>
> In order to prevent the case of a page being loaded twice(in the
> case when eager load thread is loading it and fault thread also
> tries to serve fault on same page) a bitmap called pending_bmap is
> used to track pages which are pending and not being loaded by any
> thread. Atomic operations on this bitmap allows coordination between
> threads to prevent any unwanted behaviours
>
> This patch was tested using a Debian 13 bare minimum system and Fedora
> 44 KDE, snapshots for both are loaded successfully with no error.
>
> Next Steps:
> - Add testing framework, in qtest and unit tests
> - Add support for postcopy-blocktime
> - Update documentation
>
> Future direction:
> - Add support for hugepages
> - Add support for multifd
> - Add support for vhost-user
>
> Aadeshveer Singh (5):
>   migration: add RAM Block fields and helpers for fast snapshot load
>   migration: add support for fault thread to load pages from disk
>   migration: add eager load thread for fast snapshot load
>   migration: write up code to run fast snapshot load in
>     qemu_loadvm_state
>   migration/tests: remove capability conflict test
>     postcopy-ram+mapped-ram
>
>  include/system/ramblock.h          |   8 ++
>  migration/migration.c              |  10 +-
>  migration/migration.h              |   5 +
>  migration/options.c                |  11 +-
>  migration/options.h                |   1 +
>  migration/postcopy-ram.c           | 167 ++++++++++++++++++++++++++---
>  migration/postcopy-ram.h           |   2 +
>  migration/qemu-file.c              |  10 +-
>  migration/ram.c                    |  61 +++++++++--
>  migration/savevm.c                 |  52 ++++++++-
>  migration/savevm.h                 |   2 +
>  migration/trace-events             |   2 +
>  tests/qtest/migration/misc-tests.c |  52 ---------
>  13 files changed, 283 insertions(+), 100 deletions(-)
>
> --
> 2.54.0
>
>

[-- Attachment #2: 0001-migration-postcopy-blocktime-support-for-fast-snapsh.patch --]
[-- Type: text/x-patch, Size: 1559 bytes --]

From 70ab2949ef99968c2fc16e6a0d9860a993514367 Mon Sep 17 00:00:00 2001
From: Aadeshveer Singh <aadeshveer07@gmail.com>
Date: Fri, 19 Jun 2026 18:12:36 +0530
Subject: [PATCH] migration: postcopy-blocktime support for fast snapshot load

Add postcopy-blocktime support to fast snapshot load by calling
mark_postcopy_blocktime_begin() on all page faults intercepted by the
fault thread.

There is no need to call mark_postcopy_blocktime_end(), as
postcopy_mapped_ram_load_page() calls postcopy_place_page() or
postcopy_place_page_zero(), both of which mark the end internally.

Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
---
 migration/postcopy-ram.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 0ee294a381..2f4698fbed 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1436,6 +1436,12 @@ static void *postcopy_ram_fault_thread(void *opaque)
                                                 msg.arg.pagefault.feat.ptid);
 
             if (migrate_fast_snapshot_load()) {
+                WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex)
+                {
+                    mark_postcopy_blocktime_begin(msg.arg.pagefault.address,
+                                                  msg.arg.pagefault.feat.ptid,
+                                                  rb);
+                }
                 if (postcopy_mapped_ram_load_page(
                         mis, rb, rb_offset, msg.arg.pagefault.address, 1)) {
                     break;
-- 
2.54.0



* Re: [RFC PATCH 1/5] migration: add RAM Block fields and helpers for fast snapshot load
  2026-06-18  3:20 ` [RFC PATCH 1/5] migration: add RAM Block fields and helpers for " Aadeshveer Singh
@ 2026-06-22 16:23   ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2026-06-22 16:23 UTC (permalink / raw)
  To: Aadeshveer Singh; +Cc: qemu-devel, farosas, pbonzini, philmd, lvivier, ayoub

On Thu, Jun 18, 2026 at 08:50:06AM +0530, Aadeshveer Singh wrote:
> Add two fields per RAMBlock:
> 
> - nonzeropages: Mirrors the mapped-ram bitmap for storing which pages
>   are present in file and which are zero.
> - pending_bmap: Bitmap to store internal state of which pages have been
>   read by some thread to ensure coordination between threads.
> 
> Both fields are allocated and initialized in ram_load_setup and freed in
> ram_load_cleanup. nonzeropages is populated in parse_ramblock_mapped_ram
> eliminating the use of a temporary bitmap.
> 
> Change ram_load() to load using ram_load_precopy() in case of fast
> snapshot load.
> 
> Also add migrate_fast_snapshot_load() returning true when both
> postcopy-ram and mapped-ram capabilities are set.
> 
> Update qemu_get_buffer_at() to not set error to make it thread safe. All
> the callers of qemu_get_buffer_at(), take care of error handling.
> 
> Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
> ---
>  include/system/ramblock.h |  8 +++++
>  migration/options.c       |  5 ++++
>  migration/options.h       |  1 +
>  migration/qemu-file.c     | 10 +------
>  migration/ram.c           | 61 ++++++++++++++++++++++++++++++++-------
>  5 files changed, 65 insertions(+), 20 deletions(-)
> 
> diff --git a/include/system/ramblock.h b/include/system/ramblock.h
> index 4435f8d55f..73275d0459 100644
> --- a/include/system/ramblock.h
> +++ b/include/system/ramblock.h
> @@ -60,6 +60,14 @@ struct RAMBlock {
>  
>      /* Bitmap of already received pages.  Only used on destination side. */
>      unsigned long *receivedmap;
> +    /* Bitmap of zero pages. Used for fast snapshot load. */
> +    unsigned long *nonzeropages;

We have file_bmap, only used on source for now.  I think we can safely
reuse it by caching the pointer allocated.

> +    /*
> +     * Bitmap for pages that are yet to be read from disk. It is required for
> +     * fault thread and eager thread to keep note of which pages are currently
> +     * being read. Used by fast snapshot load.
> +     */
> +    unsigned long *pending_bmap;

We have receivedmap right above, and it's always allocated on dest.  IIUC
we can directly use it.

It's also already set by uffd helpers, see qemu_ufd_copy_ioctl().
Currently the update sits a bit awkwardly under an "if (!ret)"; if you
want, you can clean it up a bit.

There, we may want to skip page_requested or page_request_mutex operations
for file load because they're not necessary.

>  
>      /*
>       * bitmap to track already cleared dirty bitmap.  When the bit is
> diff --git a/migration/options.c b/migration/options.c
> index 5cbfd29099..5f80dd5b42 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -467,6 +467,11 @@ bool migrate_rdma(void)
>      return s->rdma_migration;
>  }
>  
> +bool migrate_fast_snapshot_load(void)
> +{
> +    return migrate_mapped_ram() && migrate_postcopy_ram();
> +}
> +
>  typedef enum WriteTrackingSupport {
>      WT_SUPPORT_UNKNOWN = 0,
>      WT_SUPPORT_ABSENT,
> diff --git a/migration/options.h b/migration/options.h
> index b46221998a..a81ca40d23 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -54,6 +54,7 @@ bool migrate_multifd_flush_after_each_section(void);
>  bool migrate_postcopy(void);
>  bool migrate_rdma(void);
>  bool migrate_tls(void);
> +bool migrate_fast_snapshot_load(void);
>  
>  /* capabilities helpers */
>  
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index d5a48115bd..602ece1b74 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -553,17 +553,9 @@ void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
>  size_t qemu_get_buffer_at(QEMUFile *f, uint8_t *buf, size_t buflen,
>                            off_t pos)
>  {
> -    Error *err = NULL;
> -
> -    if (f->last_error) {
> -        return 0;
> -    }

Let's keep this check: it is thread-safe, and if we have a prior error it
still seems reasonable to return immediately.  Then it would be nicer to
move the one-line change below into a small separate patch.

> -
> -    if (qio_channel_pread_all(f->ioc, buf, buflen, pos, &err) < 0) {
> -        qemu_file_set_error_obj(f, -EIO, err);
> +    if (qio_channel_pread_all(f->ioc, buf, buflen, pos, NULL) < 0) {
>          return 0;
>      }
> -
>      return buflen;
>  }
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index fc38ffbf8a..c2bacf3dfc 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -252,6 +252,31 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque)
>      return ret;
>  }
>  
> +static void ramblock_non_zero_map_init(void)
> +{
> +    RAMBlock *rb;
> +
> +    RAMBLOCK_FOREACH_NOT_IGNORED(rb)
> +    {

Let's follow QEMU's coding style, see:

https://qemu-project.gitlab.io/qemu/devel/style.html#block-structure

So:

    RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
        ...
    }

The other thing is, if we pre-allocate this bmap, then it might be wise to
double check that this size suits what will be read later from the snapshot
file header, in parse_ramblock_mapped_ram().  We can fail the load
immediately if we find that the size requested in the header is larger than
what is allocated here (aka, max_length).

We have some length checks, like:

    if (length != block->used_length) {
        ret = qemu_ram_resize(block, length, &local_err);
        if (local_err) {
            error_report_err(local_err);
            return ret;
        }
    }

But I think that != check alone isn't as safe; we should likely check
max_length first.  This can also be a separate patch just to introduce the
file_bmap to RAMBlock, so that sync mapped-ram can already use it.
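
To make the max_length suggestion concrete, here is a rough sketch of the
kind of check meant above.  It is only an illustration: DemoBlock and
demo_header_length_ok() are made-up stand-ins for RAMBlock and a new helper,
and the real check would sit next to the existing used_length comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-in carrying only the two RAMBlock fields discussed here. */
typedef struct DemoBlock {
    uint64_t used_length;
    uint64_t max_length;
} DemoBlock;

/*
 * Return true if a length read from the snapshot header fits the block.
 * Bitmaps sized from max_length can only describe pages below max_length,
 * so anything larger must be rejected before a resize is attempted.
 */
static bool demo_header_length_ok(const DemoBlock *block, uint64_t length)
{
    /* A NULL or inconsistent block is a programming error, not bad input. */
    assert(block != NULL);
    assert(block->used_length <= block->max_length);

    return length <= block->max_length;
}
```

A length that differs from used_length but stays within max_length would
still go through the resize path; only the oversized case fails early.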

> +        assert(!rb->nonzeropages);
> +        size_t size = rb->max_length >> qemu_target_page_bits();
> +        rb->nonzeropages = bitmap_new(size);
> +    }
> +}
> +
> +static void ramblock_pending_bmap_init(void)
> +{
> +    RAMBlock *rb;
> +
> +    RAMBLOCK_FOREACH_NOT_IGNORED(rb)
> +    {

Same brace style issue here.  IIUC this function can be dropped after
reusing receivedmap.

> +        assert(!rb->pending_bmap);
> +        size_t size = rb->max_length >> qemu_target_page_bits();
> +        rb->pending_bmap = bitmap_new(size);
> +        bitmap_set(rb->pending_bmap, 0, size);
> +    }
> +}
> +
>  static void ramblock_recv_map_init(void)
>  {
>      RAMBlock *rb;
> @@ -3749,6 +3774,12 @@ static int ram_load_setup(QEMUFile *f, void *opaque, Error **errp)
>  {
>      xbzrle_load_setup();
>      ramblock_recv_map_init();
> +    if (migrate_mapped_ram()) {
> +        ramblock_non_zero_map_init();
> +    }
> +    if (migrate_fast_snapshot_load()) {
> +        ramblock_pending_bmap_init();
> +    }
>  
>      return 0;
>  }
> @@ -3768,6 +3799,10 @@ static int ram_load_cleanup(void *opaque)
>      RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
>          g_free(rb->receivedmap);
>          rb->receivedmap = NULL;
> +        g_free(rb->pending_bmap);
> +        rb->pending_bmap = NULL;
> +        g_free(rb->nonzeropages);
> +        rb->nonzeropages = NULL;

For new code, IMHO we can use g_clear_pointer().

>      }
>  
>      return 0;
> @@ -4102,7 +4137,7 @@ static bool read_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>              host = host_from_ram_block_offset(block, offset);
>              if (!host) {
>                  error_setg(errp, "page outside of ramblock %s range",
> -                           block->idstr);
> +                            block->idstr);

These look like unrelated whitespace changes; let's try not to touch lines
that don't need changing.

>                  return false;
>              }
>  
> @@ -4110,10 +4145,10 @@ static bool read_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>  
>              if (migrate_multifd()) {
>                  read = ram_load_multifd_pages(host, size,
> -                                              block->pages_offset + offset);
> +                                                block->pages_offset + offset);
>              } else {
>                  read = qemu_get_buffer_at(f, host, size,
> -                                          block->pages_offset + offset);
> +                                            block->pages_offset + offset);

Same here.

>              }
>  
>              if (!read) {
> @@ -4142,7 +4177,6 @@ err:
>  static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>                                        ram_addr_t length, Error **errp)
>  {
> -    g_autofree unsigned long *bitmap = NULL;
>      MappedRamHeader header;
>      size_t bitmap_size;
>      long num_pages;
> @@ -4174,15 +4208,18 @@ static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>      num_pages = length / header.page_size;
>      bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
>  
> -    bitmap = g_malloc0(bitmap_size);
> -    if (qemu_get_buffer_at(f, (uint8_t *)bitmap, bitmap_size,
> +    if (qemu_get_buffer_at(f, (uint8_t *)block->nonzeropages, bitmap_size,
>                             header.bitmap_offset) != bitmap_size) {
>          error_setg(errp, "Error reading dirty bitmap");
>          return;
>      }
>  
> -    if (!read_ramblock_mapped_ram(f, block, num_pages, bitmap, errp)) {
> -        return;
> +    if (!migrate_fast_snapshot_load()) {

Nitpick: when reaching here mapped-ram must be enabled, so we can
directly check migrate_postcopy_ram() here.

> +        /* Do not load RAM during setup for fast snapshot load */
> +        if (!read_ramblock_mapped_ram(f, block, num_pages, block->nonzeropages,
> +                                      errp)) {
> +            return;
> +        }
>      }
>  
>      /* Skip pages array */
> @@ -4460,9 +4497,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      static uint64_t seq_iter;
>      /*
>       * If system is running in postcopy mode, page inserts to host memory must
> -     * be atomic
> +     * be atomic. However, fast snapshot load uses the mapped ram precopy like
> +     * path to read block headers and populating bitmaps.
>       */
> -    bool postcopy_running = postcopy_is_running();
> +    bool load_using_postcopy =
> +        postcopy_is_running() && !migrate_fast_snapshot_load();

I think this change is correct, it's just that I believe it'll be hard to
follow for most readers.

Here, what you really need is to parse the ramblock headers only, it's just
that we used to have both steps (RAM setup + precopy) all processed in the
same ram_load_precopy() function, and here you want to leverage the "RAM
setup" phase only.

Maybe rename the bool to load_postcopy_pages?  That may help to explain why
in your case postcopy-ram=on but you don't set this bool to true: it's
because your case doesn't need to load any page in postcopy way.

Maybe something like this to be slightly more verbose:

  bool ram_should_load_postcopy_pages(void)
  {
      /* This is pure precopy, we don't need to load pages in postcopy way */
      if (!postcopy_is_running()) {
          return false;
      }

      /*
       * This is postcopy, but with mapped-ram, pages are not loaded
       * in the migration stream here, but done separately in a thread eagerly
       * reading pages from the snapshot.  Here, we only need to read the
       * ram headers, reusing the precopy code.  TODO: when we have a separate
       * function to parse RAM headers we should switch to that.
       */
      if (migrate_mapped_ram()) {
          return false;
      }

      /*
       * Genuine network postcopy, we will load pages in this current stream
       * and they need to be done in postcopy way.
       */
      return true;
  }

  ...
  bool load_postcopy_pages = ram_should_load_postcopy_pages();

Thanks,

>  
>      seq_iter++;
>  
> @@ -4478,7 +4517,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       */
>      trace_ram_load_start();
>      WITH_RCU_READ_LOCK_GUARD() {
> -        if (postcopy_running) {
> +        if (load_using_postcopy) {
>              /*
>               * Note!  Here RAM_CHANNEL_PRECOPY is the precopy channel of
>               * postcopy migration, we have another RAM_CHANNEL_POSTCOPY to
> -- 
> 2.54.0
> 

-- 
Peter Xu




* Re: [RFC PATCH 2/5] migration: add support for fault thread to load pages from disk
  2026-06-18  3:20 ` [RFC PATCH 2/5] migration: add support for fault thread to load pages from disk Aadeshveer Singh
@ 2026-06-22 18:32   ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2026-06-22 18:32 UTC (permalink / raw)
  To: Aadeshveer Singh; +Cc: qemu-devel, farosas, pbonzini, philmd, lvivier, ayoub

On Thu, Jun 18, 2026 at 08:50:07AM +0530, Aadeshveer Singh wrote:
> In fast snapshot load, we would like to serve faults as soon as possible
> hence loading pages directly instead of requesting a source
> 
> Add postcopy_mapped_ram_load_page() function which serves single page
> fault by reading the snapshot file. It uses bitmap_test_and_clear_atomic
> on pending_bmap to coordinate between threads so each page is loaded
> exactly once. Non-zero pages are read using qemu_get_buffer_at into a
> temporary page (for loading page atomically), which is then placed using
> postcopy_place_page. Zero pages are placed directly using
> postcopy_place_page_zero.
> 
> Update postcopy_ram_fault_thread to call postcopy_mapped_ram_load_page
> instead of requesting source in case of fast snapshot load. to_src_file
> check is bypassed in fast snapshot load case as there is no source
> 
> Allocate another channel in postcopy_temp_pages_setup(like the preempt
> case), for both the fault thread and eager thread to load pages
> independently.
> 
> In case of failure to read required page crash the system using assert
> as disk failure is critical and VM cannot be recovered.
> 
> Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
> ---
>  migration/postcopy-ram.c | 92 ++++++++++++++++++++++++++++++++--------
>  1 file changed, 75 insertions(+), 17 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index f5ef93f193..1ec20a07dd 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -949,6 +949,53 @@ int postcopy_wake_shared(struct PostCopyFD *pcfd,
>                         pagesize);
>  }
>  
> +/*
> + * Load a page from RAMBlock at offset at given host address.
> + * Used by postcopy ram fault thread and eager thread in fast snapshot load
> + * case. rb_offset: Offset of page in RAMBlock haddr: Base of page where to load
> + * in page Channel: Used to identify between threads and use corresponding temp
> + * pages Returns 0 on success
> + */

Somehow this paragraph is not properly formatted with newlines.  If you
want to provide a full doc of the function, you can follow kernel-doc
format:

https://docs.kernel.org/doc-guide/kernel-doc.html

/**
 * function_name() - desc
 *
 * @arg1: desc for @arg1
 * @arg2: desc for @arg2
 * ...

> +static int postcopy_mapped_ram_load_page(MigrationIncomingState *mis,
> +                                         RAMBlock *rb, ram_addr_t rb_offset,
> +                                         uint64_t haddr, int channel)
> +{
> +    int ret = 0;
> +    unsigned long page;
> +    void *host = (void *)haddr;

Can drop this var.

> +    void *place_source = mis->postcopy_tmp_pages[channel].tmp_huge_page;
> +    size_t read;

Nit: can use reverse christmas tree.

> +
> +    page = rb_offset >> TARGET_PAGE_BITS;
> +
> +    if (bitmap_test_and_clear_atomic(rb->pending_bmap, page, 1)) {
> +        if (test_bit(page, rb->nonzeropages)) {
> +            /*
> +             * qemu_get_buffer_at uses preadv which is thread safe we do not
> +             * need different channels
> +             */

Slightly misleading, since the two threads do not use the same "channel"
(that is, temp pages).  Maybe what you wanted to emphasize is that QEMU
might have more than one thread using this function to install pages.  In
that case, it can be put as:

               /*
                * This can happen concurrently, but it's thread-safe because
                * qemu_get_buffer_at() is thread-safe, and the caller will be
                * using different temporary buffers.
                */
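
For reference, the claim-once coordination that the commit message describes
(test-and-clear on a pending bitmap, so that each page is loaded exactly
once) can be sketched with plain C11 atomics.  This is an illustration only:
demo_claim_page() is a hypothetical stand-in, not QEMU's
bitmap_test_and_clear_atomic(), and it handles a single bit per call.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for a one-bit test-and-clear on a pending bitmap.
 * Returns true only for the single caller that flips the bit from 1 to 0,
 * so two threads racing on the same page cannot both "win" it.
 */
static bool demo_claim_page(_Atomic unsigned long *bmap, size_t page)
{
    unsigned long mask = 1UL << (page % DEMO_BITS_PER_WORD);
    unsigned long old;

    /* A NULL bitmap is a caller bug, not a runtime condition. */
    assert(bmap != NULL);

    /* Atomically clear the bit and fetch the word as it was before. */
    old = atomic_fetch_and(&bmap[page / DEMO_BITS_PER_WORD], ~mask);

    return (old & mask) != 0;
}
```

Whichever thread gets true reads the page into its own temporary buffer and
places it; the other one sees false and skips the load.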

> +            read = qemu_get_buffer_at(mis->from_src_file, place_source,
> +                                      TARGET_PAGE_SIZE,
> +                                      rb->pages_offset + rb_offset);
> +
> +            g_assert(read == TARGET_PAGE_SIZE);

Two things can be improved on errors:

- When an error can be reached through user input, we shouldn't assert();
  assert() is only for programming errors.  Here it's possible that a user
  specified a broken image, in which case we should exit cleanly with an
  error code.

- It is still better to propagate the error to the caller, and to handle it
  only in the very top caller, which reports it to stderr.

I did suggest that we can assert on loading failures when we talked, but I
confess I was not clear on how to do it, sorry.  Let's use assert() only if
it's a programming error.

Postcopy hasn't been great at error reporting.  Nowadays QEMU normally
suggests using "bool function(..., Error **errp)" as the function
interface, returning false on failure.  You can write the new functions
this way, then propagate the Error object to the top caller and do one
error_report() before exit().

Along the way, if you want to convert some postcopy code to start using
Error **, that'd be even better.  Feel free to have a look at the comments
at the start of include/qapi/error.h on the suggested way to handle errors
in QEMU.
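
As a rough illustration of the split between assert() and reported errors,
here is a small self-contained sketch.  It does not use the real QEMU API:
DemoError is a made-up stand-in for the Error object (the real interface
takes Error **errp), and demo_load_page() is a hypothetical function that
only models the short-read case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Made-up stand-in for QEMU's Error object, for illustration only. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Record a message if the caller asked for error details. */
static void demo_error_set(DemoError *err, const char *what,
                           size_t got, size_t want)
{
    if (err) {
        snprintf(err->msg, sizeof(err->msg), "%s: read %zu of %zu bytes",
                 what, got, want);
    }
}

/* Hypothetical page loader: returns false and fills @err on a short read. */
static bool demo_load_page(size_t bytes_read, size_t page_size,
                           DemoError *err)
{
    /* A zero page size can only be a programming error, so assert on it. */
    assert(page_size > 0);

    if (bytes_read != page_size) {
        /* A short read can come from a broken snapshot file: report it. */
        demo_error_set(err, "short page read", bytes_read, page_size);
        return false;
    }

    return true;
}
```

The top-level caller would then print the message once and exit, instead of
each layer asserting or reporting on its own.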

> +
> +            ret = postcopy_place_page(mis, host, place_source, rb);
> +            if (ret) {
> +                return ret;
> +            }
> +
> +        } else {
> +            /* zero page */

Nit, we can drop this comment.

> +            ret = postcopy_place_page_zero(mis, host, rb);
> +            if (ret) {
> +                return ret;
> +            }
> +        }
> +    }
> +    return ret;
> +}
> +
>  /*
>   * NOTE: @tid is only used when postcopy-blocktime feature is enabled, and
>   * also optional: when zero is provided, the fault accounting will be ignored.
> @@ -1320,11 +1367,11 @@ static void *postcopy_ram_fault_thread(void *opaque)
>              break;
>          }
>  

We can leave the comment below as-is, and add a comment here to explain:

           /*
            * Fast snapshot load doesn't support pause and recover, because
            * it's not necessary: we can fail right away when QEMU just booted
            * with nothing to lose.
            */

> -        if (!mis->to_src_file) {
> +        if (!migrate_fast_snapshot_load() && !mis->to_src_file) {
>              /*
> -             * Possibly someone tells us that the return path is
> -             * broken already using the event. We should hold until
> -             * the channel is rebuilt.
> +             * Fast snapshot load has no to src file or in other case someone
> +             * possibly tells us that the return path is broken already using
> +             * the event. We should hold until the channel is rebuilt.
>               */
>              postcopy_pause_fault_thread(mis);
>          }
> @@ -1387,18 +1434,26 @@ static void *postcopy_ram_fault_thread(void *opaque)
>                                                  qemu_ram_get_idstr(rb),
>                                                  rb_offset,
>                                                  msg.arg.pagefault.feat.ptid);
> +
> +            if (migrate_fast_snapshot_load()) {
> +                if (postcopy_mapped_ram_load_page(
> +                        mis, rb, rb_offset, msg.arg.pagefault.address, 1)) {

s/1/RAM_CHANNEL_POSTCOPY/?  I agree it's not ideal with the current names,
but it is still kind of suitable.

> +                    break;
> +                }

With the above, we can error_report() here and exit() when an error happens.

> +            } else {
>  retry:
> -            /*
> -             * Send the request to the source - we want to request one
> -             * of our host page sizes (which is >= TPS)
> -             */
> -            ret = postcopy_request_page(mis, rb, rb_offset,
> -                                        msg.arg.pagefault.address,
> -                                        msg.arg.pagefault.feat.ptid);
> -            if (ret) {
> -                /* May be network failure, try to wait for recovery */
> -                postcopy_pause_fault_thread(mis);
> -                goto retry;
> +                /*
> +                 * Send the request to the source - we want to request one
> +                 * of our host page sizes (which is >= TPS)
> +                 */
> +                ret = postcopy_request_page(mis, rb, rb_offset,
> +                                            msg.arg.pagefault.address,
> +                                            msg.arg.pagefault.feat.ptid);
> +                if (ret) {
> +                    /* May be network failure, try to wait for recovery */
> +                    postcopy_pause_fault_thread(mis);
> +                    goto retry;
> +                }
>              }
>          }
>  
> @@ -1471,8 +1526,11 @@ static int postcopy_temp_pages_setup(MigrationIncomingState *mis)
>      unsigned i, channels;
>      void *temp_page;
>  
> -    if (migrate_postcopy_preempt()) {
> -        /* If preemption enabled, need extra channel for urgent requests */
> +    if (migrate_postcopy_preempt() || migrate_fast_snapshot_load()) {
> +        /*
> +         * If preemption enabled or it is fast snapshot load, need extra channel
> +         * for urgent requests/faults
> +         */
>          mis->postcopy_channels = RAM_CHANNEL_MAX;
>      } else {
>          /* Both precopy/postcopy on the same channel */
> -- 
> 2.54.0
> 

-- 
Peter Xu




* Re: [RFC PATCH 3/5] migration: add eager load thread for fast snapshot load
  2026-06-18  3:20 ` [RFC PATCH 3/5] migration: add eager load thread for fast snapshot load Aadeshveer Singh
@ 2026-06-22 18:50   ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2026-06-22 18:50 UTC (permalink / raw)
  To: Aadeshveer Singh; +Cc: qemu-devel, farosas, pbonzini, philmd, lvivier, ayoub

On Thu, Jun 18, 2026 at 08:50:08AM +0530, Aadeshveer Singh wrote:
> In fast snapshot load a thread is needed for actively loading in pages
> along with the fault path so that the guest is not dependent on fault
> thread indefinitely.
> 
> Add postcopy_ram_eager_load_thread(), for the eager thread which
> iterates over all non ignored blocks calling ram_block_load_eager()
> each. ram_block_load_eager then iterates to load in all pages using
> postcopy_mapped_ram_load_page(), with a different channel, which takes
> care of not loading in pages already loaded by fault thread. On
> completion the thread schedules postcopy_ram_eager_load_bh() to destroy
> the incoming migration state and set states to completed/end.
> 
> Add postcopy_ram_eager_load_setup() to create the thread. Added joining
> logic in postcopy_incoming_cleanup().
> 
> Add tracepoints for entry and exit to eager load thread.
> 
> Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
> ---
>  migration/migration.h    |  5 +++
>  migration/postcopy-ram.c | 75 ++++++++++++++++++++++++++++++++++++++++
>  migration/postcopy-ram.h |  2 ++
>  migration/trace-events   |  2 ++
>  4 files changed, 84 insertions(+)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index 841f49b215..7bb54a6584 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -42,6 +42,7 @@
>  #define  MIGRATION_THREAD_DST_FAULT         "mig/dst/fault"
>  #define  MIGRATION_THREAD_DST_LISTEN        "mig/dst/listen"
>  #define  MIGRATION_THREAD_DST_PREEMPT       "mig/dst/preempt"
> +#define  MIGRATION_THREAD_DST_EAGER         "mig/dst/eager"

The name isn't easy to digest when someone sees it for the first time.  Maybe
"snapshot_load" / DST_SNAPSHOT_LOAD?

>  
>  struct PostcopyBlocktimeContext;
>  typedef struct ThreadPool ThreadPool;
> @@ -120,6 +121,10 @@ struct MigrationIncomingState {
>      bool           have_listen_thread;
>      QemuThread     listen_thread;
>  
> +    /* Thread to load pages eagerly in fast snapshot load case */
> +    bool have_eager_load_thread;
> +    QemuThread eager_load_thread;
> +
>      /* For the kernel to send us notifications */
>      int       userfault_fd;
>      /* To notify the fault_thread to wake, e.g., when need to quit */
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 1ec20a07dd..0ee294a381 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -2289,9 +2289,84 @@ int postcopy_incoming_cleanup(MigrationIncomingState *mis)
>          mis->have_listen_thread = false;
>      }
>  
> +    if (mis->have_eager_load_thread) {
> +        qemu_thread_join(&mis->eager_load_thread);
> +        mis->have_eager_load_thread = false;
> +    }
> +
>      if (migrate_postcopy_ram()) {
>          rc = postcopy_ram_incoming_cleanup(mis);
>      }
>  
>      return rc;
>  }
> +
> +/*
> + * Called by postcopy_ram_eager_load_thread over all blocks to load in all the
> + * pending pages of given ram block
> + */
> +static int ram_block_load_eager(RAMBlock *rb, void *opaque)
> +{
> +    MigrationIncomingState *mis = opaque;
> +    void *host = qemu_ram_get_host_addr(rb);
> +    void *target;
> +    int ret = 0;
> +
> +    for (ram_addr_t page_loc = 0; page_loc < rb->used_length;
> +         page_loc += TARGET_PAGE_SIZE) {

I think you can directly use rb->page_size here, so that huge pages will
also work.
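
A minimal sketch of what that stride change means, using stand-in types so
it compiles on its own.  SketchBlock and the sketch_* functions are
hypothetical; in the patch the block would be a RAMBlock and the callback
would be postcopy_mapped_ram_load_page(), assuming that helper places one
host page per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two RAMBlock fields the loop needs. */
typedef struct SketchBlock {
    uint64_t used_length;   /* bytes in use */
    size_t page_size;       /* host page size backing this block */
} SketchBlock;

typedef int (*sketch_load_fn)(uint64_t offset, void *opaque);

/* Walk the block one host page at a time, stopping at the first failure. */
static int sketch_block_load_all(const SketchBlock *rb, sketch_load_fn load,
                                 void *opaque)
{
    assert(rb->page_size != 0);    /* a zero stride would loop forever */

    for (uint64_t off = 0; off < rb->used_length; off += rb->page_size) {
        int ret = load(off, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: count how many times the loader was invoked. */
static int sketch_count_calls(uint64_t offset, void *opaque)
{
    (void)offset;
    ++*(unsigned *)opaque;
    return 0;
}
```

With an 8 MiB block this issues 2048 calls for 4 KiB pages but only 4 calls
when the block is backed by 2 MiB huge pages, and each call lines up with
what userfaultfd expects to be placed for that block.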

> +        target = (uint8_t *)host + page_loc;
> +        ret = postcopy_mapped_ram_load_page(mis, rb, page_loc, (uint64_t)target,
> +                                            0);

RAM_CHANNEL_PRECOPY

> +        if (ret) {
> +            break;
> +        }
> +    }
> +    return ret;
> +}
> +
> +/*
> + * Bottom half for fast snapshot load, scheduled by eager load thread
> + */
> +static void postcopy_ram_eager_load_bh(void *opaque)
> +{
> +    MigrationIncomingState *mis = opaque;
> +    postcopy_state_set(POSTCOPY_INCOMING_END);
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> +                      MIGRATION_STATUS_COMPLETED);

If you follow what postcopy_listen_thread() does, this should be done in the
eager load thread once everything has succeeded.

> +    migration_incoming_state_destroy();
> +}

IIUC we should reuse what postcopy does.

Say, you can have one prior patch that turns postcopy_listen_thread_bh() into
what you want.  IIUC we can safely move these two lines into it:

    postcopy_state_set(POSTCOPY_INCOMING_END);

Then I believe you can reuse it.  While at it, maybe rename it to
postcopy_complete_bh() so it is no longer tied to the "listen thread".

That also handles the exit_on_error case, so IIUC that's also what you should
do: when any failure happens, set s->error, then set the FAILED status, and
finally kick off this BH to handle the rest, so exit() will be done in the BH.
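
Roughly, the flow I have in mind looks like the sketch below.  Everything in
it is a stand-in (SketchState, the SK_* statuses, the sketch_* functions);
it is not the real MigrationState or BH API, and the "schedule" step is
reduced to a flag so the example is self-contained.  The point is only the
ordering: the loader thread records the outcome and defers, and the BH is
the one place that decides about exiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SK_ACTIVE, SK_COMPLETED, SK_FAILED } SketchStatus;

/* Hypothetical stand-in for the bits of migration state involved. */
typedef struct SketchState {
    SketchStatus status;
    const char *error;        /* NULL when no error was recorded */
    bool exit_on_error;
    bool bh_pending;          /* stands in for a scheduled BH */
    int exit_code;            /* -1 means "keep running" */
} SketchState;

/* Completion BH: the only place that decides whether the process exits. */
static void sketch_complete_bh(SketchState *s)
{
    s->bh_pending = false;
    if (s->status == SK_FAILED && s->exit_on_error) {
        s->exit_code = 1;     /* real code would report s->error and exit() */
    }
}

/* End of the loader thread: record the outcome, then defer to the BH. */
static void sketch_loader_finish(SketchState *s, const char *err)
{
    assert(s->status == SK_ACTIVE);   /* finishing twice is a programming bug */
    if (err) {
        s->error = err;               /* 1. set the error */
        s->status = SK_FAILED;        /* 2. set FAILED status */
    } else {
        s->status = SK_COMPLETED;
    }
    s->bh_pending = true;             /* 3. kick off the BH in both cases */
}
```

With exit_on_error=false the BH leaves exit_code untouched, so the recorded
error stays around for whoever queries the state later.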

PS: on the incoming side we still sometimes reuse MigrationState->error.
It's a historical "slight" misuse, but let's stick with it for now; I think
the whole point is to make the error visible in query-migrate when
exit_on_error=false.  That's another problem to solve in the future.

> +
> +/*
> + * Used by fast snapshot load to eagerly load in all pages of RAM and schedule
> + * cleanup after entire RAM is loaded
> + */
> +static void *postcopy_ram_eager_load_thread(void *opaque)
> +{
> +    MigrationIncomingState *mis = opaque;
> +
> +    trace_postcopy_ram_eager_load_thread_entry();
> +    rcu_register_thread();
> +    qemu_event_set(&mis->thread_sync_event);
> +
> +    if (foreach_not_ignored_block(ram_block_load_eager, mis)) {
> +        error_report("ram_block_load_eager failed");

We can set the error on MigrationState here with migrate_error_propagate(), if
you reuse the BH I mentioned above.

> +    }
> +
> +    migration_bh_schedule(postcopy_ram_eager_load_bh, mis);
> +
> +    rcu_unregister_thread();
> +    trace_postcopy_ram_eager_load_thread_exit();
> +    return NULL;
> +}
> +
> +/*
> + * Create thread for eager loading in fast snapshot load case
> + */
> +int postcopy_ram_eager_load_setup(MigrationIncomingState *mis)
> +{
> +    postcopy_thread_create(
> +        mis, &mis->eager_load_thread, MIGRATION_THREAD_DST_EAGER,
> +        postcopy_ram_eager_load_thread, QEMU_THREAD_JOINABLE);
> +    mis->have_eager_load_thread = true;
> +    return 0;
> +}

This patch introduces these functions without using them, which makes it
almost impossible to review properly, because the reviewer won't know how
and where they will be used.

IMHO you can squash this patch with the one that will use this function.

Thanks,

> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index a080dd65a7..b3ba42e447 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -202,4 +202,6 @@ void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
>  int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp);
>  int postcopy_incoming_cleanup(MigrationIncomingState *mis);
>  
> +int postcopy_ram_eager_load_setup(MigrationIncomingState *mis);
> +
>  #endif
> diff --git a/migration/trace-events b/migration/trace-events
> index de99d976ab..38f11e1e9f 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -314,6 +314,8 @@ postcopy_blocktime_tid_cpu_map(int cpu, uint32_t tid) "cpu: %d, tid: %u"
>  postcopy_blocktime_begin(uint64_t addr, uint64_t time, int cpu, bool exists) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", cpu: %d, exist: %d"
>  postcopy_blocktime_end(uint64_t addr, uint64_t time, int affected_cpu, int affected_non_cpus) "addr: 0x%" PRIx64 ", time: %" PRIu64 ", affected_cpus: %d, affected_non_cpus: %d"
>  postcopy_blocktime_end_one(int cpu, uint8_t left_faults) "cpu: %d, left_faults: %" PRIu8
> +postcopy_ram_eager_load_thread_entry(void) ""
> +postcopy_ram_eager_load_thread_exit(void) ""
>  
>  # exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> -- 
> 2.54.0
> 

-- 
Peter Xu




* Re: [RFC PATCH 5/5] migration/tests: remove capability conflict test postcopy-ram+mapped-ram
  2026-06-18  3:20 ` [RFC PATCH 5/5] migration/tests: remove capability conflict test postcopy-ram+mapped-ram Aadeshveer Singh
@ 2026-06-22 18:51   ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2026-06-22 18:51 UTC (permalink / raw)
  To: Aadeshveer Singh; +Cc: qemu-devel, farosas, pbonzini, philmd, lvivier, ayoub

On Thu, Jun 18, 2026 at 08:50:10AM +0530, Aadeshveer Singh wrote:
> Remove the test test_validate_caps_pair, which asserted postcopy-ram
> and mapped-ram capabilities cannot be active together. The new fast
> snapshot load feature is exactly this pair of capabilities active
> together, with the previous patches in this series, this combination
> is now supported and functional.
> 
> No replacement test for fast snapshot load is included in this RFC. A
> test exercising the full save/load flow will be added in a follow-up.
> 
> Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>

When sending the formal patches, let's make this the first one; otherwise the
previous patches will start to break qtest.

-- 
Peter Xu




* Re: [RFC PATCH 4/5] migration: write up code to run fast snapshot load in qemu_loadvm_state
  2026-06-18  3:20 ` [RFC PATCH 4/5] migration: write up code to run fast snapshot load in qemu_loadvm_state Aadeshveer Singh
@ 2026-06-22 19:16   ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2026-06-22 19:16 UTC (permalink / raw)
  To: Aadeshveer Singh; +Cc: qemu-devel, farosas, pbonzini, philmd, lvivier, ayoub

On Thu, Jun 18, 2026 at 08:50:09AM +0530, Aadeshveer Singh wrote:
> When both mapped-ram and postcopy-ram are set, divert from
> qemu_loadvm_state to run fast snapshot load
> 
> Initialize postcopy RAM state and register RAM Blocks with userfaultfd
> via ram_postcopy_incoming_init() and postcopy_ram_incoming_setup().
> Launch fault thread before VM to serve faults for some hardwares
> emulation that need to read RAM (like vapic devices). Populate bitmaps
> and offset tables while reading file in qemu_loadvm_state_main. Call to
> qemu_loadvm_state_postcopy() which starts the VM using
> loadvm_postcopy_handle_run_bh() and launches eager load thread.
> 
> Skip scheduling process_incoming_migration_bh() in
> process_incoming_migration_co(), for fast snapshot load as the state
> cleanup is managed by eager load thread on completion.
> 
> Skip setting migration status to ACTIVE in process_incoming_migration_co
> and set it to POSTCOPY_DEVICE in qemu_loadvm_state() itself.
> 
> Remove the capability check that rejected mapped-ram and postcopy-ram
> being set simultaneously, as this combination now corresponds to fast
> snapshot load. The corresponding test will be updated in following
> patch.
> 
> Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
> ---
>  migration/migration.c | 10 ++++++---
>  migration/options.c   |  6 -----
>  migration/savevm.c    | 52 +++++++++++++++++++++++++++++++++++++++++--
>  migration/savevm.h    |  2 ++
>  4 files changed, 59 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 074d3f2c69..e1ac310e20 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -756,8 +756,10 @@ process_incoming_migration_co(void *opaque)
>  
>      mis->largest_page_size = qemu_ram_pagesize_largest();
>      postcopy_state_set(POSTCOPY_INCOMING_NONE);
> -    migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
> -                      MIGRATION_STATUS_ACTIVE);
> +    if (!migrate_fast_snapshot_load()) {
> +        migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
> +                          MIGRATION_STATUS_ACTIVE);
> +    }

Is this really needed?  The less common code we touch the better.

I think it's OK to have ACTIVE during early load and set POSTCOPY_ACTIVE only
once we reach postcopy setup.

>  
>      mis->loadvm_co = qemu_coroutine_self();
>      ret = qemu_loadvm_state(mis->from_src_file, &local_err);
> @@ -786,7 +788,9 @@ process_incoming_migration_co(void *opaque)
>          colo_incoming_co();
>      }
>  
> -    migration_bh_schedule(process_incoming_migration_bh, mis);
> +    if (!migrate_fast_snapshot_load()) {
> +        migration_bh_schedule(process_incoming_migration_bh, mis);
> +    }

IMHO we should reuse as much of the postcopy path as possible, like:
 
+static bool
+migration_incoming_has_postcopy_thread(MigrationIncomingState *mis)
+{
+    return mis->have_listen_thread || mis->have_eager_load_thread;
+}
+
 static void coroutine_fn
 process_incoming_migration_co(void *opaque)
 {
@@ -768,7 +774,7 @@ process_incoming_migration_co(void *opaque)
     trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
 
     trace_process_incoming_migration_co_end(ret);
-    if (mis->have_listen_thread) {
+    if (migration_incoming_has_postcopy_thread(mis)) {
         /*
          * Postcopy was started, cleanup should happen at the end of the
          * postcopy listen thread.
@@ -788,9 +794,7 @@ process_incoming_migration_co(void *opaque)
         colo_incoming_co();
     }
 
-    if (!migrate_fast_snapshot_load()) {
-        migration_bh_schedule(process_incoming_migration_bh, mis);
-    }
+    migration_bh_schedule(process_incoming_migration_bh, mis);
     goto out;
 

>      goto out;
>  
>  fail:
> diff --git a/migration/options.c b/migration/options.c
> index 5f80dd5b42..3f447cf7b2 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -732,12 +732,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>                         "Mapped-ram migration is incompatible with xbzrle");
>              return false;
>          }
> -
> -        if (new_caps[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
> -            error_setg(errp,
> -                       "Mapped-ram migration is incompatible with postcopy");
> -            return false;
> -        }

Maybe you can move this change into the patch with the test file change that
drops the corresponding check; then it can be the last patch.

>      }
>  
>      /*
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 23adaf9dd9..f10cc3c2fc 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2959,6 +2959,32 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
>      return true;
>  }
>  
> +/*
> + * Starts the VM and launches the eager thread for fast snapshot load
> + */
> +int qemu_loadvm_state_postcopy(QEMUFile *f, MigrationIncomingState *mis,
> +                               Error **errp)
> +{
> +    ERRP_GUARD();
> +    int ret = 0;
> +
> +    postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
> +
> +    migration_bh_schedule(loadvm_postcopy_handle_run_bh, mis);
> +
> +    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
> +                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +
> +    ret = postcopy_ram_eager_load_setup(mis);
> +    if (ret) {
> +        error_prepend(errp,
> +                      "Failed to setup eager load for fast snapshot load: ");
> +        return ret;
> +    }
> +
> +    return ret;
> +}
> +
>  int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis,
>                             Error **errp)
>  {
> @@ -3067,8 +3093,30 @@ int qemu_loadvm_state(QEMUFile *f, Error **errp)
>  
>      cpu_synchronize_all_pre_loadvm();
>  
> -    ret = qemu_loadvm_state_main(f, mis, errp);
> -    qemu_event_set(&mis->main_thread_load_event);
> +    if (migrate_fast_snapshot_load()) {
> +        migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
> +                          MIGRATION_STATUS_POSTCOPY_DEVICE);

Yes, switching to POSTCOPY_DEVICE seems reasonable here.

> +
> +        if (ram_postcopy_incoming_init(mis, errp)) {
> +            return -EINVAL;
> +        }
> +
> +        postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
> +        if (postcopy_ram_incoming_setup(mis)) {
> +            return -EINVAL;

Careful; when returning an error we must make sure *errp is set.  See
postcopy_incoming_setup().

> +        }
> +
> +        ret = qemu_loadvm_state_main(f, mis, errp);
> +
> +        qemu_event_set(&mis->main_thread_load_event);
> +
> +        if (ret == 0) {
> +            ret = qemu_loadvm_state_postcopy(f, mis, errp);
> +        }
> +    } else {
> +        ret = qemu_loadvm_state_main(f, mis, errp);
> +        qemu_event_set(&mis->main_thread_load_event);
> +    }

This chunk may not belong here: qemu_loadvm_state() has three callers, and only
process_incoming_migration_co() will use this logic.  Let's move it out
into process_incoming_migration_co() instead.

>  
>      trace_qemu_loadvm_state_post_main(ret);
>  
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 96fdf96d4e..9656acd7fe 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -67,6 +67,8 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>  int qemu_save_device_state(QEMUFile *f, Error **errp);
>  int qemu_loadvm_state(QEMUFile *f, Error **errp);
>  void qemu_loadvm_state_cleanup(MigrationIncomingState *mis);
> +int qemu_loadvm_state_postcopy(QEMUFile *f, MigrationIncomingState *mis,
> +                               Error **errp);
>  int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis,
>                             Error **errp);
>  int qemu_load_device_state(QEMUFile *f, Error **errp);
> -- 
> 2.54.0
> 

-- 
Peter Xu




* Re: [RFC PATCH 0/5] migration: fast snapshot load
  2026-06-19 13:18 ` [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
@ 2026-06-22 19:19   ` Peter Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2026-06-22 19:19 UTC (permalink / raw)
  To: Aadeshveer Singh; +Cc: qemu-devel, farosas, pbonzini, philmd, lvivier, ayoub

On Fri, Jun 19, 2026 at 06:48:57PM +0530, Aadeshveer Singh wrote:
> From 70ab2949ef99968c2fc16e6a0d9860a993514367 Mon Sep 17 00:00:00 2001
> From: Aadeshveer Singh <aadeshveer07@gmail.com>
> Date: Fri, 19 Jun 2026 18:12:36 +0530
> Subject: [PATCH] migration: postcopy-blocktime support for fast snapshot load
> 
> Add postcopy-blocktime support to fast snapshot load by calling
> mark_postcopy_blocktime_begin(), on all page faults intercepted by fault
> thread.
> 
> There is no need to call mark_postcopy_blocktime_end(), as
> postcopy_mapped_ram_load_page() calls postcopy_place_page() and
> postcopy_place_page_zero() which call the end marking internally.
> 
> Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
> ---
>  migration/postcopy-ram.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 0ee294a381..2f4698fbed 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1436,6 +1436,12 @@ static void *postcopy_ram_fault_thread(void *opaque)
>                                                  msg.arg.pagefault.feat.ptid);
>  
>              if (migrate_fast_snapshot_load()) {
> +                WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex)
> +                {
> +                    mark_postcopy_blocktime_begin(msg.arg.pagefault.address,
> +                                                  msg.arg.pagefault.feat.ptid,
> +                                                  rb);
> +                }
>                  if (postcopy_mapped_ram_load_page(
>                          mis, rb, rb_offset, msg.arg.pagefault.address, 1)) {
>                      break;

Let's squash this directly into your core patch 4, then mention it in the
commit log.

Even though I left quite a few comments, most of them are small nitpicks.  It's
good to know that only a few postcopy functions need some changes, and that
most of the logic can be reused.

The RFC series looks like a great start, thank you!

-- 
Peter Xu




Thread overview: 13+ messages
2026-06-18  3:20 [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
2026-06-18  3:20 ` [RFC PATCH 1/5] migration: add RAM Block fields and helpers for " Aadeshveer Singh
2026-06-22 16:23   ` Peter Xu
2026-06-18  3:20 ` [RFC PATCH 2/5] migration: add support for fault thread to load pages from disk Aadeshveer Singh
2026-06-22 18:32   ` Peter Xu
2026-06-18  3:20 ` [RFC PATCH 3/5] migration: add eager load thread for fast snapshot load Aadeshveer Singh
2026-06-22 18:50   ` Peter Xu
2026-06-18  3:20 ` [RFC PATCH 4/5] migration: write up code to run fast snapshot load in qemu_loadvm_state Aadeshveer Singh
2026-06-22 19:16   ` Peter Xu
2026-06-18  3:20 ` [RFC PATCH 5/5] migration/tests: remove capability conflict test postcopy-ram+mapped-ram Aadeshveer Singh
2026-06-22 18:51   ` Peter Xu
2026-06-19 13:18 ` [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
2026-06-22 19:19   ` Peter Xu
