All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Aadeshveer Singh <aadeshveer07@gmail.com>
Cc: qemu-devel@nongnu.org, farosas@suse.de, pbonzini@redhat.com,
	philmd@mailo.com, lvivier@redhat.com, ayoub@saferwall.com
Subject: Re: [RFC PATCH 1/5] migration: add RAM Block fields and helpers for fast snapshot load
Date: Mon, 22 Jun 2026 12:23:36 -0400	[thread overview]
Message-ID: <ajlhiEZ4TdBFvdMu@x1.local> (raw)
In-Reply-To: <20260618032010.88755-2-aadeshveer07@gmail.com>

On Thu, Jun 18, 2026 at 08:50:06AM +0530, Aadeshveer Singh wrote:
> Add two fields per RAMBlock:
> 
> - nonzeropages: Mirrors the mapped-ram bitmap for storing which pages
>   are present in file and which are zero.
> - pending_bmap: Bitmap to store internal state of which pages have been
>   read by some thread to ensure coordination between threads.
> 
> Both fields are allocated and initialized in ram_load_setup and freed in
> ram_load_cleanup. nonzeropages is populated in parse_ramblock_mapped_ram
> eliminating the use of a temporary bitmap.
> 
> Change ram_load() to load using ram_load_precopy() in case of fast
> snapshot load.
> 
> Also add migrate_fast_snapshot_load() returning true when both
> postcopy-ram and mapped-ram capabilities are set.
> 
> Update qemu_get_buffer_at() to not set error to make it thread safe. All
> the callers of qemu_get_buffer_at(), take care of error handling.
> 
> Signed-off-by: Aadeshveer Singh <aadeshveer07@gmail.com>
> ---
>  include/system/ramblock.h |  8 +++++
>  migration/options.c       |  5 ++++
>  migration/options.h       |  1 +
>  migration/qemu-file.c     | 10 +------
>  migration/ram.c           | 61 ++++++++++++++++++++++++++++++++-------
>  5 files changed, 65 insertions(+), 20 deletions(-)
> 
> diff --git a/include/system/ramblock.h b/include/system/ramblock.h
> index 4435f8d55f..73275d0459 100644
> --- a/include/system/ramblock.h
> +++ b/include/system/ramblock.h
> @@ -60,6 +60,14 @@ struct RAMBlock {
>  
>      /* Bitmap of already received pages.  Only used on destination side. */
>      unsigned long *receivedmap;
> +    /* Bitmap of zero pages. Used for fast snapshot load. */
> +    unsigned long *nonzeropages;

We have file_bmap, only used on source for now.  I think we can safely
reuse it by caching the pointer allocated.

> +    /*
> +     * Bitmap for pages that are yet to be read from disk. It is required for
> +     * fault thread and eager thread to keep note of which pages are currently
> +     * being read. Used by fast snapshot load.
> +     */
> +    unsigned long *pending_bmap;

We have receivedmap right above, and it's always allocated on dest.  IIUC
we can directly use it.

It's also already set by uffd helpers, see qemu_ufd_copy_ioctl().
Currently it's a bit ugly put under a "if (!ret)"..  if you want you can
clean it up a bit.

There, we may want to skip page_requested or page_request_mutex operations
for file load because they're not necessary.

>  
>      /*
>       * bitmap to track already cleared dirty bitmap.  When the bit is
> diff --git a/migration/options.c b/migration/options.c
> index 5cbfd29099..5f80dd5b42 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -467,6 +467,11 @@ bool migrate_rdma(void)
>      return s->rdma_migration;
>  }
>  
> +bool migrate_fast_snapshot_load(void)
> +{
> +    return migrate_mapped_ram() && migrate_postcopy_ram();
> +}
> +
>  typedef enum WriteTrackingSupport {
>      WT_SUPPORT_UNKNOWN = 0,
>      WT_SUPPORT_ABSENT,
> diff --git a/migration/options.h b/migration/options.h
> index b46221998a..a81ca40d23 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -54,6 +54,7 @@ bool migrate_multifd_flush_after_each_section(void);
>  bool migrate_postcopy(void);
>  bool migrate_rdma(void);
>  bool migrate_tls(void);
> +bool migrate_fast_snapshot_load(void);
>  
>  /* capabilities helpers */
>  
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index d5a48115bd..602ece1b74 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -553,17 +553,9 @@ void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen,
>  size_t qemu_get_buffer_at(QEMUFile *f, uint8_t *buf, size_t buflen,
>                            off_t pos)
>  {
> -    Error *err = NULL;
> -
> -    if (f->last_error) {
> -        return 0;
> -    }

Let's keep this line, this is thread-safe and if we have a prior error it
seems still reasonable to return immediately.  Then leave below one line
change to be a small separate patch would be nicer.

> -
> -    if (qio_channel_pread_all(f->ioc, buf, buflen, pos, &err) < 0) {
> -        qemu_file_set_error_obj(f, -EIO, err);
> +    if (qio_channel_pread_all(f->ioc, buf, buflen, pos, NULL) < 0) {
>          return 0;
>      }
> -
>      return buflen;
>  }
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index fc38ffbf8a..c2bacf3dfc 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -252,6 +252,31 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque)
>      return ret;
>  }
>  
> +static void ramblock_non_zero_map_init(void)
> +{
> +    RAMBlock *rb;
> +
> +    RAMBLOCK_FOREACH_NOT_IGNORED(rb)
> +    {

Let's follow QEMU's coding style, see:

https://qemu-project.gitlab.io/qemu/devel/style.html#block-structure

So:

    RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
        ...
    }

The other thing is, if we pre-allocate this bmap, then it might be wise to
double check this size suites what to be read later in the snapshot file
header, in parse_ramblock_mapped_ram().  We can fail the load immediately
if we found the size requested in header is larger than what is allocated
here (aka, max_length).

We have some length checks, like:

    if (length != block->used_length) {
        ret = qemu_ram_resize(block, length, &local_err);
        if (local_err) {
            error_report_err(local_err);
            return ret;
        }
    }

But I think that != isn't as safe.. we should likely check max_length
first.  This can also be a separate patch just to introduce the file_bmap
to RAMBlock, so that sync mapped-ram can already use it.

> +        assert(!rb->nonzeropages);
> +        size_t size = rb->max_length >> qemu_target_page_bits();
> +        rb->nonzeropages = bitmap_new(size);
> +    }
> +}
> +
> +static void ramblock_pending_bmap_init(void)
> +{
> +    RAMBlock *rb;
> +
> +    RAMBLOCK_FOREACH_NOT_IGNORED(rb)
> +    {

Similar indent issue, IIUC this function can be dropped after reusing
receivedmap.

> +        assert(!rb->pending_bmap);
> +        size_t size = rb->max_length >> qemu_target_page_bits();
> +        rb->pending_bmap = bitmap_new(size);
> +        bitmap_set(rb->pending_bmap, 0, size);
> +    }
> +}
> +
>  static void ramblock_recv_map_init(void)
>  {
>      RAMBlock *rb;
> @@ -3749,6 +3774,12 @@ static int ram_load_setup(QEMUFile *f, void *opaque, Error **errp)
>  {
>      xbzrle_load_setup();
>      ramblock_recv_map_init();
> +    if (migrate_mapped_ram()) {
> +        ramblock_non_zero_map_init();
> +    }
> +    if (migrate_fast_snapshot_load()) {
> +        ramblock_pending_bmap_init();
> +    }
>  
>      return 0;
>  }
> @@ -3768,6 +3799,10 @@ static int ram_load_cleanup(void *opaque)
>      RAMBLOCK_FOREACH_NOT_IGNORED(rb) {
>          g_free(rb->receivedmap);
>          rb->receivedmap = NULL;
> +        g_free(rb->pending_bmap);
> +        rb->pending_bmap = NULL;
> +        g_free(rb->nonzeropages);
> +        rb->nonzeropages = NULL;

For new code, IMHO we can use g_clear_pointer().

>      }
>  
>      return 0;
> @@ -4102,7 +4137,7 @@ static bool read_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>              host = host_from_ram_block_offset(block, offset);
>              if (!host) {
>                  error_setg(errp, "page outside of ramblock %s range",
> -                           block->idstr);
> +                            block->idstr);

Looks like irrelevant line changes, let's try to not touch them if they're
not needed.

>                  return false;
>              }
>  
> @@ -4110,10 +4145,10 @@ static bool read_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>  
>              if (migrate_multifd()) {
>                  read = ram_load_multifd_pages(host, size,
> -                                              block->pages_offset + offset);
> +                                                block->pages_offset + offset);
>              } else {
>                  read = qemu_get_buffer_at(f, host, size,
> -                                          block->pages_offset + offset);
> +                                            block->pages_offset + offset);

Same here.

>              }
>  
>              if (!read) {
> @@ -4142,7 +4177,6 @@ err:
>  static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>                                        ram_addr_t length, Error **errp)
>  {
> -    g_autofree unsigned long *bitmap = NULL;
>      MappedRamHeader header;
>      size_t bitmap_size;
>      long num_pages;
> @@ -4174,15 +4208,18 @@ static void parse_ramblock_mapped_ram(QEMUFile *f, RAMBlock *block,
>      num_pages = length / header.page_size;
>      bitmap_size = BITS_TO_LONGS(num_pages) * sizeof(unsigned long);
>  
> -    bitmap = g_malloc0(bitmap_size);
> -    if (qemu_get_buffer_at(f, (uint8_t *)bitmap, bitmap_size,
> +    if (qemu_get_buffer_at(f, (uint8_t *)block->nonzeropages, bitmap_size,
>                             header.bitmap_offset) != bitmap_size) {
>          error_setg(errp, "Error reading dirty bitmap");
>          return;
>      }
>  
> -    if (!read_ramblock_mapped_ram(f, block, num_pages, bitmap, errp)) {
> -        return;
> +    if (!migrate_fast_snapshot_load()) {

Nitpick: when reaching here it must have mapped-ram enabled, we can
directly check migrate_postcopy_ram() here.

> +        /* Do not load RAM during setup for fast snapshot load */
> +        if (!read_ramblock_mapped_ram(f, block, num_pages, block->nonzeropages,
> +                                      errp)) {
> +            return;
> +        }
>      }
>  
>      /* Skip pages array */
> @@ -4460,9 +4497,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      static uint64_t seq_iter;
>      /*
>       * If system is running in postcopy mode, page inserts to host memory must
> -     * be atomic
> +     * be atomic. However, fast snapshot load uses the mapped ram precopy like
> +     * path to read block headers and populating bitmaps.
>       */
> -    bool postcopy_running = postcopy_is_running();
> +    bool load_using_postcopy =
> +        postcopy_is_running() && !migrate_fast_snapshot_load();

I think this change is correct, it's just that I believe it'll be hard to
follow for most readers.

Here, what you really need is to parse the ramblock headers only, it's just
that we used to have both steps (RAM setup + precopy) all processed in the
same ram_load_precopy() function, and here you want to leverage the "RAM
setup" phase only.

Maybe rename the bool to load_postcopy_pages?  That may help to explain why
in your case postcopy-ram=on but you don't set this bool to true: it's
because your case doesn't need to load any page in postcopy way.

Maybe something like this to be slightly more verbose:

  bool ram_should_load_postcopy_pages(void)
  {
      /* This is pure precopy, we don't need to load pages in postcopy way */
      if (!postcopy_is_running()) {
          return false;
      }

      /*
       * This is postcopy, but when with mapped-ram, pages are not loaded
       * in the migration stream here, but done separately in a thread eagerly
       * reading pages from the snapshot.  Here, we only need to read the
       * ram headers, reusing the precopy code.  TODO: when we have separate 
       * function to parse RAM headers we should switch to that.
       */
       if (migrate_mapped_ram()) {
           return false;
       }

       /*
        * Genuine network postcopy, we will load pages in this current stream
        * and they need to be done in postcopy way.
        */
       return true;
  }

  ...
  bool load_postcopy_pages = ram_expects_postcopy_pages();

Thanks,

>  
>      seq_iter++;
>  
> @@ -4478,7 +4517,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>       */
>      trace_ram_load_start();
>      WITH_RCU_READ_LOCK_GUARD() {
> -        if (postcopy_running) {
> +        if (load_using_postcopy) {
>              /*
>               * Note!  Here RAM_CHANNEL_PRECOPY is the precopy channel of
>               * postcopy migration, we have another RAM_CHANNEL_POSTCOPY to
> -- 
> 2.54.0
> 

-- 
Peter Xu



  reply	other threads:[~2026-06-22 16:24 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-18  3:20 [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
2026-06-18  3:20 ` [RFC PATCH 1/5] migration: add RAM Block fields and helpers for " Aadeshveer Singh
2026-06-22 16:23   ` Peter Xu [this message]
2026-06-18  3:20 ` [RFC PATCH 2/5] migration: add support for fault thread to load pages from disk Aadeshveer Singh
2026-06-22 18:32   ` Peter Xu
2026-06-18  3:20 ` [RFC PATCH 3/5] migration: add eager load thread for fast snapshot load Aadeshveer Singh
2026-06-22 18:50   ` Peter Xu
2026-06-18  3:20 ` [RFC PATCH 4/5] migration: write up code to run fast snapshot load in qemu_loadvm_state Aadeshveer Singh
2026-06-22 19:16   ` Peter Xu
2026-06-18  3:20 ` [RFC PATCH 5/5] migration/tests: remove capability conflict test postcopy-ram+mapped-ram Aadeshveer Singh
2026-06-22 18:51   ` Peter Xu
2026-06-19 13:18 ` [RFC PATCH 0/5] migration: fast snapshot load Aadeshveer Singh
2026-06-22 19:19   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ajlhiEZ4TdBFvdMu@x1.local \
    --to=peterx@redhat.com \
    --cc=aadeshveer07@gmail.com \
    --cc=ayoub@saferwall.com \
    --cc=farosas@suse.de \
    --cc=lvivier@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@mailo.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.