qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Juan Quintela" <quintela@redhat.com>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	zhengchuan@huawei.com, "Stefan Hajnoczi" <stefanha@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PULL 00/10] migration queue
Date: Thu, 8 Oct 2020 20:09:55 +0100	[thread overview]
Message-ID: <20201008190955.GG2962@work-vm> (raw)
In-Reply-To: <20201008185114.GB376361@xz-x1>

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Oct 08, 2020 at 05:18:14PM +0100, Peter Maydell wrote:
> > Compile failure, windows crossbuilds:
> > 
> > ../../migration/migration.c: In function 'page_request_addr_cmp':
> > ../../migration/migration.c:148:23: error: cast from pointer to
> > integer of different size [-Werror=pointer-to-int-cast]
> >      unsigned long a = (unsigned long) ap, b = (unsigned long) bp;
> >                        ^
> > ../../migration/migration.c:148:47: error: cast from pointer to
> > integer of different size [-Werror=pointer-to-int-cast]
> >      unsigned long a = (unsigned long) ap, b = (unsigned long) bp;
> >                                                ^
> 
> Below squashed into "migration: Maintain postcopy faulted addresses" should fix
> the problem:
> 
> ------8<------
> diff --git a/migration/migration.c b/migration/migration.c
> index e7d179bffc..d8a5c0de44 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -145,7 +145,7 @@ static void migrate_fd_cancel(MigrationState *s);
>  
>  static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
>  {
> -    unsigned long a = (unsigned long) ap, b = (unsigned long) bp;
> +    uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
>  
>      return (a > b) - (a < b);
>  }
> ------8<------
> 
> A whole replacement patch attached too.
> 
> Thanks!
> 


> -- 
> Peter Xu

> From 6d5708092466abb2ed683f9ef24021e2edf25872 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Fri, 28 Aug 2020 18:02:14 -0400
> Subject: [PATCH] migration: Maintain postcopy faulted addresses
> 
> Maintain a list of faulted addresses on the destination host for which we're
> waiting on.  This is implemented using a GTree rather than a real list to make
> sure even there're plenty of vCPUs/threads that are faulting, the lookup will
> still be fast with O(log(N)) (because we'll do that after placing each page).
> It should bring a slight overhead, but ideally that shouldn't be a big problem
> simply because in most cases the requested page list will be short.
> 
> Actually we did similar things for postcopy blocktime measurements.  This patch
> didn't use that simply because:
> 
>   (1) blocktime measurement is towards vcpu threads only, but here we need to
>       record all faulted addresses, including main thread and external
>       thread (like, DPDK via vhost-user).
> 
>   (2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we
>       don't want to add that extra dependency on the kernel version since not
>       necessary.  E.g., we don't need to know which thread faulted on which
>       page, we also don't care about multiple threads faulting on the same
>       page.  But we only care about what addresses are faulted so waiting for a
>       page copying from src.
> 
>   (3) blocktime measurement is not enabled by default.  However we need this by
>       default especially for postcopy recover.
> 
> Another thing to mention is that this patch introduced a new mutex to serialize
> the receivedmap and the page_requested tree, however that serialization does
> not cover other procedures like UFFDIO_COPY.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Respin with that replacing the original coming up.

Dave

> ---
>  migration/migration.c    | 41 +++++++++++++++++++++++++++++++++++++++-
>  migration/migration.h    | 19 ++++++++++++++++++-
>  migration/postcopy-ram.c | 17 ++++++++++++++---
>  migration/trace-events   |  2 ++
>  4 files changed, 74 insertions(+), 5 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index b2dac6b39c..d8a5c0de44 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -143,6 +143,13 @@ static int migration_maybe_pause(MigrationState *s,
>                                   int new_state);
>  static void migrate_fd_cancel(MigrationState *s);
>  
> +static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
> +{
> +    uintptr_t a = (uintptr_t) ap, b = (uintptr_t) bp;
> +
> +    return (a > b) - (a < b);
> +}
> +
>  void migration_object_init(void)
>  {
>      MachineState *ms = MACHINE(qdev_get_machine());
> @@ -165,6 +172,8 @@ void migration_object_init(void)
>      qemu_event_init(&current_incoming->main_thread_load_event, false);
>      qemu_sem_init(&current_incoming->postcopy_pause_sem_dst, 0);
>      qemu_sem_init(&current_incoming->postcopy_pause_sem_fault, 0);
> +    qemu_mutex_init(&current_incoming->page_request_mutex);
> +    current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
>  
>      if (!migration_object_check(current_migration, &err)) {
>          error_report_err(err);
> @@ -240,6 +249,11 @@ void migration_incoming_state_destroy(void)
>  
>      qemu_event_reset(&mis->main_thread_load_event);
>  
> +    if (mis->page_requested) {
> +        g_tree_destroy(mis->page_requested);
> +        mis->page_requested = NULL;
> +    }
> +
>      if (mis->socket_address_list) {
>          qapi_free_SocketAddressList(mis->socket_address_list);
>          mis->socket_address_list = NULL;
> @@ -354,8 +368,33 @@ int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
>  }
>  
>  int migrate_send_rp_req_pages(MigrationIncomingState *mis,
> -                              RAMBlock *rb, ram_addr_t start)
> +                              RAMBlock *rb, ram_addr_t start, uint64_t haddr)
>  {
> +    void *aligned = (void *)(uintptr_t)(haddr & (-qemu_target_page_size()));
> +    bool received;
> +
> +    WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex) {
> +        received = ramblock_recv_bitmap_test_byte_offset(rb, start);
> +        if (!received && !g_tree_lookup(mis->page_requested, aligned)) {
> +            /*
> +             * The page has not been received, and it's not yet in the page
> +             * request list.  Queue it.  Set the value of element to 1, so that
> +             * things like g_tree_lookup() will return TRUE (1) when found.
> +             */
> +            g_tree_insert(mis->page_requested, aligned, (gpointer)1);
> +            mis->page_requested_count++;
> +            trace_postcopy_page_req_add(aligned, mis->page_requested_count);
> +        }
> +    }
> +
> +    /*
> +     * If the page is there, skip sending the message.  We don't even need the
> +     * lock because as long as the page arrived, it'll be there forever.
> +     */
> +    if (received) {
> +        return 0;
> +    }
> +
>      return migrate_send_rp_message_req_pages(mis, rb, start);
>  }
>  
> diff --git a/migration/migration.h b/migration/migration.h
> index e853ccf8b1..8d2d1ce839 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -104,6 +104,23 @@ struct MigrationIncomingState {
>  
>      /* List of listening socket addresses  */
>      SocketAddressList *socket_address_list;
> +
> +    /* A tree of pages that we requested to the source VM */
> +    GTree *page_requested;
> +    /* For debugging purpose only, but would be nice to keep */
> +    int page_requested_count;
> +    /*
> +     * The mutex helps to maintain the requested pages that we sent to the
> +     * source, IOW, to guarantee coherent between the page_requests tree and
> +     * the per-ramblock receivedmap.  Note! This does not guarantee consistency
> +     * of the real page copy procedures (using UFFDIO_[ZERO]COPY).  E.g., even
> +     * if one bit in receivedmap is cleared, UFFDIO_COPY could have happened
> +     * for that page already.  This is intended so that the mutex won't
> +     * serialize and blocked by slow operations like UFFDIO_* ioctls.  However
> +     * this should be enough to make sure the page_requested tree always
> +     * contains valid information.
> +     */
> +    QemuMutex page_request_mutex;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> @@ -332,7 +349,7 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
>  void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
>  int migrate_send_rp_req_pages(MigrationIncomingState *mis, RAMBlock *rb,
> -                              ram_addr_t start);
> +                              ram_addr_t start, uint64_t haddr);
>  int migrate_send_rp_message_req_pages(MigrationIncomingState *mis,
>                                        RAMBlock *rb, ram_addr_t start);
>  void migrate_send_rp_recv_bitmap(MigrationIncomingState *mis,
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 722034dc01..ca1daf0024 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -684,7 +684,7 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
>                                          qemu_ram_get_idstr(rb), rb_offset);
>          return postcopy_wake_shared(pcfd, client_addr, rb);
>      }
> -    migrate_send_rp_req_pages(mis, rb, aligned_rbo);
> +    migrate_send_rp_req_pages(mis, rb, aligned_rbo, client_addr);
>      return 0;
>  }
>  
> @@ -979,7 +979,8 @@ retry:
>               * Send the request to the source - we want to request one
>               * of our host page sizes (which is >= TPS)
>               */
> -            ret = migrate_send_rp_req_pages(mis, rb, rb_offset);
> +            ret = migrate_send_rp_req_pages(mis, rb, rb_offset,
> +                                            msg.arg.pagefault.address);
>              if (ret) {
>                  /* May be network failure, try to wait for recovery */
>                  if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
> @@ -1149,10 +1150,20 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState *mis, void *host_addr,
>          ret = ioctl(userfault_fd, UFFDIO_ZEROPAGE, &zero_struct);
>      }
>      if (!ret) {
> +        qemu_mutex_lock(&mis->page_request_mutex);
>          ramblock_recv_bitmap_set_range(rb, host_addr,
>                                         pagesize / qemu_target_page_size());
> +        /*
> +         * If this page resolves a page fault for a previous recorded faulted
> +         * address, take a special note to maintain the requested page list.
> +         */
> +        if (g_tree_lookup(mis->page_requested, host_addr)) {
> +            g_tree_remove(mis->page_requested, host_addr);
> +            mis->page_requested_count--;
> +            trace_postcopy_page_req_del(host_addr, mis->page_requested_count);
> +        }
> +        qemu_mutex_unlock(&mis->page_request_mutex);
>          mark_postcopy_blocktime_end((uintptr_t)host_addr);
> -
>      }
>      return ret;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 338f38b3dd..e4d5eb94ca 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -162,6 +162,7 @@ postcopy_pause_return_path(void) ""
>  postcopy_pause_return_path_continued(void) ""
>  postcopy_pause_continued(void) ""
>  postcopy_start_set_run(void) ""
> +postcopy_page_req_add(void *addr, int count) "new page req %p total %d"
>  source_return_path_thread_bad_end(void) ""
>  source_return_path_thread_end(void) ""
>  source_return_path_thread_entry(void) ""
> @@ -272,6 +273,7 @@ postcopy_ram_incoming_cleanup_blocktime(uint64_t total) "total blocktime %" PRIu
>  postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset 0x%"PRIx64
>  postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
>  postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
> +postcopy_page_req_del(void *addr, int count) "resolved page req %p total %d"
>  
>  get_mem_fault_cpu_index(int cpu, uint32_t pid) "cpu: %d, pid: %u"
>  
> -- 
> 2.26.2
> 

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



  reply	other threads:[~2020-10-08 19:11 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-07 15:55 [PULL 00/10] migration queue Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 01/10] virtiofsd: Silence gcc warning Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 02/10] tools/virtiofsd: add support for --socket-group Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 03/10] virtiofsd: Call qemu_init_exec_dir Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 04/10] virtiofsd: avoid /proc/self/fd tempdir Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 05/10] migration: Pass incoming state into qemu_ufd_copy_ioctl() Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 06/10] migration: Introduce migrate_send_rp_message_req_pages() Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 07/10] migration: Maintain postcopy faulted addresses Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 08/10] migration: Sync requested pages after postcopy recovery Dr. David Alan Gilbert (git)
2020-10-07 15:55 ` [PULL 09/10] migration/dirtyrate: record start_time and calc_time while at the measuring state Dr. David Alan Gilbert (git)
2020-10-07 15:56 ` [PULL 10/10] migration/dirtyrate: present dirty rate only when querying the rate has completed Dr. David Alan Gilbert (git)
2020-10-08 16:18 ` [PULL 00/10] migration queue Peter Maydell
2020-10-08 16:31   ` Dr. David Alan Gilbert
2020-10-08 17:09   ` Eric Blake
2020-10-08 17:43     ` Peter Xu
2020-10-08 18:51   ` Peter Xu
2020-10-08 19:09     ` Dr. David Alan Gilbert [this message]
  -- strict thread matches above, loose matches on Subject: below --
2020-10-08 19:10 Dr. David Alan Gilbert (git)
2020-10-11 18:29 ` Peter Maydell
2020-10-12  8:12   ` Dr. David Alan Gilbert
2020-10-14 20:01   ` Peter Xu
2020-10-15  9:13     ` Thomas Huth
2020-10-15 18:58       ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201008190955.GG2962@work-vm \
    --to=dgilbert@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=zhengchuan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).