All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: qemu-devel@nongnu.org, lvivier@redhat.com, peterx@redhat.com
Subject: Re: [Qemu-devel] [PATCH v10 09/14] migration: Flush receive queue
Date: Wed, 24 Jan 2018 14:12:36 +0000	[thread overview]
Message-ID: <20180124141235.GC2497@work-vm> (raw)
In-Reply-To: <20180110124723.11879-10-quintela@redhat.com>

* Juan Quintela (quintela@redhat.com) wrote:
> Each time that we sync the bitmap, it is a possiblity that we receive
> a page that is being processed by a different thread.  We fix this
> problem just making sure that we wait for all receiving threads to
> finish its work before we procedeed with the next stage.
> 
> We are low on page flags, so we use a combination that is not valid to
> emit that message:  MULTIFD_PAGE and COMPRESSED.
> 
> I tried to make a migration command for it, but it don't work because
> we sync the bitmap sometimes when we have already sent the beggining
> of the section, so I just added a new page flag.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> 
> --
> Create RAM_SAVE_FLAG_MULTIFD_SYNC (dave suggestion)
> Move the set of need_flush to inside the bitmap_sync code (peter suggestion)
> ---
>  migration/ram.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 20f3726909..b1ad7b2730 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -74,6 +74,14 @@
>  #define RAM_SAVE_FLAG_COMPRESS_PAGE    0x100
>  #define RAM_SAVE_FLAG_MULTIFD_PAGE     0x200
>  
> +/* We are getting low on pages flags, so we start using combinations
> +   When we need to flush a page, we sent it as
> +   RAM_SAVE_FLAG_MULTIFD_PAGE | RAM_SAVE_FLAG_ZERO
> +   We don't allow that combination
> +*/
> +#define RAM_SAVE_FLAG_MULTIFD_SYNC \
> +    (RAM_SAVE_FLAG_MULTIFD_PAGE | RAM_SAVE_FLAG_ZERO)
> +
>  static inline bool is_zero_range(uint8_t *p, uint64_t size)
>  {
>      return buffer_is_zero(p, size);
> @@ -226,6 +234,9 @@ struct RAMState {
>      uint64_t iterations_prev;
>      /* Iterations since start */
>      uint64_t iterations;
> +    /* Indicates if we have synced the bitmap and we need to assure that
> +       target has processeed all previous pages */
> +    bool multifd_needs_flush;
>      /* number of dirty bits in the bitmap */
>      uint64_t migration_dirty_pages;
>      /* protects modification of the bitmap */
> @@ -675,11 +686,13 @@ struct MultiFDRecvParams {
>      QIOChannel *c;
>      QemuSemaphore ready;
>      QemuSemaphore sem;
> +    QemuCond cond_sync;
>      QemuMutex mutex;
>      /* proteced by param mutex */
>      bool quit;
>      /* how many patches has sent this channel */
>      uint32_t packets_recv;
> +    bool sync;
>      multifd_pages_t *pages;
>      bool done;
>  };
> @@ -734,6 +747,7 @@ int multifd_load_cleanup(Error **errp)
>          qemu_thread_join(&p->thread);
>          qemu_mutex_destroy(&p->mutex);
>          qemu_sem_destroy(&p->sem);
> +        qemu_cond_destroy(&p->cond_sync);
>          socket_recv_channel_unref(p->c);
>          g_free(p->name);
>          p->name = NULL;
> @@ -779,6 +793,10 @@ static void *multifd_recv_thread(void *opaque)
>              qemu_mutex_lock(&p->mutex);
>              p->done = true;
>              p->packets_recv++;
> +            if (p->sync) {
> +                qemu_cond_signal(&p->cond_sync);
> +                p->sync = false;
> +            }
>              qemu_mutex_unlock(&p->mutex);
>              qemu_sem_post(&p->ready);
>              continue;
> @@ -845,9 +863,11 @@ void multifd_recv_new_channel(QIOChannel *ioc)
>      qemu_mutex_init(&p->mutex);
>      qemu_sem_init(&p->sem, 0);
>      qemu_sem_init(&p->ready, 0);
> +    qemu_cond_init(&p->cond_sync);
>      p->quit = false;
>      p->id = msg.id;
>      p->done = false;
> +    p->sync = false;
>      multifd_pages_init(&p->pages, migrate_multifd_page_count());
>      p->c = ioc;
>      multifd_recv_state->count++;
> @@ -891,6 +911,27 @@ static void multifd_recv_page(RAMBlock *block, ram_addr_t offset,
>      qemu_sem_post(&p->sem);
>  }
>  
> +static int multifd_flush(void)
> +{
> +    int i, thread_count;
> +
> +    if (!migrate_use_multifd()) {
> +        return 0;
> +    }
> +    thread_count = migrate_multifd_channels();
> +    for (i = 0; i < thread_count; i++) {
> +        MultiFDRecvParams *p = &multifd_recv_state->params[i];
> +
> +        qemu_mutex_lock(&p->mutex);
> +        while (!p->done) {
> +            p->sync = true;
> +            qemu_cond_wait(&p->cond_sync, &p->mutex);
> +        }

I think this can get stuck if there's an error at just the wrong time
in the recv_thread;  perhaps if you make sure the
terminate_multifd_recev_threads wakes all the cond_sync's it would work?

Dave

> +        qemu_mutex_unlock(&p->mutex);
> +    }
> +    return 0;
> +}
> +
>  /**
>   * save_page_header: write page header to wire
>   *
> @@ -908,6 +949,12 @@ static size_t save_page_header(RAMState *rs, QEMUFile *f,  RAMBlock *block,
>  {
>      size_t size, len;
>  
> +    if (rs->multifd_needs_flush &&
> +        (offset & RAM_SAVE_FLAG_MULTIFD_PAGE)) {
> +        offset |= RAM_SAVE_FLAG_ZERO;
> +        rs->multifd_needs_flush = false;
> +    }
> +
>      if (block == rs->last_sent_block) {
>          offset |= RAM_SAVE_FLAG_CONTINUE;
>      }
> @@ -1196,6 +1243,9 @@ static void migration_bitmap_sync(RAMState *rs)
>      if (migrate_use_events()) {
>          qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
>      }
> +    if (!rs->ram_bulk_stage && migrate_use_multifd()) {
> +        rs->multifd_needs_flush = true;
> +    }
>  }
>  
>  /**
> @@ -3201,6 +3251,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>              break;
>          }
>  
> +        if ((flags & RAM_SAVE_FLAG_MULTIFD_SYNC)
> +            == RAM_SAVE_FLAG_MULTIFD_SYNC) {
> +            multifd_flush();
> +            flags = flags & ~RAM_SAVE_FLAG_ZERO;

OK, so it's worth a comment saying that a MULTIFD_SYNC is not just a
sync, it's a sync and a page.

Dave

> +        }
>          if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE |
>                       RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE |
>                       RAM_SAVE_FLAG_MULTIFD_PAGE)) {
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2018-01-24 14:12 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-10 12:47 [Qemu-devel] [RFC 00/14] Multifd Juan Quintela
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 01/14] migration: Make migrate_fd_error() the owner of the Error Juan Quintela
2018-01-12 18:50   ` Dr. David Alan Gilbert
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 02/14] migration: Rename initial_bytes Juan Quintela
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 03/14] migration: Drop current address parameter from save_zero_page() Juan Quintela
2018-01-12 18:56   ` Dr. David Alan Gilbert
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 04/14] migration: Start of multiple fd work Juan Quintela
2018-01-22  7:00   ` Peter Xu
2018-01-23 19:52   ` Dr. David Alan Gilbert
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 05/14] migration: Create ram_multifd_page Juan Quintela
2018-01-23 20:16   ` Dr. David Alan Gilbert
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 06/14] migration: Send the fd number which we are going to use for this page Juan Quintela
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 07/14] migration: Create thread infrastructure for multifd recv side Juan Quintela
2018-01-24 13:34   ` Dr. David Alan Gilbert
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 08/14] migration: Transfer pages over new channels Juan Quintela
2018-01-24 13:46   ` Dr. David Alan Gilbert
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 09/14] migration: Flush receive queue Juan Quintela
2018-01-24 14:12   ` Dr. David Alan Gilbert [this message]
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 10/14] migration: Add multifd test Juan Quintela
2018-01-24 14:23   ` Dr. David Alan Gilbert
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 11/14] LOCAL: use trace events for migration-test Juan Quintela
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 12/14] migration: Sent the page list over the normal thread Juan Quintela
2018-01-24 14:29   ` Dr. David Alan Gilbert
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 13/14] migration: Add multifd_send_packet trace Juan Quintela
2018-01-10 12:47 ` [Qemu-devel] [PATCH v10 14/14] all works Juan Quintela
2018-01-10 15:01 ` [Qemu-devel] [RFC 00/14] Multifd Juan Quintela

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180124141235.GC2497@work-vm \
    --to=dgilbert@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.