Subject: [Qemu-devel] [PATCH v3 0/6] Enable postcopy RDMA live migration
From: Lidong Chen
Date: 2018-05-05 14:35 UTC
To: quintela, dgilbert, berrange
Cc: qemu-devel, galsha, aviadye, adido, Lidong Chen

The RDMA QIOChannel does not support bi-directional communication, so when an
RDMA live migration is started with postcopy enabled, the source qemu gets a
qemu file error on the return path.

These patches implement bi-directional communication for the RDMA QIOChannel
and disable RDMA WRITE during the postcopy phase.

This series only makes postcopy work; performance will be improved later.

[v3]
- add a mutex in the QEMUFile struct to avoid concurrent channel close (Daniel)
- destroy the mutex before freeing the QEMUFile (David)
- use rdmain and rdmaout instead of rdma->return_path (Daniel)

[v2]
- do not update bytes_xfer when RDMA WRITE is disabled (David)
- implement bi-directional communication for the RDMA QIOChannel (Daniel)

Lidong Chen (6):
  migration: disable RDMA WRITE after postcopy started
  migration: create a dedicated connection for rdma return path
  migration: remove unnecessary variable len in QIOChannelRDMA
  migration: avoid concurrent invoke channel_close by different threads
  migration: implement bi-directional RDMA QIOChannel
  migration: Stop rdma yielding during incoming postcopy

 migration/colo.c         |   2 +
 migration/migration.c    |   2 +
 migration/postcopy-ram.c |   2 +
 migration/qemu-file.c    |  13 +-
 migration/ram.c          |   4 +
 migration/rdma.c         | 321 +++++++++++++++++++++++++++++++++++++++++------
 migration/savevm.c       |   3 +
 7 files changed, 307 insertions(+), 40 deletions(-)

-- 
1.8.3.1
Subject: [Qemu-devel] [PATCH v3 1/6] migration: disable RDMA WRITE after postcopy started
From: Lidong Chen
Date: 2018-05-05 14:35 UTC
To: quintela, dgilbert, berrange
Cc: qemu-devel, galsha, aviadye, adido, Lidong Chen

RDMA WRITE operations are performed with no notification to the destination
qemu, so the destination qemu cannot be woken up. This patch disables RDMA
WRITE after postcopy has started.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/qemu-file.c |  8 ++++++--
 migration/rdma.c      | 12 ++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 0463f4c..977b9ae 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -253,8 +253,12 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
     if (f->hooks && f->hooks->save_page) {
         int ret = f->hooks->save_page(f, f->opaque, block_offset,
                                       offset, size, bytes_sent);
-        f->bytes_xfer += size;
-        if (ret != RAM_SAVE_CONTROL_DELAYED) {
+        if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+            f->bytes_xfer += size;
+        }
+
+        if (ret != RAM_SAVE_CONTROL_DELAYED &&
+            ret != RAM_SAVE_CONTROL_NOT_SUPP) {
             if (bytes_sent && *bytes_sent > 0) {
                 qemu_update_position(f, *bytes_sent);
             } else if (ret < 0) {
diff --git a/migration/rdma.c b/migration/rdma.c
index da474fc..a22be43 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2927,6 +2927,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return RAM_SAVE_CONTROL_NOT_SUPP;
+    }
+
     qemu_fflush(f);
 
     if (size > 0) {
@@ -3482,6 +3486,10 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     trace_qemu_rdma_registration_start(flags);
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
@@ -3504,6 +3512,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     qemu_fflush(f);
     ret = qemu_rdma_drain_cq(f, rdma);
-- 
1.8.3.1
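[Editor's note] To show how a caller can react to the new RAM_SAVE_CONTROL_NOT_SUPP result, here is a minimal sketch of falling back to streaming the page through the ordinary QEMUFile path once postcopy has disabled RDMA WRITE. This is not QEMU's actual caller: save_normal_page and the RAMBlock field used here are illustrative assumptions; only ram_control_save_page's signature is taken from the hunk above.

    /* Sketch only: when the save_page hook declines (NOT_SUPP), send the
     * page inline through the stream instead of via RDMA WRITE. */
    static int save_one_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
    {
        uint64_t bytes_sent = 0;
        size_t ret = ram_control_save_page(f, block->offset, offset,
                                           TARGET_PAGE_SIZE, &bytes_sent);

        if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
            /* hook declined, e.g. postcopy is active: use the normal path,
             * which also keeps bytes_xfer accounting correct */
            return save_normal_page(f, block, offset); /* hypothetical helper */
        }
        return ret; /* RAM_SAVE_CONTROL_DELAYED, or an error code */
    }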
Subject: [Qemu-devel] [PATCH v3 2/6] migration: create a dedicated connection for rdma return path
From: Lidong Chen
Date: 2018-05-05 14:35 UTC
To: quintela, dgilbert, berrange
Cc: qemu-devel, galsha, aviadye, adido, Lidong Chen

If an RDMA migration is started with postcopy enabled, the source qemu
establishes a dedicated connection for the return path.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 91 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a22be43..c745427 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -387,6 +387,10 @@ typedef struct RDMAContext {
     uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX];
 
     GHashTable *blockmap;
+
+    /* the RDMAContext for return path */
+    struct RDMAContext *return_path;
+    bool is_return_path;
 } RDMAContext;
 
 #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
@@ -2329,10 +2333,22 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma_destroy_id(rdma->cm_id);
         rdma->cm_id = NULL;
     }
+
+    /* the destination side, listen_id and channel is shared */
     if (rdma->listen_id) {
-        rdma_destroy_id(rdma->listen_id);
+        if (!rdma->is_return_path) {
+            rdma_destroy_id(rdma->listen_id);
+        }
         rdma->listen_id = NULL;
+
+        if (rdma->channel) {
+            if (!rdma->is_return_path) {
+                rdma_destroy_event_channel(rdma->channel);
+            }
+            rdma->channel = NULL;
+        }
     }
+
     if (rdma->channel) {
         rdma_destroy_event_channel(rdma->channel);
         rdma->channel = NULL;
@@ -2561,6 +2577,25 @@ err_dest_init_create_listen_id:
 
 }
 
+static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
+                                            RDMAContext *rdma)
+{
+    int idx;
+
+    for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
+        rdma_return_path->wr_data[idx].control_len = 0;
+        rdma_return_path->wr_data[idx].control_curr = NULL;
+    }
+
+    /*the CM channel and CM id is shared*/
+    rdma_return_path->channel = rdma->channel;
+    rdma_return_path->listen_id = rdma->listen_id;
+
+    rdma->return_path = rdma_return_path;
+    rdma_return_path->return_path = rdma;
+    rdma_return_path->is_return_path = true;
+}
+
 static void *qemu_rdma_data_init(const char *host_port, Error **errp)
 {
     RDMAContext *rdma = NULL;
@@ -3018,6 +3053,8 @@ err:
     return ret;
 }
 
+static void rdma_accept_incoming_migration(void *opaque);
+
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
     RDMACapabilities cap;
@@ -3112,7 +3149,14 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         }
     }
 
-    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    /* Accept the second connection request for return path */
+    if (migrate_postcopy() && !rdma->is_return_path) {
+        qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
+                            NULL,
+                            (void *)(intptr_t)rdma->return_path);
+    } else {
+        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
     if (ret) {
@@ -3693,6 +3737,10 @@ static void rdma_accept_incoming_migration(void *opaque)
 
     trace_qemu_rdma_accept_incoming_migration_accepted();
 
+    if (rdma->is_return_path) {
+        return;
+    }
+
     f = qemu_fopen_rdma(rdma, "rb");
     if (f == NULL) {
         ERROR(errp, "could not qemu_fopen_rdma!");
@@ -3707,7 +3755,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
-    RDMAContext *rdma;
+    RDMAContext *rdma, *rdma_return_path;
     Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
@@ -3734,12 +3782,24 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 
     trace_rdma_start_incoming_migration_after_rdma_listen();
 
+    /* initialize the RDMAContext for return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, &local_err);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
+    }
+
     qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
                         NULL, (void *)(intptr_t)rdma);
     return;
 err:
     error_propagate(errp, local_err);
     g_free(rdma);
+    g_free(rdma_return_path);
 }
 
 void rdma_start_outgoing_migration(void *opaque,
@@ -3747,6 +3807,7 @@ void rdma_start_outgoing_migration(void *opaque,
 {
     MigrationState *s = opaque;
     RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
+    RDMAContext *rdma_return_path = NULL;
     int ret = 0;
 
     if (rdma == NULL) {
@@ -3767,6 +3828,32 @@ void rdma_start_outgoing_migration(void *opaque,
         goto err;
     }
 
+    /* RDMA postcopy need a seprate queue pair for return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, errp);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        ret = qemu_rdma_source_init(rdma_return_path,
+            s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL], errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        ret = qemu_rdma_connect(rdma_return_path, errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        rdma->return_path = rdma_return_path;
+        rdma_return_path->return_path = rdma;
+        rdma_return_path->is_return_path = true;
+    }
+
     trace_rdma_start_outgoing_migration_after_rdma_connect();
 
     s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
@@ -3774,4 +3861,5 @@ void rdma_start_outgoing_migration(void *opaque,
     return;
 err:
     g_free(rdma);
+    g_free(rdma_return_path);
 }
-- 
1.8.3.1
Subject: [Qemu-devel] [PATCH v3 3/6] migration: remove unnecessary variable len in QIOChannelRDMA
From: Lidong Chen
Date: 2018-05-05 14:35 UTC
To: quintela, dgilbert, berrange
Cc: qemu-devel, galsha, aviadye, adido, Lidong Chen

Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
by different threads concurrently, this patch removes the unnecessary
member variable len from QIOChannelRDMA and uses a local variable instead.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangéberrange@redhat.com>
---
 migration/rdma.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c745427..f5c1d02 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -404,7 +404,6 @@ struct QIOChannelRDMA {
     QIOChannel parent;
     RDMAContext *rdma;
     QEMUFile *file;
-    size_t len;
     bool blocking; /* XXX we don't actually honour this yet */
 };
 
@@ -2640,6 +2639,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     int ret;
     ssize_t done = 0;
     size_t i;
+    size_t len = 0;
 
     CHECK_ERROR_STATE();
 
@@ -2659,10 +2659,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         while (remaining) {
             RDMAControlHeader head;
 
-            rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
-            remaining -= rioc->len;
+            len = MIN(remaining, RDMA_SEND_INCREMENT);
+            remaining -= len;
 
-            head.len = rioc->len;
+            head.len = len;
             head.type = RDMA_CONTROL_QEMU_FILE;
 
             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
@@ -2672,8 +2672,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
                 return ret;
             }
 
-            data += rioc->len;
-            done += rioc->len;
+            data += len;
+            done += len;
         }
     }
 
@@ -2768,8 +2768,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
             }
         }
     }
-    rioc->len = done;
-    return rioc->len;
+    return done;
 }
 
 /*
-- 
1.8.3.1
Subject: Re: [Qemu-devel] [PATCH v3 3/6] migration: remove unnecessary variable len in QIOChannelRDMA
From: Dr. David Alan Gilbert
Date: 2018-05-08 14:19 UTC
To: Lidong Chen
Cc: quintela, berrange, qemu-devel, galsha, aviadye, adido, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
> by different threads concurrently, this patch removes the unnecessary
> member variable len from QIOChannelRDMA and uses a local variable instead.
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Daniel P. Berrangéberrange@redhat.com>

Note there's a ' <' missing somehow; minor fix up during commit
hopefully.

Dave

> ---
>  migration/rdma.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index c745427..f5c1d02 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -404,7 +404,6 @@ struct QIOChannelRDMA {
>      QIOChannel parent;
>      RDMAContext *rdma;
>      QEMUFile *file;
> -    size_t len;
>      bool blocking; /* XXX we don't actually honour this yet */
>  };
>
> @@ -2640,6 +2639,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>      int ret;
>      ssize_t done = 0;
>      size_t i;
> +    size_t len = 0;
>
>      CHECK_ERROR_STATE();
>
> @@ -2659,10 +2659,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>          while (remaining) {
>              RDMAControlHeader head;
>
> -            rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
> -            remaining -= rioc->len;
> +            len = MIN(remaining, RDMA_SEND_INCREMENT);
> +            remaining -= len;
>
> -            head.len = rioc->len;
> +            head.len = len;
>              head.type = RDMA_CONTROL_QEMU_FILE;
>
>              ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
> @@ -2672,8 +2672,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>                  return ret;
>              }
>
> -            data += rioc->len;
> -            done += rioc->len;
> +            data += len;
> +            done += len;
>          }
>      }
>
> @@ -2768,8 +2768,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>              }
>          }
>      }
> -    rioc->len = done;
> -    return rioc->len;
> +    return done;
>  }
>
>  /*
> --
> 1.8.3.1
>
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Subject: Re: [Qemu-devel] [PATCH v3 3/6] migration: remove unnecessary variable len in QIOChannelRDMA
From: 858585 jemmy
Date: 2018-05-09 1:28 UTC
To: Dr. David Alan Gilbert
Cc: Juan Quintela, Daniel P. Berrange, qemu-devel, Gal Shachaf, Aviad Yehezkel, adido, Lidong Chen

On Tue, May 8, 2018 at 10:19 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> Because qio_channel_rdma_writev and qio_channel_rdma_readv may be invoked
>> by different threads concurrently, this patch removes the unnecessary
>> member variable len from QIOChannelRDMA and uses a local variable instead.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Reviewed-by: Daniel P. Berrangéberrange@redhat.com>
>
> Note there's a ' <' missing somehow; minor fix up during commit
> hopefully.
>
> Dave

Sorry for this mistake; I will check more carefully.

>> ---
>>  migration/rdma.c | 15 +++++++--------
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index c745427..f5c1d02 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -404,7 +404,6 @@ struct QIOChannelRDMA {
>>      QIOChannel parent;
>>      RDMAContext *rdma;
>>      QEMUFile *file;
>> -    size_t len;
>>      bool blocking; /* XXX we don't actually honour this yet */
>>  };
>>
>> @@ -2640,6 +2639,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>      int ret;
>>      ssize_t done = 0;
>>      size_t i;
>> +    size_t len = 0;
>>
>>      CHECK_ERROR_STATE();
>>
>> @@ -2659,10 +2659,10 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>          while (remaining) {
>>              RDMAControlHeader head;
>>
>> -            rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
>> -            remaining -= rioc->len;
>> +            len = MIN(remaining, RDMA_SEND_INCREMENT);
>> +            remaining -= len;
>>
>> -            head.len = rioc->len;
>> +            head.len = len;
>>              head.type = RDMA_CONTROL_QEMU_FILE;
>>
>>              ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
>> @@ -2672,8 +2672,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>                  return ret;
>>              }
>>
>> -            data += rioc->len;
>> -            done += rioc->len;
>> +            data += len;
>> +            done += len;
>>          }
>>      }
>>
>> @@ -2768,8 +2768,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>>              }
>>          }
>>      }
>> -    rioc->len = done;
>> -    return rioc->len;
>> +    return done;
>>  }
>>
>>  /*
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
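[Editor's note] Why the shared member mattered: with patch 5/6 in place, writev runs on the migration thread while readv runs on the return path thread, and both previously stored their progress into the same rioc->len field. A toy illustration of the hazard (hypothetical interleaving, not QEMU code):

    /* Thread A (writev):  rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
     * Thread B (readv):   rioc->len = done;
     * Thread A:           data += rioc->len;   <-- may consume B's value
     *
     * With a stack-local 'len' in each function, every call owns its own
     * storage and the interleaving above becomes impossible. */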
Subject: [Qemu-devel] [PATCH v3 4/6] migration: avoid concurrent invoke channel_close by different threads
From: Lidong Chen
Date: 2018-05-05 14:35 UTC
To: quintela, dgilbert, berrange
Cc: qemu-devel, galsha, aviadye, adido, Lidong Chen

channel_close may be invoked by different threads. For example, the source
qemu invokes qemu_fclose in the main thread, the migration thread, and the
return path thread; the destination qemu invokes qemu_fclose in the main
thread, the listen thread, and the COLO incoming thread.

Add a mutex to the QEMUFile struct to avoid concurrent invocations of
channel_close.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/qemu-file.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 977b9ae..87d0f05 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -52,6 +52,7 @@ struct QEMUFile {
     unsigned int iovcnt;
 
     int last_error;
+    QemuMutex lock;
 };
 
 /*
@@ -96,6 +97,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
 
     f = g_new0(QEMUFile, 1);
 
+    qemu_mutex_init(&f->lock);
     f->opaque = opaque;
     f->ops = ops;
     return f;
@@ -328,7 +330,9 @@ int qemu_fclose(QEMUFile *f)
     ret = qemu_file_get_error(f);
 
     if (f->ops->close) {
+        qemu_mutex_lock(&f->lock);
         int ret2 = f->ops->close(f->opaque);
+        qemu_mutex_unlock(&f->lock);
         if (ret >= 0) {
             ret = ret2;
         }
@@ -339,6 +343,7 @@ int qemu_fclose(QEMUFile *f)
     if (f->last_error) {
         ret = f->last_error;
     }
+    qemu_mutex_destroy(&f->lock);
     g_free(f);
     trace_qemu_file_fclose();
     return ret;
-- 
1.8.3.1
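[Editor's note] A minimal sketch of the race this mutex closes. The FileSketch struct is an illustrative stand-in for QEMUFile; the actual patch only serializes the close callback, which suffices when the callback itself is safe to re-enter serially, as qio_channel_rdma_close becomes in patch 5/6 (the second caller finds rdmain/rdmaout already NULL).

    typedef struct {
        QemuMutex lock;            /* mirrors the lock added to QEMUFile */
        int (*close)(void *opaque);
        void *opaque;
    } FileSketch;

    static int file_sketch_close(FileSketch *f)
    {
        int ret;

        /* Without the lock, two threads can run close() concurrently and
         * tear down the same channel state twice.  With it, the second
         * caller runs only after the first has fully finished. */
        qemu_mutex_lock(&f->lock);
        ret = f->close(f->opaque);
        qemu_mutex_unlock(&f->lock);
        return ret;
    }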
Subject: [Qemu-devel] [PATCH v3 5/6] migration: implement bi-directional RDMA QIOChannel
From: Lidong Chen
Date: 2018-05-05 14:35 UTC
To: quintela, dgilbert, berrange
Cc: qemu-devel, galsha, aviadye, adido, Lidong Chen

This patch implements a bi-directional RDMA QIOChannel. Because different
threads may access the QIOChannelRDMA concurrently, this patch uses RCU to
protect it.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/colo.c         |   2 +
 migration/migration.c    |   2 +
 migration/postcopy-ram.c |   2 +
 migration/ram.c          |   4 +
 migration/rdma.c         | 196 ++++++++++++++++++++++++++++++++++++++++-------
 migration/savevm.c       |   3 +
 6 files changed, 183 insertions(+), 26 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 4381067..88936f5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -534,6 +534,7 @@ void *colo_process_incoming_thread(void *opaque)
     uint64_t value;
     Error *local_err = NULL;
 
+    rcu_register_thread();
     qemu_sem_init(&mis->colo_incoming_sem, 0);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -666,5 +667,6 @@ out:
     }
 
     migration_incoming_exit_colo();
 
+    rcu_unregister_thread();
     return NULL;
 }
diff --git a/migration/migration.c b/migration/migration.c
index 0bdb28e..584666b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1787,6 +1787,7 @@ static void *source_return_path_thread(void *opaque)
     int res;
 
     trace_source_return_path_thread_entry();
+    rcu_register_thread();
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
            migration_is_setup_or_active(ms->state)) {
         trace_source_return_path_thread_loop_top();
@@ -1887,6 +1888,7 @@ static void *source_return_path_thread(void *opaque)
 out:
     ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
+    rcu_unregister_thread();
     return NULL;
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 8ceeaa2..4e05966 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -842,6 +842,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
     RAMBlock *rb = NULL;
 
     trace_postcopy_ram_fault_thread_entry();
+    rcu_register_thread();
     mis->last_rb = NULL; /* last RAMBlock we sent part of */
     qemu_sem_post(&mis->fault_thread_sem);
@@ -1013,6 +1014,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
             }
         }
     }
+    rcu_unregister_thread();
     trace_postcopy_ram_fault_thread_exit();
     g_free(pfd);
     return NULL;
diff --git a/migration/ram.c b/migration/ram.c
index 912810c..9bc92fc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -491,6 +491,7 @@ static void *multifd_send_thread(void *opaque)
 {
     MultiFDSendParams *p = opaque;
 
+    rcu_register_thread();
     while (true) {
         qemu_mutex_lock(&p->mutex);
         if (p->quit) {
@@ -500,6 +501,7 @@ static void *multifd_send_thread(void *opaque)
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_wait(&p->sem);
     }
+    rcu_unregister_thread();
 
     return NULL;
 }
@@ -592,6 +594,7 @@ static void *multifd_recv_thread(void *opaque)
 {
     MultiFDRecvParams *p = opaque;
 
+    rcu_register_thread();
     while (true) {
         qemu_mutex_lock(&p->mutex);
         if (p->quit) {
@@ -601,6 +604,7 @@ static void *multifd_recv_thread(void *opaque)
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_wait(&p->sem);
     }
+    rcu_unregister_thread();
 
     return NULL;
 }
diff --git a/migration/rdma.c b/migration/rdma.c
index f5c1d02..854f355 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
                         " to abort!"); \
                 rdma->error_reported = 1; \
             } \
+            rcu_read_unlock(); \
             return rdma->error_state; \
         } \
     } while (0)
@@ -402,7 +403,8 @@ typedef struct QIOChannelRDMA QIOChannelRDMA;
 
 struct QIOChannelRDMA {
     QIOChannel parent;
-    RDMAContext *rdma;
+    RDMAContext *rdmain;
+    RDMAContext *rdmaout;
     QEMUFile *file;
     bool blocking; /* XXX we don't actually honour this yet */
 };
@@ -2635,12 +2637,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     QEMUFile *f = rioc->file;
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
     ssize_t done = 0;
     size_t i;
     size_t len = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     /*
@@ -2650,6 +2660,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
         rdma->error_state = ret;
+        rcu_read_unlock();
         return ret;
     }
 
@@ -2669,6 +2680,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
+                rcu_read_unlock();
                 return ret;
             }
 
@@ -2677,6 +2689,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         }
     }
 
+    rcu_read_unlock();
     return done;
 }
 
@@ -2710,12 +2723,20 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
                                       Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head;
     int ret = 0;
     ssize_t i;
     size_t done = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     for (i = 0; i < niov; i++) {
@@ -2727,7 +2748,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
          * were given and dish out the bytes until we run
          * out of bytes.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
         /* Got what we needed, so go to next iovec */
@@ -2749,25 +2770,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
+            rcu_read_unlock();
             return ret;
         }
 
         /*
          * SEND was received with new bytes, now try again.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
 
         /* Still didn't get enough, so lets just return */
         if (want) {
             if (done == 0) {
+                rcu_read_unlock();
                 return QIO_CHANNEL_ERR_BLOCK;
             } else {
                 break;
             }
         }
     }
+    rcu_read_unlock();
     return done;
 }
 
@@ -2819,15 +2843,29 @@ qio_channel_rdma_source_prepare(GSource *source,
                                 gint *timeout)
 {
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
     *timeout = -1;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when prepare Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return cond & rsource->condition;
 }
 
@@ -2835,14 +2873,28 @@ static gboolean
 qio_channel_rdma_source_check(GSource *source)
 {
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when check Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return cond & rsource->condition;
 }
 
@@ -2853,14 +2905,28 @@ qio_channel_rdma_source_dispatch(GSource *source,
 {
     QIOChannelFunc func = (QIOChannelFunc)callback;
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when dispatch Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return (*func)(QIO_CHANNEL(rsource->rioc),
                    (cond & rsource->condition),
                    user_data);
@@ -2905,15 +2971,32 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext *rdmain, *rdmaout;
     trace_qemu_rdma_close();
-    if (rioc->rdma) {
-        if (!rioc->rdma->error_state) {
-            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
-        }
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+
+    rdmain = rioc->rdmain;
+    if (rdmain) {
+        atomic_rcu_set(&rioc->rdmain, NULL);
+    }
+
+    rdmaout = rioc->rdmaout;
+    if (rdmaout) {
+        atomic_rcu_set(&rioc->rdmaout, NULL);
     }
+
+    synchronize_rcu();
+
+    if (rdmain) {
+        qemu_rdma_cleanup(rdmain);
+    }
+
+    if (rdmaout) {
+        qemu_rdma_cleanup(rdmaout);
+    }
+
+    g_free(rdmain);
+    g_free(rdmaout);
+
     return 0;
 }
 
@@ -2956,12 +3039,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
                                   size_t size, uint64_t *bytes_sent)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return RAM_SAVE_CONTROL_NOT_SUPP;
     }
 
@@ -3046,9 +3138,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
         }
     }
 
+    rcu_read_unlock();
     return RAM_SAVE_CONTROL_DELAYED;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3224,8 +3318,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
-    RDMALocalBlocks *local = &rdma->local_ram_blocks;
+    RDMAContext *rdma;
+    RDMALocalBlocks *local;
     RDMAControlHeader head;
     RDMARegister *reg, *registers;
     RDMACompress *comp;
@@ -3238,8 +3332,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     int count = 0;
     int i = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
+    local = &rdma->local_ram_blocks;
     do {
         trace_qemu_rdma_registration_handle_wait();
@@ -3469,6 +3572,7 @@ out:
     if (ret < 0) {
         rdma->error_state = ret;
     }
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3482,10 +3586,18 @@ out:
 static int
 rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
 {
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int curr;
     int found = -1;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     /* Find the matching RAMBlock in our local list */
     for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
         if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
@@ -3496,6 +3608,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
 
     if (found == -1) {
         error_report("RAMBlock '%s' not found on destination", name);
+        rcu_read_unlock();
         return -ENOENT;
     }
 
@@ -3503,6 +3616,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
     trace_rdma_block_notification_handle(name, rdma->next_src_index);
     rdma->next_src_index++;
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3525,11 +3639,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
                                         uint64_t flags, void *data)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
+
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
 
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3537,6 +3659,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3549,13 +3672,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 {
     Error *local_err = NULL, **errp = &local_err;
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3587,6 +3718,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                     qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             ERROR(errp, "receiving remote info!");
+            rcu_read_unlock();
             return ret;
         }
 
@@ -3610,6 +3742,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                         "not identical on both the source and destination.",
                         local->nb_blocks, nb_dest_blocks);
             rdma->error_state = -EINVAL;
+            rcu_read_unlock();
             return -EINVAL;
         }
 
@@ -3626,6 +3759,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                             local->block[i].length,
                             rdma->dest_blocks[i].length);
                 rdma->error_state = -EINVAL;
+                rcu_read_unlock();
                 return -EINVAL;
             }
             local->block[i].remote_host_addr =
@@ -3643,9 +3777,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         goto err;
     }
 
+    rcu_read_unlock();
     return 0;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3663,10 +3799,15 @@ static const QEMUFileHooks rdma_write_hooks = {
 static void qio_channel_rdma_finalize(Object *obj)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
-    if (rioc->rdma) {
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+    if (rioc->rdmain) {
+        qemu_rdma_cleanup(rioc->rdmain);
+        g_free(rioc->rdmain);
+        rioc->rdmain = NULL;
+    }
+    if (rioc->rdmaout) {
+        qemu_rdma_cleanup(rioc->rdmaout);
+        g_free(rioc->rdmaout);
+        rioc->rdmaout = NULL;
     }
 }
 
@@ -3706,13 +3847,16 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
     }
 
     rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
-    rioc->rdma = rdma;
 
     if (mode[0] == 'w') {
         rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
+        rioc->rdmaout = rdma;
+        rioc->rdmain = rdma->return_path;
        qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
     } else {
         rioc->file = qemu_fopen_channel_input(QIO_CHANNEL(rioc));
+        rioc->rdmain = rdma;
+        rioc->rdmaout = rdma->return_path;
         qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
     }
diff --git a/migration/savevm.c b/migration/savevm.c
index e2be02a..45ec809 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1573,6 +1573,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
     qemu_sem_post(&mis->listen_thread_sem);
     trace_postcopy_ram_listen_thread_start();
 
+    rcu_register_thread();
     /*
      * Because we're a thread and not a coroutine we can't yield
      * in qemu_file, and thus we must be blocking now.
@@ -1605,6 +1606,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
      * to leave the guest running and fire MCEs for pages that never
      * arrived as a desperate recovery step.
      */
+    rcu_unregister_thread();
     exit(EXIT_FAILURE);
 }
 
@@ -1619,6 +1621,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
     migration_incoming_state_destroy();
     qemu_loadvm_state_cleanup();
 
+    rcu_unregister_thread();
     return NULL;
 }
-- 
1.8.3.1
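[Editor's note] The essence of the patch, boiled down to the RCU pattern it applies (a condensed sketch of code that does appear above, with error handling and qemu_rdma_cleanup trimmed): readers pin one direction's context for the duration of an I/O call, and close() publishes NULL, waits for in-flight readers to drain, then frees.

    static ssize_t rdma_io_sketch(QIOChannelRDMA *rioc)
    {
        RDMAContext *rdma;

        rcu_read_lock();
        rdma = atomic_rcu_read(&rioc->rdmaout);  /* or rdmain for readv */
        if (!rdma) {
            rcu_read_unlock();                   /* channel already closed */
            return -EIO;
        }
        /* ... use rdma; close() cannot free it while we are inside the
         * read-side critical section ... */
        rcu_read_unlock();
        return 0;
    }

    static void rdma_close_sketch(QIOChannelRDMA *rioc)
    {
        RDMAContext *rdmain = rioc->rdmain;
        RDMAContext *rdmaout = rioc->rdmaout;

        atomic_rcu_set(&rioc->rdmain, NULL);
        atomic_rcu_set(&rioc->rdmaout, NULL);
        synchronize_rcu();       /* wait until no reader can still see them */
        g_free(rdmain);
        g_free(rdmaout);
    }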
Subject: Re: [Qemu-devel] [PATCH v3 5/6] migration: implement bi-directional RDMA QIOChannel
From: Paolo Bonzini
Date: 2018-05-15 14:54 UTC
To: Lidong Chen, quintela, dgilbert, berrange
Cc: adido, galsha, aviadye, qemu-devel, Lidong Chen

On 05/05/2018 16:35, Lidong Chen wrote:
> @@ -2635,12 +2637,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>      QEMUFile *f = rioc->file;
> -    RDMAContext *rdma = rioc->rdma;
> +    RDMAContext *rdma;
>      int ret;
>      ssize_t done = 0;
>      size_t i;
>      size_t len = 0;
>
> +    rcu_read_lock();
> +    rdma = atomic_rcu_read(&rioc->rdmaout);
> +
> +    if (!rdma) {
> +        rcu_read_unlock();
> +        return -EIO;
> +    }
> +
>      CHECK_ERROR_STATE();
>
>      /*

I am not sure I understand this.  It would probably be wrong to use the
output side from two threads at the same time, so why not use two mutexes?

Also, who is calling qio_channel_rdma_close in such a way that another
thread is still using it?  Would it be possible to synchronize with the
other thread *before*, for example with qemu_thread_join?

Thanks,

Paolo
Subject: Re: [Qemu-devel] [PATCH v3 5/6] migration: implement bi-directional RDMA QIOChannel
From: 858585 jemmy
Date: 2018-05-16 9:36 UTC
To: Paolo Bonzini
Cc: Juan Quintela, Dave Gilbert, Daniel P. Berrange, adido, Gal Shachaf, Aviad Yehezkel, qemu-devel, Lidong Chen

On Tue, May 15, 2018 at 10:54 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 05/05/2018 16:35, Lidong Chen wrote:
>> @@ -2635,12 +2637,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>  {
>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>      QEMUFile *f = rioc->file;
>> -    RDMAContext *rdma = rioc->rdma;
>> +    RDMAContext *rdma;
>>      int ret;
>>      ssize_t done = 0;
>>      size_t i;
>>      size_t len = 0;
>>
>> +    rcu_read_lock();
>> +    rdma = atomic_rcu_read(&rioc->rdmaout);
>> +
>> +    if (!rdma) {
>> +        rcu_read_unlock();
>> +        return -EIO;
>> +    }
>> +
>>      CHECK_ERROR_STATE();
>>
>>      /*
>
> I am not sure I understand this.  It would probably be wrong to use the
> output side from two threads at the same time, so why not use two mutexes?

Two threads will not invoke qio_channel_rdma_writev at the same time.
On the source qemu, the migration thread only uses writev, and the return
path thread only uses readv.
The destination qemu already has a mutex, mis->rp_mutex, to ensure writev
is not used concurrently.

The rcu_read_lock is used to protect against using the RDMAContext after
another thread closes it.

>
> Also, who is calling qio_channel_rdma_close in such a way that another
> thread is still using it?  Would it be possible to synchronize with the
> other thread *before*, for example with qemu_thread_join?

The MigrationState structure includes the to_dst_file and from_dst_file
QEMUFiles, and the two QEMUFiles use the same QIOChannel.
For example, if the return path thread calls
qemu_fclose(ms->rp_state.from_dst_file),
it will also close the RDMAContext for ms->to_dst_file.

For live migration, the source qemu invokes qemu_fclose in different
threads, including the main thread, the migration thread, and the return
path thread.

The destination qemu invokes qemu_fclose in the main thread, the listen
thread, and the COLO incoming thread.

I have not found an effective way to synchronize these threads.

Thanks.

>
> Thanks,
>
> Paolo
Subject: Re: [Qemu-devel] [PATCH v3 5/6] migration: implement bi-directional RDMA QIOChannel
From: 858585 jemmy
Date: 2018-05-21 11:49 UTC
To: Paolo Bonzini, Daniel P. Berrange
Cc: Juan Quintela, Dave Gilbert, Adi Dotan, Gal Shachaf, Aviad Yehezkel, qemu-devel, Lidong Chen

On Wed, May 16, 2018 at 5:36 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Tue, May 15, 2018 at 10:54 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 05/05/2018 16:35, Lidong Chen wrote:
>>> @@ -2635,12 +2637,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>>  {
>>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>>      QEMUFile *f = rioc->file;
>>> -    RDMAContext *rdma = rioc->rdma;
>>> +    RDMAContext *rdma;
>>>      int ret;
>>>      ssize_t done = 0;
>>>      size_t i;
>>>      size_t len = 0;
>>>
>>> +    rcu_read_lock();
>>> +    rdma = atomic_rcu_read(&rioc->rdmaout);
>>> +
>>> +    if (!rdma) {
>>> +        rcu_read_unlock();
>>> +        return -EIO;
>>> +    }
>>> +
>>>      CHECK_ERROR_STATE();
>>>
>>>      /*
>>
>> I am not sure I understand this.  It would probably be wrong to use the
>> output side from two threads at the same time, so why not use two mutexes?
>
> Two threads will not invoke qio_channel_rdma_writev at the same time.
> On the source qemu, the migration thread only uses writev, and the return
> path thread only uses readv.
> The destination qemu already has a mutex, mis->rp_mutex, to ensure writev
> is not used concurrently.
>
> The rcu_read_lock is used to protect against using the RDMAContext after
> another thread closes it.

Any suggestion?

>
>>
>> Also, who is calling qio_channel_rdma_close in such a way that another
>> thread is still using it?  Would it be possible to synchronize with the
>> other thread *before*, for example with qemu_thread_join?
>
> The MigrationState structure includes the to_dst_file and from_dst_file
> QEMUFiles, and the two QEMUFiles use the same QIOChannel.
> For example, if the return path thread calls
> qemu_fclose(ms->rp_state.from_dst_file),
> it will also close the RDMAContext for ms->to_dst_file.
>
> For live migration, the source qemu invokes qemu_fclose in different
> threads, including the main thread, the migration thread, and the return
> path thread.
>
> The destination qemu invokes qemu_fclose in the main thread, the listen
> thread, and the COLO incoming thread.
>
> I have not found an effective way to synchronize these threads.
>
> Thanks.
>
>>
>> Thanks,
>>
>> Paolo
Subject: Re: [Qemu-devel] [PATCH v3 5/6] migration: implement bi-directional RDMA QIOChannel
From: 858585 jemmy
Date: 2018-05-23 2:36 UTC
To: Paolo Bonzini, Daniel P. Berrange
Cc: Juan Quintela, Dave Gilbert, Adi Dotan, Gal Shachaf, Aviad Yehezkel, qemu-devel, Lidong Chen

ping.

On Mon, May 21, 2018 at 7:49 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Wed, May 16, 2018 at 5:36 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
>> On Tue, May 15, 2018 at 10:54 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> On 05/05/2018 16:35, Lidong Chen wrote:
>>>> @@ -2635,12 +2637,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>>>  {
>>>>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>>>>      QEMUFile *f = rioc->file;
>>>> -    RDMAContext *rdma = rioc->rdma;
>>>> +    RDMAContext *rdma;
>>>>      int ret;
>>>>      ssize_t done = 0;
>>>>      size_t i;
>>>>      size_t len = 0;
>>>>
>>>> +    rcu_read_lock();
>>>> +    rdma = atomic_rcu_read(&rioc->rdmaout);
>>>> +
>>>> +    if (!rdma) {
>>>> +        rcu_read_unlock();
>>>> +        return -EIO;
>>>> +    }
>>>> +
>>>>      CHECK_ERROR_STATE();
>>>>
>>>>      /*
>>>
>>> I am not sure I understand this.  It would probably be wrong to use the
>>> output side from two threads at the same time, so why not use two mutexes?
>>
>> Two threads will not invoke qio_channel_rdma_writev at the same time.
>> On the source qemu, the migration thread only uses writev, and the return
>> path thread only uses readv.
>> The destination qemu already has a mutex, mis->rp_mutex, to ensure writev
>> is not used concurrently.
>>
>> The rcu_read_lock is used to protect against using the RDMAContext after
>> another thread closes it.
>
> Any suggestion?
>
>>
>>>
>>> Also, who is calling qio_channel_rdma_close in such a way that another
>>> thread is still using it?  Would it be possible to synchronize with the
>>> other thread *before*, for example with qemu_thread_join?
>>
>> The MigrationState structure includes the to_dst_file and from_dst_file
>> QEMUFiles, and the two QEMUFiles use the same QIOChannel.
>> For example, if the return path thread calls
>> qemu_fclose(ms->rp_state.from_dst_file),
>> it will also close the RDMAContext for ms->to_dst_file.
>>
>> For live migration, the source qemu invokes qemu_fclose in different
>> threads, including the main thread, the migration thread, and the return
>> path thread.
>>
>> The destination qemu invokes qemu_fclose in the main thread, the listen
>> thread, and the COLO incoming thread.
>>
>> I have not found an effective way to synchronize these threads.
>>
>> Thanks.
>>
>>>
>>> Thanks,
>>>
>>> Paolo
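[Editor's note] The threading model argued for in this sub-thread, collected in one place (an editor's summary of the constraints stated above, not code from the series):

    /*
     * Source qemu:
     *   migration thread    -> qio_channel_rdma_writev  (sole writer)
     *   return path thread  -> qio_channel_rdma_readv   (sole reader)
     *   main / migration / return path threads may all call qemu_fclose()
     *
     * Destination qemu:
     *   mis->rp_mutex already serializes writev on the return path
     *   main / listen / COLO threads may all call qemu_fclose()
     *
     * So each direction is single-threaded by construction; the remaining
     * hazard is qemu_fclose() racing with in-flight I/O, which patch 4/6
     * (the QEMUFile mutex) and patch 5/6 (RCU on rdmain/rdmaout) address.
     */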
Subject: [Qemu-devel] [PATCH v3 6/6] migration: Stop rdma yielding during incoming postcopy
From: Lidong Chen
Date: 2018-05-05 14:35 UTC
To: quintela, dgilbert, berrange
Cc: qemu-devel, galsha, aviadye, adido, Lidong Chen

During incoming postcopy, the destination qemu invokes
qemu_rdma_wait_comp_channel in a separate thread, so it cannot use rdma
yield; poll the completion channel fd instead.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 854f355..ed9cfb1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1490,11 +1490,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
      * Coroutine doesn't start until migration_fd_process_incoming()
      * so don't yield unless we know we're running inside of a coroutine.
      */
-    if (rdma->migration_started_on_destination) {
+    if (rdma->migration_started_on_destination &&
+        migration_incoming_get_current()->state == MIGRATION_STATUS_ACTIVE) {
         yield_until_fd_readable(rdma->comp_channel->fd);
     } else {
         /* This is the source side, we're in a separate thread
          * or destination prior to migration_fd_process_incoming()
+         * after postcopy, the destination also in a seprate thread.
          * we can't yield; so we have to poll the fd.
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
-- 
1.8.3.1
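[Editor's note] For readers unfamiliar with the fallback path the comment refers to, a sketch of polling a completion channel fd with a bounded timeout so 'cancel' or errors can still be noticed, as the surrounding comment requires. This is a simplified, hypothetical helper; the real function also re-checks the migration state in a loop:

    #include <poll.h>

    /* Returns >0 when the completion channel fd is readable, 0 on timeout
     * (letting the caller re-check for cancel or errors), <0 on failure. */
    static int wait_comp_channel_poll(int comp_fd)
    {
        struct pollfd pfd = {
            .fd = comp_fd,
            .events = POLLIN | POLLERR | POLLHUP,
        };

        return poll(&pfd, 1, 100 /* ms */);
    }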