* [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration
@ 2018-08-06 13:29 Lidong Chen
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 01/11] migration: disable RDMA WRITE after postcopy started Lidong Chen
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert; +Cc: qemu-devel, Lidong Chen

The RDMA QIOChannel does not support bi-directional communication, so when an
RDMA live migration runs with postcopy enabled, the return path on the source
qemu gets a qemu file error.

These patches implement bi-directional communication for the RDMA QIOChannel
and disable RDMA WRITE during the postcopy phase.

This series just makes postcopy work; performance will be improved later.
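
For reference, a minimal sketch of the channel layout patch 03 introduces
(the struct and field names are taken from that patch; the comments are
editorial):

    struct QIOChannelRDMA {
        QIOChannel parent;
        RDMAContext *rdmain;    /* context used by the read side */
        RDMAContext *rdmaout;   /* context used by the write side */
        QEMUFile *file;
        bool blocking;
    };

On the source, rdmaout carries the main migration connection and rdmain the
return path; on the destination it is the reverse.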

[v6]
 - rebase
 - add a check for whether releasing the rdma resources has finished (David)
 - remove the unnecessary RDMA_CONTROL_ERROR message during cleanup (David)
 - poll the cm event for destination qemu

[v5]
 - rebase
 - fix a bug in creating a dedicated thread to release the rdma resources (David)
 - fix a bug in polling the cm event while waiting for RDMA work request completion (David, Gal)

[v4]
 - do not wait for the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
 - implement io_set_aio_fd_handler function for RDMA QIOChannel (Juan Quintela)
 - invoke qio_channel_yield only when qemu_in_coroutine() (Juan Quintela)
 - create a dedicated thread to release rdma resource
 - poll the cm event while waiting for RDMA work request completion
 - implement the shutdown function for RDMA QIOChannel

[v3]
 - add a mutex in QEMUFile struct to avoid concurrent channel close (Daniel)
 - destroy the mutex before freeing QEMUFile (David)
 - use rdmain and rdmaout instead of rdma->return_path (Daniel)

[v2]
 - do not update bytes_xfer when RDMA WRITE is disabled (David)
 - implement bi-directional communication for RDMA QIOChannel (Daniel)

Lidong Chen (11):
  migration: disable RDMA WRITE after postcopy started
  migration: create a dedicated connection for rdma return path
  migration: implement bi-directional RDMA QIOChannel
  migration: Stop rdma yielding during incoming postcopy
  migration: implement io_set_aio_fd_handler function for RDMA
    QIOChannel
  migration: invoke qio_channel_yield only when qemu_in_coroutine()
  migration: poll the cm event while wait RDMA work request completion
  migration: implement the shutdown for RDMA QIOChannel
  migration: poll the cm event for destination qemu
  migration: remove the unnecessary RDMA_CONTROL_ERROR message
  migration: create a dedicated thread to release rdma resource

 migration/colo.c              |   2 +
 migration/migration.c         |  11 +-
 migration/migration.h         |   3 +
 migration/postcopy-ram.c      |   2 +
 migration/qemu-file-channel.c |  12 +-
 migration/qemu-file.c         |   8 +-
 migration/ram.c               |   4 +
 migration/rdma.c              | 449 ++++++++++++++++++++++++++++++++++++++----
 migration/savevm.c            |   3 +
 9 files changed, 446 insertions(+), 48 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 01/11] migration: disable RDMA WRITE after postcopy started
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-22 10:04   ` Juan Quintela
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 02/11] migration: create a dedicated connection for rdma return path Lidong Chen
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert
  Cc: qemu-devel, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

RDMA WRITE operations are performed with no notification to the destination
qemu, so the destination qemu cannot wake up. This patch disables RDMA WRITE
after postcopy has started.
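
The effect on the caller can be sketched as follows (a hedged illustration;
ram_control_save_page() and RAM_SAVE_CONTROL_NOT_SUPP are from this patch,
the surrounding logic is paraphrased):

    ret = ram_control_save_page(f, block_offset, offset, size, &bytes_sent);
    if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
        /* The hook declined because postcopy is active: the page goes
         * through the normal migration stream instead of RDMA WRITE,
         * so the destination receives a notification it can wake on. */
    }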

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/qemu-file.c |  8 ++++++--
 migration/rdma.c      | 12 ++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 0463f4c..977b9ae 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -253,8 +253,12 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
     if (f->hooks && f->hooks->save_page) {
         int ret = f->hooks->save_page(f, f->opaque, block_offset,
                                       offset, size, bytes_sent);
-        f->bytes_xfer += size;
-        if (ret != RAM_SAVE_CONTROL_DELAYED) {
+        if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
+            f->bytes_xfer += size;
+        }
+
+        if (ret != RAM_SAVE_CONTROL_DELAYED &&
+            ret != RAM_SAVE_CONTROL_NOT_SUPP) {
             if (bytes_sent && *bytes_sent > 0) {
                 qemu_update_position(f, *bytes_sent);
             } else if (ret < 0) {
diff --git a/migration/rdma.c b/migration/rdma.c
index 8bd7159..76424a5 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2921,6 +2921,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return RAM_SAVE_CONTROL_NOT_SUPP;
+    }
+
     qemu_fflush(f);
 
     if (size > 0) {
@@ -3480,6 +3484,10 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     trace_qemu_rdma_registration_start(flags);
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
@@ -3502,6 +3510,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 
     CHECK_ERROR_STATE();
 
+    if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        return 0;
+    }
+
     qemu_fflush(f);
     ret = qemu_rdma_drain_cq(f, rdma);
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 02/11] migration: create a dedicated connection for rdma return path
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 01/11] migration: disable RDMA WRITE after postcopy started Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 03/11] migration: implement bi-directional RDMA QIOChannel Lidong Chen
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert
  Cc: qemu-devel, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

If an RDMA migration is started with postcopy enabled, the source qemu
establishes a dedicated connection for the return path.
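
The two contexts are cross-linked so either side can reach its peer; this is
the essence of what the hunks below set up on both source and destination:

    rdma->return_path = rdma_return_path;
    rdma_return_path->return_path = rdma;
    rdma_return_path->is_return_path = true;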

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 91 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 76424a5..57af5ed 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -387,6 +387,10 @@ typedef struct RDMAContext {
     uint64_t unregistrations[RDMA_SIGNALED_SEND_MAX];
 
     GHashTable *blockmap;
+
+    /* the RDMAContext for return path */
+    struct RDMAContext *return_path;
+    bool is_return_path;
 } RDMAContext;
 
 #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
@@ -2323,10 +2327,22 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
         rdma_destroy_id(rdma->cm_id);
         rdma->cm_id = NULL;
     }
+
+    /* on the destination side, listen_id and channel are shared */
     if (rdma->listen_id) {
-        rdma_destroy_id(rdma->listen_id);
+        if (!rdma->is_return_path) {
+            rdma_destroy_id(rdma->listen_id);
+        }
         rdma->listen_id = NULL;
+
+        if (rdma->channel) {
+            if (!rdma->is_return_path) {
+                rdma_destroy_event_channel(rdma->channel);
+            }
+            rdma->channel = NULL;
+        }
     }
+
     if (rdma->channel) {
         rdma_destroy_event_channel(rdma->channel);
         rdma->channel = NULL;
@@ -2555,6 +2571,25 @@ err_dest_init_create_listen_id:
 
 }
 
+static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
+                                            RDMAContext *rdma)
+{
+    int idx;
+
+    for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
+        rdma_return_path->wr_data[idx].control_len = 0;
+        rdma_return_path->wr_data[idx].control_curr = NULL;
+    }
+
+    /* the CM channel and CM id are shared */
+    rdma_return_path->channel = rdma->channel;
+    rdma_return_path->listen_id = rdma->listen_id;
+
+    rdma->return_path = rdma_return_path;
+    rdma_return_path->return_path = rdma;
+    rdma_return_path->is_return_path = true;
+}
+
 static void *qemu_rdma_data_init(const char *host_port, Error **errp)
 {
     RDMAContext *rdma = NULL;
@@ -3012,6 +3047,8 @@ err:
     return ret;
 }
 
+static void rdma_accept_incoming_migration(void *opaque);
+
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
     RDMACapabilities cap;
@@ -3106,7 +3143,14 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         }
     }
 
-    qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    /* Accept the second connection request for return path */
+    if (migrate_postcopy() && !rdma->is_return_path) {
+        qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
+                            NULL,
+                            (void *)(intptr_t)rdma->return_path);
+    } else {
+        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+    }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
     if (ret) {
@@ -3691,6 +3735,10 @@ static void rdma_accept_incoming_migration(void *opaque)
 
     trace_qemu_rdma_accept_incoming_migration_accepted();
 
+    if (rdma->is_return_path) {
+        return;
+    }
+
     f = qemu_fopen_rdma(rdma, "rb");
     if (f == NULL) {
         ERROR(errp, "could not qemu_fopen_rdma!");
@@ -3705,7 +3753,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
-    RDMAContext *rdma;
+    RDMAContext *rdma, *rdma_return_path;
     Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
@@ -3732,12 +3780,24 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 
     trace_rdma_start_incoming_migration_after_rdma_listen();
 
+    /* initialize the RDMAContext for return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, &local_err);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        qemu_rdma_return_path_dest_init(rdma_return_path, rdma);
+    }
+
     qemu_set_fd_handler(rdma->channel->fd, rdma_accept_incoming_migration,
                         NULL, (void *)(intptr_t)rdma);
     return;
 err:
     error_propagate(errp, local_err);
     g_free(rdma);
+    g_free(rdma_return_path);
 }
 
 void rdma_start_outgoing_migration(void *opaque,
@@ -3745,6 +3805,7 @@ void rdma_start_outgoing_migration(void *opaque,
 {
     MigrationState *s = opaque;
     RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
+    RDMAContext *rdma_return_path = NULL;
     int ret = 0;
 
     if (rdma == NULL) {
@@ -3765,6 +3826,32 @@ void rdma_start_outgoing_migration(void *opaque,
         goto err;
     }
 
+    /* RDMA postcopy needs a separate queue pair for the return path */
+    if (migrate_postcopy()) {
+        rdma_return_path = qemu_rdma_data_init(host_port, errp);
+
+        if (rdma_return_path == NULL) {
+            goto err;
+        }
+
+        ret = qemu_rdma_source_init(rdma_return_path,
+            s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL], errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        ret = qemu_rdma_connect(rdma_return_path, errp);
+
+        if (ret) {
+            goto err;
+        }
+
+        rdma->return_path = rdma_return_path;
+        rdma_return_path->return_path = rdma;
+        rdma_return_path->is_return_path = true;
+    }
+
     trace_rdma_start_outgoing_migration_after_rdma_connect();
 
     s->to_dst_file = qemu_fopen_rdma(rdma, "wb");
@@ -3772,4 +3859,5 @@ void rdma_start_outgoing_migration(void *opaque,
     return;
 err:
     g_free(rdma);
+    g_free(rdma_return_path);
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 03/11] migration: implement bi-directional RDMA QIOChannel
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 01/11] migration: disable RDMA WRITE after postcopy started Lidong Chen
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 02/11] migration: create a dedicated connection for rdma return path Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 04/11] migration: Stop rdma yielding during incoming postcopy Lidong Chen
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert
  Cc: qemu-devel, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

This patch implements a bi-directional RDMA QIOChannel. Because different
threads may access the RDMA QIOChannel concurrently, this patch uses RCU to
protect it.
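
Every I/O path below follows the same reader pattern; a condensed sketch
taken from the qio_channel_rdma_writev() hunk:

    rcu_read_lock();
    rdma = atomic_rcu_read(&rioc->rdmaout);
    if (!rdma) {
        rcu_read_unlock();
        return -EIO;            /* channel was already closed */
    }
    /* ... use rdma ... */
    rcu_read_unlock();

qio_channel_rdma_close() publishes NULL with atomic_rcu_set() and calls
synchronize_rcu() before freeing, so a reader can never see a freed
RDMAContext.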

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/colo.c         |   2 +
 migration/migration.c    |   2 +
 migration/postcopy-ram.c |   2 +
 migration/ram.c          |   4 +
 migration/rdma.c         | 196 ++++++++++++++++++++++++++++++++++++++++-------
 migration/savevm.c       |   3 +
 6 files changed, 183 insertions(+), 26 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index 4381067..88936f5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -534,6 +534,7 @@ void *colo_process_incoming_thread(void *opaque)
     uint64_t value;
     Error *local_err = NULL;
 
+    rcu_register_thread();
     qemu_sem_init(&mis->colo_incoming_sem, 0);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
@@ -666,5 +667,6 @@ out:
     }
     migration_incoming_exit_colo();
 
+    rcu_unregister_thread();
     return NULL;
 }
diff --git a/migration/migration.c b/migration/migration.c
index b7d9854..df0c2cf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2104,6 +2104,7 @@ static void *source_return_path_thread(void *opaque)
     int res;
 
     trace_source_return_path_thread_entry();
+    rcu_register_thread();
 
 retry:
     while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
@@ -2243,6 +2244,7 @@ out:
     trace_source_return_path_thread_end();
     ms->rp_state.from_dst_file = NULL;
     qemu_fclose(rp);
+    rcu_unregister_thread();
     return NULL;
 }
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 932f188..3952d78 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -853,6 +853,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
     RAMBlock *rb = NULL;
 
     trace_postcopy_ram_fault_thread_entry();
+    rcu_register_thread();
     mis->last_rb = NULL; /* last RAMBlock we sent part of */
     qemu_sem_post(&mis->fault_thread_sem);
 
@@ -1059,6 +1060,7 @@ retry:
             }
         }
     }
+    rcu_unregister_thread();
     trace_postcopy_ram_fault_thread_exit();
     g_free(pfd);
     return NULL;
diff --git a/migration/ram.c b/migration/ram.c
index 24dea27..4da0930 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -988,6 +988,7 @@ static void *multifd_send_thread(void *opaque)
     int ret;
 
     trace_multifd_send_thread_start(p->id);
+    rcu_register_thread();
 
     if (multifd_send_initial_packet(p, &local_err) < 0) {
         goto out;
@@ -1050,6 +1051,7 @@ out:
     p->running = false;
     qemu_mutex_unlock(&p->mutex);
 
+    rcu_unregister_thread();
     trace_multifd_send_thread_end(p->id, p->num_packets, p->num_pages);
 
     return NULL;
@@ -1219,6 +1221,7 @@ static void *multifd_recv_thread(void *opaque)
     int ret;
 
     trace_multifd_recv_thread_start(p->id);
+    rcu_register_thread();
 
     while (true) {
         uint32_t used;
@@ -1265,6 +1268,7 @@ static void *multifd_recv_thread(void *opaque)
     p->running = false;
     qemu_mutex_unlock(&p->mutex);
 
+    rcu_unregister_thread();
     trace_multifd_recv_thread_end(p->id, p->num_packets, p->num_pages);
 
     return NULL;
diff --git a/migration/rdma.c b/migration/rdma.c
index 57af5ed..a5535fb 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -86,6 +86,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
                                 " to abort!"); \
                 rdma->error_reported = 1; \
             } \
+            rcu_read_unlock(); \
             return rdma->error_state; \
         } \
     } while (0)
@@ -402,7 +403,8 @@ typedef struct QIOChannelRDMA QIOChannelRDMA;
 
 struct QIOChannelRDMA {
     QIOChannel parent;
-    RDMAContext *rdma;
+    RDMAContext *rdmain;
+    RDMAContext *rdmaout;
     QEMUFile *file;
     bool blocking; /* XXX we don't actually honour this yet */
 };
@@ -2630,12 +2632,20 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     QEMUFile *f = rioc->file;
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
     ssize_t done = 0;
     size_t i;
     size_t len = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     /*
@@ -2645,6 +2655,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(f, rdma);
     if (ret < 0) {
         rdma->error_state = ret;
+        rcu_read_unlock();
         return ret;
     }
 
@@ -2664,6 +2675,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
+                rcu_read_unlock();
                 return ret;
             }
 
@@ -2672,6 +2684,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         }
     }
 
+    rcu_read_unlock();
     return done;
 }
 
@@ -2705,12 +2718,20 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
                                       Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head;
     int ret = 0;
     ssize_t i;
     size_t done = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     for (i = 0; i < niov; i++) {
@@ -2722,7 +2743,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
          * were given and dish out the bytes until we run
          * out of bytes.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
         /* Got what we needed, so go to next iovec */
@@ -2744,25 +2765,28 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
+            rcu_read_unlock();
             return ret;
         }
 
         /*
          * SEND was received with new bytes, now try again.
          */
-        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
+        ret = qemu_rdma_fill(rdma, data, want, 0);
         done += ret;
         want -= ret;
 
         /* Still didn't get enough, so lets just return */
         if (want) {
             if (done == 0) {
+                rcu_read_unlock();
                 return QIO_CHANNEL_ERR_BLOCK;
             } else {
                 break;
             }
         }
     }
+    rcu_read_unlock();
     return done;
 }
 
@@ -2814,15 +2838,29 @@ qio_channel_rdma_source_prepare(GSource *source,
                                 gint *timeout)
 {
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
     *timeout = -1;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when prepare Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return cond & rsource->condition;
 }
 
@@ -2830,14 +2868,28 @@ static gboolean
 qio_channel_rdma_source_check(GSource *source)
 {
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when check Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return cond & rsource->condition;
 }
 
@@ -2848,14 +2900,28 @@ qio_channel_rdma_source_dispatch(GSource *source,
 {
     QIOChannelFunc func = (QIOChannelFunc)callback;
     QIOChannelRDMASource *rsource = (QIOChannelRDMASource *)source;
-    RDMAContext *rdma = rsource->rioc->rdma;
+    RDMAContext *rdma;
     GIOCondition cond = 0;
 
+    rcu_read_lock();
+    if (rsource->condition == G_IO_IN) {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmain);
+    } else {
+        rdma = atomic_rcu_read(&rsource->rioc->rdmaout);
+    }
+
+    if (!rdma) {
+        error_report("RDMAContext is NULL when dispatch Gsource");
+        rcu_read_unlock();
+        return FALSE;
+    }
+
     if (rdma->wr_data[0].control_len) {
         cond |= G_IO_IN;
     }
     cond |= G_IO_OUT;
 
+    rcu_read_unlock();
     return (*func)(QIO_CHANNEL(rsource->rioc),
                    (cond & rsource->condition),
                    user_data);
@@ -2900,15 +2966,32 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext *rdmain, *rdmaout;
     trace_qemu_rdma_close();
-    if (rioc->rdma) {
-        if (!rioc->rdma->error_state) {
-            rioc->rdma->error_state = qemu_file_get_error(rioc->file);
-        }
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+
+    rdmain = rioc->rdmain;
+    if (rdmain) {
+        atomic_rcu_set(&rioc->rdmain, NULL);
+    }
+
+    rdmaout = rioc->rdmaout;
+    if (rdmaout) {
+        atomic_rcu_set(&rioc->rdmaout, NULL);
     }
+
+    synchronize_rcu();
+
+    if (rdmain) {
+        qemu_rdma_cleanup(rdmain);
+    }
+
+    if (rdmaout) {
+        qemu_rdma_cleanup(rdmaout);
+    }
+
+    g_free(rdmain);
+    g_free(rdmaout);
+
     return 0;
 }
 
@@ -2951,12 +3034,21 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
                                   size_t size, uint64_t *bytes_sent)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int ret;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return RAM_SAVE_CONTROL_NOT_SUPP;
     }
 
@@ -3041,9 +3133,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
         }
     }
 
+    rcu_read_unlock();
     return RAM_SAVE_CONTROL_DELAYED;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3219,8 +3313,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
-    RDMALocalBlocks *local = &rdma->local_ram_blocks;
+    RDMAContext *rdma;
+    RDMALocalBlocks *local;
     RDMAControlHeader head;
     RDMARegister *reg, *registers;
     RDMACompress *comp;
@@ -3233,8 +3327,17 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
     int count = 0;
     int i = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
+    local = &rdma->local_ram_blocks;
     do {
         trace_qemu_rdma_registration_handle_wait();
 
@@ -3468,6 +3571,7 @@ out:
     if (ret < 0) {
         rdma->error_state = ret;
     }
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3481,10 +3585,18 @@ out:
 static int
 rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
 {
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     int curr;
     int found = -1;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmain);
+
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     /* Find the matching RAMBlock in our local list */
     for (curr = 0; curr < rdma->local_ram_blocks.nb_blocks; curr++) {
         if (!strcmp(rdma->local_ram_blocks.block[curr].block_name, name)) {
@@ -3495,6 +3607,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
 
     if (found == -1) {
         error_report("RAMBlock '%s' not found on destination", name);
+        rcu_read_unlock();
         return -ENOENT;
     }
 
@@ -3502,6 +3615,7 @@ rdma_block_notification_handle(QIOChannelRDMA *rioc, const char *name)
     trace_rdma_block_notification_handle(name, rdma->next_src_index);
     rdma->next_src_index++;
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3524,11 +3638,19 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
                                         uint64_t flags, void *data)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
+
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
 
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3536,6 +3658,7 @@ static int qemu_rdma_registration_start(QEMUFile *f, void *opaque,
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
     qemu_fflush(f);
 
+    rcu_read_unlock();
     return 0;
 }
 
@@ -3548,13 +3671,21 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
 {
     Error *local_err = NULL, **errp = &local_err;
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(opaque);
-    RDMAContext *rdma = rioc->rdma;
+    RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret = 0;
 
+    rcu_read_lock();
+    rdma = atomic_rcu_read(&rioc->rdmaout);
+    if (!rdma) {
+        rcu_read_unlock();
+        return -EIO;
+    }
+
     CHECK_ERROR_STATE();
 
     if (migrate_get_current()->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        rcu_read_unlock();
         return 0;
     }
 
@@ -3586,6 +3717,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                     qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             ERROR(errp, "receiving remote info!");
+            rcu_read_unlock();
             return ret;
         }
 
@@ -3609,6 +3741,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                         "not identical on both the source and destination.",
                         local->nb_blocks, nb_dest_blocks);
             rdma->error_state = -EINVAL;
+            rcu_read_unlock();
             return -EINVAL;
         }
 
@@ -3625,6 +3758,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
                             local->block[i].length,
                             rdma->dest_blocks[i].length);
                 rdma->error_state = -EINVAL;
+                rcu_read_unlock();
                 return -EINVAL;
             }
             local->block[i].remote_host_addr =
@@ -3642,9 +3776,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
         goto err;
     }
 
+    rcu_read_unlock();
     return 0;
 err:
     rdma->error_state = ret;
+    rcu_read_unlock();
     return ret;
 }
 
@@ -3662,10 +3798,15 @@ static const QEMUFileHooks rdma_write_hooks = {
 static void qio_channel_rdma_finalize(Object *obj)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(obj);
-    if (rioc->rdma) {
-        qemu_rdma_cleanup(rioc->rdma);
-        g_free(rioc->rdma);
-        rioc->rdma = NULL;
+    if (rioc->rdmain) {
+        qemu_rdma_cleanup(rioc->rdmain);
+        g_free(rioc->rdmain);
+        rioc->rdmain = NULL;
+    }
+    if (rioc->rdmaout) {
+        qemu_rdma_cleanup(rioc->rdmaout);
+        g_free(rioc->rdmaout);
+        rioc->rdmaout = NULL;
     }
 }
 
@@ -3705,13 +3846,16 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode)
     }
 
     rioc = QIO_CHANNEL_RDMA(object_new(TYPE_QIO_CHANNEL_RDMA));
-    rioc->rdma = rdma;
 
     if (mode[0] == 'w') {
         rioc->file = qemu_fopen_channel_output(QIO_CHANNEL(rioc));
+        rioc->rdmaout = rdma;
+        rioc->rdmain = rdma->return_path;
         qemu_file_set_hooks(rioc->file, &rdma_write_hooks);
     } else {
         rioc->file = qemu_fopen_channel_input(QIO_CHANNEL(rioc));
+        rioc->rdmain = rdma;
+        rioc->rdmaout = rdma->return_path;
         qemu_file_set_hooks(rioc->file, &rdma_read_hooks);
     }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 7f92567..13e51f0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1622,6 +1622,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
     qemu_sem_post(&mis->listen_thread_sem);
     trace_postcopy_ram_listen_thread_start();
 
+    rcu_register_thread();
     /*
      * Because we're a thread and not a coroutine we can't yield
      * in qemu_file, and thus we must be blocking now.
@@ -1662,6 +1663,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
          * to leave the guest running and fire MCEs for pages that never
          * arrived as a desperate recovery step.
          */
+        rcu_unregister_thread();
         exit(EXIT_FAILURE);
     }
 
@@ -1676,6 +1678,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
     migration_incoming_state_destroy();
     qemu_loadvm_state_cleanup();
 
+    rcu_unregister_thread();
     return NULL;
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 04/11] migration: Stop rdma yielding during incoming postcopy
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
                   ` (2 preceding siblings ...)
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 03/11] migration: implement bi-directional RDMA QIOChannel Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-22 10:06   ` Juan Quintela
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 05/11] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert
  Cc: qemu-devel, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

During incoming postcopy, the destination qemu invokes
qemu_rdma_wait_comp_channel in a separate thread, so it must not use rdma
yield; it polls the completion channel fd instead.
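
A condensed view of the resulting dispatch in qemu_rdma_wait_comp_channel()
(from the hunk below):

    if (rdma->migration_started_on_destination &&
        migration_incoming_get_current()->state == MIGRATION_STATUS_ACTIVE) {
        /* precopy destination: running in a coroutine, safe to yield */
        yield_until_fd_readable(rdma->comp_channel->fd);
    } else {
        /* source side, or destination after postcopy started: we run in
         * a separate thread, so poll the fd with a timeout instead */
    }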

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a5535fb..cfb0671 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1493,11 +1493,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
      * Coroutine doesn't start until migration_fd_process_incoming()
      * so don't yield unless we know we're running inside of a coroutine.
      */
-    if (rdma->migration_started_on_destination) {
+    if (rdma->migration_started_on_destination &&
+        migration_incoming_get_current()->state == MIGRATION_STATUS_ACTIVE) {
         yield_until_fd_readable(rdma->comp_channel->fd);
     } else {
         /* This is the source side, we're in a separate thread
          * or destination prior to migration_fd_process_incoming()
+         * after postcopy started, the destination is also in a separate thread.
          * we can't yield; so we have to poll the fd.
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 05/11] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
                   ` (3 preceding siblings ...)
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 04/11] migration: Stop rdma yielding during incoming postcopy Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-22 10:06   ` Juan Quintela
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 06/11] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert
  Cc: qemu-devel, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

If qio_channel_rdma_readv returns QIO_CHANNEL_ERR_BLOCK, the destination qemu
crashes.

The backtrace is:
(gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080,
        io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c:
    #2  0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438
    #3  0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47
    #4  0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327
        at migration/qemu-file-channel.c:83
    #5  0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299
    #6  0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562
    #7  0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575
    #8  0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655
    #9  0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126
    #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366
    #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1
    #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6
    #13 0x00007f96fe858760 in ?? ()
    #14 0x0000000000000000 in ?? ()

The RDMA QIOChannel does not implement io_set_aio_fd_handler, so
qio_channel_set_aio_fd_handler will call through a NULL pointer.
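
The NULL call happens in the generic dispatcher in io/channel.c; roughly
(a sketch, assuming it matches the tree this series is based on):

    void qio_channel_set_aio_fd_handler(QIOChannel *ioc, AioContext *ctx,
                                        IOHandler *io_read,
                                        IOHandler *io_write, void *opaque)
    {
        QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);

        /* io_set_aio_fd_handler is NULL for the RDMA channel, hence
         * frame #0 at address 0x0 in the backtrace above */
        klass->io_set_aio_fd_handler(ioc, ctx, io_read, io_write, opaque);
    }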

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 migration/rdma.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index cfb0671..d6bbf28 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2963,6 +2963,21 @@ static GSource *qio_channel_rdma_create_watch(QIOChannel *ioc,
     return source;
 }
 
+static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
+                                                  AioContext *ctx,
+                                                  IOHandler *io_read,
+                                                  IOHandler *io_write,
+                                                  void *opaque)
+{
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    if (io_read) {
+        aio_set_fd_handler(ctx, rioc->rdmain->comp_channel->fd,
+                           false, io_read, io_write, NULL, opaque);
+    } else {
+        aio_set_fd_handler(ctx, rioc->rdmaout->comp_channel->fd,
+                           false, io_read, io_write, NULL, opaque);
+    }
+}
 
 static int qio_channel_rdma_close(QIOChannel *ioc,
                                   Error **errp)
@@ -3822,6 +3837,7 @@ static void qio_channel_rdma_class_init(ObjectClass *klass,
     ioc_klass->io_set_blocking = qio_channel_rdma_set_blocking;
     ioc_klass->io_close = qio_channel_rdma_close;
     ioc_klass->io_create_watch = qio_channel_rdma_create_watch;
+    ioc_klass->io_set_aio_fd_handler = qio_channel_rdma_set_aio_fd_handler;
 }
 
 static const TypeInfo qio_channel_rdma_info = {
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 06/11] migration: invoke qio_channel_yield only when qemu_in_coroutine()
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
                   ` (4 preceding siblings ...)
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 05/11] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-22 10:07   ` Juan Quintela
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion Lidong Chen
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert
  Cc: qemu-devel, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

When qio_channel_read returns QIO_CHANNEL_ERR_BLOCK, the source qemu crashes.

The backtrace is:
    (gdb) bt
    #0  0x00007fb20aba91d7 in raise () from /lib64/libc.so.6
    #1  0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6
    #2  0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6
    #4  0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460
    #5  0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768)
        at migration/qemu-file-channel.c:83
    #6  0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299
    #7  0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562
    #8  0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575
    #9  0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647
    #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794
    #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504
    #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6

This patch fixes it by invoking qio_channel_yield only when
qemu_in_coroutine(); otherwise it falls back to qio_channel_wait.
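
The assertion that fires lives in qio_channel_yield(), which may only be
called from a coroutine (a sketch, assuming io/channel.c matches this tree):

    void coroutine_fn qio_channel_yield(QIOChannel *ioc,
                                        GIOCondition condition)
    {
        assert(qemu_in_coroutine());
        /* ... */
    }

source_return_path_thread() is a plain thread, so channel_get_buffer() and
channel_writev_buffer() must fall back to the blocking qio_channel_wait().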

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 migration/qemu-file-channel.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
index e202d73..8e639eb 100644
--- a/migration/qemu-file-channel.c
+++ b/migration/qemu-file-channel.c
@@ -49,7 +49,11 @@ static ssize_t channel_writev_buffer(void *opaque,
         ssize_t len;
         len = qio_channel_writev(ioc, local_iov, nlocal_iov, NULL);
         if (len == QIO_CHANNEL_ERR_BLOCK) {
-            qio_channel_wait(ioc, G_IO_OUT);
+            if (qemu_in_coroutine()) {
+                qio_channel_yield(ioc, G_IO_OUT);
+            } else {
+                qio_channel_wait(ioc, G_IO_OUT);
+            }
             continue;
         }
         if (len < 0) {
@@ -80,7 +84,11 @@ static ssize_t channel_get_buffer(void *opaque,
         ret = qio_channel_read(ioc, (char *)buf, size, NULL);
         if (ret < 0) {
             if (ret == QIO_CHANNEL_ERR_BLOCK) {
-                qio_channel_yield(ioc, G_IO_IN);
+                if (qemu_in_coroutine()) {
+                    qio_channel_yield(ioc, G_IO_IN);
+                } else {
+                    qio_channel_wait(ioc, G_IO_IN);
+                }
             } else {
                 /* XXX handle Error * object */
                 return -EIO;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
                   ` (5 preceding siblings ...)
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 06/11] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-17 12:24   ` Dr. David Alan Gilbert
  2018-08-22 10:07   ` Juan Quintela
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 08/11] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert
  Cc: qemu-devel, Lidong Chen, Lidong Chen, Gal Shachaf, Aviad Yehezkel

From: Lidong Chen <jemmy858585@gmail.com>

If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function may
loop forever, so we should also poll the cm event fd. When we receive
RDMA_CM_EVENT_DISCONNECTED or RDMA_CM_EVENT_DEVICE_REMOVAL, we consider
that an error has happened.
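
A note on the librdmacm calls used below: rdma_get_cm_event() hands out an
event that must be released with rdma_ack_cm_event(), and the event must not
be touched after the ack. A hedged usage sketch:

    struct rdma_cm_event *cm_event;
    int event_type;

    if (rdma_get_cm_event(rdma->channel, &cm_event)) {
        return -EPIPE;              /* event channel is broken */
    }
    event_type = cm_event->event;   /* copy out before the ack */
    rdma_ack_cm_event(cm_event);    /* releases the event */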

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: Gal Shachaf <galsha@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
---
 migration/rdma.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d6bbf28..673f126 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
  */
 static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
 {
+    struct rdma_cm_event *cm_event;
+    int ret = -1;
+
     /*
      * Coroutine doesn't start until migration_fd_process_incoming()
      * so don't yield unless we know we're running inside of a coroutine.
@@ -1505,13 +1508,37 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
          * without hanging forever.
          */
         while (!rdma->error_state  && !rdma->received_error) {
-            GPollFD pfds[1];
+            GPollFD pfds[2];
             pfds[0].fd = rdma->comp_channel->fd;
             pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+            pfds[0].revents = 0;
+
+            pfds[1].fd = rdma->channel->fd;
+            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+            pfds[1].revents = 0;
+
             /* 0.1s timeout, should be fine for a 'cancel' */
-            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
+            switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
+            case 2:
             case 1: /* fd active */
-                return 0;
+                if (pfds[0].revents) {
+                    return 0;
+                }
+
+                if (pfds[1].revents) {
+                    ret = rdma_get_cm_event(rdma->channel, &cm_event);
+                    if (ret) {
+                        error_report("failed to get cm event while wait "
+                                     "comp channel");
+                        return -EPIPE;
+                    }
+
+                    error_report("receive cm event while wait comp channel,"
+                                 "cm event is %d", cm_event->event);
+                    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
+                        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
+                        rdma_ack_cm_event(cm_event);
+                        return -EPIPE;
+                    }
+                    rdma_ack_cm_event(cm_event);
+                }
+                break;
 
             case 0: /* Timeout, go around again */
                 break;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 08/11] migration: implement the shutdown for RDMA QIOChannel
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
                   ` (6 preceding siblings ...)
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-22 10:08   ` Juan Quintela
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 09/11] migration: poll the cm event for destination qemu Lidong Chen
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert
  Cc: qemu-devel, Lidong Chen, Lidong Chen

From: Lidong Chen <jemmy858585@gmail.com>

Because the RDMA QIOChannel does not implement a shutdown function,
if to_dst_file gets an error set, the return path thread will wait
forever, and the migration thread will wait for the return path
thread to exit.

the backtrace of return path thread is:

(gdb) bt
    #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
    #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
        at qemu-timer.c:325
    #2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
        at migration/rdma.c:1501
    #3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
        byte_len=0x7ef7091d0640) at migration/rdma.c:1580
    #4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
        head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
    #5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
        expecting=3) at migration/rdma.c:1903
    #6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
        size=32768) at migration/rdma.c:2714
    #7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
    #8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
        at migration/qemu-file.c:502
    #9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
    #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
    #11 0x00000000006a46d3 in source_return_path_thread (
        opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
    #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007f372a77635d in clone () from /lib64/libc.so.6

the backtrace of migration thread is:

(gdb) bt
    #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
    #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
        at util/qemu-thread-posix.c:504
    #2  0x00000000006a4bc5 in await_return_path_close_on_source (
        ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
    #3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
        current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
        at migration/migration.c:1695
    #4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
        at migration/migration.c:1837
    #5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #6  0x00007f372a77635d in clone () from /lib64/libc.so.6
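
With io_shutdown wired up, the source can unblock the return path thread
through the generic channel API; a hedged sketch of a caller
(qio_channel_shutdown() is existing API, the call site is illustrative):

    /* mark both directions as failed; blocked readers then see
     * error_state and return instead of waiting forever in
     * qemu_rdma_wait_comp_channel() */
    qio_channel_shutdown(QIO_CHANNEL(rioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);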

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 673f126..1affc46 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3039,6 +3039,45 @@ static int qio_channel_rdma_close(QIOChannel *ioc,
     return 0;
 }
 
+static int
+qio_channel_rdma_shutdown(QIOChannel *ioc,
+                            QIOChannelShutdown how,
+                            Error **errp)
+{
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext *rdmain, *rdmaout;
+
+    rcu_read_lock();
+
+    rdmain = atomic_rcu_read(&rioc->rdmain);
+    rdmaout = atomic_rcu_read(&rioc->rdmaout);
+
+    switch (how) {
+    case QIO_CHANNEL_SHUTDOWN_READ:
+        if (rdmain) {
+            rdmain->error_state = -1;
+        }
+        break;
+    case QIO_CHANNEL_SHUTDOWN_WRITE:
+        if (rdmaout) {
+            rdmaout->error_state = -1;
+        }
+        break;
+    case QIO_CHANNEL_SHUTDOWN_BOTH:
+    default:
+        if (rdmain) {
+            rdmain->error_state = -1;
+        }
+        if (rdmaout) {
+            rdmaout->error_state = -1;
+        }
+        break;
+    }
+
+    rcu_read_unlock();
+    return 0;
+}
+
 /*
  * Parameters:
  *    @offset == 0 :
@@ -3865,6 +3904,7 @@ static void qio_channel_rdma_class_init(ObjectClass *klass,
     ioc_klass->io_close = qio_channel_rdma_close;
     ioc_klass->io_create_watch = qio_channel_rdma_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_rdma_set_aio_fd_handler;
+    ioc_klass->io_shutdown = qio_channel_rdma_shutdown;
 }
 
 static const TypeInfo qio_channel_rdma_info = {
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 09/11] migration: poll the cm event for destination qemu
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
                   ` (7 preceding siblings ...)
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 08/11] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-17 14:01   ` Dr. David Alan Gilbert
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 10/11] migration: remove the unnecessary RDMA_CONTROL_ERROR message Lidong Chen
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 11/11] migration: create a dedicated thread to release rdma resource Lidong Chen
  10 siblings, 1 reply; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert; +Cc: qemu-devel, Lidong Chen

The destination qemu only polls comp_channel->fd in
qemu_rdma_wait_comp_channel, but when the source qemu disconnects
the rdma connection, the destination qemu should be notified.
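
The mechanism is to keep watching the cm event channel fd in the main loop
after accept; a one-line summary of the registration done in
qemu_rdma_accept() below:

    qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler, NULL, rdma);

rdma_cm_poll_handler() then marks both directions with -EPIPE and re-enters
the incoming migration coroutine so it notices the error promptly.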

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/migration.c |  3 ++-
 migration/rdma.c      | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index df0c2cf..f7d6e26 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -389,6 +389,7 @@ static void process_incoming_migration_co(void *opaque)
     int ret;
 
     assert(mis->from_src_file);
+    mis->migration_incoming_co = qemu_coroutine_self();
     mis->largest_page_size = qemu_ram_pagesize_largest();
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
     migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
@@ -418,7 +419,6 @@ static void process_incoming_migration_co(void *opaque)
 
     /* we get COLO info, and know if we are in COLO mode */
     if (!ret && migration_incoming_enable_colo()) {
-        mis->migration_incoming_co = qemu_coroutine_self();
         qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
              colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
         mis->have_colo_incoming_thread = true;
@@ -442,6 +442,7 @@ static void process_incoming_migration_co(void *opaque)
     }
     mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
     qemu_bh_schedule(mis->bh);
+    mis->migration_incoming_co = NULL;
 }
 
 static void migration_incoming_setup(QEMUFile *f)
diff --git a/migration/rdma.c b/migration/rdma.c
index 1affc46..ae07515 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3226,6 +3226,35 @@ err:
 
 static void rdma_accept_incoming_migration(void *opaque);
 
+static void rdma_cm_poll_handler(void *opaque)
+{
+    RDMAContext *rdma = opaque;
+    int ret;
+    struct rdma_cm_event *cm_event;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    ret = rdma_get_cm_event(rdma->channel, &cm_event);
+    if (ret) {
+        error_report("get_cm_event failed %d", errno);
+        return;
+    }
+
+    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
+        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
+        error_report("receive cm event, cm event is %d", cm_event->event);
+        rdma->error_state = -EPIPE;
+        if (rdma->return_path) {
+            rdma->return_path->error_state = -EPIPE;
+        }
+        rdma_ack_cm_event(cm_event);
+
+        if (mis->migration_incoming_co) {
+            qemu_coroutine_enter(mis->migration_incoming_co);
+        }
+        return;
+    }
+    rdma_ack_cm_event(cm_event);
+}
+
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
     RDMACapabilities cap;
@@ -3326,7 +3355,8 @@ static int qemu_rdma_accept(RDMAContext *rdma)
                             NULL,
                             (void *)(intptr_t)rdma->return_path);
     } else {
-        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
+        qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler,
+                            NULL, rdma);
     }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
-- 
1.8.3.1


* [Qemu-devel] [PATCH v6 10/11] migration: remove the unnecessary RDMA_CONTROL_ERROR message
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
                   ` (8 preceding siblings ...)
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 09/11] migration: poll the cm event for destination qemu Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-17 14:04   ` Dr. David Alan Gilbert
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 11/11] migration: create a dedicated thread to release rdma resource Lidong Chen
  10 siblings, 1 reply; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert; +Cc: qemu-devel, Lidong Chen

It's not necessary to send RDMA_CONTROL_ERROR when cleaning up rdma
resources. If rdma->error_state is set, the message may not be sent
successfully, and the cm event can also notify the peer qemu.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index ae07515..e1498f2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
-        if ((rdma->error_state ||
-             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
-            !rdma->received_error) {
-            RDMAControlHeader head = { .len = 0,
-                                       .type = RDMA_CONTROL_ERROR,
-                                       .repeat = 1,
-                                     };
-            error_report("Early error. Sending error.");
-            qemu_rdma_post_send_control(rdma, NULL, &head);
-        }
-
         rdma_disconnect(rdma->cm_id);
         trace_qemu_rdma_cleanup_disconnect();
         rdma->connected = false;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v6 11/11] migration: create a dedicated thread to release rdma resource
  2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
                   ` (9 preceding siblings ...)
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 10/11] migration: remove the unnecessary RDMA_CONTROL_ERROR message Lidong Chen
@ 2018-08-06 13:29 ` Lidong Chen
  2018-08-17 14:59   ` Dr. David Alan Gilbert
  10 siblings, 1 reply; 25+ messages in thread
From: Lidong Chen @ 2018-08-06 13:29 UTC (permalink / raw)
  To: zhang.zhanghailiang, quintela, dgilbert; +Cc: qemu-devel, Lidong Chen

ibv_dereg_mr takes a long time for virtual servers with large memory sizes.

The test results are:
  10GB  326ms
  20GB  699ms
  30GB  1021ms
  40GB  1387ms
  50GB  1712ms
  60GB  2034ms
  70GB  2457ms
  80GB  2807ms
  90GB  3107ms
  100GB 3474ms
  110GB 3735ms
  120GB 4064ms
  130GB 4567ms
  140GB 4886ms

This causes the guest OS to hang for a while when migration finishes,
so create a dedicated thread to release the rdma resources.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/migration.c |  6 ++++++
 migration/migration.h |  3 +++
 migration/rdma.c      | 47 +++++++++++++++++++++++++++++++----------------
 3 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f7d6e26..25d9009 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1499,6 +1499,7 @@ void migrate_init(MigrationState *s)
     s->vm_was_running = false;
     s->iteration_initial_bytes = 0;
     s->threshold_size = 0;
+    s->rdma_cleanup_thread_quit = true;
 }
 
 static GSList *migration_blockers;
@@ -1660,6 +1661,10 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
         return false;
     }
 
+    if (s->rdma_cleanup_thread_quit != true) {
+        return false;
+    }
+
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         error_setg(errp, "Guest is waiting for an incoming migration");
         return false;
@@ -3213,6 +3218,7 @@ static void migration_instance_init(Object *obj)
 
     ms->state = MIGRATION_STATUS_NONE;
     ms->mbps = -1;
+    ms->rdma_cleanup_thread_quit = true;
     qemu_sem_init(&ms->pause_sem, 0);
     qemu_mutex_init(&ms->error_mutex);
 
diff --git a/migration/migration.h b/migration/migration.h
index 64a7b33..60138dd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -224,6 +224,9 @@ struct MigrationState
      * do not trigger spurious decompression errors.
      */
     bool decompress_error_check;
+
+    /* Set this when rdma resource have released */
+    bool rdma_cleanup_thread_quit;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/rdma.c b/migration/rdma.c
index e1498f2..3282f35 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2995,35 +2995,50 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
     }
 }
 
-static int qio_channel_rdma_close(QIOChannel *ioc,
-                                  Error **errp)
+static void *qio_channel_rdma_close_thread(void *arg)
 {
-    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
-    RDMAContext *rdmain, *rdmaout;
-    trace_qemu_rdma_close();
-
-    rdmain = rioc->rdmain;
-    if (rdmain) {
-        atomic_rcu_set(&rioc->rdmain, NULL);
-    }
+    RDMAContext **rdma = arg;
+    RDMAContext *rdmain = rdma[0];
+    RDMAContext *rdmaout = rdma[1];
+    MigrationState *s = migrate_get_current();
 
-    rdmaout = rioc->rdmaout;
-    if (rdmaout) {
-        atomic_rcu_set(&rioc->rdmaout, NULL);
-    }
+    rcu_register_thread();
 
     synchronize_rcu();
-
     if (rdmain) {
         qemu_rdma_cleanup(rdmain);
     }
-
     if (rdmaout) {
         qemu_rdma_cleanup(rdmaout);
     }
 
     g_free(rdmain);
     g_free(rdmaout);
+    g_free(rdma);
+
+    rcu_unregister_thread();
+    s->rdma_cleanup_thread_quit = true;
+    return NULL;
+}
+
+static int qio_channel_rdma_close(QIOChannel *ioc,
+                                  Error **errp)
+{
+    QemuThread t;
+    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+    RDMAContext **rdma = g_new0(RDMAContext*, 2);
+    MigrationState *s = migrate_get_current();
+
+    trace_qemu_rdma_close();
+    if (rioc->rdmain || rioc->rdmaout) {
+        rdma[0] =  rioc->rdmain;
+        rdma[1] =  rioc->rdmaout;
+        atomic_rcu_set(&rioc->rdmain, NULL);
+        atomic_rcu_set(&rioc->rdmaout, NULL);
+        s->rdma_cleanup_thread_quit = false;
+        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
+                           rdma, QEMU_THREAD_DETACHED);
+    }
 
     return 0;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion Lidong Chen
@ 2018-08-17 12:24   ` Dr. David Alan Gilbert
  2018-08-22 10:07   ` Juan Quintela
  1 sibling, 0 replies; 25+ messages in thread
From: Dr. David Alan Gilbert @ 2018-08-17 12:24 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, quintela, qemu-devel, Lidong Chen,
	Gal Shachaf, Aviad Yehezkel

* Lidong Chen (jemmy858585@gmail.com) wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
> 
> If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function
> may loop forever. So we should also poll the cm event fd, and when we
> receive RDMA_CM_EVENT_DISCONNECTED or RDMA_CM_EVENT_DEVICE_REMOVAL,
> we consider that an error has happened.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Signed-off-by: Gal Shachaf <galsha@mellanox.com>
> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>

I found the doc in the man page for rdma_create_event_channel
that said 'Users may make the fd non-blocking, poll or select the fd,
etc', so:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
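
(As an aside, a minimal sketch of what the man page permits - not part of
this patch, which instead polls the fd with a timeout via qemu_poll_ns:)

    /* assumes <fcntl.h>; rdma->channel->fd is the cm event channel fd */
    int flags = fcntl(rdma->channel->fd, F_GETFL);
    if (flags != -1) {
        fcntl(rdma->channel->fd, F_SETFL, flags | O_NONBLOCK);
    }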

> ---
>  migration/rdma.c | 33 ++++++++++++++++++++++++++++++---
>  1 file changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d6bbf28..673f126 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>   */
>  static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>  {
> +    struct rdma_cm_event *cm_event;
> +    int ret = -1;
> +
>      /*
>       * Coroutine doesn't start until migration_fd_process_incoming()
>       * so don't yield unless we know we're running inside of a coroutine.
> @@ -1505,13 +1508,37 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>           * without hanging forever.
>           */
>          while (!rdma->error_state  && !rdma->received_error) {
> -            GPollFD pfds[1];
> +            GPollFD pfds[2];
>              pfds[0].fd = rdma->comp_channel->fd;
>              pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            pfds[0].revents = 0;
> +
> +            pfds[1].fd = rdma->channel->fd;
> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            pfds[1].revents = 0;
> +
>              /* 0.1s timeout, should be fine for a 'cancel' */
> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
> +            switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
> +            case 2:
>              case 1: /* fd active */
> -                return 0;
> +                if (pfds[0].revents) {
> +                    return 0;
> +                }
> +
> +                if (pfds[1].revents) {
> +                    ret = rdma_get_cm_event(rdma->channel, &cm_event);
> +                    if (!ret) {
> +                        rdma_ack_cm_event(cm_event);
> +                    }
> +
> +                    error_report("receive cm event while wait comp channel,"
> +                                 "cm event is %d", cm_event->event);
> +                    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
> +                        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
> +                        return -EPIPE;
> +                    }
> +                }
> +                break;
>  
>              case 0: /* Timeout, go around again */
>                  break;
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/11] migration: poll the cm event for destination qemu
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 09/11] migration: poll the cm event for destination qemu Lidong Chen
@ 2018-08-17 14:01   ` Dr. David Alan Gilbert
  2018-08-20  8:35     ` 858585 jemmy
  0 siblings, 1 reply; 25+ messages in thread
From: Dr. David Alan Gilbert @ 2018-08-17 14:01 UTC (permalink / raw)
  To: Lidong Chen; +Cc: zhang.zhanghailiang, quintela, qemu-devel, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> The destination qemu only polls the comp_channel->fd in
> qemu_rdma_wait_comp_channel. But when the source qemu disconnects
> the rdma connection, the destination qemu should be notified.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

OK, this could do with an update to the migration_incoming_co comment in
migration.h, since previously it was only used by COLO; if we merge this
first, please post a patch to update the comment.

Other than that, I think I'm OK:

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c |  3 ++-
>  migration/rdma.c      | 32 +++++++++++++++++++++++++++++++-
>  2 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index df0c2cf..f7d6e26 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -389,6 +389,7 @@ static void process_incoming_migration_co(void *opaque)
>      int ret;
>  
>      assert(mis->from_src_file);
> +    mis->migration_incoming_co = qemu_coroutine_self();
>      mis->largest_page_size = qemu_ram_pagesize_largest();
>      postcopy_state_set(POSTCOPY_INCOMING_NONE);
>      migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
> @@ -418,7 +419,6 @@ static void process_incoming_migration_co(void *opaque)
>  
>      /* we get COLO info, and know if we are in COLO mode */
>      if (!ret && migration_incoming_enable_colo()) {
> -        mis->migration_incoming_co = qemu_coroutine_self();
>          qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
>               colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
>          mis->have_colo_incoming_thread = true;
> @@ -442,6 +442,7 @@ static void process_incoming_migration_co(void *opaque)
>      }
>      mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
>      qemu_bh_schedule(mis->bh);
> +    mis->migration_incoming_co = NULL;
>  }
>  
>  static void migration_incoming_setup(QEMUFile *f)
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 1affc46..ae07515 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3226,6 +3226,35 @@ err:
>  
>  static void rdma_accept_incoming_migration(void *opaque);
>  
> +static void rdma_cm_poll_handler(void *opaque)
> +{
> +    RDMAContext *rdma = opaque;
> +    int ret;
> +    struct rdma_cm_event *cm_event;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +
> +    ret = rdma_get_cm_event(rdma->channel, &cm_event);
> +    if (ret) {
> +        error_report("get_cm_event failed %d", errno);
> +        return;
> +    }
> +    rdma_ack_cm_event(cm_event);
> +
> +    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
> +        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
> +        error_report("receive cm event, cm event is %d", cm_event->event);
> +        rdma->error_state = -EPIPE;
> +        if (rdma->return_path) {
> +            rdma->return_path->error_state = -EPIPE;
> +        }
> +
> +        if (mis->migration_incoming_co) {
> +            qemu_coroutine_enter(mis->migration_incoming_co);
> +        }
> +        return;
> +    }
> +}
> +
>  static int qemu_rdma_accept(RDMAContext *rdma)
>  {
>      RDMACapabilities cap;
> @@ -3326,7 +3355,8 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>                              NULL,
>                              (void *)(intptr_t)rdma->return_path);
>      } else {
> -        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
> +        qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler,
> +                            NULL, rdma);
>      }
>  
>      ret = rdma_accept(rdma->cm_id, &conn_param);
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 10/11] migration: remove the unnecessary RDMA_CONTROL_ERROR message
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 10/11] migration: remove the unnecessary RDMA_CONTROL_ERROR message Lidong Chen
@ 2018-08-17 14:04   ` Dr. David Alan Gilbert
  2018-08-20  9:04     ` 858585 jemmy
  0 siblings, 1 reply; 25+ messages in thread
From: Dr. David Alan Gilbert @ 2018-08-17 14:04 UTC (permalink / raw)
  To: Lidong Chen; +Cc: zhang.zhanghailiang, quintela, qemu-devel, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
> It's not necessary to send RDMA_CONTROL_ERROR when cleaning up rdma resources.
> If rdma->error_state is true, the message may not be sent successfully,
> and the cm event can also notify the peer qemu.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

How does this keep 'cancel' working? I added 32bce196344 last year to
make that code also send the RDMA_CONTROL_ERROR in 'cancelling'.

Dave

> ---
>  migration/rdma.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index ae07515..e1498f2 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>      int idx;
>  
>      if (rdma->cm_id && rdma->connected) {
> -        if ((rdma->error_state ||
> -             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
> -            !rdma->received_error) {
> -            RDMAControlHeader head = { .len = 0,
> -                                       .type = RDMA_CONTROL_ERROR,
> -                                       .repeat = 1,
> -                                     };
> -            error_report("Early error. Sending error.");
> -            qemu_rdma_post_send_control(rdma, NULL, &head);
> -        }
> -
>          rdma_disconnect(rdma->cm_id);
>          trace_qemu_rdma_cleanup_disconnect();
>          rdma->connected = false;
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 11/11] migration: create a dedicated thread to release rdma resource
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 11/11] migration: create a dedicated thread to release rdma resource Lidong Chen
@ 2018-08-17 14:59   ` Dr. David Alan Gilbert
  2018-08-20  9:46     ` 858585 jemmy
  0 siblings, 1 reply; 25+ messages in thread
From: Dr. David Alan Gilbert @ 2018-08-17 14:59 UTC (permalink / raw)
  To: Lidong Chen; +Cc: zhang.zhanghailiang, quintela, qemu-devel, Lidong Chen

* Lidong Chen (jemmy858585@gmail.com) wrote:
>> ibv_dereg_mr takes a long time for virtual servers with large memory sizes.
>>
>> The test results are:
>   10GB  326ms
>   20GB  699ms
>   30GB  1021ms
>   40GB  1387ms
>   50GB  1712ms
>   60GB  2034ms
>   70GB  2457ms
>   80GB  2807ms
>   90GB  3107ms
>   100GB 3474ms
>   110GB 3735ms
>   120GB 4064ms
>   130GB 4567ms
>   140GB 4886ms
> 
>> This causes the guest OS to hang for a while when migration finishes,
>> so create a dedicated thread to release the rdma resources.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  migration/migration.c |  6 ++++++
>  migration/migration.h |  3 +++
>  migration/rdma.c      | 47 +++++++++++++++++++++++++++++++----------------
>  3 files changed, 40 insertions(+), 16 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index f7d6e26..25d9009 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1499,6 +1499,7 @@ void migrate_init(MigrationState *s)
>      s->vm_was_running = false;
>      s->iteration_initial_bytes = 0;
>      s->threshold_size = 0;
> +    s->rdma_cleanup_thread_quit = true;
>  }
>  
>  static GSList *migration_blockers;
> @@ -1660,6 +1661,10 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>          return false;
>      }
>  
> +    if (s->rdma_cleanup_thread_quit != true) {
> +        return false;
> +    }

That's not good! We error out without saying anything to the user
about why; if we error we must always say why, otherwise we'll get
bug reports from people without any way to debug them.

However, we also don't need to; all we need to do is turn this into
a semaphore or similar, and then wait for it at the start of 
'rdma_start_outgoing_migration' - that way there's no error exit, it
just waits a few seconds and then carries on correctly.
(Maybe we need to wait in incoming as well to cope with
postcopy-recovery).
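
A rough sketch of that idea (illustrative only; the rdma_cleanup_sem field
name is made up, and this is not a tested implementation):

    /* new MigrationState field, initialised with one "cleanup finished"
     * token in migration_instance_init():
     *     QemuSemaphore rdma_cleanup_sem;
     */
    qemu_sem_init(&ms->rdma_cleanup_sem, 1);

    /* in qio_channel_rdma_close(), before spawning the cleanup thread: */
    qemu_sem_wait(&s->rdma_cleanup_sem);    /* take the token */

    /* at the end of qio_channel_rdma_close_thread(): */
    qemu_sem_post(&s->rdma_cleanup_sem);    /* cleanup has finished */

    /* at the start of rdma_start_outgoing_migration(): */
    qemu_sem_wait(&s->rdma_cleanup_sem);    /* block until pending cleanup is done */
    qemu_sem_post(&s->rdma_cleanup_sem);    /* put the token back */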

Dave

>      if (runstate_check(RUN_STATE_INMIGRATE)) {
>          error_setg(errp, "Guest is waiting for an incoming migration");
>          return false;
> @@ -3213,6 +3218,7 @@ static void migration_instance_init(Object *obj)
>  
>      ms->state = MIGRATION_STATUS_NONE;
>      ms->mbps = -1;
> +    ms->rdma_cleanup_thread_quit = true;
>      qemu_sem_init(&ms->pause_sem, 0);
>      qemu_mutex_init(&ms->error_mutex);
>  
> diff --git a/migration/migration.h b/migration/migration.h
> index 64a7b33..60138dd 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -224,6 +224,9 @@ struct MigrationState
>       * do not trigger spurious decompression errors.
>       */
>      bool decompress_error_check;
> +
> +    /* Set this when rdma resource have released */
> +    bool rdma_cleanup_thread_quit;
>  };
>  
>  void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/migration/rdma.c b/migration/rdma.c
> index e1498f2..3282f35 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2995,35 +2995,50 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>      }
>  }
>  
> -static int qio_channel_rdma_close(QIOChannel *ioc,
> -                                  Error **errp)
> +static void *qio_channel_rdma_close_thread(void *arg)
>  {
> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> -    RDMAContext *rdmain, *rdmaout;
> -    trace_qemu_rdma_close();
> -
> -    rdmain = rioc->rdmain;
> -    if (rdmain) {
> -        atomic_rcu_set(&rioc->rdmain, NULL);
> -    }
> +    RDMAContext **rdma = arg;
> +    RDMAContext *rdmain = rdma[0];
> +    RDMAContext *rdmaout = rdma[1];
> +    MigrationState *s = migrate_get_current();
>  
> -    rdmaout = rioc->rdmaout;
> -    if (rdmaout) {
> -        atomic_rcu_set(&rioc->rdmaout, NULL);
> -    }
> +    rcu_register_thread();
>  
>      synchronize_rcu();
> -
>      if (rdmain) {
>          qemu_rdma_cleanup(rdmain);
>      }
> -
>      if (rdmaout) {
>          qemu_rdma_cleanup(rdmaout);
>      }
>  
>      g_free(rdmain);
>      g_free(rdmaout);
> +    g_free(rdma);
> +
> +    rcu_unregister_thread();
> +    s->rdma_cleanup_thread_quit = true;
> +    return NULL;
> +}
> +
> +static int qio_channel_rdma_close(QIOChannel *ioc,
> +                                  Error **errp)
> +{
> +    QemuThread t;
> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> +    RDMAContext **rdma = g_new0(RDMAContext*, 2);
> +    MigrationState *s = migrate_get_current();
> +
> +    trace_qemu_rdma_close();
> +    if (rioc->rdmain || rioc->rdmaout) {
> +        rdma[0] =  rioc->rdmain;
> +        rdma[1] =  rioc->rdmaout;
> +        atomic_rcu_set(&rioc->rdmain, NULL);
> +        atomic_rcu_set(&rioc->rdmaout, NULL);
> +        s->rdma_cleanup_thread_quit = false;
> +        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
> +                           rdma, QEMU_THREAD_DETACHED);
> +    }
>  
>      return 0;
>  }
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 09/11] migration: poll the cm event for destination qemu
  2018-08-17 14:01   ` Dr. David Alan Gilbert
@ 2018-08-20  8:35     ` 858585 jemmy
  0 siblings, 0 replies; 25+ messages in thread
From: 858585 jemmy @ 2018-08-20  8:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, qemu-devel, Lidong Chen

On Fri, Aug 17, 2018 at 10:01 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> The destination qemu only polls the comp_channel->fd in
>> qemu_rdma_wait_comp_channel. But when the source qemu disconnects
>> the rdma connection, the destination qemu should be notified.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> OK, this could do with an update to the migration_incoming_co comment in
> migration.h, since previously it was only used by COLO; if we merge this
> first, please post a patch to update the comment.

How about?

/* The coroutine we should enter back for incoming migration */
Coroutine *migration_incoming_co;

>
> Other than that, I think I'm OK:
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>
>> ---
>>  migration/migration.c |  3 ++-
>>  migration/rdma.c      | 32 +++++++++++++++++++++++++++++++-
>>  2 files changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index df0c2cf..f7d6e26 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -389,6 +389,7 @@ static void process_incoming_migration_co(void *opaque)
>>      int ret;
>>
>>      assert(mis->from_src_file);
>> +    mis->migration_incoming_co = qemu_coroutine_self();
>>      mis->largest_page_size = qemu_ram_pagesize_largest();
>>      postcopy_state_set(POSTCOPY_INCOMING_NONE);
>>      migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
>> @@ -418,7 +419,6 @@ static void process_incoming_migration_co(void *opaque)
>>
>>      /* we get COLO info, and know if we are in COLO mode */
>>      if (!ret && migration_incoming_enable_colo()) {
>> -        mis->migration_incoming_co = qemu_coroutine_self();
>>          qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
>>               colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
>>          mis->have_colo_incoming_thread = true;
>> @@ -442,6 +442,7 @@ static void process_incoming_migration_co(void *opaque)
>>      }
>>      mis->bh = qemu_bh_new(process_incoming_migration_bh, mis);
>>      qemu_bh_schedule(mis->bh);
>> +    mis->migration_incoming_co = NULL;
>>  }
>>
>>  static void migration_incoming_setup(QEMUFile *f)
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 1affc46..ae07515 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -3226,6 +3226,35 @@ err:
>>
>>  static void rdma_accept_incoming_migration(void *opaque);
>>
>> +static void rdma_cm_poll_handler(void *opaque)
>> +{
>> +    RDMAContext *rdma = opaque;
>> +    int ret;
>> +    struct rdma_cm_event *cm_event;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    ret = rdma_get_cm_event(rdma->channel, &cm_event);
>> +    if (ret) {
>> +        error_report("get_cm_event failed %d", errno);
>> +        return;
>> +    }
>> +    rdma_ack_cm_event(cm_event);
>> +
>> +    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
>> +        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
>> +        error_report("receive cm event, cm event is %d", cm_event->event);
>> +        rdma->error_state = -EPIPE;
>> +        if (rdma->return_path) {
>> +            rdma->return_path->error_state = -EPIPE;
>> +        }
>> +
>> +        if (mis->migration_incoming_co) {
>> +            qemu_coroutine_enter(mis->migration_incoming_co);
>> +        }
>> +        return;
>> +    }
>> +}
>> +
>>  static int qemu_rdma_accept(RDMAContext *rdma)
>>  {
>>      RDMACapabilities cap;
>> @@ -3326,7 +3355,8 @@ static int qemu_rdma_accept(RDMAContext *rdma)
>>                              NULL,
>>                              (void *)(intptr_t)rdma->return_path);
>>      } else {
>> -        qemu_set_fd_handler(rdma->channel->fd, NULL, NULL, NULL);
>> +        qemu_set_fd_handler(rdma->channel->fd, rdma_cm_poll_handler,
>> +                            NULL, rdma);
>>      }
>>
>>      ret = rdma_accept(rdma->cm_id, &conn_param);
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 10/11] migration: remove the unnecessary RDMA_CONTROL_ERROR message
  2018-08-17 14:04   ` Dr. David Alan Gilbert
@ 2018-08-20  9:04     ` 858585 jemmy
  0 siblings, 0 replies; 25+ messages in thread
From: 858585 jemmy @ 2018-08-20  9:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, qemu-devel, Lidong Chen

On Fri, Aug 17, 2018 at 10:04 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> It's not necessary to send RDMA_CONTROL_ERROR when cleaning up rdma resources.
>> If rdma->error_state is true, the message may not be sent successfully,
>> and the cm event can also notify the peer qemu.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> How does this keep 'cancel' working? I added 32bce196344 last year to
> make that code also send the RDMA_CONTROL_ERROR in 'cancelling'.

I guess sending the RDMA_CONTROL_ERROR is meant to notify the peer qemu
to close the rdma connection, and to make sure it receives the
rdma_disconnect event.

But the two qemu sides should clean up rdma independently; maybe the
destination qemu is hung.

1. The current qemu version already does not wait for the
RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect.

2. The peer qemu already polls rdma->channel->fd. Compared to sending
RDMA_CONTROL_ERROR, using the cm event to notify the peer qemu to quit
may be better: the rdma connection may already be in error_state, so
RDMA_CONTROL_ERROR cannot be sent successfully. When the migration is
cancelled, the destination will receive RDMA_CM_EVENT_DISCONNECTED.

3. I prefer to use RDMA_CONTROL_ERROR to report code logic errors,
not rdma connection errors.

>
> Dave
>
>> ---
>>  migration/rdma.c | 11 -----------
>>  1 file changed, 11 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index ae07515..e1498f2 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>      int idx;
>>
>>      if (rdma->cm_id && rdma->connected) {
>> -        if ((rdma->error_state ||
>> -             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
>> -            !rdma->received_error) {
>> -            RDMAControlHeader head = { .len = 0,
>> -                                       .type = RDMA_CONTROL_ERROR,
>> -                                       .repeat = 1,
>> -                                     };
>> -            error_report("Early error. Sending error.");
>> -            qemu_rdma_post_send_control(rdma, NULL, &head);
>> -        }
>> -
>>          rdma_disconnect(rdma->cm_id);
>>          trace_qemu_rdma_cleanup_disconnect();
>>          rdma->connected = false;
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 11/11] migration: create a dedicated thread to release rdma resource
  2018-08-17 14:59   ` Dr. David Alan Gilbert
@ 2018-08-20  9:46     ` 858585 jemmy
  0 siblings, 0 replies; 25+ messages in thread
From: 858585 jemmy @ 2018-08-20  9:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: zhang.zhanghailiang, Juan Quintela, qemu-devel, Lidong Chen

On Fri, Aug 17, 2018 at 10:59 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> ibv_dereg_mr takes a long time for virtual servers with large memory sizes.
>>
>> The test results are:
>>   10GB  326ms
>>   20GB  699ms
>>   30GB  1021ms
>>   40GB  1387ms
>>   50GB  1712ms
>>   60GB  2034ms
>>   70GB  2457ms
>>   80GB  2807ms
>>   90GB  3107ms
>>   100GB 3474ms
>>   110GB 3735ms
>>   120GB 4064ms
>>   130GB 4567ms
>>   140GB 4886ms
>>
>> This causes the guest OS to hang for a while when migration finishes,
>> so create a dedicated thread to release the rdma resources.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>> ---
>>  migration/migration.c |  6 ++++++
>>  migration/migration.h |  3 +++
>>  migration/rdma.c      | 47 +++++++++++++++++++++++++++++++----------------
>>  3 files changed, 40 insertions(+), 16 deletions(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index f7d6e26..25d9009 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -1499,6 +1499,7 @@ void migrate_init(MigrationState *s)
>>      s->vm_was_running = false;
>>      s->iteration_initial_bytes = 0;
>>      s->threshold_size = 0;
>> +    s->rdma_cleanup_thread_quit = true;
>>  }
>>
>>  static GSList *migration_blockers;
>> @@ -1660,6 +1661,10 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>>          return false;
>>      }
>>
>> +    if (s->rdma_cleanup_thread_quit != true) {
>> +        return false;
>> +    }
>
> That's not good! We error out without saying anything to the user
> about why; if we error we must always say why, otherwise we'll get
> bug reports from people without any way to debug them.

Yes, it should give some error message.

>
> However, we also don't need to; all we need to do is turn this into
> a semaphore or similar, and then wait for it at the start of
> 'rdma_start_outgoing_migration' - that way there's no error exit, it
> just waits a few seconds and then carries on correctly.
> (Maybe we need to wait in incoming as well to cope with
> postcopy-recovery).

Yes, this should also consider the postcopy-recovery case. I will fix it.
But rdma_start_outgoing_migration is invoked in the main thread, so it
will be a little complex to wait there, and if it waits for a long time,
it's not easy to troubleshoot.

Maybe reporting an error message is an easier way to fix it?
Thanks.
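
For illustration, a minimal sketch of that alternative, reusing the
existing rdma_cleanup_thread_quit flag (the message text is made up):

    /* in migrate_prepare(), instead of the silent "return false": */
    if (!s->rdma_cleanup_thread_quit) {
        error_setg(errp, "RDMA cleanup from the previous migration "
                   "is still in progress");
        return false;
    }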

>
> Dave
>
>>      if (runstate_check(RUN_STATE_INMIGRATE)) {
>>          error_setg(errp, "Guest is waiting for an incoming migration");
>>          return false;
>> @@ -3213,6 +3218,7 @@ static void migration_instance_init(Object *obj)
>>
>>      ms->state = MIGRATION_STATUS_NONE;
>>      ms->mbps = -1;
>> +    ms->rdma_cleanup_thread_quit = true;
>>      qemu_sem_init(&ms->pause_sem, 0);
>>      qemu_mutex_init(&ms->error_mutex);
>>
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 64a7b33..60138dd 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -224,6 +224,9 @@ struct MigrationState
>>       * do not trigger spurious decompression errors.
>>       */
>>      bool decompress_error_check;
>> +
>> +    /* Set this when rdma resource have released */
>> +    bool rdma_cleanup_thread_quit;
>>  };
>>
>>  void migrate_set_state(int *state, int old_state, int new_state);
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index e1498f2..3282f35 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2995,35 +2995,50 @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>>      }
>>  }
>>
>> -static int qio_channel_rdma_close(QIOChannel *ioc,
>> -                                  Error **errp)
>> +static void *qio_channel_rdma_close_thread(void *arg)
>>  {
>> -    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> -    RDMAContext *rdmain, *rdmaout;
>> -    trace_qemu_rdma_close();
>> -
>> -    rdmain = rioc->rdmain;
>> -    if (rdmain) {
>> -        atomic_rcu_set(&rioc->rdmain, NULL);
>> -    }
>> +    RDMAContext **rdma = arg;
>> +    RDMAContext *rdmain = rdma[0];
>> +    RDMAContext *rdmaout = rdma[1];
>> +    MigrationState *s = migrate_get_current();
>>
>> -    rdmaout = rioc->rdmaout;
>> -    if (rdmaout) {
>> -        atomic_rcu_set(&rioc->rdmaout, NULL);
>> -    }
>> +    rcu_register_thread();
>>
>>      synchronize_rcu();
>> -
>>      if (rdmain) {
>>          qemu_rdma_cleanup(rdmain);
>>      }
>> -
>>      if (rdmaout) {
>>          qemu_rdma_cleanup(rdmaout);
>>      }
>>
>>      g_free(rdmain);
>>      g_free(rdmaout);
>> +    g_free(rdma);
>> +
>> +    rcu_unregister_thread();
>> +    s->rdma_cleanup_thread_quit = true;
>> +    return NULL;
>> +}
>> +
>> +static int qio_channel_rdma_close(QIOChannel *ioc,
>> +                                  Error **errp)
>> +{
>> +    QemuThread t;
>> +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>> +    RDMAContext **rdma = g_new0(RDMAContext*, 2);
>> +    MigrationState *s = migrate_get_current();
>> +
>> +    trace_qemu_rdma_close();
>> +    if (rioc->rdmain || rioc->rdmaout) {
>> +        rdma[0] =  rioc->rdmain;
>> +        rdma[1] =  rioc->rdmaout;
>> +        atomic_rcu_set(&rioc->rdmain, NULL);
>> +        atomic_rcu_set(&rioc->rdmaout, NULL);
>> +        s->rdma_cleanup_thread_quit = false;
>> +        qemu_thread_create(&t, "rdma cleanup", qio_channel_rdma_close_thread,
>> +                           rdma, QEMU_THREAD_DETACHED);
>> +    }
>>
>>      return 0;
>>  }
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 01/11] migration: disable RDMA WRITE after postcopy started
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 01/11] migration: disable RDMA WRITE after postcopy started Lidong Chen
@ 2018-08-22 10:04   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2018-08-22 10:04 UTC (permalink / raw)
  To: Lidong Chen; +Cc: zhang.zhanghailiang, dgilbert, qemu-devel, Lidong Chen

Lidong Chen <jemmy858585@gmail.com> wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
>
> RDMA WRITE operations are performed with no notification to the destination
> qemu, so the destination qemu cannot wake up. This patch disables RDMA WRITE
> after postcopy has started.
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 04/11] migration: Stop rdma yielding during incoming postcopy
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 04/11] migration: Stop rdma yielding during incoming postcopy Lidong Chen
@ 2018-08-22 10:06   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2018-08-22 10:06 UTC (permalink / raw)
  To: Lidong Chen; +Cc: zhang.zhanghailiang, dgilbert, qemu-devel, Lidong Chen

Lidong Chen <jemmy858585@gmail.com> wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
>
> During incoming postcopy, the destination qemu will invoke
> qemu_rdma_wait_comp_channel in a separate thread. So do not use rdma
> yield; poll the completion channel fd instead.
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 05/11] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 05/11] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
@ 2018-08-22 10:06   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2018-08-22 10:06 UTC (permalink / raw)
  To: Lidong Chen; +Cc: zhang.zhanghailiang, dgilbert, qemu-devel, Lidong Chen

Lidong Chen <jemmy858585@gmail.com> wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
>
> If qio_channel_rdma_readv returns QIO_CHANNEL_ERR_BLOCK, the destination qemu
> crashes.
>
> The backtrace is:
> (gdb) bt
>     #0  0x0000000000000000 in ?? ()
>     #1  0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080,
>         io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c:
>     #2  0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438
>     #3  0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47
>     #4  0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327
>         at migration/qemu-file-channel.c:83
>     #5  0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299
>     #6  0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562
>     #7  0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575
>     #8  0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655
>     #9  0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126
>     #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366
>     #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1
>     #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6
>     #13 0x00007f96fe858760 in ?? ()
>     #14 0x0000000000000000 in ?? ()
>
> The RDMA QIOChannel does not implement io_set_aio_fd_handler, so
> qio_channel_set_aio_fd_handler will access a NULL pointer.
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 06/11] migration: invoke qio_channel_yield only when qemu_in_coroutine()
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 06/11] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
@ 2018-08-22 10:07   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2018-08-22 10:07 UTC (permalink / raw)
  To: Lidong Chen; +Cc: zhang.zhanghailiang, dgilbert, qemu-devel, Lidong Chen

Lidong Chen <jemmy858585@gmail.com> wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
>
> When qio_channel_read returns QIO_CHANNEL_ERR_BLOCK, the source qemu crashes.
>
> The backtrace is:
>     (gdb) bt
>     #0  0x00007fb20aba91d7 in raise () from /lib64/libc.so.6
>     #1  0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6
>     #2  0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6
>     #3  0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6
>     #4  0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460
>     #5  0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768)
>         at migration/qemu-file-channel.c:83
>     #6  0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299
>     #7  0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562
>     #8  0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575
>     #9  0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647
>     #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794
>     #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504
>     #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0
>     #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6
>
> This patch fixes it by invoking qio_channel_yield only when qemu_in_coroutine().
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion Lidong Chen
  2018-08-17 12:24   ` Dr. David Alan Gilbert
@ 2018-08-22 10:07   ` Juan Quintela
  1 sibling, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2018-08-22 10:07 UTC (permalink / raw)
  To: Lidong Chen
  Cc: zhang.zhanghailiang, dgilbert, qemu-devel, Lidong Chen,
	Gal Shachaf, Aviad Yehezkel

Lidong Chen <jemmy858585@gmail.com> wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
>
> If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function
> may loop forever. So we should also poll the cm event fd, and when we
> receive RDMA_CM_EVENT_DISCONNECTED or RDMA_CM_EVENT_DEVICE_REMOVAL,
> we consider that an error has happened.
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Signed-off-by: Gal Shachaf <galsha@mellanox.com>
> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v6 08/11] migration: implement the shutdown for RDMA QIOChannel
  2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 08/11] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
@ 2018-08-22 10:08   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2018-08-22 10:08 UTC (permalink / raw)
  To: Lidong Chen; +Cc: zhang.zhanghailiang, dgilbert, qemu-devel, Lidong Chen

Lidong Chen <jemmy858585@gmail.com> wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
>
> Because the RDMA QIOChannel does not implement the shutdown function,
> if to_dst_file is set to an error, the return path thread
> will wait forever, and the migration thread will wait for the
> return path thread to exit.
>
> the backtrace of return path thread is:
>
> (gdb) bt
>     #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
>     #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
>         at qemu-timer.c:325
>     #2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
>         at migration/rdma.c:1501
>     #3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
>         byte_len=0x7ef7091d0640) at migration/rdma.c:1580
>     #4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
>         head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
>     #5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
>         expecting=3) at migration/rdma.c:1903
>     #6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
>         size=32768) at migration/rdma.c:2714
>     #7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
>     #8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
>         at migration/qemu-file.c:502
>     #9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
>     #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
>     #11 0x00000000006a46d3 in source_return_path_thread (
>         opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
>     #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
>     #13 0x00007f372a77635d in clone () from /lib64/libc.so.6
>
> the backtrace of migration thread is:
>
> (gdb) bt
>     #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
>     #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
>         at util/qemu-thread-posix.c:504
>     #2  0x00000000006a4bc5 in await_return_path_close_on_source (
>         ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
>     #3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
>         current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
>         at migration/migration.c:1695
>     #4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
>         at migration/migration.c:1837
>     #5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
>     #6  0x00007f372a77635d in clone () from /lib64/libc.so.6
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread

Thread overview: 25+ messages
2018-08-06 13:29 [Qemu-devel] [PATCH v6 00/11] Enable postcopy RDMA live migration Lidong Chen
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 01/11] migration: disable RDMA WRITE after postcopy started Lidong Chen
2018-08-22 10:04   ` Juan Quintela
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 02/11] migration: create a dedicated connection for rdma return path Lidong Chen
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 03/11] migration: implement bi-directional RDMA QIOChannel Lidong Chen
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 04/11] migration: Stop rdma yielding during incoming postcopy Lidong Chen
2018-08-22 10:06   ` Juan Quintela
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 05/11] migration: implement io_set_aio_fd_handler function for RDMA QIOChannel Lidong Chen
2018-08-22 10:06   ` Juan Quintela
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 06/11] migration: invoke qio_channel_yield only when qemu_in_coroutine() Lidong Chen
2018-08-22 10:07   ` Juan Quintela
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion Lidong Chen
2018-08-17 12:24   ` Dr. David Alan Gilbert
2018-08-22 10:07   ` Juan Quintela
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 08/11] migration: implement the shutdown for RDMA QIOChannel Lidong Chen
2018-08-22 10:08   ` Juan Quintela
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 09/11] migration: poll the cm event for destination qemu Lidong Chen
2018-08-17 14:01   ` Dr. David Alan Gilbert
2018-08-20  8:35     ` 858585 jemmy
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 10/11] migration: remove the unnecessary RDMA_CONTROL_ERROR message Lidong Chen
2018-08-17 14:04   ` Dr. David Alan Gilbert
2018-08-20  9:04     ` 858585 jemmy
2018-08-06 13:29 ` [Qemu-devel] [PATCH v6 11/11] migration: create a dedicated thread to release rdma resource Lidong Chen
2018-08-17 14:59   ` Dr. David Alan Gilbert
2018-08-20  9:46     ` 858585 jemmy
