From: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>
To: qemu-devel@nongnu.org
Cc: quintela@redhat.com, lvivier@redhat.com, peterx@redhat.com
Subject: [Qemu-devel] [PATCH v4 4/6] migration/rdma: Allow cancelling while waiting for wrid
Date: Mon, 17 Jul 2017 12:09:34 +0100
Message-ID: <20170717110936.23314-5-dgilbert@redhat.com>
In-Reply-To: <20170717110936.23314-1-dgilbert@redhat.com>
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
When waiting for a WRID, if the other side dies we end up waiting
forever with no way to cancel the migration.
Cure this by poll()ing the completion channel fd with a 0.1s timeout
before blocking, and checking the error flags and migration state on
each pass.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
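For readers without the QEMU tree to hand, the wait loop added below can be
approximated outside QEMU roughly as follows. This is only a sketch under
stated assumptions: it uses plain POSIX poll() in place of qemu_poll_ns(),
and `struct wait_ctx`, its `cancelling` flag and `wait_fd_cancellable()` are
hypothetical stand-ins for the RDMAContext fields and the
MIGRATION_STATUS_CANCELLING check; none of these names exist in QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the state the patch consults. */
struct wait_ctx {
    int fd;                   /* plays the role of rdma->comp_channel->fd */
    int error_state;          /* plays the role of rdma->error_state */
    bool received_error;      /* plays the role of rdma->received_error */
    volatile bool cancelling; /* stands in for the CANCELLING state check */
};

/*
 * Wait for activity on ctx->fd without hanging forever.
 * Returns 0 when the fd is active, non-zero on error or cancellation.
 */
static int wait_fd_cancellable(struct wait_ctx *ctx, int timeout_ms)
{
    assert(ctx != NULL);

    while (!ctx->error_state && !ctx->received_error) {
        /* POLLHUP and POLLERR are always reported in revents by poll() */
        struct pollfd pfd = { .fd = ctx->fd, .events = POLLIN };
        int n = poll(&pfd, 1, timeout_ms);

        if (n > 0) {
            return 0;        /* fd active */
        }
        if (n < 0) {
            return -EPIPE;   /* poll itself failed */
        }

        /* n == 0: timeout; give a pending cancel the chance to win */
        if (ctx->cancelling) {
            return -EPIPE;
        }
    }

    /* The loop ended because an error was flagged elsewhere */
    return ctx->received_error ? -EPIPE : ctx->error_state;
}
```

A caller would invoke `wait_fd_cancellable(&ctx, 100)` before each blocking
read of the fd and abandon the wait on any non-zero return, in the same way
the hunk below jumps to err_block_for_wrid. The short timeout bounds how long
a cancel request can go unnoticed; it is not a limit on the total wait.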
migration/rdma.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 53 insertions(+), 6 deletions(-)
diff --git a/migration/rdma.c b/migration/rdma.c
index 59810aec2e..0cf55a6d5b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1466,6 +1466,56 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
return 0;
}
+/* Wait for activity on the completion channel.
+ * Returns 0 on success, non-zero on error.
+ */
+static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
+{
+ /*
+ * Coroutine doesn't start until migration_fd_process_incoming()
+ * so don't yield unless we know we're running inside of a coroutine.
+ */
+ if (rdma->migration_started_on_destination) {
+ yield_until_fd_readable(rdma->comp_channel->fd);
+ } else {
+ /* This is the source side, where we're in a separate thread,
+ * or the destination prior to migration_fd_process_incoming();
+ * we can't yield, so we have to poll the fd.
+ * But we need to be able to handle 'cancel' or an error
+ * without hanging forever.
+ */
+ while (!rdma->error_state && !rdma->received_error) {
+ GPollFD pfds[1];
+ pfds[0].fd = rdma->comp_channel->fd;
+ pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+ /* 0.1s timeout, should be fine for a 'cancel' */
+ switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
+ case 1: /* fd active */
+ return 0;
+
+ case 0: /* Timeout, go around again */
+ break;
+
+ default: /* Error of some type -
+ * I don't trust errno from qemu_poll_ns
+ */
+ error_report("%s: poll failed", __func__);
+ return -EPIPE;
+ }
+
+ if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
+ /* Bail out and let the cancellation happen */
+ return -EPIPE;
+ }
+ }
+ }
+
+ if (rdma->received_error) {
+ return -EPIPE;
+ }
+ return rdma->error_state;
+}
+
/*
* Block until the next work request has completed.
*
@@ -1513,12 +1563,9 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
}
while (1) {
- /*
- * Coroutine doesn't start until migration_fd_process_incoming()
- * so don't yield unless we know we're running inside of a coroutine.
- */
- if (rdma->migration_started_on_destination) {
- yield_until_fd_readable(rdma->comp_channel->fd);
+ ret = qemu_rdma_wait_comp_channel(rdma);
+ if (ret) {
+ goto err_block_for_wrid;
}
ret = ibv_get_cq_event(rdma->comp_channel, &cq, &cq_ctx);
--
2.13.0