From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Lidong Chen <jemmy858585@gmail.com>
Cc: zhang.zhanghailiang@huawei.com, quintela@redhat.com,
qemu-devel@nongnu.org, Lidong Chen <lidongchen@tencent.com>,
Gal Shachaf <galsha@mellanox.com>,
Aviad Yehezkel <aviadye@mellanox.com>
Subject: Re: [Qemu-devel] [PATCH v6 07/11] migration: poll the cm event while wait RDMA work request completion
Date: Fri, 17 Aug 2018 13:24:03 +0100
Message-ID: <20180817122402.GH2459@work-vm>
In-Reply-To: <1533562177-16447-8-git-send-email-lidongchen@tencent.com>
* Lidong Chen (jemmy858585@gmail.com) wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
>
> If the peer qemu has crashed, the qemu_rdma_wait_comp_channel function
> may loop forever. So we should also poll the cm event fd, and when we
> receive RDMA_CM_EVENT_DISCONNECTED or RDMA_CM_EVENT_DEVICE_REMOVAL,
> we treat it as an error.
>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> Signed-off-by: Gal Shachaf <galsha@mellanox.com>
> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
I found the doc in the man page for rdma_create_event_channel
that said 'Users may make the fd non-blocking, poll or select the fd,
etc', so:
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/rdma.c | 33 ++++++++++++++++++++++++++++++---
> 1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d6bbf28..673f126 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1489,6 +1489,9 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, uint64_t *wr_id_out,
>   */
>  static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>  {
> +    struct rdma_cm_event *cm_event;
> +    int ret = -1;
> +
>      /*
>       * Coroutine doesn't start until migration_fd_process_incoming()
>       * so don't yield unless we know we're running inside of a coroutine.
> @@ -1505,13 +1508,37 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
>           * without hanging forever.
>           */
>          while (!rdma->error_state && !rdma->received_error) {
> -            GPollFD pfds[1];
> +            GPollFD pfds[2];
>              pfds[0].fd = rdma->comp_channel->fd;
>              pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            pfds[0].revents = 0;
> +
> +            pfds[1].fd = rdma->channel->fd;
> +            pfds[1].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> +            pfds[1].revents = 0;
> +
>              /* 0.1s timeout, should be fine for a 'cancel' */
> -            switch (qemu_poll_ns(pfds, 1, 100 * 1000 * 1000)) {
> +            switch (qemu_poll_ns(pfds, 2, 100 * 1000 * 1000)) {
> +            case 2:
>              case 1: /* fd active */
> -                return 0;
> +                if (pfds[0].revents) {
> +                    return 0;
> +                }
> +
> +                if (pfds[1].revents) {
> +                    ret = rdma_get_cm_event(rdma->channel, &cm_event);
> +                    if (!ret) {
> +                        rdma_ack_cm_event(cm_event);
> +                    }
> +
> +                    error_report("receive cm event while wait comp channel,"
> +                                 "cm event is %d", cm_event->event);
> +                    if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
> +                        cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
> +                        return -EPIPE;
> +                    }
> +                }
> +                break;
>
>              case 0: /* Timeout, go around again */
>                  break;
> --
> 1.8.3.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK