From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Lidong Chen <jemmy858585@gmail.com>
Cc: zhang.zhanghailiang@huawei.com, quintela@redhat.com,
	berrange@redhat.com, aviadye@mellanox.com, pbonzini@redhat.com,
	qemu-devel@nongnu.org, adido@mellanox.com,
	Lidong Chen <lidongchen@tencent.com>
Subject: Re: [Qemu-devel] [PATCH v4 07/12] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
Date: Wed, 30 May 2018 13:24:18 +0100
Message-ID: <20180530122417.GD2410@work-vm>
In-Reply-To: <1527673416-31268-8-git-send-email-lidongchen@tencent.com>

* Lidong Chen (jemmy858585@gmail.com) wrote:
> From: Lidong Chen <jemmy858585@gmail.com>
> 
> When migration is cancelled during RDMA precopy, the source QEMU main thread sometimes hangs.
> 
> The backtrace is:
>     (gdb) bt
>     #0  0x00007f249eabd43d in write () from /lib64/libpthread.so.0
>     #1  0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
>     #2  0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296
>     #3  0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
>     #4  0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273
>     #5  0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98
>     #6  0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334
>     #7  0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162
>     #8  0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
>     #9  0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
>     #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436
>     #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0)
>         at util/async.c:261
>     #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>     #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215
>     #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263
>     #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522
>     #16 0x00000000005bc6a5 in main_loop () at vl.c:1944
>     #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752
> 
> After rdma_disconnect, the RDMA_CM_EVENT_DISCONNECTED event sometimes never arrives.
> 
> According to the IB spec, once the active side sends a DREQ message, it should wait for the
> DREP message, and only once that has arrived should it trigger a DISCONNECT event. The DREP
> message can be dropped due to network issues.
> For that case the spec defines a DREP_timeout state in the CM state machine: if the DREP is
> dropped, we should get a timeout and a TIMEWAIT_EXIT event will be triggered.
> Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state,
> so in the above scenario we will get neither a DISCONNECT nor a TIMEWAIT_EXIT event.
> 
> So qemu_rdma_cleanup should not invoke rdma_get_cm_event, which may hang forever; the event
> channel is destroyed later in qemu_rdma_cleanup anyway.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>



Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
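
For readers wondering what a bounded wait would look like instead of dropping
the wait entirely: rdma_get_cm_event() blocks unless the channel's file
descriptor has been made non-blocking, and struct rdma_event_channel exposes
that descriptor as its fd member, so it can be polled with a timeout first.
The sketch below only illustrates that idea on a plain descriptor. It is not
what this patch does, and wait_readable() and wait_readable_demo() are
hypothetical names that do not exist in QEMU or librdmacm.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): wait until fd is readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * Note: a retry after EINTR restarts the full timeout.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0) {
        return ret < 0 ? -1 : 0;   /* error or timeout */
    }
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Hypothetical demo on a pipe: the first wait times out because nothing
 * has been written, the second succeeds once a byte is pending.
 */
void wait_readable_demo(void)
{
    int fds[2];

    assert(pipe(fds) == 0);
    assert(wait_readable(fds[0], 10) == 0);   /* nothing to read yet */
    assert(write(fds[1], "x", 1) == 1);
    assert(wait_readable(fds[0], 10) == 1);   /* one byte pending */
    close(fds[0]);
    close(fds[1]);
}
```

With an RDMA event channel, the assumption would be to call
wait_readable(rdma->channel->fd, some_timeout) and only call
rdma_get_cm_event() when it returns 1. Since the event channel is destroyed
during cleanup in any case, simply not waiting, as this patch does, is the
smaller change.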

> ---
>  migration/rdma.c       | 12 ++----------
>  migration/trace-events |  1 -
>  2 files changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 0dd4033..92e4d30 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2275,8 +2275,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
>  
>  static void qemu_rdma_cleanup(RDMAContext *rdma)
>  {
> -    struct rdma_cm_event *cm_event;
> -    int ret, idx;
> +    int idx;
>  
>      if (rdma->cm_id && rdma->connected) {
>          if ((rdma->error_state ||
> @@ -2290,14 +2289,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>              qemu_rdma_post_send_control(rdma, NULL, &head);
>          }
>  
> -        ret = rdma_disconnect(rdma->cm_id);
> -        if (!ret) {
> -            trace_qemu_rdma_cleanup_waiting_for_disconnect();
> -            ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -            if (!ret) {
> -                rdma_ack_cm_event(cm_event);
> -            }
> -        }
> +        rdma_disconnect(rdma->cm_id);
>          trace_qemu_rdma_cleanup_disconnect();
>          rdma->connected = false;
>      }
> diff --git a/migration/trace-events b/migration/trace-events
> index 3c798dd..4a768ea 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -146,7 +146,6 @@ qemu_rdma_accept_pin_state(bool pin) "%d"
>  qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
>  qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
>  qemu_rdma_cleanup_disconnect(void) ""
> -qemu_rdma_cleanup_waiting_for_disconnect(void) ""
>  qemu_rdma_close(void) ""
>  qemu_rdma_connect_pin_all_requested(void) ""
>  qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
