All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: 858585 jemmy <jemmy858585@gmail.com>
Cc: Aviad Yehezkel <aviadye@mellanox.com>,
	Juan Quintela <quintela@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Gal Shachaf <galsha@mellanox.com>,
	adido@mellanox.com, Lidong Chen <lidongchen@tencent.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
Date: Wed, 16 May 2018 14:13:28 +0100	[thread overview]
Message-ID: <20180516131327.GF2531@work-vm> (raw)
In-Reply-To: <CAOGPPbdjb6x-7Lgtw-xDA7qT=qfGHPwibeu2q+v08G21HUmKyQ@mail.gmail.com>

* 858585 jemmy (jemmy858585@gmail.com) wrote:

<snip>

> >> >> > I wonder why dereg_mr takes so long - I could understand if reg_mr
> >> >> > took a long time, but why for dereg, that sounds like the easy side.
> >> >>
> >> >> I use perf collect the information when ibv_dereg_mr is invoked.
> >> >>
> >> >> -   9.95%  client2  [kernel.kallsyms]  [k] put_compound_page
> >> >>                                                           `
> >> >>    - put_compound_page
> >> >>       - 98.45% put_page
> >> >>            __ib_umem_release
> >> >>            ib_umem_release
> >> >>            dereg_mr
> >> >>            mlx5_ib_dereg_mr
> >> >>            ib_dereg_mr
> >> >>            uverbs_free_mr
> >> >>            remove_commit_idr_uobject
> >> >>            _rdma_remove_commit_uobject
> >> >>            rdma_remove_commit_uobject
> >> >>            ib_uverbs_dereg_mr
> >> >>            ib_uverbs_write
> >> >>            vfs_write
> >> >>            sys_write
> >> >>            system_call_fastpath
> >> >>            __GI___libc_write
> >> >>            0
> >> >>       + 1.55% __ib_umem_release
> >> >> +   8.31%  client2  [kernel.kallsyms]  [k] compound_unlock_irqrestore
> >> >> +   7.01%  client2  [kernel.kallsyms]  [k] page_waitqueue
> >> >> +   7.00%  client2  [kernel.kallsyms]  [k] set_page_dirty
> >> >> +   6.61%  client2  [kernel.kallsyms]  [k] unlock_page
> >> >> +   6.33%  client2  [kernel.kallsyms]  [k] put_page_testzero
> >> >> +   5.68%  client2  [kernel.kallsyms]  [k] set_page_dirty_lock
> >> >> +   4.30%  client2  [kernel.kallsyms]  [k] __wake_up_bit
> >> >> +   4.04%  client2  [kernel.kallsyms]  [k] free_pages_prepare
> >> >> +   3.65%  client2  [kernel.kallsyms]  [k] release_pages
> >> >> +   3.62%  client2  [kernel.kallsyms]  [k] arch_local_irq_save
> >> >> +   3.35%  client2  [kernel.kallsyms]  [k] page_mapping
> >> >> +   3.13%  client2  [kernel.kallsyms]  [k] get_pageblock_flags_group
> >> >> +   3.09%  client2  [kernel.kallsyms]  [k] put_page
> >> >>
> >> >> the reason is __ib_umem_release will loop many times for each page.
> >> >>
> >> >> static void __ib_umem_release(struct ib_device *dev, struct ib_umem
> >> >> *umem, int dirty)
> >> >> {
> >> >>     struct scatterlist *sg;
> >> >>     struct page *page;
> >> >>     int i;
> >> >>
> >> >>     if (umem->nmap > 0)
> >> >>          ib_dma_unmap_sg(dev, umem->sg_head.sgl,
> >> >>                                         umem->npages,
> >> >>                                         DMA_BIDIRECTIONAL);
> >> >>
> >> >>          for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {      <<
> >> >> loop a lot of times for each page.here
> >> >
> >> > Why 'lot of times for each page'?  I don't know this code at all, but
> >> > I'd expected once per page?
> >>
> >> sorry, once per page, but a lot of page for a big size virtual machine.
> >
> > Ah OK; so yes it seems best if you can find a way to do the release in
> > the migration thread then;  still maybe this is something some
> > of the kernel people could look at speeding up?
> 
> The kernel code seem is not complex, and I have no idea how to speed up.

Me neither; but I'll ask around.

> >> >
> >> > With your other kernel fix, does the problem of the missing
> >> > RDMA_CM_EVENT_DISCONNECTED events go away?
> >>
> >> Yes, after kernel and qemu fixed, this issue never happens again.
> >
> > I'm confused; which qemu fix; my question was whether the kernel fix by
> > itself fixed the problem of the missing event.
> 
> this qemu fix:
> migration: update index field when delete or qsort RDMALocalBlock

OK good; so then we shouldn't need this 2/2 patch.

> this issue also cause by some ram block is not released. but I do not
> find the root cause.

Hmm, we should try and track that down.

> >
> >> Do you think we should remove rdma_get_cm_event after rdma_disconnect?
> >
> > I don't think so; if 'rdma_disconnect' is supposed to generate the event
> > I think we're supposed to wait for it to know that the disconnect is
> > really complete.
> 
> After move qemu_fclose to migration thread, it will not block the main
> thread when wait
> the disconnection event.

I'm not sure about moving the fclose to the migration thread; it worries
me with the interaction with cancel and other failures.

Dave

> >
> > Dave
> >
> >>
> >> >
> >> > Dave
> >> >
> >> >>               page = sg_page(sg);
> >> >>               if (umem->writable && dirty)
> >> >>                   set_page_dirty_lock(page);
> >> >>               put_page(page);
> >> >>          }
> >> >>
> >> >>          sg_free_table(&umem->sg_head);
> >> >>          return;
> >> >> }
> >> >>
> >> >> >
> >> >> > Dave
> >> >> >
> >> >> >
> >> >> >> >
> >> >> >> > Dave
> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >> Anyway, it should not invoke rdma_get_cm_event in main thread, and the event channel
> >> >> >> >> >> is also destroyed in qemu_rdma_cleanup.
> >> >> >> >> >>
> >> >> >> >> >> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >> >> >> >> >> ---
> >> >> >> >> >>  migration/rdma.c       | 12 ++----------
> >> >> >> >> >>  migration/trace-events |  1 -
> >> >> >> >> >>  2 files changed, 2 insertions(+), 11 deletions(-)
> >> >> >> >> >>
> >> >> >> >> >> diff --git a/migration/rdma.c b/migration/rdma.c
> >> >> >> >> >> index 0dd4033..92e4d30 100644
> >> >> >> >> >> --- a/migration/rdma.c
> >> >> >> >> >> +++ b/migration/rdma.c
> >> >> >> >> >> @@ -2275,8 +2275,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma,
> >> >> >> >> >>
> >> >> >> >> >>  static void qemu_rdma_cleanup(RDMAContext *rdma)
> >> >> >> >> >>  {
> >> >> >> >> >> -    struct rdma_cm_event *cm_event;
> >> >> >> >> >> -    int ret, idx;
> >> >> >> >> >> +    int idx;
> >> >> >> >> >>
> >> >> >> >> >>      if (rdma->cm_id && rdma->connected) {
> >> >> >> >> >>          if ((rdma->error_state ||
> >> >> >> >> >> @@ -2290,14 +2289,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
> >> >> >> >> >>              qemu_rdma_post_send_control(rdma, NULL, &head);
> >> >> >> >> >>          }
> >> >> >> >> >>
> >> >> >> >> >> -        ret = rdma_disconnect(rdma->cm_id);
> >> >> >> >> >> -        if (!ret) {
> >> >> >> >> >> -            trace_qemu_rdma_cleanup_waiting_for_disconnect();
> >> >> >> >> >> -            ret = rdma_get_cm_event(rdma->channel, &cm_event);
> >> >> >> >> >> -            if (!ret) {
> >> >> >> >> >> -                rdma_ack_cm_event(cm_event);
> >> >> >> >> >> -            }
> >> >> >> >> >> -        }
> >> >> >> >> >> +        rdma_disconnect(rdma->cm_id);
> >> >> >> >> >
> >> >> >> >> > I'm worried whether this change could break stuff:
> >> >> >> >> > The docs say for rdma_disconnect that it flushes any posted work
> >> >> >> >> > requests to the completion queue;  so unless we wait for the event
> >> >> >> >> > do we know the stuff has been flushed?   In the normal non-cancel case
> >> >> >> >> > I'm worried that means we could lose something.
> >> >> >> >> > (But I don't know the rdma/infiniband specs well enough to know if it's
> >> >> >> >> > really a problem).
> >> >> >> >>
> >> >> >> >> In qemu_fclose function, it invoke qemu_fflush(f) before invoke f->ops->close.
> >> >> >> >> so I think it's safe for normal migration case.
> >> >> >> >>
> >> >> >> >> For the root cause why not receive RDMA_CM_EVENT_DISCONNECTED event
> >> >> >> >> after rdma_disconnect,
> >> >> >> >> I loop in Aviad Yehezkel<aviadye@mellanox.com>, Aviad will help us
> >> >> >> >> find the root cause.
> >> >> >> >>
> >> >> >> >> >
> >> >> >> >> > Dave
> >> >> >> >> >
> >> >> >> >> >>          trace_qemu_rdma_cleanup_disconnect();
> >> >> >> >> >>          rdma->connected = false;
> >> >> >> >> >>      }
> >> >> >> >> >> diff --git a/migration/trace-events b/migration/trace-events
> >> >> >> >> >> index d6be74b..64573ff 100644
> >> >> >> >> >> --- a/migration/trace-events
> >> >> >> >> >> +++ b/migration/trace-events
> >> >> >> >> >> @@ -125,7 +125,6 @@ qemu_rdma_accept_pin_state(bool pin) "%d"
> >> >> >> >> >>  qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
> >> >> >> >> >>  qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
> >> >> >> >> >>  qemu_rdma_cleanup_disconnect(void) ""
> >> >> >> >> >> -qemu_rdma_cleanup_waiting_for_disconnect(void) ""
> >> >> >> >> >>  qemu_rdma_close(void) ""
> >> >> >> >> >>  qemu_rdma_connect_pin_all_requested(void) ""
> >> >> >> >> >>  qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
> >> >> >> >> >> --
> >> >> >> >> >> 1.8.3.1
> >> >> >> >> >>
> >> >> >> >> > --
> >> >> >> >> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> >> >> > --
> >> >> >> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> >> > --
> >> >> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> > --
> >> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

  reply	other threads:[~2018-05-16 13:13 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-06 14:54 [Qemu-devel] [PATCH 1/2] migration: update index field when delete or qsort RDMALocalBlock Lidong Chen
2018-05-06 14:54 ` [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect Lidong Chen
2018-05-08 18:40   ` Dr. David Alan Gilbert
2018-05-09  3:43     ` 858585 jemmy
2018-05-11 18:03       ` Dr. David Alan Gilbert
2018-05-14  7:32         ` 858585 jemmy
2018-05-14 19:27           ` Dr. David Alan Gilbert
2018-05-16  8:31             ` 858585 jemmy
2018-05-16  9:39               ` Dr. David Alan Gilbert
2018-05-16  9:45                 ` 858585 jemmy
2018-05-16  9:53                   ` Dr. David Alan Gilbert
2018-05-16 12:48                     ` 858585 jemmy
2018-05-16 13:13                       ` Dr. David Alan Gilbert [this message]
     [not found]                         ` <AM6PR05MB4360C1A885C2B052583F47C7C2920@AM6PR05MB4360.eurprd05.prod.outlook.com>
     [not found]                           ` <be4b48b2-22b3-9a69-92ed-dbc9ae59b4e6@dev.mellanox.co.il>
     [not found]                             ` <CAOGPPbenO1P9VbdpDVD5_SNJeoF4eHxkLg=rF7LM9HpbyoR0uw@mail.gmail.com>
2018-05-17  7:31                               ` [Qemu-devel] FW: " Aviad Yehezkel
2018-05-17  7:41                                 ` 858585 jemmy
2018-05-17  7:46                                   ` Aviad Yehezkel
2018-05-22 10:12                         ` [Qemu-devel] " 858585 jemmy
2018-05-08 17:19 ` [Qemu-devel] [PATCH 1/2] migration: update index field when delete or qsort RDMALocalBlock Dr. David Alan Gilbert
2018-05-09  1:39   ` 858585 jemmy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180516131327.GF2531@work-vm \
    --to=dgilbert@redhat.com \
    --cc=adido@mellanox.com \
    --cc=aviadye@mellanox.com \
    --cc=galsha@mellanox.com \
    --cc=jemmy858585@gmail.com \
    --cc=lidongchen@tencent.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.