From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org, michael@hinespot.com, quintela@redhat.com,
lvivier@redhat.com, berrange@redhat.com
Subject: Re: [Qemu-devel] [PATCH 2/5] migration: Close file on failed migration load
Date: Fri, 14 Jul 2017 10:51:00 +0800
Message-ID: <20170714025100.GC27284@pxdev.xzpeter.org>
In-Reply-To: <20170712110021.GA30658@work-vm>

On Wed, Jul 12, 2017 at 12:00:22PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Jul 04, 2017 at 07:49:12PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > >
> > > Closing the file before exit on a failure allows
> > > the source to cleanup better, especially with RDMA.
> > >
> > > Partial fix for https://bugs.launchpad.net/qemu/+bug/1545052
> >
> > In the bug reported above, the issue is that both the dst and src VMs
> > hung when migration failed (which is a by-design failure). On the
> > destination, it hangs at (copied from the link):
> >
> > #0 0x00007ffff39141cd in write () at ../sysdeps/unix/syscall-template.S:81
> > #1 0x00007ffff27fe795 in rdma_get_cm_event.part.15 () from /lib64/librdmacm.so.1
> > #2 0x000055555593e445 in qemu_rdma_cleanup (rdma=0x7fff9647e010) at migration/rdma.c:2210
> > #3 0x000055555593ea45 in qemu_rdma_close (opaque=0x555557796770) at migration/rdma.c:2652
> > #4 0x00005555559397cc in qemu_fclose (f=f@entry=0x5555564b1450) at migration/qemu-file.c:270
> > #5 0x0000555555936b88 in process_incoming_migration_co (opaque=0x5555564b1450) at migration/migration.c:361
> > #6 0x0000555555a25a1a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
> > #7 0x00007fffef5b3110 in ?? () from /lib64/libc.so.6
> >
> > So it looks like at that time we had qemu_fclose() for the incoming
> > fd, and that's what caused the trouble.
>
> I never saw that hang in the current world; I saw the source hang
> rather than the destination. A hung destination is annoying but
> since it's a failed migration anyway it's no big problem; the much
> bigger problem is a failed migration which breaks the source.
>
> > (just to mention that the version that caused the failure is commit
> > fc1ec1acf, which is mentioned in the first comment in the bz)
> >
> > Now the situation is: we don't have qemu_fclose() in current QEMU
> > master on the failure path (see below, we just exit() directly). So
> > is the bz still valid now? And, if we apply this fix (so that we do
> > qemu_fclose() again), would it hang again instead of fixing anything?
>
> It doesn't seem to - but the big benefit we get from doing the close
> is that we trigger the 'Early Error. Sending error.' case in
> qemu_rdma_cleanup - by sending that error flag we cause the
> received_error flag to be set on the source, and that causes the
> migration to cleanly fail.
>
> Also, since it sets that received_error flag on the source, my patch 3/5
> would exit its qemu_rdma_wait_comp_channel loop so theoretically the
> other side of the hang seen in lp1545052 couldn't happen.

I see. Thanks.

I see there is a new version of the series. Will reply in that thread.

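
For reference, a minimal sketch of the handshake described above, as I
understand it: closing on the destination sends an error flag, the
source records it as received_error, and the source's wait loop then
bails out instead of hanging. This is only a toy model; struct peer,
dest_close() and source_wait() are hypothetical names, not the actual
code in migration/rdma.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real structures from migration/rdma.c. */
enum ctrl_msg { CTRL_NONE, CTRL_ERROR };

struct peer {
    bool error_state;     /* this side hit an error */
    bool received_error;  /* the remote side told us it failed */
    enum ctrl_msg inbox;  /* one-slot "control channel" from the remote */
};

/*
 * Destination close path: if the load failed, tell the source before
 * tearing down. Calling exit() without this step sends nothing.
 */
static void dest_close(struct peer *dst, struct peer *src)
{
    assert(dst != NULL && src != NULL);
    if (dst->error_state) {
        src->inbox = CTRL_ERROR;
    }
}

/*
 * Source wait loop: returns -1 once the remote error has been seen,
 * or 1 if it is still waiting after max_polls (models the hang).
 */
static int source_wait(struct peer *src, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (src->inbox == CTRL_ERROR) {
            src->received_error = true;
        }
        if (src->received_error) {
            return -1;  /* fail the migration cleanly */
        }
        /* real code would block on the completion channel here */
    }
    return 1;
}
```

With dst.error_state set and dest_close() called, source_wait() returns
-1; if the destination just exits without the close, the source never
sees the flag and keeps waiting.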
--
Peter Xu