From: "Michael S. Tsirkin" <mst@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
marcandre.lureau@redhat.com, maxime.coquelin@redhat.com,
Peng Hao <peng.hao2@zte.com.cn>,
Wang Yechao <wang.yechao255@zte.com.cn>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge
Date: Sat, 22 Jul 2017 00:30:17 +0300
Message-ID: <20170721232134-mutt-send-email-mst@kernel.org>
In-Reply-To: <20170721164158.72fc3798@nial.brq.redhat.com>

On Fri, Jul 21, 2017 at 04:41:58PM +0200, Igor Mammedov wrote:
> On Wed, 19 Jul 2017 18:52:56 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Wed, Jul 19, 2017 at 03:24:27PM +0200, Igor Mammedov wrote:
> > > On Wed, 19 Jul 2017 12:46:13 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >
> > > > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > > > On Wed, 19 Jul 2017 23:17:32 +0800
> > > > > Peng Hao <peng.hao2@zte.com.cn> wrote:
> > > > >
> > > > > > > When a guest that has several hotplugged DIMMs is migrated, it
> > > > > > > fails to resume on the destination host. The vhost regions of
> > > > > > > those DIMMs are merged on the source host, but in the restore
> > > > > > > stage the destination host checks the vhost slot limit before
> > > > > > > the vhost regions of the DIMMs have been merged.
> > > > > could you provide a bit more detailed description of the problem
> > > > > including command line+used device_add commands on source and
> > > > > command line on destination?
> > > >
> > > > (ccing in Marc Andre and Maxime)
> > > >
> > > > Hmm, I'd like to understand the situation where you get merging between
> > > > RAMBlocks; that complicates some stuff for postcopy.
> > > and probably inconsistent merging breaks vhost as well
> > >
> > > merging might happen if regions are adjacent or overlap
> > > but for that to happen merged regions must have equal
> > > distance between their GPA:HVA pairs, so that following
> > > translation would work:
> > >
> > >   if gpa in regionX[gpa_start, len, hva_start]
> > >       hva = hva_start + gpa - gpa_start
> > >
> > > while the GPA of regions is under QEMU control and deterministic,
> > > the HVA is not, so in the migration case merging might happen on the
> > > source side but not on the destination, resulting in different memory maps.
> > >
> > > Maybe Michael might know details why migration works in vhost usecase,
> > > but I don't see vhost sending any vmstate data.
> >
> > We aren't merging ramblocks at all.
> > When we are passing blocks A and B to vhost, if we see that
> >
> > hvaB=hvaA + lenA
> > gpaB=gpaA + lenA
> >
> > then we can improve performance a bit by passing a single
> > chunk to vhost: hvaA, gpaA, lenA+lenB
> The kernel used to maintain a flat array map for lookups, where
> such an optimization could give some benefit, but that benefit is
> negligible: in practice merging reduces the array size by only ~5 entries.
>
> In addition, the kernel backend has been converted to an interval tree
> because a flat array doesn't scale, so merging doesn't really matter
> there anymore.
In my opinion not merging slots is an obvious waste. I
think there were patches that added a cache, and they
showed some promise. A cache will be more effective
if regions are bigger.
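To make the merge condition and the lookup quoted above concrete, here is a minimal sketch. It assumes a hypothetical struct mem_region and helper names (can_merge, merge_regions, gpa_to_hva); none of these are QEMU's or the kernel's actual types or functions, and the input array is assumed to be sorted by GPA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor, for illustration only. */
struct mem_region {
    uint64_t gpa;   /* guest physical start address */
    uint64_t hva;   /* host virtual start address */
    uint64_t len;   /* length in bytes */
};

/* B can be folded into A only if it directly follows A in BOTH
 * address spaces: hvaB == hvaA + lenA and gpaB == gpaA + lenA. */
static bool can_merge(const struct mem_region *a, const struct mem_region *b)
{
    return b->hva == a->hva + a->len && b->gpa == a->gpa + a->len;
}

/* Merge contiguous neighbours of a GPA-sorted array in place.
 * Returns the number of entries left after merging. */
static size_t merge_regions(struct mem_region *r, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 && can_merge(&r[out - 1], &r[i])) {
            r[out - 1].len += r[i].len;   /* extend the previous chunk */
        } else {
            r[out++] = r[i];              /* start a new chunk */
        }
    }
    return out;
}

/* Translate a GPA to an HVA with a linear scan over the map.
 * Returns false if no region covers the address. */
static bool gpa_to_hva(const struct mem_region *r, size_t n,
                       uint64_t gpa, uint64_t *hva)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= r[i].gpa && gpa - r[i].gpa < r[i].len) {
            *hva = r[i].hva + (gpa - r[i].gpa);
            return true;
        }
    }
    return false;
}
```

The translation stays valid on a merged chunk precisely because both offsets advance together; if only the GPAs were contiguous, the single hva_start + offset formula would point at the wrong host memory for the second block.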
> If we can get rid of merging on the QEMU side, the resulting memory
> map will have the same size regardless of the order in which
> entries are added or of chance allocations that happen to
> allow region merging (i.e. the size will become
> deterministic).
It seems somehow wrong to avoid doing (even minor) optimizations just to
make error handling simpler.
> Looking at vhost_user_set_mem_table(), it sends the actual number of
> entries to the backend over the wire, so it shouldn't break a backend
> that is written correctly (i.e. one that uses msg.payload.memory.nregions
> instead of VHOST_MEMORY_MAX_NREGIONS from QEMU). If it breaks,
> then it's the backend's fault and it should be fixed.
>
> Another thing that could break is the too-low limit
> VHOST_MEMORY_MAX_NREGIONS = 8
> QEMU started with default options takes up to 7 entries in the map
> unmerged, so any configuration that consumes additional slots won't
> start after an upgrade. We could counter most of the issues by raising
> the VHOST_MEMORY_MAX_NREGIONS limit and/or teaching the vhost-user protocol
> to fetch the limit from the backend, similar to vhost_kernel_memslots_limit().
I absolutely agree we should fix vhost-user to raise the slot
limit, along the lines you suggest. Care to look into it?
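As a rough illustration of why the slot check can pass on the source and fail on the destination, and of how a backend-reported limit would help, here is a small sketch. The struct and the functions slots_after_merge, effective_limit and fits are hypothetical names made up for this example; backend_limit stands in for a value fetched from the backend, with 0 meaning "not reported", and the fallback of 8 is the VHOST_MEMORY_MAX_NREGIONS value quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fallback matching the fixed limit quoted in this thread. */
#define FIXED_MAX_NREGIONS 8

/* Hypothetical region descriptor, for illustration only. */
struct mem_region {
    uint64_t gpa;   /* guest physical start address */
    uint64_t hva;   /* host virtual start address */
    uint64_t len;   /* length in bytes */
};

/* Count the slots a GPA-sorted map needs once neighbours that are
 * contiguous in both GPA and HVA have been folded together. */
static size_t slots_after_merge(const struct mem_region *r, size_t n)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        bool contiguous = i > 0 &&
            r[i].gpa == r[i - 1].gpa + r[i - 1].len &&
            r[i].hva == r[i - 1].hva + r[i - 1].len;

        if (!contiguous) {
            used++;   /* this entry starts a new slot */
        }
    }
    return used;
}

/* Assumption: a backend that reports no limit (0) gets the fixed one. */
static size_t effective_limit(size_t backend_limit)
{
    return backend_limit ? backend_limit : FIXED_MAX_NREGIONS;
}

/* True if the map fits into the slots the backend can accept. */
static bool fits(const struct mem_region *r, size_t n, size_t backend_limit)
{
    return slots_after_merge(r, n) <= effective_limit(backend_limit);
}
```

With nine GPA-contiguous regions, a source host whose allocator happened to place two of them back to back in HVA space needs 8 slots and passes the fixed limit, while a destination host with no such luck needs 9 and fails. A larger limit reported by the backend makes both cases fit.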
>
> > so it does not affect migration normally.
> >
> > >
> > > >
> > > > > >
> > > > > > Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
> > > > > > Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
> > > > > > ---
> > > > > > hw/mem/pc-dimm.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > > > > index ea67b46..bb0fa08 100644
> > > > > > --- a/hw/mem/pc-dimm.c
> > > > > > +++ b/hw/mem/pc-dimm.c
> > > > > > @@ -101,7 +101,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> > > > > > goto out;
> > > > > > }
> > > > > >
> > > > > > - if (!vhost_has_free_slot()) {
> > > > > > + if (!vhost_has_free_slot() && runstate_is_running()) {
> > > > > > error_setg(&local_err, "a used vhost backend has no free"
> > > > > > " memory slots left");
> > > > > > goto out;
> > > >
> > > > Even this produces the wrong error message in this case;
> > > > it also makes me wonder whether the existing code should undo a lot of
> > > > the object_property_set's that happen.
> > > >
> > > > Dave
> > > > >
> > > > >
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Thread overview: 14+ messages
2017-07-19 15:17 [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge Peng Hao
2017-07-19 7:50 ` Igor Mammedov
2017-07-19 11:46 ` Dr. David Alan Gilbert
2017-07-19 13:24 ` Igor Mammedov
2017-07-19 15:52 ` Michael S. Tsirkin
2017-07-20 17:22 ` Dr. David Alan Gilbert
2017-07-21 19:49 ` Michael S. Tsirkin
2017-07-24 8:06 ` Dr. David Alan Gilbert
2017-07-24 10:46 ` Igor Mammedov
2017-07-21 14:41 ` Igor Mammedov
2017-07-21 21:30 ` Michael S. Tsirkin [this message]
2017-07-24 10:05 ` Igor Mammedov
2017-07-24 13:01 ` Igor Mammedov
2017-07-19 8:36 ` no-reply