From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50962) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dZasW-0006ex-2V for qemu-devel@nongnu.org; Mon, 24 Jul 2017 06:46:21 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dZasS-0000dP-R3 for qemu-devel@nongnu.org; Mon, 24 Jul 2017 06:46:20 -0400
Received: from mx1.redhat.com ([209.132.183.28]:53756) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dZasS-0000cz-H7 for qemu-devel@nongnu.org; Mon, 24 Jul 2017 06:46:16 -0400
Date: Mon, 24 Jul 2017 12:46:03 +0200
From: Igor Mammedov
Message-ID: <20170724124603.19c8e81b@nial.brq.redhat.com>
In-Reply-To: <20170724080633.GA2127@work-vm>
References: <1500477452-59643-1-git-send-email-peng.hao2@zte.com.cn>
 <20170719095058.4d46359d@Igors-MacBook-Pro.local>
 <20170719114612.GC3500@work-vm>
 <20170719152427.620b2a2a@nial.brq.redhat.com>
 <20170719184748-mutt-send-email-mst@kernel.org>
 <20170720172215.GH2456@work-vm>
 <20170721224259-mutt-send-email-mst@kernel.org>
 <20170724080633.GA2127@work-vm>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge
To: "Dr. David Alan Gilbert"
Cc: "Michael S. Tsirkin" , Peng Hao , qemu-devel@nongnu.org, maxime.coquelin@redhat.com, marcandre.lureau@redhat.com, Wang Yechao

On Mon, 24 Jul 2017 09:06:33 +0100
"Dr. David Alan Gilbert" wrote:

> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Thu, Jul 20, 2017 at 06:22:15PM +0100, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > On Wed, Jul 19, 2017 at 03:24:27PM +0200, Igor Mammedov wrote:
> > > > > On Wed, 19 Jul 2017 12:46:13 +0100
> > > > > "Dr. David Alan Gilbert" wrote:
> > > > > >
> > > > > > * Igor Mammedov (imammedo@redhat.com) wrote:
> > > > > > > On Wed, 19 Jul 2017 23:17:32 +0800
> > > > > > > Peng Hao wrote:
> > > > > > > >
> > > > > > > > When a guest that has several hotplugged dimms is migrated, it
> > > > > > > > fails to resume on the destination host. The vhost regions of
> > > > > > > > the dimms are merged on the source host, but during the restore
> > > > > > > > stage the destination host checks against the vhost slot limit
> > > > > > > > before the regions are merged.
> > > > > > >
> > > > > > > could you provide a bit more detailed description of the problem
> > > > > > > including command line + used device_add commands on source and
> > > > > > > command line on destination?
> > > > > >
> > > > > > (ccing in Marc Andre and Maxime)
> > > > > >
> > > > > > Hmm, I'd like to understand the situation where you get merging between
> > > > > > RAMBlocks; that complicates some stuff for postcopy.
> > > > >
> > > > > and probably inconsistent merging breaks vhost as well
> > > > >
> > > > > merging might happen if regions are adjacent or overlap,
> > > > > but for that to happen the merged regions must have equal
> > > > > distance between their GPA:HVA pairs, so that the following
> > > > > translation would work:
> > > > >
> > > > >   if gpa in regionX[gpa_start, len, hva_start]
> > > > >       hva = hva_start + gpa - gpa_start
> > > > >
> > > > > While the GPA of regions is under QEMU control and deterministic,
> > > > > the HVA is not, so in the migration case merging might happen on
> > > > > the source side but not on the destination, resulting in different
> > > > > memory maps.
> > > > >
> > > > > Maybe Michael knows details of why migration works in the vhost
> > > > > use case, but I don't see vhost sending any vmstate data.
> > > >
> > > > We aren't merging ramblocks at all.
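The translation rule quoted above can be sketched in C as follows (a minimal illustration; the Region struct and gpa_to_hva() are made-up names for this sketch, not QEMU's actual vhost region types):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region descriptor: one contiguous GPA range mapped
 * linearly onto one contiguous HVA range. */
typedef struct {
    uint64_t gpa_start; /* guest-physical start address */
    uint64_t len;       /* length in bytes */
    uint64_t hva_start; /* host-virtual start address */
} Region;

/* Translate a guest-physical address to a host-virtual address.
 * Returns 0 and sets *hva on success, -1 if gpa lies outside the region. */
static int gpa_to_hva(const Region *r, uint64_t gpa, uint64_t *hva)
{
    if (gpa < r->gpa_start || gpa >= r->gpa_start + r->len) {
        return -1;
    }
    *hva = r->hva_start + (gpa - r->gpa_start);
    return 0;
}
```

Two regions can only be folded into one such descriptor if they share the same GPA-to-HVA offset, which is exactly the "equal distance between their GPA:HVA pairs" condition.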
> > > > When we are passing blocks A and B to vhost, if we see that
> > > >
> > > >   hvaB = hvaA + lenA
> > > >   gpaB = gpaA + lenA
> > > >
> > > > then we can improve performance a bit by passing a single
> > > > chunk to vhost: hvaA, gpaA, lenA + lenB
> > >
> > > OK, but that means that a region can incorporate multiple
> > > RAMBlocks though? Hmm, that's not fun for postcopy.
> > >
> > > > so it does not affect migration normally.
> > >
> > > Well, why? What's required - if the region sizes/lengths/orders
> > > are different on the source and destination does it matter - if
> > > it does then that means we have a problem, since that heuristic
> > > is non-deterministic.
> > >
> > > Dave
> >
> > It doesn't matter normally. But there's a limit
> > on the number of regions that vhost can support.
> > pc_dimm_memory_plug tries to guess this number early
> > because it wants to fail hotplug requests gracefully.
>
> Why doesn't it matter - can't this heuristic potentially trigger on a
> source but not a destination even in the case of something like NUMA
> without hotplug?
it might if a memdev backend per node is used.
>
> > See
> > commit 3fad87881e55aaff659408dcf25fa204f89a7896
> > Author: Igor Mammedov
> > Date:   Tue Oct 6 10:37:28 2015 +0200
> >
> >     pc-dimm: add vhost slots limit check before commiting to hotplug
> >
> > It might make sense to limit this to hotplug, though
> > a runstate check seems like the wrong way to do this -
> > the VM could be stopped e.g. through QMP at the time.
>
> My other problem is that this optimisation makes it tricky for postcopy;
> one-region -> one-ramblock is easy, but there's a potential for problems
> if different RAMBlocks within the region have different
> characteristics.
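The adjacency check Michael describes can be sketched as below (illustrative only; Chunk and try_merge() are hypothetical names, not the actual QEMU vhost implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical chunk as passed to vhost: a linear GPA->HVA mapping. */
typedef struct {
    uint64_t gpa; /* guest-physical start address */
    uint64_t hva; /* host-virtual start address */
    uint64_t len; /* length in bytes */
} Chunk;

/* Try to coalesce b into a.  This succeeds only when b starts exactly
 * where a ends in BOTH address spaces, so that one linear mapping
 * covers both chunks.  On success a is extended to cover b. */
static bool try_merge(Chunk *a, const Chunk *b)
{
    if (b->hva == a->hva + a->len && b->gpa == a->gpa + a->len) {
        a->len += b->len; /* pass one chunk (hvaA, gpaA, lenA + lenB) */
        return true;
    }
    return false;
}
```

Since the HVAs on the destination generally differ from those on the source, this check may succeed on one side and fail on the other, which is why the merge is non-deterministic across migration.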
>
> Dave
>
> > > > > > > >
> > > > > > > > Signed-off-by: Peng Hao
> > > > > > > > Signed-off-by: Wang Yechao
> > > > > > > > ---
> > > > > > > >  hw/mem/pc-dimm.c | 2 +-
> > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > > > > > > index ea67b46..bb0fa08 100644
> > > > > > > > --- a/hw/mem/pc-dimm.c
> > > > > > > > +++ b/hw/mem/pc-dimm.c
> > > > > > > > @@ -101,7 +101,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> > > > > > > >          goto out;
> > > > > > > >      }
> > > > > > > >
> > > > > > > > -    if (!vhost_has_free_slot()) {
> > > > > > > > +    if (!vhost_has_free_slot() && runstate_is_running()) {
> > > > > > > >          error_setg(&local_err, "a used vhost backend has no free"
> > > > > > > >                     " memory slots left");
> > > > > > > >          goto out;
> > > > > >
> > > > > > Even this produces the wrong error message in this case,
> > > > > > and it also makes me wonder whether the existing code should undo
> > > > > > a lot of the object_property_set's that happen.
> > > > > >
> > > > > > Dave
> > > > > >
> > > > > > --
> > > > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>