Date: Fri, 4 May 2018 11:13:23 +0200
From: Igor Mammedov
To: Pankaj Gupta
Cc: David Hildenbrand, qemu-devel@nongnu.org, Paolo Bonzini,
 "Dr. David Alan Gilbert"
Subject: Re: [Qemu-devel] [PATCH v3 3/3] virtio-pmem: should we make it migratable???
Message-ID: <20180504111323.7cb4c7c8@redhat.com>
In-Reply-To: <212695685.22833607.1524728271984.JavaMail.zimbra@redhat.com>
References: <20180420123456.22196-1-david@redhat.com>
 <20180423141928.7e64b380@redhat.com>
 <908f1079-385f-24d3-99ad-152ecd6b01d2@redhat.com>
 <20180424153154.05e79de7@redhat.com>
 <1046685642.22350584.1524635112123.JavaMail.zimbra@redhat.com>
 <20180425152356.46ee7e04@redhat.com>
 <709141862.22620677.1524664609218.JavaMail.zimbra@redhat.com>
 <20180425172639.3e4d8ca1@redhat.com>
 <212695685.22833607.1524728271984.JavaMail.zimbra@redhat.com>

On Thu, 26 Apr 2018 03:37:51 -0400 (EDT)
Pankaj Gupta wrote:

Trimming the CC list to the people who might be interested in the topic and
renaming the thread to reflect it.

> > > > > > > >> +
> > > > > > > >> +    memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> > > > > > > > missing vmstate registration?
> > > > > > >
> > > > > > > Missed this one: to be called by the caller. Important because e.g.
> > > > > > > for virtio-pmem we don't want this (I assume :) ).
> > > > > > If pmem isn't on shared storage, then we'd probably want to migrate it
> > > > > > as well, otherwise the target would experience data loss.
> > > > > > Anyway, I'd just treat it as normal RAM in the migration case.
> > > > >
> > > > > The main difference between RAM and pmem is that pmem acts like a
> > > > > combination of RAM and disk. That said, in the normal use case the size
> > > > > would be in the 100 GB to few-TB range. I am not sure we really want to
> > > > > migrate it for the non-shared storage use case.
> > > > With non-shared storage you'd have to migrate it to the target host, but
> > > > with shared storage it might be possible to flush it and use it directly
> > > > from the target host. That probably won't work right out of the box and
> > > > would need some sort of synchronization between the src/dst hosts.
> > >
> > > Shared storage should work out of the box. The only thing is that data on
> > > the destination host will be cache cold, and existing pages in the cache
> > > should be invalidated first.
> > > But if we migrate the entire fake DAX RAM state, it will populate the
> > > destination host page cache, including pages which were idle on the source
> > > host. This would unnecessarily create entropy on the destination host.
> > >
> > > To me this feature doesn't make much sense. The problem we are solving is:
> > > efficiently use guest RAM.
> > What would the live migration handover flow look like in the case of a
> > guest constantly dirtying memory provided by virtio-pmem and sometimes
> > issuing an async flush request along with it?
>
> Dirtying the entire pmem (disk) at once is not a usual scenario. Some part of
> the disk/pmem would get dirty and we need to handle that. I just want to say
> that moving the entire pmem (disk) is not an efficient solution, because we
> are using this solution to manage guest memory efficiently. Otherwise it will
> be like any block device copy with non-shared storage.
Not sure we can use the block layer analogy here.

> > > > The same applies to nv/pc-dimm as well, as the backend file could easily
> > > > be on pmem storage too.
> > >
> > > Are you saying the backing file is on actual nvdimm hardware? Then we don't
> > > need emulation at all.
> > That depends on whether the file is on a DAX filesystem, but your argument
> > about migrating a huge 100 GB to TB range applies in this case as well.
> >
> > > > Maybe for now we should migrate everything so it would work in the case
> > > > of a non-shared NVDIMM on the host, and then later add a migration-less
> > > > capability to all of them.
> > >
> > > Not sure I agree.
> > So would you inhibit migration in the case of non-shared backend storage,
> > to avoid losing data since it isn't migrated?
>
> I am just thinking about what features we want to support with pmem. And live
> migration with shared storage is the one which comes to my mind.
>
> If live migration with non-shared storage is what we want to support (I don't
> know yet) we can add this? Even with shared storage it would copy the entire
> pmem state?
Perhaps we should register vmstate like for normal RAM and use something
similar to
  http://lists.gnu.org/archive/html/qemu-devel/2018-04/msg00003.html
to skip shared memory on migration. In this case we could use it for pc-dimms
as well.

David, what's your take on it?

> Thanks,
> Pankaj
>
> > > > > One reason why nvdimm added vmstate info could be: there would still be
> > > > > transient writes in memory with fake DAX and there is no way (till now)
> > > > > to flush the guest writes. But with virtio-pmem we can flush such writes
> > > > > before migration, and at the destination host with a shared disk we will
> > > > > automatically have the updated data.
> > > > nvdimm has the concept of a flush hint address (maybe not implemented in
> > > > QEMU yet), but it can flush. The only reason I'm buying into the
> > > > virtio-pmem idea is that it would allow async flush queues, which would
> > > > reduce the number of vmexits.
> > >
> > > That's correct.
> > >
> > > Thanks,
> > > Pankaj
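To make the "missing vmstate registration" point above concrete, a minimal
sketch of what the plug path could do, reusing the names from the quoted hunk
(hpms, addr, mr, dev come from that patch; the exact placement is illustrative
only, not the actual series):

    /* map the device memory region into the hotplug container ... */
    memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);

    /* ... and register its RAM block with migration so its contents are
     * migrated like normal RAM (similar to what pc-dimm does on plug);
     * a virtio-pmem device backed by shared, flushable host storage
     * would simply skip this call */
    vmstate_register_ram(mr, DEVICE(dev));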
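And a rough sketch of the "skip shared memory on migration" idea from the link
above. qemu_ram_is_shared() does exist in QEMU, but the helper below and the
exact place migration would call it from are made up for illustration:

    /* hypothetical predicate for the RAM migration code: only blocks that are
     * not backed by a shared mapping (e.g. a file on DAX/pmem that the
     * destination host can also see) need their contents copied; shared ones
     * only need to be flushed on the source before handover */
    static bool ramblock_needs_migration(RAMBlock *rb)
    {
        return !qemu_ram_is_shared(rb);
    }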