From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43259) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bxXfL-0006f6-2f for qemu-devel@nongnu.org; Fri, 21 Oct 2016 07:07:12 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bxXfG-00029o-34 for qemu-devel@nongnu.org; Fri, 21 Oct 2016 07:07:11 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55750) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1bxXfF-00029U-RA for qemu-devel@nongnu.org; Fri, 21 Oct 2016 07:07:06 -0400 Date: Fri, 21 Oct 2016 13:07:00 +0200 From: Igor Mammedov Message-ID: <20161021130700.7beff8d3@nial.brq.redhat.com> In-Reply-To: <20161021072210.sxfwixky4idlmzni@hz-desktop> References: <20161020061301.31372-1-haozhong.zhang@intel.com> <20161020143412.5ea6b564@nial.brq.redhat.com> <20161020131138.4gzxk5ekltkiqtq2@hz-desktop> <20161020154215.4b765284@nial.brq.redhat.com> <20161020135610.GY5057@thinpad.lan.raisama.net> <20161020161521.6e20ce2a@nial.brq.redhat.com> <20161020144734.GA5057@thinpad.lan.raisama.net> <20161020173538.1ee9a556@nial.brq.redhat.com> <20161021072210.sxfwixky4idlmzni@hz-desktop> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid data corruption List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Haozhong Zhang Cc: kwolf@redhat.com, Xiao Guangrong , Eduardo Habkost , Peter Crosthwaite , qemu-devel@nongnu.org, mreitz@redhat.com, Paolo Bonzini , Richard Henderson On Fri, 21 Oct 2016 15:22:10 +0800 Haozhong Zhang wrote: > On 10/20/16 17:35 +0200, Igor Mammedov wrote: > >On Thu, 20 Oct 2016 12:47:34 -0200 > >Eduardo Habkost wrote: > > > >> On Thu, Oct 20, 2016 at 04:15:21PM +0200, Igor Mammedov wrote: > >> > On Thu, 20 Oct 2016 11:56:10 -0200 > >> > Eduardo Habkost wrote: > >> > > >> > > On Thu, Oct 20, 2016 at 03:42:15PM +0200, Igor Mammedov wrote: > >> > > > On Thu, 20 Oct 2016 21:11:38 +0800 > >> > > > Haozhong Zhang wrote: > >> > > > > >> > > > > On 10/20/16 14:34 +0200, Igor Mammedov wrote: > >> > > > > >On Thu, 20 Oct 2016 14:13:01 +0800 > >> > > > > >Haozhong Zhang wrote: > >> > > > > > > >> > > > > >> If a file is used as the backend of memory-backend-file and its size is > >> > > > > >> not identical to the property 'size', the file will be truncated. For a > >> > > > > >> file used as the backend of vNVDIMM, its data is expected to be > >> > > > > >> persistent and the truncation may corrupt the existing data. > >> > > > > >I wonder if it's possible just skip 'size' property in your case instead > >> > > > > >'notrunc' property. That way if size is not present one'd get actual size > >> > > > > >using get_file_size() and set 'size' to it? > >> > > > > >And if 'size' is provided and 'size' != file_size then error out. > >> > > > > > > >> > > > > > >> > > > > I don't know how this can be implemented in QEMU. Specially, how does > >> > > > > the memory-backend-file know it's used for vNVDIMM, so that it can > >> > > > > skip the 'size' property? > >> > > > Does memory-backend-file needs to know that it's used by NVDIMM? > >> > > > Looking at nvdimm_realize it doesn't as it's assumes > >> > > > hostemem_size == pmem_size + label_size > >> > > > > >> > > > I'd make hostmem_file.size optional and take size from file > >> > > > and if 'size' is specified explictly require it to mach file size. > >> > > > It's generic and has nothing to do with nvdimm. > >> > > > >> > > We can take size from file, or take size from the > >> > > host_memory_backend_get_memory() callers. > >> > > > >> > > Enumerating all sizes that QEMU can use as input: > >> > > > >> > > A) Backend file size > >> > > B) memory backend "size" option > >> > > C) frontend-provided size (-numa size, -m, or pc-dimm "size" > >> > > property) > >> > -numa size affect only anon memory not backend backed one, for > >> > backend baked memory we use memdev where size comes from backend > >> > > >> > pc-dimm.size is readonly and isn't supposed to influence backend.size > >> > > >> > I'd drop C option > >> > >> If C is not present, it should be, as it affects the guest ABI > >> (and the ABI must never depend on the host you are running or > >> backend configuration, only on the frontend configuration). > >I've meant that C should not affect behavior of backend. > > > >> If we are dropping -numa size in favor of the > >> memory-backend-provided size, that's a bug. > >-numa size is not applicable here as it's not using backends, > >when backends are used it's -numa memdev instead in which case > > numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL); > > > >> > >> > > >> > > > >> > > My suggestion is: > >> > > * B should be optional. > >> > > * If B is omitted, we should never truncate the file to a smaller > >> > > size. > >> > i.e. derive backend.size from filesize if possible (i.e. not hugepages) > >> > > >> > > * If B is omitted, we can use C as the size when mapping the > >> > > file. > >> > frontend size is the size that's mapped into guest address space. > >> > it should not influence backend's size in backward direction. > >> > You may notice pc-dimm.size is not user settable (readonly) property. > >> > >> Frontend size will not influence backend size, it will just > >> affect the size of the memory region the frontend code will ask > >> the backend to provide. > >> > >> In other words: I believe host_memory_backend_get_memory() needs > >> a 'size' argument, and that memory allocation could be optionally > >> delayed to the host_memory_backend_get_memory() call. This way, > >> we don't need a backend size at all, unless we want the backend > >> to truncate files or preallocate memory for us. > >To not complicate things I'd keep current behavior of > > host_memory_backend_get_memory() > >i.e return MR for all the baked memory and then frontend > >can split and map it into guest address space as it sees fit > >using aliases for non trivial cases taking in account frontend's > >own size/label_size/whatnot properties. > >That's what NVDIMM does now, it gets MR for whole file and > >the splits it on data and label areas and maps into GA only > >data part using memory_region_init_alias(). > > > > As NVDIMM label is mentioned here, I realize a case that extending > file size would fail: > > NVDIMM labels record the namespace information of the data area and > should be persistently stored. The current vNVDIMM implementation in > QEMU always stores and finds those labels at the end of the given > backend storage. If a larger size B is given and the backend file is > extended, QEMU will not be able to find the labels stored on the > original file, and hence cannot present the same namespaces to guest. > > This problem can fixed by deferring the allocation to > host_memory_backend_get_memory(), and adding a parameter 'notrunc' to > it to specify the truncation behavior. If no truncation is specified > and B > A, QEMU will report error and stop. Lets not go there (I mean extending/deferring) and keep thing simple, backend object should be completely initialized upon completion of host_memory_backend_memory_complete() (including creation/mapping of whatever memory it's backed by) or fail if it can't do so. How about following behavior: 1) memory-backend-file,mem-path=/some_dir,size=2G - uses truncate to extend temporary file created in "mem-path" to 'size' for this case to work 'size' is mandatory 2) memory-backend-file,mem-path=/existing_file,size=2G - uses truncate to extend/shrink file "mem-path" to 'size' for this case 'size' could be made optional, if we take in account that backend could be used as persistent storage then shrinking or extending "mem-path" would be corruption as backend has no idea about internal layout if mapped file. We can do something like this here: if (is_size_opt_provided and size_of(mem-path) != 0) { error_out with "mem-path=foo size XXX doesn't match 'size=xxx' option" } else if (is_size_opt_provided and size_of(mem-path) == 0) { // may be we don't need this case and // just fold this in above error case truncate(mem-path) // extend/shrink } else { set_size_opt(size_of(mem-path)) }