From: Haozhong Zhang <haozhong.zhang@intel.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: kwolf@redhat.com, Xiao Guangrong <guangrong.xiao@linux.intel.com>,
Eduardo Habkost <ehabkost@redhat.com>,
Peter Crosthwaite <crosthwaite.peter@gmail.com>,
qemu-devel@nongnu.org, mreitz@redhat.com,
Paolo Bonzini <pbonzini@redhat.com>,
Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid data corruption
Date: Fri, 21 Oct 2016 19:25:29 +0800 [thread overview]
Message-ID: <20161021112529.zm6ymdj6hnugvhtu@hz-desktop> (raw)
In-Reply-To: <20161021130700.7beff8d3@nial.brq.redhat.com>
On 10/21/16 13:07 +0200, Igor Mammedov wrote:
>On Fri, 21 Oct 2016 15:22:10 +0800
>Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>
>> On 10/20/16 17:35 +0200, Igor Mammedov wrote:
>> >On Thu, 20 Oct 2016 12:47:34 -0200
>> >Eduardo Habkost <ehabkost@redhat.com> wrote:
>> >
>> >> On Thu, Oct 20, 2016 at 04:15:21PM +0200, Igor Mammedov wrote:
>> >> > On Thu, 20 Oct 2016 11:56:10 -0200
>> >> > Eduardo Habkost <ehabkost@redhat.com> wrote:
>> >> >
>> >> > > On Thu, Oct 20, 2016 at 03:42:15PM +0200, Igor Mammedov wrote:
>> >> > > > On Thu, 20 Oct 2016 21:11:38 +0800
>> >> > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>> >> > > >
>> >> > > > > On 10/20/16 14:34 +0200, Igor Mammedov wrote:
>> >> > > > > >On Thu, 20 Oct 2016 14:13:01 +0800
>> >> > > > > >Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>> >> > > > > >
>> >> > > > > >> If a file is used as the backend of memory-backend-file and its size is
>> >> > > > > >> not identical to the property 'size', the file will be truncated. For a
>> >> > > > > >> file used as the backend of vNVDIMM, its data is expected to be
>> >> > > > > >> persistent and the truncation may corrupt the existing data.
>> >> > > > > >I wonder if it's possible just skip 'size' property in your case instead
>> >> > > > > >'notrunc' property. That way if size is not present one'd get actual size
>> >> > > > > >using get_file_size() and set 'size' to it?
>> >> > > > > >And if 'size' is provided and 'size' != file_size then error out.
>> >> > > > > >
>> >> > > > >
>> >> > > > > I don't know how this can be implemented in QEMU. Specially, how does
>> >> > > > > the memory-backend-file know it's used for vNVDIMM, so that it can
>> >> > > > > skip the 'size' property?
>> >> > > > Does memory-backend-file needs to know that it's used by NVDIMM?
>> >> > > > Looking at nvdimm_realize it doesn't as it's assumes
>> >> > > > hostemem_size == pmem_size + label_size
>> >> > > >
>> >> > > > I'd make hostmem_file.size optional and take size from file
>> >> > > > and if 'size' is specified explictly require it to mach file size.
>> >> > > > It's generic and has nothing to do with nvdimm.
>> >> > >
>> >> > > We can take size from file, or take size from the
>> >> > > host_memory_backend_get_memory() callers.
>> >> > >
>> >> > > Enumerating all sizes that QEMU can use as input:
>> >> > >
>> >> > > A) Backend file size
>> >> > > B) memory backend "size" option
>> >> > > C) frontend-provided size (-numa size, -m, or pc-dimm "size"
>> >> > > property)
>> >> > -numa size affect only anon memory not backend backed one, for
>> >> > backend baked memory we use memdev where size comes from backend
>> >> >
>> >> > pc-dimm.size is readonly and isn't supposed to influence backend.size
>> >> >
>> >> > I'd drop C option
>> >>
>> >> If C is not present, it should be, as it affects the guest ABI
>> >> (and the ABI must never depend on the host you are running or
>> >> backend configuration, only on the frontend configuration).
>> >I've meant that C should not affect behavior of backend.
>> >
>> >> If we are dropping -numa size in favor of the
>> >> memory-backend-provided size, that's a bug.
>> >-numa size is not applicable here as it's not using backends,
>> >when backends are used it's -numa memdev instead in which case
>> > numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL);
>> >
>> >>
>> >> >
>> >> > >
>> >> > > My suggestion is:
>> >> > > * B should be optional.
>> >> > > * If B is omitted, we should never truncate the file to a smaller
>> >> > > size.
>> >> > i.e. derive backend.size from filesize if possible (i.e. not hugepages)
>> >> >
>> >> > > * If B is omitted, we can use C as the size when mapping the
>> >> > > file.
>> >> > frontend size is the size that's mapped into guest address space.
>> >> > it should not influence backend's size in backward direction.
>> >> > You may notice pc-dimm.size is not user settable (readonly) property.
>> >>
>> >> Frontend size will not influence backend size, it will just
>> >> affect the size of the memory region the frontend code will ask
>> >> the backend to provide.
>> >>
>> >> In other words: I believe host_memory_backend_get_memory() needs
>> >> a 'size' argument, and that memory allocation could be optionally
>> >> delayed to the host_memory_backend_get_memory() call. This way,
>> >> we don't need a backend size at all, unless we want the backend
>> >> to truncate files or preallocate memory for us.
>> >To not complicate things I'd keep current behavior of
>> > host_memory_backend_get_memory()
>> >i.e return MR for all the baked memory and then frontend
>> >can split and map it into guest address space as it sees fit
>> >using aliases for non trivial cases taking in account frontend's
>> >own size/label_size/whatnot properties.
>> >That's what NVDIMM does now, it gets MR for whole file and
>> >the splits it on data and label areas and maps into GA only
>> >data part using memory_region_init_alias().
>> >
>>
>> As NVDIMM label is mentioned here, I realize a case that extending
>> file size would fail:
>>
>> NVDIMM labels record the namespace information of the data area and
>> should be persistently stored. The current vNVDIMM implementation in
>> QEMU always stores and finds those labels at the end of the given
>> backend storage. If a larger size B is given and the backend file is
>> extended, QEMU will not be able to find the labels stored on the
>> original file, and hence cannot present the same namespaces to guest.
>>
>> This problem can fixed by deferring the allocation to
>> host_memory_backend_get_memory(), and adding a parameter 'notrunc' to
>> it to specify the truncation behavior. If no truncation is specified
>> and B > A, QEMU will report error and stop.
>Lets not go there (I mean extending/deferring) and keep thing simple,
>backend object should be completely initialized upon completion of
>host_memory_backend_memory_complete() (including creation/mapping of
>whatever memory it's backed by) or fail if it can't do so.
>
>
>How about following behavior:
>
> 1) memory-backend-file,mem-path=/some_dir,size=2G
> - uses truncate to extend temporary file created in "mem-path" to 'size'
> for this case to work 'size' is mandatory
>
> 2) memory-backend-file,mem-path=/existing_file,size=2G
> - uses truncate to extend/shrink file "mem-path" to 'size'
> for this case 'size' could be made optional,
> if we take in account that backend could be used as persistent
> storage then shrinking or extending "mem-path" would be corruption
> as backend has no idea about internal layout if mapped file.
> We can do something like this here:
>
> if (is_size_opt_provided and size_of(mem-path) != 0) {
> error_out with "mem-path=foo size XXX doesn't match 'size=xxx' option"
> } else if (is_size_opt_provided and size_of(mem-path) == 0) {
> // may be we don't need this case and
> // just fold this in above error case
> truncate(mem-path) // extend/shrink
> } else {
> set_size_opt(size_of(mem-path))
> }
Agree. Let's wait for Eduardo's comments. If he has no objection, I'll
prepare a patch in this way.
Thanks,
Haozhong
next prev parent reply other threads:[~2016-10-21 11:25 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-20 6:13 [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid data corruption Haozhong Zhang
2016-10-20 6:35 ` no-reply
2016-10-20 12:34 ` Igor Mammedov
2016-10-20 13:11 ` Haozhong Zhang
2016-10-20 13:34 ` Eduardo Habkost
2016-10-20 13:47 ` Haozhong Zhang
2016-10-20 13:42 ` Igor Mammedov
2016-10-20 13:56 ` Eduardo Habkost
2016-10-20 14:15 ` Igor Mammedov
2016-10-20 14:47 ` Eduardo Habkost
2016-10-20 15:35 ` Igor Mammedov
2016-10-20 16:56 ` Eduardo Habkost
2016-10-21 9:31 ` Igor Mammedov
2016-10-21 11:53 ` Eduardo Habkost
2016-10-21 13:26 ` Igor Mammedov
2016-10-21 7:22 ` Haozhong Zhang
2016-10-21 11:07 ` Igor Mammedov
2016-10-21 11:25 ` Haozhong Zhang [this message]
2016-10-21 11:56 ` Eduardo Habkost
2016-10-20 14:22 ` Haozhong Zhang
2016-10-20 15:14 ` Eduardo Habkost
2016-10-20 13:56 ` Haozhong Zhang
2016-10-20 13:21 ` Eduardo Habkost
2016-10-20 13:33 ` Haozhong Zhang
2016-10-20 13:47 ` Eduardo Habkost
2016-10-20 14:17 ` Igor Mammedov
2016-10-20 15:15 ` Eduardo Habkost
2016-10-20 15:41 ` Igor Mammedov
2016-10-20 16:59 ` Eduardo Habkost
2016-10-21 10:28 ` Igor Mammedov
2016-10-21 11:44 ` Eduardo Habkost
2016-10-20 13:47 ` Igor Mammedov
2016-10-20 13:57 ` Eduardo Habkost
2016-10-20 14:18 ` Igor Mammedov
2016-10-20 15:00 ` Eduardo Habkost
2016-10-20 15:14 ` Igor Mammedov
2016-10-20 13:27 ` Daniel P. Berrange
2016-10-20 13:40 ` Eduardo Habkost
2016-10-20 13:54 ` Igor Mammedov
2016-10-20 13:55 ` Kevin Wolf
2016-10-24 13:10 ` Eduardo Habkost
2016-10-25 6:42 ` Haozhong Zhang
2016-10-25 10:01 ` Eduardo Habkost
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161021112529.zm6ymdj6hnugvhtu@hz-desktop \
--to=haozhong.zhang@intel.com \
--cc=crosthwaite.peter@gmail.com \
--cc=ehabkost@redhat.com \
--cc=guangrong.xiao@linux.intel.com \
--cc=imammedo@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).