From: Haozhong Zhang <haozhong.zhang@intel.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
kwolf@redhat.com, Xiao Guangrong <guangrong.xiao@linux.intel.com>,
Peter Crosthwaite <crosthwaite.peter@gmail.com>,
qemu-devel@nongnu.org, mreitz@redhat.com,
Paolo Bonzini <pbonzini@redhat.com>,
Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid data corruption
Date: Fri, 21 Oct 2016 15:22:10 +0800 [thread overview]
Message-ID: <20161021072210.sxfwixky4idlmzni@hz-desktop> (raw)
In-Reply-To: <20161020173538.1ee9a556@nial.brq.redhat.com>
On 10/20/16 17:35 +0200, Igor Mammedov wrote:
>On Thu, 20 Oct 2016 12:47:34 -0200
>Eduardo Habkost <ehabkost@redhat.com> wrote:
>
>> On Thu, Oct 20, 2016 at 04:15:21PM +0200, Igor Mammedov wrote:
>> > On Thu, 20 Oct 2016 11:56:10 -0200
>> > Eduardo Habkost <ehabkost@redhat.com> wrote:
>> >
>> > > On Thu, Oct 20, 2016 at 03:42:15PM +0200, Igor Mammedov wrote:
>> > > > On Thu, 20 Oct 2016 21:11:38 +0800
>> > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>> > > >
>> > > > > On 10/20/16 14:34 +0200, Igor Mammedov wrote:
>> > > > > >On Thu, 20 Oct 2016 14:13:01 +0800
>> > > > > >Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>> > > > > >
>> > > > > >> If a file is used as the backend of memory-backend-file and its size is
>> > > > > >> not identical to the property 'size', the file will be truncated. For a
>> > > > > >> file used as the backend of vNVDIMM, its data is expected to be
>> > > > > >> persistent and the truncation may corrupt the existing data.
>> > > > > >I wonder if it's possible just skip 'size' property in your case instead
>> > > > > >'notrunc' property. That way if size is not present one'd get actual size
>> > > > > >using get_file_size() and set 'size' to it?
>> > > > > >And if 'size' is provided and 'size' != file_size then error out.
>> > > > > >
>> > > > >
>> > > > > I don't know how this can be implemented in QEMU. Specially, how does
>> > > > > the memory-backend-file know it's used for vNVDIMM, so that it can
>> > > > > skip the 'size' property?
>> > > > Does memory-backend-file needs to know that it's used by NVDIMM?
>> > > > Looking at nvdimm_realize it doesn't as it's assumes
>> > > > hostemem_size == pmem_size + label_size
>> > > >
>> > > > I'd make hostmem_file.size optional and take size from file
>> > > > and if 'size' is specified explictly require it to mach file size.
>> > > > It's generic and has nothing to do with nvdimm.
>> > >
>> > > We can take size from file, or take size from the
>> > > host_memory_backend_get_memory() callers.
>> > >
>> > > Enumerating all sizes that QEMU can use as input:
>> > >
>> > > A) Backend file size
>> > > B) memory backend "size" option
>> > > C) frontend-provided size (-numa size, -m, or pc-dimm "size"
>> > > property)
>> > -numa size affect only anon memory not backend backed one, for
>> > backend baked memory we use memdev where size comes from backend
>> >
>> > pc-dimm.size is readonly and isn't supposed to influence backend.size
>> >
>> > I'd drop C option
>>
>> If C is not present, it should be, as it affects the guest ABI
>> (and the ABI must never depend on the host you are running or
>> backend configuration, only on the frontend configuration).
>I've meant that C should not affect behavior of backend.
>
>> If we are dropping -numa size in favor of the
>> memory-backend-provided size, that's a bug.
>-numa size is not applicable here as it's not using backends,
>when backends are used it's -numa memdev instead in which case
> numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL);
>
>>
>> >
>> > >
>> > > My suggestion is:
>> > > * B should be optional.
>> > > * If B is omitted, we should never truncate the file to a smaller
>> > > size.
>> > i.e. derive backend.size from filesize if possible (i.e. not hugepages)
>> >
>> > > * If B is omitted, we can use C as the size when mapping the
>> > > file.
>> > frontend size is the size that's mapped into guest address space.
>> > it should not influence backend's size in backward direction.
>> > You may notice pc-dimm.size is not user settable (readonly) property.
>>
>> Frontend size will not influence backend size, it will just
>> affect the size of the memory region the frontend code will ask
>> the backend to provide.
>>
>> In other words: I believe host_memory_backend_get_memory() needs
>> a 'size' argument, and that memory allocation could be optionally
>> delayed to the host_memory_backend_get_memory() call. This way,
>> we don't need a backend size at all, unless we want the backend
>> to truncate files or preallocate memory for us.
>To not complicate things I'd keep current behavior of
> host_memory_backend_get_memory()
>i.e return MR for all the baked memory and then frontend
>can split and map it into guest address space as it sees fit
>using aliases for non trivial cases taking in account frontend's
>own size/label_size/whatnot properties.
>That's what NVDIMM does now, it gets MR for whole file and
>the splits it on data and label areas and maps into GA only
>data part using memory_region_init_alias().
>
As NVDIMM label is mentioned here, I realize a case that extending
file size would fail:
NVDIMM labels record the namespace information of the data area and
should be persistently stored. The current vNVDIMM implementation in
QEMU always stores and finds those labels at the end of the given
backend storage. If a larger size B is given and the backend file is
extended, QEMU will not be able to find the labels stored on the
original file, and hence cannot present the same namespaces to guest.
This problem can fixed by deferring the allocation to
host_memory_backend_get_memory(), and adding a parameter 'notrunc' to
it to specify the truncation behavior. If no truncation is specified
and B > A, QEMU will report error and stop.
In the long term, we will allow the labels to be stored in separated
files, and extending the data file will be allowed.
Haozhong
[..]
next prev parent reply other threads:[~2016-10-21 7:22 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-20 6:13 [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid data corruption Haozhong Zhang
2016-10-20 6:35 ` no-reply
2016-10-20 12:34 ` Igor Mammedov
2016-10-20 13:11 ` Haozhong Zhang
2016-10-20 13:34 ` Eduardo Habkost
2016-10-20 13:47 ` Haozhong Zhang
2016-10-20 13:42 ` Igor Mammedov
2016-10-20 13:56 ` Eduardo Habkost
2016-10-20 14:15 ` Igor Mammedov
2016-10-20 14:47 ` Eduardo Habkost
2016-10-20 15:35 ` Igor Mammedov
2016-10-20 16:56 ` Eduardo Habkost
2016-10-21 9:31 ` Igor Mammedov
2016-10-21 11:53 ` Eduardo Habkost
2016-10-21 13:26 ` Igor Mammedov
2016-10-21 7:22 ` Haozhong Zhang [this message]
2016-10-21 11:07 ` Igor Mammedov
2016-10-21 11:25 ` Haozhong Zhang
2016-10-21 11:56 ` Eduardo Habkost
2016-10-20 14:22 ` Haozhong Zhang
2016-10-20 15:14 ` Eduardo Habkost
2016-10-20 13:56 ` Haozhong Zhang
2016-10-20 13:21 ` Eduardo Habkost
2016-10-20 13:33 ` Haozhong Zhang
2016-10-20 13:47 ` Eduardo Habkost
2016-10-20 14:17 ` Igor Mammedov
2016-10-20 15:15 ` Eduardo Habkost
2016-10-20 15:41 ` Igor Mammedov
2016-10-20 16:59 ` Eduardo Habkost
2016-10-21 10:28 ` Igor Mammedov
2016-10-21 11:44 ` Eduardo Habkost
2016-10-20 13:47 ` Igor Mammedov
2016-10-20 13:57 ` Eduardo Habkost
2016-10-20 14:18 ` Igor Mammedov
2016-10-20 15:00 ` Eduardo Habkost
2016-10-20 15:14 ` Igor Mammedov
2016-10-20 13:27 ` Daniel P. Berrange
2016-10-20 13:40 ` Eduardo Habkost
2016-10-20 13:54 ` Igor Mammedov
2016-10-20 13:55 ` Kevin Wolf
2016-10-24 13:10 ` Eduardo Habkost
2016-10-25 6:42 ` Haozhong Zhang
2016-10-25 10:01 ` Eduardo Habkost
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161021072210.sxfwixky4idlmzni@hz-desktop \
--to=haozhong.zhang@intel.com \
--cc=crosthwaite.peter@gmail.com \
--cc=ehabkost@redhat.com \
--cc=guangrong.xiao@linux.intel.com \
--cc=imammedo@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).