qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Haozhong Zhang <haozhong.zhang@intel.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: kwolf@redhat.com, Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Peter Crosthwaite <crosthwaite.peter@gmail.com>,
	qemu-devel@nongnu.org, mreitz@redhat.com,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid data corruption
Date: Fri, 21 Oct 2016 19:25:29 +0800	[thread overview]
Message-ID: <20161021112529.zm6ymdj6hnugvhtu@hz-desktop> (raw)
In-Reply-To: <20161021130700.7beff8d3@nial.brq.redhat.com>

On 10/21/16 13:07 +0200, Igor Mammedov wrote:
>On Fri, 21 Oct 2016 15:22:10 +0800
>Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>
>> On 10/20/16 17:35 +0200, Igor Mammedov wrote:
>> >On Thu, 20 Oct 2016 12:47:34 -0200
>> >Eduardo Habkost <ehabkost@redhat.com> wrote:
>> >
>> >> On Thu, Oct 20, 2016 at 04:15:21PM +0200, Igor Mammedov wrote:
>> >> > On Thu, 20 Oct 2016 11:56:10 -0200
>> >> > Eduardo Habkost <ehabkost@redhat.com> wrote:
>> >> >
>> >> > > On Thu, Oct 20, 2016 at 03:42:15PM +0200, Igor Mammedov wrote:
>> >> > > > On Thu, 20 Oct 2016 21:11:38 +0800
>> >> > > > Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>> >> > > >
>> >> > > > > On 10/20/16 14:34 +0200, Igor Mammedov wrote:
>> >> > > > > >On Thu, 20 Oct 2016 14:13:01 +0800
>> >> > > > > >Haozhong Zhang <haozhong.zhang@intel.com> wrote:
>> >> > > > > >
>> >> > > > > >> If a file is used as the backend of memory-backend-file and its size is
>> >> > > > > >> not identical to the property 'size', the file will be truncated. For a
>> >> > > > > >> file used as the backend of vNVDIMM, its data is expected to be
>> >> > > > > >> persistent and the truncation may corrupt the existing data.
>> >> > > > > >I wonder if it's possible just skip 'size' property in your case instead
>> >> > > > > >'notrunc' property. That way if size is not present one'd get actual size
>> >> > > > > >using get_file_size() and set 'size' to it?
>> >> > > > > >And if 'size' is provided and 'size' != file_size then error out.
>> >> > > > > >
>> >> > > > >
>> >> > > > > I don't know how this can be implemented in QEMU. Specially, how does
>> >> > > > > the memory-backend-file know it's used for vNVDIMM, so that it can
>> >> > > > > skip the 'size' property?
>> >> > > > Does memory-backend-file needs to know that it's used by NVDIMM?
>> >> > > > Looking at nvdimm_realize it doesn't as it's assumes
>> >> > > >   hostemem_size == pmem_size + label_size
>> >> > > >
>> >> > > > I'd make hostmem_file.size optional and take size from file
>> >> > > > and if 'size' is specified explictly require it to mach file size.
>> >> > > > It's generic and has nothing to do with nvdimm.
>> >> > >
>> >> > > We can take size from file, or take size from the
>> >> > > host_memory_backend_get_memory() callers.
>> >> > >
>> >> > > Enumerating all sizes that QEMU can use as input:
>> >> > >
>> >> > > A) Backend file size
>> >> > > B) memory backend "size" option
>> >> > > C) frontend-provided size (-numa size, -m, or pc-dimm "size"
>> >> > >     property)
>> >> > -numa size affect only anon memory not backend backed one, for
>> >> > backend baked memory we use memdev where size comes from backend
>> >> >
>> >> > pc-dimm.size is readonly and isn't supposed to influence backend.size
>> >> >
>> >> > I'd drop C option
>> >>
>> >> If C is not present, it should be, as it affects the guest ABI
>> >> (and the ABI must never depend on the host you are running or
>> >> backend configuration, only on the frontend configuration).
>> >I've meant that C should not affect behavior of backend.
>> >
>> >> If we are dropping -numa size in favor of the
>> >> memory-backend-provided size, that's a bug.
>> >-numa size is not applicable here as it's not using backends,
>> >when backends are used it's -numa memdev instead in which case
>> >   numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL);
>> >
>> >>
>> >> >
>> >> > >
>> >> > > My suggestion is:
>> >> > > * B should be optional.
>> >> > > * If B is omitted, we should never truncate the file to a smaller
>> >> > >   size.
>> >> > i.e. derive backend.size from filesize if possible (i.e. not hugepages)
>> >> >
>> >> > > * If B is omitted, we can use C as the size when mapping the
>> >> > >   file.
>> >> > frontend size is the size that's mapped into guest address space.
>> >> > it should not influence backend's size in backward direction.
>> >> > You may notice pc-dimm.size is not user settable (readonly) property.
>> >>
>> >> Frontend size will not influence backend size, it will just
>> >> affect the size of the memory region the frontend code will ask
>> >> the backend to provide.
>> >>
>> >> In other words: I believe host_memory_backend_get_memory() needs
>> >> a 'size' argument, and that memory allocation could be optionally
>> >> delayed to the host_memory_backend_get_memory() call. This way,
>> >> we don't need a backend size at all, unless we want the backend
>> >> to truncate files or preallocate memory for us.
>> >To not complicate things I'd keep current behavior of
>> >  host_memory_backend_get_memory()
>> >i.e return MR for all the baked memory and then frontend
>> >can split and map it into guest address space as it sees fit
>> >using aliases for non trivial cases taking in account frontend's
>> >own size/label_size/whatnot properties.
>> >That's what NVDIMM does now, it gets MR for whole file and
>> >the splits it on data and label areas and maps into GA only
>> >data part using memory_region_init_alias().
>> >
>>
>> As NVDIMM label is mentioned here, I realize a case that extending
>> file size would fail:
>>
>> NVDIMM labels record the namespace information of the data area and
>> should be persistently stored. The current vNVDIMM implementation in
>> QEMU always stores and finds those labels at the end of the given
>> backend storage. If a larger size B is given and the backend file is
>> extended, QEMU will not be able to find the labels stored on the
>> original file, and hence cannot present the same namespaces to guest.
>>
>> This problem can fixed by deferring the allocation to
>> host_memory_backend_get_memory(), and adding a parameter 'notrunc' to
>> it to specify the truncation behavior. If no truncation is specified
>> and B > A, QEMU will report error and stop.
>Lets not go there (I mean extending/deferring) and keep thing simple,
>backend object should be completely initialized upon completion of
>host_memory_backend_memory_complete() (including creation/mapping of
>whatever memory it's backed by) or fail if it can't do so.
>
>
>How about following behavior:
>
> 1) memory-backend-file,mem-path=/some_dir,size=2G
>    - uses truncate to extend temporary file created in "mem-path" to 'size'
>    for this case to work 'size' is mandatory
>
> 2) memory-backend-file,mem-path=/existing_file,size=2G
>    - uses truncate to extend/shrink file "mem-path" to 'size'
>    for this case 'size' could be made optional,
>    if we take in account that backend could be used as persistent
>    storage then shrinking or extending "mem-path" would be corruption
>    as backend has no idea about internal layout if mapped file.
>    We can do something like this here:
>
>    if (is_size_opt_provided and size_of(mem-path) != 0) {
>       error_out with "mem-path=foo size XXX doesn't match 'size=xxx' option"
>    } else if (is_size_opt_provided and size_of(mem-path) == 0) {
>       // may be we don't need this case and
>       // just fold this in above error case
>       truncate(mem-path) // extend/shrink
>    } else {
>      set_size_opt(size_of(mem-path))
>    }

Agree. Let's wait for Eduardo's comments. If he has no objection, I'll
prepare a patch in this way.

Thanks,
Haozhong

  reply	other threads:[~2016-10-21 11:25 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-20  6:13 [Qemu-devel] [PATCH] hostmem-file: add a property 'notrunc' to avoid data corruption Haozhong Zhang
2016-10-20  6:35 ` no-reply
2016-10-20 12:34 ` Igor Mammedov
2016-10-20 13:11   ` Haozhong Zhang
2016-10-20 13:34     ` Eduardo Habkost
2016-10-20 13:47       ` Haozhong Zhang
2016-10-20 13:42     ` Igor Mammedov
2016-10-20 13:56       ` Eduardo Habkost
2016-10-20 14:15         ` Igor Mammedov
2016-10-20 14:47           ` Eduardo Habkost
2016-10-20 15:35             ` Igor Mammedov
2016-10-20 16:56               ` Eduardo Habkost
2016-10-21  9:31                 ` Igor Mammedov
2016-10-21 11:53                   ` Eduardo Habkost
2016-10-21 13:26                     ` Igor Mammedov
2016-10-21  7:22               ` Haozhong Zhang
2016-10-21 11:07                 ` Igor Mammedov
2016-10-21 11:25                   ` Haozhong Zhang [this message]
2016-10-21 11:56                   ` Eduardo Habkost
2016-10-20 14:22         ` Haozhong Zhang
2016-10-20 15:14           ` Eduardo Habkost
2016-10-20 13:56       ` Haozhong Zhang
2016-10-20 13:21   ` Eduardo Habkost
2016-10-20 13:33     ` Haozhong Zhang
2016-10-20 13:47       ` Eduardo Habkost
2016-10-20 14:17         ` Igor Mammedov
2016-10-20 15:15           ` Eduardo Habkost
2016-10-20 15:41             ` Igor Mammedov
2016-10-20 16:59               ` Eduardo Habkost
2016-10-21 10:28                 ` Igor Mammedov
2016-10-21 11:44                   ` Eduardo Habkost
2016-10-20 13:47       ` Igor Mammedov
2016-10-20 13:57         ` Eduardo Habkost
2016-10-20 14:18           ` Igor Mammedov
2016-10-20 15:00             ` Eduardo Habkost
2016-10-20 15:14               ` Igor Mammedov
2016-10-20 13:27   ` Daniel P. Berrange
2016-10-20 13:40     ` Eduardo Habkost
2016-10-20 13:54     ` Igor Mammedov
2016-10-20 13:55   ` Kevin Wolf
2016-10-24 13:10     ` Eduardo Habkost
2016-10-25  6:42       ` Haozhong Zhang
2016-10-25 10:01         ` Eduardo Habkost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161021112529.zm6ymdj6hnugvhtu@hz-desktop \
    --to=haozhong.zhang@intel.com \
    --cc=crosthwaite.peter@gmail.com \
    --cc=ehabkost@redhat.com \
    --cc=guangrong.xiao@linux.intel.com \
    --cc=imammedo@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).