qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Milan Zamazal <mzamazal@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"Liu, Jingqi" <jingqi.liu@intel.com>,
	"Lai, Paul C" <paul.c.lai@intel.com>,
	Cornelia Huck <cohuck@redhat.com>, Arik Hadas <ahadas@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Amnon Ilan <ailan@redhat.com>
Subject: Re: Adjustments of NVDIMM devices and future data safety
Date: Wed, 05 May 2021 22:46:11 +0200	[thread overview]
Message-ID: <87r1ikewoc.fsf@redhat.com> (raw)
In-Reply-To: <20210503160939.30047450@redhat.com> (Igor Mammedov's message of "Mon, 3 May 2021 16:09:39 +0200")

Igor Mammedov <imammedo@redhat.com> writes:

> On Fri, 30 Apr 2021 14:18:30 +0200
> Milan Zamazal <mzamazal@redhat.com> wrote:
>
>> Hi,
>> 
>> I work on NVDIMM support in oVirt/RHV, I think other virtualization
>> management software built on top of QEMU may have similar concerns.
>> 
>> When a virtual NVDIMM device size is specified, it's not necessarily the
>> eventual NVDIMM device size visible to the guest OS.  As seen in
>> https://github.com/qemu/qemu/blob/v6.0.0/hw/mem/nvdimm.c#L117, QEMU
>> makes some adjustments (other adjustments are performed by libvirt but
>> that's a topic for a different forum):
>> 
>> - NVDIMM label size is subtracted from the NVDIMM size.
>> 
>> - NVDIMM label is pointed to a certain memory region.
>> 
>> - The remaining NVDIMM size is aligned down.
>> 
>> There are some related potential problems:
>> 
>> - If the alignment rules change in a future QEMU version, it may result
>>   in a different device size visible to the guest (even if the requested
>>   size remains the same) and cause trouble there up to data loss.
>> 
>> - If the layout on the backing device changes, e.g. a label placement,
>>   then the stored data may become corrupt or inaccessible.
>
> We usually tie ABI changes to machine versions, so if in future we decide to
> change NVDIMM layout, we should preserve old layout for old machine types
> (which is accomplished using compat mechanism)

This could still be a problem if a layout change happened silently with
a machine type version change, since at least in oVirt/RHV, we update
machine types with new versions.

>> - I'm not sure about the current QEMU version, but at least in previous
>>   QEMU versions, the resulting size is important for memory hot plug.
>>   The NVDIMM alignment size is smaller than the required regular memory
>>   DIMM placement alignment.  If a VM contains an NVDIMM with the
>>   resulting size not matching the DIMM placement requirements and a
>>   memory hot plug is attempted then the hot plug fails because the DIMM
>>   is mapped next to the end of the NVDIMM region, which is not
>>   DIMM-aligned.
>
>
> I vaguely recall that, start address of any hotplugged (NV)DIMM
> is aligned on 1G boundary (only the very first versions of memory
> hotplug used unaligned address). Described above situation shouldn't happen.

I'm sure I've experienced the alignment problem, with a failing memory
hot plug.  I'll try to check it again, with a newer QEMU version.

> I'd try to fix alignment issues first if there is any before talking about
> splitting label out.
>
>
>> All this means:
>> 
>> - The requested NVDIMM size must be computed and specified carefully,
>>   with attention to QEMU internal implementation.
>> 
>> - And because it depends on QEMU internal implementation, there is a
>>   risk of malfunction or data loss when the same backing device with the
>>   same parameters is used with a future QEMU version.
>
> When making incompatible changes, we usually add a new property to enable them,
> so normally situation when NVDIMM "with the same parameters" is used
> should lead to old behaviour.

OK.

> If we change 'default' values then as long as one uses versioned machine
> type, old behaviour will be preserved with future QEMU.
> However if one uses un-versioned/another machine type or enables new behavior,
> QEMU doesn't guarantee any compatibility.

I see.  So it's best to pin each VM with NVDIMM devices to a particular
machine type version.

A question is what to do when it is desirable to update the VM to a
newer machine type version.  Is the only safe way to do it to backup
NVDIMM data and restore it from the backup after the VM restart?  I
guess incompatible NVDIMM changes are not going to be that frequent so
it looks like a bit user-unfriendly precaution.

>> As for labels, I was told NVDIMM labels might be put to regular files in
>> future to avoid some problems.  Since label placement is not visible to
>> the guest, such a change could be made transparently without disrupting
>> access to the data.  (As long as the label data is transferred to the
>> new location properly and undesirable resulting NVDIMM size changes are
>> not induced by such a change.)
>
> I think current approach resembles real nvdimm devices
> (the only problem is that one has to configure size/label size,
> where with real devices it's done by manufacturer).

Yes.

> if we add a dedicated property, It should be possible to split label out
> into a separate file.
> However I don't fancy carrying transparent migration from old format to
> the new one in QEMU, I think it should be done by separate utility. So
> that if users have access to backend + and know used label size,
> they should be able split it.

I think this would be an acceptable limitation.

> Also I'm not sure that splitting label out is fixing anything, it just
> replaces one set of rules how to set size/label (assuming there is one)
> with another + user will have to manage 1 more backend (content + label).
>  
>> The primary point is still how to ensure that data kept on a backing
>> device will remain accessible and safe in future QEMU versions and how
>> to possibly avoid reliance on QEMU implementation details.  A big
>> warning in the NVDIMM handling source code to keep backward
>> compatibility (incl. memory hot plugs) and data safety on mind before
>> making any changes there might be a reasonable minimum measure.
>> Any additional ideas?  What do you think about it all?
> we usually are trying to keep compatibility (versioned or new features
> are disabled by default and user has to explicitly enable them)
> when making breaking changes.
> (and that is done without extra warnings in the code,
> otherwise QEMU will be full of them).

Thank you for clarification.  I wonder whether an upper layer, such as
libvirt, could in theory assist with prevention of a disaster some way,
by playing with enabled features or so.  NVDIMM situation is somewhat
specific in that it contains data that can be damaged by an incompatible
change.  Which makes it more fragile than some other compatibility
problems fixable e.g. by restarting the VM with the original machine
type version or with updated parameters.

>> Thank you,
>> Milan
>> 
>> 



  reply	other threads:[~2021-05-05 20:48 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-30 12:18 Adjustments of NVDIMM devices and future data safety Milan Zamazal
2021-05-03 14:09 ` Igor Mammedov
2021-05-05 20:46   ` Milan Zamazal [this message]
2021-05-08  7:30 ` Liu, Jingqi
2021-05-18 15:29   ` Milan Zamazal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1ikewoc.fsf@redhat.com \
    --to=mzamazal@redhat.com \
    --cc=ahadas@redhat.com \
    --cc=ailan@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=jingqi.liu@intel.com \
    --cc=paul.c.lai@intel.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).