qemu-devel.nongnu.org archive mirror
From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
To: "Richard W.M. Jones" <rjones@redhat.com>,
	Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel@nongnu.org, "Zhang, Haozhong" <haozhong.zhang@intel.com>
Subject: Re: [Qemu-devel] Question about vNVDIMM file format
Date: Wed, 18 May 2016 15:04:52 +0800
Message-ID: <573C1414.1050608@linux.intel.com>
In-Reply-To: <20160516182508.GQ1683@redhat.com>


Hi Rich,


On 05/17/2016 02:25 AM, Richard W.M. Jones wrote:
> On Mon, May 16, 2016 at 09:53:36AM -0700, Stefan Hajnoczi wrote:
>> On Mon, May 16, 2016 at 04:04:01PM +0100, Richard W.M. Jones wrote:
>>> I'm playing with ext4 and DAX.
>>>
>>> I'm using:
>>>
>>>    -object memory-backend-file,id=mem1,share,mem-path=/var/tmp/pmem,size=4G \
>>>    -device nvdimm,memdev=mem1,id=nv1
>>>
>>> where /var/tmp/pmem is a 4 GB ext4 filesystem image (no partition
>>> table).  I can mount this in the guest using:
>>>
>>>    mount -o dax /dev/pmem0 /mnt
>>>
>>> and everything appears to work.
>>>
>>> I read in the mailing list that the pmem file has some internal
>>> structure for storing config data, stored in the last 128 KB of the
>>> file.  Is that still the case?
>>
>> AFAICT qemu.git/master does not support the ACPI _DSM for namespace
>> configuration.  That means the entire /var/tmp/pmem should be visible.
>
> That's great, thanks both for your answers.
>
> FWIW I was able to add support to libguestfs -- at least for the
> "direct" backend where we run qemu directly.  Unfortunately libvirt
> does not support the vNVDIMM device yet.
>
> I have posted the two patches needed on our mailing list.  There seems
> to be some delay in our mail server, so they aren't in the archives
> yet:
>
>    https://www.redhat.com/archives/libguestfs/2016-May/thread.html
>
> There are a few possible problems / questions I have:
>
> (a) How necessary is the ACPI dependency?  We disable ACPI because it
> is quite slow, adding something like 150-200ms to the boot process
> (every millisecond counts for us!).  Because I previously never needed
> ACPI, I never really looked into why this is, and it could be
> something quite simple, so I'm going to look at this issue next.  I
> understand that NVDIMMs are not regular (eg) PCI devices, so ordinary
> device probing isn't going to work, and that probably answers the
> question why you need to use ACPI.

Yes, ACPI is necessary to export NVDIMM devices to the guest. The good
news is that Intel is working on a 'lite QEMU' that has only the most
basic ACPI support. Haozhong, who is CCed, is working on it.

>
> (b) Could you describe what the 3 modules (nd_btt, nd_pmem, nfit) do?
> Are all 3 modules necessary in the guest kernel?

I think the best answer comes from the kernel's Kconfig :):

ACPI_NFIT:
           Infrastructure to probe ACPI 6 compliant platforms for
           NVDIMMs (NFIT) and register a libnvdimm device tree

BTT:
           The Block Translation Table (BTT) provides atomic sector
           update semantics for persistent memory devices, so that
           applications that rely on sector writes not being torn (a
           guarantee that typical disks provide) can continue to do so.

PMEM:
           Memory ranges for PMEM are described by either an NFIT
           (NVDIMM Firmware Interface Table, see CONFIG_NFIT_ACPI), a
           non-standard OEM-specific E820 memory type (type-12, see
           CONFIG_X86_PMEM_LEGACY), or it is manually specified by the
           'memmap=nn[KMG]!ss[KMG]' kernel command line (see
           Documentation/kernel-parameters.txt).  This driver converts
           these persistent memory ranges into block devices that are
           capable of DAX (direct-access) file system mappings

Currently vNVDIMM is a pure PMEM device without labels, so BTT is
unnecessary and you can say N to BTT when configuring the Linux kernel
for the VM.
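
As a rough illustration of the above, the following sketch reports which of
the three options are enabled in a given kernel config file. The function
name and the default config path are made up for this example, and I am
assuming the usual Kconfig symbols (CONFIG_ACPI_NFIT for nfit,
CONFIG_BLK_DEV_PMEM for nd_pmem, CONFIG_BTT for nd_btt), so please check
them against your kernel tree:

```shell
# Sketch: report whether the NVDIMM-related options are enabled in a kernel
# config file. The function name and the default path are illustrative only.
check_nvdimm_config() {
    config_file="${1:-/boot/config-$(uname -r)}"
    for opt in CONFIG_ACPI_NFIT CONFIG_BLK_DEV_PMEM CONFIG_BTT; do
        # Enabled options appear as OPT=y (built in) or OPT=m (module);
        # disabled ones are absent or written as "# OPT is not set".
        if grep -Eq "^${opt}=[ym]$" "$config_file"; then
            echo "$opt enabled"
        else
            echo "$opt disabled"
        fi
    done
}
```

For a guest kernel built as described here, you would expect the first two
to be reported as enabled and CONFIG_BTT as disabled.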

>
> (c) I've got the root filesystem (which is actually ext2, but using
> the ext4.ko driver) mounted with -o dax.  What benefits / differences
> should I observe?  Just general reduced memory / page cache usage?
>

And better performance, as the slow IO path is no longer needed. :)

However, there is a potential issue: if the vNVDIMM is not backed by real
NVDIMM hardware, the data is not persistent. We are going to resolve this
by emulating PCOMMIT and doing msync properly.
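
To confirm inside the guest that a filesystem such as the one in (c) really
is mounted with DAX, something like the sketch below could be used. The
helper name is hypothetical; it only inspects the mount options in a
/proc/mounts-style table, and the exact spelling of the option may differ
between kernel versions:

```shell
# Sketch: succeed if the given mount point is mounted with a dax option.
# Reads a mounts table in /proc/mounts format; the helper name is illustrative.
is_dax_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    # Field 2 is the mount point, field 4 the comma-separated mount options.
    # Assumption: the option shows up as "dax" (or "dax=always" on newer kernels).
    awk -v mp="$mount_point" '
        $2 == mp && $4 ~ /(^|,)dax(=always)?(,|$)/ { found = 1 }
        END { exit !found }
    ' "$mounts_file"
}
```

For example, `is_dax_mounted /mnt && echo "DAX is on"` after the
`mount -o dax /dev/pmem0 /mnt` command quoted earlier in the thread.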

> (d) If, in future, you add the namespace metadata, what tools will be
> available on the host to create a packed filesystem + metadata?
> Assuming that we won't be able to export just a filesystem as I am
> doing now.

Yes, this kind of tool would be useful. We plan to write one, but it is
low priority on our TODO list. :(
