xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Juergen Gross <JGross@suse.com>,
	Haozhong Zhang <haozhong.zhang@intel.com>,
	Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Xiao Guangrong <guangrong.xiao@linux.intel.com>,
	Keir Fraser <keir@xen.org>
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
Date: Wed, 27 Jan 2016 09:50:16 -0500	[thread overview]
Message-ID: <20160127145016.GC552@char.us.oracle.com> (raw)
In-Reply-To: <56A8A72B02000078000CB765@prv-mh.provo.novell.com>

On Wed, Jan 27, 2016 at 03:16:59AM -0700, Jan Beulich wrote:
> >>> On 26.01.16 at 20:32, <konrad.wilk@oracle.com> wrote:
> > On Tue, Jan 26, 2016 at 09:34:13AM -0700, Jan Beulich wrote:
> >> >>> On 26.01.16 at 16:57, <haozhong.zhang@intel.com> wrote:
> >> > On 01/26/16 08:37, Jan Beulich wrote:
> >> >> >>> On 26.01.16 at 15:44, <konrad.wilk@oracle.com> wrote:
> >> >> >>  Last year at Linux Plumbers Conference I attended a session dedicated
> >> >> >> to NVDIMM support. I asked the very same question and the INTEL guy
> >> >> >> there told me there is indeed something like a partition table meant
> >> >> >> to describe the layout of the memory areas and their contents.
> >> >> > 
> >> >> > It is described in details at pmem.io, look at  Documents, see
> >> >> > http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf see Namespaces section.
> >> >> 
> >> >> Well, that's about how PMEM and PBLK ranges get marked, but not
> >> >> about how use of the space inside a PMEM range is coordinated.
> >> >>
> >> > 
> >> > How a NVDIMM is partitioned into pmem and pblk is described by ACPI NFIT 
> >> > table.
> >> > Namespace to pmem is something like partition table to disk.
> >> 
> >> But I'm talking about sub-dividing the space inside an individual
> >> PMEM range.
> > 
> > The namespaces are it.
> > 
> > Once you have done them you can mount the PMEM range under say /dev/pmem0
> > and then put a filesystem on it (ext4, xfs) - and enable DAX support.
> > The DAX just means that the FS will bypass the page cache and write directly
> > to the virtual address.
> > 
> > then one can create giant 'dd' images on this filesystem and pass it
> > to QEMU to .. expose as NVDIMM to the guest. Because it is a file - the 
> > blocks
> > (or MFNs) for the contents of the file are most certainly discontingous.
> 
> And what's the advantage of this over PBLK? I.e. why would one
> want to separate PMEM and PBLK ranges if everything gets used
> the same way anyway?

Speed. PBLK emulates hardware - by having a sliding window of the DIMM. The
OS can only write to a ring-buffer with the system address and the payload
(64bytes I think?) - and the hardware (or firmware) picks it up and does the
writes to NVDIMM.

The only motivation behind this is to deal with errors. Normal PMEM writes
do not report errors. As in if the media is busted - the hardware will engage its
remap logic and write somewhere else - until all of its remap blocks have
been exhausted. At that point writes (I presume, not sure) and reads will report
an error - but via an #MCE. 

Part of this Xen design will be how to handle that :-)

With an PBLK - I presume the hardware/firmware will read the block after it has
written it - and if there are errors it will report it right away. Which means
you can easily hook PBLK nicely in RAID setups right away. It will be slower
than PMEM, but it does give you the normal error reporting. That is until
the MCE#->OS->fs errors logic gets figured out.

The MCE# logic code is being developed right now by Tony Luck on LKML - and
the last I saw the MCE# has the system address - and the MCE code would tag
the pages with some bit so that the applications would get a signal.

> 
> Jan
> 

  reply	other threads:[~2016-01-27 14:50 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
2015-12-29 11:31 ` [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb Haozhong Zhang
2015-12-29 15:46   ` Andrew Cooper
2015-12-30  1:35     ` Haozhong Zhang
2015-12-30  2:16       ` Haozhong Zhang
2015-12-30 10:33         ` Andrew Cooper
2015-12-29 11:31 ` [PATCH 2/4] x86/hvm: add support for pcommit instruction Haozhong Zhang
2015-12-29 11:31 ` [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm' Haozhong Zhang
2016-01-04 11:16   ` Wei Liu
2016-01-06 12:40   ` Jan Beulich
2016-01-06 15:28     ` Haozhong Zhang
2015-12-29 11:31 ` [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu Haozhong Zhang
2016-01-15 17:10   ` Jan Beulich
2016-01-18  0:52     ` Haozhong Zhang
2016-01-18  8:46       ` Jan Beulich
2016-01-19 11:37         ` Wei Liu
2016-01-19 11:46           ` Jan Beulich
2016-01-20  5:14             ` Tian, Kevin
2016-01-20  5:58               ` Zhang, Haozhong
2016-01-20  5:31         ` Haozhong Zhang
2016-01-20  8:46           ` Jan Beulich
2016-01-20  8:58             ` Andrew Cooper
2016-01-20 10:15               ` Haozhong Zhang
2016-01-20 10:36                 ` Xiao Guangrong
2016-01-20 13:16                   ` Andrew Cooper
2016-01-20 14:29                     ` Stefano Stabellini
2016-01-20 14:42                       ` Haozhong Zhang
2016-01-20 14:45                       ` Andrew Cooper
2016-01-20 14:53                         ` Haozhong Zhang
2016-01-20 15:13                           ` Konrad Rzeszutek Wilk
2016-01-20 15:29                             ` Haozhong Zhang
2016-01-20 15:41                               ` Konrad Rzeszutek Wilk
2016-01-20 15:54                                 ` Haozhong Zhang
2016-01-21  3:35                                 ` Bob Liu
2016-01-20 15:05                         ` Stefano Stabellini
2016-01-20 18:14                           ` Andrew Cooper
2016-01-20 14:38                     ` Haozhong Zhang
2016-01-20 11:04             ` Haozhong Zhang
2016-01-20 11:20               ` Jan Beulich
2016-01-20 15:29                 ` Xiao Guangrong
2016-01-20 15:47                   ` Konrad Rzeszutek Wilk
2016-01-20 16:25                     ` Xiao Guangrong
2016-01-20 16:47                       ` Konrad Rzeszutek Wilk
2016-01-20 16:55                         ` Xiao Guangrong
2016-01-20 17:18                           ` Konrad Rzeszutek Wilk
2016-01-20 17:23                             ` Xiao Guangrong
2016-01-20 17:48                               ` Konrad Rzeszutek Wilk
2016-01-21  3:12                             ` Haozhong Zhang
2016-01-20 17:07                   ` Jan Beulich
2016-01-20 17:17                     ` Xiao Guangrong
2016-01-21  8:18                       ` Jan Beulich
2016-01-21  8:25                         ` Xiao Guangrong
2016-01-21  8:53                           ` Jan Beulich
2016-01-21  9:10                             ` Xiao Guangrong
2016-01-21  9:29                               ` Andrew Cooper
2016-01-21 10:26                                 ` Jan Beulich
2016-01-21 10:25                               ` Jan Beulich
2016-01-21 14:01                                 ` Haozhong Zhang
2016-01-21 14:52                                   ` Jan Beulich
2016-01-22  2:43                                     ` Haozhong Zhang
2016-01-26 11:44                                     ` George Dunlap
2016-01-26 12:44                                       ` Jan Beulich
2016-01-26 12:54                                         ` Juergen Gross
2016-01-26 14:44                                           ` Konrad Rzeszutek Wilk
2016-01-26 15:37                                             ` Jan Beulich
2016-01-26 15:57                                               ` Haozhong Zhang
2016-01-26 16:34                                                 ` Jan Beulich
2016-01-26 19:32                                                   ` Konrad Rzeszutek Wilk
2016-01-27  7:22                                                     ` Haozhong Zhang
2016-01-27 10:16                                                     ` Jan Beulich
2016-01-27 14:50                                                       ` Konrad Rzeszutek Wilk [this message]
2016-01-27 10:55                                                   ` George Dunlap
2016-01-26 13:58                                         ` George Dunlap
2016-01-26 14:46                                           ` Konrad Rzeszutek Wilk
2016-01-26 15:30                                         ` Haozhong Zhang
2016-01-26 15:33                                           ` Haozhong Zhang
2016-01-26 15:57                                           ` Jan Beulich
2016-01-27  2:23                                             ` Haozhong Zhang
2016-01-20 15:07               ` Konrad Rzeszutek Wilk
2016-01-06 15:37 ` [PATCH 0/4] add support for vNVDIMM Ian Campbell
2016-01-06 15:47   ` Haozhong Zhang
2016-01-20  3:28 ` Tian, Kevin
2016-01-20 12:43   ` Stefano Stabellini
2016-01-20 14:26     ` Zhang, Haozhong
2016-01-20 14:35       ` Stefano Stabellini
2016-01-20 14:47         ` Zhang, Haozhong
2016-01-20 14:54           ` Andrew Cooper
2016-01-20 15:59             ` Haozhong Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160127145016.GC552@char.us.oracle.com \
    --to=konrad.wilk@oracle.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=JBeulich@suse.com \
    --cc=JGross@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=guangrong.xiao@linux.intel.com \
    --cc=haozhong.zhang@intel.com \
    --cc=ian.campbell@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=jun.nakajima@intel.com \
    --cc=keir@xen.org \
    --cc=kevin.tian@intel.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).