From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>,
Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
Ian Campbell <ian.campbell@citrix.com>,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
Jun Nakajima <jun.nakajima@intel.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>,
xen-devel@lists.xen.org, Jan Beulich <JBeulich@suse.com>,
Keir Fraser <keir@xen.org>
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
Date: Wed, 20 Jan 2016 10:47:49 -0500
Message-ID: <20160120154749.GD1742@char.us.oracle.com>
In-Reply-To: <569FA7F3.8080506@linux.intel.com>
On Wed, Jan 20, 2016 at 11:29:55PM +0800, Xiao Guangrong wrote:
>
>
> On 01/20/2016 07:20 PM, Jan Beulich wrote:
> >>>>On 20.01.16 at 12:04, <haozhong.zhang@intel.com> wrote:
> >>On 01/20/16 01:46, Jan Beulich wrote:
> >>>>>>On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
> >>>>Secondly, the driver implements a convenient block device interface to
> >>>>let software access areas where NVDIMM devices are mapped. The
> >>>>existing vNVDIMM implementation in QEMU uses this interface.
> >>>>
> >>>>As the Linux NVDIMM driver has already done all of the above, why should
> >>>>we bother to reimplement it in Xen?
> >>>
> >>>See above; a possibility is that we may need a split model (block
> >>>layer parts in Dom0, "normal memory" parts in the hypervisor).
> >>>IIRC the split is determined by firmware, and hence set in
> >>>stone by the time the OS (or hypervisor) boot starts.
> >>
> >>For the "normal memory" parts, do you mean the parts that map the host
> >>NVDIMM device's address space range into the guest? I'm going to
> >>implement that part in the hypervisor and expose it as a hypercall so
> >>that it can be used by QEMU.
> >
> >To answer this I need to have my understanding of the partitioning
> >being done by firmware confirmed: If that's the case, then "normal"
> >means the part that doesn't get exposed as a block device (SSD).
> >In any event there's no correlation to guest exposure here.
>
> Firmware does not manage the NVDIMM. All NVDIMM operations are handled
> by the OS.
>
> Actually, there are lots of things we would have to take into account if we
> moved NVDIMM management into the hypervisor:
If you remove the block device part and just deal with the pmem part, then this
gets smaller.
Also the _DSM operations - I can't see them being in the hypervisor - but only
in dom0, which would have the right software to tickle the correct
ioctl on /dev/pmem to do the "management" (carve up the NVDIMM, perform
a SMART operation, etc).
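
As a rough illustration of that dom0-only management path (the device node
usage, the ioctl number and the payload layout below are invented for the
sketch - in Linux the real management ioctls live on the libnvdimm character
devices rather than on /dev/pmem itself):

/* Hedged sketch only: a dom0 management tool "tickling" an ioctl on a
 * pmem device.  NVDIMM_HEALTH_QUERY and struct nvdimm_health are
 * hypothetical; they do not exist in the Linux UAPI. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct nvdimm_health {                /* hypothetical payload */
    unsigned int flags;
    unsigned int spare_percent;
    unsigned int temperature;
};

#define NVDIMM_HEALTH_QUERY _IOR('N', 0x01, struct nvdimm_health) /* hypothetical */

int main(void)
{
    struct nvdimm_health h = { 0 };
    int fd = open("/dev/pmem0", O_RDWR);

    if (fd < 0)
        return 1;
    if (ioctl(fd, NVDIMM_HEALTH_QUERY, &h) == 0)
        printf("spare %u%%, temperature %u\n", h.spare_percent, h.temperature);
    close(fd);
    return 0;
}

The point is only that all of this stays in dom0; the hypervisor never needs
to understand what the command payloads mean.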
> a) ACPI NFIT interpretation
> A new ACPI table introduced in ACPI 6.0, named NFIT, exports the
> basic information about NVDIMM devices: PMEM info, PBLK info,
> NVDIMM device interleaving, vendor info, etc. Let me explain these one
> by one.
And it is a static table - like the MADT.
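
For reference, the overall shape is a table header followed by type/length
tagged sub-structures; a simplified C view (field names abbreviated, not the
exact ACPICA definitions - see the NFIT chapter of the ACPI 6.0 spec for the
authoritative layout) would be:

#include <stdint.h>

struct nfit_subtable_header {
    uint16_t type;     /* 0 = SPA range, 1 = memdev-to-SPA map, 2 = interleave, ... */
    uint16_t length;   /* total length of this sub-structure */
};

struct nfit_spa_range {               /* type 0: one PMEM or PBLK address range */
    struct nfit_subtable_header hdr;
    uint16_t range_index;
    uint16_t flags;
    uint32_t reserved;
    uint32_t proximity_domain;
    uint8_t  range_type_guid[16];     /* distinguishes PMEM vs. PBLK control/data windows */
    uint64_t base;                    /* system physical address */
    uint64_t length;                  /* size in bytes */
    uint64_t memory_attributes;
};

The "interpretation" step is just walking these sub-structures, advancing
hdr.length bytes at a time after the standard ACPI table header.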
>
> PMEM and PBLK are two modes of accessing NVDIMM devices:
> 1) PMEM can be treated as NV-RAM that is directly mapped into the CPU's address
> space, so the CPU can read/write it directly.
> 2) As an NVDIMM can have a huge capacity while the CPU's address space is
> limited, PBLK only offers two windows mapped into the CPU's address space, the
> data window and the access window, so the CPU can use these two windows to
> access the whole NVDIMM device.
>
> NVDIMM devices can be interleaved; the interleaving info is also exported so
> that we can calculate the address needed to access a specific NVDIMM device.
Right, along with the serial numbers.
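
The effect of the interleave data, in very simplified form (real NFIT
interleave structures also carry per-line offset tables; this sketch assumes a
uniform N-way interleave with a fixed line size):

#include <stdint.h>

struct interleave_target {
    unsigned int dimm;        /* which DIMM in the interleave set */
    uint64_t     dpa_offset;  /* offset within that DIMM's contribution */
};

/* Map an offset within an interleaved SPA range to (DIMM, DIMM-local offset). */
static struct interleave_target spa_to_dimm(uint64_t spa_offset,
                                            unsigned int num_ways,
                                            uint64_t line_size)
{
    uint64_t line = spa_offset / line_size;
    struct interleave_target t = {
        .dimm       = (unsigned int)(line % num_ways),
        .dpa_offset = (line / num_ways) * line_size + spa_offset % line_size,
    };
    return t;
}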
>
> NVDIMM devices from different vendors can have different functions, so the
> vendor info is exported by NFIT to make the vendor's driver work.
via _DSM right?
>
> b) ACPI SSDT interpretation
> The SSDT offers the _DSM method, which controls the NVDIMM device: label
> operations, health checks, hotplug support, etc.
Sounds like the control domain (dom0) would be in charge of that.
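
In Linux terms that means the dom0 nvdimm driver evaluating _DSM through
ACPICA, roughly as sketched below (the exact acpi_evaluate_dsm() prototype and
the function index vary by kernel version and DSM spec revision, so treat this
as illustrative only):

#include <linux/acpi.h>

/* Sketch: evaluate an NVDIMM _DSM from the dom0 kernel driver.  The UUID
 * and the function index follow the (assumed) Intel NVDIMM DSM interface
 * document; check the spec before relying on either value. */
static int query_nvdimm_health(acpi_handle handle, const u8 *nvdimm_dsm_uuid)
{
    union acpi_object *out;

    /* revision 1, function 1: get SMART/health information (assumed index) */
    out = acpi_evaluate_dsm(handle, nvdimm_dsm_uuid, 1, 1, NULL);
    if (!out)
        return -ENXIO;

    /* out->buffer holds the vendor-defined health payload */
    ACPI_FREE(out);
    return 0;
}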
>
> c) Resource management
> NVDIMM resource management is challenging because:
> 1) PMEM is huge and a little slower to access than RAM, so it is not suitable
> to manage it via struct page (I think this is not a big problem in the Xen
> hypervisor?)
> 2) it needs to be partitioned so that it can be used by multiple VMs.
> 3) PBLK needs to be supported and partitioned in the future.
That all sounds to me like control domain (dom0) decisions, not Xen hypervisor ones.
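
One way to picture that split: dom0 carves the host PMEM range into per-VM
chunks and only asks the hypervisor to establish the mappings. In the sketch
below, xen_map_pmem_to_guest() is a placeholder for whatever hypercall wrapper
ends up being defined; it is not an existing Xen interface:

#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL

struct pmem_chunk {
    uint64_t gfn_base;    /* guest frame where the chunk should appear */
    uint64_t mfn_base;    /* host frame backing it */
    uint64_t nr_frames;
};

/* Placeholder for a to-be-defined hypercall wrapper (assumption, not a real API). */
int xen_map_pmem_to_guest(int domid, const struct pmem_chunk *c);

/* dom0 policy: give each of nr_vms an equal, page-aligned slice of the range. */
static int carve_pmem(uint64_t host_base, uint64_t chunk_bytes,
                      int nr_vms, uint64_t guest_base)
{
    for (int i = 0; i < nr_vms; i++) {
        struct pmem_chunk c = {
            .gfn_base  = guest_base / PAGE_SIZE_4K,
            .mfn_base  = (host_base + (uint64_t)i * chunk_bytes) / PAGE_SIZE_4K,
            .nr_frames = chunk_bytes / PAGE_SIZE_4K,
        };

        if (xen_map_pmem_to_guest(i + 1 /* domid */, &c))
            return -1;
    }
    return 0;
}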
>
> d) management tools support
> S.M.A.R.T.? error detection and recovery?
>
> e) hotplug support
How does that work? Ah, the _DSM will point to the new ACPI NFIT for the OS
to scan. That would require the hypervisor to also read this so that it can
update its data structures.
>
> f) third-party drivers
> Vendor drivers would need to be ported to the Xen hypervisor and supported in
> the management tools.
Ewww.
I presume the 'third-party drivers' means more interesting _DSM features, right?
At the base level the firmware with this type of NVDIMM would still have
the basics - ACPI NFIT + E820_NVDIMM (optional).
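
A sketch of what that base-level enumeration amounts to: walk the E820 map for
persistent-memory entries (ACPI 6.0 assigns type 7 to them) and then read the
NFIT for the details:

#include <stdint.h>
#include <stdio.h>

#define E820_TYPE_PMEM 7   /* ACPI 6.0 "AddressRangePersistentMemory" */

struct e820_entry {
    uint64_t base;
    uint64_t size;
    uint32_t type;
} __attribute__((packed));

/* Print every persistent-memory range found in an E820 map. */
static void report_pmem(const struct e820_entry *map, unsigned int nr_entries)
{
    for (unsigned int i = 0; i < nr_entries; i++) {
        if (map[i].type != E820_TYPE_PMEM)
            continue;
        printf("pmem: %#llx - %#llx\n",
               (unsigned long long)map[i].base,
               (unsigned long long)(map[i].base + map[i].size - 1));
    }
}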
>
> g) ...