xen-devel.lists.xenproject.org archive mirror
From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>,
	Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	xen-devel@lists.xen.org, Jan Beulich <JBeulich@suse.com>,
	Jun Nakajima <jun.nakajima@intel.com>, Keir Fraser <keir@xen.org>
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
Date: Thu, 21 Jan 2016 00:25:08 +0800	[thread overview]
Message-ID: <569FB4E4.4040204@linux.intel.com> (raw)
In-Reply-To: <20160120154749.GD1742@char.us.oracle.com>



On 01/20/2016 11:47 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 20, 2016 at 11:29:55PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>>>>> On 20.01.16 at 12:04, <haozhong.zhang@intel.com> wrote:
>>>> On 01/20/16 01:46, Jan Beulich wrote:
>>>>>>>> On 20.01.16 at 06:31, <haozhong.zhang@intel.com> wrote:
>>>>>> Secondly, the driver implements a convenient block device interface to
>>>>>> let software access areas where NVDIMM devices are mapped. The
>>>>>> existing vNVDIMM implementation in QEMU uses this interface.
>>>>>>
>>>>>> As Linux NVDIMM driver has already done above, why do we bother to
>>>>>> reimplement them in Xen?
>>>>>
>>>>> See above; a possibility is that we may need a split model (block
>>>>> layer parts on Dom0, "normal memory" parts in the hypervisor.
>>>>> Iirc the split is being determined by firmware, and hence set in
>>>>> stone by the time OS (or hypervisor) boot starts.
>>>>
>>>> For the "normal memory" parts, do you mean parts that map the host
>>>> NVDIMM device's address space range to the guest? I'm going to
>>>> implement that part in hypervisor and expose it as a hypercall so that
>>>> it can be used by QEMU.
>>>
>>> To answer this I need to have my understanding of the partitioning
>>> being done by firmware confirmed: If that's the case, then "normal"
>>> means the part that doesn't get exposed as a block device (SSD).
>>> In any event there's no correlation to guest exposure here.
>>
>> Firmware does not manage NVDIMM. All the operations of nvdimm are handled
>> by OS.
>>
>> Actually, there are lots of things we should take into account if we move
>> the NVDIMM management to hypervisor:
>
> If you remove the block device part and just deal with pmem part then this
> gets smaller.
>

Yes indeed. But Xen cannot benefit from NVDIMM BLK, so I think it is not a
long-term plan. :)

> Also the _DSM operations - I can't see them being in hypervisor - but only
> in the dom0 - which would have the right software to tickle the correct
> ioctl on /dev/pmem to do the "management" (carve the NVDIMM, perform
> an SMART operation, etc).

Yes, it is reasonable to put it in dom0, and it keeps the management tools happy.

>
>> a) ACPI NFIT interpretation
>>     A new ACPI table introduced in ACPI 6.0 is named NFIT which exports the
>>     base information of NVDIMM devices which includes PMEM info, PBLK
>>     info, nvdimm device interleave, vendor info, etc. Let me explain it one
>>     by one.
>
> And it is a static table. As in part of the MADT.

Yes, it is, but if an NVDIMM device is hotplugged we need to fetch the updated
NVDIMM info from _FIT in SSDT/DSDT instead; please see below.
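To make the static-table point concrete: whoever interprets NFIT (dom0 or the hypervisor) has to walk a list of variable-length sub-structures, each starting with a 16-bit Type and a 16-bit Length, that follows the fixed table header. A minimal sketch of that walk, assuming a raw NFIT body already stripped of the 36-byte ACPI header and 4 reserved bytes (the buffer here is synthetic, not real firmware data):

```python
import struct

def walk_nfit_substructures(body: bytes):
    """Return (type, raw_bytes) for each NFIT sub-structure in order.

    Each sub-structure begins with a little-endian u16 Type and u16 Length;
    e.g. Type 0 is an SPA Range structure, Type 1 a Memory Device to SPA
    mapping, Type 4 an NVDIMM Control Region.
    """
    off, entries = 0, []
    while off + 4 <= len(body):
        stype, slen = struct.unpack_from("<HH", body, off)
        if slen < 4 or off + slen > len(body):
            raise ValueError("malformed NFIT sub-structure")
        entries.append((stype, body[off:off + slen]))
        off += slen
    return entries

# Synthetic body: an 8-byte Type-0 entry followed by a 6-byte Type-4 entry.
body = (struct.pack("<HH4s", 0, 8, b"\x00" * 4) +
        struct.pack("<HH2s", 4, 6, b"\x00" * 2))
print([t for t, _ in walk_nfit_substructures(body)])  # [0, 4]
```

The same walk applies unchanged to the buffer _FIT returns after a hotplug, which is why re-parsing in dom0 is cheap.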

>>
>>     PMEM and PBLK are two modes to access NVDIMM devices:
>>     1) PMEM can be treated as NV-RAM which is directly mapped to CPU's address
>>        space so that CPU can r/w it directly.
>>     2) as NVDIMM has huge capability and CPU's address space is limited, NVDIMM
>>        only offers two windows which are mapped to CPU's address space, the data
>>        window and access window, so that CPU can use these two windows to access
>>        the whole NVDIMM device.
>>
>>     NVDIMM device is interleaved whose info is also exported so that we can
>>     calculate the address to access the specified NVDIMM device.
>
> Right, along with the serial numbers.
>>
>>     NVDIMM devices from different vendor can have different function so that the
>>     vendor info is exported by NFIT to make vendor's driver work.
>
> via _DSM right?

Yes.

>>
>> b) ACPI SSDT interpretation
>>     SSDT offers _DSM method which controls NVDIMM device, such as label operation,
>>     health check etc and hotplug support.
>
> Sounds like the control domain (dom0) would be in charge of that.

Yup. Dom0 is a better place to handle it.

>>
>> c) Resource management
>>     NVDIMM resource management challenged as:
>>     1) PMEM is huge and it is little slower access than RAM so it is not suitable
>>        to manage it as page struct (i think it is not a big problem in Xen
>>        hypervisor?)
>>     2) need to partition it to it be used in multiple VMs.
>>     3) need to support PBLK and partition it in the future.
>
> That all sounds to me like an control domain (dom0) decisions. Not Xen hypervisor.

Sure, letting dom0 handle this is better; we are on the same page. :)
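For point 2 above, the dom0-side partitioning amounts to carving one large PMEM system-physical-address range into contiguous per-VM sub-ranges. A first-fit sketch, with purely illustrative base/size values (the function name and policy are assumptions, not an existing toolstack API):

```python
def carve_pmem(base: int, size: int, requests):
    """Assign contiguous sub-ranges of [base, base+size) to VMs, first-fit.

    requests is a list of (vm_name, bytes_wanted); returns a dict mapping
    vm_name -> (start_address, length).
    """
    plan, cursor = {}, base
    for vm, want in requests:
        if cursor + want > base + size:
            raise MemoryError(f"not enough PMEM left for {vm}")
        plan[vm] = (cursor, want)
        cursor += want
    return plan

# Illustrative: a 1 GiB PMEM region at 4 GiB, split between two guests.
plan = carve_pmem(0x1_0000_0000, 0x4000_0000,
                  [("vm1", 0x1000_0000), ("vm2", 0x1000_0000)])
```

A real toolstack would also track alignment and persist the assignments (e.g. via NVDIMM labels), but the bookkeeping itself is this simple, which is part of why dom0 rather than the hypervisor is the natural place for it.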

>>
>> d) management tools support
>>     S.M.A.R.T? error detection and recovering?
>>
>> c) hotplug support
>
> How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
> to scan. That would require the hypervisor also reading this for it to
> update it's data-structures.

Similar to what you said. The NVDIMM root device in SSDT/DSDT provides a new
interface, _FIT, which returns the new NFIT once a new device is hotplugged. And
yes, domain 0 is the better place to handle this case too.
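The dom0-side flow sketched above reduces to: on the ACPI hotplug notification, re-evaluate _FIT, re-parse the returned NFIT, and diff the set of NVDIMM device handles against what is already known. A toy sketch of that diff step (the handle values are made up; evaluating _FIT itself is left out since it goes through the ACPI interpreter):

```python
def diff_nvdimm_handles(known: set, current: set):
    """Compare device-handle sets before/after a _FIT re-evaluation.

    Returns (hotplugged, removed) so the caller can update its
    data structures and notify the hypervisor/toolstack as needed.
    """
    return current - known, known - current

known = {0x0001, 0x0002}            # handles parsed from the boot-time NFIT
current = {0x0001, 0x0002, 0x0003}  # handles from the NFIT that _FIT returned
added, removed = diff_nvdimm_handles(known, current)
print(sorted(added), sorted(removed))  # [3] []
```

Only the delta then needs to be propagated, which keeps the hypervisor out of the ACPI interpretation path entirely.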

>>
>> d) third parts drivers
>>     Vendor drivers need to be ported to xen hypervisor and let it be supported in
>>     the management tool.
>
> Ewww.
>
> I presume the 'third party drivers' mean more interesting _DSM features right?

Yes.

> On the base level the firmware with this type of NVDIMM would still have
> the basic - ACPI NFIT + E820_NVDIMM (optional).
>>

Yes.


Thread overview: 88+ messages
2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
2015-12-29 11:31 ` [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb Haozhong Zhang
2015-12-29 15:46   ` Andrew Cooper
2015-12-30  1:35     ` Haozhong Zhang
2015-12-30  2:16       ` Haozhong Zhang
2015-12-30 10:33         ` Andrew Cooper
2015-12-29 11:31 ` [PATCH 2/4] x86/hvm: add support for pcommit instruction Haozhong Zhang
2015-12-29 11:31 ` [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm' Haozhong Zhang
2016-01-04 11:16   ` Wei Liu
2016-01-06 12:40   ` Jan Beulich
2016-01-06 15:28     ` Haozhong Zhang
2015-12-29 11:31 ` [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu Haozhong Zhang
2016-01-15 17:10   ` Jan Beulich
2016-01-18  0:52     ` Haozhong Zhang
2016-01-18  8:46       ` Jan Beulich
2016-01-19 11:37         ` Wei Liu
2016-01-19 11:46           ` Jan Beulich
2016-01-20  5:14             ` Tian, Kevin
2016-01-20  5:58               ` Zhang, Haozhong
2016-01-20  5:31         ` Haozhong Zhang
2016-01-20  8:46           ` Jan Beulich
2016-01-20  8:58             ` Andrew Cooper
2016-01-20 10:15               ` Haozhong Zhang
2016-01-20 10:36                 ` Xiao Guangrong
2016-01-20 13:16                   ` Andrew Cooper
2016-01-20 14:29                     ` Stefano Stabellini
2016-01-20 14:42                       ` Haozhong Zhang
2016-01-20 14:45                       ` Andrew Cooper
2016-01-20 14:53                         ` Haozhong Zhang
2016-01-20 15:13                           ` Konrad Rzeszutek Wilk
2016-01-20 15:29                             ` Haozhong Zhang
2016-01-20 15:41                               ` Konrad Rzeszutek Wilk
2016-01-20 15:54                                 ` Haozhong Zhang
2016-01-21  3:35                                 ` Bob Liu
2016-01-20 15:05                         ` Stefano Stabellini
2016-01-20 18:14                           ` Andrew Cooper
2016-01-20 14:38                     ` Haozhong Zhang
2016-01-20 11:04             ` Haozhong Zhang
2016-01-20 11:20               ` Jan Beulich
2016-01-20 15:29                 ` Xiao Guangrong
2016-01-20 15:47                   ` Konrad Rzeszutek Wilk
2016-01-20 16:25                     ` Xiao Guangrong [this message]
2016-01-20 16:47                       ` Konrad Rzeszutek Wilk
2016-01-20 16:55                         ` Xiao Guangrong
2016-01-20 17:18                           ` Konrad Rzeszutek Wilk
2016-01-20 17:23                             ` Xiao Guangrong
2016-01-20 17:48                               ` Konrad Rzeszutek Wilk
2016-01-21  3:12                             ` Haozhong Zhang
2016-01-20 17:07                   ` Jan Beulich
2016-01-20 17:17                     ` Xiao Guangrong
2016-01-21  8:18                       ` Jan Beulich
2016-01-21  8:25                         ` Xiao Guangrong
2016-01-21  8:53                           ` Jan Beulich
2016-01-21  9:10                             ` Xiao Guangrong
2016-01-21  9:29                               ` Andrew Cooper
2016-01-21 10:26                                 ` Jan Beulich
2016-01-21 10:25                               ` Jan Beulich
2016-01-21 14:01                                 ` Haozhong Zhang
2016-01-21 14:52                                   ` Jan Beulich
2016-01-22  2:43                                     ` Haozhong Zhang
2016-01-26 11:44                                     ` George Dunlap
2016-01-26 12:44                                       ` Jan Beulich
2016-01-26 12:54                                         ` Juergen Gross
2016-01-26 14:44                                           ` Konrad Rzeszutek Wilk
2016-01-26 15:37                                             ` Jan Beulich
2016-01-26 15:57                                               ` Haozhong Zhang
2016-01-26 16:34                                                 ` Jan Beulich
2016-01-26 19:32                                                   ` Konrad Rzeszutek Wilk
2016-01-27  7:22                                                     ` Haozhong Zhang
2016-01-27 10:16                                                     ` Jan Beulich
2016-01-27 14:50                                                       ` Konrad Rzeszutek Wilk
2016-01-27 10:55                                                   ` George Dunlap
2016-01-26 13:58                                         ` George Dunlap
2016-01-26 14:46                                           ` Konrad Rzeszutek Wilk
2016-01-26 15:30                                         ` Haozhong Zhang
2016-01-26 15:33                                           ` Haozhong Zhang
2016-01-26 15:57                                           ` Jan Beulich
2016-01-27  2:23                                             ` Haozhong Zhang
2016-01-20 15:07               ` Konrad Rzeszutek Wilk
2016-01-06 15:37 ` [PATCH 0/4] add support for vNVDIMM Ian Campbell
2016-01-06 15:47   ` Haozhong Zhang
2016-01-20  3:28 ` Tian, Kevin
2016-01-20 12:43   ` Stefano Stabellini
2016-01-20 14:26     ` Zhang, Haozhong
2016-01-20 14:35       ` Stefano Stabellini
2016-01-20 14:47         ` Zhang, Haozhong
2016-01-20 14:54           ` Andrew Cooper
2016-01-20 15:59             ` Haozhong Zhang
