From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xiao Guangrong
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables
 from qemu
Date: Thu, 21 Jan 2016 00:25:08 +0800
Message-ID: <569FB4E4.4040204@linux.intel.com>
References: <1451388711-18646-1-git-send-email-haozhong.zhang@intel.com>
 <1451388711-18646-5-git-send-email-haozhong.zhang@intel.com>
 <5699362402000078000C7803@prv-mh.provo.novell.com>
 <20160118005255.GC3528@hz-desktop.sh.intel.com>
 <569CB47502000078000C7CFB@prv-mh.provo.novell.com>
 <20160120053132.GA5005@hz-desktop.sh.intel.com>
 <569F575902000078000C8EDC@prv-mh.provo.novell.com>
 <20160120110449.GD4939@hz-desktop.sh.intel.com>
 <569F7B8302000078000C8FF8@prv-mh.provo.novell.com>
 <569FA7F3.8080506@linux.intel.com>
 <20160120154749.GD1742@char.us.oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20160120154749.GD1742@char.us.oracle.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Haozhong Zhang, Kevin Tian, Wei Liu, Ian Campbell, Stefano Stabellini,
 Andrew Cooper, Ian Jackson, xen-devel@lists.xen.org, Jan Beulich,
 Jun Nakajima, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 01/20/2016 11:47 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 20, 2016 at 11:29:55PM +0800, Xiao Guangrong wrote:
>>
>>
>> On 01/20/2016 07:20 PM, Jan Beulich wrote:
>>>>>> On 20.01.16 at 12:04, wrote:
>>>> On 01/20/16 01:46, Jan Beulich wrote:
>>>>>>>> On 20.01.16 at 06:31, wrote:
>>>>>> Secondly, the driver implements a convenient block device interface to
>>>>>> let software access areas where NVDIMM devices are mapped. The
>>>>>> existing vNVDIMM implementation in QEMU uses this interface.
>>>>>>
>>>>>> As the Linux NVDIMM driver has already done the above, why do we
>>>>>> bother to reimplement it in Xen?
>>>>>
>>>>> See above; a possibility is that we may need a split model (block
>>>>> layer parts on Dom0, "normal memory" parts in the hypervisor).
>>>>> Iirc the split is being determined by firmware, and hence set in
>>>>> stone by the time OS (or hypervisor) boot starts.
>>>>
>>>> For the "normal memory" parts, do you mean parts that map the host
>>>> NVDIMM device's address space range to the guest? I'm going to
>>>> implement that part in the hypervisor and expose it as a hypercall so
>>>> that it can be used by QEMU.
>>>
>>> To answer this I need to have my understanding of the partitioning
>>> being done by firmware confirmed: If that's the case, then "normal"
>>> means the part that doesn't get exposed as a block device (SSD).
>>> In any event there's no correlation to guest exposure here.
>>
>> Firmware does not manage NVDIMM. All the operations on an NVDIMM are
>> handled by the OS.
>>
>> Actually, there are lots of things we should take into account if we move
>> the NVDIMM management into the hypervisor:
>
> If you remove the block device part and just deal with the pmem part then
> this gets smaller.
>

Yes, indeed. But Xen cannot benefit from NVDIMM BLK; I think it is not
a long-term plan. :)

> Also the _DSM operations - I can't see them being in hypervisor - but only
> in the dom0 - which would have the right software to tickle the correct
> ioctl on /dev/pmem to do the "management" (carve the NVDIMM, perform
> an SMART operation, etc).

Yes, it is reasonable to put it in dom0, and it makes management tools happy.

>
>> a) ACPI NFIT interpretation
>> A new ACPI table introduced in ACPI 6.0, named NFIT, exports the base
>> information of NVDIMM devices, which includes PMEM info, PBLK info,
>> NVDIMM device interleave, vendor info, etc. Let me explain it one
>> by one.
>
> And it is a static table. As in part of the MADT.

Yes, it is, but we need to fetch updated NVDIMM info from _FIT in the
SSDT/DSDT instead if an NVDIMM device is hotplugged; please see below.
>>
>> PMEM and PBLK are two modes to access NVDIMM devices:
>> 1) PMEM can be treated as NV-RAM which is directly mapped into the CPU's
>> address space so that the CPU can r/w it directly.
>> 2) as an NVDIMM has huge capacity and the CPU's address space is limited,
>> NVDIMM only offers two windows which are mapped into the CPU's address
>> space, the data window and the access window, so that the CPU can use
>> these two windows to access the whole NVDIMM device.
>>
>> The NVDIMM device is interleaved; this info is also exported so that we
>> can calculate the address needed to access the specified NVDIMM device.
>
> Right, along with the serial numbers.
>>
>> NVDIMM devices from different vendors can have different functions, so
>> the vendor info is exported by NFIT to make the vendor's driver work.
>
> via _DSM right?

Yes.

>>
>> b) ACPI SSDT interpretation
>> The SSDT offers the _DSM method, which controls the NVDIMM device: label
>> operations, health checks, hotplug support, etc.
>
> Sounds like the control domain (dom0) would be in charge of that.

Yup. Dom0 is a better place to handle it.

>>
>> c) Resource management
>> NVDIMM resource management is challenging because:
>> 1) PMEM is huge and slightly slower to access than RAM, so it is not
>> suitable to manage it with page structs (I think it is not a big problem
>> in the Xen hypervisor?)
>> 2) we need to partition it so it can be used by multiple VMs.
>> 3) we need to support PBLK and partition it in the future.
>
> That all sounds to me like control domain (dom0) decisions. Not the Xen
> hypervisor.

Sure, letting dom0 handle this is better; we are on the same page. :)

>>
>> d) management tools support
>> S.M.A.R.T? error detection and recovery?
>>
>> e) hotplug support
>
> How does that work? Ah the _DSM will point to the new ACPI NFIT for the OS
> to scan. That would require the hypervisor also reading this for it to
> update its data-structures.

Similar to what you said.
The NVDIMM root device in the SSDT/DSDT provides a new interface, _FIT,
which returns the new NFIT once a new device is hotplugged. And yes, dom0
is the better place to handle this case too.

>>
>> f) third-party drivers
>> Vendor drivers need to be ported to the xen hypervisor and be supported
>> by the management tool.
>
> Ewww.
>
> I presume the 'third party drivers' mean more interesting _DSM features right?

Yes.

> On the base level the firmware with this type of NVDIMM would still have
> the basic - ACPI NFIT + E820_NVDIMM (optional).
>>

Yes.