From: "Zhang, Haozhong" <haozhong.zhang@intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
Ian Campbell <ian.campbell@citrix.com>,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Ian Jackson <ian.jackson@eu.citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <JBeulich@suse.com>, Keir Fraser <keir@xen.org>
Subject: Re: [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu
Date: Wed, 20 Jan 2016 13:58:21 +0800 [thread overview]
Message-ID: <20160120055821.GB5005@hz-desktop.sh.intel.com> (raw)
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D15F7861C5@SHSMSX101.ccr.corp.intel.com>
On 01/20/16 13:14, Tian, Kevin wrote:
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Tuesday, January 19, 2016 7:47 PM
> >
> > >>> On 19.01.16 at 12:37, <wei.liu2@citrix.com> wrote:
> > > On Mon, Jan 18, 2016 at 01:46:29AM -0700, Jan Beulich wrote:
> > >> >>> On 18.01.16 at 01:52, <haozhong.zhang@intel.com> wrote:
> > >> > On 01/15/16 10:10, Jan Beulich wrote:
> > >> >> >>> On 29.12.15 at 12:31, <haozhong.zhang@intel.com> wrote:
> > >> >> > NVDIMM devices are detected and configured by software through
> > >> >> > ACPI. Currently, QEMU maintains ACPI tables of vNVDIMM devices. This
> > >> >> > patch extends the existing mechanism in hvmloader of loading passthrough
> > >> >> > ACPI tables to load extra ACPI tables built by QEMU.
> > >> >>
> > >> >> Mechanically the patch looks okay, but whether it's actually needed
> > >> >> depends on whether indeed we want NV RAM managed in qemu
> > >> >> instead of in the hypervisor (where imo it belongs); I didn't see any
> > >> >> reply yet to that same comment of mine made (iirc) in the context
> > >> >> of another patch.
> > >> >
> > >> > One purpose of this patch series is to provide vNVDIMM backed by host
> > >> > NVDIMM devices. It requires some drivers to detect and manage host
> > >> > NVDIMM devices (including parsing ACPI, managing labels, etc.) that
> > >> > are not trivial, so I leave this work to dom0 Linux. The current
> > >> > Linux kernel abstracts NVDIMM devices as block devices
> > >> > (/dev/pmemXX). QEMU then mmaps them into a certain range of dom0's
> > >> > address space and asks the Xen hypervisor to map that range of
> > >> > address space to a domU.
> > >> >
> > >
> > > OOI Do we have a viable solution to do all these non-trivial things in
> > > the core hypervisor? Are you proposing designing a new set of hypercalls
> > > for NVDIMM?
> >
> > That's certainly a possibility; I lack sufficient detail to form an
> > opinion on which route is going to be best.
> >
> > Jan
>
> Hi, Haozhong,
>
> Are the NVDIMM-related ACPI tables in plain-text format, or do they require
> an ACPI parser to decode? Is there a corresponding E820 entry?
>
Most are in plain-text format, but the driver still evaluates the _FIT
(firmware interface table) method, so some decoding is needed there.
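
For reference, evaluating _FIT itself is only a few lines against the
ACPICA API; the part that needs decoding is the buffer it returns. Below
is a simplified sketch of that step (not the actual Linux nfit driver
code), kernel context, where handle refers to the NVDIMM root device
(ACPI0012) and parse_nfit_structures() is just a placeholder for the
decode work:

  static int evaluate_fit(acpi_handle handle)
  {
          struct acpi_buffer buf = { ACPI_ALLOCATE_BUFFER, NULL };
          union acpi_object *obj;
          acpi_status status;

          /* Ask the firmware for the current NFIT contents via _FIT. */
          status = acpi_evaluate_object(handle, "_FIT", NULL, &buf);
          if (ACPI_FAILURE(status))
                  return -ENODEV;

          obj = buf.pointer;
          if (obj->type != ACPI_TYPE_BUFFER) {
                  ACPI_FREE(buf.pointer);
                  return -EINVAL;
          }

          /* obj->buffer.pointer/length hold the NFIT sub-tables (SPA
           * ranges, region mappings, ...); this is the decode step. */
          parse_nfit_structures(obj->buffer.pointer, obj->buffer.length);

          ACPI_FREE(buf.pointer);
          return 0;
  }
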
> Above information would be useful to help decide the direction.
>
> At a glance I like Jan's idea that it's better to let Xen manage NVDIMM,
> since it's a type of memory resource and we expect the hypervisor to
> centrally manage memory.
>
> However, on second thought, the answer is different if we view this
> resource as an MMIO resource, similar to PCI BAR MMIO, ACPI NVS, etc.
> In that case it should be fine to have Dom0 manage NVDIMM while Xen just
> controls the mapping based on the existing I/O permission mechanism.
>
It's more like an MMIO device than normal RAM.
> Another point in favor of this model is that PMEM is only one mode of an
> NVDIMM device, which can also be exposed as a storage device. In the
> latter case the management has to be in Dom0, so we don't need to
> scatter the management role across Dom0/Xen based on different modes.
>
An NVDIMM device in pmem mode is exposed as a storage device (a block
device /dev/pmemXX) in Linux, and it's also used like a disk drive
(you can make a file system on it, create files on it, and even pass
files rather than the whole /dev/pmemXX to guests).
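
To give a concrete picture: from userspace (QEMU included) the device,
or a file created on it, is handled with plain open() and mmap(). A
minimal sketch, with a placeholder path and trimmed error handling:

  #include <fcntl.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>

  void *map_pmem(const char *path, size_t *len)
  {
          struct stat st;
          void *va = MAP_FAILED;
          int fd = open(path, O_RDWR); /* e.g. "/dev/pmem0" or a file on it */

          if (fd < 0)
                  return MAP_FAILED;
          if (fstat(fd, &st) == 0) {
                  *len = st.st_size;
                  /* Loads/stores through this mapping go straight to the
                   * NVDIMM; userspace only ever sees virtual addresses. */
                  va = mmap(NULL, *len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
          }
          close(fd); /* the mapping stays valid after close() */
          return va;
  }
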
> Back to your earlier questions:
>
> > (1) The QEMU patches use xc_hvm_map_io_range_to_ioreq_server() to map
> > the host NVDIMM to domU, which results in a VMEXIT for every guest
> > read/write to the corresponding vNVDIMM devices. I'm going to find
> > a way to pass through the address space range of the host NVDIMM to a
> > guest domU (similar to what xen-pt in QEMU does)
> >
> > (2) Xen currently does not check whether the address that QEMU asks to
> > map to domU is really within the host NVDIMM address
> > space. Therefore, the Xen hypervisor needs a way to determine the host
> > NVDIMM address space, which can be done by parsing the ACPI NFIT
> > tables.
>
> If you look at how ACPI OpRegion is handled for IGD passthrough:
>
>     ret = xc_domain_iomem_permission(xen_xc, xen_domid,
>               (unsigned long)(igd_host_opregion >> XC_PAGE_SHIFT),
>               XEN_PCI_INTEL_OPREGION_PAGES,
>               XEN_PCI_INTEL_OPREGION_ENABLE_ACCESSED);
>     ...
>     ret = xc_domain_memory_mapping(xen_xc, xen_domid,
>               (unsigned long)(igd_guest_opregion >> XC_PAGE_SHIFT),
>               (unsigned long)(igd_host_opregion >> XC_PAGE_SHIFT),
>               XEN_PCI_INTEL_OPREGION_PAGES,
>               DPCI_ADD_MAPPING);
>
Yes, I've noticed these two functions. The additional work would be
adding new ones that can accept virtual addresses, as QEMU has no easy
way to get the physical address of /dev/pmemXX and can only mmap it
into its virtual address space.
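
To make that concrete, a hypothetical extension could look like the
sketch below. xc_domain_memory_mapping_va() does not exist in libxc; it
is only meant to illustrate an interface that takes the virtual address
QEMU gets back from mmap() and leaves the VA-to-MFN translation to the
toolstack/hypervisor path instead of to QEMU:

  /* Hypothetical, NOT an existing libxc call: like
   * xc_domain_memory_mapping(), but the source pages are named by the
   * caller's virtual address rather than a host frame number. */
  int xc_domain_memory_mapping_va(xc_interface *xch, uint32_t domid,
                                  unsigned long first_gfn,
                                  void *va, unsigned long nr_pages,
                                  uint32_t add_mapping);

  /* QEMU side, assuming pmem and size come from an mmap() of /dev/pmemXX
   * and guest_gfn is where the vNVDIMM should appear in the guest: */
  int rc = xc_domain_memory_mapping_va(xen_xc, xen_domid, guest_gfn,
                                       pmem, size >> XC_PAGE_SHIFT,
                                       DPCI_ADD_MAPPING);
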
> The above can address your two questions. Xen doesn't need to tell exactly
> whether the assigned range actually belongs to an NVDIMM, just like
> the policy for PCI assignment today.
>
Does that mean the Xen hypervisor can trust whatever addresses the dom0
kernel and QEMU provide?
Thanks,
Haozhong
Thread overview: 88+ messages
2015-12-29 11:31 [PATCH 0/4] add support for vNVDIMM Haozhong Zhang
2015-12-29 11:31 ` [PATCH 1/4] x86/hvm: allow guest to use clflushopt and clwb Haozhong Zhang
2015-12-29 15:46 ` Andrew Cooper
2015-12-30 1:35 ` Haozhong Zhang
2015-12-30 2:16 ` Haozhong Zhang
2015-12-30 10:33 ` Andrew Cooper
2015-12-29 11:31 ` [PATCH 2/4] x86/hvm: add support for pcommit instruction Haozhong Zhang
2015-12-29 11:31 ` [PATCH 3/4] tools/xl: add a new xl configuration 'nvdimm' Haozhong Zhang
2016-01-04 11:16 ` Wei Liu
2016-01-06 12:40 ` Jan Beulich
2016-01-06 15:28 ` Haozhong Zhang
2015-12-29 11:31 ` [PATCH 4/4] hvmloader: add support to load extra ACPI tables from qemu Haozhong Zhang
2016-01-15 17:10 ` Jan Beulich
2016-01-18 0:52 ` Haozhong Zhang
2016-01-18 8:46 ` Jan Beulich
2016-01-19 11:37 ` Wei Liu
2016-01-19 11:46 ` Jan Beulich
2016-01-20 5:14 ` Tian, Kevin
2016-01-20 5:58 ` Zhang, Haozhong [this message]
2016-01-20 5:31 ` Haozhong Zhang
2016-01-20 8:46 ` Jan Beulich
2016-01-20 8:58 ` Andrew Cooper
2016-01-20 10:15 ` Haozhong Zhang
2016-01-20 10:36 ` Xiao Guangrong
2016-01-20 13:16 ` Andrew Cooper
2016-01-20 14:29 ` Stefano Stabellini
2016-01-20 14:42 ` Haozhong Zhang
2016-01-20 14:45 ` Andrew Cooper
2016-01-20 14:53 ` Haozhong Zhang
2016-01-20 15:13 ` Konrad Rzeszutek Wilk
2016-01-20 15:29 ` Haozhong Zhang
2016-01-20 15:41 ` Konrad Rzeszutek Wilk
2016-01-20 15:54 ` Haozhong Zhang
2016-01-21 3:35 ` Bob Liu
2016-01-20 15:05 ` Stefano Stabellini
2016-01-20 18:14 ` Andrew Cooper
2016-01-20 14:38 ` Haozhong Zhang
2016-01-20 11:04 ` Haozhong Zhang
2016-01-20 11:20 ` Jan Beulich
2016-01-20 15:29 ` Xiao Guangrong
2016-01-20 15:47 ` Konrad Rzeszutek Wilk
2016-01-20 16:25 ` Xiao Guangrong
2016-01-20 16:47 ` Konrad Rzeszutek Wilk
2016-01-20 16:55 ` Xiao Guangrong
2016-01-20 17:18 ` Konrad Rzeszutek Wilk
2016-01-20 17:23 ` Xiao Guangrong
2016-01-20 17:48 ` Konrad Rzeszutek Wilk
2016-01-21 3:12 ` Haozhong Zhang
2016-01-20 17:07 ` Jan Beulich
2016-01-20 17:17 ` Xiao Guangrong
2016-01-21 8:18 ` Jan Beulich
2016-01-21 8:25 ` Xiao Guangrong
2016-01-21 8:53 ` Jan Beulich
2016-01-21 9:10 ` Xiao Guangrong
2016-01-21 9:29 ` Andrew Cooper
2016-01-21 10:26 ` Jan Beulich
2016-01-21 10:25 ` Jan Beulich
2016-01-21 14:01 ` Haozhong Zhang
2016-01-21 14:52 ` Jan Beulich
2016-01-22 2:43 ` Haozhong Zhang
2016-01-26 11:44 ` George Dunlap
2016-01-26 12:44 ` Jan Beulich
2016-01-26 12:54 ` Juergen Gross
2016-01-26 14:44 ` Konrad Rzeszutek Wilk
2016-01-26 15:37 ` Jan Beulich
2016-01-26 15:57 ` Haozhong Zhang
2016-01-26 16:34 ` Jan Beulich
2016-01-26 19:32 ` Konrad Rzeszutek Wilk
2016-01-27 7:22 ` Haozhong Zhang
2016-01-27 10:16 ` Jan Beulich
2016-01-27 14:50 ` Konrad Rzeszutek Wilk
2016-01-27 10:55 ` George Dunlap
2016-01-26 13:58 ` George Dunlap
2016-01-26 14:46 ` Konrad Rzeszutek Wilk
2016-01-26 15:30 ` Haozhong Zhang
2016-01-26 15:33 ` Haozhong Zhang
2016-01-26 15:57 ` Jan Beulich
2016-01-27 2:23 ` Haozhong Zhang
2016-01-20 15:07 ` Konrad Rzeszutek Wilk
2016-01-06 15:37 ` [PATCH 0/4] add support for vNVDIMM Ian Campbell
2016-01-06 15:47 ` Haozhong Zhang
2016-01-20 3:28 ` Tian, Kevin
2016-01-20 12:43 ` Stefano Stabellini
2016-01-20 14:26 ` Zhang, Haozhong
2016-01-20 14:35 ` Stefano Stabellini
2016-01-20 14:47 ` Zhang, Haozhong
2016-01-20 14:54 ` Andrew Cooper
2016-01-20 15:59 ` Haozhong Zhang