Re: [RFC 0 PATCH 3/3] PVH dom0: construct_dom0 changes

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>, keir.xen@gmail.com
Subject: Re: [RFC 0 PATCH 3/3] PVH dom0: construct_dom0 changes
Date: Fri, 4 Oct 2013 12:02:53 -0400
Message-ID: <20131004160253.GA27398@phenom.dumpdata.com>
In-Reply-To: <524EE76502000078000F8E25@nat28.tlf.novell.com>

On Fri, Oct 04, 2013 at 03:05:57PM +0100, Jan Beulich wrote:
> >>> On 04.10.13 at 15:35, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Fri, Oct 04, 2013 at 07:53:20AM +0100, Jan Beulich wrote:
> >> >>> On 03.10.13 at 02:53, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >> > On Fri, 27 Sep 2013 07:54:39 +0100
> >> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> > 
> >> >> >>> On 27.09.13 at 02:17, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >> >> > On Thu, 26 Sep 2013 09:02:41 +0100 "Jan Beulich"
> >> >> > <JBeulich@suse.com> wrote:
> >> >> >> >>> On 25.09.13 at 23:03, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >> >> >> > +/*
> >> >> >> > + * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
> >> >> >> > + * the entire io region mapped in the EPT/NPT.
> >> >> >> > + *
> >> >> >> > + * PVH FIXME: The following doesn't map MMIO ranges when they sit above the
> >> >> >> > + *            highest E820 covered address.
> >> >> >> 
> >> >> >> This absolutely needs fixing before this can go in.
> >> >> > 
> >> >> > Any suggestions on how to fix it? Mapping all the way to end could
> >> >> > result in a huge hap table. 
> >> >> 
> >> >> You'll probably need a call down from Dom0 telling you where it
> >> >> finds/puts MMIO resources. Or perhaps that could be mapped
> >> >> in on demand from the EPT fault handler (since these regions
> >> >> shouldn't be subject to DMA, and hence IOMMU faults shouldn't
> >> >> occur - perhaps that's even a reason to not share page tables
> >> >> at least in dom0-strict mode)?
> >> > 
> >> > Thinking about mapping in on demand from the EPT fault handler, how
> >> > would I know if the access beyond last e820 entry is genuine and not 
> >> > a faulty pte in a buggy guest? Could I consult the mmconfig table (?) 
> >> > or the ACPI table in xen? Any pointers would be helpful... my 
> >> > knowledge runs out quickly here.
> >> 
> >> You'd have to inspect all the BARs of the devices the domain owns.
> >> Hence the thought of having Dom0 tell you about those resource
> >> assignments.
> > 
> > Doesn't that happen via PHYSDEVOP_pci_device_add hypercalls?
> 
> That may (and I think does) happen before resource assignment.
> 
> >> > FWIW, at present pv-ops linux doesn't allow any mmio access beyond
> >> > the last e820 entry. So, we'd need a fix there too. In my very orig
> >> > patch, I was updating all IO mappings on demand by putting hook
> >> > in linux native_pte_update if it was _PAGE_BIT_IOMAP. Another 
> >> > possibility would be do that for any mappings above the last
> >> > e820 entry. What do you think?
> >> 
> >> Special casing IOMAP page table creation might be an option, but
> >> has the downside of allowing kernel bugs to propagate into Xen's
> >> view of the world.
> >> 
> >> > For testing purposes, do you have reference for hardware? I don't see 
> >> > any here with such configuration.
> >> 
> >> Nothing specific, but I know that SR-IOV virtual functions easily
> >> cause kernels to run out of MMIO space below 4G (namely when
> >> the hole is only around 1Gb or even less), and Intel must have
> >> knowledge of graphics cards having so huge a frame buffer that
> >> it can only be mapped above 4G.
> > 
> > Right, but the BIOS Writers Guide and docs all talk setting the MCFG
> > up for that. Granted the MCFG (or was the ACPI spec?) says that the 
> > MCFG regions do not have to be defined in the E820.
> 
> What do MCFG regions have to do with device MMIO ones?

Actually - nothing at all. I somehow was under the impression that
the MCFG and MMIO regions would be in the same memory area (as in
the MCFG follows the end of the MMIO region). But of course
nothing would be that simple.
> 
> > You pointed out also that the MCFG entries might come out from
> > the ACPI DSDT. Which I think all comes back to dom0 parsing this and
> > providing this sort of information back to the hypervisor?
> 
> For the MCFG, yes. But not for individual BARs of devices.

So back to hooking up a new hypercall in the PCI subsystem when
resource assignment has been completed? And it would also have to fire
if the PCI subsystem decides to re-write the resource addresses to odd
locations.
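
To make that concrete, here is a rough sketch of the bookkeeping the
hypervisor side of such a hypercall might need, assuming dom0 reports each
assigned MMIO range as a (start, size) pair. Nothing here is existing Xen
code: struct mmio_table, mmio_table_add() and mmio_table_contains() are
hypothetical names, and the fixed-size array is only for illustration. The
lookup is the check an on-demand EPT fault handler could use to tell a
genuine MMIO access from a stray pte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-domain table of MMIO ranges reported by dom0. */
#define MAX_MMIO_RANGES 64

struct mmio_range {
    uint64_t start;   /* first byte of the range */
    uint64_t end;     /* last byte of the range (inclusive) */
};

struct mmio_table {
    struct mmio_range r[MAX_MMIO_RANGES];
    size_t count;
};

/*
 * Record one range. Returns false for an empty range, a range that
 * wraps past the end of the address space, or a full table.
 */
bool mmio_table_add(struct mmio_table *t, uint64_t start, uint64_t size)
{
    assert(t != NULL);

    if (size == 0 || start > UINT64_MAX - (size - 1) ||
        t->count == MAX_MMIO_RANGES)
        return false;

    t->r[t->count].start = start;
    t->r[t->count].end = start + size - 1;
    t->count++;
    return true;
}

/* True if addr falls inside any range recorded so far. */
bool mmio_table_contains(const struct mmio_table *t, uint64_t addr)
{
    assert(t != NULL);

    for (size_t i = 0; i < t->count; i++)
        if (addr >= t->r[i].start && addr <= t->r[i].end)
            return true;
    return false;
}
```

A re-write of the resource addresses would then need a matching removal
path, which is left out of the sketch.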

Can't one also trap the configuration space writes to the PCI
devices and extract the physical locations from those?
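
For reference, extracting the location from a trapped BAR write is mostly
bit masking. The sketch below decodes a memory BAR following the standard
PCI layout (bit 0 selects I/O space, bits 2:1 give the type with 10b
meaning 64-bit, the low four bits are flags) and derives the size from the
value read back after the usual all-ones sizing write. The helper and
struct names are made up for illustration; how the trap itself would be
wired up is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of one PCI memory BAR (hypothetical helper type). */
struct bar_range {
    uint64_t base;    /* physical start address */
    uint64_t size;    /* length in bytes, 0 if the BAR is unimplemented */
    bool is_64bit;    /* BAR occupies two consecutive config dwords */
};

#define PCI_BAR_IO_SPACE   0x1u
#define PCI_BAR_TYPE_MASK  0x6u
#define PCI_BAR_TYPE_64    0x4u
#define PCI_BAR_FLAG_BITS  0xfu

/*
 * lo, hi:           programmed BAR value (hi is the following dword and
 *                   is only used for 64-bit BARs).
 * size_lo, size_hi: values read back after writing all ones to the BAR.
 * Returns false for I/O BARs, which are not decoded here.
 */
bool decode_mem_bar(uint32_t lo, uint32_t hi,
                    uint32_t size_lo, uint32_t size_hi,
                    struct bar_range *out)
{
    uint64_t addr, mask;

    assert(out != NULL);

    if (lo & PCI_BAR_IO_SPACE)
        return false;

    out->is_64bit = (lo & PCI_BAR_TYPE_MASK) == PCI_BAR_TYPE_64;

    addr = lo & ~PCI_BAR_FLAG_BITS;
    mask = size_lo & ~PCI_BAR_FLAG_BITS;
    if (out->is_64bit) {
        addr |= (uint64_t)hi << 32;
        mask |= (uint64_t)size_hi << 32;
    }

    out->base = addr;
    if (mask == 0) {
        out->size = 0;                      /* no writable address bits */
    } else {
        if (!out->is_64bit)
            mask |= 0xffffffff00000000ull;  /* upper half does not exist */
        out->size = ~mask + 1;              /* lowest writable bit = size */
    }
    return true;
}
```

For example, a 64-bit BAR programmed with lo = 0x0000000c and hi = 0x4
decodes to a base of 0x400000000, which is exactly the kind of above-4G
range the E820-based 1:1 map would miss.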

> 
> Jan
>


Thread overview: 40+ messages
2013-09-25 21:03 [RFC 0 PATCH 0/3]: PVH dom0 construction Mukesh Rathor
2013-09-25 21:03 ` [RFC 0 PATCH 1/3] PVH dom0: create domctl_memory_mapping() function Mukesh Rathor
2013-09-26  7:03   ` Jan Beulich
2013-09-25 21:03 ` [RFC 0 PATCH 2/3] PVH dom0: move some pv specific code to static functions Mukesh Rathor
2013-09-26  7:21   ` Jan Beulich
2013-09-26 23:32     ` Mukesh Rathor
2013-09-25 21:03 ` [RFC 0 PATCH 3/3] PVH dom0: construct_dom0 changes Mukesh Rathor
2013-09-26  8:02   ` Jan Beulich
2013-09-27  0:17     ` Mukesh Rathor
2013-09-27  6:54       ` Jan Beulich
2013-10-03  0:53         ` Mukesh Rathor
2013-10-04  6:53           ` Jan Beulich
2013-10-04 13:35             ` Konrad Rzeszutek Wilk
2013-10-04 14:05               ` Jan Beulich
2013-10-04 16:02                 ` Konrad Rzeszutek Wilk [this message]
2013-10-04 16:07                   ` Jan Beulich
2013-10-04 20:59                     ` Konrad Rzeszutek Wilk
2013-10-05  1:06                       ` Mukesh Rathor
2013-10-07  7:12                         ` Jan Beulich
2013-10-08  0:58             ` Mukesh Rathor
2013-10-08  7:51               ` Jan Beulich
2013-10-08  8:03                 ` Jan Beulich
2013-10-08  9:39                   ` George Dunlap
2013-10-08  9:57                     ` Jan Beulich
2013-10-08 10:01                       ` George Dunlap
2013-10-08 10:19                         ` Lars Kurth
2013-10-08 12:30                     ` Konrad Rzeszutek Wilk
2013-10-09 13:02                       ` George Dunlap
2013-10-09 13:13                         ` Andrew Cooper
2013-10-09 13:16                           ` George Dunlap
2013-10-09 14:37                             ` Andrew Cooper
2013-10-09 17:50                       ` Tim Deegan
2013-10-09 22:31                         ` Mukesh Rathor
2013-09-27  1:55     ` Mukesh Rathor
2013-09-27  7:01       ` Jan Beulich
2013-09-27 23:03         ` Mukesh Rathor
2013-09-30  6:56           ` Jan Beulich
2013-10-08  0:52             ` Mukesh Rathor
2013-10-08  7:43               ` Jan Beulich
2013-10-09 21:59                 ` Mukesh Rathor
