From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Re: Xen, MCFG acpi table and E820 address map Date: Wed, 4 Sep 2013 11:01:00 -0400 Message-ID: <20130904150100.GF3188@phenom.dumpdata.com> References: <722BF1F4E241B546BB147E89923C56DF09F5E6@SJCPEX01CL01.citrite.net> <5227168A02000078000F043D@nat28.tlf.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta4.messagelabs.com ([85.158.143.247]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1VHEZx-0001Rt-5Q for xen-devel@lists.xenproject.org; Wed, 04 Sep 2013 15:01:09 +0000 Content-Disposition: inline In-Reply-To: <5227168A02000078000F043D@nat28.tlf.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: xen-devel , Boris Ostrovsky , David Vrabel , Santosh Jodh List-Id: xen-devel@lists.xenproject.org On Wed, Sep 04, 2013 at 10:16:26AM +0100, Jan Beulich wrote: > >>> On 04.09.13 at 03:13, Santosh Jodh wrote: > > Xen will use information from MCFG acpi table to access PCIe extended > > configuration space. However, Xen validates MCFG table by making sure that > > the addresses specified in the MCFG table is correctly marked as reserved in > > the E820 address map. If it is not, the MCFG table is ignored - thereby > > preventing Xen from accessing PCIe extended configuration space. > > > > I recently came across a workstation class system that supports VT-d. This > > system BIOS has a valid MCFG table. The BIOS does NOT report the MCFG > > addresses as reserved in the E820 address map. However, the addresses ARE > > claimed as reserved via the ACPI motherboard resource devnode (PNP0C01) > > mechanism. Could you tell me what machine this is? It would be good to know to develop a patch against it. > > > > On this system, Xen ignores the MCFG table as it fails the E820 check. dom0 > > during early boot does the same thing. However, once ACPI driver is online, > > dom0 validates the MCFG table against the motherboard resource devnode and > > then accepts the MCFG table. You now end up with a situation where dom0 can > > correctly enumerate and use PCIe devices but Xen cannot access extended > > configuration space. > > [...] > > Is this machine an exception? As we move forward, are we likely to see more > > such systems (relying more on ACPI and less on E820 for reserving the MCFG > > range)? Is it worth adding a mmcfg=force option to xen to ignore E820 > > validation result? > > No, this is quite normal, and Xen has code to deal with that > (PHYSDEVOP_pci_mmcfg_reserved). While our (forward ported) > kernels have been making use of this since 2.6.27, I suppose > upstream still doesn't (perhaps not the least because integration > of this might once again raise concerns about Xen needing code > changes in too many different places). > > For reference I'm attaching the diff of the respective source file, > in case someone finds this useful. Thank you for providing the patch. Is there a nicer way of doing this? Inserting Xen codepaths right there is on the high list of no-no. > > Jan > > --- linux-3.11/arch/x86/pci/mmconfig-shared.c > +++ head/arch/x86/pci/mmconfig-shared.c > @@ -23,6 +23,10 @@ > #include > #include > > +#ifdef CONFIG_XEN > +#include > +#endif > + > #define PREFIX "PCI: " > > /* Indicate if the mmcfg resources have been placed into the resource table. */ > @@ -437,6 +441,36 @@ static int is_acpi_reserved(u64 start, u > return mcfg_res.flags; > } > > +static int xen_report_mmconf_reserved(const struct pci_mmcfg_region *cfg, > + int valid, int with_e820) > +{ > +#ifdef CONFIG_XEN > + if (!with_e820) { > + struct physdev_pci_mmcfg_reserved r = { > + .address = cfg->address, > + .segment = cfg->segment, > + .start_bus = cfg->start_bus, > + .end_bus = cfg->end_bus, > + .flags = valid ? XEN_PCI_MMCFG_RESERVED : 0 > + }; > + int rc; > + > + rc = HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r); > + switch (rc) { > + case 0: case -ENOSYS: > + break; > + default: > + pr_warn(PREFIX "Failed to report MMCONFIG reservation" > + " state for %04x [bus%02x-%02x] to hypervisor" > + " (%d)\n", > + cfg->segment, cfg->start_bus, cfg->end_bus, > + rc); > + } > + } > +#endif > + return valid; > +} > + > typedef int (*check_reserved_t)(u64 start, u64 end, unsigned type); > > static int __ref is_mmconf_reserved(check_reserved_t is_reserved, > @@ -456,7 +490,7 @@ static int __ref is_mmconf_reserved(chec > } > > if (size < (16UL<<20) && size != old_size) > - return 0; > + return xen_report_mmconf_reserved(cfg, 0, with_e820); > > if (dev) > dev_info(dev, "MMCONFIG at %pR reserved in %s\n", > @@ -488,7 +522,7 @@ static int __ref is_mmconf_reserved(chec > &cfg->res, (unsigned long) cfg->address); > } > > - return 1; > + return xen_report_mmconf_reserved(cfg, 1, with_e820); > } > > static int __ref pci_mmcfg_check_reserved(struct device *dev,