From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
David Vrabel <david.vrabel@citrix.com>,
Santosh Jodh <Santosh.Jodh@citrix.com>
Subject: Re: Xen, MCFG acpi table and E820 address map
Date: Wed, 4 Sep 2013 11:01:00 -0400 [thread overview]
Message-ID: <20130904150100.GF3188@phenom.dumpdata.com> (raw)
In-Reply-To: <5227168A02000078000F043D@nat28.tlf.novell.com>
On Wed, Sep 04, 2013 at 10:16:26AM +0100, Jan Beulich wrote:
> >>> On 04.09.13 at 03:13, Santosh Jodh <Santosh.Jodh@citrix.com> wrote:
> > Xen will use information from MCFG acpi table to access PCIe extended
> > configuration space. However, Xen validates MCFG table by making sure that
> > the addresses specified in the MCFG table is correctly marked as reserved in
> > the E820 address map. If it is not, the MCFG table is ignored - thereby
> > preventing Xen from accessing PCIe extended configuration space.
> >
> > I recently came across a workstation class system that supports VT-d. This
> > system BIOS has a valid MCFG table. The BIOS does NOT report the MCFG
> > addresses as reserved in the E820 address map. However, the addresses ARE
> > claimed as reserved via the ACPI motherboard resource devnode (PNP0C01)
> > mechanism.
Could you tell me what machine this is? It would be good to know to develop
a patch against it.
> >
> > On this system, Xen ignores the MCFG table as it fails the E820 check. dom0
> > during early boot does the same thing. However, once ACPI driver is online,
> > dom0 validates the MCFG table against the motherboard resource devnode and
> > then accepts the MCFG table. You now end up with a situation where dom0 can
> > correctly enumerate and use PCIe devices but Xen cannot access extended
> > configuration space.
> > [...]
> > Is this machine an exception? As we move forward, are we likely to see more
> > such systems (relying more on ACPI and less on E820 for reserving the MCFG
> > range)? Is it worth adding a mmcfg=force option to xen to ignore E820
> > validation result?
>
> No, this is quite normal, and Xen has code to deal with that
> (PHYSDEVOP_pci_mmcfg_reserved). While our (forward ported)
> kernels have been making use of this since 2.6.27, I suppose
> upstream still doesn't (perhaps not the least because integration
> of this might once again raise concerns about Xen needing code
> changes in too many different places).
>
> For reference I'm attaching the diff of the respective source file,
> in case someone finds this useful.
Thank you for providing the patch.
Is there a nicer way of doing this? Inserting Xen codepaths right
there is on the high list of no-no.
>
> Jan
>
> --- linux-3.11/arch/x86/pci/mmconfig-shared.c
> +++ head/arch/x86/pci/mmconfig-shared.c
> @@ -23,6 +23,10 @@
> #include <asm/pci_x86.h>
> #include <asm/acpi.h>
>
> +#ifdef CONFIG_XEN
> +#include <xen/interface/physdev.h>
> +#endif
> +
> #define PREFIX "PCI: "
>
> /* Indicate if the mmcfg resources have been placed into the resource table. */
> @@ -437,6 +441,36 @@ static int is_acpi_reserved(u64 start, u
> return mcfg_res.flags;
> }
>
> +static int xen_report_mmconf_reserved(const struct pci_mmcfg_region *cfg,
> + int valid, int with_e820)
> +{
> +#ifdef CONFIG_XEN
> + if (!with_e820) {
> + struct physdev_pci_mmcfg_reserved r = {
> + .address = cfg->address,
> + .segment = cfg->segment,
> + .start_bus = cfg->start_bus,
> + .end_bus = cfg->end_bus,
> + .flags = valid ? XEN_PCI_MMCFG_RESERVED : 0
> + };
> + int rc;
> +
> + rc = HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
> + switch (rc) {
> + case 0: case -ENOSYS:
> + break;
> + default:
> + pr_warn(PREFIX "Failed to report MMCONFIG reservation"
> + " state for %04x [bus%02x-%02x] to hypervisor"
> + " (%d)\n",
> + cfg->segment, cfg->start_bus, cfg->end_bus,
> + rc);
> + }
> + }
> +#endif
> + return valid;
> +}
> +
> typedef int (*check_reserved_t)(u64 start, u64 end, unsigned type);
>
> static int __ref is_mmconf_reserved(check_reserved_t is_reserved,
> @@ -456,7 +490,7 @@ static int __ref is_mmconf_reserved(chec
> }
>
> if (size < (16UL<<20) && size != old_size)
> - return 0;
> + return xen_report_mmconf_reserved(cfg, 0, with_e820);
>
> if (dev)
> dev_info(dev, "MMCONFIG at %pR reserved in %s\n",
> @@ -488,7 +522,7 @@ static int __ref is_mmconf_reserved(chec
> &cfg->res, (unsigned long) cfg->address);
> }
>
> - return 1;
> + return xen_report_mmconf_reserved(cfg, 1, with_e820);
> }
>
> static int __ref pci_mmcfg_check_reserved(struct device *dev,
next prev parent reply other threads:[~2013-09-04 15:01 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-04 1:13 Xen, MCFG acpi table and E820 address map Santosh Jodh
2013-09-04 9:16 ` Jan Beulich
2013-09-04 15:01 ` Konrad Rzeszutek Wilk [this message]
2013-09-04 15:12 ` Jan Beulich
2013-09-04 16:14 ` Santosh Jodh
2013-09-04 16:24 ` Gordan Bobic
2013-09-04 16:26 ` Santosh Jodh
2013-09-05 7:47 ` Jan Beulich
2013-09-05 16:47 ` Santosh Jodh
2013-09-06 6:32 ` Jan Beulich
2013-09-05 7:44 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130904150100.GF3188@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=JBeulich@suse.com \
--cc=Santosh.Jodh@citrix.com \
--cc=boris.ostrovsky@oracle.com \
--cc=david.vrabel@citrix.com \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.