From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Zihan Yang <whois.zihan.yang@gmail.com>,
qemu-devel@nongnu.org, Igor Mammedov <imammedo@redhat.com>,
Eric Auger <eauger@redhat.com>, Drew Jones <drjones@redhat.com>,
Wei Huang <wei@redhat.com>
Subject: Re: [Qemu-devel] [RFC 3/3] acpi-build: allocate mcfg for multiple host bridges
Date: Wed, 23 May 2018 18:01:56 +0300
Message-ID: <20180523180019-mutt-send-email-mst@kernel.org>
In-Reply-To: <20180523085751.1ff46b2e@w520.home>

On Wed, May 23, 2018 at 08:57:51AM -0600, Alex Williamson wrote:
> On Wed, 23 May 2018 17:25:32 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Tue, May 22, 2018 at 10:28:56PM -0600, Alex Williamson wrote:
> > > On Wed, 23 May 2018 02:38:52 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > On Tue, May 22, 2018 at 03:47:41PM -0600, Alex Williamson wrote:
> > > > > On Wed, 23 May 2018 00:44:22 +0300
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > >
> > > > > > On Tue, May 22, 2018 at 03:36:59PM -0600, Alex Williamson wrote:
> > > > > > > On Tue, 22 May 2018 23:58:30 +0300
> > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > It's not hard to think of a use-case where >256 devices
> > > > > > > > are helpful, for example a nested virt scenario where
> > > > > > > > each device is passed on to a different nested guest.
> > > > > > > >
> > > > > > > > But I think the main feature this is needed for is numa modeling.
> > > > > > > > Guests seem to assume a numa node per PCI root, ergo we need more PCI
> > > > > > > > roots.
> > > > > > >
> > > > > > > But even if we have NUMA affinity per PCI host bridge, a PCI host
> > > > > > > bridge does not necessarily imply a new PCIe domain.
> > > > > >
> > > > > > What are you calling a PCIe domain?
> > > > >
> > > > > Domain/segment
> > > > >
> > > > > 0000:00:00.0
> > > > > ^^^^ This
> > > >
> > > > Right. So we can conceivably have PCIe root complexes share an ACPI segment.
> > > > I don't see what this buys us by itself.
> > >
> > > The ability to define NUMA locality for a PCI sub-hierarchy while
> > > maintaining compatibility with non-segment aware OSes (and firmware).
> >
> > For sure, but NUMA is a kind of advanced topic; MCFG has been around for
> > longer than various NUMA tables. Are there really non-segment aware
> > guests that also know how to make use of NUMA?
>
> I can't answer that question, but I assume that multi-segment PCI
> support is perhaps not as pervasive as we may think considering hardware
> OEMs tend to avoid it for their default configurations with multiple
> host bridges.
>
> > > > > Isn't that the only reason we'd need a new MCFG section and the reason
> > > > > we're limited to 256 buses? Thanks,
> > > > >
> > > > > Alex
> > > >
> > > > I don't know whether a single MCFG section can describe multiple roots.
> > > > I think it would be certainly unusual.
> > >
> > > I'm not sure here if you're referring to the actual MCFG ACPI table or
> > > the MMCONFIG range, aka the ECAM. Neither of these describes PCI host
> > > bridges. The MCFG table can describe one or more ECAM ranges, each of
> > > which provides the ECAM base address, the PCI segment associated with that
> > > ECAM and the start and end bus numbers to know the offset and extent of
> > > the ECAM range. PCI host bridges would then theoretically be separate
> > > ACPI objects with _SEG and _BBN methods to associate them to the
> > > correct ECAM range by segment number and base bus number. So it seems
> > > that tooling exists that an ECAM/MMCONFIG range could be provided per
> > > PCI host bridge, even if they exist within the same domain, but in
> > > practice what I see on systems I have access to is a single MMCONFIG
> > > range supporting all of the host bridges. It also seems there are
> > > numerous ways to describe the MMCONFIG range and I haven't actually
> > > found an example that seems to use the MCFG table. Two have MCFG
> > > tables (that don't seem terribly complete) and the kernel claims to
> > > find the MMCONFIG via e820, another doesn't even have an MCFG table and
> > > the kernel claims to find MMCONFIG via an ACPI motherboard resource.
> > > I'm not sure if I can enable PCI segments on anything to see how the
> > > firmware changes. Thanks,
> > >
> > > Alex
> >
> > Let me clarify. So MCFG has base address allocation structures.
> > Each maps a segment and a range of bus numbers into memory.
> > This structure is what I meant.
>
> Ok, so this is the ECAM/MMCONFIG range through which we do config
> accesses, which is described by MCFG, among other options.
>
> > IIUC you are saying on your systems everything is within a single
> > segment, right? Multiple pci hosts map into a single segment?
>
> Yes, for instance a single MMCONFIG range handles bus number ranges
> 0x00-0x7f within segment 0x0 and the system has host bridges with base
> bus numbers of 0x00 and 0x40, each with different NUMA locality.
>
> > If you do this you can do NUMA, but do not gain > 256 devices.
>
> Correct, but let's also clarify that we're not limited to 256 devices;
> a segment is limited to 256 buses and each PCIe slot is a bus, so the
> limitation is number of hotpluggable slots. "Devices" implies that it
> includes multi-function, ARI, and SR-IOV devices as well, but we can
> have 256 of those per bus, we just don't have the desired hotplug
> granularity for those.

Right, I consider a group of PF and all its VFs a device,
and all functions in a multi-function device a single
device for this purpose.

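To make the counting concrete, here is a rough Python sketch of the limits
discussed above. The constants follow from the bus/device/function field
widths of a PCI address; the helper names are made up for illustration, and
the bus figure is only an upper bound on hotpluggable slots, since it ignores
bus numbers consumed by root buses and bridges.

```python
# Field widths of a PCI address within one segment (domain).
BUSES_PER_SEGMENT = 1 << 8      # 8-bit bus number
DEVICES_PER_BUS = 1 << 5        # 5-bit device number
FUNCTIONS_PER_DEVICE = 1 << 3   # 3-bit function number

# Without ARI this is 32 devices x 8 functions; with ARI the device field is
# folded into an 8-bit function number, which gives the same total per bus.
FUNCTIONS_PER_BUS = DEVICES_PER_BUS * FUNCTIONS_PER_DEVICE


def max_bus_numbers(segments: int) -> int:
    """Upper bound on bus numbers (and so on one-slot-per-bus hotplug slots)."""
    return segments * BUSES_PER_SEGMENT


def max_functions(segments: int) -> int:
    """Upper bound on addressable functions across the given segments."""
    return max_bus_numbers(segments) * FUNCTIONS_PER_BUS
```

So a single segment tops out at 256 bus numbers, while the number of
addressable functions is far larger; adding segments raises the slot bound.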
> > Are we on the same page then?
>
> Seems so. Thanks,
>
> Alex
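
For reference, a minimal Python sketch of how an MCFG base address allocation
structure maps a (segment, bus, device, function) tuple to an ECAM address.
This is an illustration only, not QEMU code: the class and function names are
hypothetical, the base address 0xB0000000 is an arbitrary example value, and
the sketch assumes the base address in the structure corresponds to bus 0 of
the segment. The entry mirrors the layout Alex describes: one range covering
buses 0x00-0x7f in segment 0, with host bridges at base bus 0x00 and 0x40.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class McfgAllocation:
    """One MCFG base address allocation structure (field names illustrative)."""
    base_address: int  # assumed to be the ECAM address of bus 0 in the segment
    segment: int       # PCI segment group (domain) number
    start_bus: int     # first bus number decoded by this range
    end_bus: int       # last bus number decoded by this range


def ecam_address(entries: Sequence[McfgAllocation], segment: int, bus: int,
                 device: int, function: int, register: int = 0) -> Optional[int]:
    """Return the config space address, or None if no entry covers the bus."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 4096):
        raise ValueError("bus/device/function/register out of range")
    for entry in entries:
        if entry.segment == segment and entry.start_bus <= bus <= entry.end_bus:
            # ECAM layout: 1 MiB per bus, 32 KiB per device, 4 KiB per function.
            offset = (bus << 20) | (device << 15) | (function << 12) | register
            return entry.base_address + offset
    return None


# Hypothetical table: a single range shared by two host bridges.
mcfg = [McfgAllocation(base_address=0xB0000000, segment=0,
                       start_bus=0x00, end_bus=0x7F)]

addr_bridge0 = ecam_address(mcfg, 0, 0x00, 0, 0)  # host bridge at bus 0x00
addr_bridge1 = ecam_address(mcfg, 0, 0x40, 0, 0)  # host bridge at bus 0x40
```

Both host bridges resolve through the same entry here; putting a bridge into
a separate segment would mean adding another entry with a different segment
number and its own base address.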