qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Zihan Yang <whois.zihan.yang@gmail.com>,
	qemu-devel@nongnu.org, Igor Mammedov <imammedo@redhat.com>,
	Eric Auger <eauger@redhat.com>, Drew Jones <drjones@redhat.com>,
	Wei Huang <wei@redhat.com>
Subject: Re: [Qemu-devel] [RFC 3/3] acpi-build: allocate mcfg for multiple host bridges
Date: Wed, 23 May 2018 18:01:56 +0300	[thread overview]
Message-ID: <20180523180019-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20180523085751.1ff46b2e@w520.home>

On Wed, May 23, 2018 at 08:57:51AM -0600, Alex Williamson wrote:
> On Wed, 23 May 2018 17:25:32 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, May 22, 2018 at 10:28:56PM -0600, Alex Williamson wrote:
> > > On Wed, 23 May 2018 02:38:52 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Tue, May 22, 2018 at 03:47:41PM -0600, Alex Williamson wrote:  
> > > > > On Wed, 23 May 2018 00:44:22 +0300
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > >     
> > > > > > On Tue, May 22, 2018 at 03:36:59PM -0600, Alex Williamson wrote:    
> > > > > > > On Tue, 22 May 2018 23:58:30 +0300
> > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:       
> > > > > > > >
> > > > > > > > It's not hard to think of a use-case where >256 devices
> > > > > > > > are helpful, for example a nested virt scenario where
> > > > > > > > each device is passed on to a different nested guest.
> > > > > > > >
> > > > > > > > But I think the main feature this is needed for is numa modeling.
> > > > > > > > Guests seem to assume a numa node per PCI root, ergo we need more PCI
> > > > > > > > roots.      
> > > > > > > 
> > > > > > > But even if we have NUMA affinity per PCI host bridge, a PCI host
> > > > > > > bridge does not necessarily imply a new PCIe domain.      
> > > > > > 
> > > > > > What are you calling a PCIe domain?    
> > > > > 
> > > > > Domain/segment
> > > > > 
> > > > > 0000:00:00.0
> > > > > ^^^^ This    
> > > > 
> > > > Right. So we can thinkably have PCIe root complexes share an ACPI segment.
> > > > I don't see what this buys us by itself.  
> > > 
> > > The ability to define NUMA locality for a PCI sub-hierarchy while
> > > maintaining compatibility with non-segment aware OSes (and firmware).  
> > 
> > Fur sure, but NUMA is a kind of advanced topic, MCFG has been around for
> > longer than various NUMA tables. Are there really non-segment aware
> > guests that also know how to make use of NUMA?
> 
> I can't answer that question, but I assume that multi-segment PCI
> support is perhaps not as pervasive as we may think considering hardware
> OEMs tend to avoid it for their default configurations with multiple
> host bridges.
> 
> > > > > Isn't that the only reason we'd need a new MCFG section and the reason
> > > > > we're limited to 256 buses?  Thanks,
> > > > > 
> > > > > Alex    
> > > > 
> > > > I don't know whether a single MCFG section can describe multiple roots.
> > > > I think it would be certainly unusual.  
> > > 
> > > I'm not sure here if you're referring to the actual MCFG ACPI table or
> > > the MMCONFIG range, aka the ECAM.  Neither of these describe PCI host
> > > bridges.  The MCFG table can describe one or more ECAM ranges, which
> > > provides the ECAM base address, the PCI segment associated with that
> > > ECAM and the start and end bus numbers to know the offset and extent of
> > > the ECAM range.  PCI host bridges would then theoretically be separate
> > > ACPI objects with _SEG and _BBN methods to associate them to the
> > > correct ECAM range by segment number and base bus number.  So it seems
> > > that tooling exists that an ECAM/MMCONFIG range could be provided per
> > > PCI host bridge, even if they exist within the same domain, but in
> > > practice what I see on systems I have access to is a single MMCONFIG
> > > range supporting all of the host bridges.  It also seems there are
> > > numerous ways to describe the MMCONFIG range and I haven't actually
> > > found an example that seems to use the MCFG table.  Two have MCFG
> > > tables (that don't seem terribly complete) and the kernel claims to
> > > find the MMCONFIG via e820, another doesn't even have an MCFG table and
> > > the kernel claims to find MMCONFIG via an ACPI motherboard resource.
> > > I'm not sure if I can enable PCI segments on anything to see how the
> > > firmware changes.  Thanks,
> > > 
> > > Alex  
> > 
> > Let me clarify.  So MCFG have base address allocation structures.
> > Each maps a segment and a range of bus numbers into memory.
> > This structure is what I meant.
> 
> Ok, so this is the  ECAM/MMCONFIG range through which we do config
> accesses, which is described by MCFG, among other options.
> 
> > IIUC you are saying on your systems everything is within a single
> > segment, right? Multiple pci hosts map into a single segment?
> 
> Yes, for instance a single MMCONFIG range handles bus number ranges
> 0x00-0x7f within segment 0x0 and the system has host bridges with base
> bus numbers of 0x00 and 0x40, each with different NUMA locality.
> 
> > If you do this you can do NUMA, but do not gain > 256 devices.
> 
> Correct, but let's also clarify that we're not limited to 256 devices,
> a segment is limited to 256 buses and each PCIe slot is a bus, so the
> limitation is number of hotpluggable slots.  "Devices" implies that it
> includes multi-function, ARI, and SR-IOV devices as well, but we can
> have 256 of those per bus, we just don't have the desired hotplug
> granularity for those.

Right, I consider a group of PF and all its VFs a device,
and all functions in a multi-function device a single
device for this purpose.

> > Are we are the same page then?
> 
> Seems so.  Thanks,
> 
> Alex

  reply	other threads:[~2018-05-23 15:02 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-20  7:28 [Qemu-devel] [RFC 0/3] pci_expander_brdige: Put pxb host bridge into separate pci domain Zihan Yang
2018-05-20  7:28 ` [Qemu-devel] [RFC 1/3] pci_expander_bridge: reserve enough mcfg space for pxb host Zihan Yang
2018-05-21 11:03   ` Marcel Apfelbaum
2018-05-22  5:59     ` Zihan Yang
2018-05-22 18:47       ` Marcel Apfelbaum
2018-05-20  7:28 ` [Qemu-devel] [RFC 2/3] pci: Link pci_host_bridges with QTAILQ Zihan Yang
2018-05-21 11:05   ` Marcel Apfelbaum
2018-05-22  5:59     ` Zihan Yang
2018-05-22 18:39       ` Marcel Apfelbaum
2018-05-20  7:28 ` [Qemu-devel] [RFC 3/3] acpi-build: allocate mcfg for multiple host bridges Zihan Yang
2018-05-21 11:53   ` Marcel Apfelbaum
2018-05-22  6:03     ` Zihan Yang
2018-05-22 18:43       ` Marcel Apfelbaum
2018-05-22  9:52     ` Laszlo Ersek
2018-05-22 19:01       ` Marcel Apfelbaum
2018-05-22 19:51         ` Laszlo Ersek
2018-05-22 20:58           ` Michael S. Tsirkin
2018-05-22 21:36             ` Alex Williamson
2018-05-22 21:44               ` Michael S. Tsirkin
2018-05-22 21:47                 ` Alex Williamson
2018-05-22 22:00                   ` Laszlo Ersek
2018-05-22 23:38                   ` Michael S. Tsirkin
2018-05-23  4:28                     ` Alex Williamson
2018-05-23 14:25                       ` Michael S. Tsirkin
2018-05-23 14:57                         ` Alex Williamson
2018-05-23 15:01                           ` Michael S. Tsirkin [this message]
2018-05-23 16:50                         ` Marcel Apfelbaum
2018-05-22 21:17           ` Alex Williamson
2018-05-22 21:22             ` Michael S. Tsirkin
2018-05-22 21:58               ` Laszlo Ersek
2018-05-22 21:50             ` Laszlo Ersek
2018-05-23 17:00             ` Marcel Apfelbaum
2018-05-22 22:42           ` Laszlo Ersek
2018-05-22 23:40             ` Michael S. Tsirkin
2018-05-23  7:32               ` Laszlo Ersek
2018-05-23 11:11                 ` Zihan Yang
2018-05-23 12:28                   ` Laszlo Ersek
2018-05-23 17:23                     ` Marcel Apfelbaum
2018-05-24  9:57                     ` Zihan Yang
2018-05-23 17:33                   ` Marcel Apfelbaum
2018-05-24 10:00                     ` Zihan Yang
2018-05-23 17:11                 ` Marcel Apfelbaum
2018-05-23 17:25                   ` Laszlo Ersek
2018-05-28 11:02                 ` Laszlo Ersek
2018-05-21 15:23 ` [Qemu-devel] [RFC 0/3] pci_expander_brdige: Put pxb host bridge into separate pci domain Marcel Apfelbaum
2018-05-22  6:04   ` Zihan Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180523180019-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=drjones@redhat.com \
    --cc=eauger@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=lersek@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=qemu-devel@nongnu.org \
    --cc=wei@redhat.com \
    --cc=whois.zihan.yang@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).