From: Laszlo Ersek <lersek@redhat.com>
To: Zihan Yang <whois.zihan.yang@gmail.com>, qemu-devel@nongnu.org
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Igor Mammedov <imammedo@redhat.com>,
	alex.williamson@redhat.com, eauger@redhat.com,
	drjones@redhat.com, wei@redhat.com
Subject: Re: [Qemu-devel] [RFC 3/3] acpi-build: allocate mcfg for multiple host bridges
Date: Wed, 23 May 2018 14:28:54 +0200
Message-ID: <83f46e13-b3df-2f90-bd39-8822c47cfa6e@redhat.com>
In-Reply-To: <CAKwiv-jMKSUYusO4S4HR5HCMXLxGNrpYwywm1dWBWsFKMEGzpg@mail.gmail.com>

On 05/23/18 13:11, Zihan Yang wrote:
> Hi all,

> The original purpose was just to support multiple segments in Intel
> Q35 architecture for PCIe topology, which makes bus number a less scarce
> resource. The patches are very primitive and many things are left for
> firmware to finish (the initial plan was to implement it in SeaBIOS),
> the AML part in QEMU is not finished either. I'm not familiar with
> OVMF or edk2, so there is no plan to touch it yet, but it seems not
> necessary since it already supports multi-segment in the end.

That's incorrect. EDK2 stands for "EFI Development Kit II", and it is a
collection of "universal" (= platform- and ISA-independent) modules
(drivers and libraries), and platform- and/or ISA-dependent modules
(drivers and libraries). The OVMF firmware is built from a subset of
these modules; the final firmware image includes modules from both
categories -- universal modules, and modules specific to the i440fx and
Q35 QEMU boards. The first category generally lives under MdePkg/,
MdeModulePkg/, UefiCpuPkg/, NetworkPkg/, PcAtChipsetPkg, etc; while the
second category lives under OvmfPkg/.

(The exact same applies to the ArmVirtQemu firmware, with the second
category consisting of ArmVirtPkg/ and OvmfPkg/ modules.)

When we discuss anything PCI-related in edk2, it usually affects both
categories:

(a) the universal/core modules, such as

  - the PCI host bridge / root bridge driver at
    "MdeModulePkg/Bus/Pci/PciHostBridgeDxe",

  - the PCI bus driver at "MdeModulePkg/Bus/Pci/PciBusDxe",

(b) and the platform-specific modules, such as

  - "OvmfPkg/IncompatiblePciDeviceSupportDxe" which causes PciBusDxe to
    allocate 64-bit MMIO BARs above 4 GB, regardless of option ROM
    availability (as long as a CSM is not present), conserving 32-bit
    MMIO aperture for 32-bit BARs,

  - "OvmfPkg/PciHotPlugInitDxe", which implements support for QEMU's
    resource reservation hints, so that we can avoid IO space exhaustion
    with many PCIe root ports, and so that we can reserve MMIO aperture
    for hot-plugging devices with large MMIO BARs,

  - "OvmfPkg/Library/DxePciLibI440FxQ35", which is a low-level PCI
    config space access library, usable in the DXE and later phases,
    that plugs into several drivers, and uses 0xCF8/0xCFC on i440fx, and
    ECAM on Q35,

  - "OvmfPkg/Library/PciHostBridgeLib", which plugs into
    "PciHostBridgeDxe" above, exposing the various resource apertures to
    said host bridge / root bridge driver, and implementing support for
    the PXB / PXBe devices,

  - "OvmfPkg/PlatformPei", which is an early (PEI phase) module with a
    grab-bag of platform support code; e.g. it informs
    "DxePciLibI440FxQ35" above about the QEMU board being Q35 vs.
    i440fx, it configures the ECAM (exbar) registers on Q35, it
    determines where the 32-bit and 64-bit PCI MMIO apertures should be,

  - "ArmVirtPkg/Library/BaseCachingPciExpressLib", which is the
    aarch64/virt counterpart of "DxePciLibI440FxQ35" above,

  - "ArmVirtPkg/Library/FdtPciHostBridgeLib", which is the aarch64/virt
    counterpart of "PciHostBridgeLib", consuming the DTB exposed by
    qemu-system-aarch64,

  - "ArmVirtPkg/Library/FdtPciPcdProducerLib", which is an internal
    library that turns parts of the DTB that is exposed by
    qemu-system-aarch64 into various PCI-related, firmware-wide, scalar
    variables (called "PCDs"), upon which both
    "BaseCachingPciExpressLib" and "FdtPciHostBridgeLib" rely.

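To make the 0xCF8/0xCFC vs. ECAM distinction mentioned for
"DxePciLibI440FxQ35" concrete, here is a minimal sketch in C of the two
address encodings, as I understand them from the PCI and PCIe
specifications. The function names are made up for illustration; they
are not edk2 APIs, and this is not how the library itself is written.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Legacy mechanism (used on i440fx): the value written to I/O port 0xCF8
 * before the data is accessed through port 0xCFC. Only 256 bytes of config
 * space per function are reachable, and there is no segment field.
 * (Hypothetical helper name.)
 */
uint32_t cf8_address(uint8_t bus, uint8_t dev, uint8_t func, uint8_t reg)
{
    assert(dev < 32 && func < 8);
    return UINT32_C(0x80000000)       /* enable bit */
         | ((uint32_t)bus  << 16)
         | ((uint32_t)dev  << 11)
         | ((uint32_t)func << 8)
         | (reg & 0xFCu);             /* dword-aligned register number */
}

/*
 * ECAM (used on Q35): byte offset of a register from the ECAM base address
 * of one segment. Each function gets 4 KiB, so one segment with 256 buses
 * covers 256 MiB. (Hypothetical helper name.)
 */
uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t func, uint16_t reg)
{
    assert(dev < 32 && func < 8 && reg < 4096);
    return ((uint64_t)bus  << 20)
         | ((uint64_t)dev  << 15)
         | ((uint64_t)func << 12)
         | reg;
}
```

Note that the ECAM offset has no segment field either: each segment
needs its own ECAM base address, which is part of what the platform
code would have to learn from QEMU.
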
The point is that any PCI feature in any edk2 platform firmware comes
together from (a) core module support for the feature, and (b) platform
integration between the core code and the QEMU board in question.

If (a) is missing, that implies a very painful uphill battle, which is
why I'd been loudly whining, initially, in this thread, until I realized
that the core support was there in edk2, for PCIe segments.

However, (b) is required as well -- i.e., platform integration under
OvmfPkg/ and perhaps ArmVirtPkg/, between the QEMU boards and the core
edk2 code --, and that definitely doesn't exist for the PCIe segments
feature.

If (a) exists and is flexible enough, then we at least have a chance at
writing the platform support code (b) for it. So that's why I've stopped
whining. Writing (b) is never easy -- in this case, a great many of the
platform modules that I've listed above, under OvmfPkg/ pathnames, could
be affected, or even be eligible for replacement -- but (b) is at least
imaginable in practice. Modules in category (a) are shipped *in* -- not
"on" -- every single physical UEFI platform that you can buy today,
which is one reason why it's hugely difficult to implement nontrivial
changes for them.

In brief: your statement is incorrect because category (b) is missing.
And that requires dedicated QEMU support, similarly to how
"OvmfPkg/PciHotPlugInitDxe" requires the vendor-specific resource
reservation capability, and how "OvmfPkg/Library/PciHostBridgeLib"
consumes the "etc/extra-pci-roots" fw_cfg file, and how most everything
that ArmVirtQemu does for PCI(e) originates from QEMU's DTB.

> * 64-bit space is crowded and there are no standards within QEMU for
>   placing per domain 64-bit MMIO and MMCFG ranges
> * We cannot put ECAM arbitrarily high because guest's PA width is
>   limited by host's when EPT is enabled.

That's right. One argument is that firmware can lay out these apertures
and ECAM ranges internally. But that argument breaks down when you hit
the PCPU (physical CPU) address width, and would like the management stack,
such as libvirtd, to warn you in advance. For that, either libvirtd or
QEMU has to know, or direct, the layout.
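
As a rough illustration of the kind of check a management stack could
make in advance, here is a minimal sketch in C. It assumes, purely for
the sake of the example, that the ECAM ranges of all segments are laid
out back to back starting at one base address; nothing in QEMU or
libvirt defines such a layout today, and the function name is
hypothetical. The only hard fact used is that one segment with 256
buses needs 256 MiB of ECAM.

```c
#include <stdbool.h>
#include <stdint.h>

/* 256 buses * 32 devices * 8 functions * 4 KiB = 256 MiB per segment. */
#define ECAM_BYTES_PER_SEGMENT (UINT64_C(256) << 20)

/*
 * Return true if `segments` ECAM ranges, placed contiguously from
 * `first_base`, end at or below 2^pa_bits. The contiguous placement is an
 * assumption of this sketch, not an existing convention.
 */
bool ecam_ranges_fit(uint64_t first_base, unsigned segments, unsigned pa_bits)
{
    uint64_t limit;

    if (pa_bits == 0 || pa_bits > 63) {
        return false;               /* reject widths we cannot represent */
    }
    limit = UINT64_C(1) << pa_bits;
    if (first_base >= limit) {
        return false;               /* base itself is not addressable */
    }
    /* Compare by division so the multiplication cannot overflow. */
    return segments <= (limit - first_base) / ECAM_BYTES_PER_SEGMENT;
}
```

With a 36-bit physical address width, for example, 16 segments placed
from 32 GB upwards would pass this check, while a single segment placed
at 1 TB would not.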

> * NUMA modeling seems to be a stronger motivation than the limitation
>   of 256 bus numbers, that each NUMA node holds its own PCI(e)
>   sub-hierarchy

I'd also like to get more information about this -- I thought pxb-pci(e)
was already motivated by supporting NUMA locality. And, to my knowledge,
pxb-pci(e) actually *solved* this problem. Am I wrong? Let's say you
have 16 NUMA nodes (which seems pretty large to me); is it really
insufficient to assign ~16 devices to each node?

Thanks
Laszlo
