From: Alexey G <x1917x@gmail.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	Wei Liu <wei.liu2@citrix.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Anthony Perard <anthony.perard@citrix.com>,
	Ian Jackson <Ian.Jackson@citrix.com>
Subject: Re: [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
Date: Tue, 27 Mar 2018 05:42:11 +1000
Message-ID: <20180327054211.00003e13@gmail.com>
In-Reply-To: <20180326092438.7auduhpx5eu3adb3@MacBook-Pro-de-Roger.local>

On Mon, 26 Mar 2018 10:24:38 +0100
Roger Pau Monné <roger.pau@citrix.com> wrote:

>On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
[...]
>> In fact, the emulated chipset (NB+SB combo without supplemental
>> devices) itself is a small part of the required emulation. It's
>> relatively easy to provide our own analogs of e.g. the 'mch' and
>> 'ICH9-LPC' QEMU PCIDevice's; the problem is to glue all the remaining
>> parts together.
>> 
>> I assume the final goal in this case is to have only a set of
>> necessary QEMU PCIDevice's for which we will be providing I/O, MMIO
>> and PCI conf trapping facilities. Only devices such as rtl8139,
>> ich9-ahci and few others.
>> 
>> Basically, this means a new, chipset-less QEMU machine type.
>> Well, in theory it is possible with a bit of effort I think. The main
>> question is where the NB/SB/PCIbus emulating part will reside in
>> this case.
>
>Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
>the southbridge will be emulated by a device model (ie: QEMU).
>
>As you mention above, I also took a look and it seems like the amount
>of registers that we should emulate for Q35 DRAM controller (D0:F0) is
>fairly minimal based on current QEMU implementation. We could even
>possibly get away by just emulating PCIEXBAR.

MCH emulation alone might not be an option. Besides, some
southbridge-specific features, like emulating the ACPI PM facilities
for domain power management (basically, anything at PMBASE), would be
preferable to implement on the Xen side, especially considering that
ACPI tables are already provided by Xen's libacpi/hvmloader, not
the device model.
I think the feature may need to cover at least the NB+SB
combination: Q35 MCH + ICH9 for a start, ideally 82441FX+PIIX4
as well. Also, Xen should control emulated/PT PCI device placement.

Before going this way, it would be good to weigh all the risks.
There seem to be two main directions currently:

I. (conservative) Let the main device model (QEMU) inform Xen about
the current chipset-specific MMCONFIG location, so that Xen knows
that some MMIO accesses to this area must be forwarded to other ioreq
servers (device emulators) in the form of PCI config read/write ioreqs
when the BDF corresponding to an MMCONFIG offset points to a PCI device
owned by a device emulator.
For device emulators the conversion of MMIO accesses to PCI
config ones is a mandatory step, while the owner of the MMCONFIG MMIO
range may receive MMIO accesses in native form without conversion
(a strongly preferable option for QEMU).

This approach assumes introducing a new dmop/hypercall (something
like XEN_DMOP_mmcfg_location_change) to pass basic MMCONFIG
information to Xen -- address, enabled/disabled status (or simply
address=0 instead) and size of the MMCONFIG area, e.g. as a number of
buses. This information is enough to select a proper ioreq server in
Xen and allow multiple device emulators to function properly.
For future compatibility we can also provide the segment and
start/end bus range as arguments.
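
Just to show the amount of information involved, the argument block
could look roughly like the sketch below. The layout, field names and
helpers are purely hypothetical (no such dmop exists at the moment);
the only hard fact used is that each bus takes 1MB of ECAM space, so
256 buses give the 256MB maximum.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HYPOTHETICAL layout, not an existing Xen interface: an illustrative
 * argument block for the proposed XEN_DMOP_mmcfg_location_change.
 */
struct mmcfg_location {
    uint64_t gpa;        /* MMCONFIG base; 0 means "disabled" */
    uint16_t segment;    /* PCI segment, 0 for now */
    uint8_t  start_bus;
    uint8_t  end_bus;    /* inclusive */
};

/* Size of the MMCONFIG window in bytes: 1MB per decoded bus. */
static uint64_t mmcfg_size(const struct mmcfg_location *loc)
{
    assert(loc->end_bus >= loc->start_bus);
    return ((uint64_t)(loc->end_bus - loc->start_bus) + 1) << 20;
}

/* The "address=0 instead of a separate flag" variant mentioned above. */
static int mmcfg_enabled(const struct mmcfg_location *loc)
{
    return loc->gpa != 0;
}
```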

What this approach will require:
--------------------------------

- new notification-style dmop/hypercall to tell Xen about the current
  emulated MMCONFIG location

- trivial changes in QEMU to use this dmop in Q35 PCIEXBAR handling code

- relatively simple Xen changes in ioreq.c to use the provided range
  for ioreq server selection. Also, to provide MMIO -> PCI config ioreq
  translation for supplemental ioreq servers which don't know anything
  about the emulated system
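
The ioreq server selection from the last item can be sketched as
follows. This is a simplified model under my own assumptions: a flat
table of claimed devices and a "default" server that owns the MMCONFIG
range. The real ioreq.c data structures differ, and all names here are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_SERVER 0   /* assumed: the MMCONFIG owner, e.g. QEMU */

/* Hypothetical registration record: one PCI device claimed by an emulator. */
struct pci_claim {
    uint8_t bus, dev, fn;
    int server_id;
};

/*
 * Pick the ioreq server for an MMCONFIG offset. Devices claimed by a
 * supplemental emulator get a (converted) PCI config ioreq; everything
 * else goes to the default server as a plain MMIO access.
 */
static int select_server(const struct pci_claim *claims, size_t nr,
                         uint64_t mmcfg_off, int *as_pci_config)
{
    uint8_t bus = (mmcfg_off >> 20) & 0xff;
    uint8_t dev = (mmcfg_off >> 15) & 0x1f;
    uint8_t fn  = (mmcfg_off >> 12) & 0x07;
    size_t i;

    assert(as_pci_config != NULL);
    for (i = 0; i < nr; i++) {
        if (claims[i].bus == bus && claims[i].dev == dev &&
            claims[i].fn == fn) {
            *as_pci_config = 1;
            return claims[i].server_id;
        }
    }
    *as_pci_config = 0;
    return DEFAULT_SERVER;
}
```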

Risks:
------

The risk of breaking anything is minimal in this case.

If QEMU does not provide this information (e.g. because an outdated
version is installed), only basic PCI config space accesses via CF8/CFC
will be forwarded to a distinct ioreq server. This means the extended
PCI config space accesses won't be forwarded to specific device
emulators. Other than these device emulators, everything else will
continue to work properly in this case. There will be no difference for
guest OSes without PCIe ECAM support in either case.
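
For reference, this is why CF8/CFC alone cannot reach the extended
config space: the standard address dword written to port 0xCF8 has only
8 bits of register offset. A small sketch of the decoding (struct and
function names are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Legacy configuration mechanism #1: layout of the dword written to 0xCF8. */
struct cf8_addr {
    bool    enabled;  /* bit 31 */
    uint8_t bus;      /* bits 23:16 */
    uint8_t dev;      /* bits 15:11 */
    uint8_t fn;       /* bits 10:8 */
    uint8_t reg;      /* bits 7:2, dword aligned: 0x00..0xfc */
};

static struct cf8_addr cf8_decode(uint32_t cf8)
{
    struct cf8_addr a;

    a.enabled = (cf8 >> 31) & 1;
    a.bus = (cf8 >> 16) & 0xff;
    a.dev = (cf8 >> 11) & 0x1f;
    a.fn  = (cf8 >> 8) & 0x07;
    a.reg = cf8 & 0xfc;
    return a;
}

/* True if a config register offset can be addressed through CF8/CFC. */
static bool reachable_via_cf8(uint16_t reg)
{
    return reg < 0x100;   /* 0x100..0xfff need MMCONFIG (ECAM) */
}
```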

In general, no breakthrough improvements, no negative side-effects.
Just PCIe ECAM working as expected, with compatibility with multiple
ioreq servers retained.


II. (a new feature) Move chipset emulation to Xen directly.

In this case no separate notification is necessary, as Xen will be
emulating the chosen chipset itself. The MMCONFIG location will be
known from its own PCIEXBAR emulation.

QEMU will be used only to emulate a minimal set of unrelated devices
(eg. storage/network/vga). Less dependency on QEMU overall.

More freedom to implement some specific features in the future, like
SMRAM support for EFI firmware needs. Chipset remapping (aka reclaim)
functionality for memory relocation may be implemented under complete
Xen control, avoiding the use of unsafe add_to_physmap hypercalls.

In the future this will allow moving passthrough-supporting code from
QEMU (hw/xen/xen-pt*.c) to Xen, merging it with Roger's vpci series.
This will improve e.g. the PT + stubdomain situation a lot -- PCI config
space accesses for PT devices will be handled in a uniform way without
Dom0 interaction.
This particular feature can be implemented for the previous approach as
well, but it is easier to do when Xen controls the emulated machine.

In general, this is a good long-term direction.

What this approach will require:
--------------------------------

- Changes in QEMU code to support new chipset-less machine(s). In
  theory it might be possible to implement this on top of the "null"
  machine concept

- Major changes in Xen code to implement the actual chipset emulation
  there

- Changes on the toolstack side as the emulated machine will be
  selected and used differently

- Moving passthrough support from QEMU to Xen will likely require
  re-dividing the areas of responsibility for PCI device passthrough
  between xen-pciback and the hypervisor. It might be more convenient
  to perform some tasks of xen-pciback in Xen directly

- strong dependency between Xen/libxl/QEMU/etc versions -- any outdated
  component will be a major problem. Can be resolved by providing some
  compatibility code

- longer implementation time

Risks:
------

- A major architecture change with possible issues encountered during
  the implementation

- Moving the emulation of the machine to Xen creates a non-zero risk of
  introducing a security issue while extending the emulation support
  further. As all emulation will take place at the most trusted level,
  any exploitable bug in the chipset emulation code may compromise the
  whole system

- there is a risk of encountering some dependency on missing chipset
  devices in QEMU. Some QEMU devices (which depend on QEMU chipset
  devices/properties) might not work without extra patches. In theory
  this may be addressed by leaving the dummy MCH/LPC/pci-host devices
  in place while not forwarding any IO/MMIO/PCI conf accesses to them
  (using them simply as compat placeholders)

- risk of incompatibility with future QEMU versions

In both cases, for security concerns PCIEXBAR and other MCH registers
can be made write-once (RO on all further accesses, similar to a
TXT-locked system).

[...]
>> Regarding control of the guest memory map in the toolstack only...
>> The problem is, only firmware can see a final memory map at the
>> moment. And only the device model knows about invisible "service"
>> ranges for emulated devices, like the LFB content (aka "VRAM") when
>> it is not mapped to a guest.
>> 
>> In order to calculate the final memory/MMIO hole split, we need to
>> know:
>> 
>> 1) all PCI devices on a PCI bus. At the moment Xen contributes only
>> devices like PT to the final PCI bus (via QMP device_add). Others are
>> QEMU ones. Even Xen platform PCI device relies on QEMU emulation.
>> Non-QEMU device emulators are another source of virtual PCI devices I
>> guess.
>> 
>> 2) all chipset-specific emulated MMIO ranges. MMCONFIG is one of them
>> and the largest (up to 256MB for a segment). There are a few other
>> smaller ranges, e.g. Root Complex registers. All these ranges depend
>> on the emulated chipset.
>> 
>> 3) all reserved memory ranges (this is the one the toolstack already
>> knows)
>> 
>> 4) all "service" guest memory ranges like backing storage for VRAM in
>> QEMU. Emulated Option ROMs should belong here too, but IIRC xen-hvm.c
>> either intentionally or by mistake handles them as emulated ranges
>> currently.
>> 
>> If we miss any of these (like what the chipset-specific ranges are
>> and their size/alignment requirements) -- we're in trouble. But, if
>> we know *all* of these, we can pre-calculate the MMIO hole size.
>> This is a bit fragile to do from the toolstack, though, because the
>> sizing algo in the toolstack and the MMIO BAR allocation code in the
>> firmware (hvmloader) must have their algorithms synchronized:
>> it is possible to stuff BARs into the MMIO hole in different ways,
>> especially when PCI-PCI bridges appear on the scene. Both need
>> to do it in a consistent way (resulting in a similar set of gaps
>> between allocated BARs), otherwise the expected MMIO hole sizes won't
>> match, which means we may need to relocate MMIO BARs to the high
>> MMIO hole and this in turn may lead to those overlaps with QEMU
>> memories.
>
>I agree that the current memory layout management (or the lack of it)
>is concerning. Although related, I think this should be tackled as a
>different issue from the chipset one IMHO.
>
>Since you already posted the Q35 series I would attempt to get that
>done first before jumping into the memory layout one.

It is somewhat related to the chipset because memory/MMIO layout
inconsistency can be solved more, well, naturally on Q35.

Basically, we have a non-standard MMIO hole layout where the
start of the high MMIO hole does not match the top of addressable RAM
(due to invisible ranges of the device model).

Q35 already has facilities to allow firmware to modify (via
emulation) or discover such an MMIO hole setup, which can be used for
safe MMIO BAR allocation to avoid overlaps with QEMU-owned invisible
ranges.

It doesn't really matter which registers are picked for this task, but
for Q35 this approach is at least consistent with what a real system
does (PV/PVH people will find this peculiarity pointless I suppose :) ).
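
To illustrate what "safe" means here, below is a simplified sketch of
picking the start of the high MMIO hole once the invisible ranges are
known. It assumes those ranges are reported as a plain list of
[start, end) intervals placed above visible RAM; how they would
actually be conveyed (chipset registers or otherwise) is left open,
and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start, end;   /* [start, end) */
};

/*
 * Lowest safe start of the high MMIO hole: at or above 4GB, above the
 * top of visible RAM and above every hidden (device model owned) range,
 * rounded up to 'align' (a power of two).
 */
static uint64_t high_hole_start(uint64_t top_of_ram,
                                const struct range *hidden, size_t nr,
                                uint64_t align)
{
    uint64_t start = top_of_ram;
    size_t i;

    assert(align != 0 && (align & (align - 1)) == 0);
    if (start < (1ULL << 32))
        start = 1ULL << 32;
    for (i = 0; i < nr; i++)
        if (hidden[i].end > start)
            start = hidden[i].end;

    return (start + align - 1) & ~(align - 1);
}
```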
