[PATCH v6 0/5] Kconfig for PCI passthrough on ARM

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v6 0/5] Kconfig for PCI passthrough on ARM
@ 2023-11-13 22:21 Stewart Hildebrand
  2023-11-13 22:21 ` [PATCH v6 1/5] xen/arm: pci: introduce PCI_PASSTHROUGH Kconfig option Stewart Hildebrand
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-13 22:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stewart Hildebrand, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Wei Liu,
	Anthony PERARD, Juergen Gross, Andrew Cooper, George Dunlap,
	Jan Beulich, Christian Lindig, David Scott,
	Marek Marczykowski-Górecki, Roger Pau Monné,
	Paul Durrant

There are multiple series in development/review [1], [2] that will benefit from
having a Kconfig option for PCI passthrough on ARM. Hence I have sent this
series independent from any other series.

v5->v6:
* address feedback in individual patches

v4->v5:
* add [FUTURE] tag to ("xen/arm: enable vPCI for dom0")
* address feedback in individual patches

v3->v4:
* rename ("xen/arm: pci: plumb xen_arch_domainconfig with pci info")
  to ("xen/vpci: move xen_domctl_createdomain vPCI flag to common")
* fold ("xen/arm: make has_vpci() depend on d->arch.has_vpci")
  into ("xen/vpci: move xen_domctl_createdomain vPCI flag to common")
* split ("xen/arm: enable vPCI for domUs") into separate hypervisor and
  tools patches

v2->v3:
* add ("xen/arm: pci: plumb xen_arch_domainconfig with pci info")
* rename ("xen/arm: make has_vpci depend on CONFIG_HAS_VPCI")
      to ("xen/arm: make has_vpci() depend on d->arch.has_vpci")
* add ("xen/arm: enable vPCI for dom0")

v1->v2:
* add ("[FUTURE] xen/arm: enable vPCI for domUs")

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00660.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00210.html

Rahul Singh (1):
  xen/arm: pci: introduce PCI_PASSTHROUGH Kconfig option

Stewart Hildebrand (4):
  xen/vpci: move xen_domctl_createdomain vPCI flag to common
  [FUTURE] xen/arm: enable vPCI for dom0
  [FUTURE] xen/arm: enable vPCI for domUs
  [FUTURE] tools/arm: enable vPCI for domUs

 tools/libs/light/libxl_arm.c       |  3 +++
 tools/libs/light/libxl_x86.c       |  2 +-
 tools/ocaml/libs/xc/xenctrl.ml     |  2 +-
 tools/ocaml/libs/xc/xenctrl.mli    |  2 +-
 tools/python/xen/lowlevel/xc/xc.c  |  2 +-
 xen/arch/arm/Kconfig               | 10 ++++++++++
 xen/arch/arm/domain.c              |  3 ++-
 xen/arch/arm/domain_build.c        |  6 ++++++
 xen/arch/arm/include/asm/domain.h  |  5 ++---
 xen/arch/arm/include/asm/pci.h     |  9 +++++++++
 xen/arch/arm/pci/pci-host-common.c | 11 ++++++++---
 xen/arch/arm/vpci.c                |  8 ++++++++
 xen/arch/x86/domain.c              | 20 +++++++++++++++-----
 xen/arch/x86/include/asm/domain.h  |  8 +++-----
 xen/arch/x86/setup.c               |  4 ++--
 xen/common/domain.c                | 15 ++++++++++++++-
 xen/drivers/passthrough/pci.c      | 20 ++++++++++++++++++++
 xen/include/public/arch-x86/xen.h  |  9 +++++----
 xen/include/public/domctl.h        |  6 ++++--
 xen/include/xen/domain.h           |  2 ++
 20 files changed, 117 insertions(+), 30 deletions(-)


base-commit: fb41228ececea948c7953c8c16fe28fd65c6536b
-- 
2.42.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6 1/5] xen/arm: pci: introduce PCI_PASSTHROUGH Kconfig option
  2023-11-13 22:21 [PATCH v6 0/5] Kconfig for PCI passthrough on ARM Stewart Hildebrand
@ 2023-11-13 22:21 ` Stewart Hildebrand
  2023-11-13 22:21 ` [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common Stewart Hildebrand
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-13 22:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Rahul Singh, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Stefano Stabellini,
	Stewart Hildebrand, Julien Grall

From: Rahul Singh <rahul.singh@arm.com>

Setting CONFIG_PCI_PASSTHROUGH=y will enable PCI passthrough on ARM, even though
the feature is not yet complete in the current upstream codebase. The purpose of
this is to make it easier to enable the necessary configs (HAS_PCI, HAS_VPCI) for
testing and development of PCI passthrough on ARM.

Since PCI passthrough on ARM is still work in progress at this time, make it
depend on EXPERT.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
---
(cherry picked from commit 9a08f1f7ce28ec619640ba9ce11018bf443e9a0e from the
 downstream branch [1])

v5->v6:
* no change

v4->v5:
* no change

v3->v4:
* no change

v2->v3:
* add Julien's A-b

v1->v2:
* drop "ARM" naming since it is already in an ARM category
* depend on EXPERT instead of UNSUPPORTED

Changes from downstream to v1:
* depends on ARM_64 (Stefano)
* Don't select HAS_VPCI_GUEST_SUPPORT since this config option is not currently
  used in the upstream codebase. This will want to be re-added here once the
  vpci series [2] is merged.
* Don't select ARM_SMMU_V3 since this option can already be selected
  independently. While PCI passthrough on ARM depends on an SMMU, it does not
  depend on a particular version or variant of an SMMU.
* Don't select HAS_ITS since this option can already be selected independently.
  HAS_ITS may want to be added here once the MSI series [1] is merged.
* Don't select LATE_HWDOM since this option is unrelated to PCI passthrough.

[1] https://gitlab.com/xen-project/people/bmarquis/xen-arm-poc/-/commits/poc/pci-passthrough
[2] https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00660.html
---
 xen/arch/arm/Kconfig | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2939db429b78..5ff68e5d5979 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -190,6 +190,15 @@ config STATIC_SHM
 	help
 	  This option enables statically shared memory on a dom0less system.

+config PCI_PASSTHROUGH
+	bool "PCI passthrough" if EXPERT
+	depends on ARM_64
+	select HAS_PCI
+	select HAS_VPCI
+	default n
+	help
+	  This option enables PCI device passthrough
+
 endmenu

 menu "ARM errata workaround via the alternative framework"
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common
  2023-11-13 22:21 [PATCH v6 0/5] Kconfig for PCI passthrough on ARM Stewart Hildebrand
  2023-11-13 22:21 ` [PATCH v6 1/5] xen/arm: pci: introduce PCI_PASSTHROUGH Kconfig option Stewart Hildebrand
@ 2023-11-13 22:21 ` Stewart Hildebrand
  2023-11-14  9:11   ` Jan Beulich
  2023-11-13 22:21 ` [PATCH v6 3/5] [FUTURE] xen/arm: enable vPCI for dom0 Stewart Hildebrand
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-13 22:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stewart Hildebrand, Wei Liu, Anthony PERARD, Juergen Gross,
	Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall,
	Stefano Stabellini, Christian Lindig, David Scott,
	Marek Marczykowski-Górecki, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Roger Pau Monné, Oleksandr Andrushchenko,
	Rahul Singh, Christian Lindig

Both x86 and ARM need a way at domain creation time to specify whether
the domain needs vPCI emulation. Move the vPCI flag from x86
xen_domctl_createdomain.arch.emulation_flags to the common
xen_domctl_createdomain.flags.

Move has_vpci() macro to common header.

Bump XEN_DOMCTL_INTERFACE_VERSION since we're modifying flags inside
struct xen_domctl_createdomain and xen_arch_domainconfig.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
---
v5->v6:
* don't explicitly clear XEN_DOMCTL_CDF_vpci where unnecessary
* add comment about avoiding re-using bit 10 in emulation_flags
* don't #define _XEN_DOMCTL_CDF_vpci (underscore-prefixed), only
  introduce single new flag identifier
* move ( vpci && !hvm ) check to xen/arch/x86/domain.c
* simplify has_vpci()

v4->v5:
* move flags_optional change in xen/arch/arm/domain.c to next patch
* change latter 2 args to emulation_flags_ok() to unsigned int
* move vpci && !hvm check to common code
* remove stray spaces in emulation_flags_ok(), and a minor style fixup
* add CONFIG_HAS_VPCI check into has_vpci()
* add Christian's A-b (OCaml)

v3->v4:
* renamed, was:
  ("xen/arm: pci: plumb xen_arch_domainconfig with pci info")
* reworked: move x86 vPCI flag to common instead of adding another arch
  specific vPCI flag
* folded ("xen/arm: make has_vpci() depend on d->arch.has_vpci") into
  this patch (retain Signed-off-by's from [1] and [2])

v2->v3:
* new patch

[1] https://gitlab.com/xen-project/people/bmarquis/xen-arm-poc/-/commit/27be1729ce8128dbe37275ce7946b2fbd2e5a382
[2] https://github.com/xen-troops/xen/commit/bf12185e6fb2e31db0d8e6ea9ccd8a02abadec17
---
 tools/libs/light/libxl_x86.c      |  2 +-
 tools/ocaml/libs/xc/xenctrl.ml    |  2 +-
 tools/ocaml/libs/xc/xenctrl.mli   |  2 +-
 tools/python/xen/lowlevel/xc/xc.c |  2 +-
 xen/arch/arm/include/asm/domain.h |  3 ---
 xen/arch/x86/domain.c             | 20 +++++++++++++++-----
 xen/arch/x86/include/asm/domain.h |  6 +-----
 xen/arch/x86/setup.c              |  4 ++--
 xen/common/domain.c               | 10 +++++++++-
 xen/include/public/arch-x86/xen.h |  9 +++++----
 xen/include/public/domctl.h       |  6 ++++--
 xen/include/xen/domain.h          |  2 ++
 12 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index d16573e72cd4..4a42b7231f8b 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -8,7 +8,7 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 {
     switch(d_config->c_info.type) {
     case LIBXL_DOMAIN_TYPE_HVM:
-        config->arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
+        config->arch.emulation_flags = XEN_X86_EMU_ALL;
         break;
     case LIBXL_DOMAIN_TYPE_PVH:
         config->arch.emulation_flags = XEN_X86_EMU_LAPIC;
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index d6c6eb73db44..6f3da9c6e064 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -46,7 +46,6 @@ type x86_arch_emulation_flags =
   | X86_EMU_IOMMU
   | X86_EMU_PIT
   | X86_EMU_USE_PIRQ
-  | X86_EMU_VPCI
 
 type x86_arch_misc_flags =
   | X86_MSR_RELAXED
@@ -70,6 +69,7 @@ type domain_create_flag =
   | CDF_IOMMU
   | CDF_NESTED_VIRT
   | CDF_VPMU
+  | CDF_VPCI
 
 type domain_create_iommu_opts =
   | IOMMU_NO_SHAREPT
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index 3bfc16edba96..e2dd02bec962 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -40,7 +40,6 @@ type x86_arch_emulation_flags =
   | X86_EMU_IOMMU
   | X86_EMU_PIT
   | X86_EMU_USE_PIRQ
-  | X86_EMU_VPCI
 
 type x86_arch_misc_flags =
   | X86_MSR_RELAXED
@@ -63,6 +62,7 @@ type domain_create_flag =
   | CDF_IOMMU
   | CDF_NESTED_VIRT
   | CDF_VPMU
+  | CDF_VPCI
 
 type domain_create_iommu_opts =
   | IOMMU_NO_SHAREPT
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index d3ea350e07b9..b9f559a8e637 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -159,7 +159,7 @@ static PyObject *pyxc_domain_create(XcObject *self,
 
 #if defined (__i386) || defined(__x86_64__)
     if ( config.flags & XEN_DOMCTL_CDF_hvm )
-        config.arch.emulation_flags = (XEN_X86_EMU_ALL & ~XEN_X86_EMU_VPCI);
+        config.arch.emulation_flags = XEN_X86_EMU_ALL;
 #elif defined (__arm__) || defined(__aarch64__)
     config.arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE;
 #else
diff --git a/xen/arch/arm/include/asm/domain.h b/xen/arch/arm/include/asm/domain.h
index 5fb8cd79c01a..3614562eaefe 100644
--- a/xen/arch/arm/include/asm/domain.h
+++ b/xen/arch/arm/include/asm/domain.h
@@ -298,9 +298,6 @@ static inline void arch_vcpu_block(struct vcpu *v) {}
 
 #define arch_vm_assist_valid_mask(d) (1UL << VMASST_TYPE_runstate_update_flag)
 
-/* vPCI is not available on Arm */
-#define has_vpci(d)    ({ (void)(d); false; })
-
 struct arch_vcpu_io {
     struct instr_details dabt_instr; /* when the instruction is decoded */
 };
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 3712e36df930..bbb73abe88a1 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -636,6 +636,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
     bool hvm = config->flags & XEN_DOMCTL_CDF_hvm;
     bool hap = config->flags & XEN_DOMCTL_CDF_hap;
     bool nested_virt = config->flags & XEN_DOMCTL_CDF_nested_virt;
+    bool vpci = config->flags & XEN_DOMCTL_CDF_vpci;
     unsigned int max_vcpus;
 
     if ( hvm ? !hvm_enabled : !IS_ENABLED(CONFIG_PV) )
@@ -709,10 +710,17 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    if ( vpci && !hvm )
+    {
+        dprintk(XENLOG_INFO, "vPCI requested for non-HVM guest\n");
+        return -EINVAL;
+    }
+
     return 0;
 }
 
-static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
+static bool emulation_flags_ok(const struct domain *d, unsigned int emflags,
+                               unsigned int cdf)
 {
 #ifdef CONFIG_HVM
     /* This doesn't catch !CONFIG_HVM case but it is better than nothing */
@@ -722,11 +730,13 @@ static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
     if ( is_hvm_domain(d) )
     {
         if ( is_hardware_domain(d) &&
-             emflags != (X86_EMU_VPCI | X86_EMU_LAPIC | X86_EMU_IOAPIC) )
+             (!(cdf & XEN_DOMCTL_CDF_vpci) ||
+              emflags != (X86_EMU_LAPIC | X86_EMU_IOAPIC)) )
             return false;
         if ( !is_hardware_domain(d) &&
-             emflags != (X86_EMU_ALL & ~X86_EMU_VPCI) &&
-             emflags != X86_EMU_LAPIC )
+             ((cdf & XEN_DOMCTL_CDF_vpci) ||
+              (emflags != X86_EMU_ALL &&
+               emflags != X86_EMU_LAPIC)) )
             return false;
     }
     else if ( emflags != 0 && emflags != X86_EMU_PIT )
@@ -798,7 +808,7 @@ int arch_domain_create(struct domain *d,
         return -EINVAL;
     }
 
-    if ( !emulation_flags_ok(d, emflags) )
+    if ( !emulation_flags_ok(d, emflags, config->flags) )
     {
         printk(XENLOG_G_ERR "d%d: Xen does not allow %s domain creation "
                "with the current selection of emulators: %#x\n",
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index 619e667938ed..cb02a4d1ebb2 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -471,7 +471,6 @@ struct arch_domain
 #define X86_EMU_VGA      XEN_X86_EMU_VGA
 #define X86_EMU_IOMMU    XEN_X86_EMU_IOMMU
 #define X86_EMU_USE_PIRQ XEN_X86_EMU_USE_PIRQ
-#define X86_EMU_VPCI     XEN_X86_EMU_VPCI
 #else
 #define X86_EMU_LAPIC    0
 #define X86_EMU_HPET     0
@@ -482,7 +481,6 @@ struct arch_domain
 #define X86_EMU_VGA      0
 #define X86_EMU_IOMMU    0
 #define X86_EMU_USE_PIRQ 0
-#define X86_EMU_VPCI     0
 #endif
 
 #define X86_EMU_PIT     XEN_X86_EMU_PIT
@@ -492,8 +490,7 @@ struct arch_domain
                                  X86_EMU_PM | X86_EMU_RTC |             \
                                  X86_EMU_IOAPIC | X86_EMU_PIC |         \
                                  X86_EMU_VGA | X86_EMU_IOMMU |          \
-                                 X86_EMU_PIT | X86_EMU_USE_PIRQ |       \
-                                 X86_EMU_VPCI)
+                                 X86_EMU_PIT | X86_EMU_USE_PIRQ)
 
 #define has_vlapic(d)      (!!((d)->arch.emulation_flags & X86_EMU_LAPIC))
 #define has_vhpet(d)       (!!((d)->arch.emulation_flags & X86_EMU_HPET))
@@ -505,7 +502,6 @@ struct arch_domain
 #define has_viommu(d)      (!!((d)->arch.emulation_flags & X86_EMU_IOMMU))
 #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
 #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
-#define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
 
 #define gdt_ldt_pt_idx(v) \
       ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index a3d3f797bb1e..00dfcf231e21 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -890,12 +890,12 @@ static struct domain *__init create_dom0(const module_t *image,
 
     if ( opt_dom0_pvh )
     {
-        dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
+        dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_vpci |
                            ((hvm_hap_supported() && !opt_dom0_shadow) ?
                             XEN_DOMCTL_CDF_hap : 0));
 
         dom0_cfg.arch.emulation_flags |=
-            XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
+            XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC;
     }
 
     if ( iommu_enabled )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 8f9ab01c0cb7..12dc27428972 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -509,12 +509,14 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
     bool hap = config->flags & XEN_DOMCTL_CDF_hap;
     bool iommu = config->flags & XEN_DOMCTL_CDF_iommu;
     bool vpmu = config->flags & XEN_DOMCTL_CDF_vpmu;
+    bool vpci = config->flags & XEN_DOMCTL_CDF_vpci;
 
     if ( config->flags &
          ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
            XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
            XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
-           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_vpmu) )
+           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_vpmu |
+           XEN_DOMCTL_CDF_vpci) )
     {
         dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
         return -EINVAL;
@@ -575,6 +577,12 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
+    if ( vpci && !IS_ENABLED(CONFIG_HAS_VPCI) )
+    {
+        dprintk(XENLOG_INFO, "vPCI requested but not enabled\n");
+        return -EINVAL;
+    }
+
     return arch_sanitise_domain_config(config);
 }
 
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index c0f4551247f4..abb808c9060b 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -283,15 +283,16 @@ struct xen_arch_domainconfig {
 #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
 #define _XEN_X86_EMU_USE_PIRQ       9
 #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
-#define _XEN_X86_EMU_VPCI           10
-#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
+/*
+ * Note: bit 10 was previously used for a XEN_X86_EMU_VPCI flag. This bit should
+ * not be re-used without careful consideration.
+ */
 
 #define XEN_X86_EMU_ALL             (XEN_X86_EMU_LAPIC | XEN_X86_EMU_HPET |  \
                                      XEN_X86_EMU_PM | XEN_X86_EMU_RTC |      \
                                      XEN_X86_EMU_IOAPIC | XEN_X86_EMU_PIC |  \
                                      XEN_X86_EMU_VGA | XEN_X86_EMU_IOMMU |   \
-                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ |\
-                                     XEN_X86_EMU_VPCI)
+                                     XEN_X86_EMU_PIT | XEN_X86_EMU_USE_PIRQ)
     uint32_t emulation_flags;
 
 /*
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index a33f9ec32b08..8b3ea62cae06 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -21,7 +21,7 @@
 #include "hvm/save.h"
 #include "memory.h"
 
-#define XEN_DOMCTL_INTERFACE_VERSION 0x00000016
+#define XEN_DOMCTL_INTERFACE_VERSION 0x00000017
 
 /*
  * NB. xen_domctl.domain is an IN/OUT parameter for this operation.
@@ -55,9 +55,11 @@ struct xen_domctl_createdomain {
 #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
 /* Should we expose the vPMU to the guest? */
 #define XEN_DOMCTL_CDF_vpmu           (1U << 7)
+/* Should vPCI be enabled for the guest? */
+#define XEN_DOMCTL_CDF_vpci           (1U << 8)
 
 /* Max XEN_DOMCTL_CDF_* constant.  Used for ABI checking. */
-#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_vpmu
+#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_vpci
 
     uint32_t flags;
 
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 54d88bf5e34b..17b3429240f3 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -51,6 +51,8 @@ void arch_get_domain_info(const struct domain *d,
 
 #define is_domain_using_staticmem(d) ((d)->cdf & CDF_staticmem)
 
+#define has_vpci(d) (!!((d)->options & XEN_DOMCTL_CDF_vpci))
+
 /*
  * Arch-specifics.
  */
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common
  2023-11-13 22:21 ` [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common Stewart Hildebrand
@ 2023-11-14  9:11   ` Jan Beulich
  2023-11-29 21:25     ` Stewart Hildebrand
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2023-11-14  9:11 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, Christian Lindig,
	David Scott, Marek Marczykowski-Górecki, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Roger Pau Monné,
	Oleksandr Andrushchenko, Rahul Singh, Christian Lindig, xen-devel

On 13.11.2023 23:21, Stewart Hildebrand wrote:
> @@ -709,10 +710,17 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>          return -EINVAL;
>      }
>  
> +    if ( vpci && !hvm )
> +    {
> +        dprintk(XENLOG_INFO, "vPCI requested for non-HVM guest\n");
> +        return -EINVAL;
> +    }
> +
>      return 0;
>  }

As said on the v5 thread, I think my comment was misguided (I'm sorry)
and this wants keeping in common code as you had it.

> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -283,15 +283,16 @@ struct xen_arch_domainconfig {
>  #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
>  #define _XEN_X86_EMU_USE_PIRQ       9
>  #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
> -#define _XEN_X86_EMU_VPCI           10
> -#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
> +/*
> + * Note: bit 10 was previously used for a XEN_X86_EMU_VPCI flag. This bit should
> + * not be re-used without careful consideration.
> + */

I think a multi-line comment is drawing overly much attention to this.
How about "Note: Bit 10 was previously used for XEN_X86_EMU_VPCI. Re-use
with care." which I think fits in a single line comment.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common
  2023-11-14  9:11   ` Jan Beulich
@ 2023-11-29 21:25     ` Stewart Hildebrand
  2023-12-15  9:36       ` Rahul Singh
  0 siblings, 1 reply; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-29 21:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, Anthony PERARD, Juergen Gross, Andrew Cooper,
	George Dunlap, Julien Grall, Stefano Stabellini, Christian Lindig,
	David Scott, Marek Marczykowski-Górecki, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Roger Pau Monné,
	Oleksandr Andrushchenko, Rahul Singh, Christian Lindig, xen-devel

On 11/14/23 04:11, Jan Beulich wrote:
> On 13.11.2023 23:21, Stewart Hildebrand wrote:
>> @@ -709,10 +710,17 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>          return -EINVAL;
>>      }
>>  
>> +    if ( vpci && !hvm )
>> +    {
>> +        dprintk(XENLOG_INFO, "vPCI requested for non-HVM guest\n");
>> +        return -EINVAL;
>> +    }
>> +
>>      return 0;
>>  }
> 
> As said on the v5 thread, I think my comment was misguided (I'm sorry)
> and this wants keeping in common code as you had it.

I'll move it back to xen/common/domain.c. No worries.

> 
>> --- a/xen/include/public/arch-x86/xen.h
>> +++ b/xen/include/public/arch-x86/xen.h
>> @@ -283,15 +283,16 @@ struct xen_arch_domainconfig {
>>  #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
>>  #define _XEN_X86_EMU_USE_PIRQ       9
>>  #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
>> -#define _XEN_X86_EMU_VPCI           10
>> -#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
>> +/*
>> + * Note: bit 10 was previously used for a XEN_X86_EMU_VPCI flag. This bit should
>> + * not be re-used without careful consideration.
>> + */
> 
> I think a multi-line comment is drawing overly much attention to this.
> How about "Note: Bit 10 was previously used for XEN_X86_EMU_VPCI. Re-use
> with care." which I think fits in a single line comment.

Sounds good.

> 
> Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common
  2023-11-29 21:25     ` Stewart Hildebrand
@ 2023-12-15  9:36       ` Rahul Singh
  2023-12-19 18:55         ` Stewart Hildebrand
  0 siblings, 1 reply; 32+ messages in thread
From: Rahul Singh @ 2023-12-15  9:36 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Jan Beulich, Wei Liu, Anthony PERARD, Juergen Gross,
	Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Christian Lindig, David Scott, Marek Marczykowski-Górecki,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Roger Pau Monné, Oleksandr Andrushchenko, Christian Lindig,
	xen-devel@lists.xenproject.org

Hi Stewart,

> On 29 Nov 2023, at 9:25 pm, Stewart Hildebrand <Stewart.Hildebrand@amd.com> wrote:
> 
> On 11/14/23 04:11, Jan Beulich wrote:
>> On 13.11.2023 23:21, Stewart Hildebrand wrote:
>>> @@ -709,10 +710,17 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>         return -EINVAL;
>>>     }
>>> 
>>> +    if ( vpci && !hvm )
>>> +    {
>>> +        dprintk(XENLOG_INFO, "vPCI requested for non-HVM guest\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>>     return 0;
>>> }
>> 
>> As said on the v5 thread, I think my comment was misguided (I'm sorry)
>> and this wants keeping in common code as you had it.
> 
> I'll move it back to xen/common/domain.c. No worries.

I tested this patch and observed build failure when compiling the "x86_64” arch with
"CONFIG_HVM=n“ option.

x86_64-linux-gnu-ld    -melf_x86_64  -T arch/x86/xen.lds -N prelink.o --build-id=sha1 \
    ./common/symbols-dummy.o -o ./.xen-syms.0 
x86_64-linux-gnu-ld: prelink.o: in function `arch_iommu_hwdom_init’:
(.init.text+0x2192b): undefined reference to `vpci_is_mmcfg_address’
(.init.text+0x2192b): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `vpci_is_mmcfg_address'
x86_64-linux-gnu-ld: (.init.text+0x21947): undefined reference to `vpci_is_mmcfg_address'
(.init.text+0x21947): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `vpci_is_mmcfg_address'
x86_64-linux-gnu-ld: prelink.o: in function `do_physdev_op’: 
(.text.do_physdev_op+0x6db): undefined reference to `register_vpci_mmcfg_handler'
(.text.do_physdev_op+0x6db): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `register_vpci_mmcfg_handler'
x86_64-linux-gnu-ld: ./.xen-syms.0: hidden symbol `vpci_is_mmcfg_address' isn't defined
x86_64-linux-gnu-ld: final link failed: bad value                


Regards,
Rahul


> 
>> 
>>> --- a/xen/include/public/arch-x86/xen.h
>>> +++ b/xen/include/public/arch-x86/xen.h
>>> @@ -283,15 +283,16 @@ struct xen_arch_domainconfig {
>>> #define XEN_X86_EMU_PIT             (1U<<_XEN_X86_EMU_PIT)
>>> #define _XEN_X86_EMU_USE_PIRQ       9
>>> #define XEN_X86_EMU_USE_PIRQ        (1U<<_XEN_X86_EMU_USE_PIRQ)
>>> -#define _XEN_X86_EMU_VPCI           10
>>> -#define XEN_X86_EMU_VPCI            (1U<<_XEN_X86_EMU_VPCI)
>>> +/*
>>> + * Note: bit 10 was previously used for a XEN_X86_EMU_VPCI flag. This bit should
>>> + * not be re-used without careful consideration.
>>> + */
>> 
>> I think a multi-line comment is drawing overly much attention to this.
>> How about "Note: Bit 10 was previously used for XEN_X86_EMU_VPCI. Re-use
>> with care." which I think fits in a single line comment.
> 
> Sounds good.
> 
>> 
>> Jan



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common
  2023-12-15  9:36       ` Rahul Singh
@ 2023-12-19 18:55         ` Stewart Hildebrand
  0 siblings, 0 replies; 32+ messages in thread
From: Stewart Hildebrand @ 2023-12-19 18:55 UTC (permalink / raw)
  To: Rahul Singh
  Cc: Jan Beulich, Wei Liu, Anthony PERARD, Juergen Gross,
	Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Christian Lindig, David Scott, Marek Marczykowski-Górecki,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	Roger Pau Monné, Oleksandr Andrushchenko, Christian Lindig,
	xen-devel@lists.xenproject.org

On 12/15/23 04:36, Rahul Singh wrote:
> Hi Stewart,
> 
>> On 29 Nov 2023, at 9:25 pm, Stewart Hildebrand <Stewart.Hildebrand@amd.com> wrote:
>>
>> On 11/14/23 04:11, Jan Beulich wrote:
>>> On 13.11.2023 23:21, Stewart Hildebrand wrote:
>>>> @@ -709,10 +710,17 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>>         return -EINVAL;
>>>>     }
>>>>
>>>> +    if ( vpci && !hvm )
>>>> +    {
>>>> +        dprintk(XENLOG_INFO, "vPCI requested for non-HVM guest\n");
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>>     return 0;
>>>> }
>>>
>>> As said on the v5 thread, I think my comment was misguided (I'm sorry)
>>> and this wants keeping in common code as you had it.
>>
>> I'll move it back to xen/common/domain.c. No worries.
> 
> I tested this patch and observed build failure when compiling the "x86_64” arch with
> "CONFIG_HVM=n“ option.
> 
> x86_64-linux-gnu-ld    -melf_x86_64  -T arch/x86/xen.lds -N prelink.o --build-id=sha1 \
>     ./common/symbols-dummy.o -o ./.xen-syms.0 
> x86_64-linux-gnu-ld: prelink.o: in function `arch_iommu_hwdom_init’:
> (.init.text+0x2192b): undefined reference to `vpci_is_mmcfg_address’
> (.init.text+0x2192b): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `vpci_is_mmcfg_address'
> x86_64-linux-gnu-ld: (.init.text+0x21947): undefined reference to `vpci_is_mmcfg_address'
> (.init.text+0x21947): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `vpci_is_mmcfg_address'
> x86_64-linux-gnu-ld: prelink.o: in function `do_physdev_op’: 
> (.text.do_physdev_op+0x6db): undefined reference to `register_vpci_mmcfg_handler'
> (.text.do_physdev_op+0x6db): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `register_vpci_mmcfg_handler'
> x86_64-linux-gnu-ld: ./.xen-syms.0: hidden symbol `vpci_is_mmcfg_address' isn't defined
> x86_64-linux-gnu-ld: final link failed: bad value                

Ah, good catch. Before moving it, the flag was defined in xen/arch/x86/include/asm/domain.h:

#ifdef CONFIG_HVM
#define X86_EMU_VPCI     XEN_X86_EMU_VPCI
#else
#define X86_EMU_VPCI     0
#endif

#define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))


With CONFIG_HVM not set, in xen/drivers/passthrough/x86/iommu.c, the compiler optimized out the call to vpci_is_mmcfg_address():

    if ( has_vpci(d) && vpci_is_mmcfg_address(d, pfn_to_paddr(pfn)) )

After moving the flag, there are a couple of options to make the issue go away. I don't think #defining XEN_DOMCTL_CDF_vpci 0 in the !HVM case would be a good idea since that's a flag shared with the toolstack. We could change the definition of has_vpci():

#define has_vpci(d) (((d)->options & XEN_DOMCTL_CDF_vpci) && \
                     IS_ENABLED(CONFIG_HVM))

Or we could provide empty static inline definitions of vpci_is_mmcfg_address() and register_vpci_mmcfg_handler():

#ifdef CONFIG_HVM
bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr);
#else
static inline bool vpci_is_mmcfg_address(const struct domain *d, paddr_t addr)
{
    return false;
}
#endif

I don't have a strong preference for which way, but changing has_vpci() has a smaller diffstat, so I'll go with that for now. This is assuming that we still want to go with this approach. Given recent related discussions [1], it's possible we may not need a vPCI flag in struct xen_domctl_createdomain, but instead a flag internal to the hypervisor somewhere in struct domain.

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-12/msg00212.html


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6 3/5] [FUTURE] xen/arm: enable vPCI for dom0
  2023-11-13 22:21 [PATCH v6 0/5] Kconfig for PCI passthrough on ARM Stewart Hildebrand
  2023-11-13 22:21 ` [PATCH v6 1/5] xen/arm: pci: introduce PCI_PASSTHROUGH Kconfig option Stewart Hildebrand
  2023-11-13 22:21 ` [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common Stewart Hildebrand
@ 2023-11-13 22:21 ` Stewart Hildebrand
  2023-11-13 22:21 ` [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs Stewart Hildebrand
  2023-11-13 22:21 ` [PATCH v6 5/5] [FUTURE] tools/arm: " Stewart Hildebrand
  4 siblings, 0 replies; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-13 22:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stewart Hildebrand, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk

Set the vPCI flag in xen_domctl_createdomain to enable vPCI for dom0 if
iommu and PCI passthrough are enabled and there exists a PCI host bridge
in the system.

Adjust pci_host_iterate_bridges_and_count() to count the number of host
bridges if no callback is provided.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
Allowing/enabling vPCI for dom0 on ARM should follow or be part of the
PCI passthrough SMMU series [1]. I'm including it here due to
prerequisites in this Kconfig series. Once the prerequisites are
committed I'll move this patch to the PCI passthrough SMMU series.

v5->v6:
* no change

v4->v5:
* add [FUTURE] tag
* move flags_optional change from the previous patch to here

v3->v4:
* depend on iommu_enabled, pci_passthrough_enabled, and whether there
  is a pci host bridge

v2->v3:
* new patch

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00210.html
---
 xen/arch/arm/domain.c              |  3 ++-
 xen/arch/arm/domain_build.c        |  6 ++++++
 xen/arch/arm/include/asm/pci.h     |  9 +++++++++
 xen/arch/arm/pci/pci-host-common.c | 11 ++++++++---
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 28e3aaa5e482..1409a4235e13 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -607,7 +607,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
 {
     unsigned int max_vcpus;
     unsigned int flags_required = (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap);
-    unsigned int flags_optional = (XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_vpmu);
+    unsigned int flags_optional = (XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_vpmu |
+                                   XEN_DOMCTL_CDF_vpci);
     unsigned int sve_vl_bits = sve_decode_vl(config->arch.sve_vl);
 
     if ( (config->flags & ~flags_optional) != flags_required )
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 2dd2926b4144..512b3c4c76e2 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -3916,6 +3916,12 @@ void __init create_dom0(void)
             panic("SVE vector length error\n");
     }
 
+    if ( IS_ENABLED(CONFIG_HAS_VPCI) &&
+         iommu_enabled &&
+         is_pci_passthrough_enabled() &&
+         (pci_host_iterate_bridges_and_count(NULL, NULL) > 0) )
+        dom0_cfg.flags |= XEN_DOMCTL_CDF_vpci;
+
     dom0 = domain_create(0, &dom0_cfg, CDF_privileged | CDF_directmap);
     if ( IS_ERR(dom0) )
         panic("Error creating domain 0 (rc = %ld)\n", PTR_ERR(dom0));
diff --git a/xen/arch/arm/include/asm/pci.h b/xen/arch/arm/include/asm/pci.h
index 7f77226c9bbf..74816a687bbb 100644
--- a/xen/arch/arm/include/asm/pci.h
+++ b/xen/arch/arm/include/asm/pci.h
@@ -147,5 +147,14 @@ static inline int pci_get_new_domain_nr(void)
     return -1;
 }
 
+struct pci_host_bridge;
+
+static inline int pci_host_iterate_bridges_and_count(
+    struct domain *d,
+    int (*cb)(struct domain *d, struct pci_host_bridge *bridge))
+{
+    return 0;
+}
+
 #endif  /*!CONFIG_HAS_PCI*/
 #endif /* __ARM_PCI_H__ */
diff --git a/xen/arch/arm/pci/pci-host-common.c b/xen/arch/arm/pci/pci-host-common.c
index c0faf0f43675..e6a03ae668f8 100644
--- a/xen/arch/arm/pci/pci-host-common.c
+++ b/xen/arch/arm/pci/pci-host-common.c
@@ -319,9 +319,14 @@ int pci_host_iterate_bridges_and_count(struct domain *d,
     {
         int ret;
 
-        ret = cb(d, bridge);
-        if ( ret < 0 )
-            return ret;
+        if ( cb )
+        {
+            ret = cb(d, bridge);
+            if ( ret < 0 )
+                return ret;
+        }
+        else
+            ret = 1;
         count += ret;
     }
     return count;
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-11-13 22:21 [PATCH v6 0/5] Kconfig for PCI passthrough on ARM Stewart Hildebrand
                   ` (2 preceding siblings ...)
  2023-11-13 22:21 ` [PATCH v6 3/5] [FUTURE] xen/arm: enable vPCI for dom0 Stewart Hildebrand
@ 2023-11-13 22:21 ` Stewart Hildebrand
  2023-11-14  9:13   ` Jan Beulich
  2023-12-01  9:16   ` Roger Pau Monné
  2023-11-13 22:21 ` [PATCH v6 5/5] [FUTURE] tools/arm: " Stewart Hildebrand
  4 siblings, 2 replies; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-13 22:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stewart Hildebrand, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Roger Pau Monné,
	Paul Durrant

Select HAS_VPCI_GUEST_SUPPORT in Kconfig for enabling vPCI support for
domUs.

Add checks to fail guest creation if the configuration is invalid.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
As the tag implies, this patch is not intended to be merged (yet).

Note that CONFIG_HAS_VPCI_GUEST_SUPPORT is not currently used in the
upstream code base. It will be used by the vPCI series [1]. This patch
is intended to be merged as part of the vPCI series. I'll coordinate
with Volodymyr to include this in the vPCI series or resend afterwards.
Meanwhile, I'll include it here until the Kconfig and
xen_domctl_createdomain prerequisites have been committed.

v5->v6:
* drop is_pvh_domain(), simply make arch_needs_vpci() return false on
  x86 for now
* leave has_vpci() alone, instead, add HAS_VPCI_GUEST_SUPPORT check in
  domain_create

v4->v5:
* replace is_system_domain() check with dom_io check
* return an error in XEN_DOMCTL_assign_device (thanks Jan!)
* remove CONFIG_ARM check
* add needs_vpci() and arch_needs_vpci()
* add HAS_VPCI_GUEST_SUPPORT check to has_vpci()

v3->v4:
* refuse to create domain if configuration is invalid
* split toolstack change into separate patch

v2->v3:
* set pci flags in toolstack

v1->v2:
* new patch

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00660.html
---
 xen/arch/arm/Kconfig              |  1 +
 xen/arch/arm/include/asm/domain.h |  2 ++
 xen/arch/arm/vpci.c               |  8 ++++++++
 xen/arch/x86/include/asm/domain.h |  2 ++
 xen/common/domain.c               |  5 +++++
 xen/drivers/passthrough/pci.c     | 20 ++++++++++++++++++++
 6 files changed, 38 insertions(+)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 5ff68e5d5979..3845b238a33f 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -195,6 +195,7 @@ config PCI_PASSTHROUGH
 	depends on ARM_64
 	select HAS_PCI
 	select HAS_VPCI
+	select HAS_VPCI_GUEST_SUPPORT
 	default n
 	help
 	  This option enables PCI device passthrough
diff --git a/xen/arch/arm/include/asm/domain.h b/xen/arch/arm/include/asm/domain.h
index 3614562eaefe..8e6d5fe9578c 100644
--- a/xen/arch/arm/include/asm/domain.h
+++ b/xen/arch/arm/include/asm/domain.h
@@ -31,6 +31,8 @@ enum domain_type {
 
 #define is_domain_direct_mapped(d) ((d)->cdf & CDF_directmap)
 
+#define arch_needs_vpci(d) ({ (void)(d); true; })
+
 /*
  * Is the domain using the host memory layout?
  *
diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3bc4bb55082a..61e0edcedea9 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -2,6 +2,7 @@
 /*
  * xen/arch/arm/vpci.c
  */
+#include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/vpci.h>
 
@@ -90,8 +91,15 @@ int domain_vpci_init(struct domain *d)
             return ret;
     }
     else
+    {
+        if ( !IS_ENABLED(CONFIG_HAS_VPCI_GUEST_SUPPORT) )
+        {
+            gdprintk(XENLOG_ERR, "vPCI requested but guest support not enabled\n");
+            return -EINVAL;
+        }
         register_mmio_handler(d, &vpci_mmio_handler,
                               GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);
+    }
 
     return 0;
 }
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index cb02a4d1ebb2..d34015391eed 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -503,6 +503,8 @@ struct arch_domain
 #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
 #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
 
+#define arch_needs_vpci(d) ({ (void)(d); false; })
+
 #define gdt_ldt_pt_idx(v) \
       ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
 #define pv_gdt_ptes(v) \
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 12dc27428972..47d49c57bf83 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -692,6 +692,11 @@ struct domain *domain_create(domid_t domid,
 
     if ( !is_idle_domain(d) )
     {
+        err = -EINVAL;
+        if ( !is_hardware_domain(d) && (config->flags & XEN_DOMCTL_CDF_vpci) &&
+             !IS_ENABLED(CONFIG_HAS_VPCI_GUEST_SUPPORT) )
+            goto fail;
+
         if ( !is_hardware_domain(d) )
             d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
         else
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 04d00c7c37df..2203725a2aa6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1542,6 +1542,18 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev)
     pcidevs_unlock();
 }
 
+static bool needs_vpci(const struct domain *d)
+{
+    if ( is_hardware_domain(d) )
+        return false;
+
+    if ( d == dom_io )
+        /* xl pci-assignable-add assigns PCI devices to domIO */
+        return false;
+
+    return arch_needs_vpci(d);
+}
+
 int iommu_do_pci_domctl(
     struct xen_domctl *domctl, struct domain *d,
     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
@@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN(machine_sbdf);
 
+        if ( needs_vpci(d) && !has_vpci(d) )
+        {
+            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
+                   &PCI_SBDF(seg, bus, devfn), d);
+            ret = -EPERM;
+            break;
+        }
+
         pcidevs_lock();
         ret = device_assigned(seg, bus, devfn);
         if ( domctl->cmd == XEN_DOMCTL_test_assign_device )
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-11-13 22:21 ` [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs Stewart Hildebrand
@ 2023-11-14  9:13   ` Jan Beulich
  2023-11-30  2:47     ` Stewart Hildebrand
  2023-12-01  9:16   ` Roger Pau Monné
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2023-11-14  9:13 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné, Paul Durrant, xen-devel

On 13.11.2023 23:21, Stewart Hildebrand wrote:
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -503,6 +503,8 @@ struct arch_domain
>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>  
> +#define arch_needs_vpci(d) ({ (void)(d); false; })

See my comments on the v5 thread on both this and ...

> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1542,6 +1542,18 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev)
>      pcidevs_unlock();
>  }
>  
> +static bool needs_vpci(const struct domain *d)
> +{
> +    if ( is_hardware_domain(d) )
> +        return false;

... this. (It is generally a good idea to wait a little with sending new
versions, when you can't be sure yet whether the earlier discussion has
settled.)

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-11-14  9:13   ` Jan Beulich
@ 2023-11-30  2:47     ` Stewart Hildebrand
  2023-11-30  8:33       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-30  2:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné, Paul Durrant, xen-devel

On 11/14/23 04:13, Jan Beulich wrote:
> On 13.11.2023 23:21, Stewart Hildebrand wrote:
>> --- a/xen/arch/x86/include/asm/domain.h
>> +++ b/xen/arch/x86/include/asm/domain.h
>> @@ -503,6 +503,8 @@ struct arch_domain
>>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
>>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>>  
>> +#define arch_needs_vpci(d) ({ (void)(d); false; })
> 
> See my comments on the v5 thread on both this and ...

So, the goal here is to return true for a PVH dom0, and false otherwise (for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) returns true for PVH, how about the following?

/* TODO: re-visit when vPCI is enabled for PVH domUs. */
#define arch_needs_vpci(d) ({                       \
    const struct domain *_d = (d);                  \
    is_hardware_domain(_d) && is_hvm_domain(_d); })


Link to v5 thread for reference [1]
[1] https://lists.xenproject.org/archives/html/xen-devel/2023-11/msg00968.html

> 
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -1542,6 +1542,18 @@ void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev)
>>      pcidevs_unlock();
>>  }
>>  
>> +static bool needs_vpci(const struct domain *d)
>> +{
>> +    if ( is_hardware_domain(d) )
>> +        return false;
> 
> ... this.

I'll move this check to the Arm arch_needs_vpci() in xen/arch/arm/include/asm/domain.h

> (It is generally a good idea to wait a little with sending new
> versions, when you can't be sure yet whether the earlier discussion has
> settled.)

(Sorry, I'll be better about this going forward.)

> 
> Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-11-30  2:47     ` Stewart Hildebrand
@ 2023-11-30  8:33       ` Jan Beulich
  2023-11-30 17:06         ` Stewart Hildebrand
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2023-11-30  8:33 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné, Paul Durrant, xen-devel

On 30.11.2023 03:47, Stewart Hildebrand wrote:
> On 11/14/23 04:13, Jan Beulich wrote:
>> On 13.11.2023 23:21, Stewart Hildebrand wrote:
>>> --- a/xen/arch/x86/include/asm/domain.h
>>> +++ b/xen/arch/x86/include/asm/domain.h
>>> @@ -503,6 +503,8 @@ struct arch_domain
>>>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
>>>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>>>  
>>> +#define arch_needs_vpci(d) ({ (void)(d); false; })
>>
>> See my comments on the v5 thread on both this and ...
> 
> So, the goal here is to return true for a PVH dom0, and false otherwise (for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) returns true for PVH, how about the following?
> 
> /* TODO: re-visit when vPCI is enabled for PVH domUs. */
> #define arch_needs_vpci(d) ({                       \
>     const struct domain *_d = (d);                  \
>     is_hardware_domain(_d) && is_hvm_domain(_d); })

Looks okay to me, except for the leading underscore in _d (see respective
Misra guidelines, merely re-enforcing what the C standard says). Of course
the double evaluate_nospec() isn't quite nice in the result, but I guess
this isn't going to be used in many places?

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-11-30  8:33       ` Jan Beulich
@ 2023-11-30 17:06         ` Stewart Hildebrand
  2023-12-01  6:57           ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-30 17:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné, Paul Durrant, xen-devel

On 11/30/23 03:33, Jan Beulich wrote:
> On 30.11.2023 03:47, Stewart Hildebrand wrote:
>> On 11/14/23 04:13, Jan Beulich wrote:
>>> On 13.11.2023 23:21, Stewart Hildebrand wrote:
>>>> --- a/xen/arch/x86/include/asm/domain.h
>>>> +++ b/xen/arch/x86/include/asm/domain.h
>>>> @@ -503,6 +503,8 @@ struct arch_domain
>>>>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
>>>>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>>>>  
>>>> +#define arch_needs_vpci(d) ({ (void)(d); false; })
>>>
>>> See my comments on the v5 thread on both this and ...
>>
>> So, the goal here is to return true for a PVH dom0, and false otherwise (for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) returns true for PVH, how about the following?
>>
>> /* TODO: re-visit when vPCI is enabled for PVH domUs. */
>> #define arch_needs_vpci(d) ({                       \
>>     const struct domain *_d = (d);                  \
>>     is_hardware_domain(_d) && is_hvm_domain(_d); })
> 
> Looks okay to me, except for the leading underscore in _d (see respective
> Misra guidelines, merely re-enforcing what the C standard says).

Right. If I'm interpreting the standards correctly, it looks like a trailing underscore would work, seeing as we haven't adopted MISRA C:2012 Dir 4.5 (yet?).

> Of course
> the double evaluate_nospec() isn't quite nice in the result, but I guess
> this isn't going to be used in many places?

Only in XEN_DOMCTL_assign_device, as far as I'm aware.

> 
> Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-11-30 17:06         ` Stewart Hildebrand
@ 2023-12-01  6:57           ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2023-12-01  6:57 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné, Paul Durrant, xen-devel

On 30.11.2023 18:06, Stewart Hildebrand wrote:
> On 11/30/23 03:33, Jan Beulich wrote:
>> On 30.11.2023 03:47, Stewart Hildebrand wrote:
>>> On 11/14/23 04:13, Jan Beulich wrote:
>>>> On 13.11.2023 23:21, Stewart Hildebrand wrote:
>>>>> --- a/xen/arch/x86/include/asm/domain.h
>>>>> +++ b/xen/arch/x86/include/asm/domain.h
>>>>> @@ -503,6 +503,8 @@ struct arch_domain
>>>>>  #define has_vpit(d)        (!!((d)->arch.emulation_flags & X86_EMU_PIT))
>>>>>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>>>>>  
>>>>> +#define arch_needs_vpci(d) ({ (void)(d); false; })
>>>>
>>>> See my comments on the v5 thread on both this and ...
>>>
>>> So, the goal here is to return true for a PVH dom0, and false otherwise (for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) returns true for PVH, how about the following?
>>>
>>> /* TODO: re-visit when vPCI is enabled for PVH domUs. */
>>> #define arch_needs_vpci(d) ({                       \
>>>     const struct domain *_d = (d);                  \
>>>     is_hardware_domain(_d) && is_hvm_domain(_d); })
>>
>> Looks okay to me, except for the leading underscore in _d (see respective
>> Misra guidelines, merely re-enforcing what the C standard says).
> 
> Right. If I'm interpreting the standards correctly, it looks like a trailing underscore would work, seeing as we haven't adopted MISRA C:2012 Dir 4.5 (yet?).

Adopting the respective Misra rule would only affirm that we should adhere
to the C spec. Us being free-standing (and hence no runtime library involved)
may have been an argument towards more relaxed treatment, but imo never was a
good justification. And yes, trailing underscores is what I have been
recommending, but some of the other maintainers don't really like them
(without, iirc, indicating what else to use as an underlying naming scheme).

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-11-13 22:21 ` [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs Stewart Hildebrand
  2023-11-14  9:13   ` Jan Beulich
@ 2023-12-01  9:16   ` Roger Pau Monné
  2023-12-02  2:56     ` Stefano Stabellini
  1 sibling, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2023-12-01  9:16 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: xen-devel, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Paul Durrant

On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>          bus = PCI_BUS(machine_sbdf);
>          devfn = PCI_DEVFN(machine_sbdf);
>  
> +        if ( needs_vpci(d) && !has_vpci(d) )
> +        {
> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> +                   &PCI_SBDF(seg, bus, devfn), d);
> +            ret = -EPERM;
> +            break;

I think this is likely too restrictive going forward.  The current
approach is indeed to enable vPCI on a per-domain basis because that's
how PVH dom0 uses it, due to being unable to use ioreq servers.

If we start to expose vPCI suport to guests the interface should be on
a per-device basis, so that vPCI could be enabled for some devices,
while others could still be handled by ioreq servers.

We might want to add a new flag to xen_domctl_assign_device (used by
XEN_DOMCTL_assign_device) in order to signal whether the device will
use vPCI.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-01  9:16   ` Roger Pau Monné
@ 2023-12-02  2:56     ` Stefano Stabellini
  2023-12-04  3:54       ` Stewart Hildebrand
                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Stefano Stabellini @ 2023-12-02  2:56 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stewart Hildebrand, xen-devel, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Paul Durrant

[-- Attachment #1: Type: text/plain, Size: 1791 bytes --]

On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> >          bus = PCI_BUS(machine_sbdf);
> >          devfn = PCI_DEVFN(machine_sbdf);
> >  
> > +        if ( needs_vpci(d) && !has_vpci(d) )
> > +        {
> > +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> > +                   &PCI_SBDF(seg, bus, devfn), d);
> > +            ret = -EPERM;
> > +            break;
> 
> I think this is likely too restrictive going forward.  The current
> approach is indeed to enable vPCI on a per-domain basis because that's
> how PVH dom0 uses it, due to being unable to use ioreq servers.
> 
> If we start to expose vPCI suport to guests the interface should be on
> a per-device basis, so that vPCI could be enabled for some devices,
> while others could still be handled by ioreq servers.
> 
> We might want to add a new flag to xen_domctl_assign_device (used by
> XEN_DOMCTL_assign_device) in order to signal whether the device will
> use vPCI.

Actually I don't think this is a good idea. I am all for flexibility but
supporting multiple different configurations comes at an extra cost for
both maintainers and contributors. I think we should try to reduce the
amount of configurations we support rather than increasing them
(especially on x86 where we have PV, PVH, HVM).

I don't think we should enable IOREQ servers to handle PCI passthrough
for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
Passthrough can be handled by vPCI just fine. I think this should be a
good anti-feature to have (a goal to explicitly not add this feature) to
reduce complexity. Unless you see a specific usecase to add support for
it?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-02  2:56     ` Stefano Stabellini
@ 2023-12-04  3:54       ` Stewart Hildebrand
  2023-12-04  8:24       ` Jan Beulich
  2023-12-04 10:58       ` Roger Pau Monné
  2 siblings, 0 replies; 32+ messages in thread
From: Stewart Hildebrand @ 2023-12-04  3:54 UTC (permalink / raw)
  To: Stefano Stabellini, Roger Pau Monné
  Cc: xen-devel, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Paul Durrant

On 12/1/23 21:56, Stefano Stabellini wrote:
> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>          bus = PCI_BUS(machine_sbdf);
>>>          devfn = PCI_DEVFN(machine_sbdf);
>>>  
>>> +        if ( needs_vpci(d) && !has_vpci(d) )
>>> +        {
>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
>>> +                   &PCI_SBDF(seg, bus, devfn), d);
>>> +            ret = -EPERM;
>>> +            break;
>>
>> I think this is likely too restrictive going forward.  The current
>> approach is indeed to enable vPCI on a per-domain basis because that's
>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>
>> If we start to expose vPCI suport to guests the interface should be on
>> a per-device basis, so that vPCI could be enabled for some devices,
>> while others could still be handled by ioreq servers.
>>
>> We might want to add a new flag to xen_domctl_assign_device (used by
>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>> use vPCI.
> 
> Actually I don't think this is a good idea. I am all for flexibility but
> supporting multiple different configurations comes at an extra cost for
> both maintainers and contributors. I think we should try to reduce the
> amount of configurations we support rather than increasing them
> (especially on x86 where we have PV, PVH, HVM).
> 
> I don't think we should enable IOREQ servers to handle PCI passthrough
> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> Passthrough can be handled by vPCI just fine. I think this should be a
> good anti-feature to have (a goal to explicitly not add this feature) to
> reduce complexity. Unless you see a specific usecase to add support for
> it?

Just to preemptively clarify: there is a use case for passthrough (vPCI) and emulated virtio-pci devices (ioreq). However, the XEN_DOMCTL_assign_device hypercall, where this check is added, is only used for real physical hardware devices as far as I can tell. So I agree, I can't see a use case for passing through some physical devices with vPCI and some physical devices with qemu/ioreq.

With that said, the plumbing for a new flag does not appear to be particularly complex. It may actually be simpler than introducing a per-arch needs_vpci(d) heuristic.

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..2c38088a4772 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1113,6 +1113,7 @@ typedef struct pci_add_state {
     libxl_device_pci pci;
     libxl_domid pci_domid;
     int retries;
+    bool vpci;
 } pci_add_state;

 static void pci_add_qemu_trad_watch_state_cb(libxl__egc *egc,
@@ -1176,6 +1177,10 @@ static void do_pci_add(libxl__egc *egc,
         }
     }

+    if (type == LIBXL_DOMAIN_TYPE_PVH /* includes Arm guests */) {
+        pas->vpci = true;
+    }
+
     rc = 0;

 out:
@@ -1418,7 +1423,8 @@ static void pci_add_dm_done(libxl__egc *egc,
     unsigned long long start, end, flags, size;
     int irq, i;
     int r;
-    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+    uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED |
+                    (pas->vpci ? XEN_DOMCTL_DEV_USES_VPCI : 0);
     uint32_t domainid = domid;
     bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);

diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 2203725a2aa6..7786da1cf1e6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1630,7 +1630,7 @@ int iommu_do_pci_domctl(
         bus = PCI_BUS(machine_sbdf);
         devfn = PCI_DEVFN(machine_sbdf);

-        if ( needs_vpci(d) && !has_vpci(d) )
+        if ( (flags & XEN_DOMCTL_DEV_USES_VPCI) && !has_vpci(d) )
         {
             printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
                    &PCI_SBDF(seg, bus, devfn), d);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 8b3ea62cae06..5735d47219bc 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -527,6 +527,7 @@ struct xen_domctl_assign_device {
     uint32_t dev;   /* XEN_DOMCTL_DEV_* */
     uint32_t flags;
 #define XEN_DOMCTL_DEV_RDM_RELAXED      1 /* assign only */
+#define XEN_DOMCTL_DEV_USES_VPCI        (1 << 1)
     union {
         struct {
             uint32_t machine_sbdf;   /* machine PCI ID of assigned device */


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-02  2:56     ` Stefano Stabellini
  2023-12-04  3:54       ` Stewart Hildebrand
@ 2023-12-04  8:24       ` Jan Beulich
  2023-12-04 10:58       ` Roger Pau Monné
  2 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2023-12-04  8:24 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Stewart Hildebrand, xen-devel, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Wei Liu, Paul Durrant, Roger Pau Monné

On 02.12.2023 03:56, Stefano Stabellini wrote:
> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>          bus = PCI_BUS(machine_sbdf);
>>>          devfn = PCI_DEVFN(machine_sbdf);
>>>  
>>> +        if ( needs_vpci(d) && !has_vpci(d) )
>>> +        {
>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
>>> +                   &PCI_SBDF(seg, bus, devfn), d);
>>> +            ret = -EPERM;
>>> +            break;
>>
>> I think this is likely too restrictive going forward.  The current
>> approach is indeed to enable vPCI on a per-domain basis because that's
>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>
>> If we start to expose vPCI suport to guests the interface should be on
>> a per-device basis, so that vPCI could be enabled for some devices,
>> while others could still be handled by ioreq servers.
>>
>> We might want to add a new flag to xen_domctl_assign_device (used by
>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>> use vPCI.
> 
> Actually I don't think this is a good idea. I am all for flexibility but
> supporting multiple different configurations comes at an extra cost for
> both maintainers and contributors. I think we should try to reduce the
> amount of configurations we support rather than increasing them
> (especially on x86 where we have PV, PVH, HVM).
> 
> I don't think we should enable IOREQ servers to handle PCI passthrough
> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> Passthrough can be handled by vPCI just fine. I think this should be a
> good anti-feature to have (a goal to explicitly not add this feature) to
> reduce complexity. Unless you see a specific usecase to add support for
> it?

I could see very special devices to want accompanying by a special ioreq
server. I could also see tandem pass-through/emulated device pairs to
potentially be of use and require coordination in the same ioreq server.

That said, I wouldn't insist on allowing a mix of vPCI and ioreq servers
to be used until an actual use case (with code supporting it) actually
appears.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-02  2:56     ` Stefano Stabellini
  2023-12-04  3:54       ` Stewart Hildebrand
  2023-12-04  8:24       ` Jan Beulich
@ 2023-12-04 10:58       ` Roger Pau Monné
  2023-12-04 19:09         ` Stewart Hildebrand
  2023-12-04 22:07         ` Stefano Stabellini
  2 siblings, 2 replies; 32+ messages in thread
From: Roger Pau Monné @ 2023-12-04 10:58 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Stewart Hildebrand, xen-devel, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Paul Durrant

On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> > On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> > > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> > >          bus = PCI_BUS(machine_sbdf);
> > >          devfn = PCI_DEVFN(machine_sbdf);
> > >  
> > > +        if ( needs_vpci(d) && !has_vpci(d) )
> > > +        {
> > > +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> > > +                   &PCI_SBDF(seg, bus, devfn), d);
> > > +            ret = -EPERM;
> > > +            break;
> > 
> > I think this is likely too restrictive going forward.  The current
> > approach is indeed to enable vPCI on a per-domain basis because that's
> > how PVH dom0 uses it, due to being unable to use ioreq servers.
> > 
> > If we start to expose vPCI suport to guests the interface should be on
> > a per-device basis, so that vPCI could be enabled for some devices,
> > while others could still be handled by ioreq servers.
> > 
> > We might want to add a new flag to xen_domctl_assign_device (used by
> > XEN_DOMCTL_assign_device) in order to signal whether the device will
> > use vPCI.
> 
> Actually I don't think this is a good idea. I am all for flexibility but
> supporting multiple different configurations comes at an extra cost for
> both maintainers and contributors. I think we should try to reduce the
> amount of configurations we support rather than increasing them
> (especially on x86 where we have PV, PVH, HVM).

I think it's perfectly fine to initially require a domain to have all
its devices either passed through using vPCI or ireqs, but the
interface IMO should allow for such differentiation in the future.
That's why I think introducing a domain wide vPCI flag might not be
the best option going forward.

It would be perfectly fine for XEN_DOMCTL_assign_device to set a
domain wide vPCI flag, iow:

if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
{
    if ( has_arch_pdevs(d) )
    {
        printk("All passthrough devices must use the same backend\n");
        return -EINVAL;
    }

    /* Set vPCI domain flag */
}

We have already agreed that we want to aim for a setup where ioreqs
and vPCI could be used for the same domain, but I guess you assumed
that ioreqs would be used for emulated devices exclusively and vPCI
for passthrough devices?

Overall if we agree that ioreqs and vPCI should co-exist for a domain,
I'm not sure there's much reason to limit ioreqs to only handle
emulated devices, seems like an arbitrary limitation.

> I don't think we should enable IOREQ servers to handle PCI passthrough
> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> Passthrough can be handled by vPCI just fine. I think this should be a
> good anti-feature to have (a goal to explicitly not add this feature) to
> reduce complexity. Unless you see a specific usecase to add support for
> it?

There are passthrough devices (GPUs) that might require some extra
mediation on dom0 (like the Intel GVT-g thing) that would force the
usage of ioreqs to passthrough.

It's important that the interfaces we introduce are correct IMO,
because that ends up reflecting on the configuration options that we
expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
gets placed there will ultimately influence how the option gets
exposed in xl/libxl, and the interface there is relevant to keep
stable for end user sanity.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-04 10:58       ` Roger Pau Monné
@ 2023-12-04 19:09         ` Stewart Hildebrand
  2023-12-04 22:07         ` Stefano Stabellini
  1 sibling, 0 replies; 32+ messages in thread
From: Stewart Hildebrand @ 2023-12-04 19:09 UTC (permalink / raw)
  To: Roger Pau Monné, Stefano Stabellini
  Cc: xen-devel, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Paul Durrant

On 12/4/23 05:58, Roger Pau Monné wrote:
> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>>          bus = PCI_BUS(machine_sbdf);
>>>>          devfn = PCI_DEVFN(machine_sbdf);
>>>>  
>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
>>>> +        {
>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
>>>> +            ret = -EPERM;
>>>> +            break;
>>>
>>> I think this is likely too restrictive going forward.  The current
>>> approach is indeed to enable vPCI on a per-domain basis because that's
>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>>
>>> If we start to expose vPCI suport to guests the interface should be on
>>> a per-device basis, so that vPCI could be enabled for some devices,
>>> while others could still be handled by ioreq servers.
>>>
>>> We might want to add a new flag to xen_domctl_assign_device (used by
>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>>> use vPCI.
>>
>> Actually I don't think this is a good idea. I am all for flexibility but
>> supporting multiple different configurations comes at an extra cost for
>> both maintainers and contributors. I think we should try to reduce the
>> amount of configurations we support rather than increasing them
>> (especially on x86 where we have PV, PVH, HVM).
> 
> I think it's perfectly fine to initially require a domain to have all
> its devices either passed through using vPCI or ireqs, but the
> interface IMO should allow for such differentiation in the future.
> That's why I think introducing a domain wide vPCI flag might not be
> the best option going forward.
> 
> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
> domain wide vPCI flag, iow:
> 
> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
> {
>     if ( has_arch_pdevs(d) )
>     {
>         printk("All passthrough devices must use the same backend\n");
>         return -EINVAL;
>     }
> 
>     /* Set vPCI domain flag */

There is a vPCI initializer that would need to be called too (on Arm, to set up mmio handlers). It is normally called earlier during arch_domain_create(), but it looks to be okay to call here as long as it is done before the guest boots.

    d->options |= XEN_DOMCTL_CDF_vpci;
    domain_vpci_init(d);

Perhaps the flag should be set inside domain_vpci_init(d).

> }
> 
> We have already agreed that we want to aim for a setup where ioreqs
> and vPCI could be used for the same domain, but I guess you assumed
> that ioreqs would be used for emulated devices exclusively and vPCI
> for passthrough devices?
> 
> Overall if we agree that ioreqs and vPCI should co-exist for a domain,
> I'm not sure there's much reason to limit ioreqs to only handle
> emulated devices, seems like an arbitrary limitation.
> 
>> I don't think we should enable IOREQ servers to handle PCI passthrough
>> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
>> Passthrough can be handled by vPCI just fine. I think this should be a
>> good anti-feature to have (a goal to explicitly not add this feature) to
>> reduce complexity. Unless you see a specific usecase to add support for
>> it?
> 
> There are passthrough devices (GPUs) that might require some extra
> mediation on dom0 (like the Intel GVT-g thing) that would force the
> usage of ioreqs to passthrough.
> 
> It's important that the interfaces we introduce are correct IMO,
> because that ends up reflecting on the configuration options that we
> expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> gets placed there will ultimately influence how the option gets
> exposed in xl/libxl, and the interface there is relevant to keep
> stable for end user sanity.
> 
> Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-04 10:58       ` Roger Pau Monné
  2023-12-04 19:09         ` Stewart Hildebrand
@ 2023-12-04 22:07         ` Stefano Stabellini
  2023-12-05 11:08           ` Roger Pau Monné
  1 sibling, 1 reply; 32+ messages in thread
From: Stefano Stabellini @ 2023-12-04 22:07 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Stewart Hildebrand, xen-devel, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Paul Durrant

[-- Attachment #1: Type: text/plain, Size: 6809 bytes --]

On Mon, 4 Dec 2023, Roger Pau Monné wrote:
> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
> > On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> > > On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> > > > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> > > >          bus = PCI_BUS(machine_sbdf);
> > > >          devfn = PCI_DEVFN(machine_sbdf);
> > > >  
> > > > +        if ( needs_vpci(d) && !has_vpci(d) )
> > > > +        {
> > > > +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> > > > +                   &PCI_SBDF(seg, bus, devfn), d);
> > > > +            ret = -EPERM;
> > > > +            break;
> > > 
> > > I think this is likely too restrictive going forward.  The current
> > > approach is indeed to enable vPCI on a per-domain basis because that's
> > > how PVH dom0 uses it, due to being unable to use ioreq servers.
> > > 
> > > If we start to expose vPCI suport to guests the interface should be on
> > > a per-device basis, so that vPCI could be enabled for some devices,
> > > while others could still be handled by ioreq servers.
> > > 
> > > We might want to add a new flag to xen_domctl_assign_device (used by
> > > XEN_DOMCTL_assign_device) in order to signal whether the device will
> > > use vPCI.
> > 
> > Actually I don't think this is a good idea. I am all for flexibility but
> > supporting multiple different configurations comes at an extra cost for
> > both maintainers and contributors. I think we should try to reduce the
> > amount of configurations we support rather than increasing them
> > (especially on x86 where we have PV, PVH, HVM).
> 
> I think it's perfectly fine to initially require a domain to have all
> its devices either passed through using vPCI or ireqs, but the
> interface IMO should allow for such differentiation in the future.
> That's why I think introducing a domain wide vPCI flag might not be
> the best option going forward.
> 
> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
> domain wide vPCI flag, iow:
> 
> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
> {
>     if ( has_arch_pdevs(d) )
>     {
>         printk("All passthrough devices must use the same backend\n");
>         return -EINVAL;
>     }
> 
>     /* Set vPCI domain flag */
> }

That would be fine by me, but maybe we can avoid this change too. I was
imagining that vPCI would be enabled at domain creation, not at runtime.
And that vPCI would be enabled by default for all PVH guests (once we
are past the initial experimental phase.)

> We have already agreed that we want to aim for a setup where ioreqs
> and vPCI could be used for the same domain, but I guess you assumed
> that ioreqs would be used for emulated devices exclusively and vPCI
> for passthrough devices?

Yes, that's right

> Overall if we agree that ioreqs and vPCI should co-exist for a domain,
> I'm not sure there's much reason to limit ioreqs to only handle
> emulated devices, seems like an arbitrary limitation.

Reply below

> > I don't think we should enable IOREQ servers to handle PCI passthrough
> > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> > Passthrough can be handled by vPCI just fine. I think this should be a
> > good anti-feature to have (a goal to explicitly not add this feature) to
> > reduce complexity. Unless you see a specific usecase to add support for
> > it?
> 
> There are passthrough devices (GPUs) that might require some extra
> mediation on dom0 (like the Intel GVT-g thing) that would force the
> usage of ioreqs to passthrough.

From an architectural perspective, I think it would be cleaner, simpler
to maintain, and simpler to understand if Xen was the sole owner of the
PCI Root Complex and PCI config space mediation (implemented with vPCI).
IOREQ can be used for emulation and it works very well for that. At
least in my mind, that makes things much simpler.

I understand there are non-trivial cases, like virtual GPUs with
hardware access, but I don't classify those as passthrough. That's
because there isn't one device that gets fully assigned to the guest.
Instead, there is an emulated device (hence IOREQ) with certain MMIO
regions and interrupts that end up being directly mapped from real
hardware.

So I think it is natural in those cases to use IOREQ and it is also
natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
perspective, I hope it will mostly look as if the device is assigned to
Dom0. Even if it ends up being more complex than that, Rome wasn't
built in one day, and I don't think we should try to solve this problem
on day1 (as long as the interfaces are not stable interfaces).

> It's important that the interfaces we introduce are correct IMO,
> because that ends up reflecting on the configuration options that we
> expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> gets placed there will ultimately influence how the option gets
> exposed in xl/libxl, and the interface there is relevant to keep
> stable for end user sanity.

I agree with you on the stable interfaces. The important part is not to
introduce changes to stable interfaces that could limit us in the
future. Specifically that includes xl and libxl, we need to be careful
there. But I don't see a single per-domain vPCI enable/disable option as
a problem. Let's say that in the future we have a mediated vGPU
implementation: if it works together with vPCI then the per-domain vPCI
option in libxl will be enabled (either explicitely or by default), if
it doesn't then vPCI will be disabled (either explicitely or by the
newer vGPU options.)

For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
before adding more changes on top of them, not because I don't care
about the mediated GPU problem (we do have something similar at AMD),
but because I worry that if we try to change them now we might not do a
good enough job. I would prefer to wait until we know more about the
actual use case, ideally with code supporting it.

I think the difference in points of views comes from the fact that I see
vPCI as the default, QEMU only as a limited peripheral emulator (or
mediator for the vGPU case) but not in control. vPCI and QEMU are not
equal in my view. vPCI is in charge and always present if not in very
uncommon setups (even if we decide to hook it inside Xen by using
internal IOREQ interfaces). QEMU might come and go.

Now that I am writing this, I realize this is also why I wasn't too
happy with the idea of hooking vPCI using IOREQ. It makes them look as
if they are the same, while I don't they should be considered at the
same level of priority, criticality, safety, integration in the system,
etc.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-04 22:07         ` Stefano Stabellini
@ 2023-12-05 11:08           ` Roger Pau Monné
  2023-12-05 16:27             ` Stewart Hildebrand
                               ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Roger Pau Monné @ 2023-12-05 11:08 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Stewart Hildebrand, xen-devel, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Paul Durrant

On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
> > On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
> > > On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> > > > On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> > > > > @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> > > > >          bus = PCI_BUS(machine_sbdf);
> > > > >          devfn = PCI_DEVFN(machine_sbdf);
> > > > >  
> > > > > +        if ( needs_vpci(d) && !has_vpci(d) )
> > > > > +        {
> > > > > +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> > > > > +                   &PCI_SBDF(seg, bus, devfn), d);
> > > > > +            ret = -EPERM;
> > > > > +            break;
> > > > 
> > > > I think this is likely too restrictive going forward.  The current
> > > > approach is indeed to enable vPCI on a per-domain basis because that's
> > > > how PVH dom0 uses it, due to being unable to use ioreq servers.
> > > > 
> > > > If we start to expose vPCI suport to guests the interface should be on
> > > > a per-device basis, so that vPCI could be enabled for some devices,
> > > > while others could still be handled by ioreq servers.
> > > > 
> > > > We might want to add a new flag to xen_domctl_assign_device (used by
> > > > XEN_DOMCTL_assign_device) in order to signal whether the device will
> > > > use vPCI.
> > > 
> > > Actually I don't think this is a good idea. I am all for flexibility but
> > > supporting multiple different configurations comes at an extra cost for
> > > both maintainers and contributors. I think we should try to reduce the
> > > amount of configurations we support rather than increasing them
> > > (especially on x86 where we have PV, PVH, HVM).
> > 
> > I think it's perfectly fine to initially require a domain to have all
> > its devices either passed through using vPCI or ireqs, but the
> > interface IMO should allow for such differentiation in the future.
> > That's why I think introducing a domain wide vPCI flag might not be
> > the best option going forward.
> > 
> > It would be perfectly fine for XEN_DOMCTL_assign_device to set a
> > domain wide vPCI flag, iow:
> > 
> > if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
> > {
> >     if ( has_arch_pdevs(d) )
> >     {
> >         printk("All passthrough devices must use the same backend\n");
> >         return -EINVAL;
> >     }
> > 
> >     /* Set vPCI domain flag */
> > }
> 
> That would be fine by me, but maybe we can avoid this change too. I was
> imagining that vPCI would be enabled at domain creation, not at runtime.
> And that vPCI would be enabled by default for all PVH guests (once we
> are past the initial experimental phase.)

Then we don't even need a new CDF flag, and just enable vPCI when
IOMMU is enabled?  IOW: we can key the enabling of vPCI to
XEN_DOMCTL_CDF_iommu for specific domain types?

Maybe that's not so trivial on x86, as there's no x86 PVH domain type
from the hypervisor PoV.

> 
> > We have already agreed that we want to aim for a setup where ioreqs
> > and vPCI could be used for the same domain, but I guess you assumed
> > that ioreqs would be used for emulated devices exclusively and vPCI
> > for passthrough devices?
> 
> Yes, that's right
> 
> 
> > Overall if we agree that ioreqs and vPCI should co-exist for a domain,
> > I'm not sure there's much reason to limit ioreqs to only handle
> > emulated devices, seems like an arbitrary limitation.
> 
> Reply below
> 
> 
> > > I don't think we should enable IOREQ servers to handle PCI passthrough
> > > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> > > Passthrough can be handled by vPCI just fine. I think this should be a
> > > good anti-feature to have (a goal to explicitly not add this feature) to
> > > reduce complexity. Unless you see a specific usecase to add support for
> > > it?
> > 
> > There are passthrough devices (GPUs) that might require some extra
> > mediation on dom0 (like the Intel GVT-g thing) that would force the
> > usage of ioreqs to passthrough.
> 
> From an architectural perspective, I think it would be cleaner, simpler
> to maintain, and simpler to understand if Xen was the sole owner of the
> PCI Root Complex and PCI config space mediation (implemented with vPCI).
> IOREQ can be used for emulation and it works very well for that. At
> least in my mind, that makes things much simpler.

But IOREQ already has all the code to mediate accesses to the PCI
config space, and the interface to register separate servers for
different PCI devices.

We would then need to duplicate this internally for vPCI, so that vPCI
could forward accesses to IOREQ just for IOREQ to forward to yet a
different component?  Seems like a lot of duplication for no benefit.

> I understand there are non-trivial cases, like virtual GPUs with
> hardware access, but I don't classify those as passthrough. That's
> because there isn't one device that gets fully assigned to the guest.
> Instead, there is an emulated device (hence IOREQ) with certain MMIO
> regions and interrupts that end up being directly mapped from real
> hardware.
> 
> So I think it is natural in those cases to use IOREQ and it is also
> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
> perspective, I hope it will mostly look as if the device is assigned to
> Dom0. Even if it ends up being more complex than that, Rome wasn't
> built in one day, and I don't think we should try to solve this problem
> on day1 (as long as the interfaces are not stable interfaces).

I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
allow for emulators to be implemented in user-space, but at the end
it's just an interface that allows forwarding accesses to certain
resources (for the case we are speaking about here, PCI config space)
to entities that registered as handlers.

vPCI OTOH just deals with a very specific resource (PCI config space)
and only allows internal handlers to be registered on a byte
granularity.

So your proposal would be to implement a hierarchy like the one on the
diagram below:

    ┌────────┐ ┌──────────┐ ┌──────────────────┐
    │ Memory │ │ IO Ports │ │ PCI config space │
    └───────┬┘ └┬─────────┘ └───┬──────────────┘
            │   │               │
            │   │           ┌───┴──┐
            │   │           │ vPCI │
            │   │           └─┬──┬─┘
         ┌──┴───┴┐            │  │
         │ IOREQ ├────────────┘  │
         └────┬──┘               │
              │                  │
 ┌────────────┴──┐              ┌┴──────────────┐
 │ IOREQ servers │              │ vPCI handlers │
 └───────────────┘              └───────────────┘

While what I'm proposing would look like:

    ┌────────┐ ┌──────────┐ ┌──────────────────┐
    │ Memory │ │ IO Ports │ │ PCI config space │
    └────┬───┘ └────┬─────┘ └────────┬─────────┘
         │          │                │
         └─────┬────┴────┬───────────┘
               │  IOREQ  │
               └─┬─────┬─┘
                 │     │
 ┌───────────────┤     └────┬──────┐
 │ IOREQ servers │          │ vPCI │
 └───────────────┘          └───┬──┘
                                │
                            ┌───┴───────────┐
                            │ vPCI handlers │
                            └───────────────┘

I'm obviously biased, but I think the latter is cleaner, and allows
all resources to be arbitrated by the same component (IOREQ).

If the concern is about the IOREQ hypercall interface, it would be
fine to introduce an option that limit IOREQs to internal users
(vPCI) without supporting external IOREQ servers.

Think of IOREQ as a resource mediator inside of Xen, that just does
the PCI address decoding and forwards the access to the interested
party, either an external IOREQ server or vPCI.

> 
> > It's important that the interfaces we introduce are correct IMO,
> > because that ends up reflecting on the configuration options that we
> > expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
> > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> > gets placed there will ultimately influence how the option gets
> > exposed in xl/libxl, and the interface there is relevant to keep
> > stable for end user sanity.
> 
> I agree with you on the stable interfaces. The important part is not to
> introduce changes to stable interfaces that could limit us in the
> future. Specifically that includes xl and libxl, we need to be careful
> there. But I don't see a single per-domain vPCI enable/disable option as
> a problem. Let's say that in the future we have a mediated vGPU
> implementation: if it works together with vPCI then the per-domain vPCI
> option in libxl will be enabled (either explicitely or by default), if
> it doesn't then vPCI will be disabled (either explicitely or by the
> newer vGPU options.)

If vPCI is hooked into IOREQ there won't be a need anymore to register
the vPCI config space traps, as that would be done by IOREQ, and hence
vPCI managed devices could be registered at runtime with IOREQ.  IOW:
there won't be a need anymore to signal at domain creation whether
vPCI is intended to be used or not.

We would obviously need to enable IOREQ for all domains with IOMMU
enabled, as it would be IOREQ that register the PCI config space
handlers.

> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
> before adding more changes on top of them, not because I don't care
> about the mediated GPU problem (we do have something similar at AMD),
> but because I worry that if we try to change them now we might not do a
> good enough job. I would prefer to wait until we know more about the
> actual use case, ideally with code supporting it.
> 
> I think the difference in points of views comes from the fact that I see
> vPCI as the default, QEMU only as a limited peripheral emulator (or
> mediator for the vGPU case) but not in control. vPCI and QEMU are not
> equal in my view. vPCI is in charge and always present if not in very
> uncommon setups (even if we decide to hook it inside Xen by using
> internal IOREQ interfaces). QEMU might come and go.

Xen needs a single component that mediates accesses to resources,
whether that's IOREQ, or something else I don't really care that much.
Having vPCI mediate accesses to the PCI config space, and IOREQ to the
memory (and on x86 IO port) space just seems awfully complicated for
AFAICT no real benefit.

Also, you seem to confabulate IOREQ with QEMU, while the latter is
indeed an user of IOREQ, I do see IOREQ as a simple resource mediator
inside of Xen, that has the ability to forward such accesses to
external emulators using an hypercall interface.

> Now that I am writing this, I realize this is also why I wasn't too
> happy with the idea of hooking vPCI using IOREQ. It makes them look as
> if they are the same, while I don't they should be considered at the
> same level of priority, criticality, safety, integration in the system,
> etc.

I feel there are some fears with IOREQ from a safety PoV?  The code
that does the resource multiplexing is small, and as said above if
there are safety concerns with the hypercall interface it would be
fine to limit it's usage to internal handlers only.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-05 11:08           ` Roger Pau Monné
@ 2023-12-05 16:27             ` Stewart Hildebrand
  2023-12-05 17:09               ` Roger Pau Monné
  2023-12-05 19:01             ` Stewart Hildebrand
  2023-12-06  2:34             ` Stefano Stabellini
  2 siblings, 1 reply; 32+ messages in thread
From: Stewart Hildebrand @ 2023-12-05 16:27 UTC (permalink / raw)
  To: Roger Pau Monné, Stefano Stabellini
  Cc: xen-devel, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Paul Durrant

On 12/5/23 06:08, Roger Pau Monné wrote:
> On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
>> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
>>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
>>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>>>>          bus = PCI_BUS(machine_sbdf);
>>>>>>          devfn = PCI_DEVFN(machine_sbdf);
>>>>>>  
>>>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
>>>>>> +        {
>>>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
>>>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
>>>>>> +            ret = -EPERM;
>>>>>> +            break;
>>>>>
>>>>> I think this is likely too restrictive going forward.  The current
>>>>> approach is indeed to enable vPCI on a per-domain basis because that's
>>>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>>>>
>>>>> If we start to expose vPCI suport to guests the interface should be on
>>>>> a per-device basis, so that vPCI could be enabled for some devices,
>>>>> while others could still be handled by ioreq servers.
>>>>>
>>>>> We might want to add a new flag to xen_domctl_assign_device (used by
>>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>>>>> use vPCI.
>>>>
>>>> Actually I don't think this is a good idea. I am all for flexibility but
>>>> supporting multiple different configurations comes at an extra cost for
>>>> both maintainers and contributors. I think we should try to reduce the
>>>> amount of configurations we support rather than increasing them
>>>> (especially on x86 where we have PV, PVH, HVM).
>>>
>>> I think it's perfectly fine to initially require a domain to have all
>>> its devices either passed through using vPCI or ireqs, but the
>>> interface IMO should allow for such differentiation in the future.
>>> That's why I think introducing a domain wide vPCI flag might not be
>>> the best option going forward.
>>>
>>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
>>> domain wide vPCI flag, iow:
>>>
>>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
>>> {
>>>     if ( has_arch_pdevs(d) )
>>>     {
>>>         printk("All passthrough devices must use the same backend\n");
>>>         return -EINVAL;
>>>     }
>>>
>>>     /* Set vPCI domain flag */
>>> }
>>
>> That would be fine by me, but maybe we can avoid this change too. I was
>> imagining that vPCI would be enabled at domain creation, not at runtime.
>> And that vPCI would be enabled by default for all PVH guests (once we
>> are past the initial experimental phase.)
> 
> Then we don't even need a new CDF flag, and just enable vPCI when
> IOMMU is enabled?  IOW: we can key the enabling of vPCI to
> XEN_DOMCTL_CDF_iommu for specific domain types?

There are many Arm based platforms that need to use iommu but don't have (or don't use) PCI, so we'd still like to have a separate vPCI flag.

> 
> Maybe that's not so trivial on x86, as there's no x86 PVH domain type
> from the hypervisor PoV.
> 
>>
>>> We have already agreed that we want to aim for a setup where ioreqs
>>> and vPCI could be used for the same domain, but I guess you assumed
>>> that ioreqs would be used for emulated devices exclusively and vPCI
>>> for passthrough devices?
>>
>> Yes, that's right
>>
>>
>>> Overall if we agree that ioreqs and vPCI should co-exist for a domain,
>>> I'm not sure there's much reason to limit ioreqs to only handle
>>> emulated devices, seems like an arbitrary limitation.
>>
>> Reply below
>>
>>
>>>> I don't think we should enable IOREQ servers to handle PCI passthrough
>>>> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
>>>> Passthrough can be handled by vPCI just fine. I think this should be a
>>>> good anti-feature to have (a goal to explicitly not add this feature) to
>>>> reduce complexity. Unless you see a specific usecase to add support for
>>>> it?
>>>
>>> There are passthrough devices (GPUs) that might require some extra
>>> mediation on dom0 (like the Intel GVT-g thing) that would force the
>>> usage of ioreqs to passthrough.
>>
>> From an architectural perspective, I think it would be cleaner, simpler
>> to maintain, and simpler to understand if Xen was the sole owner of the
>> PCI Root Complex and PCI config space mediation (implemented with vPCI).
>> IOREQ can be used for emulation and it works very well for that. At
>> least in my mind, that makes things much simpler.
> 
> But IOREQ already has all the code to mediate accesses to the PCI
> config space, and the interface to register separate servers for
> different PCI devices.
> 
> We would then need to duplicate this internally for vPCI, so that vPCI
> could forward accesses to IOREQ just for IOREQ to forward to yet a
> different component?  Seems like a lot of duplication for no benefit.
> 
>> I understand there are non-trivial cases, like virtual GPUs with
>> hardware access, but I don't classify those as passthrough. That's
>> because there isn't one device that gets fully assigned to the guest.
>> Instead, there is an emulated device (hence IOREQ) with certain MMIO
>> regions and interrupts that end up being directly mapped from real
>> hardware.
>>
>> So I think it is natural in those cases to use IOREQ and it is also
>> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
>> perspective, I hope it will mostly look as if the device is assigned to
>> Dom0. Even if it ends up being more complex than that, Rome wasn't
>> built in one day, and I don't think we should try to solve this problem
>> on day1 (as long as the interfaces are not stable interfaces).
> 
> I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
> allow for emulators to be implemented in user-space, but at the end
> it's just an interface that allows forwarding accesses to certain
> resources (for the case we are speaking about here, PCI config space)
> to entities that registered as handlers.
> 
> vPCI OTOH just deals with a very specific resource (PCI config space)
> and only allows internal handlers to be registered on a byte
> granularity.
> 
> So your proposal would be to implement a hierarchy like the one on the
> diagram below:
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └───────┬┘ └┬─────────┘ └───┬──────────────┘
>             │   │               │
>             │   │           ┌───┴──┐
>             │   │           │ vPCI │
>             │   │           └─┬──┬─┘
>          ┌──┴───┴┐            │  │
>          │ IOREQ ├────────────┘  │
>          └────┬──┘               │
>               │                  │
>  ┌────────────┴──┐              ┌┴──────────────┐
>  │ IOREQ servers │              │ vPCI handlers │
>  └───────────────┘              └───────────────┘
> 
> While what I'm proposing would look like:
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └────┬───┘ └────┬─────┘ └────────┬─────────┘
>          │          │                │
>          └─────┬────┴────┬───────────┘
>                │  IOREQ  │
>                └─┬─────┬─┘
>                  │     │
>  ┌───────────────┤     └────┬──────┐
>  │ IOREQ servers │          │ vPCI │
>  └───────────────┘          └───┬──┘
>                                 │
>                             ┌───┴───────────┐
>                             │ vPCI handlers │
>                             └───────────────┘
> 
> I'm obviously biased, but I think the latter is cleaner, and allows
> all resources to be arbitrated by the same component (IOREQ).
> 
> If the concern is about the IOREQ hypercall interface, it would be
> fine to introduce an option that limit IOREQs to internal users
> (vPCI) without supporting external IOREQ servers.
> 
> Think of IOREQ as a resource mediator inside of Xen, that just does
> the PCI address decoding and forwards the access to the interested
> party, either an external IOREQ server or vPCI.
> 
>>
>>> It's important that the interfaces we introduce are correct IMO,
>>> because that ends up reflecting on the configuration options that we
>>> expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
>>> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
>>> gets placed there will ultimately influence how the option gets
>>> exposed in xl/libxl, and the interface there is relevant to keep
>>> stable for end user sanity.
>>
>> I agree with you on the stable interfaces. The important part is not to
>> introduce changes to stable interfaces that could limit us in the
>> future. Specifically that includes xl and libxl, we need to be careful
>> there. But I don't see a single per-domain vPCI enable/disable option as
>> a problem. Let's say that in the future we have a mediated vGPU
>> implementation: if it works together with vPCI then the per-domain vPCI
>> option in libxl will be enabled (either explicitely or by default), if
>> it doesn't then vPCI will be disabled (either explicitely or by the
>> newer vGPU options.)
> 
> If vPCI is hooked into IOREQ there won't be a need anymore to register
> the vPCI config space traps, as that would be done by IOREQ, and hence
> vPCI managed devices could be registered at runtime with IOREQ.  IOW:
> there won't be a need anymore to signal at domain creation whether
> vPCI is intended to be used or not.
> 
> We would obviously need to enable IOREQ for all domains with IOMMU
> enabled, as it would be IOREQ that register the PCI config space
> handlers.
> 
>> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
>> before adding more changes on top of them, not because I don't care
>> about the mediated GPU problem (we do have something similar at AMD),
>> but because I worry that if we try to change them now we might not do a
>> good enough job. I would prefer to wait until we know more about the
>> actual use case, ideally with code supporting it.
>>
>> I think the difference in points of views comes from the fact that I see
>> vPCI as the default, QEMU only as a limited peripheral emulator (or
>> mediator for the vGPU case) but not in control. vPCI and QEMU are not
>> equal in my view. vPCI is in charge and always present if not in very
>> uncommon setups (even if we decide to hook it inside Xen by using
>> internal IOREQ interfaces). QEMU might come and go.
> 
> Xen needs a single component that mediates accesses to resources,
> whether that's IOREQ, or something else I don't really care that much.
> Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> memory (and on x86 IO port) space just seems awfully complicated for
> AFAICT no real benefit.
> 
> Also, you seem to confabulate IOREQ with QEMU, while the latter is
> indeed an user of IOREQ, I do see IOREQ as a simple resource mediator
> inside of Xen, that has the ability to forward such accesses to
> external emulators using an hypercall interface.
> 
>> Now that I am writing this, I realize this is also why I wasn't too
>> happy with the idea of hooking vPCI using IOREQ. It makes them look as
>> if they are the same, while I don't they should be considered at the
>> same level of priority, criticality, safety, integration in the system,
>> etc.
> 
> I feel there are some fears with IOREQ from a safety PoV?  The code
> that does the resource multiplexing is small, and as said above if
> there are safety concerns with the hypercall interface it would be
> fine to limit it's usage to internal handlers only.
> 
> Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-05 16:27             ` Stewart Hildebrand
@ 2023-12-05 17:09               ` Roger Pau Monné
  2023-12-05 17:36                 ` Stewart Hildebrand
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2023-12-05 17:09 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Stefano Stabellini, xen-devel, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Paul Durrant

On Tue, Dec 05, 2023 at 11:27:03AM -0500, Stewart Hildebrand wrote:
> On 12/5/23 06:08, Roger Pau Monné wrote:
> > On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
> >> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
> >>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
> >>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> >>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> >>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> >>>>>>          bus = PCI_BUS(machine_sbdf);
> >>>>>>          devfn = PCI_DEVFN(machine_sbdf);
> >>>>>>  
> >>>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
> >>>>>> +        {
> >>>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> >>>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
> >>>>>> +            ret = -EPERM;
> >>>>>> +            break;
> >>>>>
> >>>>> I think this is likely too restrictive going forward.  The current
> >>>>> approach is indeed to enable vPCI on a per-domain basis because that's
> >>>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
> >>>>>
> >>>>> If we start to expose vPCI suport to guests the interface should be on
> >>>>> a per-device basis, so that vPCI could be enabled for some devices,
> >>>>> while others could still be handled by ioreq servers.
> >>>>>
> >>>>> We might want to add a new flag to xen_domctl_assign_device (used by
> >>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
> >>>>> use vPCI.
> >>>>
> >>>> Actually I don't think this is a good idea. I am all for flexibility but
> >>>> supporting multiple different configurations comes at an extra cost for
> >>>> both maintainers and contributors. I think we should try to reduce the
> >>>> amount of configurations we support rather than increasing them
> >>>> (especially on x86 where we have PV, PVH, HVM).
> >>>
> >>> I think it's perfectly fine to initially require a domain to have all
> >>> its devices either passed through using vPCI or ireqs, but the
> >>> interface IMO should allow for such differentiation in the future.
> >>> That's why I think introducing a domain wide vPCI flag might not be
> >>> the best option going forward.
> >>>
> >>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
> >>> domain wide vPCI flag, iow:
> >>>
> >>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
> >>> {
> >>>     if ( has_arch_pdevs(d) )
> >>>     {
> >>>         printk("All passthrough devices must use the same backend\n");
> >>>         return -EINVAL;
> >>>     }
> >>>
> >>>     /* Set vPCI domain flag */
> >>> }
> >>
> >> That would be fine by me, but maybe we can avoid this change too. I was
> >> imagining that vPCI would be enabled at domain creation, not at runtime.
> >> And that vPCI would be enabled by default for all PVH guests (once we
> >> are past the initial experimental phase.)
> > 
> > Then we don't even need a new CDF flag, and just enable vPCI when
> > IOMMU is enabled?  IOW: we can key the enabling of vPCI to
> > XEN_DOMCTL_CDF_iommu for specific domain types?
> 
> There are many Arm based platforms that need to use iommu but don't have (or don't use) PCI, so we'd still like to have a separate vPCI flag.

OK, read below though - if we switch to vPCI being a descendant of
IOREQ (so that the PCI config space decoding is done by IOREQ) we
could hotplug vPCI managed devices at runtime without requiring any
prior initialization at domain create, since the traps to the PCI
config space would be setup by IOREQ.

We might need a PCI flag in order to signal whether the domain is
intended to use PCI devices or not, and so whether IOREQ needs to
setup PCI config space traps (either fully emulated or passthrough
devices).  But that would be arch-specific AFAICT, as on x86 we
always trap accesses to the PCI IO ports.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-05 17:09               ` Roger Pau Monné
@ 2023-12-05 17:36                 ` Stewart Hildebrand
  2023-12-05 23:25                   ` Stefano Stabellini
  0 siblings, 1 reply; 32+ messages in thread
From: Stewart Hildebrand @ 2023-12-05 17:36 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, xen-devel, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Paul Durrant

On 12/5/23 12:09, Roger Pau Monné wrote:
> On Tue, Dec 05, 2023 at 11:27:03AM -0500, Stewart Hildebrand wrote:
>> On 12/5/23 06:08, Roger Pau Monné wrote:
>>> On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
>>>> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
>>>>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
>>>>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>>>>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>>>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>>>>>>          bus = PCI_BUS(machine_sbdf);
>>>>>>>>          devfn = PCI_DEVFN(machine_sbdf);
>>>>>>>>  
>>>>>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
>>>>>>>> +        {
>>>>>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
>>>>>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
>>>>>>>> +            ret = -EPERM;
>>>>>>>> +            break;
>>>>>>>
>>>>>>> I think this is likely too restrictive going forward.  The current
>>>>>>> approach is indeed to enable vPCI on a per-domain basis because that's
>>>>>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>>>>>>
>>>>>>> If we start to expose vPCI suport to guests the interface should be on
>>>>>>> a per-device basis, so that vPCI could be enabled for some devices,
>>>>>>> while others could still be handled by ioreq servers.
>>>>>>>
>>>>>>> We might want to add a new flag to xen_domctl_assign_device (used by
>>>>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>>>>>>> use vPCI.
>>>>>>
>>>>>> Actually I don't think this is a good idea. I am all for flexibility but
>>>>>> supporting multiple different configurations comes at an extra cost for
>>>>>> both maintainers and contributors. I think we should try to reduce the
>>>>>> amount of configurations we support rather than increasing them
>>>>>> (especially on x86 where we have PV, PVH, HVM).
>>>>>
>>>>> I think it's perfectly fine to initially require a domain to have all
>>>>> its devices either passed through using vPCI or ireqs, but the
>>>>> interface IMO should allow for such differentiation in the future.
>>>>> That's why I think introducing a domain wide vPCI flag might not be
>>>>> the best option going forward.
>>>>>
>>>>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
>>>>> domain wide vPCI flag, iow:
>>>>>
>>>>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
>>>>> {
>>>>>     if ( has_arch_pdevs(d) )
>>>>>     {
>>>>>         printk("All passthrough devices must use the same backend\n");
>>>>>         return -EINVAL;
>>>>>     }
>>>>>
>>>>>     /* Set vPCI domain flag */
>>>>> }
>>>>
>>>> That would be fine by me, but maybe we can avoid this change too. I was
>>>> imagining that vPCI would be enabled at domain creation, not at runtime.
>>>> And that vPCI would be enabled by default for all PVH guests (once we
>>>> are past the initial experimental phase.)
>>>
>>> Then we don't even need a new CDF flag, and just enable vPCI when
>>> IOMMU is enabled?  IOW: we can key the enabling of vPCI to
>>> XEN_DOMCTL_CDF_iommu for specific domain types?
>>
>> There are many Arm based platforms that need to use iommu but don't have (or don't use) PCI, so we'd still like to have a separate vPCI flag.
> 
> OK, read below though - if we switch to vPCI being a descendant of
> IOREQ (so that the PCI config space decoding is done by IOREQ) we
> could hotplug vPCI managed devices at runtime without requiring any
> prior initialization at domain create, since the traps to the PCI
> config space would be setup by IOREQ.
> 
> We might need a PCI flag in order to signal whether the domain is
> intended to use PCI devices or not, and so whether IOREQ needs to
> setup PCI config space traps (either fully emulated or passthrough
> devices).  But that would be arch-specific AFAICT, as on x86 we
> always trap accesses to the PCI IO ports.

On Arm, the toolstack (or dom0less creation code) needs to construct a {v,ioreq}PCI root complex node in the device tree before guest boot. Attempting to hot plug such a device tree node at runtime sounds like a goal for the (far) future. On Arm, we don't trap the {v,ioreq}PCI config space by default, so, yes, we for sure would need such a {v,ioreq}PCI flag for setting up the {v,ioreq}PCI mmio handlers if we go this route.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-05 17:36                 ` Stewart Hildebrand
@ 2023-12-05 23:25                   ` Stefano Stabellini
  0 siblings, 0 replies; 32+ messages in thread
From: Stefano Stabellini @ 2023-12-05 23:25 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Roger Pau Monné, Stefano Stabellini, xen-devel, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Paul Durrant

[-- Attachment #1: Type: text/plain, Size: 4762 bytes --]

On Tue, 5 Dec 2023, Stewart Hildebrand wrote:
> On 12/5/23 12:09, Roger Pau Monné wrote:
> > On Tue, Dec 05, 2023 at 11:27:03AM -0500, Stewart Hildebrand wrote:
> >> On 12/5/23 06:08, Roger Pau Monné wrote:
> >>> On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
> >>>> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
> >>>>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
> >>>>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> >>>>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> >>>>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> >>>>>>>>          bus = PCI_BUS(machine_sbdf);
> >>>>>>>>          devfn = PCI_DEVFN(machine_sbdf);
> >>>>>>>>  
> >>>>>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
> >>>>>>>> +        {
> >>>>>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> >>>>>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
> >>>>>>>> +            ret = -EPERM;
> >>>>>>>> +            break;
> >>>>>>>
> >>>>>>> I think this is likely too restrictive going forward.  The current
> >>>>>>> approach is indeed to enable vPCI on a per-domain basis because that's
> >>>>>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
> >>>>>>>
> >>>>>>> If we start to expose vPCI suport to guests the interface should be on
> >>>>>>> a per-device basis, so that vPCI could be enabled for some devices,
> >>>>>>> while others could still be handled by ioreq servers.
> >>>>>>>
> >>>>>>> We might want to add a new flag to xen_domctl_assign_device (used by
> >>>>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
> >>>>>>> use vPCI.
> >>>>>>
> >>>>>> Actually I don't think this is a good idea. I am all for flexibility but
> >>>>>> supporting multiple different configurations comes at an extra cost for
> >>>>>> both maintainers and contributors. I think we should try to reduce the
> >>>>>> amount of configurations we support rather than increasing them
> >>>>>> (especially on x86 where we have PV, PVH, HVM).
> >>>>>
> >>>>> I think it's perfectly fine to initially require a domain to have all
> >>>>> its devices either passed through using vPCI or ireqs, but the
> >>>>> interface IMO should allow for such differentiation in the future.
> >>>>> That's why I think introducing a domain wide vPCI flag might not be
> >>>>> the best option going forward.
> >>>>>
> >>>>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
> >>>>> domain wide vPCI flag, iow:
> >>>>>
> >>>>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
> >>>>> {
> >>>>>     if ( has_arch_pdevs(d) )
> >>>>>     {
> >>>>>         printk("All passthrough devices must use the same backend\n");
> >>>>>         return -EINVAL;
> >>>>>     }
> >>>>>
> >>>>>     /* Set vPCI domain flag */
> >>>>> }
> >>>>
> >>>> That would be fine by me, but maybe we can avoid this change too. I was
> >>>> imagining that vPCI would be enabled at domain creation, not at runtime.
> >>>> And that vPCI would be enabled by default for all PVH guests (once we
> >>>> are past the initial experimental phase.)
> >>>
> >>> Then we don't even need a new CDF flag, and just enable vPCI when
> >>> IOMMU is enabled?  IOW: we can key the enabling of vPCI to
> >>> XEN_DOMCTL_CDF_iommu for specific domain types?
> >>
> >> There are many Arm based platforms that need to use iommu but don't have (or don't use) PCI, so we'd still like to have a separate vPCI flag.
> > 
> > OK, read below though - if we switch to vPCI being a descendant of
> > IOREQ (so that the PCI config space decoding is done by IOREQ) we
> > could hotplug vPCI managed devices at runtime without requiring any
> > prior initialization at domain create, since the traps to the PCI
> > config space would be setup by IOREQ.
> > 
> > We might need a PCI flag in order to signal whether the domain is
> > intended to use PCI devices or not, and so whether IOREQ needs to
> > setup PCI config space traps (either fully emulated or passthrough
> > devices).  But that would be arch-specific AFAICT, as on x86 we
> > always trap accesses to the PCI IO ports.
> 
> On Arm, the toolstack (or dom0less creation code) needs to construct a {v,ioreq}PCI root complex node in the device tree before guest boot. Attempting to hot plug such a device tree node at runtime sounds like a goal for the (far) future. On Arm, we don't trap the {v,ioreq}PCI config space by default, so, yes, we for sure would need such a {v,ioreq}PCI flag for setting up the {v,ioreq}PCI mmio handlers if we go this route.

Yes and also dynamic configuration and hotplug are actually detrimental
in safety configurations where you need as much as possible, ideally
everything, to be specified at build time.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-05 11:08           ` Roger Pau Monné
  2023-12-05 16:27             ` Stewart Hildebrand
@ 2023-12-05 19:01             ` Stewart Hildebrand
  2023-12-11  9:42               ` Roger Pau Monné
  2023-12-06  2:34             ` Stefano Stabellini
  2 siblings, 1 reply; 32+ messages in thread
From: Stewart Hildebrand @ 2023-12-05 19:01 UTC (permalink / raw)
  To: Roger Pau Monné, Stefano Stabellini
  Cc: xen-devel, Julien Grall, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Paul Durrant

On 12/5/23 06:08, Roger Pau Monné wrote:
> On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
>> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
>>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
>>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>>>>          bus = PCI_BUS(machine_sbdf);
>>>>>>          devfn = PCI_DEVFN(machine_sbdf);
>>>>>>  
>>>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
>>>>>> +        {
>>>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
>>>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
>>>>>> +            ret = -EPERM;
>>>>>> +            break;
>>>>>
>>>>> I think this is likely too restrictive going forward.  The current
>>>>> approach is indeed to enable vPCI on a per-domain basis because that's
>>>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>>>>
>>>>> If we start to expose vPCI suport to guests the interface should be on
>>>>> a per-device basis, so that vPCI could be enabled for some devices,
>>>>> while others could still be handled by ioreq servers.
>>>>>
>>>>> We might want to add a new flag to xen_domctl_assign_device (used by
>>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>>>>> use vPCI.
>>>>
>>>> Actually I don't think this is a good idea. I am all for flexibility but
>>>> supporting multiple different configurations comes at an extra cost for
>>>> both maintainers and contributors. I think we should try to reduce the
>>>> amount of configurations we support rather than increasing them
>>>> (especially on x86 where we have PV, PVH, HVM).
>>>
>>> I think it's perfectly fine to initially require a domain to have all
>>> its devices either passed through using vPCI or ireqs, but the
>>> interface IMO should allow for such differentiation in the future.
>>> That's why I think introducing a domain wide vPCI flag might not be
>>> the best option going forward.
>>>
>>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
>>> domain wide vPCI flag, iow:
>>>
>>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
>>> {
>>>     if ( has_arch_pdevs(d) )
>>>     {
>>>         printk("All passthrough devices must use the same backend\n");
>>>         return -EINVAL;
>>>     }
>>>
>>>     /* Set vPCI domain flag */
>>> }
>>
>> That would be fine by me, but maybe we can avoid this change too. I was
>> imagining that vPCI would be enabled at domain creation, not at runtime.
>> And that vPCI would be enabled by default for all PVH guests (once we
>> are past the initial experimental phase.)
> 
> Then we don't even need a new CDF flag, and just enable vPCI when
> IOMMU is enabled?  IOW: we can key the enabling of vPCI to
> XEN_DOMCTL_CDF_iommu for specific domain types?
> 
> Maybe that's not so trivial on x86, as there's no x86 PVH domain type
> from the hypervisor PoV.
> 
>>
>>> We have already agreed that we want to aim for a setup where ioreqs
>>> and vPCI could be used for the same domain, but I guess you assumed
>>> that ioreqs would be used for emulated devices exclusively and vPCI
>>> for passthrough devices?
>>
>> Yes, that's right
>>
>>
>>> Overall if we agree that ioreqs and vPCI should co-exist for a domain,
>>> I'm not sure there's much reason to limit ioreqs to only handle
>>> emulated devices, seems like an arbitrary limitation.
>>
>> Reply below
>>
>>
>>>> I don't think we should enable IOREQ servers to handle PCI passthrough
>>>> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
>>>> Passthrough can be handled by vPCI just fine. I think this should be a
>>>> good anti-feature to have (a goal to explicitly not add this feature) to
>>>> reduce complexity. Unless you see a specific usecase to add support for
>>>> it?
>>>
>>> There are passthrough devices (GPUs) that might require some extra
>>> mediation on dom0 (like the Intel GVT-g thing) that would force the
>>> usage of ioreqs to passthrough.
>>
>> From an architectural perspective, I think it would be cleaner, simpler
>> to maintain, and simpler to understand if Xen was the sole owner of the
>> PCI Root Complex and PCI config space mediation (implemented with vPCI).
>> IOREQ can be used for emulation and it works very well for that. At
>> least in my mind, that makes things much simpler.
> 
> But IOREQ already has all the code to mediate accesses to the PCI
> config space, and the interface to register separate servers for
> different PCI devices.
> 
> We would then need to duplicate this internally for vPCI, so that vPCI
> could forward accesses to IOREQ just for IOREQ to forward to yet a
> different component?  Seems like a lot of duplication for no benefit.
> 
>> I understand there are non-trivial cases, like virtual GPUs with
>> hardware access, but I don't classify those as passthrough. That's
>> because there isn't one device that gets fully assigned to the guest.
>> Instead, there is an emulated device (hence IOREQ) with certain MMIO
>> regions and interrupts that end up being directly mapped from real
>> hardware.
>>
>> So I think it is natural in those cases to use IOREQ and it is also
>> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
>> perspective, I hope it will mostly look as if the device is assigned to
>> Dom0. Even if it ends up being more complex than that, Rome wasn't
>> built in one day, and I don't think we should try to solve this problem
>> on day1 (as long as the interfaces are not stable interfaces).
> 
> I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
> allow for emulators to be implemented in user-space, but at the end
> it's just an interface that allows forwarding accesses to certain
> resources (for the case we are speaking about here, PCI config space)
> to entities that registered as handlers.
> 
> vPCI OTOH just deals with a very specific resource (PCI config space)
> and only allows internal handlers to be registered on a byte
> granularity.
> 
> So your proposal would be to implement a hierarchy like the one on the
> diagram below:
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └───────┬┘ └┬─────────┘ └───┬──────────────┘
>             │   │               │
>             │   │           ┌───┴──┐
>             │   │           │ vPCI │
>             │   │           └─┬──┬─┘
>          ┌──┴───┴┐            │  │
>          │ IOREQ ├────────────┘  │
>          └────┬──┘               │
>               │                  │
>  ┌────────────┴──┐              ┌┴──────────────┐
>  │ IOREQ servers │              │ vPCI handlers │
>  └───────────────┘              └───────────────┘
> 
> While what I'm proposing would look like:
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └────┬───┘ └────┬─────┘ └────────┬─────────┘
>          │          │                │
>          └─────┬────┴────┬───────────┘
>                │  IOREQ  │
>                └─┬─────┬─┘
>                  │     │
>  ┌───────────────┤     └────┬──────┐
>  │ IOREQ servers │          │ vPCI │
>  └───────────────┘          └───┬──┘
>                                 │
>                             ┌───┴───────────┐
>                             │ vPCI handlers │
>                             └───────────────┘
> 
> I'm obviously biased, but I think the latter is cleaner, and allows
> all resources to be arbitrated by the same component (IOREQ).
> 
> If the concern is about the IOREQ hypercall interface, it would be
> fine to introduce an option that limit IOREQs to internal users
> (vPCI) without supporting external IOREQ servers.
> 
> Think of IOREQ as a resource mediator inside of Xen, that just does
> the PCI address decoding and forwards the access to the interested
> party, either an external IOREQ server or vPCI.
> 
>>
>>> It's important that the interfaces we introduce are correct IMO,
>>> because that ends up reflecting on the configuration options that we
>>> expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
>>> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
>>> gets placed there will ultimately influence how the option gets
>>> exposed in xl/libxl, and the interface there is relevant to keep
>>> stable for end user sanity.
>>
>> I agree with you on the stable interfaces. The important part is not to
>> introduce changes to stable interfaces that could limit us in the
>> future. Specifically that includes xl and libxl, we need to be careful
>> there. But I don't see a single per-domain vPCI enable/disable option as
>> a problem. Let's say that in the future we have a mediated vGPU
>> implementation: if it works together with vPCI then the per-domain vPCI
>> option in libxl will be enabled (either explicitely or by default), if
>> it doesn't then vPCI will be disabled (either explicitely or by the
>> newer vGPU options.)
> 
> If vPCI is hooked into IOREQ there won't be a need anymore to register
> the vPCI config space traps, as that would be done by IOREQ, and hence
> vPCI managed devices could be registered at runtime with IOREQ.  IOW:
> there won't be a need anymore to signal at domain creation whether
> vPCI is intended to be used or not.
> 
> We would obviously need to enable IOREQ for all domains with IOMMU
> enabled, as it would be IOREQ that register the PCI config space
> handlers.
> 
>> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
>> before adding more changes on top of them, not because I don't care
>> about the mediated GPU problem (we do have something similar at AMD),
>> but because I worry that if we try to change them now we might not do a
>> good enough job. I would prefer to wait until we know more about the
>> actual use case, ideally with code supporting it.
>>
>> I think the difference in points of views comes from the fact that I see
>> vPCI as the default, QEMU only as a limited peripheral emulator (or
>> mediator for the vGPU case) but not in control. vPCI and QEMU are not
>> equal in my view. vPCI is in charge and always present if not in very
>> uncommon setups (even if we decide to hook it inside Xen by using
>> internal IOREQ interfaces). QEMU might come and go.
> 
> Xen needs a single component that mediates accesses to resources,
> whether that's IOREQ, or something else I don't really care that much.

I just wanted to share what the "something else" diagram might look like.

    ┌────────┐ ┌──────────┐ ┌──────────────────┐
    │ Memory │ │ IO Ports │ │ PCI config space │
    └────┬───┘ └────┬─────┘ └──────────┬───────┘
         │          │                  │
         └──┬───────┴───────────────┬──┘
            │ PCI Resource Mediator │
            └────┬─────┬────────────┘
                 │     │
         ┌───────┤     └────┬──────┐
         │ IOREQ │          │ vPCI │
         └────┬──┘          └───┬──┘
              │                 │
 ┌────────────┴──┐          ┌───┴───────────┐
 │ IOREQ servers │          │ vPCI handlers │
 └───────────────┘          └───────────────┘

> Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> memory (and on x86 IO port) space just seems awfully complicated for
> AFAICT no real benefit.
> 
> Also, you seem to confabulate IOREQ with QEMU, while the latter is
> indeed an user of IOREQ, I do see IOREQ as a simple resource mediator
> inside of Xen, that has the ability to forward such accesses to
> external emulators using an hypercall interface.
> 
>> Now that I am writing this, I realize this is also why I wasn't too
>> happy with the idea of hooking vPCI using IOREQ. It makes them look as
>> if they are the same, while I don't they should be considered at the
>> same level of priority, criticality, safety, integration in the system,
>> etc.
> 
> I feel there are some fears with IOREQ from a safety PoV?  The code
> that does the resource multiplexing is small, and as said above if
> there are safety concerns with the hypercall interface it would be
> fine to limit it's usage to internal handlers only.

Would it make any sense at all to split the resource multiplexing bits from IOREQ into a new separate PCI resource mediator?

> 
> Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-05 19:01             ` Stewart Hildebrand
@ 2023-12-11  9:42               ` Roger Pau Monné
  0 siblings, 0 replies; 32+ messages in thread
From: Roger Pau Monné @ 2023-12-11  9:42 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Stefano Stabellini, xen-devel, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Paul Durrant

On Tue, Dec 05, 2023 at 02:01:46PM -0500, Stewart Hildebrand wrote:
> On 12/5/23 06:08, Roger Pau Monné wrote:
> > On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
> >> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
> >>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
> >>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
> >>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
> >>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
> >>>>>>          bus = PCI_BUS(machine_sbdf);
> >>>>>>          devfn = PCI_DEVFN(machine_sbdf);
> >>>>>>  
> >>>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
> >>>>>> +        {
> >>>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
> >>>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
> >>>>>> +            ret = -EPERM;
> >>>>>> +            break;
> >>>>>
> >>>>> I think this is likely too restrictive going forward.  The current
> >>>>> approach is indeed to enable vPCI on a per-domain basis because that's
> >>>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
> >>>>>
> >>>>> If we start to expose vPCI suport to guests the interface should be on
> >>>>> a per-device basis, so that vPCI could be enabled for some devices,
> >>>>> while others could still be handled by ioreq servers.
> >>>>>
> >>>>> We might want to add a new flag to xen_domctl_assign_device (used by
> >>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
> >>>>> use vPCI.
> >>>>
> >>>> Actually I don't think this is a good idea. I am all for flexibility but
> >>>> supporting multiple different configurations comes at an extra cost for
> >>>> both maintainers and contributors. I think we should try to reduce the
> >>>> amount of configurations we support rather than increasing them
> >>>> (especially on x86 where we have PV, PVH, HVM).
> >>>
> >>> I think it's perfectly fine to initially require a domain to have all
> >>> its devices either passed through using vPCI or ireqs, but the
> >>> interface IMO should allow for such differentiation in the future.
> >>> That's why I think introducing a domain wide vPCI flag might not be
> >>> the best option going forward.
> >>>
> >>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
> >>> domain wide vPCI flag, iow:
> >>>
> >>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
> >>> {
> >>>     if ( has_arch_pdevs(d) )
> >>>     {
> >>>         printk("All passthrough devices must use the same backend\n");
> >>>         return -EINVAL;
> >>>     }
> >>>
> >>>     /* Set vPCI domain flag */
> >>> }
> >>
> >> That would be fine by me, but maybe we can avoid this change too. I was
> >> imagining that vPCI would be enabled at domain creation, not at runtime.
> >> And that vPCI would be enabled by default for all PVH guests (once we
> >> are past the initial experimental phase.)
> > 
> > Then we don't even need a new CDF flag, and just enable vPCI when
> > IOMMU is enabled?  IOW: we can key the enabling of vPCI to
> > XEN_DOMCTL_CDF_iommu for specific domain types?
> > 
> > Maybe that's not so trivial on x86, as there's no x86 PVH domain type
> > from the hypervisor PoV.
> > 
> >>
> >>> We have already agreed that we want to aim for a setup where ioreqs
> >>> and vPCI could be used for the same domain, but I guess you assumed
> >>> that ioreqs would be used for emulated devices exclusively and vPCI
> >>> for passthrough devices?
> >>
> >> Yes, that's right
> >>
> >>
> >>> Overall if we agree that ioreqs and vPCI should co-exist for a domain,
> >>> I'm not sure there's much reason to limit ioreqs to only handle
> >>> emulated devices, seems like an arbitrary limitation.
> >>
> >> Reply below
> >>
> >>
> >>>> I don't think we should enable IOREQ servers to handle PCI passthrough
> >>>> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> >>>> Passthrough can be handled by vPCI just fine. I think this should be a
> >>>> good anti-feature to have (a goal to explicitly not add this feature) to
> >>>> reduce complexity. Unless you see a specific usecase to add support for
> >>>> it?
> >>>
> >>> There are passthrough devices (GPUs) that might require some extra
> >>> mediation on dom0 (like the Intel GVT-g thing) that would force the
> >>> usage of ioreqs to passthrough.
> >>
> >> From an architectural perspective, I think it would be cleaner, simpler
> >> to maintain, and simpler to understand if Xen was the sole owner of the
> >> PCI Root Complex and PCI config space mediation (implemented with vPCI).
> >> IOREQ can be used for emulation and it works very well for that. At
> >> least in my mind, that makes things much simpler.
> > 
> > But IOREQ already has all the code to mediate accesses to the PCI
> > config space, and the interface to register separate servers for
> > different PCI devices.
> > 
> > We would then need to duplicate this internally for vPCI, so that vPCI
> > could forward accesses to IOREQ just for IOREQ to forward to yet a
> > different component?  Seems like a lot of duplication for no benefit.
> > 
> >> I understand there are non-trivial cases, like virtual GPUs with
> >> hardware access, but I don't classify those as passthrough. That's
> >> because there isn't one device that gets fully assigned to the guest.
> >> Instead, there is an emulated device (hence IOREQ) with certain MMIO
> >> regions and interrupts that end up being directly mapped from real
> >> hardware.
> >>
> >> So I think it is natural in those cases to use IOREQ and it is also
> >> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
> >> perspective, I hope it will mostly look as if the device is assigned to
> >> Dom0. Even if it ends up being more complex than that, Rome wasn't
> >> built in one day, and I don't think we should try to solve this problem
> >> on day1 (as long as the interfaces are not stable interfaces).
> > 
> > I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
> > allow for emulators to be implemented in user-space, but at the end
> > it's just an interface that allows forwarding accesses to certain
> > resources (for the case we are speaking about here, PCI config space)
> > to entities that registered as handlers.
> > 
> > vPCI OTOH just deals with a very specific resource (PCI config space)
> > and only allows internal handlers to be registered on a byte
> > granularity.
> > 
> > So your proposal would be to implement a hierarchy like the one on the
> > diagram below:
> > 
> >     ┌────────┐ ┌──────────┐ ┌──────────────────┐
> >     │ Memory │ │ IO Ports │ │ PCI config space │
> >     └───────┬┘ └┬─────────┘ └───┬──────────────┘
> >             │   │               │
> >             │   │           ┌───┴──┐
> >             │   │           │ vPCI │
> >             │   │           └─┬──┬─┘
> >          ┌──┴───┴┐            │  │
> >          │ IOREQ ├────────────┘  │
> >          └────┬──┘               │
> >               │                  │
> >  ┌────────────┴──┐              ┌┴──────────────┐
> >  │ IOREQ servers │              │ vPCI handlers │
> >  └───────────────┘              └───────────────┘
> > 
> > While what I'm proposing would look like:
> > 
> >     ┌────────┐ ┌──────────┐ ┌──────────────────┐
> >     │ Memory │ │ IO Ports │ │ PCI config space │
> >     └────┬───┘ └────┬─────┘ └────────┬─────────┘
> >          │          │                │
> >          └─────┬────┴────┬───────────┘
> >                │  IOREQ  │
> >                └─┬─────┬─┘
> >                  │     │
> >  ┌───────────────┤     └────┬──────┐
> >  │ IOREQ servers │          │ vPCI │
> >  └───────────────┘          └───┬──┘
> >                                 │
> >                             ┌───┴───────────┐
> >                             │ vPCI handlers │
> >                             └───────────────┘
> > 
> > I'm obviously biased, but I think the latter is cleaner, and allows
> > all resources to be arbitrated by the same component (IOREQ).
> > 
> > If the concern is about the IOREQ hypercall interface, it would be
> > fine to introduce an option that limit IOREQs to internal users
> > (vPCI) without supporting external IOREQ servers.
> > 
> > Think of IOREQ as a resource mediator inside of Xen, that just does
> > the PCI address decoding and forwards the access to the interested
> > party, either an external IOREQ server or vPCI.
> > 
> >>
> >>> It's important that the interfaces we introduce are correct IMO,
> >>> because that ends up reflecting on the configuration options that we
> >>> expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
> >>> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> >>> gets placed there will ultimately influence how the option gets
> >>> exposed in xl/libxl, and the interface there is relevant to keep
> >>> stable for end user sanity.
> >>
> >> I agree with you on the stable interfaces. The important part is not to
> >> introduce changes to stable interfaces that could limit us in the
> >> future. Specifically that includes xl and libxl, we need to be careful
> >> there. But I don't see a single per-domain vPCI enable/disable option as
> >> a problem. Let's say that in the future we have a mediated vGPU
> >> implementation: if it works together with vPCI then the per-domain vPCI
> >> option in libxl will be enabled (either explicitely or by default), if
> >> it doesn't then vPCI will be disabled (either explicitely or by the
> >> newer vGPU options.)
> > 
> > If vPCI is hooked into IOREQ there won't be a need anymore to register
> > the vPCI config space traps, as that would be done by IOREQ, and hence
> > vPCI managed devices could be registered at runtime with IOREQ.  IOW:
> > there won't be a need anymore to signal at domain creation whether
> > vPCI is intended to be used or not.
> > 
> > We would obviously need to enable IOREQ for all domains with IOMMU
> > enabled, as it would be IOREQ that register the PCI config space
> > handlers.
> > 
> >> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
> >> before adding more changes on top of them, not because I don't care
> >> about the mediated GPU problem (we do have something similar at AMD),
> >> but because I worry that if we try to change them now we might not do a
> >> good enough job. I would prefer to wait until we know more about the
> >> actual use case, ideally with code supporting it.
> >>
> >> I think the difference in points of views comes from the fact that I see
> >> vPCI as the default, QEMU only as a limited peripheral emulator (or
> >> mediator for the vGPU case) but not in control. vPCI and QEMU are not
> >> equal in my view. vPCI is in charge and always present if not in very
> >> uncommon setups (even if we decide to hook it inside Xen by using
> >> internal IOREQ interfaces). QEMU might come and go.
> > 
> > Xen needs a single component that mediates accesses to resources,
> > whether that's IOREQ, or something else I don't really care that much.
> 
> I just wanted to share what the "something else" diagram might look like.
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └────┬───┘ └────┬─────┘ └──────────┬───────┘
>          │          │                  │
>          └──┬───────┴───────────────┬──┘
>             │ PCI Resource Mediator │
>             └────┬─────┬────────────┘
>                  │     │
>          ┌───────┤     └────┬──────┐
>          │ IOREQ │          │ vPCI │
>          └────┬──┘          └───┬──┘
>               │                 │
>  ┌────────────┴──┐          ┌───┴───────────┐
>  │ IOREQ servers │          │ vPCI handlers │
>  └───────────────┘          └───────────────┘

It's IMO weird that the PCI resource mediator also controls Memory
and IO ports, since that's not a PCI specific resource.

Isn't your proposed "PCI Resource Mediator" the same as what IOREQ
currently does?

I'm kind of confused as what benefit there is in introducing another
layer that multiplexes guest resource accesses.

> > Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> > memory (and on x86 IO port) space just seems awfully complicated for
> > AFAICT no real benefit.
> > 
> > Also, you seem to confabulate IOREQ with QEMU, while the latter is
> > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator
> > inside of Xen, that has the ability to forward such accesses to
> > external emulators using an hypercall interface.
> > 
> >> Now that I am writing this, I realize this is also why I wasn't too
> >> happy with the idea of hooking vPCI using IOREQ. It makes them look as
> >> if they are the same, while I don't they should be considered at the
> >> same level of priority, criticality, safety, integration in the system,
> >> etc.
> > 
> > I feel there are some fears with IOREQ from a safety PoV?  The code
> > that does the resource multiplexing is small, and as said above if
> > there are safety concerns with the hypercall interface it would be
> > fine to limit it's usage to internal handlers only.
> 
> Would it make any sense at all to split the resource multiplexing bits from IOREQ into a new separate PCI resource mediator?

That might be fine, but seems like a lot of work and more complexity
in Xen for AFAICT no real benefit.

I think I would need to better understand the worries with using
IOREQ, but wouldn't it be easier to just limit the current IOREQ
code/interface to suit your needs?  Again without knowing exactly what
are the issues with using IOREQ	it's hard to propose solutions.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-05 11:08           ` Roger Pau Monné
  2023-12-05 16:27             ` Stewart Hildebrand
  2023-12-05 19:01             ` Stewart Hildebrand
@ 2023-12-06  2:34             ` Stefano Stabellini
  2023-12-11 10:36               ` Roger Pau Monné
  2 siblings, 1 reply; 32+ messages in thread
From: Stefano Stabellini @ 2023-12-06  2:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Stewart Hildebrand, xen-devel, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Paul Durrant

[-- Attachment #1: Type: text/plain, Size: 12135 bytes --]

On Tue, 5 Dec 2023, Roger Pau Monné wrote:
> > > > I don't think we should enable IOREQ servers to handle PCI passthrough
> > > > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> > > > Passthrough can be handled by vPCI just fine. I think this should be a
> > > > good anti-feature to have (a goal to explicitly not add this feature) to
> > > > reduce complexity. Unless you see a specific usecase to add support for
> > > > it?
> > > 
> > > There are passthrough devices (GPUs) that might require some extra
> > > mediation on dom0 (like the Intel GVT-g thing) that would force the
> > > usage of ioreqs to passthrough.
> > 
> > From an architectural perspective, I think it would be cleaner, simpler
> > to maintain, and simpler to understand if Xen was the sole owner of the
> > PCI Root Complex and PCI config space mediation (implemented with vPCI).
> > IOREQ can be used for emulation and it works very well for that. At
> > least in my mind, that makes things much simpler.
> 
> But IOREQ already has all the code to mediate accesses to the PCI
> config space, and the interface to register separate servers for
> different PCI devices.
> 
> We would then need to duplicate this internally for vPCI, so that vPCI
> could forward accesses to IOREQ just for IOREQ to forward to yet a
> different component?  Seems like a lot of duplication for no benefit.

[...] 
 
> Also, you seem to confabulate IOREQ with QEMU, while the latter is
> indeed an user of IOREQ, I do see IOREQ as a simple resource mediator
> inside of Xen, that has the ability to forward such accesses to
> external emulators using an hypercall interface.

We have been using different terminologies until now. IOREQ could mean
anything from the ABI interface, the emulator side (QEMU) or the
hypervisor side (Xen). I am going to align with your wording and say:

IOREQ: only the IOREQ implementation in Xen (xen/common/ioreq.c)
IOREQ server: QEMU or alternative

I think it is OK if we use IOREQ internally within Xen to hook vPCI with
PCI config space accesses and emulation. I don't think it is a good idea
to attempt to enable IOREQ servers (e.g. QEMU) to implement PCI
Passthrough when vPCI is also enabled for the domain, at least
initially.


> > I understand there are non-trivial cases, like virtual GPUs with
> > hardware access, but I don't classify those as passthrough. That's
> > because there isn't one device that gets fully assigned to the guest.
> > Instead, there is an emulated device (hence IOREQ) with certain MMIO
> > regions and interrupts that end up being directly mapped from real
> > hardware.
> > 
> > So I think it is natural in those cases to use IOREQ and it is also
> > natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
> > perspective, I hope it will mostly look as if the device is assigned to
> > Dom0. Even if it ends up being more complex than that, Rome wasn't
> > built in one day, and I don't think we should try to solve this problem
> > on day1 (as long as the interfaces are not stable interfaces).
> 
> I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
> allow for emulators to be implemented in user-space, but at the end
> it's just an interface that allows forwarding accesses to certain
> resources (for the case we are speaking about here, PCI config space)
> to entities that registered as handlers.
> 
> vPCI OTOH just deals with a very specific resource (PCI config space)
> and only allows internal handlers to be registered on a byte
> granularity.
> 
> So your proposal would be to implement a hierarchy like the one on the
> diagram below:
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └───────┬┘ └┬─────────┘ └───┬──────────────┘
>             │   │               │
>             │   │           ┌───┴──┐
>             │   │           │ vPCI │
>             │   │           └─┬──┬─┘
>          ┌──┴───┴┐            │  │
>          │ IOREQ ├────────────┘  │
>          └────┬──┘               │
>               │                  │
>  ┌────────────┴──┐              ┌┴──────────────┐
>  │ IOREQ servers │              │ vPCI handlers │
>  └───────────────┘              └───────────────┘

Yes


> While what I'm proposing would look like:
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └────┬───┘ └────┬─────┘ └────────┬─────────┘
>          │          │                │
>          └─────┬────┴────┬───────────┘
>                │  IOREQ  │
>                └─┬─────┬─┘
>                  │     │
>  ┌───────────────┤     └────┬──────┐
>  │ IOREQ servers │          │ vPCI │
>  └───────────────┘          └───┬──┘
>                                 │
>                             ┌───┴───────────┐
>                             │ vPCI handlers │
>                             └───────────────┘

I don't have a major problem with this, but I find it less clear than
the first one.

Let's say that all domains are PVH (or ARM guests). QEMU is running in
Dom0. If QEMU does emulation, that's fine. If QEMU implements PCI
Passthrough then QEMU uses libpci to do PCI config space reads and
write, which go to the Linux kernel in Dom0, which ends up doing PCI
config space reads and writes on the device, and that goes via vPCI in
Xen (vPCI for Dom0). So actually vPCI is still present. It is a lot
simpler to think that vPCI is in charge of all mediated PCI config space
accesses rather than thinking that for the same device vPCI handles PCI
config space accesses for Dom0 but not for DomU.

It is not my preference but I am OK to compromise and go ahead with the
architecture you proposed but please let's keep IOREQ servers out of the
PCI Passthrough picture at least initially.


> I'm obviously biased, but I think the latter is cleaner, and allows
> all resources to be arbitrated by the same component (IOREQ).
> 
> If the concern is about the IOREQ hypercall interface, it would be
> fine to introduce an option that limit IOREQs to internal users
> (vPCI) without supporting external IOREQ servers.
> 
> Think of IOREQ as a resource mediator inside of Xen, that just does
> the PCI address decoding and forwards the access to the interested
> party, either an external IOREQ server or vPCI.

The part about IOREQ (xen/common/ioreq.c) being a resource mediator is
OKish.

I had many discussions over the years with various members of the larger
open source embedded community (Linaro, etc.) and the problem is that
when one says "IOREQ" typically people think of QEMU or other userspace
emulators. They don't think of the Xen side of it. This becomes very
relevant here because Xen is the only part of the system that is
getting safety-certified and it is important to convey the message that
nothing else in required to be safety-certified to have a fully working
Xen system that supports PCI Passthrough.

In short, it is important that the community doesn't get the idea that
QEMU needs to be safety-certified to have PCI Passthrough working
correctly with Xen in a safety scenario.

 
> > > It's important that the interfaces we introduce are correct IMO,
> > > because that ends up reflecting on the configuration options that we
> > > expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
> > > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> > > gets placed there will ultimately influence how the option gets
> > > exposed in xl/libxl, and the interface there is relevant to keep
> > > stable for end user sanity.
> > 
> > I agree with you on the stable interfaces. The important part is not to
> > introduce changes to stable interfaces that could limit us in the
> > future. Specifically that includes xl and libxl, we need to be careful
> > there. But I don't see a single per-domain vPCI enable/disable option as
> > a problem. Let's say that in the future we have a mediated vGPU
> > implementation: if it works together with vPCI then the per-domain vPCI
> > option in libxl will be enabled (either explicitely or by default), if
> > it doesn't then vPCI will be disabled (either explicitely or by the
> > newer vGPU options.)
> 
> If vPCI is hooked into IOREQ there won't be a need anymore to register
> the vPCI config space traps, as that would be done by IOREQ, and hence
> vPCI managed devices could be registered at runtime with IOREQ.  IOW:
> there won't be a need anymore to signal at domain creation whether
> vPCI is intended to be used or not.

For safety, we have requirements to specify everything statically before
boot so typically anything dynamic is a problem.


> We would obviously need to enable IOREQ for all domains with IOMMU
> enabled, as it would be IOREQ that register the PCI config space
> handlers.

This bit might be OK


> > For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
> > before adding more changes on top of them, not because I don't care
> > about the mediated GPU problem (we do have something similar at AMD),
> > but because I worry that if we try to change them now we might not do a
> > good enough job. I would prefer to wait until we know more about the
> > actual use case, ideally with code supporting it.
> > 
> > I think the difference in points of views comes from the fact that I see
> > vPCI as the default, QEMU only as a limited peripheral emulator (or
> > mediator for the vGPU case) but not in control. vPCI and QEMU are not
> > equal in my view. vPCI is in charge and always present if not in very
> > uncommon setups (even if we decide to hook it inside Xen by using
> > internal IOREQ interfaces). QEMU might come and go.
> 
> Xen needs a single component that mediates accesses to resources,
> whether that's IOREQ, or something else I don't really care that much.
> Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> memory (and on x86 IO port) space just seems awfully complicated for
> AFAICT no real benefit.
>
> > Now that I am writing this, I realize this is also why I wasn't too
> > happy with the idea of hooking vPCI using IOREQ. It makes them look as
> > if they are the same, while I don't they should be considered at the
> > same level of priority, criticality, safety, integration in the system,
> > etc.
> 
> I feel there are some fears with IOREQ from a safety PoV?  The code
> that does the resource multiplexing is small, and as said above if
> there are safety concerns with the hypercall interface it would be
> fine to limit it's usage to internal handlers only.

Yes it is about safety. Everything within Xen will be safety-certified,
hence usable in a safety critical scenario, everything outside of Xen
might not.

The fear is not on the IOREQ itself because xen/common/ioreq.c is part
of the certification scope. The fear is that IOREQ servers (e.g. QEMU)
are somehow in the picture when we discuss safety architectures with PCI
Passthrough, or that IOREQ servers could interfere with vPCI. By
"interfere" I mean that QEMU running in dom0 (a deprivileged dom0) will
be able to cause a malfunction in Xen vPCI.

Yes, limiting the hypercall interface would help in that regard because
it would limit Xen exposure.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-06  2:34             ` Stefano Stabellini
@ 2023-12-11 10:36               ` Roger Pau Monné
  2023-12-12  1:34                 ` Stefano Stabellini
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2023-12-11 10:36 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Stewart Hildebrand, xen-devel, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Jan Beulich, Wei Liu, Paul Durrant

On Tue, Dec 05, 2023 at 06:34:35PM -0800, Stefano Stabellini wrote:
> On Tue, 5 Dec 2023, Roger Pau Monné wrote:
> > > > > I don't think we should enable IOREQ servers to handle PCI passthrough
> > > > > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> > > > > Passthrough can be handled by vPCI just fine. I think this should be a
> > > > > good anti-feature to have (a goal to explicitly not add this feature) to
> > > > > reduce complexity. Unless you see a specific usecase to add support for
> > > > > it?
> > > > 
> > > > There are passthrough devices (GPUs) that might require some extra
> > > > mediation on dom0 (like the Intel GVT-g thing) that would force the
> > > > usage of ioreqs to passthrough.
> > > 
> > > From an architectural perspective, I think it would be cleaner, simpler
> > > to maintain, and simpler to understand if Xen was the sole owner of the
> > > PCI Root Complex and PCI config space mediation (implemented with vPCI).
> > > IOREQ can be used for emulation and it works very well for that. At
> > > least in my mind, that makes things much simpler.
> > 
> > But IOREQ already has all the code to mediate accesses to the PCI
> > config space, and the interface to register separate servers for
> > different PCI devices.
> > 
> > We would then need to duplicate this internally for vPCI, so that vPCI
> > could forward accesses to IOREQ just for IOREQ to forward to yet a
> > different component?  Seems like a lot of duplication for no benefit.
> 
> [...] 
>  
> > Also, you seem to confabulate IOREQ with QEMU, while the latter is
> > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator
> > inside of Xen, that has the ability to forward such accesses to
> > external emulators using an hypercall interface.
> 
> We have been using different terminologies until now. IOREQ could mean
> anything from the ABI interface, the emulator side (QEMU) or the
> hypervisor side (Xen). I am going to align with your wording and say:
> 
> IOREQ: only the IOREQ implementation in Xen (xen/common/ioreq.c)
> IOREQ server: QEMU or alternative
> 
> I think it is OK if we use IOREQ internally within Xen to hook vPCI with
> PCI config space accesses and emulation. I don't think it is a good idea
> to attempt to enable IOREQ servers (e.g. QEMU) to implement PCI
> Passthrough when vPCI is also enabled for the domain, at least
> initially.

I agree, it's perfectly fine to initially limit to vPCI passthrough
devices + QEMU emulated devices only for example.

I think it was mostly an issue with terminology then :).

> 
> > > I understand there are non-trivial cases, like virtual GPUs with
> > > hardware access, but I don't classify those as passthrough. That's
> > > because there isn't one device that gets fully assigned to the guest.
> > > Instead, there is an emulated device (hence IOREQ) with certain MMIO
> > > regions and interrupts that end up being directly mapped from real
> > > hardware.
> > > 
> > > So I think it is natural in those cases to use IOREQ and it is also
> > > natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
> > > perspective, I hope it will mostly look as if the device is assigned to
> > > Dom0. Even if it ends up being more complex than that, Rome wasn't
> > > built in one day, and I don't think we should try to solve this problem
> > > on day1 (as long as the interfaces are not stable interfaces).
> > 
> > I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
> > allow for emulators to be implemented in user-space, but at the end
> > it's just an interface that allows forwarding accesses to certain
> > resources (for the case we are speaking about here, PCI config space)
> > to entities that registered as handlers.
> > 
> > vPCI OTOH just deals with a very specific resource (PCI config space)
> > and only allows internal handlers to be registered on a byte
> > granularity.
> > 
> > So your proposal would be to implement a hierarchy like the one on the
> > diagram below:
> > 
> >     ┌────────┐ ┌──────────┐ ┌──────────────────┐
> >     │ Memory │ │ IO Ports │ │ PCI config space │
> >     └───────┬┘ └┬─────────┘ └───┬──────────────┘
> >             │   │               │
> >             │   │           ┌───┴──┐
> >             │   │           │ vPCI │
> >             │   │           └─┬──┬─┘
> >          ┌──┴───┴┐            │  │
> >          │ IOREQ ├────────────┘  │
> >          └────┬──┘               │
> >               │                  │
> >  ┌────────────┴──┐              ┌┴──────────────┐
> >  │ IOREQ servers │              │ vPCI handlers │
> >  └───────────────┘              └───────────────┘
> 
> Yes
> 
> 
> > While what I'm proposing would look like:
> > 
> >     ┌────────┐ ┌──────────┐ ┌──────────────────┐
> >     │ Memory │ │ IO Ports │ │ PCI config space │
> >     └────┬───┘ └────┬─────┘ └────────┬─────────┘
> >          │          │                │
> >          └─────┬────┴────┬───────────┘
> >                │  IOREQ  │
> >                └─┬─────┬─┘
> >                  │     │
> >  ┌───────────────┤     └────┬──────┐
> >  │ IOREQ servers │          │ vPCI │
> >  └───────────────┘          └───┬──┘
> >                                 │
> >                             ┌───┴───────────┐
> >                             │ vPCI handlers │
> >                             └───────────────┘
> 
> I don't have a major problem with this, but I find it less clear than
> the first one.
> 
> Let's say that all domains are PVH (or ARM guests). QEMU is running in
> Dom0. If QEMU does emulation, that's fine. If QEMU implements PCI
> Passthrough then QEMU uses libpci to do PCI config space reads and
> write, which go to the Linux kernel in Dom0, which ends up doing PCI
> config space reads and writes on the device, and that goes via vPCI in
> Xen (vPCI for Dom0). So actually vPCI is still present. It is a lot
> simpler to think that vPCI is in charge of all mediated PCI config space
> accesses rather than thinking that for the same device vPCI handles PCI
> config space accesses for Dom0 but not for DomU.

So most of the issue is again with terminology I think, you would
like to avoid even having to mention the word IOREQ for PVH domains
for example, which you could possibly do if vPCI trapped all accesses
to the PCI config space.

I would be fine with renaming that internal IOREQ component to
something else.  What I insist on having is a single component that
multiplexes access to all platform resources (IO ports, MMIO, PCI
config space), so that we can have a (kind of) unified interface to
register handlers.

> It is not my preference but I am OK to compromise and go ahead with the
> architecture you proposed but please let's keep IOREQ servers out of the
> PCI Passthrough picture at least initially.
> 
> 
> > I'm obviously biased, but I think the latter is cleaner, and allows
> > all resources to be arbitrated by the same component (IOREQ).
> > 
> > If the concern is about the IOREQ hypercall interface, it would be
> > fine to introduce an option that limit IOREQs to internal users
> > (vPCI) without supporting external IOREQ servers.
> > 
> > Think of IOREQ as a resource mediator inside of Xen, that just does
> > the PCI address decoding and forwards the access to the interested
> > party, either an external IOREQ server or vPCI.
> 
> The part about IOREQ (xen/common/ioreq.c) being a resource mediator is
> OKish.
> 
> I had many discussions over the years with various members of the larger
> open source embedded community (Linaro, etc.) and the problem is that
> when one says "IOREQ" typically people think of QEMU or other userspace
> emulators. They don't think of the Xen side of it. This becomes very
> relevant here because Xen is the only part of the system that is
> getting safety-certified and it is important to convey the message that
> nothing else in required to be safety-certified to have a fully working
> Xen system that supports PCI Passthrough.
> 
> In short, it is important that the community doesn't get the idea that
> QEMU needs to be safety-certified to have PCI Passthrough working
> correctly with Xen in a safety scenario.

Maybe we need to rename that internal IOREQ component to something
else, and then IOREQ would strictly be limited to the hypercall
interface + IOREQ servers.

Or maybe we just need more education/documentation around the
difference between the internal side of IOREQs vs IOREQ servers vs
QEMU.  See for example demu, which is an emulator for a PC-like
compatible system using IOREQ servers:

https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=summary

>  
> > > > It's important that the interfaces we introduce are correct IMO,
> > > > because that ends up reflecting on the configuration options that we
> > > > expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
> > > > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> > > > gets placed there will ultimately influence how the option gets
> > > > exposed in xl/libxl, and the interface there is relevant to keep
> > > > stable for end user sanity.
> > > 
> > > I agree with you on the stable interfaces. The important part is not to
> > > introduce changes to stable interfaces that could limit us in the
> > > future. Specifically that includes xl and libxl, we need to be careful
> > > there. But I don't see a single per-domain vPCI enable/disable option as
> > > a problem. Let's say that in the future we have a mediated vGPU
> > > implementation: if it works together with vPCI then the per-domain vPCI
> > > option in libxl will be enabled (either explicitely or by default), if
> > > it doesn't then vPCI will be disabled (either explicitely or by the
> > > newer vGPU options.)
> > 
> > If vPCI is hooked into IOREQ there won't be a need anymore to register
> > the vPCI config space traps, as that would be done by IOREQ, and hence
> > vPCI managed devices could be registered at runtime with IOREQ.  IOW:
> > there won't be a need anymore to signal at domain creation whether
> > vPCI is intended to be used or not.
> 
> For safety, we have requirements to specify everything statically before
> boot so typically anything dynamic is a problem.
> 
> 
> > We would obviously need to enable IOREQ for all domains with IOMMU
> > enabled, as it would be IOREQ that register the PCI config space
> > handlers.
> 
> This bit might be OK
> 
> 
> > > For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
> > > before adding more changes on top of them, not because I don't care
> > > about the mediated GPU problem (we do have something similar at AMD),
> > > but because I worry that if we try to change them now we might not do a
> > > good enough job. I would prefer to wait until we know more about the
> > > actual use case, ideally with code supporting it.
> > > 
> > > I think the difference in points of views comes from the fact that I see
> > > vPCI as the default, QEMU only as a limited peripheral emulator (or
> > > mediator for the vGPU case) but not in control. vPCI and QEMU are not
> > > equal in my view. vPCI is in charge and always present if not in very
> > > uncommon setups (even if we decide to hook it inside Xen by using
> > > internal IOREQ interfaces). QEMU might come and go.
> > 
> > Xen needs a single component that mediates accesses to resources,
> > whether that's IOREQ, or something else I don't really care that much.
> > Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> > memory (and on x86 IO port) space just seems awfully complicated for
> > AFAICT no real benefit.
> >
> > > Now that I am writing this, I realize this is also why I wasn't too
> > > happy with the idea of hooking vPCI using IOREQ. It makes them look as
> > > if they are the same, while I don't they should be considered at the
> > > same level of priority, criticality, safety, integration in the system,
> > > etc.
> > 
> > I feel there are some fears with IOREQ from a safety PoV?  The code
> > that does the resource multiplexing is small, and as said above if
> > there are safety concerns with the hypercall interface it would be
> > fine to limit it's usage to internal handlers only.
> 
> Yes it is about safety. Everything within Xen will be safety-certified,
> hence usable in a safety critical scenario, everything outside of Xen
> might not.
> 
> The fear is not on the IOREQ itself because xen/common/ioreq.c is part
> of the certification scope. The fear is that IOREQ servers (e.g. QEMU)
> are somehow in the picture when we discuss safety architectures with PCI
> Passthrough, or that IOREQ servers could interfere with vPCI. By
> "interfere" I mean that QEMU running in dom0 (a deprivileged dom0) will
> be able to cause a malfunction in Xen vPCI.

For that purpose it doesn't matter much how IOREQs or vPCI interact,
as any (buggy) interaction could possibly allow IOREQ to cause
malfunctions to vPCI.

> Yes, limiting the hypercall interface would help in that regard because
> it would limit Xen exposure.

That would be fine IMO, it could even be a Kconfig option if that
better suits your needs.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs
  2023-12-11 10:36               ` Roger Pau Monné
@ 2023-12-12  1:34                 ` Stefano Stabellini
  0 siblings, 0 replies; 32+ messages in thread
From: Stefano Stabellini @ 2023-12-12  1:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Stefano Stabellini, Stewart Hildebrand, xen-devel, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Paul Durrant

[-- Attachment #1: Type: text/plain, Size: 15065 bytes --]

On Mon, 11 Dec 2023, Roger Pau Monné wrote:
> On Tue, Dec 05, 2023 at 06:34:35PM -0800, Stefano Stabellini wrote:
> > On Tue, 5 Dec 2023, Roger Pau Monné wrote:
> > > > > > I don't think we should enable IOREQ servers to handle PCI passthrough
> > > > > > for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
> > > > > > Passthrough can be handled by vPCI just fine. I think this should be a
> > > > > > good anti-feature to have (a goal to explicitly not add this feature) to
> > > > > > reduce complexity. Unless you see a specific usecase to add support for
> > > > > > it?
> > > > > 
> > > > > There are passthrough devices (GPUs) that might require some extra
> > > > > mediation on dom0 (like the Intel GVT-g thing) that would force the
> > > > > usage of ioreqs to passthrough.
> > > > 
> > > > From an architectural perspective, I think it would be cleaner, simpler
> > > > to maintain, and simpler to understand if Xen was the sole owner of the
> > > > PCI Root Complex and PCI config space mediation (implemented with vPCI).
> > > > IOREQ can be used for emulation and it works very well for that. At
> > > > least in my mind, that makes things much simpler.
> > > 
> > > But IOREQ already has all the code to mediate accesses to the PCI
> > > config space, and the interface to register separate servers for
> > > different PCI devices.
> > > 
> > > We would then need to duplicate this internally for vPCI, so that vPCI
> > > could forward accesses to IOREQ just for IOREQ to forward to yet a
> > > different component?  Seems like a lot of duplication for no benefit.
> > 
> > [...] 
> >  
> > > Also, you seem to confabulate IOREQ with QEMU, while the latter is
> > > indeed an user of IOREQ, I do see IOREQ as a simple resource mediator
> > > inside of Xen, that has the ability to forward such accesses to
> > > external emulators using an hypercall interface.
> > 
> > We have been using different terminologies until now. IOREQ could mean
> > anything from the ABI interface, the emulator side (QEMU) or the
> > hypervisor side (Xen). I am going to align with your wording and say:
> > 
> > IOREQ: only the IOREQ implementation in Xen (xen/common/ioreq.c)
> > IOREQ server: QEMU or alternative
> > 
> > I think it is OK if we use IOREQ internally within Xen to hook vPCI with
> > PCI config space accesses and emulation. I don't think it is a good idea
> > to attempt to enable IOREQ servers (e.g. QEMU) to implement PCI
> > Passthrough when vPCI is also enabled for the domain, at least
> > initially.
> 
> I agree, it's perfectly fine to initially limit to vPCI passthrough
> devices + QEMU emulated devices only for example.

OK good


> I think it was mostly an issue with terminology then :).

Yes :)


> > > > I understand there are non-trivial cases, like virtual GPUs with
> > > > hardware access, but I don't classify those as passthrough. That's
> > > > because there isn't one device that gets fully assigned to the guest.
> > > > Instead, there is an emulated device (hence IOREQ) with certain MMIO
> > > > regions and interrupts that end up being directly mapped from real
> > > > hardware.
> > > > 
> > > > So I think it is natural in those cases to use IOREQ and it is also
> > > > natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
> > > > perspective, I hope it will mostly look as if the device is assigned to
> > > > Dom0. Even if it ends up being more complex than that, Rome wasn't
> > > > built in one day, and I don't think we should try to solve this problem
> > > > on day1 (as long as the interfaces are not stable interfaces).
> > > 
> > > I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
> > > allow for emulators to be implemented in user-space, but at the end
> > > it's just an interface that allows forwarding accesses to certain
> > > resources (for the case we are speaking about here, PCI config space)
> > > to entities that registered as handlers.
> > > 
> > > vPCI OTOH just deals with a very specific resource (PCI config space)
> > > and only allows internal handlers to be registered on a byte
> > > granularity.
> > > 
> > > So your proposal would be to implement a hierarchy like the one on the
> > > diagram below:
> > > 
> > >     ┌────────┐ ┌──────────┐ ┌──────────────────┐
> > >     │ Memory │ │ IO Ports │ │ PCI config space │
> > >     └───────┬┘ └┬─────────┘ └───┬──────────────┘
> > >             │   │               │
> > >             │   │           ┌───┴──┐
> > >             │   │           │ vPCI │
> > >             │   │           └─┬──┬─┘
> > >          ┌──┴───┴┐            │  │
> > >          │ IOREQ ├────────────┘  │
> > >          └────┬──┘               │
> > >               │                  │
> > >  ┌────────────┴──┐              ┌┴──────────────┐
> > >  │ IOREQ servers │              │ vPCI handlers │
> > >  └───────────────┘              └───────────────┘
> > 
> > Yes
> > 
> > 
> > > While what I'm proposing would look like:
> > > 
> > >     ┌────────┐ ┌──────────┐ ┌──────────────────┐
> > >     │ Memory │ │ IO Ports │ │ PCI config space │
> > >     └────┬───┘ └────┬─────┘ └────────┬─────────┘
> > >          │          │                │
> > >          └─────┬────┴────┬───────────┘
> > >                │  IOREQ  │
> > >                └─┬─────┬─┘
> > >                  │     │
> > >  ┌───────────────┤     └────┬──────┐
> > >  │ IOREQ servers │          │ vPCI │
> > >  └───────────────┘          └───┬──┘
> > >                                 │
> > >                             ┌───┴───────────┐
> > >                             │ vPCI handlers │
> > >                             └───────────────┘
> > 
> > I don't have a major problem with this, but I find it less clear than
> > the first one.
> > 
> > Let's say that all domains are PVH (or ARM guests). QEMU is running in
> > Dom0. If QEMU does emulation, that's fine. If QEMU implements PCI
> > Passthrough then QEMU uses libpci to do PCI config space reads and
> > write, which go to the Linux kernel in Dom0, which ends up doing PCI
> > config space reads and writes on the device, and that goes via vPCI in
> > Xen (vPCI for Dom0). So actually vPCI is still present. It is a lot
> > simpler to think that vPCI is in charge of all mediated PCI config space
> > accesses rather than thinking that for the same device vPCI handles PCI
> > config space accesses for Dom0 but not for DomU.
> 
> So most of the issue is again with terminology I think, you would
> like to avoid even having to mention the word IOREQ for PVH domains
> for example, which you could possibly do if vPCI trapped all accesses
> to the PCI config space.
> 
> I would be fine with renaming that internal IOREQ component to
> something else.  What I insist on having is a single component that
> multiplexes access to all platform resources (IO ports, MMIO, PCI
> config space), so that we can have a (kind of) unified interface to
> register handlers.

Yes I am OK with that.

A single multiplexer is fine, however we need to be careful as IOREQ in
Xen has a lot of stuff about handling messages to and from QEMU and
state changes related to it, see ioreq_send and ioreq_send_buffered.


> > It is not my preference but I am OK to compromise and go ahead with the
> > architecture you proposed but please let's keep IOREQ servers out of the
> > PCI Passthrough picture at least initially.
> > 
> > 
> > > I'm obviously biased, but I think the latter is cleaner, and allows
> > > all resources to be arbitrated by the same component (IOREQ).
> > > 
> > > If the concern is about the IOREQ hypercall interface, it would be
> > > fine to introduce an option that limit IOREQs to internal users
> > > (vPCI) without supporting external IOREQ servers.
> > > 
> > > Think of IOREQ as a resource mediator inside of Xen, that just does
> > > the PCI address decoding and forwards the access to the interested
> > > party, either an external IOREQ server or vPCI.
> > 
> > The part about IOREQ (xen/common/ioreq.c) being a resource mediator is
> > OKish.
> > 
> > I had many discussions over the years with various members of the larger
> > open source embedded community (Linaro, etc.) and the problem is that
> > when one says "IOREQ" typically people think of QEMU or other userspace
> > emulators. They don't think of the Xen side of it. This becomes very
> > relevant here because Xen is the only part of the system that is
> > getting safety-certified and it is important to convey the message that
> > nothing else in required to be safety-certified to have a fully working
> > Xen system that supports PCI Passthrough.
> > 
> > In short, it is important that the community doesn't get the idea that
> > QEMU needs to be safety-certified to have PCI Passthrough working
> > correctly with Xen in a safety scenario.
> 
> Maybe we need to rename that internal IOREQ component to something
> else, and then IOREQ would strictly be limited to the hypercall
> interface + IOREQ servers.

Right. We could also keep calling IOREQ things in Xen strictly related
to handling the message passing interface to QEMU, e.g. ioreq_send.

 
> Or maybe we just need more education/documentation around the
> difference between the internal side of IOREQs vs IOREQ servers vs
> QEMU.  See for example demu, which is an emulator for a PC-like
> compatible system using IOREQ servers:
> 
> https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=summary
> 
> >  
> > > > > It's important that the interfaces we introduce are correct IMO,
> > > > > because that ends up reflecting on the configuration options that we
> > > > > expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
> > > > > XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
> > > > > gets placed there will ultimately influence how the option gets
> > > > > exposed in xl/libxl, and the interface there is relevant to keep
> > > > > stable for end user sanity.
> > > > 
> > > > I agree with you on the stable interfaces. The important part is not to
> > > > introduce changes to stable interfaces that could limit us in the
> > > > future. Specifically that includes xl and libxl, we need to be careful
> > > > there. But I don't see a single per-domain vPCI enable/disable option as
> > > > a problem. Let's say that in the future we have a mediated vGPU
> > > > implementation: if it works together with vPCI then the per-domain vPCI
> > > > option in libxl will be enabled (either explicitely or by default), if
> > > > it doesn't then vPCI will be disabled (either explicitely or by the
> > > > newer vGPU options.)
> > > 
> > > If vPCI is hooked into IOREQ there won't be a need anymore to register
> > > the vPCI config space traps, as that would be done by IOREQ, and hence
> > > vPCI managed devices could be registered at runtime with IOREQ.  IOW:
> > > there won't be a need anymore to signal at domain creation whether
> > > vPCI is intended to be used or not.
> > 
> > For safety, we have requirements to specify everything statically before
> > boot so typically anything dynamic is a problem.
> > 
> > 
> > > We would obviously need to enable IOREQ for all domains with IOMMU
> > > enabled, as it would be IOREQ that register the PCI config space
> > > handlers.
> > 
> > This bit might be OK
> > 
> > 
> > > > For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
> > > > before adding more changes on top of them, not because I don't care
> > > > about the mediated GPU problem (we do have something similar at AMD),
> > > > but because I worry that if we try to change them now we might not do a
> > > > good enough job. I would prefer to wait until we know more about the
> > > > actual use case, ideally with code supporting it.
> > > > 
> > > > I think the difference in points of views comes from the fact that I see
> > > > vPCI as the default, QEMU only as a limited peripheral emulator (or
> > > > mediator for the vGPU case) but not in control. vPCI and QEMU are not
> > > > equal in my view. vPCI is in charge and always present if not in very
> > > > uncommon setups (even if we decide to hook it inside Xen by using
> > > > internal IOREQ interfaces). QEMU might come and go.
> > > 
> > > Xen needs a single component that mediates accesses to resources,
> > > whether that's IOREQ, or something else I don't really care that much.
> > > Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> > > memory (and on x86 IO port) space just seems awfully complicated for
> > > AFAICT no real benefit.
> > >
> > > > Now that I am writing this, I realize this is also why I wasn't too
> > > > happy with the idea of hooking vPCI using IOREQ. It makes them look as
> > > > if they are the same, while I don't they should be considered at the
> > > > same level of priority, criticality, safety, integration in the system,
> > > > etc.
> > > 
> > > I feel there are some fears with IOREQ from a safety PoV?  The code
> > > that does the resource multiplexing is small, and as said above if
> > > there are safety concerns with the hypercall interface it would be
> > > fine to limit it's usage to internal handlers only.
> > 
> > Yes it is about safety. Everything within Xen will be safety-certified,
> > hence usable in a safety critical scenario, everything outside of Xen
> > might not.
> > 
> > The fear is not on the IOREQ itself because xen/common/ioreq.c is part
> > of the certification scope. The fear is that IOREQ servers (e.g. QEMU)
> > are somehow in the picture when we discuss safety architectures with PCI
> > Passthrough, or that IOREQ servers could interfere with vPCI. By
> > "interfere" I mean that QEMU running in dom0 (a deprivileged dom0) will
> > be able to cause a malfunction in Xen vPCI.
> 
> For that purpose it doesn't matter much how IOREQs or vPCI interact,
> as any (buggy) interaction could possibly allow IOREQ to cause
> malfunctions to vPCI.

yep


> > Yes, limiting the hypercall interface would help in that regard because
> > it would limit Xen exposure.
> 
> That would be fine IMO, it could even be a Kconfig option if that
> better suits your needs.

OK. I think we are aligned.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v6 5/5] [FUTURE] tools/arm: enable vPCI for domUs
  2023-11-13 22:21 [PATCH v6 0/5] Kconfig for PCI passthrough on ARM Stewart Hildebrand
                   ` (3 preceding siblings ...)
  2023-11-13 22:21 ` [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs Stewart Hildebrand
@ 2023-11-13 22:21 ` Stewart Hildebrand
  4 siblings, 0 replies; 32+ messages in thread
From: Stewart Hildebrand @ 2023-11-13 22:21 UTC (permalink / raw)
  To: xen-devel; +Cc: Stewart Hildebrand, Wei Liu, Anthony PERARD, Juergen Gross

Set the vPCI flag in xen_domctl_createdomain to enable vPCI if a pci
device has been specified in the xl domain config file.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
Same story as the patch before this regarding the [FUTURE] tag.

v5->v6:
* no change

v4->v5:
* no change

v3->v4:
* split from ("xen/arm: enable vPCI for domUs")
---
 tools/libs/light/libxl_arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 15391917748c..6daed958e598 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -222,6 +222,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         config->arch.sve_vl = d_config->b_info.arch_arm.sve_vl / 128U;
     }
 
+    if (d_config->num_pcidevs)
+        config->flags |= XEN_DOMCTL_CDF_vpci;
+
     return 0;
 }
 
-- 
2.42.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-12-19 18:56 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-13 22:21 [PATCH v6 0/5] Kconfig for PCI passthrough on ARM Stewart Hildebrand
2023-11-13 22:21 ` [PATCH v6 1/5] xen/arm: pci: introduce PCI_PASSTHROUGH Kconfig option Stewart Hildebrand
2023-11-13 22:21 ` [PATCH v6 2/5] xen/vpci: move xen_domctl_createdomain vPCI flag to common Stewart Hildebrand
2023-11-14  9:11   ` Jan Beulich
2023-11-29 21:25     ` Stewart Hildebrand
2023-12-15  9:36       ` Rahul Singh
2023-12-19 18:55         ` Stewart Hildebrand
2023-11-13 22:21 ` [PATCH v6 3/5] [FUTURE] xen/arm: enable vPCI for dom0 Stewart Hildebrand
2023-11-13 22:21 ` [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs Stewart Hildebrand
2023-11-14  9:13   ` Jan Beulich
2023-11-30  2:47     ` Stewart Hildebrand
2023-11-30  8:33       ` Jan Beulich
2023-11-30 17:06         ` Stewart Hildebrand
2023-12-01  6:57           ` Jan Beulich
2023-12-01  9:16   ` Roger Pau Monné
2023-12-02  2:56     ` Stefano Stabellini
2023-12-04  3:54       ` Stewart Hildebrand
2023-12-04  8:24       ` Jan Beulich
2023-12-04 10:58       ` Roger Pau Monné
2023-12-04 19:09         ` Stewart Hildebrand
2023-12-04 22:07         ` Stefano Stabellini
2023-12-05 11:08           ` Roger Pau Monné
2023-12-05 16:27             ` Stewart Hildebrand
2023-12-05 17:09               ` Roger Pau Monné
2023-12-05 17:36                 ` Stewart Hildebrand
2023-12-05 23:25                   ` Stefano Stabellini
2023-12-05 19:01             ` Stewart Hildebrand
2023-12-11  9:42               ` Roger Pau Monné
2023-12-06  2:34             ` Stefano Stabellini
2023-12-11 10:36               ` Roger Pau Monné
2023-12-12  1:34                 ` Stefano Stabellini
2023-11-13 22:21 ` [PATCH v6 5/5] [FUTURE] tools/arm: " Stewart Hildebrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.