[RFC PATCH 0/6] ARM virtio-pci initial support

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH 0/6] ARM virtio-pci initial support
@ 2023-11-15 11:26 Sergiy Kibrik
  2023-11-15 11:26 ` [RFC PATCH 1/6] libxl: Pass max_vcpus to Qemu in case of PVH domain (Arm) as well Sergiy Kibrik
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Sergiy Kibrik @ 2023-11-15 11:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Sergiy Kibrik, Wei Liu, Anthony PERARD,
	Juergen Gross, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk

Hi,
In this patch series we introduce support of PCI devices emulated by Virtio 
on ARM platform. A guest system is presented with Virtio Host bridge device, through
which a number of emulated PCI devices (e.g. disk, network, graphic, audio etc)
can work with corresponding guests' subsystems.

For that purpose we add a new "pci" virtio transport mechanism in xl
configuration, in addition to present "mmio" mechanism.

Suitable MMIO and IRQ ranges are reverved, for guests' PCI accesses are trapped
and forwarded to IOREQ server to be handled outside of Xen. Also guest's DT
extended with PCI (#INTA..#INTD) interrupt mappings.

For now only supported combination of backends is when both PCI Host bridge
and all PCI devices behind it are emulated by the same single instance (i.e. Qemu).

The code was tested with QEMU backends, yet it aims to be extendable to support
stand-alone backends.

 -Sergiy

Oleksandr Tyshchenko (6):
  libxl: Pass max_vcpus to Qemu in case of PVH domain (Arm) as well
  xen/public: arch-arm: reserve resources for virtio-pci
  libxl/arm: Add basic virtio-pci support
  libxl/arm: Reuse generic PCI-IOMMU bindings for virtio-pci devices
  xen/arm: Intercept vPCI config accesses and forward them to emulator
  libxl: Add "backend_type" property for the Virtio devices

 docs/man/xl.cfg.5.pod.in              |  18 +-
 tools/libs/light/libxl_arm.c          | 351 ++++++++++++++++++++++++--
 tools/libs/light/libxl_create.c       |  18 +-
 tools/libs/light/libxl_dm.c           |  98 ++++++-
 tools/libs/light/libxl_internal.h     |   5 +
 tools/libs/light/libxl_types.idl      |  41 ++-
 tools/libs/light/libxl_virtio.c       | 119 +++++++--
 tools/xl/xl_parse.c                   |  39 +++
 xen/arch/arm/Kconfig                  |  10 +
 xen/arch/arm/domain.c                 |   2 +-
 xen/arch/arm/{ => include/asm}/vpci.h |  11 +
 xen/arch/arm/io.c                     |   8 +-
 xen/arch/arm/ioreq.c                  |  19 +-
 xen/arch/arm/vpci.c                   | 106 +++++++-
 xen/include/public/arch-arm.h         |  21 ++
 15 files changed, 797 insertions(+), 69 deletions(-)
 rename xen/arch/arm/{ => include/asm}/vpci.h (75%)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH 1/6] libxl: Pass max_vcpus to Qemu in case of PVH domain (Arm) as well
  2023-11-15 11:26 [RFC PATCH 0/6] ARM virtio-pci initial support Sergiy Kibrik
@ 2023-11-15 11:26 ` Sergiy Kibrik
  2023-11-15 11:26 ` [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci Sergiy Kibrik
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: Sergiy Kibrik @ 2023-11-15 11:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Wei Liu, Anthony PERARD, Juergen Gross,
	Sergiy Kibrik

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The number of vCPUs used for the IOREQ configuration (machine->smp.cpus)
should really match the system value as for each vCPU we setup a dedicated
evtchn for the communication with Xen at the runtime. This is needed
for the IOREQ to be properly configured and work if the involved domain
has more than one vCPU assigned.

Note that Qemu should set mc->max_cpus to GUEST_MAX_VCPUS in
xen_arm_machine_class_init().

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
---
 tools/libs/light/libxl_dm.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
index 14b593110f..0b2548d35b 100644
--- a/tools/libs/light/libxl_dm.c
+++ b/tools/libs/light/libxl_dm.c
@@ -1553,18 +1553,6 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
         if (!libxl__acpi_defbool_val(b_info)) {
             flexarray_append(dm_args, "-no-acpi");
         }
-        if (b_info->max_vcpus > 1) {
-            flexarray_append(dm_args, "-smp");
-            if (b_info->avail_vcpus.size) {
-                int nr_set_cpus = 0;
-                nr_set_cpus = libxl_bitmap_count_set(&b_info->avail_vcpus);
-
-                flexarray_append(dm_args, GCSPRINTF("%d,maxcpus=%d",
-                                                    nr_set_cpus,
-                                                    b_info->max_vcpus));
-            } else
-                flexarray_append(dm_args, GCSPRINTF("%d", b_info->max_vcpus));
-        }
         for (i = 0; i < num_nics; i++) {
             if (nics[i].nictype == LIBXL_NIC_TYPE_VIF_IOEMU) {
                 char *smac = GCSPRINTF(LIBXL_MAC_FMT,
@@ -1800,6 +1788,22 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
     for (i = 0; b_info->extra && b_info->extra[i] != NULL; i++)
         flexarray_append(dm_args, b_info->extra[i]);
 
+    if (b_info->type == LIBXL_DOMAIN_TYPE_HVM ||
+        b_info->type == LIBXL_DOMAIN_TYPE_PVH) {
+        if (b_info->max_vcpus > 1) {
+            flexarray_append(dm_args, "-smp");
+            if (b_info->avail_vcpus.size) {
+                int nr_set_cpus = 0;
+                nr_set_cpus = libxl_bitmap_count_set(&b_info->avail_vcpus);
+
+                flexarray_append(dm_args, GCSPRINTF("%d,maxcpus=%d",
+                                                    nr_set_cpus,
+                                                    b_info->max_vcpus));
+            } else
+                flexarray_append(dm_args, GCSPRINTF("%d", b_info->max_vcpus));
+        }
+    }
+
     flexarray_append(dm_args, "-machine");
     switch (b_info->type) {
     case LIBXL_DOMAIN_TYPE_PVH:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 11:26 [RFC PATCH 0/6] ARM virtio-pci initial support Sergiy Kibrik
  2023-11-15 11:26 ` [RFC PATCH 1/6] libxl: Pass max_vcpus to Qemu in case of PVH domain (Arm) as well Sergiy Kibrik
@ 2023-11-15 11:26 ` Sergiy Kibrik
  2023-11-15 12:33   ` Julien Grall
  2023-11-15 11:26 ` [RFC PATCH 3/6] libxl/arm: Add basic virtio-pci support Sergiy Kibrik
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Sergiy Kibrik @ 2023-11-15 11:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Sergiy Kibrik

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

In order to enable more use-cases such as having multiple
device-models (Qemu) running in different backend domains which provide
virtio-pci devices for the same guest, we allocate and expose one
PCI host bridge for every virtio backend domain for that guest.

For that purpose, reserve separate virtio-pci resources (memory and SPI range
for Legacy PCI interrupts) up to 8 possible PCI hosts (to be aligned with
MAX_NR_IOREQ_SERVERS) and allocate a host per backend domain. The PCI host
details including its host_id to be written to dedicated Xenstore node for
the device-model to retrieve.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
---
 xen/include/public/arch-arm.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index a25e87dbda..e6c9cd5335 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -466,6 +466,19 @@ typedef uint64_t xen_callback_t;
 #define GUEST_VPCI_MEM_ADDR                 xen_mk_ullong(0x23000000)
 #define GUEST_VPCI_MEM_SIZE                 xen_mk_ullong(0x10000000)
 
+/*
+ * 16 MB is reserved for virtio-pci configuration space based on calculation
+ * 8 bridges * 2 buses x 32 devices x 8 functions x 4 KB = 16 MB
+ */
+#define GUEST_VIRTIO_PCI_ECAM_BASE          xen_mk_ullong(0x33000000)
+#define GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE    xen_mk_ullong(0x01000000)
+#define GUEST_VIRTIO_PCI_HOST_ECAM_SIZE     xen_mk_ullong(0x00200000)
+
+/* 64 MB is reserved for virtio-pci memory */
+#define GUEST_VIRTIO_PCI_ADDR_TYPE_MEM    xen_mk_ullong(0x02000000)
+#define GUEST_VIRTIO_PCI_MEM_ADDR         xen_mk_ullong(0x34000000)
+#define GUEST_VIRTIO_PCI_MEM_SIZE         xen_mk_ullong(0x04000000)
+
 /*
  * 16MB == 4096 pages reserved for guest to use as a region to map its
  * grant table in.
@@ -476,6 +489,11 @@ typedef uint64_t xen_callback_t;
 #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
 #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
 
+/* 64 MB is reserved for virtio-pci Prefetch memory */
+#define GUEST_VIRTIO_PCI_ADDR_TYPE_PREFETCH_MEM    xen_mk_ullong(0x42000000)
+#define GUEST_VIRTIO_PCI_PREFETCH_MEM_ADDR         xen_mk_ullong(0x3a000000)
+#define GUEST_VIRTIO_PCI_PREFETCH_MEM_SIZE         xen_mk_ullong(0x04000000)
+
 #define GUEST_RAM_BANKS   2
 
 /*
@@ -515,6 +533,9 @@ typedef uint64_t xen_callback_t;
 #define GUEST_VIRTIO_MMIO_SPI_FIRST   33
 #define GUEST_VIRTIO_MMIO_SPI_LAST    43
 
+#define GUEST_VIRTIO_PCI_SPI_FIRST   44
+#define GUEST_VIRTIO_PCI_SPI_LAST    76
+
 /* PSCI functions */
 #define PSCI_cpu_suspend 0
 #define PSCI_cpu_off     1
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 3/6] libxl/arm: Add basic virtio-pci support
  2023-11-15 11:26 [RFC PATCH 0/6] ARM virtio-pci initial support Sergiy Kibrik
  2023-11-15 11:26 ` [RFC PATCH 1/6] libxl: Pass max_vcpus to Qemu in case of PVH domain (Arm) as well Sergiy Kibrik
  2023-11-15 11:26 ` [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci Sergiy Kibrik
@ 2023-11-15 11:26 ` Sergiy Kibrik
  2023-11-15 11:26 ` [RFC PATCH 4/6] libxl/arm: Reuse generic PCI-IOMMU bindings for virtio-pci devices Sergiy Kibrik
  2023-11-15 11:26 ` [RFC PATCH 5/6] xen/arm: Intercept vPCI config accesses and forward them to emulator Sergiy Kibrik
  4 siblings, 0 replies; 30+ messages in thread
From: Sergiy Kibrik @ 2023-11-15 11:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Wei Liu, Anthony PERARD, Juergen Gross,
	Sergiy Kibrik

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Introduce new transport mechanism "pci" for the Virtio device
and update parsing and configuration logic accordingly.

In order to enable more use-cases such as having multiple
device-models (Qemu) running in different backend domains which provide
virtio-pci devices for the same guest, we allocate and expose one
PCI host bridge for every virtio backend domain for that guest.

Also extend PCI Host bridge DT node exposed to the guest by adding
bindings for Legacy PCI interrupts (#INTA - #INTD).

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
---
 docs/man/xl.cfg.5.pod.in          |   9 +-
 tools/libs/light/libxl_arm.c      | 287 ++++++++++++++++++++++++++++--
 tools/libs/light/libxl_create.c   |  18 +-
 tools/libs/light/libxl_dm.c       |  70 ++++++++
 tools/libs/light/libxl_internal.h |   5 +
 tools/libs/light/libxl_types.idl  |  34 +++-
 tools/libs/light/libxl_virtio.c   |  98 +++++++---
 tools/xl/xl_parse.c               |  36 ++++
 8 files changed, 507 insertions(+), 50 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 2e234b450e..0fba750815 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -1616,8 +1616,13 @@ hexadecimal format, without the "0x" prefix and all in lower case, like
 
 =item B<transport=STRING>
 
-Specifies the transport mechanism for the Virtio device, only "mmio" is
-supported for now.
+Specifies the transport mechanism for the Virtio device, both "mmio" and "pci"
+are supported. This option is mandatory.
+
+=item B<bdf=STRING>
+
+The Virtio device with transport "pci" must be identified by its B<BDF>.
+See L<xl-pci-configuration(5)> for more details about the format for B<BDF>.
 
 =item B<grant_usage=BOOLEAN>
 
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1539191774..df6cbbe756 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -20,6 +20,11 @@
  */
 #define VIRTIO_MMIO_DEV_SIZE   xen_mk_ullong(0x200)
 
+#define VIRTIO_PCI_HOST_MEM_SIZE            xen_mk_ullong(0x800000)
+#define VIRTIO_PCI_HOST_PREFETCH_MEM_SIZE   xen_mk_ullong(0x800000)
+#define VIRTIO_PCI_HOST_NUM_SPIS            4
+#define VIRTIO_PCI_MAX_HOSTS                8
+
 static uint64_t alloc_virtio_mmio_base(libxl__gc *gc, uint64_t *virtio_mmio_base)
 {
     uint64_t base = *virtio_mmio_base;
@@ -80,14 +85,101 @@ static const char *gicv_to_string(libxl_gic_version gic_version)
     }
 }
 
+static int alloc_virtio_pci_host(libxl__gc *gc,
+                                 uint32_t backend_domid,
+                                 uint32_t *host_id,
+                                 unsigned int *num_hosts,
+                                 libxl_virtio_pci_host *hosts)
+{
+    unsigned int i;
+
+    BUILD_BUG_ON(VIRTIO_PCI_MAX_HOSTS !=
+                 GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE / GUEST_VIRTIO_PCI_HOST_ECAM_SIZE);
+    BUILD_BUG_ON(VIRTIO_PCI_MAX_HOSTS !=
+                 GUEST_VIRTIO_PCI_MEM_SIZE / VIRTIO_PCI_HOST_MEM_SIZE);
+    BUILD_BUG_ON(VIRTIO_PCI_MAX_HOSTS !=
+                 GUEST_VIRTIO_PCI_PREFETCH_MEM_SIZE / VIRTIO_PCI_HOST_PREFETCH_MEM_SIZE);
+    BUILD_BUG_ON(VIRTIO_PCI_MAX_HOSTS !=
+                 (GUEST_VIRTIO_PCI_SPI_LAST - GUEST_VIRTIO_PCI_SPI_FIRST) / VIRTIO_PCI_HOST_NUM_SPIS);
+
+    if (*num_hosts > VIRTIO_PCI_MAX_HOSTS)
+        return ERROR_INVAL;
+
+    for (i = 0; i < *num_hosts; i++) {
+        if (hosts[i].backend_domid == backend_domid) {
+            *host_id = hosts[i].id;
+
+            LOG(DEBUG, "Reuse host #%u: "
+                        "ECAM: 0x%"PRIx64"-0x%"PRIx64" "
+                        "MEM: 0x%"PRIx64"-0x%"PRIx64" "
+                        "PREFETCH_MEM: 0x%"PRIx64"-0x%"PRIx64" "
+                        "IRQ: %u-%u",
+                        hosts[i].id,
+                        hosts[i].ecam_base,
+                        hosts[i].ecam_base + hosts[i].ecam_size - 1,
+                        hosts[i].mem_base,
+                        hosts[i].mem_base + hosts[i].mem_size - 1,
+                        hosts[i].prefetch_mem_base,
+                        hosts[i].prefetch_mem_base + hosts[i].prefetch_mem_size - 1,
+                        hosts[i].irq_first,
+                        hosts[i].irq_first + hosts[i].num_irqs - 1);
+
+            return 0;
+        }
+    }
+
+    if (i == VIRTIO_PCI_MAX_HOSTS) {
+        LOG(ERROR, "Ran out of reserved resources for virtio-pci host\n");
+        return ERROR_FAIL;
+    }
+
+    hosts[i].backend_domid = backend_domid;
+    hosts[i].id = i;
+    hosts[i].ecam_base = GUEST_VIRTIO_PCI_ECAM_BASE +
+        i * GUEST_VIRTIO_PCI_HOST_ECAM_SIZE;
+    hosts[i].ecam_size = GUEST_VIRTIO_PCI_HOST_ECAM_SIZE;
+    hosts[i].mem_base = GUEST_VIRTIO_PCI_MEM_ADDR +
+        i * VIRTIO_PCI_HOST_MEM_SIZE;
+    hosts[i].mem_size = VIRTIO_PCI_HOST_MEM_SIZE;
+    hosts[i].prefetch_mem_base = GUEST_VIRTIO_PCI_PREFETCH_MEM_ADDR +
+        i * VIRTIO_PCI_HOST_PREFETCH_MEM_SIZE;
+    hosts[i].prefetch_mem_size = VIRTIO_PCI_HOST_PREFETCH_MEM_SIZE;
+    hosts[i].irq_first = GUEST_VIRTIO_PCI_SPI_FIRST +
+        i * VIRTIO_PCI_HOST_NUM_SPIS;
+    hosts[i].num_irqs = VIRTIO_PCI_HOST_NUM_SPIS;
+
+    *host_id = hosts[i].id;
+
+    (*num_hosts)++;
+
+    LOG(DEBUG, "Allocate host #%u: "
+                "ECAM: 0x%"PRIx64"-0x%"PRIx64" "
+                "MEM: 0x%"PRIx64"-0x%"PRIx64" "
+                "PREFETCH_MEM: 0x%"PRIx64"-0x%"PRIx64" "
+                "IRQ: %u-%u",
+                hosts[i].id,
+                hosts[i].ecam_base,
+                hosts[i].ecam_base + hosts[i].ecam_size - 1,
+                hosts[i].mem_base,
+                hosts[i].mem_base + hosts[i].mem_size - 1,
+                hosts[i].prefetch_mem_base,
+                hosts[i].prefetch_mem_base + hosts[i].prefetch_mem_size - 1,
+                hosts[i].irq_first,
+                hosts[i].irq_first + hosts[i].num_irqs - 1);
+
+    return 0;
+}
+
 int libxl__arch_domain_prepare_config(libxl__gc *gc,
                                       libxl_domain_config *d_config,
                                       struct xen_domctl_createdomain *config)
 {
     uint32_t nr_spis = 0;
     unsigned int i;
-    uint32_t vuart_irq, virtio_irq = 0;
-    bool vuart_enabled = false, virtio_enabled = false;
+    uint32_t vuart_irq, virtio_mmio_irq_last, virtio_pci_irq_last = 0;
+    bool vuart_enabled = false, virtio_mmio_enabled = false;
+    unsigned int num_virtio_pci_hosts = 0;
+    libxl_virtio_pci_host virtio_pci_hosts[VIRTIO_PCI_MAX_HOSTS] = {0};
     uint64_t virtio_mmio_base = GUEST_VIRTIO_MMIO_BASE;
     uint32_t virtio_mmio_irq = GUEST_VIRTIO_MMIO_SPI_FIRST;
     int rc;
@@ -118,11 +210,17 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
     for (i = 0; i < d_config->num_virtios; i++) {
         libxl_device_virtio *virtio = &d_config->virtios[i];
 
-        if (virtio->transport != LIBXL_VIRTIO_TRANSPORT_MMIO)
-            continue;
-
-        rc = alloc_virtio_mmio_params(gc, &virtio->base, &virtio->irq,
-                                      &virtio_mmio_base, &virtio_mmio_irq);
+        if (virtio->transport != LIBXL_VIRTIO_TRANSPORT_MMIO) {
+            rc = alloc_virtio_pci_host(gc,
+                                       virtio->backend_domid,
+                                       &virtio->u.pci.host_id,
+                                       &num_virtio_pci_hosts,
+                                       virtio_pci_hosts);
+        } else {
+            rc = alloc_virtio_mmio_params(gc, &virtio->u.mmio.base,
+                                          &virtio->u.mmio.irq,
+                                          &virtio_mmio_base, &virtio_mmio_irq);
+        }
 
         if (rc)
             return rc;
@@ -134,14 +232,25 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
      * The resulting "nr_spis" needs to cover the highest possible SPI.
      */
     if (virtio_mmio_irq != GUEST_VIRTIO_MMIO_SPI_FIRST) {
-        virtio_enabled = true;
+        virtio_mmio_enabled = true;
 
         /*
          * Assumes that "virtio_mmio_irq" is the highest allocated irq, which is
          * updated from alloc_virtio_mmio_irq() currently.
          */
-        virtio_irq = virtio_mmio_irq - 1;
-        nr_spis = max(nr_spis, virtio_irq - 32 + 1);
+        virtio_mmio_irq_last = virtio_mmio_irq - 1;
+        nr_spis = max(nr_spis, virtio_mmio_irq_last - 32 + 1);
+    }
+
+    if (num_virtio_pci_hosts) {
+        libxl_virtio_pci_host *host = &virtio_pci_hosts[num_virtio_pci_hosts - 1];
+
+        /*
+         * Assumes that latest allocated host contains the highest allocated
+         * irq range.
+         */
+        virtio_pci_irq_last = host->irq_first + host->num_irqs - 1;
+        nr_spis = max(nr_spis, virtio_pci_irq_last - 32 + 1);
     }
 
     for (i = 0; i < d_config->b_info.num_irqs; i++) {
@@ -164,10 +273,14 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         }
 
         /* The same check as for vpl011 */
-        if (virtio_enabled &&
-            (irq >= GUEST_VIRTIO_MMIO_SPI_FIRST && irq <= virtio_irq)) {
+        if (virtio_mmio_enabled &&
+            (irq >= GUEST_VIRTIO_MMIO_SPI_FIRST && irq <= virtio_mmio_irq_last)) {
             LOG(ERROR, "Physical IRQ %u conflicting with Virtio MMIO IRQ range\n", irq);
             return ERROR_FAIL;
+        } else if (num_virtio_pci_hosts &&
+            (irq >= GUEST_VIRTIO_PCI_SPI_FIRST && irq <= virtio_pci_irq_last)) {
+            LOG(ERROR, "Physical IRQ %u conflicting with Virtio PCI IRQ range\n", irq);
+            return ERROR_FAIL;
         }
 
         if (irq < 32)
@@ -179,6 +292,14 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
             nr_spis = spi + 1;
     }
 
+    if (num_virtio_pci_hosts) {
+        d_config->b_info.num_virtio_pci_hosts = num_virtio_pci_hosts;
+        d_config->b_info.virtio_pci_hosts = libxl__calloc(NOGC,
+            num_virtio_pci_hosts, sizeof(*d_config->b_info.virtio_pci_hosts));
+        memcpy(d_config->b_info.virtio_pci_hosts, virtio_pci_hosts,
+            sizeof(*d_config->b_info.virtio_pci_hosts) * num_virtio_pci_hosts);
+    }
+
     LOG(DEBUG, "Configure the domain");
 
     config->arch.nr_spis = nr_spis;
@@ -908,6 +1029,130 @@ static int make_vpci_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
+#define PCI_IRQ_MAP_MIN_STRIDE   8
+
+static int create_virtio_pci_irq_map(libxl__gc *gc, void *fdt,
+                                     libxl_virtio_pci_host *host)
+{
+    uint32_t *full_irq_map, *irq_map;
+    size_t len;
+    unsigned int slot, pin;
+    int res, cells;
+
+    res = fdt_property_cell(fdt, "#interrupt-cells", 1);
+    if (res) return res;
+
+    /* assume GIC node to be present, due to
+     * make_gicv2_node()/make_gicv3_node() get called earlier
+     */
+    res = fdt_node_offset_by_phandle(fdt, GUEST_PHANDLE_GIC);
+    if (res < 0)
+        return res;
+
+    res = fdt_address_cells(fdt, res);
+    /* handle case of make_gicv2_node() setting #address-cells to 0 */
+    if (res == -FDT_ERR_BADNCELLS)
+        res = 0;
+    else if (res < 0)
+        return res;
+
+    cells = res;
+    len = sizeof(uint32_t) * host->num_irqs * host->num_irqs *
+          (PCI_IRQ_MAP_MIN_STRIDE + cells);
+    irq_map = full_irq_map = libxl__malloc(gc, len);
+
+    for (slot = 0; slot < host->num_irqs; slot++) {
+        for (pin = 0; pin < host->num_irqs; pin++) {
+            uint32_t irq = host->irq_first + ((pin + slot) % host->num_irqs);
+            unsigned int i = 0;
+
+            /* PCI address (3 cells) */
+            irq_map[i++] = cpu_to_fdt32(PCI_DEVFN(slot, 0) << 8);
+            irq_map[i++] = cpu_to_fdt32(0);
+            irq_map[i++] = cpu_to_fdt32(0);
+
+            /* PCI interrupt (1 cell) */
+            irq_map[i++] = cpu_to_fdt32(pin + 1);
+
+            /* GIC phandle (1 cell) */
+            irq_map[i++] = cpu_to_fdt32(GUEST_PHANDLE_GIC);
+
+            /* GIC unit address, set 0 because vgic itself handles vpci IRQs */
+            for (int c = cells; c--; irq_map[i++] = cpu_to_fdt32(0));
+
+            /* GIC interrupt (3 cells) */
+            irq_map[i++] = cpu_to_fdt32(0); /* SPI */
+            irq_map[i++] = cpu_to_fdt32(irq - 32);
+            irq_map[i++] = cpu_to_fdt32(DT_IRQ_TYPE_LEVEL_HIGH);
+
+            irq_map += PCI_IRQ_MAP_MIN_STRIDE + cells;
+        }
+    }
+
+    res = fdt_property(fdt, "interrupt-map", full_irq_map, len);
+    if (res) return res;
+
+    res = fdt_property_values(gc, fdt, "interrupt-map-mask", 4,
+                              PCI_DEVFN(3, 0) << 8, 0, 0, 0x7);
+    if (res) return res;
+
+    return 0;
+}
+
+/* TODO Consider reusing make_vpci_node() */
+static int make_virtio_pci_node(libxl__gc *gc, void *fdt,
+                                libxl_virtio_pci_host *host,
+                                libxl_domain_config *d_config)
+{
+    int res;
+    const char *name = GCSPRINTF("pcie@%"PRIx64, host->ecam_base);
+
+    res = fdt_begin_node(fdt, name);
+    if (res) return res;
+
+    res = fdt_property_compat(gc, fdt, 1, "pci-host-ecam-generic");
+    if (res) return res;
+
+    res = fdt_property_string(fdt, "device_type", "pci");
+    if (res) return res;
+
+    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS,
+        GUEST_ROOT_SIZE_CELLS, 1, host->ecam_base, host->ecam_size);
+    if (res) return res;
+
+    res = fdt_property_values(gc, fdt, "bus-range", 2, 0, 1);
+    if (res) return res;
+
+    res = fdt_property_cell(fdt, "#address-cells", 3);
+    if (res) return res;
+
+    res = fdt_property_cell(fdt, "#size-cells", 2);
+    if (res) return res;
+
+    res = fdt_property_string(fdt, "status", "okay");
+    if (res) return res;
+
+    res = fdt_property_vpci_ranges(gc, fdt, GUEST_ROOT_ADDRESS_CELLS,
+        GUEST_ROOT_SIZE_CELLS, 2,
+        GUEST_VIRTIO_PCI_ADDR_TYPE_MEM, host->mem_base, host->mem_size,
+        GUEST_VIRTIO_PCI_ADDR_TYPE_PREFETCH_MEM, host->prefetch_mem_base,
+        host->prefetch_mem_size);
+    if (res) return res;
+
+    /* The same property as for virtio-mmio device */
+    res = fdt_property(fdt, "dma-coherent", NULL, 0);
+    if (res) return res;
+
+    /* Legacy PCI interrupts (#INTA - #INTD) */
+    res = create_virtio_pci_irq_map(gc, fdt, host);
+    if (res) return res;
+
+    res = fdt_end_node(fdt);
+    if (res) return res;
+
+    return 0;
+}
+
 static int make_xen_iommu_node(libxl__gc *gc, void *fdt)
 {
     int res;
@@ -1384,20 +1629,26 @@ next_resize:
         for (i = 0; i < d_config->num_virtios; i++) {
             libxl_device_virtio *virtio = &d_config->virtios[i];
 
-            if (virtio->transport != LIBXL_VIRTIO_TRANSPORT_MMIO)
-                continue;
-
             if (libxl_defbool_val(virtio->grant_usage))
                 iommu_needed = true;
 
-            FDT( make_virtio_mmio_node_device(gc, fdt, virtio->base,
-                                              virtio->irq, virtio->type,
+            if (virtio->transport != LIBXL_VIRTIO_TRANSPORT_MMIO)
+                continue;
+
+            FDT( make_virtio_mmio_node_device(gc, fdt, virtio->u.mmio.base,
+                                              virtio->u.mmio.irq, virtio->type,
                                               virtio->backend_domid,
                                               libxl_defbool_val(virtio->grant_usage)) );
         }
 
+        for (i = 0; i < d_config->b_info.num_virtio_pci_hosts; i++) {
+            libxl_virtio_pci_host *host = &d_config->b_info.virtio_pci_hosts[i];
+
+            FDT( make_virtio_pci_node(gc, fdt, host, d_config) );
+        }
+
         /*
-         * The iommu node should be created only once for all virtio-mmio
+         * The iommu node should be created only once for all virtio
          * devices.
          */
         if (iommu_needed)
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index ce1d431103..22b4fa40cc 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1273,8 +1273,9 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
     }
 
     for (i = 0; i < d_config->num_virtios; i++) {
-        ret = libxl__virtio_devtype.set_default(gc, domid,
-                                                &d_config->virtios[i], false);
+        libxl_device_virtio *virtio = &d_config->virtios[i];
+
+        ret = libxl__virtio_devtype.set_default(gc, domid, virtio, false);
         if (ret) {
             LOGD(ERROR, domid, "Unable to set virtio defaults for device %d", i);
             goto error_out;
@@ -1770,6 +1771,19 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
     for (i = 0; i < d_config->num_virtios; i++)
         libxl__device_add(gc, domid, &libxl__virtio_devtype,
                           &d_config->virtios[i]);
+    /*
+     * This should be done before spawning device model, but after
+     * the creation of "device-model" directory in Xenstore.
+     */
+    for (i = 0; i < d_config->b_info.num_virtio_pci_hosts; i++) {
+        libxl_virtio_pci_host *host = &d_config->b_info.virtio_pci_hosts[i];
+
+        ret = libxl__save_dm_virtio_pci_host(gc, domid, host);
+        if (ret) {
+            LOGD(ERROR, domid, "Unable to save virtio_pci_host for device model");
+            goto error_out;
+        }
+    }
 
     switch (d_config->c_info.type) {
     case LIBXL_DOMAIN_TYPE_HVM:
diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
index 0b2548d35b..4e9391fc08 100644
--- a/tools/libs/light/libxl_dm.c
+++ b/tools/libs/light/libxl_dm.c
@@ -3375,6 +3375,76 @@ static void device_model_postconfig_done(libxl__egc *egc,
     dmss->callback(egc, dmss, rc);
 }
 
+int libxl__save_dm_virtio_pci_host(libxl__gc *gc,
+                                   uint32_t domid,
+                                   libxl_virtio_pci_host *host)
+{
+    const char *dm_path;
+    char **dir;
+    xs_transaction_t t = XBT_NULL;
+    unsigned int n;
+    int rc;
+
+    dm_path = GCSPRINTF("/local/domain/%d/device-model", host->backend_domid);
+
+    dir = libxl__xs_directory(gc, XBT_NULL, dm_path, &n);
+    if (!dir)
+        return ERROR_INVAL;
+
+    dm_path = DEVICE_MODEL_XS_PATH(gc, host->backend_domid, domid, "/virtio_pci_host");
+
+    for (;;) {
+        rc = libxl__xs_transaction_start(gc, &t);
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/id", dm_path),
+                                     GCSPRINTF("%u", host->id));
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/ecam_base", dm_path),
+                                     GCSPRINTF("%#"PRIx64, host->ecam_base));
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/ecam_size", dm_path),
+                                     GCSPRINTF("%#"PRIx64, host->ecam_size));
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/mem_base", dm_path),
+                                     GCSPRINTF("%#"PRIx64, host->mem_base));
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/mem_size", dm_path),
+                                     GCSPRINTF("%#"PRIx64, host->mem_size));
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/prefetch_mem_base", dm_path),
+                                     GCSPRINTF("%#"PRIx64, host->prefetch_mem_base));
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/prefetch_mem_size", dm_path),
+                                     GCSPRINTF("%#"PRIx64, host->prefetch_mem_size));
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/irq_first", dm_path),
+                                     GCSPRINTF("%u", host->irq_first));
+        if (rc) goto out;
+
+        rc = libxl__xs_write_checked(gc, t, GCSPRINTF("%s/num_irqs", dm_path),
+                                     GCSPRINTF("%u", host->num_irqs));
+        if (rc) goto out;
+
+        rc = libxl__xs_transaction_commit(gc, &t);
+        if (!rc) break;
+        if (rc < 0) goto out;
+    }
+
+    return 0;
+
+out:
+    libxl__xs_transaction_abort(gc, &t);
+    return rc;
+}
+
 void libxl__spawn_qdisk_backend(libxl__egc *egc, libxl__dm_spawn_state *dmss)
 {
     STATE_AO_GC(dmss->spawn.ao);
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index d5732d1c37..75d370d739 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -4199,6 +4199,11 @@ _hidden void libxl__spawn_qdisk_backend(libxl__egc *egc,
                                         libxl__dm_spawn_state *dmss);
 _hidden int libxl__destroy_qdisk_backend(libxl__gc *gc, uint32_t domid);
 
+
+_hidden int libxl__save_dm_virtio_pci_host(libxl__gc *gc,
+                                           uint32_t domid,
+                                           libxl_virtio_pci_host *host);
+
 /*----- Domain creation -----*/
 
 
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 7d8bd5d216..a86c601994 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -281,6 +281,7 @@ libxl_vkb_backend = Enumeration("vkb_backend", [
 libxl_virtio_transport = Enumeration("virtio_transport", [
     (0, "UNKNOWN"),
     (1, "MMIO"),
+    (2, "PCI"),
     ])
 
 libxl_passthrough = Enumeration("passthrough", [
@@ -558,6 +559,19 @@ libxl_altp2m_mode = Enumeration("altp2m_mode", [
     (3, "limited"),
     ], init_val = "LIBXL_ALTP2M_MODE_DISABLED")
 
+libxl_virtio_pci_host = Struct("virtio_pci_host", [
+    ("backend_domid",     libxl_domid),
+    ("id",                uint32),
+    ("ecam_base",         uint64),
+    ("ecam_size",         uint64),
+    ("mem_base",          uint64),
+    ("mem_size",          uint64),
+    ("prefetch_mem_base", uint64),
+    ("prefetch_mem_size", uint64),
+    ("irq_first",         uint32),
+    ("num_irqs",          uint32),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus",       integer),
     ("avail_vcpus",     libxl_bitmap),
@@ -631,6 +645,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("apic",             libxl_defbool),
     ("dm_restrict",      libxl_defbool),
     ("tee",              libxl_tee_type),
+    ("virtio_pci_hosts", Array(libxl_virtio_pci_host, "num_virtio_pci_hosts")),
     ("u", KeyedUnion(None, libxl_domain_type, "type",
                 [("hvm", Struct(None, [("firmware",         string),
                                        ("bios",             libxl_bios_type),
@@ -764,13 +779,22 @@ libxl_device_virtio = Struct("device_virtio", [
     ("backend_domid", libxl_domid),
     ("backend_domname", string),
     ("type", string),
-    ("transport", libxl_virtio_transport),
+    ("u", KeyedUnion(None, libxl_virtio_transport, "transport",
+              [("unknown", None),
+               # Note that virtio-mmio parameters (irq and base) are for internal
+               # use by libxl and can't be modified.
+               ("mmio", Struct(None, [("irq", uint32),
+                                      ("base", uint64),
+                                     ])),
+               ("pci", Struct(None,  [("func", uint8),
+                                      ("dev", uint8),
+                                      ("bus", uint8),
+                                      ("domain", uint16),
+                                      ("host_id", uint32),
+                                     ])),
+              ])),
     ("grant_usage", libxl_defbool),
     ("devid", libxl_devid),
-    # Note that virtio-mmio parameters (irq and base) are for internal
-    # use by libxl and can't be modified.
-    ("irq", uint32),
-    ("base", uint64)
     ])
 
 libxl_device_disk = Struct("device_disk", [
diff --git a/tools/libs/light/libxl_virtio.c b/tools/libs/light/libxl_virtio.c
index e5e321adc5..8062423c75 100644
--- a/tools/libs/light/libxl_virtio.c
+++ b/tools/libs/light/libxl_virtio.c
@@ -57,8 +57,21 @@ static int libxl__set_xenstore_virtio(libxl__gc *gc, uint32_t domid,
 {
     const char *transport = libxl_virtio_transport_to_string(virtio->transport);
 
-    flexarray_append_pair(back, "irq", GCSPRINTF("%u", virtio->irq));
-    flexarray_append_pair(back, "base", GCSPRINTF("%#"PRIx64, virtio->base));
+    if (virtio->transport == LIBXL_VIRTIO_TRANSPORT_MMIO) {
+        flexarray_append_pair(back, "irq", GCSPRINTF("%u", virtio->u.mmio.irq));
+        flexarray_append_pair(back, "base", GCSPRINTF("%#"PRIx64, virtio->u.mmio.base));
+    } else {
+        /*
+         * TODO:
+         * Probably we will also need to store PCI Host bridge details (irq and
+         * mem ranges) this particular PCI device belongs to if emulator cannot
+         * or should not rely on what is described at include/public/arch-arm.h
+         */
+        flexarray_append_pair(back, "bdf", GCSPRINTF("%04x:%02x:%02x.%01x",
+                              virtio->u.pci.domain, virtio->u.pci.bus,
+                              virtio->u.pci.dev, virtio->u.pci.func));
+        flexarray_append_pair(back, "host_id", GCSPRINTF("%u", virtio->u.pci.host_id));
+    }
     flexarray_append_pair(back, "type", GCSPRINTF("%s", virtio->type));
     flexarray_append_pair(back, "transport", GCSPRINTF("%s", transport));
     flexarray_append_pair(back, "grant_usage",
@@ -84,33 +97,72 @@ static int libxl__virtio_from_xenstore(libxl__gc *gc, const char *libxl_path,
     rc = libxl__backendpath_parse_domid(gc, be_path, &virtio->backend_domid);
     if (rc) goto out;
 
-    rc = libxl__xs_read_checked(gc, XBT_NULL,
-				GCSPRINTF("%s/irq", be_path), &tmp);
-    if (rc) goto out;
-
-    if (tmp) {
-        virtio->irq = strtoul(tmp, NULL, 0);
+    tmp = libxl__xs_read(gc, XBT_NULL, GCSPRINTF("%s/transport", be_path));
+    if (!tmp) {
+        LOG(ERROR, "Missing xenstore node %s/transport", be_path);
+        rc = ERROR_INVAL;
+        goto out;
     }
 
-    tmp = NULL;
-    rc = libxl__xs_read_checked(gc, XBT_NULL,
-				GCSPRINTF("%s/base", be_path), &tmp);
-    if (rc) goto out;
+    rc = libxl_virtio_transport_from_string(tmp, &virtio->transport);
+    if (rc) {
+        LOG(ERROR, "Unable to parse xenstore node %s/transport", be_path);
+        goto out;
+    }
 
-    if (tmp) {
-        virtio->base = strtoul(tmp, NULL, 0);
+    if (virtio->transport != LIBXL_VIRTIO_TRANSPORT_MMIO &&
+        virtio->transport != LIBXL_VIRTIO_TRANSPORT_PCI) {
+        LOG(ERROR, "Unexpected transport for virtio");
+        rc = ERROR_INVAL;
+        goto out;
     }
 
-    tmp = NULL;
-    rc = libxl__xs_read_checked(gc, XBT_NULL,
-				GCSPRINTF("%s/transport", be_path), &tmp);
-    if (rc) goto out;
+    if (virtio->transport == LIBXL_VIRTIO_TRANSPORT_MMIO) {
+        tmp = NULL;
+        rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                    GCSPRINTF("%s/irq", be_path), &tmp);
+        if (rc) goto out;
 
-    if (tmp) {
-        if (!strcmp(tmp, "mmio")) {
-            virtio->transport = LIBXL_VIRTIO_TRANSPORT_MMIO;
-        } else {
-            return ERROR_INVAL;
+        if (tmp) {
+            virtio->u.mmio.irq = strtoul(tmp, NULL, 0);
+        }
+
+        tmp = NULL;
+        rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                    GCSPRINTF("%s/base", be_path), &tmp);
+        if (rc) goto out;
+
+        if (tmp) {
+            virtio->u.mmio.base = strtoul(tmp, NULL, 0);
+        }
+    } else {
+        unsigned int domain, bus, dev, func;
+
+        tmp = NULL;
+        rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                    GCSPRINTF("%s/bdf", be_path), &tmp);
+        if (rc) goto out;
+
+        if (tmp) {
+            if (sscanf(tmp, "%04x:%02x:%02x.%01x",
+                &domain, &bus, &dev, &func) != 4) {
+                rc = ERROR_INVAL;
+                goto out;
+            }
+
+            virtio->u.pci.domain = domain;
+            virtio->u.pci.bus = bus;
+            virtio->u.pci.dev = dev;
+            virtio->u.pci.func = func;
+        }
+
+        tmp = NULL;
+        rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                    GCSPRINTF("%s/host_id", be_path), &tmp);
+        if (rc) goto out;
+
+        if (tmp) {
+            virtio->u.pci.host_id = strtoul(tmp, NULL, 0);
         }
     }
 
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index ed983200c3..4544ce2307 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1217,6 +1217,24 @@ static int parse_virtio_config(libxl_device_virtio *virtio, char *token)
         if (rc) return rc;
     } else if (MATCH_OPTION("grant_usage", token, oparg)) {
         libxl_defbool_set(&virtio->grant_usage, strtoul(oparg, NULL, 0));
+    } else if (MATCH_OPTION("bdf", token, oparg)) {
+        /*
+         * TODO:
+         * We pretend that we are ordinary PCI device to reuse BDF parsing
+         * logic. This needs to be properly reused by adjusting parse_bdf().
+         */
+        libxl_device_pci pci;
+
+        rc = xlu_pci_parse_bdf(NULL, &pci, oparg);
+        if (rc) {
+            fprintf(stderr, "Unable to parse BDF `%s' for virtio-pci\n", oparg);
+            return -1;
+        }
+
+        virtio->u.pci.domain = pci.domain;
+        virtio->u.pci.bus = pci.bus;
+        virtio->u.pci.dev = pci.dev;
+        virtio->u.pci.func = pci.func;
     } else {
         fprintf(stderr, "Unknown string \"%s\" in virtio spec\n", token);
         return -1;
@@ -1238,6 +1256,7 @@ static void parse_virtio_list(const XLU_Config *config,
         while ((item = xlu_cfg_get_listitem(virtios, entry)) != NULL) {
             libxl_device_virtio *virtio;
             char *p;
+            bool bdf_present = false;
 
             virtio = ARRAY_EXTEND_INIT(d_config->virtios, d_config->num_virtios,
                                        libxl_device_virtio_init);
@@ -1260,6 +1279,8 @@ static void parse_virtio_list(const XLU_Config *config,
                         strcat(str, p2);
                         p = str;
                     }
+                } else if (MATCH_OPTION("bdf", p, oparg)) {
+                    bdf_present = true;
                 }
 
                 rc = parse_virtio_config(virtio, p);
@@ -1270,6 +1291,21 @@ static void parse_virtio_list(const XLU_Config *config,
                 p = strtok(NULL, ",");
             }
 
+            if (virtio->transport == LIBXL_VIRTIO_TRANSPORT_UNKNOWN) {
+                fprintf(stderr, "Unspecified transport for virtio\n");
+                rc = ERROR_FAIL; goto out;
+            }
+
+            if (virtio->transport == LIBXL_VIRTIO_TRANSPORT_PCI &&
+                !bdf_present) {
+                fprintf(stderr, "BDF must be specified for virtio-pci\n");
+                rc = ERROR_FAIL; goto out;
+            } else if (virtio->transport == LIBXL_VIRTIO_TRANSPORT_MMIO &&
+                bdf_present) {
+                fprintf(stderr, "BDF must not be specified for virtio-mmio\n");
+                rc = ERROR_FAIL; goto out;
+            }
+
             entry++;
             free(buf);
         }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 4/6] libxl/arm: Reuse generic PCI-IOMMU bindings for virtio-pci devices
  2023-11-15 11:26 [RFC PATCH 0/6] ARM virtio-pci initial support Sergiy Kibrik
                   ` (2 preceding siblings ...)
  2023-11-15 11:26 ` [RFC PATCH 3/6] libxl/arm: Add basic virtio-pci support Sergiy Kibrik
@ 2023-11-15 11:26 ` Sergiy Kibrik
  2023-11-15 11:26 ` [RFC PATCH 5/6] xen/arm: Intercept vPCI config accesses and forward them to emulator Sergiy Kibrik
  4 siblings, 0 replies; 30+ messages in thread
From: Sergiy Kibrik @ 2023-11-15 11:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Wei Liu, Anthony PERARD, Juergen Gross,
	Sergiy Kibrik

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Use the same "xen-grant-dma" device concept for the PCI devices
behind device-tree based PCI Host controller, but with one modification.
Unlike for platform devices, we cannot use generic IOMMU bindings
(iommus property), as we need to support more flexible configuration.
The problem is that PCI devices under the single PCI Host controller
may have the backends running in different Xen domains and thus have
different endpoints ID (backend domains ID).

Reuse generic PCI-IOMMU bindings (iommu-map/iommu-map-mask properties)
which allows us to describe relationship between PCI devices and
backend domains ID properly. Linux guest is already able to deal
with generic PCI-IOMMU bindings (see Linux drivers/xen/grant-dma-ops.c
for details).

According to Linux:
   - Documentation/devicetree/bindings/pci/pci-iommu.txt
   - Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
---
 tools/libs/light/libxl_arm.c | 64 ++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index df6cbbe756..03cf3e424b 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -1030,6 +1030,7 @@ static int make_vpci_node(libxl__gc *gc, void *fdt,
 }
 
 #define PCI_IRQ_MAP_MIN_STRIDE   8
+#define PCI_IOMMU_MAP_STRIDE 4
 
 static int create_virtio_pci_irq_map(libxl__gc *gc, void *fdt,
                                      libxl_virtio_pci_host *host)
@@ -1099,6 +1100,65 @@ static int create_virtio_pci_irq_map(libxl__gc *gc, void *fdt,
     return 0;
 }
 
+/* XXX Consider reusing libxl__realloc() to avoid an extra loop */
+static int create_virtio_pci_iommu_map(libxl__gc *gc, void *fdt,
+                                       libxl_virtio_pci_host *host,
+                                       libxl_domain_config *d_config)
+{
+    uint32_t *full_iommu_map, *iommu_map;
+    unsigned int i, len, ntranslated = 0;
+    int res;
+
+    for (i = 0; i < d_config->num_virtios; i++) {
+        libxl_device_virtio *virtio = &d_config->virtios[i];
+
+        if (libxl_defbool_val(virtio->grant_usage) &&
+            virtio->transport == LIBXL_VIRTIO_TRANSPORT_PCI &&
+            virtio->u.pci.host_id == host->id) {
+            ntranslated++;
+        }
+    }
+
+    if (!ntranslated)
+        return 0;
+
+    len = ntranslated * sizeof(uint32_t) * PCI_IOMMU_MAP_STRIDE;
+    full_iommu_map = libxl__malloc(gc, len);
+    iommu_map = full_iommu_map;
+
+    /* See Linux Documentation/devicetree/bindings/pci/pci-iommu.txt */
+    for (i = 0; i < d_config->num_virtios; i++) {
+        libxl_device_virtio *virtio = &d_config->virtios[i];
+
+        if (libxl_defbool_val(virtio->grant_usage) &&
+            virtio->transport == LIBXL_VIRTIO_TRANSPORT_PCI &&
+            virtio->u.pci.host_id == host->id) {
+            uint16_t bdf = (virtio->u.pci.bus << 8) |
+                (virtio->u.pci.dev << 3) | virtio->u.pci.func;
+            unsigned int j = 0;
+
+            /* rid_base (1 cell) */
+            iommu_map[j++] = cpu_to_fdt32(bdf);
+
+            /* iommu_phandle (1 cell) */
+            iommu_map[j++] = cpu_to_fdt32(GUEST_PHANDLE_IOMMU);
+
+            /* iommu_base (1 cell) */
+            iommu_map[j++] = cpu_to_fdt32(virtio->backend_domid);
+
+            /* length (1 cell) */
+            iommu_map[j++] = cpu_to_fdt32(1 << 3);
+
+            iommu_map += PCI_IOMMU_MAP_STRIDE;
+        }
+    }
+
+    res = fdt_property(fdt, "iommu-map", full_iommu_map, len);
+    if (res) return res;
+
+    return 0;
+}
+
 /* TODO Consider reusing make_vpci_node() */
 static int make_virtio_pci_node(libxl__gc *gc, void *fdt,
                                 libxl_virtio_pci_host *host,
@@ -1147,6 +1207,10 @@ static int make_virtio_pci_node(libxl__gc *gc, void *fdt,
     res = create_virtio_pci_irq_map(gc, fdt, host);
     if (res) return res;
 
+    /* xen,grant-dma bindings */
+    res = create_virtio_pci_iommu_map(gc, fdt, host, d_config);
+    if (res) return res;
+
     res = fdt_end_node(fdt);
     if (res) return res;
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 5/6] xen/arm: Intercept vPCI config accesses and forward them to emulator
  2023-11-15 11:26 [RFC PATCH 0/6] ARM virtio-pci initial support Sergiy Kibrik
                   ` (3 preceding siblings ...)
  2023-11-15 11:26 ` [RFC PATCH 4/6] libxl/arm: Reuse generic PCI-IOMMU bindings for virtio-pci devices Sergiy Kibrik
@ 2023-11-15 11:26 ` Sergiy Kibrik
  2023-11-15 12:45   ` Julien Grall
  4 siblings, 1 reply; 30+ messages in thread
From: Sergiy Kibrik @ 2023-11-15 11:26 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Sergiy Kibrik

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This is needed for supporting virtio-pci.

In case when the PCI Host bridge is emulated outside of Xen
(IOREQ server), we need some mechanism to intercept config space
accesses on Xen on Arm, and forward them to the emulator
(for example, virtio backend) via IOREQ request.

Unlike x86, on Arm these accesses are MMIO, there is no CFC/CF8 method
to detect which PCI device is targeted.

In order to not mix PCI passthrough with virtio-pci features we add
one more region to cover the total configuration space for all possible
host bridges which can serve virtio-pci devices for that guest.
We expose one PCI host bridge per virtio backend domain.

To distinguish between virtio-pci devices belonging to PCI host
bridges emulated by device-models running in different backend domains
we also need to calculate a segment in virtio_pci_ioreq_server_get_addr().
For this to work the toolstack is responsible to allocate and assign unique
configuration space range and segment (host_id) within total reserved range
for these device-models. The device-models are responsible for applying
a segment when forming DM op for registering PCI devices with IOREQ Server.

Introduce new CONFIG_VIRTIO_PCI to guard the whole handling.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
---
 xen/arch/arm/Kconfig                  |  10 +++
 xen/arch/arm/domain.c                 |   2 +-
 xen/arch/arm/{ => include/asm}/vpci.h |  11 +++
 xen/arch/arm/io.c                     |   8 +-
 xen/arch/arm/ioreq.c                  |  19 ++++-
 xen/arch/arm/vpci.c                   | 106 +++++++++++++++++++++++++-
 6 files changed, 146 insertions(+), 10 deletions(-)
 rename xen/arch/arm/{ => include/asm}/vpci.h (75%)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2939db429b..9e8d6c4ce2 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -190,6 +190,16 @@ config STATIC_SHM
 	help
 	  This option enables statically shared memory on a dom0less system.
 
+config VIRTIO_PCI
+	bool "Support of virtio-pci transport" if EXPERT
+	depends on ARM_64
+	select HAS_PCI
+	select HAS_VPCI
+	select IOREQ_SERVER
+	default n
+	help
+	  This option enables support of virtio-pci transport
+
 endmenu
 
 menu "ARM errata workaround via the alternative framework"
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 28e3aaa5e4..140f9bbd58 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -28,9 +28,9 @@
 #include <asm/tee/tee.h>
 #include <asm/vfp.h>
 #include <asm/vgic.h>
+#include <asm/vpci.h>
 #include <asm/vtimer.h>
 
-#include "vpci.h"
 #include "vuart.h"
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
diff --git a/xen/arch/arm/vpci.h b/xen/arch/arm/include/asm/vpci.h
similarity index 75%
rename from xen/arch/arm/vpci.h
rename to xen/arch/arm/include/asm/vpci.h
index 3c713f3fcd..54d083c67b 100644
--- a/xen/arch/arm/vpci.h
+++ b/xen/arch/arm/include/asm/vpci.h
@@ -30,6 +30,17 @@ static inline unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 }
 #endif
 
+#ifdef CONFIG_VIRTIO_PCI
+bool virtio_pci_ioreq_server_get_addr(const struct domain *d,
+                                      paddr_t gpa, uint64_t *addr);
+#else
+static inline bool virtio_pci_ioreq_server_get_addr(const struct domain *d,
+                                                    paddr_t gpa, uint64_t *addr)
+{
+    return false;
+}
+#endif
+
 #endif /* __ARCH_ARM_VPCI_H__ */
 
 /*
diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 96c740d563..5c3e03e30d 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -26,6 +26,7 @@ static enum io_state handle_read(const struct mmio_handler *handler,
 {
     const struct hsr_dabt dabt = info->dabt;
     struct cpu_user_regs *regs = guest_cpu_user_regs();
+    int ret;
     /*
      * Initialize to zero to avoid leaking data if there is an
      * implementation error in the emulation (such as not correctly
@@ -33,8 +34,9 @@ static enum io_state handle_read(const struct mmio_handler *handler,
      */
     register_t r = 0;
 
-    if ( !handler->ops->read(v, info, &r, handler->priv) )
-        return IO_ABORT;
+    ret = handler->ops->read(v, info, &r, handler->priv);
+    if ( ret != IO_HANDLED )
+        return ret != IO_RETRY ? IO_ABORT : ret;
 
     r = sign_extend(dabt, r);
 
@@ -53,7 +55,7 @@ static enum io_state handle_write(const struct mmio_handler *handler,
 
     ret = handler->ops->write(v, info, get_user_reg(regs, dabt.reg),
                               handler->priv);
-    return ret ? IO_HANDLED : IO_ABORT;
+    return ret != IO_HANDLED && ret != IO_RETRY ? IO_ABORT : ret;
 }
 
 /* This function assumes that mmio regions are not overlapped */
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 5df755b48b..fd4cc755b6 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -10,6 +10,7 @@
 
 #include <asm/traps.h>
 #include <asm/ioreq.h>
+#include <asm/vpci.h>
 
 #include <public/hvm/ioreq.h>
 
@@ -193,12 +194,24 @@ bool arch_ioreq_server_get_type_addr(const struct domain *d,
                                      uint8_t *type,
                                      uint64_t *addr)
 {
+    uint64_t pci_addr;
+
     if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
         return false;
 
-    *type = (p->type == IOREQ_TYPE_PIO) ?
-             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
-    *addr = p->addr;
+    if ( p->type == IOREQ_TYPE_COPY &&
+         virtio_pci_ioreq_server_get_addr(d, p->addr, &pci_addr) )
+    {
+        /* PCI config data cycle */
+        *type = XEN_DMOP_IO_RANGE_PCI;
+        *addr = pci_addr;
+    }
+    else
+    {
+        *type = (p->type == IOREQ_TYPE_PIO) ?
+                 XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+        *addr = p->addr;
+    }
 
     return true;
 }
diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3bc4bb5508..1de4c3e71b 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -2,9 +2,12 @@
 /*
  * xen/arch/arm/vpci.c
  */
+#include <xen/ioreq.h>
 #include <xen/sched.h>
+#include <xen/sizes.h>
 #include <xen/vpci.h>
 
+#include <asm/ioreq.h>
 #include <asm/mmio.h>
 
 static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
@@ -24,6 +27,27 @@ static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
     return sbdf;
 }
 
+bool virtio_pci_ioreq_server_get_addr(const struct domain *d,
+                                      paddr_t gpa, uint64_t *addr)
+{
+    pci_sbdf_t sbdf;
+
+    if ( !has_vpci(d) )
+        return false;
+
+    if ( gpa < GUEST_VIRTIO_PCI_ECAM_BASE ||
+         gpa >= GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE )
+        return false;
+
+    sbdf.sbdf = VPCI_ECAM_BDF((gpa - GUEST_VIRTIO_PCI_ECAM_BASE) %
+        GUEST_VIRTIO_PCI_HOST_ECAM_SIZE);
+    sbdf.seg = (gpa - GUEST_VIRTIO_PCI_ECAM_BASE) /
+        GUEST_VIRTIO_PCI_HOST_ECAM_SIZE;
+    *addr = ((uint64_t)sbdf.sbdf << 32) | ECAM_REG_OFFSET(gpa);
+
+    return true;
+}
+
 static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                           register_t *r, void *p)
 {
@@ -36,12 +60,12 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
                         1U << info->dabt.size, &data) )
     {
         *r = data;
-        return 1;
+        return IO_HANDLED;
     }
 
     *r = ~0ul;
 
-    return 0;
+    return IO_ABORT;
 }
 
 static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
@@ -59,6 +83,61 @@ static const struct mmio_handler_ops vpci_mmio_handler = {
     .write = vpci_mmio_write,
 };
 
+#ifdef CONFIG_VIRTIO_PCI
+static int virtio_pci_mmio_read(struct vcpu *v, mmio_info_t *info,
+                                register_t *r, void *p)
+{
+    const uint8_t access_size = (1 << info->dabt.size) * 8;
+    const uint64_t access_mask = GENMASK_ULL(access_size - 1, 0);
+    int rc = IO_HANDLED;
+
+    ASSERT(!is_hardware_domain(v->domain));
+
+    if ( domain_has_ioreq_server(v->domain) )
+    {
+        rc = try_fwd_ioserv(guest_cpu_user_regs(), v, info);
+        if ( rc == IO_HANDLED )
+        {
+            *r = v->io.req.data;
+            v->io.req.state = STATE_IOREQ_NONE;
+            return IO_HANDLED;
+        }
+        else if ( rc == IO_UNHANDLED )
+            rc = IO_HANDLED;
+    }
+
+    *r = access_mask;
+    return rc;
+}
+
+static int virtio_pci_mmio_write(struct vcpu *v, mmio_info_t *info,
+                                 register_t r, void *p)
+{
+    int rc = IO_HANDLED;
+
+    ASSERT(!is_hardware_domain(v->domain));
+
+    if ( domain_has_ioreq_server(v->domain) )
+    {
+        rc = try_fwd_ioserv(guest_cpu_user_regs(), v, info);
+        if ( rc == IO_HANDLED )
+        {
+            v->io.req.state = STATE_IOREQ_NONE;
+            return IO_HANDLED;
+        }
+        else if ( rc == IO_UNHANDLED )
+            rc = IO_HANDLED;
+    }
+
+    return rc;
+}
+
+static const struct mmio_handler_ops virtio_pci_mmio_handler = {
+    .read  = virtio_pci_mmio_read,
+    .write = virtio_pci_mmio_write,
+};
+#endif
+
 static int vpci_setup_mmio_handler_cb(struct domain *d,
                                       struct pci_host_bridge *bridge)
 {
@@ -90,9 +169,17 @@ int domain_vpci_init(struct domain *d)
             return ret;
     }
     else
+    {
         register_mmio_handler(d, &vpci_mmio_handler,
                               GUEST_VPCI_ECAM_BASE, GUEST_VPCI_ECAM_SIZE, NULL);
 
+#ifdef CONFIG_VIRTIO_PCI
+        register_mmio_handler(d, &virtio_pci_mmio_handler,
+                              GUEST_VIRTIO_PCI_ECAM_BASE,
+                              GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE, NULL);
+#endif
+    }
+
     return 0;
 }
 
@@ -105,6 +192,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
 
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
+    unsigned int count;
+
     if ( !has_vpci(d) )
         return 0;
 
@@ -125,7 +214,18 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
      * For guests each host bridge requires one region to cover the
      * configuration space. At the moment, we only expose a single host bridge.
      */
-    return 1;
+    count = 1;
+
+    /*
+     * In order to not mix PCI passthrough with virtio-pci features we add
+     * one more region to cover the total configuration space for all possible
+     * host bridges which can serve virtio devices for that guest.
+     * We expose one host bridge per virtio backend domain.
+     */
+    if ( IS_ENABLED(CONFIG_VIRTIO_PCI) )
+        count++;
+
+    return count;
 }
 
 /*
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 11:26 ` [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci Sergiy Kibrik
@ 2023-11-15 12:33   ` Julien Grall
  2023-11-15 16:51     ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 30+ messages in thread
From: Julien Grall @ 2023-11-15 12:33 UTC (permalink / raw)
  To: Sergiy Kibrik, xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk

Hi,

Thanks for adding support for virtio-pci in Xen. I have some questions.

On 15/11/2023 11:26, Sergiy Kibrik wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> In order to enable more use-cases such as having multiple
> device-models (Qemu) running in different backend domains which provide
> virtio-pci devices for the same guest, we allocate and expose one
> PCI host bridge for every virtio backend domain for that guest.

OOI, why do you need to expose one PCI host bridge for every stubdomain?

In fact looking at the next patch, it seems you are handling some of the 
hostbridge request in Xen. This is adds a bit more confusion.

I was expecting the virtual PCI device would be in the vPCI and each 
Device emulator would advertise which BDF they are covering.

> 
> For that purpose, reserve separate virtio-pci resources (memory and SPI range
> for Legacy PCI interrupts) up to 8 possible PCI hosts (to be aligned with

Do you mean host bridge rather than host?

> MAX_NR_IOREQ_SERVERS) and allocate a host per backend domain. The PCI host
> details including its host_id to be written to dedicated Xenstore node for
> the device-model to retrieve.

So which with approach, who is decide which BDF will be used for a given 
virtio PCI device?

> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
> ---
>   xen/include/public/arch-arm.h | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
> 
> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
> index a25e87dbda..e6c9cd5335 100644
> --- a/xen/include/public/arch-arm.h
> +++ b/xen/include/public/arch-arm.h
> @@ -466,6 +466,19 @@ typedef uint64_t xen_callback_t;
>   #define GUEST_VPCI_MEM_ADDR                 xen_mk_ullong(0x23000000)
>   #define GUEST_VPCI_MEM_SIZE                 xen_mk_ullong(0x10000000)
>   
> +/*
> + * 16 MB is reserved for virtio-pci configuration space based on calculation
> + * 8 bridges * 2 buses x 32 devices x 8 functions x 4 KB = 16 MB

Can you explain how youd ecided the "2"?

> + */
> +#define GUEST_VIRTIO_PCI_ECAM_BASE          xen_mk_ullong(0x33000000)
> +#define GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE    xen_mk_ullong(0x01000000)
> +#define GUEST_VIRTIO_PCI_HOST_ECAM_SIZE     xen_mk_ullong(0x00200000)
> +
> +/* 64 MB is reserved for virtio-pci memory */
> +#define GUEST_VIRTIO_PCI_ADDR_TYPE_MEM    xen_mk_ullong(0x02000000)
> +#define GUEST_VIRTIO_PCI_MEM_ADDR         xen_mk_ullong(0x34000000)
> +#define GUEST_VIRTIO_PCI_MEM_SIZE         xen_mk_ullong(0x04000000)
> +
>   /*
>    * 16MB == 4096 pages reserved for guest to use as a region to map its
>    * grant table in.
> @@ -476,6 +489,11 @@ typedef uint64_t xen_callback_t;
>   #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
>   #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
>   
> +/* 64 MB is reserved for virtio-pci Prefetch memory */

This doesn't seem a lot depending on your use case. Can you details how 
you can up with "64 MB"?

> +#define GUEST_VIRTIO_PCI_ADDR_TYPE_PREFETCH_MEM    xen_mk_ullong(0x42000000)
> +#define GUEST_VIRTIO_PCI_PREFETCH_MEM_ADDR         xen_mk_ullong(0x3a000000)
> +#define GUEST_VIRTIO_PCI_PREFETCH_MEM_SIZE         xen_mk_ullong(0x04000000)
> +
>   #define GUEST_RAM_BANKS   2
>   
>   /*
> @@ -515,6 +533,9 @@ typedef uint64_t xen_callback_t;
>   #define GUEST_VIRTIO_MMIO_SPI_FIRST   33
>   #define GUEST_VIRTIO_MMIO_SPI_LAST    43
>   
> +#define GUEST_VIRTIO_PCI_SPI_FIRST   44
> +#define GUEST_VIRTIO_PCI_SPI_LAST    76

Just to confirm this is 4 interrupts per PCI host bridge? If so, can 
this be clarified in a comment?

Also, above you said that the host ID will be written to Xenstore. But 
who will decide which SPI belongs to which host bridge?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 5/6] xen/arm: Intercept vPCI config accesses and forward them to emulator
  2023-11-15 11:26 ` [RFC PATCH 5/6] xen/arm: Intercept vPCI config accesses and forward them to emulator Sergiy Kibrik
@ 2023-11-15 12:45   ` Julien Grall
  2023-11-15 23:30     ` Stefano Stabellini
  0 siblings, 1 reply; 30+ messages in thread
From: Julien Grall @ 2023-11-15 12:45 UTC (permalink / raw)
  To: Sergiy Kibrik, xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Bertrand Marquis,
	Michal Orzel, Volodymyr Babchuk

Hi,

On 15/11/2023 11:26, Sergiy Kibrik wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This is needed for supporting virtio-pci.
> 
> In case when the PCI Host bridge is emulated outside of Xen
> (IOREQ server), we need some mechanism to intercept config space
> accesses on Xen on Arm, and forward them to the emulator
> (for example, virtio backend) via IOREQ request.
> 
> Unlike x86, on Arm these accesses are MMIO, there is no CFC/CF8 method
> to detect which PCI device is targeted.
> 
> In order to not mix PCI passthrough with virtio-pci features we add
> one more region to cover the total configuration space for all possible
> host bridges which can serve virtio-pci devices for that guest.
> We expose one PCI host bridge per virtio backend domain.
I am a little confused. If you expose one PCI host bridge per virtio 
backend, then why can't the backend simply register the MMIO region and 
do the translation itself when it receives the read/write?

To me, it only makes sense for Xen to emulate the host bridge access if 
you plan to have one host bridge shared between multiple IOREQ domains 
or mix with PCI pasthrough.

 From my perspective, I don't expect we would have that many virtio PCI 
devices. So imposing a host bridge per device emulator will mean extra 
resource in the guest as well (they need to keep track of all the 
hostbridge).

So in the longer run, I think we want to allow mixing PCI passthrough 
and virtio-PCI (or really any emulated PCI because nothing here is 
virtio specific).

For now, your approach would be OK to enable virtio PCI on Xen. But I 
don't think there are any changes necessary in Xen other than reserving 
some MMIO regions/IRQ.

Cheers,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 12:33   ` Julien Grall
@ 2023-11-15 16:51     ` Oleksandr Tyshchenko
  2023-11-15 17:31       ` Julien Grall
  2023-11-15 23:28       ` Stefano Stabellini
  0 siblings, 2 replies; 30+ messages in thread
From: Oleksandr Tyshchenko @ 2023-11-15 16:51 UTC (permalink / raw)
  To: Julien Grall, Sergiy Kibrik, xen-devel@lists.xenproject.org
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk



On 15.11.23 14:33, Julien Grall wrote:
> Hi,


Hello Julien

Let me please try to explain some bits.


> 
> Thanks for adding support for virtio-pci in Xen. I have some questions.
> 
> On 15/11/2023 11:26, Sergiy Kibrik wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> In order to enable more use-cases such as having multiple
>> device-models (Qemu) running in different backend domains which provide
>> virtio-pci devices for the same guest, we allocate and expose one
>> PCI host bridge for every virtio backend domain for that guest.
> 
> OOI, why do you need to expose one PCI host bridge for every stubdomain?
> 
> In fact looking at the next patch, it seems you are handling some of the 
> hostbridge request in Xen. This is adds a bit more confusion.
> 
> I was expecting the virtual PCI device would be in the vPCI and each 
> Device emulator would advertise which BDF they are covering.


This patch series only covers use-cases where the device emulator 
handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind 
it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI 
pass-through resources, handling, accounting, nothing. From the 
hypervisor we only need a help to intercept the config space accesses 
happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ... 
GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and 
forward them to the linked device emulator (if any), that's all.

It is not possible (with current series) to run device emulators what
emulate only separate PCI (virtio-pci) devices. For it to be possible, I 
think, much more changes are required than current patch series does. 
There at least should be special PCI Host bridge emulation in Xen (or 
reuse vPCI) for the integration. Also Xen should be in charge of forming 
resulting PCI interrupt based on each PCI device level signaling (if we 
use legacy interrupts), some kind of x86's XEN_DMOP_set_pci_intx_level, 
etc. Please note, I am not saying this is not possible in general, 
likely it is possible, but initial patch series doesn't cover these 
use-cases)

We expose one PCI host bridge per virtio backend domain. This is a 
separate PCI host bridge to combine all virtio-pci devices running in 
the same backend domain (in the same device emulator currently).
The examples:
- if only one domain runs Qemu which servers virtio-blk, virtio-net, 
virtio-console devices for DomU - only single PCI Host bridge will be 
exposed for DomU
- if we add another domain to run Qemu to serve additionally virtio-gpu, 
virtio-input and virtio-snd for the *same* DomU - we expose second PCI 
Host bridge for DomU

I am afraid, we cannot end up exposing only single PCI Host bridge with 
current model (if we use device emulators running in different domains 
that handles the *entire* PCI Host bridges), this won't work.

Please note, I might miss some bits since this enabling work.


> 
>>
>> For that purpose, reserve separate virtio-pci resources (memory and 
>> SPI range
>> for Legacy PCI interrupts) up to 8 possible PCI hosts (to be aligned with
> 
> Do you mean host bridge rather than host?

yes


> 
>> MAX_NR_IOREQ_SERVERS) and allocate a host per backend domain. The PCI 
>> host
>> details including its host_id to be written to dedicated Xenstore node 
>> for
>> the device-model to retrieve.
> 
> So which with approach, who is decide which BDF will be used for a given 
> virtio PCI device?


toolstack (via configuration file)

> 
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
>> ---
>>   xen/include/public/arch-arm.h | 21 +++++++++++++++++++++
>>   1 file changed, 21 insertions(+)
>>
>> diff --git a/xen/include/public/arch-arm.h 
>> b/xen/include/public/arch-arm.h
>> index a25e87dbda..e6c9cd5335 100644
>> --- a/xen/include/public/arch-arm.h
>> +++ b/xen/include/public/arch-arm.h
>> @@ -466,6 +466,19 @@ typedef uint64_t xen_callback_t;
>>   #define GUEST_VPCI_MEM_ADDR                 xen_mk_ullong(0x23000000)
>>   #define GUEST_VPCI_MEM_SIZE                 xen_mk_ullong(0x10000000)
>> +/*
>> + * 16 MB is reserved for virtio-pci configuration space based on 
>> calculation
>> + * 8 bridges * 2 buses x 32 devices x 8 functions x 4 KB = 16 MB
> 
> Can you explain how youd ecided the "2"?

good question, we have a limited free space available in memory layout 
(we had difficulties to find a suitable holes) also we don't expect a 
lot of virtio-pci devices, so "256" used vPCI would be too much. It was 
decided to reduce significantly, but select maximum to fit into free 
space, with having "2" buses we still fit into the chosen holes.


> 
>> + */
>> +#define GUEST_VIRTIO_PCI_ECAM_BASE          xen_mk_ullong(0x33000000)
>> +#define GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE    xen_mk_ullong(0x01000000)
>> +#define GUEST_VIRTIO_PCI_HOST_ECAM_SIZE     xen_mk_ullong(0x00200000)
>> +
>> +/* 64 MB is reserved for virtio-pci memory */
>> +#define GUEST_VIRTIO_PCI_ADDR_TYPE_MEM    xen_mk_ullong(0x02000000)
>> +#define GUEST_VIRTIO_PCI_MEM_ADDR         xen_mk_ullong(0x34000000)
>> +#define GUEST_VIRTIO_PCI_MEM_SIZE         xen_mk_ullong(0x04000000)
>> +
>>   /*
>>    * 16MB == 4096 pages reserved for guest to use as a region to map its
>>    * grant table in.
>> @@ -476,6 +489,11 @@ typedef uint64_t xen_callback_t;
>>   #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
>>   #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
>> +/* 64 MB is reserved for virtio-pci Prefetch memory */
> 
> This doesn't seem a lot depending on your use case. Can you details how 
> you can up with "64 MB"?

the same calculation as it was done configuration space. It was observed 
that only 16K is used per virtio-pci device (maybe it can be bigger for 
usual PCI device, I don't know). Please look at the example of DomU log 
below (to strings that contain "*BAR 4: assigned*"):


> root@h3ulcb:~# dmesg | grep pci
> [    0.444163] pci-host-generic 33000000.pcie: host bridge /pcie@33000000 ranges:
> [    0.444257] pci-host-generic 33000000.pcie:      MEM 0x0034000000..0x00347fffff -> 0x0034000000
> [    0.444304] pci-host-generic 33000000.pcie:      MEM 0x003a000000..0x003a7fffff -> 0x003a000000
> [    0.444396] pci-host-generic 33000000.pcie: ECAM at [mem 0x33000000-0x331fffff] for [bus 00-01]
> [    0.444595] pci-host-generic 33000000.pcie: PCI host bridge to bus 0000:00
> [    0.444635] pci_bus 0000:00: root bus resource [bus 00-01]
> [    0.444662] pci_bus 0000:00: root bus resource [mem 0x34000000-0x347fffff]
> [    0.444692] pci_bus 0000:00: root bus resource [mem 0x3a000000-0x3a7fffff pref]
> [    0.445193] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
> [    0.449493] pci 0000:00:08.0: [1af4:1042] type 00 class 0x010000                                                                                                                                        
> [    0.451760] pci 0000:00:08.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]                                                                                                                          
> [    0.455985] pci 0000:00:08.0: BAR 4: assigned [mem 0x3a000000-0x3a003fff 64bit pref]   
> [    0.456678] pci-host-generic 33200000.pcie: host bridge /pcie@33200000 ranges:
> [    0.456748] pci-host-generic 33200000.pcie:      MEM 0x0034800000..0x0034ffffff -> 0x0034800000
> [    0.456793] pci-host-generic 33200000.pcie:      MEM 0x003a800000..0x003affffff -> 0x003a800000
> [    0.456879] pci-host-generic 33200000.pcie: ECAM at [mem 0x33200000-0x333fffff] for [bus 00-01]
> [    0.457038] pci-host-generic 33200000.pcie: PCI host bridge to bus 0001:00
> [    0.457079] pci_bus 0001:00: root bus resource [bus 00-01]
> [    0.457106] pci_bus 0001:00: root bus resource [mem 0x34800000-0x34ffffff]
> [    0.457136] pci_bus 0001:00: root bus resource [mem 0x3a800000-0x3affffff pref]
> [    0.457808] pci 0001:00:00.0: [1b36:0008] type 00 class 0x060000
> [    0.461401] pci 0001:00:01.0: [1af4:1041] type 00 class 0x020000
> [    0.463537] pci 0001:00:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> [    0.468135] pci 0001:00:02.0: [1af4:1042] type 00 class 0x010000
> [    0.470185] pci 0001:00:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> [    0.474575] pci 0001:00:03.0: [1af4:1050] type 00 class 0x038000
> [    0.476534] pci 0001:00:03.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> [    0.480942] pci 0001:00:04.0: [1af4:1052] type 00 class 0x090000
> [    0.483096] pci 0001:00:04.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> [    0.487631] pci 0001:00:06.0: [1af4:1053] type 00 class 0x078000
> [    0.489693] pci 0001:00:06.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
> [    0.493669] pci 0001:00:01.0: BAR 4: assigned [mem 0x3a800000-0x3a803fff 64bit pref]                                                                                                                    
> [    0.495840] pci 0001:00:02.0: BAR 4: assigned [mem 0x3a804000-0x3a807fff 64bit pref]                                                                                                                    
> [    0.496656] pci 0001:00:03.0: BAR 4: assigned [mem 0x3a808000-0x3a80bfff 64bit pref]                                                                                                                    
> [    0.497671] pci 0001:00:04.0: BAR 4: assigned [mem 0x3a80c000-0x3a80ffff 64bit pref]                                                                                                                    
> [    0.498320] pci 0001:00:06.0: BAR 4: assigned [mem 0x3a810000-0x3a813fff 64bit pref]      
> [    0.500732] virtio-pci 0000:00:08.0: enabling device (0000 -> 0002)
> [    0.509728] virtio-pci 0001:00:01.0: enabling device (0000 -> 0002)
> [    0.529757] virtio-pci 0001:00:02.0: enabling device (0000 -> 0002)
> [    0.550208] virtio-pci 0001:00:03.0: enabling device (0000 -> 0002)
> [    0.571012] virtio-pci 0001:00:04.0: enabling device (0000 -> 0002)
> [    0.591042] virtio-pci 0001:00:06.0: enabling device (0000 -> 0002)



> 
>> +#define GUEST_VIRTIO_PCI_ADDR_TYPE_PREFETCH_MEM    
>> xen_mk_ullong(0x42000000)
>> +#define GUEST_VIRTIO_PCI_PREFETCH_MEM_ADDR         
>> xen_mk_ullong(0x3a000000)
>> +#define GUEST_VIRTIO_PCI_PREFETCH_MEM_SIZE         
>> xen_mk_ullong(0x04000000)
>> +
>>   #define GUEST_RAM_BANKS   2
>>   /*
>> @@ -515,6 +533,9 @@ typedef uint64_t xen_callback_t;
>>   #define GUEST_VIRTIO_MMIO_SPI_FIRST   33
>>   #define GUEST_VIRTIO_MMIO_SPI_LAST    43
>> +#define GUEST_VIRTIO_PCI_SPI_FIRST   44
>> +#define GUEST_VIRTIO_PCI_SPI_LAST    76
> 
> Just to confirm this is 4 interrupts per PCI host bridge? If so, can 
> this be clarified in a comment?

yes (INTA - INTD)


> 
> Also, above you said that the host ID will be written to Xenstore. But 
> who will decide which SPI belongs to which host bridge?


toolstack, it in charge of PCI host bridge resources allocation (as we 
need to both properly create PCI Host bridge device tree node and inform 
device emulator about PCI host bridge details). Please take a look at:

https://patchwork.kernel.org/project/xen-devel/patch/20231115112611.3865905-4-Sergiy_Kibrik@epam.com/

> 
> Cheers,
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 16:51     ` Oleksandr Tyshchenko
@ 2023-11-15 17:31       ` Julien Grall
  2023-11-15 18:14         ` Oleksandr Tyshchenko
  2023-11-17 13:19         ` Sergiy Kibrik
  2023-11-15 23:28       ` Stefano Stabellini
  1 sibling, 2 replies; 30+ messages in thread
From: Julien Grall @ 2023-11-15 17:31 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Sergiy Kibrik,
	xen-devel@lists.xenproject.org
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk

Hi Oleksandr,

On 15/11/2023 16:51, Oleksandr Tyshchenko wrote:
> 
> 
> On 15.11.23 14:33, Julien Grall wrote:
>> Hi,
> 
> 
> Hello Julien
> 
> Let me please try to explain some bits.
> 
> 
>>
>> Thanks for adding support for virtio-pci in Xen. I have some questions.
>>
>> On 15/11/2023 11:26, Sergiy Kibrik wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> In order to enable more use-cases such as having multiple
>>> device-models (Qemu) running in different backend domains which provide
>>> virtio-pci devices for the same guest, we allocate and expose one
>>> PCI host bridge for every virtio backend domain for that guest.
>>
>> OOI, why do you need to expose one PCI host bridge for every stubdomain?
>>
>> In fact looking at the next patch, it seems you are handling some of the
>> hostbridge request in Xen. This is adds a bit more confusion.
>>
>> I was expecting the virtual PCI device would be in the vPCI and each
>> Device emulator would advertise which BDF they are covering.
> 
> 
> This patch series only covers use-cases where the device emulator
> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind
> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI
> pass-through resources, handling, accounting, nothing. 

I understood you want to one Device Emulator to handle the entire PCI 
host bridge. But...

 From the
> hypervisor we only need a help to intercept the config space accesses
> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ...
> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and
> forward them to the linked device emulator (if any), that's all.

... I really don't see why you need to add code in Xen to trap the 
region. If QEMU is dealing with the hostbridge, then it should be able 
to register the MMIO region and then do the translation.

> 
> It is not possible (with current series) to run device emulators what
> emulate only separate PCI (virtio-pci) devices. For it to be possible, I
> think, much more changes are required than current patch series does.
> There at least should be special PCI Host bridge emulation in Xen (or
> reuse vPCI) for the integration. Also Xen should be in charge of forming
> resulting PCI interrupt based on each PCI device level signaling (if we
> use legacy interrupts), some kind of x86's XEN_DMOP_set_pci_intx_level,
> etc. Please note, I am not saying this is not possible in general,
> likely it is possible, but initial patch series doesn't cover these
> use-cases)
> 
> We expose one PCI host bridge per virtio backend domain. This is a
> separate PCI host bridge to combine all virtio-pci devices running in
> the same backend domain (in the same device emulator currently).
> The examples:
> - if only one domain runs Qemu which servers virtio-blk, virtio-net,
> virtio-console devices for DomU - only single PCI Host bridge will be
> exposed for DomU
> - if we add another domain to run Qemu to serve additionally virtio-gpu,
> virtio-input and virtio-snd for the *same* DomU - we expose second PCI
> Host bridge for DomU
> 
> I am afraid, we cannot end up exposing only single PCI Host bridge with
> current model (if we use device emulators running in different domains
> that handles the *entire* PCI Host bridges), this won't work.

That makes sense and it is fine. But see above, I think only the #2 is 
necessary for the hypervisor. Patch #5 should not be necessary at all.

[...]

>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
>>> ---
>>>    xen/include/public/arch-arm.h | 21 +++++++++++++++++++++
>>>    1 file changed, 21 insertions(+)
>>>
>>> diff --git a/xen/include/public/arch-arm.h
>>> b/xen/include/public/arch-arm.h
>>> index a25e87dbda..e6c9cd5335 100644
>>> --- a/xen/include/public/arch-arm.h
>>> +++ b/xen/include/public/arch-arm.h
>>> @@ -466,6 +466,19 @@ typedef uint64_t xen_callback_t;
>>>    #define GUEST_VPCI_MEM_ADDR                 xen_mk_ullong(0x23000000)
>>>    #define GUEST_VPCI_MEM_SIZE                 xen_mk_ullong(0x10000000)
>>> +/*
>>> + * 16 MB is reserved for virtio-pci configuration space based on
>>> calculation
>>> + * 8 bridges * 2 buses x 32 devices x 8 functions x 4 KB = 16 MB
>>
>> Can you explain how youd ecided the "2"?
> 
> good question, we have a limited free space available in memory layout
> (we had difficulties to find a suitable holes) also we don't expect a
> lot of virtio-pci devices, so "256" used vPCI would be too much. It was
> decided to reduce significantly, but select maximum to fit into free
> space, with having "2" buses we still fit into the chosen holes.

If you don't expect a lot of virtio devices, then why do you need two 
buses? Wouldn't one be sufficient?

> 
> 
>>
>>> + */
>>> +#define GUEST_VIRTIO_PCI_ECAM_BASE          xen_mk_ullong(0x33000000)
>>> +#define GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE    xen_mk_ullong(0x01000000)
>>> +#define GUEST_VIRTIO_PCI_HOST_ECAM_SIZE     xen_mk_ullong(0x00200000)
>>> +
>>> +/* 64 MB is reserved for virtio-pci memory */
>>> +#define GUEST_VIRTIO_PCI_ADDR_TYPE_MEM    xen_mk_ullong(0x02000000)
>>> +#define GUEST_VIRTIO_PCI_MEM_ADDR         xen_mk_ullong(0x34000000)
>>> +#define GUEST_VIRTIO_PCI_MEM_SIZE         xen_mk_ullong(0x04000000)
>>> +
>>>    /*
>>>     * 16MB == 4096 pages reserved for guest to use as a region to map its
>>>     * grant table in.
>>> @@ -476,6 +489,11 @@ typedef uint64_t xen_callback_t;
>>>    #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
>>>    #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
>>> +/* 64 MB is reserved for virtio-pci Prefetch memory */
>>
>> This doesn't seem a lot depending on your use case. Can you details how
>> you can up with "64 MB"?
> 
> the same calculation as it was done configuration space. It was observed
> that only 16K is used per virtio-pci device (maybe it can be bigger for
> usual PCI device, I don't know). Please look at the example of DomU log
> below (to strings that contain "*BAR 4: assigned*"):

What about virtio-gpu? I would expect a bit more memory is necessary for 
that use case.

Any case, what I am looking for is for some explanation in the commit 
message of the limits. I don't particularly care about the exact limit 
because this is not part of a stable ABI.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 17:31       ` Julien Grall
@ 2023-11-15 18:14         ` Oleksandr Tyshchenko
  2023-11-15 18:33           ` Julien Grall
  2023-11-17 13:19         ` Sergiy Kibrik
  1 sibling, 1 reply; 30+ messages in thread
From: Oleksandr Tyshchenko @ 2023-11-15 18:14 UTC (permalink / raw)
  To: Julien Grall, Sergiy Kibrik, xen-devel@lists.xenproject.org
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk



On 15.11.23 19:31, Julien Grall wrote:
> Hi Oleksandr,


Hello Julien

> 
> On 15/11/2023 16:51, Oleksandr Tyshchenko wrote:
>>
>>
>> On 15.11.23 14:33, Julien Grall wrote:
>>> Hi,
>>
>>
>> Hello Julien
>>
>> Let me please try to explain some bits.
>>
>>
>>>
>>> Thanks for adding support for virtio-pci in Xen. I have some questions.
>>>
>>> On 15/11/2023 11:26, Sergiy Kibrik wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> In order to enable more use-cases such as having multiple
>>>> device-models (Qemu) running in different backend domains which provide
>>>> virtio-pci devices for the same guest, we allocate and expose one
>>>> PCI host bridge for every virtio backend domain for that guest.
>>>
>>> OOI, why do you need to expose one PCI host bridge for every stubdomain?
>>>
>>> In fact looking at the next patch, it seems you are handling some of the
>>> hostbridge request in Xen. This is adds a bit more confusion.
>>>
>>> I was expecting the virtual PCI device would be in the vPCI and each
>>> Device emulator would advertise which BDF they are covering.
>>
>>
>> This patch series only covers use-cases where the device emulator
>> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind
>> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI
>> pass-through resources, handling, accounting, nothing. 
> 
> I understood you want to one Device Emulator to handle the entire PCI 
> host bridge. But...
> 
>  From the
>> hypervisor we only need a help to intercept the config space accesses
>> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ...
>> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and
>> forward them to the linked device emulator (if any), that's all.
> 
> ... I really don't see why you need to add code in Xen to trap the 
> region. If QEMU is dealing with the hostbridge, then it should be able 
> to register the MMIO region and then do the translation.


Hmm, sounds surprising I would say. Are you saying that unmodified Qemu 
will work if we drop #5? I think this wants to be re-checked (@Sergiy 
can you please investigate?). If indeed so, than #5 will be dropped of 
course from the that series (I would say, postponed until more use-cases).



> 
>>
>> It is not possible (with current series) to run device emulators what
>> emulate only separate PCI (virtio-pci) devices. For it to be possible, I
>> think, much more changes are required than current patch series does.
>> There at least should be special PCI Host bridge emulation in Xen (or
>> reuse vPCI) for the integration. Also Xen should be in charge of forming
>> resulting PCI interrupt based on each PCI device level signaling (if we
>> use legacy interrupts), some kind of x86's XEN_DMOP_set_pci_intx_level,
>> etc. Please note, I am not saying this is not possible in general,
>> likely it is possible, but initial patch series doesn't cover these
>> use-cases)
>>
>> We expose one PCI host bridge per virtio backend domain. This is a
>> separate PCI host bridge to combine all virtio-pci devices running in
>> the same backend domain (in the same device emulator currently).
>> The examples:
>> - if only one domain runs Qemu which servers virtio-blk, virtio-net,
>> virtio-console devices for DomU - only single PCI Host bridge will be
>> exposed for DomU
>> - if we add another domain to run Qemu to serve additionally virtio-gpu,
>> virtio-input and virtio-snd for the *same* DomU - we expose second PCI
>> Host bridge for DomU
>>
>> I am afraid, we cannot end up exposing only single PCI Host bridge with
>> current model (if we use device emulators running in different domains
>> that handles the *entire* PCI Host bridges), this won't work.
> 
> That makes sense and it is fine. But see above, I think only the #2 is 
> necessary for the hypervisor. Patch #5 should not be necessary at all.


Good, it should be re-checked without #5 sure.


> 
> [...]
> 
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
>>>> ---
>>>>    xen/include/public/arch-arm.h | 21 +++++++++++++++++++++
>>>>    1 file changed, 21 insertions(+)
>>>>
>>>> diff --git a/xen/include/public/arch-arm.h
>>>> b/xen/include/public/arch-arm.h
>>>> index a25e87dbda..e6c9cd5335 100644
>>>> --- a/xen/include/public/arch-arm.h
>>>> +++ b/xen/include/public/arch-arm.h
>>>> @@ -466,6 +466,19 @@ typedef uint64_t xen_callback_t;
>>>>    #define GUEST_VPCI_MEM_ADDR                 
>>>> xen_mk_ullong(0x23000000)
>>>>    #define GUEST_VPCI_MEM_SIZE                 
>>>> xen_mk_ullong(0x10000000)
>>>> +/*
>>>> + * 16 MB is reserved for virtio-pci configuration space based on
>>>> calculation
>>>> + * 8 bridges * 2 buses x 32 devices x 8 functions x 4 KB = 16 MB
>>>
>>> Can you explain how youd ecided the "2"?
>>
>> good question, we have a limited free space available in memory layout
>> (we had difficulties to find a suitable holes) also we don't expect a
>> lot of virtio-pci devices, so "256" used vPCI would be too much. It was
>> decided to reduce significantly, but select maximum to fit into free
>> space, with having "2" buses we still fit into the chosen holes.
> 
> If you don't expect a lot of virtio devices, then why do you need two 
> buses? Wouldn't one be sufficient?


For now one would sufficient I think. @Sergiy if you reduce to a single 
bus here, don't forget to update "bus-range" property in device-tree node.




> 
>>
>>
>>>
>>>> + */
>>>> +#define GUEST_VIRTIO_PCI_ECAM_BASE          xen_mk_ullong(0x33000000)
>>>> +#define GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE    xen_mk_ullong(0x01000000)
>>>> +#define GUEST_VIRTIO_PCI_HOST_ECAM_SIZE     xen_mk_ullong(0x00200000)
>>>> +
>>>> +/* 64 MB is reserved for virtio-pci memory */
>>>> +#define GUEST_VIRTIO_PCI_ADDR_TYPE_MEM    xen_mk_ullong(0x02000000)
>>>> +#define GUEST_VIRTIO_PCI_MEM_ADDR         xen_mk_ullong(0x34000000)
>>>> +#define GUEST_VIRTIO_PCI_MEM_SIZE         xen_mk_ullong(0x04000000)
>>>> +
>>>>    /*
>>>>     * 16MB == 4096 pages reserved for guest to use as a region to 
>>>> map its
>>>>     * grant table in.
>>>> @@ -476,6 +489,11 @@ typedef uint64_t xen_callback_t;
>>>>    #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
>>>>    #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
>>>> +/* 64 MB is reserved for virtio-pci Prefetch memory */
>>>
>>> This doesn't seem a lot depending on your use case. Can you details how
>>> you can up with "64 MB"?
>>
>> the same calculation as it was done configuration space. It was observed
>> that only 16K is used per virtio-pci device (maybe it can be bigger for
>> usual PCI device, I don't know). Please look at the example of DomU log
>> below (to strings that contain "*BAR 4: assigned*"):
> 
> What about virtio-gpu? I would expect a bit more memory is necessary for 
> that use case.


In the DomU log I provided during last response, virtio-gpu was also 
present among 5 virtio-pci devices and it worked at the runtime

[    0.474575] pci 0001:00:03.0: [1af4:1050] type 00 class 0x038000
[    0.476534] pci 0001:00:03.0: reg 0x20: [mem 0x00000000-0x00003fff 
64bit pref]
....
[    0.496656] pci 0001:00:03.0: BAR 4: assigned [mem 
0x3a808000-0x3a80bfff 64bit pref]
....
[    0.550208] virtio-pci 0001:00:03.0: enabling device (0000 -> 0002)
....
0001:00:03.0 Display controller: Red Hat, Inc. Virtio GPU (rev 01)

I guess, indeed it needs more memory, but this is related to I/O 
descriptors at the runtime that passed via virtqueue.



> 
> Any case, what I am looking for is for some explanation in the commit 
> message of the limits. I don't particularly care about the exact limit 
> because this is not part of a stable ABI.

ok, sure


> 
> Cheers,
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 18:14         ` Oleksandr Tyshchenko
@ 2023-11-15 18:33           ` Julien Grall
  2023-11-15 19:38             ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 30+ messages in thread
From: Julien Grall @ 2023-11-15 18:33 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Sergiy Kibrik,
	xen-devel@lists.xenproject.org
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk

Hi Oleksandr,

On 15/11/2023 18:14, Oleksandr Tyshchenko wrote:
> On 15.11.23 19:31, Julien Grall wrote:
>> On 15/11/2023 16:51, Oleksandr Tyshchenko wrote:
>>> On 15.11.23 14:33, Julien Grall wrote:
>>>> Thanks for adding support for virtio-pci in Xen. I have some questions.
>>>>
>>>> On 15/11/2023 11:26, Sergiy Kibrik wrote:
>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>
>>>>> In order to enable more use-cases such as having multiple
>>>>> device-models (Qemu) running in different backend domains which provide
>>>>> virtio-pci devices for the same guest, we allocate and expose one
>>>>> PCI host bridge for every virtio backend domain for that guest.
>>>>
>>>> OOI, why do you need to expose one PCI host bridge for every stubdomain?
>>>>
>>>> In fact looking at the next patch, it seems you are handling some of the
>>>> hostbridge request in Xen. This is adds a bit more confusion.
>>>>
>>>> I was expecting the virtual PCI device would be in the vPCI and each
>>>> Device emulator would advertise which BDF they are covering.
>>>
>>>
>>> This patch series only covers use-cases where the device emulator
>>> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind
>>> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI
>>> pass-through resources, handling, accounting, nothing.
>>
>> I understood you want to one Device Emulator to handle the entire PCI
>> host bridge. But...
>>
>>   From the
>>> hypervisor we only need a help to intercept the config space accesses
>>> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ...
>>> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and
>>> forward them to the linked device emulator (if any), that's all.
>>
>> ... I really don't see why you need to add code in Xen to trap the
>> region. If QEMU is dealing with the hostbridge, then it should be able
>> to register the MMIO region and then do the translation.
> 
> 
> Hmm, sounds surprising I would say. Are you saying that unmodified Qemu
> will work if we drop #5?

I don't know if an unmodified QEMU will work. My point is I don't view 
the patch in Xen necessary. You should be able to tell QEMU "here is the 
ECAM region, please emulate an hostbridge". QEMU will then register the 
region to Xen and all the accesses will be forwarded.

In the future we may need a patch similar to #5 if we want to have 
multiple DM using the same hostbridge. But this is a different 
discussion and the patch would need some rework.

The ioreq.c code was always meant to be generic and is always for every 
emulated MMIO. So you want to limit any change in it. Checking the MMIO 
region belongs to the hostbridge and doing the translation is IMHO not a 
good idea to do in ioreq.c. Instead you want to do the conversion from 
MMIO to (sbdf, offset) in virtio_pci_mmio{read, write}(). So the job of 
ioreq.c is to simply find the correct Device Model and forward it.

I also don't see why the feature is gated by has_vcpi(). They are two 
distinct features (at least in your current model).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 18:33           ` Julien Grall
@ 2023-11-15 19:38             ` Oleksandr Tyshchenko
  2023-11-15 19:56               ` Julien Grall
  0 siblings, 1 reply; 30+ messages in thread
From: Oleksandr Tyshchenko @ 2023-11-15 19:38 UTC (permalink / raw)
  To: Julien Grall, Sergiy Kibrik, xen-devel@lists.xenproject.org
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk



On 15.11.23 20:33, Julien Grall wrote:
> Hi Oleksandr,

Hello Julien


> 
> On 15/11/2023 18:14, Oleksandr Tyshchenko wrote:
>> On 15.11.23 19:31, Julien Grall wrote:
>>> On 15/11/2023 16:51, Oleksandr Tyshchenko wrote:
>>>> On 15.11.23 14:33, Julien Grall wrote:
>>>>> Thanks for adding support for virtio-pci in Xen. I have some 
>>>>> questions.
>>>>>
>>>>> On 15/11/2023 11:26, Sergiy Kibrik wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>
>>>>>> In order to enable more use-cases such as having multiple
>>>>>> device-models (Qemu) running in different backend domains which 
>>>>>> provide
>>>>>> virtio-pci devices for the same guest, we allocate and expose one
>>>>>> PCI host bridge for every virtio backend domain for that guest.
>>>>>
>>>>> OOI, why do you need to expose one PCI host bridge for every 
>>>>> stubdomain?
>>>>>
>>>>> In fact looking at the next patch, it seems you are handling some 
>>>>> of the
>>>>> hostbridge request in Xen. This is adds a bit more confusion.
>>>>>
>>>>> I was expecting the virtual PCI device would be in the vPCI and each
>>>>> Device emulator would advertise which BDF they are covering.
>>>>
>>>>
>>>> This patch series only covers use-cases where the device emulator
>>>> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices 
>>>> behind
>>>> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI
>>>> pass-through resources, handling, accounting, nothing.
>>>
>>> I understood you want to one Device Emulator to handle the entire PCI
>>> host bridge. But...
>>>
>>>   From the
>>>> hypervisor we only need a help to intercept the config space accesses
>>>> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ...
>>>> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and
>>>> forward them to the linked device emulator (if any), that's all.
>>>
>>> ... I really don't see why you need to add code in Xen to trap the
>>> region. If QEMU is dealing with the hostbridge, then it should be able
>>> to register the MMIO region and then do the translation.
>>
>>
>> Hmm, sounds surprising I would say. Are you saying that unmodified Qemu
>> will work if we drop #5?
> 
> I don't know if an unmodified QEMU will work. My point is I don't view 
> the patch in Xen necessary. You should be able to tell QEMU "here is the 
> ECAM region, please emulate an hostbridge". QEMU will then register the 
> region to Xen and all the accesses will be forwarded. >
> In the future we may need a patch similar to #5 if we want to have 
> multiple DM using the same hostbridge. But this is a different 
> discussion and the patch would need some rework.


ok

> 
> The ioreq.c code was always meant to be generic and is always for every 
> emulated MMIO. So you want to limit any change in it. Checking the MMIO 
> region belongs to the hostbridge and doing the translation is IMHO not a 
> good idea to do in ioreq.c. Instead you want to do the conversion from 
> MMIO to (sbdf, offset) in virtio_pci_mmio{read, write}(). So the job of 
> ioreq.c is to simply find the correct Device Model and forward it.



Are you about virtio_pci_ioreq_server_get_addr() called from 
arch_ioreq_server_get_type_addr()? If so and if I am not mistaken the 
x86 also check what PCI device is targeted there.

But, I am not against the suggestion, I agree with it.


> 
> I also don't see why the feature is gated by has_vcpi(). They are two 
> distinct features (at least in your current model).

yes, you are correct. In #5 virtio-pci mmio handlers are still 
registered in domain_vpci_init() (which is gated by has_vcpi()), etc


> 
> Cheers,
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 19:38             ` Oleksandr Tyshchenko
@ 2023-11-15 19:56               ` Julien Grall
  0 siblings, 0 replies; 30+ messages in thread
From: Julien Grall @ 2023-11-15 19:56 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Sergiy Kibrik,
	xen-devel@lists.xenproject.org
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk

Hi Oleksandr,

On 15/11/2023 19:38, Oleksandr Tyshchenko wrote:
>> The ioreq.c code was always meant to be generic and is always for every
>> emulated MMIO. So you want to limit any change in it. Checking the MMIO
>> region belongs to the hostbridge and doing the translation is IMHO not a
>> good idea to do in ioreq.c. Instead you want to do the conversion from
>> MMIO to (sbdf, offset) in virtio_pci_mmio{read, write}(). So the job of
>> ioreq.c is to simply find the correct Device Model and forward it.
> 
> 
> 
> Are you about virtio_pci_ioreq_server_get_addr() called from
> arch_ioreq_server_get_type_addr()? If so and if I am not mistaken the
> x86 also check what PCI device is targeted there.
Well yes. We can do better to avoid extra complexity for each MMIO.

Note that the x86 version is somewhat more light-weight.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 16:51     ` Oleksandr Tyshchenko
  2023-11-15 17:31       ` Julien Grall
@ 2023-11-15 23:28       ` Stefano Stabellini
  2023-11-16 15:07         ` Volodymyr Babchuk
  2023-11-17  3:31         ` Stewart Hildebrand
  1 sibling, 2 replies; 30+ messages in thread
From: Stefano Stabellini @ 2023-11-15 23:28 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Julien Grall, Sergiy Kibrik, xen-devel@lists.xenproject.org,
	Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, vikram.garhwal, stewart.hildebrand

+ Stewart, Vikram

On Wed, 15 Nov 2023, Oleksandr Tyshchenko wrote:
> On 15.11.23 14:33, Julien Grall wrote:
> > Thanks for adding support for virtio-pci in Xen. I have some questions.
> > 
> > On 15/11/2023 11:26, Sergiy Kibrik wrote:
> >> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>
> >> In order to enable more use-cases such as having multiple
> >> device-models (Qemu) running in different backend domains which provide
> >> virtio-pci devices for the same guest, we allocate and expose one
> >> PCI host bridge for every virtio backend domain for that guest.
> > 
> > OOI, why do you need to expose one PCI host bridge for every stubdomain?
> > 
> > In fact looking at the next patch, it seems you are handling some of the 
> > hostbridge request in Xen. This is adds a bit more confusion.
> > 
> > I was expecting the virtual PCI device would be in the vPCI and each 
> > Device emulator would advertise which BDF they are covering.
> 
> 
> This patch series only covers use-cases where the device emulator 
> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind 
> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI 
> pass-through resources, handling, accounting, nothing. From the 
> hypervisor we only need a help to intercept the config space accesses 
> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ... 
> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and 
> forward them to the linked device emulator (if any), that's all.
> 
> It is not possible (with current series) to run device emulators what
> emulate only separate PCI (virtio-pci) devices. For it to be possible, I 
> think, much more changes are required than current patch series does. 
> There at least should be special PCI Host bridge emulation in Xen (or 
> reuse vPCI) for the integration. Also Xen should be in charge of forming 
> resulting PCI interrupt based on each PCI device level signaling (if we 
> use legacy interrupts), some kind of x86's XEN_DMOP_set_pci_intx_level, 
> etc. Please note, I am not saying this is not possible in general, 
> likely it is possible, but initial patch series doesn't cover these 
> use-cases)
>
> We expose one PCI host bridge per virtio backend domain. This is a 
> separate PCI host bridge to combine all virtio-pci devices running in 
> the same backend domain (in the same device emulator currently).
> The examples:
> - if only one domain runs Qemu which servers virtio-blk, virtio-net, 
> virtio-console devices for DomU - only single PCI Host bridge will be 
> exposed for DomU
> - if we add another domain to run Qemu to serve additionally virtio-gpu, 
> virtio-input and virtio-snd for the *same* DomU - we expose second PCI 
> Host bridge for DomU
> 
> I am afraid, we cannot end up exposing only single PCI Host bridge with 
> current model (if we use device emulators running in different domains 
> that handles the *entire* PCI Host bridges), this won't work.
 

We were discussing the topic of vPCI and Virtio PCI just this morning
with Stewart and Vikram. We also intend to make them work well together
in the next couple of months (great timing!!)

However, our thinking is to go with the other approach Julien
suggested: a single PCI Root Complex emulated in Xen by vPCI. QEMU would
register individual PCI devices against it.

Vikram, Stewart, please comment. Our understanding is that it should be
possible to make QEMU virtio-pci work with vPCI with relatively minor
efforts and AMD volunteers to do the work in the next couple of months
on the vPCI side.


Although it should be possible to make both approaches work at the same
time, given that it would seem that EPAM and AMD have very similar
requirements, I suggest we work together and collaborate on a single
approach going forward that works best for everyone.


Let me start by saying that if we can get away with it, I think that a
single PCI Root Complex in Xen would be best because it requires less
complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
one?

Stewart, you are deep into vPCI, what's your thinking?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 5/6] xen/arm: Intercept vPCI config accesses and forward them to emulator
  2023-11-15 12:45   ` Julien Grall
@ 2023-11-15 23:30     ` Stefano Stabellini
  0 siblings, 0 replies; 30+ messages in thread
From: Stefano Stabellini @ 2023-11-15 23:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: Sergiy Kibrik, xen-devel, Oleksandr Tyshchenko,
	Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, vikram.garhwal, stewart.hildebrand

On Wed, 15 Nov 2023, Julien Grall wrote:
> Hi,
> 
> On 15/11/2023 11:26, Sergiy Kibrik wrote:
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > 
> > This is needed for supporting virtio-pci.
> > 
> > In case when the PCI Host bridge is emulated outside of Xen
> > (IOREQ server), we need some mechanism to intercept config space
> > accesses on Xen on Arm, and forward them to the emulator
> > (for example, virtio backend) via IOREQ request.
> > 
> > Unlike x86, on Arm these accesses are MMIO, there is no CFC/CF8 method
> > to detect which PCI device is targeted.
> > 
> > In order to not mix PCI passthrough with virtio-pci features we add
> > one more region to cover the total configuration space for all possible
> > host bridges which can serve virtio-pci devices for that guest.
> > We expose one PCI host bridge per virtio backend domain.
> I am a little confused. If you expose one PCI host bridge per virtio backend,
> then why can't the backend simply register the MMIO region and do the
> translation itself when it receives the read/write?
> 
> To me, it only makes sense for Xen to emulate the host bridge access if you
> plan to have one host bridge shared between multiple IOREQ domains or mix with
> PCI pasthrough.

+1

This is exactly what Stewart, Vikram and I were thinking


> From my perspective, I don't expect we would have that many virtio PCI
> devices. So imposing a host bridge per device emulator will mean extra
> resource in the guest as well (they need to keep track of all the hostbridge).

+1


> So in the longer run, I think we want to allow mixing PCI passthrough and
> virtio-PCI (or really any emulated PCI because nothing here is virtio
> specific).

+1


> For now, your approach would be OK to enable virtio PCI on Xen. But I don't
> think there are any changes necessary in Xen other than reserving some MMIO
> regions/IRQ.

I don't mean to slow down EPAM, but I think we can jump in and do the
necessary changes to vPCI for everyone's benefits and with a timeframe
that works for AMD, EPAM and the Xen community.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 23:28       ` Stefano Stabellini
@ 2023-11-16 15:07         ` Volodymyr Babchuk
  2023-11-16 15:12           ` Julien Grall
  2023-11-16 23:04           ` Stefano Stabellini
  2023-11-17  3:31         ` Stewart Hildebrand
  1 sibling, 2 replies; 30+ messages in thread
From: Volodymyr Babchuk @ 2023-11-16 15:07 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, Julien Grall, Sergiy Kibrik,
	xen-devel@lists.xenproject.org, Bertrand Marquis, Michal Orzel,
	vikram.garhwal@amd.com, Stewart Hildebrand


Hi Stefano,

Stefano Stabellini <sstabellini@kernel.org> writes:

> + Stewart, Vikram
>
> On Wed, 15 Nov 2023, Oleksandr Tyshchenko wrote:
>> On 15.11.23 14:33, Julien Grall wrote:
>> > Thanks for adding support for virtio-pci in Xen. I have some questions.
>> > 
>> > On 15/11/2023 11:26, Sergiy Kibrik wrote:
>> >> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> >>
>> >> In order to enable more use-cases such as having multiple
>> >> device-models (Qemu) running in different backend domains which provide
>> >> virtio-pci devices for the same guest, we allocate and expose one
>> >> PCI host bridge for every virtio backend domain for that guest.
>> > 
>> > OOI, why do you need to expose one PCI host bridge for every stubdomain?
>> > 
>> > In fact looking at the next patch, it seems you are handling some of the 
>> > hostbridge request in Xen. This is adds a bit more confusion.
>> > 
>> > I was expecting the virtual PCI device would be in the vPCI and each 
>> > Device emulator would advertise which BDF they are covering.
>> 
>> 
>> This patch series only covers use-cases where the device emulator 
>> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind 
>> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI 
>> pass-through resources, handling, accounting, nothing. From the 
>> hypervisor we only need a help to intercept the config space accesses 
>> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ... 
>> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and 
>> forward them to the linked device emulator (if any), that's all.
>> 
>> It is not possible (with current series) to run device emulators what
>> emulate only separate PCI (virtio-pci) devices. For it to be possible, I 
>> think, much more changes are required than current patch series does. 
>> There at least should be special PCI Host bridge emulation in Xen (or 
>> reuse vPCI) for the integration. Also Xen should be in charge of forming 
>> resulting PCI interrupt based on each PCI device level signaling (if we 
>> use legacy interrupts), some kind of x86's XEN_DMOP_set_pci_intx_level, 
>> etc. Please note, I am not saying this is not possible in general, 
>> likely it is possible, but initial patch series doesn't cover these 
>> use-cases)
>>
>> We expose one PCI host bridge per virtio backend domain. This is a 
>> separate PCI host bridge to combine all virtio-pci devices running in 
>> the same backend domain (in the same device emulator currently).
>> The examples:
>> - if only one domain runs Qemu which servers virtio-blk, virtio-net, 
>> virtio-console devices for DomU - only single PCI Host bridge will be 
>> exposed for DomU
>> - if we add another domain to run Qemu to serve additionally virtio-gpu, 
>> virtio-input and virtio-snd for the *same* DomU - we expose second PCI 
>> Host bridge for DomU
>> 
>> I am afraid, we cannot end up exposing only single PCI Host bridge with 
>> current model (if we use device emulators running in different domains 
>> that handles the *entire* PCI Host bridges), this won't work.
>  
>
> We were discussing the topic of vPCI and Virtio PCI just this morning
> with Stewart and Vikram. We also intend to make them work well together
> in the next couple of months (great timing!!)
>
> However, our thinking is to go with the other approach Julien
> suggested: a single PCI Root Complex emulated in Xen by vPCI. QEMU would
> register individual PCI devices against it.
>
> Vikram, Stewart, please comment. Our understanding is that it should be
> possible to make QEMU virtio-pci work with vPCI with relatively minor
> efforts and AMD volunteers to do the work in the next couple of months
> on the vPCI side.
>
>
> Although it should be possible to make both approaches work at the same
> time, given that it would seem that EPAM and AMD have very similar
> requirements, I suggest we work together and collaborate on a single
> approach going forward that works best for everyone.
>
>
> Let me start by saying that if we can get away with it, I think that a
> single PCI Root Complex in Xen would be best because it requires less
> complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
> one?

Well, in fact we tried similar setup, this was in the first version of
virtio-pci support. But we had a couple of issues with this. For
instance, this might conflict with PCI passthrough devices, with virtio
devices that have back-ends in different domains, etc. I am no saying
that this is impossible, but this just involves more moving parts.

With my vPCI patch series in place, hypervisor itself assigns BDFs for
passed-through devices. Toolstack needs to get this information to know
which BDFs are free and can be used by virtio-pci.

-- 
WBR, Volodymyr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-16 15:07         ` Volodymyr Babchuk
@ 2023-11-16 15:12           ` Julien Grall
  2023-11-16 15:26             ` Stewart Hildebrand
  2023-11-16 23:04           ` Stefano Stabellini
  1 sibling, 1 reply; 30+ messages in thread
From: Julien Grall @ 2023-11-16 15:12 UTC (permalink / raw)
  To: Volodymyr Babchuk, Stefano Stabellini
  Cc: Oleksandr Tyshchenko, Sergiy Kibrik,
	xen-devel@lists.xenproject.org, Bertrand Marquis, Michal Orzel,
	vikram.garhwal@amd.com, Stewart Hildebrand

Hi Volodymyr,

On 16/11/2023 15:07, Volodymyr Babchuk wrote:
> With my vPCI patch series in place, hypervisor itself assigns BDFs for
> passed-through devices. Toolstack needs to get this information to know
> which BDFs are free and can be used by virtio-pci.

It sounds a bit odd to let the hypervisor to assign the BDFs. At least 
because there might be case where you want to specific vBDF (for 
instance this is the case with some intel graphic cards). This should be 
the toolstack job to say "I want to assign the pBDF to this vBDF".

Do you have a link to the patch adding the logic in the hypervisor? I 
will comment there as well.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-16 15:12           ` Julien Grall
@ 2023-11-16 15:26             ` Stewart Hildebrand
  2023-11-16 15:58               ` Julien Grall
  0 siblings, 1 reply; 30+ messages in thread
From: Stewart Hildebrand @ 2023-11-16 15:26 UTC (permalink / raw)
  To: Julien Grall, Volodymyr Babchuk, Stefano Stabellini
  Cc: Oleksandr Tyshchenko, Sergiy Kibrik,
	xen-devel@lists.xenproject.org, Bertrand Marquis, Michal Orzel,
	vikram.garhwal@amd.com

On 11/16/23 10:12, Julien Grall wrote:
> Hi Volodymyr,
> 
> On 16/11/2023 15:07, Volodymyr Babchuk wrote:
>> With my vPCI patch series in place, hypervisor itself assigns BDFs for
>> passed-through devices. Toolstack needs to get this information to know
>> which BDFs are free and can be used by virtio-pci.
> 
> It sounds a bit odd to let the hypervisor to assign the BDFs. At least because there might be case where you want to specific vBDF (for instance this is the case with some intel graphic cards). This should be the toolstack job to say "I want to assign the pBDF to this vBDF".

Keep in mind we are also supporting dom0less PCI passthrough.

> 
> Do you have a link to the patch adding the logic in the hypervisor? I will comment there as well.

See add_virtual_device() in [1], specifically the line:

    pdev->vpci->guest_sbdf = PCI_SBDF(0, 0, new_dev_number, 0);

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00673.html


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-16 15:26             ` Stewart Hildebrand
@ 2023-11-16 15:58               ` Julien Grall
  2023-11-16 16:53                 ` Volodymyr Babchuk
  0 siblings, 1 reply; 30+ messages in thread
From: Julien Grall @ 2023-11-16 15:58 UTC (permalink / raw)
  To: Stewart Hildebrand, Volodymyr Babchuk, Stefano Stabellini
  Cc: Oleksandr Tyshchenko, Sergiy Kibrik,
	xen-devel@lists.xenproject.org, Bertrand Marquis, Michal Orzel,
	vikram.garhwal@amd.com



On 16/11/2023 15:26, Stewart Hildebrand wrote:
> On 11/16/23 10:12, Julien Grall wrote:
>> Hi Volodymyr,
>>
>> On 16/11/2023 15:07, Volodymyr Babchuk wrote:
>>> With my vPCI patch series in place, hypervisor itself assigns BDFs for
>>> passed-through devices. Toolstack needs to get this information to know
>>> which BDFs are free and can be used by virtio-pci.
>>
>> It sounds a bit odd to let the hypervisor to assign the BDFs. At least because there might be case where you want to specific vBDF (for instance this is the case with some intel graphic cards). This should be the toolstack job to say "I want to assign the pBDF to this vBDF".
> 
> Keep in mind we are also supporting dom0less PCI passthrough.
Right, but even with that in mind, I expect the Device-Tree to provide 
the details where a given PCI is assigned.

You could have logic for assigning the BDF automagically. But that 
should be part of dom0less, not deep into the vPCI code.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-16 15:58               ` Julien Grall
@ 2023-11-16 16:53                 ` Volodymyr Babchuk
  2023-11-16 17:27                   ` Julien Grall
  0 siblings, 1 reply; 30+ messages in thread
From: Volodymyr Babchuk @ 2023-11-16 16:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stewart Hildebrand, Stefano Stabellini, Oleksandr Tyshchenko,
	Sergiy Kibrik, xen-devel@lists.xenproject.org, Bertrand Marquis,
	Michal Orzel, vikram.garhwal@amd.com

Hi Julien,

Julien Grall <julien@xen.org> writes:

> On 16/11/2023 15:26, Stewart Hildebrand wrote:
>> On 11/16/23 10:12, Julien Grall wrote:
>>> Hi Volodymyr,
>>>
>>> On 16/11/2023 15:07, Volodymyr Babchuk wrote:
>>>> With my vPCI patch series in place, hypervisor itself assigns BDFs for
>>>> passed-through devices. Toolstack needs to get this information to know
>>>> which BDFs are free and can be used by virtio-pci.
>>>
>>> It sounds a bit odd to let the hypervisor to assign the BDFs. At
>>> least because there might be case where you want to specific vBDF
>>> (for instance this is the case with some intel graphic cards). This
>>> should be the toolstack job to say "I want to assign the pBDF to
>>> this vBDF".
>> Keep in mind we are also supporting dom0less PCI passthrough.
> Right, but even with that in mind, I expect the Device-Tree to provide
> the details where a given PCI is assigned.
>
> You could have logic for assigning the BDF automagically. But that
> should be part of dom0less, not deep into the vPCI code.

As far as I know, right now toolstack does not allow you to assign BDF
in any form. For x86 there are two options, and they are controlled by
"passthrough" option of xen-pciback driver in Linux:

  "Option to specify how to export PCI topology to guest:"
  " 0 - (default) Hide the true PCI topology and makes the frontend"
  "   there is a single PCI bus with only the exported devices on it."
  "   For example, a device at 03:05.0 will be re-assigned to 00:00.0"
  "   while second device at 02:1a.1 will be re-assigned to 00:01.1."
  " 1 - Passthrough provides a real view of the PCI topology to the"
  "   frontend (for example, a device at 06:01.b will still appear at"
  "   06:01.b to the frontend). This is similar to how Xen 2.0.x"
  "   exposed PCI devices to its driver domains. This may be required"
  "   for drivers which depend on finding their hardward in certain"
  "   bus/slot locations.");

Also, isn't strict dependency on BDF breaks the PCI specification? I
mean, of course, you can assign Function on random, but what about Bus
and Device parts?

I mean, we can make toolstack responsible of assigning BDFs, but this
might break existing x86 setups... 

-- 
WBR, Volodymyr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-16 16:53                 ` Volodymyr Babchuk
@ 2023-11-16 17:27                   ` Julien Grall
  0 siblings, 0 replies; 30+ messages in thread
From: Julien Grall @ 2023-11-16 17:27 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stewart Hildebrand, Stefano Stabellini, Oleksandr Tyshchenko,
	Sergiy Kibrik, xen-devel@lists.xenproject.org, Bertrand Marquis,
	Michal Orzel, vikram.garhwal@amd.com

Hi Volodymyr,

On 16/11/2023 16:53, Volodymyr Babchuk wrote:
> Julien Grall <julien@xen.org> writes:
>> On 16/11/2023 15:26, Stewart Hildebrand wrote:
>>> On 11/16/23 10:12, Julien Grall wrote:
>>>> Hi Volodymyr,
>>>>
>>>> On 16/11/2023 15:07, Volodymyr Babchuk wrote:
>>>>> With my vPCI patch series in place, hypervisor itself assigns BDFs for
>>>>> passed-through devices. Toolstack needs to get this information to know
>>>>> which BDFs are free and can be used by virtio-pci.
>>>>
>>>> It sounds a bit odd to let the hypervisor to assign the BDFs. At
>>>> least because there might be case where you want to specific vBDF
>>>> (for instance this is the case with some intel graphic cards). This
>>>> should be the toolstack job to say "I want to assign the pBDF to
>>>> this vBDF".
>>> Keep in mind we are also supporting dom0less PCI passthrough.
>> Right, but even with that in mind, I expect the Device-Tree to provide
>> the details where a given PCI is assigned.
>>
>> You could have logic for assigning the BDF automagically. But that
>> should be part of dom0less, not deep into the vPCI code.
> 
> As far as I know, right now toolstack does not allow you to assign BDF
> in any form. For x86 there are two options, and they are controlled by
> "passthrough" option of xen-pciback driver in Linux:

Are you talking about HVM or PV? I am not very familiar for the latter 
but for the former, QEMU is today responsible to find a free slot.

You can also specify which virtual slot you want to use in XL:

=item B<vslot>=I<NUMBER>

=over 4

=item Description

Specifies the virtual slot (device) number where the guest will see this
device. For example, running L<lspci(1)> in a Linux guest where B<vslot>
was specified as C<8> would identify the device as C<00:08.0>. Virtual 
domain
and bus numbers are always 0.

> 
>    "Option to specify how to export PCI topology to guest:"
>    " 0 - (default) Hide the true PCI topology and makes the frontend"
>    "   there is a single PCI bus with only the exported devices on it."
>    "   For example, a device at 03:05.0 will be re-assigned to 00:00.0"
>    "   while second device at 02:1a.1 will be re-assigned to 00:01.1."
>    " 1 - Passthrough provides a real view of the PCI topology to the"
>    "   frontend (for example, a device at 06:01.b will still appear at"
>    "   06:01.b to the frontend). This is similar to how Xen 2.0.x"
>    "   exposed PCI devices to its driver domains. This may be required"
>    "   for drivers which depend on finding their hardward in certain"
>    "   bus/slot locations.");
> 
> Also, isn't strict dependency on BDF breaks the PCI specification? I
> mean, of course, you can assign Function on random, but what about Bus
> and Device parts?

I am not well-versed with the PCI specification. However, what the specs 
says and what users do tend to be different :).

I know a few cases where you need to specify the slot. I have mentioned 
one earlier with the Intel Graphic cards.

> I mean, we can make toolstack responsible of assigning BDFs, but this
> might break existing x86 setups...

It is not clear to me how letting the toolstack selecting the vslot 
would be a problem for HVM (see above). However, it is clear that you 
may break some x86 setup if always allocate a "random" slot.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-16 15:07         ` Volodymyr Babchuk
  2023-11-16 15:12           ` Julien Grall
@ 2023-11-16 23:04           ` Stefano Stabellini
  2023-11-17  0:23             ` Volodymyr Babchuk
  1 sibling, 1 reply; 30+ messages in thread
From: Stefano Stabellini @ 2023-11-16 23:04 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, Julien Grall,
	Sergiy Kibrik, xen-devel@lists.xenproject.org, Bertrand Marquis,
	Michal Orzel, vikram.garhwal@amd.com, Stewart Hildebrand

On Thu, 16 Nov 2023, Volodymyr Babchuk wrote:
> Hi Stefano,
> 
> Stefano Stabellini <sstabellini@kernel.org> writes:
> 
> > + Stewart, Vikram
> >
> > On Wed, 15 Nov 2023, Oleksandr Tyshchenko wrote:
> >> On 15.11.23 14:33, Julien Grall wrote:
> >> > Thanks for adding support for virtio-pci in Xen. I have some questions.
> >> > 
> >> > On 15/11/2023 11:26, Sergiy Kibrik wrote:
> >> >> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >> >>
> >> >> In order to enable more use-cases such as having multiple
> >> >> device-models (Qemu) running in different backend domains which provide
> >> >> virtio-pci devices for the same guest, we allocate and expose one
> >> >> PCI host bridge for every virtio backend domain for that guest.
> >> > 
> >> > OOI, why do you need to expose one PCI host bridge for every stubdomain?
> >> > 
> >> > In fact looking at the next patch, it seems you are handling some of the 
> >> > hostbridge request in Xen. This is adds a bit more confusion.
> >> > 
> >> > I was expecting the virtual PCI device would be in the vPCI and each 
> >> > Device emulator would advertise which BDF they are covering.
> >> 
> >> 
> >> This patch series only covers use-cases where the device emulator 
> >> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind 
> >> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI 
> >> pass-through resources, handling, accounting, nothing. From the 
> >> hypervisor we only need a help to intercept the config space accesses 
> >> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ... 
> >> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and 
> >> forward them to the linked device emulator (if any), that's all.
> >> 
> >> It is not possible (with current series) to run device emulators what
> >> emulate only separate PCI (virtio-pci) devices. For it to be possible, I 
> >> think, much more changes are required than current patch series does. 
> >> There at least should be special PCI Host bridge emulation in Xen (or 
> >> reuse vPCI) for the integration. Also Xen should be in charge of forming 
> >> resulting PCI interrupt based on each PCI device level signaling (if we 
> >> use legacy interrupts), some kind of x86's XEN_DMOP_set_pci_intx_level, 
> >> etc. Please note, I am not saying this is not possible in general, 
> >> likely it is possible, but initial patch series doesn't cover these 
> >> use-cases)
> >>
> >> We expose one PCI host bridge per virtio backend domain. This is a 
> >> separate PCI host bridge to combine all virtio-pci devices running in 
> >> the same backend domain (in the same device emulator currently).
> >> The examples:
> >> - if only one domain runs Qemu which servers virtio-blk, virtio-net, 
> >> virtio-console devices for DomU - only single PCI Host bridge will be 
> >> exposed for DomU
> >> - if we add another domain to run Qemu to serve additionally virtio-gpu, 
> >> virtio-input and virtio-snd for the *same* DomU - we expose second PCI 
> >> Host bridge for DomU
> >> 
> >> I am afraid, we cannot end up exposing only single PCI Host bridge with 
> >> current model (if we use device emulators running in different domains 
> >> that handles the *entire* PCI Host bridges), this won't work.
> >  
> >
> > We were discussing the topic of vPCI and Virtio PCI just this morning
> > with Stewart and Vikram. We also intend to make them work well together
> > in the next couple of months (great timing!!)
> >
> > However, our thinking is to go with the other approach Julien
> > suggested: a single PCI Root Complex emulated in Xen by vPCI. QEMU would
> > register individual PCI devices against it.
> >
> > Vikram, Stewart, please comment. Our understanding is that it should be
> > possible to make QEMU virtio-pci work with vPCI with relatively minor
> > efforts and AMD volunteers to do the work in the next couple of months
> > on the vPCI side.
> >
> >
> > Although it should be possible to make both approaches work at the same
> > time, given that it would seem that EPAM and AMD have very similar
> > requirements, I suggest we work together and collaborate on a single
> > approach going forward that works best for everyone.
> >
> >
> > Let me start by saying that if we can get away with it, I think that a
> > single PCI Root Complex in Xen would be best because it requires less
> > complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
> > one?
> 
> Well, in fact we tried similar setup, this was in the first version of
> virtio-pci support. But we had a couple of issues with this. For
> instance, this might conflict with PCI passthrough devices, with virtio
> devices that have back-ends in different domains, etc. I am no saying
> that this is impossible, but this just involves more moving parts.
> 
> With my vPCI patch series in place, hypervisor itself assigns BDFs for
> passed-through devices. Toolstack needs to get this information to know
> which BDFs are free and can be used by virtio-pci.

I'll premise that I don't really have an opinion on how the virtual BDF
allocation should happen.

But I'll ask the opposite question that Julien asked: if it is Xen that
does the allocation, that's fine, then couldn't we arrange so that Xen
also does the allocation in the toolstack case too (simply by picking
the first available virtual BDF)?

One way or the other it should be OK as long as we are consistent.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-16 23:04           ` Stefano Stabellini
@ 2023-11-17  0:23             ` Volodymyr Babchuk
  0 siblings, 0 replies; 30+ messages in thread
From: Volodymyr Babchuk @ 2023-11-17  0:23 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, Julien Grall, Sergiy Kibrik,
	xen-devel@lists.xenproject.org, Bertrand Marquis, Michal Orzel,
	vikram.garhwal@amd.com, Stewart Hildebrand


Hi Stefano,

Stefano Stabellini <sstabellini@kernel.org> writes:

> On Thu, 16 Nov 2023, Volodymyr Babchuk wrote:
>> Hi Stefano,
>> 
>> Stefano Stabellini <sstabellini@kernel.org> writes:
>> 
>> > + Stewart, Vikram
>> >
>> > On Wed, 15 Nov 2023, Oleksandr Tyshchenko wrote:
>> >> On 15.11.23 14:33, Julien Grall wrote:
>> >> > Thanks for adding support for virtio-pci in Xen. I have some questions.
>> >> > 
>> >> > On 15/11/2023 11:26, Sergiy Kibrik wrote:
>> >> >> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> >> >>
>> >> >> In order to enable more use-cases such as having multiple
>> >> >> device-models (Qemu) running in different backend domains which provide
>> >> >> virtio-pci devices for the same guest, we allocate and expose one
>> >> >> PCI host bridge for every virtio backend domain for that guest.
>> >> > 
>> >> > OOI, why do you need to expose one PCI host bridge for every stubdomain?
>> >> > 
>> >> > In fact looking at the next patch, it seems you are handling some of the 
>> >> > hostbridge request in Xen. This is adds a bit more confusion.
>> >> > 
>> >> > I was expecting the virtual PCI device would be in the vPCI and each 
>> >> > Device emulator would advertise which BDF they are covering.
>> >> 
>> >> 
>> >> This patch series only covers use-cases where the device emulator 
>> >> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind 
>> >> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI 
>> >> pass-through resources, handling, accounting, nothing. From the 
>> >> hypervisor we only need a help to intercept the config space accesses 
>> >> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ... 
>> >> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and 
>> >> forward them to the linked device emulator (if any), that's all.
>> >> 
>> >> It is not possible (with current series) to run device emulators what
>> >> emulate only separate PCI (virtio-pci) devices. For it to be possible, I 
>> >> think, much more changes are required than current patch series does. 
>> >> There at least should be special PCI Host bridge emulation in Xen (or 
>> >> reuse vPCI) for the integration. Also Xen should be in charge of forming 
>> >> resulting PCI interrupt based on each PCI device level signaling (if we 
>> >> use legacy interrupts), some kind of x86's XEN_DMOP_set_pci_intx_level, 
>> >> etc. Please note, I am not saying this is not possible in general, 
>> >> likely it is possible, but initial patch series doesn't cover these 
>> >> use-cases)
>> >>
>> >> We expose one PCI host bridge per virtio backend domain. This is a 
>> >> separate PCI host bridge to combine all virtio-pci devices running in 
>> >> the same backend domain (in the same device emulator currently).
>> >> The examples:
>> >> - if only one domain runs Qemu which servers virtio-blk, virtio-net, 
>> >> virtio-console devices for DomU - only single PCI Host bridge will be 
>> >> exposed for DomU
>> >> - if we add another domain to run Qemu to serve additionally virtio-gpu, 
>> >> virtio-input and virtio-snd for the *same* DomU - we expose second PCI 
>> >> Host bridge for DomU
>> >> 
>> >> I am afraid, we cannot end up exposing only single PCI Host bridge with 
>> >> current model (if we use device emulators running in different domains 
>> >> that handles the *entire* PCI Host bridges), this won't work.
>> >  
>> >
>> > We were discussing the topic of vPCI and Virtio PCI just this morning
>> > with Stewart and Vikram. We also intend to make them work well together
>> > in the next couple of months (great timing!!)
>> >
>> > However, our thinking is to go with the other approach Julien
>> > suggested: a single PCI Root Complex emulated in Xen by vPCI. QEMU would
>> > register individual PCI devices against it.
>> >
>> > Vikram, Stewart, please comment. Our understanding is that it should be
>> > possible to make QEMU virtio-pci work with vPCI with relatively minor
>> > efforts and AMD volunteers to do the work in the next couple of months
>> > on the vPCI side.
>> >
>> >
>> > Although it should be possible to make both approaches work at the same
>> > time, given that it would seem that EPAM and AMD have very similar
>> > requirements, I suggest we work together and collaborate on a single
>> > approach going forward that works best for everyone.
>> >
>> >
>> > Let me start by saying that if we can get away with it, I think that a
>> > single PCI Root Complex in Xen would be best because it requires less
>> > complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
>> > one?
>> 
>> Well, in fact we tried similar setup, this was in the first version of
>> virtio-pci support. But we had a couple of issues with this. For
>> instance, this might conflict with PCI passthrough devices, with virtio
>> devices that have back-ends in different domains, etc. I am no saying
>> that this is impossible, but this just involves more moving parts.
>> 
>> With my vPCI patch series in place, hypervisor itself assigns BDFs for
>> passed-through devices. Toolstack needs to get this information to know
>> which BDFs are free and can be used by virtio-pci.
>
> I'll premise that I don't really have an opinion on how the virtual BDF
> allocation should happen.
>
> But I'll ask the opposite question that Julien asked: if it is Xen that
> does the allocation, that's fine, then couldn't we arrange so that Xen
> also does the allocation in the toolstack case too (simply by picking
> the first available virtual BDF)?

Actually, this was my intention as well. As I said in the another email,
we just need to extend or add another domctl to manage vBFDs.

-- 
WBR, Volodymyr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 23:28       ` Stefano Stabellini
  2023-11-16 15:07         ` Volodymyr Babchuk
@ 2023-11-17  3:31         ` Stewart Hildebrand
  2023-11-17  8:11           ` Oleksandr Tyshchenko
  1 sibling, 1 reply; 30+ messages in thread
From: Stewart Hildebrand @ 2023-11-17  3:31 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr Tyshchenko
  Cc: Julien Grall, Sergiy Kibrik, xen-devel@lists.xenproject.org,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, vikram.garhwal

On 11/15/23 18:28, Stefano Stabellini wrote:
> + Stewart, Vikram
> 
> On Wed, 15 Nov 2023, Oleksandr Tyshchenko wrote:
>> On 15.11.23 14:33, Julien Grall wrote:
>>> Thanks for adding support for virtio-pci in Xen. I have some questions.
>>>
>>> On 15/11/2023 11:26, Sergiy Kibrik wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> In order to enable more use-cases such as having multiple
>>>> device-models (Qemu) running in different backend domains which provide
>>>> virtio-pci devices for the same guest, we allocate and expose one
>>>> PCI host bridge for every virtio backend domain for that guest.
>>>
>>> OOI, why do you need to expose one PCI host bridge for every stubdomain?
>>>
>>> In fact looking at the next patch, it seems you are handling some of the 
>>> hostbridge request in Xen. This is adds a bit more confusion.
>>>
>>> I was expecting the virtual PCI device would be in the vPCI and each 
>>> Device emulator would advertise which BDF they are covering.
>>
>>
>> This patch series only covers use-cases where the device emulator 
>> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind 
>> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI 
>> pass-through resources, handling, accounting, nothing. From the 
>> hypervisor we only need a help to intercept the config space accesses 
>> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ... 
>> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and 
>> forward them to the linked device emulator (if any), that's all.
>>
>> It is not possible (with current series) to run device emulators what
>> emulate only separate PCI (virtio-pci) devices. For it to be possible, I 
>> think, much more changes are required than current patch series does. 
>> There at least should be special PCI Host bridge emulation in Xen (or 
>> reuse vPCI) for the integration. Also Xen should be in charge of forming 
>> resulting PCI interrupt based on each PCI device level signaling (if we 
>> use legacy interrupts), some kind of x86's XEN_DMOP_set_pci_intx_level, 
>> etc. Please note, I am not saying this is not possible in general, 
>> likely it is possible, but initial patch series doesn't cover these 
>> use-cases)
>>
>> We expose one PCI host bridge per virtio backend domain. This is a 
>> separate PCI host bridge to combine all virtio-pci devices running in 
>> the same backend domain (in the same device emulator currently).
>> The examples:
>> - if only one domain runs Qemu which servers virtio-blk, virtio-net, 
>> virtio-console devices for DomU - only single PCI Host bridge will be 
>> exposed for DomU
>> - if we add another domain to run Qemu to serve additionally virtio-gpu, 
>> virtio-input and virtio-snd for the *same* DomU - we expose second PCI 
>> Host bridge for DomU
>>
>> I am afraid, we cannot end up exposing only single PCI Host bridge with 
>> current model (if we use device emulators running in different domains 
>> that handles the *entire* PCI Host bridges), this won't work.
>  
> 
> We were discussing the topic of vPCI and Virtio PCI just this morning
> with Stewart and Vikram. We also intend to make them work well together
> in the next couple of months (great timing!!)
> 
> However, our thinking is to go with the other approach Julien
> suggested: a single PCI Root Complex emulated in Xen by vPCI. QEMU would
> register individual PCI devices against it.
> 
> Vikram, Stewart, please comment. Our understanding is that it should be
> possible to make QEMU virtio-pci work with vPCI with relatively minor
> efforts and AMD volunteers to do the work in the next couple of months
> on the vPCI side.
> 
> 
> Although it should be possible to make both approaches work at the same
> time, given that it would seem that EPAM and AMD have very similar
> requirements, I suggest we work together and collaborate on a single
> approach going forward that works best for everyone.
> 
> 
> Let me start by saying that if we can get away with it, I think that a
> single PCI Root Complex in Xen would be best because it requires less
> complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
> one?
> 
> Stewart, you are deep into vPCI, what's your thinking?

First allow me explain the moving pieces in a bit more detail (skip ahead to "Back to the question: " if you don't want to be bored with the details). I played around with this series, and I passed through a PCI device (with vPCI) and enabled virtio-pci:

virtio = [ "type=virtio,device,transport=pci,bdf=0000:00:00.0,backend_type=qemu" ]
device_model_args = [ "-device", "virtio-serial-pci" ]
pci = [ "01:00.0" ]

Indeed we get two root complexes (2 ECAM ranges, 2 sets of interrupts, etc.) from the domU point of view:

    pcie@10000000 {
        compatible = "pci-host-ecam-generic";
        device_type = "pci";
        reg = <0x00 0x10000000 0x00 0x10000000>;
        bus-range = <0x00 0xff>;
        #address-cells = <0x03>;
        #size-cells = <0x02>;
        status = "okay";
        ranges = <0x2000000 0x00 0x23000000 0x00 0x23000000 0x00 0x10000000 0x42000000 0x01 0x00 0x01 0x00 0x01 0x00>;
        #interrupt-cells = <0x01>;
        interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x74 0x04>;
        interrupt-map-mask = <0x00 0x00 0x00 0x07>;
    };

    pcie@33000000 {
        compatible = "pci-host-ecam-generic";
        device_type = "pci";
        reg = <0x00 0x33000000 0x00 0x200000>;
        bus-range = <0x00 0x01>;
        #address-cells = <0x03>;
        #size-cells = <0x02>;
        status = "okay";
        ranges = <0x2000000 0x00 0x34000000 0x00 0x34000000 0x00 0x800000 0x42000000 0x00 0x3a000000 0x00 0x3a000000 0x00 0x800000>;
        dma-coherent;
        #interrupt-cells = <0x01>;
        interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x0c 0x04 0x00 0x00 0x00 0x02 0xfde8 0x00 0x0d 0x04 0x00 0x00 0x00 0x03 0xfde8 0x00 0x0e 0x04 0x00 0x00 0x00 0x04 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x01 0xfde8 0x00 0x0d 0x04 0x800 0x00 0x00 0x02 0xfde8 0x00 0x0e 0x04 0x800 0x00 0x00 0x03 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x04 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x01 0xfde8 0x00 0x0e 0x04 0x1000 0x00 0x00 0x02 0xfde8 0x00 0x0f 0x04 0x1000 0x00 0x00 0x03 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x04 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x01 0xfde8 0x00 0x0f 0x04 0x1800 0x00 0x00 0x02 0xfde8 0x00 0x0c 0x04 0x1800 0x00 0x00 0x03 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x04 0xfde8 0x00 0x0e 0x04>;
        interrupt-map-mask = <0x1800 0x00 0x00 0x07>;
    };

Xen vPCI doesn't currently expose a host bridge (i.e. a device with base class 0x06). As an aside, we may eventually want to expose a virtual/emulated host bridge in vPCI, because Linux's x86 PCI probe expects one [0].

Qemu exposes an emulated host bridge, along with any requested emulated devices.

Running lspci -v in the domU yields the following:

0000:00:00.0 Network controller: Ralink corp. RT2790 Wireless 802.11n 1T/2R PCIe
        Subsystem: ASUSTeK Computer Inc. RT2790 Wireless 802.11n 1T/2R PCIe
        Flags: bus master, fast devsel, latency 0, IRQ 13
        Memory at 23000000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [50] MSI: Enable- Count=1/128 Maskable- 64bit+
        Kernel driver in use: rt2800pci

0001:00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
        Subsystem: Red Hat, Inc. QEMU PCIe Host bridge
        Flags: fast devsel

0001:00:01.0 Communication controller: Red Hat, Inc. Virtio console
        Subsystem: Red Hat, Inc. Virtio console
        Flags: bus master, fast devsel, latency 0, IRQ 14
        Memory at 3a000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
        Capabilities: [70] Vendor Specific Information: VirtIO: Notify
        Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
        Capabilities: [50] Vendor Specific Information: VirtIO: ISR
        Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
        Kernel driver in use: virtio-pci

0000:00:00.0 is a real passed through device (corresponding to 0000:01:00.0 in dom0).
0001:00:00.0 is the qemu host bridge (base class 0x06).
They are on different segments because they are associated with different root complexes.


Back to the question: Sure, avoiding reserving more memory from the preciously small lowmem virtual memory layout is probably a good idea. With everything in a single virtual root complex (segment), it's probably possible to come up with some vBDF-picking algorithm (+ user ability to specify) that works for most use cases as discussed elsewhere. It will always be in a single fixed segment as far as I can tell.

Some more observations assuming a single virtual root complex:

We should probably hide the qemu host bridge(s) from the guest. In other words, hide all devices with base class 0x06, except eventually vPCI's own virtual host bridge. If we don't hide them, we would likely end up with multiple emulated host bridges on a single root complex (segment). That sounds messy and hard to manage.

We have a need to control the vBDF exposed to the guest - can we force qemu to use particular BDFs for its emulated devices? If not, we will need to do SBDF translation in vPCI for virtio-pci devices. Meaning virtio-pci would then be dependent on the vPCI series [1]. I think that's okay. Hmm - if a qemu SBDF clashes with a real hardware SBDF, is that an issue?

vPCI and virtio-pci devices will be sharing a single ECAM range, so it looks like we must rely on vPCI to intercept the accesses that need to be forwarded to ioreq.

Initially, it looks like we would be adding some sort of support for legacy interrupts in vPCI, but for virtio-pci devices only. Eventually, we may also want to consider virtio-pci with MSI/MSI-X, but it's probably okay to wait.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/pci/direct.c?h=v6.6.1#n186
[1] https://lists.xenproject.org/archives/html/xen-devel/2023-10/msg00660.html


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-17  3:31         ` Stewart Hildebrand
@ 2023-11-17  8:11           ` Oleksandr Tyshchenko
  2023-11-21 19:12             ` Stewart Hildebrand
  0 siblings, 1 reply; 30+ messages in thread
From: Oleksandr Tyshchenko @ 2023-11-17  8:11 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Julien Grall, Sergiy Kibrik, xen-devel@lists.xenproject.org,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	vikram.garhwal@amd.com, Stefano Stabellini



On 17.11.23 05:31, Stewart Hildebrand wrote:

Hello Stewart

[answering only for virtio-pci bits as for vPCI I am only familiar with 
code responsible for trapping config space accesses]

[snip]

>>
>>
>> Let me start by saying that if we can get away with it, I think that a
>> single PCI Root Complex in Xen would be best because it requires less
>> complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
>> one?
>>
>> Stewart, you are deep into vPCI, what's your thinking?
> 
> First allow me explain the moving pieces in a bit more detail (skip ahead to "Back to the question: " if you don't want to be bored with the details). I played around with this series, and I passed through a PCI device (with vPCI) and enabled virtio-pci:
> 
> virtio = [ "type=virtio,device,transport=pci,bdf=0000:00:00.0,backend_type=qemu" ]
> device_model_args = [ "-device", "virtio-serial-pci" ]
> pci = [ "01:00.0" ]
> 
> Indeed we get two root complexes (2 ECAM ranges, 2 sets of interrupts, etc.) from the domU point of view:
> 
>      pcie@10000000 {
>          compatible = "pci-host-ecam-generic";
>          device_type = "pci";
>          reg = <0x00 0x10000000 0x00 0x10000000>;
>          bus-range = <0x00 0xff>;
>          #address-cells = <0x03>;
>          #size-cells = <0x02>;
>          status = "okay";
>          ranges = <0x2000000 0x00 0x23000000 0x00 0x23000000 0x00 0x10000000 0x42000000 0x01 0x00 0x01 0x00 0x01 0x00>;
>          #interrupt-cells = <0x01>;
>          interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x74 0x04>;
>          interrupt-map-mask = <0x00 0x00 0x00 0x07>;


I am wondering how you got interrupt-map here? AFAIR upstream toolstack 
doesn't add that property for vpci dt node.

>      };
> 
>      pcie@33000000 {
>          compatible = "pci-host-ecam-generic";
>          device_type = "pci";
>          reg = <0x00 0x33000000 0x00 0x200000>;
>          bus-range = <0x00 0x01>;
>          #address-cells = <0x03>;
>          #size-cells = <0x02>;
>          status = "okay";
>          ranges = <0x2000000 0x00 0x34000000 0x00 0x34000000 0x00 0x800000 0x42000000 0x00 0x3a000000 0x00 0x3a000000 0x00 0x800000>;
>          dma-coherent;
>          #interrupt-cells = <0x01>;
>          interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x0c 0x04 0x00 0x00 0x00 0x02 0xfde8 0x00 0x0d 0x04 0x00 0x00 0x00 0x03 0xfde8 0x00 0x0e 0x04 0x00 0x00 0x00 0x04 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x01 0xfde8 0x00 0x0d 0x04 0x800 0x00 0x00 0x02 0xfde8 0x00 0x0e 0x04 0x800 0x00 0x00 0x03 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x04 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x01 0xfde8 0x00 0x0e 0x04 0x1000 0x00 0x00 0x02 0xfde8 0x00 0x0f 0x04 0x1000 0x00 0x00 0x03 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x04 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x01 0xfde8 0x00 0x0f 0x04 0x1800 0x00 0x00 0x02 0xfde8 0x00 0x0c 0x04 0x1800 0x00 0x00 0x03 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x04 0xfde8 0x00 0x0e 0x04>;
>          interrupt-map-mask = <0x1800 0x00 0x00 0x07>;


that is correct dump.

BTW, if you added "grant_usage=1" (it is disabled by default for dom0) 
to virtio configuration you would get iommu-map property here as well 
[1]. This is another point to think about when considering combined 
approach (single PCI Host bridge node -> single virtual root complex), I 
guess usual PCI device doesn't want grant based DMA addresses, correct? 
If so, it shouldn't be specified in the property.


>      };
> 
> Xen vPCI doesn't currently expose a host bridge (i.e. a device with base class 0x06). As an aside, we may eventually want to expose a virtual/emulated host bridge in vPCI, because Linux's x86 PCI probe expects one [0].
> 
> Qemu exposes an emulated host bridge, along with any requested emulated devices.
> 
> Running lspci -v in the domU yields the following:
> 
> 0000:00:00.0 Network controller: Ralink corp. RT2790 Wireless 802.11n 1T/2R PCIe
>          Subsystem: ASUSTeK Computer Inc. RT2790 Wireless 802.11n 1T/2R PCIe
>          Flags: bus master, fast devsel, latency 0, IRQ 13
>          Memory at 23000000 (32-bit, non-prefetchable) [size=64K]
>          Capabilities: [50] MSI: Enable- Count=1/128 Maskable- 64bit+
>          Kernel driver in use: rt2800pci
> 
> 0001:00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
>          Subsystem: Red Hat, Inc. QEMU PCIe Host bridge
>          Flags: fast devsel
> 
> 0001:00:01.0 Communication controller: Red Hat, Inc. Virtio console
>          Subsystem: Red Hat, Inc. Virtio console
>          Flags: bus master, fast devsel, latency 0, IRQ 14
>          Memory at 3a000000 (64-bit, prefetchable) [size=16K]
>          Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
>          Capabilities: [70] Vendor Specific Information: VirtIO: Notify
>          Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
>          Capabilities: [50] Vendor Specific Information: VirtIO: ISR
>          Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
>          Kernel driver in use: virtio-pci
> 
> 0000:00:00.0 is a real passed through device (corresponding to 0000:01:00.0 in dom0).
> 0001:00:00.0 is the qemu host bridge (base class 0x06).
> They are on different segments because they are associated with different root complexes.


Glad to hear this patch series doesn't seem to break PCI passthrough in 
your environment.


> 
> 
> Back to the question: Sure, avoiding reserving more memory from the preciously small lowmem virtual memory layout is probably a good idea. With everything in a single virtual root complex (segment), it's probably possible to come up with some vBDF-picking algorithm (+ user ability to specify) that works for most use cases as discussed elsewhere. It will always be in a single fixed segment as far as I can tell.
> 
> Some more observations assuming a single virtual root complex:
> 
> We should probably hide the qemu host bridge(s) from the guest. In other words, hide all devices with base class 0x06, except eventually vPCI's own virtual host bridge. If we don't hide them, we would likely end up with multiple emulated host bridges on a single root complex (segment). That sounds messy and hard to manage.
> 
> We have a need to control the vBDF exposed to the guest - can we force qemu to use particular BDFs for its emulated devices?


Yes, it is possible. Maybe there is a better way, but at
least *bus* and *addr* can be specified and Qemu indeed follows that.

device_model_args=[ '-device', 
'virtio-blk-pci,scsi=off,disable-legacy=on,iommu_platform=on,bus=pcie.0,addr=2,drive=image', 
'-drive', 'if=none,id=image,format=raw,file=/dev/mmcblk1p3' ]

virtio=[ "backend=Domain-0, type=virtio,device, transport=pci, 
bdf=0000:00:02.0, grant_usage=1, backend_type=qemu" ]

root@h3ulcb-domd:~# dmesg | grep virtio
[   0.660789] virtio-pci 0000:00:02.0: enabling device (0000 -> 0002)
[   0.715876] virtio_blk virtio0: [vda] 4096 512-byte logical blocks 
(2.10 MB/2.00 MiB)

root@h3ulcb-domd:~# lspci
00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
00:02.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)

Also there is one moment for current series: bdf specified for 
virtio-pci device only makes sense for iommu-map property. So 
bdf=0000:00:02.0 in virtio property and bus=pcie.0,addr=2 in 
device_model_args property should be in sync.

[1] 
https://patchwork.kernel.org/project/xen-devel/patch/20231115112611.3865905-5-Sergiy_Kibrik@epam.com/


[snip]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-15 17:31       ` Julien Grall
  2023-11-15 18:14         ` Oleksandr Tyshchenko
@ 2023-11-17 13:19         ` Sergiy Kibrik
  2023-11-17 18:24           ` Julien Grall
  1 sibling, 1 reply; 30+ messages in thread
From: Sergiy Kibrik @ 2023-11-17 13:19 UTC (permalink / raw)
  To: Julien Grall, Oleksandr Tyshchenko,
	xen-devel@lists.xenproject.org
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk

hi Julien, Oleksandr,

[..]
>>
>>
>> This patch series only covers use-cases where the device emulator
>> handles the *entire* PCI Host bridge and PCI (virtio-pci) devices behind
>> it (i.e. Qemu). Also this patch series doesn't touch vPCI/PCI
>> pass-through resources, handling, accounting, nothing. 
> 
> I understood you want to one Device Emulator to handle the entire PCI 
> host bridge. But...
> 
>  From the
>> hypervisor we only need a help to intercept the config space accesses
>> happen in a range [GUEST_VIRTIO_PCI_ECAM_BASE ...
>> GUEST_VIRTIO_PCI_ECAM_BASE + GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE] and
>> forward them to the linked device emulator (if any), that's all.
> 
> ... I really don't see why you need to add code in Xen to trap the 
> region. If QEMU is dealing with the hostbridge, then it should be able 
> to register the MMIO region and then do the translation.
> 
>>
[..]
>>
>> I am afraid, we cannot end up exposing only single PCI Host bridge with
>> current model (if we use device emulators running in different domains
>> that handles the *entire* PCI Host bridges), this won't work.
> 
> That makes sense and it is fine. But see above, I think only the #2 is 
> necessary for the hypervisor. Patch #5 should not be necessary at all.
> 
> [...]

I did checks w/o patch #5 and can confirm that indeed -- qemu & xen can 
do this work without additional modifications to qemu code. So I'll drop
this patch from this series.

[..]
>>>> +/*
>>>> + * 16 MB is reserved for virtio-pci configuration space based on
>>>> calculation
>>>> + * 8 bridges * 2 buses x 32 devices x 8 functions x 4 KB = 16 MB
>>>
>>> Can you explain how youd ecided the "2"?
>>
>> good question, we have a limited free space available in memory layout
>> (we had difficulties to find a suitable holes) also we don't expect a
>> lot of virtio-pci devices, so "256" used vPCI would be too much. It was
>> decided to reduce significantly, but select maximum to fit into free
>> space, with having "2" buses we still fit into the chosen holes.
> 
> If you don't expect a lot of virtio devices, then why do you need two 
> buses? Wouldn't one be sufficient?
> 

one should be reasonably sufficient, I agree

>>
>>
>>>
>>>> + */
>>>> +#define GUEST_VIRTIO_PCI_ECAM_BASE          xen_mk_ullong(0x33000000)
>>>> +#define GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE    xen_mk_ullong(0x01000000)
>>>> +#define GUEST_VIRTIO_PCI_HOST_ECAM_SIZE     xen_mk_ullong(0x00200000)
>>>> +
>>>> +/* 64 MB is reserved for virtio-pci memory */
>>>> +#define GUEST_VIRTIO_PCI_ADDR_TYPE_MEM    xen_mk_ullong(0x02000000)
>>>> +#define GUEST_VIRTIO_PCI_MEM_ADDR         xen_mk_ullong(0x34000000)
>>>> +#define GUEST_VIRTIO_PCI_MEM_SIZE         xen_mk_ullong(0x04000000)
>>>> +
>>>>    /*
>>>>     * 16MB == 4096 pages reserved for guest to use as a region to 
>>>> map its
>>>>     * grant table in.
>>>> @@ -476,6 +489,11 @@ typedef uint64_t xen_callback_t;
>>>>    #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
>>>>    #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
>>>> +/* 64 MB is reserved for virtio-pci Prefetch memory */
>>>
>>> This doesn't seem a lot depending on your use case. Can you details how
>>> you can up with "64 MB"?
>>
>> the same calculation as it was done configuration space. It was observed
>> that only 16K is used per virtio-pci device (maybe it can be bigger for
>> usual PCI device, I don't know). Please look at the example of DomU log
>> below (to strings that contain "*BAR 4: assigned*"):
> 
> What about virtio-gpu? I would expect a bit more memory is necessary for 
> that use case.
> 
> Any case, what I am looking for is for some explanation in the commit 
> message of the limits. I don't particularly care about the exact limit 
> because this is not part of a stable ABI.

sure, I'll put a bit more explanation in both comment and commit 
message. Should I post updated patch series, with updated resources and 
without patch #5, or shall we wait for some more comments here?

--
regards,
  Sergiy


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-17 13:19         ` Sergiy Kibrik
@ 2023-11-17 18:24           ` Julien Grall
  0 siblings, 0 replies; 30+ messages in thread
From: Julien Grall @ 2023-11-17 18:24 UTC (permalink / raw)
  To: Sergiy Kibrik, Oleksandr Tyshchenko,
	xen-devel@lists.xenproject.org
  Cc: Stefano Stabellini, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk

Hi Sergiy,

On 17/11/2023 13:19, Sergiy Kibrik wrote:
>>>>> + */
>>>>> +#define GUEST_VIRTIO_PCI_ECAM_BASE          xen_mk_ullong(0x33000000)
>>>>> +#define GUEST_VIRTIO_PCI_TOTAL_ECAM_SIZE    xen_mk_ullong(0x01000000)
>>>>> +#define GUEST_VIRTIO_PCI_HOST_ECAM_SIZE     xen_mk_ullong(0x00200000)
>>>>> +
>>>>> +/* 64 MB is reserved for virtio-pci memory */
>>>>> +#define GUEST_VIRTIO_PCI_ADDR_TYPE_MEM    xen_mk_ullong(0x02000000)
>>>>> +#define GUEST_VIRTIO_PCI_MEM_ADDR         xen_mk_ullong(0x34000000)
>>>>> +#define GUEST_VIRTIO_PCI_MEM_SIZE         xen_mk_ullong(0x04000000)
>>>>> +
>>>>>    /*
>>>>>     * 16MB == 4096 pages reserved for guest to use as a region to 
>>>>> map its
>>>>>     * grant table in.
>>>>> @@ -476,6 +489,11 @@ typedef uint64_t xen_callback_t;
>>>>>    #define GUEST_MAGIC_BASE  xen_mk_ullong(0x39000000)
>>>>>    #define GUEST_MAGIC_SIZE  xen_mk_ullong(0x01000000)
>>>>> +/* 64 MB is reserved for virtio-pci Prefetch memory */
>>>>
>>>> This doesn't seem a lot depending on your use case. Can you details how
>>>> you can up with "64 MB"?
>>>
>>> the same calculation as it was done configuration space. It was observed
>>> that only 16K is used per virtio-pci device (maybe it can be bigger for
>>> usual PCI device, I don't know). Please look at the example of DomU log
>>> below (to strings that contain "*BAR 4: assigned*"):
>>
>> What about virtio-gpu? I would expect a bit more memory is necessary 
>> for that use case.
>>
>> Any case, what I am looking for is for some explanation in the commit 
>> message of the limits. I don't particularly care about the exact limit 
>> because this is not part of a stable ABI.
> 
> sure, I'll put a bit more explanation in both comment and commit 
> message. Should I post updated patch series, with updated resources and 
> without patch #5, or shall we wait for some more comments here?

I would wait for comments before posting in particular if you haven't 
yet received any comment on the tools side.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-17  8:11           ` Oleksandr Tyshchenko
@ 2023-11-21 19:12             ` Stewart Hildebrand
  2023-11-22  1:14               ` Stefano Stabellini
  0 siblings, 1 reply; 30+ messages in thread
From: Stewart Hildebrand @ 2023-11-21 19:12 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Julien Grall, Sergiy Kibrik, xen-devel@lists.xenproject.org,
	Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
	vikram.garhwal@amd.com, Stefano Stabellini

On 11/17/23 03:11, Oleksandr Tyshchenko wrote:
> 
> 
> On 17.11.23 05:31, Stewart Hildebrand wrote:
> 
> Hello Stewart
> 
> [answering only for virtio-pci bits as for vPCI I am only familiar with 
> code responsible for trapping config space accesses]
> 
> [snip]
> 
>>>
>>>
>>> Let me start by saying that if we can get away with it, I think that a
>>> single PCI Root Complex in Xen would be best because it requires less
>>> complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
>>> one?
>>>
>>> Stewart, you are deep into vPCI, what's your thinking?
>>
>> First allow me explain the moving pieces in a bit more detail (skip ahead to "Back to the question: " if you don't want to be bored with the details). I played around with this series, and I passed through a PCI device (with vPCI) and enabled virtio-pci:
>>
>> virtio = [ "type=virtio,device,transport=pci,bdf=0000:00:00.0,backend_type=qemu" ]
>> device_model_args = [ "-device", "virtio-serial-pci" ]
>> pci = [ "01:00.0" ]
>>
>> Indeed we get two root complexes (2 ECAM ranges, 2 sets of interrupts, etc.) from the domU point of view:
>>
>>      pcie@10000000 {
>>          compatible = "pci-host-ecam-generic";
>>          device_type = "pci";
>>          reg = <0x00 0x10000000 0x00 0x10000000>;
>>          bus-range = <0x00 0xff>;
>>          #address-cells = <0x03>;
>>          #size-cells = <0x02>;
>>          status = "okay";
>>          ranges = <0x2000000 0x00 0x23000000 0x00 0x23000000 0x00 0x10000000 0x42000000 0x01 0x00 0x01 0x00 0x01 0x00>;
>>          #interrupt-cells = <0x01>;
>>          interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x74 0x04>;
>>          interrupt-map-mask = <0x00 0x00 0x00 0x07>;
> 
> 
> I am wondering how you got interrupt-map here? AFAIR upstream toolstack 
> doesn't add that property for vpci dt node.

You are correct. I'm airing my dirty laundry here: this is a hack to get a legacy interrupt passed through on a ZCU102 board (Zynq UltraScale+), which doesn't have GICv3/ITS. In a system with a GICv3/ITS, we would have an msi-map property (this is also not yet upstream, but is in a couple of downstreams).

> 
>>      };
>>
>>      pcie@33000000 {
>>          compatible = "pci-host-ecam-generic";
>>          device_type = "pci";
>>          reg = <0x00 0x33000000 0x00 0x200000>;
>>          bus-range = <0x00 0x01>;
>>          #address-cells = <0x03>;
>>          #size-cells = <0x02>;
>>          status = "okay";
>>          ranges = <0x2000000 0x00 0x34000000 0x00 0x34000000 0x00 0x800000 0x42000000 0x00 0x3a000000 0x00 0x3a000000 0x00 0x800000>;
>>          dma-coherent;
>>          #interrupt-cells = <0x01>;
>>          interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x0c 0x04 0x00 0x00 0x00 0x02 0xfde8 0x00 0x0d 0x04 0x00 0x00 0x00 0x03 0xfde8 0x00 0x0e 0x04 0x00 0x00 0x00 0x04 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x01 0xfde8 0x00 0x0d 0x04 0x800 0x00 0x00 0x02 0xfde8 0x00 0x0e 0x04 0x800 0x00 0x00 0x03 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x04 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x01 0xfde8 0x00 0x0e 0x04 0x1000 0x00 0x00 0x02 0xfde8 0x00 0x0f 0x04 0x1000 0x00 0x00 0x03 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x04 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x01 0xfde8 0x00 0x0f 0x04 0x1800 0x00 0x00 0x02 0xfde8 0x00 0x0c 0x04 0x1800 0x00 0x00 0x03 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x04 0xfde8 0x00 0x0e 0x04>;
>>          interrupt-map-mask = <0x1800 0x00 0x00 0x07>;
> 
> 
> that is correct dump.
> 
> BTW, if you added "grant_usage=1" (it is disabled by default for dom0) 
> to virtio configuration you would get iommu-map property here as well 
> [1]. This is another point to think about when considering combined 
> approach (single PCI Host bridge node -> single virtual root complex), I 
> guess usual PCI device doesn't want grant based DMA addresses, correct? 
> If so, it shouldn't be specified in the property.

Right, a usual PCI device doesn't want/need grant based DMA addresses. The iommu-map property [2] is flexible enough to make it apply only to certain vBDFs or ranges of vBDFs. Actually, it looks like ("libxl/arm: Reuse generic PCI-IOMMU bindings for virtio-pci devices") already has the logic for doing exactly this. So it should be fine to have the iommu-map property in the single root complex and still do regular PCI passthrough.

Presumably, we would also want msi-map [3] to apply to some vBDFs, not others. msi-map has the same flexible BDF matching as iommu-map (these two bindings actually share the same parsing function), so we should be able to use a similar strategy here if needed.

We would also want interrupt-map [4] to apply to some vBDFs, not others. The BDF matching with interrupt-map behaves slightly differently, but I believe it is still flexible enough to achieve this. In this case, the function create_virtio_pci_irq_map(), added in ("libxl/arm: Add basic virtio-pci support"), would need some re-thinking.

In summary, with a single virtual root complex, the toolstack needs to know which vBDFs are virtio-pci, and which vBDFs are passthrough, so it can create the [iommu,msi,interrupt]-map properties accordingly.

[2] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
[3] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-msi.txt
[4] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.yaml

> 
> 
>>      };
>>
>> Xen vPCI doesn't currently expose a host bridge (i.e. a device with base class 0x06). As an aside, we may eventually want to expose a virtual/emulated host bridge in vPCI, because Linux's x86 PCI probe expects one [0].
>>
>> Qemu exposes an emulated host bridge, along with any requested emulated devices.
>>
>> Running lspci -v in the domU yields the following:
>>
>> 0000:00:00.0 Network controller: Ralink corp. RT2790 Wireless 802.11n 1T/2R PCIe
>>          Subsystem: ASUSTeK Computer Inc. RT2790 Wireless 802.11n 1T/2R PCIe
>>          Flags: bus master, fast devsel, latency 0, IRQ 13
>>          Memory at 23000000 (32-bit, non-prefetchable) [size=64K]
>>          Capabilities: [50] MSI: Enable- Count=1/128 Maskable- 64bit+
>>          Kernel driver in use: rt2800pci
>>
>> 0001:00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
>>          Subsystem: Red Hat, Inc. QEMU PCIe Host bridge
>>          Flags: fast devsel
>>
>> 0001:00:01.0 Communication controller: Red Hat, Inc. Virtio console
>>          Subsystem: Red Hat, Inc. Virtio console
>>          Flags: bus master, fast devsel, latency 0, IRQ 14
>>          Memory at 3a000000 (64-bit, prefetchable) [size=16K]
>>          Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
>>          Capabilities: [70] Vendor Specific Information: VirtIO: Notify
>>          Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
>>          Capabilities: [50] Vendor Specific Information: VirtIO: ISR
>>          Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
>>          Kernel driver in use: virtio-pci
>>
>> 0000:00:00.0 is a real passed through device (corresponding to 0000:01:00.0 in dom0).
>> 0001:00:00.0 is the qemu host bridge (base class 0x06).
>> They are on different segments because they are associated with different root complexes.
> 
> 
> Glad to hear this patch series doesn't seem to break PCI passthrough in 
> your environment.
> 
> 
>>
>>
>> Back to the question: Sure, avoiding reserving more memory from the preciously small lowmem virtual memory layout is probably a good idea. With everything in a single virtual root complex (segment), it's probably possible to come up with some vBDF-picking algorithm (+ user ability to specify) that works for most use cases as discussed elsewhere. It will always be in a single fixed segment as far as I can tell.
>>
>> Some more observations assuming a single virtual root complex:
>>
>> We should probably hide the qemu host bridge(s) from the guest. In other words, hide all devices with base class 0x06, except eventually vPCI's own virtual host bridge. If we don't hide them, we would likely end up with multiple emulated host bridges on a single root complex (segment). That sounds messy and hard to manage.
>>
>> We have a need to control the vBDF exposed to the guest - can we force qemu to use particular BDFs for its emulated devices?
> 
> 
> Yes, it is possible. Maybe there is a better way, but at
> least *bus* and *addr* can be specified and Qemu indeed follows that.
> 
> device_model_args=[ '-device', 
> 'virtio-blk-pci,scsi=off,disable-legacy=on,iommu_platform=on,bus=pcie.0,addr=2,drive=image', 
> '-drive', 'if=none,id=image,format=raw,file=/dev/mmcblk1p3' ]
> 
> virtio=[ "backend=Domain-0, type=virtio,device, transport=pci, 
> bdf=0000:00:02.0, grant_usage=1, backend_type=qemu" ]
> 
> root@h3ulcb-domd:~# dmesg | grep virtio
> [   0.660789] virtio-pci 0000:00:02.0: enabling device (0000 -> 0002)
> [   0.715876] virtio_blk virtio0: [vda] 4096 512-byte logical blocks 
> (2.10 MB/2.00 MiB)
> 
> root@h3ulcb-domd:~# lspci
> 00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
> 00:02.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
> 
> Also there is one moment for current series: bdf specified for 
> virtio-pci device only makes sense for iommu-map property. So 
> bdf=0000:00:02.0 in virtio property and bus=pcie.0,addr=2 in 
> device_model_args property should be in sync.
> 
> [1] 
> https://patchwork.kernel.org/project/xen-devel/patch/20231115112611.3865905-5-Sergiy_Kibrik@epam.com/
> 
> 
> [snip]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
  2023-11-21 19:12             ` Stewart Hildebrand
@ 2023-11-22  1:14               ` Stefano Stabellini
  0 siblings, 0 replies; 30+ messages in thread
From: Stefano Stabellini @ 2023-11-22  1:14 UTC (permalink / raw)
  To: Stewart Hildebrand
  Cc: Oleksandr Tyshchenko, Julien Grall, Sergiy Kibrik,
	xen-devel@lists.xenproject.org, Bertrand Marquis, Michal Orzel,
	Volodymyr Babchuk, vikram.garhwal@amd.com, Stefano Stabellini

On Tue, 21 Nov 2023, Stewart Hildebrand wrote:
> On 11/17/23 03:11, Oleksandr Tyshchenko wrote:
> > 
> > 
> > On 17.11.23 05:31, Stewart Hildebrand wrote:
> > 
> > Hello Stewart
> > 
> > [answering only for virtio-pci bits as for vPCI I am only familiar with 
> > code responsible for trapping config space accesses]
> > 
> > [snip]
> > 
> >>>
> >>>
> >>> Let me start by saying that if we can get away with it, I think that a
> >>> single PCI Root Complex in Xen would be best because it requires less
> >>> complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
> >>> one?
> >>>
> >>> Stewart, you are deep into vPCI, what's your thinking?
> >>
> >> First allow me explain the moving pieces in a bit more detail (skip ahead to "Back to the question: " if you don't want to be bored with the details). I played around with this series, and I passed through a PCI device (with vPCI) and enabled virtio-pci:
> >>
> >> virtio = [ "type=virtio,device,transport=pci,bdf=0000:00:00.0,backend_type=qemu" ]
> >> device_model_args = [ "-device", "virtio-serial-pci" ]
> >> pci = [ "01:00.0" ]
> >>
> >> Indeed we get two root complexes (2 ECAM ranges, 2 sets of interrupts, etc.) from the domU point of view:
> >>
> >>      pcie@10000000 {
> >>          compatible = "pci-host-ecam-generic";
> >>          device_type = "pci";
> >>          reg = <0x00 0x10000000 0x00 0x10000000>;
> >>          bus-range = <0x00 0xff>;
> >>          #address-cells = <0x03>;
> >>          #size-cells = <0x02>;
> >>          status = "okay";
> >>          ranges = <0x2000000 0x00 0x23000000 0x00 0x23000000 0x00 0x10000000 0x42000000 0x01 0x00 0x01 0x00 0x01 0x00>;
> >>          #interrupt-cells = <0x01>;
> >>          interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x74 0x04>;
> >>          interrupt-map-mask = <0x00 0x00 0x00 0x07>;
> > 
> > 
> > I am wondering how you got interrupt-map here? AFAIR upstream toolstack 
> > doesn't add that property for vpci dt node.
> 
> You are correct. I'm airing my dirty laundry here: this is a hack to get a legacy interrupt passed through on a ZCU102 board (Zynq UltraScale+), which doesn't have GICv3/ITS. In a system with a GICv3/ITS, we would have an msi-map property (this is also not yet upstream, but is in a couple of downstreams).
> 
> > 
> >>      };
> >>
> >>      pcie@33000000 {
> >>          compatible = "pci-host-ecam-generic";
> >>          device_type = "pci";
> >>          reg = <0x00 0x33000000 0x00 0x200000>;
> >>          bus-range = <0x00 0x01>;
> >>          #address-cells = <0x03>;
> >>          #size-cells = <0x02>;
> >>          status = "okay";
> >>          ranges = <0x2000000 0x00 0x34000000 0x00 0x34000000 0x00 0x800000 0x42000000 0x00 0x3a000000 0x00 0x3a000000 0x00 0x800000>;
> >>          dma-coherent;
> >>          #interrupt-cells = <0x01>;
> >>          interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x0c 0x04 0x00 0x00 0x00 0x02 0xfde8 0x00 0x0d 0x04 0x00 0x00 0x00 0x03 0xfde8 0x00 0x0e 0x04 0x00 0x00 0x00 0x04 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x01 0xfde8 0x00 0x0d 0x04 0x800 0x00 0x00 0x02 0xfde8 0x00 0x0e 0x04 0x800 0x00 0x00 0x03 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x04 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x01 0xfde8 0x00 0x0e 0x04 0x1000 0x00 0x00 0x02 0xfde8 0x00 0x0f 0x04 0x1000 0x00 0x00 0x03 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x04 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x01 0xfde8 0x00 0x0f 0x04 0x1800 0x00 0x00 0x02 0xfde8 0x00 0x0c 0x04 0x1800 0x00 0x00 0x03 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x04 0xfde8 0x00 0x0e 0x04>;
> >>          interrupt-map-mask = <0x1800 0x00 0x00 0x07>;
> > 
> > 
> > that is correct dump.
> > 
> > BTW, if you added "grant_usage=1" (it is disabled by default for dom0) 
> > to virtio configuration you would get iommu-map property here as well 
> > [1]. This is another point to think about when considering combined 
> > approach (single PCI Host bridge node -> single virtual root complex), I 
> > guess usual PCI device doesn't want grant based DMA addresses, correct? 
> > If so, it shouldn't be specified in the property.
> 
> Right, a usual PCI device doesn't want/need grant based DMA addresses. The iommu-map property [2] is flexible enough to make it apply only to certain vBDFs or ranges of vBDFs. Actually, it looks like ("libxl/arm: Reuse generic PCI-IOMMU bindings for virtio-pci devices") already has the logic for doing exactly this. So it should be fine to have the iommu-map property in the single root complex and still do regular PCI passthrough.
> 
> Presumably, we would also want msi-map [3] to apply to some vBDFs, not others. msi-map has the same flexible BDF matching as iommu-map (these two bindings actually share the same parsing function), so we should be able to use a similar strategy here if needed.
> 
> We would also want interrupt-map [4] to apply to some vBDFs, not others. The BDF matching with interrupt-map behaves slightly differently, but I believe it is still flexible enough to achieve this. In this case, the function create_virtio_pci_irq_map(), added in ("libxl/arm: Add basic virtio-pci support"), would need some re-thinking.
> 
> In summary, with a single virtual root complex, the toolstack needs to know which vBDFs are virtio-pci, and which vBDFs are passthrough, so it can create the [iommu,msi,interrupt]-map properties accordingly.

That should be fine given that the toolstack also knows all the PCI
Passthrough devices too.


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-11-22  1:14 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-15 11:26 [RFC PATCH 0/6] ARM virtio-pci initial support Sergiy Kibrik
2023-11-15 11:26 ` [RFC PATCH 1/6] libxl: Pass max_vcpus to Qemu in case of PVH domain (Arm) as well Sergiy Kibrik
2023-11-15 11:26 ` [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci Sergiy Kibrik
2023-11-15 12:33   ` Julien Grall
2023-11-15 16:51     ` Oleksandr Tyshchenko
2023-11-15 17:31       ` Julien Grall
2023-11-15 18:14         ` Oleksandr Tyshchenko
2023-11-15 18:33           ` Julien Grall
2023-11-15 19:38             ` Oleksandr Tyshchenko
2023-11-15 19:56               ` Julien Grall
2023-11-17 13:19         ` Sergiy Kibrik
2023-11-17 18:24           ` Julien Grall
2023-11-15 23:28       ` Stefano Stabellini
2023-11-16 15:07         ` Volodymyr Babchuk
2023-11-16 15:12           ` Julien Grall
2023-11-16 15:26             ` Stewart Hildebrand
2023-11-16 15:58               ` Julien Grall
2023-11-16 16:53                 ` Volodymyr Babchuk
2023-11-16 17:27                   ` Julien Grall
2023-11-16 23:04           ` Stefano Stabellini
2023-11-17  0:23             ` Volodymyr Babchuk
2023-11-17  3:31         ` Stewart Hildebrand
2023-11-17  8:11           ` Oleksandr Tyshchenko
2023-11-21 19:12             ` Stewart Hildebrand
2023-11-22  1:14               ` Stefano Stabellini
2023-11-15 11:26 ` [RFC PATCH 3/6] libxl/arm: Add basic virtio-pci support Sergiy Kibrik
2023-11-15 11:26 ` [RFC PATCH 4/6] libxl/arm: Reuse generic PCI-IOMMU bindings for virtio-pci devices Sergiy Kibrik
2023-11-15 11:26 ` [RFC PATCH 5/6] xen/arm: Intercept vPCI config accesses and forward them to emulator Sergiy Kibrik
2023-11-15 12:45   ` Julien Grall
2023-11-15 23:30     ` Stefano Stabellini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.