* [PATCH v1 3/6] vpci: rename and export vpci_bar_add_rangeset
2025-07-25 14:24 [PATCH v1 0/6] Implement SR-IOV support for PVH Mykyta Poturai
@ 2025-07-25 14:24 ` Mykyta Poturai
2025-07-25 14:54 ` Teddy Astie
2025-07-28 9:51 ` Roger Pau Monné
2025-07-25 14:24 ` [PATCH v1 2/6] vpci: rename and export vpci_guest_mem_bar_{read,write} Mykyta Poturai
` (5 subsequent siblings)
6 siblings, 2 replies; 19+ messages in thread
From: Mykyta Poturai @ 2025-07-25 14:24 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Mykyta Poturai, Roger Pau Monné
Export functions required for SR-IOV support.
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
| 8 ++++----
xen/include/xen/vpci.h | 2 ++
2 files changed, 6 insertions(+), 4 deletions(-)
--git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index f33fb27bde..f947f652cd 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -733,8 +733,8 @@ static void cf_check rom_write(
}
}
-static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
- unsigned int i)
+int vpci_bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
+ unsigned int i)
{
char str[32];
@@ -950,7 +950,7 @@ static int cf_check init_header(struct pci_dev *pdev)
else
bars[i].type = VPCI_BAR_MEM32;
- rc = bar_add_rangeset(pdev, &bars[i], i);
+ rc = vpci_bar_add_rangeset(pdev, &bars[i], i);
if ( rc )
goto fail;
@@ -1009,7 +1009,7 @@ static int cf_check init_header(struct pci_dev *pdev)
rom->type = VPCI_BAR_EMPTY;
else
{
- rc = bar_add_rangeset(pdev, rom, num_bars);
+ rc = vpci_bar_add_rangeset(pdev, rom, num_bars);
if ( rc )
goto fail;
}
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 0f0f321023..06f7039f20 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -294,6 +294,8 @@ bool vpci_ecam_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int len,
/* Map/unmap the BARs of a vPCI device. */
int vpci_modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only);
+int vpci_bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
+ unsigned int i);
#endif /* __XEN__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH v1 3/6] vpci: rename and export vpci_bar_add_rangeset
2025-07-25 14:24 ` [PATCH v1 3/6] vpci: rename and export vpci_bar_add_rangeset Mykyta Poturai
@ 2025-07-25 14:54 ` Teddy Astie
2025-07-28 9:51 ` Roger Pau Monné
1 sibling, 0 replies; 19+ messages in thread
From: Teddy Astie @ 2025-07-25 14:54 UTC (permalink / raw)
To: Mykyta Poturai, xen-devel; +Cc: Roger Pau Monné
Le 25/07/2025 à 16:26, Mykyta Poturai a écrit :
> Export functions required for SR-IOV support.
>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> ---
> xen/drivers/vpci/header.c | 8 ++++----
> xen/include/xen/vpci.h | 2 ++
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index f33fb27bde..f947f652cd 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -733,8 +733,8 @@ static void cf_check rom_write(
> }
> }
>
> -static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
> - unsigned int i)
> +int vpci_bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
> + unsigned int i)
> {
> char str[32];
>
> @@ -950,7 +950,7 @@ static int cf_check init_header(struct pci_dev *pdev)
> else
> bars[i].type = VPCI_BAR_MEM32;
>
> - rc = bar_add_rangeset(pdev, &bars[i], i);
> + rc = vpci_bar_add_rangeset(pdev, &bars[i], i);
> if ( rc )
> goto fail;
>
> @@ -1009,7 +1009,7 @@ static int cf_check init_header(struct pci_dev *pdev)
> rom->type = VPCI_BAR_EMPTY;
> else
> {
> - rc = bar_add_rangeset(pdev, rom, num_bars);
> + rc = vpci_bar_add_rangeset(pdev, rom, num_bars);
> if ( rc )
> goto fail;
> }
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 0f0f321023..06f7039f20 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -294,6 +294,8 @@ bool vpci_ecam_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int len,
>
> /* Map/unmap the BARs of a vPCI device. */
> int vpci_modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only);
> +int vpci_bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
> + unsigned int i);
^
nit, alignment is a bit off
With that fixed
Reviewed-by: Teddy Astie <teddy.astie@vates.tech>
>
> #endif /* __XEN__ */
>
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH v1 3/6] vpci: rename and export vpci_bar_add_rangeset
2025-07-25 14:24 ` [PATCH v1 3/6] vpci: rename and export vpci_bar_add_rangeset Mykyta Poturai
2025-07-25 14:54 ` Teddy Astie
@ 2025-07-28 9:51 ` Roger Pau Monné
1 sibling, 0 replies; 19+ messages in thread
From: Roger Pau Monné @ 2025-07-28 9:51 UTC (permalink / raw)
To: Mykyta Poturai; +Cc: xen-devel@lists.xenproject.org
On Fri, Jul 25, 2025 at 02:24:32PM +0000, Mykyta Poturai wrote:
> Export functions required for SR-IOV support.
>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> ---
> xen/drivers/vpci/header.c | 8 ++++----
> xen/include/xen/vpci.h | 2 ++
> 2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index f33fb27bde..f947f652cd 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -733,8 +733,8 @@ static void cf_check rom_write(
> }
> }
>
> -static int bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
> - unsigned int i)
> +int vpci_bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
> + unsigned int i)
Inline from my reply on the BAR register write series, it might be
less wasteful to allocate the ranges per-vCPU rather than per-device.
The only purpose of those rangesets is for BAR mapping, and such
mapping is performed exclusively for each device on a single vCPU.
With that there's probably no need to export this function, as it will
no longer exist.
Thanks, Roger.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v1 2/6] vpci: rename and export vpci_guest_mem_bar_{read,write}
2025-07-25 14:24 [PATCH v1 0/6] Implement SR-IOV support for PVH Mykyta Poturai
2025-07-25 14:24 ` [PATCH v1 3/6] vpci: rename and export vpci_bar_add_rangeset Mykyta Poturai
@ 2025-07-25 14:24 ` Mykyta Poturai
2025-07-25 14:51 ` Teddy Astie
2025-07-25 14:24 ` [PATCH v1 1/6] vpci: rename and export vpci_modify_bars Mykyta Poturai
` (4 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Mykyta Poturai @ 2025-07-25 14:24 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Stewart Hildebrand, Roger Pau Monné, Mykyta Poturai
From: Stewart Hildebrand <stewart.hildebrand@amd.com>
Export functions required for SR-IOV support.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
| 20 +++++++++++---------
xen/include/xen/vpci.h | 6 ++++++
2 files changed, 17 insertions(+), 9 deletions(-)
--git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index c4d8c45640..f33fb27bde 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -610,9 +610,9 @@ static void cf_check bar_write(
pci_conf_write32(pdev->sbdf, reg, val);
}
-static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
- unsigned int reg, uint32_t val,
- void *data)
+void cf_check vpci_guest_mem_bar_write(const struct pci_dev *pdev,
+ unsigned int reg, uint32_t val,
+ void *data)
{
struct vpci_bar *bar = data;
bool hi = false;
@@ -652,8 +652,8 @@ static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
bar->guest_addr = guest_addr;
}
-static uint32_t cf_check guest_mem_bar_read(const struct pci_dev *pdev,
- unsigned int reg, void *data)
+uint32_t cf_check vpci_guest_mem_bar_read(const struct pci_dev *pdev,
+ unsigned int reg, void *data)
{
const struct vpci_bar *bar = data;
uint32_t reg_val;
@@ -920,8 +920,9 @@ static int cf_check init_header(struct pci_dev *pdev)
bars[i].type = VPCI_BAR_MEM64_HI;
rc = vpci_add_register(pdev->vpci,
is_hwdom ? vpci_hw_read32
- : guest_mem_bar_read,
- is_hwdom ? bar_write : guest_mem_bar_write,
+ : vpci_guest_mem_bar_read,
+ is_hwdom ? bar_write
+ : vpci_guest_mem_bar_write,
reg, 4, &bars[i]);
if ( rc )
goto fail;
@@ -979,8 +980,9 @@ static int cf_check init_header(struct pci_dev *pdev)
bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
rc = vpci_add_register(pdev->vpci,
- is_hwdom ? vpci_hw_read32 : guest_mem_bar_read,
- is_hwdom ? bar_write : guest_mem_bar_write,
+ is_hwdom ? vpci_hw_read32
+ : vpci_guest_mem_bar_read,
+ is_hwdom ? bar_write : vpci_guest_mem_bar_write,
reg, 4, &bars[i]);
if ( rc )
goto fail;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 5ef35b23c7..0f0f321023 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -69,6 +69,12 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
uint32_t cf_check vpci_read_val(
const struct pci_dev *pdev, unsigned int reg, void *data);
+void cf_check vpci_guest_mem_bar_write(const struct pci_dev *pdev,
+ unsigned int reg, uint32_t val,
+ void *data);
+uint32_t cf_check vpci_guest_mem_bar_read(const struct pci_dev *pdev,
+ unsigned int reg, void *data);
+
/* Passthrough handlers. */
uint32_t cf_check vpci_hw_read8(
const struct pci_dev *pdev, unsigned int reg, void *data);
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH v1 1/6] vpci: rename and export vpci_modify_bars
2025-07-25 14:24 [PATCH v1 0/6] Implement SR-IOV support for PVH Mykyta Poturai
2025-07-25 14:24 ` [PATCH v1 3/6] vpci: rename and export vpci_bar_add_rangeset Mykyta Poturai
2025-07-25 14:24 ` [PATCH v1 2/6] vpci: rename and export vpci_guest_mem_bar_{read,write} Mykyta Poturai
@ 2025-07-25 14:24 ` Mykyta Poturai
2025-07-25 14:42 ` Teddy Astie
2025-07-25 14:24 ` [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0 Mykyta Poturai
` (3 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Mykyta Poturai @ 2025-07-25 14:24 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Stewart Hildebrand, Roger Pau Monné, Mykyta Poturai
From: Stewart Hildebrand <stewart.hildebrand@amd.com>
Export functions required for SR-IOV support.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
| 16 +++++++++-------
xen/include/xen/vpci.h | 3 +++
2 files changed, 12 insertions(+), 7 deletions(-)
--git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index bb76e70799..c4d8c45640 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -304,7 +304,7 @@ static void defer_map(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
raise_softirq(SCHEDULE_SOFTIRQ);
}
-static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
+int vpci_modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
{
struct vpci_header *header = &pdev->vpci->header;
struct pci_dev *tmp;
@@ -545,7 +545,7 @@ static void cf_check cmd_write(
* memory decoding bit has not been changed, so leave everything as-is,
* hoping the guest will realize and try again.
*/
- modify_bars(pdev, cmd, false);
+ vpci_modify_bars(pdev, cmd, false);
else
pci_conf_write16(pdev->sbdf, reg, cmd);
}
@@ -713,13 +713,15 @@ static void cf_check rom_write(
* Pass PCI_COMMAND_MEMORY or 0 to signal a map/unmap request, note that
* this fabricated command is never going to be written to the register.
*/
- else if ( modify_bars(pdev, new_enabled ? PCI_COMMAND_MEMORY : 0, true) )
+ else if ( vpci_modify_bars(pdev,
+ new_enabled ? PCI_COMMAND_MEMORY : 0,
+ true) )
/*
* No memory has been added or removed from the p2m (because the actual
* p2m changes are deferred in defer_map) and the ROM enable bit has
* not been changed, so leave everything as-is, hoping the guest will
* realize and try again. It's important to not update rom->addr in the
- * unmap case if modify_bars has failed, or future attempts would
+ * unmap case if vpci_modify_bars has failed, or future attempts would
* attempt to unmap the wrong address.
*/
return;
@@ -894,8 +896,8 @@ static int cf_check init_header(struct pci_dev *pdev)
/*
* For DomUs, clear PCI_COMMAND_{MASTER,MEMORY,IO} and other
* DomU-controllable bits in PCI_COMMAND. Devices assigned to DomUs will
- * start with memory decoding disabled, and modify_bars() will not be called
- * at the end of this function.
+ * start with memory decoding disabled, and vpci_modify_bars() will not be
+ * called at the end of this function.
*/
if ( !is_hwdom )
cmd &= ~(PCI_COMMAND_VGA_PALETTE | PCI_COMMAND_INVALIDATE |
@@ -1020,7 +1022,7 @@ static int cf_check init_header(struct pci_dev *pdev)
goto fail;
}
- return (cmd & PCI_COMMAND_MEMORY) ? modify_bars(pdev, cmd, false) : 0;
+ return (cmd & PCI_COMMAND_MEMORY) ? vpci_modify_bars(pdev, cmd, false) : 0;
fail:
pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6a481a4e89..5ef35b23c7 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -286,6 +286,9 @@ bool vpci_ecam_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int len,
bool vpci_ecam_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int len,
unsigned long *data);
+/* Map/unmap the BARs of a vPCI device. */
+int vpci_modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only);
+
#endif /* __XEN__ */
#else /* !CONFIG_HAS_VPCI */
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0
2025-07-25 14:24 [PATCH v1 0/6] Implement SR-IOV support for PVH Mykyta Poturai
` (2 preceding siblings ...)
2025-07-25 14:24 ` [PATCH v1 1/6] vpci: rename and export vpci_modify_bars Mykyta Poturai
@ 2025-07-25 14:24 ` Mykyta Poturai
2025-07-25 17:39 ` Teddy Astie
2025-07-28 11:33 ` Roger Pau Monné
2025-07-25 14:24 ` [PATCH v1 5/6] vpci: export vpci_init_capability_list() Mykyta Poturai
` (2 subsequent siblings)
6 siblings, 2 replies; 19+ messages in thread
From: Mykyta Poturai @ 2025-07-25 14:24 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Stewart Hildebrand, Oleksii Kurochko, Community Manager,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
Mykyta Poturai
From: Stewart Hildebrand <stewart.hildebrand@amd.com>
This code is expected to only be used by privileged domains,
unprivileged domains should not get access to the SR-IOV capability.
Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account
for possible changes in the system page size register.
Relies on dom0 to enable SR-IOV and PHYSDEVOP to inform Xen about
addition/removal of VFs.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
CHANGELOG.md | 3 +-
SUPPORT.md | 2 -
xen/drivers/vpci/Makefile | 2 +-
| 3 +
xen/drivers/vpci/sriov.c | 235 ++++++++++++++++++++++++++++++++++++++
xen/drivers/vpci/vpci.c | 1 +
xen/include/xen/vpci.h | 7 +-
7 files changed, 247 insertions(+), 6 deletions(-)
create mode 100644 xen/drivers/vpci/sriov.c
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5f31ca08fe..7b0e8beb76 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -23,8 +23,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- On x86:
- Option to attempt to fixup p2m page-faults on PVH dom0.
- Resizable BARs is supported for PVH dom0.
- - Support PCI passthrough for HVM domUs when dom0 is PVH (note SR-IOV
- capability usage is not yet supported on PVH dom0).
+ - Support PCI passthrough for HVM domUs when dom0 is PVH.
- Smoke tests for the FreeBSD Xen builds in Cirrus CI.
- On Arm:
diff --git a/SUPPORT.md b/SUPPORT.md
index 6a82a92189..830b598714 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -170,8 +170,6 @@ unexpected behavior or issues on some hardware.
At least the following features are missing on a PVH dom0:
- * PCI SR-IOV.
-
* Native NMI forwarding (nmi=dom0 command line option).
* MCE handling.
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index a7c8a30a89..fe1e57b64d 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1,2 +1,2 @@
-obj-y += vpci.o header.o rebar.o
+obj-y += vpci.o header.o rebar.o sriov.o
obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o
--git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index f947f652cd..0a840c6dcc 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -839,6 +839,9 @@ static int cf_check init_header(struct pci_dev *pdev)
ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+ if ( pdev->info.is_virtfn )
+ return 0;
+
switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
{
case PCI_HEADER_TYPE_NORMAL:
diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
new file mode 100644
index 0000000000..640430e3e9
--- /dev/null
+++ b/xen/drivers/vpci/sriov.c
@@ -0,0 +1,235 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Handlers for accesses to the SR-IOV capability structure.
+ *
+ * Copyright (C) 2018 Citrix Systems R&D
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+
+static int vf_init_bars(const struct pci_dev *vf_pdev)
+{
+ unsigned int i, sriov_pos;
+ int vf_idx, rc;
+ const struct pci_dev *pf_pdev = vf_pdev->pf_pdev;
+ uint16_t offset, stride;
+ struct vpci_bar *bars = vf_pdev->vpci->header.bars;
+ struct vpci_bar *physfn_vf_bars = pf_pdev->vpci->sriov->vf_bars;
+
+ sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
+ offset = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_VF_OFFSET);
+ stride = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_VF_STRIDE);
+
+ vf_idx = vf_pdev->sbdf.sbdf;
+ vf_idx -= pf_pdev->sbdf.sbdf + offset;
+ if ( vf_idx < 0 )
+ return -EINVAL;
+ if ( stride )
+ {
+ if ( vf_idx % stride )
+ return -EINVAL;
+ vf_idx /= stride;
+ }
+
+ /*
+ * Set up BARs for this VF out of PF's VF BARs taking into account
+ * the index of the VF.
+ */
+ for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
+ {
+ bars[i].addr = physfn_vf_bars[i].addr + vf_idx * physfn_vf_bars[i].size;
+ bars[i].guest_addr = bars[i].addr;
+ bars[i].size = physfn_vf_bars[i].size;
+ bars[i].type = physfn_vf_bars[i].type;
+ bars[i].prefetchable = physfn_vf_bars[i].prefetchable;
+ rc = vpci_bar_add_rangeset(vf_pdev, &bars[i], i);
+ if ( rc )
+ return rc;
+ }
+
+ return 0;
+}
+
+static int map_vf(const struct pci_dev *vf_pdev, uint16_t cmd)
+{
+ int rc;
+
+ ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock));
+
+ rc = vf_init_bars(vf_pdev);
+ if ( rc )
+ return rc;
+
+ return vpci_modify_bars(vf_pdev, cmd, false);
+}
+
+static int size_vf_bars(struct pci_dev *pf_pdev, unsigned int sriov_pos)
+{
+ /*
+ * NB: a non-const pci_dev of the PF is needed in order to update
+ * vf_rlen.
+ */
+ struct vpci_bar *bars;
+ unsigned int i;
+ int rc = 0;
+
+ ASSERT(rw_is_write_locked(&pf_pdev->domain->pci_lock));
+ ASSERT(!pf_pdev->info.is_virtfn);
+
+ if ( !pf_pdev->vpci->sriov )
+ return -EINVAL;
+
+ /* Read BARs for VFs out of PF's SR-IOV extended capability. */
+ bars = pf_pdev->vpci->sriov->vf_bars;
+ /* Set the BARs addresses and size. */
+ for ( i = 0; i < PCI_SRIOV_NUM_BARS; i += rc )
+ {
+ unsigned int idx = sriov_pos + PCI_SRIOV_BAR + i * 4;
+ uint32_t bar;
+ uint64_t addr, size;
+
+ bar = pci_conf_read32(pf_pdev->sbdf, idx);
+
+ rc = pci_size_mem_bar(pf_pdev->sbdf, idx, &addr, &size,
+ PCI_BAR_VF |
+ ((i == PCI_SRIOV_NUM_BARS - 1) ? PCI_BAR_LAST
+ : 0));
+
+ /*
+ * Update vf_rlen on the PF. According to the spec the size of
+ * the BARs can change if the system page size register is
+ * modified, so always update rlen when enabling VFs.
+ */
+ pf_pdev->physfn.vf_rlen[i] = size;
+
+ if ( !size )
+ {
+ bars[i].type = VPCI_BAR_EMPTY;
+ continue;
+ }
+
+ bars[i].addr = addr;
+ bars[i].guest_addr = addr;
+ bars[i].size = size;
+ bars[i].prefetchable = bar & PCI_BASE_ADDRESS_MEM_PREFETCH;
+
+ switch ( rc )
+ {
+ case 1:
+ bars[i].type = VPCI_BAR_MEM32;
+ break;
+
+ case 2:
+ bars[i].type = VPCI_BAR_MEM64_LO;
+ bars[i + 1].type = VPCI_BAR_MEM64_HI;
+ break;
+
+ default:
+ ASSERT_UNREACHABLE();
+ }
+ }
+
+ rc = rc > 0 ? 0 : rc;
+
+ return rc;
+}
+
+static void cf_check control_write(const struct pci_dev *pdev, unsigned int reg,
+ uint32_t val, void *data)
+{
+ unsigned int sriov_pos = reg - PCI_SRIOV_CTRL;
+ uint16_t control = pci_conf_read16(pdev->sbdf, reg);
+ bool mem_enabled = control & PCI_SRIOV_CTRL_MSE;
+ bool new_mem_enabled = val & PCI_SRIOV_CTRL_MSE;
+
+ ASSERT(!pdev->info.is_virtfn);
+
+ if ( new_mem_enabled != mem_enabled )
+ {
+ struct pci_dev *vf_pdev;
+ if ( new_mem_enabled )
+ {
+ /* FIXME casting away const-ness to modify vf_rlen */
+ size_vf_bars((struct pci_dev *)pdev, sriov_pos);
+
+ list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
+ map_vf(vf_pdev, PCI_COMMAND_MEMORY);
+ }
+ else
+ {
+ list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
+ map_vf(vf_pdev, 0);
+ }
+ }
+
+ pci_conf_write16(pdev->sbdf, reg, val);
+}
+
+static int vf_init_header(struct pci_dev *vf_pdev)
+{
+ const struct pci_dev *pf_pdev;
+ unsigned int sriov_pos;
+ int rc = 0;
+ uint16_t ctrl;
+
+ ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock));
+
+ if ( !vf_pdev->info.is_virtfn )
+ return 0;
+
+ pf_pdev = vf_pdev->pf_pdev;
+ ASSERT(pf_pdev);
+
+ sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
+ ctrl = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_CTRL);
+
+ if ( (pf_pdev->domain == vf_pdev->domain) && (ctrl & PCI_SRIOV_CTRL_MSE) )
+ {
+ rc = map_vf(vf_pdev, PCI_COMMAND_MEMORY);
+ if ( rc )
+ return rc;
+ }
+
+ return rc;
+}
+
+static int init_sriov(struct pci_dev *pdev)
+{
+ unsigned int pos;
+
+ if ( pdev->info.is_virtfn )
+ return vf_init_header(pdev);
+
+ pos = pci_find_ext_capability(pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
+
+ if ( !pos )
+ return 0;
+
+ if ( !is_hardware_domain(pdev->domain) )
+ {
+ printk(XENLOG_ERR
+ "%pp: SR-IOV configuration unsupported for unpriv %pd\n",
+ &pdev->sbdf, pdev->domain);
+ return 0;
+ }
+
+ pdev->vpci->sriov = xzalloc(struct vpci_sriov);
+ if ( !pdev->vpci->sriov )
+ return -ENOMEM;
+
+ return vpci_add_register(pdev->vpci, vpci_hw_read16, control_write,
+ pos + PCI_SRIOV_CTRL, 2, NULL);
+}
+
+REGISTER_VPCI_INIT(init_sriov, VPCI_PRIORITY_LOW);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 09988f04c2..7af6651831 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -120,6 +120,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
for ( i = 0; i < ARRAY_SIZE(pdev->vpci->header.bars); i++ )
rangeset_destroy(pdev->vpci->header.bars[i].mem);
+ xfree(pdev->vpci->sriov);
xfree(pdev->vpci->msix);
xfree(pdev->vpci->msi);
xfree(pdev->vpci);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 06f7039f20..9e8dcab17e 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -138,7 +138,6 @@ struct vpci {
* upon to know whether BARs are mapped into the guest p2m.
*/
bool bars_mapped : 1;
- /* FIXME: currently there's no support for SR-IOV. */
} header;
/* MSI data. */
@@ -192,6 +191,12 @@ struct vpci {
struct vpci_arch_msix_entry arch;
} entries[];
} *msix;
+
+ struct vpci_sriov {
+ /* PF only */
+ struct vpci_bar vf_bars[PCI_SRIOV_NUM_BARS];
+ } *sriov;
+
#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
/* Guest SBDF of the device. */
#define INVALID_GUEST_SBDF ((pci_sbdf_t){ .sbdf = ~0U })
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0
2025-07-25 14:24 ` [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0 Mykyta Poturai
@ 2025-07-25 17:39 ` Teddy Astie
2026-02-26 10:06 ` Mykyta Poturai
2025-07-28 11:33 ` Roger Pau Monné
1 sibling, 1 reply; 19+ messages in thread
From: Teddy Astie @ 2025-07-25 17:39 UTC (permalink / raw)
To: Mykyta Poturai, xen-devel
Cc: Stewart Hildebrand, Oleksii Kurochko, Community Manager,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Le 25/07/2025 à 16:26, Mykyta Poturai a écrit :
> From: Stewart Hildebrand <stewart.hildebrand@amd.com>
>
> This code is expected to only be used by privileged domains,
> unprivileged domains should not get access to the SR-IOV capability.
>
> Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
> map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account
> for possible changes in the system page size register.
>
> Relies on dom0 to enable SR-IOV and PHYSDEVOP to inform Xen about
> addition/removal of VFs.
>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> ---
> CHANGELOG.md | 3 +-
> SUPPORT.md | 2 -
> xen/drivers/vpci/Makefile | 2 +-
> xen/drivers/vpci/header.c | 3 +
> xen/drivers/vpci/sriov.c | 235 ++++++++++++++++++++++++++++++++++++++
> xen/drivers/vpci/vpci.c | 1 +
> xen/include/xen/vpci.h | 7 +-
> 7 files changed, 247 insertions(+), 6 deletions(-)
> create mode 100644 xen/drivers/vpci/sriov.c
>
> diff --git a/CHANGELOG.md b/CHANGELOG.md
> index 5f31ca08fe..7b0e8beb76 100644
> --- a/CHANGELOG.md
> +++ b/CHANGELOG.md
> @@ -23,8 +23,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
> - On x86:
> - Option to attempt to fixup p2m page-faults on PVH dom0.
> - Resizable BARs is supported for PVH dom0.
> - - Support PCI passthrough for HVM domUs when dom0 is PVH (note SR-IOV
> - capability usage is not yet supported on PVH dom0).
> + - Support PCI passthrough for HVM domUs when dom0 is PVH.
> - Smoke tests for the FreeBSD Xen builds in Cirrus CI.
>
> - On Arm:
> diff --git a/SUPPORT.md b/SUPPORT.md
> index 6a82a92189..830b598714 100644
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -170,8 +170,6 @@ unexpected behavior or issues on some hardware.
>
> At least the following features are missing on a PVH dom0:
>
> - * PCI SR-IOV.
> -
> * Native NMI forwarding (nmi=dom0 command line option).
>
> * MCE handling.
I would prefer changelog/support changes to be a separate patch or
alternatively at the last patch as I don't think SR-IOV is fully working
without the 2 remaining ones.
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> index a7c8a30a89..fe1e57b64d 100644
> --- a/xen/drivers/vpci/Makefile
> +++ b/xen/drivers/vpci/Makefile
> @@ -1,2 +1,2 @@
> -obj-y += vpci.o header.o rebar.o
> +obj-y += vpci.o header.o rebar.o sriov.o
> obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index f947f652cd..0a840c6dcc 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -839,6 +839,9 @@ static int cf_check init_header(struct pci_dev *pdev)
>
> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>
> + if ( pdev->info.is_virtfn )
> + return 0;
> +
> switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
> {
> case PCI_HEADER_TYPE_NORMAL:
> diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
> new file mode 100644
> index 0000000000..640430e3e9
> --- /dev/null
> +++ b/xen/drivers/vpci/sriov.c
> @@ -0,0 +1,235 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Handlers for accesses to the SR-IOV capability structure.
> + *
> + * Copyright (C) 2018 Citrix Systems R&D
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +static int vf_init_bars(const struct pci_dev *vf_pdev)
> +{
> + unsigned int i, sriov_pos;
> + int vf_idx, rc;
> + const struct pci_dev *pf_pdev = vf_pdev->pf_pdev;
> + uint16_t offset, stride;
> + struct vpci_bar *bars = vf_pdev->vpci->header.bars;
> + struct vpci_bar *physfn_vf_bars = pf_pdev->vpci->sriov->vf_bars;
> +
> + sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
> + offset = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_VF_OFFSET);
> + stride = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_VF_STRIDE);
> +
> + vf_idx = vf_pdev->sbdf.sbdf;
> + vf_idx -= pf_pdev->sbdf.sbdf + offset;
We can probably rewrite it as
vf_idx = vf_pdev->sbdf.sbdf - (pf_pdev->sbdf.sbdf + offset);
especially with sbdf being potentially not representable as a int (even
though very unlikely).
> + if ( vf_idx < 0 )
> + return -EINVAL;
> + if ( stride )
> + {
> + if ( vf_idx % stride )
> + return -EINVAL;
> + vf_idx /= stride;
> + }
> +
> + /*
> + * Set up BARs for this VF out of PF's VF BARs taking into account
> + * the index of the VF.
> + */
> + for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
> + {
> + bars[i].addr = physfn_vf_bars[i].addr + vf_idx * physfn_vf_bars[i].size;
> + bars[i].guest_addr = bars[i].addr;
> + bars[i].size = physfn_vf_bars[i].size;
> + bars[i].type = physfn_vf_bars[i].type;
> + bars[i].prefetchable = physfn_vf_bars[i].prefetchable;
> + rc = vpci_bar_add_rangeset(vf_pdev, &bars[i], i);
> + if ( rc )
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +static int map_vf(const struct pci_dev *vf_pdev, uint16_t cmd)
> +{
> + int rc;
> +
> + ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock));
> +
> + rc = vf_init_bars(vf_pdev);
> + if ( rc )
> + return rc;
> +
> + return vpci_modify_bars(vf_pdev, cmd, false);
> +}
> +
> +static int size_vf_bars(struct pci_dev *pf_pdev, unsigned int sriov_pos)
> +{
> + /*
> + * NB: a non-const pci_dev of the PF is needed in order to update
> + * vf_rlen.
> + */
> + struct vpci_bar *bars;
> + unsigned int i;
> + int rc = 0;
> +
> + ASSERT(rw_is_write_locked(&pf_pdev->domain->pci_lock));
> + ASSERT(!pf_pdev->info.is_virtfn);
> +
> + if ( !pf_pdev->vpci->sriov )
> + return -EINVAL;
> +
> + /* Read BARs for VFs out of PF's SR-IOV extended capability. */
> + bars = pf_pdev->vpci->sriov->vf_bars;
> + /* Set the BARs addresses and size. */
> + for ( i = 0; i < PCI_SRIOV_NUM_BARS; i += rc )
> + {
> + unsigned int idx = sriov_pos + PCI_SRIOV_BAR + i * 4;
> + uint32_t bar;
> + uint64_t addr, size;
> +
> + bar = pci_conf_read32(pf_pdev->sbdf, idx);
> +
> + rc = pci_size_mem_bar(pf_pdev->sbdf, idx, &addr, &size,
> + PCI_BAR_VF |
> + ((i == PCI_SRIOV_NUM_BARS - 1) ? PCI_BAR_LAST
> + : 0));
> +
> + /*
> + * Update vf_rlen on the PF. According to the spec the size of
> + * the BARs can change if the system page size register is
> + * modified, so always update rlen when enabling VFs.
> + */
> + pf_pdev->physfn.vf_rlen[i] = size;
> +
> + if ( !size )
> + {
> + bars[i].type = VPCI_BAR_EMPTY;
> + continue;
> + }
> +
> + bars[i].addr = addr;
> + bars[i].guest_addr = addr;
> + bars[i].size = size;
> + bars[i].prefetchable = bar & PCI_BASE_ADDRESS_MEM_PREFETCH;
> +
> + switch ( rc )
> + {
> + case 1:
> + bars[i].type = VPCI_BAR_MEM32;
> + break;
> +
> + case 2:
> + bars[i].type = VPCI_BAR_MEM64_LO;
> + bars[i + 1].type = VPCI_BAR_MEM64_HI;
> + break;
> +
> + default:
> + ASSERT_UNREACHABLE();
> + }
> + }
> +
> + rc = rc > 0 ? 0 : rc;
> +
> + return rc;
> +}
> +
> +static void cf_check control_write(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> + unsigned int sriov_pos = reg - PCI_SRIOV_CTRL;
> + uint16_t control = pci_conf_read16(pdev->sbdf, reg);
> + bool mem_enabled = control & PCI_SRIOV_CTRL_MSE;
> + bool new_mem_enabled = val & PCI_SRIOV_CTRL_MSE;
> +
> + ASSERT(!pdev->info.is_virtfn);
> +
> + if ( new_mem_enabled != mem_enabled )
> + {
> + struct pci_dev *vf_pdev;
> + if ( new_mem_enabled )
> + {
> + /* FIXME casting away const-ness to modify vf_rlen */
> + size_vf_bars((struct pci_dev *)pdev, sriov_pos);
> +
> + list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
> + map_vf(vf_pdev, PCI_COMMAND_MEMORY);
> + }
> + else
> + {
> + list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
> + map_vf(vf_pdev, 0);
> + }
> + }
> +
> + pci_conf_write16(pdev->sbdf, reg, val);
> +}
> +
> +static int vf_init_header(struct pci_dev *vf_pdev)
> +{
> + const struct pci_dev *pf_pdev;
> + unsigned int sriov_pos;
> + int rc = 0;
> + uint16_t ctrl;
> +
> + ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock));
> +
> + if ( !vf_pdev->info.is_virtfn )
> + return 0;
> +
> + pf_pdev = vf_pdev->pf_pdev;
> + ASSERT(pf_pdev);
> +
> + sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
> + ctrl = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_CTRL);
> +
> + if ( (pf_pdev->domain == vf_pdev->domain) && (ctrl & PCI_SRIOV_CTRL_MSE) )
> + {
> + rc = map_vf(vf_pdev, PCI_COMMAND_MEMORY);
> + if ( rc )
> + return rc;
> + }
> +
> + return rc;
> +}
> +
> +static int init_sriov(struct pci_dev *pdev)
> +{
> + unsigned int pos;
> +
> + if ( pdev->info.is_virtfn )
> + return vf_init_header(pdev);
> +
> + pos = pci_find_ext_capability(pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
> +
> + if ( !pos )
> + return 0;
> +
> + if ( !is_hardware_domain(pdev->domain) )
> + {
> + printk(XENLOG_ERR
> + "%pp: SR-IOV configuration unsupported for unpriv %pd\n",
> + &pdev->sbdf, pdev->domain);
> + return 0;
> + }
Should it instead rely on xsm to know if it the operation is allowed or
not for this domain ?
> +
> + pdev->vpci->sriov = xzalloc(struct vpci_sriov);
> + if ( !pdev->vpci->sriov )
> + return -ENOMEM;
> +
> + return vpci_add_register(pdev->vpci, vpci_hw_read16, control_write,
> + pos + PCI_SRIOV_CTRL, 2, NULL);
> +}
> +
> +REGISTER_VPCI_INIT(init_sriov, VPCI_PRIORITY_LOW);
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 09988f04c2..7af6651831 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -120,6 +120,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
> for ( i = 0; i < ARRAY_SIZE(pdev->vpci->header.bars); i++ )
> rangeset_destroy(pdev->vpci->header.bars[i].mem);
>
> + xfree(pdev->vpci->sriov);
> xfree(pdev->vpci->msix);
> xfree(pdev->vpci->msi);
> xfree(pdev->vpci);
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 06f7039f20..9e8dcab17e 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -138,7 +138,6 @@ struct vpci {
> * upon to know whether BARs are mapped into the guest p2m.
> */
> bool bars_mapped : 1;
> - /* FIXME: currently there's no support for SR-IOV. */
> } header;
>
> /* MSI data. */
> @@ -192,6 +191,12 @@ struct vpci {
> struct vpci_arch_msix_entry arch;
> } entries[];
> } *msix;
> +
> + struct vpci_sriov {
> + /* PF only */
> + struct vpci_bar vf_bars[PCI_SRIOV_NUM_BARS];
> + } *sriov;
> +
> #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> /* Guest SBDF of the device. */
> #define INVALID_GUEST_SBDF ((pci_sbdf_t){ .sbdf = ~0U })
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0
2025-07-25 17:39 ` Teddy Astie
@ 2026-02-26 10:06 ` Mykyta Poturai
0 siblings, 0 replies; 19+ messages in thread
From: Mykyta Poturai @ 2026-02-26 10:06 UTC (permalink / raw)
To: Teddy Astie, xen-devel@lists.xenproject.org
Cc: Stewart Hildebrand, Oleksii Kurochko, Community Manager,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
On 7/25/25 20:39, Teddy Astie wrote:
> Le 25/07/2025 à 16:26, Mykyta Poturai a écrit :
>> From: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>
>> This code is expected to only be used by privileged domains,
>> unprivileged domains should not get access to the SR-IOV capability.
>>
>> Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
>> map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account
>> for possible changes in the system page size register.
>>
>> Relies on dom0 to enable SR-IOV and PHYSDEVOP to inform Xen about
>> addition/removal of VFs.
>>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
>> ---
>> CHANGELOG.md | 3 +-
>> SUPPORT.md | 2 -
>> xen/drivers/vpci/Makefile | 2 +-
>> xen/drivers/vpci/header.c | 3 +
>> xen/drivers/vpci/sriov.c | 235 ++++++++++++++++++++++++++++++++++++++
>> xen/drivers/vpci/vpci.c | 1 +
>> xen/include/xen/vpci.h | 7 +-
>> 7 files changed, 247 insertions(+), 6 deletions(-)
>> create mode 100644 xen/drivers/vpci/sriov.c
>>
>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>> index 5f31ca08fe..7b0e8beb76 100644
>> --- a/CHANGELOG.md
>> +++ b/CHANGELOG.md
>> @@ -23,8 +23,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/
>> - On x86:
>> - Option to attempt to fixup p2m page-faults on PVH dom0.
>> - Resizable BARs is supported for PVH dom0.
>> - - Support PCI passthrough for HVM domUs when dom0 is PVH (note SR-IOV
>> - capability usage is not yet supported on PVH dom0).
>> + - Support PCI passthrough for HVM domUs when dom0 is PVH.
>> - Smoke tests for the FreeBSD Xen builds in Cirrus CI.
>>
>> - On Arm:
>> diff --git a/SUPPORT.md b/SUPPORT.md
>> index 6a82a92189..830b598714 100644
>> --- a/SUPPORT.md
>> +++ b/SUPPORT.md
>> @@ -170,8 +170,6 @@ unexpected behavior or issues on some hardware.
>>
>> At least the following features are missing on a PVH dom0:
>>
>> - * PCI SR-IOV.
>> -
>> * Native NMI forwarding (nmi=dom0 command line option).
>>
>> * MCE handling.
>
> I would prefer changelog/support changes to be a separate patch or
> alternatively at the last patch as I don't think SR-IOV is fully working
> without the 2 remaining ones.
>
>> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
>> index a7c8a30a89..fe1e57b64d 100644
>> --- a/xen/drivers/vpci/Makefile
>> +++ b/xen/drivers/vpci/Makefile
>> @@ -1,2 +1,2 @@
>> -obj-y += vpci.o header.o rebar.o
>> +obj-y += vpci.o header.o rebar.o sriov.o
>> obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index f947f652cd..0a840c6dcc 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -839,6 +839,9 @@ static int cf_check init_header(struct pci_dev *pdev)
>>
>> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>
>> + if ( pdev->info.is_virtfn )
>> + return 0;
>> +
>> switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>> {
>> case PCI_HEADER_TYPE_NORMAL:
>> diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
>> new file mode 100644
>> index 0000000000..640430e3e9
>> --- /dev/null
>> +++ b/xen/drivers/vpci/sriov.c
>> @@ -0,0 +1,235 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Handlers for accesses to the SR-IOV capability structure.
>> + *
>> + * Copyright (C) 2018 Citrix Systems R&D
>> + */
>> +
>> +#include <xen/sched.h>
>> +#include <xen/vpci.h>
>> +
>> +static int vf_init_bars(const struct pci_dev *vf_pdev)
>> +{
>> + unsigned int i, sriov_pos;
>> + int vf_idx, rc;
>> + const struct pci_dev *pf_pdev = vf_pdev->pf_pdev;
>> + uint16_t offset, stride;
>> + struct vpci_bar *bars = vf_pdev->vpci->header.bars;
>> + struct vpci_bar *physfn_vf_bars = pf_pdev->vpci->sriov->vf_bars;
>> +
>> + sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
>> + offset = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_VF_OFFSET);
>> + stride = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_VF_STRIDE);
>> +
>> + vf_idx = vf_pdev->sbdf.sbdf;
>> + vf_idx -= pf_pdev->sbdf.sbdf + offset;
>
> We can probably rewrite it as
>
> vf_idx = vf_pdev->sbdf.sbdf - (pf_pdev->sbdf.sbdf + offset);
>
> especially with sbdf being potentially not representable as a int (even
> though very unlikely).
>
>> + if ( vf_idx < 0 )
>> + return -EINVAL;
>> + if ( stride )
>> + {
>> + if ( vf_idx % stride )
>> + return -EINVAL;
>> + vf_idx /= stride;
>> + }
>> +
>> + /*
>> + * Set up BARs for this VF out of PF's VF BARs taking into account
>> + * the index of the VF.
>> + */
>> + for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
>> + {
>> + bars[i].addr = physfn_vf_bars[i].addr + vf_idx * physfn_vf_bars[i].size;
>> + bars[i].guest_addr = bars[i].addr;
>> + bars[i].size = physfn_vf_bars[i].size;
>> + bars[i].type = physfn_vf_bars[i].type;
>> + bars[i].prefetchable = physfn_vf_bars[i].prefetchable;
>> + rc = vpci_bar_add_rangeset(vf_pdev, &bars[i], i);
>> + if ( rc )
>> + return rc;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int map_vf(const struct pci_dev *vf_pdev, uint16_t cmd)
>> +{
>> + int rc;
>> +
>> + ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock));
>> +
>> + rc = vf_init_bars(vf_pdev);
>> + if ( rc )
>> + return rc;
>> +
>> + return vpci_modify_bars(vf_pdev, cmd, false);
>> +}
>> +
>> +static int size_vf_bars(struct pci_dev *pf_pdev, unsigned int sriov_pos)
>> +{
>> + /*
>> + * NB: a non-const pci_dev of the PF is needed in order to update
>> + * vf_rlen.
>> + */
>> + struct vpci_bar *bars;
>> + unsigned int i;
>> + int rc = 0;
>> +
>> + ASSERT(rw_is_write_locked(&pf_pdev->domain->pci_lock));
>> + ASSERT(!pf_pdev->info.is_virtfn);
>> +
>> + if ( !pf_pdev->vpci->sriov )
>> + return -EINVAL;
>> +
>> + /* Read BARs for VFs out of PF's SR-IOV extended capability. */
>> + bars = pf_pdev->vpci->sriov->vf_bars;
>> + /* Set the BARs addresses and size. */
>> + for ( i = 0; i < PCI_SRIOV_NUM_BARS; i += rc )
>> + {
>> + unsigned int idx = sriov_pos + PCI_SRIOV_BAR + i * 4;
>> + uint32_t bar;
>> + uint64_t addr, size;
>> +
>> + bar = pci_conf_read32(pf_pdev->sbdf, idx);
>> +
>> + rc = pci_size_mem_bar(pf_pdev->sbdf, idx, &addr, &size,
>> + PCI_BAR_VF |
>> + ((i == PCI_SRIOV_NUM_BARS - 1) ? PCI_BAR_LAST
>> + : 0));
>> +
>> + /*
>> + * Update vf_rlen on the PF. According to the spec the size of
>> + * the BARs can change if the system page size register is
>> + * modified, so always update rlen when enabling VFs.
>> + */
>> + pf_pdev->physfn.vf_rlen[i] = size;
>> +
>> + if ( !size )
>> + {
>> + bars[i].type = VPCI_BAR_EMPTY;
>> + continue;
>> + }
>> +
>> + bars[i].addr = addr;
>> + bars[i].guest_addr = addr;
>> + bars[i].size = size;
>> + bars[i].prefetchable = bar & PCI_BASE_ADDRESS_MEM_PREFETCH;
>> +
>> + switch ( rc )
>> + {
>> + case 1:
>> + bars[i].type = VPCI_BAR_MEM32;
>> + break;
>> +
>> + case 2:
>> + bars[i].type = VPCI_BAR_MEM64_LO;
>> + bars[i + 1].type = VPCI_BAR_MEM64_HI;
>> + break;
>> +
>> + default:
>> + ASSERT_UNREACHABLE();
>> + }
>> + }
>> +
>> + rc = rc > 0 ? 0 : rc;
>> +
>> + return rc;
>> +}
>> +
>> +static void cf_check control_write(const struct pci_dev *pdev, unsigned int reg,
>> + uint32_t val, void *data)
>> +{
>> + unsigned int sriov_pos = reg - PCI_SRIOV_CTRL;
>> + uint16_t control = pci_conf_read16(pdev->sbdf, reg);
>> + bool mem_enabled = control & PCI_SRIOV_CTRL_MSE;
>> + bool new_mem_enabled = val & PCI_SRIOV_CTRL_MSE;
>> +
>> + ASSERT(!pdev->info.is_virtfn);
>> +
>> + if ( new_mem_enabled != mem_enabled )
>> + {
>> + struct pci_dev *vf_pdev;
>> + if ( new_mem_enabled )
>> + {
>> + /* FIXME casting away const-ness to modify vf_rlen */
>> + size_vf_bars((struct pci_dev *)pdev, sriov_pos);
>> +
>> + list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
>> + map_vf(vf_pdev, PCI_COMMAND_MEMORY);
>> + }
>> + else
>> + {
>> + list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
>> + map_vf(vf_pdev, 0);
>> + }
>> + }
>> +
>> + pci_conf_write16(pdev->sbdf, reg, val);
>> +}
>> +
>> +static int vf_init_header(struct pci_dev *vf_pdev)
>> +{
>> + const struct pci_dev *pf_pdev;
>> + unsigned int sriov_pos;
>> + int rc = 0;
>> + uint16_t ctrl;
>> +
>> + ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock));
>> +
>> + if ( !vf_pdev->info.is_virtfn )
>> + return 0;
>> +
>> + pf_pdev = vf_pdev->pf_pdev;
>> + ASSERT(pf_pdev);
>> +
>> + sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
>> + ctrl = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_CTRL);
>> +
>> + if ( (pf_pdev->domain == vf_pdev->domain) && (ctrl & PCI_SRIOV_CTRL_MSE) )
>> + {
>> + rc = map_vf(vf_pdev, PCI_COMMAND_MEMORY);
>> + if ( rc )
>> + return rc;
>> + }
>> +
>> + return rc;
>> +}
>> +
>> +static int init_sriov(struct pci_dev *pdev)
>> +{
>> + unsigned int pos;
>> +
>> + if ( pdev->info.is_virtfn )
>> + return vf_init_header(pdev);
>> +
>> + pos = pci_find_ext_capability(pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
>> +
>> + if ( !pos )
>> + return 0;
>> +
>> + if ( !is_hardware_domain(pdev->domain) )
>> + {
>> + printk(XENLOG_ERR
>> + "%pp: SR-IOV configuration unsupported for unpriv %pd\n",
>> + &pdev->sbdf, pdev->domain);
>> + return 0;
>> + }
>
> Should it instead rely on xsm to know if it the operation is allowed or
> not for this domain ?
>
Hi,
Do you have a suggestion on which hook should be used here? Semantically
both resource_plug_pci and resource_setup_pci make sense here but I am
not sure which one to choose as I am not very familiar with XSM.
>> +
>> + pdev->vpci->sriov = xzalloc(struct vpci_sriov);
>> + if ( !pdev->vpci->sriov )
>> + return -ENOMEM;
>> +
>> + return vpci_add_register(pdev->vpci, vpci_hw_read16, control_write,
>> + pos + PCI_SRIOV_CTRL, 2, NULL);
>> +}
>> +
>> +REGISTER_VPCI_INIT(init_sriov, VPCI_PRIORITY_LOW);
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 09988f04c2..7af6651831 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -120,6 +120,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
>> for ( i = 0; i < ARRAY_SIZE(pdev->vpci->header.bars); i++ )
>> rangeset_destroy(pdev->vpci->header.bars[i].mem);
>>
>> + xfree(pdev->vpci->sriov);
>> xfree(pdev->vpci->msix);
>> xfree(pdev->vpci->msi);
>> xfree(pdev->vpci);
>> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
>> index 06f7039f20..9e8dcab17e 100644
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -138,7 +138,6 @@ struct vpci {
>> * upon to know whether BARs are mapped into the guest p2m.
>> */
>> bool bars_mapped : 1;
>> - /* FIXME: currently there's no support for SR-IOV. */
>> } header;
>>
>> /* MSI data. */
>> @@ -192,6 +191,12 @@ struct vpci {
>> struct vpci_arch_msix_entry arch;
>> } entries[];
>> } *msix;
>> +
>> + struct vpci_sriov {
>> + /* PF only */
>> + struct vpci_bar vf_bars[PCI_SRIOV_NUM_BARS];
>> + } *sriov;
>> +
>> #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> /* Guest SBDF of the device. */
>> #define INVALID_GUEST_SBDF ((pci_sbdf_t){ .sbdf = ~0U })
>
>
>
> Teddy Astie | Vates XCP-ng Developer
>
> XCP-ng & Xen Orchestra - Vates solutions
>
> web: https://vates.tech/
>
--
Mykyta
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0
2025-07-25 14:24 ` [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0 Mykyta Poturai
2025-07-25 17:39 ` Teddy Astie
@ 2025-07-28 11:33 ` Roger Pau Monné
2026-03-04 8:43 ` Mykyta Poturai
1 sibling, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2025-07-28 11:33 UTC (permalink / raw)
To: Mykyta Poturai
Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand,
Oleksii Kurochko, Community Manager, Andrew Cooper,
Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
Stefano Stabellini
On Fri, Jul 25, 2025 at 02:24:33PM +0000, Mykyta Poturai wrote:
> From: Stewart Hildebrand <stewart.hildebrand@amd.com>
>
> This code is expected to only be used by privileged domains,
> unprivileged domains should not get access to the SR-IOV capability.
>
> Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
> map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account
> for possible changes in the system page size register.
>
> Relies on dom0 to enable SR-IOV and PHYSDEVOP to inform Xen about
> addition/removal of VFs.
Why I'm not opposed to allowing registration of devices using
PHYSDEVOP, can't Xen detect the addition of the VFs and add them
itself?
When I worked on this long time ago, the version of the series that I
posted had registration of the VFs done by Xen also:
https://lore.kernel.org/xen-devel/20180717094830.54806-12-roger.pau@citrix.com/
>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> ---
> CHANGELOG.md | 3 +-
> SUPPORT.md | 2 -
> xen/drivers/vpci/Makefile | 2 +-
> xen/drivers/vpci/header.c | 3 +
> xen/drivers/vpci/sriov.c | 235 ++++++++++++++++++++++++++++++++++++++
> xen/drivers/vpci/vpci.c | 1 +
> xen/include/xen/vpci.h | 7 +-
> 7 files changed, 247 insertions(+), 6 deletions(-)
> create mode 100644 xen/drivers/vpci/sriov.c
>
> diff --git a/CHANGELOG.md b/CHANGELOG.md
> index 5f31ca08fe..7b0e8beb76 100644
> --- a/CHANGELOG.md
> +++ b/CHANGELOG.md
> @@ -23,8 +23,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
> - On x86:
> - Option to attempt to fixup p2m page-faults on PVH dom0.
> - Resizable BARs is supported for PVH dom0.
> - - Support PCI passthrough for HVM domUs when dom0 is PVH (note SR-IOV
> - capability usage is not yet supported on PVH dom0).
> + - Support PCI passthrough for HVM domUs when dom0 is PVH.
Don't you need to move this out of the x86 specific section?
According to the cover letter you are testing on an ARM board, so this
probably needs to be put in a non-arch part of CHANGELOG?
> - Smoke tests for the FreeBSD Xen builds in Cirrus CI.
>
> - On Arm:
> diff --git a/SUPPORT.md b/SUPPORT.md
> index 6a82a92189..830b598714 100644
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -170,8 +170,6 @@ unexpected behavior or issues on some hardware.
>
> At least the following features are missing on a PVH dom0:
>
> - * PCI SR-IOV.
> -
> * Native NMI forwarding (nmi=dom0 command line option).
>
> * MCE handling.
> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> index a7c8a30a89..fe1e57b64d 100644
> --- a/xen/drivers/vpci/Makefile
> +++ b/xen/drivers/vpci/Makefile
> @@ -1,2 +1,2 @@
> -obj-y += vpci.o header.o rebar.o
> +obj-y += vpci.o header.o rebar.o sriov.o
> obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index f947f652cd..0a840c6dcc 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -839,6 +839,9 @@ static int cf_check init_header(struct pci_dev *pdev)
>
> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>
> + if ( pdev->info.is_virtfn )
> + return 0;
> +
> switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
> {
> case PCI_HEADER_TYPE_NORMAL:
> diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
> new file mode 100644
> index 0000000000..640430e3e9
> --- /dev/null
> +++ b/xen/drivers/vpci/sriov.c
> @@ -0,0 +1,235 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Handlers for accesses to the SR-IOV capability structure.
> + *
> + * Copyright (C) 2018 Citrix Systems R&D
If there's a Citrix copyright header here, shouldn't there be a
matching Signed-off-by from someone at Citrix (I think that's likely
me)?
Otherwise if there's no content authored by a Citrix employee the
copyright notice must be removed. We need to be careful with
copyright and attribution.
And in any case the date should be updated.
> + */
> +
> +#include <xen/sched.h>
> +#include <xen/vpci.h>
> +
> +static int vf_init_bars(const struct pci_dev *vf_pdev)
> +{
> + unsigned int i, sriov_pos;
> + int vf_idx, rc;
> + const struct pci_dev *pf_pdev = vf_pdev->pf_pdev;
> + uint16_t offset, stride;
> + struct vpci_bar *bars = vf_pdev->vpci->header.bars;
> + struct vpci_bar *physfn_vf_bars = pf_pdev->vpci->sriov->vf_bars;
> +
> + sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
> + offset = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_VF_OFFSET);
> + stride = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_VF_STRIDE);
You can possibly initialize the fields as definition time?
> +
> + vf_idx = vf_pdev->sbdf.sbdf;
> + vf_idx -= pf_pdev->sbdf.sbdf + offset;
> + if ( vf_idx < 0 )
> + return -EINVAL;
> + if ( stride )
> + {
> + if ( vf_idx % stride )
> + return -EINVAL;
> + vf_idx /= stride;
> + }
> +
> + /*
> + * Set up BARs for this VF out of PF's VF BARs taking into account
> + * the index of the VF.
> + */
> + for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
> + {
> + bars[i].addr = physfn_vf_bars[i].addr + vf_idx * physfn_vf_bars[i].size;
> + bars[i].guest_addr = bars[i].addr;
> + bars[i].size = physfn_vf_bars[i].size;
> + bars[i].type = physfn_vf_bars[i].type;
> + bars[i].prefetchable = physfn_vf_bars[i].prefetchable;
> + rc = vpci_bar_add_rangeset(vf_pdev, &bars[i], i);
> + if ( rc )
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> +static int map_vf(const struct pci_dev *vf_pdev, uint16_t cmd)
> +{
> + int rc;
> +
> + ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock));
> +
> + rc = vf_init_bars(vf_pdev);
> + if ( rc )
> + return rc;
> +
> + return vpci_modify_bars(vf_pdev, cmd, false);
> +}
> +
> +static int size_vf_bars(struct pci_dev *pf_pdev, unsigned int sriov_pos)
> +{
> + /*
> + * NB: a non-const pci_dev of the PF is needed in order to update
> + * vf_rlen.
> + */
> + struct vpci_bar *bars;
> + unsigned int i;
> + int rc = 0;
> +
> + ASSERT(rw_is_write_locked(&pf_pdev->domain->pci_lock));
> + ASSERT(!pf_pdev->info.is_virtfn);
> +
> + if ( !pf_pdev->vpci->sriov )
Is this possible? If the struct is not allocated the hook shouldn't
be registered either. Maybe it wants to be an ASSERT instead?
> + return -EINVAL;
> +
> + /* Read BARs for VFs out of PF's SR-IOV extended capability. */
> + bars = pf_pdev->vpci->sriov->vf_bars;
> + /* Set the BARs addresses and size. */
> + for ( i = 0; i < PCI_SRIOV_NUM_BARS; i += rc )
> + {
> + unsigned int idx = sriov_pos + PCI_SRIOV_BAR + i * 4;
> + uint32_t bar;
> + uint64_t addr, size;
> +
> + bar = pci_conf_read32(pf_pdev->sbdf, idx);
> +
> + rc = pci_size_mem_bar(pf_pdev->sbdf, idx, &addr, &size,
> + PCI_BAR_VF |
> + ((i == PCI_SRIOV_NUM_BARS - 1) ? PCI_BAR_LAST
> + : 0));
> +
> + /*
> + * Update vf_rlen on the PF. According to the spec the size of
> + * the BARs can change if the system page size register is
> + * modified, so always update rlen when enabling VFs.
> + */
> + pf_pdev->physfn.vf_rlen[i] = size;
> +
> + if ( !size )
> + {
> + bars[i].type = VPCI_BAR_EMPTY;
> + continue;
> + }
> +
> + bars[i].addr = addr;
> + bars[i].guest_addr = addr;
> + bars[i].size = size;
> + bars[i].prefetchable = bar & PCI_BASE_ADDRESS_MEM_PREFETCH;
> +
> + switch ( rc )
> + {
> + case 1:
> + bars[i].type = VPCI_BAR_MEM32;
> + break;
> +
> + case 2:
> + bars[i].type = VPCI_BAR_MEM64_LO;
> + bars[i + 1].type = VPCI_BAR_MEM64_HI;
> + break;
> +
> + default:
> + ASSERT_UNREACHABLE();
> + }
> + }
> +
> + rc = rc > 0 ? 0 : rc;
> +
> + return rc;
> +}
> +
> +static void cf_check control_write(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> + unsigned int sriov_pos = reg - PCI_SRIOV_CTRL;
> + uint16_t control = pci_conf_read16(pdev->sbdf, reg);
> + bool mem_enabled = control & PCI_SRIOV_CTRL_MSE;
> + bool new_mem_enabled = val & PCI_SRIOV_CTRL_MSE;
> +
> + ASSERT(!pdev->info.is_virtfn);
> +
> + if ( new_mem_enabled != mem_enabled )
> + {
> + struct pci_dev *vf_pdev;
> + if ( new_mem_enabled )
> + {
> + /* FIXME casting away const-ness to modify vf_rlen */
> + size_vf_bars((struct pci_dev *)pdev, sriov_pos);
> +
> + list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
> + map_vf(vf_pdev, PCI_COMMAND_MEMORY);
It's my understating this requires that devices are registerted with
Xen using the PHYSDEVOP prior to dom0 attempting to enable VF MSE. If
that's the case it should be clearly stated, as it's a relevant
ordering constrain.
I also don't think this will work as expected, as each successive call
to map_vf() (which calls vpci_modify_bars()) will lead to overwriting
the previously queued map/unmap operation, and only the BARs from the
last VF on the list will actually be mapped?
If this is to strictly deal with the PF VF BAR{0-5} registers you
could map the whole register in one go, instead of dealing with each
VF BARs separately. That would also lead to possibly bigger pages
being used in the p2m, as the whole VF BARs will be mapped in one go,
instead of splitting them into multiple isolated mapping operations.
> + }
> + else
> + {
> + list_for_each_entry(vf_pdev, &pdev->vf_list, vf_list)
> + map_vf(vf_pdev, 0);
> + }
> + }
Don't you need to track the VF enable bit here, so that VFs are
registered as pci_devs with Xen once VF enable is set?
Also the size of the BARs as result of a change to the System Page
Size register can only happen when VF enable is not set, and hence the
sizing of BARs could be limited to the setting of VF enable, instead
of VF MSE?
> +
> + pci_conf_write16(pdev->sbdf, reg, val);
> +}
> +
> +static int vf_init_header(struct pci_dev *vf_pdev)
> +{
> + const struct pci_dev *pf_pdev;
> + unsigned int sriov_pos;
> + int rc = 0;
> + uint16_t ctrl;
> +
> + ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock));
> +
> + if ( !vf_pdev->info.is_virtfn )
> + return 0;
> +
> + pf_pdev = vf_pdev->pf_pdev;
> + ASSERT(pf_pdev);
> +
> + sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
> + ctrl = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_CTRL);
> +
> + if ( (pf_pdev->domain == vf_pdev->domain) && (ctrl & PCI_SRIOV_CTRL_MSE) )
> + {
> + rc = map_vf(vf_pdev, PCI_COMMAND_MEMORY);
> + if ( rc )
> + return rc;
> + }
> +
> + return rc;
> +}
> +
> +static int init_sriov(struct pci_dev *pdev)
This seems to be missing a cf_check attribute?
> +{
> + unsigned int pos;
> +
> + if ( pdev->info.is_virtfn )
> + return vf_init_header(pdev);
> +
> + pos = pci_find_ext_capability(pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
> +
> + if ( !pos )
> + return 0;
> +
> + if ( !is_hardware_domain(pdev->domain) )
> + {
> + printk(XENLOG_ERR
> + "%pp: SR-IOV configuration unsupported for unpriv %pd\n",
> + &pdev->sbdf, pdev->domain);
> + return 0;
> + }
> +
> + pdev->vpci->sriov = xzalloc(struct vpci_sriov);
> + if ( !pdev->vpci->sriov )
> + return -ENOMEM;
> +
> + return vpci_add_register(pdev->vpci, vpci_hw_read16, control_write,
> + pos + PCI_SRIOV_CTRL, 2, NULL);
> +}
> +
> +REGISTER_VPCI_INIT(init_sriov, VPCI_PRIORITY_LOW);
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 09988f04c2..7af6651831 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -120,6 +120,7 @@ void vpci_deassign_device(struct pci_dev *pdev)
> for ( i = 0; i < ARRAY_SIZE(pdev->vpci->header.bars); i++ )
> rangeset_destroy(pdev->vpci->header.bars[i].mem);
>
> + xfree(pdev->vpci->sriov);
> xfree(pdev->vpci->msix);
> xfree(pdev->vpci->msi);
> xfree(pdev->vpci);
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index 06f7039f20..9e8dcab17e 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -138,7 +138,6 @@ struct vpci {
> * upon to know whether BARs are mapped into the guest p2m.
> */
> bool bars_mapped : 1;
> - /* FIXME: currently there's no support for SR-IOV. */
> } header;
>
> /* MSI data. */
> @@ -192,6 +191,12 @@ struct vpci {
> struct vpci_arch_msix_entry arch;
> } entries[];
> } *msix;
> +
> + struct vpci_sriov {
> + /* PF only */
> + struct vpci_bar vf_bars[PCI_SRIOV_NUM_BARS];
Can't you re-use the existing vpci_bar array in vpci_header structure?
Thanks, Roger.
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0
2025-07-28 11:33 ` Roger Pau Monné
@ 2026-03-04 8:43 ` Mykyta Poturai
2026-03-04 15:16 ` Roger Pau Monné
0 siblings, 1 reply; 19+ messages in thread
From: Mykyta Poturai @ 2026-03-04 8:43 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand,
Oleksii Kurochko, Community Manager, Andrew Cooper,
Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
Stefano Stabellini
On 7/28/25 14:33, Roger Pau Monné wrote:
> On Fri, Jul 25, 2025 at 02:24:33PM +0000, Mykyta Poturai wrote:
>> From: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>
>> This code is expected to only be used by privileged domains,
>> unprivileged domains should not get access to the SR-IOV capability.
>>
>> Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
>> map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account
>> for possible changes in the system page size register.
>>
>> Relies on dom0 to enable SR-IOV and PHYSDEVOP to inform Xen about
>> addition/removal of VFs.
>
> Why I'm not opposed to allowing registration of devices using
> PHYSDEVOP, can't Xen detect the addition of the VFs and add them
> itself?
>
> When I worked on this long time ago, the version of the series that I
> posted had registration of the VFs done by Xen also:
>
> https://lore.kernel.org/xen-devel/20180717094830.54806-12-roger.pau@citrix.com/
>
>>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
>> ---
>> CHANGELOG.md | 3 +-
>> SUPPORT.md | 2 -
>> xen/drivers/vpci/Makefile | 2 +-
>> xen/drivers/vpci/header.c | 3 +
>> xen/drivers/vpci/sriov.c | 235 ++++++++++++++++++++++++++++++++++++++
>> xen/drivers/vpci/vpci.c | 1 +
>> xen/include/xen/vpci.h | 7 +-
>> 7 files changed, 247 insertions(+), 6 deletions(-)
>> create mode 100644 xen/drivers/vpci/sriov.c
>>
>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>> index 5f31ca08fe..7b0e8beb76 100644
>> --- a/CHANGELOG.md
>> +++ b/CHANGELOG.md
>> @@ -23,8 +23,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/
>> - On x86:
>> - Option to attempt to fixup p2m page-faults on PVH dom0.
>> - Resizable BARs is supported for PVH dom0.
>> - - Support PCI passthrough for HVM domUs when dom0 is PVH (note SR-IOV
>> - capability usage is not yet supported on PVH dom0).
>> + - Support PCI passthrough for HVM domUs when dom0 is PVH.
>
> Don't you need to move this out of the x86 specific section?
>
> According to the cover letter you are testing on an ARM board, so this
> probably needs to be put in a non-arch part of CHANGELOG?
>
>> - Smoke tests for the FreeBSD Xen builds in Cirrus CI.
>>
>> - On Arm:
>> diff --git a/SUPPORT.md b/SUPPORT.md
>> index 6a82a92189..830b598714 100644
>> --- a/SUPPORT.md
>> +++ b/SUPPORT.md
>> @@ -170,8 +170,6 @@ unexpected behavior or issues on some hardware.
>>
>> At least the following features are missing on a PVH dom0:
>>
>> - * PCI SR-IOV.
>> -
>> * Native NMI forwarding (nmi=dom0 command line option).
>>
>> * MCE handling.
>> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
>> index a7c8a30a89..fe1e57b64d 100644
>> --- a/xen/drivers/vpci/Makefile
>> +++ b/xen/drivers/vpci/Makefile
>> @@ -1,2 +1,2 @@
>> -obj-y += vpci.o header.o rebar.o
>> +obj-y += vpci.o header.o rebar.o sriov.o
>> obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index f947f652cd..0a840c6dcc 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -839,6 +839,9 @@ static int cf_check init_header(struct pci_dev *pdev)
>>
>> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>
>> + if ( pdev->info.is_virtfn )
>> + return 0;
>> +
>> switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>> {
>> case PCI_HEADER_TYPE_NORMAL:
>> diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
>> new file mode 100644
>> index 0000000000..640430e3e9
>> --- /dev/null
>> +++ b/xen/drivers/vpci/sriov.c
>> @@ -0,0 +1,235 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Handlers for accesses to the SR-IOV capability structure.
>> + *
>> + * Copyright (C) 2018 Citrix Systems R&D
>
> If there's a Citrix copyright header here, shouldn't there be a
> matching Signed-off-by from someone at Citrix (I think that's likely
> me)?
>
> Otherwise if there's no content authored by a Citrix employee the
> copyright notice must be removed. We need to be careful with
> copyright and attribution.
>
> And in any case the date should be updated.
>
Can I add your SOB or is it better to remove the copyright? Looking at
the patches you provided, I think this ones were definitely based on
them, but there are also a lot of changes since then.
--
Mykyta
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0
2026-03-04 8:43 ` Mykyta Poturai
@ 2026-03-04 15:16 ` Roger Pau Monné
2026-03-04 15:34 ` Stewart Hildebrand
0 siblings, 1 reply; 19+ messages in thread
From: Roger Pau Monné @ 2026-03-04 15:16 UTC (permalink / raw)
To: Mykyta Poturai
Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand,
Oleksii Kurochko, Community Manager, Andrew Cooper,
Anthony PERARD, Michal Orzel, Jan Beulich, Julien Grall,
Stefano Stabellini
On Wed, Mar 04, 2026 at 08:43:17AM +0000, Mykyta Poturai wrote:
> On 7/28/25 14:33, Roger Pau Monné wrote:
> > On Fri, Jul 25, 2025 at 02:24:33PM +0000, Mykyta Poturai wrote:
> >> From: Stewart Hildebrand <stewart.hildebrand@amd.com>
> >>
> >> This code is expected to only be used by privileged domains,
> >> unprivileged domains should not get access to the SR-IOV capability.
> >>
> >> Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
> >> map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account
> >> for possible changes in the system page size register.
> >>
> >> Relies on dom0 to enable SR-IOV and PHYSDEVOP to inform Xen about
> >> addition/removal of VFs.
> >
> > Why I'm not opposed to allowing registration of devices using
> > PHYSDEVOP, can't Xen detect the addition of the VFs and add them
> > itself?
> >
> > When I worked on this long time ago, the version of the series that I
> > posted had registration of the VFs done by Xen also:
> >
> > https://lore.kernel.org/xen-devel/20180717094830.54806-12-roger.pau@citrix.com/
> >
> >>
> >> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> >> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> >> ---
> >> CHANGELOG.md | 3 +-
> >> SUPPORT.md | 2 -
> >> xen/drivers/vpci/Makefile | 2 +-
> >> xen/drivers/vpci/header.c | 3 +
> >> xen/drivers/vpci/sriov.c | 235 ++++++++++++++++++++++++++++++++++++++
> >> xen/drivers/vpci/vpci.c | 1 +
> >> xen/include/xen/vpci.h | 7 +-
> >> 7 files changed, 247 insertions(+), 6 deletions(-)
> >> create mode 100644 xen/drivers/vpci/sriov.c
> >>
> >> diff --git a/CHANGELOG.md b/CHANGELOG.md
> >> index 5f31ca08fe..7b0e8beb76 100644
> >> --- a/CHANGELOG.md
> >> +++ b/CHANGELOG.md
> >> @@ -23,8 +23,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/
> >> - On x86:
> >> - Option to attempt to fixup p2m page-faults on PVH dom0.
> >> - Resizable BARs is supported for PVH dom0.
> >> - - Support PCI passthrough for HVM domUs when dom0 is PVH (note SR-IOV
> >> - capability usage is not yet supported on PVH dom0).
> >> + - Support PCI passthrough for HVM domUs when dom0 is PVH.
> >
> > Don't you need to move this out of the x86 specific section?
> >
> > According to the cover letter you are testing on an ARM board, so this
> > probably needs to be put in a non-arch part of CHANGELOG?
> >
> >> - Smoke tests for the FreeBSD Xen builds in Cirrus CI.
> >>
> >> - On Arm:
> >> diff --git a/SUPPORT.md b/SUPPORT.md
> >> index 6a82a92189..830b598714 100644
> >> --- a/SUPPORT.md
> >> +++ b/SUPPORT.md
> >> @@ -170,8 +170,6 @@ unexpected behavior or issues on some hardware.
> >>
> >> At least the following features are missing on a PVH dom0:
> >>
> >> - * PCI SR-IOV.
> >> -
> >> * Native NMI forwarding (nmi=dom0 command line option).
> >>
> >> * MCE handling.
> >> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
> >> index a7c8a30a89..fe1e57b64d 100644
> >> --- a/xen/drivers/vpci/Makefile
> >> +++ b/xen/drivers/vpci/Makefile
> >> @@ -1,2 +1,2 @@
> >> -obj-y += vpci.o header.o rebar.o
> >> +obj-y += vpci.o header.o rebar.o sriov.o
> >> obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o
> >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> >> index f947f652cd..0a840c6dcc 100644
> >> --- a/xen/drivers/vpci/header.c
> >> +++ b/xen/drivers/vpci/header.c
> >> @@ -839,6 +839,9 @@ static int cf_check init_header(struct pci_dev *pdev)
> >>
> >> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> >>
> >> + if ( pdev->info.is_virtfn )
> >> + return 0;
> >> +
> >> switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
> >> {
> >> case PCI_HEADER_TYPE_NORMAL:
> >> diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
> >> new file mode 100644
> >> index 0000000000..640430e3e9
> >> --- /dev/null
> >> +++ b/xen/drivers/vpci/sriov.c
> >> @@ -0,0 +1,235 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-only */
> >> +/*
> >> + * Handlers for accesses to the SR-IOV capability structure.
> >> + *
> >> + * Copyright (C) 2018 Citrix Systems R&D
> >
> > If there's a Citrix copyright header here, shouldn't there be a
> > matching Signed-off-by from someone at Citrix (I think that's likely
> > me)?
> >
> > Otherwise if there's no content authored by a Citrix employee the
> > copyright notice must be removed. We need to be careful with
> > copyright and attribution.
> >
> > And in any case the date should be updated.
> >
>
> Can I add your SOB or is it better to remove the copyright? Looking at
> the patches you provided, I think this ones were definitely based on
> them, but there are also a lot of changes since then.
If it's based on the patches that I sent many years ago (2018),
then yes, you likely need to add my SoB. Look like that way from the
copyright notice.
Thanks, Roger.
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0
2026-03-04 15:16 ` Roger Pau Monné
@ 2026-03-04 15:34 ` Stewart Hildebrand
0 siblings, 0 replies; 19+ messages in thread
From: Stewart Hildebrand @ 2026-03-04 15:34 UTC (permalink / raw)
To: Roger Pau Monné, Mykyta Poturai
Cc: xen-devel@lists.xenproject.org, Oleksii Kurochko,
Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Stefano Stabellini
On 3/4/26 10:16, Roger Pau Monné wrote:
> On Wed, Mar 04, 2026 at 08:43:17AM +0000, Mykyta Poturai wrote:
>> On 7/28/25 14:33, Roger Pau Monné wrote:
>>> On Fri, Jul 25, 2025 at 02:24:33PM +0000, Mykyta Poturai wrote:
>>>> From: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>>>
>>>> This code is expected to only be used by privileged domains,
>>>> unprivileged domains should not get access to the SR-IOV capability.
>>>>
>>>> Implement RW handlers for PCI_SRIOV_CTRL register to dynamically
>>>> map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account
>>>> for possible changes in the system page size register.
>>>>
>>>> Relies on dom0 to enable SR-IOV and PHYSDEVOP to inform Xen about
>>>> addition/removal of VFs.
>>>
>>> Why I'm not opposed to allowing registration of devices using
>>> PHYSDEVOP, can't Xen detect the addition of the VFs and add them
>>> itself?
>>>
>>> When I worked on this long time ago, the version of the series that I
>>> posted had registration of the VFs done by Xen also:
>>>
>>> https://lore.kernel.org/xen-devel/20180717094830.54806-12-roger.pau@citrix.com/
>>>
>>>>
>>>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>>> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
>>>> ---
>>>> CHANGELOG.md | 3 +-
>>>> SUPPORT.md | 2 -
>>>> xen/drivers/vpci/Makefile | 2 +-
>>>> xen/drivers/vpci/header.c | 3 +
>>>> xen/drivers/vpci/sriov.c | 235 ++++++++++++++++++++++++++++++++++++++
>>>> xen/drivers/vpci/vpci.c | 1 +
>>>> xen/include/xen/vpci.h | 7 +-
>>>> 7 files changed, 247 insertions(+), 6 deletions(-)
>>>> create mode 100644 xen/drivers/vpci/sriov.c
>>>>
>>>> diff --git a/CHANGELOG.md b/CHANGELOG.md
>>>> index 5f31ca08fe..7b0e8beb76 100644
>>>> --- a/CHANGELOG.md
>>>> +++ b/CHANGELOG.md
>>>> @@ -23,8 +23,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/
>>>> - On x86:
>>>> - Option to attempt to fixup p2m page-faults on PVH dom0.
>>>> - Resizable BARs is supported for PVH dom0.
>>>> - - Support PCI passthrough for HVM domUs when dom0 is PVH (note SR-IOV
>>>> - capability usage is not yet supported on PVH dom0).
>>>> + - Support PCI passthrough for HVM domUs when dom0 is PVH.
>>>
>>> Don't you need to move this out of the x86 specific section?
>>>
>>> According to the cover letter you are testing on an ARM board, so this
>>> probably needs to be put in a non-arch part of CHANGELOG?
>>>
>>>> - Smoke tests for the FreeBSD Xen builds in Cirrus CI.
>>>>
>>>> - On Arm:
>>>> diff --git a/SUPPORT.md b/SUPPORT.md
>>>> index 6a82a92189..830b598714 100644
>>>> --- a/SUPPORT.md
>>>> +++ b/SUPPORT.md
>>>> @@ -170,8 +170,6 @@ unexpected behavior or issues on some hardware.
>>>>
>>>> At least the following features are missing on a PVH dom0:
>>>>
>>>> - * PCI SR-IOV.
>>>> -
>>>> * Native NMI forwarding (nmi=dom0 command line option).
>>>>
>>>> * MCE handling.
>>>> diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
>>>> index a7c8a30a89..fe1e57b64d 100644
>>>> --- a/xen/drivers/vpci/Makefile
>>>> +++ b/xen/drivers/vpci/Makefile
>>>> @@ -1,2 +1,2 @@
>>>> -obj-y += vpci.o header.o rebar.o
>>>> +obj-y += vpci.o header.o rebar.o sriov.o
>>>> obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o
>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>> index f947f652cd..0a840c6dcc 100644
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -839,6 +839,9 @@ static int cf_check init_header(struct pci_dev *pdev)
>>>>
>>>> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>>>
>>>> + if ( pdev->info.is_virtfn )
>>>> + return 0;
>>>> +
>>>> switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
>>>> {
>>>> case PCI_HEADER_TYPE_NORMAL:
>>>> diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
>>>> new file mode 100644
>>>> index 0000000000..640430e3e9
>>>> --- /dev/null
>>>> +++ b/xen/drivers/vpci/sriov.c
>>>> @@ -0,0 +1,235 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>> +/*
>>>> + * Handlers for accesses to the SR-IOV capability structure.
>>>> + *
>>>> + * Copyright (C) 2018 Citrix Systems R&D
>>>
>>> If there's a Citrix copyright header here, shouldn't there be a
>>> matching Signed-off-by from someone at Citrix (I think that's likely
>>> me)?
>>>
>>> Otherwise if there's no content authored by a Citrix employee the
>>> copyright notice must be removed. We need to be careful with
>>> copyright and attribution.
>>>
>>> And in any case the date should be updated.
>>>
>>
>> Can I add your SOB or is it better to remove the copyright? Looking at
>> the patches you provided, I think this ones were definitely based on
>> them, but there are also a lot of changes since then.
>
> If it's based on the patches that I sent many years ago (2018),
> then yes, you likely need to add my SoB. Look like that way from the
> copyright notice.
>
> Thanks, Roger.
A bit of context here: When I worked on this, I used Roger's sriov.c from [1] as
a starting point. Unfortunately, during my haste of development, I neglected to
preserve authorship/Signed-off-by, and didn't get back to addressing it before I
shared the branch with Mykyta. So yes, at minimum Roger's S-o-b should be
re-added (likely as the first S-o-b), and potentially even make Roger author.
[1] https://lore.kernel.org/xen-devel/20180717094830.54806-12-roger.pau@citrix.com/
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v1 5/6] vpci: export vpci_init_capability_list()
2025-07-25 14:24 [PATCH v1 0/6] Implement SR-IOV support for PVH Mykyta Poturai
` (3 preceding siblings ...)
2025-07-25 14:24 ` [PATCH v1 4/6] vpci: add SR-IOV support for PVH Dom0 Mykyta Poturai
@ 2025-07-25 14:24 ` Mykyta Poturai
2025-07-25 14:24 ` [PATCH v1 6/6] vpci: add SR-IOV support for DomUs Mykyta Poturai
2025-07-28 9:04 ` [PATCH v1 0/6] Implement SR-IOV support for PVH Roger Pau Monné
6 siblings, 0 replies; 19+ messages in thread
From: Mykyta Poturai @ 2025-07-25 14:24 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Stewart Hildebrand, Roger Pau Monné, Mykyta Poturai
From: Stewart Hildebrand <stewart.hildebrand@amd.com>
Export functions required for SR-IOV support.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
| 2 +-
xen/include/xen/vpci.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
--git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 0a840c6dcc..ae44d6a73c 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -745,7 +745,7 @@ int vpci_bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
return !bar->mem ? -ENOMEM : 0;
}
-static int vpci_init_capability_list(struct pci_dev *pdev)
+int vpci_init_capability_list(struct pci_dev *pdev)
{
int rc;
bool mask_cap_list = false;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 9e8dcab17e..b8fa93575b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -302,6 +302,8 @@ int vpci_modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only);
int vpci_bar_add_rangeset(const struct pci_dev *pdev, struct vpci_bar *bar,
unsigned int i);
+int vpci_init_capability_list(struct pci_dev *pdev);
+
#endif /* __XEN__ */
#else /* !CONFIG_HAS_VPCI */
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH v1 6/6] vpci: add SR-IOV support for DomUs
2025-07-25 14:24 [PATCH v1 0/6] Implement SR-IOV support for PVH Mykyta Poturai
` (4 preceding siblings ...)
2025-07-25 14:24 ` [PATCH v1 5/6] vpci: export vpci_init_capability_list() Mykyta Poturai
@ 2025-07-25 14:24 ` Mykyta Poturai
2025-07-28 15:10 ` Roger Pau Monné
2025-07-28 9:04 ` [PATCH v1 0/6] Implement SR-IOV support for PVH Roger Pau Monné
6 siblings, 1 reply; 19+ messages in thread
From: Mykyta Poturai @ 2025-07-25 14:24 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Stewart Hildebrand, Roger Pau Monné, Mykyta Poturai
From: Stewart Hildebrand <stewart.hildebrand@amd.com>
SR-IOV virtual functions have simplified configuration space such as
Vendor ID is derived from the physical function and Device ID comes
from SR-IOV extended capability.
Emulate those, so we can provide VID/DID pair for guests which use PCI
passthrough for SR-IOV virtual functions.
Emulate guest BAR register values based on PF BAR values for VFs.
This allows creating a guest view of the normal BAR registers and emulates
the size and properties as it is done during PCI device enumeration by
the guest.
Note, that VFs ROM BAR is read-only and is all zeros, but VF may provide
access to the PFs ROM via emulation and is not implemented.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
---
xen/drivers/vpci/sriov.c | 119 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 118 insertions(+), 1 deletion(-)
diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
index 640430e3e9..bf8d9763c6 100644
--- a/xen/drivers/vpci/sriov.c
+++ b/xen/drivers/vpci/sriov.c
@@ -39,7 +39,8 @@ static int vf_init_bars(const struct pci_dev *vf_pdev)
for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
{
bars[i].addr = physfn_vf_bars[i].addr + vf_idx * physfn_vf_bars[i].size;
- bars[i].guest_addr = bars[i].addr;
+ if ( pf_pdev->domain == vf_pdev->domain || bars[i].guest_addr == 0 )
+ bars[i].guest_addr = bars[i].addr;
bars[i].size = physfn_vf_bars[i].size;
bars[i].type = physfn_vf_bars[i].type;
bars[i].prefetchable = physfn_vf_bars[i].prefetchable;
@@ -166,6 +167,44 @@ static void cf_check control_write(const struct pci_dev *pdev, unsigned int reg,
pci_conf_write16(pdev->sbdf, reg, val);
}
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static void cf_check vf_cmd_write(const struct pci_dev *pdev, unsigned int reg,
+ uint32_t cmd, void *data)
+{
+ struct vpci_header *header = data;
+
+ cmd |= PCI_COMMAND_INTX_DISABLE;
+
+ header->guest_cmd = cmd;
+
+ /*
+ * Let Dom0 play with all the bits directly except for the memory
+ * decoding one. Bits that are not allowed for DomU are already
+ * handled above and by the rsvdp_mask.
+ */
+ if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
+ {
+ /*
+ * Ignore the error. No memory has been added or removed from the p2m
+ * (because the actual p2m changes are deferred in defer_map) and the
+ * memory decoding bit has not been changed, so leave everything as-is,
+ * hoping the guest will realize and try again.
+ */
+ map_vf(pdev, cmd);
+ }
+ else
+ pci_conf_write16(pdev->sbdf, reg, cmd);
+}
+
+static uint32_t cf_check vf_cmd_read(const struct pci_dev *pdev,
+ unsigned int reg, void *data)
+{
+ const struct vpci_header *header = data;
+
+ return header->guest_cmd;
+}
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
static int vf_init_header(struct pci_dev *vf_pdev)
{
const struct pci_dev *pf_pdev;
@@ -184,6 +223,84 @@ static int vf_init_header(struct pci_dev *vf_pdev)
sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
ctrl = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_CTRL);
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+ if ( pf_pdev->domain != vf_pdev->domain )
+ {
+ uint16_t vid = pci_conf_read16(pf_pdev->sbdf, PCI_VENDOR_ID);
+ uint16_t did = pci_conf_read16(pf_pdev->sbdf,
+ sriov_pos + PCI_SRIOV_VF_DID);
+ struct vpci_bar *bars = vf_pdev->vpci->header.bars;
+ unsigned int i;
+
+ rc = vpci_add_register(vf_pdev->vpci, vpci_read_val, NULL,
+ PCI_VENDOR_ID, 2, (void *)(uintptr_t)vid);
+ if ( rc )
+ return rc;
+
+ rc = vpci_add_register(vf_pdev->vpci, vpci_read_val, NULL,
+ PCI_DEVICE_ID, 2, (void *)(uintptr_t)did);
+ if ( rc )
+ return rc;
+
+ /* Hardcode multi-function device bit to 0 */
+ rc = vpci_add_register(vf_pdev->vpci, vpci_read_val, NULL,
+ PCI_HEADER_TYPE, 1,
+ (void *)PCI_HEADER_TYPE_NORMAL);
+ if ( rc )
+ return rc;
+
+ rc = vpci_add_register(vf_pdev->vpci, vpci_hw_read32, NULL,
+ PCI_CLASS_REVISION, 4, NULL);
+ if ( rc )
+ return rc;
+
+ /* VF ROZ is covered by ro_mask */
+ rc = vpci_add_register_mask(vf_pdev->vpci, vf_cmd_read, vf_cmd_write,
+ PCI_COMMAND, 2, &vf_pdev->vpci->header,
+ PCI_COMMAND_IO | PCI_COMMAND_SPECIAL |
+ PCI_COMMAND_INVALIDATE |
+ PCI_COMMAND_VGA_PALETTE | PCI_COMMAND_WAIT |
+ PCI_COMMAND_FAST_BACK,
+ 0,
+ PCI_COMMAND_RSVDP_MASK |
+ PCI_COMMAND_PARITY | PCI_COMMAND_SERR,
+ 0);
+ if ( rc )
+ return rc;
+
+ rc = vpci_init_capability_list(vf_pdev);
+ if ( rc )
+ return rc;
+
+ for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
+ {
+ switch ( pf_pdev->vpci->sriov->vf_bars[i].type )
+ {
+ case VPCI_BAR_MEM32:
+ case VPCI_BAR_MEM64_LO:
+ case VPCI_BAR_MEM64_HI:
+ rc = vpci_add_register(vf_pdev->vpci, vpci_guest_mem_bar_read,
+ vpci_guest_mem_bar_write,
+ PCI_BASE_ADDRESS_0 + i * 4, 4, &bars[i]);
+ if ( rc )
+ return rc;
+ break;
+ default:
+ rc = vpci_add_register(vf_pdev->vpci, vpci_read_val, NULL,
+ PCI_BASE_ADDRESS_0 + i * 4, 4,
+ (void *)0);
+ if ( rc )
+ return rc;
+ break;
+ }
+ }
+
+ rc = vf_init_bars(vf_pdev);
+ if ( rc )
+ return rc;
+ }
+#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
+
if ( (pf_pdev->domain == vf_pdev->domain) && (ctrl & PCI_SRIOV_CTRL_MSE) )
{
rc = map_vf(vf_pdev, PCI_COMMAND_MEMORY);
--
2.34.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH v1 6/6] vpci: add SR-IOV support for DomUs
2025-07-25 14:24 ` [PATCH v1 6/6] vpci: add SR-IOV support for DomUs Mykyta Poturai
@ 2025-07-28 15:10 ` Roger Pau Monné
0 siblings, 0 replies; 19+ messages in thread
From: Roger Pau Monné @ 2025-07-28 15:10 UTC (permalink / raw)
To: Mykyta Poturai; +Cc: xen-devel@lists.xenproject.org, Stewart Hildebrand
On Fri, Jul 25, 2025 at 02:24:33PM +0000, Mykyta Poturai wrote:
> From: Stewart Hildebrand <stewart.hildebrand@amd.com>
>
> SR-IOV virtual functions have simplified configuration space such as
> Vendor ID is derived from the physical function and Device ID comes
> from SR-IOV extended capability.
> Emulate those, so we can provide VID/DID pair for guests which use PCI
> passthrough for SR-IOV virtual functions.
>
> Emulate guest BAR register values based on PF BAR values for VFs.
> This allows creating a guest view of the normal BAR registers and emulates
> the size and properties as it is done during PCI device enumeration by
> the guest.
>
> Note, that VFs ROM BAR is read-only and is all zeros, but VF may provide
> access to the PFs ROM via emulation and is not implemented.
>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
> ---
> xen/drivers/vpci/sriov.c | 119 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/xen/drivers/vpci/sriov.c b/xen/drivers/vpci/sriov.c
> index 640430e3e9..bf8d9763c6 100644
> --- a/xen/drivers/vpci/sriov.c
> +++ b/xen/drivers/vpci/sriov.c
> @@ -39,7 +39,8 @@ static int vf_init_bars(const struct pci_dev *vf_pdev)
> for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
> {
> bars[i].addr = physfn_vf_bars[i].addr + vf_idx * physfn_vf_bars[i].size;
> - bars[i].guest_addr = bars[i].addr;
> + if ( pf_pdev->domain == vf_pdev->domain || bars[i].guest_addr == 0 )
> + bars[i].guest_addr = bars[i].addr;
Wouldn't this better be done based on the owner of the device being
the hardware domain? This seems a bit ad-hoc and not quite obvious.
> bars[i].size = physfn_vf_bars[i].size;
> bars[i].type = physfn_vf_bars[i].type;
> bars[i].prefetchable = physfn_vf_bars[i].prefetchable;
> @@ -166,6 +167,44 @@ static void cf_check control_write(const struct pci_dev *pdev, unsigned int reg,
> pci_conf_write16(pdev->sbdf, reg, val);
> }
>
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +static void cf_check vf_cmd_write(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t cmd, void *data)
> +{
> + struct vpci_header *header = data;
> +
> + cmd |= PCI_COMMAND_INTX_DISABLE;
> +
> + header->guest_cmd = cmd;
> +
> + /*
> + * Let Dom0 play with all the bits directly except for the memory
> + * decoding one. Bits that are not allowed for DomU are already
> + * handled above and by the rsvdp_mask.
> + */
> + if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
> + {
> + /*
> + * Ignore the error. No memory has been added or removed from the p2m
> + * (because the actual p2m changes are deferred in defer_map) and the
> + * memory decoding bit has not been changed, so leave everything as-is,
> + * hoping the guest will realize and try again.
> + */
> + map_vf(pdev, cmd);
> + }
> + else
> + pci_conf_write16(pdev->sbdf, reg, cmd);
> +}
This seems to be (mostly) a duplication of the existing cmd_write() in
header.c. Is there anyway that we could use the same handler and
check for whether the device is a VF for any specific VF handling?
> +
> +static uint32_t cf_check vf_cmd_read(const struct pci_dev *pdev,
> + unsigned int reg, void *data)
> +{
> + const struct vpci_header *header = data;
> +
> + return header->guest_cmd;
> +}
This is a verbatim duplicate of guest_cmd_read() in header.c.
> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
> +
> static int vf_init_header(struct pci_dev *vf_pdev)
> {
> const struct pci_dev *pf_pdev;
> @@ -184,6 +223,84 @@ static int vf_init_header(struct pci_dev *vf_pdev)
> sriov_pos = pci_find_ext_capability(pf_pdev->sbdf, PCI_EXT_CAP_ID_SRIOV);
> ctrl = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_CTRL);
>
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> + if ( pf_pdev->domain != vf_pdev->domain )
> + {
> + uint16_t vid = pci_conf_read16(pf_pdev->sbdf, PCI_VENDOR_ID);
> + uint16_t did = pci_conf_read16(pf_pdev->sbdf,
> + sriov_pos + PCI_SRIOV_VF_DID);
> + struct vpci_bar *bars = vf_pdev->vpci->header.bars;
> + unsigned int i;
> +
> + rc = vpci_add_register(vf_pdev->vpci, vpci_read_val, NULL,
> + PCI_VENDOR_ID, 2, (void *)(uintptr_t)vid);
> + if ( rc )
> + return rc;
> +
> + rc = vpci_add_register(vf_pdev->vpci, vpci_read_val, NULL,
> + PCI_DEVICE_ID, 2, (void *)(uintptr_t)did);
> + if ( rc )
> + return rc;
> +
> + /* Hardcode multi-function device bit to 0 */
> + rc = vpci_add_register(vf_pdev->vpci, vpci_read_val, NULL,
> + PCI_HEADER_TYPE, 1,
> + (void *)PCI_HEADER_TYPE_NORMAL);
> + if ( rc )
> + return rc;
> +
> + rc = vpci_add_register(vf_pdev->vpci, vpci_hw_read32, NULL,
> + PCI_CLASS_REVISION, 4, NULL);
> + if ( rc )
> + return rc;
> +
> + /* VF ROZ is covered by ro_mask */
> + rc = vpci_add_register_mask(vf_pdev->vpci, vf_cmd_read, vf_cmd_write,
> + PCI_COMMAND, 2, &vf_pdev->vpci->header,
> + PCI_COMMAND_IO | PCI_COMMAND_SPECIAL |
> + PCI_COMMAND_INVALIDATE |
> + PCI_COMMAND_VGA_PALETTE | PCI_COMMAND_WAIT |
> + PCI_COMMAND_FAST_BACK,
> + 0,
> + PCI_COMMAND_RSVDP_MASK |
> + PCI_COMMAND_PARITY | PCI_COMMAND_SERR,
> + 0);
> + if ( rc )
> + return rc;
> +
> + rc = vpci_init_capability_list(vf_pdev);
> + if ( rc )
> + return rc;
> +
> + for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ )
> + {
> + switch ( pf_pdev->vpci->sriov->vf_bars[i].type )
> + {
> + case VPCI_BAR_MEM32:
> + case VPCI_BAR_MEM64_LO:
> + case VPCI_BAR_MEM64_HI:
> + rc = vpci_add_register(vf_pdev->vpci, vpci_guest_mem_bar_read,
> + vpci_guest_mem_bar_write,
> + PCI_BASE_ADDRESS_0 + i * 4, 4, &bars[i]);
> + if ( rc )
> + return rc;
> + break;
> + default:
> + rc = vpci_add_register(vf_pdev->vpci, vpci_read_val, NULL,
> + PCI_BASE_ADDRESS_0 + i * 4, 4,
> + (void *)0);
> + if ( rc )
> + return rc;
> + break;
> + }
> + }
> +
> + rc = vf_init_bars(vf_pdev);
> + if ( rc )
> + return rc;
> + }
> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */
I think it would be better if the code in sr-iov.c would only deal
with accesses to the SR-IOV capability (so the hardware domain), while
the code to handle SR-IOV VFs was directly added to header.c.
Otherwise the work here will collide from the series by Jiqian that
attempts to clean up the mediation of capabilities done in vPCI:
https://lore.kernel.org/xen-devel/20250728050401.329510-1-Jiqian.Chen@amd.com/
Specifically see patch #2:
https://lore.kernel.org/xen-devel/20250728050401.329510-3-Jiqian.Chen@amd.com/
Which introduces vpci_init_capabilities(), and makes vPCI capability
initialization tied to the PCI device having the capability present.
Thanks, Roger.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 0/6] Implement SR-IOV support for PVH
2025-07-25 14:24 [PATCH v1 0/6] Implement SR-IOV support for PVH Mykyta Poturai
` (5 preceding siblings ...)
2025-07-25 14:24 ` [PATCH v1 6/6] vpci: add SR-IOV support for DomUs Mykyta Poturai
@ 2025-07-28 9:04 ` Roger Pau Monné
6 siblings, 0 replies; 19+ messages in thread
From: Roger Pau Monné @ 2025-07-28 9:04 UTC (permalink / raw)
To: Mykyta Poturai
Cc: xen-devel@lists.xenproject.org, Oleksii Kurochko,
Community Manager, Andrew Cooper, Anthony PERARD, Michal Orzel,
Jan Beulich, Julien Grall, Stefano Stabellini
On Fri, Jul 25, 2025 at 02:24:31PM +0000, Mykyta Poturai wrote:
> This series enables support for PCI SR-IOV capabilty for PVH domains.
> It allows Dom0 to enable and use SR-IOV virtual functions and for this
> functions to be passed to guests.
>
> To achieve this, handlers for SRIOV_CONTROL register and simplified handlers
> for VFs header were implemented.
>
> Core functionality is based on previous works [1] and [2].
References here have been lost?
Thanks, Roger.
^ permalink raw reply [flat|nested] 19+ messages in thread