* [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements
@ 2024-02-18 4:56 Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 01/11] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
` (10 more replies)
0 siblings, 11 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki, qemu-stable
I submitted a RFC series[1] to add support for SR-IOV emulation to
virtio-net-pci. During the development of the series, I fixed some
trivial bugs and made improvements that I think are independently
useful. This series extracts those fixes and improvements from the RFC
series.
[1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6dfaf@daynix.com/
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
Changes in v5:
- Added patch "hw/pci: Always call pcie_sriov_pf_reset()".
- Added patch "pcie_sriov: Reset SR-IOV extended capability".
- Removed a reference to PCI_SRIOV_CTRL_VFE in hw/nvme.
(Michael S. Tsirkin)
- Noted the impact on the guest of patch "pcie_sriov: Do not reset
NumVFs after unregistering VFs". (Michael S. Tsirkin)
- Changed to use pcie_sriov_num_vfs().
- Restored pci_set_power() and changed it to call pci_set_enabled() only
for PFs with an expalanation. (Michael S. Tsirkin)
- Reordered patches.
- Link to v4: https://lore.kernel.org/r/20240214-reuse-v4-0-89ad093a07f4@daynix.com
Changes in v4:
- Reverted the change to pci_rom_bar_explicitly_enabled().
(Michael S. Tsirkin)
- Added patch "pcie_sriov: Do not reset NumVFs after unregistering VFs".
- Added patch "hw/nvme: Refer to dev->exp.sriov_pf.num_vfs".
- Link to v3: https://lore.kernel.org/r/20240212-reuse-v3-0-8017b689ce7f@daynix.com
Changes in v3:
- Extracted patch "hw/pci: Use -1 as a default value for rombar" from
patch "hw/pci: Determine if rombar is explicitly enabled"
(Philippe Mathieu-Daudé)
- Added an audit result of PCIDevice::rom_bar to the message of patch
"hw/pci: Use -1 as a default value for rombar"
(Philippe Mathieu-Daudé)
- Link to v2: https://lore.kernel.org/r/20240210-reuse-v2-0-24ba2a502692@daynix.com
Changes in v2:
- Reset after enabling a function so that NVMe VF state gets updated.
- Link to v1: https://lore.kernel.org/r/20240203-reuse-v1-0-5be8c5ce6338@daynix.com
---
Akihiko Odaki (11):
hw/nvme: Use pcie_sriov_num_vfs()
pcie_sriov: Validate NumVFs
hw/pci: Use -1 as a default value for rombar
hw/pci: Determine if rombar is explicitly enabled
vfio: Avoid inspecting option QDict for rombar
hw/qdev: Remove opts member
pcie_sriov: Reset SR-IOV extended capability
pcie_sriov: Do not reset NumVFs after disabling VFs
hw/pci: Always call pcie_sriov_pf_reset()
hw/pci: Rename has_power to enabled
pcie_sriov: Reuse SR-IOV VF device instances
docs/pcie_sriov.txt | 8 ++-
include/hw/pci/pci.h | 2 +-
include/hw/pci/pci_device.h | 22 ++++++-
include/hw/pci/pcie_sriov.h | 10 ++--
include/hw/qdev-core.h | 4 --
hw/core/qdev.c | 1 -
hw/net/igb.c | 15 +++--
hw/nvme/ctrl.c | 54 ++++++++---------
hw/pci/pci.c | 17 +++---
hw/pci/pci_host.c | 4 +-
hw/pci/pcie_sriov.c | 142 ++++++++++++++++++++++----------------------
hw/vfio/pci.c | 3 +-
system/qdev-monitor.c | 12 ++--
13 files changed, 157 insertions(+), 137 deletions(-)
---
base-commit: 5005aed8a7e728d028efb40e243ecfc2b4f3df3a
change-id: 20240129-reuse-faae22b11934
Best regards,
--
Akihiko Odaki <akihiko.odaki@daynix.com>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v5 01/11] hw/nvme: Use pcie_sriov_num_vfs()
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-19 12:00 ` Klaus Jensen
2024-02-18 4:56 ` [PATCH v5 02/11] pcie_sriov: Validate NumVFs Akihiko Odaki
` (9 subsequent siblings)
10 siblings, 1 reply; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki, qemu-stable
nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
configurations to know the number of VFs being disabled due to SR-IOV
configuration writes, but the logic was flawed and resulted in
out-of-bound memory access.
It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
VFs, but it actually doesn't in the following cases:
- PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
- PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
- VFs were only partially enabled because of realization failure.
It is a responsibility of pcie_sriov to interpret SR-IOV configurations
and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
provides, to get the number of enabled VFs before and after SR-IOV
configuration writes.
Cc: qemu-stable@nongnu.org
Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
hw/nvme/ctrl.c | 26 ++++++++------------------
1 file changed, 8 insertions(+), 18 deletions(-)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f026245d1e9e..7a56e7b79b4d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
}
-static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
- uint32_t val, int len)
+static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
{
NvmeCtrl *n = NVME(dev);
NvmeSecCtrlEntry *sctrl;
- uint16_t sriov_cap = dev->exp.sriov_cap;
- uint32_t off = address - sriov_cap;
- int i, num_vfs;
+ int i;
- if (!sriov_cap) {
- return;
- }
-
- if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
- if (!(val & PCI_SRIOV_CTRL_VFE)) {
- num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
- for (i = 0; i < num_vfs; i++) {
- sctrl = &n->sec_ctrl_list.sec[i];
- nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
- }
- }
+ for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
+ sctrl = &n->sec_ctrl_list.sec[i];
+ nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
}
}
static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
uint32_t val, int len)
{
- nvme_sriov_pre_write_ctrl(dev, address, val, len);
+ uint16_t old_num_vfs = pcie_sriov_num_vfs(dev);
+
pci_default_write_config(dev, address, val, len);
pcie_cap_flr_write_config(dev, address, val, len);
+ nvme_sriov_post_write_config(dev, old_num_vfs);
}
static const VMStateDescription nvme_vmstate = {
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 02/11] pcie_sriov: Validate NumVFs
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 01/11] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 17:36 ` Michael S. Tsirkin
2024-02-18 4:56 ` [PATCH v5 03/11] hw/pci: Use -1 as a default value for rombar Akihiko Odaki
` (8 subsequent siblings)
10 siblings, 1 reply; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki, qemu-stable
The guest may write NumVFs greater than TotalVFs and that can lead
to buffer overflow in VF implementations.
Cc: qemu-stable@nongnu.org
Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
hw/pci/pcie_sriov.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index a1fe65f5d801..da209b7f47fd 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev)
assert(sriov_cap > 0);
num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
+ if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
+ return;
+ }
dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 03/11] hw/pci: Use -1 as a default value for rombar
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 01/11] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 02/11] pcie_sriov: Validate NumVFs Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 04/11] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki
` (7 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
Currently there is no way to distinguish the case that rombar is
explicitly specified as 1 and the case that rombar is not specified.
Set rombar -1 by default to distinguish these cases just as it is done
for addr and romsize. It was confirmed that changing the default value
to -1 will not change the behavior by looking at occurences of rom_bar.
$ git grep -w rom_bar
hw/display/qxl.c:328: QXLRom *rom = memory_region_get_ram_ptr(&d->rom_bar);
hw/display/qxl.c:431: qxl_set_dirty(&qxl->rom_bar, 0, qxl->rom_size);
hw/display/qxl.c:1048: QXLRom *rom = memory_region_get_ram_ptr(&qxl->rom_bar);
hw/display/qxl.c:2131: memory_region_init_rom(&qxl->rom_bar, OBJECT(qxl), "qxl.vrom",
hw/display/qxl.c:2154: PCI_BASE_ADDRESS_SPACE_MEMORY, &qxl->rom_bar);
hw/display/qxl.h:101: MemoryRegion rom_bar;
hw/pci/pci.c:74: DEFINE_PROP_UINT32("rombar", PCIDevice, rom_bar, 1),
hw/pci/pci.c:2329: if (!pdev->rom_bar) {
hw/vfio/pci.c:1019: if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
hw/xen/xen_pt_load_rom.c:29: if (dev->romfile || !dev->rom_bar) {
include/hw/pci/pci_device.h:150: uint32_t rom_bar;
rom_bar refers to a different variable in qxl. It is only tested if
the value is 0 or not in the other places.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
hw/pci/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 6496d027ca61..47f38375bb09 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -71,7 +71,7 @@ static Property pci_props[] = {
DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, -1),
- DEFINE_PROP_UINT32("rombar", PCIDevice, rom_bar, 1),
+ DEFINE_PROP_UINT32("rombar", PCIDevice, rom_bar, -1),
DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present,
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 04/11] hw/pci: Determine if rombar is explicitly enabled
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
` (2 preceding siblings ...)
2024-02-18 4:56 ` [PATCH v5 03/11] hw/pci: Use -1 as a default value for rombar Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 05/11] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki
` (6 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
vfio determines if rombar is explicitly enabled by inspecting QDict.
Inspecting QDict is not nice because QDict is untyped and depends on the
details on the external interface. Add an infrastructure to determine if
rombar is explicitly enabled to hw/pci.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
include/hw/pci/pci_device.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b273..54fa0676abf1 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
}
+static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev)
+{
+ return dev->rom_bar && dev->rom_bar != -1;
+}
+
uint16_t pci_requester_id(PCIDevice *dev);
/* DMA access functions */
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 05/11] vfio: Avoid inspecting option QDict for rombar
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
` (3 preceding siblings ...)
2024-02-18 4:56 ` [PATCH v5 04/11] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 17:26 ` Michael S. Tsirkin
2024-02-18 4:56 ` [PATCH v5 06/11] hw/qdev: Remove opts member Akihiko Odaki
` (5 subsequent siblings)
10 siblings, 1 reply; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
Use pci_rom_bar_explicitly_enabled() to determine if rombar is explicitly
enabled.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
hw/vfio/pci.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4fa387f0430d..647f15b2a060 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1012,7 +1012,6 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
{
uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
- DeviceState *dev = DEVICE(vdev);
char *name;
int fd = vdev->vbasedev.fd;
@@ -1046,7 +1045,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
}
if (vfio_opt_rom_in_denylist(vdev)) {
- if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
+ if (pci_rom_bar_explicitly_enabled(&vdev->pdev)) {
warn_report("Device at %s is known to cause system instability"
" issues during option rom execution",
vdev->vbasedev.name);
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 06/11] hw/qdev: Remove opts member
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
` (4 preceding siblings ...)
2024-02-18 4:56 ` [PATCH v5 05/11] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 07/11] pcie_sriov: Reset SR-IOV extended capability Akihiko Odaki
` (4 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
It is no longer used.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
include/hw/qdev-core.h | 4 ----
hw/core/qdev.c | 1 -
system/qdev-monitor.c | 12 +++++++-----
3 files changed, 7 insertions(+), 10 deletions(-)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9228e96c87e9..5954404dcbfe 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -237,10 +237,6 @@ struct DeviceState {
* @pending_deleted_expires_ms: optional timeout for deletion events
*/
int64_t pending_deleted_expires_ms;
- /**
- * @opts: QDict of options for the device
- */
- QDict *opts;
/**
* @hotplugged: was device added after PHASE_MACHINE_READY?
*/
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index c68d0f7c512f..7349c9a86be8 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -706,7 +706,6 @@ static void device_finalize(Object *obj)
dev->canonical_path = NULL;
}
- qobject_unref(dev->opts);
g_free(dev->id);
}
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index a13db763e5dd..71c00f62ee38 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -625,6 +625,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
char *id;
DeviceState *dev = NULL;
BusState *bus = NULL;
+ QDict *properties;
driver = qdict_get_try_str(opts, "driver");
if (!driver) {
@@ -705,13 +706,14 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
}
/* set properties */
- dev->opts = qdict_clone_shallow(opts);
- qdict_del(dev->opts, "driver");
- qdict_del(dev->opts, "bus");
- qdict_del(dev->opts, "id");
+ properties = qdict_clone_shallow(opts);
+ qdict_del(properties, "driver");
+ qdict_del(properties, "bus");
+ qdict_del(properties, "id");
- object_set_properties_from_keyval(&dev->parent_obj, dev->opts, from_json,
+ object_set_properties_from_keyval(&dev->parent_obj, properties, from_json,
errp);
+ qobject_unref(properties);
if (*errp) {
goto err_del_dev;
}
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 07/11] pcie_sriov: Reset SR-IOV extended capability
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
` (5 preceding siblings ...)
2024-02-18 4:56 ` [PATCH v5 06/11] hw/qdev: Remove opts member Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 08/11] pcie_sriov: Do not reset NumVFs after disabling VFs Akihiko Odaki
` (3 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
pcie_sriov_pf_disable_vfs() is called when resetting the PF, but it only
disables VFs and does not reset SR-IOV extended capability, leaking the
state and making the VF Enable register inconsistent with the actual
state.
Replace pcie_sriov_pf_disable_vfs() with pcie_sriov_pf_reset(), which
does not only disable VFs but also resets the capability.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
include/hw/pci/pcie_sriov.h | 4 ++--
hw/net/igb.c | 2 +-
hw/nvme/ctrl.c | 2 +-
hw/pci/pcie_sriov.c | 26 ++++++++++++++++++--------
4 files changed, 22 insertions(+), 12 deletions(-)
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 095fb0c9edf9..b77eb7bf58ac 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -58,8 +58,8 @@ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
uint32_t val, int len);
-/* Reset SR/IOV VF Enable bit to unregister all VFs */
-void pcie_sriov_pf_disable_vfs(PCIDevice *dev);
+/* Reset SR/IOV */
+void pcie_sriov_pf_reset(PCIDevice *dev);
/* Get logical VF number of a VF - only valid for VFs */
uint16_t pcie_sriov_vf_number(PCIDevice *dev);
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 0b5c31a58bba..9345506f81ec 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -493,7 +493,7 @@ static void igb_qdev_reset_hold(Object *obj)
trace_e1000e_cb_qdev_reset_hold();
- pcie_sriov_pf_disable_vfs(d);
+ pcie_sriov_pf_reset(d);
igb_core_reset(&s->core);
}
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 7a56e7b79b4d..7c0d3f108724 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7116,7 +7116,7 @@ static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
}
if (rst != NVME_RESET_CONTROLLER) {
- pcie_sriov_pf_disable_vfs(pci_dev);
+ pcie_sriov_pf_reset(pci_dev);
}
}
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index da209b7f47fd..51b66d1bb342 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -249,16 +249,26 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
}
-/* Reset SR/IOV VF Enable bit to trigger an unregister of all VFs */
-void pcie_sriov_pf_disable_vfs(PCIDevice *dev)
+/* Reset SR/IOV */
+void pcie_sriov_pf_reset(PCIDevice *dev)
{
uint16_t sriov_cap = dev->exp.sriov_cap;
- if (sriov_cap) {
- uint32_t val = pci_get_byte(dev->config + sriov_cap + PCI_SRIOV_CTRL);
- if (val & PCI_SRIOV_CTRL_VFE) {
- val &= ~PCI_SRIOV_CTRL_VFE;
- pcie_sriov_config_write(dev, sriov_cap + PCI_SRIOV_CTRL, val, 1);
- }
+ if (!sriov_cap) {
+ return;
+ }
+
+ pci_set_word(dev->config + sriov_cap + PCI_SRIOV_CTRL, 0);
+ unregister_vfs(dev);
+
+ /*
+ * Default is to use 4K pages, software can modify it
+ * to any of the supported bits
+ */
+ pci_set_word(dev->config + sriov_cap + PCI_SRIOV_SYS_PGSIZE, 0x1);
+
+ for (uint16_t i = 0; i < PCI_NUM_REGIONS; i++) {
+ pci_set_quad(dev->config + sriov_cap + PCI_SRIOV_BAR + i * 4,
+ dev->exp.sriov_pf.vf_bar_type[i]);
}
}
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 08/11] pcie_sriov: Do not reset NumVFs after disabling VFs
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
` (6 preceding siblings ...)
2024-02-18 4:56 ` [PATCH v5 07/11] pcie_sriov: Reset SR-IOV extended capability Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 09/11] hw/pci: Always call pcie_sriov_pf_reset() Akihiko Odaki
` (2 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
The spec does not NumVFs is reset after disabling VFs except when
resetting the PF. Clearing it is guest visible and out of spec, even
though Linux doesn't rely on this value being preserved, so we never
noticed.
Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
hw/pci/pcie_sriov.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 51b66d1bb342..e9b23221d713 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -215,7 +215,6 @@ static void unregister_vfs(PCIDevice *dev)
g_free(dev->exp.sriov_pf.vf);
dev->exp.sriov_pf.vf = NULL;
dev->exp.sriov_pf.num_vfs = 0;
- pci_set_word(dev->config + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0);
}
void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
@@ -260,6 +259,8 @@ void pcie_sriov_pf_reset(PCIDevice *dev)
pci_set_word(dev->config + sriov_cap + PCI_SRIOV_CTRL, 0);
unregister_vfs(dev);
+ pci_set_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF, 0);
+
/*
* Default is to use 4K pages, software can modify it
* to any of the supported bits
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 09/11] hw/pci: Always call pcie_sriov_pf_reset()
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
` (7 preceding siblings ...)
2024-02-18 4:56 ` [PATCH v5 08/11] pcie_sriov: Do not reset NumVFs after disabling VFs Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 10/11] hw/pci: Rename has_power to enabled Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 11/11] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
10 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
Call pcie_sriov_pf_reset() from pci_do_device_reset() just as we do
for msi_reset() and msix_reset() to prevent duplicating code for each
SR-IOV PF.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
hw/net/igb.c | 2 --
hw/nvme/ctrl.c | 4 ----
hw/pci/pci.c | 1 +
3 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 9345506f81ec..9b37523d6df8 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -488,12 +488,10 @@ static void igb_pci_uninit(PCIDevice *pci_dev)
static void igb_qdev_reset_hold(Object *obj)
{
- PCIDevice *d = PCI_DEVICE(obj);
IGBState *s = IGB(obj);
trace_e1000e_cb_qdev_reset_hold();
- pcie_sriov_pf_reset(d);
igb_core_reset(&s->core);
}
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 7c0d3f108724..c1af4b87b34a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7114,10 +7114,6 @@ static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
sctrl = &n->sec_ctrl_list.sec[i];
nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
}
-
- if (rst != NVME_RESET_CONTROLLER) {
- pcie_sriov_pf_reset(pci_dev);
- }
}
if (rst != NVME_RESET_CONTROLLER) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 47f38375bb09..3d37ddbb101e 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -409,6 +409,7 @@ static void pci_do_device_reset(PCIDevice *dev)
msi_reset(dev);
msix_reset(dev);
+ pcie_sriov_pf_reset(dev);
}
/*
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 10/11] hw/pci: Rename has_power to enabled
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
` (8 preceding siblings ...)
2024-02-18 4:56 ` [PATCH v5 09/11] hw/pci: Always call pcie_sriov_pf_reset() Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 11/11] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
10 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
The renamed state will not only represent powering state of PFs, but
also represent SR-IOV VF enablement in the future.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
include/hw/pci/pci.h | 7 ++++++-
include/hw/pci/pci_device.h | 2 +-
hw/pci/pci.c | 14 +++++++-------
hw/pci/pci_host.c | 4 ++--
4 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index eaa3fc99d884..6c92b2f70008 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -642,6 +642,11 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
}
MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
-void pci_set_power(PCIDevice *pci_dev, bool state);
+void pci_set_enabled(PCIDevice *pci_dev, bool state);
+
+static inline void pci_set_power(PCIDevice *pci_dev, bool state)
+{
+ pci_set_enabled(pci_dev, state);
+}
#endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index 54fa0676abf1..cde53065e4fe 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -56,7 +56,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
struct PCIDevice {
DeviceState qdev;
bool partially_hotplugged;
- bool has_power;
+ bool enabled;
/* PCI config space */
uint8_t *config;
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 3d37ddbb101e..1543f284b255 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1525,7 +1525,7 @@ static void pci_update_mappings(PCIDevice *d)
continue;
new_addr = pci_bar_address(d, i, r->type, r->size);
- if (!d->has_power) {
+ if (!d->enabled) {
new_addr = PCI_BAR_UNMAPPED;
}
@@ -1613,7 +1613,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
pci_update_irq_disabled(d, was_irq_disabled);
memory_region_set_enabled(&d->bus_master_enable_region,
(pci_get_word(d->config + PCI_COMMAND)
- & PCI_COMMAND_MASTER) && d->has_power);
+ & PCI_COMMAND_MASTER) && d->enabled);
}
msi_write_config(d, addr, val_in, l);
@@ -2811,18 +2811,18 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
return msg;
}
-void pci_set_power(PCIDevice *d, bool state)
+void pci_set_enabled(PCIDevice *d, bool state)
{
- if (d->has_power == state) {
+ if (d->enabled == state) {
return;
}
- d->has_power = state;
+ d->enabled = state;
pci_update_mappings(d);
memory_region_set_enabled(&d->bus_master_enable_region,
(pci_get_word(d->config + PCI_COMMAND)
- & PCI_COMMAND_MASTER) && d->has_power);
- if (!d->has_power) {
+ & PCI_COMMAND_MASTER) && d->enabled);
+ if (!d->enabled) {
pci_device_reset(d);
}
}
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index dfe6fe618401..0d82727cc9dd 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -86,7 +86,7 @@ void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr,
* allowing direct removal of unexposed functions.
*/
if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
- !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) {
+ !pci_dev->enabled || is_pci_dev_ejected(pci_dev)) {
return;
}
@@ -111,7 +111,7 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr,
* allowing direct removal of unexposed functions.
*/
if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
- !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) {
+ !pci_dev->enabled || is_pci_dev_ejected(pci_dev)) {
return ~0x0;
}
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v5 11/11] pcie_sriov: Reuse SR-IOV VF device instances
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
` (9 preceding siblings ...)
2024-02-18 4:56 ` [PATCH v5 10/11] hw/pci: Rename has_power to enabled Akihiko Odaki
@ 2024-02-18 4:56 ` Akihiko Odaki
10 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-18 4:56 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, Klaus Jensen
Cc: qemu-devel, qemu-block, Akihiko Odaki
Disable SR-IOV VF devices by reusing code to power down PCI devices
instead of removing them when the guest requests to disable VFs. This
allows to realize devices and report VF realization errors at PF
realization time.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
docs/pcie_sriov.txt | 8 ++--
include/hw/pci/pci.h | 5 --
include/hw/pci/pci_device.h | 15 ++++++
include/hw/pci/pcie_sriov.h | 6 +--
hw/net/igb.c | 13 ++++--
hw/nvme/ctrl.c | 24 ++++++----
hw/pci/pci.c | 2 +-
hw/pci/pcie_sriov.c | 110 ++++++++++++++++++++------------------------
8 files changed, 99 insertions(+), 84 deletions(-)
diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index a47aad0bfab0..ab2142807f79 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -52,9 +52,11 @@ setting up a BAR for a VF.
...
/* Add and initialize the SR/IOV capability */
- pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
- vf_devid, initial_vfs, total_vfs,
- fun_offset, stride);
+ if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+ vf_devid, initial_vfs, total_vfs,
+ fun_offset, stride, errp)) {
+ return;
+ }
/* Set up individual VF BARs (parameters as for normal BARs) */
pcie_sriov_pf_init_vf_bar( ... )
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6c92b2f70008..442017b4865d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -644,9 +644,4 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
void pci_set_enabled(PCIDevice *pci_dev, bool state);
-static inline void pci_set_power(PCIDevice *pci_dev, bool state)
-{
- pci_set_enabled(pci_dev, state);
-}
-
#endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index cde53065e4fe..c4fdc96ef50d 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -210,6 +210,21 @@ static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev)
return dev->rom_bar && dev->rom_bar != -1;
}
+static inline void pci_set_power(PCIDevice *pci_dev, bool state)
+{
+ /*
+ * Don't change the enabled state of VFs when powering on/off the device.
+ *
+ * When powering on, VFs must not be enabled immediately but they must
+ * wait until the guest configures SR-IOV.
+ * When powering off, their corresponding PFs will be reset and disable
+ * VFs.
+ */
+ if (!pci_is_vf(pci_dev)) {
+ pci_set_enabled(pci_dev, state);
+ }
+}
+
uint16_t pci_requester_id(PCIDevice *dev);
/* DMA access functions */
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index b77eb7bf58ac..4b1133f79e15 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,7 +18,6 @@
struct PCIESriovPF {
uint16_t num_vfs; /* Number of virtual functions created */
uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each VF bar */
- const char *vfname; /* Reference to the device type used for the VFs */
PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */
};
@@ -27,10 +26,11 @@ struct PCIESriovVF {
uint16_t vf_number; /* Logical VF number of this function */
};
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
const char *vfname, uint16_t vf_dev_id,
uint16_t init_vfs, uint16_t total_vfs,
- uint16_t vf_offset, uint16_t vf_stride);
+ uint16_t vf_offset, uint16_t vf_stride,
+ Error **errp);
void pcie_sriov_pf_exit(PCIDevice *dev);
/* Set up a VF bar in the SR/IOV bar area */
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 9b37523d6df8..907259fd8b3b 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -447,9 +447,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp)
pcie_ari_init(pci_dev, 0x150);
- pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
- IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
- IGB_VF_OFFSET, IGB_VF_STRIDE);
+ if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET,
+ TYPE_IGBVF, IGB_82576_VF_DEV_ID,
+ IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
+ IGB_VF_OFFSET, IGB_VF_STRIDE,
+ errp)) {
+ pcie_cap_exit(pci_dev);
+ igb_cleanup_msix(s);
+ msi_uninit(pci_dev);
+ return;
+ }
pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index c1af4b87b34a..98c3e942077c 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8027,7 +8027,8 @@ static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs,
return bar_size;
}
-static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
+static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+ Error **errp)
{
uint16_t vf_dev_id = n->params.use_intel_id ?
PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
@@ -8036,12 +8037,17 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
le16_to_cpu(cap->vifrsm),
NULL, NULL);
- pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
- n->params.sriov_max_vfs, n->params.sriov_max_vfs,
- NVME_VF_OFFSET, NVME_VF_STRIDE);
+ if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
+ n->params.sriov_max_vfs, n->params.sriov_max_vfs,
+ NVME_VF_OFFSET, NVME_VF_STRIDE,
+ errp)) {
+ return false;
+ }
pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
+
+ return true;
}
static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
@@ -8120,6 +8126,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
return false;
}
+ if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs &&
+ !nvme_init_sriov(n, pci_dev, 0x120, errp)) {
+ msix_uninit(pci_dev, &n->bar0, &n->bar0);
+ return false;
+ }
+
nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
if (n->params.cmb_size_mb) {
@@ -8130,10 +8142,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
nvme_init_pmr(n, pci_dev);
}
- if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) {
- nvme_init_sriov(n, pci_dev, 0x120);
- }
-
return true;
}
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 1543f284b255..b9d271b30bd5 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2822,7 +2822,7 @@ void pci_set_enabled(PCIDevice *d, bool state)
memory_region_set_enabled(&d->bus_master_enable_region,
(pci_get_word(d->config + PCI_COMMAND)
& PCI_COMMAND_MASTER) && d->enabled);
- if (!d->enabled) {
+ if (d->qdev.realized) {
pci_device_reset(d);
}
}
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index e9b23221d713..e40f9631ed93 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -20,15 +20,29 @@
#include "qapi/error.h"
#include "trace.h"
-static PCIDevice *register_vf(PCIDevice *pf, int devfn,
- const char *name, uint16_t vf_num);
-static void unregister_vfs(PCIDevice *dev);
+static void unrealize_vfs(PCIDevice *dev, uint16_t total_vfs)
+{
+ for (uint16_t i = 0; i < total_vfs; i++) {
+ Error *err = NULL;
+ PCIDevice *vf = dev->exp.sriov_pf.vf[i];
+ if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) {
+ error_reportf_err(err, "Failed to unplug: ");
+ }
+ object_unparent(OBJECT(vf));
+ object_unref(OBJECT(vf));
+ }
+ g_free(dev->exp.sriov_pf.vf);
+ dev->exp.sriov_pf.vf = NULL;
+}
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
const char *vfname, uint16_t vf_dev_id,
uint16_t init_vfs, uint16_t total_vfs,
- uint16_t vf_offset, uint16_t vf_stride)
+ uint16_t vf_offset, uint16_t vf_stride,
+ Error **errp)
{
+ BusState *bus = qdev_get_parent_bus(&dev->qdev);
+ int32_t devfn = dev->devfn + vf_offset;
uint8_t *cfg = dev->config + offset;
uint8_t *wmask;
@@ -36,7 +50,6 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
offset, PCI_EXT_CAP_SRIOV_SIZEOF);
dev->exp.sriov_cap = offset;
dev->exp.sriov_pf.num_vfs = 0;
- dev->exp.sriov_pf.vfname = g_strdup(vfname);
dev->exp.sriov_pf.vf = NULL;
pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -69,13 +82,35 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
qdev_prop_set_bit(&dev->qdev, "multifunction", true);
+
+ dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs);
+
+ for (uint16_t i = 0; i < total_vfs; i++) {
+ PCIDevice *vf = pci_new(devfn, vfname);
+ vf->exp.sriov_vf.pf = dev;
+ vf->exp.sriov_vf.vf_number = i;
+
+ if (!qdev_realize(&vf->qdev, bus, errp)) {
+ unrealize_vfs(dev, i);
+ return false;
+ }
+
+ /* set vid/did according to sr/iov spec - they are not used */
+ pci_config_set_vendor_id(vf->config, 0xffff);
+ pci_config_set_device_id(vf->config, 0xffff);
+
+ dev->exp.sriov_pf.vf[i] = vf;
+ devfn += vf_stride;
+ }
+
+ return true;
}
void pcie_sriov_pf_exit(PCIDevice *dev)
{
- unregister_vfs(dev);
- g_free((char *)dev->exp.sriov_pf.vfname);
- dev->exp.sriov_pf.vfname = NULL;
+ uint8_t *cfg = dev->config + dev->exp.sriov_cap;
+
+ unrealize_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF));
}
void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
@@ -141,38 +176,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
}
}
-static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name,
- uint16_t vf_num)
-{
- PCIDevice *dev = pci_new(devfn, name);
- dev->exp.sriov_vf.pf = pf;
- dev->exp.sriov_vf.vf_number = vf_num;
- PCIBus *bus = pci_get_bus(pf);
- Error *local_err = NULL;
-
- qdev_realize(&dev->qdev, &bus->qbus, &local_err);
- if (local_err) {
- error_report_err(local_err);
- return NULL;
- }
-
- /* set vid/did according to sr/iov spec - they are not used */
- pci_config_set_vendor_id(dev->config, 0xffff);
- pci_config_set_device_id(dev->config, 0xffff);
-
- return dev;
-}
-
static void register_vfs(PCIDevice *dev)
{
uint16_t num_vfs;
uint16_t i;
uint16_t sriov_cap = dev->exp.sriov_cap;
- uint16_t vf_offset =
- pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
- uint16_t vf_stride =
- pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
- int32_t devfn = dev->devfn + vf_offset;
assert(sriov_cap > 0);
num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
@@ -180,18 +188,10 @@ static void register_vfs(PCIDevice *dev)
return;
}
- dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
-
trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
PCI_FUNC(dev->devfn), num_vfs);
for (i = 0; i < num_vfs; i++) {
- dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn,
- dev->exp.sriov_pf.vfname, i);
- if (!dev->exp.sriov_pf.vf[i]) {
- num_vfs = i;
- break;
- }
- devfn += vf_stride;
+ pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
}
dev->exp.sriov_pf.num_vfs = num_vfs;
}
@@ -204,16 +204,8 @@ static void unregister_vfs(PCIDevice *dev)
trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
PCI_FUNC(dev->devfn), num_vfs);
for (i = 0; i < num_vfs; i++) {
- Error *err = NULL;
- PCIDevice *vf = dev->exp.sriov_pf.vf[i];
- if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) {
- error_reportf_err(err, "Failed to unplug: ");
- }
- object_unparent(OBJECT(vf));
- object_unref(OBJECT(vf));
+ pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
}
- g_free(dev->exp.sriov_pf.vf);
- dev->exp.sriov_pf.vf = NULL;
dev->exp.sriov_pf.num_vfs = 0;
}
@@ -235,14 +227,10 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
PCI_FUNC(dev->devfn), off, val, len);
if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
- if (dev->exp.sriov_pf.num_vfs) {
- if (!(val & PCI_SRIOV_CTRL_VFE)) {
- unregister_vfs(dev);
- }
+ if (val & PCI_SRIOV_CTRL_VFE) {
+ register_vfs(dev);
} else {
- if (val & PCI_SRIOV_CTRL_VFE) {
- register_vfs(dev);
- }
+ unregister_vfs(dev);
}
}
}
--
2.43.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v5 05/11] vfio: Avoid inspecting option QDict for rombar
2024-02-18 4:56 ` [PATCH v5 05/11] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki
@ 2024-02-18 17:26 ` Michael S. Tsirkin
0 siblings, 0 replies; 16+ messages in thread
From: Michael S. Tsirkin @ 2024-02-18 17:26 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson,
Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
Klaus Jensen, qemu-devel, qemu-block
On Sun, Feb 18, 2024 at 01:56:10PM +0900, Akihiko Odaki wrote:
> Use pci_rom_bar_explicitly_enabled() to determine if rombar is explicitly
> enabled.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
I see little point in all this reworks: QDict lookups are
robust. But if Alex wants this change, I won't oppose it.
Alex?
> ---
> hw/vfio/pci.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 4fa387f0430d..647f15b2a060 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1012,7 +1012,6 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
> {
> uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
> off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
> - DeviceState *dev = DEVICE(vdev);
> char *name;
> int fd = vdev->vbasedev.fd;
>
> @@ -1046,7 +1045,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
> }
>
> if (vfio_opt_rom_in_denylist(vdev)) {
> - if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
> + if (pci_rom_bar_explicitly_enabled(&vdev->pdev)) {
> warn_report("Device at %s is known to cause system instability"
> " issues during option rom execution",
> vdev->vbasedev.name);
>
> --
> 2.43.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 02/11] pcie_sriov: Validate NumVFs
2024-02-18 4:56 ` [PATCH v5 02/11] pcie_sriov: Validate NumVFs Akihiko Odaki
@ 2024-02-18 17:36 ` Michael S. Tsirkin
2024-02-19 8:13 ` Akihiko Odaki
0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2024-02-18 17:36 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson,
Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
Klaus Jensen, qemu-devel, qemu-block, qemu-stable
On Sun, Feb 18, 2024 at 01:56:07PM +0900, Akihiko Odaki wrote:
> The guest may write NumVFs greater than TotalVFs and that can lead
> to buffer overflow in VF implementations.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> hw/pci/pcie_sriov.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> index a1fe65f5d801..da209b7f47fd 100644
> --- a/hw/pci/pcie_sriov.c
> +++ b/hw/pci/pcie_sriov.c
> @@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev)
>
> assert(sriov_cap > 0);
> num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> + if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
> + return;
> + }
>
> dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
This reminds me: how is this num_vfs value set on migration?
>
> --
> 2.43.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 02/11] pcie_sriov: Validate NumVFs
2024-02-18 17:36 ` Michael S. Tsirkin
@ 2024-02-19 8:13 ` Akihiko Odaki
0 siblings, 0 replies; 16+ messages in thread
From: Akihiko Odaki @ 2024-02-19 8:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson,
Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
Klaus Jensen, qemu-devel, qemu-block, qemu-stable
On 2024/02/19 2:36, Michael S. Tsirkin wrote:
> On Sun, Feb 18, 2024 at 01:56:07PM +0900, Akihiko Odaki wrote:
>> The guest may write NumVFs greater than TotalVFs and that can lead
>> to buffer overflow in VF implementations.
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>> ---
>> hw/pci/pcie_sriov.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
>> index a1fe65f5d801..da209b7f47fd 100644
>> --- a/hw/pci/pcie_sriov.c
>> +++ b/hw/pci/pcie_sriov.c
>> @@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev)
>>
>> assert(sriov_cap > 0);
>> num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
>> + if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
>> + return;
>> + }
>>
>> dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
>
>
> This reminds me: how is this num_vfs value set on migration?
That's a good point... Actually no consideration of migration is made
and SR-IOV is completely broken with it.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v5 01/11] hw/nvme: Use pcie_sriov_num_vfs()
2024-02-18 4:56 ` [PATCH v5 01/11] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
@ 2024-02-19 12:00 ` Klaus Jensen
0 siblings, 0 replies; 16+ messages in thread
From: Klaus Jensen @ 2024-02-19 12:00 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum,
Alex Williamson, Cédric Le Goater, Paolo Bonzini,
Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman,
Jason Wang, Keith Busch, qemu-devel, qemu-block, qemu-stable
[-- Attachment #1: Type: text/plain, Size: 3304 bytes --]
On Feb 18 13:56, Akihiko Odaki wrote:
> nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> configurations to know the number of VFs being disabled due to SR-IOV
> configuration writes, but the logic was flawed and resulted in
> out-of-bound memory access.
>
> It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> VFs, but it actually doesn't in the following cases:
> - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> - VFs were only partially enabled because of realization failure.
>
> It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> provides, to get the number of enabled VFs before and after SR-IOV
> configuration writes.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Thanks Akihiko,
I'll pick this up for hw/nvme nvme-next as-is.
Reviewed-by: Klaus Jensen <k.jensen@samsung.com>
> ---
> hw/nvme/ctrl.c | 26 ++++++++------------------
> 1 file changed, 8 insertions(+), 18 deletions(-)
>
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index f026245d1e9e..7a56e7b79b4d 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> }
>
> -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> - uint32_t val, int len)
> +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> {
> NvmeCtrl *n = NVME(dev);
> NvmeSecCtrlEntry *sctrl;
> - uint16_t sriov_cap = dev->exp.sriov_cap;
> - uint32_t off = address - sriov_cap;
> - int i, num_vfs;
> + int i;
>
> - if (!sriov_cap) {
> - return;
> - }
> -
> - if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> - if (!(val & PCI_SRIOV_CTRL_VFE)) {
> - num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> - for (i = 0; i < num_vfs; i++) {
> - sctrl = &n->sec_ctrl_list.sec[i];
> - nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> - }
> - }
> + for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> + sctrl = &n->sec_ctrl_list.sec[i];
> + nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> }
> }
>
> static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
> uint32_t val, int len)
> {
> - nvme_sriov_pre_write_ctrl(dev, address, val, len);
> + uint16_t old_num_vfs = pcie_sriov_num_vfs(dev);
> +
> pci_default_write_config(dev, address, val, len);
> pcie_cap_flr_write_config(dev, address, val, len);
> + nvme_sriov_post_write_config(dev, old_num_vfs);
> }
>
> static const VMStateDescription nvme_vmstate = {
>
> --
> 2.43.1
>
--
One of us - No more doubt, silence or taboo about mental illness.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-02-19 12:00 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-18 4:56 [PATCH v5 00/11] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 01/11] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
2024-02-19 12:00 ` Klaus Jensen
2024-02-18 4:56 ` [PATCH v5 02/11] pcie_sriov: Validate NumVFs Akihiko Odaki
2024-02-18 17:36 ` Michael S. Tsirkin
2024-02-19 8:13 ` Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 03/11] hw/pci: Use -1 as a default value for rombar Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 04/11] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 05/11] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki
2024-02-18 17:26 ` Michael S. Tsirkin
2024-02-18 4:56 ` [PATCH v5 06/11] hw/qdev: Remove opts member Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 07/11] pcie_sriov: Reset SR-IOV extended capability Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 08/11] pcie_sriov: Do not reset NumVFs after disabling VFs Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 09/11] hw/pci: Always call pcie_sriov_pf_reset() Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 10/11] hw/pci: Rename has_power to enabled Akihiko Odaki
2024-02-18 4:56 ` [PATCH v5 11/11] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).