* [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements @ 2024-02-12 10:20 Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 1/7] hw/pci: Use -1 as a default value for rombar Akihiko Odaki ` (7 more replies) 0 siblings, 8 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-12 10:20 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Akihiko Odaki I submitted a RFC series[1] to add support for SR-IOV emulation to virtio-net-pci. During the development of the series, I fixed some trivial bugs and made improvements that I think are independently useful. This series extracts those fixes and improvements from the RFC series. Below is an explanation of the patches: Patch 1 adds a function to check if ROM BAR is explicitly enabled. It is used in the RFC series to report an error if the user requests to enable ROM BAR for SR-IOV VF. Patch 2 and 3 use it for vfio to remove hacky device option dictionary inspection. Patch 4 adds SR-IOV NumVFs validation to fix potential buffer overflow. Patch 5 changes to realize SR-IOV VFs when the PF is being realized to validate VF configuration. Patch 6 fixes memory leak that occurs if a SR-IOV VF fails to realize. [1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6dfaf@daynix.com/ Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- Changes in v3: - Extracted patch "hw/pci: Use -1 as a default value for rombar" from patch "hw/pci: Determine if rombar is explicitly enabled" (Philippe Mathieu-Daudé) - Added an audit result of PCIDevice::rom_bar to the message of patch "hw/pci: Use -1 as a default value for rombar" (Philippe Mathieu-Daudé) - Link to v2: https://lore.kernel.org/r/20240210-reuse-v2-0-24ba2a502692@daynix.com Changes in v2: - Reset after enabling a function so that NVMe VF state gets updated. - Link to v1: https://lore.kernel.org/r/20240203-reuse-v1-0-5be8c5ce6338@daynix.com --- Akihiko Odaki (7): hw/pci: Use -1 as a default value for rombar hw/pci: Determine if rombar is explicitly enabled vfio: Avoid inspecting option QDict for rombar hw/qdev: Remove opts member pcie_sriov: Validate NumVFs pcie_sriov: Reuse SR-IOV VF device instances pcie_sriov: Release VFs failed to realize docs/pcie_sriov.txt | 8 ++-- include/hw/pci/pci.h | 2 +- include/hw/pci/pci_device.h | 7 ++- include/hw/pci/pcie_sriov.h | 6 +-- include/hw/qdev-core.h | 4 -- hw/core/qdev.c | 1 - hw/net/igb.c | 13 ++++-- hw/nvme/ctrl.c | 24 ++++++---- hw/pci/pci.c | 20 +++++---- hw/pci/pci_host.c | 4 +- hw/pci/pcie.c | 4 +- hw/pci/pcie_sriov.c | 105 +++++++++++++++++++++----------------------- hw/vfio/pci.c | 3 +- system/qdev-monitor.c | 12 ++--- 14 files changed, 116 insertions(+), 97 deletions(-) --- base-commit: 4a4efae44f19528589204581e9e2fab69c5d39aa change-id: 20240129-reuse-faae22b11934 Best regards, -- Akihiko Odaki <akihiko.odaki@daynix.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 1/7] hw/pci: Use -1 as a default value for rombar 2024-02-12 10:20 [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki @ 2024-02-12 10:20 ` Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 2/7] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki ` (6 subsequent siblings) 7 siblings, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-12 10:20 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Akihiko Odaki Currently there is no way to distinguish the case that rombar is explicitly specified as 1 and the case that rombar is not specified. Set rombar -1 by default to distinguish these cases just as it is done for addr and romsize. It was confirmed that changing the default value to -1 will not change the behavior by looking at occurences of rom_bar. $ git grep -w rom_bar hw/display/qxl.c:328: QXLRom *rom = memory_region_get_ram_ptr(&d->rom_bar); hw/display/qxl.c:431: qxl_set_dirty(&qxl->rom_bar, 0, qxl->rom_size); hw/display/qxl.c:1048: QXLRom *rom = memory_region_get_ram_ptr(&qxl->rom_bar); hw/display/qxl.c:2131: memory_region_init_rom(&qxl->rom_bar, OBJECT(qxl), "qxl.vrom", hw/display/qxl.c:2154: PCI_BASE_ADDRESS_SPACE_MEMORY, &qxl->rom_bar); hw/display/qxl.h:101: MemoryRegion rom_bar; hw/pci/pci.c:74: DEFINE_PROP_UINT32("rombar", PCIDevice, rom_bar, 1), hw/pci/pci.c:2329: if (!pdev->rom_bar) { hw/vfio/pci.c:1019: if (vdev->pdev.romfile || !vdev->pdev.rom_bar) { hw/xen/xen_pt_load_rom.c:29: if (dev->romfile || !dev->rom_bar) { include/hw/pci/pci_device.h:150: uint32_t rom_bar; rom_bar refers to a different variable in qxl. It is only tested if the value is 0 or not in the other places. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/pci/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 76080af580d7..d08548d8ffe9 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -71,7 +71,7 @@ static Property pci_props[] = { DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1), DEFINE_PROP_STRING("romfile", PCIDevice, romfile), DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, -1), - DEFINE_PROP_UINT32("rombar", PCIDevice, rom_bar, 1), + DEFINE_PROP_UINT32("rombar", PCIDevice, rom_bar, -1), DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present, QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false), DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present, -- 2.43.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 2/7] hw/pci: Determine if rombar is explicitly enabled 2024-02-12 10:20 [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 1/7] hw/pci: Use -1 as a default value for rombar Akihiko Odaki @ 2024-02-12 10:20 ` Akihiko Odaki 2024-02-13 10:52 ` Michael S. Tsirkin 2024-02-12 10:20 ` [PATCH v3 3/7] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki ` (5 subsequent siblings) 7 siblings, 1 reply; 17+ messages in thread From: Akihiko Odaki @ 2024-02-12 10:20 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Akihiko Odaki vfio determines if rombar is explicitly enabled by inspecting QDict. Inspecting QDict is not nice because QDict is untyped and depends on the details on the external interface. Add an infrastructure to determine if rombar is explicitly enabled to hw/pci. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- include/hw/pci/pci_device.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h index d3dd0f64b273..7564e9536dbd 100644 --- a/include/hw/pci/pci_device.h +++ b/include/hw/pci/pci_device.h @@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev) return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn); } +static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev) +{ + return dev->rom_bar > 0; +} + uint16_t pci_requester_id(PCIDevice *dev); /* DMA access functions */ -- 2.43.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 2/7] hw/pci: Determine if rombar is explicitly enabled 2024-02-12 10:20 ` [PATCH v3 2/7] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki @ 2024-02-13 10:52 ` Michael S. Tsirkin 2024-02-13 12:07 ` Akihiko Odaki 0 siblings, 1 reply; 17+ messages in thread From: Michael S. Tsirkin @ 2024-02-13 10:52 UTC (permalink / raw) To: Akihiko Odaki Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, qemu-devel, qemu-block On Mon, Feb 12, 2024 at 07:20:30PM +0900, Akihiko Odaki wrote: > vfio determines if rombar is explicitly enabled by inspecting QDict. > Inspecting QDict is not nice because QDict is untyped and depends on the > details on the external interface. Add an infrastructure to determine if > rombar is explicitly enabled to hw/pci. > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > --- > include/hw/pci/pci_device.h | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h > index d3dd0f64b273..7564e9536dbd 100644 > --- a/include/hw/pci/pci_device.h > +++ b/include/hw/pci/pci_device.h > @@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev) > return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn); > } > > +static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev) > +{ > + return dev->rom_bar > 0; > +} > + I don't get it. rom_bar is uint32_t if it's set to "-1" is is still >0. How was this patchset tested? > uint16_t pci_requester_id(PCIDevice *dev); > > /* DMA access functions */ > > -- > 2.43.0 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 2/7] hw/pci: Determine if rombar is explicitly enabled 2024-02-13 10:52 ` Michael S. Tsirkin @ 2024-02-13 12:07 ` Akihiko Odaki 0 siblings, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-13 12:07 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, qemu-devel, qemu-block On 2024/02/13 19:52, Michael S. Tsirkin wrote: > On Mon, Feb 12, 2024 at 07:20:30PM +0900, Akihiko Odaki wrote: >> vfio determines if rombar is explicitly enabled by inspecting QDict. >> Inspecting QDict is not nice because QDict is untyped and depends on the >> details on the external interface. Add an infrastructure to determine if >> rombar is explicitly enabled to hw/pci. >> >> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> >> --- >> include/hw/pci/pci_device.h | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h >> index d3dd0f64b273..7564e9536dbd 100644 >> --- a/include/hw/pci/pci_device.h >> +++ b/include/hw/pci/pci_device.h >> @@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev) >> return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn); >> } >> >> +static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev) >> +{ >> + return dev->rom_bar > 0; >> +} >> + > > I don't get it. rom_bar is uint32_t if it's set to "-1" is is still >0. > > How was this patchset tested? I was careless forgot to test v3. I'll revert this line into what v2 had. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 3/7] vfio: Avoid inspecting option QDict for rombar 2024-02-12 10:20 [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 1/7] hw/pci: Use -1 as a default value for rombar Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 2/7] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki @ 2024-02-12 10:20 ` Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 4/7] hw/qdev: Remove opts member Akihiko Odaki ` (4 subsequent siblings) 7 siblings, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-12 10:20 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Akihiko Odaki Use pci_rom_bar_explicitly_enabled() to determine if rombar is explicitly enabled. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/vfio/pci.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index d7fe06715c4b..44178ac9355f 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -1010,7 +1010,6 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev) { uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK); off_t offset = vdev->config_offset + PCI_ROM_ADDRESS; - DeviceState *dev = DEVICE(vdev); char *name; int fd = vdev->vbasedev.fd; @@ -1044,7 +1043,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev) } if (vfio_opt_rom_in_denylist(vdev)) { - if (dev->opts && qdict_haskey(dev->opts, "rombar")) { + if (pci_rom_bar_explicitly_enabled(&vdev->pdev)) { warn_report("Device at %s is known to cause system instability" " issues during option rom execution", vdev->vbasedev.name); -- 2.43.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 4/7] hw/qdev: Remove opts member 2024-02-12 10:20 [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki ` (2 preceding siblings ...) 2024-02-12 10:20 ` [PATCH v3 3/7] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki @ 2024-02-12 10:20 ` Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 5/7] pcie_sriov: Validate NumVFs Akihiko Odaki ` (3 subsequent siblings) 7 siblings, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-12 10:20 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Akihiko Odaki It is no longer used. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> --- include/hw/qdev-core.h | 4 ---- hw/core/qdev.c | 1 - system/qdev-monitor.c | 12 +++++++----- 3 files changed, 7 insertions(+), 10 deletions(-) diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index 151d9682380d..6befbca31117 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -237,10 +237,6 @@ struct DeviceState { * @pending_deleted_expires_ms: optional timeout for deletion events */ int64_t pending_deleted_expires_ms; - /** - * @opts: QDict of options for the device - */ - QDict *opts; /** * @hotplugged: was device added after PHASE_MACHINE_READY? */ diff --git a/hw/core/qdev.c b/hw/core/qdev.c index 43d863b0c5b0..c98691a90d48 100644 --- a/hw/core/qdev.c +++ b/hw/core/qdev.c @@ -706,7 +706,6 @@ static void device_finalize(Object *obj) dev->canonical_path = NULL; } - qobject_unref(dev->opts); g_free(dev->id); } diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c index a13db763e5dd..71c00f62ee38 100644 --- a/system/qdev-monitor.c +++ b/system/qdev-monitor.c @@ -625,6 +625,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts, char *id; DeviceState *dev = NULL; BusState *bus = NULL; + QDict *properties; driver = qdict_get_try_str(opts, "driver"); if (!driver) { @@ -705,13 +706,14 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts, } /* set properties */ - dev->opts = qdict_clone_shallow(opts); - qdict_del(dev->opts, "driver"); - qdict_del(dev->opts, "bus"); - qdict_del(dev->opts, "id"); + properties = qdict_clone_shallow(opts); + qdict_del(properties, "driver"); + qdict_del(properties, "bus"); + qdict_del(properties, "id"); - object_set_properties_from_keyval(&dev->parent_obj, dev->opts, from_json, + object_set_properties_from_keyval(&dev->parent_obj, properties, from_json, errp); + qobject_unref(properties); if (*errp) { goto err_del_dev; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v3 5/7] pcie_sriov: Validate NumVFs 2024-02-12 10:20 [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki ` (3 preceding siblings ...) 2024-02-12 10:20 ` [PATCH v3 4/7] hw/qdev: Remove opts member Akihiko Odaki @ 2024-02-12 10:20 ` Akihiko Odaki 2024-02-13 10:59 ` Michael S. Tsirkin 2024-02-12 10:20 ` [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki ` (2 subsequent siblings) 7 siblings, 1 reply; 17+ messages in thread From: Akihiko Odaki @ 2024-02-12 10:20 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Akihiko Odaki The guest may write NumVFs greater than TotalVFs and that can lead to buffer overflow in VF implementations. Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/pci/pcie_sriov.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c index a1fe65f5d801..da209b7f47fd 100644 --- a/hw/pci/pcie_sriov.c +++ b/hw/pci/pcie_sriov.c @@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev) assert(sriov_cap > 0); num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF); + if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) { + return; + } dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs); -- 2.43.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 5/7] pcie_sriov: Validate NumVFs 2024-02-12 10:20 ` [PATCH v3 5/7] pcie_sriov: Validate NumVFs Akihiko Odaki @ 2024-02-13 10:59 ` Michael S. Tsirkin 2024-02-13 14:29 ` Akihiko Odaki 0 siblings, 1 reply; 17+ messages in thread From: Michael S. Tsirkin @ 2024-02-13 10:59 UTC (permalink / raw) To: Akihiko Odaki Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, qemu-devel, qemu-block On Mon, Feb 12, 2024 at 07:20:33PM +0900, Akihiko Odaki wrote: > The guest may write NumVFs greater than TotalVFs and that can lead > to buffer overflow in VF implementations. > > Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)") > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > --- > hw/pci/pcie_sriov.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c > index a1fe65f5d801..da209b7f47fd 100644 > --- a/hw/pci/pcie_sriov.c > +++ b/hw/pci/pcie_sriov.c > @@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev) > > assert(sriov_cap > 0); > num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF); > + if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) { > + return; > + } Indeed: The results are undefined if NumVFs is set to a value greater than TotalVFs. However I note that hw/nvme/ctrl.c will still poke at NumVFs. Since it's undefined, I propose a simpler hack and just force it to PCI_SRIOV_TOTAL_VF. This way everyone can just assume it's ok. > > dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs); > > > -- > 2.43.0 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 5/7] pcie_sriov: Validate NumVFs 2024-02-13 10:59 ` Michael S. Tsirkin @ 2024-02-13 14:29 ` Akihiko Odaki 0 siblings, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-13 14:29 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, qemu-devel, qemu-block On 2024/02/13 19:59, Michael S. Tsirkin wrote: > On Mon, Feb 12, 2024 at 07:20:33PM +0900, Akihiko Odaki wrote: >> The guest may write NumVFs greater than TotalVFs and that can lead >> to buffer overflow in VF implementations. >> >> Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)") >> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> >> --- >> hw/pci/pcie_sriov.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c >> index a1fe65f5d801..da209b7f47fd 100644 >> --- a/hw/pci/pcie_sriov.c >> +++ b/hw/pci/pcie_sriov.c >> @@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev) >> >> assert(sriov_cap > 0); >> num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF); >> + if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) { >> + return; >> + } > > Indeed: > The results are undefined if NumVFs is set to a value greater than TotalVFs. > > However I note that hw/nvme/ctrl.c will still poke at NumVFs. > > Since it's undefined, I propose a simpler hack and just force it > to PCI_SRIOV_TOTAL_VF. This way everyone can just assume it's ok. It is still not OK to poke at NumVFs as the guest may set a different number anytime though it's undefined if NumVFs is set while VFs are enabled. I think hw/nvme/ctrl.c should be changed to look at exp.sriov_pf.num_vfs, which holds the committed NumVFs value. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances 2024-02-12 10:20 [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki ` (4 preceding siblings ...) 2024-02-12 10:20 ` [PATCH v3 5/7] pcie_sriov: Validate NumVFs Akihiko Odaki @ 2024-02-12 10:20 ` Akihiko Odaki 2024-02-13 11:01 ` Michael S. Tsirkin 2024-02-12 10:20 ` [PATCH v3 7/7] pcie_sriov: Release VFs failed to realize Akihiko Odaki [not found] ` <CGME20240212102210epcas2p4346c0dfc475ecee9f359d634eba46783@epcms2p8> 7 siblings, 1 reply; 17+ messages in thread From: Akihiko Odaki @ 2024-02-12 10:20 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Akihiko Odaki Disable SR-IOV VF devices by reusing code to power down PCI devices instead of removing them when the guest requests to disable VFs. This allows to realize devices and report VF realization errors at PF realization time. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- docs/pcie_sriov.txt | 8 ++-- include/hw/pci/pci.h | 2 +- include/hw/pci/pci_device.h | 2 +- include/hw/pci/pcie_sriov.h | 6 +-- hw/net/igb.c | 13 ++++-- hw/nvme/ctrl.c | 24 +++++++---- hw/pci/pci.c | 18 ++++---- hw/pci/pci_host.c | 4 +- hw/pci/pcie.c | 4 +- hw/pci/pcie_sriov.c | 100 ++++++++++++++++++++------------------------ 10 files changed, 97 insertions(+), 84 deletions(-) diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt index a47aad0bfab0..ab2142807f79 100644 --- a/docs/pcie_sriov.txt +++ b/docs/pcie_sriov.txt @@ -52,9 +52,11 @@ setting up a BAR for a VF. ... /* Add and initialize the SR/IOV capability */ - pcie_sriov_pf_init(d, 0x200, "your_virtual_dev", - vf_devid, initial_vfs, total_vfs, - fun_offset, stride); + if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev", + vf_devid, initial_vfs, total_vfs, + fun_offset, stride, errp)) { + return; + } /* Set up individual VF BARs (parameters as for normal BARs) */ pcie_sriov_pf_init_vf_bar( ... ) diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index fa6313aabc43..fae83b9b723c 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -643,6 +643,6 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev) } MSIMessage pci_get_msi_message(PCIDevice *dev, int vector); -void pci_set_power(PCIDevice *pci_dev, bool state); +void pci_set_enabled(PCIDevice *pci_dev, bool state); #endif diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h index 7564e9536dbd..153e13eaef99 100644 --- a/include/hw/pci/pci_device.h +++ b/include/hw/pci/pci_device.h @@ -56,7 +56,7 @@ typedef struct PCIReqIDCache PCIReqIDCache; struct PCIDevice { DeviceState qdev; bool partially_hotplugged; - bool has_power; + bool is_enabled; /* PCI config space */ uint8_t *config; diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h index 095fb0c9edf9..d9a39daccac4 100644 --- a/include/hw/pci/pcie_sriov.h +++ b/include/hw/pci/pcie_sriov.h @@ -18,7 +18,6 @@ struct PCIESriovPF { uint16_t num_vfs; /* Number of virtual functions created */ uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each VF bar */ - const char *vfname; /* Reference to the device type used for the VFs */ PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */ }; @@ -27,10 +26,11 @@ struct PCIESriovVF { uint16_t vf_number; /* Logical VF number of this function */ }; -void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, const char *vfname, uint16_t vf_dev_id, uint16_t init_vfs, uint16_t total_vfs, - uint16_t vf_offset, uint16_t vf_stride); + uint16_t vf_offset, uint16_t vf_stride, + Error **errp); void pcie_sriov_pf_exit(PCIDevice *dev); /* Set up a VF bar in the SR/IOV bar area */ diff --git a/hw/net/igb.c b/hw/net/igb.c index 0b5c31a58bba..1079a33d4000 100644 --- a/hw/net/igb.c +++ b/hw/net/igb.c @@ -447,9 +447,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp) pcie_ari_init(pci_dev, 0x150); - pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF, - IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS, - IGB_VF_OFFSET, IGB_VF_STRIDE); + if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, + TYPE_IGBVF, IGB_82576_VF_DEV_ID, + IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS, + IGB_VF_OFFSET, IGB_VF_STRIDE, + errp)) { + pcie_cap_exit(pci_dev); + igb_cleanup_msix(s); + msi_uninit(pci_dev); + return; + } pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX, PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH, diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index f026245d1e9e..f8df622fe590 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -8031,7 +8031,8 @@ static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs, return bar_size; } -static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset) +static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset, + Error **errp) { uint16_t vf_dev_id = n->params.use_intel_id ? PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME; @@ -8040,12 +8041,17 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset) le16_to_cpu(cap->vifrsm), NULL, NULL); - pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id, - n->params.sriov_max_vfs, n->params.sriov_max_vfs, - NVME_VF_OFFSET, NVME_VF_STRIDE); + if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id, + n->params.sriov_max_vfs, n->params.sriov_max_vfs, + NVME_VF_OFFSET, NVME_VF_STRIDE, + errp)) { + return false; + } pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size); + + return true; } static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset) @@ -8124,6 +8130,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) return false; } + if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs && + !nvme_init_sriov(n, pci_dev, 0x120, errp)) { + msix_uninit(pci_dev, &n->bar0, &n->bar0); + return false; + } + nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize); if (n->params.cmb_size_mb) { @@ -8134,10 +8146,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) nvme_init_pmr(n, pci_dev); } - if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) { - nvme_init_sriov(n, pci_dev, 0x120); - } - return true; } diff --git a/hw/pci/pci.c b/hw/pci/pci.c index d08548d8ffe9..727d812f419f 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -1522,7 +1522,7 @@ static void pci_update_mappings(PCIDevice *d) continue; new_addr = pci_bar_address(d, i, r->type, r->size); - if (!d->has_power) { + if (!d->is_enabled) { new_addr = PCI_BAR_UNMAPPED; } @@ -1610,7 +1610,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int pci_update_irq_disabled(d, was_irq_disabled); memory_region_set_enabled(&d->bus_master_enable_region, (pci_get_word(d->config + PCI_COMMAND) - & PCI_COMMAND_MASTER) && d->has_power); + & PCI_COMMAND_MASTER) && d->is_enabled); } msi_write_config(d, addr, val_in, l); @@ -2179,7 +2179,9 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) return; } - pci_set_power(pci_dev, true); + if (!pci_is_vf(pci_dev)) { + pci_set_enabled(pci_dev, true); + } pci_dev->msi_trigger = pci_msi_trigger; } @@ -2835,18 +2837,18 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector) return msg; } -void pci_set_power(PCIDevice *d, bool state) +void pci_set_enabled(PCIDevice *d, bool state) { - if (d->has_power == state) { + if (d->is_enabled == state) { return; } - d->has_power = state; + d->is_enabled = state; pci_update_mappings(d); memory_region_set_enabled(&d->bus_master_enable_region, (pci_get_word(d->config + PCI_COMMAND) - & PCI_COMMAND_MASTER) && d->has_power); - if (!d->has_power) { + & PCI_COMMAND_MASTER) && d->is_enabled); + if (d->qdev.realized) { pci_device_reset(d); } } diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c index dfe6fe618401..d7e13d72ce07 100644 --- a/hw/pci/pci_host.c +++ b/hw/pci/pci_host.c @@ -86,7 +86,7 @@ void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr, * allowing direct removal of unexposed functions. */ if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) || - !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) { + !pci_dev->is_enabled || is_pci_dev_ejected(pci_dev)) { return; } @@ -111,7 +111,7 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr, * allowing direct removal of unexposed functions. */ if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) || - !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) { + !pci_dev->is_enabled || is_pci_dev_ejected(pci_dev)) { return ~0x0; } diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 6db0cf69cd8a..f34c157e1fd3 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -394,7 +394,9 @@ static void pcie_set_power_device(PCIBus *bus, PCIDevice *dev, void *opaque) { bool *power = opaque; - pci_set_power(dev, *power); + if (!pci_is_vf(dev)) { + pci_set_enabled(dev, *power); + } } static void pcie_cap_update_power(PCIDevice *hotplug_dev) diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c index da209b7f47fd..9ba34cf8f8ed 100644 --- a/hw/pci/pcie_sriov.c +++ b/hw/pci/pcie_sriov.c @@ -20,15 +20,29 @@ #include "qapi/error.h" #include "trace.h" -static PCIDevice *register_vf(PCIDevice *pf, int devfn, - const char *name, uint16_t vf_num); -static void unregister_vfs(PCIDevice *dev); +static void unrealize_vfs(PCIDevice *dev, uint16_t total_vfs) +{ + for (uint16_t i = 0; i < total_vfs; i++) { + Error *err = NULL; + PCIDevice *vf = dev->exp.sriov_pf.vf[i]; + if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) { + error_reportf_err(err, "Failed to unplug: "); + } + object_unparent(OBJECT(vf)); + object_unref(OBJECT(vf)); + } + g_free(dev->exp.sriov_pf.vf); + dev->exp.sriov_pf.vf = NULL; +} -void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, const char *vfname, uint16_t vf_dev_id, uint16_t init_vfs, uint16_t total_vfs, - uint16_t vf_offset, uint16_t vf_stride) + uint16_t vf_offset, uint16_t vf_stride, + Error **errp) { + BusState *bus = qdev_get_parent_bus(&dev->qdev); + int32_t devfn = dev->devfn + vf_offset; uint8_t *cfg = dev->config + offset; uint8_t *wmask; @@ -36,7 +50,6 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, offset, PCI_EXT_CAP_SRIOV_SIZEOF); dev->exp.sriov_cap = offset; dev->exp.sriov_pf.num_vfs = 0; - dev->exp.sriov_pf.vfname = g_strdup(vfname); dev->exp.sriov_pf.vf = NULL; pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset); @@ -69,13 +82,35 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553); qdev_prop_set_bit(&dev->qdev, "multifunction", true); + + dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs); + + for (uint16_t i = 0; i < total_vfs; i++) { + PCIDevice *vf = pci_new(devfn, vfname); + vf->exp.sriov_vf.pf = dev; + vf->exp.sriov_vf.vf_number = i; + + if (!qdev_realize(&vf->qdev, bus, errp)) { + unrealize_vfs(dev, i); + return false; + } + + /* set vid/did according to sr/iov spec - they are not used */ + pci_config_set_vendor_id(vf->config, 0xffff); + pci_config_set_device_id(vf->config, 0xffff); + + dev->exp.sriov_pf.vf[i] = vf; + devfn += vf_stride; + } + + return true; } void pcie_sriov_pf_exit(PCIDevice *dev) { - unregister_vfs(dev); - g_free((char *)dev->exp.sriov_pf.vfname); - dev->exp.sriov_pf.vfname = NULL; + uint8_t *cfg = dev->config + dev->exp.sriov_cap; + + unrealize_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)); } void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num, @@ -141,38 +176,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num, } } -static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name, - uint16_t vf_num) -{ - PCIDevice *dev = pci_new(devfn, name); - dev->exp.sriov_vf.pf = pf; - dev->exp.sriov_vf.vf_number = vf_num; - PCIBus *bus = pci_get_bus(pf); - Error *local_err = NULL; - - qdev_realize(&dev->qdev, &bus->qbus, &local_err); - if (local_err) { - error_report_err(local_err); - return NULL; - } - - /* set vid/did according to sr/iov spec - they are not used */ - pci_config_set_vendor_id(dev->config, 0xffff); - pci_config_set_device_id(dev->config, 0xffff); - - return dev; -} - static void register_vfs(PCIDevice *dev) { uint16_t num_vfs; uint16_t i; uint16_t sriov_cap = dev->exp.sriov_cap; - uint16_t vf_offset = - pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET); - uint16_t vf_stride = - pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE); - int32_t devfn = dev->devfn + vf_offset; assert(sriov_cap > 0); num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF); @@ -180,18 +188,10 @@ static void register_vfs(PCIDevice *dev) return; } - dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs); - trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), num_vfs); for (i = 0; i < num_vfs; i++) { - dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, - dev->exp.sriov_pf.vfname, i); - if (!dev->exp.sriov_pf.vf[i]) { - num_vfs = i; - break; - } - devfn += vf_stride; + pci_set_enabled(dev->exp.sriov_pf.vf[i], true); } dev->exp.sriov_pf.num_vfs = num_vfs; } @@ -204,16 +204,8 @@ static void unregister_vfs(PCIDevice *dev) trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), num_vfs); for (i = 0; i < num_vfs; i++) { - Error *err = NULL; - PCIDevice *vf = dev->exp.sriov_pf.vf[i]; - if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) { - error_reportf_err(err, "Failed to unplug: "); - } - object_unparent(OBJECT(vf)); - object_unref(OBJECT(vf)); + pci_set_enabled(dev->exp.sriov_pf.vf[i], false); } - g_free(dev->exp.sriov_pf.vf); - dev->exp.sriov_pf.vf = NULL; dev->exp.sriov_pf.num_vfs = 0; pci_set_word(dev->config + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0); } -- 2.43.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances 2024-02-12 10:20 ` [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki @ 2024-02-13 11:01 ` Michael S. Tsirkin 2024-02-13 12:17 ` Akihiko Odaki 0 siblings, 1 reply; 17+ messages in thread From: Michael S. Tsirkin @ 2024-02-13 11:01 UTC (permalink / raw) To: Akihiko Odaki Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, qemu-devel, qemu-block On Mon, Feb 12, 2024 at 07:20:34PM +0900, Akihiko Odaki wrote: > Disable SR-IOV VF devices by reusing code to power down PCI devices > instead of removing them when the guest requests to disable VFs. This > allows to realize devices and report VF realization errors at PF > realization time. > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> It is simpler for sure, but I am worried that all of these unused VFs will consume lots of resources even if never enabled. Thoughts? > --- > docs/pcie_sriov.txt | 8 ++-- > include/hw/pci/pci.h | 2 +- > include/hw/pci/pci_device.h | 2 +- > include/hw/pci/pcie_sriov.h | 6 +-- > hw/net/igb.c | 13 ++++-- > hw/nvme/ctrl.c | 24 +++++++---- > hw/pci/pci.c | 18 ++++---- > hw/pci/pci_host.c | 4 +- > hw/pci/pcie.c | 4 +- > hw/pci/pcie_sriov.c | 100 ++++++++++++++++++++------------------------ > 10 files changed, 97 insertions(+), 84 deletions(-) > > diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt > index a47aad0bfab0..ab2142807f79 100644 > --- a/docs/pcie_sriov.txt > +++ b/docs/pcie_sriov.txt > @@ -52,9 +52,11 @@ setting up a BAR for a VF. > ... > > /* Add and initialize the SR/IOV capability */ > - pcie_sriov_pf_init(d, 0x200, "your_virtual_dev", > - vf_devid, initial_vfs, total_vfs, > - fun_offset, stride); > + if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev", > + vf_devid, initial_vfs, total_vfs, > + fun_offset, stride, errp)) { > + return; > + } > > /* Set up individual VF BARs (parameters as for normal BARs) */ > pcie_sriov_pf_init_vf_bar( ... ) > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h > index fa6313aabc43..fae83b9b723c 100644 > --- a/include/hw/pci/pci.h > +++ b/include/hw/pci/pci.h > @@ -643,6 +643,6 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev) > } > > MSIMessage pci_get_msi_message(PCIDevice *dev, int vector); > -void pci_set_power(PCIDevice *pci_dev, bool state); > +void pci_set_enabled(PCIDevice *pci_dev, bool state); > > #endif > diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h > index 7564e9536dbd..153e13eaef99 100644 > --- a/include/hw/pci/pci_device.h > +++ b/include/hw/pci/pci_device.h > @@ -56,7 +56,7 @@ typedef struct PCIReqIDCache PCIReqIDCache; > struct PCIDevice { > DeviceState qdev; > bool partially_hotplugged; > - bool has_power; > + bool is_enabled; > > /* PCI config space */ > uint8_t *config; > diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h > index 095fb0c9edf9..d9a39daccac4 100644 > --- a/include/hw/pci/pcie_sriov.h > +++ b/include/hw/pci/pcie_sriov.h > @@ -18,7 +18,6 @@ > struct PCIESriovPF { > uint16_t num_vfs; /* Number of virtual functions created */ > uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each VF bar */ > - const char *vfname; /* Reference to the device type used for the VFs */ > PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */ > }; > > @@ -27,10 +26,11 @@ struct PCIESriovVF { > uint16_t vf_number; /* Logical VF number of this function */ > }; > > -void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, > +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, > const char *vfname, uint16_t vf_dev_id, > uint16_t init_vfs, uint16_t total_vfs, > - uint16_t vf_offset, uint16_t vf_stride); > + uint16_t vf_offset, uint16_t vf_stride, > + Error **errp); > void pcie_sriov_pf_exit(PCIDevice *dev); > > /* Set up a VF bar in the SR/IOV bar area */ > diff --git a/hw/net/igb.c b/hw/net/igb.c > index 0b5c31a58bba..1079a33d4000 100644 > --- a/hw/net/igb.c > +++ b/hw/net/igb.c > @@ -447,9 +447,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp) > > pcie_ari_init(pci_dev, 0x150); > > - pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF, > - IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS, > - IGB_VF_OFFSET, IGB_VF_STRIDE); > + if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, > + TYPE_IGBVF, IGB_82576_VF_DEV_ID, > + IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS, > + IGB_VF_OFFSET, IGB_VF_STRIDE, > + errp)) { > + pcie_cap_exit(pci_dev); > + igb_cleanup_msix(s); > + msi_uninit(pci_dev); > + return; > + } > > pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX, > PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH, > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c > index f026245d1e9e..f8df622fe590 100644 > --- a/hw/nvme/ctrl.c > +++ b/hw/nvme/ctrl.c > @@ -8031,7 +8031,8 @@ static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs, > return bar_size; > } > > -static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset) > +static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset, > + Error **errp) > { > uint16_t vf_dev_id = n->params.use_intel_id ? > PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME; > @@ -8040,12 +8041,17 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset) > le16_to_cpu(cap->vifrsm), > NULL, NULL); > > - pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id, > - n->params.sriov_max_vfs, n->params.sriov_max_vfs, > - NVME_VF_OFFSET, NVME_VF_STRIDE); > + if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id, > + n->params.sriov_max_vfs, n->params.sriov_max_vfs, > + NVME_VF_OFFSET, NVME_VF_STRIDE, > + errp)) { > + return false; > + } > > pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY | > PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size); > + > + return true; > } > > static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset) > @@ -8124,6 +8130,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) > return false; > } > > + if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs && > + !nvme_init_sriov(n, pci_dev, 0x120, errp)) { > + msix_uninit(pci_dev, &n->bar0, &n->bar0); > + return false; > + } > + > nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize); > > if (n->params.cmb_size_mb) { > @@ -8134,10 +8146,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) > nvme_init_pmr(n, pci_dev); > } > > - if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) { > - nvme_init_sriov(n, pci_dev, 0x120); > - } > - > return true; > } > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > index d08548d8ffe9..727d812f419f 100644 > --- a/hw/pci/pci.c > +++ b/hw/pci/pci.c > @@ -1522,7 +1522,7 @@ static void pci_update_mappings(PCIDevice *d) > continue; > > new_addr = pci_bar_address(d, i, r->type, r->size); > - if (!d->has_power) { > + if (!d->is_enabled) { > new_addr = PCI_BAR_UNMAPPED; > } > > @@ -1610,7 +1610,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int > pci_update_irq_disabled(d, was_irq_disabled); > memory_region_set_enabled(&d->bus_master_enable_region, > (pci_get_word(d->config + PCI_COMMAND) > - & PCI_COMMAND_MASTER) && d->has_power); > + & PCI_COMMAND_MASTER) && d->is_enabled); > } > > msi_write_config(d, addr, val_in, l); > @@ -2179,7 +2179,9 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp) > return; > } > > - pci_set_power(pci_dev, true); > + if (!pci_is_vf(pci_dev)) { > + pci_set_enabled(pci_dev, true); > + } > > pci_dev->msi_trigger = pci_msi_trigger; > } > @@ -2835,18 +2837,18 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector) > return msg; > } > > -void pci_set_power(PCIDevice *d, bool state) > +void pci_set_enabled(PCIDevice *d, bool state) > { > - if (d->has_power == state) { > + if (d->is_enabled == state) { > return; > } > > - d->has_power = state; > + d->is_enabled = state; > pci_update_mappings(d); > memory_region_set_enabled(&d->bus_master_enable_region, > (pci_get_word(d->config + PCI_COMMAND) > - & PCI_COMMAND_MASTER) && d->has_power); > - if (!d->has_power) { > + & PCI_COMMAND_MASTER) && d->is_enabled); > + if (d->qdev.realized) { > pci_device_reset(d); > } > } > diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c > index dfe6fe618401..d7e13d72ce07 100644 > --- a/hw/pci/pci_host.c > +++ b/hw/pci/pci_host.c > @@ -86,7 +86,7 @@ void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr, > * allowing direct removal of unexposed functions. > */ > if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) || > - !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) { > + !pci_dev->is_enabled || is_pci_dev_ejected(pci_dev)) { > return; > } > > @@ -111,7 +111,7 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr, > * allowing direct removal of unexposed functions. > */ > if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) || > - !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) { > + !pci_dev->is_enabled || is_pci_dev_ejected(pci_dev)) { > return ~0x0; > } > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > index 6db0cf69cd8a..f34c157e1fd3 100644 > --- a/hw/pci/pcie.c > +++ b/hw/pci/pcie.c > @@ -394,7 +394,9 @@ static void pcie_set_power_device(PCIBus *bus, PCIDevice *dev, void *opaque) > { > bool *power = opaque; > > - pci_set_power(dev, *power); > + if (!pci_is_vf(dev)) { > + pci_set_enabled(dev, *power); > + } > } > > static void pcie_cap_update_power(PCIDevice *hotplug_dev) > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c > index da209b7f47fd..9ba34cf8f8ed 100644 > --- a/hw/pci/pcie_sriov.c > +++ b/hw/pci/pcie_sriov.c > @@ -20,15 +20,29 @@ > #include "qapi/error.h" > #include "trace.h" > > -static PCIDevice *register_vf(PCIDevice *pf, int devfn, > - const char *name, uint16_t vf_num); > -static void unregister_vfs(PCIDevice *dev); > +static void unrealize_vfs(PCIDevice *dev, uint16_t total_vfs) > +{ > + for (uint16_t i = 0; i < total_vfs; i++) { > + Error *err = NULL; > + PCIDevice *vf = dev->exp.sriov_pf.vf[i]; > + if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) { > + error_reportf_err(err, "Failed to unplug: "); > + } > + object_unparent(OBJECT(vf)); > + object_unref(OBJECT(vf)); > + } > + g_free(dev->exp.sriov_pf.vf); > + dev->exp.sriov_pf.vf = NULL; > +} > > -void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, > +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, > const char *vfname, uint16_t vf_dev_id, > uint16_t init_vfs, uint16_t total_vfs, > - uint16_t vf_offset, uint16_t vf_stride) > + uint16_t vf_offset, uint16_t vf_stride, > + Error **errp) > { > + BusState *bus = qdev_get_parent_bus(&dev->qdev); > + int32_t devfn = dev->devfn + vf_offset; > uint8_t *cfg = dev->config + offset; > uint8_t *wmask; > > @@ -36,7 +50,6 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, > offset, PCI_EXT_CAP_SRIOV_SIZEOF); > dev->exp.sriov_cap = offset; > dev->exp.sriov_pf.num_vfs = 0; > - dev->exp.sriov_pf.vfname = g_strdup(vfname); > dev->exp.sriov_pf.vf = NULL; > > pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset); > @@ -69,13 +82,35 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, > pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553); > > qdev_prop_set_bit(&dev->qdev, "multifunction", true); > + > + dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs); > + > + for (uint16_t i = 0; i < total_vfs; i++) { > + PCIDevice *vf = pci_new(devfn, vfname); > + vf->exp.sriov_vf.pf = dev; > + vf->exp.sriov_vf.vf_number = i; > + > + if (!qdev_realize(&vf->qdev, bus, errp)) { > + unrealize_vfs(dev, i); > + return false; > + } > + > + /* set vid/did according to sr/iov spec - they are not used */ > + pci_config_set_vendor_id(vf->config, 0xffff); > + pci_config_set_device_id(vf->config, 0xffff); > + > + dev->exp.sriov_pf.vf[i] = vf; > + devfn += vf_stride; > + } > + > + return true; > } > > void pcie_sriov_pf_exit(PCIDevice *dev) > { > - unregister_vfs(dev); > - g_free((char *)dev->exp.sriov_pf.vfname); > - dev->exp.sriov_pf.vfname = NULL; > + uint8_t *cfg = dev->config + dev->exp.sriov_cap; > + > + unrealize_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)); > } > > void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num, > @@ -141,38 +176,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num, > } > } > > -static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name, > - uint16_t vf_num) > -{ > - PCIDevice *dev = pci_new(devfn, name); > - dev->exp.sriov_vf.pf = pf; > - dev->exp.sriov_vf.vf_number = vf_num; > - PCIBus *bus = pci_get_bus(pf); > - Error *local_err = NULL; > - > - qdev_realize(&dev->qdev, &bus->qbus, &local_err); > - if (local_err) { > - error_report_err(local_err); > - return NULL; > - } > - > - /* set vid/did according to sr/iov spec - they are not used */ > - pci_config_set_vendor_id(dev->config, 0xffff); > - pci_config_set_device_id(dev->config, 0xffff); > - > - return dev; > -} > - > static void register_vfs(PCIDevice *dev) > { > uint16_t num_vfs; > uint16_t i; > uint16_t sriov_cap = dev->exp.sriov_cap; > - uint16_t vf_offset = > - pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET); > - uint16_t vf_stride = > - pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE); > - int32_t devfn = dev->devfn + vf_offset; > > assert(sriov_cap > 0); > num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF); > @@ -180,18 +188,10 @@ static void register_vfs(PCIDevice *dev) > return; > } > > - dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs); > - > trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn), > PCI_FUNC(dev->devfn), num_vfs); > for (i = 0; i < num_vfs; i++) { > - dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, > - dev->exp.sriov_pf.vfname, i); > - if (!dev->exp.sriov_pf.vf[i]) { > - num_vfs = i; > - break; > - } > - devfn += vf_stride; > + pci_set_enabled(dev->exp.sriov_pf.vf[i], true); > } > dev->exp.sriov_pf.num_vfs = num_vfs; > } > @@ -204,16 +204,8 @@ static void unregister_vfs(PCIDevice *dev) > trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn), > PCI_FUNC(dev->devfn), num_vfs); > for (i = 0; i < num_vfs; i++) { > - Error *err = NULL; > - PCIDevice *vf = dev->exp.sriov_pf.vf[i]; > - if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) { > - error_reportf_err(err, "Failed to unplug: "); > - } > - object_unparent(OBJECT(vf)); > - object_unref(OBJECT(vf)); > + pci_set_enabled(dev->exp.sriov_pf.vf[i], false); > } > - g_free(dev->exp.sriov_pf.vf); > - dev->exp.sriov_pf.vf = NULL; > dev->exp.sriov_pf.num_vfs = 0; > pci_set_word(dev->config + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0); > } > > -- > 2.43.0 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances 2024-02-13 11:01 ` Michael S. Tsirkin @ 2024-02-13 12:17 ` Akihiko Odaki 0 siblings, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-13 12:17 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Philippe Mathieu-Daudé, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, qemu-devel, qemu-block On 2024/02/13 20:01, Michael S. Tsirkin wrote: > On Mon, Feb 12, 2024 at 07:20:34PM +0900, Akihiko Odaki wrote: >> Disable SR-IOV VF devices by reusing code to power down PCI devices >> instead of removing them when the guest requests to disable VFs. This >> allows to realize devices and report VF realization errors at PF >> realization time. >> >> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > It is simpler for sure, but I am worried that all of these > unused VFs will consume lots of resources even if never > enabled. Thoughts? My rationale behind this change is that the resources should be allocated when the PF is realized to ensure the resources are available when the guest requests to enable VFs. When it is necessary to allocate resources dynamically, the conventional hotplug mechanism should be used instead since the SR-IOV interface is not designed to report resource allocation errors. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v3 7/7] pcie_sriov: Release VFs failed to realize 2024-02-12 10:20 [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki ` (5 preceding siblings ...) 2024-02-12 10:20 ` [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki @ 2024-02-12 10:20 ` Akihiko Odaki [not found] ` <CGME20240212102210epcas2p4346c0dfc475ecee9f359d634eba46783@epcms2p8> 7 siblings, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-12 10:20 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen Cc: qemu-devel, qemu-block, Akihiko Odaki Release VFs failed to realize just as we do in unregister_vfs(). Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> --- hw/pci/pcie_sriov.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c index 9ba34cf8f8ed..9d668b8d6c17 100644 --- a/hw/pci/pcie_sriov.c +++ b/hw/pci/pcie_sriov.c @@ -91,6 +91,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, vf->exp.sriov_vf.vf_number = i; if (!qdev_realize(&vf->qdev, bus, errp)) { + object_unparent(OBJECT(vf)); + object_unref(vf); unrealize_vfs(dev, i); return false; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
[parent not found: <CGME20240212102210epcas2p4346c0dfc475ecee9f359d634eba46783@epcms2p8>]
* RE: [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances [not found] ` <CGME20240212102210epcas2p4346c0dfc475ecee9f359d634eba46783@epcms2p8> @ 2024-02-13 8:51 ` Minwoo Im 2024-02-13 12:26 ` Akihiko Odaki 2024-02-13 13:48 ` Akihiko Odaki 0 siblings, 2 replies; 17+ messages in thread From: Minwoo Im @ 2024-02-13 8:51 UTC (permalink / raw) To: Akihiko Odaki, Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, Minwoo Im, minwoo.im.dev@gmail.com Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org > -----Original Message----- > From: qemu-block-bounces+minwoo.im.dev=gmail.com@nongnu.org <qemu-block- > bounces+minwoo.im.dev=gmail.com@nongnu.org> On Behalf Of Akihiko Odaki > Sent: Monday, February 12, 2024 7:21 PM > To: Philippe Mathieu-Daudé <philmd@linaro.org>; Michael S. Tsirkin > <mst@redhat.com>; Marcel Apfelbaum <marcel.apfelbaum@gmail.com>; Alex > Williamson <alex.williamson@redhat.com>; Cédric Le Goater <clg@redhat.com>; > Paolo Bonzini <pbonzini@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; > Eduardo Habkost <eduardo@habkost.net>; Sriram Yagnaraman > <sriram.yagnaraman@est.tech>; Jason Wang <jasowang@redhat.com>; Keith Busch > <kbusch@kernel.org>; Klaus Jensen <its@irrelevant.dk> > Cc: qemu-devel@nongnu.org; qemu-block@nongnu.org; Akihiko Odaki > <akihiko.odaki@daynix.com> > Subject: [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances > > Disable SR-IOV VF devices by reusing code to power down PCI devices > instead of removing them when the guest requests to disable VFs. This > allows to realize devices and report VF realization errors at PF > realization time. > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Hello Akihiko, I think this patch fixes the issue reported in [1]. The latest master branch also causes an object-related assertion error when we enable VF(s) and disable through sysfs over and over again (at least twice). But this issue is also fixed with your patch. ** ERROR:../qom/object.c:753:object_finalize: assertion failed: (obj->parent == NULL) Bail out! ERROR:../qom/object.c:753:object_finalize: assertion failed: (obj->parent == NULL) Klaus, If this patchset is applied, I think [1] can be dropped. What do you think? Thanks, [1] https://lore.kernel.org/qemu-devel/20240109022953epcms2p54550dcfc9f831a515206513ae98e7511@epcms2p5/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances 2024-02-13 8:51 ` [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances Minwoo Im @ 2024-02-13 12:26 ` Akihiko Odaki 2024-02-13 13:48 ` Akihiko Odaki 1 sibling, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-13 12:26 UTC (permalink / raw) To: minwoo.im, Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, minwoo.im.dev@gmail.com Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org On 2024/02/13 17:51, Minwoo Im wrote: >> -----Original Message----- >> From: qemu-block-bounces+minwoo.im.dev=gmail.com@nongnu.org <qemu-block- >> bounces+minwoo.im.dev=gmail.com@nongnu.org> On Behalf Of Akihiko Odaki >> Sent: Monday, February 12, 2024 7:21 PM >> To: Philippe Mathieu-Daudé <philmd@linaro.org>; Michael S. Tsirkin >> <mst@redhat.com>; Marcel Apfelbaum <marcel.apfelbaum@gmail.com>; Alex >> Williamson <alex.williamson@redhat.com>; Cédric Le Goater <clg@redhat.com>; >> Paolo Bonzini <pbonzini@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; >> Eduardo Habkost <eduardo@habkost.net>; Sriram Yagnaraman >> <sriram.yagnaraman@est.tech>; Jason Wang <jasowang@redhat.com>; Keith Busch >> <kbusch@kernel.org>; Klaus Jensen <its@irrelevant.dk> >> Cc: qemu-devel@nongnu.org; qemu-block@nongnu.org; Akihiko Odaki >> <akihiko.odaki@daynix.com> >> Subject: [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances >> >> Disable SR-IOV VF devices by reusing code to power down PCI devices >> instead of removing them when the guest requests to disable VFs. This >> allows to realize devices and report VF realization errors at PF >> realization time. >> >> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > Hello Akihiko, > > I think this patch fixes the issue reported in [1]. The latest master branch > also causes an object-related assertion error when we enable VF(s) and disable > through sysfs over and over again (at least twice). But this issue is also > fixed with your patch. > > ** > ERROR:../qom/object.c:753:object_finalize: assertion failed: (obj->parent == NULL) > Bail out! ERROR:../qom/object.c:753:object_finalize: assertion failed: (obj->parent == NULL) I'll note that in the next version. > > Klaus, > > If this patchset is applied, I think [1] can be dropped. What do you think? [1] should be kept. This patchset fixes use-after-free but double free [1] fixes still occurs. Regards, Akihiko Odaki > > Thanks, > > [1] https://lore.kernel.org/qemu-devel/20240109022953epcms2p54550dcfc9f831a515206513ae98e7511@epcms2p5/ > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances 2024-02-13 8:51 ` [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances Minwoo Im 2024-02-13 12:26 ` Akihiko Odaki @ 2024-02-13 13:48 ` Akihiko Odaki 1 sibling, 0 replies; 17+ messages in thread From: Akihiko Odaki @ 2024-02-13 13:48 UTC (permalink / raw) To: minwoo.im, Philippe Mathieu-Daudé, Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson, Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé, Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch, Klaus Jensen, minwoo.im.dev@gmail.com Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org On 2024/02/13 17:51, Minwoo Im wrote: >> -----Original Message----- >> From: qemu-block-bounces+minwoo.im.dev=gmail.com@nongnu.org <qemu-block- >> bounces+minwoo.im.dev=gmail.com@nongnu.org> On Behalf Of Akihiko Odaki >> Sent: Monday, February 12, 2024 7:21 PM >> To: Philippe Mathieu-Daudé <philmd@linaro.org>; Michael S. Tsirkin >> <mst@redhat.com>; Marcel Apfelbaum <marcel.apfelbaum@gmail.com>; Alex >> Williamson <alex.williamson@redhat.com>; Cédric Le Goater <clg@redhat.com>; >> Paolo Bonzini <pbonzini@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; >> Eduardo Habkost <eduardo@habkost.net>; Sriram Yagnaraman >> <sriram.yagnaraman@est.tech>; Jason Wang <jasowang@redhat.com>; Keith Busch >> <kbusch@kernel.org>; Klaus Jensen <its@irrelevant.dk> >> Cc: qemu-devel@nongnu.org; qemu-block@nongnu.org; Akihiko Odaki >> <akihiko.odaki@daynix.com> >> Subject: [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances >> >> Disable SR-IOV VF devices by reusing code to power down PCI devices >> instead of removing them when the guest requests to disable VFs. This >> allows to realize devices and report VF realization errors at PF >> realization time. >> >> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> > > Hello Akihiko, > > I think this patch fixes the issue reported in [1]. The latest master branch > also causes an object-related assertion error when we enable VF(s) and disable > through sysfs over and over again (at least twice). But this issue is also > fixed with your patch. > > ** > ERROR:../qom/object.c:753:object_finalize: assertion failed: (obj->parent == NULL) > Bail out! ERROR:../qom/object.c:753:object_finalize: assertion failed: (obj->parent == NULL) I looked into this and found it's not fixed with my patch though the symptom may be gone. The problem is that object_ref() is not called when copying subsys. An obvious fix is to add an object_ref() call, but I think it's too hacky and error-prone. It's better to enumerate nvme_props and call object_property_get_qobject() and object_property_set_qobject() for each property. Regards, Akihiko Odaki > > Klaus, > > If this patchset is applied, I think [1] can be dropped. What do you think? > > Thanks, > > [1] https://lore.kernel.org/qemu-devel/20240109022953epcms2p54550dcfc9f831a515206513ae98e7511@epcms2p5/ > ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-02-13 14:33 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-02-12 10:20 [PATCH v3 0/7] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 1/7] hw/pci: Use -1 as a default value for rombar Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 2/7] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki 2024-02-13 10:52 ` Michael S. Tsirkin 2024-02-13 12:07 ` Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 3/7] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 4/7] hw/qdev: Remove opts member Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 5/7] pcie_sriov: Validate NumVFs Akihiko Odaki 2024-02-13 10:59 ` Michael S. Tsirkin 2024-02-13 14:29 ` Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki 2024-02-13 11:01 ` Michael S. Tsirkin 2024-02-13 12:17 ` Akihiko Odaki 2024-02-12 10:20 ` [PATCH v3 7/7] pcie_sriov: Release VFs failed to realize Akihiko Odaki [not found] ` <CGME20240212102210epcas2p4346c0dfc475ecee9f359d634eba46783@epcms2p8> 2024-02-13 8:51 ` [PATCH v3 6/7] pcie_sriov: Reuse SR-IOV VF device instances Minwoo Im 2024-02-13 12:26 ` Akihiko Odaki 2024-02-13 13:48 ` Akihiko Odaki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).