qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Some trivial live update fixes
@ 2025-06-23 10:22 Zhenzhong Duan
  2025-06-23 10:22 ` [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails Zhenzhong Duan
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Zhenzhong Duan @ 2025-06-23 10:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, steven.sistare, chao.p.peng,
	Zhenzhong Duan, Michael Tokarev, Laurent Vivier,
	open list:Trivial patches

Hi

These are trivial VFIO live update fixes in corner cases.

1) open /dev/vfio/vfio fail trigger SIGSEGV
2) mdev hotplug trigger qemu abort
3) potential SIGSEGV when unmap-all-vaddr failed
3) potential vfio_container_post_load failure

Thanks
Zhenzhong

Zhenzhong Duan (4):
  vfio/container: Fix SIGSEGV when open container file fails
  vfio/container: fails mdev hotplug if add migration blocker failed
  vfio/container: Fix potential SIGSEGV when recover from
    unmap-all-vaddr failure
  vfio/container: Fix vfio_container_post_load()

 include/hw/vfio/vfio-cpr.h |  7 ++++---
 hw/vfio/container.c        | 12 +++++++++---
 hw/vfio/cpr-legacy.c       | 23 +++++++++--------------
 3 files changed, 22 insertions(+), 20 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails
  2025-06-23 10:22 [PATCH 0/4] Some trivial live update fixes Zhenzhong Duan
@ 2025-06-23 10:22 ` Zhenzhong Duan
  2025-06-24  8:53   ` Cédric Le Goater
  2025-06-24  9:55   ` Cédric Le Goater
  2025-06-23 10:22 ` [PATCH 2/4] vfio/container: fails mdev hotplug if add migration blocker failed Zhenzhong Duan
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Zhenzhong Duan @ 2025-06-23 10:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, steven.sistare, chao.p.peng,
	Zhenzhong Duan

When open /dev/vfio/vfio fails, SIGSEGV triggers because
vfio_listener_unregister() doesn't support a NULL bcontainer
pointer.

Fixes: a1f267a7d4d9 ("vfio/container: reform vfio_container_connect cleanup")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/container.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 3e8d645ebb..2853f6f08b 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -710,7 +710,9 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
     return true;
 
 fail:
-    vfio_listener_unregister(bcontainer);
+    if (new_container) {
+        vfio_listener_unregister(bcontainer);
+    }
 
     if (group_was_added) {
         vfio_container_group_del(container, group);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] vfio/container: fails mdev hotplug if add migration blocker failed
  2025-06-23 10:22 [PATCH 0/4] Some trivial live update fixes Zhenzhong Duan
  2025-06-23 10:22 ` [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails Zhenzhong Duan
@ 2025-06-23 10:22 ` Zhenzhong Duan
  2025-06-24  9:21   ` Cédric Le Goater
  2025-06-23 10:22 ` [PATCH 3/4] vfio/container: Fix potential SIGSEGV when recover from unmap-all-vaddr failure Zhenzhong Duan
  2025-06-23 10:22 ` [PATCH 4/4] vfio/container: Fix vfio_container_post_load() Zhenzhong Duan
  3 siblings, 1 reply; 14+ messages in thread
From: Zhenzhong Duan @ 2025-06-23 10:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, steven.sistare, chao.p.peng,
	Zhenzhong Duan

It's aggressive to abort a running QEMU process when hotplug a mdev
and it fails migration blocker adding.

Fix by just failing mdev hotplug itself.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/container.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 2853f6f08b..68b4fdb401 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -992,12 +992,16 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
     if (vbasedev->mdev) {
         error_setg(&vbasedev->cpr.mdev_blocker,
                    "CPR does not support vfio mdev %s", vbasedev->name);
-        migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, &error_fatal,
-                                  MIG_MODE_CPR_TRANSFER, -1);
+        if (migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, errp,
+                                      MIG_MODE_CPR_TRANSFER, -1)) {
+            goto hiod_unref_exit;
+        }
     }
 
     return true;
 
+hiod_unref_exit:
+    object_unref(vbasedev->hiod);
 device_put_exit:
     vfio_device_put(vbasedev);
 group_put_exit:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] vfio/container: Fix potential SIGSEGV when recover from unmap-all-vaddr failure
  2025-06-23 10:22 [PATCH 0/4] Some trivial live update fixes Zhenzhong Duan
  2025-06-23 10:22 ` [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails Zhenzhong Duan
  2025-06-23 10:22 ` [PATCH 2/4] vfio/container: fails mdev hotplug if add migration blocker failed Zhenzhong Duan
@ 2025-06-23 10:22 ` Zhenzhong Duan
  2025-06-24 16:54   ` Cédric Le Goater
  2025-06-23 10:22 ` [PATCH 4/4] vfio/container: Fix vfio_container_post_load() Zhenzhong Duan
  3 siblings, 1 reply; 14+ messages in thread
From: Zhenzhong Duan @ 2025-06-23 10:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, steven.sistare, chao.p.peng,
	Zhenzhong Duan

cpr.saved_dma_map isn't initialized in source qemu which lead to vioc->dma_map
assigned a NULL value, this will trigger SIGSEGV.

Fix it by save and restore vioc->dma_map locally.

Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-cpr.h | 8 +++++---
 hw/vfio/cpr-legacy.c       | 3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index 8bf85b9f4e..aef542e93c 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -16,14 +16,16 @@ struct VFIOContainer;
 struct VFIOContainerBase;
 struct VFIOGroup;
 
+typedef int (*DMA_MAP_FUNC)(const struct VFIOContainerBase *bcontainer,
+                            hwaddr iova, ram_addr_t size, void *vaddr,
+                            bool readonly, MemoryRegion *mr);
+
 typedef struct VFIOContainerCPR {
     Error *blocker;
     bool vaddr_unmapped;
     NotifierWithReturn transfer_notifier;
     MemoryListener remap_listener;
-    int (*saved_dma_map)(const struct VFIOContainerBase *bcontainer,
-                         hwaddr iova, ram_addr_t size,
-                         void *vaddr, bool readonly, MemoryRegion *mr);
+    DMA_MAP_FUNC saved_dma_map;
 } VFIOContainerCPR;
 
 typedef struct VFIODeviceCPR {
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index a84c3247b7..100a8db74d 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -148,6 +148,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
          */
 
         VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+        DMA_MAP_FUNC saved_dma_map = vioc->dma_map;
         vioc->dma_map = vfio_legacy_cpr_dma_map;
 
         container->cpr.remap_listener = (MemoryListener) {
@@ -158,7 +159,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
                                  bcontainer->space->as);
         memory_listener_unregister(&container->cpr.remap_listener);
         container->cpr.vaddr_unmapped = false;
-        vioc->dma_map = container->cpr.saved_dma_map;
+        vioc->dma_map = saved_dma_map;
     }
     return 0;
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] vfio/container: Fix vfio_container_post_load()
  2025-06-23 10:22 [PATCH 0/4] Some trivial live update fixes Zhenzhong Duan
                   ` (2 preceding siblings ...)
  2025-06-23 10:22 ` [PATCH 3/4] vfio/container: Fix potential SIGSEGV when recover from unmap-all-vaddr failure Zhenzhong Duan
@ 2025-06-23 10:22 ` Zhenzhong Duan
  2025-06-26 13:21   ` Steven Sistare
  3 siblings, 1 reply; 14+ messages in thread
From: Zhenzhong Duan @ 2025-06-23 10:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, eric.auger, steven.sistare, chao.p.peng,
	Zhenzhong Duan

When there are multiple VFIO containers, vioc->dma_map is restored
multiple times, this made only first container work and remaining
containers using vioc->dma_map restored by first container.

Fix it by save and restore vioc->dma_map locally. saved_dma_map in
VFIOContainerCPR becomes useless and is removed.

Fixes: 7e9f21411302 ("vfio/container: restore DMA vaddr")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-cpr.h |  1 -
 hw/vfio/cpr-legacy.c       | 20 +++++++-------------
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index aef542e93c..71a1c1a70c 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -25,7 +25,6 @@ typedef struct VFIOContainerCPR {
     bool vaddr_unmapped;
     NotifierWithReturn transfer_notifier;
     MemoryListener remap_listener;
-    DMA_MAP_FUNC saved_dma_map;
 } VFIOContainerCPR;
 
 typedef struct VFIODeviceCPR {
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 100a8db74d..ff45a5128f 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -99,20 +99,21 @@ static int vfio_container_post_load(void *opaque, int version_id)
 {
     VFIOContainer *container = opaque;
     VFIOContainerBase *bcontainer = &container->bcontainer;
-    VFIOGroup *group;
+    VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+    DMA_MAP_FUNC saved_dma_map = vioc->dma_map;
     Error *local_err = NULL;
 
+    /* During incoming CPR, divert calls to dma_map. */
+    vioc->dma_map = vfio_legacy_cpr_dma_map;
+
     if (!vfio_listener_register(bcontainer, &local_err)) {
         error_report_err(local_err);
         return -1;
     }
 
-    QLIST_FOREACH(group, &container->group_list, container_next) {
-        VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+    /* Restore original dma_map function */
+    vioc->dma_map = saved_dma_map;
 
-        /* Restore original dma_map function */
-        vioc->dma_map = container->cpr.saved_dma_map;
-    }
     return 0;
 }
 
@@ -180,13 +181,6 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
 
     vmstate_register(NULL, -1, &vfio_container_vmstate, container);
 
-    /* During incoming CPR, divert calls to dma_map. */
-    if (cpr_is_incoming()) {
-        VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
-        container->cpr.saved_dma_map = vioc->dma_map;
-        vioc->dma_map = vfio_legacy_cpr_dma_map;
-    }
-
     migration_add_notifier_mode(&container->cpr.transfer_notifier,
                                 vfio_cpr_fail_notifier,
                                 MIG_MODE_CPR_TRANSFER);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails
  2025-06-23 10:22 ` [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails Zhenzhong Duan
@ 2025-06-24  8:53   ` Cédric Le Goater
  2025-06-24  9:55   ` Cédric Le Goater
  1 sibling, 0 replies; 14+ messages in thread
From: Cédric Le Goater @ 2025-06-24  8:53 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, eric.auger, steven.sistare, chao.p.peng

On 6/23/25 12:22, Zhenzhong Duan wrote:
> When open /dev/vfio/vfio fails, SIGSEGV triggers because
> vfio_listener_unregister() doesn't support a NULL bcontainer
> pointer.
> 
> Fixes: a1f267a7d4d9 ("vfio/container: reform vfio_container_connect cleanup")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/container.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 3e8d645ebb..2853f6f08b 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -710,7 +710,9 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>       return true;
>   
>   fail:
> -    vfio_listener_unregister(bcontainer);
> +    if (new_container) {
> +        vfio_listener_unregister(bcontainer);
> +    }
>   
>       if (group_was_added) {
>           vfio_container_group_del(container, group);



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] vfio/container: fails mdev hotplug if add migration blocker failed
  2025-06-23 10:22 ` [PATCH 2/4] vfio/container: fails mdev hotplug if add migration blocker failed Zhenzhong Duan
@ 2025-06-24  9:21   ` Cédric Le Goater
  2025-06-25 21:08     ` Cédric Le Goater
  0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2025-06-24  9:21 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, eric.auger, steven.sistare, chao.p.peng

On 6/23/25 12:22, Zhenzhong Duan wrote:
> It's aggressive to abort a running QEMU process when hotplug a mdev
> and it fails migration blocker adding.
> 
> Fix by just failing mdev hotplug itself.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/vfio/container.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 2853f6f08b..68b4fdb401 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -992,12 +992,16 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>       if (vbasedev->mdev) {
>           error_setg(&vbasedev->cpr.mdev_blocker,
>                      "CPR does not support vfio mdev %s", vbasedev->name);
> -        migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, &error_fatal,
> -                                  MIG_MODE_CPR_TRANSFER, -1);
> +        if (migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, errp,
> +                                      MIG_MODE_CPR_TRANSFER, -1)) {

migrate_add_blocker_modes() returns -errno. Testing with '< 0' would be
better.


Thanks,

C.



> +            goto hiod_unref_exit;
> +        }
>       }
>   
>       return true;
>   
> +hiod_unref_exit:
> +    object_unref(vbasedev->hiod);
>   device_put_exit:
>       vfio_device_put(vbasedev);
>   group_put_exit:



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails
  2025-06-23 10:22 ` [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails Zhenzhong Duan
  2025-06-24  8:53   ` Cédric Le Goater
@ 2025-06-24  9:55   ` Cédric Le Goater
  1 sibling, 0 replies; 14+ messages in thread
From: Cédric Le Goater @ 2025-06-24  9:55 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, eric.auger, steven.sistare, chao.p.peng

On 6/23/25 12:22, Zhenzhong Duan wrote:
> When open /dev/vfio/vfio fails, SIGSEGV triggers because
> vfio_listener_unregister() doesn't support a NULL bcontainer
> pointer.
> 
> Fixes: a1f267a7d4d9 ("vfio/container: reform vfio_container_connect cleanup")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/vfio/container.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 3e8d645ebb..2853f6f08b 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -710,7 +710,9 @@ static bool vfio_container_connect(VFIOGroup *group, AddressSpace *as,
>       return true;
>   
>   fail:
> -    vfio_listener_unregister(bcontainer);
> +    if (new_container) {
> +        vfio_listener_unregister(bcontainer);
> +    }
>   
>       if (group_was_added) {
>           vfio_container_group_del(container, group);



Applied to vfio-next.

Thanks,

C.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] vfio/container: Fix potential SIGSEGV when recover from unmap-all-vaddr failure
  2025-06-23 10:22 ` [PATCH 3/4] vfio/container: Fix potential SIGSEGV when recover from unmap-all-vaddr failure Zhenzhong Duan
@ 2025-06-24 16:54   ` Cédric Le Goater
  2025-06-26 12:53     ` Steven Sistare
  0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2025-06-24 16:54 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, eric.auger, steven.sistare, chao.p.peng

On 6/23/25 12:22, Zhenzhong Duan wrote:
> cpr.saved_dma_map isn't initialized in source qemu which lead to vioc->dma_map
> assigned a NULL value, this will trigger SIGSEGV.

I don't understand the scenario. Could you please explain more ?


> Fix it by save and restore vioc->dma_map locally.

Steve, this is cpr territory. Is it still an issue with the rest
of the patches applied ?


Thanks,

C.



> 
> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   include/hw/vfio/vfio-cpr.h | 8 +++++---
>   hw/vfio/cpr-legacy.c       | 3 ++-
>   2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index 8bf85b9f4e..aef542e93c 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -16,14 +16,16 @@ struct VFIOContainer;
>   struct VFIOContainerBase;
>   struct VFIOGroup;
>   
> +typedef int (*DMA_MAP_FUNC)(const struct VFIOContainerBase *bcontainer,
> +                            hwaddr iova, ram_addr_t size, void *vaddr,
> +                            bool readonly, MemoryRegion *mr);
> +
>   typedef struct VFIOContainerCPR {
>       Error *blocker;
>       bool vaddr_unmapped;
>       NotifierWithReturn transfer_notifier;
>       MemoryListener remap_listener;
> -    int (*saved_dma_map)(const struct VFIOContainerBase *bcontainer,
> -                         hwaddr iova, ram_addr_t size,
> -                         void *vaddr, bool readonly, MemoryRegion *mr);
> +    DMA_MAP_FUNC saved_dma_map;
>   } VFIOContainerCPR;
>   
>   typedef struct VFIODeviceCPR {
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index a84c3247b7..100a8db74d 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -148,6 +148,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
>            */
>   
>           VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> +        DMA_MAP_FUNC saved_dma_map = vioc->dma_map;
>           vioc->dma_map = vfio_legacy_cpr_dma_map;
>   
>           container->cpr.remap_listener = (MemoryListener) {
> @@ -158,7 +159,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
>                                    bcontainer->space->as);
>           memory_listener_unregister(&container->cpr.remap_listener);
>           container->cpr.vaddr_unmapped = false;
> -        vioc->dma_map = container->cpr.saved_dma_map;
> +        vioc->dma_map = saved_dma_map;
>       }
>       return 0;
>   }



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] vfio/container: fails mdev hotplug if add migration blocker failed
  2025-06-24  9:21   ` Cédric Le Goater
@ 2025-06-25 21:08     ` Cédric Le Goater
  2025-06-26  1:43       ` Duan, Zhenzhong
  0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2025-06-25 21:08 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, eric.auger, steven.sistare, chao.p.peng

On 6/24/25 11:21, Cédric Le Goater wrote:
> On 6/23/25 12:22, Zhenzhong Duan wrote:
>> It's aggressive to abort a running QEMU process when hotplug a mdev
>> and it fails migration blocker adding.
>>
>> Fix by just failing mdev hotplug itself.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/vfio/container.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index 2853f6f08b..68b4fdb401 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -992,12 +992,16 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>>       if (vbasedev->mdev) {
>>           error_setg(&vbasedev->cpr.mdev_blocker,
>>                      "CPR does not support vfio mdev %s", vbasedev->name);
>> -        migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, &error_fatal,
>> -                                  MIG_MODE_CPR_TRANSFER, -1);
>> +        if (migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, errp,
>> +                                      MIG_MODE_CPR_TRANSFER, -1)) {
> 
> migrate_add_blocker_modes() returns -errno. Testing with '< 0' would be
> better.
> 


Reviewed-by: Cédric Le Goater <clg@redhat.com>

I changed the test on the value returned by migrate_add_blocker_modes().


Thanks,

C.



> Thanks,
> 
> C.
> 
> 
> 
>> +            goto hiod_unref_exit;
>> +        }
>>       }
>>       return true;
>> +hiod_unref_exit:
>> +    object_unref(vbasedev->hiod);
>>   device_put_exit:
>>       vfio_device_put(vbasedev);
>>   group_put_exit:
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH 2/4] vfio/container: fails mdev hotplug if add migration blocker failed
  2025-06-25 21:08     ` Cédric Le Goater
@ 2025-06-26  1:43       ` Duan, Zhenzhong
  0 siblings, 0 replies; 14+ messages in thread
From: Duan, Zhenzhong @ 2025-06-26  1:43 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel@nongnu.org
  Cc: alex.williamson@redhat.com, eric.auger@redhat.com,
	steven.sistare@oracle.com, Peng, Chao P



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH 2/4] vfio/container: fails mdev hotplug if add migration
>blocker failed
>
>On 6/24/25 11:21, Cédric Le Goater wrote:
>> On 6/23/25 12:22, Zhenzhong Duan wrote:
>>> It's aggressive to abort a running QEMU process when hotplug a mdev
>>> and it fails migration blocker adding.
>>>
>>> Fix by just failing mdev hotplug itself.
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>   hw/vfio/container.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>>> index 2853f6f08b..68b4fdb401 100644
>>> --- a/hw/vfio/container.c
>>> +++ b/hw/vfio/container.c
>>> @@ -992,12 +992,16 @@ static bool vfio_legacy_attach_device(const char
>*name, VFIODevice *vbasedev,
>>>       if (vbasedev->mdev) {
>>>           error_setg(&vbasedev->cpr.mdev_blocker,
>>>                      "CPR does not support vfio mdev %s", vbasedev->name);
>>> -        migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker,
>&error_fatal,
>>> -                                  MIG_MODE_CPR_TRANSFER, -1);
>>> +        if (migrate_add_blocker_modes(&vbasedev->cpr.mdev_blocker, errp,
>>> +                                      MIG_MODE_CPR_TRANSFER, -1)) {
>>
>> migrate_add_blocker_modes() returns -errno. Testing with '< 0' would be
>> better.
>>
>
>
>Reviewed-by: Cédric Le Goater <clg@redhat.com>
>

Thanks Cédric.
I ever planned to send an update after receiving comments for patch3,4.

BRs,
Zhenzhong

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] vfio/container: Fix potential SIGSEGV when recover from unmap-all-vaddr failure
  2025-06-24 16:54   ` Cédric Le Goater
@ 2025-06-26 12:53     ` Steven Sistare
  2025-06-26 13:04       ` Steven Sistare
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Sistare @ 2025-06-26 12:53 UTC (permalink / raw)
  To: Cédric Le Goater, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, eric.auger, chao.p.peng

On 6/24/2025 12:54 PM, Cédric Le Goater wrote:
> On 6/23/25 12:22, Zhenzhong Duan wrote:
>> cpr.saved_dma_map isn't initialized in source qemu which lead to vioc->dma_map
>> assigned a NULL value, this will trigger SIGSEGV.
> 
> I don't understand the scenario. Could you please explain more ?

Thank you Zhenzhong, I see the bug.

CPR overrides then restores dma_map in both outgoing and incoming QEMU, for
different reasons. But it only sets saved_dma_map in the target.

So, a more future-proof fix IMO is to always set saved_dma_map:

@@ -182,9 +182,9 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *conta
      vmstate_register(NULL, -1, &vfio_container_vmstate, container);

      /* During incoming CPR, divert calls to dma_map. */
+    VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
+    container->cpr.saved_dma_map = vioc->dma_map;
      if (cpr_is_incoming()) {
-        VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
-        container->cpr.saved_dma_map = vioc->dma_map;
          vioc->dma_map = vfio_legacy_cpr_dma_map;
      }

- Steve

>> Fix it by save and restore vioc->dma_map locally.
> 
> Steve, this is cpr territory. Is it still an issue with the rest
> of the patches applied ?
>  
> Thanks,
> 
> C.
> 
>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   include/hw/vfio/vfio-cpr.h | 8 +++++---
>>   hw/vfio/cpr-legacy.c       | 3 ++-
>>   2 files changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>> index 8bf85b9f4e..aef542e93c 100644
>> --- a/include/hw/vfio/vfio-cpr.h
>> +++ b/include/hw/vfio/vfio-cpr.h
>> @@ -16,14 +16,16 @@ struct VFIOContainer;
>>   struct VFIOContainerBase;
>>   struct VFIOGroup;
>> +typedef int (*DMA_MAP_FUNC)(const struct VFIOContainerBase *bcontainer,
>> +                            hwaddr iova, ram_addr_t size, void *vaddr,
>> +                            bool readonly, MemoryRegion *mr);
>> +
>>   typedef struct VFIOContainerCPR {
>>       Error *blocker;
>>       bool vaddr_unmapped;
>>       NotifierWithReturn transfer_notifier;
>>       MemoryListener remap_listener;
>> -    int (*saved_dma_map)(const struct VFIOContainerBase *bcontainer,
>> -                         hwaddr iova, ram_addr_t size,
>> -                         void *vaddr, bool readonly, MemoryRegion *mr);
>> +    DMA_MAP_FUNC saved_dma_map;
>>   } VFIOContainerCPR;
>>   typedef struct VFIODeviceCPR {
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> index a84c3247b7..100a8db74d 100644
>> --- a/hw/vfio/cpr-legacy.c
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -148,6 +148,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
>>            */
>>           VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>> +        DMA_MAP_FUNC saved_dma_map = vioc->dma_map;
>>           vioc->dma_map = vfio_legacy_cpr_dma_map;
>>           container->cpr.remap_listener = (MemoryListener) {
>> @@ -158,7 +159,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
>>                                    bcontainer->space->as);
>>           memory_listener_unregister(&container->cpr.remap_listener);
>>           container->cpr.vaddr_unmapped = false;
>> -        vioc->dma_map = container->cpr.saved_dma_map;
>> +        vioc->dma_map = saved_dma_map;
>>       }
>>       return 0;
>>   }
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] vfio/container: Fix potential SIGSEGV when recover from unmap-all-vaddr failure
  2025-06-26 12:53     ` Steven Sistare
@ 2025-06-26 13:04       ` Steven Sistare
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Sistare @ 2025-06-26 13:04 UTC (permalink / raw)
  To: Cédric Le Goater, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, eric.auger, chao.p.peng

On 6/26/2025 8:53 AM, Steven Sistare wrote:
> On 6/24/2025 12:54 PM, Cédric Le Goater wrote:
>> On 6/23/25 12:22, Zhenzhong Duan wrote:
>>> cpr.saved_dma_map isn't initialized in source qemu which lead to vioc->dma_map
>>> assigned a NULL value, this will trigger SIGSEGV.
>>
>> I don't understand the scenario. Could you please explain more ?
> 
> Thank you Zhenzhong, I see the bug.
> 
> CPR overrides then restores dma_map in both outgoing and incoming QEMU, for
> different reasons. But it only sets saved_dma_map in the target.
> 
> So, a more future-proof fix IMO is to always set saved_dma_map:
> 
> @@ -182,9 +182,9 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *conta
>       vmstate_register(NULL, -1, &vfio_container_vmstate, container);
> 
>       /* During incoming CPR, divert calls to dma_map. */
> +    VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> +    container->cpr.saved_dma_map = vioc->dma_map;
>       if (cpr_is_incoming()) {
> -        VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> -        container->cpr.saved_dma_map = vioc->dma_map;
>           vioc->dma_map = vfio_legacy_cpr_dma_map;
>       }

Now I see that patch 4 deletes saved_dma_map entirely, and looks fine.
I still think the above diff is the best fix for patch 3, and you
can move the "save and restore locally" parts of patch 3 to patch 4.

- Steve

>>> Fix it by save and restore vioc->dma_map locally.
>>
>> Steve, this is cpr territory. Is it still an issue with the rest
>> of the patches applied ?
>>
>> Thanks,
>>
>> C.
>>
>>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>   include/hw/vfio/vfio-cpr.h | 8 +++++---
>>>   hw/vfio/cpr-legacy.c       | 3 ++-
>>>   2 files changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>>> index 8bf85b9f4e..aef542e93c 100644
>>> --- a/include/hw/vfio/vfio-cpr.h
>>> +++ b/include/hw/vfio/vfio-cpr.h
>>> @@ -16,14 +16,16 @@ struct VFIOContainer;
>>>   struct VFIOContainerBase;
>>>   struct VFIOGroup;
>>> +typedef int (*DMA_MAP_FUNC)(const struct VFIOContainerBase *bcontainer,
>>> +                            hwaddr iova, ram_addr_t size, void *vaddr,
>>> +                            bool readonly, MemoryRegion *mr);
>>> +
>>>   typedef struct VFIOContainerCPR {
>>>       Error *blocker;
>>>       bool vaddr_unmapped;
>>>       NotifierWithReturn transfer_notifier;
>>>       MemoryListener remap_listener;
>>> -    int (*saved_dma_map)(const struct VFIOContainerBase *bcontainer,
>>> -                         hwaddr iova, ram_addr_t size,
>>> -                         void *vaddr, bool readonly, MemoryRegion *mr);
>>> +    DMA_MAP_FUNC saved_dma_map;
>>>   } VFIOContainerCPR;
>>>   typedef struct VFIODeviceCPR {
>>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>>> index a84c3247b7..100a8db74d 100644
>>> --- a/hw/vfio/cpr-legacy.c
>>> +++ b/hw/vfio/cpr-legacy.c
>>> @@ -148,6 +148,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
>>>            */
>>>           VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>>> +        DMA_MAP_FUNC saved_dma_map = vioc->dma_map;
>>>           vioc->dma_map = vfio_legacy_cpr_dma_map;
>>>           container->cpr.remap_listener = (MemoryListener) {
>>> @@ -158,7 +159,7 @@ static int vfio_cpr_fail_notifier(NotifierWithReturn *notifier,
>>>                                    bcontainer->space->as);
>>>           memory_listener_unregister(&container->cpr.remap_listener);
>>>           container->cpr.vaddr_unmapped = false;
>>> -        vioc->dma_map = container->cpr.saved_dma_map;
>>> +        vioc->dma_map = saved_dma_map;
>>>       }
>>>       return 0;
>>>   }
>>
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] vfio/container: Fix vfio_container_post_load()
  2025-06-23 10:22 ` [PATCH 4/4] vfio/container: Fix vfio_container_post_load() Zhenzhong Duan
@ 2025-06-26 13:21   ` Steven Sistare
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Sistare @ 2025-06-26 13:21 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel; +Cc: alex.williamson, clg, eric.auger, chao.p.peng

On 6/23/2025 6:22 AM, Zhenzhong Duan wrote:
> When there are multiple VFIO containers, vioc->dma_map is restored
> multiple times, this made only first container work and remaining
> containers using vioc->dma_map restored by first container.
> 
> Fix it by save and restore vioc->dma_map locally. saved_dma_map in
> VFIOContainerCPR becomes useless and is removed.
> 
> Fixes: 7e9f21411302 ("vfio/container: restore DMA vaddr")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   include/hw/vfio/vfio-cpr.h |  1 -
>   hw/vfio/cpr-legacy.c       | 20 +++++++-------------
>   2 files changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
> index aef542e93c..71a1c1a70c 100644
> --- a/include/hw/vfio/vfio-cpr.h
> +++ b/include/hw/vfio/vfio-cpr.h
> @@ -25,7 +25,6 @@ typedef struct VFIOContainerCPR {
>       bool vaddr_unmapped;
>       NotifierWithReturn transfer_notifier;
>       MemoryListener remap_listener;
> -    DMA_MAP_FUNC saved_dma_map;
>   } VFIOContainerCPR;
>   
>   typedef struct VFIODeviceCPR {
> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
> index 100a8db74d..ff45a5128f 100644
> --- a/hw/vfio/cpr-legacy.c
> +++ b/hw/vfio/cpr-legacy.c
> @@ -99,20 +99,21 @@ static int vfio_container_post_load(void *opaque, int version_id)
>   {
>       VFIOContainer *container = opaque;
>       VFIOContainerBase *bcontainer = &container->bcontainer;
> -    VFIOGroup *group;
> +    VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> +    DMA_MAP_FUNC saved_dma_map = vioc->dma_map;
>       Error *local_err = NULL;
>   
> +    /* During incoming CPR, divert calls to dma_map. */
> +    vioc->dma_map = vfio_legacy_cpr_dma_map;
> +
>       if (!vfio_listener_register(bcontainer, &local_err)) {
>           error_report_err(local_err);
>           return -1;
>       }
>   
> -    QLIST_FOREACH(group, &container->group_list, container_next) {
> -        VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> +    /* Restore original dma_map function */
> +    vioc->dma_map = saved_dma_map;
>   
> -        /* Restore original dma_map function */
> -        vioc->dma_map = container->cpr.saved_dma_map;
> -    }
>       return 0;
>   }
>   
> @@ -180,13 +181,6 @@ bool vfio_legacy_cpr_register_container(VFIOContainer *container, Error **errp)
>   
>       vmstate_register(NULL, -1, &vfio_container_vmstate, container);
>   
> -    /* During incoming CPR, divert calls to dma_map. */
> -    if (cpr_is_incoming()) {
> -        VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
> -        container->cpr.saved_dma_map = vioc->dma_map;
> -        vioc->dma_map = vfio_legacy_cpr_dma_map;
> -    }
> -
>       migration_add_notifier_mode(&container->cpr.transfer_notifier,
>                                   vfio_cpr_fail_notifier,
>                                   MIG_MODE_CPR_TRANSFER);

Good fix, thanks. All looks good.  I will wait to see the "local save" parts
of patch 3 squashed into patch 4 before sending RB.

One nit about DMA_MAP_FUNC -- the majority of typedefs for function signatures
use lower case.  See
   git grep typedef | fgrep '(*'

- Steve



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-06-26 13:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-23 10:22 [PATCH 0/4] Some trivial live update fixes Zhenzhong Duan
2025-06-23 10:22 ` [PATCH 1/4] vfio/container: Fix SIGSEGV when open container file fails Zhenzhong Duan
2025-06-24  8:53   ` Cédric Le Goater
2025-06-24  9:55   ` Cédric Le Goater
2025-06-23 10:22 ` [PATCH 2/4] vfio/container: fails mdev hotplug if add migration blocker failed Zhenzhong Duan
2025-06-24  9:21   ` Cédric Le Goater
2025-06-25 21:08     ` Cédric Le Goater
2025-06-26  1:43       ` Duan, Zhenzhong
2025-06-23 10:22 ` [PATCH 3/4] vfio/container: Fix potential SIGSEGV when recover from unmap-all-vaddr failure Zhenzhong Duan
2025-06-24 16:54   ` Cédric Le Goater
2025-06-26 12:53     ` Steven Sistare
2025-06-26 13:04       ` Steven Sistare
2025-06-23 10:22 ` [PATCH 4/4] vfio/container: Fix vfio_container_post_load() Zhenzhong Duan
2025-06-26 13:21   ` Steven Sistare

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).