* [PATCH] vfio/container: Remap only populated parts in a section
@ 2025-08-14 3:24 Zhenzhong Duan
2025-08-28 12:28 ` Steven Sistare
0 siblings, 1 reply; 7+ messages in thread
From: Zhenzhong Duan @ 2025-08-14 3:24 UTC (permalink / raw)
To: qemu-devel; +Cc: alex.williamson, clg, steven.sistare, Zhenzhong Duan
If there are multiple containers and unmap-all fails for some container, we
need to remap vaddr for the other containers for which unmap-all succeeded.
When ram discard is enabled, we should only remap populated parts in a
section instead of the whole section.
Export vfio_ram_discard_notify_populate() and use it to do population.
Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
btw: I didn't find an easy way to test this corner case, only code inspection
include/hw/vfio/vfio-container-base.h | 3 +++
include/hw/vfio/vfio-cpr.h | 2 +-
hw/vfio/cpr-legacy.c | 19 ++++++++++++++-----
hw/vfio/listener.c | 8 ++++----
4 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index bded6e993f..3f0c085143 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -269,6 +269,9 @@ struct VFIOIOMMUClass {
void (*release)(VFIOContainerBase *bcontainer);
};
+int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
+ MemoryRegionSection *section);
+
VFIORamDiscardListener *vfio_find_ram_discard_listener(
VFIOContainerBase *bcontainer, MemoryRegionSection *section);
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index d37daffbc5..fb32a5f873 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -67,7 +67,7 @@ bool vfio_cpr_container_match(struct VFIOContainer *container,
void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
MemoryRegionSection *section);
-bool vfio_cpr_ram_discard_register_listener(
+bool vfio_cpr_ram_discard_replay_populated(
struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
void vfio_cpr_save_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 553b203e9b..6909c0a616 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -224,22 +224,31 @@ void vfio_cpr_giommu_remap(VFIOContainerBase *bcontainer,
memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
}
+static int vfio_cpr_rdm_remap(MemoryRegionSection *section, void *opaque)
+{
+ RamDiscardListener *rdl = opaque;
+ return vfio_ram_discard_notify_populate(rdl, section);
+}
+
/*
* In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
* succeeding for others, so the latter have lost their vaddr. Call this
- * to restore vaddr for a section with a RamDiscardManager.
+ * to restore vaddr for populated parts in a section with a RamDiscardManager.
*
- * The ram discard listener already exists. Call its populate function
+ * The ram discard listener already exists. Call its replay_populated function
* directly, which calls vfio_legacy_cpr_dma_map.
*/
-bool vfio_cpr_ram_discard_register_listener(VFIOContainerBase *bcontainer,
- MemoryRegionSection *section)
+bool vfio_cpr_ram_discard_replay_populated(VFIOContainerBase *bcontainer,
+ MemoryRegionSection *section)
{
+ RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
VFIORamDiscardListener *vrdl =
vfio_find_ram_discard_listener(bcontainer, section);
g_assert(vrdl);
- return vrdl->listener.notify_populate(&vrdl->listener, section) == 0;
+ return ram_discard_manager_replay_populated(rdm, section,
+ vfio_cpr_rdm_remap,
+ &vrdl->listener) == 0;
}
int vfio_cpr_group_get_device_fd(int d, const char *name)
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index f498e23a93..74837c1122 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -215,8 +215,8 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
}
}
-static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
- MemoryRegionSection *section)
+int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
+ MemoryRegionSection *section)
{
VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
listener);
@@ -572,8 +572,8 @@ void vfio_container_region_add(VFIOContainerBase *bcontainer,
if (memory_region_has_ram_discard_manager(section->mr)) {
if (!cpr_remap) {
vfio_ram_discard_register_listener(bcontainer, section);
- } else if (!vfio_cpr_ram_discard_register_listener(bcontainer,
- section)) {
+ } else if (!vfio_cpr_ram_discard_replay_populated(bcontainer,
+ section)) {
goto fail;
}
return;
--
2.47.1
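For readers less familiar with the RamDiscardManager replay semantics, here is a
small, self-contained sketch of what "remap only the populated parts" means. It is
illustrative only; every name in it (ToySection, toy_replay_populated, remap_range)
is invented and it is not QEMU code. The idea: a replay helper walks the section's
populated blocks and invokes the remap callback once per contiguous populated
range, instead of mapping the whole section.

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 8                     /* toy granularity: 8 blocks per section */

typedef struct {
    uint64_t base;                    /* start offset of the section */
    uint64_t block_size;              /* bytes per block */
    bool populated[NBLOCKS];          /* which blocks are currently populated */
} ToySection;

typedef int (*replay_fn)(uint64_t start, uint64_t size, void *opaque);

/* Walk only populated blocks, merge neighbours, and replay each range. */
static int toy_replay_populated(const ToySection *s, replay_fn fn, void *opaque)
{
    int i = 0;

    while (i < NBLOCKS) {
        if (!s->populated[i]) {
            i++;
            continue;
        }
        int first = i;
        while (i < NBLOCKS && s->populated[i]) {
            i++;
        }
        uint64_t start = s->base + (uint64_t)first * s->block_size;
        uint64_t size = (uint64_t)(i - first) * s->block_size;
        int ret = fn(start, size, opaque);
        if (ret) {
            return ret;               /* stop on the first mapping failure */
        }
    }
    return 0;
}

/* Stand-in for the per-range vaddr remap (dma map) operation. */
static int remap_range(uint64_t start, uint64_t size, void *opaque)
{
    (void)opaque;
    printf("remap vaddr for [0x%" PRIx64 ", +0x%" PRIx64 ")\n", start, size);
    return 0;
}

int main(void)
{
    ToySection s = {
        .base = 0x100000,
        .block_size = 0x10000,
        .populated = { true, true, false, false, true, false, true, true },
    };

    /* Remaps three ranges (blocks 0-1, 4 and 6-7), not the whole section. */
    return toy_replay_populated(&s, remap_range, NULL);
}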
* Re: [PATCH] vfio/container: Remap only populated parts in a section
2025-08-14 3:24 [PATCH] vfio/container: Remap only populated parts in a section Zhenzhong Duan
@ 2025-08-28 12:28 ` Steven Sistare
2025-09-01 2:13 ` Duan, Zhenzhong
2025-09-01 11:12 ` Cédric Le Goater
0 siblings, 2 replies; 7+ messages in thread
From: Steven Sistare @ 2025-08-28 12:28 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel; +Cc: alex.williamson, clg
On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
> If there are multiple containers and unmap-all fails for some container, we
> need to remap vaddr for the other containers for which unmap-all succeeded.
> When ram discard is enabled, we should only remap populated parts in a
> section instead of the whole section.
>
> Export vfio_ram_discard_notify_populate() and use it to do population.
>
> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> btw: I didn't find easy to test this corner case, only code inspecting
Thanks Zhenzhong, this looks correct.
However, I never liked patch
eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
I think it adds too much complexity for a rare case. In fact, if we
examine all the possible error return codes, I believe they all would
be caused by other qemu application bugs, or kernel bugs:
vfio_dma_do_unmap()
returns -EBUSY if an mdev exists. qemu adds a live update blocker
when an mdev is present, so if this occurs, the blocker has a bug.
returns -EINVAL if the vaddr was already invalidated. qemu already
invalidated it, or never remapped the vaddr after a previous live
update. Both are qemu bugs.
iopt_unmap_all
iopt_unmap_iova_range
-EBUSY - qemu is concurrently performing other dma map or unmap
operations, which is a bug.
-EDEADLOCK - Something is not responding to unmap requests.
Therefore, I think we should just revert eba1f657cbb1, and assert that
the qemu vfio_dma_unmap_vaddr_all() call succeeds.
Thoughts?
- Steve
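To make the trade-off concrete, below is a toy, self-contained sketch contrasting
the two approaches; all names are invented and this is not QEMU code.
invalidate_with_recovery() models what eba1f657cbb1 does (roll back the containers
whose unmap-all already succeeded so the guest keeps running), while
invalidate_or_die() models the proposed revert, which simply asserts that
invalidation succeeds.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NCONTAINERS 3

/* Stand-in for invalidating vaddr (VFIO_DMA_UNMAP_FLAG_VADDR) on one container. */
static bool unmap_all_vaddr(int i)
{
    return i != 2;                    /* pretend the third container fails */
}

static void remap_vaddr(int i)
{
    printf("container %d: restoring vaddr\n", i);
}

/* Recovery approach: undo the partial invalidation, keep the guest running. */
static bool invalidate_with_recovery(void)
{
    for (int i = 0; i < NCONTAINERS; i++) {
        if (!unmap_all_vaddr(i)) {
            for (int j = 0; j < i; j++) {
                remap_vaddr(j);       /* roll back containers that succeeded */
            }
            return false;             /* live update fails, QEMU survives */
        }
    }
    return true;
}

/* Proposed revert: treat any failure as a QEMU/kernel bug and stop immediately. */
static void invalidate_or_die(void)
{
    for (int i = 0; i < NCONTAINERS; i++) {
        assert(unmap_all_vaddr(i));
    }
}

int main(void)
{
    if (!invalidate_with_recovery()) {
        printf("live update aborted, guest still running\n");
    }
    (void)invalidate_or_die;          /* would trip the assert on container 2 */
    return 0;
}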
* RE: [PATCH] vfio/container: Remap only populated parts in a section
2025-08-28 12:28 ` Steven Sistare
@ 2025-09-01 2:13 ` Duan, Zhenzhong
2025-09-02 13:10 ` Steven Sistare
2025-09-01 11:12 ` Cédric Le Goater
1 sibling, 1 reply; 7+ messages in thread
From: Duan, Zhenzhong @ 2025-09-01 2:13 UTC (permalink / raw)
To: Steven Sistare, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com
>-----Original Message-----
>From: Steven Sistare <steven.sistare@oracle.com>
>Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
>
>On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
>> If there are multiple containers and unmap-all fails for some container, we
>> need to remap vaddr for the other containers for which unmap-all succeeded.
>> When ram discard is enabled, we should only remap populated parts in a
>> section instead of the whole section.
>>
>> Export vfio_ram_discard_notify_populate() and use it to do population.
>>
>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> btw: I didn't find easy to test this corner case, only code inspecting
>
>Thanks Zhenzhong, this looks correct.
>
>However, I never liked patch
> eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>
>I think it adds too much complexity for a rare case. In fact, if we
>examine all the possible error return codes, I believe they all would
>be caused by other qemu application bugs, or kernel bugs:
>
>vfio_dma_do_unmap()
> returns -EBUSY if an mdev exists. qemu blocks live update blocker
> when mdev is present. If this occurs, the blocker has a bug.
> returns -EINVAL if the vaddr was already invalidated. qemu already
> invalidated it, or never remapped the vaddr after a previous live
> update. Both are qemu bugs.
>
>iopt_unmap_all
> iopt_unmap_iova_range
> -EBUSY - qemu is concurrently performing other dma map or unmap
> operations. a bug.
>
> -EDEADLOCK - Something is not responding to unmap requests.
>
>Therefore, I think we should just revert eba1f657cbb1, and assert that
>the qemu vfio_dma_unmap_vaddr_all() call succeeds.
>
>Thoughts?
I agree it's a rare case and your suggestion would simplify the code, but I feel it's too aggressive to kill the QEMU instance if live update fails. Trying to restore and keep the current instance running is important in a cloud environment and looks more moderate.
Thanks
Zhenzhong
* Re: [PATCH] vfio/container: Remap only populated parts in a section
2025-08-28 12:28 ` Steven Sistare
2025-09-01 2:13 ` Duan, Zhenzhong
@ 2025-09-01 11:12 ` Cédric Le Goater
1 sibling, 0 replies; 7+ messages in thread
From: Cédric Le Goater @ 2025-09-01 11:12 UTC (permalink / raw)
To: Steven Sistare, Zhenzhong Duan, qemu-devel; +Cc: alex.williamson
On 8/28/25 14:28, Steven Sistare wrote:
> On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
>> If there are multiple containers and unmap-all fails for some container, we
>> need to remap vaddr for the other containers for which unmap-all succeeded.
>> When ram discard is enabled, we should only remap populated parts in a
>> section instead of the whole section.
>>
>> Export vfio_ram_discard_notify_populate() and use it to do population.
>>
>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> btw: I didn't find easy to test this corner case, only code inspecting
>
> Thanks Zhenzhong, this looks correct.
>
> However, I never liked patch
> eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>
> I think it adds too much complexity for a rare case. In fact, if we
> examine all the possible error return codes, I believe they all would
> be caused by other qemu application bugs, or kernel bugs:
>
> vfio_dma_do_unmap()
> returns -EBUSY if an mdev exists. qemu blocks live update blocker
> when mdev is present. If this occurs, the blocker has a bug.
> returns -EINVAL if the vaddr was already invalidated. qemu already
> invalidated it, or never remapped the vaddr after a previous live
> update. Both are qemu bugs.
>
> iopt_unmap_all
> iopt_unmap_iova_range
> -EBUSY - qemu is concurrently performing other dma map or unmap
> operations. a bug.
>
> -EDEADLOCK - Something is not responding to unmap requests.
>
> Therefore, I think we should just revert eba1f657cbb1, and assert that
> the qemu vfio_dma_unmap_vaddr_all() call succeeds.
vfio_dma_unmap_vaddr_all() is called in the .pre_save handler. This is
pretty deep in the callstack and errors should be handled by the callers.
Quitting QEMU in that case would be brutal and it would confuse the
sysmgmt layers.
Improving error reporting of the .pre_save handler would be a nice
addition though.
Thanks,
C.
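As a rough, self-contained illustration of that error-reporting point (names
invented, not QEMU code): the pre-save step reports failure through its return
value so the caller can fail just the live update attempt and leave the guest
running, instead of QEMU exiting deep in the callstack.

#include <stdio.h>

/* Stand-in for invalidating vaddr in all containers; pretend it fails. */
static int unmap_all_vaddr(void)
{
    return -1;
}

/* A pre-save style hook that propagates the error instead of aborting. */
static int cpr_pre_save(void)
{
    if (unmap_all_vaddr() != 0) {
        return -1;
    }
    return 0;
}

/* The caller fails only the live update attempt and reports why. */
static int save_state(void)
{
    if (cpr_pre_save() != 0) {
        fprintf(stderr, "cpr: pre-save failed, abandoning live update\n");
        return -1;
    }
    return 0;
}

int main(void)
{
    int ret = save_state();

    /* The guest keeps running; only the live update attempt was abandoned. */
    return ret == 0 ? 0 : 1;
}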
* Re: [PATCH] vfio/container: Remap only populated parts in a section
2025-09-01 2:13 ` Duan, Zhenzhong
@ 2025-09-02 13:10 ` Steven Sistare
2025-09-05 9:04 ` Duan, Zhenzhong
0 siblings, 1 reply; 7+ messages in thread
From: Steven Sistare @ 2025-09-02 13:10 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com
On 8/31/2025 10:13 PM, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Steven Sistare <steven.sistare@oracle.com>
>> Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
>>
>> On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
>>> If there are multiple containers and unmap-all fails for some container, we
>>> need to remap vaddr for the other containers for which unmap-all succeeded.
>>> When ram discard is enabled, we should only remap populated parts in a
>>> section instead of the whole section.
>>>
>>> Export vfio_ram_discard_notify_populate() and use it to do population.
>>>
>>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>> btw: I didn't find easy to test this corner case, only code inspecting
>>
>> Thanks Zhenzhong, this looks correct.
>>
>> However, I never liked patch
>> eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>>
>> I think it adds too much complexity for a rare case. In fact, if we
>> examine all the possible error return codes, I believe they all would
>> be caused by other qemu application bugs, or kernel bugs:
>>
>> vfio_dma_do_unmap()
>> returns -EBUSY if an mdev exists. qemu blocks live update blocker
>> when mdev is present. If this occurs, the blocker has a bug.
>> returns -EINVAL if the vaddr was already invalidated. qemu already
>> invalidated it, or never remapped the vaddr after a previous live
>> update. Both are qemu bugs.
>>
>> iopt_unmap_all
>> iopt_unmap_iova_range
>> -EBUSY - qemu is concurrently performing other dma map or unmap
>> operations. a bug.
>>
>> -EDEADLOCK - Something is not responding to unmap requests.
>>
>> Therefore, I think we should just revert eba1f657cbb1, and assert that
>> the qemu vfio_dma_unmap_vaddr_all() call succeeds.
>>
>> Thoughts?
>
> I agree it's a rare case and your suggestion will make code simple, but I feel it's aggressive to kill QEMU instance if live update fails, try to restore and keep current instance running is important in cloud env and looks more moderate.
OK.
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
(but you should also seek an RB from someone who is more familiar with ram discard and
its callbacks).
- Steve
* RE: [PATCH] vfio/container: Remap only populated parts in a section
2025-09-02 13:10 ` Steven Sistare
@ 2025-09-05 9:04 ` Duan, Zhenzhong
2025-09-05 10:17 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Duan, Zhenzhong @ 2025-09-05 9:04 UTC (permalink / raw)
To: Steven Sistare, qemu-devel@nongnu.org, David Hildenbrand
Cc: alex.williamson@redhat.com, clg@redhat.com
+ David Hildenbrand
>-----Original Message-----
>From: Steven Sistare <steven.sistare@oracle.com>
>Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
>
>On 8/31/2025 10:13 PM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steven Sistare <steven.sistare@oracle.com>
>>> Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
>>>
>>> On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
>>>> If there are multiple containers and unmap-all fails for some container, we
>>>> need to remap vaddr for the other containers for which unmap-all succeeded.
>>>> When ram discard is enabled, we should only remap populated parts in a
>>>> section instead of the whole section.
>>>>
>>>> Export vfio_ram_discard_notify_populate() and use it to do population.
>>>>
>>>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>> btw: I didn't find easy to test this corner case, only code inspecting
>>>
>>> Thanks Zhenzhong, this looks correct.
>>>
>>> However, I never liked patch
>>> eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>>>
>>> I think it adds too much complexity for a rare case. In fact, if we
>>> examine all the possible error return codes, I believe they all would
>>> be caused by other qemu application bugs, or kernel bugs:
>>>
>>> vfio_dma_do_unmap()
>>> returns -EBUSY if an mdev exists. qemu blocks live update blocker
>>> when mdev is present. If this occurs, the blocker has a bug.
>>> returns -EINVAL if the vaddr was already invalidated. qemu already
>>> invalidated it, or never remapped the vaddr after a previous live
>>> update. Both are qemu bugs.
>>>
>>> iopt_unmap_all
>>> iopt_unmap_iova_range
>>> -EBUSY - qemu is concurrently performing other dma map or unmap
>>> operations. a bug.
>>>
>>> -EDEADLOCK - Something is not responding to unmap requests.
>>>
>>> Therefore, I think we should just revert eba1f657cbb1, and assert that
>>> the qemu vfio_dma_unmap_vaddr_all() call succeeds.
>>>
>>> Thoughts?
>>
>> I agree it's a rare case and your suggestion will make code simple, but I feel
>> it's aggressive to kill QEMU instance if live update fails, try to restore and
>> keep current instance running is important in cloud env and looks more
>> moderate.
>
>OK.
>
>Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
>
>(but you should also seek an RB from someone who is more familiar with ram
>discard and
>its callbacks).
Hi David, looking forward to your comments and suggestions. Thanks
BRs,
Zhenzhong
* Re: [PATCH] vfio/container: Remap only populated parts in a section
2025-09-05 9:04 ` Duan, Zhenzhong
@ 2025-09-05 10:17 ` David Hildenbrand
0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-09-05 10:17 UTC (permalink / raw)
To: Duan, Zhenzhong, Steven Sistare, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com
On 05.09.25 11:04, Duan, Zhenzhong wrote:
> + David Hildenbrand
>
>> -----Original Message-----
>> From: Steven Sistare <steven.sistare@oracle.com>
>> Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
>>
>> On 8/31/2025 10:13 PM, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Steven Sistare <steven.sistare@oracle.com>
>>>> Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
>>>>
>>>> On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
>>>>> If there are multiple containers and unmap-all fails for some container, we
>>>>> need to remap vaddr for the other containers for which unmap-all succeeded.
>>>>> When ram discard is enabled, we should only remap populated parts in a
>>>>> section instead of the whole section.
>>>>>
>>>>> Export vfio_ram_discard_notify_populate() and use it to do population.
>>>>>
>>>>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>> btw: I didn't find easy to test this corner case, only code inspecting
>>>>
>>>> Thanks Zhenzhong, this looks correct.
>>>>
>>>> However, I never liked patch
>>>> eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>>>>
>>>> I think it adds too much complexity for a rare case. In fact, if we
>>>> examine all the possible error return codes, I believe they all would
>>>> be caused by other qemu application bugs, or kernel bugs:
>>>>
>>>> vfio_dma_do_unmap()
>>>> returns -EBUSY if an mdev exists. qemu blocks live update blocker
>>>> when mdev is present. If this occurs, the blocker has a bug.
>>>> returns -EINVAL if the vaddr was already invalidated. qemu already
>>>> invalidated it, or never remapped the vaddr after a previous live
>>>> update. Both are qemu bugs.
>>>>
>>>> iopt_unmap_all
>>>> iopt_unmap_iova_range
>>>> -EBUSY - qemu is concurrently performing other dma map or unmap
>>>> operations. a bug.
>>>>
>>>> -EDEADLOCK - Something is not responding to unmap requests.
>>>>
>>>> Therefore, I think we should just revert eba1f657cbb1, and assert that
>>>> the qemu vfio_dma_unmap_vaddr_all() call succeeds.
>>>>
>>>> Thoughts?
>>>
>>> I agree it's a rare case and your suggestion will make code simple, but I feel
>>> it's aggressive to kill QEMU instance if live update fails, try to restore and
>>> keep current instance running is important in cloud env and looks more
>>> moderate.
>>
>> OK.
>>
>> Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
>>
>> (but you should also seek an RB from someone who is more familiar with ram
>> discard and
>> its callbacks).
>
> Hi David, look forward to your comments, suggestions. Thanks
Hi!
I mean, the
return vrdl->listener.notify_populate(&vrdl->listener, section) == 0;
is completely wrong.
ram_discard_manager_replay_populated() should be the right thing to do.
Was this patch tested somehow (e.g., with virtio-mem), so we're sure it's
now behaving as expected?
I would add an empty line in vfio_cpr_rdm_remap().
Apart from that, LGTM
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb