From: "Cédric Le Goater" <clg@redhat.com>
To: Steven Sistare <steven.sistare@oracle.com>,
	Zhenzhong Duan <zhenzhong.duan@intel.com>,
	qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com
Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
Date: Mon, 1 Sep 2025 13:12:47 +0200
Message-ID: <155de6b2-4780-4c96-824b-593b4f784e83@redhat.com>
In-Reply-To: <0286f864-9aaa-4a49-8975-cb1af3bb1270@oracle.com>

On 8/28/25 14:28, Steven Sistare wrote:
> On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
>> If there are multiple containers and unmap-all fails for some container, we
>> need to remap vaddr for the other containers for which unmap-all succeeded.
>> When ram discard is enabled, we should only remap populated parts in a
>> section instead of the whole section.
>>
>> Export vfio_ram_discard_notify_populate() and use it to do population.
>>
>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> btw: I didn't find an easy way to test this corner case, only code inspection
> 
> Thanks Zhenzhong, this looks correct.
> 
> However, I never liked patch
>    eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
> 
> I think it adds too much complexity for a rare case.  In fact, if we
> examine all the possible error return codes, I believe they all would
> be caused by other qemu application bugs, or kernel bugs:
> 
> vfio_dma_do_unmap()
>    returns -EBUSY if an mdev exists.  qemu blocks live update when an
>      mdev is present, so if this occurs, the blocker has a bug.
>    returns -EINVAL if the vaddr was already invalidated.  qemu already
>      invalidated it, or never remapped the vaddr after a previous live
>      update.  Both are qemu bugs.
> 
> iopt_unmap_all
>    iopt_unmap_iova_range
>      -EBUSY - qemu is concurrently performing other dma map or unmap
>               operations.  a bug.
> 
>      -EDEADLOCK - Something is not responding to unmap requests.
> 
> Therefore, I think we should just revert eba1f657cbb1, and assert that
> the qemu vfio_dma_unmap_vaddr_all() call succeeds.
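
(Concretely, that revert would reduce the recovery path to something like
the sketch below. This is only a rough sketch, assuming
vfio_dma_unmap_vaddr_all() keeps taking the container plus an Error **
and returning false on failure:)

    /* Treat any unmap-all-vaddr failure as a QEMU or kernel bug. */
    bool ok = vfio_dma_unmap_vaddr_all(container, &error_abort);
    g_assert(ok);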


vfio_dma_unmap_vaddr_all() is called in the .pre_save handler. This is
pretty deep in the call stack, and errors should be handled by the callers.
Quitting QEMU in that case would be brutal and it would confuse the
sysmgmt layers.

Improving error reporting of the .pre_save handler would be a nice
addition though.
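
A minimal sketch of that direction, staying within the current
int-returning .pre_save hook (the handler name below is only indicative,
and the includes already used by cpr-legacy.c are assumed to be in scope):

    static int vfio_cpr_pre_save(void *opaque)
    {
        VFIOContainer *container = opaque;
        Error *local_err = NULL;

        if (!vfio_dma_unmap_vaddr_all(container, &local_err)) {
            /* Report and fail the save cleanly instead of exiting QEMU. */
            error_report_err(local_err);
            return -EIO;
        }
        return 0;
    }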


Thanks,

C.




> 
> Thoughts?
> 
> - Steve
> 
>>   include/hw/vfio/vfio-container-base.h |  3 +++
>>   include/hw/vfio/vfio-cpr.h            |  2 +-
>>   hw/vfio/cpr-legacy.c                  | 19 ++++++++++++++-----
>>   hw/vfio/listener.c                    |  8 ++++----
>>   4 files changed, 22 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
>> index bded6e993f..3f0c085143 100644
>> --- a/include/hw/vfio/vfio-container-base.h
>> +++ b/include/hw/vfio/vfio-container-base.h
>> @@ -269,6 +269,9 @@ struct VFIOIOMMUClass {
>>       void (*release)(VFIOContainerBase *bcontainer);
>>   };
>> +int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>> +                                     MemoryRegionSection *section);
>> +
>>   VFIORamDiscardListener *vfio_find_ram_discard_listener(
>>       VFIOContainerBase *bcontainer, MemoryRegionSection *section);
>> diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
>> index d37daffbc5..fb32a5f873 100644
>> --- a/include/hw/vfio/vfio-cpr.h
>> +++ b/include/hw/vfio/vfio-cpr.h
>> @@ -67,7 +67,7 @@ bool vfio_cpr_container_match(struct VFIOContainer *container,
>>   void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
>>                              MemoryRegionSection *section);
>> -bool vfio_cpr_ram_discard_register_listener(
>> +bool vfio_cpr_ram_discard_replay_populated(
>>       struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
>>   void vfio_cpr_save_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
>> diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
>> index 553b203e9b..6909c0a616 100644
>> --- a/hw/vfio/cpr-legacy.c
>> +++ b/hw/vfio/cpr-legacy.c
>> @@ -224,22 +224,31 @@ void vfio_cpr_giommu_remap(VFIOContainerBase *bcontainer,
>>       memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
>>   }
>> +static int vfio_cpr_rdm_remap(MemoryRegionSection *section, void *opaque)
>> +{
>> +    RamDiscardListener *rdl = opaque;
>> +    return vfio_ram_discard_notify_populate(rdl, section);
>> +}
>> +
>>   /*
>>    * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
>>    * succeeding for others, so the latter have lost their vaddr.  Call this
>> - * to restore vaddr for a section with a RamDiscardManager.
>> + * to restore vaddr for populated parts in a section with a RamDiscardManager.
>>    *
>> - * The ram discard listener already exists.  Call its populate function
>> + * The ram discard listener already exists.  Call its replay_populated function
>>    * directly, which calls vfio_legacy_cpr_dma_map.
>>    */
>> -bool vfio_cpr_ram_discard_register_listener(VFIOContainerBase *bcontainer,
>> -                                            MemoryRegionSection *section)
>> +bool vfio_cpr_ram_discard_replay_populated(VFIOContainerBase *bcontainer,
>> +                                           MemoryRegionSection *section)
>>   {
>> +    RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
>>       VFIORamDiscardListener *vrdl =
>>           vfio_find_ram_discard_listener(bcontainer, section);
>>       g_assert(vrdl);
>> -    return vrdl->listener.notify_populate(&vrdl->listener, section) == 0;
>> +    return ram_discard_manager_replay_populated(rdm, section,
>> +                                                vfio_cpr_rdm_remap,
>> +                                                &vrdl->listener) == 0;
>>   }
>>   int vfio_cpr_group_get_device_fd(int d, const char *name)
>> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
>> index f498e23a93..74837c1122 100644
>> --- a/hw/vfio/listener.c
>> +++ b/hw/vfio/listener.c
>> @@ -215,8 +215,8 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>>       }
>>   }
>> -static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>> -                                            MemoryRegionSection *section)
>> +int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>> +                                     MemoryRegionSection *section)
>>   {
>>       VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
>>                                                   listener);
>> @@ -572,8 +572,8 @@ void vfio_container_region_add(VFIOContainerBase *bcontainer,
>>       if (memory_region_has_ram_discard_manager(section->mr)) {
>>           if (!cpr_remap) {
>>               vfio_ram_discard_register_listener(bcontainer, section);
>> -        } else if (!vfio_cpr_ram_discard_register_listener(bcontainer,
>> -                                                           section)) {
>> +        } else if (!vfio_cpr_ram_discard_replay_populated(bcontainer,
>> +                                                          section)) {
>>               goto fail;
>>           }
>>           return;
> 


