* [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
@ 2025-01-20 17:33 Eric Auger
  2025-01-21  3:27 ` Jason Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Eric Auger @ 2025-01-20 17:33 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, mst, jasowang, sgarzare,
	lvivier
  Cc: zhenzhong.duan
When a guest exposed with a vhost device and protected by an
intel IOMMU gets rebooted, we sometimes observe a spurious warning:
Fail to lookup the translated address ffffe000
We observe that the IOMMU gets disabled through a write to the global
command register (CMAR_GCMD.TE) before the vhost device gets stopped.
When this warning happens it can be observed an inflight IOTLB
miss occurs after the IOMMU disable and before the vhost stop. In
that case a flat translation occurs and the check in
vhost_memory_region_lookup() fails.
Let's disable the IOTLB callbacks when all IOMMU MRs have been
unregistered.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/vhost.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 6aa72fd434..128c2ab094 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
             break;
         }
     }
+    if (QLIST_EMPTY(&dev->iommu_list) &&
+        dev->vhost_ops->vhost_set_iotlb_callback) {
+        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
+    }
 }
 
 void vhost_toggle_device_iotlb(VirtIODevice *vdev)
-- 
2.47.1
^ permalink raw reply related	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-20 17:33 [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled Eric Auger
@ 2025-01-21  3:27 ` Jason Wang
  2025-01-21  7:15   ` Eric Auger
  2025-01-21 16:25   ` Eric Auger
  2025-01-21  8:31 ` Laurent Vivier
  2025-01-21  9:18 ` Duan, Zhenzhong
  2 siblings, 2 replies; 34+ messages in thread
From: Jason Wang @ 2025-01-21  3:27 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, mst, sgarzare, lvivier,
	zhenzhong.duan
On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
>
> When a guest exposed with a vhost device and protected by an
> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>
> Fail to lookup the translated address ffffe000
>
> We observe that the IOMMU gets disabled through a write to the global
> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
> When this warning happens it can be observed an inflight IOTLB
> miss occurs after the IOMMU disable and before the vhost stop. In
> that case a flat translation occurs and the check in
> vhost_memory_region_lookup() fails.
>
> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> unregistered.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/virtio/vhost.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6aa72fd434..128c2ab094 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>              break;
>          }
>      }
> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> +    }
So the current code assumes:
1) IOMMU is enabled before vhost starts
2) IOMMU is disabled after vhost stops
This patch seems to fix 2) but not 1). Do we need to deal with the
IOMMU enabled after vhost starts?
Thanks
>  }
>
>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> --
> 2.47.1
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21  3:27 ` Jason Wang
@ 2025-01-21  7:15   ` Eric Auger
  2025-01-21 16:25   ` Eric Auger
  1 sibling, 0 replies; 34+ messages in thread
From: Eric Auger @ 2025-01-21  7:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: eric.auger.pro, qemu-devel, mst, sgarzare, lvivier,
	zhenzhong.duan
Hi Jason,
On 1/21/25 4:27 AM, Jason Wang wrote:
> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
>> When a guest exposed with a vhost device and protected by an
>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>
>> Fail to lookup the translated address ffffe000
>>
>> We observe that the IOMMU gets disabled through a write to the global
>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>> When this warning happens it can be observed an inflight IOTLB
>> miss occurs after the IOMMU disable and before the vhost stop. In
>> that case a flat translation occurs and the check in
>> vhost_memory_region_lookup() fails.
>>
>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> unregistered.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/virtio/vhost.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..128c2ab094 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>>              break;
>>          }
>>      }
>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> +    }
> So the current code assumes:
>
> 1) IOMMU is enabled before vhost starts
> 2) IOMMU is disabled after vhost stops
>
> This patch seems to fix 2) but not 1). Do we need to deal with the
> IOMMU enabled after vhost starts?
This patch handles the case where the IOMMU is disabled *before* vhost
stops (not 2). This is what I concretely observe on guest reboot.
But maybe I misunderstood your comments/questions?
Thanks
Eric
>
> Thanks
>
>>  }
>>
>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>> --
>> 2.47.1
>>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-20 17:33 [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled Eric Auger
  2025-01-21  3:27 ` Jason Wang
@ 2025-01-21  8:31 ` Laurent Vivier
  2025-01-21  8:45   ` Stefano Garzarella
                     ` (2 more replies)
  2025-01-21  9:18 ` Duan, Zhenzhong
  2 siblings, 3 replies; 34+ messages in thread
From: Laurent Vivier @ 2025-01-21  8:31 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, mst, jasowang, sgarzare
  Cc: zhenzhong.duan
On 20/01/2025 18:33, Eric Auger wrote:
> When a guest exposed with a vhost device and protected by an
> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
> 
> Fail to lookup the translated address ffffe000
> 
> We observe that the IOMMU gets disabled through a write to the global
> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
> When this warning happens it can be observed an inflight IOTLB
> miss occurs after the IOMMU disable and before the vhost stop. In
> that case a flat translation occurs and the check in
> vhost_memory_region_lookup() fails.
> 
> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> unregistered.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   hw/virtio/vhost.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 6aa72fd434..128c2ab094 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>               break;
>           }
>       }
> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> +    }
>   }
>   
>   void vhost_toggle_device_iotlb(VirtIODevice *vdev)
I think you need the counterpart in vhost_iommu_region_del() (for instance if we have an 
add after a del that results in an empty list).
But you cannot unconditionally enable it (for instance if vhost is not started)
Perhaps you should move the vhost_set_iotlb_callback() call from 
vhost_start()/vhost_stop() to vhost_iommu_region_add()/vhost_iommu_region_del()?
Thanks,
Laurent
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21  8:31 ` Laurent Vivier
@ 2025-01-21  8:45   ` Stefano Garzarella
  2025-01-21  8:49     ` Eric Auger
  2025-01-21  8:48   ` Eric Auger
  2025-01-21 16:32   ` Eric Auger
  2 siblings, 1 reply; 34+ messages in thread
From: Stefano Garzarella @ 2025-01-21  8:45 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Eric Auger, eric.auger.pro, qemu-devel, mst, jasowang,
	zhenzhong.duan
On Tue, Jan 21, 2025 at 09:31:53AM +0100, Laurent Vivier wrote:
>On 20/01/2025 18:33, Eric Auger wrote:
>>When a guest exposed with a vhost device and protected by an
>>intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>
>>Fail to lookup the translated address ffffe000
>>
>>We observe that the IOMMU gets disabled through a write to the global
>>command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>>When this warning happens it can be observed an inflight IOTLB
>>miss occurs after the IOMMU disable and before the vhost stop. In
>>that case a flat translation occurs and the check in
>>vhost_memory_region_lookup() fails.
>>
>>Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>unregistered.
>>
>>Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>---
>>  hw/virtio/vhost.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>index 6aa72fd434..128c2ab094 100644
>>--- a/hw/virtio/vhost.c
>>+++ b/hw/virtio/vhost.c
>>@@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>>              break;
>>          }
>>      }
>>+    if (QLIST_EMPTY(&dev->iommu_list) &&
>>+        dev->vhost_ops->vhost_set_iotlb_callback) {
>>+        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>+    }
>>  }
>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>
>I think you need the counterpart in vhost_iommu_region_del() (for 
I guess you meant vhost_iommu_region_add(). I was going to comment 
exactly on that, I agree here.
>instance if we have an add after a del that results in an empty list).
>But you cannot unconditionally enable it (for instance if vhost is not 
>started)
Good point.
>
>Perhaps you should move the vhost_set_iotlb_callback() call from 
>vhost_start()/vhost_stop() to 
>vhost_iommu_region_add()/vhost_iommu_region_del()?
I also like this idea.
Stefano
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21  8:31 ` Laurent Vivier
  2025-01-21  8:45   ` Stefano Garzarella
@ 2025-01-21  8:48   ` Eric Auger
  2025-01-21 16:32   ` Eric Auger
  2 siblings, 0 replies; 34+ messages in thread
From: Eric Auger @ 2025-01-21  8:48 UTC (permalink / raw)
  To: Laurent Vivier, eric.auger.pro, qemu-devel, mst, jasowang,
	sgarzare
  Cc: zhenzhong.duan
On 1/21/25 9:31 AM, Laurent Vivier wrote:
> On 20/01/2025 18:33, Eric Auger wrote:
>> When a guest exposed with a vhost device and protected by an
>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>
>> Fail to lookup the translated address ffffe000
>>
>> We observe that the IOMMU gets disabled through a write to the global
>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>> When this warning happens it can be observed an inflight IOTLB
>> miss occurs after the IOMMU disable and before the vhost stop. In
>> that case a flat translation occurs and the check in
>> vhost_memory_region_lookup() fails.
>>
>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> unregistered.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>   hw/virtio/vhost.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..128c2ab094 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -931,6 +931,10 @@ static void
>> vhost_iommu_region_del(MemoryListener *listener,
>>               break;
>>           }
>>       }
>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> +    }
>>   }
>>     void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>
> I think you need the counterpart in vhost_iommu_region_del() (for
> instance if we have an add after a del that results in an empty list).
I guess you meant vhost_iommu_region_add()
> But you cannot unconditionally enable it (for instance if vhost is not
> started)
agreed. I will further look at the control path.
>
> Perhaps you should move the vhost_set_iotlb_callback() call from
> vhost_start()/vhost_stop() to
> vhost_iommu_region_add()/vhost_iommu_region_del()?
Interesting. I will study that.
Thanks!
Eric
>
> Thanks,
> Laurent
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21  8:45   ` Stefano Garzarella
@ 2025-01-21  8:49     ` Eric Auger
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Auger @ 2025-01-21  8:49 UTC (permalink / raw)
  To: Stefano Garzarella, Laurent Vivier
  Cc: eric.auger.pro, qemu-devel, mst, jasowang, zhenzhong.duan
Hi Stefano,
On 1/21/25 9:45 AM, Stefano Garzarella wrote:
> On Tue, Jan 21, 2025 at 09:31:53AM +0100, Laurent Vivier wrote:
>> On 20/01/2025 18:33, Eric Auger wrote:
>>> When a guest exposed with a vhost device and protected by an
>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>
>>> Fail to lookup the translated address ffffe000
>>>
>>> We observe that the IOMMU gets disabled through a write to the global
>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>>> When this warning happens it can be observed an inflight IOTLB
>>> miss occurs after the IOMMU disable and before the vhost stop. In
>>> that case a flat translation occurs and the check in
>>> vhost_memory_region_lookup() fails.
>>>
>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>> unregistered.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> ---
>>>  hw/virtio/vhost.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> index 6aa72fd434..128c2ab094 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -931,6 +931,10 @@ static void
>>> vhost_iommu_region_del(MemoryListener *listener,
>>>              break;
>>>          }
>>>      }
>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>> +    }
>>>  }
>>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>>
>> I think you need the counterpart in vhost_iommu_region_del() (for 
>
> I guess you meant vhost_iommu_region_add(). I was going to comment
> exactly on that, I agree here.
>
>> instance if we have an add after a del that results in an empty list).
>> But you cannot unconditionally enable it (for instance if vhost is
>> not started)
>
> Good point.
>
>>
>> Perhaps you should move the vhost_set_iotlb_callback() call from
>> vhost_start()/vhost_stop() to
>> vhost_iommu_region_add()/vhost_iommu_region_del()?
>
> I also like this idea.
OK makes sense. I will go in this direction.
Eric
>
> Stefano
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-20 17:33 [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled Eric Auger
  2025-01-21  3:27 ` Jason Wang
  2025-01-21  8:31 ` Laurent Vivier
@ 2025-01-21  9:18 ` Duan, Zhenzhong
  2025-01-21 10:34   ` Eric Auger
  2 siblings, 1 reply; 34+ messages in thread
From: Duan, Zhenzhong @ 2025-01-21  9:18 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	mst@redhat.com, jasowang@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com
Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
>
>When a guest exposed with a vhost device and protected by an
>intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>
>Fail to lookup the translated address ffffe000
Do you see this print once during one time reboot?
>
>We observe that the IOMMU gets disabled through a write to the global
>command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>When this warning happens it can be observed an inflight IOTLB
>miss occurs after the IOMMU disable and before the vhost stop. In
>that case a flat translation occurs and the check in
>vhost_memory_region_lookup() fails.
>
>Let's disable the IOTLB callbacks when all IOMMU MRs have been
>unregistered.
Try to understand the sequence, is it like below?
           vhost                                                     vcpu          
call into vtd_iommu_translate();
                                                                    set s->dmar_enabled = false;
                                                                    switch off iommu address space;
                                                                    disable vhost IOTLB callbacks;
check if !s->dmar_enabled;
return flat translation and trigger warning
Thanks
Zhenzhong
>
>Signed-off-by: Eric Auger <eric.auger@redhat.com>
>---
> hw/virtio/vhost.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>index 6aa72fd434..128c2ab094 100644
>--- a/hw/virtio/vhost.c
>+++ b/hw/virtio/vhost.c
>@@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener
>*listener,
>             break;
>         }
>     }
>+    if (QLIST_EMPTY(&dev->iommu_list) &&
>+        dev->vhost_ops->vhost_set_iotlb_callback) {
>+        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>+    }
> }
>
> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>--
>2.47.1
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21  9:18 ` Duan, Zhenzhong
@ 2025-01-21 10:34   ` Eric Auger
  2025-01-21 10:43     ` Duan, Zhenzhong
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Auger @ 2025-01-21 10:34 UTC (permalink / raw)
  To: Duan, Zhenzhong, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	mst@redhat.com, jasowang@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com
Hi Zhenzhong,
On 1/21/25 10:18 AM, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Subject: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>> disabled
>>
>> When a guest exposed with a vhost device and protected by an
>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>
>> Fail to lookup the translated address ffffe000
> Do you see this print once during one time reboot?
Actually this happens rarely on reboot. The reproducibility is of the
order of magnitude of 1/10 for me. I use a vm with vhost net device +
virtual intel iommu featuring a crontab job.
@reboot /usr/sbin/reboot
>
>> We observe that the IOMMU gets disabled through a write to the global
>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>> When this warning happens it can be observed an inflight IOTLB
>> miss occurs after the IOMMU disable and before the vhost stop. In
>> that case a flat translation occurs and the check in
>> vhost_memory_region_lookup() fails.
>>
>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> unregistered.
> Try to understand the sequence, is it like below?
>
>            vhost                                                     vcpu          
>
> call into vtd_iommu_translate();
No that's a kernel vhost translate request that normally tries to find
out the translated address on kernel side in the IOTLB but since the
data is not there it ends up asking for the translation to user space ...
>
>                                                                     set s->dmar_enabled = false;
>                                                                     switch off iommu address space;
>                                                                     disable vhost IOTLB callbacks;
vtd_handle_gcmd_write/vtd_handle_gcmd_te/vtd_handle_gcmd_te which
eventually calls vhost_iommu_region_del
>
> check if !s->dmar_enabled;
> return flat translation and trigger warning
vhost inflight translation reaches user space through
vhost_device_iotlb_miss()
>
> Thanks
> Zhenzhong
>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> hw/virtio/vhost.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..128c2ab094 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener
>> *listener,
>>             break;
>>         }
>>     }
>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> +    }
>> }
>>
>> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>> --
>> 2.47.1
^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21 10:34   ` Eric Auger
@ 2025-01-21 10:43     ` Duan, Zhenzhong
  0 siblings, 0 replies; 34+ messages in thread
From: Duan, Zhenzhong @ 2025-01-21 10:43 UTC (permalink / raw)
  To: eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, jasowang@redhat.com,
	sgarzare@redhat.com, lvivier@redhat.com
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
>
>Hi Zhenzhong,
>
>
>On 1/21/25 10:18 AM, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>>> -----Original Message-----
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Subject: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>>> disabled
>>>
>>> When a guest exposed with a vhost device and protected by an
>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>
>>> Fail to lookup the translated address ffffe000
>> Do you see this print once during one time reboot?
>Actually this happens rarely on reboot. The reproducibility is of the
>order of magnitude of 1/10 for me. I use a vm with vhost net device +
>virtual intel iommu featuring a crontab job.
>@reboot /usr/sbin/reboot
>>
>>> We observe that the IOMMU gets disabled through a write to the global
>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>>> When this warning happens it can be observed an inflight IOTLB
>>> miss occurs after the IOMMU disable and before the vhost stop. In
>>> that case a flat translation occurs and the check in
>>> vhost_memory_region_lookup() fails.
>>>
>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>> unregistered.
>> Try to understand the sequence, is it like below?
>>
>>            vhost                                                     vcpu
>>
>> call into vtd_iommu_translate();
>No that's a kernel vhost translate request that normally tries to find
>out the translated address on kernel side in the IOTLB but since the
>data is not there it ends up asking for the translation to user space ...
>>
>>                                                                     set s->dmar_enabled = false;
>>                                                                     switch off iommu address space;
>>                                                                     disable vhost IOTLB callbacks;
>vtd_handle_gcmd_write/vtd_handle_gcmd_te/vtd_handle_gcmd_te which
>eventually calls vhost_iommu_region_del
>>
>> check if !s->dmar_enabled;
>> return flat translation and trigger warning
>vhost inflight translation reaches user space through
>vhost_device_iotlb_miss()
Understood, thanks Eric!
BRs.
Zhenzhong
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21  3:27 ` Jason Wang
  2025-01-21  7:15   ` Eric Auger
@ 2025-01-21 16:25   ` Eric Auger
  2025-01-22  7:17     ` Jason Wang
  1 sibling, 1 reply; 34+ messages in thread
From: Eric Auger @ 2025-01-21 16:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: eric.auger.pro, qemu-devel, mst, sgarzare, lvivier,
	zhenzhong.duan
Hi Jason,
On 1/21/25 4:27 AM, Jason Wang wrote:
> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
>> When a guest exposed with a vhost device and protected by an
>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>
>> Fail to lookup the translated address ffffe000
>>
>> We observe that the IOMMU gets disabled through a write to the global
>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>> When this warning happens it can be observed an inflight IOTLB
>> miss occurs after the IOMMU disable and before the vhost stop. In
>> that case a flat translation occurs and the check in
>> vhost_memory_region_lookup() fails.
>>
>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> unregistered.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/virtio/vhost.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..128c2ab094 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>>              break;
>>          }
>>      }
>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> +    }
> So the current code assumes:
>
> 1) IOMMU is enabled before vhost starts
> 2) IOMMU is disabled after vhost stops
>
> This patch seems to fix 2) but not 1). Do we need to deal with the
> IOMMU enabled after vhost starts?
sorry I initially misunderstood the above comment. Indeed in the reboot
case assumption 2) happens to be wrong. However what I currently do is:
stop listening to iotlb miss requests from the kernel because my
understanding is those requests are just spurious ones, generate
warnings and we do not care since we are rebooting the system.
However I do not claim this could handle the case where the IOMMU MR
would be turned off and then turned on. I think in that case we should
also flush the kernel IOTLB and this is not taken care of in this patch.
Is it a relevant use case?
wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
that a valid use case as the virtio driver is using the dma api?
Eric
>
> Thanks
>
>>  }
>>
>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>> --
>> 2.47.1
>>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21  8:31 ` Laurent Vivier
  2025-01-21  8:45   ` Stefano Garzarella
  2025-01-21  8:48   ` Eric Auger
@ 2025-01-21 16:32   ` Eric Auger
  2 siblings, 0 replies; 34+ messages in thread
From: Eric Auger @ 2025-01-21 16:32 UTC (permalink / raw)
  To: Laurent Vivier, eric.auger.pro, qemu-devel, mst, jasowang,
	sgarzare
  Cc: zhenzhong.duan
Hi,
On 1/21/25 9:31 AM, Laurent Vivier wrote:
> On 20/01/2025 18:33, Eric Auger wrote:
>> When a guest exposed with a vhost device and protected by an
>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>
>> Fail to lookup the translated address ffffe000
>>
>> We observe that the IOMMU gets disabled through a write to the global
>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>> When this warning happens it can be observed an inflight IOTLB
>> miss occurs after the IOMMU disable and before the vhost stop. In
>> that case a flat translation occurs and the check in
>> vhost_memory_region_lookup() fails.
>>
>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> unregistered.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>   hw/virtio/vhost.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index 6aa72fd434..128c2ab094 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -931,6 +931,10 @@ static void
>> vhost_iommu_region_del(MemoryListener *listener,
>>               break;
>>           }
>>       }
>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> +    }
>>   }
>>     void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>
> I think you need the counterpart in vhost_iommu_region_del() (for
> instance if we have an add after a del that results in an empty list).
> But you cannot unconditionally enable it (for instance if vhost is not
> started)
if we enter vhost_iommu_region_add(), this means that the iommu_listener
has been registered in vhost_vdev_start() and thus vdev->vhost_started
is set.
>
> Perhaps you should move the vhost_set_iotlb_callback() call from
> vhost_start()/vhost_stop() to
> vhost_iommu_region_add()/vhost_iommu_region_del()?
I currently fail to understand whether we shouldn't keep listening to
iotlb callbacks when the IOMMU gets disabled. In that case shouldn't we
flush the kernel IOTLB and update the hdev->mem->regions to reflect the
IOMMR MR disablement?
Thanks
Eric
>
> Thanks,
> Laurent
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-21 16:25   ` Eric Auger
@ 2025-01-22  7:17     ` Jason Wang
  2025-01-22  7:55       ` Eric Auger
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Wang @ 2025-01-22  7:17 UTC (permalink / raw)
  To: eric.auger
  Cc: eric.auger.pro, qemu-devel, mst, sgarzare, lvivier,
	zhenzhong.duan
On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote:
>
>
> Hi Jason,
>
> On 1/21/25 4:27 AM, Jason Wang wrote:
> > On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
> >> When a guest exposed with a vhost device and protected by an
> >> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
> >>
> >> Fail to lookup the translated address ffffe000
> >>
> >> We observe that the IOMMU gets disabled through a write to the global
> >> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
> >> When this warning happens it can be observed an inflight IOTLB
> >> miss occurs after the IOMMU disable and before the vhost stop. In
> >> that case a flat translation occurs and the check in
> >> vhost_memory_region_lookup() fails.
> >>
> >> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> >> unregistered.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> ---
> >>  hw/virtio/vhost.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 6aa72fd434..128c2ab094 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
> >>              break;
> >>          }
> >>      }
> >> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> >> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> >> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> >> +    }
> > So the current code assumes:
> >
> > 1) IOMMU is enabled before vhost starts
> > 2) IOMMU is disabled after vhost stops
> >
> > This patch seems to fix 2) but not 1). Do we need to deal with the
> > IOMMU enabled after vhost starts?
>
> sorry I initially misunderstood the above comment. Indeed in the reboot
> case assumption 2) happens to be wrong. However what I currently do is:
> stop listening to iotlb miss requests from the kernel because my
> understanding is those requests are just spurious ones, generate
> warnings and we do not care since we are rebooting the system.
>
> However I do not claim this could handle the case where the IOMMU MR
> would be turned off and then turned on. I think in that case we should
> also flush the kernel IOTLB and this is not taken care of in this patch.
> Is it a relevant use case?
Not sure.
>
> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
> that a valid use case as the virtio driver is using the dma api?
It should not be but we can't assume the behaviour of the guest. It
could be buggy or even malicious.
Btw, we had the following codes while handling te:
/* Handle Translation Enable/Disable */
static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
{
    if (s->dmar_enabled == en) {
        return;
    }
    trace_vtd_dmar_enable(en);
...
    vtd_reset_caches(s);
    vtd_address_space_refresh_all(s);
}
vtd_address_space_refresh_all() will basically disable the iommu
memory region. It looks not sufficient to trigger the region_del
callback, maybe we should delete the region or introduce listener
callback?
Thanks
>
> Eric
>
>
> >
> > Thanks
> >
> >>  }
> >>
> >>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> >> --
> >> 2.47.1
> >>
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-22  7:17     ` Jason Wang
@ 2025-01-22  7:55       ` Eric Auger
  2025-01-23  1:34         ` Jason Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Auger @ 2025-01-22  7:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: eric.auger.pro, qemu-devel, mst, sgarzare, lvivier,
	zhenzhong.duan
Hi Jason,
On 1/22/25 8:17 AM, Jason Wang wrote:
> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote:
>>
>> Hi Jason,
>>
>> On 1/21/25 4:27 AM, Jason Wang wrote:
>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
>>>> When a guest exposed with a vhost device and protected by an
>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>>
>>>> Fail to lookup the translated address ffffe000
>>>>
>>>> We observe that the IOMMU gets disabled through a write to the global
>>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>>>> When this warning happens it can be observed an inflight IOTLB
>>>> miss occurs after the IOMMU disable and before the vhost stop. In
>>>> that case a flat translation occurs and the check in
>>>> vhost_memory_region_lookup() fails.
>>>>
>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>>> unregistered.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>>  hw/virtio/vhost.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>> index 6aa72fd434..128c2ab094 100644
>>>> --- a/hw/virtio/vhost.c
>>>> +++ b/hw/virtio/vhost.c
>>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>>>>              break;
>>>>          }
>>>>      }
>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>>> +    }
>>> So the current code assumes:
>>>
>>> 1) IOMMU is enabled before vhost starts
>>> 2) IOMMU is disabled after vhost stops
>>>
>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>>> IOMMU enabled after vhost starts?
>> sorry I initially misunderstood the above comment. Indeed in the reboot
>> case assumption 2) happens to be wrong. However what I currently do is:
>> stop listening to iotlb miss requests from the kernel because my
>> understanding is those requests are just spurious ones, generate
>> warnings and we do not care since we are rebooting the system.
>>
>> However I do not claim this could handle the case where the IOMMU MR
>> would be turned off and then turned on. I think in that case we should
>> also flush the kernel IOTLB and this is not taken care of in this patch.
>> Is it a relevant use case?
> Not sure.
>
>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
>> that a valid use case as the virtio driver is using the dma api?
> It should not be but we can't assume the behaviour of the guest. It
> could be buggy or even malicious.
agreed
>
> Btw, we had the following codes while handling te:
>
> /* Handle Translation Enable/Disable */
> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
> {
>     if (s->dmar_enabled == en) {
>         return;
>     }
>
>     trace_vtd_dmar_enable(en);
>
> ...
>
>     vtd_reset_caches(s);
>     vtd_address_space_refresh_all(s);
> }
>
> vtd_address_space_refresh_all() will basically disable the iommu
> memory region. It looks not sufficient to trigger the region_del
> callback, maybe we should delete the region or introduce listener
> callback?
This is exactly the code path which is entered in my use case.
vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But given the current implement of this latter the IOTLB callback is not unset and the kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are not updated? shouldn't they. Can you explain what they correspond to?
Thanks
Eric
>
> Thanks
>
>> Eric
>>
>>
>>> Thanks
>>>
>>>>  }
>>>>
>>>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>>>> --
>>>> 2.47.1
>>>>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-22  7:55       ` Eric Auger
@ 2025-01-23  1:34         ` Jason Wang
  2025-01-23  8:31           ` Eric Auger
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Wang @ 2025-01-23  1:34 UTC (permalink / raw)
  To: eric.auger
  Cc: eric.auger.pro, qemu-devel, mst, sgarzare, lvivier,
	zhenzhong.duan, Peter Xu
On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
>
> Hi Jason,
>
>
> On 1/22/25 8:17 AM, Jason Wang wrote:
> > On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote:
> >>
> >> Hi Jason,
> >>
> >> On 1/21/25 4:27 AM, Jason Wang wrote:
> >>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
> >>>> When a guest exposed with a vhost device and protected by an
> >>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
> >>>>
> >>>> Fail to lookup the translated address ffffe000
> >>>>
> >>>> We observe that the IOMMU gets disabled through a write to the global
> >>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
> >>>> When this warning happens it can be observed an inflight IOTLB
> >>>> miss occurs after the IOMMU disable and before the vhost stop. In
> >>>> that case a flat translation occurs and the check in
> >>>> vhost_memory_region_lookup() fails.
> >>>>
> >>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> >>>> unregistered.
> >>>>
> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>> ---
> >>>>  hw/virtio/vhost.c | 4 ++++
> >>>>  1 file changed, 4 insertions(+)
> >>>>
> >>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>> index 6aa72fd434..128c2ab094 100644
> >>>> --- a/hw/virtio/vhost.c
> >>>> +++ b/hw/virtio/vhost.c
> >>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
> >>>>              break;
> >>>>          }
> >>>>      }
> >>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> >>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> >>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> >>>> +    }
> >>> So the current code assumes:
> >>>
> >>> 1) IOMMU is enabled before vhost starts
> >>> 2) IOMMU is disabled after vhost stops
> >>>
> >>> This patch seems to fix 2) but not 1). Do we need to deal with the
> >>> IOMMU enabled after vhost starts?
> >> sorry I initially misunderstood the above comment. Indeed in the reboot
> >> case assumption 2) happens to be wrong. However what I currently do is:
> >> stop listening to iotlb miss requests from the kernel because my
> >> understanding is those requests are just spurious ones, generate
> >> warnings and we do not care since we are rebooting the system.
> >>
> >> However I do not claim this could handle the case where the IOMMU MR
> >> would be turned off and then turned on. I think in that case we should
> >> also flush the kernel IOTLB and this is not taken care of in this patch.
> >> Is it a relevant use case?
> > Not sure.
> >
> >> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
> >> that a valid use case as the virtio driver is using the dma api?
> > It should not be but we can't assume the behaviour of the guest. It
> > could be buggy or even malicious.
>
> agreed
> >
> > Btw, we had the following codes while handling te:
> >
> > /* Handle Translation Enable/Disable */
> > static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
> > {
> >     if (s->dmar_enabled == en) {
> >         return;
> >     }
> >
> >     trace_vtd_dmar_enable(en);
> >
> > ...
> >
> >     vtd_reset_caches(s);
> >     vtd_address_space_refresh_all(s);
> > }
> >
> > vtd_address_space_refresh_all() will basically disable the iommu
> > memory region. It looks not sufficient to trigger the region_del
> > callback, maybe we should delete the region or introduce listener
> > callback?
>
> This is exactly the code path which is entered in my use case.
>
> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But given the current implement of this latter the IOTLB callback is not unset and the kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are not updated? shouldn't they. Can you explain what they correspond to?
Adding Peter for more ideas.
I think it's better to find a way to trigger the listener here, probably:
1) add/delete the memory regions instead of enable/disable
or
2) introduce new listener ops that can be triggered when a region is
enabled or disabled
Thanks
>
> Thanks
>
> Eric
>
> >
> > Thanks
> >
> >> Eric
> >>
> >>
> >>> Thanks
> >>>
> >>>>  }
> >>>>
> >>>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> >>>> --
> >>>> 2.47.1
> >>>>
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-23  1:34         ` Jason Wang
@ 2025-01-23  8:31           ` Eric Auger
  2025-01-24  1:48             ` Jason Wang
  2025-01-24  2:44             ` Duan, Zhenzhong
  0 siblings, 2 replies; 34+ messages in thread
From: Eric Auger @ 2025-01-23  8:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: eric.auger.pro, qemu-devel, mst, sgarzare, lvivier,
	zhenzhong.duan, Peter Xu
Hi Jason,
On 1/23/25 2:34 AM, Jason Wang wrote:
> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
>> Hi Jason,
>>
>>
>> On 1/22/25 8:17 AM, Jason Wang wrote:
>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote:
>>>> Hi Jason,
>>>>
>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
>>>>>> When a guest exposed with a vhost device and protected by an
>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>>>>
>>>>>> Fail to lookup the translated address ffffe000
>>>>>>
>>>>>> We observe that the IOMMU gets disabled through a write to the global
>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
>>>>>> When this warning happens it can be observed an inflight IOTLB
>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
>>>>>> that case a flat translation occurs and the check in
>>>>>> vhost_memory_region_lookup() fails.
>>>>>>
>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>>>>> unregistered.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>> ---
>>>>>>  hw/virtio/vhost.c | 4 ++++
>>>>>>  1 file changed, 4 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>> index 6aa72fd434..128c2ab094 100644
>>>>>> --- a/hw/virtio/vhost.c
>>>>>> +++ b/hw/virtio/vhost.c
>>>>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>>>>>>              break;
>>>>>>          }
>>>>>>      }
>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>>>>> +    }
>>>>> So the current code assumes:
>>>>>
>>>>> 1) IOMMU is enabled before vhost starts
>>>>> 2) IOMMU is disabled after vhost stops
>>>>>
>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>>>>> IOMMU enabled after vhost starts?
>>>> sorry I initially misunderstood the above comment. Indeed in the reboot
>>>> case assumption 2) happens to be wrong. However what I currently do is:
>>>> stop listening to iotlb miss requests from the kernel because my
>>>> understanding is those requests are just spurious ones, generate
>>>> warnings and we do not care since we are rebooting the system.
>>>>
>>>> However I do not claim this could handle the case where the IOMMU MR
>>>> would be turned off and then turned on. I think in that case we should
>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
>>>> Is it a relevant use case?
>>> Not sure.
>>>
>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
>>>> that a valid use case as the virtio driver is using the dma api?
>>> It should not be but we can't assume the behaviour of the guest. It
>>> could be buggy or even malicious.
>> agreed
>>> Btw, we had the following codes while handling te:
>>>
>>> /* Handle Translation Enable/Disable */
>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>>> {
>>>     if (s->dmar_enabled == en) {
>>>         return;
>>>     }
>>>
>>>     trace_vtd_dmar_enable(en);
>>>
>>> ...
>>>
>>>     vtd_reset_caches(s);
>>>     vtd_address_space_refresh_all(s);
>>> }
>>>
>>> vtd_address_space_refresh_all() will basically disable the iommu
>>> memory region. It looks not sufficient to trigger the region_del
>>> callback, maybe we should delete the region or introduce listener
>>> callback?
>> This is exactly the code path which is entered in my use case.
>>
>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But given the current implement of this latter the IOTLB callback is not unset and the kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are not updated? shouldn't they. Can you explain what they correspond to?
> Adding Peter for more ideas.
>
> I think it's better to find a way to trigger the listener here, probably:
>
> 1) add/delete the memory regions instead of enable/disable
sorry I don't understand what you mean. The vhost_iommu_region_del call
stack is provided below [1]. Write to the intel iommu GCMD TE bit
induces a call to vhost_iommu_region_del. This happens before the
vhost_dev_stop whose call stack is provided below [2] and originates
from a bus reset.
We may have inflight IOTLB miss requests happening between both.
If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
translated anymore by the IOMMU and the iotlb.translated_addr returned
by address_space_get_iotlb_entry() is not within the registered
vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
the "Fail to lookup the translated address" message.
It sounds wrong that vhost keeps on using IOVAs that are not translated
anymore. It looks we have a reset ordering issue and this patch is just
removing the sympton and not the cause.
At the moment I don't really get what is initiating the intel iommu TE
bit write. This is a guest action but is it initiated from a misordered
qemu event?
It could also relate to
[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices
Thanks
Eric
[1]
#0  vhost_iommu_region_del (listener=0x55555708a388,
section=0x7ffff62bf030) at ../hw/virtio/vhost.c:927
#1  0x0000555555c851b4 in address_space_update_topology_pass
(as=0x55555822e4f0, old_view=0x7fffe8e81850, new_view=0x55555723dfa0,
adding=false) at ../system/memory.c:977
#2  0x0000555555c857a0 in address_space_set_flatview (as=0x55555822e4f0)
at ../system/memory.c:1079
#3  0x0000555555c85986 in memory_region_transaction_commit () at
../system/memory.c:1132
#4  0x0000555555c89f25 in memory_region_set_enabled (mr=0x555557dc0d30,
enabled=false) at ../system/memory.c:2686
#5  0x0000555555b5edb1 in vtd_switch_address_space (as=0x555557dc0c60)
at ../hw/i386/intel_iommu.c:1735
#6  0x0000555555b5ee6f in vtd_switch_address_space_all
(s=0x555557db1500) at ../hw/i386/intel_iommu.c:1779
#7  0x0000555555b64533 in vtd_address_space_refresh_all
(s=0x555557db1500) at ../hw/i386/intel_iommu.c:4006
#8  0x0000555555b60770 in vtd_handle_gcmd_te (s=0x555557db1500,
en=false) at ../hw/i386/intel_iommu.c:2419
#9  0x0000555555b60885 in vtd_handle_gcmd_write (s=0x555557db1500) at
../hw/i386/intel_iommu.c:2449
#10 0x0000555555b61db2 in vtd_mem_write (opaque=0x555557db1500, addr=24,
val=100663296, size=4) at ../hw/i386/intel_iommu.c:2991
#11 0x0000555555c833e0 in memory_region_write_accessor
(mr=0x555557db1840, addr=24, value=0x7ffff62bf3d8, size=4, shift=0,
mask=4294967295, attrs=...) at ../system/memory.c:497
#12 0x0000555555c83711 in access_with_adjusted_size
    (addr=24, value=0x7ffff62bf3d8, size=4, access_size_min=4,
access_size_max=8, access_fn=0x555555c832ea
<memory_region_write_accessor>, mr=0x555557db1840, attrs=...)
    at ../system/memory.c:573
#13 0x0000555555c8698b in memory_region_dispatch_write
(mr=0x555557db1840, addr=24, data=100663296, op=MO_32, attrs=...) at
../system/memory.c:1521
#14 0x0000555555c95619 in flatview_write_continue_step (attrs=...,
buf=0x7ffff7fbb028 "", len=4, mr_addr=24, l=0x7ffff62bf4c0,
mr=0x555557db1840) at ../system/physmem.c:2803
#15 0x0000555555c956eb in flatview_write_continue (fv=0x7fffe852d370,
addr=4275634200, attrs=..., ptr=0x7ffff7fbb028, len=4, mr_addr=24, l=4,
mr=0x555557db1840) at ../system/physmem.c:2833
#16 0x0000555555c957f9 in flatview_write (fv=0x7fffe852d370,
addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4) at
../system/physmem.c:2864
#17 0x0000555555c95c39 in address_space_write (as=0x555556ff1720
<address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028,
len=4) at ../system/physmem.c:2984
#18 0x0000555555c95cb1 in address_space_rw (as=0x555556ff1720
<address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028,
len=4, is_write=true) at ../system/physmem.c:2994
#19 0x0000555555ceeb56 in kvm_cpu_exec (cpu=0x55555727e950) at
../accel/kvm/kvm-all.c:3188
#20 0x0000555555cf1ea6 in kvm_vcpu_thread_fn (arg=0x55555727e950) at
../accel/kvm/kvm-accel-ops.c:50
#21 0x0000555555f6de20 in qemu_thread_start (args=0x555557288960) at
../util/qemu-thread-posix.c:541
#22 0x00007ffff7289e92 in start_thread () at /lib64/libc.so.6
[2]
#0  vhost_dev_stop (hdev=0x55555708a2c0, vdev=0x555558234cb0,
vrings=false) at ../hw/virtio/vhost.c:2222
#1  0x0000555555990266 in vhost_net_stop_one (net=0x55555708a2c0,
dev=0x555558234cb0) at ../hw/net/vhost_net.c:350
#2  0x00005555559906ea in vhost_net_stop (dev=0x555558234cb0,
ncs=0x55555826f968, data_queue_pairs=1, cvq=0) at ../hw/net/vhost_net.c:462
#3  0x0000555555c1ad0a in virtio_net_vhost_status (n=0x555558234cb0,
status=0 '\000') at ../hw/net/virtio-net.c:318
#4  0x0000555555c1afaf in virtio_net_set_status (vdev=0x555558234cb0,
status=0 '\000') at ../hw/net/virtio-net.c:393
#5  0x0000555555c591bd in virtio_set_status (vdev=0x555558234cb0, val=0
'\000') at ../hw/virtio/virtio.c:2242
#6  0x0000555555c595eb in virtio_reset (opaque=0x555558234cb0) at
../hw/virtio/virtio.c:2325
#7  0x0000555555a0d1e1 in virtio_bus_reset (bus=0x555558234c30) at
../hw/virtio/virtio-bus.c:109
#8  0x0000555555a13d51 in virtio_pci_reset (qdev=0x55555822c830) at
../hw/virtio/virtio-pci.c:2282
#9  0x0000555555a14016 in virtio_pci_bus_reset_hold (obj=0x55555822c830,
type=RESET_TYPE_COLD) at ../hw/virtio/virtio-pci.c:2322
#10 0x0000555555d08831 in resettable_phase_hold (obj=0x55555822c830,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:180
#11 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557ffa3c0,
cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#12 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20,
obj=0x555557ffa3c0, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#13 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ffa3c0,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#14 0x0000555555d0543b in device_reset_child_foreach
(obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
#15 0x0000555555d084d8 in resettable_child_foreach (rc=0x55555715a090,
obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#16 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ff98e0,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#17 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557445120,
cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#18 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20,
obj=0x555557445120, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#19 0x0000555555d08792 in resettable_phase_hold (obj=0x555557445120,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#20 0x0000555555d0543b in device_reset_child_foreach
(obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
#21 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555570cf410,
obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#22 0x0000555555d08792 in resettable_phase_hold (obj=0x5555572d0a00,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#23 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x55555723d270,
cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
#24 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555571bfde0,
obj=0x55555723d270, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#25 0x0000555555d08792 in resettable_phase_hold (obj=0x55555723d270,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#26 0x0000555555d06d6d in resettable_container_child_foreach
(obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resetcontainer.c:54
#27 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555572180f0,
obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
#28 0x0000555555d08792 in resettable_phase_hold (obj=0x55555724a8a0,
opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
#29 0x0000555555d0838d in resettable_assert_reset (obj=0x55555724a8a0,
type=RESET_TYPE_COLD) at ../hw/core/resettable.c:58
#30 0x0000555555d082e5 in resettable_reset (obj=0x55555724a8a0,
type=RESET_TYPE_COLD) at ../hw/core/resettable.c:45
#31 0x00005555558db5c8 in qemu_devices_reset
(reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/core/reset.c:179
#32 0x0000555555b6f5b2 in pc_machine_reset (machine=0x555557234490,
reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/i386/pc.c:1877
#33 0x0000555555a5a0a2 in qemu_system_reset
(reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../system/runstate.c:516
#34 0x0000555555a5a891 in main_loop_should_exit (status=0x7fffffffd574)
at ../system/runstate.c:792
#35 0x0000555555a5a992 in qemu_main_loop () at ../system/runstate.c:825
#36 0x0000555555e9cced in qemu_default_main () at ../system/main.c:37
#37 0x0000555555e9cd2a in main (argc=58, argv=0x7fffffffd6d8) at
../system/main.c:48
>
> or
>
> 2) introduce new listener ops that can be triggered when a region is
> enabled or disabled
>
> Thanks
>
>> Thanks
>>
>> Eric
>>
>>> Thanks
>>>
>>>> Eric
>>>>
>>>>
>>>>> Thanks
>>>>>
>>>>>>  }
>>>>>>
>>>>>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
>>>>>> --
>>>>>> 2.47.1
>>>>>>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-23  8:31           ` Eric Auger
@ 2025-01-24  1:48             ` Jason Wang
  2025-01-24  2:44             ` Duan, Zhenzhong
  1 sibling, 0 replies; 34+ messages in thread
From: Jason Wang @ 2025-01-24  1:48 UTC (permalink / raw)
  To: eric.auger
  Cc: eric.auger.pro, qemu-devel, mst, sgarzare, lvivier,
	zhenzhong.duan, Peter Xu, CLEMENT MATHIEU--DRIF, Liu, Yi L
On Thu, Jan 23, 2025 at 4:31 PM Eric Auger <eric.auger@redhat.com> wrote:
>
> Hi Jason,
>
>
> On 1/23/25 2:34 AM, Jason Wang wrote:
> > On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
> >> Hi Jason,
> >>
> >>
> >> On 1/22/25 8:17 AM, Jason Wang wrote:
> >>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote:
> >>>> Hi Jason,
> >>>>
> >>>> On 1/21/25 4:27 AM, Jason Wang wrote:
> >>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
> >>>>>> When a guest exposed with a vhost device and protected by an
> >>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
> >>>>>>
> >>>>>> Fail to lookup the translated address ffffe000
> >>>>>>
> >>>>>> We observe that the IOMMU gets disabled through a write to the global
> >>>>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
> >>>>>> When this warning happens it can be observed an inflight IOTLB
> >>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
> >>>>>> that case a flat translation occurs and the check in
> >>>>>> vhost_memory_region_lookup() fails.
> >>>>>>
> >>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> >>>>>> unregistered.
> >>>>>>
> >>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>> ---
> >>>>>>  hw/virtio/vhost.c | 4 ++++
> >>>>>>  1 file changed, 4 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>> index 6aa72fd434..128c2ab094 100644
> >>>>>> --- a/hw/virtio/vhost.c
> >>>>>> +++ b/hw/virtio/vhost.c
> >>>>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
> >>>>>>              break;
> >>>>>>          }
> >>>>>>      }
> >>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> >>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> >>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> >>>>>> +    }
> >>>>> So the current code assumes:
> >>>>>
> >>>>> 1) IOMMU is enabled before vhost starts
> >>>>> 2) IOMMU is disabled after vhost stops
> >>>>>
> >>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
> >>>>> IOMMU enabled after vhost starts?
> >>>> sorry I initially misunderstood the above comment. Indeed in the reboot
> >>>> case assumption 2) happens to be wrong. However what I currently do is:
> >>>> stop listening to iotlb miss requests from the kernel because my
> >>>> understanding is those requests are just spurious ones, generate
> >>>> warnings and we do not care since we are rebooting the system.
> >>>>
> >>>> However I do not claim this could handle the case where the IOMMU MR
> >>>> would be turned off and then turned on. I think in that case we should
> >>>> also flush the kernel IOTLB and this is not taken care of in this patch.
> >>>> Is it a relevant use case?
> >>> Not sure.
> >>>
> >>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
> >>>> that a valid use case as the virtio driver is using the dma api?
> >>> It should not be but we can't assume the behaviour of the guest. It
> >>> could be buggy or even malicious.
> >> agreed
> >>> Btw, we had the following codes while handling te:
> >>>
> >>> /* Handle Translation Enable/Disable */
> >>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
> >>> {
> >>>     if (s->dmar_enabled == en) {
> >>>         return;
> >>>     }
> >>>
> >>>     trace_vtd_dmar_enable(en);
> >>>
> >>> ...
> >>>
> >>>     vtd_reset_caches(s);
> >>>     vtd_address_space_refresh_all(s);
> >>> }
> >>>
> >>> vtd_address_space_refresh_all() will basically disable the iommu
> >>> memory region. It looks not sufficient to trigger the region_del
> >>> callback, maybe we should delete the region or introduce listener
> >>> callback?
> >> This is exactly the code path which is entered in my use case.
> >>
> >> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But given the current implement of this latter the IOTLB callback is not unset and the kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are not updated? shouldn't they. Can you explain what they correspond to?
> > Adding Peter for more ideas.
> >
> > I think it's better to find a way to trigger the listener here, probably:
> >
> > 1) add/delete the memory regions instead of enable/disable
> sorry I don't understand what you mean. The vhost_iommu_region_del call
> stack is provided below [1]. Write to the intel iommu GCMD TE bit
> induces a call to vhost_iommu_region_del. This happens before the
> vhost_dev_stop whose call stack is provided below [2] and originates
> from a bus reset.
Right, I see.
>
> We may have inflight IOTLB miss requests happening between both.
>
> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
> translated anymore by the IOMMU and the iotlb.translated_addr returned
> by address_space_get_iotlb_entry() is not within the registered
> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
> the "Fail to lookup the translated address" message.
>
> It sounds wrong that vhost keeps on using IOVAs that are not translated
> anymore. It looks we have a reset ordering issue and this patch is just
> removing the sympton and not the cause.
Exactly.
>
> At the moment I don't really get what is initiating the intel iommu TE
> bit write. This is a guest action but is it initiated from a misordered
> qemu event?
It could also be a guest bug which reset the vIOMMU before the device.
Do we have a calltrace in the guest?
Adding more vtd maintiners for more thoughts.
Thanks
>
> It could also relate to
> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices
>
> Thanks
>
> Eric
>
>
>
>
> [1]
> #0  vhost_iommu_region_del (listener=0x55555708a388,
> section=0x7ffff62bf030) at ../hw/virtio/vhost.c:927
> #1  0x0000555555c851b4 in address_space_update_topology_pass
> (as=0x55555822e4f0, old_view=0x7fffe8e81850, new_view=0x55555723dfa0,
> adding=false) at ../system/memory.c:977
> #2  0x0000555555c857a0 in address_space_set_flatview (as=0x55555822e4f0)
> at ../system/memory.c:1079
> #3  0x0000555555c85986 in memory_region_transaction_commit () at
> ../system/memory.c:1132
> #4  0x0000555555c89f25 in memory_region_set_enabled (mr=0x555557dc0d30,
> enabled=false) at ../system/memory.c:2686
> #5  0x0000555555b5edb1 in vtd_switch_address_space (as=0x555557dc0c60)
> at ../hw/i386/intel_iommu.c:1735
> #6  0x0000555555b5ee6f in vtd_switch_address_space_all
> (s=0x555557db1500) at ../hw/i386/intel_iommu.c:1779
> #7  0x0000555555b64533 in vtd_address_space_refresh_all
> (s=0x555557db1500) at ../hw/i386/intel_iommu.c:4006
> #8  0x0000555555b60770 in vtd_handle_gcmd_te (s=0x555557db1500,
> en=false) at ../hw/i386/intel_iommu.c:2419
> #9  0x0000555555b60885 in vtd_handle_gcmd_write (s=0x555557db1500) at
> ../hw/i386/intel_iommu.c:2449
> #10 0x0000555555b61db2 in vtd_mem_write (opaque=0x555557db1500, addr=24,
> val=100663296, size=4) at ../hw/i386/intel_iommu.c:2991
> #11 0x0000555555c833e0 in memory_region_write_accessor
> (mr=0x555557db1840, addr=24, value=0x7ffff62bf3d8, size=4, shift=0,
> mask=4294967295, attrs=...) at ../system/memory.c:497
> #12 0x0000555555c83711 in access_with_adjusted_size
>     (addr=24, value=0x7ffff62bf3d8, size=4, access_size_min=4,
> access_size_max=8, access_fn=0x555555c832ea
> <memory_region_write_accessor>, mr=0x555557db1840, attrs=...)
>     at ../system/memory.c:573
> #13 0x0000555555c8698b in memory_region_dispatch_write
> (mr=0x555557db1840, addr=24, data=100663296, op=MO_32, attrs=...) at
> ../system/memory.c:1521
> #14 0x0000555555c95619 in flatview_write_continue_step (attrs=...,
> buf=0x7ffff7fbb028 "", len=4, mr_addr=24, l=0x7ffff62bf4c0,
> mr=0x555557db1840) at ../system/physmem.c:2803
> #15 0x0000555555c956eb in flatview_write_continue (fv=0x7fffe852d370,
> addr=4275634200, attrs=..., ptr=0x7ffff7fbb028, len=4, mr_addr=24, l=4,
> mr=0x555557db1840) at ../system/physmem.c:2833
> #16 0x0000555555c957f9 in flatview_write (fv=0x7fffe852d370,
> addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4) at
> ../system/physmem.c:2864
> #17 0x0000555555c95c39 in address_space_write (as=0x555556ff1720
> <address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028,
> len=4) at ../system/physmem.c:2984
> #18 0x0000555555c95cb1 in address_space_rw (as=0x555556ff1720
> <address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028,
> len=4, is_write=true) at ../system/physmem.c:2994
> #19 0x0000555555ceeb56 in kvm_cpu_exec (cpu=0x55555727e950) at
> ../accel/kvm/kvm-all.c:3188
> #20 0x0000555555cf1ea6 in kvm_vcpu_thread_fn (arg=0x55555727e950) at
> ../accel/kvm/kvm-accel-ops.c:50
> #21 0x0000555555f6de20 in qemu_thread_start (args=0x555557288960) at
> ../util/qemu-thread-posix.c:541
> #22 0x00007ffff7289e92 in start_thread () at /lib64/libc.so.6
>
> [2]
> #0  vhost_dev_stop (hdev=0x55555708a2c0, vdev=0x555558234cb0,
> vrings=false) at ../hw/virtio/vhost.c:2222
> #1  0x0000555555990266 in vhost_net_stop_one (net=0x55555708a2c0,
> dev=0x555558234cb0) at ../hw/net/vhost_net.c:350
> #2  0x00005555559906ea in vhost_net_stop (dev=0x555558234cb0,
> ncs=0x55555826f968, data_queue_pairs=1, cvq=0) at ../hw/net/vhost_net.c:462
> #3  0x0000555555c1ad0a in virtio_net_vhost_status (n=0x555558234cb0,
> status=0 '\000') at ../hw/net/virtio-net.c:318
> #4  0x0000555555c1afaf in virtio_net_set_status (vdev=0x555558234cb0,
> status=0 '\000') at ../hw/net/virtio-net.c:393
> #5  0x0000555555c591bd in virtio_set_status (vdev=0x555558234cb0, val=0
> '\000') at ../hw/virtio/virtio.c:2242
> #6  0x0000555555c595eb in virtio_reset (opaque=0x555558234cb0) at
> ../hw/virtio/virtio.c:2325
> #7  0x0000555555a0d1e1 in virtio_bus_reset (bus=0x555558234c30) at
> ../hw/virtio/virtio-bus.c:109
> #8  0x0000555555a13d51 in virtio_pci_reset (qdev=0x55555822c830) at
> ../hw/virtio/virtio-pci.c:2282
> #9  0x0000555555a14016 in virtio_pci_bus_reset_hold (obj=0x55555822c830,
> type=RESET_TYPE_COLD) at ../hw/virtio/virtio-pci.c:2322
> #10 0x0000555555d08831 in resettable_phase_hold (obj=0x55555822c830,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:180
> #11 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557ffa3c0,
> cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
> type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #12 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20,
> obj=0x555557ffa3c0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #13 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ffa3c0,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #14 0x0000555555d0543b in device_reset_child_foreach
> (obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
> #15 0x0000555555d084d8 in resettable_child_foreach (rc=0x55555715a090,
> obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #16 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ff98e0,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #17 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557445120,
> cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
> type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #18 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20,
> obj=0x555557445120, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #19 0x0000555555d08792 in resettable_phase_hold (obj=0x555557445120,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #20 0x0000555555d0543b in device_reset_child_foreach
> (obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
> #21 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555570cf410,
> obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #22 0x0000555555d08792 in resettable_phase_hold (obj=0x5555572d0a00,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #23 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x55555723d270,
> cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
> type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #24 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555571bfde0,
> obj=0x55555723d270, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #25 0x0000555555d08792 in resettable_phase_hold (obj=0x55555723d270,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #26 0x0000555555d06d6d in resettable_container_child_foreach
> (obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resetcontainer.c:54
> #27 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555572180f0,
> obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #28 0x0000555555d08792 in resettable_phase_hold (obj=0x55555724a8a0,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #29 0x0000555555d0838d in resettable_assert_reset (obj=0x55555724a8a0,
> type=RESET_TYPE_COLD) at ../hw/core/resettable.c:58
> #30 0x0000555555d082e5 in resettable_reset (obj=0x55555724a8a0,
> type=RESET_TYPE_COLD) at ../hw/core/resettable.c:45
> #31 0x00005555558db5c8 in qemu_devices_reset
> (reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/core/reset.c:179
> #32 0x0000555555b6f5b2 in pc_machine_reset (machine=0x555557234490,
> reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/i386/pc.c:1877
> #33 0x0000555555a5a0a2 in qemu_system_reset
> (reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../system/runstate.c:516
> #34 0x0000555555a5a891 in main_loop_should_exit (status=0x7fffffffd574)
> at ../system/runstate.c:792
> #35 0x0000555555a5a992 in qemu_main_loop () at ../system/runstate.c:825
> #36 0x0000555555e9cced in qemu_default_main () at ../system/main.c:37
> #37 0x0000555555e9cd2a in main (argc=58, argv=0x7fffffffd6d8) at
> ../system/main.c:48
>
>
> >
> > or
> >
> > 2) introduce new listener ops that can be triggered when a region is
> > enabled or disabled
> >
> > Thanks
> >
> >> Thanks
> >>
> >> Eric
> >>
> >>> Thanks
> >>>
> >>>> Eric
> >>>>
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>>  }
> >>>>>>
> >>>>>>  void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> >>>>>> --
> >>>>>> 2.47.1
> >>>>>>
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-23  8:31           ` Eric Auger
  2025-01-24  1:48             ` Jason Wang
@ 2025-01-24  2:44             ` Duan, Zhenzhong
  2025-01-24  3:30               ` Jason Wang
                                 ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Duan, Zhenzhong @ 2025-01-24  2:44 UTC (permalink / raw)
  To: eric.auger@redhat.com, Jason Wang
  Cc: eric.auger.pro@gmail.com, qemu-devel@nongnu.org, mst@redhat.com,
	sgarzare@redhat.com, lvivier@redhat.com, Peter Xu
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
>
>Hi Jason,
>
>
>On 1/23/25 2:34 AM, Jason Wang wrote:
>> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
>>> Hi Jason,
>>>
>>>
>>> On 1/22/25 8:17 AM, Jason Wang wrote:
>>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com>
>wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
>>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
>wrote:
>>>>>>> When a guest exposed with a vhost device and protected by an
>>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>>>>>
>>>>>>> Fail to lookup the translated address ffffe000
>>>>>>>
>>>>>>> We observe that the IOMMU gets disabled through a write to the global
>>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets
>stopped.
>>>>>>> When this warning happens it can be observed an inflight IOTLB
>>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
>>>>>>> that case a flat translation occurs and the check in
>>>>>>> vhost_memory_region_lookup() fails.
>>>>>>>
>>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>>>>>> unregistered.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>> ---
>>>>>>>  hw/virtio/vhost.c | 4 ++++
>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>>> index 6aa72fd434..128c2ab094 100644
>>>>>>> --- a/hw/virtio/vhost.c
>>>>>>> +++ b/hw/virtio/vhost.c
>>>>>>> @@ -931,6 +931,10 @@ static void
>vhost_iommu_region_del(MemoryListener *listener,
>>>>>>>              break;
>>>>>>>          }
>>>>>>>      }
>>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>>>>>> +    }
>>>>>> So the current code assumes:
>>>>>>
>>>>>> 1) IOMMU is enabled before vhost starts
>>>>>> 2) IOMMU is disabled after vhost stops
>>>>>>
>>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>>>>>> IOMMU enabled after vhost starts?
>>>>> sorry I initially misunderstood the above comment. Indeed in the reboot
>>>>> case assumption 2) happens to be wrong. However what I currently do is:
>>>>> stop listening to iotlb miss requests from the kernel because my
>>>>> understanding is those requests are just spurious ones, generate
>>>>> warnings and we do not care since we are rebooting the system.
>>>>>
>>>>> However I do not claim this could handle the case where the IOMMU MR
>>>>> would be turned off and then turned on. I think in that case we should
>>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
>>>>> Is it a relevant use case?
>>>> Not sure.
>>>>
>>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
>>>>> that a valid use case as the virtio driver is using the dma api?
>>>> It should not be but we can't assume the behaviour of the guest. It
>>>> could be buggy or even malicious.
>>> agreed
>>>> Btw, we had the following codes while handling te:
>>>>
>>>> /* Handle Translation Enable/Disable */
>>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>>>> {
>>>>     if (s->dmar_enabled == en) {
>>>>         return;
>>>>     }
>>>>
>>>>     trace_vtd_dmar_enable(en);
>>>>
>>>> ...
>>>>
>>>>     vtd_reset_caches(s);
>>>>     vtd_address_space_refresh_all(s);
>>>> }
>>>>
>>>> vtd_address_space_refresh_all() will basically disable the iommu
>>>> memory region. It looks not sufficient to trigger the region_del
>>>> callback, maybe we should delete the region or introduce listener
>>>> callback?
>>> This is exactly the code path which is entered in my use case.
>>>
>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
>given the current implement of this latter the IOTLB callback is not unset and the
>kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are
>not updated? shouldn't they. Can you explain what they correspond to?
>> Adding Peter for more ideas.
>>
>> I think it's better to find a way to trigger the listener here, probably:
>>
>> 1) add/delete the memory regions instead of enable/disable
>sorry I don't understand what you mean. The vhost_iommu_region_del call
>stack is provided below [1]. Write to the intel iommu GCMD TE bit
>induces a call to vhost_iommu_region_del. This happens before the
>vhost_dev_stop whose call stack is provided below [2] and originates
>from a bus reset.
>
>We may have inflight IOTLB miss requests happening between both.
>
>If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
>translated anymore by the IOMMU and the iotlb.translated_addr returned
>by address_space_get_iotlb_entry() is not within the registered
>vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
>the "Fail to lookup the translated address" message.
>
>It sounds wrong that vhost keeps on using IOVAs that are not translated
>anymore. It looks we have a reset ordering issue and this patch is just
>removing the sympton and not the cause.
>
>At the moment I don't really get what is initiating the intel iommu TE
>bit write. This is a guest action but is it initiated from a misordered
>qemu event?
During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
to disable iommu before reset. So Peter's patch will not address your issue.
Before iommu shutdown, device_shutdown() is called to shutdown all devices.
Not clear why vhost is still active.
Thanks
Zhenzhong
>
>It could also relate to
>[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+
>the+rest+of+devices
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  2:44             ` Duan, Zhenzhong
@ 2025-01-24  3:30               ` Jason Wang
  2025-01-24  3:41                 ` Jason Wang
  2025-01-24 15:18                 ` Peter Xu
  2025-01-24 17:47               ` Eric Auger
  2025-01-31  9:55               ` Eric Auger
  2 siblings, 2 replies; 34+ messages in thread
From: Jason Wang @ 2025-01-24  3:30 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com, Peter Xu
On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
<zhenzhong.duan@intel.com> wrote:
>
>
>
> >-----Original Message-----
> >From: Eric Auger <eric.auger@redhat.com>
> >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
> >disabled
> >
> >Hi Jason,
> >
> >
> >On 1/23/25 2:34 AM, Jason Wang wrote:
> >> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
> >>> Hi Jason,
> >>>
> >>>
> >>> On 1/22/25 8:17 AM, Jason Wang wrote:
> >>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com>
> >wrote:
> >>>>> Hi Jason,
> >>>>>
> >>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
> >>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
> >wrote:
> >>>>>>> When a guest exposed with a vhost device and protected by an
> >>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
> >>>>>>>
> >>>>>>> Fail to lookup the translated address ffffe000
> >>>>>>>
> >>>>>>> We observe that the IOMMU gets disabled through a write to the global
> >>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets
> >stopped.
> >>>>>>> When this warning happens it can be observed an inflight IOTLB
> >>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
> >>>>>>> that case a flat translation occurs and the check in
> >>>>>>> vhost_memory_region_lookup() fails.
> >>>>>>>
> >>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> >>>>>>> unregistered.
> >>>>>>>
> >>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>>> ---
> >>>>>>>  hw/virtio/vhost.c | 4 ++++
> >>>>>>>  1 file changed, 4 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>>> index 6aa72fd434..128c2ab094 100644
> >>>>>>> --- a/hw/virtio/vhost.c
> >>>>>>> +++ b/hw/virtio/vhost.c
> >>>>>>> @@ -931,6 +931,10 @@ static void
> >vhost_iommu_region_del(MemoryListener *listener,
> >>>>>>>              break;
> >>>>>>>          }
> >>>>>>>      }
> >>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> >>>>>>> +    }
> >>>>>> So the current code assumes:
> >>>>>>
> >>>>>> 1) IOMMU is enabled before vhost starts
> >>>>>> 2) IOMMU is disabled after vhost stops
> >>>>>>
> >>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
> >>>>>> IOMMU enabled after vhost starts?
> >>>>> sorry I initially misunderstood the above comment. Indeed in the reboot
> >>>>> case assumption 2) happens to be wrong. However what I currently do is:
> >>>>> stop listening to iotlb miss requests from the kernel because my
> >>>>> understanding is those requests are just spurious ones, generate
> >>>>> warnings and we do not care since we are rebooting the system.
> >>>>>
> >>>>> However I do not claim this could handle the case where the IOMMU MR
> >>>>> would be turned off and then turned on. I think in that case we should
> >>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
> >>>>> Is it a relevant use case?
> >>>> Not sure.
> >>>>
> >>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
> >>>>> that a valid use case as the virtio driver is using the dma api?
> >>>> It should not be but we can't assume the behaviour of the guest. It
> >>>> could be buggy or even malicious.
> >>> agreed
> >>>> Btw, we had the following codes while handling te:
> >>>>
> >>>> /* Handle Translation Enable/Disable */
> >>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
> >>>> {
> >>>>     if (s->dmar_enabled == en) {
> >>>>         return;
> >>>>     }
> >>>>
> >>>>     trace_vtd_dmar_enable(en);
> >>>>
> >>>> ...
> >>>>
> >>>>     vtd_reset_caches(s);
> >>>>     vtd_address_space_refresh_all(s);
> >>>> }
> >>>>
> >>>> vtd_address_space_refresh_all() will basically disable the iommu
> >>>> memory region. It looks not sufficient to trigger the region_del
> >>>> callback, maybe we should delete the region or introduce listener
> >>>> callback?
> >>> This is exactly the code path which is entered in my use case.
> >>>
> >>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
> >given the current implement of this latter the IOTLB callback is not unset and the
> >kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are
> >not updated? shouldn't they. Can you explain what they correspond to?
> >> Adding Peter for more ideas.
> >>
> >> I think it's better to find a way to trigger the listener here, probably:
> >>
> >> 1) add/delete the memory regions instead of enable/disable
> >sorry I don't understand what you mean. The vhost_iommu_region_del call
> >stack is provided below [1]. Write to the intel iommu GCMD TE bit
> >induces a call to vhost_iommu_region_del. This happens before the
> >vhost_dev_stop whose call stack is provided below [2] and originates
> >from a bus reset.
> >
> >We may have inflight IOTLB miss requests happening between both.
> >
> >If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
> >translated anymore by the IOMMU and the iotlb.translated_addr returned
> >by address_space_get_iotlb_entry() is not within the registered
> >vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
> >the "Fail to lookup the translated address" message.
> >
> >It sounds wrong that vhost keeps on using IOVAs that are not translated
> >anymore. It looks we have a reset ordering issue and this patch is just
> >removing the sympton and not the cause.
> >
> >At the moment I don't really get what is initiating the intel iommu TE
> >bit write. This is a guest action but is it initiated from a misordered
> >qemu event?
>
> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
> to disable iommu before reset. So Peter's patch will not address your issue.
>
> Before iommu shutdown, device_shutdown() is called to shutdown all devices.
> Not clear why vhost is still active.
It might be because neither virtio bus nor virtio-net provides a
shutdown method.
There used to be requests to provide those to unbreak the kexec.
A quick try might be to provide a .driver.shutdown to
virtio_net_driver structure and reset the device there as a start.
Thanks
>
> Thanks
> Zhenzhong
>
> >
> >It could also relate to
> >[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> >https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+
> >the+rest+of+devices
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  3:30               ` Jason Wang
@ 2025-01-24  3:41                 ` Jason Wang
  2025-01-24  4:00                   ` Duan, Zhenzhong
  2025-01-24 17:56                   ` Eric Auger
  2025-01-24 15:18                 ` Peter Xu
  1 sibling, 2 replies; 34+ messages in thread
From: Jason Wang @ 2025-01-24  3:41 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com, Peter Xu
On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
> <zhenzhong.duan@intel.com> wrote:
> >
> >
> >
> > >-----Original Message-----
> > >From: Eric Auger <eric.auger@redhat.com>
> > >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
> > >disabled
> > >
> > >Hi Jason,
> > >
> > >
> > >On 1/23/25 2:34 AM, Jason Wang wrote:
> > >> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
> > >>> Hi Jason,
> > >>>
> > >>>
> > >>> On 1/22/25 8:17 AM, Jason Wang wrote:
> > >>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com>
> > >wrote:
> > >>>>> Hi Jason,
> > >>>>>
> > >>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
> > >>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
> > >wrote:
> > >>>>>>> When a guest exposed with a vhost device and protected by an
> > >>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
> > >>>>>>>
> > >>>>>>> Fail to lookup the translated address ffffe000
> > >>>>>>>
> > >>>>>>> We observe that the IOMMU gets disabled through a write to the global
> > >>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets
> > >stopped.
> > >>>>>>> When this warning happens it can be observed an inflight IOTLB
> > >>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
> > >>>>>>> that case a flat translation occurs and the check in
> > >>>>>>> vhost_memory_region_lookup() fails.
> > >>>>>>>
> > >>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> > >>>>>>> unregistered.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> > >>>>>>> ---
> > >>>>>>>  hw/virtio/vhost.c | 4 ++++
> > >>>>>>>  1 file changed, 4 insertions(+)
> > >>>>>>>
> > >>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > >>>>>>> index 6aa72fd434..128c2ab094 100644
> > >>>>>>> --- a/hw/virtio/vhost.c
> > >>>>>>> +++ b/hw/virtio/vhost.c
> > >>>>>>> @@ -931,6 +931,10 @@ static void
> > >vhost_iommu_region_del(MemoryListener *listener,
> > >>>>>>>              break;
> > >>>>>>>          }
> > >>>>>>>      }
> > >>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> > >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> > >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> > >>>>>>> +    }
> > >>>>>> So the current code assumes:
> > >>>>>>
> > >>>>>> 1) IOMMU is enabled before vhost starts
> > >>>>>> 2) IOMMU is disabled after vhost stops
> > >>>>>>
> > >>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
> > >>>>>> IOMMU enabled after vhost starts?
> > >>>>> sorry I initially misunderstood the above comment. Indeed in the reboot
> > >>>>> case assumption 2) happens to be wrong. However what I currently do is:
> > >>>>> stop listening to iotlb miss requests from the kernel because my
> > >>>>> understanding is those requests are just spurious ones, generate
> > >>>>> warnings and we do not care since we are rebooting the system.
> > >>>>>
> > >>>>> However I do not claim this could handle the case where the IOMMU MR
> > >>>>> would be turned off and then turned on. I think in that case we should
> > >>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
> > >>>>> Is it a relevant use case?
> > >>>> Not sure.
> > >>>>
> > >>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
> > >>>>> that a valid use case as the virtio driver is using the dma api?
> > >>>> It should not be but we can't assume the behaviour of the guest. It
> > >>>> could be buggy or even malicious.
> > >>> agreed
> > >>>> Btw, we had the following codes while handling te:
> > >>>>
> > >>>> /* Handle Translation Enable/Disable */
> > >>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
> > >>>> {
> > >>>>     if (s->dmar_enabled == en) {
> > >>>>         return;
> > >>>>     }
> > >>>>
> > >>>>     trace_vtd_dmar_enable(en);
> > >>>>
> > >>>> ...
> > >>>>
> > >>>>     vtd_reset_caches(s);
> > >>>>     vtd_address_space_refresh_all(s);
> > >>>> }
> > >>>>
> > >>>> vtd_address_space_refresh_all() will basically disable the iommu
> > >>>> memory region. It looks not sufficient to trigger the region_del
> > >>>> callback, maybe we should delete the region or introduce listener
> > >>>> callback?
> > >>> This is exactly the code path which is entered in my use case.
> > >>>
> > >>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
> > >given the current implement of this latter the IOTLB callback is not unset and the
> > >kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are
> > >not updated? shouldn't they. Can you explain what they correspond to?
> > >> Adding Peter for more ideas.
> > >>
> > >> I think it's better to find a way to trigger the listener here, probably:
> > >>
> > >> 1) add/delete the memory regions instead of enable/disable
> > >sorry I don't understand what you mean. The vhost_iommu_region_del call
> > >stack is provided below [1]. Write to the intel iommu GCMD TE bit
> > >induces a call to vhost_iommu_region_del. This happens before the
> > >vhost_dev_stop whose call stack is provided below [2] and originates
> > >from a bus reset.
> > >
> > >We may have inflight IOTLB miss requests happening between both.
> > >
> > >If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
> > >translated anymore by the IOMMU and the iotlb.translated_addr returned
> > >by address_space_get_iotlb_entry() is not within the registered
> > >vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
> > >the "Fail to lookup the translated address" message.
> > >
> > >It sounds wrong that vhost keeps on using IOVAs that are not translated
> > >anymore. It looks we have a reset ordering issue and this patch is just
> > >removing the sympton and not the cause.
> > >
> > >At the moment I don't really get what is initiating the intel iommu TE
> > >bit write. This is a guest action but is it initiated from a misordered
> > >qemu event?
> >
> > During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
> > to disable iommu before reset. So Peter's patch will not address your issue.
> >
> > Before iommu shutdown, device_shutdown() is called to shutdown all devices.
> > Not clear why vhost is still active.
>
> It might be because neither virtio bus nor virtio-net provides a
> shutdown method.
>
> There used to be requests to provide those to unbreak the kexec.
More could be seen at https://issues.redhat.com/browse/RHEL-331
This looks exactly the same issue.
Thanks
>
> A quick try might be to provide a .driver.shutdown to
> virtio_net_driver structure and reset the device there as a start.
>
> Thanks
>
> >
> > Thanks
> > Zhenzhong
> >
> > >
> > >It could also relate to
> > >[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> > >https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+
> > >the+rest+of+devices
> >
^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  3:41                 ` Jason Wang
@ 2025-01-24  4:00                   ` Duan, Zhenzhong
  2025-01-24  9:20                     ` Jason Wang
  2025-01-24 17:56                   ` Eric Auger
  1 sibling, 1 reply; 34+ messages in thread
From: Duan, Zhenzhong @ 2025-01-24  4:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com, Peter Xu
>-----Original Message-----
>From: Jason Wang <jasowang@redhat.com>
>Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
>
>On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
>> <zhenzhong.duan@intel.com> wrote:
>> >
>> >
>> >
>> > >-----Original Message-----
>> > >From: Eric Auger <eric.auger@redhat.com>
>> > >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU
>gets
>> > >disabled
>> > >
>> > >Hi Jason,
>> > >
>> > >
>> > >On 1/23/25 2:34 AM, Jason Wang wrote:
>> > >> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com>
>wrote:
>> > >>> Hi Jason,
>> > >>>
>> > >>>
>> > >>> On 1/22/25 8:17 AM, Jason Wang wrote:
>> > >>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com>
>> > >wrote:
>> > >>>>> Hi Jason,
>> > >>>>>
>> > >>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
>> > >>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
>> > >wrote:
>> > >>>>>>> When a guest exposed with a vhost device and protected by an
>> > >>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious
>warning:
>> > >>>>>>>
>> > >>>>>>> Fail to lookup the translated address ffffe000
>> > >>>>>>>
>> > >>>>>>> We observe that the IOMMU gets disabled through a write to the
>global
>> > >>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets
>> > >stopped.
>> > >>>>>>> When this warning happens it can be observed an inflight IOTLB
>> > >>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
>> > >>>>>>> that case a flat translation occurs and the check in
>> > >>>>>>> vhost_memory_region_lookup() fails.
>> > >>>>>>>
>> > >>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> > >>>>>>> unregistered.
>> > >>>>>>>
>> > >>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> > >>>>>>> ---
>> > >>>>>>>  hw/virtio/vhost.c | 4 ++++
>> > >>>>>>>  1 file changed, 4 insertions(+)
>> > >>>>>>>
>> > >>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> > >>>>>>> index 6aa72fd434..128c2ab094 100644
>> > >>>>>>> --- a/hw/virtio/vhost.c
>> > >>>>>>> +++ b/hw/virtio/vhost.c
>> > >>>>>>> @@ -931,6 +931,10 @@ static void
>> > >vhost_iommu_region_del(MemoryListener *listener,
>> > >>>>>>>              break;
>> > >>>>>>>          }
>> > >>>>>>>      }
>> > >>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> > >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> > >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> > >>>>>>> +    }
>> > >>>>>> So the current code assumes:
>> > >>>>>>
>> > >>>>>> 1) IOMMU is enabled before vhost starts
>> > >>>>>> 2) IOMMU is disabled after vhost stops
>> > >>>>>>
>> > >>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>> > >>>>>> IOMMU enabled after vhost starts?
>> > >>>>> sorry I initially misunderstood the above comment. Indeed in the
>reboot
>> > >>>>> case assumption 2) happens to be wrong. However what I currently do
>is:
>> > >>>>> stop listening to iotlb miss requests from the kernel because my
>> > >>>>> understanding is those requests are just spurious ones, generate
>> > >>>>> warnings and we do not care since we are rebooting the system.
>> > >>>>>
>> > >>>>> However I do not claim this could handle the case where the IOMMU
>MR
>> > >>>>> would be turned off and then turned on. I think in that case we should
>> > >>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
>> > >>>>> Is it a relevant use case?
>> > >>>> Not sure.
>> > >>>>
>> > >>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost
>start. Is
>> > >>>>> that a valid use case as the virtio driver is using the dma api?
>> > >>>> It should not be but we can't assume the behaviour of the guest. It
>> > >>>> could be buggy or even malicious.
>> > >>> agreed
>> > >>>> Btw, we had the following codes while handling te:
>> > >>>>
>> > >>>> /* Handle Translation Enable/Disable */
>> > >>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>> > >>>> {
>> > >>>>     if (s->dmar_enabled == en) {
>> > >>>>         return;
>> > >>>>     }
>> > >>>>
>> > >>>>     trace_vtd_dmar_enable(en);
>> > >>>>
>> > >>>> ...
>> > >>>>
>> > >>>>     vtd_reset_caches(s);
>> > >>>>     vtd_address_space_refresh_all(s);
>> > >>>> }
>> > >>>>
>> > >>>> vtd_address_space_refresh_all() will basically disable the iommu
>> > >>>> memory region. It looks not sufficient to trigger the region_del
>> > >>>> callback, maybe we should delete the region or introduce listener
>> > >>>> callback?
>> > >>> This is exactly the code path which is entered in my use case.
>> > >>>
>> > >>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del.
>But
>> > >given the current implement of this latter the IOTLB callback is not unset and
>the
>> > >kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions
>are
>> > >not updated? shouldn't they. Can you explain what they correspond to?
>> > >> Adding Peter for more ideas.
>> > >>
>> > >> I think it's better to find a way to trigger the listener here, probably:
>> > >>
>> > >> 1) add/delete the memory regions instead of enable/disable
>> > >sorry I don't understand what you mean. The vhost_iommu_region_del call
>> > >stack is provided below [1]. Write to the intel iommu GCMD TE bit
>> > >induces a call to vhost_iommu_region_del. This happens before the
>> > >vhost_dev_stop whose call stack is provided below [2] and originates
>> > >from a bus reset.
>> > >
>> > >We may have inflight IOTLB miss requests happening between both.
>> > >
>> > >If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
>> > >translated anymore by the IOMMU and the iotlb.translated_addr returned
>> > >by address_space_get_iotlb_entry() is not within the registered
>> > >vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
>> > >the "Fail to lookup the translated address" message.
>> > >
>> > >It sounds wrong that vhost keeps on using IOVAs that are not translated
>> > >anymore. It looks we have a reset ordering issue and this patch is just
>> > >removing the sympton and not the cause.
>> > >
>> > >At the moment I don't really get what is initiating the intel iommu TE
>> > >bit write. This is a guest action but is it initiated from a misordered
>> > >qemu event?
>> >
>> > During reboot, native_machine_shutdown() calls
>x86_platform.iommu_shutdown()
>> > to disable iommu before reset. So Peter's patch will not address your issue.
>> >
>> > Before iommu shutdown, device_shutdown() is called to shutdown all devices.
>> > Not clear why vhost is still active.
>>
>> It might be because neither virtio bus nor virtio-net provides a
>> shutdown method.
Oh, I see.
>>
>> There used to be requests to provide those to unbreak the kexec.
>
>More could be seen at https://issues.redhat.com/browse/RHEL-331
>
>This looks exactly the same issue.
Have not access😊
>
>Thanks
>
>>
>> A quick try might be to provide a .driver.shutdown to
>> virtio_net_driver structure and reset the device there as a start.
Make sense.
Thanks
Zhenzhong
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  4:00                   ` Duan, Zhenzhong
@ 2025-01-24  9:20                     ` Jason Wang
  2025-01-24  9:50                       ` Duan, Zhenzhong
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Wang @ 2025-01-24  9:20 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com, Peter Xu
On Fri, Jan 24, 2025 at 12:01 PM Duan, Zhenzhong
<zhenzhong.duan@intel.com> wrote:
>
>
>
> >-----Original Message-----
> >From: Jason Wang <jasowang@redhat.com>
> >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
> >disabled
> >
> >On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
> >> <zhenzhong.duan@intel.com> wrote:
> >> >
> >> >
> >> >
> >> > >-----Original Message-----
> >> > >From: Eric Auger <eric.auger@redhat.com>
> >> > >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU
> >gets
> >> > >disabled
> >> > >
> >> > >Hi Jason,
> >> > >
> >> > >
> >> > >On 1/23/25 2:34 AM, Jason Wang wrote:
> >> > >> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com>
> >wrote:
> >> > >>> Hi Jason,
> >> > >>>
> >> > >>>
> >> > >>> On 1/22/25 8:17 AM, Jason Wang wrote:
> >> > >>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com>
> >> > >wrote:
> >> > >>>>> Hi Jason,
> >> > >>>>>
> >> > >>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
> >> > >>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
> >> > >wrote:
> >> > >>>>>>> When a guest exposed with a vhost device and protected by an
> >> > >>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious
> >warning:
> >> > >>>>>>>
> >> > >>>>>>> Fail to lookup the translated address ffffe000
> >> > >>>>>>>
> >> > >>>>>>> We observe that the IOMMU gets disabled through a write to the
> >global
> >> > >>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets
> >> > >stopped.
> >> > >>>>>>> When this warning happens it can be observed an inflight IOTLB
> >> > >>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
> >> > >>>>>>> that case a flat translation occurs and the check in
> >> > >>>>>>> vhost_memory_region_lookup() fails.
> >> > >>>>>>>
> >> > >>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> >> > >>>>>>> unregistered.
> >> > >>>>>>>
> >> > >>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> > >>>>>>> ---
> >> > >>>>>>>  hw/virtio/vhost.c | 4 ++++
> >> > >>>>>>>  1 file changed, 4 insertions(+)
> >> > >>>>>>>
> >> > >>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> > >>>>>>> index 6aa72fd434..128c2ab094 100644
> >> > >>>>>>> --- a/hw/virtio/vhost.c
> >> > >>>>>>> +++ b/hw/virtio/vhost.c
> >> > >>>>>>> @@ -931,6 +931,10 @@ static void
> >> > >vhost_iommu_region_del(MemoryListener *listener,
> >> > >>>>>>>              break;
> >> > >>>>>>>          }
> >> > >>>>>>>      }
> >> > >>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
> >> > >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
> >> > >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> >> > >>>>>>> +    }
> >> > >>>>>> So the current code assumes:
> >> > >>>>>>
> >> > >>>>>> 1) IOMMU is enabled before vhost starts
> >> > >>>>>> 2) IOMMU is disabled after vhost stops
> >> > >>>>>>
> >> > >>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
> >> > >>>>>> IOMMU enabled after vhost starts?
> >> > >>>>> sorry I initially misunderstood the above comment. Indeed in the
> >reboot
> >> > >>>>> case assumption 2) happens to be wrong. However what I currently do
> >is:
> >> > >>>>> stop listening to iotlb miss requests from the kernel because my
> >> > >>>>> understanding is those requests are just spurious ones, generate
> >> > >>>>> warnings and we do not care since we are rebooting the system.
> >> > >>>>>
> >> > >>>>> However I do not claim this could handle the case where the IOMMU
> >MR
> >> > >>>>> would be turned off and then turned on. I think in that case we should
> >> > >>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
> >> > >>>>> Is it a relevant use case?
> >> > >>>> Not sure.
> >> > >>>>
> >> > >>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost
> >start. Is
> >> > >>>>> that a valid use case as the virtio driver is using the dma api?
> >> > >>>> It should not be but we can't assume the behaviour of the guest. It
> >> > >>>> could be buggy or even malicious.
> >> > >>> agreed
> >> > >>>> Btw, we had the following codes while handling te:
> >> > >>>>
> >> > >>>> /* Handle Translation Enable/Disable */
> >> > >>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
> >> > >>>> {
> >> > >>>>     if (s->dmar_enabled == en) {
> >> > >>>>         return;
> >> > >>>>     }
> >> > >>>>
> >> > >>>>     trace_vtd_dmar_enable(en);
> >> > >>>>
> >> > >>>> ...
> >> > >>>>
> >> > >>>>     vtd_reset_caches(s);
> >> > >>>>     vtd_address_space_refresh_all(s);
> >> > >>>> }
> >> > >>>>
> >> > >>>> vtd_address_space_refresh_all() will basically disable the iommu
> >> > >>>> memory region. It looks not sufficient to trigger the region_del
> >> > >>>> callback, maybe we should delete the region or introduce listener
> >> > >>>> callback?
> >> > >>> This is exactly the code path which is entered in my use case.
> >> > >>>
> >> > >>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del.
> >But
> >> > >given the current implement of this latter the IOTLB callback is not unset and
> >the
> >> > >kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions
> >are
> >> > >not updated? shouldn't they. Can you explain what they correspond to?
> >> > >> Adding Peter for more ideas.
> >> > >>
> >> > >> I think it's better to find a way to trigger the listener here, probably:
> >> > >>
> >> > >> 1) add/delete the memory regions instead of enable/disable
> >> > >sorry I don't understand what you mean. The vhost_iommu_region_del call
> >> > >stack is provided below [1]. Write to the intel iommu GCMD TE bit
> >> > >induces a call to vhost_iommu_region_del. This happens before the
> >> > >vhost_dev_stop whose call stack is provided below [2] and originates
> >> > >from a bus reset.
> >> > >
> >> > >We may have inflight IOTLB miss requests happening between both.
> >> > >
> >> > >If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
> >> > >translated anymore by the IOMMU and the iotlb.translated_addr returned
> >> > >by address_space_get_iotlb_entry() is not within the registered
> >> > >vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
> >> > >the "Fail to lookup the translated address" message.
> >> > >
> >> > >It sounds wrong that vhost keeps on using IOVAs that are not translated
> >> > >anymore. It looks we have a reset ordering issue and this patch is just
> >> > >removing the sympton and not the cause.
> >> > >
> >> > >At the moment I don't really get what is initiating the intel iommu TE
> >> > >bit write. This is a guest action but is it initiated from a misordered
> >> > >qemu event?
> >> >
> >> > During reboot, native_machine_shutdown() calls
> >x86_platform.iommu_shutdown()
> >> > to disable iommu before reset. So Peter's patch will not address your issue.
> >> >
> >> > Before iommu shutdown, device_shutdown() is called to shutdown all devices.
> >> > Not clear why vhost is still active.
> >>
> >> It might be because neither virtio bus nor virtio-net provides a
> >> shutdown method.
>
> Oh, I see.
>
> >>
> >> There used to be requests to provide those to unbreak the kexec.
> >
> >More could be seen at https://issues.redhat.com/browse/RHEL-331
> >
> >This looks exactly the same issue.
>
> Have not access😊
It should work now.
Thanks
>
> >
> >Thanks
> >
> >>
> >> A quick try might be to provide a .driver.shutdown to
> >> virtio_net_driver structure and reset the device there as a start.
>
> Make sense.
>
> Thanks
> Zhenzhong
^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  9:20                     ` Jason Wang
@ 2025-01-24  9:50                       ` Duan, Zhenzhong
  0 siblings, 0 replies; 34+ messages in thread
From: Duan, Zhenzhong @ 2025-01-24  9:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com, Peter Xu
>-----Original Message-----
>From: Jason Wang <jasowang@redhat.com>
>Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
>
>On Fri, Jan 24, 2025 at 12:01 PM Duan, Zhenzhong
><zhenzhong.duan@intel.com> wrote:
>>
>>
>>
>> >-----Original Message-----
>> >From: Jason Wang <jasowang@redhat.com>
>> >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU
>gets
>> >disabled
>> >
>> >On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote:
>> >>
>> >> On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
>> >> <zhenzhong.duan@intel.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > >-----Original Message-----
>> >> > >From: Eric Auger <eric.auger@redhat.com>
>> >> > >Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when
>IOMMU
>> >gets
>> >> > >disabled
>> >> > >
>> >> > >Hi Jason,
>> >> > >
>> >> > >
>> >> > >On 1/23/25 2:34 AM, Jason Wang wrote:
>> >> > >> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com>
>> >wrote:
>> >> > >>> Hi Jason,
>> >> > >>>
>> >> > >>>
>> >> > >>> On 1/22/25 8:17 AM, Jason Wang wrote:
>> >> > >>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger
><eric.auger@redhat.com>
>> >> > >wrote:
>> >> > >>>>> Hi Jason,
>> >> > >>>>>
>> >> > >>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
>> >> > >>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger
><eric.auger@redhat.com>
>> >> > >wrote:
>> >> > >>>>>>> When a guest exposed with a vhost device and protected by an
>> >> > >>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious
>> >warning:
>> >> > >>>>>>>
>> >> > >>>>>>> Fail to lookup the translated address ffffe000
>> >> > >>>>>>>
>> >> > >>>>>>> We observe that the IOMMU gets disabled through a write to the
>> >global
>> >> > >>>>>>> command register (CMAR_GCMD.TE) before the vhost device
>gets
>> >> > >stopped.
>> >> > >>>>>>> When this warning happens it can be observed an inflight IOTLB
>> >> > >>>>>>> miss occurs after the IOMMU disable and before the vhost stop.
>In
>> >> > >>>>>>> that case a flat translation occurs and the check in
>> >> > >>>>>>> vhost_memory_region_lookup() fails.
>> >> > >>>>>>>
>> >> > >>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>> >> > >>>>>>> unregistered.
>> >> > >>>>>>>
>> >> > >>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> >> > >>>>>>> ---
>> >> > >>>>>>>  hw/virtio/vhost.c | 4 ++++
>> >> > >>>>>>>  1 file changed, 4 insertions(+)
>> >> > >>>>>>>
>> >> > >>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> >> > >>>>>>> index 6aa72fd434..128c2ab094 100644
>> >> > >>>>>>> --- a/hw/virtio/vhost.c
>> >> > >>>>>>> +++ b/hw/virtio/vhost.c
>> >> > >>>>>>> @@ -931,6 +931,10 @@ static void
>> >> > >vhost_iommu_region_del(MemoryListener *listener,
>> >> > >>>>>>>              break;
>> >> > >>>>>>>          }
>> >> > >>>>>>>      }
>> >> > >>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>> >> > >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>> >> > >>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>> >> > >>>>>>> +    }
>> >> > >>>>>> So the current code assumes:
>> >> > >>>>>>
>> >> > >>>>>> 1) IOMMU is enabled before vhost starts
>> >> > >>>>>> 2) IOMMU is disabled after vhost stops
>> >> > >>>>>>
>> >> > >>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>> >> > >>>>>> IOMMU enabled after vhost starts?
>> >> > >>>>> sorry I initially misunderstood the above comment. Indeed in the
>> >reboot
>> >> > >>>>> case assumption 2) happens to be wrong. However what I currently
>do
>> >is:
>> >> > >>>>> stop listening to iotlb miss requests from the kernel because my
>> >> > >>>>> understanding is those requests are just spurious ones, generate
>> >> > >>>>> warnings and we do not care since we are rebooting the system.
>> >> > >>>>>
>> >> > >>>>> However I do not claim this could handle the case where the
>IOMMU
>> >MR
>> >> > >>>>> would be turned off and then turned on. I think in that case we
>should
>> >> > >>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
>> >> > >>>>> Is it a relevant use case?
>> >> > >>>> Not sure.
>> >> > >>>>
>> >> > >>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost
>> >start. Is
>> >> > >>>>> that a valid use case as the virtio driver is using the dma api?
>> >> > >>>> It should not be but we can't assume the behaviour of the guest. It
>> >> > >>>> could be buggy or even malicious.
>> >> > >>> agreed
>> >> > >>>> Btw, we had the following codes while handling te:
>> >> > >>>>
>> >> > >>>> /* Handle Translation Enable/Disable */
>> >> > >>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>> >> > >>>> {
>> >> > >>>>     if (s->dmar_enabled == en) {
>> >> > >>>>         return;
>> >> > >>>>     }
>> >> > >>>>
>> >> > >>>>     trace_vtd_dmar_enable(en);
>> >> > >>>>
>> >> > >>>> ...
>> >> > >>>>
>> >> > >>>>     vtd_reset_caches(s);
>> >> > >>>>     vtd_address_space_refresh_all(s);
>> >> > >>>> }
>> >> > >>>>
>> >> > >>>> vtd_address_space_refresh_all() will basically disable the iommu
>> >> > >>>> memory region. It looks not sufficient to trigger the region_del
>> >> > >>>> callback, maybe we should delete the region or introduce listener
>> >> > >>>> callback?
>> >> > >>> This is exactly the code path which is entered in my use case.
>> >> > >>>
>> >> > >>> vtd_address_space_refresh_all(s) induces the
>vhost_iommu_region_del.
>> >But
>> >> > >given the current implement of this latter the IOTLB callback is not unset
>and
>> >the
>> >> > >kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem-
>>regions
>> >are
>> >> > >not updated? shouldn't they. Can you explain what they correspond to?
>> >> > >> Adding Peter for more ideas.
>> >> > >>
>> >> > >> I think it's better to find a way to trigger the listener here, probably:
>> >> > >>
>> >> > >> 1) add/delete the memory regions instead of enable/disable
>> >> > >sorry I don't understand what you mean. The vhost_iommu_region_del
>call
>> >> > >stack is provided below [1]. Write to the intel iommu GCMD TE bit
>> >> > >induces a call to vhost_iommu_region_del. This happens before the
>> >> > >vhost_dev_stop whose call stack is provided below [2] and originates
>> >> > >from a bus reset.
>> >> > >
>> >> > >We may have inflight IOTLB miss requests happening between both.
>> >> > >
>> >> > >If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
>> >> > >translated anymore by the IOMMU and the iotlb.translated_addr returned
>> >> > >by address_space_get_iotlb_entry() is not within the registered
>> >> > >vhost_memory_regions looked up in vhost_memory_region_lookup(),
>hence
>> >> > >the "Fail to lookup the translated address" message.
>> >> > >
>> >> > >It sounds wrong that vhost keeps on using IOVAs that are not translated
>> >> > >anymore. It looks we have a reset ordering issue and this patch is just
>> >> > >removing the sympton and not the cause.
>> >> > >
>> >> > >At the moment I don't really get what is initiating the intel iommu TE
>> >> > >bit write. This is a guest action but is it initiated from a misordered
>> >> > >qemu event?
>> >> >
>> >> > During reboot, native_machine_shutdown() calls
>> >x86_platform.iommu_shutdown()
>> >> > to disable iommu before reset. So Peter's patch will not address your issue.
>> >> >
>> >> > Before iommu shutdown, device_shutdown() is called to shutdown all
>devices.
>> >> > Not clear why vhost is still active.
>> >>
>> >> It might be because neither virtio bus nor virtio-net provides a
>> >> shutdown method.
>>
>> Oh, I see.
>>
>> >>
>> >> There used to be requests to provide those to unbreak the kexec.
>> >
>> >More could be seen at https://issues.redhat.com/browse/RHEL-331
>> >
>> >This looks exactly the same issue.
>>
>> Have not access😊
>
>It should work now.
Yes, exactly same issue.
Thanks
Zhenzhong
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  3:30               ` Jason Wang
  2025-01-24  3:41                 ` Jason Wang
@ 2025-01-24 15:18                 ` Peter Xu
  2025-01-26  7:56                   ` Jason Wang
  1 sibling, 1 reply; 34+ messages in thread
From: Peter Xu @ 2025-01-24 15:18 UTC (permalink / raw)
  To: Jason Wang, Eric Auger
  Cc: Duan, Zhenzhong, eric.auger@redhat.com, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com
Hello, Jason, Eric,
On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote:
> It might be because neither virtio bus nor virtio-net provides a
> shutdown method.
> 
> There used to be requests to provide those to unbreak the kexec.
> 
> A quick try might be to provide a .driver.shutdown to
> virtio_net_driver structure and reset the device there as a start.
I didn't check virtio driver path, but if that's missing it's reasonable to
support it indeed.
OTOH, even with that, vhost can still hit such DMA issue if it's a
hard-reset, am I right?  IOW, when using QMP command "system-reset".  If my
memory is correct, that's the problem I was working on the VFIO series,
rather than a clean reboot.  And that won't give guest driver chance to run
anything, IIUC.
I am wildly suspecting a VT-d write to GCMD to disable it can also appear
if there's a hard reset, then when bootloading the VM the bios (or whatever
firmware at early stage) may want to make sure the VT-d device is
completely off by writting to GCMD. But that's a pure guess.. and that may
or may not matter much on how we fix this problem.
IOW, I suspect we need to fix both of them,
  (a) for soft-reset, by making sure drivers properly quiesce DMAs
  proactively when VM gracefully shuts down.
  (b) for hard-reset, by making sure QEMU reset in proper order.
One thing to mention is for problem (b) VFIO used to have an extra
challenge on !FLR devices, I discussed it in patch 4's comment there.
Quotting from patch 4 of series:
https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com
     * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs
     *     (reference: resettable_cold_reset_fn())
     *
     *     Currently, vIOMMU devices are created as normal '-device'
     *     cmdlines.  It means in many ways it has the same attributes with
     *     most of the rest devices, even if the rest devices should
     *     logically be under control of the vIOMMU unit.
     *
     *     One side effect of it is vIOMMU devices will be currently put
     *     randomly under qdev tree hierarchy, which is the source of
     *     device reset ordering in current QEMU (depth-first traversal).
     *     It means vIOMMU now can be reset before some devices.  For fully
     *     emulated devices that's not a problem, because the traversal
     *     holds BQL for the whole process.  However it is a problem if DMA
     *     can happen without BQL, like VFIO, vDPA or remote device process.
     *
     *     TODO: one ideal solution can be that we make vIOMMU the parent
     *     of the whole pci host bridge.  Hence vIOMMU can be reset after
     *     all the devices are reset and quiesced.
     *
     * (2) Some devices register its own reset functions
     *
     *     Even if above issue solved, if devices register its own reset
     *     functions for some reason via QEMU reset hooks, vIOMMU can still
     *     be reset before the device. One example is vfio_reset_handler()
     *     where FLR is not supported on the device.
     *
     *     TODO: merge relevant reset functions into the device tree reset
     *     framework.
So maybe vhost doesn't have problem (2) listed above, and maybe it means
it's still worthwhile thinking more about problem (1), which is to change
the QOM tree to provide a correct topology representation when vIOMMU is
present: so far it should be still a pretty much orphaned object there.. if
QEMU relies on QOM tree topology for reset order, we may need to move it to
the right place sooner or later.
Thanks,
-- 
Peter Xu
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  2:44             ` Duan, Zhenzhong
  2025-01-24  3:30               ` Jason Wang
@ 2025-01-24 17:47               ` Eric Auger
  2025-01-26  7:09                 ` Duan, Zhenzhong
  2025-01-31  9:55               ` Eric Auger
  2 siblings, 1 reply; 34+ messages in thread
From: Eric Auger @ 2025-01-24 17:47 UTC (permalink / raw)
  To: Duan, Zhenzhong, Jason Wang
  Cc: eric.auger.pro@gmail.com, qemu-devel@nongnu.org, mst@redhat.com,
	sgarzare@redhat.com, lvivier@redhat.com, Peter Xu
Hi Zhenzhong,
On 1/24/25 3:44 AM, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>> disabled
>>
>> Hi Jason,
>>
>>
>> On 1/23/25 2:34 AM, Jason Wang wrote:
>>> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
>>>> Hi Jason,
>>>>
>>>>
>>>> On 1/22/25 8:17 AM, Jason Wang wrote:
>>>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com>
>> wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
>>>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
>> wrote:
>>>>>>>> When a guest exposed with a vhost device and protected by an
>>>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>>>>>>
>>>>>>>> Fail to lookup the translated address ffffe000
>>>>>>>>
>>>>>>>> We observe that the IOMMU gets disabled through a write to the global
>>>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets
>> stopped.
>>>>>>>> When this warning happens it can be observed an inflight IOTLB
>>>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
>>>>>>>> that case a flat translation occurs and the check in
>>>>>>>> vhost_memory_region_lookup() fails.
>>>>>>>>
>>>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>>>>>>> unregistered.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>> ---
>>>>>>>>  hw/virtio/vhost.c | 4 ++++
>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>>>> index 6aa72fd434..128c2ab094 100644
>>>>>>>> --- a/hw/virtio/vhost.c
>>>>>>>> +++ b/hw/virtio/vhost.c
>>>>>>>> @@ -931,6 +931,10 @@ static void
>> vhost_iommu_region_del(MemoryListener *listener,
>>>>>>>>              break;
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>>>>>>> +    }
>>>>>>> So the current code assumes:
>>>>>>>
>>>>>>> 1) IOMMU is enabled before vhost starts
>>>>>>> 2) IOMMU is disabled after vhost stops
>>>>>>>
>>>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>>>>>>> IOMMU enabled after vhost starts?
>>>>>> sorry I initially misunderstood the above comment. Indeed in the reboot
>>>>>> case assumption 2) happens to be wrong. However what I currently do is:
>>>>>> stop listening to iotlb miss requests from the kernel because my
>>>>>> understanding is those requests are just spurious ones, generate
>>>>>> warnings and we do not care since we are rebooting the system.
>>>>>>
>>>>>> However I do not claim this could handle the case where the IOMMU MR
>>>>>> would be turned off and then turned on. I think in that case we should
>>>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
>>>>>> Is it a relevant use case?
>>>>> Not sure.
>>>>>
>>>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
>>>>>> that a valid use case as the virtio driver is using the dma api?
>>>>> It should not be but we can't assume the behaviour of the guest. It
>>>>> could be buggy or even malicious.
>>>> agreed
>>>>> Btw, we had the following codes while handling te:
>>>>>
>>>>> /* Handle Translation Enable/Disable */
>>>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>>>>> {
>>>>>     if (s->dmar_enabled == en) {
>>>>>         return;
>>>>>     }
>>>>>
>>>>>     trace_vtd_dmar_enable(en);
>>>>>
>>>>> ...
>>>>>
>>>>>     vtd_reset_caches(s);
>>>>>     vtd_address_space_refresh_all(s);
>>>>> }
>>>>>
>>>>> vtd_address_space_refresh_all() will basically disable the iommu
>>>>> memory region. It looks not sufficient to trigger the region_del
>>>>> callback, maybe we should delete the region or introduce listener
>>>>> callback?
>>>> This is exactly the code path which is entered in my use case.
>>>>
>>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
>> given the current implement of this latter the IOTLB callback is not unset and the
>> kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are
>> not updated? shouldn't they. Can you explain what they correspond to?
>>> Adding Peter for more ideas.
>>>
>>> I think it's better to find a way to trigger the listener here, probably:
>>>
>>> 1) add/delete the memory regions instead of enable/disable
>> sorry I don't understand what you mean. The vhost_iommu_region_del call
>> stack is provided below [1]. Write to the intel iommu GCMD TE bit
>> induces a call to vhost_iommu_region_del. This happens before the
>> vhost_dev_stop whose call stack is provided below [2] and originates
> >from a bus reset.
>> We may have inflight IOTLB miss requests happening between both.
>>
>> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
>> translated anymore by the IOMMU and the iotlb.translated_addr returned
>> by address_space_get_iotlb_entry() is not within the registered
>> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
>> the "Fail to lookup the translated address" message.
>>
>> It sounds wrong that vhost keeps on using IOVAs that are not translated
>> anymore. It looks we have a reset ordering issue and this patch is just
>> removing the sympton and not the cause.
>>
>> At the moment I don't really get what is initiating the intel iommu TE
>> bit write. This is a guest action but is it initiated from a misordered
>> qemu event?
> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
> to disable iommu before reset. So Peter's patch will not address your issue.
Exactly I see the initial IOMMU disable comes from this guest code path
(native_machine_shutdown). Indeed this is unrelated to Peter's series.
Then I see that vIOMMU is reset and then vhost_dev_stop is called (this
is related to Peter's series).
>
> Before iommu shutdown, device_shutdown() is called to shutdown all devices.
> Not clear why vhost is still active.
That's the piece I was missing next.
Thanks!
Eric
>
> Thanks
> Zhenzhong
>
>> It could also relate to
>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+
>> the+rest+of+devices
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  3:41                 ` Jason Wang
  2025-01-24  4:00                   ` Duan, Zhenzhong
@ 2025-01-24 17:56                   ` Eric Auger
  1 sibling, 0 replies; 34+ messages in thread
From: Eric Auger @ 2025-01-24 17:56 UTC (permalink / raw)
  To: Jason Wang, Duan, Zhenzhong, Eugenio Pérez
  Cc: eric.auger.pro@gmail.com, qemu-devel@nongnu.org, mst@redhat.com,
	sgarzare@redhat.com, lvivier@redhat.com, Peter Xu
Hi,
On 1/24/25 4:41 AM, Jason Wang wrote:
> On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
>> <zhenzhong.duan@intel.com> wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>> Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>>>> disabled
>>>>
>>>> Hi Jason,
>>>>
>>>>
>>>> On 1/23/25 2:34 AM, Jason Wang wrote:
>>>>> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>>
>>>>>> On 1/22/25 8:17 AM, Jason Wang wrote:
>>>>>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com>
>>>> wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
>>>>>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
>>>> wrote:
>>>>>>>>>> When a guest exposed with a vhost device and protected by an
>>>>>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>>>>>>>>
>>>>>>>>>> Fail to lookup the translated address ffffe000
>>>>>>>>>>
>>>>>>>>>> We observe that the IOMMU gets disabled through a write to the global
>>>>>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets
>>>> stopped.
>>>>>>>>>> When this warning happens it can be observed an inflight IOTLB
>>>>>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
>>>>>>>>>> that case a flat translation occurs and the check in
>>>>>>>>>> vhost_memory_region_lookup() fails.
>>>>>>>>>>
>>>>>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>>>>>>>>> unregistered.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>>  hw/virtio/vhost.c | 4 ++++
>>>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>>>>>> index 6aa72fd434..128c2ab094 100644
>>>>>>>>>> --- a/hw/virtio/vhost.c
>>>>>>>>>> +++ b/hw/virtio/vhost.c
>>>>>>>>>> @@ -931,6 +931,10 @@ static void
>>>> vhost_iommu_region_del(MemoryListener *listener,
>>>>>>>>>>              break;
>>>>>>>>>>          }
>>>>>>>>>>      }
>>>>>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>>>>>>>>> +    }
>>>>>>>>> So the current code assumes:
>>>>>>>>>
>>>>>>>>> 1) IOMMU is enabled before vhost starts
>>>>>>>>> 2) IOMMU is disabled after vhost stops
>>>>>>>>>
>>>>>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>>>>>>>>> IOMMU enabled after vhost starts?
>>>>>>>> sorry I initially misunderstood the above comment. Indeed in the reboot
>>>>>>>> case assumption 2) happens to be wrong. However what I currently do is:
>>>>>>>> stop listening to iotlb miss requests from the kernel because my
>>>>>>>> understanding is those requests are just spurious ones, generate
>>>>>>>> warnings and we do not care since we are rebooting the system.
>>>>>>>>
>>>>>>>> However I do not claim this could handle the case where the IOMMU MR
>>>>>>>> would be turned off and then turned on. I think in that case we should
>>>>>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
>>>>>>>> Is it a relevant use case?
>>>>>>> Not sure.
>>>>>>>
>>>>>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
>>>>>>>> that a valid use case as the virtio driver is using the dma api?
>>>>>>> It should not be but we can't assume the behaviour of the guest. It
>>>>>>> could be buggy or even malicious.
>>>>>> agreed
>>>>>>> Btw, we had the following codes while handling te:
>>>>>>>
>>>>>>> /* Handle Translation Enable/Disable */
>>>>>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>>>>>>> {
>>>>>>>     if (s->dmar_enabled == en) {
>>>>>>>         return;
>>>>>>>     }
>>>>>>>
>>>>>>>     trace_vtd_dmar_enable(en);
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>     vtd_reset_caches(s);
>>>>>>>     vtd_address_space_refresh_all(s);
>>>>>>> }
>>>>>>>
>>>>>>> vtd_address_space_refresh_all() will basically disable the iommu
>>>>>>> memory region. It looks not sufficient to trigger the region_del
>>>>>>> callback, maybe we should delete the region or introduce listener
>>>>>>> callback?
>>>>>> This is exactly the code path which is entered in my use case.
>>>>>>
>>>>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
>>>> given the current implement of this latter the IOTLB callback is not unset and the
>>>> kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are
>>>> not updated? shouldn't they. Can you explain what they correspond to?
>>>>> Adding Peter for more ideas.
>>>>>
>>>>> I think it's better to find a way to trigger the listener here, probably:
>>>>>
>>>>> 1) add/delete the memory regions instead of enable/disable
>>>> sorry I don't understand what you mean. The vhost_iommu_region_del call
>>>> stack is provided below [1]. Write to the intel iommu GCMD TE bit
>>>> induces a call to vhost_iommu_region_del. This happens before the
>>>> vhost_dev_stop whose call stack is provided below [2] and originates
>>> >from a bus reset.
>>>> We may have inflight IOTLB miss requests happening between both.
>>>>
>>>> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
>>>> translated anymore by the IOMMU and the iotlb.translated_addr returned
>>>> by address_space_get_iotlb_entry() is not within the registered
>>>> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
>>>> the "Fail to lookup the translated address" message.
>>>>
>>>> It sounds wrong that vhost keeps on using IOVAs that are not translated
>>>> anymore. It looks we have a reset ordering issue and this patch is just
>>>> removing the sympton and not the cause.
>>>>
>>>> At the moment I don't really get what is initiating the intel iommu TE
>>>> bit write. This is a guest action but is it initiated from a misordered
>>>> qemu event?
>>> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
>>> to disable iommu before reset. So Peter's patch will not address your issue.
>>>
>>> Before iommu shutdown, device_shutdown() is called to shutdown all devices.
>>> Not clear why vhost is still active.
>> It might be because neither virtio bus nor virtio-net provides a
>> shutdown method.
>>
>> There used to be requests to provide those to unbreak the kexec.
> More could be seen at https://issues.redhat.com/browse/RHEL-331
Cool, we have a culprit now :-) I see the ticket is closed, I will
reopen it. Are there known implementation challenges that caused the fix
postponing or do we "just" need someone with free cycles to carry out
the task. By the way FYI, we have other tickets opened related to vSMMU
and virtio devices also happening during reboot.
Thanks
Eric
> This looks exactly the same issue.
>
> Thanks
>
>> A quick try might be to provide a .driver.shutdown to
>> virtio_net_driver structure and reset the device there as a start.
>>
>> Thanks
>>
>>> Thanks
>>> Zhenzhong
>>>
>>>> It could also relate to
>>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>>> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+
>>>> the+rest+of+devices
^ permalink raw reply	[flat|nested] 34+ messages in thread
* RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24 17:47               ` Eric Auger
@ 2025-01-26  7:09                 ` Duan, Zhenzhong
  0 siblings, 0 replies; 34+ messages in thread
From: Duan, Zhenzhong @ 2025-01-26  7:09 UTC (permalink / raw)
  To: eric.auger@redhat.com, Jason Wang
  Cc: eric.auger.pro@gmail.com, qemu-devel@nongnu.org, mst@redhat.com,
	sgarzare@redhat.com, lvivier@redhat.com, Peter Xu
Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>disabled
...
>>>>>> vtd_address_space_refresh_all() will basically disable the iommu
>>>>>> memory region. It looks not sufficient to trigger the region_del
>>>>>> callback, maybe we should delete the region or introduce listener
>>>>>> callback?
>>>>> This is exactly the code path which is entered in my use case.
>>>>>
>>>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
>>> given the current implement of this latter the IOTLB callback is not unset and
>the
>>> kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions
>are
>>> not updated? shouldn't they. Can you explain what they correspond to?
>>>> Adding Peter for more ideas.
>>>>
>>>> I think it's better to find a way to trigger the listener here, probably:
>>>>
>>>> 1) add/delete the memory regions instead of enable/disable
>>> sorry I don't understand what you mean. The vhost_iommu_region_del call
>>> stack is provided below [1]. Write to the intel iommu GCMD TE bit
>>> induces a call to vhost_iommu_region_del. This happens before the
>>> vhost_dev_stop whose call stack is provided below [2] and originates
>> >from a bus reset.
>>> We may have inflight IOTLB miss requests happening between both.
>>>
>>> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
>>> translated anymore by the IOMMU and the iotlb.translated_addr returned
>>> by address_space_get_iotlb_entry() is not within the registered
>>> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
>>> the "Fail to lookup the translated address" message.
>>>
>>> It sounds wrong that vhost keeps on using IOVAs that are not translated
>>> anymore. It looks we have a reset ordering issue and this patch is just
>>> removing the sympton and not the cause.
>>>
>>> At the moment I don't really get what is initiating the intel iommu TE
>>> bit write. This is a guest action but is it initiated from a misordered
>>> qemu event?
>> During reboot, native_machine_shutdown() calls
>x86_platform.iommu_shutdown()
>> to disable iommu before reset. So Peter's patch will not address your issue.
>
>Exactly I see the initial IOMMU disable comes from this guest code path
>(native_machine_shutdown). Indeed this is unrelated to Peter's series.
>Then I see that vIOMMU is reset and then vhost_dev_stop is called (this
>is related to Peter's series).
Agree.
Thanks
Zhenzhong
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24 15:18                 ` Peter Xu
@ 2025-01-26  7:56                   ` Jason Wang
  2025-01-27  0:44                     ` Jason Wang
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Wang @ 2025-01-26  7:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eric Auger, Duan, Zhenzhong, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com
On Sat, Jan 25, 2025 at 12:42 AM Peter Xu <peterx@redhat.com> wrote:
>
> Hello, Jason, Eric,
>
> On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote:
> > It might be because neither virtio bus nor virtio-net provides a
> > shutdown method.
> >
> > There used to be requests to provide those to unbreak the kexec.
> >
> > A quick try might be to provide a .driver.shutdown to
> > virtio_net_driver structure and reset the device there as a start.
>
> I didn't check virtio driver path, but if that's missing it's reasonable to
> support it indeed.
>
> OTOH, even with that, vhost can still hit such DMA issue if it's a
> hard-reset, am I right?  IOW, when using QMP command "system-reset".  If my
> memory is correct, that's the problem I was working on the VFIO series,
> rather than a clean reboot.  And that won't give guest driver chance to run
> anything, IIUC.
Yes.
>
> I am wildly suspecting a VT-d write to GCMD to disable it can also appear
> if there's a hard reset, then when bootloading the VM the bios (or whatever
> firmware at early stage) may want to make sure the VT-d device is
> completely off by writting to GCMD. But that's a pure guess.. and that may
> or may not matter much on how we fix this problem.
>
> IOW, I suspect we need to fix both of them,
>
>   (a) for soft-reset, by making sure drivers properly quiesce DMAs
>   proactively when VM gracefully shuts down.
>
>   (b) for hard-reset, by making sure QEMU reset in proper order.
>
> One thing to mention is for problem (b) VFIO used to have an extra
> challenge on !FLR devices, I discussed it in patch 4's comment there.
> Quotting from patch 4 of series:
>
> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com
>
>      * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs
>      *     (reference: resettable_cold_reset_fn())
>      *
>      *     Currently, vIOMMU devices are created as normal '-device'
>      *     cmdlines.  It means in many ways it has the same attributes with
>      *     most of the rest devices, even if the rest devices should
>      *     logically be under control of the vIOMMU unit.
>      *
>      *     One side effect of it is vIOMMU devices will be currently put
>      *     randomly under qdev tree hierarchy, which is the source of
>      *     device reset ordering in current QEMU (depth-first traversal).
>      *     It means vIOMMU now can be reset before some devices.  For fully
>      *     emulated devices that's not a problem, because the traversal
>      *     holds BQL for the whole process.  However it is a problem if DMA
>      *     can happen without BQL, like VFIO, vDPA or remote device process.
>      *
>      *     TODO: one ideal solution can be that we make vIOMMU the parent
>      *     of the whole pci host bridge.  Hence vIOMMU can be reset after
>      *     all the devices are reset and quiesced.
>      *
>      * (2) Some devices register its own reset functions
>      *
>      *     Even if above issue solved, if devices register its own reset
>      *     functions for some reason via QEMU reset hooks, vIOMMU can still
>      *     be reset before the device. One example is vfio_reset_handler()
>      *     where FLR is not supported on the device.
>      *
>      *     TODO: merge relevant reset functions into the device tree reset
>      *     framework.
>
> So maybe vhost doesn't have problem (2) listed above, and maybe it means
> it's still worthwhile thinking more about problem (1), which is to change
> the QOM tree to provide a correct topology representation when vIOMMU is
> present: so far it should be still a pretty much orphaned object there.. if
> QEMU relies on QOM tree topology for reset order, we may need to move it to
> the right place sooner or later.
Sounds like a non-trivial task, so for a hard reset, maybe we can
proceed with Eric's proposal to deal with the reset before the device
stops.
Thanks
>
> Thanks,
>
> --
> Peter Xu
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-26  7:56                   ` Jason Wang
@ 2025-01-27  0:44                     ` Jason Wang
  2025-01-30 17:35                       ` Peter Xu
  0 siblings, 1 reply; 34+ messages in thread
From: Jason Wang @ 2025-01-27  0:44 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eric Auger, Duan, Zhenzhong, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com
On Sun, Jan 26, 2025 at 3:56 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Sat, Jan 25, 2025 at 12:42 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > Hello, Jason, Eric,
> >
> > On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote:
> > > It might be because neither virtio bus nor virtio-net provides a
> > > shutdown method.
> > >
> > > There used to be requests to provide those to unbreak the kexec.
> > >
> > > A quick try might be to provide a .driver.shutdown to
> > > virtio_net_driver structure and reset the device there as a start.
> >
> > I didn't check virtio driver path, but if that's missing it's reasonable to
> > support it indeed.
> >
> > OTOH, even with that, vhost can still hit such DMA issue if it's a
> > hard-reset, am I right?  IOW, when using QMP command "system-reset".  If my
> > memory is correct, that's the problem I was working on the VFIO series,
> > rather than a clean reboot.  And that won't give guest driver chance to run
> > anything, IIUC.
>
> Yes.
>
> >
> > I am wildly suspecting a VT-d write to GCMD to disable it can also appear
> > if there's a hard reset, then when bootloading the VM the bios (or whatever
> > firmware at early stage) may want to make sure the VT-d device is
> > completely off by writting to GCMD. But that's a pure guess.. and that may
> > or may not matter much on how we fix this problem.
> >
> > IOW, I suspect we need to fix both of them,
> >
> >   (a) for soft-reset, by making sure drivers properly quiesce DMAs
> >   proactively when VM gracefully shuts down.
> >
> >   (b) for hard-reset, by making sure QEMU reset in proper order.
> >
> > One thing to mention is for problem (b) VFIO used to have an extra
> > challenge on !FLR devices, I discussed it in patch 4's comment there.
> > Quotting from patch 4 of series:
> >
> > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com
> >
> >      * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs
> >      *     (reference: resettable_cold_reset_fn())
> >      *
> >      *     Currently, vIOMMU devices are created as normal '-device'
> >      *     cmdlines.  It means in many ways it has the same attributes with
> >      *     most of the rest devices, even if the rest devices should
> >      *     logically be under control of the vIOMMU unit.
> >      *
> >      *     One side effect of it is vIOMMU devices will be currently put
> >      *     randomly under qdev tree hierarchy, which is the source of
> >      *     device reset ordering in current QEMU (depth-first traversal).
> >      *     It means vIOMMU now can be reset before some devices.  For fully
> >      *     emulated devices that's not a problem, because the traversal
> >      *     holds BQL for the whole process.  However it is a problem if DMA
> >      *     can happen without BQL, like VFIO, vDPA or remote device process.
> >      *
> >      *     TODO: one ideal solution can be that we make vIOMMU the parent
> >      *     of the whole pci host bridge.  Hence vIOMMU can be reset after
> >      *     all the devices are reset and quiesced.
> >      *
> >      * (2) Some devices register its own reset functions
> >      *
> >      *     Even if above issue solved, if devices register its own reset
> >      *     functions for some reason via QEMU reset hooks, vIOMMU can still
> >      *     be reset before the device. One example is vfio_reset_handler()
> >      *     where FLR is not supported on the device.
> >      *
> >      *     TODO: merge relevant reset functions into the device tree reset
> >      *     framework.
> >
> > So maybe vhost doesn't have problem (2) listed above, and maybe it means
> > it's still worthwhile thinking more about problem (1), which is to change
> > the QOM tree to provide a correct topology representation when vIOMMU is
> > present: so far it should be still a pretty much orphaned object there.. if
> > QEMU relies on QOM tree topology for reset order, we may need to move it to
> > the right place sooner or later.
>
> Sounds like a non-trivial task, so for a hard reset, maybe we can
> proceed with Eric's proposal to deal with the reset before the device
> stops.
Btw, I actually meant to break the assumption that vhost needs to be
enabled/disabled after/before vIOMMU. This only works for virtio-net /
vhost. From the view of vhost, it would work similar to _F_LOG_ALL
(where there's no assumption on the order of enabling/disabling dirty
page tracking and device start/stop).
Thanks
>
> Thanks
>
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-27  0:44                     ` Jason Wang
@ 2025-01-30 17:35                       ` Peter Xu
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Xu @ 2025-01-30 17:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: Eric Auger, Duan, Zhenzhong, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
	lvivier@redhat.com
On Mon, Jan 27, 2025 at 08:44:50AM +0800, Jason Wang wrote:
> On Sun, Jan 26, 2025 at 3:56 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Sat, Jan 25, 2025 at 12:42 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Hello, Jason, Eric,
> > >
> > > On Fri, Jan 24, 2025 at 11:30:56AM +0800, Jason Wang wrote:
> > > > It might be because neither virtio bus nor virtio-net provides a
> > > > shutdown method.
> > > >
> > > > There used to be requests to provide those to unbreak the kexec.
> > > >
> > > > A quick try might be to provide a .driver.shutdown to
> > > > virtio_net_driver structure and reset the device there as a start.
> > >
> > > I didn't check virtio driver path, but if that's missing it's reasonable to
> > > support it indeed.
> > >
> > > OTOH, even with that, vhost can still hit such DMA issue if it's a
> > > hard-reset, am I right?  IOW, when using QMP command "system-reset".  If my
> > > memory is correct, that's the problem I was working on the VFIO series,
> > > rather than a clean reboot.  And that won't give guest driver chance to run
> > > anything, IIUC.
> >
> > Yes.
> >
> > >
> > > I am wildly suspecting a VT-d write to GCMD to disable it can also appear
> > > if there's a hard reset, then when bootloading the VM the bios (or whatever
> > > firmware at early stage) may want to make sure the VT-d device is
> > > completely off by writting to GCMD. But that's a pure guess.. and that may
> > > or may not matter much on how we fix this problem.
> > >
> > > IOW, I suspect we need to fix both of them,
> > >
> > >   (a) for soft-reset, by making sure drivers properly quiesce DMAs
> > >   proactively when VM gracefully shuts down.
> > >
> > >   (b) for hard-reset, by making sure QEMU reset in proper order.
> > >
> > > One thing to mention is for problem (b) VFIO used to have an extra
> > > challenge on !FLR devices, I discussed it in patch 4's comment there.
> > > Quotting from patch 4 of series:
> > >
> > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com
> > >
> > >      * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs
> > >      *     (reference: resettable_cold_reset_fn())
> > >      *
> > >      *     Currently, vIOMMU devices are created as normal '-device'
> > >      *     cmdlines.  It means in many ways it has the same attributes with
> > >      *     most of the rest devices, even if the rest devices should
> > >      *     logically be under control of the vIOMMU unit.
> > >      *
> > >      *     One side effect of it is vIOMMU devices will be currently put
> > >      *     randomly under qdev tree hierarchy, which is the source of
> > >      *     device reset ordering in current QEMU (depth-first traversal).
> > >      *     It means vIOMMU now can be reset before some devices.  For fully
> > >      *     emulated devices that's not a problem, because the traversal
> > >      *     holds BQL for the whole process.  However it is a problem if DMA
> > >      *     can happen without BQL, like VFIO, vDPA or remote device process.
> > >      *
> > >      *     TODO: one ideal solution can be that we make vIOMMU the parent
> > >      *     of the whole pci host bridge.  Hence vIOMMU can be reset after
> > >      *     all the devices are reset and quiesced.
> > >      *
> > >      * (2) Some devices register its own reset functions
> > >      *
> > >      *     Even if above issue solved, if devices register its own reset
> > >      *     functions for some reason via QEMU reset hooks, vIOMMU can still
> > >      *     be reset before the device. One example is vfio_reset_handler()
> > >      *     where FLR is not supported on the device.
> > >      *
> > >      *     TODO: merge relevant reset functions into the device tree reset
> > >      *     framework.
> > >
> > > So maybe vhost doesn't have problem (2) listed above, and maybe it means
> > > it's still worthwhile thinking more about problem (1), which is to change
> > > the QOM tree to provide a correct topology representation when vIOMMU is
> > > present: so far it should be still a pretty much orphaned object there.. if
> > > QEMU relies on QOM tree topology for reset order, we may need to move it to
> > > the right place sooner or later.
> >
> > Sounds like a non-trivial task, so for a hard reset, maybe we can
> > proceed with Eric's proposal to deal with the reset before the device
> > stops.
The major challenge when I was working on that (as far as I can still
remember..): some devices are created at early stage of QEMU startup, which
can happen before the vIOMMU is created and realized.  Then it can be
challenging to re-parent those devices to be childrens of the vIOMMU, or we
may need a way to create vIOMMU always earlier than those..
> 
> Btw, I actually meant to break the assumption that vhost needs to be
> enabled/disabled after/before vIOMMU. This only works for virtio-net /
> vhost. From the view of vhost, it would work similar to _F_LOG_ALL
> (where there's no assumption on the order of enabling/disabling dirty
> page tracking and device start/stop).
Yes, we can go for a lightweight solution.
Thanks,
-- 
Peter Xu
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-24  2:44             ` Duan, Zhenzhong
  2025-01-24  3:30               ` Jason Wang
  2025-01-24 17:47               ` Eric Auger
@ 2025-01-31  9:55               ` Eric Auger
  2025-02-20 15:25                 ` Michael S. Tsirkin
  2 siblings, 1 reply; 34+ messages in thread
From: Eric Auger @ 2025-01-31  9:55 UTC (permalink / raw)
  To: Duan, Zhenzhong, Jason Wang, Michael S. Tsirkin
  Cc: eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	sgarzare@redhat.com, lvivier@redhat.com, Peter Xu
Hi
On 1/24/25 3:44 AM, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets
>> disabled
>>
>> Hi Jason,
>>
>>
>> On 1/23/25 2:34 AM, Jason Wang wrote:
>>> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
>>>> Hi Jason,
>>>>
>>>>
>>>> On 1/22/25 8:17 AM, Jason Wang wrote:
>>>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com>
>> wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
>>>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
>> wrote:
>>>>>>>> When a guest exposed with a vhost device and protected by an
>>>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>>>>>>
>>>>>>>> Fail to lookup the translated address ffffe000
>>>>>>>>
>>>>>>>> We observe that the IOMMU gets disabled through a write to the global
>>>>>>>> command register (CMAR_GCMD.TE) before the vhost device gets
>> stopped.
>>>>>>>> When this warning happens it can be observed an inflight IOTLB
>>>>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
>>>>>>>> that case a flat translation occurs and the check in
>>>>>>>> vhost_memory_region_lookup() fails.
>>>>>>>>
>>>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>>>>>>> unregistered.
>>>>>>>>
>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>> ---
>>>>>>>>  hw/virtio/vhost.c | 4 ++++
>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>>>> index 6aa72fd434..128c2ab094 100644
>>>>>>>> --- a/hw/virtio/vhost.c
>>>>>>>> +++ b/hw/virtio/vhost.c
>>>>>>>> @@ -931,6 +931,10 @@ static void
>> vhost_iommu_region_del(MemoryListener *listener,
>>>>>>>>              break;
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>>>>>>> +    }
>>>>>>> So the current code assumes:
>>>>>>>
>>>>>>> 1) IOMMU is enabled before vhost starts
>>>>>>> 2) IOMMU is disabled after vhost stops
>>>>>>>
>>>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>>>>>>> IOMMU enabled after vhost starts?
>>>>>> sorry I initially misunderstood the above comment. Indeed in the reboot
>>>>>> case assumption 2) happens to be wrong. However what I currently do is:
>>>>>> stop listening to iotlb miss requests from the kernel because my
>>>>>> understanding is those requests are just spurious ones, generate
>>>>>> warnings and we do not care since we are rebooting the system.
>>>>>>
>>>>>> However I do not claim this could handle the case where the IOMMU MR
>>>>>> would be turned off and then turned on. I think in that case we should
>>>>>> also flush the kernel IOTLB and this is not taken care of in this patch.
>>>>>> Is it a relevant use case?
>>>>> Not sure.
>>>>>
>>>>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
>>>>>> that a valid use case as the virtio driver is using the dma api?
>>>>> It should not be but we can't assume the behaviour of the guest. It
>>>>> could be buggy or even malicious.
>>>> agreed
>>>>> Btw, we had the following codes while handling te:
>>>>>
>>>>> /* Handle Translation Enable/Disable */
>>>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>>>>> {
>>>>>     if (s->dmar_enabled == en) {
>>>>>         return;
>>>>>     }
>>>>>
>>>>>     trace_vtd_dmar_enable(en);
>>>>>
>>>>> ...
>>>>>
>>>>>     vtd_reset_caches(s);
>>>>>     vtd_address_space_refresh_all(s);
>>>>> }
>>>>>
>>>>> vtd_address_space_refresh_all() will basically disable the iommu
>>>>> memory region. It looks not sufficient to trigger the region_del
>>>>> callback, maybe we should delete the region or introduce listener
>>>>> callback?
>>>> This is exactly the code path which is entered in my use case.
>>>>
>>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
>> given the current implement of this latter the IOTLB callback is not unset and the
>> kernel IOTLB is not refreshed. Also as I pointed out the  hdev->mem->regions are
>> not updated? shouldn't they. Can you explain what they correspond to?
>>> Adding Peter for more ideas.
>>>
>>> I think it's better to find a way to trigger the listener here, probably:
>>>
>>> 1) add/delete the memory regions instead of enable/disable
>> sorry I don't understand what you mean. The vhost_iommu_region_del call
>> stack is provided below [1]. Write to the intel iommu GCMD TE bit
>> induces a call to vhost_iommu_region_del. This happens before the
>> vhost_dev_stop whose call stack is provided below [2] and originates
> >from a bus reset.
>> We may have inflight IOTLB miss requests happening between both.
>>
>> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
>> translated anymore by the IOMMU and the iotlb.translated_addr returned
>> by address_space_get_iotlb_entry() is not within the registered
>> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
>> the "Fail to lookup the translated address" message.
>>
>> It sounds wrong that vhost keeps on using IOVAs that are not translated
>> anymore. It looks we have a reset ordering issue and this patch is just
>> removing the sympton and not the cause.
>>
>> At the moment I don't really get what is initiating the intel iommu TE
>> bit write. This is a guest action but is it initiated from a misordered
>> qemu event?
> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
> to disable iommu before reset. So Peter's patch will not address your issue.
>
> Before iommu shutdown, device_shutdown() is called to shutdown all devices.
> Not clear why vhost is still active.
I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shutemov@linux.intel.com/
and it fixes my issue
Eric
>
> Thanks
> Zhenzhong
>
>> It could also relate to
>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+
>> the+rest+of+devices
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-01-31  9:55               ` Eric Auger
@ 2025-02-20 15:25                 ` Michael S. Tsirkin
  2025-02-20 15:57                   ` Eric Auger
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2025-02-20 15:25 UTC (permalink / raw)
  To: Eric Auger
  Cc: Duan, Zhenzhong, Jason Wang, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, sgarzare@redhat.com, lvivier@redhat.com,
	Peter Xu
On Fri, Jan 31, 2025 at 10:55:26AM +0100, Eric Auger wrote:
> 
> I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
> https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shutemov@linux.intel.com/
> 
> and it fixes my issue
> 
> Eric
To make sure, we are dropping this in favor of the guest fix (which
I intent to merge shortly), correct?
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-02-20 15:25                 ` Michael S. Tsirkin
@ 2025-02-20 15:57                   ` Eric Auger
  2025-02-20 23:27                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Auger @ 2025-02-20 15:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Duan, Zhenzhong, Jason Wang, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, sgarzare@redhat.com, lvivier@redhat.com,
	Peter Xu
Hi Michael,
On 2/20/25 4:25 PM, Michael S. Tsirkin wrote:
> On Fri, Jan 31, 2025 at 10:55:26AM +0100, Eric Auger wrote:
>> I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
>> https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shutemov@linux.intel.com/
>>
>> and it fixes my issue
>>
>> Eric
>
> To make sure, we are dropping this in favor of the guest fix (which
> I intent to merge shortly), correct?
Yes this should be dropped in favour of the implementation of the
shutdown callback on guest side. Did you send your patch, I did not see it.
however on qemu side, there is "[PATCH v3 0/5] Fix vIOMMU reset order"
that needs to be merged to fix qmp system_reset.
Thanks
Eric
>
^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
  2025-02-20 15:57                   ` Eric Auger
@ 2025-02-20 23:27                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 34+ messages in thread
From: Michael S. Tsirkin @ 2025-02-20 23:27 UTC (permalink / raw)
  To: Eric Auger
  Cc: Duan, Zhenzhong, Jason Wang, eric.auger.pro@gmail.com,
	qemu-devel@nongnu.org, sgarzare@redhat.com, lvivier@redhat.com,
	Peter Xu
On Thu, Feb 20, 2025 at 04:57:51PM +0100, Eric Auger wrote:
> Hi Michael,
> 
> 
> On 2/20/25 4:25 PM, Michael S. Tsirkin wrote:
> > On Fri, Jan 31, 2025 at 10:55:26AM +0100, Eric Auger wrote:
> >> I tested [PATCH] virtio: Remove virtio devices on device_shutdown()
> >> https://lore.kernel.org/all/20240808075141.3433253-1-kirill.shutemov@linux.intel.com/
> >>
> >> and it fixes my issue
> >>
> >> Eric
> >
> > To make sure, we are dropping this in favor of the guest fix (which
> > I intent to merge shortly), correct?
> 
> Yes this should be dropped in favour of the implementation of the
> shutdown callback on guest side. Did you send your patch, I did not see it.
Will send tomorrow.
> however on qemu side, there is "[PATCH v3 0/5] Fix vIOMMU reset order"
> that needs to be merged to fix qmp system_reset.
Yes that one I'm testing.
> Thanks
> 
> Eric
> >
^ permalink raw reply	[flat|nested] 34+ messages in thread
end of thread, other threads:[~2025-02-20 23:28 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-20 17:33 [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled Eric Auger
2025-01-21  3:27 ` Jason Wang
2025-01-21  7:15   ` Eric Auger
2025-01-21 16:25   ` Eric Auger
2025-01-22  7:17     ` Jason Wang
2025-01-22  7:55       ` Eric Auger
2025-01-23  1:34         ` Jason Wang
2025-01-23  8:31           ` Eric Auger
2025-01-24  1:48             ` Jason Wang
2025-01-24  2:44             ` Duan, Zhenzhong
2025-01-24  3:30               ` Jason Wang
2025-01-24  3:41                 ` Jason Wang
2025-01-24  4:00                   ` Duan, Zhenzhong
2025-01-24  9:20                     ` Jason Wang
2025-01-24  9:50                       ` Duan, Zhenzhong
2025-01-24 17:56                   ` Eric Auger
2025-01-24 15:18                 ` Peter Xu
2025-01-26  7:56                   ` Jason Wang
2025-01-27  0:44                     ` Jason Wang
2025-01-30 17:35                       ` Peter Xu
2025-01-24 17:47               ` Eric Auger
2025-01-26  7:09                 ` Duan, Zhenzhong
2025-01-31  9:55               ` Eric Auger
2025-02-20 15:25                 ` Michael S. Tsirkin
2025-02-20 15:57                   ` Eric Auger
2025-02-20 23:27                     ` Michael S. Tsirkin
2025-01-21  8:31 ` Laurent Vivier
2025-01-21  8:45   ` Stefano Garzarella
2025-01-21  8:49     ` Eric Auger
2025-01-21  8:48   ` Eric Auger
2025-01-21 16:32   ` Eric Auger
2025-01-21  9:18 ` Duan, Zhenzhong
2025-01-21 10:34   ` Eric Auger
2025-01-21 10:43     ` Duan, Zhenzhong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).