[PATCH 0/5] Fix vIOMMU reset order

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] Fix vIOMMU reset order
@ 2025-02-06 14:21 Eric Auger
  2025-02-06 14:21 ` [PATCH 1/5] hw/virtio/virtio-iommu: Migrate to 3-phase reset Eric Auger
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-06 14:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, jasowang, imammedo, peterx, alex.williamson, clg, philmd,
	zhenzhong.duan, ddutile

This is a follow-up of Peter's attempt to fix the fact that
vIOMMUs are likely to be reset before the device they protect:

[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/

This is especially observed with virtio devices when a qmp system_reset
command is sent but also with VFIO devices.

This series puts the vIOMMU reset in the 3-phase exit callback.

This scheme was tested successful with virtio-devices and some
VFIO devices. Nevertheless not all the topologies have been
tested yet.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/viommu-3phase-reset-v1

Eric Auger (4):
  hw/virtio/virtio-iommu: Migrate to 3-phase reset
  hw/i386/intel-iommu: Migrate to 3-phase reset
  hw/arm/smmuv3: Move reset to exit phase
  hw/vfio/common: Add a trace point in vfio_reset_handler

Peter Xu (1):
  hw/i386/intel_iommu: Tear down address spaces before IOMMU reset

 hw/arm/smmuv3.c          |  9 +++++----
 hw/i386/intel_iommu.c    | 10 ++++++----
 hw/vfio/common.c         |  1 +
 hw/virtio/virtio-iommu.c |  9 +++++----
 hw/arm/trace-events      |  1 +
 hw/i386/trace-events     |  1 +
 hw/vfio/trace-events     |  1 +
 hw/virtio/trace-events   |  2 +-
 8 files changed, 21 insertions(+), 13 deletions(-)

-- 
2.47.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/5] hw/virtio/virtio-iommu: Migrate to 3-phase reset
  2025-02-06 14:21 [PATCH 0/5] Fix vIOMMU reset order Eric Auger
@ 2025-02-06 14:21 ` Eric Auger
  2025-02-06 14:21 ` [PATCH 2/5] hw/i386/intel-iommu: " Eric Auger
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-06 14:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, jasowang, imammedo, peterx, alex.williamson, clg, philmd,
	zhenzhong.duan, ddutile

Currently the iommu may be reset before the devices
it protects. For example this happens with virtio-net.

Let's use 3-phase reset mechanism and reset the IOMMU on
exit phase intead.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/virtio-iommu.c | 9 +++++----
 hw/virtio/trace-events   | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index f41104a952..eae3bcedde 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -1504,11 +1504,11 @@ static void virtio_iommu_device_unrealize(DeviceState *dev)
     virtio_cleanup(vdev);
 }
 
-static void virtio_iommu_device_reset(VirtIODevice *vdev)
+static void virtio_iommu_device_reset_exit(Object *obj, ResetType type)
 {
-    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
+    VirtIOIOMMU *s = VIRTIO_IOMMU(obj);
 
-    trace_virtio_iommu_device_reset();
+    trace_virtio_iommu_device_reset_exit();
 
     if (s->domains) {
         g_tree_destroy(s->domains);
@@ -1668,6 +1668,7 @@ static void virtio_iommu_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
 
     device_class_set_props(dc, virtio_iommu_properties);
     dc->vmsd = &vmstate_virtio_iommu;
@@ -1675,7 +1676,7 @@ static void virtio_iommu_class_init(ObjectClass *klass, void *data)
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     vdc->realize = virtio_iommu_device_realize;
     vdc->unrealize = virtio_iommu_device_unrealize;
-    vdc->reset = virtio_iommu_device_reset;
+    rc->phases.exit = virtio_iommu_device_reset_exit;
     vdc->get_config = virtio_iommu_get_config;
     vdc->set_config = virtio_iommu_set_config;
     vdc->get_features = virtio_iommu_get_features;
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 04e36ae047..76f0d458b2 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -108,7 +108,7 @@ virtio_pci_notify_write(uint64_t addr, uint64_t val, unsigned int size) "0x%" PR
 virtio_pci_notify_write_pio(uint64_t addr, uint64_t val, unsigned int size) "0x%" PRIx64" = 0x%" PRIx64 " (%d)"
 
 # hw/virtio/virtio-iommu.c
-virtio_iommu_device_reset(void) "reset!"
+virtio_iommu_device_reset_exit(void) "reset!"
 virtio_iommu_system_reset(void) "system reset!"
 virtio_iommu_get_features(uint64_t features) "device supports features=0x%"PRIx64
 virtio_iommu_device_status(uint8_t status) "driver status = %d"
-- 
2.47.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/5] hw/i386/intel-iommu: Migrate to 3-phase reset
  2025-02-06 14:21 [PATCH 0/5] Fix vIOMMU reset order Eric Auger
  2025-02-06 14:21 ` [PATCH 1/5] hw/virtio/virtio-iommu: Migrate to 3-phase reset Eric Auger
@ 2025-02-06 14:21 ` Eric Auger
  2025-02-06 14:21 ` [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before IOMMU reset Eric Auger
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-06 14:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, jasowang, imammedo, peterx, alex.williamson, clg, philmd,
	zhenzhong.duan, ddutile

Currently the IOMMU may be reset before the devices
it protects. For example this happens with virtio-net.

Let's use 3-phase reset mechanism and reset the IOMMU on
exit phase instead.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/i386/intel_iommu.c | 8 +++++---
 hw/i386/trace-events  | 1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f366c223d0..21a8bf45f8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4697,10 +4697,11 @@ static void vtd_init(IntelIOMMUState *s)
 /* Should not reset address_spaces when reset because devices will still use
  * the address space they got at first (won't ask the bus again).
  */
-static void vtd_reset(DeviceState *dev)
+static void vtd_reset_exit(Object *obj, ResetType type)
 {
-    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
+    IntelIOMMUState *s = INTEL_IOMMU_DEVICE(obj);
 
+    trace_vtd_reset_exit();
     vtd_init(s);
     vtd_address_space_refresh_all(s);
 }
@@ -4864,8 +4865,9 @@ static void vtd_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     X86IOMMUClass *x86_class = X86_IOMMU_DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
 
-    device_class_set_legacy_reset(dc, vtd_reset);
+    rc->phases.exit = vtd_reset_exit;
     dc->vmsd = &vtd_vmstate;
     device_class_set_props(dc, vtd_properties);
     dc->hotpluggable = false;
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index 53c02d7ac8..ac9e1a10aa 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -68,6 +68,7 @@ vtd_frr_new(int index, uint64_t hi, uint64_t lo) "index %d high 0x%"PRIx64" low
 vtd_warn_invalid_qi_tail(uint16_t tail) "tail 0x%"PRIx16
 vtd_warn_ir_vector(uint16_t sid, int index, int vec, int target) "sid 0x%"PRIx16" index %d vec %d (should be: %d)"
 vtd_warn_ir_trigger(uint16_t sid, int index, int trig, int target) "sid 0x%"PRIx16" index %d trigger %d (should be: %d)"
+vtd_reset_exit(void) ""
 
 # amd_iommu.c
 amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" +  offset 0x%"PRIx32
-- 
2.47.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before IOMMU reset
  2025-02-06 14:21 [PATCH 0/5] Fix vIOMMU reset order Eric Auger
  2025-02-06 14:21 ` [PATCH 1/5] hw/virtio/virtio-iommu: Migrate to 3-phase reset Eric Auger
  2025-02-06 14:21 ` [PATCH 2/5] hw/i386/intel-iommu: " Eric Auger
@ 2025-02-06 14:21 ` Eric Auger
  2025-02-17  3:02   ` Duan, Zhenzhong
  2025-02-06 14:21 ` [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase Eric Auger
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2025-02-06 14:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, jasowang, imammedo, peterx, alex.williamson, clg, philmd,
	zhenzhong.duan, ddutile

From: Peter Xu <peterx@redhat.com>

No bug report for this, but logically tearing down of existing address
space should happen before reset of IOMMU state / registers, because the
current address spaces may still rely on those information.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/i386/intel_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 21a8bf45f8..1bd9ae403b 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4702,8 +4702,8 @@ static void vtd_reset_exit(Object *obj, ResetType type)
     IntelIOMMUState *s = INTEL_IOMMU_DEVICE(obj);
 
     trace_vtd_reset_exit();
-    vtd_init(s);
     vtd_address_space_refresh_all(s);
+    vtd_init(s);
 }
 
 static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
-- 
2.47.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* RE: [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before IOMMU reset
  2025-02-06 14:21 ` [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before IOMMU reset Eric Auger
@ 2025-02-17  3:02   ` Duan, Zhenzhong
  2025-02-17  7:31     ` Eric Auger
  0 siblings, 1 reply; 30+ messages in thread
From: Duan, Zhenzhong @ 2025-02-17  3:02 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, peter.maydell@linaro.org, mst@redhat.com,
	jasowang@redhat.com, imammedo@redhat.com, peterx@redhat.com,
	alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org,
	ddutile@redhat.com

Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before
>IOMMU reset
>
>From: Peter Xu <peterx@redhat.com>
>
>No bug report for this, but logically tearing down of existing address
>space should happen before reset of IOMMU state / registers, because the
>current address spaces may still rely on those information.
>
>Signed-off-by: Peter Xu <peterx@redhat.com>
>Signed-off-by: Eric Auger <eric.auger@redhat.com>
>---
> hw/i386/intel_iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>index 21a8bf45f8..1bd9ae403b 100644
>--- a/hw/i386/intel_iommu.c
>+++ b/hw/i386/intel_iommu.c
>@@ -4702,8 +4702,8 @@ static void vtd_reset_exit(Object *obj, ResetType type)
>     IntelIOMMUState *s = INTEL_IOMMU_DEVICE(obj);
>
>     trace_vtd_reset_exit();
>-    vtd_init(s);
>     vtd_address_space_refresh_all(s);
>+    vtd_init(s);

I'm not sure if we should have this change. vtd_switch_address_space() checks s->dmar_enabled and vtd_init() updates s->dmar_enabled. With this change, will we leave stale mapping there after reset?

Thanks
Zhenzhong

> }
>
> static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int
>devfn)
>--
>2.47.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before IOMMU reset
  2025-02-17  3:02   ` Duan, Zhenzhong
@ 2025-02-17  7:31     ` Eric Auger
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-17  7:31 UTC (permalink / raw)
  To: Duan, Zhenzhong, eric.auger.pro@gmail.com, qemu-devel@nongnu.org,
	qemu-arm@nongnu.org, peter.maydell@linaro.org, mst@redhat.com,
	jasowang@redhat.com, imammedo@redhat.com, peterx@redhat.com,
	alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org,
	ddutile@redhat.com

Hi Zhenzhong,


On 2/17/25 4:02 AM, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Subject: [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before
>> IOMMU reset
>>
>> From: Peter Xu <peterx@redhat.com>
>>
>> No bug report for this, but logically tearing down of existing address
>> space should happen before reset of IOMMU state / registers, because the
>> current address spaces may still rely on those information.
>>
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>> hw/i386/intel_iommu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 21a8bf45f8..1bd9ae403b 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -4702,8 +4702,8 @@ static void vtd_reset_exit(Object *obj, ResetType type)
>>     IntelIOMMUState *s = INTEL_IOMMU_DEVICE(obj);
>>
>>     trace_vtd_reset_exit();
>> -    vtd_init(s);
>>     vtd_address_space_refresh_all(s);
>> +    vtd_init(s);
> I'm not sure if we should have this change. vtd_switch_address_space() checks s->dmar_enabled and vtd_init() updates s->dmar_enabled. With this change, will we leave stale mapping there after reset?
Yes I do agree. This could break the as switch. I will remove this patch.

Eric
>
> Thanks
> Zhenzhong
>
>> }
>>
>> static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int
>> devfn)
>> --
>> 2.47.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-06 14:21 [PATCH 0/5] Fix vIOMMU reset order Eric Auger
                   ` (2 preceding siblings ...)
  2025-02-06 14:21 ` [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before IOMMU reset Eric Auger
@ 2025-02-06 14:21 ` Eric Auger
  2025-02-07 16:37   ` Peter Maydell
  2025-02-06 14:21 ` [PATCH 5/5] hw/vfio/common: Add a trace point in vfio_reset_handler Eric Auger
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2025-02-06 14:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, jasowang, imammedo, peterx, alex.williamson, clg, philmd,
	zhenzhong.duan, ddutile

Currently the iommu may be reset before the devices
it protects. For example this happens with virtio-scsi-pci.
when system_reset is issued from qmp monitor, spurious
"virtio: zero sized buffers are not allowed" warnings can
be observed.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/arm/smmuv3.c     | 9 +++++----
 hw/arm/trace-events | 1 +
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index c0cf5df0f6..7522c32b24 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1870,13 +1870,14 @@ static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
     }
 }
 
-static void smmu_reset_hold(Object *obj, ResetType type)
+static void smmu_reset_exit(Object *obj, ResetType type)
 {
     SMMUv3State *s = ARM_SMMUV3(obj);
     SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);
 
-    if (c->parent_phases.hold) {
-        c->parent_phases.hold(obj, type);
+    trace_smmu_reset_exit();
+    if (c->parent_phases.exit) {
+        c->parent_phases.exit(obj, type);
     }
 
     smmuv3_init_regs(s);
@@ -1999,7 +2000,7 @@ static void smmuv3_class_init(ObjectClass *klass, void *data)
     SMMUv3Class *c = ARM_SMMUV3_CLASS(klass);
 
     dc->vmsd = &vmstate_smmuv3;
-    resettable_class_set_parent_phases(rc, NULL, smmu_reset_hold, NULL,
+    resettable_class_set_parent_phases(rc, NULL, NULL, smmu_reset_exit,
                                        &c->parent_phases);
     device_class_set_parent_realize(dc, smmu_realize,
                                     &c->parent_realize);
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index c64ad344bd..7790db780e 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -56,6 +56,7 @@ smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
 smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu mr=%s"
 smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu mr=%s"
 smmuv3_inv_notifiers_iova(const char *name, int asid, int vmid, uint64_t iova, uint8_t tg, uint64_t num_pages, int stage) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" tg=%d num_pages=0x%"PRIx64" stage=%d"
+smmu_reset_exit(void) ""
 
 # strongarm.c
 strongarm_uart_update_parameters(const char *label, int speed, char parity, int data_bits, int stop_bits) "%s speed=%d parity=%c data=%d stop=%d"
-- 
2.47.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-06 14:21 ` [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase Eric Auger
@ 2025-02-07 16:37   ` Peter Maydell
  2025-02-07 16:50     ` Eric Auger
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2025-02-07 16:37 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang, imammedo,
	peterx, alex.williamson, clg, philmd, zhenzhong.duan, ddutile

On Thu, 6 Feb 2025 at 14:23, Eric Auger <eric.auger@redhat.com> wrote:
>
> Currently the iommu may be reset before the devices
> it protects. For example this happens with virtio-scsi-pci.
> when system_reset is issued from qmp monitor, spurious
> "virtio: zero sized buffers are not allowed" warnings can
> be observed.
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/arm/smmuv3.c     | 9 +++++----
>  hw/arm/trace-events | 1 +
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index c0cf5df0f6..7522c32b24 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -1870,13 +1870,14 @@ static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
>      }
>  }
>
> -static void smmu_reset_hold(Object *obj, ResetType type)
> +static void smmu_reset_exit(Object *obj, ResetType type)
>  {
>      SMMUv3State *s = ARM_SMMUV3(obj);
>      SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);
>
> -    if (c->parent_phases.hold) {
> -        c->parent_phases.hold(obj, type);
> +    trace_smmu_reset_exit();
> +    if (c->parent_phases.exit) {
> +        c->parent_phases.exit(obj, type);
>      }

If we need to do something unexpected like reset
register values in the exit phase rather than the
hold phase, it's a good idea to have a comment explaining
why, to avoid somebody coming along afterwards and tidying
it up into the more usual arrangement.

If I understand correctly we need to keep the whole IOMMU
config intact until the exit phase? What's the thing the
device behind the IOMMU is trying to do during its reset
that triggers the warning?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-07 16:37   ` Peter Maydell
@ 2025-02-07 16:50     ` Eric Auger
  2025-02-07 16:58       ` Peter Maydell
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2025-02-07 16:50 UTC (permalink / raw)
  To: Peter Maydell
  Cc: eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang, imammedo,
	peterx, alex.williamson, clg, philmd, zhenzhong.duan, ddutile




On 2/7/25 5:37 PM, Peter Maydell wrote:
> On Thu, 6 Feb 2025 at 14:23, Eric Auger <eric.auger@redhat.com> wrote:
>> Currently the iommu may be reset before the devices
>> it protects. For example this happens with virtio-scsi-pci.
>> when system_reset is issued from qmp monitor, spurious
>> "virtio: zero sized buffers are not allowed" warnings can
>> be observed.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/arm/smmuv3.c     | 9 +++++----
>>  hw/arm/trace-events | 1 +
>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>> index c0cf5df0f6..7522c32b24 100644
>> --- a/hw/arm/smmuv3.c
>> +++ b/hw/arm/smmuv3.c
>> @@ -1870,13 +1870,14 @@ static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
>>      }
>>  }
>>
>> -static void smmu_reset_hold(Object *obj, ResetType type)
>> +static void smmu_reset_exit(Object *obj, ResetType type)
>>  {
>>      SMMUv3State *s = ARM_SMMUV3(obj);
>>      SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);
>>
>> -    if (c->parent_phases.hold) {
>> -        c->parent_phases.hold(obj, type);
>> +    trace_smmu_reset_exit();
>> +    if (c->parent_phases.exit) {
>> +        c->parent_phases.exit(obj, type);
>>      }
> If we need to do something unexpected like reset
> register values in the exit phase rather than the
> hold phase, it's a good idea to have a comment explaining
> why, to avoid somebody coming along afterwards and tidying
> it up into the more usual arrangement.
sure
>
> If I understand correctly we need to keep the whole IOMMU
> config intact until the exit phase? What's the thing the
> device behind the IOMMU is trying to do during its reset
> that triggers the warning?
The virtio-pci-net continues to perform DMA requests and this causes
some weird messages such as:
"virtio: bogus descriptor or out of resources"

Also VFIO devices may continue issuing DMAs causing translation faults

Thanks

Eric

>
> thanks
> -- PMM
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-07 16:50     ` Eric Auger
@ 2025-02-07 16:58       ` Peter Maydell
  2025-02-07 17:47         ` Peter Xu
  2025-02-10  8:40         ` Eric Auger
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Maydell @ 2025-02-07 16:58 UTC (permalink / raw)
  To: eric.auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang, imammedo,
	peterx, alex.williamson, clg, philmd, zhenzhong.duan, ddutile

On Fri, 7 Feb 2025 at 16:50, Eric Auger <eric.auger@redhat.com> wrote:
>
>
>
>
> On 2/7/25 5:37 PM, Peter Maydell wrote:
> > On Thu, 6 Feb 2025 at 14:23, Eric Auger <eric.auger@redhat.com> wrote:
> >> Currently the iommu may be reset before the devices
> >> it protects. For example this happens with virtio-scsi-pci.
> >> when system_reset is issued from qmp monitor, spurious
> >> "virtio: zero sized buffers are not allowed" warnings can
> >> be observed.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >> ---
> >>  hw/arm/smmuv3.c     | 9 +++++----
> >>  hw/arm/trace-events | 1 +
> >>  2 files changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> >> index c0cf5df0f6..7522c32b24 100644
> >> --- a/hw/arm/smmuv3.c
> >> +++ b/hw/arm/smmuv3.c
> >> @@ -1870,13 +1870,14 @@ static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
> >>      }
> >>  }
> >>
> >> -static void smmu_reset_hold(Object *obj, ResetType type)
> >> +static void smmu_reset_exit(Object *obj, ResetType type)
> >>  {
> >>      SMMUv3State *s = ARM_SMMUV3(obj);
> >>      SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);
> >>
> >> -    if (c->parent_phases.hold) {
> >> -        c->parent_phases.hold(obj, type);
> >> +    trace_smmu_reset_exit();
> >> +    if (c->parent_phases.exit) {
> >> +        c->parent_phases.exit(obj, type);
> >>      }
> > If we need to do something unexpected like reset
> > register values in the exit phase rather than the
> > hold phase, it's a good idea to have a comment explaining
> > why, to avoid somebody coming along afterwards and tidying
> > it up into the more usual arrangement.
> sure
> >
> > If I understand correctly we need to keep the whole IOMMU
> > config intact until the exit phase? What's the thing the
> > device behind the IOMMU is trying to do during its reset
> > that triggers the warning?
> The virtio-pci-net continues to perform DMA requests and this causes
> some weird messages such as:
> "virtio: bogus descriptor or out of resources"
>
> Also VFIO devices may continue issuing DMAs causing translation faults

Hmm, right. I guess this only happens with KVM, or can you
trigger it in a TCG setup too? Anyway, presumably we can
rely on the devices quiescing all their outstanding DMA
by the time the hold phase comes along.

(I wonder if we ought to suggest quiescing outstanding
DMA in the enter phase? But it's probably easier to fix
the iommus like this series does than try to get every
dma-capable pci device to do something different.)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-07 16:58       ` Peter Maydell
@ 2025-02-07 17:47         ` Peter Xu
  2025-02-07 18:18           ` Peter Maydell
  2025-02-10  8:35           ` Eric Auger
  2025-02-10  8:40         ` Eric Auger
  1 sibling, 2 replies; 30+ messages in thread
From: Peter Xu @ 2025-02-07 17:47 UTC (permalink / raw)
  To: Peter Maydell
  Cc: eric.auger, eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang,
	imammedo, alex.williamson, clg, philmd, zhenzhong.duan, ddutile

On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
> (I wonder if we ought to suggest quiescing outstanding
> DMA in the enter phase? But it's probably easier to fix
> the iommus like this series does than try to get every
> dma-capable pci device to do something different.)

I wonder if we should provide some generic helper to register vIOMMU reset
callbacks, so that we'll be sure any vIOMMU model impl that will register
at exit() phase only, and do nothing during the initial two phases.  Then
we can put some rich comment on that helper on why.

Looks like it means the qemu reset model in the future can be a combination
of device tree (which resets depth-first) and the three phases model.  We
will start to use different approach to solve different problems.

Maybe after we settle our mind, we should update the reset document,
e.g. for device emulation developers, we need to be clear on where to
quiesce the DMAs, and it must not happen at exit().  Both all devices and
all iommu impls need to follow the rules to make it work like the plan.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-07 17:47         ` Peter Xu
@ 2025-02-07 18:18           ` Peter Maydell
  2025-02-10  8:47             ` Eric Auger
  2025-02-10 14:14             ` Peter Xu
  2025-02-10  8:35           ` Eric Auger
  1 sibling, 2 replies; 30+ messages in thread
From: Peter Maydell @ 2025-02-07 18:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: eric.auger, eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang,
	imammedo, alex.williamson, clg, philmd, zhenzhong.duan, ddutile

On Fri, 7 Feb 2025 at 17:48, Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
> > (I wonder if we ought to suggest quiescing outstanding
> > DMA in the enter phase? But it's probably easier to fix
> > the iommus like this series does than try to get every
> > dma-capable pci device to do something different.)
>
> I wonder if we should provide some generic helper to register vIOMMU reset
> callbacks, so that we'll be sure any vIOMMU model impl that will register
> at exit() phase only, and do nothing during the initial two phases.  Then
> we can put some rich comment on that helper on why.
>
> Looks like it means the qemu reset model in the future can be a combination
> of device tree (which resets depth-first) and the three phases model.  We
> will start to use different approach to solve different problems.

The tree of QOM devices (i.e. the one based on the qbus buses
and rooted at the sysbus) resets depth-first, but it does so in
three phases: first we traverse everything doing 'enter'; then
we traverse everything doing 'hold'; then we traverse everything
doing 'exit'. There *used* to be an awkward mix of some things
being three-phase and some not, but we have now got rid of all
of those so a system reset does a single three-phase reset run
which resets everything.

-- PMM


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-07 18:18           ` Peter Maydell
@ 2025-02-10  8:47             ` Eric Auger
  2025-02-10 14:14             ` Peter Xu
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-10  8:47 UTC (permalink / raw)
  To: Peter Maydell, Peter Xu
  Cc: eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang, imammedo,
	alex.williamson, clg, philmd, zhenzhong.duan, ddutile


Hi Peter,

On 2/7/25 7:18 PM, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 17:48, Peter Xu <peterx@redhat.com> wrote:
>> On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
>>> (I wonder if we ought to suggest quiescing outstanding
>>> DMA in the enter phase? But it's probably easier to fix
>>> the iommus like this series does than try to get every
>>> dma-capable pci device to do something different.)
>> I wonder if we should provide some generic helper to register vIOMMU reset
>> callbacks, so that we'll be sure any vIOMMU model impl that will register
>> at exit() phase only, and do nothing during the initial two phases.  Then
>> we can put some rich comment on that helper on why.
>>
>> Looks like it means the qemu reset model in the future can be a combination
>> of device tree (which resets depth-first) and the three phases model.  We
>> will start to use different approach to solve different problems.
> The tree of QOM devices (i.e. the one based on the qbus buses
> and rooted at the sysbus) resets depth-first, but it does so in
> three phases: first we traverse everything doing 'enter'; then
> we traverse everything doing 'hold'; then we traverse everything
> doing 'exit'. There *used* to be an awkward mix of some things
> being three-phase and some not, but we have now got rid of all
> of those so a system reset does a single three-phase reset run
> which resets everything.
Thank you Peter. This is reassuring. I will add such kind of description
in the commit msg/cover letter.

Eric
>
> -- PMM
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-07 18:18           ` Peter Maydell
  2025-02-10  8:47             ` Eric Auger
@ 2025-02-10 14:14             ` Peter Xu
  2025-02-10 14:22               ` Peter Maydell
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Xu @ 2025-02-10 14:14 UTC (permalink / raw)
  To: Peter Maydell
  Cc: eric.auger, eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang,
	imammedo, alex.williamson, clg, philmd, zhenzhong.duan, ddutile

On Fri, Feb 07, 2025 at 06:18:50PM +0000, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 17:48, Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
> > > (I wonder if we ought to suggest quiescing outstanding
> > > DMA in the enter phase? But it's probably easier to fix
> > > the iommus like this series does than try to get every
> > > dma-capable pci device to do something different.)
> >
> > I wonder if we should provide some generic helper to register vIOMMU reset
> > callbacks, so that we'll be sure any vIOMMU model impl that will register
> > at exit() phase only, and do nothing during the initial two phases.  Then
> > we can put some rich comment on that helper on why.
> >
> > Looks like it means the qemu reset model in the future can be a combination
> > of device tree (which resets depth-first) and the three phases model.  We
> > will start to use different approach to solve different problems.
> 
> The tree of QOM devices (i.e. the one based on the qbus buses
> and rooted at the sysbus) resets depth-first, but it does so in
> three phases: first we traverse everything doing 'enter'; then
> we traverse everything doing 'hold'; then we traverse everything
> doing 'exit'. There *used* to be an awkward mix of some things
> being three-phase and some not, but we have now got rid of all
> of those so a system reset does a single three-phase reset run
> which resets everything.

Right.  Sorry I wasn't very clear before indeed on what I wanted to
express.

My understanding is the 3 phases reset, even if existed, was not designed
to order things like vIOMMU and devices that is already described by system
topology.  That's, IMHO, exactly what QOM topology wanted to achieve right
now on ordering device resets and the whole depth-first reset method would
make sense with it.

So from that specific POV, it's a mixture use of both methods on ordering
of devices to reset now (rather than the order of reset process within a
same device, provided into 3 phases).  It may not be very intuitive when
someone reads about the two reset mechanisms, as one would naturally take
vIOMMU as a root object of any other PCIe devices under root complex, and
thinking the order should be guaranteed by QOM on reset already.  In
reality it's not.  So that's the part I wonder if we want to document.

So we must make sure both:

  - All vIOMMUs across all archs must only tear down its mapping at its
    exit() phase, providing the mapping available for all devices during
    the initial 2 phases (probably we could even assert the initial 2 phase
    functions to be NULL when there's a base class).  Meanwhile,

  - All PCIe devices must quiesce their DMA in the initial 2 phases,
    guaranteeing that there's no on-the-fly DMAs possible in the complete
    3rd exit() phase, because any vIOMMU implementation can start to tear
    down its device mappings even as the first entry in 3rd phase (IOW,
    there's also no order constraint for 3rd phase that vIOMMU exit() will
    be invoked before devices' exit()).

I'm not sure if it would be important to document this, but only thought
about it if we want crystal clearance on the choice of this design.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-10 14:14             ` Peter Xu
@ 2025-02-10 14:22               ` Peter Maydell
  2025-02-12 17:28                 ` Cédric Le Goater
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2025-02-10 14:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: eric.auger, eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang,
	imammedo, alex.williamson, clg, philmd, zhenzhong.duan, ddutile

On Mon, 10 Feb 2025 at 14:14, Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Feb 07, 2025 at 06:18:50PM +0000, Peter Maydell wrote:
> > On Fri, 7 Feb 2025 at 17:48, Peter Xu <peterx@redhat.com> wrote:
> > >
> > > On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
> > > > (I wonder if we ought to suggest quiescing outstanding
> > > > DMA in the enter phase? But it's probably easier to fix
> > > > the iommus like this series does than try to get every
> > > > dma-capable pci device to do something different.)
> > >
> > > I wonder if we should provide some generic helper to register vIOMMU reset
> > > callbacks, so that we'll be sure any vIOMMU model impl that will register
> > > at exit() phase only, and do nothing during the initial two phases.  Then
> > > we can put some rich comment on that helper on why.
> > >
> > > Looks like it means the qemu reset model in the future can be a combination
> > > of device tree (which resets depth-first) and the three phases model.  We
> > > will start to use different approach to solve different problems.
> >
> > The tree of QOM devices (i.e. the one based on the qbus buses
> > and rooted at the sysbus) resets depth-first, but it does so in
> > three phases: first we traverse everything doing 'enter'; then
> > we traverse everything doing 'hold'; then we traverse everything
> > doing 'exit'. There *used* to be an awkward mix of some things
> > being three-phase and some not, but we have now got rid of all
> > of those so a system reset does a single three-phase reset run
> > which resets everything.
>
> Right.  Sorry I wasn't very clear before indeed on what I wanted to
> express.
>
> My understanding is the 3 phases reset, even if existed, was not designed
> to order things like vIOMMU and devices that is already described by system
> topology.  That's, IMHO, exactly what QOM topology wanted to achieve right
> now on ordering device resets and the whole depth-first reset method would
> make sense with it.
>
> So from that specific POV, it's a mixture use of both methods on ordering
> of devices to reset now (rather than the order of reset process within a
> same device, provided into 3 phases).  It may not be very intuitive when
> someone reads about the two reset mechanisms, as one would naturally take
> vIOMMU as a root object of any other PCIe devices under root complex, and
> thinking the order should be guaranteed by QOM on reset already.  In
> reality it's not.  So that's the part I wonder if we want to document.

Yeah, I see what you mean. The issue here is that the iommu isn't
actually a parent of the devices that access through it. This is
true both in the "qbus/qdev bus tree" sense (where it is the PCI
controller device that owns the PCI bus that the devices are on)
and also in the QOM tree sense[*] (where usually the iommu and the
PCI controller are both created by the machine or the SoC, I guess?).
Instead iommus are separate devices that control data flow but
aren't in a parent-child relationship with the devices on either
side of that flow. There is a guarantee about reset ordering between
bus parent/child (so the PCI controller resets before it resets
the PCI bus that resets the PCI devices on the bus), but that doesn't
help the iommu.

[*] I have a vague idea that ideally we might want reset to be
QOM-tree based rather than qbus-tree based. But getting there from
here is non-trivial. And maybe what we really want is "objects,
especially SoCs, that create children can define what their reset
tree is, with the default being to reset all the QOM children".
Lots of non-thought-through complexity here ;-)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-10 14:22               ` Peter Maydell
@ 2025-02-12 17:28                 ` Cédric Le Goater
  0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2025-02-12 17:28 UTC (permalink / raw)
  To: Peter Maydell, Peter Xu
  Cc: eric.auger, eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang,
	imammedo, alex.williamson, philmd, zhenzhong.duan, ddutile

On 2/10/25 15:22, Peter Maydell wrote:
> On Mon, 10 Feb 2025 at 14:14, Peter Xu <peterx@redhat.com> wrote:
>>
>> On Fri, Feb 07, 2025 at 06:18:50PM +0000, Peter Maydell wrote:
>>> On Fri, 7 Feb 2025 at 17:48, Peter Xu <peterx@redhat.com> wrote:
>>>>
>>>> On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
>>>>> (I wonder if we ought to suggest quiescing outstanding
>>>>> DMA in the enter phase? But it's probably easier to fix
>>>>> the iommus like this series does than try to get every
>>>>> dma-capable pci device to do something different.)
>>>>
>>>> I wonder if we should provide some generic helper to register vIOMMU reset
>>>> callbacks, so that we'll be sure any vIOMMU model impl that will register
>>>> at exit() phase only, and do nothing during the initial two phases.  Then
>>>> we can put some rich comment on that helper on why.
>>>>
>>>> Looks like it means the qemu reset model in the future can be a combination
>>>> of device tree (which resets depth-first) and the three phases model.  We
>>>> will start to use different approach to solve different problems.
>>>
>>> The tree of QOM devices (i.e. the one based on the qbus buses
>>> and rooted at the sysbus) resets depth-first, but it does so in
>>> three phases: first we traverse everything doing 'enter'; then
>>> we traverse everything doing 'hold'; then we traverse everything
>>> doing 'exit'. There *used* to be an awkward mix of some things
>>> being three-phase and some not, but we have now got rid of all
>>> of those so a system reset does a single three-phase reset run
>>> which resets everything.
>>
>> Right.  Sorry I wasn't very clear before indeed on what I wanted to
>> express.
>>
>> My understanding is the 3 phases reset, even if existed, was not designed
>> to order things like vIOMMU and devices that is already described by system
>> topology.  That's, IMHO, exactly what QOM topology wanted to achieve right
>> now on ordering device resets and the whole depth-first reset method would
>> make sense with it.
>>
>> So from that specific POV, it's a mixture use of both methods on ordering
>> of devices to reset now (rather than the order of reset process within a
>> same device, provided into 3 phases).  It may not be very intuitive when
>> someone reads about the two reset mechanisms, as one would naturally take
>> vIOMMU as a root object of any other PCIe devices under root complex, and
>> thinking the order should be guaranteed by QOM on reset already.  In
>> reality it's not.  So that's the part I wonder if we want to document.
> 
> Yeah, I see what you mean. The issue here is that the iommu isn't
> actually a parent of the devices that access through it. This is
> true both in the "qbus/qdev bus tree" sense (where it is the PCI
> controller device that owns the PCI bus that the devices are on)
> and also in the QOM tree sense[*] (where usually the iommu and the
> PCI controller are both created by the machine or the SoC, I guess?).
> Instead iommus are separate devices that control data flow but
> aren't in a parent-child relationship with the devices on either
> side of that flow. There is a guarantee about reset ordering between
> bus parent/child (so the PCI controller resets before it resets
> the PCI bus that resets the PCI devices on the bus), but that doesn't
> help the iommu.
> 
> [*] I have a vague idea that ideally we might want reset to be
> QOM-tree based rather than qbus-tree based. But getting there from
> here is non-trivial. And maybe what we really want is "objects,
> especially SoCs, that create children can define what their reset
> tree is, with the default being to reset all the QOM children".

yes. When I quickly thought about it, I had the idea that we could
reparent the vIOMMU device(s) to a default or a specific PHB and
order the reset calls from the PHB reset routine.

> Lots of non-thought-through complexity here ;-)

same ...

C.





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-07 17:47         ` Peter Xu
  2025-02-07 18:18           ` Peter Maydell
@ 2025-02-10  8:35           ` Eric Auger
  2025-02-10 14:18             ` Peter Xu
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Auger @ 2025-02-10  8:35 UTC (permalink / raw)
  To: Peter Xu, Peter Maydell
  Cc: eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang, imammedo,
	alex.williamson, clg, philmd, zhenzhong.duan, ddutile

Hi Peter,


On 2/7/25 6:47 PM, Peter Xu wrote:
> On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
>> (I wonder if we ought to suggest quiescing outstanding
>> DMA in the enter phase? But it's probably easier to fix
>> the iommus like this series does than try to get every
>> dma-capable pci device to do something different.)
> I wonder if we should provide some generic helper to register vIOMMU reset
> callbacks, so that we'll be sure any vIOMMU model impl that will register
> at exit() phase only, and do nothing during the initial two phases.  Then
> we can put some rich comment on that helper on why.
As discussed with Cédric, I think it shall think about having eventually
a base class for vIOMMU. Maybe this is something we can handle
afterwards though.
>
> Looks like it means the qemu reset model in the future can be a combination
> of device tree (which resets depth-first) and the three phases model.  We
> will start to use different approach to solve different problems.
>
> Maybe after we settle our mind, we should update the reset document,
> e.g. for device emulation developers, we need to be clear on where to
> quiesce the DMAs, and it must not happen at exit().  Both all devices and
> all iommu impls need to follow the rules to make it work like the plan.
The 3 phase documentation already states that you shouldn't do anything
in enter phase that can have side-effect on other devices. However I
agree we can add another example besides the qemu_irq line one.

Eric
>
> Thanks,
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-10  8:35           ` Eric Auger
@ 2025-02-10 14:18             ` Peter Xu
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2025-02-10 14:18 UTC (permalink / raw)
  To: Eric Auger
  Cc: Peter Maydell, eric.auger.pro, qemu-devel, qemu-arm, mst,
	jasowang, imammedo, alex.williamson, clg, philmd, zhenzhong.duan,
	ddutile

On Mon, Feb 10, 2025 at 09:35:53AM +0100, Eric Auger wrote:
> Hi Peter,

Eric,

> 
> 
> On 2/7/25 6:47 PM, Peter Xu wrote:
> > On Fri, Feb 07, 2025 at 04:58:39PM +0000, Peter Maydell wrote:
> >> (I wonder if we ought to suggest quiescing outstanding
> >> DMA in the enter phase? But it's probably easier to fix
> >> the iommus like this series does than try to get every
> >> dma-capable pci device to do something different.)
> > I wonder if we should provide some generic helper to register vIOMMU reset
> > callbacks, so that we'll be sure any vIOMMU model impl that will register
> > at exit() phase only, and do nothing during the initial two phases.  Then
> > we can put some rich comment on that helper on why.
> As discussed with Cédric, I think it shall think about having eventually
> a base class for vIOMMU. Maybe this is something we can handle
> afterwards though.

Yes agreed.

> >
> > Looks like it means the qemu reset model in the future can be a combination
> > of device tree (which resets depth-first) and the three phases model.  We
> > will start to use different approach to solve different problems.
> >
> > Maybe after we settle our mind, we should update the reset document,
> > e.g. for device emulation developers, we need to be clear on where to
> > quiesce the DMAs, and it must not happen at exit().  Both all devices and
> > all iommu impls need to follow the rules to make it work like the plan.
> The 3 phase documentation already states that you shouldn't do anything
> in enter phase that can have side-effect on other devices. However I
> agree we can add another example besides the qemu_irq line one.

The document will be relevant to two sides of things (so far not relevant
to enter() phase, but more on what we should put into the last two phases
either for vIOMMU impl or a PCIe device impl), that I commented in the
other reply to Peter.  I am not sure whether we need such fine granule
document, but in all cases I agree this can definitely be done later. :)

Thanks!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase
  2025-02-07 16:58       ` Peter Maydell
  2025-02-07 17:47         ` Peter Xu
@ 2025-02-10  8:40         ` Eric Auger
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-10  8:40 UTC (permalink / raw)
  To: Peter Maydell
  Cc: eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang, imammedo,
	peterx, alex.williamson, clg, philmd, zhenzhong.duan, ddutile


Hi Peter,

On 2/7/25 5:58 PM, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 16:50, Eric Auger <eric.auger@redhat.com> wrote:
>>
>>
>>
>> On 2/7/25 5:37 PM, Peter Maydell wrote:
>>> On Thu, 6 Feb 2025 at 14:23, Eric Auger <eric.auger@redhat.com> wrote:
>>>> Currently the iommu may be reset before the devices
>>>> it protects. For example this happens with virtio-scsi-pci.
>>>> when system_reset is issued from qmp monitor, spurious
>>>> "virtio: zero sized buffers are not allowed" warnings can
>>>> be observed.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>>  hw/arm/smmuv3.c     | 9 +++++----
>>>>  hw/arm/trace-events | 1 +
>>>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
>>>> index c0cf5df0f6..7522c32b24 100644
>>>> --- a/hw/arm/smmuv3.c
>>>> +++ b/hw/arm/smmuv3.c
>>>> @@ -1870,13 +1870,14 @@ static void smmu_init_irq(SMMUv3State *s, SysBusDevice *dev)
>>>>      }
>>>>  }
>>>>
>>>> -static void smmu_reset_hold(Object *obj, ResetType type)
>>>> +static void smmu_reset_exit(Object *obj, ResetType type)
>>>>  {
>>>>      SMMUv3State *s = ARM_SMMUV3(obj);
>>>>      SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);
>>>>
>>>> -    if (c->parent_phases.hold) {
>>>> -        c->parent_phases.hold(obj, type);
>>>> +    trace_smmu_reset_exit();
>>>> +    if (c->parent_phases.exit) {
>>>> +        c->parent_phases.exit(obj, type);
>>>>      }
>>> If we need to do something unexpected like reset
>>> register values in the exit phase rather than the
>>> hold phase, it's a good idea to have a comment explaining
>>> why, to avoid somebody coming along afterwards and tidying
>>> it up into the more usual arrangement.
>> sure
>>> If I understand correctly we need to keep the whole IOMMU
>>> config intact until the exit phase? What's the thing the
>>> device behind the IOMMU is trying to do during its reset
>>> that triggers the warning?
>> The virtio-pci-net continues to perform DMA requests and this causes
>> some weird messages such as:
>> "virtio: bogus descriptor or out of resources"
>>
>> Also VFIO devices may continue issuing DMAs causing translation faults
> Hmm, right. I guess this only happens with KVM, or can you
> trigger it in a TCG setup too? Anyway, presumably we can
I have only tested with KVM acceleration for now.
> rely on the devices quiescing all their outstanding DMA
> by the time the hold phase comes along.
>
> (I wonder if we ought to suggest quiescing outstanding
> DMA in the enter phase? But it's probably easier to fix
> the iommus like this series does than try to get every
> dma-capable pci device to do something different.)
at least we can document this, as Peter later suggests. Indeed fixing it
at vIOMMU level looked much easier for me, hoping that no other DMA
capable device would stop DMAs at exit phase

Eric
>
> thanks
> -- PMM
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 5/5] hw/vfio/common: Add a trace point in vfio_reset_handler
  2025-02-06 14:21 [PATCH 0/5] Fix vIOMMU reset order Eric Auger
                   ` (3 preceding siblings ...)
  2025-02-06 14:21 ` [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase Eric Auger
@ 2025-02-06 14:21 ` Eric Auger
  2025-02-07 17:18   ` Cédric Le Goater
  2025-02-07 11:09 ` [PATCH 0/5] Fix vIOMMU reset order Michael S. Tsirkin
  2025-02-07 16:54 ` Peter Xu
  6 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2025-02-06 14:21 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, jasowang, imammedo, peterx, alex.williamson, clg, philmd,
	zhenzhong.duan, ddutile

To ease the debug of reset sequence, let's add a trace point
in vfio_reset_handler()

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/vfio/common.c     | 1 +
 hw/vfio/trace-events | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f7499a9b74..173fb3a997 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1386,6 +1386,7 @@ void vfio_reset_handler(void *opaque)
 {
     VFIODevice *vbasedev;
 
+    trace_vfio_reset_handler();
     QLIST_FOREACH(vbasedev, &vfio_device_list, global_next) {
         if (vbasedev->dev->realized) {
             vbasedev->ops->vfio_compute_needs_reset(vbasedev);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index cab1cf1de0..c5385e1a4f 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -120,6 +120,7 @@ vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype
 vfio_legacy_dma_unmap_overflow_workaround(void) ""
 vfio_get_dirty_bitmap(uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start, uint64_t dirty_pages) "iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64" dirty_pages=%"PRIu64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
+vfio_reset_handler(void) ""
 
 # platform.c
 vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"
-- 
2.47.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] hw/vfio/common: Add a trace point in vfio_reset_handler
  2025-02-06 14:21 ` [PATCH 5/5] hw/vfio/common: Add a trace point in vfio_reset_handler Eric Auger
@ 2025-02-07 17:18   ` Cédric Le Goater
  0 siblings, 0 replies; 30+ messages in thread
From: Cédric Le Goater @ 2025-02-07 17:18 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, peter.maydell,
	mst, jasowang, imammedo, peterx, alex.williamson, philmd,
	zhenzhong.duan, ddutile

On 2/6/25 15:21, Eric Auger wrote:
> To ease the debug of reset sequence, let's add a trace point
> in vfio_reset_handler()
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/common.c     | 1 +
>   hw/vfio/trace-events | 1 +
>   2 files changed, 2 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f7499a9b74..173fb3a997 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1386,6 +1386,7 @@ void vfio_reset_handler(void *opaque)
>   {
>       VFIODevice *vbasedev;
>   
> +    trace_vfio_reset_handler();
>       QLIST_FOREACH(vbasedev, &vfio_device_list, global_next) {
>           if (vbasedev->dev->realized) {
>               vbasedev->ops->vfio_compute_needs_reset(vbasedev);
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index cab1cf1de0..c5385e1a4f 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -120,6 +120,7 @@ vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype
>   vfio_legacy_dma_unmap_overflow_workaround(void) ""
>   vfio_get_dirty_bitmap(uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start, uint64_t dirty_pages) "iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64" dirty_pages=%"PRIu64
>   vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
> +vfio_reset_handler(void) ""
>   
>   # platform.c
>   vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-06 14:21 [PATCH 0/5] Fix vIOMMU reset order Eric Auger
                   ` (4 preceding siblings ...)
  2025-02-06 14:21 ` [PATCH 5/5] hw/vfio/common: Add a trace point in vfio_reset_handler Eric Auger
@ 2025-02-07 11:09 ` Michael S. Tsirkin
  2025-02-07 16:40   ` Peter Maydell
  2025-02-07 16:54 ` Peter Xu
  6 siblings, 1 reply; 30+ messages in thread
From: Michael S. Tsirkin @ 2025-02-07 11:09 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell, jasowang,
	imammedo, peterx, alex.williamson, clg, philmd, zhenzhong.duan,
	ddutile

On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> This is a follow-up of Peter's attempt to fix the fact that
> vIOMMUs are likely to be reset before the device they protect:
> 
> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> 
> This is especially observed with virtio devices when a qmp system_reset
> command is sent but also with VFIO devices.
> 
> This series puts the vIOMMU reset in the 3-phase exit callback.
> 
> This scheme was tested successful with virtio-devices and some
> VFIO devices. Nevertheless not all the topologies have been
> tested yet.
> 
> Best Regards
> 
> Eric



Looks good.


Acked-by: Michael S. Tsirkin <mst@redhat.com>

How should this be merged?
I supposed I can merge the 1st three and the other
two by the respective maintainers?
I don't think there's a dependency here, right?

> This series can be found at:
> https://github.com/eauger/qemu/tree/viommu-3phase-reset-v1
> 
> Eric Auger (4):
>   hw/virtio/virtio-iommu: Migrate to 3-phase reset
>   hw/i386/intel-iommu: Migrate to 3-phase reset
>   hw/arm/smmuv3: Move reset to exit phase
>   hw/vfio/common: Add a trace point in vfio_reset_handler
> 
> Peter Xu (1):
>   hw/i386/intel_iommu: Tear down address spaces before IOMMU reset
> 
>  hw/arm/smmuv3.c          |  9 +++++----
>  hw/i386/intel_iommu.c    | 10 ++++++----
>  hw/vfio/common.c         |  1 +
>  hw/virtio/virtio-iommu.c |  9 +++++----
>  hw/arm/trace-events      |  1 +
>  hw/i386/trace-events     |  1 +
>  hw/vfio/trace-events     |  1 +
>  hw/virtio/trace-events   |  2 +-
>  8 files changed, 21 insertions(+), 13 deletions(-)
> 
> -- 
> 2.47.1



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-07 11:09 ` [PATCH 0/5] Fix vIOMMU reset order Michael S. Tsirkin
@ 2025-02-07 16:40   ` Peter Maydell
  2025-02-07 16:52     ` Eric Auger
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2025-02-07 16:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, jasowang,
	imammedo, peterx, alex.williamson, clg, philmd, zhenzhong.duan,
	ddutile

On Fri, 7 Feb 2025 at 11:10, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> > This is a follow-up of Peter's attempt to fix the fact that
> > vIOMMUs are likely to be reset before the device they protect:
> >
> > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> >
> > This is especially observed with virtio devices when a qmp system_reset
> > command is sent but also with VFIO devices.
> >
> > This series puts the vIOMMU reset in the 3-phase exit callback.
> >
> > This scheme was tested successful with virtio-devices and some
> > VFIO devices. Nevertheless not all the topologies have been
> > tested yet.
> >
> > Best Regards
> >
> > Eric
>
>
>
> Looks good.
>
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> How should this be merged?
> I supposed I can merge the 1st three and the other
> two by the respective maintainers?
> I don't think there's a dependency here, right?

If we're happy with the design of the series I think it
would be simpler to take the whole thing through one
tree, rather than split it up. I had a question on the
smmu patch which is mostly about clarifying what the
issue is that we're running into, but in principle
I'm happy for you to take that patch as well.

-- PMM


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-07 16:40   ` Peter Maydell
@ 2025-02-07 16:52     ` Eric Auger
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-07 16:52 UTC (permalink / raw)
  To: Peter Maydell, Michael S. Tsirkin
  Cc: eric.auger.pro, qemu-devel, qemu-arm, jasowang, imammedo, peterx,
	alex.williamson, clg, philmd, zhenzhong.duan, ddutile

Hi Peter, Michael,


On 2/7/25 5:40 PM, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 11:10, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
>>> This is a follow-up of Peter's attempt to fix the fact that
>>> vIOMMUs are likely to be reset before the device they protect:
>>>
>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
>>>
>>> This is especially observed with virtio devices when a qmp system_reset
>>> command is sent but also with VFIO devices.
>>>
>>> This series puts the vIOMMU reset in the 3-phase exit callback.
>>>
>>> This scheme was tested successful with virtio-devices and some
>>> VFIO devices. Nevertheless not all the topologies have been
>>> tested yet.
>>>
>>> Best Regards
>>>
>>> Eric
>>
>>
>> Looks good.
>>
>>
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> How should this be merged?
>> I supposed I can merge the 1st three and the other
>> two by the respective maintainers?
>> I don't think there's a dependency here, right?
> If we're happy with the design of the series I think it
> would be simpler to take the whole thing through one
> tree, rather than split it up. I had a question on the
> smmu patch which is mostly about clarifying what the
> issue is that we're running into, but in principle
> I'm happy for you to take that patch as well.
Thank you for the swift review. I will respin to add some comments along
with the reset function and the relevance of exit phase + add more
details in the cover letter.

Thanks

Erc
>
> -- PMM
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-06 14:21 [PATCH 0/5] Fix vIOMMU reset order Eric Auger
                   ` (5 preceding siblings ...)
  2025-02-07 11:09 ` [PATCH 0/5] Fix vIOMMU reset order Michael S. Tsirkin
@ 2025-02-07 16:54 ` Peter Xu
  2025-02-07 17:06   ` Peter Maydell
  2025-02-07 17:25   ` Cédric Le Goater
  6 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2025-02-07 16:54 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell, mst,
	jasowang, imammedo, alex.williamson, clg, philmd, zhenzhong.duan,
	ddutile

On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> This is a follow-up of Peter's attempt to fix the fact that
> vIOMMUs are likely to be reset before the device they protect:
> 
> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> 
> This is especially observed with virtio devices when a qmp system_reset
> command is sent but also with VFIO devices.
> 
> This series puts the vIOMMU reset in the 3-phase exit callback.
> 
> This scheme was tested successful with virtio-devices and some
> VFIO devices. Nevertheless not all the topologies have been
> tested yet.

Eric,

It's great to know that we seem to be able to fix everything in such small
changeset!

I would like to double check two things with you here:

  - For VFIO's reset hook, looks like we have landed more changes so that
    vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
    reset during "hold" phase only (via legacy_reset_hold()).  That part
    will make sure vIOMMU (if switching to exit()-only reset) will order
    properly with VFIO.  Is my understanding correct here?

  - Is it possible if some PCIe devices that will provide its own
    phase.exit(), would it matter on the order of PCIe device's
    phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
    exit()-only approach like this one)?

PS: it would be great to attach such information in either cover letter or
commit message.  But definitely not a request to repost the patchset, if
Michael would have Message-ID when merge that'll be far enough to help
anyone find this discussion again.

Thanks!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-07 16:54 ` Peter Xu
@ 2025-02-07 17:06   ` Peter Maydell
  2025-02-07 17:31     ` Peter Xu
  2025-02-07 17:25   ` Cédric Le Goater
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2025-02-07 17:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang,
	imammedo, alex.williamson, clg, philmd, zhenzhong.duan, ddutile

On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> > This is a follow-up of Peter's attempt to fix the fact that
> > vIOMMUs are likely to be reset before the device they protect:
> >
> > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> >
> > This is especially observed with virtio devices when a qmp system_reset
> > command is sent but also with VFIO devices.
> >
> > This series puts the vIOMMU reset in the 3-phase exit callback.
> >
> > This scheme was tested successful with virtio-devices and some
> > VFIO devices. Nevertheless not all the topologies have been
> > tested yet.
>
> Eric,
>
> It's great to know that we seem to be able to fix everything in such small
> changeset!
>
> I would like to double check two things with you here:
>
>   - For VFIO's reset hook, looks like we have landed more changes so that
>     vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
>     reset during "hold" phase only (via legacy_reset_hold()).  That part
>     will make sure vIOMMU (if switching to exit()-only reset) will order
>     properly with VFIO.  Is my understanding correct here?

Yes, we now do a reset of the whole system as a three-phase setup,
and the old pre-three-phase reset APIs like qemu_register_reset() and
device_class_set_legacy_reset() all happen during the "hold" phase.

>   - Is it possible if some PCIe devices that will provide its own
>     phase.exit(), would it matter on the order of PCIe device's
>     phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
>     exit()-only approach like this one)?

It's certainly possible for a PCIe device to implement
a three-phase reset which does things in the exit phase. However
I think I would say that such a device which didn't cancel all
outstanding DMA operations during either 'enter' or 'hold'
phases would be broken. If it did some other things during
the 'exit' phase I don't think the ordering of those vs the
iommu 'exit' handling should matter.

(To some extent the splitting into three phases is trying
to set up a consistent model as outlined in docs/devel/reset.rst
and to some extent it's just a convenient way to get a basic
"this reset thing I need to do must happen after some other
device has done its reset things" which you can achieve
by ad-hoc putting them in different phases. Ideally we get
mostly the former and a little pragmatic dose of the latter,
but the consistent model is not very solidly nailed down
so I have a feeling the proportions may not be quite as
lopsided as we'd like :-) )

thanks
-- PMM

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-07 17:06   ` Peter Maydell
@ 2025-02-07 17:31     ` Peter Xu
  2025-02-10  8:45       ` Eric Auger
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2025-02-07 17:31 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Eric Auger, eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang,
	imammedo, alex.williamson, clg, philmd, zhenzhong.duan, ddutile

On Fri, Feb 07, 2025 at 05:06:20PM +0000, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> > > This is a follow-up of Peter's attempt to fix the fact that
> > > vIOMMUs are likely to be reset before the device they protect:
> > >
> > > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> > > https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
> > >
> > > This is especially observed with virtio devices when a qmp system_reset
> > > command is sent but also with VFIO devices.
> > >
> > > This series puts the vIOMMU reset in the 3-phase exit callback.
> > >
> > > This scheme was tested successful with virtio-devices and some
> > > VFIO devices. Nevertheless not all the topologies have been
> > > tested yet.
> >
> > Eric,
> >
> > It's great to know that we seem to be able to fix everything in such small
> > changeset!
> >
> > I would like to double check two things with you here:
> >
> >   - For VFIO's reset hook, looks like we have landed more changes so that
> >     vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
> >     reset during "hold" phase only (via legacy_reset_hold()).  That part
> >     will make sure vIOMMU (if switching to exit()-only reset) will order
> >     properly with VFIO.  Is my understanding correct here?
> 
> Yes, we now do a reset of the whole system as a three-phase setup,
> and the old pre-three-phase reset APIs like qemu_register_reset() and
> device_class_set_legacy_reset() all happen during the "hold" phase.
> 
> >   - Is it possible if some PCIe devices that will provide its own
> >     phase.exit(), would it matter on the order of PCIe device's
> >     phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
> >     exit()-only approach like this one)?
> 
> It's certainly possible for a PCIe device to implement
> a three-phase reset which does things in the exit phase. However
> I think I would say that such a device which didn't cancel all
> outstanding DMA operations during either 'enter' or 'hold'
> phases would be broken. If it did some other things during
> the 'exit' phase I don't think the ordering of those vs the
> iommu 'exit' handling should matter.

Yes, this sounds fair.

> 
> (To some extent the splitting into three phases is trying
> to set up a consistent model as outlined in docs/devel/reset.rst
> and to some extent it's just a convenient way to get a basic
> "this reset thing I need to do must happen after some other
> device has done its reset things" which you can achieve
> by ad-hoc putting them in different phases. Ideally we get
> mostly the former and a little pragmatic dose of the latter,
> but the consistent model is not very solidly nailed down
> so I have a feeling the proportions may not be quite as
> lopsided as we'd like :-) )

Yes, it's a good move that we can have other ways to fix all the problems
without major surgery, and it also looks solid and clean if we have plan to
fix any outlier PCIe devices.

If there will be a repost after all, not sure if Eric would like to add
some of above discussions into either some commit messages or cover letter.
Or some comment in the code might be even better.

Thanks!

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-07 17:31     ` Peter Xu
@ 2025-02-10  8:45       ` Eric Auger
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-10  8:45 UTC (permalink / raw)
  To: Peter Xu, Peter Maydell
  Cc: eric.auger.pro, qemu-devel, qemu-arm, mst, jasowang, imammedo,
	alex.williamson, clg, philmd, zhenzhong.duan, ddutile


Hi,

On 2/7/25 6:31 PM, Peter Xu wrote:
> On Fri, Feb 07, 2025 at 05:06:20PM +0000, Peter Maydell wrote:
>> On Fri, 7 Feb 2025 at 16:54, Peter Xu <peterx@redhat.com> wrote:
>>> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
>>>> This is a follow-up of Peter's attempt to fix the fact that
>>>> vIOMMUs are likely to be reset before the device they protect:
>>>>
>>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
>>>>
>>>> This is especially observed with virtio devices when a qmp system_reset
>>>> command is sent but also with VFIO devices.
>>>>
>>>> This series puts the vIOMMU reset in the 3-phase exit callback.
>>>>
>>>> This scheme was tested successful with virtio-devices and some
>>>> VFIO devices. Nevertheless not all the topologies have been
>>>> tested yet.
>>> Eric,
>>>
>>> It's great to know that we seem to be able to fix everything in such small
>>> changeset!
>>>
>>> I would like to double check two things with you here:
>>>
>>>   - For VFIO's reset hook, looks like we have landed more changes so that
>>>     vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
>>>     reset during "hold" phase only (via legacy_reset_hold()).  That part
>>>     will make sure vIOMMU (if switching to exit()-only reset) will order
>>>     properly with VFIO.  Is my understanding correct here?
>> Yes, we now do a reset of the whole system as a three-phase setup,
>> and the old pre-three-phase reset APIs like qemu_register_reset() and
>> device_class_set_legacy_reset() all happen during the "hold" phase.
>>
>>>   - Is it possible if some PCIe devices that will provide its own
>>>     phase.exit(), would it matter on the order of PCIe device's
>>>     phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
>>>     exit()-only approach like this one)?
>> It's certainly possible for a PCIe device to implement
>> a three-phase reset which does things in the exit phase. However
>> I think I would say that such a device which didn't cancel all
>> outstanding DMA operations during either 'enter' or 'hold'
>> phases would be broken. If it did some other things during
>> the 'exit' phase I don't think the ordering of those vs the
>> iommu 'exit' handling should matter.
> Yes, this sounds fair.
>
>> (To some extent the splitting into three phases is trying
>> to set up a consistent model as outlined in docs/devel/reset.rst
>> and to some extent it's just a convenient way to get a basic
>> "this reset thing I need to do must happen after some other
>> device has done its reset things" which you can achieve
>> by ad-hoc putting them in different phases. Ideally we get
>> mostly the former and a little pragmatic dose of the latter,
>> but the consistent model is not very solidly nailed down
>> so I have a feeling the proportions may not be quite as
>> lopsided as we'd like :-) )
> Yes, it's a good move that we can have other ways to fix all the problems
> without major surgery, and it also looks solid and clean if we have plan to
> fix any outlier PCIe devices.
>
> If there will be a repost after all, not sure if Eric would like to add
> some of above discussions into either some commit messages or cover letter.
> Or some comment in the code might be even better.
Yes I will definitively augment commit msgs/cover letter with all those
considerations. Thank you very much for this rich discussion!

Eric
>
> Thanks!
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-07 16:54 ` Peter Xu
  2025-02-07 17:06   ` Peter Maydell
@ 2025-02-07 17:25   ` Cédric Le Goater
  2025-02-10  8:49     ` Eric Auger
  1 sibling, 1 reply; 30+ messages in thread
From: Cédric Le Goater @ 2025-02-07 17:25 UTC (permalink / raw)
  To: Peter Xu, Eric Auger
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell, mst,
	jasowang, imammedo, alex.williamson, philmd, zhenzhong.duan,
	ddutile

On 2/7/25 17:54, Peter Xu wrote:
> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
>> This is a follow-up of Peter's attempt to fix the fact that
>> vIOMMUs are likely to be reset before the device they protect:
>>
>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
>>
>> This is especially observed with virtio devices when a qmp system_reset
>> command is sent but also with VFIO devices.
>>
>> This series puts the vIOMMU reset in the 3-phase exit callback.
>>
>> This scheme was tested successful with virtio-devices and some
>> VFIO devices. Nevertheless not all the topologies have been
>> tested yet.
> 
> Eric,
> 
> It's great to know that we seem to be able to fix everything in such small
> changeset!
> 
> I would like to double check two things with you here:
> 
>    - For VFIO's reset hook, looks like we have landed more changes so that
>      vfio's reset function is now a TYPE_LEGACY_RESET, and it always do the
>      reset during "hold" phase only (via legacy_reset_hold()).  That part
>      will make sure vIOMMU (if switching to exit()-only reset) will order
>      properly with VFIO.  Is my understanding correct here?


Eric,

We were still seeing DMA errors from VFIO devices :

   VFIO_MAP_DMA failed: Bad address

with this series at shutdown (machine or OS) when using an intel_iommu
device. We could see that the VIOMMU was reset and the device DMAs
were still alive. Do you know why now ?

Thanks,

C.


> 
>    - Is it possible if some PCIe devices that will provide its own
>      phase.exit(), would it matter on the order of PCIe device's
>      phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
>      exit()-only approach like this one)?
> 
> PS: it would be great to attach such information in either cover letter or
> commit message.  But definitely not a request to repost the patchset, if
> Michael would have Message-ID when merge that'll be far enough to help
> anyone find this discussion again.
> 
> Thanks!
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] Fix vIOMMU reset order
  2025-02-07 17:25   ` Cédric Le Goater
@ 2025-02-10  8:49     ` Eric Auger
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2025-02-10  8:49 UTC (permalink / raw)
  To: Cédric Le Goater, Peter Xu
  Cc: eric.auger.pro, qemu-devel, qemu-arm, peter.maydell, mst,
	jasowang, imammedo, alex.williamson, philmd, zhenzhong.duan,
	ddutile

Hi Cédric,


On 2/7/25 6:25 PM, Cédric Le Goater wrote:
> On 2/7/25 17:54, Peter Xu wrote:
>> On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
>>> This is a follow-up of Peter's attempt to fix the fact that
>>> vIOMMUs are likely to be reset before the device they protect:
>>>
>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>> https://lore.kernel.org/all/20240117091559.144730-1-peterx@redhat.com/
>>>
>>> This is especially observed with virtio devices when a qmp system_reset
>>> command is sent but also with VFIO devices.
>>>
>>> This series puts the vIOMMU reset in the 3-phase exit callback.
>>>
>>> This scheme was tested successful with virtio-devices and some
>>> VFIO devices. Nevertheless not all the topologies have been
>>> tested yet.
>>
>> Eric,
>>
>> It's great to know that we seem to be able to fix everything in such
>> small
>> changeset!
>>
>> I would like to double check two things with you here:
>>
>>    - For VFIO's reset hook, looks like we have landed more changes so
>> that
>>      vfio's reset function is now a TYPE_LEGACY_RESET, and it always
>> do the
>>      reset during "hold" phase only (via legacy_reset_hold()).  That
>> part
>>      will make sure vIOMMU (if switching to exit()-only reset) will
>> order
>>      properly with VFIO.  Is my understanding correct here?
>
>
> Eric,
>
> We were still seeing DMA errors from VFIO devices :
>
>   VFIO_MAP_DMA failed: Bad address
>
> with this series at shutdown (machine or OS) when using an intel_iommu
> device. We could see that the VIOMMU was reset and the device DMAs
> were still alive. Do you know why now ?

I have started debugging this other case. At first sight this looks like
a different problem. First this occurs on a qmp system_powerdown
The error messages do not occur on qemu reset but rather as a result of
the guest disabling the intel iommu anc curiously when the aliased IOMMU
MR (nodma) is re-enabled. I need more time to debug this.

Eric

>
> Thanks,
>
> C.
>
>
>>
>>    - Is it possible if some PCIe devices that will provide its own
>>      phase.exit(), would it matter on the order of PCIe device's
>>      phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
>>      exit()-only approach like this one)?
>>
>> PS: it would be great to attach such information in either cover
>> letter or
>> commit message.  But definitely not a request to repost the patchset, if
>> Michael would have Message-ID when merge that'll be far enough to help
>> anyone find this discussion again.
>>
>> Thanks!
>>
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2025-02-17  7:32 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-06 14:21 [PATCH 0/5] Fix vIOMMU reset order Eric Auger
2025-02-06 14:21 ` [PATCH 1/5] hw/virtio/virtio-iommu: Migrate to 3-phase reset Eric Auger
2025-02-06 14:21 ` [PATCH 2/5] hw/i386/intel-iommu: " Eric Auger
2025-02-06 14:21 ` [PATCH 3/5] hw/i386/intel_iommu: Tear down address spaces before IOMMU reset Eric Auger
2025-02-17  3:02   ` Duan, Zhenzhong
2025-02-17  7:31     ` Eric Auger
2025-02-06 14:21 ` [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase Eric Auger
2025-02-07 16:37   ` Peter Maydell
2025-02-07 16:50     ` Eric Auger
2025-02-07 16:58       ` Peter Maydell
2025-02-07 17:47         ` Peter Xu
2025-02-07 18:18           ` Peter Maydell
2025-02-10  8:47             ` Eric Auger
2025-02-10 14:14             ` Peter Xu
2025-02-10 14:22               ` Peter Maydell
2025-02-12 17:28                 ` Cédric Le Goater
2025-02-10  8:35           ` Eric Auger
2025-02-10 14:18             ` Peter Xu
2025-02-10  8:40         ` Eric Auger
2025-02-06 14:21 ` [PATCH 5/5] hw/vfio/common: Add a trace point in vfio_reset_handler Eric Auger
2025-02-07 17:18   ` Cédric Le Goater
2025-02-07 11:09 ` [PATCH 0/5] Fix vIOMMU reset order Michael S. Tsirkin
2025-02-07 16:40   ` Peter Maydell
2025-02-07 16:52     ` Eric Auger
2025-02-07 16:54 ` Peter Xu
2025-02-07 17:06   ` Peter Maydell
2025-02-07 17:31     ` Peter Xu
2025-02-10  8:45       ` Eric Auger
2025-02-07 17:25   ` Cédric Le Goater
2025-02-10  8:49     ` Eric Auger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).