qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
@ 2025-07-16  7:31 Sairaj Kodilkar
  2025-07-16  7:31 ` [PATCH 1/7] hw/i386/amd_iommu: Fix MMIO register write tracing Sairaj Kodilkar
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Sairaj Kodilkar

This series provides few cleanups and fixes for the amd iommu

The patches are based on top of 56c6e249b698 (v10.0.0-rc3) and Alejandro's
DMA remapping series [1].

[1] https://lore.kernel.org/all/20250502021605.1795985-1-alejandro.j.jimenez@oracle.com/

The series is uploaded on github:
https://github.com/AMDESE/qemu-iommu/tree/sarunkod/alej%2Bcleanup-v1

Sairaj Kodilkar (7):
  hw/i386/amd_iommu: Fix MMIO register write tracing
  hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
  hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
  hw/i386/amd_iommu: Support MMIO writes to the status register
  hw/i386/amd_iommu: Fix event log generation
  hw/i386/amd_iommu: Fix handling device on buses != 0
  hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup

 hw/i386/amd_iommu.c | 217 ++++++++++++++++++++++++++++----------------
 hw/i386/amd_iommu.h |   9 +-
 2 files changed, 146 insertions(+), 80 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/7] hw/i386/amd_iommu: Fix MMIO register write tracing
  2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
@ 2025-07-16  7:31 ` Sairaj Kodilkar
  2025-07-16 12:31   ` Philippe Mathieu-Daudé
  2025-07-16  7:31 ` [PATCH 2/7] hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field Sairaj Kodilkar
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Sairaj Kodilkar, Vasant Hegde

Define separate functions to trace MMIO write accesses instead of using
`trace_amdvi_mmio_read()` for both read and write.

Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
---
 hw/i386/amd_iommu.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index e8af24cedb02..7a9d90f00bee 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1358,18 +1358,31 @@ static void amdvi_cmdbuf_run(AMDVIState *s)
     }
 }
 
-static void amdvi_mmio_trace(hwaddr addr, unsigned size)
+static inline uint8_t amdvi_mmio_get_index(hwaddr addr)
 {
     uint8_t index = (addr & ~0x2000) / 8;
 
     if ((addr & 0x2000)) {
         /* high table */
         index = index >= AMDVI_MMIO_REGS_HIGH ? AMDVI_MMIO_REGS_HIGH : index;
-        trace_amdvi_mmio_read(amdvi_mmio_high[index], addr, size, addr & ~0x07);
     } else {
         index = index >= AMDVI_MMIO_REGS_LOW ? AMDVI_MMIO_REGS_LOW : index;
-        trace_amdvi_mmio_read(amdvi_mmio_low[index], addr, size, addr & ~0x07);
     }
+
+    return index;
+}
+
+static void amdvi_mmio_trace_read(hwaddr addr, unsigned size)
+{
+    uint8_t index = amdvi_mmio_get_index(addr);
+    trace_amdvi_mmio_read(amdvi_mmio_low[index], addr, size, addr & ~0x07);
+}
+
+static void amdvi_mmio_trace_write(hwaddr addr, unsigned size, uint64_t val)
+{
+    uint8_t index = amdvi_mmio_get_index(addr);
+    trace_amdvi_mmio_write(amdvi_mmio_low[index], addr, size, val,
+                           addr & ~0x07);
 }
 
 static uint64_t amdvi_mmio_read(void *opaque, hwaddr addr, unsigned size)
@@ -1389,7 +1402,7 @@ static uint64_t amdvi_mmio_read(void *opaque, hwaddr addr, unsigned size)
     } else if (size == 8) {
         val = amdvi_readq(s, addr);
     }
-    amdvi_mmio_trace(addr, size);
+    amdvi_mmio_trace_read(addr, size);
 
     return val;
 }
@@ -1536,7 +1549,7 @@ static void amdvi_mmio_write(void *opaque, hwaddr addr, uint64_t val,
         return;
     }
 
-    amdvi_mmio_trace(addr, size);
+    amdvi_mmio_trace_write(addr, size, val);
     switch (addr & ~0x07) {
     case AMDVI_MMIO_CONTROL:
         amdvi_mmio_reg_write(s, size, val, addr);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/7] hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
  2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
  2025-07-16  7:31 ` [PATCH 1/7] hw/i386/amd_iommu: Fix MMIO register write tracing Sairaj Kodilkar
@ 2025-07-16  7:31 ` Sairaj Kodilkar
  2025-07-16 12:35   ` Philippe Mathieu-Daudé
  2025-07-16  7:31 ` [PATCH 3/7] hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end Sairaj Kodilkar
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Sairaj Kodilkar, Vasant Hegde

The ats_enabled field is set using HTTUNEN, which is wrong.
Fix this by removing the field as it is never used.

Fixes: d29a09ca68428 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
---
 hw/i386/amd_iommu.c | 2 --
 hw/i386/amd_iommu.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 7a9d90f00bee..c8fa98142940 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1412,7 +1412,6 @@ static void amdvi_handle_control_write(AMDVIState *s)
     unsigned long control = amdvi_readq(s, AMDVI_MMIO_CONTROL);
     s->enabled = !!(control & AMDVI_MMIO_CONTROL_AMDVIEN);
 
-    s->ats_enabled = !!(control & AMDVI_MMIO_CONTROL_HTTUNEN);
     s->evtlog_enabled = s->enabled && !!(control &
                         AMDVI_MMIO_CONTROL_EVENTLOGEN);
 
@@ -2263,7 +2262,6 @@ static void amdvi_init(AMDVIState *s)
     s->excl_allow = false;
     s->mmio_enabled = false;
     s->enabled = false;
-    s->ats_enabled = false;
     s->cmdbuf_enabled = false;
 
     /* reset MMIO */
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index fc4d2f7a4575..62641b779ca3 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -368,7 +368,6 @@ struct AMDVIState {
     uint64_t mmio_addr;
 
     bool enabled;                /* IOMMU enabled                */
-    bool ats_enabled;            /* address translation enabled  */
     bool cmdbuf_enabled;         /* command buffer enabled       */
     bool evtlog_enabled;         /* event log enabled            */
     bool excl_enabled;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/7] hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
  2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
  2025-07-16  7:31 ` [PATCH 1/7] hw/i386/amd_iommu: Fix MMIO register write tracing Sairaj Kodilkar
  2025-07-16  7:31 ` [PATCH 2/7] hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field Sairaj Kodilkar
@ 2025-07-16  7:31 ` Sairaj Kodilkar
  2025-07-16  7:31 ` [PATCH 4/7] hw/i386/amd_iommu: Support MMIO writes to the status register Sairaj Kodilkar
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Sairaj Kodilkar, Vasant Hegde

Setting up IOAPIC memory region requires mr_sys and mr_ir. Currently
these two memory regions are setup after the initializing the IOAPIC
memory region, which cause `amdvi_host_dma_iommu()` to use unitialized
mr_sys and mr_ir.

Move the IOAPIC memory region initialization to the end in order to use
the mr_sys and mr_ir regions after they are fully initialized.

Fixes: 577c470f4326 ("x86_iommu/amd: Prepare for interrupt remap support")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
---
 hw/i386/amd_iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index c8fa98142940..784be78f402d 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -2343,9 +2343,6 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    /* Pseudo address space under root PCI bus. */
-    x86ms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
-
     /* set up MMIO */
     memory_region_init_io(&s->mr_mmio, OBJECT(s), &mmio_mem_ops, s,
                           "amdvi-mmio", AMDVI_MMIO_SIZE);
@@ -2368,6 +2365,9 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
     memory_region_add_subregion_overlap(&s->mr_sys, AMDVI_INT_ADDR_FIRST,
                                         &s->mr_ir, 1);
 
+    /* Pseudo address space under root PCI bus. */
+    x86ms->ioapic_as = amdvi_host_dma_iommu(bus, s, AMDVI_IOAPIC_SB_DEVID);
+
     if (kvm_enabled() && x86ms->apic_id_limit > 255 && !s->xtsup) {
         error_report("AMD IOMMU with x2APIC configuration requires xtsup=on");
         exit(EXIT_FAILURE);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/7] hw/i386/amd_iommu: Support MMIO writes to the status register
  2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
                   ` (2 preceding siblings ...)
  2025-07-16  7:31 ` [PATCH 3/7] hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end Sairaj Kodilkar
@ 2025-07-16  7:31 ` Sairaj Kodilkar
  2025-07-16 14:27   ` Ethan MILON
  2025-07-16  7:31 ` [PATCH 5/7] hw/i386/amd_iommu: Fix event log generation Sairaj Kodilkar
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Sairaj Kodilkar, Vasant Hegde

Support the writes to the status register so that guest can reset the
EventOverflow, EventLogInt, ComWaitIntr, etc bits after servicing the
respective interrupt.

Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
---
 hw/i386/amd_iommu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 784be78f402d..e0f4220b8f25 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1613,6 +1613,9 @@ static void amdvi_mmio_write(void *opaque, hwaddr addr, uint64_t val,
         amdvi_mmio_reg_write(s, size, val, addr);
         amdvi_handle_pprtail_write(s);
         break;
+    case AMDVI_MMIO_STATUS:
+        amdvi_mmio_reg_write(s, size, val, addr);
+        break;
     }
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/7] hw/i386/amd_iommu: Fix event log generation
  2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
                   ` (3 preceding siblings ...)
  2025-07-16  7:31 ` [PATCH 4/7] hw/i386/amd_iommu: Support MMIO writes to the status register Sairaj Kodilkar
@ 2025-07-16  7:31 ` Sairaj Kodilkar
  2025-07-16 14:50   ` Ethan MILON
  2025-07-16  7:31 ` [PATCH 6/7] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Sairaj Kodilkar, Vasant Hegde

Current event logging code is broken, because of following issues

1. The code uses '|' instead of '&' to test the bit field, which causes
   vIOMMU to generate overflow interrupt for every log entry.
2. Code does not update the eventlog tail MMIO register after adding an
   entry to the buffer, because of which guest cannot process new
   entries (as head == tail means buffer is empty).
3. Compares eventlog tail (which is byte offset in the buffer) to
   eventlog length (which is number of maximum entries in the buffer).
   This causes vIOMMU to generate only fix number of event logs, after
   which it keeps on generating overflow interrupts, without
   actually resetting the log buffer.
4. Updates ComWaitInt instead of EventLogInt bitfield in Status
   register. Guest checks this field to see if there are new event log
   entries in the buffer.

Fix above issues, so that guest can process event log entries.

Fixes: d29a09ca68428 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
---
 hw/i386/amd_iommu.c | 20 ++++++++++++++++----
 hw/i386/amd_iommu.h |  1 +
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index e0f4220b8f25..a34062153194 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -172,7 +172,7 @@ static void amdvi_writeq(AMDVIState *s, hwaddr addr, uint64_t val)
 /* OR a 64-bit register with a 64-bit value */
 static bool amdvi_test_mask(AMDVIState *s, hwaddr addr, uint64_t val)
 {
-    return amdvi_readq(s, addr) | val;
+    return amdvi_readq(s, addr) & val;
 }
 
 /* OR a 64-bit register with a 64-bit value storing result in the register */
@@ -201,16 +201,26 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s)
     }
 }
 
+static uint32_t get_next_eventlog_entry(AMDVIState *s)
+{
+    uint32_t evtlog_size = s->evtlog_len * AMDVI_EVENT_LEN;
+    return (s->evtlog_tail + AMDVI_EVENT_LEN) % evtlog_size;
+}
+
 static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
 {
+    uint32_t evtlog_tail_next;
+
     /* event logging not enabled */
     if (!s->evtlog_enabled || amdvi_test_mask(s, AMDVI_MMIO_STATUS,
         AMDVI_MMIO_STATUS_EVT_OVF)) {
         return;
     }
 
+    evtlog_tail_next = get_next_eventlog_entry(s);
+
     /* event log buffer full */
-    if (s->evtlog_tail >= s->evtlog_len) {
+    if (evtlog_tail_next == s->evtlog_head) {
         amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVT_OVF);
         /* generate interrupt */
         amdvi_generate_msi_interrupt(s);
@@ -222,8 +232,10 @@ static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
         trace_amdvi_evntlog_fail(s->evtlog, s->evtlog_tail);
     }
 
-    s->evtlog_tail += AMDVI_EVENT_LEN;
-    amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_COMP_INT);
+    s->evtlog_tail = evtlog_tail_next;
+    amdvi_writeq(s, AMDVI_MMIO_EVENT_TAIL, s->evtlog_tail);
+
+    amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVENT_INT);
     amdvi_generate_msi_interrupt(s);
 }
 
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 62641b779ca3..3dd4e7e3e8b8 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -111,6 +111,7 @@
 #define AMDVI_MMIO_STATUS_CMDBUF_RUN  (1 << 4)
 #define AMDVI_MMIO_STATUS_EVT_RUN     (1 << 3)
 #define AMDVI_MMIO_STATUS_COMP_INT    (1 << 2)
+#define AMDVI_MMIO_STATUS_EVENT_INT   (1 << 1)
 #define AMDVI_MMIO_STATUS_EVT_OVF     (1 << 0)
 
 #define AMDVI_CMDBUF_ID_BYTE              0x07
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/7] hw/i386/amd_iommu: Fix handling device on buses != 0
  2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
                   ` (4 preceding siblings ...)
  2025-07-16  7:31 ` [PATCH 5/7] hw/i386/amd_iommu: Fix event log generation Sairaj Kodilkar
@ 2025-07-16  7:31 ` Sairaj Kodilkar
  2025-07-16 15:18   ` Ethan MILON
  2025-07-16  7:31 ` [PATCH 7/7] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup Sairaj Kodilkar
  2025-07-16 12:37 ` [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Philippe Mathieu-Daudé
  7 siblings, 1 reply; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Sairaj Kodilkar, Alexey Kardashevskiy

The AMD IOMMU is set up at boot time and uses PCI bus numbers + devfn
for indexing into DTE. The problem is that before the guest started,
all PCI bus numbers are 0 as no PCI discovery happened yet (BIOS or/and
kernel will do that later) so relying on the bus number is wrong.
The immediate effect is emulated devices cannot do DMA when places on
a bus other that 0.

Replace static array of address_space with hash table which uses devfn and
PCIBus* for key as it is not going to change after the guest is booted.

Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
---
 hw/i386/amd_iommu.c | 124 +++++++++++++++++++++++++++-----------------
 hw/i386/amd_iommu.h |   2 +-
 2 files changed, 76 insertions(+), 50 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index a34062153194..33916b458611 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -59,7 +59,7 @@ const char *amdvi_mmio_high[] = {
 };
 
 struct AMDVIAddressSpace {
-    uint8_t bus_num;            /* bus number                           */
+    PCIBus *bus;                /* PCIBus (for bus number)              */
     uint8_t devfn;              /* device function                      */
     AMDVIState *iommu_state;    /* AMDVI - one per machine              */
     MemoryRegion root;          /* AMDVI Root memory map region         */
@@ -101,6 +101,11 @@ typedef enum AMDVIFaultReason {
     AMDVI_FR_PT_ENTRY_INV,      /* Failure to read PTE from guest memory */
 } AMDVIFaultReason;
 
+typedef struct amdvi_as_key {
+    PCIBus *bus;
+    int devfn;
+} amdvi_as_key;
+
 uint64_t amdvi_extended_feature_register(AMDVIState *s)
 {
     uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
@@ -360,6 +365,42 @@ static guint amdvi_uint64_hash(gconstpointer v)
     return (guint)*(const uint64_t *)v;
 }
 
+static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
+{
+    const struct amdvi_as_key *key1 = v1;
+    const struct amdvi_as_key *key2 = v2;
+
+    return key1->bus == key2->bus && key1->devfn == key2->devfn;
+}
+
+static guint amdvi_as_hash(gconstpointer v)
+{
+    const struct amdvi_as_key *key = v;
+    return (guint)((uint64_t)key->bus | (key->devfn << 24));
+}
+
+static AMDVIAddressSpace *amdvi_as_lookup(AMDVIState *s, PCIBus *bus,
+                                          int devfn)
+{
+    amdvi_as_key key = { .bus = bus, .devfn = devfn };
+    return g_hash_table_lookup(s->address_spaces, &key);
+}
+
+static int amdvi_find_as_by_devid(gpointer key, gpointer value,
+                                  gpointer user_data)
+{
+    amdvi_as_key *as = (struct amdvi_as_key *)key;
+    uint16_t devid = *((uint16_t *)user_data);
+
+    return devid == PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
+}
+
+static AMDVIAddressSpace *amdvi_get_as_by_devid(AMDVIState *s, uint16_t devid)
+{
+    return g_hash_table_find(s->address_spaces,
+                             amdvi_find_as_by_devid, &devid);
+}
+
 static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
                                            uint64_t devid)
 {
@@ -530,7 +571,7 @@ static inline uint64_t amdvi_get_pte_entry(AMDVIState *s, uint64_t pte_addr,
 
 static int amdvi_as_to_dte(AMDVIAddressSpace *as, uint64_t *dte)
 {
-    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
+    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
     AMDVIState *s = as->iommu_state;
 
     if (!amdvi_get_dte(s, devid, dte)) {
@@ -983,23 +1024,13 @@ static void amdvi_switch_address_space(AMDVIAddressSpace *amdvi_as)
  */
 static void amdvi_switch_address_space_all(AMDVIState *s)
 {
-    AMDVIAddressSpace **iommu_as;
-
-    for (int bus_num = 0; bus_num < PCI_BUS_MAX; bus_num++) {
-
-        /* Nothing to do if there are no devices on the current bus */
-        if (!s->address_spaces[bus_num]) {
-            continue;
-        }
-        iommu_as = s->address_spaces[bus_num];
+    AMDVIAddressSpace *iommu_as;
+    GHashTableIter as_it;
 
-        for (int devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
+    g_hash_table_iter_init(&as_it, s->address_spaces);
 
-            if (!iommu_as[devfn]) {
-                continue;
-            }
-            amdvi_switch_address_space(iommu_as[devfn]);
-        }
+    while (g_hash_table_iter_next(&as_it, NULL, (void **)&iommu_as)) {
+            amdvi_switch_address_space(iommu_as);
     }
 }
 
@@ -1012,28 +1043,22 @@ static void amdvi_switch_address_space_all(AMDVIState *s)
  */
 static void amdvi_update_addr_translation_mode(AMDVIState *s, uint16_t devid)
 {
-    uint8_t bus_num, devfn, dte_mode;
+    uint8_t dte_mode;
     AMDVIAddressSpace *as;
     uint64_t dte[4] = { 0 };
     IOMMUNotifier *n;
     int ret;
 
-    /*
-     * Convert the devid encoded in the command to a bus and devfn in
-     * order to retrieve the corresponding address space.
-     */
-    bus_num = PCI_BUS_NUM(devid);
-    devfn = devid & 0xff;
-
     /*
      * The main buffer of size (AMDVIAddressSpace *) * (PCI_BUS_MAX) has already
      * been allocated within AMDVIState, but must be careful to not access
      * unallocated devfn.
      */
-    if (!s->address_spaces[bus_num] || !s->address_spaces[bus_num][devfn]) {
+
+    as = amdvi_get_as_by_devid(s, devid);
+    if (!as) {
         return;
     }
-    as = s->address_spaces[bus_num][devfn];
 
     ret = amdvi_as_to_dte(as, dte);
 
@@ -1699,7 +1724,7 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, hwaddr addr,
                                bool is_write, IOMMUTLBEntry *ret)
 {
     AMDVIState *s = as->iommu_state;
-    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
+    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
     AMDVIIOTLBEntry *iotlb_entry = amdvi_iotlb_lookup(s, addr, devid);
     uint64_t entry[4];
     int dte_ret;
@@ -1773,7 +1798,7 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
     }
 
     amdvi_do_translate(as, addr, flag & IOMMU_WO, &ret);
-    trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
+    trace_amdvi_translation_result(pci_bus_num(as->bus), PCI_SLOT(as->devfn),
             PCI_FUNC(as->devfn), addr, ret.translated_addr);
     return ret;
 }
@@ -2137,30 +2162,28 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
     char name[128];
     AMDVIState *s = opaque;
-    AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
-    int bus_num = pci_bus_num(bus);
+    AMDVIAddressSpace *amdvi_dev_as;
+    amdvi_as_key *key;
 
-    iommu_as = s->address_spaces[bus_num];
+    amdvi_dev_as = amdvi_as_lookup(s, bus, devfn);
 
     /* allocate memory during the first run */
-    if (!iommu_as) {
-        iommu_as = g_new0(AMDVIAddressSpace *, PCI_DEVFN_MAX);
-        s->address_spaces[bus_num] = iommu_as;
-    }
-
-    /* set up AMD-Vi region */
-    if (!iommu_as[devfn]) {
+    if (!amdvi_dev_as) {
         snprintf(name, sizeof(name), "amd_iommu_devfn_%d", devfn);
 
-        iommu_as[devfn] = g_new0(AMDVIAddressSpace, 1);
-        iommu_as[devfn]->bus_num = (uint8_t)bus_num;
-        iommu_as[devfn]->devfn = (uint8_t)devfn;
-        iommu_as[devfn]->iommu_state = s;
-        iommu_as[devfn]->notifier_flags = IOMMU_NONE;
-        iommu_as[devfn]->iova_tree = iova_tree_new();
-        iommu_as[devfn]->addr_translation = false;
+        amdvi_dev_as = g_new0(AMDVIAddressSpace, 1);
+        key = g_new0(amdvi_as_key, 1);
 
-        amdvi_dev_as = iommu_as[devfn];
+        amdvi_dev_as->bus = bus;
+        amdvi_dev_as->devfn = (uint8_t)devfn;
+        amdvi_dev_as->iommu_state = s;
+        amdvi_dev_as->notifier_flags = IOMMU_NONE;
+        amdvi_dev_as->iova_tree = iova_tree_new();
+        amdvi_dev_as->addr_translation = false;
+        key->bus = bus;
+        key->devfn = devfn;
+
+        g_hash_table_insert(s->address_spaces, key, amdvi_dev_as);
 
         /*
          * Memory region relationships looks like (Address range shows
@@ -2203,7 +2226,7 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
         amdvi_switch_address_space(amdvi_dev_as);
     }
-    return &iommu_as[devfn]->as;
+    return &amdvi_dev_as->as;
 }
 
 static const PCIIOMMUOps amdvi_iommu_ops = {
@@ -2244,7 +2267,7 @@ static int amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
     if (!s->dma_remap && (new & IOMMU_NOTIFIER_MAP)) {
         error_setg_errno(errp, ENOTSUP,
                 "device %02x.%02x.%x requires dma-remap=1",
-                as->bus_num, PCI_SLOT(as->devfn), PCI_FUNC(as->devfn));
+                pci_bus_num(as->bus), PCI_SLOT(as->devfn), PCI_FUNC(as->devfn));
         return -ENOTSUP;
     }
 
@@ -2353,6 +2376,9 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
     s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
                                      amdvi_uint64_equal, g_free, g_free);
 
+    s->address_spaces = g_hash_table_new_full(amdvi_as_hash,
+                                     amdvi_as_equal, g_free, g_free);
+
     /* This device should take care of IOMMU PCI properties */
     if (!qdev_realize(DEVICE(&s->pci), &bus->qbus, errp)) {
         return;
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 3dd4e7e3e8b8..37a57c4dd553 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -409,7 +409,7 @@ struct AMDVIState {
     bool mmio_enabled;
 
     /* for each served device */
-    AMDVIAddressSpace **address_spaces[PCI_BUS_MAX];
+    GHashTable *address_spaces;
 
     /* list of address spaces with registered notifiers */
     QLIST_HEAD(, AMDVIAddressSpace) amdvi_as_with_notifiers;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/7] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
  2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
                   ` (5 preceding siblings ...)
  2025-07-16  7:31 ` [PATCH 6/7] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
@ 2025-07-16  7:31 ` Sairaj Kodilkar
  2025-07-16 12:37 ` [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Philippe Mathieu-Daudé
  7 siblings, 0 replies; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16  7:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Sairaj Kodilkar

Physical AMD IOMMU supports upto 64 bits of DMA address. When device tries
to read or write from the given DMA address, IOMMU translates the address
using page table assigned to that device. Since IOMMU uses per device page
tables, the emulated IOMMU should use the cache tag of 68 bits
(64 bit address - 12 bit page alignment + 16 device ID).

Current emulated AMD IOMMU uses GLib hash table to create software iotlb
and uses 64 bit key to store the IOVA and deviceID, which limits the IOVA
to 60 bits. This cause failure while setting up the device when guest is
booted with "iommu.forcedac=1". One example of the failure is when there are
two virtio ethernet devices attached to the guest with

    -netdev user,id=USER0 \
    -netdev user,id=USER1 \
    -device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \
    -device virtio-net-pci,id=vnet1,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER1 \

In this case dhclient fails for second device with following dmesg

[   24.802644] virtio_net virtio0 enp0s1: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 5664000 usecs ago
[   29.856716] virtio_net virtio0 enp0s1: NETDEV WATCHDOG: CPU: 59: transmit queue 0 timed out 10720 ms
[   29.858585] virtio_net virtio0 enp0s1: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 10720000 usecs ago

To solve this problem, define `struct amdvi_iotlb_key` which uses 64 bit
IOVA and 16 bit devid as key to store and lookup IOTLB entry.

Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
---
 hw/i386/amd_iommu.c | 51 ++++++++++++++++++++++++++++-----------------
 hw/i386/amd_iommu.h |  5 +++--
 2 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 33916b458611..a2a2c0f0be23 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -106,6 +106,11 @@ typedef struct amdvi_as_key {
     int devfn;
 } amdvi_as_key;
 
+typedef struct amdvi_iotlb_key {
+    uint64_t gfn;
+    uint16_t devid;
+} amdvi_iotlb_key;
+
 uint64_t amdvi_extended_feature_register(AMDVIState *s)
 {
     uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
@@ -355,16 +360,6 @@ static void amdvi_log_pagetab_error(AMDVIState *s, uint16_t devid,
              PCI_STATUS_SIG_TARGET_ABORT);
 }
 
-static gboolean amdvi_uint64_equal(gconstpointer v1, gconstpointer v2)
-{
-    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
-}
-
-static guint amdvi_uint64_hash(gconstpointer v)
-{
-    return (guint)*(const uint64_t *)v;
-}
-
 static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
 {
     const struct amdvi_as_key *key1 = v1;
@@ -401,11 +396,27 @@ static AMDVIAddressSpace *amdvi_get_as_by_devid(AMDVIState *s, uint16_t devid)
                              amdvi_find_as_by_devid, &devid);
 }
 
+static gboolean amdvi_iotlb_equal(gconstpointer v1, gconstpointer v2)
+{
+    const amdvi_iotlb_key *key1 = v1;
+    const amdvi_iotlb_key *key2 = v2;
+
+    return key1->devid == key2->devid && key1->gfn == key2->gfn;
+}
+
+static guint amdvi_iotlb_hash(gconstpointer v)
+{
+    const amdvi_iotlb_key *key = v;
+    /* Use GPA and DEVID to find the bucket */
+    return (guint)(key->gfn << AMDVI_PAGE_SHIFT_4K |
+                   (key->devid & ~AMDVI_PAGE_MASK_4K));
+}
+
+
 static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
                                            uint64_t devid)
 {
-    uint64_t key = (addr >> AMDVI_PAGE_SHIFT_4K) |
-                   ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
+    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};
     return g_hash_table_lookup(s->iotlb, &key);
 }
 
@@ -427,8 +438,7 @@ static gboolean amdvi_iotlb_remove_by_devid(gpointer key, gpointer value,
 static void amdvi_iotlb_remove_page(AMDVIState *s, hwaddr addr,
                                     uint64_t devid)
 {
-    uint64_t key = (addr >> AMDVI_PAGE_SHIFT_4K) |
-                   ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
+    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};
     g_hash_table_remove(s->iotlb, &key);
 }
 
@@ -439,8 +449,10 @@ static void amdvi_update_iotlb(AMDVIState *s, uint16_t devid,
     /* don't cache erroneous translations */
     if (to_cache.perm != IOMMU_NONE) {
         AMDVIIOTLBEntry *entry = g_new(AMDVIIOTLBEntry, 1);
-        uint64_t *key = g_new(uint64_t, 1);
-        uint64_t gfn = gpa >> AMDVI_PAGE_SHIFT_4K;
+        amdvi_iotlb_key *key = g_new(amdvi_iotlb_key, 1);
+
+        key->gfn = AMDVI_GET_IOTLB_GFN(gpa);
+        key->devid = devid;
 
         trace_amdvi_cache_update(domid, PCI_BUS_NUM(devid), PCI_SLOT(devid),
                 PCI_FUNC(devid), gpa, to_cache.translated_addr);
@@ -453,7 +465,8 @@ static void amdvi_update_iotlb(AMDVIState *s, uint16_t devid,
         entry->perms = to_cache.perm;
         entry->translated_addr = to_cache.translated_addr;
         entry->page_mask = to_cache.addr_mask;
-        *key = gfn | ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
+        entry->devid = devid;
+
         g_hash_table_replace(s->iotlb, key, entry);
     }
 }
@@ -2373,8 +2386,8 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
     X86MachineState *x86ms = X86_MACHINE(ms);
     PCIBus *bus = pcms->pcibus;
 
-    s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
-                                     amdvi_uint64_equal, g_free, g_free);
+    s->iotlb = g_hash_table_new_full(amdvi_iotlb_hash,
+                                     amdvi_iotlb_equal, g_free, g_free);
 
     s->address_spaces = g_hash_table_new_full(amdvi_as_hash,
                                      amdvi_as_equal, g_free, g_free);
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 37a57c4dd553..04d48d23bce6 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -218,8 +218,9 @@
 #define PAGE_SIZE_PTE_COUNT(pgsz)       (1ULL << ((ctz64(pgsz) - 12) % 9))
 
 /* IOTLB */
-#define AMDVI_IOTLB_MAX_SIZE 1024
-#define AMDVI_DEVID_SHIFT    36
+#define AMDVI_IOTLB_MAX_SIZE        1024
+#define AMDVI_IOTLB_DEVID_SHIFT     48
+#define AMDVI_GET_IOTLB_GFN(addr)   (addr >> AMDVI_PAGE_SHIFT_4K)
 
 /* default extended feature */
 #define AMDVI_DEFAULT_EXT_FEATURES \
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/7] hw/i386/amd_iommu: Fix MMIO register write tracing
  2025-07-16  7:31 ` [PATCH 1/7] hw/i386/amd_iommu: Fix MMIO register write tracing Sairaj Kodilkar
@ 2025-07-16 12:31   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-16 12:31 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Vasant Hegde

On 16/7/25 09:31, Sairaj Kodilkar wrote:
> Define separate functions to trace MMIO write accesses instead of using
> `trace_amdvi_mmio_read()` for both read and write.
> 
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>   hw/i386/amd_iommu.c | 23 ++++++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
  2025-07-16  7:31 ` [PATCH 2/7] hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field Sairaj Kodilkar
@ 2025-07-16 12:35   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-16 12:35 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Vasant Hegde

On 16/7/25 09:31, Sairaj Kodilkar wrote:
> The ats_enabled field is set using HTTUNEN, which is wrong.
> Fix this by removing the field as it is never used.
> 
> Fixes: d29a09ca68428 ("hw/i386: Introduce AMD IOMMU")
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>   hw/i386/amd_iommu.c | 2 --
>   hw/i386/amd_iommu.h | 1 -
>   2 files changed, 3 deletions(-)


> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index fc4d2f7a4575..62641b779ca3 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -368,7 +368,6 @@ struct AMDVIState {
>       uint64_t mmio_addr;
>   
>       bool enabled;                /* IOMMU enabled                */
> -    bool ats_enabled;            /* address translation enabled  */

Since commit 28931c2e159 ("hw/i386/amd_iommu: Allow migration when
explicitly create the AMDVI-PCI device") this field is migrated, which
patch removes it?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
  2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
                   ` (6 preceding siblings ...)
  2025-07-16  7:31 ` [PATCH 7/7] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup Sairaj Kodilkar
@ 2025-07-16 12:37 ` Philippe Mathieu-Daudé
  2025-07-16 12:56   ` Sairaj Kodilkar
  7 siblings, 1 reply; 24+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-16 12:37 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez

On 16/7/25 09:31, Sairaj Kodilkar wrote:
> This series provides few cleanups and fixes for the amd iommu
> 
> The patches are based on top of 56c6e249b698 (v10.0.0-rc3) and Alejandro's
> DMA remapping series [1].

56c6e249b698 is 4 months old, we are about to release v10.1.0-rc0.

What is the point of posting obsolete code?

I'm not going to review further.

Regards,

Phil.

> [1] https://lore.kernel.org/all/20250502021605.1795985-1-alejandro.j.jimenez@oracle.com/
> 
> The series is uploaded on github:
> https://github.com/AMDESE/qemu-iommu/tree/sarunkod/alej%2Bcleanup-v1
> 
> Sairaj Kodilkar (7):
>    hw/i386/amd_iommu: Fix MMIO register write tracing
>    hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
>    hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
>    hw/i386/amd_iommu: Support MMIO writes to the status register
>    hw/i386/amd_iommu: Fix event log generation
>    hw/i386/amd_iommu: Fix handling device on buses != 0
>    hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
> 
>   hw/i386/amd_iommu.c | 217 ++++++++++++++++++++++++++++----------------
>   hw/i386/amd_iommu.h |   9 +-
>   2 files changed, 146 insertions(+), 80 deletions(-)
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
  2025-07-16 12:37 ` [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Philippe Mathieu-Daudé
@ 2025-07-16 12:56   ` Sairaj Kodilkar
  2025-07-16 13:29     ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-16 12:56 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: mst, marcel.apfelbaum, pbonzini, eduardo, richard.henderson,
	alejandro.j.jimenez, Vasant Hegde



On 7/16/2025 6:07 PM, Philippe Mathieu-Daudé wrote:
> On 16/7/25 09:31, Sairaj Kodilkar wrote:
>> This series provides few cleanups and fixes for the amd iommu
>>
>> The patches are based on top of 56c6e249b698 (v10.0.0-rc3) and 
>> Alejandro's
>> DMA remapping series [1].
> 
> 56c6e249b698 is 4 months old, we are about to release v10.1.0-rc0.
> 
> What is the point of posting obsolete code?
> 
> I'm not going to review further.

Hey Philippe,

sorry. I missed to add it in cover letter. Intention is to get feedback
on this series. Since this is on top of Alejandro's series, I will
rebase it once his series gets merged.

Regards
Sairaj

> 
> Regards,
> 
> Phil.
> 
>> [1] https://lore.kernel.org/all/20250502021605.1795985-1- 
>> alejandro.j.jimenez@oracle.com/
>>
>> The series is uploaded on github:
>> https://github.com/AMDESE/qemu-iommu/tree/sarunkod/alej%2Bcleanup-v1
>>
>> Sairaj Kodilkar (7):
>>    hw/i386/amd_iommu: Fix MMIO register write tracing
>>    hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
>>    hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
>>    hw/i386/amd_iommu: Support MMIO writes to the status register
>>    hw/i386/amd_iommu: Fix event log generation
>>    hw/i386/amd_iommu: Fix handling device on buses != 0
>>    hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
>>
>>   hw/i386/amd_iommu.c | 217 ++++++++++++++++++++++++++++----------------
>>   hw/i386/amd_iommu.h |   9 +-
>>   2 files changed, 146 insertions(+), 80 deletions(-)
>>
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
  2025-07-16 12:56   ` Sairaj Kodilkar
@ 2025-07-16 13:29     ` Michael S. Tsirkin
  2025-07-17  5:47       ` Sairaj Kodilkar
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-07-16 13:29 UTC (permalink / raw)
  To: Sairaj Kodilkar
  Cc: Philippe Mathieu-Daudé, qemu-devel, marcel.apfelbaum,
	pbonzini, eduardo, richard.henderson, alejandro.j.jimenez,
	Vasant Hegde

On Wed, Jul 16, 2025 at 06:26:37PM +0530, Sairaj Kodilkar wrote:
> 
> 
> On 7/16/2025 6:07 PM, Philippe Mathieu-Daudé wrote:
> > On 16/7/25 09:31, Sairaj Kodilkar wrote:
> > > This series provides few cleanups and fixes for the amd iommu
> > > 
> > > The patches are based on top of 56c6e249b698 (v10.0.0-rc3) and
> > > Alejandro's
> > > DMA remapping series [1].
> > 
> > 56c6e249b698 is 4 months old, we are about to release v10.1.0-rc0.
> > 
> > What is the point of posting obsolete code?
> > 
> > I'm not going to review further.
> 
> Hey Philippe,
> 
> sorry. I missed to add it in cover letter. Intention is to get feedback
> on this series. Since this is on top of Alejandro's series, I will
> rebase it once his series gets merged.
> 
> Regards
> Sairaj

Merged now, go ahead and rebase.

> > 
> > Regards,
> > 
> > Phil.
> > 
> > > [1] https://lore.kernel.org/all/20250502021605.1795985-1-
> > > alejandro.j.jimenez@oracle.com/
> > > 
> > > The series is uploaded on github:
> > > https://github.com/AMDESE/qemu-iommu/tree/sarunkod/alej%2Bcleanup-v1
> > > 
> > > Sairaj Kodilkar (7):
> > >    hw/i386/amd_iommu: Fix MMIO register write tracing
> > >    hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
> > >    hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
> > >    hw/i386/amd_iommu: Support MMIO writes to the status register
> > >    hw/i386/amd_iommu: Fix event log generation
> > >    hw/i386/amd_iommu: Fix handling device on buses != 0
> > >    hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
> > > 
> > >   hw/i386/amd_iommu.c | 217 ++++++++++++++++++++++++++++----------------
> > >   hw/i386/amd_iommu.h |   9 +-
> > >   2 files changed, 146 insertions(+), 80 deletions(-)
> > > 
> > 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] hw/i386/amd_iommu: Support MMIO writes to the status register
  2025-07-16  7:31 ` [PATCH 4/7] hw/i386/amd_iommu: Support MMIO writes to the status register Sairaj Kodilkar
@ 2025-07-16 14:27   ` Ethan MILON
  2025-07-17  6:41     ` Sairaj Kodilkar
  0 siblings, 1 reply; 24+ messages in thread
From: Ethan MILON @ 2025-07-16 14:27 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel@nongnu.org
  Cc: mst@redhat.com, marcel.apfelbaum@gmail.com, pbonzini@redhat.com,
	eduardo@habkost.net, richard.henderson@linaro.org,
	alejandro.j.jimenez@oracle.com, Vasant Hegde

Hi,

On 7/16/25 09:31, Sairaj Kodilkar wrote:
> Support the writes to the status register so that guest can reset the
> EventOverflow, EventLogInt, ComWaitIntr, etc bits after servicing the
> respective interrupt.
> 
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  hw/i386/amd_iommu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 784be78f402d..e0f4220b8f25 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -1613,6 +1613,9 @@ static void amdvi_mmio_write(void *opaque, hwaddr addr, uint64_t val,
>          amdvi_mmio_reg_write(s, size, val, addr);
>          amdvi_handle_pprtail_write(s);
>          break;
> +    case AMDVI_MMIO_STATUS:
> +        amdvi_mmio_reg_write(s, size, val, addr);
> +        break;

This fixes the basic issue with interrupt reset, but there's still a
subtle bug: any update to the status register clears the interrupt bits,
regardless of the value.

The current W1C logic looks like an incomplete copy of the Intel
implementation, and only works properly if the W1C bits are also set in
romask.

We should update romask to include w1cmask, either in amdvi_write[wlq],
amdvi_init, or directly in amdvi_set_quad:

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 66d42f..48d991 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -213,7 +213,7 @@ static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
                            uint64_t romask, uint64_t w1cmask)
 {
     stq_le_p(&s->mmior[addr], val);
-    stq_le_p(&s->romask[addr], romask);
+    stq_le_p(&s->romask[addr], romask | w1cmask);
     stq_le_p(&s->w1cmask[addr], w1cmask);
 }

Thanks,
Ethan

>      }
>  }
> 
> --
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] hw/i386/amd_iommu: Fix event log generation
  2025-07-16  7:31 ` [PATCH 5/7] hw/i386/amd_iommu: Fix event log generation Sairaj Kodilkar
@ 2025-07-16 14:50   ` Ethan MILON
  2025-07-21 10:44     ` Sairaj Kodilkar
  0 siblings, 1 reply; 24+ messages in thread
From: Ethan MILON @ 2025-07-16 14:50 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel@nongnu.org
  Cc: mst@redhat.com, marcel.apfelbaum@gmail.com, pbonzini@redhat.com,
	eduardo@habkost.net, richard.henderson@linaro.org,
	alejandro.j.jimenez@oracle.com, Vasant Hegde

On 7/16/25 09:31, Sairaj Kodilkar wrote:
> Current event logging code is broken, because of following issues
> 
> 1. The code uses '|' instead of '&' to test the bit field, which causes
>    vIOMMU to generate overflow interrupt for every log entry.
> 2. Code does not update the eventlog tail MMIO register after adding an
>    entry to the buffer, because of which guest cannot process new
>    entries (as head == tail means buffer is empty).
> 3. Compares eventlog tail (which is byte offset in the buffer) to
>    eventlog length (which is number of maximum entries in the buffer).
>    This causes vIOMMU to generate only fix number of event logs, after
>    which it keeps on generating overflow interrupts, without
>    actually resetting the log buffer.
> 4. Updates ComWaitInt instead of EventLogInt bitfield in Status
>    register. Guest checks this field to see if there are new event log
>    entries in the buffer.

You missed one issue, the head and tail should be reset when updating
the base pointer.

@@ -707,6 +711,10 @@ static inline void amdvi_handle_evtbase_write(AMDVIState *s)
     s->evtlog = val & AMDVI_MMIO_EVTLOG_BASE_MASK;
     s->evtlog_len = 1UL << (amdvi_readq(s, AMDVI_MMIO_EVTLOG_SIZE_BYTE)
                     & AMDVI_MMIO_EVTLOG_SIZE_MASK);
+    /* clear tail and head pointer to 0 when event base is updated */
+    s->evtlog_tail = s->evtlog_head = 0;
+    amdvi_writeq_raw(s, AMDVI_MMIO_EVENT_TAIL, s->evtlog_tail);
+    amdvi_writeq_raw(s, AMDVI_MMIO_EVENT_HEAD, s->evtlog_head);
 }

 static inline void amdvi_handle_evttail_write(AMDVIState *s)

Moreover in the spec at 2.5.1 Event Log Restart Procedure, it is 
written "The IOMMU event logging is disabled after system reset 
and when the event log overflows."
Should we implement this behavior or the overflow flag is enough ?

> 
> Fix above issues, so that guest can process event log entries.
> 
> Fixes: d29a09ca68428 ("hw/i386: Introduce AMD IOMMU")
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
> ---
>  hw/i386/amd_iommu.c | 20 ++++++++++++++++----
>  hw/i386/amd_iommu.h |  1 +
>  2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index e0f4220b8f25..a34062153194 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -172,7 +172,7 @@ static void amdvi_writeq(AMDVIState *s, hwaddr addr, uint64_t val)
>  /* OR a 64-bit register with a 64-bit value */

s/OR/AND

>  static bool amdvi_test_mask(AMDVIState *s, hwaddr addr, uint64_t val)
>  {
> -    return amdvi_readq(s, addr) | val;
> +    return amdvi_readq(s, addr) & val;
>  }
> 
>  /* OR a 64-bit register with a 64-bit value storing result in the register */
> @@ -201,16 +201,26 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s)
>      }
>  }
> 
> +static uint32_t get_next_eventlog_entry(AMDVIState *s)
> +{
> +    uint32_t evtlog_size = s->evtlog_len * AMDVI_EVENT_LEN;
> +    return (s->evtlog_tail + AMDVI_EVENT_LEN) % evtlog_size;
> +}
> +
>  static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
>  {
> +    uint32_t evtlog_tail_next;
> +
>      /* event logging not enabled */
>      if (!s->evtlog_enabled || amdvi_test_mask(s, AMDVI_MMIO_STATUS,
>          AMDVI_MMIO_STATUS_EVT_OVF)) {
>          return;
>      }
> 
> +    evtlog_tail_next = get_next_eventlog_entry(s);
> +
>      /* event log buffer full */
> -    if (s->evtlog_tail >= s->evtlog_len) {
> +    if (evtlog_tail_next == s->evtlog_head) {
>          amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVT_OVF);
>          /* generate interrupt */
>          amdvi_generate_msi_interrupt(s);
> @@ -222,8 +232,10 @@ static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
>          trace_amdvi_evntlog_fail(s->evtlog, s->evtlog_tail);
>      }
> 
> -    s->evtlog_tail += AMDVI_EVENT_LEN;
> -    amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_COMP_INT);
> +    s->evtlog_tail = evtlog_tail_next;
> +    amdvi_writeq(s, AMDVI_MMIO_EVENT_TAIL, s->evtlog_tail);
> +
> +    amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVENT_INT);
>      amdvi_generate_msi_interrupt(s);

We should take into account the s->evtlog_intr flag before generating
the interrupt.

And I think we could refactor amdvi_assign_orq inside
amdvi_generate_msi_interrupt, so we could do:
amdvi_generate_msi_interrupt(s, AMDVI_MMIO_STATUS_EVENT_INT); for
example.

Thanks,
Ethan

>  }
> 
> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index 62641b779ca3..3dd4e7e3e8b8 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -111,6 +111,7 @@
>  #define AMDVI_MMIO_STATUS_CMDBUF_RUN  (1 << 4)
>  #define AMDVI_MMIO_STATUS_EVT_RUN     (1 << 3)
>  #define AMDVI_MMIO_STATUS_COMP_INT    (1 << 2)
> +#define AMDVI_MMIO_STATUS_EVENT_INT   (1 << 1)
>  #define AMDVI_MMIO_STATUS_EVT_OVF     (1 << 0)
> 
>  #define AMDVI_CMDBUF_ID_BYTE              0x07
> --
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] hw/i386/amd_iommu: Fix handling device on buses != 0
  2025-07-16  7:31 ` [PATCH 6/7] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
@ 2025-07-16 15:18   ` Ethan MILON
  2025-07-17  6:47     ` Sairaj Kodilkar
  0 siblings, 1 reply; 24+ messages in thread
From: Ethan MILON @ 2025-07-16 15:18 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel@nongnu.org
  Cc: mst@redhat.com, marcel.apfelbaum@gmail.com, pbonzini@redhat.com,
	eduardo@habkost.net, richard.henderson@linaro.org,
	alejandro.j.jimenez@oracle.com, Alexey Kardashevskiy

On 7/16/25 09:31, Sairaj Kodilkar wrote:
> The AMD IOMMU is set up at boot time and uses PCI bus numbers + devfn
> for indexing into DTE. The problem is that before the guest started,
> all PCI bus numbers are 0 as no PCI discovery happened yet (BIOS or/and
> kernel will do that later) so relying on the bus number is wrong.
> The immediate effect is emulated devices cannot do DMA when places on
> a bus other that 0.
> 
> Replace static array of address_space with hash table which uses devfn and
> PCIBus* for key as it is not going to change after the guest is booted.
> 
> Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> ---
>  hw/i386/amd_iommu.c | 124 +++++++++++++++++++++++++++-----------------
>  hw/i386/amd_iommu.h |   2 +-
>  2 files changed, 76 insertions(+), 50 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index a34062153194..33916b458611 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -59,7 +59,7 @@ const char *amdvi_mmio_high[] = {
>  };
> 
>  struct AMDVIAddressSpace {
> -    uint8_t bus_num;            /* bus number                           */
> +    PCIBus *bus;                /* PCIBus (for bus number)              */
>      uint8_t devfn;              /* device function                      */
>      AMDVIState *iommu_state;    /* AMDVI - one per machine              */
>      MemoryRegion root;          /* AMDVI Root memory map region         */
> @@ -101,6 +101,11 @@ typedef enum AMDVIFaultReason {
>      AMDVI_FR_PT_ENTRY_INV,      /* Failure to read PTE from guest memory */
>  } AMDVIFaultReason;
> 
> +typedef struct amdvi_as_key {
> +    PCIBus *bus;
> +    int devfn;
> +} amdvi_as_key;
> +
>  uint64_t amdvi_extended_feature_register(AMDVIState *s)
>  {
>      uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
> @@ -360,6 +365,42 @@ static guint amdvi_uint64_hash(gconstpointer v)
>      return (guint)*(const uint64_t *)v;
>  }
> 
> +static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
> +{
> +    const struct amdvi_as_key *key1 = v1;
> +    const struct amdvi_as_key *key2 = v2;
> +
> +    return key1->bus == key2->bus && key1->devfn == key2->devfn;
> +}
> +
> +static guint amdvi_as_hash(gconstpointer v)
> +{
> +    const struct amdvi_as_key *key = v;
> +    return (guint)((uint64_t)key->bus | (key->devfn << 24));

I think it should at least be a xor, but a hash similar to the
intel one is probably preferable:

return (guint)((uintptr_t)key->bus << 8) | key->devfn);

> +}
> +
> +static AMDVIAddressSpace *amdvi_as_lookup(AMDVIState *s, PCIBus *bus,
> +                                          int devfn)
> +{
> +    amdvi_as_key key = { .bus = bus, .devfn = devfn };
> +    return g_hash_table_lookup(s->address_spaces, &key);
> +}
> +
> +static int amdvi_find_as_by_devid(gpointer key, gpointer value,
> +                                  gpointer user_data)
> +{
> +    amdvi_as_key *as = (struct amdvi_as_key *)key;
> +    uint16_t devid = *((uint16_t *)user_data);
> +
> +    return devid == PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
> +}
> +
> +static AMDVIAddressSpace *amdvi_get_as_by_devid(AMDVIState *s, uint16_t devid)
> +{
> +    return g_hash_table_find(s->address_spaces,
> +                             amdvi_find_as_by_devid, &devid);
> +}
> +
>  static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
>                                             uint64_t devid)
>  {
> @@ -530,7 +571,7 @@ static inline uint64_t amdvi_get_pte_entry(AMDVIState *s, uint64_t pte_addr,
> 
>  static int amdvi_as_to_dte(AMDVIAddressSpace *as, uint64_t *dte)
>  {
> -    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
> +    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
>      AMDVIState *s = as->iommu_state;
> 
>      if (!amdvi_get_dte(s, devid, dte)) {
> @@ -983,23 +1024,13 @@ static void amdvi_switch_address_space(AMDVIAddressSpace *amdvi_as)
>   */
>  static void amdvi_switch_address_space_all(AMDVIState *s)
>  {
> -    AMDVIAddressSpace **iommu_as;
> -
> -    for (int bus_num = 0; bus_num < PCI_BUS_MAX; bus_num++) {
> -
> -        /* Nothing to do if there are no devices on the current bus */
> -        if (!s->address_spaces[bus_num]) {
> -            continue;
> -        }
> -        iommu_as = s->address_spaces[bus_num];
> +    AMDVIAddressSpace *iommu_as;
> +    GHashTableIter as_it;
> 
> -        for (int devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> +    g_hash_table_iter_init(&as_it, s->address_spaces);
> 
> -            if (!iommu_as[devfn]) {
> -                continue;
> -            }
> -            amdvi_switch_address_space(iommu_as[devfn]);
> -        }
> +    while (g_hash_table_iter_next(&as_it, NULL, (void **)&iommu_as)) {
> +            amdvi_switch_address_space(iommu_as);
>      }
>  }
> 
> @@ -1012,28 +1043,22 @@ static void amdvi_switch_address_space_all(AMDVIState *s)
>   */
>  static void amdvi_update_addr_translation_mode(AMDVIState *s, uint16_t devid)
>  {
> -    uint8_t bus_num, devfn, dte_mode;
> +    uint8_t dte_mode;
>      AMDVIAddressSpace *as;
>      uint64_t dte[4] = { 0 };
>      IOMMUNotifier *n;
>      int ret;
> 
> -    /*
> -     * Convert the devid encoded in the command to a bus and devfn in
> -     * order to retrieve the corresponding address space.
> -     */
> -    bus_num = PCI_BUS_NUM(devid);
> -    devfn = devid & 0xff;
> -
>      /*
>       * The main buffer of size (AMDVIAddressSpace *) * (PCI_BUS_MAX) has already
>       * been allocated within AMDVIState, but must be careful to not access
>       * unallocated devfn.
>       */
> -    if (!s->address_spaces[bus_num] || !s->address_spaces[bus_num][devfn]) {
> +
> +    as = amdvi_get_as_by_devid(s, devid);
> +    if (!as) {
>          return;
>      }
> -    as = s->address_spaces[bus_num][devfn];
> 
>      ret = amdvi_as_to_dte(as, dte);
> 
> @@ -1699,7 +1724,7 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, hwaddr addr,
>                                 bool is_write, IOMMUTLBEntry *ret)
>  {
>      AMDVIState *s = as->iommu_state;
> -    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
> +    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
>      AMDVIIOTLBEntry *iotlb_entry = amdvi_iotlb_lookup(s, addr, devid);
>      uint64_t entry[4];
>      int dte_ret;
> @@ -1773,7 +1798,7 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>      }
> 
>      amdvi_do_translate(as, addr, flag & IOMMU_WO, &ret);
> -    trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
> +    trace_amdvi_translation_result(pci_bus_num(as->bus), PCI_SLOT(as->devfn),
>              PCI_FUNC(as->devfn), addr, ret.translated_addr);
>      return ret;
>  }
> @@ -2137,30 +2162,28 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>  {
>      char name[128];
>      AMDVIState *s = opaque;
> -    AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
> -    int bus_num = pci_bus_num(bus);
> +    AMDVIAddressSpace *amdvi_dev_as;
> +    amdvi_as_key *key;
> 
> -    iommu_as = s->address_spaces[bus_num];
> +    amdvi_dev_as = amdvi_as_lookup(s, bus, devfn);
> 
>      /* allocate memory during the first run */
> -    if (!iommu_as) {
> -        iommu_as = g_new0(AMDVIAddressSpace *, PCI_DEVFN_MAX);
> -        s->address_spaces[bus_num] = iommu_as;
> -    }
> -
> -    /* set up AMD-Vi region */
> -    if (!iommu_as[devfn]) {
> +    if (!amdvi_dev_as) {
>          snprintf(name, sizeof(name), "amd_iommu_devfn_%d", devfn);
> 
> -        iommu_as[devfn] = g_new0(AMDVIAddressSpace, 1);
> -        iommu_as[devfn]->bus_num = (uint8_t)bus_num;
> -        iommu_as[devfn]->devfn = (uint8_t)devfn;
> -        iommu_as[devfn]->iommu_state = s;
> -        iommu_as[devfn]->notifier_flags = IOMMU_NONE;

s/IOMMU_NONE/IOMMU_NOTIFIER_NONE

Thanks,
Ethan

> -        iommu_as[devfn]->iova_tree = iova_tree_new();
> -        iommu_as[devfn]->addr_translation = false;
> +        amdvi_dev_as = g_new0(AMDVIAddressSpace, 1);
> +        key = g_new0(amdvi_as_key, 1);
> 
> -        amdvi_dev_as = iommu_as[devfn];
> +        amdvi_dev_as->bus = bus;
> +        amdvi_dev_as->devfn = (uint8_t)devfn;
> +        amdvi_dev_as->iommu_state = s;
> +        amdvi_dev_as->notifier_flags = IOMMU_NONE;
> +        amdvi_dev_as->iova_tree = iova_tree_new();
> +        amdvi_dev_as->addr_translation = false;
> +        key->bus = bus;
> +        key->devfn = devfn;
> +
> +        g_hash_table_insert(s->address_spaces, key, amdvi_dev_as);
> 
>          /*
>           * Memory region relationships looks like (Address range shows
> @@ -2203,7 +2226,7 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
> 
>          amdvi_switch_address_space(amdvi_dev_as);
>      }
> -    return &iommu_as[devfn]->as;
> +    return &amdvi_dev_as->as;
>  }
> 
>  static const PCIIOMMUOps amdvi_iommu_ops = {
> @@ -2244,7 +2267,7 @@ static int amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>      if (!s->dma_remap && (new & IOMMU_NOTIFIER_MAP)) {
>          error_setg_errno(errp, ENOTSUP,
>                  "device %02x.%02x.%x requires dma-remap=1",
> -                as->bus_num, PCI_SLOT(as->devfn), PCI_FUNC(as->devfn));
> +                pci_bus_num(as->bus), PCI_SLOT(as->devfn), PCI_FUNC(as->devfn));
>          return -ENOTSUP;
>      }
> 
> @@ -2353,6 +2376,9 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
>      s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
>                                       amdvi_uint64_equal, g_free, g_free);
> 
> +    s->address_spaces = g_hash_table_new_full(amdvi_as_hash,
> +                                     amdvi_as_equal, g_free, g_free);
> +
>      /* This device should take care of IOMMU PCI properties */
>      if (!qdev_realize(DEVICE(&s->pci), &bus->qbus, errp)) {
>          return;
> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index 3dd4e7e3e8b8..37a57c4dd553 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -409,7 +409,7 @@ struct AMDVIState {
>      bool mmio_enabled;
> 
>      /* for each served device */
> -    AMDVIAddressSpace **address_spaces[PCI_BUS_MAX];
> +    GHashTable *address_spaces;
> 
>      /* list of address spaces with registered notifiers */
>      QLIST_HEAD(, AMDVIAddressSpace) amdvi_as_with_notifiers;
> --
> 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
  2025-07-16 13:29     ` Michael S. Tsirkin
@ 2025-07-17  5:47       ` Sairaj Kodilkar
  2025-07-17  6:07         ` Michael S. Tsirkin
  0 siblings, 1 reply; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-17  5:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Philippe Mathieu-Daudé, qemu-devel, marcel.apfelbaum,
	pbonzini, eduardo, richard.henderson, alejandro.j.jimenez,
	Vasant Hegde



On 7/16/2025 6:59 PM, Michael S. Tsirkin wrote:
> On Wed, Jul 16, 2025 at 06:26:37PM +0530, Sairaj Kodilkar wrote:
>>
>>
>> On 7/16/2025 6:07 PM, Philippe Mathieu-Daudé wrote:
>>> On 16/7/25 09:31, Sairaj Kodilkar wrote:
>>>> This series provides few cleanups and fixes for the amd iommu
>>>>
>>>> The patches are based on top of 56c6e249b698 (v10.0.0-rc3) and
>>>> Alejandro's
>>>> DMA remapping series [1].
>>>
>>> 56c6e249b698 is 4 months old, we are about to release v10.1.0-rc0.
>>>
>>> What is the point of posting obsolete code?
>>>
>>> I'm not going to review further.
>>
>> Hey Philippe,
>>
>> sorry. I missed to add it in cover letter. Intention is to get feedback
>> on this series. Since this is on top of Alejandro's series, I will
>> rebase it once his series gets merged.
>>
>> Regards
>> Sairaj
> 
> Merged now, go ahead and rebase.
> 

Hey Michael,

Sorry, I should have mentioned which series I am talking about,
https://lore.kernel.org/qemu-devel/20250502021605.1795985-1-\
alejandro.j.jimenez@oracle.com/

I know Alejandro's cleanup series has merged. I was waiting for
DMA remapping series. But what I'll do now, is remove the patches that
depend on his series and rebase remaining patches on top of master.

Thanks and sorry for inconvenience. Will take care of this in future

Thanks
Sairaj

>>>
>>> Regards,
>>>
>>> Phil.
>>>
>>>> [1] https://lore.kernel.org/all/20250502021605.1795985-1-
>>>> alejandro.j.jimenez@oracle.com/
>>>>
>>>> The series is uploaded on github:
>>>> https://github.com/AMDESE/qemu-iommu/tree/sarunkod/alej%2Bcleanup-v1
>>>>
>>>> Sairaj Kodilkar (7):
>>>>     hw/i386/amd_iommu: Fix MMIO register write tracing
>>>>     hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
>>>>     hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
>>>>     hw/i386/amd_iommu: Support MMIO writes to the status register
>>>>     hw/i386/amd_iommu: Fix event log generation
>>>>     hw/i386/amd_iommu: Fix handling device on buses != 0
>>>>     hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
>>>>
>>>>    hw/i386/amd_iommu.c | 217 ++++++++++++++++++++++++++++----------------
>>>>    hw/i386/amd_iommu.h |   9 +-
>>>>    2 files changed, 146 insertions(+), 80 deletions(-)
>>>>
>>>
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
  2025-07-17  5:47       ` Sairaj Kodilkar
@ 2025-07-17  6:07         ` Michael S. Tsirkin
  2025-07-17 13:48           ` Alejandro Jimenez
  0 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2025-07-17  6:07 UTC (permalink / raw)
  To: Sairaj Kodilkar
  Cc: Philippe Mathieu-Daudé, qemu-devel, marcel.apfelbaum,
	pbonzini, eduardo, richard.henderson, alejandro.j.jimenez,
	Vasant Hegde

On Thu, Jul 17, 2025 at 11:17:05AM +0530, Sairaj Kodilkar wrote:
> 
> 
> On 7/16/2025 6:59 PM, Michael S. Tsirkin wrote:
> > On Wed, Jul 16, 2025 at 06:26:37PM +0530, Sairaj Kodilkar wrote:
> > > 
> > > 
> > > On 7/16/2025 6:07 PM, Philippe Mathieu-Daudé wrote:
> > > > On 16/7/25 09:31, Sairaj Kodilkar wrote:
> > > > > This series provides few cleanups and fixes for the amd iommu
> > > > > 
> > > > > The patches are based on top of 56c6e249b698 (v10.0.0-rc3) and
> > > > > Alejandro's
> > > > > DMA remapping series [1].
> > > > 
> > > > 56c6e249b698 is 4 months old, we are about to release v10.1.0-rc0.
> > > > 
> > > > What is the point of posting obsolete code?
> > > > 
> > > > I'm not going to review further.
> > > 
> > > Hey Philippe,
> > > 
> > > sorry. I missed to add it in cover letter. Intention is to get feedback
> > > on this series. Since this is on top of Alejandro's series, I will
> > > rebase it once his series gets merged.
> > > 
> > > Regards
> > > Sairaj
> > 
> > Merged now, go ahead and rebase.
> > 
> 
> Hey Michael,
> 
> Sorry, I should have mentioned which series I am talking about,
> https://lore.kernel.org/qemu-devel/20250502021605.1795985-1-\
> alejandro.j.jimenez@oracle.com/


Alejandro said he will send v3 of this.

> I know Alejandro's cleanup series has merged. I was waiting for
> DMA remapping series. But what I'll do now, is remove the patches that
> depend on his series and rebase remaining patches on top of master.
> 
> Thanks and sorry for inconvenience. Will take care of this in future
> 
> Thanks
> Sairaj
> 
> > > > 
> > > > Regards,
> > > > 
> > > > Phil.
> > > > 
> > > > > [1] https://lore.kernel.org/all/20250502021605.1795985-1-
> > > > > alejandro.j.jimenez@oracle.com/
> > > > > 
> > > > > The series is uploaded on github:
> > > > > https://github.com/AMDESE/qemu-iommu/tree/sarunkod/alej%2Bcleanup-v1
> > > > > 
> > > > > Sairaj Kodilkar (7):
> > > > >     hw/i386/amd_iommu: Fix MMIO register write tracing
> > > > >     hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
> > > > >     hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
> > > > >     hw/i386/amd_iommu: Support MMIO writes to the status register
> > > > >     hw/i386/amd_iommu: Fix event log generation
> > > > >     hw/i386/amd_iommu: Fix handling device on buses != 0
> > > > >     hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
> > > > > 
> > > > >    hw/i386/amd_iommu.c | 217 ++++++++++++++++++++++++++++----------------
> > > > >    hw/i386/amd_iommu.h |   9 +-
> > > > >    2 files changed, 146 insertions(+), 80 deletions(-)
> > > > > 
> > > > 
> > 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/7] hw/i386/amd_iommu: Support MMIO writes to the status register
  2025-07-16 14:27   ` Ethan MILON
@ 2025-07-17  6:41     ` Sairaj Kodilkar
  0 siblings, 0 replies; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-17  6:41 UTC (permalink / raw)
  To: Ethan MILON, qemu-devel@nongnu.org
  Cc: mst@redhat.com, marcel.apfelbaum@gmail.com, pbonzini@redhat.com,
	eduardo@habkost.net, richard.henderson@linaro.org,
	alejandro.j.jimenez@oracle.com, Vasant Hegde



On 7/16/2025 7:57 PM, Ethan MILON wrote:
> Hi,
> 
> On 7/16/25 09:31, Sairaj Kodilkar wrote:
>> Support the writes to the status register so that guest can reset the
>> EventOverflow, EventLogInt, ComWaitIntr, etc bits after servicing the
>> respective interrupt.
>>
>> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
>> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
>> ---
>>   hw/i386/amd_iommu.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 784be78f402d..e0f4220b8f25 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -1613,6 +1613,9 @@ static void amdvi_mmio_write(void *opaque, hwaddr addr, uint64_t val,
>>           amdvi_mmio_reg_write(s, size, val, addr);
>>           amdvi_handle_pprtail_write(s);
>>           break;
>> +    case AMDVI_MMIO_STATUS:
>> +        amdvi_mmio_reg_write(s, size, val, addr);
>> +        break;
> 
> This fixes the basic issue with interrupt reset, but there's still a
> subtle bug: any update to the status register clears the interrupt bits,
> regardless of the value.
> 
> The current W1C logic looks like an incomplete copy of the Intel
> implementation, and only works properly if the W1C bits are also set in
> romask.
> 
> We should update romask to include w1cmask, either in amdvi_write[wlq],
> amdvi_init, or directly in amdvi_set_quad:

Yep, verified it. The correct implementation of amdvi_set_quad should 
have (oldval & (romask | w1cmask)).

Thanks for pointing this out.

-Sairaj

> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 66d42f..48d991 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -213,7 +213,7 @@ static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
>                              uint64_t romask, uint64_t w1cmask)
>   {
>       stq_le_p(&s->mmior[addr], val);
> -    stq_le_p(&s->romask[addr], romask);
> +    stq_le_p(&s->romask[addr], romask | w1cmask);
>       stq_le_p(&s->w1cmask[addr], w1cmask);
>   }
> 
> Thanks,
> Ethan
> 
>>       }
>>   }
>>
>> --
>> 2.34.1
>>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/7] hw/i386/amd_iommu: Fix handling device on buses != 0
  2025-07-16 15:18   ` Ethan MILON
@ 2025-07-17  6:47     ` Sairaj Kodilkar
  0 siblings, 0 replies; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-17  6:47 UTC (permalink / raw)
  To: Ethan MILON, qemu-devel@nongnu.org
  Cc: mst@redhat.com, marcel.apfelbaum@gmail.com, pbonzini@redhat.com,
	eduardo@habkost.net, richard.henderson@linaro.org,
	alejandro.j.jimenez@oracle.com, Alexey Kardashevskiy



On 7/16/2025 8:48 PM, Ethan MILON wrote:
> On 7/16/25 09:31, Sairaj Kodilkar wrote:
>> The AMD IOMMU is set up at boot time and uses PCI bus numbers + devfn
>> for indexing into DTE. The problem is that before the guest started,
>> all PCI bus numbers are 0 as no PCI discovery happened yet (BIOS or/and
>> kernel will do that later) so relying on the bus number is wrong.
>> The immediate effect is emulated devices cannot do DMA when places on
>> a bus other that 0.
>>
>> Replace static array of address_space with hash table which uses devfn and
>> PCIBus* for key as it is not going to change after the guest is booted.
>>
>> Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
>> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
>> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
>> ---
>>   hw/i386/amd_iommu.c | 124 +++++++++++++++++++++++++++-----------------
>>   hw/i386/amd_iommu.h |   2 +-
>>   2 files changed, 76 insertions(+), 50 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index a34062153194..33916b458611 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -59,7 +59,7 @@ const char *amdvi_mmio_high[] = {
>>   };
>>
>>   struct AMDVIAddressSpace {
>> -    uint8_t bus_num;            /* bus number                           */
>> +    PCIBus *bus;                /* PCIBus (for bus number)              */
>>       uint8_t devfn;              /* device function                      */
>>       AMDVIState *iommu_state;    /* AMDVI - one per machine              */
>>       MemoryRegion root;          /* AMDVI Root memory map region         */
>> @@ -101,6 +101,11 @@ typedef enum AMDVIFaultReason {
>>       AMDVI_FR_PT_ENTRY_INV,      /* Failure to read PTE from guest memory */
>>   } AMDVIFaultReason;
>>
>> +typedef struct amdvi_as_key {
>> +    PCIBus *bus;
>> +    int devfn;
>> +} amdvi_as_key;
>> +
>>   uint64_t amdvi_extended_feature_register(AMDVIState *s)
>>   {
>>       uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
>> @@ -360,6 +365,42 @@ static guint amdvi_uint64_hash(gconstpointer v)
>>       return (guint)*(const uint64_t *)v;
>>   }
>>
>> +static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
>> +{
>> +    const struct amdvi_as_key *key1 = v1;
>> +    const struct amdvi_as_key *key2 = v2;
>> +
>> +    return key1->bus == key2->bus && key1->devfn == key2->devfn;
>> +}
>> +
>> +static guint amdvi_as_hash(gconstpointer v)
>> +{
>> +    const struct amdvi_as_key *key = v;
>> +    return (guint)((uint64_t)key->bus | (key->devfn << 24));
> 
> I think it should at least be a xor, but a hash similar to the
> intel one is probably preferable:
>
> return (guint)((uintptr_t)key->bus << 8) | key->devfn);
> 

Makes sense considering that guint is 32 bit on most 64 bit machines.
But I am not sure if this is a good hash function (with uniform
distribution). Nonetheless, I will still change it to intel's
implementation.

Thanks
Sairaj


../..

>> -
>> -    /* set up AMD-Vi region */
>> -    if (!iommu_as[devfn]) {
>> +    if (!amdvi_dev_as) {
>>           snprintf(name, sizeof(name), "amd_iommu_devfn_%d", devfn);
>>
>> -        iommu_as[devfn] = g_new0(AMDVIAddressSpace, 1);
>> -        iommu_as[devfn]->bus_num = (uint8_t)bus_num;
>> -        iommu_as[devfn]->devfn = (uint8_t)devfn;
>> -        iommu_as[devfn]->iommu_state = s;
>> -        iommu_as[devfn]->notifier_flags = IOMMU_NONE;
> 
> s/IOMMU_NONE/IOMMU_NOTIFIER_NONE
> 

This is part of @Alejandro's changes in DMA remapping. I already pointed
this to him on his V2.

Thanks
Sairaj

../..


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
  2025-07-17  6:07         ` Michael S. Tsirkin
@ 2025-07-17 13:48           ` Alejandro Jimenez
  2025-07-18 13:28             ` Vasant Hegde
  0 siblings, 1 reply; 24+ messages in thread
From: Alejandro Jimenez @ 2025-07-17 13:48 UTC (permalink / raw)
  To: Michael S. Tsirkin, Sairaj Kodilkar
  Cc: Philippe Mathieu-Daudé, qemu-devel, marcel.apfelbaum,
	pbonzini, eduardo, richard.henderson, Vasant Hegde



On 7/17/25 2:07 AM, Michael S. Tsirkin wrote:
> On Thu, Jul 17, 2025 at 11:17:05AM +0530, Sairaj Kodilkar wrote:
>>
>>
>> On 7/16/2025 6:59 PM, Michael S. Tsirkin wrote:
>>> On Wed, Jul 16, 2025 at 06:26:37PM +0530, Sairaj Kodilkar wrote:
>>>>
>>>>
>>>> On 7/16/2025 6:07 PM, Philippe Mathieu-Daudé wrote:
>>>>> On 16/7/25 09:31, Sairaj Kodilkar wrote:
>>>>>> This series provides few cleanups and fixes for the amd iommu
>>>>>>
>>>>>> The patches are based on top of 56c6e249b698 (v10.0.0-rc3) and
>>>>>> Alejandro's
>>>>>> DMA remapping series [1].
>>>>>
>>>>> 56c6e249b698 is 4 months old, we are about to release v10.1.0-rc0.
>>>>>
>>>>> What is the point of posting obsolete code?
>>>>>
>>>>> I'm not going to review further.
>>>>
>>>> Hey Philippe,
>>>>
>>>> sorry. I missed to add it in cover letter. Intention is to get feedback
>>>> on this series. Since this is on top of Alejandro's series, I will
>>>> rebase it once his series gets merged.
>>>>
>>>> Regards
>>>> Sairaj
>>>
>>> Merged now, go ahead and rebase.
>>>
>>
>> Hey Michael,
>>
>> Sorry, I should have mentioned which series I am talking about,
>> https://lore.kernel.org/qemu-devel/20250502021605.1795985-1-\
>> alejandro.j.jimenez@oracle.com/
> 
> 
> Alejandro said he will send v3 of this.
> 

Yes, I am working on this task, and expect to send the new revision 
soon. The pending issue I am working to address (failures on reboot when 
guest boots with forcedac=1) also requires a change in the VFIO host 
kernel driver, but I will send the QEMU patches as soon as that portion 
is ready.

Thank you,
Alejandro

>> I know Alejandro's cleanup series has merged. I was waiting for
>> DMA remapping series. But what I'll do now, is remove the patches that
>> depend on his series and rebase remaining patches on top of master.
>>
>> Thanks and sorry for inconvenience. Will take care of this in future
>>
>> Thanks
>> Sairaj
>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Phil.
>>>>>
>>>>>> [1] https://lore.kernel.org/all/20250502021605.1795985-1-
>>>>>> alejandro.j.jimenez@oracle.com/
>>>>>>
>>>>>> The series is uploaded on github:
>>>>>> https://github.com/AMDESE/qemu-iommu/tree/sarunkod/alej%2Bcleanup-v1
>>>>>>
>>>>>> Sairaj Kodilkar (7):
>>>>>>      hw/i386/amd_iommu: Fix MMIO register write tracing
>>>>>>      hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field
>>>>>>      hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end
>>>>>>      hw/i386/amd_iommu: Support MMIO writes to the status register
>>>>>>      hw/i386/amd_iommu: Fix event log generation
>>>>>>      hw/i386/amd_iommu: Fix handling device on buses != 0
>>>>>>      hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
>>>>>>
>>>>>>     hw/i386/amd_iommu.c | 217 ++++++++++++++++++++++++++++----------------
>>>>>>     hw/i386/amd_iommu.h |   9 +-
>>>>>>     2 files changed, 146 insertions(+), 80 deletions(-)
>>>>>>
>>>>>
>>>
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
  2025-07-17 13:48           ` Alejandro Jimenez
@ 2025-07-18 13:28             ` Vasant Hegde
  2025-07-18 14:30               ` Alejandro Jimenez
  0 siblings, 1 reply; 24+ messages in thread
From: Vasant Hegde @ 2025-07-18 13:28 UTC (permalink / raw)
  To: Alejandro Jimenez, Michael S. Tsirkin, Sairaj Kodilkar
  Cc: Philippe Mathieu-Daudé, qemu-devel, marcel.apfelbaum,
	pbonzini, eduardo, richard.henderson

Alejandro, Sairaj,


On 7/17/2025 7:18 PM, Alejandro Jimenez wrote:
> 
> 
> On 7/17/25 2:07 AM, Michael S. Tsirkin wrote:
>> On Thu, Jul 17, 2025 at 11:17:05AM +0530, Sairaj Kodilkar wrote:
>>>
>>>
>>> On 7/16/2025 6:59 PM, Michael S. Tsirkin wrote:
>>>> On Wed, Jul 16, 2025 at 06:26:37PM +0530, Sairaj Kodilkar wrote:
>>>>>


.../...

>>>
>>> Hey Michael,
>>>
>>> Sorry, I should have mentioned which series I am talking about,
>>> https://lore.kernel.org/qemu-devel/20250502021605.1795985-1-\
>>> alejandro.j.jimenez@oracle.com/
>>
>>
>> Alejandro said he will send v3 of this.
>>
> 
> Yes, I am working on this task, and expect to send the new revision soon. The
> pending issue I am working to address (failures on reboot when guest boots with
> forcedac=1) also requires a change in the VFIO host kernel driver, but I will
> send the QEMU patches as soon as that portion is ready.

Sure. May be for now keep that as known issue and move on ? we can fix it later.

Other thought (Again it can be separate patch/series)

Ankit's kernel side fix to support different host page table level is queued for
next merge window. How about reducing MAX host page table level support from
current 6 level to 4 level? I think this should help to improve performance
little bit as well.


[1]
https://lore.kernel.org/linux-iommu/8109b208f87b80e400c2abd24a2e44fcbc0763a5.1749016436.git.Ankit.Soni@amd.com/


-Vasant


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes
  2025-07-18 13:28             ` Vasant Hegde
@ 2025-07-18 14:30               ` Alejandro Jimenez
  0 siblings, 0 replies; 24+ messages in thread
From: Alejandro Jimenez @ 2025-07-18 14:30 UTC (permalink / raw)
  To: Vasant Hegde, Michael S. Tsirkin, Sairaj Kodilkar
  Cc: Philippe Mathieu-Daudé, qemu-devel, marcel.apfelbaum,
	pbonzini, eduardo, richard.henderson



On 7/18/25 9:28 AM, Vasant Hegde wrote:
> Alejandro, Sairaj,
> 
> 
> On 7/17/2025 7:18 PM, Alejandro Jimenez wrote:
>>
>>
>> On 7/17/25 2:07 AM, Michael S. Tsirkin wrote:
>>> On Thu, Jul 17, 2025 at 11:17:05AM +0530, Sairaj Kodilkar wrote:
>>>>
>>>>
>>>> On 7/16/2025 6:59 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Jul 16, 2025 at 06:26:37PM +0530, Sairaj Kodilkar wrote:
>>>>>>
> 
> 
> .../...
> 
>>>>
>>>> Hey Michael,
>>>>
>>>> Sorry, I should have mentioned which series I am talking about,
>>>> https://lore.kernel.org/qemu-devel/20250502021605.1795985-1-\
>>>> alejandro.j.jimenez@oracle.com/
>>>
>>>
>>> Alejandro said he will send v3 of this.
>>>
>>
>> Yes, I am working on this task, and expect to send the new revision soon. The
>> pending issue I am working to address (failures on reboot when guest boots with
>> forcedac=1) also requires a change in the VFIO host kernel driver, but I will
>> send the QEMU patches as soon as that portion is ready.
> 
> Sure. May be for now keep that as known issue and move on ? we can fix it later.
>

That would help. I am spending time on the forcedac issue, trying to 
rule out all the reasons I can think of. But I am reaching a point in 
the investigation where I'd need to ask for advice from the community, 
and having the code merged would simplify things.

So I'll try a few more ideas and if I still can't figure out the issue 
I'll send v3 with an acknowledgment of the limitation, if that is a 
viable approach for others too.


> Other thought (Again it can be separate patch/series)
> 
> Ankit's kernel side fix to support different host page table level is queued for
> next merge window. How about reducing MAX host page table level support from
> current 6 level to 4 level? I think this should help to improve performance
> little bit as well.
> 
> 

Yes, I have considered that option as the "fallback" path, since it is 
an architecturally valid restriction that should solve this problem too. 
My v3 wip branch already includes support for the HATDis portion 
(patches from Joao Martins), and I was planning to add the HATS support 
in future series.

Thank you,
Alejandro

> [1]
> https://lore.kernel.org/linux-iommu/8109b208f87b80e400c2abd24a2e44fcbc0763a5.1749016436.git.Ankit.Soni@amd.com/
> 
> 
> -Vasant



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/7] hw/i386/amd_iommu: Fix event log generation
  2025-07-16 14:50   ` Ethan MILON
@ 2025-07-21 10:44     ` Sairaj Kodilkar
  0 siblings, 0 replies; 24+ messages in thread
From: Sairaj Kodilkar @ 2025-07-21 10:44 UTC (permalink / raw)
  To: Ethan MILON, qemu-devel@nongnu.org
  Cc: mst@redhat.com, marcel.apfelbaum@gmail.com, pbonzini@redhat.com,
	eduardo@habkost.net, richard.henderson@linaro.org,
	alejandro.j.jimenez@oracle.com, Vasant Hegde



On 7/16/2025 8:20 PM, Ethan MILON wrote:
> On 7/16/25 09:31, Sairaj Kodilkar wrote:
>> Current event logging code is broken, because of following issues
>>
>> 1. The code uses '|' instead of '&' to test the bit field, which causes
>>     vIOMMU to generate overflow interrupt for every log entry.
>> 2. Code does not update the eventlog tail MMIO register after adding an
>>     entry to the buffer, because of which guest cannot process new
>>     entries (as head == tail means buffer is empty).
>> 3. Compares eventlog tail (which is byte offset in the buffer) to
>>     eventlog length (which is number of maximum entries in the buffer).
>>     This causes vIOMMU to generate only fix number of event logs, after
>>     which it keeps on generating overflow interrupts, without
>>     actually resetting the log buffer.
>> 4. Updates ComWaitInt instead of EventLogInt bitfield in Status
>>     register. Guest checks this field to see if there are new event log
>>     entries in the buffer.
> 
> You missed one issue, the head and tail should be reset when updating
> the base pointer.
> 
> @@ -707,6 +711,10 @@ static inline void amdvi_handle_evtbase_write(AMDVIState *s)
>       s->evtlog = val & AMDVI_MMIO_EVTLOG_BASE_MASK;
>       s->evtlog_len = 1UL << (amdvi_readq(s, AMDVI_MMIO_EVTLOG_SIZE_BYTE)
>                       & AMDVI_MMIO_EVTLOG_SIZE_MASK);
> +    /* clear tail and head pointer to 0 when event base is updated */
> +    s->evtlog_tail = s->evtlog_head = 0;
> +    amdvi_writeq_raw(s, AMDVI_MMIO_EVENT_TAIL, s->evtlog_tail);
> +    amdvi_writeq_raw(s, AMDVI_MMIO_EVENT_HEAD, s->evtlog_head);
>   }

Hi Ethan

Thanks for pointing this out. Will update it !

> 
>   static inline void amdvi_handle_evttail_write(AMDVIState *s)
> 
> Moreover in the spec at 2.5.1 Event Log Restart Procedure, it is
> written "The IOMMU event logging is disabled after system reset
> and when the event log overflows."
> Should we implement this behavior or the overflow flag is enough ?
> 

I think overflow flag is enough, Because code discards the new entries
when the buffer overflows (which is equivalent to event logging being
disabled).

>>
>> Fix above issues, so that guest can process event log entries.
>>
>> Fixes: d29a09ca68428 ("hw/i386: Introduce AMD IOMMU")
>> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
>> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
>> ---
>>   hw/i386/amd_iommu.c | 20 ++++++++++++++++----
>>   hw/i386/amd_iommu.h |  1 +
>>   2 files changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index e0f4220b8f25..a34062153194 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -172,7 +172,7 @@ static void amdvi_writeq(AMDVIState *s, hwaddr addr, uint64_t val)
>>   /* OR a 64-bit register with a 64-bit value */
> 
> s/OR/AND
>

Ack.

>>   static bool amdvi_test_mask(AMDVIState *s, hwaddr addr, uint64_t val)
>>   {
>> -    return amdvi_readq(s, addr) | val;
>> +    return amdvi_readq(s, addr) & val;
>>   }
>>
>>   /* OR a 64-bit register with a 64-bit value storing result in the register */
>> @@ -201,16 +201,26 @@ static void amdvi_generate_msi_interrupt(AMDVIState *s)
>>       }
>>   }
>>
>> +static uint32_t get_next_eventlog_entry(AMDVIState *s)
>> +{
>> +    uint32_t evtlog_size = s->evtlog_len * AMDVI_EVENT_LEN;
>> +    return (s->evtlog_tail + AMDVI_EVENT_LEN) % evtlog_size;
>> +}
>> +
>>   static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
>>   {
>> +    uint32_t evtlog_tail_next;
>> +
>>       /* event logging not enabled */
>>       if (!s->evtlog_enabled || amdvi_test_mask(s, AMDVI_MMIO_STATUS,
>>           AMDVI_MMIO_STATUS_EVT_OVF)) {
>>           return;
>>       }
>>
>> +    evtlog_tail_next = get_next_eventlog_entry(s);
>> +
>>       /* event log buffer full */
>> -    if (s->evtlog_tail >= s->evtlog_len) {
>> +    if (evtlog_tail_next == s->evtlog_head) {
>>           amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVT_OVF);
>>           /* generate interrupt */
>>           amdvi_generate_msi_interrupt(s);
>> @@ -222,8 +232,10 @@ static void amdvi_log_event(AMDVIState *s, uint64_t *evt)
>>           trace_amdvi_evntlog_fail(s->evtlog, s->evtlog_tail);
>>       }
>>
>> -    s->evtlog_tail += AMDVI_EVENT_LEN;
>> -    amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_COMP_INT);
>> +    s->evtlog_tail = evtlog_tail_next;
>> +    amdvi_writeq(s, AMDVI_MMIO_EVENT_TAIL, s->evtlog_tail);
>> +
>> +    amdvi_assign_orq(s, AMDVI_MMIO_STATUS, AMDVI_MMIO_STATUS_EVENT_INT);
>>       amdvi_generate_msi_interrupt(s);
> 
> We should take into account the s->evtlog_intr flag before generating
> the interrupt.
> 
> And I think we could refactor amdvi_assign_orq inside
> amdvi_generate_msi_interrupt, so we could do:
> amdvi_generate_msi_interrupt(s, AMDVI_MMIO_STATUS_EVENT_INT); for
> example.
> 

yes we should do definitely do that ! Thanks for pointing it out.
Also I don't want to change 'amdvi_generate_msi_interrupt()' as it is
being used for few other things in future cleanup patches.

But what I can do it add a check in `amdvi_log_event()`.

Thanks
Sairaj
> Thanks,
> Ethan
> 
>>   }
>>
>> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
>> index 62641b779ca3..3dd4e7e3e8b8 100644
>> --- a/hw/i386/amd_iommu.h
>> +++ b/hw/i386/amd_iommu.h
>> @@ -111,6 +111,7 @@
>>   #define AMDVI_MMIO_STATUS_CMDBUF_RUN  (1 << 4)
>>   #define AMDVI_MMIO_STATUS_EVT_RUN     (1 << 3)
>>   #define AMDVI_MMIO_STATUS_COMP_INT    (1 << 2)
>> +#define AMDVI_MMIO_STATUS_EVENT_INT   (1 << 1)
>>   #define AMDVI_MMIO_STATUS_EVT_OVF     (1 << 0)
>>
>>   #define AMDVI_CMDBUF_ID_BYTE              0x07
>> --
>> 2.34.1
>>
>>



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-07-21 10:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-16  7:31 [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Sairaj Kodilkar
2025-07-16  7:31 ` [PATCH 1/7] hw/i386/amd_iommu: Fix MMIO register write tracing Sairaj Kodilkar
2025-07-16 12:31   ` Philippe Mathieu-Daudé
2025-07-16  7:31 ` [PATCH 2/7] hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field Sairaj Kodilkar
2025-07-16 12:35   ` Philippe Mathieu-Daudé
2025-07-16  7:31 ` [PATCH 3/7] hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end Sairaj Kodilkar
2025-07-16  7:31 ` [PATCH 4/7] hw/i386/amd_iommu: Support MMIO writes to the status register Sairaj Kodilkar
2025-07-16 14:27   ` Ethan MILON
2025-07-17  6:41     ` Sairaj Kodilkar
2025-07-16  7:31 ` [PATCH 5/7] hw/i386/amd_iommu: Fix event log generation Sairaj Kodilkar
2025-07-16 14:50   ` Ethan MILON
2025-07-21 10:44     ` Sairaj Kodilkar
2025-07-16  7:31 ` [PATCH 6/7] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
2025-07-16 15:18   ` Ethan MILON
2025-07-17  6:47     ` Sairaj Kodilkar
2025-07-16  7:31 ` [PATCH 7/7] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup Sairaj Kodilkar
2025-07-16 12:37 ` [PATCH 0/7] hw/i386/amd_iommu: Cleanups and fixes Philippe Mathieu-Daudé
2025-07-16 12:56   ` Sairaj Kodilkar
2025-07-16 13:29     ` Michael S. Tsirkin
2025-07-17  5:47       ` Sairaj Kodilkar
2025-07-17  6:07         ` Michael S. Tsirkin
2025-07-17 13:48           ` Alejandro Jimenez
2025-07-18 13:28             ` Vasant Hegde
2025-07-18 14:30               ` Alejandro Jimenez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).