[PATCH 0/2] Cleanups and fixes (PART 2)

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] Cleanups and fixes (PART 2)
@ 2025-10-08 16:43 Sairaj Kodilkar
  2025-10-08 16:43 ` [PATCH 1/2] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Sairaj Kodilkar @ 2025-10-08 16:43 UTC (permalink / raw)
  To: qemu-devel, alejandro.j.jimenez
  Cc: mst, pbonzini, richard.henderson, philmd, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik,
	Sairaj Kodilkar

This series provide fixes for following two issues:

1. AMD IOMMU fails to detect the devices when they are attached to PCI bus with
   bus id != 0.
   e.g. With following command line, dhclient command fails inside the guest

    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x5 \
    -netdev user,id=USER0,hostfwd=tcp::3333-:22 \
    -device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0,bus=pci.1,addr=0 \

2. Current AMD IOMMU supports IOVAs upto 60 bit which cause failure while
   setting up the devices when guest is booted with command line 
   "iommu.forcedac=1".

   One example of the failure is when there are two virtio ethernet devices
   attached to the guest with command line
   
       -netdev user,id=USER0 \
       -netdev user,id=USER1 \
       -device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \
       -device virtio-net-pci,id=vnet1,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER1 \
   
   In this case dhclient fails for second device with following dmesg
   
   [   24.802644] virtio_net virtio0 enp0s1: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 5664000 usecs ago
   [   29.856716] virtio_net virtio0 enp0s1: NETDEV WATCHDOG: CPU: 59: transmit queue 0 timed out 10720 ms
   [   29.858585] virtio_net virtio0 enp0s1: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 10720000 usecs ago

Base commit: (qemu uptream) eb7abb4a719f

Sairaj Kodilkar (2):
  hw/i386/amd_iommu: Fix handling device on buses != 0
  hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup

 hw/i386/amd_iommu.c | 166 +++++++++++++++++++++++++++-----------------
 hw/i386/amd_iommu.h |   7 +-
 2 files changed, 106 insertions(+), 67 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] hw/i386/amd_iommu: Fix handling device on buses != 0
  2025-10-08 16:43 [PATCH 0/2] Cleanups and fixes (PART 2) Sairaj Kodilkar
@ 2025-10-08 16:43 ` Sairaj Kodilkar
  2025-10-09 16:17   ` Alejandro Jimenez
  2025-10-08 16:43 ` [PATCH 2/2] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup Sairaj Kodilkar
  2025-10-10  1:33 ` [PATCH 0/2] Cleanups and fixes (PART 2) Alejandro Jimenez
  2 siblings, 1 reply; 13+ messages in thread
From: Sairaj Kodilkar @ 2025-10-08 16:43 UTC (permalink / raw)
  To: qemu-devel, alejandro.j.jimenez
  Cc: mst, pbonzini, richard.henderson, philmd, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik,
	Sairaj Kodilkar

The AMD IOMMU is set up at boot time and uses PCI bus numbers + devfn
for indexing into DTE. The problem is that before the guest started,
all PCI bus numbers are 0 as no PCI discovery happened yet (BIOS or/and
kernel will do that later) so relying on the bus number is wrong.
The immediate effect is emulated devices cannot do DMA when places on
a bus other that 0.

Replace static array of address_space with hash table which uses devfn and
PCIBus* for key as it is not going to change after the guest is booted.

Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
---
 hw/i386/amd_iommu.c | 127 ++++++++++++++++++++++++++------------------
 hw/i386/amd_iommu.h |   2 +-
 2 files changed, 77 insertions(+), 52 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 378e0cb55eab..0a4b4d46d885 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -59,7 +59,7 @@ const char *amdvi_mmio_high[] = {
 };
 
 struct AMDVIAddressSpace {
-    uint8_t bus_num;            /* bus number                           */
+    PCIBus *bus;                /* PCIBus (for bus number)              */
     uint8_t devfn;              /* device function                      */
     AMDVIState *iommu_state;    /* AMDVI - one per machine              */
     MemoryRegion root;          /* AMDVI Root memory map region         */
@@ -101,6 +101,11 @@ typedef enum AMDVIFaultReason {
     AMDVI_FR_PT_ENTRY_INV,      /* Failure to read PTE from guest memory */
 } AMDVIFaultReason;
 
+typedef struct amdvi_as_key {
+    PCIBus *bus;
+    int devfn;
+} amdvi_as_key;
+
 uint64_t amdvi_extended_feature_register(AMDVIState *s)
 {
     uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
@@ -382,6 +387,42 @@ static guint amdvi_uint64_hash(gconstpointer v)
     return (guint)*(const uint64_t *)v;
 }
 
+static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
+{
+    const struct amdvi_as_key *key1 = v1;
+    const struct amdvi_as_key *key2 = v2;
+
+    return key1->bus == key2->bus && key1->devfn == key2->devfn;
+}
+
+static guint amdvi_as_hash(gconstpointer v)
+{
+    const struct amdvi_as_key *key = v;
+    return (guint)((uint64_t)key->bus | (key->devfn << 24));
+}
+
+static AMDVIAddressSpace *amdvi_as_lookup(AMDVIState *s, PCIBus *bus,
+                                          int devfn)
+{
+    amdvi_as_key key = { .bus = bus, .devfn = devfn };
+    return g_hash_table_lookup(s->address_spaces, &key);
+}
+
+static int amdvi_find_as_by_devid(gpointer key, gpointer value,
+                                  gpointer user_data)
+{
+    amdvi_as_key *as = (struct amdvi_as_key *)key;
+    uint16_t devid = *((uint16_t *)user_data);
+
+    return devid == PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
+}
+
+static AMDVIAddressSpace *amdvi_get_as_by_devid(AMDVIState *s, uint16_t devid)
+{
+    return g_hash_table_find(s->address_spaces,
+                             amdvi_find_as_by_devid, &devid);
+}
+
 static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
                                            uint64_t devid)
 {
@@ -551,7 +592,7 @@ static inline uint64_t amdvi_get_pte_entry(AMDVIState *s, uint64_t pte_addr,
 
 static int amdvi_as_to_dte(AMDVIAddressSpace *as, uint64_t *dte)
 {
-    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
+    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
     AMDVIState *s = as->iommu_state;
 
     if (!amdvi_get_dte(s, devid, dte)) {
@@ -1011,25 +1052,14 @@ static void amdvi_switch_address_space(AMDVIAddressSpace *amdvi_as)
  */
 static void amdvi_reset_address_translation_all(AMDVIState *s)
 {
-    AMDVIAddressSpace **iommu_as;
-
-    for (int bus_num = 0; bus_num < PCI_BUS_MAX; bus_num++) {
-
-        /* Nothing to do if there are no devices on the current bus */
-        if (!s->address_spaces[bus_num]) {
-            continue;
-        }
-        iommu_as = s->address_spaces[bus_num];
+    AMDVIAddressSpace *iommu_as;
+    GHashTableIter as_it;
 
-        for (int devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
+    g_hash_table_iter_init(&as_it, s->address_spaces);
 
-            if (!iommu_as[devfn]) {
-                continue;
-            }
-            /* Use passthrough as default mode after reset */
-            iommu_as[devfn]->addr_translation = false;
-            amdvi_switch_address_space(iommu_as[devfn]);
-        }
+    while (g_hash_table_iter_next(&as_it, NULL, (void **)&iommu_as)) {
+        iommu_as->addr_translation = false;
+        amdvi_switch_address_space(iommu_as);
     }
 }
 
@@ -1089,27 +1119,21 @@ static void enable_nodma_mode(AMDVIAddressSpace *as)
  */
 static void amdvi_update_addr_translation_mode(AMDVIState *s, uint16_t devid)
 {
-    uint8_t bus_num, devfn, dte_mode;
+    uint8_t dte_mode;
     AMDVIAddressSpace *as;
     uint64_t dte[4] = { 0 };
     int ret;
 
-    /*
-     * Convert the devid encoded in the command to a bus and devfn in
-     * order to retrieve the corresponding address space.
-     */
-    bus_num = PCI_BUS_NUM(devid);
-    devfn = devid & 0xff;
-
     /*
      * The main buffer of size (AMDVIAddressSpace *) * (PCI_BUS_MAX) has already
      * been allocated within AMDVIState, but must be careful to not access
      * unallocated devfn.
      */
-    if (!s->address_spaces[bus_num] || !s->address_spaces[bus_num][devfn]) {
+
+    as = amdvi_get_as_by_devid(s, devid);
+    if (!as) {
         return;
     }
-    as = s->address_spaces[bus_num][devfn];
 
     ret = amdvi_as_to_dte(as, dte);
 
@@ -1783,7 +1807,7 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, hwaddr addr,
                                bool is_write, IOMMUTLBEntry *ret)
 {
     AMDVIState *s = as->iommu_state;
-    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
+    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
     AMDVIIOTLBEntry *iotlb_entry = amdvi_iotlb_lookup(s, addr, devid);
     uint64_t entry[4];
     int dte_ret;
@@ -1858,7 +1882,7 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
     }
 
     amdvi_do_translate(as, addr, flag & IOMMU_WO, &ret);
-    trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
+    trace_amdvi_translation_result(pci_bus_num(as->bus), PCI_SLOT(as->devfn),
             PCI_FUNC(as->devfn), addr, ret.translated_addr);
     return ret;
 }
@@ -2222,30 +2246,28 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
     char name[128];
     AMDVIState *s = opaque;
-    AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
-    int bus_num = pci_bus_num(bus);
+    AMDVIAddressSpace *amdvi_dev_as;
+    amdvi_as_key *key;
 
-    iommu_as = s->address_spaces[bus_num];
+    amdvi_dev_as = amdvi_as_lookup(s, bus, devfn);
 
     /* allocate memory during the first run */
-    if (!iommu_as) {
-        iommu_as = g_new0(AMDVIAddressSpace *, PCI_DEVFN_MAX);
-        s->address_spaces[bus_num] = iommu_as;
-    }
-
-    /* set up AMD-Vi region */
-    if (!iommu_as[devfn]) {
+    if (!amdvi_dev_as) {
         snprintf(name, sizeof(name), "amd_iommu_devfn_%d", devfn);
 
-        iommu_as[devfn] = g_new0(AMDVIAddressSpace, 1);
-        iommu_as[devfn]->bus_num = (uint8_t)bus_num;
-        iommu_as[devfn]->devfn = (uint8_t)devfn;
-        iommu_as[devfn]->iommu_state = s;
-        iommu_as[devfn]->notifier_flags = IOMMU_NOTIFIER_NONE;
-        iommu_as[devfn]->iova_tree = iova_tree_new();
-        iommu_as[devfn]->addr_translation = false;
+        amdvi_dev_as = g_new0(AMDVIAddressSpace, 1);
+        key = g_new0(amdvi_as_key, 1);
 
-        amdvi_dev_as = iommu_as[devfn];
+        amdvi_dev_as->bus = bus;
+        amdvi_dev_as->devfn = (uint8_t)devfn;
+        amdvi_dev_as->iommu_state = s;
+        amdvi_dev_as->notifier_flags = IOMMU_NONE;
+        amdvi_dev_as->iova_tree = iova_tree_new();
+        amdvi_dev_as->addr_translation = false;
+        key->bus = bus;
+        key->devfn = devfn;
+
+        g_hash_table_insert(s->address_spaces, key, amdvi_dev_as);
 
         /*
          * Memory region relationships looks like (Address range shows
@@ -2288,7 +2310,7 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
         amdvi_switch_address_space(amdvi_dev_as);
     }
-    return &iommu_as[devfn]->as;
+    return &amdvi_dev_as->as;
 }
 
 static const PCIIOMMUOps amdvi_iommu_ops = {
@@ -2329,7 +2351,7 @@ static int amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
     if (!s->dma_remap && (new & IOMMU_NOTIFIER_MAP)) {
         error_setg_errno(errp, ENOTSUP,
             "device %02x.%02x.%x requires dma-remap=1",
-            as->bus_num, PCI_SLOT(as->devfn), PCI_FUNC(as->devfn));
+            pci_bus_num(as->bus), PCI_SLOT(as->devfn), PCI_FUNC(as->devfn));
         return -ENOTSUP;
     }
 
@@ -2510,6 +2532,9 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
     s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
                                      amdvi_uint64_equal, g_free, g_free);
 
+    s->address_spaces = g_hash_table_new_full(amdvi_as_hash,
+                                     amdvi_as_equal, g_free, g_free);
+
     /* set up MMIO */
     memory_region_init_io(&s->mr_mmio, OBJECT(s), &mmio_mem_ops, s,
                           "amdvi-mmio", AMDVI_MMIO_SIZE);
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index daf82fc85f96..38471b95d153 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -408,7 +408,7 @@ struct AMDVIState {
     bool mmio_enabled;
 
     /* for each served device */
-    AMDVIAddressSpace **address_spaces[PCI_BUS_MAX];
+    GHashTable *address_spaces;
 
     /* list of address spaces with registered notifiers */
     QLIST_HEAD(, AMDVIAddressSpace) amdvi_as_with_notifiers;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
  2025-10-08 16:43 [PATCH 0/2] Cleanups and fixes (PART 2) Sairaj Kodilkar
  2025-10-08 16:43 ` [PATCH 1/2] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
@ 2025-10-08 16:43 ` Sairaj Kodilkar
  2025-10-10  1:22   ` Alejandro Jimenez
  2025-10-10  1:33 ` [PATCH 0/2] Cleanups and fixes (PART 2) Alejandro Jimenez
  2 siblings, 1 reply; 13+ messages in thread
From: Sairaj Kodilkar @ 2025-10-08 16:43 UTC (permalink / raw)
  To: qemu-devel, alejandro.j.jimenez
  Cc: mst, pbonzini, richard.henderson, philmd, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik,
	Sairaj Kodilkar

Physical AMD IOMMU supports upto 64 bits of DMA address. When device tries
to read or write from the given DMA address, IOMMU translates the address
using page table assigned to that device. Since IOMMU uses per device page
tables, the emulated IOMMU should use the cache tag of 68 bits
(64 bit address - 12 bit page alignment + 16 device ID).

Current emulated AMD IOMMU uses GLib hash table to create software iotlb
and uses 64 bit key to store the IOVA and deviceID, which limits the IOVA
to 60 bits. This cause failure while setting up the device when guest is
booted with "iommu.forcedac=1".

To solve this problem, define `struct amdvi_iotlb_key` which uses 64 bit
IOVA and 16 bit devid as key to store and lookup IOTLB entry.

Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
---
 hw/i386/amd_iommu.c | 51 ++++++++++++++++++++++++++++-----------------
 hw/i386/amd_iommu.h |  5 +++--
 2 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 0a4b4d46d885..5106d9cc4036 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -106,6 +106,11 @@ typedef struct amdvi_as_key {
     int devfn;
 } amdvi_as_key;
 
+typedef struct amdvi_iotlb_key {
+    uint64_t gfn;
+    uint16_t devid;
+} amdvi_iotlb_key;
+
 uint64_t amdvi_extended_feature_register(AMDVIState *s)
 {
     uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
@@ -377,16 +382,6 @@ static void amdvi_log_pagetab_error(AMDVIState *s, uint16_t devid,
              PCI_STATUS_SIG_TARGET_ABORT);
 }
 
-static gboolean amdvi_uint64_equal(gconstpointer v1, gconstpointer v2)
-{
-    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
-}
-
-static guint amdvi_uint64_hash(gconstpointer v)
-{
-    return (guint)*(const uint64_t *)v;
-}
-
 static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
 {
     const struct amdvi_as_key *key1 = v1;
@@ -423,11 +418,27 @@ static AMDVIAddressSpace *amdvi_get_as_by_devid(AMDVIState *s, uint16_t devid)
                              amdvi_find_as_by_devid, &devid);
 }
 
+static gboolean amdvi_iotlb_equal(gconstpointer v1, gconstpointer v2)
+{
+    const amdvi_iotlb_key *key1 = v1;
+    const amdvi_iotlb_key *key2 = v2;
+
+    return key1->devid == key2->devid && key1->gfn == key2->gfn;
+}
+
+static guint amdvi_iotlb_hash(gconstpointer v)
+{
+    const amdvi_iotlb_key *key = v;
+    /* Use GPA and DEVID to find the bucket */
+    return (guint)(key->gfn << AMDVI_PAGE_SHIFT_4K |
+                   (key->devid & ~AMDVI_PAGE_MASK_4K));
+}
+
+
 static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
                                            uint64_t devid)
 {
-    uint64_t key = (addr >> AMDVI_PAGE_SHIFT_4K) |
-                   ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
+    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};
     return g_hash_table_lookup(s->iotlb, &key);
 }
 
@@ -449,8 +460,7 @@ static gboolean amdvi_iotlb_remove_by_devid(gpointer key, gpointer value,
 static void amdvi_iotlb_remove_page(AMDVIState *s, hwaddr addr,
                                     uint64_t devid)
 {
-    uint64_t key = (addr >> AMDVI_PAGE_SHIFT_4K) |
-                   ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
+    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};
     g_hash_table_remove(s->iotlb, &key);
 }
 
@@ -461,8 +471,10 @@ static void amdvi_update_iotlb(AMDVIState *s, uint16_t devid,
     /* don't cache erroneous translations */
     if (to_cache.perm != IOMMU_NONE) {
         AMDVIIOTLBEntry *entry = g_new(AMDVIIOTLBEntry, 1);
-        uint64_t *key = g_new(uint64_t, 1);
-        uint64_t gfn = gpa >> AMDVI_PAGE_SHIFT_4K;
+        amdvi_iotlb_key *key = g_new(amdvi_iotlb_key, 1);
+
+        key->gfn = AMDVI_GET_IOTLB_GFN(gpa);
+        key->devid = devid;
 
         trace_amdvi_cache_update(domid, PCI_BUS_NUM(devid), PCI_SLOT(devid),
                 PCI_FUNC(devid), gpa, to_cache.translated_addr);
@@ -475,7 +487,8 @@ static void amdvi_update_iotlb(AMDVIState *s, uint16_t devid,
         entry->perms = to_cache.perm;
         entry->translated_addr = to_cache.translated_addr;
         entry->page_mask = to_cache.addr_mask;
-        *key = gfn | ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
+        entry->devid = devid;
+
         g_hash_table_replace(s->iotlb, key, entry);
     }
 }
@@ -2529,8 +2542,8 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
-                                     amdvi_uint64_equal, g_free, g_free);
+    s->iotlb = g_hash_table_new_full(amdvi_iotlb_hash,
+                                     amdvi_iotlb_equal, g_free, g_free);
 
     s->address_spaces = g_hash_table_new_full(amdvi_as_hash,
                                      amdvi_as_equal, g_free, g_free);
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 38471b95d153..8089f9472ac4 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -220,8 +220,9 @@
 #define PAGE_SIZE_PTE_COUNT(pgsz)       (1ULL << ((ctz64(pgsz) - 12) % 9))
 
 /* IOTLB */
-#define AMDVI_IOTLB_MAX_SIZE 1024
-#define AMDVI_DEVID_SHIFT    36
+#define AMDVI_IOTLB_MAX_SIZE        1024
+#define AMDVI_IOTLB_DEVID_SHIFT     48
+#define AMDVI_GET_IOTLB_GFN(addr)   (addr >> AMDVI_PAGE_SHIFT_4K)
 
 /* default extended feature */
 #define AMDVI_DEFAULT_EXT_FEATURES \
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hw/i386/amd_iommu: Fix handling device on buses != 0
  2025-10-08 16:43 ` [PATCH 1/2] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
@ 2025-10-09 16:17   ` Alejandro Jimenez
  2025-10-10  5:04     ` Sairaj Kodilkar
  0 siblings, 1 reply; 13+ messages in thread
From: Alejandro Jimenez @ 2025-10-09 16:17 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: mst, pbonzini, richard.henderson, philmd, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik

Hi Sairaj,

Good catch. This issue makes my Linux guest unusable due to kernel 
watchdog errors. This patch fixes the problem.
I have a few comments, but nothing that would fundamentally alter the 
current behavior. Please see below...

On 10/8/25 12:43 PM, Sairaj Kodilkar wrote:
> The AMD IOMMU is set up at boot time and uses PCI bus numbers + devfn
> for indexing into DTE. The problem is that before the guest started,
> all PCI bus numbers are 0 as no PCI discovery happened yet (BIOS or/and
> kernel will do that later) so relying on the bus number is wrong.
> The immediate effect is emulated devices cannot do DMA when places on
> a bus other that 0.
> 
> Replace static array of address_space with hash table which uses devfn and
> PCIBus* for key as it is not going to change after the guest is booted.
> 
> Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> ---
>   hw/i386/amd_iommu.c | 127 ++++++++++++++++++++++++++------------------
>   hw/i386/amd_iommu.h |   2 +-
>   2 files changed, 77 insertions(+), 52 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 378e0cb55eab..0a4b4d46d885 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -59,7 +59,7 @@ const char *amdvi_mmio_high[] = {
>   };
>   
>   struct AMDVIAddressSpace {
> -    uint8_t bus_num;            /* bus number                           */
> +    PCIBus *bus;                /* PCIBus (for bus number)              */
>       uint8_t devfn;              /* device function                      */
>       AMDVIState *iommu_state;    /* AMDVI - one per machine              */
>       MemoryRegion root;          /* AMDVI Root memory map region         */
> @@ -101,6 +101,11 @@ typedef enum AMDVIFaultReason {
>       AMDVI_FR_PT_ENTRY_INV,      /* Failure to read PTE from guest memory */
>   } AMDVIFaultReason;
>   
> +typedef struct amdvi_as_key {
> +    PCIBus *bus;
> +    int devfn;

I'd prefer to use fixed types i.e. uint8_t for devfn. Keeps it 
consistent with same field in other local structs and existing casts in 
the code (e.g amdvi_host_dma_iommu()).

> +} amdvi_as_key;
> +
>   uint64_t amdvi_extended_feature_register(AMDVIState *s)
>   {
>       uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
> @@ -382,6 +387,42 @@ static guint amdvi_uint64_hash(gconstpointer v)
>       return (guint)*(const uint64_t *)v;
>   }
>   
> +static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
> +{
> +    const struct amdvi_as_key *key1 = v1;
> +    const struct amdvi_as_key *key2 = v2;
> +
> +    return key1->bus == key2->bus && key1->devfn == key2->devfn;
> +}
> +
> +static guint amdvi_as_hash(gconstpointer v)
> +{
> +    const struct amdvi_as_key *key = v;
> +    return (guint)((uint64_t)key->bus | (key->devfn << 24));

Any particular reason to build the hash in 'big endian' format?
I don't see a problem as long it remains consistent, but it differs from 
the encoding used by the PCI_* builder macros in pci.h, as well as the 
vtd equivalent code.

Additionally, using uintptr_t instead of uint64_t when casting key->bus 
is a good way to document that we are hashing the pointer value itself. 
In practice I don't see any scenario where there would be a difference 
in behavior (the result is truncated anyways when casting to guint), but
technically/pedantically uintptr_t is correct choice to convert from a 
data pointer.


> +}
> +
> +static AMDVIAddressSpace *amdvi_as_lookup(AMDVIState *s, PCIBus *bus,
> +                                          int devfn)
> +{
> +    amdvi_as_key key = { .bus = bus, .devfn = devfn };
> +    return g_hash_table_lookup(s->address_spaces, &key);
> +}
> +
> +static int amdvi_find_as_by_devid(gpointer key, gpointer value,

this should return a gboolean to exactly match the signature of the 
predicate argument used by g_hash_table_find().
gboolean is ultimately an int, but I don't know if a strict type 
checking tool might complain now or in the future, and since we are 
already using glib defined types we might as well keep it consistent.


> +                                  gpointer user_data)
> +{
> +    amdvi_as_key *as = (struct amdvi_as_key *)key;
> +    uint16_t devid = *((uint16_t *)user_data);
> +
> +    return devid == PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
> +}
> +
> +static AMDVIAddressSpace *amdvi_get_as_by_devid(AMDVIState *s, uint16_t devid)
> +{
> +    return g_hash_table_find(s->address_spaces,
> +                             amdvi_find_as_by_devid, &devid);
> +}
> +
>   static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
>                                              uint64_t devid)
>   {
> @@ -551,7 +592,7 @@ static inline uint64_t amdvi_get_pte_entry(AMDVIState *s, uint64_t pte_addr,
>   
>   static int amdvi_as_to_dte(AMDVIAddressSpace *as, uint64_t *dte)
>   {
> -    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
> +    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
>       AMDVIState *s = as->iommu_state;
>   
>       if (!amdvi_get_dte(s, devid, dte)) {
> @@ -1011,25 +1052,14 @@ static void amdvi_switch_address_space(AMDVIAddressSpace *amdvi_as)
>    */
>   static void amdvi_reset_address_translation_all(AMDVIState *s)
>   {
> -    AMDVIAddressSpace **iommu_as;
> -
> -    for (int bus_num = 0; bus_num < PCI_BUS_MAX; bus_num++) {
> -
> -        /* Nothing to do if there are no devices on the current bus */
> -        if (!s->address_spaces[bus_num]) {
> -            continue;
> -        }
> -        iommu_as = s->address_spaces[bus_num];
> +    AMDVIAddressSpace *iommu_as;
> +    GHashTableIter as_it;
>   
> -        for (int devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> +    g_hash_table_iter_init(&as_it, s->address_spaces);
>   
> -            if (!iommu_as[devfn]) {
> -                continue;
> -            }
> -            /* Use passthrough as default mode after reset */
> -            iommu_as[devfn]->addr_translation = false;
> -            amdvi_switch_address_space(iommu_as[devfn]);
> -        }
> +    while (g_hash_table_iter_next(&as_it, NULL, (void **)&iommu_as)) {

Lets keep the comment describing the behavior. This is something I want 
to discuss in a separate thread...

            /* Use passthrough as default mode after reset */

> +        iommu_as->addr_translation = false;
> +        amdvi_switch_address_space(iommu_as);
>       }
>   }
>   
> @@ -1089,27 +1119,21 @@ static void enable_nodma_mode(AMDVIAddressSpace *as)
>    */
>   static void amdvi_update_addr_translation_mode(AMDVIState *s, uint16_t devid)
>   {
> -    uint8_t bus_num, devfn, dte_mode;
> +    uint8_t dte_mode;
>       AMDVIAddressSpace *as;
>       uint64_t dte[4] = { 0 };
>       int ret;
>   
> -    /*
> -     * Convert the devid encoded in the command to a bus and devfn in
> -     * order to retrieve the corresponding address space.
> -     */
> -    bus_num = PCI_BUS_NUM(devid);
> -    devfn = devid & 0xff;
> -
>       /*
>        * The main buffer of size (AMDVIAddressSpace *) * (PCI_BUS_MAX) has already
>        * been allocated within AMDVIState, but must be careful to not access
>        * unallocated devfn.
>        */

I think this block comment can be removed now that we have a better 
interface to retrieve the address space.

> -    if (!s->address_spaces[bus_num] || !s->address_spaces[bus_num][devfn]) {
> +
> +    as = amdvi_get_as_by_devid(s, devid);
> +    if (!as) {
>           return;
>       }
> -    as = s->address_spaces[bus_num][devfn];
>   
>       ret = amdvi_as_to_dte(as, dte);
>   
> @@ -1783,7 +1807,7 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, hwaddr addr,
>                                  bool is_write, IOMMUTLBEntry *ret)
>   {
>       AMDVIState *s = as->iommu_state;
> -    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
> +    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
>       AMDVIIOTLBEntry *iotlb_entry = amdvi_iotlb_lookup(s, addr, devid);
>       uint64_t entry[4];
>       int dte_ret;
> @@ -1858,7 +1882,7 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>       }
>   
>       amdvi_do_translate(as, addr, flag & IOMMU_WO, &ret);
> -    trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
> +    trace_amdvi_translation_result(pci_bus_num(as->bus), PCI_SLOT(as->devfn),
>               PCI_FUNC(as->devfn), addr, ret.translated_addr);
>       return ret;
>   }
> @@ -2222,30 +2246,28 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>   {
>       char name[128];
>       AMDVIState *s = opaque;
> -    AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
> -    int bus_num = pci_bus_num(bus);
> +    AMDVIAddressSpace *amdvi_dev_as;
> +    amdvi_as_key *key;
>   
> -    iommu_as = s->address_spaces[bus_num];
> +    amdvi_dev_as = amdvi_as_lookup(s, bus, devfn);
>   
>       /* allocate memory during the first run */
> -    if (!iommu_as) {
> -        iommu_as = g_new0(AMDVIAddressSpace *, PCI_DEVFN_MAX);
> -        s->address_spaces[bus_num] = iommu_as;
> -    }
> -
> -    /* set up AMD-Vi region */
> -    if (!iommu_as[devfn]) {
> +    if (!amdvi_dev_as) {
>           snprintf(name, sizeof(name), "amd_iommu_devfn_%d", devfn);
>   
> -        iommu_as[devfn] = g_new0(AMDVIAddressSpace, 1);
> -        iommu_as[devfn]->bus_num = (uint8_t)bus_num;
> -        iommu_as[devfn]->devfn = (uint8_t)devfn;
> -        iommu_as[devfn]->iommu_state = s;
> -        iommu_as[devfn]->notifier_flags = IOMMU_NOTIFIER_NONE;
> -        iommu_as[devfn]->iova_tree = iova_tree_new();
> -        iommu_as[devfn]->addr_translation = false;
> +        amdvi_dev_as = g_new0(AMDVIAddressSpace, 1);
> +        key = g_new0(amdvi_as_key, 1);
>   
> -        amdvi_dev_as = iommu_as[devfn];
> +        amdvi_dev_as->bus = bus;
> +        amdvi_dev_as->devfn = (uint8_t)devfn;
> +        amdvi_dev_as->iommu_state = s;
> +        amdvi_dev_as->notifier_flags = IOMMU_NONE;
Keep IOMMU_NOTIFIER_NONE. It is the correct type, as you pointed out in 
a previous patchset.

Thank you,
Alejandro

> +        amdvi_dev_as->iova_tree = iova_tree_new();
> +        amdvi_dev_as->addr_translation = false;
> +        key->bus = bus;
> +        key->devfn = devfn;
> +
> +        g_hash_table_insert(s->address_spaces, key, amdvi_dev_as);
>   
>           /*
>            * Memory region relationships looks like (Address range shows
> @@ -2288,7 +2310,7 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>   
>           amdvi_switch_address_space(amdvi_dev_as);
>       }
> -    return &iommu_as[devfn]->as;
> +    return &amdvi_dev_as->as;
>   }
>   
>   static const PCIIOMMUOps amdvi_iommu_ops = {
> @@ -2329,7 +2351,7 @@ static int amdvi_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu,
>       if (!s->dma_remap && (new & IOMMU_NOTIFIER_MAP)) {
>           error_setg_errno(errp, ENOTSUP,
>               "device %02x.%02x.%x requires dma-remap=1",
> -            as->bus_num, PCI_SLOT(as->devfn), PCI_FUNC(as->devfn));
> +            pci_bus_num(as->bus), PCI_SLOT(as->devfn), PCI_FUNC(as->devfn));
>           return -ENOTSUP;
>       }
>   
> @@ -2510,6 +2532,9 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
>       s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
>                                        amdvi_uint64_equal, g_free, g_free);
>   
> +    s->address_spaces = g_hash_table_new_full(amdvi_as_hash,
> +                                     amdvi_as_equal, g_free, g_free);
> +
>       /* set up MMIO */
>       memory_region_init_io(&s->mr_mmio, OBJECT(s), &mmio_mem_ops, s,
>                             "amdvi-mmio", AMDVI_MMIO_SIZE);
> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index daf82fc85f96..38471b95d153 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -408,7 +408,7 @@ struct AMDVIState {
>       bool mmio_enabled;
>   
>       /* for each served device */
> -    AMDVIAddressSpace **address_spaces[PCI_BUS_MAX];
> +    GHashTable *address_spaces;
>   
>       /* list of address spaces with registered notifiers */
>       QLIST_HEAD(, AMDVIAddressSpace) amdvi_as_with_notifiers;



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
  2025-10-08 16:43 ` [PATCH 2/2] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup Sairaj Kodilkar
@ 2025-10-10  1:22   ` Alejandro Jimenez
  2025-10-10  5:14     ` Sairaj Kodilkar
  0 siblings, 1 reply; 13+ messages in thread
From: Alejandro Jimenez @ 2025-10-10  1:22 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: mst, pbonzini, richard.henderson, philmd, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik

Hi Sairaj,

On 10/8/25 12:43 PM, Sairaj Kodilkar wrote:
> Physical AMD IOMMU supports upto 64 bits of DMA address. When device tries

s/upto/up to/ and "a device"

> to read or write from the given DMA address, IOMMU translates the address

"a given DMA address"

> using page table assigned to that device. Since IOMMU uses per device page
> tables, the emulated IOMMU should use the cache tag of 68 bits
> (64 bit address - 12 bit page alignment + 16 device ID).
> 
> Current emulated AMD IOMMU uses GLib hash table to create software iotlb
> and uses 64 bit key to store the IOVA and deviceID, which limits the IOVA
> to 60 bits. This cause failure while setting up the device when guest is
> booted with "iommu.forcedac=1".
> 
> To solve this problem, define `struct amdvi_iotlb_key` which uses 64 bit
> IOVA and 16 bit devid as key to store and lookup IOTLB entry.
> 

I wouldn't necessarily mention and quote the structure name since that 
is an implementation detail and it might change in the future.

Also, the current implementation also combines a 64-bit IOVA 
(technically a 52bit gfn) with a 16-bit devid, the real change in this 
patch is in how those same values are being shifted to construct a hash 
key that avoids truncation as much as possible. So I'd reword the commit 
message to highlight that.

> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
> ---
>   hw/i386/amd_iommu.c | 51 ++++++++++++++++++++++++++++-----------------
>   hw/i386/amd_iommu.h |  5 +++--
>   2 files changed, 35 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index 0a4b4d46d885..5106d9cc4036 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -106,6 +106,11 @@ typedef struct amdvi_as_key {
>       int devfn;
>   } amdvi_as_key;
>   
> +typedef struct amdvi_iotlb_key {
> +    uint64_t gfn;
> +    uint16_t devid;
> +} amdvi_iotlb_key;
> +
>   uint64_t amdvi_extended_feature_register(AMDVIState *s)
>   {
>       uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
> @@ -377,16 +382,6 @@ static void amdvi_log_pagetab_error(AMDVIState *s, uint16_t devid,
>                PCI_STATUS_SIG_TARGET_ABORT);
>   }
>   
> -static gboolean amdvi_uint64_equal(gconstpointer v1, gconstpointer v2)
> -{
> -    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
> -}
> -
> -static guint amdvi_uint64_hash(gconstpointer v)
> -{
> -    return (guint)*(const uint64_t *)v;
> -}
> -
>   static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
>   {
>       const struct amdvi_as_key *key1 = v1;
> @@ -423,11 +418,27 @@ static AMDVIAddressSpace *amdvi_get_as_by_devid(AMDVIState *s, uint16_t devid)
>                                amdvi_find_as_by_devid, &devid);
>   }
>   
> +static gboolean amdvi_iotlb_equal(gconstpointer v1, gconstpointer v2)
> +{
> +    const amdvi_iotlb_key *key1 = v1;
> +    const amdvi_iotlb_key *key2 = v2;
> +
> +    return key1->devid == key2->devid && key1->gfn == key2->gfn;
> +}
> +
> +static guint amdvi_iotlb_hash(gconstpointer v)
> +{
> +    const amdvi_iotlb_key *key = v;
> +    /* Use GPA and DEVID to find the bucket */
> +    return (guint)(key->gfn << AMDVI_PAGE_SHIFT_4K |
> +                   (key->devid & ~AMDVI_PAGE_MASK_4K));
> +}
> +
> +
>   static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
>                                              uint64_t devid)
>   {
> -    uint64_t key = (addr >> AMDVI_PAGE_SHIFT_4K) |
> -                   ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
> +    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};

This line initializes the key fields with the opposite of the intended 
values. Please use this initialization style instead to prevent these 
types of errors, plus it makes the definitions more readable:

-    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};
+    amdvi_iotlb_key key = {
+        .gfn = AMDVI_GET_IOTLB_GFN(addr),
+        .devid = devid,
+    };


>       return g_hash_table_lookup(s->iotlb, &key);
>   }
>   
> @@ -449,8 +460,7 @@ static gboolean amdvi_iotlb_remove_by_devid(gpointer key, gpointer value,
>   static void amdvi_iotlb_remove_page(AMDVIState *s, hwaddr addr,
>                                       uint64_t devid)
>   {
> -    uint64_t key = (addr >> AMDVI_PAGE_SHIFT_4K) |
> -                   ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
> +    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};

Same as above, key fields are initialized in incorrect order. Same easy 
fix by using designated initializers.

>       g_hash_table_remove(s->iotlb, &key);
>   }
>   
> @@ -461,8 +471,10 @@ static void amdvi_update_iotlb(AMDVIState *s, uint16_t devid,
>       /* don't cache erroneous translations */
>       if (to_cache.perm != IOMMU_NONE) {
>           AMDVIIOTLBEntry *entry = g_new(AMDVIIOTLBEntry, 1);
> -        uint64_t *key = g_new(uint64_t, 1);
> -        uint64_t gfn = gpa >> AMDVI_PAGE_SHIFT_4K;
> +        amdvi_iotlb_key *key = g_new(amdvi_iotlb_key, 1);
> +
> +        key->gfn = AMDVI_GET_IOTLB_GFN(gpa);
> +        key->devid = devid;
>   
>           trace_amdvi_cache_update(domid, PCI_BUS_NUM(devid), PCI_SLOT(devid),
>                   PCI_FUNC(devid), gpa, to_cache.translated_addr);
> @@ -475,7 +487,8 @@ static void amdvi_update_iotlb(AMDVIState *s, uint16_t devid,
>           entry->perms = to_cache.perm;
>           entry->translated_addr = to_cache.translated_addr;
>           entry->page_mask = to_cache.addr_mask;
> -        *key = gfn | ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
> +        entry->devid = devid;
> +
>           g_hash_table_replace(s->iotlb, key, entry);
>       }
>   }
> @@ -2529,8 +2542,8 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
>           }
>       }
>   
> -    s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
> -                                     amdvi_uint64_equal, g_free, g_free);
> +    s->iotlb = g_hash_table_new_full(amdvi_iotlb_hash,
> +                                     amdvi_iotlb_equal, g_free, g_free);
>   
>       s->address_spaces = g_hash_table_new_full(amdvi_as_hash,
>                                        amdvi_as_equal, g_free, g_free);
> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index 38471b95d153..8089f9472ac4 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -220,8 +220,9 @@
>   #define PAGE_SIZE_PTE_COUNT(pgsz)       (1ULL << ((ctz64(pgsz) - 12) % 9))
>   
>   /* IOTLB */
> -#define AMDVI_IOTLB_MAX_SIZE 1024
> -#define AMDVI_DEVID_SHIFT    36
> +#define AMDVI_IOTLB_MAX_SIZE        1024
> +#define AMDVI_IOTLB_DEVID_SHIFT     48

Remove AMDVI_IOTLB_DEVID_SHIFT since it is not currently used (I assume 
it is a left over from earlier prototype)...

Thank you,
Alejandro

> +#define AMDVI_GET_IOTLB_GFN(addr)   (addr >> AMDVI_PAGE_SHIFT_4K)
>   
>   /* default extended feature */
>   #define AMDVI_DEFAULT_EXT_FEATURES \



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Cleanups and fixes (PART 2)
  2025-10-08 16:43 [PATCH 0/2] Cleanups and fixes (PART 2) Sairaj Kodilkar
  2025-10-08 16:43 ` [PATCH 1/2] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
  2025-10-08 16:43 ` [PATCH 2/2] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup Sairaj Kodilkar
@ 2025-10-10  1:33 ` Alejandro Jimenez
  2025-10-10  2:19   ` Philippe Mathieu-Daudé
  2 siblings, 1 reply; 13+ messages in thread
From: Alejandro Jimenez @ 2025-10-10  1:33 UTC (permalink / raw)
  To: Sairaj Kodilkar, qemu-devel
  Cc: mst, pbonzini, richard.henderson, philmd, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik



On 10/8/25 12:43 PM, Sairaj Kodilkar wrote:
> This series provide fixes for following two issues:
> 

I was able to reproduce and confirm the fixes for the issues you list. 
Please see my replies to the individual patches for my suggestions.

> 1. AMD IOMMU fails to detect the devices when they are attached to PCI bus with
>     bus id != 0.
>     e.g. With following command line, dhclient command fails inside the guest
> 
>      -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x5 \
>      -netdev user,id=USER0,hostfwd=tcp::3333-:22 \
>      -device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0,bus=pci.1,addr=0 \
> 
> 2. Current AMD IOMMU supports IOVAs upto 60 bit which cause failure while
>     setting up the devices when guest is booted with command line
>     "iommu.forcedac=1".
> 
>     One example of the failure is when there are two virtio ethernet devices
>     attached to the guest with command line
>     
>         -netdev user,id=USER0 \
>         -netdev user,id=USER1 \
>         -device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \
>         -device virtio-net-pci,id=vnet1,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER1 \
>     
>     In this case dhclient fails for second device with following dmesg
>     
>     [   24.802644] virtio_net virtio0 enp0s1: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 5664000 usecs ago
>     [   29.856716] virtio_net virtio0 enp0s1: NETDEV WATCHDOG: CPU: 59: transmit queue 0 timed out 10720 ms
>     [   29.858585] virtio_net virtio0 enp0s1: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 10720000 usecs ago
> 
> Base commit: (qemu uptream) eb7abb4a719f
> 

General feedback for both patches:

I know the commit log is not consistent so far, but going forward I 
propose we adopt the shorter prefix "amd_iommu: " for commit summaries. 
There is no ambiguity (only one arch has amd_iommu), so the full path is 
not required (i.e. avoid 'hw/i386/amd_iommu: '). Shorter boilerplate 
leaves more space for relevant details, and helps people like me who 
struggle to comply with character limits :).

Thank you,
Alejandro

> Sairaj Kodilkar (2):
>    hw/i386/amd_iommu: Fix handling device on buses != 0
>    hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
> 
>   hw/i386/amd_iommu.c | 166 +++++++++++++++++++++++++++-----------------
>   hw/i386/amd_iommu.h |   7 +-
>   2 files changed, 106 insertions(+), 67 deletions(-)
> 





^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Cleanups and fixes (PART 2)
  2025-10-10  1:33 ` [PATCH 0/2] Cleanups and fixes (PART 2) Alejandro Jimenez
@ 2025-10-10  2:19   ` Philippe Mathieu-Daudé
  2025-10-10  5:25     ` Sairaj Kodilkar
  2025-10-10 14:37     ` Alejandro Jimenez
  0 siblings, 2 replies; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-10-10  2:19 UTC (permalink / raw)
  To: Alejandro Jimenez, Sairaj Kodilkar, qemu-devel
  Cc: mst, pbonzini, richard.henderson, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik

Hi Alejandro,

On 10/10/25 03:33, Alejandro Jimenez wrote:

> I know the commit log is not consistent so far, but going forward I 
> propose we adopt the shorter prefix "amd_iommu: " for commit summaries. 
> There is no ambiguity (only one arch has amd_iommu), so the full path is 
> not required (i.e. avoid 'hw/i386/amd_iommu: '). Shorter boilerplate 
> leaves more space for relevant details, and helps people like me who 
> struggle to comply with character limits :).

What about "hw/amd_iommu:" to keep 'hw' in subject?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hw/i386/amd_iommu: Fix handling device on buses != 0
  2025-10-09 16:17   ` Alejandro Jimenez
@ 2025-10-10  5:04     ` Sairaj Kodilkar
  0 siblings, 0 replies; 13+ messages in thread
From: Sairaj Kodilkar @ 2025-10-10  5:04 UTC (permalink / raw)
  To: Alejandro Jimenez, qemu-devel
  Cc: mst, pbonzini, richard.henderson, philmd, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik



On 10/9/2025 9:47 PM, Alejandro Jimenez wrote:
Hi alejandro
Thanks for reviewing

> Hi Sairaj,
>
> Good catch. This issue makes my Linux guest unusable due to kernel 
> watchdog errors. This patch fixes the problem.
> I have a few comments, but nothing that would fundamentally alter the 
> current behavior. Please see below...

Yep I also faced similar issue.

>
> On 10/8/25 12:43 PM, Sairaj Kodilkar wrote:
>> The AMD IOMMU is set up at boot time and uses PCI bus numbers + devfn
>> for indexing into DTE. The problem is that before the guest started,
>> all PCI bus numbers are 0 as no PCI discovery happened yet (BIOS or/and
>> kernel will do that later) so relying on the bus number is wrong.
>> The immediate effect is emulated devices cannot do DMA when places on
>> a bus other that 0.
>>
>> Replace static array of address_space with hash table which uses 
>> devfn and
>> PCIBus* for key as it is not going to change after the guest is booted.
>>
>> Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
>> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
>> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
>> ---
>>   hw/i386/amd_iommu.c | 127 ++++++++++++++++++++++++++------------------
>>   hw/i386/amd_iommu.h |   2 +-
>>   2 files changed, 77 insertions(+), 52 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 378e0cb55eab..0a4b4d46d885 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -59,7 +59,7 @@ const char *amdvi_mmio_high[] = {
>>   };
>>     struct AMDVIAddressSpace {
>> -    uint8_t bus_num;            /* bus 
>> number                           */
>> +    PCIBus *bus;                /* PCIBus (for bus 
>> number)              */
>>       uint8_t devfn;              /* device 
>> function                      */
>>       AMDVIState *iommu_state;    /* AMDVI - one per 
>> machine              */
>>       MemoryRegion root;          /* AMDVI Root memory map 
>> region         */
>> @@ -101,6 +101,11 @@ typedef enum AMDVIFaultReason {
>>       AMDVI_FR_PT_ENTRY_INV,      /* Failure to read PTE from guest 
>> memory */
>>   } AMDVIFaultReason;
>>   +typedef struct amdvi_as_key {
>> +    PCIBus *bus;
>> +    int devfn;
>
> I'd prefer to use fixed types i.e. uint8_t for devfn. Keeps it 
> consistent with same field in other local structs and existing casts 
> in the code (e.g amdvi_host_dma_iommu()).
Sure will update it.
>
>> +} amdvi_as_key;
>> +
>>   uint64_t amdvi_extended_feature_register(AMDVIState *s)
>>   {
>>       uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
>> @@ -382,6 +387,42 @@ static guint amdvi_uint64_hash(gconstpointer v)
>>       return (guint)*(const uint64_t *)v;
>>   }
>>   +static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
>> +{
>> +    const struct amdvi_as_key *key1 = v1;
>> +    const struct amdvi_as_key *key2 = v2;
>> +
>> +    return key1->bus == key2->bus && key1->devfn == key2->devfn;
>> +}
>> +
>> +static guint amdvi_as_hash(gconstpointer v)
>> +{
>> +    const struct amdvi_as_key *key = v;
>> +    return (guint)((uint64_t)key->bus | (key->devfn << 24));
>
> Any particular reason to build the hash in 'big endian' format?
> I don't see a problem as long it remains consistent, but it differs 
> from the encoding used by the PCI_* builder macros in pci.h, as well 
> as the vtd equivalent code.
>
> Additionally, using uintptr_t instead of uint64_t when casting 
> key->bus is a good way to document that we are hashing the pointer 
> value itself. In practice I don't see any scenario where there would 
> be a difference in behavior (the result is truncated anyways when 
> casting to guint), but
> technically/pedantically uintptr_t is correct choice to convert from a 
> data pointer.

There is no particular reason for it to be big endian,  I will make it 
consistent with pci.h

>
>
>> +}
>> +
>> +static AMDVIAddressSpace *amdvi_as_lookup(AMDVIState *s, PCIBus *bus,
>> +                                          int devfn)
>> +{
>> +    amdvi_as_key key = { .bus = bus, .devfn = devfn };
>> +    return g_hash_table_lookup(s->address_spaces, &key);
>> +}
>> +
>> +static int amdvi_find_as_by_devid(gpointer key, gpointer value,
>
> this should return a gboolean to exactly match the signature of the 
> predicate argument used by g_hash_table_find().
> gboolean is ultimately an int, but I don't know if a strict type 
> checking tool might complain now or in the future, and since we are 
> already using glib defined types we might as well keep it consistent.
>
>

Ack.

>> + gpointer user_data)
>> +{
>> +    amdvi_as_key *as = (struct amdvi_as_key *)key;
>> +    uint16_t devid = *((uint16_t *)user_data);
>> +
>> +    return devid == PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
>> +}
>> +
>> +static AMDVIAddressSpace *amdvi_get_as_by_devid(AMDVIState *s, 
>> uint16_t devid)
>> +{
>> +    return g_hash_table_find(s->address_spaces,
>> +                             amdvi_find_as_by_devid, &devid);
>> +}
>> +
>>   static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
>>                                              uint64_t devid)
>>   {
>> @@ -551,7 +592,7 @@ static inline uint64_t 
>> amdvi_get_pte_entry(AMDVIState *s, uint64_t pte_addr,
>>     static int amdvi_as_to_dte(AMDVIAddressSpace *as, uint64_t *dte)
>>   {
>> -    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
>> +    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
>>       AMDVIState *s = as->iommu_state;
>>         if (!amdvi_get_dte(s, devid, dte)) {
>> @@ -1011,25 +1052,14 @@ static void 
>> amdvi_switch_address_space(AMDVIAddressSpace *amdvi_as)
>>    */
>>   static void amdvi_reset_address_translation_all(AMDVIState *s)
>>   {
>> -    AMDVIAddressSpace **iommu_as;
>> -
>> -    for (int bus_num = 0; bus_num < PCI_BUS_MAX; bus_num++) {
>> -
>> -        /* Nothing to do if there are no devices on the current bus */
>> -        if (!s->address_spaces[bus_num]) {
>> -            continue;
>> -        }
>> -        iommu_as = s->address_spaces[bus_num];
>> +    AMDVIAddressSpace *iommu_as;
>> +    GHashTableIter as_it;
>>   -        for (int devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
>> +    g_hash_table_iter_init(&as_it, s->address_spaces);
>>   -            if (!iommu_as[devfn]) {
>> -                continue;
>> -            }
>> -            /* Use passthrough as default mode after reset */
>> -            iommu_as[devfn]->addr_translation = false;
>> -            amdvi_switch_address_space(iommu_as[devfn]);
>> -        }
>> +    while (g_hash_table_iter_next(&as_it, NULL, (void **)&iommu_as)) {
>
> Lets keep the comment describing the behavior. This is something I 
> want to discuss in a separate thread...

Ahh right, I missed to add the comment after deleting the above part.

>
>            /* Use passthrough as default mode after reset */
>
>> +        iommu_as->addr_translation = false;
>> +        amdvi_switch_address_space(iommu_as);
>>       }
>>   }
>>   @@ -1089,27 +1119,21 @@ static void 
>> enable_nodma_mode(AMDVIAddressSpace *as)
>>    */
>>   static void amdvi_update_addr_translation_mode(AMDVIState *s, 
>> uint16_t devid)
>>   {
>> -    uint8_t bus_num, devfn, dte_mode;
>> +    uint8_t dte_mode;
>>       AMDVIAddressSpace *as;
>>       uint64_t dte[4] = { 0 };
>>       int ret;
>>   -    /*
>> -     * Convert the devid encoded in the command to a bus and devfn in
>> -     * order to retrieve the corresponding address space.
>> -     */
>> -    bus_num = PCI_BUS_NUM(devid);
>> -    devfn = devid & 0xff;
>> -
>>       /*
>>        * The main buffer of size (AMDVIAddressSpace *) * 
>> (PCI_BUS_MAX) has already
>>        * been allocated within AMDVIState, but must be careful to not 
>> access
>>        * unallocated devfn.
>>        */
>
> I think this block comment can be removed now that we have a better 
> interface to retrieve the address space.

Right thanks for noticing, will remove it.

>
>> -    if (!s->address_spaces[bus_num] || 
>> !s->address_spaces[bus_num][devfn]) {
>> +
>> +    as = amdvi_get_as_by_devid(s, devid);
>> +    if (!as) {
>>           return;
>>       }
>> -    as = s->address_spaces[bus_num][devfn];
>>         ret = amdvi_as_to_dte(as, dte);
>>   @@ -1783,7 +1807,7 @@ static void 
>> amdvi_do_translate(AMDVIAddressSpace *as, hwaddr addr,
>>                                  bool is_write, IOMMUTLBEntry *ret)
>>   {
>>       AMDVIState *s = as->iommu_state;
>> -    uint16_t devid = PCI_BUILD_BDF(as->bus_num, as->devfn);
>> +    uint16_t devid = PCI_BUILD_BDF(pci_bus_num(as->bus), as->devfn);
>>       AMDVIIOTLBEntry *iotlb_entry = amdvi_iotlb_lookup(s, addr, devid);
>>       uint64_t entry[4];
>>       int dte_ret;
>> @@ -1858,7 +1882,7 @@ static IOMMUTLBEntry 
>> amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>>       }
>>         amdvi_do_translate(as, addr, flag & IOMMU_WO, &ret);
>> -    trace_amdvi_translation_result(as->bus_num, PCI_SLOT(as->devfn),
>> +    trace_amdvi_translation_result(pci_bus_num(as->bus), 
>> PCI_SLOT(as->devfn),
>>               PCI_FUNC(as->devfn), addr, ret.translated_addr);
>>       return ret;
>>   }
>> @@ -2222,30 +2246,28 @@ static AddressSpace 
>> *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>>   {
>>       char name[128];
>>       AMDVIState *s = opaque;
>> -    AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
>> -    int bus_num = pci_bus_num(bus);
>> +    AMDVIAddressSpace *amdvi_dev_as;
>> +    amdvi_as_key *key;
>>   -    iommu_as = s->address_spaces[bus_num];
>> +    amdvi_dev_as = amdvi_as_lookup(s, bus, devfn);
>>         /* allocate memory during the first run */
>> -    if (!iommu_as) {
>> -        iommu_as = g_new0(AMDVIAddressSpace *, PCI_DEVFN_MAX);
>> -        s->address_spaces[bus_num] = iommu_as;
>> -    }
>> -
>> -    /* set up AMD-Vi region */
>> -    if (!iommu_as[devfn]) {
>> +    if (!amdvi_dev_as) {
>>           snprintf(name, sizeof(name), "amd_iommu_devfn_%d", devfn);
>>   -        iommu_as[devfn] = g_new0(AMDVIAddressSpace, 1);
>> -        iommu_as[devfn]->bus_num = (uint8_t)bus_num;
>> -        iommu_as[devfn]->devfn = (uint8_t)devfn;
>> -        iommu_as[devfn]->iommu_state = s;
>> -        iommu_as[devfn]->notifier_flags = IOMMU_NOTIFIER_NONE;
>> -        iommu_as[devfn]->iova_tree = iova_tree_new();
>> -        iommu_as[devfn]->addr_translation = false;
>> +        amdvi_dev_as = g_new0(AMDVIAddressSpace, 1);
>> +        key = g_new0(amdvi_as_key, 1);
>>   -        amdvi_dev_as = iommu_as[devfn];
>> +        amdvi_dev_as->bus = bus;
>> +        amdvi_dev_as->devfn = (uint8_t)devfn;
>> +        amdvi_dev_as->iommu_state = s;
>> +        amdvi_dev_as->notifier_flags = IOMMU_NONE;
> Keep IOMMU_NOTIFIER_NONE. It is the correct type, as you pointed out 
> in a previous patchset.

Ahh right. I missed to update that field while rebasing patches from V2 
of your series to master

Thanks
Sairaj


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup
  2025-10-10  1:22   ` Alejandro Jimenez
@ 2025-10-10  5:14     ` Sairaj Kodilkar
  0 siblings, 0 replies; 13+ messages in thread
From: Sairaj Kodilkar @ 2025-10-10  5:14 UTC (permalink / raw)
  To: Alejandro Jimenez, qemu-devel
  Cc: mst, pbonzini, richard.henderson, philmd, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik



On 10/10/2025 6:52 AM, Alejandro Jimenez wrote:
Hi Alejandro,
> Hi Sairaj,
>
> On 10/8/25 12:43 PM, Sairaj Kodilkar wrote:
>> Physical AMD IOMMU supports upto 64 bits of DMA address. When device 
>> tries
>
> s/upto/up to/ and "a device"
>
>> to read or write from the given DMA address, IOMMU translates the 
>> address
>
> "a given DMA address"
>
>> using page table assigned to that device. Since IOMMU uses per device 
>> page
>> tables, the emulated IOMMU should use the cache tag of 68 bits
>> (64 bit address - 12 bit page alignment + 16 device ID).
>>
>> Current emulated AMD IOMMU uses GLib hash table to create software iotlb
>> and uses 64 bit key to store the IOVA and deviceID, which limits the 
>> IOVA
>> to 60 bits. This cause failure while setting up the device when guest is
>> booted with "iommu.forcedac=1".
>>
>> To solve this problem, define `struct amdvi_iotlb_key` which uses 64 bit
>> IOVA and 16 bit devid as key to store and lookup IOTLB entry.
>>
>
> I wouldn't necessarily mention and quote the structure name since that 
> is an implementation detail and it might change in the future.
>
> Also, the current implementation also combines a 64-bit IOVA 
> (technically a 52bit gfn) with a 16-bit devid, the real change in this 
> patch is in how those same values are being shifted to construct a 
> hash key that avoids truncation as much as possible. So I'd reword the 
> commit message to highlight that.
>

I will update the commit message to highlight this.

>> Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
>> Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
>> ---
>>   hw/i386/amd_iommu.c | 51 ++++++++++++++++++++++++++++-----------------
>>   hw/i386/amd_iommu.h |  5 +++--
>>   2 files changed, 35 insertions(+), 21 deletions(-)
>>
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index 0a4b4d46d885..5106d9cc4036 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -106,6 +106,11 @@ typedef struct amdvi_as_key {
>>       int devfn;
>>   } amdvi_as_key;
>>   +typedef struct amdvi_iotlb_key {
>> +    uint64_t gfn;
>> +    uint16_t devid;
>> +} amdvi_iotlb_key;
>> +
>>   uint64_t amdvi_extended_feature_register(AMDVIState *s)
>>   {
>>       uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
>> @@ -377,16 +382,6 @@ static void amdvi_log_pagetab_error(AMDVIState 
>> *s, uint16_t devid,
>>                PCI_STATUS_SIG_TARGET_ABORT);
>>   }
>>   -static gboolean amdvi_uint64_equal(gconstpointer v1, gconstpointer 
>> v2)
>> -{
>> -    return *((const uint64_t *)v1) == *((const uint64_t *)v2);
>> -}
>> -
>> -static guint amdvi_uint64_hash(gconstpointer v)
>> -{
>> -    return (guint)*(const uint64_t *)v;
>> -}
>> -
>>   static gboolean amdvi_as_equal(gconstpointer v1, gconstpointer v2)
>>   {
>>       const struct amdvi_as_key *key1 = v1;
>> @@ -423,11 +418,27 @@ static AMDVIAddressSpace 
>> *amdvi_get_as_by_devid(AMDVIState *s, uint16_t devid)
>>                                amdvi_find_as_by_devid, &devid);
>>   }
>>   +static gboolean amdvi_iotlb_equal(gconstpointer v1, gconstpointer v2)
>> +{
>> +    const amdvi_iotlb_key *key1 = v1;
>> +    const amdvi_iotlb_key *key2 = v2;
>> +
>> +    return key1->devid == key2->devid && key1->gfn == key2->gfn;
>> +}
>> +
>> +static guint amdvi_iotlb_hash(gconstpointer v)
>> +{
>> +    const amdvi_iotlb_key *key = v;
>> +    /* Use GPA and DEVID to find the bucket */
>> +    return (guint)(key->gfn << AMDVI_PAGE_SHIFT_4K |
>> +                   (key->devid & ~AMDVI_PAGE_MASK_4K));
>> +}
>> +
>> +
>>   static AMDVIIOTLBEntry *amdvi_iotlb_lookup(AMDVIState *s, hwaddr addr,
>>                                              uint64_t devid)
>>   {
>> -    uint64_t key = (addr >> AMDVI_PAGE_SHIFT_4K) |
>> -                   ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
>> +    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};
>
> This line initializes the key fields with the opposite of the intended 
> values. Please use this initialization style instead to prevent these 
> types of errors, plus it makes the definitions more readable:

Good catch. This was something from my older prototype where fields were 
reversed.
Will update it.

>
> -    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};
> +    amdvi_iotlb_key key = {
> +        .gfn = AMDVI_GET_IOTLB_GFN(addr),
> +        .devid = devid,
> +    };
>
>
>>       return g_hash_table_lookup(s->iotlb, &key);
>>   }
>>   @@ -449,8 +460,7 @@ static gboolean 
>> amdvi_iotlb_remove_by_devid(gpointer key, gpointer value,
>>   static void amdvi_iotlb_remove_page(AMDVIState *s, hwaddr addr,
>>                                       uint64_t devid)
>>   {
>> -    uint64_t key = (addr >> AMDVI_PAGE_SHIFT_4K) |
>> -                   ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
>> +    amdvi_iotlb_key key = {devid, AMDVI_GET_IOTLB_GFN(addr)};
>
> Same as above, key fields are initialized in incorrect order. Same 
> easy fix by using designated initializers.
>
>>       g_hash_table_remove(s->iotlb, &key);
>>   }
>>   @@ -461,8 +471,10 @@ static void amdvi_update_iotlb(AMDVIState *s, 
>> uint16_t devid,
>>       /* don't cache erroneous translations */
>>       if (to_cache.perm != IOMMU_NONE) {
>>           AMDVIIOTLBEntry *entry = g_new(AMDVIIOTLBEntry, 1);
>> -        uint64_t *key = g_new(uint64_t, 1);
>> -        uint64_t gfn = gpa >> AMDVI_PAGE_SHIFT_4K;
>> +        amdvi_iotlb_key *key = g_new(amdvi_iotlb_key, 1);
>> +
>> +        key->gfn = AMDVI_GET_IOTLB_GFN(gpa);
>> +        key->devid = devid;
>>             trace_amdvi_cache_update(domid, PCI_BUS_NUM(devid), 
>> PCI_SLOT(devid),
>>                   PCI_FUNC(devid), gpa, to_cache.translated_addr);
>> @@ -475,7 +487,8 @@ static void amdvi_update_iotlb(AMDVIState *s, 
>> uint16_t devid,
>>           entry->perms = to_cache.perm;
>>           entry->translated_addr = to_cache.translated_addr;
>>           entry->page_mask = to_cache.addr_mask;
>> -        *key = gfn | ((uint64_t)(devid) << AMDVI_DEVID_SHIFT);
>> +        entry->devid = devid;
>> +
>>           g_hash_table_replace(s->iotlb, key, entry);
>>       }
>>   }
>> @@ -2529,8 +2542,8 @@ static void amdvi_sysbus_realize(DeviceState 
>> *dev, Error **errp)
>>           }
>>       }
>>   -    s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
>> -                                     amdvi_uint64_equal, g_free, 
>> g_free);
>> +    s->iotlb = g_hash_table_new_full(amdvi_iotlb_hash,
>> +                                     amdvi_iotlb_equal, g_free, 
>> g_free);
>>         s->address_spaces = g_hash_table_new_full(amdvi_as_hash,
>>                                        amdvi_as_equal, g_free, g_free);
>> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
>> index 38471b95d153..8089f9472ac4 100644
>> --- a/hw/i386/amd_iommu.h
>> +++ b/hw/i386/amd_iommu.h
>> @@ -220,8 +220,9 @@
>>   #define PAGE_SIZE_PTE_COUNT(pgsz)       (1ULL << ((ctz64(pgsz) - 
>> 12) % 9))
>>     /* IOTLB */
>> -#define AMDVI_IOTLB_MAX_SIZE 1024
>> -#define AMDVI_DEVID_SHIFT    36
>> +#define AMDVI_IOTLB_MAX_SIZE        1024
>> +#define AMDVI_IOTLB_DEVID_SHIFT     48
>
> Remove AMDVI_IOTLB_DEVID_SHIFT since it is not currently used (I 
> assume it is a left over from earlier prototype)...

Right

Thanks
Sairaj


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Cleanups and fixes (PART 2)
  2025-10-10  2:19   ` Philippe Mathieu-Daudé
@ 2025-10-10  5:25     ` Sairaj Kodilkar
  2025-10-10  6:39       ` Michael S. Tsirkin
  2025-10-10 14:37     ` Alejandro Jimenez
  1 sibling, 1 reply; 13+ messages in thread
From: Sairaj Kodilkar @ 2025-10-10  5:25 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Alejandro Jimenez, qemu-devel
  Cc: mst, pbonzini, richard.henderson, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik



On 10/10/2025 7:49 AM, Philippe Mathieu-Daudé wrote:
> Hi Alejandro,
>
> On 10/10/25 03:33, Alejandro Jimenez wrote:
>
>> I know the commit log is not consistent so far, but going forward I 
>> propose we adopt the shorter prefix "amd_iommu: " for commit 
>> summaries. There is no ambiguity (only one arch has amd_iommu), so 
>> the full path is not required (i.e. avoid 'hw/i386/amd_iommu: '). 
>> Shorter boilerplate leaves more space for relevant details, and helps 
>> people like me who struggle to comply with character limits :).
>
> What about "hw/amd_iommu:" to keep 'hw' in subject?
I also prefer "amd_iommu". But I'll wait for alejandro's reply before 
posting V2.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Cleanups and fixes (PART 2)
  2025-10-10  5:25     ` Sairaj Kodilkar
@ 2025-10-10  6:39       ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2025-10-10  6:39 UTC (permalink / raw)
  To: Sairaj Kodilkar
  Cc: Philippe Mathieu-Daudé, Alejandro Jimenez, qemu-devel,
	pbonzini, richard.henderson, suravee.suthikulpanit, vasant.hegde,
	marcel.apfelbaum, eduardo, santosh.shukla, aik

On Fri, Oct 10, 2025 at 10:55:45AM +0530, Sairaj Kodilkar wrote:
> 
> 
> On 10/10/2025 7:49 AM, Philippe Mathieu-Daudé wrote:
> > Hi Alejandro,
> > 
> > On 10/10/25 03:33, Alejandro Jimenez wrote:
> > 
> > > I know the commit log is not consistent so far, but going forward I
> > > propose we adopt the shorter prefix "amd_iommu: " for commit
> > > summaries. There is no ambiguity (only one arch has amd_iommu), so
> > > the full path is not required (i.e. avoid 'hw/i386/amd_iommu: ').
> > > Shorter boilerplate leaves more space for relevant details, and
> > > helps people like me who struggle to comply with character limits
> > > :).
> > 
> > What about "hw/amd_iommu:" to keep 'hw' in subject?
> I also prefer "amd_iommu". But I'll wait for alejandro's reply before
> posting V2.

If possible, pls use that prefix on the cover letter too.

Thanks!

-- 
MST



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Cleanups and fixes (PART 2)
  2025-10-10  2:19   ` Philippe Mathieu-Daudé
  2025-10-10  5:25     ` Sairaj Kodilkar
@ 2025-10-10 14:37     ` Alejandro Jimenez
  2025-10-10 14:45       ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: Alejandro Jimenez @ 2025-10-10 14:37 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Sairaj Kodilkar, qemu-devel
  Cc: mst, pbonzini, richard.henderson, suravee.suthikulpanit,
	vasant.hegde, marcel.apfelbaum, eduardo, santosh.shukla, aik

Hi Philippe,

On 10/9/25 10:19 PM, Philippe Mathieu-Daudé wrote:
> Hi Alejandro,
> 
> On 10/10/25 03:33, Alejandro Jimenez wrote:
> 
>> I know the commit log is not consistent so far, but going forward I 
>> propose we adopt the shorter prefix "amd_iommu: " for commit 
>> summaries. There is no ambiguity (only one arch has amd_iommu), so the 
>> full path is not required (i.e. avoid 'hw/i386/amd_iommu: '). Shorter 
>> boilerplate leaves more space for relevant details, and helps people 
>> like me who struggle to comply with character limits :).
> 
> What about "hw/amd_iommu:" to keep 'hw' in subject?

Is there any tooling that relies on the hw prefix? Skipping the arch in 
the prefix is confusing I think, since hw/amd_iommu is not a valid path 
in the repository.

I was looking for precedent of any preferred format in the commit logs 
under hw/i386/ and there is a lot of variance. But specifically for 
IOMMU emulation code, my interpretation is that the short prefix style 
is most commonly used e.g.

Common x86 IOMMU uses "x86-iommu: "

The VT-d changes are typically in the form:
"intel_iommu: XYZ", which Clément also pointed out recently in:
https://lore.kernel.org/qemu-devel/f97bc435e8ed1c295919350d300068e45ab0bb67.camel@eviden.com/

virtio IOMMU uses "virtio-iommu: "

RISC-V IOMMU uses the full path: "hw/riscv/riscv-iommu: "

SPARC64 has a few commits with "sun4u_iommu: "

I don't believe the 'hw' component is required to avoid ambiguity, but 
perhaps there is something else I am missing...

Thank you,
Alejandro


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] Cleanups and fixes (PART 2)
  2025-10-10 14:37     ` Alejandro Jimenez
@ 2025-10-10 14:45       ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2025-10-10 14:45 UTC (permalink / raw)
  To: Alejandro Jimenez
  Cc: Philippe Mathieu-Daudé, Sairaj Kodilkar, qemu-devel,
	pbonzini, richard.henderson, suravee.suthikulpanit, vasant.hegde,
	marcel.apfelbaum, eduardo, santosh.shukla, aik

On Fri, Oct 10, 2025 at 10:37:55AM -0400, Alejandro Jimenez wrote:
> Hi Philippe,
> 
> On 10/9/25 10:19 PM, Philippe Mathieu-Daudé wrote:
> > Hi Alejandro,
> > 
> > On 10/10/25 03:33, Alejandro Jimenez wrote:
> > 
> > > I know the commit log is not consistent so far, but going forward I
> > > propose we adopt the shorter prefix "amd_iommu: " for commit
> > > summaries. There is no ambiguity (only one arch has amd_iommu), so
> > > the full path is not required (i.e. avoid 'hw/i386/amd_iommu: ').
> > > Shorter boilerplate leaves more space for relevant details, and
> > > helps people like me who struggle to comply with character limits
> > > :).
> > 
> > What about "hw/amd_iommu:" to keep 'hw' in subject?
> 
> Is there any tooling that relies on the hw prefix? Skipping the arch in the
> prefix is confusing I think, since hw/amd_iommu is not a valid path in the
> repository.
> 
> I was looking for precedent of any preferred format in the commit logs under
> hw/i386/ and there is a lot of variance. But specifically for IOMMU
> emulation code, my interpretation is that the short prefix style is most
> commonly used e.g.
> 
> Common x86 IOMMU uses "x86-iommu: "
> 
> The VT-d changes are typically in the form:
> "intel_iommu: XYZ", which Clément also pointed out recently in:
> https://lore.kernel.org/qemu-devel/f97bc435e8ed1c295919350d300068e45ab0bb67.camel@eviden.com/
> 
> virtio IOMMU uses "virtio-iommu: "
> 
> RISC-V IOMMU uses the full path: "hw/riscv/riscv-iommu: "
> 
> SPARC64 has a few commits with "sun4u_iommu: "
> 
> I don't believe the 'hw' component is required to avoid ambiguity, but
> perhaps there is something else I am missing...
> 
> Thank you,
> Alejandro

FWIW I like amd_iommu

-- 
MST



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-10-10 14:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-08 16:43 [PATCH 0/2] Cleanups and fixes (PART 2) Sairaj Kodilkar
2025-10-08 16:43 ` [PATCH 1/2] hw/i386/amd_iommu: Fix handling device on buses != 0 Sairaj Kodilkar
2025-10-09 16:17   ` Alejandro Jimenez
2025-10-10  5:04     ` Sairaj Kodilkar
2025-10-08 16:43 ` [PATCH 2/2] hw/i386/amd_iommu: Support 64 bit address for IOTLB lookup Sairaj Kodilkar
2025-10-10  1:22   ` Alejandro Jimenez
2025-10-10  5:14     ` Sairaj Kodilkar
2025-10-10  1:33 ` [PATCH 0/2] Cleanups and fixes (PART 2) Alejandro Jimenez
2025-10-10  2:19   ` Philippe Mathieu-Daudé
2025-10-10  5:25     ` Sairaj Kodilkar
2025-10-10  6:39       ` Michael S. Tsirkin
2025-10-10 14:37     ` Alejandro Jimenez
2025-10-10 14:45       ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).