[PATCH v3 07/22] amd_iommu: Add helpers to walk AMD v1 Page Table format

qemu-devel.nongnu.org archive mirror

From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
To: qemu-devel@nongnu.org
Cc: mst@redhat.com, clement.mathieu--drif@eviden.com,
	pbonzini@redhat.com, richard.henderson@linaro.org,
	eduardo@habkost.net, peterx@redhat.com, david@redhat.com,
	philmd@linaro.org, marcel.apfelbaum@gmail.com,
	alex.williamson@redhat.com, imammedo@redhat.com,
	anisinha@redhat.com, vasant.hegde@amd.com,
	suravee.suthikulpanit@amd.com, santosh.shukla@amd.com,
	sarunkod@amd.com, Wei.Huang2@amd.com, Ankit.Soni@amd.com,
	ethan.milon@eviden.com, joao.m.martins@oracle.com,
	boris.ostrovsky@oracle.com, alejandro.j.jimenez@oracle.com
Subject: [PATCH v3 07/22] amd_iommu: Add helpers to walk AMD v1 Page Table format
Date: Fri, 19 Sep 2025 21:35:00 +0000
Message-ID: <20250919213515.917111-8-alejandro.j.jimenez@oracle.com>
In-Reply-To: <20250919213515.917111-1-alejandro.j.jimenez@oracle.com>

The current amdvi_page_walk() is designed to be called by the replay()
method. Rather than drastically altering it, introduce helpers to fetch
guest PTEs that will be used by a page walker implementation.

Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
---
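
Editorial sketch (not part of the patch): the index arithmetic that
fetch_pte() relies on can be tried outside QEMU. The block below copies the
level macros from the header hunk, replaces GENMASK64(8, 0) with the literal
0x1ff, and adds two helpers, iova_fits_mode() and pte_offset(), which are
hypothetical names invented for illustration and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the patch's macros so the sketch builds standalone. */
#define PT_LEVEL_SHIFT(level)        (12 + ((level) * 9))
/* 0x1ff stands in for GENMASK64(8, 0): nine index bits per level. */
#define PT_LEVEL_INDEX(level, iova)  (((iova) >> PT_LEVEL_SHIFT(level)) & \
                                     0x1ffULL)
#define PT_LEVEL_MAX_ADDR(x)         (((x) < 5) ? \
                                     ((1ULL << PT_LEVEL_SHIFT((x) + 1)) - 1) : \
                                     (~(0ULL)))
#define PTE_LEVEL_PAGE_SIZE(level)   (1ULL << (PT_LEVEL_SHIFT(level)))

/*
 * Hypothetical helper: mirrors the range check in fetch_pte(), where the
 * top level of the walk is (mode - 1). Assumes 1 <= mode <= 6.
 */
static int iova_fits_mode(uint64_t iova, unsigned mode)
{
    return iova <= PT_LEVEL_MAX_ADDR(mode - 1);
}

/*
 * Hypothetical helper: byte offset of the entry selected by 'iova' inside
 * the table at 'level'. Each entry is 8 bytes, as in NEXT_PTE_ADDR().
 */
static uint64_t pte_offset(uint64_t iova, unsigned level)
{
    return PT_LEVEL_INDEX(level, iova) * 8;
}
```

With DTE[Mode] = 3 the largest IOVA accepted is 0x7fffffffff (39 bits), and
an IOVA such as 0x40201000 selects entry 1 (byte offset 8) at levels 2, 1
and 0.
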
 hw/i386/amd_iommu.c | 123 ++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/amd_iommu.h |  40 ++++++++++++++
 2 files changed, 163 insertions(+)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 29ed3f0ef292e..c25981ff93c02 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -87,6 +87,8 @@ typedef enum AMDVIFaultReason {
     AMDVI_FR_DTE_RTR_ERR = 1,   /* Failure to retrieve DTE */
     AMDVI_FR_DTE_V,             /* DTE[V] = 0 */
     AMDVI_FR_DTE_TV,            /* DTE[TV] = 0 */
+    AMDVI_FR_PT_ROOT_INV,       /* Page Table Root ptr invalid */
+    AMDVI_FR_PT_ENTRY_INV,      /* Failure to read PTE from guest memory */
 } AMDVIFaultReason;
 
 uint64_t amdvi_extended_feature_register(AMDVIState *s)
@@ -558,6 +560,127 @@ static int amdvi_as_to_dte(AMDVIAddressSpace *as, uint64_t *dte)
     return 0;
 }
 
+/*
+ * For a PTE encoding a large page, return the page size it encodes as described
+ * by the AMD IOMMU Specification Table 14: Example Page Size Encodings.
+ * No need to adjust the value of the PTE to point to the first PTE in the large
+ * page since the encoding guarantees all "base" PTEs in the large page are the
+ * same.
+ */
+static uint64_t large_pte_page_size(uint64_t pte)
+{
+    assert(PTE_NEXT_LEVEL(pte) == 7);
+
+    /* Determine size of the large/contiguous page encoded in the PTE */
+    return PTE_LARGE_PAGE_SIZE(pte);
+}
+
+/*
+ * Helper function to fetch a PTE using AMD v1 pgtable format.
+ * On successful page walk, returns 0 and pte parameter points to a valid PTE.
+ * On failure, returns:
+ * -AMDVI_FR_PT_ROOT_INV: A page walk is not possible due to conditions like DTE
+ *      with invalid permissions, Page Table Root cannot be read from DTE, or a
+ *      larger IOVA than supported by page table level encoded in DTE[Mode].
+ * -AMDVI_FR_PT_ENTRY_INV: A PTE could not be read from guest memory during a
+ *      page table walk. This means that the DTE has valid data, but one of the
+ *      lower level entries in the Page Table could not be read.
+ */
+static int __attribute__((unused))
+fetch_pte(AMDVIAddressSpace *as, hwaddr address, uint64_t dte, uint64_t *pte,
+          hwaddr *page_size)
+{
+    IOMMUAccessFlags perms = amdvi_get_perms(dte);
+
+    uint8_t level, mode;
+    uint64_t pte_addr;
+
+    *pte = dte;
+    *page_size = 0;
+
+    if (perms == IOMMU_NONE) {
+        return -AMDVI_FR_PT_ROOT_INV;
+    }
+
+    /*
+     * The Linux kernel driver initializes the default mode to 3, corresponding
+     * to a 39-bit IOVA space, where each entry in the top-level page table
+     * covers a 1GB (2^30) range.
+     */
+    level = mode = get_pte_translation_mode(dte);
+    assert(mode > 0 && mode < 7);
+
+    /*
+     * If IOVA is larger than the max supported by the current pgtable level,
+     * there is nothing to do.
+     */
+    if (address > PT_LEVEL_MAX_ADDR(mode - 1)) {
+        /* IOVA too large for the current DTE */
+        return -AMDVI_FR_PT_ROOT_INV;
+    }
+
+    do {
+        level -= 1;
+
+        /* Update the page_size */
+        *page_size = PTE_LEVEL_PAGE_SIZE(level);
+
+        /* Permission bits are ANDed at every level, including the DTE */
+        perms &= amdvi_get_perms(*pte);
+        if (perms == IOMMU_NONE) {
+            return 0;
+        }
+
+        /* Not Present */
+        if (!IOMMU_PTE_PRESENT(*pte)) {
+            return 0;
+        }
+
+        /* Large or Leaf PTE found */
+        if (PTE_NEXT_LEVEL(*pte) == 7 || PTE_NEXT_LEVEL(*pte) == 0) {
+            /* Stop the walk; a large page size is resolved after the loop */
+            break;
+        }
+
+        /*
+         * Index the pgtable using the IOVA bits corresponding to current level
+         * and walk down to the lower level.
+         */
+        pte_addr = NEXT_PTE_ADDR(*pte, level, address);
+        *pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
+
+        if (*pte == (uint64_t)-1) {
+            /*
+             * A returned PTE of -1 indicates a failure to read the page table
+             * entry from guest memory.
+             */
+            if (level == mode - 1) {
+                /* Failure to retrieve the Page Table from Root Pointer */
+                *page_size = 0;
+                return -AMDVI_FR_PT_ROOT_INV;
+            } else {
+                /* Failure to read PTE. Page walk skips a page_size chunk */
+                return -AMDVI_FR_PT_ENTRY_INV;
+            }
+        }
+    } while (level > 0);
+
+    assert(PTE_NEXT_LEVEL(*pte) == 0 || PTE_NEXT_LEVEL(*pte) == 7 ||
+           level == 0);
+    /*
+     * Page walk ends when Next Level field on PTE shows that either a leaf PTE
+     * or a series of large PTEs have been reached. In the latter case, even if
+     * the range starts in the middle of a contiguous page, the returned PTE
+     * must be the first PTE of the series.
+     */
+    if (PTE_NEXT_LEVEL(*pte) == 7) {
+        /* Update page_size with the large PTE page size */
+        *page_size = large_pte_page_size(*pte);
+    }
+
+    return 0;
+}
+
 /* log error without aborting since linux seems to be using reserved bits */
 static void amdvi_inval_devtab_entry(AMDVIState *s, uint64_t *cmd)
 {
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index c1170a820257e..9f833b297d25c 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -178,6 +178,46 @@
 #define AMDVI_GATS_MODE                 (2ULL <<  12)
 #define AMDVI_HATS_MODE                 (2ULL <<  10)
 
+/* Page Table format */
+
+#define AMDVI_PTE_PR                    (1ULL << 0)
+#define AMDVI_PTE_NEXT_LEVEL_MASK       GENMASK64(11, 9)
+
+#define IOMMU_PTE_PRESENT(pte)          ((pte) & AMDVI_PTE_PR)
+
+/* Using level=0 for leaf PTE at 4K page size */
+#define PT_LEVEL_SHIFT(level)           (12 + ((level) * 9))
+
+/* Return IOVA bit group used to index the Page Table at specific level */
+#define PT_LEVEL_INDEX(level, iova)     (((iova) >> PT_LEVEL_SHIFT(level)) & \
+                                        GENMASK64(8, 0))
+
+/* Return the max address for a specified level i.e. max_oaddr */
+#define PT_LEVEL_MAX_ADDR(x)    (((x) < 5) ? \
+                                ((1ULL << PT_LEVEL_SHIFT((x) + 1)) - 1) : \
+                                (~(0ULL)))
+
+/* Extract the NextLevel field from PTE/PDE */
+#define PTE_NEXT_LEVEL(pte)     (((pte) & AMDVI_PTE_NEXT_LEVEL_MASK) >> 9)
+
+/* Take page table level and return the default page size mapped at that level */
+#define PTE_LEVEL_PAGE_SIZE(level)      (1ULL << (PT_LEVEL_SHIFT(level)))
+
+/*
+ * Return address of lower level page table encoded in PTE and specified by
+ * current level and corresponding IOVA bit group at such level.
+ */
+#define NEXT_PTE_ADDR(pte, level, iova) (((pte) & AMDVI_DEV_PT_ROOT_MASK) + \
+                                        (PT_LEVEL_INDEX(level, iova) * 8))
+
+/*
+ * Take a PTE value with NextLevel=0x07 and return the page size it encodes.
+ */
+#define PTE_LARGE_PAGE_SIZE(pte)    (1ULL << (1 + cto64(((pte) | 0xfffULL))))
+
+/* Return number of PTEs to use for a given page size (expected power of 2) */
+#define PAGE_SIZE_PTE_COUNT(pgsz)       (1ULL << ((ctz64(pgsz) - 12) % 9))
+
 /* IOTLB */
 #define AMDVI_IOTLB_MAX_SIZE 1024
 #define AMDVI_DEVID_SHIFT    36
-- 
2.43.5
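
Editorial sketch (not part of the patch): a worked example of the large page
encoding that PTE_LARGE_PAGE_SIZE() decodes. For a page of size 2^n, address
bits [n-2:12] are set to 1 and bit n-1 is 0, so counting the trailing ones of
(pte | 0xfff) and adding one recovers n. The block uses portable loops in
place of QEMU's cto64()/ctz64(), and every function name in it, including
encode_large_pte(), is hypothetical and chosen only for illustration. It
assumes a power-of-two size of at least 8K and a base aligned to that size.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for QEMU's cto64(): count trailing one bits. */
static unsigned count_trailing_ones64(uint64_t v)
{
    unsigned n = 0;

    while (v & 1) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Portable stand-in for QEMU's ctz64(): count trailing zero bits. */
static unsigned count_trailing_zeros64(uint64_t v)
{
    unsigned n = 0;

    if (v == 0) {
        return 64;
    }
    while (!(v & 1)) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Same arithmetic as PTE_LARGE_PAGE_SIZE() in the patch. */
static uint64_t large_page_size(uint64_t pte)
{
    return 1ULL << (1 + count_trailing_ones64(pte | 0xfffULL));
}

/* Same arithmetic as PAGE_SIZE_PTE_COUNT() in the patch. */
static uint64_t page_size_pte_count(uint64_t pgsz)
{
    return 1ULL << ((count_trailing_zeros64(pgsz) - 12) % 9);
}

/*
 * Hypothetical encoder: build a PTE with PR=1 and NextLevel=7 whose address
 * field carries the size encoding described above.
 */
static uint64_t encode_large_pte(uint64_t base, uint64_t size)
{
    uint64_t size_bits = ((size >> 1) - 1) & ~0xfffULL;

    return base | size_bits | (7ULL << 9) | 1ULL;
}
```

For instance, a 32K page at base 0x100000 encodes to 0x103e01 and decodes
back to 0x8000; it is backed by 8 level-0 PTEs, while a 2M page needs a
single entry one level up.
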
