From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
To: qemu-devel@nongnu.org
Cc: mst@redhat.com, clement.mathieu--drif@eviden.com,
pbonzini@redhat.com, richard.henderson@linaro.org,
eduardo@habkost.net, peterx@redhat.com, david@redhat.com,
philmd@linaro.org, marcel.apfelbaum@gmail.com,
alex.williamson@redhat.com, imammedo@redhat.com,
anisinha@redhat.com, vasant.hegde@amd.com,
suravee.suthikulpanit@amd.com, santosh.shukla@amd.com,
sarunkod@amd.com, Wei.Huang2@amd.com, Ankit.Soni@amd.com,
ethan.milon@eviden.com, joao.m.martins@oracle.com,
boris.ostrovsky@oracle.com, alejandro.j.jimenez@oracle.com
Subject: [PATCH v3 07/22] amd_iommu: Add helpers to walk AMD v1 Page Table format
Date: Fri, 19 Sep 2025 21:35:00 +0000
Message-ID: <20250919213515.917111-8-alejandro.j.jimenez@oracle.com>
In-Reply-To: <20250919213515.917111-1-alejandro.j.jimenez@oracle.com>
The current amdvi_page_walk() is designed to be called by the replay()
method. Rather than drastically altering it, introduce helpers to fetch
guest PTEs that will be used by a page walker implementation.
Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
---
hw/i386/amd_iommu.c | 123 ++++++++++++++++++++++++++++++++++++++++++++
hw/i386/amd_iommu.h | 40 ++++++++++++++
2 files changed, 163 insertions(+)
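Illustrative note (not part of the patch): the level/shift arithmetic behind
the new macros can be checked in isolation. The sketch below transcribes a
subset of the macros from the amd_iommu.h hunk and assumes GCC/Clang builtins
as stand-ins for QEMU's GENMASK64(), ctz64() and cto64(), so that it builds
outside the QEMU tree. The helper sketch_self_check() and the sample IOVA and
PTE values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-ins for QEMU's GENMASK64(), ctz64() and cto64(). These are
 * assumptions made only so the sketch compiles on its own (GCC/Clang).
 */
#define GENMASK64(h, l) (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))
static inline int ctz64(uint64_t v) { return v ? __builtin_ctzll(v) : 64; }
static inline int cto64(uint64_t v) { return ctz64(~v); }

/* Subset of the macros from the amd_iommu.h hunk */
#define PT_LEVEL_SHIFT(level)       (12 + ((level) * 9))
#define PT_LEVEL_INDEX(level, iova) (((iova) >> PT_LEVEL_SHIFT(level)) & \
                                     GENMASK64(8, 0))
#define PT_LEVEL_MAX_ADDR(x)        (((x) < 5) ? \
                                     ((1ULL << PT_LEVEL_SHIFT((x) + 1)) - 1) : \
                                     (~(0ULL)))
#define PTE_LEVEL_PAGE_SIZE(level)  (1ULL << (PT_LEVEL_SHIFT(level)))
#define PTE_LARGE_PAGE_SIZE(pte)    (1ULL << (1 + cto64(((pte) | 0xfffULL))))
#define PAGE_SIZE_PTE_COUNT(pgsz)   (1ULL << ((ctz64(pgsz) - 12) % 9))

/* Hypothetical helper (not in the patch): checks a few worked values */
static inline void sketch_self_check(void)
{
    /* IOVA whose 9-bit index is 1 at each of levels 2, 1 and 0 */
    uint64_t iova = (1ULL << 30) | (1ULL << 21) | (1ULL << 12);

    /* PR=1, NextLevel=7, address bits [14:12] = 011b: a 32 KiB page */
    uint64_t large_pte = 0x103000ULL | (7ULL << 9) | 1ULL;

    /* DTE[Mode]=3 covers a 39-bit IOVA space: PT_LEVEL_MAX_ADDR(mode - 1) */
    assert(PT_LEVEL_MAX_ADDR(2) == (1ULL << 39) - 1);

    /* Each level consumes 9 IOVA bits above the 4 KiB page offset */
    assert(PT_LEVEL_INDEX(2, iova) == 1);
    assert(PT_LEVEL_INDEX(1, iova) == 1);
    assert(PT_LEVEL_INDEX(0, iova) == 1);

    /* Default page sizes per level: 4 KiB, 2 MiB, 1 GiB */
    assert(PTE_LEVEL_PAGE_SIZE(0) == 4096);
    assert(PTE_LEVEL_PAGE_SIZE(1) == (1ULL << 21));
    assert(PTE_LEVEL_PAGE_SIZE(2) == (1ULL << 30));

    /* 14 trailing ones after OR-ing in 0xfff, so the size is 2^15 bytes */
    assert(PTE_LARGE_PAGE_SIZE(large_pte) == 32768);

    /* A 32 KiB page spans 8 4 KiB PTEs; a 2 MiB page needs a single PDE */
    assert(PAGE_SIZE_PTE_COUNT(32768) == 8);
    assert(PAGE_SIZE_PTE_COUNT(1ULL << 21) == 1);
}
```

Calling sketch_self_check() from a small main() should run silently when the
arithmetic matches the description in the header comments.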
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 29ed3f0ef292e..c25981ff93c02 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -87,6 +87,8 @@ typedef enum AMDVIFaultReason {
AMDVI_FR_DTE_RTR_ERR = 1, /* Failure to retrieve DTE */
AMDVI_FR_DTE_V, /* DTE[V] = 0 */
AMDVI_FR_DTE_TV, /* DTE[TV] = 0 */
+ AMDVI_FR_PT_ROOT_INV, /* Page Table Root ptr invalid */
+ AMDVI_FR_PT_ENTRY_INV, /* Failure to read PTE from guest memory */
} AMDVIFaultReason;
uint64_t amdvi_extended_feature_register(AMDVIState *s)
@@ -558,6 +560,127 @@ static int amdvi_as_to_dte(AMDVIAddressSpace *as, uint64_t *dte)
return 0;
}
+/*
+ * For a PTE encoding a large page, return the page size it encodes as described
+ * by the AMD IOMMU Specification Table 14: Example Page Size Encodings.
+ * No need to adjust the value of the PTE to point to the first PTE in the large
+ * page since the encoding guarantees all "base" PTEs in the large page are the
+ * same.
+ */
+static uint64_t large_pte_page_size(uint64_t pte)
+{
+ assert(PTE_NEXT_LEVEL(pte) == 7);
+
+ /* Determine size of the large/contiguous page encoded in the PTE */
+ return PTE_LARGE_PAGE_SIZE(pte);
+}
+
+/*
+ * Helper function to fetch a PTE using AMD v1 pgtable format.
+ * On successful page walk, returns 0 and pte parameter points to a valid PTE.
+ * On failure, returns:
+ * -AMDVI_FR_PT_ROOT_INV: A page walk is not possible due to conditions like DTE
+ * with invalid permissions, Page Table Root can not be read from DTE, or a
+ * larger IOVA than supported by page table level encoded in DTE[Mode].
+ * -AMDVI_FR_PT_ENTRY_INV: A PTE could not be read from guest memory during a
+ * page table walk. This means that the DTE has valid data, but one of the
+ * lower level entries in the Page Table could not be read.
+ */
+static int __attribute__((unused))
+fetch_pte(AMDVIAddressSpace *as, hwaddr address, uint64_t dte, uint64_t *pte,
+ hwaddr *page_size)
+{
+ IOMMUAccessFlags perms = amdvi_get_perms(dte);
+
+ uint8_t level, mode;
+ uint64_t pte_addr;
+
+ *pte = dte;
+ *page_size = 0;
+
+ if (perms == IOMMU_NONE) {
+ return -AMDVI_FR_PT_ROOT_INV;
+ }
+
+ /*
+ * The Linux kernel driver initializes the default mode to 3, corresponding
+ * to a 39-bit GPA space, where each entry in the pagetable translates to a
+ * 1GB (2^30) page size.
+ */
+ level = mode = get_pte_translation_mode(dte);
+ assert(mode > 0 && mode < 7);
+
+ /*
+ * If IOVA is larger than the max supported by the current pgtable level,
+ * there is nothing to do.
+ */
+ if (address > PT_LEVEL_MAX_ADDR(mode - 1)) {
+ /* IOVA too large for the current DTE */
+ return -AMDVI_FR_PT_ROOT_INV;
+ }
+
+ do {
+ level -= 1;
+
+ /* Update the page_size */
+ *page_size = PTE_LEVEL_PAGE_SIZE(level);
+
+ /* Permission bits are ANDed at every level, including the DTE */
+ perms &= amdvi_get_perms(*pte);
+ if (perms == IOMMU_NONE) {
+ return 0;
+ }
+
+ /* Not Present */
+ if (!IOMMU_PTE_PRESENT(*pte)) {
+ return 0;
+ }
+
+ /* Large or Leaf PTE found */
+ if (PTE_NEXT_LEVEL(*pte) == 7 || PTE_NEXT_LEVEL(*pte) == 0) {
+ /* Leaf PTE found */
+ break;
+ }
+
+ /*
+ * Index the pgtable using the IOVA bits corresponding to current level
+ * and walk down to the lower level.
+ */
+ pte_addr = NEXT_PTE_ADDR(*pte, level, address);
+ *pte = amdvi_get_pte_entry(as->iommu_state, pte_addr, as->devfn);
+
+ if (*pte == (uint64_t)-1) {
+ /*
+ * A returned PTE of -1 indicates a failure to read the page table
+ * entry from guest memory.
+ */
+ if (level == mode - 1) {
+ /* Failure to retrieve the Page Table from Root Pointer */
+ *page_size = 0;
+ return -AMDVI_FR_PT_ROOT_INV;
+ } else {
+ /* Failure to read PTE. Page walk skips a page_size chunk */
+ return -AMDVI_FR_PT_ENTRY_INV;
+ }
+ }
+ } while (level > 0);
+
+ assert(PTE_NEXT_LEVEL(*pte) == 0 || PTE_NEXT_LEVEL(*pte) == 7 ||
+ level == 0);
+ /*
+ * Page walk ends when Next Level field on PTE shows that either a leaf PTE
+ * or a series of large PTEs have been reached. In the latter case, even if
+ * the range starts in the middle of a contiguous page, the returned PTE
+ * must be the first PTE of the series.
+ */
+ if (PTE_NEXT_LEVEL(*pte) == 7) {
+ /* Update page_size with the large PTE page size */
+ *page_size = large_pte_page_size(*pte);
+ }
+
+ return 0;
+}
+
/* log error without aborting since linux seems to be using reserved bits */
static void amdvi_inval_devtab_entry(AMDVIState *s, uint64_t *cmd)
{
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index c1170a820257e..9f833b297d25c 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -178,6 +178,46 @@
#define AMDVI_GATS_MODE (2ULL << 12)
#define AMDVI_HATS_MODE (2ULL << 10)
+/* Page Table format */
+
+#define AMDVI_PTE_PR (1ULL << 0)
+#define AMDVI_PTE_NEXT_LEVEL_MASK GENMASK64(11, 9)
+
+#define IOMMU_PTE_PRESENT(pte) ((pte) & AMDVI_PTE_PR)
+
+/* Using level=0 for leaf PTE at 4K page size */
+#define PT_LEVEL_SHIFT(level) (12 + ((level) * 9))
+
+/* Return IOVA bit group used to index the Page Table at specific level */
+#define PT_LEVEL_INDEX(level, iova) (((iova) >> PT_LEVEL_SHIFT(level)) & \
+ GENMASK64(8, 0))
+
+/* Return the max address for a specified level i.e. max_oaddr */
+#define PT_LEVEL_MAX_ADDR(x) (((x) < 5) ? \
+ ((1ULL << PT_LEVEL_SHIFT((x) + 1)) - 1) : \
+ (~(0ULL)))
+
+/* Extract the NextLevel field from PTE/PDE */
+#define PTE_NEXT_LEVEL(pte) (((pte) & AMDVI_PTE_NEXT_LEVEL_MASK) >> 9)
+
+/* Take page table level and return the default page size for that level */
+#define PTE_LEVEL_PAGE_SIZE(level) (1ULL << (PT_LEVEL_SHIFT(level)))
+
+/*
+ * Return address of lower level page table encoded in PTE and specified by
+ * current level and corresponding IOVA bit group at such level.
+ */
+#define NEXT_PTE_ADDR(pte, level, iova) (((pte) & AMDVI_DEV_PT_ROOT_MASK) + \
+ (PT_LEVEL_INDEX(level, iova) * 8))
+
+/*
+ * Take a PTE value with NextLevel=0x07 and return the page size it encodes.
+ */
+#define PTE_LARGE_PAGE_SIZE(pte) (1ULL << (1 + cto64(((pte) | 0xfffULL))))
+
+/* Return number of PTEs to use for a given page size (expected power of 2) */
+#define PAGE_SIZE_PTE_COUNT(pgsz) (1ULL << ((ctz64(pgsz) - 12) % 9))
+
/* IOTLB */
#define AMDVI_IOTLB_MAX_SIZE 1024
#define AMDVI_DEVID_SHIFT 36
--
2.43.5