Re: [PATCH v7 04/15] iommupt: Add the AMD IOMMU v1 page table format

From: Vasant Hegde <vasant.hegde@amd.com>
To: Jason Gunthorpe <jgg@nvidia.com>, Alexandre Ghiti <alex@ghiti.fr>,
	Anup Patel <anup@brainfault.org>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Jonathan Corbet <corbet@lwn.net>,
	iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	Justin Stitt <justinstitt@google.com>,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-riscv@lists.infradead.org, llvm@lists.linux.dev,
	Bill Wendling <morbo@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <pjw@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Shuah Khan <shuah@kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Will Deacon <will@kernel.org>
Cc: Alexey Kardashevskiy <aik@amd.com>,
	Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
	James Gowans <jgowans@amazon.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Michael Roth <michael.roth@amd.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	patches@lists.linux.dev
Subject: Re: [PATCH v7 04/15] iommupt: Add the AMD IOMMU v1 page table format
Date: Fri, 31 Oct 2025 15:14:42 +0530
Message-ID: <9a9a82c3-04b1-4a0e-b7aa-098e36155bdc@amd.com>
In-Reply-To: <4-v7-ab019a8791e2+175b8-iommu_pt_jgg@nvidia.com>

On 10/23/2025 11:50 PM, Jason Gunthorpe wrote:
> AMD IOMMU v1 is unique in supporting contiguous pages with a variable size
> and it can decode the full 64 bit VA space. Unlike other x86 page tables
> this explicitly does not do sign extension as part of allowing the entire
> 64 bit VA space to be supported.
> 
> The general design is quite similar to the x86 PAE format, except with a
> 6th level and quite different PTE encoding.
> 
> This format is the only one that uses the PT_FEAT_DYNAMIC_TOP feature in
> the existing code as the existing AMDv1 code starts out with a 3 level
> table and adds levels on the fly if more IOVA is needed.
> 
> Comparing the performance of several operations to the existing version:
> 
> iommu_map()
>    pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
>      2^12,     65,64    ,      62,61      ,  -1.01
>      2^13,     70,66    ,      67,62      ,  -8.08
>      2^14,     73,69    ,      71,65      ,  -9.09
>      2^15,     78,75    ,      75,71      ,  -5.05
>      2^16,     89,89    ,      86,84      ,  -2.02
>      2^17,    128,121   ,     124,112     , -10.10
>      2^18,    175,175   ,     170,163     ,  -4.04
>      2^19,    264,306   ,     261,279     ,   6.06
>      2^20,    444,525   ,     438,489     ,  10.10
>      2^21,     60,62    ,      58,59      ,   1.01
>  256*2^12,    381,1833  ,     367,1795    ,  79.79
>  256*2^21,    375,1623  ,     356,1555    ,  77.77
>  256*2^30,    356,1338  ,     349,1277    ,  72.72
> 
> iommu_unmap()
>    pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
>      2^12,     76,89    ,      71,86      ,  17.17
>      2^13,     79,89    ,      75,86      ,  12.12
>      2^14,     78,90    ,      74,86      ,  13.13
>      2^15,     82,89    ,      74,86      ,  13.13
>      2^16,     79,89    ,      74,86      ,  13.13
>      2^17,     81,89    ,      77,87      ,  11.11
>      2^18,     90,92    ,      87,89      ,   2.02
>      2^19,     91,93    ,      88,90      ,   2.02
>      2^20,     96,95    ,      91,92      ,   1.01
>      2^21,     72,88    ,      68,85      ,  20.20
>  256*2^12,    372,6583  ,     364,6251    ,  94.94
>  256*2^21,    398,6032  ,     392,5758    ,  93.93
>  256*2^30,    396,5665  ,     389,5258    ,  92.92
> 
> The ~5-17x speedup when working with multi-PTE map/unmaps is because the
> AMD implementation rewalks the entire table on every new PTE while this
> version retains its position. The same speedup will be seen with dirtys as
> well.
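
To make the rewalk-versus-retained-position point concrete, here is a toy cost model. It is not taken from the patch: the function names are hypothetical, and it only counts table-entry visits for a run of PTEs that share one leaf table, ignoring caches, locking and allocation. The measured numbers are the ones in the tables above.

```c
#include <stdint.h>

/*
 * Toy model only (hypothetical helpers, not kernel code): count how many
 * table entries are visited to install nr_ptes leaf entries that all live
 * in the same leaf table of a page table with `levels` levels.
 */

/* Start again from the top table for every PTE. */
static uint64_t steps_rewalk(unsigned int levels, uint64_t nr_ptes)
{
	return (uint64_t)levels * nr_ptes;
}

/* Walk down once, then advance one slot for each additional PTE. */
static uint64_t steps_retained(unsigned int levels, uint64_t nr_ptes)
{
	if (nr_ptes == 0)
		return 0;
	return (uint64_t)levels + (nr_ptes - 1);
}
```

With 3 levels and 256 PTEs the model visits 768 entries when rewalking and 258 when the position is retained, which is the same direction as the 256*2^12 rows above.
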
> 
> The old implementation triggers a compiler optimization that ends up
> generating a "rep stos" memset for contiguous PTEs. Since AMD can have
> contiguous PTEs that span 2Kbytes of table this is a huge win compared to
> a normal movq loop. It is why the unmap side has a fairly flat runtime as
> the contiguous PTE size increases. This version makes it explicit with a
> memset64() call.
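
For readers following along, a minimal userspace sketch of the idea, assuming every slot of a contiguous run receives the same 64-bit value (zero on unmap). `fill64()` and `write_contig_run()` are hypothetical stand-ins written for this illustration; the kernel helper named in the commit message is memset64().

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's memset64(); hypothetical name. */
static void fill64(uint64_t *dst, uint64_t value, size_t count)
{
	for (size_t i = 0; i < count; i++)
		dst[i] = value;
}

/*
 * Write the same entry into every slot of a contiguous run.
 * A run of 256 eight-byte entries spans 2 KiB of table memory.
 * Passing entry == 0 models clearing the run on unmap.
 */
static void write_contig_run(uint64_t *table, size_t first_slot,
			     size_t nr_slots, uint64_t entry)
{
	fill64(&table[first_slot], entry, nr_slots);
}
```

Keeping the fill in one helper call is what lets the compiler or the library pick a bulk store sequence instead of a per-entry loop in the caller.
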
> 
> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>

> ---
>  drivers/iommu/Makefile                     |   1 +
>  drivers/iommu/generic_pt/Kconfig           |  12 +
>  drivers/iommu/generic_pt/fmt/Makefile      |  11 +
>  drivers/iommu/generic_pt/fmt/amdv1.h       | 391 +++++++++++++++++++++
>  drivers/iommu/generic_pt/fmt/defs_amdv1.h  |  21 ++
>  drivers/iommu/generic_pt/fmt/iommu_amdv1.c |  15 +
>  include/linux/generic_pt/common.h          |  19 +
>  include/linux/generic_pt/iommu.h           |  12 +
>  8 files changed, 482 insertions(+)
>  create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
>  create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
>  create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
>  create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
> 

.../...

> +$(eval $(foreach fmt,$(iommu_pt_fmt-m),$(call create_format,$(fmt),m)))
> diff --git a/drivers/iommu/generic_pt/fmt/amdv1.h b/drivers/iommu/generic_pt/fmt/amdv1.h
> new file mode 100644
> index 00000000000000..1f46e4ab4aea51
> --- /dev/null
> +++ b/drivers/iommu/generic_pt/fmt/amdv1.h
> @@ -0,0 +1,391 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES
> + *
> + * AMD IOMMU v1 page table
> + *
> + * This is described in Section "2.2.3 I/O Page Tables for Host Translations"
> + * of the "AMD I/O Virtualization Technology (IOMMU) Specification"
> + *
> + * Note the level numbering here matches the core code, so level 0 is the same
> + * as mode 1.
> + *
> + */
> +#ifndef __GENERIC_PT_FMT_AMDV1_H
> +#define __GENERIC_PT_FMT_AMDV1_H
> +
> +#include "defs_amdv1.h"
> +#include "../pt_defs.h"
> +
> +#include <asm/page.h>
> +#include <linux/bitfield.h>
> +#include <linux/container_of.h>
> +#include <linux/mem_encrypt.h>
> +#include <linux/minmax.h>
> +#include <linux/sizes.h>
> +#include <linux/string.h>
> +
> +enum {
> +	PT_MAX_OUTPUT_ADDRESS_LG2 = 52,
> +	PT_MAX_VA_ADDRESS_LG2 = 64,
> +	PT_ITEM_WORD_SIZE = sizeof(u64),
> +	PT_MAX_TOP_LEVEL = 5,
> +	PT_GRANULE_LG2SZ = 12,
> +	PT_TABLEMEM_LG2SZ = 12,
> +
> +	/* The DTE only has these bits for the top physical address */
> +	PT_TOP_PHYS_MASK = GENMASK_ULL(51, 12),
> +};
> +
> +/* PTE bits */
> +enum {
> +	AMDV1PT_FMT_PR = BIT(0),
> +	AMDV1PT_FMT_D = BIT(6),
> +	AMDV1PT_FMT_NEXT_LEVEL = GENMASK_ULL(11, 9),
> +	AMDV1PT_FMT_OA = GENMASK_ULL(51, 12),
> +	AMDV1PT_FMT_FC = BIT_ULL(60),
> +	AMDV1PT_FMT_IR = BIT_ULL(61),
> +	AMDV1PT_FMT_IW = BIT_ULL(62),
> +};
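
As a reading aid for the bit layout above, a small userspace sketch that extracts fields using the same bit positions. The `PTE_*` macros and `pte_*()` helpers are hypothetical names for this illustration only; the patch itself uses the AMDV1PT_FMT_* enum with the kernel's bitfield helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the quoted enum, as plain constants. */
#define PTE_PR         (UINT64_C(1) << 0)                  /* bit 0 */
#define PTE_D          (UINT64_C(1) << 6)                  /* bit 6 */
#define PTE_NEXT_LEVEL (UINT64_C(7) << 9)                  /* bits 11:9 */
#define PTE_OA         (((UINT64_C(1) << 40) - 1) << 12)   /* bits 51:12 */
#define PTE_FC         (UINT64_C(1) << 60)
#define PTE_IR         (UINT64_C(1) << 61)
#define PTE_IW         (UINT64_C(1) << 62)

/* True when the PR bit is set. */
static bool pte_present(uint64_t pte)
{
	return (pte & PTE_PR) != 0;
}

/* Value of the 3-bit next-level field, 0 through 7. */
static unsigned int pte_next_level(uint64_t pte)
{
	return (unsigned int)((pte & PTE_NEXT_LEVEL) >> 9);
}

/* Output address with the low 12 bits and the high flag bits masked off. */
static uint64_t pte_output_address(uint64_t pte)
{
	return pte & PTE_OA;
}
```

The next-level values 0 and 7 correspond to the AMDV1PT_FMT_NL_DEFAULT and AMDV1PT_FMT_NL_SIZE defines that follow in the patch.
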
> +
> +/*
> + * gcc 13 has a bug where it thinks the output of FIELD_GET() is an enum, make
> + * these defines to avoid it.
> + */
> +#define AMDV1PT_FMT_NL_DEFAULT 0
> +#define AMDV1PT_FMT_NL_SIZE 7
> +
> +#define common_to_amdv1pt(common_ptr) \
> +	container_of_const(common_ptr, struct pt_amdv1, common)
> +#define to_amdv1pt(pts) common_to_amdv1pt((pts)->range->common)

Unused macros?

-Vasant
