From: Randy Dunlap <rdunlap@infradead.org>
To: Jason Gunthorpe <jgg@nvidia.com>,
Jonathan Corbet <corbet@lwn.net>,
iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
Justin Stitt <justinstitt@google.com>,
Kevin Tian <kevin.tian@intel.com>,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
llvm@lists.linux.dev, Bill Wendling <morbo@google.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
Miguel Ojeda <ojeda@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Shuah Khan <shuah@kernel.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Will Deacon <will@kernel.org>
Cc: Alexey Kardashevskiy <aik@amd.com>,
Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
James Gowans <jgowans@amazon.com>,
Michael Roth <michael.roth@amd.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
patches@lists.linux.dev
Subject: Re: [PATCH v4 03/15] iommupt: Add the basic structure of the iommu implementation
Date: Tue, 26 Aug 2025 22:03:40 -0700
Message-ID: <cc96baee-2be5-433f-9902-a160765e2fae@infradead.org>
In-Reply-To: <3-v4-0d6a6726a372+18959-iommu_pt_jgg@nvidia.com>
On 8/26/25 10:18 AM, Jason Gunthorpe wrote:
> The existing IOMMU page table implementations duplicate all of the working
> algorithms for each format. By using the generic page table API a single C
> version of the IOMMU algorithms can be created and re-used for all of the
> different formats used in the drivers. The implementation will provide a
> single C version of the iommu domain operations: iova_to_phys, map, unmap,
> and read_and_clear_dirty.
>
> Further, adding new algorithms and techniques becomes easy to do across
> the entire fleet of drivers and formats.
>
> The C functions are drop in compatible with the existing iommu_domain_ops
> using the IOMMU_PT_DOMAIN_OPS() macro. Each per-format implementation
> compilation unit will produce exported symbols following the pattern
> pt_iommu_FMT_map_pages() which the macro directly maps to the
> iommu_domain_ops members. This avoids the additional function pointer
> indirection like io-pgtable has.
>
> The top level struct used by the drivers is pt_iommu_table_FMT. It
> contains the other structs to allow container_of() to move between the
> driver, iommu page table, generic page table, and generic format layers.
>
> struct pt_iommu_table_amdv1 {
> struct pt_iommu {
> struct iommu_domain domain;
> } iommu;
> struct pt_amdv1 {
> struct pt_common {
> } common;
> } amdpt;
> };
>
> The driver is expected to union the pt_iommu_table_FMT with it's own
its
(not "it is")
> existing domain struct:
>
> struct driver_domain {
> union {
> struct iommu_domain domain;
> struct pt_iommu_table_amdv1 amdv1;
> };
> };
> PT_IOMMU_CHECK_DOMAIN(struct driver_domain, amdv1, domain);
>
> To create an alias to avoid renaming 'domain' in a lot of driver code.
>
> This allows all the layers to access all the necessary functions to
> implement their different roles with no change to any of the existing
> iommu core code.
>
> Implement the basic starting point: pt_iommu_init(), get_info() and
> deinit().
>
> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/generic_pt/Kconfig | 12 +
> drivers/iommu/generic_pt/fmt/iommu_template.h | 39 +++
> drivers/iommu/generic_pt/iommu_pt.h | 264 ++++++++++++++++++
> include/linux/generic_pt/iommu.h | 118 ++++++++
> 4 files changed, 433 insertions(+)
> create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
> create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
> create mode 100644 include/linux/generic_pt/iommu.h
>
> diff --git a/drivers/iommu/generic_pt/iommu_pt.h b/drivers/iommu/generic_pt/iommu_pt.h
> new file mode 100644
> index 00000000000000..03e1f3daa7a2ef
> --- /dev/null
> +++ b/drivers/iommu/generic_pt/iommu_pt.h
> @@ -0,0 +1,264 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES
> + *
> + * "Templated C code" for implementing the iommu operations for page tables.
> + * This is compiled multiple times, over all the page table formats to pick up
> + * the per-format definitions.
> + */
> +#ifndef __GENERIC_PT_IOMMU_PT_H
> +#define __GENERIC_PT_IOMMU_PT_H
> +
> +#include "pt_iter.h"
> +
> +#include <linux/iommu.h>
> +#include "../iommu-pages.h"
> +#include <linux/export.h>
Could you put <linux/export.h> before <linux/iommu.h>?
Does something prevent that?
> +
> +#define DOMAIN_NS(op) CONCATENATE(CONCATENATE(pt_iommu_, PTPFX), op)
> +
> +struct pt_iommu_collect_args {
> + struct iommu_pages_list free_list;
> + u8 ignore_mapped : 1;
> +};
> +
> +static int __collect_tables(struct pt_range *range, void *arg,
> + unsigned int level, struct pt_table_p *table)
> +{
> + struct pt_state pts = pt_init(range, level, table);
> + struct pt_iommu_collect_args *collect = arg;
> + int ret;
> +
> + if (collect->ignore_mapped && !pt_can_have_table(&pts))
> + return 0;
> +
> + for_each_pt_level_entry(&pts) {
> + if (pts.type == PT_ENTRY_TABLE) {
> + iommu_pages_list_add(&collect->free_list, pts.table_lower);
> + ret = pt_descend(&pts, arg, __collect_tables);
> + if (ret)
> + return ret;
> + continue;
> + }
> + if (pts.type == PT_ENTRY_OA && !collect->ignore_mapped)
> + return -EADDRINUSE;
> + }
> + return 0;
> +}
> +
> +static inline struct pt_table_p *table_alloc_top(struct pt_common *common,
> + uintptr_t top_of_table,
> + gfp_t gfp)
> +{
> + struct pt_iommu *iommu_table = iommu_from_common(common);
> +
> + /*
> + * Top doesn't need the free list or otherwise, so it technically
> + * doesn't need to use iommu pages. Use the API anyhow as the top is
> + * usually not smaller than PAGE_SIZE to keep things simple.
> + */
> + return iommu_alloc_pages_node_sz(
> + iommu_table->nid, gfp,
> + log2_to_int(pt_top_memsize_lg2(common, top_of_table)));
> +}
> +
> +static void NS(get_info)(struct pt_iommu *iommu_table,
> + struct pt_iommu_info *info)
> +{
> + struct pt_common *common = common_from_iommu(iommu_table);
> + struct pt_range range = pt_top_range(common);
> + struct pt_state pts = pt_init_top(&range);
> + pt_vaddr_t pgsize_bitmap = 0;
> +
> + if (pt_feature(common, PT_FEAT_DYNAMIC_TOP)) {
> + for (pts.level = 0; pts.level <= PT_MAX_TOP_LEVEL;
> + pts.level++) {
> + if (pt_table_item_lg2sz(&pts) >= common->max_vasz_lg2)
> + break;
> + pgsize_bitmap |= pt_possible_sizes(&pts);
> + }
> + } else {
> + for (pts.level = 0; pts.level <= range.top_level; pts.level++)
> + pgsize_bitmap |= pt_possible_sizes(&pts);
> + }
> +
> + /* Hide page sizes larger than the maximum OA */
> + info->pgsize_bitmap = oalog2_mod(pgsize_bitmap, common->max_oasz_lg2);
> +}
> +
> +static void NS(deinit)(struct pt_iommu *iommu_table)
> +{
> + struct pt_common *common = common_from_iommu(iommu_table);
> + struct pt_range range = pt_all_range(common);
> + struct pt_iommu_collect_args collect = {
> + .free_list = IOMMU_PAGES_LIST_INIT(collect.free_list),
> + .ignore_mapped = true,
> + };
> +
> + iommu_pages_list_add(&collect.free_list, range.top_table);
> + pt_walk_range(&range, __collect_tables, &collect);
> +
> + /*
> + * The driver has to already have fenced the HW access to the page table
> + * and invalidated any caching referring to this memory.
> + */
> + iommu_put_pages_list(&collect.free_list);
> +}
> +
> +static const struct pt_iommu_ops NS(ops) = {
> + .get_info = NS(get_info),
> + .deinit = NS(deinit),
> +};
> +
> +static int pt_init_common(struct pt_common *common)
> +{
> + struct pt_range top_range = pt_top_range(common);
> +
> + if (PT_WARN_ON(top_range.top_level > PT_MAX_TOP_LEVEL))
> + return -EINVAL;
> +
> + if (top_range.top_level == PT_MAX_TOP_LEVEL ||
> + common->max_vasz_lg2 == top_range.max_vasz_lg2)
> + common->features &= ~BIT(PT_FEAT_DYNAMIC_TOP);
> +
> + if (top_range.max_vasz_lg2 == PT_VADDR_MAX_LG2)
> + common->features |= BIT(PT_FEAT_FULL_VA);
> +
> + /* Requested features must match features compiled into this format */
> + if ((common->features & ~(unsigned int)PT_SUPPORTED_FEATURES) ||
> + (!IS_ENABLED(CONFIG_DEBUG_GENERIC_PT) &&
> + (common->features & PT_FORCE_ENABLED_FEATURES) !=
> + PT_FORCE_ENABLED_FEATURES))
> + return -EOPNOTSUPP;
> +
> + if (common->max_oasz_lg2 == 0)
> + common->max_oasz_lg2 = pt_max_output_address_lg2(common);
> + else
> + common->max_oasz_lg2 = min(common->max_oasz_lg2,
> + pt_max_output_address_lg2(common));
> + return 0;
> +}
> +
> +static int pt_iommu_init_domain(struct pt_iommu *iommu_table,
> + struct iommu_domain *domain)
> +{
> + struct pt_common *common = common_from_iommu(iommu_table);
> + struct pt_iommu_info info;
> + struct pt_range range;
> +
> + NS(get_info)(iommu_table, &info);
> +
> + domain->type = __IOMMU_DOMAIN_PAGING;
> + domain->pgsize_bitmap = info.pgsize_bitmap;
> +
> + if (pt_feature(common, PT_FEAT_DYNAMIC_TOP))
> + range = _pt_top_range(common,
> + _pt_top_set(NULL, PT_MAX_TOP_LEVEL));
> + else
> + range = pt_top_range(common);
> +
> + /*
> + * A 64 bit high address space table on a 32 bit system cannot work.
64-bit 32-bit
> + */
> + domain->geometry.aperture_start = (unsigned long)range.va;
> + if ((pt_vaddr_t)domain->geometry.aperture_start != range.va)
> + return -EOVERFLOW;
> +
> + /*
> + * The aperture is limited to what the API can do after considering all
> + * the different types dma_addr_t/unsigned long/pt_vaddr_t that are used
> + * to store a VA. Set the aperture to something that is valid for all
> + * cases. Saturate instead of truncate the end if the types are smaller
> + * than the top range. aperture_end is a last.
Does "is a last" have something to do with terminating loop iteration?
Is it inclusive or exclusive?
> + */
> + domain->geometry.aperture_end = (unsigned long)range.last_va;
> + if ((pt_vaddr_t)domain->geometry.aperture_end != range.last_va) {
> + domain->geometry.aperture_end = ULONG_MAX;
> + domain->pgsize_bitmap &= ULONG_MAX;
> + }
> + domain->geometry.force_aperture = true;
> +
> + return 0;
> +}
> +
> +static void pt_iommu_zero(struct pt_iommu_table *fmt_table)
> +{
> + struct pt_iommu *iommu_table = &fmt_table->iommu;
> + struct pt_iommu cfg = *iommu_table;
> +
> + static_assert(offsetof(struct pt_iommu_table, iommu.domain) == 0);
> + memset_after(fmt_table, 0, iommu.domain);
> +
> + /* The caller can initialize some of these values */
> + iommu_table->nid = cfg.nid;
> +}
> +
> +#define pt_iommu_table_cfg CONCATENATE(pt_iommu_table, _cfg)
> +#define pt_iommu_init CONCATENATE(CONCATENATE(pt_iommu_, PTPFX), init)
My eyes would appreciate a blank line here.
> +int pt_iommu_init(struct pt_iommu_table *fmt_table,
> + const struct pt_iommu_table_cfg *cfg, gfp_t gfp)
> +{
> + struct pt_iommu *iommu_table = &fmt_table->iommu;
> + struct pt_common *common = common_from_iommu(iommu_table);
> + struct pt_table_p *table_mem;
> + int ret;
[snip]
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("IOMMU Pagetable implementation for " __stringify(PTPFX_RAW));
Page table
for consistency.
> +MODULE_IMPORT_NS("GENERIC_PT");
> +
> +#endif
> diff --git a/include/linux/generic_pt/iommu.h b/include/linux/generic_pt/iommu.h
> new file mode 100644
> index 00000000000000..9d2152bc64c0d6
> --- /dev/null
> +++ b/include/linux/generic_pt/iommu.h
> @@ -0,0 +1,118 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES
> + */
> +#ifndef __GENERIC_PT_IOMMU_H
> +#define __GENERIC_PT_IOMMU_H
> +
> +#include <linux/generic_pt/common.h>
> +#include <linux/iommu.h>
> +#include <linux/mm_types.h>
> +
> +struct pt_iommu_ops;
> +
> +/**
> + * DOC: IOMMU Radix Page Table
> + *
> + * The iommu implementation of the Generic Page Table provides an ops struct
s/iommu/IOMMU/ in text (not functions/structs/etc.) where possible.
> + * that is useful to go with an iommu_domain to serve the DMA API, IOMMUFD and
> + * the generic map/unmap interface.
> + *
> + * This interface uses a caller provided locking approach. The caller must have
> + * a VA range lock concept that prevents concurrent threads from calling ops on
> + * the same VA. Generally the range lock must be at least as large as a single
> + * map call.
> + */
> +
> +/**
> + * struct pt_iommu - Base structure for iommu page tables
> + *
> + * The format specific struct will include this as the first member.
format-specific
> + */
> +struct pt_iommu {
> + /**
> + * @domain - The core iommu domain. The driver should use a union to
struct members should be like this:
* @domain: <description>
> + * overlay this memory with its previously existing domain struct to
> + * create an alias.
> + */
> + struct iommu_domain domain;
> +
> + /**
> + * @ops - Function pointers to access the API
> + */
> + const struct pt_iommu_ops *ops;
> +
> + /**
> + * @nid - Node ID to use for table memory allocations. The iommu driver
> + * may want to set the NID to the device's NID, if there are multiple
> + * table walkers.
> + */
> + int nid;
> +};
> +
> +/**
> + * struct pt_iommu_info - Details about the iommu page table
> + *
> + * Returned from pt_iommu_ops->get_info()
> + */
> +struct pt_iommu_info {
> + /**
> + * @pgsize_bitmap - A bitmask where each set bit indicates
Ditto.
> + * a page size that can be natively stored in the page table.
> + */
> + u64 pgsize_bitmap;
> +};
> +
> +struct pt_iommu_ops {
> + /**
> + * get_info() - Return the pt_iommu_info structure
* @get_info: Return the pt_iommu_info structure
> + * @iommu_table: Table to query
> + *
> + * Return some basic static information about the page table.
> + */
> + void (*get_info)(struct pt_iommu *iommu_table,
> + struct pt_iommu_info *info);
> +
> + /**
> + * deinit() - Undo a format specific init operation
Same reformatting.
> + * @iommu_table: Table to destroy
> + *
> + * Release all of the memory. The caller must have already removed the
> + * table from all HW access and all caches.
> + */
> + void (*deinit)(struct pt_iommu *iommu_table);
> +};
> +
> +static inline void pt_iommu_deinit(struct pt_iommu *iommu_table)
> +{
> + /*
> + * It is safe to call pt_iommu_deinit() before an init, or if init
> + * fails. The ops pointer will only become non-NUL if deinit needs to be
s/NUL/NULL/
NUL is the ASCII character with value 0; NULL is a pointer value.
> + * run.
> + */
> + if (iommu_table->ops)
> + iommu_table->ops->deinit(iommu_table);
> +}
> +
> +/**
> + * struct pt_iommu_cfg - Common configuration values for all formats
> + */
> +struct pt_iommu_cfg {
> + /**
> + * @features - Features required. Only these features will be turned on.
* @features:
Similar for below.
> + * The feature list should reflect what the IOMMU HW is capable of.
> + */
> + unsigned int features;
> + /**
> + * @hw_max_vasz_lg2 - Maximum VA the IOMMU HW can support. This will
> + * imply the top level of the table.
> + */
> + u8 hw_max_vasz_lg2;
> + /**
> + * @hw_max_oasz_lg2 - Maximum OA the IOMMU HW can support. The format
> + * might select a lower maximum OA.
> + */
> + u8 hw_max_oasz_lg2;
> +};
> +
> +#endif
--
~Randy