From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
Jonathan Corbet <corbet@lwn.net>,
iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
Justin Stitt <justinstitt@google.com>,
Kevin Tian <kevin.tian@intel.com>,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
llvm@lists.linux.dev, Bill Wendling <morbo@google.com>,
Nathan Chancellor <nathan@kernel.org>,
Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
Miguel Ojeda <ojeda@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Shuah Khan <shuah@kernel.org>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Will Deacon <will@kernel.org>
Cc: Alexey Kardashevskiy <aik@amd.com>,
James Gowans <jgowans@amazon.com>,
Michael Roth <michael.roth@amd.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
patches@lists.linux.dev
Subject: Re: [PATCH v2 00/15] Consolidate iommu page table implementations (AMD)
Date: Mon, 12 May 2025 21:08:05 -0400
Message-ID: <9a9ed15f-eb0e-45ce-8c21-ef74539aa9c2@oracle.com>
In-Reply-To: <0-v2-5c26bde5c22d+58b-iommu_pt_jgg@nvidia.com>
On 5/5/25 10:18 AM, Jason Gunthorpe wrote:
> Currently each of the iommu page table formats duplicates all of the logic
> to maintain the page table and perform map/unmap/etc operations. There are
> several different versions of the algorithms between all the different
> formats. The io-pgtable system provides an interface to help isolate the
> page table code from the iommu driver, but doesn't provide tools to
> implement the common algorithms.
>
> This makes it very hard to improve the state of the pagetable code under
> the iommu domains as any proposed improvement needs to alter a large
> number of different driver code paths. Combined with a lack of software
> based testing this makes improvement in this area very hard.
>
> iommufd wants several new page table operations:
> - More efficient map/unmap operations, using iommufd's batching logic
> - unmap that returns the physical addresses into a batch as it progresses
> - cut that allows splitting areas so large pages can have holes
> poked in them dynamically (ie guestmemfd hitless shared/private
> transitions)
> - More aggressive freeing of table memory to avoid waste
> - Fragmenting large pages so that dirty tracking can be more granular
> - Reassembling large pages so that VMs can run at full IO performance
> in migration/dirty tracking error flows
> - KHO integration for kernel live upgrade
>
> Together these are algorithmically complex enough to be a very significant
> task to go and implement in all the page table formats we support. Just
> the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86
> PAE / AMDv1 / VT-D SS / RISCV)
>
> Instead of doing the duplicated work, this series takes the first step to
> consolidate the algorithms into one place. In spirit it is similar to the
> work Christoph did a few years back to pull the redundant get_user_pages()
> implementations out of the arch code into core MM. This unlocked a great
> deal of improvement in that space in the following years. I would like to
> see the same benefit in iommu as well.
>
> My first RFC showed a bigger picture with almost all formats and more
> algorithms. This series reorganizes that to be narrowly focused on just
> enough to convert the AMD driver to use the new mechanism.
>
> kunit tests are provided that allow good testing of the algorithms and all
> formats on x86, nothing is arch specific.
>
> AMD is one of the simpler options as the HW is quite uniform with few
> different options/bugs while still requiring the complicated contiguous
> pages support. The HW also has a very simple range based invalidation
> approach that is easy to implement.
>
> The AMD v1 and AMD v2 page table formats are implemented bit for bit
> identical to the current code, tested using a compare kunit test that
> checks against the io-pgtable version (on github, see below).
I have tested the patchset on an AMD Zen 4 bare-metal instance, booting
in both the default passthrough mode and translated mode (i.e.
iommu.passthrough=0). I exercised both the default AMD v1 and the AMD v2
(amd_iommu=pgtbl_v2) page table formats, and launched KVM guests with a
VFIO passthrough device. No issues found.
I also ran the x86_64 kunit test suite; all tests pass, except for
test_increase_level, which is skipped (as expected) for the x86_64 format
because that format does not have PT_FEAT_DYNAMIC_TOP.
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
>
> Updating the AMD driver to replace the io-pgtable layer with the new stuff
> is fairly straightforward now. The layering is fixed up in the new version
> so that all the invalidation goes through function pointers.
>
> Several small fixing patches have come out of this as I've been fixing the
> problems that the test suite uncovers in the current code, and
> implementing the fixed version in iommupt.
>
> On performance, there is a quite wide variety of implementation designs
> across all the drivers. Looking at some key performance numbers across
> the main formats:
>
> iommu_map():
> pgsz ,avg new,old ns, min new,old ns , min % (+ve is better)
> 2^12, 53,66 , 51,63 , 19.19 (AMDv1)
> 256*2^12, 386,1909 , 367,1795 , 79.79
> 256*2^21, 362,1633 , 355,1556 , 77.77
>
> 2^12, 56,62 , 52,59 , 11.11 (AMDv2)
> 256*2^12, 405,1355 , 357,1292 , 72.72
> 256*2^21, 393,1160 , 358,1114 , 67.67
>
> 2^12, 55,65 , 53,62 , 14.14 (VTD second stage)
> 256*2^12, 391,518 , 332,512 , 35.35
> 256*2^21, 383,635 , 336,624 , 46.46
>
> 2^12, 57,65 , 55,63 , 12.12 (ARM 64 bit)
> 256*2^12, 380,389 , 361,369 , 2.02
> 256*2^21, 358,419 , 345,400 , 13.13
>
> iommu_unmap():
> pgsz ,avg new,old ns, min new,old ns , min % (+ve is better)
> 2^12, 69,88 , 65,85 , 23.23 (AMDv1)
> 256*2^12, 353,6498 , 331,6029 , 94.94
> 256*2^21, 373,6014 , 360,5706 , 93.93
>
> 2^12, 71,72 , 66,69 , 4.04 (AMDv2)
> 256*2^12, 228,891 , 206,871 , 76.76
> 256*2^21, 254,721 , 245,711 , 65.65
>
> 2^12, 69,87 , 65,82 , 20.20 (VTD second stage)
> 256*2^12, 210,321 , 200,315 , 36.36
> 256*2^21, 255,349 , 238,342 , 30.30
>
> 2^12, 72,77 , 68,74 , 8.08 (ARM 64 bit)
> 256*2^12, 521,357 , 447,346 , -29.29
> 256*2^21, 489,358 , 433,345 , -25.25
>
> * Above numbers include additional patches to remove the iommu_pgsize()
> overheads. gcc 13.3.0, i7-12700
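For anyone reading the tables above: the cover letter does not spell out
how the "min %" column is derived. A minimal sketch, assuming it is the
relative reduction of the minimum latency, (old - new) / old * 100. The
helper name min_improvement_pct is hypothetical, and the rows are copied
from the tables. For these rows the integer part of the result agrees with
the posted column; the fractional digits do not necessarily match.

```python
def min_improvement_pct(new_ns: int, old_ns: int) -> float:
    """Relative latency reduction in percent; positive means the new code is faster.

    Assumed formula, not taken from the series: (old - new) / old * 100.
    """
    return (old_ns - new_ns) / old_ns * 100.0


# (label, min new ns, min old ns), copied from the quoted tables
rows = [
    ("map   AMDv1 2^12", 51, 63),
    ("map   AMDv1 256*2^12", 367, 1795),
    ("unmap AMDv1 256*2^12", 331, 6029),
    ("unmap ARM64 256*2^12", 447, 346),
]

for label, new_ns, old_ns in rows:
    print(f"{label:22s} {min_improvement_pct(new_ns, old_ns):7.2f} %")
```

A negative result, as in the ARM 64 bit unmap rows, means the new
implementation is slower than the old one for that case.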
>
> This version provides fairly consistent performance across formats. ARM
> unmap performance is quite different because this version supports
> contiguous pages and uses a very different algorithm for unmapping. Though
> why it is so much worse compared to AMDv1 I haven't figured out yet.
>
> The per-format commits include a more detailed chart.
>
> There is a second branch:
> https://github.com/jgunthorpe/linux/commits/iommu_pt_all
> Containing supporting work and future steps:
> - ARM short descriptor (32 bit), ARM long descriptor (64 bit) formats
> - VT-D second stage format
> - DART v1 & v2 format
> - Draft of an iommufd 'cut' operation to break down huge pages
> - Draft of support for a DMA incoherent HW page table walker
> - A compare test that checks the iommupt formats against the iopgtable
> interface, including updating AMD to have a working iopgtable and patches
> to make VT-D have an iopgtable for testing.
> - A performance test to micro-benchmark map and unmap against iopgtable
>
> My strategy is to go one by one for the drivers:
> - AMD driver conversion
> - RISCV page table and driver
> - Intel VT-D driver and VTDSS page table
> - ARM SMMUv3
>
> And concurrently work on the algorithm side:
> - debugfs content dump, like VT-D has
> - Cut support
> - Increase/Decrease page size support
> - map/unmap batching
> - KHO
>
> As we make more algorithm improvements the value to convert the drivers
> increases.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt
>
> v1:
> - AMD driver only, many code changes
> RFC: https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/
>
> Alejandro Jimenez (1):
> iommu/amd: Use the generic iommu page table
>
> Jason Gunthorpe (14):
> genpt: Generic Page Table base API
> genpt: Add Documentation/ files
> iommupt: Add the basic structure of the iommu implementation
> iommupt: Add the AMD IOMMU v1 page table format
> iommupt: Add iova_to_phys op
> iommupt: Add unmap_pages op
> iommupt: Add map_pages op
> iommupt: Add read_and_clear_dirty op
> iommupt: Add a kunit test for Generic Page Table
> iommupt: Add a mock pagetable format for iommufd selftest to use
> iommufd: Change the selftest to use iommupt instead of xarray
> iommupt: Add the x86 64 bit page table format
> iommu/amd: Remove AMD io_pgtable support
> iommupt: Add a kunit test for the IOMMU implementation
>
> .clang-format | 1 +
> Documentation/driver-api/generic_pt.rst | 105 ++
> Documentation/driver-api/index.rst | 1 +
> drivers/iommu/Kconfig | 2 +
> drivers/iommu/Makefile | 1 +
> drivers/iommu/amd/Kconfig | 5 +-
> drivers/iommu/amd/Makefile | 2 +-
> drivers/iommu/amd/amd_iommu.h | 1 -
> drivers/iommu/amd/amd_iommu_types.h | 109 +-
> drivers/iommu/amd/io_pgtable.c | 560 --------
> drivers/iommu/amd/io_pgtable_v2.c | 370 ------
> drivers/iommu/amd/iommu.c | 493 ++++---
> drivers/iommu/generic_pt/.kunitconfig | 13 +
> drivers/iommu/generic_pt/Kconfig | 72 ++
> drivers/iommu/generic_pt/fmt/Makefile | 26 +
> drivers/iommu/generic_pt/fmt/amdv1.h | 407 ++++++
> drivers/iommu/generic_pt/fmt/defs_amdv1.h | 21 +
> drivers/iommu/generic_pt/fmt/defs_x86_64.h | 21 +
> drivers/iommu/generic_pt/fmt/iommu_amdv1.c | 15 +
> drivers/iommu/generic_pt/fmt/iommu_mock.c | 10 +
> drivers/iommu/generic_pt/fmt/iommu_template.h | 48 +
> drivers/iommu/generic_pt/fmt/iommu_x86_64.c | 12 +
> drivers/iommu/generic_pt/fmt/x86_64.h | 241 ++++
> drivers/iommu/generic_pt/iommu_pt.h | 1146 +++++++++++++++++
> drivers/iommu/generic_pt/kunit_generic_pt.h | 721 +++++++++++
> drivers/iommu/generic_pt/kunit_iommu.h | 183 +++
> drivers/iommu/generic_pt/kunit_iommu_pt.h | 451 +++++++
> drivers/iommu/generic_pt/pt_common.h | 351 +++++
> drivers/iommu/generic_pt/pt_defs.h | 312 +++++
> drivers/iommu/generic_pt/pt_fmt_defaults.h | 193 +++
> drivers/iommu/generic_pt/pt_iter.h | 638 +++++++++
> drivers/iommu/generic_pt/pt_log2.h | 130 ++
> drivers/iommu/io-pgtable.c | 4 -
> drivers/iommu/iommufd/Kconfig | 1 +
> drivers/iommu/iommufd/iommufd_test.h | 11 +-
> drivers/iommu/iommufd/selftest.c | 439 +++----
> include/linux/generic_pt/common.h | 166 +++
> include/linux/generic_pt/iommu.h | 264 ++++
> include/linux/io-pgtable.h | 2 -
> tools/testing/selftests/iommu/iommufd.c | 60 +-
> tools/testing/selftests/iommu/iommufd_utils.h | 12 +
> 41 files changed, 6046 insertions(+), 1574 deletions(-)
> create mode 100644 Documentation/driver-api/generic_pt.rst
> delete mode 100644 drivers/iommu/amd/io_pgtable.c
> delete mode 100644 drivers/iommu/amd/io_pgtable_v2.c
> create mode 100644 drivers/iommu/generic_pt/.kunitconfig
> create mode 100644 drivers/iommu/generic_pt/Kconfig
> create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
> create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
> create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
> create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86_64.h
> create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
> create mode 100644 drivers/iommu/generic_pt/fmt/iommu_mock.c
> create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
> create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86_64.c
> create mode 100644 drivers/iommu/generic_pt/fmt/x86_64.h
> create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
> create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
> create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
> create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h
> create mode 100644 drivers/iommu/generic_pt/pt_common.h
> create mode 100644 drivers/iommu/generic_pt/pt_defs.h
> create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
> create mode 100644 drivers/iommu/generic_pt/pt_iter.h
> create mode 100644 drivers/iommu/generic_pt/pt_log2.h
> create mode 100644 include/linux/generic_pt/common.h
> create mode 100644 include/linux/generic_pt/iommu.h
>
>
> base-commit: db37090502f67e46541e53b91f00bbd565c96bd0