From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
	Jonathan Corbet <corbet@lwn.net>,
	iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
	Justin Stitt <justinstitt@google.com>,
	Kevin Tian <kevin.tian@intel.com>,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	llvm@lists.linux.dev, Bill Wendling <morbo@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Shuah Khan <shuah@kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Will Deacon <will@kernel.org>
Cc: Alexey Kardashevskiy <aik@amd.com>,
	James Gowans <jgowans@amazon.com>,
	Michael Roth <michael.roth@amd.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	patches@lists.linux.dev
Subject: Re: [PATCH v2 00/15] Consolidate iommu page table implementations (AMD)
Date: Mon, 12 May 2025 21:08:05 -0400
Message-ID: <9a9ed15f-eb0e-45ce-8c21-ef74539aa9c2@oracle.com>
In-Reply-To: <0-v2-5c26bde5c22d+58b-iommu_pt_jgg@nvidia.com>



On 5/5/25 10:18 AM, Jason Gunthorpe wrote:
> Currently each of the iommu page table formats duplicates all of the logic
> to maintain the page table and perform map/unmap/etc operations. There are
> several different versions of the algorithms between all the different
> formats. The io-pgtable system provides an interface to help isolate the
> page table code from the iommu driver, but doesn't provide tools to
> implement the common algorithms.
> 
> This makes it very hard to improve the state of the pagetable code under
> the iommu domains as any proposed improvement needs to alter a large
> number of different driver code paths. Combined with a lack of software
> based testing this makes improvement in this area very hard.
> 
> iommufd wants several new page table operations:
>   - More efficient map/unmap operations, using iommufd's batching logic
>   - unmap that returns the physical addresses into a batch as it progresses
>   - cut that allows splitting areas so large pages can have holes
>     poked in them dynamically (ie guestmemfd hitless shared/private
>     transitions)
>   - More aggressive freeing of table memory to avoid waste
>   - Fragmenting large pages so that dirty tracking can be more granular
>   - Reassembling large pages so that VMs can run at full IO performance
>     in migration/dirty tracking error flows
>   - KHO integration for kernel live upgrade
> 
> Together these are algorithmically complex enough to be a very significant
> task to go and implement in all the page table formats we support. Just
> the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86
> PAE / AMDv1 / VT-D SS / RISCV)
> 
> Instead of doing the duplicated work, this series takes the first step to
> consolidate the algorithms into one place. In spirit it is similar to the
> work Christoph did a few years back to pull the redundant get_user_pages()
> implementations out of the arch code into core MM. This unlocked a great
> deal of improvement in that space in the following years. I would like to
> see the same benefit in iommu as well.
> 
> My first RFC showed a bigger picture with almost all formats and more
> algorithms. This series reorganizes that to be narrowly focused on just
> enough to convert the AMD driver to use the new mechanism.
> 
> kunit tests are provided that allow good testing of the algorithms and all
> formats on x86, nothing is arch specific.
> 
> AMD is one of the simpler options as the HW is quite uniform with few
> different options/bugs while still requiring the complicated contiguous
> pages support. The HW also has a very simple range based invalidation
> approach that is easy to implement.
> 
> The AMD v1 and AMD v2 page table formats are implemented bit for bit
> identical to the current code, tested using a compare kunit test that
> checks against the io-pgtable version (on github, see below).

I have tested the patchset on an AMD Zen 4 bare-metal instance, booting
in both the default passthrough mode and translated mode (i.e.
iommu.passthrough=0). I exercised both the default AMD v1 and the AMD v2
(amd_iommu=pgtbl_v2) page table formats, and launched KVM guests with a
VFIO passthrough device. No issues found.

I also ran the x86_64 kunit test suite; all tests pass, except the 
(expected) skipped test_increase_level for the x86_64 format which does 
not have PT_FEAT_DYNAMIC_TOP.

Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>

> 
> Updating the AMD driver to replace the io-pgtable layer with the new stuff
> is fairly straightforward now. The layering is fixed up in the new version
> so that all the invalidation goes through function pointers.
> 
> Several small fixing patches have come out of this as I've been fixing the
> problems that the test suite uncovers in the current code, and
> implementing the fixed version in iommupt.
> 
> On performance, there is a quite wide variety of implementation designs
> across all the drivers. Looking at some key performance across
> the main formats:
> 
> iommu_map():
>     pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
>       2^12,     53,66    ,      51,63      ,  19.19 (AMDv1)
>   256*2^12,    386,1909  ,     367,1795    ,  79.79
>   256*2^21,    362,1633  ,     355,1556    ,  77.77
> 
>       2^12,     56,62    ,      52,59      ,  11.11 (AMDv2)
>   256*2^12,    405,1355  ,     357,1292    ,  72.72
>   256*2^21,    393,1160  ,     358,1114    ,  67.67
> 
>       2^12,     55,65    ,      53,62      ,  14.14 (VTD second stage)
>   256*2^12,    391,518   ,     332,512     ,  35.35
>   256*2^21,    383,635   ,     336,624     ,  46.46
> 
>       2^12,     57,65    ,      55,63      ,  12.12 (ARM 64 bit)
>   256*2^12,    380,389   ,     361,369     ,   2.02
>   256*2^21,    358,419   ,     345,400     ,  13.13
> 
> iommu_unmap():
>     pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
>       2^12,     69,88    ,      65,85      ,  23.23 (AMDv1)
>   256*2^12,    353,6498  ,     331,6029    ,  94.94
>   256*2^21,    373,6014  ,     360,5706    ,  93.93
> 
>       2^12,     71,72    ,      66,69      ,   4.04 (AMDv2)
>   256*2^12,    228,891   ,     206,871     ,  76.76
>   256*2^21,    254,721   ,     245,711     ,  65.65
> 
>       2^12,     69,87    ,      65,82      ,  20.20 (VTD second stage)
>   256*2^12,    210,321   ,     200,315     ,  36.36
>   256*2^21,    255,349   ,     238,342     ,  30.30
> 
>       2^12,     72,77    ,      68,74      ,   8.08 (ARM 64 bit)
>   256*2^12,    521,357   ,     447,346     , -29.29
>   256*2^21,    489,358   ,     433,345     , -25.25
> 
>    * Above numbers include additional patches to remove the iommu_pgsize()
>      overheads. gcc 13.3.0, i7-12700
> 
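
As a reading aid for the tables above: the integer part of every "min %"
entry is consistent with (old - new) / old * 100, computed from the two
"min" columns and truncated toward zero. That formula is inferred from the
posted numbers, not stated in the cover letter, and the repeated digits
after the decimal point (19.19, 79.79, ...) look like an output formatting
quirk rather than real precision. A minimal Python sketch under that
assumption (the function name is hypothetical):

```python
def min_improvement_pct(new_ns: int, old_ns: int) -> int:
    """Percent reduction in minimum latency, truncated toward zero.

    Assumed formula: (old - new) / old * 100. Positive means the new
    implementation's fastest run was quicker than the old one's.
    """
    return int((old_ns - new_ns) * 100 / old_ns)


# (label, min new ns, min old ns, integer part of the posted "min %")
ROWS = [
    ("iommu_map   AMDv1      2^12    ", 51, 63, 19),
    ("iommu_unmap AMDv1      256*2^12", 331, 6029, 94),
    ("iommu_unmap ARM 64 bit 256*2^12", 447, 346, -29),
]

for label, new_ns, old_ns, posted in ROWS:
    computed = min_improvement_pct(new_ns, old_ns)
    print(f"{label}: computed {computed}, posted {posted}")
```

Under this reading, the two ARM 64 bit unmap rows are negative because the
new code's minimum latency is higher than the old code's.
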
> This version provides fairly consistent performance across formats. ARM
> unmap performance is quite different because this version supports
> contiguous pages and uses a very different algorithm for unmapping. Though
> why it is so much worse compared to AMDv1 I haven't figured out yet.
> 
> The per-format commits include a more detailed chart.
> 
> There is a second branch:
>     https://github.com/jgunthorpe/linux/commits/iommu_pt_all
> Containing supporting work and future steps:
>   - ARM short descriptor (32 bit), ARM long descriptor (64 bit) formats
>   - VT-D second stage format
>   - DART v1 & v2 format
>   - Draft of an iommufd 'cut' operation to break down huge pages
>   - Draft of support for a DMA incoherent HW page table walker
>   - A compare test that checks the iommupt formats against the iopgtable
>     interface, including updating AMD to have a working iopgtable and patches
>     to make VT-D have an iopgtable for testing.
>   - A performance test to micro-benchmark map and unmap against iopgtable
> 
> My strategy is to go one by one for the drivers:
>   - AMD driver conversion
>   - RISCV page table and driver
>   - Intel VT-D driver and VTDSS page table
>   - ARM SMMUv3
> 
> And concurrently work on the algorithm side:
>   - debugfs content dump, like VT-D has
>   - Cut support
>   - Increase/Decrease page size support
>   - map/unmap batching
>   - KHO
> 
> As we make more algorithm improvements the value to convert the drivers
> increases.
> 
> This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt
> 
> v1:
>   - AMD driver only, many code changes
> RFC: https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/
> 
> Alejandro Jimenez (1):
>    iommu/amd: Use the generic iommu page table
> 
> Jason Gunthorpe (14):
>    genpt: Generic Page Table base API
>    genpt: Add Documentation/ files
>    iommupt: Add the basic structure of the iommu implementation
>    iommupt: Add the AMD IOMMU v1 page table format
>    iommupt: Add iova_to_phys op
>    iommupt: Add unmap_pages op
>    iommupt: Add map_pages op
>    iommupt: Add read_and_clear_dirty op
>    iommupt: Add a kunit test for Generic Page Table
>    iommupt: Add a mock pagetable format for iommufd selftest to use
>    iommufd: Change the selftest to use iommupt instead of xarray
>    iommupt: Add the x86 64 bit page table format
>    iommu/amd: Remove AMD io_pgtable support
>    iommupt: Add a kunit test for the IOMMU implementation
> 
>   .clang-format                                 |    1 +
>   Documentation/driver-api/generic_pt.rst       |  105 ++
>   Documentation/driver-api/index.rst            |    1 +
>   drivers/iommu/Kconfig                         |    2 +
>   drivers/iommu/Makefile                        |    1 +
>   drivers/iommu/amd/Kconfig                     |    5 +-
>   drivers/iommu/amd/Makefile                    |    2 +-
>   drivers/iommu/amd/amd_iommu.h                 |    1 -
>   drivers/iommu/amd/amd_iommu_types.h           |  109 +-
>   drivers/iommu/amd/io_pgtable.c                |  560 --------
>   drivers/iommu/amd/io_pgtable_v2.c             |  370 ------
>   drivers/iommu/amd/iommu.c                     |  493 ++++---
>   drivers/iommu/generic_pt/.kunitconfig         |   13 +
>   drivers/iommu/generic_pt/Kconfig              |   72 ++
>   drivers/iommu/generic_pt/fmt/Makefile         |   26 +
>   drivers/iommu/generic_pt/fmt/amdv1.h          |  407 ++++++
>   drivers/iommu/generic_pt/fmt/defs_amdv1.h     |   21 +
>   drivers/iommu/generic_pt/fmt/defs_x86_64.h    |   21 +
>   drivers/iommu/generic_pt/fmt/iommu_amdv1.c    |   15 +
>   drivers/iommu/generic_pt/fmt/iommu_mock.c     |   10 +
>   drivers/iommu/generic_pt/fmt/iommu_template.h |   48 +
>   drivers/iommu/generic_pt/fmt/iommu_x86_64.c   |   12 +
>   drivers/iommu/generic_pt/fmt/x86_64.h         |  241 ++++
>   drivers/iommu/generic_pt/iommu_pt.h           | 1146 +++++++++++++++++
>   drivers/iommu/generic_pt/kunit_generic_pt.h   |  721 +++++++++++
>   drivers/iommu/generic_pt/kunit_iommu.h        |  183 +++
>   drivers/iommu/generic_pt/kunit_iommu_pt.h     |  451 +++++++
>   drivers/iommu/generic_pt/pt_common.h          |  351 +++++
>   drivers/iommu/generic_pt/pt_defs.h            |  312 +++++
>   drivers/iommu/generic_pt/pt_fmt_defaults.h    |  193 +++
>   drivers/iommu/generic_pt/pt_iter.h            |  638 +++++++++
>   drivers/iommu/generic_pt/pt_log2.h            |  130 ++
>   drivers/iommu/io-pgtable.c                    |    4 -
>   drivers/iommu/iommufd/Kconfig                 |    1 +
>   drivers/iommu/iommufd/iommufd_test.h          |   11 +-
>   drivers/iommu/iommufd/selftest.c              |  439 +++----
>   include/linux/generic_pt/common.h             |  166 +++
>   include/linux/generic_pt/iommu.h              |  264 ++++
>   include/linux/io-pgtable.h                    |    2 -
>   tools/testing/selftests/iommu/iommufd.c       |   60 +-
>   tools/testing/selftests/iommu/iommufd_utils.h |   12 +
>   41 files changed, 6046 insertions(+), 1574 deletions(-)
>   create mode 100644 Documentation/driver-api/generic_pt.rst
>   delete mode 100644 drivers/iommu/amd/io_pgtable.c
>   delete mode 100644 drivers/iommu/amd/io_pgtable_v2.c
>   create mode 100644 drivers/iommu/generic_pt/.kunitconfig
>   create mode 100644 drivers/iommu/generic_pt/Kconfig
>   create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
>   create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
>   create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
>   create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86_64.h
>   create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
>   create mode 100644 drivers/iommu/generic_pt/fmt/iommu_mock.c
>   create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
>   create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86_64.c
>   create mode 100644 drivers/iommu/generic_pt/fmt/x86_64.h
>   create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
>   create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
>   create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
>   create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h
>   create mode 100644 drivers/iommu/generic_pt/pt_common.h
>   create mode 100644 drivers/iommu/generic_pt/pt_defs.h
>   create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
>   create mode 100644 drivers/iommu/generic_pt/pt_iter.h
>   create mode 100644 drivers/iommu/generic_pt/pt_log2.h
>   create mode 100644 include/linux/generic_pt/common.h
>   create mode 100644 include/linux/generic_pt/iommu.h
> 
> 
> base-commit: db37090502f67e46541e53b91f00bbd565c96bd0



