public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Antheas Kapenekakis <lkml@antheas.dev>
Cc: Robin Murphy <robin.murphy@arm.com>,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Vasant Hegde <vasant.hegde@amd.com>,
	Alejandro Jimenez <alejandro.j.jimenez@oracle.com>,
	dnaim@cachyos.org, Mario.Limonciello@amd.com
Subject: Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
Date: Thu, 26 Feb 2026 21:03:17 -0400	[thread overview]
Message-ID: <20260227010317.GD44359@ziepe.ca> (raw)
In-Reply-To: <CAGwozwGsh3_M+e35M2Ld7827W=w26-71xGXwgeEBu-ZcZGLACQ@mail.gmail.com>

On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> I am still concerned about unaligned checks. It is a functional change
> that can cause regressions in all devices. The approach of this patch
> does not affect behavior in other devices. I would like for Jason to
> weigh in.

I think Robin's solution is very clever, but I share the concern
regarding what all the implementations do.

So, I fed this question to Claude. It did find two counter points (see
below for the whole report I had it generate):

     Implementations that lose the offset

     s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
     returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
     iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.

     mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
     iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
     step adds the sub-page offset back.

I checked myself and it seems correct. I didn't try to confirm that
the cases it says are OK are in fact OK, but it paints a convincing
picture.

I doubt S390 uses this function you are fixing, and I have no idea
about mtk. Below is also a diff how Claude thought to fix it, I didn't
try to check it.

So, I'd say if Robin is OK with these outliers then it a good and fine
approach.

Jason

iova_to_phys Implementation Survey

Entry point: iommu_iova_to_phys() in drivers/iommu/iommu.c:2502 calls
domain->ops->iova_to_phys(domain, iova) via iommu_domain_ops.

Category 1 — Delegates to io-pgtable

These drivers hold an io_pgtable_ops * and call ops->iova_to_phys(ops, iova).
The actual walk happens in one of the io-pgtable backends listed in Category
4.

  ----------------------------------------------------------------------------------------------------------
  Driver        Function (file:line)                               Ops assignment          Notes
  ------------- -------------------------------------------------- ----------------------- -----------------
  arm-smmu-v3   arm_smmu_iova_to_phys                              :3767                   Pure delegation
                drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3471                           

  arm-smmu      arm_smmu_iova_to_phys                              :1655                   S1 with
  v1/v2         drivers/iommu/arm/arm-smmu/arm-smmu.c:1387                                 FEAT_TRANS_OPS
                                                                                           uses hw ATS1PR
                                                                                           registers
                                                                                           (CB_PAR),
                                                                                           otherwise
                                                                                           io-pgtable

  apple-dart    apple_dart_iova_to_phys                            :1021                   Pure delegation →
                drivers/iommu/apple-dart.c:531                                             io-pgtable-dart

  qcom_iommu    qcom_iommu_iova_to_phys                            :605                    Delegation with
                drivers/iommu/arm/arm-smmu/qcom_iommu.c:492                                spinlock

  ipmmu-vmsa    ipmmu_iova_to_phys drivers/iommu/ipmmu-vmsa.c:702  :895                    Uses
                                                                                           ARM_32_LPAE_S1
                                                                                           format

  mtk_iommu     mtk_iommu_iova_to_phys                             :1073                   Delegation + 4GB
                drivers/iommu/mtk_iommu.c:861                                              mode PA remap
                                                                                           fixup
  ----------------------------------------------------------------------------------------------------------

Category 2 — Open-coded page table walk

These drivers implement their own page table traversal without io-pgtable.

  ---------------------------------------------------------------------------------------------------------
  Driver           Function (file:line)                 Ops assignment       Walk structure
  ---------------- ------------------------------------ -------------------- ------------------------------
  sun50i-iommu     sun50i_iommu_iova_to_phys            :860                 2-level (DTE → PTE)
                   drivers/iommu/sun50i-iommu.c:662                          

  exynos-iommu     exynos_iommu_iova_to_phys            :1487                2-level (section/large/small
                   drivers/iommu/exynos-iommu.c:1375                         page)

  riscv-iommu      riscv_iommu_iova_to_phys             :1355                Sv39/48/57 via
                   drivers/iommu/riscv/iommu.c:1280                          riscv_iommu_pte_fetch (:1166)

  omap-iommu       omap_iommu_iova_to_phys              :1727                iopgtable_lookup_entry helper
                   drivers/iommu/omap-iommu.c:1596                           (super
                                                                             section/section/large/small)

  rockchip-iommu   rk_iommu_iova_to_phys                :1190                2-level (DTE → PTE)
                   drivers/iommu/rockchip-iommu.c:651                        

  msm_iommu        msm_iommu_iova_to_phys               :709                 Hardware walk: writes VA to
                   drivers/iommu/msm_iommu.c:526                             V2PPR register, reads PA from
                                                                             PAR register

  s390-iommu       s390_iommu_iova_to_phys              :1186                3-level ZPCI (region → segment
                   drivers/iommu/s390-iommu.c:989                            → page)

  tegra-smmu       tegra_smmu_iova_to_phys              :1010                2-level via
                   drivers/iommu/tegra-smmu.c:806                            tegra_smmu_pte_lookup

  mtk_iommu_v1     mtk_iommu_v1_iova_to_phys            :593                 Flat single-level table
                   drivers/iommu/mtk_iommu_v1.c:396                          

  sprd-iommu       sprd_iommu_iova_to_phys              :423                 Flat single-level table
                   drivers/iommu/sprd-iommu.c:369                            
  ---------------------------------------------------------------------------------------------------------

Category 3 — Special / trivial

  -------------------------------------------------------------------------------------------
  Driver         Function (file:line)                  Ops assignment         Mechanism
  -------------- ------------------------------------- ---------------------- ---------------
  fsl_pamu       fsl_pamu_iova_to_phys                 :438                   Identity:
                 drivers/iommu/fsl_pamu_domain.c:172                          returns iova
                                                                              (after aperture
                                                                              bounds check)

  virtio-iommu   viommu_iova_to_phys                   :1105                  Interval tree
                 drivers/iommu/virtio-iommu.c:915                             reverse lookup
                                                                              (no page table)
  -------------------------------------------------------------------------------------------

Category 4 — io_pgtable_ops backends

These implement struct io_pgtable_ops.iova_to_phys and are the ultimate walk
functions called by Category 1 drivers.

  --------------------------------------------------------------------------------------------------
  Backend     Function (file:line)                     Ops assignment       Walk strategy
  ----------- ---------------------------------------- -------------------- ------------------------
  ARM LPAE    arm_lpae_iova_to_phys                    :950                 Visitor pattern via
  (64-bit)    drivers/iommu/io-pgtable-arm.c:734                            __arm_lpae_iopte_walk;
                                                                            covers ARM_64_LPAE_S1,
                                                                            S2, ARM_MALI_LPAE

  ARM v7s     arm_v7s_iova_to_phys                     :716                 Iterative do-while
  (32-bit)    drivers/iommu/io-pgtable-arm-v7s.c:644                        2-level; handles
                                                                            contiguous entries

  Apple DART  dart_iova_to_phys                        :402                 dart_get_last pre-walks
              drivers/iommu/io-pgtable-dart.c:336                           to leaf table, then
                                                                            single lookup
  --------------------------------------------------------------------------------------------------

Category 5 — generic_pt framework

All these drivers use IOMMU_PT_DOMAIN_OPS(fmt) which routes iova_to_phys into
the template function pt_iommu_<fmt>_iova_to_phys at
drivers/iommu/generic_pt/iommu_pt.h:170. The walk uses pt_walk_range +
PT_MAKE_LEVELS to generate a fully-inlined unrolled per-level walk; OA
extracted via pt_entry_oa_exact.

  ---------------------------------------------------------------------------------
  Driver           Ops struct (file:line)                          Format
  ---------------- ----------------------------------------------- ----------------
  AMD IOMMU v1     amdv1_ops drivers/iommu/amd/iommu.c:2662        amdv1

  AMD IOMMU v2     amdv2_ops drivers/iommu/amd/iommu.c:2740        x86_64

  Intel VT-d       intel_fs_paging_domain_ops                      x86_64
  first-stage      drivers/iommu/intel/iommu.c:3886                

  Intel VT-d       intel_ss_paging_domain_ops                      vtdss
  second-stage     drivers/iommu/intel/iommu.c:3897                

  iommufd selftest mock_domain_ops etc                             amdv1_mock /
                   drivers/iommu/iommufd/selftest.c:403,411,425    amdv1

  KUnit wrapper    pgtbl_ops                                       Delegates to
                   drivers/iommu/generic_pt/kunit_iommu_cmp.h:86   io_pgtable_ops
                                                                   for comparison
                                                                   testing
  ---------------------------------------------------------------------------------

Sub-page offset handling

When iova_to_phys(iova) is called with an IOVA that is not aligned to the
start of the mapped page/block (e.g. iova_to_phys(1) when a 4KB page is mapped
at IOVA 0), most implementations return the exact physical address including
the sub-page offset (phys_base + offset). Two do not.

Summary

  ------------------------------------------------------------------------------------------------------
  Implementation            Offset preserved?         Mechanism
  ------------------------- ------------------------- --------------------------------------------------
  arm_lpae (io-pgtable)     YES                       iopte_to_paddr(pte) | (iova & (block_size-1))

  arm_v7s (io-pgtable)      YES                       iopte_to_paddr(pte) | (iova & ~LVL_MASK)

  dart (io-pgtable)         YES                       iopte_to_paddr(pte) | (iova & (pgsize-1))

  sun50i-iommu              YES                       page_addr + FIELD_GET(GENMASK(11,0), iova)

  exynos-iommu              YES                       *_phys(entry) + *_offs(iova) per granularity

  riscv-iommu               YES                       pfn_to_phys(pfn) | (iova & (pte_size-1))

  omap-iommu                YES                       (descriptor & mask) | (va & ~mask)

  rockchip-iommu            YES                       pt_address(pte) + rk_iova_page_offset(iova)

  msm_iommu                 YES                       HW PAR register + VA low bits spliced back in

  s390-iommu                NO                        pte & ZPCI_PTE_ADDR_MASK — offset discarded

  tegra-smmu                YES                       SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova)

  mtk_iommu_v1              NO                        pte & ~(page_size-1) — offset discarded

  sprd-iommu                YES                       (pte << PAGE_SHIFT) + (iova & (page_size-1))

  fsl_pamu                  YES (trivial)             return iova — identity mapping

  virtio-iommu              YES                       paddr + (iova - mapping->iova.start)

  generic_pt                YES                       _pt_entry_oa_fast() | log2_mod(va, entry_lg2sz)
  ------------------------------------------------------------------------------------------------------

Category 1 drivers (arm-smmu-v3, arm-smmu, apple-dart, qcom_iommu, ipmmu-vmsa,
mtk_iommu) inherit the behavior of their io-pgtable backend — all preserve
offset.

Implementations that lose the offset

s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI walk,
returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added back.
iova_to_phys(0) and iova_to_phys(1) return the same page-aligned PA.

mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns pte & ~(page_size-1). No
step adds the sub-page offset back.


diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index c8d8eff5373d30..8db16989270cd8 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -401,7 +401,8 @@ static phys_addr_t mtk_iommu_v1_iova_to_phys(struct iommu_domain *domain, dma_ad
 
 	spin_lock_irqsave(&dom->pgtlock, flags);
 	pa = *(dom->pgt_va + (iova >> MT2701_IOMMU_PAGE_SHIFT));
-	pa = pa & (~(MT2701_IOMMU_PAGE_SIZE - 1));
+	pa = (pa & (~(MT2701_IOMMU_PAGE_SIZE - 1))) |
+	     (iova & (MT2701_IOMMU_PAGE_SIZE - 1));
 	spin_unlock_irqrestore(&dom->pgtlock, flags);
 
 	return pa;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fe679850af2861..57d27f3a984ed6 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -1015,7 +1015,8 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
 			pto = get_st_pto(ste);
 			pte = READ_ONCE(pto[px]);
 			if (pt_entry_isvalid(pte))
-				phys = pte & ZPCI_PTE_ADDR_MASK;
+				phys = (pte & ZPCI_PTE_ADDR_MASK) |
+				       (iova & ~ZPCI_PTE_ADDR_MASK);
 		}
 	}

  reply	other threads:[~2026-02-27  1:03 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-21 23:50 [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists Antheas Kapenekakis
2026-02-23  6:02 ` Vasant Hegde
2026-02-23  7:46   ` Antheas Kapenekakis
2026-02-24  8:25     ` Vasant Hegde
2026-02-24  9:16       ` Antheas Kapenekakis
2026-02-24 19:23 ` Jason Gunthorpe
2026-02-24 19:33   ` Antheas Kapenekakis
2026-02-24 19:43     ` Jason Gunthorpe
2026-02-24 19:51       ` Antheas Kapenekakis
2026-02-25 18:27         ` Robin Murphy
2026-02-25 22:05           ` Antheas Kapenekakis
2026-02-26 19:46             ` Robin Murphy
2026-02-26 20:40               ` Antheas Kapenekakis
2026-02-27  1:03                 ` Jason Gunthorpe [this message]
2026-02-27  8:06                   ` Antheas Kapenekakis
2026-02-27 14:01                     ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260227010317.GD44359@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=Mario.Limonciello@amd.com \
    --cc=alejandro.j.jimenez@oracle.com \
    --cc=dnaim@cachyos.org \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkml@antheas.dev \
    --cc=robin.murphy@arm.com \
    --cc=vasant.hegde@amd.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox