Date: Thu, 26 Feb 2026 21:03:17 -0400
From: Jason Gunthorpe
To: Antheas Kapenekakis
Cc: Robin Murphy, iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
    Joerg Roedel, Will Deacon, Vasant Hegde, Alejandro Jimenez,
    dnaim@cachyos.org, Mario.Limonciello@amd.com
Subject: Re: [PATCH v1] iommu: Skip mapping at address 0x0 if it already exists
Message-ID: <20260227010317.GD44359@ziepe.ca>
References: <20260221235050.2558321-1-lkml@antheas.dev>
    <6bac3817-9652-4146-ab44-2a9518c3b339@arm.com>
    <7e3a1d3c-64cb-43a8-bd2f-05b24a8d1611@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 26, 2026 at 09:40:10PM +0100, Antheas Kapenekakis wrote:
> I am still concerned about unaligned checks. It is a functional change
> that can cause regressions in all devices. The approach of this patch
> does not affect behavior in other devices. I would like for Jason to
> weigh in.

I think Robin's solution is very clever, but I share the concern
regarding what all the implementations do.

So, I fed this question to Claude.
It did find two counterpoints (see below for the whole report I had it
generate):

  Implementations that lose the offset

  s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI
  walk, returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added
  back. iova_to_phys(0) and iova_to_phys(1) return the same
  page-aligned PA.

  mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
  iova >> PAGE_SHIFT (discarding offset), then returns
  pte & ~(page_size-1). No step adds the sub-page offset back.

I checked myself and it seems correct. I didn't try to confirm that the
cases it says are OK are in fact OK, but it paints a convincing picture.

I doubt S390 uses the function you are fixing, and I have no idea
about mtk.

Below is also a diff showing how Claude thought to fix it; I didn't try
to check it.

So, I'd say if Robin is OK with these outliers then it is a good and
fine approach.

Jason

iova_to_phys Implementation Survey

Entry point: iommu_iova_to_phys() in drivers/iommu/iommu.c:2502 calls
domain->ops->iova_to_phys(domain, iova) via iommu_domain_ops.

Category 1: Delegates to io-pgtable

These drivers hold an io_pgtable_ops * and call
ops->iova_to_phys(ops, iova). The actual walk happens in one of the
io-pgtable backends listed in Category 4.

----------------------------------------------------------------------------------------------------------------
Driver          Function (file:line)                              Ops assignment  Notes
--------------  ------------------------------------------------  --------------  ------------------------------
arm-smmu-v3     arm_smmu_iova_to_phys                             :3767           Pure delegation
                drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3471

arm-smmu        arm_smmu_iova_to_phys                             :1655           S1 with v1/v2 FEAT_TRANS_OPS
                drivers/iommu/arm/arm-smmu/arm-smmu.c:1387                        uses hw ATS1PR registers
                                                                                  (CB_PAR), otherwise io-pgtable

apple-dart      apple_dart_iova_to_phys                           :1021           Pure delegation →
                drivers/iommu/apple-dart.c:531                                    io-pgtable-dart

qcom_iommu      qcom_iommu_iova_to_phys                           :605            Delegation with spinlock
                drivers/iommu/arm/arm-smmu/qcom_iommu.c:492

ipmmu-vmsa      ipmmu_iova_to_phys                                :895            Uses ARM_32_LPAE_S1 format
                drivers/iommu/ipmmu-vmsa.c:702

mtk_iommu       mtk_iommu_iova_to_phys                            :1073           Delegation + 4GB mode PA
                drivers/iommu/mtk_iommu.c:861                                     remap fixup
----------------------------------------------------------------------------------------------------------------

Category 2: Open-coded page table walk

These drivers implement their own page table traversal without
io-pgtable.

----------------------------------------------------------------------------------------------------------------
Driver          Function (file:line)                              Ops assignment  Walk structure
--------------  ------------------------------------------------  --------------  ------------------------------
sun50i-iommu    sun50i_iommu_iova_to_phys                         :860            2-level (DTE → PTE)
                drivers/iommu/sun50i-iommu.c:662

exynos-iommu    exynos_iommu_iova_to_phys                         :1487           2-level (section/large/small
                drivers/iommu/exynos-iommu.c:1375                                 page)

riscv-iommu     riscv_iommu_iova_to_phys                          :1355           Sv39/48/57 via
                drivers/iommu/riscv/iommu.c:1280                                  riscv_iommu_pte_fetch (:1166)

omap-iommu      omap_iommu_iova_to_phys                           :1727           iopgtable_lookup_entry helper
                drivers/iommu/omap-iommu.c:1596                                   (super section/section/
                                                                                  large/small)

rockchip-iommu  rk_iommu_iova_to_phys                             :1190           2-level (DTE → PTE)
                drivers/iommu/rockchip-iommu.c:651

msm_iommu       msm_iommu_iova_to_phys                            :709            Hardware walk: writes VA to
                drivers/iommu/msm_iommu.c:526                                     V2PPR register, reads PA from
                                                                                  PAR register

s390-iommu      s390_iommu_iova_to_phys                           :1186           3-level ZPCI (region → segment
                drivers/iommu/s390-iommu.c:989                                    → page)

tegra-smmu      tegra_smmu_iova_to_phys                           :1010           2-level via
                drivers/iommu/tegra-smmu.c:806                                    tegra_smmu_pte_lookup

mtk_iommu_v1    mtk_iommu_v1_iova_to_phys                         :593            Flat single-level table
                drivers/iommu/mtk_iommu_v1.c:396

sprd-iommu      sprd_iommu_iova_to_phys                           :423            Flat single-level table
                drivers/iommu/sprd-iommu.c:369
----------------------------------------------------------------------------------------------------------------

Category 3: Special / trivial

----------------------------------------------------------------------------------------------------------------
Driver          Function (file:line)                              Ops assignment  Mechanism
--------------  ------------------------------------------------  --------------  ------------------------------
fsl_pamu        fsl_pamu_iova_to_phys                             :438            Identity: returns iova (after
                drivers/iommu/fsl_pamu_domain.c:172                               aperture bounds check)

virtio-iommu    viommu_iova_to_phys                               :1105           Interval tree reverse lookup
                drivers/iommu/virtio-iommu.c:915                                  (no page table)
----------------------------------------------------------------------------------------------------------------

Category 4: io_pgtable_ops backends

These implement struct io_pgtable_ops.iova_to_phys and are the ultimate
walk functions called by Category 1 drivers.

----------------------------------------------------------------------------------------------------------------
Backend         Function (file:line)                              Ops assignment  Walk strategy
--------------  ------------------------------------------------  --------------  ------------------------------
ARM LPAE        arm_lpae_iova_to_phys                             :950            Visitor pattern via
(64-bit)        drivers/iommu/io-pgtable-arm.c:734                                __arm_lpae_iopte_walk; covers
                                                                                  ARM_64_LPAE_S1, S2,
                                                                                  ARM_MALI_LPAE

ARM v7s         arm_v7s_iova_to_phys                              :716            Iterative do-while 2-level;
(32-bit)        drivers/iommu/io-pgtable-arm-v7s.c:644                            handles contiguous entries

Apple DART      dart_iova_to_phys                                 :402            dart_get_last pre-walks to
                drivers/iommu/io-pgtable-dart.c:336                               leaf table, then single lookup
----------------------------------------------------------------------------------------------------------------

Category 5: generic_pt framework

All these drivers use IOMMU_PT_DOMAIN_OPS(fmt), which routes
iova_to_phys into the template function pt_iommu__iova_to_phys at
drivers/iommu/generic_pt/iommu_pt.h:170. The walk uses pt_walk_range +
PT_MAKE_LEVELS to generate a fully-inlined unrolled per-level walk; the
OA is extracted via pt_entry_oa_exact.

-------------------------------------------------------------------------------------------------------
Driver                   Ops struct (file:line)                            Format
-----------------------  ------------------------------------------------  ----------------------------
AMD IOMMU v1             amdv1_ops                                         amdv1
                         drivers/iommu/amd/iommu.c:2662

AMD IOMMU v2             amdv2_ops                                         x86_64
                         drivers/iommu/amd/iommu.c:2740

Intel VT-d first-stage   intel_fs_paging_domain_ops                        x86_64
                         drivers/iommu/intel/iommu.c:3886

Intel VT-d second-stage  intel_ss_paging_domain_ops                        vtdss
                         drivers/iommu/intel/iommu.c:3897

iommufd selftest         mock_domain_ops etc                               amdv1_mock / amdv1
                         drivers/iommu/iommufd/selftest.c:403,411,425

KUnit wrapper            pgtbl_ops                                         Delegates to io_pgtable_ops
                         drivers/iommu/generic_pt/kunit_iommu_cmp.h:86     for comparison testing
-------------------------------------------------------------------------------------------------------

Sub-page offset handling

When iova_to_phys(iova) is called with an IOVA that is not aligned to
the start of the mapped page/block (e.g. iova_to_phys(1) when a 4KB
page is mapped at IOVA 0), most implementations return the exact
physical address including the sub-page offset (phys_base + offset).
Two do not.

Summary

----------------------------------------------------------------------------------------------
Implementation          Offset preserved?   Mechanism
----------------------  ------------------  --------------------------------------------------
arm_lpae (io-pgtable)   YES                 iopte_to_paddr(pte) | (iova & (block_size-1))
arm_v7s (io-pgtable)    YES                 iopte_to_paddr(pte) | (iova & ~LVL_MASK)
dart (io-pgtable)       YES                 iopte_to_paddr(pte) | (iova & (pgsize-1))
sun50i-iommu            YES                 page_addr + FIELD_GET(GENMASK(11,0), iova)
exynos-iommu            YES                 *_phys(entry) + *_offs(iova) per granularity
riscv-iommu             YES                 pfn_to_phys(pfn) | (iova & (pte_size-1))
omap-iommu              YES                 (descriptor & mask) | (va & ~mask)
rockchip-iommu          YES                 pt_address(pte) + rk_iova_page_offset(iova)
msm_iommu               YES                 HW PAR register + VA low bits spliced back in
s390-iommu              NO                  pte & ZPCI_PTE_ADDR_MASK; offset discarded
tegra-smmu              YES                 SMMU_PFN_PHYS(pfn) + SMMU_OFFSET_IN_PAGE(iova)
mtk_iommu_v1            NO                  pte & ~(page_size-1); offset discarded
sprd-iommu              YES                 (pte << PAGE_SHIFT) + (iova & (page_size-1))
fsl_pamu                YES (trivial)       return iova (identity mapping)
virtio-iommu            YES                 paddr + (iova - mapping->iova.start)
generic_pt              YES                 _pt_entry_oa_fast() | log2_mod(va, entry_lg2sz)
----------------------------------------------------------------------------------------------

Category 1 drivers (arm-smmu-v3, arm-smmu, apple-dart, qcom_iommu,
ipmmu-vmsa, mtk_iommu) inherit the behavior of their io-pgtable
backend; all preserve the offset.

Implementations that lose the offset

s390-iommu (drivers/iommu/s390-iommu.c:989): After the 3-level ZPCI
walk, returns pte & ZPCI_PTE_ADDR_MASK with no sub-page offset added
back. iova_to_phys(0) and iova_to_phys(1) return the same page-aligned
PA.

mtk_iommu_v1 (drivers/iommu/mtk_iommu_v1.c:396): Looks up the PTE by
iova >> PAGE_SHIFT (discarding offset), then returns
pte & ~(page_size-1). No step adds the sub-page offset back.
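
The difference between the two outliers and everything else can be
sketched in a few lines of standalone C. This is only an illustration,
not code from any driver: the demo_* names and the fixed 4 KiB page
size are assumptions made up for the example, and real PTE layouts
differ per IOMMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4 KiB page layout, chosen only for illustration. */
#define DEMO_PAGE_SIZE	UINT64_C(4096)
#define DEMO_ADDR_MASK	(~(DEMO_PAGE_SIZE - 1))

/*
 * Lossy variant: masks the PTE down to the page base and ignores the
 * low bits of the IOVA, so every IOVA inside one page maps to the same
 * result.
 */
uint64_t demo_iova_to_phys_lossy(uint64_t pte, uint64_t iova)
{
	(void)iova;
	return pte & DEMO_ADDR_MASK;
}

/*
 * Offset-preserving variant: takes the page base from the PTE and ORs
 * the sub-page bits of the IOVA back in.
 */
uint64_t demo_iova_to_phys_exact(uint64_t pte, uint64_t iova)
{
	return (pte & DEMO_ADDR_MASK) | (iova & ~DEMO_ADDR_MASK);
}
```

With a page mapped at physical address 0, demo_iova_to_phys_exact(0, 1)
yields 1 while demo_iova_to_phys_lossy(0, 1) yields 0, so only the
offset-preserving form gives different answers for IOVA 0 and IOVA 1.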

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index c8d8eff5373d30..8db16989270cd8 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -401,7 +401,8 @@ static phys_addr_t mtk_iommu_v1_iova_to_phys(struct iommu_domain *domain, dma_ad
 
 	spin_lock_irqsave(&dom->pgtlock, flags);
 	pa = *(dom->pgt_va + (iova >> MT2701_IOMMU_PAGE_SHIFT));
-	pa = pa & (~(MT2701_IOMMU_PAGE_SIZE - 1));
+	pa = (pa & (~(MT2701_IOMMU_PAGE_SIZE - 1))) |
+	     (iova & (MT2701_IOMMU_PAGE_SIZE - 1));
 	spin_unlock_irqrestore(&dom->pgtlock, flags);
 
 	return pa;
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index fe679850af2861..57d27f3a984ed6 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -1015,7 +1015,8 @@ static phys_addr_t s390_iommu_iova_to_phys(struct iommu_domain *domain,
 			pto = get_st_pto(ste);
 			pte = READ_ONCE(pto[px]);
 			if (pt_entry_isvalid(pte))
-				phys = pte & ZPCI_PTE_ADDR_MASK;
+				phys = (pte & ZPCI_PTE_ADDR_MASK) |
+				       (iova & ~ZPCI_PTE_ADDR_MASK);
 		}
 	}