From: Jason Gunthorpe <jgg@nvidia.com>
To: David Woodhouse <dwmw2@infradead.org>,
iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
Robin Murphy <robin.murphy@arm.com>,
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
Will Deacon <will@kernel.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>,
Calvin Owens <calvin@wbinvd.org>,
Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>,
Joerg Roedel <joerg.roedel@amd.com>,
Kevin Tian <kevin.tian@intel.com>,
patches@lists.linux.dev, Tina Zhang <tina.zhang@intel.com>
Subject: Re: [PATCH 0/2] Fix VT-d when the IOVA limit is small
Date: Thu, 27 Nov 2025 20:04:38 -0400
Message-ID: <20251128000438.GA787428@nvidia.com>
In-Reply-To: <0-v1-ae5d7f0f2620+13b-vtd_mgaw_jgg@nvidia.com>
On Thu, Nov 27, 2025 at 07:54:06PM -0400, Jason Gunthorpe wrote:
> Calvin notes:
>
> =======================
> A Skylake machine has problems with strict translation on next-20251124:
>
> pci 0000:06:00.0: Adding to iommu group 18
> ------------[ cut here ]------------
> WARNING: drivers/iommu/iommu.c:3055 at iommu_setup_default_domain+0x268/0x2f0, CPU#2: swapper/0/1
> CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc6-next-20251124 #1 PREEMPTLAZY
> Hardware name: ASUSTeK COMPUTER INC. WS C246M PRO Series/WS C246M PRO Series, BIOS 6101 06/26/2024
> RIP: 0010:iommu_setup_default_domain+0x268/0x2f0
> <snip>
> Call Trace:
> <TASK>
> iommu_device_register+0x126/0x200
> intel_iommu_init+0x2bf/0x580
> pci_iommu_init+0xb/0x30
> do_one_initcall+0xad/0x1c0
> kernel_init_freeable+0x238/0x290
> kernel_init+0x16/0x120
> ret_from_fork+0x1ba/0x1f0
> ret_from_fork_asm+0x11/0x20
> </TASK>
> Kernel panic - not syncing: kernel: panic_on_warn set ...
> <snip>
> Dumping ftrace buffer:
> ---------------------------------
> 2) | __iommu_group_set_domain_internal() { /* <-iommu_setup_default_domain+0x25e/0x2f0 */
> 2) | __iommu_device_set_domain() { /* <-__iommu_group_set_domain_internal+0x6d/0x140 */
> 2) | __iommu_attach_device() { /* <-__iommu_device_set_domain+0x6d/0xb0 */
> 2) | intel_iommu_attach_device() { /* <-__iommu_attach_device+0x1f/0xe0 */
> 2) 0.140 us | device_block_translation(); /* <-intel_iommu_attach_device+0x19/0x80 ret=0xffffffff81b5e980 */
> 2) | paging_domain_compatible() { /* <-intel_iommu_attach_device+0x24/0x80 */
> 2) | paging_domain_compatible_second_stage() { /* <-paging_domain_compatible+0x47/0x170 */
> 2) 0.137 us | pt_iommu_vtdss_hw_info(); /* <-paging_domain_compatible_second_stage+0x29/0x1a0 ret=0x1 */
> 2) 0.530 us | } /* paging_domain_compatible_second_stage ret=-22 */
> 2) 0.907 us | } /* paging_domain_compatible ret=-22 */
> 2) 1.653 us | } /* intel_iommu_attach_device ret=-22 */
> 2) 2.157 us | } /* __iommu_attach_device ret=-22 */
> 2) 2.528 us | } /* __iommu_device_set_domain ret=-22 */
> 2) 2.954 us | } /* __iommu_group_set_domain_internal ret=-22 */
> ---------------------------------
> Rebooting in 10 seconds..
>
> The failing condition in paging_domain_compatible_second_stage() is:
>
> 	/* Page table level is supported. */
> 	if (!(cap_sagaw(iommu->cap) & BIT(pt_info.aw)))
> 		return -EINVAL;
>
> This happens because, for many domains on this machine, MGAW=39 but
> SAGAW=0x04: that claims a 39-bit maximum address width, but also claims
> to only support 48-bit/4-level paging, which seems odd.
>
> Before the GENERIC_PT rewrite, the kernel only looked at SAGAW, so this
> machine has been happily running for years using 4-level paging.
>
> Now, the kernel refuses to use 4-level paging because MGAW=39. But SAGAW
> claims not to support anything else, so we hit the -EINVAL case above
> and fail to initialize.
>
> If I force 4-level paging, everything works. If I force 39-bit/3-level
> paging, nothing works (lots of bad context faults). So it seems like the
> machine really only supports 4-level paging despite the 3-level MGAW.
> =======================
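
To make the mismatch above concrete, here is a minimal standalone sketch of the arithmetic. It is not the kernel code: the helper names are hypothetical. It only assumes the VT-d encoding in which SAGAW bit n stands for an (n + 2)-level table covering 30 + 9*n address bits. The last helper illustrates the general idea of choosing the table top from SAGAW while leaving the address size limited by MGAW; the actual patches may do this differently.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only, not the kernel's API. */

/* Smallest SAGAW bit index whose table covers 'width' address bits. */
static unsigned int vtd_width_to_aw(unsigned int width)
{
	if (width <= 30)
		return 0;
	return (width - 30 + 8) / 9;	/* round up to a whole level */
}

/* Address bits covered by a table with SAGAW bit index 'aw'. */
static unsigned int vtd_aw_to_width(unsigned int aw)
{
	return 30 + 9 * aw;
}

/* Number of page table levels for SAGAW bit index 'aw'. */
static unsigned int vtd_aw_to_levels(unsigned int aw)
{
	return aw + 2;
}

/* Same shape as the failing check: is bit 'aw' set in SAGAW? */
static bool vtd_sagaw_supports(unsigned int sagaw, unsigned int aw)
{
	return (sagaw >> aw) & 1;
}

/*
 * Pick the smallest supported table top that still covers 'mgaw'.
 * The usable address size stays at 'mgaw'; only the top level grows.
 * Returns -1 if SAGAW (a 5-bit field) offers nothing large enough.
 */
static int vtd_pick_top_aw(unsigned int sagaw, unsigned int mgaw)
{
	unsigned int aw;

	for (aw = vtd_width_to_aw(mgaw); aw < 5; aw++)
		if (vtd_sagaw_supports(sagaw, aw))
			return (int)aw;
	return -1;
}
```

With Calvin's values (SAGAW=0x04, MGAW=39), vtd_width_to_aw(39) is 1 and bit 1 of SAGAW is clear, which corresponds to the -EINVAL path. vtd_pick_top_aw(0x04, 39) returns 2, i.e. a 4-level table, while the usable address size stays at 39 bits.
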
>
> This is a condition that was not considered when this was written.
> Allow VT-d to pass in the top level of the page table and the max vasz
> as separate things. This lets it set up something compatible with the
> HW.
>
> This is happening because VT-d doesn't quite fit the architecture we
> expect on Linux, where the IOMMU driver reports its full page table
> capability as an aperture, and bus width or addressing limitations are
> attached to the endpoint devices as a DMA mask. Instead, VT-d puts the
> device limitations in the iommu as well.
>
> Jason Gunthorpe (2):
> iommupt/vtd: Allow VT-d to have a larger table top than the vasz
> requires
> iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
>
> drivers/iommu/amd/iommu.c | 7 +++-
> drivers/iommu/generic_pt/fmt/vtdss.h | 19 +++------
> drivers/iommu/generic_pt/fmt/x86_64.h | 17 ++++----
> drivers/iommu/generic_pt/iommu_pt.h | 14 +++++++
> drivers/iommu/intel/iommu.c | 58 +++++++++++++++++----------
> include/linux/generic_pt/iommu.h | 4 ++
> 6 files changed, 73 insertions(+), 46 deletions(-)
I forgot:
Tested-by: Calvin Owens <calvin@wbinvd.org>
Jason