* [PATCH 0/2] Fix VT-d when the IOVA limit is small
@ 2025-11-27 23:54 Jason Gunthorpe
  2025-11-27 23:54 ` [PATCH 1/2] iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires Jason Gunthorpe
From: Jason Gunthorpe @ 2025-11-27 23:54 UTC
  To: David Woodhouse, iommu, Joerg Roedel, Robin Murphy,
	Suravee Suthikulpanit, Will Deacon
  Cc: Lu Baolu, Calvin Owens, Chaitanya Kumar Borah, Joerg Roedel,
	Kevin Tian, patches, Tina Zhang

Calvin notes:

=======================
A Skylake machine has problems with strict translation on next-20251124:

    pci 0000:06:00.0: Adding to iommu group 18
    ------------[ cut here ]------------
    WARNING: drivers/iommu/iommu.c:3055 at iommu_setup_default_domain+0x268/0x2f0, CPU#2: swapper/0/1
    CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc6-next-20251124 #1 PREEMPTLAZY
    Hardware name: ASUSTeK COMPUTER INC. WS C246M PRO Series/WS C246M PRO Series, BIOS 6101 06/26/2024
    RIP: 0010:iommu_setup_default_domain+0x268/0x2f0
    <snip>
    Call Trace:
     <TASK>
     iommu_device_register+0x126/0x200
     intel_iommu_init+0x2bf/0x580
     pci_iommu_init+0xb/0x30
     do_one_initcall+0xad/0x1c0
     kernel_init_freeable+0x238/0x290
     kernel_init+0x16/0x120
     ret_from_fork+0x1ba/0x1f0
     ret_from_fork_asm+0x11/0x20
     </TASK>
    Kernel panic - not syncing: kernel: panic_on_warn set ...
    <snip>
    Dumping ftrace buffer:
    ---------------------------------
     2)               |    __iommu_group_set_domain_internal() { /* <-iommu_setup_default_domain+0x25e/0x2f0 */
     2)               |      __iommu_device_set_domain() { /* <-__iommu_group_set_domain_internal+0x6d/0x140 */
     2)               |        __iommu_attach_device() { /* <-__iommu_device_set_domain+0x6d/0xb0 */
     2)               |          intel_iommu_attach_device() { /* <-__iommu_attach_device+0x1f/0xe0 */
     2)   0.140 us    |            device_block_translation(); /* <-intel_iommu_attach_device+0x19/0x80 ret=0xffffffff81b5e980 */
     2)               |            paging_domain_compatible() { /* <-intel_iommu_attach_device+0x24/0x80 */
     2)               |              paging_domain_compatible_second_stage() { /* <-paging_domain_compatible+0x47/0x170 */
     2)   0.137 us    |                pt_iommu_vtdss_hw_info(); /* <-paging_domain_compatible_second_stage+0x29/0x1a0 ret=0x1 */
     2)   0.530 us    |              } /* paging_domain_compatible_second_stage ret=-22 */
     2)   0.907 us    |            } /* paging_domain_compatible ret=-22 */
     2)   1.653 us    |          } /* intel_iommu_attach_device ret=-22 */
     2)   2.157 us    |        } /* __iommu_attach_device ret=-22 */
     2)   2.528 us    |      } /* __iommu_device_set_domain ret=-22 */
     2)   2.954 us    |    } /* __iommu_group_set_domain_internal ret=-22 */
    ---------------------------------
    Rebooting in 10 seconds..

The failing condition in paging_domain_compatible_second_stage() is:

    /* Page table level is supported. */
    if (!(cap_sagaw(iommu->cap) & BIT(pt_info.aw)))
        return -EINVAL;

This happens because, for many domains on this machine, MGAW=39 but
SAGAW=0x04: that claims a 39-bit maximum address width, but also claims
to only support 48-bit/4-level paging, which seems odd.

Before the GENERIC_PT rewrite, the kernel only looked at SAGAW, so this
machine has been happily running for years using 4-level paging.

Now, the kernel refuses to use 4-level paging because MGAW=39. But SAGAW
claims not to support anything else, so we hit the -EINVAL case above
and fail to initialize.

If I force 4-level paging, everything works. If I force 39-bit/3-level
paging, nothing works (lots of bad context faults). So it seems like the
machine really only supports 4-level paging despite the 3-level MGAW.
=======================

This combination was not considered when the GENERIC_PT code was
written. Allow VT-d to pass in the top level of the page table and the
maximum vasz as separate values, so it can set up a page table that is
compatible with the HW.

This is happening because VT-d does not quite fit the architecture Linux
expects. There, the IOMMU driver reports its full page table capability
as the aperture, while bus width or other addressing limitations are
attached to the endpoint devices as a DMA mask. VT-d instead puts the
device limitations into the IOMMU as well.

Jason Gunthorpe (2):
  iommupt/vtd: Allow VT-d to have a larger table top than the vasz
    requires
  iommupt/vtd: Support mgaw's less than a 4 level walk for first stage

 drivers/iommu/amd/iommu.c             |  7 +++-
 drivers/iommu/generic_pt/fmt/vtdss.h  | 19 +++------
 drivers/iommu/generic_pt/fmt/x86_64.h | 17 ++++----
 drivers/iommu/generic_pt/iommu_pt.h   | 14 +++++++
 drivers/iommu/intel/iommu.c           | 58 +++++++++++++++++----------
 include/linux/generic_pt/iommu.h      |  4 ++
 6 files changed, 73 insertions(+), 46 deletions(-)


base-commit: 6b99b135be1b79230acb88865abef6fb0fba35bb
-- 
2.43.0




Thread overview: 7+ messages
2025-11-27 23:54 [PATCH 0/2] Fix VT-d when the IOVA limit is small Jason Gunthorpe
2025-11-27 23:54 ` [PATCH 1/2] iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires Jason Gunthorpe
2025-11-28  6:54   ` Baolu Lu
2025-11-27 23:54 ` [PATCH 2/2] iommupt/vtd: Support mgaw's less than a 4 level walk for first stage Jason Gunthorpe
2025-11-28  7:03   ` Baolu Lu
2025-11-28  0:04 ` [PATCH 0/2] Fix VT-d when the IOVA limit is small Jason Gunthorpe
2025-11-28  7:48 ` Joerg Roedel
