public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: David Matlack <dmatlack@google.com>
To: Baolu Lu <baolu.lu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Alexander Lobakin <aleksander.lobakin@intel.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Bartosz Pawlowski <bartosz.pawlowski@intel.com>,
	David Woodhouse <dwmw2@infradead.org>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	Raghavendra Rao Ananta <rananta@google.com>
Subject: Re: [PATCH v2] PCI: Disable ATS via quirk before notifying IOMMU drivers
Date: Thu, 2 Apr 2026 21:02:35 +0000	[thread overview]
Message-ID: <ac7ZazqDHIVfOPtC@google.com> (raw)
In-Reply-To: <CALzav=d8GxA6uT6FDeThZkT4n7auvp=iuYWAhytbiCuq5dBWkQ@mail.gmail.com>

On 2026-03-31 08:32 AM, David Matlack wrote:
> On Mon, Mar 30, 2026 at 8:32 PM Baolu Lu <baolu.lu@linux.intel.com> wrote:
> >
> > On 3/28/26 05:16, David Matlack wrote:
> > > Ensure that PCI devices with ATS disabled via quirk have it disabled
> > > before IOMMU drivers are notified about the device rather than after.
> > > Fix this by converting the existing quirks from final to early fixups
> > > and changing the quirk logic to set a new no_ats bit in struct pci_dev
> > > that prevents pci_dev.ats_cap from ever gettting set.
> > >
> > > This change ensures that pci_ats_supported() takes quirks into account
> > > during iommu_ops.probe_device(), when IOMMU drivers are first notified
> > > about devices. It also ensures that pci_ats_supported() returns the same
> > > value when the device is released in iommu_ops.release_device().
> > >
> > > Notably, the Intel IOMMU driver uses pci_ats_supported() in
> > > probe/release to determine whether to add/remove a device from a data
> > > structure, which easily leads to a use-after-free without this fix.
> >
> > Can you please shed more light on the above issue? In my investigation,
> > iommu_ops.probe_device() is always called after the no_ats quirk,
> > regardless of whether this patch is applied.
> >
> > The diff of the changes I made for testing is as follows:
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 442271a1b92a..c024964ac53b 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -3271,6 +3271,8 @@ static struct iommu_device
> > *intel_iommu_probe_device(struct device *dev)
> >                                  info->pfsid = pci_dev_id(pci_physfn(pdev));
> >                          info->ats_qdep = pci_ats_queue_depth(pdev);
> >                  }
> > +               pci_info(pdev, "ATS %s\n", info->ats_supported ?
> > +                        "supported" : "not supported");
> >                  if (sm_supported(iommu)) {
> >                          if (pasid_supported(iommu)) {
> >                                  int features = pci_pasid_features(pdev);
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 48946cca4be7..c63616d108b7 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5714,6 +5714,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,
> > 0x1457, quirk_intel_e2000_no_ats);
> >   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1459,
> > quirk_intel_e2000_no_ats);
> >   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x145a,
> > quirk_intel_e2000_no_ats);
> >   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x145c,
> > quirk_intel_e2000_no_ats);
> > +
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0b25, quirk_no_ats);
> >   #endif /* CONFIG_PCI_ATS */
> >
> >   /* Freescale PCIe doesn't support MSI in RC mode */
> >
> >
> > The related kernel messages are shown below:
> >
> > # dmesg | grep "0000:00:01.0"
> > [   15.834944] pci 0000:00:01.0: [8086:0b25] type 00 class 0x088000 PCIe
> > Root Complex Integrated Endpoint
> > [   15.836382] pci 0000:00:01.0: BAR 0 [mem
> > 0x1e0fff980000-0x1e0fff99ffff 64bit pref]
> > [   15.836655] pci 0000:00:01.0: BAR 2 [mem
> > 0x1e0fff900000-0x1e0fff93ffff 64bit pref]
> > [   15.837904] pci 0000:00:01.0: calling
> > quirk_igfx_skip_te_disable+0x0/0xe0 @ 1
> > [   15.838614] pci 0000:00:01.0: quirk_igfx_skip_te_disable+0x0/0xe0
> > took 0 usecs
> > [   21.205177] pci 0000:00:01.0: calling  quirk_no_ats+0x0/0x40 @ 1
> > [   21.206767] pci 0000:00:01.0: disabling ATS
> > [   21.207916] pci 0000:00:01.0: quirk_no_ats+0x0/0x40 took 1122 usecs
> > [   21.305357] pci 0000:00:01.0: DMAR: ATS not supported
> > [   21.306925] pci 0000:00:01.0: Adding to iommu group 4
> > [   42.564912] idxd 0000:00:01.0: Intel(R) Accelerator Device (v200)
> > [   42.568653] probe of 0000:00:01.0 returned 0 after 87413 usecs
> >
> >
> > Anything I missed?
> 
> The Closes link above has the details of my investigation:
> 
>   https://lore.kernel.org/linux-iommu/aYUQ_HkDJU9kjsUl@google.com/
> 
> Can you check if you see the same ordering for VFs? That was the
> scenario where I saw the IOMMU drivers notified before final fixups
> were applied.

During boot I observe the same order as you. PCI final fixups get
applied twice actually, and both times is before iommu probe_device().

First time:

[    1.687987][    T1] pci 0000:00:04.0: pci_fixup_device
[    1.690987][    T1] WARNING: drivers/pci/quirks.c:227 at pci_fixup_device+0xd5/0x230, CPU#5: swapper/0/1
[    1.708986][    T1] RIP: 0010:pci_fixup_device+0xf9/0x230
[    1.755989][    T1]  pci_bus_add_device+0x31/0x90
[    1.758986][    T1]  pci_bus_add_devices+0x39/0x70
[    1.760986][    T1]  acpi_pci_root_add+0x815/0xa60
[    1.767986][    T1]  acpi_bus_attach+0x260/0x2f0
[    1.773988][    T1]  device_for_each_child+0x6a/0xa0
[    1.775986][    T1]  acpi_dev_for_each_child+0x39/0x60
[    1.780986][    T1]  acpi_bus_attach+0xe8/0x2f0
[    1.785987][    T1]  device_for_each_child+0x6a/0xa0
[    1.787989][    T1]  acpi_dev_for_each_child+0x39/0x60
[    1.792985][    T1]  acpi_bus_attach+0xe8/0x2f0
[    1.795986][    T1]  acpi_bus_scan+0x6a/0x200
[    1.799986][    T1]  acpi_scan_init+0xb7/0x1c0
[    1.803988][    T1]  acpi_init+0xad/0xd0
[    1.808986][    T1]  do_one_initcall+0xba/0x2a0
[    1.825986][    T1]  do_initcall_level+0x7c/0xf0
[    1.827988][    T1]  do_initcalls+0x43/0x70
[    1.829986][    T1]  kernel_init_freeable+0xd7/0x140
[    1.834985][    T1]  kernel_init+0x1a/0x130
[    1.836986][    T1]  ret_from_fork+0xf4/0x280
[    1.840985][    T1]  ret_from_fork_asm+0x1a/0x30

Second time:

[    4.156625][    T1] pci 0000:00:04.0: pci_fixup_device
[    4.159282][    T1] WARNING: drivers/pci/quirks.c:227 at pci_fixup_device+0xd5/0x230, CPU#1: swapper/0/1
[    4.178115][    T1] RIP: 0010:pci_fixup_device+0xf9/0x230
[    4.228013][    T1]  pci_apply_final_quirks+0x6a/0x170
[    4.233485][    T1]  do_one_initcall+0xba/0x2a0
[    4.251171][    T1]  do_initcall_level+0x7c/0xf0
[    4.253521][    T1]  do_initcalls+0x43/0x70
[    4.255653][    T1]  kernel_init_freeable+0xd7/0x140
[    4.260602][    T1]  kernel_init+0x1a/0x130
[    4.262745][    T1]  ret_from_fork+0xf4/0x280
[    4.267345][    T1]  ret_from_fork_asm+0x1a/0x30

Then iommu probe_device() happens:

[    5.744756][    T1] pci 0000:00:04.0: intel_iommu_probe_device
[    5.747688][    T1] WARNING: drivers/iommu/intel/iommu.c:3257 at intel_iommu_probe_device+0x3e9/0x840, CPU#1: swapper/0/1
[    5.767127][    T1] RIP: 0010:intel_iommu_probe_device+0x403/0x840
[    5.816998][    T1]  __iommu_probe_device+0xe7/0x4d0
[    5.819522][    T1]  probe_iommu_group+0x29/0x50
[    5.824593][    T1]  bus_for_each_dev+0x111/0x130
[    5.826965][    T1]  iommu_device_register+0xfb/0x250
[    5.829492][    T1]  intel_iommu_init+0x36f/0x450
[    5.834374][    T1]  pci_iommu_init+0x16/0x40
[    5.836614][    T1]  do_one_initcall+0xba/0x2a0
[    5.854121][    T1]  do_initcall_level+0x7c/0xf0
[    5.856450][    T1]  do_initcalls+0x43/0x70
[    5.858563][    T1]  kernel_init_freeable+0xd7/0x140
[    5.863427][    T1]  kernel_init+0x1a/0x130
[    5.865522][    T1]  ret_from_fork+0xf4/0x280
[    5.870169][    T1]  ret_from_fork_asm+0x1a/0x30

But when VFs are created the order is reversed:

[  545.718507][  T242] pci 0000:01:00.1: intel_iommu_probe_device
[  545.718627][  T242] WARNING: drivers/iommu/intel/iommu.c:3257 at intel_iommu_probe_device+0x3e9/0x840, CPU#0: bash/242
[  545.719468][  T242] RIP: 0010:intel_iommu_probe_device+0x403/0x840
[  545.721181][  T242]  __iommu_probe_device+0xe7/0x4d0
[  545.721278][  T242]  iommu_bus_notifier+0x37/0x100
[  545.721373][  T242]  blocking_notifier_call_chain+0x62/0x100
[  545.721485][  T242]  bus_notify+0x97/0xc0
[  545.721567][  T242]  device_add+0x222/0x400
[  545.721646][  T242]  pci_device_add+0x488/0x6e0
[  545.721746][  T242]  pci_iov_add_virtfn+0x2f2/0x3e0
[  545.721935][  T242]  sriov_add_vfs+0x2c/0x60
[  545.722064][  T242]  sriov_enable+0x306/0x4a0
[  545.722157][  T242]  virtio_pci_sriov_configure+0x6a/0x90
[  545.722257][  T242]  sriov_numvfs_store+0xd7/0x1c0
[  545.722349][  T242]  kernfs_fop_write_iter+0xfe/0x180
[  545.722453][  T242]  vfs_write+0x2e4/0x460
[  545.722533][  T242]  ksys_write+0x70/0x100
[  545.722613][  T242]  do_syscall_64+0xd1/0x8b0
[  545.722832][  T242]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[  545.725241][  T242] pci 0000:01:00.1: pci_fixup_device
[  545.725339][  T242] WARNING: drivers/pci/quirks.c:227 at pci_fixup_device+0xd5/0x230, CPU#0: bash/242
[  545.727806][  T242]  pci_bus_add_device+0x31/0x90
[  545.727902][  T242]  pci_iov_add_virtfn+0x3c9/0x3e0
[  545.728118][  T242]  sriov_add_vfs+0x2c/0x60
[  545.728226][  T242]  sriov_enable+0x306/0x4a0
[  545.728320][  T242]  virtio_pci_sriov_configure+0x6a/0x90
[  545.728416][  T242]  sriov_numvfs_store+0xd7/0x1c0
[  545.728512][  T242]  kernfs_fop_write_iter+0xfe/0x180
[  545.728609][  T242]  vfs_write+0x2e4/0x460
[  545.728690][  T242]  ksys_write+0x70/0x100
[  545.728771][  T242]  do_syscall_64+0xd1/0x8b0
[  545.728866][  T242]  ? clear_bhb_loop+0x30/0x80
[  545.728961][  T242]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

My naive reading of the PCI hotplug code makes me think it has the same
ordering as VF creation. More generally, any PCI device that is added
after the IOMMU bus notifiers are set up will probe the device before
final fixups are applied. I can clarify that in the commit message in
v3.

Here is the patch I used to collect the traces above:

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ef7613b177b9..122cb8f610a3 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3254,6 +3254,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
        info->dev = dev;
        info->iommu = iommu;
        if (dev_is_pci(dev)) {
+               pci_WARN(to_pci_dev(dev), 1, "intel_iommu_probe_device\n");
                if (ecap_dev_iotlb_support(iommu->ecap) &&
                    pci_ats_supported(pdev) &&
                    dmar_ats_supported(pdev, iommu)) {
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 48946cca4be7..af9cb39c9236 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -224,6 +224,7 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
                break;

        case pci_fixup_final:
+               pci_WARN(dev, 1, "pci_fixup_device\n");
                if (!pci_apply_fixup_final_quirks)
                        return;
                start = __start_pci_fixups_final;

  reply	other threads:[~2026-04-02 21:02 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-27 21:16 [PATCH v2] PCI: Disable ATS via quirk before notifying IOMMU drivers David Matlack
2026-03-30 22:30 ` Bjorn Helgaas
2026-03-31  3:31 ` Baolu Lu
2026-03-31 15:32   ` David Matlack
2026-04-02 21:02     ` David Matlack [this message]
2026-03-31 16:01 ` David Matlack
2026-03-31 18:38   ` Bjorn Helgaas
2026-03-31 18:46     ` David Matlack
2026-03-31 19:00       ` Andy Shevchenko
2026-03-31 19:02       ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac7ZazqDHIVfOPtC@google.com \
    --to=dmatlack@google.com \
    --cc=aleksander.lobakin@intel.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bartosz.pawlowski@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dwmw2@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rananta@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox