Re: [PATCH] PCI: dwc: ep: Flush before unmap in dw_pcie_ep_raise_msix_irq()


From: Niklas Cassel <cassel@kernel.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "Jingoo Han" <jingoohan1@gmail.com>,
	"Manivannan Sadhasivam" <mani@kernel.org>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Rob Herring" <robh@kernel.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Kishon Vijay Abraham I" <kishon@kernel.org>,
	"Gustavo Pimentel" <gustavo.pimentel@synopsys.com>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Damien Le Moal" <dlemoal@kernel.org>,
	"Koichiro Den" <den@valinux.co.jp>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: dwc: ep: Flush before unmap in dw_pcie_ep_raise_msix_irq()
Date: Wed, 25 Feb 2026 23:34:27 +0100
Message-ID: <aZ948_L1pE-YzBVO@fedora>
In-Reply-To: <20260225214440.GA3786788@bhelgaas>

On Wed, Feb 25, 2026 at 03:44:40PM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 11, 2026 at 06:55:41PM +0100, Niklas Cassel wrote:
> > When running e.g. fio with a larger queue depth against nvmet-pci-epf we
> > get IOMMU errors on the host, e.g.:
> > 
> > arm-smmu-v3 fc900000.iommu:      0x0000010000000010
> > arm-smmu-v3 fc900000.iommu:      0x0000020000000000
> > arm-smmu-v3 fc900000.iommu:      0x000000090000f040
> > arm-smmu-v3 fc900000.iommu:      0x0000000000000000
> > arm-smmu-v3 fc900000.iommu: event: F_TRANSLATION client: 0000:01:00.0 sid: 0x100 ssid: 0x0 iova: 0x90000f040 ipa: 0x0
> > arm-smmu-v3 fc900000.iommu: unpriv data write s1 "Input address caused fault" stag: 0x0
> > 
> > The reason for this is that the writel() is immediately followed by a call
> > to unmap(), which will tear down the outbound address translation.
> > 
> > PCI writes are posted, i.e. don't wait for a completion. Thus, when the
> > writel() returns, the write might not have completed yet, and could even
> > still be buffered in the PCI bridge, at the time unmap() is called.
> > 
> > Flush the write by performing a read() of the same address, to ensure that
> > the write has reached the destination before calling unmap().
> > 
> > This will add some latency, but that is certainly preferred over corrupting
> > the host memory.
> > 
> > The same problem was solved for dw_pcie_ep_raise_msi_irq(), in commit
> > 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping"), however
> > there it was solved by dedicating an outbound iATU only for MSI. For MSI-X,
> > we can't do the same, as each vector can have a different msg_addr, and
> > because the msg_addr is allowed to be changed while the vector is masked.
> > 
> > Fixes: beb4641a787d ("PCI: dwc: Add MSI-X callbacks handler")
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> 
> beb4641a787d appeared in v4.19 (2018!) so it doesn't strictly qualify
> as a post-merge window fix, but I do understand that it fixes a
> problem similar to the 8719c64e76bf bug that we added in v7.0.

Yes, the problem has been there for a very long time.
(And I am basically the guilty one, as the commit that implemented
dw_pcie_ep_raise_msix_irq() copied dw_pcie_ep_raise_msi_irq(),
which was originally written by me.)

However, the problem is extremely easy to reproduce with nvmet-pci-epf.

Just do a fio --rw=randread --bs=4k --iodepth=32
and you trigger it within a few seconds.

While pci-epf-test has a read and a write test case, these test cases
only raise a single IRQ at the end of the test.

nvmet-pci-epf raises an IRQ after each I/O is completed.

The problem is easier to reproduce the more IRQs you trigger.
E.g. when you run fio with --iodepth=1, you don't trigger the bug.
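
As a rough illustration of the ordering problem described in the commit
message, here is a toy userspace model in C. It is only a sketch and not
the driver code: struct toy_bridge and every toy_*() helper are made-up
names, and the single-slot "posted write buffer" is an assumption chosen to
keep the model small. It shows a write that is still buffered when the
window is unmapped, versus a write that is flushed by a read first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): one outbound window, one buffered write. */
struct toy_bridge {
	bool window_mapped;	/* is the outbound translation valid? */
	bool write_pending;	/* is a posted write still buffered? */
	uint32_t pending_val;	/* data of the buffered write */
	uint32_t host_mem;	/* destination of the MSI-X message */
	unsigned int faults;	/* writes that left after the unmap */
};

static void toy_map(struct toy_bridge *b)
{
	b->window_mapped = true;
}

static void toy_unmap(struct toy_bridge *b)
{
	b->window_mapped = false;
}

/* Posted write: returns immediately, the data is only buffered. */
static void toy_writel(struct toy_bridge *b, uint32_t val)
{
	b->write_pending = true;
	b->pending_val = val;
}

/* The buffered write leaves the bridge and is translated only now. */
static void toy_drain(struct toy_bridge *b)
{
	if (!b->write_pending)
		return;
	if (b->window_mapped)
		b->host_mem = b->pending_val;
	else
		b->faults++;	/* models the translation fault */
	b->write_pending = false;
}

/* Read: in this model it pushes the earlier posted write ahead of it. */
static uint32_t toy_readl(struct toy_bridge *b)
{
	toy_drain(b);
	return b->host_mem;
}

/* Returns the number of faults caused by raising one interrupt. */
static unsigned int toy_raise_irq(struct toy_bridge *b, uint32_t data,
				  bool flush)
{
	unsigned int before = b->faults;

	toy_map(b);
	toy_writel(b, data);
	if (flush)
		(void)toy_readl(b);	/* flush before unmap */
	toy_unmap(b);
	toy_drain(b);	/* a still-buffered write leaves after the unmap */

	return b->faults - before;
}
```

With a zero-initialized struct toy_bridge, toy_raise_irq(&b, 0x1234, false)
reports one fault and leaves host_mem untouched, while the same call with
flush set to true reports none and host_mem holds 0x1234. Real hardware
only loses the race some of the time, which fits the bug showing up more
easily the more IRQs are raised.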


At least I am glad that we have finally discovered and fixed this bug
after such a long time.

We have the pci-epf-mhi, pci-epf-ntb, and pci-epf-vntb drivers, but
since this problem has not been discovered before, they evidently
don't raise as many IRQs as nvmet-pci-epf.
And if you look at those EPF drivers, pci-epf-mhi and pci-epf-ntb only
raise an interrupt once after link up.

pci-epf-vntb appears to do it on each doorbell_set(), but that is
probably also not using interrupts nearly as much as nvmet-pci-epf.


Kind regards,
Niklas

Thread overview: 5+ messages
2026-02-11 17:55 [PATCH] PCI: dwc: ep: Flush before unmap in dw_pcie_ep_raise_msix_irq() Niklas Cassel
2026-02-11 19:26 ` Frank Li
2026-02-12 12:47   ` Niklas Cassel
2026-02-25 21:44 ` Bjorn Helgaas
2026-02-25 22:34   ` Niklas Cassel [this message]