Re: [PATCH] PCI: dwc: ep: Flush before unmap in dw_pcie_ep_raise_msix_irq()


From: Niklas Cassel <cassel@kernel.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "Jingoo Han" <jingoohan1@gmail.com>,
	"Manivannan Sadhasivam" <mani@kernel.org>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Rob Herring" <robh@kernel.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Kishon Vijay Abraham I" <kishon@kernel.org>,
	"Gustavo Pimentel" <gustavo.pimentel@synopsys.com>,
	"Shinichiro Kawasaki" <shinichiro.kawasaki@wdc.com>,
	"Damien Le Moal" <dlemoal@kernel.org>,
	"Koichiro Den" <den@valinux.co.jp>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: dwc: ep: Flush before unmap in dw_pcie_ep_raise_msix_irq()
Date: Wed, 25 Feb 2026 23:34:27 +0100
Message-ID: <aZ948_L1pE-YzBVO@fedora>
In-Reply-To: <20260225214440.GA3786788@bhelgaas>

On Wed, Feb 25, 2026 at 03:44:40PM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 11, 2026 at 06:55:41PM +0100, Niklas Cassel wrote:
> > When running e.g. fio with a larger queue depth against nvmet-pci-epf we
> > get IOMMU errors on the host, e.g.:
> > 
> > arm-smmu-v3 fc900000.iommu:      0x0000010000000010
> > arm-smmu-v3 fc900000.iommu:      0x0000020000000000
> > arm-smmu-v3 fc900000.iommu:      0x000000090000f040
> > arm-smmu-v3 fc900000.iommu:      0x0000000000000000
> > arm-smmu-v3 fc900000.iommu: event: F_TRANSLATION client: 0000:01:00.0 sid: 0x100 ssid: 0x0 iova: 0x90000f040 ipa: 0x0
> > arm-smmu-v3 fc900000.iommu: unpriv data write s1 "Input address caused fault" stag: 0x0
> > 
> > The reason for this is that the writel() is immediately followed by a call
> > to unmap(), which will tear down the outbound address translation.
> > 
> > PCI writes are posted, i.e. don't wait for a completion. Thus, when the
> > writel() returns, the write might not have completed yet, and could even
> > still be buffered in the PCI bridge, at the time unmap() is called.
> > 
> > Flush the write by performing a read() of the same address, to ensure that
> > the write has reached the destination before calling unmap().
> > 
> > This will add some latency, but that is certainly preferred over corrupting
> > the host memory.
> > 
> > The same problem was solved for dw_pcie_ep_raise_msi_irq(), in commit
> > 8719c64e76bf ("PCI: dwc: ep: Cache MSI outbound iATU mapping"), however
> > there it was solved by dedicating an outbound iATU only for MSI. For MSI-X,
> > we can't do the same, as each vector can have a different msg_addr, and
> > because the msg_addr is allowed to be changed while the vector is masked.
> > 
> > Fixes: beb4641a787d ("PCI: dwc: Add MSI-X callbacks handler")
> > Signed-off-by: Niklas Cassel <cassel@kernel.org>
> 
> beb4641a787d appeared in v4.19 (2018!) so it doesn't strictly qualify
> as a post-merge window fix, but I do understand that it fixes a
> problem similar to the 8719c64e76bf bug that we added in v7.0.

Yes, the problem has been there for a very long time.
(And I am basically the guilty one, as the commit that implemented
dw_pcie_ep_raise_msix_irq() largely copied dw_pcie_ep_raise_msi_irq(),
which was originally written by me.)

However, the problem is extremely easy to reproduce with nvmet-pci-epf.

Just run fio with --rw=randread --bs=4k --iodepth=32
and you will trigger it within a few seconds.

While pci-epf-test has a read and a write test case, these test cases
only raise a single IRQ at the end of the test.

nvmet-pci-epf raises an IRQ after each I/O is completed.

The problem is easier to reproduce the more IRQs you trigger.
E.g. when you run fio with --iodepth=1, you don't trigger the bug.
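
As a rough illustration of the race described in the quoted commit
message, here is a minimal user-space sketch in C. It is not the dwc
driver code: struct model_bus and every model_*() function are
hypothetical names invented for this example, and the model assumes the
worst-case ordering, where an unflushed posted write always arrives
after the unmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one outbound window; not a kernel structure. */
struct model_bus {
	bool mapped;		/* outbound translation currently valid */
	bool pending;		/* a posted write is still buffered */
	uint32_t pending_val;	/* data of the buffered write */
	uint32_t target;	/* destination memory behind the window */
	unsigned int faults;	/* writes that arrived with no mapping */
};

/* Deliver the buffered write; it faults if the mapping is already gone. */
static void model_drain(struct model_bus *bus)
{
	if (!bus->pending)
		return;
	if (bus->mapped)
		bus->target = bus->pending_val;
	else
		bus->faults++;
	bus->pending = false;
}

/* Posted write: returns immediately, the data is only buffered. */
static void model_writel(struct model_bus *bus, uint32_t val)
{
	assert(!bus->pending);	/* the model buffers at most one write */
	bus->pending_val = val;
	bus->pending = true;
}

/* Non-posted read: earlier posted writes are delivered before it returns. */
static uint32_t model_readl(struct model_bus *bus)
{
	model_drain(bus);
	return bus->target;
}

/* Tear down the outbound translation. */
static void model_unmap(struct model_bus *bus)
{
	bus->mapped = false;
}

/* Raise one interrupt and return the number of faults observed. */
unsigned int model_raise_irq(bool flush)
{
	struct model_bus bus = { .mapped = true };

	model_writel(&bus, 0x1234);
	if (flush)
		(void)model_readl(&bus);	/* read back to flush the write */
	model_unmap(&bus);
	model_drain(&bus);	/* the buffered write arrives "late" */
	return bus.faults;
}
```

In this model, model_raise_irq(false) returns 1, because the write
arrives after the mapping is gone, while model_raise_irq(true) returns
0, because the read-back forces the write out before the unmap.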


At least I am glad that we have finally discovered and fixed this bug
after such a long time.

We have the pci-epf-mhi, pci-epf-ntb, and pci-epf-vntb drivers, but
since this problem has not been discovered before, they evidently
don't raise as many IRQs as nvmet-pci-epf.
Indeed, if you look at those EPF drivers, pci-epf-mhi and pci-epf-ntb
only raise an interrupt once after link up.

pci-epf-vntb appears to do it on each doorbell_set(), but it probably
also does not use interrupts nearly as much as nvmet-pci-epf.


Kind regards,
Niklas
