public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Baolu Lu <baolu.lu@linux.intel.com>
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
	"Joerg Roedel" <jroedel@suse.de>,
	"Matt Fagnani" <matt.fagnani@bell.net>,
	"Christian König" <christian.koenig@amd.com>,
	"Jason Gunthorpe" <jgg@nvidia.com>,
	"Kevin Tian" <kevin.tian@intel.com>,
	"Vasant Hegde" <vasant.hegde@amd.com>,
	"Tony Zhu" <tony.zhu@intel.com>,
	linux-pci@vger.kernel.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/1] PCI: Add translated request only flag for pci_enable_pasid()
Date: Tue, 31 Jan 2023 18:14:19 -0600	[thread overview]
Message-ID: <20230201001419.GA1776086@bhelgaas> (raw)
In-Reply-To: <030e66e0-fb54-b77d-5094-4786684ba97d@linux.intel.com>

On Tue, Jan 31, 2023 at 08:56:13PM +0800, Baolu Lu wrote:
> On 2023/1/31 2:38, Bjorn Helgaas wrote:
> > > PCI: Add translated request only flag for pci_enable_pasid()
> > > 
> > > The PCIe fabric routes Memory Requests based on the TLP address, ignoring
> > > the PASID. In order to ensure system integrity, commit 201007ef707a ("PCI:
> > > Enable PASID only when ACS RR & UF enabled on upstream path") requires
> > > some ACS features being supported on device's upstream path when enabling
> > > PCI/PASID.

Looking up 201007ef707a to see what ensuring system integrity means,
it prevents Memory Requests with PASID, which should always be routed
to the RC, from being mistakenly routed as peer-to-peer requests.

> > > However, above change causes the Linux kernel boots to black screen on a
> > > system with below graphic device:
> >
> > We need a PCIe concept-level description of the issue first, i.e., in
> > terms of DMA, PASID, ACS, etc.  Then we can mention the AMD GPU issue
> > as an instance.
> 
> How about below description?

Thanks, this is exactly the sort of thing I'm looking for.  But my
understanding of ATS/PRI/PASID is weak, so I'm still working through
this.  Tell me when I say something wrong below...

> PCIe endpoints can use ATS to request DMA remapping hardware to
> translate an IOVA to its mapped physical address. If the translation is
> missing or the permissions are insufficient, the PRI is used to trigger
> an I/O page fault. The IOMMU driver will fill the mapping with desired
> permissions and return the translated address to the device.

In PCIe spec language, I think you're saying that a PCIe Function may
contain an ATC.  If the ATC Capability Enable bit is set, the Function
can issue Translation Requests.

The TA (aka IOMMU) will respond with a Translation Completion.  If the
Completion is a CplD, it contains the translated address and the
Function can store the entry in its ATC.  I assume the I/O page fault
case corresponds to a Cpl (with no data) meaning that the TA could not
translate the address.

If the TA doesn't have a mapping with the desired permissions, and the
Function's Page Request Capability Enable bit is set, it may issue a
Page Request Message.  It's up to the TA/IOMMU to make this message
visible to the OS, which can make the page resident, create an IOMMU
mapping, and enable a PRG Response Message.  After the Function
receives the PRG Response Message, it would issue another Translation
Request.

> The translated address is specified by the IOMMU driver. The IOMMU
> driver ensures that the address is a DMA buffer address instead of any
> P2P address in the PCI fabric. Therefore, any translated memory request
> will eventually be routed to IOMMU regardless of whether there is ACS
> control in the up-streaming path.

A Memory Request with an address that is not a P2P address, i.e., it
is not contained in any bridge aperture, will *always* be routed
toward the RC, won't it?  Isn't that the case regardless of whether
the address is translated or untranslated, and even regardless of ACS?

IIUC, ACS basically causes peer-to-peer requests to be routed upstream
instead of directly to the peer.

OK, reading this again, I realize that I just restated exactly what
you had already written, sorry about that.

> AMD GPU is one of those devices.

I guess you mean the AMD GPU has ATS, PRI, and PASID Capabilities?
And furthermore, that the GPU *always* uses Translated addresses with
PASID?

So I guess what's going on here is that if:

  - A device only uses PASID with Translated addresses, and 
  - those Translated addresses are never P2P addresses, then
  - those transactions will always be routed to the RC.  

And this applies even if there is no ACS or ACS doesn't support
PCI_ACS_RR and PCI_ACS_UF.

The black screen happens because ... ?

What can we include in the commit log to help people find this fix?  I
see these in the bugzilla:

  WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:251 pci_disable_pri+0x75/0x80
  WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:419 pci_disable_pasid+0x45/0x50

(These look like defects in pdev_pri_ats_enable(), so really just
distractions)

  kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874
  kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
  BUG: kernel NULL pointer dereference, address: 0000000000000058
  RIP: 0010:report_iommu_fault+0x11/0x90

I couldn't figure out the NULL pointer dereference.  I expected it to
be from a BUG() or similar in report_iommu_fault(), but I don't see
that.

> Furthermore, it always uses translated memory requests for PASID.
>
> > > 00:01.0 VGA compatible controller: Advanced Micro Devices, Inc.
> > >          [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] (rev ca)
> > >          (prog-if 00 [VGA controller])
> > >          DeviceName: ATI EG BROADWAY
> > >          Subsystem: Hewlett-Packard Company Device 8332

> > > The AMD iommu driver allocates a new domain (called v2 domain) for the
> > "v2 domain" needs to be something greppable -- an identifier,
> > filename, etc.
> 
> The code reads,
> 
> 2052         if (iommu_feature(iommu, FEATURE_GT) &&
> 2053             iommu_feature(iommu, FEATURE_PPR)) {
> 2054                 iommu->is_iommu_v2   = true;
> 
> So, how about
> 
> ..The AMD GPU has a private interface to its own AMD IOMMU, which could
> be detected by the FEATURE_GT && FEATURE_PPR features. The AMD iommu
> driver allocates a special domain for the GPU device ..

Where is this special domain allocated?  I think the above tests for
*IOMMU* features (I assume "GTSup: Guest translations supported" and
"PPRSup: Peripheral page request support" based on the AMD IOMMU
spec).  It doesn't test that this is a GPU.

This change doesn't feel safe for all possible devices that have a
PASID Capability because we don't know whether they *always* use
Translated addresses with PASID TLPs.

Bjorn

  reply	other threads:[~2023-02-01  0:14 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-14  7:34 [PATCH v3 1/1] PCI: Add translated request only flag for pci_enable_pasid() Lu Baolu
2023-01-16 15:42 ` Jason Gunthorpe
2023-01-27 11:30   ` Linux kernel regression tracking (Thorsten Leemhuis)
2023-01-27 17:30 ` Bjorn Helgaas
2023-01-28  7:52   ` Tian, Kevin
2023-01-29  8:42   ` Baolu Lu
2023-01-30 18:38     ` Bjorn Helgaas
2023-01-30 18:47       ` Jason Gunthorpe
2023-01-31 23:50         ` Bjorn Helgaas
2023-02-01  2:28           ` Jason Gunthorpe
2023-01-31 12:25       ` Baolu Lu
2023-02-01 16:58         ` Bjorn Helgaas
2023-02-02  3:08           ` Baolu Lu
2023-02-02 20:12             ` Bjorn Helgaas
2023-02-02 20:45               ` Jason Gunthorpe
2023-02-03 18:20                 ` Bjorn Helgaas
2023-02-03 18:52                   ` Jason Gunthorpe
2023-02-06  4:28                     ` Tian, Kevin
2023-01-31 12:56       ` Baolu Lu
2023-02-01  0:14         ` Bjorn Helgaas [this message]
2023-02-01  2:36           ` Jason Gunthorpe
2023-02-01 14:09             ` Jonathan Cameron
2023-02-01  5:18           ` Vasant Hegde
2023-02-01  5:51           ` Baolu Lu
2023-02-01  5:59           ` Baolu Lu
2023-02-01  6:31           ` Baolu Lu
2023-02-01 14:22             ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230201001419.GA1776086@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=christian.koenig@amd.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=jroedel@suse.de \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=matt.fagnani@bell.net \
    --cc=tony.zhu@intel.com \
    --cc=vasant.hegde@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox