public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Baolu Lu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Nicolin Chen <nicolinc@nvidia.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
Date: Tue, 5 Dec 2023 11:52:46 -0400	[thread overview]
Message-ID: <20231205155246.GR1489931@ziepe.ca> (raw)
In-Reply-To: <BN9PR11MB527687FD2CD3EB17F04B22458C85A@BN9PR11MB5276.namprd11.prod.outlook.com>

On Tue, Dec 05, 2023 at 03:23:05AM +0000, Tian, Kevin wrote:

> > I didn't said the PRI would fail, I said the ATS would fail with a
> > non-present.
> > 
> > It has to work this way or it is completely broken with respect to
> > existing races in the mm side. Agents must retry non-present ATS
> > answers until you get a present or a ATS failure.
> 
> My understanding of the sequence is like below:
> 
> <'D' for device, 'I' for IOMMU>
> 
>   (D) send a ATS translation request
>   (I) respond translation result
>   (D) If success then sends DMA to the target page
>       otherwise send a PRI request
>         (I) raise an IOMMU interrupt allowing sw to fix the translation
>         (I) generate a PRI response to device
>         (D) if success then jump to the first step to retry
>             otherwise abort the current request
> 
> If mm changes the mapping after a success PRI response, mmu notifier
> callback in iommu driver needs to wait for device tlb invalidation completion
> which the device will order properly with outstanding DMA requests using
> the old translation.
> 
> If you refer to the 'retry' after receiving a success PRI response, then yes.
> 
> but there is really no reason to retry upon a PRI response failure which
> indicates that the faulting address is not a valid one which OS would like
> to fix.

Right

> > Draining has to be ordered correctly with whatever the device is
> > doing. Drain needs to come after FLR, for instance. It needs to come
> > after a work queue reset, because drain doesn't make any sense unless
> > it is coupled with a DMA stop at the device.
> 
> Okay that makes sense. As Baolu and you already agreed let's separate
> this fix out of this series.
> 
> The minor interesting aspect is how to document this requirement
> clearly so drivers won't skip calling it when sva is enabled. 

All changes to translation inside kernel drivers should only be done
once the DMA is halted, otherwise things possibily become security
troubled.

We should document this clearly, it is already expected in common
cases like using the DMA API and when removing() drivers. It also
applies when the driver is manually changing a PASID.

The issue is not drain, it is that the HW is still doing DMA on the
PASID and the PASID may be assigned to a new process. This kernel
*must* prevent this directly and strongly.

If the device requires a drain to halt its DMA, then that is a device
specific sequence. Otherwise it should simply halt its DMA in whatever
device specific way it has.

> > Hacking a DMA stop by forcing a blocking translation is not logically
> > correct, with wrong ordering the device may see unexpected translation
> > failures which may trigger AERs or bad things..
> 
> where is such hack? though the current implementation of draining
> is not clean, it's put inside pasid-disable-sequence instead of forcing
> a blocking translation implicitly in iommu driver i.e. it's still the driver
> making decision for what translation to be used...

It is mis-understanding the purpose of drain.

In normal operating cases PRI just flows and the device will
eventually, naturally, reach a stable terminal case. We don't provide
any ordering guarentees across translation changes so PRI just follows
that design. If you change the translation with ongoing DMA then you
just don't know what order things will happen in.

The main purpose of drain is to keep the PRI protocol itself in sync
against events on the device side that cause it to forget about the
tags it has already issued. Eg a FLR should reset the tag record. If a
device then issues a new PRI with a tag that matches a tag that was
outstanding prior to FLR we can get a corruption.

So any drain sequence should start with the device halting new
PRIs. We flush all PRI tags from the system completely, and then the
device may resume issuing new PRIs.

When the device resets it's tag labels is a property of the device.

Notice none of this has anything to do with change of
translation. Change of translation, or flush of ATC, does not
invalidate the tags.

A secondary case is to help devices halt their DMA when they cannot do
this on their own.

Jason

  reply	other threads:[~2023-12-05 15:52 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
2023-11-15  3:02 ` [PATCH v7 01/12] iommu: Move iommu fault data to linux/iommu.h Lu Baolu
2023-12-04 10:52   ` Yi Liu
2023-11-15  3:02 ` [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting Lu Baolu
2023-12-01 15:42   ` Jason Gunthorpe
2023-12-04 10:54   ` Yi Liu
2023-12-05 11:48     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 03/12] iommu: Remove unrecoverable fault data Lu Baolu
2023-12-04 10:58   ` Yi Liu
2023-12-05 11:55     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 04/12] iommu: Cleanup iopf data structure definitions Lu Baolu
2023-12-04 11:03   ` Yi Liu
2023-11-15  3:02 ` [PATCH v7 05/12] iommu: Merge iopf_device_param into iommu_fault_param Lu Baolu
2023-12-04 12:32   ` Yi Liu
2023-12-05 12:01     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 06/12] iommu: Remove iommu_[un]register_device_fault_handler() Lu Baolu
2023-12-04 12:36   ` Yi Liu
2023-12-05 12:09     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault Lu Baolu
2023-12-01 19:09   ` Jason Gunthorpe
2023-12-04 12:40   ` Yi Liu
2023-11-15  3:02 ` [PATCH v7 08/12] iommu: Prepare for separating SVA and IOPF Lu Baolu
2023-12-05  7:10   ` Yi Liu
2023-11-15  3:02 ` [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic Lu Baolu
2023-12-01 19:14   ` Jason Gunthorpe
2023-12-05  7:13   ` Yi Liu
2023-12-05 12:13     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 10/12] iommu: Separate SVA and IOPF Lu Baolu
2023-11-15  3:02 ` [PATCH v7 11/12] iommu: Consolidate per-device fault data management Lu Baolu
2023-12-01 19:46   ` Jason Gunthorpe
2023-12-04  0:58     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev() Lu Baolu
2023-12-01 20:35   ` Jason Gunthorpe
2023-12-03  8:53     ` Baolu Lu
2023-12-03 14:14       ` Jason Gunthorpe
2023-12-04  1:32         ` Baolu Lu
2023-12-04  5:37           ` Tian, Kevin
2023-12-04 13:25             ` Jason Gunthorpe
2023-12-05  1:32               ` Tian, Kevin
2023-12-05  1:53                 ` Jason Gunthorpe
2023-12-05  3:23                   ` Tian, Kevin
2023-12-05 15:52                     ` Jason Gunthorpe [this message]
2023-12-04 13:12           ` Jason Gunthorpe
2023-12-04  3:46     ` Baolu Lu
2023-12-04 13:27       ` Jason Gunthorpe
2023-12-05  1:13         ` Baolu Lu
2023-11-24  6:30 ` [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space liulongfang
2023-11-24 12:01   ` Baolu Lu
2023-11-25  4:05     ` liulongfang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231205155246.GR1489931@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=baolu.lu@linux.intel.com \
    --cc=iommu@lists.linux.dev \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=jean-philippe@linaro.org \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    --cc=yan.y.zhao@intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox