virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 2/5] vfio: Require that devices support DMA cache coherence
       [not found] ` <2-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
@ 2022-04-05 19:10   ` Alex Williamson
       [not found]     ` <20220405192916.GT2120790@nvidia.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2022-04-05 19:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization, Will Deacon, Christoph Hellwig, linux-s390, kvm,
	linux-rdma, Joerg Roedel, iommu, Gerald Schaefer, David Woodhouse,
	linux-arm-msm, linux-arm-kernel, netdev, Cornelia Huck, Rob Clark,
	Suravee Suthikulpanit, Christian Benvenuti, Robin Murphy,
	Lu Baolu

On Tue,  5 Apr 2022 13:16:01 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> dev_is_dma_coherent() is the control to determine if IOMMU_CACHE can be
> supported.
> 
> IOMMU_CACHE means that normal DMAs do not require any additional coherency
> mechanism and is the basic uAPI that VFIO exposes to userspace. For
> instance VFIO applications like DPDK will not work if additional coherency
> operations are required.
> 
> Therefore check dev_is_dma_coherent() before allowing a device to join a
> domain. This will block device/platform/iommu combinations from using VFIO
> that do not support cache coherent DMA.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index a4555014bd1e72..2a3aa3e742d943 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -32,6 +32,7 @@
>  #include <linux/vfio.h>
>  #include <linux/wait.h>
>  #include <linux/sched/signal.h>
> +#include <linux/dma-map-ops.h>
>  #include "vfio.h"
>  
>  #define DRIVER_VERSION	"0.3"
> @@ -1348,6 +1349,11 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
>  	if (IS_ERR(device))
>  		return PTR_ERR(device);
>  
> +	if (group->type == VFIO_IOMMU && !dev_is_dma_coherent(device->dev)) {
> +		ret = -ENODEV;
> +		goto err_device_put;
> +	}
> +

Failing at the point where the user is trying to gain access to the
device seems a little late in the process and opaque, wouldn't we
rather have vfio bus drivers fail to probe such devices?  I'd expect
this to occur in the vfio_register_group_dev() path.  Thanks,

Alex

>  	if (!try_module_get(device->dev->driver->owner)) {
>  		ret = -ENODEV;
>  		goto err_device_put;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] iommu: Introduce the domain op enforce_cache_coherency()
       [not found] ` <3-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
@ 2022-04-05 19:50   ` Alex Williamson
       [not found]     ` <20220405225739.GW2120790@nvidia.com>
  2022-04-06  7:09   ` Tian, Kevin
  1 sibling, 1 reply; 18+ messages in thread
From: Alex Williamson @ 2022-04-05 19:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization, Will Deacon, Christoph Hellwig, linux-s390, kvm,
	linux-rdma, Joerg Roedel, iommu, Gerald Schaefer, David Woodhouse,
	linux-arm-msm, linux-arm-kernel, netdev, Cornelia Huck, Rob Clark,
	Suravee Suthikulpanit, Christian Benvenuti, Robin Murphy,
	Lu Baolu

On Tue,  5 Apr 2022 13:16:02 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and
> IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.
> 
> Currently only Intel and AMD IOMMUs are known to support this
> feature. They both implement it as an IOPTE bit, that when set, will cause
> PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
> the no-snoop bit was clear.
> 
> The new API is triggered by calling enforce_cache_coherency() before
> mapping any IOVA to the domain which globally switches on no-snoop
> blocking. This allows other implementations that might block no-snoop
> globally and outside the IOPTE - AMD also documents such an HW capability.
> 
> Leave AMD out of sync with Intel and have it block no-snoop even for
> in-kernel users. This can be trivially resolved in a follow up patch.
> 
> Only VFIO will call this new API.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/amd/iommu.c   |  7 +++++++
>  drivers/iommu/intel/iommu.c | 14 +++++++++++++-
>  include/linux/intel-iommu.h |  1 +
>  include/linux/iommu.h       |  4 ++++
>  4 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a1ada7bff44e61..e500b487eb3429 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2271,6 +2271,12 @@ static int amd_iommu_def_domain_type(struct device *dev)
>  	return 0;
>  }
>  
> +static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
> +{
> +	/* IOMMU_PTE_FC is always set */
> +	return true;
> +}
> +
>  const struct iommu_ops amd_iommu_ops = {
>  	.capable = amd_iommu_capable,
>  	.domain_alloc = amd_iommu_domain_alloc,
> @@ -2293,6 +2299,7 @@ const struct iommu_ops amd_iommu_ops = {
>  		.flush_iotlb_all = amd_iommu_flush_iotlb_all,
>  		.iotlb_sync	= amd_iommu_iotlb_sync,
>  		.free		= amd_iommu_domain_free,
> +		.enforce_cache_coherency = amd_iommu_enforce_cache_coherency,
>  	}
>  };
>  
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index df5c62ecf942b8..f08611a6cc4799 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4422,7 +4422,8 @@ static int intel_iommu_map(struct iommu_domain *domain,
>  		prot |= DMA_PTE_READ;
>  	if (iommu_prot & IOMMU_WRITE)
>  		prot |= DMA_PTE_WRITE;
> -	if ((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping)
> +	if (((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping) ||
> +	    dmar_domain->enforce_no_snoop)
>  		prot |= DMA_PTE_SNP;
>  
>  	max_addr = iova + size;
> @@ -4545,6 +4546,16 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
>  	return phys;
>  }
>  
> +static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
> +{
> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +
> +	if (!dmar_domain->iommu_snooping)
> +		return false;
> +	dmar_domain->enforce_no_snoop = true;
> +	return true;
> +}

Don't we have issues if we try to set DMA_PTE_SNP on DMARs that don't
support it, ie. reserved register bit set in pte faults?  It seems
really inconsistent here that I could make a domain that supports
iommu_snooping, set enforce_no_snoop = true, then add another DMAR to
the domain that may not support iommu_snooping, I'd get false on the
subsequent enforcement test, but the dmar_domain is still trying to use
DMA_PTE_SNP.

There's also a disconnect, maybe just in the naming or documentation,
but if I call enforce_cache_coherency for a domain, that seems like the
domain should retain those semantics regardless of how it's modified,
ie. "enforced".  For example, if I tried to perform the above operation,
I should get a failure attaching the device that brings in the less
capable DMAR because the domain has been set to enforce this feature.

If the API is that I need to re-enforce_cache_coherency on every
modification of the domain, shouldn't dmar_domain->enforce_no_snoop
also return to a default value on domain changes?

Maybe this should be something like set_no_snoop_squashing with the
above semantics, it needs to be re-applied whenever the domain:device
composition changes?  Thanks,

Alex

> +
>  static bool intel_iommu_capable(enum iommu_cap cap)
>  {
>  	if (cap == IOMMU_CAP_CACHE_COHERENCY)
> @@ -4898,6 +4909,7 @@ const struct iommu_ops intel_iommu_ops = {
>  		.iotlb_sync		= intel_iommu_tlb_sync,
>  		.iova_to_phys		= intel_iommu_iova_to_phys,
>  		.free			= intel_iommu_domain_free,
> +		.enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
>  	}
>  };
>  
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2f9891cb3d0014..1f930c0c225d94 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -540,6 +540,7 @@ struct dmar_domain {
>  	u8 has_iotlb_device: 1;
>  	u8 iommu_coherency: 1;		/* indicate coherency of iommu access */
>  	u8 iommu_snooping: 1;		/* indicate snooping control feature */
> +	u8 enforce_no_snoop : 1;        /* Create IOPTEs with snoop control */
>  
>  	struct list_head devices;	/* all devices' list */
>  	struct iova_domain iovad;	/* iova's that belong to this domain */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9208eca4b0d1ac..fe4f24c469c373 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -272,6 +272,9 @@ struct iommu_ops {
>   * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
>   *            queue
>   * @iova_to_phys: translate iova to physical address
> + * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
> + *                           including no-snoop TLPs on PCIe or other platform
> + *                           specific mechanisms.
>   * @enable_nesting: Enable nesting
>   * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
>   * @free: Release the domain after use.
> @@ -300,6 +303,7 @@ struct iommu_domain_ops {
>  	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
>  				    dma_addr_t iova);
>  
> +	bool (*enforce_cache_coherency)(struct iommu_domain *domain);
>  	int (*enable_nesting)(struct iommu_domain *domain);
>  	int (*set_pgtable_quirks)(struct iommu_domain *domain,
>  				  unsigned long quirks);

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 3/5] iommu: Introduce the domain op enforce_cache_coherency()
       [not found]     ` <20220405225739.GW2120790@nvidia.com>
@ 2022-04-05 23:31       ` Tian, Kevin
  2022-04-06  0:08       ` Tian, Kevin
  1 sibling, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2022-04-05 23:31 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Lu Baolu
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization@lists.linux-foundation.org, Will Deacon,
	Christoph Hellwig, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Joerg Roedel,
	iommu@lists.linux-foundation.org, Gerald Schaefer,
	David Woodhouse, linux-arm-msm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	Cornelia Huck, Rob Clark, Suravee Suthikulpanit, Robin Murphy,
	Christian Benvenuti

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, April 6, 2022 6:58 AM
> 
> On Tue, Apr 05, 2022 at 01:50:36PM -0600, Alex Williamson wrote:
> > >
> > > +static bool intel_iommu_enforce_cache_coherency(struct
> iommu_domain *domain)
> > > +{
> > > +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > > +
> > > +	if (!dmar_domain->iommu_snooping)
> > > +		return false;
> > > +	dmar_domain->enforce_no_snoop = true;
> > > +	return true;
> > > +}
> >
> > Don't we have issues if we try to set DMA_PTE_SNP on DMARs that don't
> > support it, ie. reserved register bit set in pte faults?
> 
> The way the Intel driver is setup that is not possible. Currently it
> does:
> 
>  static bool intel_iommu_capable(enum iommu_cap cap)
>  {
> 	if (cap == IOMMU_CAP_CACHE_COHERENCY)
> 		return domain_update_iommu_snooping(NULL);
> 
> Which is a global property unrelated to any device.
> 
> Thus either all devices and all domains support iommu_snooping, or
> none do.
> 
> It is unclear because for some reason the driver recalculates this
> almost constant value on every device attach..

The reason is simply because iommu capability is a global flag 😉

when the cap is removed by this series I don't think we need keep that
global recalculation thing.

Thanks
Kevin
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 3/5] iommu: Introduce the domain op enforce_cache_coherency()
       [not found]     ` <20220405225739.GW2120790@nvidia.com>
  2022-04-05 23:31       ` Tian, Kevin
@ 2022-04-06  0:08       ` Tian, Kevin
  1 sibling, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2022-04-06  0:08 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Lu Baolu
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization@lists.linux-foundation.org, Will Deacon,
	Christoph Hellwig, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Joerg Roedel,
	iommu@lists.linux-foundation.org, Gerald Schaefer,
	David Woodhouse, linux-arm-msm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	Cornelia Huck, Rob Clark, Suravee Suthikulpanit, Robin Murphy,
	Christian Benvenuti

> From: Tian, Kevin
> Sent: Wednesday, April 6, 2022 7:32 AM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, April 6, 2022 6:58 AM
> >
> > On Tue, Apr 05, 2022 at 01:50:36PM -0600, Alex Williamson wrote:
> > > >
> > > > +static bool intel_iommu_enforce_cache_coherency(struct
> > iommu_domain *domain)
> > > > +{
> > > > +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > > > +
> > > > +	if (!dmar_domain->iommu_snooping)
> > > > +		return false;
> > > > +	dmar_domain->enforce_no_snoop = true;
> > > > +	return true;
> > > > +}
> > >
> > > Don't we have issues if we try to set DMA_PTE_SNP on DMARs that don't
> > > support it, ie. reserved register bit set in pte faults?
> >
> > The way the Intel driver is setup that is not possible. Currently it
> > does:
> >
> >  static bool intel_iommu_capable(enum iommu_cap cap)
> >  {
> > 	if (cap == IOMMU_CAP_CACHE_COHERENCY)
> > 		return domain_update_iommu_snooping(NULL);
> >
> > Which is a global property unrelated to any device.
> >
> > Thus either all devices and all domains support iommu_snooping, or
> > none do.
> >
> > It is unclear because for some reason the driver recalculates this
> > almost constant value on every device attach..
> 
> The reason is simply because iommu capability is a global flag 😉

...and intel iommu driver supports hotplug-ed iommu. But in reality this
recalculation is almost a no-op because the only iommu that doesn't
support force snoop is for Intel GPU and built-in in the platform.

> 
> when the cap is removed by this series I don't think we need keep that
> global recalculation thing.
> 
> Thanks
> Kevin
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found] ` <1-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
@ 2022-04-06  5:30   ` Christoph Hellwig
       [not found]     ` <20220406120730.GA2120790@nvidia.com>
  2022-04-06 13:56   ` Robin Murphy
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-04-06  5:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization, Will Deacon, Christoph Hellwig, linux-s390, kvm,
	linux-rdma, Joerg Roedel, Rob Clark, Gerald Schaefer,
	David Woodhouse, linux-arm-msm, linux-arm-kernel, netdev,
	Cornelia Huck, iommu, Suravee Suthikulpanit, Christian Benvenuti,
	Robin Murphy, Lu Baolu

On Tue, Apr 05, 2022 at 01:16:00PM -0300, Jason Gunthorpe wrote:
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 760b254ba42d6b..24d118198ac756 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -42,6 +42,7 @@
>  #include <linux/list.h>
>  #include <linux/pci.h>
>  #include <rdma/ib_verbs.h>
> +#include <linux/dma-map-ops.h>
>  
>  #include "usnic_log.h"
>  #include "usnic_uiom.h"
> @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
>  	struct usnic_uiom_dev *uiom_dev;
>  	int err;
>  
> +	if (!dev_is_dma_coherent(dev)) {

Which part of the comment at the top of dma-map-ops.h is not clear
enough to you?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 0/5] Make the iommu driver no-snoop block feature consistent
       [not found] <0-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
       [not found] ` <2-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
@ 2022-04-06  6:52 ` Tian, Kevin
       [not found] ` <3-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
       [not found] ` <1-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
  3 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2022-04-06  6:52 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Lu Baolu, Christian Benvenuti,
	Cornelia Huck, David Woodhouse, Gerald Schaefer,
	iommu@lists.linux-foundation.org, Jason Wang, Joerg Roedel,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-arm-msm@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-s390@vger.kernel.org, Matthew Rosato, Michael S. Tsirkin,
	Nelson Escobar, netdev@vger.kernel.org, Rob Clark, Robin Murphy,
	Suravee Suthikulpanit, virtualization@lists.linux-foundation.org,
	Will Deacon
  Cc: Christoph Hellwig

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, April 6, 2022 12:16 AM
> 
> PCIe defines a 'no-snoop' bit in each the TLP which is usually implemented
> by a platform as bypassing elements in the DMA coherent CPU cache
> hierarchy. A driver can command a device to set this bit on some of its
> transactions as a micro-optimization.
> 
> However, the driver is now responsible to synchronize the CPU cache with
> the DMA that bypassed it. On x86 this is done through the wbinvd
> instruction, and the i915 GPU driver is the only Linux DMA driver that
> calls it.

More accurately x86 supports both unprivileged clflush instructions
to invalidate one cacheline and a privileged wbinvd instruction to
invalidate the entire cache. Replacing 'this is done' with 'this may
be done' is clearer.

> 
> The problem comes that KVM on x86 will normally disable the wbinvd
> instruction in the guest and render it a NOP. As the driver running in the
> guest is not aware the wbinvd doesn't work it may still cause the device
> to set the no-snoop bit and the platform will bypass the CPU cache.
> Without a working wbinvd there is no way to re-synchronize the CPU cache
> and the driver in the VM has data corruption.
> 
> Thus, we see a general direction on x86 that the IOMMU HW is able to block
> the no-snoop bit in the TLP. This NOP's the optimization and allows KVM to
> to NOP the wbinvd without causing any data corruption.
> 
> This control for Intel IOMMU was exposed by using IOMMU_CACHE and
> IOMMU_CAP_CACHE_COHERENCY, however these two values now have
> multiple
> meanings and usages beyond blocking no-snoop and the whole thing has
> become confused.

Also point out your finding about AMD IOMMU?

> 
> Change it so that:
>  - IOMMU_CACHE is only about the DMA coherence of normal DMAs from a
>    device. It is used by the DMA API and set when the DMA API will not be
>    doing manual cache coherency operations.
> 
>  - dev_is_dma_coherent() indicates if IOMMU_CACHE can be used with the
>    device
> 
>  - The new optional domain op enforce_cache_coherency() will cause the
>    entire domain to block no-snoop requests - ie there is no way for any
>    device attached to the domain to opt out of the IOMMU_CACHE behavior.
> 
> An iommu driver should implement enforce_cache_coherency() so that by
> default domains allow the no-snoop optimization. This leaves it available
> to kernel drivers like i915. VFIO will call enforce_cache_coherency()
> before establishing any mappings and the domain should then permanently
> block no-snoop.
> 
> If enforce_cache_coherency() fails VFIO will communicate back through to
> KVM into the arch code via kvm_arch_register_noncoherent_dma()
> (only implemented by x86) which triggers a working wbinvd to be made
> available to the VM.
> 
> While other arches are certainly welcome to implement
> enforce_cache_coherency(), it is not clear there is any benefit in doing
> so.
> 
> After this series there are only two calls left to iommu_capable() with a
> bus argument which should help Robin's work here.
> 
> This is on github:
> https://github.com/jgunthorpe/linux/commits/intel_no_snoop
> 
> Cc: "Tian, Kevin" <kevin.tian@intel.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason Gunthorpe (5):
>   iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with
>     dev_is_dma_coherent()
>   vfio: Require that devices support DMA cache coherence
>   iommu: Introduce the domain op enforce_cache_coherency()
>   vfio: Move the Intel no-snoop control off of IOMMU_CACHE
>   iommu: Delete IOMMU_CAP_CACHE_COHERENCY
> 
>  drivers/infiniband/hw/usnic/usnic_uiom.c    | 16 +++++------
>  drivers/iommu/amd/iommu.c                   |  9 +++++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  2 --
>  drivers/iommu/arm/arm-smmu/arm-smmu.c       |  6 -----
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c     |  6 -----
>  drivers/iommu/fsl_pamu_domain.c             |  6 -----
>  drivers/iommu/intel/iommu.c                 | 15 ++++++++---
>  drivers/iommu/s390-iommu.c                  |  2 --
>  drivers/vfio/vfio.c                         |  6 +++++
>  drivers/vfio/vfio_iommu_type1.c             | 30 +++++++++++++--------
>  drivers/vhost/vdpa.c                        |  3 ++-
>  include/linux/intel-iommu.h                 |  1 +
>  include/linux/iommu.h                       |  6 +++--
>  13 files changed, 58 insertions(+), 50 deletions(-)
> 
> 
> base-commit: 3123109284176b1532874591f7c81f3837bbdc17
> --
> 2.35.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 2/5] vfio: Require that devices support DMA cache coherence
       [not found]     ` <20220405192916.GT2120790@nvidia.com>
@ 2022-04-06  7:02       ` Tian, Kevin
  0 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2022-04-06  7:02 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization@lists.linux-foundation.org, Will Deacon,
	Christoph Hellwig, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Joerg Roedel,
	iommu@lists.linux-foundation.org, Gerald Schaefer,
	David Woodhouse, linux-arm-msm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	Cornelia Huck, Rob Clark, Suravee Suthikulpanit,
	Christian Benvenuti, Robin Murphy, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, April 6, 2022 3:29 AM
> 
> On Tue, Apr 05, 2022 at 01:10:44PM -0600, Alex Williamson wrote:
> > On Tue,  5 Apr 2022 13:16:01 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > dev_is_dma_coherent() is the control to determine if IOMMU_CACHE can
> be
> > > supported.
> > >
> > > IOMMU_CACHE means that normal DMAs do not require any additional
> coherency
> > > mechanism and is the basic uAPI that VFIO exposes to userspace. For
> > > instance VFIO applications like DPDK will not work if additional coherency
> > > operations are required.
> > >
> > > Therefore check dev_is_dma_coherent() before allowing a device to join
> a
> > > domain. This will block device/platform/iommu combinations from using
> VFIO
> > > that do not support cache coherent DMA.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > >  drivers/vfio/vfio.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index a4555014bd1e72..2a3aa3e742d943 100644
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -32,6 +32,7 @@
> > >  #include <linux/vfio.h>
> > >  #include <linux/wait.h>
> > >  #include <linux/sched/signal.h>
> > > +#include <linux/dma-map-ops.h>
> > >  #include "vfio.h"
> > >
> > >  #define DRIVER_VERSION	"0.3"
> > > @@ -1348,6 +1349,11 @@ static int vfio_group_get_device_fd(struct
> vfio_group *group, char *buf)
> > >  	if (IS_ERR(device))
> > >  		return PTR_ERR(device);
> > >
> > > +	if (group->type == VFIO_IOMMU && !dev_is_dma_coherent(device-
> >dev)) {
> > > +		ret = -ENODEV;
> > > +		goto err_device_put;
> > > +	}
> > > +
> >
> > Failing at the point where the user is trying to gain access to the
> > device seems a little late in the process and opaque, wouldn't we
> > rather have vfio bus drivers fail to probe such devices?  I'd expect
> > this to occur in the vfio_register_group_dev() path.  Thanks,
> 
> Yes, that is a good point.
> 
> So like this:
> 
>  int vfio_register_group_dev(struct vfio_device *device)
>  {
> +       if (!dev_is_dma_coherent(device->dev))
> +               return -EINVAL;
> +
>         return __vfio_register_dev(device,
>                 vfio_group_find_or_alloc(device->dev));
>  }
> 
> I fixed it up.
> 

if that is the case should it also apply to usnic and vdpa in the first
patch (i.e. fail the probe)?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 3/5] iommu: Introduce the domain op enforce_cache_coherency()
       [not found] ` <3-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
  2022-04-05 19:50   ` [PATCH 3/5] iommu: Introduce the domain op enforce_cache_coherency() Alex Williamson
@ 2022-04-06  7:09   ` Tian, Kevin
  1 sibling, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2022-04-06  7:09 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Lu Baolu, Christian Benvenuti,
	Cornelia Huck, David Woodhouse, Gerald Schaefer,
	iommu@lists.linux-foundation.org, Jason Wang, Joerg Roedel,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-arm-msm@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-s390@vger.kernel.org, Matthew Rosato, Michael S. Tsirkin,
	Nelson Escobar, netdev@vger.kernel.org, Rob Clark, Robin Murphy,
	Suravee Suthikulpanit, virtualization@lists.linux-foundation.org,
	Will Deacon
  Cc: Christoph Hellwig

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, April 6, 2022 12:16 AM
> 
> This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY
> and
> IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU.
> 
> Currently only Intel and AMD IOMMUs are known to support this
> feature. They both implement it as an IOPTE bit, that when set, will cause
> PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though
> the no-snoop bit was clear.
> 
> The new API is triggered by calling enforce_cache_coherency() before
> mapping any IOVA to the domain which globally switches on no-snoop
> blocking. This allows other implementations that might block no-snoop
> globally and outside the IOPTE - AMD also documents such an HW capability.
> 
> Leave AMD out of sync with Intel and have it block no-snoop even for
> in-kernel users. This can be trivially resolved in a follow up patch.
> 
> Only VFIO will call this new API.

Is it too restrictive? In theory vdpa may also implement a contract with
KVM and then wants to call this new API too?

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found]     ` <20220406120730.GA2120790@nvidia.com>
@ 2022-04-06 13:51       ` Christoph Hellwig
       [not found]         ` <20220406141446.GE2120790@nvidia.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-04-06 13:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization, Will Deacon, Christoph Hellwig, linux-s390, kvm,
	linux-rdma, Joerg Roedel, Rob Clark, Gerald Schaefer,
	David Woodhouse, linux-arm-msm, linux-arm-kernel, netdev,
	Cornelia Huck, iommu, Suravee Suthikulpanit, Christian Benvenuti,
	Robin Murphy, Lu Baolu

On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote:
> Didn't see it
> 
> I'll move dev_is_dma_coherent to device.h along with
> device_iommu_mapped() and others then

No.  It it is internal for a reason.  It also doesn't actually work
outside of the dma core.  E.g. for non-swiotlb ARM configs it will
not actually work.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found] ` <1-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
  2022-04-06  5:30   ` [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent() Christoph Hellwig
@ 2022-04-06 13:56   ` Robin Murphy
       [not found]     ` <20220406142432.GF2120790@nvidia.com>
  1 sibling, 1 reply; 18+ messages in thread
From: Robin Murphy @ 2022-04-06 13:56 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Lu Baolu, Christian Benvenuti,
	Cornelia Huck, David Woodhouse, Gerald Schaefer, iommu,
	Jason Wang, Joerg Roedel, kvm, linux-arm-kernel, linux-arm-msm,
	linux-rdma, linux-s390, Matthew Rosato, Michael S. Tsirkin,
	Nelson Escobar, netdev, Rob Clark, Suravee Suthikulpanit,
	virtualization, Will Deacon
  Cc: Christoph Hellwig

On 2022-04-05 17:16, Jason Gunthorpe wrote:
> vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct
> way to do this is via dev_is_dma_coherent()

Not necessarily...

Disregarding the complete disaster of PCIe No Snoop on Arm-Based 
systems, there's the more interesting effectively-opposite scenario 
where an SMMU bridges non-coherent devices to a coherent interconnect. 
It's not something we take advantage of yet in Linux, and it can only be 
properly described in ACPI, but there do exist situations where 
IOMMU_CACHE is capable of making the device's traffic snoop, but 
dev_is_dma_coherent() - and device_get_dma_attr() for external users - 
would still say non-coherent because they can't assume that the SMMU is 
enabled and programmed in just the right way.

I've also not thought too much about how things might look with S2FWB 
thrown into the mix in future...

Robin.

> like the DMA API does. If
> IOMMU_CACHE is not supported then these drivers won't work as they don't
> call any coherency-restoring routines around their DMAs.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++++++---------
>   drivers/vhost/vdpa.c                     |  3 ++-
>   2 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 760b254ba42d6b..24d118198ac756 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -42,6 +42,7 @@
>   #include <linux/list.h>
>   #include <linux/pci.h>
>   #include <rdma/ib_verbs.h>
> +#include <linux/dma-map-ops.h>
>   
>   #include "usnic_log.h"
>   #include "usnic_uiom.h"
> @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
>   	struct usnic_uiom_dev *uiom_dev;
>   	int err;
>   
> +	if (!dev_is_dma_coherent(dev)) {
> +		usnic_err("IOMMU of %s does not support cache coherency\n",
> +				dev_name(dev));
> +		return -EINVAL;
> +	}
> +
>   	uiom_dev = kzalloc(sizeof(*uiom_dev), GFP_ATOMIC);
>   	if (!uiom_dev)
>   		return -ENOMEM;
> @@ -483,13 +490,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
>   	if (err)
>   		goto out_free_dev;
>   
> -	if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY)) {
> -		usnic_err("IOMMU of %s does not support cache coherency\n",
> -				dev_name(dev));
> -		err = -EINVAL;
> -		goto out_detach_device;
> -	}
> -
>   	spin_lock(&pd->lock);
>   	list_add_tail(&uiom_dev->link, &pd->devs);
>   	pd->dev_cnt++;
> @@ -497,8 +497,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev)
>   
>   	return 0;
>   
> -out_detach_device:
> -	iommu_detach_device(pd->domain, dev);
>   out_free_dev:
>   	kfree(uiom_dev);
>   	return err;
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 4c2f0bd062856a..05ea5800febc37 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -22,6 +22,7 @@
>   #include <linux/vdpa.h>
>   #include <linux/nospec.h>
>   #include <linux/vhost.h>
> +#include <linux/dma-map-ops.h>
>   
>   #include "vhost.h"
>   
> @@ -929,7 +930,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
>   	if (!bus)
>   		return -EFAULT;
>   
> -	if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY))
> +	if (!dev_is_dma_coherent(dma_dev))
>   		return -ENOTSUPP;
>   
>   	v->domain = iommu_domain_alloc(bus);
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found]         ` <20220406141446.GE2120790@nvidia.com>
@ 2022-04-06 15:47           ` Christoph Hellwig
  2022-04-06 15:48           ` Robin Murphy
  1 sibling, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2022-04-06 15:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization, Will Deacon, Christoph Hellwig, linux-s390, kvm,
	linux-rdma, Joerg Roedel, Rob Clark, Gerald Schaefer,
	David Woodhouse, linux-arm-msm, linux-arm-kernel, netdev,
	Cornelia Huck, iommu, Suravee Suthikulpanit, Christian Benvenuti,
	Robin Murphy, Lu Baolu

On Wed, Apr 06, 2022 at 11:14:46AM -0300, Jason Gunthorpe wrote:
> Really? It is the only condition that dma_info_to_prot() tests to
> decide of IOMMU_CACHE is used or not, so you are saying that there is
> a condition where a device can be attached to an iommu_domain and
> dev_is_dma_coherent() returns the wrong information? How does
> dma-iommu.c safely use it then?

arm does not use dma-iommu.c
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found]         ` <20220406141446.GE2120790@nvidia.com>
  2022-04-06 15:47           ` Christoph Hellwig
@ 2022-04-06 15:48           ` Robin Murphy
  1 sibling, 0 replies; 18+ messages in thread
From: Robin Murphy @ 2022-04-06 15:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization, Will Deacon, linux-s390, kvm, linux-rdma,
	Joerg Roedel, Rob Clark, Gerald Schaefer, linux-arm-msm,
	linux-arm-kernel, netdev, Cornelia Huck, iommu,
	Suravee Suthikulpanit, Christian Benvenuti, David Woodhouse,
	Lu Baolu

On 2022-04-06 15:14, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 03:51:50PM +0200, Christoph Hellwig wrote:
>> On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote:
>>> Didn't see it
>>>
>>> I'll move dev_is_dma_coherent to device.h along with
>>> device_iommu_mapped() and others then
>>
>> No.  It it is internal for a reason.  It also doesn't actually work
>> outside of the dma core.  E.g. for non-swiotlb ARM configs it will
>> not actually work.
> 
> Really? It is the only condition that dma_info_to_prot() tests to
> decide of IOMMU_CACHE is used or not, so you are saying that there is
> a condition where a device can be attached to an iommu_domain and
> dev_is_dma_coherent() returns the wrong information? How does
> dma-iommu.c safely use it then?

The common iommu-dma layer happens to be part of the subset of the DMA 
core which *does* play the dev->dma_coherent game. 32-bit Arm has its 
own IOMMU DMA ops which do not. I don't know if the set of PowerPCs with 
CONFIG_NOT_COHERENT_CACHE intersects the set of PowerPCs that can do 
VFIO, but that would be another example if so.

> In any case I still need to do something about the places checking
> IOMMU_CAP_CACHE_COHERENCY and thinking that means IOMMU_CACHE
> works. Any idea?

Can we improve the IOMMU drivers such that that *can* be the case 
(within a reasonable margin of error)? That's kind of where I was hoping 
to head with device_iommu_capable(), e.g. [1].

Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/53390e9505b3791adedc0974e251e5c7360e402e
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found]       ` <20220406151823.GG2120790@nvidia.com>
@ 2022-04-06 15:50         ` Christoph Hellwig
       [not found]           ` <20220406160623.GI2120790@nvidia.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-04-06 15:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization, Will Deacon, Christoph Hellwig, linux-s390, kvm,
	linux-rdma, Joerg Roedel, Rob Clark, Gerald Schaefer,
	David Woodhouse, linux-arm-msm, linux-arm-kernel, netdev,
	Cornelia Huck, iommu, Suravee Suthikulpanit, Christian Benvenuti,
	Robin Murphy, Lu Baolu

On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > Oh, I didn't know about device_get_dma_attr()..

Which is completely broken for any non-OF, non-ACPI plaform.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found]           ` <20220406160623.GI2120790@nvidia.com>
@ 2022-04-06 16:10             ` Christoph Hellwig
       [not found]               ` <20220406171729.GJ2120790@nvidia.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-04-06 16:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization, Will Deacon, Christoph Hellwig, linux-s390, kvm,
	linux-rdma, Joerg Roedel, Rob Clark, Gerald Schaefer,
	linux-arm-msm, Robin Murphy, linux-arm-kernel, netdev,
	Cornelia Huck, iommu, Suravee Suthikulpanit, Christian Benvenuti,
	David Woodhouse, Lu Baolu

On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > Oh, I didn't know about device_get_dma_attr()..
> > 
> > Which is completely broken for any non-OF, non-ACPI plaform.
> 
> I saw that, but I spent some time searching and could not find an
> iommu driver that would load independently of OF or ACPI. ie no IOMMU
> platform drivers are created by board files. Things like Intel/AMD
> discover only from ACPI, etc.

s390?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found]               ` <20220406171729.GJ2120790@nvidia.com>
@ 2022-04-07  7:18                 ` Tian, Kevin
       [not found]                   ` <20220407135946.GM2120790@nvidia.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Tian, Kevin @ 2022-04-07  7:18 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization@lists.linux-foundation.org, Will Deacon,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	linux-rdma@vger.kernel.org, Joerg Roedel, Rob Clark,
	Gerald Schaefer, David Woodhouse, linux-arm-msm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	Cornelia Huck, iommu@lists.linux-foundation.org,
	Suravee Suthikulpanit, Christian Benvenuti, Robin Murphy,
	Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, April 7, 2022 1:17 AM
> 
> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
> > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
> > > > > > Oh, I didn't know about device_get_dma_attr()..
> > > >
> > > > Which is completely broken for any non-OF, non-ACPI plaform.
> > >
> > > I saw that, but I spent some time searching and could not find an
> > > iommu driver that would load independently of OF or ACPI. ie no IOMMU
> > > platform drivers are created by board files. Things like Intel/AMD
> > > discover only from ACPI, etc.

Intel discovers IOMMUs (and optionally ACPI namespace devices) from
ACPI, but there is no ACPI description for PCI devices i.e. the current
logic of device_get_dma_attr() cannot be used on PCI devices. 

> >
> > s390?
> 
> Ah, I missed looking in s390, hyperv and virtio..
> 
> hyperv is not creating iommu_domains, just IRQ remapping
> 
> virtio is using OF
> 
> And s390 indeed doesn't obviously have OF or ACPI parts..
> 
> This seems like it would be consistent with other things:
> 
> enum dev_dma_attr device_get_dma_attr(struct device *dev)
> {
> 	const struct fwnode_handle *fwnode = dev_fwnode(dev);
> 	struct acpi_device *adev = to_acpi_device_node(fwnode);
> 
> 	if (is_of_node(fwnode)) {
> 		if (of_dma_is_coherent(to_of_node(fwnode)))
> 			return DEV_DMA_COHERENT;
> 		return DEV_DMA_NON_COHERENT;
> 	} else if (adev) {
> 		return acpi_get_dma_attr(adev);
> 	}
> 
> 	/* Platform is always DMA coherent */
> 	if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) &&
> 	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) &&
> 	    !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) &&
> 	    device_iommu_mapped(dev))
> 		return DEV_DMA_COHERENT;
> 	return DEV_DMA_NOT_SUPPORTED;
> }
> EXPORT_SYMBOL_GPL(device_get_dma_attr);
> 
> ie s390 has no of or acpi but the entire platform is known DMA
> coherent at config time so allow it. Not sure we need the
> device_iommu_mapped() or not.

Probably not. If adding an iommu may change the coherency it would
come from specific IOPTE attributes for a device. The fact whether the
device is mapped by iommu doesn't tell that information.

> 
> We could alternatively use existing device_get_dma_attr() as a default
> with an iommu wrapper and push the exception down through the iommu
> driver and s390 can override it.
> 

if going this way probably device_get_dma_attr() should be renamed to
device_fwnode_get_dma_attr() instead to make it clearer?

Thanks
Kevin
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found]                   ` <20220407135946.GM2120790@nvidia.com>
@ 2022-04-07 15:17                     ` Robin Murphy
  2022-04-07 15:31                       ` Christoph Hellwig
       [not found]                       ` <20220407152331.GN2120790@nvidia.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Robin Murphy @ 2022-04-07 15:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization@lists.linux-foundation.org, Will Deacon,
	Christoph Hellwig, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Joerg Roedel,
	Rob Clark, Gerald Schaefer, linux-arm-msm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	Cornelia Huck, iommu@lists.linux-foundation.org,
	Suravee Suthikulpanit, Christian Benvenuti, David Woodhouse,
	Lu Baolu

On 2022-04-07 14:59, Jason Gunthorpe wrote:
> On Thu, Apr 07, 2022 at 07:18:48AM +0000, Tian, Kevin wrote:
>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>> Sent: Thursday, April 7, 2022 1:17 AM
>>>
>>> On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote:
>>>> On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote:
>>>>> On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote:
>>>>>> On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote:
>>>>>>>> Oh, I didn't know about device_get_dma_attr()..
>>>>>>
>>>>>> Which is completely broken for any non-OF, non-ACPI plaform.
>>>>>
>>>>> I saw that, but I spent some time searching and could not find an
>>>>> iommu driver that would load independently of OF or ACPI. ie no IOMMU
>>>>> platform drivers are created by board files. Things like Intel/AMD
>>>>> discover only from ACPI, etc.
>>
>> Intel discovers IOMMUs (and optionally ACPI namespace devices) from
>> ACPI, but there is no ACPI description for PCI devices i.e. the current
>> logic of device_get_dma_attr() cannot be used on PCI devices.
> 
> Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or
> DEV_DMA_NOT_SUPPORTED?

I think it _should_ return DEV_DMA_COHERENT on x86/IA-64 (unless a _CCA 
method was actually present to say otherwise), based on 
acpi_init_coherency(), but I only know for sure what happens on arm64.

> I think I should give up on this and just redefine the existing iommu
> cap flag to IOMMU_CAP_CACHE_SUPPORTED or something.

TBH I don't see any issue with current name, but I'd certainly be happy 
to nail down a specific definition for it, along the lines of "this 
means that IOMMU_CACHE mappings are generally coherent". That works for 
things like Arm's S2FWB making it OK to assign an otherwise-non-coherent 
device without extra hassle.

For the specific case of overriding PCIe No Snoop (which is more 
problematic from an Arm SMMU PoV) when assigning to a VM, would that not 
be easier solved by just having vfio-pci clear the "Enable No Snoop" 
control bit in the endpoint's PCIe capability?

>>> We could alternatively use existing device_get_dma_attr() as a default
>>> with an iommu wrapper and push the exception down through the iommu
>>> driver and s390 can override it.
>>>
>>
>> if going this way probably device_get_dma_attr() should be renamed to
>> device_fwnode_get_dma_attr() instead to make it clearer?
> 
> I'm looking at the few users:
> 
> drivers/ata/ahci_ceva.c
> drivers/ata/ahci_qoriq.c
>   - These are ARM only drivers. They are trying to copy the dma-coherent
>     property from its DT/ACPI definition to internal register settings
>     which look like they tune how the AXI bus transactions are created.
> 
>     I'm guessing the SATA IP block's AXI interface can be configured to
>     generate coherent or non-coherent requests and it has to be set
>     in a way that is consistent with the SOC architecture and match
>     what the DMA API expects the device will do.
> 
> drivers/crypto/ccp/sp-platform.c
>   - Only used on ARM64 and also programs a HW register similar to the
>     sata drivers. Refuses to work if the FW property is not present.
> 
> drivers/net/ethernet/amd/xgbe/xgbe-platform.c
>   - Seems to be configuring another ARM AXI block
> 
> drivers/gpu/drm/panfrost/panfrost_drv.c
>   - Robin's commit comment here is good, and one of the things this
>     controls is if the coherent_walk is set for the io-pgtable-arm.c
>     code which avoids DMA API calls
> 
> drivers/gpu/drm/tegra/uapi.c
>   - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea.
> 
> My take is that the drivers using this API are doing it to make sure
> their HW blocks are setup in a way that is consistent with the DMA API
> they are also using, and run in constrained embedded-style
> environments that know the firmware support is present.
> 
> So in the end it does not seem suitable right now for linking to
> IOMMU_CACHE..

That seems a pretty good summary - I think they're basically all 
"firmware told Linux I'm coherent so I'd better act coherent" cases, but 
that still doesn't necessarily mean that they're *forced* to respect 
that. One of the things on my to-do list is to try adding a 
DMA_ATTR_NO_SNOOP that can force DMA cache maintenance for coherent 
devices, primarily to hook up in Panfrost (where there is a bit of a 
performance to claw back on the coherent AmLogic SoCs by leaving certain 
buffers non-cacheable).

Cheers,
Robin.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
  2022-04-07 15:17                     ` Robin Murphy
@ 2022-04-07 15:31                       ` Christoph Hellwig
       [not found]                       ` <20220407152331.GN2120790@nvidia.com>
  1 sibling, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2022-04-07 15:31 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization@lists.linux-foundation.org, Will Deacon,
	Christoph Hellwig, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Joerg Roedel,
	Rob Clark, Jason Gunthorpe, Gerald Schaefer,
	linux-arm-msm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	Cornelia Huck, iommu@lists.linux-foundation.org,
	Suravee Suthikulpanit, Christian Benvenuti, David Woodhouse,
	Lu Baolu

On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
>> My take is that the drivers using this API are doing it to make sure
>> their HW blocks are setup in a way that is consistent with the DMA API
>> they are also using, and run in constrained embedded-style
>> environments that know the firmware support is present.
>>
>> So in the end it does not seem suitable right now for linking to
>> IOMMU_CACHE..
>
> That seems a pretty good summary - I think they're basically all "firmware 
> told Linux I'm coherent so I'd better act coherent" cases, but that still 
> doesn't necessarily mean that they're *forced* to respect that.

Yes. And the interface is horribly misnamed for that.  I'll see what
I can do to clean this up as I've noticed various other not very
nice things in that area.

> One of the 
> things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force 
> DMA cache maintenance for coherent devices, primarily to hook up in 
> Panfrost (where there is a bit of a performance to claw back on the 
> coherent AmLogic SoCs by leaving certain buffers non-cacheable).

This has been an explicit request from the amdgpu folks and thus been
on my TODO list for quite a while as well.  Note that I don't think it
should be a flag to dma_alloc_attrs, but rather for dma_alloc_pages
as the drivers that want non-snoop generally also want to actually
be able to deal with pages.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
       [not found]                       ` <20220407152331.GN2120790@nvidia.com>
@ 2022-04-07 22:37                         ` Alex Williamson
  0 siblings, 0 replies; 18+ messages in thread
From: Alex Williamson @ 2022-04-07 22:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nelson Escobar, Matthew Rosato, Michael S. Tsirkin,
	virtualization@lists.linux-foundation.org, Will Deacon,
	Christoph Hellwig, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org, Joerg Roedel,
	iommu@lists.linux-foundation.org, Gerald Schaefer,
	linux-arm-msm@vger.kernel.org, Robin Murphy,
	linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	Cornelia Huck, Rob Clark, Suravee Suthikulpanit,
	Christian Benvenuti, David Woodhouse, Lu Baolu

On Thu, 7 Apr 2022 12:23:31 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote:
> 
> > For the specific case of overriding PCIe No Snoop (which is more problematic
> > from an Arm SMMU PoV) when assigning to a VM, would that not be easier
> > solved by just having vfio-pci clear the "Enable No Snoop" control bit in
> > the endpoint's PCIe capability?  
> 
> Ideally.
> 
> That was rediscussed recently, apparently there are non-compliant
> devices and drivers that just ignore the bit. 
> 
> Presumably this is why x86 had to move to an IOMMU enforced feature..

I considered this option when implementing the current solution, but
ultimately I didn't have confidence in being able to prevent drivers
from using device specific means to effect the change anyway.  GPUs
especially have various back channels to config space.  Thanks,

Alex

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-04-07 22:37 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <0-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
     [not found] ` <2-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
2022-04-05 19:10   ` [PATCH 2/5] vfio: Require that devices support DMA cache coherence Alex Williamson
     [not found]     ` <20220405192916.GT2120790@nvidia.com>
2022-04-06  7:02       ` Tian, Kevin
2022-04-06  6:52 ` [PATCH 0/5] Make the iommu driver no-snoop block feature consistent Tian, Kevin
     [not found] ` <3-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
2022-04-05 19:50   ` [PATCH 3/5] iommu: Introduce the domain op enforce_cache_coherency() Alex Williamson
     [not found]     ` <20220405225739.GW2120790@nvidia.com>
2022-04-05 23:31       ` Tian, Kevin
2022-04-06  0:08       ` Tian, Kevin
2022-04-06  7:09   ` Tian, Kevin
     [not found] ` <1-v1-ef02c60ddb76+12ca2-intel_no_snoop_jgg@nvidia.com>
2022-04-06  5:30   ` [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent() Christoph Hellwig
     [not found]     ` <20220406120730.GA2120790@nvidia.com>
2022-04-06 13:51       ` Christoph Hellwig
     [not found]         ` <20220406141446.GE2120790@nvidia.com>
2022-04-06 15:47           ` Christoph Hellwig
2022-04-06 15:48           ` Robin Murphy
2022-04-06 13:56   ` Robin Murphy
     [not found]     ` <20220406142432.GF2120790@nvidia.com>
     [not found]       ` <20220406151823.GG2120790@nvidia.com>
2022-04-06 15:50         ` Christoph Hellwig
     [not found]           ` <20220406160623.GI2120790@nvidia.com>
2022-04-06 16:10             ` Christoph Hellwig
     [not found]               ` <20220406171729.GJ2120790@nvidia.com>
2022-04-07  7:18                 ` Tian, Kevin
     [not found]                   ` <20220407135946.GM2120790@nvidia.com>
2022-04-07 15:17                     ` Robin Murphy
2022-04-07 15:31                       ` Christoph Hellwig
     [not found]                       ` <20220407152331.GN2120790@nvidia.com>
2022-04-07 22:37                         ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).