public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Matthew Rosato <mjrosato@linux.ibm.com>,
	Pierre Morel <pmorel@linux.ibm.com>,
	iommu@lists.linux.dev, linux-s390@vger.kernel.org,
	borntraeger@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
	gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com,
	svens@linux.ibm.com, joro@8bytes.org, will@kernel.org,
	robin.murphy@arm.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/3] iommu/s390: Fix duplicate domain attachments
Date: Mon, 26 Sep 2022 10:46:41 -0300	[thread overview]
Message-ID: <YzGtQY+uw4ZzZoSH@nvidia.com> (raw)
In-Reply-To: <81463119aeadd55465cfac1f5bc6a8b79f0c9738.camel@linux.ibm.com>

On Mon, Sep 26, 2022 at 11:00:53AM +0200, Niklas Schnelle wrote:

> > > +out_unregister_restore:
> > > +	zpci_unregister_ioat(zdev, 0);
> > >  out_restore:
> > > -	if (!zdev->s390_domain) {
> > > +	zdev->dma_table = NULL;
> > > +	if (prev_domain)
> > > +		s390_iommu_attach_device(&prev_domain->domain,
> > > +					 dev);
> > 
> > Huh. That is a surprising thing
> > 
> > I think this function needs some re-ordering to avoid this condition
> > 
> > The checks for aperture should be earlier, and they are not quite
> > right. The aperture is only allowed to grow. If it starts out as 0 and
> > then is set to something valid on first attach, a later attach cannot
> > then shrink it. There could already be mappings in the domain under
> > the now invalidated aperture and no caller is prepared to deal with
> > this.
> 
> Ohh I think this is indeed broken. Let me rephrase to see if I
> understand correctly. You're saying that while we only allow exactly
> matching apertures on additional attaches, we do allow shrinking if
> there is temporarily no device attached to the domain. That part is
> then broken because there could still be mappings outside the new
> aperture stored in the translation tables?

Right, go from 0 -> sized apperture on first attach, and then once it
is sized it doesn't change again.
 
> > That leaves the only error case as zpci_register_ioat() - which seems
> > like it is the actual "attach" operation. Since
> > __s390_iommu_detach_device() is just internal accounting (and can't
> > fail) it should be moved after
> > 
> > So the logic order should be
> > 
> > 1) Attempt to widen the aperture, if this fails the domain is
> >    incompatible bail immediately
> 
> Question. If the widening succeeds but we fail later during the attach
> e.g. in 2) then the aperture remains widend or would that be rolled
> back? 

I'd leave it widened.

IMHO I don't like this trick of setting the aperture on attach. It is
logically wrong. The aperture is part of the configuration of the page
table itself. The domain should know what page table format and thus
apterture it has the moment it is allocated. Usually this is the
related to the number of levels in the radix tree.

It seems to me that the issue here is trying to use the aperture when
the reserved region is the appropriate tool.

eg I see that s390_domain_alloc calls dma_alloc_cpu_table() which just
allocates a 3 level radix tree. This means it has a specific max
address that can be passed to dma_walk_cpu_trans(). So the aperture
should be fixed based on the radix tree parameters.

The device specific start/end should be represented as a reserved
regions per-device. See patch below..

This is meaningful because it effects when VFIO can share the domains
across devices. If devices have different reserved ranges we can still
share domains, so long as no mapping is placed in the union of the
reserved ranges. However if you vary the aperture, like is currently
happening, then the domains become unsharable.

diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index c898bcbbce118f..ba80325da76cd9 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -51,6 +51,12 @@ static bool s390_iommu_capable(enum iommu_cap cap)
 	}
 }
 
+/*
+ * dma_alloc_cpu_table() creates a 3 level table, rtx, sx, px, so the address
+ * that can be passed to dma_walk_cpu_trans() must be less than this.
+ */
+#define MAX_DMA_TABLE_ADDR (ZPCI_TABLE_SIZE * ZPCI_TABLE_SIZE * ZPCI_PT_SIZE)
+
 static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 {
 	struct s390_domain *s390_domain;
@@ -68,6 +74,10 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 		return NULL;
 	}
 
+	s390_domain->domain.geometry.force_aperture = true;
+	s390_domain->domain.geometry.aperture_start = 0;
+	s390_domain->domain.geometry.aperture_end = MAX_DMA_TABLE_ADDR;
+
 	spin_lock_init(&s390_domain->dma_table_lock);
 	spin_lock_init(&s390_domain->list_lock);
 	INIT_LIST_HEAD(&s390_domain->devices);
@@ -119,18 +129,6 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	}
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
-	/* First device defines the DMA range limits */
-	if (list_empty(&s390_domain->devices)) {
-		domain->geometry.aperture_start = zdev->start_dma;
-		domain->geometry.aperture_end = zdev->end_dma;
-		domain->geometry.force_aperture = true;
-	/* Allow only devices with identical DMA range limits */
-	} else if (domain->geometry.aperture_start != zdev->start_dma ||
-		   domain->geometry.aperture_end != zdev->end_dma) {
-		rc = -EINVAL;
-		spin_unlock_irqrestore(&s390_domain->list_lock, flags);
-		goto out_restore;
-	}
 	domain_device->zdev = zdev;
 	zdev->s390_domain = s390_domain;
 	list_add(&domain_device->list, &s390_domain->devices);
@@ -152,6 +150,30 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 	return rc;
 }
 
+static void s390_iommu_get_resv_regions(struct device *dev,
+					struct list_head *list)
+{
+	struct zpci_dev *zdev = to_zpci_dev(dev);
+	struct iommu_resv_region *region;
+
+	if (zdev->start_dma) {
+		region = iommu_alloc_resv_region(0, zdev->start_dma, 0,
+						 IOMMU_RESV_RESERVED);
+		if (!region)
+			return;
+		list_add_tail(&region->list, list);
+	}
+
+	if (zdev->end_dma < MAX_DMA_TABLE_ADDR) {
+		region = iommu_alloc_resv_region(
+			zdev->end_dma, MAX_DMA_TABLE_ADDR - zdev->end_dma, 0,
+			IOMMU_RESV_RESERVED);
+		if (!region)
+			return;
+		list_add_tail(&region->list, list);
+	}
+}
+
 static void s390_iommu_detach_device(struct iommu_domain *domain,
 				     struct device *dev)
 {
@@ -376,6 +398,7 @@ static const struct iommu_ops s390_iommu_ops = {
 	.release_device = s390_iommu_release_device,
 	.device_group = generic_device_group,
 	.pgsize_bitmap = S390_IOMMU_PGSIZES,
+	.get_resv_regions = s390_iommu_get_resv_regions,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev	= s390_iommu_attach_device,
 		.detach_dev	= s390_iommu_detach_device,

  reply	other threads:[~2022-09-26 15:11 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-22  9:52 [PATCH v2 0/3] iommu/s390: Fixes related to repeat attach_dev calls Niklas Schnelle
2022-09-22  9:52 ` [PATCH v2 1/3] iommu/s390: Fix duplicate domain attachments Niklas Schnelle
2022-09-22 14:33   ` Jason Gunthorpe
2022-09-26  9:00     ` Niklas Schnelle
2022-09-26 13:46       ` Jason Gunthorpe [this message]
2022-09-27 16:33         ` Niklas Schnelle
2022-09-27 16:56           ` Jason Gunthorpe
2022-09-28  8:58             ` Niklas Schnelle
2022-09-28 13:32               ` Jason Gunthorpe
2022-09-29  7:47                 ` Niklas Schnelle
2022-09-26 13:29     ` Niklas Schnelle
2022-09-26 13:51       ` Jason Gunthorpe
2022-09-27 16:24         ` Niklas Schnelle
2022-09-27 16:40           ` Jason Gunthorpe
2022-09-22  9:52 ` [PATCH v2 2/3] s390/pci: remove unused bus_next field from struct zpci_dev Niklas Schnelle
2022-09-26  9:17   ` Pierre Morel
2022-09-26  9:23     ` Pierre Morel
2022-09-26 13:41       ` Niklas Schnelle
2022-09-22  9:52 ` [PATCH v2 3/3] iommu/s390: Get rid of s390_domain_device Niklas Schnelle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YzGtQY+uw4ZzZoSH@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=agordeev@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=pmorel@linux.ibm.com \
    --cc=robin.murphy@arm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox