All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Jonathan Corbet <corbet@lwn.net>,
	David Woodhouse <dwmw2@infradead.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Joerg Roedel <joro@8bytes.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	"llvm@lists.linux.dev" <llvm@lists.linux.dev>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Shuah Khan <shuah@kernel.org>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	Tom Rix <trix@redhat.com>, Will Deacon <will@kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Eric Auger <eric.auger@redhat.com>,
	Eric Farman <farman@linux.ibm.com>,
	Jason Wang <jasowang@redhat.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	"Martins, Joao" <joao.m.martins@oracle.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Shameerali Kolothum Thodi  <shameerali.kolothum.thodi@huawei.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	Keqian Zhu <zhukeqian1@huawei.com>
Subject: Re: [PATCH v4 08/17] iommufd: Algorithms for PFN storage
Date: Mon, 14 Nov 2022 14:02:38 -0400	[thread overview]
Message-ID: <Y3KCvnLOZpGXAGhU@nvidia.com> (raw)
In-Reply-To: <BN9PR11MB52762E5ACAAE7D7B398730D78C059@BN9PR11MB5276.namprd11.prod.outlook.com>

On Mon, Nov 14, 2022 at 05:50:50AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, November 8, 2022 8:49 AM
> > 
> > @@ -171,7 +183,7 @@ static struct iopt_area
> > *iopt_pages_find_domain_area(struct iopt_pages *pages,
> >   */
> >  struct pfn_batch {
> >  	unsigned long *pfns;
> > -	u16 *npfns;
> > +	u32 *npfns;
> 
> why not making it u32 and removing later FIXME directly in patch7?

Rebase error, I fixed it

> >  static bool batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
> >  {
> > -	/* FIXME: U16 is too small */
> > +	const unsigned int MAX_NPFNS = type_max(typeof(*batch->npfns));
> 
> use lowercase i.e. max_npfns.

It is customary to use caps for constants, eg enum values and things

> > +static void __iopt_area_unfill_domain(struct iopt_area *area,
> > +				      struct iopt_pages *pages,
> > +				      struct iommu_domain *domain,
> > +				      unsigned long last_index)
> > +{
> > +	struct interval_tree_double_span_iter span;
> > +	unsigned long start_index = iopt_area_index(area);
> > +	unsigned long unmapped_end_index = start_index;
> > +	u64 backup[BATCH_BACKUP_SIZE];
> > +	struct pfn_batch batch;
> > +
> > +	lockdep_assert_held(&pages->mutex);
> > +
> > +	batch_init_backup(&batch, last_index + 1, backup, sizeof(backup));
> > +	interval_tree_for_each_double_span(&span, &pages-
> > >domains_itree,
> > +					   &pages->access_itree, start_index,
> > +					   last_index) {
> > +		if (span.is_used) {
> > +			batch_skip_carry(&batch,
> > +					 span.last_used - span.start_used + 1);
> > +			continue;
> > +		}
> > +		iopt_area_unpin_domain(&batch, area, pages, domain,
> > +				       span.start_hole, span.last_hole,
> > +				       &unmapped_end_index, last_index);
> > +	}
> > +	if (unmapped_end_index != last_index + 1)
> > +		iopt_area_unmap_domain_range(area, domain,
> > unmapped_end_index,
> > +					     last_index);
> 
> a comment marking that it's for the last trailing used span of which
> the pages are not contiguous to previous span.

	/*
	 * If the range ends in a access then we do the residual unmap without
	 * any unpins.
	 */

> btw it is not easy to understand how this func plus unpin_domain()
> actually work. more comments are welcomed to help readability.

	/*
	 * For security we must not unpin something that is still DMA mapped,
	 * so this must unmap any IOVA before we go ahead and unpin the pages.
	 * This creates a complexity where we need to skip over unpinning pages
	 * held in the xarray, but continue to unmap from the domain.
	 *
	 * The domain unmap cannot stop in the middle of a contiguous range of
	 * PFNs. To solve this problem the unpinning step will read ahead to the
	 * end of any contiguous span, unmap that whole span, and then only
	 * unpin the leading part that does not have any accesses. The residual
	 * PFNs that were unmapped but not unpinned are called a "carry" in the
	 * batch as they are moved to the front of the PFN list and continue on
	 * to the next iteration(s).
	 */

> > +/*
> > + * This can do everything and is fully coherent with what a iommu_domain
> > would
> > + * see.
> > + */
> > +static int iopt_pages_rw_slow(struct iopt_pages *pages,
> 
> Can you elaborate what guarantees coherency in this function and how it
> becomes different in other rw variations?

/*
 * This uses the pfn_reader instead of taking a shortcut by using the mm. It can
 * do every scenario and is fully consistent with what an iommu_domain would
 * see.
 */

> > + * iopt_pages_remove_access() - Release an in-kernel access for PFNs
> > + * @area: The source of PFNs
> > + * @start_index: First page index
> > + * @last_index: Inclusive last page index
> > + *
> > + * Undo iopt_pages_add_access() and unpin the pages if necessary. The
> > caller
> > + * must stop using the PFNs before calling this.
> > + */
> > +void iopt_pages_remove_access(struct iopt_area *area, unsigned long
> > start_index,
> > +			      unsigned long last_index)
> 
> this is called iopt_pages_xxx() but the first parameter is iopt_area.
> 
> also it's not balanced with iopt_pages_add_access() which requires the
> caller to hold pages->mutex and populate area->num_accesses.

OK, see below

Thanks,
Jason

--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -565,7 +565,7 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova)
-		iopt_pages_remove_access(
+		iopt_area_remove_access(
 			area, iopt_area_iova_to_index(area, iter.cur_iova),
 			iopt_area_iova_to_index(
 				area,
@@ -650,15 +650,10 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			goto err_remove;
 		}
 
-		mutex_lock(&area->pages->mutex);
-		rc = iopt_pages_add_access(area->pages, index, last_index,
-					   out_pages, flags);
-		if (rc) {
-			mutex_unlock(&area->pages->mutex);
+		rc = iopt_pages_add_access(area, index, last_index, out_pages,
+					   flags);
+		if (rc)
 			goto err_remove;
-		}
-		area->num_accesses++;
-		mutex_unlock(&area->pages->mutex);
 		out_pages += last_index - index + 1;
 	}
 	if (!iopt_area_contig_done(&iter)) {
@@ -673,7 +668,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	if (iova < iter.cur_iova) {
 		last_iova = iter.cur_iova - 1;
 		iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova)
-			iopt_pages_remove_access(
+			iopt_area_remove_access(
 				area,
 				iopt_area_iova_to_index(area, iter.cur_iova),
 				iopt_area_iova_to_index(
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 3b85fa344f6be3..68bc3957534dd7 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -221,10 +221,10 @@ int iopt_pages_fill_xarray(struct iopt_pages *pages, unsigned long start,
 void iopt_pages_unfill_xarray(struct iopt_pages *pages, unsigned long start,
 			      unsigned long last);
 
-int iopt_pages_add_access(struct iopt_pages *pages, unsigned long start,
-			unsigned long last, struct page **out_pages,
-			unsigned int flags);
-void iopt_pages_remove_access(struct iopt_area *area, unsigned long start,
+int iopt_area_add_access(struct iopt_area *area, unsigned long start,
+			 unsigned long last, struct page **out_pages,
+			 unsigned int flags);
+void iopt_area_remove_access(struct iopt_area *area, unsigned long start,
 			    unsigned long last);
 int iopt_pages_rw_access(struct iopt_pages *pages, unsigned long start_byte,
 			 void *data, unsigned long length, unsigned int flags);
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 01d2447eac4ede..e5f267d9e2b491 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1807,8 +1826,8 @@ iopt_pages_get_exact_access(struct iopt_pages *pages, unsigned long index,
 }
 
 /**
- * iopt_pages_add_access() - Record an in-knerel access for PFNs
- * @pages: The source of PFNs
+ * iopt_area_add_access() - Record an in-knerel access for PFNs
+ * @area: The source of PFNs
  * @start_index: First page index
  * @last_index: Inclusive last page index
  * @out_pages: Output list of struct page's representing the PFNs
@@ -1819,40 +1838,49 @@ iopt_pages_get_exact_access(struct iopt_pages *pages, unsigned long index,
  *
  * This should be undone through a matching call to iopt_pages_remove_access()
  */
-int iopt_pages_add_access(struct iopt_pages *pages, unsigned long start_index,
+int iopt_area_add_access(struct iopt_area *area, unsigned long start_index,
 			  unsigned long last_index, struct page **out_pages,
 			  unsigned int flags)
 {
+	struct iopt_pages *pages = area->pages;
 	struct iopt_pages_access *access;
 	int rc;
 
 	if ((flags & IOMMUFD_ACCESS_RW_WRITE) && !pages->writable)
 		return -EPERM;
 
+	mutex_lock(&pages->mutex);
 	access = iopt_pages_get_exact_access(pages, start_index, last_index);
 	if (access) {
 		refcount_inc(&access->refcount);
 		iopt_pages_fill_from_xarray(pages, start_index, last_index,
 					    out_pages);
+		mutex_unlock(&pages->mutex);
 		return 0;
 	}
 
 	access = kzalloc(sizeof(*access), GFP_KERNEL_ACCOUNT);
-	if (!access)
-		return -ENOMEM;
+	if (!access) {
+		rc = -ENOMEM;
+		goto err_unlock;
+	}
 
 	rc = iopt_pages_fill_xarray(pages, start_index, last_index, out_pages);
 	if (rc)
-		goto out_free;
+		goto err_free;
 
 	access->node.start = start_index;
 	access->node.last = last_index;
 	refcount_set(&access->refcount, 1);
+	area->num_accesses++;
 	interval_tree_insert(&access->node, &pages->access_itree);
+	mutex_unlock(&pages->mutex);
 	return 0;
 
-out_free:
+err_free:
 	kfree(access);
+err_unlock:
+	mutex_unlock(&pages->mutex);
 	return rc;
 }
 
@@ -1865,11 +1893,11 @@ int iopt_pages_add_access(struct iopt_pages *pages, unsigned long start_index,
  * Undo iopt_pages_add_access() and unpin the pages if necessary. The caller
  * must stop using the PFNs before calling this.
  */
-void iopt_pages_remove_access(struct iopt_area *area, unsigned long start_index,
-			      unsigned long last_index)
+void iopt_area_remove_access(struct iopt_area *area, unsigned long start_index,
+			     unsigned long last_index)
 {
-	struct iopt_pages_access *access;
 	struct iopt_pages *pages = area->pages;
+	struct iopt_pages_access *access;
 
 	mutex_lock(&pages->mutex);
 	access = iopt_pages_get_exact_access(pages, start_index, last_index);

  reply	other threads:[~2022-11-14 18:04 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-08  0:48 [PATCH v4 00/17] IOMMUFD Generic interface Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 01/17] iommu: Add IOMMU_CAP_ENFORCE_CACHE_COHERENCY Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 02/17] iommu: Add device-centric DMA ownership interfaces Jason Gunthorpe
2022-11-11  5:37   ` Tian, Kevin
2022-11-14 16:44     ` Jason Gunthorpe
2022-11-14 13:33   ` Eric Auger
2022-11-14 16:58     ` Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 03/17] interval-tree: Add a utility to iterate over spans in an interval tree Jason Gunthorpe
2022-11-15 14:14   ` Eric Auger
2022-11-15 16:44     ` Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 04/17] iommufd: Document overview of iommufd Jason Gunthorpe
2022-11-08  3:45   ` Bagas Sanjaya
2022-11-08 17:10   ` [PATCH v4 4/17] " Jason Gunthorpe
2022-11-11  5:59     ` Tian, Kevin
2022-11-14 15:14       ` Jason Gunthorpe
2022-11-10  9:30   ` [PATCH v4 04/17] " Bagas Sanjaya
2022-11-10 14:49     ` Jonathan Corbet
2022-11-10 14:54       ` Jason Gunthorpe
2022-11-10 15:10         ` Jonathan Corbet
2022-11-10 15:23           ` Jason Gunthorpe
2022-11-10 15:28             ` Jonathan Corbet
2022-11-10 15:29               ` Jason Gunthorpe
2022-11-10 15:52                 ` Jonathan Corbet
2022-11-10 16:54                   ` Jason Gunthorpe
2022-11-11  1:46       ` Bagas Sanjaya
2022-11-14 20:50   ` Eric Auger
2022-11-15  0:52     ` Jason Gunthorpe
2022-11-08  0:48 ` [PATCH v4 05/17] iommufd: File descriptor, context, kconfig and makefiles Jason Gunthorpe
2022-11-11  6:07   ` Tian, Kevin
2022-11-08  0:48 ` [PATCH v4 06/17] kernel/user: Allow user::locked_vm to be usable for iommufd Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 07/17] iommufd: PFN handling for iopt_pages Jason Gunthorpe
2022-11-11  9:56   ` Tian, Kevin
2022-11-14 17:20     ` Jason Gunthorpe
2022-11-11 11:09   ` Tian, Kevin
2022-11-14 17:24     ` Jason Gunthorpe
2022-11-15  2:59       ` Tian, Kevin
2022-11-08  0:49 ` [PATCH v4 08/17] iommufd: Algorithms for PFN storage Jason Gunthorpe
2022-11-14  5:50   ` Tian, Kevin
2022-11-14 18:02     ` Jason Gunthorpe [this message]
2022-11-15  3:06       ` Tian, Kevin
2022-11-15 14:49         ` Jason Gunthorpe
2022-11-14 19:19   ` [PATCH v4 8/17] " Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 09/17] iommufd: Data structure to provide IOVA to PFN mapping Jason Gunthorpe
2022-11-14  7:28   ` Tian, Kevin
2022-11-14 18:43     ` Jason Gunthorpe
2022-11-15  3:13       ` Tian, Kevin
2022-11-15 15:05         ` Jason Gunthorpe
2022-11-16  0:09           ` Tian, Kevin
2022-11-16  0:32             ` Jason Gunthorpe
2022-11-16  2:30               ` Tian, Kevin
2022-11-08  0:49 ` [PATCH v4 10/17] iommufd: IOCTLs for the io_pagetable Jason Gunthorpe
2022-11-08 13:27   ` Bagas Sanjaya
2022-11-08 17:01     ` Jason Gunthorpe
2022-11-14  7:46   ` Tian, Kevin
2022-11-08  0:49 ` [PATCH v4 11/17] iommufd: Add a HW pagetable object Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 12/17] iommufd: Add kAPI toward external drivers for physical devices Jason Gunthorpe
2022-11-08 14:34   ` Yi Liu
2022-11-08 17:57     ` Jason Gunthorpe
2022-11-14  7:59   ` Tian, Kevin
2022-11-08  0:49 ` [PATCH v4 13/17] iommufd: Add kAPI toward external drivers for kernel access Jason Gunthorpe
2022-11-14  8:25   ` Tian, Kevin
2022-11-14 19:05     ` Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 14/17] iommufd: vfio container FD ioctl compatibility Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 15/17] iommufd: Add a selftest Jason Gunthorpe
2022-11-08  1:01   ` Jason Gunthorpe
2022-11-08  5:48   ` Nicolin Chen
2022-11-08 13:27     ` Jason Gunthorpe
2022-11-09 23:51   ` Matthew Rosato
2022-11-08  0:49 ` [PATCH v4 16/17] iommufd: Add some fault injection points Jason Gunthorpe
2022-11-08  7:25   ` Nicolin Chen
2022-11-08 12:37     ` Jason Gunthorpe
2022-11-08  0:49 ` [PATCH v4 17/17] iommufd: Add additional invariant assertions Jason Gunthorpe
2022-11-08  1:09 ` S390 testing for IOMMUFD Jason Gunthorpe
2022-11-08 10:12   ` Christian Borntraeger
2022-11-08 14:04     ` Anthony Krowiak
2022-11-09 14:49       ` Anthony Krowiak
2022-11-09 16:12         ` Jason Gunthorpe
2022-11-09 19:13           ` Anthony Krowiak
2022-11-09 20:43             ` Jason Gunthorpe
2022-11-09 19:09         ` Anthony Krowiak
2022-11-08 13:50   ` Matthew Rosato
2022-11-08 13:54     ` Jason Gunthorpe
2022-11-08 14:19       ` Eric Farman
2022-11-08 14:37         ` Jason Gunthorpe
2022-11-08 15:29           ` Eric Farman
2022-11-08 19:18             ` Matthew Rosato
2022-11-08 20:04               ` Jason Gunthorpe
2022-11-08 20:17                 ` Eric Farman
2022-11-08 19:34             ` Jason Gunthorpe
2022-11-08 20:07               ` Eric Farman
2022-11-08 20:10                 ` Jason Gunthorpe
2022-11-09  0:41               ` [PATCH 1/2] vfio/ccw: Convert to use vfio_dma_rw() kernel test robot
2022-11-11 15:51 ` [PATCH v4 00/17] IOMMUFD Generic interface Shameerali Kolothum Thodi
2022-11-12 12:44   ` Yi Liu
2023-01-10 11:35     ` Shameerali Kolothum Thodi
2023-01-10 13:49       ` Jason Gunthorpe
2023-01-10 15:16         ` Joao Martins
2023-01-10 15:18           ` Jason Gunthorpe
2023-01-10 15:30           ` Shameerali Kolothum Thodi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y3KCvnLOZpGXAGhU@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=chaitanyak@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=corbet@lwn.net \
    --cc=daniel.m.jordan@oracle.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=dwmw2@infradead.org \
    --cc=eric.auger@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=iommu@lists.linux.dev \
    --cc=jasowang@redhat.com \
    --cc=jean-philippe@linaro.org \
    --cc=joao.m.martins@oracle.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=mjrosato@linux.ibm.com \
    --cc=mst@redhat.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=nicolinc@nvidia.com \
    --cc=ojeda@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=shuah@kernel.org \
    --cc=suravee.suthikulpanit@amd.com \
    --cc=trix@redhat.com \
    --cc=will@kernel.org \
    --cc=yi.l.liu@intel.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.