All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/9] iommu_ioas_map_file
@ 2024-09-24 15:05 Steve Sistare
  2024-09-24 15:05 ` [PATCH V2 1/9] iommufd: rename uptr in iopt_alloc_iova Steve Sistare
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

Provide the IOMMU_IOAS_MAP_FILE ioctl, which allows a user to register
memory by passing a memfd plus offset and length.  Implement it using
the memfd_map_folios KAPI, and the proposed folio_split_user_page_pin KAPI.
See the individual patches for details.

Changes in V2:
 * changed names and commit message in "rename uptr in iopt_alloc_iova"
 * normalized comments describing the iopt_map_user_pages interface
 * submitted folio_split_user_page_pin (fka folio_repin_unhugely) separately
 * replaced nupages[] optimization with folio-to-batch optimization
 * added selftests for map file

Steve Sistare (9):
  iommufd: rename uptr in iopt_alloc_iova
  iommufd: generalize iopt_pages address
  iommufd: pfn reader for file mappings
  iommufd: IOMMU_IOAS_MAP_FILE interface
  iommufd: IOMMU_IOAS_MAP_FILE implementation
  iommufd: file mappings for mdev
  iommufd: pfn reader local variables
  iommufd: optimize file mapping
  iommufd: map file selftest

 drivers/iommu/iommufd/io_pagetable.c          | 121 +++++++----
 drivers/iommu/iommufd/io_pagetable.h          |  20 +-
 drivers/iommu/iommufd/ioas.c                  |  43 ++++
 drivers/iommu/iommufd/iommufd_private.h       |   6 +
 drivers/iommu/iommufd/main.c                  |   2 +
 drivers/iommu/iommufd/pages.c                 | 292 ++++++++++++++++++++++----
 include/uapi/linux/iommufd.h                  |  25 +++
 tools/testing/selftests/iommu/iommufd.c       | 156 ++++++++++++--
 tools/testing/selftests/iommu/iommufd_utils.h |  41 ++++
 9 files changed, 607 insertions(+), 99 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH V2 1/9] iommufd: rename uptr in iopt_alloc_iova
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-09-27 15:59   ` Jason Gunthorpe
  2024-09-24 15:05 ` [PATCH V2 2/9] iommufd: generalize iopt_pages address Steve Sistare
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

iopt_alloc_iova takes a uptr argument but only checks for its alignment.
Generalize this to an unsigned address, which can be the offset from the
start of a file in a subsequent patch.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/iommu/iommufd/io_pagetable.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 05fd9d3..20e31e1 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -107,9 +107,9 @@ static bool __alloc_iova_check_used(struct interval_tree_span_iter *span,
  * Does not return a 0 IOVA even if it is valid.
  */
 static int iopt_alloc_iova(struct io_pagetable *iopt, unsigned long *iova,
-			   unsigned long uptr, unsigned long length)
+			   unsigned long addr, unsigned long length)
 {
-	unsigned long page_offset = uptr % PAGE_SIZE;
+	unsigned long page_offset = addr % PAGE_SIZE;
 	struct interval_tree_double_span_iter used_span;
 	struct interval_tree_span_iter allowed_span;
 	unsigned long iova_alignment;
@@ -121,15 +121,15 @@ static int iopt_alloc_iova(struct io_pagetable *iopt, unsigned long *iova,
 		return -EOVERFLOW;
 
 	/*
-	 * Keep alignment present in the uptr when building the IOVA, this
+	 * Keep alignment present in addr when building the IOVA, which
 	 * increases the chance we can map a THP.
 	 */
-	if (!uptr)
+	if (!addr)
 		iova_alignment = roundup_pow_of_two(length);
 	else
 		iova_alignment = min_t(unsigned long,
 				       roundup_pow_of_two(length),
-				       1UL << __ffs64(uptr));
+				       1UL << __ffs64(addr));
 
 	if (iova_alignment < iopt->iova_alignment)
 		return -EINVAL;
@@ -240,7 +240,7 @@ static int iopt_alloc_area_pages(struct io_pagetable *iopt,
 				 int iommu_prot, unsigned int flags)
 {
 	struct iopt_pages_list *elm;
-	unsigned long iova;
+	unsigned long iova, start;
 	int rc = 0;
 
 	list_for_each_entry(elm, pages_list, next) {
@@ -259,9 +259,8 @@ static int iopt_alloc_area_pages(struct io_pagetable *iopt,
 		/* Use the first entry to guess the ideal IOVA alignment */
 		elm = list_first_entry(pages_list, struct iopt_pages_list,
 				       next);
-		rc = iopt_alloc_iova(
-			iopt, dst_iova,
-			(uintptr_t)elm->pages->uptr + elm->start_byte, length);
+		start = elm->start_byte + (uintptr_t)elm->pages->uptr;
+		rc = iopt_alloc_iova(iopt, dst_iova, start, length);
 		if (rc)
 			goto out_unlock;
 		if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 2/9] iommufd: generalize iopt_pages address
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
  2024-09-24 15:05 ` [PATCH V2 1/9] iommufd: rename uptr in iopt_alloc_iova Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-09-24 20:31   ` Nicolin Chen
  2024-10-01 19:54   ` Jason Gunthorpe
  2024-09-24 15:05 ` [PATCH V2 3/9] iommufd: pfn reader for file mappings Steve Sistare
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

The starting address in iopt_pages is currently a __user *uptr.  Generalize
to allow other types of addresses.  Refactor iopt_alloc_pages and
iopt_map_user_pages into address-type specific and common functions.

Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/iommu/iommufd/io_pagetable.c | 60 +++++++++++++++++++++++-------------
 drivers/iommu/iommufd/io_pagetable.h | 13 ++++++--
 drivers/iommu/iommufd/pages.c        | 30 +++++++++++++-----
 3 files changed, 71 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 20e31e1..a0a66f1 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -375,6 +375,38 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
 	return rc;
 }
 
+static int iopt_map_common(struct iommufd_ctx *ictx,
+			   struct io_pagetable *iopt,
+			   struct iopt_pages *pages,
+			   unsigned long *iova,
+			   unsigned long length,
+			   unsigned long start_byte,
+			   int iommu_prot,
+			   unsigned int flags)
+{
+	struct iopt_pages_list elm = {};
+	LIST_HEAD(pages_list);
+	int rc;
+
+	elm.pages = pages;
+	elm.start_byte = start_byte;
+	if (ictx->account_mode == IOPT_PAGES_ACCOUNT_MM &&
+	    elm.pages->account_mode == IOPT_PAGES_ACCOUNT_USER)
+		elm.pages->account_mode = IOPT_PAGES_ACCOUNT_MM;
+	elm.length = length;
+	list_add(&elm.next, &pages_list);
+
+	rc = iopt_map_pages(iopt, &pages_list, length, iova, iommu_prot, flags);
+	if (rc) {
+		if (elm.area)
+			iopt_abort_area(elm.area);
+		if (elm.pages)
+			iopt_put_pages(elm.pages);
+		return rc;
+	}
+	return 0;
+}
+
 /**
  * iopt_map_user_pages() - Map a user VA to an iova in the io page table
  * @ictx: iommufd_ctx the iopt is part of
@@ -399,29 +431,15 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
 			unsigned long length, int iommu_prot,
 			unsigned int flags)
 {
-	struct iopt_pages_list elm = {};
-	LIST_HEAD(pages_list);
-	int rc;
+	struct iopt_pages *pages;
 
-	elm.pages = iopt_alloc_pages(uptr, length, iommu_prot & IOMMU_WRITE);
-	if (IS_ERR(elm.pages))
-		return PTR_ERR(elm.pages);
-	if (ictx->account_mode == IOPT_PAGES_ACCOUNT_MM &&
-	    elm.pages->account_mode == IOPT_PAGES_ACCOUNT_USER)
-		elm.pages->account_mode = IOPT_PAGES_ACCOUNT_MM;
-	elm.start_byte = uptr - elm.pages->uptr;
-	elm.length = length;
-	list_add(&elm.next, &pages_list);
+	pages = iopt_alloc_user_pages(uptr, length, iommu_prot & IOMMU_WRITE);
+	if (IS_ERR(pages))
+		return PTR_ERR(pages);
 
-	rc = iopt_map_pages(iopt, &pages_list, length, iova, iommu_prot, flags);
-	if (rc) {
-		if (elm.area)
-			iopt_abort_area(elm.area);
-		if (elm.pages)
-			iopt_put_pages(elm.pages);
-		return rc;
-	}
-	return 0;
+	return iopt_map_common(ictx, iopt, pages, iova, length,
+			       uptr - pages->uptr,
+			       iommu_prot, flags);
 }
 
 struct iova_bitmap_fn_arg {
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 0ec3509..7c4a338 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -175,6 +175,10 @@ enum {
 	IOPT_PAGES_ACCOUNT_MM = 2,
 };
 
+enum iopt_address_type {
+	IOPT_ADDRESS_USER = 0,
+};
+
 /*
  * This holds a pinned page list for multiple areas of IO address space. The
  * pages always originate from a linear chunk of userspace VA. Multiple
@@ -195,7 +199,10 @@ struct iopt_pages {
 	struct task_struct *source_task;
 	struct mm_struct *source_mm;
 	struct user_struct *source_user;
-	void __user *uptr;
+	enum iopt_address_type type;
+	union {
+		void __user *uptr;		/* IOPT_ADDRESS_USER */
+	};
 	bool writable:1;
 	u8 account_mode;
 
@@ -206,8 +213,8 @@ struct iopt_pages {
 	struct rb_root_cached domains_itree;
 };
 
-struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
-				    bool writable);
+struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
+					 unsigned long length, bool writable);
 void iopt_release_pages(struct kref *kref);
 static inline void iopt_put_pages(struct iopt_pages *pages)
 {
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 117f644..69822d4 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1139,11 +1139,11 @@ static int pfn_reader_first(struct pfn_reader *pfns, struct iopt_pages *pages,
 	return 0;
 }
 
-struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
-				    bool writable)
+static struct iopt_pages *iopt_alloc_pages(unsigned long length,
+					   unsigned long start_byte,
+					   bool writable)
 {
 	struct iopt_pages *pages;
-	unsigned long end;
 
 	/*
 	 * The iommu API uses size_t as the length, and protect the DIV_ROUND_UP
@@ -1152,9 +1152,6 @@ struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
 	if (length > SIZE_MAX - PAGE_SIZE || length == 0)
 		return ERR_PTR(-EINVAL);
 
-	if (check_add_overflow((unsigned long)uptr, length, &end))
-		return ERR_PTR(-EOVERFLOW);
-
 	pages = kzalloc(sizeof(*pages), GFP_KERNEL_ACCOUNT);
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
@@ -1164,8 +1161,7 @@ struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
 	mutex_init(&pages->mutex);
 	pages->source_mm = current->mm;
 	mmgrab(pages->source_mm);
-	pages->uptr = (void __user *)ALIGN_DOWN((uintptr_t)uptr, PAGE_SIZE);
-	pages->npages = DIV_ROUND_UP(length + (uptr - pages->uptr), PAGE_SIZE);
+	pages->npages = DIV_ROUND_UP(length + start_byte, PAGE_SIZE);
 	pages->access_itree = RB_ROOT_CACHED;
 	pages->domains_itree = RB_ROOT_CACHED;
 	pages->writable = writable;
@@ -1179,6 +1175,24 @@ struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
 	return pages;
 }
 
+struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
+					 unsigned long length, bool writable)
+{
+	struct iopt_pages *pages;
+	unsigned long end;
+	void __user *uptr_down = (void __user *) ALIGN_DOWN((uintptr_t)uptr, PAGE_SIZE);
+
+	if (check_add_overflow((unsigned long)uptr, length, &end))
+		return ERR_PTR(-EOVERFLOW);
+
+	pages = iopt_alloc_pages(length, uptr - uptr_down, writable);
+	if (IS_ERR(pages))
+		return pages;
+	pages->uptr = uptr_down;
+	pages->type = IOPT_ADDRESS_USER;
+	return pages;
+}
+
 void iopt_release_pages(struct kref *kref)
 {
 	struct iopt_pages *pages = container_of(kref, struct iopt_pages, kref);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 3/9] iommufd: pfn reader for file mappings
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
  2024-09-24 15:05 ` [PATCH V2 1/9] iommufd: rename uptr in iopt_alloc_iova Steve Sistare
  2024-09-24 15:05 ` [PATCH V2 2/9] iommufd: generalize iopt_pages address Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-09-24 20:53   ` Nicolin Chen
                     ` (3 more replies)
  2024-09-24 15:05 ` [PATCH V2 4/9] iommufd: IOMMU_IOAS_MAP_FILE interface Steve Sistare
                   ` (5 subsequent siblings)
  8 siblings, 4 replies; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

Extend pfn_reader_user to pin file mappings, by calling memfd_pin_folios.
Repin at small page granularity and unpack pages into upages[] to mesh
with the existing code paths.  This is sub-optimal but simple, and will be
optimized in a subsequent patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/iommu/iommufd/io_pagetable.h |  5 +++
 drivers/iommu/iommufd/pages.c        | 86 +++++++++++++++++++++++++++++++-----
 2 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 7c4a338..3a28f46 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -177,6 +177,7 @@ enum {
 
 enum iopt_address_type {
 	IOPT_ADDRESS_USER = 0,
+	IOPT_ADDRESS_FILE = 1,
 };
 
 /*
@@ -202,6 +203,10 @@ struct iopt_pages {
 	enum iopt_address_type type;
 	union {
 		void __user *uptr;		/* IOPT_ADDRESS_USER */
+		struct {			/* IOPT_ADDRESS_FILE */
+			struct file *file;
+			unsigned long start;
+		};
 	};
 	bool writable:1;
 	u8 account_mode;
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 69822d4..df5ba4f 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -703,19 +703,28 @@ struct pfn_reader_user {
 	 * neither
 	 */
 	int locked;
+
+	/* The following are only valid if file != NULL. */
+	struct file *file;
+	struct folio **ufolios;
+	unsigned long ufolios_len;
 };
 
 static void pfn_reader_user_init(struct pfn_reader_user *user,
 				 struct iopt_pages *pages)
 {
 	user->upages = NULL;
+	user->upages_len = 0;
 	user->upages_start = 0;
 	user->upages_end = 0;
 	user->locked = -1;
-
 	user->gup_flags = FOLL_LONGTERM;
 	if (pages->writable)
 		user->gup_flags |= FOLL_WRITE;
+
+	user->file = (pages->type == IOPT_ADDRESS_FILE) ? pages->file : NULL;
+	user->ufolios = NULL;
+	user->ufolios_len = 0;
 }
 
 static void pfn_reader_user_destroy(struct pfn_reader_user *user,
@@ -731,6 +740,45 @@ static void pfn_reader_user_destroy(struct pfn_reader_user *user,
 
 	kfree(user->upages);
 	user->upages = NULL;
+	kfree(user->ufolios);
+	user->ufolios = NULL;
+}
+
+static long pin_memfd_pages(struct pfn_reader_user *user,
+			    unsigned long start,
+			    unsigned long npages)
+{
+	unsigned long end, nr, i, j, npin, offset, npages_out;
+	long nfolios;
+	struct folio *folio;
+	struct page **upages = user->upages;
+
+	nfolios = user->ufolios_len / sizeof(*user->ufolios);
+	end = start + (npages << PAGE_SHIFT) - 1;
+
+	nfolios = memfd_pin_folios(user->file, start, end,
+				   user->ufolios, nfolios, &offset);
+	if (nfolios <= 0)
+		return nfolios;
+
+	offset >>= PAGE_SHIFT;
+	npages_out = 0;
+
+	for (i = 0; i < nfolios; i++) {
+		folio = user->ufolios[i];
+		nr = folio_nr_pages(folio);
+		npin = min(nr - offset, npages);
+		if (nr > 1) {
+			folio_split_user_page_pin(folio, npin);
+		}
+		for (j = offset; j < offset + npin; j++)
+			*upages++ = folio_page(folio, j);
+		npages -= npin;
+		npages_out += npin;
+		offset = 0;
+	}
+
+	return npages_out;
 }
 
 static int pfn_reader_user_pin(struct pfn_reader_user *user,
@@ -739,7 +787,8 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
 			       unsigned long last_index)
 {
 	bool remote_mm = pages->source_mm != current->mm;
-	unsigned long npages;
+	unsigned long npages = last_index - start_index + 1;
+	unsigned long start, unum;
 	uintptr_t uptr;
 	long rc;
 
@@ -749,14 +798,25 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
 
 	if (!user->upages) {
 		/* All undone in pfn_reader_destroy() */
-		user->upages_len =
-			(last_index - start_index + 1) * sizeof(*user->upages);
+		user->upages_len = npages * sizeof(*user->upages);
 		user->upages = temp_kmalloc(&user->upages_len, NULL, 0);
 		if (!user->upages)
 			return -ENOMEM;
 	}
 
-	if (user->locked == -1) {
+	if (user->file && !user->ufolios) {
+		user->ufolios_len = npages * sizeof(*user->ufolios);
+		user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
+		if (!user->ufolios)
+			return -ENOMEM;
+
+		/* Bail for now.  Be more robust when we optimize for folios. */
+		if (user->ufolios_len / sizeof(*user->ufolios) <
+		    user->upages_len / sizeof(*user->upages))
+			return -ENOMEM;
+	}
+
+	if (!user->file && user->locked == -1) {
 		/*
 		 * The majority of usages will run the map task within the mm
 		 * providing the pages, so we can optimize into
@@ -769,18 +829,22 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
 		user->locked = 0;
 	}
 
-	npages = min_t(unsigned long, last_index - start_index + 1,
-		       user->upages_len / sizeof(*user->upages));
-
+	unum = user->file ? user->ufolios_len / sizeof(*user->ufolios) :
+			    user->upages_len / sizeof(*user->upages);
+	npages = min_t(unsigned long, npages, unum);
 
 	if (iommufd_should_fail())
 		return -EFAULT;
 
-	uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
-	if (!remote_mm)
+	if (user->file) {
+		start = pages->start + (start_index * PAGE_SIZE);
+		rc = pin_memfd_pages(user, start, npages);
+	} else if (!remote_mm) {
+		uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
 		rc = pin_user_pages_fast(uptr, npages, user->gup_flags,
 					 user->upages);
-	else {
+	} else {
+		uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
 		if (!user->locked) {
 			mmap_read_lock(pages->source_mm);
 			user->locked = 1;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 4/9] iommufd: IOMMU_IOAS_MAP_FILE interface
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
                   ` (2 preceding siblings ...)
  2024-09-24 15:05 ` [PATCH V2 3/9] iommufd: pfn reader for file mappings Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-10-01 19:58   ` Jason Gunthorpe
  2024-09-24 15:05 ` [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation Steve Sistare
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

Define the IOMMU_IOAS_MAP_FILE ioctl interface, which allows a user to
register memory by passing a memfd plus offset and length.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/iommu/iommufd/ioas.c            |  5 +++++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 drivers/iommu/iommufd/main.c            |  2 ++
 include/uapi/linux/iommufd.h            | 25 +++++++++++++++++++++++++
 4 files changed, 33 insertions(+)

diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index 157a89b..270e8ea 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -197,6 +197,11 @@ static int conv_iommu_prot(u32 map_flags)
 	return iommu_prot;
 }
 
+int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
+{
+	return -ENOTTY;
+}
+
 int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_ioas_map *cmd = ucmd->cmd;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 92efe30..21ec075 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -275,6 +275,7 @@ static inline struct iommufd_ioas *iommufd_get_ioas(struct iommufd_ctx *ictx,
 int iommufd_ioas_iova_ranges(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_allow_iovas(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_map(struct iommufd_ucmd *ucmd);
+int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 83bbd7c..5f77ce6 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -378,6 +378,8 @@ struct iommufd_ioctl_op {
 		 struct iommu_ioas_iova_ranges, out_iova_alignment),
 	IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map,
 		 iova),
+	IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
+		struct iommu_ioas_map_file, iova),
 	IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
 		 length),
 	IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4dde745..c484981 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -51,6 +51,7 @@ enum {
 	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP = 0x8c,
 	IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
 	IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
+	IOMMUFD_CMD_IOAS_MAP_FILE = 0x8f,
 };
 
 /**
@@ -214,6 +215,30 @@ struct iommu_ioas_map {
 #define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
 
 /**
+ * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
+ * @size: sizeof(struct iommu_ioas_map_file)
+ * @fd: the memfd to map
+ * @start: byte offset from start of file to map from
+ * @flags: same as for iommu_ioas_map
+ * @ios_id: ditto
+ * @length: ditto
+ * @iova: ditto
+ *
+ * Set an IOVA mapping from a memfd file.  All other arguments and semantics
+ * match those of IOMMU_IOAS_MAP.
+ */
+struct iommu_ioas_map_file {
+	__u32 size;
+	__u32 flags;
+	__u32 ioas_id;
+	__s32 fd;
+	__aligned_u64 start;
+	__aligned_u64 length;
+	__aligned_u64 iova;
+};
+#define IOMMU_IOAS_MAP_FILE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP_FILE)
+
+/**
  * struct iommu_ioas_copy - ioctl(IOMMU_IOAS_COPY)
  * @size: sizeof(struct iommu_ioas_copy)
  * @flags: Combination of enum iommufd_ioas_map_flags
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
                   ` (3 preceding siblings ...)
  2024-09-24 15:05 ` [PATCH V2 4/9] iommufd: IOMMU_IOAS_MAP_FILE interface Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-09-24 21:08   ` Nicolin Chen
  2024-10-01 20:05   ` Jason Gunthorpe
  2024-09-24 15:05 ` [PATCH V2 6/9] iommufd: file mappings for mdev Steve Sistare
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

Implement the IOMMU_IOAS_MAP_FILE ioctl using the memfd_pin_folios KAPI.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/iommu/iommufd/io_pagetable.c    | 34 +++++++++++++++++++++++++++-
 drivers/iommu/iommufd/io_pagetable.h    |  2 ++
 drivers/iommu/iommufd/ioas.c            | 40 ++++++++++++++++++++++++++++++++-
 drivers/iommu/iommufd/iommufd_private.h |  5 +++++
 drivers/iommu/iommufd/pages.c           | 19 +++++++++++++++-
 5 files changed, 97 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index a0a66f1..1e14a0f 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -259,7 +259,10 @@ static int iopt_alloc_area_pages(struct io_pagetable *iopt,
 		/* Use the first entry to guess the ideal IOVA alignment */
 		elm = list_first_entry(pages_list, struct iopt_pages_list,
 				       next);
-		start = elm->start_byte + (uintptr_t)elm->pages->uptr;
+		if (elm->pages->type == IOPT_ADDRESS_FILE)
+			start = elm->start_byte + elm->pages->start;
+		else
+			start = elm->start_byte + (uintptr_t)elm->pages->uptr;
 		rc = iopt_alloc_iova(iopt, dst_iova, start, length);
 		if (rc)
 			goto out_unlock;
@@ -442,6 +445,35 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
 			       iommu_prot, flags);
 }
 
+/**
+ * iopt_map_file_pages() - Like iopt_map_user_pages, but map a file.
+ * @ictx: iommufd_ctx the iopt is part of
+ * @iopt: io_pagetable to act on
+ * @iova: If IOPT_ALLOC_IOVA is set this is unused on input and contains
+ *        the chosen iova on output. Otherwise is the iova to map to on input
+ * @file: file to map
+ * @start: map file starting at this byte offset
+ * @length: Number of bytes to map
+ * @iommu_prot: Combination of IOMMU_READ/WRITE/etc bits for the mapping
+ * @flags: IOPT_ALLOC_IOVA or zero
+ */
+int iopt_map_file_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
+			unsigned long *iova,
+			struct file *file, unsigned long start,
+			unsigned long length, int iommu_prot,
+			unsigned int flags)
+{
+	struct iopt_pages *pages;
+
+	pages = iopt_alloc_file_pages(file, start, length,
+				      iommu_prot & IOMMU_WRITE);
+	if (IS_ERR(pages))
+		return PTR_ERR(pages);
+	return iopt_map_common(ictx, iopt, pages, iova, length,
+			       start - pages->start,
+			       iommu_prot, flags);
+}
+
 struct iova_bitmap_fn_arg {
 	unsigned long flags;
 	struct io_pagetable *iopt;
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 3a28f46..909d396 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -220,6 +220,8 @@ struct iopt_pages {
 
 struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
 					 unsigned long length, bool writable);
+struct iopt_pages *iopt_alloc_file_pages(struct file *file, unsigned long start,
+					 unsigned long length, bool writable);
 void iopt_release_pages(struct kref *kref);
 static inline void iopt_put_pages(struct iopt_pages *pages)
 {
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index 270e8ea..2d25ba0 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -2,6 +2,7 @@
 /*
  * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
  */
+#include <linux/file.h>
 #include <linux/interval_tree.h>
 #include <linux/iommufd.h>
 #include <linux/iommu.h>
@@ -199,7 +200,44 @@ static int conv_iommu_prot(u32 map_flags)
 
 int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
 {
-	return -ENOTTY;
+	struct iommu_ioas_map_file *cmd = ucmd->cmd;
+	unsigned long iova = cmd->iova;
+	struct iommufd_ioas *ioas;
+	unsigned int flags = 0;
+	int rc;
+	struct file *file;
+	unsigned long end;
+
+	if (cmd->iova >= ULONG_MAX || cmd->length >= ULONG_MAX)
+		return -EOVERFLOW;
+
+	if (check_add_overflow(cmd->start, cmd->length - 1, &end))
+		return -EOVERFLOW;
+
+	ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+	if (IS_ERR(ioas))
+		return PTR_ERR(ioas);
+
+	if (!(cmd->flags & IOMMU_IOAS_MAP_FIXED_IOVA))
+		flags = IOPT_ALLOC_IOVA;
+
+	file = fget(cmd->fd);
+	if (!file)
+		return -EBADF;
+
+	rc = iopt_map_file_pages(ucmd->ictx, &ioas->iopt, &iova,
+				 file, cmd->start,
+				 cmd->length,
+				 conv_iommu_prot(cmd->flags), flags);
+	if (rc)
+		goto out_put;
+
+	cmd->iova = iova;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+	iommufd_put_object(ucmd->ictx, &ioas->obj);
+	fput(file);
+	return rc;
 }
 
 int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 21ec075..0ebb5bd 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -68,6 +68,11 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
 			unsigned long *iova, void __user *uptr,
 			unsigned long length, int iommu_prot,
 			unsigned int flags);
+int iopt_map_file_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
+			unsigned long *iova,
+			struct file *file, unsigned long start,
+			unsigned long length, int iommu_prot,
+			unsigned int flags);
 int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
 		   unsigned long length, unsigned long *dst_iova,
 		   int iommu_prot, unsigned int flags);
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index df5ba4f..0a432dc2 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -902,7 +902,8 @@ static int update_mm_locked_vm(struct iopt_pages *pages, unsigned long npages,
 		mmap_read_unlock(pages->source_mm);
 		user->locked = 0;
 		/* If we had the lock then we also have a get */
-	} else if ((!user || !user->upages) &&
+
+	} else if ((!user || (!user->upages && !user->ufolios)) &&
 		   pages->source_mm != current->mm) {
 		if (!mmget_not_zero(pages->source_mm))
 			return -EINVAL;
@@ -1257,6 +1258,22 @@ struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
 	return pages;
 }
 
+struct iopt_pages *iopt_alloc_file_pages(struct file *file, unsigned long start,
+					 unsigned long length, bool writable)
+
+{
+	struct iopt_pages *pages;
+	unsigned long start_down = ALIGN_DOWN(start, PAGE_SIZE);
+
+	pages = iopt_alloc_pages(length, start - start_down, writable);
+	if (IS_ERR(pages))
+		return pages;
+	pages->file = get_file(file);
+	pages->start = start_down;
+	pages->type = IOPT_ADDRESS_FILE;
+	return pages;
+}
+
 void iopt_release_pages(struct kref *kref)
 {
 	struct iopt_pages *pages = container_of(kref, struct iopt_pages, kref);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 6/9] iommufd: file mappings for mdev
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
                   ` (4 preceding siblings ...)
  2024-09-24 15:05 ` [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-10-01 20:09   ` Jason Gunthorpe
  2024-09-24 15:05 ` [PATCH V2 7/9] iommufd: pfn reader local variables Steve Sistare
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

Support file mappings for mediated devices, aka mdevs.  Access is initiated
by the vfio_pin_pages and vfio_dma_rw kernel interfaces.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/iommu/iommufd/pages.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 0a432dc2..9d31df4 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1725,11 +1725,11 @@ static int iopt_pages_fill_from_domain(struct iopt_pages *pages,
 	return 0;
 }
 
-static int iopt_pages_fill_from_mm(struct iopt_pages *pages,
-				   struct pfn_reader_user *user,
-				   unsigned long start_index,
-				   unsigned long last_index,
-				   struct page **out_pages)
+static int iopt_pages_fill(struct iopt_pages *pages,
+			   struct pfn_reader_user *user,
+			   unsigned long start_index,
+			   unsigned long last_index,
+			   struct page **out_pages)
 {
 	unsigned long cur_index = start_index;
 	int rc;
@@ -1803,8 +1803,8 @@ int iopt_pages_fill_xarray(struct iopt_pages *pages, unsigned long start_index,
 
 		/* hole */
 		cur_pages = out_pages + (span.start_hole - start_index);
-		rc = iopt_pages_fill_from_mm(pages, &user, span.start_hole,
-					     span.last_hole, cur_pages);
+		rc = iopt_pages_fill(pages, &user, span.start_hole,
+				     span.last_hole, cur_pages);
 		if (rc)
 			goto out_clean_xa;
 		rc = pages_to_xarray(&pages->pinned_pfns, span.start_hole,
@@ -1884,6 +1884,9 @@ static int iopt_pages_rw_page(struct iopt_pages *pages, unsigned long index,
 	struct page *page = NULL;
 	int rc;
 
+	if (WARN_ON(pages->type != IOPT_ADDRESS_USER))
+		return -EINVAL;
+
 	if (!mmget_not_zero(pages->source_mm))
 		return iopt_pages_rw_slow(pages, index, index, offset, data,
 					  length, flags);
@@ -1939,6 +1942,11 @@ int iopt_pages_rw_access(struct iopt_pages *pages, unsigned long start_byte,
 	if ((flags & IOMMUFD_ACCESS_RW_WRITE) && !pages->writable)
 		return -EPERM;
 
+	if (pages->type == IOPT_ADDRESS_FILE)
+		return iopt_pages_rw_slow(pages, start_index, last_index,
+					  start_byte % PAGE_SIZE, data, length,
+					  flags);
+
 	if (!(flags & IOMMUFD_ACCESS_RW_KTHREAD) && change_mm) {
 		if (start_index == last_index)
 			return iopt_pages_rw_page(pages, start_index,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 7/9] iommufd: pfn reader local variables
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
                   ` (5 preceding siblings ...)
  2024-09-24 15:05 ` [PATCH V2 6/9] iommufd: file mappings for mdev Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-10-02 17:29   ` Steven Sistare
  2024-09-24 15:05 ` [PATCH V2 8/9] iommufd: optimize file mapping Steve Sistare
  2024-09-24 15:05 ` [PATCH V2 9/9] iommufd: map file selftest Steve Sistare
  8 siblings, 1 reply; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

Add local variables for common sub-expressions needed by a subsequent
patch.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/iommu/iommufd/pages.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 9d31df4..6e11189 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1043,6 +1043,8 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
 {
 	struct interval_tree_double_span_iter *span = &pfns->span;
 	unsigned long start_index = pfns->batch_end_index;
+	struct pfn_reader_user *user = &pfns->user;
+	unsigned long npages;
 	struct iopt_area *area;
 	int rc;
 
@@ -1080,10 +1082,9 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
 			return rc;
 	}
 
-	batch_from_pages(&pfns->batch,
-			 pfns->user.upages +
-				 (start_index - pfns->user.upages_start),
-			 pfns->user.upages_end - start_index);
+	npages = user->upages_end - start_index;
+	start_index -= user->upages_start;
+	batch_from_pages(&pfns->batch, user->upages + start_index, npages);
 	return 0;
 }
 
@@ -1157,16 +1158,17 @@ static int pfn_reader_init(struct pfn_reader *pfns, struct iopt_pages *pages,
 static void pfn_reader_release_pins(struct pfn_reader *pfns)
 {
 	struct iopt_pages *pages = pfns->pages;
+	struct pfn_reader_user *user = &pfns->user;
+	unsigned long npages, start_index;
 
-	if (pfns->user.upages_end > pfns->batch_end_index) {
-		size_t npages = pfns->user.upages_end - pfns->batch_end_index;
-
+	if (user->upages_end > pfns->batch_end_index) {
 		/* Any pages not transferred to the batch are just unpinned */
-		unpin_user_pages(pfns->user.upages + (pfns->batch_end_index -
-						      pfns->user.upages_start),
-				 npages);
+
+		npages = user->upages_end - pfns->batch_end_index;
+		start_index = pfns->batch_end_index - user->upages_start;
+		unpin_user_pages(user->upages + start_index, npages);
 		iopt_pages_sub_npinned(pages, npages);
-		pfns->user.upages_end = pfns->batch_end_index;
+		user->upages_end = pfns->batch_end_index;
 	}
 	if (pfns->batch_start_index != pfns->batch_end_index) {
 		pfn_reader_unpin(pfns);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 8/9] iommufd: optimize file mapping
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
                   ` (6 preceding siblings ...)
  2024-09-24 15:05 ` [PATCH V2 7/9] iommufd: pfn reader local variables Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-10-01 21:03   ` Jason Gunthorpe
  2024-09-24 15:05 ` [PATCH V2 9/9] iommufd: map file selftest Steve Sistare
  8 siblings, 1 reply; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

When filling a batch, avoid the intermediate step of expanding folios into
upages[] in pin_memfd_pages, via the new helper functions batch_from_folios
and batch_add_pfn_num.

However, we must still expand to upages for the iopt_pages_fill path.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/iommu/iommufd/pages.c | 129 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 113 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 6e11189..8aeccb3 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -347,26 +347,34 @@ static void batch_destroy(struct pfn_batch *batch, void *backup)
 }
 
 /* true if the pfn was added, false otherwise */
-static bool batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
+static bool batch_add_pfn_num(struct pfn_batch *batch, unsigned long pfn,
+			      unsigned long nr)
 {
 	const unsigned int MAX_NPFNS = type_max(typeof(*batch->npfns));
+	unsigned long max_npfns = MAX_NPFNS - nr;
 
 	if (batch->end &&
 	    pfn == batch->pfns[batch->end - 1] + batch->npfns[batch->end - 1] &&
-	    batch->npfns[batch->end - 1] != MAX_NPFNS) {
-		batch->npfns[batch->end - 1]++;
-		batch->total_pfns++;
+	    batch->npfns[batch->end - 1] <= max_npfns) {
+		batch->npfns[batch->end - 1] += nr;
+		batch->total_pfns += nr;
 		return true;
 	}
 	if (batch->end == batch->array_size)
 		return false;
-	batch->total_pfns++;
+	batch->total_pfns += nr;
 	batch->pfns[batch->end] = pfn;
-	batch->npfns[batch->end] = 1;
+	batch->npfns[batch->end] = nr;
 	batch->end++;
 	return true;
 }
 
+/* true if the pfn was added, false otherwise */
+static bool batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
+{
+	return batch_add_pfn_num(batch, pfn, 1);
+}
+
 /*
  * Fill the batch with pfns from the domain. When the batch is full, or it
  * reaches last_index, the function will return. The caller should use
@@ -622,6 +630,67 @@ static void batch_from_pages(struct pfn_batch *batch, struct page **pages,
 			break;
 }
 
+static void batch_from_folios(struct pfn_batch *batch, struct folio **folios,
+			      unsigned long offset, unsigned long npages)
+{
+	unsigned long nr, pfn, i = 0;
+	struct folio *folio;
+
+	while (npages) {
+		folio = folios[i++];
+		nr = folio_nr_pages(folio) - offset;
+		nr = min_t(unsigned long, nr, npages);
+		pfn = page_to_pfn(folio_page(folio, offset));
+		batch_add_pfn_num(batch, pfn, nr);
+		offset = 0;
+		npages -= nr;
+	}
+}
+
+/*
+ * Return the folio containing the page which is @start_index pages beyond
+ * page number @offset in folios[0].  Return the index of that page in
+ * @offset_out.
+ */
+static struct folio **folios_start(struct folio **folios, unsigned long offset,
+				   unsigned long start_index,
+				   unsigned long *offset_out)
+{
+	unsigned long nr = folio_nr_pages(*folios);
+
+	while (offset + start_index > nr) {
+		start_index -= (nr - offset);
+		offset = 0;
+		folios++;
+		nr = folio_nr_pages(*folios);
+	}
+
+	*offset_out = offset + start_index;
+	return folios;
+}
+
+static void folios_unpin_partial(struct folio **folios, unsigned long offset,
+				 unsigned long npages)
+{
+	unsigned long nr, j, i = 0;
+	struct folio *folio;
+
+	while (npages) {
+		folio = folios[i++];
+		nr = folio_nr_pages(folio);
+		if (offset == 0 && nr < npages) {
+			unpin_folio(folio);
+		} else {
+			nr = min_t(unsigned long, npages, nr - offset);
+			for (j = 0; j < nr; j++)
+				unpin_user_page(folio_page(folio, offset + j));
+			offset = 0;
+		}
+		npages -= nr;
+	}
+
+}
+
 static void batch_unpin(struct pfn_batch *batch, struct iopt_pages *pages,
 			unsigned int first_page_off, size_t npages)
 {
@@ -708,6 +777,8 @@ struct pfn_reader_user {
 	struct file *file;
 	struct folio **ufolios;
 	unsigned long ufolios_len;
+	unsigned long ufolios_offset;
+	bool ufolios_huge;
 };
 
 static void pfn_reader_user_init(struct pfn_reader_user *user,
@@ -725,6 +796,8 @@ static void pfn_reader_user_init(struct pfn_reader_user *user,
 	user->file = (pages->type == IOPT_ADDRESS_FILE) ? pages->file : NULL;
 	user->ufolios = NULL;
 	user->ufolios_len = 0;
+	user->ufolios_huge = 0;
+	user->ufolios_offset = 0;
 }
 
 static void pfn_reader_user_destroy(struct pfn_reader_user *user,
@@ -762,6 +835,7 @@ static long pin_memfd_pages(struct pfn_reader_user *user,
 		return nfolios;
 
 	offset >>= PAGE_SHIFT;
+	user->ufolios_offset = offset;
 	npages_out = 0;
 
 	for (i = 0; i < nfolios; i++) {
@@ -770,9 +844,11 @@ static long pin_memfd_pages(struct pfn_reader_user *user,
 		npin = min(nr - offset, npages);
 		if (nr > 1) {
 			folio_split_user_page_pin(folio, npin);
+			user->ufolios_huge = true;
 		}
-		for (j = offset; j < offset + npin; j++)
-			*upages++ = folio_page(folio, j);
+		if (upages)
+			for (j = offset; j < offset + npin; j++)
+				*upages++ = folio_page(folio, j);
 		npages -= npin;
 		npages_out += npin;
 		offset = 0;
@@ -796,7 +872,7 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
 	    WARN_ON(last_index < start_index))
 		return -EINVAL;
 
-	if (!user->upages) {
+	if (!user->file && !user->upages) {
 		/* All undone in pfn_reader_destroy() */
 		user->upages_len = npages * sizeof(*user->upages);
 		user->upages = temp_kmalloc(&user->upages_len, NULL, 0);
@@ -809,11 +885,6 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
 		user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
 		if (!user->ufolios)
 			return -ENOMEM;
-
-		/* Bail for now.  Be more robust when we optimize for folios. */
-		if (user->ufolios_len / sizeof(*user->ufolios) <
-		    user->upages_len / sizeof(*user->upages))
-			return -ENOMEM;
 	}
 
 	if (!user->file && user->locked == -1) {
@@ -1045,6 +1116,8 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
 	unsigned long start_index = pfns->batch_end_index;
 	struct pfn_reader_user *user = &pfns->user;
 	unsigned long npages;
+	unsigned long offset;
+	struct folio **folios;
 	struct iopt_area *area;
 	int rc;
 
@@ -1084,7 +1157,18 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
 
 	npages = user->upages_end - start_index;
 	start_index -= user->upages_start;
-	batch_from_pages(&pfns->batch, user->upages + start_index, npages);
+
+	if (!user->file) {
+		batch_from_pages(&pfns->batch, user->upages + start_index,
+				 npages);
+	} else if (!user->ufolios_huge) {
+		batch_from_folios(&pfns->batch, user->ufolios + start_index, 0,
+				  npages);
+	} else {
+		folios = folios_start(user->ufolios, user->ufolios_offset,
+				      start_index, &offset);
+		batch_from_folios(&pfns->batch, folios, offset, npages);
+	}
 	return 0;
 }
 
@@ -1160,13 +1244,26 @@ static void pfn_reader_release_pins(struct pfn_reader *pfns)
 	struct iopt_pages *pages = pfns->pages;
 	struct pfn_reader_user *user = &pfns->user;
 	unsigned long npages, start_index;
+	unsigned long offset;
+	struct folio **folios;
 
 	if (user->upages_end > pfns->batch_end_index) {
 		/* Any pages not transferred to the batch are just unpinned */
 
 		npages = user->upages_end - pfns->batch_end_index;
 		start_index = pfns->batch_end_index - user->upages_start;
-		unpin_user_pages(user->upages + start_index, npages);
+
+		if (!user->file) {
+			unpin_user_pages(user->upages + start_index, npages);
+		} else if (!user->ufolios_huge) {
+			unpin_folios(user->ufolios + start_index, npages);
+		} else {
+			folios = folios_start(user->ufolios,
+					      user->ufolios_offset,
+					      start_index, &offset);
+			folios_unpin_partial(folios, offset, npages);
+		}
+
 		iopt_pages_sub_npinned(pages, npages);
 		user->upages_end = pfns->batch_end_index;
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH V2 9/9] iommufd: map file selftest
  2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
                   ` (7 preceding siblings ...)
  2024-09-24 15:05 ` [PATCH V2 8/9] iommufd: optimize file mapping Steve Sistare
@ 2024-09-24 15:05 ` Steve Sistare
  2024-10-01 21:05   ` Jason Gunthorpe
  8 siblings, 1 reply; 38+ messages in thread
From: Steve Sistare @ 2024-09-24 15:05 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare

Add test cases to exercise IOMMU_IOAS_MAP_FILE.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 156 +++++++++++++++++++++++---
 tools/testing/selftests/iommu/iommufd_utils.h |  41 +++++++
 2 files changed, 182 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 4927b9a..b1a3acc 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -3,6 +3,8 @@
 #include <stdlib.h>
 #include <sys/mman.h>
 #include <sys/eventfd.h>
+#include <sys/syscall.h>
+#include <asm/unistd.h>
 
 #define __EXPORTED_HEADERS__
 #include <linux/vfio.h>
@@ -10,6 +12,8 @@
 #include "iommufd_utils.h"
 
 static unsigned long HUGEPAGE_SIZE;
+static void *mfd_buffer;
+static int mfd;
 
 #define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
 #define MOCK_HUGE_PAGE_SIZE (512 * MOCK_PAGE_SIZE)
@@ -33,6 +37,24 @@ static unsigned long get_huge_page_size(void)
 	return strtoul(buf, NULL, 10);
 }
 
+static int memfd_create(const char *name, unsigned int flags)
+{
+	return syscall(__NR_memfd_create, name, flags);
+}
+
+static void *memfd_mmap(size_t length, int prot, int flags, int *mfd_p)
+{
+	int mfd_flags = (flags & MAP_HUGETLB) ? MFD_HUGETLB : 0;
+	int mfd = memfd_create("buffer", mfd_flags);
+
+	if (mfd <= 0)
+		return MAP_FAILED;
+	if (ftruncate(mfd, length))
+		return MAP_FAILED;
+	*mfd_p = mfd;
+	return mmap(0, length, prot, flags, mfd, 0);
+}
+
 static __attribute__((constructor)) void setup_sizes(void)
 {
 	void *vrc;
@@ -49,6 +71,9 @@ static __attribute__((constructor)) void setup_sizes(void)
 	vrc = mmap(buffer, BUFFER_SIZE, PROT_READ | PROT_WRITE,
 		   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
 	assert(vrc == buffer);
+
+	mfd_buffer = memfd_mmap(BUFFER_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
+				&mfd);
 }
 
 FIXTURE(iommufd)
@@ -1372,6 +1397,7 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 {
 	unsigned int mock_domains;
 	bool hugepages;
+	bool file;
 };
 
 FIXTURE_SETUP(iommufd_mock_domain)
@@ -1410,26 +1436,45 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 {
 	.mock_domains = 1,
 	.hugepages = false,
+	.file = false,
 };
 
 FIXTURE_VARIANT_ADD(iommufd_mock_domain, two_domains)
 {
 	.mock_domains = 2,
 	.hugepages = false,
+	.file = false,
 };
 
 FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_hugepage)
 {
 	.mock_domains = 1,
 	.hugepages = true,
+	.file = false,
 };
 
 FIXTURE_VARIANT_ADD(iommufd_mock_domain, two_domains_hugepage)
 {
 	.mock_domains = 2,
 	.hugepages = true,
+	.file = false,
 };
 
+FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_file)
+{
+	.mock_domains = 1,
+	.hugepages = false,
+	.file = true,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_file_hugepage)
+{
+	.mock_domains = 1,
+	.hugepages = true,
+	.file = true,
+};
+
+
 /* Have the kernel check that the user pages made it to the iommu_domain */
 #define check_mock_iova(_ptr, _iova, _length)                                \
 	({                                                                   \
@@ -1455,7 +1500,10 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 		}                                                            \
 	})
 
-TEST_F(iommufd_mock_domain, basic)
+static void test_basic_mmap(
+	struct __test_metadata *_metadata,
+	struct _test_data_iommufd_mock_domain *self,
+	const struct _fixture_variant_iommufd_mock_domain *variant)
 {
 	size_t buf_size = self->mmap_buf_size;
 	uint8_t *buf;
@@ -1478,12 +1526,52 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 	test_err_ioctl_ioas_map(EFAULT, buf, buf_size, &iova);
 }
 
+static void test_basic_file(
+	struct __test_metadata *_metadata,
+	struct _test_data_iommufd_mock_domain *self,
+	const struct _fixture_variant_iommufd_mock_domain *variant)
+{
+	size_t buf_size = self->mmap_buf_size;
+	uint8_t *buf;
+	__u64 iova;
+	int mfd_tmp;
+	int prot = PROT_READ | PROT_WRITE;
+
+	/* Simple one page map */
+	test_ioctl_ioas_map_file(mfd, 0, PAGE_SIZE, &iova);
+	check_mock_iova(mfd_buffer, iova, PAGE_SIZE);
+
+	buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd_tmp);
+	ASSERT_NE(MAP_FAILED, buf);
+
+	/* EFAULT half way through mapping */
+	ASSERT_EQ(0, munmap(buf + buf_size / 2, buf_size / 2));
+	test_err_ioctl_ioas_map_file(EFAULT, 0, buf_size, &iova);
+
+	/* EFAULT on first page */
+	ASSERT_EQ(0, munmap(buf, buf_size / 2));
+	test_err_ioctl_ioas_map_file(EFAULT, 0, buf_size, &iova);
+
+	close(mfd_tmp);
+}
+
+TEST_F(iommufd_mock_domain, basic)
+{
+	if (variant->file)
+		test_basic_file(_metadata, self, variant);
+	else
+		test_basic_mmap(_metadata, self, variant);
+}
+
 TEST_F(iommufd_mock_domain, ro_unshare)
 {
 	uint8_t *buf;
 	__u64 iova;
 	int fd;
 
+	if (variant->file)
+		SKIP(return, "redundant test");
+
 	fd = open("/proc/self/exe", O_RDONLY);
 	ASSERT_NE(-1, fd);
 
@@ -1513,9 +1601,13 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 	unsigned int start;
 	unsigned int end;
 	uint8_t *buf;
+	int prot = PROT_READ | PROT_WRITE;
+	int mfd;
 
-	buf = mmap(0, buf_size, PROT_READ | PROT_WRITE, self->mmap_flags, -1,
-		   0);
+	if (variant->file)
+		buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd);
+	else
+		buf = mmap(0, buf_size, prot, self->mmap_flags, -1, 0);
 	ASSERT_NE(MAP_FAILED, buf);
 	check_refs(buf, buf_size, 0);
 
@@ -1532,7 +1624,12 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 			size_t length = end - start;
 			__u64 iova;
 
-			test_ioctl_ioas_map(buf + start, length, &iova);
+			if (variant->file) {
+				test_ioctl_ioas_map_file(mfd, start, length,
+							 &iova);
+			} else {
+				test_ioctl_ioas_map(buf + start, length, &iova);
+			}
 			check_mock_iova(buf + start, iova, length);
 			check_refs(buf + start / PAGE_SIZE * PAGE_SIZE,
 				   end / PAGE_SIZE * PAGE_SIZE -
@@ -1544,6 +1641,8 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 	}
 	check_refs(buf, buf_size, 0);
 	ASSERT_EQ(0, munmap(buf, buf_size));
+	if (variant->file)
+		close(mfd);
 }
 
 TEST_F(iommufd_mock_domain, all_aligns_copy)
@@ -1554,9 +1653,13 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 	unsigned int start;
 	unsigned int end;
 	uint8_t *buf;
+	int prot = PROT_READ | PROT_WRITE;
+	int mfd;
 
-	buf = mmap(0, buf_size, PROT_READ | PROT_WRITE, self->mmap_flags, -1,
-		   0);
+	if (variant->file)
+		buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd);
+	else
+		buf = mmap(0, buf_size, prot, self->mmap_flags, -1, 0);
 	ASSERT_NE(MAP_FAILED, buf);
 	check_refs(buf, buf_size, 0);
 
@@ -1575,7 +1678,12 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 			uint32_t mock_stdev_id;
 			__u64 iova;
 
-			test_ioctl_ioas_map(buf + start, length, &iova);
+			if (variant->file) {
+				test_ioctl_ioas_map_file(mfd, start, length,
+							 &iova);
+			} else {
+				test_ioctl_ioas_map(buf + start, length, &iova);
+			}
 
 			/* Add and destroy a domain while the area exists */
 			old_id = self->hwpt_ids[1];
@@ -1596,15 +1704,18 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 	}
 	check_refs(buf, buf_size, 0);
 	ASSERT_EQ(0, munmap(buf, buf_size));
+	if (variant->file)
+		close(mfd);
 }
 
 TEST_F(iommufd_mock_domain, user_copy)
 {
+	void *buf = variant->file ? mfd_buffer : buffer;
 	struct iommu_test_cmd access_cmd = {
 		.size = sizeof(access_cmd),
 		.op = IOMMU_TEST_OP_ACCESS_PAGES,
 		.access_pages = { .length = BUFFER_SIZE,
-				  .uptr = (uintptr_t)buffer },
+				  .uptr = (uintptr_t)buf },
 	};
 	struct iommu_ioas_copy copy_cmd = {
 		.size = sizeof(copy_cmd),
@@ -1623,9 +1734,13 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 
 	/* Pin the pages in an IOAS with no domains then copy to an IOAS with domains */
 	test_ioctl_ioas_alloc(&ioas_id);
-	test_ioctl_ioas_map_id(ioas_id, buffer, BUFFER_SIZE,
-			       &copy_cmd.src_iova);
-
+	if (variant->file) {
+		test_ioctl_ioas_map_id_file(ioas_id, mfd, 0, BUFFER_SIZE,
+					    &copy_cmd.src_iova);
+	} else {
+		test_ioctl_ioas_map_id(ioas_id, buf, BUFFER_SIZE,
+				       &copy_cmd.src_iova);
+	}
 	test_cmd_create_access(ioas_id, &access_cmd.id,
 			       MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES);
 
@@ -1635,12 +1750,17 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 			&access_cmd));
 	copy_cmd.src_ioas_id = ioas_id;
 	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, &copy_cmd));
-	check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
+	check_mock_iova(buf, MOCK_APERTURE_START, BUFFER_SIZE);
 
 	/* Now replace the ioas with a new one */
 	test_ioctl_ioas_alloc(&new_ioas_id);
-	test_ioctl_ioas_map_id(new_ioas_id, buffer, BUFFER_SIZE,
-			       &copy_cmd.src_iova);
+	if (variant->file) {
+		test_ioctl_ioas_map_id_file(new_ioas_id, mfd, 0, BUFFER_SIZE,
+					    &copy_cmd.src_iova);
+	} else {
+		test_ioctl_ioas_map_id(new_ioas_id, buf, BUFFER_SIZE,
+				       &copy_cmd.src_iova);
+	}
 	test_cmd_access_replace_ioas(access_cmd.id, new_ioas_id);
 
 	/* Destroy the old ioas and cleanup copied mapping */
@@ -1654,7 +1774,7 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 			&access_cmd));
 	copy_cmd.src_ioas_id = new_ioas_id;
 	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, &copy_cmd));
-	check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
+	check_mock_iova(buf, MOCK_APERTURE_START, BUFFER_SIZE);
 
 	test_cmd_destroy_access_pages(
 		access_cmd.id, access_cmd.access_pages.out_access_pages_id);
@@ -1667,6 +1787,9 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 {
 	uint32_t ioas_id;
 
+	if (variant->file)
+		SKIP(return, "redundant test");
+
 	test_ioctl_ioas_alloc(&ioas_id);
 
 	test_cmd_mock_domain_replace(self->stdev_ids[0], ioas_id);
@@ -1697,6 +1820,9 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
 {
 	int i;
 
+	if (variant->file)
+		SKIP(return, "redundant test");
+
 	for (i = 0; i != variant->mock_domains; i++) {
 		uint32_t hwpt_id[2];
 		uint32_t stddev_id;
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 40f6f14..7bc664e 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -589,6 +589,47 @@ static int _test_ioctl_ioas_unmap(int fd, unsigned int ioas_id, uint64_t iova,
 	EXPECT_ERRNO(_errno, _test_ioctl_ioas_unmap(self->fd, self->ioas_id, \
 						    iova, length, NULL))
 
+static int _test_ioctl_ioas_map_file(int fd, unsigned int ioas_id, int mfd,
+				     size_t start, size_t length, __u64 *iova,
+				     unsigned int flags)
+{
+	struct iommu_ioas_map_file cmd = {
+		.size = sizeof(cmd),
+		.flags = flags,
+		.ioas_id = ioas_id,
+		.fd = mfd,
+		.start = start,
+		.length = length,
+	};
+	int ret;
+
+	if (flags & IOMMU_IOAS_MAP_FIXED_IOVA)
+		cmd.iova = *iova;
+
+	ret = ioctl(fd, IOMMU_IOAS_MAP_FILE, &cmd);
+	*iova = cmd.iova;
+	return ret;
+}
+
+#define test_ioctl_ioas_map_file(mfd, start, length, iova_p)                 \
+	ASSERT_EQ(0, _test_ioctl_ioas_map_file(self->fd, self->ioas_id, mfd, \
+					       start, length, iova_p,        \
+					       IOMMU_IOAS_MAP_WRITEABLE |    \
+						   IOMMU_IOAS_MAP_READABLE))
+
+#define test_err_ioctl_ioas_map_file(_errno, start, length, iova_p)	     \
+	EXPECT_ERRNO(_errno,                                                 \
+		     _test_ioctl_ioas_map_file(self->fd, self->ioas_id, mfd, \
+					       start, length, iova_p,        \
+					       IOMMU_IOAS_MAP_WRITEABLE |    \
+						   IOMMU_IOAS_MAP_READABLE))
+
+#define test_ioctl_ioas_map_id_file(ioas_id, mfd, start, length, iova_p)     \
+	ASSERT_EQ(0, _test_ioctl_ioas_map_file(self->fd, ioas_id, mfd,       \
+					  start, length, iova_p,             \
+					  IOMMU_IOAS_MAP_WRITEABLE |         \
+						  IOMMU_IOAS_MAP_READABLE))
+
 static int _test_ioctl_set_temp_memory_limit(int fd, unsigned int limit)
 {
 	struct iommu_test_cmd memlimit_cmd = {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 2/9] iommufd: generalize iopt_pages address
  2024-09-24 15:05 ` [PATCH V2 2/9] iommufd: generalize iopt_pages address Steve Sistare
@ 2024-09-24 20:31   ` Nicolin Chen
  2024-09-25 22:43     ` Jason Gunthorpe
  2024-10-01 19:54   ` Jason Gunthorpe
  1 sibling, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2024-09-24 20:31 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Jason Gunthorpe, Kevin Tian

Hi Steve,

Some nitpickings.

On Tue, Sep 24, 2024 at 08:05:31AM -0700, Steve Sistare wrote:
> +static int iopt_map_common(struct iommufd_ctx *ictx,
> +			     struct io_pagetable *iopt,
> +                          struct iopt_pages *pages,
> +                          unsigned long *iova,
> +                          unsigned long length,
> +                          unsigned long start_byte,
> +                          int iommu_prot,
> +                          unsigned int flags)

I think we should follow the existing coding style by combining
lines that could fit within 80 characters each.

>  /**
>   * iopt_map_user_pages() - Map a user VA to an iova in the io page table
>   * @ictx: iommufd_ctx the iopt is part of
> @@ -399,29 +431,15 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
>                         unsigned long length, int iommu_prot,
>                         unsigned int flags)
>  {
[...]
> -       return 0;
> +       return iopt_map_common(ictx, iopt, pages, iova, length,
> +                              uptr - pages->uptr,
> +                              iommu_prot, flags);

The last line could fit into the previous one.

> +enum iopt_address_type {
> +       IOPT_ADDRESS_USER = 0,
> +};
> +
>  /*
>   * This holds a pinned page list for multiple areas of IO address space. The
>   * pages always originate from a linear chunk of userspace VA. Multiple
> @@ -195,7 +199,10 @@ struct iopt_pages {
>         struct task_struct *source_task;
>         struct mm_struct *source_mm;
>         struct user_struct *source_user;
> -       void __user *uptr;
> +       enum iopt_address_type type;
> +       union {
> +               void __user *uptr;              /* IOPT_ADDRESS_USER */
> +       };
>         bool writable:1;
>         u8 account_mode;

We only have one line in the kdoc mentioning the "address" word.
So, I personally prefer "iopt_pages_type" and "IOPT_PAGES_USER",
matching with existing functions iopt_alloc/map_user_pages also.

> +struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
> +                                        unsigned long length, bool writable)
> +{
> +       struct iopt_pages *pages;
> +       unsigned long end;
> +       void __user *uptr_down = (void __user *) ALIGN_DOWN((uintptr_t)uptr, PAGE_SIZE);

+	void __user *uptr_aligned =
+		(void __user *) ALIGN_DOWN((uintptr_t)uptr, PAGE_SIZE);
+       struct iopt_pages *pages;
+       unsigned long end;

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 3/9] iommufd: pfn reader for file mappings
  2024-09-24 15:05 ` [PATCH V2 3/9] iommufd: pfn reader for file mappings Steve Sistare
@ 2024-09-24 20:53   ` Nicolin Chen
  2024-09-26 15:59     ` Steven Sistare
  2024-09-25  1:32   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2024-09-24 20:53 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Jason Gunthorpe, Kevin Tian

On Tue, Sep 24, 2024 at 08:05:32AM -0700, Steve Sistare wrote:

> diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
> index 69822d4..df5ba4f 100644
> --- a/drivers/iommu/iommufd/pages.c
> +++ b/drivers/iommu/iommufd/pages.c
> @@ -703,19 +703,28 @@ struct pfn_reader_user {
>          * neither
>          */
>         int locked;
> +
> +       /* The following are only valid if file != NULL. */
> +       struct file *file;
> +       struct folio **ufolios;
> +       unsigned long ufolios_len;
>  };

I am not 100% sure. Yet, given the comments at pfn_reader_user:
	/* pfn_reader_user is just the pin_user_pages() path */
which FILE doesn't use, I think we might need a pfn_reader_file
with own set of helpers:
	pfn_reader_file_init
	pfn_reader_file_destroy
	pfn_reader_file_pin
Then their callers should switch-case pages->type to select the
corresponding type of pfn_reader functions.

Maybe USER and FILE don't diff that much. But considering we'll
next add a PHYS type and none of them makes sense to PHYS, there
could be another layer of cleanup at these pfn_reader functions,
v.s. stuffing all FILE-related lines into pfn_reader_user.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation
  2024-09-24 15:05 ` [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation Steve Sistare
@ 2024-09-24 21:08   ` Nicolin Chen
  2024-09-26 15:50     ` Steven Sistare
  2024-10-01 20:05   ` Jason Gunthorpe
  1 sibling, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2024-09-24 21:08 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Jason Gunthorpe, Kevin Tian

On Tue, Sep 24, 2024 at 08:05:34AM -0700, Steve Sistare wrote:
> diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
> index a0a66f1..1e14a0f 100644
> --- a/drivers/iommu/iommufd/io_pagetable.c
> +++ b/drivers/iommu/iommufd/io_pagetable.c
> @@ -259,7 +259,10 @@ static int iopt_alloc_area_pages(struct io_pagetable *iopt,
>                 /* Use the first entry to guess the ideal IOVA alignment */
>                 elm = list_first_entry(pages_list, struct iopt_pages_list,
>                                        next);
> -               start = elm->start_byte + (uintptr_t)elm->pages->uptr;
> +               if (elm->pages->type == IOPT_ADDRESS_FILE)
> +                       start = elm->start_byte + elm->pages->start;
> +               else
> +                       start = elm->start_byte + (uintptr_t)elm->pages->uptr;

	switch (elm->pages->type) {
	case IOPT_ADDRESS_USER: ...
	case IOPT_ADDRESS_FILE: ...
	}

Would make it easier for us to add other types.
 
> +/**
> + * iopt_map_file_pages() - Like iopt_map_user_pages, but map a file.

Could we just write "Map a user FILE to an..."? It's not that long :)

>  int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
>  {
[...]
> +       rc = iopt_map_file_pages(ucmd->ictx, &ioas->iopt, &iova,
> +                                file, cmd->start,
> +                                cmd->length,

We could merge cmd->length to the previous line.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 3/9] iommufd: pfn reader for file mappings
  2024-09-24 15:05 ` [PATCH V2 3/9] iommufd: pfn reader for file mappings Steve Sistare
  2024-09-24 20:53   ` Nicolin Chen
@ 2024-09-25  1:32   ` kernel test robot
  2024-09-25  1:55   ` kernel test robot
  2024-09-26 14:10   ` Steven Sistare
  3 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2024-09-25  1:32 UTC (permalink / raw)
  To: Steve Sistare, iommu
  Cc: oe-kbuild-all, Jason Gunthorpe, Kevin Tian, Nicolin Chen,
	Steve Sistare

Hi Steve,

kernel test robot noticed the following build errors:

[auto build test ERROR on shuah-kselftest/next]
[also build test ERROR on shuah-kselftest/fixes linus/master v6.11 next-20240924]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steve-Sistare/iommufd-rename-uptr-in-iopt_alloc_iova/20240924-231223
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
patch link:    https://lore.kernel.org/r/1727190338-385692-4-git-send-email-steven.sistare%40oracle.com
patch subject: [PATCH V2 3/9] iommufd: pfn reader for file mappings
config: arc-randconfig-001-20240925 (https://download.01.org/0day-ci/archive/20240925/202409250945.31YjGALI-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240925/202409250945.31YjGALI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409250945.31YjGALI-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/iommu/iommufd/pages.c: In function 'pin_memfd_pages':
   drivers/iommu/iommufd/pages.c:772:25: error: implicit declaration of function 'folio_split_user_page_pin' [-Werror=implicit-function-declaration]
     772 |                         folio_split_user_page_pin(folio, npin);
         |                         ^~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/iommu/iommufd/pages.c: In function 'pfn_reader_user_pin':
>> drivers/iommu/iommufd/pages.c:809:46: error: passing argument 1 of 'temp_kmalloc' from incompatible pointer type [-Werror=incompatible-pointer-types]
     809 |                 user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
         |                                              ^~~~~~~~~~~~~~~~~~
         |                                              |
         |                                              long unsigned int *
   drivers/iommu/iommufd/pages.c:74:35: note: expected 'size_t *' {aka 'unsigned int *'} but argument is of type 'long unsigned int *'
      74 | static void *temp_kmalloc(size_t *size, void *backup, size_t backup_len)
         |                           ~~~~~~~~^~~~
   cc1: some warnings being treated as errors


vim +/temp_kmalloc +809 drivers/iommu/iommufd/pages.c

   783	
   784	static int pfn_reader_user_pin(struct pfn_reader_user *user,
   785				       struct iopt_pages *pages,
   786				       unsigned long start_index,
   787				       unsigned long last_index)
   788	{
   789		bool remote_mm = pages->source_mm != current->mm;
   790		unsigned long npages = last_index - start_index + 1;
   791		unsigned long start, unum;
   792		uintptr_t uptr;
   793		long rc;
   794	
   795		if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
   796		    WARN_ON(last_index < start_index))
   797			return -EINVAL;
   798	
   799		if (!user->upages) {
   800			/* All undone in pfn_reader_destroy() */
   801			user->upages_len = npages * sizeof(*user->upages);
   802			user->upages = temp_kmalloc(&user->upages_len, NULL, 0);
   803			if (!user->upages)
   804				return -ENOMEM;
   805		}
   806	
   807		if (user->file && !user->ufolios) {
   808			user->ufolios_len = npages * sizeof(*user->ufolios);
 > 809			user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
   810			if (!user->ufolios)
   811				return -ENOMEM;
   812	
   813			/* Bail for now.  Be more robust when we optimize for folios. */
   814			if (user->ufolios_len / sizeof(*user->ufolios) <
   815			    user->upages_len / sizeof(*user->upages))
   816				return -ENOMEM;
   817		}
   818	
   819		if (!user->file && user->locked == -1) {
   820			/*
   821			 * The majority of usages will run the map task within the mm
   822			 * providing the pages, so we can optimize into
   823			 * get_user_pages_fast()
   824			 */
   825			if (remote_mm) {
   826				if (!mmget_not_zero(pages->source_mm))
   827					return -EFAULT;
   828			}
   829			user->locked = 0;
   830		}
   831	
   832		unum = user->file ? user->ufolios_len / sizeof(*user->ufolios) :
   833				    user->upages_len / sizeof(*user->upages);
   834		npages = min_t(unsigned long, npages, unum);
   835	
   836		if (iommufd_should_fail())
   837			return -EFAULT;
   838	
   839		if (user->file) {
   840			start = pages->start + (start_index * PAGE_SIZE);
   841			rc = pin_memfd_pages(user, start, npages);
   842		} else if (!remote_mm) {
   843			uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
   844			rc = pin_user_pages_fast(uptr, npages, user->gup_flags,
   845						 user->upages);
   846		} else {
   847			uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
   848			if (!user->locked) {
   849				mmap_read_lock(pages->source_mm);
   850				user->locked = 1;
   851			}
   852			rc = pin_user_pages_remote(pages->source_mm, uptr, npages,
   853						   user->gup_flags, user->upages,
   854						   &user->locked);
   855		}
   856		if (rc <= 0) {
   857			if (WARN_ON(!rc))
   858				return -EFAULT;
   859			return rc;
   860		}
   861		iopt_pages_add_npinned(pages, rc);
   862		user->upages_start = start_index;
   863		user->upages_end = start_index + rc;
   864		return 0;
   865	}
   866	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 3/9] iommufd: pfn reader for file mappings
  2024-09-24 15:05 ` [PATCH V2 3/9] iommufd: pfn reader for file mappings Steve Sistare
  2024-09-24 20:53   ` Nicolin Chen
  2024-09-25  1:32   ` kernel test robot
@ 2024-09-25  1:55   ` kernel test robot
  2024-09-26 14:10   ` Steven Sistare
  3 siblings, 0 replies; 38+ messages in thread
From: kernel test robot @ 2024-09-25  1:55 UTC (permalink / raw)
  To: Steve Sistare, iommu
  Cc: llvm, oe-kbuild-all, Jason Gunthorpe, Kevin Tian, Nicolin Chen,
	Steve Sistare

Hi Steve,

kernel test robot noticed the following build errors:

[auto build test ERROR on shuah-kselftest/next]
[also build test ERROR on shuah-kselftest/fixes linus/master v6.11 next-20240924]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steve-Sistare/iommufd-rename-uptr-in-iopt_alloc_iova/20240924-231223
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
patch link:    https://lore.kernel.org/r/1727190338-385692-4-git-send-email-steven.sistare%40oracle.com
patch subject: [PATCH V2 3/9] iommufd: pfn reader for file mappings
config: i386-buildonly-randconfig-001-20240925 (https://download.01.org/0day-ci/archive/20240925/202409250951.qvaDPgeE-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240925/202409250951.qvaDPgeE-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409250951.qvaDPgeE-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/iommu/iommufd/pages.c:772:4: error: call to undeclared function 'folio_split_user_page_pin'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     772 |                         folio_split_user_page_pin(folio, npin);
         |                         ^
>> drivers/iommu/iommufd/pages.c:809:32: error: incompatible pointer types passing 'unsigned long *' to parameter of type 'size_t *' (aka 'unsigned int *') [-Werror,-Wincompatible-pointer-types]
     809 |                 user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
         |                                              ^~~~~~~~~~~~~~~~~~
   drivers/iommu/iommufd/pages.c:74:35: note: passing argument to parameter 'size' here
      74 | static void *temp_kmalloc(size_t *size, void *backup, size_t backup_len)
         |                                   ^
   2 errors generated.


vim +809 drivers/iommu/iommufd/pages.c

   746	
   747	static long pin_memfd_pages(struct pfn_reader_user *user,
   748				    unsigned long start,
   749				    unsigned long npages)
   750	{
   751		unsigned long end, nr, i, j, npin, offset, npages_out;
   752		long nfolios;
   753		struct folio *folio;
   754		struct page **upages = user->upages;
   755	
   756		nfolios = user->ufolios_len / sizeof(*user->ufolios);
   757		end = start + (npages << PAGE_SHIFT) - 1;
   758	
   759		nfolios = memfd_pin_folios(user->file, start, end,
   760					   user->ufolios, nfolios, &offset);
   761		if (nfolios <= 0)
   762			return nfolios;
   763	
   764		offset >>= PAGE_SHIFT;
   765		npages_out = 0;
   766	
   767		for (i = 0; i < nfolios; i++) {
   768			folio = user->ufolios[i];
   769			nr = folio_nr_pages(folio);
   770			npin = min(nr - offset, npages);
   771			if (nr > 1) {
 > 772				folio_split_user_page_pin(folio, npin);
   773			}
   774			for (j = offset; j < offset + npin; j++)
   775				*upages++ = folio_page(folio, j);
   776			npages -= npin;
   777			npages_out += npin;
   778			offset = 0;
   779		}
   780	
   781		return npages_out;
   782	}
   783	
   784	static int pfn_reader_user_pin(struct pfn_reader_user *user,
   785				       struct iopt_pages *pages,
   786				       unsigned long start_index,
   787				       unsigned long last_index)
   788	{
   789		bool remote_mm = pages->source_mm != current->mm;
   790		unsigned long npages = last_index - start_index + 1;
   791		unsigned long start, unum;
   792		uintptr_t uptr;
   793		long rc;
   794	
   795		if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
   796		    WARN_ON(last_index < start_index))
   797			return -EINVAL;
   798	
   799		if (!user->upages) {
   800			/* All undone in pfn_reader_destroy() */
   801			user->upages_len = npages * sizeof(*user->upages);
   802			user->upages = temp_kmalloc(&user->upages_len, NULL, 0);
   803			if (!user->upages)
   804				return -ENOMEM;
   805		}
   806	
   807		if (user->file && !user->ufolios) {
   808			user->ufolios_len = npages * sizeof(*user->ufolios);
 > 809			user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
   810			if (!user->ufolios)
   811				return -ENOMEM;
   812	
   813			/* Bail for now.  Be more robust when we optimize for folios. */
   814			if (user->ufolios_len / sizeof(*user->ufolios) <
   815			    user->upages_len / sizeof(*user->upages))
   816				return -ENOMEM;
   817		}
   818	
   819		if (!user->file && user->locked == -1) {
   820			/*
   821			 * The majority of usages will run the map task within the mm
   822			 * providing the pages, so we can optimize into
   823			 * get_user_pages_fast()
   824			 */
   825			if (remote_mm) {
   826				if (!mmget_not_zero(pages->source_mm))
   827					return -EFAULT;
   828			}
   829			user->locked = 0;
   830		}
   831	
   832		unum = user->file ? user->ufolios_len / sizeof(*user->ufolios) :
   833				    user->upages_len / sizeof(*user->upages);
   834		npages = min_t(unsigned long, npages, unum);
   835	
   836		if (iommufd_should_fail())
   837			return -EFAULT;
   838	
   839		if (user->file) {
   840			start = pages->start + (start_index * PAGE_SIZE);
   841			rc = pin_memfd_pages(user, start, npages);
   842		} else if (!remote_mm) {
   843			uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
   844			rc = pin_user_pages_fast(uptr, npages, user->gup_flags,
   845						 user->upages);
   846		} else {
   847			uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
   848			if (!user->locked) {
   849				mmap_read_lock(pages->source_mm);
   850				user->locked = 1;
   851			}
   852			rc = pin_user_pages_remote(pages->source_mm, uptr, npages,
   853						   user->gup_flags, user->upages,
   854						   &user->locked);
   855		}
   856		if (rc <= 0) {
   857			if (WARN_ON(!rc))
   858				return -EFAULT;
   859			return rc;
   860		}
   861		iopt_pages_add_npinned(pages, rc);
   862		user->upages_start = start_index;
   863		user->upages_end = start_index + rc;
   864		return 0;
   865	}
   866	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 2/9] iommufd: generalize iopt_pages address
  2024-09-24 20:31   ` Nicolin Chen
@ 2024-09-25 22:43     ` Jason Gunthorpe
  2024-09-25 23:04       ` Nicolin Chen
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2024-09-25 22:43 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: Steve Sistare, iommu, Kevin Tian

On Tue, Sep 24, 2024 at 01:31:37PM -0700, Nicolin Chen wrote:
> On Tue, Sep 24, 2024 at 08:05:31AM -0700, Steve Sistare wrote:
> > +static int iopt_map_common(struct iommufd_ctx *ictx,
> > +			     struct io_pagetable *iopt,
> > +                          struct iopt_pages *pages,
> > +                          unsigned long *iova,
> > +                          unsigned long length,
> > +                          unsigned long start_byte,
> > +                          int iommu_prot,
> > +                          unsigned int flags)
> 
> I think we should follow the existing coding style by combining
> lines that could fit within 80 characters each.

To make life easy the "coding style" is mostly whatever clang-format
spits out. "git clang-format HEAD^!" for instance will adjust the
current patch.

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 2/9] iommufd: generalize iopt_pages address
  2024-09-25 22:43     ` Jason Gunthorpe
@ 2024-09-25 23:04       ` Nicolin Chen
  2024-09-26 14:48         ` Jason Gunthorpe
  0 siblings, 1 reply; 38+ messages in thread
From: Nicolin Chen @ 2024-09-25 23:04 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Steve Sistare, iommu, Kevin Tian

On Wed, Sep 25, 2024 at 07:43:22PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 24, 2024 at 01:31:37PM -0700, Nicolin Chen wrote:
> > On Tue, Sep 24, 2024 at 08:05:31AM -0700, Steve Sistare wrote:
> > > +static int iopt_map_common(struct iommufd_ctx *ictx,
> > > +			     struct io_pagetable *iopt,
> > > +                          struct iopt_pages *pages,
> > > +                          unsigned long *iova,
> > > +                          unsigned long length,
> > > +                          unsigned long start_byte,
> > > +                          int iommu_prot,
> > > +                          unsigned int flags)
> > 
> > I think we should follow the existing coding style by combining
> > lines that could fit within 80 characters each.
> 
> To make life easy the "coding style" is mostly whatever clang-format
> spits out. "git clang-format HEAD^!" for instance will adjust the
> current patch.

I see. That's useful. What's that exclamation mark at the end for?

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 3/9] iommufd: pfn reader for file mappings
  2024-09-24 15:05 ` [PATCH V2 3/9] iommufd: pfn reader for file mappings Steve Sistare
                     ` (2 preceding siblings ...)
  2024-09-25  1:55   ` kernel test robot
@ 2024-09-26 14:10   ` Steven Sistare
  3 siblings, 0 replies; 38+ messages in thread
From: Steven Sistare @ 2024-09-26 14:10 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen



On 9/24/2024 11:05 AM, Steve Sistare wrote:
> Extend pfn_reader_user to pin file mappings, by calling memfd_pin_folios.
> Repin at small page granularity and unpack pages into upages[] to mesh
> with the existing code paths.  This is sub-optimal but simple, and will be
> optimized in a subsequent patch.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>   drivers/iommu/iommufd/io_pagetable.h |  5 +++
>   drivers/iommu/iommufd/pages.c        | 86 +++++++++++++++++++++++++++++++-----
>   2 files changed, 80 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
> index 7c4a338..3a28f46 100644
> --- a/drivers/iommu/iommufd/io_pagetable.h
> +++ b/drivers/iommu/iommufd/io_pagetable.h
> @@ -177,6 +177,7 @@ enum {
>   
>   enum iopt_address_type {
>   	IOPT_ADDRESS_USER = 0,
> +	IOPT_ADDRESS_FILE = 1,
>   };
>   
>   /*
> @@ -202,6 +203,10 @@ struct iopt_pages {
>   	enum iopt_address_type type;
>   	union {
>   		void __user *uptr;		/* IOPT_ADDRESS_USER */
> +		struct {			/* IOPT_ADDRESS_FILE */
> +			struct file *file;
> +			unsigned long start;
> +		};
>   	};
>   	bool writable:1;
>   	u8 account_mode;
> diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
> index 69822d4..df5ba4f 100644
> --- a/drivers/iommu/iommufd/pages.c
> +++ b/drivers/iommu/iommufd/pages.c
> @@ -703,19 +703,28 @@ struct pfn_reader_user {
>   	 * neither
>   	 */
>   	int locked;
> +
> +	/* The following are only valid if file != NULL. */
> +	struct file *file;
> +	struct folio **ufolios;
> +	unsigned long ufolios_len;
>   };
>   
>   static void pfn_reader_user_init(struct pfn_reader_user *user,
>   				 struct iopt_pages *pages)
>   {
>   	user->upages = NULL;
> +	user->upages_len = 0;
>   	user->upages_start = 0;
>   	user->upages_end = 0;
>   	user->locked = -1;
> -
>   	user->gup_flags = FOLL_LONGTERM;
>   	if (pages->writable)
>   		user->gup_flags |= FOLL_WRITE;
> +
> +	user->file = (pages->type == IOPT_ADDRESS_FILE) ? pages->file : NULL;
> +	user->ufolios = NULL;
> +	user->ufolios_len = 0;
>   }
>   
>   static void pfn_reader_user_destroy(struct pfn_reader_user *user,
> @@ -731,6 +740,45 @@ static void pfn_reader_user_destroy(struct pfn_reader_user *user,
>   
>   	kfree(user->upages);
>   	user->upages = NULL;
> +	kfree(user->ufolios);
> +	user->ufolios = NULL;
> +}
> +
> +static long pin_memfd_pages(struct pfn_reader_user *user,
> +			    unsigned long start,
> +			    unsigned long npages)
> +{
> +	unsigned long end, nr, i, j, npin, offset, npages_out;
> +	long nfolios;
> +	struct folio *folio;
> +	struct page **upages = user->upages;
> +
> +	nfolios = user->ufolios_len / sizeof(*user->ufolios);
> +	end = start + (npages << PAGE_SHIFT) - 1;
> +
> +	nfolios = memfd_pin_folios(user->file, start, end,
> +				   user->ufolios, nfolios, &offset);
> +	if (nfolios <= 0)
> +		return nfolios;
> +
> +	offset >>= PAGE_SHIFT;
> +	npages_out = 0;
> +
> +	for (i = 0; i < nfolios; i++) {
> +		folio = user->ufolios[i];
> +		nr = folio_nr_pages(folio);
> +		npin = min(nr - offset, npages);
> +		if (nr > 1) {
> +			folio_split_user_page_pin(folio, npin);
> +		}
> +		for (j = offset; j < offset + npin; j++)
> +			*upages++ = folio_page(folio, j);
> +		npages -= npin;
> +		npages_out += npin;
> +		offset = 0;
> +	}
> +
> +	return npages_out;
>   }
>   
>   static int pfn_reader_user_pin(struct pfn_reader_user *user,
> @@ -739,7 +787,8 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
>   			       unsigned long last_index)
>   {
>   	bool remote_mm = pages->source_mm != current->mm;
> -	unsigned long npages;
> +	unsigned long npages = last_index - start_index + 1;
> +	unsigned long start, unum;
>   	uintptr_t uptr;
>   	long rc;
>   
> @@ -749,14 +798,25 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
>   
>   	if (!user->upages) {
>   		/* All undone in pfn_reader_destroy() */
> -		user->upages_len =
> -			(last_index - start_index + 1) * sizeof(*user->upages);
> +		user->upages_len = npages * sizeof(*user->upages);
>   		user->upages = temp_kmalloc(&user->upages_len, NULL, 0);
>   		if (!user->upages)
>   			return -ENOMEM;
>   	}
>   
> -	if (user->locked == -1) {
> +	if (user->file && !user->ufolios) {
> +		user->ufolios_len = npages * sizeof(*user->ufolios);
> +		user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
> +		if (!user->ufolios)
> +			return -ENOMEM;
> +
> +		/* Bail for now.  Be more robust when we optimize for folios. */
> +		if (user->ufolios_len / sizeof(*user->ufolios) <
> +		    user->upages_len / sizeof(*user->upages))
> +			return -ENOMEM;
> +	}
> +
> +	if (!user->file && user->locked == -1) {

During further testing I found a bug caused by this code. Elsewhere the code
checks "if (user->locked)" and breaks because locked is -1, as it was never
initialized here.  The code here should be:

         if (user->locked == -1) {
                 if (!user->file && remote_mm) {
                         if (!mmget_not_zero(pages->source_mm))
                                 return -EFAULT;
                 }
                 user->locked = 0;
         }

I will fix this in V3 (but will not send it until V2 is reviewed).

- Steve

>   		/*
>   		 * The majority of usages will run the map task within the mm
>   		 * providing the pages, so we can optimize into
> @@ -769,18 +829,22 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
>   		user->locked = 0;
>   	}
>   
> -	npages = min_t(unsigned long, last_index - start_index + 1,
> -		       user->upages_len / sizeof(*user->upages));
> -
> +	unum = user->file ? user->ufolios_len / sizeof(*user->ufolios) :
> +			    user->upages_len / sizeof(*user->upages);
> +	npages = min_t(unsigned long, npages, unum);
>   
>   	if (iommufd_should_fail())
>   		return -EFAULT;
>   
> -	uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
> -	if (!remote_mm)
> +	if (user->file) {
> +		start = pages->start + (start_index * PAGE_SIZE);
> +		rc = pin_memfd_pages(user, start, npages);
> +	} else if (!remote_mm) {
> +		uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
>   		rc = pin_user_pages_fast(uptr, npages, user->gup_flags,
>   					 user->upages);
> -	else {
> +	} else {
> +		uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
>   		if (!user->locked) {
>   			mmap_read_lock(pages->source_mm);
>   			user->locked = 1;


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 2/9] iommufd: generalize iopt_pages address
  2024-09-25 23:04       ` Nicolin Chen
@ 2024-09-26 14:48         ` Jason Gunthorpe
  2024-09-26 15:48           ` Steven Sistare
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2024-09-26 14:48 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: Steve Sistare, iommu, Kevin Tian

On Wed, Sep 25, 2024 at 04:04:26PM -0700, Nicolin Chen wrote:
> On Wed, Sep 25, 2024 at 07:43:22PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 24, 2024 at 01:31:37PM -0700, Nicolin Chen wrote:
> > > On Tue, Sep 24, 2024 at 08:05:31AM -0700, Steve Sistare wrote:
> > > > +static int iopt_map_common(struct iommufd_ctx *ictx,
> > > > +			     struct io_pagetable *iopt,
> > > > +                          struct iopt_pages *pages,
> > > > +                          unsigned long *iova,
> > > > +                          unsigned long length,
> > > > +                          unsigned long start_byte,
> > > > +                          int iommu_prot,
> > > > +                          unsigned int flags)
> > > 
> > > I think we should follow the existing coding style by combining
> > > lines that could fit within 80 characters each.
> > 
> > To make life easy the "coding style" is mostly whatever clang-format
> > spits out. "git clang-format HEAD^!" for instance will adjust the
> > current patch.
> 
> I see. That's useful. What's that exclamation mark at the end for?

Manual page git-rev-parse(1)

REVISION RANGE SUMMARY

       <rev>^!, e.g. HEAD^!
           A suffix ^ followed by an exclamation mark is the same as giving commit <rev> and all its parents prefixed with ^
           to exclude them (and their ancestors).

It is the syntax to make a range that includes only a single commit (HEAD).

Though HEAD^ probably makes more sense since it is computing a diff
from an absolute point.

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 2/9] iommufd: generalize iopt_pages address
  2024-09-26 14:48         ` Jason Gunthorpe
@ 2024-09-26 15:48           ` Steven Sistare
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Sistare @ 2024-09-26 15:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Nicolin Chen; +Cc: iommu, Kevin Tian

On 9/26/2024 10:48 AM, Jason Gunthorpe wrote:
> On Wed, Sep 25, 2024 at 04:04:26PM -0700, Nicolin Chen wrote:
>> On Wed, Sep 25, 2024 at 07:43:22PM -0300, Jason Gunthorpe wrote:
>>> On Tue, Sep 24, 2024 at 01:31:37PM -0700, Nicolin Chen wrote:
>>>> On Tue, Sep 24, 2024 at 08:05:31AM -0700, Steve Sistare wrote:
>>>>> +static int iopt_map_common(struct iommufd_ctx *ictx,
>>>>> +			     struct io_pagetable *iopt,
>>>>> +                          struct iopt_pages *pages,
>>>>> +                          unsigned long *iova,
>>>>> +                          unsigned long length,
>>>>> +                          unsigned long start_byte,
>>>>> +                          int iommu_prot,
>>>>> +                          unsigned int flags)
>>>>
>>>> I think we should follow the existing coding style by combining
>>>> lines that could fit within 80 characters each.
>>>
>>> To make life easy the "coding style" is mostly whatever clang-format
>>> spits out. "git clang-format HEAD^!" for instance will adjust the
>>> current patch.
>>
>> I see. That's useful. What's that exclamation mark at the end for?
> 
> Manual page git-rev-parse(1)
> 
> REVISION RANGE SUMMARY
> 
>         <rev>^!, e.g. HEAD^!
>             A suffix ^ followed by an exclamation mark is the same as giving commit <rev> and all its parents prefixed with ^
>             to exclude them (and their ancestors).
> 
> It is the syntax to make a range that includes only a single commit (HEAD).
> 
> Though HEAD^ probably makes more sense since it is computing a diff
> from an absolute point.

Sure, I'll apply this to all the patches for V3.

- Steve


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation
  2024-09-24 21:08   ` Nicolin Chen
@ 2024-09-26 15:50     ` Steven Sistare
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Sistare @ 2024-09-26 15:50 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: iommu, Jason Gunthorpe, Kevin Tian

On 9/24/2024 5:08 PM, Nicolin Chen wrote:
> On Tue, Sep 24, 2024 at 08:05:34AM -0700, Steve Sistare wrote:
>> diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
>> index a0a66f1..1e14a0f 100644
>> --- a/drivers/iommu/iommufd/io_pagetable.c
>> +++ b/drivers/iommu/iommufd/io_pagetable.c
>> @@ -259,7 +259,10 @@ static int iopt_alloc_area_pages(struct io_pagetable *iopt,
>>                  /* Use the first entry to guess the ideal IOVA alignment */
>>                  elm = list_first_entry(pages_list, struct iopt_pages_list,
>>                                         next);
>> -               start = elm->start_byte + (uintptr_t)elm->pages->uptr;
>> +               if (elm->pages->type == IOPT_ADDRESS_FILE)
>> +                       start = elm->start_byte + elm->pages->start;
>> +               else
>> +                       start = elm->start_byte + (uintptr_t)elm->pages->uptr;
> 
> 	switch (elm->pages->type) {
> 	case IOPT_ADDRESS_USER: ...
> 	case IOPT_ADDRESS_FILE: ...
> 	}
> 
> Would make it easier for us to add other types.

OK - steve

>   
>> +/**
>> + * iopt_map_file_pages() - Like iopt_map_user_pages, but map a file.
> 
> Could we just write "Map a user FILE to an..."? It's not that long :)
> 
>>   int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
>>   {
> [...]
>> +       rc = iopt_map_file_pages(ucmd->ictx, &ioas->iopt, &iova,
>> +                                file, cmd->start,
>> +                                cmd->length,
> 
> We could merge cmd->length to the previous line.
> 
> Thanks
> Nicolin


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 3/9] iommufd: pfn reader for file mappings
  2024-09-24 20:53   ` Nicolin Chen
@ 2024-09-26 15:59     ` Steven Sistare
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Sistare @ 2024-09-26 15:59 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: iommu, Jason Gunthorpe, Kevin Tian

On 9/24/2024 4:53 PM, Nicolin Chen wrote:
> On Tue, Sep 24, 2024 at 08:05:32AM -0700, Steve Sistare wrote:
> 
>> diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
>> index 69822d4..df5ba4f 100644
>> --- a/drivers/iommu/iommufd/pages.c
>> +++ b/drivers/iommu/iommufd/pages.c
>> @@ -703,19 +703,28 @@ struct pfn_reader_user {
>>           * neither
>>           */
>>          int locked;
>> +
>> +       /* The following are only valid if file != NULL. */
>> +       struct file *file;
>> +       struct folio **ufolios;
>> +       unsigned long ufolios_len;
>>   };
> 
> I am not 100% sure. Yet, given the comments at pfn_reader_user:
> 	/* pfn_reader_user is just the pin_user_pages() path */
> which FILE doesn't use, I think we might need a pfn_reader_file
> with own set of helpers:
> 	pfn_reader_file_init
> 	pfn_reader_file_destroy
> 	pfn_reader_file_pin
> Then their callers should switch-case pages->type to select the
> corresponding type of pfn_reader functions.
> 
> Maybe USER and FILE don't diff that much. But considering we'll
> next add a PHYS type and none of them makes sense to PHYS, there
> could be another layer of cleanup at these pfn_reader functions,
> v.s. stuffing all FILE-related lines into pfn_reader_user.

IMO the single pfn_reader_user is cleanest given user and file types.
Feel free to rework it when you add PHYS if that additional type tips
the scales.

- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 1/9] iommufd: rename uptr in iopt_alloc_iova
  2024-09-24 15:05 ` [PATCH V2 1/9] iommufd: rename uptr in iopt_alloc_iova Steve Sistare
@ 2024-09-27 15:59   ` Jason Gunthorpe
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2024-09-27 15:59 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Tue, Sep 24, 2024 at 08:05:30AM -0700, Steve Sistare wrote:
> iopt_alloc_iova takes a uptr argument but only checks for its alignment.
> Generalize this to an unsigned address, which can be the offset from the
> start of a file in a subsequent patch.  No functional change.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/iommu/iommufd/io_pagetable.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 2/9] iommufd: generalize iopt_pages address
  2024-09-24 15:05 ` [PATCH V2 2/9] iommufd: generalize iopt_pages address Steve Sistare
  2024-09-24 20:31   ` Nicolin Chen
@ 2024-10-01 19:54   ` Jason Gunthorpe
  1 sibling, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 19:54 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Tue, Sep 24, 2024 at 08:05:31AM -0700, Steve Sistare wrote:
> The starting address in iopt_pages is currently a __user *uptr.  Generalize
> to allow other types of addresses.  Refactor iopt_alloc_pages and
> iopt_map_user_pages into address-type specific and common functions.
> 
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/iommu/iommufd/io_pagetable.c | 60 +++++++++++++++++++++++-------------
>  drivers/iommu/iommufd/io_pagetable.h | 13 ++++++--
>  drivers/iommu/iommufd/pages.c        | 30 +++++++++++++-----
>  3 files changed, 71 insertions(+), 32 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 4/9] iommufd: IOMMU_IOAS_MAP_FILE interface
  2024-09-24 15:05 ` [PATCH V2 4/9] iommufd: IOMMU_IOAS_MAP_FILE interface Steve Sistare
@ 2024-10-01 19:58   ` Jason Gunthorpe
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 19:58 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Tue, Sep 24, 2024 at 08:05:33AM -0700, Steve Sistare wrote:
> Define the IOMMU_IOAS_MAP_FILE ioctl interface, which allows a user to
> register memory by passing a memfd plus offset and length.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/iommu/iommufd/ioas.c            |  5 +++++
>  drivers/iommu/iommufd/iommufd_private.h |  1 +
>  drivers/iommu/iommufd/main.c            |  2 ++
>  include/uapi/linux/iommufd.h            | 25 +++++++++++++++++++++++++
>  4 files changed, 33 insertions(+)

I would squash this into the next patch, but it looks OK to me

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation
  2024-09-24 15:05 ` [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation Steve Sistare
  2024-09-24 21:08   ` Nicolin Chen
@ 2024-10-01 20:05   ` Jason Gunthorpe
  2024-10-02 17:26     ` Steven Sistare
  1 sibling, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 20:05 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Tue, Sep 24, 2024 at 08:05:34AM -0700, Steve Sistare wrote:

> @@ -199,7 +200,44 @@ static int conv_iommu_prot(u32 map_flags)
>  
>  int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
>  {
> -	return -ENOTTY;
> +	struct iommu_ioas_map_file *cmd = ucmd->cmd;
> +	unsigned long iova = cmd->iova;
> +	struct iommufd_ioas *ioas;
> +	unsigned int flags = 0;
> +	int rc;
> +	struct file *file;
> +	unsigned long end;
> +
> +	if (cmd->iova >= ULONG_MAX || cmd->length >= ULONG_MAX)
> +		return -EOVERFLOW;
> +
> +	if (check_add_overflow(cmd->start, cmd->length - 1, &end))
> +		return -EOVERFLOW;

Do we check that length != 0 before doing this math someplace?

> +	ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
> +	if (IS_ERR(ioas))
> +		return PTR_ERR(ioas);
> +
> +	if (!(cmd->flags & IOMMU_IOAS_MAP_FIXED_IOVA))
> +		flags = IOPT_ALLOC_IOVA;
> +
> +	file = fget(cmd->fd);
> +	if (!file)
> +		return -EBADF;
> +
> +	rc = iopt_map_file_pages(ucmd->ictx, &ioas->iopt, &iova,
> +				 file, cmd->start,
> +				 cmd->length,
> +				 conv_iommu_prot(cmd->flags), flags);
> +	if (rc)
> +		goto out_put;
> +
> +	cmd->iova = iova;
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +out_put:
> +	iommufd_put_object(ucmd->ictx, &ioas->obj);
> +	fput(file);

I don't see a second fput, what does fput on the success path?

>  int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
> diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
> index df5ba4f..0a432dc2 100644
> --- a/drivers/iommu/iommufd/pages.c
> +++ b/drivers/iommu/iommufd/pages.c
> @@ -902,7 +902,8 @@ static int update_mm_locked_vm(struct iopt_pages *pages, unsigned long npages,
>  		mmap_read_unlock(pages->source_mm);
>  		user->locked = 0;
>  		/* If we had the lock then we also have a get */
> -	} else if ((!user || !user->upages) &&
> +
> +	} else if ((!user || (!user->upages && !user->ufolios)) &&
>  		   pages->source_mm != current->mm) {
>  		if (!mmget_not_zero(pages->source_mm))
>  			return -EINVAL;

Does this hunk go in an earlier previous patch?

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 6/9] iommufd: file mappings for mdev
  2024-09-24 15:05 ` [PATCH V2 6/9] iommufd: file mappings for mdev Steve Sistare
@ 2024-10-01 20:09   ` Jason Gunthorpe
  2024-10-02 17:26     ` Steven Sistare
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 20:09 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Tue, Sep 24, 2024 at 08:05:35AM -0700, Steve Sistare wrote:
> Support file mappings for mediated devices, aka mdevs.  Access is initiated
> by the vfio_pin_pages and vfio_dma_rw kernel interfaces.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/iommu/iommufd/pages.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

> @@ -1884,6 +1884,9 @@ static int iopt_pages_rw_page(struct iopt_pages *pages, unsigned long index,
>  	struct page *page = NULL;
>  	int rc;
>  
> +	if (WARN_ON(pages->type != IOPT_ADDRESS_USER))
> +		return -EINVAL;

You can can probably wrap this with 
  if (IS_ENABLED(CONFIG_IOMMUFD_TEST))

since it is supposed to be fast path

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 8/9] iommufd: optimize file mapping
  2024-09-24 15:05 ` [PATCH V2 8/9] iommufd: optimize file mapping Steve Sistare
@ 2024-10-01 21:03   ` Jason Gunthorpe
  2024-10-02 17:26     ` Steven Sistare
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 21:03 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Tue, Sep 24, 2024 at 08:05:37AM -0700, Steve Sistare wrote:
> @@ -622,6 +630,67 @@ static void batch_from_pages(struct pfn_batch *batch, struct page **pages,
>  			break;
>  }
>  
> +static void batch_from_folios(struct pfn_batch *batch, struct folio **folios,
> +			      unsigned long offset, unsigned long npages)
> +{
> +	unsigned long nr, pfn, i = 0;
> +	struct folio *folio;
> +
> +	while (npages) {
> +		folio = folios[i++];
> +		nr = folio_nr_pages(folio) - offset;
> +		nr = min_t(unsigned long, nr, npages);
> +		pfn = page_to_pfn(folio_page(folio, offset));
> +		batch_add_pfn_num(batch, pfn, nr);

I think we technically need to cap nr here as well?

> +		offset = 0;
> +		npages -= nr;
> +	}
> +}
> +
> +/*
> + * Return the folio containing the page which is @start_index pages beyond
> + * page number @offset in folios[0].  Return the index of that page in
> + * @offset_out.
> + */
> +static struct folio **folios_start(struct folio **folios, unsigned long offset,
> +				   unsigned long start_index,
> +				   unsigned long *offset_out)
> +{
> +	unsigned long nr = folio_nr_pages(*folios);
> +
> +	while (offset + start_index > nr) {
> +		start_index -= (nr - offset);
> +		offset = 0;
> +		folios++;
> +		nr = folio_nr_pages(*folios);
> +	}
> +
> +	*offset_out = offset + start_index;
> +	return folios;
> +}

Why does this exist? I would expect you'd call memfd_pin_folios with
the correct offset so that the first folio was the right folio, this
is how the pin_user_pages path works?

> +static void folios_unpin_partial(struct folio **folios, unsigned long offset,
> +				 unsigned long npages)
> +{
> +	unsigned long nr, j, i = 0;
> +	struct folio *folio;
> +
> +	while (npages) {
> +		folio = folios[i++];
> +		nr = folio_nr_pages(folio);
> +		if (offset == 0 && nr < npages) {
> +			unpin_folio(folio);
> +		} else {
> +			nr = min_t(unsigned long, npages, nr - offset);
> +			for (j = 0; j < nr; j++)
> +				unpin_user_page(folio_page(folio, offset + j));

Don't loop, we have unpin_user_pages() for this

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 9/9] iommufd: map file selftest
  2024-09-24 15:05 ` [PATCH V2 9/9] iommufd: map file selftest Steve Sistare
@ 2024-10-01 21:05   ` Jason Gunthorpe
  2024-10-02 17:26     ` Steven Sistare
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 21:05 UTC (permalink / raw)
  To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Tue, Sep 24, 2024 at 08:05:38AM -0700, Steve Sistare wrote:
> @@ -33,6 +37,24 @@ static unsigned long get_huge_page_size(void)
>  	return strtoul(buf, NULL, 10);
>  }
>  
> +static int memfd_create(const char *name, unsigned int flags)
> +{
> +	return syscall(__NR_memfd_create, name, flags);
> +}

I think we should just use the glibc definition, or is that an issue?

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation
  2024-10-01 20:05   ` Jason Gunthorpe
@ 2024-10-02 17:26     ` Steven Sistare
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Sistare @ 2024-10-02 17:26 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, Kevin Tian, Nicolin Chen

On 10/1/2024 4:05 PM, Jason Gunthorpe wrote:
> On Tue, Sep 24, 2024 at 08:05:34AM -0700, Steve Sistare wrote:
> 
>> @@ -199,7 +200,44 @@ static int conv_iommu_prot(u32 map_flags)
>>   
>>   int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
>>   {
>> -	return -ENOTTY;
>> +	struct iommu_ioas_map_file *cmd = ucmd->cmd;
>> +	unsigned long iova = cmd->iova;
>> +	struct iommufd_ioas *ioas;
>> +	unsigned int flags = 0;
>> +	int rc;
>> +	struct file *file;
>> +	unsigned long end;
>> +
>> +	if (cmd->iova >= ULONG_MAX || cmd->length >= ULONG_MAX)
>> +		return -EOVERFLOW;
>> +
>> +	if (check_add_overflow(cmd->start, cmd->length - 1, &end))
>> +		return -EOVERFLOW;
> 
> Do we check that length != 0 before doing this math someplace?

Nope, I will add it, thanks.
Since iommufd_ioas_map allows length == 0, I will allow it here also,
but skip check_add_overflow.

>> +	ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
>> +	if (IS_ERR(ioas))
>> +		return PTR_ERR(ioas);
>> +
>> +	if (!(cmd->flags & IOMMU_IOAS_MAP_FIXED_IOVA))
>> +		flags = IOPT_ALLOC_IOVA;
>> +
>> +	file = fget(cmd->fd);
>> +	if (!file)
>> +		return -EBADF;
>> +
>> +	rc = iopt_map_file_pages(ucmd->ictx, &ioas->iopt, &iova,
>> +				 file, cmd->start,
>> +				 cmd->length,
>> +				 conv_iommu_prot(cmd->flags), flags);
>> +	if (rc)
>> +		goto out_put;
>> +
>> +	cmd->iova = iova;
>> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
>> +out_put:
>> +	iommufd_put_object(ucmd->ictx, &ioas->obj);
>> +	fput(file);
> 
> I don't see a second fput, what does fput on the success path?

Thanks, that's a bug.  iopt_alloc_file_pages calls get_file, and I must
add fput in iopt_release_pages.

>>   int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
>> diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
>> index df5ba4f..0a432dc2 100644
>> --- a/drivers/iommu/iommufd/pages.c
>> +++ b/drivers/iommu/iommufd/pages.c
>> @@ -902,7 +902,8 @@ static int update_mm_locked_vm(struct iopt_pages *pages, unsigned long npages,
>>   		mmap_read_unlock(pages->source_mm);
>>   		user->locked = 0;
>>   		/* If we had the lock then we also have a get */
>> -	} else if ((!user || !user->upages) &&
>> +
>> +	} else if ((!user || (!user->upages && !user->ufolios)) &&
>>   		   pages->source_mm != current->mm) {
>>   		if (!mmget_not_zero(pages->source_mm))
>>   			return -EINVAL;
> 
> Does this hunk go in an earlier previous patch?

Yes, it belongs in "pfn reader for file mappings", thanks.

- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 6/9] iommufd: file mappings for mdev
  2024-10-01 20:09   ` Jason Gunthorpe
@ 2024-10-02 17:26     ` Steven Sistare
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Sistare @ 2024-10-02 17:26 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, Kevin Tian, Nicolin Chen

On 10/1/2024 4:09 PM, Jason Gunthorpe wrote:
> On Tue, Sep 24, 2024 at 08:05:35AM -0700, Steve Sistare wrote:
>> Support file mappings for mediated devices, aka mdevs.  Access is initiated
>> by the vfio_pin_pages and vfio_dma_rw kernel interfaces.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>   drivers/iommu/iommufd/pages.c | 22 +++++++++++++++-------
>>   1 file changed, 15 insertions(+), 7 deletions(-)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
>> @@ -1884,6 +1884,9 @@ static int iopt_pages_rw_page(struct iopt_pages *pages, unsigned long index,
>>   	struct page *page = NULL;
>>   	int rc;
>>   
>> +	if (WARN_ON(pages->type != IOPT_ADDRESS_USER))
>> +		return -EINVAL;
> 
> You can can probably wrap this with
>    if (IS_ENABLED(CONFIG_IOMMUFD_TEST))
> 
> since it is supposed to be fast path

Cool, will do - steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 8/9] iommufd: optimize file mapping
  2024-10-01 21:03   ` Jason Gunthorpe
@ 2024-10-02 17:26     ` Steven Sistare
  2024-10-02 17:29       ` Jason Gunthorpe
  0 siblings, 1 reply; 38+ messages in thread
From: Steven Sistare @ 2024-10-02 17:26 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, Kevin Tian, Nicolin Chen

On 10/1/2024 5:03 PM, Jason Gunthorpe wrote:
> On Tue, Sep 24, 2024 at 08:05:37AM -0700, Steve Sistare wrote:
>> @@ -622,6 +630,67 @@ static void batch_from_pages(struct pfn_batch *batch, struct page **pages,
>>   			break;
>>   }
>>   
>> +static void batch_from_folios(struct pfn_batch *batch, struct folio **folios,
>> +			      unsigned long offset, unsigned long npages)
>> +{
>> +	unsigned long nr, pfn, i = 0;
>> +	struct folio *folio;
>> +
>> +	while (npages) {
>> +		folio = folios[i++];
>> +		nr = folio_nr_pages(folio) - offset;
>> +		nr = min_t(unsigned long, nr, npages);
>> +		pfn = page_to_pfn(folio_page(folio, offset));
>> +		batch_add_pfn_num(batch, pfn, nr);
> 
> I think we technically need to cap nr here as well?

Do you mean cap nr at max-unsigned-int and loop over the folio?

>> +		offset = 0;
>> +		npages -= nr;
>> +	}
>> +}
>> +
>> +/*
>> + * Return the folio containing the page which is @start_index pages beyond
>> + * page number @offset in folios[0].  Return the index of that page in
>> + * @offset_out.
>> + */
>> +static struct folio **folios_start(struct folio **folios, unsigned long offset,
>> +				   unsigned long start_index,
>> +				   unsigned long *offset_out)
>> +{
>> +	unsigned long nr = folio_nr_pages(*folios);
>> +
>> +	while (offset + start_index > nr) {
>> +		start_index -= (nr - offset);
>> +		offset = 0;
>> +		folios++;
>> +		nr = folio_nr_pages(*folios);
>> +	}
>> +
>> +	*offset_out = offset + start_index;
>> +	return folios;
>> +}
> 
> Why does this exist? I would expect you'd call memfd_pin_folios with
> the correct offset so that the first folio was the right folio, this
> is how the pin_user_pages path works?

We need folios_start for the case where the 'if' branch below is not taken,
so the folio was pinned by a previous call to pfn_reader_fill_span,
so the requested start_index for the current call does not necessarily
fall in the first folio of user->ufolios.

pfn_reader_fill_span()
     if (start_index >= pfns->user.upages_end)
         rc = pfn_reader_user_pin(&pfns->user, pfns->pages, start_index,

    ...
    folios = folios_start(user->ufolios, user->ufolios_offset,
                          start_index, &offset);
    batch_from_folios(&pfns->batch, folios, offset, npages);

>> +static void folios_unpin_partial(struct folio **folios, unsigned long offset,
>> +				 unsigned long npages)
>> +{
>> +	unsigned long nr, j, i = 0;
>> +	struct folio *folio;
>> +
>> +	while (npages) {
>> +		folio = folios[i++];
>> +		nr = folio_nr_pages(folio);
>> +		if (offset == 0 && nr < npages) {
>> +			unpin_folio(folio);
>> +		} else {
>> +			nr = min_t(unsigned long, npages, nr - offset);
>> +			for (j = 0; j < nr; j++)
>> +				unpin_user_page(folio_page(folio, offset + j));
> 
> Don't loop, we have unpin_user_pages() for this

Cannot call it, I do not have a pages[] array in hand.

- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 9/9] iommufd: map file selftest
  2024-10-01 21:05   ` Jason Gunthorpe
@ 2024-10-02 17:26     ` Steven Sistare
  2024-10-02 17:29       ` Jason Gunthorpe
  0 siblings, 1 reply; 38+ messages in thread
From: Steven Sistare @ 2024-10-02 17:26 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, Kevin Tian, Nicolin Chen

On 10/1/2024 5:05 PM, Jason Gunthorpe wrote:
> On Tue, Sep 24, 2024 at 08:05:38AM -0700, Steve Sistare wrote:
>> @@ -33,6 +37,24 @@ static unsigned long get_huge_page_size(void)
>>   	return strtoul(buf, NULL, 10);
>>   }
>>   
>> +static int memfd_create(const char *name, unsigned int flags)
>> +{
>> +	return syscall(__NR_memfd_create, name, flags);
>> +}
> 
> I think we should just use the glibc definition, or is that an issue?

Older userland distributions with older glibc do not define it, but maybe
old enough that they are never paired with newer kernels that support iommufd.
Your call.

- Steve

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 8/9] iommufd: optimize file mapping
  2024-10-02 17:26     ` Steven Sistare
@ 2024-10-02 17:29       ` Jason Gunthorpe
  2024-10-02 17:51         ` Steven Sistare
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-02 17:29 UTC (permalink / raw)
  To: Steven Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Wed, Oct 02, 2024 at 01:26:32PM -0400, Steven Sistare wrote:
> On 10/1/2024 5:03 PM, Jason Gunthorpe wrote:
> > On Tue, Sep 24, 2024 at 08:05:37AM -0700, Steve Sistare wrote:
> > > @@ -622,6 +630,67 @@ static void batch_from_pages(struct pfn_batch *batch, struct page **pages,
> > >   			break;
> > >   }
> > > +static void batch_from_folios(struct pfn_batch *batch, struct folio **folios,
> > > +			      unsigned long offset, unsigned long npages)
> > > +{
> > > +	unsigned long nr, pfn, i = 0;
> > > +	struct folio *folio;
> > > +
> > > +	while (npages) {
> > > +		folio = folios[i++];
> > > +		nr = folio_nr_pages(folio) - offset;
> > > +		nr = min_t(unsigned long, nr, npages);
> > > +		pfn = page_to_pfn(folio_page(folio, offset));
> > > +		batch_add_pfn_num(batch, pfn, nr);
> > 
> > I think we technically need to cap nr here as well?
> 
> Do you mean cap nr at max-unsigned-int and loop over the folio?

The max that can be put into a batch entry

> > Why does this exist? I would expect you'd call memfd_pin_folios with
> > the correct offset so that the first folio was the right folio, this
> > is how the pin_user_pages path works?
> 
> We need folios_start for the case where the 'if' branch below is not taken,
> so the folio was pinned by a previous call to pfn_reader_fill_span,
> so the requested start_index for the current call does not necessarily
> fall in the first folio of user->ufolios.

But that should be handled by keeping track of where we are in the
array, not searching it again..


> > > +static void folios_unpin_partial(struct folio **folios, unsigned long offset,
> > > +				 unsigned long npages)
> > > +{
> > > +	unsigned long nr, j, i = 0;
> > > +	struct folio *folio;
> > > +
> > > +	while (npages) {
> > > +		folio = folios[i++];
> > > +		nr = folio_nr_pages(folio);
> > > +		if (offset == 0 && nr < npages) {
> > > +			unpin_folio(folio);
> > > +		} else {
> > > +			nr = min_t(unsigned long, npages, nr - offset);
> > > +			for (j = 0; j < nr; j++)
> > > +				unpin_user_page(folio_page(folio, offset + j));
> > 
> > Don't loop, we have unpin_user_pages() for this
> 
> Cannot call it, I do not have a pages[] array in hand.

Ugh, still, don't loop, this needs a new mm helper then

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 9/9] iommufd: map file selftest
  2024-10-02 17:26     ` Steven Sistare
@ 2024-10-02 17:29       ` Jason Gunthorpe
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-02 17:29 UTC (permalink / raw)
  To: Steven Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Wed, Oct 02, 2024 at 01:26:49PM -0400, Steven Sistare wrote:
> On 10/1/2024 5:05 PM, Jason Gunthorpe wrote:
> > On Tue, Sep 24, 2024 at 08:05:38AM -0700, Steve Sistare wrote:
> > > @@ -33,6 +37,24 @@ static unsigned long get_huge_page_size(void)
> > >   	return strtoul(buf, NULL, 10);
> > >   }
> > > +static int memfd_create(const char *name, unsigned int flags)
> > > +{
> > > +	return syscall(__NR_memfd_create, name, flags);
> > > +}
> > 
> > I think we should just use the glibc definition, or is that an issue?
> 
> Older userland distributions with older glibc do not define it, but maybe
> old enough that they are never paired with newer kernels that support iommufd.
> Your call.

I would keep it simple, it is just a test

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 7/9] iommufd: pfn reader local variables
  2024-09-24 15:05 ` [PATCH V2 7/9] iommufd: pfn reader local variables Steve Sistare
@ 2024-10-02 17:29   ` Steven Sistare
  2024-10-04 12:16     ` Jason Gunthorpe
  0 siblings, 1 reply; 38+ messages in thread
From: Steven Sistare @ 2024-10-02 17:29 UTC (permalink / raw)
  To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen

Hi Jason, perhaps you missed this patch, since you commented on all
the others.

- Steve

On 9/24/2024 11:05 AM, Steve Sistare wrote:
> Add local variables for common sub-expressions needed by a subsequent
> patch.  No functional change.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>   drivers/iommu/iommufd/pages.c | 24 +++++++++++++-----------
>   1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
> index 9d31df4..6e11189 100644
> --- a/drivers/iommu/iommufd/pages.c
> +++ b/drivers/iommu/iommufd/pages.c
> @@ -1043,6 +1043,8 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
>   {
>   	struct interval_tree_double_span_iter *span = &pfns->span;
>   	unsigned long start_index = pfns->batch_end_index;
> +	struct pfn_reader_user *user = &pfns->user;
> +	unsigned long npages;
>   	struct iopt_area *area;
>   	int rc;
>   
> @@ -1080,10 +1082,9 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
>   			return rc;
>   	}
>   
> -	batch_from_pages(&pfns->batch,
> -			 pfns->user.upages +
> -				 (start_index - pfns->user.upages_start),
> -			 pfns->user.upages_end - start_index);
> +	npages = user->upages_end - start_index;
> +	start_index -= user->upages_start;
> +	batch_from_pages(&pfns->batch, user->upages + start_index, npages);
>   	return 0;
>   }
>   
> @@ -1157,16 +1158,17 @@ static int pfn_reader_init(struct pfn_reader *pfns, struct iopt_pages *pages,
>   static void pfn_reader_release_pins(struct pfn_reader *pfns)
>   {
>   	struct iopt_pages *pages = pfns->pages;
> +	struct pfn_reader_user *user = &pfns->user;
> +	unsigned long npages, start_index;
>   
> -	if (pfns->user.upages_end > pfns->batch_end_index) {
> -		size_t npages = pfns->user.upages_end - pfns->batch_end_index;
> -
> +	if (user->upages_end > pfns->batch_end_index) {
>   		/* Any pages not transferred to the batch are just unpinned */
> -		unpin_user_pages(pfns->user.upages + (pfns->batch_end_index -
> -						      pfns->user.upages_start),
> -				 npages);
> +
> +		npages = user->upages_end - pfns->batch_end_index;
> +		start_index = pfns->batch_end_index - user->upages_start;
> +		unpin_user_pages(user->upages + start_index, npages);
>   		iopt_pages_sub_npinned(pages, npages);
> -		pfns->user.upages_end = pfns->batch_end_index;
> +		user->upages_end = pfns->batch_end_index;
>   	}
>   	if (pfns->batch_start_index != pfns->batch_end_index) {
>   		pfn_reader_unpin(pfns);


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 8/9] iommufd: optimize file mapping
  2024-10-02 17:29       ` Jason Gunthorpe
@ 2024-10-02 17:51         ` Steven Sistare
  0 siblings, 0 replies; 38+ messages in thread
From: Steven Sistare @ 2024-10-02 17:51 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: iommu, Kevin Tian, Nicolin Chen

On 10/2/2024 1:29 PM, Jason Gunthorpe wrote:
> On Wed, Oct 02, 2024 at 01:26:32PM -0400, Steven Sistare wrote:
>> On 10/1/2024 5:03 PM, Jason Gunthorpe wrote:
>>> On Tue, Sep 24, 2024 at 08:05:37AM -0700, Steve Sistare wrote:
>>>> @@ -622,6 +630,67 @@ static void batch_from_pages(struct pfn_batch *batch, struct page **pages,
>>>>    			break;
>>>>    }
>>>> +static void batch_from_folios(struct pfn_batch *batch, struct folio **folios,
>>>> +			      unsigned long offset, unsigned long npages)
>>>> +{
>>>> +	unsigned long nr, pfn, i = 0;
>>>> +	struct folio *folio;
>>>> +
>>>> +	while (npages) {
>>>> +		folio = folios[i++];
>>>> +		nr = folio_nr_pages(folio) - offset;
>>>> +		nr = min_t(unsigned long, nr, npages);
>>>> +		pfn = page_to_pfn(folio_page(folio, offset));
>>>> +		batch_add_pfn_num(batch, pfn, nr);
>>>
>>> I think we technically need to cap nr here as well?
>>
>> Do you mean cap nr at max-unsigned-int and loop over the folio?
> 
> The max that can be put into a batch entry

Yes.

>>> Why does this exist? I would expect you'd call memfd_pin_folios with
>>> the correct offset so that the first folio was the right folio, this
>>> is how the pin_user_pages path works?
>>
>> We need folios_start for the case where the 'if' branch below is not taken,
>> so the folio was pinned by a previous call to pfn_reader_fill_span,
>> so the requested start_index for the current call does not necessarily
>> fall in the first folio of user->ufolios.
> 
> But that should be handled by keeping track of where we are in the
> array, not searching it again..

OK.

>>>> +static void folios_unpin_partial(struct folio **folios, unsigned long offset,
>>>> +				 unsigned long npages)
>>>> +{
>>>> +	unsigned long nr, j, i = 0;
>>>> +	struct folio *folio;
>>>> +
>>>> +	while (npages) {
>>>> +		folio = folios[i++];
>>>> +		nr = folio_nr_pages(folio);
>>>> +		if (offset == 0 && nr < npages) {
>>>> +			unpin_folio(folio);
>>>> +		} else {
>>>> +			nr = min_t(unsigned long, npages, nr - offset);
>>>> +			for (j = 0; j < nr; j++)
>>>> +				unpin_user_page(folio_page(folio, offset + j));
>>>
>>> Don't loop, we have unpin_user_pages() for this
>>
>> Cannot call it, I do not have a pages[] array in hand.
> 
> Ugh, still, don't loop, this needs a new mm helper then

unpin_user_page_range_dirty_lock will do.

- Steve


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH V2 7/9] iommufd: pfn reader local variables
  2024-10-02 17:29   ` Steven Sistare
@ 2024-10-04 12:16     ` Jason Gunthorpe
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2024-10-04 12:16 UTC (permalink / raw)
  To: Steven Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen

On Wed, Oct 02, 2024 at 01:29:38PM -0400, Steven Sistare wrote:
> Hi Jason, perhaps you missed this patch, since you commented on all
> the others.

It is OK, I wanted to look more closely when you respin it

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2024-10-04 12:16 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-24 15:05 [PATCH V2 0/9] iommu_ioas_map_file Steve Sistare
2024-09-24 15:05 ` [PATCH V2 1/9] iommufd: rename uptr in iopt_alloc_iova Steve Sistare
2024-09-27 15:59   ` Jason Gunthorpe
2024-09-24 15:05 ` [PATCH V2 2/9] iommufd: generalize iopt_pages address Steve Sistare
2024-09-24 20:31   ` Nicolin Chen
2024-09-25 22:43     ` Jason Gunthorpe
2024-09-25 23:04       ` Nicolin Chen
2024-09-26 14:48         ` Jason Gunthorpe
2024-09-26 15:48           ` Steven Sistare
2024-10-01 19:54   ` Jason Gunthorpe
2024-09-24 15:05 ` [PATCH V2 3/9] iommufd: pfn reader for file mappings Steve Sistare
2024-09-24 20:53   ` Nicolin Chen
2024-09-26 15:59     ` Steven Sistare
2024-09-25  1:32   ` kernel test robot
2024-09-25  1:55   ` kernel test robot
2024-09-26 14:10   ` Steven Sistare
2024-09-24 15:05 ` [PATCH V2 4/9] iommufd: IOMMU_IOAS_MAP_FILE interface Steve Sistare
2024-10-01 19:58   ` Jason Gunthorpe
2024-09-24 15:05 ` [PATCH V2 5/9] iommufd: IOMMU_IOAS_MAP_FILE implementation Steve Sistare
2024-09-24 21:08   ` Nicolin Chen
2024-09-26 15:50     ` Steven Sistare
2024-10-01 20:05   ` Jason Gunthorpe
2024-10-02 17:26     ` Steven Sistare
2024-09-24 15:05 ` [PATCH V2 6/9] iommufd: file mappings for mdev Steve Sistare
2024-10-01 20:09   ` Jason Gunthorpe
2024-10-02 17:26     ` Steven Sistare
2024-09-24 15:05 ` [PATCH V2 7/9] iommufd: pfn reader local variables Steve Sistare
2024-10-02 17:29   ` Steven Sistare
2024-10-04 12:16     ` Jason Gunthorpe
2024-09-24 15:05 ` [PATCH V2 8/9] iommufd: optimize file mapping Steve Sistare
2024-10-01 21:03   ` Jason Gunthorpe
2024-10-02 17:26     ` Steven Sistare
2024-10-02 17:29       ` Jason Gunthorpe
2024-10-02 17:51         ` Steven Sistare
2024-09-24 15:05 ` [PATCH V2 9/9] iommufd: map file selftest Steve Sistare
2024-10-01 21:05   ` Jason Gunthorpe
2024-10-02 17:26     ` Steven Sistare
2024-10-02 17:29       ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.