* [PATCH V4 1/9] mm/gup: folio_add_pins
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-18 21:27 ` [PATCH V4 2/9] iommufd: rename uptr in iopt_alloc_iova Steve Sistare
` (7 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
Export a function that adds pins to an already-pinned huge-page folio.
This allows any range of small pages within the folio to be unpinned later.
For example, pages pinned via memfd_pin_folios and modified by
folio_add_pins could be unpinned via unpin_user_page(s).
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
include/linux/mm.h | 1 +
mm/gup.c | 24 ++++++++++++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecf63d2..f9de33a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2524,6 +2524,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
struct folio **folios, unsigned int max_folios,
pgoff_t *offset);
+int folio_add_pins(struct folio *folio, unsigned int pins);
int get_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages);
diff --git a/mm/gup.c b/mm/gup.c
index a82890b..4ac3e567 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3717,3 +3717,27 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
return ret;
}
EXPORT_SYMBOL_GPL(memfd_pin_folios);
+
+/**
+ * folio_add_pins() - add pins to an already-pinned folio
+ * @folio: the folio to add more pins to
+ * @pins: number of pins to add
+ *
+ * Try to add more pins to an already-pinned folio. The semantics
+ * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
+ * be changed.
+ *
+ * This function is helpful when having obtained a pin on a large folio
+ * using memfd_pin_folios(), but wanting to logically unpin parts
+ * (e.g., individual pages) of the folio later, for example, using
+ * unpin_user_page_range_dirty_lock().
+ *
+ * This is not the right interface to initially pin a folio.
+ */
+int folio_add_pins(struct folio *folio, unsigned int pins)
+{
+ VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
+
+ return try_grab_folio(folio, pins, FOLL_PIN);
+}
+EXPORT_SYMBOL_GPL(folio_add_pins);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH V4 2/9] iommufd: rename uptr in iopt_alloc_iova
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
2024-10-18 21:27 ` [PATCH V4 1/9] mm/gup: folio_add_pins Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-22 18:32 ` Nicolin Chen
2024-10-18 21:27 ` [PATCH V4 3/9] iommufd: generalize iopt_pages address Steve Sistare
` (6 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
iopt_alloc_iova takes a uptr argument but only checks for its alignment.
Generalize this to an unsigned address, which can be the offset from the
start of a file in a subsequent patch. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/iommufd/io_pagetable.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 4bf7ccd..68ad91d 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -107,9 +107,9 @@ static bool __alloc_iova_check_used(struct interval_tree_span_iter *span,
* Does not return a 0 IOVA even if it is valid.
*/
static int iopt_alloc_iova(struct io_pagetable *iopt, unsigned long *iova,
- unsigned long uptr, unsigned long length)
+ unsigned long addr, unsigned long length)
{
- unsigned long page_offset = uptr % PAGE_SIZE;
+ unsigned long page_offset = addr % PAGE_SIZE;
struct interval_tree_double_span_iter used_span;
struct interval_tree_span_iter allowed_span;
unsigned long max_alignment = PAGE_SIZE;
@@ -122,15 +122,15 @@ static int iopt_alloc_iova(struct io_pagetable *iopt, unsigned long *iova,
return -EOVERFLOW;
/*
- * Keep alignment present in the uptr when building the IOVA, this
+ * Keep alignment present in addr when building the IOVA, which
* increases the chance we can map a THP.
*/
- if (!uptr)
+ if (!addr)
iova_alignment = roundup_pow_of_two(length);
else
iova_alignment = min_t(unsigned long,
roundup_pow_of_two(length),
- 1UL << __ffs64(uptr));
+ 1UL << __ffs64(addr));
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
max_alignment = HPAGE_SIZE;
@@ -248,6 +248,7 @@ static int iopt_alloc_area_pages(struct io_pagetable *iopt,
int iommu_prot, unsigned int flags)
{
struct iopt_pages_list *elm;
+ unsigned long start;
unsigned long iova;
int rc = 0;
@@ -267,9 +268,8 @@ static int iopt_alloc_area_pages(struct io_pagetable *iopt,
/* Use the first entry to guess the ideal IOVA alignment */
elm = list_first_entry(pages_list, struct iopt_pages_list,
next);
- rc = iopt_alloc_iova(
- iopt, dst_iova,
- (uintptr_t)elm->pages->uptr + elm->start_byte, length);
+ start = elm->start_byte + (uintptr_t)elm->pages->uptr;
+ rc = iopt_alloc_iova(iopt, dst_iova, start, length);
if (rc)
goto out_unlock;
if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH V4 2/9] iommufd: rename uptr in iopt_alloc_iova
2024-10-18 21:27 ` [PATCH V4 2/9] iommufd: rename uptr in iopt_alloc_iova Steve Sistare
@ 2024-10-22 18:32 ` Nicolin Chen
0 siblings, 0 replies; 19+ messages in thread
From: Nicolin Chen @ 2024-10-22 18:32 UTC (permalink / raw)
To: Steve Sistare; +Cc: iommu, Jason Gunthorpe, Kevin Tian
On Fri, Oct 18, 2024 at 02:27:29PM -0700, Steve Sistare wrote:
> iopt_alloc_iova takes a uptr argument but only checks for its alignment.
> Generalize this to an unsigned address, which can be the offset from the
> start of a file in a subsequent patch. No functional change.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH V4 3/9] iommufd: generalize iopt_pages address
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
2024-10-18 21:27 ` [PATCH V4 1/9] mm/gup: folio_add_pins Steve Sistare
2024-10-18 21:27 ` [PATCH V4 2/9] iommufd: rename uptr in iopt_alloc_iova Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-22 18:38 ` Nicolin Chen
2024-10-18 21:27 ` [PATCH V4 4/9] iommufd: pfn reader local variables Steve Sistare
` (5 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
The starting address in iopt_pages is currently a __user *uptr. Generalize
to allow other types of addresses. Refactor iopt_alloc_pages and
iopt_map_user_pages into address-type specific and common functions.
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/iommufd/io_pagetable.c | 55 ++++++++++++++++++++++--------------
drivers/iommu/iommufd/io_pagetable.h | 13 +++++++--
drivers/iommu/iommufd/pages.c | 31 ++++++++++++++------
3 files changed, 67 insertions(+), 32 deletions(-)
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 68ad91d..874ee9e 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -384,6 +384,34 @@ int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
return rc;
}
+static int iopt_map_common(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
+ struct iopt_pages *pages, unsigned long *iova,
+ unsigned long length, unsigned long start_byte,
+ int iommu_prot, unsigned int flags)
+{
+ struct iopt_pages_list elm = {};
+ LIST_HEAD(pages_list);
+ int rc;
+
+ elm.pages = pages;
+ elm.start_byte = start_byte;
+ if (ictx->account_mode == IOPT_PAGES_ACCOUNT_MM &&
+ elm.pages->account_mode == IOPT_PAGES_ACCOUNT_USER)
+ elm.pages->account_mode = IOPT_PAGES_ACCOUNT_MM;
+ elm.length = length;
+ list_add(&elm.next, &pages_list);
+
+ rc = iopt_map_pages(iopt, &pages_list, length, iova, iommu_prot, flags);
+ if (rc) {
+ if (elm.area)
+ iopt_abort_area(elm.area);
+ if (elm.pages)
+ iopt_put_pages(elm.pages);
+ return rc;
+ }
+ return 0;
+}
+
/**
* iopt_map_user_pages() - Map a user VA to an iova in the io page table
* @ictx: iommufd_ctx the iopt is part of
@@ -408,29 +436,14 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
unsigned long length, int iommu_prot,
unsigned int flags)
{
- struct iopt_pages_list elm = {};
- LIST_HEAD(pages_list);
- int rc;
+ struct iopt_pages *pages;
- elm.pages = iopt_alloc_pages(uptr, length, iommu_prot & IOMMU_WRITE);
- if (IS_ERR(elm.pages))
- return PTR_ERR(elm.pages);
- if (ictx->account_mode == IOPT_PAGES_ACCOUNT_MM &&
- elm.pages->account_mode == IOPT_PAGES_ACCOUNT_USER)
- elm.pages->account_mode = IOPT_PAGES_ACCOUNT_MM;
- elm.start_byte = uptr - elm.pages->uptr;
- elm.length = length;
- list_add(&elm.next, &pages_list);
+ pages = iopt_alloc_user_pages(uptr, length, iommu_prot & IOMMU_WRITE);
+ if (IS_ERR(pages))
+ return PTR_ERR(pages);
- rc = iopt_map_pages(iopt, &pages_list, length, iova, iommu_prot, flags);
- if (rc) {
- if (elm.area)
- iopt_abort_area(elm.area);
- if (elm.pages)
- iopt_put_pages(elm.pages);
- return rc;
- }
- return 0;
+ return iopt_map_common(ictx, iopt, pages, iova, length,
+ uptr - pages->uptr, iommu_prot, flags);
}
struct iova_bitmap_fn_arg {
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index c61d744..8e48266 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -175,6 +175,10 @@ enum {
IOPT_PAGES_ACCOUNT_MM = 2,
};
+enum iopt_address_type {
+ IOPT_ADDRESS_USER = 0,
+};
+
/*
* This holds a pinned page list for multiple areas of IO address space. The
* pages always originate from a linear chunk of userspace VA. Multiple
@@ -195,7 +199,10 @@ struct iopt_pages {
struct task_struct *source_task;
struct mm_struct *source_mm;
struct user_struct *source_user;
- void __user *uptr;
+ enum iopt_address_type type;
+ union {
+ void __user *uptr; /* IOPT_ADDRESS_USER */
+ };
bool writable:1;
u8 account_mode;
@@ -206,8 +213,8 @@ struct iopt_pages {
struct rb_root_cached domains_itree;
};
-struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
- bool writable);
+struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
+ unsigned long length, bool writable);
void iopt_release_pages(struct kref *kref);
static inline void iopt_put_pages(struct iopt_pages *pages)
{
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 93d806c..aa69c97 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1139,11 +1139,11 @@ static int pfn_reader_first(struct pfn_reader *pfns, struct iopt_pages *pages,
return 0;
}
-struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
- bool writable)
+static struct iopt_pages *iopt_alloc_pages(unsigned long length,
+ unsigned long start_byte,
+ bool writable)
{
struct iopt_pages *pages;
- unsigned long end;
/*
* The iommu API uses size_t as the length, and protect the DIV_ROUND_UP
@@ -1152,9 +1152,6 @@ struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
if (length > SIZE_MAX - PAGE_SIZE || length == 0)
return ERR_PTR(-EINVAL);
- if (check_add_overflow((unsigned long)uptr, length, &end))
- return ERR_PTR(-EOVERFLOW);
-
pages = kzalloc(sizeof(*pages), GFP_KERNEL_ACCOUNT);
if (!pages)
return ERR_PTR(-ENOMEM);
@@ -1164,8 +1161,7 @@ struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
mutex_init(&pages->mutex);
pages->source_mm = current->mm;
mmgrab(pages->source_mm);
- pages->uptr = (void __user *)ALIGN_DOWN((uintptr_t)uptr, PAGE_SIZE);
- pages->npages = DIV_ROUND_UP(length + (uptr - pages->uptr), PAGE_SIZE);
+ pages->npages = DIV_ROUND_UP(length + start_byte, PAGE_SIZE);
pages->access_itree = RB_ROOT_CACHED;
pages->domains_itree = RB_ROOT_CACHED;
pages->writable = writable;
@@ -1179,6 +1175,25 @@ struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
return pages;
}
+struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
+ unsigned long length, bool writable)
+{
+ struct iopt_pages *pages;
+ unsigned long end;
+ void __user *uptr_down =
+ (void __user *) ALIGN_DOWN((uintptr_t)uptr, PAGE_SIZE);
+
+ if (check_add_overflow((unsigned long)uptr, length, &end))
+ return ERR_PTR(-EOVERFLOW);
+
+ pages = iopt_alloc_pages(length, uptr - uptr_down, writable);
+ if (IS_ERR(pages))
+ return pages;
+ pages->uptr = uptr_down;
+ pages->type = IOPT_ADDRESS_USER;
+ return pages;
+}
+
void iopt_release_pages(struct kref *kref)
{
struct iopt_pages *pages = container_of(kref, struct iopt_pages, kref);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH V4 3/9] iommufd: generalize iopt_pages address
2024-10-18 21:27 ` [PATCH V4 3/9] iommufd: generalize iopt_pages address Steve Sistare
@ 2024-10-22 18:38 ` Nicolin Chen
0 siblings, 0 replies; 19+ messages in thread
From: Nicolin Chen @ 2024-10-22 18:38 UTC (permalink / raw)
To: Steve Sistare; +Cc: iommu, Jason Gunthorpe, Kevin Tian
On Fri, Oct 18, 2024 at 02:27:30PM -0700, Steve Sistare wrote:
> The starting address in iopt_pages is currently a __user *uptr. Generalize
> to allow other types of addresses. Refactor iopt_alloc_pages and
> iopt_map_user_pages into address-type specific and common functions.
>
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH V4 4/9] iommufd: pfn reader local variables
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
` (2 preceding siblings ...)
2024-10-18 21:27 ` [PATCH V4 3/9] iommufd: generalize iopt_pages address Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-18 21:27 ` [PATCH V4 5/9] iommufd: folio subroutines Steve Sistare
` (4 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
Add local variables for common sub-expressions needed by a subsequent
patch. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/iommufd/pages.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index aa69c97..0cf5b65 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -978,6 +978,8 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
{
struct interval_tree_double_span_iter *span = &pfns->span;
unsigned long start_index = pfns->batch_end_index;
+ struct pfn_reader_user *user = &pfns->user;
+ unsigned long npages;
struct iopt_area *area;
int rc;
@@ -1015,10 +1017,9 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
return rc;
}
- batch_from_pages(&pfns->batch,
- pfns->user.upages +
- (start_index - pfns->user.upages_start),
- pfns->user.upages_end - start_index);
+ npages = user->upages_end - start_index;
+ start_index -= user->upages_start;
+ batch_from_pages(&pfns->batch, user->upages + start_index, npages);
return 0;
}
@@ -1092,16 +1093,18 @@ static int pfn_reader_init(struct pfn_reader *pfns, struct iopt_pages *pages,
static void pfn_reader_release_pins(struct pfn_reader *pfns)
{
struct iopt_pages *pages = pfns->pages;
+ struct pfn_reader_user *user = &pfns->user;
- if (pfns->user.upages_end > pfns->batch_end_index) {
- size_t npages = pfns->user.upages_end - pfns->batch_end_index;
-
+ if (user->upages_end > pfns->batch_end_index) {
/* Any pages not transferred to the batch are just unpinned */
- unpin_user_pages(pfns->user.upages + (pfns->batch_end_index -
- pfns->user.upages_start),
- npages);
+
+ unsigned long npages = user->upages_end - pfns->batch_end_index;
+ unsigned long start_index = pfns->batch_end_index -
+ user->upages_start;
+
+ unpin_user_pages(user->upages + start_index, npages);
iopt_pages_sub_npinned(pages, npages);
- pfns->user.upages_end = pfns->batch_end_index;
+ user->upages_end = pfns->batch_end_index;
}
if (pfns->batch_start_index != pfns->batch_end_index) {
pfn_reader_unpin(pfns);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH V4 5/9] iommufd: folio subroutines
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
` (3 preceding siblings ...)
2024-10-18 21:27 ` [PATCH V4 4/9] iommufd: pfn reader local variables Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-21 17:30 ` Jason Gunthorpe
2024-10-18 21:27 ` [PATCH V4 6/9] iommufd: pfn reader for file mappings Steve Sistare
` (3 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
Add subroutines for copying folios to a batch and unpinning pages.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
drivers/iommu/iommufd/pages.c | 93 +++++++++++++++++++++++++++++++++++++------
1 file changed, 80 insertions(+), 13 deletions(-)
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 0cf5b65..c9ca9e0 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -346,25 +346,40 @@ static void batch_destroy(struct pfn_batch *batch, void *backup)
kfree(batch->pfns);
}
-/* true if the pfn was added, false otherwise */
-static bool batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
+/* returns the number of pfn's added */
+static long batch_add_pfn_num(struct pfn_batch *batch, unsigned long pfn,
+ unsigned long nr)
{
- const unsigned int MAX_NPFNS = type_max(typeof(*batch->npfns));
+ const unsigned long MAX_NPFNS = type_max(typeof(*batch->npfns));
+ unsigned long n = 0;
- if (batch->end &&
- pfn == batch->pfns[batch->end - 1] + batch->npfns[batch->end - 1] &&
- batch->npfns[batch->end - 1] != MAX_NPFNS) {
- batch->npfns[batch->end - 1]++;
- batch->total_pfns++;
- return true;
+ if (batch->end) {
+ unsigned long npfn_end = batch->npfns[batch->end - 1];
+ unsigned long pfn_end = batch->pfns[batch->end - 1];
+
+ if (pfn == pfn_end + npfn_end && npfn_end < MAX_NPFNS) {
+ n = min_t(unsigned long, MAX_NPFNS - npfn_end, nr);
+ batch->npfns[batch->end - 1] += n;
+ batch->total_pfns += n;
+ if (nr == n)
+ return n;
+ nr -= n;
+ }
}
if (batch->end == batch->array_size)
- return false;
- batch->total_pfns++;
+ return n;
+ nr = min_t(unsigned long, MAX_NPFNS, nr);
+ batch->total_pfns += nr;
batch->pfns[batch->end] = pfn;
- batch->npfns[batch->end] = 1;
+ batch->npfns[batch->end] = nr;
batch->end++;
- return true;
+ return n + nr;
+}
+
+/* true if the pfn was added, false otherwise */
+static bool batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
+{
+ return batch_add_pfn_num(batch, pfn, 1) == 1;
}
/*
@@ -622,6 +637,58 @@ static void batch_from_pages(struct pfn_batch *batch, struct page **pages,
break;
}
+static void batch_from_folios(struct pfn_batch *batch, struct folio ***folios_p,
+ unsigned long *offset_p, unsigned long npages)
+{
+ unsigned long offset = *offset_p;
+ struct folio **folios = *folios_p;
+
+ while (npages) {
+ unsigned long n;
+ struct folio *folio = *folios;
+ unsigned long nr = folio_nr_pages(folio) - offset;
+ unsigned long pfn = page_to_pfn(folio_page(folio, offset));
+
+ nr = min(nr, npages);
+ n = batch_add_pfn_num(batch, pfn, nr);
+ npages -= n;
+
+ if (n == nr) {
+ folios++;
+ offset = 0;
+ } else if (n) {
+ offset += n;
+ } else {
+ break;
+ }
+ }
+
+ *folios_p = folios;
+ *offset_p = offset;
+}
+
+static void folios_unpin_partial(struct folio **folios, unsigned long offset,
+ unsigned long npages)
+{
+ unsigned long i = 0;
+
+ while (npages) {
+ struct folio *folio = folios[i++];
+ unsigned long nr = folio_nr_pages(folio);
+
+ if (offset == 0 && nr < npages) {
+ unpin_folio(folio);
+ } else {
+ struct page *page = folio_page(folio, offset);
+
+ nr = min(npages, nr - offset);
+ unpin_user_page_range_dirty_lock(page, nr, false);
+ offset = 0;
+ }
+ npages -= nr;
+ }
+}
+
static void batch_unpin(struct pfn_batch *batch, struct iopt_pages *pages,
unsigned int first_page_off, size_t npages)
{
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH V4 5/9] iommufd: folio subroutines
2024-10-18 21:27 ` [PATCH V4 5/9] iommufd: folio subroutines Steve Sistare
@ 2024-10-21 17:30 ` Jason Gunthorpe
2024-10-22 13:26 ` Steven Sistare
0 siblings, 1 reply; 19+ messages in thread
From: Jason Gunthorpe @ 2024-10-21 17:30 UTC (permalink / raw)
To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen
On Fri, Oct 18, 2024 at 02:27:32PM -0700, Steve Sistare wrote:
> -/* true if the pfn was added, false otherwise */
> -static bool batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
> +/* returns the number of pfn's added */
> +static long batch_add_pfn_num(struct pfn_batch *batch, unsigned long pfn,
> + unsigned long nr)
> {
> - const unsigned int MAX_NPFNS = type_max(typeof(*batch->npfns));
> + const unsigned long MAX_NPFNS = type_max(typeof(*batch->npfns));
> + unsigned long n = 0;
>
> - if (batch->end &&
> - pfn == batch->pfns[batch->end - 1] + batch->npfns[batch->end - 1] &&
> - batch->npfns[batch->end - 1] != MAX_NPFNS) {
> - batch->npfns[batch->end - 1]++;
> - batch->total_pfns++;
> - return true;
> + if (batch->end) {
> + unsigned long npfn_end = batch->npfns[batch->end - 1];
> + unsigned long pfn_end = batch->pfns[batch->end - 1];
> +
> + if (pfn == pfn_end + npfn_end && npfn_end < MAX_NPFNS) {
> + n = min_t(unsigned long, MAX_NPFNS - npfn_end, nr);
> + batch->npfns[batch->end - 1] += n;
> + batch->total_pfns += n;
> + if (nr == n)
> + return n;
> + nr -= n;
Looking at this a bit more carefully, MAX_NPFNS is 2**32 and a PFN is
2**12, so a batch entry can describe 2**44 bytes of contiguous memory.
I don't think we need all this logic to try to do 'min'. If the thing
we are trying to add doesn't fit then just skip trying to join it and
continue on.
The next empty item will hold 2**44 and that can hold any single folio
- folio npages is limited to an unsigned int.
One reason I bring this up is that we really don't want to split
across folios, like if you get a 2M,2M,1G pattern then we don't want
to split the 1G across batches as that will definately harm
construction of the page table. Theoretical because of how big the
size is but still.
> +static void batch_from_folios(struct pfn_batch *batch, struct folio ***folios_p,
> + unsigned long *offset_p, unsigned long npages)
> +{
> + unsigned long offset = *offset_p;
> + struct folio **folios = *folios_p;
> +
> + while (npages) {
> + unsigned long n;
> + struct folio *folio = *folios;
> + unsigned long nr = folio_nr_pages(folio) - offset;
> + unsigned long pfn = page_to_pfn(folio_page(folio, offset));
> +
> + nr = min(nr, npages);
Ah, npages is used for the trailing partial folio too..
> + n = batch_add_pfn_num(batch, pfn, nr);
> + npages -= n;
> +
> + if (n == nr) {
> + folios++;
> + offset = 0;
> + } else if (n) {
> + offset += n;
> + } else {
> + break;
> + }
Then this gets to be quite a bit simpler
And you'll put the refcount adjustment in here like you said?
Otherwise this patch looks like the right thing
Thanks,
Jason
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH V4 5/9] iommufd: folio subroutines
2024-10-21 17:30 ` Jason Gunthorpe
@ 2024-10-22 13:26 ` Steven Sistare
0 siblings, 0 replies; 19+ messages in thread
From: Steven Sistare @ 2024-10-22 13:26 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: iommu, Kevin Tian, Nicolin Chen
On 10/21/2024 1:30 PM, Jason Gunthorpe wrote:
> On Fri, Oct 18, 2024 at 02:27:32PM -0700, Steve Sistare wrote:
>> -/* true if the pfn was added, false otherwise */
>> -static bool batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
>> +/* returns the number of pfn's added */
>> +static long batch_add_pfn_num(struct pfn_batch *batch, unsigned long pfn,
>> + unsigned long nr)
>> {
>> - const unsigned int MAX_NPFNS = type_max(typeof(*batch->npfns));
>> + const unsigned long MAX_NPFNS = type_max(typeof(*batch->npfns));
>> + unsigned long n = 0;
>>
>> - if (batch->end &&
>> - pfn == batch->pfns[batch->end - 1] + batch->npfns[batch->end - 1] &&
>> - batch->npfns[batch->end - 1] != MAX_NPFNS) {
>> - batch->npfns[batch->end - 1]++;
>> - batch->total_pfns++;
>> - return true;
>> + if (batch->end) {
>> + unsigned long npfn_end = batch->npfns[batch->end - 1];
>> + unsigned long pfn_end = batch->pfns[batch->end - 1];
>> +
>> + if (pfn == pfn_end + npfn_end && npfn_end < MAX_NPFNS) {
>> + n = min_t(unsigned long, MAX_NPFNS - npfn_end, nr);
>> + batch->npfns[batch->end - 1] += n;
>> + batch->total_pfns += n;
>> + if (nr == n)
>> + return n;
>> + nr -= n;
>
> Looking at this a bit more carefully, MAX_NPFNS is 2**32 and a PFN is
> 2**12, so a batch entry can describe 2**44 bytes of contiguous memory.
>
> I don't think we need all this logic to try to do 'min'. If the thing
> we are trying to add doesn't fit then just skip trying to join it and
> continue on.
Gladly. The return type becomes boolean, as either all or none of the folio
is accepted, which further simplifies its caller batch_from_folios.
- Steve
>
> The next empty item will hold 2**44 and that can hold any single folio
> - folio npages is limited to an unsigned int.
>
> One reason I bring this up is that we really don't want to split
> across folios, like if you get a 2M,2M,1G pattern then we don't want
> to split the 1G across batches as that will definately harm
> construction of the page table. Theoretical because of how big the
> size is but still.
>
>> +static void batch_from_folios(struct pfn_batch *batch, struct folio ***folios_p,
>> + unsigned long *offset_p, unsigned long npages)
>> +{
>> + unsigned long offset = *offset_p;
>> + struct folio **folios = *folios_p;
>> +
>> + while (npages) {
>> + unsigned long n;
>> + struct folio *folio = *folios;
>> + unsigned long nr = folio_nr_pages(folio) - offset;
>> + unsigned long pfn = page_to_pfn(folio_page(folio, offset));
>> +
>> + nr = min(nr, npages);
>
> Ah, npages is used for the trailing partial folio too..
>
>> + n = batch_add_pfn_num(batch, pfn, nr);
>> + npages -= n;
>> +
>> + if (n == nr) {
>> + folios++;
>> + offset = 0;
>> + } else if (n) {
>> + offset += n;
>> + } else {
>> + break;
>> + }
>
> Then this gets to be quite a bit simpler
>
> And you'll put the refcount adjustment in here like you said?
>
> Otherwise this patch looks like the right thing
>
> Thanks,
> Jason
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH V4 6/9] iommufd: pfn reader for file mappings
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
` (4 preceding siblings ...)
2024-10-18 21:27 ` [PATCH V4 5/9] iommufd: folio subroutines Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-19 18:41 ` kernel test robot
2024-10-21 17:32 ` Jason Gunthorpe
2024-10-18 21:27 ` [PATCH V4 7/9] iommufd: IOMMU_IOAS_MAP_FILE Steve Sistare
` (2 subsequent siblings)
8 siblings, 2 replies; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
Extend pfn_reader_user to pin file mappings, by calling memfd_pin_folios.
Repin at small page granularity, and fill the batch from folios. Expand
folios to upages for the iopt_pages_fill path.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
drivers/iommu/iommufd/io_pagetable.h | 5 ++
drivers/iommu/iommufd/pages.c | 111 ++++++++++++++++++++++++++++++-----
2 files changed, 101 insertions(+), 15 deletions(-)
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 8e48266..5ac4eed 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -177,6 +177,7 @@ enum {
enum iopt_address_type {
IOPT_ADDRESS_USER = 0,
+ IOPT_ADDRESS_FILE = 1,
};
/*
@@ -202,6 +203,10 @@ struct iopt_pages {
enum iopt_address_type type;
union {
void __user *uptr; /* IOPT_ADDRESS_USER */
+ struct { /* IOPT_ADDRESS_FILE */
+ struct file *file;
+ unsigned long start;
+ };
};
bool writable:1;
u8 account_mode;
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index c9ca9e0..9c88b0d 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -770,19 +770,32 @@ struct pfn_reader_user {
* neither
*/
int locked;
+
+ /* The following are only valid if file != NULL. */
+ struct file *file;
+ struct folio **ufolios;
+ unsigned long ufolios_len;
+ unsigned long ufolios_offset;
+ struct folio **ufolios_next;
};
static void pfn_reader_user_init(struct pfn_reader_user *user,
struct iopt_pages *pages)
{
user->upages = NULL;
+ user->upages_len = 0;
user->upages_start = 0;
user->upages_end = 0;
user->locked = -1;
-
user->gup_flags = FOLL_LONGTERM;
if (pages->writable)
user->gup_flags |= FOLL_WRITE;
+
+ user->file = (pages->type == IOPT_ADDRESS_FILE) ? pages->file : NULL;
+ user->ufolios = NULL;
+ user->ufolios_len = 0;
+ user->ufolios_next = NULL;
+ user->ufolios_offset = 0;
}
static void pfn_reader_user_destroy(struct pfn_reader_user *user,
@@ -798,6 +811,50 @@ static void pfn_reader_user_destroy(struct pfn_reader_user *user,
kfree(user->upages);
user->upages = NULL;
+ kfree(user->ufolios);
+ user->ufolios = NULL;
+}
+
+static long pin_memfd_pages(struct pfn_reader_user *user, unsigned long start,
+ unsigned long npages)
+{
+ unsigned long i;
+ unsigned long j;
+ unsigned long offset;
+ unsigned long npages_out = 0;
+ struct page **upages = user->upages;
+ unsigned long end = start + (npages << PAGE_SHIFT) - 1;
+ long nfolios = user->ufolios_len / sizeof(*user->ufolios);
+
+ nfolios = memfd_pin_folios(user->file, start, end, user->ufolios,
+ nfolios, &offset);
+ if (nfolios <= 0)
+ return nfolios;
+
+ offset >>= PAGE_SHIFT;
+ user->ufolios_next = user->ufolios;
+ user->ufolios_offset = offset;
+
+ for (i = 0; i < nfolios; i++) {
+ struct folio *folio = user->ufolios[i];
+ unsigned long nr = folio_nr_pages(folio);
+ unsigned long npin = min(nr - offset, npages);
+
+ if (npin > 1) {
+ int rc = folio_add_pins(folio, npin - 1);
+
+ if (rc)
+ return rc;
+ }
+ if (upages)
+ for (j = offset; j < offset + npin; j++)
+ *upages++ = folio_page(folio, j);
+ npages -= npin;
+ npages_out += npin;
+ offset = 0;
+ }
+
+ return npages_out;
}
static int pfn_reader_user_pin(struct pfn_reader_user *user,
@@ -806,7 +863,9 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
unsigned long last_index)
{
bool remote_mm = pages->source_mm != current->mm;
- unsigned long npages;
+ unsigned long npages = last_index - start_index + 1;
+ unsigned long start;
+ unsigned long unum;
uintptr_t uptr;
long rc;
@@ -814,40 +873,50 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
WARN_ON(last_index < start_index))
return -EINVAL;
- if (!user->upages) {
+ if (!user->file && !user->upages) {
/* All undone in pfn_reader_destroy() */
- user->upages_len =
- (last_index - start_index + 1) * sizeof(*user->upages);
+ user->upages_len = npages * sizeof(*user->upages);
user->upages = temp_kmalloc(&user->upages_len, NULL, 0);
if (!user->upages)
return -ENOMEM;
}
+ if (user->file && !user->ufolios) {
+ user->ufolios_len = npages * sizeof(*user->ufolios);
+ user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
+ if (!user->ufolios)
+ return -ENOMEM;
+ }
+
if (user->locked == -1) {
/*
* The majority of usages will run the map task within the mm
* providing the pages, so we can optimize into
* get_user_pages_fast()
*/
- if (remote_mm) {
+ if (!user->file && remote_mm) {
if (!mmget_not_zero(pages->source_mm))
return -EFAULT;
}
user->locked = 0;
}
- npages = min_t(unsigned long, last_index - start_index + 1,
- user->upages_len / sizeof(*user->upages));
-
+ unum = user->file ? user->ufolios_len / sizeof(*user->ufolios) :
+ user->upages_len / sizeof(*user->upages);
+ npages = min_t(unsigned long, npages, unum);
if (iommufd_should_fail())
return -EFAULT;
- uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
- if (!remote_mm)
+ if (user->file) {
+ start = pages->start + (start_index * PAGE_SIZE);
+ rc = pin_memfd_pages(user, start, npages);
+ } else if (!remote_mm) {
+ uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
rc = pin_user_pages_fast(uptr, npages, user->gup_flags,
user->upages);
- else {
+ } else {
+ uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
if (!user->locked) {
mmap_read_lock(pages->source_mm);
user->locked = 1;
@@ -905,7 +974,8 @@ static int update_mm_locked_vm(struct iopt_pages *pages, unsigned long npages,
mmap_read_unlock(pages->source_mm);
user->locked = 0;
/* If we had the lock then we also have a get */
- } else if ((!user || !user->upages) &&
+
+ } else if ((!user || (!user->upages && !user->ufolios)) &&
pages->source_mm != current->mm) {
if (!mmget_not_zero(pages->source_mm))
return -EINVAL;
@@ -1086,7 +1156,13 @@ static int pfn_reader_fill_span(struct pfn_reader *pfns)
npages = user->upages_end - start_index;
start_index -= user->upages_start;
- batch_from_pages(&pfns->batch, user->upages + start_index, npages);
+
+ if (!user->file)
+ batch_from_pages(&pfns->batch, user->upages + start_index,
+ npages);
+ else
+ batch_from_folios(&pfns->batch, &user->ufolios_next,
+ &user->ufolios_offset, npages);
return 0;
}
@@ -1169,7 +1245,12 @@ static void pfn_reader_release_pins(struct pfn_reader *pfns)
unsigned long start_index = pfns->batch_end_index -
user->upages_start;
- unpin_user_pages(user->upages + start_index, npages);
+ if (!user->file)
+ unpin_user_pages(user->upages + start_index, npages);
+ else
+ folios_unpin_partial(user->ufolios_next,
+ user->ufolios_offset, npages);
+
iopt_pages_sub_npinned(pages, npages);
user->upages_end = pfns->batch_end_index;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH V4 6/9] iommufd: pfn reader for file mappings
2024-10-18 21:27 ` [PATCH V4 6/9] iommufd: pfn reader for file mappings Steve Sistare
@ 2024-10-19 18:41 ` kernel test robot
2024-10-21 17:32 ` Jason Gunthorpe
1 sibling, 0 replies; 19+ messages in thread
From: kernel test robot @ 2024-10-19 18:41 UTC (permalink / raw)
To: Steve Sistare, iommu
Cc: oe-kbuild-all, Jason Gunthorpe, Kevin Tian, Nicolin Chen,
Steve Sistare
Hi Steve,
kernel test robot noticed the following build errors:
[auto build test ERROR on e2d8fe9148b79ed1cbf0663edc988db7769173dc]
url: https://github.com/intel-lab-lkp/linux/commits/Steve-Sistare/mm-gup-folio_add_pins/20241019-053644
base: e2d8fe9148b79ed1cbf0663edc988db7769173dc
patch link: https://lore.kernel.org/r/1729286856-127844-7-git-send-email-steven.sistare%40oracle.com
patch subject: [PATCH V4 6/9] iommufd: pfn reader for file mappings
config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20241020/202410200205.mriS8xuS-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241020/202410200205.mriS8xuS-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410200205.mriS8xuS-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/iommu/iommufd/pages.c: In function 'pfn_reader_user_pin':
>> drivers/iommu/iommufd/pages.c:886:46: error: passing argument 1 of 'temp_kmalloc' from incompatible pointer type [-Wincompatible-pointer-types]
886 | user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
| ^~~~~~~~~~~~~~~~~~
| |
| long unsigned int *
drivers/iommu/iommufd/pages.c:74:35: note: expected 'size_t *' {aka 'unsigned int *'} but argument is of type 'long unsigned int *'
74 | static void *temp_kmalloc(size_t *size, void *backup, size_t backup_len)
| ~~~~~~~~^~~~
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for GET_FREE_REGION
Depends on [n]: SPARSEMEM [=n]
Selected by [m]:
- RESOURCE_KUNIT_TEST [=m] && RUNTIME_TESTING_MENU [=y] && KUNIT [=m]
vim +/temp_kmalloc +886 drivers/iommu/iommufd/pages.c
859
860 static int pfn_reader_user_pin(struct pfn_reader_user *user,
861 struct iopt_pages *pages,
862 unsigned long start_index,
863 unsigned long last_index)
864 {
865 bool remote_mm = pages->source_mm != current->mm;
866 unsigned long npages = last_index - start_index + 1;
867 unsigned long start;
868 unsigned long unum;
869 uintptr_t uptr;
870 long rc;
871
872 if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
873 WARN_ON(last_index < start_index))
874 return -EINVAL;
875
876 if (!user->file && !user->upages) {
877 /* All undone in pfn_reader_destroy() */
878 user->upages_len = npages * sizeof(*user->upages);
879 user->upages = temp_kmalloc(&user->upages_len, NULL, 0);
880 if (!user->upages)
881 return -ENOMEM;
882 }
883
884 if (user->file && !user->ufolios) {
885 user->ufolios_len = npages * sizeof(*user->ufolios);
> 886 user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
887 if (!user->ufolios)
888 return -ENOMEM;
889 }
890
891 if (user->locked == -1) {
892 /*
893 * The majority of usages will run the map task within the mm
894 * providing the pages, so we can optimize into
895 * get_user_pages_fast()
896 */
897 if (!user->file && remote_mm) {
898 if (!mmget_not_zero(pages->source_mm))
899 return -EFAULT;
900 }
901 user->locked = 0;
902 }
903
904 unum = user->file ? user->ufolios_len / sizeof(*user->ufolios) :
905 user->upages_len / sizeof(*user->upages);
906 npages = min_t(unsigned long, npages, unum);
907
908 if (iommufd_should_fail())
909 return -EFAULT;
910
911 if (user->file) {
912 start = pages->start + (start_index * PAGE_SIZE);
913 rc = pin_memfd_pages(user, start, npages);
914 } else if (!remote_mm) {
915 uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
916 rc = pin_user_pages_fast(uptr, npages, user->gup_flags,
917 user->upages);
918 } else {
919 uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
920 if (!user->locked) {
921 mmap_read_lock(pages->source_mm);
922 user->locked = 1;
923 }
924 rc = pin_user_pages_remote(pages->source_mm, uptr, npages,
925 user->gup_flags, user->upages,
926 &user->locked);
927 }
928 if (rc <= 0) {
929 if (WARN_ON(!rc))
930 return -EFAULT;
931 return rc;
932 }
933 iopt_pages_add_npinned(pages, rc);
934 user->upages_start = start_index;
935 user->upages_end = start_index + rc;
936 return 0;
937 }
938
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH V4 6/9] iommufd: pfn reader for file mappings
2024-10-18 21:27 ` [PATCH V4 6/9] iommufd: pfn reader for file mappings Steve Sistare
2024-10-19 18:41 ` kernel test robot
@ 2024-10-21 17:32 ` Jason Gunthorpe
1 sibling, 0 replies; 19+ messages in thread
From: Jason Gunthorpe @ 2024-10-21 17:32 UTC (permalink / raw)
To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen
On Fri, Oct 18, 2024 at 02:27:33PM -0700, Steve Sistare wrote:
> +static long pin_memfd_pages(struct pfn_reader_user *user, unsigned long start,
> + unsigned long npages)
> +{
> + unsigned long i;
> + unsigned long j;
> + unsigned long offset;
> + unsigned long npages_out = 0;
> + struct page **upages = user->upages;
> + unsigned long end = start + (npages << PAGE_SHIFT) - 1;
> + long nfolios = user->ufolios_len / sizeof(*user->ufolios);
> +
> + nfolios = memfd_pin_folios(user->file, start, end, user->ufolios,
> + nfolios, &offset);
> + if (nfolios <= 0)
> + return nfolios;
> +
> + offset >>= PAGE_SHIFT;
> + user->ufolios_next = user->ufolios;
> + user->ufolios_offset = offset;
> +
> + for (i = 0; i < nfolios; i++) {
> + struct folio *folio = user->ufolios[i];
> + unsigned long nr = folio_nr_pages(folio);
> + unsigned long npin = min(nr - offset, npages);
Let's add a comment note about the loop computing npages being able to be
optimized someday
But I think this looks good, with the add_refs change you noted
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> + if (upages)
> + for (j = offset; j < offset + npin; j++)
> + *upages++ = folio_page(folio, j);
The upages path is 'slow' and rarely used because it is for mdevs
only, so it would be fine if it was the only loop here.
Jason
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH V4 7/9] iommufd: IOMMU_IOAS_MAP_FILE
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
` (5 preceding siblings ...)
2024-10-18 21:27 ` [PATCH V4 6/9] iommufd: pfn reader for file mappings Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-21 17:34 ` Jason Gunthorpe
2024-10-18 21:27 ` [PATCH V4 8/9] iommufd: file mappings for mdev Steve Sistare
2024-10-18 21:27 ` [PATCH V4 9/9] iommufd: map file selftest Steve Sistare
8 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
Define the IOMMU_IOAS_MAP_FILE ioctl interface, which allows a user to
register memory by passing a memfd plus offset and length. Implement it
using the memfd_pin_folios KAPI.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
drivers/iommu/iommufd/io_pagetable.c | 36 ++++++++++++++++++++++++++++++-
drivers/iommu/iommufd/io_pagetable.h | 2 ++
drivers/iommu/iommufd/ioas.c | 38 +++++++++++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 5 +++++
drivers/iommu/iommufd/main.c | 2 ++
drivers/iommu/iommufd/pages.c | 23 ++++++++++++++++++++
include/uapi/linux/iommufd.h | 25 ++++++++++++++++++++++
7 files changed, 130 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 874ee9e..8a790e5 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -268,7 +268,14 @@ static int iopt_alloc_area_pages(struct io_pagetable *iopt,
/* Use the first entry to guess the ideal IOVA alignment */
elm = list_first_entry(pages_list, struct iopt_pages_list,
next);
- start = elm->start_byte + (uintptr_t)elm->pages->uptr;
+ switch (elm->pages->type) {
+ case IOPT_ADDRESS_USER:
+ start = elm->start_byte + (uintptr_t)elm->pages->uptr;
+ break;
+ case IOPT_ADDRESS_FILE:
+ start = elm->start_byte + elm->pages->start;
+ break;
+ }
rc = iopt_alloc_iova(iopt, dst_iova, start, length);
if (rc)
goto out_unlock;
@@ -446,6 +453,33 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
uptr - pages->uptr, iommu_prot, flags);
}
+/**
+ * iopt_map_file_pages() - Like iopt_map_user_pages, but map a file.
+ * @ictx: iommufd_ctx the iopt is part of
+ * @iopt: io_pagetable to act on
+ * @iova: If IOPT_ALLOC_IOVA is set this is unused on input and contains
+ * the chosen iova on output. Otherwise is the iova to map to on input
+ * @file: file to map
+ * @start: map file starting at this byte offset
+ * @length: Number of bytes to map
+ * @iommu_prot: Combination of IOMMU_READ/WRITE/etc bits for the mapping
+ * @flags: IOPT_ALLOC_IOVA or zero
+ */
+int iopt_map_file_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
+ unsigned long *iova, struct file *file,
+ unsigned long start, unsigned long length,
+ int iommu_prot, unsigned int flags)
+{
+ struct iopt_pages *pages;
+
+ pages = iopt_alloc_file_pages(file, start, length,
+ iommu_prot & IOMMU_WRITE);
+ if (IS_ERR(pages))
+ return PTR_ERR(pages);
+ return iopt_map_common(ictx, iopt, pages, iova, length,
+ start - pages->start, iommu_prot, flags);
+}
+
struct iova_bitmap_fn_arg {
unsigned long flags;
struct io_pagetable *iopt;
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 5ac4eed..9b40b22 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -220,6 +220,8 @@ struct iopt_pages {
struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
unsigned long length, bool writable);
+struct iopt_pages *iopt_alloc_file_pages(struct file *file, unsigned long start,
+ unsigned long length, bool writable);
void iopt_release_pages(struct kref *kref);
static inline void iopt_put_pages(struct iopt_pages *pages)
{
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index 2c4b2bb..3d31bcf 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -2,6 +2,7 @@
/*
* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
*/
+#include <linux/file.h>
#include <linux/interval_tree.h>
#include <linux/iommu.h>
#include <linux/iommufd.h>
@@ -197,6 +198,43 @@ static int conv_iommu_prot(u32 map_flags)
return iommu_prot;
}
+int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_ioas_map_file *cmd = ucmd->cmd;
+ unsigned long iova = cmd->iova;
+ struct iommufd_ioas *ioas;
+ unsigned int flags = 0;
+ int rc;
+ struct file *file;
+
+ if (cmd->iova >= ULONG_MAX || cmd->length >= ULONG_MAX)
+ return -EOVERFLOW;
+
+ ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+ if (IS_ERR(ioas))
+ return PTR_ERR(ioas);
+
+ if (!(cmd->flags & IOMMU_IOAS_MAP_FIXED_IOVA))
+ flags = IOPT_ALLOC_IOVA;
+
+ file = fget(cmd->fd);
+ if (!file)
+ return -EBADF;
+
+ rc = iopt_map_file_pages(ucmd->ictx, &ioas->iopt, &iova, file,
+ cmd->start, cmd->length,
+ conv_iommu_prot(cmd->flags), flags);
+ if (rc)
+ goto out_put;
+
+ cmd->iova = iova;
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+ iommufd_put_object(ucmd->ictx, &ioas->obj);
+ fput(file);
+ return rc;
+}
+
int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
{
struct iommu_ioas_map *cmd = ucmd->cmd;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index f1d865e..8f3c21a 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -69,6 +69,10 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
unsigned long *iova, void __user *uptr,
unsigned long length, int iommu_prot,
unsigned int flags);
+int iopt_map_file_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
+ unsigned long *iova, struct file *file,
+ unsigned long start, unsigned long length,
+ int iommu_prot, unsigned int flags);
int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
unsigned long length, unsigned long *dst_iova,
int iommu_prot, unsigned int flags);
@@ -276,6 +280,7 @@ static inline struct iommufd_ioas *iommufd_get_ioas(struct iommufd_ctx *ictx,
int iommufd_ioas_iova_ranges(struct iommufd_ucmd *ucmd);
int iommufd_ioas_allow_iovas(struct iommufd_ucmd *ucmd);
int iommufd_ioas_map(struct iommufd_ucmd *ucmd);
+int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index b5f5d27..826a2b2 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -378,6 +378,8 @@ struct iommufd_ioctl_op {
struct iommu_ioas_iova_ranges, out_iova_alignment),
IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map,
iova),
+ IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
+ struct iommu_ioas_map_file, iova),
IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
length),
IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option,
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 9c88b0d..d369f56 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -45,6 +45,7 @@
* last_iova + 1 can overflow. An iopt_pages index will always be much less than
* ULONG_MAX so last_index + 1 cannot overflow.
*/
+#include <linux/file.h>
#include <linux/highmem.h>
#include <linux/iommu.h>
#include <linux/iommufd.h>
@@ -1345,6 +1346,26 @@ struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
return pages;
}
+struct iopt_pages *iopt_alloc_file_pages(struct file *file, unsigned long start,
+ unsigned long length, bool writable)
+
+{
+ struct iopt_pages *pages;
+ unsigned long start_down = ALIGN_DOWN(start, PAGE_SIZE);
+ unsigned long end;
+
+ if (length && check_add_overflow(start, length - 1, &end))
+ return ERR_PTR(-EOVERFLOW);
+
+ pages = iopt_alloc_pages(length, start - start_down, writable);
+ if (IS_ERR(pages))
+ return pages;
+ pages->file = get_file(file);
+ pages->start = start_down;
+ pages->type = IOPT_ADDRESS_FILE;
+ return pages;
+}
+
void iopt_release_pages(struct kref *kref)
{
struct iopt_pages *pages = container_of(kref, struct iopt_pages, kref);
@@ -1357,6 +1378,8 @@ void iopt_release_pages(struct kref *kref)
mutex_destroy(&pages->mutex);
put_task_struct(pages->source_task);
free_uid(pages->source_user);
+ if (pages->type == IOPT_ADDRESS_FILE)
+ fput(pages->file);
kfree(pages);
}
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 72010f7..a8f0d30 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -51,6 +51,7 @@ enum {
IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP = 0x8c,
IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
+ IOMMUFD_CMD_IOAS_MAP_FILE = 0x8f,
};
/**
@@ -214,6 +215,30 @@ struct iommu_ioas_map {
#define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
/**
+ * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
+ * @size: sizeof(struct iommu_ioas_map_file)
+ * @fd: the memfd to map
+ * @start: byte offset from start of file to map from
+ * @flags: same as for iommu_ioas_map
+ * @ios_id: ditto
+ * @length: ditto
+ * @iova: ditto
+ *
+ * Set an IOVA mapping from a memfd file. All other arguments and semantics
+ * match those of IOMMU_IOAS_MAP.
+ */
+struct iommu_ioas_map_file {
+ __u32 size;
+ __u32 flags;
+ __u32 ioas_id;
+ __s32 fd;
+ __aligned_u64 start;
+ __aligned_u64 length;
+ __aligned_u64 iova;
+};
+#define IOMMU_IOAS_MAP_FILE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP_FILE)
+
+/**
* struct iommu_ioas_copy - ioctl(IOMMU_IOAS_COPY)
* @size: sizeof(struct iommu_ioas_copy)
* @flags: Combination of enum iommufd_ioas_map_flags
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH V4 7/9] iommufd: IOMMU_IOAS_MAP_FILE
2024-10-18 21:27 ` [PATCH V4 7/9] iommufd: IOMMU_IOAS_MAP_FILE Steve Sistare
@ 2024-10-21 17:34 ` Jason Gunthorpe
0 siblings, 0 replies; 19+ messages in thread
From: Jason Gunthorpe @ 2024-10-21 17:34 UTC (permalink / raw)
To: Steve Sistare; +Cc: iommu, Kevin Tian, Nicolin Chen
On Fri, Oct 18, 2024 at 02:27:34PM -0700, Steve Sistare wrote:
> Define the IOMMU_IOAS_MAP_FILE ioctl interface, which allows a user to
> register memory by passing a memfd plus offset and length. Implement it
> using the memfd_pin_folios KAPI.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> drivers/iommu/iommufd/io_pagetable.c | 36 ++++++++++++++++++++++++++++++-
> drivers/iommu/iommufd/io_pagetable.h | 2 ++
> drivers/iommu/iommufd/ioas.c | 38 +++++++++++++++++++++++++++++++++
> drivers/iommu/iommufd/iommufd_private.h | 5 +++++
> drivers/iommu/iommufd/main.c | 2 ++
> drivers/iommu/iommufd/pages.c | 23 ++++++++++++++++++++
> include/uapi/linux/iommufd.h | 25 ++++++++++++++++++++++
> 7 files changed, 130 insertions(+), 1 deletion(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH V4 8/9] iommufd: file mappings for mdev
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
` (6 preceding siblings ...)
2024-10-18 21:27 ` [PATCH V4 7/9] iommufd: IOMMU_IOAS_MAP_FILE Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-18 21:27 ` [PATCH V4 9/9] iommufd: map file selftest Steve Sistare
8 siblings, 0 replies; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
Support file mappings for mediated devices, aka mdevs. Access is initiated
by the vfio_pin_pages and vfio_dma_rw kernel interfaces.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/iommufd/pages.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index d369f56..75d0fed 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -1819,11 +1819,11 @@ static int iopt_pages_fill_from_domain(struct iopt_pages *pages,
return 0;
}
-static int iopt_pages_fill_from_mm(struct iopt_pages *pages,
- struct pfn_reader_user *user,
- unsigned long start_index,
- unsigned long last_index,
- struct page **out_pages)
+static int iopt_pages_fill(struct iopt_pages *pages,
+ struct pfn_reader_user *user,
+ unsigned long start_index,
+ unsigned long last_index,
+ struct page **out_pages)
{
unsigned long cur_index = start_index;
int rc;
@@ -1897,8 +1897,8 @@ int iopt_pages_fill_xarray(struct iopt_pages *pages, unsigned long start_index,
/* hole */
cur_pages = out_pages + (span.start_hole - start_index);
- rc = iopt_pages_fill_from_mm(pages, &user, span.start_hole,
- span.last_hole, cur_pages);
+ rc = iopt_pages_fill(pages, &user, span.start_hole,
+ span.last_hole, cur_pages);
if (rc)
goto out_clean_xa;
rc = pages_to_xarray(&pages->pinned_pfns, span.start_hole,
@@ -1978,6 +1978,10 @@ static int iopt_pages_rw_page(struct iopt_pages *pages, unsigned long index,
struct page *page = NULL;
int rc;
+ if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
+ WARN_ON(pages->type != IOPT_ADDRESS_USER))
+ return -EINVAL;
+
if (!mmget_not_zero(pages->source_mm))
return iopt_pages_rw_slow(pages, index, index, offset, data,
length, flags);
@@ -2033,6 +2037,15 @@ int iopt_pages_rw_access(struct iopt_pages *pages, unsigned long start_byte,
if ((flags & IOMMUFD_ACCESS_RW_WRITE) && !pages->writable)
return -EPERM;
+ if (pages->type == IOPT_ADDRESS_FILE)
+ return iopt_pages_rw_slow(pages, start_index, last_index,
+ start_byte % PAGE_SIZE, data, length,
+ flags);
+
+ if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
+ WARN_ON(pages->type != IOPT_ADDRESS_USER))
+ return -EINVAL;
+
if (!(flags & IOMMUFD_ACCESS_RW_KTHREAD) && change_mm) {
if (start_index == last_index)
return iopt_pages_rw_page(pages, start_index,
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH V4 9/9] iommufd: map file selftest
2024-10-18 21:27 [PATCH V4 0/9] iommu_ioas_map_file Steve Sistare
` (7 preceding siblings ...)
2024-10-18 21:27 ` [PATCH V4 8/9] iommufd: file mappings for mdev Steve Sistare
@ 2024-10-18 21:27 ` Steve Sistare
2024-10-18 22:51 ` Nicolin Chen
8 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2024-10-18 21:27 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Nicolin Chen, Steve Sistare
Add test cases to exercise IOMMU_IOAS_MAP_FILE.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
tools/testing/selftests/iommu/iommufd.c | 151 +++++++++++++++++++++++---
tools/testing/selftests/iommu/iommufd_utils.h | 41 +++++++
2 files changed, 177 insertions(+), 15 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 4927b9a..3525670 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -3,6 +3,8 @@
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/eventfd.h>
+#include <sys/syscall.h>
+#include <asm/unistd.h>
#define __EXPORTED_HEADERS__
#include <linux/vfio.h>
@@ -10,6 +12,8 @@
#include "iommufd_utils.h"
static unsigned long HUGEPAGE_SIZE;
+static void *mfd_buffer;
+static int mfd;
#define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
#define MOCK_HUGE_PAGE_SIZE (512 * MOCK_PAGE_SIZE)
@@ -33,6 +37,19 @@ static unsigned long get_huge_page_size(void)
return strtoul(buf, NULL, 10);
}
+static void *memfd_mmap(size_t length, int prot, int flags, int *mfd_p)
+{
+ int mfd_flags = (flags & MAP_HUGETLB) ? MFD_HUGETLB : 0;
+ int mfd = memfd_create("buffer", mfd_flags);
+
+ if (mfd <= 0)
+ return MAP_FAILED;
+ if (ftruncate(mfd, length))
+ return MAP_FAILED;
+ *mfd_p = mfd;
+ return mmap(0, length, prot, flags, mfd, 0);
+}
+
static __attribute__((constructor)) void setup_sizes(void)
{
void *vrc;
@@ -49,6 +66,9 @@ static __attribute__((constructor)) void setup_sizes(void)
vrc = mmap(buffer, BUFFER_SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
assert(vrc == buffer);
+
+ mfd_buffer = memfd_mmap(BUFFER_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
+ &mfd);
}
FIXTURE(iommufd)
@@ -1372,6 +1392,7 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
{
unsigned int mock_domains;
bool hugepages;
+ bool file;
};
FIXTURE_SETUP(iommufd_mock_domain)
@@ -1410,26 +1431,45 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
{
.mock_domains = 1,
.hugepages = false,
+ .file = false,
};
FIXTURE_VARIANT_ADD(iommufd_mock_domain, two_domains)
{
.mock_domains = 2,
.hugepages = false,
+ .file = false,
};
FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_hugepage)
{
.mock_domains = 1,
.hugepages = true,
+ .file = false,
};
FIXTURE_VARIANT_ADD(iommufd_mock_domain, two_domains_hugepage)
{
.mock_domains = 2,
.hugepages = true,
+ .file = false,
};
+FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_file)
+{
+ .mock_domains = 1,
+ .hugepages = false,
+ .file = true,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_file_hugepage)
+{
+ .mock_domains = 1,
+ .hugepages = true,
+ .file = true,
+};
+
+
/* Have the kernel check that the user pages made it to the iommu_domain */
#define check_mock_iova(_ptr, _iova, _length) \
({ \
@@ -1455,7 +1495,10 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
} \
})
-TEST_F(iommufd_mock_domain, basic)
+static void
+test_basic_mmap(struct __test_metadata *_metadata,
+ struct _test_data_iommufd_mock_domain *self,
+ const struct _fixture_variant_iommufd_mock_domain *variant)
{
size_t buf_size = self->mmap_buf_size;
uint8_t *buf;
@@ -1478,12 +1521,52 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
test_err_ioctl_ioas_map(EFAULT, buf, buf_size, &iova);
}
+static void
+test_basic_file(struct __test_metadata *_metadata,
+ struct _test_data_iommufd_mock_domain *self,
+ const struct _fixture_variant_iommufd_mock_domain *variant)
+{
+ size_t buf_size = self->mmap_buf_size;
+ uint8_t *buf;
+ __u64 iova;
+ int mfd_tmp;
+ int prot = PROT_READ | PROT_WRITE;
+
+ /* Simple one page map */
+ test_ioctl_ioas_map_file(mfd, 0, PAGE_SIZE, &iova);
+ check_mock_iova(mfd_buffer, iova, PAGE_SIZE);
+
+ buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd_tmp);
+ ASSERT_NE(MAP_FAILED, buf);
+
+ /* EFAULT half way through mapping */
+ ASSERT_EQ(0, munmap(buf + buf_size / 2, buf_size / 2));
+ test_err_ioctl_ioas_map_file(EFAULT, 0, buf_size, &iova);
+
+ /* EFAULT on first page */
+ ASSERT_EQ(0, munmap(buf, buf_size / 2));
+ test_err_ioctl_ioas_map_file(EFAULT, 0, buf_size, &iova);
+
+ close(mfd_tmp);
+}
+
+TEST_F(iommufd_mock_domain, basic)
+{
+ if (variant->file)
+ test_basic_file(_metadata, self, variant);
+ else
+ test_basic_mmap(_metadata, self, variant);
+}
+
TEST_F(iommufd_mock_domain, ro_unshare)
{
uint8_t *buf;
__u64 iova;
int fd;
+ if (variant->file)
+ SKIP(return, "redundant test");
+
fd = open("/proc/self/exe", O_RDONLY);
ASSERT_NE(-1, fd);
@@ -1513,9 +1596,13 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
unsigned int start;
unsigned int end;
uint8_t *buf;
+ int prot = PROT_READ | PROT_WRITE;
+ int mfd;
- buf = mmap(0, buf_size, PROT_READ | PROT_WRITE, self->mmap_flags, -1,
- 0);
+ if (variant->file)
+ buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd);
+ else
+ buf = mmap(0, buf_size, prot, self->mmap_flags, -1, 0);
ASSERT_NE(MAP_FAILED, buf);
check_refs(buf, buf_size, 0);
@@ -1532,7 +1619,12 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
size_t length = end - start;
__u64 iova;
- test_ioctl_ioas_map(buf + start, length, &iova);
+ if (variant->file) {
+ test_ioctl_ioas_map_file(mfd, start, length,
+ &iova);
+ } else {
+ test_ioctl_ioas_map(buf + start, length, &iova);
+ }
check_mock_iova(buf + start, iova, length);
check_refs(buf + start / PAGE_SIZE * PAGE_SIZE,
end / PAGE_SIZE * PAGE_SIZE -
@@ -1544,6 +1636,8 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
}
check_refs(buf, buf_size, 0);
ASSERT_EQ(0, munmap(buf, buf_size));
+ if (variant->file)
+ close(mfd);
}
TEST_F(iommufd_mock_domain, all_aligns_copy)
@@ -1554,9 +1648,13 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
unsigned int start;
unsigned int end;
uint8_t *buf;
+ int prot = PROT_READ | PROT_WRITE;
+ int mfd;
- buf = mmap(0, buf_size, PROT_READ | PROT_WRITE, self->mmap_flags, -1,
- 0);
+ if (variant->file)
+ buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd);
+ else
+ buf = mmap(0, buf_size, prot, self->mmap_flags, -1, 0);
ASSERT_NE(MAP_FAILED, buf);
check_refs(buf, buf_size, 0);
@@ -1575,7 +1673,12 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
uint32_t mock_stdev_id;
__u64 iova;
- test_ioctl_ioas_map(buf + start, length, &iova);
+ if (variant->file) {
+ test_ioctl_ioas_map_file(mfd, start, length,
+ &iova);
+ } else {
+ test_ioctl_ioas_map(buf + start, length, &iova);
+ }
/* Add and destroy a domain while the area exists */
old_id = self->hwpt_ids[1];
@@ -1596,15 +1699,18 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
}
check_refs(buf, buf_size, 0);
ASSERT_EQ(0, munmap(buf, buf_size));
+ if (variant->file)
+ close(mfd);
}
TEST_F(iommufd_mock_domain, user_copy)
{
+ void *buf = variant->file ? mfd_buffer : buffer;
struct iommu_test_cmd access_cmd = {
.size = sizeof(access_cmd),
.op = IOMMU_TEST_OP_ACCESS_PAGES,
.access_pages = { .length = BUFFER_SIZE,
- .uptr = (uintptr_t)buffer },
+ .uptr = (uintptr_t)buf },
};
struct iommu_ioas_copy copy_cmd = {
.size = sizeof(copy_cmd),
@@ -1623,9 +1729,13 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
/* Pin the pages in an IOAS with no domains then copy to an IOAS with domains */
test_ioctl_ioas_alloc(&ioas_id);
- test_ioctl_ioas_map_id(ioas_id, buffer, BUFFER_SIZE,
- ©_cmd.src_iova);
-
+ if (variant->file) {
+ test_ioctl_ioas_map_id_file(ioas_id, mfd, 0, BUFFER_SIZE,
+ ©_cmd.src_iova);
+ } else {
+ test_ioctl_ioas_map_id(ioas_id, buf, BUFFER_SIZE,
+ ©_cmd.src_iova);
+ }
test_cmd_create_access(ioas_id, &access_cmd.id,
MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES);
@@ -1635,12 +1745,17 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
&access_cmd));
copy_cmd.src_ioas_id = ioas_id;
ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, ©_cmd));
- check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
+ check_mock_iova(buf, MOCK_APERTURE_START, BUFFER_SIZE);
/* Now replace the ioas with a new one */
test_ioctl_ioas_alloc(&new_ioas_id);
- test_ioctl_ioas_map_id(new_ioas_id, buffer, BUFFER_SIZE,
- ©_cmd.src_iova);
+ if (variant->file) {
+ test_ioctl_ioas_map_id_file(new_ioas_id, mfd, 0, BUFFER_SIZE,
+ ©_cmd.src_iova);
+ } else {
+ test_ioctl_ioas_map_id(new_ioas_id, buf, BUFFER_SIZE,
+ ©_cmd.src_iova);
+ }
test_cmd_access_replace_ioas(access_cmd.id, new_ioas_id);
/* Destroy the old ioas and cleanup copied mapping */
@@ -1654,7 +1769,7 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
&access_cmd));
copy_cmd.src_ioas_id = new_ioas_id;
ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, ©_cmd));
- check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
+ check_mock_iova(buf, MOCK_APERTURE_START, BUFFER_SIZE);
test_cmd_destroy_access_pages(
access_cmd.id, access_cmd.access_pages.out_access_pages_id);
@@ -1667,6 +1782,9 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
{
uint32_t ioas_id;
+ if (variant->file)
+ SKIP(return, "redundant test");
+
test_ioctl_ioas_alloc(&ioas_id);
test_cmd_mock_domain_replace(self->stdev_ids[0], ioas_id);
@@ -1697,6 +1815,9 @@ static void check_access_rw(struct __test_metadata *_metadata, int fd,
{
int i;
+ if (variant->file)
+ SKIP(return, "redundant test");
+
for (i = 0; i != variant->mock_domains; i++) {
uint32_t hwpt_id[2];
uint32_t stddev_id;
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 40f6f14..7bc664e 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -589,6 +589,47 @@ static int _test_ioctl_ioas_unmap(int fd, unsigned int ioas_id, uint64_t iova,
EXPECT_ERRNO(_errno, _test_ioctl_ioas_unmap(self->fd, self->ioas_id, \
iova, length, NULL))
+static int _test_ioctl_ioas_map_file(int fd, unsigned int ioas_id, int mfd,
+ size_t start, size_t length, __u64 *iova,
+ unsigned int flags)
+{
+ struct iommu_ioas_map_file cmd = {
+ .size = sizeof(cmd),
+ .flags = flags,
+ .ioas_id = ioas_id,
+ .fd = mfd,
+ .start = start,
+ .length = length,
+ };
+ int ret;
+
+ if (flags & IOMMU_IOAS_MAP_FIXED_IOVA)
+ cmd.iova = *iova;
+
+ ret = ioctl(fd, IOMMU_IOAS_MAP_FILE, &cmd);
+ *iova = cmd.iova;
+ return ret;
+}
+
+#define test_ioctl_ioas_map_file(mfd, start, length, iova_p) \
+ ASSERT_EQ(0, _test_ioctl_ioas_map_file(self->fd, self->ioas_id, mfd, \
+ start, length, iova_p, \
+ IOMMU_IOAS_MAP_WRITEABLE | \
+ IOMMU_IOAS_MAP_READABLE))
+
+#define test_err_ioctl_ioas_map_file(_errno, start, length, iova_p) \
+ EXPECT_ERRNO(_errno, \
+ _test_ioctl_ioas_map_file(self->fd, self->ioas_id, mfd, \
+ start, length, iova_p, \
+ IOMMU_IOAS_MAP_WRITEABLE | \
+ IOMMU_IOAS_MAP_READABLE))
+
+#define test_ioctl_ioas_map_id_file(ioas_id, mfd, start, length, iova_p) \
+ ASSERT_EQ(0, _test_ioctl_ioas_map_file(self->fd, ioas_id, mfd, \
+ start, length, iova_p, \
+ IOMMU_IOAS_MAP_WRITEABLE | \
+ IOMMU_IOAS_MAP_READABLE))
+
static int _test_ioctl_set_temp_memory_limit(int fd, unsigned int limit)
{
struct iommu_test_cmd memlimit_cmd = {
--
1.8.3.1
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH V4 9/9] iommufd: map file selftest
2024-10-18 21:27 ` [PATCH V4 9/9] iommufd: map file selftest Steve Sistare
@ 2024-10-18 22:51 ` Nicolin Chen
2024-10-22 13:27 ` Steven Sistare
0 siblings, 1 reply; 19+ messages in thread
From: Nicolin Chen @ 2024-10-18 22:51 UTC (permalink / raw)
To: Steve Sistare; +Cc: iommu, Jason Gunthorpe, Kevin Tian
On Fri, Oct 18, 2024 at 02:27:36PM -0700, Steve Sistare wrote:
>
> Add test cases to exercise IOMMU_IOAS_MAP_FILE.
[...]
> +static int _test_ioctl_ioas_map_file(int fd, unsigned int ioas_id, int mfd,
> + size_t start, size_t length, __u64 *iova,
> + unsigned int flags)
> +{
> + struct iommu_ioas_map_file cmd = {
> + .size = sizeof(cmd),
> + .flags = flags,
> + .ioas_id = ioas_id,
> + .fd = mfd,
> + .start = start,
> + .length = length,
> + };
> + int ret;
> +
> + if (flags & IOMMU_IOAS_MAP_FIXED_IOVA)
> + cmd.iova = *iova;
> +
> + ret = ioctl(fd, IOMMU_IOAS_MAP_FILE, &cmd);
For a new ioctl command, There should a TEST_LENGTH in
TEST_F(iommufd, cmd_length).
Also, we might need something in iommufd_fail_nth.c too.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH V4 9/9] iommufd: map file selftest
2024-10-18 22:51 ` Nicolin Chen
@ 2024-10-22 13:27 ` Steven Sistare
0 siblings, 0 replies; 19+ messages in thread
From: Steven Sistare @ 2024-10-22 13:27 UTC (permalink / raw)
To: Nicolin Chen; +Cc: iommu, Jason Gunthorpe, Kevin Tian
On 10/18/2024 6:51 PM, Nicolin Chen wrote:
> On Fri, Oct 18, 2024 at 02:27:36PM -0700, Steve Sistare wrote:
>>
>> Add test cases to exercise IOMMU_IOAS_MAP_FILE.
> [...]
>> +static int _test_ioctl_ioas_map_file(int fd, unsigned int ioas_id, int mfd,
>> + size_t start, size_t length, __u64 *iova,
>> + unsigned int flags)
>> +{
>> + struct iommu_ioas_map_file cmd = {
>> + .size = sizeof(cmd),
>> + .flags = flags,
>> + .ioas_id = ioas_id,
>> + .fd = mfd,
>> + .start = start,
>> + .length = length,
>> + };
>> + int ret;
>> +
>> + if (flags & IOMMU_IOAS_MAP_FIXED_IOVA)
>> + cmd.iova = *iova;
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_MAP_FILE, &cmd);
>
> For a new ioctl command, There should a TEST_LENGTH in
> TEST_F(iommufd, cmd_length).
Thanks, I will add this.
> Also, we might need something in iommufd_fail_nth.c too.
Thanks, I'll add a map_file variant of one or more of these cases:
TEST_FAIL_NTH(basic_fail_nth, map_two_domains)
TEST_FAIL_NTH(basic_fail_nth, map_domain)
TEST_FAIL_NTH(basic_fail_nth, basic)
- Steve
^ permalink raw reply [flat|nested] 19+ messages in thread