* [PATCH V2 0/4] iommufd live update
@ 2024-09-26 13:53 Steve Sistare
2024-09-26 13:53 ` [PATCH V2 1/4] iommufd: Export do_update_pinned Steve Sistare
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Steve Sistare @ 2024-09-26 13:53 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Alex Williamson, Cornelia Huck,
Steve Sistare
Live update is a technique wherein an application saves its state, launches
an updated version of itself, and restores its state. Clients of the
application experience a brief suspension of service, on the order of
100's of milliseconds, but are otherwise unaffected.
Define the IOMMU_IOAS_CHANGE_PROCESS ioctl to allow management and use
of an iommufd device to be transferred from one process to another. The
application is responsible for transferring the device descriptor to the new
process, eg either by preservation across fork and exec or via SCM_RIGHTS.
It atomically updates everything to the current process, grabbing the new mm,
and transferring pinned page counts.
IOMMU_IOAS_CHANGE_PROCESS only supports DMA mappings created with
IOMMU_IOAS_MAP_FILE, because the kernel metadata for such mappings does not
depend on the userland VA of the pages (which is different in the new process).
IOMMU_IOAS_CHANGE_PROCESS fails if other types of mappings are present.
Thanks to Jason Gunthorpe for code and ideas in this series.
This series depends on the series:
[PATCH V2] iommu_ioas_map_file
This is implemented in QEMU by the series:
[PATCH] Live update: iommufd (V2 to be posted shortly)
Changes in V2:
* deleted interface to update VA
* add selftest cases
Steve Sistare (4):
iommufd: Export do_update_pinned
iommufd: Lock all objects
iommufd: Add IOMMU_IOAS_CHANGE_PROCESS
iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest
drivers/iommu/iommufd/io_pagetable.h | 6 +
drivers/iommu/iommufd/ioas.c | 209 ++++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 2 +
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/pages.c | 10 +-
include/uapi/linux/iommufd.h | 23 +++
tools/testing/selftests/iommu/Makefile | 1 +
tools/testing/selftests/iommu/iommufd.c | 148 +++++++++++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 8 +-
9 files changed, 398 insertions(+), 12 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH V2 1/4] iommufd: Export do_update_pinned
2024-09-26 13:53 [PATCH V2 0/4] iommufd live update Steve Sistare
@ 2024-09-26 13:53 ` Steve Sistare
2024-10-02 17:34 ` Jason Gunthorpe
2024-09-26 13:53 ` [PATCH V2 2/4] iommufd: Lock all objects Steve Sistare
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Steve Sistare @ 2024-09-26 13:53 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Alex Williamson, Cornelia Huck,
Steve Sistare
Export do_update_pinned. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
drivers/iommu/iommufd/io_pagetable.h | 5 +++++
drivers/iommu/iommufd/pages.c | 10 +++++-----
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 909d396..5552911 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -252,4 +252,9 @@ struct iopt_pages_access {
unsigned int users;
};
+struct pfn_reader_user;
+
+int iopt_update_pinned(struct iopt_pages *pages, unsigned long npages,
+ bool inc, struct pfn_reader_user *user);
+
#endif
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 6b71072..db606b1 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -991,8 +991,8 @@ static int update_mm_locked_vm(struct iopt_pages *pages, unsigned long npages,
return rc;
}
-static int do_update_pinned(struct iopt_pages *pages, unsigned long npages,
- bool inc, struct pfn_reader_user *user)
+int iopt_update_pinned(struct iopt_pages *pages, unsigned long npages,
+ bool inc, struct pfn_reader_user *user)
{
int rc = 0;
@@ -1026,8 +1026,8 @@ static void update_unpinned(struct iopt_pages *pages)
return;
if (pages->npinned == pages->last_npinned)
return;
- do_update_pinned(pages, pages->last_npinned - pages->npinned, false,
- NULL);
+ iopt_update_pinned(pages, pages->last_npinned - pages->npinned, false,
+ NULL);
}
/*
@@ -1057,7 +1057,7 @@ static int pfn_reader_user_update_pinned(struct pfn_reader_user *user,
npages = pages->npinned - pages->last_npinned;
inc = true;
}
- return do_update_pinned(pages, npages, inc, user);
+ return iopt_update_pinned(pages, npages, inc, user);
}
/*
--
1.8.3.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH V2 2/4] iommufd: Lock all objects
2024-09-26 13:53 [PATCH V2 0/4] iommufd live update Steve Sistare
2024-09-26 13:53 ` [PATCH V2 1/4] iommufd: Export do_update_pinned Steve Sistare
@ 2024-09-26 13:53 ` Steve Sistare
2024-10-07 16:18 ` Steven Sistare
2024-09-26 13:53 ` [PATCH V2 3/4] iommufd: Add IOMMU_IOAS_CHANGE_PROCESS Steve Sistare
2024-09-26 13:53 ` [PATCH V2 4/4] iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest Steve Sistare
3 siblings, 1 reply; 13+ messages in thread
From: Steve Sistare @ 2024-09-26 13:53 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Alex Williamson, Cornelia Huck,
Steve Sistare
Define helpers to lock and unlock all iommufd_ctx objects.
This will allow DMA mappings to be updated atomically during live update.
This code is substantially the same as an initial version provided by
Jason, plus fixes.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
drivers/iommu/iommufd/ioas.c | 67 +++++++++++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 1 +
drivers/iommu/iommufd/main.c | 1 +
3 files changed, 69 insertions(+)
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index 2d25ba0..8eb869d 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -52,7 +52,10 @@ int iommufd_ioas_alloc_ioctl(struct iommufd_ucmd *ucmd)
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
if (rc)
goto out_table;
+
+ down_read(&ucmd->ictx->ioas_creation_lock);
iommufd_object_finalize(ucmd->ictx, &ioas->obj);
+ up_read(&ucmd->ictx->ioas_creation_lock);
return 0;
out_table:
@@ -370,6 +373,70 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
return rc;
}
+static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
+ struct xarray *ioas_list)
+{
+ struct iommufd_ioas *ioas;
+ unsigned long index;
+
+ xa_for_each(ioas_list, index, ioas) {
+ up_write(&ioas->iopt.iova_rwsem);
+ refcount_dec(&ioas->obj.users);
+ }
+ up_write(&ictx->ioas_creation_lock);
+ xa_destroy(ioas_list);
+}
+
+static int iommufd_take_all_iova_rwsem(struct iommufd_ctx *ictx,
+ struct xarray *ioas_list)
+{
+ struct iommufd_object *obj;
+ unsigned long index;
+ int rc;
+
+ /*
+ * This is very ugly, it is done instead of adding a lock around
+ * pages->source_mm, which is a performance path for mdev, we just
+ * obtain the write side of all the iova_rwsems which also protects the
+ * pages->source_*. Due to copies we can't know which IOAS could read
+ * from the pages, so we just lock everything. This is the only place
+ * locks are nested and they are uniformly taken in ID order.
+ *
+ * ioas_creation_lock prevents new IOAS from being installed in the
+ * xarray while we do this, and also prevents more than one thread from
+ * holding nested locks.
+ */
+ down_write(&ictx->ioas_creation_lock);
+ xa_lock(&ictx->objects);
+ lockdep_off();
+ xa_for_each(&ictx->objects, index, obj) {
+ struct iommufd_ioas *ioas;
+
+ if (!obj || obj->type != IOMMUFD_OBJ_IOAS)
+ continue;
+
+ if (!refcount_inc_not_zero(&obj->users))
+ continue;
+
+ xa_unlock(&ictx->objects);
+
+ ioas = container_of(obj, struct iommufd_ioas, obj);
+ down_write(&ioas->iopt.iova_rwsem);
+
+ rc = xa_err(xa_store(ioas_list, index, ioas, GFP_KERNEL));
+ if (rc) {
+ lockdep_on();
+ iommufd_release_all_iova_rwsem(ictx, ioas_list);
+ return rc;
+ }
+
+ xa_lock(&ictx->objects);
+ }
+ lockdep_on();
+ xa_unlock(&ictx->objects);
+ return 0;
+}
+
int iommufd_option_rlimit_mode(struct iommu_option *cmd,
struct iommufd_ctx *ictx)
{
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 0ebb5bd..06a52f2 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -23,6 +23,7 @@ struct iommufd_ctx {
struct xarray objects;
struct xarray groups;
wait_queue_head_t destroy_wait;
+ struct rw_semaphore ioas_creation_lock;
u8 account_mode;
/* Compatibility with VFIO no iommu */
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 5f77ce6..13ec031 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -248,6 +248,7 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
pr_info_once("IOMMUFD is providing /dev/vfio/vfio, not VFIO.\n");
}
+ init_rwsem(&ictx->ioas_creation_lock);
xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
xa_init(&ictx->groups);
ictx->file = filp;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH V2 3/4] iommufd: Add IOMMU_IOAS_CHANGE_PROCESS
2024-09-26 13:53 [PATCH V2 0/4] iommufd live update Steve Sistare
2024-09-26 13:53 ` [PATCH V2 1/4] iommufd: Export do_update_pinned Steve Sistare
2024-09-26 13:53 ` [PATCH V2 2/4] iommufd: Lock all objects Steve Sistare
@ 2024-09-26 13:53 ` Steve Sistare
2024-10-02 17:51 ` Jason Gunthorpe
2024-09-26 13:53 ` [PATCH V2 4/4] iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest Steve Sistare
3 siblings, 1 reply; 13+ messages in thread
From: Steve Sistare @ 2024-09-26 13:53 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Alex Williamson, Cornelia Huck,
Steve Sistare
Add an ioctl that updates all DMA mappings to reflect the current process,
Change the mm and transfer locked memory accounting from old to current mm.
This will be used for live update, allowing an old process to hand the
iommufd device descriptor to a new process. The new process calls the
ioctl.
IOMMU_IOAS_CHANGE_PROCESS only supports DMA mappings created with
IOMMU_IOAS_MAP_FILE, because the kernel metadata for such mappings does not
depend on the userland VA of the pages (which is different in the new process).
IOMMU_IOAS_CHANGE_PROCESS fails if other types of mappings are present.
This code is substantially the same as an initial version provided by Jason,
plus fixes.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
drivers/iommu/iommufd/io_pagetable.h | 1 +
drivers/iommu/iommufd/ioas.c | 142 ++++++++++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 1 +
drivers/iommu/iommufd/main.c | 2 +
include/uapi/linux/iommufd.h | 23 ++++++
5 files changed, 169 insertions(+)
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 5552911..255fb01 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -173,6 +173,7 @@ enum {
IOPT_PAGES_ACCOUNT_NONE = 0,
IOPT_PAGES_ACCOUNT_USER = 1,
IOPT_PAGES_ACCOUNT_MM = 2,
+ IOPT_PAGES_ACCOUNT_MODE_NUM = 3,
};
enum iopt_address_type {
diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index 8eb869d..122a40b 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -437,6 +437,148 @@ static int iommufd_take_all_iova_rwsem(struct iommufd_ctx *ictx,
return 0;
}
+static bool need_charge_update(struct iopt_pages *pages)
+{
+ switch (pages->account_mode) {
+ case IOPT_PAGES_ACCOUNT_NONE:
+ return false;
+ case IOPT_PAGES_ACCOUNT_MM:
+ return pages->source_mm != current->mm;
+ case IOPT_PAGES_ACCOUNT_USER:
+ /*
+ * Update when mm changes because it also accounts
+ * in mm->pinned_vm.
+ */
+ return (pages->source_user != current_user()) ||
+ (pages->source_mm != current->mm);
+ }
+ return true;
+}
+
+static int charge_current(unsigned long *npinned)
+{
+ struct iopt_pages tmp = {
+ .source_mm = current->mm,
+ .source_task = current->group_leader,
+ .source_user = current_user(),
+ };
+ unsigned int account_mode;
+ int rc;
+
+ for (account_mode = 0; account_mode != IOPT_PAGES_ACCOUNT_MODE_NUM;
+ account_mode++) {
+ if (!npinned[account_mode])
+ continue;
+
+ tmp.account_mode = account_mode;
+ rc = iopt_update_pinned(&tmp, npinned[account_mode], true,
+ NULL);
+ if (rc)
+ goto err_undo;
+ }
+ return 0;
+
+err_undo:
+ while (account_mode != 0) {
+ account_mode--;
+ tmp.account_mode = account_mode;
+ iopt_update_pinned(&tmp, npinned[account_mode], false, NULL);
+ }
+ return rc;
+}
+
+static void change_mm(struct iopt_pages *pages)
+{
+ struct task_struct *old_task = pages->source_task;
+ struct user_struct *old_user = pages->source_user;
+ struct mm_struct *old_mm = pages->source_mm;
+
+ pages->source_mm = current->mm;
+ mmgrab(pages->source_mm);
+ mmdrop(old_mm);
+
+ pages->source_task = current->group_leader;
+ get_task_struct(pages->source_task);
+ put_task_struct(old_task);
+
+ pages->source_user = get_uid(current_user());
+ free_uid(old_user);
+}
+
+#define for_each_ioas_area(_xa, _index, _ioas, _area) \
+ xa_for_each((_xa), (_index), (_ioas)) \
+ for (_area = iopt_area_iter_first(&_ioas->iopt, 0, ULONG_MAX); \
+ _area; \
+ _area = iopt_area_iter_next(_area, 0, ULONG_MAX))
+
+int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_ioas_change_process *cmd = ucmd->cmd;
+ struct iommufd_ctx *ictx = ucmd->ictx;
+ unsigned long all_npinned[IOPT_PAGES_ACCOUNT_MODE_NUM] = {0, 0, 0};
+ struct iommufd_ioas *ioas;
+ struct iopt_area *area;
+ struct iopt_pages *pages;
+ struct xarray ioas_list;
+ unsigned long index;
+ int rc;
+
+ if (cmd->__reserved)
+ return -EOPNOTSUPP;
+
+ xa_init(&ioas_list);
+ rc = iommufd_take_all_iova_rwsem(ictx, &ioas_list);
+ if (rc)
+ goto out_no_release;
+
+ for_each_ioas_area(&ioas_list, index, ioas, area) {
+ if (area->pages->type != IOPT_ADDRESS_FILE) {
+ rc = -EINVAL;
+ goto out;
+ }
+ }
+
+ /*
+ * Count last_pinned pages, then clear it to avoid double counting
+ * if the same iopt_pages is visited multiple times in this loop.
+ * Since we are under all the locks, npinned == last_npinned, so we
+ * can easily restore last_npinned before we return.
+ */
+ for_each_ioas_area(&ioas_list, index, ioas, area) {
+ pages = area->pages;
+
+ if (need_charge_update(pages)) {
+ all_npinned[pages->account_mode] += pages->last_npinned;
+ pages->last_npinned = 0;
+ }
+ }
+
+ rc = charge_current(all_npinned);
+
+ if (rc) {
+ /* Charge failed. Fix last_npinned and bail. */
+ for_each_ioas_area(&ioas_list, index, ioas, area)
+ area->pages->last_npinned = area->pages->npinned;
+ goto out;
+ }
+
+ for_each_ioas_area(&ioas_list, index, ioas, area) {
+ pages = area->pages;
+
+ /* Uncharge the old one (which also restores last_npinned) */
+ if (need_charge_update(pages))
+ iopt_update_pinned(pages, pages->npinned, false, NULL);
+
+ change_mm(pages);
+ }
+
+out:
+ iommufd_release_all_iova_rwsem(ictx, &ioas_list);
+
+out_no_release:
+ return rc;
+}
+
int iommufd_option_rlimit_mode(struct iommu_option *cmd,
struct iommufd_ctx *ictx)
{
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 06a52f2..968bc17 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -282,6 +282,7 @@ static inline struct iommufd_ioas *iommufd_get_ioas(struct iommufd_ctx *ictx,
int iommufd_ioas_allow_iovas(struct iommufd_ucmd *ucmd);
int iommufd_ioas_map(struct iommufd_ucmd *ucmd);
int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
+int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 13ec031..244eb4d 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -373,6 +373,8 @@ struct iommufd_ioctl_op {
struct iommu_ioas_alloc, out_ioas_id),
IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
struct iommu_ioas_allow_iovas, allowed_iovas),
+ IOCTL_OP(IOMMU_IOAS_CHANGE_PROCESS, iommufd_ioas_change_process,
+ struct iommu_ioas_change_process, size),
IOCTL_OP(IOMMU_IOAS_COPY, iommufd_ioas_copy, struct iommu_ioas_copy,
src_iova),
IOCTL_OP(IOMMU_IOAS_IOVA_RANGES, iommufd_ioas_iova_ranges,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c484981..523de8c 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -52,6 +52,7 @@ enum {
IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
IOMMUFD_CMD_IOAS_MAP_FILE = 0x8f,
+ IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x90,
};
/**
@@ -822,4 +823,26 @@ struct iommu_fault_alloc {
__u32 out_fault_fd;
};
#define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC)
+
+/**
+ * struct iommu_ioas_change_process - ioctl(VFIO_IOAS_CHANGE_PROCESS)
+ * @size: sizeof(struct iommu_ioas_change_process)
+ * @__reserved: Must be 0
+ *
+ * This transfers pinned memory counts for every memory map in every IOAS
+ * in the context to the current process. This only supports maps created
+ * with IOMMU_IOAS_MAP_FILE, and returns EINVAL if other maps are present.
+ * If the ioctl returns a failure status, then nothing is changed.
+ *
+ * This API is useful for transferring operation of a device from one process
+ * to another, such as during userland live update.
+ */
+struct iommu_ioas_change_process {
+ __u32 size;
+ __u32 __reserved;
+};
+
+#define IOMMU_IOAS_CHANGE_PROCESS \
+ _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_CHANGE_PROCESS)
+
#endif
--
1.8.3.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH V2 4/4] iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest
2024-09-26 13:53 [PATCH V2 0/4] iommufd live update Steve Sistare
` (2 preceding siblings ...)
2024-09-26 13:53 ` [PATCH V2 3/4] iommufd: Add IOMMU_IOAS_CHANGE_PROCESS Steve Sistare
@ 2024-09-26 13:53 ` Steve Sistare
2024-10-07 16:05 ` Steven Sistare
3 siblings, 1 reply; 13+ messages in thread
From: Steve Sistare @ 2024-09-26 13:53 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Alex Williamson, Cornelia Huck,
Steve Sistare
Add selftest cases for IOMMU_IOAS_CHANGE_PROCESS.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
tools/testing/selftests/iommu/Makefile | 1 +
tools/testing/selftests/iommu/iommufd.c | 148 ++++++++++++++++++++++++--
tools/testing/selftests/iommu/iommufd_utils.h | 8 +-
3 files changed, 150 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/iommu/Makefile b/tools/testing/selftests/iommu/Makefile
index fd64779..bd23ac9 100644
--- a/tools/testing/selftests/iommu/Makefile
+++ b/tools/testing/selftests/iommu/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
CFLAGS += -Wall -O2 -Wno-unused-function
CFLAGS += $(KHDR_INCLUDES)
+CFLAGS += -lcap
TEST_GEN_PROGS :=
TEST_GEN_PROGS += iommufd
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 6adf84f..5cb8b28 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -5,6 +5,7 @@
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <asm/unistd.h>
+#include <sys/capability.h>
#define __EXPORTED_HEADERS__
#include <linux/vfio.h>
@@ -215,6 +216,144 @@ static __attribute__((constructor)) void setup_sizes(void)
EXPECT_ERRNO(ENOENT, ioctl(self->fd, IOMMU_OPTION, &cmd));
}
+static void drop_cap_ipc_lock(struct __test_metadata *_metadata)
+{
+ cap_t caps;
+ cap_value_t cap_list[1] = { CAP_IPC_LOCK };
+
+ caps = cap_get_proc();
+ ASSERT_NE(caps, NULL);
+ ASSERT_NE(-1,
+ cap_set_flag(caps, CAP_EFFECTIVE, 1, cap_list, CAP_CLEAR));
+ ASSERT_NE(-1, cap_set_proc(caps));
+ cap_free(caps);
+}
+
+static long get_proc_status_value(pid_t pid, const char *var)
+{
+ FILE *fp;
+ char buf[80], tag[80];
+ long val = -1;
+
+ snprintf(buf, sizeof(buf), "/proc/%d/status", pid);
+ fp = fopen(buf, "r");
+ if (!fp)
+ return val;
+
+ while (fgets(buf, sizeof(buf), fp))
+ if (fscanf(fp, "%s %ld\n", tag, &val) == 2 && !strcmp(tag, var))
+ break;
+
+ fclose(fp);
+ return val;
+}
+
+static long get_vm_pinned(pid_t pid)
+{
+ return get_proc_status_value(pid, "VmPin:");
+}
+
+static long get_vm_locked(pid_t pid)
+{
+ return get_proc_status_value(pid, "VmLck:");
+}
+
+FIXTURE(change_process)
+{
+ int fd;
+ uint32_t ioas_id;
+};
+
+FIXTURE_VARIANT(change_process)
+{
+ int accounting;
+};
+
+FIXTURE_SETUP(change_process)
+{
+ self->fd = open("/dev/iommu", O_RDWR);
+ ASSERT_NE(-1, self->fd);
+
+ drop_cap_ipc_lock(_metadata);
+ if (variant->accounting != IOPT_PAGES_ACCOUNT_NONE) {
+ struct iommu_option set_limit_cmd = {
+ .size = sizeof(set_limit_cmd),
+ .option_id = IOMMU_OPTION_RLIMIT_MODE,
+ .op = IOMMU_OPTION_OP_SET,
+ .val64 = (variant->accounting == IOPT_PAGES_ACCOUNT_MM),
+ };
+ ASSERT_EQ(0, ioctl(self->fd, IOMMU_OPTION, &set_limit_cmd));
+ }
+
+ test_ioctl_ioas_alloc(&self->ioas_id);
+ test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
+}
+
+FIXTURE_TEARDOWN(change_process)
+{
+ teardown_iommufd(self->fd, _metadata);
+}
+
+FIXTURE_VARIANT_ADD(change_process, account_none)
+{
+ .accounting = IOPT_PAGES_ACCOUNT_NONE,
+};
+
+FIXTURE_VARIANT_ADD(change_process, account_user)
+{
+ .accounting = IOPT_PAGES_ACCOUNT_USER,
+};
+
+FIXTURE_VARIANT_ADD(change_process, account_mm)
+{
+ .accounting = IOPT_PAGES_ACCOUNT_MM,
+};
+
+TEST_F(change_process, basic)
+{
+ pid_t parent = getpid();
+ pid_t child;
+ __u64 iova;
+ struct iommu_ioas_change_process cmd = {
+ .size = sizeof(cmd),
+ };
+
+ /* Expect failure if non-file maps exist */
+ test_ioctl_ioas_map(buffer, PAGE_SIZE, &iova);
+ EXPECT_ERRNO(EINVAL, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
+ test_ioctl_ioas_unmap(iova, PAGE_SIZE);
+
+ /* Change process works in current process. */
+ test_ioctl_ioas_map_file(mfd, 0, PAGE_SIZE, &iova);
+ ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
+
+ /* Change process works in another process */
+ child = fork();
+ if (!child) {
+ int nlock = PAGE_SIZE / 1024;
+
+ /* Parent accounts for locked memory before */
+ ASSERT_EQ(nlock, get_vm_pinned(parent));
+ if (variant->accounting == IOPT_PAGES_ACCOUNT_MM)
+ ASSERT_EQ(nlock, get_vm_locked(parent));
+ ASSERT_EQ(0, get_vm_pinned(getpid()));
+ ASSERT_EQ(0, get_vm_locked(getpid()));
+
+ ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
+
+ /* Child accounts for locked memory after */
+ ASSERT_EQ(0, get_vm_pinned(parent));
+ ASSERT_EQ(0, get_vm_locked(parent));
+ ASSERT_EQ(nlock, get_vm_pinned(getpid()));
+ if (variant->accounting == IOPT_PAGES_ACCOUNT_MM)
+ ASSERT_EQ(nlock, get_vm_locked(getpid()));
+
+ exit(0);
+ }
+ ASSERT_NE(-1, child);
+ ASSERT_EQ(child, waitpid(child, NULL, 0));
+}
+
FIXTURE(iommufd_ioas)
{
int fd;
@@ -1548,13 +1687,10 @@ static void test_basic_file(
buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd_tmp);
ASSERT_NE(MAP_FAILED, buf);
- /* EFAULT half way through mapping */
- ASSERT_EQ(0, munmap(buf + buf_size / 2, buf_size / 2));
- test_err_ioctl_ioas_map_file(EFAULT, 0, buf_size, &iova);
+ test_err_ioctl_ioas_map_file(EINVAL, mfd_tmp, 0, buf_size + 1, &iova);
- /* EFAULT on first page */
- ASSERT_EQ(0, munmap(buf, buf_size / 2));
- test_err_ioctl_ioas_map_file(EFAULT, 0, buf_size, &iova);
+ ASSERT_EQ(0, ftruncate(mfd_tmp, 0));
+ test_err_ioctl_ioas_map_file(EINVAL, mfd_tmp, 0, buf_size, &iova);
close(mfd_tmp);
}
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 7bc664e..14c9f9d 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -22,6 +22,12 @@
#define BIT_MASK(nr) (1UL << ((nr) % __BITS_PER_LONG))
#define BIT_WORD(nr) ((nr) / __BITS_PER_LONG)
+enum {
+ IOPT_PAGES_ACCOUNT_NONE = 0,
+ IOPT_PAGES_ACCOUNT_USER = 1,
+ IOPT_PAGES_ACCOUNT_MM = 2,
+};
+
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
static inline void set_bit(unsigned int nr, unsigned long *addr)
@@ -617,7 +623,7 @@ static int _test_ioctl_ioas_map_file(int fd, unsigned int ioas_id, int mfd,
IOMMU_IOAS_MAP_WRITEABLE | \
IOMMU_IOAS_MAP_READABLE))
-#define test_err_ioctl_ioas_map_file(_errno, start, length, iova_p) \
+#define test_err_ioctl_ioas_map_file(_errno, mfd, start, length, iova_p) \
EXPECT_ERRNO(_errno, \
_test_ioctl_ioas_map_file(self->fd, self->ioas_id, mfd, \
start, length, iova_p, \
--
1.8.3.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH V2 1/4] iommufd: Export do_update_pinned
2024-09-26 13:53 ` [PATCH V2 1/4] iommufd: Export do_update_pinned Steve Sistare
@ 2024-10-02 17:34 ` Jason Gunthorpe
0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2024-10-02 17:34 UTC (permalink / raw)
To: Steve Sistare; +Cc: iommu, Kevin Tian, Alex Williamson, Cornelia Huck
On Thu, Sep 26, 2024 at 06:53:45AM -0700, Steve Sistare wrote:
> Export do_update_pinned. No functional change.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> drivers/iommu/iommufd/io_pagetable.h | 5 +++++
> drivers/iommu/iommufd/pages.c | 10 +++++-----
> 2 files changed, 10 insertions(+), 5 deletions(-)
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2 3/4] iommufd: Add IOMMU_IOAS_CHANGE_PROCESS
2024-09-26 13:53 ` [PATCH V2 3/4] iommufd: Add IOMMU_IOAS_CHANGE_PROCESS Steve Sistare
@ 2024-10-02 17:51 ` Jason Gunthorpe
2024-10-04 19:43 ` Steven Sistare
0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2024-10-02 17:51 UTC (permalink / raw)
To: Steve Sistare; +Cc: iommu, Kevin Tian, Alex Williamson, Cornelia Huck
On Thu, Sep 26, 2024 at 06:53:47AM -0700, Steve Sistare wrote:
> +static int charge_current(unsigned long *npinned)
> +{
> + struct iopt_pages tmp = {
> + .source_mm = current->mm,
> + .source_task = current->group_leader,
> + .source_user = current_user(),
> + };
> + unsigned int account_mode;
> + int rc;
This seems like it needs a bit more protection, we shouldn't be
touching pages that are not set to current otherwise it will corrupt
the counts during the uncharge?
Maybe the whole operation should fail if any areas are not owned by
this process?
> + for (account_mode = 0; account_mode != IOPT_PAGES_ACCOUNT_MODE_NUM;
> + account_mode++) {
> + if (!npinned[account_mode])
> + continue;
> +
> + tmp.account_mode = account_mode;
> + rc = iopt_update_pinned(&tmp, npinned[account_mode], true,
> + NULL);
> + if (rc)
> + goto err_undo;
> + }
> + return 0;
> +
> +err_undo:
> + while (account_mode != 0) {
> + account_mode--;
Do we need to check:
if (!npinned[account_mode])
continue;
To match the charging side?
> +int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd)
> +{
> + struct iommu_ioas_change_process *cmd = ucmd->cmd;
> + struct iommufd_ctx *ictx = ucmd->ictx;
> + unsigned long all_npinned[IOPT_PAGES_ACCOUNT_MODE_NUM] = {0, 0, 0};
Just use "= {}"
> + struct iommufd_ioas *ioas;
> + struct iopt_area *area;
> + struct iopt_pages *pages;
> + struct xarray ioas_list;
> + unsigned long index;
> + int rc;
> +
> + if (cmd->__reserved)
> + return -EOPNOTSUPP;
> +
> + xa_init(&ioas_list);
> + rc = iommufd_take_all_iova_rwsem(ictx, &ioas_list);
> + if (rc)
> + goto out_no_release;
> +
> + for_each_ioas_area(&ioas_list, index, ioas, area) {
> + if (area->pages->type != IOPT_ADDRESS_FILE) {
> + rc = -EINVAL;
> + goto out;
> + }
> + }
> +
> + /*
> + * Count last_pinned pages, then clear it to avoid double counting
> + * if the same iopt_pages is visited multiple times in this loop.
> + * Since we are under all the locks, npinned == last_npinned, so we
> + * can easily restore last_npinned before we return.
> + */
> + for_each_ioas_area(&ioas_list, index, ioas, area) {
> + pages = area->pages;
> +
> + if (need_charge_update(pages)) {
> + all_npinned[pages->account_mode] += pages->last_npinned;
> + pages->last_npinned = 0;
> + }
> + }
> +
> + rc = charge_current(all_npinned);
> +
> + if (rc) {
> + /* Charge failed. Fix last_npinned and bail. */
> + for_each_ioas_area(&ioas_list, index, ioas, area)
> + area->pages->last_npinned = area->pages->npinned;
> + goto out;
> + }
> +
> + for_each_ioas_area(&ioas_list, index, ioas, area) {
> + pages = area->pages;
> +
> + /* Uncharge the old one (which also restores last_npinned) */
> + if (need_charge_update(pages))
> + iopt_update_pinned(pages, pages->npinned, false, NULL);
> +
> + change_mm(pages);
> + }
> +
> +out:
> + iommufd_release_all_iova_rwsem(ictx, &ioas_list);
> +
> +out_no_release:
> + return rc;
Don't need this label
Jason
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2 3/4] iommufd: Add IOMMU_IOAS_CHANGE_PROCESS
2024-10-02 17:51 ` Jason Gunthorpe
@ 2024-10-04 19:43 ` Steven Sistare
2024-10-04 20:20 ` Jason Gunthorpe
0 siblings, 1 reply; 13+ messages in thread
From: Steven Sistare @ 2024-10-04 19:43 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: iommu, Kevin Tian, Alex Williamson, Cornelia Huck
On 10/2/2024 1:51 PM, Jason Gunthorpe wrote:
> On Thu, Sep 26, 2024 at 06:53:47AM -0700, Steve Sistare wrote:
>
>> +static int charge_current(unsigned long *npinned)
>> +{
>> + struct iopt_pages tmp = {
>> + .source_mm = current->mm,
>> + .source_task = current->group_leader,
>> + .source_user = current_user(),
>> + };
>> + unsigned int account_mode;
>> + int rc;
>
> This seems like it needs a bit more protection, we shouldn't be
> touching pages that are not set to current otherwise it will corrupt
> the counts during the uncharge?
>
> Maybe the whole operation should fail if any areas are not owned by
> this process?
I don't follow. In a typical scenario, none of the areas are owned by the
current process when IOMMU_IOAS_CHANGE_PROCESS is called. We charge current
in one lump sum here using a fake iopt_pages, so no side effect on the real
areas. Then we uncharge the previous owners area-by-area at the comment
"Uncharge the old one".
Also, need_charge_update() excludes areas that are already owned by the
current process, so we don't mess up current charges.
But perhaps I missed your point.
>> + for (account_mode = 0; account_mode != IOPT_PAGES_ACCOUNT_MODE_NUM;
>> + account_mode++) {
>> + if (!npinned[account_mode])
>> + continue;
>> +
>> + tmp.account_mode = account_mode;
>> + rc = iopt_update_pinned(&tmp, npinned[account_mode], true,
>> + NULL);
>> + if (rc)
>> + goto err_undo;
>> + }
>> + return 0;
>> +
>> +err_undo:
>> + while (account_mode != 0) {
>> + account_mode--;
>
> Do we need to check:
>
> if (!npinned[account_mode])
> continue;
>
> To match the charging side?
No need functionally, as passing npages=0 to iopt_update_pinned is harmless,
but it does perform unnecessary locking and atomics. Adding a check would
be more efficient, albeit in a less traveled error path. I'll add it for
symmetry.
- Steve
>> +int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd)
>> +{
>> + struct iommu_ioas_change_process *cmd = ucmd->cmd;
>> + struct iommufd_ctx *ictx = ucmd->ictx;
>> + unsigned long all_npinned[IOPT_PAGES_ACCOUNT_MODE_NUM] = {0, 0, 0};
>
> Just use "= {}"
>
>> + struct iommufd_ioas *ioas;
>> + struct iopt_area *area;
>> + struct iopt_pages *pages;
>> + struct xarray ioas_list;
>> + unsigned long index;
>> + int rc;
>> +
>> + if (cmd->__reserved)
>> + return -EOPNOTSUPP;
>> +
>> + xa_init(&ioas_list);
>> + rc = iommufd_take_all_iova_rwsem(ictx, &ioas_list);
>> + if (rc)
>> + goto out_no_release;
>> +
>> + for_each_ioas_area(&ioas_list, index, ioas, area) {
>> + if (area->pages->type != IOPT_ADDRESS_FILE) {
>> + rc = -EINVAL;
>> + goto out;
>> + }
>> + }
>> +
>> + /*
>> + * Count last_pinned pages, then clear it to avoid double counting
>> + * if the same iopt_pages is visited multiple times in this loop.
>> + * Since we are under all the locks, npinned == last_npinned, so we
>> + * can easily restore last_npinned before we return.
>> + */
>> + for_each_ioas_area(&ioas_list, index, ioas, area) {
>> + pages = area->pages;
>> +
>> + if (need_charge_update(pages)) {
>> + all_npinned[pages->account_mode] += pages->last_npinned;
>> + pages->last_npinned = 0;
>> + }
>> + }
>> +
>> + rc = charge_current(all_npinned);
>> +
>> + if (rc) {
>> + /* Charge failed. Fix last_npinned and bail. */
>> + for_each_ioas_area(&ioas_list, index, ioas, area)
>> + area->pages->last_npinned = area->pages->npinned;
>> + goto out;
>> + }
>> +
>> + for_each_ioas_area(&ioas_list, index, ioas, area) {
>> + pages = area->pages;
>> +
>> + /* Uncharge the old one (which also restores last_npinned) */
>> + if (need_charge_update(pages))
>> + iopt_update_pinned(pages, pages->npinned, false, NULL);
>> +
>> + change_mm(pages);
>> + }
>> +
>> +out:
>> + iommufd_release_all_iova_rwsem(ictx, &ioas_list);
>> +
>> +out_no_release:
>> + return rc;
>
> Don't need this label
>
> Jason
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2 3/4] iommufd: Add IOMMU_IOAS_CHANGE_PROCESS
2024-10-04 19:43 ` Steven Sistare
@ 2024-10-04 20:20 ` Jason Gunthorpe
0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2024-10-04 20:20 UTC (permalink / raw)
To: Steven Sistare; +Cc: iommu, Kevin Tian, Alex Williamson, Cornelia Huck
On Fri, Oct 04, 2024 at 03:43:32PM -0400, Steven Sistare wrote:
> On 10/2/2024 1:51 PM, Jason Gunthorpe wrote:
> > On Thu, Sep 26, 2024 at 06:53:47AM -0700, Steve Sistare wrote:
> >
> > > +static int charge_current(unsigned long *npinned)
> > > +{
> > > + struct iopt_pages tmp = {
> > > + .source_mm = current->mm,
> > > + .source_task = current->group_leader,
> > > + .source_user = current_user(),
> > > + };
> > > + unsigned int account_mode;
> > > + int rc;
> >
> > This seems like it needs a bit more protection, we shouldn't be
> > touching pages that are not set to current otherwise it will corrupt
> > the counts during the uncharge?
> >
> > Maybe the whole operation should fail if any areas are not owned by
> > this process?
>
> I don't follow. In a typical scenario, none of the areas are owned by the
> current process when IOMMU_IOAS_CHANGE_PROCESS is called.
Oh I see, I didn't quite remember this all correctly. It is OK then,
you are taking ownership of all ares to the current process and it
does correctly de-account for any prior area owner.
> be more efficient, albeit in a less traveled error path. I'll add it for
> symmetry.
Yeah symmetry.
Jason
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2 4/4] iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest
2024-09-26 13:53 ` [PATCH V2 4/4] iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest Steve Sistare
@ 2024-10-07 16:05 ` Steven Sistare
2024-10-07 16:56 ` Jason Gunthorpe
0 siblings, 1 reply; 13+ messages in thread
From: Steven Sistare @ 2024-10-07 16:05 UTC (permalink / raw)
To: iommu; +Cc: Jason Gunthorpe, Kevin Tian, Alex Williamson, Cornelia Huck
Hi Jason. Do you have any comments or RB on the selftest?
I am ready to post V3.
- Steve
On 9/26/2024 9:53 AM, Steve Sistare wrote:
> Add selftest cases for IOMMU_IOAS_CHANGE_PROCESS.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> tools/testing/selftests/iommu/Makefile | 1 +
> tools/testing/selftests/iommu/iommufd.c | 148 ++++++++++++++++++++++++--
> tools/testing/selftests/iommu/iommufd_utils.h | 8 +-
> 3 files changed, 150 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/iommu/Makefile b/tools/testing/selftests/iommu/Makefile
> index fd64779..bd23ac9 100644
> --- a/tools/testing/selftests/iommu/Makefile
> +++ b/tools/testing/selftests/iommu/Makefile
> @@ -1,6 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> CFLAGS += -Wall -O2 -Wno-unused-function
> CFLAGS += $(KHDR_INCLUDES)
> +CFLAGS += -lcap
>
> TEST_GEN_PROGS :=
> TEST_GEN_PROGS += iommufd
> diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
> index 6adf84f..5cb8b28 100644
> --- a/tools/testing/selftests/iommu/iommufd.c
> +++ b/tools/testing/selftests/iommu/iommufd.c
> @@ -5,6 +5,7 @@
> #include <sys/eventfd.h>
> #include <sys/syscall.h>
> #include <asm/unistd.h>
> +#include <sys/capability.h>
>
> #define __EXPORTED_HEADERS__
> #include <linux/vfio.h>
> @@ -215,6 +216,144 @@ static __attribute__((constructor)) void setup_sizes(void)
> EXPECT_ERRNO(ENOENT, ioctl(self->fd, IOMMU_OPTION, &cmd));
> }
>
> +static void drop_cap_ipc_lock(struct __test_metadata *_metadata)
> +{
> + cap_t caps;
> + cap_value_t cap_list[1] = { CAP_IPC_LOCK };
> +
> + caps = cap_get_proc();
> + ASSERT_NE(caps, NULL);
> + ASSERT_NE(-1,
> + cap_set_flag(caps, CAP_EFFECTIVE, 1, cap_list, CAP_CLEAR));
> + ASSERT_NE(-1, cap_set_proc(caps));
> + cap_free(caps);
> +}
> +
> +static long get_proc_status_value(pid_t pid, const char *var)
> +{
> + FILE *fp;
> + char buf[80], tag[80];
> + long val = -1;
> +
> + snprintf(buf, sizeof(buf), "/proc/%d/status", pid);
> + fp = fopen(buf, "r");
> + if (!fp)
> + return val;
> +
> + while (fgets(buf, sizeof(buf), fp))
> + if (fscanf(fp, "%s %ld\n", tag, &val) == 2 && !strcmp(tag, var))
> + break;
> +
> + fclose(fp);
> + return val;
> +}
> +
> +static long get_vm_pinned(pid_t pid)
> +{
> + return get_proc_status_value(pid, "VmPin:");
> +}
> +
> +static long get_vm_locked(pid_t pid)
> +{
> + return get_proc_status_value(pid, "VmLck:");
> +}
> +
> +FIXTURE(change_process)
> +{
> + int fd;
> + uint32_t ioas_id;
> +};
> +
> +FIXTURE_VARIANT(change_process)
> +{
> + int accounting;
> +};
> +
> +FIXTURE_SETUP(change_process)
> +{
> + self->fd = open("/dev/iommu", O_RDWR);
> + ASSERT_NE(-1, self->fd);
> +
> + drop_cap_ipc_lock(_metadata);
> + if (variant->accounting != IOPT_PAGES_ACCOUNT_NONE) {
> + struct iommu_option set_limit_cmd = {
> + .size = sizeof(set_limit_cmd),
> + .option_id = IOMMU_OPTION_RLIMIT_MODE,
> + .op = IOMMU_OPTION_OP_SET,
> + .val64 = (variant->accounting == IOPT_PAGES_ACCOUNT_MM),
> + };
> + ASSERT_EQ(0, ioctl(self->fd, IOMMU_OPTION, &set_limit_cmd));
> + }
> +
> + test_ioctl_ioas_alloc(&self->ioas_id);
> + test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
> +}
> +
> +FIXTURE_TEARDOWN(change_process)
> +{
> + teardown_iommufd(self->fd, _metadata);
> +}
> +
> +FIXTURE_VARIANT_ADD(change_process, account_none)
> +{
> + .accounting = IOPT_PAGES_ACCOUNT_NONE,
> +};
> +
> +FIXTURE_VARIANT_ADD(change_process, account_user)
> +{
> + .accounting = IOPT_PAGES_ACCOUNT_USER,
> +};
> +
> +FIXTURE_VARIANT_ADD(change_process, account_mm)
> +{
> + .accounting = IOPT_PAGES_ACCOUNT_MM,
> +};
> +
> +TEST_F(change_process, basic)
> +{
> + pid_t parent = getpid();
> + pid_t child;
> + __u64 iova;
> + struct iommu_ioas_change_process cmd = {
> + .size = sizeof(cmd),
> + };
> +
> + /* Expect failure if non-file maps exist */
> + test_ioctl_ioas_map(buffer, PAGE_SIZE, &iova);
> + EXPECT_ERRNO(EINVAL, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
> + test_ioctl_ioas_unmap(iova, PAGE_SIZE);
> +
> + /* Change process works in current process. */
> + test_ioctl_ioas_map_file(mfd, 0, PAGE_SIZE, &iova);
> + ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
> +
> + /* Change process works in another process */
> + child = fork();
> + if (!child) {
> + int nlock = PAGE_SIZE / 1024;
> +
> + /* Parent accounts for locked memory before */
> + ASSERT_EQ(nlock, get_vm_pinned(parent));
> + if (variant->accounting == IOPT_PAGES_ACCOUNT_MM)
> + ASSERT_EQ(nlock, get_vm_locked(parent));
> + ASSERT_EQ(0, get_vm_pinned(getpid()));
> + ASSERT_EQ(0, get_vm_locked(getpid()));
> +
> + ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
> +
> + /* Child accounts for locked memory after */
> + ASSERT_EQ(0, get_vm_pinned(parent));
> + ASSERT_EQ(0, get_vm_locked(parent));
> + ASSERT_EQ(nlock, get_vm_pinned(getpid()));
> + if (variant->accounting == IOPT_PAGES_ACCOUNT_MM)
> + ASSERT_EQ(nlock, get_vm_locked(getpid()));
> +
> + exit(0);
> + }
> + ASSERT_NE(-1, child);
> + ASSERT_EQ(child, waitpid(child, NULL, 0));
> +}
> +
> FIXTURE(iommufd_ioas)
> {
> int fd;
> @@ -1548,13 +1687,10 @@ static void test_basic_file(
> buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd_tmp);
> ASSERT_NE(MAP_FAILED, buf);
>
> - /* EFAULT half way through mapping */
> - ASSERT_EQ(0, munmap(buf + buf_size / 2, buf_size / 2));
> - test_err_ioctl_ioas_map_file(EFAULT, 0, buf_size, &iova);
> + test_err_ioctl_ioas_map_file(EINVAL, mfd_tmp, 0, buf_size + 1, &iova);
>
> - /* EFAULT on first page */
> - ASSERT_EQ(0, munmap(buf, buf_size / 2));
> - test_err_ioctl_ioas_map_file(EFAULT, 0, buf_size, &iova);
> + ASSERT_EQ(0, ftruncate(mfd_tmp, 0));
> + test_err_ioctl_ioas_map_file(EINVAL, mfd_tmp, 0, buf_size, &iova);
>
> close(mfd_tmp);
> }
> diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
> index 7bc664e..14c9f9d 100644
> --- a/tools/testing/selftests/iommu/iommufd_utils.h
> +++ b/tools/testing/selftests/iommu/iommufd_utils.h
> @@ -22,6 +22,12 @@
> #define BIT_MASK(nr) (1UL << ((nr) % __BITS_PER_LONG))
> #define BIT_WORD(nr) ((nr) / __BITS_PER_LONG)
>
> +enum {
> + IOPT_PAGES_ACCOUNT_NONE = 0,
> + IOPT_PAGES_ACCOUNT_USER = 1,
> + IOPT_PAGES_ACCOUNT_MM = 2,
> +};
> +
> #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
>
> static inline void set_bit(unsigned int nr, unsigned long *addr)
> @@ -617,7 +623,7 @@ static int _test_ioctl_ioas_map_file(int fd, unsigned int ioas_id, int mfd,
> IOMMU_IOAS_MAP_WRITEABLE | \
> IOMMU_IOAS_MAP_READABLE))
>
> -#define test_err_ioctl_ioas_map_file(_errno, start, length, iova_p) \
> +#define test_err_ioctl_ioas_map_file(_errno, mfd, start, length, iova_p) \
> EXPECT_ERRNO(_errno, \
> _test_ioctl_ioas_map_file(self->fd, self->ioas_id, mfd, \
> start, length, iova_p, \
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2 2/4] iommufd: Lock all objects
2024-09-26 13:53 ` [PATCH V2 2/4] iommufd: Lock all objects Steve Sistare
@ 2024-10-07 16:18 ` Steven Sistare
2024-10-07 16:43 ` Jason Gunthorpe
0 siblings, 1 reply; 13+ messages in thread
From: Steven Sistare @ 2024-10-07 16:18 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Kevin Tian, Alex Williamson, Cornelia Huck, iommu
Hi Jason,
Any comment on this patch before I sent V3?
There will be only a few changes in V3 in iommufd_take_all_iova_rwsem:
* delete lockdep_on/off
* replace
down_write(&ioas->iopt.iova_rwsem);
with
down_write_nest_lock(&ioas->iopt.iova_rwsem,
&ictx->ioas_creation_lock);
- Steve
On 9/26/2024 9:53 AM, Steve Sistare wrote:
> Define helpers to lock and unlock all iommufd_ctx objects.
> This will allow DMA mappings to be updated atomically during live update.
> This code is substantially the same as an initial version provided by
> Jason, plus fixes.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> drivers/iommu/iommufd/ioas.c | 67 +++++++++++++++++++++++++++++++++
> drivers/iommu/iommufd/iommufd_private.h | 1 +
> drivers/iommu/iommufd/main.c | 1 +
> 3 files changed, 69 insertions(+)
>
> diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
> index 2d25ba0..8eb869d 100644
> --- a/drivers/iommu/iommufd/ioas.c
> +++ b/drivers/iommu/iommufd/ioas.c
> @@ -52,7 +52,10 @@ int iommufd_ioas_alloc_ioctl(struct iommufd_ucmd *ucmd)
> rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> if (rc)
> goto out_table;
> +
> + down_read(&ucmd->ictx->ioas_creation_lock);
> iommufd_object_finalize(ucmd->ictx, &ioas->obj);
> + up_read(&ucmd->ictx->ioas_creation_lock);
> return 0;
>
> out_table:
> @@ -370,6 +373,70 @@ int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd)
> return rc;
> }
>
> +static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
> + struct xarray *ioas_list)
> +{
> + struct iommufd_ioas *ioas;
> + unsigned long index;
> +
> + xa_for_each(ioas_list, index, ioas) {
> + up_write(&ioas->iopt.iova_rwsem);
> + refcount_dec(&ioas->obj.users);
> + }
> + up_write(&ictx->ioas_creation_lock);
> + xa_destroy(ioas_list);
> +}
> +
> +static int iommufd_take_all_iova_rwsem(struct iommufd_ctx *ictx,
> + struct xarray *ioas_list)
> +{
> + struct iommufd_object *obj;
> + unsigned long index;
> + int rc;
> +
> + /*
> + * This is very ugly, it is done instead of adding a lock around
> + * pages->source_mm, which is a performance path for mdev, we just
> + * obtain the write side of all the iova_rwsems which also protects the
> + * pages->source_*. Due to copies we can't know which IOAS could read
> + * from the pages, so we just lock everything. This is the only place
> + * locks are nested and they are uniformly taken in ID order.
> + *
> + * ioas_creation_lock prevents new IOAS from being installed in the
> + * xarray while we do this, and also prevents more than one thread from
> + * holding nested locks.
> + */
> + down_write(&ictx->ioas_creation_lock);
> + xa_lock(&ictx->objects);
> + lockdep_off();
> + xa_for_each(&ictx->objects, index, obj) {
> + struct iommufd_ioas *ioas;
> +
> + if (!obj || obj->type != IOMMUFD_OBJ_IOAS)
> + continue;
> +
> + if (!refcount_inc_not_zero(&obj->users))
> + continue;
> +
> + xa_unlock(&ictx->objects);
> +
> + ioas = container_of(obj, struct iommufd_ioas, obj);
> + down_write(&ioas->iopt.iova_rwsem);
> +
> + rc = xa_err(xa_store(ioas_list, index, ioas, GFP_KERNEL));
> + if (rc) {
> + lockdep_on();
> + iommufd_release_all_iova_rwsem(ictx, ioas_list);
> + return rc;
> + }
> +
> + xa_lock(&ictx->objects);
> + }
> + lockdep_on();
> + xa_unlock(&ictx->objects);
> + return 0;
> +}
> +
> int iommufd_option_rlimit_mode(struct iommu_option *cmd,
> struct iommufd_ctx *ictx)
> {
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 0ebb5bd..06a52f2 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -23,6 +23,7 @@ struct iommufd_ctx {
> struct xarray objects;
> struct xarray groups;
> wait_queue_head_t destroy_wait;
> + struct rw_semaphore ioas_creation_lock;
>
> u8 account_mode;
> /* Compatibility with VFIO no iommu */
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 5f77ce6..13ec031 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -248,6 +248,7 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
> pr_info_once("IOMMUFD is providing /dev/vfio/vfio, not VFIO.\n");
> }
>
> + init_rwsem(&ictx->ioas_creation_lock);
> xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
> xa_init(&ictx->groups);
> ictx->file = filp;
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2 2/4] iommufd: Lock all objects
2024-10-07 16:18 ` Steven Sistare
@ 2024-10-07 16:43 ` Jason Gunthorpe
0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2024-10-07 16:43 UTC (permalink / raw)
To: Steven Sistare; +Cc: Kevin Tian, Alex Williamson, Cornelia Huck, iommu
On Mon, Oct 07, 2024 at 12:18:37PM -0400, Steven Sistare wrote:
> Hi Jason,
> Any comment on this patch before I sent V3?
> There will be only a few changes in V3 in iommufd_take_all_iova_rwsem:
> * delete lockdep_on/off
> * replace
> down_write(&ioas->iopt.iova_rwsem);
> with
> down_write_nest_lock(&ioas->iopt.iova_rwsem,
> &ictx->ioas_creation_lock);
That seems Ok
Jason
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH V2 4/4] iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest
2024-10-07 16:05 ` Steven Sistare
@ 2024-10-07 16:56 ` Jason Gunthorpe
0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2024-10-07 16:56 UTC (permalink / raw)
To: Steven Sistare; +Cc: iommu, Kevin Tian, Alex Williamson, Cornelia Huck
On Mon, Oct 07, 2024 at 12:05:54PM -0400, Steven Sistare wrote:
> Hi Jason. Do you have any comments or RB on the selftest?
> I am ready to post V3.
Nothing stood out to me
Jason
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-10-07 16:56 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-26 13:53 [PATCH V2 0/4] iommufd live update Steve Sistare
2024-09-26 13:53 ` [PATCH V2 1/4] iommufd: Export do_update_pinned Steve Sistare
2024-10-02 17:34 ` Jason Gunthorpe
2024-09-26 13:53 ` [PATCH V2 2/4] iommufd: Lock all objects Steve Sistare
2024-10-07 16:18 ` Steven Sistare
2024-10-07 16:43 ` Jason Gunthorpe
2024-09-26 13:53 ` [PATCH V2 3/4] iommufd: Add IOMMU_IOAS_CHANGE_PROCESS Steve Sistare
2024-10-02 17:51 ` Jason Gunthorpe
2024-10-04 19:43 ` Steven Sistare
2024-10-04 20:20 ` Jason Gunthorpe
2024-09-26 13:53 ` [PATCH V2 4/4] iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest Steve Sistare
2024-10-07 16:05 ` Steven Sistare
2024-10-07 16:56 ` Jason Gunthorpe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.