From: David Gibson <david@gibson.dropbear.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org,
Alex Williamson <alex.williamson@redhat.com>,
Paul Mackerras <paulus@samba.org>
Subject: Re: [PATCH kernel 06/15] powerpc/mm/iommu: Put pages on process exit
Date: Fri, 12 Aug 2016 13:13:56 +1000 [thread overview]
Message-ID: <20160812031356.GI16493@voom.fritz.box> (raw)
In-Reply-To: <1470213656-1042-7-git-send-email-aik@ozlabs.ru>
[-- Attachment #1: Type: text/plain, Size: 8248 bytes --]
On Wed, Aug 03, 2016 at 06:40:47PM +1000, Alexey Kardashevskiy wrote:
> At the moment VFIO IOMMU SPAPR v2 driver pins all guest RAM pages when
> the userspace starts using VFIO.
This doesn't sound accurate. Isn't it userspace that decides what
gets pinned, not the VFIO driver?
>When the userspace process finishes,
> all the pinned pages need to be put; this is done as a part of
> the userspace memory context (MM) destruction which happens on
> the very last mmdrop().
>
> This approach has a problem that a MM of the userspace process
> may live longer than the userspace process itself as kernel threads
> use userspace process MMs which was runnning on a CPU where
> the kernel thread was scheduled to. If this happened, the MM remains
> referenced until this exact kernel thread wakes up again
> and releases the very last reference to the MM, on an idle system this
> can take even hours.
>
> This references and caches MM once per container and adds tracking
> how many times each preregistered area was registered in
> a specific container. This way we do not depend on @current pointing to
> a valid task descriptor.
The handling of @current and refcounting the mm sounds more like its
describing the previous patch.
THe description of counting how many times each prereg area is
registered doesn't seem accurate, since you block multiple
registration with an EBUSY. Or else it's describing the 'used'
counter in the lower-level mm_iommu_table_group_mem_t tracking,
rather than anything changed by this patch.
> This changes the userspace interface to return EBUSY if memory is
> already registered (mm_iommu_get() used to increment the counter);
> however it should not have any practical effect as the only
> userspace tool available now does register memory area once per
> container anyway.
>
> As tce_iommu_register_pages/tce_iommu_unregister_pages are called
> under container->lock, this does not need additional locking.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>
> # Conflicts:
> # arch/powerpc/include/asm/mmu_context.h
> # arch/powerpc/mm/mmu_context_book3s64.c
> # arch/powerpc/mm/mmu_context_iommu.c
Looks like some lines to be cleaned up in the message.
> ---
> arch/powerpc/include/asm/mmu_context.h | 1 -
> arch/powerpc/mm/mmu_context_book3s64.c | 4 ---
> arch/powerpc/mm/mmu_context_iommu.c | 11 -------
> drivers/vfio/vfio_iommu_spapr_tce.c | 52 +++++++++++++++++++++++++++++++++-
> 4 files changed, 51 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index b85cc7b..a4c4ed5 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -25,7 +25,6 @@ extern long mm_iommu_get(struct mm_struct *mm,
> extern long mm_iommu_put(struct mm_struct *mm,
> struct mm_iommu_table_group_mem_t *mem);
> extern void mm_iommu_init(struct mm_struct *mm);
> -extern void mm_iommu_cleanup(struct mm_struct *mm);
> extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup(struct mm_struct *mm,
> unsigned long ua, unsigned long size);
> extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> index ad82735..1a07969 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -159,10 +159,6 @@ static inline void destroy_pagetable_page(struct mm_struct *mm)
>
> void destroy_context(struct mm_struct *mm)
> {
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> - mm_iommu_cleanup(mm);
> -#endif
> -
> #ifdef CONFIG_PPC_ICSWX
> drop_cop(mm->context.acop, mm);
> kfree(mm->context.cop_lockp);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
> index ee6685b..10f01fe 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -293,14 +293,3 @@ void mm_iommu_init(struct mm_struct *mm)
> {
> INIT_LIST_HEAD_RCU(&mm->context.iommu_group_mem_list);
> }
> -
> -void mm_iommu_cleanup(struct mm_struct *mm)
> -{
> - struct mm_iommu_table_group_mem_t *mem, *tmp;
> -
> - list_for_each_entry_safe(mem, tmp, &mm->context.iommu_group_mem_list,
> - next) {
> - list_del_rcu(&mem->next);
> - mm_iommu_do_free(mem);
> - }
> -}
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 9752e77..40e71a0 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -89,6 +89,15 @@ struct tce_iommu_group {
> };
>
> /*
> + * A container needs to remember which preregistered areas and how many times
> + * it has referenced to do proper cleanup at the userspace process exit.
> + */
> +struct tce_iommu_prereg {
> + struct list_head next;
> + struct mm_iommu_table_group_mem_t *mem;
> +};
> +
> +/*
> * The container descriptor supports only a single group per container.
> * Required by the API as the container is not supplied with the IOMMU group
> * at the moment of initialization.
> @@ -101,12 +110,26 @@ struct tce_container {
> struct mm_struct *mm;
> struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
> struct list_head group_list;
> + struct list_head prereg_list;
> };
>
> +static long tce_iommu_prereg_free(struct tce_container *container,
> + struct tce_iommu_prereg *tcemem)
> +{
> + long ret;
> +
> + list_del(&tcemem->next);
> + ret = mm_iommu_put(container->mm, tcemem->mem);
> + kfree(tcemem);
> +
> + return ret;
> +}
> +
> static long tce_iommu_unregister_pages(struct tce_container *container,
> __u64 vaddr, __u64 size)
> {
> struct mm_iommu_table_group_mem_t *mem;
> + struct tce_iommu_prereg *tcemem;
>
> if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
> return -EINVAL;
> @@ -115,7 +138,12 @@ static long tce_iommu_unregister_pages(struct tce_container *container,
> if (!mem)
> return -ENOENT;
>
> - return mm_iommu_put(container->mm, mem);
> + list_for_each_entry(tcemem, &container->prereg_list, next) {
> + if (tcemem->mem == mem)
> + return tce_iommu_prereg_free(container, tcemem);
> + }
> +
> + return -ENOENT;
> }
>
> static long tce_iommu_register_pages(struct tce_container *container,
> @@ -123,6 +151,7 @@ static long tce_iommu_register_pages(struct tce_container *container,
> {
> long ret = 0;
> struct mm_iommu_table_group_mem_t *mem = NULL;
> + struct tce_iommu_prereg *tcemem;
> unsigned long entries = size >> PAGE_SHIFT;
>
> if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK) ||
> @@ -140,6 +169,18 @@ static long tce_iommu_register_pages(struct tce_container *container,
> ret = mm_iommu_get(container->mm, vaddr, entries, &mem);
> if (ret)
> return ret;
> +
> + list_for_each_entry(tcemem, &container->prereg_list, next) {
> + if (tcemem->mem == mem) {
> + mm_iommu_put(container->mm, mem);
> + return -EBUSY;
> + }
> + }
> +
> + tcemem = kzalloc(sizeof(*tcemem), GFP_KERNEL);
> + tcemem->mem = mem;
> + list_add(&tcemem->next, &container->prereg_list);
> +
> container->enabled = true;
>
> return 0;
> @@ -325,6 +366,7 @@ static void *tce_iommu_open(unsigned long arg)
>
> mutex_init(&container->lock);
> INIT_LIST_HEAD_RCU(&container->group_list);
> + INIT_LIST_HEAD_RCU(&container->prereg_list);
>
> container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
>
> @@ -362,6 +404,14 @@ static void tce_iommu_release(void *iommu_data)
> tce_iommu_free_table(tbl);
> }
>
> + while (!list_empty(&container->prereg_list)) {
> + struct tce_iommu_prereg *tcemem;
> +
> + tcemem = list_first_entry(&container->prereg_list,
> + struct tce_iommu_prereg, next);
> + tce_iommu_prereg_free(container, tcemem);
> + }
> +
> if (container->mm)
> mmdrop(container->mm);
> tce_iommu_disable(container);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2016-08-12 3:17 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-03 8:40 [PATCH kernel 00/15] powerpc/kvm/vfio: Enable in-kernel acceleration Alexey Kardashevskiy
2016-08-03 8:40 ` [PATCH kernel 01/15] Revert "iommu: Add a function to find an iommu group by id" Alexey Kardashevskiy
2016-08-15 4:58 ` Paul Mackerras
2016-08-03 8:40 ` [PATCH kernel 02/15] KVM: PPC: Finish enabling VFIO KVM device on POWER Alexey Kardashevskiy
2016-08-04 5:21 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 03/15] KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number Alexey Kardashevskiy
2016-08-03 8:40 ` [PATCH kernel 04/15] powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again Alexey Kardashevskiy
2016-08-04 5:23 ` David Gibson
2016-08-09 11:26 ` [kernel, " Michael Ellerman
2016-08-03 8:40 ` [PATCH kernel 05/15] powerpc/iommu: Stop using @current in mm_iommu_xxx Alexey Kardashevskiy
2016-08-03 10:10 ` Nicholas Piggin
2016-08-05 7:00 ` Michael Ellerman
2016-08-09 5:29 ` Alexey Kardashevskiy
2016-08-09 4:43 ` Balbir Singh
2016-08-09 6:04 ` Nicholas Piggin
2016-08-09 6:17 ` Balbir Singh
2016-08-12 2:57 ` David Gibson
2016-08-12 4:56 ` Alexey Kardashevskiy
2016-08-15 10:58 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 06/15] powerpc/mm/iommu: Put pages on process exit Alexey Kardashevskiy
2016-08-03 10:11 ` Nicholas Piggin
2016-08-12 3:13 ` David Gibson [this message]
2016-08-03 8:40 ` [PATCH kernel 07/15] powerpc/iommu: Cleanup iommu_table disposal Alexey Kardashevskiy
2016-08-12 3:18 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 08/15] powerpc/vfio_spapr_tce: Add reference counting to iommu_table Alexey Kardashevskiy
2016-08-12 3:29 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 09/15] powerpc/mmu: Add real mode support for IOMMU preregistered memory Alexey Kardashevskiy
2016-08-12 4:02 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 10/15] KVM: PPC: Use preregistered memory API to access TCE list Alexey Kardashevskiy
2016-08-12 4:17 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 11/15] powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange() Alexey Kardashevskiy
2016-08-12 4:29 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 12/15] KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently Alexey Kardashevskiy
2016-08-12 4:34 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 13/15] KVM: PPC: Pass kvm* to kvmppc_find_table() Alexey Kardashevskiy
2016-08-12 4:45 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 14/15] vfio/spapr_tce: Export container API for external users Alexey Kardashevskiy
2016-08-08 16:43 ` Alex Williamson
2016-08-09 5:19 ` Alexey Kardashevskiy
2016-08-09 12:16 ` Alex Williamson
2016-08-10 5:37 ` Alexey Kardashevskiy
2016-08-10 16:46 ` Alex Williamson
2016-08-12 5:46 ` David Gibson
2016-08-12 6:12 ` Alexey Kardashevskiy
2016-08-15 11:07 ` David Gibson
2016-08-17 8:31 ` Alexey Kardashevskiy
2016-08-12 15:22 ` Alex Williamson
2016-08-17 3:17 ` David Gibson
2016-08-18 0:22 ` Alexey Kardashevskiy
2016-08-29 6:35 ` Alexey Kardashevskiy
2016-08-29 13:27 ` David Gibson
2016-09-07 9:09 ` Alexey Kardashevskiy
2016-09-21 6:56 ` Alexey Kardashevskiy
2016-09-23 7:12 ` David Gibson
2016-10-17 6:06 ` Alexey Kardashevskiy
2016-10-18 1:42 ` David Gibson
2016-08-15 3:59 ` Paul Mackerras
2016-08-15 15:32 ` Alex Williamson
2016-08-12 5:25 ` David Gibson
2016-08-03 8:40 ` [PATCH kernel 15/15] KVM: PPC: Add in-kernel acceleration for VFIO Alexey Kardashevskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160812031356.GI16493@voom.fritz.box \
--to=david@gibson.dropbear.id.au \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).