From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 42Drvh2JQqzF0RH for ; Tue, 18 Sep 2018 15:21:52 +1000 (AEST) Date: Tue, 18 Sep 2018 13:51:16 +1000 From: David Gibson To: "Aneesh Kumar K.V" Cc: npiggin@gmail.com, benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au, Alexey Kardashevskiy , linuxppc-dev@lists.ozlabs.org Subject: Re: [RFC PATCH 3/3] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get Message-ID: <20180918035116.GA30868@umbus.fritz.box> References: <20180903163733.27965-1-aneesh.kumar@linux.ibm.com> <20180903163733.27965-4-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="TB36FDmn/VVEgNH/" In-Reply-To: <20180903163733.27965-4-aneesh.kumar@linux.ibm.com> List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , --TB36FDmn/VVEgNH/ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Sep 03, 2018 at 10:07:33PM +0530, Aneesh Kumar K.V wrote: > Current code doesn't do page migration if the page allocated is a compoun= d page. > With HugeTLB migration support, we can end up allocating hugetlb pages fr= om > CMA region. Also THP pages can be allocated from CMA region. This patch u= pdates > the code to handle compound pages correctly. >=20 > This add a new helper get_user_pages_cma_migrate. It does one get_user_pa= ges > with right count, instead of doing one get_user_pages per page. That avoi= ds > reading page table multiple times. The helper could possibly used by other > subystem if we have more users. >=20 > The patch also convert the hpas member of mm_iommu_table_group_mem_t to a= union. > We use the same storage location to store pointers to struct page. We can= not > update alll the code path use struct page *, because we access hpas in re= al mode > and we can't do that struct page * to pfn conversion in real mode. >=20 > Signed-off-by: Aneesh Kumar K.V This approach doesn't seem quite right to me. It's specific to pages mapped into the IOMMU. It's true that will address the obvious case we have, of vfio-using guests fragmenting the CMA for other guests. But AFAICT, fragmenting the CMA coud happen with *any* locked memory, not just things that are IOMMU mapped for VFIO. So, for example a guest not using vfio, but using -realtime mlock=3Don, or an unrelated program using locked memory (e.g. gpg or something else that locks memory for security reasons). AFAICT this approach won't fix the problem for that case. > --- > arch/powerpc/mm/mmu_context_iommu.c | 195 ++++++++++++++++++---------- > 1 file changed, 123 insertions(+), 72 deletions(-) >=20 > diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_co= ntext_iommu.c > index f472965f7638..597b88a0abce 100644 > --- a/arch/powerpc/mm/mmu_context_iommu.c > +++ b/arch/powerpc/mm/mmu_context_iommu.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > =20 > static DEFINE_MUTEX(mem_list_mutex); > =20 > @@ -30,8 +31,18 @@ struct mm_iommu_table_group_mem_t { > atomic64_t mapped; > unsigned int pageshift; > u64 ua; /* userspace address */ > - u64 entries; /* number of entries in hpas[] */ > - u64 *hpas; /* vmalloc'ed */ > + u64 entries; /* number of entries in hpages[] */ > + /* > + * in mm_iommu_get we temporarily use this to store > + * struct page address. > + * > + * We need to convert ua to hpa in real mode. Make it > + * simpler by storing physicall address. > + */ > + union { > + struct page **hpages; /* vmalloc'ed */ > + phys_addr_t *hpas; > + }; > }; > =20 > static long mm_iommu_adjust_locked_vm(struct mm_struct *mm, > @@ -75,62 +86,112 @@ bool mm_iommu_preregistered(struct mm_struct *mm) > EXPORT_SYMBOL_GPL(mm_iommu_preregistered); > =20 > /* > - * Taken from alloc_migrate_target with changes to remove CMA allocations > + * Taken from alloc_migrate_target/alloc_migrate_huge_page with changes = to remove > + * CMA allocations > + * Is this the right allocator for hugetlb? > */ > struct page *new_iommu_non_cma_page(struct page *page, unsigned long pri= vate) > { > - gfp_t gfp_mask =3D GFP_USER; > - struct page *new_page; > + /* is this the right nid? */ > + int nid =3D numa_mem_id(); > + gfp_t gfp_mask =3D GFP_HIGHUSER; > =20 > - if (PageCompound(page)) > - return NULL; > + if (PageHuge(page)) { > =20 > - if (PageHighMem(page)) > - gfp_mask |=3D __GFP_HIGHMEM; > + struct hstate *h =3D page_hstate(page); > + /* > + * We don't want to dequeue from the pool because pool pages will > + * mostly be from the CMA region. > + */ > + return alloc_migrate_huge_page(h, gfp_mask, nid, NULL); > =20 > - /* > - * We don't want the allocation to force an OOM if possibe > - */ > - new_page =3D alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN); > - return new_page; > + } else if (PageTransHuge(page)) { > + struct page *thp; > + gfp_t thp_gfpmask =3D GFP_TRANSHUGE & ~__GFP_MOVABLE; > + > + thp =3D __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER); > + if (!thp) > + return NULL; > + prep_transhuge_page(thp); > + return thp; > + } > + return __alloc_pages_node(nid, gfp_mask, 0); > } > =20 > -static int mm_iommu_move_page_from_cma(struct page *page) > +int get_user_pages_cma_migrate(unsigned long start, int nr_pages, int wr= ite, > + struct page **pages) > { > - int ret =3D 0; > - LIST_HEAD(cma_migrate_pages); > - > - /* Ignore huge pages for now */ > - if (PageCompound(page)) > - return -EBUSY; > - > - lru_add_drain(); > - ret =3D isolate_lru_page(page); > - if (ret) > + int i, ret; > + bool drain_allow =3D true; > + bool migrate_allow =3D true; > + LIST_HEAD(cma_page_list); > + > +get_user_again: > + ret =3D get_user_pages_fast(start, nr_pages, write, pages); > + if (ret <=3D 0) > return ret; > =20 > - list_add(&page->lru, &cma_migrate_pages); > - put_page(page); /* Drop the gup reference */ > - > - ret =3D migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page, > - NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE); > - if (ret) { > - if (!list_empty(&cma_migrate_pages)) > - putback_movable_pages(&cma_migrate_pages); > + for (i =3D 0; i < ret; ++i) { > + /* > + * If we get a page from the CMA zone, since we are going to > + * be pinning these entries, we might as well move them out > + * of the CMA zone if possible. > + */ > + if (is_migrate_cma_page(pages[i]) && migrate_allow) { > + if (PageHuge(pages[i])) > + isolate_huge_page(pages[i], &cma_page_list); > + else { > + struct page *head =3D compound_head(pages[i]); > + > + if (!PageLRU(head) && drain_allow) { > + lru_add_drain_all(); > + drain_allow =3D false; > + } > + > + if (!isolate_lru_page(head)) { > + list_add_tail(&head->lru, &cma_page_list); > + mod_node_page_state(page_pgdat(head), > + NR_ISOLATED_ANON + > + page_is_file_cache(head), > + hpage_nr_pages(head)); > + } > + } > + } > } > - > - return 0; > + if (!list_empty(&cma_page_list)) { > + /* > + * drop the above get_user_pages reference. > + */ > + for (i =3D 0; i < ret; ++i) > + put_page(pages[i]); > + > + if (migrate_pages(&cma_page_list, new_iommu_non_cma_page, > + NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) { > + /* > + * some of the pages failed migration. Do get_user_pages > + * without migration. > + */ > + migrate_allow =3D false; > + > + if (!list_empty(&cma_page_list)) > + putback_movable_pages(&cma_page_list); > + } > + /* > + * We did migrate all the pages, Try to get the page references again > + * migrating any new CMA pages which we failed to isolate earlier. > + */ > + drain_allow =3D true; > + goto get_user_again; > + } > + return ret; > } > =20 > long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long = entries, > struct mm_iommu_table_group_mem_t **pmem) > { > struct mm_iommu_table_group_mem_t *mem; > - long i, j, ret =3D 0, locked_entries =3D 0; > + long i, ret =3D 0, locked_entries =3D 0; > unsigned int pageshift; > - unsigned long flags; > - unsigned long cur_ua; > - struct page *page =3D NULL; > =20 > mutex_lock(&mem_list_mutex); > =20 > @@ -177,47 +238,37 @@ long mm_iommu_get(struct mm_struct *mm, unsigned lo= ng ua, unsigned long entries, > goto unlock_exit; > } > =20 > + ret =3D get_user_pages_cma_migrate(ua, entries, 1, mem->hpages); > + if (ret !=3D entries) { > + /* free the reference taken */ > + for (i =3D 0; i < ret; i++) > + put_page(mem->hpages[i]); > + > + vfree(mem->hpas); > + kfree(mem); > + ret =3D -EFAULT; > + goto unlock_exit; > + } else > + ret =3D 0; > + > + pageshift =3D PAGE_SHIFT; > for (i =3D 0; i < entries; ++i) { > - cur_ua =3D ua + (i << PAGE_SHIFT); > - if (1 !=3D get_user_pages_fast(cur_ua, > - 1/* pages */, 1/* iswrite */, &page)) { > - ret =3D -EFAULT; > - for (j =3D 0; j < i; ++j) > - put_page(pfn_to_page(mem->hpas[j] >> > - PAGE_SHIFT)); > - vfree(mem->hpas); > - kfree(mem); > - goto unlock_exit; > - } > + struct page *page =3D mem->hpages[i]; > /* > - * If we get a page from the CMA zone, since we are going to > - * be pinning these entries, we might as well move them out > - * of the CMA zone if possible. NOTE: faulting in + migration > - * can be expensive. Batching can be considered later > + * Allow to use larger than 64k IOMMU pages. Only do that > + * if we are backed by hugetlb. > */ > - if (is_migrate_cma_page(page)) { > - if (mm_iommu_move_page_from_cma(page)) > - goto populate; > - if (1 !=3D get_user_pages_fast(cur_ua, > - 1/* pages */, 1/* iswrite */, > - &page)) { > - ret =3D -EFAULT; > - for (j =3D 0; j < i; ++j) > - put_page(pfn_to_page(mem->hpas[j] >> > - PAGE_SHIFT)); > - vfree(mem->hpas); > - kfree(mem); > - goto unlock_exit; > - } > - } > -populate: > - pageshift =3D PAGE_SHIFT; > - if (mem->pageshift > PAGE_SHIFT && PageHuge(page)) { > + if ((mem->pageshift > PAGE_SHIFT) && PageHuge(page)) { > struct page *head =3D compound_head(page); > pageshift =3D compound_order(head) + PAGE_SHIFT; > } > mem->pageshift =3D min(mem->pageshift, pageshift); > + /* > + * We don't need struct page reference any more, switch > + * physicall address. > + */ > mem->hpas[i] =3D page_to_pfn(page) << PAGE_SHIFT; > + > } > =20 > atomic64_set(&mem->mapped, 1); --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --TB36FDmn/VVEgNH/ Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlugdjIACgkQbDjKyiDZ s5LwHxAAybGdP/B0/Lroq6iaFR6CbH+7N16uzOetTnXBZk4TD9pFgyRnG0ndmGrp CDIflySudsIbRXCs0HYF76daeWITIEMBEVwUg+3+HvYYVcziXYU+Kc5zTbWdsd3k 4+a8A3x2P8TOdi2NKXJeiVquVzIOMHi7Q1DdcoIcZVuZ35EV1WHo2/SOiSOkTlBR 4Q2ELiLSgoOY+Cn+rAAB1ArZIb5yWLw2AjQFCBtn3yfNo9rpx16fQg8D1chUV0U5 +ofDykQDmMzXCLYLAN34y3K4igtRJ0B0LxKrPfFniE/KGcgRYnbQkbMD+atx8brw h5mXJTxKYHtqbF0B4L65xi9YkJ+wyTbRQzDYPDFkMKUmrMc0zOmEZSxB4GpD6H5q xpUoAwAmGI8RL8FVioD5rrtiPkG0Ss5+z9qNAQq/TnsmQSvThMQJXjWED550Pzyp 6W9DcGG1KcflwIAMMWElIWebfbntlbA8ces0qJLRYS4y4BZsVp46hcZT8BGlXOXa BNJM1f6w1wiygnOZMDYuQ/mmIqyqaFTnMKRf28Qa9O3APuYuYGp3kTOHmYTDfW9q IntZ5bvnd0XLgX6Q669e799j11vTUxo+OM8r9YBqc/dnMQnDxPH3eM4GLW6PeYa7 67bynZOjjbfEbN4Evx1tFkGeZE6qZxhoZdI/IBuciIB29AAh5q0= =RhMO -----END PGP SIGNATURE----- --TB36FDmn/VVEgNH/--