From: Leon Romanovsky <leon@kernel.org>
To: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: Doug Ledford <dledford@redhat.com>,
Jason Gunthorpe <jgg@nvidia.com>,
RDMA mailing list <linux-rdma@vger.kernel.org>,
maorg@nvidia.com
Subject: Re: Fwd: [PATCH 1/1] RDMA/umem: add back hugepage sg list
Date: Sun, 7 Mar 2021 19:20:04 +0200 [thread overview]
Message-ID: <YEULROAvpIaLcnXp@unreal> (raw)
In-Reply-To: <CAD=hENeqTTmpS5V+G646V0QvJFLVSd3Sq11ffQFcDXU-OSsQEg@mail.gmail.com>
Please use git send-email to send patches.
Also, don't write 1/1 for a single patch.
Thanks
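For a single patch, something like the following works (the recipients
here are just the ones from this thread, adjust as needed):

  git send-email -1 \
      --to=linux-rdma@vger.kernel.org \
      --cc=jgg@nvidia.com --cc=dledford@redhat.com

git send-email runs format-patch internally; a single patch gets a plain
[PATCH] subject prefix with no 1/1 numbering unless --numbered is given.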
On Sun, Mar 07, 2021 at 10:28:33PM +0800, Zhu Yanjun wrote:
> From: Zhu Yanjun <zyjzyj2000@gmail.com>
>
> After the commit ("RDMA/umem: Move to allocate SG table from pages"),
> the sg list from ib_umem_get looks like the following:
> "
> sg_dma_address(sg):0x4b3c1ce000
> sg_dma_address(sg):0x4c3c1cd000
> sg_dma_address(sg):0x4d3c1cc000
> sg_dma_address(sg):0x4e3c1cb000
> "
>
> But sometimes we need an sg list like the following:
> "
> sg_dma_address(sg):0x203b400000
> sg_dma_address(sg):0x213b200000
> sg_dma_address(sg):0x223b000000
> sg_dma_address(sg):0x233ae00000
> sg_dma_address(sg):0x243ac00000
> "
> The function ib_umem_add_sg_table can provide an sg list like the
> second one, but it was removed in the commit ("RDMA/umem: Move to
> allocate SG table from pages"). This patch adds it back.
>
> A new function, ib_umem_hugepage_get, is added based on ib_umem_get;
> it calls ib_umem_add_sg_table.
>
> The function ib_umem_hugepage_get can produce sg lists with 4K or 2M
> DMA segment addresses.
>
> Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages")
> Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
> ---
> drivers/infiniband/core/umem.c | 197 +++++++++++++++++++++++++++++++++
> include/rdma/ib_umem.h | 3 +
> 2 files changed, 200 insertions(+)
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 2dde99a9ba07..af8733788b1e 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -62,6 +62,203 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> sg_free_table(&umem->sg_head);
> }
>
> +/* ib_umem_add_sg_table - Add N contiguous pages to scatter table
> + *
> + * sg: current scatterlist entry
> + * page_list: array of npage struct page pointers
> + * npages: number of pages in page_list
> + * max_seg_sz: maximum segment size in bytes
> + * nents: [out] number of entries in the scatterlist
> + *
> + * Return new end of scatterlist
> + */
> +static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg,
> + struct page **page_list,
> + unsigned long npages,
> + unsigned int max_seg_sz,
> + int *nents)
> +{
> + unsigned long first_pfn;
> + unsigned long i = 0;
> + bool update_cur_sg = false;
> + bool first = !sg_page(sg);
> +
> + /* Check if new page_list is contiguous with end of previous page_list.
> + * sg->length here is a multiple of PAGE_SIZE and sg->offset is 0.
> + */
> + if (!first && (page_to_pfn(sg_page(sg)) + (sg->length >> PAGE_SHIFT) ==
> + page_to_pfn(page_list[0])))
> + update_cur_sg = true;
> +
> + while (i != npages) {
> + unsigned long len;
> + struct page *first_page = page_list[i];
> +
> + first_pfn = page_to_pfn(first_page);
> +
> + /* Compute the number of contiguous pages we have starting
> + * at i
> + */
> + for (len = 0; i != npages &&
> + first_pfn + len == page_to_pfn(page_list[i]) &&
> + len < (max_seg_sz >> PAGE_SHIFT);
> + len++)
> + i++;
> +
> + /* Squash N contiguous pages from page_list into current sge */
> + if (update_cur_sg) {
> + if ((max_seg_sz - sg->length) >= (len << PAGE_SHIFT)) {
> + sg_set_page(sg, sg_page(sg),
> + sg->length + (len << PAGE_SHIFT),
> + 0);
> + update_cur_sg = false;
> + continue;
> + }
> + update_cur_sg = false;
> + }
> +
> + /* Squash N contiguous pages into next sge or first sge */
> + if (!first)
> + sg = sg_next(sg);
> +
> + (*nents)++;
> + sg_set_page(sg, first_page, len << PAGE_SHIFT, 0);
> + first = false;
> + }
> +
> + return sg;
> +}
> +
> +/**
> + * ib_umem_hugepage_get - Pin and DMA map userspace memory.
> + *
> + * @device: IB device to connect UMEM
> + * @addr: userspace virtual address to start at
> + * @size: length of region to pin
> + * @access: IB_ACCESS_xxx flags for memory being pinned
> + */
> +struct ib_umem *ib_umem_hugepage_get(struct ib_device *device,
> + unsigned long addr,
> + size_t size, int access)
> +{
> + struct ib_umem *umem;
> + struct page **page_list;
> + unsigned long lock_limit;
> + unsigned long new_pinned;
> + unsigned long cur_base;
> + unsigned long dma_attr = 0;
> + struct mm_struct *mm;
> + unsigned long npages;
> + int ret;
> + struct scatterlist *sg;
> + unsigned int gup_flags = FOLL_WRITE;
> +
> + /*
> + * If the combination of the addr and size requested for this memory
> + * region causes an integer overflow, return error.
> + */
> + if (((addr + size) < addr) ||
> + PAGE_ALIGN(addr + size) < (addr + size))
> + return ERR_PTR(-EINVAL);
> +
> + if (!can_do_mlock())
> + return ERR_PTR(-EPERM);
> +
> + if (access & IB_ACCESS_ON_DEMAND)
> + return ERR_PTR(-EOPNOTSUPP);
> +
> + umem = kzalloc(sizeof(*umem), GFP_KERNEL);
> + if (!umem)
> + return ERR_PTR(-ENOMEM);
> + umem->ibdev = device;
> + umem->length = size;
> + umem->address = addr;
> + umem->writable = ib_access_writable(access);
> + umem->owning_mm = mm = current->mm;
> + mmgrab(mm);
> +
> + page_list = (struct page **) __get_free_page(GFP_KERNEL);
> + if (!page_list) {
> + ret = -ENOMEM;
> + goto umem_kfree;
> + }
> +
> + npages = ib_umem_num_pages(umem);
> + if (npages == 0 || npages > UINT_MAX) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +
> + new_pinned = atomic64_add_return(npages, &mm->pinned_vm);
> + if (new_pinned > lock_limit && !capable(CAP_IPC_LOCK)) {
> + atomic64_sub(npages, &mm->pinned_vm);
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + cur_base = addr & PAGE_MASK;
> +
> + ret = sg_alloc_table(&umem->sg_head, npages, GFP_KERNEL);
> + if (ret)
> + goto vma;
> +
> + if (!umem->writable)
> + gup_flags |= FOLL_FORCE;
> +
> + sg = umem->sg_head.sgl;
> +
> + while (npages) {
> + cond_resched();
> + ret = pin_user_pages_fast(cur_base,
> + min_t(unsigned long, npages,
> + PAGE_SIZE /
> + sizeof(struct page *)),
> + gup_flags | FOLL_LONGTERM, page_list);
> + if (ret < 0)
> + goto umem_release;
> +
> + cur_base += ret * PAGE_SIZE;
> + npages -= ret;
> +
> + sg = ib_umem_add_sg_table(sg, page_list, ret,
> + dma_get_max_seg_size(device->dma_device),
> + &umem->sg_nents);
> + }
> +
> + sg_mark_end(sg);
> +
> + if (access & IB_ACCESS_RELAXED_ORDERING)
> + dma_attr |= DMA_ATTR_WEAK_ORDERING;
> +
> + umem->nmap =
> + ib_dma_map_sg_attrs(device, umem->sg_head.sgl, umem->sg_nents,
> + DMA_BIDIRECTIONAL, dma_attr);
> +
> + if (!umem->nmap) {
> + ret = -ENOMEM;
> + goto umem_release;
> + }
> +
> + ret = 0;
> + goto out;
> +
> +umem_release:
> + __ib_umem_release(device, umem, 0);
> +vma:
> + atomic64_sub(ib_umem_num_pages(umem), &mm->pinned_vm);
> +out:
> + free_page((unsigned long) page_list);
> +umem_kfree:
> + if (ret) {
> + mmdrop(umem->owning_mm);
> + kfree(umem);
> + }
> + return ret ? ERR_PTR(ret) : umem;
> +}
> +EXPORT_SYMBOL(ib_umem_hugepage_get);
> +
> /**
> * ib_umem_find_best_pgsz - Find best HW page size to use for this MR
> *
> diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
> index 676c57f5ca80..fc6350ff21e6 100644
> --- a/include/rdma/ib_umem.h
> +++ b/include/rdma/ib_umem.h
> @@ -96,6 +96,9 @@ static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
> __rdma_block_iter_next(biter);)
>
> #ifdef CONFIG_INFINIBAND_USER_MEM
> +struct ib_umem *ib_umem_hugepage_get(struct ib_device *device,
> + unsigned long addr,
> + size_t size, int access);
>
> struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
> size_t size, int access);
> --
> 2.25.1
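For illustration, a minimal sketch of how a consumer might walk the
coalesced sg list that ib_umem_hugepage_get returns (hypothetical driver
code, not part of the patch; it only uses fields shown in the diff above):

  #include <linux/kernel.h>
  #include <linux/scatterlist.h>
  #include <rdma/ib_umem.h>

  /*
   * Print each mapped DMA segment of a umem. With the coalesced sg
   * list, physically contiguous pages are squashed into single entries
   * (up to the device's max segment size), so a 2M-backed region gives
   * 2M-stride addresses as in the commit message above.
   */
  static void dump_umem_segments(struct ib_umem *umem)
  {
          struct scatterlist *sg;
          dma_addr_t da;
          int i;

          for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) {
                  da = sg_dma_address(sg);
                  pr_info("sg_dma_address(sg): %pad, len: %u\n",
                          &da, sg_dma_len(sg));
          }
  }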
Thread overview: 30+ messages
[not found] <20210307221034.568606-1-yanjun.zhu@intel.com>
2021-03-07 14:28 ` Fwd: [PATCH 1/1] RDMA/umem: add back hugepage sg list Zhu Yanjun
2021-03-07 17:20 ` Leon Romanovsky [this message]
2021-03-07 17:22 ` Leon Romanovsky
2021-03-08 2:44 ` Zhu Yanjun
2021-03-08 10:13 ` Zhu Yanjun
2021-03-08 12:16 ` Jason Gunthorpe
2021-03-11 10:41 ` Zhu Yanjun
2021-03-12 0:25 ` Jason Gunthorpe
2021-03-12 8:04 ` Zhu Yanjun
2021-03-12 8:05 ` Zhu Yanjun
2021-03-12 13:02 ` Jason Gunthorpe
2021-03-12 13:42 ` Zhu Yanjun
2021-03-12 13:49 ` Zhu Yanjun
2021-03-12 14:01 ` Jason Gunthorpe
2021-03-13 3:02 ` Zhu Yanjun
2021-03-19 13:00 ` Jason Gunthorpe
2021-03-19 13:33 ` Zhu Yanjun
2021-03-19 13:48 ` Jason Gunthorpe
2021-03-20 3:38 ` Zhu Yanjun
2021-03-20 3:49 ` Zhu Yanjun
2021-03-20 20:38 ` Jason Gunthorpe
2021-03-21 8:06 ` Zhu Yanjun
2021-03-21 12:07 ` Jason Gunthorpe
2021-03-21 12:54 ` Zhu Yanjun
2021-03-21 13:03 ` Jason Gunthorpe
2021-03-21 14:38 ` Zhu Yanjun
2021-03-21 15:52 ` Jason Gunthorpe
2021-03-22 5:07 ` Zhu Yanjun
2021-03-08 0:44 ` Jason Gunthorpe
2021-03-08 2:47 ` Zhu Yanjun