From: Matthew Brost <matthew.brost@intel.com>
To: Oak Zeng <oak.zeng@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <thomas.hellstrom@intel.com>,
<airlied@gmail.com>, <brian.welty@intel.com>,
<himal.prasad.ghimiray@intel.com>
Subject: Re: [PATCH 4/5] drm/xe: Helper to populate a userptr or hmmptr
Date: Thu, 14 Mar 2024 20:25:01 +0000 [thread overview]
Message-ID: <ZfNdHSODk7eNLA5X@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <20240314033553.1379444-5-oak.zeng@intel.com>
On Wed, Mar 13, 2024 at 11:35:52PM -0400, Oak Zeng wrote:
> Add a helper function xe_hmm_populate_range to populate
> a a userptr or hmmptr range. This functions calls hmm_range_fault
> to read CPU page tables and populate all pfns/pages of this
> virtual address range.
>
> If the populated page is system memory page, dma-mapping is performed
> to get a dma-address which can be used later for GPU to access pages.
>
> If the populated page is device private page, we calculate the dpa (
> device physical address) of the page.
>
> The dma-address or dpa is then saved in userptr's sg table. This is
> prepare work to replace the get_user_pages_fast code in userptr code
> path. The helper function will also be used to populate hmmptr later.
>
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> Cc: Brian Welty <brian.welty@intel.com>
> ---
> drivers/gpu/drm/xe/Makefile | 3 +-
> drivers/gpu/drm/xe/xe_hmm.c | 213 ++++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_hmm.h | 12 ++
> 3 files changed, 227 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
> create mode 100644 drivers/gpu/drm/xe/xe_hmm.h
>
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 840467080e59..29dcbc938b01 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -143,7 +143,8 @@ xe-y += xe_bb.o \
> xe_wait_user_fence.o \
> xe_wa.o \
> xe_wopcm.o \
> - xe_svm_devmem.o
> + xe_svm_devmem.o \
> + xe_hmm.o
>
> # graphics hardware monitoring (HWMON) support
> xe-$(CONFIG_HWMON) += xe_hwmon.o
> diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
> new file mode 100644
> index 000000000000..c45c2447d386
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_hmm.c
> @@ -0,0 +1,213 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#include <linux/mmu_notifier.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/memremap.h>
> +#include <linux/swap.h>
> +#include <linux/mm.h>
> +#include "xe_hmm.h"
> +#include "xe_vm.h"
> +
> +/**
> + * mark_range_accessed() - mark a range is accessed, so core mm
> + * have such information for memory eviction or write back to
> + * hard disk
> + *
> + * @range: the range to mark
> + * @write: if write to this range, we mark pages in this range
> + * as dirty
> + */
> +static void mark_range_accessed(struct hmm_range *range, bool write)
> +{
> + struct page *page;
> + u64 i, npages;
> +
> + npages = ((range->end - 1) >> PAGE_SHIFT) - (range->start >> PAGE_SHIFT) + 1;
> + for (i = 0; i < npages; i++) {
> + page = hmm_pfn_to_page(range->hmm_pfns[i]);
> + if (write) {
> + lock_page(page);
> + set_page_dirty(page);
> + unlock_page(page);
> + }
> + mark_page_accessed(page);
> + }
> +}
> +
> +/**
> + * build_sg() - build a scatter gather table for all the physical pages/pfn
> + * in a hmm_range. dma-address is save in sg table and will be used to program
> + * GPU page table later.
> + *
> + * @xe: the xe device who will access the dma-address in sg table
> + * @range: the hmm range that we build the sg table from. range->hmm_pfns[]
> + * has the pfn numbers of pages that back up this hmm address range.
> + * @st: pointer to the sg table.
> + * @write: whether we write to this range. This decides dma map direction
> + * for system pages. If write we map it bi-diretional; otherwise
> + * DMA_TO_DEVICE
> + *
> + * All the contiguous pfns will be collapsed into one entry in
> + * the scatter gather table. This is for the convenience of
> + * later on operations to bind address range to GPU page table.
> + *
> + * The dma_address in the sg table will later be used by GPU to
> + * access memory. So if the memory is system memory, we need to
> + * do a dma-mapping so it can be accessed by GPU/DMA. If the memory
> + * is GPU local memory (of the GPU who is going to access memory),
> + * we need gpu dpa (device physical address), and there is no need
> + * of dma-mapping.
> + *
> + * FIXME: dma-mapping for peer gpu device to access remote gpu's
> + * memory. Add this when you support p2p
> + *
> + * This function allocates the storage of the sg table. It is
> + * caller's responsibility to free it calling sg_free_table.
> + *
> + * Returns 0 if successful; -ENOMEM if fails to allocate memory
> + */
> +static int build_sg(struct xe_device *xe, struct hmm_range *range,
xe is unused.
> + struct sg_table *st, bool write)
> +{
> + struct device *dev = xe->drm.dev;
> + struct scatterlist *sg;
> + u64 i, npages;
> +
> + sg = NULL;
> + st->nents = 0;
> + npages = ((range->end - 1) >> PAGE_SHIFT) - (range->start >> PAGE_SHIFT) + 1;
> +
> + if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
> + return -ENOMEM;
> +
> + for (i = 0; i < npages; i++) {
> + struct page *page;
> + unsigned long addr;
> + struct xe_mem_region *mr;
> +
> + page = hmm_pfn_to_page(range->hmm_pfns[i]);
> + if (is_device_private_page(page)) {
> + mr = page_to_mem_region(page);
Not seeing where page_to_mem_region is defined.
> + addr = vram_pfn_to_dpa(mr, range->hmm_pfns[i]);
> + } else {
> + addr = dma_map_page(dev, page, 0, PAGE_SIZE,
> + write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE);
> + }
> +
> + if (sg && (addr == (sg_dma_address(sg) + sg->length))) {
> + sg->length += PAGE_SIZE;
> + sg_dma_len(sg) += PAGE_SIZE;
> + continue;
> + }
> +
> + sg = sg ? sg_next(sg) : st->sgl;
> + sg_dma_address(sg) = addr;
> + sg_dma_len(sg) = PAGE_SIZE;
> + sg->length = PAGE_SIZE;
> + st->nents++;
> + }
> +
> + sg_mark_end(sg);
> + return 0;
> +}
> +
Hmm, this looks way to open coded to me.
Can't we do something like this:
struct page **pages = convert from range->hmm_pfns
sg_alloc_table_from_pages_segment
if (is_device_private_page())
populatue sg table via vram_pfn_to_dpa
else
dma_map_sgtable
free(pages)
This assume we are not mixing is_device_private_page & system memory
pages in a single struct hmm_range.
> +/**
> + * xe_hmm_populate_range() - Populate physical pages of a virtual
> + * address range
> + *
> + * @vma: vma has information of the range to populate. only vma
> + * of userptr and hmmptr type can be populated.
> + * @hmm_range: pointer to hmm_range struct. hmm_rang->hmm_pfns
> + * will hold the populated pfns.
> + * @write: populate pages with write permission
> + *
> + * This function populate the physical pages of a virtual
> + * address range. The populated physical pages is saved in
> + * userptr's sg table. It is similar to get_user_pages but call
> + * hmm_range_fault.
> + *
> + * This function also read mmu notifier sequence # (
> + * mmu_interval_read_begin), for the purpose of later
> + * comparison (through mmu_interval_read_retry).
> + *
> + * This must be called with mmap read or write lock held.
> + *
> + * This function allocates the storage of the userptr sg table.
> + * It is caller's responsibility to free it calling sg_free_table.
I'd add a helper to free the sg_free_table & unmap the dma pages if
needed.
> + *
> + * returns: 0 for succuss; negative error no on failure
> + */
> +int xe_hmm_populate_range(struct xe_vma *vma, struct hmm_range *hmm_range,
> + bool write)
> +{
The layering is all wrong here, we shouldn't be touching struct xe_vma
directly in hmm layer.
Pass in a populated hmm_range and sgt. Or alternatively pass in
arguments and then populate a hmm_range locally.
> + unsigned long timeout =
> + jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> + unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
> + struct xe_userptr_vma *userptr_vma;
> + struct xe_userptr *userptr;
> + u64 start = vma->gpuva.va.addr;
> + u64 end = start + vma->gpuva.va.range;
We have helper - xe_vma_start and xe_vma_end but I think either of these
are correct in this case.
xe_vma_start is the address which this bound to the GPU, we want the
userptr address.
So I think it would be:
start = xe_vma_userptr()
end = xe_vma_userptr() + xe_vma_size()
> + struct xe_vm *vm = xe_vma_vm(vma);
> + u64 npages;
> + int ret;
> +
> + userptr_vma = to_userptr_vma(vma);
> + userptr = &userptr_vma->userptr;
> + mmap_assert_locked(userptr->notifier.mm);
> +
> + npages = ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
This math is done above, if you need this math in next rev add a helper.
> + pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
> + if (unlikely(!pfns))
> + return -ENOMEM;
> +
> + if (write)
> + flags |= HMM_PFN_REQ_WRITE;
> +
> + memset64((u64 *)pfns, (u64)flags, npages);
Why is this needed, can't we just set hmm_range->default_flags?
> + hmm_range->hmm_pfns = pfns;
> + hmm_range->notifier_seq = mmu_interval_read_begin(&userptr->notifier);
We need a userptr->notifier == userptr->notifier_seq check that just
returns, right?
> + hmm_range->notifier = &userptr->notifier;
> + hmm_range->start = start;
> + hmm_range->end = end;
> + hmm_range->pfn_flags_mask = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE;
Is this needed? AMD and Nouveau do not set this argument.
> + /**
> + * FIXME:
> + * Set the the dev_private_owner can prevent hmm_range_fault to fault
> + * in the device private pages owned by caller. See function
> + * hmm_vma_handle_pte. In multiple GPU case, this should be set to the
> + * device owner of the best migration destination. e.g., device0/vm0
> + * has a page fault, but we have determined the best placement of
> + * the fault address should be on device1, we should set below to
> + * device1 instead of device0.
> + */
> + hmm_range->dev_private_owner = vm->xe->drm.dev;
> +
> + while (true) {
mmap_read_lock(mm);
> + ret = hmm_range_fault(hmm_range);
mmap_read_unlock(mm);
> + if (time_after(jiffies, timeout))
> + break;
> +
> + if (ret == -EBUSY)
> + continue;
> + break;
> + }
> +
> + if (ret)
> + goto free_pfns;
> +
> + ret = build_sg(vm->xe, hmm_range, &userptr->sgt, write);
> + if (ret)
> + goto free_pfns;
> +
> + mark_range_accessed(hmm_range, write);
> + userptr->sg = &userptr->sgt;
Again this should be set in caller after this function return.
> + userptr->notifier_seq = hmm_range->notifier_seq;
This is could be a pass by reference I guess and set here.
> +
> +free_pfns:
> + kvfree(pfns);
> + return ret;
> +}
> +
> diff --git a/drivers/gpu/drm/xe/xe_hmm.h b/drivers/gpu/drm/xe/xe_hmm.h
> new file mode 100644
> index 000000000000..960f3f6d36ae
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_hmm.h
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#include <linux/types.h>
> +#include <linux/hmm.h>
> +#include "xe_vm_types.h"
> +#include "xe_svm.h"
As per the previous patches no need to xe_*.h files, just forward
declare any arguments.
Matt
> +
> +int xe_hmm_populate_range(struct xe_vma *vma, struct hmm_range *hmm_range,
> + bool write);
> --
> 2.26.3
>
next prev parent reply other threads:[~2024-03-14 20:26 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-14 3:35 [PATCH 0/5] Use hmm_range_fault to populate user page Oak Zeng
2024-03-14 3:28 ` ✓ CI.Patch_applied: success for " Patchwork
2024-03-14 3:28 ` ✗ CI.checkpatch: warning " Patchwork
2024-03-14 3:29 ` ✗ CI.KUnit: failure " Patchwork
2024-03-14 3:35 ` [PATCH 1/5] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
2024-03-14 17:17 ` Matthew Brost
2024-03-14 18:32 ` Zeng, Oak
2024-03-14 20:49 ` Matthew Brost
2024-03-15 16:00 ` Zeng, Oak
2024-03-15 20:39 ` Matthew Brost
2024-03-15 21:31 ` Zeng, Oak
2024-03-16 1:25 ` Matthew Brost
2024-03-18 10:16 ` Hellstrom, Thomas
2024-03-18 15:02 ` Zeng, Oak
2024-03-18 15:46 ` Hellstrom, Thomas
2024-03-18 14:51 ` Zeng, Oak
2024-03-15 1:45 ` Welty, Brian
2024-03-15 3:10 ` Zeng, Oak
2024-03-15 3:16 ` Zeng, Oak
2024-03-15 18:05 ` Welty, Brian
2024-03-15 23:11 ` Zeng, Oak
2024-03-14 3:35 ` [PATCH 2/5] drm/xe: Helper to get memory region from tile Oak Zeng
2024-03-14 17:33 ` Matthew Brost
2024-03-14 17:44 ` Matthew Brost
2024-03-15 2:48 ` Zeng, Oak
2024-03-14 3:35 ` [PATCH 3/5] drm/xe: Helper to get dpa from pfn Oak Zeng
2024-03-14 17:39 ` Matthew Brost
2024-03-15 17:29 ` Zeng, Oak
2024-03-16 1:33 ` Matthew Brost
2024-03-18 19:25 ` Zeng, Oak
2024-03-18 12:09 ` Hellstrom, Thomas
2024-03-18 19:27 ` Zeng, Oak
2024-03-14 3:35 ` [PATCH 4/5] drm/xe: Helper to populate a userptr or hmmptr Oak Zeng
2024-03-14 20:25 ` Matthew Brost [this message]
2024-03-16 1:35 ` Zeng, Oak
2024-03-18 0:29 ` Matthew Brost
2024-03-18 11:53 ` Hellstrom, Thomas
2024-03-18 19:50 ` Zeng, Oak
2024-03-19 8:41 ` Hellstrom, Thomas
2024-03-19 16:13 ` Zeng, Oak
2024-03-19 19:52 ` Hellstrom, Thomas
2024-03-19 20:01 ` Zeng, Oak
2024-03-18 13:12 ` Hellstrom, Thomas
2024-03-18 14:49 ` Zeng, Oak
2024-03-18 15:40 ` Hellstrom, Thomas
2024-03-18 16:09 ` Zeng, Oak
2024-03-14 3:35 ` [PATCH 5/5] drm/xe: Use hmm_range_fault to populate user pages Oak Zeng
2024-03-14 20:54 ` Matthew Brost
2024-03-19 2:36 ` Zeng, Oak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZfNdHSODk7eNLA5X@DUT025-TGLU.fm.intel.com \
--to=matthew.brost@intel.com \
--cc=airlied@gmail.com \
--cc=brian.welty@intel.com \
--cc=himal.prasad.ghimiray@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=oak.zeng@intel.com \
--cc=thomas.hellstrom@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.