From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Auld <matthew.auld@intel.com>, intel-xe@lists.freedesktop.org
Cc: Oak Zeng <oak.zeng@intel.com>, stable@vger.kernel.org
Subject: Re: [PATCH v2 2/3] drm/xe/hmm: Don't dereference struct page pointers without notifier lock
Date: Tue, 04 Mar 2025 16:37:02 +0100 [thread overview]
Message-ID: <55e586e48941f825cb67f408f2a608be0a01e89d.camel@linux.intel.com> (raw)
In-Reply-To: <1e5ef4c8-9545-4102-88d9-865cf6a4bec9@intel.com>
On Tue, 2025-03-04 at 15:16 +0000, Matthew Auld wrote:
> On 04/03/2025 11:37, Thomas Hellström wrote:
> > The pnfs that we obtain from hmm_range_fault() point to pages that
> > we don't have a reference on, and the guarantee that they are still
> > in the cpu page-tables is that the notifier lock must be held and
> > the
> > notifier seqno is still valid.
> >
> > So while building the sg table and marking the pages accesses /
> > dirty
> > we need to hold this lock with a validated seqno.
> >
> > However, the lock is reclaim tainted which makes
> > sg_alloc_table_from_pages_segment() unusable, since it internally
> > allocates memory.
> >
> > Instead build the sg-table manually. For the non-iommu case
> > this might lead to fewer coalesces, but if that's a problem it can
> > be fixed up later in the resource cursor code. For the iommu case,
> > the whole sg-table may still be coalesced to a single contigous
> > device va region.
> >
> > This avoids marking pages that we don't own dirty and accessed, and
> > it also avoid dereferencing struct pages that we don't own.
> >
> > v2:
> > - Use assert to check whether hmm pfns are valid (Matthew Auld)
> > - Take into account that large pages may cross range boundaries
> > (Matthew Auld)
> >
> > Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate
> > userptr")
> > Cc: Oak Zeng <oak.zeng@intel.com>
> > Cc: <stable@vger.kernel.org> # v6.10+
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_hmm.c | 119 ++++++++++++++++++++++++++++---
> > -----
> > 1 file changed, 93 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_hmm.c
> > b/drivers/gpu/drm/xe/xe_hmm.c
> > index c56738fa713b..93cce9e819a1 100644
> > --- a/drivers/gpu/drm/xe/xe_hmm.c
> > +++ b/drivers/gpu/drm/xe/xe_hmm.c
> > @@ -42,6 +42,40 @@ static void xe_mark_range_accessed(struct
> > hmm_range *range, bool write)
> > }
> > }
> >
> > +static int xe_alloc_sg(struct xe_device *xe, struct sg_table *st,
> > + struct hmm_range *range, struct
> > rw_semaphore *notifier_sem)
> > +{
> > + unsigned long i, npages, hmm_pfn;
> > + unsigned long num_chunks = 0;
> > + int ret;
> > +
> > + /* HMM docs says this is needed. */
> > + ret = down_read_interruptible(notifier_sem);
> > + if (ret)
> > + return ret;
> > +
> > + if (mmu_interval_read_retry(range->notifier, range-
> > >notifier_seq))
>
> up_read() ?
>
> > + return -EAGAIN;
> > +
> > + npages = xe_npages_in_range(range->start, range->end);
> > + for (i = 0; i < npages;) {
> > + unsigned long len;
> > +
> > + hmm_pfn = range->hmm_pfns[i];
> > + xe_assert(xe, hmm_pfn & HMM_PFN_VALID);
> > +
> > + len = 1UL << hmm_pfn_to_map_order(hmm_pfn);
> > +
> > + /* If order > 0 the page may extend beyond range-
> > >start */
> > + len -= (hmm_pfn & ~HMM_PFN_FLAGS) & (len - 1);
> > + i += len;
> > + num_chunks++;
> > + }
> > + up_read(notifier_sem);
> > +
> > + return sg_alloc_table(st, num_chunks, GFP_KERNEL);
> > +}
> > +
> > /**
> > * xe_build_sg() - build a scatter gather table for all the
> > physical pages/pfn
> > * in a hmm_range. dma-map pages if necessary. dma-address is
> > save in sg table
> > @@ -50,6 +84,7 @@ static void xe_mark_range_accessed(struct
> > hmm_range *range, bool write)
> > * @range: the hmm range that we build the sg table from. range-
> > >hmm_pfns[]
> > * has the pfn numbers of pages that back up this hmm address
> > range.
> > * @st: pointer to the sg table.
> > + * @notifier_sem: The xe notifier lock.
> > * @write: whether we write to this range. This decides dma map
> > direction
> > * for system pages. If write we map it bi-diretional; otherwise
> > * DMA_TO_DEVICE
> > @@ -76,38 +111,41 @@ static void xe_mark_range_accessed(struct
> > hmm_range *range, bool write)
> > * Returns 0 if successful; -ENOMEM if fails to allocate memory
> > */
> > static int xe_build_sg(struct xe_device *xe, struct hmm_range
> > *range,
> > - struct sg_table *st, bool write)
> > + struct sg_table *st,
> > + struct rw_semaphore *notifier_sem,
> > + bool write)
> > {
> > + unsigned long npages = xe_npages_in_range(range->start,
> > range->end);
> > struct device *dev = xe->drm.dev;
> > - struct page **pages;
> > - u64 i, npages;
> > - int ret;
> > + struct scatterlist *sgl;
> > + struct page *page;
> > + unsigned long i, j;
> >
> > - npages = xe_npages_in_range(range->start, range->end);
> > - pages = kvmalloc_array(npages, sizeof(*pages),
> > GFP_KERNEL);
> > - if (!pages)
> > - return -ENOMEM;
> > + lockdep_assert_held(notifier_sem);
> >
> > - for (i = 0; i < npages; i++) {
> > - pages[i] = hmm_pfn_to_page(range->hmm_pfns[i]);
> > - xe_assert(xe, !is_device_private_page(pages[i]));
> > - }
> > + i = 0;
> > + for_each_sg(st->sgl, sgl, st->nents, j) {
> > + unsigned long hmm_pfn, size;
> >
> > - ret = sg_alloc_table_from_pages_segment(st, pages, npages,
> > 0, npages << PAGE_SHIFT,
> > -
> > xe_sg_segment_size(dev), GFP_KERNEL);
> > - if (ret)
> > - goto free_pages;
> > + hmm_pfn = range->hmm_pfns[i];
> > + page = hmm_pfn_to_page(hmm_pfn);
> > + xe_assert(xe, !is_device_private_page(page));
> >
> > - ret = dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL :
> > DMA_TO_DEVICE,
> > - DMA_ATTR_SKIP_CPU_SYNC |
> > DMA_ATTR_NO_KERNEL_MAPPING);
> > - if (ret) {
> > - sg_free_table(st);
> > - st = NULL;
> > + size = 1UL << hmm_pfn_to_map_order(hmm_pfn);
> > + size -= page_to_pfn(page) & (size - 1);
> > + i += size;
> > +
> > + if (unlikely(j == st->nents - 1)) {
> > + if (i > npages)
> > + size -= (i - npages);
> > + sg_mark_end(sgl);
> > + }
> > + sg_set_page(sgl, page, size << PAGE_SHIFT, 0);
> > }
> > + xe_assert(xe, i == npages);
> >
> > -free_pages:
> > - kvfree(pages);
> > - return ret;
> > + return dma_map_sgtable(dev, st, write ? DMA_BIDIRECTIONAL
> > : DMA_TO_DEVICE,
> > + DMA_ATTR_SKIP_CPU_SYNC |
> > DMA_ATTR_NO_KERNEL_MAPPING);
> > }
> >
> > /**
> > @@ -235,16 +273,45 @@ int xe_hmm_userptr_populate_range(struct
> > xe_userptr_vma *uvma,
> > if (ret)
> > goto free_pfns;
> >
> > - ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt,
> > write);
> > + if (unlikely(userptr->sg)) {
>
> Is it really possible to hit this? We did the unmap above also,
> although
> that was outside of the notifier lock?
Right. That was a spill-over from the next patch where we unmap in the
notifier. But for this patch I'll fix up the two issues mentioned and
add your R-B.
Thanks for reviewing.
Thomas
>
> Otherwise,
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>
> > + ret = down_write_killable(&vm-
> > >userptr.notifier_lock);
> > + if (ret)
> > + goto free_pfns;
> > +
> > + xe_hmm_userptr_free_sg(uvma);
> > + up_write(&vm->userptr.notifier_lock);
> > + }
> > +
> > + ret = xe_alloc_sg(vm->xe, &userptr->sgt, &hmm_range, &vm-
> > >userptr.notifier_lock);
> > if (ret)
> > goto free_pfns;
> >
> > + ret = down_read_interruptible(&vm->userptr.notifier_lock);
> > + if (ret)
> > + goto free_st;
> > +
> > + if (mmu_interval_read_retry(hmm_range.notifier,
> > hmm_range.notifier_seq)) {
> > + ret = -EAGAIN;
> > + goto out_unlock;
> > + }
> > +
> > + ret = xe_build_sg(vm->xe, &hmm_range, &userptr->sgt,
> > + &vm->userptr.notifier_lock, write);
> > + if (ret)
> > + goto out_unlock;
> > +
> > xe_mark_range_accessed(&hmm_range, write);
> > userptr->sg = &userptr->sgt;
> > userptr->notifier_seq = hmm_range.notifier_seq;
> > + up_read(&vm->userptr.notifier_lock);
> > + kvfree(pfns);
> > + return 0;
> >
> > +out_unlock:
> > + up_read(&vm->userptr.notifier_lock);
> > +free_st:
> > + sg_free_table(&userptr->sgt);
> > free_pfns:
> > kvfree(pfns);
> > return ret;
> > }
> > -
>
next prev parent reply other threads:[~2025-03-04 15:37 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-04 11:37 [PATCH v2 0/3] drm/xe: Userptr invalid page access fixes Thomas Hellström
2025-03-04 11:37 ` [PATCH v2 1/3] drm/xe/hmm: Style- and include fixes Thomas Hellström
2025-03-04 12:21 ` Jani Nikula
2025-03-04 15:16 ` Matthew Auld
2025-03-04 11:37 ` [PATCH v2 2/3] drm/xe/hmm: Don't dereference struct page pointers without notifier lock Thomas Hellström
2025-03-04 15:16 ` Matthew Auld
2025-03-04 15:37 ` Thomas Hellström [this message]
2025-03-04 11:37 ` [PATCH v2 3/3] drm/xe/userptr: Unmap userptrs in the mmu notifier Thomas Hellström
2025-03-04 16:53 ` Matthew Auld
2025-03-04 13:48 ` ✗ CI.Patch_applied: failure for drm/xe: Userptr invalid page access fixes (rev2) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55e586e48941f825cb67f408f2a608be0a01e89d.camel@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.auld@intel.com \
--cc=oak.zeng@intel.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.