From: Zhangyuhao <yuhao.zhang@huawei.com>
To: David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	John Hubbard <jhubbard@nvidia.com>, Peter Xu <peterx@redhat.com>,
	Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>
Subject: RE: Issues with Pinning User Pages for SVA on IOMMUs Lacking IOPF
Date: Tue, 2 Sep 2025 13:04:51 +0000
Message-ID: <54be598d93404e7185ecfbe49f7fe93c@huawei.com>
In-Reply-To: <fc37388a-dc09-4a71-bd79-c0d09e482c21@redhat.com>

[Adding linux-kernel mailing list for visibility]

Best,
Yuhao
-----Original Message-----
From: David Hildenbrand <david@redhat.com> 
Sent: Monday, September 1, 2025 10:34 PM
To: Zhangyuhao <yuhao.zhang@huawei.com>; Andrew Morton <akpm@linux-foundation.org>; Jason Gunthorpe <jgg@ziepe.ca>; John Hubbard <jhubbard@nvidia.com>; Peter Xu <peterx@redhat.com>; Joerg Roedel <joro@8bytes.org>; Will Deacon <will@kernel.org>; Robin Murphy <robin.murphy@arm.com>
Subject: Re: Issues with Pinning User Pages for SVA on IOMMUs Lacking IOPF

On 01.09.25 15:43, Zhangyuhao wrote:
> Hello Linux kernel community,

Hi,

> 
> Current IOMMU SVA support relies on hardware IOPF (IO Page Fault). We have observed that certain IOMMU devices do not support IOPF,
> but we are still exploring how to enable SVA in such scenarios.
> 
> To address this, we attempted to pin memory to prevent device accesses from triggering IO page faults.
> 
> Solution 1: User-space mlock + madvise(MADV_POPULATE_WRITE)
> 
> if (madvise(buf, size, MADV_POPULATE_WRITE) != 0) {
>      free(buf);
>      return 1;
> }
> if (mlock(buf, size) != 0) {
>      free(buf);
>      return 1;
> }
> Result: Page faults still occurred due to page migration.

Yes, NUMA hinting might similarly affect this (even when the page is not migrated).
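
For reference, Solution 1 can be written as a small self-contained helper. This is only a sketch: the name populate_and_lock() is hypothetical, the fallback value for MADV_POPULATE_WRITE is an assumption to check against your kernel headers, and the helper leaves ownership of the buffer with the caller instead of calling free().

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Older libc headers may not define this; 23 is assumed to match the
 * Linux uapi value. Verify against your kernel headers. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: prefault [buf, buf + size) as writable, then lock it
 * into RAM. buf must be page-aligned (madvise() requires that).
 * Returns 0 on success, -1 with errno set by the failing call.
 * The mapping is neither freed nor unmapped here.
 */
int populate_and_lock(void *buf, size_t size)
{
	if (madvise(buf, size, MADV_POPULATE_WRITE) != 0)
		return -1;
	if (mlock(buf, size) != 0)
		return -1;
	return 0;
}
```

As reported above, this keeps the pages resident but does not stop page migration or NUMA hinting from temporarily invalidating the page table entries, so it is not sufficient on its own.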

> 
> Solution 2: Kernel-space pin via IOCTL
> 
> ret = pin_user_pages_fast(cur_base, npages, FOLL_LONGTERM, page_list);
> 
> Result: Page faults occurred occasionally, traced to NUMA balancing marking pages as invalid.

Ah, there you talk about NUMA balancing.

> 
> To solve the problem, we used FOLL_LONGTERM | FOLL_HONOR_NUMA_FAULT to pin user pages.
> 

See prot_numa_skip(): we skip DMA-pinned folios in COW mappings only. So if you had a !COW mapping (e.g., MAP_SHARED shmem), that wouldn't work reliably, I think.

I think we could change that without causing too much harm.

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 113b489858341..17809c8604f25 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -137,8 +137,11 @@ static bool prot_numa_skip(struct vm_area_struct *vma, unsigned long addr,
                 goto skip;
  
         /* Also skip shared copy-on-write pages */
-       if (is_cow_mapping(vma->vm_flags) &&
-           (folio_maybe_dma_pinned(folio) || folio_maybe_mapped_shared(folio)))
+       if (is_cow_mapping(vma->vm_flags) && folio_maybe_mapped_shared(folio))
+               goto skip;
+
+       /* Folios that are pinned and cannot be migrated either way. */
+       if (folio_maybe_dma_pinned(folio))
                 goto skip;
  
         /*
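
To make the effect of that change concrete, here is a small user-space model of just this one check. It is a sketch, not kernel code: struct folio_model and both function names are hypothetical, the three booleans stand in for is_cow_mapping(), folio_maybe_dma_pinned() and folio_maybe_mapped_shared(), and the other conditions in prot_numa_skip() are ignored.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the check looks at. */
struct folio_model {
	bool cow_mapping;         /* is_cow_mapping(vma->vm_flags) */
	bool maybe_dma_pinned;    /* folio_maybe_dma_pinned(folio) */
	bool maybe_mapped_shared; /* folio_maybe_mapped_shared(folio) */
};

/* Current check: pinned folios are skipped only in COW mappings. */
bool skip_before_patch(const struct folio_model *f)
{
	return f->cow_mapping &&
	       (f->maybe_dma_pinned || f->maybe_mapped_shared);
}

/* Proposed check: pinned folios are skipped in any mapping. */
bool skip_after_patch(const struct folio_model *f)
{
	if (f->cow_mapping && f->maybe_mapped_shared)
		return true;
	return f->maybe_dma_pinned;
}
```

For a pinned folio in a !COW mapping (cow_mapping false, maybe_dma_pinned true), skip_before_patch() returns false, so NUMA hinting would still touch the mapping, while skip_after_patch() returns true.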


> This approach has been tested and successfully prevents IO page faults so far.
> 
> We would like guidance from the community:
> 
> Can this approach reliably prevent all IO page faults?

See the case above regarding non-cow mappings.

We essentially need to make sure that we don't (temporarily) unmap for migration/reclaim/split/whatever if a folio may be pinned.

We back out in all cases (unexpected reference), but we'll have to sanity-check whether we reject maybe-pinned folios early enough to avoid temporarily unmapping them.

> 
> Is there a better or recommended method to pin user pages for SVA?

Most use cases use longterm pinning to then configure the IOMMU manually. Then, it does not really matter what happens to your process page tables.

So your use case is rather new :)

But yes, a longterm pinning while resolving NUMA-hinting faults should in theory work.

We just have to make sure that everybody else plays nice early with dma-pinned folios.

--
Cheers

David / dhildenb


