From: Pantelis Antoniou <p.antoniou@partner.samsung.com>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
<mm-commits@vger.kernel.org>, <wade.farnsworth@siemens.com>,
<jhubbard@nvidia.com>, <jgg@ziepe.ca>, <david@redhat.com>,
<c.briere@samsung.com>, <artem.k@samsung.com>,
David Howells <dhowells@redhat.com>
Subject: Re: + fix-zero-copy-i-o-on-__get_user_pages-allocated-pages.patch added to mm-hotfixes-unstable branch
Date: Thu, 8 May 2025 17:36:12 +0300 [thread overview]
Message-ID: <20250508173612.34d1bea3@sarc.samsung.com> (raw)
In-Reply-To: <aBy8v7gi3tsQ5-iy@x1.local>
On Thu, 8 May 2025 10:16:31 -0400
Peter Xu <peterx@redhat.com> wrote:
Hi Peter,
> Hi, Pantelis, [Cc David Howells] On Wed, May 07, 2025 at 02: 55: 54PM
> -0700, Andrew Morton wrote: > > The patch titled > Subject: Fix zero
> copy I/O on __get_user_pages allocated pages > has been added to the
> -mm mm-hotfixes-unstable
> Hi, Pantelis,
>
> [Cc David Howells]
>
> On Wed, May 07, 2025 at 02:55:54PM -0700, Andrew Morton wrote:
> >
> > The patch titled
> > Subject: Fix zero copy I/O on __get_user_pages allocated pages
> > has been added to the -mm mm-hotfixes-unstable branch. Its
> > filename is
> > fix-zero-copy-i-o-on-__get_user_pages-allocated-pages.patch
> >
> > This patch will shortly appear at
> > https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/fix-zero-copy-i-o-on-__get_user_pages-allocated-pages.patch__;!!KUh5zVML9r9m!2UOP9aM2VFq6hYqCdCsuJWGKqQ36OHuy8fOXVwFXktF6e9uH-2METAUSLAFHOPpOplI8gbkk7l6UAmauPPQ$
> >
> > This patch will later appear in the mm-hotfixes-unstable branch at
> > git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> >
> > Before you just go and hit "reply", please:
> > a) Consider who else should be cc'ed
> > b) Prefer to cc a suitable mailing list as well
> > c) Ideally: find the original patch on the mailing list and do a
> > reply-to-all to that, adding suitable additional cc's
> >
> > *** Remember to use Documentation/process/submit-checklist.rst when
> > testing your code ***
> >
> > The -mm tree is included into linux-next via the mm-everything
> > branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > and is updated there every 2-3 working days
> >
> > ------------------------------------------------------
> > From: Pantelis Antoniou <p.antoniou@partner.samsung.com>
> > Subject: Fix zero copy I/O on __get_user_pages allocated pages
> > Date: Wed, 7 May 2025 10:41:05 -0500
> >
> > Recent updates to net filesystems enabled zero copy operations,
> > which require getting a user space page pinned.
> >
> > This does not work for pages that were allocated via
> > __get_user_pages and then mapped to user-space via remap_pfn_rage.
> >
> > remap_pfn_range_internal() will turn on VM_IO | VM_PFNMAP vma bits.
> > VM_PFNMAP in particular mark the pages as not having struct_page
> > associated with them, which is not the case for __get_user_pages()
> >
> > This in turn makes any attempt to lock a page fail, and breaking
> > I/O from that address range.
> >
> > This patch address it by special casing pages in those VMAs and not
> > calling vm_normal_page() for them.
> >
> > Link:
> > https://urldefense.com/v3/__https://lkml.kernel.org/r/20250507154105.763088-2-p.antoniou@partner.samsung.com__;!!KUh5zVML9r9m!2UOP9aM2VFq6hYqCdCsuJWGKqQ36OHuy8fOXVwFXktF6e9uH-2METAUSLAFHOPpOplI8gbkk7l6UcsZY8XI$
> > Signed-off-by: Pantelis Antoniou <p.antoniou@partner.samsung.com>
> > Cc: Artem Krupotkin <artem.k@samsung.com> Cc: Charles Briere
> > <c.briere@samsung.com> Cc: Wade Farnsworth
> > <wade.farnsworth@siemens.com> Cc: David Hildenbrand
> > <david@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > ---
> >
> > mm/gup.c | 22 ++++++++++++++++++----
> > 1 file changed, 18 insertions(+), 4 deletions(-)
> >
> > ---
> a/mm/gup.c~fix-zero-copy-i-o-on-__get_user_pages-allocated-pages
> > +++ a/mm/gup.c
> > @@ -833,6 +833,20 @@ static inline bool can_follow_write_pte(
> > return !userfaultfd_pte_wp(vma, pte);
> > }
> >
> > +static struct page *gup_normal_page(struct vm_area_struct *vma,
> > + unsigned long address, pte_t pte)
> > +{
> > + unsigned long pfn;
> > +
> > + if (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)) {
> > + pfn = pte_pfn(pte);
> > + if (!pfn_valid(pfn) || is_zero_pfn(pfn) || pfn >
> > highest_memmap_pfn)
> > + return NULL;
> > + return pfn_to_page(pfn);
> > + }
> > + return vm_normal_page(vma, address, pte);
> > +}
> > +
> > static struct page *follow_page_pte(struct vm_area_struct *vma,
> > unsigned long address, pmd_t *pmd, unsigned int
> > flags, struct dev_pagemap **pgmap)
> > @@ -858,7 +872,9 @@ static struct page *follow_page_pte(stru
> > if (pte_protnone(pte) && !gup_can_follow_protnone(vma,
> > flags)) goto no_page;
> >
> > - page = vm_normal_page(vma, address, pte);
> > + page = gup_normal_page(vma, address, pte);
> > + if (page && (vma->vm_flags & (VM_MIXEDMAP | VM_PFNMAP)))
> > + (void)follow_pfn_pte(vma, address, ptep, flags);
> >
> > /*
> > * We only care about anon pages in can_follow_write_pte()
> > and don't @@ -1130,7 +1146,7 @@ static int get_gate_page(struct
> > mm_struc *vma = get_gate_vma(mm);
> > if (!page)
> > goto out;
> > - *page = vm_normal_page(*vma, address, entry);
> > + *page = gup_normal_page(*vma, address, entry);
>
> Is this really needed? IIUC the iter code would only use in either
> UBUF or IOVEC ones.
>
I think you're right, for our platforms the gate check never passes.
However using the same gup_normal_page() method could be clearer in this
context.
> > if (!*page) {
> > if ((gup_flags & FOLL_DUMP) ||
> > !is_zero_pfn(pte_pfn(entry))) goto unmap;
> > @@ -1271,8 +1287,6 @@ static int check_vma_flags(struct vm_are
> > int foreign = (gup_flags & FOLL_REMOTE);
> > bool vma_anon = vma_is_anonymous(vma);
> >
> > - if (vm_flags & (VM_IO | VM_PFNMAP))
> > - return -EFAULT;
>
> Is there's any justification that this won't break some existing GUP
> users that may rely on properly failing at pfnmaps?
>
> IIUC netfs isn't the first one that wants to GUP on top of pfnmaps,
> KVM does it for years and so far it was processed in a standalone
> path:
>
> hva_to_pfn:
> else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
> r = hva_to_pfn_remapped(vma, kfp, &pfn);
>
> That started with supporting real pfnmaps (with no page struct), but
> pfnmap with page structs can also happen afaict, and kvm processes
> that too by checking page==NULL ultimately, e.g. in
> kvm_release_faultin_page().
>
I see. The problem is that we're not the owners of the code in netfslib,
and it is considerably more intrusive to fix things there.
This is a hotfix for a userspace regression. I sort of agree that having
different handling for these areas in netfslib would be ideal.
Or perhaps changing semantics by having an extra VM_* bit that would
mark that VMA as actually having a backing page struct. Dunno, things
could get considerably complex fast.
> The other thing is above only processed pte level of pfnmap, and just
> to mention pmd/pud may need attention too because we're gradually
> supporting huge mappings even for pfns. I didn't check whether it's
> possible as of now, though. Maybe it's not an immediate concern.
>
You are absolutely right, eventually it will be a concern in the future.
> In general, I'm uncertain about whether this is the right way to go so
> far. To me it might be less intrusive if we follow what kvm does for
> now, or maybe we also at least want to enrich the justification part
> in the commit log.
>
Again, this as a hotfix. An actual fix might be something that address
both KVM and netfslib concerns, but that would be something much
larger than a 20 line patch.
> >
> > if ((gup_flags & FOLL_ANON) && !vma_anon)
> > return -EFAULT;
> > _
> >
> > Patches currently in -mm which might be from
> > p.antoniou@partner.samsung.com are
> >
> > fix-zero-copy-i-o-on-__get_user_pages-allocated-pages.patch
> >
>
Regards
-- Pantelis
next prev parent reply other threads:[~2025-05-08 14:36 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-07 21:55 + fix-zero-copy-i-o-on-__get_user_pages-allocated-pages.patch added to mm-hotfixes-unstable branch Andrew Morton
2025-05-08 14:16 ` Peter Xu
2025-05-08 14:36 ` Pantelis Antoniou [this message]
2025-05-08 15:08 ` Peter Xu
2025-05-08 15:10 ` David Hildenbrand
2025-05-08 15:27 ` Pantelis Antoniou
2025-05-08 15:40 ` David Hildenbrand
2025-05-08 15:48 ` Pantelis Antoniou
2025-05-08 16:25 ` Pantelis Antoniou
2025-05-08 17:35 ` Jason Gunthorpe
2025-05-08 17:47 ` Pantelis Antoniou
2025-05-08 18:01 ` Jason Gunthorpe
2025-05-08 18:02 ` David Hildenbrand
2025-05-08 18:11 ` Pantelis Antoniou
2025-05-08 18:26 ` David Hildenbrand
2025-05-08 18:47 ` Peter Xu
2025-05-08 19:04 ` David Hildenbrand
2025-05-08 19:06 ` Jason Gunthorpe
2025-05-08 19:08 ` Peter Xu
2025-05-08 19:12 ` Jason Gunthorpe
2025-05-08 19:16 ` David Hildenbrand
2025-05-08 19:39 ` Peter Xu
2025-05-08 19:14 ` David Hildenbrand
2025-05-08 19:19 ` Jason Gunthorpe
2025-05-08 19:34 ` David Hildenbrand
2025-05-09 16:30 ` Pantelis Antoniou
2025-05-09 17:11 ` John Hubbard
2025-05-09 17:33 ` Jason Gunthorpe
2025-05-09 17:50 ` Pantelis Antoniou
2025-05-09 18:39 ` Jason Gunthorpe
2025-05-08 19:11 ` Jason Gunthorpe
2025-05-08 15:17 ` Pantelis Antoniou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250508173612.34d1bea3@sarc.samsung.com \
--to=p.antoniou@partner.samsung.com \
--cc=akpm@linux-foundation.org \
--cc=artem.k@samsung.com \
--cc=c.briere@samsung.com \
--cc=david@redhat.com \
--cc=dhowells@redhat.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=mm-commits@vger.kernel.org \
--cc=peterx@redhat.com \
--cc=wade.farnsworth@siemens.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.