From: Jason Gunthorpe <jgg@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: David Hildenbrand <david@redhat.com>,
Jinjiang Tu <tujinjiang@huawei.com>,
akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
wangkefeng.wang@huawei.com
Subject: Re: [PATCH] mm: fix COW mapping handing in generic_access_phys
Date: Wed, 28 May 2025 14:34:17 -0300 [thread overview]
Message-ID: <20250528173417.GZ61950@nvidia.com> (raw)
In-Reply-To: <aDdEkhwDS2YMu9OV@x1.local>
On Wed, May 28, 2025 at 01:14:58PM -0400, Peter Xu wrote:
> On Wed, May 28, 2025 at 01:29:15PM -0300, Jason Gunthorpe wrote:
> > On Wed, May 28, 2025 at 12:06:07PM -0400, Peter Xu wrote:
> > > #define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */
> > >
> > > I'm not confident to blame any driver yet to have those special cases for
> > > VM_PFNMAP, because it only says "managed without struct page", it didn't
> > > say "it must not contain struct page".. Hence it hints the core mm "please
> > > do not manage these mappings with struct page at all". Still sounds fair
> > > contract, even if not ideal.
> >
> > I think it is pretty clear, if a VMA has VM_PFNMAP then nothing must
> > ever try to obtain a struct page from any PTEs in it, for any reason,
> > even if things in it might have a struct page. In practice it means
> > nothing can call vm_normal_page() on a VM_PFNMAP.
> >
> > It would be nice to update the comment to make it clearer.
>
> Yes that would help. Maybe the hard part is making sure how it is
> documented will be how it is used..
>
> >
> > If the VMA owner wanted to permit access to the struct page then it
> > should have used VM_MIXEDMAP.
> >
> > The fundamental difference between PFNMAP and MIXEDMAP is that
> > vm_normal_page() is allowed on MIXEDMAP. That comes with some extra
> > rules and restrictions to support arches without the special pte bit.
>
> If in the ideal world where VM_PFNMAP has a stricter semantics, it sounds
> fair to disable vm_normal_page() on top of VM_PFNMAP, yes.
>
> >
> > VM_IO | VM_PFNMAP further means that all the pfns in the VMA require
> > the use of io accessors (writel/readl) to access them.
>
> Hmm.. I'm not 100% sure on this one. E.g., vDSO is VM_IO now but it's
> definitely accessible that got mapped into userspace.
That seems wrong...
It says it in the comment clearly:
#define VM_IO 0x00004000 /* Memory mapped I/O or similar */
"memory mapped i/o" is exactly __iomem.
At least this VDSO is kind of weird:
vma = _install_special_mapping(mm,
VDSO_VCLOCK_PAGES_START(addr),
VDSO_NR_VCLOCK_PAGES * PAGE_SIZE,
VM_READ|VM_MAYREAD|VM_IO|VM_DONTDUMP|
VM_PFNMAP|VM_SEALED_SYSMAP,
&vvar_vclock_mapping);
Because the things it puts into the VMA are not actually known to be
MMIO, they are both special hypervisor clock pages:
struct pvclock_vsyscall_time_info *pvti =
pvclock_get_pvti_cpu0_va();
unsigned long pfn = hv_get_tsc_pfn();
And they are *probably* ddr, but also x86 doesn't care about VM_IO, so
it doesn't matter if it is wrong.
I wonder if the vdso_install_vvar_mapping() one was blidly copied from x86:
pfn = __phys_to_pfn(__pa_symbol(vdso_k_time_data));
pfn = __phys_to_pfn(__pa_symbol(vdso_k_time_data));
pfn = __phys_to_pfn(__pa_symbol(vdso_k_rng_data));
pfn = __phys_to_pfn(__pa_symbol(vdso_k_arch_data)) +
Because __pa_symbol is definately not MMIO and should not have VM_IO.
> But I confess I at least don't know why VM_IO existed, considering there're
> also VM_*MAP and VM_DONTDUMP.
I've assumed it was for the various debugger/dump related paths to
prevent access to the memory and system crash. Some environments
cannot touch MMIO addresses without using readl/writel.
> > No idea what VM_IO | VM_MIXEDMAP is supposed to mean. Only the special
> > ptes need io accessors?
> >
> > In either case GUP doesn't really work on the VMA. PFNMAP is totally
> > blocked, and for MIXEDMAP userspace has no way to discover which
> > subset of the VMA is GUPable. I think that GUP is supported on
> > MIXEDMAP at all is a bit of a weirdo thing.
>
> Does it imply that in the ideal case one should use follow_pfnmap_start()
> for MIXEDMAP?
No
> I don't have a strong feeling yet on how GUP should treat MIXEDMAP, either
> (1) fail MIXEDMAP like you said, falling back to follow_pfnmap_start(), or
> (2) allow MIXEDMAP only on page-backed mappings, then fallback to
> follow_pfnmap_start() on non-page-backed mappings only.
GUP should follow the rules, it must use vm_normal_page() on each PTE
and if and only if a struct page is returned then it can be refcounted
and returned by GUP.
GUP should never ignore the special bit and convert a special PTE to a
struct page. The very definition of the special bit is that you cannot
do this.
Jason
next prev parent reply other threads:[~2025-05-28 17:34 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-28 1:56 [PATCH] mm: fix COW mapping handing in generic_access_phys Jinjiang Tu
2025-05-28 8:59 ` David Hildenbrand
2025-05-28 9:59 ` David Hildenbrand
2025-05-28 12:14 ` Jinjiang Tu
2025-05-28 14:54 ` Peter Xu
2025-05-28 15:02 ` David Hildenbrand
2025-05-28 15:25 ` Peter Xu
2025-05-28 15:29 ` David Hildenbrand
2025-05-28 16:06 ` Peter Xu
2025-05-28 16:29 ` Jason Gunthorpe
2025-05-28 17:14 ` Peter Xu
2025-05-28 17:34 ` Jason Gunthorpe [this message]
2025-05-28 17:37 ` David Hildenbrand
2025-05-28 17:32 ` David Hildenbrand
2025-05-28 17:47 ` Jason Gunthorpe
2025-05-28 17:59 ` Jason Gunthorpe
2025-05-28 18:03 ` David Hildenbrand
2025-05-28 18:00 ` David Hildenbrand
2025-05-28 18:15 ` Jason Gunthorpe
2025-05-28 18:22 ` David Hildenbrand
2025-05-28 18:29 ` Jason Gunthorpe
2025-05-30 10:04 ` David Hildenbrand
2025-05-28 12:13 ` Jinjiang Tu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250528173417.GZ61950@nvidia.com \
--to=jgg@nvidia.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=peterx@redhat.com \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=tujinjiang@huawei.com \
--cc=vbabka@suse.cz \
--cc=wangkefeng.wang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.