From: Vincent Donnefort <vdonnefort@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: rostedt@goodmis.org, mhiramat@kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
mathieu.desnoyers@efficios.com, kernel-team@android.com,
rdunlap@infradead.org, rppt@kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v21 2/5] ring-buffer: Introducing ring-buffer mapping functions
Date: Thu, 25 Apr 2024 17:42:46 +0100 [thread overview]
Message-ID: <ZiqIBo4sX-oAgj2M@google.com> (raw)
In-Reply-To: <04137a08-8918-422c-8512-beb2074a427e@redhat.com>
On Wed, Apr 24, 2024 at 10:55:54PM +0200, David Hildenbrand wrote:
> On 24.04.24 22:31, Vincent Donnefort wrote:
> > Hi David,
> >
> > Thanks for your quick response.
> >
> > On Wed, Apr 24, 2024 at 05:26:39PM +0200, David Hildenbrand wrote:
> > >
> > > I gave it some more thought, and I think we are still missing something (I
> > > wish PFNMAP/MIXEDMAP wouldn't be that hard).
> > >
> > > > +
> > > > +/*
> > > > + * +--------------+ pgoff == 0
> > > > + * | meta page |
> > > > + * +--------------+ pgoff == 1
> > > > + * | subbuffer 0 |
> > > > + * | |
> > > > + * +--------------+ pgoff == (1 + (1 << subbuf_order))
> > > > + * | subbuffer 1 |
> > > > + * | |
> > > > + * ...
> > > > + */
> > > > +#ifdef CONFIG_MMU
> > > > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
> > > > + struct vm_area_struct *vma)
> > > > +{
> > > > + unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff;
> > > > + unsigned int subbuf_pages, subbuf_order;
> > > > + struct page **pages;
> > > > + int p = 0, s = 0;
> > > > + int err;
> > > > +
> > >
> > > I'd add some comments here like
> > >
> > > /* Refuse any MAP_PRIVATE or writable mappings. */
> > > > + if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
> > > > + !(vma->vm_flags & VM_MAYSHARE))
> > > > + return -EPERM;
> > > > +
> > >
> > > /*
> > > * Make sure the mapping cannot become writable later. Also, tell the VM
> > > * to not touch these pages pages (VM_DONTCOPY | VM_DONTDUMP) and tell
> > > * GUP to leave them alone as well (VM_IO).
> > > */
> > > > + vm_flags_mod(vma,
> > > > + VM_MIXEDMAP | VM_PFNMAP |
> > > > + VM_DONTCOPY | VM_DONTDUMP | VM_DONTEXPAND | VM_IO,
> > > > + VM_MAYWRITE);
> > >
> > > I am still really unsure about VM_PFNMAP ... it's not a PFNMAP at all and,
> > > as stated, vm_insert_pages() even complains quite a lot when it would have
> > > to set VM_MIXEDMAP and VM_PFNMAP is already set, likely for a very good
> > > reason.
> > >
> > > Can't we limit ourselves to VM_IO?
> > >
> > > But then, I wonder if it really helps much regarding GUP: yes, it blocks
> > > ordinary GUP (see check_vma_flags()) but as insert_page_into_pte_locked()
> > > does *not* set pte_special(), GUP-fast (gup_fast_pte_range()) will not
> > > reject it.
> > >
> > > Really, if you want GUP-fast to reject it, remap_pfn_range() and friends are
> > > the way to go, that will set pte_special() such that also GUP-fast will
> > > leave it alone, just like vm_normal_page() would.
> > >
> > > So ... I know Linus recommended VM_PFNMAP/VM_IO to stop GUP, but it alone
> > > won't stop all of GUP. We really have to mark the PTE as special, which
> > > vm_insert_page() must not do (because it is refcounted!).
> >
> > Hum, apologies, I am not sure to follow the connection here. Why do you think
> > the recommendation was to prevent GUP?
>
> Ah, I'm hallucinating! :) "not let people play games with the mapping" to me
> implied "make sure nobody touches it". If GUP is acceptable that makes stuff
> a lot easier. VM_IO will block some GUP, but not all of it.
>
> >
> > >
> > > Which means: do we really have to stop GUP from grabbing that page?
> > >
> > > Using vm_insert_page() only with VM_MIXEDMAP (and without VM_PFNMAP|VM_IO)
> > > would be better.
> >
> > Under the assumption we do not want to stop all GUP, why not using VM_IO over
> > VM_MIXEDMAP which is I believe more restrictive?
>
> VM_MIXEDMAP will be implicitly set by vm_insert_page(). There is a lengthy comment
> for vm_normal_page() that explains all this madness. VM_MIXEDMAP is primarily
> relevant for COW mappings, which you just forbid completely.
>
> remap_pfn_range_notrack() documents the semantics of some of the other flags:
>
> * VM_IO tells people not to look at these pages
> * (accesses can have side effects).
> * VM_PFNMAP tells the core MM that the base pages are just
> * raw PFN mappings, and do not have a "struct page" associated
> * with them.
> * VM_DONTEXPAND
> * Disable vma merging and expanding with mremap().
> * VM_DONTDUMP
> * Omit vma from core dump, even when VM_IO turned off.
>
> VM_PFNMAP is very likely really not what we want, unless we really perform raw
> PFN mappings ... VM_IO we can set without doing much harm.
>
> So I would suggest dropping VM_PFNMAP when using vm_insert_pages(), using only VM_IO
> and likely just letting vm_insert_pages() set VM_MIXEDMAP for you.
Sounds good, I will do that in v22.
>
> [...]
>
> > >
> > > vm_insert_pages() documents: "In case of error, we may have mapped a subset
> > > of the provided pages. It is the caller's responsibility to account for this
> > > case."
> > >
> > > Which could for example happen, when allocating a page table fails.
> > >
> > > Would we able to deal with that here?
> >
> > As we are in the mmap path, on an error, I would expect the vma to be destroyed
> > and those pages whom insertion succeeded to be unmapped?
> >
>
> Ah, we simply fail ->mmap().
>
> In mmap_region(), if call_mmap() failed, we "goto unmap_and_free_vma" where we have
>
> /* Undo any partial mapping done by a device driver. */
> unmap_region(mm, &vmi.mas, vma, prev, next, vma->vm_start, vma->vm_end, vma->vm_end, true);
>
>
> > But perhaps shall we proactively zap_page_range_single()?
>
> No mmap_region() should indeed be handling it correctly already!
Ok, thanks for confirming!
>
> --
> Cheers,
>
> David / dhildenb
>
next prev parent reply other threads:[~2024-04-25 16:42 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-23 23:27 [PATCH v21 0/5] Introducing trace buffer mapping by user-space Vincent Donnefort
2024-04-23 23:27 ` [PATCH v21 1/5] ring-buffer: Allocate sub-buffers with __GFP_COMP Vincent Donnefort
2024-04-23 23:27 ` [PATCH v21 2/5] ring-buffer: Introducing ring-buffer mapping functions Vincent Donnefort
2024-04-24 15:26 ` David Hildenbrand
2024-04-24 20:31 ` Vincent Donnefort
2024-04-24 20:55 ` David Hildenbrand
2024-04-25 16:42 ` Vincent Donnefort [this message]
2024-04-23 23:27 ` [PATCH v21 3/5] tracing: Allow user-space mapping of the ring-buffer Vincent Donnefort
2024-04-23 23:27 ` [PATCH v21 4/5] Documentation: tracing: Add ring-buffer mapping Vincent Donnefort
2024-04-23 23:27 ` [PATCH v21 5/5] ring-buffer/selftest: Add ring-buffer mapping test Vincent Donnefort
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZiqIBo4sX-oAgj2M@google.com \
--to=vdonnefort@google.com \
--cc=david@redhat.com \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=rdunlap@infradead.org \
--cc=rostedt@goodmis.org \
--cc=rppt@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.