From: Vincent Donnefort <vdonnefort@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: rostedt@goodmis.org, mhiramat@kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
mathieu.desnoyers@efficios.com, kernel-team@android.com,
rdunlap@infradead.org, rppt@kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions
Date: Thu, 2 May 2024 14:38:32 +0100 [thread overview]
Message-ID: <ZjOXWLfR7GpzE8H_@google.com> (raw)
In-Reply-To: <78e20e98-bdfc-4d7b-a59c-988b81fcc58b@redhat.com>
On Thu, May 02, 2024 at 03:30:32PM +0200, David Hildenbrand wrote:
> On 30.04.24 13:13, Vincent Donnefort wrote:
> > In preparation for allowing the user-space to map a ring-buffer, add
> > a set of mapping functions:
> >
> > ring_buffer_{map,unmap}()
> >
> > And controls on the ring-buffer:
> >
> > ring_buffer_map_get_reader() /* swap reader and head */
> >
> > Mapping the ring-buffer also involves:
> >
> > A unique ID for each subbuf of the ring-buffer, currently they are
> > only identified through their in-kernel VA.
> >
> > A meta-page, where are stored ring-buffer statistics and a
> > description for the current reader
> >
> > The linear mapping exposes the meta-page, and each subbuf of the
> > ring-buffer, ordered following their unique ID, assigned during the
> > first mapping.
> >
> > Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
> > size will remain unmodified and the splice enabling functions will in
> > reality simply memcpy the data instead of swapping subbufs.
> >
> > CC: <linux-mm@kvack.org>
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> >
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index dc5ae4e96aee..96d2140b471e 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
>
> [...]
>
> > +/*
> > + * +--------------+ pgoff == 0
> > + * | meta page |
> > + * +--------------+ pgoff == 1
> > + * | subbuffer 0 |
> > + * | |
> > + * +--------------+ pgoff == (1 + (1 << subbuf_order))
> > + * | subbuffer 1 |
> > + * | |
> > + * ...
> > + */
> > +#ifdef CONFIG_MMU
> > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer,
> > + struct vm_area_struct *vma)
> > +{
> > + unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff;
> > + unsigned int subbuf_pages, subbuf_order;
> > + struct page **pages;
> > + int p = 0, s = 0;
> > + int err;
> > +
> > + /* Refuse MP_PRIVATE or writable mappings */
> > + if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC ||
> > + !(vma->vm_flags & VM_MAYSHARE))
> > + return -EPERM;
> > +
> > + /*
> > + * Make sure the mapping cannot become writable later. Also tell the VM
> > + * to not touch these pages (VM_DONTCOPY | VM_DONTEXPAND). Finally,
> > + * prevent migration, GUP and dump (VM_IO).
> > + */
> > + vm_flags_mod(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_IO, VM_MAYWRITE);
> > +
> > + lockdep_assert_held(&cpu_buffer->mapping_lock);
> > +
> > + subbuf_order = cpu_buffer->buffer->subbuf_order;
> > + subbuf_pages = 1 << subbuf_order;
> > +
> > + nr_subbufs = cpu_buffer->nr_pages + 1; /* + reader-subbuf */
> > + nr_pages = ((nr_subbufs) << subbuf_order) - pgoff + 1; /* + meta-page */
> > +
> > + vma_pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> > + if (!vma_pages || vma_pages > nr_pages)
> > + return -EINVAL;
> > +
> > + nr_pages = vma_pages;
> > +
> > + pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
> > + if (!pages)
> > + return -ENOMEM;
> > +
> > + if (!pgoff) {
> > + pages[p++] = virt_to_page(cpu_buffer->meta_page);
> > +
> > + /*
> > + * TODO: Align sub-buffers on their size, once
> > + * vm_insert_pages() supports the zero-page.
> > + */
> > + } else {
> > + /* Skip the meta-page */
> > + pgoff--;
> > +
> > + if (pgoff % subbuf_pages) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> > +
> > + s += pgoff / subbuf_pages;
> > + }
> > +
> > + while (s < nr_subbufs && p < nr_pages) {
> > + struct page *page = virt_to_page(cpu_buffer->subbuf_ids[s]);
> > + int off = 0;
> > +
> > + for (; off < (1 << (subbuf_order)); off++, page++) {
> > + if (p >= nr_pages)
> > + break;
> > +
> > + pages[p++] = page;
> > + }
> > + s++;
> > + }
> > +
> > + err = vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
>
> Nit: I did not immediately understand if we could end here with p < nr_pages
> (IOW, pages[] not completely filled).
>
> One source of confusion is the "s < nr_subbufs" check in the while loop: why
> is "p < nr_pages" insufficient?
Hum, indeed, the "s < nr_subbufs" check is superfluous, nr_pages, is already
capped by the number of subbufs, there's no way we can overflow subbuf_ids[].
>
>
> For the MM bits:
>
> Acked-by: David Hildenbrand <david@redhat.com>
Thanks a lot for having a look at the series, very much appreciated!
>
>
>
> --
> Cheers,
>
> David / dhildenb
>
next prev parent reply other threads:[~2024-05-02 13:38 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-30 11:13 [PATCH v22 0/5] Introducing trace buffer mapping by user-space Vincent Donnefort
2024-04-30 11:13 ` [PATCH v22 1/5] ring-buffer: Allocate sub-buffers with __GFP_COMP Vincent Donnefort
2024-05-02 13:18 ` David Hildenbrand
2024-04-30 11:13 ` [PATCH v22 2/5] ring-buffer: Introducing ring-buffer mapping functions Vincent Donnefort
2024-05-02 13:30 ` David Hildenbrand
2024-05-02 13:38 ` Vincent Donnefort [this message]
2024-05-02 13:46 ` Steven Rostedt
2024-05-08 2:34 ` Steven Rostedt
2024-05-09 11:05 ` Vincent Donnefort
2024-05-10 9:15 ` David Hildenbrand
2024-05-10 10:57 ` Vincent Donnefort
2024-05-10 9:19 ` David Hildenbrand
2024-05-10 11:03 ` Vincent Donnefort
2024-05-10 18:42 ` Steven Rostedt
2024-04-30 11:13 ` [PATCH v22 3/5] tracing: Allow user-space mapping of the ring-buffer Vincent Donnefort
2024-04-30 11:13 ` [PATCH v22 4/5] Documentation: tracing: Add ring-buffer mapping Vincent Donnefort
2024-04-30 11:13 ` [PATCH v22 5/5] ring-buffer/selftest: Add ring-buffer mapping test Vincent Donnefort
2024-05-03 19:12 ` Shuah Khan
2024-05-07 23:35 ` Steven Rostedt
2024-05-10 11:04 ` Vincent Donnefort
2024-05-10 18:44 ` Steven Rostedt
2024-05-10 18:50 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZjOXWLfR7GpzE8H_@google.com \
--to=vdonnefort@google.com \
--cc=david@redhat.com \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=rdunlap@infradead.org \
--cc=rostedt@goodmis.org \
--cc=rppt@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.