From: Mike Rapoport <rppt@kernel.org>
To: Vincent Donnefort <vdonnefort@google.com>
Cc: rostedt@goodmis.org, mhiramat@kernel.org,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
mathieu.desnoyers@efficios.com, kernel-team@android.com,
rdunlap@infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions
Date: Thu, 18 Apr 2024 09:55:55 +0300 [thread overview]
Message-ID: <ZiDD-wdGIjOBcQ2U@kernel.org> (raw)
In-Reply-To: <20240406173649.3210836-3-vdonnefort@google.com>
On Sat, Apr 06, 2024 at 06:36:46PM +0100, Vincent Donnefort wrote:
> In preparation for allowing the user-space to map a ring-buffer, add
> a set of mapping functions:
>
> ring_buffer_{map,unmap}()
>
> And controls on the ring-buffer:
>
> ring_buffer_map_get_reader() /* swap reader and head */
>
> Mapping the ring-buffer also involves:
>
> A unique ID for each subbuf of the ring-buffer, currently they are
> only identified through their in-kernel VA.
>
> A meta-page, where are stored ring-buffer statistics and a
> description for the current reader
>
> The linear mapping exposes the meta-page, and each subbuf of the
> ring-buffer, ordered following their unique ID, assigned during the
> first mapping.
>
> Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
> size will remain unmodified and the splice enabling functions will in
> reality simply memcpy the data instead of swapping subbufs.
>
> CC: <linux-mm@kvack.org>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
>
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index dc5ae4e96aee..96d2140b471e 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -6,6 +6,8 @@
> #include <linux/seq_file.h>
> #include <linux/poll.h>
>
> +#include <uapi/linux/trace_mmap.h>
> +
> struct trace_buffer;
> struct ring_buffer_iter;
>
> @@ -223,4 +225,8 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
> #define trace_rb_cpu_prepare NULL
> #endif
>
> +int ring_buffer_map(struct trace_buffer *buffer, int cpu,
> + struct vm_area_struct *vma);
> +int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
> +int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
> #endif /* _LINUX_RING_BUFFER_H */
> diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
> new file mode 100644
> index 000000000000..ffcd8dfcaa4f
> --- /dev/null
> +++ b/include/uapi/linux/trace_mmap.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _TRACE_MMAP_H_
> +#define _TRACE_MMAP_H_
> +
> +#include <linux/types.h>
> +
> +/**
> + * struct trace_buffer_meta - Ring-buffer Meta-page description
> + * @meta_page_size: Size of this meta-page.
> + * @meta_struct_len: Size of this structure.
> + * @subbuf_size: Size of each sub-buffer.
> + * @nr_subbufs: Number of subbfs in the ring-buffer, including the reader.
> + * @reader.lost_events: Number of events lost at the time of the reader swap.
> + * @reader.id: subbuf ID of the current reader. ID range [0 : @nr_subbufs - 1]
> + * @reader.read: Number of bytes read on the reader subbuf.
> + * @flags: Placeholder for now, 0 until new features are supported.
> + * @entries: Number of entries in the ring-buffer.
> + * @overrun: Number of entries lost in the ring-buffer.
> + * @read: Number of entries that have been read.
> + * @Reserved1: Reserved for future use.
> + * @Reserved2: Reserved for future use.
> + */
> +struct trace_buffer_meta {
> + __u32 meta_page_size;
> + __u32 meta_struct_len;
> +
> + __u32 subbuf_size;
> + __u32 nr_subbufs;
> +
> + struct {
> + __u64 lost_events;
> + __u32 id;
> + __u32 read;
> + } reader;
> +
> + __u64 flags;
> +
> + __u64 entries;
> + __u64 overrun;
> + __u64 read;
> +
> + __u64 Reserved1;
> + __u64 Reserved2;
Why do you need reserved fields? This structure always resides in the
beginning of a page and the rest of the page is essentially "reserved".
> +};
> +
> +#endif /* _TRACE_MMAP_H_ */
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index cc9ebe593571..793ecc454039 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
...
> +static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu *cpu_buffer,
> + unsigned long *subbuf_ids)
> +{
> + struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> + unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
> + struct buffer_page *first_subbuf, *subbuf;
> + int id = 0;
> +
> + subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page;
> + cpu_buffer->reader_page->id = id++;
> +
> + first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
> + do {
> + if (WARN_ON(id >= nr_subbufs))
> + break;
> +
> + subbuf_ids[id] = (unsigned long)subbuf->page;
> + subbuf->id = id;
> +
> + rb_inc_page(&subbuf);
> + id++;
> + } while (subbuf != first_subbuf);
> +
> + /* install subbuf ID to kern VA translation */
> + cpu_buffer->subbuf_ids = subbuf_ids;
> +
> + /* __rb_map_vma() pads the meta-page to align it with the sub-buffers */
> + meta->meta_page_size = PAGE_SIZE << cpu_buffer->buffer->subbuf_order;
Isn't this a single page?
> + meta->meta_struct_len = sizeof(*meta);
> + meta->nr_subbufs = nr_subbufs;
> + meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
> +
> + rb_update_meta_page(cpu_buffer);
> +}
...
> +#define subbuf_page(off, start) \
> + virt_to_page((void *)((start) + ((off) << PAGE_SHIFT)))
> +
> +#define foreach_subbuf_page(sub_order, start, page) \
Nit: usually iterators in kernel use for_each
> + page = subbuf_page(0, (start)); \
> + for (int __off = 0; __off < (1 << (sub_order)); \
> + __off++, page = subbuf_page(__off, (start)))
The pages are allocated with alloc_pages_node(.. subbuf_order) are
physically contiguous and struct pages for them are also contiguous, so
inside a subbuf_order allocation you can just do page++.
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2024-04-18 6:57 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240406173649.3210836-1-vdonnefort@google.com>
2024-04-06 17:36 ` [PATCH v20 2/5] ring-buffer: Introducing ring-buffer mapping functions Vincent Donnefort
2024-04-10 17:43 ` Steven Rostedt
2024-04-23 16:04 ` Liam R. Howlett
2024-04-28 21:24 ` Steven Rostedt
2024-04-18 6:55 ` Mike Rapoport [this message]
2024-04-19 3:43 ` Steven Rostedt
2024-04-22 18:01 ` Vincent Donnefort
2024-04-19 18:25 ` David Hildenbrand
2024-04-22 9:27 ` David Hildenbrand
2024-04-22 18:20 ` Vincent Donnefort
2024-04-22 18:27 ` David Hildenbrand
2024-04-22 20:31 ` Vincent Donnefort
2024-04-23 15:18 ` David Hildenbrand
2024-04-06 17:36 ` [PATCH v20 3/5] tracing: Allow user-space mapping of the ring-buffer Vincent Donnefort
[not found] ` <20240410135612.5dc362e3@gandalf.local.home>
2024-04-17 4:55 ` [PATCH v20 0/5] Introducing trace buffer mapping by user-space Mike Rapoport
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZiDD-wdGIjOBcQ2U@kernel.org \
--to=rppt@kernel.org \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=rdunlap@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vdonnefort@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).