From: Vincent Donnefort <vdonnefort@google.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: rostedt@goodmis.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org,
mathieu.desnoyers@efficios.com, kernel-team@android.com
Subject: Re: [PATCH v11 2/5] ring-buffer: Introducing ring-buffer mapping functions
Date: Mon, 15 Jan 2024 15:37:31 +0000 [thread overview]
Message-ID: <ZaVRO76JxaHjPwCi@google.com> (raw)
In-Reply-To: <20240115134303.1a673e37b8e7d35a33d8df52@kernel.org>
On Mon, Jan 15, 2024 at 01:43:03PM +0900, Masami Hiramatsu wrote:
> On Thu, 11 Jan 2024 16:17:09 +0000
> Vincent Donnefort <vdonnefort@google.com> wrote:
>
> > In preparation for allowing the user-space to map a ring-buffer, add
> > a set of mapping functions:
> >
> > ring_buffer_{map,unmap}()
> > ring_buffer_map_fault()
> >
> > And controls on the ring-buffer:
> >
> > ring_buffer_map_get_reader() /* swap reader and head */
> >
> > Mapping the ring-buffer also involves:
> >
> > A unique ID for each subbuf of the ring-buffer, currently they are
> > only identified through their in-kernel VA.
> >
> > A meta-page, where are stored ring-buffer statistics and a
> > description for the current reader
> >
> > The linear mapping exposes the meta-page, and each subbuf of the
> > ring-buffer, ordered following their unique ID, assigned during the
> > first mapping.
> >
> > Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
> > size will remain unmodified and the splice enabling functions will in
> > reality simply memcpy the data instead of swapping subbufs.
> >
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> >
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index fa802db216f9..0841ba8bab14 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
> > @@ -6,6 +6,8 @@
> > #include <linux/seq_file.h>
> > #include <linux/poll.h>
> >
> > +#include <uapi/linux/trace_mmap.h>
> > +
> > struct trace_buffer;
> > struct ring_buffer_iter;
> >
> > @@ -221,4 +223,9 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
> > #define trace_rb_cpu_prepare NULL
> > #endif
> >
> > +int ring_buffer_map(struct trace_buffer *buffer, int cpu);
> > +int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
> > +struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu,
> > + unsigned long pgoff);
> > +int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
> > #endif /* _LINUX_RING_BUFFER_H */
> > diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
> > new file mode 100644
> > index 000000000000..bde39a73ce65
> > --- /dev/null
> > +++ b/include/uapi/linux/trace_mmap.h
> > @@ -0,0 +1,45 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +#ifndef _TRACE_MMAP_H_
> > +#define _TRACE_MMAP_H_
> > +
> > +#include <linux/types.h>
> > +
> > +/**
> > + * struct trace_buffer_meta - Ring-buffer Meta-page description
> > + * @entries: Number of entries in the ring-buffer.
> > + * @overrun: Number of entries lost in the ring-buffer.
> > + * @read: Number of entries that have been read.
> > + * @subbufs_touched: Number of subbufs that have been filled.
> > + * @subbufs_lost: Number of subbufs lost to overrun.
> > + * @subbufs_read: Number of subbufs that have been read.
> > + * @reader.lost_events: Number of events lost at the time of the reader swap.
> > + * @reader.id: subbuf ID of the current reader. From 0 to @nr_subbufs - 1
> > + * @reader.read: Number of bytes read on the reader subbuf.
> > + * @subbuf_size: Size of each subbuf, including the header.
> > + * @nr_subbufs: Number of subbfs in the ring-buffer.
> > + * @meta_page_size: Size of this meta-page.
> > + * @meta_struct_len: Size of this structure.
> > + */
> > +struct trace_buffer_meta {
> > + unsigned long entries;
> > + unsigned long overrun;
> > + unsigned long read;
> > +
> > + unsigned long subbufs_touched;
> > + unsigned long subbufs_lost;
> > + unsigned long subbufs_read;
> > +
> > + struct {
> > + unsigned long lost_events;
> > + __u32 id;
> > + __u32 read;
> > + } reader;
> > +
> > + __u32 subbuf_size;
> > + __u32 nr_subbufs;
> > +
> > + __u32 meta_page_size;
> > + __u32 meta_struct_len;
> > +};
> > +
> > +#endif /* _TRACE_MMAP_H_ */
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index db73e326fa04..e9ff1c95e896 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -338,6 +338,7 @@ struct buffer_page {
> > local_t entries; /* entries on this page */
> > unsigned long real_end; /* real end of data */
> > unsigned order; /* order of the page */
> > + u32 id; /* ID for external mapping */
> > struct buffer_data_page *page; /* Actual data page */
> > };
> >
> > @@ -484,6 +485,12 @@ struct ring_buffer_per_cpu {
> > u64 read_stamp;
> > /* pages removed since last reset */
> > unsigned long pages_removed;
> > +
> > + int mapped;
> > + struct mutex mapping_lock;
> > + unsigned long *subbuf_ids; /* ID to addr */
> > + struct trace_buffer_meta *meta_page;
> > +
> > /* ring buffer pages to update, > 0 to add, < 0 to remove */
> > long nr_pages_to_update;
> > struct list_head new_pages; /* new pages to add */
> > @@ -1542,6 +1549,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
> > init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
> > init_waitqueue_head(&cpu_buffer->irq_work.waiters);
> > init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
> > + mutex_init(&cpu_buffer->mapping_lock);
> >
> > bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
> > GFP_KERNEL, cpu_to_node(cpu));
> > @@ -5160,6 +5168,23 @@ static void rb_clear_buffer_page(struct buffer_page *page)
> > page->read = 0;
> > }
> >
> > +static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
> > +{
> > + struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> > +
> > + WRITE_ONCE(meta->entries, local_read(&cpu_buffer->entries));
> > + WRITE_ONCE(meta->overrun, local_read(&cpu_buffer->overrun));
> > + WRITE_ONCE(meta->read, cpu_buffer->read);
> > +
> > + WRITE_ONCE(meta->subbufs_touched, local_read(&cpu_buffer->pages_touched));
> > + WRITE_ONCE(meta->subbufs_lost, local_read(&cpu_buffer->pages_lost));
> > + WRITE_ONCE(meta->subbufs_read, local_read(&cpu_buffer->pages_read));
> > +
> > + WRITE_ONCE(meta->reader.read, cpu_buffer->reader_page->read);
> > + WRITE_ONCE(meta->reader.id, cpu_buffer->reader_page->id);
> > + WRITE_ONCE(meta->reader.lost_events, cpu_buffer->lost_events);
> > +}
> > +
> > static void
> > rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
> > {
> > @@ -5204,6 +5229,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
> > cpu_buffer->lost_events = 0;
> > cpu_buffer->last_overrun = 0;
> >
> > + if (cpu_buffer->mapped)
> > + rb_update_meta_page(cpu_buffer);
> > +
> > rb_head_page_activate(cpu_buffer);
> > cpu_buffer->pages_removed = 0;
> > }
> > @@ -5418,6 +5446,11 @@ int ring_buffer_swap_cpu(struct trace_buffer *buffer_a,
> > cpu_buffer_a = buffer_a->buffers[cpu];
> > cpu_buffer_b = buffer_b->buffers[cpu];
> >
> > + if (READ_ONCE(cpu_buffer_a->mapped) || READ_ONCE(cpu_buffer_b->mapped)) {
> > + ret = -EBUSY;
> > + goto out;
> > + }
>
> Ah, this is not enough to stop snapshot. update_max_tr()@kernel/trace/trace.c
> is used for swapping all CPU buffer and it does not use this function.
> (but that should use this instead of accessing buffer directly...)
>
> Maybe we need ring_buffer_swap() and it checks all cpu_buffer does not set
> mapping bits.
>
> Thank you,
How about instead of having ring_buffer_swap_cpu() returning -EBUSY when mapped
we have two functions to block any mapping that trace.c could call?
int rb_swap_prepare(struct ring_buffer *cpu_buffer)
{
mutex_lock(&cpu_buffer->mapping_lock);
if (!cpu_buffer->mapped)
return 0;
mutex_unlock(&cpu_buffer->mapping_lock);
}
int rb_swap_done(struct ring_buffer *cpu_buffer)
{
mutex_unlock(&cpu_buffer->mapping_lock);
}
int ring_buffer_swap_prepare(struct trace_buffer *buffer, int cpu)
{
...
if (cpu == RING_BUFFER_ALL_CPUS) {
int cpu_i;
for_each_buffer_cpu(buffer, cpu_i) {
err = rb_swap_prepare(buffer->buffers[cpu_i]);
if (err)
break;
}
if (err)
/* Rollback ... */
return err;
}
return rb_swap_prepare(buffer->buffers[cpu]);
}
void ring_buffer_swap_done(struct trace_buffer *buffer, int cpu)
{
if (cpu == RING_BUFFER_ALL_CPUS) {
int cpu_i;
for_each_buffer_cpu(buffer, cpu_i)
rb_swap_done(buffer->buffers[cpu_i]);
return;
}
return rb_swap_done(buffer->buffers[cpu]);
}
[...]
next prev parent reply other threads:[~2024-01-15 15:37 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-11 16:17 [PATCH v11 0/5] Introducing trace buffer mapping by user-space Vincent Donnefort
2024-01-11 16:17 ` [PATCH v11 1/5] ring-buffer: Zero ring-buffer sub-buffers Vincent Donnefort
2024-01-13 13:38 ` Masami Hiramatsu
2024-01-11 16:17 ` [PATCH v11 2/5] ring-buffer: Introducing ring-buffer mapping functions Vincent Donnefort
2024-01-11 16:34 ` Mathieu Desnoyers
2024-01-11 23:23 ` Steven Rostedt
2024-01-12 9:13 ` Vincent Donnefort
2024-01-12 15:06 ` Steven Rostedt
2024-01-12 15:58 ` Steven Rostedt
2024-01-15 4:43 ` Masami Hiramatsu
2024-01-15 15:37 ` Vincent Donnefort [this message]
2024-01-15 16:09 ` Steven Rostedt
2024-01-15 16:23 ` Steven Rostedt
2024-01-15 17:29 ` Vincent Donnefort
2024-01-15 18:03 ` Steven Rostedt
2024-01-15 23:48 ` Masami Hiramatsu
2024-01-11 16:17 ` [PATCH v11 3/5] tracing: Allow user-space mapping of the ring-buffer Vincent Donnefort
2024-01-11 16:17 ` [PATCH v11 4/5] Documentation: tracing: Add ring-buffer mapping Vincent Donnefort
2024-01-13 13:36 ` Masami Hiramatsu
2024-01-14 14:26 ` Masami Hiramatsu
2024-01-14 16:23 ` Steven Rostedt
2024-01-14 23:10 ` Masami Hiramatsu
2024-01-11 16:17 ` [PATCH v11 5/5] ring-buffer/selftest: Add ring-buffer mapping test Vincent Donnefort
2024-01-13 13:39 ` Masami Hiramatsu
2024-01-14 14:17 ` Masami Hiramatsu
2024-01-14 16:20 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZaVRO76JxaHjPwCi@google.com \
--to=vdonnefort@google.com \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.