All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Donnefort <vdonnefort@google.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: rostedt@goodmis.org, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	mathieu.desnoyers@efficios.com, kernel-team@android.com
Subject: Re: [PATCH v11 2/5] ring-buffer: Introducing ring-buffer mapping functions
Date: Mon, 15 Jan 2024 15:37:31 +0000	[thread overview]
Message-ID: <ZaVRO76JxaHjPwCi@google.com> (raw)
In-Reply-To: <20240115134303.1a673e37b8e7d35a33d8df52@kernel.org>

On Mon, Jan 15, 2024 at 01:43:03PM +0900, Masami Hiramatsu wrote:
> On Thu, 11 Jan 2024 16:17:09 +0000
> Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> > In preparation for allowing the user-space to map a ring-buffer, add
> > a set of mapping functions:
> > 
> >   ring_buffer_{map,unmap}()
> >   ring_buffer_map_fault()
> > 
> > And controls on the ring-buffer:
> > 
> >   ring_buffer_map_get_reader()  /* swap reader and head */
> > 
> > Mapping the ring-buffer also involves:
> > 
> >   A unique ID for each subbuf of the ring-buffer, currently they are
> >   only identified through their in-kernel VA.
> > 
> >   A meta-page, where are stored ring-buffer statistics and a
> >   description for the current reader
> > 
> > The linear mapping exposes the meta-page, and each subbuf of the
> > ring-buffer, ordered following their unique ID, assigned during the
> > first mapping.
> > 
> > Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
> > size will remain unmodified and the splice enabling functions will in
> > reality simply memcpy the data instead of swapping subbufs.
> > 
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > 
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index fa802db216f9..0841ba8bab14 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
> > @@ -6,6 +6,8 @@
> >  #include <linux/seq_file.h>
> >  #include <linux/poll.h>
> >  
> > +#include <uapi/linux/trace_mmap.h>
> > +
> >  struct trace_buffer;
> >  struct ring_buffer_iter;
> >  
> > @@ -221,4 +223,9 @@ int trace_rb_cpu_prepare(unsigned int cpu, struct hlist_node *node);
> >  #define trace_rb_cpu_prepare	NULL
> >  #endif
> >  
> > +int ring_buffer_map(struct trace_buffer *buffer, int cpu);
> > +int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
> > +struct page *ring_buffer_map_fault(struct trace_buffer *buffer, int cpu,
> > +				   unsigned long pgoff);
> > +int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
> >  #endif /* _LINUX_RING_BUFFER_H */
> > diff --git a/include/uapi/linux/trace_mmap.h b/include/uapi/linux/trace_mmap.h
> > new file mode 100644
> > index 000000000000..bde39a73ce65
> > --- /dev/null
> > +++ b/include/uapi/linux/trace_mmap.h
> > @@ -0,0 +1,45 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +#ifndef _TRACE_MMAP_H_
> > +#define _TRACE_MMAP_H_
> > +
> > +#include <linux/types.h>
> > +
> > +/**
> > + * struct trace_buffer_meta - Ring-buffer Meta-page description
> > + * @entries:		Number of entries in the ring-buffer.
> > + * @overrun:		Number of entries lost in the ring-buffer.
> > + * @read:		Number of entries that have been read.
> > + * @subbufs_touched:	Number of subbufs that have been filled.
> > + * @subbufs_lost:	Number of subbufs lost to overrun.
> > + * @subbufs_read:	Number of subbufs that have been read.
> > + * @reader.lost_events:	Number of events lost at the time of the reader swap.
> > + * @reader.id:		subbuf ID of the current reader. From 0 to @nr_subbufs - 1
> > + * @reader.read:	Number of bytes read on the reader subbuf.
> > + * @subbuf_size:	Size of each subbuf, including the header.
> > + * @nr_subbufs:		Number of subbfs in the ring-buffer.
> > + * @meta_page_size:	Size of this meta-page.
> > + * @meta_struct_len:	Size of this structure.
> > + */
> > +struct trace_buffer_meta {
> > +	unsigned long	entries;
> > +	unsigned long	overrun;
> > +	unsigned long	read;
> > +
> > +	unsigned long	subbufs_touched;
> > +	unsigned long	subbufs_lost;
> > +	unsigned long	subbufs_read;
> > +
> > +	struct {
> > +		unsigned long	lost_events;
> > +		__u32		id;
> > +		__u32		read;
> > +	} reader;
> > +
> > +	__u32		subbuf_size;
> > +	__u32		nr_subbufs;
> > +
> > +	__u32		meta_page_size;
> > +	__u32		meta_struct_len;
> > +};
> > +
> > +#endif /* _TRACE_MMAP_H_ */
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index db73e326fa04..e9ff1c95e896 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -338,6 +338,7 @@ struct buffer_page {
> >  	local_t		 entries;	/* entries on this page */
> >  	unsigned long	 real_end;	/* real end of data */
> >  	unsigned	 order;		/* order of the page */
> > +	u32		 id;		/* ID for external mapping */
> >  	struct buffer_data_page *page;	/* Actual data page */
> >  };
> >  
> > @@ -484,6 +485,12 @@ struct ring_buffer_per_cpu {
> >  	u64				read_stamp;
> >  	/* pages removed since last reset */
> >  	unsigned long			pages_removed;
> > +
> > +	int				mapped;
> > +	struct mutex			mapping_lock;
> > +	unsigned long			*subbuf_ids;	/* ID to addr */
> > +	struct trace_buffer_meta	*meta_page;
> > +
> >  	/* ring buffer pages to update, > 0 to add, < 0 to remove */
> >  	long				nr_pages_to_update;
> >  	struct list_head		new_pages; /* new pages to add */
> > @@ -1542,6 +1549,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
> >  	init_irq_work(&cpu_buffer->irq_work.work, rb_wake_up_waiters);
> >  	init_waitqueue_head(&cpu_buffer->irq_work.waiters);
> >  	init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
> > +	mutex_init(&cpu_buffer->mapping_lock);
> >  
> >  	bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()),
> >  			    GFP_KERNEL, cpu_to_node(cpu));
> > @@ -5160,6 +5168,23 @@ static void rb_clear_buffer_page(struct buffer_page *page)
> >  	page->read = 0;
> >  }
> >  
> > +static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
> > +{
> > +	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> > +
> > +	WRITE_ONCE(meta->entries, local_read(&cpu_buffer->entries));
> > +	WRITE_ONCE(meta->overrun, local_read(&cpu_buffer->overrun));
> > +	WRITE_ONCE(meta->read, cpu_buffer->read);
> > +
> > +	WRITE_ONCE(meta->subbufs_touched, local_read(&cpu_buffer->pages_touched));
> > +	WRITE_ONCE(meta->subbufs_lost, local_read(&cpu_buffer->pages_lost));
> > +	WRITE_ONCE(meta->subbufs_read, local_read(&cpu_buffer->pages_read));
> > +
> > +	WRITE_ONCE(meta->reader.read, cpu_buffer->reader_page->read);
> > +	WRITE_ONCE(meta->reader.id, cpu_buffer->reader_page->id);
> > +	WRITE_ONCE(meta->reader.lost_events, cpu_buffer->lost_events);
> > +}
> > +
> >  static void
> >  rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
> >  {
> > @@ -5204,6 +5229,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
> >  	cpu_buffer->lost_events = 0;
> >  	cpu_buffer->last_overrun = 0;
> >  
> > +	if (cpu_buffer->mapped)
> > +		rb_update_meta_page(cpu_buffer);
> > +
> >  	rb_head_page_activate(cpu_buffer);
> >  	cpu_buffer->pages_removed = 0;
> >  }
> > @@ -5418,6 +5446,11 @@ int ring_buffer_swap_cpu(struct trace_buffer *buffer_a,
> >  	cpu_buffer_a = buffer_a->buffers[cpu];
> >  	cpu_buffer_b = buffer_b->buffers[cpu];
> >  
> > +	if (READ_ONCE(cpu_buffer_a->mapped) || READ_ONCE(cpu_buffer_b->mapped)) {
> > +		ret = -EBUSY;
> > +		goto out;
> > +	}
> 
> Ah, this is not enough to stop snapshot. update_max_tr()@kernel/trace/trace.c
> is used for swapping all CPU buffer and it does not use this function.
> (but that should use this instead of accessing buffer directly...)
> 
> Maybe we need ring_buffer_swap() and it checks all cpu_buffer does not set
> mapping bits.
> 
> Thank you,

How about instead of having ring_buffer_swap_cpu() returning -EBUSY when mapped
we have two functions to block any mapping that trace.c could call?

 int rb_swap_prepare(struct ring_buffer *cpu_buffer)
 {
    mutex_lock(&cpu_buffer->mapping_lock);

    if (!cpu_buffer->mapped)
    	return 0;

    mutex_unlock(&cpu_buffer->mapping_lock);
 }

 int rb_swap_done(struct ring_buffer *cpu_buffer)
 {
    mutex_unlock(&cpu_buffer->mapping_lock);
 }

 int ring_buffer_swap_prepare(struct trace_buffer *buffer, int cpu)
 {
 ...
     
     if (cpu == RING_BUFFER_ALL_CPUS) {
    	 int cpu_i;

         for_each_buffer_cpu(buffer, cpu_i) {
	 	err = rb_swap_prepare(buffer->buffers[cpu_i]);
		if (err)
			break;
	 }

	 if (err)
	    /* Rollback ... */

	 return err;

     } 

     return rb_swap_prepare(buffer->buffers[cpu]);
 }

 void ring_buffer_swap_done(struct trace_buffer *buffer, int cpu)
 {
     if (cpu == RING_BUFFER_ALL_CPUS) {
    	 int cpu_i;

         for_each_buffer_cpu(buffer, cpu_i)
	 	rb_swap_done(buffer->buffers[cpu_i]);
	 return;
     }

     return rb_swap_done(buffer->buffers[cpu]);
 }

[...]

  reply	other threads:[~2024-01-15 15:37 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-11 16:17 [PATCH v11 0/5] Introducing trace buffer mapping by user-space Vincent Donnefort
2024-01-11 16:17 ` [PATCH v11 1/5] ring-buffer: Zero ring-buffer sub-buffers Vincent Donnefort
2024-01-13 13:38   ` Masami Hiramatsu
2024-01-11 16:17 ` [PATCH v11 2/5] ring-buffer: Introducing ring-buffer mapping functions Vincent Donnefort
2024-01-11 16:34   ` Mathieu Desnoyers
2024-01-11 23:23     ` Steven Rostedt
2024-01-12  9:13       ` Vincent Donnefort
2024-01-12 15:06         ` Steven Rostedt
2024-01-12 15:58           ` Steven Rostedt
2024-01-15  4:43   ` Masami Hiramatsu
2024-01-15 15:37     ` Vincent Donnefort [this message]
2024-01-15 16:09       ` Steven Rostedt
2024-01-15 16:23         ` Steven Rostedt
2024-01-15 17:29           ` Vincent Donnefort
2024-01-15 18:03             ` Steven Rostedt
2024-01-15 23:48           ` Masami Hiramatsu
2024-01-11 16:17 ` [PATCH v11 3/5] tracing: Allow user-space mapping of the ring-buffer Vincent Donnefort
2024-01-11 16:17 ` [PATCH v11 4/5] Documentation: tracing: Add ring-buffer mapping Vincent Donnefort
2024-01-13 13:36   ` Masami Hiramatsu
2024-01-14 14:26   ` Masami Hiramatsu
2024-01-14 16:23     ` Steven Rostedt
2024-01-14 23:10       ` Masami Hiramatsu
2024-01-11 16:17 ` [PATCH v11 5/5] ring-buffer/selftest: Add ring-buffer mapping test Vincent Donnefort
2024-01-13 13:39   ` Masami Hiramatsu
2024-01-14 14:17     ` Masami Hiramatsu
2024-01-14 16:20       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZaVRO76JxaHjPwCi@google.com \
    --to=vdonnefort@google.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.