From: Steven Rostedt <rostedt@goodmis.org>
To: Vincent Donnefort <vdonnefort@google.com>
Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org>
Subject: Re: [PATCH] libtracefs: Add ring buffer memory mapping APIs
Date: Fri, 5 Jan 2024 14:00:10 -0500 [thread overview]
Message-ID: <20240105140010.06c09e3a@gandalf.local.home> (raw)
In-Reply-To: <ZZhJPaFoVuZEO0IO@google.com>
On Fri, 5 Jan 2024 18:23:57 +0000
Vincent Donnefort <vdonnefort@google.com> wrote:
> On Fri, Jan 05, 2024 at 12:31:11PM -0500, Steven Rostedt wrote:
> > On Fri, 5 Jan 2024 14:25:15 +0000
> > Vincent Donnefort <vdonnefort@google.com> wrote:
> >
> > > > > Maybe this ioctl should be called regardless if events are found on the current
> > > > > reader page. This would at least update the reader->read field and make sure
> > > > > subsequent readers are not getting the same events we already had here?
> > > >
> > > > If we call the ioctl() before we are finished reading events, the events on
> > > > the reader page will be discarded and added back to the writer buffer if
> > > > the writer is not on the reader page.
> > >
> > > Hum, I am a bit confused here. If the writer is not on the reader page, then
> > > there's no way a writer could overwrite these events while we access them?
> >
> > Maybe I'm confused. Where do you want to call the ioctl again? If the
> > writer is off the reader page and we call the ioctl() where no new events
> > were added since our last read, the ioctl() will advance the reader page,
> > will it not? That is, any events that were on the previous reader page that
> > we did not read yet is lost.
>
> I meant like the snippet you shared below.
>
> If on a reader page, there are "unread" events, the ioctl will simply return the
> same reader page, as long as there is something to read. It is then safe to call
> the ioctl unconditionally.
There's only unread events the first time we read the cpubuffer after
mapping. Then every ioctl() will give a new page if the writer is not
currently on it.
int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
{
struct ring_buffer_per_cpu *cpu_buffer;
unsigned long reader_size;
unsigned long flags;
cpu_buffer = rb_get_mapped_buffer(buffer, cpu);
if (IS_ERR(cpu_buffer))
return (int)PTR_ERR(cpu_buffer);
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
consume:
if (rb_per_cpu_empty(cpu_buffer))
goto out;
reader_size = rb_page_size(cpu_buffer->reader_page);
/*
* There are data to be read on the current reader page, we can
* return to the caller. But before that, we assume the latter will read
* everything. Let's update the kernel reader accordingly.
*/
if (cpu_buffer->reader_page->read < reader_size) {
while (cpu_buffer->reader_page->read < reader_size)
rb_advance_reader(cpu_buffer);
// This updates the reader_page to the size of the page.
// This means that the next time this is called, it will swap the page.
goto out;
}
if (WARN_ON(!rb_get_reader_page(cpu_buffer)))
goto out;
goto consume;
out:
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
rb_put_mapped_buffer(cpu_buffer);
return 0;
}
>
> >
> > The trace_mmap_load_subbuf() is called to update the reader page. If for
> > some reason the kbuf hasn't read all the data and calls the
> > tracefs_cpu_read_buf() again, it should still return the same kbuf, but
> > updated.
> >
> > kbuf = tracefs_cpu_read_buf() {
> > ret = trace_mmap_load_subbuf(tcpu->mapping, tcpu->kbuf) {
> > if (data != kbuffer_subbuffer(kbuf)) {
> > kbuffer_load_subbuffer(kbuf, data);
> > return 1;
> > }
> > return ret > 0 ? tcpu->kbuf : NULL;
> > }
> >
> > record.data = kbuffer_read_event(kbuf, &record.ts);
> > kbuffer_next_event(kbuf, NULL);
> >
> > kbuf = tracefs_cpu_read_buf() {
> > ret = trace_mmap_load_subbuf(tcpu->mapping, tcpu->kbuf) {
> > kbuffer_refresh(kbuf);
> > /* Are there still events to read? */
> > if (kbuffer_curr_size(kbuf))
> > return 1;
> >
> > The above should return because the application has not read all of the
> > kbuf yet. This is perfectly fine for the application to do this.
> >
> > If we don't return here, but instead do:
> >
> > ioctl(tmap->fd, TRACE_MMAP_IOCTL_GET_READER);
> >
> > The buffer that kbuf points to will be swapped out with a new page. And the
> > data on the buffer is lost.
>
> It shouldn't, the same reader page will be returned, only the reader_page->read
> pointer would be updated.
But after the first read of the page, the reader_page->read is at the end,
and will swap.
>
> >
> > Hmm, but looking at how the code currently works without mapping, it looks
> > like it would do the read again anyway assuming that the kbuf was
> > completely read before reading again. Perhaps it would make the logic
> > simpler to assume that the buffer was completely read before this gets
> > called again. That is, we could have it do:
> >
> > if (data != kbuffer_subbuffer(kbuf)) {
> > kbuffer_load_subbuffer(kbuf, data);
> > return 1;
> > }
> >
> > if (ioctl(tmap->fd, TRACE_MMAP_IOCTL_GET_READER) < 0)
> > return -1;
> >
> > if (id != tmap->map->reader.id) {
> > /* New page, just reload it */
> > data = tmap->data + tmap->map->subbuf_size * id;
> > kbuffer_load_subbuffer(kbuf, data);
> > return 1;
> > }
> >
> > /*
> > * Perhaps the reader page had a write that added
> > * more data.
> > */
> > kbuffer_refresh(kbuf);
> >
> > /* Are there still events to read? */
> > return kbuffer_curr_size(kbuf) ? 1 : 0;
> >
>
> That sounds good
>
> >
> > >
> > > >
> > > > And there should only be one reader accessing the map. This is not thread
> > > > safe. Once you load the subbuffer into kbuf, kbuf handles what was read. We
> > > > don't want to use the reader page for that.
> > >
> > > I had in mind a scenario with two sequential read. In that case only the reader
> > > page read could help to "save" what has been read so far.
> > >
> > > Currently, we have:
> > >
> > > <event A>
> > > ./cpu-map
> > > print event A
> > >
> > > <event B>
> > > ./cpu-map
> > > print event A
> > > print event B
> >
> > The kbuffer keeps track of what was read from it, so I'm still confused by
> > what you mean.
>
> We probably want to keep track of what has already been consumed outside of
> the remits of a single libtraceevent execution.
>
> But now with the code snippet above, in addition to the fast-forward when the
> kbuf is first loaded (kbuffer_read_buffer(kbuf, tmpbuf, tmap->map->reader.read)),
> we shouldn't have that problem anymore.
>
> <event A>
> $ ./cpu-map
> print event A
>
> <event B>
> $ ./cpu-map
> print event B
I think that should be on the kernel side. The first mapping should update
the read meta page and then update the reader_page so that the next mapping
will give something different.
>
> Or even
>
> <event A>
> $ ./cpu-map
> print event A
>
> <event B>
> $ cat /sys/kernel/tracing/trace_pipe
> print event B
So this is a kernel change, not a library one.
-- Steve
next prev parent reply other threads:[~2024-01-05 18:59 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-29 1:11 [PATCH] libtracefs: Add ring buffer memory mapping APIs Steven Rostedt
2024-01-05 9:17 ` Vincent Donnefort
2024-01-05 13:41 ` Steven Rostedt
2024-01-05 14:25 ` Vincent Donnefort
2024-01-05 17:31 ` Steven Rostedt
2024-01-05 18:23 ` Vincent Donnefort
2024-01-05 19:00 ` Steven Rostedt [this message]
2024-01-05 20:11 ` Steven Rostedt
2024-01-05 20:22 ` Steven Rostedt
-- strict thread matches above, loose matches on Subject: below --
2024-01-05 20:29 Steven Rostedt
2024-01-08 14:25 ` Vincent Donnefort
2024-01-08 17:16 ` Steven Rostedt
2024-01-08 17:34 ` Vincent Donnefort
2024-01-23 9:52 ` Vincent Donnefort
2024-01-23 15:15 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240105140010.06c09e3a@gandalf.local.home \
--to=rostedt@goodmis.org \
--cc=linux-trace-devel@vger.kernel.org \
--cc=vdonnefort@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).