From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: ltt-dev@lists.casi.polymtl.ca, linux-kernel@vger.kernel.org,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Robert Wisniewski <bob@watson.ibm.com>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: LTTng 0.146, adds extra read-side sub-buffer for flight recorder
Date: Tue, 14 Jul 2009 18:57:30 -0400
Message-ID: <20090714225730.GA19199@Krystal>
In-Reply-To: <alpine.DEB.2.00.0907141824200.32740@gandalf.stny.rr.com>

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> 
> On Mon, 13 Jul 2009, Mathieu Desnoyers wrote:
> 
> > Hi,
> > 
> > So, I needed a weekend break from writing my thesis (It's almost over!) ;)
> > and I had the great idea to try to come up with a way to ensure that
> > the LTTng flight recorder mode has a read-side that never sees
> > corrupted data.
> > 
> > Basically, this is the main thing Steven has been asking me for a
> > while. And it looks like I just figured out a way to do it.
> > 
> > So for flight recorder tracing, this new LTTng version allocates an
> > extra subbuffer which gets exchanged by the reader with the writer
> > subbuffer before it gets read.
> > 
> > Normal tracing does not need this extra subbuffer, because the
> > write-side just drops events when the buffer is full. So we don't
> > allocate it and we don't perform any exchange. The space
> > reservation/commit code plays nicely with both flight recorder and
> > normal tracing schemes.
> > 
> > Here is how I did it:
> > 
> > No modification was required to the buffer space reservation/commit
> > algorithm. I just had to do the following at the backend level
> > (responsible for writing data to/reading data from the buffer):
> > 
> > I am using an array of pointers (one pointer for each subbuffer), plus a
> > pointer to the reader subbuffer. Each of these pointers points to
> > an array of pages, which are all the pages that constitute a subbuffer.
> > Reads/writes from/to the buffer are done by accessors which pick up the
> > right page location within this page table. By modifying the top-level
> > subbuffer pointer, we can swap a whole subbuffer in a single operation.
> > 
> > There is a trick to deal with concurrency between writer and reader.
> > When a top-level subbuffer pointer is not in use (no writer is
> > currently writing into its subbuffer and no reader is reading from it), we
> > set a RCHAN_NOREF_FLAG (value: 0x1) which indicates that no reference is
> > currently taken to this subbuffer. As long as this flag is set in the
> > pointer, it is safe for the reader to exchange it. When the writer needs
> > to access this subbuffer for writing, it clears the flag, and sets it
> > back after committing the last piece of data to it.
> > 
> > When the reader figures out that the write-side subbuffer it is trying
> > to exchange has a reference, it fails with -EAGAIN.
> > 
> > Nice things about the way I do it here:
> > 
> > - I keep the separation between the space reservation layer and back-end
> >   buffer layer. The extra reader subbuffer exchange is done at the
> >   back-end layer. The reason why it took me so long to try to come up
> >   with something is that I tried to do it at the space reservation
> >   layer, which did not fit the space reservation semantics well.
> > 
> > - Keeping space reservation and physical buffer management separate
> >   helps split the complexity into sub-layers that are easier to verify.
> > 
> > - Given the space reservation/commit is separate from the subbuffer
> >   exchange per se, I don't need any special cases for "if the tail
> >   pointer is in the reader page".... these things never happen because
> >   the reserve, commit and consumed counts are completely unrelated to
> >   the pointers to physical subbuffers.
> 
> I don't yet have time to read the patches (not this week, anyway), but I'm 
> assuming that you can only get the new page (swap) while a writer is not 
> writing to it. Thus if it is not a full page, then you must either copy 
> the data, or swap out a non full page. Not complaining here, just trying 
> to understand it :-)

Yep, it's a requirement that when a subbuffer is being written to, it's
not possible for the reader to exchange it, so it's impossible to read
it. It simplifies a lot of things.
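
To make that rule concrete, here is a minimal user-space sketch of the
flag handling, written with C11 atomics rather than kernel primitives. It
is not the code from the patches: only RCHAN_NOREF_FLAG and its value come
from the description above. The slot encoding (a sub-buffer id shifted
left by one bit) and every function name are assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define RCHAN_NOREF_FLAG 0x1UL	/* value given in the original post */
static_assert(RCHAN_NOREF_FLAG == 1, "flag must live in the low bit");

/* Hypothetical slot: (sub-buffer id << 1) | RCHAN_NOREF_FLAG. */
typedef _Atomic uintptr_t sb_slot_t;

static inline uintptr_t sb_make(uintptr_t id, int noref)
{
	return (id << 1) | (noref ? RCHAN_NOREF_FLAG : 0);
}

static inline uintptr_t sb_id(uintptr_t v)
{
	return v >> 1;
}

/* Writer takes a reference before writing: clear the flag. */
static inline void writer_get(sb_slot_t *slot)
{
	atomic_fetch_and(slot, ~(uintptr_t)RCHAN_NOREF_FLAG);
}

/* Writer drops the reference after its last commit: set the flag. */
static inline void writer_put(sb_slot_t *slot)
{
	atomic_fetch_or(slot, (uintptr_t)RCHAN_NOREF_FLAG);
}

/*
 * Reader tries to trade its spare sub-buffer for the one in the slot.
 * Fails with -EAGAIN if the flag is clear, or if the slot changed
 * between the load and the compare-and-exchange.
 */
static inline int reader_exchange(sb_slot_t *slot, uintptr_t *reader_id)
{
	uintptr_t old = atomic_load(slot);
	uintptr_t repl = sb_make(*reader_id, 1);

	if (!(old & RCHAN_NOREF_FLAG))
		return -EAGAIN;
	if (!atomic_compare_exchange_strong(slot, &old, repl))
		return -EAGAIN;
	*reader_id = sb_id(old);
	return 0;
}
```

In this sketch, a reader calling reader_exchange() between writer_get()
and writer_put() gets -EAGAIN and keeps its spare sub-buffer; after
writer_put() the same call succeeds and the two ids trade places.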

> 
> Thus the trick is that you have a series of pointers to the data, and you 
> swap out the data and not the list?  Hmm, actually the ring buffer is 
> already like that and I probably could do the same.

There is no list involved per se. I exchange the pointers to these
structures, not the data itself.

Let's say I have 2 sub-buffers for the writer and one extra sub-buffer
for the reader. I would have:

- An array of 2 pointers to sub-buffer structures (owned by the writer).
- 1 pointer to a sub-buffer structure (owned by the reader).
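
As an illustration only, the layout could be sketched like this in plain
C. The structure names, field names and sizes are assumptions, not what
the patches use, and the sketch is single-threaded: it shows the
two-level lookup and the pointer swap, not the concurrency handling.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ      4096u	/* assumed page size */
#define PAGES_PER_SB 2u		/* assumed pages per sub-buffer */
#define N_WRITER_SB  2u		/* 2 writer sub-buffers, as in the example */

/* Second level: the pages that make up one sub-buffer. */
struct subbuf {
	unsigned char *pages[PAGES_PER_SB];
};

/* Top level: writer-owned array plus the reader-owned spare. */
struct chan_backend {
	struct subbuf *writer_sb[N_WRITER_SB];
	struct subbuf *reader_sb;
};

/* Accessor: map a buffer offset to an address through both levels. */
static unsigned char *backend_addr(struct chan_backend *b, size_t offset)
{
	size_t sb_size = (size_t)PAGE_SZ * PAGES_PER_SB;
	size_t sb_idx = (offset / sb_size) % N_WRITER_SB;
	size_t in_sb = offset % sb_size;

	return b->writer_sb[sb_idx]->pages[in_sb / PAGE_SZ] + in_sb % PAGE_SZ;
}

/* Swap a writer sub-buffer with the reader's; no data is copied. */
static void backend_swap_reader(struct chan_backend *b, size_t sb_idx)
{
	struct subbuf *tmp;

	assert(sb_idx < N_WRITER_SB);
	tmp = b->writer_sb[sb_idx];
	b->writer_sb[sb_idx] = b->reader_sb;
	b->reader_sb = tmp;
}
```

After backend_swap_reader(b, 0), offsets that used to land in the first
writer sub-buffer resolve into what was the reader's spare, and the
reader holds the pages that were just written.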

> 
> Another thing that the ring buffer does (and that makes things a little 
> complex too) is keep track of the number of entries in the buffer 
> as well as the number of overruns. The number of entries in the page is 
> kept in the list data and not the data page itself.

I could add these counters to the "sub-buffer structure". They are not
part of the data itself. When I exchange the top-level pointers to these
structures, the counters, which are part of these structures, will
follow.
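
A small sketch of what I mean, with hypothetical names (neither the
structure nor the "entries"/"overruns" fields exist in the current
patches): because the counters sit in the structure and not in the data
pages, exchanging the two top-level pointers moves them along for free.

```c
#include <assert.h>

/* Hypothetical sub-buffer structure carrying its own counters. */
struct sb_desc {
	unsigned long entries;	/* events recorded in this sub-buffer */
	unsigned long overruns;	/* events overwritten or lost */
	unsigned char *data;	/* stands in for the page array */
};

/* Exchange two top-level pointers; the structures are left untouched. */
static void sb_exchange(struct sb_desc **writer_slot,
			struct sb_desc **reader_slot)
{
	struct sb_desc *tmp;

	assert(writer_slot && reader_slot);
	tmp = *writer_slot;
	*writer_slot = *reader_slot;
	*reader_slot = tmp;
}
```

After the exchange, the reader sees the entry and overrun counts that
belong to the data it now owns, without any extra bookkeeping.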

> 
> Using a special flag to switch out the data instead of breaking the linked 
> list may make things much simpler.

Yeah :) That's why I've chosen to use such a flag. And using a linked list
seems overly complex compared to the simple two-level page table I
use here.

> 
> Hmm, I'll take a round to make the ring buffer closer to what you have 
> done. At this rate, we may finally merge the two to handle things that we 
> both need ;-)

Hopefully :) Please don't hesitate to ask for more info if you need some.

I'll have to go back to my thesis next week, but hopefully within 2 more
weeks I should be almost done and more available.

Mathieu

> 
> -- Steve
> 
> 
> > 
> > As always, the tree is available at:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-lttng.git
> > git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
> > 
> > The commits implementing the extra reader page for the lockless
> > scheme are:
> > 
> > lttng-relay-per-subbuffer-index.patch
> > lttng-relay-per-subbuffer-index-low-bit-noref.patch
> > lttng-relay-lockless-writer-use-noref-flag.patch
> > lttng-relay-default-sb-index-to-noref.patch
> > lttng-relay-lockless-exchange-reader-writer-pages.patch
> > 
> > Comments are welcome,
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > -- 
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> > 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
