* LTTng 0.146, adds extra read-side sub-buffer for flight recorder
@ 2009-07-13 7:14 Mathieu Desnoyers
2009-07-14 22:33 ` Steven Rostedt
0 siblings, 1 reply; 4+ messages in thread
From: Mathieu Desnoyers @ 2009-07-13 7:14 UTC (permalink / raw)
To: ltt-dev
Cc: linux-kernel, Steven Rostedt, Lai Jiangshan, KOSAKI Motohiro,
Frederic Weisbecker, Robert Wisniewski, Ingo Molnar
Hi,
So, I needed a weekend break from writing my thesis (It's almost over!) ;)
and I had the great idea to try to come up with a way to ensure that
LTTng flight recorder mode permits to have a read-side that never sees
corrupted data.
Basically, this is the main thing Steven have been asking me for a
while. And it looks like I just figured out a way to do it.
So for flight recorder tracing, this new LTTng version allocates an
extra subbuffer which gets exchanged by the reader with the writer
subbuffer before it gets read.
Normal tracing does not need this extra subbuffer, because the
write-side just drops events when the buffer is full. So we don't
allocate it and we don't perform any exchange. The space
reservation/commit code plays nicely with both flight recorder and
normal tracing schemes.
Here is how I did it:
No modification was required to the buffer space reservation/commit
algorithm. I just had to do the following at the backend level
(responsible for writing data to/reader data from the buffer):
I am using an array of pointers (one pointer for each subbuffer), plus a
pointer to the reader subbuffer. Each of these pointers are pointing to
an array of pages, which are all the pages that constitute a subbuffer.
Reads/writes from/to the buffer are done by accessors which pick up the
right page location within this page table. By modifying the top-level
subbuffer pointer, we can swap a whole subbuffer in a single operation.
There is a trick to deal with concurrency between writer and reader.
When the top-level subbuffer pointers are not used (no writer is
currently writing into it, no reader is reading from its subbuffer), we
set a RCHAN_NOREF_FLAG (value: 0x1) which indicates that no reference is
currently taken to this subbuffer. As long as this flag is set in the
pointer, it is safe for the reader to exchange it. When the writer needs
to access this subbuffer for writing, it clears the flag, and sets it
back after committing the last piece of data to it.
When the reader figures out that the write-side subbuffer it is trying
to exchange has a reference, it fails with -EAGAIN.
Nice things about the way I do it here:
- I keep the separation between the space reservation layer and back-end
buffer layer. The extra reader subbuffer exchange is done at the
back-end layer. The reason why it took me so long to try to come up
with something is that I tried to do it at the space reservation
layer, which was not fitting well the space reservation semantics.
- Keeping space reservation and physical buffer management separate
helps splitting complexity into sub-layers easier to verify.
- Given the space reservation/commit is separate from the subbuffer
exchange per se, I don't need any special-cases for "if the tail
pointer is in the reader page".... these things never happen because
the reserve, commit and consumed counts are completely unrelated to
the pointers to physical subbuffers.
As always, the tree is available at:
http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-lttng.git
git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
The commits implementing this the extra reader page for the lockless
scheme are:
lttng-relay-per-subbuffer-index.patch
lttng-relay-per-subbuffer-index-low-bit-noref.patch
lttng-relay-lockless-writer-use-noref-flag.patch
lttng-relay-default-sb-index-to-noref.patch
lttng-relay-lockless-exchange-reader-writer-pages.patch
Comments are welcome,
Thanks,
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: LTTng 0.146, adds extra read-side sub-buffer for flight recorder
2009-07-13 7:14 LTTng 0.146, adds extra read-side sub-buffer for flight recorder Mathieu Desnoyers
@ 2009-07-14 22:33 ` Steven Rostedt
2009-07-14 22:57 ` Mathieu Desnoyers
2009-07-15 13:57 ` Mathieu Desnoyers
0 siblings, 2 replies; 4+ messages in thread
From: Steven Rostedt @ 2009-07-14 22:33 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: ltt-dev, linux-kernel, Lai Jiangshan, KOSAKI Motohiro,
Frederic Weisbecker, Robert Wisniewski, Ingo Molnar
On Mon, 13 Jul 2009, Mathieu Desnoyers wrote:
> Hi,
>
> So, I needed a weekend break from writing my thesis (It's almost over!) ;)
> and I had the great idea to try to come up with a way to ensure that
> LTTng flight recorder mode permits to have a read-side that never sees
> corrupted data.
>
> Basically, this is the main thing Steven have been asking me for a
> while. And it looks like I just figured out a way to do it.
>
> So for flight recorder tracing, this new LTTng version allocates an
> extra subbuffer which gets exchanged by the reader with the writer
> subbuffer before it gets read.
>
> Normal tracing does not need this extra subbuffer, because the
> write-side just drops events when the buffer is full. So we don't
> allocate it and we don't perform any exchange. The space
> reservation/commit code plays nicely with both flight recorder and
> normal tracing schemes.
>
> Here is how I did it:
>
> No modification was required to the buffer space reservation/commit
> algorithm. I just had to do the following at the backend level
> (responsible for writing data to/reader data from the buffer):
>
> I am using an array of pointers (one pointer for each subbuffer), plus a
> pointer to the reader subbuffer. Each of these pointers are pointing to
> an array of pages, which are all the pages that constitute a subbuffer.
> Reads/writes from/to the buffer are done by accessors which pick up the
> right page location within this page table. By modifying the top-level
> subbuffer pointer, we can swap a whole subbuffer in a single operation.
>
> There is a trick to deal with concurrency between writer and reader.
> When the top-level subbuffer pointers are not used (no writer is
> currently writing into it, no reader is reading from its subbuffer), we
> set a RCHAN_NOREF_FLAG (value: 0x1) which indicates that no reference is
> currently taken to this subbuffer. As long as this flag is set in the
> pointer, it is safe for the reader to exchange it. When the writer needs
> to access this subbuffer for writing, it clears the flag, and sets it
> back after committing the last piece of data to it.
>
> When the reader figures out that the write-side subbuffer it is trying
> to exchange has a reference, it fails with -EAGAIN.
>
> Nice things about the way I do it here:
>
> - I keep the separation between the space reservation layer and back-end
> buffer layer. The extra reader subbuffer exchange is done at the
> back-end layer. The reason why it took me so long to try to come up
> with something is that I tried to do it at the space reservation
> layer, which was not fitting well the space reservation semantics.
>
> - Keeping space reservation and physical buffer management separate
> helps splitting complexity into sub-layers easier to verify.
>
> - Given the space reservation/commit is separate from the subbuffer
> exchange per se, I don't need any special-cases for "if the tail
> pointer is in the reader page".... these things never happen because
> the reserve, commit and consumed counts are completely unrelated to
> the pointers to physical subbuffers.
I don't yet have time to read the patches (not this week, anyway), but I'm
assuming that you can only get the new page (swap) while a writer is not
writing to it. Thus if it is not a full page, then you must either copy
the data, or swap out a non full page. Not complaining here, just trying
to understand it :-)
Thus the trick is that you have a series of pointers to the data, and you
swap out the data and not the list? Hmm, actually the ring buffer is
already like that and I probably could do the same.
Here's another thing that the ring buffer does (and makes things a little
complex too) is that it keeps track of the number of entries in the buffer
as well as the number of overruns. The number of entries in the page is
kept in the list data and not the data page itself.
Using a special flag to switch out the data instead of breaking the link
list may make things much simpler.
Hmm, I'll take a round to make the ring buffer closer to what you have
done. At this rate, we may finally merge the two to handle things that we
both need ;-)
-- Steve
>
> As always, the tree is available at:
>
> http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-lttng.git
> git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
>
> The commits implementing this the extra reader page for the lockless
> scheme are:
>
> lttng-relay-per-subbuffer-index.patch
> lttng-relay-per-subbuffer-index-low-bit-noref.patch
> lttng-relay-lockless-writer-use-noref-flag.patch
> lttng-relay-default-sb-index-to-noref.patch
> lttng-relay-lockless-exchange-reader-writer-pages.patch
>
> Comments are welcome,
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: LTTng 0.146, adds extra read-side sub-buffer for flight recorder
2009-07-14 22:33 ` Steven Rostedt
@ 2009-07-14 22:57 ` Mathieu Desnoyers
2009-07-15 13:57 ` Mathieu Desnoyers
1 sibling, 0 replies; 4+ messages in thread
From: Mathieu Desnoyers @ 2009-07-14 22:57 UTC (permalink / raw)
To: Steven Rostedt
Cc: ltt-dev, linux-kernel, Lai Jiangshan, KOSAKI Motohiro,
Frederic Weisbecker, Robert Wisniewski, Ingo Molnar
* Steven Rostedt (rostedt@goodmis.org) wrote:
>
>
> On Mon, 13 Jul 2009, Mathieu Desnoyers wrote:
>
> > Hi,
> >
> > So, I needed a weekend break from writing my thesis (It's almost over!) ;)
> > and I had the great idea to try to come up with a way to ensure that
> > LTTng flight recorder mode permits to have a read-side that never sees
> > corrupted data.
> >
> > Basically, this is the main thing Steven have been asking me for a
> > while. And it looks like I just figured out a way to do it.
> >
> > So for flight recorder tracing, this new LTTng version allocates an
> > extra subbuffer which gets exchanged by the reader with the writer
> > subbuffer before it gets read.
> >
> > Normal tracing does not need this extra subbuffer, because the
> > write-side just drops events when the buffer is full. So we don't
> > allocate it and we don't perform any exchange. The space
> > reservation/commit code plays nicely with both flight recorder and
> > normal tracing schemes.
> >
> > Here is how I did it:
> >
> > No modification was required to the buffer space reservation/commit
> > algorithm. I just had to do the following at the backend level
> > (responsible for writing data to/reader data from the buffer):
> >
> > I am using an array of pointers (one pointer for each subbuffer), plus a
> > pointer to the reader subbuffer. Each of these pointers are pointing to
> > an array of pages, which are all the pages that constitute a subbuffer.
> > Reads/writes from/to the buffer are done by accessors which pick up the
> > right page location within this page table. By modifying the top-level
> > subbuffer pointer, we can swap a whole subbuffer in a single operation.
> >
> > There is a trick to deal with concurrency between writer and reader.
> > When the top-level subbuffer pointers are not used (no writer is
> > currently writing into it, no reader is reading from its subbuffer), we
> > set a RCHAN_NOREF_FLAG (value: 0x1) which indicates that no reference is
> > currently taken to this subbuffer. As long as this flag is set in the
> > pointer, it is safe for the reader to exchange it. When the writer needs
> > to access this subbuffer for writing, it clears the flag, and sets it
> > back after committing the last piece of data to it.
> >
> > When the reader figures out that the write-side subbuffer it is trying
> > to exchange has a reference, it fails with -EAGAIN.
> >
> > Nice things about the way I do it here:
> >
> > - I keep the separation between the space reservation layer and back-end
> > buffer layer. The extra reader subbuffer exchange is done at the
> > back-end layer. The reason why it took me so long to try to come up
> > with something is that I tried to do it at the space reservation
> > layer, which was not fitting well the space reservation semantics.
> >
> > - Keeping space reservation and physical buffer management separate
> > helps splitting complexity into sub-layers easier to verify.
> >
> > - Given the space reservation/commit is separate from the subbuffer
> > exchange per se, I don't need any special-cases for "if the tail
> > pointer is in the reader page".... these things never happen because
> > the reserve, commit and consumed counts are completely unrelated to
> > the pointers to physical subbuffers.
>
> I don't yet have time to read the patches (not this week, anyway), but I'm
> assuming that you can only get the new page (swap) while a writer is not
> writing to it. Thus if it is not a full page, then you must either copy
> the data, or swap out a non full page. Not complaining here, just trying
> to understand it :-)
Yep, it's a requirement that when a subbuffer is being written to, it's
not possible for the reader to exchange it, so it's impossible to read
it. It simplifies a lot of things.
>
> Thus the trick is that you have a series of pointers to the data, and you
> swap out the data and not the list? Hmm, actually the ring buffer is
> already like that and I probably could do the same.
There is no list involved per se. I exchange the pointers to these
structures, not the data itself.
Let's say I have 2 sub-buffers for the writer and one extra sub-buffer
for the reader. I would have:
- An array of 2 pointers to sub-buffer structures. (owned by the writer)
- 1 pointer to subbuffer structure (owned by the reader).
>
> Here's another thing that the ring buffer does (and makes things a little
> complex too) is that it keeps track of the number of entries in the buffer
> as well as the number of overruns. The number of entries in the page is
> kept in the list data and not the data page itself.
I could add these counters to the "sub-buffer structure". They are not
part of the data itself. When I exchange the top-level pointers to these
structures, the counters, which are part of these structures, will
follow.
>
> Using a special flag to switch out the data instead of breaking the link
> list may make things much simpler.
Yeah :) That's why I've chosen to use such flag. And using a linked list
seems like overly complex compared to the simple 2-levels page table I
use here.
>
> Hmm, I'll take a round to make the ring buffer closer to what you have
> done. At this rate, we may finally merge the two to handle things that we
> both need ;-)
Hopefully :) Please don't hesitate for more info if you need some.
I'll have to go back to my thesis next week, but hopefully within 2 more
weeks I should be almost done and more available.
Mathieu
>
> -- Steve
>
>
> >
> > As always, the tree is available at:
> >
> > http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-lttng.git
> > git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
> >
> > The commits implementing this the extra reader page for the lockless
> > scheme are:
> >
> > lttng-relay-per-subbuffer-index.patch
> > lttng-relay-per-subbuffer-index-low-bit-noref.patch
> > lttng-relay-lockless-writer-use-noref-flag.patch
> > lttng-relay-default-sb-index-to-noref.patch
> > lttng-relay-lockless-exchange-reader-writer-pages.patch
> >
> > Comments are welcome,
> >
> > Thanks,
> >
> > Mathieu
> >
> > --
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
> >
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: LTTng 0.146, adds extra read-side sub-buffer for flight recorder
2009-07-14 22:33 ` Steven Rostedt
2009-07-14 22:57 ` Mathieu Desnoyers
@ 2009-07-15 13:57 ` Mathieu Desnoyers
1 sibling, 0 replies; 4+ messages in thread
From: Mathieu Desnoyers @ 2009-07-15 13:57 UTC (permalink / raw)
To: Steven Rostedt
Cc: ltt-dev, linux-kernel, Lai Jiangshan, KOSAKI Motohiro,
Frederic Weisbecker, Robert Wisniewski, Ingo Molnar
* Steven Rostedt (rostedt@goodmis.org) wrote:
>
>
> On Mon, 13 Jul 2009, Mathieu Desnoyers wrote:
>
> > Hi,
> >
> > So, I needed a weekend break from writing my thesis (It's almost over!) ;)
> > and I had the great idea to try to come up with a way to ensure that
> > LTTng flight recorder mode permits to have a read-side that never sees
> > corrupted data.
> >
> > Basically, this is the main thing Steven have been asking me for a
> > while. And it looks like I just figured out a way to do it.
> >
> > So for flight recorder tracing, this new LTTng version allocates an
> > extra subbuffer which gets exchanged by the reader with the writer
> > subbuffer before it gets read.
> >
> > Normal tracing does not need this extra subbuffer, because the
> > write-side just drops events when the buffer is full. So we don't
> > allocate it and we don't perform any exchange. The space
> > reservation/commit code plays nicely with both flight recorder and
> > normal tracing schemes.
> >
> > Here is how I did it:
> >
> > No modification was required to the buffer space reservation/commit
> > algorithm. I just had to do the following at the backend level
> > (responsible for writing data to/reader data from the buffer):
> >
> > I am using an array of pointers (one pointer for each subbuffer), plus a
> > pointer to the reader subbuffer. Each of these pointers are pointing to
> > an array of pages, which are all the pages that constitute a subbuffer.
> > Reads/writes from/to the buffer are done by accessors which pick up the
> > right page location within this page table. By modifying the top-level
> > subbuffer pointer, we can swap a whole subbuffer in a single operation.
> >
> > There is a trick to deal with concurrency between writer and reader.
> > When the top-level subbuffer pointers are not used (no writer is
> > currently writing into it, no reader is reading from its subbuffer), we
> > set a RCHAN_NOREF_FLAG (value: 0x1) which indicates that no reference is
> > currently taken to this subbuffer. As long as this flag is set in the
> > pointer, it is safe for the reader to exchange it. When the writer needs
> > to access this subbuffer for writing, it clears the flag, and sets it
> > back after committing the last piece of data to it.
> >
> > When the reader figures out that the write-side subbuffer it is trying
> > to exchange has a reference, it fails with -EAGAIN.
> >
> > Nice things about the way I do it here:
> >
> > - I keep the separation between the space reservation layer and back-end
> > buffer layer. The extra reader subbuffer exchange is done at the
> > back-end layer. The reason why it took me so long to try to come up
> > with something is that I tried to do it at the space reservation
> > layer, which was not fitting well the space reservation semantics.
> >
> > - Keeping space reservation and physical buffer management separate
> > helps splitting complexity into sub-layers easier to verify.
> >
> > - Given the space reservation/commit is separate from the subbuffer
> > exchange per se, I don't need any special-cases for "if the tail
> > pointer is in the reader page".... these things never happen because
> > the reserve, commit and consumed counts are completely unrelated to
> > the pointers to physical subbuffers.
>
> I don't yet have time to read the patches (not this week, anyway), but I'm
> assuming that you can only get the new page (swap) while a writer is not
> writing to it. Thus if it is not a full page, then you must either copy
> the data, or swap out a non full page. Not complaining here, just trying
> to understand it :-)
>
> Thus the trick is that you have a series of pointers to the data, and you
> swap out the data and not the list? Hmm, actually the ring buffer is
> already like that and I probably could do the same.
>
> Here's another thing that the ring buffer does (and makes things a little
> complex too) is that it keeps track of the number of entries in the buffer
> as well as the number of overruns. The number of entries in the page is
> kept in the list data and not the data page itself.
>
> Using a special flag to switch out the data instead of breaking the link
> list may make things much simpler.
>
> Hmm, I'll take a round to make the ring buffer closer to what you have
> done. At this rate, we may finally merge the two to handle things that we
> both need ;-)
>
BTW, I just fixed a race I thought about in the commit path. Basically,
I had to find a way to make the subbuffer delivery happen only once per
subbuffer written without adding a local_add_return to the commit fast
path. I did it with a local_cmpxchg done only at potential subbuffer
delivery. It's detailed in the header of
lttng-relay-lockless-writer-use-noref-flag.patch
of LTTng 0.147.
Mathieu
> -- Steve
>
>
> >
> > As always, the tree is available at:
> >
> > http://git.kernel.org/?p=linux/kernel/git/compudj/linux-2.6-lttng.git
> > git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
> >
> > The commits implementing this the extra reader page for the lockless
> > scheme are:
> >
> > lttng-relay-per-subbuffer-index.patch
> > lttng-relay-per-subbuffer-index-low-bit-noref.patch
> > lttng-relay-lockless-writer-use-noref-flag.patch
> > lttng-relay-default-sb-index-to-noref.patch
> > lttng-relay-lockless-exchange-reader-writer-pages.patch
> >
> > Comments are welcome,
> >
> > Thanks,
> >
> > Mathieu
> >
> > --
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
> >
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-07-15 13:58 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-13 7:14 LTTng 0.146, adds extra read-side sub-buffer for flight recorder Mathieu Desnoyers
2009-07-14 22:33 ` Steven Rostedt
2009-07-14 22:57 ` Mathieu Desnoyers
2009-07-15 13:57 ` Mathieu Desnoyers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox