* [PATCH 0/6] blkin (LTTng + Zipkin) tracing
From: Andrew Shewmaker @ 2014-11-12 23:19 UTC
To: ceph-devel
The following patches are a cleaned-up version of the work
Marios Kogias first posted in August.
http://www.spinics.net/lists/ceph-devel/msg19890.html
The changes have been made against Ceph 0.80.1, and will be
moved forward soon.
With them, Ceph can use Blkin, a library created by Marios Kogias and others,
which tracks a specific request from the time it enters
the system at higher levels until it is finally served by RADOS.
In general, Blkin implements the tracing semantics described in the Dapper
paper http://static.googleusercontent.com/media/research.google.com/el/pubs/archive/36356.pdf
in order to trace the causal relationships between the different
processing phases that an IO request may trigger. The goal is an end-to-end
visualisation of the request's route in the system, accompanied by information
concerning latencies in each processing phase. Thanks to LTTng, this can happen
with minimal overhead and in real time. To visualize the results, Blkin
was integrated with Twitter's Zipkin http://twitter.github.io/zipkin/
(which is a tracing system entirely based on Dapper).
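To make the Dapper model a little more concrete, here is a minimal sketch of
how spans can record causal relationships: every span of one request shares a
trace id, and each child span remembers its parent's span id. The type and
function names (Span, start_root, start_child, annotate, latency_us) are
hypothetical and chosen for illustration only; they are not Blkin's API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration; not Blkin's actual API.
struct Annotation {
    std::uint64_t timestamp_us;  // when the event happened
    std::string event;           // e.g. "start", "end"
};

struct Span {
    std::uint64_t trace_id;        // shared by every span of one request
    std::uint64_t span_id;         // identifies this processing phase
    std::uint64_t parent_span_id;  // 0 marks the root span
    std::string name;
    std::vector<Annotation> annotations;
};

// Begin a new trace: the root span has no parent.
inline Span start_root(std::uint64_t trace_id, std::uint64_t span_id,
                       const std::string& name) {
    return Span{trace_id, span_id, 0, name, {}};
}

// Begin a phase caused by `parent`: same trace id, parent link recorded.
inline Span start_child(const Span& parent, std::uint64_t span_id,
                        const std::string& name) {
    assert(span_id != 0 && span_id != parent.span_id);
    return Span{parent.trace_id, span_id, parent.span_id, name, {}};
}

// Attach a timestamped event to a span.
inline void annotate(Span& s, std::uint64_t timestamp_us,
                     const std::string& event) {
    s.annotations.push_back(Annotation{timestamp_us, event});
}

// Latency of a phase: time between its first and last annotation.
inline std::uint64_t latency_us(const Span& s) {
    if (s.annotations.size() < 2) return 0;
    return s.annotations.back().timestamp_us -
           s.annotations.front().timestamp_us;
}
```

A collector can rebuild the request's route by grouping spans on trace_id and
following parent_span_id links, and per-phase latency falls out of the
annotations.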
These patches can also be found at https://github.com/agshew/ceph/tree/wip-blkin
In addition to cleanup, I've written a short document describing how to
test Blkin tracing in Ceph (without Zipkin). See doc/dev/trace.rst
Note that I have asked Marios about a compiler warning for
ignoring the return value of write() in Message::init_trace_info().
The same calls also use a hardcoded file descriptor 3. I'm guessing this
code was just used by him for debugging Blkin and can be removed, but
I've left it for the moment.
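If the write() calls turn out to be needed after all, one way to address both
points would be a small helper that checks the return value and takes the file
descriptor as a parameter instead of hardcoding 3. This is only a sketch; the
helper name write_all is hypothetical and is not part of the patches.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: write the whole buffer to `fd`, retrying on EINTR
// and on partial writes. Returns false if write() reports any other error.
inline bool write_all(int fd, const void* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    const char* p = static_cast<const char*>(buf);
    while (len > 0) {
        ssize_t n = ::write(fd, p, len);
        if (n < 0 && errno == EINTR) {
            continue;  // interrupted before writing anything; retry
        }
        if (n <= 0) {
            return false;  // real error (or no progress); caller decides
        }
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}
```

The caller then has a single boolean to act on, which also silences the
unused-result warning without discarding the error.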
In the immediate future I plan to:
- push a wip-blkin branch to github.com/ceph and take advantage of gitbuilder test/qa
- move the changes forward to ceph:master
- add Andreas' tracepoints https://github.com/ceph/ceph/pull/2877 using Blkin
and investigate how easy it is to select the level of tracing detail
Questions:
1. Did I split the patches into sensible groups?
2. How low is LTTng's overhead? Is it entirely eliminated when not enabled?
Do we need to take advantage of something like the Linux kernel's CONFIG_DYNAMIC_FTRACE
trick, where a special mcount() function is converted back and forth between
a NOP and trace calls? See http://lwn.net/Articles/365835/ for a little more
detail.
3. Also on the topic of performance, does the API for adding key-value annotations
need variants that take vectorized arguments? For instance, when many details
about an event are required (e.g. read vs. write, length, etc.) or when
multiple types of events are created simultaneously?
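To illustrate what question 3 is asking, here is a minimal sketch comparing
one call per key-value pair with a batched call that carries all pairs in a
single event. TraceSink, keyval and keyvals are hypothetical stand-ins for a
tracing backend, not Blkin's API, and the in-memory vector only models "one
emitted event".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using KeyVal = std::pair<std::string, std::string>;

// Hypothetical stand-in for a tracing backend; it just stores events.
class TraceSink {
public:
    // Per-pair variant: every key-value becomes its own event.
    void keyval(const std::string& key, const std::string& val) {
        events_.push_back(std::vector<KeyVal>{KeyVal(key, val)});
    }

    // Batched variant: all pairs travel in one event.
    void keyvals(const std::vector<KeyVal>& kvs) {
        if (kvs.empty()) return;
        events_.push_back(kvs);
    }

    // Number of events emitted so far.
    std::size_t event_count() const { return events_.size(); }

    // Total number of key-value pairs across all events.
    std::size_t pair_count() const {
        std::size_t n = 0;
        for (const auto& ev : events_) n += ev.size();
        return n;
    }

private:
    std::vector<std::vector<KeyVal>> events_;
};
```

Recording op=write and length=4096 costs two events with keyval() but one with
keyvals(). Whether that difference matters in practice depends on the
per-event cost of the real tracer, which is exactly the open question.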
* Re: [PATCH 0/6] blkin (LTTng + Zipkin) tracing
From: Sage Weil @ 2014-11-13 16:14 UTC
To: Andrew Shewmaker; +Cc: ceph-devel
On Wed, 12 Nov 2014, Andrew Shewmaker wrote:
> The following patches are a cleaned up version of the work
> Marios Kogias first posted in August.
> http://www.spinics.net/lists/ceph-devel/msg19890.html
> The changes have been made against Ceph 0.80.1, and will be
> moved forward soon.
>
> With them Ceph can use Blkin, a library created by Marios Kogias and others,
> which enables tracking a specific request from the time it enters
> the system at higher levels till it is finally served by RADOS.
>
> In general, Blkin implements the tracing semantics described in the Dapper
> paper http://static.googleusercontent.com/media/research.google.com/el/pubs/archive/36356.pdf
> in order to trace the causal relationships between the different
> processing phases that an IO request may trigger. The goal is an end-to-end
> visualisation of the request's route in the system, accompanied by information
> concerning latencies in each processing phase. Thanks to LTTng this can happen
> with a minimal overhead and in realtime. In order to visualize the results Blkin
> was integrated with Twitter's Zipkin http://twitter.github.io/zipkin/
> (which is a tracing system entirely based on Dapper).
>
> These patches can also be found in https://github.com/agshew/ceph/tree/wip-blkin
This looks great! Do you mind opening a github pull request from that
branch? It's a bit more convenient for capturing review.
> In addition to cleanup, I've written a short document describing how to
> test Blkin tracing in Ceph (without Zipkin). See doc/dev/trace.rst
>
> Note that I have a question in to Marios concerning a compiler warning for
> ignoring the return value of write() in Message::init_trace_info().
> The same calls also use a hardcoded file descriptor 3. I'm guessing this
> code was just used by him for debugging Blkin and can be removed, but
> I've left it for the moment.
>
> In the immediate future I plan to:
>
> - push a wip-blkin branch to github.com/ceph and take advantage of gitbuilder test/qa
> - move the changes forward to ceph:master
> - add Andreas' tracepoints https://github.com/ceph/ceph/pull/2877 using Blkin
> and investigate how easy it is to select the level of tracing detail
>
> Questions:
>
> 1. Did I split the patches into sensible groups?
Patch 1 could be broken into the build changes and the msg/optracker code. It
looks like it unconditionally links against zipkin-cpp now, which we
probably don't want. Unless blkin is statically linked or something, but
I don't see anything in the patch that would do that yet. In any case,
having the build stuff in a separate patch helps.
The split for the rest looks fine. Need to look at the changes to osd
init carefully as it is a bit delicate.
> 2. How low is LTTng's overhead? Is it entirely eliminated when not enabled?
>
> Do we need to take advantage of something like the Linux kernel's CONFIG_DYNAMIC_FTRACE
> trick, where a special mcount() function is converted back and forth between
> a NOP and trace calls? See http://lwn.net/Articles/365835/ for a little more
> detail.
I always assumed that lttng was doing something like this, but I don't see
a clear explanation of what an inactive tracepoint looks like anywhere.
sage
> 3. Also on the topic of performance, does the API for adding keyvalues need versions
> of annotations that used tracing functions with vectorized arguments? For instance,
> when many details about an event are required (e.g. read vs. write, length, etc.)
> or if multiple types of events are created simultaneously?
* Re: [PATCH 0/6] blkin (LTTng + Zipkin) tracing
From: Andrew Shewmaker @ 2014-11-13 17:56 UTC
To: Sage Weil; +Cc: ceph-devel
On Thu, Nov 13, 2014 at 08:14:48AM -0800, Sage Weil wrote:
> On Wed, 12 Nov 2014, Andrew Shewmaker wrote:
<snip>
> > In general, Blkin implements the tracing semantics described in the Dapper
> > paper http://static.googleusercontent.com/media/research.google.com/el/pubs/archive/36356.pdf
> > in order to trace the causal relationships between the different
> > processing phases that an IO request may trigger. The goal is an end-to-end
> > visualisation of the request's route in the system, accompanied by information
> > concerning latencies in each processing phase. Thanks to LTTng this can happen
> > with a minimal overhead and in realtime. In order to visualize the results Blkin
> > was integrated with Twitter's Zipkin http://twitter.github.io/zipkin/
> > (which is a tracing system entirely based on Dapper).
> >
> > These patches can also be found in https://github.com/agshew/ceph/tree/wip-blkin
>
> This looks great! Do you mind opening a github pull request from that
> branch? It's a bit more convenient for capturing review.
I'll do that, but first I need to make changes to autoconf/automake
for blkin. It isn't actually building.
I had been going down the road of treating it simply
as a separate package from ceph, then decided to include it as a
submodule. My branch only built on my system because I had already
installed blkin separately.
<snip>
> > In the immediate future I plan to:
> >
> > - push a wip-blkin branch to github.com/ceph and take advantage of gitbuilder test/qa
> > - move the changes forward to ceph:master
> > - add Andreas' tracepoints https://github.com/ceph/ceph/pull/2877 using Blkin
> > and investigate how easy it is to select the level of tracing detail
> >
> > Questions:
> >
> > 1. Did I split the patches into sensible groups?
>
> 1 could be broken into the build changes and the msg/optracker code. It
> looks like it unconditionally links against zipkin-cpp now, which we
> probably don't want. Unless blkin is statically linked or something, but
> I don't see anything in the patch that would do that yet. In any case,
> having the build stuff in a separate patch helps.
Right. Makes sense. I'll split that out for version 2.
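One possible shape for avoiding the unconditional link dependency is a
compile-time guard, so that the tracing calls become no-ops when the library
is not configured in. This is only a sketch: the macro WITH_BLKIN, the
function trace_event and the vector standing in for the tracing library are
all hypothetical and are not taken from the patches.

```cpp
#include <cassert>
#include <string>
#include <vector>

// WITH_BLKIN is a hypothetical macro that a configure switch might define.
#ifdef WITH_BLKIN
constexpr bool kTracingCompiledIn = true;
#else
constexpr bool kTracingCompiledIn = false;
#endif

// Record an event if tracing was compiled in; otherwise do nothing.
// `sink` is a stand-in for the real tracing library.
inline bool trace_event(const std::string& name,
                        std::vector<std::string>& sink) {
#ifdef WITH_BLKIN
    sink.push_back(name);  // placeholder for a call into the tracing library
    return true;
#else
    (void)name;
    (void)sink;
    return false;  // no library symbols referenced, so nothing to link
#endif
}
```

Call sites stay identical in both configurations, and only the build that
defines the macro needs the library at link time.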
> The split for the rest looks fine. Need to look at the changes to osd
> init carefully as it is a bit delicate.
>
> > 2. How low is LTTng's overhead? Is it entirely eliminated when not enabled?
> >
> > Do we need to take advantage of something like the Linux kernel's CONFIG_DYNAMIC_FTRACE
> > trick, where a special mcount() function is converted back and forth between
> > a NOP and trace calls? See http://lwn.net/Articles/365835/ for a little more
> > detail.
>
> I always assumed that lttng was doing something like this, but I don't see
> a clear explanation of what an inactive tracepoint looks like anywhere..
Yeah, I suppose they must, but I couldn't find a nice short
explanation either. LWN has a couple of articles that don't go deeply
into it. At most, they mention the use of asm gotos
(http://lwn.net/Articles/491543/).
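For comparison with the NOP-patching trick, here is a minimal sketch of the
simpler alternative: a tracepoint guarded by a flag, where the disabled path
costs one load and a branch and the arguments are never evaluated. I am not
claiming this is how LTTng implements inactive tracepoints; Tracepoint and
TRACE_IF_ENABLED are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical tracepoint: a flag plus a counter standing in for the
// real work of recording an event.
struct Tracepoint {
    std::atomic<bool> enabled{false};
    std::uint64_t hits{0};

    void fire(std::uint64_t /*payload*/) { ++hits; }
};

// Evaluate `expr` and fire the tracepoint only when it is enabled.
// When disabled, the cost is a relaxed load and a not-taken branch.
#define TRACE_IF_ENABLED(tp, expr)                                 \
    do {                                                           \
        if ((tp).enabled.load(std::memory_order_relaxed)) {        \
            (tp).fire(expr);                                       \
        }                                                          \
    } while (0)
```

The asm goto approach mentioned in the LWN article goes one step further by
removing the load as well, at the price of patching code at run time.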