All of lore.kernel.org
 help / color / mirror / Atom feed
* (unknown), 
@ 2015-03-11  2:43 Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 1/5] Build support for BlkKin (LTTng + Zipkin) tracing Andrew Shewmaker
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Andrew Shewmaker @ 2015-03-11  2:43 UTC (permalink / raw)
  To: ceph-devel; +Cc: agshew, marioskogias, chendi.xue

The following patches are based on the work of Marios Kogias, first
posted in August. http://www.spinics.net/lists/ceph-devel/msg19890.html
This patch is against HEAD as of March 10th,
commit 5d5b510810e96503b9323b010149f7bd5b45db7c.
It can also be found at https://github.com/agshew/ceph/tree/wip-blkin-v6

Thanks to Josh Durgin again for comments on the V5 patchset.

I think the blkin patchset is looking pretty good at this point.


Outstanding issues:

1) librados will need more general blkin tracing
currently it only has aio_read_traced() and aio_write_traced() calls

2) some work will need to be done filtering blkin events/keyvalues


To Do:

 1. push a wip-blkin branch to github.com/ceph and take advantage of gitbuilder test/qa
 2. submit a pull request
 3. add Andreas' tracepoints https://github.com/ceph/ceph/pull/2877 using Blkin
    and investigate how easy it is to select the level of tracing detail


Changes since V5:

  * put MOSDOp encode(features) statement back that was accidentally left out in V5
  * moved OSD daemonize call back to original spot
  * initialized blkin in ceph-mds (and moved all initializations to first patch)
  * updated aio_read_traced() and aio_write_traced() to match non-traced versions
  * improved blkin wrapper readability by removing unnecessary stringification

Changes since V4:

  * removed messenger_end trace event
    In Pipe::reader(), when message is enqueued, it will be destroyed.
    Naive pointer checks don't work here. You can't depend on
    pointers being set to null on destruction. It may be possible to wrap
    trace event with m->get() and m->put() to keep it around, or put this
    trace event in dispatch paths, but just removing trace event for now
    in order to move forward.
  * removed mutex in aio_*_traced() methods
    A mutex was carried forward from Marios' original patch while rebasing
    when it should have been removed.
  * removed Message::trace_end_after_span
    Message::trace_end_after_span did not ever appear to be true, so
    it has been removed
  * added asserts for master trace and endpoint checks
    Tried to use asserts in more places, but they prevented execution.
    Tried to use douts and ldouts instead, but they didn't work.
  * added parens around macro pointer args
    parens make it safer to use pointers passed as arguments in a macro

Changes since V2:

  * WITH_BLKIN added to makefile vars when necessary
  * added Blkin build instructions
  * added Zipkin build instructions
  * Blkin wrapper macros do not stringify args any longer.
    The macro wrappers will be more flexible/robust if they don't
    turn arguments into strings.
  * added missing blkin_trace_info struct prototype to librados.h
  * TrackedOp trace creation methods are virtual, implemented in OpRequest
  * avoid faults due to non-existent traces
    Check if osd_trace exists when creating a pg_trace, etc.
    Return true only if trace creation was successful.
    Use dout() if trace_osd, trace_pg, etc. fail, in order to ease debugging.
  * create trace_osd in ms_fast_dispatch

Changes since V1:
  * split build changes into separate patch
  * conditional build support for blkin (default off)
  * blkin is not a Ceph repo submodule
    build and install packages from https://github.com/agshew/blkin.git
    Note: rpms don't support babeltrace plugins for use with Zipkin
  * removal of debugging in Message::init_trace_info()

With this patchset Ceph can use Blkin, a library created by
Marios Kogias and others, which enables tracking a specific request
from the time it enters the system at higher levels till it is finally
served by RADOS.

In general, Blkin implements the tracing semantics described in the Dapper
paper http://static.googleusercontent.com/media/research.google.com/el/pubs/archive/36356.pdf
in order to trace the causal relationships between the different
processing phases that an IO request may trigger. The goal is an end-to-end
visualisation of the request's route in the system, accompanied by information
concerning latencies in each processing phase. Thanks to LTTng this can happen
with a minimal overhead and in realtime. In order to visualize the results Blkin
was integrated with Twitter's Zipkin http://twitter.github.io/zipkin/
(which is a tracing system entirely based on Dapper).

A short document describing how to test Blkin tracing in Ceph with Zipkin
is in doc/dev/trace.rst

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH V6 1/5] Build support for BlkKin (LTTng + Zipkin) tracing
  2015-03-11  2:43 (unknown), Andrew Shewmaker
@ 2015-03-11  2:43 ` Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 2/5] Initial support for blkin " Andrew Shewmaker
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Andrew Shewmaker @ 2015-03-11  2:43 UTC (permalink / raw)
  To: ceph-devel; +Cc: agshew, marioskogias, chendi.xue

 * Adds --with-blkin to autoconf (default without)
 * Adds BLKIN_LIBS to necessary automake files
 * Adds documentation for testing BlkKin tracing
 * Adds ztrace initialization

Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
Signed-off-by: Marios-Evaggelos Kogias <marioskogias@gmail.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
---
 configure.ac             |   11 +++
 do_autogen.sh            |    5 +
 doc/dev/blkin.rst        |  168 +++++++++++++++++++++++++++++++++++++++++++++++
 src/Makefile-env.am      |    6 +
 src/Makefile.am          |   14 +++
 src/ceph_mds.cc          |    2 
 src/ceph_mon.cc          |    2 
 src/ceph_osd.cc          |    2 
 src/librados/Makefile.am |    9 ++
 src/librbd/Makefile.am   |   10 ++
 src/mon/Makefile.am      |   11 ++-
 src/msg/Makefile.am      |    8 ++
 src/os/Makefile.am       |    8 ++
 src/osd/Makefile.am      |    9 ++
 src/osdc/Makefile.am     |    9 ++
 15 files changed, 271 insertions(+), 3 deletions(-)
 create mode 100644 doc/dev/blkin.rst

diff --git a/configure.ac b/configure.ac
index 05f0cf9..78d431d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -333,6 +333,17 @@ if test "x$enable_coverage" != xno; then
 fi
 AC_SUBST(GCOV_PREFIX_STRIP, `echo $(pwd)/src | tr -dc / | wc -c`)
 
+# blkin (lttng+zipkin) tracing?
+AC_ARG_WITH([blkin],
+	    [AS_HELP_STRING([--with-blkin], [blkin (lttng + zipkin) tracing])],
+	    [],
+	    [with_blkin=no])
+have_blkin=no
+AS_IF([test "x$with_blkin" == "xyes"],
+    [PKG_CHECK_MODULES([BLKIN], [blkin], [have_blkin=yes])])
+AM_CONDITIONAL(WITH_BLKIN, test "x$have_blkin" == xyes)
+AM_COND_IF([WITH_BLKIN], [AC_DEFINE([WITH_BLKIN], [1], [Defined if using BlkKin])])
+
 # radosgw?
 AC_ARG_WITH([radosgw],
             [AS_HELP_STRING([--with-radosgw], [build RADOS gateway])],
diff --git a/do_autogen.sh b/do_autogen.sh
index 7df1c93..0e50654 100755
--- a/do_autogen.sh
+++ b/do_autogen.sh
@@ -5,6 +5,7 @@ usage() {
 do_autogen.sh: make a ceph build by running autogen, etc.
 
 -h:                              this help message
+-b                               blkin tracing
 -d <level>                       debug build
                                  level 0: no debug
                                  level 1: -g
@@ -32,9 +33,11 @@ debug_level=0
 verbose=0
 profile=0
 CONFIGURE_FLAGS="--disable-static --with-lttng"
-while getopts  "d:e:hHrTPJLjpcvO:" flag
+while getopts  "bd:e:hHrTPJLjpcvO:" flag
 do
     case $flag in
+    b) CONFIGURE_FLAGS="$CONFIGURE_FLAGS --with-blkin";;
+
     d) debug_level=$OPTARG;;
 
     O) CFLAGS="${CFLAGS} -O$OPTARG";;
diff --git a/doc/dev/blkin.rst b/doc/dev/blkin.rst
new file mode 100644
index 0000000..bf14142
--- /dev/null
+++ b/doc/dev/blkin.rst
@@ -0,0 +1,168 @@
+=========================
+ Tracing Ceph With BlkKin
+=========================
+
+Ceph can use Blkin, a library created by Marios Kogias and others,
+which enables tracking a specific request from the time it enters
+the system at higher levels till it is finally served by RADOS.
+
+In general, Blkin implements the Dapper_ tracing semantics
+in order to show the causal relationships between the different
+processing phases that an IO request may trigger. The goal is an
+end-to-end visualisation of the request's route in the system,
+accompanied by information concerning latencies in each processing
+phase. Thanks to LTTng this can happen with a minimal overhead and
+in realtime. The LTTng traces can then be visualized with Twitter's
+Zipkin_.
+
+.. _Dapper: http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf
+.. _Zipkin: http://twitter.github.io/zipkin/
+
+
+Installing Blkin
+================
+
+You can install Markos Kogias' upstream Blkin_ by hand.::
+
+  cd blkin/
+  make && make install
+
+ or build
+distribution packages using DistroReadyBlkin_, which also comes
+with pkgconfig support. If you choose the latter, then you must
+generate the configure and make files first.::
+
+  cd blkin
+  autoreconf -i
+
+.. _Blkin: https://github.com/marioskogias/blkin
+.. _DistroReadyBlkin: https://github.com/agshew/blkin
+
+
+Configuring Ceph with Blkin
+===========================
+
+If you built and installed Blkin by hand, rather than building and
+installing packages, then set these variables before configuring
+Ceph.::
+
+  export BLKIN_CFLAGS=-Iblkin/
+  export BLKIN_LIBS=-lzipkin-cpp
+
+Since there are separate lttng and blkin changes to Ceph, you may
+want to configure with something like::
+
+  ./configure --with-blkin --without-lttng --with-debug
+
+
+Testing Blkin
+=============
+
+It's easy to test Ceph's Blkin tracing. Let's assume you don't have
+Ceph already running, and you compiled Ceph with Blkin support but
+you did't install it. Then launch Ceph with the ``vstart.sh`` script
+in Ceph's src directgory so you can see the possible tracepoints.::
+
+  cd src
+  OSD=3 MON=3 RGW=1 ./vstart.sh -n
+  lttng list --userspace
+
+You'll see something like the following:::
+
+  UST events:
+  -------------
+  PID: 8987 - Name: ./ceph-osd
+        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
+        zipkin:keyval (loglevel: TRACE_WARNING (4)) (type: tracepoint)
+        ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
+
+  PID: 8407 - Name: ./ceph-mon
+        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
+        zipkin:keyval (loglevel: TRACE_WARNING (4)) (type: tracepoint)
+        ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
+
+  ...
+
+Next, stop Ceph so that the tracepoints can be enabled.::
+
+  ./stop.sh
+
+Start up an LTTng session and enable the tracepoints.::
+
+  lttng create blkin-test
+  lttng enable-event --userspace zipkin:timestamp
+  lttng enable-event --userspace zipkin:keyval
+  lttng start
+
+Then start up Ceph again.::
+
+  OSD=3 MON=3 RGW=1 ./vstart.sh -n
+
+You may want to check that ceph is up.::
+
+  ./ceph status
+
+Now put something in usin rados, check that it made it, get it back, and remove it.::
+
+  ./rados mkpool test-blkin
+  ./rados put test-object-1 ./vstart.sh --pool=test-blkin
+  ./rados -p test-blkin ls
+  ./ceph osd map test-blkin test-object-1
+  ./rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
+  md5sum vstart*
+  ./rados rm test-object-1 --pool=test-blkin
+
+You could also use the example in ``examples/librados/`` or ``rados bench``.
+
+Then stop the LTTng session and see what was collected.::
+
+  lttng stop
+  lttng view
+
+You'll see something like:::
+
+  [13:09:07.755054973] (+?.?????????) scruffy zipkin:timestamp: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, event = "Message allocated" }
+  [13:09:07.755071569] (+0.000016596) scruffy zipkin:keyval: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, key = "Type", val = "MOSDOp" }
+  [13:09:07.755074217] (+0.000002648) scruffy zipkin:keyval: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, key = "Reqid", val = "client.4126.0:1" }
+  ...
+
+
+Install  Zipkin
+===============
+One of the points of using Blkin is so that you can look at the traces
+using Zipkin. Users should run Zipkin as a tracepoints collector and
+also a web service, which means users need to run three services,
+zipkin-collector, zipkin-query and zipkin-web.::
+
+Download Zipkin Package::
+
+  wget https://github.com/twitter/zipkin/archive/1.1.0.tar.gz
+  tar zxf 1.1.0.tar.gz
+  cd zipkin-1.1.0
+  bin/collector cassandra &
+  bin/query cassandra &
+  bin/web &
+
+Check Zipkin::
+
+  bin/test
+  Browse http://${zipkin-web-ip}:8080
+
+
+Show Ceph's Blkin Traces in Zipkin-web
+======================================
+Blkin provides a script which translates lttng result to Zipkin
+(Dapper) semantics.::
+
+Send lttng data to Zipkin::
+
+  python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port(9410 by default)} -s ${zipkin-collector-ip}
+
+Example::
+
+  python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1
+
+Check Ceph traces on webpage::
+
+  Browse http://${zipkin-web-ip}:8080
+  Click "Find traces"
diff --git a/src/Makefile-env.am b/src/Makefile-env.am
index 8716f15..dea5565 100644
--- a/src/Makefile-env.am
+++ b/src/Makefile-env.am
@@ -190,6 +190,12 @@ if ENABLE_COVERAGE
 EXTRALIBS += -lgcov
 endif # ENABLE_COVERAGE
 
+if WITH_BLKIN
+AM_CPPFLAGS += -DWITH_BLKIN
+AM_CXXFLAGS += $(BLKIN_CFLAGS)
+EXTRALIBS += $(BLKIN_LIBS)
+endif
+
 # Libosd always needs osdc and os
 LIBOSD += $(LIBOSDC) $(LIBOS)
 
diff --git a/src/Makefile.am b/src/Makefile.am
index 116be9d..1a5035a 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -40,11 +40,25 @@ include tools/Makefile.am
 # core daemons
 
 ceph_mon_SOURCES = ceph_mon.cc
+ceph_mon_CPPFLAGS = $(AM_CPPFLAGS)
+ceph_mon_CXXFLAGS = $(AM_CXXFLAGS)
 ceph_mon_LDADD = $(LIBMON) $(LIBOS) $(CEPH_GLOBAL) $(LIBCOMMON)
+if WITH_BLKIN
+ceph_mon_CPPFLAGS += -DWITH_BLKIN
+ceph_mon_CXXFLAGS += $(BLKIN_CFLAGS)
+ceph_mon_LDADD += $(BLKIN_LIBS)
+endif
 bin_PROGRAMS += ceph-mon
 
 ceph_osd_SOURCES = ceph_osd.cc
+ceph_osd_CPPFLAGS = $(AM_CPPFLAGS)
+ceph_osd_CXXFLAGS = $(AM_CXXFLAGS)
 ceph_osd_LDADD = $(LIBOSD) $(CEPH_GLOBAL) $(LIBCOMMON)
+if WITH_BLKIN
+ceph_osd_CPPFLAGS += -DWITH_BLKIN
+ceph_osd_CXXFLAGS += $(BLKIN_CFLAGS)
+ceph_osd_LDADD += $(BLKIN_LIBS)
+endif
 bin_PROGRAMS += ceph-osd
 
 ceph_mds_SOURCES = ceph_mds.cc
diff --git a/src/ceph_mds.cc b/src/ceph_mds.cc
index 06269fb..182bd9d 100644
--- a/src/ceph_mds.cc
+++ b/src/ceph_mds.cc
@@ -198,6 +198,8 @@ int main(int argc, const char **argv)
 
   msgr->start();
 
+  BLKIN_ZTRACE_INIT();
+
   // start mds
   mds = new MDS(g_conf->name.get_id().c_str(), msgr, &mc);
 
diff --git a/src/ceph_mon.cc b/src/ceph_mon.cc
index e0ca6e2..6c3f03f 100644
--- a/src/ceph_mon.cc
+++ b/src/ceph_mon.cc
@@ -509,6 +509,8 @@ int main(int argc, const char **argv)
       prefork.exit(1);
   }
 
+  BLKIN_ZTRACE_INIT();
+
   MonitorDBStore *store = new MonitorDBStore(g_conf->mon_data);
   err = store->open(std::cerr);
   if (err < 0) {
diff --git a/src/ceph_osd.cc b/src/ceph_osd.cc
index 884b7ed..469468f 100644
--- a/src/ceph_osd.cc
+++ b/src/ceph_osd.cc
@@ -186,6 +186,8 @@ int main(int argc, const char **argv)
     usage();
   }
 
+  BLKIN_ZTRACE_INIT();
+
   // the store
   ObjectStore *store = ObjectStore::create(g_ceph_context,
 					   g_conf->osd_objectstore,
diff --git a/src/librados/Makefile.am b/src/librados/Makefile.am
index 8e6de3f..59ee46d 100644
--- a/src/librados/Makefile.am
+++ b/src/librados/Makefile.am
@@ -14,6 +14,8 @@ librados_la_SOURCES = \
 	common/buffer.cc \
 	librados/librados.cc
 
+librados_la_CPPFLAGS = $(AM_CPPFLAGS)
+
 # We need this to avoid basename conflicts with the librados build tests in test/Makefile.am
 librados_la_CXXFLAGS = ${AM_CXXFLAGS}
 
@@ -25,6 +27,13 @@ LIBRADOS_DEPS += $(LIBRADOS_TP)
 endif
 
 librados_la_LIBADD = $(LIBRADOS_DEPS) $(PTHREAD_LIBS) $(CRYPTO_LIBS) $(EXTRALIBS)
+
+if WITH_BLKIN
+librados_la_CPPFLAGS += -DWITH_BLKIN
+librados_la_CXXFLAGS += $(BLKIN_CXXFLAGS)
+librados_la_LIBADD += $(BLKIN_LIBS)
+endif
+
 librados_la_LDFLAGS = ${AM_LDFLAGS} -version-info 2:0:0
 if LINUX
 librados_la_CXXFLAGS += -fvisibility=hidden -fvisibility-inlines-hidden
diff --git a/src/librbd/Makefile.am b/src/librbd/Makefile.am
index a4b21d9..9cc6460 100644
--- a/src/librbd/Makefile.am
+++ b/src/librbd/Makefile.am
@@ -37,11 +37,19 @@ if WITH_LTTNG
 librbd_la_LIBADD += $(LIBRBD_TP)
 endif
 
+librbd_la_CPPFLAGS = $(AM_CPPFLAGS)
+librbd_la_CXXFLAGS = $(AM_CXXFLAGS)
 librbd_la_LDFLAGS = ${AM_LDFLAGS} -version-info 1:0:0
 if LINUX
-librbd_la_CXXFLAGS = -fvisibility=hidden -fvisibility-inlines-hidden
+librbd_la_CXXFLAGS += -fvisibility=hidden -fvisibility-inlines-hidden
 librbd_la_LDFLAGS += -Xcompiler -Xlinker -Xcompiler '--exclude-libs=ALL'
 endif
+
+if WITH_BLKIN
+librbd_la_CPPFLAGS += -DWITH_BLKIN
+librbd_la_CXXFLAGS += $(BLKIN_CFLAGS)
+librbd_la_LIBADD += $(BLKIN_LIBS)
+endif
 lib_LTLIBRARIES += librbd.la
 
 noinst_HEADERS += \
diff --git a/src/mon/Makefile.am b/src/mon/Makefile.am
index 6016355..8ab5063 100644
--- a/src/mon/Makefile.am
+++ b/src/mon/Makefile.am
@@ -17,7 +17,17 @@ libmon_la_SOURCES = \
 	mon/HealthMonitor.cc \
 	mon/DataHealthService.cc \
 	mon/ConfigKeyService.cc
+
+libmon_la_CPPFLAGS = $(AM_CPPFLAGS)
+libmon_la_CXXFLAGS = $(AM_CXXFLAGS)
 libmon_la_LIBADD = $(LIBAUTH) $(LIBCOMMON) $(LIBOS) $(LIBMON_TYPES)
+
+if WITH_BLKIN
+libmon_la_CPPFLAGS += -DWITH_BLKIN
+libmon_la_CXXFLAGS += $(BLKIN_CFLAGS)
+libmon_la_LIBADD += $(BLKIN_LIBS)
+endif
+
 noinst_LTLIBRARIES += libmon.la
 
 noinst_HEADERS += \
@@ -46,4 +56,3 @@ noinst_HEADERS += \
 	mon/QuorumService.h \
 	mon/Session.h \
 	mon/mon_types.h
-
diff --git a/src/msg/Makefile.am b/src/msg/Makefile.am
index 6ef821b..2230789 100644
--- a/src/msg/Makefile.am
+++ b/src/msg/Makefile.am
@@ -3,6 +3,14 @@ libmsg_la_SOURCES = \
 	msg/Messenger.cc \
 	msg/msg_types.cc
 
+libmsg_la_CPPFLAGS = $(AM_CPPFLAGS)
+libmsg_la_CXXFLAGS = $(AM_CXXFLAGS)
+if WITH_BLKIN
+libmsg_la_CPPFLAGS += -DWITH_BLKIN
+libmsg_la_CXXFLAGS += $(BLKIN_CFLAGS)
+libmsg_la_LIBADD = $(BLKIN_LIBS)
+endif
+
 noinst_HEADERS += \
 	msg/Connection.h \
 	msg/Dispatcher.h \
diff --git a/src/os/Makefile.am b/src/os/Makefile.am
index d12adba..77aa56a 100644
--- a/src/os/Makefile.am
+++ b/src/os/Makefile.am
@@ -36,12 +36,20 @@ if WITH_LIBZFS
 libos_la_SOURCES += os/ZFSFileStoreBackend.cc
 endif
 
+libos_la_CPPFLAGS = $(AM_CPPFLAGS)
 libos_la_CXXFLAGS = ${AM_CXXFLAGS}
 libos_la_LIBADD = $(LIBOS_TYPES)
+
 if WITH_LTTNG
 libos_la_LIBADD += $(LIBOS_TP)
 endif
 
+if WITH_BLKIN
+libos_la_CPPFLAGS += -DWITH_BLKIN
+libos_la_CXXFLAGS += $(BLKIN_CFLAGS)
+libos_la_LIBADD += $(BLKIN_LIBS)
+endif
+
 noinst_LTLIBRARIES += libos.la
 
 noinst_HEADERS += \
diff --git a/src/osd/Makefile.am b/src/osd/Makefile.am
index 2338f18..4371ba6 100644
--- a/src/osd/Makefile.am
+++ b/src/osd/Makefile.am
@@ -24,11 +24,20 @@ libosd_la_SOURCES = \
 	osd/SnapMapper.cc \
 	objclass/class_api.cc
 
+libosd_la_CPPFLAGS = $(AM_CPPFLAGS)
 libosd_la_CXXFLAGS = ${AM_CXXFLAGS}
 libosd_la_LIBADD = $(LIBOSDC) $(LIBOS) $(LIBOSD_TYPES) $(LIBOS_TYPES)
+
 if WITH_LTTNG
 libosd_la_LIBADD += $(LIBOSD_TP)
 endif
+
+if WITH_BLKIN
+libosd_la_CPPFLAGS += -DWITH_BLKIN
+libosd_la_CXXFLAGS += $(BLKIN_CFLAGS)
+libosd_la_LIBADD += $(BLKIN_LIBS)
+endif
+
 noinst_LTLIBRARIES += libosd.la
 
 noinst_HEADERS += \
diff --git a/src/osdc/Makefile.am b/src/osdc/Makefile.am
index 3ba446b..7b56637 100644
--- a/src/osdc/Makefile.am
+++ b/src/osdc/Makefile.am
@@ -4,6 +4,15 @@ libosdc_la_SOURCES = \
 	osdc/Filer.cc \
 	osdc/Striper.cc \
 	osdc/Journaler.cc
+
+libosdc_la_CPPFLAGS = $(AM_CPPFLAGS)
+libosdc_la_CXXFLAGS = $(AM_CXXFLAGS)
+if WITH_BLKIN
+libosdc_la_CPPFLAGS += -DWITH_BLKIN
+libosdc_la_CXXFLAGS += $(BLKIN_CFLAGS)
+libosdc_la_LIBADD = $(BLKIN_LIBS)
+endif
+
 noinst_LTLIBRARIES += libosdc.la
 
 noinst_HEADERS += \

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V6 2/5] Initial support for blkin (LTTng + Zipkin) tracing
  2015-03-11  2:43 (unknown), Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 1/5] Build support for BlkKin (LTTng + Zipkin) tracing Andrew Shewmaker
@ 2015-03-11  2:43 ` Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 3/5] OSD " Andrew Shewmaker
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Andrew Shewmaker @ 2015-03-11  2:43 UTC (permalink / raw)
  To: ceph-devel; +Cc: agshew, marioskogias, chendi.xue

 * Adds conditional build support.
 * Adds BLKIN_* macro wrappers to reduce need for ifdef WITH_BLKIN.
 * Adds blkin header include.
 * Adds message infrastructure for blkin tracing.
 * Adds blkin trace events and keyvals to OpTracker.
 * Adds osd, pg, journal, and filestore blkin traces to OpTracker/OpRequest.

These changes are Marios', with the following exceptions:
 * split out initial support for Marios' additional tracing changes
 * conditional build and BLKIN_* macro wrappers
 * blkin include path is "blkin/ztracer.hpp"
 * only Message.h includes blkin, as others include Message.h
 * commented code has been removed
 * unused lname var/code in SimpleMessenger has been removed
 * excluded messenger_trace because it appears not to be used
 * braces added to conditionals to match recommended coding style
   note: did not change member variable names to use m_ prefix
         since member vars in same class did not
 * Ref suffix added to TrackedOp shared_ptr vars to be consistent
   note: did not add -Ref suffix to ZTrace*Ref vars in Message
 * TrackedOp trace creation methods are virtual, implemented in OpRequest
 * avoid faults due to non-existent traces
   Check if osd_trace exists when creating a pg_trace, etc.
   Return true only if trace creation was successful.
   Use dout() if trace_osd, trace_pg, etc. fail, in order to ease debugging.

Signed-off-by: Marios-Evaggelos Kogias <marioskogias@gmail.com>
Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
---
 src/common/TrackedOp.cc           |  82 ++++++++++++++++++
 src/common/TrackedOp.h            |  26 ++++++
 src/common/blkin_wrapper.h        | 171 ++++++++++++++++++++++++++++++++++++++
 src/msg/Message.cc                |  76 +++++++++++++++++
 src/msg/Message.h                 |  25 ++++++
 src/msg/msg_types.cc              |  23 ++++-
 src/msg/msg_types.h               |   1 +
 src/msg/simple/Pipe.cc            |  31 +++++++
 src/msg/simple/Pipe.h             |   4 +
 src/msg/simple/SimpleMessenger.cc |   1 +
 10 files changed, 437 insertions(+), 3 deletions(-)
 create mode 100644 src/common/blkin_wrapper.h

diff --git a/src/common/TrackedOp.cc b/src/common/TrackedOp.cc
index 32dbc53..611911f 100644
--- a/src/common/TrackedOp.cc
+++ b/src/common/TrackedOp.cc
@@ -293,8 +293,22 @@ void OpTracker::_mark_event(TrackedOp *op, const string &evt,
      
 }
 
+#ifdef WITH_BLKIN
+void OpTracker::trace_event(TrackedOp *op, TrackedOpTraceRef t, const string &evt, TrackedOpEndpointRef ep)
+{
+  t->event(evt, ep);
+}
+
+void OpTracker::trace_keyval(TrackedOp *op, TrackedOpTraceRef t, const string &key,
+			     const string &val, TrackedOpEndpointRef ep)
+{
+  t->keyval(key, val, ep);
+}
+#endif
+
 void OpTracker::RemoveOnDelete::operator()(TrackedOp *op) {
   op->mark_event("done");
+  BLKIN_OP_TRACE_EVENT(op, osd, "span_ended");
   if (!tracker->tracking_enabled) {
     op->_unregistered();
     delete op;
@@ -332,3 +346,71 @@ void TrackedOp::dump(utime_t now, Formatter *f) const
     f->close_section();
   }
 }
+
+#ifdef WITH_BLKIN
+void TrackedOp::trace_osd(string event)
+{
+  if (!osd_trace) {
+    dout(5) << "trace_osd failed, osd_trace doesn't exist, event: " << event << dendl;
+    return;
+  }
+
+  tracker->trace_event(this, osd_trace, event, osd_trace->get_endpoint());
+}
+
+void TrackedOp::trace_osd(string key, string val)
+{
+  if (!osd_trace) {
+    dout(5) << "trace_osd failed, osd_trace doesn't exist, key: " << key << " val: " << val << dendl;
+    return;
+  }
+
+  tracker->trace_keyval(this, osd_trace, key, val, osd_trace->get_endpoint());
+}
+
+void TrackedOp::trace_pg(string event)
+{
+  if (!pg_trace) {
+    dout(5) << "trace_pg failed, pg_trace doesn't exist, event: " << event << dendl;
+    return;
+  } else if (!osd_trace) {
+    dout(5) << "trace_pg failed, osd_trace doesn't exist, event: " << event << dendl;
+    return;
+  }
+
+  tracker->trace_event(this, osd_trace, event, pg_trace->get_endpoint());
+}
+
+void TrackedOp::get_pg_trace_info(struct blkin_trace_info *info)
+{
+  if (!pg_trace) {
+    dout(5) << "get_pg_trace failed, pg_trace doesn't exist" << dendl;
+    return;
+  } else if (!osd_trace) {
+    dout(5) << "get_pg_trace failed, osd_trace doesn't exist" << dendl;
+    return;
+  }
+
+  osd_trace->get_trace_info(info);
+}
+
+void TrackedOp::trace_journal(string event)
+{
+  if (!journal_trace) {
+    dout(5) << "trace_journal failed, journal_trace doesn't exist, event: " << event << dendl;
+    return;
+  }
+
+  tracker->trace_event(this, journal_trace, event, journal_trace->get_endpoint());
+}
+
+void TrackedOp::trace_filestore(string event)
+{
+  if (!filestore_trace) {
+    dout(5) << "trace_filestore failed, filestore_trace doesn't exist, event: " << event << dendl;
+    return;
+  }
+
+  tracker->trace_event(this, filestore_trace, event, filestore_trace->get_endpoint());
+}
+#endif // WITH_BLKIN
diff --git a/src/common/TrackedOp.h b/src/common/TrackedOp.h
index 42d1eaf..fc861f7 100644
--- a/src/common/TrackedOp.h
+++ b/src/common/TrackedOp.h
@@ -116,6 +116,11 @@ public:
   void mark_event(TrackedOp *op, const string &evt,
                           utime_t time = ceph_clock_now(g_ceph_context));
 
+#ifdef WITH_BLKIN
+  void trace_event(TrackedOp *op, TrackedOpTraceRef t, const string &evt, TrackedOpEndpointRef ep);
+  void trace_keyval(TrackedOp *op, TrackedOpTraceRef t, const string &key,
+		    const string &val, TrackedOpEndpointRef ep);
+#endif
   void on_shutdown() {
     history.on_shutdown();
   }
@@ -150,6 +155,13 @@ protected:
   string current; /// the current state the event is in
   uint64_t seq; /// a unique value set by the OpTracker
 
+#ifdef WITH_BLKIN
+  TrackedOpTraceRef osd_trace;
+  TrackedOpTraceRef pg_trace;
+  TrackedOpTraceRef journal_trace;
+  TrackedOpTraceRef filestore_trace;
+#endif
+
   uint32_t warn_interval_multiplier; // limits output of a given op warning
 
   TrackedOp(OpTracker *_tracker, const utime_t& initiated) :
@@ -191,6 +203,20 @@ public:
     return events.rbegin()->second.c_str();
   }
   void dump(utime_t now, Formatter *f) const;
+
+#ifdef WITH_BLKIN
+  virtual bool create_osd_trace(TrackedOpEndpointRef ep) { return false; }
+  virtual bool create_pg_trace(TrackedOpEndpointRef ep) { return false; }
+  virtual bool create_journal_trace(TrackedOpEndpointRef ep) { return false; }
+  virtual bool create_filestore_trace(TrackedOpEndpointRef ep) { return false; }
+  void trace_osd(string event);
+  void trace_osd(string key, string val);
+  void trace_pg(string event);
+  void get_pg_trace_info(struct blkin_trace_info *info);
+  void trace_journal(string event);
+  void trace_filestore(string event);
+  TrackedOpTraceRef get_osd_trace() { return osd_trace; };
+#endif // WITH_BLKIN
 };
 
 #endif
diff --git a/src/common/blkin_wrapper.h b/src/common/blkin_wrapper.h
new file mode 100644
index 0000000..04b8ced
--- /dev/null
+++ b/src/common/blkin_wrapper.h
@@ -0,0 +1,171 @@
+// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
+// vim: ts=8 sw=2 smarttab
+#ifdef WITH_BLKIN
+
+// string helper
+#define BLKIN_OSS(OSS, A) \
+  ostringstream OSS; \
+  OSS << A;
+
+// src/msg/Message.h
+#define BLKIN_GET_MASTER(MT) \
+  ZTracer::ZTraceRef MT = get_master_trace();
+
+#define BLKIN_MSG_GET_MASTER(MT, MSG) \
+  ZTracer::ZTraceRef MT = (MSG)->get_master_trace();
+
+#define BLKIN_ZTRACE_INIT() do { ZTracer::ztrace_init(); } while(0)
+
+#define BLKIN_INIT_TRACE(T) do { init_trace_info(T); } while(0)
+
+#define BLKIN_MSG_INIT_TRACE(M, T) do { (M)->init_trace_info(T); } while(0)
+
+#define BLKIN_MSG_INIT_TRACE_IF(C, M, T) \
+  do { if (C) (M)->init_trace_info(T); } while(0)
+
+#define BLKIN_MSG_DO_INIT_TRACE() \
+  struct blkin_trace_info tinfo; \
+  tinfo.trace_id = 0; \
+  tinfo.span_id = 0; \
+  tinfo.parent_span_id = 0;
+
+#define BLKIN_MSG_ENCODE_TRACE() \
+  do { \
+    if (mt) { \
+      struct blkin_trace_info tinfo; \
+      mt->get_trace_info(&tinfo); \
+      ::encode(tinfo.trace_id, payload); \
+      ::encode(tinfo.span_id, payload); \
+      ::encode(tinfo.parent_span_id, payload); \
+    } else { \
+      int64_t zero = 0; \
+      ::encode(zero, payload); \
+      ::encode(zero, payload); \
+      ::encode(zero, payload); \
+    }; \
+  } while(0)
+
+#define BLKIN_MSG_DECODE_TRACE(V) \
+  do { \
+    if (header.version >= V) { \
+      ::decode(tinfo.trace_id, p); \
+      ::decode(tinfo.span_id, p); \
+      ::decode(tinfo.parent_span_id, p); \
+    }; \
+    init_trace_info(&tinfo); \
+  } while(0)
+
+#define BLKIN_MSG_CHECK_SPAN() \
+  do { \
+    struct blkin_trace_info tinfo; \
+    ZTracer::ZTraceRef mt = req->get_master_trace(); \
+    assert(mt != NULL); \
+    mt->get_trace_info(&tinfo); \
+  } while(0)
+
+#define BLKIN_MSG_INFO_DECL(TYPE) \
+  void trace_msg_info() \
+  { \
+    assert(master_trace != NULL); \
+    ostringstream oss; \
+    oss << get_reqid(); \
+    master_trace->keyval("Type", TYPE); \
+    master_trace->keyval("Reqid", oss.str()); \
+  }
+
+#define BLKIN_MSG_END_DECL(TYPE) \
+  bool create_message_endpoint() \
+  { \
+    message_endpoint = ZTracer::create_ZTraceEndpoint("0.0.0.0", 0, TYPE); \
+    assert(message_endpoint != NULL); \
+    return true; \
+  }
+
+#define BLKIN_MSG_END(EP, IP, PORT, NAME) \
+  do { \
+    EP = ZTracer::create_ZTraceEndpoint(IP, PORT, NAME); \
+  } while(0)
+
+#define BLKIN_MSG_TRACE_EVENT(M, E) do { (M)->trace(E); } while(0)
+
+#define BLKIN_MSG_TRACE_EVENT_IF(C, M, E) do { if (C) (M)->trace(E); } while(0)
+
+#define BLKIN_MSG_TRACE_KEYVAL(M, K, V) do { (M)->trace(K, V); } while(0)
+
+// src/msg/Pipe.h
+#define BLKIN_PIPE_ENDPOINT() do { set_endpoint(); } while(0)
+
+// src/common/TrackedOp.h
+typedef ZTracer::ZTraceEndpointRef TrackedOpEndpointRef;
+typedef ZTracer::ZTraceRef TrackedOpTraceRef;
+
+#define BLKIN_END_REF(X) TrackedOpEndpointRef X;
+#define BLKIN_TRACE_REF(X) TrackedOpTraceRef X;
+
+#define BLKIN_CREATE_TRACE(REF, ...) \
+  do { REF = ZTracer::create_ZTrace(__VA_ARGS__); } while(0)
+
+#define BLKIN_OP_CREATE_TRACE(OP, TYPE, EP) \
+  do { (OP)->create_ ## TYPE ## _trace(EP); } while(0)
+
+#define BLKIN_OP_TRACE_EVENT(OP, TYPE, E) \
+  do { (OP)->trace_ ## TYPE (E); } while(0)
+
+#define BLKIN_TYPE_TRACE_EVENT(TYPE, E) \
+  do { trace_ ## TYPE (E); } while(0)
+
+#define BLKIN_OP_TRACE_KEYVAL(OP, TYPE, K, V) \
+  do { (OP)->trace_ ## TYPE (K, V); } while(0)
+
+#define BLKIN_OP_EVENT(O, E) do { (O)->event(E); } while(0)
+
+#define BLKIN_OP_EVENT_IF(C, O, E) do { if (C) (O)->event(E); } while(0)
+
+#define BLKIN_OP_SET_TRACE_DECL() \
+  void set_trace(TrackedOpTraceRef t) \
+  { \
+    trace = t; \
+  }
+
+#define BLKIN_OP_SET_TRACE(O, T) do { (O)->set_trace(T); } while(0)
+
+#else // Not WITH_BLKIN
+
+// string helper
+#define BLKIN_OSS(OSS, A) do { } while(0)
+
+// src/msg/Message.h
+#define BLKIN_GET_MASTER(MT) do { } while(0)
+#define BLKIN_MSG_GET_MASTER(MT, MSG) do { } while(0)
+#define BLKIN_ZTRACE_INIT() do { } while(0)
+#define BLKIN_INIT_TRACE(T) do { } while(0)
+#define BLKIN_MSG_INIT_TRACE(M, T) do { } while(0)
+#define BLKIN_MSG_INIT_TRACE_IF(C, M, T) do { } while(0)
+#define BLKIN_MSG_DO_INIT_TRACE() do { } while(0)
+#define BLKIN_MSG_ENCODE_TRACE() do { } while(0)
+#define BLKIN_MSG_DECODE_TRACE(V) do { } while(0)
+#define BLKIN_MSG_CHECK_SPAN() do { } while(0)
+#define BLKIN_MSG_INFO_DECL(TYPE)
+#define BLKIN_MSG_END_DECL(TYPE)
+#define BLKIN_MSG_END(EP, IP, PORT, NAME) do { } while(0)
+#define BLKIN_MSG_TRACE_EVENT(M, E) do { } while(0)
+#define BLKIN_MSG_TRACE_EVENT_IF(C, M, E) do { } while(0)
+#define BLKIN_MSG_TRACE_KEYVAL(M, K, V) do { } while(0)
+
+// src/msg/Pipe.h
+#define BLKIN_PIPE_ENDPOINT() do { } while(0)
+
+// src/common/TrackedOp.h
+#define BLKIN_END_REF(X)
+#define BLKIN_TRACE_REF(X)
+#define BLKIN_CREATE_TRACE(REF, ...) do { } while(0)
+#define BLKIN_OP_CREATE_TRACE(OP, TYPE, EP) do { } while(0)
+#define BLKIN_OP_TRACE_EVENT(OP, TYPE, E) do { } while(0)
+#define BLKIN_TYPE_TRACE_EVENT(TYPE, E) do { } while(0)
+#define BLKIN_OP_TRACE_KEYVAL(OP, TYPE, K, V) do { } while(0)
+#define BLKIN_OP_EVENT(O, E) do { } while(0)
+#define BLKIN_OP_EVENT_IF(C, O, E) do { } while(0)
+#define BLKIN_OP_SET_TRACE_DECL()
+#define BLKIN_OP_SET_TRACE(O, T) do { } while(0)
+
+#endif // WITH_BLKIN
diff --git a/src/msg/Message.cc b/src/msg/Message.cc
index 23f5179..968d368 100644
--- a/src/msg/Message.cc
+++ b/src/msg/Message.cc
@@ -817,3 +817,79 @@ Message *decode_message(CephContext *cct, int crcflags, bufferlist::iterator& p)
   return decode_message(cct, crcflags, h, f, fr, mi, da);
 }
 
+#ifdef WITH_BLKIN
+int Message::trace_basic_info()
+{
+  if (!master_trace) {
+    return 0;
+  }
+
+  master_trace->event("Message allocated");
+  trace_msg_info();
+  return 0;
+}
+
+int Message::init_trace_info()
+{
+  create_message_endpoint();
+  master_trace = ZTracer::create_ZTrace("Main", message_endpoint);
+  if (!master_trace) {
+    return -ENOMEM;
+  }
+
+  return trace_basic_info();
+}
+
+int Message::init_trace_info(ZTracer::ZTraceRef t)
+{
+  if (!t) {
+    return -EINVAL;
+  }
+
+  create_message_endpoint();
+  master_trace = ZTracer::create_ZTrace("Main", t, message_endpoint);
+  if (!master_trace) {
+    return -ENOMEM;
+  }
+
+  return trace_basic_info();
+}
+
+int Message::init_trace_info(struct blkin_trace_info *tinfo)
+{
+  if (!(tinfo->trace_id == 0 && tinfo->span_id == 0 && tinfo->parent_span_id == 0)) {
+    create_message_endpoint();
+    master_trace = ZTracer::create_ZTrace("Main", message_endpoint, tinfo);
+    return trace_basic_info();
+  }
+  return init_trace_info();
+}
+
+void Message::trace(string event)
+{
+  if (!master_trace) {
+    return;
+  }
+
+  master_trace->event(event);
+}
+
+void Message::trace(string key, string val)
+{
+  if (!master_trace) {
+    return;
+  }
+
+  master_trace->keyval(key, val);
+}
+
+bool Message::create_message_endpoint()
+{
+  message_endpoint = ZTracer::create_ZTraceEndpoint("0.0.0.0", 0, "NaN");
+  if (!message_endpoint) {
+    return false;
+  }
+
+  return true;
+}
+#endif // WITH_BLKIN
diff --git a/src/msg/Message.h b/src/msg/Message.h
index ddba0e5..a7c1e54 100644
--- a/src/msg/Message.h
+++ b/src/msg/Message.h
@@ -18,6 +18,11 @@
 #include <stdlib.h>
 #include <ostream>
 
+#ifdef WITH_BLKIN
+#include <ztracer.hpp>
+#endif
+#include "common/blkin_wrapper.h"
+
 #include <boost/intrusive_ptr.hpp>
 #include <boost/intrusive/list.hpp>
 // Because intusive_ptr clobbers our assert...
@@ -257,6 +262,10 @@ protected:
   // currently throttled.
   uint64_t dispatch_throttle_size;
 
+#ifdef WITH_BLKIN
+  ZTracer::ZTraceEndpointRef message_endpoint;
+  ZTracer::ZTraceRef master_trace;
+#endif
   friend class Messenger;
 
 public:
@@ -270,6 +279,7 @@ public:
     memset(&header, 0, sizeof(header));
     memset(&footer, 0, sizeof(footer));
   }
+
   Message(int t, int version=1, int compat_version=0)
     : connection(NULL),
       magic(0),
@@ -326,6 +336,10 @@ public:
   uint32_t get_magic() const { return magic; }
   void set_magic(int _magic) { magic = _magic; }
 
+#ifdef WITH_BLKIN
+  ZTracer::ZTraceRef get_master_trace() { return master_trace; }
+#endif // WITH_BLKIN
+
   /*
    * If you use get_[data, middle, payload] you shouldn't
    * use it to change those bufferlists unless you KNOW
@@ -457,6 +471,17 @@ public:
   virtual void dump(Formatter *f) const;
 
   void encode(uint64_t features, int crcflags);
+
+#ifdef WITH_BLKIN
+  int init_trace_info();
+  int init_trace_info(struct blkin_trace_info *tinfo);
+  int init_trace_info(ZTracer::ZTraceRef t);
+  void trace(string event);
+  void trace(string key, string val);
+  int trace_basic_info();
+  virtual void trace_msg_info() { };
+  virtual bool create_message_endpoint();
+#endif // WITH_BLKIN
 };
 typedef boost::intrusive_ptr<Message> MessageRef;
 
diff --git a/src/msg/msg_types.cc b/src/msg/msg_types.cc
index b02db76..ba36b8d 100644
--- a/src/msg/msg_types.cc
+++ b/src/msg/msg_types.cc
@@ -132,7 +132,7 @@ bool entity_addr_t::parse(const char *s, const char **end)
 
 
 
-ostream& operator<<(ostream& out, const sockaddr_storage &ss)
+int ss_to_string(const sockaddr_storage &ss, string &host, int &port)
 {
   char buf[NI_MAXHOST] = { 0 };
   char serv[NI_MAXSERV] = { 0 };
@@ -147,8 +147,25 @@ ostream& operator<<(ostream& out, const sockaddr_storage &ss)
   getnameinfo((struct sockaddr *)&ss, hostlen, buf, sizeof(buf),
 	      serv, sizeof(serv),
 	      NI_NUMERICHOST | NI_NUMERICSERV);
+  host = buf;
+  port = atoi(serv);
+  return 0;
+}
+
+int entity_addr_t::to_string(string &host, int &port)
+{
+  return ss_to_string(addr, host, port);
+}
+
+
+ostream& operator<<(ostream& out, const sockaddr_storage &ss)
+{
+  string host;
+  int port;
+
+  ss_to_string(ss, host, port);
   if (ss.ss_family == AF_INET6)
-    return out << '[' << buf << "]:" << serv;
+    return out << '[' << host << "]:" << port;
   return out //<< ss.ss_family << ":"
-	     << buf << ':' << serv;
+	     << host << ':' << port;
 }
diff --git a/src/msg/msg_types.h b/src/msg/msg_types.h
index 9c0f266..ac32c85 100644
--- a/src/msg/msg_types.h
+++ b/src/msg/msg_types.h
@@ -336,6 +336,7 @@ struct entity_addr_t {
   }
 
   void dump(Formatter *f) const;
+  int to_string(string &host, int &port);
 
   static void generate_test_instances(list<entity_addr_t*>& o);
 };
diff --git a/src/msg/simple/Pipe.cc b/src/msg/simple/Pipe.cc
index 40bf41b..c1e411d 100644
--- a/src/msg/simple/Pipe.cc
+++ b/src/msg/simple/Pipe.cc
@@ -114,6 +114,7 @@ Pipe::Pipe(SimpleMessenger *r, int st, PipeConnection *con)
 
   recv_max_prefetch = msgr->cct->_conf->ms_tcp_prefetch_max_size;
   recv_buf = new char[recv_max_prefetch];
+  BLKIN_PIPE_ENDPOINT();
 }
 
 Pipe::~Pipe()
@@ -124,6 +125,33 @@ Pipe::~Pipe()
   delete[] recv_buf;
 }
 
+#ifdef WITH_BLKIN
+void Pipe::set_endpoint()
+{
+  string type;
+  entity_inst_t inst = msgr->get_myinst();
+
+  if (inst.name.is_client()) {
+    type = "MON";
+  } else if (inst.name.is_mds()) {
+    type = "MDS";
+  } else if (inst.name.is_osd()) {
+    type = "OSD";
+  } else if (inst.name.is_client()) {
+    type = "CLIENT";
+  } else {
+    type = "UNKNOWN";
+  }
+
+  string host;
+  int port;
+
+  inst.addr.to_string(host, port);
+
+  pipe_endpoint = ZTracer::create_ZTraceEndpoint(host, port, "Messenger-" + type);
+}
+#endif // WITH_BLKIN
+
 void Pipe::handle_ack(uint64_t seq)
 {
   lsubdout(msgr->cct, ms, 15) << "reader got ack seq " << seq << dendl;
@@ -1587,6 +1615,7 @@ void Pipe::reader()
 	  fault(true);
 	continue;
       }
+      BLKIN_MSG_TRACE_EVENT(m, "message_read");
 
       if (state == STATE_CLOSED ||
 	  state == STATE_CONNECTING) {
@@ -1822,6 +1851,8 @@ void Pipe::writer()
 
         pipe_lock.Unlock();
 
+	BLKIN_MSG_TRACE_EVENT(m, "writer_sending");
+
         ldout(msgr->cct,20) << "writer sending " << m->get_seq() << " " << m << dendl;
 	int rc = write_message(header, footer, blist);
 
diff --git a/src/msg/simple/Pipe.h b/src/msg/simple/Pipe.h
index ddfa99f..0e31bb3 100644
--- a/src/msg/simple/Pipe.h
+++ b/src/msg/simple/Pipe.h
@@ -195,6 +195,7 @@ class DispatchQueue;
     ceph::shared_ptr<AuthSessionHandler> session_security;
 
   protected:
+    BLKIN_END_REF(pipe_endpoint)
     friend class SimpleMessenger;
     PipeConnectionRef connection_state;
 
@@ -219,6 +220,9 @@ class DispatchQueue;
     uint64_t in_seq, in_seq_acked;
     
     void set_socket_options();
+#ifdef WITH_BLKIN
+    void set_endpoint();
+#endif
 
     int accept();   // server handshake
     int connect();  // client handshake
diff --git a/src/msg/simple/SimpleMessenger.cc b/src/msg/simple/SimpleMessenger.cc
index 38b5d84..308492a 100644
--- a/src/msg/simple/SimpleMessenger.cc
+++ b/src/msg/simple/SimpleMessenger.cc
@@ -113,6 +113,7 @@ int SimpleMessenger::_send_message(Message *m, const entity_inst_t& dest)
 
   lock.Lock();
   Pipe *pipe = _lookup_pipe(dest.addr);
+  BLKIN_MSG_TRACE_EVENT(m, "submitting_message");
   submit_message(m, (pipe ? pipe->connection_state.get() : NULL),
                  dest.addr, dest.name.type(), true);
   lock.Unlock();

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V6 3/5] OSD support for blkin (LTTng + Zipkin) tracing
  2015-03-11  2:43 (unknown), Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 1/5] Build support for BlkKin (LTTng + Zipkin) tracing Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 2/5] Initial support for blkin " Andrew Shewmaker
@ 2015-03-11  2:43 ` Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 4/5] Rados " Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 5/5] Rados example " Andrew Shewmaker
  4 siblings, 0 replies; 6+ messages in thread
From: Andrew Shewmaker @ 2015-03-11  2:43 UTC (permalink / raw)
  To: ceph-devel; +Cc: agshew, marioskogias, chendi.xue

 * Adds tracing for OSDs and Placement Groups, specifically:
   messages for OSD operations, FileJournal, FileStore,
   ECBackend, replication, and traced versions of Objecter
   read/write (only basic read/write functions).
 * Moves global_init_daemonize() earlier in ceph_osd.c:main

Signed-off-by: Marios-Evaggelos Kogias <marioskogias@gmail.com>
Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
---
 20 files changed, 323 insertions(+), 14 deletions(-)
 messages/MOSDOp.h         |   13 ++++++++
 messages/MOSDOpReply.h    |   13 ++++++++
 messages/MOSDSubOp.h      |   11 ++++++
 messages/MOSDSubOpReply.h |   12 +++++++
 os/FileJournal.cc         |   14 ++++++--
 os/FileJournal.h          |    6 +++
 os/FileStore.cc           |   20 ++++++++++++
 os/FileStore.h            |    1 
 osd/ECBackend.cc          |    8 +++-
 osd/OSD.cc                |   21 +++++++++++++
 osd/OSD.h                 |    1 
 osd/OpRequest.cc          |   74 ++++++++++++++++++++++++++++++++++++++++++++++
 osd/OpRequest.h           |    9 +++++
 osd/PG.cc                 |    2 +
 osd/PG.h                  |    1 
 osd/ReplicatedBackend.cc  |   19 +++++++++--
 osd/ReplicatedPG.cc       |   35 +++++++++++++++++++++
 osdc/Objecter.cc          |    7 ++++
 osdc/Objecter.h           |   63 ++++++++++++++++++++++++++++++++++++++-
 19 files changed, 319 insertions(+), 11 deletions(-)

diff --git a/src/messages/MOSDOp.h b/src/messages/MOSDOp.h
index 5b88f31..388c9b9 100644
--- a/src/messages/MOSDOp.h
+++ b/src/messages/MOSDOp.h
@@ -32,7 +32,11 @@ class OSD;
 
 class MOSDOp : public Message {
 
+#ifdef WITH_BLKIN
+  static const int HEAD_VERSION = 6;
+#else
   static const int HEAD_VERSION = 5;
+#endif
   static const int COMPAT_VERSION = 3;
 
 private:
@@ -185,6 +189,7 @@ public:
 
   // marshalling
   virtual void encode_payload(uint64_t features) {
+    BLKIN_GET_MASTER(mt);
 
     OSDOp::merge_osd_op_vector_in_data(ops, data);
 
@@ -261,11 +266,14 @@ struct ceph_osd_request_head {
 
       ::encode(retry_attempt, payload);
       ::encode(features, payload);
+
+      BLKIN_MSG_ENCODE_TRACE();
     }
   }
 
   virtual void decode_payload() {
     bufferlist::iterator p = payload.begin();
+    BLKIN_MSG_DO_INIT_TRACE();
 
     if (header.version < 2) {
       // old decode
@@ -348,6 +356,8 @@ struct ceph_osd_request_head {
 	::decode(features, p);
       else
 	features = 0;
+
+      BLKIN_MSG_DECODE_TRACE(6);
     }
 
     OSDOp::split_osd_op_vector_in_data(ops, data);
@@ -390,6 +400,9 @@ struct ceph_osd_request_head {
     out << " e" << osdmap_epoch;
     out << ")";
   }
+
+  BLKIN_MSG_INFO_DECL("MOSDOp")
+  BLKIN_MSG_END_DECL("MOSDOp")
 };
 
 
diff --git a/src/messages/MOSDOpReply.h b/src/messages/MOSDOpReply.h
index b2d5155..64f01c1 100644
--- a/src/messages/MOSDOpReply.h
+++ b/src/messages/MOSDOpReply.h
@@ -32,7 +32,11 @@
 
 class MOSDOpReply : public Message {
 
+#ifdef WITH_BLKIN
+  static const int HEAD_VERSION = 7;
+#else
   static const int HEAD_VERSION = 6;
+#endif
   static const int COMPAT_VERSION = 2;
 
   object_t oid;
@@ -143,6 +147,7 @@ public:
       if (ignore_out_data)
 	ops[i].outdata.clear();
     }
+    BLKIN_MSG_CHECK_SPAN();
   }
 private:
   ~MOSDOpReply() {}
@@ -150,6 +155,8 @@ private:
 public:
   virtual void encode_payload(uint64_t features) {
 
+    BLKIN_GET_MASTER(mt);
+
     OSDOp::merge_osd_op_vector_out_data(ops, data);
 
     if ((features & CEPH_FEATURE_PGID64) == 0) {
@@ -189,10 +196,13 @@ public:
       ::encode(replay_version, payload);
       ::encode(user_version, payload);
       ::encode(redirect, payload);
+
+      BLKIN_MSG_ENCODE_TRACE();
     }
   }
   virtual void decode_payload() {
     bufferlist::iterator p = payload.begin();
+    BLKIN_MSG_DO_INIT_TRACE();
     if (header.version < 2) {
       ceph_osd_reply_head head;
       ::decode(head, p);
@@ -244,6 +254,8 @@ public:
 
       if (header.version >= 6)
 	::decode(redirect, p);
+
+      BLKIN_MSG_DECODE_TRACE(7);
     }
   }
 
@@ -270,6 +282,7 @@ public:
     out << ")";
   }
 
+  BLKIN_MSG_END_DECL("MOSDOpReply")
 };
 
 
diff --git a/src/messages/MOSDSubOp.h b/src/messages/MOSDSubOp.h
index 544dfcf..96f48a9 100644
--- a/src/messages/MOSDSubOp.h
+++ b/src/messages/MOSDSubOp.h
@@ -25,8 +25,13 @@
 
 class MOSDSubOp : public Message {
 
+#ifdef WITH_BLKIN
+  static const int HEAD_VERSION = 12;
+#else
   static const int HEAD_VERSION = 11;
   static const int COMPAT_VERSION = 7;
+#endif
+  static const int COMPAT_VERSION = 1;
 
 public:
   epoch_t map_epoch;
@@ -103,6 +108,7 @@ public:
     //version >=7
     assert (header.version >= 7);
     bufferlist::iterator p = payload.begin();
+    BLKIN_MSG_DO_INIT_TRACE();
     ::decode(map_epoch, p);
     ::decode(reqid, p);
     ::decode(pgid.pgid, p);
@@ -170,6 +176,7 @@ public:
     } else {
       pg_trim_rollback_to = pg_trim_to;
     }
+    BLKIN_MSG_DECODE_TRACE(12);
   }
 
   virtual void encode_payload(uint64_t features) {
@@ -177,6 +184,7 @@ public:
     ::encode(reqid, payload);
     ::encode(pgid.pgid, payload);
     ::encode(poid, payload);
+    BLKIN_GET_MASTER(mt);
 
     __u32 num_ops = ops.size();
     ::encode(num_ops, payload);
@@ -221,6 +229,7 @@ public:
     ::encode(pgid.shard, payload);
     ::encode(updated_hit_set_history, payload);
     ::encode(pg_trim_rollback_to, payload);
+    BLKIN_MSG_ENCODE_TRACE();
   }
 
   MOSDSubOp()
@@ -262,6 +271,8 @@ public:
       out << ", has_updated_hit_set_history";
     out << ")";
   }
+
+  BLKIN_MSG_END_DECL("MOSDSubOp")
 };
 
 
diff --git a/src/messages/MOSDSubOpReply.h b/src/messages/MOSDSubOpReply.h
index a084246..dfff58e 100644
--- a/src/messages/MOSDSubOpReply.h
+++ b/src/messages/MOSDSubOpReply.h
@@ -30,7 +30,11 @@
  */
 
 class MOSDSubOpReply : public Message {
+#ifdef WITH_BLKIN
+  static const int HEAD_VERSION = 3;
+#else
   static const int HEAD_VERSION = 2;
+#endif
   static const int COMPAT_VERSION = 1;
 public:
   epoch_t map_epoch;
@@ -55,6 +59,7 @@ public:
 
   virtual void decode_payload() {
     bufferlist::iterator p = payload.begin();
+    BLKIN_MSG_DO_INIT_TRACE();
     ::decode(map_epoch, p);
     ::decode(reqid, p);
     ::decode(pgid.pgid, p);
@@ -84,8 +89,11 @@ public:
 	shard_id_t::NO_SHARD);
       pgid.shard = shard_id_t::NO_SHARD;
     }
+    BLKIN_MSG_DECODE_TRACE(3);
   }
   virtual void encode_payload(uint64_t features) {
+    BLKIN_GET_MASTER(mt);
+
     ::encode(map_epoch, payload);
     ::encode(reqid, payload);
     ::encode(pgid.pgid, payload);
@@ -102,6 +110,8 @@ public:
     ::encode(attrset, payload);
     ::encode(from, payload);
     ::encode(pgid.shard, payload);
+
+    BLKIN_MSG_ENCODE_TRACE();
   }
 
   epoch_t get_map_epoch() { return map_epoch; }
@@ -138,6 +148,7 @@ public:
     result(result_) {
     memset(&peer_stat, 0, sizeof(peer_stat));
     set_tid(req->get_tid());
+    BLKIN_MSG_CHECK_SPAN();
   }
   MOSDSubOpReply() : Message(MSG_OSD_SUBOPREPLY) {}
 private:
@@ -160,6 +171,7 @@ public:
     out << ")";
   }
 
+  BLKIN_MSG_END_DECL("MOSDSubOpReply")
 };
 
 
diff --git a/src/os/FileJournal.cc b/src/os/FileJournal.cc
index c6bb6f2..8e024d0 100644
--- a/src/os/FileJournal.cc
+++ b/src/os/FileJournal.cc
@@ -874,8 +874,11 @@ void FileJournal::queue_completions_thru(uint64_t seq)
     }
     if (next.finish)
       finisher->queue(next.finish);
-    if (next.tracked_op)
+    if (next.tracked_op) {
       next.tracked_op->mark_event("journaled_completion_queued");
+      BLKIN_OP_TRACE_EVENT(next.tracked_op, journal, "journaled_completion_queued");
+      BLKIN_OP_TRACE_EVENT(next.tracked_op, journal, "span_ended");
+    }
   }
   finisher_cond.Signal();
 }
@@ -934,8 +937,10 @@ int FileJournal::prepare_single_write(bufferlist& bl, off64_t& queue_pos, uint64
   }
   bl.append((const char*)&h, sizeof(h));
 
-  if (next_write.tracked_op)
+  if (next_write.tracked_op) {
     next_write.tracked_op->mark_event("write_thread_in_journal_buffer");
+    BLKIN_OP_TRACE_EVENT(next_write.tracked_op, journal, "write_thread_in_journal_buffer");
+  }
 
   // pop from writeq
   pop_write();
@@ -1498,8 +1503,11 @@ void FileJournal::submit_entry(uint64_t seq, bufferlist& e, int alignment,
 
   throttle_ops.take(1);
   throttle_bytes.take(e.length());
-  if (osd_op)
+  if (osd_op) {
     osd_op->mark_event("commit_queued_for_journal_write");
+    BLKIN_OP_CREATE_TRACE(osd_op, journal, journal_endpoint);
+    BLKIN_OP_TRACE_EVENT(osd_op, journal, "commit_queued_for_journal_write");
+  }
   if (logger) {
     logger->set(l_os_jq_max_ops, throttle_ops.get_max());
     logger->set(l_os_jq_max_bytes, throttle_bytes.get_max());
diff --git a/src/os/FileJournal.h b/src/os/FileJournal.h
index 574c902..9622441 100644
--- a/src/os/FileJournal.h
+++ b/src/os/FileJournal.h
@@ -215,6 +215,7 @@ public:
   } __attribute__((__packed__, aligned(4)));
 
 private:
+  BLKIN_END_REF(journal_endpoint)
   string fn;
 
   char *zero_buf;
@@ -385,7 +386,10 @@ private:
     write_stop(false),
     aio_stop(false),
     write_thread(this),
-    write_finish_thread(this) { }
+    write_finish_thread(this)
+  {
+    BLKIN_MSG_END(journal_endpoint, "", 0, "Journal (" + fn + ")");
+  }
   ~FileJournal() {
     delete[] zero_buf;
   }
diff --git a/src/os/FileStore.cc b/src/os/FileStore.cc
index f6c3bb8..2d2bcee 100644
--- a/src/os/FileStore.cc
+++ b/src/os/FileStore.cc
@@ -563,6 +563,7 @@ FileStore::FileStore(const std::string &base, const std::string &jdev, osflagbit
   m_filestore_max_inline_xattrs(0)
 {
   m_filestore_kill_at.set(g_conf->filestore_kill_at);
+  BLKIN_MSG_END(filestore_endpoint, "", 0, "Filestore (" + basedir + "(" + name + ")" + ")");
 
   ostringstream oss;
   oss << basedir << "/current";
@@ -1717,6 +1718,9 @@ void FileStore::queue_op(OpSequencer *osr, Op *o)
   // so that regardless of which order the threads pick up the
   // sequencer, the op order will be preserved.
 
+  if (o->osd_op) {
+    BLKIN_OP_TRACE_EVENT(o->osd_op, osd, "queueing_for_filestore");
+  }
   osr->queue(o);
 
   logger->inc(l_os_ops);
@@ -1728,6 +1732,9 @@ void FileStore::queue_op(OpSequencer *osr, Op *o)
 	  << "   (queue has " << op_queue_len << " ops and " << op_queue_bytes << " bytes)"
 	  << dendl;
   op_wq.queue(osr);
+  if (o->osd_op) {
+    BLKIN_OP_TRACE_EVENT(o->osd_op, osd, "queued_for_filestore");
+  }
 }
 
 void FileStore::op_queue_reserve_throttle(Op *o, ThreadPool::TPHandle *handle)
@@ -1798,8 +1805,17 @@ void FileStore::_do_op(OpSequencer *osr, ThreadPool::TPHandle &handle)
   osr->apply_lock.Lock();
   Op *o = osr->peek_queue();
   apply_manager.op_apply_start(o->op);
+
+  if (o->osd_op) {
+    BLKIN_OP_CREATE_TRACE(o->osd_op, filestore, filestore_endpoint);
+    BLKIN_OP_TRACE_EVENT(o->osd_op, filestore, "filestore_dequeued");
+  }
   dout(5) << "_do_op " << o << " seq " << o->op << " " << *osr << "/" << osr->parent << " start" << dendl;
   int r = _do_transactions(o->tls, o->op, &handle);
+
+  if (o->osd_op) {
+    BLKIN_OP_TRACE_EVENT(o->osd_op, filestore, "filestore_finished");
+  }
   apply_manager.op_apply_finish(o->op);
   dout(10) << "_do_op " << o << " seq " << o->op << " r = " << r
 	   << ", finisher " << o->onreadable << " " << o->onreadable_sync << dendl;
@@ -1809,6 +1825,10 @@ void FileStore::_finish_op(OpSequencer *osr)
 {
   list<Context*> to_queue;
   Op *o = osr->dequeue(&to_queue);
+  if (o->osd_op) {
+    BLKIN_OP_TRACE_EVENT(o->osd_op, filestore, "filestore_finishing_op");
+    BLKIN_OP_TRACE_EVENT(o->osd_op, filestore, "span_ended");
+  }
   
   dout(10) << "_finish_op " << o << " seq " << o->op << " " << *osr << "/" << osr->parent << dendl;
   osr->apply_lock.Unlock();  // locked in _do_op
diff --git a/src/os/FileStore.h b/src/os/FileStore.h
index af1fb8d..6605a97 100644
--- a/src/os/FileStore.h
+++ b/src/os/FileStore.h
@@ -118,6 +118,7 @@ public:
   }
 
 private:
+  BLKIN_END_REF(filestore_endpoint)
   string internal_name;         ///< internal name, used to name the perfcounter instance
   string basedir, journalpath;
   osflagbits_t generic_flags;
diff --git a/src/osd/ECBackend.cc b/src/osd/ECBackend.cc
index 951b249..8fc4459 100644
--- a/src/osd/ECBackend.cc
+++ b/src/osd/ECBackend.cc
@@ -727,8 +727,10 @@ struct SubWriteCommitted : public Context {
     : pg(pg), msg(msg), tid(tid),
       version(version), last_complete(last_complete) {}
   void finish(int) {
-    if (msg)
+    if (msg) {
       msg->mark_event("sub_op_committed");
+      BLKIN_OP_TRACE_EVENT(msg, pg, "sub_op_committed");
+    }
     pg->sub_write_committed(tid, version, last_complete);
   }
 };
@@ -769,8 +771,10 @@ struct SubWriteApplied : public Context {
     eversion_t version)
     : pg(pg), msg(msg), tid(tid), version(version) {}
   void finish(int) {
-    if (msg)
+    if (msg) {
       msg->mark_event("sub_op_applied");
+      BLKIN_OP_TRACE_EVENT(msg, pg, "sub_op_applied");
+    }
     pg->sub_write_applied(tid, version);
   }
 };
diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 35854f5..30c8f06 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -21,6 +21,9 @@
 #include <signal.h>
 #include <ctype.h>
 #include <boost/scoped_ptr.hpp>
+#ifdef WITH_BLKIN
+#include <boost/lexical_cast.hpp>
+#endif
 
 #ifdef HAVE_SYS_PARAM_H
 #include <sys/param.h>
@@ -1572,6 +1575,8 @@ OSD::OSD(CephContext *cct_, ObjectStore *store_,
                                          cct->_conf->osd_op_log_threshold);
   op_tracker.set_history_size_and_duration(cct->_conf->osd_op_history_size,
                                            cct->_conf->osd_op_history_duration);
+
+  BLKIN_MSG_END(osd_endpoint, "", 0, "osd." + boost::lexical_cast<string>(whoami));
 }
 
 OSD::~OSD()
@@ -5435,6 +5440,9 @@ void OSD::ms_fast_dispatch(Message *m)
 #endif
     tracepoint(osd, ms_fast_dispatch, reqid.name._type,
         reqid.name._num, reqid.tid, reqid.inc);
+
+    BLKIN_OP_CREATE_TRACE(op, osd, osd_endpoint);
+    BLKIN_OP_TRACE_EVENT(op, osd, "waiting_on_osdmap");
   }
   OSDMapRef nextmap = service.get_nextmap_reserved();
   Session *session = static_cast<Session*>(m->get_connection()->get_priv());
@@ -5782,6 +5790,8 @@ void OSD::_dispatch(Message *m)
     {
       OpRequestRef op = op_tracker.create_request<OpRequest, Message*>(m);
       op->mark_event("waiting_for_osdmap");
+      BLKIN_OP_CREATE_TRACE(op, osd, osd_endpoint);
+      BLKIN_OP_TRACE_EVENT(op, osd, "waiting_for_osdmap");
       // no map?  starting up?
       if (!osdmap) {
         dout(7) << "no OSDMap, not booted" << dendl;
@@ -8020,6 +8030,8 @@ void OSD::handle_op(OpRequestRef& op, OSDMapRef& osdmap)
     return;
   }
 
+  BLKIN_OP_TRACE_EVENT(op, osd, "handling_op");
+
   // we don't need encoded payload anymore
   m->clear_payload();
 
@@ -8157,7 +8169,10 @@ void OSD::handle_op(OpRequestRef& op, OSDMapRef& osdmap)
   if (pg) {
     op->send_map_update = share_map.should_send;
     op->sent_epoch = m->get_map_epoch();
+    BLKIN_OP_CREATE_TRACE(op, pg, pg->pg_endpoint);
+    BLKIN_OP_TRACE_EVENT(op, pg, "enqueuing_op");
     enqueue_op(pg, op);
+    BLKIN_OP_TRACE_EVENT(op, pg, "enqueued_op");
     share_map.should_send = false;
   }
 }
@@ -8168,6 +8183,8 @@ void OSD::handle_replica_op(OpRequestRef& op, OSDMapRef& osdmap)
   T *m = static_cast<T *>(op->get_req());
   assert(m->get_type() == MSGTYPE);
 
+  BLKIN_OP_TRACE_EVENT(op, osd, "handling_replica_op");
+
   dout(10) << __func__ << " " << *m << " epoch " << m->map_epoch << dendl;
   if (!require_self_aliveness(op->get_req(), m->map_epoch))
     return;
@@ -8200,7 +8217,9 @@ void OSD::handle_replica_op(OpRequestRef& op, OSDMapRef& osdmap)
   if (pg) {
     op->send_map_update = should_share_map;
     op->sent_epoch = m->map_epoch;
+    BLKIN_OP_CREATE_TRACE(op, pg, pg->pg_endpoint);
     enqueue_op(pg, op);
+    BLKIN_OP_TRACE_EVENT(op, osd, "enqueued_replica_op");
   } else if (should_share_map && m->get_connection()->is_connected()) {
     C_SendMap *send_map = new C_SendMap(this, m->get_source(),
 					m->get_connection(),
@@ -8291,7 +8310,9 @@ void OSD::ShardedOpWQ::_process(uint32_t thread_index, heartbeat_handle_d *hb )
   delete f;
   *_dout << dendl;
 
+  BLKIN_OP_TRACE_EVENT(op, pg, "dequeuing_op");
   osd->dequeue_op(item.first, op, tp_handle);
+  BLKIN_OP_TRACE_EVENT(op, pg, "dequeued_op");
 
   {
 #ifdef WITH_LTTNG
diff --git a/src/osd/OSD.h b/src/osd/OSD.h
index b043d28..1e814cd 100644
--- a/src/osd/OSD.h
+++ b/src/osd/OSD.h
@@ -919,6 +919,7 @@ protected:
   AuthAuthorizeHandlerRegistry *authorize_handler_cluster_registry;
   AuthAuthorizeHandlerRegistry *authorize_handler_service_registry;
 
+  BLKIN_END_REF(osd_endpoint)
   Messenger   *cluster_messenger;
   Messenger   *client_messenger;
   Messenger   *objecter_messenger;
diff --git a/src/osd/OpRequest.cc b/src/osd/OpRequest.cc
index 1296334..fb8f981 100644
--- a/src/osd/OpRequest.cc
+++ b/src/osd/OpRequest.cc
@@ -131,3 +131,77 @@ void OpRequest::mark_flag_point(uint8_t flag, const string& s) {
 	     reqid.name._num, reqid.tid, reqid.inc, rmw_flags,
 	     flag, s.c_str(), old_flags, hit_flag_points);
 }
+
+#ifdef WITH_BLKIN
+bool OpRequest::create_osd_trace(TrackedOpEndpointRef ep)
+{
+  string name = "OSD Handling op";
+  if (!request) {
+    return false;
+  }
+
+  TrackedOpTraceRef mt = request->get_master_trace();
+  if (!mt) {
+    return false;
+  }
+
+  osd_trace = ZTracer::create_ZTrace(name, mt, ep);
+  if(!osd_trace){
+    return false;
+  }
+
+  return true;
+}
+
+bool OpRequest::create_pg_trace(TrackedOpEndpointRef ep)
+{
+  string name = "PG";
+  if (!request) {
+    return false;
+  }
+
+  TrackedOpTraceRef mt = request->get_master_trace();
+  if (!mt) {
+    return false;
+  }
+
+  pg_trace = ZTracer::create_ZTrace(name, mt, ep);
+  if(!pg_trace){
+    return false;
+  }
+
+  return true;
+}
+
+bool OpRequest::create_journal_trace(TrackedOpEndpointRef ep)
+{
+  string name = "Journal access";
+
+  if (!osd_trace) {
+    return false;
+  }
+
+  journal_trace = ZTracer::create_ZTrace(name, osd_trace, ep);
+  if(!journal_trace){
+    return false;
+  }
+
+  return true;
+}
+
+bool OpRequest::create_filestore_trace(TrackedOpEndpointRef ep)
+{
+  string name = "Filestore access";
+
+  if (!osd_trace) {
+    return false;
+  }
+
+  filestore_trace = ZTracer::create_ZTrace(name, osd_trace, ep);
+  if(!filestore_trace){
+    return false;
+  }
+
+  return true;
+}
+#endif // WITH_BLKIN
diff --git a/src/osd/OpRequest.h b/src/osd/OpRequest.h
index 88a2704..65b5bbc 100644
--- a/src/osd/OpRequest.h
+++ b/src/osd/OpRequest.h
@@ -76,6 +76,13 @@ struct OpRequest : public TrackedOp {
 
   void _dump(utime_t now, Formatter *f) const;
 
+#ifdef WITH_BLKIN
+  bool create_osd_trace(TrackedOpEndpointRef ep);
+  bool create_pg_trace(TrackedOpEndpointRef ep);
+  bool create_journal_trace(TrackedOpEndpointRef ep);
+  bool create_filestore_trace(TrackedOpEndpointRef ep);
+#endif // WITH_BLKIN
+
   bool has_feature(uint64_t f) const {
     return request->get_connection()->has_feature(f);
   }
@@ -146,9 +153,11 @@ public:
   }
   void mark_sub_op_sent(const string& s) {
     mark_flag_point(flag_sub_op_sent, s);
+    BLKIN_TYPE_TRACE_EVENT(pg, "sub_op_sent | " + s);
   }
   void mark_commit_sent() {
     mark_flag_point(flag_commit_sent, "commit_sent");
+    BLKIN_TYPE_TRACE_EVENT(pg, "commit_sent");
   }
 
   utime_t get_dequeued_time() const {
diff --git a/src/osd/PG.cc b/src/osd/PG.cc
index 7447ecc..ea6f84f 100644
--- a/src/osd/PG.cc
+++ b/src/osd/PG.cc
@@ -220,6 +220,8 @@ PG::PG(OSDService *o, OSDMapRef curmap,
 #ifdef PG_DEBUG_REFS
   osd->add_pgid(p, this);
 #endif
+  BLKIN_OSS(oss, "PG " << info.pgid);
+  BLKIN_MSG_END(pg_endpoint, "", 0, oss.str());
 }
 
 PG::~PG()
diff --git a/src/osd/PG.h b/src/osd/PG.h
index 870fdf2..2ab78f2 100644
--- a/src/osd/PG.h
+++ b/src/osd/PG.h
@@ -197,6 +197,7 @@ public:
   void update_snap_mapper_bits(uint32_t bits) {
     snap_mapper.update_bits(bits);
   }
+  BLKIN_END_REF(pg_endpoint)
 protected:
   // Ops waiting for map, should be queued at back
   Mutex map_lock;
diff --git a/src/osd/ReplicatedBackend.cc b/src/osd/ReplicatedBackend.cc
index 680c27a..afbd4c9 100644
--- a/src/osd/ReplicatedBackend.cc
+++ b/src/osd/ReplicatedBackend.cc
@@ -130,6 +130,7 @@ bool ReplicatedBackend::handle_message(
   OpRequestRef op
   )
 {
+  BLKIN_OP_TRACE_EVENT(op, pg, "handling_message");
   dout(10) << __func__ << ": " << op << dendl;
   switch (op->get_req()->get_type()) {
   case MSG_OSD_PG_PUSH:
@@ -608,8 +609,10 @@ void ReplicatedBackend::op_applied(
   InProgressOp *op)
 {
   dout(10) << __func__ << ": " << op->tid << dendl;
-  if (op->op)
+  if (op->op) {
     op->op->mark_event("op_applied");
+    BLKIN_OP_TRACE_EVENT(op->op, pg, "op_applied");
+  }
 
   op->waiting_for_applied.erase(get_parent()->whoami_shard());
   parent->op_applied(op->v);
@@ -628,8 +631,10 @@ void ReplicatedBackend::op_commit(
   InProgressOp *op)
 {
   dout(10) << __func__ << ": " << op->tid << dendl;
-  if (op->op)
+  if (op->op) {
     op->op->mark_event("op_commit");
+    BLKIN_OP_TRACE_EVENT(op->op, pg, "op_commit");
+  }
 
   op->waiting_for_commit.erase(get_parent()->whoami_shard());
 
@@ -680,12 +685,18 @@ void ReplicatedBackend::sub_op_modify_reply(OpRequestRef op)
     if (r->ack_type & CEPH_OSD_FLAG_ONDISK) {
       assert(ip_op.waiting_for_commit.count(from));
       ip_op.waiting_for_commit.erase(from);
-      if (ip_op.op)
+      if (ip_op.op) {
 	ip_op.op->mark_event("sub_op_commit_rec");
+	BLKIN_OP_TRACE_EVENT(ip_op.op, pg, "sub_op_commit_rec");
+	BLKIN_MSG_TRACE_EVENT(op->get_req(), "span_ended");
+      }
     } else {
       assert(ip_op.waiting_for_applied.count(from));
-      if (ip_op.op)
+      if (ip_op.op) {
 	ip_op.op->mark_event("sub_op_applied_rec");
+	BLKIN_OP_TRACE_EVENT(ip_op.op, pg, "sub_op_applied_rec");
+	BLKIN_MSG_TRACE_EVENT(op->get_req(), "span_ended");
+      }
     }
     ip_op.waiting_for_applied.erase(from);
 
diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index 057564c..106eb4d 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -1301,6 +1301,7 @@ void ReplicatedPG::do_request(
   OpRequestRef& op,
   ThreadPool::TPHandle &handle)
 {
+  BLKIN_OP_TRACE_EVENT(op, pg, "starting_request");
   if (!op_has_sufficient_caps(op)) {
     osd->reply_op_error(op, -EPERM);
     return;
@@ -1422,6 +1423,7 @@ bool ReplicatedPG::check_src_targ(const hobject_t& soid, const hobject_t& toid)
  */
 void ReplicatedPG::do_op(OpRequestRef& op)
 {
+  BLKIN_OP_TRACE_EVENT(op, pg, "do_op");
   MOSDOp *m = static_cast<MOSDOp*>(op->get_req());
   assert(m->get_type() == CEPH_MSG_OSD_OP);
   if (op->includes_pg_op()) {
@@ -1432,6 +1434,7 @@ void ReplicatedPG::do_op(OpRequestRef& op)
     return do_pg_op(op);
   }
 
+  BLKIN_OP_TRACE_KEYVAL(op, osd, "object", m->get_oid().name);
   if (get_osdmap()->is_blacklisted(m->get_source_addr())) {
     dout(10) << "do_op " << m->get_source_addr() << " is blacklisted" << dendl;
     osd->reply_op_error(op, -EBLACKLISTED);
@@ -1536,6 +1539,7 @@ void ReplicatedPG::do_op(OpRequestRef& op)
 	if (m->wants_ack()) {
 	  if (already_ack(replay_version)) {
 	    MOSDOpReply *reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0, false);
+	    BLKIN_MSG_INIT_TRACE(reply, op->get_osd_trace());
 	    reply->add_flags(CEPH_OSD_FLAG_ACK);
 	    reply->set_reply_versions(replay_version, user_version);
 	    osd->send_message_osd_client(reply, m->get_connection());
@@ -2254,6 +2258,8 @@ void ReplicatedPG::execute_ctx(OpContext *ctx)
   const hobject_t& soid = obc->obs.oi.soid;
   map<hobject_t,ObjectContextRef>& src_obc = ctx->src_obc;
 
+  BLKIN_OP_TRACE_EVENT(op, pg, "executing_ctx");
+
   // this method must be idempotent since we may call it several times
   // before we finally apply the resulting transaction.
   delete ctx->op_t;
@@ -2359,6 +2365,7 @@ void ReplicatedPG::execute_ctx(OpContext *ctx)
   // prepare the reply
   ctx->reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0,
 			       successful_write);
+  BLKIN_MSG_INIT_TRACE(ctx->reply, op->get_osd_trace());
 
   // Write operations aren't allowed to return a data payload because
   // we can't do so reliably. If the client has to resend the request
@@ -2508,6 +2515,8 @@ void ReplicatedPG::do_sub_op(OpRequestRef op)
   assert(m->get_type() == MSG_OSD_SUBOP);
   dout(15) << "do_sub_op " << *op->get_req() << dendl;
 
+  BLKIN_OP_TRACE_EVENT(op, pg, "do_sub_op");
+
   OSDOp *first = NULL;
   if (m->ops.size() >= 1) {
     first = &m->ops[0];
@@ -3597,6 +3606,13 @@ int ReplicatedPG::do_osd_ops(OpContext *ctx, vector<OSDOp>& ops)
     case CEPH_OSD_OP_READ:
       ++ctx->num_read;
       {
+        BLKIN_OP_TRACE_KEYVAL(ctx->op, osd, "type", "read");
+        BLKIN_OSS(oss1, op.extent.offset);
+        BLKIN_OP_TRACE_KEYVAL(ctx->op, osd, "offset", oss1.str());
+        BLKIN_OSS(oss2, op.extent.length);
+        BLKIN_OP_TRACE_KEYVAL(ctx->op, osd, "length", oss2.str());
+      }
+      {
 	__u32 seq = oi.truncate_seq;
 	uint64_t size = oi.size;
 	tracepoint(osd, do_osd_op_pre_read, soid.oid.name.c_str(), soid.snap.val, size, seq, op.extent.offset, op.extent.length, op.extent.truncate_size, op.extent.truncate_seq);
@@ -7507,12 +7523,14 @@ void ReplicatedPG::eval_repop(RepGather *repop)
 	repop->ctx->reply = NULL;
       else {
 	reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0, true);
+	BLKIN_MSG_INIT_TRACE(reply, repop->ctx->op->get_osd_trace());
 	reply->set_reply_versions(repop->ctx->at_version,
 				  repop->ctx->user_at_version);
       }
       reply->add_flags(CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK);
       dout(10) << " sending commit on " << *repop << " " << reply << dendl;
       osd->send_message_osd_client(reply, m->get_connection());
+      BLKIN_MSG_TRACE_EVENT(m, "replied_commit");
       repop->sent_disk = true;
       repop->ctx->op->mark_commit_sent();
     }
@@ -7530,10 +7548,12 @@ void ReplicatedPG::eval_repop(RepGather *repop)
 	   ++i) {
 	MOSDOp *m = (MOSDOp*)i->first->get_req();
 	MOSDOpReply *reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0, true);
+	BLKIN_MSG_INIT_TRACE(reply, repop->ctx->op->get_osd_trace());
 	reply->set_reply_versions(repop->ctx->at_version,
 				  i->second);
 	reply->add_flags(CEPH_OSD_FLAG_ACK);
 	osd->send_message_osd_client(reply, m->get_connection());
+	BLKIN_MSG_TRACE_EVENT(m, "replied_ack");
       }
       waiting_for_ack.erase(repop->v);
     }
@@ -7587,6 +7607,8 @@ void ReplicatedPG::eval_repop(RepGather *repop)
       dout(0) << "   q front is " << *repop_queue.front() << dendl; 
       assert(repop_queue.front() == repop);
     }
+    BLKIN_OP_TRACE_EVENT(repop->ctx->op, pg, "all_done");
+    BLKIN_MSG_TRACE_EVENT(m, "span_ended");
     repop_queue.pop_front();
     remove_repop(repop);
   }
@@ -7606,6 +7628,8 @@ void ReplicatedPG::issue_repop(RepGather *repop, utime_t now)
           << " o " << soid
           << dendl;
 
+  BLKIN_OP_TRACE_EVENT(ctx->op, pg, "issuing_repop");
+
   repop->v = ctx->at_version;
   if (ctx->at_version > eversion_t()) {
     for (set<pg_shard_t>::iterator i = actingbackfill.begin();
@@ -7770,6 +7794,7 @@ void ReplicatedBackend::issue_op(
   InProgressOp *op,
   ObjectStore::Transaction *op_t)
 {
+  BLKIN_OP_TRACE_EVENT(op->op, pg, "issuing_replication");
 
   if (parent->get_actingbackfill_shards().size() > 1) {
     ostringstream ss;
@@ -7806,6 +7831,8 @@ void ReplicatedBackend::issue_op(
 	    op_t,
 	    peer,
 	    pinfo);
+
+      BLKIN_MSG_INIT_TRACE_IF(op->op->get_req(), wr, op->op->get_osd_trace());
     } else {
       wr = generate_subop<MOSDRepOp, MSG_OSD_REPOP>(
 	    soid,
@@ -8635,6 +8662,7 @@ void ReplicatedBackend::sub_op_modify_impl(OpRequestRef op)
 void ReplicatedBackend::sub_op_modify_applied(RepModifyRef rm)
 {
   rm->op->mark_event("sub_op_applied");
+  BLKIN_OP_TRACE_EVENT(rm->op, pg, "sub_op_applied");
   rm->applied = true;
 
   dout(10) << "sub_op_modify_applied on " << rm << " op "
@@ -8666,6 +8694,9 @@ void ReplicatedBackend::sub_op_modify_applied(RepModifyRef rm)
   // send ack to acker only if we haven't sent a commit already
   if (ack) {
     ack->set_priority(CEPH_MSG_PRIO_HIGH); // this better match commit priority!
+    BLKIN_MSG_INIT_TRACE(ack, rm->op->get_osd_trace());
+    BLKIN_MSG_TRACE_EVENT(rm->op->get_req(), "replied_apply");
+    BLKIN_MSG_TRACE_EVENT(rm->op->get_req(), "span_ended");
     get_parent()->send_message_osd_cluster(
       rm->ackerosd, ack, get_osdmap()->get_epoch());
   }
@@ -8709,6 +8740,9 @@ void ReplicatedBackend::sub_op_modify_commit(RepModifyRef rm)
   }
 
   commit->set_priority(CEPH_MSG_PRIO_HIGH); // this better match ack priority!
+  BLKIN_MSG_INIT_TRACE(commit, rm->op->get_osd_trace());
+  BLKIN_MSG_TRACE_EVENT(rm->op->get_req(), "replied_commit");
+  BLKIN_MSG_TRACE_EVENT(rm->op->get_req(), "span_ended");
   get_parent()->send_message_osd_cluster(
     rm->ackerosd, commit, get_osdmap()->get_epoch());
   
@@ -9388,6 +9422,7 @@ struct C_OnPushCommit : public Context {
   OpRequestRef op;
   C_OnPushCommit(ReplicatedPG *pg, OpRequestRef op) : pg(pg), op(op) {}
   void finish(int) {
+    BLKIN_OP_TRACE_EVENT(op, pg, "committed");
     op->mark_event("committed");
     log_subop_stats(pg->osd->logger, op, l_osd_sop_push);
   }
diff --git a/src/osdc/Objecter.cc b/src/osdc/Objecter.cc
index 69d812a..31ba376 100644
--- a/src/osdc/Objecter.cc
+++ b/src/osdc/Objecter.cc
@@ -2764,6 +2764,7 @@ MOSDOp *Objecter::_prepare_osd_op(Op *op)
   m->ops = op->ops;
   m->set_mtime(op->mtime);
   m->set_retry_attempt(op->attempts++);
+  BLKIN_MSG_INIT_TRACE_IF(op->trace, m, op->trace);
 
   if (op->replay_version != eversion_t())
     m->set_version(op->replay_version);  // we're replaying this op!
@@ -2904,6 +2905,7 @@ void Objecter::handle_osd_op_reply(MOSDOpReply *m)
 	    << " ... stray" << dendl;
     s->lock.unlock();
     put_session(s);
+    BLKIN_MSG_TRACE_EVENT(m, "span_ended");
     m->put();
     return;
   }
@@ -2916,6 +2918,10 @@ void Objecter::handle_osd_op_reply(MOSDOpReply *m)
 		<< dendl;
   Op *op = iter->second;
 
+  BLKIN_MSG_TRACE_EVENT_IF(op->oncommit, m, "oncommit_message");
+  BLKIN_MSG_TRACE_EVENT_IF(op->onack, m, "onack_message");
+  BLKIN_MSG_TRACE_EVENT(m, "span_ended");
+
   if (m->get_retry_attempt() >= 0) {
     if (m->get_retry_attempt() != (op->attempts - 1)) {
       ldout(cct, 7) << " ignoring reply from attempt " << m->get_retry_attempt()
@@ -3006,6 +3012,7 @@ void Objecter::handle_osd_op_reply(MOSDOpReply *m)
   assert(op->out_bl.size() == op->out_rval.size());
   assert(op->out_bl.size() == op->out_handler.size());
   vector<OSDOp>::iterator p = out_ops.begin();
+  BLKIN_OP_EVENT_IF(op->trace, op->trace, "in_handle_osd_op_reply");
   for (unsigned i = 0;
        p != out_ops.end() && pb != op->out_bl.end();
        ++i, ++p, ++pb, ++pr, ++ph) {
diff --git a/src/osdc/Objecter.h b/src/osdc/Objecter.h
index 68c429d..7dd57ca 100644
--- a/src/osdc/Objecter.h
+++ b/src/osdc/Objecter.h
@@ -54,6 +54,7 @@ struct ObjectOperation {
   vector<OSDOp> ops;
   int flags;
   int priority;
+  BLKIN_TRACE_REF(trace);
 
   vector<bufferlist*> out_bl;
   vector<Context*> out_handler;
@@ -71,6 +72,8 @@ struct ObjectOperation {
     return ops.size();
   }
 
+  BLKIN_OP_SET_TRACE_DECL()
+
   void set_last_op_flags(int flags) {
     assert(!ops.empty());
     ops.rbegin()->op.flags = flags;
@@ -1063,6 +1066,7 @@ private:
   atomic_t global_op_flags; // flags which are applied to each IO op
   bool keep_balanced_budget;
   bool honor_osdmap_full;
+  BLKIN_END_REF(objecter_endpoint)
 
 public:
   void maybe_request_map();
@@ -1182,6 +1186,7 @@ public:
     epoch_t map_dne_bound;
 
     bool budgeted;
+    BLKIN_TRACE_REF(trace);
 
     /// true if we should resend this message on failure
     bool should_resend;
@@ -1231,6 +1236,8 @@ public:
 	target.base_oloc.key.clear();
     }
 
+    BLKIN_OP_SET_TRACE_DECL()
+
     bool operator<(const Op& other) const {
       return tid < other.tid;
     }
@@ -1241,6 +1248,7 @@ public:
 	delete out_handler.back();
 	out_handler.pop_back();
       }
+      BLKIN_OP_EVENT_IF(trace, trace, "span_ended");
     }
   };
 
@@ -1862,7 +1870,9 @@ private:
     op_throttle_bytes(cct, "objecter_bytes", cct->_conf->objecter_inflight_op_bytes),
     op_throttle_ops(cct, "objecter_ops", cct->_conf->objecter_inflight_ops),
     epoch_barrier(0)
-  { }
+  {
+    BLKIN_MSG_END(objecter_endpoint, "0.0.0.0", 0, "objecter");
+  }
   ~Objecter();
 
   void init();
@@ -2054,6 +2064,7 @@ public:
     Op *o = prepare_read_op(oid, oloc, op, snapid, pbl, flags, onack, objver, data_offset);
     if (features)
       o->features = features;
+    BLKIN_OP_SET_TRACE(o, op.trace);
     return op_submit(o);
   }
   ceph_tid_t pg_read(uint32_t hash, object_locator_t oloc,
@@ -2232,6 +2243,29 @@ public:
     return read(oid, oloc, 0, 0, snap, pbl, flags | global_op_flags.read() | CEPH_OSD_FLAG_READ, onfinish, objver);
   }
 
+#ifdef WITH_BLKIN
+  ceph_tid_t read_traced(const object_t& oid, const object_locator_t& oloc,
+	     uint64_t off, uint64_t len, snapid_t snap, bufferlist *pbl, int flags,
+	     Context *onfinish, struct blkin_trace_info *info,
+	     version_t *objver = NULL, ObjectOperation *extra_ops = NULL) {
+    vector<OSDOp> ops;
+    int i = init_ops(ops, 1, extra_ops);
+    ops[i].op.op = CEPH_OSD_OP_READ;
+    ops[i].op.extent.offset = off;
+    ops[i].op.extent.length = len;
+    ops[i].op.extent.truncate_size = 0;
+    ops[i].op.extent.truncate_seq = 0;
+    Op *o = new Op(oid, oloc, ops, flags | global_op_flags.read() | CEPH_OSD_FLAG_READ, onfinish, 0, objver);
+    o->snapid = snap;
+    o->outbl = pbl;
+    ZTracer::ZTraceRef t = ZTracer::create_ZTrace("librados", objecter_endpoint);
+    t->set_trace_info(info);
+    t->event("objecter_read");
+    o->set_trace(t);
+    free(info);
+    return op_submit(o);
+  }
+#endif // WITH_BLKIN
 
   // writes
   ceph_tid_t _modify(const object_t& oid, const object_locator_t& oloc,
@@ -2431,11 +2465,38 @@ public:
     return op_submit(o);
   }
 
+#ifdef WITH_BLKIN
+  ceph_tid_t write_traced(const object_t& oid, const object_locator_t& oloc,
+	      uint64_t off, uint64_t len, const SnapContext& snapc, const bufferlist &bl,
+	      utime_t mtime, int flags,
+	      Context *onack, Context *oncommit, struct blkin_trace_info *info,
+	      version_t *objver = NULL, ObjectOperation *extra_ops = NULL) {
+    vector<OSDOp> ops;
+    int i = init_ops(ops, 1, extra_ops);
+    ops[i].op.op = CEPH_OSD_OP_WRITE;
+    ops[i].op.extent.offset = off;
+    ops[i].op.extent.length = len;
+    ops[i].op.extent.truncate_size = 0;
+    ops[i].op.extent.truncate_seq = 0;
+    ops[i].indata = bl;
+    Op *o = new Op(oid, oloc, ops, flags | global_op_flags.read() | CEPH_OSD_FLAG_WRITE, onack, oncommit, objver);
+    o->mtime = mtime;
+    o->snapc = snapc;
+    ZTracer::ZTraceRef t = ZTracer::create_ZTrace("librados", objecter_endpoint);
+    t->set_trace_info(info);
+    t->event("objecter_write");
+    o->set_trace(t);
+    free(info);
+    return op_submit(o);
+  }
+#endif // WITH_BLKIN
+
   void list_nobjects(NListContext *p, Context *onfinish);
   uint32_t list_nobjects_seek(NListContext *p, uint32_t pos);
   void list_objects(ListContext *p, Context *onfinish);
   uint32_t list_objects_seek(ListContext *p, uint32_t pos);
 
+
   // -------------------------
   // pool ops
 private:

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V6 4/5] Rados support for blkin (LTTng + Zipkin) tracing
  2015-03-11  2:43 (unknown), Andrew Shewmaker
                   ` (2 preceding siblings ...)
  2015-03-11  2:43 ` [PATCH V6 3/5] OSD " Andrew Shewmaker
@ 2015-03-11  2:43 ` Andrew Shewmaker
  2015-03-11  2:43 ` [PATCH V6 5/5] Rados example " Andrew Shewmaker
  4 siblings, 0 replies; 6+ messages in thread
From: Andrew Shewmaker @ 2015-03-11  2:43 UTC (permalink / raw)
  To: ceph-devel; +Cc: agshew, marioskogias, chendi.xue

 * Adds traced versions of Rados aio_read/write

Signed-off-by: Marios-Evaggelos Kogias <marioskogias@gmail.com>
Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
---
 include/rados/librados.h |   13 ++++++
 librados/IoCtxImpl.cc    |   91 +++++++++++++++++++++++++++++++++++++++++++++++
 librados/IoCtxImpl.h     |   12 ++++++
 librados/RadosClient.cc  |    1 
 librados/librados.cc     |   30 +++++++++++++++
 5 files changed, 147 insertions(+)

diff --git a/src/include/rados/librados.h b/src/include/rados/librados.h
index 15a225b..aa0416e 100644
--- a/src/include/rados/librados.h
+++ b/src/include/rados/librados.h
@@ -1736,6 +1736,13 @@ CEPH_RADOS_API int rados_aio_write(rados_ioctx_t io, const char *oid,
 		                   rados_completion_t completion,
 		                   const char *buf, size_t len, uint64_t off);
 
+#ifdef WITH_BLKIN
+struct blkin_trace_info;
+int rados_aio_write_traced(rados_ioctx_t io, const char *oid,
+        rados_completion_t completion,
+        const char *buf, size_t len, uint64_t off,
+        struct blkin_trace_info *info);
+#endif
 /**
  * Asychronously append data to an object
  *
@@ -1818,6 +1825,12 @@ CEPH_RADOS_API int rados_aio_read(rados_ioctx_t io, const char *oid,
 		                  rados_completion_t completion,
 		                  char *buf, size_t len, uint64_t off);
 
+#ifdef WTIH_BLKIN
+int rados_aio_read_traced(rados_ioctx_t io, const char *oid,
+        rados_completion_t completion,
+        char *buf, size_t len, uint64_t off,
+        struct blkin_trace_info *info);
+#endif
 /**
  * Block until all pending writes in an io context are safe
  *
diff --git a/src/librados/IoCtxImpl.cc b/src/librados/IoCtxImpl.cc
index 5ef56c0..cfff8a6 100644
--- a/src/librados/IoCtxImpl.cc
+++ b/src/librados/IoCtxImpl.cc
@@ -31,6 +31,7 @@ librados::IoCtxImpl::IoCtxImpl() :
   aio_write_seq(0), cached_pool_names_lock("librados::IoCtxImpl::cached_pool_names_lock"),
   objecter(NULL)
 {
+    BLKIN_MSG_END(ioctx_endpoint, "0.0.0.0", 0, "ioctx");
 }
 
 librados::IoCtxImpl::IoCtxImpl(RadosClient *c, Objecter *objecter,
@@ -42,6 +43,7 @@ librados::IoCtxImpl::IoCtxImpl(RadosClient *c, Objecter *objecter,
     aio_write_seq(0), cached_pool_names_lock("librados::IoCtxImpl::cached_pool_names_lock"),
     objecter(objecter)
 {
+    BLKIN_MSG_END(ioctx_endpoint, "0.0.0.0", 0, "ioctx");
 }
 
 void librados::IoCtxImpl::set_snap_read(snapid_t s)
@@ -516,6 +518,7 @@ int librados::IoCtxImpl::operate(const object_t& oid, ::ObjectOperation *o,
   Objecter::Op *objecter_op = objecter->prepare_mutate_op(oid, oloc,
 	                                                  *o, snapc, ut, flags,
 	                                                  NULL, oncommit, &ver);
+  BLKIN_OP_SET_TRACE(objecter_op, o->trace);
   objecter->op_submit(objecter_op);
 
   mylock.Lock();
@@ -645,6 +648,48 @@ int librados::IoCtxImpl::aio_read(const object_t oid, AioCompletionImpl *c,
   return 0;
 }
 
+#ifdef WITH_BLKIN
+int librados::IoCtxImpl::aio_read_traced(const object_t oid, AioCompletionImpl *c,
+					 char *buf, size_t len, uint64_t off,
+					 uint64_t snapid,
+					 struct blkin_trace_info *info)
+{
+  if (len > (size_t) INT_MAX)
+    return -EDOM;
+
+  /*handle trace*/
+  ZTracer::ZTraceRef trace;
+  trace = ZTracer::create_ZTrace("librados", ioctx_endpoint, info, true);
+  if (!trace) {
+    ldout(client->cct, 5) << "librados read trace could not be created" << dendl;
+    return -1;
+  }
+
+  trace->event("librados accept");
+  Context *onack = new C_aio_Ack(c);
+
+  c->is_read = true;
+  c->io = this;
+  c->bl.clear();
+  c->bl.push_back(buffer::create_static(len, buf));
+  c->blp = &c->bl;
+
+  c->tid = objecter->read(oid, oloc,
+		 off, len, snapid, &c->bl, 0,
+		 onack, &c->objver);
+
+  trace->event("send to objecter");
+  struct blkin_trace_info *child_info = (struct blkin_trace_info *)
+      malloc(sizeof(struct blkin_trace_info));
+  trace->get_trace_info(child_info);
+  objecter->read_traced(oid, oloc,
+		 off, len, snapid, &c->bl, 0,
+		 onack, child_info, &c->objver);
+
+  return 0;
+}
+#endif // WITH_BLKIN
+
 class C_ObjectOperation : public Context {
 public:
   ::ObjectOperation m_ops;
@@ -705,6 +750,52 @@ int librados::IoCtxImpl::aio_write(const object_t &oid, AioCompletionImpl *c,
   return 0;
 }
 
+#ifdef WITH_BLKIN
+int librados::IoCtxImpl::aio_write_traced(const object_t &oid, AioCompletionImpl *c,
+					  const bufferlist& bl, size_t len,
+					  uint64_t off, struct blkin_trace_info *info)
+{
+  utime_t ut = ceph_clock_now(client->cct);
+  ldout(client->cct, 20) << "aio_write_traced " << oid << " " << off << "~" << len << " snapc=" << snapc << " snap_seq=" << snap_seq << dendl;
+
+  if (len > UINT_MAX/2)
+    return -E2BIG;
+  /* can't write to a snapshot */
+  if (snap_seq != CEPH_NOSNAP)
+    return -EROFS;
+
+  /*handle trace*/
+  ZTracer::ZTraceRef trace;
+  trace = ZTracer::create_ZTrace("librados", ioctx_endpoint, info, true);
+  if (!trace) {
+    ldout(client->cct, 5) << "librados write trace could not be created" << dendl;
+    return -1;
+  }
+
+  trace->event("librados accept");
+
+  c->io = this;
+  queue_aio_write(c);
+
+  Context *onack = new C_aio_Ack(c);
+  Context *onsafe = new C_aio_Safe(c);
+
+  c->tid = objecter->write(oid, oloc,
+		  off, len, snapc, bl, ut, 0,
+		  onack, onsafe, &c->objver);
+
+  trace->event("send to objecter");
+  struct blkin_trace_info *child_info = (struct blkin_trace_info *)
+      malloc(sizeof(struct blkin_trace_info));
+  trace->get_trace_info(child_info);
+  objecter->write_traced(oid, oloc,
+		  off, len, snapc, bl, ut, 0,
+		  onack, onsafe, child_info, &c->objver);
+
+  return 0;
+}
+#endif // WITH_BLKIN
+
 int librados::IoCtxImpl::aio_append(const object_t &oid, AioCompletionImpl *c,
 				    const bufferlist& bl, size_t len)
 {
diff --git a/src/librados/IoCtxImpl.h b/src/librados/IoCtxImpl.h
index df73b03..916c690 100644
--- a/src/librados/IoCtxImpl.h
+++ b/src/librados/IoCtxImpl.h
@@ -51,6 +51,8 @@ struct librados::IoCtxImpl {
 
   Objecter *objecter;
 
+  BLKIN_END_REF(ioctx_endpoint)
+
   IoCtxImpl();
   IoCtxImpl(RadosClient *c, Objecter *objecter,
 	    int64_t poolid, snapid_t s);
@@ -174,11 +176,21 @@ struct librados::IoCtxImpl {
 	       bufferlist *pbl, size_t len, uint64_t off, uint64_t snapid);
   int aio_read(object_t oid, AioCompletionImpl *c,
 	       char *buf, size_t len, uint64_t off, uint64_t snapid);
+#ifdef WITH_BLKIN
+  int aio_read_traced(object_t oid, AioCompletionImpl *c,
+	       char *buf, size_t len, uint64_t off, uint64_t snapid,
+	       struct blkin_trace_info *info);
+#endif
   int aio_sparse_read(const object_t oid, AioCompletionImpl *c,
 		      std::map<uint64_t,uint64_t> *m, bufferlist *data_bl,
 		      size_t len, uint64_t off, uint64_t snapid);
   int aio_write(const object_t &oid, AioCompletionImpl *c,
 		const bufferlist& bl, size_t len, uint64_t off);
+#ifdef WITH_BLKIN
+  int aio_write_traced(const object_t &oid, AioCompletionImpl *c,
+		const bufferlist& bl, size_t len, uint64_t off,
+		struct blkin_trace_info *info);
+#endif
   int aio_append(const object_t &oid, AioCompletionImpl *c,
 		 const bufferlist& bl, size_t len);
   int aio_write_full(const object_t &oid, AioCompletionImpl *c,
diff --git a/src/librados/RadosClient.cc b/src/librados/RadosClient.cc
index 8e63fce..f19bf69 100644
--- a/src/librados/RadosClient.cc
+++ b/src/librados/RadosClient.cc
@@ -79,6 +79,7 @@ librados::RadosClient::RadosClient(CephContext *cct_)
     log_last_version(0), log_cb(NULL), log_cb_arg(NULL),
     finisher(cct)
 {
+  BLKIN_ZTRACE_INIT();
 }
 
 int64_t librados::RadosClient::lookup_pool(const char *name)
diff --git a/src/librados/librados.cc b/src/librados/librados.cc
index cbefe0b..918bce7 100644
--- a/src/librados/librados.cc
+++ b/src/librados/librados.cc
@@ -3744,6 +3744,20 @@ extern "C" int rados_aio_read(rados_ioctx_t io, const char *o,
   return retval;
 }
 
+#ifdef WITH_BLKIN
+extern "C" int rados_aio_read_traced(rados_ioctx_t io, const char *o,
+				rados_completion_t completion,
+				char *buf, size_t len, uint64_t off,
+				struct blkin_trace_info *info)
+{
+	librados::IoCtxImpl *ctx = (librados::IoCtxImpl *)io;
+	object_t oid(o);
+	return ctx->aio_read_traced(oid,
+			(librados::AioCompletionImpl*)completion, buf, len,
+			off, ctx->snap_seq, info);
+}
+#endif
+
 extern "C" int rados_aio_write(rados_ioctx_t io, const char *o,
 				rados_completion_t completion,
 				const char *buf, size_t len, uint64_t off)
@@ -3761,6 +3775,22 @@ extern "C" int rados_aio_write(rados_ioctx_t io, const char *o,
   return retval;
 }
 
+#ifdef WITH_BLKIN
+extern "C" int rados_aio_write_traced(rados_ioctx_t io, const char *o,
+		rados_completion_t completion,
+		const char *buf, size_t len, uint64_t off,
+		struct blkin_trace_info *info)
+{
+	librados::IoCtxImpl *ctx = (librados::IoCtxImpl *)io;
+	object_t oid(o);
+	bufferlist bl;
+	bl.append(buf, len);
+	return ctx->aio_write_traced(oid,
+			(librados::AioCompletionImpl*)completion,bl, len, off,
+			info);
+}
+#endif
+
 extern "C" int rados_aio_append(rados_ioctx_t io, const char *o,
 				rados_completion_t completion,
 				const char *buf, size_t len)

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V6 5/5] Rados example for blkin (LTTng + Zipkin) tracing
  2015-03-11  2:43 (unknown), Andrew Shewmaker
                   ` (3 preceding siblings ...)
  2015-03-11  2:43 ` [PATCH V6 4/5] Rados " Andrew Shewmaker
@ 2015-03-11  2:43 ` Andrew Shewmaker
  4 siblings, 0 replies; 6+ messages in thread
From: Andrew Shewmaker @ 2015-03-11  2:43 UTC (permalink / raw)
  To: ceph-devel; +Cc: agshew, marioskogias, chendi.xue

This code is Marios', with the following exceptions:
 * split out example from other tracing changes
 * changed name of example
 * whitespace issues have been fixed
 * Makefile cleans up the new example

Signed-off-by: Marios-Evaggelos Kogias <marioskogias@gmail.com>
Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Signed-off-by: Andrew Shewmaker <agshew@gmail.com>
---
 Makefile      |    9 ++++
 hello_blkin.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)
 create mode 100644 examples/librados/hello_blkin.c

diff --git a/examples/librados/Makefile b/examples/librados/Makefile
index 190cca2..f8d8e1d 100644
--- a/examples/librados/Makefile
+++ b/examples/librados/Makefile
@@ -1,4 +1,8 @@
+ifdef WITH_BLKIN
+all: librados_hello_world librados_hello_world_c librados_hello_blkin
+else
 all: librados_hello_world librados_hello_world_c
+endif
 
 librados_hello_world: hello_world.cc
 	g++ -g -c hello_world.cc -o hello_world.o $(CFLAGS)
@@ -8,6 +12,11 @@ librados_hello_world_c: hello_world_c.c
 	cc -g -c hello_world_c.c -o hello_world_c.o $(CFLAGS)
 	cc -g hello_world_c.o -lrados -o librados_hello_world_c $(LDFLAGS)
 
+librados_hello_blkin: hello_blkin.c
+	cc -g -c hello_blkin.c -o hello_blkin.o $(CFLAGS)
+	cc -g hello_blkin.o -lrados -lblkin-front -o librados_hello_blkin $(LDFLAGS)
+
 clean:
 	rm hello_world.o librados_hello_world
 	rm hello_world_c.o librados_hello_world_c
+	rm hello_blkin.o librados_hello_blkin
diff --git a/examples/librados/hello_blkin.c b/examples/librados/hello_blkin.c
new file mode 100644
index 0000000..1bc1c50
--- /dev/null
+++ b/examples/librados/hello_blkin.c
@@ -0,0 +1,107 @@
+// -*- mode:C++; tab-width:8; c-basic-offset:2; indent-tabs-mode:t -*-
+// vim: ts=8 sw=2 smarttab
+
+#include <stdio.h>
+#include <rados/librados.h>
+#include <stdlib.h>
+#include <blkin-front.h>
+
+#define POOL "data"
+
+const char *object_name = "hello_world_object";
+const char *object_val = "hello world!";
+
+rados_ioctx_t *connect_to_rados()
+{
+  rados_t rados;
+  rados_ioctx_t *ioctx = malloc(sizeof(rados_ioctx_t));
+
+  if (rados_create(&rados, NULL) < 0) {
+    return NULL;
+  }
+  if (rados_conf_read_file(rados, NULL) < 0){
+    return NULL;
+  }
+  if (rados_connect(rados) < 0) {
+    printf("Could not connect\n");
+    rados_shutdown(rados);
+    return NULL;
+  }
+  if (rados_pool_lookup(rados, POOL) < 0) {
+    printf("Could not find pool\n");
+    rados_shutdown(rados);
+    return NULL;
+  }
+  if (rados_ioctx_create(rados, POOL, ioctx) < 0){
+    rados_shutdown(rados);
+    return NULL;
+  }
+  return ioctx;
+}
+
+void end_read(rados_completion_t c, void *arg)
+{
+  fprintf(stderr, "just read\n");
+}
+
+void end_write(rados_completion_t c, void *arg)
+{
+  fprintf(stderr, "just wrote\n");
+}
+
+int main()
+{
+  int r, i;
+  char temp[12];
+  rados_completion_t compl;
+  rados_ioctx_t *rados_ctx;
+  /*initialize lib*/
+  r = blkin_init();
+  if (r < 0) {
+    fprintf(stderr, "Could not initialize blkin\n");
+    exit(1);
+  }
+  /*initialize endpoint*/
+  struct blkin_endpoint endp;
+  blkin_init_endpoint(&endp, "10.0.0.1", 5000, "rados client");
+  struct blkin_trace trace;
+  blkin_init_new_trace(&trace, "client", &endp);
+  BLKIN_TIMESTAMP(&trace, &endp, "start");
+
+  //connect
+  rados_ctx = connect_to_rados();
+  if (rados_ctx == NULL){
+    printf("Connect return NULL\n");
+    exit(1);
+  }
+  printf("connected\n");
+
+  //write an object
+  r = rados_aio_create_completion(NULL, end_write, end_write, &compl);
+  if (r<0) {
+    printf("could not create completion\n");
+    exit(1);
+  }
+  printf("created completion\n");
+  r = rados_aio_write_traced(*rados_ctx, object_name, compl, object_val, 12,
+      0, &trace.info);
+
+  rados_aio_wait_for_complete(compl);
+  printf("completed\n");
+
+  //read an object
+  r = rados_aio_create_completion(NULL, end_read, end_read, &compl);
+  if (r<0) {
+    printf("could not create completion\n");
+    exit(1);
+  }
+  printf("created completion\n");
+  r = rados_aio_read_traced(*rados_ctx, object_name, compl, temp, 12, 0,
+      &trace.info);
+  rados_aio_wait_for_complete(compl);
+  printf("I read:\n");
+  for (i=0;i<12;i++)
+    printf("%c",temp[i]);
+  printf("completed\n");
+  BLKIN_TIMESTAMP(&trace, &endp, "Span ended");
+}

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-03-11  2:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-11  2:43 (unknown), Andrew Shewmaker
2015-03-11  2:43 ` [PATCH V6 1/5] Build support for BlkKin (LTTng + Zipkin) tracing Andrew Shewmaker
2015-03-11  2:43 ` [PATCH V6 2/5] Initial support for blkin " Andrew Shewmaker
2015-03-11  2:43 ` [PATCH V6 3/5] OSD " Andrew Shewmaker
2015-03-11  2:43 ` [PATCH V6 4/5] Rados " Andrew Shewmaker
2015-03-11  2:43 ` [PATCH V6 5/5] Rados example " Andrew Shewmaker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.