From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
To: mhiramat@kernel.org, oleg@redhat.com, peterz@infradead.org,
srikar@linux.vnet.ibm.com, rostedt@goodmis.org
Cc: acme@kernel.org, ananth@linux.vnet.ibm.com,
akpm@linux-foundation.org, alexander.shishkin@linux.intel.com,
alexis.berlemont@gmail.com, corbet@lwn.net,
dan.j.williams@intel.com, jolsa@redhat.com, kan.liang@intel.com,
kjlx@templeofstupid.com, kstewart@linuxfoundation.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, milian.wolff@kdab.com, mingo@redhat.com,
namhyung@kernel.org, naveen.n.rao@linux.vnet.ibm.com,
pc@us.ibm.com, tglx@linutronix.de, yao.jin@linux.intel.com,
fengguang.wu@intel.com, jglisse@redhat.com,
Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Subject: [PATCH v2 0/9] trace_uprobe: Support SDT markers having reference count (semaphore)
Date: Wed, 4 Apr 2018 14:01:01 +0530
Message-ID: <20180404083110.18647-1-ravi.bangoria@linux.vnet.ibm.com>

Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc.
have these markers embedded in them. Developers add these markers at
important places in the code. Each marker expands to a single nop
instruction in the compiled code, but computing the marker arguments
may add an overhead of a couple of instructions. When that overhead is
significant, the application can skip it with a runtime if() condition
whenever no one is tracing the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

The default value of the reference counter is 0. A tracer has to
increment the reference counter before tracing a marker and decrement
it when done with the tracing.
Currently, the perf tool has limited support for SDT markers, i.e. it
cannot trace markers surrounded by a reference counter. Also, it's
not easy to add reference counter logic to a userspace tool like perf,
so the basic idea of this patchset is to add the reference counter
logic to the trace_uprobe infrastructure. For example[2]:

    # cat tick.c
    ...
        for (i = 0; i < 100; i++) {
            DTRACE_PROBE1(tick, loop1, i);
            if (TICK_LOOP2_ENABLED()) {
                DTRACE_PROBE1(tick, loop2, i);
            }
            printf("hi: %d\n", i);
            sleep(1);
        }
    ...

Here tick:loop1 is a marker without a reference counter, whereas
tick:loop2 is surrounded by a reference counter condition.

    # perf buildid-cache --add /tmp/tick
    # perf probe sdt_tick:loop1
    # perf probe sdt_tick:loop2
    # perf stat -e sdt_tick:loop1,sdt_tick:loop2 -- /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop1
                 0      sdt_tick:loop2
       2.747086086 seconds time elapsed

Perf failed to record data for tick:loop2. The same experiment with
this patch series applied:

    # ./perf buildid-cache --add /tmp/tick
    # ./perf probe sdt_tick:loop2
    # ./perf stat -e sdt_tick:loop2 /tmp/tick
    hi: 0
    hi: 1
    hi: 2
    ^C
     Performance counter stats for '/tmp/tick':
                 3      sdt_tick:loop2
       2.561851452 seconds time elapsed

Note:
- The 'reference counter' is called a 'semaphore' in the original
  Dtrace (or Systemtap, bcc and even ELF) documentation and code. But
  the term 'semaphore' is misleading in this context. It is just a
  counter that holds the number of tracers tracing a marker; it is not
  used for any synchronization. So we refer to it as 'reference
  counter' in kernel / perf code.
v2 changes:
- [PATCH v2 3/9] is new. build_map_info() has a side effect: the caller
  has to call mmput() when done with the mm. Let free_map_info() take
  care of mmput() so that callers do not need to worry about it.
- [PATCH v2 6/9] sdt_update_ref_ctr(): No need to use memcpy(). The
  reference counter can be updated directly with a normal assignment.
- [PATCH v2 6/9] Check that sdt_find_vma() returned a valid vma before
  incrementing / decrementing a reference counter.
- [PATCH v2 6/9] Introduce utility functions for taking the write lock
  on dup_mmap_sem. Use these functions in trace_uprobe to avoid a race
  with fork / dup_mmap().
- [PATCH v2 6/9] Don't check for the presence of mm in tu->sml at
  decrement time. The purpose of maintaining the list is to ensure the
  increment happens only once for each {trace_uprobe,mm} tuple.
- [PATCH v2 7/9] v1 was not removing the mm from tu->sml when a process
  exits while tracing is still on. This leads to a problem if the same
  address gets used by a new mm. Use mmu_notifier to remove such an mm
  from the list. This guarantees that every mm added to tu->sml is
  removed from the list either when tracing ends or when the process
  goes away.
- [PATCH v2 7/9] The patch description was misleading. Change it. Add
  a more generic python example.
- [PATCH v2 7/9] Convert sml_rw_sem into mutex sml_lock.
- [PATCH v2 7/9] Use the builtin linked list in sdt_mm_list instead of
  defining its own pointer chain.
- Change the order of the last two patches.
- [PATCH v2 9/9] Check that the trace_uprobe infrastructure supports
  ref_ctr_offset before using it. This ensures a newer perf tool still
  works on older kernels which do not support trace_uprobe with a
  reference counter.
- Other changes as suggested by Masami, Oleg and Steve.
v1 can be found at:
https://lkml.org/lkml/2018/3/13/432
[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
[2] https://github.com/iovisor/bcc/issues/327#issuecomment-200576506
[3] https://lkml.org/lkml/2017/12/6/976

Oleg Nesterov (1):
  Uprobe: Move mmput() into free_map_info()

Ravi Bangoria (8):
  Uprobe: Export vaddr <-> offset conversion functions
  mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
  Uprobe: Rename map_info to uprobe_map_info
  Uprobe: Export uprobe_map_info along with
    uprobe_{build/free}_map_info()
  trace_uprobe: Support SDT markers having reference count (semaphore)
  trace_uprobe/sdt: Fix multiple update of same reference counter
  trace_uprobe/sdt: Document about reference counter
  perf probe: Support SDT markers having reference counter (semaphore)

 Documentation/trace/uprobetracer.txt |  16 ++-
 include/linux/mm.h                   |  12 ++
 include/linux/uprobes.h              |  19 +++
 kernel/events/uprobes.c              |  79 ++++++-----
 kernel/trace/trace.c                 |   2 +-
 kernel/trace/trace_uprobe.c          | 261 ++++++++++++++++++++++++++++++++++-
 tools/perf/util/probe-event.c        |  18 ++-
 tools/perf/util/probe-event.h        |   1 +
 tools/perf/util/probe-file.c         |  34 ++++-
 tools/perf/util/probe-file.h         |   1 +
 tools/perf/util/symbol-elf.c         |  46 ++++--
 tools/perf/util/symbol.h             |   7 +
 12 files changed, 431 insertions(+), 65 deletions(-)
--
1.8.3.1