linux-kernel.vger.kernel.org archive mirror
* [RFC GIT PULL] perf/trace/lock optimization/scalability improvements
@ 2010-02-03  9:14 Frederic Weisbecker
From: Frederic Weisbecker @ 2010-02-03  9:14 UTC
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Steven Rostedt, Paul Mackerras,
	Hitoshi Mitake, Li Zefan, Lai Jiangshan, Masami Hiramatsu,
	Jens Axboe

Hi,

This patchset does several things, addressing different
problems:

- remove most of the string copy overhead in the fast path
- open the way for lock class oriented profiling (as
  opposed to lock instance profiling; both can be useful
  in different ways)
- remove the buffers multiplexing (less contention)
- event injection support
- remove violent lock event recursion (only 2 cases out of 3, the
  remaining one is detailed below)
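
To illustrate the first two points: the idea is that the lock name is
sent only once, when the class is initialized, and the fast path events
then only carry a class id. Below is a minimal userspace sketch of that
lookup scheme. All the names here (class_names, on_lock_class_init,
struct lock_acquire_event, resolve_class) and the small integer id are
assumptions made for illustration, not the actual code or event layout
of the patches.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CLASSES 64	/* hypothetical bound, for this sketch only */
#define NAME_LEN    32

/* Hypothetical post-processing table: class id -> lock name. */
static char class_names[MAX_CLASSES][NAME_LEN];

/* Slow path: seen once per lock class, this is the only event with a name. */
void on_lock_class_init(unsigned int class_id, const char *name)
{
	assert(class_id < MAX_CLASSES);
	assert(strlen(name) < NAME_LEN);
	snprintf(class_names[class_id], NAME_LEN, "%s", name);
}

/* Fast path: the event only carries an id, no string copy. */
struct lock_acquire_event {
	unsigned int class_id;
};

/* The tool resolves the id back to a name when reporting. */
const char *resolve_class(const struct lock_acquire_event *ev)
{
	assert(ev->class_id < MAX_CLASSES);
	return class_names[ev->class_id];
}
```

Registering id 3 as "rq->lock" and then resolving an acquire event with
class_id 3 gives back "rq->lock". Events from every instance of that
class resolve to the same name, which is what makes class oriented
profiling possible.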

Some numbers, measured by running:
	perf lock record perf sched pipe -l 100000

Before the patchset:

	Total time: 91.015 [sec]

	     910.157300 usecs/op
		   1098 ops/sec

After applying this patchset:

	Total time: 43.706 [sec]

	     437.062080 usecs/op
		   2288 ops/sec

Although it's actually 50 secs after the very last patch in this
series. That patch is supposed to bring more scalability (and I believe
it does on a box with more than two cpus, although I can't test that).
But multiplexing the counters had a side effect: perf record has only
one buffer to consume instead of 5 * NR_CPUS, which makes its job a bit
easier when we multiplex (at the cost of contention between cpus, of
course, but on my Atom the scalability gain is not very visible).

And also, after this odd patch:

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 98fd360..254b3d4 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3094,7 +3094,8 @@ static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
        if (event->parent)
                event = event->parent;
 
-       return task_pid_nr_ns(p, event->ns);
+       return p->pid;
 }

We get:

	Total time: 26.170 [sec]

	     261.707960 usecs/op
		   3821 ops/sec

I.e. about 2x faster than with this patchset, and more than 3x faster
than tip:/perf/core
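
For reference, these ratios can be recomputed from the totals quoted
above. A small sketch follows; the helper names are made up for
illustration, and it assumes -l 100000 is the number of operations,
which matches the usecs/op figures.

```c
#include <assert.h>

/* How many times faster a run of `after` secs is than one of `before` secs. */
double speedup(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}

/* Throughput for `loops` operations completed in `total_secs` seconds. */
double ops_per_sec(long loops, double total_secs)
{
	assert(total_secs > 0.0);
	return (double)loops / total_secs;
}
```

With the totals above, speedup(91.015, 43.706) is about 2.08,
speedup(43.706, 26.170) about 1.67, speedup(50.0, 26.170) about 1.91
and speedup(91.015, 26.170) about 3.48.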

This is because task_pid_nr_ns() takes a lock, which creates
lock event recursion. We really need to fix that.
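
To make the problem concrete, here is a toy userspace model of this kind
of recursion: emitting a lock event calls a helper that takes a lock,
which fires the lock event hook again. Everything in it is hypothetical
(the function names, the MAX_NESTING bound, the simple guard flag). It
is neither what patch 10 does nor a proposed fix, just an illustration
of why one lock acquire can cost several events.

```c
#include <assert.h>

#define MAX_NESTING 4	/* arbitrary bound so the unguarded case terminates */

static int nesting;		/* current depth inside the event path */
static int events_emitted;	/* number of lock events generated */

void lock_acquire_hook(int guarded);

/* Stand-in for a helper that takes a lock while an event is being built. */
static void helper_taking_a_lock(int guarded)
{
	lock_acquire_hook(guarded);	/* taking the lock fires the hook again */
}

/* Called each time a lock is acquired; emits one lock event. */
void lock_acquire_hook(int guarded)
{
	if (guarded && nesting > 0)
		return;		/* already emitting an event: skip this one */
	if (nesting >= MAX_NESTING)
		return;		/* safety bound, for this sketch only */

	nesting++;
	events_emitted++;
	helper_taking_a_lock(guarded);	/* building the event needs the helper */
	nesting--;
}

/* Acquire one lock and return how many events that produced. */
int events_for_one_acquire(int guarded)
{
	nesting = 0;
	events_emitted = 0;
	lock_acquire_hook(guarded);
	return events_emitted;
}
```

In this model events_for_one_acquire(0) returns MAX_NESTING (4 events
for a single acquire), while events_for_one_acquire(1) returns 1. In a
kernel such a guard would have to be per cpu or per task; a plain
static variable is only good enough for a single threaded sketch.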

You can pull this patchset from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
	perf/core

Thanks.


---

Frederic Weisbecker (11):
      tracing: Add lock_class_init event
      tracing: Introduce TRACE_EVENT_INJECT
      tracing: Inject lock_class_init events on registration
      tracing: Add lock class id in lock_acquire event
      perf: New PERF_EVENT_IOC_INJECT ioctl
      perf: Handle injection ioctl with trace events
      perf: Handle injection iotcl for tracepoints from perf record
      perf/lock: Add support for lock_class_init events
      tracing: Remove the lock name from most lock events
      tracing/perf: Fix lock events recursions in the fast path
      perf lock: Drop the buffers multiplexing dependency


 include/linux/ftrace_event.h       |    6 +-
 include/linux/lockdep.h            |    4 +
 include/linux/perf_event.h         |    6 +
 include/linux/tracepoint.h         |    3 +
 include/trace/define_trace.h       |    6 +
 include/trace/events/lock.h        |   57 ++++--
 include/trace/ftrace.h             |   31 +++-
 kernel/lockdep.c                   |   16 ++
 kernel/perf_event.c                |   47 ++++-
 kernel/trace/trace_event_profile.c |   46 +++--
 kernel/trace/trace_events.c        |    3 +
 tools/perf/builtin-lock.c          |  345 ++++++++++++++++++++++++++++++++----
 tools/perf/builtin-record.c        |    9 +
 13 files changed, 497 insertions(+), 82 deletions(-)



Thread overview: 55+ messages
2010-02-03  9:14 [RFC GIT PULL] perf/trace/lock optimization/scalability improvements Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 01/11] tracing: Add lock_class_init event Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 02/11] tracing: Introduce TRACE_EVENT_INJECT Frederic Weisbecker
2010-02-05 14:08   ` Steven Rostedt
2010-02-05 14:47   ` Steven Rostedt
2010-02-05 14:53     ` Peter Zijlstra
2010-02-05 15:07       ` Steven Rostedt
2010-02-06 12:20         ` Frederic Weisbecker
2010-02-06 13:19           ` Steven Rostedt
2010-02-10 10:04             ` Frederic Weisbecker
2010-02-10 14:05               ` Steven Rostedt
2010-02-11 18:57                 ` Frederic Weisbecker
2010-02-11 19:23                   ` Steven Rostedt
2010-02-03  9:14 ` [PATCH 03/11] tracing: Inject lock_class_init events on registration Frederic Weisbecker
2010-02-05 14:13   ` Steven Rostedt
2010-02-05 14:30     ` Peter Zijlstra
2010-02-05 14:44       ` Steven Rostedt
2010-02-03  9:14 ` [PATCH 04/11] tracing: Add lock class id in lock_acquire event Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 05/11] perf: New PERF_EVENT_IOC_INJECT ioctl Frederic Weisbecker
2010-02-03  9:19   ` Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 06/11] perf: Handle injection ioctl with trace events Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 07/11] perf: Handle injection iotcl for tracepoints from perf record Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 08/11] perf/lock: Add support for lock_class_init events Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 09/11] tracing: Remove the lock name from most lock events Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 10/11] tracing/perf: Fix lock events recursions in the fast path Frederic Weisbecker
2010-02-04 15:47   ` Paul E. McKenney
2010-02-05  2:38     ` Lai Jiangshan
2010-02-05  9:45       ` Peter Zijlstra
2010-02-05  9:50         ` Peter Zijlstra
2010-02-05 10:49           ` Ingo Molnar
2010-02-05 12:10             ` Peter Zijlstra
2010-02-05 12:12               ` Peter Zijlstra
2010-02-05 13:01                 ` Peter Zijlstra
2010-02-06 11:12                   ` Frederic Weisbecker
2010-02-06 11:24                     ` Peter Zijlstra
2010-02-06 11:40                       ` Frederic Weisbecker
2010-02-06 14:17                         ` Peter Zijlstra
2010-02-06 16:10                           ` Frederic Weisbecker
2010-02-07  9:45                             ` Peter Zijlstra
2010-02-10 10:17                               ` Frederic Weisbecker
2010-02-28 22:24                   ` Frederic Weisbecker
2010-02-03  9:14 ` [PATCH 11/11] perf lock: Drop the buffers multiplexing dependency Frederic Weisbecker
2010-02-03 10:25 ` [RFC GIT PULL] perf/trace/lock optimization/scalability improvements Jens Axboe
2010-02-03 20:50   ` Frederic Weisbecker
2010-02-03 21:21     ` Jens Axboe
2010-02-03 22:13       ` Frederic Weisbecker
2010-02-04 19:40     ` Jens Axboe
2010-02-06 10:37       ` Frederic Weisbecker
2010-02-03 10:26 ` Ingo Molnar
2010-02-03 21:26   ` Frederic Weisbecker
2010-02-03 10:33 ` Peter Zijlstra
2010-02-03 22:07   ` Frederic Weisbecker
2010-02-04  6:33     ` Ingo Molnar
2010-02-07 17:10     ` Peter Zijlstra
2010-02-10 10:49       ` Frederic Weisbecker
