From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"Luck, Tony" <tony.luck@intel.com>,
linux-kernel@vger.kernel.org, ying.huang@intel.com, bp@alien8.de,
tglx@linutronix.de, akpm@linux-foundation.org,
mchehab@redhat.com, Arnaldo Carvalho de Melo <acme@redhat.com>,
Arjan van de Ven <arjan@infradead.org>
Subject: Re: Tracing Requirements (was: [RFC/Requirements/Design] h/w error reporting)
Date: Wed, 10 Nov 2010 15:23:16 -0500
Message-ID: <20101110202316.GA32396@Krystal>
In-Reply-To: <20101110191127.GA6190@nowhere>
* Frederic Weisbecker (fweisbec@gmail.com) wrote:
> On Wed, Nov 10, 2010 at 02:00:45PM -0500, Steven Rostedt wrote:
> > On Wed, 2010-11-10 at 19:41 +0100, Ingo Molnar wrote:
> >
> > > We'll need to embark on this incremental path instead of a rewrite-the-world thing.
> > > As a maintainer my task is to say 'no' to rewrite-the-world approaches - and we can
> > > and will do better here.
> >
> > Thus you are saying that we stick to the status quo, and also ignore the
> > fact that perf was a rewrite-the-world from ftrace to begin with.
>
> Perhaps you and Mathieu can summarize your requirements here and then explain
> why extending the current ABI wouldn't work. It's quite normal that people
> try to find a solution fully backward compatible in the first place. If
> it's not possible, fine, but then justify it.
Sure, here are the requirements my user base has, followed by a list of Perf
and Ftrace pain points. Some of these derive directly from their respective
ABIs; others are caused partly by their implementation and partly by their
ABI.
- Low overhead is key
- 150 ns per event (cache-hot)
- Zero-copy (splice to disk/network, mmap for zero-copy in-place data
analysis)
- Compactness of traces
- e.g. 96 bits per event (including typical 64-bit payload), no PID saved per
event.
- Scalability to multi-core and multi-processor
- Per-CPU buffers, time-stamp reading both scalable to many CPUs *and* accurate
- Production-grade tracer reliability
- Trace clock accuracy within 100 ns; ordering can be inferred from
lock/interrupt handler knowledge, with the ability to know when ordering
might be wrong.
- Flight recorder mode
- Support concurrent read while writer is overwriting buffer data
(Thomas Gleixner named these "trace-shots")
- Support multiple trace sessions in parallel
- Engineer + Operator + flight recorder for automated bug reports
- Availability of trace buffers for crash diagnosis
- Save to disk or network, or use kexec or persistent memory
- Heterogeneous environment support
- Portability
- Distinct host/target environment support
- Management of multiple target kernel versions
- No dependency on kernel image to analyze traces
(traces contain complete information)
- Live view/analysis of trace streams via the network
- Impact on buffer flushing, power saving, idle, ...
- Synchronized system-wide (hypervisor, kernel and user-space) traces
- Scalability of analysis tools to very large data sets (> 10GB)
- Standardization of trace format across analysis tools
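To make the 96-bit figure concrete, here is a minimal C sketch of one possible compact event record: a 32-bit header packing a 5-bit event ID with the low 27 bits of a cycle counter, followed by a 64-bit payload, with no PID field. The field widths, macro names and the pack_event()/unpack_event() helpers are hypothetical, chosen only for illustration; they are not the format used by LTTng, Ftrace or Perf.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field widths, chosen for illustration only. */
#define EVENT_ID_BITS 5u
#define TS_BITS       27u
#define EVENT_ID_MAX  ((UINT32_C(1) << EVENT_ID_BITS) - 1)
#define TS_MASK       ((UINT32_C(1) << TS_BITS) - 1)

/* 32-bit header + 64-bit payload = 12 bytes = 96 bits; no PID field. */
#define EVENT_SIZE (sizeof(uint32_t) + sizeof(uint64_t))

/* Pack one event into a 12-byte record. memcpy keeps the record free of
 * struct padding and avoids unaligned stores into the byte buffer. */
static void pack_event(uint8_t out[EVENT_SIZE], uint32_t id,
                       uint64_t cycles, uint64_t payload)
{
    assert(id <= EVENT_ID_MAX);
    /* Event ID in the top 5 bits, low 27 bits of the cycle counter below. */
    uint32_t header = (id << TS_BITS) | (uint32_t)(cycles & TS_MASK);

    memcpy(out, &header, sizeof(header));
    memcpy(out + sizeof(header), &payload, sizeof(payload));
}

/* Unpack a record produced by pack_event(). */
static void unpack_event(const uint8_t in[EVENT_SIZE], uint32_t *id,
                         uint32_t *ts_low, uint64_t *payload)
{
    uint32_t header;

    memcpy(&header, in, sizeof(header));
    *id = header >> TS_BITS;
    *ts_low = header & TS_MASK;
    memcpy(payload, in + sizeof(header), sizeof(*payload));
}
```

A 27-bit timestamp wraps every 2^27 cycles (about 134 ms at 1 GHz), so a tracer using such a header would need some way to record a full timestamp when the truncated value may have wrapped, for instance an extended header; that part is left out of this sketch.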
* Ring Buffer issues with Perf:
- Perf does not support flight recorder tracing (concurrent read/write)
- Sub-buffers are needed to support concurrent reads/writes in flight recorder
mode. Peter still has to convince me otherwise (if he cares).
- This implies adding padding when an event does not fit in the current
sub-buffer (ABI change). Note for Frederic: creating a single sub-buffer as
large as the buffer does not solve this problem, because perf allows writing
an event across the end of the buffer and its beginning. In a scheme where
sub-buffers can be discarded, this makes it quite unreliable to figure out
where partially overwritten events end.
- Calling the kernel when finishing reading a sub-buffer is needed for flight
recorder mode tracing. It is not possible with the mmap-head-tail-counter
ABI Perf currently uses for reader-writer synchronization.
- Perf is 5 times slower than Ftrace/Generic Ring Buffer Library/LTTng.
- Partially due to implementation.
- Partially due to large event size.
* Trace Format issues with Perf:
- Perf event headers are too large
- Handling of dynamically added instrumentation while a trace is being
recorded is nonexistent.
* Ring Buffer issues with Ftrace:
- Ftrace needs an internal API cleanup.
- "peek" is an unnecessary API duplication which complicates everything down
to the buffer-level.
- Ftrace does not support cross-page event writes
- Limits event size to less than 4kB
* Trace Format issues with Ftrace:
- Ftrace timestamps are saved as delta from previous event
- Only works for tracing where preemption can be disabled, unusable for
user-space tracing.
- Creates an artificial data dependency between events, leading to odd
side-effects when dealing with nesting over the tracer
- 0 ns IRQ/SOFTIRQ handler duration side-effect
- Event size limited to one page
- Ftrace event headers are still too large
- Handling of dynamically added instrumentation while a trace is being
recorded is nonexistent.
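The sub-buffer padding scheme described under the Perf ring buffer issues can be sketched in C as follows. This is a minimal, single-threaded illustration under stated assumptions: the names (struct ring, ring_reserve(), ring_write()) and the tiny sizes are hypothetical, and the lockless per-CPU reservation and reader hand-off that a real flight-recorder buffer needs are deliberately left out. What it shows is that an event never straddles two sub-buffers, so overwriting the oldest sub-buffer only ever drops whole events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; tiny so that padding is easy to observe. */
#define NR_SUBBUF   4
#define SUBBUF_SIZE 64

struct subbuf {
    size_t used;               /* bytes of event data written */
    size_t padding;            /* unused tail bytes, skipped by the reader */
    uint8_t data[SUBBUF_SIZE];
};

struct ring {
    struct subbuf sb[NR_SUBBUF];
    size_t cur;                /* index of the sub-buffer being written */
    size_t overwritten;        /* sub-buffers dropped in flight-recorder mode */
};

static void ring_init(struct ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* Reserve len bytes for one event. If the event does not fit in the current
 * sub-buffer, mark the remaining tail as padding and move to the next
 * sub-buffer, which is the oldest one: it is overwritten as a whole. */
static uint8_t *ring_reserve(struct ring *r, size_t len)
{
    struct subbuf *sb = &r->sb[r->cur];

    assert(len > 0 && len <= SUBBUF_SIZE);
    if (sb->used + len > SUBBUF_SIZE) {
        sb->padding = SUBBUF_SIZE - sb->used;
        r->cur = (r->cur + 1) % NR_SUBBUF;
        sb = &r->sb[r->cur];
        if (sb->used != 0)
            r->overwritten++;  /* a whole sub-buffer, never half an event */
        sb->used = 0;
        sb->padding = 0;
    }
    uint8_t *slot = sb->data + sb->used;
    sb->used += len;
    return slot;
}

/* Copy one event into the ring. */
static void ring_write(struct ring *r, const void *ev, size_t len)
{
    memcpy(ring_reserve(r, len), ev, len);
}
```

With 12-byte events and 64-byte sub-buffers, five events fill 60 bytes; the sixth does not fit, so the last 4 bytes are marked as padding and the event starts the next sub-buffer. Contrast this with a layout that lets an event wrap from the end of the buffer to its beginning: there, discarding a region can leave part of an event behind.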
So, given that fixing these issues requires a large ABI rework of both Ftrace
and Perf, creating a new ABI really seems to make more sense here than
building on top of ABIs that were not initially designed to meet these
requirements.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com