From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"Luck, Tony" <tony.luck@intel.com>,
linux-kernel@vger.kernel.org, ying.huang@intel.com, bp@alien8.de,
tglx@linutronix.de, akpm@linux-foundation.org,
mchehab@redhat.com, Arnaldo Carvalho de Melo <acme@redhat.com>,
Arjan van de Ven <arjan@infradead.org>
Subject: Re: Tracing Requirements (was: [RFC/Requirements/Design] h/w error reporting)
Date: Wed, 10 Nov 2010 15:23:16 -0500
Message-ID: <20101110202316.GA32396@Krystal>
In-Reply-To: <20101110191127.GA6190@nowhere>
* Frederic Weisbecker (fweisbec@gmail.com) wrote:
> On Wed, Nov 10, 2010 at 02:00:45PM -0500, Steven Rostedt wrote:
> > On Wed, 2010-11-10 at 19:41 +0100, Ingo Molnar wrote:
> >
> > > We'll need to embark on this incremental path instead of a rewrite-the-world thing.
> > > As a maintainer my task is to say 'no' to rewrite-the-world approaches - and we can
> > > and will do better here.
> >
> > Thus you are saying that we stick to the status quo, and also ignore the
> > fact that perf was a rewrite-the-world from ftrace to begin with.
>
> Perhaps you and Mathieu can summarize your requirements here and then explain
> why extending the current ABI wouldn't work. It's quite normal that people
> try to find a solution fully backward compatible in the first place. If
> it's not possible, fine, but then justify it.
Sure, here are the requirements my user base has, followed by a list of Perf
and Ftrace pain points. Some of these derive directly from their respective
ABIs; others are caused partly by their implementation and partly by their
ABI.
- Low overhead is key
  - 150 ns per event (cache-hot)
  - Zero-copy (splice to disk/network, mmap for zero-copy in-place data
    analysis)
- Compactness of traces
  - e.g. 96 bits per event (including typical 64-bit payload), no PID saved
    per event.
- Scalability to multi-core and multi-processor
  - Per-CPU buffers, time-stamp reading both scalable to many CPUs *and*
    accurate
- Production-grade tracer reliability
  - Trace clock accuracy within 100 ns, ordering can be inferred based on
    lock/interrupt handler knowledge, ability to know when ordering might be
    wrong.
- Flight recorder mode
  - Support concurrent read while writer is overwriting buffer data
    (Thomas Gleixner named these "trace-shots")
- Support multiple trace sessions in parallel
  - Engineer + Operator + flight recorder for automated bug reports
- Availability of trace buffers for crash diagnosis
  - Save to disk, network, use kexec or persistent memory
- Heterogeneous environment support
  - Portability
  - Distinct host/target environment support
  - Management of multiple target kernel versions
  - No dependency on kernel image to analyze traces
    (traces contain complete information)
- Live view/analysis of trace streams via the network
  - Impact on buffer flushing, power saving, idle, ...
- Synchronized system-wide (hypervisor, kernel and user-space) traces
- Scalability of analysis tools to very large data sets (> 10GB)
- Standardization of trace format across analysis tools
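To make the "96 bits per event" compactness figure above concrete, here is a
minimal sketch of one way to get there: a 32-bit header followed by a 64-bit
payload. The 5-bit event id / 27-bit truncated clock split, and all names
below (pack_header, expand_clock, write_event), are assumptions made for
illustration, not an existing ABI. No PID is stored; with per-CPU buffers the
analyzer can recover it from scheduler events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 5-bit event id + 27-bit truncated clock = 32 bits. */
#define EVENT_ID_BITS 5u
#define CLOCK_BITS    27u
#define CLOCK_MASK    ((UINT32_C(1) << CLOCK_BITS) - 1u)
/* 4-byte header + 8-byte payload = 12 bytes = 96 bits per event. */
#define EVENT_SIZE    (sizeof(uint32_t) + sizeof(uint64_t))

/* Pack the event id and the low 27 bits of the clock into one header. */
static uint32_t pack_header(uint32_t event_id, uint64_t clock)
{
	assert(event_id < (UINT32_C(1) << EVENT_ID_BITS));
	return (event_id << CLOCK_BITS) | ((uint32_t)clock & CLOCK_MASK);
}

static uint32_t header_event_id(uint32_t header)
{
	return header >> CLOCK_BITS;
}

/*
 * Rebuild a full 64-bit timestamp from the truncated clock bits, given the
 * full timestamp of the previous event in the same per-CPU buffer.
 * Assumes fewer than 2^27 clock units elapsed since that event; a real
 * tracer would need an extended header for longer gaps.
 */
static uint64_t expand_clock(uint64_t prev_full, uint32_t header)
{
	uint64_t full = (prev_full & ~(uint64_t)CLOCK_MASK) |
			(header & CLOCK_MASK);

	if (full < prev_full)	/* low bits wrapped since prev_full */
		full += UINT64_C(1) << CLOCK_BITS;
	return full;
}

/* Serialize one event; returns the number of bytes written (always 12). */
static size_t write_event(uint8_t *buf, uint32_t event_id, uint64_t clock,
			  uint64_t payload)
{
	uint32_t header = pack_header(event_id, clock);

	memcpy(buf, &header, sizeof(header));
	memcpy(buf + sizeof(header), &payload, sizeof(payload));
	return EVENT_SIZE;
}
```

The point of the sketch is only that per-event cost stays at 12 bytes as long
as the timestamp can be truncated and rebuilt at analysis time.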
* Ring Buffer issues with Perf:

- Perf does not support flight recorder tracing (concurrent read/write)
  - Sub-buffers are needed to support concurrent read/writes in flight
    recorder mode. Peter still has to convince me otherwise (if he cares).
    - Implies adding padding when an event does not fit in the current
      sub-buffer (ABI change). Note for Frederic: creating a single
      sub-buffer as large as the buffer does not solve this problem, because
      perf allows writing an event across the end of the buffer and its
      beginning. In a scheme where sub-buffers can be discarded, it makes it
      quite unreliable to try to figure out where partially overwritten
      events end.
  - Calling the kernel when finishing reading a sub-buffer is needed for
    flight recorder mode tracing. It is not possible with the
    mmap-head-tail-counter ABI Perf currently uses for reader-writer
    synchronization.
- Perf is 5 times slower than Ftrace/Generic Ring Buffer Library/LTTng.
  - Partially due to implementation.
  - Partially due to large event size.
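The padding rule described above can be sketched as follows. This is a
simplified model under stated assumptions, not Perf, Ftrace or LTTng code:
the sub-buffer size, the sub-buffer count, and the names subbuf_reserve and
within_one_subbuf are all hypothetical. When an event does not fit in what is
left of the current sub-buffer, the remainder is padded and the event starts
at the next sub-buffer boundary, so no event ever straddles two sub-buffers
(or the end and beginning of the buffer).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical geometry: 4 sub-buffers of 4096 bytes each. */
#define SUBBUF_SIZE 4096u
#define NR_SUBBUF   4u
#define BUF_SIZE    (SUBBUF_SIZE * NR_SUBBUF)

struct reservation {
	size_t padding;      /* bytes of padding written before the event */
	size_t event_offset; /* where the event starts, within the buffer */
	size_t next_offset;  /* write offset after the event */
};

/* Reserve event_len bytes, padding to the next sub-buffer if needed. */
static struct reservation subbuf_reserve(size_t write_offset, size_t event_len)
{
	struct reservation r = { 0, 0, 0 };
	size_t room;

	assert(write_offset < BUF_SIZE);
	assert(event_len > 0 && event_len <= SUBBUF_SIZE);

	/* Space left in the sub-buffer that contains write_offset. */
	room = SUBBUF_SIZE - (write_offset % SUBBUF_SIZE);
	if (event_len > room) {
		/* Event does not fit: pad out the rest, move to next sub-buffer. */
		r.padding = room;
		write_offset = (write_offset + room) % BUF_SIZE;
	}
	r.event_offset = write_offset;
	r.next_offset = (write_offset + event_len) % BUF_SIZE;
	return r;
}

/* True if [offset, offset + len) lies within a single sub-buffer. */
static int within_one_subbuf(size_t offset, size_t len)
{
	return offset / SUBBUF_SIZE == (offset + len - 1) / SUBBUF_SIZE;
}
```

With this rule a reader that finds a sub-buffer discarded knows every
remaining sub-buffer begins on an event boundary, which is what the
wrap-around write allowed by the current ABI cannot guarantee.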
* Trace Format issues with Perf:

- Perf event headers are too large
- Handling of instrumentation added dynamically while a trace is being
  recorded is nonexistent.
* Ring Buffer issues with Ftrace:

- Ftrace needs an internal API cleanup.
  - "peek" is an unnecessary API duplication which complicates everything
    down to the buffer-level.
- Ftrace does not support cross-pages event writes
  - Limits event size to less than 4kB
* Trace Format issues with Ftrace:

- Ftrace timestamps are saved as delta from previous event
  - Only works for tracing where preemption can be disabled, unusable for
    user-space tracing.
  - Creates an artificial data dependency between events, leading to odd
    side-effects when dealing with nesting over the tracer
    - 0 ns IRQ/SOFTIRQ handler duration side-effect
- Event size limited to one page
- Ftrace event headers are still too large
- Handling of instrumentation added dynamically while a trace is being
  recorded is nonexistent.
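The 0 ns handler duration side-effect of delta timestamps can be illustrated
with a toy model. This is a deliberately simplified sketch, not Ftrace's
actual code: it assumes that an event written from a handler nesting over an
in-progress write cannot compute a delta against the interrupted event, and
therefore records a delta of zero. The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sample {
	uint64_t clock; /* time at which the event really happened */
	int nested;     /* 1 if written by a handler nesting over a writer */
};

/*
 * ASSUMPTION (simplified model): a nested writer has no committed previous
 * timestamp to subtract from, so it stores a delta of 0.
 */
static void encode_deltas(const struct sample *in, uint64_t *delta, size_t n)
{
	uint64_t last = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (in[i].nested) {
			delta[i] = 0;
		} else {
			delta[i] = in[i].clock - last;
			last = in[i].clock;
		}
	}
}

/* Rebuild absolute timestamps the way a trace reader would: sum the deltas. */
static void decode_deltas(const uint64_t *delta, uint64_t *ts, size_t n)
{
	uint64_t now = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		now += delta[i];
		ts[i] = now;
	}
}
```

For example, feeding in an outer event at clock 1000, a nested handler entry
at 1100, a nested handler exit at 1900 and a following event at 2000 decodes
the entry and exit to the same timestamp, so the handler appears to take 0 ns
instead of 800. Per-event timestamps that do not depend on the previous event
avoid this coupling.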
So given that fixing these issues requires a large ABI rework of both Ftrace and
Perf, creating a new ABI rather than building on top of an ABI not initially
designed to meet these requirements seems to really make sense here.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com