public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"Luck, Tony" <tony.luck@intel.com>,
	linux-kernel@vger.kernel.org, ying.huang@intel.com, bp@alien8.de,
	tglx@linutronix.de, akpm@linux-foundation.org,
	mchehab@redhat.com, Arnaldo Carvalho de Melo <acme@redhat.com>,
	Arjan van de Ven <arjan@infradead.org>
Subject: Re: Tracing Requirements (was: [RFC/Requirements/Design] h/w error reporting)
Date: Wed, 10 Nov 2010 15:23:16 -0500
Message-ID: <20101110202316.GA32396@Krystal>
In-Reply-To: <20101110191127.GA6190@nowhere>

* Frederic Weisbecker (fweisbec@gmail.com) wrote:
> On Wed, Nov 10, 2010 at 02:00:45PM -0500, Steven Rostedt wrote:
> > On Wed, 2010-11-10 at 19:41 +0100, Ingo Molnar wrote:
> > 
> > > We'll need to embark on this incremental path instead of a rewrite-the-world thing. 
> > > As a maintainer my task is to say 'no' to rewrite-the-world approaches - and we can 
> > > and will do better here.
> > 
> > Thus you are saying that we stick to the status quo, and also ignore the
> > fact that perf was a rewrite-the-world from ftrace to begin with.
> 
> Perhaps you and Mathieu can summarize your requirements here and then explain
> why extending the current ABI wouldn't work. It's quite normal that people
> try to find a solution fully backward compatible in the first place. If
> it's not possible, fine, but then justify it.

Sure, here are the requirements my user-base has, followed by a list of Perf
and Ftrace pain points. Some of these derive directly from their respective
ABIs; others are caused partly by their implementation and partly by their
ABI.

- Low overhead is key
  - 150 ns per event (cache-hot)
  - Zero-copy (splice to disk/network, mmap for zero-copy in-place data
    analysis)
- Compactness of traces
  - e.g. 96 bits per event (including typical 64-bit payload), no PID saved per
    event.
- Scalability to multi-core and multi-processor
  - Per-CPU buffers, time-stamp reading both scalable to many CPUs *and* accurate
- Production-grade tracer reliability
  - Trace clock accuracy within 100ns, ordering can be inferred based on
    lock/interrupt handler knowledge, ability to know when ordering might be
    wrong.
- Flight recorder mode
  - Support concurrent read while writer is overwriting buffer data
    (Thomas Gleixner named these "trace-shots")
- Support multiple trace sessions in parallel
  - Engineer + Operator + flight recorder for automated bug reports
- Availability of trace buffers for crash diagnosis
  - Save to disk, network, use kexec or persistent memory
- Heterogeneous environment support
  - Portability
  - Distinct host/target environment support
  - Management of multiple target kernel versions
  - No dependency on kernel image to analyze traces
    (traces contain complete information)
- Live view/analysis of trace streams via the network
  - Impact on buffer flushing, power saving, idle, ...
- Synchronized system-wide (hypervisor, kernel and user-space) traces
- Scalability of analysis tools to very large data sets (> 10GB)
- Standardization of trace format across analysis tools
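
To make the "96 bits per event" compactness figure concrete, here is a minimal
sketch of one way such an event could be laid out: a 32-bit header followed by
a 64-bit payload. The 5-bit event ID / 27-bit truncated timestamp split and all
names below are assumptions made for illustration only; they are not taken from
the Perf or Ftrace ABI, nor stated in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header split: 5-bit event ID + 27-bit truncated timestamp. */
#define EV_ID_BITS 5u
#define EV_TS_BITS 27u
#define EV_ID_MASK ((1u << EV_ID_BITS) - 1u)
#define EV_TS_MASK ((1u << EV_TS_BITS) - 1u)
#define EV_SIZE    12u /* 32-bit header + 64-bit payload = 96 bits */

/* Serialize one event into dst (host byte order); returns bytes written.
 * No PID is stored: it is assumed recoverable from scheduler events. */
static size_t compact_event_write(uint8_t *dst, uint32_t id, uint64_t ts,
                                  uint64_t payload)
{
    uint32_t hdr;

    assert(id <= EV_ID_MASK);
    /* Keep only the low 27 bits of the timestamp in the header. */
    hdr = (id << EV_TS_BITS) | ((uint32_t)ts & EV_TS_MASK);
    memcpy(dst, &hdr, sizeof(hdr));
    memcpy(dst + sizeof(hdr), &payload, sizeof(payload));
    return EV_SIZE;
}

/* Extract the event ID from a serialized event. */
static uint32_t compact_event_id(const uint8_t *src)
{
    uint32_t hdr;

    memcpy(&hdr, src, sizeof(hdr));
    return hdr >> EV_TS_BITS;
}

/* Extract the truncated (low 27 bits) timestamp from a serialized event. */
static uint32_t compact_event_ts_low(const uint8_t *src)
{
    uint32_t hdr;

    memcpy(&hdr, src, sizeof(hdr));
    return hdr & EV_TS_MASK;
}
```

A truncated timestamp like this only works if the full timestamp is recorded
elsewhere often enough to detect wrap-around, for instance once per sub-buffer;
that part is left out of the sketch.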


* Ring Buffer issues with Perf:

- Perf does not support flight recorder tracing (concurrent read/write)
  - Sub-buffers are needed to support concurrent read/writes in flight recorder
    mode. Peter still has to convince me otherwise (if he cares).
  - This implies adding padding when an event does not fit in the current
    sub-buffer (ABI change). Note for Frederic: creating a single sub-buffer as
    large as the buffer does not solve this problem, because perf allows writing
    an event across the end of the buffer and its beginning. In a scheme where
    sub-buffers can be discarded, that makes it quite unreliable to figure out
    where partially overwritten events end.
  - Calling the kernel when finishing reading a sub-buffer is needed for flight
    recorder mode tracing. It is not possible with the mmap-head-tail-counter
    ABI Perf currently uses for reader-writer synchronization.
- Perf is 5 times slower than Ftrace/Generic Ring Buffer Library/LTTng.
  - Partially due to implementation.
  - Partially due to large event size.
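
The sub-buffer padding rule described above can be sketched as follows. This is
a simplified, single-writer model under stated assumptions: the buffer size is
a multiple of the sub-buffer size, events never exceed one sub-buffer, and the
names (subbuf_reserve, struct reserve_result) are hypothetical rather than part
of any existing kernel API.

```c
#include <assert.h>
#include <stddef.h>

struct reserve_result {
    size_t padding;      /* bytes of padding emitted before the event */
    size_t event_offset; /* buffer offset where the event starts */
    size_t next_offset;  /* write offset after the event */
};

/* Reserve ev_size bytes so that the event never straddles a sub-buffer
 * boundary. If it does not fit in what is left of the current sub-buffer,
 * the remainder is padded and the event starts at the next sub-buffer. */
static struct reserve_result subbuf_reserve(size_t write_offset,
                                            size_t subbuf_size,
                                            size_t buf_size,
                                            size_t ev_size)
{
    struct reserve_result r;
    size_t remaining;

    assert(ev_size <= subbuf_size);
    assert(buf_size % subbuf_size == 0);

    remaining = subbuf_size - (write_offset % subbuf_size);
    r.padding = (ev_size > remaining) ? remaining : 0;
    r.event_offset = (write_offset + r.padding) % buf_size;
    r.next_offset = (r.event_offset + ev_size) % buf_size;
    return r;
}
```

Because every sub-buffer then begins on an event boundary, a reader can discard
or skip a whole sub-buffer without having to guess where a partially
overwritten event ends.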

* Trace Format issues with Perf:

- Perf event headers are too large
- Handling of instrumentation added dynamically while a trace is being recorded
  is nonexistent.


* Ring Buffer issues with Ftrace:

- Ftrace needs an internal API cleanup.
  - "peek" is an unnecessary API duplication which complicates everything down
    to the buffer-level.
- Ftrace does not support cross-page event writes
  - Limits event size to less than 4kB

* Trace Format issues with Ftrace:

- Ftrace timestamps are saved as delta from previous event
  - Only works for tracing where preemption can be disabled, unusable for
    user-space tracing.
  - Creates an artificial data dependency between events, leading to odd
    side-effects when events nest over the tracer (e.g. an interrupt arriving
    while an event is being written)
    - 0 ns IRQ/SOFTIRQ handler duration side-effect
- Event size limited to one page
- Ftrace event headers are still too large
- Handling of instrumentation added dynamically while a trace is being recorded
  is nonexistent.
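
The data dependency created by delta timestamps can be illustrated with a small
decoder. This is a simplified model, not Ftrace's actual format: it assumes
each event stores only a 32-bit delta from the previous event, and that events
nesting over an in-progress write are recorded with a delta of 0. The function
name decode_deltas is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild absolute timestamps from per-event deltas:
 * ts_out[i] = base + delta[0] + ... + delta[i].
 * Every timestamp depends on all the events recorded before it. */
static void decode_deltas(uint64_t base, const uint32_t *delta, size_t n,
                          uint64_t *ts_out)
{
    uint64_t now = base;
    size_t i;

    assert(n == 0 || (delta != NULL && ts_out != NULL));
    for (i = 0; i < n; i++) {
        now += delta[i];
        ts_out[i] = now;
    }
}
```

With deltas {100, 0, 0, 50} and a base of 1000, the two middle events (say, an
interrupt handler entry and exit that nested over the first event) both decode
to 1100, so the handler appears to have run for 0 ns. An absolute or
independently truncated timestamp per event does not have this dependency.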

So given that fixing these issues requires a large ABI rework of both Ftrace and
Perf, creating a new ABI rather than building on top of an ABI not initially
designed to meet these requirements seems to really make sense here.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

