From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"Luck, Tony" <tony.luck@intel.com>,
	linux-kernel@vger.kernel.org, ying.huang@intel.com, bp@alien8.de,
	tglx@linutronix.de, akpm@linux-foundation.org,
	mchehab@redhat.com, Arnaldo Carvalho de Melo <acme@redhat.com>,
	Arjan van de Ven <arjan@infradead.org>
Subject: Re: Tracing Requirements (was: [RFC/Requirements/Design] h/w error reporting)
Date: Wed, 10 Nov 2010 15:23:16 -0500
Message-ID: <20101110202316.GA32396@Krystal>
In-Reply-To: <20101110191127.GA6190@nowhere>

* Frederic Weisbecker (fweisbec@gmail.com) wrote:
> On Wed, Nov 10, 2010 at 02:00:45PM -0500, Steven Rostedt wrote:
> > On Wed, 2010-11-10 at 19:41 +0100, Ingo Molnar wrote:
> > 
> > > We'll need to embark on this incremental path instead of a rewrite-the-world thing. 
> > > As a maintainer my task is to say 'no' to rewrite-the-world approaches - and we can 
> > > and will do better here.
> > 
> > Thus you are saying that we stick to the status quo, and also ignore the
> > fact that perf was a rewrite-the-world from ftrace to begin with.
> 
> Perhaps you and Mathieu can summarize your requirements here and then explain
> why extending the current ABI wouldn't work. It's quite normal that people
> try to find a solution fully backward compatible in the first place. If
> it's not possible, fine, but then justify it.

Sure, here are the requirements my user base has, followed by a list of Perf
and Ftrace pain points. Some of them derive directly from their respective
ABIs; others are caused partly by their implementation and partly by their
ABI.

- Low overhead is key
  - 150 ns per event (cache-hot)
  - Zero-copy (splice to disk/network, mmap for zero-copy in-place data
    analysis)
- Compactness of traces
  - e.g. 96 bits per event (including a typical 64-bit payload), no PID saved
    per event.
- Scalability to multi-core and multi-processor
  - Per-CPU buffers, timestamp reading that is both scalable to many CPUs
    *and* accurate
- Production-grade tracer reliability
  - Trace clock accuracy within 100ns; ordering can be inferred from knowledge
    of locks and interrupt handlers; ability to know when ordering might be
    wrong.
- Flight recorder mode
  - Support concurrent read while writer is overwriting buffer data
    (Thomas Gleixner named these "trace-shots")
- Support multiple trace sessions in parallel
  - Engineer + Operator + flight recorder for automated bug reports
- Availability of trace buffers for crash diagnosis
  - Save to disk or network; use kexec or persistent memory
- Heterogeneous environment support
  - Portability
  - Distinct host/target environment support
  - Management of multiple target kernel versions
  - No dependency on kernel image to analyze traces
    (traces contain complete information)
- Live view/analysis of trace streams via the network
  - Impact on buffer flushing, power saving, idle, ...
- Synchronized system-wide (hypervisor, kernel and user-space) traces
- Scalability of analysis tools to very large data sets (> 10GB)
- Standardization of trace format across analysis tools


* Ring Buffer issues with Perf:

- Perf does not support flight recorder tracing (concurrent read/write)
  - Sub-buffers are needed to support concurrent reads/writes in flight
    recorder mode. Peter still has to convince me otherwise (if he cares).
  - This implies adding padding when an event does not fit in the current
    sub-buffer (ABI change). Note for Frederic: creating a single sub-buffer as
    large as the whole buffer does not solve this problem, because perf allows
    an event to wrap around from the end of the buffer to its beginning. In a
    scheme where sub-buffers can be discarded, this makes it quite unreliable
    to figure out where partially overwritten events end.
  - Flight recorder mode requires calling into the kernel when the reader
    finishes reading a sub-buffer. This is not possible with the
    mmap-head-tail-counter ABI Perf currently uses for reader-writer
    synchronization.
- Perf is 5 times slower than Ftrace/Generic Ring Buffer Library/LTTng.
  - Partially due to implementation.
  - Partially due to large event size.

* Trace Format issues with Perf:

- Perf event headers are too large
- Handling of instrumentation added dynamically while a trace is being
  recorded is nonexistent.


* Ring Buffer issues with Ftrace:

- Ftrace needs an internal API cleanup.
  - "peek" is an unnecessary API duplication that complicates everything
    down to the buffer level.
- Ftrace does not support cross-page event writes
  - Limits event size to less than 4kB (one page)

* Trace Format issues with Ftrace:

- Ftrace timestamps are saved as deltas from the previous event
  - Only works for tracing where preemption can be disabled; unusable for
    user-space tracing.
  - Creates an artificial data dependency between events, leading to odd
    side effects when tracer calls nest (e.g. an interrupt handler emitting
    events in the middle of another event's write)
    - 0 ns IRQ/SOFTIRQ handler duration side effect
- Event size limited to one page
- Ftrace event headers are still too large
- Handling of instrumentation added dynamically while a trace is being
  recorded is nonexistent.

So, given that fixing these issues requires a large ABI rework of both Ftrace
and Perf, creating a new ABI really seems to make more sense here than
building on top of ABIs that were not initially designed to meet these
requirements.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


Thread overview: 50+ messages
2010-11-10  0:56 [RFC/Requirements/Design] h/w error reporting Luck, Tony
2010-11-10 10:14 ` Ingo Molnar
2010-11-10 14:40   ` Steven Rostedt
2010-11-10 14:43     ` Peter Zijlstra
2010-11-10 15:09       ` Steven Rostedt
2010-11-10 15:28         ` Mathieu Desnoyers
2010-11-10 15:30         ` Peter Zijlstra
2010-11-10 15:53           ` Steven Rostedt
2010-11-10 16:52           ` Steven Rostedt
2010-11-10 17:05             ` Borislav Petkov
2010-11-10 17:41               ` Ingo Molnar
2010-11-10 17:50                 ` Luck, Tony
2010-11-10 18:09                 ` Steven Rostedt
2010-11-10 18:52                   ` Ingo Molnar
2010-11-10 17:25             ` Frederic Weisbecker
2010-11-10 17:48           ` Ingo Molnar
2010-11-10 18:05             ` Steven Rostedt
2010-11-10 18:23               ` Luck, Tony
2010-11-10 18:31                 ` Peter Zijlstra
2010-11-10 18:49                   ` Ingo Molnar
2010-11-10 18:24               ` Peter Zijlstra
2010-11-10 18:41                 ` Ingo Molnar
2010-11-10 19:00                   ` Steven Rostedt
2010-11-10 19:11                     ` Ingo Molnar
2010-11-10 19:11                     ` Frederic Weisbecker
2010-11-10 19:30                       ` Ingo Molnar
2010-11-10 19:48                         ` Steven Rostedt
2010-11-10 20:23                       ` Mathieu Desnoyers [this message]
2010-11-10 20:54                         ` Tracing Requirements (was: [RFC/Requirements/Design] h/w error reporting) Luck, Tony
2010-11-10 21:06                           ` Steven Rostedt
2010-11-10 21:34                             ` Steven Rostedt
2010-11-10 22:51                           ` Mathieu Desnoyers
2010-11-10 23:12                             ` Thomas Gleixner
2010-11-10 23:20                               ` Steven Rostedt
2010-11-10 23:45                                 ` Thomas Gleixner
2010-11-11 18:25                                 ` Ted Ts'o
2010-11-10 23:28                               ` Mathieu Desnoyers
2010-11-10 23:58                                 ` Thomas Gleixner
2010-11-11  9:17                                   ` Ingo Molnar
2010-11-11 13:37                                     ` Mathieu Desnoyers
2010-11-10 21:30                         ` Frederic Weisbecker
2010-11-10 21:54                           ` Steven Rostedt
2010-11-10 22:19                             ` Frederic Weisbecker
2010-11-10 22:49                             ` Frederic Weisbecker
2010-11-11  0:11                           ` Mathieu Desnoyers
2010-11-11 16:10                             ` Steven Rostedt
2010-11-11 16:34                               ` Mathieu Desnoyers
2010-11-10 19:16                 ` [RFC/Requirements/Design] h/w error reporting Steven Rostedt
2010-11-10 19:38                 ` Steven Rostedt
2010-11-10 18:27               ` Ingo Molnar
