From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Martin Bligh <mbligh@google.com>,
Steven Rostedt <rostedt@goodmis.org>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>,
prasad@linux.vnet.ibm.com, "Frank Ch. Eigler" <fche@redhat.com>,
David Wilder <dwilder@us.ibm.com>,
hch@lst.de, Tom Zanussi <zanussi@comcast.net>,
Steven Rostedt <srostedt@redhat.com>
Subject: Re: [RFC PATCH 1/3] Unified trace buffer
Date: Wed, 24 Sep 2008 14:01:00 -0400 [thread overview]
Message-ID: <20080924180100.GA4374@Krystal> (raw)
In-Reply-To: <alpine.LFD.1.10.0809240921430.3265@nehalem.linux-foundation.org>
* Linus Torvalds (torvalds@linux-foundation.org) wrote:
>
>
> On Wed, 24 Sep 2008, Peter Zijlstra wrote:
> >
> > So when we reserve we get a pointer into page A, but our reserve length
> > will run over into page B. A write() method will know how to check for
> > this and break up the memcpy to copy up-to the end of A and continue
> > into B.
>
> I would suggest just not allowing page straddling.
>
> Yeah, it would limit event size to less than a page, but seriously, do
> people really want more than that? If you have huge events, I suspect it
> would be a hell of a lot better to support some kind of indirection
> scheme than to force the ring buffer to handle insane cases.
>
> Most people will want the events to be as _small_ as humanly possible. The
> normal event size should hopefully be in the 8-16 bytes, and I think the
> RFC patch is already broken because it allocates that insane 64-bit event
> counter for things. Who the hell wants a 64-bit event counter that much?
> That's broken.
>
> Linus
>
Hi Linus,
I agree that the standard "high event rate" use-case, when events are as
small as possible, would fit perfectly in 4kB sub-subbfers. However,
I see a few use-cases where having the ability to write across page
boundaries would be useful. Those will likely be low event-rate
situations where it is useful to take a bigger snapshot of a problematic
condition, but still to have it synchronized with the rest of the trace
data. e.g. :
- Writing a whole video frame into the trace upon video card glitch.
- Writing a jumbo frame (up to 9000 bytes) into the buffer when a
network card error is detected or when some iptables rules (LOG, TRACE
?) are reached.
- Dumping a kernel stack (potentially 8KB) in a single event when a
kernel OOPS is reached.
- Dumping a userspace process stack into the trace upon SIGILL, SIGSEGV
and friends.
That's only what I come up with from the top of my head, and I am sure
we'll find very ingenious users who will find plenty of other use-cases
where 4kB events won't be enough.
(It reminds me of someone saying "640K ought to be enough for anybody.")
;-)
If the write abstraction supports page straddling, I think it would be a
huge gain in simplicity for such users because they would not have to
break their payload in various events and have to create another event
layer on top of all that which would identify events uniquely with a
"cookie" or to protect writing events into the buffers with another
layer of locking.
Besides, there are other memory backends where the buffers can be put
that do not depend on the page size, namely video card memory. It can be
very useful to collect data that survives reboots. Given that this
memory will likely consist of contiguous pages, I see no need to limit
the maximum event size to a page on such support. Therefore, I think the
support for page-crossing should be placed in the "write" abstraction
(which would be specific to the type of memory used to back the
buffers) rather that the reserve/commit layer (which can simply
do reserve/commit in terms of offset from the buffer start, without
having to know the gory buffer implementation details (e.g. : array of
pages, linear mapping at boot time, video card memory...).
So, given the relative simplicity of doing a write() abstraction layer
which would deal with page-crossing writes compared to the complexity
that users would have to deal with when splitting up their large events,
I would recommend to abstract page straddling when writing to a page
array.
I think having the ability to break the buffers into sub-buffers is
still required, because it's good to have the ability to seek quickly in
such data, and the way to do this is to separate the buffer in
fixed-size sub-buffers which each contains many variable-sized events.
But I would recommend to make the sub-buffer size configurable by the
tracers so we can support events bigger than 4kB when needed.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2008-09-24 18:06 UTC|newest]
Thread overview: 109+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-24 5:10 [RFC PATCH 0/3] An Unified tracing buffer (attempt) Steven Rostedt
2008-09-24 5:10 ` [RFC PATCH 1/3] Unified trace buffer Steven Rostedt
2008-09-24 15:03 ` Peter Zijlstra
2008-09-24 15:44 ` Steven Rostedt
2008-09-25 10:38 ` Ingo Molnar
2008-09-24 15:47 ` Martin Bligh
2008-09-24 16:11 ` Peter Zijlstra
2008-09-24 16:24 ` Linus Torvalds
2008-09-24 16:37 ` Steven Rostedt
2008-09-24 16:56 ` Martin Bligh
2008-09-24 17:25 ` Linus Torvalds
2008-09-24 18:01 ` Mathieu Desnoyers [this message]
2008-09-24 20:49 ` Linus Torvalds
2008-09-24 16:26 ` Steven Rostedt
2008-09-24 16:49 ` Martin Bligh
2008-09-24 17:36 ` Linus Torvalds
2008-09-24 17:49 ` Steven Rostedt
2008-09-24 20:23 ` Linus Torvalds
2008-09-24 20:37 ` David Miller
2008-09-24 20:48 ` Steven Rostedt
2008-09-24 20:51 ` Martin Bligh
2008-09-24 21:24 ` Frank Ch. Eigler
2008-09-24 21:33 ` Steven Rostedt
2008-09-24 20:47 ` Steven Rostedt
2008-09-24 21:03 ` Martin Bligh
2008-09-24 21:17 ` Steven Rostedt
2008-09-24 21:51 ` Steven Rostedt
2008-09-25 10:41 ` Peter Zijlstra
2008-09-25 14:33 ` Martin Bligh
2008-09-25 14:53 ` Peter Zijlstra
2008-09-25 15:05 ` Linus Torvalds
2008-09-25 15:25 ` Martin Bligh
2008-09-25 15:36 ` Ingo Molnar
2008-09-25 16:23 ` Mathieu Desnoyers
2008-09-25 16:32 ` Steven Rostedt
2008-09-25 17:20 ` Mathieu Desnoyers
2008-09-25 17:32 ` Steven Rostedt
2008-09-25 16:40 ` Linus Torvalds
2008-09-25 16:53 ` Steven Rostedt
2008-09-25 17:07 ` Linus Torvalds
2008-09-25 19:55 ` Ingo Molnar
2008-09-25 20:12 ` Ingo Molnar
2008-09-25 20:24 ` Linus Torvalds
2008-09-25 20:29 ` Linus Torvalds
2008-09-25 20:47 ` Steven Rostedt
2008-09-25 21:01 ` Steven Rostedt
2008-09-25 21:10 ` Ingo Molnar
2008-09-25 21:16 ` Ingo Molnar
2008-09-25 21:41 ` Ingo Molnar
2008-09-25 21:56 ` Ingo Molnar
2008-09-25 21:58 ` Linus Torvalds
2008-09-25 22:14 ` Ingo Molnar
2008-09-25 23:33 ` Linus Torvalds
2008-09-27 17:16 ` Ingo Molnar
2008-09-27 17:36 ` Ingo Molnar
2008-09-27 17:38 ` Steven Rostedt
2008-09-27 17:50 ` Peter Zijlstra
2008-09-27 18:18 ` Steven Rostedt
2008-09-27 18:42 ` Ingo Molnar
2008-09-25 20:52 ` Ingo Molnar
2008-09-25 21:14 ` Jeremy Fitzhardinge
2008-09-25 21:15 ` Martin Bligh
2008-09-25 20:29 ` Mathieu Desnoyers
2008-09-25 20:20 ` Ingo Molnar
2008-09-25 21:02 ` Jeremy Fitzhardinge
2008-09-25 21:55 ` Linus Torvalds
2008-09-25 22:25 ` Ingo Molnar
2008-09-25 22:45 ` Steven Rostedt
2008-09-25 23:04 ` Jeremy Fitzhardinge
2008-09-25 23:25 ` Ingo Molnar
2008-09-26 14:04 ` Thomas Gleixner
2008-09-25 22:39 ` Jeremy Fitzhardinge
2008-09-25 22:55 ` Ingo Molnar
2008-09-26 1:17 ` Jeremy Fitzhardinge
2008-09-26 1:27 ` Steven Rostedt
2008-09-26 1:49 ` Jeremy Fitzhardinge
2008-09-25 22:59 ` Steven Rostedt
2008-09-26 1:27 ` Jeremy Fitzhardinge
2008-09-26 1:35 ` Steven Rostedt
2008-09-26 2:07 ` Jeremy Fitzhardinge
2008-09-26 2:25 ` Steven Rostedt
2008-09-26 5:31 ` Jeremy Fitzhardinge
2008-09-26 10:41 ` Steven Rostedt
2008-09-25 15:26 ` Steven Rostedt
2008-09-25 17:22 ` Linus Torvalds
2008-09-25 17:39 ` Steven Rostedt
2008-09-25 18:14 ` Linus Torvalds
2008-09-25 15:20 ` Steven Rostedt
2008-09-24 17:54 ` Martin Bligh
2008-09-24 18:04 ` Martin Bligh
2008-09-24 20:39 ` Linus Torvalds
2008-09-24 20:56 ` Martin Bligh
2008-09-24 21:08 ` Steven Rostedt
2008-09-24 20:30 ` Linus Torvalds
2008-09-24 20:53 ` Mathieu Desnoyers
2008-09-24 22:28 ` Linus Torvalds
2008-09-24 22:41 ` Linus Torvalds
2008-09-25 17:15 ` Mathieu Desnoyers
2008-09-25 17:29 ` Linus Torvalds
2008-09-25 17:42 ` Mathieu Desnoyers
2008-09-25 16:37 ` Mathieu Desnoyers
2008-09-25 16:49 ` Linus Torvalds
2008-09-25 17:02 ` Steven Rostedt
2008-09-24 16:13 ` Mathieu Desnoyers
2008-09-24 16:31 ` Steven Rostedt
2008-09-24 16:39 ` Peter Zijlstra
2008-09-24 16:51 ` Mathieu Desnoyers
2008-09-24 5:10 ` [RFC PATCH 2/3] ftrace: combine some print formating Steven Rostedt
2008-09-24 5:10 ` [RFC PATCH 3/3] ftrace: hack in the ring buffer Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080924180100.GA4374@Krystal \
--to=compudj@krystal.dyndns.org \
--cc=akpm@linux-foundation.org \
--cc=dwilder@us.ibm.com \
--cc=fche@redhat.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mbligh@google.com \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=prasad@linux.vnet.ibm.com \
--cc=rostedt@goodmis.org \
--cc=srostedt@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=zanussi@comcast.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox