From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>, LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Steven Rostedt <rostedt@rostedt.homelinux.com>,
Thomas Gleixner <tglx@linutronix.de>,
Christoph Hellwig <hch@lst.de>, Li Zefan <lizf@cn.fujitsu.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
Johannes Berg <johannes.berg@intel.com>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Arnaldo Carvalho de Melo <acme@infradead.org>,
Tom Zanussi <tzanussi@gmail.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andi Kleen <andi@firstfloor.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Jeremy Fitzhardinge <jeremy@goop.org>,
"Frank Ch. Eigler" <fche@redhat.com>,
Tejun Heo <htejun@gmail.com>
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe
Date: Thu, 15 Jul 2010 12:26:32 -0400
Message-ID: <20100715162631.GB30989@Krystal>
In-Reply-To: <20100714233843.GD14533@nowhere>
* Frederic Weisbecker (fweisbec@gmail.com) wrote:
> On Wed, Jul 14, 2010 at 07:11:17PM -0400, Mathieu Desnoyers wrote:
> > * Frederic Weisbecker (fweisbec@gmail.com) wrote:
> > > On Wed, Jul 14, 2010 at 06:31:07PM -0400, Mathieu Desnoyers wrote:
> > > > * Frederic Weisbecker (fweisbec@gmail.com) wrote:
> > > > > On Wed, Jul 14, 2010 at 12:54:19PM -0700, Linus Torvalds wrote:
> > > > > > On Wed, Jul 14, 2010 at 12:36 PM, Frederic Weisbecker
> > > > > > <fweisbec@gmail.com> wrote:
> > > > > > >
> > > > > > > There is also the fact we need to handle the lost NMI, by defering its
> > > > > > > treatment or so. That adds even more complexity.
> > > > > >
> > > > > > I don't think your read my proposal very deeply. It already handles
> > > > > > them by taking a fault on the iret of the first one (that's why we
> > > > > > point to the stack frame - so that we can corrupt it and force a
> > > > > > fault).
> > > > >
> > > > >
> > > > > Ah right, I missed this part.
> > > >
> > > > Hrm, Frederic, I hate to ask that but.. what are you doing with those percpu 8k
> > > > data structures exactly ? :)
> > > >
> > > > Mathieu
> > >
> > >
> > >
> > > So, when an event triggers in perf, we sometimes want to capture the stacktrace
> > > that led to the event.
> > >
> > > We want this stacktrace (here we call that a callchain) to be recorded
> > > locklessly. So we want this callchain buffer per cpu, with the following
> > > type:
> >
> > Ah OK, so you mean that perf now has 2 different ring buffer implementations ?
> > How about using a single one that is generic enough to handle perf and ftrace
> > needs instead ?
> >
> > (/me runs away quickly before the lightning strikes) ;)
> >
> > Mathieu
>
>
> :-)
>
> That's no ring buffer. It's a temporary linear buffer to fill a stacktrace,
> and get its effective size before committing it to the real ring buffer.

I was thinking more along the lines of making sure a ring buffer has the proper
support for your use-case. It shares a lot of requirements with a standard ring
buffer:

- It needs to be lock-less.
- It needs to reserve space, then write data into a buffer.

By configuring a ring buffer with a 4k sub-buffer size (that's configurable
dynamically), all we need to add is the ability to squash a previously saved
record from the buffer. I am confident we can provide a clean API for this that
would allow discarding a previously committed entry as long as we are still on
the same non-fully-committed sub-buffer. This fits your use-case exactly, so
that's fine.
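
To make the reserve/commit/discard idea above concrete, here is a rough
user-space sketch. It is not the API of any existing ring buffer: the names
(struct subbuf, sb_reserve, sb_commit, sb_discard_last) are hypothetical, it
models a single 4k sub-buffer only, and it is deliberately single-threaded, so
it shows the bookkeeping and none of the lock-less synchronization a real
implementation would need.

```c
#include <assert.h>
#include <stddef.h>

#define SUBBUF_SIZE 4096

/* Hypothetical model of one 4k sub-buffer (illustration only). */
struct subbuf {
	size_t reserved;	/* bytes handed out to writers so far */
	size_t committed;	/* bytes whose writers have finished */
	size_t last_start;	/* offset of the most recently reserved record */
	unsigned char data[SUBBUF_SIZE];
};

/* Reserve len bytes; returns NULL when the sub-buffer cannot hold them. */
static void *sb_reserve(struct subbuf *sb, size_t len)
{
	if (len > SUBBUF_SIZE - sb->reserved)
		return NULL;
	sb->last_start = sb->reserved;
	sb->reserved += len;
	return sb->data + sb->last_start;
}

/* Mark len previously reserved bytes as fully written. */
static void sb_commit(struct subbuf *sb, size_t len)
{
	assert(sb->committed + len <= sb->reserved);
	sb->committed += len;
}

/*
 * Squash the most recent record. Assumption of this sketch: discarding is
 * only allowed when every reserved byte has been committed (no writer is
 * still in flight) and there is a record left to drop.
 * Returns 0 on success, -1 otherwise.
 */
static int sb_discard_last(struct subbuf *sb)
{
	size_t len = sb->reserved - sb->last_start;

	if (len == 0 || sb->committed != sb->reserved)
		return -1;
	sb->reserved = sb->last_start;
	sb->committed = sb->last_start;
	return 0;
}
```

A caller would reserve space for the callchain, fill it, commit it, and then
call sb_discard_last() if the record turns out not to be needed. Only the last
record can be dropped in this sketch; a second call in a row fails.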

You could have one 4k ring buffer per cpu per execution context. I wonder
whether every Linux architecture has support for separate thread vs softirq vs
irq vs nmi stacks? Even then, given you have only one stack for all shared irqs,
you need something that is concurrency-aware at the ring buffer level.
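
As a rough illustration of "one buffer per cpu per execution context", the
user-space sketch below indexes a static table by cpu and context. Everything
in it is an assumption made for the example: the enum, NR_CPUS_SKETCH,
ctx_buf_get() and ctx_buf_put() are hypothetical names, and the cpu and context
are passed in by the caller rather than detected. Note that it sidesteps the
nesting problem mentioned above by simply refusing a second user of the same
slot (for instance a nested irq), which is a cruder technique than making the
buffer itself concurrency-aware.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH	4	/* arbitrary value for the example */
#define CTX_BUF_SIZE	4096

enum exec_ctx { CTX_THREAD, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI, CTX_MAX };

/* Hypothetical temporary buffer, one per (cpu, context) pair. */
struct ctx_buf {
	int in_use;		/* set while a handler owns the buffer */
	size_t len;		/* bytes currently stored */
	unsigned char data[CTX_BUF_SIZE];
};

static struct ctx_buf ctx_bufs[NR_CPUS_SKETCH][CTX_MAX];

/*
 * Claim the buffer for this cpu and context. Returns NULL if the indices
 * are out of range or the slot is already owned (nested use is refused).
 */
static struct ctx_buf *ctx_buf_get(int cpu, enum exec_ctx ctx)
{
	struct ctx_buf *b;

	if (cpu < 0 || cpu >= NR_CPUS_SKETCH || (unsigned int)ctx >= CTX_MAX)
		return NULL;
	b = &ctx_bufs[cpu][ctx];
	if (b->in_use)
		return NULL;
	b->in_use = 1;
	b->len = 0;
	return b;
}

/* Release a buffer obtained from ctx_buf_get(). */
static void ctx_buf_put(struct ctx_buf *b)
{
	assert(b->in_use);
	b->in_use = 0;
}
```

With this layout an NMI arriving while an irq handler owns its buffer gets a
different slot, so the two never write into the same memory.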

These small stack-like ring buffers could be used to save your temporary stack
copy. When you really need to save it to a larger ring buffer along with a
trace, then you just take a snapshot of the stack ring buffers.

So you get to use one single ring buffer synchronization and memory allocation
mechanism that everyone has reviewed. The advantage is that we would not be
having this nmi race discussion in the first place: the generic ring buffer uses
"get page" directly rather than relying on vmalloc, because these bugs have
already been identified and dealt with years ago.

Thanks,

Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com