From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Steven Rostedt <rostedt@rostedt.homelinux.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Christoph Hellwig <hch@lst.de>, Li Zefan <lizf@cn.fujitsu.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
Johannes Berg <johannes.berg@intel.com>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Arnaldo Carvalho de Melo <acme@infradead.org>,
Tom Zanussi <tzanussi@gmail.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andi Kleen <andi@firstfloor.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Jeremy Fitzhardinge <jeremy@goop.org>,
"Frank Ch. Eigler" <fche@redhat.com>,
Tejun Heo <htejun@gmail.com>
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe
Date: Fri, 16 Jul 2010 15:13:04 -0400 [thread overview]
Message-ID: <20100716191304.GB8309@Krystal> (raw)
In-Reply-To: <20100715220117.GA1499@Krystal>
Hi Linus,
What I omitted in my original description paragraph is that I also test for NMIs
nested over NMI "regular code" with a "nesting" per-cpu flag, which deals with
the concerns you expressed in your reply about function calls and traps.
I'm self-replying to keep track of Avi's comment about the need to save/restore
cr2 at the beginning/end of the NMI handler, so we don't end up corrupting a VM
CR2 if we have the following scenario: trap in VM, NMI, trap in NMI. So I added
cr2 awareness to the code snippet below, so we should be close to have something
that starts to make sense. (although I'm not saying it's bug-free yet) ;)
Please note that I'll be off on vacation for 2 weeks starting this evening (back
on August 2) without Internet access, so my answers might be delayed.
Thanks !
Mathieu
Code originally written by Linus Torvalds, modified by Mathieu Desnoyers
intenting to handle the fake NMI entry gracefully given that NMIs are not
necessarily disabled at the entry point. It uses a "need fake NMI" flag rather
than playing games with CS and faults. When a fake NMI is needed, it simply
jumps back to the beginning of regular nmi code. NMI exit code and fake NMI
entry are made reentrant with respect to NMI handler interruption by testing, at
the very beginning of the NMI handler, if a NMI is nested over the whole
nmi_atomic .. nmi_atomic_end code region. It also tests for nested NMIs by
keeping a per-cpu "nmi nested" flag"; this deals with detection of nesting over
the "regular nmi" execution. This code assumes NMIs have a separate stack.
#
# Two per-cpu variables: a "are we nested" flag (one byte).
# a "do we need to execute a fake NMI" flag (one byte).
# The %rsp at which the stack copy is saved is at a fixed address, which leaves
# enough room at the bottom of NMI stack for the "real" NMI entry stack. This
# assumes we have a separate NMI stack.
# The NMI stack copy top of stack is at nmi_stack_copy.
# The NMI stack copy "rip" is at nmi_stack_copy_rip, which is set to
# nmi_stack_copy-40.
#
nmi:
# Test if nested over atomic code.
cmpq $nmi_atomic,0(%rsp)
jae nmi_addr_is_ae
# Test if nested over general NMI code.
cmpb $0,%__percpu_seg:nmi_stack_nesting
jne nmi_nested_set_fake_and_return
# create new stack
is_unnested_nmi:
# Save some space for nested NMI's. The exception itself
# will never use more space, but it might use less (since
# if will be a kernel-kernel transition).
# Save %rax on top of the stack (need to temporarily use it)
pushq %rax
movq %rsp, %rax
movq $nmi_stack_copy,%rsp
# copy the five words of stack info. rip starts at 8+0(%rax).
# cr2 is saved at nmi_stack_copy_rip+40
pushq %cr2 # save cr2 to handle nesting over page faults
pushq 8+32(%rax) # ss
pushq 8+24(%rax) # rsp
pushq 8+16(%rax) # eflags
pushq 8+8(%rax) # cs
pushq 8+0(%rax) # rip
movq 0(%rax),%rax # restore %rax
set_nmi_nesting:
# and set the nesting flags
movb $0xff,%__percpu_seg:nmi_stack_nesting
regular_nmi_code:
...
# regular NMI code goes here, and can take faults,
# because this sequence now has proper nested-nmi
# handling
...
nmi_atomic:
# An NMI nesting over the whole nmi_atomic .. nmi_atomic_end region will
# be handled specially. This includes the fake NMI entry point.
cmpb $0,%__percpu_seg:need_fake_nmi
jne fake_nmi
movb $0,%__percpu_seg:nmi_stack_nesting
# restore cr2
movq %nmi_stack_copy_rip+40,%cr2
iret
# This is the fake NMI entry point.
fake_nmi:
movb $0x0,%__percpu_seg:need_fake_nmi
jmp regular_nmi_code
nmi_atomic_end:
# Make sure the address is in the nmi_atomic range and in CS segment.
nmi_addr_is_ae:
cmpq $nmi_atomic_end,0(%rsp)
jae is_unnested_nmi
# The saved rip points to the final NMI iret. Check the CS segment to
# make sure.
cmpw $__KERNEL_CS,8(%rsp)
jne is_unnested_nmi
# This is the case when we hit just as we're supposed to do the atomic code
# of a previous nmi. We run the NMI using the old return address that is still
# on the stack, rather than copy the new one that is bogus and points to where
# the nested NMI interrupted the original NMI handler!
# Easy: just set the stack pointer to point to the stack copy, clear
# need_fake_nmi (because we are directly going to execute the requested NMI) and
# jump to "nesting flag set" (which is followed by regular nmi code execution).
movq $nmi_stack_copy_rip,%rsp
movb $0x0,%__percpu_seg:need_fake_nmi
jmp set_nmi_nesting
# This is the actual nested case. Make sure we branch to the fake NMI handler
# after this handler is done.
nmi_nested_set_fake_and_return:
movb $0xff,%__percpu_seg:need_fake_nmi
popfq
jmp *(%rsp)
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2010-07-16 19:13 UTC|newest]
Thread overview: 163+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-14 15:49 [patch 0/2] x86: NMI-safe trap handlers Mathieu Desnoyers
2010-07-14 15:49 ` [patch 1/2] x86_64 page fault NMI-safe Mathieu Desnoyers
2010-07-14 16:28 ` Linus Torvalds
2010-07-14 17:06 ` Mathieu Desnoyers
2010-07-14 18:10 ` Linus Torvalds
2010-07-14 18:46 ` Ingo Molnar
2010-07-14 19:14 ` Linus Torvalds
2010-07-14 19:36 ` Frederic Weisbecker
2010-07-14 19:54 ` Linus Torvalds
2010-07-14 20:17 ` Mathieu Desnoyers
2010-07-14 20:55 ` Linus Torvalds
2010-07-14 21:18 ` Ingo Molnar
2010-07-14 22:14 ` Frederic Weisbecker
2010-07-14 22:31 ` Mathieu Desnoyers
2010-07-14 22:48 ` Frederic Weisbecker
2010-07-14 23:11 ` Mathieu Desnoyers
2010-07-14 23:38 ` Frederic Weisbecker
2010-07-15 16:26 ` Mathieu Desnoyers
2010-08-03 17:18 ` Peter Zijlstra
2010-08-03 18:25 ` Mathieu Desnoyers
2010-08-04 6:46 ` Peter Zijlstra
2010-08-04 7:14 ` Ingo Molnar
2010-08-04 14:45 ` Mathieu Desnoyers
2010-08-04 14:56 ` Peter Zijlstra
2010-08-06 1:49 ` Mathieu Desnoyers
2010-08-06 9:51 ` Peter Zijlstra
2010-08-06 13:46 ` Mathieu Desnoyers
2010-08-06 6:18 ` Masami Hiramatsu
2010-08-06 9:50 ` Peter Zijlstra
2010-08-06 13:37 ` Mathieu Desnoyers
2010-08-07 9:51 ` Masami Hiramatsu
2010-08-09 16:53 ` Frederic Weisbecker
2010-08-03 18:56 ` Linus Torvalds
2010-08-03 19:45 ` Mathieu Desnoyers
2010-08-03 20:02 ` Linus Torvalds
2010-08-03 20:10 ` Ingo Molnar
2010-08-03 20:21 ` Ingo Molnar
2010-08-03 21:16 ` Mathieu Desnoyers
2010-08-03 20:54 ` Mathieu Desnoyers
2010-08-04 6:27 ` Peter Zijlstra
2010-08-04 14:06 ` Mathieu Desnoyers
2010-08-04 14:50 ` Peter Zijlstra
2010-08-06 1:42 ` Mathieu Desnoyers
2010-08-06 10:11 ` Peter Zijlstra
2010-08-06 11:14 ` Peter Zijlstra
2010-08-06 14:15 ` Mathieu Desnoyers
2010-08-06 14:13 ` Mathieu Desnoyers
2010-08-11 14:44 ` Steven Rostedt
2010-08-11 14:34 ` Steven Rostedt
2010-08-15 13:35 ` Mathieu Desnoyers
2010-08-15 16:33 ` Avi Kivity
2010-08-15 16:44 ` Mathieu Desnoyers
2010-08-15 16:51 ` Avi Kivity
2010-08-15 18:31 ` Mathieu Desnoyers
2010-08-16 10:49 ` Avi Kivity
2010-08-16 11:29 ` Avi Kivity
2010-08-04 6:46 ` Dave Chinner
2010-08-04 7:21 ` Ingo Molnar
2010-07-14 23:40 ` Steven Rostedt
2010-07-14 19:41 ` Linus Torvalds
2010-07-14 19:56 ` Andi Kleen
2010-07-14 20:05 ` Mathieu Desnoyers
2010-07-14 20:07 ` Andi Kleen
2010-07-14 20:08 ` H. Peter Anvin
2010-07-14 23:32 ` Tejun Heo
2010-07-14 22:31 ` Frederic Weisbecker
2010-07-14 22:56 ` Linus Torvalds
2010-07-14 23:09 ` Andi Kleen
2010-07-14 23:22 ` Linus Torvalds
2010-07-15 14:11 ` Frederic Weisbecker
2010-07-15 14:35 ` Andi Kleen
2010-07-16 11:21 ` Frederic Weisbecker
2010-07-15 14:46 ` Steven Rostedt
2010-07-16 10:47 ` Frederic Weisbecker
2010-07-16 11:43 ` Steven Rostedt
2010-07-15 14:51 ` Linus Torvalds
2010-07-15 15:38 ` Linus Torvalds
2010-07-16 12:00 ` Frederic Weisbecker
2010-07-16 12:54 ` Steven Rostedt
2010-07-14 20:39 ` Mathieu Desnoyers
2010-07-14 21:23 ` Linus Torvalds
2010-07-14 21:45 ` Maciej W. Rozycki
2010-07-14 21:52 ` Linus Torvalds
2010-07-14 22:31 ` Maciej W. Rozycki
2010-07-14 22:21 ` Mathieu Desnoyers
2010-07-14 22:37 ` Linus Torvalds
2010-07-14 22:51 ` Jeremy Fitzhardinge
2010-07-14 23:02 ` Linus Torvalds
2010-07-14 23:54 ` Jeremy Fitzhardinge
2010-07-15 1:23 ` Linus Torvalds
2010-07-15 1:45 ` Linus Torvalds
2010-07-15 18:31 ` Mathieu Desnoyers
2010-07-15 18:43 ` Linus Torvalds
2010-07-15 18:48 ` Linus Torvalds
2010-07-15 22:01 ` Mathieu Desnoyers
2010-07-15 22:16 ` Linus Torvalds
2010-07-15 22:24 ` H. Peter Anvin
2010-07-15 22:26 ` Linus Torvalds
2010-07-15 22:46 ` H. Peter Anvin
2010-07-15 22:58 ` Andi Kleen
2010-07-15 23:20 ` H. Peter Anvin
2010-07-15 23:23 ` Linus Torvalds
2010-07-15 23:41 ` H. Peter Anvin
2010-07-15 23:44 ` Linus Torvalds
2010-07-15 23:46 ` H. Peter Anvin
2010-07-15 23:48 ` Andi Kleen
2010-07-15 22:30 ` Mathieu Desnoyers
2010-07-16 19:13 ` Mathieu Desnoyers [this message]
2010-07-15 16:44 ` Mathieu Desnoyers
2010-07-15 16:49 ` Linus Torvalds
2010-07-15 17:38 ` Mathieu Desnoyers
2010-07-15 20:44 ` H. Peter Anvin
2010-07-18 11:03 ` Avi Kivity
2010-07-18 17:36 ` Linus Torvalds
2010-07-18 18:04 ` Avi Kivity
2010-07-18 18:22 ` Linus Torvalds
2010-07-19 7:32 ` Avi Kivity
2010-07-18 18:17 ` Linus Torvalds
2010-07-18 18:43 ` Steven Rostedt
2010-07-18 19:26 ` Linus Torvalds
2010-07-14 15:49 ` [patch 2/2] x86 NMI-safe INT3 and Page Fault Mathieu Desnoyers
2010-07-14 16:42 ` Maciej W. Rozycki
2010-07-14 18:12 ` Mathieu Desnoyers
2010-07-14 19:21 ` Maciej W. Rozycki
2010-07-14 19:58 ` Mathieu Desnoyers
2010-07-14 20:36 ` Maciej W. Rozycki
2010-07-16 12:28 ` Avi Kivity
2010-07-16 14:49 ` Mathieu Desnoyers
2010-07-16 15:34 ` Andi Kleen
2010-07-16 15:40 ` Mathieu Desnoyers
2010-07-16 16:47 ` Avi Kivity
2010-07-16 16:58 ` Mathieu Desnoyers
2010-07-16 17:54 ` Avi Kivity
2010-07-16 18:05 ` H. Peter Anvin
2010-07-16 18:15 ` Avi Kivity
2010-07-16 18:17 ` H. Peter Anvin
2010-07-16 18:28 ` Avi Kivity
2010-07-16 18:37 ` Linus Torvalds
2010-07-16 19:26 ` Avi Kivity
2010-07-16 21:39 ` Linus Torvalds
2010-07-16 22:07 ` Andi Kleen
2010-07-16 22:26 ` Linus Torvalds
2010-07-16 22:41 ` Andi Kleen
2010-07-17 1:15 ` Linus Torvalds
2010-07-16 22:40 ` Mathieu Desnoyers
2010-07-18 9:23 ` Avi Kivity
2010-07-16 18:22 ` Mathieu Desnoyers
2010-07-16 18:32 ` Avi Kivity
2010-07-16 19:29 ` H. Peter Anvin
2010-07-16 19:39 ` Avi Kivity
2010-07-16 19:32 ` Andi Kleen
2010-07-16 18:25 ` Linus Torvalds
2010-07-16 19:30 ` Andi Kleen
2010-07-18 9:26 ` Avi Kivity
2010-07-16 19:28 ` Andi Kleen
2010-07-16 19:32 ` Avi Kivity
2010-07-16 19:34 ` Andi Kleen
2010-08-04 9:46 ` Peter Zijlstra
2010-08-04 20:23 ` H. Peter Anvin
2010-07-14 17:06 ` [patch 0/2] x86: NMI-safe trap handlers Andi Kleen
2010-07-14 17:08 ` Mathieu Desnoyers
2010-07-14 18:56 ` Andi Kleen
2010-07-14 23:29 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100716191304.GB8309@Krystal \
--to=mathieu.desnoyers@efficios.com \
--cc=acme@infradead.org \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=hch@lst.de \
--cc=hpa@zytor.com \
--cc=htejun@gmail.com \
--cc=jeremy@goop.org \
--cc=johannes.berg@intel.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizf@cn.fujitsu.com \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=rostedt@rostedt.homelinux.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=tzanussi@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox