* [PATCH 1/3 v2] x86: Save cr2 in NMI in case NMIs take a page fault
2012-06-09 2:02 [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
@ 2012-06-09 2:02 ` Steven Rostedt
2012-06-09 2:02 ` [PATCH 2/3 v2] x86: Remove cmpxchg from i386 NMI nesting code Steven Rostedt
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2012-06-09 2:02 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
Avi Kivity, Linus Torvalds, Brian Gerst
[-- Attachment #1: Type: text/plain, Size: 2531 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Avi Kivity reported that page faults in NMIs could cause havic if
the NMI preempted another page fault handler:
The recent changes to NMI allow exceptions to take place in NMI
handlers, but I think that a #PF (say, due to access to vmalloc space)
is still problematic. Consider the sequence
#PF (cr2 set by processor)
NMI
...
#PF (cr2 clobbered)
do_page_fault()
IRET
...
IRET
do_page_fault()
address = read_cr2()
The last line reads the overwritten cr2 value.
Originally I wrote a patch to solve this by saving the cr2 on the stack.
Brian Gerst suggested to save it in the r12 register as both r12 and rbx
are saved by the do_nmi handler as required by the C standard. But rbx
is already used for saving if swapgs needs to be run on exit of the NMI
handler.
Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com
Link: http://lkml.kernel.org/r/1337763411.13348.140.camel@gandalf.stny.rr.com
Reported-by: Avi Kivity <avi@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
arch/x86/kernel/entry_64.S | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7d65133..111f6bb 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1758,10 +1758,30 @@ end_repeat_nmi:
*/
call save_paranoid
DEFAULT_FRAME 0
+
+ /*
+ * Save off the CR2 register. If we take a page fault in the NMI then
+ * it could corrupt the CR2 value. If the NMI preempts a page fault
+ * handler before it was able to read the CR2 register, and then the
+ * NMI itself takes a page fault, the page fault that was preempted
+ * will read the information from the NMI page fault and not the
+ * origin fault. Save it off and restore it if it changes.
+ * Use the r12 callee-saved register.
+ */
+ movq %cr2, %r12
+
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
movq %rsp,%rdi
movq $-1,%rsi
call do_nmi
+
+ /* Did the NMI take a page fault? Restore cr2 if it did */
+ movq %cr2, %rcx
+ cmpq %rcx, %r12
+ je 1f
+ movq %r12, %cr2
+1:
+
testl %ebx,%ebx /* swapgs needed? */
jnz nmi_restore
nmi_swapgs:
--
1.7.10
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 2/3 v2] x86: Remove cmpxchg from i386 NMI nesting code
2012-06-09 2:02 [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
2012-06-09 2:02 ` [PATCH 1/3 v2] x86: Save cr2 in NMI in case NMIs take a page fault Steven Rostedt
@ 2012-06-09 2:02 ` Steven Rostedt
2012-06-09 2:03 ` [PATCH 3/3 v2] x86: Save cr2 in NMI in case NMIs take a page fault (for i386) Steven Rostedt
2012-06-09 2:58 ` [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
3 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2012-06-09 2:02 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin
[-- Attachment #1: Type: text/plain, Size: 4254 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
I've been informed by someone on LWN called 'slashdot' that
some i386 machines do not support a true cmpxchg. The cmpxchg
used by the i386 NMI nesting code must be a true cmpxchg as
disabling interrupts will not work for NMIs (which is the work
around for i386s that do not have a true cmpxchg).
This 'slashdot' character also suggested a fix to the issue.
As the state of the nesting NMIs goes as follows:
NOT_RUNNING -> EXECUTING
EXECUTING -> NOT_RUNNING
EXECUTING -> LATCHED
LATCHED -> EXECUTING
Having these states as enum values of:
NOT_RUNNING = 0
EXECUTING = 1
LATCHED = 2
Instead of a cmpxchg to make EXECUTING -> NOT_RUNNING a
dec_and_test() would work as well. If the dec_and_test brings
the state to NOT_RUNNING, that is the same as a cmpxchg
succeeding to change EXECUTING to NOT_RUNNING. If a nested NMI
were to come in and change it to LATCHED, the dec_and_test() would
convert the state to EXECUTING (what we want it to be in such a
case anyway).
I asked 'slashdot' to post this as a patch, but it never came to
be. I decided to do the work instead.
Thanks to H. Peter Anvin for suggesting to use this_cpu_dec_and_return()
instead of local_dec_and_test(&__get_cpu_var()).
Link: http://lwn.net/Articles/484932/
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
arch/x86/kernel/nmi.c | 35 +++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a0b2f84..a15a888 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -365,8 +365,9 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
#ifdef CONFIG_X86_32
/*
* For i386, NMIs use the same stack as the kernel, and we can
- * add a workaround to the iret problem in C. Simply have 3 states
- * the NMI can be in.
+ * add a workaround to the iret problem in C (preventing nested
+ * NMIs if an NMI takes a trap). Simply have 3 states the NMI
+ * can be in:
*
* 1) not running
* 2) executing
@@ -383,13 +384,20 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
* If an NMI hits a breakpoint that executes an iret, another
* NMI can preempt it. We do not want to allow this new NMI
* to run, but we want to execute it when the first one finishes.
- * We set the state to "latched", and the first NMI will perform
- * an cmpxchg on the state, and if it doesn't successfully
- * reset the state to "not running" it will restart the next
- * NMI.
+ * We set the state to "latched", and the exit of the first NMI will
+ * perform a dec_return, if the result is zero (NOT_RUNNING), then
+ * it will simply exit the NMI handler. If not, the dec_return
+ * would have set the state to NMI_EXECUTING (what we want it to
+ * be when we are running). In this case, we simply jump back
+ * to rerun the NMI handler again, and restart the 'latched' NMI.
+ *
+ * No trap (breakpoint or page fault) should be hit before nmi_restart,
+ * thus there is no race between the first check of state for NOT_RUNNING
+ * and setting it to NMI_EXECUTING. The HW will prevent nested NMIs
+ * at this point.
*/
enum nmi_states {
- NMI_NOT_RUNNING,
+ NMI_NOT_RUNNING = 0,
NMI_EXECUTING,
NMI_LATCHED,
};
@@ -397,18 +405,17 @@ static DEFINE_PER_CPU(enum nmi_states, nmi_state);
#define nmi_nesting_preprocess(regs) \
do { \
- if (__get_cpu_var(nmi_state) != NMI_NOT_RUNNING) { \
- __get_cpu_var(nmi_state) = NMI_LATCHED; \
+ if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) { \
+ this_cpu_write(nmi_state, NMI_LATCHED); \
return; \
} \
- nmi_restart: \
- __get_cpu_var(nmi_state) = NMI_EXECUTING; \
- } while (0)
+ this_cpu_write(nmi_state, NMI_EXECUTING); \
+ } while (0); \
+ nmi_restart:
#define nmi_nesting_postprocess() \
do { \
- if (cmpxchg(&__get_cpu_var(nmi_state), \
- NMI_EXECUTING, NMI_NOT_RUNNING) != NMI_EXECUTING) \
+ if (this_cpu_dec_return(nmi_state)) \
goto nmi_restart; \
} while (0)
#else /* x86_64 */
--
1.7.10
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 3/3 v2] x86: Save cr2 in NMI in case NMIs take a page fault (for i386)
2012-06-09 2:02 [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
2012-06-09 2:02 ` [PATCH 1/3 v2] x86: Save cr2 in NMI in case NMIs take a page fault Steven Rostedt
2012-06-09 2:02 ` [PATCH 2/3 v2] x86: Remove cmpxchg from i386 NMI nesting code Steven Rostedt
@ 2012-06-09 2:03 ` Steven Rostedt
2012-06-09 2:58 ` [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
3 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2012-06-09 2:03 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
Avi Kivity, Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 2765 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Avi Kivity reported that page faults in NMIs could cause havic if
the NMI preempted another page fault handler:
The recent changes to NMI allow exceptions to take place in NMI
handlers, but I think that a #PF (say, due to access to vmalloc space)
is still problematic. Consider the sequence
#PF (cr2 set by processor)
NMI
...
#PF (cr2 clobbered)
do_page_fault()
IRET
...
IRET
do_page_fault()
address = read_cr2()
The last line reads the overwritten cr2 value.
This is the i386 version, which has the luxury of doing the work
in C code.
Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com
Reported-by: Avi Kivity <avi@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
arch/x86/kernel/nmi.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a15a888..f84f5c5 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -395,6 +395,14 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
* thus there is no race between the first check of state for NOT_RUNNING
* and setting it to NMI_EXECUTING. The HW will prevent nested NMIs
* at this point.
+ *
+ * In case the NMI takes a page fault, we need to save off the CR2
+ * because the NMI could have preempted another page fault and corrupt
+ * the CR2 that is about to be read. As nested NMIs must be restarted
+ * and they can not take breakpoints or page faults, the update of the
+ * CR2 must be done before converting the nmi state back to NOT_RUNNING.
+ * Otherwise, there would be a race of another nested NMI coming in
+ * after setting state to NOT_RUNNING but before updating the nmi_cr2.
*/
enum nmi_states {
NMI_NOT_RUNNING = 0,
@@ -402,6 +410,7 @@ enum nmi_states {
NMI_LATCHED,
};
static DEFINE_PER_CPU(enum nmi_states, nmi_state);
+static DEFINE_PER_CPU(unsigned long, nmi_cr2);
#define nmi_nesting_preprocess(regs) \
do { \
@@ -410,11 +419,14 @@ static DEFINE_PER_CPU(enum nmi_states, nmi_state);
return; \
} \
this_cpu_write(nmi_state, NMI_EXECUTING); \
+ this_cpu_write(nmi_cr2, read_cr2()); \
} while (0); \
nmi_restart:
#define nmi_nesting_postprocess() \
do { \
+ if (unlikely(this_cpu_read(nmi_cr2) != read_cr2())) \
+ write_cr2(this_cpu_read(nmi_cr2)); \
if (this_cpu_dec_return(nmi_state)) \
goto nmi_restart; \
} while (0)
--
1.7.10
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI
2012-06-09 2:02 [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
` (2 preceding siblings ...)
2012-06-09 2:03 ` [PATCH 3/3 v2] x86: Save cr2 in NMI in case NMIs take a page fault (for i386) Steven Rostedt
@ 2012-06-09 2:58 ` Steven Rostedt
2012-06-09 6:30 ` H. Peter Anvin
2012-06-09 6:32 ` H. Peter Anvin
3 siblings, 2 replies; 8+ messages in thread
From: Steven Rostedt @ 2012-06-09 2:58 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin
On Fri, 2012-06-08 at 22:02 -0400, Steven Rostedt wrote:
> Ingo,
>
> This is the version 2 with the updates suggested by H. Peter Anvin.
>
> Please pull the latest tip/x86/core-2 tree, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> tip/x86/core-2
>
FYI, I based this off of tip/x86/debug.
-- Steve
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI
2012-06-09 2:58 ` [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
@ 2012-06-09 6:30 ` H. Peter Anvin
2012-06-09 6:32 ` H. Peter Anvin
1 sibling, 0 replies; 8+ messages in thread
From: H. Peter Anvin @ 2012-06-09 6:30 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner
On 06/08/2012 07:58 PM, Steven Rostedt wrote:
> On Fri, 2012-06-08 at 22:02 -0400, Steven Rostedt wrote:
>> Ingo,
>>
>> This is the version 2 with the updates suggested by H. Peter Anvin.
>>
>> Please pull the latest tip/x86/core-2 tree, which can be found at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
>> tip/x86/core-2
>>
>
> FYI, I based this off of tip/x86/debug.
>
No, you didn't... you based it on an ancient (pre-v3.3) version of
tip/x86/debug...
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI
2012-06-09 2:58 ` [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
2012-06-09 6:30 ` H. Peter Anvin
@ 2012-06-09 6:32 ` H. Peter Anvin
2012-06-09 12:44 ` Steven Rostedt
1 sibling, 1 reply; 8+ messages in thread
From: H. Peter Anvin @ 2012-06-09 6:32 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner
On 06/08/2012 07:58 PM, Steven Rostedt wrote:
> On Fri, 2012-06-08 at 22:02 -0400, Steven Rostedt wrote:
>> Ingo,
>>
>> This is the version 2 with the updates suggested by H. Peter Anvin.
>>
>> Please pull the latest tip/x86/core-2 tree, which can be found at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
>> tip/x86/core-2
>>
>
> FYI, I based this off of tip/x86/debug.
>
Hm... actually it looks more like you reused an old branch and forgot to
actually push it out.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3 v2] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI
2012-06-09 6:32 ` H. Peter Anvin
@ 2012-06-09 12:44 ` Steven Rostedt
0 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2012-06-09 12:44 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner
On Fri, 2012-06-08 at 23:32 -0700, H. Peter Anvin wrote:
> Hm... actually it looks more like you reused an old branch and forgot to
> actually push it out.
Actully, I pushed it out, I just forgot to check to see if it succeeded:
error: failed to push some refs to 'gitolite@ra.kernel.org:/pub/scm/linux/kernel/git/rostedt/linux-trace.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes (e.g. 'git pull') before pushing again. See the
'Note about fast-forwards' section of 'git push --help' for details.
This is also why I include the HEAD SHA1 in my pull messages ;-)
Force pushed just made.
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 8+ messages in thread