* [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) [not found] ` <20080417165944.GB25198@Krystal> @ 2008-04-17 20:14 ` Mathieu Desnoyers 2008-04-17 20:29 ` Andrew Morton ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Mathieu Desnoyers @ 2008-04-17 20:14 UTC (permalink / raw) To: mingo Cc: akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

(hopefully finally CCing LKML) :)

Implements an alternative iret with popf and return so trap and exception handlers can return to the NMI handler without issuing iret. iret would cause NMIs to be reenabled prematurely. x86_32 uses popf and a far return. x86_64 has to copy the return instruction pointer to the top of the previous stack, issue a popf, load the previous esp and issue a near return (ret).

It allows placing immediate values (and therefore optimized trace_marks) in NMI code, since returning from a breakpoint would be valid. Accessing vmalloc'd memory, which allows executing module code or accessing vmapped or vmalloc'd areas from NMI context, would also be valid. This is very useful to tracers like LTTng.

This patch makes all faults, traps and exceptions safe to be called from NMI context, *except* single-stepping, which requires iret to restore the TF (trap flag) and jump to the return address in a single instruction. Sorry, no kprobes support in NMI handlers because of this limitation. We cannot single-step an NMI handler, because iret must set the TF flag and return back to the instruction to single-step in a single instruction. This cannot be emulated with popf/lret, because lret would itself be single-stepped. It does not apply to immediate values because they do not use single-stepping. This code detects if the TF flag is set and uses the iret path for single-stepping, even if it reactivates NMIs prematurely.

alpha and avr32 use active count bit 30 (0x40000000) for PREEMPT_ACTIVE, which this patch reserves for HARDNMI_MASK. This patch moves them to bit 28 (0x10000000).
TODO : test alpha and avr32 active count modification
TODO : add paravirt support for the iret alternative. Currently, paravirt kernels running on bare metal still use iret in traps nested over NMI handlers.

tested on x86_32 (tests implemented in a separate patch) :
- instrumented the return path to export the EIP, CS and EFLAGS values when taken, so we know the return path code has been executed.
- trace_mark, using immediate values, with 10ms delay with the breakpoint activated. Runs well through the return path.
- tested vmalloc faults in the NMI handler by placing a non-optimized marker in the NMI handler (so no breakpoint is executed) and connecting a probe which touches every page of a 20MB vmalloc'd buffer. It executes through the return path without problem.
- Tested with and without preemption

tested on x86_64 :
- instrumented the return path to export the EIP, CS and EFLAGS values when taken, so we know the return path code has been executed.
- trace_mark, using immediate values, with 10ms delay with the breakpoint activated. Runs well through the return path.

To test on x86_64 :
- Test without preemption
- Test vmalloc faults
- Test on Intel 64 bits CPUs.

Changelog since v1 :
- x86_64 fixes.
Changelog since v2 :
- fix paravirt build
Changelog since v3 :
- Include modifications suggested by Jeremy
Changelog since v4 :
- including hardirq.h in entry_32/64.S is a bad idea (non ifndef'd C code), define HARDNMI_MASK in the .S files directly.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: akpm@osdl.org
CC: mingo@elte.hu
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Frank Ch.
Eigler" <fche@redhat.com> --- arch/x86/kernel/entry_32.S | 27 ++++++++++++++++++- arch/x86/kernel/entry_64.S | 26 ++++++++++++++++++ include/asm-alpha/thread_info.h | 2 - include/asm-avr32/thread_info.h | 2 - include/asm-x86/irqflags.h | 55 ++++++++++++++++++++++++++++++++++++++++ include/asm-x86/paravirt.h | 2 + include/linux/hardirq.h | 24 ++++++++++++++++- 7 files changed, 133 insertions(+), 5 deletions(-) Index: linux-2.6-lttng/include/linux/hardirq.h =================================================================== --- linux-2.6-lttng.orig/include/linux/hardirq.h 2008-04-16 11:25:18.000000000 -0400 +++ linux-2.6-lttng/include/linux/hardirq.h 2008-04-16 11:29:30.000000000 -0400 @@ -22,10 +22,13 @@ * PREEMPT_MASK: 0x000000ff * SOFTIRQ_MASK: 0x0000ff00 * HARDIRQ_MASK: 0x0fff0000 + * HARDNMI_MASK: 0x40000000 */ #define PREEMPT_BITS 8 #define SOFTIRQ_BITS 8 +#define HARDNMI_BITS 1 + #ifndef HARDIRQ_BITS #define HARDIRQ_BITS 12 @@ -45,16 +48,19 @@ #define PREEMPT_SHIFT 0 #define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS) #define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS) +#define HARDNMI_SHIFT (30) #define __IRQ_MASK(x) ((1UL << (x))-1) #define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT) #define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT) #define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT) +#define HARDNMI_MASK (__IRQ_MASK(HARDNMI_BITS) << HARDNMI_SHIFT) #define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT) #define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT) #define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT) +#define HARDNMI_OFFSET (1UL << HARDNMI_SHIFT) #if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS)) #error PREEMPT_ACTIVE is too low! 
@@ -63,6 +69,7 @@ #define hardirq_count() (preempt_count() & HARDIRQ_MASK) #define softirq_count() (preempt_count() & SOFTIRQ_MASK) #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK)) +#define hardnmi_count() (preempt_count() & HARDNMI_MASK) /* * Are we doing bottom half or hardware interrupt processing? @@ -71,6 +78,7 @@ #define in_irq() (hardirq_count()) #define in_softirq() (softirq_count()) #define in_interrupt() (irq_count()) +#define in_nmi() (hardnmi_count()) /* * Are we running in atomic context? WARNING: this macro cannot @@ -159,7 +167,19 @@ extern void irq_enter(void); */ extern void irq_exit(void); -#define nmi_enter() do { lockdep_off(); __irq_enter(); } while (0) -#define nmi_exit() do { __irq_exit(); lockdep_on(); } while (0) +#define nmi_enter() \ + do { \ + lockdep_off(); \ + BUG_ON(hardnmi_count()); \ + add_preempt_count(HARDNMI_OFFSET); \ + __irq_enter(); \ + } while (0) + +#define nmi_exit() \ + do { \ + __irq_exit(); \ + sub_preempt_count(HARDNMI_OFFSET); \ + lockdep_on(); \ + } while (0) #endif /* LINUX_HARDIRQ_H */ Index: linux-2.6-lttng/arch/x86/kernel/entry_32.S =================================================================== --- linux-2.6-lttng.orig/arch/x86/kernel/entry_32.S 2008-04-16 11:25:18.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kernel/entry_32.S 2008-04-17 12:55:10.000000000 -0400 @@ -75,11 +75,12 @@ DF_MASK = 0x00000400 NT_MASK = 0x00004000 VM_MASK = 0x00020000 +#define HARDNMI_MASK 0x40000000 + #ifdef CONFIG_PREEMPT #define preempt_stop(clobbers) DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF #else #define preempt_stop(clobbers) -#define resume_kernel restore_nocheck #endif .macro TRACE_IRQS_IRET @@ -265,6 +266,8 @@ END(ret_from_exception) #ifdef CONFIG_PREEMPT ENTRY(resume_kernel) DISABLE_INTERRUPTS(CLBR_ANY) + testl $HARDNMI_MASK,TI_preempt_count(%ebp) # nested over NMI ? + jnz return_to_nmi cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ? 
jnz restore_nocheck need_resched: @@ -276,6 +279,12 @@ need_resched: call preempt_schedule_irq jmp need_resched END(resume_kernel) +#else +ENTRY(resume_kernel) + testl $HARDNMI_MASK,TI_preempt_count(%ebp) # nested over NMI ? + jnz return_to_nmi + jmp restore_nocheck +END(resume_kernel) #endif CFI_ENDPROC @@ -411,6 +420,22 @@ restore_nocheck_notrace: CFI_ADJUST_CFA_OFFSET -4 irq_return: INTERRUPT_RETURN +return_to_nmi: + testl $X86_EFLAGS_TF, PT_EFLAGS(%esp) + jnz restore_nocheck /* + * If single-stepping an NMI handler, + * use the normal iret path instead of + * the popf/lret because lret would be + * single-stepped. It should not + * happen : it will reactivate NMIs + * prematurely. + */ + TRACE_IRQS_IRET + RESTORE_REGS + addl $4, %esp # skip orig_eax/error_code + CFI_ADJUST_CFA_OFFSET -4 + INTERRUPT_RETURN_NMI_SAFE + .section .fixup,"ax" iret_exc: pushl $0 # no error code Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S =================================================================== --- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S 2008-04-16 11:25:18.000000000 -0400 +++ linux-2.6-lttng/arch/x86/kernel/entry_64.S 2008-04-17 12:53:54.000000000 -0400 @@ -54,6 +54,8 @@ .code64 +#define HARDNMI_MASK 0x40000000 + #ifndef CONFIG_PREEMPT #define retint_kernel retint_restore_args #endif @@ -581,12 +583,27 @@ retint_restore_args: /* return to kernel * The iretq could re-enable interrupts: */ TRACE_IRQS_IRETQ + testl $HARDNMI_MASK,threadinfo_preempt_count(%rcx) + jnz return_to_nmi /* Nested over NMI ? */ restore_args: RESTORE_ARGS 0,8,0 irq_return: INTERRUPT_RETURN +return_to_nmi: /* + * If single-stepping an NMI handler, + * use the normal iret path instead of + * the popf/lret because lret would be + * single-stepped. It should not + * happen : it will reactivate NMIs + * prematurely. + */ + testw $X86_EFLAGS_TF,EFLAGS-ARGOFFSET(%rsp) /* trap flag? 
*/ + jnz restore_args + RESTORE_ARGS 0,8,0 + INTERRUPT_RETURN_NMI_SAFE + .section __ex_table, "a" .quad irq_return, bad_iret .previous @@ -802,6 +819,10 @@ END(spurious_interrupt) .macro paranoidexit trace=1 /* ebx: no swapgs flag */ paranoid_exit\trace: + GET_THREAD_INFO(%rcx) + testl $HARDNMI_MASK,threadinfo_preempt_count(%rcx) + jnz paranoid_return_to_nmi\trace /* Nested over NMI ? */ +paranoid_exit_no_nmi\trace: testl %ebx,%ebx /* swapgs needed? */ jnz paranoid_restore\trace testl $3,CS(%rsp) @@ -814,6 +835,11 @@ paranoid_swapgs\trace: paranoid_restore\trace: RESTORE_ALL 8 jmp irq_return +paranoid_return_to_nmi\trace: + testw $X86_EFLAGS_TF,EFLAGS-0(%rsp) /* trap flag? */ + jnz paranoid_exit_no_nmi\trace + RESTORE_ALL 8 + INTERRUPT_RETURN_NMI_SAFE paranoid_userspace\trace: GET_THREAD_INFO(%rcx) movl threadinfo_flags(%rcx),%ebx Index: linux-2.6-lttng/include/asm-x86/irqflags.h =================================================================== --- linux-2.6-lttng.orig/include/asm-x86/irqflags.h 2008-04-16 11:25:18.000000000 -0400 +++ linux-2.6-lttng/include/asm-x86/irqflags.h 2008-04-17 12:28:23.000000000 -0400 @@ -138,12 +138,67 @@ static inline unsigned long __raw_local_ #ifdef CONFIG_X86_64 #define INTERRUPT_RETURN iretq + +/* + * Only returns from a trap or exception to a NMI context (intra-privilege + * level near return) to the same SS and CS segments. Should be used + * upon trap or exception return when nested over a NMI context so no iret is + * issued. It takes care of modifying the eflags, rsp and returning to the + * previous function. 
+ * + * The stack, at that point, looks like : + * + * 0(rsp) RIP + * 8(rsp) CS + * 16(rsp) EFLAGS + * 24(rsp) RSP + * 32(rsp) SS + * + * Upon execution : + * Copy EIP to the top of the return stack + * Update top of return stack address + * Pop eflags into the eflags register + * Make the return stack current + * Near return (popping the return address from the return stack) + */ +#define INTERRUPT_RETURN_NMI_SAFE pushq %rax; \ + mov %rsp, %rax; \ + mov 24+8(%rax), %rsp; \ + pushq 0+8(%rax); \ + pushq 16+8(%rax); \ + movq (%rax), %rax; \ + popfq; \ + ret; + #define ENABLE_INTERRUPTS_SYSCALL_RET \ movq %gs:pda_oldrsp, %rsp; \ swapgs; \ sysretq; #else #define INTERRUPT_RETURN iret + +/* + * Protected mode only, no V8086. Implies that protected mode must + * be entered before NMIs or MCEs are enabled. Only returns from a trap or + * exception to a NMI context (intra-privilege level far return). Should be used + * upon trap or exception return when nested over a NMI context so no iret is + * issued. + * + * The stack, at that point, looks like : + * + * 0(esp) EIP + * 4(esp) CS + * 8(esp) EFLAGS + * + * Upon execution : + * Copy the stack eflags to top of stack + * Pop eflags into the eflags register + * Far return: pop EIP and CS into their register, and additionally pop EFLAGS. 
+ */ +#define INTERRUPT_RETURN_NMI_SAFE pushl 8(%esp); \ + popfl; \ + lret $4; + #define ENABLE_INTERRUPTS_SYSCALL_RET sti; sysexit #define GET_CR0_INTO_EAX movl %cr0, %eax #endif Index: linux-2.6-lttng/include/asm-alpha/thread_info.h =================================================================== --- linux-2.6-lttng.orig/include/asm-alpha/thread_info.h 2008-04-16 11:25:18.000000000 -0400 +++ linux-2.6-lttng/include/asm-alpha/thread_info.h 2008-04-17 12:53:55.000000000 -0400 @@ -57,7 +57,7 @@ register struct thread_info *__current_t #endif /* __ASSEMBLY__ */ -#define PREEMPT_ACTIVE 0x40000000 +#define PREEMPT_ACTIVE 0x10000000 /* * Thread information flags: Index: linux-2.6-lttng/include/asm-avr32/thread_info.h =================================================================== --- linux-2.6-lttng.orig/include/asm-avr32/thread_info.h 2008-04-16 11:25:18.000000000 -0400 +++ linux-2.6-lttng/include/asm-avr32/thread_info.h 2008-04-17 12:53:55.000000000 -0400 @@ -70,7 +70,7 @@ static inline struct thread_info *curren #endif /* !__ASSEMBLY__ */ -#define PREEMPT_ACTIVE 0x40000000 +#define PREEMPT_ACTIVE 0x10000000 /* * Thread information flags Index: linux-2.6-lttng/include/asm-x86/paravirt.h =================================================================== --- linux-2.6-lttng.orig/include/asm-x86/paravirt.h 2008-04-16 12:23:44.000000000 -0400 +++ linux-2.6-lttng/include/asm-x86/paravirt.h 2008-04-16 12:24:36.000000000 -0400 @@ -1358,6 +1358,8 @@ static inline unsigned long __raw_local_ PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_iret), CLBR_NONE, \ jmp *%cs:pv_cpu_ops+PV_CPU_iret) +#define INTERRUPT_RETURN_NMI_SAFE INTERRUPT_RETURN + #define DISABLE_INTERRUPTS(clobbers) \ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \ PV_SAVE_REGS; \ -- Mathieu Desnoyers Computer Engineering Ph.D. 
Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
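The x86_64 INTERRUPT_RETURN_NMI_SAFE sequence in the patch is compact but subtle. As a way to reason about it, here is a hedged C simulation of the stack manipulation; names such as `nmi_safe_return` and `trap_frame` are invented for illustration, the real code being the assembly macro in irqflags.h above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the x86_64 trap frame consumed by the macro:
 * 0(rsp) RIP, 8(rsp) CS, 16(rsp) EFLAGS, 24(rsp) RSP, 32(rsp) SS. */
struct trap_frame {
	uint64_t rip;
	uint64_t cs;
	uint64_t eflags;
	uint64_t *rsp;	/* saved stack pointer of the interrupted (NMI) context */
	uint64_t ss;
};

/* Simulated previous (NMI) stack; pushes grow downward as on x86. */
static uint64_t stack[16];

/* Mirrors the assembly step by step: switch to the previous stack,
 * push the return RIP, push the saved EFLAGS, then popfq and ret. */
static void nmi_safe_return(const struct trap_frame *tf, uint64_t **sp,
			    uint64_t *flags, uint64_t *pc)
{
	uint64_t *prev = tf->rsp;	/* mov 24+8(%rax), %rsp */
	*--prev = tf->rip;		/* pushq 0+8(%rax)      */
	*--prev = tf->eflags;		/* pushq 16+8(%rax)     */
	*flags = *prev++;		/* popfq: EFLAGS restored without iret */
	*pc = *prev++;			/* ret: jump back to the interrupted RIP */
	*sp = prev;			/* previous stack is current again */
}
```

The key property is visible in the simulation: EFLAGS and RIP are restored without ever executing iret, so NMIs are not re-enabled before the nested NMI handler has finished.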
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-17 20:14 ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) Mathieu Desnoyers @ 2008-04-17 20:29 ` Andrew Morton 2008-04-17 21:16 ` Mathieu Desnoyers 2008-04-17 22:01 ` Andi Kleen 2008-04-21 14:00 ` Pavel Machek 2 siblings, 1 reply; 23+ messages in thread From: Andrew Morton @ 2008-04-17 20:29 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: mingo, hpa, jeremy, rostedt, fche, linux-kernel On Thu, 17 Apr 2008 16:14:10 -0400 Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote: > +#define nmi_enter() \ > + do { \ > + lockdep_off(); \ > + BUG_ON(hardnmi_count()); \ > + add_preempt_count(HARDNMI_OFFSET); \ > + __irq_enter(); \ > + } while (0) <did it _have_ to be a macro?> Doing BUG() inside an NMI should be OK most of the time. But the BUG-handling code does want to know if we're in interrupt context - at least for the "fatal exception in interrupt" stuff, and probably other things. But afacit the failure to include HARDNMI_MASK in #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK)) will prevent that. So. Should we or should we not make in_interrupt() return true in NMI? "should", I expect. If not, we'd need to do something else to communicate the current processing state down to the BUG-handling code. ^ permalink raw reply [flat|nested] 23+ messages in thread
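Andrew's observation can be checked numerically. The following is a standalone sketch (not kernel code) with the mask constants copied from the posted hardirq.h; `irq_count()` is reproduced exactly as posted, without HARDNMI_MASK:

```c
#include <assert.h>

/* Constants copied from the posted hardirq.h changes. */
#define SOFTIRQ_MASK   0x0000ff00UL
#define HARDIRQ_MASK   0x0fff0000UL
#define HARDNMI_MASK   0x40000000UL
#define HARDIRQ_OFFSET 0x00010000UL

/* irq_count() exactly as posted: HARDNMI_MASK is not included. */
static unsigned long irq_count(unsigned long preempt_count)
{
	return preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK);
}

/* hardnmi_count() as added by the patch. */
static unsigned long hardnmi_count(unsigned long preempt_count)
{
	return preempt_count & HARDNMI_MASK;
}
```

With these helpers one can see both sides of the discussion: a preempt_count as left by nmi_enter() (HARDNMI bit plus the hardirq increment from __irq_enter()) still reads as "in interrupt" through the hardirq bits, while the bare HARDNMI bit on its own is invisible to irq_count().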
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-17 20:29 ` Andrew Morton @ 2008-04-17 21:16 ` Mathieu Desnoyers 2008-04-17 21:26 ` Andrew Morton 0 siblings, 1 reply; 23+ messages in thread From: Mathieu Desnoyers @ 2008-04-17 21:16 UTC (permalink / raw) To: Andrew Morton; +Cc: mingo, hpa, jeremy, rostedt, fche, linux-kernel * Andrew Morton (akpm@linux-foundation.org) wrote: > On Thu, 17 Apr 2008 16:14:10 -0400 > Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote: > > > +#define nmi_enter() \ > > + do { \ > > + lockdep_off(); \ > > + BUG_ON(hardnmi_count()); \ > > + add_preempt_count(HARDNMI_OFFSET); \ > > + __irq_enter(); \ > > + } while (0) > > <did it _have_ to be a macro?> > isn't this real macro art work ? ;) I kept the same coding style that was already there, which mimics the irq_enter/irq_exit macros. Changing all of them at once could be done in a separate patch. > Doing BUG() inside an NMI should be OK most of the time. But the > BUG-handling code does want to know if we're in interrupt context - at > least for the "fatal exception in interrupt" stuff, and probably other > things. > > But afacit the failure to include HARDNMI_MASK in > > #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK)) > > will prevent that. > > So. > > Should we or should we not make in_interrupt() return true in NMI? > "should", I expect. > > If not, we'd need to do something else to communicate the current > processing state down to the BUG-handling code. > You bring an interesting question. In practice, since this BUG_ON could only happen if we have an NMI nested over another NMI or an nmi which fails to decrement its HARDNMI_MASK. Given that the HARDIRQ_MASK is incremented right after the HARDNMI_MASK increment (the reverse is also true), really bad things (TM) must have happened for the BUG_ON to be triggered outside of the __irq_enter()/__irq_exit() scope of the NMI below the buggy one. 
But since this code is there to extract as much information as possible when things go wrong, I would say it's safer to, at least, add HARDNMI_MASK to irq_count(). Instead, though, I think we could add : if (in_nmi()) panic("Fatal exception in non-maskable interrupt"); to die(). That would be clearer. I just added it to x86_32, but can't find where x86_64 reports the "fatal exception in interrupt" and friends message. Any idea ? By dealing with this case specifically, I think we don't really have to add HARDNMI_MASK to irq_count(), considering it's normally an HARDIRQ too. Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-17 21:16 ` Mathieu Desnoyers @ 2008-04-17 21:26 ` Andrew Morton 0 siblings, 0 replies; 23+ messages in thread From: Andrew Morton @ 2008-04-17 21:26 UTC (permalink / raw) To: Mathieu Desnoyers; +Cc: mingo, hpa, jeremy, rostedt, fche, linux-kernel On Thu, 17 Apr 2008 17:16:25 -0400 Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote: > > Should we or should we not make in_interrupt() return true in NMI? > > "should", I expect. > > > > If not, we'd need to do something else to communicate the current > > processing state down to the BUG-handling code. > > > > You bring an interesting question. In practice, since this BUG_ON could > only happen if we have an NMI nested over another NMI or an nmi which > fails to decrement its HARDNMI_MASK. Given that the HARDIRQ_MASK is > incremented right after the HARDNMI_MASK increment (the reverse is also > true), really bad things (TM) must have happened for the BUG_ON to be > triggered outside of the __irq_enter()/__irq_exit() scope of the NMI > below the buggy one. > > But since this code is there to extract as much information as possible > when things go wrong, I would say it's safer to, at least, add > HARDNMI_MASK to irq_count(). > > Instead, though, I think we could add : > > if (in_nmi()) > panic("Fatal exception in non-maskable interrupt"); > > to die(). But that's just one site. There might be (now, or in the future) other code under BUG() which tests in_interrupt(). And most of the places where we test for in_interrupt() and in_irq() probably want that to return true is we're in NMI too. After all, it's an interrupt. > That would be clearer. I just added it to x86_32, but can't > find where x86_64 reports the "fatal exception in interrupt" and friends > message. Any idea ? Dunno - maybe it just doesn't have it. Maybe it was never the right thing to do. 
> By dealing with this case specifically, I think we don't really have to > add HARDNMI_MASK to irq_count(), considering it's normally an HARDIRQ > too. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-17 20:14 ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) Mathieu Desnoyers 2008-04-17 20:29 ` Andrew Morton @ 2008-04-17 22:01 ` Andi Kleen 2008-04-18 0:06 ` Mathieu Desnoyers 2008-04-21 14:00 ` Pavel Machek 2 siblings, 1 reply; 23+ messages in thread From: Andi Kleen @ 2008-04-17 22:01 UTC (permalink / raw) To: Mathieu Desnoyers Cc: mingo, akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel Mathieu Desnoyers <compudj@krystal.dyndns.org> writes: > > It allows placing immediate values (and therefore optimized trace_marks) in NMI > code Only if all your trace_mark infrastructure is lock less. -Andi ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-17 22:01 ` Andi Kleen @ 2008-04-18 0:06 ` Mathieu Desnoyers 2008-04-18 8:07 ` Andi Kleen 2008-04-18 11:30 ` Andi Kleen 0 siblings, 2 replies; 23+ messages in thread From: Mathieu Desnoyers @ 2008-04-18 0:06 UTC (permalink / raw) To: Andi Kleen Cc: mingo, akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel * Andi Kleen (andi@firstfloor.org) wrote: > Mathieu Desnoyers <compudj@krystal.dyndns.org> writes: > > > > It allows placing immediate values (and therefore optimized trace_marks) in NMI > > code > > Only if all your trace_mark infrastructure is lock less. > > -Andi > It uses RCU-style updates and has been designed to be lockless from the ground up. Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-18 0:06 ` Mathieu Desnoyers @ 2008-04-18 8:07 ` Andi Kleen 2008-04-19 21:00 ` Mathieu Desnoyers 2008-04-18 11:30 ` Andi Kleen 1 sibling, 1 reply; 23+ messages in thread From: Andi Kleen @ 2008-04-18 8:07 UTC (permalink / raw) To: Mathieu Desnoyers Cc: mingo, akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel Mathieu Desnoyers wrote: > * Andi Kleen (andi@firstfloor.org) wrote: >> Mathieu Desnoyers <compudj@krystal.dyndns.org> writes: >>> It allows placing immediate values (and therefore optimized trace_marks) in NMI >>> code >> Only if all your trace_mark infrastructure is lock less. >> >> -Andi >> > > It uses RCU-style updates and has been designed to be lockless from the > ground up. Wrong. If it causes vmalloc faults it is not lockless. -Andi ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-18 8:07 ` Andi Kleen @ 2008-04-19 21:00 ` Mathieu Desnoyers 0 siblings, 0 replies; 23+ messages in thread From: Mathieu Desnoyers @ 2008-04-19 21:00 UTC (permalink / raw) To: Andi Kleen Cc: mingo, akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel * Andi Kleen (andi@firstfloor.org) wrote: > Mathieu Desnoyers wrote: > > * Andi Kleen (andi@firstfloor.org) wrote: > >> Mathieu Desnoyers <compudj@krystal.dyndns.org> writes: > >>> It allows placing immediate values (and therefore optimized trace_marks) in NMI > >>> code > >> Only if all your trace_mark infrastructure is lock less. > >> > >> -Andi > >> > > > > It uses RCU-style updates and has been designed to be lockless from the > > ground up. > > Wrong. If it causes vmalloc faults it is not lockless. > > -Andi > Could you point me where vmalloc_fault accesses a data structure for which updates are protected by disabling interrupts ? I am curious. Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-18 0:06 ` Mathieu Desnoyers 2008-04-18 8:07 ` Andi Kleen @ 2008-04-18 11:30 ` Andi Kleen 2008-04-19 21:23 ` Mathieu Desnoyers 1 sibling, 1 reply; 23+ messages in thread From: Andi Kleen @ 2008-04-18 11:30 UTC (permalink / raw) To: Mathieu Desnoyers Cc: mingo, akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel Mathieu Desnoyers <compudj@krystal.dyndns.org> writes: > > It uses RCU-style updates and has been designed to be lockless from the > ground up. RCU is not necessarily NMI safe. In most cases RCU needs writer locks which you cannot do with NMIs. -Andi ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-18 11:30 ` Andi Kleen @ 2008-04-19 21:23 ` Mathieu Desnoyers 0 siblings, 0 replies; 23+ messages in thread From: Mathieu Desnoyers @ 2008-04-19 21:23 UTC (permalink / raw) To: Andi Kleen Cc: mingo, akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel * Andi Kleen (andi@firstfloor.org) wrote: > Mathieu Desnoyers <compudj@krystal.dyndns.org> writes: > > > > It uses RCU-style updates and has been designed to be lockless from the > > ground up. > > RCU is not necessarily NMI safe. In most cases RCU needs writer locks > which you cannot do with NMIs. > > -Andi > RCU-style updates are done outside of NMIs, in sleepable context. That's just required when the probes connected on markers must be registered/unregistered. The NMI context is the RCU read side. It only have to get the probe function pointers to call along with the private data pointers. Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
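The read-side scheme Mathieu describes — probe registration done in sleepable context, the NMI-side read being a single pointer dereference — can be sketched with C11 atomics. Names here are hypothetical; the in-kernel markers use RCU primitives (rcu_assign_pointer/rcu_dereference) rather than stdatomic.h:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A probe callback plus its private data, published as one unit. */
struct probe {
	void (*func)(void *data, const char *fmt);
	void *data;
};

/* Written only from sleepable (writer) context. */
static _Atomic(struct probe *) active_probe;

/* Writer side: publish the probe. Release ordering makes the probe's
 * fields visible before the pointer itself, like rcu_assign_pointer(). */
static void register_probe(struct probe *p)
{
	atomic_store_explicit(&active_probe, p, memory_order_release);
}

/* Read side, safe from NMI context: one load, no locks taken, so it can
 * never deadlock against the writer. */
static void trace_mark_hit(const char *fmt)
{
	struct probe *p = atomic_load_explicit(&active_probe,
					       memory_order_acquire);
	if (p)
		p->func(p->data, fmt);
}

/* Demo probe used below: counts invocations through its private data. */
static void demo_probe(void *data, const char *fmt)
{
	(void)fmt;
	++*(int *)data;
}
```

Unregistration would additionally need a grace period before the probe's data may be freed, which is exactly the part that must run outside NMI context.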
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-17 20:14 ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) Mathieu Desnoyers 2008-04-17 20:29 ` Andrew Morton 2008-04-17 22:01 ` Andi Kleen @ 2008-04-21 14:00 ` Pavel Machek 2008-04-21 14:22 ` H. Peter Anvin 2 siblings, 1 reply; 23+ messages in thread From: Pavel Machek @ 2008-04-21 14:00 UTC (permalink / raw) To: Mathieu Desnoyers Cc: mingo, akpm, H. Peter Anvin, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel On Thu 2008-04-17 16:14:10, Mathieu Desnoyers wrote: > (hopefully finally CCing LKML) :) > > Implements an alternative iret with popf and return so trap and exception > handlers can return to the NMI handler without issuing iret. iret would cause > NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 has to > copy the return instruction pointer to the top of the previous stack, issue a > popf, loads the previous esp and issue a near return (ret). sounds expensive. Does it slow down normal loads? -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-21 14:00 ` Pavel Machek @ 2008-04-21 14:22 ` H. Peter Anvin 2008-04-21 14:51 ` Mathieu Desnoyers 2008-04-21 15:08 ` Mathieu Desnoyers 0 siblings, 2 replies; 23+ messages in thread From: H. Peter Anvin @ 2008-04-21 14:22 UTC (permalink / raw) To: Pavel Machek Cc: Mathieu Desnoyers, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel Pavel Machek wrote: > On Thu 2008-04-17 16:14:10, Mathieu Desnoyers wrote: >> (hopefully finally CCing LKML) :) >> >> Implements an alternative iret with popf and return so trap and exception >> handlers can return to the NMI handler without issuing iret. iret would cause >> NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 has to >> copy the return instruction pointer to the top of the previous stack, issue a >> popf, loads the previous esp and issue a near return (ret). > > sounds expensive. Does it slow down normal loads? > It should *only* be used to return from NMI, #MC or INT3 (breakpoint), which should never happen in normal operation, and even then only when interrupting another NMI or #MC handler. -hpa ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-21 14:22 ` H. Peter Anvin @ 2008-04-21 14:51 ` Mathieu Desnoyers 2008-04-21 15:08 ` Mathieu Desnoyers 1 sibling, 0 replies; 23+ messages in thread From: Mathieu Desnoyers @ 2008-04-21 14:51 UTC (permalink / raw) To: H. Peter Anvin Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel * H. Peter Anvin (hpa@zytor.com) wrote: > Pavel Machek wrote: >> On Thu 2008-04-17 16:14:10, Mathieu Desnoyers wrote: >>> (hopefully finally CCing LKML) :) >>> >>> Implements an alternative iret with popf and return so trap and exception >>> handlers can return to the NMI handler without issuing iret. iret would >>> cause >>> NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 >>> has to >>> copy the return instruction pointer to the top of the previous stack, >>> issue a >>> popf, loads the previous esp and issue a near return (ret). >> sounds expensive. Does it slow down normal loads? > > It should *only* be used to return from NMI, #MC or INT3 (breakpoint), > which should never happen in normal operation, and even then only when > interrupting another NMI or #MC handler. > > -hpa > Sorry Pavel, for some reason you message did not reach my inbox. hpa is right : this code path is only taken to return to the NMI handler from a trap or exception or, possibly, machine check exception. Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-21 14:22 ` H. Peter Anvin 2008-04-21 14:51 ` Mathieu Desnoyers @ 2008-04-21 15:08 ` Mathieu Desnoyers 2008-04-21 15:08 ` H. Peter Anvin 2008-04-21 15:11 ` Mathieu Desnoyers 1 sibling, 2 replies; 23+ messages in thread From: Mathieu Desnoyers @ 2008-04-21 15:08 UTC (permalink / raw) To: H. Peter Anvin Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel * H. Peter Anvin (hpa@zytor.com) wrote: > Pavel Machek wrote: >> On Thu 2008-04-17 16:14:10, Mathieu Desnoyers wrote: >>> (hopefully finally CCing LKML) :) >>> >>> Implements an alternative iret with popf and return so trap and exception >>> handlers can return to the NMI handler without issuing iret. iret would >>> cause >>> NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 >>> has to >>> copy the return instruction pointer to the top of the previous stack, >>> issue a >>> popf, loads the previous esp and issue a near return (ret). >> sounds expensive. Does it slow down normal loads? > > It should *only* be used to return from NMI, #MC or INT3 (breakpoint), > which should never happen in normal operation, and even then only when > interrupting another NMI or #MC handler. > > -hpa > Just to be clear : the added cost on normal interrupt return is to add a supplementary test of the thread flags already loaded in registers and a conditional branch. This is used to detect if we are nested over an NMI handler. I doubt anyone ever notice an impact caused by this added test/branch. Mathieu -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) 2008-04-21 15:08 ` Mathieu Desnoyers @ 2008-04-21 15:08 ` H. Peter Anvin 2008-04-21 15:21 ` Mathieu Desnoyers 2008-04-21 15:47 ` Mathieu Desnoyers 2008-04-21 15:11 ` Mathieu Desnoyers 1 sibling, 2 replies; 23+ messages in thread From: H. Peter Anvin @ 2008-04-21 15:08 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel Mathieu Desnoyers wrote: > > Just to be clear : the added cost on normal interrupt return is to add a > supplementary test of the thread flags already loaded in registers and > a conditional branch. This is used to detect if we are nested over an > NMI handler. I doubt anyone ever notice an impact caused by this added > test/branch. > Why the **** would you do this except in the handful of places where you actually *could* be nested over an NMI handler (basically #MC, #DB and INT3)? -hpa ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5)
  2008-04-21 15:08 ` H. Peter Anvin
@ 2008-04-21 15:21 ` Mathieu Desnoyers
  1 sibling, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2008-04-21 15:21 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> Just to be clear : the added cost on normal interrupt return is to add a
>> supplementary test of the thread flags already loaded in registers and
>> a conditional branch. This is used to detect if we are nested over an
>> NMI handler. I doubt anyone ever notice an impact caused by this added
>> test/branch.
>
> Why the **** would you do this except in the handful of places where you
> actually *could* be nested over an NMI handler (basically #MC, #DB and
> INT3)?
>
> 	-hpa
>

Because I would have to do a more invasive code modification, since they
currently share their return path with normal interrupts. I agree that
the next step is to tune the patchset to only target traps and
exceptions which may happen on top of an NMI. I'll change it in my next
patchset version.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5)
  2008-04-21 15:08 ` H. Peter Anvin
  2008-04-21 15:21 ` Mathieu Desnoyers
@ 2008-04-21 15:47 ` Mathieu Desnoyers
  2008-04-21 17:23 ` Pavel Machek
  1 sibling, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2008-04-21 15:47 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> Just to be clear : the added cost on normal interrupt return is to add a
>> supplementary test of the thread flags already loaded in registers and
>> a conditional branch. This is used to detect if we are nested over an
>> NMI handler. I doubt anyone ever notice an impact caused by this added
>> test/branch.
>
> Why the **** would you do this except in the handful of places where you
> actually *could* be nested over an NMI handler (basically #MC, #DB and
> INT3)?
>
> 	-hpa
>

There is also the page fault case. I think putting this test in
ret_from_exception would be both safe (it is executed for any exception
return) and fast (exceptions are rare).

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5)
  2008-04-21 15:47 ` Mathieu Desnoyers
@ 2008-04-21 17:23 ` Pavel Machek
  2008-04-21 17:28 ` H. Peter Anvin
  0 siblings, 1 reply; 23+ messages in thread
From: Pavel Machek @ 2008-04-21 17:23 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: H. Peter Anvin, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

On Mon 2008-04-21 11:47:56, Mathieu Desnoyers wrote:
> * H. Peter Anvin (hpa@zytor.com) wrote:
> > Mathieu Desnoyers wrote:
> >> Just to be clear : the added cost on normal interrupt return is to add a
> >> supplementary test of the thread flags already loaded in registers and
> >> a conditional branch. This is used to detect if we are nested over an
> >> NMI handler. I doubt anyone ever notice an impact caused by this added
> >> test/branch.
> >
> > Why the **** would you do this except in the handful of places where you
> > actually *could* be nested over an NMI handler (basically #MC, #DB and
> > INT3)?
>
> There is also the page fault case. I think putting this test in
> ret_from_exception would be both safe (it is executed for any
> exception return) and fast (exceptions are rare).

Eh? I thought that page fault is one of the hottest paths in kernel
(along with syscall and packet receive/send)...

							Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5)
  2008-04-21 17:23 ` Pavel Machek
@ 2008-04-21 17:28 ` H. Peter Anvin
  2008-04-21 17:42 ` Mathieu Desnoyers
  0 siblings, 1 reply; 23+ messages in thread
From: H. Peter Anvin @ 2008-04-21 17:28 UTC (permalink / raw)
To: Pavel Machek
Cc: Mathieu Desnoyers, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

Pavel Machek wrote:
>
>> There is also the page fault case. I think putting this test in
>> ret_from_exception would be both safe (it is executed for any
>> exception return) and fast (exceptions are rare).
>
> Eh? I thought that page fault is one of the hottest paths in kernel
> (along with syscall and packet receive/send)...
> 							Pavel

Yeah, and the concept of handling page faults inside an NMI handler is
pure fantasy.

	-hpa
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5)
  2008-04-21 17:28 ` H. Peter Anvin
@ 2008-04-21 17:42 ` Mathieu Desnoyers
  2008-04-21 17:59 ` H. Peter Anvin
  0 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2008-04-21 17:42 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

* H. Peter Anvin (hpa@zytor.com) wrote:
> Pavel Machek wrote:
>>> There is also the page fault case. I think putting this test in
>>> ret_from_exception would be both safe (it is executed for any
>>> exception return) and fast (exceptions are rare).
>>
>> Eh? I thought that page fault is one of the hottest paths in kernel
>> (along with syscall and packet receive/send)...
>> 							Pavel
>

On x86_64, we can pinpoint only the page faults returning to the kernel,
which are rare and only caused by vmalloc accesses. Ideally we could do
the same on x86_32.

> Yeah, and the concept of handling page faults inside an NMI handler is
> pure fantasy.
>
> 	-hpa
>

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
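For context, the reason a kernel-mode fault can occur on a vmalloc access at all is that the kernel's top-level page-table entries are synchronized lazily into each task's page tables: a fresh vmalloc mapping may be absent from the current task's tables until the first access faults and the handler copies the entry from the master table. A toy C model of that mechanism (all names, sizes, and return conventions here are hypothetical; the real fixup is the kernel's vmalloc_fault() handler):

```c
/* Toy model of lazy vmalloc page-table synchronization.  A new mapping
 * exists in the master kernel table but not in the task's copy, so the
 * first access "faults" and is repaired from the master. */
#define PGD_ENTRIES 8

static int master_pgd[PGD_ENTRIES]; /* master kernel mappings */
static int task_pgd[PGD_ENTRIES];   /* current task's lazy copy */

static void model_vmalloc_map(int idx)
{
	master_pgd[idx] = 1; /* mapping created in the master table only */
}

/* Returns 1 if the access faulted and was repaired, 0 on a plain hit,
 * -1 for a genuinely unmapped address (a real bad access). */
static int model_access(int idx)
{
	if (task_pgd[idx])
		return 0; /* entry already synced: no fault taken */
	/* vmalloc_fault()-style fixup: copy the entry from the master */
	task_pgd[idx] = master_pgd[idx];
	return task_pgd[idx] ? 1 : -1;
}
```

This is why such faults are rare (one per task per new top-level entry) and why they return to the kernel rather than to user space, which is what makes them distinguishable on the return path.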
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5)
  2008-04-21 17:42 ` Mathieu Desnoyers
@ 2008-04-21 17:59 ` H. Peter Anvin
  2008-04-22 13:12 ` Mathieu Desnoyers
  0 siblings, 1 reply; 23+ messages in thread
From: H. Peter Anvin @ 2008-04-21 17:59 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

Mathieu Desnoyers wrote:
> * H. Peter Anvin (hpa@zytor.com) wrote:
>> Pavel Machek wrote:
>>>> There is also the page fault case. I think putting this test in
>>>> ret_from_exception would be both safe (it is executed for any
>>>> exception return) and fast (exceptions are rare).
>>>
>>> Eh? I thought that page fault is one of the hottest paths in kernel
>>> (along with syscall and packet receive/send)...
>>> 							Pavel
>
> On x86_64, we can pinpoint only the page faults returning to the kernel,
> which are rare and only caused by vmalloc accesses. Ideally we could do
> the same on x86_32.
>

Pinpoint, how? Ultimately you need a runtime test, and you had better be
showing that people are going to die unless you do this before you add a
cycle to the page fault path. I'm only slightly exaggerating that.

	-hpa
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5)
  2008-04-21 17:59 ` H. Peter Anvin
@ 2008-04-22 13:12 ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2008-04-22 13:12 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

* H. Peter Anvin (hpa@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> * H. Peter Anvin (hpa@zytor.com) wrote:
>>> Pavel Machek wrote:
>>>>> There is also the page fault case. I think putting this test in
>>>>> ret_from_exception would be both safe (it is executed for any
>>>>> exception return) and fast (exceptions are rare).
>>>> Eh? I thought that page fault is one of the hottest paths in kernel
>>>> (along with syscall and packet receive/send)...
>>>> 							Pavel
>> On x86_64, we can pinpoint only the page faults returning to the kernel,
>> which are rare and only caused by vmalloc accesses. Ideally we could do
>> the same on x86_32.
>
> Pinpoint, how? Ultimately you need a runtime test, and you better be
> showing that people are going to die unless before you add a cycle to the
> page fault path. I'm only slightly exaggerating that.
>

On x86_32, ret_from_exception identifies the return path taken to return
from an exception. By duplicating the check_userspace code both in
ret_from_intr and in ret_from_exception (that's only 4 instructions), we
can know whether we are in the specific condition of returning to the
kernel from an exception without any supplementary test. Therefore, we
can do the NMI nesting test only in the specific
return-to-kernel-from-exception case, without slowing down any critical
code. Something similar is done on x86_64. That will appear in my next
version.

Thanks,

Mathieu

> 	-hpa
>

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
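The routing Mathieu describes — running the nesting test only when an exception returns to kernel mode — can be sketched as a small decision function. This is a model with hypothetical names; in the patch the equivalent logic is arranged by duplicating a few check_userspace instructions in the assembly return paths:

```c
#include <stdbool.h>

enum ret_path { RET_IRET, RET_POPF_RET };

/* Sketch: only an exception returning to kernel mode ever evaluates the
 * nested-over-NMI condition; interrupt returns and returns to user
 * space take the plain iret path unconditionally, so the hot paths pay
 * no extra runtime test. */
static enum ret_path pick_return_path(bool from_exception,
				      bool returning_to_kernel,
				      bool nested_over_nmi)
{
	if (from_exception && returning_to_kernel && nested_over_nmi)
		return RET_POPF_RET; /* avoid iret: keep NMIs masked */
	return RET_IRET;
}
```

The design point is that the branch on `from_exception && returning_to_kernel` is resolved structurally (by which return-path label was reached), so the only runtime cost added is the nesting test on the already-rare exception-to-kernel path.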
* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5)
  2008-04-21 15:08 ` Mathieu Desnoyers
  2008-04-21 15:08 ` H. Peter Anvin
@ 2008-04-21 15:11 ` Mathieu Desnoyers
  1 sibling, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2008-04-21 15:11 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Pavel Machek, mingo, akpm, Jeremy Fitzhardinge, Steven Rostedt, Frank Ch. Eigler, linux-kernel

* Mathieu Desnoyers (compudj@krystal.dyndns.org) wrote:
> * H. Peter Anvin (hpa@zytor.com) wrote:
> > Pavel Machek wrote:
> >> On Thu 2008-04-17 16:14:10, Mathieu Desnoyers wrote:
> >>> (hopefully finally CCing LKML) :)
> >>>
> >>> Implements an alternative iret with popf and return so trap and
> >>> exception handlers can return to the NMI handler without issuing
> >>> iret. iret would cause NMIs to be reenabled prematurely. x86_32
> >>> uses popf and far return. x86_64 has to copy the return instruction
> >>> pointer to the top of the previous stack, issue a popf, loads the
> >>> previous esp and issue a near return (ret).
> >>
> >> sounds expensive. Does it slow down normal loads?
> >
> > It should *only* be used to return from NMI, #MC or INT3 (breakpoint),
> > which should never happen in normal operation, and even then only when
> > interrupting another NMI or #MC handler.
> >
> > 	-hpa
> >
>
> Just to be clear : the added cost on normal interrupt return is to add a
> supplementary test of the thread flags already loaded in registers and

err, by thread flags, I meant the thread preempt count. And it's not in
registers, so it has to be read from the data cache (it's clearly
already there).

> a conditional branch. This is used to detect if we are nested over an
> NMI handler. I doubt anyone ever notice an impact caused by this added
> test/branch.
>
> Mathieu
>
> -- 
> Mathieu Desnoyers
> Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Thread overview: 23+ messages
[not found] <20080417165839.GA25198@Krystal>
[not found] ` <20080417165944.GB25198@Krystal>
2008-04-17 20:14 ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v5) Mathieu Desnoyers
2008-04-17 20:29 ` Andrew Morton
2008-04-17 21:16 ` Mathieu Desnoyers
2008-04-17 21:26 ` Andrew Morton
2008-04-17 22:01 ` Andi Kleen
2008-04-18 0:06 ` Mathieu Desnoyers
2008-04-18 8:07 ` Andi Kleen
2008-04-19 21:00 ` Mathieu Desnoyers
2008-04-18 11:30 ` Andi Kleen
2008-04-19 21:23 ` Mathieu Desnoyers
2008-04-21 14:00 ` Pavel Machek
2008-04-21 14:22 ` H. Peter Anvin
2008-04-21 14:51 ` Mathieu Desnoyers
2008-04-21 15:08 ` Mathieu Desnoyers
2008-04-21 15:08 ` H. Peter Anvin
2008-04-21 15:21 ` Mathieu Desnoyers
2008-04-21 15:47 ` Mathieu Desnoyers
2008-04-21 17:23 ` Pavel Machek
2008-04-21 17:28 ` H. Peter Anvin
2008-04-21 17:42 ` Mathieu Desnoyers
2008-04-21 17:59 ` H. Peter Anvin
2008-04-22 13:12 ` Mathieu Desnoyers
2008-04-21 15:11 ` Mathieu Desnoyers