public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2)
       [not found] <20080414230344.GA16061@Krystal>
@ 2008-04-14 23:05 ` Mathieu Desnoyers
  2008-04-16 13:06   ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-14 23:05 UTC (permalink / raw)
  To: Andi Kleen, akpm, mingo, H. Peter Anvin, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler
  Cc: linux-kernel

(CCing lkml)

Implements an alternative to iret, using popf and a return, so that trap and
exception handlers can return to the NMI handler without issuing iret; iret
would cause NMIs to be re-enabled prematurely. x86_32 uses popf and a far
return. x86_64 has to copy the return instruction pointer to the top of the
previous stack, issue a popf, load the previous rsp and issue a near return
(ret).

This allows placing immediate values (and therefore optimized trace_marks) in
NMI code, since returning from a breakpoint becomes valid. It also makes
accessing vmalloc'd memory valid from NMI context, which permits executing
module code and touching vmapped or vmalloc'd areas there. This is very useful
to tracers like LTTng.

This patch makes all faults, traps and exceptions safe to be called from NMI
context *except* single-stepping, which requires iret to restore the TF (trap
flag) and jump to the return address in a single instruction. Sorry, no kprobes
support in NMI handlers because of this limitation: we cannot single-step an
NMI handler, since iret must set the TF flag and return to the instruction to
be single-stepped atomically, and this cannot be emulated with popf/lret
because the lret would itself be single-stepped. The limitation does not apply
to immediate values, which do not use single-stepping. This code detects
whether the TF flag is set and uses the iret path for single-stepping in that
case, even though it reactivates NMIs prematurely.

alpha and avr32 use bit 30 (0x40000000) of the preempt count for
PREEMPT_ACTIVE; this patch moves them to bit 28 (0x10000000) so that bit 30 can
flag nesting over an NMI.

TODO : support paravirt ops.
TODO : test alpha and avr32 active count modification

tested on x86_32 (tests implemented in a separate patch) :
- instrumented the return path to export the EIP, CS and EFLAGS values when
  taken so we know the return path code has been executed.
- trace_mark, using immediate values, with a 10ms delay while the breakpoint
  is activated. Runs well through the return path.
- tested vmalloc faults in NMI handler by placing a non-optimized marker in the
  NMI handler (so no breakpoint is executed) and connecting a probe which
  touches every page of a 20MB vmalloc'd buffer. It executes through the return
  path without problem.
- Tested with and without preemption

tested on x86_64 (AMD64) :
- instrumented the return path to export the EIP, CS and EFLAGS values when
  taken so we know the return path code has been executed.
- trace_mark, using immediate values, with a 10ms delay while the breakpoint
  is activated. Runs well through the return path.

To test on x86_64 :
- Test without preemption
- Test vmalloc faults
- Test on Intel 64-bit CPUs.

"This way lies madness. Don't go there."
- Andi

Changelog since v1 :
- x86_64 fixes.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: akpm@osdl.org
CC: mingo@elte.hu
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
---
 arch/x86/kernel/entry_32.S      |   25 +++++++++++++++-
 arch/x86/kernel/entry_64.S      |   31 ++++++++++++++++++++
 include/asm-alpha/thread_info.h |    2 -
 include/asm-avr32/thread_info.h |    2 -
 include/asm-x86/irqflags.h      |   61 ++++++++++++++++++++++++++++++++++++++++
 include/linux/hardirq.h         |   24 ++++++++++++++-
 6 files changed, 140 insertions(+), 5 deletions(-)

Index: linux-2.6-lttng/include/linux/hardirq.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/hardirq.h	2008-04-10 15:56:41.000000000 -0400
+++ linux-2.6-lttng/include/linux/hardirq.h	2008-04-10 16:02:06.000000000 -0400
@@ -22,10 +22,13 @@
  * PREEMPT_MASK: 0x000000ff
  * SOFTIRQ_MASK: 0x0000ff00
  * HARDIRQ_MASK: 0x0fff0000
+ * HARDNMI_MASK: 0x40000000
  */
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 
+#define HARDNMI_BITS	1
+
 #ifndef HARDIRQ_BITS
 #define HARDIRQ_BITS	12
 
@@ -45,16 +48,19 @@
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
 #define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDNMI_SHIFT	(30)
 
 #define __IRQ_MASK(x)	((1UL << (x))-1)
 
 #define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
 #define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
 #define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
+#define HARDNMI_MASK	(__IRQ_MASK(HARDNMI_BITS) << HARDNMI_SHIFT)
 
 #define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
 #define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
 #define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
+#define HARDNMI_OFFSET	(1UL << HARDNMI_SHIFT)
 
 #if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS))
 #error PREEMPT_ACTIVE is too low!
@@ -63,6 +69,7 @@
 #define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
 #define softirq_count()	(preempt_count() & SOFTIRQ_MASK)
 #define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
+#define hardnmi_count()	(preempt_count() & HARDNMI_MASK)
 
 /*
  * Are we doing bottom half or hardware interrupt processing?
@@ -71,6 +78,7 @@
 #define in_irq()		(hardirq_count())
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
+#define in_nmi()		(hardnmi_count())
 
 /*
  * Are we running in atomic context?  WARNING: this macro cannot
@@ -159,7 +167,19 @@ extern void irq_enter(void);
  */
 extern void irq_exit(void);
 
-#define nmi_enter()		do { lockdep_off(); __irq_enter(); } while (0)
-#define nmi_exit()		do { __irq_exit(); lockdep_on(); } while (0)
+#define nmi_enter()					\
+	do {						\
+		lockdep_off();				\
+		BUG_ON(hardnmi_count());		\
+		add_preempt_count(HARDNMI_OFFSET);	\
+		__irq_enter();				\
+	} while (0)
+
+#define nmi_exit()					\
+	do {						\
+		__irq_exit();				\
+		sub_preempt_count(HARDNMI_OFFSET);	\
+		lockdep_on();				\
+	} while (0)
 
 #endif /* LINUX_HARDIRQ_H */
Index: linux-2.6-lttng/arch/x86/kernel/entry_32.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/entry_32.S	2008-04-10 16:02:04.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/entry_32.S	2008-04-11 07:52:36.000000000 -0400
@@ -79,7 +79,6 @@ VM_MASK		= 0x00020000
 #define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
 #define preempt_stop(clobbers)
-#define resume_kernel		restore_nocheck
 #endif
 
 .macro TRACE_IRQS_IRET
@@ -265,6 +264,8 @@ END(ret_from_exception)
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
+	testl $0x40000000,TI_preempt_count(%ebp)	# nested over NMI ?
+	jnz return_to_nmi
 	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
 	jnz restore_nocheck
 need_resched:
@@ -276,6 +277,12 @@ need_resched:
 	call preempt_schedule_irq
 	jmp need_resched
 END(resume_kernel)
+#else
+ENTRY(resume_kernel)
+	testl $0x40000000,TI_preempt_count(%ebp)	# nested over NMI ?
+	jnz return_to_nmi
+	jmp restore_nocheck
+END(resume_kernel)
 #endif
 	CFI_ENDPROC
 
@@ -411,6 +418,22 @@ restore_nocheck_notrace:
 	CFI_ADJUST_CFA_OFFSET -4
 irq_return:
 	INTERRUPT_RETURN
+return_to_nmi:
+	testl $X86_EFLAGS_TF, PT_EFLAGS(%esp)
+	jnz restore_nocheck		/*
+					 * If single-stepping an NMI handler,
+					 * use the normal iret path instead of
+					 * the popf/lret because lret would be
+					 * single-stepped. It should not
+					 * happen : it will reactivate NMIs
+					 * prematurely.
+					 */
+	TRACE_IRQS_IRET
+	RESTORE_REGS
+	addl $4, %esp			# skip orig_eax/error_code
+	CFI_ADJUST_CFA_OFFSET -4
+	INTERRUPT_RETURN_NMI_SAFE
+
 .section .fixup,"ax"
 iret_exc:
 	pushl $0			# no error code
Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2008-04-10 16:02:05.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2008-04-11 07:52:36.000000000 -0400
@@ -593,12 +593,27 @@ retint_restore_args:	/* return to kernel
 	 * The iretq could re-enable interrupts:
 	 */
 	TRACE_IRQS_IRETQ
+	testl $0x40000000,threadinfo_preempt_count(%rcx) /* Nested over NMI ? */
+	jnz return_to_nmi
 restore_args:
 	RESTORE_ARGS 0,8,0
 
 irq_return:
 	INTERRUPT_RETURN
 
+return_to_nmi:				/*
+					 * If single-stepping an NMI handler,
+					 * use the normal iret path instead of
+					 * the popf/lret because lret would be
+					 * single-stepped. It should not
+					 * happen : it will reactivate NMIs
+					 * prematurely.
+					 */
+	bt $8,EFLAGS-ARGOFFSET(%rsp)	/* trap flag? */
+	jc restore_args
+	RESTORE_ARGS 0,8,0
+	INTERRUPT_RETURN_NMI_SAFE
+
 	.section __ex_table, "a"
 	.quad irq_return, bad_iret
 	.previous
@@ -814,6 +829,10 @@ END(spurious_interrupt)
 	.macro paranoidexit trace=1
 	/* ebx:	no swapgs flag */
 paranoid_exit\trace:
+	GET_THREAD_INFO(%rcx)
+	testl $0x40000000,threadinfo_preempt_count(%rcx) /* Nested over NMI ? */
+	jnz paranoid_return_to_nmi\trace
+paranoid_exit_no_nmi\trace:
 	testl %ebx,%ebx				/* swapgs needed? */
 	jnz paranoid_restore\trace
 	testl $3,CS(%rsp)
@@ -826,6 +845,18 @@ paranoid_swapgs\trace:
 paranoid_restore\trace:
 	RESTORE_ALL 8
 	jmp irq_return
+paranoid_return_to_nmi\trace:		/*
+					 * If single-stepping an NMI handler,
+					 * use the normal iret path instead of
+					 * the popf/lret because lret would be
+					 * single-stepped. It should not
+					 * happen : it will reactivate NMIs
+					 * prematurely.
+					 */
+	bt $8,EFLAGS-0(%rsp)		/* trap flag? */
+	jc paranoid_exit_no_nmi\trace
+	RESTORE_ALL 8
+	INTERRUPT_RETURN_NMI_SAFE
 paranoid_userspace\trace:
 	GET_THREAD_INFO(%rcx)
 	movl threadinfo_flags(%rcx),%ebx
Index: linux-2.6-lttng/include/asm-x86/irqflags.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86/irqflags.h	2008-04-10 15:56:41.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86/irqflags.h	2008-04-11 07:58:59.000000000 -0400
@@ -138,12 +138,73 @@ static inline unsigned long __raw_local_
 
 #ifdef CONFIG_X86_64
 #define INTERRUPT_RETURN	iretq
+
+/*
+ * Only returns from a trap or exception to a NMI context (intra-privilege
+ * level near return) to the same SS and CS segments. Should be used
+ * upon trap or exception return when nested over a NMI context so no iret is
+ * issued. It takes care of modifying the eflags, rsp and returning to the
+ * previous function.
+ *
+ * The stack, at that point, looks like :
+ *
+ * 0(rsp)  RIP
+ * 8(rsp)  CS
+ * 16(rsp) EFLAGS
+ * 24(rsp) RSP
+ * 32(rsp) SS
+ *
+ * Upon execution :
+ * Copy EIP to the top of the return stack
+ * Update top of return stack address
+ * Pop eflags into the eflags register
+ * Make the return stack current
+ * Near return (popping the return address from the return stack)
+ */
+#define INTERRUPT_RETURN_NMI_SAFE	pushq %rax;		\
+					pushq %rbx;		\
+					movq 40(%rsp), %rax;	\
+					movq 16(%rsp), %rbx;	\
+					subq $8, %rax;		\
+					movq %rbx, (%rax);	\
+					movq %rax, 40(%rsp);	\
+					popq %rbx;		\
+					popq %rax;		\
+					addq $16, %rsp;		\
+					popfq;			\
+					movq (%rsp), %rsp;	\
+					ret;			\
+
 #define ENABLE_INTERRUPTS_SYSCALL_RET			\
 			movq	%gs:pda_oldrsp, %rsp;	\
 			swapgs;				\
 			sysretq;
 #else
 #define INTERRUPT_RETURN		iret
+
+/*
+ * Protected mode only, no V8086. Implies that protected mode must
+ * be entered before NMIs or MCEs are enabled. Only returns from a trap or
+ * exception to a NMI context (intra-privilege level far return). Should be used
+ * upon trap or exception return when nested over a NMI context so no iret is
+ * issued.
+ *
+ * The stack, at that point, looks like :
+ *
+ * 0(esp) EIP
+ * 4(esp) CS
+ * 8(esp) EFLAGS
+ *
+ * Upon execution :
+ * Copy the stack eflags to top of stack
+ * Pop eflags into the eflags register
+ * Far return: pop EIP and CS into their register, and additionally pop EFLAGS.
+ */
+#define INTERRUPT_RETURN_NMI_SAFE	pushl 8(%esp);	\
+					popfl;		\
+					.byte 0xCA;	\
+					.word 4;
+
 #define ENABLE_INTERRUPTS_SYSCALL_RET	sti; sysexit
 #define GET_CR0_INTO_EAX		movl %cr0, %eax
 #endif
Index: linux-2.6-lttng/include/asm-alpha/thread_info.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-alpha/thread_info.h	2008-04-10 16:02:04.000000000 -0400
+++ linux-2.6-lttng/include/asm-alpha/thread_info.h	2008-04-10 16:02:06.000000000 -0400
@@ -57,7 +57,7 @@ register struct thread_info *__current_t
 
 #endif /* __ASSEMBLY__ */
 
-#define PREEMPT_ACTIVE		0x40000000
+#define PREEMPT_ACTIVE		0x10000000
 
 /*
  * Thread information flags:
Index: linux-2.6-lttng/include/asm-avr32/thread_info.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-avr32/thread_info.h	2008-04-10 16:02:04.000000000 -0400
+++ linux-2.6-lttng/include/asm-avr32/thread_info.h	2008-04-10 16:02:06.000000000 -0400
@@ -70,7 +70,7 @@ static inline struct thread_info *curren
 
 #endif /* !__ASSEMBLY__ */
 
-#define PREEMPT_ACTIVE		0x40000000
+#define PREEMPT_ACTIVE		0x10000000
 
 /*
  * Thread information flags

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2)
  2008-04-14 23:05 ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Mathieu Desnoyers
@ 2008-04-16 13:06   ` Ingo Molnar
  2008-04-16 13:47     ` [TEST PATCH] Test NMI kprobe modules Mathieu Desnoyers
  2008-04-16 15:10     ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: Ingo Molnar @ 2008-04-16 13:06 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, akpm, H. Peter Anvin, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel


* Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:

> Implements an alternative iret with popf and return so trap and 
> exception handlers can return to the NMI handler without issuing iret. 
> iret would cause NMIs to be reenabled prematurely. x86_32 uses popf 
> and far return. x86_64 has to copy the return instruction pointer to 
> the top of the previous stack, issue a popf, loads the previous esp 
> and issue a near return (ret).

thanks Mathieu, i've picked this up into x86.git for more testing.

note that this also fixes an oprofile regression: when oprofile is used 
to generate stack-backtraces, we can fault on address resolution from 
NMI context and currently we do an IRET - with your fixes it should work 
fine. Obscure case but still worth fixing.

	Ingo


* [TEST PATCH] Test NMI kprobe modules
  2008-04-16 13:06   ` Ingo Molnar
@ 2008-04-16 13:47     ` Mathieu Desnoyers
  2008-04-16 14:34       ` Ingo Molnar
  2008-04-16 15:10     ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Ingo Molnar
  1 sibling, 1 reply; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-16 13:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, akpm, H. Peter Anvin, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
> 
> > Implements an alternative iret with popf and return so trap and 
> > exception handlers can return to the NMI handler without issuing iret. 
> > iret would cause NMIs to be reenabled prematurely. x86_32 uses popf 
> > and far return. x86_64 has to copy the return instruction pointer to 
> > the top of the previous stack, issue a popf, loads the previous esp 
> > and issue a near return (ret).
> 
> thanks Mathieu, i've picked this up into x86.git for more testing.
> 
> note that this also fixes an oprofile regression: when oprofile is used 
> to generate stack-backtraces, we can fault on address resolution from 
> NMI context and currently we do an IRET - with your fixes it should work 
> fine. Obscure case but still worth fixing.
> 
> 	Ingo
> 

Hi Ingo,

I also have a test workbench in the form of the following patch. It is
*not* meant for inclusion of any sort, but could help testing.

Enabling a kprobe, a trace_mark() and a vmalloc access requires either
to uncomment the kprobe code or to enable immediate values and disable
the vmalloc code in the marker probe, or disable immediate values and
enable the vmalloc code in the marker probe.

Thanks,

Mathieu

Small marker module to test placing a breakpoint into an NMI handler.

Notes :
We cannot single-step an NMI handler, because iret must set the TF flag and
return to the instruction to be single-stepped in a single instruction. This
cannot be emulated with popf/lret, because the lret would itself be
single-stepped.

Note2 :
Immediate values do not use single-stepping. Hehe. :)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: akpm@osdl.org
CC: mingo@elte.hu
---
 arch/x86/kernel/entry_32.S      |   30 ++++++++++
 arch/x86/kernel/entry_64.S      |   87 +++++++++++++++++++++++++++++++
 arch/x86/kernel/immediate.c     |    1 
 arch/x86/kernel/traps_32.c      |   21 +++++++
 arch/x86/kernel/traps_64.c      |   20 ++++++-
 samples/kprobes/Makefile        |    2 
 samples/kprobes/kprobe_nmi.c    |  110 ++++++++++++++++++++++++++++++++++++++++
 samples/markers/probe-example.c |   35 +++++-------
 8 files changed, 284 insertions(+), 22 deletions(-)

Index: linux-2.6-lttng/arch/x86/kernel/entry_32.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/entry_32.S	2008-04-11 07:52:36.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/entry_32.S	2008-04-11 07:59:07.000000000 -0400
@@ -430,9 +430,39 @@ return_to_nmi:
 					 */
 	TRACE_IRQS_IRET
 	RESTORE_REGS
+	#ud2 	# TEST, BUG on return to NMI handler
 	addl $4, %esp			# skip orig_eax/error_code
 	CFI_ADJUST_CFA_OFFSET -4
+	pushl %eax
+	pushfl
+	movl (%esp), %eax
+	movl %eax, debugo_eflags
+	addl $4, %esp
+	mov %cs, debugo_cs
+	movl 4(%esp), %eax
+	movl %eax, debug_eip
+	movl 8(%esp), %eax
+	movl %eax, debug_cs
+	movl 12(%esp), %eax
+	movl %eax, debug_eflags
+	movl 16(%esp), %eax
+	movl %eax, debug_extra
+	movl 20(%esp), %eax
+	movl %eax, debug_extra2
+	movl 24(%esp), %eax
+	movl %eax, debug_extra3
+	movl 28(%esp), %eax
+	movl %eax, debug_extra4
+	popl %eax
+	#INTERRUPT_RETURN
 	INTERRUPT_RETURN_NMI_SAFE
+	#pushl 8(%esp);
+	#popfl;
+	#.byte 0xCA;	#lret
+	#.word 4;	# pop eflags
+	#.byte 0xC2;	#ret
+	#.word 8;	# pop CS and eflags
+	#lret
 
 .section .fixup,"ax"
 iret_exc:
Index: linux-2.6-lttng/arch/x86/kernel/traps_32.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/traps_32.c	2008-04-11 07:52:36.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/traps_32.c	2008-04-11 07:59:07.000000000 -0400
@@ -791,7 +791,7 @@ void __kprobes die_nmi(struct pt_regs *r
 	do_exit(SIGSEGV);
 }
 
-static __kprobes void default_do_nmi(struct pt_regs * regs)
+void default_do_nmi(struct pt_regs * regs)
 {
 	unsigned char reason = 0;
 
@@ -799,6 +799,8 @@ static __kprobes void default_do_nmi(str
 	if (!smp_processor_id())
 		reason = get_nmi_reason();
  
+ 	/* int3 disabled */
+	_trace_mark(test_nmi, MARK_NOARGS);
 	trace_mark(kernel_arch_trap_entry, "trap_id %d ip #p%ld", 2,
 		instruction_pointer(regs));
 
@@ -1289,3 +1291,20 @@ static int __init code_bytes_setup(char 
 	return 1;
 }
 __setup("code_bytes=", code_bytes_setup);
+
+long debug_eip, debug_cs, debug_eflags, debug_extra, debug_extra2, debug_extra3, debug_extra4;
+long debugo_eip, debugo_cs, debugo_eflags, debugo_extra, debugo_extra2, debugo_extra3, debugo_extra4;
+EXPORT_SYMBOL(debug_eip);
+EXPORT_SYMBOL(debug_cs);
+EXPORT_SYMBOL(debug_eflags);
+EXPORT_SYMBOL(debug_extra);
+EXPORT_SYMBOL(debug_extra2);
+EXPORT_SYMBOL(debug_extra3);
+EXPORT_SYMBOL(debug_extra4);
+EXPORT_SYMBOL(debugo_eip);
+EXPORT_SYMBOL(debugo_cs);
+EXPORT_SYMBOL(debugo_eflags);
+EXPORT_SYMBOL(debugo_extra);
+EXPORT_SYMBOL(debugo_extra2);
+EXPORT_SYMBOL(debugo_extra3);
+EXPORT_SYMBOL(debugo_extra4);
Index: linux-2.6-lttng/samples/kprobes/Makefile
===================================================================
--- linux-2.6-lttng.orig/samples/kprobes/Makefile	2008-04-11 07:52:36.000000000 -0400
+++ linux-2.6-lttng/samples/kprobes/Makefile	2008-04-11 07:59:07.000000000 -0400
@@ -1,5 +1,5 @@
 # builds the kprobes example kernel modules;
 # then to use one (as root):  insmod <module_name.ko>
 
-obj-$(CONFIG_SAMPLE_KPROBES) += kprobe_example.o jprobe_example.o
+obj-$(CONFIG_SAMPLE_KPROBES) += kprobe_example.o jprobe_example.o kprobe_nmi.o
 obj-$(CONFIG_SAMPLE_KRETPROBES) += kretprobe_example.o
Index: linux-2.6-lttng/samples/kprobes/kprobe_nmi.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/kprobes/kprobe_nmi.c	2008-04-11 08:40:14.000000000 -0400
@@ -0,0 +1,110 @@
+/*
+ * NOTE: This example works on x86 and powerpc.
+ * Here's a sample kernel module showing the use of kprobes to dump a
+ * stack trace and selected registers when default_do_nmi() is called.
+ *
+ * For more information on theory of operation of kprobes, see
+ * Documentation/kprobes.txt
+ *
+ * You will see the trace data in /var/log/messages and on the console
+ * whenever default_do_nmi() is invoked.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kprobes.h>
+
+extern long debug_eip, debug_cs, debug_eflags, debug_extra, debug_extra2, debug_extra3, debug_extra4;
+extern long debugo_eip, debugo_cs, debugo_eflags, debugo_extra, debugo_extra2, debugo_extra3, debugo_extra4;
+static int disable;
+
+/* For each probe you need to allocate a kprobe structure */
+static struct kprobe kp = {
+	.symbol_name	= "default_do_nmi",
+};
+
+/* kprobe pre_handler: called just before the probed instruction is executed */
+static int handler_pre(struct kprobe *p, struct pt_regs *regs)
+{
+	if (disable)
+		return 0;
+#ifdef CONFIG_X86
+	printk(KERN_INFO "pre_handler: p->addr = 0x%p, ip = %lx,"
+			" flags = 0x%lx\n",
+		p->addr, regs->ip, regs->flags);
+#endif
+#ifdef CONFIG_PPC
+	printk(KERN_INFO "pre_handler: p->addr = 0x%p, nip = 0x%lx,"
+			" msr = 0x%lx\n",
+		p->addr, regs->nip, regs->msr);
+#endif
+
+	/* A dump_stack() here will give a stack backtrace */
+	return 0;
+}
+
+/* kprobe post_handler: called after the probed instruction is executed */
+static void handler_post(struct kprobe *p, struct pt_regs *regs,
+				unsigned long flags)
+{
+	if (disable)
+		return;
+#ifdef CONFIG_X86
+	printk(KERN_INFO "post_handler: p->addr = 0x%p, flags = 0x%lx\n",
+		p->addr, regs->flags);
+#endif
+#ifdef CONFIG_PPC
+	printk(KERN_INFO "post_handler: p->addr = 0x%p, msr = 0x%lx\n",
+		p->addr, regs->msr);
+#endif
+	disable = 1;
+}
+
+/*
+ * fault_handler: this is called if an exception is generated for any
+ * instruction within the pre- or post-handler, or when Kprobes
+ * single-steps the probed instruction.
+ */
+static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
+{
+	if (disable)
+		return 0;
+	printk(KERN_INFO "fault_handler: p->addr = 0x%p, trap #%d\n",
+		p->addr, trapnr);
+	/* Return 0 because we don't handle the fault. */
+	return 0;
+}
+
+static int __init kprobe_init(void)
+{
+	int ret;
+	kp.pre_handler = handler_pre;
+	kp.post_handler = handler_post;
+	kp.fault_handler = handler_fault;
+
+	//ret = register_kprobe(&kp);
+	//if (ret < 0) {
+	//	printk(KERN_INFO "register_kprobe failed, returned %d\n", ret);
+	//	return ret;
+	//}
+	printk(KERN_INFO "Planted kprobe at %p\n", kp.addr);
+	return 0;
+}
+
+static void __exit kprobe_exit(void)
+{
+	printk("debug data:  eip 0x%lX, cs 0x%lX, eflags 0x%lX, "
+		"extra 0x%lX 0x%lX 0x%lX 0x%lX\n",
+		debug_eip, debug_cs, debug_eflags, debug_extra,
+		debug_extra2, debug_extra3, debug_extra4);
+	printk("debugo data: eip 0x%lX, cs 0x%lX, eflags 0x%lX, "
+		"extra 0x%lX 0x%lX 0x%lX 0x%lX\n",
+		debugo_eip, debugo_cs, debugo_eflags, debugo_extra,
+		debugo_extra2, debugo_extra3, debugo_extra4);
+	unregister_kprobe(&kp);
+	printk(KERN_INFO "kprobe at %p unregistered\n", kp.addr);
+}
+
+module_init(kprobe_init)
+module_exit(kprobe_exit)
+MODULE_LICENSE("GPL");
Index: linux-2.6-lttng/arch/x86/kernel/immediate.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/immediate.c	2008-04-11 07:52:36.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/immediate.c	2008-04-11 07:59:07.000000000 -0400
@@ -272,6 +272,7 @@ __kprobes int arch_imv_update(const stru
 		 * interrupts.
 		 */
 		wmb();
+		mdelay(10);
 		text_poke((void *)insn, (unsigned char *)bypass_eip, 1);
 		/*
 		 * Wait for all int3 handlers to end (interrupts are disabled in
Index: linux-2.6-lttng/samples/markers/probe-example.c
===================================================================
--- linux-2.6-lttng.orig/samples/markers/probe-example.c	2008-04-11 07:52:36.000000000 -0400
+++ linux-2.6-lttng/samples/markers/probe-example.c	2008-04-11 07:59:07.000000000 -0400
@@ -12,6 +12,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/marker.h>
+#include <linux/vmalloc.h>
 #include <asm/atomic.h>
 
 struct probe_data {
@@ -20,40 +21,34 @@ struct probe_data {
 	marker_probe_func *probe_func;
 };
 
+/* 20 MB buffer */
+char *vmem;
+atomic_t eventb_count = ATOMIC_INIT(0);
+
 void probe_subsystem_event(void *probe_data, void *call_data,
 	const char *format, va_list *args)
 {
+	vmem[atomic_read(&eventb_count) % 20971520] = 0x42;
+	atomic_add(4096, &eventb_count);
 	/* Declare args */
-	unsigned int value;
-	const char *mystr;
+	//unsigned int value;
+	//const char *mystr;
 
 	/* Assign args */
-	value = va_arg(*args, typeof(value));
-	mystr = va_arg(*args, typeof(mystr));
+	//value = va_arg(*args, typeof(value));
+	//mystr = va_arg(*args, typeof(mystr));
 
 	/* Call printk */
-	printk(KERN_INFO "Value %u, string %s\n", value, mystr);
+	//printk(KERN_INFO "Value %u, string %s\n", value, mystr);
 
 	/* or count, check rights, serialize data in a buffer */
 }
 
-atomic_t eventb_count = ATOMIC_INIT(0);
-
-void probe_subsystem_eventb(void *probe_data, void *call_data,
-	const char *format, va_list *args)
-{
-	/* Increment counter */
-	atomic_inc(&eventb_count);
-}
-
 static struct probe_data probe_array[] =
 {
-	{	.name = "subsystem_event",
-		.format = "integer %d string %s",
-		.probe_func = probe_subsystem_event },
-	{	.name = "subsystem_eventb",
+	{	.name = "test_nmi",
 		.format = MARK_NOARGS,
-		.probe_func = probe_subsystem_eventb },
+		.probe_func = probe_subsystem_event },
 };
 
 static int __init probe_init(void)
@@ -61,6 +56,7 @@ static int __init probe_init(void)
 	int result;
 	int i;
 
+	vmem = vmalloc(20971520);
 	for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
 		result = marker_probe_register(probe_array[i].name,
 				probe_array[i].format,
@@ -81,6 +77,7 @@ static void __exit probe_fini(void)
 			probe_array[i].probe_func, &probe_array[i]);
 	printk(KERN_INFO "Number of event b : %u\n",
 			atomic_read(&eventb_count));
+	vfree(vmem);
 }
 
 module_init(probe_init);
Index: linux-2.6-lttng/arch/x86/kernel/traps_64.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/traps_64.c	2008-04-11 07:52:36.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/traps_64.c	2008-04-11 08:40:37.000000000 -0400
@@ -827,11 +827,12 @@ unknown_nmi_error(unsigned char reason, 
 
 /* Runs on IST stack. This code must keep interrupts off all the time.
    Nested NMIs are prevented by the CPU. */
-asmlinkage __kprobes void default_do_nmi(struct pt_regs *regs)
+asmlinkage void default_do_nmi(struct pt_regs *regs)
 {
 	unsigned char reason = 0;
 	int cpu;
 
+	trace_mark(test_nmi, MARK_NOARGS);
 	trace_mark(kernel_arch_trap_entry, "trap_id %d ip #p%ld",
 		2, instruction_pointer(regs));
 
@@ -1225,3 +1226,20 @@ static int __init code_bytes_setup(char 
 	return 1;
 }
 __setup("code_bytes=", code_bytes_setup);
+
+long debug_eip, debug_cs, debug_eflags, debug_extra, debug_extra2, debug_extra3, debug_extra4;
+long debugo_eip, debugo_cs, debugo_eflags, debugo_extra, debugo_extra2, debugo_extra3, debugo_extra4;
+EXPORT_SYMBOL(debug_eip);
+EXPORT_SYMBOL(debug_cs);
+EXPORT_SYMBOL(debug_eflags);
+EXPORT_SYMBOL(debug_extra);
+EXPORT_SYMBOL(debug_extra2);
+EXPORT_SYMBOL(debug_extra3);
+EXPORT_SYMBOL(debug_extra4);
+EXPORT_SYMBOL(debugo_eip);
+EXPORT_SYMBOL(debugo_cs);
+EXPORT_SYMBOL(debugo_eflags);
+EXPORT_SYMBOL(debugo_extra);
+EXPORT_SYMBOL(debugo_extra2);
+EXPORT_SYMBOL(debugo_extra3);
+EXPORT_SYMBOL(debugo_extra4);
Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2008-04-11 07:52:36.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2008-04-11 07:59:59.000000000 -0400
@@ -612,7 +612,51 @@ return_to_nmi:				/*
 	bt $8,EFLAGS-ARGOFFSET(%rsp)	/* trap flag? */
 	jc restore_args
 	RESTORE_ARGS 0,8,0
+	pushq %rax
+	pushfq
+	movq (%rsp), %rax
+	movq %rax, debugo_eflags
+	addq $8, %rsp
+	mov %cs, debugo_cs
+	movq 8(%rsp), %rax
+	movq %rsp, debugo_extra
+	mov %ss, debugo_extra2
+	movq 8(%rsp), %rax
+	movq %rax, debug_eip
+	movq 16(%rsp), %rax
+	movq %rax, debug_cs
+	movq 24(%rsp), %rax
+	movq %rax, debug_eflags
+	movq 32(%rsp), %rax
+	movq %rax, debug_extra
+	movq 40(%rsp), %rax
+	movq %rax, debug_extra2
+	movq 48(%rsp), %rax
+	movq %rax, debug_extra3
+	movq 56(%rsp), %rax
+	movq %rax, debug_extra4
+	popq %rax
+	#jmp irq_return
 	INTERRUPT_RETURN_NMI_SAFE
+	#pushq %rax
+	#pushq %rbx
+	# We return to the same SS
+	#movq 40(%rsp), %rax	# The return stack address
+	#movq 24(%rsp), %rbx	# Copy CS to other stack
+	#movq %rbx, -8(%rax)
+	#movq 16(%rsp), %rbx	# Copy RIP to other stack
+	#movq %rbx, -8(%rax)
+	#subq $8, %rax
+	#movq %rax, 40(%rsp)	# Update top of return stack address
+	#popq %rbx
+	#popq %rax
+	#addq $16, %rsp		# Skip RIP and CS
+	#popfq
+	#movq (%rsp), %rsp
+	#ret
+	#don't load SS nor use lret, since we return to same CS and SS.
+	#lss (%rsp), %rsp
+	#lret
 
 	.section __ex_table, "a"
 	.quad irq_return, bad_iret
@@ -856,7 +900,50 @@ paranoid_return_to_nmi\trace:		/*
 	bt $8,EFLAGS-0(%rsp)		/* trap flag? */
 	jc paranoid_exit_no_nmi\trace
 	RESTORE_ALL 8
+	pushq %rax
+	pushfq
+	movq (%rsp), %rax
+	movq %rax, debugo_eflags
+	addq $8, %rsp
+	mov %cs, debugo_cs
+	movq %rsp, debugo_extra
+	mov %ss, debugo_extra2
+	movq 8(%rsp), %rax
+	movq %rax, debug_eip
+	movq 16(%rsp), %rax
+	movq %rax, debug_cs
+	movq 24(%rsp), %rax
+	movq %rax, debug_eflags
+	movq 32(%rsp), %rax
+	movq %rax, debug_extra
+	movq 40(%rsp), %rax
+	movq %rax, debug_extra2
+	movq 48(%rsp), %rax
+	movq %rax, debug_extra3
+	movq 56(%rsp), %rax
+	movq %rax, debug_extra4
+	popq %rax
+	#jmp irq_return
 	INTERRUPT_RETURN_NMI_SAFE
+	#pushq %rax
+	#pushq %rbx
+	#movq 40(%rsp), %rax	# The return stack address
+	#movq 24(%rsp), %rbx	# Copy CS to other stack
+	#movq %rbx, -8(%rax)
+	#movq 16(%rsp), %rbx	# Copy RIP to other stack
+	#movq %rbx, -8(%rax)
+	#subq $8, %rax
+	#movq %rax, 40(%rsp)	# Update top of return stack address
+	#popq %rbx
+	#popq %rax
+	#addq $16, %rsp		# Skip RIP and CS
+	#popfq
+	#movq (%rsp), %rsp
+	#ret
+	#don't load SS nor use lret, since we return to same CS and SS.
+	#lss (%rsp), %rsp
+	#lret
+
 paranoid_userspace\trace:
 	GET_THREAD_INFO(%rcx)
 	movl threadinfo_flags(%rcx),%ebx

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [TEST PATCH] Test NMI kprobe modules
  2008-04-16 13:47     ` [TEST PATCH] Test NMI kprobe modules Mathieu Desnoyers
@ 2008-04-16 14:34       ` Ingo Molnar
  2008-04-16 14:54         ` Mathieu Desnoyers
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-04-16 14:34 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, akpm, H. Peter Anvin, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel


* Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:

> +long debug_eip, debug_cs, debug_eflags, debug_extra, debug_extra2, debug_extra3, debug_extra4;
> +long debugo_eip, debugo_cs, debugo_eflags, debugo_extra, debugo_extra2, debugo_extra3, debugo_extra4;
> +EXPORT_SYMBOL(debug_eip);
> +EXPORT_SYMBOL(debug_cs);
> +EXPORT_SYMBOL(debug_eflags);
> +EXPORT_SYMBOL(debug_extra);
> +EXPORT_SYMBOL(debug_extra2);
> +EXPORT_SYMBOL(debug_extra3);
> +EXPORT_SYMBOL(debug_extra4);
> +EXPORT_SYMBOL(debugo_eip);
> +EXPORT_SYMBOL(debugo_cs);
> +EXPORT_SYMBOL(debugo_eflags);
> +EXPORT_SYMBOL(debugo_extra);
> +EXPORT_SYMBOL(debugo_extra2);
> +EXPORT_SYMBOL(debugo_extra3);
> +EXPORT_SYMBOL(debugo_extra4);

ok, while this is a test patch of yours, let's make one thing sure: all 
things hook-alike instrumentation _MUST_ and will stay GPL 
exported. It's all very internal, and there will be no automatic, 
programmable interface stability guarantees for any of the markers that 
kabi crap could come and shackle the kernel with ... (it will all be 
stable to SystemTap of course - but SystemTap is a kernel-internal 
entity in that regard)

	Ingo


* Re: [TEST PATCH] Test NMI kprobe modules
  2008-04-16 14:34       ` Ingo Molnar
@ 2008-04-16 14:54         ` Mathieu Desnoyers
  0 siblings, 0 replies; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-16 14:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, akpm, H. Peter Anvin, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
> 
> > +long debug_eip, debug_cs, debug_eflags, debug_extra, debug_extra2, debug_extra3, debug_extra4;
> > +long debugo_eip, debugo_cs, debugo_eflags, debugo_extra, debugo_extra2, debugo_extra3, debugo_extra4;
> > +EXPORT_SYMBOL(debug_eip);
> > +EXPORT_SYMBOL(debug_cs);
> > +EXPORT_SYMBOL(debug_eflags);
> > +EXPORT_SYMBOL(debug_extra);
> > +EXPORT_SYMBOL(debug_extra2);
> > +EXPORT_SYMBOL(debug_extra3);
> > +EXPORT_SYMBOL(debug_extra4);
> > +EXPORT_SYMBOL(debugo_eip);
> > +EXPORT_SYMBOL(debugo_cs);
> > +EXPORT_SYMBOL(debugo_eflags);
> > +EXPORT_SYMBOL(debugo_extra);
> > +EXPORT_SYMBOL(debugo_extra2);
> > +EXPORT_SYMBOL(debugo_extra3);
> > +EXPORT_SYMBOL(debugo_extra4);
> 
> ok, while this is a test patch of yours, let's make one thing sure: all 
> things hook-alike instrumentation _MUST_ and will stay GPL 
> exported. It's all very internal, and there will be no automatic, 
> programmable interface stability guarantees for any of the markers that 
> kabi crap could come and shackle the kernel with ... (it will all be 
> stable to SystemTap of course - but SystemTap is a kernel-internal 
> entity in that regard)
> 
> 	Ingo
> 

Yes, I agree. This patch is just an ugly test-hack. :)

I already designed my LTTng tracer so it follows changes to the internal
kernel structures as easily as possible, but yes, it is kernel internal
too.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2)
  2008-04-16 13:06   ` Ingo Molnar
  2008-04-16 13:47     ` [TEST PATCH] Test NMI kprobe modules Mathieu Desnoyers
@ 2008-04-16 15:10     ` Ingo Molnar
  2008-04-16 15:18       ` H. Peter Anvin
                         ` (3 more replies)
  1 sibling, 4 replies; 19+ messages in thread
From: Ingo Molnar @ 2008-04-16 15:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, akpm, H. Peter Anvin, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> thanks Mathieu, i've picked this up into x86.git for more testing.

... but had to drop it due to missing PARAVIRT support which broke the 
build. I guess on paravirt we could just initially define 
INTERRUPT_RETURN_NMI_SAFE to iret, etc.?

	Ingo


* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2)
  2008-04-16 15:10     ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Ingo Molnar
@ 2008-04-16 15:18       ` H. Peter Anvin
  2008-04-16 15:37       ` Mathieu Desnoyers
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2008-04-16 15:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Andi Kleen, akpm, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>> thanks Mathieu, i've picked this up into x86.git for more testing.
> 
> ... but had to drop it due to missing PARAVIRT support which broke the 
> build. I guess on paravirt we could just initially define 
> INTERRUPT_RETURN_NMI_SAFE to iret, etc.?

I figure that's what we'd have to do.  This is exactly the fundamental 
problem with paravirt - it relies on a different, ill-defined model of 
the platform than the (generally well-defined) hardware specification 
provides.


	-hpa


* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2)
  2008-04-16 15:10     ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Ingo Molnar
  2008-04-16 15:18       ` H. Peter Anvin
@ 2008-04-16 15:37       ` Mathieu Desnoyers
  2008-04-16 16:03       ` Jeremy Fitzhardinge
  2008-04-16 16:28       ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3) Mathieu Desnoyers
  3 siblings, 0 replies; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-16 15:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, akpm, H. Peter Anvin, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > thanks Mathieu, i've picked this up into x86.git for more testing.
> 
> ... but had to drop it due to missing PARAVIRT support which broke the 
> build. I guess on paravirt we could just initially define 
> INTERRUPT_RETURN_NMI_SAFE to iret, etc.?
> 

Yes, I was about to email you about this.

Would it be valid to execute a popf in PARAVIRT or should we have some
special support like the normal iret does?

And yes, initially using iret for paravirt should just work, but it
would leave the NMI iret issue for paravirt.

Mathieu

> 	Ingo
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2)
  2008-04-16 15:10     ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Ingo Molnar
  2008-04-16 15:18       ` H. Peter Anvin
  2008-04-16 15:37       ` Mathieu Desnoyers
@ 2008-04-16 16:03       ` Jeremy Fitzhardinge
  2008-04-18  0:48         ` Mathieu Desnoyers
  2008-04-16 16:28       ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3) Mathieu Desnoyers
  3 siblings, 1 reply; 19+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-16 16:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Andi Kleen, akpm, H. Peter Anvin,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel, Rusty Russell,
	Zachary Amsden

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>   
>> thanks Mathieu, i've picked this up into x86.git for more testing.
>>     
>
> ... but had to drop it due to missing PARAVIRT support which broke the 
> build. I guess on paravirt we could just initially define 
> INTERRUPT_RETURN_NMI_SAFE to iret, etc.?

I have not yet implemented Xen's support for paravirtual NMI, so there's 
no scope for breaking anything from my perspective.  When I get around 
to NMI, I'll work around whatever's there.  I don't know if lguest or 
VMI has any guest NMI support.

    J


* [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
  2008-04-16 15:10     ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Ingo Molnar
                         ` (2 preceding siblings ...)
  2008-04-16 16:03       ` Jeremy Fitzhardinge
@ 2008-04-16 16:28       ` Mathieu Desnoyers
  2008-04-16 17:57         ` Jeremy Fitzhardinge
  3 siblings, 1 reply; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-16 16:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, akpm, H. Peter Anvin, Jeremy Fitzhardinge,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel

(added PARAVIRT support, simple mapping of INTERRUPT_RETURN_NMI_SAFE to
INTERRUPT_RETURN)

x86 NMI-safe INT3 and Page Fault

Implements an alternative iret with popf and return so trap and exception
handlers can return to the NMI handler without issuing iret. iret would cause
NMIs to be reenabled prematurely. x86_32 uses popf and far return. x86_64 has to
copy the return instruction pointer to the top of the previous stack, issue a
popf, load the previous rsp and issue a near return (ret).

It allows placing immediate values (and therefore optimized trace_marks) in NMI
code since returning from a breakpoint would be valid. Accessing vmalloc'd
memory, which allows executing module code or accessing vmapped or vmalloc'd
areas from NMI context, would also be valid. This is very useful to tracers like
LTTng.

This patch makes all faults, traps and exceptions safe to be called from NMI
context *except* single-stepping, which requires iret to restore the TF (trap
flag) and jump to the return address in a single instruction. Sorry, no kprobes
support in NMI handlers because of this limitation.  We cannot single-step an
NMI handler, because iret must set the TF flag and return back to the
instruction to single-step in a single instruction. This cannot be emulated with
popf/lret, because lret would be single-stepped. It does not apply to immediate
values because they do not use single-stepping. This code detects if the TF
flag is set and uses the iret path for single-stepping, even if it reactivates
NMIs prematurely.

alpha and avr32 use active count bit 30 (PREEMPT_ACTIVE == 0x40000000), which
would collide with the new NMI bit. This patch moves them to bit 28 (0x10000000).

TODO : test alpha and avr32 active count modification

tested on x86_32 (tests implemented in a separate patch) :
- instrumented the return path to export the EIP, CS and EFLAGS values when
  taken so we know the return path code has been executed.
- trace_mark, using immediate values, with 10ms delay with the breakpoint
  activated. Runs well through the return path.
- tested vmalloc faults in NMI handler by placing a non-optimized marker in the
  NMI handler (so no breakpoint is executed) and connecting a probe which
  touches every page of a 20MB vmalloc'd buffer. It executes through the return
  path without problem.
- Tested with and without preemption

tested on x86_64
- instrumented the return path to export the EIP, CS and EFLAGS values when
  taken so we know the return path code has been executed.
- trace_mark, using immediate values, with 10ms delay with the breakpoint
  activated. Runs well through the return path.

To test on x86_64 :
- Test without preemption
- Test vmalloc faults
- Test on Intel 64-bit CPUs.

"This way lies madness. Don't go there."
- Andi

Changelog since v1 :
- x86_64 fixes.
Changelog since v2 :
- paravirt support

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: akpm@osdl.org
CC: mingo@elte.hu
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: "Frank Ch. Eigler" <fche@redhat.com>
---
 arch/x86/kernel/entry_32.S      |   25 +++++++++++++++-
 arch/x86/kernel/entry_64.S      |   31 ++++++++++++++++++++
 include/asm-alpha/thread_info.h |    2 -
 include/asm-avr32/thread_info.h |    2 -
 include/asm-x86/irqflags.h      |   61 ++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/paravirt.h      |    2 +
 include/linux/hardirq.h         |   24 ++++++++++++++-
 7 files changed, 142 insertions(+), 5 deletions(-)

Index: linux-2.6-lttng/include/linux/hardirq.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/hardirq.h	2008-04-16 11:25:18.000000000 -0400
+++ linux-2.6-lttng/include/linux/hardirq.h	2008-04-16 11:29:30.000000000 -0400
@@ -22,10 +22,13 @@
  * PREEMPT_MASK: 0x000000ff
  * SOFTIRQ_MASK: 0x0000ff00
  * HARDIRQ_MASK: 0x0fff0000
+ * HARDNMI_MASK: 0x40000000
  */
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 
+#define HARDNMI_BITS	1
+
 #ifndef HARDIRQ_BITS
 #define HARDIRQ_BITS	12
 
@@ -45,16 +48,19 @@
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
 #define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDNMI_SHIFT	(30)
 
 #define __IRQ_MASK(x)	((1UL << (x))-1)
 
 #define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
 #define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
 #define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
+#define HARDNMI_MASK	(__IRQ_MASK(HARDNMI_BITS) << HARDNMI_SHIFT)
 
 #define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
 #define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
 #define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
+#define HARDNMI_OFFSET	(1UL << HARDNMI_SHIFT)
 
 #if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS))
 #error PREEMPT_ACTIVE is too low!
@@ -63,6 +69,7 @@
 #define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
 #define softirq_count()	(preempt_count() & SOFTIRQ_MASK)
 #define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
+#define hardnmi_count()	(preempt_count() & HARDNMI_MASK)
 
 /*
  * Are we doing bottom half or hardware interrupt processing?
@@ -71,6 +78,7 @@
 #define in_irq()		(hardirq_count())
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
+#define in_nmi()		(hardnmi_count())
 
 /*
  * Are we running in atomic context?  WARNING: this macro cannot
@@ -159,7 +167,19 @@ extern void irq_enter(void);
  */
 extern void irq_exit(void);
 
-#define nmi_enter()		do { lockdep_off(); __irq_enter(); } while (0)
-#define nmi_exit()		do { __irq_exit(); lockdep_on(); } while (0)
+#define nmi_enter()					\
+	do {						\
+		lockdep_off();				\
+		BUG_ON(hardnmi_count());		\
+		add_preempt_count(HARDNMI_OFFSET);	\
+		__irq_enter();				\
+	} while (0)
+
+#define nmi_exit()					\
+	do {						\
+		__irq_exit();				\
+		sub_preempt_count(HARDNMI_OFFSET);	\
+		lockdep_on();				\
+	} while (0)
 
 #endif /* LINUX_HARDIRQ_H */
Index: linux-2.6-lttng/arch/x86/kernel/entry_32.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/entry_32.S	2008-04-16 11:25:18.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/entry_32.S	2008-04-16 12:06:30.000000000 -0400
@@ -79,7 +79,6 @@ VM_MASK		= 0x00020000
 #define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
 #define preempt_stop(clobbers)
-#define resume_kernel		restore_nocheck
 #endif
 
 .macro TRACE_IRQS_IRET
@@ -265,6 +264,8 @@ END(ret_from_exception)
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
+	testl $0x40000000,TI_preempt_count(%ebp)	# nested over NMI ?
+	jnz return_to_nmi
 	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
 	jnz restore_nocheck
 need_resched:
@@ -276,6 +277,12 @@ need_resched:
 	call preempt_schedule_irq
 	jmp need_resched
 END(resume_kernel)
+#else
+ENTRY(resume_kernel)
+	testl $0x40000000,TI_preempt_count(%ebp)	# nested over NMI ?
+	jnz return_to_nmi
+	jmp restore_nocheck
+END(resume_kernel)
 #endif
 	CFI_ENDPROC
 
@@ -411,6 +418,22 @@ restore_nocheck_notrace:
 	CFI_ADJUST_CFA_OFFSET -4
 irq_return:
 	INTERRUPT_RETURN
+return_to_nmi:
+	testl $X86_EFLAGS_TF, PT_EFLAGS(%esp)
+	jnz restore_nocheck		/*
+					 * If single-stepping an NMI handler,
+					 * use the normal iret path instead of
+					 * the popf/lret because lret would be
+					 * single-stepped. It should not
+					 * happen : it will reactivate NMIs
+					 * prematurely.
+					 */
+	TRACE_IRQS_IRET
+	RESTORE_REGS
+	addl $4, %esp			# skip orig_eax/error_code
+	CFI_ADJUST_CFA_OFFSET -4
+	INTERRUPT_RETURN_NMI_SAFE
+
 .section .fixup,"ax"
 iret_exc:
 	pushl $0			# no error code
Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2008-04-16 11:25:18.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2008-04-16 12:06:31.000000000 -0400
@@ -581,12 +581,27 @@ retint_restore_args:	/* return to kernel
 	 * The iretq could re-enable interrupts:
 	 */
 	TRACE_IRQS_IRETQ
+	testl $0x40000000,threadinfo_preempt_count(%rcx) /* Nested over NMI ? */
+	jnz return_to_nmi
 restore_args:
 	RESTORE_ARGS 0,8,0
 
 irq_return:
 	INTERRUPT_RETURN
 
+return_to_nmi:				/*
+					 * If single-stepping an NMI handler,
+					 * use the normal iret path instead of
+					 * the popf/lret because lret would be
+					 * single-stepped. It should not
+					 * happen : it will reactivate NMIs
+					 * prematurely.
+					 */
+	bt $8,EFLAGS-ARGOFFSET(%rsp)	/* trap flag? */
+	jc restore_args
+	RESTORE_ARGS 0,8,0
+	INTERRUPT_RETURN_NMI_SAFE
+
 	.section __ex_table, "a"
 	.quad irq_return, bad_iret
 	.previous
@@ -802,6 +817,10 @@ END(spurious_interrupt)
 	.macro paranoidexit trace=1
 	/* ebx:	no swapgs flag */
 paranoid_exit\trace:
+	GET_THREAD_INFO(%rcx)
+	testl $0x40000000,threadinfo_preempt_count(%rcx) /* Nested over NMI ? */
+	jnz paranoid_return_to_nmi\trace
+paranoid_exit_no_nmi\trace:
 	testl %ebx,%ebx				/* swapgs needed? */
 	jnz paranoid_restore\trace
 	testl $3,CS(%rsp)
@@ -814,6 +833,18 @@ paranoid_swapgs\trace:
 paranoid_restore\trace:
 	RESTORE_ALL 8
 	jmp irq_return
+paranoid_return_to_nmi\trace:		/*
+					 * If single-stepping an NMI handler,
+					 * use the normal iret path instead of
+					 * the popf/lret because lret would be
+					 * single-stepped. It should not
+					 * happen : it will reactivate NMIs
+					 * prematurely.
+					 */
+	bt $8,EFLAGS-0(%rsp)		/* trap flag? */
+	jc paranoid_exit_no_nmi\trace
+	RESTORE_ALL 8
+	INTERRUPT_RETURN_NMI_SAFE
 paranoid_userspace\trace:
 	GET_THREAD_INFO(%rcx)
 	movl threadinfo_flags(%rcx),%ebx
Index: linux-2.6-lttng/include/asm-x86/irqflags.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86/irqflags.h	2008-04-16 11:25:18.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86/irqflags.h	2008-04-16 11:29:30.000000000 -0400
@@ -138,12 +138,73 @@ static inline unsigned long __raw_local_
 
 #ifdef CONFIG_X86_64
 #define INTERRUPT_RETURN	iretq
+
+/*
+ * Only returns from a trap or exception to a NMI context (intra-privilege
+ * level near return) to the same SS and CS segments. Should be used
+ * upon trap or exception return when nested over a NMI context so no iret is
+ * issued. It takes care of modifying the eflags, rsp and returning to the
+ * previous function.
+ *
+ * The stack, at that point, looks like :
+ *
+ * 0(rsp)  RIP
+ * 8(rsp)  CS
+ * 16(rsp) EFLAGS
+ * 24(rsp) RSP
+ * 32(rsp) SS
+ *
+ * Upon execution :
+ * Copy EIP to the top of the return stack
+ * Update top of return stack address
+ * Pop eflags into the eflags register
+ * Make the return stack current
+ * Near return (popping the return address from the return stack)
+ */
+#define INTERRUPT_RETURN_NMI_SAFE	pushq %rax;		\
+					pushq %rbx;		\
+					movq 40(%rsp), %rax;	\
+					movq 16(%rsp), %rbx;	\
+					subq $8, %rax;		\
+					movq %rbx, (%rax);	\
+					movq %rax, 40(%rsp);	\
+					popq %rbx;		\
+					popq %rax;		\
+					addq $16, %rsp;		\
+					popfq;			\
+					movq (%rsp), %rsp;	\
+					ret;			\
+
 #define ENABLE_INTERRUPTS_SYSCALL_RET			\
 			movq	%gs:pda_oldrsp, %rsp;	\
 			swapgs;				\
 			sysretq;
 #else
 #define INTERRUPT_RETURN		iret
+
+/*
+ * Protected mode only, no V8086. Implies that protected mode must
+ * be entered before NMIs or MCEs are enabled. Only returns from a trap or
+ * exception to a NMI context (intra-privilege level far return). Should be used
+ * upon trap or exception return when nested over a NMI context so no iret is
+ * issued.
+ *
+ * The stack, at that point, looks like :
+ *
+ * 0(esp) EIP
+ * 4(esp) CS
+ * 8(esp) EFLAGS
+ *
+ * Upon execution :
+ * Copy the stack eflags to top of stack
+ * Pop eflags into the eflags register
+ * Far return: pop EIP and CS into their register, and additionally pop EFLAGS.
+ */
+#define INTERRUPT_RETURN_NMI_SAFE	pushl 8(%esp);	\
+					popfl;		\
+					.byte 0xCA;	\
+					.word 4;
+
 #define ENABLE_INTERRUPTS_SYSCALL_RET	sti; sysexit
 #define GET_CR0_INTO_EAX		movl %cr0, %eax
 #endif
Index: linux-2.6-lttng/include/asm-alpha/thread_info.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-alpha/thread_info.h	2008-04-16 11:25:18.000000000 -0400
+++ linux-2.6-lttng/include/asm-alpha/thread_info.h	2008-04-16 12:06:32.000000000 -0400
@@ -57,7 +57,7 @@ register struct thread_info *__current_t
 
 #endif /* __ASSEMBLY__ */
 
-#define PREEMPT_ACTIVE		0x40000000
+#define PREEMPT_ACTIVE		0x10000000
 
 /*
  * Thread information flags:
Index: linux-2.6-lttng/include/asm-avr32/thread_info.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-avr32/thread_info.h	2008-04-16 11:25:18.000000000 -0400
+++ linux-2.6-lttng/include/asm-avr32/thread_info.h	2008-04-16 12:06:32.000000000 -0400
@@ -70,7 +70,7 @@ static inline struct thread_info *curren
 
 #endif /* !__ASSEMBLY__ */
 
-#define PREEMPT_ACTIVE		0x40000000
+#define PREEMPT_ACTIVE		0x10000000
 
 /*
  * Thread information flags
Index: linux-2.6-lttng/include/asm-x86/paravirt.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86/paravirt.h	2008-04-16 12:23:44.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86/paravirt.h	2008-04-16 12:24:36.000000000 -0400
@@ -1358,6 +1358,8 @@ static inline unsigned long __raw_local_
 	PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_iret), CLBR_NONE,	\
 		  jmp *%cs:pv_cpu_ops+PV_CPU_iret)
 
+#define INTERRUPT_RETURN_NMI_SAFE INTERRUPT_RETURN
+
 #define DISABLE_INTERRUPTS(clobbers)					\
 	PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \
 		  PV_SAVE_REGS;			\
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
  2008-04-16 16:28       ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3) Mathieu Desnoyers
@ 2008-04-16 17:57         ` Jeremy Fitzhardinge
  2008-04-17 16:29           ` Mathieu Desnoyers
  0 siblings, 1 reply; 19+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-16 17:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Andi Kleen, akpm, H. Peter Anvin, Steven Rostedt,
	Frank Ch. Eigler, linux-kernel

Mathieu Desnoyers wrote:
> "This way lies madness. Don't go there."
>   

It is a large amount of... stuff.  This immediate values thing makes a 
big improvement then?

> Index: linux-2.6-lttng/include/linux/hardirq.h
> ===================================================================
> --- linux-2.6-lttng.orig/include/linux/hardirq.h	2008-04-16 11:25:18.000000000 -0400
> +++ linux-2.6-lttng/include/linux/hardirq.h	2008-04-16 11:29:30.000000000 -0400
> @@ -22,10 +22,13 @@
>   * PREEMPT_MASK: 0x000000ff
>   * SOFTIRQ_MASK: 0x0000ff00
>   * HARDIRQ_MASK: 0x0fff0000
> + * HARDNMI_MASK: 0x40000000
>   */
>  #define PREEMPT_BITS	8
>  #define SOFTIRQ_BITS	8
>  
> +#define HARDNMI_BITS	1
> +
>  #ifndef HARDIRQ_BITS
>  #define HARDIRQ_BITS	12
>  
> @@ -45,16 +48,19 @@
>  #define PREEMPT_SHIFT	0
>  #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
>  #define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
> +#define HARDNMI_SHIFT	(30)
>   

Why at 30, rather than HARDIRQ_SHIFT+HARDIRQ_BITS?

>  
>  #define __IRQ_MASK(x)	((1UL << (x))-1)
>  
>  #define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
>  #define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
>  #define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
> +#define HARDNMI_MASK	(__IRQ_MASK(HARDNMI_BITS) << HARDNMI_SHIFT)
>  
>  #define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
>  #define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
>  #define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
> +#define HARDNMI_OFFSET	(1UL << HARDNMI_SHIFT)
>  
>  #if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS))
>  #error PREEMPT_ACTIVE is too low!
> @@ -63,6 +69,7 @@
>  #define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
>  #define softirq_count()	(preempt_count() & SOFTIRQ_MASK)
>  #define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
> +#define hardnmi_count()	(preempt_count() & HARDNMI_MASK)
>  
>  /*
>   * Are we doing bottom half or hardware interrupt processing?
> @@ -71,6 +78,7 @@
>  #define in_irq()		(hardirq_count())
>  #define in_softirq()		(softirq_count())
>  #define in_interrupt()		(irq_count())
> +#define in_nmi()		(hardnmi_count())
>  
>  /*
>   * Are we running in atomic context?  WARNING: this macro cannot
> @@ -159,7 +167,19 @@ extern void irq_enter(void);
>   */
>  extern void irq_exit(void);
>  
> -#define nmi_enter()		do { lockdep_off(); __irq_enter(); } while (0)
> -#define nmi_exit()		do { __irq_exit(); lockdep_on(); } while (0)
> +#define nmi_enter()					\
> +	do {						\
> +		lockdep_off();				\
> +		BUG_ON(hardnmi_count());		\
> +		add_preempt_count(HARDNMI_OFFSET);	\
> +		__irq_enter();				\
> +	} while (0)
> +
> +#define nmi_exit()					\
> +	do {						\
> +		__irq_exit();				\
> +		sub_preempt_count(HARDNMI_OFFSET);	\
> +		lockdep_on();				\
> +	} while (0)
>  
>  #endif /* LINUX_HARDIRQ_H */
> Index: linux-2.6-lttng/arch/x86/kernel/entry_32.S
> ===================================================================
> --- linux-2.6-lttng.orig/arch/x86/kernel/entry_32.S	2008-04-16 11:25:18.000000000 -0400
> +++ linux-2.6-lttng/arch/x86/kernel/entry_32.S	2008-04-16 12:06:30.000000000 -0400
> @@ -79,7 +79,6 @@ VM_MASK		= 0x00020000
>  #define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
>  #else
>  #define preempt_stop(clobbers)
> -#define resume_kernel		restore_nocheck
>  #endif
>  
>  .macro TRACE_IRQS_IRET
> @@ -265,6 +264,8 @@ END(ret_from_exception)
>  #ifdef CONFIG_PREEMPT
>  ENTRY(resume_kernel)
>  	DISABLE_INTERRUPTS(CLBR_ANY)
> +	testl $0x40000000,TI_preempt_count(%ebp)	# nested over NMI ?
> +	jnz return_to_nmi
>  	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
>  	jnz restore_nocheck
>  need_resched:
> @@ -276,6 +277,12 @@ need_resched:
>  	call preempt_schedule_irq
>  	jmp need_resched
>  END(resume_kernel)
> +#else
> +ENTRY(resume_kernel)
> +	testl $0x40000000,TI_preempt_count(%ebp)	# nested over NMI ?
>   

HARDNMI_MASK?

> +	jnz return_to_nmi
> +	jmp restore_nocheck
> +END(resume_kernel)
>  #endif
>  	CFI_ENDPROC
>  
> @@ -411,6 +418,22 @@ restore_nocheck_notrace:
>  	CFI_ADJUST_CFA_OFFSET -4
>  irq_return:
>  	INTERRUPT_RETURN
> +return_to_nmi:
> +	testl $X86_EFLAGS_TF, PT_EFLAGS(%esp)
> +	jnz restore_nocheck		/*
> +					 * If single-stepping an NMI handler,
> +					 * use the normal iret path instead of
> +					 * the popf/lret because lret would be
> +					 * single-stepped. It should not
> +					 * happen : it will reactivate NMIs
> +					 * prematurely.
> +					 */
> +	TRACE_IRQS_IRET
> +	RESTORE_REGS
> +	addl $4, %esp			# skip orig_eax/error_code
> +	CFI_ADJUST_CFA_OFFSET -4
> +	INTERRUPT_RETURN_NMI_SAFE
> +
>  .section .fixup,"ax"
>  iret_exc:
>  	pushl $0			# no error code
> Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
> ===================================================================
> --- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2008-04-16 11:25:18.000000000 -0400
> +++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2008-04-16 12:06:31.000000000 -0400
> @@ -581,12 +581,27 @@ retint_restore_args:	/* return to kernel
>  	 * The iretq could re-enable interrupts:
>  	 */
>  	TRACE_IRQS_IRETQ
> +	testl $0x40000000,threadinfo_preempt_count(%rcx) /* Nested over NMI ? */
>   

HARDNMI_MASK?  (ditto below)

> +	jnz return_to_nmi
>  restore_args:
>  	RESTORE_ARGS 0,8,0
>  
>  irq_return:
>  	INTERRUPT_RETURN
>  
> +return_to_nmi:				/*
> +					 * If single-stepping an NMI handler,
> +					 * use the normal iret path instead of
> +					 * the popf/lret because lret would be
> +					 * single-stepped. It should not
> +					 * happen : it will reactivate NMIs
> +					 * prematurely.
> +					 */
> +	bt $8,EFLAGS-ARGOFFSET(%rsp)	/* trap flag? */
>   

test[bwl] is a bit more usual; then you can use X86_EFLAGS_TF.

> +	jc restore_args
> +	RESTORE_ARGS 0,8,0
> +	INTERRUPT_RETURN_NMI_SAFE
> +
>  	.section __ex_table, "a"
>  	.quad irq_return, bad_iret
>  	.previous
> @@ -802,6 +817,10 @@ END(spurious_interrupt)
>  	.macro paranoidexit trace=1
>  	/* ebx:	no swapgs flag */
>  paranoid_exit\trace:
> +	GET_THREAD_INFO(%rcx)
> +	testl $0x40000000,threadinfo_preempt_count(%rcx) /* Nested over NMI ? */
> +	jnz paranoid_return_to_nmi\trace
> +paranoid_exit_no_nmi\trace:
>  	testl %ebx,%ebx				/* swapgs needed? */
>  	jnz paranoid_restore\trace
>  	testl $3,CS(%rsp)
> @@ -814,6 +833,18 @@ paranoid_swapgs\trace:
>  paranoid_restore\trace:
>  	RESTORE_ALL 8
>  	jmp irq_return
> +paranoid_return_to_nmi\trace:		/*
> +					 * If single-stepping an NMI handler,
> +					 * use the normal iret path instead of
> +					 * the popf/lret because lret would be
> +					 * single-stepped. It should not
> +					 * happen : it will reactivate NMIs
> +					 * prematurely.
> +					 */
> +	bt $8,EFLAGS-0(%rsp)		/* trap flag? */
> +	jc paranoid_exit_no_nmi\trace
> +	RESTORE_ALL 8
> +	INTERRUPT_RETURN_NMI_SAFE
>   

Does this need to be repeated verbatim?

>  paranoid_userspace\trace:
>  	GET_THREAD_INFO(%rcx)
>  	movl threadinfo_flags(%rcx),%ebx
> Index: linux-2.6-lttng/include/asm-x86/irqflags.h
> ===================================================================
> --- linux-2.6-lttng.orig/include/asm-x86/irqflags.h	2008-04-16 11:25:18.000000000 -0400
> +++ linux-2.6-lttng/include/asm-x86/irqflags.h	2008-04-16 11:29:30.000000000 -0400
> @@ -138,12 +138,73 @@ static inline unsigned long __raw_local_
>  
>  #ifdef CONFIG_X86_64
>  #define INTERRUPT_RETURN	iretq
> +
> +/*
> + * Only returns from a trap or exception to a NMI context (intra-privilege
> + * level near return) to the same SS and CS segments. Should be used
> + * upon trap or exception return when nested over a NMI context so no iret is
> + * issued. It takes care of modifying the eflags, rsp and returning to the
> + * previous function.
> + *
> + * The stack, at that point, looks like :
> + *
> + * 0(rsp)  RIP
> + * 8(rsp)  CS
> + * 16(rsp) EFLAGS
> + * 24(rsp) RSP
> + * 32(rsp) SS
> + *
> + * Upon execution :
> + * Copy EIP to the top of the return stack
> + * Update top of return stack address
> + * Pop eflags into the eflags register
> + * Make the return stack current
> + * Near return (popping the return address from the return stack)
> + */
> +#define INTERRUPT_RETURN_NMI_SAFE	pushq %rax;		\
> +					pushq %rbx;		\
> +					movq 40(%rsp), %rax;	\
> +					movq 16(%rsp), %rbx;	\
>   

Use X+16(%rsp) notation here, so that the offsets correspond to the 
comment above.

> +					subq $8, %rax;		\
> +					movq %rbx, (%rax);	\
> +					movq %rax, 40(%rsp);	\
> +					popq %rbx;		\
> +					popq %rax;		\
> +					addq $16, %rsp;		\
> +					popfq;			\
> +					movq (%rsp), %rsp;	\
> +					ret;			\
>   

How about something like

	pushq	%rax
	mov	%rsp, %rax		/* old stack */
	mov	24+8(%rax), %rsp	/* switch stacks */
	pushq	 0+8(%rax)		/* push return rip */
	pushq	16+8(%rax)		/* push return rflags */
	movq	(%rax), %rax		/* restore %rax */
	popfq				/* restore flags */
	ret				/* restore rip */


> +
>  #define ENABLE_INTERRUPTS_SYSCALL_RET			\
>  			movq	%gs:pda_oldrsp, %rsp;	\
>  			swapgs;				\
>  			sysretq;
>  #else
>  #define INTERRUPT_RETURN		iret
> +
> +/*
> + * Protected mode only, no V8086. Implies that protected mode must
> + * be entered before NMIs or MCEs are enabled. Only returns from a trap or
> + * exception to a NMI context (intra-privilege level far return). Should be used
> + * upon trap or exception return when nested over a NMI context so no iret is
> + * issued.
> + *
> + * The stack, at that point, looks like :
> + *
> + * 0(esp) EIP
> + * 4(esp) CS
> + * 8(esp) EFLAGS
> + *
> + * Upon execution :
> + * Copy the stack eflags to top of stack
> + * Pop eflags into the eflags register
> + * Far return: pop EIP and CS into their register, and additionally pop EFLAGS.
> + */
> +#define INTERRUPT_RETURN_NMI_SAFE	pushl 8(%esp);	\
> +					popfl;		\
> +					.byte 0xCA;	\
> +					.word 4;
>   

Why not "lret $4"?

    J

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
  2008-04-16 17:57         ` Jeremy Fitzhardinge
@ 2008-04-17 16:29           ` Mathieu Desnoyers
  2008-04-17 16:45             ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-17 16:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Andi Kleen, akpm, H. Peter Anvin, Steven Rostedt,
	Frank Ch. Eigler, linux-kernel

* Jeremy Fitzhardinge (jeremy@goop.org) wrote:
> Mathieu Desnoyers wrote:
>> "This way lies madness. Don't go there."
>>   
>
> It is a large amount of... stuff.  This immediate values thing makes a big 
> improvement then?
>

As Ingo said: the NMI-safe traps and exceptions are useful not only to
immediate values, but also to oprofile. On top of that, the LTTng kernel
tracer has to write into vmalloc'd memory, so it's required there too.

For the immediate values, I doubt kernel developers will like the idea
of adding Kernel Markers in their code if it implies an extra data cache
hit for every added marker. That's the problem immediate values solve.
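A rough userspace illustration of the cost in question (a sketch, not the kernel implementation; all names here are made up): a plain marker tests an enable flag stored in memory, so every marker site pays a data access even when tracing is disabled, while an immediate value bakes the flag into the instruction stream as an immediate operand and patches it at runtime, which is why NMI-safe int3 handling matters.

```c
#include <assert.h>

/* Hypothetical plain-marker site: the enable flag lives in memory,
 * so this costs a data load on every call even when disabled.
 * An immediate value would instead bake the flag into the compare
 * instruction as an immediate operand and flip it with int3-based
 * code patching, touching no data at all on the disabled path. */
static int marker_enabled;
static int trace_hits;

static void marked_function(void)
{
	if (marker_enabled)	/* data-cache access happens here */
		trace_hits++;
}
```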

>> Index: linux-2.6-lttng/include/linux/hardirq.h
>> ===================================================================
>> --- linux-2.6-lttng.orig/include/linux/hardirq.h	2008-04-16 
>> 11:25:18.000000000 -0400
>> +++ linux-2.6-lttng/include/linux/hardirq.h	2008-04-16 11:29:30.000000000 
>> -0400
>> @@ -22,10 +22,13 @@
>>   * PREEMPT_MASK: 0x000000ff
>>   * SOFTIRQ_MASK: 0x0000ff00
>>   * HARDIRQ_MASK: 0x0fff0000
>> + * HARDNMI_MASK: 0x40000000
>>   */
>>  #define PREEMPT_BITS	8
>>  #define SOFTIRQ_BITS	8
>>  +#define HARDNMI_BITS	1
>> +
>>  #ifndef HARDIRQ_BITS
>>  #define HARDIRQ_BITS	12
>>  @@ -45,16 +48,19 @@
>>  #define PREEMPT_SHIFT	0
>>  #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
>>  #define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
>> +#define HARDNMI_SHIFT	(30)
>>   
>
> Why at 30, rather than HARDIRQ_SHIFT+HARDIRQ_BITS?
>

Because bit 28 (HARDIRQ_SHIFT + HARDIRQ_BITS) is used for the
PREEMPT_ACTIVE flag. I tried to pick the least-used bit across the
various architectures.
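With the shift at 30, the new mask overlaps neither HARDIRQ_MASK nor PREEMPT_ACTIVE; this can be checked in userspace with the constants from the quoted hunk (PREEMPT_ACTIVE here is assumed to be the common x86 value, bit 28):

```c
#include <assert.h>

/* Constants copied from the quoted hardirq.h hunk; PREEMPT_ACTIVE
 * is an assumption (the x86 value, bit 28), as discussed above. */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	12
#define HARDNMI_BITS	1

#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define HARDNMI_SHIFT	(30)
#define PREEMPT_ACTIVE	0x10000000UL

#define __IRQ_MASK(x)	((1UL << (x))-1)
#define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
#define HARDNMI_MASK	(__IRQ_MASK(HARDNMI_BITS) << HARDNMI_SHIFT)
```

Bit 28 (0x10000000) stays free for PREEMPT_ACTIVE, which is why HARDNMI_SHIFT cannot simply be HARDIRQ_SHIFT + HARDIRQ_BITS.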

>>   #define __IRQ_MASK(x)	((1UL << (x))-1)
>>   #define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
>>  #define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
>>  #define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
>> +#define HARDNMI_MASK	(__IRQ_MASK(HARDNMI_BITS) << HARDNMI_SHIFT)
>>   #define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
>>  #define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
>>  #define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
>> +#define HARDNMI_OFFSET	(1UL << HARDNMI_SHIFT)
>>   #if PREEMPT_ACTIVE < (1 << (HARDIRQ_SHIFT + HARDIRQ_BITS))
>>  #error PREEMPT_ACTIVE is too low!
>> @@ -63,6 +69,7 @@
>>  #define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
>>  #define softirq_count()	(preempt_count() & SOFTIRQ_MASK)
>>  #define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
>> +#define hardnmi_count()	(preempt_count() & HARDNMI_MASK)
>>   /*
>>   * Are we doing bottom half or hardware interrupt processing?
>> @@ -71,6 +78,7 @@
>>  #define in_irq()		(hardirq_count())
>>  #define in_softirq()		(softirq_count())
>>  #define in_interrupt()		(irq_count())
>> +#define in_nmi()		(hardnmi_count())
>>   /*
>>   * Are we running in atomic context?  WARNING: this macro cannot
>> @@ -159,7 +167,19 @@ extern void irq_enter(void);
>>   */
>>  extern void irq_exit(void);
>>  -#define nmi_enter()		do { lockdep_off(); __irq_enter(); } while (0)
>> -#define nmi_exit()		do { __irq_exit(); lockdep_on(); } while (0)
>> +#define nmi_enter()					\
>> +	do {						\
>> +		lockdep_off();				\
>> +		BUG_ON(hardnmi_count());		\
>> +		add_preempt_count(HARDNMI_OFFSET);	\
>> +		__irq_enter();				\
>> +	} while (0)
>> +
>> +#define nmi_exit()					\
>> +	do {						\
>> +		__irq_exit();				\
>> +		sub_preempt_count(HARDNMI_OFFSET);	\
>> +		lockdep_on();				\
>> +	} while (0)
>>   #endif /* LINUX_HARDIRQ_H */
>> Index: linux-2.6-lttng/arch/x86/kernel/entry_32.S
>> ===================================================================
>> --- linux-2.6-lttng.orig/arch/x86/kernel/entry_32.S	2008-04-16 
>> 11:25:18.000000000 -0400
>> +++ linux-2.6-lttng/arch/x86/kernel/entry_32.S	2008-04-16 
>> 12:06:30.000000000 -0400
>> @@ -79,7 +79,6 @@ VM_MASK		= 0x00020000
>>  #define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); 
>> TRACE_IRQS_OFF
>>  #else
>>  #define preempt_stop(clobbers)
>> -#define resume_kernel		restore_nocheck
>>  #endif
>>   .macro TRACE_IRQS_IRET
>> @@ -265,6 +264,8 @@ END(ret_from_exception)
>>  #ifdef CONFIG_PREEMPT
>>  ENTRY(resume_kernel)
>>  	DISABLE_INTERRUPTS(CLBR_ANY)
>> +	testl $0x40000000,TI_preempt_count(%ebp)	# nested over NMI ?
>> +	jnz return_to_nmi
>>  	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
>>  	jnz restore_nocheck
>>  need_resched:
>> @@ -276,6 +277,12 @@ need_resched:
>>  	call preempt_schedule_irq
>>  	jmp need_resched
>>  END(resume_kernel)
>> +#else
>> +ENTRY(resume_kernel)
>> +	testl $0x40000000,TI_preempt_count(%ebp)	# nested over NMI ?
>>   
>
> HARDNMI_MASK?
>

Will fix.

>> +	jnz return_to_nmi
>> +	jmp restore_nocheck
>> +END(resume_kernel)
>>  #endif
>>  	CFI_ENDPROC
>>  @@ -411,6 +418,22 @@ restore_nocheck_notrace:
>>  	CFI_ADJUST_CFA_OFFSET -4
>>  irq_return:
>>  	INTERRUPT_RETURN
>> +return_to_nmi:
>> +	testl $X86_EFLAGS_TF, PT_EFLAGS(%esp)
>> +	jnz restore_nocheck		/*
>> +					 * If single-stepping an NMI handler,
>> +					 * use the normal iret path instead of
>> +					 * the popf/lret because lret would be
>> +					 * single-stepped. It should not
>> +					 * happen : it will reactivate NMIs
>> +					 * prematurely.
>> +					 */
>> +	TRACE_IRQS_IRET
>> +	RESTORE_REGS
>> +	addl $4, %esp			# skip orig_eax/error_code
>> +	CFI_ADJUST_CFA_OFFSET -4
>> +	INTERRUPT_RETURN_NMI_SAFE
>> +
>>  .section .fixup,"ax"
>>  iret_exc:
>>  	pushl $0			# no error code
>> Index: linux-2.6-lttng/arch/x86/kernel/entry_64.S
>> ===================================================================
>> --- linux-2.6-lttng.orig/arch/x86/kernel/entry_64.S	2008-04-16 
>> 11:25:18.000000000 -0400
>> +++ linux-2.6-lttng/arch/x86/kernel/entry_64.S	2008-04-16 
>> 12:06:31.000000000 -0400
>> @@ -581,12 +581,27 @@ retint_restore_args:	/* return to kernel
>>  	 * The iretq could re-enable interrupts:
>>  	 */
>>  	TRACE_IRQS_IRETQ
>> +	testl $0x40000000,threadinfo_preempt_count(%rcx) /* Nested over NMI ? */
>>   
>
> HARDNMI_MASK?  (ditto below)
>

Will fix too.

>> +	jnz return_to_nmi
>>  restore_args:
>>  	RESTORE_ARGS 0,8,0
>>   irq_return:
>>  	INTERRUPT_RETURN
>>  +return_to_nmi:				/*
>> +					 * If single-stepping an NMI handler,
>> +					 * use the normal iret path instead of
>> +					 * the popf/lret because lret would be
>> +					 * single-stepped. It should not
>> +					 * happen : it will reactivate NMIs
>> +					 * prematurely.
>> +					 */
>> +	bt $8,EFLAGS-ARGOFFSET(%rsp)	/* trap flag? */
>>   
>
> test[bwl] is a bit more usual; then you can use X86_EFLAGS_TF.
>

OK. I tried to follow the coding style in entry_64.S, but I agree test
is more elegant. testw should be fine for 0x00000100.
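For the record, both encodings test the same flag bit; a quick userspace check (the constant is the standard x86 EFLAGS.TF value):

```c
#include <assert.h>

#define X86_EFLAGS_TF 0x00000100UL	/* trap flag, bit 8 of EFLAGS */

/* "bt $8, flags" examines bit 8 of the flags word, while
 * "testw $X86_EFLAGS_TF" masks with 0x100; both pick out TF. */
static int tf_via_bt(unsigned long flags)
{
	return (flags >> 8) & 1;
}

static int tf_via_testw(unsigned long flags)
{
	return (flags & X86_EFLAGS_TF) != 0;
}
```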

>> +	jc restore_args
>> +	RESTORE_ARGS 0,8,0
>> +	INTERRUPT_RETURN_NMI_SAFE
>> +
>>  	.section __ex_table, "a"
>>  	.quad irq_return, bad_iret
>>  	.previous
>> @@ -802,6 +817,10 @@ END(spurious_interrupt)
>>  	.macro paranoidexit trace=1
>>  	/* ebx:	no swapgs flag */
>>  paranoid_exit\trace:
>> +	GET_THREAD_INFO(%rcx)
>> +	testl $0x40000000,threadinfo_preempt_count(%rcx) /* Nested over NMI ? */
>> +	jnz paranoid_return_to_nmi\trace
>> +paranoid_exit_no_nmi\trace:
>>  	testl %ebx,%ebx				/* swapgs needed? */
>>  	jnz paranoid_restore\trace
>>  	testl $3,CS(%rsp)
>> @@ -814,6 +833,18 @@ paranoid_swapgs\trace:
>>  paranoid_restore\trace:
>>  	RESTORE_ALL 8
>>  	jmp irq_return
>> +paranoid_return_to_nmi\trace:		/*
>> +					 * If single-stepping an NMI handler,
>> +					 * use the normal iret path instead of
>> +					 * the popf/lret because lret would be
>> +					 * single-stepped. It should not
>> +					 * happen : it will reactivate NMIs
>> +					 * prematurely.
>> +					 */
>> +	bt $8,EFLAGS-0(%rsp)		/* trap flag? */
>> +	jc paranoid_exit_no_nmi\trace
>> +	RESTORE_ALL 8
>> +	INTERRUPT_RETURN_NMI_SAFE
>>   
>
> Does this need to be repeated verbatim?
>

The comment : no, will fix. The code : yes, because the ARGOFFSET is
different.

>>  paranoid_userspace\trace:
>>  	GET_THREAD_INFO(%rcx)
>>  	movl threadinfo_flags(%rcx),%ebx
>> Index: linux-2.6-lttng/include/asm-x86/irqflags.h
>> ===================================================================
>> --- linux-2.6-lttng.orig/include/asm-x86/irqflags.h	2008-04-16 
>> 11:25:18.000000000 -0400
>> +++ linux-2.6-lttng/include/asm-x86/irqflags.h	2008-04-16 
>> 11:29:30.000000000 -0400
>> @@ -138,12 +138,73 @@ static inline unsigned long __raw_local_
>>   #ifdef CONFIG_X86_64
>>  #define INTERRUPT_RETURN	iretq
>> +
>> +/*
>> + * Only returns from a trap or exception to a NMI context 
>> (intra-privilege
>> + * level near return) to the same SS and CS segments. Should be used
>> + * upon trap or exception return when nested over a NMI context so no 
>> iret is
>> + * issued. It takes care of modifying the eflags, rsp and returning to 
>> the
>> + * previous function.
>> + *
>> + * The stack, at that point, looks like :
>> + *
>> + * 0(rsp)  RIP
>> + * 8(rsp)  CS
>> + * 16(rsp) EFLAGS
>> + * 24(rsp) RSP
>> + * 32(rsp) SS
>> + *
>> + * Upon execution :
>> + * Copy EIP to the top of the return stack
>> + * Update top of return stack address
>> + * Pop eflags into the eflags register
>> + * Make the return stack current
>> + * Near return (popping the return address from the return stack)
>> + */
>> +#define INTERRUPT_RETURN_NMI_SAFE	pushq %rax;		\
>> +					pushq %rbx;		\
>> +					movq 40(%rsp), %rax;	\
>> +					movq 16(%rsp), %rbx;	\
>>   
>
> Use X+16(%rsp) notation here, so that the offsets correspond to the comment 
> above.
>

Ok.

>> +					subq $8, %rax;		\
>> +					movq %rbx, (%rax);	\
>> +					movq %rax, 40(%rsp);	\
>> +					popq %rbx;		\
>> +					popq %rax;		\
>> +					addq $16, %rsp;		\
>> +					popfq;			\
>> +					movq (%rsp), %rsp;	\
>> +					ret;			\
>>   
>
> How about something like
>
> 	pushq	%rax
> 	mov	%rsp, %rax		/* old stack */
> 	mov	24+8(%rax), %rsp	/* switch stacks */
> 	pushq	 0+8(%rax)		/* push return rip */
> 	pushq	16+8(%rax)		/* push return rflags */
> 	movq	(%rax), %rax		/* restore %rax */
> 	popfq				/* restore flags */
> 	ret				/* restore rip */
>

Yup, it looks nice, thanks.
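To see that the suggested offsets match the frame layout quoted above (0: RIP, 8: CS, 16: EFLAGS, 24: RSP, 32: SS), the sequence can be modeled in plain C, with array elements standing in for 8-byte stack slots (an illustration only, not kernel code; the frame values are made up):

```c
#include <stdint.h>

/* Each array element is one 8-byte stack slot and "push" decrements
 * the index, so the byte offsets 24+8, 0+8 and 16+8 in the asm
 * become slot indices a+4, a+1 and a+3 after the initial push. */
static uint64_t mem[64];
static uint64_t rip_out, rflags_out, rsp_out, rax_out;

static void model_return(void)
{
	uint64_t rsp = 32, rax = 0xaaaa, a;

	mem[32] = 0x1111;	/* RIP */
	mem[33] = 0x10;		/* CS (unused by the near return) */
	mem[34] = 0x246;	/* EFLAGS */
	mem[35] = 48;		/* RSP: the interrupted NMI stack */
	mem[36] = 0x18;		/* SS (unused by the near return) */

	mem[--rsp] = rax;	 /* pushq %rax */
	a = rsp;		 /* mov %rsp, %rax: old stack */
	rsp = mem[a + 4];	 /* mov 24+8(%rax), %rsp: switch stacks */
	mem[--rsp] = mem[a + 1]; /* pushq 0+8(%rax): return rip */
	mem[--rsp] = mem[a + 3]; /* pushq 16+8(%rax): return rflags */
	rax = mem[a];		 /* movq (%rax), %rax: restore %rax */
	rflags_out = mem[rsp++]; /* popfq: restore flags */
	rip_out = mem[rsp++];	 /* ret: pop the return rip */
	rsp_out = rsp;
	rax_out = rax;
}
```

After the sequence, the model ends up on the NMI stack with RIP, RFLAGS and RAX restored, which is exactly what the popf/ret pair is meant to achieve without an iret.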

>
>> +
>>  #define ENABLE_INTERRUPTS_SYSCALL_RET			\
>>  			movq	%gs:pda_oldrsp, %rsp;	\
>>  			swapgs;				\
>>  			sysretq;
>>  #else
>>  #define INTERRUPT_RETURN		iret
>> +
>> +/*
>> + * Protected mode only, no V8086. Implies that protected mode must
>> + * be entered before NMIs or MCEs are enabled. Only returns from a trap 
>> or
>> + * exception to a NMI context (intra-privilege level far return). Should 
>> be used
>> + * upon trap or exception return when nested over a NMI context so no 
>> iret is
>> + * issued.
>> + *
>> + * The stack, at that point, looks like :
>> + *
>> + * 0(esp) EIP
>> + * 4(esp) CS
>> + * 8(esp) EFLAGS
>> + *
>> + * Upon execution :
>> + * Copy the stack eflags to top of stack
>> + * Pop eflags into the eflags register
>> + * Far return: pop EIP and CS into their register, and additionally pop 
>> EFLAGS.
>> + */
>> +#define INTERRUPT_RETURN_NMI_SAFE	pushl 8(%esp);	\
>> +					popfl;		\
>> +					.byte 0xCA;	\
>> +					.word 4;
>>   
>
> Why not "lret $4"?
>

Because my punch card reader is broken ;) Will fix.

I'll repost a new version including these changes. We will have to
figure out a way to support PARAVIRT better than I currently do: a
PARAVIRT kernel running on bare metal currently still uses iret. :(

Thanks for the review,

Mathieu
>    J
>

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
  2008-04-17 16:29           ` Mathieu Desnoyers
@ 2008-04-17 16:45             ` Andi Kleen
  2008-04-18  0:05               ` Mathieu Desnoyers
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2008-04-17 16:45 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jeremy Fitzhardinge, Ingo Molnar, akpm, H. Peter Anvin,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel

Mathieu Desnoyers wrote:
> * Jeremy Fitzhardinge (jeremy@goop.org) wrote:
>> Mathieu Desnoyers wrote:
>>> "This way lies madness. Don't go there."
>>>   
>> It is a large amount of... stuff.  This immediate values thing makes a big 
>> improvement then?
>>
> 
> As ingo said : the nmi-safe traps and exception is not only usefu lto
> immediate values, but also to oprofile. 

How is it useful to oprofile?

> On top of that, the LTTng kernel
> tracer has to write into vmalloc'd memory, so it's required there too.

All this effort changing really critical (and also fragile) code paths
used all the time is to handle setting markers in NMI functions, or
actually the special case of markers in there that access vmalloc'd
memory without calling vmalloc_sync().

NMIs are maybe 5-6 functions all over the kernel.

I just don't think it makes any sense to put markers in there.
It is a really small part of the kernel that is unlikely to be really
useful for anybody. You should rather first solve the problem of
tracing the other 99.999999% of the kernel properly.

And then you could actually set the markers in there if you're
crazy enough, just call vmalloc_sync().

Mathieu argued earlier that markers should be set everywhere but
that is also bogus because there is enough other code where
you cannot set them either (one example would be early boot code[1])

And to do anything in NMI context you cannot use any locks, so you would
have to write all data structures used by the markers lock-free. I did
that for the new MCE code, but it's a really painful and bug-prone
experience that I cannot really recommend to anybody.

And then NMIs (and machine checks) are a really obscure case, very
rarely used.

I think the right way is just to say that you cannot set markers
in NMI and machine check handlers. Even with this patch it is highly
unlikely the resulting code will be correct anyway. Actually you could
probably set them without the patch with some effort (like calling
vmalloc_sync), but for the basic reasons mentioned above (lock-free
code is really hard, NMI-type functions are less than a hundred lines
in the millions of kernel LOCs) it is just a very, very bad idea.

-Andi


[1] Now that I mentioned it I still have enough faith to assume nobody
will be crazy enough to come up with some horrible hack to set markers
in early boot code too. But after seeing this patchkit ending up in a
git tree I'm not sure.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
  2008-04-17 16:45             ` Andi Kleen
@ 2008-04-18  0:05               ` Mathieu Desnoyers
  2008-04-18  8:27                 ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-18  0:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, Ingo Molnar, akpm, H. Peter Anvin,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel

* Andi Kleen (andi@firstfloor.org) wrote:
> Mathieu Desnoyers wrote:
> > * Jeremy Fitzhardinge (jeremy@goop.org) wrote:
> >> Mathieu Desnoyers wrote:
> >>> "This way lies madness. Don't go there."
> >>>   
> >> It is a large amount of... stuff.  This immediate values thing makes a big 
> >> improvement then?
> >>
> > 
> > As ingo said : the nmi-safe traps and exception is not only usefu lto
> > immediate values, but also to oprofile. 
> 
> How is it useful to oprofile?
> 

oprofile hooks into the NMI callbacks:

arch/x86/oprofile/nmi_timer_int.c: profile_timer_exceptions_notify()
calls
drivers/oprofile/oprofile_add_sample()
which calls oprofile_add_ext_sample()
where
       if (log_sample(cpu_buf, pc, is_kernel, event))
                oprofile_ops.backtrace(regs, backtrace_depth);

First, log_sample writes into the vmalloc'd cpu buffer. That's one
possible page fault.

Then, if a kernel backtrace happens, I am not sure that printk_address
won't try to read some of the module data, which is vmalloc'd.


> > On top of that, the LTTng kernel
> > tracer has to write into vmalloc'd memory, so it's required there too.
> 
> All this effort changing really critical (and also fragile) code paths
> used all the time is to handle setting markers into NMI functions. Or
> actually the special case of setting markers in there that access
> vmalloc() without calling vmalloc_sync().
> 

Isn't vmalloc_sync() an expensive operation? That would imply doing a
vmalloc_sync() after loading modules and after each buffer allocation, I
suppose. And it would also be needed to be able to put a breakpoint
there, for the immediate values.

> NMI are maybe 5-6 functions all over the kernel.
> 
> I just don't think it makes any sense to put markers in there.
> It is a really small part of the kernel the kernel that is unlikely
> to be really useful for anybody. You should rather first solve the
> problem of tracing the other 99.999999% of the kernel properly.
> 

The fact is that NMIs are very useful and powerful when it comes to
understanding where code that disables interrupts is stuck, or to
getting periodic performance counter reads without suffering from IRQ
latency. Also, when trying to figure out what is actually happening in
the kernel timekeeping, having a stable periodic time source can be
pretty useful. Hooking this kind of feature into a tracer seems rather
logical.


> And then you could actually set the markers in there if you're
> crazy enough, just call vmalloc_sync().
> 

That would be one way to do it, except that it would not deal with int3.
Also, it would have to be taken into account at module load time. To me,
that looks like an error-prone design. If the problem is at the lower
end of the architecture, in the interrupt return path, why don't we
simply fix it there for good?

> Mathieu argued earlier that markers should be set everywhere but
> that is also bogus because there is enough other code where
> you cannot set them either (one example would be early boot code[1])
> 

Hmmm? :) There is no "init" function in marker.c. It depends on the RCU
mechanism though, so I guess we can instrument start_kernel only after
rcu_init(). And yes, boot code is one of the first things embedded system
developers want to instrument.

> And to do anything in NMI context you cannot use any locks so you would
> have to write all data structures used by the markers lock less. I did
> that for the the new mce code, but it's a really painful and bug prone
> experience that I cannot really recommend to anybody.
> 

LTTng is a lockless tracer which uses the RCU mechanism for control data
structure updates and a lockless cmpxchg_local scheme to manage the
per-cpu buffer space reservation. It has been out there for about 3
years now and is used in the industry.
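The space-reservation part can be sketched in userspace with a compare-and-swap loop (a simplification of the LTTng scheme: a single flat buffer, no sub-buffer switching, and GCC __atomic builtins standing in for cmpxchg_local):

```c
#include <stdint.h>

/* Per-cpu write offset; in the kernel this would be updated with
 * cmpxchg_local, which is NMI-safe because an NMI that interrupts
 * the loop either completes its own reservation or is retried. */
static uint64_t write_offset;
#define BUF_SIZE 4096

/* Reserve len bytes; returns the start offset, or -1 when full. */
static int64_t reserve(uint64_t len)
{
	uint64_t old, new;

	do {
		old = __atomic_load_n(&write_offset, __ATOMIC_RELAXED);
		new = old + len;
		if (new > BUF_SIZE)
			return -1;	/* buffer full: drop the event */
	} while (!__atomic_compare_exchange_n(&write_offset, &old, new,
					      0, __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
	return (int64_t)old;
}
```

Each writer (or nesting context) gets a disjoint [old, old+len) region, so no lock is ever taken on the fast path.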

> And then NMIs (and machine checks) are a really obscure case, very
> rarely used.
> 

I wonder if they are used so rarely because the underlying kernel is
buggy with respect to NMIs or because they are useless.

> I think the right way is just to say that you cannot set markers
> into NMI and machine check. Even with this patch it is highly unlikely
> the resulting code will be correct anyways. Actually you could probably
> set them without the patch with some effort (like calling vmalloc_sync),
> but for the basic reasons mentioned above (lock less code is really
> hard, nmi type functions are less than hundred lines in the millions
> of kernel LOCs) it is just a very very bad idea.
> 

You should have a look at LTTng then. ;) And by the way, the kernel
marker infrastructure also uses RCU-style updates and is designed to be
NMI-safe from the start.

Mathieu


> -Andi
> 
> 
> [1] Now that I mentioned it I still have enough faith to assume nobody
> will be crazy enough to come up with some horrible hack to set markers
> in early boot code too. But after seeing this patchkit ending up in a
> git tree I'm not sure.
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2)
  2008-04-16 16:03       ` Jeremy Fitzhardinge
@ 2008-04-18  0:48         ` Mathieu Desnoyers
  2008-04-18  9:49           ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-18  0:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Andi Kleen, akpm, H. Peter Anvin, Steven Rostedt,
	Frank Ch. Eigler, linux-kernel, Rusty Russell, Zachary Amsden

* Jeremy Fitzhardinge (jeremy@goop.org) wrote:
> Ingo Molnar wrote:
>> * Ingo Molnar <mingo@elte.hu> wrote:
>>
>>   
>>> thanks Mathieu, i've picked this up into x86.git for more testing.
>>>     
>>
>> ... but had to drop it due to missing PARAVIRT support which broke the 
>> build. I guess on paravirt we could just initially define 
>> INTERRUPT_RETURN_NMI_SAFE to iret, etc.?
>
> I have not yet implemented Xen's support for paravirtual NMI, so there's no 
> scope for breaking anything from my perspective.  When I get around to NMI, 
> I'll work around whatever's there.  I don't know if lguest or VMI has any 
> guest NMI support.
>
>    J
>

I wonder if we could simply paravirtualize the popf instruction, which
seems to be the only one that needs to run in ring 0.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
  2008-04-18  0:05               ` Mathieu Desnoyers
@ 2008-04-18  8:27                 ` Andi Kleen
  2008-04-19 21:18                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2008-04-18  8:27 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jeremy Fitzhardinge, Ingo Molnar, akpm, H. Peter Anvin,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel


> arch/x86/oprofile/nmi_timer_int.c: profile_timer_exceptions_notify()
> calls
> drivers/oprofile/oprofile_add_sample()
> which calls oprofile_add_ext_sample()
> where
>        if (log_sample(cpu_buf, pc, is_kernel, event))
>                 oprofile_ops.backtrace(regs, backtrace_depth);

A red herring: the notifier setup calls vmalloc_sync_all(), and oprofile
allocates its buffers before registering the notifier.

> First, log_sample writes into the vmalloc'd cpu buffer. That's for one
> possible page fault.

 
> Then, is a kernel backtrace happen, then I am not sure if printk_address
> won't try to read any of the module data, which is vmalloc'd.

Yes, admittedly the backtrace mode was always somewhat flakey. It probably
has more problems too.

The right fix for that is to call vmalloc_sync_all() after module load
whenever any NMI notifiers are registered.
 
> 
> 
>> NMI are maybe 5-6 functions all over the kernel.
>>
>> I just don't think it makes any sense to put markers in there.
>> It is a really small part of the kernel the kernel that is unlikely
>> to be really useful for anybody. You should rather first solve the
>> problem of tracing the other 99.999999% of the kernel properly.
>>
> 
> The fact is that NMIs are very useful and powerful when it comes to try
> to understand where code disabling interrupts is stucked, to get
> performance counter reads periodically

First, there are no truly periodic (as in time) NMIs. The NMI watchdog
is not really periodic; it is delayed arbitrarily whenever the CPU is in
sleep states.

Then oprofile already does what you describe. Why do we need
another questionable infrastructure to reimplement what is
already there?

> without suffering from IRQ
> latency

Just from all kinds of other latency caused by non-ticking performance
counters.

> Also, when trying to figure out what is actually happening in
> the kernel timekeeping, having a stable periodic time source can be
> pretty useful. 

Haha. You seem to be so deep into nonsense land, it is hard to comprehend.

> That would be one way to do it, except that it would not deal with int3.
> Also, it would have to be taken into account at module load time. To me,
> that looks like an error-prone design. If the problem is at the lower
> end of the architecture, in the interrupt return path, why don't we
> simply fix it there for good ?

There are all kinds of problems with NMIs; this is only one of them.
And NMIs are a really, really obscure case.

Frankly, if you spend all your time on fringe cases like this instead
of getting it to work on the 99.99999999999999% case, it doesn't
surprise me that the markers haven't made any progress for years now.

> And yes, boot code is one of the first things embedded system
> developers want to instrument.

Crap. That code runs once. The only interest is correctness, and
if it's not correct you just step through it with a JTAG debugger.

> I wonder if they are used so rarely because the underlying kernel is
> buggy with respect with NMIs or because they are useless.

Lockless programming is just really hard, and not doing it is in most
cases the sanest option.

Anyways I give up. Do what you want.

-Andi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2)
  2008-04-18  0:48         ` Mathieu Desnoyers
@ 2008-04-18  9:49           ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 19+ messages in thread
From: Jeremy Fitzhardinge @ 2008-04-18  9:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Andi Kleen, akpm, H. Peter Anvin, Steven Rostedt,
	Frank Ch. Eigler, linux-kernel, Rusty Russell, Zachary Amsden

Mathieu Desnoyers wrote:
> * Jeremy Fitzhardinge (jeremy@goop.org) wrote:
>   
>> Ingo Molnar wrote:
>>     
>>> * Ingo Molnar <mingo@elte.hu> wrote:
>>>
>>>   
>>>       
>>>> thanks Mathieu, i've picked this up into x86.git for more testing.
>>>>     
>>>>         
>>> ... but had to drop it due to missing PARAVIRT support which broke the 
>>> build. I guess on paravirt we could just initially define 
>>> INTERRUPT_RETURN_NMI_SAFE to iret, etc.?
>>>       
>> I have not yet implemented Xen's support for paravirtual NMI, so there's no 
>> scope for breaking anything from my perspective.  When I get around to NMI, 
>> I'll work around whatever's there.  I don't know if lguest or VMI has any 
>> guest NMI support.
>>
>>    J
>>
>>     
>
> I wonder if we could simply paravirtualize the popf instruction, which
> seems to be the only one requiring to run in ring 0.

Hm, I'd need to think about it more.  There's more to NMIs than just
the popf.

    J


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
  2008-04-18  8:27                 ` Andi Kleen
@ 2008-04-19 21:18                   ` Mathieu Desnoyers
  2008-04-20 12:58                     ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Mathieu Desnoyers @ 2008-04-19 21:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, Ingo Molnar, akpm, H. Peter Anvin,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel

* Andi Kleen (andi@firstfloor.org) wrote:
> 
> > arch/x86/oprofile/nmi_timer_int.c: profile_timer_exceptions_notify()
> > calls
> > drivers/oprofile/oprofile_add_sample()
> > which calls oprofile_add_ext_sample()
> > where
> >        if (log_sample(cpu_buf, pc, is_kernel, event))
> >                 oprofile_ops.backtrace(regs, backtrace_depth);
> 
> A red hering: The notifier setup calls vmalloc_sync_all() and oprofile
> allocates its buffers before registering the notifier.
> 

Ah, yes, you are right on this one, it was well hidden. :)

> > First, log_sample writes into the vmalloc'd cpu buffer. That's for one
> > possible page fault.
> 
>  
> > Then, is a kernel backtrace happen, then I am not sure if printk_address
> > won't try to read any of the module data, which is vmalloc'd.
> 
> Yes, admittedly the backtrace mode was always somewhat flakey. It probably
> has more problems too.
> 
> The right fix for that is to call vmalloc_sync_all() after module load
> when any nmi notifiers are registered.

I guess it would work, but it certainly looks like a patchy workaround.

>  
> > 
> > 
> >> NMI are maybe 5-6 functions all over the kernel.
> >>
> >> I just don't think it makes any sense to put markers in there.
> >> It is a really small part of the kernel the kernel that is unlikely
> >> to be really useful for anybody. You should rather first solve the
> >> problem of tracing the other 99.999999% of the kernel properly.
> >>
> > 
> > The fact is that NMIs are very useful and powerful when it comes to try
> > to understand where code disabling interrupts is stucked, to get
> > performance counter reads periodically
> 
> First there are no truly periodic (as in time) NMIs. The NMI watchdog 
> is not really periodic but is delayed arbitrarily all the time when the CPU 
> is in sleep states.
> 

There is no such thing as "perfect" periodicity, just better and
worse periodicity. NMIs just tend to have much less jitter.

> Then oprofile does this already what you describe. Why do we need
> another questionable infrastructure to reimplement what is 
> already there? 
> 

I don't like to duplicate work. I would just like to dump performance
counters in LTTng trace buffers at specific points. I guess building on
top of oprofile would be a good way to do it.

>  without suffering from IRQ
> > latency
> 
> Just from all kind of other latency caused by non ticking performance
> counters.
> 
> . Also, when trying to figure out what is actually happening in
> > the kernel timekeeping, having a stable periodic time source can be
> > pretty useful. 
> 
> Haha. You seem to be so deep into nonsense land, it is hard to comprehend.
> 
> > That would be one way to do it, except that it would not deal with int3.
> > Also, it would have to be taken into account at module load time. To me,
> > that looks like an error-prone design. If the problem is at the lower
> > end of the architecture, in the interrupt return path, why don't we
> > simply fix it there for good?
> 
> There are all kinds of problems with NMIs; this is only one of them.
> And NMIs are a really, really obscure case.
> 

Which other problems ? I am listening.

> Frankly, if you spend all your time on fringe cases like this instead
> of getting it to work on the 99.99999999999999% case it doesn't
> surprise me that the markers don't make any progress for years now.
> 

One thing is that I really don't want to add fragility to a traced
kernel. A tracer that would make the kernel more fragile is the last
thing I want. Therefore, I make sure the tracer provides good reentrancy
so it can be called from virtually any kernel context and I also make
sure it stays in its own sandbox as much as possible, using atomic
operations to update its data structures/buffers.

>  And yes, boot code is one of the first things embedded system
> > developers want to instrument.
> 
> Crap. That code runs once. The only interest is correctness and 
> if it's not correct you just step it through with a JTAG debugger.
> 

Looking at these links shows that some embedded developers are
interested in speeding up Linux boot time, and that is a task a kernel
tracer is very good at:

http://www.linuxdevices.com/news/NS5907201615.html
http://elinux.org/Boot_Time

> > I wonder if they are used so rarely because the underlying kernel is
> > buggy with respect to NMIs or because they are useless.
> 
> lockless programming is just really hard and not doing it is in most 
> cases the sanest option.
> 

I think kernel tracing is an exception: a kernel tracer should be
designed not to use any sort of lock, to have as little dependency on
the rest of the kernel as possible, and to update its own data
structures atomically. That ensures the tracer can be called from
virtually anywhere without having to worry about side-effects. Given
that LTTng users have been happy with it for the past 2.5 years, I tend
to think I was right.

Mathieu

> Anyways I give up. Do what you want.
> 
> -Andi
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3)
  2008-04-19 21:18                   ` Mathieu Desnoyers
@ 2008-04-20 12:58                     ` Andi Kleen
  0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2008-04-20 12:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jeremy Fitzhardinge, Ingo Molnar, akpm, H. Peter Anvin,
	Steven Rostedt, Frank Ch. Eigler, linux-kernel



>>> First, log_sample writes into the vmalloc'd cpu buffer. That's for one
>>> possible page fault.
>>  
>>> Then, if a kernel backtrace happens, I am not sure whether printk_address
>>> won't try to read any of the module data, which is vmalloc'd.
>> Yes, admittedly the backtrace mode was always somewhat flakey. It probably
>> has more problems too.
>>
>> The right fix for that is to call vmalloc_sync_all() after module load
>> when any nmi notifiers are registered.
> 
> I guess it would work, but it certainly looks like a patchy workaround.

Actually I was wrong earlier on this. The backtrace should not access
any vmalloc'd data, only the stack, which is not vmalloc'd. oprofile
does address resolving in user space. It will access the module struct,
but that one is not vmalloc'd, and I don't think it will access anything
else that is vmalloc'd.

I was thinking earlier of the dwarf2 unwinder case, where it could have
happened when accessing the unwind table, but that code is not in
mainline currently.

So oprofile is ok without any changes.

-Andi






Thread overview: 19+ messages
     [not found] <20080414230344.GA16061@Krystal>
2008-04-14 23:05 ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Mathieu Desnoyers
2008-04-16 13:06   ` Ingo Molnar
2008-04-16 13:47     ` [TEST PATCH] Test NMI kprobe modules Mathieu Desnoyers
2008-04-16 14:34       ` Ingo Molnar
2008-04-16 14:54         ` Mathieu Desnoyers
2008-04-16 15:10     ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v2) Ingo Molnar
2008-04-16 15:18       ` H. Peter Anvin
2008-04-16 15:37       ` Mathieu Desnoyers
2008-04-16 16:03       ` Jeremy Fitzhardinge
2008-04-18  0:48         ` Mathieu Desnoyers
2008-04-18  9:49           ` Jeremy Fitzhardinge
2008-04-16 16:28       ` [RFC PATCH] x86 NMI-safe INT3 and Page Fault (v3) Mathieu Desnoyers
2008-04-16 17:57         ` Jeremy Fitzhardinge
2008-04-17 16:29           ` Mathieu Desnoyers
2008-04-17 16:45             ` Andi Kleen
2008-04-18  0:05               ` Mathieu Desnoyers
2008-04-18  8:27                 ` Andi Kleen
2008-04-19 21:18                   ` Mathieu Desnoyers
2008-04-20 12:58                     ` Andi Kleen
