All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Steven Rostedt <rostedt@rostedt.homelinux.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>, Li Zefan <lizf@cn.fujitsu.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Johannes Berg <johannes.berg@intel.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Tom Zanussi <tzanussi@gmail.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Andi Kleen <andi@firstfloor.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Tejun Heo <htejun@gmail.com>
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe
Date: Thu, 15 Jul 2010 12:44:50 -0400	[thread overview]
Message-ID: <20100715164450.GC30989@Krystal> (raw)
In-Reply-To: <AANLkTin21xBD-BFY4ZX2BPRFnT5-LkgGugvWsvTEP4Mr@mail.gmail.com>

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> On Wed, Jul 14, 2010 at 3:37 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I think the %rip check should be pretty simple - exactly because there
> > is only a single point where the race is open between that 'mov' and
> > the 'iret'. So it's simpler than the (similar) thing we do for
> > debug/nmi stack fixup for sysenter that has to check a range.
> 
> So this is what I think it might look like, with the %rip in place.
> And I changed the "nmi_stack_ptr" thing to have both the pointer and a
> flag - because it turns out that in the single-instruction race case,
> we actually want the old pointer.
> 
> Totally untested, of course. But _something_ like this might work:
> 
> #
> # Two per-cpu variables: a "are we nested" flag (one byte), and
> # a "if we're nested, what is the %rsp for the nested case".
> #
> # The reason for why we can't just clear the saved-rsp field and
> # use that as the flag is that we actually want to know the saved
> # rsp for the special case of having a nested NMI happen on the
> # final iret of the unnested case.
> #
> nmi:
> 	cmpb $0,%__percpu_seg:nmi_stack_nesting
> 	jne nmi_nested_corrupt_and_return
> 	cmpq $nmi_iret_address,0(%rsp)
> 	je nmi_might_be_nested
> 	# create new stack
> is_unnested_nmi:
> 	# Save some space for nested NMI's. The exception itself
> 	# will never use more space, but it might use less (since
> 	# if will be a kernel-kernel transition). But the nested
> 	# exception will want two save registers and a place to
> 	# save the original CS that it will corrupt
> 	subq $64,%rsp
> 
> 	# copy the five words of stack info. 96 = 64 + stack
> 	# offset of ss.
> 	pushq 96(%rsp)   # ss
> 	pushq 96(%rsp)   # rsp
> 	pushq 96(%rsp)   # eflags
> 	pushq 96(%rsp)   # cs
> 	pushq 96(%rsp)   # rip
> 
> 	# and set the nesting flags
> 	movq %rsp,%__percpu_seg:nmi_stack_ptr
> 	movb $0xff,%__percpu_seg:nmi_stack_nesting
> 
> regular_nmi_code:
> 	...
> 	# regular NMI code goes here, and can take faults,
> 	# because this sequence now has proper nested-nmi
> 	# handling
> 	...
> nmi_exit:
> 	movb $0,%__percpu_seg:nmi_stack_nesting

The first thing that strikes me is that we could be interrupted by a standard
interrupt on top of the iret instruction below. This interrupt handler could in
turn be interrupted by a NMI, so the NMI handler would not know that it is
nested over nmi_iret_address. Maybe we could simply disable interrupts
explicitly at the beginning of the handler, so they get re-enabled by iret below
upon return from nmi ?

Doing that would ensure that only NMIs can interrupt us.

I'll look a bit more at the code and come back with more comments if things come
up.

Thanks,

Mathieu

> nmi_iret_address:
> 	iret
> 
> # The saved rip points to the final NMI iret, after we've cleared
> # nmi_stack_ptr. Check the CS segment to make sure.
> nmi_might_be_nested:
> 	cmpw $__KERNEL_CS,8(%rsp)
> 	jne is_unnested_nmi
> 
> # This is the case when we hit just as we're supposed to do the final
> # iret of a previous nmi.  We run the NMI using the old return address
> # that is still on the stack, rather than copy the new one that is bogus
> # and points to where the nested NMI interrupted the original NMI
> # handler!
> # Easy: just reset the stack pointer to the saved one (this is why
> # we use a separate "valid" flag, so that we can still use the saved
> # stack pointer)
> 	movq %__percpu_seg:nmi_stack_ptr,%rsp
> 	jmp regular_nmi_code
> 
> # This is the actual nested case.  Make sure we fault on iret by setting
> # CS to zero and saving the old CS.  %rax contains the stack pointer to
> # the original code.
> nmi_nested_corrupt_and_return:
> 	pushq %rax
> 	pushq %rdx
> 	movq %__percpu_seg:nmi_stack_ptr,%rax
> 	movq 8(%rax),%rdx	# CS of original NMI
> 	testq %rdx,%rdx		# CS already zero?
> 	je nmi_nested_and_already_corrupted
> 	movq %rdx,40(%rax)	# save old CS away
> 	movq $0,8(%rax)
> nmi_nested_and_already_corrupted:
> 	popq %rdx
> 	popq %rax
> 	popfq
> 	jmp *(%rsp)
> 
> Hmm?
> 
>                Linus

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

  parent reply	other threads:[~2010-07-15 16:44 UTC|newest]

Thread overview: 163+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-14 15:49 [patch 0/2] x86: NMI-safe trap handlers Mathieu Desnoyers
2010-07-14 15:49 ` [patch 1/2] x86_64 page fault NMI-safe Mathieu Desnoyers
2010-07-14 16:28   ` Linus Torvalds
2010-07-14 17:06     ` Mathieu Desnoyers
2010-07-14 18:10       ` Linus Torvalds
2010-07-14 18:46         ` Ingo Molnar
2010-07-14 19:14           ` Linus Torvalds
2010-07-14 19:36             ` Frederic Weisbecker
2010-07-14 19:54               ` Linus Torvalds
2010-07-14 20:17                 ` Mathieu Desnoyers
2010-07-14 20:55                   ` Linus Torvalds
2010-07-14 21:18                     ` Ingo Molnar
2010-07-14 22:14                 ` Frederic Weisbecker
2010-07-14 22:31                   ` Mathieu Desnoyers
2010-07-14 22:48                     ` Frederic Weisbecker
2010-07-14 23:11                       ` Mathieu Desnoyers
2010-07-14 23:38                         ` Frederic Weisbecker
2010-07-15 16:26                           ` Mathieu Desnoyers
2010-08-03 17:18                             ` Peter Zijlstra
2010-08-03 18:25                               ` Mathieu Desnoyers
2010-08-04  6:46                                 ` Peter Zijlstra
2010-08-04  7:14                                   ` Ingo Molnar
2010-08-04 14:45                                   ` Mathieu Desnoyers
2010-08-04 14:56                                     ` Peter Zijlstra
2010-08-06  1:49                                       ` Mathieu Desnoyers
2010-08-06  9:51                                         ` Peter Zijlstra
2010-08-06 13:46                                           ` Mathieu Desnoyers
2010-08-06  6:18                                       ` Masami Hiramatsu
2010-08-06  9:50                                         ` Peter Zijlstra
2010-08-06 13:37                                           ` Mathieu Desnoyers
2010-08-07  9:51                                           ` Masami Hiramatsu
2010-08-09 16:53                                           ` Frederic Weisbecker
2010-08-03 18:56                               ` Linus Torvalds
2010-08-03 19:45                                 ` Mathieu Desnoyers
2010-08-03 20:02                                   ` Linus Torvalds
2010-08-03 20:10                                     ` Ingo Molnar
2010-08-03 20:21                                       ` Ingo Molnar
2010-08-03 21:16                                         ` Mathieu Desnoyers
2010-08-03 20:54                                     ` Mathieu Desnoyers
2010-08-04  6:27                                 ` Peter Zijlstra
2010-08-04 14:06                                   ` Mathieu Desnoyers
2010-08-04 14:50                                     ` Peter Zijlstra
2010-08-06  1:42                                       ` Mathieu Desnoyers
2010-08-06 10:11                                         ` Peter Zijlstra
2010-08-06 11:14                                           ` Peter Zijlstra
2010-08-06 14:15                                             ` Mathieu Desnoyers
2010-08-06 14:13                                           ` Mathieu Desnoyers
2010-08-11 14:44                                             ` Steven Rostedt
2010-08-11 14:34                                   ` Steven Rostedt
2010-08-15 13:35                                     ` Mathieu Desnoyers
2010-08-15 16:33                                     ` Avi Kivity
2010-08-15 16:44                                       ` Mathieu Desnoyers
2010-08-15 16:51                                         ` Avi Kivity
2010-08-15 18:31                                           ` Mathieu Desnoyers
2010-08-16 10:49                                             ` Avi Kivity
2010-08-16 11:29                                             ` Avi Kivity
2010-08-04  6:46                                 ` Dave Chinner
2010-08-04  7:21                                   ` Ingo Molnar
2010-07-14 23:40                         ` Steven Rostedt
2010-07-14 19:41             ` Linus Torvalds
2010-07-14 19:56               ` Andi Kleen
2010-07-14 20:05                 ` Mathieu Desnoyers
2010-07-14 20:07                   ` Andi Kleen
2010-07-14 20:08                     ` H. Peter Anvin
2010-07-14 23:32                       ` Tejun Heo
2010-07-14 22:31                   ` Frederic Weisbecker
2010-07-14 22:56                     ` Linus Torvalds
2010-07-14 23:09                       ` Andi Kleen
2010-07-14 23:22                         ` Linus Torvalds
2010-07-15 14:11                       ` Frederic Weisbecker
2010-07-15 14:35                         ` Andi Kleen
2010-07-16 11:21                           ` Frederic Weisbecker
2010-07-15 14:46                         ` Steven Rostedt
2010-07-16 10:47                           ` Frederic Weisbecker
2010-07-16 11:43                             ` Steven Rostedt
2010-07-15 14:51                         ` Linus Torvalds
2010-07-15 15:38                           ` Linus Torvalds
2010-07-16 12:00                           ` Frederic Weisbecker
2010-07-16 12:54                             ` Steven Rostedt
2010-07-14 20:39         ` Mathieu Desnoyers
2010-07-14 21:23           ` Linus Torvalds
2010-07-14 21:45             ` Maciej W. Rozycki
2010-07-14 21:52               ` Linus Torvalds
2010-07-14 22:31                 ` Maciej W. Rozycki
2010-07-14 22:21             ` Mathieu Desnoyers
2010-07-14 22:37               ` Linus Torvalds
2010-07-14 22:51                 ` Jeremy Fitzhardinge
2010-07-14 23:02                   ` Linus Torvalds
2010-07-14 23:54                     ` Jeremy Fitzhardinge
2010-07-15  1:23                 ` Linus Torvalds
2010-07-15  1:45                   ` Linus Torvalds
2010-07-15 18:31                     ` Mathieu Desnoyers
2010-07-15 18:43                       ` Linus Torvalds
2010-07-15 18:48                         ` Linus Torvalds
2010-07-15 22:01                           ` Mathieu Desnoyers
2010-07-15 22:16                             ` Linus Torvalds
2010-07-15 22:24                               ` H. Peter Anvin
2010-07-15 22:26                               ` Linus Torvalds
2010-07-15 22:46                                 ` H. Peter Anvin
2010-07-15 22:58                                 ` Andi Kleen
2010-07-15 23:20                                   ` H. Peter Anvin
2010-07-15 23:23                                     ` Linus Torvalds
2010-07-15 23:41                                       ` H. Peter Anvin
2010-07-15 23:44                                         ` Linus Torvalds
2010-07-15 23:46                                           ` H. Peter Anvin
2010-07-15 23:48                                       ` Andi Kleen
2010-07-15 22:30                               ` Mathieu Desnoyers
2010-07-16 19:13                             ` Mathieu Desnoyers
2010-07-15 16:44                   ` Mathieu Desnoyers [this message]
2010-07-15 16:49                     ` Linus Torvalds
2010-07-15 17:38                       ` Mathieu Desnoyers
2010-07-15 20:44                         ` H. Peter Anvin
2010-07-18 11:03                   ` Avi Kivity
2010-07-18 17:36                     ` Linus Torvalds
2010-07-18 18:04                       ` Avi Kivity
2010-07-18 18:22                         ` Linus Torvalds
2010-07-19  7:32                           ` Avi Kivity
2010-07-18 18:17                       ` Linus Torvalds
2010-07-18 18:43                         ` Steven Rostedt
2010-07-18 19:26                           ` Linus Torvalds
2010-07-14 15:49 ` [patch 2/2] x86 NMI-safe INT3 and Page Fault Mathieu Desnoyers
2010-07-14 16:42   ` Maciej W. Rozycki
2010-07-14 18:12     ` Mathieu Desnoyers
2010-07-14 19:21       ` Maciej W. Rozycki
2010-07-14 19:58         ` Mathieu Desnoyers
2010-07-14 20:36           ` Maciej W. Rozycki
2010-07-16 12:28   ` Avi Kivity
2010-07-16 14:49     ` Mathieu Desnoyers
2010-07-16 15:34       ` Andi Kleen
2010-07-16 15:40         ` Mathieu Desnoyers
2010-07-16 16:47       ` Avi Kivity
2010-07-16 16:58         ` Mathieu Desnoyers
2010-07-16 17:54           ` Avi Kivity
2010-07-16 18:05             ` H. Peter Anvin
2010-07-16 18:15               ` Avi Kivity
2010-07-16 18:17                 ` H. Peter Anvin
2010-07-16 18:28                   ` Avi Kivity
2010-07-16 18:37                     ` Linus Torvalds
2010-07-16 19:26                       ` Avi Kivity
2010-07-16 21:39                         ` Linus Torvalds
2010-07-16 22:07                           ` Andi Kleen
2010-07-16 22:26                             ` Linus Torvalds
2010-07-16 22:41                               ` Andi Kleen
2010-07-17  1:15                                 ` Linus Torvalds
2010-07-16 22:40                             ` Mathieu Desnoyers
2010-07-18  9:23                           ` Avi Kivity
2010-07-16 18:22                 ` Mathieu Desnoyers
2010-07-16 18:32                   ` Avi Kivity
2010-07-16 19:29                     ` H. Peter Anvin
2010-07-16 19:39                       ` Avi Kivity
2010-07-16 19:32                     ` Andi Kleen
2010-07-16 18:25                 ` Linus Torvalds
2010-07-16 19:30                   ` Andi Kleen
2010-07-18  9:26                     ` Avi Kivity
2010-07-16 19:28               ` Andi Kleen
2010-07-16 19:32                 ` Avi Kivity
2010-07-16 19:34                   ` Andi Kleen
2010-08-04  9:46               ` Peter Zijlstra
2010-08-04 20:23                 ` H. Peter Anvin
2010-07-14 17:06 ` [patch 0/2] x86: NMI-safe trap handlers Andi Kleen
2010-07-14 17:08   ` Mathieu Desnoyers
2010-07-14 18:56     ` Andi Kleen
2010-07-14 23:29       ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100715164450.GC30989@Krystal \
    --to=mathieu.desnoyers@efficios.com \
    --cc=acme@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=fche@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=htejun@gmail.com \
    --cc=jeremy@goop.org \
    --cc=johannes.berg@intel.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=rostedt@rostedt.homelinux.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=tzanussi@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.