All of lore.kernel.org
 help / color / mirror / Atom feed
From: Raymond Jennings <shentino@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: X86 ML <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Willy Tarreau <w@1wt.eu>, Borislav Petkov <bp@alien8.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Brian Gerst <brgerst@gmail.com>
Subject: Re: Dealing with the NMI mess
Date: Fri, 24 Jul 2015 09:33:11 -0700	[thread overview]
Message-ID: <1437755591.5522.0.camel@gmail.com> (raw)
In-Reply-To: <CALCETrUf9s-o-ETMiSxxjMGxVeH7di4O9vTi0Oe7wS-RCiVXLA@mail.gmail.com>

On Thu, 2015-07-23 at 13:21 -0700, Andy Lutomirski wrote:
> [moved to a new thread, cc list trimmed]
> 
> Hi all-
> 
> We've considered two approaches to dealing with NMIs:
> 
> 1. Allow nesting.  We know quite well how messy that is.

This might be a stupid question, but

1.  What exactly does the NMI handler handle
2.  Is it possible for the NMI handler to just increment a counter and
return if it nests, and let the outer handler notice and rerun itself.

> 2. Forbid IRET inside NMIs.  Doable but maybe not that pretty.
> 
> We haven't considered:
> 
> 3. Forbid faults (other than MCE) inside NMI.
> 
> Option 3 is almost easy.  There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB.  #DB is easy to
> fix (e.g. with my patches or Peter's patches).
> 
> What if we went all out and forbade page faults in NMI as well.  There
> are two reasons that I can think of that we might page fault inside an
> NMI:
> 
> a) vmalloc fault.  I think Ingo already half-implemented a rework to
> eliminate vmalloc faults entirely.
> 
> b) User memory access faults.
> 
> The reason we access user state in general from an NMI is to allow
> perf to capture enough user stack data to let the tooling backtrace
> back to user space.  What if we did it differently?  Instead of
> capturing this data in NMI context, capture it in
> prepare_exit_to_usermode.  That would let us capture user state
> *correctly*, which we currently can't really do.  There's a
> never-ending series of minor bugs in which we try to guess the user
> register state from NMI context, and it sort of works.  In
> prepare_exit_to_usermode, we really truly know the user state.
> There's a race where an NMI hits during or after
> prepare_exit_to_usermode, but maybe that's okay -- just admit defeat
> in that case and don't show the user state.  (Realistically, without
> CFI data, we're not going to be guaranteed to get the right state
> anyway.)
> 
> To make this work, we'd have to teach NMI-from-userspace to call the
> callback itself.  It would look like:
> 
> prepare_exit_to_usermode() {
>   ...
>   while (blah blah blah) {
>     if (cached_flags & TIF_PERF_CAPTURE_USER_STATE)
>       perf_capture_user_state();
>     ...
>   }
>   ...
> }
> 
> and then, on NMI exit, we'd call perf_capture_user_state directly,
> since we don't want to enable IRQs or do opportunsitic sysret on exit
> from NMI.  (Why not?  Because NMIs are still masked, and we don't want
> to pay for double-IRET to unmask them, so we really want to leave IRQs
> off and IRET straight back to user mode.)
> 
> There's an unavoidable race in which we enter user mode with
> TIF_PERF_CAPTURE_USER_STATE still set.  In principle, we could
> IPI-to-self from the NMI handler to cover that case (mostly -- we
> capture the wrong state if we're on our way to an IRET fault), or we
> could just check on entry if the flag is still set and, if so, admit
> defeat.
> 
> Peter, can this be done without breaking the perf ABI?  If we were
> designing all of this stuff from scratch right now, I'd suggest doing
> it this way, but I'm not sure whether it makes sense to try to
> retrofit it in.
> 
> 
> If we decide to stick with option 2, then I've now convinced myself
> that banning all kernel breakpoints and watchpoints during NMI
> processing is probably for the best.  Maybe we should go one step
> farther and ban all DR7 breakpoints period.  Sure, it will slow down
> perf if there are user breakpoints or watchpoints set, but, having
> looked at the asm, returning from #DB using RET is, while doable,
> distinctly ugly.
> 
> --Andy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



      parent reply	other threads:[~2015-07-24 16:33 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-23 20:21 Dealing with the NMI mess Andy Lutomirski
2015-07-23 20:38 ` Linus Torvalds
2015-07-23 20:49   ` Andy Lutomirski
2015-07-23 21:08     ` Linus Torvalds
2015-07-23 21:31       ` Steven Rostedt
2015-07-23 21:46         ` Willy Tarreau
2015-07-23 21:46           ` Andy Lutomirski
2015-07-23 21:50             ` Willy Tarreau
2015-07-23 21:48         ` Linus Torvalds
2015-07-23 21:50           ` Andy Lutomirski
2015-07-23 21:59             ` Linus Torvalds
2015-07-24  8:13               ` Peter Zijlstra
2015-07-24  9:02                 ` Willy Tarreau
2015-07-24 11:58                 ` Steven Rostedt
2015-07-24 12:43                   ` Peter Zijlstra
2015-07-24 13:03                     ` Steven Rostedt
2015-07-24 13:21                       ` Willy Tarreau
2015-07-24 13:30                         ` Peter Zijlstra
2015-07-24 13:33                           ` Peter Zijlstra
2015-07-24 14:31                         ` Steven Rostedt
2015-07-24 14:59                           ` Willy Tarreau
2015-07-24 15:16                             ` Steven Rostedt
2015-07-24 15:26                               ` Willy Tarreau
2015-07-24 15:30                                 ` Peter Zijlstra
2015-07-24 15:33                                   ` Willy Tarreau
2015-07-24 18:29                                   ` Linus Torvalds
2015-07-24 18:41                                     ` Linus Torvalds
2015-07-24 19:05                                       ` Steven Rostedt
2015-07-24 19:55                                     ` Peter Zijlstra
2015-07-24 20:22                                       ` Linus Torvalds
2015-07-24 20:51                                         ` Peter Zijlstra
2015-07-24 21:07                                           ` Steven Rostedt
2015-07-24 21:08                                           ` Andy Lutomirski
2015-07-30 15:41                                             ` Paolo Bonzini
2015-07-30 21:22                                               ` Andy Lutomirski
2015-07-30 21:58                                                 ` Brian Gerst
2015-07-30 22:59                                                 ` Thomas Gleixner
2015-07-31  4:22                                                 ` Borislav Petkov
2015-07-31  5:11                                                   ` Andy Lutomirski
2015-07-31  7:51                                                     ` Paolo Bonzini
2015-07-31  8:03                                                     ` Borislav Petkov
2015-07-31  9:27                                                       ` Paolo Bonzini
2015-07-31 10:25                                                         ` Borislav Petkov
2015-07-31 10:26                                                           ` Paolo Bonzini
2015-07-31 10:32                                                             ` Borislav Petkov
2015-09-07  5:39                                                       ` Maciej W. Rozycki
2015-09-07  7:42                                                         ` Ingo Molnar
2015-09-07  8:19                                                           ` Maciej W. Rozycki
2015-09-07 10:19                                                             ` Paolo Bonzini
2015-09-07 17:01                                                               ` Maciej W. Rozycki
2015-09-07 17:22                                                                 ` Andy Lutomirski
2015-09-07 19:30                                                                   ` Maciej W. Rozycki
2015-09-07 21:56                                                                     ` Andy Lutomirski
2015-09-08 16:21                                                                       ` Maciej W. Rozycki
2015-07-24 23:53                                           ` Linus Torvalds
2015-07-24 15:34                                 ` Steven Rostedt
2015-07-24 15:49                                   ` Willy Tarreau
2015-07-24 15:48                 ` Andy Lutomirski
2015-07-24 16:02                   ` Steven Rostedt
2015-07-24 16:08                     ` Willy Tarreau
2015-07-24 16:31                       ` Steven Rostedt
2015-07-24 16:06                   ` Steven Rostedt
2015-07-24 16:25                   ` Willy Tarreau
2015-07-24 17:21                     ` Andy Lutomirski
2015-07-24 17:10                   ` Willy Tarreau
2015-07-24 17:20                     ` Andy Lutomirski
2015-07-30 15:54                       ` Paolo Bonzini
2015-07-24 17:21                     ` Willy Tarreau
2015-07-23 20:52   ` Willy Tarreau
2015-07-23 20:53     ` Andy Lutomirski
2015-07-23 21:07       ` Willy Tarreau
2015-07-23 21:13     ` Linus Torvalds
2015-07-23 21:18       ` Willy Tarreau
2015-07-23 21:20   ` Peter Zijlstra
2015-07-23 21:35     ` Linus Torvalds
2015-07-23 21:45       ` Andy Lutomirski
2015-07-23 21:54         ` Linus Torvalds
2015-07-23 21:59           ` Andy Lutomirski
2015-07-23 22:03             ` Linus Torvalds
2015-07-24 10:28             ` Peter Zijlstra
2015-07-24 11:06           ` Peter Zijlstra
2015-07-23 21:17 ` Peter Zijlstra
2015-07-23 21:20 ` Steven Rostedt
2015-07-23 21:46   ` Andy Lutomirski
2015-07-24 16:33 ` Raymond Jennings [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1437755591.5522.0.camel@gmail.com \
    --to=shentino@gmail.com \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=w@1wt.eu \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.