From: Raymond Jennings <shentino@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: X86 ML <x86@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Willy Tarreau <w@1wt.eu>, Borislav Petkov <bp@alien8.de>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Steven Rostedt <rostedt@goodmis.org>,
Brian Gerst <brgerst@gmail.com>
Subject: Re: Dealing with the NMI mess
Date: Fri, 24 Jul 2015 09:33:11 -0700
Message-ID: <1437755591.5522.0.camel@gmail.com>
In-Reply-To: <CALCETrUf9s-o-ETMiSxxjMGxVeH7di4O9vTi0Oe7wS-RCiVXLA@mail.gmail.com>
On Thu, 2015-07-23 at 13:21 -0700, Andy Lutomirski wrote:
> [moved to a new thread, cc list trimmed]
>
> Hi all-
>
> We've considered two approaches to dealing with NMIs:
>
> 1. Allow nesting. We know quite well how messy that is.
This might be a stupid question, but:
1. What exactly does the NMI handler handle?
2. Is it possible for the NMI handler to just increment a counter and
return if it nests, and let the outer handler notice and rerun itself?
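
To make question 2 concrete, here is a rough user-space model of what I have in mind. Every name in it (nmi_pending, nmi_entry, do_nmi_work, inject_nested) is made up for illustration and none of it is real kernel code. It only models the counting logic. It says nothing about the stack or entry-asm problems that make real nesting messy.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model only: none of these names exist in the kernel. */
static atomic_int nmi_pending;  /* NMIs seen but not yet fully processed */
static int nmi_handled;         /* how many times the real work has run */
static int inject_nested;       /* test hook: nested NMIs still to simulate */

static void nmi_entry(void);

/* Stand-in for the real NMI work; optionally simulates a nested NMI. */
static void do_nmi_work(void)
{
    nmi_handled++;
    if (inject_nested > 0) {
        inject_nested--;
        nmi_entry();            /* a "nested" NMI arrives mid-handler */
    }
}

static void nmi_entry(void)
{
    /*
     * A nonzero previous value means an outer handler is already
     * running: record the request and return immediately.
     */
    if (atomic_fetch_add(&nmi_pending, 1) != 0)
        return;

    /* Outermost handler: rerun once for every request that piled up. */
    do {
        do_nmi_work();
    } while (atomic_fetch_sub(&nmi_pending, 1) > 1);

    /* Holds in this single-threaded model once the loop has drained. */
    assert(atomic_load(&nmi_pending) == 0);
}
```

With inject_nested set to 2, a single call to nmi_entry() runs the work three times: once for the original NMI and once for each nested one, and the nested entries themselves do nothing but bump the counter.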
> 2. Forbid IRET inside NMIs. Doable but maybe not that pretty.
>
> We haven't considered:
>
> 3. Forbid faults (other than MCE) inside NMI.
>
> Option 3 is almost easy. There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB. #DB is easy to
> fix (e.g. with my patches or Peter's patches).
>
> What if we went all out and forbade page faults in NMI as well. There
> are two reasons that I can think of that we might page fault inside an
> NMI:
>
> a) vmalloc fault. I think Ingo already half-implemented a rework to
> eliminate vmalloc faults entirely.
>
> b) User memory access faults.
>
> The reason we access user state in general from an NMI is to allow
> perf to capture enough user stack data to let the tooling backtrace
> back to user space. What if we did it differently? Instead of
> capturing this data in NMI context, capture it in
> prepare_exit_to_usermode. That would let us capture user state
> *correctly*, which we currently can't really do. There's a
> never-ending series of minor bugs in which we try to guess the user
> register state from NMI context, and it sort of works. In
> prepare_exit_to_usermode, we really truly know the user state.
> There's a race where an NMI hits during or after
> prepare_exit_to_usermode, but maybe that's okay -- just admit defeat
> in that case and don't show the user state. (Realistically, without
> CFI data, we're not going to be guaranteed to get the right state
> anyway.)
>
> To make this work, we'd have to teach NMI-from-userspace to call the
> callback itself. It would look like:
>
> prepare_exit_to_usermode() {
>     ...
>     while (blah blah blah) {
>         if (cached_flags & TIF_PERF_CAPTURE_USER_STATE)
>             perf_capture_user_state();
>         ...
>     }
>     ...
> }
>
> and then, on NMI exit, we'd call perf_capture_user_state directly,
> since we don't want to enable IRQs or do opportunistic sysret on exit
> from NMI. (Why not? Because NMIs are still masked, and we don't want
> to pay for double-IRET to unmask them, so we really want to leave IRQs
> off and IRET straight back to user mode.)
>
> There's an unavoidable race in which we enter user mode with
> TIF_PERF_CAPTURE_USER_STATE still set. In principle, we could
> IPI-to-self from the NMI handler to cover that case (mostly -- we
> capture the wrong state if we're on our way to an IRET fault), or we
> could just check on entry if the flag is still set and, if so, admit
> defeat.
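
If I am reading the proposal correctly, the flag handling could be modeled roughly as below. This is a user-space sketch under my own assumptions: struct task_model, nmi_sample, exit_to_usermode and enter_from_usermode are hypothetical names, and capture_pending merely stands in for TIF_PERF_CAPTURE_USER_STATE. It takes the "admit defeat" choice for the race instead of the IPI-to-self one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: not the kernel's data structures. */
enum ctx { CTX_USER, CTX_KERNEL };

struct task_model {
    bool capture_pending;   /* stands in for TIF_PERF_CAPTURE_USER_STATE */
    int captured;           /* samples that got real user state */
    int dropped;            /* samples where we admitted defeat */
};

/* Stand-in for perf_capture_user_state(): record and clear the request. */
static void capture_user_state(struct task_model *t)
{
    assert(t != NULL);
    t->captured++;
    t->capture_pending = false;
}

/* NMI path: never touches user memory when it interrupted the kernel. */
static void nmi_sample(struct task_model *t, enum ctx interrupted)
{
    if (interrupted == CTX_USER)
        capture_user_state(t);      /* NMI exit calls the callback directly */
    else
        t->capture_pending = true;  /* defer to the exit-to-usermode path */
}

/* Exit path: the user register state is exactly known here. */
static void exit_to_usermode(struct task_model *t)
{
    if (t->capture_pending)
        capture_user_state(t);
}

/* Entry path: a flag that survived into user mode is stale, so drop it. */
static void enter_from_usermode(struct task_model *t)
{
    if (t->capture_pending) {
        t->capture_pending = false;
        t->dropped++;
    }
}
```

The point of the model is that nmi_sample() never reads user memory when it interrupts kernel code, so that path has no reason to take a #PF inside the NMI.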
>
> Peter, can this be done without breaking the perf ABI? If we were
> designing all of this stuff from scratch right now, I'd suggest doing
> it this way, but I'm not sure whether it makes sense to try to
> retrofit it in.
>
>
> If we decide to stick with option 2, then I've now convinced myself
> that banning all kernel breakpoints and watchpoints during NMI
> processing is probably for the best. Maybe we should go one step
> farther and ban all DR7 breakpoints period. Sure, it will slow down
> perf if there are user breakpoints or watchpoints set, but, having
> looked at the asm, returning from #DB using RET is, while doable,
> distinctly ugly.
>
> --Andy