From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brian Gerst <brgerst@gmail.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	X86 ML <x86@kernel.org>
Subject: Re: ia32_sysenter_target does not preserve EFLAGS
Date: Sat, 28 Mar 2015 10:30:36 +0100
Message-ID: <20150328093036.GA9453@gmail.com>
In-Reply-To: <CA+55aFwO++rNZM-6UL2A2hRcC_tUAvEd+AeQtSSZXhozn3h-=g@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, Mar 27, 2015 at 1:53 PM, Brian Gerst <brgerst@gmail.com> wrote:
> >> <-- IRQ.  Boom
> >
> > The sti will delay interrupts for one instruction, and that should include NMIs.
> 
> Nope. Intel explicitly documents the NMI case only for mov->ss and popss.

Interestingly, I still see a STI 'NMI shadow' even on Intel CPUs.

Try something like this as root on a system with Intel CPUs (running 
recent tools/perf), with high-freq NMI sampling:

   perf top -F 10000

execute a tight syscall loop on all CPUs (getppid() loop for example), 
and you'll see something like this in the profile:

Samples: 1M of event 'cycles', Event count (approx.): 377899840545
Overhead  Shared Object                      Symbol
  27.67%  libc-2.19.so                       [.] __GI___getppid
  21.34%  [kernel]                           [k] system_call
  17.42%  [kernel]                           [k] system_call_after_swapgs
  12.00%  [kernel]                           [k] pid_vnr
   7.49%  [kernel]                           [k] sys_getppid
   5.49%  [kernel]                           [k] sysret_check
   5.34%  loop-getppid                       [.] main
   1.56%  [kernel]                           [k] system_call_fastpath
   0.36%  loop-getppid                       [.] getppid@plt

Note the very high sample count (due to sampling at 10 kHz).

Now if you hit '<Enter>' twice to annotate system_call_after_swapgs 
you should see something like this (the live kernel image disassembly, 
annotated):

  system_call_after_swapgs  /proc/kcore
       │
       │              Disassembly of section load0:
       │
       │              ffffffff8178b3f3 <load0>:
  9.72 │ffffffff8178b3f3:   mov    %rsp,%gs:0xb040
 44.24 │ffffffff8178b3fc:   mov    %gs:0xb888,%rsp
  0.02 │ffffffff8178b405:   sti
       │ffffffff8178b406:   nopl   0x0(%rax)
 16.04 │ffffffff8178b40d:   sub    $0x50,%rsp
       │ffffffff8178b411:   mov    %rdi,0x40(%rsp)
  6.51 │ffffffff8178b416:   mov    %rsi,0x38(%rsp)
  5.81 │ffffffff8178b41b:   mov    %rdx,0x30(%rsp)
  2.22 │ffffffff8178b420:   mov    %rax,0x20(%rsp)
  2.16 │ffffffff8178b425:   mov    %r8,0x18(%rsp)
  0.93 │ffffffff8178b42a:   mov    %r9,0x10(%rsp)
  1.57 │ffffffff8178b42f:   mov    %r10,0x8(%rsp)
  3.70 │ffffffff8178b434:   mov    %r11,(%rsp)
  2.27 │ffffffff8178b438:   mov    %rax,0x48(%rsp)

Note how the 7-byte NOP after the STI did not get a single profiler 
hit.

This is with the default '-e cycles', not '-e cycles:pp', so what we 
see as profiler hits should be the raw NMI entry RIPs.

Arguably this could be just the decoder hiding the NOP efficiently, 
I'll try to run some more experiments ...

Thanks,

	Ingo
