From: Borislav Petkov <bp@alien8.de>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Denys Vlasenko <vda.linux@googlemail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Will Drewry <wad@chromium.org>, Kees Cook <keescook@chromium.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
Date: Mon, 27 Apr 2015 16:39:05 +0200
Message-ID: <20150427143905.GK6774@pd.tnic>
In-Reply-To: <CALCETrWvqyCYOzCYXz7ZnzaM0obbidqo5CxVp2Hn1ELEiC3m3g@mail.gmail.com>

On Sun, Apr 26, 2015 at 04:39:38PM -0700, Andy Lutomirski wrote:
> I know it would be ugly, but would it be worth saving two bytes by
> using ALTERNATIVE "jmp 1f", "shl ...", ...?

Damn, it is actually visible that even dropping the unconditional forward
JMP makes the numbers marginally nicer (E: row). So I guess we'll be
dropping the forward JMP too.

A:    2835570.145246      cpu-clock (msec)                                              ( +-  0.02% ) [100.00%]
B:    2833364.074970      cpu-clock (msec)                                              ( +-  0.04% ) [100.00%]
C:    2834708.335431      cpu-clock (msec)                                              ( +-  0.02% ) [100.00%]
D:    2835055.118431      cpu-clock (msec)                                              ( +-  0.01% ) [100.00%]
E:    2833115.118624      cpu-clock (msec)                                              ( +-  0.06% ) [100.00%]

A:    2835570.099981      task-clock (msec)         #    3.996 CPUs utilized            ( +-  0.02% ) [100.00%]
B:    2833364.073633      task-clock (msec)         #    3.996 CPUs utilized            ( +-  0.04% ) [100.00%]
C:    2834708.350387      task-clock (msec)         #    3.996 CPUs utilized            ( +-  0.02% ) [100.00%]
D:    2835055.094383      task-clock (msec)         #    3.996 CPUs utilized            ( +-  0.01% ) [100.00%]
E:    2833115.145292      task-clock (msec)         #    3.996 CPUs utilized            ( +-  0.06% ) [100.00%]

A: 5,591,213,166,613      cycles                    #    1.972 GHz                      ( +-  0.03% ) [75.00%]
B: 5,585,023,802,888      cycles                    #    1.971 GHz                      ( +-  0.03% ) [75.00%]
C: 5,587,983,212,758      cycles                    #    1.971 GHz                      ( +-  0.02% ) [75.00%]
D: 5,584,838,532,936      cycles                    #    1.970 GHz                      ( +-  0.03% ) [75.00%]
E: 5,583,979,727,842      cycles                    #    1.971 GHz                      ( +-  0.05% ) [75.00%]

E: has the lowest cycle count, nice.
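
To put the cycle counts in proportion, here is a small Python sketch that
computes how much lower each row is relative to A:. Treating A: as the
reference row is an assumption made purely for illustration, and the helper
name relative_saving is made up; the counts are copied from the table above.

```python
# Cycle counts copied from the perf stat rows above.
cycles = {
    "A": 5_591_213_166_613,
    "B": 5_585_023_802_888,
    "C": 5_587_983_212_758,
    "D": 5_584_838_532_936,
    "E": 5_583_979_727_842,
}


def relative_saving(baseline: int, candidate: int) -> float:
    """Return how much lower `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


# Row with the smallest cycle count.
lowest = min(cycles, key=cycles.get)

# Assumption: A: is used as the reference row for the comparison.
savings_vs_a = {row: relative_saving(cycles["A"], n) for row, n in cycles.items()}

for row, pct in savings_vs_a.items():
    print(f"{row}: {pct:.3f}% fewer cycles than A")
```

With these inputs E: comes out roughly 0.13% below A:, which is small but
larger than the +- spread perf reports for either row.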

A: 3,106,707,101,530      instructions              #    0.56  insns per cycle          ( +-  0.01% ) [75.00%]
B: 3,106,632,251,528      instructions              #    0.56  insns per cycle          ( +-  0.00% ) [75.00%]
C: 3,106,265,958,142      instructions              #    0.56  insns per cycle          ( +-  0.00% ) [75.00%]
D: 3,106,294,801,185      instructions              #    0.56  insns per cycle          ( +-  0.00% ) [75.00%]
E: 3,106,381,223,355      instructions              #    0.56  insns per cycle          ( +-  0.01% ) [75.00%]

Understandable - we end up executing 5 insns more:

ffffffff815b90ac:       66 66 66 90             data16 data16 xchg %ax,%ax
ffffffff815b90b0:       66 66 66 90             data16 data16 xchg %ax,%ax
ffffffff815b90b4:       66 66 66 90             data16 data16 xchg %ax,%ax
ffffffff815b90b8:       66 66 66 90             data16 data16 xchg %ax,%ax
ffffffff815b90bc:       90                      nop
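
A quick way to double-check the count is to tally the listing itself. The
sketch below is only an illustration: the addresses and opcode bytes are
copied from the disassembly above, and the variable names are made up.

```python
# (address, instruction bytes) pairs copied from the disassembly above.
padding = [
    (0xFFFFFFFF815B90AC, bytes.fromhex("66666690")),
    (0xFFFFFFFF815B90B0, bytes.fromhex("66666690")),
    (0xFFFFFFFF815B90B4, bytes.fromhex("66666690")),
    (0xFFFFFFFF815B90B8, bytes.fromhex("66666690")),
    (0xFFFFFFFF815B90BC, bytes.fromhex("90")),
]

# Number of padding instructions and total bytes they occupy.
insn_count = len(padding)
pad_bytes = sum(len(raw) for _, raw in padding)

# Each instruction should start exactly where the previous one ended.
contiguous = all(
    addr + len(raw) == next_addr
    for (addr, raw), (next_addr, _) in zip(padding, padding[1:])
)

print(insn_count, pad_bytes, contiguous)
```

That is four 4-byte NOPs plus one single-byte NOP, back to back.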


A:   683,676,044,429      branches                  #  241.107 M/sec                    ( +-  0.01% ) [75.00%]
B:   683,670,899,595      branches                  #  241.293 M/sec                    ( +-  0.01% ) [75.00%]
C:   683,675,772,858      branches                  #  241.180 M/sec                    ( +-  0.01% ) [75.00%]
D:   683,683,533,664      branches                  #  241.154 M/sec                    ( +-  0.00% ) [75.00%]
E:   683,648,518,667      branches                  #  241.306 M/sec                    ( +-  0.01% ) [75.00%]

E: is the lowest.

A:    43,829,535,008      branch-misses             #    6.41% of all branches          ( +-  0.02% ) [75.00%]
B:    43,844,118,416      branch-misses             #    6.41% of all branches          ( +-  0.03% ) [75.00%]
C:    43,819,871,086      branch-misses             #    6.41% of all branches          ( +-  0.02% ) [75.00%]
D:    43,795,107,998      branch-misses             #    6.41% of all branches          ( +-  0.02% ) [75.00%]
E:    43,801,985,070      branch-misses             #    6.41% of all branches          ( +-  0.02% ) [75.00%]

That looks like noise to me - we shouldn't be getting more branch misses
with the E: version.
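
As a rough check of the "noise" reading, the sketch below compares the E:
and D: branch-miss counts against the +- spread perf prints for them. D: is
picked here for illustration because it is the only row with fewer misses
than E:. Comparing a delta against the reported spread is only a crude
check, not a proper significance test, and the variable names are made up.

```python
# Branch-miss counts and reported +- spread (in percent) from the rows above.
d_misses, d_spread_pct = 43_795_107_998, 0.02
e_misses, e_spread_pct = 43_801_985_070, 0.02

# Relative difference between E: and D:, in percent of the D: count.
delta_pct = abs(e_misses - d_misses) / d_misses * 100.0

# Crude criterion: the delta does not exceed the larger reported spread.
within_noise = delta_pct <= max(d_spread_pct, e_spread_pct)

print(f"E vs D: {delta_pct:.4f}% apart, within reported spread: {within_noise}")
```

By this crude measure the difference between the two rows is smaller than
the run-to-run variation perf itself reports.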

A:         2,030,357      context-switches          #    0.716 K/sec                    ( +-  0.06% ) [100.00%]
B:         2,029,313      context-switches          #    0.716 K/sec                    ( +-  0.05% ) [100.00%]
C:         2,028,566      context-switches          #    0.716 K/sec                    ( +-  0.06% ) [100.00%]
D:         2,028,895      context-switches          #    0.716 K/sec                    ( +-  0.06% ) [100.00%]
E:         2,031,008      context-switches          #    0.717 K/sec                    ( +-  0.09% ) [100.00%]

A:            52,421      migrations                #    0.018 K/sec                    ( +-  1.13% )
B:            52,049      migrations                #    0.018 K/sec                    ( +-  1.02% )
C:            51,365      migrations                #    0.018 K/sec                    ( +-  0.92% )
D:            51,766      migrations                #    0.018 K/sec                    ( +-  1.11% )
E:            53,047      migrations                #    0.019 K/sec                    ( +-  1.08% )

A:     709.528485252 seconds time elapsed                                          ( +-  0.02% )
B:     708.976557288 seconds time elapsed                                          ( +-  0.04% )
C:     709.312844791 seconds time elapsed                                          ( +-  0.02% )
D:     709.400050112 seconds time elapsed                                          ( +-  0.01% )
E:     708.914562508 seconds time elapsed                                          ( +-  0.06% )

Nice.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--