From: Alexey Dobriyan <adobriyan@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrew Lutomirski <luto@kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>, Peter Anvin <hpa@zytor.com>,
	Denys Vlasenko <dvlasenk@redhat.com>
Subject: Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
Date: Wed, 6 Jun 2018 01:41:50 +0300
Message-ID: <20180605224150.GA2051@avx2>
In-Reply-To: <CA+55aFxTve6FxLuvWyyD88ACOg+3eHKbpWUeuvNZMSB=AFfuqg@mail.gmail.com>

On Tue, Jun 05, 2018 at 10:32:55AM -0700, Linus Torvalds wrote:
> On Tue, Jun 5, 2018 at 10:22 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > Tested? :^) I had P4 maybe ~15(?) years ago.
> 
> Did you EVEN test it on what you have today?
> 
> Do you have any numbers at all, in other words?
> 
> Micro-optimizations need numbers. Otherwise they aren't
> micro-optimizations, they are just "change code randomly".

On my potato the run time drops by about 33% (2.03 s to 1.35 s), sheesh.
And the CPU goes from doing 2 instructions per cycle to 3.

The benchmark is "clear_user(p + 4096 - 4068, 4068)".
The 4068 comes from booting Debian 8 with a printk.

f0(4068) (old clear_user)
--------
$ taskset -c 15 perf stat -r 16 ./a.out

 Performance counter stats for './a.out' (16 runs):

       2033.189084      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.41% )
                 2      context-switches          #    0.001 K/sec                    ( +- 11.11% )
                 0      cpu-migrations            #    0.000 K/sec
                46      page-faults               #    0.023 K/sec                    ( +-  0.91% )
     4,268,425,486      cycles                    #    2.099 GHz                      ( +-  0.41% )
     8,672,326,256      instructions              #    2.03  insn per cycle           ( +-  0.00% )
     2,169,900,710      branches                  # 1067.240 M/sec                    ( +-  0.00% )
         4,226,258      branch-misses             #    0.19% of all branches          ( +-  0.01% )

       2.033700109 seconds time elapsed                                          ( +-  0.41% )

f1(4068) (new clear_user)
--------
$ taskset -c 15 perf stat -r 16 ./a.out

 Performance counter stats for './a.out' (16 runs):

       1345.149992      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.01% )
                 2      context-switches          #    0.002 K/sec                    ( +-  8.35% )
                 0      cpu-migrations            #    0.000 K/sec
                46      page-faults               #    0.034 K/sec                    ( +-  0.82% )
     2,823,965,728      cycles                    #    2.099 GHz                      ( +-  0.01% )
     8,661,733,733      instructions              #    3.07  insn per cycle           ( +-  0.00% )
     2,169,437,410      branches                  # 1612.785 M/sec                    ( +-  0.00% )
         4,216,469      branch-misses             #    0.19% of all branches          ( +-  0.01% )

       1.345375114 seconds time elapsed                                          ( +-  0.01% )
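
For anyone who wants to re-derive the headline figures from the raw counters, here is a minimal sketch in C. The struct and helper names (perf_run, ipc, time_saved, cycle_speedup) are made up for illustration; the numbers are copied from the two perf stat runs above.

```c
#include <stdint.h>

/* Counter values copied from the two perf stat runs above. */
struct perf_run {
	double   task_clock_ms;
	uint64_t cycles;
	uint64_t instructions;
};

static const struct perf_run f0_run = { 2033.189084, 4268425486ULL, 8672326256ULL };
static const struct perf_run f1_run = { 1345.149992, 2823965728ULL, 8661733733ULL };

/* Instructions retired per cycle, the "insn per cycle" column. */
static double ipc(const struct perf_run *r)
{
	return (double)r->instructions / (double)r->cycles;
}

/* Fraction of the old run time that the new version saves. */
static double time_saved(const struct perf_run *before, const struct perf_run *after)
{
	return 1.0 - after->task_clock_ms / before->task_clock_ms;
}

/* How many times fewer cycles the new version needs. */
static double cycle_speedup(const struct perf_run *before, const struct perf_run *after)
{
	return (double)before->cycles / (double)after->cycles;
}
```

time_saved(&f0_run, &f1_run) comes out at roughly a third, and cycle_speedup() at roughly 1.5, while the instruction count barely moves. So the gain is almost entirely in cycles per instruction, not in doing less work.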

-------------------------------------
CFLAGS = -Wall -fno-strict-aliasing -fno-common -fshort-wchar -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -funit-at-a-time -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -fno-stack-protector -fomit-frame-pointer -fno-var-tracking-assignments -g -femit-struct-debug-baseonly -fno-var-tracking -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack


0000000000000780 <f0>:
 780:	mov    rax,rsi
 783:	mov    rcx,rsi
 786:	xor    edx,edx
 788:	and    eax,0x7
 78b:	shr    rcx,0x3
 78f:	mov    esi,0x8
 794:	test   rcx,rcx
 797:	je     7a3 <f0+0x23>
 799:	mov    QWORD PTR [rdi],rdx
 79c:	add    rdi,rsi
 79f:	dec    ecx
 7a1:	jne    799 <f0+0x19>
 7a3:	mov    rcx,rax
 7a6:	test   ecx,ecx
 7a8:	je     7b3 <f0+0x33>
 7aa:	mov    BYTE PTR [rdi],dl
 7ac:	inc    rdi
 7af:	dec    ecx
 7b1:	jne    7aa <f0+0x2a>
 7b3:	mov    rax,rcx
 7b6:	ret    

00000000000007c0 <f1>:
 7c0:	mov    rax,rsi
 7c3:	shr    rsi,0x3
 7c7:	and    eax,0x7
 7ca:	mov    rcx,rsi
 7cd:	test   rcx,rcx
 7d0:	je     7e1 <f1+0x21>
 7d2:	mov    QWORD PTR [rdi],0x0
 7d9:	add    rdi,0x8
 7dd:	dec    ecx
 7df:	jne    7d2 <f1+0x12>
 7e1:	mov    rcx,rax
 7e4:	test   ecx,ecx
 7e6:	je     7f2 <f1+0x32>
 7e8:	mov    BYTE PTR [rdi],0x0
 7eb:	inc    rdi
 7ee:	dec    ecx
 7f0:	jne    7e8 <f1+0x28>
 7f2:	mov    rax,rcx
 7f5:	ret    
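
Both listings have the same shape: n/8 quadword stores followed by n%8 byte stores, returning the leftover count (always 0 here, since nothing faults). The only difference is that f0 stores from a zeroed register (rdx/dl) and adds a register stride (rsi), while f1 uses immediates. A rough userspace model of that shape in C is sketched below. The function name is hypothetical, this is not the kernel's __clear_user(), and plain C cannot choose between register and immediate operands, so a compiler is free to emit either form (or turn the loops into a memset call).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the loops in f0/f1:
 * zero n bytes at p as n/8 eight-byte stores plus n%8 one-byte stores.
 * Returns the number of bytes left uncleared, which is always 0 because
 * there is no fault handling in this model.
 */
static size_t clear_words_then_bytes(unsigned char *p, size_t n)
{
	size_t words = n >> 3;	/* shr rcx,0x3 */
	size_t tail = n & 7;	/* and eax,0x7 */
	const uint64_t zero = 0;

	while (words--) {
		memcpy(p, &zero, sizeof(zero));	/* 8-byte store, alignment-safe */
		p += 8;
	}
	while (tail--)
		*p++ = 0;	/* 1-byte store */

	return 0;
}
```

With the benchmark's arguments, clear_words_then_bytes(p + 4096 - 4068, 4068) does 508 quadword stores and 4 byte stores, and leaves the first 28 bytes of the page untouched.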
