From: Alexey Dobriyan <adobriyan@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andrew Morton <akpm@linux-foundation.org>,
Andrew Lutomirski <luto@kernel.org>,
Borislav Petkov <bp@alien8.de>,
Josh Poimboeuf <jpoimboe@redhat.com>, Peter Anvin <hpa@zytor.com>,
Denys Vlasenko <dvlasenk@redhat.com>
Subject: Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
Date: Wed, 6 Jun 2018 01:41:50 +0300
Message-ID: <20180605224150.GA2051@avx2>
In-Reply-To: <CA+55aFxTve6FxLuvWyyD88ACOg+3eHKbpWUeuvNZMSB=AFfuqg@mail.gmail.com>

On Tue, Jun 05, 2018 at 10:32:55AM -0700, Linus Torvalds wrote:
> On Tue, Jun 5, 2018 at 10:22 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > Tested? :^) I had P4 maybe ~15(?) years ago.
>
> Did you EVEN test it on what you have today?
>
> Do you have any numbers at all, in other words?
>
> Micro-optimizations need numbers. Otherwise they aren't
> micro-optimizations, they are just "change code randomly".
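
On my potato the new clear_user() takes 33% less time (2.03 s -> 1.35 s elapsed), sheesh.
And the CPU goes from 2 to 3 instructions per cycle (2.03 -> 3.07 IPC).
The benchmark is "clear_user(p + 4096 - 4068, 4068)", i.e. clearing the
last 4068 bytes of a 4096-byte area starting at p.
The size 4068 comes from booting Debian 8 with printk.
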
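A minimal userspace sketch of that call shape, assuming a plain C stand-in for __clear_user(): it zeroes size/8 eight-byte words and then the size%8 trailing bytes, the same two loops that appear in the disassembly below. The names clear_tail() and bench_once() are hypothetical, and since there is no fault handling here the return value (bytes left uncleared) is always 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for __clear_user(): zero `size` bytes at
 * `dst` as size/8 eight-byte stores followed by size%8 one-byte stores.
 * Returns the number of bytes NOT cleared; with no page-fault handling in
 * this sketch that is always 0. */
static size_t clear_tail(unsigned char *dst, size_t size)
{
	const uint64_t zero = 0;
	size_t words = size / 8;
	size_t bytes = size % 8;

	while (words--) {
		/* memcpy: portable form of a possibly unaligned 8-byte store */
		memcpy(dst, &zero, sizeof(zero));
		dst += sizeof(zero);
	}
	while (bytes--)
		*dst++ = 0;
	return 0;
}

/* The benchmark's call shape: clear the last 4068 bytes of a 4096-byte
 * buffer, i.e. bytes 28..4095, ending exactly at the buffer's end. */
static size_t bench_once(unsigned char *page)
{
	return clear_tail(page + 4096 - 4068, 4068);
}
```

Since 4068 = 508 * 8 + 4, each call issues 508 eight-byte stores and 4 single-byte stores, starting at offset 28. A timing harness would call bench_once() in a loop under perf stat; note that an optimizing compiler may turn clear_tail() into a memset() call, so it is not a substitute for the asm versions being compared.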
f0(4068) (old clear_user)
--------
$ taskset -c 15 perf stat -r 16 ./a.out
Performance counter stats for './a.out' (16 runs):

       2033.189084      task-clock (msec)         #    1.000 CPUs utilized      ( +-  0.41% )
                 2      context-switches          #    0.001 K/sec              ( +- 11.11% )
                 0      cpu-migrations            #    0.000 K/sec
                46      page-faults               #    0.023 K/sec              ( +-  0.91% )
     4,268,425,486      cycles                    #    2.099 GHz                ( +-  0.41% )
     8,672,326,256      instructions              #    2.03  insn per cycle     ( +-  0.00% )
     2,169,900,710      branches                  # 1067.240 M/sec              ( +-  0.00% )
         4,226,258      branch-misses             #    0.19% of all branches    ( +-  0.01% )

       2.033700109 seconds time elapsed                                         ( +-  0.41% )

f1(4068) (new clear_user)
--------
$ taskset -c 15 perf stat -r 16 ./a.out
Performance counter stats for './a.out' (16 runs):

       1345.149992      task-clock (msec)         #    1.000 CPUs utilized      ( +-  0.01% )
                 2      context-switches          #    0.002 K/sec              ( +-  8.35% )
                 0      cpu-migrations            #    0.000 K/sec
                46      page-faults               #    0.034 K/sec              ( +-  0.82% )
     2,823,965,728      cycles                    #    2.099 GHz                ( +-  0.01% )
     8,661,733,733      instructions              #    3.07  insn per cycle     ( +-  0.00% )
     2,169,437,410      branches                  # 1612.785 M/sec              ( +-  0.00% )
         4,216,469      branch-misses             #    0.19% of all branches    ( +-  0.01% )

       1.345375114 seconds time elapsed                                         ( +-  0.01% )
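The headline figures can be re-derived from the counters above. A small sketch; the struct and helper names are made up for illustration, and the numbers are copied from the two perf stat runs.

```c
#include <assert.h>

/* Values copied from the two perf stat runs (elapsed time, cycles,
 * instructions); the struct and names are only for illustration. */
struct perf_run {
	double seconds;
	double cycles;
	double instructions;
};

static const struct perf_run f0_run = { 2.033700109, 4268425486.0, 8672326256.0 };
static const struct perf_run f1_run = { 1.345375114, 2823965728.0, 8661733733.0 };

/* Instructions retired per cycle, as in perf's "insn per cycle" column. */
static double ipc(struct perf_run r)
{
	return r.instructions / r.cycles;
}

/* Fraction of elapsed time saved by `new_run` compared with `old_run`. */
static double time_saved(struct perf_run old_run, struct perf_run new_run)
{
	return 1.0 - new_run.seconds / old_run.seconds;
}

/* How many times faster `new_run` is than `old_run`. */
static double speedup(struct perf_run old_run, struct perf_run new_run)
{
	return old_run.seconds / new_run.seconds;
}
```

This gives about 33.8% less elapsed time (roughly a 1.51x speedup) and IPC of 2.03 vs 3.07. The instruction counts differ by only about 0.1%, so the gain comes from executing nearly the same instructions in fewer cycles.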
-------------------------------------
CFLAGS = -Wall -fno-strict-aliasing -fno-common -fshort-wchar -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -funit-at-a-time -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -fno-stack-protector -fomit-frame-pointer -fno-var-tracking-assignments -g -femit-struct-debug-baseonly -fno-var-tracking -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack
0000000000000780 <f0>:
780: mov rax,rsi
783: mov rcx,rsi
786: xor edx,edx
788: and eax,0x7
78b: shr rcx,0x3
78f: mov esi,0x8
794: test rcx,rcx
797: je 7a3 <f0+0x23>
799: mov QWORD PTR [rdi],rdx
79c: add rdi,rsi
79f: dec ecx
7a1: jne 799 <f0+0x19>
7a3: mov rcx,rax
7a6: test ecx,ecx
7a8: je 7b3 <f0+0x33>
7aa: mov BYTE PTR [rdi],dl
7ac: inc rdi
7af: dec ecx
7b1: jne 7aa <f0+0x2a>
7b3: mov rax,rcx
7b6: ret
00000000000007c0 <f1>:
7c0: mov rax,rsi
7c3: shr rsi,0x3
7c7: and eax,0x7
7ca: mov rcx,rsi
7cd: test rcx,rcx
7d0: je 7e1 <f1+0x21>
7d2: mov QWORD PTR [rdi],0x0
7d9: add rdi,0x8
7dd: dec ecx
7df: jne 7d2 <f1+0x12>
7e1: mov rcx,rax
7e4: test ecx,ecx
7e6: je 7f2 <f1+0x32>
7e8: mov BYTE PTR [rdi],0x0
7eb: inc rdi
7ee: dec ecx
7f0: jne 7e8 <f1+0x28>
7f2: mov rax,rcx
7f5: ret
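f0 and f1 run the same two loops and differ only in how the constants reach the stores: f0 keeps 0 in rdx and the stride 8 in rsi, while f1 encodes both as immediates. Below is a sketch of GCC-style x86-64 inline asm that should produce loops of these two shapes (exact register choice is up to the compiler). It is loosely modelled on the kernel's __clear_user() asm but leaves out its exception-table fixups, stac()/clac() and might_fault(), so it is only meant for userspace experiments. The function names are hypothetical; on other architectures a plain byte loop stands in, which does not reproduce the difference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
/* f0 shape: the constants 0 and 8 are passed in registers. */
static unsigned long clear_regs(void *addr, unsigned long size)
{
	long d0;

	__asm__ __volatile__(
		"	testq  %[size8],%[size8]\n"
		"	jz     4f\n"
		"0:	movq   %[zero],(%[dst])\n"	/* 8-byte store from a register */
		"	addq   %[eight],%[dst]\n"
		"	decl   %%ecx\n"
		"	jnz    0b\n"
		"4:	movq   %[size1],%%rcx\n"		/* tail count: size % 8 */
		"	testl  %%ecx,%%ecx\n"
		"	jz     2f\n"
		"1:	movb   %b[zero],(%[dst])\n"	/* 1-byte store from a register */
		"	incq   %[dst]\n"
		"	decl   %%ecx\n"
		"	jnz    1b\n"
		"2:\n"
		: [size8] "=&c" (size), [dst] "=&D" (d0)
		: [size1] "r" (size & 7), "[size8]" (size / 8), "[dst]" (addr),
		  [zero] "r" (0UL), [eight] "r" (8UL)
		: "memory", "cc");
	(void)d0;
	return size;	/* %rcx is 0 once both loops finish */
}

/* f1 shape: the constants 0 and 8 are encoded as immediates. */
static unsigned long clear_imm(void *addr, unsigned long size)
{
	long d0;

	__asm__ __volatile__(
		"	testq  %[size8],%[size8]\n"
		"	jz     4f\n"
		"0:	movq   $0,(%[dst])\n"
		"	addq   $8,%[dst]\n"
		"	decl   %%ecx\n"
		"	jnz    0b\n"
		"4:	movq   %[size1],%%rcx\n"
		"	testl  %%ecx,%%ecx\n"
		"	jz     2f\n"
		"1:	movb   $0,(%[dst])\n"
		"	incq   %[dst]\n"
		"	decl   %%ecx\n"
		"	jnz    1b\n"
		"2:\n"
		: [size8] "=&c" (size), [dst] "=&D" (d0)
		: [size1] "r" (size & 7), "[size8]" (size / 8), "[dst]" (addr)
		: "memory", "cc");
	(void)d0;
	return size;
}
#else
/* Portable stand-in (no x86-64 asm): a plain byte loop that does NOT
 * reproduce the register-vs-immediate difference. */
static unsigned long clear_bytes(void *addr, unsigned long size)
{
	unsigned char *p = addr;

	while (size) {
		*p++ = 0;
		size--;
	}
	return size;	/* 0: nothing left uncleared */
}
static unsigned long clear_regs(void *a, unsigned long n) { return clear_bytes(a, n); }
static unsigned long clear_imm(void *a, unsigned long n) { return clear_bytes(a, n); }
#endif

/* Hypothetical check: clear the last 4068 bytes of a 4096-byte buffer, as in
 * the benchmark, and verify that exactly bytes 28..4095 became zero. */
static int check_variant(unsigned long (*clear)(void *, unsigned long))
{
	unsigned char page[4096];
	size_t i;

	memset(page, 0xAA, sizeof(page));
	if (clear(page + 4096 - 4068, 4068) != 0)
		return 0;
	for (i = 0; i < sizeof(page); i++)
		if (page[i] != (i < 28 ? 0xAA : 0))
			return 0;
	return 1;
}
```

The addresses in the disassembly show the size trade-off: f0's register store at 0x799 is 3 bytes long, while f1's immediate store at 0x7d2 is 7 bytes, so f1's word loop is 15 bytes (0x7d2-0x7e1) against 10 bytes for f0 (0x799-0x7a3). In exchange, f1 needs no xor/mov setup of rdx and rsi.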

Thread overview: 11+ messages
2018-06-04 12:21 [GIT PULL] x86/asm changes for v4.18 Ingo Molnar
2018-06-05 1:58 ` Linus Torvalds
2018-06-05 15:05 ` x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18") Ingo Molnar
2018-06-05 15:47 ` Linus Torvalds
2018-06-05 17:22 ` Alexey Dobriyan
2018-06-05 17:32 ` Linus Torvalds
2018-06-05 22:41 ` Alexey Dobriyan [this message]
2018-06-05 23:01 ` Linus Torvalds
2018-06-05 23:04 ` Linus Torvalds
2018-06-05 23:20 ` Alexey Dobriyan
2018-06-05 23:27 ` Linus Torvalds