From: Vineet Gupta <Vineet.Gupta1@synopsys.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
    "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
    "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
    Richard Henderson <rth@twiddle.net>, Russell King <linux@armlinux.org.uk>,
    Will Deacon <will.deacon@arm.com>, Haavard Skinnemoen <hskinnemoen@gmail.com>,
    Steven Miao <realmz6@gmail.com>, Jesper Nilsson <jesper.nilsson@axis.com>,
    Mark Salter <msalter@redhat.com>, Yoshinori Sato <ysato@users.sourceforge.jp>,
    Richard Kuo <rkuo@codeaurora.org>, Tony Luck <tony.luck@intel.com>,
    Geert Uytterhoeven <geert@linux-m68k.org>, James Hogan <james.hogan@imgtec.com>,
    Michal Simek <monstr@monstr.eu>, David Howells <dhowells@redhat.com>,
    Ley Foon Tan <lftan@altera.com>, Jonas Bonn <Jonas.Nilsson@syn>
Subject: Re: [RFC][CFT][PATCHSET v1] uaccess unification
Date: Thu, 30 Mar 2017 13:40:31 -0700
Message-ID: <efb7aaa4-7d25-0c68-ebf8-cdd7eb1297dc@synopsys.com>
In-Reply-To: <CA+55aFyGwYwdk8i7-GbXV7NLTn38e-bow3VD-hHcQmTr9ebAjw@mail.gmail.com>

On 03/29/2017 05:27 PM, Linus Torvalds wrote:
> On Wed, Mar 29, 2017 at 5:02 PM, Vineet Gupta
> <Vineet.Gupta1@synopsys.com> wrote:
>>
>> I guess I can in next day or two - but mind you the inline version for ARC is kind
>> of special vs. other arches. We have this "manual" constant propagation to elide
>> the unrolled LD/ST for 1-15 byte stragglers, when @sz is constant.
>
> I don't think that's special. We do that on x86 too, and I suspect ARC
> copied it from there (or from somebody else who did it).

No, I (re)wrote that code and AFAIKR didn't copy it from anyone, and AFAICS it is
certainly different from the others, if not special. If you look closely at
arc:access.h, it is not the trivial check for 1-2-4 conversion as in the commit
you referred to.
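
For contrast with a plain 1-2-4 special case, the idea of eliding the 1-15 byte straggler code for a constant size can be sketched in ordinary userspace C. This is a hypothetical illustration, not the actual ARC uaccess code (which is inline assembly with exception handling): a bulk loop moves 16 bytes per iteration as four 4-byte loads followed by four 4-byte stores, and the tail is a ladder of tests on bits 8, 4, 2 and 1 of the size. The names copy16() and sketch_copy() are made up for this sketch. The assumption is that once the function is inlined with a compile-time constant size, GCC or Clang with optimization enabled folds each test, so only the needed tail moves survive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace illustration, NOT the ARC kernel code.
 * Moves one 16-byte block as four 4-byte loads followed by four
 * 4-byte stores, loosely mimicking "4 LD/ST in flight". memcpy()
 * into a uint32_t is the portable way to do an unaligned 4-byte access.
 */
static inline void copy16(unsigned char *d, const unsigned char *s)
{
	uint32_t w0, w1, w2, w3;

	memcpy(&w0, s, 4);
	memcpy(&w1, s + 4, 4);
	memcpy(&w2, s + 8, 4);
	memcpy(&w3, s + 12, 4);
	memcpy(d, &w0, 4);
	memcpy(d + 4, &w1, 4);
	memcpy(d + 8, &w2, 4);
	memcpy(d + 12, &w3, 4);
}

/*
 * Copy n bytes: a bulk loop over whole 16-byte blocks, then a 0-15 byte
 * tail. When inlined with a constant n, each "if (tail & X)" below is a
 * compile-time constant, so an optimizing compiler can drop the dead
 * branches (e.g. n == 6 keeps only the 4-byte and 2-byte moves).
 * Returns the number of bytes NOT copied, following the copy_from_user()
 * convention; always 0 here because ordinary memory cannot fault.
 */
static inline size_t sketch_copy(void *to, const void *from, size_t n)
{
	unsigned char *d = to;
	const unsigned char *s = from;
	size_t blocks = n >> 4;   /* whole 16-byte blocks */
	size_t tail = n & 15;     /* 0..15 straggler bytes */

	assert((blocks << 4) + tail == n);   /* the split covers every byte */

	while (blocks--) {
		copy16(d, s);
		d += 16;
		s += 16;
	}

	if (tail & 8) {
		memcpy(d, s, 8);
		d += 8;
		s += 8;
	}
	if (tail & 4) {
		memcpy(d, s, 4);
		d += 4;
		s += 4;
	}
	if (tail & 2) {
		memcpy(d, s, 2);
		d += 2;
		s += 2;
	}
	if (tail & 1)
		*d = *s;

	return 0;
}
```

For example, a call such as sketch_copy(dst, src, 23) would, under the stated assumptions, typically reduce to one 16-byte block plus 4-, 2- and 1-byte moves (23 = 16 + 4 + 2 + 1), with no runtime tests on the size.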

The ARC version actually tries to eliminate hunks of the inline assembly at compile
time for a constant @sz (so it is designed purely for the inlined variants; whether
that matters or not is a different story). The thing is, from the hardware POV,
having 4 LD/ST in flight is good (at least for ARC700 cores), so we wrap them up in
a zero delay loop. This takes care of multiples of 16 bytes; the last 15 bytes are
the killer, since they require a bunch of conditionals, which is what I try to
eliminate.

FWIW, I experimented with uaccess inlining on ARC:
 1. pristine 4.11-rc1 (all inline)
 2. inline + disabling the "smart" const propagation
 3. out-of-line only variants (which already existed as the default on ARC for
    -Os, but were hacked in for the current -O3 build)

Below are LMBench FS latency numbers (run off tmpfs to avoid any device-related
perturbation). Note that LMBench already runs each test several times itself, and
each row below is obviously from a fresh reboot since the kernels were different.

So it seems 0K file create/delete gets worse without the smart inline, while 10K
gets better. Mmap (16K) got worse as well. With out of line, some got better while
others got worse.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS    0K File      10K File     Mmap   Prot   Page   100fd
                        Create Delete Create Delete Latency Fault   Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
170329-v4 Linux 4.11.0-  124.3   75.3  734.2  147.8  2200.0 6.205    10.9  87.6
170330-v4 Linux 4.11.0-  154.9   88.3  709.2  131.2  2494.0 4.056    11.0  91.1
170330-v4 Linux 4.11.0-  157.7   69.8  622.7  140.8  2168.0 5.654    10.8  91.0

Compare that to data for:
 1. pristine 4.11-rc1 (all inline)
 2. Al's series + ARC forced inline
 3. Al's series + ARC forced NOT inline

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS    0K File      10K File     Mmap   Prot   Page   100fd
                        Create Delete Create Delete Latency Fault   Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
170329-v4 Linux 4.11.0-  124.3   75.3  734.2  147.8  2200.0 6.205    10.9  87.6
170329-v4 Linux 4.11.0-  141.2   63.4  629.7  130.0  2172.0 5.796    10.8  90.0
170329-v4 Linux 4.11.0-  154.9   89.2  691.6  147.7  2323.0 4.922    10.8  92.3

So it's a mixed bag really. Maybe we need a better-directed test to really drill
down into it.

> But at least on x86 is is limited entirely to the "__" versions, and
> it's almost entirely pointless. We actually removed some of that kind
> of code because it was *do* pointless, and it had just been copied
> around into the "atomic" versions too.
>
> See for example commit bd28b14591b9 ("x86: remove more uaccess_32.h
> complexity"), which did that.
>
> The basic "__" versions still do that constant-size thing, but they
> really are questionable.

Perhaps because the scope of constant usage was pretty narrow: it would only
benefit if *copy_from_user() were called with 1, 2 or 4, which is relatively
unlikely since we already have __get_user() and friends for that.

> Exactly because it's just the "__" versions -
> the *regular* "copy_to/from_user()" is an unconditional function call,
> because inlining it isn't just the access operations, it's the size
> check, and on modern x86 it's also the "set AC to mark the user access
> as safe".

So what you are saying is that it is relatively costly on x86 because of SMAP,
which may not be true for arches without such hardware support.

Note that I'm not arguing for or against inlining per se; it seems it doesn't
matter.

-Vineet
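
To put rough numbers on the "mixed bag", a small C program can compute each run's percent change against the pristine 4.11-rc1 row (the first row of both LMBench tables) and count how many of the eight columns got slower. The values are copied from the two tables; the array and function names are hypothetical. With a single result per kernel there is no variance estimate, so small deltas (such as the 0.1 microsecond page fault differences) may well be noise.

```c
#include <assert.h>
#include <stdio.h>

/* The eight LMBench columns from the tables; all are latencies, lower is better. */
#define NCOLS 8

static const char *const col_names[NCOLS] = {
	"0K create", "0K delete", "10K create", "10K delete",
	"mmap", "prot fault", "page fault", "100fd selct",
};

/* First row of both tables: pristine 4.11-rc1 (all inline). */
static const double baseline[NCOLS] =
	{ 124.3, 75.3, 734.2, 147.8, 2200.0, 6.205, 10.9, 87.6 };

/* First table, rows 2 and 3 (names are made up for this sketch). */
static const double inline_no_constprop[NCOLS] =
	{ 154.9, 88.3, 709.2, 131.2, 2494.0, 4.056, 11.0, 91.1 };
static const double out_of_line[NCOLS] =
	{ 157.7, 69.8, 622.7, 140.8, 2168.0, 5.654, 10.8, 91.0 };

/* Second table, rows 2 and 3: Al's series with ARC forced inline / NOT inline. */
static const double al_forced_inline[NCOLS] =
	{ 141.2, 63.4, 629.7, 130.0, 2172.0, 5.796, 10.8, 90.0 };
static const double al_forced_noinline[NCOLS] =
	{ 154.9, 89.2, 691.6, 147.7, 2323.0, 4.922, 10.8, 92.3 };

/* Percent change versus the baseline; positive means slower. */
static double pct_change(double base, double v)
{
	assert(base > 0.0);   /* avoid dividing by zero */
	return 100.0 * (v - base) / base;
}

/* How many columns got slower than the baseline. */
static int count_worse(const double run[NCOLS])
{
	int worse = 0;

	for (int i = 0; i < NCOLS; i++)
		if (run[i] > baseline[i])
			worse++;
	return worse;
}

/* Print one run's per-column deltas against the baseline. */
static void print_deltas(const char *label, const double run[NCOLS])
{
	printf("%s: %d of %d columns slower\n", label, count_worse(run), NCOLS);
	for (int i = 0; i < NCOLS; i++)
		printf("  %-12s %+6.1f%%\n", col_names[i],
		       pct_change(baseline[i], run[i]));
}
```

Against that baseline, the inline run without the smart const propagation is slower in 5 of 8 columns, the out-of-line run and Al's series with forced inline in 2 each, and Al's series with forced out-of-line in 4, which matches the "mixed bag" reading.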