From: Russell King - ARM Linux <linux@armlinux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Richard Henderson <rth@twiddle.net>,
	Will Deacon <will.deacon@arm.com>,
	Haavard Skinnemoen <hskinnemoen@gmail.com>,
	Steven Miao <realmz6@gmail.com>,
	Jesper Nilsson <jesper.nilsson@axis.com>,
	Mark Salter <msalter@redhat.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	Richard Kuo <rkuo@codeaurora.org>,
	Tony Luck <tony.luck@intel.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	James Hogan <james.hogan@imgtec.com>,
	Michal Simek <monstr@monstr.eu>,
	David Howells <dhowells@redhat.com>,
	Ley Foon Tan <lftan@altera.com>,
	Jonas Bonn <Jonas.Nilsson@syno>
Subject: Re: [RFC][CFT][PATCHSET v1] uaccess unification
Date: Fri, 31 Mar 2017 00:21:47 +0100
Message-ID: <20170330232147.GL7909@n2100.armlinux.org.uk>
In-Reply-To: <CA+55aFyQL75SOyx=zn1zWvy+TS-Ockv=O9Q59b_ZQwSeCh7WnQ@mail.gmail.com>

On Thu, Mar 30, 2017 at 01:59:58PM -0700, Linus Torvalds wrote:
> On Thu, Mar 30, 2017 at 1:40 PM, Vineet Gupta
> <Vineet.Gupta1@synopsys.com> wrote:
> >
> > So it's a mix bag really. Maybe we need some better directed test to really drill
> > it down.
>
> As mentioned in the discussion about ARM, I seriously doubt that the
> inlining will even be noticeable compared to other effects here.

(Sorry to switch sub-threads.)

I'm running tests on that point, concentrating on hdparm -T and perfing
that.  You're right in so far as perf identifies the hotspot as the
copy_to_user() function for that workload, rather than the inlined bits -
the top hits in perf of hdparm -T are:

+   66.52%  hdparm  [k] __copy_to_user_std
+    8.49%  hdparm  [k] generic_file_read_iter
+    3.82%  hdparm  [k] lock_acquire
+    2.80%  hdparm  [k] copy_page_to_iter
+    2.49%  hdparm  [k] find_get_entry
+    1.19%  hdparm  [k] lock_release

Note: perf on ARM is affected by IRQ-disabled regions, so hotspots can
be off.  The generic_file_read_iter() one is definitely affected by an
IRQ-disabled region in there.

Here's the average hdparm -T transfer rates and standard deviation over
20 samples:

Unpatched:         Average=320.42 MB/s sigma=0.878657
Uaccess+inline:    Average=318.77 MB/s sigma=1.003332
Uaccess+noinline:  Average=319.40 MB/s sigma=1.088354

This pattern - where the noinline version sits between the inlined
version and the unpatched version - shows up in all the measurements
I've done so far, and it points to inlining that code having a slight
detrimental effect.  What we don't know is whether uninlining the code
without Al's patch would see a slight boost, but I'm not about to go
there.

However, this all points towards there being a very slight advantage
to dropping the INLINE_COPY_TO_USER and INLINE_COPY_FROM_USER for ARM,
but I'd say it's really down in the noise - I'm not concerned.

> (On ARM, hopefully the UAO bit is faster to set, but it's still
> "another instruction before and after", so even if it's not as
> expensive as clac/stac are on current x86 chips, it's an argument
> against inlining)

The UAO set/clear does show up as a hotspot within copy_page_to_iter(),
but as we can see, overall it's about 3% of the workload.  Within
copy_page_to_iter(), it's the __put_user() based loop inside
fault_in_pages_writeable() which has the hotspot, due to the repeated
enable+disable sequence (more the instruction barriers that we need.)
Perf reports that the barriers account for 8.33 and 17.59% of the time
spent within that function, so we're actually talking about maybe .25%
and .5% of this workload spent doing the UAO thing.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
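
The arithmetic behind the ".25% and .5%" estimate and the size of the
hdparm -T differences can be reproduced with a short sketch.  It assumes
"that function" refers to the 2.80% copy_page_to_iter() entry in the perf
output; the function and variable names are hypothetical, chosen only for
illustration, and all numbers are the ones quoted in the message.

```python
# Figures quoted in the message; nothing here is newly measured.
COPY_PAGE_TO_ITER_PCT = 2.80      # % of workload (assumed to be "that function")
BARRIER_PCTS = (8.33, 17.59)      # % of that function attributed to the barriers

# hdparm -T mean transfer rates in MB/s (20 samples each).
MEAN_RATES = {
    "unpatched": 320.42,
    "uaccess+inline": 318.77,
    "uaccess+noinline": 319.40,
}


def workload_share(function_pct, within_pct):
    """Percent of the whole workload for a hotspot inside one function."""
    return function_pct * within_pct / 100.0


def relative_change(name, baseline="unpatched"):
    """Percent change in mean transfer rate relative to the baseline."""
    base = MEAN_RATES[baseline]
    return (MEAN_RATES[name] - base) / base * 100.0


if __name__ == "__main__":
    for pct in BARRIER_PCTS:
        share = workload_share(COPY_PAGE_TO_ITER_PCT, pct)
        print(f"barrier at {pct}% of function: {share:.2f}% of workload")
    for name in ("uaccess+inline", "uaccess+noinline"):
        print(f"{name}: {relative_change(name):+.2f}% vs unpatched")
```

Under that assumption the two barriers come to roughly 0.23% and 0.49%
of the workload, and both patched kernels are within about half a percent
of the unpatched transfer rate.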