From: Russell King - ARM Linux <linux@armlinux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>,
Al Viro <viro@zeniv.linux.org.uk>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Richard Henderson <rth@twiddle.net>,
Will Deacon <will.deacon@arm.com>,
Haavard Skinnemoen <hskinnemoen@gmail.com>,
Steven Miao <realmz6@gmail.com>,
Jesper Nilsson <jesper.nilsson@axis.com>,
Mark Salter <msalter@redhat.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Richard Kuo <rkuo@codeaurora.org>,
Tony Luck <tony.luck@intel.com>,
Geert Uytterhoeven <geert@linux-m68k.org>,
James Hogan <james.hogan@imgtec.com>,
Michal Simek <monstr@monstr.eu>,
David Howells <dhowells@redhat.com>,
Ley Foon Tan <lftan@altera.com>, Jonas Bonn <Jonas.Nilsson@syno>
Subject: Re: [RFC][CFT][PATCHSET v1] uaccess unification
Date: Fri, 31 Mar 2017 00:21:47 +0100
Message-ID: <20170330232147.GL7909@n2100.armlinux.org.uk>
In-Reply-To: <CA+55aFyQL75SOyx=zn1zWvy+TS-Ockv=O9Q59b_ZQwSeCh7WnQ@mail.gmail.com>
On Thu, Mar 30, 2017 at 01:59:58PM -0700, Linus Torvalds wrote:
> On Thu, Mar 30, 2017 at 1:40 PM, Vineet Gupta
> <Vineet.Gupta1@synopsys.com> wrote:
> >
> > So it's a mixed bag really. Maybe we need some better directed test to really drill
> > it down.
>
> As mentioned in the discussion about ARM, I seriously doubt that the
> inlining will even be noticeable compared to other effects here.
(Sorry to switch sub-threads.)
I'm running tests on that point, concentrating on hdparm -T and perfing
that. You're right in so far as perf identifies the hotspot as the
copy_to_user() function for that workload, rather than the inlined bits
- the top hits in perf of hdparm -T are:
+ 66.52% hdparm [k] __copy_to_user_std
+ 8.49% hdparm [k] generic_file_read_iter
+ 3.82% hdparm [k] lock_acquire
+ 2.80% hdparm [k] copy_page_to_iter
+ 2.49% hdparm [k] find_get_entry
+ 1.19% hdparm [k] lock_release
Note: perf on ARM is affected by IRQ-disabled regions, so hotspots
can be off.
The generic_file_read_iter() one is definitely affected by an IRQ-
disabled region in there.
Here's the average hdparm -T transfer rates and standard deviation over
20 samples:
Unpatched:        Average=320.42 MB/s sigma=0.878657
Uaccess+inline:   Average=318.77 MB/s sigma=1.003332
Uaccess+noinline: Average=319.40 MB/s sigma=1.088354
This ordering - where the noinline version sits between the inlined
version and the unpatched version - shows up in all the
measurements I've done so far, and it points to inlining that code
having a slight detrimental effect. What we don't know is whether
uninlining the code without Al's patch would see a slight boost,
but I'm not about to go there.
However, this all points towards there being a very slight advantage
to dropping the INLINE_COPY_TO_USER and INLINE_COPY_FROM_USER for
ARM, but I'd say it's really down in the noise - I'm not concerned.
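To put a rough number on "down in the noise", the three averages can be
compared against their spread. The following is only a sketch: it assumes
the sigma values are sample standard deviations, that the 20 runs per
kernel are independent, and it uses a plain Welch-style statistic (my
choice of method, not something used in the measurements above). The
function names are made up for illustration.

```python
import math

# Figures copied from the hdparm -T table above: (mean MB/s, sigma).
RESULTS = {
    "unpatched": (320.42, 0.878657),
    "uaccess+inline": (318.77, 1.003332),
    "uaccess+noinline": (319.40, 1.088354),
}
SAMPLES = 20  # runs per kernel, as stated above


def standard_error(sd_a, sd_b, n=SAMPLES):
    """Standard error of the difference of two means (equal n assumed)."""
    return math.sqrt(sd_a ** 2 / n + sd_b ** 2 / n)


def welch_t(a, b, n=SAMPLES):
    """Difference of the two means, expressed in standard errors."""
    mean_a, sd_a = RESULTS[a]
    mean_b, sd_b = RESULTS[b]
    return (mean_a - mean_b) / standard_error(sd_a, sd_b, n)


def relative_gap_pct(a, b):
    """Gap between two means as a percentage of the first one."""
    return (RESULTS[a][0] - RESULTS[b][0]) / RESULTS[a][0] * 100.0


if __name__ == "__main__":
    for a, b in [("unpatched", "uaccess+inline"),
                 ("unpatched", "uaccess+noinline"),
                 ("uaccess+noinline", "uaccess+inline")]:
        print(f"{a} vs {b}: t={welch_t(a, b):.2f}, "
              f"gap={relative_gap_pct(a, b):.2f}%")
```

Under those assumptions the noinline versus inline gap comes out at a
little under two standard errors and about 0.2% of throughput, which fits
the "slight advantage, but in the noise" reading.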
> (On ARM, hopefully the UAO bit is faster to set, but it's still
> "another instruction before and after", so even if it's not as
> expensive as clac/stac are on current x86 chips, it's an argument
> against inlining)
The UAO set/clear does show up as a hotspot within copy_page_to_iter(),
but as we can see, overall it's about 3% of the workload. Within
copy_page_to_iter(), it's the __put_user() based loop inside
fault_in_pages_writeable() which has the hotspot, due to the repeated
enable+disable sequence (or more precisely, the instruction barriers
that sequence needs). Perf reports that the barriers account for 8.33%
and 17.59% of the time spent within that function, so we're actually
talking about maybe .25% and .5% of this workload spent doing the UAO
thing.
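The arithmetic behind those last two figures, as a small sketch. It
assumes the 2.80% copy_page_to_iter() entry from the perf listing is the
"about 3%" base, and that the two barrier percentages are shares of that
function's own time; the helper name is hypothetical.

```python
def share_of_workload(function_pct, within_function_pct):
    """Scale a within-function percentage to a whole-workload percentage."""
    return function_pct * within_function_pct / 100.0


COPY_PAGE_TO_ITER_PCT = 2.80    # from the perf listing above
BARRIER_PCTS = (8.33, 17.59)    # perf's per-barrier share within the function

if __name__ == "__main__":
    for pct in BARRIER_PCTS:
        total = share_of_workload(COPY_PAGE_TO_ITER_PCT, pct)
        print(f"{pct}% of {COPY_PAGE_TO_ITER_PCT}% = {total:.3f}% of the workload")
```

That gives roughly 0.23% and 0.49%, i.e. the ".25% and .5%" quoted above
once rounded.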
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.