From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>, pabeni@redhat.com
Cc: Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
bp@alien8.de, Peter Anvin <hpa@zytor.com>,
the arch/x86 maintainers <x86@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Andrew Lutomirski <luto@kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
dvlasenk@redhat.com, brgerst@gmail.com,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
Date: Wed, 21 Nov 2018 11:04:54 -0700
Message-ID: <658cdb28-e3e5-c0af-368f-c26daf9986ac@kernel.dk>
In-Reply-To: <CAHk-=wgnjB+=o0771=J_YkQzabU5aadh6pN3x9Vk4HPs3MHL3g@mail.gmail.com>
On 11/21/18 10:27 AM, Linus Torvalds wrote:
> On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni <pabeni@redhat.com> wrote:
>>
>> In my experiments 64 bytes was the break even point for all the CPUs I
>> had handy, but I guess that may change with other models.
>
> Note that experiments with memcpy speed are almost invariably broken.
> microbenchmarks don't show the impact of I$, but they also don't show
> the impact of _behavior_.
>
> For example, there might be things like "repeat strings do cacheline
> optimizations" that end up meaning that cachelines stay in L2, for
> example, and are never brought into L1. That can be a really good
> thing, but it can also mean that now the result isn't as close to the
> CPU, and the subsequent use of the cacheline can be costlier.
Totally agree, which is why all my testing was NOT microbenchmarking.
> I say "go for upping the limit to 128 bytes".
See below...
> That said, if the aio user copy is _so_ critical that it's this
> noticeable, there may be other issues. Sometimes the _real_ cost of small
> user copies is the STAC/CLAC, more so than the "rep movs".
>
> It would be interesting to know exactly which copy it is that matters
> so much... *inlining* the erms case might show that nicely in
> profiles.
Oh I totally agree, which is why I have since gone a different route. The
copy that matters is the copy_from_user() of the iocb, which is 64
bytes. Even for 4k IOs, copying 64 bytes per IO is somewhat
counterproductive for O_DIRECT.
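
To make the 64-byte figure concrete, here is a minimal userspace sketch. It assumes the field layout of struct iocb in <linux/aio_abi.h> (little-endian field order); the names iocb_mirror and iocb_copy_bytes are hypothetical and exist only for this illustration, they are not kernel code.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the AIO control block.
 * Field names follow struct iocb in <linux/aio_abi.h>; the struct
 * itself is an illustration, not the kernel definition.
 */
struct iocb_mirror {
    uint64_t aio_data;        /* returned in the completion event */
    uint32_t aio_key;
    int32_t  aio_rw_flags;
    uint16_t aio_lio_opcode;
    int16_t  aio_reqprio;
    uint32_t aio_fildes;
    uint64_t aio_buf;
    uint64_t aio_nbytes;
    int64_t  aio_offset;
    uint64_t aio_reserved2;
    uint32_t aio_flags;
    uint32_t aio_resfd;
};

/* The fields add up to exactly 64 bytes, with no padding. */
static_assert(sizeof(struct iocb_mirror) == 64,
              "iocb mirror is expected to be 64 bytes");

/* Control-block bytes copied from user space for nr_iocbs submissions. */
static size_t iocb_copy_bytes(size_t nr_iocbs)
{
    return nr_iocbs * sizeof(struct iocb_mirror);
}
```

With O_DIRECT the data payload is not copied through the kernel, so this per-IO control block is the copy that remains on the submission path.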
Playing around with this:
http://git.kernel.dk/cgit/linux-block/commit/?h=aio-poll&id=ed0a0a445c0af4cfd18b0682511981eaf352d483
since we're doing a new sys_io_setup2() for polled aio anyway. This
completely avoids the iocb copy, but that's just for my initial
particular gripe.
diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index db4e5aa0858b..21c4d68c5fac 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -175,8 +175,8 @@ EXPORT_SYMBOL(copy_user_generic_string)
  */
 ENTRY(copy_user_enhanced_fast_string)
 	ASM_STAC
-	cmpl $64,%edx
-	jb .L_copy_short_string	/* less then 64 bytes, avoid the costly 'rep' */
+	cmpl $128,%edx
+	jb .L_copy_short_string	/* less than 128 bytes, avoid costly 'rep' */
 	movl %edx,%ecx
 1:	rep
 	movsb
--
Jens Axboe
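
The effect of the cutoff change on a 64-byte copy can be sketched in userspace C. This is only an illustration under stated assumptions: ERMS_MIN_LEN, pick_copy_path(), copy_short(), copy_rep_movsb() and copy_with_cutoff() are hypothetical names, the byte loop is a stand-in for the kernel's short-copy path, and none of the user-access handling (STAC/CLAC, fault fixups) is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff mirroring the patch: below this, skip 'rep movsb'. */
#define ERMS_MIN_LEN 128

enum copy_path { COPY_SHORT, COPY_REP_MOVSB };

/* Mirrors 'cmpl $cutoff,%edx; jb short': an unsigned "below" test. */
static enum copy_path pick_copy_path(size_t len, size_t cutoff)
{
    /* len == cutoff is not "below", so it still uses 'rep movsb'. */
    return len < cutoff ? COPY_SHORT : COPY_REP_MOVSB;
}

/* Byte-at-a-time stand-in for the short-copy path (illustrative only). */
static void copy_short(unsigned char *dst, const unsigned char *src, size_t len)
{
    while (len--)
        *dst++ = *src++;
}

/* 'rep movsb' on x86-64 with GCC or Clang; plain memcpy elsewhere. */
static void copy_rep_movsb(unsigned char *dst, const unsigned char *src, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(len)
                     :
                     : "memory");
#else
    memcpy(dst, src, len);
#endif
}

/* Dispatch on length the way the patched entry point does. */
static void copy_with_cutoff(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (pick_copy_path(len, ERMS_MIN_LEN) == COPY_SHORT)
        copy_short(dst, src, len);
    else
        copy_rep_movsb(dst, src, len);
}
```

With the old cutoff, pick_copy_path(64, 64) selects the 'rep movsb' path, because 64 is not below 64. With the cutoff at 128, the same 64-byte copy takes the short path instead.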