Re: [PATCH] x86: only use ERMS for user copies for larger sizes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@kernel.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Brian Gerst <brgerst@gmail.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
Date: Wed, 21 Nov 2018 07:36:09 +0100	[thread overview]
Message-ID: <20181121063609.GA109082@gmail.com> (raw)
In-Reply-To: <02bfc577-32a5-66be-64bf-d476b7d447d2@kernel.dk>


[ Cc:-ed a few other gents and lkml. ]

* Jens Axboe <axboe@kernel.dk> wrote:

> Hi,
> 
> So this is a fun one... While I was doing the aio polled work, I noticed
> that the submitting process spent a substantial amount of time copying
> data to/from userspace. For aio, that's iocb and io_event, which are 64
> and 32 bytes respectively. Looking closer at this, and it seems that
> ERMS rep movsb is SLOWER for smaller copies, due to a higher startup
> cost.
> 
> I came up with this hack to test it out, and low and behold, we now cut
> the time spent in copying in half. 50% less.
> 
> Since these kinds of patches tend to lend themselves to bike shedding, I
> also ran a string of kernel compilations out of RAM. Results are as
> follows:
> 
> Patched	: 62.86s avg, stddev 0.65s
> Stock	: 63.73s avg, stddev 0.67s
> 
> which would also seem to indicate that we're faster punting smaller
> (< 128 byte) copies.
> 
> CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
> 
> Interestingly, text size is smaller with the patch as well?!
> 
> I'm sure there are smarter ways to do this, but results look fairly
> conclusive. FWIW, the behaviorial change was introduced by:
> 
> commit 954e482bde20b0e208fd4d34ef26e10afd194600
> Author: Fenghua Yu <fenghua.yu@intel.com>
> Date:   Thu May 24 18:19:45 2012 -0700
> 
>     x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature
> 
> which contains nothing in terms of benchmarking or results, just claims
> that the new hotness is better.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> 
> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> index a9d637bc301d..7dbb78827e64 100644
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -29,16 +29,27 @@ copy_user_generic(void *to, const void *from, unsigned len)
>  {
>  	unsigned ret;
>  
> +	/*
> +	 * For smaller copies, don't use ERMS as it's slower.
> +	 */
> +	if (len < 128) {
> +		alternative_call(copy_user_generic_unrolled,
> +				 copy_user_generic_string, X86_FEATURE_REP_GOOD,
> +				 ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
> +					     "=d" (len)),
> +				 "1" (to), "2" (from), "3" (len)
> +				 : "memory", "rcx", "r8", "r9", "r10", "r11");
> +		return ret;
> +	}
> +
>  	/*
>  	 * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
>  	 * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
>  	 * Otherwise, use copy_user_generic_unrolled.
>  	 */
>  	alternative_call_2(copy_user_generic_unrolled,
> -			 copy_user_generic_string,
> -			 X86_FEATURE_REP_GOOD,
> -			 copy_user_enhanced_fast_string,
> -			 X86_FEATURE_ERMS,
> +			 copy_user_generic_string, X86_FEATURE_REP_GOOD,
> +			 copy_user_enhanced_fast_string, X86_FEATURE_ERMS,
>  			 ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
>  				     "=d" (len)),
>  			 "1" (to), "2" (from), "3" (len)

So I'm inclined to do something like yours, because clearly the changelog 
of 954e482bde20 was at least partly false: Intel can say whatever they 
want, it's a fact that ERMS has high setup costs for low buffer sizes - 
ERMS is optimized for large size, cache-aligned copies mainly.

But the result is counter-intuitive in terms of kernel text footprint, 
plus the '128' is pretty arbitrary - we should at least try to come up 
with a break-even point where manual copy is about as fast as ERMS - on 
at least a single CPU ...

Thanks,

	Ingo

next prev parent reply	other threads:[~2018-11-21  6:36 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <02bfc577-32a5-66be-64bf-d476b7d447d2@kernel.dk>
2018-11-20 20:24 ` [PATCH] x86: only use ERMS for user copies for larger sizes Jens Axboe
2018-11-21  6:36 ` Ingo Molnar [this message]
2018-11-21 13:32   ` Jens Axboe
2018-11-21 13:44     ` Denys Vlasenko
2018-11-22 17:36       ` David Laight
2018-11-22 17:52         ` Linus Torvalds
2018-11-22 18:06           ` Andy Lutomirski
2018-11-22 18:58             ` Linus Torvalds
2018-11-23  9:34               ` David Laight
2018-11-23 10:12                 ` David Laight
2018-11-23 16:36                   ` Linus Torvalds
2018-11-23 17:42                     ` Linus Torvalds
2018-11-23 18:39                       ` Andy Lutomirski
2018-11-23 18:44                         ` Linus Torvalds
2018-11-23 19:11                           ` Andy Lutomirski
2018-11-26 10:12                             ` David Laight
2018-11-26 10:01                     ` David Laight
2018-11-26 10:26                     ` David Laight
2019-01-05  2:38                       ` Linus Torvalds
2019-01-07  9:55                         ` David Laight
2019-01-07 17:43                           ` Linus Torvalds
2019-01-08  9:10                             ` David Laight
2019-01-08 18:01                               ` Linus Torvalds
2018-11-21 13:45     ` Paolo Abeni
2018-11-21 17:27       ` Linus Torvalds
2018-11-21 18:04         ` Jens Axboe
2018-11-21 18:26           ` Andy Lutomirski
2018-11-21 18:43             ` Linus Torvalds
2018-11-21 22:38               ` Andy Lutomirski
2018-11-21 18:16         ` Linus Torvalds
2018-11-21 19:01           ` Linus Torvalds
2018-11-22 10:32             ` Ingo Molnar
2018-11-22 11:13               ` Ingo Molnar
2018-11-22 11:21                 ` Ingo Molnar
2018-11-23 16:40                 ` Josh Poimboeuf
2018-11-22 16:55               ` Linus Torvalds
2018-11-22 17:26                 ` Andy Lutomirski
2018-11-22 17:35                   ` Linus Torvalds
2018-11-24  6:09           ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181121063609.GA109082@gmail.com \
    --to=mingo@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=dvlasenk@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.