From: David Howells <dhowells@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: dhowells@redhat.com, Borislav Petkov <bp@alien8.de>,
	kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Christian Brauner <christian@brauner.io>,
	Matthew Wilcox <willy@infradead.org>,
	David Laight <David.Laight@aculab.com>,
	ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
Date: Mon, 20 Nov 2023 13:32:57 +0000
Message-ID: <2284219.1700487177@warthog.procyon.org.uk>
In-Reply-To: <CAHk-=wiRQHD5xnB8H9Lwk9fJPDpfVNAwPS4KLnfrcrU3zbMAdQ@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So I don't think we should use either of these benchmarks as a "we
> need to optimize for *this*", but it is another example of how much
> memcpy() does matter. Even if the end result is then "but different
> microarchitectures react so differently that we can't please
> everybody".

So what, if anything, should I change?  Should I make it directly call
__memcpy?  Or should we just leave it to the compiler?  I would prefer to
leave memcpy_from_iter() and memcpy_to_iter() as __always_inline to eliminate
the function pointer call we otherwise end up with and to eliminate the return
value (which is always 0 in this case).
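
A minimal userspace sketch of why that matters, not the kernel code: the
names iterate_segs(), copy_to_seg(), copy_to_segs() and struct seg are
hypothetical, and the step signature is only modelled on the one in
lib/iov_iter.c.  The assumption is that once both the walker and the step
function are forced inline, the compiler can see the step pointer is a
constant and that the step always returns 0, so the indirect call and the
"remain" handling can be dropped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical step type: returns the number of bytes NOT copied. */
typedef size_t (*step_fn)(void *iter_base, size_t progress, size_t len,
			  void *priv, void *priv2);

/* Illustrative stand-in for one contiguous destination segment. */
struct seg {
	void *base;
	size_t len;
};

static inline __attribute__((always_inline))
size_t copy_to_seg(void *iter_to, size_t progress, size_t len,
		   void *from, void *priv2)
{
	(void)priv2;
	memcpy(iter_to, (char *)from + progress, len);
	return 0;	/* a plain memcpy cannot fail: nothing left over */
}

static inline __attribute__((always_inline))
size_t iterate_segs(const struct seg *segs, size_t nsegs, size_t len,
		    void *priv, void *priv2, step_fn step)
{
	size_t progress = 0;

	for (size_t i = 0; i < nsegs && progress < len; i++) {
		size_t part = segs[i].len;
		size_t remain;

		if (part > len - progress)
			part = len - progress;
		/* With everything inlined this becomes a direct call. */
		remain = step(segs[i].base, progress, part, priv, priv2);
		progress += part - remain;
		if (remain)	/* dead code when step() always returns 0 */
			break;
	}
	return progress;
}

/* Out-of-line entry point: the only place the step function is named. */
size_t copy_to_segs(const struct seg *segs, size_t nsegs,
		    const void *from, size_t len)
{
	return iterate_segs(segs, nsegs, len, (void *)from, NULL, copy_to_seg);
}
```

Whether the compiler really removes the indirect call and the remainder
check depends on the optimisation level, so this is worth confirming in
the generated assembly rather than taking on trust.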

How about something along the attached lines?  (With the inline asm
appropriately pushed out to an arch header file).
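
As a rough idea of the shape such a header helper might take, here is a
userspace-only sketch with the ALTERNATIVE() patching and the memcpy_orig
fallback stripped out.  The name copy_rep_movsb() is hypothetical, and the
non-x86-64 branch simply falls back to memcpy() so the example builds
anywhere; it is not a proposal for the actual arch code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes and return the original destination. */
static inline void *copy_rep_movsb(void *to, const void *from, size_t len)
{
	void *ret = to;

#if defined(__x86_64__) && defined(__GNUC__)
	/*
	 * 'rep movsb' copies RCX bytes from [RSI] to [RDI] and updates all
	 * three registers, so each is an in/out operand.  The "memory"
	 * clobber tells the compiler that memory it cannot see was touched.
	 */
	__asm__ __volatile__("rep movsb"
			     : "+D" (to), "+S" (from), "+c" (len)
			     :
			     : "memory");
#else
	/* Portable fallback so the sketch compiles on other targets. */
	memcpy(to, from, len);
#endif
	return ret;
}
```

Unlike the kernel version, nothing here selects between implementations at
boot, so it says nothing about how the copy performs on CPUs without FSRM.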

On the other hand, it might be better to have memcpy_to/from_iter() just call
__memcpy() as it doesn't seem to make much difference to the time taken and
the inline func can still return a constant 0 return value that can be
optimised away.

David
---

diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index 0ae2e1712e2e..7354982dc433 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -43,7 +43,7 @@ EXPORT_SYMBOL(__memcpy)
 SYM_FUNC_ALIAS_MEMFUNC(memcpy, __memcpy)
 EXPORT_SYMBOL(memcpy)
 
-SYM_FUNC_START_LOCAL(memcpy_orig)
+SYM_TYPED_FUNC_START(memcpy_orig)
 	movq %rdi, %rax
 
 	cmpq $0x20, %rdx
@@ -169,4 +169,4 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
 .Lend:
 	RET
 SYM_FUNC_END(memcpy_orig)
-
+EXPORT_SYMBOL(memcpy_orig)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index de7d11cf4c63..de73edb9ffcc 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -62,7 +62,17 @@ static __always_inline
 size_t memcpy_to_iter(void *iter_to, size_t progress,
 		      size_t len, void *from, void *priv2)
 {
-	memcpy(iter_to, from + progress, len);
+	size_t len2 = len;
+	from += progress;
+	/*
+	 * If CPU has FSRM feature, use 'rep movsb'.
+	 * Otherwise, call memcpy_orig.
+	 */
+	asm volatile(
+		ALTERNATIVE("rep movsb",
+			    "call memcpy_orig", ALT_NOT(X86_FEATURE_FSRM))
+		:"+D" (iter_to), "+S" (from), "+d" (len), "+c"(len2), ASM_CALL_CONSTRAINT
+		:: "memory", "rax", "r8", "r9", "r10", "r11");
 	return 0;
 }
 
@@ -70,7 +80,18 @@ static __always_inline
 size_t memcpy_from_iter(void *iter_from, size_t progress,
 			size_t len, void *to, void *priv2)
 {
-	memcpy(to + progress, iter_from, len);
+	size_t len2 = len;
+	to += progress;
+	/*
+	 * If CPU has FSRM feature, use 'rep movsb'.
+	 * Otherwise, call memcpy_orig.
+	 */
+	asm volatile(
+		ALTERNATIVE("rep movsb",
+			    "call memcpy_orig",
+			    ALT_NOT(X86_FEATURE_FSRM))
+		:"+D" (to), "+S" (iter_from), "+d" (len), "+c" (len2), ASM_CALL_CONSTRAINT
+		:: "memory", "rax", "r8", "r9", "r10", "r11");
 	return 0;
 }
 

