public inbox for linux-kernel@vger.kernel.org
From: David Laight <David.Laight@ACULAB.COM>
To: 'Linus Torvalds' <torvalds@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>
Cc: David Howells <dhowells@redhat.com>,
	kernel test robot <oliver.sang@intel.com>,
	"oe-lkp@lists.linux.dev" <oe-lkp@lists.linux.dev>,
	"lkp@intel.com" <lkp@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Christian Brauner <christian@brauner.io>,
	Matthew Wilcox <willy@infradead.org>,
	"ying.huang@intel.com" <ying.huang@intel.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"fengwei.yin@intel.com" <fengwei.yin@intel.com>
Subject: RE: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
Date: Thu, 16 Nov 2023 10:07:35 +0000
Message-ID: <4c0c3ee6cfa84d21a807055bc1aa27b8@AcuMS.aculab.com>
In-Reply-To: <CAHk-=whrc-ruKs4Kt90EGzKd+pYhZFKs6bgBVCV=55BK+p1nzg@mail.gmail.com>

From: Linus Torvalds
> Sent: 15 November 2023 20:07
...
>  - our current "memcpy_orig" fallback does unrolled copy loops, and
> the rep_movs_alternative fallback obviously doesn't.
> 
> It's not clear that the unrolled copy loops matter for the in-kernel
> kinds of copies, but who knows. The memcpy_orig code is definitely
> trying to be smarter in some other ways too. So the fallback should
> try a *bit* harder than I did, and not just with the whole "don't try
> to handle exceptions" issue I mentioned.

I'm pretty sure the unrolled copy (and other unrolled loops)
just wastes I-cache and slows things down when the cache is cold.

With out-of-order execution on most x86 CPUs (except Atoms) you
don't really have to worry about the memory latency.
So get the loop control instructions to run in parallel with
the memory access ones and you can copy one word every clock.
I never managed a single-clock loop, but you can get a two-clock
loop (with 2 reads and 2 writes in it).

So unrolling once is typically enough.
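
As a rough illustration of that loop shape (a sketch only, not the
kernel's memcpy; copy_unrolled_once is a made-up name), each
iteration below does two 8-byte reads and two 8-byte writes and makes
no attempt to align either pointer. It assumes the compiler turns the
fixed-size memcpy() calls into plain unaligned loads and stores, which
is the usual outcome on x86 with optimisation enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes, two 64-bit words per iteration. */
static void *copy_unrolled_once(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(len == 0 || (dst != NULL && src != NULL));

	/* Main loop, unrolled once: 2 reads and 2 writes per iteration. */
	while (len >= 16) {
		uint64_t a, b;

		memcpy(&a, s, 8);	/* possibly unaligned 8-byte read */
		memcpy(&b, s + 8, 8);
		memcpy(d, &a, 8);	/* possibly unaligned 8-byte write */
		memcpy(d + 8, &b, 8);
		s += 16;
		d += 16;
		len -= 16;
	}

	/* Tail: at most 15 bytes remain. */
	while (len--)
		*d++ = *s++;

	return dst;
}
```

Whether this actually reaches two clocks per iteration depends on the
code the compiler generates and on the CPU, so measure before relying
on it.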

You can also ignore alignment; the extra cost is minimal (on
Intel CPUs at least). I think it requires an extra u-op when
the copy crosses a cache line boundary.
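
To put a number on how often that happens, here is a small sketch.
The 64-byte line size is an assumption (it holds for current x86
parts), and both function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u	/* assumption: typical x86 line size */

/* True if an access of 'size' bytes at 'addr' straddles two cache lines. */
static bool crosses_cache_line(uintptr_t addr, size_t size)
{
	assert(size > 0 && size <= CACHE_LINE_SIZE);
	return (addr % CACHE_LINE_SIZE) + size > CACHE_LINE_SIZE;
}

/* Count the start offsets within one line at which the access straddles. */
static unsigned int count_crossing_offsets(size_t size)
{
	unsigned int n = 0;

	for (uintptr_t off = 0; off < CACHE_LINE_SIZE; off++)
		n += crosses_cache_line(off, size);
	return n;
}
```

For 8-byte accesses only offsets 57 to 63 straddle a line, so a
sequential word copy from a misaligned start pays the extra cost on
one access in eight.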

On Haswell (which is now quite old) both 'rep movsb' and
'rep movsq' copy 16 bytes/clock, unless the destination
is 32-byte aligned, in which case they copy 32 bytes/clock.
Source alignment makes no difference, and neither does byte
alignment.
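
For anyone wanting to try this from user space, a minimal wrapper
might look like the sketch below. It assumes GCC or Clang extended asm
on x86-64 and falls back to memcpy() elsewhere; copy_rep_movsb and
haswell_rep_movs_rate are illustrative names, and the second function
only restates the Haswell figures above, it does not measure anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper around 'rep movsb' (direction flag clear per the ABI). */
static void *copy_rep_movsb(void *dst, const void *src, size_t len)
{
	assert(len == 0 || (dst != NULL && src != NULL));
#if defined(__x86_64__) && defined(__GNUC__)
	void *ret = dst;

	/* RDI = destination, RSI = source, RCX = byte count. */
	__asm__ volatile("rep movsb"
			 : "+D"(dst), "+S"(src), "+c"(len)
			 :
			 : "memory");
	return ret;
#else
	return memcpy(dst, src, len);	/* portable fallback */
#endif
}

/* Bytes/clock quoted above for Haswell, keyed on destination alignment. */
static unsigned int haswell_rep_movs_rate(uintptr_t dst_addr)
{
	return (dst_addr % 32 == 0) ? 32 : 16;
}
```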

Another -Os stupidity is 'push $x; pop %reg' to load
a signed byte constant.
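
For reference, the size difference the compiler is chasing can be
seen from the instruction encodings. The sketch below spells them out
for the constant 0x7f and %rax; the array and function names are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* push $0x7f ; pop %rax  (2 + 1 bytes) */
static const unsigned char push_pop[] = { 0x6a, 0x7f, 0x58 };

/* mov $0x7f, %eax  (opcode + 32-bit immediate) */
static const unsigned char mov_imm32[] = { 0xb8, 0x7f, 0x00, 0x00, 0x00 };

static_assert(sizeof(push_pop) < sizeof(mov_imm32),
	      "push/pop is the shorter encoding");

/* Bytes of code saved by the push/pop form. */
static size_t bytes_saved_by_push_pop(void)
{
	return sizeof(mov_imm32) - sizeof(push_pop);
}
```

So -Os saves two bytes per constant, at the price of a store to and
a load from the stack where the 'mov' needs neither.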

	David


