From: David Laight <David.Laight@ACULAB.COM>
To: 'David Howells' <dhowells@redhat.com>
Cc: 'Linus Torvalds' <torvalds@linux-foundation.org>,
Borislav Petkov <bp@alien8.de>,
kernel test robot <oliver.sang@intel.com>,
"oe-lkp@lists.linux.dev" <oe-lkp@lists.linux.dev>,
"lkp@intel.com" <lkp@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Christian Brauner <brauner@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
Christian Brauner <christian@brauner.io>,
Matthew Wilcox <willy@infradead.org>,
"ying.huang@intel.com" <ying.huang@intel.com>,
"feng.tang@intel.com" <feng.tang@intel.com>,
"fengwei.yin@intel.com" <fengwei.yin@intel.com>
Subject: RE: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
Date: Thu, 16 Nov 2023 11:38:08 +0000
Message-ID: <cde76a2d8ed54885a77771894ecd3e99@AcuMS.aculab.com>
In-Reply-To: <97468.1700129643@warthog.procyon.org.uk>
From: David Howells
> Sent: 16 November 2023 10:14
>
> David Laight <David.Laight@ACULAB.COM> wrote:
>
> > On haswell (which is now quite old) both 'rep movsb' and
> > 'rep movsq' copy 16 bytes/clock unless the destination
> > is 32 byte aligned when they copy 32 bytes/clock.
> > Source alignment makes no difference, neither does byte
> > alignment.
>
> I think the i3-4170 cpu I'm using is Haswell. Does that mean for my
> particular cpu, just using inline "rep movsb" is the best choice?
I've just looked at a slightly old copy of the instruction timing
doc from https://www.agner.org/optimize
Apart from P4 (130 clock setup!) the setup cost for 'rep movs'
is relatively small.
I think everything since sandy bridge and bulldozer (except atoms,
but including silvermont) does fast copies for 'rep movsb'.
(But the C2758 atom we use claims erms.)
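For reference, "claims erms" means the CPU sets the ERMS (Enhanced REP MOVSB/STOSB) feature bit, which is CPUID leaf 7, subleaf 0, EBX bit 9. A minimal userspace sketch of reading that bit, assuming GCC or Clang on x86 with their <cpuid.h> header; the function name cpu_claims_erms() is hypothetical and this is not how the kernel itself tests the feature:

```c
#include <stdbool.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* Hypothetical helper: report whether the CPU advertises ERMS. */
static bool cpu_claims_erms(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	unsigned int eax, ebx, ecx, edx;

	/* Returns 0 if CPUID leaf 7 is not available on this CPU. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;

	/* CPUID.(EAX=7,ECX=0):EBX bit 9 is the ERMS flag. */
	return (ebx >> 9) & 1;
#else
	/* Not x86 (or unknown compiler): no 'rep movsb' to ask about. */
	return false;
#endif
}
```

The flag only says what the CPU advertises; as the C2758 remark suggests, it is not a measurement of how fast the copy actually is.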
I'd bet that the overhead for using 'rep movsb' for a short copy
is less than that of the mispredicted branch (or two) to select
the required code.
That rather implies always using 'rep movsb' is best unless
someone is compiling explicitly for an old cpu.
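A minimal sketch of what "always using 'rep movsb'" could look like as an inline copy, assuming GCC or Clang extended asm on x86 and a clear direction flag (as the ABI guarantees on function entry). The name copy_rep_movsb() is hypothetical, not a kernel function, and the memcpy() fallback is only there so the sketch builds elsewhere:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src to dst with 'rep movsb'. */
static inline void *copy_rep_movsb(void *dst, const void *src, size_t len)
{
	void *ret = dst;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	/*
	 * 'rep movsb' takes the destination in (R|E)DI, the source in
	 * (R|E)SI and the byte count in (R|E)CX, and modifies all three,
	 * hence the "+" (read-write) constraints. The "memory" clobber
	 * tells the compiler that memory is read and written.
	 */
	__asm__ volatile("rep movsb"
			 : "+D" (dst), "+S" (src), "+c" (len)
			 :
			 : "memory");
#else
	/* Fallback for non-x86 builds of this sketch. */
	memcpy(dst, src, len);
#endif
	return ret;
}
```

There is no size or alignment branch here at all, which is the point being argued: the only cost is whatever setup the CPU itself charges for the instruction.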
And in that case (apart from P4) an explicit 'rep movsl' will be fastest
because the setup cost is minimal/zero.
The cutoff for using 'rep movsb' for constant sized copies
is probably also a lot less than you might expect.
Especially assuming cold cache.
This all makes that POS that gcc is inlining even more stupid.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)