linux-kernel.vger.kernel.org archive mirror
From: Andrew Morton <akpm@digeo.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "David S. Miller" <davem@redhat.com>,
	taka@valinux.co.jp, neilb@cse.unsw.edu.au,
	linux-kernel@vger.kernel.org, nfs@lists.sourceforge.net
Subject: Re: [PATCH] zerocopy NFS for 2.5.36
Date: Wed, 18 Sep 2002 17:16:43 -0700
Message-ID: <3D89176B.40FFD09B@digeo.com>
In-Reply-To: 1032393277.24895.8.camel@irongate.swansea.linux.org.uk

Alan Cox wrote:
> 
> On Thu, 2002-09-19 at 00:00, David S. Miller wrote:
> > It was discussed long ago that csum_and_copy_from_user() performs
> > better than plain copy_from_user() on x86.  I do not remember all
> 
> The better was a freak of PPro/PII scheduling I think
> 
> > details, but I do know that using copy_from_user() is not a real
> > improvement at least on x86 architecture.
> 
> The same as bit is easy to explain. Its totally memory bandwidth limited
> on current x86-32 processors. (Although I'd welcome demonstrations to
> the contrary on newer toys)

Nope.  There are distinct alignment problems with movsl-based
memcpy on PII and (at least) "Pentium III (Coppermine)", which is
tested here:

copy_32 uses movsl.  copy_duff just uses a stream of "movl"s

Time uncached-to-uncached memcpy, source and dest are 8-byte-aligned:

akpm:/usr/src/cptimer> ./cptimer -d -s     
nbytes=10240  from_align=0, to_align=0
    copy_32: copied 19.1 Mbytes in 0.078 seconds at 243.9 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.090 seconds at 211.1 Mbytes/sec

OK, movsl wins.   But now give the source address 8+1 alignment:

akpm:/usr/src/cptimer> ./cptimer -d -s -f 1
nbytes=10240  from_align=1, to_align=0
    copy_32: copied 19.1 Mbytes in 0.158 seconds at 120.8 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.091 seconds at 210.3 Mbytes/sec

The "movl"-based copy wins.  By miles.

Make the source 8+4 aligned:

akpm:/usr/src/cptimer> ./cptimer -d -s -f 4
nbytes=10240  from_align=4, to_align=0
    copy_32: copied 19.1 Mbytes in 0.134 seconds at 142.1 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.089 seconds at 214.0 Mbytes/sec

So movl still beats movsl, by lots.
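
To put numbers on "by miles" and "by lots", here is a small sketch that just divides the throughput figures from the three runs above.  The struct and function names are made up for illustration; only the Mbytes/sec values come from the cptimer output.

```c
#include <stddef.h>
#include <stdio.h>

/* One row per cptimer run quoted above (to_align was 0 in all three). */
struct run {
    int from_align;     /* source misalignment in bytes */
    double movsl_mbps;  /* copy_32 throughput, Mbytes/sec */
    double movl_mbps;   /* __copy_duff throughput, Mbytes/sec */
};

static const struct run runs[] = {
    { 0, 243.9, 211.1 },
    { 1, 120.8, 210.3 },
    { 4, 142.1, 214.0 },
};

/* Ratio > 1.0 means the movl-based copy was faster in that run. */
static double movl_over_movsl(const struct run *r)
{
    return r->movl_mbps / r->movsl_mbps;
}

/* Print one line per run with the relative speed of the movl copy. */
static void print_ratios(void)
{
    size_t i;

    for (i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
        printf("from_align=%d  movl/movsl = %.2f\n",
               runs[i].from_align, movl_over_movsl(&runs[i]));
}
```

Calling print_ratios() gives roughly 0.87 for the aligned case and 1.74 and 1.51 for the two misaligned ones: movsl is about 15% ahead when everything is aligned, and movl is 50-75% ahead when the source is not.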

I have various scriptlets which generate the entire matrix.

I think I ended up deciding that we should use movsl _only_
when both src and dst are 8-byte-aligned.  And that when you
multiply the gain from that by the frequency*size with which
funny alignments are used by TCP, the net gain was 2% or something.
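
A minimal sketch of that selection rule in portable C, to show the shape of the test rather than a real kernel patch.  Everything here is an assumption for illustration: the function names are hypothetical, memcpy() stands in for the movsl path, and a plain 32-bit word loop stands in for the stream of movl instructions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum copy_path { PATH_STRING_MOVE, PATH_WORD_STREAM };

/* True only when both addresses have their low three bits clear. */
static int both_8byte_aligned(const void *dst, const void *src)
{
    return (((uintptr_t)dst | (uintptr_t)src) & 7u) == 0;
}

/* Placeholder for the movsl-based copy (copy_32 in the runs above). */
static void copy_string_move(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Placeholder for the movl stream: one 32-bit load and store per step.
 * The 4-byte memcpy calls keep unaligned accesses well defined in C. */
static void copy_word_stream(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 4) {
        uint32_t w;

        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        d += 4;
        s += 4;
        n -= 4;
    }
    while (n--)         /* trailing 0-3 bytes */
        *d++ = *s++;
}

/* Use the string-move path only when src and dst are both 8-byte-aligned;
 * returns which path was taken so a caller can count them. */
static enum copy_path copy_select(void *dst, const void *src, size_t n)
{
    if (both_8byte_aligned(dst, src)) {
        copy_string_move(dst, src, n);
        return PATH_STRING_MOVE;
    }
    copy_word_stream(dst, src, n);
    return PATH_WORD_STREAM;
}
```

The alignment check is a single OR and mask, so it costs next to nothing compared with a 10k copy.  Whether the branch pays for itself on small copies would have to be measured.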

It needs redoing.  These differences are really big, and this
is the kernel's most expensive function.

A little project for someone.

The tools are at http://www.zip.com.au/~akpm/linux/cptimer.tar.gz
