From: Andrew Morton <akpm@digeo.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "David S. Miller" <davem@redhat.com>,
	taka@valinux.co.jp, neilb@cse.unsw.edu.au,
	linux-kernel@vger.kernel.org, nfs@lists.sourceforge.net
Subject: Re: [PATCH] zerocopy NFS for 2.5.36
Date: Wed, 18 Sep 2002 17:16:43 -0700
Message-ID: <3D89176B.40FFD09B@digeo.com>
In-Reply-To: 1032393277.24895.8.camel@irongate.swansea.linux.org.uk

Alan Cox wrote:
> 
> On Thu, 2002-09-19 at 00:00, David S. Miller wrote:
> > It was discussed long ago that csum_and_copy_from_user() performs
> > better than plain copy_from_user() on x86.  I do not remember all
> 
> The better was a freak of PPro/PII scheduling I think
> 
> > details, but I do know that using copy_from_user() is not a real
> > improvement at least on x86 architecture.
> 
> The "same as" bit is easy to explain. It's totally memory bandwidth limited
> on current x86-32 processors. (Although I'd welcome demonstrations to
> the contrary on newer toys)

Nope.  There are distinct alignment problems with movsl-based
memcpy on PII and (at least) "Pentium III (Coppermine)", which is
tested here:

copy_32 uses movsl.  copy_duff just uses a stream of "movl"s
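
For anyone reading without the tarball, here is a minimal sketch of what
a "stream of movl's" copy can look like in portable C.  It is not the
__copy_duff source from cptimer, which is not reproduced here; the name
copy_words_unrolled and the 8x Duff's-device unroll are assumptions made
for illustration.  Each 4-byte memcpy is expected to become one 32-bit
load or store on x86, but that depends on the compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: copy n bytes as a stream of 32-bit loads and
 * stores, unrolled 8x with Duff's device.  The per-word memcpy keeps
 * unaligned accesses well-defined in C.
 */
static void copy_words_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = n / 4;   /* whole 32-bit words to move */
    size_t tail = n % 4;    /* leftover bytes after the last word */
    uint32_t w;

#define COPY_WORD() \
    do { memcpy(&w, s, 4); memcpy(d, &w, 4); s += 4; d += 4; } while (0)

    if (words > 0) {
        size_t passes = (words + 7) / 8;

        /* Jump into the middle of the unrolled loop for the remainder. */
        switch (words % 8) {
        case 0: do { COPY_WORD();
        case 7:      COPY_WORD();
        case 6:      COPY_WORD();
        case 5:      COPY_WORD();
        case 4:      COPY_WORD();
        case 3:      COPY_WORD();
        case 2:      COPY_WORD();
        case 1:      COPY_WORD();
                } while (--passes > 0);
        }
    }
#undef COPY_WORD

    /* Copy the 0..3 trailing bytes. */
    memcpy(d, s, tail);
}
```

A copy_32-style routine would instead hand the whole transfer to a
single "rep movsl", which cannot be expressed in portable C.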

Time uncached-to-uncached memcpy, source and dest are 8-byte-aligned:

akpm:/usr/src/cptimer> ./cptimer -d -s     
nbytes=10240  from_align=0, to_align=0
    copy_32: copied 19.1 Mbytes in 0.078 seconds at 243.9 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.090 seconds at 211.1 Mbytes/sec

OK, movsl wins.   But now give the source address 8+1 alignment:

akpm:/usr/src/cptimer> ./cptimer -d -s -f 1
nbytes=10240  from_align=1, to_align=0
    copy_32: copied 19.1 Mbytes in 0.158 seconds at 120.8 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.091 seconds at 210.3 Mbytes/sec

The "movl"-based copy wins.  By miles.

Make the source 8+4 aligned:

akpm:/usr/src/cptimer> ./cptimer -d -s -f 4
nbytes=10240  from_align=4, to_align=0
    copy_32: copied 19.1 Mbytes in 0.134 seconds at 142.1 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.089 seconds at 214.0 Mbytes/sec

So movl still beats movsl, by lots.

I have various scriptlets which generate the entire matrix.

I think I ended up deciding that we should use movsl _only_
when both src and dst are 8-byte-aligned.  And that when you
multiply the gain from that by the frequency*size with which
funny alignments are used by TCP, the net gain was 2% or something.
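
The policy above can be sketched as a dispatch on pointer alignment.
This is only an illustration under stated assumptions: is_aligned8,
choose_copy_path and the enum names are hypothetical, and this is not
the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the two copy strategies being compared. */
enum copy_path {
    COPY_PATH_MOVSL,    /* string copy, as in copy_32 */
    COPY_PATH_MOVL      /* stream of movl's, as in __copy_duff */
};

/* Nonzero if p sits on an 8-byte boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7) == 0;
}

/*
 * Pick movsl only when both source and destination are 8-byte-aligned;
 * any other alignment takes the movl path.
 */
static enum copy_path choose_copy_path(const void *dst, const void *src)
{
    if (is_aligned8(dst) && is_aligned8(src))
        return COPY_PATH_MOVSL;
    return COPY_PATH_MOVL;
}
```

With the runs shown earlier, from_align=0 would select the movsl path,
while from_align=1 and from_align=4 would both select the movl path.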

It needs redoing.  These differences are really big, and this
is the kernel's most expensive function.

A little project for someone.

The tools are at http://www.zip.com.au/~akpm/linux/cptimer.tar.gz
