Linux NFS development
* [PATCH] zerocopy NFS for 2.5.36
@ 2002-09-18  8:14 Hirokazu Takahashi
  2002-09-18 23:00 ` David S. Miller
  2002-10-14  5:50 ` Neil Brown
  0 siblings, 2 replies; 84+ messages in thread
From: Hirokazu Takahashi @ 2002-09-18  8:14 UTC (permalink / raw)
  To: Neil Brown, linux-kernel, nfs

Hello,

I ported the zerocopy NFS patches against linux-2.5.36.

I made va05-zerocopy-nfsdwrite-2.5.36.patch more generic,
so that it will be easy to merge with NFSv4. Each procedure can
choose whether or not it accepts split buffers.
I also fixed a problem where nfsd couldn't handle very large
NFS-symlink requests.


1)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va10-hwchecksum-2.5.36.patch
This patch enables HW checksumming of outgoing packets, including UDP frames.

2)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va11-udpsendfile-2.5.36.patch
This patch makes the sendfile system call work over UDP. It also supports
a UDP_CORK interface, which is very similar to TCP_CORK. And you can call
sendmsg/sendfile with the MSG_MORE flag over UDP sockets.

3)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va-csumpartial-fix-2.5.36.patch
This patch fixes a problem in the x86 csum_partial() routines, which
can't handle odd-addressed buffers.

4)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va01-zerocopy-rpc-2.5.36.patch
This patch lets RPC send some pieces of data and pages without copying.

5)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va02-zerocopy-nfsdread-2.5.36.patch
This patch makes NFSD send pages in the pagecache directly when NFS clients
request a file read.

6)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va03-zerocopy-nfsdreaddir-2.5.36.patch
nfsd_readdir can also send pages without copying.

7)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va04-zerocopy-shadowsock-2.5.36.patch
This patch creates per-cpu UDP sockets so that NFSD can send UDP frames on
each processor simultaneously.
Without the patch we can send only one UDP frame at a time, as a UDP socket
has to be locked while sending pages in order to serialize them.

8)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va05-zerocopy-nfsdwrite-2.5.36.patch
This patch makes NFS-write use the writev interface. NFSd can handle NFS
requests without reassembling IP fragments into one UDP frame.

9)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/taka-writev-2.5.36.patch
This patch makes writev on regular files work faster.
It can also be found at
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.35/2.5.35-mm1/broken-out/

Caution:
       XFS doesn't support the writev interface yet. NFS write on XFS might
       slow down with the No.8 patch. I hope the SGI guys will implement it.

10)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va07-nfsbigbuf-2.5.36.patch
This makes the NFS buffer much bigger (60KB).
A 60KB buffer costs the Linux kernel the same as a 32KB buffer, as both of
them require a 64KB chunk.


11)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va09-zerocopy-tempsendto-2.5.36.patch
If you don't want to use sendfile over UDP yet, you can apply this patch
instead of the No.1 and No.2 patches.
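
The MSG_MORE behaviour described for patch 2 can be sketched from userspace
as follows. This is only an illustration, assuming a Linux kernel in which
MSG_MORE on UDP sockets is available; send_in_pieces() and
demo_udp_msg_more() are hypothetical helper names and are not part of the
patches.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: emit header and body as ONE UDP datagram.
 * MSG_MORE tells the kernel that more data follows, so the first piece
 * is held back until a send without MSG_MORE completes the frame.
 */
static ssize_t send_in_pieces(int fd, const void *hdr, size_t hlen,
                              const void *body, size_t blen)
{
    ssize_t n, m;

    n = send(fd, hdr, hlen, MSG_MORE);
    if (n < 0)
        return -1;
    m = send(fd, body, blen, 0);
    if (m < 0)
        return -1;
    return n + m;
}

/* Loopback demo: returns the size of the datagram the receiver sees, or -1. */
static ssize_t demo_udp_msg_more(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 2, 0 };   /* do not block forever on failure */
    char buf[64];
    ssize_t got = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a free port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto out;
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    if (connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (send_in_pieces(tx, "head", 4, "payload", 7) < 0)
        goto out;
    got = recv(rx, buf, sizeof(buf), 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return got;
}
```

The receiver should see a single datagram carrying both pieces, which is
what lets a sender build one UDP frame from several buffers without first
copying them into one contiguous area.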


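The odd-address case that patch 3 deals with can be illustrated in portable
C. This is a sketch of the general Internet-checksum technique (peel one
byte, sum aligned 16-bit words, then swap the byte lanes), not the x86
csum_partial() code itself; csum_bytewise() and csum_wordwise() are
hypothetical names used only here.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold an accumulator into a 16-bit ones-complement sum. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference: sum big-endian 16-bit words one byte at a time. */
static uint16_t csum_bytewise(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;     /* trailing byte, zero padded */
    return fold16(sum);
}

/* Word-at-a-time sum that also copes with a buffer at an odd address. */
static uint16_t csum_wordwise(const uint8_t *p, size_t len)
{
    uint64_t sum = 0;
    uint8_t lead = 0;
    int odd = len > 0 && ((uintptr_t)p & 1);
    uint16_t w, rest;

    if (odd) {                  /* peel one byte so p is 2-byte aligned */
        lead = *p++;
        len--;
    }
    for (; len >= 2; p += 2, len -= 2) {
        memcpy(&w, p, 2);       /* native-endian 16-bit load */
        sum += w;
    }
    if (len) {                  /* trailing byte, zero padded */
        uint8_t pair[2] = { *p, 0 };
        memcpy(&w, pair, 2);
        sum += w;
    }
    rest = ntohs(fold16(sum));  /* convert the sum to network byte order */
    if (!odd)
        return rest;
    /* After peeling, every byte sat in the opposite lane: swap them back. */
    rest = (uint16_t)((rest << 8) | (rest >> 8));
    return fold16(((uint64_t)lead << 8) + rest);
}
```

memcpy() is used for the 16-bit loads only to keep the sketch strictly
portable; a tuned routine would use direct aligned loads, which is exactly
why the odd-address case needs separate handling.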

Regards,
Hirokazu Takahashi

* RE: [NFS] Re: [PATCH] zerocopy NFS for 2.5.36
@ 2002-09-19  2:00 Lever, Charles
  0 siblings, 0 replies; 84+ messages in thread
From: Lever, Charles @ 2002-09-19  2:00 UTC (permalink / raw)
  To: 'Andrew Morton'
  Cc: David S. Miller, taka, neilb, linux-kernel, nfs, Alan Cox

dude, that's pretty cool.

if you were re-implementing XDR, you think a series of movl
instructions would be best?  i'm not sure how practical that
is for an architecture-independent implementation.

> > > It was discussed long ago that csum_and_copy_from_user() performs 
> > > better than plain copy_from_user() on x86.  I do not remember all
> > 
> > The better was a freak of PPro/PII scheduling I think
> > 
> > > details, but I do know that using copy_from_user() is not a real 
> > > improvement at least on x86 architecture.
> > 
> > The "same as" bit is easy to explain. It's totally memory bandwidth
> > limited on current x86-32 processors. (Although I'd welcome
> > demonstrations to the contrary on newer toys)
> 
> Nope.  There are distinct alignment problems with movsl-based 
> memcpy on PII and (at least) "Pentium III (Coppermine)", 
> which is tested here:
> 
> copy_32 uses movsl.  copy_duff just uses a stream of "movl"s
> 
> Time uncached-to-uncached memcpy, source and dest are 8-byte-aligned:
> 
> akpm:/usr/src/cptimer> ./cptimer -d -s     
> nbytes=10240  from_align=0, to_align=0
>     copy_32: copied 19.1 Mbytes in 0.078 seconds at 243.9 Mbytes/sec
> __copy_duff: copied 19.1 Mbytes in 0.090 seconds at 211.1 Mbytes/sec
> 
> OK, movsl wins.   But now give the source address 8+1 alignment:
> 
> akpm:/usr/src/cptimer> ./cptimer -d -s -f 1
> nbytes=10240  from_align=1, to_align=0
>     copy_32: copied 19.1 Mbytes in 0.158 seconds at 120.8 Mbytes/sec
> __copy_duff: copied 19.1 Mbytes in 0.091 seconds at 210.3 Mbytes/sec
> 
> The "movl"-based copy wins.  By miles.
> 
> Make the source 8+4 aligned:
> 
> akpm:/usr/src/cptimer> ./cptimer -d -s -f 4
> nbytes=10240  from_align=4, to_align=0
>     copy_32: copied 19.1 Mbytes in 0.134 seconds at 142.1 Mbytes/sec
> __copy_duff: copied 19.1 Mbytes in 0.089 seconds at 214.0 Mbytes/sec
> 
> So movl still beats movsl, by lots.
> 
> I have various scriptlets which generate the entire matrix.
> 
> I think I ended up deciding that we should use movsl _only_
> when both src and dst are 8-byte-aligned.  And that when you
> multiply the gain from that by the frequency*size with which
> funny alignments are used by TCP the net gain was 2% or something.
> 
> It needs redoing.  These differences are really big, and this 
> is the kernel's most expensive function.
> 
> A little project for someone.
> 
> The tools are at http://www.zip.com.au/~akpm/linux/cptimer.tar.gz
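
The rule quoted above (use the string-copy path only when both pointers are
8-byte aligned) could be sketched in portable C roughly like this. The
names copy_dispatch() and copy_word_loop() are hypothetical, memcpy() merely
stands in for the movsl path, and the word loop stands in for the stream of
movl instructions; this is not the cptimer code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum copy_path { COPY_STRING_OP, COPY_WORD_LOOP };

/* True when both pointers sit on an 8-byte boundary. */
static int both_aligned8(const void *dst, const void *src)
{
    return (((uintptr_t)dst | (uintptr_t)src) & 7) == 0;
}

/* Stand-in for the "stream of movl" copy: 32 bits at a time, then a tail. */
static void copy_word_loop(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint32_t w;

    for (; n >= 4; n -= 4, d += 4, s += 4) {
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
    }
    while (n--)
        *d++ = *s++;
}

/* Pick a copy routine from the alignment of both pointers. */
static enum copy_path copy_dispatch(void *dst, const void *src, size_t n)
{
    if (both_aligned8(dst, src)) {
        memcpy(dst, src, n);        /* stand-in for the movsl path */
        return COPY_STRING_OP;
    }
    copy_word_loop(dst, src, n);
    return COPY_WORD_LOOP;
}
```

Whether such a check pays off depends on how often misaligned buffers
actually occur, which is the measurement exercise the quoted message
asks for.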

* Re: Re: [PATCH] zerocopy NFS for 2.5.36
@ 2002-09-20  1:00 Andi Kleen
  2002-09-20  1:09 ` Andrew Morton
  0 siblings, 1 reply; 84+ messages in thread
From: Andi Kleen @ 2002-09-20  1:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hirokazu Takahashi, alan, davem, neilb, linux-kernel, nfs

Andrew Morton <akpm@digeo.com> writes:

> Hirokazu Takahashi wrote:
> > 
> > ...
> > > It needs redoing.  These differences are really big, and this
> > > is the kernel's most expensive function.
> > >
> > > A little project for someone.
> > 
> > OK, if there is nobody who wants to do it I'll do it by myself.
> 
> That would be fantastic - thanks.  This is more a measurement
> and testing exercise than a coding one.  And if those measurements
> are sufficiently nice (eg: >5%) then a 2.4 backport should be done.

Very interesting IMHO would be to find a heuristic to switch between
a write combining copy and a cache hot copy. Write combining is good 
for blasting huge amounts of data quickly without killing your caches.
Cache hot is good for everything else.

But it'll need hints from the higher level code, e.g. read and write
could turn on write combining for bigger writes (let's say >8K).
I discovered that just unconditionally turning it on for all copies
is not good because it forces data out of cache. But I still have hope
that it helps for selected copies.
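
A rough userspace sketch of such a size-based switch follows. It assumes
SSE2 non-temporal stores as the "write combining" path and falls back to
memcpy() elsewhere; WC_THRESHOLD simply mirrors the ">8K" guess above, and
copy_with_hint() and copy_streaming() are hypothetical names, not kernel
interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define WC_THRESHOLD (8 * 1024)     /* the ">8K" guess; a tunable */

/* Copy that tries to bypass the cache for the destination. */
static void copy_streaming(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;

    if (head > n)
        head = n;
    memcpy(d, s, head);             /* reach a 16-byte aligned dst */
    d += head;
    s += head;
    n -= head;
    for (; n >= 16; n -= 16, d += 16, s += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);  /* non-temporal store */
    }
    _mm_sfence();                   /* order the streaming stores */
    memcpy(d, s, n);                /* remaining tail bytes */
#else
    memcpy(dst, src, n);            /* no SSE2: plain copy */
#endif
}

/* Returns 1 if the streaming path was chosen, 0 for the cache-hot path. */
static int copy_with_hint(void *dst, const void *src, size_t n)
{
    if (n > WC_THRESHOLD) {
        copy_streaming(dst, src, n);
        return 1;
    }
    memcpy(dst, src, n);
    return 0;
}
```

Here the hint is derived from the length alone; a caller that knows the data
will be reused soon could bypass the check and always take the cache-hot
path.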


-Andi



* RE: [NFS] Re: [PATCH] zerocopy NFS for 2.5.36
@ 2002-10-16 14:04 Lever, Charles
  0 siblings, 0 replies; 84+ messages in thread
From: Lever, Charles @ 2002-10-16 14:04 UTC (permalink / raw)
  To: neilb; +Cc: taka, linux-kernel, nfs, 'David S. Miller'

> -----Original Message-----
> From: David S. Miller [mailto:davem@redhat.com]
> Sent: Wednesday, October 16, 2002 12:31 AM
>
>    From: Neil Brown <neilb@cse.unsw.edu.au>
>    Date: Wed, 16 Oct 2002 13:44:04 +1000
> 
>    Presumably on a sufficiently large SMP machine that this became an
>    issue, there would be multiple NICs.  Maybe it would make sense to
>    have one udp socket for each NIC.  Would that make sense? or work?
>    It feels to me to be cleaner than one for each CPU.
>    
> Doesn't make much sense.
> 
> Usually we are talking via one IP address, and thus over
> one device.  It could be using multiple NICs via BONDING,
> but that would be transparent to anything at the socket
> level.
> 
> Really, I think there is real value to making the socket
> per-cpu even on a 2 or 4 way system.

having a local socket per CPU is very good for SMP scaling.
it multiplies input buffer space, and reduces socket lock
and CPU cache contention.

sorry, i don't have measurements.
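
As a loose userspace analogy of the per-CPU socket idea (not the nfsd
implementation), a sender could keep one UDP socket per slot and pick the
slot from the CPU it is currently running on. This sketch assumes Linux and
the getcpu system call; struct sock_pool, pool_init(), pool_pick() and
pool_destroy() are hypothetical names.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#define POOL_MAX 64

struct sock_pool {
    int fd[POOL_MAX];
    int n;
};

/* CPU the caller is running on right now, or 0 if it cannot be queried. */
static unsigned current_cpu(void)
{
    unsigned cpu = 0;

    if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0)
        return 0;
    return cpu;
}

/* Close every socket in the pool. */
static void pool_destroy(struct sock_pool *p)
{
    while (p->n > 0)
        close(p->fd[--p->n]);
}

/* Create n UDP sockets; returns 0 on success, -1 on failure. */
static int pool_init(struct sock_pool *p, int n)
{
    p->n = 0;
    if (n < 1 || n > POOL_MAX)
        return -1;
    while (p->n < n) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd < 0) {
            pool_destroy(p);
            return -1;
        }
        p->fd[p->n++] = fd;
    }
    return 0;
}

/* Pick the socket that belongs to the current CPU's slot. */
static int pool_pick(const struct sock_pool *p)
{
    return p->fd[current_cpu() % (unsigned)p->n];
}
```

Each socket has its own lock and its own buffers, so senders on different
CPUs do not serialize on one shared socket.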


end of thread, other threads:[~2002-11-04 21:45 UTC | newest]

Thread overview: 84+ messages
2002-09-18  8:14 [PATCH] zerocopy NFS for 2.5.36 Hirokazu Takahashi
2002-09-18 23:00 ` David S. Miller
2002-09-18 23:54   ` Alan Cox
2002-09-19  0:16     ` Andrew Morton
2002-09-19  2:13       ` Aaron Lehmann
2002-09-19  3:30         ` Andrew Morton
2002-09-19 10:42           ` Alan Cox
2002-09-19 13:15       ` [NFS] " Hirokazu Takahashi
2002-09-19 20:42         ` Andrew Morton
2002-09-19 21:12           ` David S. Miller
2002-09-21 11:56   ` Pavel Machek
2002-10-14  5:50 ` Neil Brown
2002-10-14  6:15   ` David S. Miller
2002-10-14 10:45     ` kuznet
2002-10-14 10:48       ` David S. Miller
2002-10-14 12:01   ` Hirokazu Takahashi
2002-10-14 14:12     ` Andrew Theurer
2002-10-16  3:44     ` Neil Brown
2002-10-16  4:31       ` David S. Miller
2002-10-16 15:04         ` Andrew Theurer
2002-10-17  2:03         ` [NFS] " Andrew Theurer
2002-10-17  2:31           ` Hirokazu Takahashi
2002-10-17 13:16             ` Andrew Theurer
2002-10-17 13:26               ` Hirokazu Takahashi
2002-10-17 14:10                 ` [NFS] " Andrew Theurer
2002-10-17 16:26                   ` Hirokazu Takahashi
2002-10-18  5:38                     ` [NFS] " Trond Myklebust
2002-10-18  7:19                       ` Hirokazu Takahashi
2002-10-18 15:12                         ` Andrew Theurer
2002-10-19 20:34                           ` Hirokazu Takahashi
2002-10-22 21:16                             ` Andrew Theurer
2002-10-23  9:29                               ` [NFS] " Hirokazu Takahashi
2002-10-24 15:32                                 ` Andrew Theurer
2002-10-27 11:10                                   ` Hirokazu Takahashi
2002-10-16 11:09       ` Hirokazu Takahashi
2002-10-16 17:02         ` kaza
2002-10-17  4:36           ` rddunlap
2002-10-18 13:11   ` [PATCH] zerocopy NFS for 2.5.43 Hirokazu Takahashi
2002-10-23  1:18     ` Neil Brown
2002-10-23  3:53       ` Hirokazu Takahashi
2002-10-23  5:40         ` Hirokazu Takahashi
2002-10-23  6:03           ` Neil Brown
2002-10-23 22:35             ` Hirokazu Takahashi
2002-10-23  6:10         ` Neil Brown
2002-10-23  7:08           ` Hirokazu Takahashi
2002-10-23 15:23           ` Trond Myklebust
2002-10-23 21:50       ` Hirokazu Takahashi
2002-10-23 23:55         ` Trond Myklebust
2002-10-24  1:33           ` Hirokazu Takahashi
2002-10-27 10:39             ` Hirokazu Takahashi
2002-10-28 16:31               ` Trond Myklebust
2002-10-28 23:39                 ` Hirokazu Takahashi
2002-10-29  6:36                 ` Hirokazu Takahashi
2002-10-29 15:09                   ` Trond Myklebust
2002-10-29 16:27                     ` Hirokazu Takahashi
2002-10-29 16:49                       ` Trond Myklebust
2002-10-30  3:18                     ` Hirokazu Takahashi
2002-10-25  9:52       ` Hirokazu Takahashi
2002-10-25 12:41         ` Neil Brown
2002-10-26  3:11           ` Hirokazu Takahashi
2002-10-26  3:46             ` Benjamin LaHaise
2002-10-27 22:46               ` Neil Brown
2002-10-30 23:29           ` Hirokazu Takahashi
2002-10-30 23:53             ` Neil Brown
2002-10-31  2:06               ` Hirokazu Takahashi
2002-10-31 15:40                 ` Hirokazu Takahashi
2002-10-31 16:56                   ` Hirokazu Takahashi
2002-11-01  1:10                     ` Neil Brown
2002-11-04 21:13                       ` Andrew Theurer
2002-11-01  0:54                   ` Neil Brown
2002-11-01  1:39                     ` Hirokazu Takahashi
2002-11-01  3:41                     ` Hirokazu Takahashi
2002-11-01  4:20                       ` Neil Brown
2002-11-01  5:07                         ` Hirokazu Takahashi
2002-10-25 17:23         ` Trond Myklebust
2002-10-26  3:26           ` Hirokazu Takahashi
  -- strict thread matches above, loose matches on Subject: below --
2002-09-19  2:00 [NFS] Re: [PATCH] zerocopy NFS for 2.5.36 Lever, Charles
2002-09-20  1:00 Andi Kleen
2002-09-20  1:09 ` Andrew Morton
2002-09-20  1:23   ` Andi Kleen
2002-09-20  1:27     ` [NFS] " David S. Miller
2002-09-20  2:06       ` Andi Kleen
2002-09-20  2:01         ` David S. Miller
2002-09-20  2:28           ` Andi Kleen
2002-09-20  2:20             ` David S. Miller
2002-09-20  2:35               ` Andi Kleen
2002-10-16 14:04 Lever, Charles
