linux-kernel.vger.kernel.org archive mirror
* [PATCH] zerocopy NFS for 2.5.36
@ 2002-09-18  8:14 Hirokazu Takahashi
From: Hirokazu Takahashi @ 2002-09-18  8:14 UTC
  To: Neil Brown, linux-kernel, nfs

Hello,

I have ported the zerocopy NFS patches to linux-2.5.36.

I made va05-zerocopy-nfsdwrite-2.5.36.patch more generic,
so that it will be easy to merge with NFSv4. Each procedure can
choose whether or not it accepts split buffers.
I also fixed a problem where nfsd couldn't handle very large
NFS symlink requests.


1)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va10-hwchecksum-2.5.36.patch
This patch enables hardware checksumming for outgoing packets, including UDP frames.

2)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va11-udpsendfile-2.5.36.patch
This patch makes the sendfile system call work over UDP. It also supports
a UDP_CORK interface, which is very similar to TCP_CORK. You can also call
sendmsg/sendfile with the MSG_MORE flag on UDP sockets.

3)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va-csumpartial-fix-2.5.36.patch
This patch fixes a problem in the x86 csum_partial() routines, which
can't handle odd-addressed buffers.

4)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va01-zerocopy-rpc-2.5.36.patch
This patch lets RPC send pieces of data and pages without copying them.

5)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va02-zerocopy-nfsdread-2.5.36.patch
This patch makes NFSD send pages in the pagecache directly when NFS clients
request a file read.

6)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va03-zerocopy-nfsdreaddir-2.5.36.patch
With this patch, nfsd_readdir can also send pages without copying them.

7)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va04-zerocopy-shadowsock-2.5.36.patch
This patch creates per-cpu UDP sockets so that NFSD can send UDP frames on
each processor simultaneously.
Without the patch we can send only one UDP frame at a time, as a UDP socket
has to be locked while some pages are being sent in order to serialize them.

8)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va05-zerocopy-nfsdwrite-2.5.36.patch
This patch makes NFS write use the writev interface. NFSD can handle NFS
requests without reassembling IP fragments into one UDP frame.

9)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/taka-writev-2.5.36.patch
This patch makes writev work faster for regular files.
It can also be found at
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.35/2.5.35-mm1/broken-out/

Caution:
       XFS doesn't support the writev interface yet. NFS write on XFS might
       slow down with patch No.8. I hope the SGI guys will implement it.

10)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va07-nfsbigbuf-2.5.36.patch
This patch makes the NFS buffer much bigger (60KB).
For the kernel, a 60KB buffer costs the same as a 32KB buffer, as both of them
require a 64KB chunk.


11)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va09-zerocopy-tempsendto-2.5.36.patch
If you don't want to use sendfile over UDP yet, you can apply this patch instead of patches No.1 and No.2.
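
As an illustration of patch No.2 above, here is a minimal user-space sketch.
It is not taken from the patch. It assumes that MSG_MORE on a UDP socket
behaves as it does in later mainline kernels: data sent with MSG_MORE is held
back and joined with the next send() into a single datagram. The helper names
send_two_part_datagram() and demo_udp_msg_more() are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Build one UDP datagram from two separate buffers.
 * The first send() carries MSG_MORE, so the kernel keeps the data pending;
 * the second send() without the flag completes and transmits the datagram.
 * Returns the total number of bytes queued, or -1 on error.
 */
static ssize_t send_two_part_datagram(int fd, const void *hdr, size_t hdr_len,
                                      const void *body, size_t body_len)
{
    ssize_t a = send(fd, hdr, hdr_len, MSG_MORE);
    if (a < 0)
        return -1;
    ssize_t b = send(fd, body, body_len, 0);
    if (b < 0)
        return -1;
    return a + b;
}

/*
 * Hypothetical demo: connect two UDP sockets over loopback, send
 * "HDR:" + "payload" in two calls, and return the length of the single
 * datagram that arrives (or -1 on error).
 */
static ssize_t demo_udp_msg_more(char *out, size_t out_len)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 2, 0 };            /* do not block forever */
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */

    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    if (send_two_part_datagram(tx, "HDR:", 4, "payload", 7) != 11)
        goto out;

    n = recv(rx, out, out_len, 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

In mainline kernels the same grouping can also be requested with
setsockopt(fd, IPPROTO_UDP, UDP_CORK, ...); whether the interface in this
patch matches that exactly is an assumption.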
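
The odd-address problem mentioned for patch No.3 above can be illustrated in
portable C. This is only a sketch of one common technique, not the code from
the patch: when the buffer starts at an odd address, take the first byte
separately, sum the now even-aligned remainder, then swap the two bytes of the
folded result before adding the leading byte back. All function names here are
hypothetical, and csum_reference() exists only to check the result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold an accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reference: ones-complement sum of 16-bit words, valid for any address. */
static uint16_t csum_reference(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint16_t w;
        memcpy(&w, p + i, 2);
        sum += w;
    }
    if (i < len) {                    /* trailing byte, padded with zero */
        uint16_t w = 0;
        memcpy(&w, p + i, 1);
        sum += w;
    }
    return fold16(sum);
}

/* Same sum, but word loads only ever start at even addresses. */
static uint16_t csum_any_alignment(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;
    uint16_t lead = 0;
    int odd = ((uintptr_t)p & 1) && len > 0;

    if (odd) {
        memcpy(&lead, p, 1);          /* first byte of the first word */
        p++;
        len--;
    }
    /*
     * p is now 2-byte aligned. A tuned routine would use aligned word
     * loads here; memcpy keeps this sketch portable.
     */
    for (; len >= 2; len -= 2, p += 2) {
        uint16_t w;
        memcpy(&w, p, 2);
        sum += w;
    }
    if (len) {
        uint16_t tail = 0;
        memcpy(&tail, p, 1);
        sum += tail;
    }

    uint16_t folded = fold16(sum);
    if (odd)                          /* bytes sat in the opposite halves */
        folded = fold16((uint64_t)swap16(folded) + lead);
    return folded;
}
```

The swap works because exchanging the bytes of every 16-bit word also
exchanges the bytes of their ones-complement sum.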
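
For readers unfamiliar with the writev interface that patches No.8 and No.9
above rely on, here is a minimal user-space sketch. It is not NFSD code: it
only shows several separate buffers being written to a regular file in one
call, without first copying them into one contiguous buffer. The helper names
and the temporary file path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L       /* for mkstemp() */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FRAGS 8

/*
 * Write up to MAX_FRAGS NUL-terminated fragments with one writev() call.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t write_fragments(int fd, const char *const frags[], int nfrags)
{
    struct iovec iov[MAX_FRAGS];
    int i;

    if (nfrags < 0 || nfrags > MAX_FRAGS)
        return -1;
    for (i = 0; i < nfrags; i++) {
        iov[i].iov_base = (void *)frags[i];   /* writev() only reads it */
        iov[i].iov_len = strlen(frags[i]);
    }
    return writev(fd, iov, nfrags);
}

/*
 * Hypothetical demo: write three fragments to a temporary file and read
 * the file back. Returns the number of bytes read, or -1 on error.
 */
static ssize_t demo_writev_roundtrip(char *out, size_t out_len)
{
    static const char *const frags[] = { "RPC|", "NFS|", "DATA" };
    char path[] = "/tmp/writev-sketch-XXXXXX";
    ssize_t n = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                     /* file disappears once fd is closed */
    if (write_fragments(fd, frags, 3) == 12 && lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, out, out_len);
    close(fd);
    return n;
}
```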



Regards,
Hirokazu Takahashi

* RE: [NFS] Re: [PATCH] zerocopy NFS for 2.5.36
@ 2002-09-19  2:00 Lever, Charles
From: Lever, Charles @ 2002-09-19  2:00 UTC
  To: 'Andrew Morton'
  Cc: David S. Miller, taka, neilb, linux-kernel, nfs, Alan Cox

dude, that's pretty cool.

if you were re-implementing XDR, you think a series of movl
instructions would be best?  i'm not sure how practical that
is for an architecture-independent implementation.

> > > It was discussed long ago that csum_and_copy_from_user() performs 
> > > better than plain copy_from_user() on x86.  I do not remember all
> > 
> > The better was a freak of PPro/PII scheduling I think
> > 
> > > details, but I do know that using copy_from_user() is not a real 
> > > improvement at least on x86 architecture.
> > 
> > The same as bit is easy to explain. It's totally memory bandwidth
> > limited on current x86-32 processors. (Although I'd welcome 
> > demonstrations to the contrary on newer toys)
> 
> Nope.  There are distinct alignment problems with movsl-based 
> memcpy on PII and (at least) "Pentium III (Coppermine)", 
> which is tested here:
> 
> copy_32 uses movsl.  copy_duff just uses a stream of "movl"s
> 
> Time uncached-to-uncached memcpy, source and dest are 8-byte-aligned:
> 
> akpm:/usr/src/cptimer> ./cptimer -d -s     
> nbytes=10240  from_align=0, to_align=0
>     copy_32: copied 19.1 Mbytes in 0.078 seconds at 243.9 Mbytes/sec
> __copy_duff: copied 19.1 Mbytes in 0.090 seconds at 211.1 Mbytes/sec
> 
> OK, movsl wins.   But now give the source address 8+1 alignment:
> 
> akpm:/usr/src/cptimer> ./cptimer -d -s -f 1
> nbytes=10240  from_align=1, to_align=0
>     copy_32: copied 19.1 Mbytes in 0.158 seconds at 120.8 Mbytes/sec
> __copy_duff: copied 19.1 Mbytes in 0.091 seconds at 210.3 Mbytes/sec
> 
> The "movl"-based copy wins.  By miles.
> 
> Make the source 8+4 aligned:
> 
> akpm:/usr/src/cptimer> ./cptimer -d -s -f 4
> nbytes=10240  from_align=4, to_align=0
>     copy_32: copied 19.1 Mbytes in 0.134 seconds at 142.1 Mbytes/sec
> __copy_duff: copied 19.1 Mbytes in 0.089 seconds at 214.0 Mbytes/sec
> 
> So movl still beats movsl, by lots.
> 
> I have various scriptlets which generate the entire matrix.
> 
> I think I ended up deciding that we should use movsl _only_ 
> when both src and dst are 8-byte-aligned.  And that when you
> multiply the gain from that by the frequency*size with which 
> funny alignments are used by TCP the net gain was 2% or something.
> 
> It needs redoing.  These differences are really big, and this 
> is the kernel's most expensive function.
> 
> A little project for someone.
> 
> The tools are at http://www.zip.com.au/~akpm/linux/cptimer.tar.gz
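
The rule quoted above (use movsl only when both source and destination are
8-byte-aligned) could be sketched in portable C as follows. This is only an
illustration of the dispatch, not kernel code: copy_string_op() and
copy_word_loop() are hypothetical stand-ins for a "rep movsl" copy and a
stream of "movl" instructions, and contain no assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum copy_path { COPY_STRING_OP, COPY_WORD_LOOP };

/* True when both pointers are 8-byte aligned. */
static int both_8_byte_aligned(const void *dst, const void *src)
{
    return (((uintptr_t)dst | (uintptr_t)src) & 7u) == 0;
}

/* Stand-in for the movsl-based copy; plain memcpy in this sketch. */
static void copy_string_op(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Stand-in for the movl stream: 4 bytes at a time, then the tail. */
static void copy_word_loop(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);             /* one 32-bit load */
        memcpy(d, &w, 4);             /* one 32-bit store */
        d += 4;
        s += 4;
        n -= 4;
    }
    while (n--)
        *d++ = *s++;
}

/* Pick the copy routine by alignment and report which one was used. */
static enum copy_path copy_by_alignment(void *dst, const void *src, size_t n)
{
    if (both_8_byte_aligned(dst, src)) {
        copy_string_op(dst, src, n);
        return COPY_STRING_OP;
    }
    copy_word_loop(dst, src, n);
    return COPY_WORD_LOOP;
}
```

Whether the alignment test pays for itself depends on how often the
misaligned cases occur, which is the "2% or something" estimate above.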

* RE: [NFS] Re: [PATCH] zerocopy NFS for 2.5.36
@ 2002-10-16 14:04 Lever, Charles
From: Lever, Charles @ 2002-10-16 14:04 UTC
  To: neilb; +Cc: taka, linux-kernel, nfs, 'David S. Miller'

> -----Original Message-----
> From: David S. Miller [mailto:davem@redhat.com]
> Sent: Wednesday, October 16, 2002 12:31 AM
>
>    From: Neil Brown <neilb@cse.unsw.edu.au>
>    Date: Wed, 16 Oct 2002 13:44:04 +1000
> 
>    Presumably on a sufficiently large SMP machine that this became an
>    issue, there would be multiple NICs.  Maybe it would make sense to
>    have one udp socket for each NIC.  Would that make sense? or work?
>    It feels to me to be cleaner than one for each CPU.
>    
> Doesn't make much sense.
> 
> Usually we are talking via one IP address, and thus over
> one device.  It could be using multiple NICs via BONDING,
> but that would be transparent to anything at the socket
> level.
> 
> Really, I think there is real value to making the socket
> per-cpu even on a 2 or 4 way system.

having a local socket per CPU is very good for SMP scaling.
it multiplies input buffer space, and reduces socket lock
and CPU cache contention.

sorry, i don't have measurements.
