public inbox for linux-kernel@vger.kernel.org
* [PATCH] TCP Zero Copy for mmapped files
@ 2002-12-30  1:09 Thomas Ogrisegg
  2002-12-30  1:29 ` Larry McVoy
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Ogrisegg @ 2002-12-30  1:09 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 529 bytes --]

The following patch (for 2.4.20 -- should work with all kernels
above 2.4.17) implements TCP Zero Copy for normal (writing)
socket operations on memory mapped files.

This is a major speedup for the TCP/IP stack (more than 100% more
throughput, depending on the size of the file) and makes sendfile(2)
nearly useless.

BTW: When I ran a (loopback) benchmark against my very own HTTP
server, it outperformed TUX by roughly 6%, and with logging disabled
by roughly 20%.

Please CC any replies to me, as I'm not subscribed to this list.

[-- Attachment #2: tcp.diff --]
[-- Type: text/plain, Size: 2685 bytes --]

--- linux.old/net/ipv4/tcp.c	Fri Nov 29 00:53:15 2002
+++ linux-2.4.20/net/ipv4/tcp.c	Sun Dec 29 20:30:10 2002
@@ -204,6 +204,7 @@
  *		Andi Kleen 	:	Make poll agree with SIGIO
  *	Salvatore Sanfilippo	:	Support SO_LINGER with linger == 1 and
  *					lingertime == 0 (RFC 793 ABORT Call)
+ *	Thomas Ogrisegg		:	Added TCP Zero Copy for mmapped files
  *					
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -1006,6 +1007,41 @@
 	return tmp;
 }
 
+static ssize_t file_send_actor (read_descriptor_t *desc, struct page *page,
+	unsigned long offset, unsigned long size)
+{
+	ssize_t res;
+	unsigned long count = desc->count;
+	struct sock *sk = (struct sock *) desc->buf;
+	int flags;
+
+	if (size > count)
+		size = count;
+
+	flags = (sk->socket->file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
+	if (size < count) flags |= MSG_MORE;
+
+#define TCP_ZC_CSUM_FLAGS (NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM)
+
+	if (!(sk->route_caps & NETIF_F_SG) ||
+		!(sk->route_caps & TCP_ZC_CSUM_FLAGS))
+		return sock_no_sendpage(sk->socket, page, offset, size, flags);
+
+#undef TCP_ZC_CSUM_FLAGS
+
+	TCP_CHECK_TIMER(sk);
+	res = do_tcp_sendpages(sk, &page, offset, size, flags);
+	TCP_CHECK_TIMER(sk);
+
+	if (res < 0) desc->error = res;
+	else {
+		desc->count -= res;
+		desc->written += res;
+	}
+
+	return res;
+}
+
 int tcp_sendmsg(struct sock *sk, struct msghdr *msg, int size)
 {
 	struct iovec *iov;
@@ -1015,6 +1051,7 @@
 	int mss_now;
 	int err, copied;
 	long timeo;
+	int has_sendpage = sk->socket->file->f_op->sendpage != NULL;
 
 	tp = &(sk->tp_pinfo.af_tcp);
 
@@ -1049,6 +1086,44 @@
 
 		iov++;
 
+		if (seglen >= PAGE_SIZE && has_sendpage) {
+			struct vm_area_struct *vma =
+				find_vma (current->mm, (long) from);
+			struct file *filp;
+
+			if (vma && (filp = vma->vm_file)) {
+				read_descriptor_t desc;
+				struct inode *in, *out;
+				loff_t pos = (long) from - vma->vm_start;
+
+				in  = filp->f_dentry->d_inode;
+				out = sk->socket->file->f_dentry->d_inode;
+
+				if (locks_verify_area (FLOCK_VERIFY_READ, in,
+					filp, filp->f_pos, seglen))
+					goto out_no_zero_copy;
+
+				if (locks_verify_area (FLOCK_VERIFY_WRITE, out,
+					sk->socket->file, 0, seglen))
+					goto out_no_zero_copy;
+
+				desc.written = 0;
+				desc.count   = seglen;
+				desc.buf     = (char *) sk;
+				desc.error   = 0;
+
+				do_generic_file_read (filp, &pos, &desc,
+					file_send_actor);
+
+				if (!desc.written) {
+					err = desc.error;
+					goto do_error;
+				}
+				copied += desc.written;
+				continue;
+			}
+		}
+out_no_zero_copy:
 		while (seglen > 0) {
 			int copy;
 			


* Re: [PATCH] TCP Zero Copy for mmapped files
  2002-12-30  1:09 [PATCH] TCP Zero Copy for mmapped files Thomas Ogrisegg
@ 2002-12-30  1:29 ` Larry McVoy
  2003-01-02  6:37   ` David S. Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Larry McVoy @ 2002-12-30  1:29 UTC (permalink / raw)
  To: Thomas Ogrisegg; +Cc: linux-kernel

How about putting this into a different function?  It's a lot to add
inline for a special case.

>  int tcp_sendmsg(struct sock *sk, struct msghdr *msg, int size)
>  {
>  	struct iovec *iov;
> @@ -1015,6 +1051,7 @@
>  	int mss_now;
>  	int err, copied;
>  	long timeo;
> +	int has_sendpage = sk->socket->file->f_op->sendpage != NULL;
>  
>  	tp = &(sk->tp_pinfo.af_tcp);
>  
> @@ -1049,6 +1086,44 @@
>  
>  		iov++;
>  
> +		if (seglen >= PAGE_SIZE && has_sendpage) {
> +			struct vm_area_struct *vma =
> +				find_vma (current->mm, (long) from);
> +			struct file *filp;
> +
> +			if (vma && (filp = vma->vm_file)) {
> +				read_descriptor_t desc;
> +				struct inode *in, *out;
> +				loff_t pos = (long) from - vma->vm_start;
> +
> +				in  = filp->f_dentry->d_inode;
> +				out = sk->socket->file->f_dentry->d_inode;
> +
> +				if (locks_verify_area (FLOCK_VERIFY_READ, in,
> +					filp, filp->f_pos, seglen))
> +					goto out_no_zero_copy;
> +
> +				if (locks_verify_area (FLOCK_VERIFY_WRITE, out,
> +					sk->socket->file, 0, seglen))
> +					goto out_no_zero_copy;
> +
> +				desc.written = 0;
> +				desc.count   = seglen;
> +				desc.buf     = (char *) sk;
> +				desc.error   = 0;
> +
> +				do_generic_file_read (filp, &pos, &desc,
> +					file_send_actor);
> +
> +				if (!desc.written) {
> +					err = desc.error;
> +					goto do_error;
> +				}
> +				copied += desc.written;
> +				continue;
> +			}
> +		}
> +out_no_zero_copy:
>  		while (seglen > 0) {
>  			int copy;
>  			


-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [PATCH] TCP Zero Copy for mmapped files
  2002-12-30  1:29 ` Larry McVoy
@ 2003-01-02  6:37   ` David S. Miller
  2003-01-02 22:12     ` Thomas Ogrisegg
  0 siblings, 1 reply; 19+ messages in thread
From: David S. Miller @ 2003-01-02  6:37 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Thomas Ogrisegg, linux-kernel

On Sun, 2002-12-29 at 17:29, Larry McVoy wrote:
> How about putting this into a different function?  It's a lot to add
> inline for a special case.

This patch also has a ton of other problems:

1) Does not handle writes that straddle multiple VMAs
2) We do not want to encourage people to use this mmap
   scheme anyways.  The mmap way consumes precious VM
   space, whereas the sendfile scheme does not.
3) Finally, I'm very dubious about the "this is faster than
   TUX" claim.  Firstly because you've not provided your
   self-made HTTP server so that others can try to reproduce
   your benchmark.  And secondly because you haven't indicated
   if your self-made HTTP server is as full featured as TUX or
   not.  And thirdly you haven't indicated what happens if in
   parallel clients ask to be served more files than you could
   fit via mmap into the HTTP server process's address space
   (i.e. see #2)

So I think this patch stinks :)



* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-02  6:37   ` David S. Miller
@ 2003-01-02 22:12     ` Thomas Ogrisegg
  2003-01-02 22:28       ` Larry McVoy
  2003-01-02 23:13       ` David S. Miller
  0 siblings, 2 replies; 19+ messages in thread
From: Thomas Ogrisegg @ 2003-01-02 22:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, Larry McVoy

On Wed, Jan 01, 2003 at 10:37:01PM -0800, David S. Miller wrote:
> On Sun, 2002-12-29 at 17:29, Larry McVoy wrote:
> > How about putting this into a different function?  It's a lot to add
> > inline for a special case.

All right.

> 1) Does not handle writes that straddle multiple VMAs

What exactly do you mean? In my test, files larger than a
page were handled perfectly, as well.

> 2) We do not want to encourage people to use this mmap
>    scheme anyways.  The mmap way consumes precious VM
>    space, whereas the sendfile scheme does not.

Is that the answer to my "sendfile is now obsolete"?

Sure we cannot remove sendfile now, as some applications
depend on it, but that's not what I wanted.

I made this patch so that _portable_ applications (and sendfile
is miles away from being portable - even if the target has a
sendfile system call, it's highly unlikely that it has the same
semantics as Linux' sendfile) are sped up.

However, I didn't like the VM waste either, but I believe there
is no other way.

> 3) Finally, I'm very dubious about the "this is faster than
>    TUX" claim.  Firstly because you've not provided your
>    self-made HTTP server so that others can try to reproduce
>    your benchmark.  And secondly because you haven't indicated
>    if your self-made HTTP server is as full featured as TUX or
>    not.  And thirdly you haven't indicated what happens if in
>    parallel clients ask to be served more files than you could
>    fit via mmap into the HTTP server process's address space
>    (i.e. see #2)

Hehe. In fact that wasn't a really serious claim. My tests
were (as explicitly stated by me) done over the loopback
interface. And as far as I know TUX can handle interrupts
from the network card directly, which probably makes it
far faster.

As I neither have the time nor the infrastructure to do a real
test, I cannot really say whether TUX or my (currently unreleased)
Webserver is faster.

BTW: My webserver maps files only once, so there shouldn't be
a problem with parallel transfers.

> So I think this patch stinks :)

But it worked? If I didn't misunderstand #1 then I don't see a
problem with integrating it into the current kernel.


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-02 22:12     ` Thomas Ogrisegg
@ 2003-01-02 22:28       ` Larry McVoy
  2003-01-02 23:20         ` Alan Cox
  2003-01-02 23:13       ` David S. Miller
  1 sibling, 1 reply; 19+ messages in thread
From: Larry McVoy @ 2003-01-02 22:28 UTC (permalink / raw)
  To: Thomas Ogrisegg; +Cc: David S. Miller, linux-kernel, Larry McVoy

> > 1) Does not handle writes that straddle multiple VMAs
> 
> What exactly do you mean? In my test, files larger than a
> page were handled perfectly, as well.

	mmap(file1 at location [a,b));
	mmap(file2 at location [b,c));
	write(sock, a, (size_t)(c - a));
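
A minimal user-space sketch of that case, assuming Linux and a writable /tmp. The helper names `make_file` and `write_across_two_mappings` are hypothetical; the address range is reserved first so that the two MAP_FIXED mappings end up back to back. A single write() then covers two different files, which is the situation a one-VMA lookup cannot describe.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: unlinked temp file holding `len` copies of `fill`. */
static int make_file(char fill, size_t len)
{
	char path[] = "/tmp/straddleXXXXXX";
	char *buf;
	ssize_t n;
	int fd;

	assert(len > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	buf = malloc(len);
	if (!buf) {
		close(fd);
		return -1;
	}
	memset(buf, fill, len);
	n = write(fd, buf, len);
	free(buf);
	if (n != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Map file1 at [a,b) and file2 at [b,c), then issue one write() over
 * [a,c).  Returns the number of bytes written, or -1 on error.
 */
static ssize_t write_across_two_mappings(int out_fd)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	ssize_t ret = -1;
	int fd1 = make_file('1', page);
	int fd2 = make_file('2', page);
	/* Reserve [a,c) so both fixed mappings land on addresses we own. */
	char *a = mmap(NULL, 2 * page, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (fd1 >= 0 && fd2 >= 0 && a != MAP_FAILED &&
	    mmap(a, page, PROT_READ, MAP_SHARED | MAP_FIXED, fd1, 0) != MAP_FAILED &&
	    mmap(a + page, page, PROT_READ, MAP_SHARED | MAP_FIXED, fd2, 0) != MAP_FAILED)
		ret = write(out_fd, a, 2 * page);	/* one buffer, two VMAs */

	if (a != MAP_FAILED)
		munmap(a, 2 * page);
	if (fd1 >= 0)
		close(fd1);
	if (fd2 >= 0)
		close(fd2);
	return ret;
}
```

The receiver should see one page from the first file followed by one page from the second, so any fast path has to switch source files at b.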

> However, I didn't like the VM waste either, but I believe there
> is no other way.

The VM cost hurts.  Badly.  Imagine that the network costs ZERO.  Then
the map/unmap/vm ops become the dominating term.  That's why it is a
fruitless approach, it still has a practical limit which is too low.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-02 22:12     ` Thomas Ogrisegg
  2003-01-02 22:28       ` Larry McVoy
@ 2003-01-02 23:13       ` David S. Miller
  2003-01-03  0:45         ` Thomas Ogrisegg
  1 sibling, 1 reply; 19+ messages in thread
From: David S. Miller @ 2003-01-02 23:13 UTC (permalink / raw)
  To: tom; +Cc: linux-kernel, lm

   From: Thomas Ogrisegg <tom@rhadamanthys.org>
   Date: Thu, 2 Jan 2003 23:12:11 +0100

   > 1) Does not handle writes that straddle multiple VMAs
   
   What exactly do you mean?

If I mmap two areas, one right after the other, then do a write
comprising those two areas, your code will only look up
one of the VMAs.

It's a bug.
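
A user-space model of what a correct lookup would have to do, as a sketch only: `struct region`, `find_region` and `split_by_region` are hypothetical stand-ins, not kernel code. `find_region` mimics the find_vma() contract of returning the first area that ends above the address, which may also start above it, so the start has to be checked as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: [start, end), file_id 0 = anonymous. */
struct region {
	unsigned long start, end;
	int file_id;
};

/* First region whose end lies above addr (regions sorted by address). */
static const struct region *find_region(const struct region *r, size_t n,
					unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < r[i].end)
			return &r[i];
	return NULL;
}

/*
 * Count the per-file chunks that the buffer [addr, addr + len) has to be
 * split into.  Returns -1 if any part is unmapped or not file backed, in
 * which case the caller would fall back to the ordinary copying path.
 */
static int split_by_region(const struct region *r, size_t n,
			   unsigned long addr, unsigned long len)
{
	unsigned long end = addr + len;
	int chunks = 0;

	assert(len > 0);
	while (addr < end) {
		const struct region *v = find_region(r, n, addr);
		unsigned long stop;

		if (!v || v->start > addr || !v->file_id)
			return -1;
		stop = v->end < end ? v->end : end;
		/* [addr, stop) comes from v's file at offset addr - v->start */
		chunks++;
		addr = stop;
	}
	return chunks;
}
```

With two adjacent file mappings, a write spanning both yields two chunks, one per file, instead of a single lookup at the start address.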
   
   > 2) We do not want to encourage people to use this mmap
   >    scheme anyways.  The mmap way consumes precious VM
   >    space, whereas the sendfile scheme does not.
   
   Is that the answer to my "sendfile is now obsolete"?
   
It is a "this patch is unacceptable because" comment.

   Sure we cannot remove sendfile now, as some applications
   depend on it, but that's not what I wanted.
   
That's not what I'm talking about.  I'm saying, making this
mmap thing available makes no sense at all.

   I made this patch so that _portable_ applications (and sendfile
   is miles away from being portable - even if the target has a
   sendfile system call, it's highly unlikely that it has the same
   semantics as Linux' sendfile) are sped up.
   
This isn't a priority for us.  People who want the best possible
performance can code their apps up to take advantage of sendfile()
on systems that have it.  (and really, show me how many systems
lack a sendfile mechanism these days).

   However, I didn't like the VM waste either, but I believe there
   is no other way.
   
There is a way, convert to sendfile.

   Hehe. In fact that wasn't a really serious claim.

Then don't make such claims.

   > So I think this patch stinks :)
   
   But it worked? If I didn't misunderstand #1 then I don't see a
   problem with integrating it into the current kernel.
   
I think you need to rethink the multiple VMA case in #1, and
also understand why I don't want this facility in the tree
at all anyways.  Apps can convert to sendfile(), and as a result
they'll get improved performance on ALL linux kernels, not just
the ones with your special patch applied.


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-02 23:20         ` Alan Cox
@ 2003-01-02 23:16           ` David S. Miller
  2003-01-03  0:56             ` Alan Cox
  0 siblings, 1 reply; 19+ messages in thread
From: David S. Miller @ 2003-01-02 23:16 UTC (permalink / raw)
  To: alan; +Cc: lm, tom, linux-kernel

   From: Alan Cox <alan@lxorguk.ukuu.org.uk>
   Date: 02 Jan 2003 23:20:44 +0000

   On Thu, 2003-01-02 at 22:28, Larry McVoy wrote:
   > The VM cost hurts.  Badly.  Imagine that the network costs ZERO.  Then
   > the map/unmap/vm ops become the dominating term.  That's why it is a
   > fruitless approach, it still has a practical limit which is too low.
   
   It depends how predictable your content is. With a 64bit box and a porn
   server it's probably quite tidy
   
Let's say you have infinite VM (which is what 64-bit almost is :) then
the cost is setting up all of these useless VMAs for each and every
file (which is a one-time cost, ok), and also the VMA lookup on each
write() call.

With sendfile() all of this goes straight to the page cache directly
without a VMA lookup.
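
For comparison, a minimal sketch of the sendfile() path on Linux. `send_whole_file` is a hypothetical helper; it assumes `in_fd` is a regular file and `out_fd` is a connected stream socket. No mapping is created, so the process address space is not involved at all.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send all of in_fd to out_fd.  Returns bytes sent, or -1 on error. */
static ssize_t send_whole_file(int out_fd, int in_fd)
{
	struct stat st;
	off_t off = 0;

	assert(out_fd >= 0 && in_fd >= 0);
	if (fstat(in_fd, &st) < 0)
		return -1;

	while (off < st.st_size) {
		/* sendfile() advances `off` by the number of bytes it sent. */
		ssize_t n = sendfile(out_fd, in_fd, &off,
				     (size_t)(st.st_size - off));
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;	/* file shrank while we were sending */
	}
	return (ssize_t)off;
}
```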


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-02 22:28       ` Larry McVoy
@ 2003-01-02 23:20         ` Alan Cox
  2003-01-02 23:16           ` David S. Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2003-01-02 23:20 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Thomas Ogrisegg, David S. Miller, Linux Kernel Mailing List

On Thu, 2003-01-02 at 22:28, Larry McVoy wrote:
> The VM cost hurts.  Badly.  Imagine that the network costs ZERO.  Then
> the map/unmap/vm ops become the dominating term.  That's why it is a
> fruitless approach, it still has a practical limit which is too low.

It depends how predictable your content is. With a 64bit box and a porn
server it's probably quite tidy



* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-02 23:13       ` David S. Miller
@ 2003-01-03  0:45         ` Thomas Ogrisegg
  2003-01-03  1:01           ` Larry McVoy
                             ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Thomas Ogrisegg @ 2003-01-03  0:45 UTC (permalink / raw)
  To: David S. Miller; +Cc: tom, linux-kernel, lm

On Thu, Jan 02, 2003 at 03:13:46PM -0800, David S. Miller wrote:
>    From: Thomas Ogrisegg <tom@rhadamanthys.org>
>    Date: Thu, 2 Jan 2003 23:12:11 +0100
> 
> It's a bug.

I see. Ok, that can be fixed easily.

>    Sure we cannot remove sendfile now, as some applications
>    depend on it, but that's not what I wanted.
>    
> That's not what I'm talking about.  I'm saying, making this
> mmap thing available makes no sense at all.

No. For portable applications it makes great sense.

>    I made this patch so that _portable_ applications (and sendfile
>    is miles away from being portable - even if the target has a
>    sendfile system call, it's highly unlikely that it has the same
>    semantics as Linux' sendfile) are sped up.
>    
> This isn't a priority for us.  People who want the best possible
> performance can code their apps up to take advantage of sendfile()
> on systems that have it.

So you want to chain people to your "proprietary solution"?

> (and really, show me how many systems
> lack a sendfile mechanism these days).

What kind of systems are you talking about? Operating systems?
Nearly all.

>    However, I didn't like the VM waste either, but I believe there
>    is no other way.
>    
> There is a way, convert to sendfile.

It might be a bit difficult to convert all applications to
sendfile. Especially those for which you don't have the
source code.

>    But it worked? If I didn't misunderstand #1 then I don't see a
>    problem with integrating it into the current kernel.
>    
> I think you need to rethink the multiple VMA case in #1, and
> also understand why I don't want this facility in the tree
> at all anyways.  Apps can convert to sendfile(), and as a result
> they'll get improved performance on ALL linux kernels, not just
> the ones with your special patch applied.

I don't see your point. Applications which really need the
performance will switch to sendfile anyway because of the
problems with mmap that you mentioned.

My patch is very simple and takes less than 1KB of code but
will speed up many applications and doesn't have a real
drawback (except when sending "normal" data which is larger
than a page - but that shouldn't happen very often).

Yet another advantage of my version is that you can use it
in conjunction with writev.

Unfortunately the linux-sendfile is not as good as the HP-UX
one. Under HP-UX you can define a "struct iovec" header to
be sent before the file is sent.


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-02 23:16           ` David S. Miller
@ 2003-01-03  0:56             ` Alan Cox
  2003-01-03  2:40               ` David S. Miller
  2003-01-03  2:41               ` Linus Torvalds
  0 siblings, 2 replies; 19+ messages in thread
From: Alan Cox @ 2003-01-03  0:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: lm, tom, Linux Kernel Mailing List

On Thu, 2003-01-02 at 23:16, David S. Miller wrote:    
>    It depends how predictable your content is. With a 64bit box and a porn
>    server it's probably quite tidy
>    
> Let's say you have infinite VM (which is what 64-bit almost is :) then
> the cost is setting up all of these useless VMAs for each and every
> file (which is a one-time cost, ok), and also the VMA lookup on each
> write() call.
> 
> With sendfile() all of this goes straight to the page cache directly
> without a VMA lookup.

With a nasty unpleasant splat the moment you modify the
content at all. For static objects sendfile is certainly superior.



* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-03  0:45         ` Thomas Ogrisegg
@ 2003-01-03  1:01           ` Larry McVoy
  2003-01-03  1:59             ` Alan Cox
  2003-01-03  1:56           ` Alan Cox
  2003-01-03  2:42           ` David S. Miller
  2 siblings, 1 reply; 19+ messages in thread
From: Larry McVoy @ 2003-01-03  1:01 UTC (permalink / raw)
  To: Thomas Ogrisegg; +Cc: David S. Miller, linux-kernel, lm

> It might be a bit difficult to convert all applications to
> sendfile. Especially those for which you don't have the
> source code.

And the list of applications which do

	sock = socket(...);
	map = mmap(...);
	write(sock, map, bytes);

are?  There are not very many that I know of and if you look carefully
at the bandwidth graphs in LMbench you'll see why.  There is a crossover
point where mmap becomes cheaper but it used to be around 16-64K.
I don't know what it is now, I doubt it's moved much.  I can check if
you really want.
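
Spelled out, the pattern in question looks roughly like the sketch below. `send_mapped_file` is a hypothetical helper and assumes a non-empty regular file; the mmap/munmap pair around every transfer is the fixed cost that has to be amortized before mapping beats a plain read()/write() loop.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map in_fd read-only and push it through write().  Returns bytes or -1. */
static ssize_t send_mapped_file(int sock, int in_fd)
{
	struct stat st;
	size_t len, done = 0;
	char *map;

	assert(sock >= 0 && in_fd >= 0);
	if (fstat(in_fd, &st) < 0 || st.st_size <= 0)
		return -1;
	len = (size_t)st.st_size;

	map = mmap(NULL, len, PROT_READ, MAP_SHARED, in_fd, 0);
	if (map == MAP_FAILED)
		return -1;

	while (done < len) {
		ssize_t n = write(sock, map + done, len - done);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			break;
		}
		done += (size_t)n;
	}

	munmap(map, len);	/* the per-transfer VM cost, paid again */
	return done == len ? (ssize_t)len : -1;
}
```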
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-03  1:56           ` Alan Cox
@ 2003-01-03  1:27             ` Larry McVoy
  0 siblings, 0 replies; 19+ messages in thread
From: Larry McVoy @ 2003-01-03  1:27 UTC (permalink / raw)
  To: Alan Cox; +Cc: Thomas Ogrisegg, David S. Miller, Linux Kernel Mailing List, lm

On Fri, Jan 03, 2003 at 01:56:27AM +0000, Alan Cox wrote:
> On Fri, 2003-01-03 at 00:45, Thomas Ogrisegg wrote:
> > Unfortunately the linux-sendfile is not as good as the HP-UX
> > one. Under HP-UX you can define a "struct iovec" header to
> > be sent before the file is sent.
> 
> That's a design decision. With TCP_CORK and sensible syscall performance,
> those kinds of web-specific hacks are not appropriate.

Indeed.  In case Alan's message wasn't clear: if your syscall overhead
is zero then many "optimizations" become superfluous.  In fact, those
optimizations, one cache miss at a time, tend to be a big part of what
makes the syscall layer so heavyweight.

Linux is amazing in that it is basically the only real operating system
I know of that has stayed so focused on making the syscall layer be
almost invisible.  It's worth a "rah rah" because you can use the
operating system like it was libc; there is basically very little
cost in crossing in/out.

Here's the LMbench context switch benchmark running on a 1.6GHz Athlon:

load free cach swap pgin  pgou dk0 dk1 dk2 dk3 ipkt opkt  int  ctx  usr sys idl
0.67  73M 577M  25M   0     0    0   0   0   0  4.0  2.0  107  548K  23  77   0
0.67  73M 577M  25M   0     0    0   0   0   0  2.0  2.0  105  549K  19  81   0
0.67  73M 577M  25M   0     0    0   0   0   0  4.0  2.0  107  549K  27  73   0
0.70  73M 577M  25M   0     0    0   0   0   0  2.0  2.0  105  548K  23  77   0

Yeah, that's more than half a million context switches/second and each
of those includes 2 system calls.  So Linux is doing 2 system calls and
a context switch in 1.8 microseconds.
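
The 1.8 microsecond figure follows directly from the ctx column above (about 548K switches per second). A tiny sketch of the arithmetic, with `usec_per_switch` as a hypothetical helper name:

```c
#include <assert.h>

/* Microseconds spent per context switch at a given switch rate. */
static double usec_per_switch(double switches_per_sec)
{
	assert(switches_per_sec > 0.0);
	return 1e6 / switches_per_sec;
}
```

For 548,000 switches per second this gives roughly 1.82 microseconds, each covering one context switch and its two system calls.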

When you can get in and out of the kernel that fast, your thinking should 
change.  You get to use the kernel more freely.  And you certainly don't
want to do anything to screw that up.  My hat is off to Linus and team 
for working so hard to make these numbers be so good (and keep on working,
see the recent syscall discussion).
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-03  0:45         ` Thomas Ogrisegg
  2003-01-03  1:01           ` Larry McVoy
@ 2003-01-03  1:56           ` Alan Cox
  2003-01-03  1:27             ` Larry McVoy
  2003-01-03  2:42           ` David S. Miller
  2 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2003-01-03  1:56 UTC (permalink / raw)
  To: Thomas Ogrisegg; +Cc: David S. Miller, Linux Kernel Mailing List, lm

On Fri, 2003-01-03 at 00:45, Thomas Ogrisegg wrote:
> Unfortunately the linux-sendfile is not as good as the HP-UX
> one. Under HP-UX you can define a "struct iovec" header to
> be sent before the file is sent.

That's a design decision. With TCP_CORK and sensible syscall performance,
those kinds of web-specific hacks are not appropriate.
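
A minimal sketch of the TCP_CORK approach on Linux, assuming `sock` is a connected TCP socket and `file_fd` a regular file. `send_header_and_file` is a hypothetical helper: the header goes out with an ordinary write(), the body with sendfile(), and the cork keeps the kernel from pushing out a partial frame between the two.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send hdr, then `count` bytes of file_fd.  Returns 0, or -1 on error. */
static int send_header_and_file(int sock, const void *hdr, size_t hdr_len,
				int file_fd, size_t count)
{
	int one = 1, zero = 0, ret = -1;
	off_t pos = 0;

	assert(hdr != NULL);
	/* Cork: queue data instead of sending partial frames. */
	if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &one, sizeof one) < 0)
		return -1;

	if (write(sock, hdr, hdr_len) == (ssize_t)hdr_len) {
		ret = 0;
		while ((size_t)pos < count) {
			ssize_t n = sendfile(sock, file_fd, &pos,
					     count - (size_t)pos);
			if (n <= 0) {
				ret = -1;
				break;
			}
		}
	}

	/* Uncork: flush whatever is still queued. */
	if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &zero, sizeof zero) < 0)
		ret = -1;
	return ret;
}
```

This costs a few extra system calls compared with a combined header-plus-file call, which is the trade the cheap syscall layer is meant to make acceptable.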



* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-03  1:01           ` Larry McVoy
@ 2003-01-03  1:59             ` Alan Cox
  2003-01-06 14:36               ` Gianni Tedesco
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2003-01-03  1:59 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Thomas Ogrisegg, David S. Miller, Linux Kernel Mailing List

On Fri, 2003-01-03 at 01:01, Larry McVoy wrote:
> And the list of applications which do
> 
> 	sock = socket(...);
> 	map = mmap(...);
> 	write(sock, map, bytes);
> 
> are?  There are not very many that I know of and if you look carefully
> at the bandwidth graphs in LMbench you'll see why.  There is a crossover
> point where mmap becomes cheaper but it used to be around 16-64K.
> I don't know what it is now, I doubt it's moved much.  I can check if
> you really want.

You may not be doing an mmap per send; it's more likely to look like

	page = hash(url);
	memcpy(current_time, page->clock, TIMESIZE);
	write(sock, page->data, page->len);

that changes the breakeven point a lot
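
A fleshed-out sketch of that pattern, on the assumption that the intent is to stamp the current time into a cached response before each send (memcpy takes the destination first). `struct cached_page`, `page_init`, `send_page` and the 8-byte TIMESIZE are all hypothetical; the point is that the buffer is set up once and reused, so no mapping work happens per request.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TIMESIZE 8	/* hypothetical width of the time field, "HH:MM:SS" */

/* Hypothetical cache entry: a prebuilt response with a slot for the time. */
struct cached_page {
	char data[128];
	size_t len;
	char *clock;	/* points at TIMESIZE bytes inside data */
};

/* Build the response once: "Date: " + blank time slot + newline + body. */
static void page_init(struct cached_page *p, const char *body)
{
	int n = snprintf(p->data, sizeof p->data, "Date: %*s\n%s",
			 TIMESIZE, "", body);

	assert(n > 0 && (size_t)n < sizeof p->data);
	p->len = (size_t)n;
	p->clock = p->data + strlen("Date: ");
}

/* Per request: patch the time slot in place, then send the whole page. */
static ssize_t send_page(int sock, struct cached_page *p,
			 const char *current_time)
{
	assert(strlen(current_time) >= TIMESIZE);
	memcpy(p->clock, current_time, TIMESIZE);
	return write(sock, p->data, p->len);
}
```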

Alan



* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-03  0:56             ` Alan Cox
@ 2003-01-03  2:40               ` David S. Miller
  2003-01-03  2:41               ` Linus Torvalds
  1 sibling, 0 replies; 19+ messages in thread
From: David S. Miller @ 2003-01-03  2:40 UTC (permalink / raw)
  To: alan; +Cc: lm, tom, linux-kernel

   From: Alan Cox <alan@lxorguk.ukuu.org.uk>
   Date: 03 Jan 2003 00:56:59 +0000

   On Thu, 2003-01-02 at 23:16, David S. Miller wrote:    
   > With sendfile() all of this goes straight to the page cache directly
   > without a VMA lookup.
   
   With a nasty unpleasant splat the moment you do modification on the
   content at all. For static objects sendfile is certainly superior,
   
Sendfile does not protect against changes to the
file contents.  We don't lock the pages, we merely grab
references to them for the network I/O.


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-03  0:56             ` Alan Cox
  2003-01-03  2:40               ` David S. Miller
@ 2003-01-03  2:41               ` Linus Torvalds
  1 sibling, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2003-01-03  2:41 UTC (permalink / raw)
  To: linux-kernel

In article <1041555419.24901.86.camel@irongate.swansea.linux.org.uk>,
Alan Cox  <alan@lxorguk.ukuu.org.uk> wrote:
>On Thu, 2003-01-02 at 23:16, David S. Miller wrote:    
>> 
>> With sendfile() all of this goes straight to the page cache directly
>> without a VMA lookup.
>
>With a nasty unpleasant splat the moment you do modification on the
>content at all. For static objects sendfile is certainly superior,

Oh, the "unpleasant splat" happens with the mmap approach too, there's
no avoiding it. It can happen with a regular "read()" loop too (if the
read happens at the wrong time).

Both mmap and sendfile have the issue that the "splat" can happen every
time, while a read() into a private area means that the splat can only
happen the first time the web server caches the content.  But the read
into a private area is also obviously the worst one from a performance
standpoint. 

There are two ways to avoid the splat:
 - lock the file some way before reading/writing to it.
 - do all updates to a temp-file, and move the temp-file to the new location.

Those two approaches will fix the "splat" problem _regardless_ of what
IO mechanism you use. With that in mind, sendfile() is clearly the one
that performs best by far, so..
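
The second approach can be sketched as follows, assuming the temp file lives in the same directory as the target so that rename() stays within one filesystem. `replace_file` is a hypothetical helper; note that mkstemp() creates the file with mode 0600, so a real server would probably also have to fix up permissions.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Replace `path` with new contents via a temp file and rename(). */
static int replace_file(const char *path, const void *data, size_t len)
{
	char tmp[4096];
	int n, fd;

	assert(path != NULL && data != NULL);
	n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
	if (n < 0 || (size_t)n >= sizeof tmp)
		return -1;

	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	close(fd);

	/* Readers see either the old file or the new one, never a mix. */
	if (rename(tmp, path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

A transfer that already has the old file open keeps reading the old contents, while every later open() gets the new file, whichever I/O mechanism is used to send it.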

		Linus


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-03  0:45         ` Thomas Ogrisegg
  2003-01-03  1:01           ` Larry McVoy
  2003-01-03  1:56           ` Alan Cox
@ 2003-01-03  2:42           ` David S. Miller
  2 siblings, 0 replies; 19+ messages in thread
From: David S. Miller @ 2003-01-03  2:42 UTC (permalink / raw)
  To: tom; +Cc: linux-kernel, lm

   From: Thomas Ogrisegg <tom@rhadamanthys.org>
   Date: Fri, 3 Jan 2003 01:45:43 +0100

   > This isn't a priority for us.  People who want the best possible
   > performance can code their apps up to take advantage of sendfile()
   > on systems that have it.
   
   So you want to chain people to your "proprietary solution"?
   
I don't hide my APIs.

   > (and really, show me how many systems
   > lack a sendfile mechanism these days).
   
   What kind of systems are you talking about? Operating systems?
   Nearly all.
   
HPUX has it, Solaris has it, Microsoft has something very similar,
FreeBSD has it as does I believe NetBSD.  Show me the exceptions.

   It might be a bit difficult to convert all applications to
   sendfile. Especially those for which you don't have the
   source code.
   
If the performance really must be top notch, someone will invest
the time for a given application.  Otherwise, if it's not
important enough to port why should it be important enough to put
a hack into the OS for it?

   I don't see your point. Applications which really need the
   performance will switch to sendfile anyway because of the
   problems with mmap, you mentioned.
   
Right, so why bother with your patch?

   My patch is very simple and takes less than 1KB of code but
   will speed up many applications and doesn't have a real
   drawback (except when sending "normal" data which is larger
   than a page - but that shouldn't happen very often).
   
What about the extra checks you are placing in a fast path?


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-03  1:59             ` Alan Cox
@ 2003-01-06 14:36               ` Gianni Tedesco
  2003-01-06 23:29                 ` David S. Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Gianni Tedesco @ 2003-01-06 14:36 UTC (permalink / raw)
  To: Alan Cox
  Cc: Larry McVoy, Thomas Ogrisegg, David S. Miller,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 602 bytes --]

On Fri, 2003-01-03 at 01:59, Alan Cox wrote:
> You may not be doing an mmap per send; it's more likely to look like
> 
> 	page = hash(url);
> 	memcpy(current_time, page->clock, TIMESIZE);
> 	write(sock, page->data, page->len);

If your web data rarely changes, it could also be all the files stored
in a hashfile database covered by one large mmap, eliminating filesystem
overhead (and vma overhead).

-- 
// Gianni Tedesco (gianni at scaramanga dot co dot uk)
lynx --source www.scaramanga.co.uk/gianni-at-ecsc.asc | gpg --import
8646BE7D: 6D9F 2287 870E A2C9 8F60 3A3C 91B5 7669 8646 BE7D

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [PATCH] TCP Zero Copy for mmapped files
  2003-01-06 14:36               ` Gianni Tedesco
@ 2003-01-06 23:29                 ` David S. Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David S. Miller @ 2003-01-06 23:29 UTC (permalink / raw)
  To: gianni; +Cc: alan, lm, tom, linux-kernel

   From: Gianni Tedesco <gianni@ecsc.co.uk>
   Date: 06 Jan 2003 14:36:19 +0000
   
   If your web data rarely changes, it could also be all the files stored
   in a hashfile database covered by one large mmap, eliminating filesystem
   overhead (and vma overhead).

You still would eat a VMA lookup each and every send.
