netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tcp: allow splice() to build full TSO packets
@ 2012-04-03 19:37 Eric Dumazet
  2012-04-03 21:21 ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-04-03 19:37 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Neal Cardwell, Tom Herbert, Yuchung Cheng, H.K. Jerry Chu,
	Maciej Żenczykowski, Mahesh Bandewar, Ilpo Järvinen,
	Nandita Dukkipati

vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
---
 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cfd7edd..2ff6f45 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -860,7 +860,7 @@ wait_for_memory:
 	}
 
 out:
-	if (copied)
+	if (copied && !(flags & MSG_MORE))
 		tcp_push(sk, flags, mss_now, tp->nonagle);
 	return copied;
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-04-06  1:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-03 19:37 [PATCH] tcp: allow splice() to build full TSO packets Eric Dumazet
2012-04-03 21:21 ` David Miller
2012-04-03 21:31   ` Eric Dumazet
2012-04-03 21:36     ` David Miller
2012-04-03 22:50       ` David Miller
2012-04-04  6:15         ` Eric Dumazet
2012-04-04  6:20           ` David Miller
2012-04-05 13:05       ` Eric Dumazet
2012-04-05 23:05         ` David Miller
2012-04-06  1:59           ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).