* TCP throughput drops sharply around MTU of 180 bytes
From: Ang Way Chuang @ 2009-11-14 8:28 UTC
To: netdev
Hi kernel networking gurus,

I came across a situation where I need to evaluate the packing efficiency
of small packets. Because TCP is a byte-oriented protocol, there is no way
to force it to generate small packets directly, so I forced it by adjusting
the MTU.

However, during the experiment I found that the measured throughput of a
TCP stream dropped significantly when the MTU was set to a value below
180 bytes. At first I thought it must be a bug in my code, so I repeated
the test over a dedicated 10 Mbps Ethernet link. The results measured over
Ethernet are shown below:

MTU (bytes)   Throughput (Mbps)
-----------   -----------------
181           4.50
180           4.39
179           3.05
178           3.04

I used iperf throughout the tests. Can someone enlighten me on this matter?
Why does the throughput drop sharply when the MTU is less than 180 bytes?
Is there some special meaning associated with 180 that I don't know of?

Some basic information about my system:
kernel version: 2.6.27.5-117.fc10.x86_64
distro: Fedora 10
CPU: Pentium 4 3.00 GHz

Should you need more information, I shall gladly cooperate. Please keep me
in the loop, because it is hard for me to cope with the volume of email on
the netdev mailing list.

Thank you in advance.
Regards,
Ang Way Chuang
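
For readers who want to put the table in context, the per-packet overhead at these MTU values can be worked out with a short Python sketch. This is an illustration added to the thread, not something the poster ran. The header sizes are assumptions (IPv4 and TCP headers without options, an optional 12-byte TCP timestamp option, standard Ethernet framing), the function names are made up, and the result is only an upper bound that ignores ACK traffic and everything else on the link.

```python
# Assumed sizes, none of them stated in the original mail.
IPV4_HEADER = 20           # IPv4 header without options
TCP_HEADER = 20            # TCP header without options
TCP_TIMESTAMP_OPTION = 12  # 10-byte timestamp option plus 2 bytes of padding
# Ethernet cost per frame: 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETHERNET_OVERHEAD = 38


def tcp_payload(mtu: int, timestamps: bool = True) -> int:
    """Bytes of application data carried by one full-sized segment."""
    options = TCP_TIMESTAMP_OPTION if timestamps else 0
    return mtu - IPV4_HEADER - TCP_HEADER - options


def goodput_ceiling_mbps(mtu: int, link_mbps: float = 10.0,
                         timestamps: bool = True) -> float:
    """Upper bound on goodput if the link carried only back-to-back full frames."""
    return link_mbps * tcp_payload(mtu, timestamps) / (mtu + ETHERNET_OVERHEAD)


if __name__ == "__main__":
    for mtu in (181, 180, 179, 178):
        print(mtu, tcp_payload(mtu), round(goodput_ceiling_mbps(mtu), 2))
```

Under these assumptions the ceiling changes by well under 1% between neighbouring MTU values, so header overhead alone does not account for a drop from 4.39 to 3.05 Mbps.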
* Re: TCP throughput drops sharply around MTU of 180 bytes
From: Eric Dumazet @ 2009-11-14 8:53 UTC
To: Ang Way Chuang; +Cc: netdev
Ang Way Chuang wrote:
> Hi kernel networking gurus,
>
> I came across a situation where I need to evaluate the packing efficiency
> of small packets. Because TCP is a byte-oriented protocol, there is no way
> to force it to generate small packets directly, so I forced it by adjusting
> the MTU.
>
> However, during the experiment I found that the measured throughput of a
> TCP stream dropped significantly when the MTU was set to a value below
> 180 bytes. At first I thought it must be a bug in my code, so I repeated
> the test over a dedicated 10 Mbps Ethernet link. The results measured over
> Ethernet are shown below:
>
> MTU (bytes)   Throughput (Mbps)
> -----------   -----------------
> 181           4.50
> 180           4.39
> 179           3.05
> 178           3.04
>
> I used iperf throughout the tests. Can someone enlighten me on this matter?
> Why does the throughput drop sharply when the MTU is less than 180 bytes?
> Is there some special meaning associated with 180 that I don't know of?
>
> Some basic information about my system:
> kernel version: 2.6.27.5-117.fc10.x86_64
> distro: Fedora 10
> CPU: Pentium 4 3.00 GHz
>
> Should you need more information, I shall gladly cooperate. Please keep me
> in the loop, because it is hard for me to cope with the volume of email on
> the netdev mailing list.
>

Oh well, this reminds me that TCP_MAXSEG doesn't work as expected...
* Re: TCP throughput drops sharply around MTU of 180 bytes
From: Eric Dumazet @ 2009-11-14 9:12 UTC
To: Ang Way Chuang; +Cc: netdev
Eric Dumazet wrote:
>
> Oh well, this reminds me that TCP_MAXSEG doesn't work as expected...

Oops, this problem, which I had a few years ago, was fixed last year in 2.6.27, sorry...

commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390
Author: Tom Quetchenbach <virtualphtn@gmail.com>
Date:   Sun Sep 21 00:21:51 2008 -0700

    tcp: advertise MSS requested by user

    I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS
    for both sides of a bidirectional connection.

    man tcp says: "If this option is set before connection establishment, it
    also changes the MSS value announced to the other end in the initial
    packet."

    However, the kernel only uses the MTU/route cache to set the advertised
    MSS. That means if I set the MSS to, say, 500 before calling connect(),
    I will send at most 500-byte packets, but I will still receive 1500-byte
    packets in reply.

    This is a bug, either in the kernel or the documentation.

    This patch (applies to latest net-2.6) reduces the advertised value to
    that requested by the user as long as setsockopt() is called before
    connect() or accept(). This seems like the behavior that one would
    expect as well as that which is documented.

    I've tried to make sure that things that depend on the advertised MSS
    are set correctly.

    Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
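
The usage pattern the commit message describes, setting TCP_MAXSEG before connect(), might look like the following Python sketch. It is an illustration added to the thread, not code from the patch or from iperf. The helper names are hypothetical, the value 500 is simply the figure used in the commit message, and the sketch assumes a platform whose Python `socket` module exposes `TCP_MAXSEG` (Linux does).

```python
import socket


def make_clamped_socket(mss: int) -> socket.socket:
    """Create a TCP socket with TCP_MAXSEG set before the connection exists.

    Hypothetical helper: setting the option at this point is what the
    commit message describes as the case where the requested value should
    also be advertised to the peer.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_mss(host: str, port: int, mss: int):
    """Connect with a requested MSS and return (socket, MSS read back)."""
    sock = make_clamped_socket(mss)
    sock.connect((host, port))
    # Once connected, getsockopt() reports the segment size the kernel is
    # using for this connection, which may be smaller than the request.
    in_use = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
    return sock, in_use
```

A caller would use something like `connect_with_mss("192.168.0.1", 5001, 500)` and compare the returned value with a packet capture, since only the capture shows what was actually advertised in the SYN.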
* Re: TCP throughput drops sharply around MTU of 180 bytes
From: Ang Way Chuang @ 2009-11-14 9:24 UTC
To: Eric Dumazet, netdev
I assume that ip route .... advmss achieves that effect? I tried it,
but it doesn't work.
On Sat, Nov 14, 2009 at 5:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Eric Dumazet wrote:
>>
>> Oh well, this reminds me that TCP_MAXSEG doesn't work as expected...
>
> Oops, this problem, which I had a few years ago, was fixed last year in 2.6.27, sorry...
>
> commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390
> Author: Tom Quetchenbach <virtualphtn@gmail.com>
> Date: Sun Sep 21 00:21:51 2008 -0700
>
> tcp: advertise MSS requested by user
>
> I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS
> for both sides of a bidirectional connection.
>
> man tcp says: "If this option is set before connection establishment, it
> also changes the MSS value announced to the other end in the initial
> packet."
>
> However, the kernel only uses the MTU/route cache to set the advertised
> MSS. That means if I set the MSS to, say, 500 before calling connect(),
> I will send at most 500-byte packets, but I will still receive 1500-byte
> packets in reply.
>
> This is a bug, either in the kernel or the documentation.
>
> This patch (applies to latest net-2.6) reduces the advertised value to
> that requested by the user as long as setsockopt() is called before
> connect() or accept(). This seems like the behavior that one would
> expect as well as that which is documented.
>
> I've tried to make sure that things that depend on the advertised MSS
> are set correctly.
>
> Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
>
>
* Re: TCP throughput drops sharply around MTU of 180 bytes
From: Eric Dumazet @ 2009-11-14 9:57 UTC
To: Ang Way Chuang; +Cc: netdev
Ang Way Chuang wrote:
> I assume that ip route .... advmss achieves that effect? I tried it,
> but it doesn't work.
>
It should. Please provide your kernel version and what you tried exactly.
* Re: TCP throughput drops sharply around MTU of 180 bytes
From: Eric Dumazet @ 2009-11-14 10:19 UTC
To: Ang Way Chuang; +Cc: netdev
Eric Dumazet wrote:
> Ang Way Chuang wrote:
>> I assume that ip route .... advmss achieves that effect? I tried it,
>> but it doesn't work.
>>
>
> It should. Please provide your kernel version and what you tried exactly.

Here is a tcpdump of an iperf -M 120 session on a current net-next-2.6 kernel:

iperf -M 120 -c 192.168.0.1 -m
WARNING: attempt to set TCP maximum segment size to 120, but got 536
------------------------------------------------------------
Client connecting to 192.168.0.1, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.2 port 48178 connected with 192.168.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 26.4 MBytes 22.1 Mbits/sec
[ 3] MSS size 108 bytes (MTU 148 bytes, unknown interface)

Hints:
1) Use the iperf -M 120 flag on both sides
2) Disable TSO/GSO on the sender:
   ethtool -K eth0 tso off gso off

11:15:17.052202 IP 192.168.0.2.48177 > 192.168.0.1.5001: S 3914872183:3914872183(0) win 480 <mss 120,nop,nop,timestamp 63768988 0,nop,wscale 6>
11:15:17.052291 IP 192.168.0.1.5001 > 192.168.0.2.48177: S 2771326659:2771326659(0) ack 3914872184 win 432 <mss 120,nop,nop,timestamp 580923920 63768988,nop,wscale 6>
11:15:17.052494 IP 192.168.0.2.48177 > 192.168.0.1.5001: . ack 1 win 8 <nop,nop,timestamp 63768988 580923920>
11:15:17.052514 IP 192.168.0.2.48177 > 192.168.0.1.5001: P 1:25(24) ack 1 win 8 <nop,nop,timestamp 63768988 580923920>
11:15:17.052532 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 25:133(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923920>
11:15:17.052534 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 133:241(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923920>
11:15:17.052537 IP 192.168.0.2.48177 > 192.168.0.1.5001: P 241:349(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923920>
11:15:17.052612 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 25 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052621 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 349:457(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052624 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 133 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052629 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 457:565(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052631 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 349 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052635 IP 192.168.0.2.48177 > 192.168.0.1.5001: P 565:673(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052638 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 673:781(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052711 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 565 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052715 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 781:889(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052718 IP 192.168.0.2.48177 > 192.168.0.1.5001: P 889:997(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052720 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 781 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052724 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 997:1105(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052727 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 1105:1213(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052804 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 997 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052813 IP 192.168.0.2.48177 > 192.168.0.1.5001: P 1213:1321(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052816 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 1321:1429(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052823 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 1213 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052827 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 1429:1537(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052829 IP 192.168.0.2.48177 > 192.168.0.1.5001: P 1537:1645(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052901 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 1429 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052910 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 1645:1753(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052913 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 1753:1861(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052914 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 1645 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052917 IP 192.168.0.2.48177 > 192.168.0.1.5001: P 1861:1969(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052920 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 1969:2077(108) ack 1 win 8 <nop,nop,timestamp 63768988 580923921>
11:15:17.052990 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 1861 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.052996 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 2077:2185(108) ack 1 win 8 <nop,nop,timestamp 63768989 580923921>
11:15:17.052999 IP 192.168.0.2.48177 > 192.168.0.1.5001: P 2185:2293(108) ack 1 win 8 <nop,nop,timestamp 63768989 580923921>
11:15:17.053001 IP 192.168.0.1.5001 > 192.168.0.2.48177: . ack 2077 win 7 <nop,nop,timestamp 580923921 63768988>
11:15:17.053005 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 2293:2401(108) ack 1 win 8 <nop,nop,timestamp 63768989 580923921>
11:15:17.053008 IP 192.168.0.2.48177 > 192.168.0.1.5001: . 2401:2509(108) ack 1 win 8 <nop,nop,timestamp 63768989 580923921>
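
The segment sizes in a trace like the one above can be tallied with a few lines of Python. This is a sketch added for illustration, not part of the original mail. The regular expression assumes the older tcpdump text format shown here (`first:last(length)`), and the function names are made up. The 12-byte figure is the TCP timestamp option with its two NOP padding bytes, which every data segment in the trace carries; that is why an advertised MSS of 120 results in 108 bytes of data per segment, the same number iperf reports.

```python
import re
from collections import Counter

# Matches e.g. "IP 192.168.0.2.48177 > 192.168.0.1.5001: P 1:25(24) ack 1 ..."
SEGMENT_RE = re.compile(r"IP (\S+) > (\S+): \S+ (\d+):(\d+)\((\d+)\)")

TCP_TIMESTAMP_OPTION = 12  # 10-byte option plus 2 NOP bytes of padding


def payload_per_segment(advertised_mss: int) -> int:
    """Data bytes per full segment when the timestamp option is in use."""
    return advertised_mss - TCP_TIMESTAMP_OPTION


def data_segment_sizes(lines) -> Counter:
    """Count data-carrying segments in tcpdump text output, keyed by length."""
    sizes = Counter()
    for line in lines:
        match = SEGMENT_RE.search(line)
        if match is None:
            continue  # pure ACKs have no first:last(length) field
        length = int(match.group(5))
        if length > 0:  # skip SYN segments, which show a length of 0
            sizes[length] += 1
    return sizes
```

Feeding the trace lines to `data_segment_sizes()` should show one 24-byte segment and the rest at 108 bytes, which matches `payload_per_segment(120)`.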