From: Niklas Cassel <niklas.cassel@axis.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: network stream fairness
Date: Fri, 20 Nov 2015 16:33:17 +0100
Message-ID: <564F3D3D.7050004@axis.com>
In-Reply-To: <1447085271.17135.44.camel@edumazet-glaptop2.roam.corp.google.com>
On 11/09/2015 05:07 PM, Eric Dumazet wrote:
> On Mon, 2015-11-09 at 16:53 +0100, Niklas Cassel wrote:
>> On 11/09/2015 04:50 PM, Eric Dumazet wrote:
>>> On Mon, 2015-11-09 at 16:41 +0100, Niklas Cassel wrote:
>>>> I have an Ethernet driver for a 100 Mbps NIC.
>>>> The NIC has dedicated hardware for offloading.
>>>> The driver implements TSO, GSO and BQL.
>>>> Since the CPU on the SoC is rather weak, I'd rather
>>>> not increase the CPU load by turning off offloading.
>>>>
>>>> Since commit
>>>> 605ad7f184b6 ("tcp: refine TSO autosizing")
>>>>
>>>> the bandwidth is no longer fair between streams.
>>>> See the output at the end of this mail, where I'm testing with 2 streams.
>>>>
>>>>
>>>> If I revert 605ad7f184b6 on 4.3, I get a stable 45 Mbps per stream.
>>>>
>>>> I can also use vanilla 4.3 and do:
>>>> echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
>>>> to get a stable 45 Mbps per stream.
>>>>
>>>> My question is: am I supposed to set the BQL limit explicitly?
>>>> It is possible that I have missed something in my driver,
>>>> but my understanding is that the networking stack sets and adjusts
>>>> the BQL limit automatically.
>>>>
>>>>
>>>> Perhaps the following info might help:
>>>>
>>>> After running iperf3 on vanilla 4.3:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 89908
>>>> limit_max 1879048192
>>>>
>>>> After running iperf3 on vanilla 4.3 + BQL explicitly set:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 3000
>>>> limit_max 3000
>>>>
>>>> After running iperf3 on 4.3 + 605ad7f184b6 reverted:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 8886
>>>> limit_max 1879048192
>>>>
>>>
>>> There is absolutely nothing ensuring fairness among multiple TCP flows.
>>>
>>> One TCP flow can very easily grab the whole bandwidth for itself; there are
>>> numerous descriptions of this phenomenon in various TCP studies.
>>>
>>> This is why we have packet schedulers ;)
>>
>> Oh... how stupid of me, I forgot to mention: all of the measurements were
>> done with fq_codel.
>
> Your numbers then suggest cwnd growth, which might indicate a congestion control (CC) bug.
>
> Please run the following while your iperf3 runs on a regular 4.3 kernel:
>
> for i in `seq 1 10`
> do
> ss -temoi dst 192.168.0.141
> sleep 1
> done
>
>
I've been able to reproduce this on an ARMv7, single-core system with a 100 Mbps NIC.
The kernel is vanilla 4.3; the driver has BQL implemented, but is unfortunately not upstreamed.
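For reference, the BQL hooks in the driver follow the usual pattern.
This is only a minimal sketch, since the driver is not upstream;
pkts_compl and bytes_compl are placeholder names for the totals
reclaimed in the completion path:

/* in ndo_start_xmit, once the skb has been queued to the hardware */
netdev_sent_queue(dev, skb->len);

/* in the TX completion path (IRQ/NAPI), after reclaiming descriptors */
netdev_completed_queue(dev, pkts_compl, bytes_compl);

/* on ifdown/reset, so the DQL state does not go stale */
netdev_reset_queue(dev);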
ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: on
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:40:8c:18:58:c8 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.136/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
# before iperf3 run
tc -s -d qdisc
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 21001 bytes 45 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic
# after iperf3 run
tc -s -d qdisc
qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 5618224754 bytes 3710914 pkt (dropped 0, overlimits 0 requeues 1)
backlog 0b 0p requeues 1
maxpacket 1514 drop_overlimit 0 new_flow_count 2 ecn_mark 0
new_flows_len 0 old_flows_len 0
Note that throughput appears stable for 411 seconds before the
congestion window growth becomes visible. The amount of time you have
to wait before things go downhill seems to vary a lot between runs.
No switch was used between the server and client; they were connected directly.
For full iperf3 log and output from ss command, see attachment.
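The two streams were generated with a plain iperf3 client run along
these lines (the exact invocation is not preserved in this mail, so
the duration below is a guess; 192.168.0.141 is the server):

iperf3 -c 192.168.0.141 -P 2 -t 3600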
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 411.00-412.00 sec 5.09 MBytes 42.7 Mbits/sec 0 22.6 KBytes
[ 6] 411.00-412.00 sec 5.14 MBytes 43.1 Mbits/sec 0 22.6 KBytes
[SUM] 411.00-412.00 sec 10.2 MBytes 85.8 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 412.00-413.00 sec 5.12 MBytes 43.0 Mbits/sec 0 22.6 KBytes
[ 6] 412.00-413.00 sec 5.13 MBytes 43.0 Mbits/sec 0 22.6 KBytes
[SUM] 412.00-413.00 sec 10.3 MBytes 86.0 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 413.00-414.00 sec 5.17 MBytes 43.4 Mbits/sec 0 22.6 KBytes
[ 6] 413.00-414.00 sec 5.07 MBytes 42.6 Mbits/sec 0 22.6 KBytes
[SUM] 413.00-414.00 sec 10.2 MBytes 86.0 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 414.00-415.00 sec 5.11 MBytes 42.9 Mbits/sec 0 22.6 KBytes
[ 6] 414.00-415.00 sec 5.14 MBytes 43.1 Mbits/sec 0 22.6 KBytes
[SUM] 414.00-415.00 sec 10.3 MBytes 86.0 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 415.00-416.00 sec 5.11 MBytes 42.9 Mbits/sec 0 32.5 KBytes
[ 6] 415.00-416.00 sec 5.15 MBytes 43.2 Mbits/sec 0 22.6 KBytes
[SUM] 415.00-416.00 sec 10.3 MBytes 86.2 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 416.00-417.00 sec 6.18 MBytes 51.8 Mbits/sec 0 35.4 KBytes
[ 6] 416.00-417.00 sec 4.08 MBytes 34.3 Mbits/sec 0 22.6 KBytes
[SUM] 416.00-417.00 sec 10.3 MBytes 86.1 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 417.00-418.00 sec 6.24 MBytes 52.4 Mbits/sec 0 35.4 KBytes
[ 6] 417.00-418.00 sec 4.01 MBytes 33.6 Mbits/sec 0 22.6 KBytes
[SUM] 417.00-418.00 sec 10.3 MBytes 86.0 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 418.00-419.00 sec 6.28 MBytes 52.7 Mbits/sec 0 35.4 KBytes
[ 6] 418.00-419.00 sec 3.98 MBytes 33.4 Mbits/sec 0 22.6 KBytes
[SUM] 418.00-419.00 sec 10.3 MBytes 86.0 Mbits/sec 0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 419.00-420.00 sec 6.30 MBytes 52.8 Mbits/sec 0 35.4 KBytes
[ 6] 419.00-420.00 sec 3.96 MBytes 33.2 Mbits/sec 0 22.6 KBytes
[SUM] 419.00-420.00 sec 10.3 MBytes 86.0 Mbits/sec 0
[-- Attachment #2: iperf3-ss-logs.tar.gz --]
[-- Type: application/gzip, Size: 62161 bytes --]