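From: arno@natisbad.org (Arnaud Ebalard)
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>,
    Florian Fainelli <f.fainelli@gmail.com>,
    simon.guinot@sequanux.org, netdev@vger.kernel.org,
    edumazet@google.com, Cong Wang <xiyou.wangcong@gmail.com>,
    Willy Tarreau <w@1wt.eu>,
    linux-arm-kernel@lists.infradead.org
Subject: Re: [BUG,REGRESSION?] 3.11.6+,3.12: GbE iface rate drops to few KB/s
Date: Tue, 19 Nov 2013 07:44:50 +0100
Message-ID: <87li0kkhzx.fsf@natisbad.org>
In-Reply-To: <1384710098.8604.58.camel@edumazet-glaptop2.roam.corp.google.com> (Eric Dumazet's message of "Sun, 17 Nov 2013 09:41:38 -0800")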
Hi,
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On Sun, 2013-11-17 at 15:19 +0100, Willy Tarreau wrote:
>
>>
>> So it is fairly possible that in your case you can't fill the link if you
>> consume too many descriptors. For example, if your server uses TCP_NODELAY
>> and sends incomplete segments (which is quite common), it's very easy to
>> run out of descriptors before the link is full.
>
> BTW I have a very simple patch for TCP stack that could help this exact
> situation...
>
> Idea is to use TCP Small Queue so that we dont fill qdisc/TX ring with
> very small frames, and let tcp_sendmsg() have more chance to fill
> complete packets.
>
> Again, for this to work very well, you need that NIC performs TX
> completion in reasonable amount of time...
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3dc0c6c..10456cf 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -624,13 +624,19 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
>  {
>          if (tcp_send_head(sk)) {
>                  struct tcp_sock *tp = tcp_sk(sk);
> +                struct sk_buff *skb = tcp_write_queue_tail(sk);
>
>                  if (!(flags & MSG_MORE) || forced_push(tp))
> -                        tcp_mark_push(tp, tcp_write_queue_tail(sk));
> +                        tcp_mark_push(tp, skb);
>
>                  tcp_mark_urg(tp, flags);
> -                __tcp_push_pending_frames(sk, mss_now,
> -                                          (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
> +                if (flags & MSG_MORE)
> +                        nonagle = TCP_NAGLE_CORK;
> +                if (atomic_read(&sk->sk_wmem_alloc) > 2048) {
> +                        set_bit(TSQ_THROTTLED, &tp->tsq_flags);
> +                        nonagle = TCP_NAGLE_CORK;
> +                }
> +                __tcp_push_pending_frames(sk, mss_now, nonagle);
>          }
>  }
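For readers following along, the decision the quoted patch adds to tcp_push() can be modeled in user space roughly as below. This is only a sketch: `model_tcp_push`, `struct push_decision` and the three macros are hypothetical names introduced here, and the numeric values for MSG_MORE and TCP_NAGLE_CORK are copied in as assumptions so that the snippet compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-ins for kernel constants, defined only for this sketch. */
#define MODEL_MSG_MORE    0x8000  /* stands in for MSG_MORE */
#define MODEL_NAGLE_CORK  2       /* stands in for TCP_NAGLE_CORK */
#define MODEL_WMEM_LIMIT  2048    /* threshold taken from the quoted patch */

struct push_decision {
    int nonagle;     /* value that would reach __tcp_push_pending_frames() */
    bool throttled;  /* true where the patch would set TSQ_THROTTLED */
};

/* Mirror the order of the checks in the patched tcp_push(). */
struct push_decision model_tcp_push(int flags, int nonagle, int wmem_alloc)
{
    struct push_decision d = { nonagle, false };

    assert(wmem_alloc >= 0);

    /* MSG_MORE asks for corking, exactly as before the patch. */
    if (flags & MODEL_MSG_MORE)
        d.nonagle = MODEL_NAGLE_CORK;

    /* New in the patch: if more than 2048 bytes are already queued
     * below TCP, cork as well and mark the socket as throttled. */
    if (wmem_alloc > MODEL_WMEM_LIMIT) {
        d.throttled = true;
        d.nonagle = MODEL_NAGLE_CORK;
    }
    return d;
}
```

In this model a small write on a socket with 4096 bytes already queued is corked even without MSG_MORE, which is the effect Eric describes: small frames are held back so that tcp_sendmsg() has a chance to build fuller packets.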
I did some tests regarding mvneta perf on the current Linus tree (commit
2d3c627502f2a9b0, w/ c9eeec26e32e "tcp: TSQ can use a dynamic limit"
reverted). It has Simon's tclk patch for mvebu (1022c75f5abd, "clk:
armada-370: fix tclk frequencies"). The kernel has some debug options
enabled and the patch above is not applied. I will spend some time on
these two directions this evening. The idea was to get some numbers on
the impact of TCP send window size and tcp_limit_output_bytes for
mvneta.
The test is done with a laptop (Debian, 3.11.0, e1000e) directly
connected to a RN102 (Marvell Armada 370 @1.2GHz, mvneta). The RN102
is running Debian armhf with an Apache2 serving a 1GB file from ext4
over lvm over RAID1 from 2 WD30EFRX. The client is nothing fancy, i.e.
a simple wget w/ -O /dev/null option.
With the exact same setup on a ReadyNAS Duo v2 (Kirkwood 88f6282
@1.6GHz, mv643xx_eth), I managed to get a throughput of 108MB/s
(I cannot remember the kernel version, but something between 3.8 and 3.10).
So with that setup:
  w/ TCP send window set to   4MB: 17.4 MB/s
  w/ TCP send window set to   2MB: 16.2 MB/s
  w/ TCP send window set to   1MB: 15.6 MB/s
  w/ TCP send window set to 512KB: 25.6 MB/s
  w/ TCP send window set to 256KB: 57.7 MB/s
  w/ TCP send window set to 128KB: 54.0 MB/s
  w/ TCP send window set to  64KB: 46.2 MB/s
  w/ TCP send window set to  32KB: 42.8 MB/s
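The message does not say how the send window was varied between runs. One common per-socket way to bound it, shown here purely as an assumption about a possible method, is to set SO_SNDBUF on the sending socket; the system-wide net.ipv4.tcp_wmem sysctl is another. The helper name `set_send_buffer` is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Request a send buffer of `bytes` on socket `fd` and return the size the
 * kernel reports afterwards, or -1 on error. On Linux the reported value
 * is typically larger than the request, because the kernel reserves room
 * for bookkeeping overhead, and it is capped by net.core.wmem_max. */
int set_send_buffer(int fd, int bytes)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    assert(bytes > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len) < 0)
        return -1;
    return effective;
}
```

A server would call something like `set_send_buffer(listen_fd, 256 * 1024)` before accepting connections. Setting SO_SNDBUF explicitly also disables send-buffer autotuning for that socket, so results obtained this way are not directly comparable with an autotuned run.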
Then, I started playing w/ tcp_limit_output_bytes (default is 131072),
w/ TCP send window set to 256KB:
  tcp_limit_output_bytes set to 512KB: 59.3 MB/s
  tcp_limit_output_bytes set to 256KB: 58.5 MB/s
  tcp_limit_output_bytes set to 128KB: 56.2 MB/s
  tcp_limit_output_bytes set to  64KB: 32.1 MB/s
  tcp_limit_output_bytes set to  32KB: 4.76 MB/s
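For anyone wanting to repeat the tcp_limit_output_bytes runs, the knob is exposed under /proc/sys. A minimal sketch of reading and changing it from C follows; the helper names and the `TSQ_LIMIT_PATH` macro are hypothetical, and writing the real file requires root.

```c
#include <assert.h>
#include <stdio.h>

/* procfs location of the sysctl discussed above. */
#define TSQ_LIMIT_PATH "/proc/sys/net/ipv4/tcp_limit_output_bytes"

/* Read a single integer from `path`; return -1 if it cannot be read. */
long read_sysctl_long(const char *path)
{
    long value = -1;
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Write `value` to `path`; return 0 on success, -1 on failure. */
int write_sysctl_long(const char *path, long value)
{
    FILE *f;
    int ok;

    assert(path != NULL);

    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%ld\n", value) > 0;
    /* The kernel may reject the value only when the write is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, `write_sysctl_long(TSQ_LIMIT_PATH, 262144)` would select the 256KB setting from the list above, and `read_sysctl_long(TSQ_LIMIT_PATH)` can confirm the value before each run.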
As a side note, during the test, I sometimes get peaks of 90MB/s for a
few seconds at the beginning, which tends to confirm what Willy wrote,
i.e. that the hardware can do more.
Cheers,
a+