From: w@1wt.eu (Willy Tarreau)
To: linux-arm-kernel@lists.infradead.org
Subject: [BUG,REGRESSION?] 3.11.6+,3.12: GbE iface rate drops to few KB/s
Date: Sun, 17 Nov 2013 15:19:40 +0100
Message-ID: <20131117141940.GA18569@1wt.eu>
In-Reply-To: <20131113072257.GB10591@1wt.eu>
Hi Arnaud,
[CCing Thomas and removing stable@]
On Wed, Nov 13, 2013 at 08:22:57AM +0100, Willy Tarreau wrote:
> On Tue, Nov 12, 2013 at 04:34:24PM +0100, Arnaud Ebalard wrote:
> > Can you give a pre-3.11.7 kernel a try if you find the time? I started
> > working on RN102 during 3.10-rc cycle but do not remember if I did the
> > first performance tests on 3.10 or 3.11. And if you find more time,
> > 3.11.7 would be nice too ;-)
>
> Still have not found time for this but I observed something intriguing
> which might possibly match your experience : if I use large enough send
> buffers on the mirabox and receive buffers on the client, then the
> traffic drops for objects larger than 1 MB. I have quickly checked what's
> happening and it's just that there are pauses of up to 8 ms between some
> packets when the TCP send window grows larger than about 200 kB. And
> since there are no drops, there is no reason for the window to shrink.
> I suspect it's exactly related to the issue explained by Eric about the
> timer used to recycle the Tx descriptors. However last time I checked,
> these ones were also processed in the Rx path, which means that the
> ACKs that flow back should have had the same effect as a Tx IRQ (unless
> I'd use asymmetric routing, which was not the case). So there might be
> another issue. Ah, and it only happens with GSO.
I just had a quick look at the driver and I can confirm that Eric is right
about the fact that we use up to two descriptors per GSO segment. Thus, we
can saturate the Tx queue at 532/2 = 266 Tx segments = 388360 bytes (for
1460 MSS). I thought I had seen a Tx flush from the Rx poll function but I
can't find it, so it seems I was wrong, or I possibly misunderstood
mvneta_poll() the first time I read it. Thus the observed behaviour is
perfectly normal.
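
As a quick check of the arithmetic above, here is a small Python sketch (not driver code). The queue size, the two-descriptors-per-segment worst case and the MSS are the figures quoted in the message; the function and constant names are made up for illustration.

```python
TX_DESCRIPTORS = 532       # Tx queue size quoted in the message
DESC_PER_GSO_SEGMENT = 2   # worst case per GSO segment, per the message
MSS = 1460                 # TCP payload bytes per full segment


def max_queued(descriptors, desc_per_segment, mss):
    """Return (segments, payload bytes) that fit in the Tx queue."""
    segments = descriptors // desc_per_segment
    return segments, segments * mss


segments, queued_bytes = max_queued(TX_DESCRIPTORS, DESC_PER_GSO_SEGMENT, MSS)
print(segments, queued_bytes)  # 266 388360
```
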
With GSO enabled, as soon as the window grows large enough, we can fill
all the Tx descriptors with few segments, then need to wait 10 ms (12 ms
if running at 250 Hz as I am) for them to be flushed, which explains the
low speed I was observing with large windows. With GSO disabled, up to
twice as many segments can be queued, which is enough to fill the wire
in the same time frame. Additionally, it's likely that more descriptors
have time to be sent during that period, and that each call to mvneta_tx(),
which triggers a call to mvneta_txq_done(), releases some of the previously
sent descriptors, which allows wire rate to be sustained.
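
The 10 ms versus 12 ms figure, and what it implies for throughput, can be sketched as follows. This is a rough model only: it assumes the delay is rounded up to a whole number of timer ticks, and that with GSO the descriptors are reclaimed solely when the timer fires, so at most one full queue (388360 bytes, from the figures above) goes out per period. The helper names are hypothetical.

```python
def timer_period_ms(delay_ms, hz):
    """Delay rounded up to whole ticks at the given HZ, expressed in ms."""
    ticks = (delay_ms * hz + 999) // 1000   # ceiling division
    return ticks * 1000 // hz


def ceiling_mbit_per_s(queued_bytes, period_ms):
    """Rate if exactly one full queue is drained per timer period."""
    return queued_bytes * 8 / (period_ms * 1000)


for hz in (100, 250, 1000):
    period = timer_period_ms(10, hz)
    print(hz, period, round(ceiling_mbit_per_s(388360, period), 1))
```

At 250 Hz the period comes out as 12 ms, matching the figure in the message. The resulting rate is only an estimate under the stated assumptions, not a measurement.
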
I wonder if we can call mvneta_txq_done() from the IRQ handler, which would
cause some recycling of the Tx descriptors when receiving the corresponding
ACKs.
Ideally we should enable the Tx IRQ, but I still have no access to this
chip's datasheet despite having asked Marvell several times over the past
year (Thomas has it though).
So it is quite possible that in your case you can't fill the link if you
consume too many descriptors. For example, if your server uses TCP_NODELAY
and sends incomplete segments (which is quite common), it's very easy to
run out of descriptors before the link is full.
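
For readers unfamiliar with the option, this is how a sender enables TCP_NODELAY. With Nagle's algorithm disabled, small writes tend to leave as separate, incomplete segments, and each of them still consumes Tx descriptors. A minimal Python sketch; the helper name is made up for illustration and nothing here is specific to the driver discussed.

```python
import socket


def make_nodelay_socket():
    """Return an unconnected TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# A non-zero value means the option is set on this socket.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```
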
I still have not had time to run a new kernel on this device, however :-(
Best regards,
Willy