* TCP performance regression
From: Sujith Manoharan @ 2013-11-11 5:30 UTC
To: Eric Dumazet; +Cc: netdev
Hi,
The commit, "tcp: TSQ can use a dynamic limit" causes a large
performance drop in TCP transmission with the wireless driver ath9k.
With a 2-stream card (AR9462), the usual throughput is around 195 Mbps.
But, with this commit, it drops to ~125 Mbps, occasionally reaching 130.
If the commit is reverted, performance is normal again and I can get
190+ Mbps. Apparently, ath10k is also affected and a 250 Mbps drop
is seen (from an original 740 Mbps).
I am using Linville's wireless-testing tree.
From the test machine:
root@linux-test ~# uname -a
Linux linux-test 3.12.0-wl-nodebug #104 SMP PREEMPT Mon Nov 11 10:27:56 IST 2013 x86_64 GNU/Linux
root@linux-test ~# tc -d -s qdisc show dev wlan0
qdisc mq 0: root
Sent 342682272 bytes 226366 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
root@linux-test ~# zgrep -i net_sch /proc/config.gz
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFB=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_SCH_MQPRIO=m
CONFIG_NET_SCH_CHOKE=m
CONFIG_NET_SCH_QFQ=m
CONFIG_NET_SCH_CODEL=m
CONFIG_NET_SCH_FQ_CODEL=m
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_INGRESS=m
# CONFIG_NET_SCH_PLUG is not set
CONFIG_NET_SCH_FIFO=y
If more information is required, please let me know.
Sujith
* Re: TCP performance regression
From: Eric Dumazet @ 2013-11-11 5:55 UTC
To: Sujith Manoharan; +Cc: Eric Dumazet, netdev

On Mon, 2013-11-11 at 11:00 +0530, Sujith Manoharan wrote:
> Hi,
>
> The commit, "tcp: TSQ can use a dynamic limit" causes a large
> performance drop in TCP transmission with the wireless driver ath9k.
>
> With a 2-stream card (AR9462), the usual throughput is around 195 Mbps.
> But, with this commit, it drops to ~125 Mbps, occasionally reaching 130.
>
> If the commit is reverted, performance is normal again and I can get
> 190+ Mbps. Apparently, ath10k is also affected and a 250 Mbps drop
> is seen (from an original 740 Mbps).

I am afraid this commit shows bugs in various network drivers.

All drivers doing TX completion using a timer are buggy.

Random example : drivers/net/ethernet/marvell/mvneta.c

#define MVNETA_TX_DONE_TIMER_PERIOD 10

/* Trigger tx done timer in MVNETA_TX_DONE_TIMER_PERIOD msecs */
static void mvneta_add_tx_done_timer(struct mvneta_port *pp)
{
	if (test_and_set_bit(MVNETA_F_TX_DONE_TIMER_BIT, &pp->flags) == 0) {
		pp->tx_done_timer.expires = jiffies +
			msecs_to_jiffies(MVNETA_TX_DONE_TIMER_PERIOD);
		add_timer(&pp->tx_done_timer);
	}
}

Holding skb 10 ms before TX completion is totally wrong and must be
fixed.

If really NIC is not able to trigger an interrupt after TX completion,
then driver should call skb_orphan() in its ndo_start_xmit()
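[Editor's note: a minimal sketch of what the skb_orphan() suggestion above could
look like inside a driver's ndo_start_xmit(). The example_* names and structure
are invented for illustration; only skb_orphan(), netdev_priv(), the
ndo_start_xmit() prototype and NETDEV_TX_OK are real kernel interfaces, and this
is not code from mvneta or any driver discussed in this thread.]

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical private driver state -- illustrative only. */
struct example_priv {
	int dummy;
};

/* Stub standing in for the driver's real descriptor-ring enqueue. */
static void example_queue_to_hw(struct example_priv *priv, struct sk_buff *skb)
{
	(void)priv;
	(void)skb;
}

static netdev_tx_t example_ndo_start_xmit(struct sk_buff *skb,
					  struct net_device *dev)
{
	struct example_priv *priv = netdev_priv(dev);

	/* Run the skb destructor now and detach the skb from its socket,
	 * so the queued bytes stop counting against the sender's in-flight
	 * budget even though the hardware frees the skb much later.  This
	 * restores throughput for late-completing hardware, at the cost of
	 * hiding queued bytes from the TCP stack (the trade-off discussed
	 * later in this thread). */
	skb_orphan(skb);

	example_queue_to_hw(priv, skb);
	return NETDEV_TX_OK;
}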
* Re: TCP performance regression
From: Sujith Manoharan @ 2013-11-11 6:07 UTC
To: Eric Dumazet; +Cc: netdev

Eric Dumazet wrote:
> I am afraid this commit shows bugs in various network drivers.
>
> All drivers doing TX completion using a timer are buggy.
>
> Holding skb 10 ms before TX completion is totally wrong and must be fixed.
>
> If really NIC is not able to trigger an interrupt after TX completion, then
> driver should call skb_orphan() in its ndo_start_xmit()

802.11 AMPDU formation is done in the TX completion path in ath9k.

Incoming frames are added to a software queue and the TX completion
tasklet checks if enough frames are available to form an aggregate and
if so, forms new aggregates and transmits them.

There is no timer involved, but the completion routine is rather heavy.
Many wireless drivers handle 802.11 aggregation in this way:
ath9k, ath9k_htc, ath10k etc.

Sujith
* Re: TCP performance regression
From: Eric Dumazet @ 2013-11-11 6:54 UTC
To: Sujith Manoharan; +Cc: netdev

On Mon, 2013-11-11 at 11:37 +0530, Sujith Manoharan wrote:
> Eric Dumazet wrote:
> > I am afraid this commit shows bugs in various network drivers.
> >
> > All drivers doing TX completion using a timer are buggy.
> >
> > Holding skb 10 ms before TX completion is totally wrong and must be fixed.
> >
> > If really NIC is not able to trigger an interrupt after TX completion, then
> > driver should call skb_orphan() in its ndo_start_xmit()
>
> 802.11 AMPDU formation is done in the TX completion path in ath9k.
>
> Incoming frames are added to a software queue and the TX completion
> tasklet checks if enough frames are available to form an aggregate and
> if so, forms new aggregates and transmits them.
>

Hmm... apparently ath9k uses :

#define ATH_AMPDU_LIMIT_MAX (64 * 1024 - 1)

And mentions a 4ms time frame :

	max_4ms_framelen = ATH_AMPDU_LIMIT_MAX;

So prior to "tcp: TSQ can use a dynamic limit", the ~128KB bytes TCP
could queue per TCP socket on qdisc/NIC would happen to please ath9k

ath9k can set rts_aggr_limit to 8*1024 :

	if (AR_SREV_9160_10_OR_LATER(ah) || AR_SREV_9100(ah))
		pCap->rts_aggr_limit = ATH_AMPDU_LIMIT_MAX;
	else
		pCap->rts_aggr_limit = (8 * 1024);

> There is no timer involved, but the completion routine is rather heavy.
> Many wireless drivers handle 802.11 aggregation in this way:
> ath9k, ath9k_htc, ath10k etc.

A timer would be definitely needed, and it should be rather small (1 or
2 ms)

If TCP socket is application limited, it seems ath9k can delay the last
block by a too long time.
* Re: TCP performance regression
From: Sujith Manoharan @ 2013-11-11 8:19 UTC
To: Eric Dumazet; +Cc: netdev

Eric Dumazet wrote:
> Hmm... apparently ath9k uses :
>
> #define ATH_AMPDU_LIMIT_MAX (64 * 1024 - 1)

This is the maximum AMPDU size, specified in the 802.11 standard.

> And mentions a 4ms time frame :
>
> 	max_4ms_framelen = ATH_AMPDU_LIMIT_MAX;

The 4ms limitation is a FCC limitation and is used for regulatory compliance.

> So prior to "tcp: TSQ can use a dynamic limit", the ~128KB bytes TCP
> could queue per TCP socket on qdisc/NIC would happen to please ath9k

Ok.

> ath9k can set rts_aggr_limit to 8*1024 :
>
> 	if (AR_SREV_9160_10_OR_LATER(ah) || AR_SREV_9100(ah))
> 		pCap->rts_aggr_limit = ATH_AMPDU_LIMIT_MAX;
> 	else
> 		pCap->rts_aggr_limit = (8 * 1024);

The RTS limit is required for some old chips which had HW bugs and the
above code is a workaround.

> A timer would be definitely needed, and it should be rather small (1 or
> 2 ms)
>
> If TCP socket is application limited, it seems ath9k can delay the last
> block by a too long time.

I am not really clear on how this regression can be fixed in the driver
since the majority of the transmission/aggregation logic is present in the
TX completion path.

Sujith
* Re: TCP performance regression
From: Eric Dumazet @ 2013-11-11 14:27 UTC
To: Sujith Manoharan; +Cc: netdev, Dave Taht

On Mon, 2013-11-11 at 13:49 +0530, Sujith Manoharan wrote:

> I am not really clear on how this regression can be fixed in the driver
> since the majority of the transmission/aggregation logic is present in the
> TX completion path.

We have many choices.

1) Add back a minimum of ~128 K of outstanding bytes per TCP session,
so that buggy drivers can sustain 'line rate'.

Note that with 100 concurrent TCP streams, total amount of bytes
queued on the NIC is 12 MB.
And pfifo_fast qdisc will drop packets anyway.

Thats what we call 'BufferBloat'

2) Try lower values like 64K. Still bufferbloat.

3) Fix buggy drivers, using a proper logic, or shorter timers (mvneta
case for example)

4) Add a new netdev attribute, so that well behaving NIC drivers do not
have to artificially force TCP stack to queue too many bytes in
Qdisc/NIC queues.
* Re: TCP performance regression
From: Eric Dumazet @ 2013-11-11 14:39 UTC
To: Sujith Manoharan, Arnaud Ebalard; +Cc: netdev, Dave Taht, Thomas Petazzoni

On Mon, 2013-11-11 at 06:27 -0800, Eric Dumazet wrote:
> On Mon, 2013-11-11 at 13:49 +0530, Sujith Manoharan wrote:
>
> > I am not really clear on how this regression can be fixed in the driver
> > since the majority of the transmission/aggregation logic is present in the
> > TX completion path.
>
> We have many choices.
>
> 1) Add back a minimum of ~128 K of outstanding bytes per TCP session,
> so that buggy drivers can sustain 'line rate'.
>
> Note that with 100 concurrent TCP streams, total amount of bytes
> queued on the NIC is 12 MB.
> And pfifo_fast qdisc will drop packets anyway.
>
> Thats what we call 'BufferBloat'
>
> 2) Try lower values like 64K. Still bufferbloat.
>
> 3) Fix buggy drivers, using a proper logic, or shorter timers (mvneta
> case for example)
>
> 4) Add a new netdev attribute, so that well behaving NIC drivers do not
> have to artificially force TCP stack to queue too many bytes in
> Qdisc/NIC queues.

How following patch helps mvneta performance on current net-next tree
for a single TCP (sending) flow ?

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 7d99e695a110..002ac464202f 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -172,12 +172,11 @@
 /* Various constants */
 
 /* Coalescing */
-#define MVNETA_TXDONE_COAL_PKTS		16
 #define MVNETA_RX_COAL_PKTS		32
 #define MVNETA_RX_COAL_USEC		100
 
 /* Timer */
-#define MVNETA_TX_DONE_TIMER_PERIOD	10
+#define MVNETA_TX_DONE_TIMER_PERIOD	1
 
 /* Napi polling weight */
 #define MVNETA_RX_POLL_WEIGHT		64
@@ -1592,8 +1591,7 @@ out:
 		dev_kfree_skb_any(skb);
 	}
 
-	if (txq->count >= MVNETA_TXDONE_COAL_PKTS)
-		mvneta_txq_done(pp, txq);
+	mvneta_txq_done(pp, txq);
 
 	/* If after calling mvneta_txq_done, count equals
 	 * frags, we need to set the timer
* Re: TCP performance regression
From: Eric Dumazet @ 2013-11-11 16:44 UTC
To: Sujith Manoharan; +Cc: Arnaud Ebalard, netdev, Dave Taht, Thomas Petazzoni

On Mon, 2013-11-11 at 06:39 -0800, Eric Dumazet wrote:

> How following patch helps mvneta performance on current net-next tree
> for a single TCP (sending) flow ?

v2 (more chance to even compile ;)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 7d99e695a110..e8211277f15d 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -172,12 +172,12 @@
 /* Various constants */
 
 /* Coalescing */
-#define MVNETA_TXDONE_COAL_PKTS		16
+#define MVNETA_TXDONE_COAL_PKTS		1
 #define MVNETA_RX_COAL_PKTS		32
 #define MVNETA_RX_COAL_USEC		100
 
 /* Timer */
-#define MVNETA_TX_DONE_TIMER_PERIOD	10
+#define MVNETA_TX_DONE_TIMER_PERIOD	1
 
 /* Napi polling weight */
 #define MVNETA_RX_POLL_WEIGHT		64
* RE: TCP performance regression
From: David Laight @ 2013-11-11 15:05 UTC
To: Eric Dumazet, Sujith Manoharan; +Cc: netdev, Dave Taht

> On Mon, 2013-11-11 at 13:49 +0530, Sujith Manoharan wrote:
>
> > I am not really clear on how this regression can be fixed in the driver
> > since the majority of the transmission/aggregation logic is present in the
> > TX completion path.
>
> We have many choices.
>
> 1) Add back a minimum of ~128 K of outstanding bytes per TCP session,
> so that buggy drivers can sustain 'line rate'.
>
> Note that with 100 concurrent TCP streams, total amount of bytes
> queued on the NIC is 12 MB.
> And pfifo_fast qdisc will drop packets anyway.
>
> Thats what we call 'BufferBloat'
>
> 2) Try lower values like 64K. Still bufferbloat.
>
> 3) Fix buggy drivers, using a proper logic, or shorter timers (mvneta
> case for example)
>
> 4) Add a new netdev attribute, so that well behaving NIC drivers do not
> have to artificially force TCP stack to queue too many bytes in
> Qdisc/NIC queues.

Or, maybe:
5) call skb_orphan() (I think that is the correct function) when transmit
packets are given to the hardware.

I think that if the mac driver supports BQL this could be done as soon
as the BQL resource is assigned to the packet.
I suspect this could be done unconditionally.

Clearly the skb may also need to be freed to allow protocol
retransmissions to complete properly - but that won't be so timing
critical.

I remember (a long time ago) getting a measurable performance increase
by disabling the 'end of transmit' interrupt and only doing tx tidyup
when the driver was active for other reasons.
There were 2 reasons for enabling the interrupt:
1) tx ring full.
2) tx buffer had a user-defined delete function.

	David
* RE: TCP performance regression
From: Eric Dumazet @ 2013-11-11 15:29 UTC
To: David Laight; +Cc: Sujith Manoharan, netdev, Dave Taht

On Mon, 2013-11-11 at 15:05 +0000, David Laight wrote:

> Or, maybe:
> 5) call skb_orphan() (I think that is the correct function) when transmit
> packets are given to the hardware.

This is the worth possible solution, as it basically re-enables
bufferbloat again.

socket sk_wmem_queued should not be fooled, unless we have no other
choice.
* RE: TCP performance regression
From: David Laight @ 2013-11-11 15:43 UTC
To: Eric Dumazet; +Cc: Sujith Manoharan, netdev, Dave Taht

> > Or, maybe:
> > 5) call skb_orphan() (I think that is the correct function) when transmit
> > packets are given to the hardware.
>
> This is the worth possible solution, as it basically re-enables
              ^^^^^ worst ?
> bufferbloat again.

It should be ok if the mac driver only gives the hardware a small
number of bytes/packets - or one appropriate for the link speed.

	David
* RE: TCP performance regression
From: Eric Dumazet @ 2013-11-11 16:17 UTC
To: David Laight; +Cc: Sujith Manoharan, netdev, Dave Taht

On Mon, 2013-11-11 at 15:43 +0000, David Laight wrote:
> > > Or, maybe:
> > > 5) call skb_orphan() (I think that is the correct function) when transmit
> > > packets are given to the hardware.
> >
> > This is the worth possible solution, as it basically re-enables
>               ^^^^^ worst ?
> > bufferbloat again.
>
> It should be ok if the mac driver only gives the hardware a small
> number of bytes/packets - or one appropriate for the link speed.

There is some confusion here.

mvneta has a TX ring buffer, which can hold up to 532 TX descriptors.

If this driver used skb_orphan(), a single TCP flow could use the whole
TX ring.

TCP Small Queue would only limit the number of skbs on Qdisc.

Try then to send a ping message, it will have to wait a lot.
* RE: TCP performance regression
From: David Laight @ 2013-11-11 16:35 UTC
To: Eric Dumazet; +Cc: Sujith Manoharan, netdev, Dave Taht

> > It should be ok if the mac driver only gives the hardware a small
> > number of bytes/packets - or one appropriate for the link speed.
>
> There is some confusion here.
>
> mvneta has a TX ring buffer, which can hold up to 532 TX descriptors.
>
> If this driver used skb_orphan(), a single TCP flow could use the whole
> TX ring.
>
> TCP Small Queue would only limit the number of skbs on Qdisc.
>
> Try then to send a ping message, it will have to wait a lot.

532 is a ridiculously large number especially for a slow interface.
At a guess you don't want more than 10-20ms of data in the tx ring.
You might need extra descriptors for badly fragmented packets.

	David
* RE: TCP performance regression
From: Eric Dumazet @ 2013-11-11 17:41 UTC
To: David Laight; +Cc: Sujith Manoharan, netdev, Dave Taht

On Mon, 2013-11-11 at 16:35 +0000, David Laight wrote:
> > > It should be ok if the mac driver only gives the hardware a small
> > > number of bytes/packets - or one appropriate for the link speed.
> >
> > There is some confusion here.
> >
> > mvneta has a TX ring buffer, which can hold up to 532 TX descriptors.
> >
> > If this driver used skb_orphan(), a single TCP flow could use the whole
> > TX ring.
> >
> > TCP Small Queue would only limit the number of skbs on Qdisc.
> >
> > Try then to send a ping message, it will have to wait a lot.
>
> 532 is a ridiculously large number especially for a slow interface.
> At a guess you don't want more than 10-20ms of data in the tx ring.
> You might need extra descriptors for badly fragmented packets.

Thats why we invented BQL.

Problem is most driver authors don't care of the problem.
They already have hard time to make bug free drivers.

BQL is adding pressure and expose long standing bugs.

Some drivers have large TX rings to lower race probabilities.
* Re: TCP performance regression
From: Willy Tarreau @ 2013-11-12 7:42 UTC
To: David Laight; +Cc: Eric Dumazet, Sujith Manoharan, netdev, Dave Taht

On Mon, Nov 11, 2013 at 04:35:30PM -0000, David Laight wrote:
> > > It should be ok if the mac driver only gives the hardware a small
> > > number of bytes/packets - or one appropriate for the link speed.
> >
> > There is some confusion here.
> >
> > mvneta has a TX ring buffer, which can hold up to 532 TX descriptors.
> >
> > If this driver used skb_orphan(), a single TCP flow could use the whole
> > TX ring.
> >
> > TCP Small Queue would only limit the number of skbs on Qdisc.
> >
> > Try then to send a ping message, it will have to wait a lot.
>
> 532 is a ridiculously large number especially for a slow interface.
> At a guess you don't want more than 10-20ms of data in the tx ring.

Well, it's not *that* large, 532 descriptors is 800 kB or 6.4 ms with
1500-bytes packets, and 273 microseconds for 64-byte packets. In fact
it's not a slow interface, it's the systems it runs on which are
generally not that fast. For example it is possible to saturate two
gig ports at once on a single-core Armada370. But you need buffers
large enough to compensate for the context switch time if you use
multiple threads to send.

Regards,
Willy
* Re: TCP performance regression
From: Eric Dumazet @ 2013-11-12 14:16 UTC
To: Willy Tarreau; +Cc: David Laight, Sujith Manoharan, netdev, Dave Taht

On Tue, 2013-11-12 at 08:42 +0100, Willy Tarreau wrote:

> Well, it's not *that* large, 532 descriptors is 800 kB or 6.4 ms with
> 1500-bytes packets, and 273 microseconds for 64-byte packets. In fact
> it's not a slow interface, it's the systems it runs on which are
> generally not that fast. For example it is possible to saturate two
> gig ports at once on a single-core Armada370. But you need buffers
> large enough to compensate for the context switch time if you use
> multiple threads to send.

With GSO, each 1500-bytes packet might need 2 descriptors anyway (one
for the headers, one for the payload), so 532 descriptors only hold
400 kB or 3.2 ms ;)

If the NIC was supporting TSO, this would be another story, as a 64KB
packet could use only 3 descriptors.
* Re: TCP performance regression
From: Dave Taht @ 2013-11-14 9:54 UTC
To: Willy Tarreau
Cc: David Laight, Eric Dumazet, Sujith Manoharan, netdev@vger.kernel.org

On Mon, Nov 11, 2013 at 11:42 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Mon, Nov 11, 2013 at 04:35:30PM -0000, David Laight wrote:
>> > > It should be ok if the mac driver only gives the hardware a small
>> > > number of bytes/packets - or one appropriate for the link speed.
>> >
>> > There is some confusion here.
>> >
>> > mvneta has a TX ring buffer, which can hold up to 532 TX descriptors.
>> >
>> > If this driver used skb_orphan(), a single TCP flow could use the whole
>> > TX ring.
>> >
>> > TCP Small Queue would only limit the number of skbs on Qdisc.
>> >
>> > Try then to send a ping message, it will have to wait a lot.
>>
>> 532 is a ridiculously large number especially for a slow interface.
>> At a guess you don't want more than 10-20ms of data in the tx ring.
>
> Well, it's not *that* large, 532 descriptors is 800 kB or 6.4 ms with
> 1500-bytes packets, and 273 microseconds for 64-byte packets. In fact
> it's not a slow interface, it's the systems it runs on which are
> generally not that fast. For example it is possible to saturate two
> gig ports at once on a single-core Armada370. But you need buffers
> large enough to compensate for the context switch time if you use
> multiple threads to send.

There is this terrible tendency to think that all interfaces run at
maximum rate, always.

There has been an interesting trend towards slower rates of late -
Things like the pi and beaglebone black have interfaces that run at
100Mbits (cheaper phy, less power), and thus communication from a
armada370 in this case, at line rate, would induce up to 64ms of
delay. A 10Mbit interface, 640ms. Many devices that connect to the
internet run at these lower speeds.

BQL can hold that down to something reasonable at a wide range of
line rates on ethernet.

In the context of 802.11 wireless, the rate problem is much, much
worse, going down to 1Mbit, and never getting as high as a gig, and
often massively extending things with exorbitant retries and
retransmits.

Although Eric fixed "the regression" on the new fq stuff vs a vs the
ath10k and ath9k, I would really have liked a set of benchmarks of the
ath10k and ath9k device and driver at realistic rates like MCS1 and
MCS4, to make more clear the problems those devices have at real
world, rather than lab, transmission rates.

>
> Regards,
> Willy
>

--
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html
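[Editor's note: the ring-drain figures traded in the last few messages are easy
to re-derive. The short standalone program below is an editorial illustration,
not part of the thread; it reproduces Willy's 6.4 ms / 273 us numbers, Eric's
3.2 ms GSO case, and Dave's 64 ms and 640 ms figures for slower links.]

#include <stdio.h>

/* Time (in milliseconds) to drain a full TX ring at a given line rate,
 * assuming every descriptor carries 'bytes_per_desc' bytes of payload. */
static double drain_ms(int descriptors, double bytes_per_desc, double rate_bps)
{
	return descriptors * bytes_per_desc * 8.0 / rate_bps * 1000.0;
}

int main(void)
{
	/* Willy: 532 descriptors of 1500-byte packets at 1 Gbit/s -> ~6.4 ms,
	 * and 64-byte packets -> ~0.27 ms. */
	printf("1500B @ 1G   : %.1f ms\n", drain_ms(532, 1500, 1e9));
	printf("  64B @ 1G   : %.2f ms\n", drain_ms(532, 64, 1e9));

	/* Eric: with GSO each 1500-byte packet may need 2 descriptors, so the
	 * ring effectively holds half the payload -> ~3.2 ms. */
	printf("GSO   @ 1G   : %.1f ms\n", drain_ms(532 / 2, 1500, 1e9));

	/* Dave: the same ring in front of a 100 Mbit PHY -> ~64 ms, and
	 * ~640 ms at 10 Mbit. */
	printf("1500B @ 100M : %.0f ms\n", drain_ms(532, 1500, 100e6));
	printf("1500B @ 10M  : %.0f ms\n", drain_ms(532, 1500, 10e6));
	return 0;
}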
* Re: TCP performance regression
From: Sujith Manoharan @ 2013-11-11 16:13 UTC
To: Eric Dumazet; +Cc: Felix Fietkau, netdev, Dave Taht

Eric Dumazet wrote:
> We have many choices.
>
> 1) Add back a minimum of ~128 K of outstanding bytes per TCP session,
> so that buggy drivers can sustain 'line rate'.
>
> Note that with 100 concurrent TCP streams, total amount of bytes
> queued on the NIC is 12 MB.
> And pfifo_fast qdisc will drop packets anyway.
>
> Thats what we call 'BufferBloat'
>
> 2) Try lower values like 64K. Still bufferbloat.
>
> 3) Fix buggy drivers, using a proper logic, or shorter timers (mvneta
> case for example)
>
> 4) Add a new netdev attribute, so that well behaving NIC drivers do not
> have to artificially force TCP stack to queue too many bytes in
> Qdisc/NIC queues.

I think the quirks of 802.11 aggregation should be taken into account.
I am adding Felix to this thread, who would have more to say on
latency/bufferbloat with wireless drivers.

Sujith
* Re: TCP performance regression
From: Felix Fietkau @ 2013-11-11 16:38 UTC
To: Sujith Manoharan, Eric Dumazet; +Cc: netdev, Dave Taht

On 2013-11-11 17:13, Sujith Manoharan wrote:
> Eric Dumazet wrote:
>> We have many choices.
>>
>> 1) Add back a minimum of ~128 K of outstanding bytes per TCP session,
>> so that buggy drivers can sustain 'line rate'.
>>
>> Note that with 100 concurrent TCP streams, total amount of bytes
>> queued on the NIC is 12 MB.
>> And pfifo_fast qdisc will drop packets anyway.
>>
>> Thats what we call 'BufferBloat'
>>
>> 2) Try lower values like 64K. Still bufferbloat.
>>
>> 3) Fix buggy drivers, using a proper logic, or shorter timers (mvneta
>> case for example)
>>
>> 4) Add a new netdev attribute, so that well behaving NIC drivers do not
>> have to artificially force TCP stack to queue too many bytes in
>> Qdisc/NIC queues.
>
> I think the quirks of 802.11 aggregation should be taken into account.
> I am adding Felix to this thread, who would have more to say on
> latency/bufferbloat with wireless drivers.

I don't think this issue is about something as simple as timer handling
for tx completion (or even broken/buggy drivers).

There's simply no way to make 802.11 aggregation work well and have
similar tx completion latency characteristics as Ethernet devices.

802.11 aggregation reduces the per-packet airtime overhead by combining
multiple packets into one transmission (saving a lot of time getting a
tx opportunity, transmitting the PHY header, etc.), which makes the
'line rate' heavily depend on the amount of buffering.

Aggregating multiple packets into one transmission also causes extra
packet loss, which is compensated by retransmission and reordering, thus
introducing additional latency.

I don't think that TSQ can do a decent job of mitigating bufferbloat on
802.11n devices without a significant performance hit, so adding a new
netdev attribute might be a good idea.

- Felix
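[Editor's note: a rough back-of-the-envelope model, with assumed numbers rather
than measurements from this thread, shows why the achievable rate Felix mentions
depends so heavily on buffering. If each transmit opportunity costs a fixed
overhead T_ov (contention, PHY preamble, block-ack exchange) and carries an
A-MPDU of N subframes of L bytes at PHY rate R, then approximately

    \text{goodput} \approx \frac{8NL}{T_{ov} + 8NL/R}

With assumed values T_ov = 200 us, L = 1500 bytes and R = 300 Mbit/s, one frame
per transmit opportunity yields about 50 Mbit/s, while a 32-subframe aggregate
yields about 260 Mbit/s; starving the aggregation queue therefore costs most of
the throughput even though the PHY rate is unchanged.]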
* Re: TCP performance regression
From: Eric Dumazet @ 2013-11-11 17:38 UTC
To: Felix Fietkau; +Cc: Sujith Manoharan, netdev, Dave Taht

On Mon, 2013-11-11 at 17:38 +0100, Felix Fietkau wrote:
> On 2013-11-11 17:13, Sujith Manoharan wrote:
> > Eric Dumazet wrote:
> >> We have many choices.
> >>
> >> 1) Add back a minimum of ~128 K of outstanding bytes per TCP session,
> >> so that buggy drivers can sustain 'line rate'.
> >>
> >> Note that with 100 concurrent TCP streams, total amount of bytes
> >> queued on the NIC is 12 MB.
> >> And pfifo_fast qdisc will drop packets anyway.
> >>
> >> Thats what we call 'BufferBloat'
> >>
> >> 2) Try lower values like 64K. Still bufferbloat.
> >>
> >> 3) Fix buggy drivers, using a proper logic, or shorter timers (mvneta
> >> case for example)
> >>
> >> 4) Add a new netdev attribute, so that well behaving NIC drivers do not
> >> have to artificially force TCP stack to queue too many bytes in
> >> Qdisc/NIC queues.
> >
> > I think the quirks of 802.11 aggregation should be taken into account.
> > I am adding Felix to this thread, who would have more to say on
> > latency/bufferbloat with wireless drivers.
>
> I don't think this issue is about something as simple as timer handling
> for tx completion (or even broken/buggy drivers).
>
> There's simply no way to make 802.11 aggregation work well and have
> similar tx completion latency characteristics as Ethernet devices.
>
> 802.11 aggregation reduces the per-packet airtime overhead by combining
> multiple packets into one transmission (saving a lot of time getting a
> tx opportunity, transmitting the PHY header, etc.), which makes the
> 'line rate' heavily depend on the amount of buffering.

How long a TX packet is put on hold hoping a following packet will
come ?

> Aggregating multiple packets into one transmission also causes extra
> packet loss, which is compensated by retransmission and reordering, thus
> introducing additional latency.
>
> I don't think that TSQ can do a decent job of mitigating bufferbloat on
> 802.11n devices without a significant performance hit, so adding a new
> netdev attribute might be a good idea.

The netdev attribute would work, but might not work well if using a
tunnel...
* Re: TCP performance regression
From: Felix Fietkau @ 2013-11-11 17:44 UTC
To: Eric Dumazet; +Cc: Sujith Manoharan, netdev, Dave Taht

On 2013-11-11 18:38, Eric Dumazet wrote:
> On Mon, 2013-11-11 at 17:38 +0100, Felix Fietkau wrote:
>> I don't think this issue is about something as simple as timer handling
>> for tx completion (or even broken/buggy drivers).
>>
>> There's simply no way to make 802.11 aggregation work well and have
>> similar tx completion latency characteristics as Ethernet devices.
>>
>> 802.11 aggregation reduces the per-packet airtime overhead by combining
>> multiple packets into one transmission (saving a lot of time getting a
>> tx opportunity, transmitting the PHY header, etc.), which makes the
>> 'line rate' heavily depend on the amount of buffering.
>
> How long a TX packet is put on hold hoping a following packet will
> come ?

TX packets in the aggregation queue are held as long as the hardware
queue holds two A-MPDUs (each of which can contain up to 32 packets).
If the aggregation queues are empty and the hardware queue is not full,
the next tx packet from the network stack is pushed to the hardware
queue immediately.

- Felix
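[Editor's note: in pseudo-C, the rule Felix describes looks roughly like the
sketch below. The names (example_txq, aggr_queue, hw_pending_ampdus,
example_push_to_hw) are invented for illustration and are not the actual
ath9k/mac80211 symbols; only the sk_buff queue helpers are real kernel APIs.]

#include <linux/skbuff.h>

/* Hypothetical per-queue state, loosely mirroring the description above. */
struct example_txq {
	struct sk_buff_head aggr_queue;	/* software aggregation queue */
	int hw_pending_ampdus;		/* A-MPDUs currently owned by hardware */
};

/* Stub for handing a frame (or a freshly built aggregate) to the hardware. */
static void example_push_to_hw(struct example_txq *txq, struct sk_buff *skb);

static void example_tx_enqueue(struct example_txq *txq, struct sk_buff *skb)
{
	if (skb_queue_empty(&txq->aggr_queue) && txq->hw_pending_ampdus < 2) {
		/* Hardware is not yet holding two A-MPDUs and nothing is
		 * waiting to be aggregated: push immediately, so latency
		 * stays low when the link is idle. */
		example_push_to_hw(txq, skb);
	} else {
		/* Otherwise hold the frame; the TX completion path will
		 * fold it into the next aggregate (up to 32 subframes). */
		__skb_queue_tail(&txq->aggr_queue, skb);
	}
}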
* Re: TCP performance regression
From: Dave Taht @ 2013-11-11 18:03 UTC
To: Eric Dumazet
Cc: Felix Fietkau, Sujith Manoharan, netdev@vger.kernel.org

On Mon, Nov 11, 2013 at 9:38 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2013-11-11 at 17:38 +0100, Felix Fietkau wrote:
>> On 2013-11-11 17:13, Sujith Manoharan wrote:
>> > Eric Dumazet wrote:
>> >> We have many choices.
>> >>
>> >> 1) Add back a minimum of ~128 K of outstanding bytes per TCP session,
>> >> so that buggy drivers can sustain 'line rate'.
>> >>
>> >> Note that with 100 concurrent TCP streams, total amount of bytes
>> >> queued on the NIC is 12 MB.
>> >> And pfifo_fast qdisc will drop packets anyway.
>> >>
>> >> Thats what we call 'BufferBloat'
>> >>
>> >> 2) Try lower values like 64K. Still bufferbloat.
>> >>
>> >> 3) Fix buggy drivers, using a proper logic, or shorter timers (mvneta
>> >> case for example)
>> >>
>> >> 4) Add a new netdev attribute, so that well behaving NIC drivers do not
>> >> have to artificially force TCP stack to queue too many bytes in
>> >> Qdisc/NIC queues.
>> >
>> > I think the quirks of 802.11 aggregation should be taken into account.
>> > I am adding Felix to this thread, who would have more to say on
>> > latency/bufferbloat with wireless drivers.

As I just got dropped in the middle of this convo, I tend to think
that the mac80211 questions is should be handled in it's own thread as
this conversation seemed to be about a certain ethernet driver's
flaws.

>> I don't think this issue is about something as simple as timer handling
>> for tx completion (or even broken/buggy drivers).
>>
>> There's simply no way to make 802.11 aggregation work well and have
>> similar tx completion latency characteristics as Ethernet devices.

I don't quite share all of felix's pessimism. It will tend to be
burstier, yes, but I felt that would not look that much different than
napi -> BQL.

>> 802.11 aggregation reduces the per-packet airtime overhead by combining
>> multiple packets into one transmission (saving a lot of time getting a
>> tx opportunity, transmitting the PHY header, etc.), which makes the
>> 'line rate' heavily depend on the amount of buffering.

making aggregation work well is key to fixing wifi worldwide.
Presently aggregation performance is pretty universally terrible under
real loads and tcp.

(looking further ahead, getting multi-user mimo to work in 802.11ac
would also be helpful but I'm not even sure the IEEE figured that out
yet. Ath10k hw2 do it?)

> How long a TX packet is put on hold hoping a following packet will
> come ?

>> Aggregating multiple packets into one transmission also causes extra
>> packet loss, which is compensated by retransmission and reordering, thus
>> introducing additional latency.

I was extremely encouraged by Yucheng's presentation at ietf on some
vast improvements on managing re-ordering problems. I daydreamed that
it would become possible to eliminate the reorder buffer in lower
levels of the wireless stack(s?). See slides and fantasize:

http://www.ietf.org/proceedings/88/slides/slides-88-iccrg-6.pdf

The rest of the preso was good, too. I also thought the new pacing
stuff would cause trouble in wifi and aggregation.

>> I don't think that TSQ can do a decent job of mitigating bufferbloat on
>> 802.11n devices without a significant performance hit, so adding a new
>> netdev attribute might be a good idea.

I am not sure which part of what subsystem(s) is really under debate
here. TSQ limits the number of packets that can be outstanding in a
stream. The characteristics of a wifi connection (EDCA scheduling and
aggregated batching) play merry hell with TCP assumptions. The recent
work on fixing TSO offloads shows what can happen if that underlying
set of assumptions is fixed.

My overall take on this, tho, is to take the latest bits of TSQ and
"fq" code, and go measure the effect on wifi stations rather than
discuss what layer is busted or what options need to be added to
netdev. Has anyone done that? I've been busy with 3.10.x

Personally I don't have much of a problem if TSQ hurts single stream
TCP throughput on wifi. I would vastly prefer aggregation to work
better for multiple streams with vastly smaller buffers than it does.
That would be a bigger win, overall.

> The netdev attribute would work, but might not work well if using a
> tunnel...

I am going to make some coffee and catch up. Please excuse whatever
noise I just introduced.

--
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html
* Re: TCP performance regression
From: Sujith Manoharan @ 2013-11-11 18:29 UTC
To: Dave Taht; +Cc: Eric Dumazet, Felix Fietkau, netdev@vger.kernel.org

Dave Taht wrote:
> Personally I don't have much of a problem if TSQ hurts single stream
> TCP throughput on wifi. I would vastly prefer aggregation to work
> better for multiple streams with vastly smaller buffers than it does.
> That would be a bigger win, overall.

ath9k doesn't hold very deep queues for aggregated traffic. A maximum
of 128 packets can be buffered for each Access Class queue and still
good throughput is obtained, even for 3x3 scenarios.

A loss of almost 50% throughput is seen in 1x1 setups and the penalty
becomes higher with more streams. I don't think such a big loss in
performance is acceptable to achieve low latency.

Sujith
* Re: TCP performance regression
From: Dave Taht @ 2013-11-11 18:31 UTC
To: Eric Dumazet
Cc: Felix Fietkau, Sujith Manoharan, netdev@vger.kernel.org, Avery Pennarun

Ah, this thread started with a huge regression in ath10k performance
with the new TSQ stuff, and isn't actually about a two line fix to the
mv ethernet driver.

http://comments.gmane.org/gmane.linux.network/290269

I suddenly care a lot more. And I'll care a lot, lot, lot more, if
someone can post a rrul test for before and after the new fq scheduler
and tsq change on this driver on this hardware... What, if anything,
in terms of improvements or regressions, happened to multi-stream
throughput and latency?

https://github.com/tohojo/netperf-wrapper
* Re: TCP performance regression
From: Ben Greear @ 2013-11-11 19:11 UTC
To: Dave Taht
Cc: Eric Dumazet, Felix Fietkau, Sujith Manoharan, netdev@vger.kernel.org, Avery Pennarun

On 11/11/2013 10:31 AM, Dave Taht wrote:
> Ah, this thread started with a huge regression in ath10k performance
> with the new TSQ stuff, and isn't actually about a two line fix to the
> mv ethernet driver.
>
> http://comments.gmane.org/gmane.linux.network/290269
>
> I suddenly care a lot more. And I'll care a lot, lot, lot more, if
> someone can post a rrul test for before and after the new fq scheduler
> and tsq change on this driver on this hardware... What, if anything,
> in terms of improvements or regressions, happened to multi-stream
> throughput and latency?
>
> https://github.com/tohojo/netperf-wrapper

Not directly related, but we have run some automated tests against
an older buffer-bloat enabled AP (not ath10k hardware, don't know the
exact details at the moment), and in general the performance
is horrible compared to all of the other APs we test against.

Our tests are concerned mostly with throughput.

For reference, here are some graphs with supplicant/hostapd
running on higher-end x86-64 hardware and ath9k:

http://www.candelatech.com/lf_wifi_examples.php

We see somewhat similar results with most commercial APs, though
often they max out at 128 or fewer stations instead of the several
hundred we get on our own AP configs.

We'll update to more recent buffer-bloat AP software and post some
results when we get a chance.

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
* Re: TCP performance regression
From: Dave Taht @ 2013-11-11 19:24 UTC
To: Ben Greear
Cc: Eric Dumazet, Felix Fietkau, Sujith Manoharan, netdev@vger.kernel.org, Avery Pennarun

On Mon, Nov 11, 2013 at 11:11 AM, Ben Greear <greearb@candelatech.com> wrote:
> On 11/11/2013 10:31 AM, Dave Taht wrote:
>> Ah, this thread started with a huge regression in ath10k performance
>> with the new TSQ stuff, and isn't actually about a two line fix to the
>> mv ethernet driver.
>>
>> http://comments.gmane.org/gmane.linux.network/290269
>>
>> I suddenly care a lot more. And I'll care a lot, lot, lot more, if
>> someone can post a rrul test for before and after the new fq scheduler
>> and tsq change on this driver on this hardware... What, if anything,
>> in terms of improvements or regressions, happened to multi-stream
>> throughput and latency?
>>
>> https://github.com/tohojo/netperf-wrapper
>
> Not directly related, but we have run some automated tests against
> an older buffer-bloat enabled AP (not ath10k hardware, don't know the
> exact details at the moment), and in general the performance
> is horrible compared to all of the other APs we test against.

I was not happy with the dlink product and the streamboost
implementation, if that is what it was.

> Our tests are concerned mostly with throughput.

:(

> For reference, here are some graphs with supplicant/hostapd
> running on higher-end x86-64 hardware and ath9k:
>
> http://www.candelatech.com/lf_wifi_examples.php
>
> We see somewhat similar results with most commercial APs, though
> often they max out at 128 or fewer stations instead of the several
> hundred we get on our own AP configs.
>
> We'll update to more recent buffer-bloat AP software and post some
> results when we get a chance.

Are you talking cerowrt (on the wndr3800) here?

I am well aware that it doesn't presently scale well with large
numbers of clients, which is awaiting the per-sta queue work. (most of
the work to date has been on the aqm-to-the-universe code)

This is the most recent stable firmware for that:

http://snapon.lab.bufferbloat.net/~cero2/cerowrt/wndr/3.10.17-6/

I just did 3.10.18 but haven't tested it.

Cero also runs HT20 by default, and there are numerous other things
that are configured more for "science" than throughput. Notably the
size of the aggregation queues is limited.

But I'd LOVE a test through your suite.

I note I'd also love to see TCP tests through your suite with the AP
configured thusly

(server) - (100ms delay box running a recent netem and a packet limit
of 100000+) - AP (w 1000 packets buffering/wo AQM, and with AQM) -
(wifi clients)

(and will gladly help set that up. Darn, I just drove past your
offices)

--
Dave Täht