Netdev List
* Re: [PATCH] tc_util: fix incorrect bare number process in get_rate.
From: Stephen Hemminger @ 2012-07-11 14:51 UTC
  To: Li Wei; +Cc: netdev
In-Reply-To: <4FFD2A42.9080507@cn.fujitsu.com>

On Wed, 11 Jul 2012 15:24:50 +0800
Li Wei <lw@cn.fujitsu.com> wrote:

> 
> The comment and the man page both indicate that a bare number means
> bytes per second, so the division is not needed.
> ---
>  tc/tc_util.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/tc/tc_util.c b/tc/tc_util.c
> index 926ed08..e8d89c1 100644
> --- a/tc/tc_util.c
> +++ b/tc/tc_util.c
> @@ -153,7 +153,7 @@ int get_rate(unsigned *rate, const char *str)
>  		return -1;
>  
>  	if (*p == '\0') {
> -		*rate = bps / 8.;	/* assume bytes/sec */
> +		*rate = bps;	/* assume bytes/sec */
>  		return 0;
>  	}
>  
Thanks for finding this. The documentation, code, and comment
all need to agree!

But changing the code as you propose would break existing usage
by scripts. Instead, the man page and comment need to change
to match the reality of the existing application.
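
To make the existing behaviour concrete, here is a minimal user-space sketch. It is not the tc_util.c code; it assumes only what the quoted hunk shows, namely that a bare number is divided by 8 before being stored, so the input is effectively read as bits per second. The name bare_rate_to_bytes is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of iproute2: parse a bare number the way
 * the existing get_rate() hunk treats it, i.e. read it as bits per second
 * and store bytes per second.
 * Returns 0 on success, -1 if the string is not a usable bare number.
 */
static int bare_rate_to_bytes(unsigned *rate, const char *str)
{
	char *end;
	double bps = strtod(str, &end);

	if (end == str || *end != '\0')
		return -1;	/* empty input or trailing unit suffix */

	/* Written as a negated range test so that NaN is rejected too. */
	if (!(bps >= 0 && bps / 8. <= UINT_MAX))
		return -1;

	*rate = bps / 8.;	/* bits/sec in, bytes/sec out */
	return 0;
}
```

With this behaviour a bare "8000" stores 1000, which is what existing scripts depend on.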

* Re: [patch] net/mlx4: off by one in parse_trans_rule()
From: Hadar Hen Zion @ 2012-07-11 14:51 UTC
  To: Dan Carpenter
  Cc: Hadar Hen Zion, David S. Miller, Or Gerlitz, Eugenia Emantayev,
	Yevgeny Petrilin, netdev, kernel-janitors
In-Reply-To: <20120711063336.GC11812@elgon.mountain>

On 7/11/2012 9:33 AM, Dan Carpenter wrote:
> This should be ">=" here instead of ">".  MLX4_NET_TRANS_RULE_NUM is 6.
> We use "spec->id" as an array offset into the __rule_hw_sz[] and
> __sw_id_hw[] arrays which have 6 elements.
>
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
> index bc62f53..5bac0df 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
> @@ -773,7 +773,7 @@ static int parse_trans_rule(struct mlx4_dev *dev, struct mlx4_spec_list *spec,
>   		[MLX4_NET_TRANS_RULE_ID_UDP] =
>   			sizeof(struct mlx4_net_trans_rule_hw_tcp_udp)
>   	};
> -	if (spec->id > MLX4_NET_TRANS_RULE_NUM) {
> +	if (spec->id >= MLX4_NET_TRANS_RULE_NUM) {
>   		mlx4_err(dev, "Invalid network rule id. id = %d\n", spec->id);
>   		return -EINVAL;
>   	}
>

Hi Dan,

This is indeed a bug; thanks for spotting it.

Please add:
Acked-by: Hadar Hen Zion <hadarh@mellanox.co.il>

Hadar
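
For reference, the off-by-one can be shown with a small user-space sketch. The array contents and the names RULE_NUM, rule_hw_sz and rule_size are hypothetical stand-ins, not the driver's definitions; only the shape of the bounds check mirrors the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for MLX4_NET_TRANS_RULE_NUM: the array size. */
enum { RULE_NUM = 6 };

/* Made-up sizes; the real table holds sizeof() of hardware structs. */
static const size_t rule_hw_sz[RULE_NUM] = { 8, 16, 24, 32, 40, 48 };

/*
 * Returns the size for a rule id, or -1 for an invalid id.
 * Valid indexes are 0..RULE_NUM-1, so id == RULE_NUM must be rejected too.
 */
static long rule_size(unsigned int id)
{
	if (id >= RULE_NUM)	/* ">" would let id == RULE_NUM read past the end */
		return -1;
	return (long)rule_hw_sz[id];
}
```

With the original ">" comparison, an id of 6 would pass the check and index one element past the end of a 6-element array.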

* Re: FW: [patch] net/mlx4_en: dereferencing freed memory
From: Hadar Hen Zion @ 2012-07-11 15:01 UTC
  To: Dan Carpenter
  Cc: David S. Miller, amirv, Or Gerlitz, Alexander Guller, netdev,
	kernel-janitors
In-Reply-To: <EB097B4067A7C64DA511817018705AE595EDE0BE@MTLDAG02.mtl.com>

On 7/11/2012 11:32 AM, Amir Vadai wrote:
>
>
> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Wednesday, July 11, 2012 9:34 AM
> To: Yevgeny Petrilin
> Cc: David S. Miller; Amir Vadai; Or Gerlitz; Alexander Guller; netdev@vger.kernel.org; kernel-janitors@vger.kernel.org
> Subject: [patch] net/mlx4_en: dereferencing freed memory
>
> We dereferenced "mclist" after the kfree().
>
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 94375a8..4ce5ca8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -503,9 +503,7 @@ static void mlx4_en_do_set_multicast(struct work_struct *work)
>   				/* remove from list */
>   				list_del(&mclist->list);
>   				kfree(mclist);
> -			}
> -
> -			if (mclist->action == MCLIST_ADD) {
> +			} else if (mclist->action == MCLIST_ADD) {
>   				/* attach the address */
>   				memcpy(&mc_list[10], mclist->addr, ETH_ALEN);
>   				/* needed for B0 steering support */
>

Hi Dan,

Same here: this is indeed a bug; thanks for spotting it.

Please add:
Acked-by: Hadar Hen Zion <hadarh@mellanox.co.il>

Hadar
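
The pattern fixed here can be sketched in user space as follows. This is an illustration under assumptions, not the driver code: it uses a plain singly linked list instead of the kernel list API, and the names node, ACT_REMOVE, ACT_ADD and process_list are hypothetical.

```c
#include <stdlib.h>

enum action { ACT_NONE, ACT_REMOVE, ACT_ADD };

struct node {
	struct node *next;
	enum action action;
};

/*
 * Walk the list, freeing entries marked ACT_REMOVE and counting entries
 * marked ACT_ADD. Returns the number of entries that were "attached".
 */
static int process_list(struct node **head)
{
	struct node **pp = head;
	int attached = 0;

	while (*pp) {
		struct node *n = *pp;

		if (n->action == ACT_REMOVE) {
			*pp = n->next;	/* unlink before freeing */
			free(n);
			/* n is gone: nothing below may read it */
		} else if (n->action == ACT_ADD) {
			attached++;
			n->action = ACT_NONE;
			pp = &n->next;
		} else {
			pp = &n->next;
		}
	}
	return attached;
}
```

Chaining the second test with "else if" guarantees that the freed entry is never read again, which is what the one-line change in the patch achieves.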

* Re: [PATCH 4/4] asix: Add a new driver for the AX88172A
From: Christian Riesch @ 2012-07-11 15:10 UTC
  To: michael
  Cc: Ben Hutchings, netdev, Oliver Neukum, Eric Dumazet, Allan Chou,
	Mark Lord, Grant Grundler, Ming Lei
In-Reply-To: <CABkLObo5v00QKo-X7hEVbMcXA_QwKFA6HfL-Le5VvU2J5Cs2eg@mail.gmail.com>

Hi again,

On Wed, Jul 11, 2012 at 10:27 AM, Christian Riesch
<christian.riesch@omicron.at> wrote:
> Hi Ben and Michael,
>
> On Mon, Jul 9, 2012 at 12:30 PM, Christian Riesch
> <christian.riesch@omicron.at> wrote:
>> Hi Ben and Michael,
>>
>> On Sun, Jul 8, 2012 at 5:39 PM, Michael Riesch <michael@riesch.at> wrote:
>>> On Fri, 2012-07-06 at 18:37 +0100, Ben Hutchings wrote:
>>>> > +   priv->mdio->priv = (void *)dev;
>>>> > +   priv->mdio->read = &asix_mdio_bus_read;
>>>> > +   priv->mdio->write = &asix_mdio_bus_write;
>>>> > +   priv->mdio->name = "Asix MDIO Bus";
>>>> > +   snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "asix-%s",
>>>> > +            dev_name(dev->net->dev.parent));
>>>> [...]
>>>>
>>>> I think you need to ensure that the bus identifier is unique throughout
>>>> its lifetime, but net devices can be renamed and that could lead to a
>>>> collision.  Perhaps you could use the ifindex or the USB device path
>>>
>>> Ben,
>>>
>>> the dev_name function in the code above returns the sysfs filename of
>>> the USB device (e.g. 1-0:1.0).
>>>
>>>> (though that might be too long).
>>>
>>> This may be a problem. The bus identifier may be 17 characters long, so
>>> if we drop the endpoint/configuration part (:1.0) and the prefix
>>> it should be fine in any "normal" system. However, on a system with a
>>> more-than-9-root-hubs 5-tier 127-devices-each USB infrastructure it
>>> results in collisions. So is this approach acceptable?
>>>
>>> Using the ifindex sounds good to me,
>>>
>>> snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "asix-%d",
>>>         dev->net->ifindex);
>>>
>>> works on any system with less than 10^12 network interfaces.
>>
>> Ok, I'll change that to use ifindex.
>
> No, I won't.
> At the time the mdio bus is registered, ifindex is not yet set, so the
> snprintf would always result in "asix-0".

What do you think about
snprintf(priv->mdio->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",
dev->udev->bus->busnum, dev->udev->devnum);
??

This would use the busnum/devnum identifier as reported by lsusb and
would be short enough for an mdio bus name.

Thanks, Christian
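
As a quick check of the length, here is a user-space sketch of the proposed format. BUS_ID_SIZE is an assumption taken from the 17-character figure mentioned earlier in this thread, and format_bus_id is a hypothetical helper, not driver code.

```c
#include <stdio.h>

/* Assumed from the thread: the bus identifier buffer holds 17 bytes. */
#define BUS_ID_SIZE 17

/*
 * Format "usb-BBB:DDD" into id.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the result was truncated or formatting failed.
 */
static int format_bus_id(char id[BUS_ID_SIZE], int busnum, int devnum)
{
	int n = snprintf(id, BUS_ID_SIZE, "usb-%03d:%03d", busnum, devnum);

	return (n < 0 || n >= BUS_ID_SIZE) ? -1 : n;
}
```

For bus 1, device 2 this produces "usb-001:002", 11 characters, which fits the assumed buffer with room to spare.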

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 15:11 UTC
  To: David Miller; +Cc: nanditad, netdev, codel, mattmathis, ncardwell
In-Reply-To: <1341933215.3265.5476.camel@edumazet-glaptop>

On Tue, 2012-07-10 at 17:13 +0200, Eric Dumazet wrote:
> This introduces TSQ (TCP Small Queues).
> 
> The goal of TSQ is to reduce the number of TCP packets in xmit queues
> (qdisc & device queues), to reduce RTT and cwnd bias, part of the
> bufferbloat problem.
> 
> sk->sk_wmem_alloc is not allowed to grow above a given limit,
> allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
> given time.
> 
> TSO packets are sized/capped to half the limit, so that we have two
> TSO packets in flight, allowing better bandwidth use.
> 
> As a side effect, setting the limit to 40000 automatically reduces the
> standard gso max limit (65536) to 40000/2 : It can help to reduce
> latencies of high prio packets, having smaller TSO packets.
> 
> This means we divert sock_wfree() to a tcp_wfree() handler, to
> queue/send following frames when skb_orphan() [2] is called for the
> already queued skbs.
> 
> Results on my dev machine (tg3 nic) are really impressive, using
> standard pfifo_fast, with or without TSO/GSO, and without any reduction
> of nominal bandwidth.
> 
> I no longer have 3MBytes backlogged in qdisc by a single netperf
> session, and socket autotuning on both sides no longer uses 4 Mbytes.
> 
> As the skb destructor cannot restart xmit itself (as the qdisc lock might
> be taken at this point), we delegate the work to a tasklet. We use one
> tasklet per cpu for performance reasons.
> 
> 
> 
> [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
> [2] skb_orphan() is usually called at TX completion time,
>   but some drivers call it in their start_xmit() handler.
>   These drivers should at least use BQL, or else a single TCP
>   session can still fill the whole NIC TX ring, since TSQ will
>   have no effect.

I am going to send an official patch (I'll put a v3 tag in it).

I believe I did a full implementation, including the xmit() done
by the user at release_sock() time, if the tasklet found the socket owned
by the user.

Some bench results about the choice of 128KB as the default value:

64KB seems to be the 'good' value on 10Gb links to reach max throughput on
my lab machines (ixgbe adapters).

128KB is a very conservative value, chosen to allow link rate at 20Gbps.

Still, it allows less than 1ms of buffering on a Gbit link, and less
than 8ms on 100Mbit link (instead of 130ms without Small Queues)
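
These figures follow from a simple drain-time estimate. A minimal sketch, assuming every byte counted against the limit is payload on the wire (drain_time_ms is a hypothetical helper, not part of the patch):

```c
/*
 * Time, in milliseconds, needed to drain `bytes` of queued data
 * on a link running at `bits_per_sec`.
 */
static double drain_time_ms(double bytes, double bits_per_sec)
{
	return bytes * 8.0 / bits_per_sec * 1000.0;
}
```

With 131072 bytes this gives roughly 1.05 ms at 1 Gbit/s and 10.5 ms at 100 Mbit/s as an upper bound. Since the limit is checked against sk_wmem_alloc, which also counts skb overhead, the payload actually queued is smaller, which is presumably why the rounded figures above come out lower.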


Tests using a single TCP flow.

Tests on 10Gbit links :


echo 16384 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 79
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
392360      392360      16384  20.00   1389.53    10^6bits/s  0.52  S      4.30   S      0.737   1.014   usec/KB  

echo 24576 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 33 tpci_snd_cwnd 86
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
396976      396976      16384  20.00   1483.03    10^6bits/s  0.45  S      4.51   S      0.603   0.997   usec/KB  

echo 32768 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 19 tpci_snd_cwnd 100
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
461600      461600      16384  20.00   2039.67    10^6bits/s  0.64  S      5.17   S      0.620   0.830   usec/KB  

echo 49152 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 28 tpci_snd_cwnd 207
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
955512      955512      16384  20.00   4448.86    10^6bits/s  1.19  S      11.16  S      0.526   0.822   usec/KB  

echo 65536 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 399 tpci_snd_cwnd 488
tcpi_reordering 127 tcpi_total_retrans 75
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
2460328     2460328     16384  20.00   5975.12    10^6bits/s  1.81  S      14.65  S      0.595   0.803   usec/KB  

echo 81920 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 24 tpci_snd_cwnd 236
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1144768     1144768     16384  20.00   5190.08    10^6bits/s  1.56  S      12.63  S      0.591   0.798   usec/KB  

echo 98304 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 20 tpci_snd_cwnd 644
tcpi_reordering 59 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
2991168     2991168     16384  20.00   5976.00    10^6bits/s  1.60  S      14.61  S      0.526   0.801   usec/KB  

echo 114688 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 23 tpci_snd_cwnd 683
tcpi_reordering 59 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
3161960     3161960     16384  20.00   5975.14    10^6bits/s  1.42  S      14.78  S      0.469   0.810   usec/KB  

echo 131072 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 23 tpci_snd_cwnd 591
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
2728056     2728056     16384  20.00   5976.16    10^6bits/s  1.71  S      14.62  S      0.562   0.802   usec/KB  

echo 147456 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 27 tpci_snd_cwnd 697
tcpi_reordering 64 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
3240432     3240432     16384  20.00   5975.64    10^6bits/s  1.51  S      14.78  S      0.498   0.811   usec/KB  

echo 163840 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 18 tpci_snd_cwnd 710
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
3277360     3277360     16384  20.00   5975.56    10^6bits/s  1.59  S      14.79  S      0.525   0.811   usec/KB  

echo 180224 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 32 tpci_snd_cwnd 701
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
3235816     3235816     16384  20.00   5976.80    10^6bits/s  1.56  S      14.61  S      0.514   0.801   usec/KB  

echo 196608 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 502 tpci_snd_cwnd 690
tcpi_reordering 127 tcpi_total_retrans 37
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
3185040     3185040     16384  20.00   5975.46    10^6bits/s  1.50  S      14.67  S      0.493   0.804   usec/KB  

echo 262144 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 721
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
3448152     3448152     16384  20.00   5975.49    10^6bits/s  1.57  S      14.78  S      0.516   0.811   usec/KB  

echo 524288 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2000 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 927
tcpi_reordering 53 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
4194304     4194304     16384  20.01   5976.61    10^6bits/s  1.63  S      14.56  S      0.538   0.798   usec/KB  

echo 1048576 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2500 tcpi_rttvar 750 tcpi_snd_ssthresh 17 tpci_snd_cwnd 1272
tcpi_reordering 90 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
4194304     4194304     16384  20.01   5975.11    10^6bits/s  1.64  S      14.69  S      0.541   0.805   usec/KB  



Tests on Gbit link :


echo 16384 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 30 tpci_snd_cwnd 274
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1264784     1264784     16384  20.01   689.70     10^6bits/s  0.22  S      15.05  S      0.634   7.149   usec/KB  

echo 24576 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 43 tpci_snd_cwnd 245
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1130920     1130920     16384  20.01   860.21     10^6bits/s  0.25  S      16.05  S      0.576   6.112   usec/KB  

echo 32768 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 36 tpci_snd_cwnd 229
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1057064     1057064     16384  20.01   867.76     10^6bits/s  0.28  S      15.46  S      0.634   5.839   usec/KB  

echo 49152 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 32 tpci_snd_cwnd 293
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1352488     1352488     16384  20.01   873.61     10^6bits/s  0.21  S      16.25  S      0.483   6.095   usec/KB  

echo 65536 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 48 tpci_snd_cwnd 274
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1264784     1264784     16384  20.01   875.90     10^6bits/s  0.19  S      15.56  S      0.421   5.822   usec/KB  

echo 81920 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 18 tpci_snd_cwnd 246
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1135536     1135536     16384  20.01   879.10     10^6bits/s  0.26  S      15.92  S      0.590   5.935   usec/KB  

echo 98304 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 20 tpci_snd_cwnd 361
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1666376     1666376     16384  20.02   880.30     10^6bits/s  0.25  S      16.07  S      0.560   5.980   usec/KB  

echo 114688 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 41 tpci_snd_cwnd 281
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1297096     1297096     16384  20.01   881.30     10^6bits/s  0.26  S      15.96  S      0.569   5.933   usec/KB  

echo 131072 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 30 tpci_snd_cwnd 292
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1347872     1347872     16384  20.01   880.43     10^6bits/s  0.23  S      16.71  S      0.511   6.219   usec/KB  

echo 147456 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2000 tcpi_rttvar 750 tcpi_snd_ssthresh 31 tpci_snd_cwnd 286
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1320176     1320176     16384  20.01   880.57     10^6bits/s  0.24  S      16.62  S      0.534   6.187   usec/KB  

echo 163840 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 19 tpci_snd_cwnd 406
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1874096     1874096     16384  20.02   880.23     10^6bits/s  0.25  S      17.08  S      0.550   6.358   usec/KB  

echo 180224 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2000 tcpi_rttvar 750 tcpi_snd_ssthresh 27 tpci_snd_cwnd 304
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1403264     1403264     16384  20.01   880.34     10^6bits/s  0.22  S      16.03  S      0.501   5.965   usec/KB  

echo 196608 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2000 tcpi_rttvar 750 tcpi_snd_ssthresh 42 tpci_snd_cwnd 365
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
1684840     1684840     16384  20.02   879.73     10^6bits/s  0.26  S      16.82  S      0.578   6.267   usec/KB  

echo 262144 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 202000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 2875 tcpi_rttvar 750 tcpi_snd_ssthresh 27 tpci_snd_cwnd 471
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
2174136     2174136     16384  20.01   879.89     10^6bits/s  0.25  S      18.52  S      0.556   6.898   usec/KB  

echo 524288 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 205000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 5000 tcpi_rttvar 750 tcpi_snd_ssthresh 42 tpci_snd_cwnd 627
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
2894232     2894232     16384  20.03   879.84     10^6bits/s  0.25  S      17.12  S      0.564   6.374   usec/KB  

echo 1048576 >/proc/sys/net/ipv4/tcp_limit_output_bytes
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.30.42.18 (172.30.42.18) port 0 AF_INET
tcpi_rto 209000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 9875 tcpi_rttvar 750 tcpi_snd_ssthresh 33 tpci_snd_cwnd 950
tcpi_reordering 3 tcpi_total_retrans 0
Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service  
Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand   
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units    
Final       Final                                             %     Method %      Method                          
4194304     4194304     16384  20.03   880.70     10^6bits/s  0.25  S      18.44  S      0.560   6.861   usec/KB  
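
A rough sanity check on the tcpi_rtt growth in the larger runs, assuming the
socket really keeps about tcp_limit_output_bytes queued below TCP and that
tcpi_rtt is reported in microseconds. The helper name is made up for
illustration; it is not part of netperf or the kernel, and it ignores
skb->truesize overhead and the base path RTT, so treat it as an upper-bound
estimate of the queueing delay only.

```c
#include <stdint.h>

/*
 * Hypothetical helper: time in microseconds needed to drain `queued_bytes`
 * at a link rate of `mbit_per_sec` (10^6 bits/s, as netperf reports it).
 * 1 Mbit/s is 1 bit per microsecond, so bits / (Mbit/s) gives microseconds.
 */
static double toy_drain_time_usec(uint64_t queued_bytes, double mbit_per_sec)
{
	return (double)queued_bytes * 8.0 / mbit_per_sec;
}
```

For the last run (limit 1048576, 880.70 * 10^6 bits/s) this gives roughly
9.5 ms, in the same range as the reported tcpi_rtt of 9875. For the smaller
limits the estimate falls well below the measured tcpi_rtt, which suggests
something other than the local queue dominates there.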

^ permalink raw reply

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Ben Greear @ 2012-07-11 15:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, ycheng, dave.taht, netdev, codel, therbert,
	mattmathis, nanditad, ncardwell, andrewmcgr, Rick Jones
In-Reply-To: <1342019518.3265.8116.camel@edumazet-glaptop>

On 07/11/2012 08:11 AM, Eric Dumazet wrote:
> On Tue, 2012-07-10 at 17:13 +0200, Eric Dumazet wrote:
>> This introduce TSQ (TCP Small Queues)
>>
>> TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
>> device queues), to reduce RTT and cwnd bias, part of the bufferbloat
>> problem.
>>
>> sk->sk_wmem_alloc not allowed to grow above a given limit,
>> allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
>> given time.
>>
>> TSO packets are sized/capped to half the limit, so that we have two
>> TSO packets in flight, allowing better bandwidth use.
>>
>> As a side effect, setting the limit to 40000 automatically reduces the
>> standard gso max limit (65536) to 40000/2 : It can help to reduce
>> latencies of high prio packets, having smaller TSO packets.
>>
>> This means we divert sock_wfree() to a tcp_wfree() handler, to
>> queue/send following frames when skb_orphan() [2] is called for the
>> already queued skbs.
>>
>> Results on my dev machine (tg3 nic) are really impressive, using
>> standard pfifo_fast, and with or without TSO/GSO. Without reduction of
>> nominal bandwidth.
>>
>> I no longer have 3MBytes backlogged in qdisc by a single netperf
>> session, and both side socket autotuning no longer use 4 Mbytes.
>>
>> As skb destructor cannot restart xmit itself ( as qdisc lock might be
>> taken at this point ), we delegate the work to a tasklet. We use one
> >> tasklet per cpu for performance reasons.
>>
>>
>>
>> [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
>> [2] skb_orphan() is usually called at TX completion time,
>>    but some drivers call it in their start_xmit() handler.
>>    These drivers should at least use BQL, or else a single TCP
>>    session can still fill the whole NIC TX ring, since TSQ will
>>    have no effect.
>
> I am going to send an official patch (I'll put a v3 tag in it)
>
> I believe I did a full implementation, including the xmit() done
> by the user at release_sock() time, if the tasklet found socket owned by
> the user.
>
> Some bench results about the choice of 128KB being the default value:
>
> 64KB seems the 'good' value on 10Gb links to reach max throughput on my
> lab machines (ixgbe adapters).
>
> Using 128KB is a very conservative value to allow link rate on 20Gbps.
>
> Still, it allows less than 1ms of buffering on a Gbit link, and less
> than 8ms on 100Mbit link (instead of 130ms without Small Queues)

I haven't read your patch in detail, but I was wondering if this feature
would cause trouble for applications that are servicing many sockets at once
and so might take several ms between handling each individual socket.

Or, applications that for other reasons cannot service sockets quite
as fast.  Without this feature, they could poke more data into the
xmit queues to be handled by the kernel while the app goes about its
other user-space work?

Maybe this feature could be enabled/tuned on a per-socket basis?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 15:25 UTC (permalink / raw)
  To: Ben Greear; +Cc: nanditad, netdev, mattmathis, codel, ncardwell, David Miller
In-Reply-To: <4FFD98EA.1040301@candelatech.com>

On Wed, 2012-07-11 at 08:16 -0700, Ben Greear wrote:

> I haven't read your patch in detail, but I was wondering if this feature
> would cause trouble for applications that are servicing many sockets at once
> and so might take several ms between handling each individual socket.
> 

Well, this patch has no impact for such applications. In fact their
send()/write() will return to userland faster than before (for very
large send())

> Or, applications that for other reasons cannot service sockets quite
> as fast.  Without this feature, they could poke more data into the
> xmit queues to be handled by the kernel while the app goes about its
> other user-space work?
> 

There is no impact on the applications. They queue their data in the socket
write queue, and the TCP stack does the work of actually transmitting data
and handling ACKs.

Before this patch, this work was triggered by :

- Timers
- Incoming ACKS

We now add a third trigger : TX completion


> Maybe this feature could be enabled/tuned on a per-socket basis?

Well, why not, but I want first to see why it would be needed.

I mean, if a single application _needs_ to send MBytes of tcp data in
Qdisc at once, everything else on the machine is stuck (as today)

So just increase global param.
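
A toy userspace model of the throttle/restart behaviour described above,
for readers who want to see the TX-completion trigger in isolation. All
names (toy_sock, toy_xmit, toy_tx_complete) are hypothetical and this is
not the code from the patch: it only assumes that transmission stops once
the bytes queued below TCP reach the limit, and that a TX completion
restarts it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-socket state, not struct tcp_sock. */
struct toy_sock {
	size_t write_queue;	/* bytes the application queued, not yet sent down */
	size_t wmem_alloc;	/* bytes currently sitting in qdisc/device */
	size_t limit;		/* plays the role of tcp_limit_output_bytes */
	bool throttled;		/* set when the limit stopped us */
};

/* Push chunks down until the write queue is empty or the limit is hit. */
static size_t toy_xmit(struct toy_sock *sk, size_t chunk)
{
	size_t sent = 0;

	while (sk->write_queue > 0) {
		if (sk->wmem_alloc >= sk->limit) {
			sk->throttled = true;	/* wait for a TX completion */
			break;
		}
		size_t n = sk->write_queue < chunk ? sk->write_queue : chunk;

		sk->write_queue -= n;
		sk->wmem_alloc += n;
		sent += n;
	}
	return sent;
}

/* Third trigger: the device freed `n` bytes, so a throttled socket resumes. */
static size_t toy_tx_complete(struct toy_sock *sk, size_t n, size_t chunk)
{
	if (n > sk->wmem_alloc)
		n = sk->wmem_alloc;
	sk->wmem_alloc -= n;

	if (!sk->throttled)
		return 0;
	sk->throttled = false;
	return toy_xmit(sk, chunk);
}
```

In this model the application side never blocks on the limit: data stays in
write_queue and is drained as completions arrive, which is the sense in
which the applications see no impact.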

^ permalink raw reply

* [patch net-next 0/3] team: couple of patches
From: Jiri Pirko @ 2012-07-11 15:34 UTC (permalink / raw)
  To: netdev; +Cc: davem

Jiri Pirko (3):
  team: use function team_port_txable() for determining enabled and up
    port
  team: add broadcast mode
  team: make team_port_enabled() and team_port_txable() static inline

 drivers/net/team/Kconfig                |   13 ++++-
 drivers/net/team/Makefile               |    1 +
 drivers/net/team/team.c                 |    6 ---
 drivers/net/team/team_mode_broadcast.c  |   88 +++++++++++++++++++++++++++++++
 drivers/net/team/team_mode_roundrobin.c |    6 +--
 include/linux/if_team.h                 |   10 +++-
 6 files changed, 113 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/team/team_mode_broadcast.c

-- 
1.7.10.4

^ permalink raw reply

* [patch net-next 1/3] team: use function team_port_txable() for determining enabled and up port
From: Jiri Pirko @ 2012-07-11 15:34 UTC (permalink / raw)
  To: netdev; +Cc: davem
In-Reply-To: <1342020844-3547-1-git-send-email-jpirko@redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/team/team.c                 |    6 ++++++
 drivers/net/team/team_mode_roundrobin.c |    6 +++---
 include/linux/if_team.h                 |    1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 9b94f53..bc7afa5 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -677,6 +677,12 @@ bool team_port_enabled(struct team_port *port)
 }
 EXPORT_SYMBOL(team_port_enabled);
 
+bool team_port_txable(struct team_port *port)
+{
+	return port->linkup && team_port_enabled(port);
+}
+EXPORT_SYMBOL(team_port_txable);
+
 /*
  * Enable/disable port by adding to enabled port hashlist and setting
  * port->index (Might be racy so reader could see incorrect ifindex when
diff --git a/drivers/net/team/team_mode_roundrobin.c b/drivers/net/team/team_mode_roundrobin.c
index 52dd0ec..0cf38e9 100644
--- a/drivers/net/team/team_mode_roundrobin.c
+++ b/drivers/net/team/team_mode_roundrobin.c
@@ -30,16 +30,16 @@ static struct team_port *__get_first_port_up(struct team *team,
 {
 	struct team_port *cur;
 
-	if (port->linkup)
+	if (team_port_txable(port))
 		return port;
 	cur = port;
 	list_for_each_entry_continue_rcu(cur, &team->port_list, list)
-		if (cur->linkup)
+		if (team_port_txable(cur))
 			return cur;
 	list_for_each_entry_rcu(cur, &team->port_list, list) {
 		if (cur == port)
 			break;
-		if (cur->linkup)
+		if (team_port_txable(cur))
 			return cur;
 	}
 	return NULL;
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index 99efd60..dca426c 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -64,6 +64,7 @@ struct team_port {
 };
 
 extern bool team_port_enabled(struct team_port *port);
+extern bool team_port_txable(struct team_port *port);
 
 struct team_mode_ops {
 	int (*init)(struct team *team);
-- 
1.7.10.4

^ permalink raw reply related

* [patch net-next 2/3] team: add broadcast mode
From: Jiri Pirko @ 2012-07-11 15:34 UTC (permalink / raw)
  To: netdev; +Cc: davem
In-Reply-To: <1342020844-3547-1-git-send-email-jpirko@redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/team/Kconfig               |   13 ++++-
 drivers/net/team/Makefile              |    1 +
 drivers/net/team/team_mode_broadcast.c |   88 ++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/team/team_mode_broadcast.c

diff --git a/drivers/net/team/Kconfig b/drivers/net/team/Kconfig
index 89024d5..6a7260b 100644
--- a/drivers/net/team/Kconfig
+++ b/drivers/net/team/Kconfig
@@ -15,6 +15,17 @@ menuconfig NET_TEAM
 
 if NET_TEAM
 
+config NET_TEAM_MODE_BROADCAST
+	tristate "Broadcast mode support"
+	depends on NET_TEAM
+	---help---
+	  Basic mode where packets are transmitted always by all suitable ports.
+
+	  All added ports are setup to have team's mac address.
+
+	  To compile this team mode as a module, choose M here: the module
+	  will be called team_mode_broadcast.
+
 config NET_TEAM_MODE_ROUNDROBIN
 	tristate "Round-robin mode support"
 	depends on NET_TEAM
@@ -22,7 +33,7 @@ config NET_TEAM_MODE_ROUNDROBIN
 	  Basic mode where port used for transmitting packets is selected in
 	  round-robin fashion using packet counter.
 
-	  All added ports are setup to have bond's mac address.
+	  All added ports are setup to have team's mac address.
 
 	  To compile this team mode as a module, choose M here: the module
 	  will be called team_mode_roundrobin.
diff --git a/drivers/net/team/Makefile b/drivers/net/team/Makefile
index fb9f4c1..9757630 100644
--- a/drivers/net/team/Makefile
+++ b/drivers/net/team/Makefile
@@ -3,6 +3,7 @@
 #
 
 obj-$(CONFIG_NET_TEAM) += team.o
+obj-$(CONFIG_NET_TEAM_MODE_BROADCAST) += team_mode_broadcast.o
 obj-$(CONFIG_NET_TEAM_MODE_ROUNDROBIN) += team_mode_roundrobin.o
 obj-$(CONFIG_NET_TEAM_MODE_ACTIVEBACKUP) += team_mode_activebackup.o
 obj-$(CONFIG_NET_TEAM_MODE_LOADBALANCE) += team_mode_loadbalance.o
diff --git a/drivers/net/team/team_mode_broadcast.c b/drivers/net/team/team_mode_broadcast.c
new file mode 100644
index 0000000..5562345
--- /dev/null
+++ b/drivers/net/team/team_mode_broadcast.c
@@ -0,0 +1,88 @@
+/*
+ * drivers/net/team/team_mode_broadcast.c - Broadcast mode for team
+ * Copyright (c) 2012 Jiri Pirko <jpirko@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/netdevice.h>
+#include <linux/if_team.h>
+
+static bool bc_transmit(struct team *team, struct sk_buff *skb)
+{
+	struct team_port *cur;
+	struct team_port *last = NULL;
+	struct sk_buff *skb2;
+	bool ret;
+	bool sum_ret = false;
+
+	list_for_each_entry_rcu(cur, &team->port_list, list) {
+		if (team_port_txable(cur)) {
+			if (last) {
+				skb2 = skb_clone(skb, GFP_ATOMIC);
+				if (skb2) {
+					skb2->dev = last->dev;
+					ret = dev_queue_xmit(skb2);
+					if (!sum_ret)
+						sum_ret = ret;
+				}
+			}
+			last = cur;
+		}
+	}
+	if (last) {
+		skb->dev = last->dev;
+		ret = dev_queue_xmit(skb);
+		if (!sum_ret)
+			sum_ret = ret;
+	}
+	return sum_ret;
+}
+
+static int bc_port_enter(struct team *team, struct team_port *port)
+{
+	return team_port_set_team_mac(port);
+}
+
+static void bc_port_change_mac(struct team *team, struct team_port *port)
+{
+	team_port_set_team_mac(port);
+}
+
+static const struct team_mode_ops bc_mode_ops = {
+	.transmit		= bc_transmit,
+	.port_enter		= bc_port_enter,
+	.port_change_mac	= bc_port_change_mac,
+};
+
+static const struct team_mode bc_mode = {
+	.kind		= "broadcast",
+	.owner		= THIS_MODULE,
+	.ops		= &bc_mode_ops,
+};
+
+static int __init bc_init_module(void)
+{
+	return team_mode_register(&bc_mode);
+}
+
+static void __exit bc_cleanup_module(void)
+{
+	team_mode_unregister(&bc_mode);
+}
+
+module_init(bc_init_module);
+module_exit(bc_cleanup_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jpirko@redhat.com>");
+MODULE_DESCRIPTION("Broadcast mode for team");
+MODULE_ALIAS("team-mode-broadcast");
-- 
1.7.10.4
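
The "clone for every txable port except the last, then send the original on
the last one" pattern in bc_transmit() can be sketched in plain userspace C.
This is only an illustration under simplifying assumptions: toy_port and
toy_broadcast are made-up names, a counter stands in for dev_queue_xmit(),
and clone failures are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct team_port. */
struct toy_port {
	bool linkup;
	bool enabled;
	int sent;	/* number of buffers handed to this port */
};

static bool toy_port_txable(const struct toy_port *p)
{
	return p->linkup && p->enabled;
}

/*
 * Send one buffer on every txable port. Returns how many copies had to be
 * made: one per txable port except the final one, which gets the original.
 */
static size_t toy_broadcast(struct toy_port *ports, size_t n)
{
	struct toy_port *last = NULL;
	size_t clones = 0;

	for (size_t i = 0; i < n; i++) {
		if (!toy_port_txable(&ports[i]))
			continue;
		if (last) {
			/* a clone goes out on the previously found port */
			last->sent++;
			clones++;
		}
		last = &ports[i];
	}
	if (last)
		last->sent++;	/* the original goes out on the final port */
	return clones;
}
```

Deferring the send by one port is what avoids an extra copy: with N txable
ports only N-1 clones are needed, and with a single txable port none.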

^ permalink raw reply related

* [patch net-next 3/3] team: make team_port_enabled() and team_port_txable() static inline
From: Jiri Pirko @ 2012-07-11 15:34 UTC (permalink / raw)
  To: netdev; +Cc: davem
In-Reply-To: <1342020844-3547-1-git-send-email-jpirko@redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/team/team.c |   12 ------------
 include/linux/if_team.h |   11 +++++++++--
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index bc7afa5..3620c63 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -671,18 +671,6 @@ static bool team_port_find(const struct team *team,
 	return false;
 }
 
-bool team_port_enabled(struct team_port *port)
-{
-	return port->index != -1;
-}
-EXPORT_SYMBOL(team_port_enabled);
-
-bool team_port_txable(struct team_port *port)
-{
-	return port->linkup && team_port_enabled(port);
-}
-EXPORT_SYMBOL(team_port_txable);
-
 /*
  * Enable/disable port by adding to enabled port hashlist and setting
  * port->index (Might be racy so reader could see incorrect ifindex when
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index dca426c..dfa0c8e 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -63,8 +63,15 @@ struct team_port {
 	long mode_priv[0];
 };
 
-extern bool team_port_enabled(struct team_port *port);
-extern bool team_port_txable(struct team_port *port);
+static inline bool team_port_enabled(struct team_port *port)
+{
+	return port->index != -1;
+}
+
+static inline bool team_port_txable(struct team_port *port)
+{
+	return port->linkup && team_port_enabled(port);
+}
 
 struct team_mode_ops {
 	int (*init)(struct team *team);
-- 
1.7.10.4

^ permalink raw reply related

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Ben Greear @ 2012-07-11 15:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, ycheng, dave.taht, netdev, codel, therbert,
	mattmathis, nanditad, ncardwell, andrewmcgr, Rick Jones
In-Reply-To: <1342020306.3265.8129.camel@edumazet-glaptop>

On 07/11/2012 08:25 AM, Eric Dumazet wrote:
> On Wed, 2012-07-11 at 08:16 -0700, Ben Greear wrote:
>
>> I haven't read your patch in detail, but I was wondering if this feature
>> would cause trouble for applications that are servicing many sockets at once
>> and so might take several ms between handling each individual socket.
>>
>
> Well, this patch has no impact for such applications. In fact their
> send()/write() will return to userland faster than before (for very
> large send())

Maybe I'm just confused.  Is your patch just mucking with
the queues below the tcp xmit queues?  From the patch description
I was thinking you were somehow directly limiting the TCP xmit
queues...

If you are just draining the tcp xmit queues on a new/faster
trigger, then I see no problem with that, and no need for
a per-socket control.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [Ksummit-2012-discuss] Organising Mini Summits within the Kernel Summit
From: Stephen Hemminger @ 2012-07-11 15:44 UTC (permalink / raw)
  To: James Bottomley; +Cc: ksummit-2012-discuss, netdev
In-Reply-To: <1341994155.3522.16.camel@dabdike.int.hansenpartnership.com>

On Wed, 11 Jul 2012 09:09:15 +0100
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> Hi All,
> 
> We have set aside the second day of the kernel summit (Tuesday 28
> August) as mini-summit day.  So far we have only the PCI mini summit on
> this day, so if you can think of other topics, please send them to the
> kernel summit discuss list:
> 
> ksummit-2012-discuss@lists.linux-foundation.org
> 
> Looking at the available rooms, we think we can run about four or five
> mini summits.
> 
> As an added incentive, mini summit organisers get to pick who they
> invite and all the people they pick will get an automatic invitation to
> the third day of the kernel summit (but not the core first day) and the
> evening events.
> 
> James

Is there enough interest to have a networking mini-summit?

^ permalink raw reply

* [PATCH v3 net-next] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 15:50 UTC (permalink / raw)
  To: David Miller; +Cc: nanditad, netdev, codel, ncardwell, mattmathis

This introduces TSQ (TCP Small Queues).

The goal of TSQ is to reduce the number of TCP packets in xmit queues
(qdisc & device queues), to reduce RTT and cwnd bias, part of the
bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2: smaller TSO packets can help
reduce the latency of high-priority packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.

As skb destructor cannot restart xmit itself ( as qdisc lock might be
taken at this point ), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.

If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.
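
The "half the limit" sizing can be illustrated with a small standalone
sketch. toy_tso_size_goal is a hypothetical name; the actual
tcp_xmit_size_goal() also subtracts header lengths, bounds the result to
half the window and rounds to a multiple of the MSS, none of which is
modelled here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: cap the TSO size goal to half of the output limit,
 * so that about two TSO packets fit under the limit at any time.
 */
static uint32_t toy_tso_size_goal(uint32_t gso_max_size,
				  uint32_t limit_output_bytes)
{
	uint32_t half = limit_output_bytes >> 1;

	return gso_max_size < half ? gso_max_size : half;
}
```

With the default limit of 131072 the goal stays at 65536, so nothing
changes; with a limit of 40000 it drops to 20000, which is the 40000/2
mentioned above.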

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
---
 Documentation/networking/ip-sysctl.txt |   14 ++
 include/linux/tcp.h                    |    9 +
 include/net/sock.h                     |    2 
 include/net/tcp.h                      |    4 
 net/core/sock.c                        |    4 
 net/ipv4/sysctl_net_ipv4.c             |    7 +
 net/ipv4/tcp.c                         |    6 
 net/ipv4/tcp_ipv4.c                    |    1 
 net/ipv4/tcp_minisocks.c               |    1 
 net/ipv4/tcp_output.c                  |  154 ++++++++++++++++++++++-
 net/ipv6/tcp_ipv6.c                    |    1 
 11 files changed, 202 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 47b6c79..e20c17a 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -551,6 +551,20 @@ tcp_thin_dupack - BOOLEAN
 	Documentation/networking/tcp-thin.txt
 	Default: 0
 
+tcp_limit_output_bytes - INTEGER
+	Controls TCP Small Queue limit per tcp socket.
+	A TCP bulk sender tends to increase packets in flight until it
+	gets loss notifications. With SNDBUF autotuning, this can
+	result in a large number of packets queued in qdisc/device
+	on the local machine, hurting latency of other flows, for
+	typical pfifo_fast qdiscs.
+	tcp_limit_output_bytes limits the number of bytes on qdisc
+	or device to reduce artificial RTT/cwnd and reduce bufferbloat.
+	Note: For GSO/TSO enabled flows, we try to have at least two
+	packets in flight. Reducing tcp_limit_output_bytes might also
+	reduce the size of individual GSO packets (64KB being the max).
+	Default: 131072
+
 UDP variables:
 
 udp_mem - vector of 3 INTEGERs: min, pressure, max
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 2de9cf4..1888169 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -339,6 +339,9 @@ struct tcp_sock {
 	u32	rcv_tstamp;	/* timestamp of last received ACK (for keepalives) */
 	u32	lsndtime;	/* timestamp of last sent data packet (for restart window) */
 
+	struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
+	unsigned long	tsq_flags;
+
 	/* Data for direct copy to user */
 	struct {
 		struct sk_buff_head	prequeue;
@@ -494,6 +497,12 @@ struct tcp_sock {
 	struct tcp_cookie_values  *cookie_values;
 };
 
+enum tsq_flags {
+	TSQ_THROTTLED,
+	TSQ_QUEUED,
+	TSQ_OWNED, /* tcp_tasklet_func() found socket was locked */
+};
+
 static inline struct tcp_sock *tcp_sk(const struct sock *sk)
 {
 	return (struct tcp_sock *)sk;
diff --git a/include/net/sock.h b/include/net/sock.h
index 640432a..eefce84 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -858,6 +858,8 @@ struct proto {
 	int			(*backlog_rcv) (struct sock *sk,
 						struct sk_buff *skb);
 
+	void		(*release_cb)(struct sock *sk);
+
 	/* Keeping track of sk's, looking them up, and port selection methods. */
 	void			(*hash)(struct sock *sk);
 	void			(*unhash)(struct sock *sk);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3618fef..439984b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -253,6 +253,7 @@ extern int sysctl_tcp_cookie_size;
 extern int sysctl_tcp_thin_linear_timeouts;
 extern int sysctl_tcp_thin_dupack;
 extern int sysctl_tcp_early_retrans;
+extern int sysctl_tcp_limit_output_bytes;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
@@ -321,6 +322,8 @@ extern struct proto tcp_prot;
 
 extern void tcp_init_mem(struct net *net);
 
+extern void tcp_tasklet_init(void);
+
 extern void tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void tcp_shutdown (struct sock *sk, int how);
@@ -334,6 +337,7 @@ extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		       size_t size);
 extern int tcp_sendpage(struct sock *sk, struct page *page, int offset,
 			size_t size, int flags);
+extern void tcp_release_cb(struct sock *sk);
 extern int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 				 const struct tcphdr *th, unsigned int len);
diff --git a/net/core/sock.c b/net/core/sock.c
index 929bdcc..24039ac 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2159,6 +2159,10 @@ void release_sock(struct sock *sk)
 	spin_lock_bh(&sk->sk_lock.slock);
 	if (sk->sk_backlog.tail)
 		__release_sock(sk);
+
+	if (sk->sk_prot->release_cb)
+		sk->sk_prot->release_cb(sk);
+
 	sk->sk_lock.owned = 0;
 	if (waitqueue_active(&sk->sk_lock.wq))
 		wake_up(&sk->sk_lock.wq);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 12aa0c5..70730f7 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -598,6 +598,13 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tcp_limit_output_bytes",
+		.data		= &sysctl_tcp_limit_output_bytes,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_NET_DMA
 	{
 		.procname	= "tcp_dma_copybreak",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d902da9..4252cd8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -376,6 +376,7 @@ void tcp_init_sock(struct sock *sk)
 	skb_queue_head_init(&tp->out_of_order_queue);
 	tcp_init_xmit_timers(sk);
 	tcp_prequeue_init(tp);
+	INIT_LIST_HEAD(&tp->tsq_node);
 
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
 	tp->mdev = TCP_TIMEOUT_INIT;
@@ -796,6 +797,10 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 				  inet_csk(sk)->icsk_ext_hdr_len -
 				  tp->tcp_header_len);
 
+		/* TSQ : try to have two TSO segments in flight */
+		xmit_size_goal = min_t(u32, xmit_size_goal,
+				       sysctl_tcp_limit_output_bytes >> 1);
+
 		xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
 
 		/* We try hard to avoid divides here */
@@ -3574,4 +3579,5 @@ void __init tcp_init(void)
 	tcp_secret_primary = &tcp_secret_one;
 	tcp_secret_retiring = &tcp_secret_two;
 	tcp_secret_secondary = &tcp_secret_two;
+	tcp_tasklet_init();
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ddefd39..01545a3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2588,6 +2588,7 @@ struct proto tcp_prot = {
 	.sendmsg		= tcp_sendmsg,
 	.sendpage		= tcp_sendpage,
 	.backlog_rcv		= tcp_v4_do_rcv,
+	.release_cb		= tcp_release_cb,
 	.hash			= inet_hash,
 	.unhash			= inet_unhash,
 	.get_port		= inet_csk_get_port,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 6560886..c66f2ed 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -424,6 +424,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
 			treq->snt_isn + 1 + tcp_s_data_size(oldtp);
 
 		tcp_prequeue_init(newtp);
+		INIT_LIST_HEAD(&newtp->tsq_node);
 
 		tcp_init_wl(newtp, treq->rcv_isn);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c465d3e..03854ab 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -50,6 +50,9 @@ int sysctl_tcp_retrans_collapse __read_mostly = 1;
  */
 int sysctl_tcp_workaround_signed_windows __read_mostly = 0;
 
+/* Default TSQ limit of two TSO segments */
+int sysctl_tcp_limit_output_bytes __read_mostly = 131072;
+
 /* This limits the percentage of the congestion window which we
  * will allow a single TSO frame to consume.  Building TSO frames
  * which are too large can cause TCP streams to be bursty.
@@ -65,6 +68,8 @@ int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
 int sysctl_tcp_cookie_size __read_mostly = 0; /* TCP_COOKIE_MAX */
 EXPORT_SYMBOL_GPL(sysctl_tcp_cookie_size);
 
+static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
+			   int push_one, gfp_t gfp);
 
 /* Account for new data that has been sent to the network. */
 static void tcp_event_new_data_sent(struct sock *sk, const struct sk_buff *skb)
@@ -783,6 +788,140 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
 	return size;
 }
 
+
+/* TCP SMALL QUEUES (TSQ)
+ *
+ * The goal of TSQ is to keep a small amount of skbs per TCP flow in the tx
+ * queues (qdisc+dev), to reduce RTT and bufferbloat.
+ * We do this using a special skb destructor (tcp_wfree).
+ *
+ * It's important that tcp_wfree() can be replaced by sock_wfree() in the
+ * event an skb needs to be reallocated in a driver.
+ * The invariant is that skb->truesize is subtracted from sk->sk_wmem_alloc.
+ *
+ * Since transmit from skb destructor is forbidden, we use a tasklet
+ * to process all sockets that eventually need to send more skbs.
+ * We use one tasklet per cpu, with its own queue of sockets.
+ */
+struct tsq_tasklet {
+	struct tasklet_struct	tasklet;
+	struct list_head	head; /* queue of tcp sockets */
+};
+static DEFINE_PER_CPU(struct tsq_tasklet, tsq_tasklet);
+
+/*
+ * One tasklet per cpu tries to send more skbs.
+ * We run in tasklet context but need to disable irqs when
+ * transferring tsq->head because tcp_wfree() might
+ * interrupt us (non-NAPI drivers).
+ */
+static void tcp_tasklet_func(unsigned long data)
+{
+	struct tsq_tasklet *tsq = (struct tsq_tasklet *)data;
+	LIST_HEAD(list);
+	unsigned long flags;
+	struct list_head *q, *n;
+	struct tcp_sock *tp;
+	struct sock *sk;
+
+	local_irq_save(flags);
+	list_splice_init(&tsq->head, &list);
+	local_irq_restore(flags);
+
+	list_for_each_safe(q, n, &list) {
+		tp = list_entry(q, struct tcp_sock, tsq_node);
+		list_del(&tp->tsq_node);
+
+		sk = (struct sock *)tp;
+		bh_lock_sock(sk);
+
+		if (!sock_owned_by_user(sk)) {
+			if ((1 << sk->sk_state) &
+			    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
+			     TCPF_CLOSING | TCPF_CLOSE_WAIT))
+				tcp_write_xmit(sk,
+					       tcp_current_mss(sk),
+					       0, 0,
+					       GFP_ATOMIC);
+		} else {
+			/* defer the work to tcp_release_cb() */
+			set_bit(TSQ_OWNED, &tp->tsq_flags);
+		}
+		bh_unlock_sock(sk);
+
+		clear_bit(TSQ_QUEUED, &tp->tsq_flags);
+		sk_free(sk);
+	}
+}
+
+/**
+ * tcp_release_cb - tcp release_sock() callback
+ * @sk: socket
+ *
+ * called from release_sock() to perform protocol dependent
+ * actions before socket release.
+ */
+void tcp_release_cb(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (test_and_clear_bit(TSQ_OWNED, &tp->tsq_flags)) {
+		if ((1 << sk->sk_state) &
+		    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
+		     TCPF_CLOSING | TCPF_CLOSE_WAIT))
+			tcp_write_xmit(sk,
+				       tcp_current_mss(sk),
+				       0, 0,
+				       GFP_ATOMIC);
+	}
+}
+EXPORT_SYMBOL(tcp_release_cb);
+
+void __init tcp_tasklet_init(void)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct tsq_tasklet *tsq = &per_cpu(tsq_tasklet, i);
+
+		INIT_LIST_HEAD(&tsq->head);
+		tasklet_init(&tsq->tasklet,
+			     tcp_tasklet_func,
+			     (unsigned long)tsq);
+	}
+}
+
+/*
+ * Write buffer destructor automatically called from kfree_skb.
+ * We can't xmit new skbs from this context, as we might already
+ * hold qdisc lock.
+ */
+void tcp_wfree(struct sk_buff *skb)
+{
+	struct sock *sk = skb->sk;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (test_and_clear_bit(TSQ_THROTTLED, &tp->tsq_flags) &&
+	    !test_and_set_bit(TSQ_QUEUED, &tp->tsq_flags)) {
+		unsigned long flags;
+		struct tsq_tasklet *tsq;
+
+		/* Keep a ref on socket.
+		 * This last ref will be released in tcp_tasklet_func()
+		 */
+		atomic_sub(skb->truesize - 1, &sk->sk_wmem_alloc);
+
+		/* queue this socket to tasklet queue */
+		local_irq_save(flags);
+		tsq = &__get_cpu_var(tsq_tasklet);
+		list_add(&tp->tsq_node, &tsq->head);
+		tasklet_schedule(&tsq->tasklet);
+		local_irq_restore(flags);
+	} else {
+		sock_wfree(skb);
+	}
+}
+
 /* This routine actually transmits TCP packets queued in by
  * tcp_do_sendmsg().  This is used by both the initial
  * transmission and possible later retransmissions.
@@ -844,7 +983,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 
 	skb_push(skb, tcp_header_size);
 	skb_reset_transport_header(skb);
-	skb_set_owner_w(skb, sk);
+
+	skb_orphan(skb);
+	skb->sk = sk;
+	skb->destructor = (sysctl_tcp_limit_output_bytes > 0) ?
+			  tcp_wfree : sock_wfree;
+	atomic_add(skb->truesize, &sk->sk_wmem_alloc);
 
 	/* Build TCP header and checksum it. */
 	th = tcp_hdr(skb);
@@ -1780,6 +1924,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
 
+
 		tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
 		BUG_ON(!tso_segs);
 
@@ -1800,6 +1945,13 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 				break;
 		}
 
+		/* TSQ : sk_wmem_alloc accounts skb truesize,
+		 * including skb overhead. But that's OK.
+		 */
+		if (atomic_read(&sk->sk_wmem_alloc) >= sysctl_tcp_limit_output_bytes) {
+			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
+			break;
+		}
 		limit = mss_now;
 		if (tso_segs > 1 && !tcp_urg_mode(tp))
 			limit = tcp_mss_split_point(sk, skb, mss_now,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 61175cb..70458a9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1970,6 +1970,7 @@ struct proto tcpv6_prot = {
 	.sendmsg		= tcp_sendmsg,
 	.sendpage		= tcp_sendpage,
 	.backlog_rcv		= tcp_v6_do_rcv,
+	.release_cb		= tcp_release_cb,
 	.hash			= tcp_v6_hash,
 	.unhash			= inet_unhash,
 	.get_port		= inet_csk_get_port,

^ permalink raw reply related

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 15:54 UTC (permalink / raw)
  To: Ben Greear; +Cc: nanditad, netdev, mattmathis, codel, ncardwell, David Miller
In-Reply-To: <4FFD9F18.6030401@candelatech.com>

On Wed, 2012-07-11 at 08:43 -0700, Ben Greear wrote:
> On 07/11/2012 08:25 AM, Eric Dumazet wrote:
> > On Wed, 2012-07-11 at 08:16 -0700, Ben Greear wrote:
> >
> >> I haven't read your patch in detail, but I was wondering if this feature
> >> would cause trouble for applications that are servicing many sockets at once
> >> and so might take several ms between handling each individual socket.
> >>
> >
> > Well, this patch has no impact for such applications. In fact their
> > send()/write() will return to userland faster than before (for very
> > large send())
> 
> Maybe I'm just confused.  Is your patch just mucking with
> the queues below the tcp xmit queues?  From the patch description
> I was thinking you were somehow directly limiting the TCP xmit
> queues...
> 

I don't limit the TCP xmit queues. I might avoid excessive autotuning.



> If you are just draining the tcp xmit queues on a new/faster
> trigger, then I see no problem with that, and no need for
> a per-socket control.

That's the plan: limiting the number of bytes in the Qdisc, not the number
of bytes in the socket write queue.
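
The distinction can be sketched in userspace C. This is only a rough model,
not the kernel code: struct flow, try_xmit() and on_tx_complete() are
hypothetical names standing in for the sk_wmem_alloc accounting, the new
check in tcp_write_xmit() and the skb destructor from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one TCP flow; not a kernel structure. */
struct flow {
	size_t write_queue;	/* bytes waiting in the socket write queue */
	size_t in_qdisc;	/* bytes handed to qdisc/device, not yet freed */
	size_t limit;		/* plays the role of tcp_limit_output_bytes */
	bool throttled;		/* plays the role of the TSQ_THROTTLED bit */
};

/*
 * Hand chunks of at most seg bytes down to the qdisc until either the
 * write queue is empty or the per-flow limit is reached.
 * Returns the number of bytes handed down by this call.
 */
static size_t try_xmit(struct flow *f, size_t seg)
{
	size_t sent = 0;

	assert(seg > 0);	/* a zero-sized chunk would never make progress */

	while (f->write_queue > 0) {
		size_t n = f->write_queue < seg ? f->write_queue : seg;

		if (f->in_qdisc >= f->limit) {
			/* Stop here; on_tx_complete() will ask for a retry. */
			f->throttled = true;
			break;
		}
		f->write_queue -= n;
		f->in_qdisc += n;
		sent += n;
	}
	return sent;
}

/*
 * The device has freed n bytes. Returns true when the flow had been
 * throttled and the caller should retry try_xmit() (the patch defers
 * that retry to a tasklet instead of calling it directly).
 */
static bool on_tx_complete(struct flow *f, size_t n)
{
	f->in_qdisc -= n < f->in_qdisc ? n : f->in_qdisc;
	if (!f->throttled)
		return false;
	f->throttled = false;
	return true;
}
```

Note that write_queue itself is never capped here: the application can
still queue as much as the socket buffer allows, and only in_qdisc is
bounded.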

^ permalink raw reply

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Ben Greear @ 2012-07-11 16:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, ycheng, dave.taht, netdev, codel, therbert,
	mattmathis, nanditad, ncardwell, andrewmcgr, Rick Jones
In-Reply-To: <1342022043.3265.8179.camel@edumazet-glaptop>

On 07/11/2012 08:54 AM, Eric Dumazet wrote:
> On Wed, 2012-07-11 at 08:43 -0700, Ben Greear wrote:
>> On 07/11/2012 08:25 AM, Eric Dumazet wrote:
>>> On Wed, 2012-07-11 at 08:16 -0700, Ben Greear wrote:
>>>
>>>> I haven't read your patch in detail, but I was wondering if this feature
>>>> would cause trouble for applications that are servicing many sockets at once
>>>> and so might take several ms between handling each individual socket.
>>>>
>>>
>>> Well, this patch has no impact for such applications. In fact their
>>> send()/write() will return to userland faster than before (for very
>>> large send())
>>
>> Maybe I'm just confused.  Is your patch just mucking with
>> the queues below the tcp xmit queues?  From the patch description
>> I was thinking you were somehow directly limiting the TCP xmit
>> queues...
>>
>
> I don't limit the TCP xmit queues. I might avoid excessive autotuning.
>
>
>
>> If you are just draining the tcp xmit queues on a new/faster
>> trigger, then I see no problem with that, and no need for
>> a per-socket control.
>
> That's the plan: limiting the number of bytes in the Qdisc, not the number
> of bytes in the socket write queue.

Thanks for the explanation.

Out of curiosity, have you tried running multiple TCP streams
with different processes driving each stream, where each is trying
to drive, say, 700Mbps bi-directional traffic over a 1Gbps link?

Perhaps with 50ms of latency generated by a network emulator.

This used to cause some extremely high latency
due to excessive TCP xmit queues (from what I could tell),
but maybe this new patch will cure that.

I'll re-run my tests with your patch eventually, but I'm too bogged
down to do so soon.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* RE: [RFC 1/2] i2400m: remove SDIO device support
From: Perez-Gonzalez, Inaky @ 2012-07-11 16:50 UTC (permalink / raw)
  To: John W. Linville, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Ortiz, Samuel, linux-wimax,
	wimax-BPSAo7wm5JOHVYUYWc+uSQ@public.gmane.org
In-Reply-To: <1341952049-32193-1-git-send-email-linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

> From: John W. Linville [mailto:linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org]
> 
> From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> 
> SDIO support in this driver was intended to support the iwmc3200
> device.  This hardware never became available to normal humans.
> Leaving this driver imposes unwelcome maintenance costs for no clear
> benefit.
> 
> Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: John W. Linville <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> ---
> If there are no objections, I'll push this series through the
> wireless-next tree along with the iwmc3200wifi removal.

Acked-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

^ permalink raw reply

* [PATCH net-next v3 1/3] 6lowpan: Fix null pointer dereference in UDP uncompression function
From: Tony Cheneau @ 2012-07-11 16:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1342025476-20949-1-git-send-email-tony.cheneau@amnesiak.org>

When a UDP packet gets fragmented, a crash occurs at reassembly time.
This is because skb->transport_header is not set during the early stage of
fragment reassembly. As a consequence, the call to udp_hdr() returns NULL
and uh (which is NULL) gets dereferenced without being checked.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
---
 net/ieee802154/6lowpan.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index f4070e5..0c9f6d1 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -315,6 +315,9 @@ lowpan_uncompress_udp_header(struct sk_buff *skb)
 	struct udphdr *uh = udp_hdr(skb);
 	u8 tmp;
 
+	if (!uh)
+		goto err;
+
 	if (lowpan_fetch_skb_u8(skb, &tmp))
 		goto err;
 
-- 
1.7.3.4

^ permalink raw reply related

* [PATCH net-next v3 0/3] 6lowpan: Various bug fixes
From: Tony Cheneau @ 2012-07-11 16:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Alexander Smirnov

Hello,

After reading and playing with the 6lowpan code, I found a few issues. This
patchset fixes them. It should apply cleanly against the current net-next.
It contains only bug fixes; I'll send another patchset later that adds new
functionality.

Changes since version 2:
- remove the patch for the bug that stopped fragmentation from working after
  a few packets had been sent: Alexander included that patch in his patchset
- fix the title of the git commit to include the "6lowpan" tag

Regards,
	Tony Cheneau

Tony Cheneau (3):
  Fix null pointer dereference in UDP uncompression function
  Change byte order when storing/accessing u16 tag
  Change byte order when storing/accessing to len field

 net/ieee802154/6lowpan.c |   29 ++++++++++++++++++-----------
 1 files changed, 18 insertions(+), 11 deletions(-)

-- 
1.7.3.4

^ permalink raw reply

* [PATCH net-next v3 2/3] 6lowpan: Change byte order when storing/accessing u16 tag
From: Tony Cheneau @ 2012-07-11 16:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1342025476-20949-1-git-send-email-tony.cheneau@amnesiak.org>

The tag field should be stored and accessed using big endian byte order (as
intended in the specs). Otherwise, when displayed with a traffic analyser such
as Wireshark, the field is not properly displayed (e.g. 0x01 00 instead of
0x00 01, and so on).

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
---
 net/ieee802154/6lowpan.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 0c9f6d1..9de1ece 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -303,7 +303,7 @@ static inline int lowpan_fetch_skb_u16(struct sk_buff *skb, u16 *val)
 	if (unlikely(!pskb_may_pull(skb, 2)))
 		return -EINVAL;
 
-	*val = skb->data[0] | (skb->data[1] << 8);
+	*val = (skb->data[0] << 8) | skb->data[1];
 	skb_pull(skb, 2);
 
 	return 0;
@@ -1010,8 +1010,8 @@ lowpan_skb_fragmentation(struct sk_buff *skb)
 	/* first fragment header */
 	head[0] = LOWPAN_DISPATCH_FRAG1 | (payload_length & 0x7);
 	head[1] = (payload_length >> 3) & 0xff;
-	head[2] = tag & 0xff;
-	head[3] = tag >> 8;
+	head[2] = tag >> 8;
+	head[3] = tag & 0xff;
 
 	err = lowpan_fragment_xmit(skb, head, header_length, 0, 0);
 
-- 
1.7.3.4
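
For readers following along, the byte order change can be illustrated with a
small standalone sketch. put_be16() and get_be16() are hypothetical helper
names, not functions from 6lowpan.c; they only mirror the shifts used in the
patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value in big endian (network) byte order. */
static void put_be16(uint8_t *buf, uint16_t val)
{
	assert(buf != NULL);
	buf[0] = (uint8_t)(val >> 8);	/* most significant byte first */
	buf[1] = (uint8_t)(val & 0xff);	/* least significant byte last */
}

/* Read back a 16-bit big endian value, regardless of host byte order. */
static uint16_t get_be16(const uint8_t *buf)
{
	assert(buf != NULL);
	return (uint16_t)((buf[0] << 8) | buf[1]);
}
```

With these helpers a tag of 1 is laid out as the bytes 0x00 0x01, which is
the layout the commit message says a traffic analyser expects.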

^ permalink raw reply related

* [PATCH net-next v3 3/3] 6lowpan: Change byte order when storing/accessing to len field
From: Tony Cheneau @ 2012-07-11 16:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Alexander Smirnov
In-Reply-To: <1342025476-20949-1-git-send-email-tony.cheneau@amnesiak.org>

The length field should be encoded using big endian byte order, as intended
in the specs. As currently written, the len field would not be decoded
properly by an implementation using the correct byte ordering, which could
lead to interoperability issues.

Also, I rewrote the code so that the iphc0 argument of
lowpan_alloc_new_frame() could be removed.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
---
 net/ieee802154/6lowpan.c |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 9de1ece..75d91bb 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -649,7 +649,7 @@ static void lowpan_fragment_timer_expired(unsigned long entry_addr)
 }
 
 static struct lowpan_fragment *
-lowpan_alloc_new_frame(struct sk_buff *skb, u8 iphc0, u8 len, u8 tag)
+lowpan_alloc_new_frame(struct sk_buff *skb, u16 len, u16 tag)
 {
 	struct lowpan_fragment *frame;
 
@@ -660,7 +660,7 @@ lowpan_alloc_new_frame(struct sk_buff *skb, u8 iphc0, u8 len, u8 tag)
 
 	INIT_LIST_HEAD(&frame->list);
 
-	frame->length = (iphc0 & 7) | (len << 3);
+	frame->length = len;
 	frame->tag = tag;
 
 	/* allocate buffer for frame assembling */
@@ -718,14 +718,18 @@ lowpan_process_data(struct sk_buff *skb)
 	case LOWPAN_DISPATCH_FRAGN:
 	{
 		struct lowpan_fragment *frame;
-		u8 len, offset;
-		u16 tag;
+		/* slen stores the rightmost 8 bits of the 11-bit length */
+		u8 slen, offset;
+		u16 len, tag;
 		bool found = false;
 
-		if (lowpan_fetch_skb_u8(skb, &len) || /* frame length */
+		if (lowpan_fetch_skb_u8(skb, &slen) || /* frame length */
 		    lowpan_fetch_skb_u16(skb, &tag))  /* fragment tag */
 			goto drop;
 
+		/* combine the 3 MSBs with the 8 LSBs to recover the 11-bit length */
+		len = ((iphc0 & 7) << 8) | slen;
+
 		/*
 		 * check if frame assembling with the same tag is
 		 * already in progress
@@ -740,7 +744,7 @@ lowpan_process_data(struct sk_buff *skb)
 
 		/* alloc new frame structure */
 		if (!found) {
-			frame = lowpan_alloc_new_frame(skb, iphc0, len, tag);
+			frame = lowpan_alloc_new_frame(skb, len, tag);
 			if (!frame)
 				goto unlock_and_drop;
 		}
@@ -1008,8 +1012,8 @@ lowpan_skb_fragmentation(struct sk_buff *skb)
 	tag = fragment_tag++;
 
 	/* first fragment header */
-	head[0] = LOWPAN_DISPATCH_FRAG1 | (payload_length & 0x7);
-	head[1] = (payload_length >> 3) & 0xff;
+	head[0] = LOWPAN_DISPATCH_FRAG1 | ((payload_length >> 8) & 0x7);
+	head[1] = payload_length & 0xff;
 	head[2] = tag >> 8;
 	head[3] = tag & 0xff;
 
-- 
1.7.3.4
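
The resulting first-fragment header layout can be checked with a standalone
sketch. The frag1_*() helpers are hypothetical and exist only for
illustration; FRAG1_DISPATCH is assumed to be 0xc0 (the 11000xxx pattern
from RFC 4944), and the shifts mirror the ones in the patch above.

```c
#include <assert.h>
#include <stdint.h>

#define FRAG1_DISPATCH	0xc0	/* 11000xxx dispatch pattern, per RFC 4944 */

/* Build the 4-byte first-fragment header: 5-bit dispatch, 11-bit
 * datagram length and 16-bit tag, all in big endian order.
 */
static void frag1_encode(uint8_t head[4], uint16_t len, uint16_t tag)
{
	assert(len <= 0x7ff);	/* the length field is only 11 bits wide */
	head[0] = (uint8_t)(FRAG1_DISPATCH | ((len >> 8) & 0x7)); /* 3 MSBs */
	head[1] = (uint8_t)(len & 0xff);			   /* 8 LSBs */
	head[2] = (uint8_t)(tag >> 8);
	head[3] = (uint8_t)(tag & 0xff);
}

/* Recover the 11-bit length from the first two header bytes. */
static uint16_t frag1_len(const uint8_t head[4])
{
	return (uint16_t)(((head[0] & 0x7) << 8) | head[1]);
}

/* Recover the 16-bit tag from the last two header bytes. */
static uint16_t frag1_tag(const uint8_t head[4])
{
	return (uint16_t)((head[2] << 8) | head[3]);
}
```

A 1280-byte datagram (0x500) therefore starts with the bytes 0xc5 0x00.
Since the length needs 11 bits, it does not fit in a u8 once decoded.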

^ permalink raw reply related

* Re: UDP ordering when using multiple rx queue
From: Rick Jones @ 2012-07-11 17:50 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: Merav Sicron, netdev
In-Reply-To: <CAL8zT=hi9_Y4oGw=cVSnYE=km6MZBAAie-A5RWLy=47FR8aTag@mail.gmail.com>

On 07/11/2012 06:41 AM, Jean-Michel Hautbois wrote:
> I confirm that using ethtool -L eth1 combined 1 solves my issue.

My being pedantic or not, you have kludged around your issue, which is a 
broken application.

Can you actually ass-u-me that this application is deployed with just a 
single back-to-back link between two systems?  I'm guessing that isn't 
the way it is deployed in production or there would be zero call for 
multicast.   There is *zero* guarantee of ordering with UDP, multicast 
or otherwise - certainly not between sends involving different port 
numbers, nor for that matter even between sends involving the same port 
numbers.  Once you leave the NIC (and perhaps even before) all bets are off.

Have you tested using bonded links?  Or through switches which 
themselves are joined by bonded links? Various bonding modes can even 
re-order traffic of a single flow (eg mode-rr).  As I understand it, the 
moves to "break the bottlenecks" imposed by spanning tree will mean that 
meshes of switches, even without bonded links, will send traffic of 
different flows through different paths through the switch fabric.  In 
those cases they might send traffic to the same multicast address along 
the same path each time, but you probably cannot count on that, nor them 
sending traffic to different multicast addresses along the same path. 
Some clever meshed-switch folks may go ahead and look up at the 
transport-layer port numbers when deciding on their splits - just like 
some bonding modes can.

Until you get the application re-written to handle out-of-order traffic, 
it "works" only by chance.

> Unicast traffic seems ok (I used netperf in order to check this assumption).

Netperf does nothing to check the order of datagrams.  It is perfectly
content receiving datagrams in any order.  So while you can use it to see
that a single flow of UDP unicast is not split up by the NIC (by looking at
the per-queue stats), you can assume nothing about the final ordering of
those UDP datagrams from a "successful" netperf UDP_STREAM test.
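
One way for an application to at least notice reordering is to carry its own
sequence number in each datagram.  A minimal sketch, assuming a hypothetical
32-bit sequence number in the payload; struct seq_tracker and seq_check()
are illustrative names, not part of netperf or any existing API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum seq_verdict {
	SEQ_IN_ORDER,		/* exactly the datagram we expected */
	SEQ_GAP,		/* jumped ahead: something is missing or late */
	SEQ_LATE_OR_DUP,	/* older than expected: reordered or duplicate */
};

struct seq_tracker {
	uint32_t next;		/* next expected sequence number */
	unsigned long gaps;	/* times we jumped ahead */
	unsigned long late;	/* times an old number showed up */
};

/* Classify one received sequence number and update the tracker. */
static enum seq_verdict seq_check(struct seq_tracker *t, uint32_t seq)
{
	int32_t delta;

	assert(t != NULL);

	/* Signed distance so the comparison survives 32-bit wraparound
	 * (assumes the usual two's complement conversion).
	 */
	delta = (int32_t)(seq - t->next);

	if (delta == 0) {
		t->next = seq + 1;
		return SEQ_IN_ORDER;
	}
	if (delta > 0) {
		t->gaps++;
		t->next = seq + 1;
		return SEQ_GAP;
	}
	t->late++;
	return SEQ_LATE_OR_DUP;
}
```

This only detects the problem; what to do with a late datagram (buffer,
drop, re-request) remains an application decision.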

rick jones

^ permalink raw reply

* Re: [RFC PATCH 07/10] ixgbe: Add function for setting XPS queue mapping
From: Ben Hutchings @ 2012-07-11 18:15 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, therbert,
	alexander.duyck
In-Reply-To: <20120630001649.29939.725.stgit@gitlad.jf.intel.com>

On Fri, 2012-06-29 at 17:16 -0700, Alexander Duyck wrote:
> This change adds support for ixgbe to configure the XPS queue mapping on
> load.  The result of this change is that on open we will now be resetting
> the number of Tx queues, and then setting the default configuration for XPS
> based on if ATR is enabled or disabled.
[...]

I didn't see where you're resetting the number of TX queues; was that
actually added in an earlier patch?

It seems strange to be resetting XPS configuration on open; normally net
device configuration persists as long as the device is registered.
Maybe only do this if the number of TX queues has to change?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [RFC PATCH 09/10] ixgbe: Add support for displaying the number of Tx/Rx channels
From: Ben Hutchings @ 2012-07-11 18:21 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, therbert,
	alexander.duyck
In-Reply-To: <20120630001659.29939.61276.stgit@gitlad.jf.intel.com>

On Fri, 2012-06-29 at 17:16 -0700, Alexander Duyck wrote:
> This patch adds support for the ethtool get_channels operation.
> 
> Since the ixgbe driver has to support DCB as well as the other modes the
> assumption I made here is that the number of channels in DCB modes refers
> to the number of queues per traffic class, not the number of queues total.
[...]

When MSI-X is enabled, a 'channel' is an MSI-X vector and the associated
queues, i.e. total number of channels reported should be the total
number of MSI-X vectors in use.  (That was my intended interpretation,
anyway.  It may be that there is too much variation in the way queues
and interrupts are associated for these operations to be defined in a
general way.)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply
