Netdev List


* Re: Possible networking regression in 3.6.0
From: Eric Dumazet @ 2012-10-01 16:37 UTC
  To: Chris Clayton; +Cc: David Miller, netdev, gpiez
In-Reply-To: <5069C27A.3090706@googlemail.com>

On Mon, 2012-10-01 at 17:19 +0100, Chris Clayton wrote:
> 
> On 10/01/12 16:31, Eric Dumazet wrote:
> > On Mon, 2012-10-01 at 16:13 +0100, Chris Clayton wrote:
> >>
> >> On 10/01/12 10:15, Eric Dumazet wrote:
> >>> On Mon, 2012-10-01 at 09:36 +0100, Chris Clayton wrote:
> >>>>
> >>>
> >>>>        0 ICMP messages received
> >>>>        0 input ICMP message failed.
> >>>>        ICMP input histogram:
> >>>>        0 ICMP messages sent
> >>>>        0 ICMP messages failed
> >>>>        ICMP output histogram:
> >>>
> >>>>
> >>>> After:
> >>>>
> >>>> $ netstat -s
> >>>> Icmp:
> >>>>        4 ICMP messages received
> >>>>        4 input ICMP message failed.
> >>>>        ICMP input histogram:
> >>>>            echo replies: 4
> >>>
> >>> So icmp replies come back and are delivered to host instead of being
> >>> forwarded.
> >>>
> >>> I wonder if MASQUERADE broke...
> >>>
> >>> Could you send
> >>>
> >>> iptables -t -nat -nvL
> >>
> >> $ iptables -t -nat -nvL
> >> iptables v1.4.15: can't initialize iptables table `-nat': Table does not
> >> exist (do you need to insmod?)
> >> Perhaps iptables or your kernel needs to be upgraded.
> >>
> >>> conntrack -L   # while ping is running from guest
> >>
> >> $ conntrack -L
> >> conntrack v1.2.2 (conntrack-tools): Operation failed: invalid parameters
> >>
> >
> > That's not expected, you described you used MASQUERADE target, so
> > "iptables -t nat -nvL" should display something.
> >
> 
> To check this I've booted a 3.5.4 kernel. I get the same response to the 
> two commands. I also double checked that, with a 3.5.4 kernel, pinging 
> the router and browsing the internet from the client work and they do.
> 
> Except for the packets and bytes columns, the command iptables -nvL 
> gives the following output under both 3.5.4 and 3.6.0 kernels:
> 
> Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
>   pkts bytes target     prot opt in     out     source destination
>   3757 3240K ACCEPT     all  --  *      *       0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
>     14   840 ACCEPT     all  --  *      *       127.0.0.1 127.0.0.1
>     41  4362 ACCEPT     all  --  *      *       192.168.0.0/24 0.0.0.0/0
>     90 12780 ACCEPT     all  --  *      *       192.168.200.0/24 0.0.0.0/0
>      0     0 ACCEPT     all  --  *      *       192.168.201.0/24 0.0.0.0/0
>      0     0 DROP       all  --  *      *       0.0.0.0/0 0.0.0.0/0
> 
> Chain FORWARD (policy ACCEPT 4470 packets, 3065K bytes)
>   pkts bytes target     prot opt in     out     source destination
> 
> Chain OUTPUT (policy ACCEPT 3243 packets, 349K bytes)
>   pkts bytes target     prot opt in     out     source destination
>     64  8344 ACCEPT     all  --  *      *       0.0.0.0/0 192.168.200.0/24
>      0     0 ACCEPT     all  --  *      *       0.0.0.0/0 192.168.201.0/24

I am lost, since in your first mail you said:
-----------------------------------------------------------------------------
# Load the connection-sharing for qemu/kvm guests
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
...
# allow traffic to and from the qemu/kvm virtual networks
NETS="200 201"
for net in $NETS; do
   iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT
   iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT
done
...

The network-related modules that are loaded are:

$ lsmod
Module                  Size  Used by
tun                    12412  0
xt_state                 891  1
iptable_filter           852  1
ipt_MASQUERADE          1222  1
iptable_nat             3087  1
nf_nat                 10901  2 ipt_MASQUERADE,iptable_nat
nf_conntrack_ipv4       4942  4 nf_nat,iptable_nat
nf_defrag_ipv4           815  1 nf_conntrack_ipv4
nf_conntrack           37644  5 ipt_MASQUERADE,nf_nat,xt_state,iptable_nat,nf_conntrack_ipv4
...
r8169                  47159  0


-----------------------------------------------

Now you say you don't have nat?

Something is wrong.

* Re: [PATCH net-next v5 1/1] ipv6: add support of ECMP
From: Joe Perches @ 2012-10-01 16:47 UTC
  To: Nicolas Dichtel; +Cc: davem, bernat, netdev, yoshfuji
In-Reply-To: <1349099807-3907-2-git-send-email-nicolas.dichtel@6wind.com>

On Mon, 2012-10-01 at 15:56 +0200, Nicolas Dichtel wrote:
> This patch adds the support of equal cost multipath for IPv6.

trivia:

> diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
[]
> @@ -47,6 +47,10 @@ struct fib6_config {
>  	unsigned long	fc_expires;
>  	struct nlattr	*fc_mx;
>  	int		fc_mx_len;
> +#ifdef CONFIG_IPV6_MULTIPATH
> +	struct nlattr	*fc_mp;
> +	int		fc_mp_len;
> +#endif

These new entries should be in the reverse order to
avoid having a padding hole in 64-bit systems.
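
To make the padding concrete, here is a small user-space sketch. It is not kernel code: the struct names are made up and void * stands in for struct nlattr *. It only compares the field order as posted with the reversed order suggested above.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the tail of struct fib6_config; the struct
 * names are hypothetical and void * replaces struct nlattr *. */
struct cfg_as_posted {
	void	*fc_mx;
	int	fc_mx_len;
	void	*fc_mp;		/* pointer after an int: 4-byte hole on LP64 */
	int	fc_mp_len;
};

struct cfg_reordered {
	void	*fc_mx;
	int	fc_mx_len;
	int	fc_mp_len;	/* the two ints now sit next to each other */
	void	*fc_mp;
};

/* Print both layouts so the hole, if any, is visible. */
void print_layouts(void)
{
	printf("as posted: size %zu, fc_mp at offset %zu\n",
	       sizeof(struct cfg_as_posted),
	       offsetof(struct cfg_as_posted, fc_mp));
	printf("reordered: size %zu, fc_mp at offset %zu\n",
	       sizeof(struct cfg_reordered),
	       offsetof(struct cfg_reordered, fc_mp));
}
```

Calling print_layouts() on x86-64 should typically report 32 bytes for the posted order and 24 for the reordered one. On 32-bit targets both layouts are normally the same size, so the reordering costs nothing there.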

> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c

> @@ -672,6 +672,10 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
>  			    iter->rt6i_idev == rt->rt6i_idev &&
>  			    ipv6_addr_equal(&iter->rt6i_gateway,
>  					    &rt->rt6i_gateway)) {
> +#ifdef CONFIG_IPV6_MULTIPATH
> +				if (rt->rt6i_nsiblings)
> +					rt->rt6i_nsiblings = 0;
> +#endif

There are a _lot_ of #ifdef CONFIG_IPV6_MULTIPATH blocks.

It might be better to add a few static inline functions
in a header file like:

#ifdef CONFIG_IPV6_MULTIPATH
static inline int ipv6_get_multipath_siblings(const struct rt6_info *rt)
{
	return rt->rt6i_nsiblings;
}
#else
static inline int ipv6_get_multipath_siblings(const struct rt6_info *rt)
{
	return 0;
}
#endif

and remove most of the #ifdef blocks.
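
As a rough illustration of what a call site could then look like, here is a user-space sketch. The struct is a toy stand-in for rt6_info, and ipv6_clear_multipath_siblings() and replace_route() are hypothetical names, not functions from the patch. CONFIG_IPV6_MULTIPATH is left undefined, so the stub variant is the one compiled.

```c
/* Toy stand-in for the kernel's struct rt6_info; only the fields this
 * sketch needs. CONFIG_IPV6_MULTIPATH is deliberately not defined. */
struct rt6_info {
	int		rt6i_metric;
#ifdef CONFIG_IPV6_MULTIPATH
	unsigned int	rt6i_nsiblings;
#endif
};

#ifdef CONFIG_IPV6_MULTIPATH
static inline void ipv6_clear_multipath_siblings(struct rt6_info *rt)
{
	rt->rt6i_nsiblings = 0;
}
#else
static inline void ipv6_clear_multipath_siblings(struct rt6_info *rt)
{
	(void)rt;	/* nothing to clear without multipath support */
}
#endif

/* Call site: no #ifdef is needed here, and the stub compiles away. */
int replace_route(struct rt6_info *rt, int metric)
{
	ipv6_clear_multipath_siblings(rt);
	rt->rt6i_metric = metric;
	return 0;
}
```

The conditional code lives once, next to the helper definitions, and every caller reads the same whether or not the option is enabled.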

* Re: [RFC] gre: conform to RFC6040 ECN progogation
From: Ben Hutchings @ 2012-10-01 16:49 UTC
  To: Stephen Hemminger; +Cc: Chris Wright, David Miller, netdev
In-Reply-To: <20121001085606.6f828f7d@nehalam.linuxnetplumber.net>

On Mon, 2012-10-01 at 08:56 -0700, Stephen Hemminger wrote:
> On Mon, 1 Oct 2012 16:55:19 +0100
> Ben Hutchings <bhutchings@solarflare.com> wrote:
> 
> > On Mon, 2012-09-24 at 14:44 -0700, Stephen Hemminger wrote:
> > [...]
> > > --- a/net/ipv4/ip_gre.c	2012-09-21 08:45:55.948772761 -0700
> > > +++ b/net/ipv4/ip_gre.c	2012-09-24 14:35:54.666185603 -0700
> > [...]
> > > @@ -703,17 +704,18 @@ static int ipgre_rcv(struct sk_buff *skb
> > >  			skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> > >  		}
> > >  
> > > +		__skb_tunnel_rx(skb, tunnel->dev);
> > > +
> > > +		skb_reset_network_header(skb);
> > > +		if (!ipgre_ecn_decapsulate(iph, skb))
> > > +			goto drop;
> > > +
> > >  		tstats = this_cpu_ptr(tunnel->dev->tstats);
> > >  		u64_stats_update_begin(&tstats->syncp);
> > >  		tstats->rx_packets++;
> > >  		tstats->rx_bytes += skb->len;
> > >  		u64_stats_update_end(&tstats->syncp);
> > 
> > I don't know why you're moving this code above the stats update;
> > rx_packets/rx_bytes should include dropped packets.
> > 
> > Ben.
> > 
> > > -		__skb_tunnel_rx(skb, tunnel->dev);
> > > -
> > > -		skb_reset_network_header(skb);
> > > -		ipgre_ecn_decapsulate(iph, skb);
> > > -
> > >  		netif_rx(skb);
> > >  
> > >  		rcu_read_unlock();
> > 
> 
> Is that true? I thought they were accounted for by rx_dropped.

I don't think rx_dropped is appropriate for counting invalid packets,
but maybe actual practice is already different.

As for whether packets counted in rx_dropped should also be counted in
rx_packets/rx_bytes, I really don't know.  The current comments on
rtnl_link_stats (inherited from net_device_stats) are totally inadequate
as a specification.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [RFC] gre: conform to RFC6040 ECN progogation
From: Eric Dumazet @ 2012-10-01 17:13 UTC
  To: Ben Hutchings; +Cc: Stephen Hemminger, Chris Wright, David Miller, netdev
In-Reply-To: <1349110181.2577.16.camel@bwh-desktop.uk.solarflarecom.com>

On Mon, 2012-10-01 at 17:49 +0100, Ben Hutchings wrote:

> I don't think rx_dropped is appropriate for counting invalid packets,
> but maybe actual practice is already different.
> 
> As for whether packets counted in rx_dropped should also be counted in
> rx_packets/rx_bytes, I really don't know.  The current comments on
> rtnl_link_stats (inherited from net_device_stats) are totally inadequate
> as a specification.

rx_dropped is used by core network stack, not the devices themselves.

So a packet is first accounted in rx_bytes/rx_packets by the driver,
and if net/core/dev.c drops it, rx_dropped is incremented as well.
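
The accounting order described here can be modelled in a few lines of user-space C. This is only a toy: the counter names mirror the rtnl_link_stats fields under discussion, while toy_driver_rx() and toy_core_rx() are invented for the illustration and do not correspond to real kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy counters named after the rtnl_link_stats fields discussed above. */
struct toy_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
	unsigned long rx_dropped;
};

/* Driver side: every received frame is counted before it is handed up. */
void toy_driver_rx(struct toy_stats *s, size_t len)
{
	s->rx_packets++;
	s->rx_bytes += len;
}

/* Core side: a later drop adds to rx_dropped and leaves the
 * driver's counters untouched. */
void toy_core_rx(struct toy_stats *s, bool deliverable)
{
	if (!deliverable)
		s->rx_dropped++;
}
```

Under this model a packet dropped by the core is visible in both rx_packets and rx_dropped, so rx_dropped is a subset of rx_packets rather than a separate bucket.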

* Re: [PATCH net] use skb_end_offset() in skb_try_coalesce()
From: Eric Dumazet @ 2012-10-01 17:28 UTC
  To: Weiping Pan; +Cc: netdev
In-Reply-To: <6f3540c213735f009409d1cc7d3fe0dea91469d9.1348899174.git.wpan@redhat.com>

On Sat, 2012-09-29 at 14:15 +0800, Weiping Pan wrote:
> Commit ec47ea824774(skb: Add inline helper for getting the skb end offset from
> head) introduces this helper function, skb_end_offset(),
> we should make use of it.
> 
> Signed-off-by: Weiping Pan <wpan@redhat.com>
> ---
>  net/core/skbuff.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index e33ebae..86f040a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3488,8 +3488,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  		    skb_shinfo(from)->nr_frags > MAX_SKB_FRAGS)
>  			return false;
>  
> -		delta = from->truesize -
> -			SKB_TRUESIZE(skb_end_pointer(from) - from->head);
> +		delta = from->truesize - SKB_TRUESIZE(skb_end_offset(from));
>  	}
>  
>  	WARN_ON_ONCE(delta < len);

Acked-by: Eric Dumazet <edumazet@google.com>
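
For readers without the tree at hand, the equivalence the patch relies on can be sketched in user space. The struct and the toy_ helpers below are simplified stand-ins, not the kernel definitions, and they model only the configuration in which the end of the buffer is stored as a pointer.

```c
#include <stddef.h>

/* Toy buffer with only the two fields the helpers read. */
struct toy_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *end;	/* one past the end of the data area */
};

static inline unsigned char *toy_end_pointer(const struct toy_skb *skb)
{
	return skb->end;
}

/* Same value as toy_end_pointer(skb) - skb->head, computed in one place. */
static inline unsigned int toy_end_offset(const struct toy_skb *skb)
{
	return (unsigned int)(skb->end - skb->head);
}
```

The open-coded subtraction and the helper yield the same number, which is why the replacement in the hunk above is a pure cleanup.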

* Re: [PATCH RFC] pkt_sched: QFQ Plus: fair-queueing service at DRR cost
From: Paolo Valente @ 2012-10-01 17:46 UTC
  To: Stephen Hemminger; +Cc: jhs, davem, linux-kernel, netdev, rizzo, fchecconi
In-Reply-To: <20121001083100.13fc231c@nehalam.linuxnetplumber.net>

Il 01/10/2012 17:31, Stephen Hemminger ha scritto:
> On Sun, 30 Sep 2012 19:40:49 +0200
> Paolo Valente <paolo.valente@unimore.it> wrote:
>
>> Hi,
>> this patch turns QFQ into QFQ+, a faster variant of QFQ that groups
>> classes into aggregates, and uses the original QFQ scheduling
>> algorithm to schedule aggregates instead of single classes. An
>> aggregate is made of at most M classes, all with the same weight and
>> maximum packet size.  M is equal to the minimum between tx_queue_len+1
>> and 8 (value chosen to get a good trade-off between execution time and
>> service guarantees). QFQ+ associates each aggregate with a budget
>> equal to the maximum packet size for the classes in the aggregate,
>> multiplied by the number of classes of the aggregate. Once selected an
>> aggregate for service, QFQ+ dequeues only the packets of its classes,
>> until the aggregate finishes its budget. Finally, within an aggregate,
>> classes are scheduled with DRR. In my tests, described below, the
>> execution time of QFQ+ with M=8 was from 16% to 31% lower than that of
>> QFQ, and close to that of DRR.
>>
>> QFQ+ does not use packet lengths for computing aggregate timestamps,
>> but budgets. Hence it does not need to modify any timestamp if the
>> head packet of a class changes. As a consequence, differently from
>> QFQ, which uses head-packet lengths to compute class timestamps, QFQ+
>> does not need further modifications to correctly schedule also
>> non-leaf classes and classes with non-FIFO qdiscs. Finally, QFQ+ is
>> more robust than QFQ against corruption of the data structures
>> implementing the bucket lists. A detailed description of QFQ+ can be
>> found in [1].
>>
>> As for service guarantees, thanks to the way how M is computed, the
>> service of QFQ+ is close to the one of QFQ. For example, as proved in
>> [1], under QFQ+ every packet of a given class is guaranteed the same
>> worst-case completion time as under QFQ, plus an additional delay
>> equal to the transmission time, at the rate reserved to the class, of
>> three maximum-size packet. See [1, Section 7.1] for a numerical
>> comparison among the packet delays guaranteed by QFQ+, QFQ and DRR.
>>
>> I measured the execution time of QFQ+, DRR and QFQ using the testing
>> environment [2]. In particular, for each scheduler I measured the
>> average total execution time of a packet enqueue plus a packet
>> dequeue.  For practical reasons, in this testing environment each
>> enqueue&dequeue is also charged for the cost of generating and
>> discarding an empty, fixed-size packet (using a free list). The
>> following table reports the results with an i7-2760QM, against four
>> different class sets. Time is measured in nanoseconds, while each set
>> or subset of classes is denoted as <num_classes>-w<weight>, where
>> <num_classes> and <weight> are, respectively, the number of classes
>> and the weight of every class in the set/subset (for example, 250-w1
>> stands for 250 classes with weight 1). For QFQ+, the table shows the
>> results for the two extremes for M: 1 and 8 (see [1, Section 7.2] for
>> results with other values of M and for more information).
>>
>>   -----------------------------------------------
>> | Set of  |      QFQ+ (M)     |   DRR      QFQ  |
>> | classes |    1          8   |                 |
>> |-----------------------------------------------|
>> | 1k-w1   |   89         63   |    56       81  |
>> |-----------------------------------------------|
>> | 500-w1, |                   |                 |
>> | 250-w2, |  102         71   |    87      103  |
>> | 250-w4  |                   |                 |
>> |-----------------------------------------------|
>> | 32k-w1  |  267        225   |   173      257  |
>> |-----------------------------------------------|
>> | 16k-w1, |                   |                 |
>> | 8k-w2,  |  253        187   |   252      257  |
>> | 8k-w4   |                   |                 |
>>   -----------------------------------------------
>>
>> About DRR, it achieves its best performance when all the classes have
>> the same weight. This is fortunate, because in such scenarios it is
>> actually pointless to use a fair-queueing scheduler, as the latter
>> would provide the same quality of service as DRR. In contrast, when
>> classes have differentiated weights and the better service properties
>> of QFQ+ make a difference, QFQ+ has better performance than DRR. It
>> happens mainly because QFQ+ dequeues packets in an order that causes
>> about 8% less cache misses than DRR. As for the number of
>> instructions, QFQ+ executes instead about 7% more instructions than
>> DRR, whereas QFQ executes from 25% to 34% more instructions than DRR.
>>
>> Paolo
>>
>> [1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
>> http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf
>>
>> [2] http://algo.ing.unimo.it/people/paolo/agg-sched/test-env.tgz
>>
>> Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
> I like the improvement and the performance improvement.
> Is there some concern that changing the implementation this much might
> upset some people already using QFQ?
If you mean people upset by the degradation of service quality 
(which should, however, be hard to perceive in most practical 
applications), then the following solution could address the issue. It 
was my first idea, before I decided not to change the interface at all.

1. Add an additional parameter M to the tc interface, with two types of 
values:
0        -> automatically compute the max number of classes in an 
aggregate using the current formula
 >0     -> use the value provided by the user as max number of classes

2. Set M to 1 as default value, which would let QFQ+ behave as QFQ by 
default.

tc should however be modified, and people using QFQ should probably move 
to the new version (which is the main reason why I opted for the other 
solution).
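
A rough sketch of how such a parameter might be interpreted, in plain C. All names are hypothetical and this is not code from the patch; it only encodes the rule M = min(tx_queue_len+1, 8) from the patch description, the 0 / >0 convention proposed above, and the aggregate budget as described.

```c
/* Hypothetical names throughout; this is not the code from the patch. */
#define TOY_QFQ_MAX_AGG_CLASSES 8

/* Automatic rule from the description: M = min(tx_queue_len + 1, 8).
 * Written as a comparison first so tx_queue_len + 1 cannot wrap. */
unsigned int toy_qfq_auto_m(unsigned int tx_queue_len)
{
	if (tx_queue_len >= TOY_QFQ_MAX_AGG_CLASSES - 1)
		return TOY_QFQ_MAX_AGG_CLASSES;
	return tx_queue_len + 1;
}

/* Proposed tc parameter: 0 selects the automatic rule, any positive
 * value is taken as the maximum number of classes per aggregate. */
unsigned int toy_qfq_effective_m(unsigned int user_m,
				 unsigned int tx_queue_len)
{
	return user_m ? user_m : toy_qfq_auto_m(tx_queue_len);
}

/* Aggregate budget as described: maximum packet size of the classes
 * in the aggregate times the number of classes in it. */
unsigned int toy_qfq_budget(unsigned int max_pkt_size,
			    unsigned int num_classes)
{
	return max_pkt_size * num_classes;
}
```

With a user value of 1 every aggregate holds a single class, which is the case described above as behaving like plain QFQ.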

Paolo
> What happens if an existing working QFQ config is used in QFQ+?
>
>
>


-- 
-----------------------------------------------------------
| Paolo Valente              |                            |
| Algogroup                  |                            |
| Dip. Ing. Informazione     | tel:   +39 059 2056318     |
| Via Vignolese 905/b	     | fax:   +39 059 2056129     |
| 41125 Modena - Italy       | 				  |
|     home:  http://algo.ing.unimo.it/people/paolo/       |
-----------------------------------------------------------

* Re: [PATCH RFC] pkt_sched: QFQ Plus: fair-queueing service at DRR cost
From: Stephen Hemminger @ 2012-10-01 17:52 UTC
  To: Paolo Valente; +Cc: jhs, davem, linux-kernel, netdev, rizzo, fchecconi
In-Reply-To: <5069D701.9090403@unimore.it>

On Mon, 01 Oct 2012 19:46:41 +0200
Paolo Valente <paolo.valente@unimore.it> wrote:

> Il 01/10/2012 17:31, Stephen Hemminger ha scritto:
> > On Sun, 30 Sep 2012 19:40:49 +0200
> > Paolo Valente <paolo.valente@unimore.it> wrote:
> >
> >> Hi,
> >> this patch turns QFQ into QFQ+, a faster variant of QFQ that groups
> >> classes into aggregates, and uses the original QFQ scheduling
> >> algorithm to schedule aggregates instead of single classes. An
> >> aggregate is made of at most M classes, all with the same weight and
> >> maximum packet size.  M is equal to the minimum between tx_queue_len+1
> >> and 8 (value chosen to get a good trade-off between execution time and
> >> service guarantees). QFQ+ associates each aggregate with a budget
> >> equal to the maximum packet size for the classes in the aggregate,
> >> multiplied by the number of classes of the aggregate. Once selected an
> >> aggregate for service, QFQ+ dequeues only the packets of its classes,
> >> until the aggregate finishes its budget. Finally, within an aggregate,
> >> classes are scheduled with DRR. In my tests, described below, the
> >> execution time of QFQ+ with M=8 was from 16% to 31% lower than that of
> >> QFQ, and close to that of DRR.
> >>
> >> QFQ+ does not use packet lengths for computing aggregate timestamps,
> >> but budgets. Hence it does not need to modify any timestamp if the
> >> head packet of a class changes. As a consequence, differently from
> >> QFQ, which uses head-packet lengths to compute class timestamps, QFQ+
> >> does not need further modifications to correctly schedule also
> >> non-leaf classes and classes with non-FIFO qdiscs. Finally, QFQ+ is
> >> more robust than QFQ against corruption of the data structures
> >> implementing the bucket lists. A detailed description of QFQ+ can be
> >> found in [1].
> >>
> >> As for service guarantees, thanks to the way how M is computed, the
> >> service of QFQ+ is close to the one of QFQ. For example, as proved in
> >> [1], under QFQ+ every packet of a given class is guaranteed the same
> >> worst-case completion time as under QFQ, plus an additional delay
> >> equal to the transmission time, at the rate reserved to the class, of
> >> three maximum-size packet. See [1, Section 7.1] for a numerical
> >> comparison among the packet delays guaranteed by QFQ+, QFQ and DRR.
> >>
> >> I measured the execution time of QFQ+, DRR and QFQ using the testing
> >> environment [2]. In particular, for each scheduler I measured the
> >> average total execution time of a packet enqueue plus a packet
> >> dequeue.  For practical reasons, in this testing environment each
> >> enqueue&dequeue is also charged for the cost of generating and
> >> discarding an empty, fixed-size packet (using a free list). The
> >> following table reports the results with an i7-2760QM, against four
> >> different class sets. Time is measured in nanoseconds, while each set
> >> or subset of classes is denoted as <num_classes>-w<weight>, where
> >> <num_classes> and <weight> are, respectively, the number of classes
> >> and the weight of every class in the set/subset (for example, 250-w1
> >> stands for 250 classes with weight 1). For QFQ+, the table shows the
> >> results for the two extremes for M: 1 and 8 (see [1, Section 7.2] for
> >> results with other values of M and for more information).
> >>
> >>   -----------------------------------------------
> >> | Set of  |      QFQ+ (M)     |   DRR      QFQ  |
> >> | classes |    1          8   |                 |
> >> |-----------------------------------------------|
> >> | 1k-w1   |   89         63   |    56       81  |
> >> |-----------------------------------------------|
> >> | 500-w1, |                   |                 |
> >> | 250-w2, |  102         71   |    87      103  |
> >> | 250-w4  |                   |                 |
> >> |-----------------------------------------------|
> >> | 32k-w1  |  267        225   |   173      257  |
> >> |-----------------------------------------------|
> >> | 16k-w1, |                   |                 |
> >> | 8k-w2,  |  253        187   |   252      257  |
> >> | 8k-w4   |                   |                 |
> >>   -----------------------------------------------
> >>
> >> About DRR, it achieves its best performance when all the classes have
> >> the same weight. This is fortunate, because in such scenarios it is
> >> actually pointless to use a fair-queueing scheduler, as the latter
> >> would provide the same quality of service as DRR. In contrast, when
> >> classes have differentiated weights and the better service properties
> >> of QFQ+ make a difference, QFQ+ has better performance than DRR. It
> >> happens mainly because QFQ+ dequeues packets in an order that causes
> >> about 8% less cache misses than DRR. As for the number of
> >> instructions, QFQ+ executes instead about 7% more instructions than
> >> DRR, whereas QFQ executes from 25% to 34% more instructions than DRR.
> >>
> >> Paolo
> >>
> >> [1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
> >> http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf
> >>
> >> [2] http://algo.ing.unimo.it/people/paolo/agg-sched/test-env.tgz
> >>
> >> Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
> > I like the improvement and the performance improvement.
> > Is there some concern that changing the implementation this much might
> > upset some people already using QFQ?
> If you mean people upset by the degradation of service quality 
> (which should, however, be hard to perceive in most practical 
> applications), then the following solution could address the issue. It 
> was my first idea, before I decided not to change the interface at all.
> 
> 1. Add an additional parameter M to the tc interface, with two types of 
> values:
> 0        -> automatically compute the max number of classes in an 
> aggregate using the current formula
>  >0     -> use the value provided by the user as max number of classes
> 
> 2. Set M to 1 as default value, which would let QFQ+ behave as QFQ by 
> default.
> 
> tc should however be modified, and people using QFQ should probably move 
> to the new version (which is the main reason why I opted for the other 
> solution).
> 
> Paolo
> > What happens if an existing working QFQ config is used in QFQ+?
> >
> >
> >
> 
> 

In order for the transition to be seamless, all possible upgrades
have to work. As in:
  * old iproute2 utilities with new kernel with QFQ+
  * new iproute2 utilities with old kernel with QFQ

It is okay to force users to give new parameters to get full performance,
but I just don't want to break existing users.

* Re: [PATCH] net: fix neigh_resolve_output can cause skb_under_panic
From: Ben Hutchings @ 2012-10-01 18:00 UTC
  To: gregkh@linuxfoundation.org
  Cc: Ramesh Nagappa, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, davem@davemloft.net,
	michel@digirati.com.br, eric.dumazet@gmail.com
In-Reply-To: <20120929005721.GA13335@kroah.com>

On Fri, 2012-09-28 at 17:57 -0700, gregkh@linuxfoundation.org wrote:
[...]
> > > You need a blank line before the first Signed-off-by: line.  
> > > Surely one of the reviewers should have caught this basic thing?
> > 
> > Outlook mangled the patch. I am unable to use git send-email because of
> > a corporate firewall on the build machine.
> 
> Then your patch would also be corrupted; Outlook and Exchange cannot
> handle patches at all.  Please read Documentation/email_clients.txt for
> more details.
> 
> Also, ask your coworkers who properly submit patches what they do to
> work around your broken email infrastructure.

I successfully send patches using git-imap-send and Evolution talking to
Exchange 2010 through its IMAP and SMTP submission interfaces.  (I think
the previous version worked as well.)  But I would agree that Outlook is
probably hopeless.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: Possible networking regression in 3.6.0
From: Chris Clayton @ 2012-10-01 18:28 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev, gpiez
In-Reply-To: <1349109436.12401.712.camel@edumazet-glaptop>



On 10/01/12 17:37, Eric Dumazet wrote:
> On Mon, 2012-10-01 at 17:19 +0100, Chris Clayton wrote:
>>
>> On 10/01/12 16:31, Eric Dumazet wrote:
>>> On Mon, 2012-10-01 at 16:13 +0100, Chris Clayton wrote:
>>>>
>>>> On 10/01/12 10:15, Eric Dumazet wrote:
>>>>> On Mon, 2012-10-01 at 09:36 +0100, Chris Clayton wrote:
>>>>>>
>>>>>
>>>>>>         0 ICMP messages received
>>>>>>         0 input ICMP message failed.
>>>>>>         ICMP input histogram:
>>>>>>         0 ICMP messages sent
>>>>>>         0 ICMP messages failed
>>>>>>         ICMP output histogram:
>>>>>
>>>>>>
>>>>>> After:
>>>>>>
>>>>>> $ netstat -s
>>>>>> Icmp:
>>>>>>         4 ICMP messages received
>>>>>>         4 input ICMP message failed.
>>>>>>         ICMP input histogram:
>>>>>>             echo replies: 4
>>>>>
>>>>> So icmp replies come back and are delivered to host instead of being
>>>>> forwarded.
>>>>>
>>>>> I wonder if MASQUERADE broke...
>>>>>
>>>>> Could you send
>>>>>
>>>>> iptables -t -nat -nvL
>>>>
>>>> $ iptables -t -nat -nvL
>>>> iptables v1.4.15: can't initialize iptables table `-nat': Table does not
>>>> exist (do you need to insmod?)
>>>> Perhaps iptables or your kernel needs to be upgraded.
>>>>
>>>>> conntrack -L   # while ping is running from guest
>>>>
>>>> $ conntrack -L
>>>> conntrack v1.2.2 (conntrack-tools): Operation failed: invalid parameters
>>>>
>>>
>>> That's not expected, you described you used MASQUERADE target, so
>>> "iptables -t nat -nvL" should display something.
>>>
>>
>> To check this I've booted a 3.5.4 kernel. I get the same response to the
>> two commands. I also double checked that, with a 3.5.4 kernel, pinging
>> the router and browsing the internet from the client work and they do.
>>
>> Except for the packets and bytes columns, the command iptables -nvL
>> gives the following output under both 3.5.4 and 3.6.0 kernels:
>>
>> Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
>>    pkts bytes target     prot opt in     out     source destination
>>    3757 3240K ACCEPT     all  --  *      *       0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
>>      14   840 ACCEPT     all  --  *      *       127.0.0.1 127.0.0.1
>>      41  4362 ACCEPT     all  --  *      *       192.168.0.0/24 0.0.0.0/0
>>      90 12780 ACCEPT     all  --  *      *       192.168.200.0/24 0.0.0.0/0
>>       0     0 ACCEPT     all  --  *      *       192.168.201.0/24 0.0.0.0/0
>>       0     0 DROP       all  --  *      *       0.0.0.0/0 0.0.0.0/0
>>
>> Chain FORWARD (policy ACCEPT 4470 packets, 3065K bytes)
>>    pkts bytes target     prot opt in     out     source destination
>>
>> Chain OUTPUT (policy ACCEPT 3243 packets, 349K bytes)
>>    pkts bytes target     prot opt in     out     source destination
>>      64  8344 ACCEPT     all  --  *      *       0.0.0.0/0 192.168.200.0/24
>>       0     0 ACCEPT     all  --  *      *       0.0.0.0/0 192.168.201.0/24
>
> I am lost, since in your first mail you said:
> -----------------------------------------------------------------------------
> # Load the connection-sharing for qemu/kvm guests
> echo 1 > /proc/sys/net/ipv4/ip_forward
> iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> ...
> # allow traffic to and from the qemu/kvm virtual networks
> NETS="200 201"
> for net in $NETS; do
>     iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT
>     iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT
> done
> ...
>
> The network-related modules that are loaded are:
>
> $ lsmod
> Module                  Size  Used by
> tun                    12412  0
> xt_state                 891  1
> iptable_filter           852  1
> ipt_MASQUERADE          1222  1
> iptable_nat             3087  1
> nf_nat                 10901  2 ipt_MASQUERADE,iptable_nat
> nf_conntrack_ipv4       4942  4 nf_nat,iptable_nat
> nf_defrag_ipv4           815  1 nf_conntrack_ipv4
> nf_conntrack           37644  5 ipt_MASQUERADE,nf_nat,xt_state,iptable_nat,nf_conntrack_ipv4
> ...
> r8169                  47159  0
>
>
> -----------------------------------------------
>
> Now you say you don't have nat?
>
> Something is wrong.
>

Here's the complete script that starts up my firewall. I can't recall 
having changed this at all for two or three years, other than when a 
replacement router changed the network from 192.168.1.x, or when I add (or 
remove) networks to (from) the $NETS list for other KVM clients.

$ cat /etc/rc.d/rc.firewall
#! /bin/sh

case "$1" in
     stop)
         echo 0 > /proc/sys/net/ipv4/ip_forward
         # clear out the current settings
         iptables -F
         iptables -X
         iptables -Z
         ;;
     start)
         # Load the connection-sharing for qemu/kvm guests
         echo 1 > /proc/sys/net/ipv4/ip_forward
         iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

         iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

         # Allow anything internal to this machine (i.e. localhost)
         # is this really necessary?
         iptables -A INPUT -s 127.0.0.1 -d 127.0.0.1 -j ACCEPT

         # Allow any traffic from nodes on home network
         iptables -A INPUT -s 192.168.0.0/24 -j ACCEPT

         # and traffic to and from the qemu/kvm virtual networks
         NETS="200 201"
         for net in $NETS; do
             iptables -A INPUT -s 192.168.$net.0/24 -j ACCEPT
             iptables -A OUTPUT -d 192.168.$net.0/24 -j ACCEPT
          done

         # drop everything else
         # iptables -A INPUT -j LOG --log-level 4 --log-prefix "FIREWALL: "
         iptables -A INPUT -j DROP
         ;;
     restart|reload)
         $0 stop
         $0 start
         ;;
     status)
         iptables -L
         ;;
     *)
         echo "Usage: $0 {start|stop|restart|reload|status}"
         exit 1
         ;;
esac

>

eth0 is set up by calling /sbin/ifup from udev on the add event for eth0 
(wlan0 is disabled on the laptop, so that won't be getting in the way). 
Here's the script (the SSID is not really XXXXX):

$ cat /sbin/ifup
#!/bin/sh

PATH="/usr/bin:/usr/sbin:/sbin:/bin"
export PATH

SSID=XXXXX

#logger "$0 called with arguments $@"
if [ "$1" = "wlan0" ]; then

     # Bring the interface up before the iwconfig stuff below
     # assign ip address later else association with AP fails when using WPA
     ifconfig wlan0 up

     # Configure the wireless adapter
     iw wlan0 connect $SSID

     # start wpa_supplicant
     if [ -z `pgrep wpa_supplicant` ]; then
         wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant.conf -iwlan0 -Dwext -B -f/var/log/wpa_supplicant.log
     fi

     # wait until associated with the AP - can take a while with WPA
     secs=0
     until iw wlan0 link | grep -q "SSID: $SSID"; do
         let secs++
         if [ $secs -ge 20 ]; then
             logger -p user.err -t IFUP "Failed to associate with AP within 20 seconds"
             exit -1
         fi
         sleep 1
     done

     # set the regulatory domain (kernel >= 2.6.28)
     iw reg set GB

     ifconfig wlan0 192.168.0.140 netmask 255.255.255.0 up

     route add default gw 192.168.0.1 netmask 0.0.0.0 metric 1

     exit 0

fi

if [ "$1" = "eth0" ] ; then
     # load the module if necessary
     if ! grep -q eth0 /proc/net/dev; then
         modprobe r8169
     fi

     # wait up to 5 seconds for eth0 to appear
     secs=0
     until grep -q eth0 /proc/net/dev; do
         let secs++
         if [ $secs -ge 5 ]; then
             logger -p user.err -t IFUP "eth0 failed to appear within 5 seconds"
             exit -1
         fi
         sleep 1
     done

     ifconfig eth0 192.168.0.40 netmask 255.255.255.0 up

     route add default gw 192.168.0.1 netmask 0.0.0.0 metric 1

     exit 0
fi

When the KVM client is running the routing on the host is:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         router.local.la 0.0.0.0         UG    1      0        0 eth0
Unix            *               255.0.0.0       U     0      0        0 lo
local.lan       *               255.255.255.0   U     0      0        0 eth0
192.168.200.0   *               255.255.255.0   U     0      0        0 tap0

Like I say, the set up has been like this for ages and has worked. It's 
only since I started using 3.6 kernels that I've had a problem. I don't 
recall anything from the nat table ever having been listed by iptables -L.

>

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Captain Obvious @ 2012-10-01 18:34 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Dumazet, David Miller, netdev, gpiez
In-Reply-To: <1349105498.12401.706.camel@edumazet-glaptop>

Eric Dumazet <eric.dumazet@gmail.com> :
[...]
> > > Could you send
> > >
> > > iptables -t -nat -nvL
> > 
> > $ iptables -t -nat -nvL
                  ^ typo

Please try "iptables -t nat -nvL" as was also suggested.
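
For reference, -t takes the table name as its argument, so "-t -nat" asks
for a table literally named "-nat". Without -t, iptables lists only the
filter table, which is why the MASQUERADE rule never shows up in a plain
"iptables -L". A minimal sketch of listing one table at a time; the IPT
variable and the show_table helper are hypothetical names used only for
illustration, and the real command needs root:

```shell
#!/bin/sh
# Hypothetical helper: list one iptables table with counters and numeric
# addresses. IPT is an assumption that lets the binary be overridden;
# it defaults to the iptables found in PATH.
IPT="${IPT:-iptables}"

show_table() {
    # $1 is the table name, e.g. filter, nat, mangle or raw
    "$IPT" -t "$1" -nvL
}

# show_table filter   # same tables as plain "iptables -nvL"
# show_table nat      # the table holding the POSTROUTING MASQUERADE rule
```
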

-- 
Ueimor

^ permalink raw reply

* Re: Possible bug with r8169 driver
From: Francois Romieu @ 2012-10-01 18:56 UTC (permalink / raw)
  To: hayeswang; +Cc: 'Nolwenn', netdev
In-Reply-To: <262549102A0343EDA34AA0A41217180C@realtek.com.tw>

hayeswang <hayeswang@realtek.com> :
[...]
> I need check it with our hardware engineers. I would reply after getting
> response.

Thanks.

> Could you try to set IO 0x08 ~ 0x0F to 0xff for testing, first?

Something like the patch below (against 3.5.4) ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index eb81da4..e0f1b8d 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4271,8 +4271,8 @@ static void rtl_set_rx_mode(struct net_device *dev)
 		mc_filter[1] = swab32(data);
 	}
 
-	RTL_W32(MAR0 + 4, mc_filter[1]);
-	RTL_W32(MAR0 + 0, mc_filter[0]);
+	RTL_W32(MAR0 + 4, 0xffffffff);
+	RTL_W32(MAR0 + 0, 0xffffffff);
 
 	RTL_W32(RxConfig, tmp);
 }

^ permalink raw reply related

* Re: Possible networking regression in 3.6.0
From: Eric Dumazet @ 2012-10-01 19:21 UTC (permalink / raw)
  To: Captain Obvious; +Cc: Chris Clayton, David Miller, netdev, gpiez
In-Reply-To: <20121001183452.GA4492@electric-eye.fr.zoreil.com>

On Mon, 2012-10-01 at 20:34 +0200, Captain Obvious wrote:
> Eric Dumazet <eric.dumazet@gmail.com> :
> [...]
> > > > Could you send
> > > >
> > > > iptables -t -nat -nvL
> > > 
> > > $ iptables -t -nat -nvL
>                   ^ typo
> 
> Please try "iptables -t nat -nvL" as was also suggested.
> 

Oh well, good catch ;)

And for conntrack -L, please Chris add CONFIG_NF_CT_NETLINK=m to your
kernel .config
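
Something like the sketch below can confirm whether the option is already
set before rebuilding. The has_kconfig helper is a hypothetical name, and
the .config path in the usage comment is an assumption; point it at
whatever config file your kernel was built from.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel option $1 is set to y or m
# in the config file $2. Lines such as "# CONFIG_FOO is not set"
# do not match.
has_kconfig() {
    grep -Eq "^$1=(y|m)\$" "$2"
}

# Usage (path is an assumption, adjust to your build tree):
# has_kconfig CONFIG_NF_CT_NETLINK .config && echo "ctnetlink is enabled"
```
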

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Chris Clayton @ 2012-10-01 19:22 UTC (permalink / raw)
  To: Captain Obvious; +Cc: Eric Dumazet, David Miller, netdev, gpiez
In-Reply-To: <20121001183452.GA4492@electric-eye.fr.zoreil.com>



On 10/01/12 19:34, Captain Obvious wrote:
> Eric Dumazet <eric.dumazet@gmail.com> :
> [...]
>>>> Could you send
>>>>
>>>> iptables -t -nat -nvL
>>>
>>> $ iptables -t -nat -nvL
>                    ^ typo
>
> Please try "iptables -t nat -nvL" as was also suggested.
>

Good catch, Captain. Thanks.

$ iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 58 packets, 7716 bytes)
  pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 41 packets, 5895 bytes)
  pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 1158 packets, 75559 bytes)
  pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 208 packets, 14279 bytes)
  pkts bytes target     prot opt in     out     source               destination
   951 61351 MASQUERADE  all  --  *      eth0    0.0.0.0/0            0.0.0.0/0

^ permalink raw reply

* pull request: wireless-next 2012-10-01
From: John W. Linville @ 2012-10-01 19:24 UTC (permalink / raw)
  To: davem; +Cc: linux-wireless, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 10197 bytes --]

commit e4d680c706284ca0413a84bd2a28fda76b360904

In my previous pull request, I mentioned that I had a few stragglers.
They are all driver updates: ti, brcmfmac, ath9k, one for bcma and
one for b43legacy.  I merged them late last week and was letting them
soak in linux-next.  I hope they can still make 3.7.

Please let me know if there are problems!

Thanks,

John

---

The following changes since commit a248afdc1b5916c2bfd007233112333d85aa28f6:

  Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next (2012-09-30 02:30:16 -0400)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next.git for-davem

for you to fetch changes up to e4d680c706284ca0413a84bd2a28fda76b360904:

  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem (2012-10-01 07:39:36 -0400)

----------------------------------------------------------------

Arend van Spriel (6):
      brcmfmac: get rid of extern keyword in wl_cfg80211.h
      brcmfmac: use brcmf_cfg80211_priv to interface with wl_cfg80211 code
      brcmfmac: remove two obsolete structure definitions
      brcmfmac: rename structure brcmf_cfg80211_priv
      brcmfmac: remove brcmf_read_prof() function
      brcmfmac: remove brcmf_update_prof() function

Arik Nemtsov (9):
      wlcore: AP mode - send non-data packets with basic rates
      wlcore: allow only the lowest OFDM rate for p2p setup frames
      wlcore: make Tx flush timings more verbose
      wlcore: tx_flush - optimize flow and force Tx during the flush
      wlcore/wl18xx/wl12xx: allow up to 3 mac addresses
      wlcore: make debug prints work without dynamic debug
      wlcore: allow up to 3 running STA interfaces in combinations
      wlcore: spi: use private max-buf-size limit
      wl18xx: default to siso40 in 2.4ghz with a single antenna

Avinash Patil (1):
      mwifiex: enhance RX reordering to avoid packet drop during host sleep

Bala Shanmugam (1):
      ath9k: Enable MCI for AR9565

Dan Carpenter (1):
      brcmfmac: use kcalloc() to prevent integer overflow

Devendra Naga (1):
      wl18xx: use module_platform_driver

Eliad Peller (9):
      wlcore: consider single fw case
      wlcore: cancel recovery_work on stop() instead of remove_interface()
      wlcore: resume() only if sta is associated
      wlcore: always use wlvif->role_id for scans
      wlcore: lazy-enable device roles
      wlcore: invalidate keep-alive template on disconnection
      wlcore: use dynamic keep-alive template ids
      wlcore: decrease elp timeout
      wlcore: protect wlcore_op_set_key with mutex

Eyal Shapira (1):
      wlcore: configure wowlan regardless of wakeup conditions

Hante Meuleman (5):
      brcmfmac: use wait_event_timeout for fw control packets over usb.
      brcmfmac: use different fw api for encryption,auth. config
      brcmfmac: use define instead of hardcoded values.
      brcmfmac: notify common driver about usb tx completion.
      brcmfmac: add hostap supoort.

Ido Reis (2):
      wl18xx: update default phy configuration for pg2
      wl18xx: increase rx_ba_win_size to 32

Ido Yariv (6):
      wlcore: Prevent interaction with HW after recovery is queued
      wlcore: Don't recover during boot
      wlcore: Fix unbalanced interrupts enablement
      wlcore: Allow memory access when the FW crashes
      wlcore: Refactor probe
      wlcore: Load the NVS file asynchronously

Igal Chernobelsky (2):
      wl18xx/wl12xx: defines for Tx/Rx descriptors num
      wlcore/wl18xx/wl12xx: aggregation buffer size set

John W. Linville (2):
      Merge branch 'for-linville' of git://git.kernel.org/.../luca/wl12xx
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-next into for-davem

Kevin Gan (1):
      mwifiex: add inactivity deauth support for ap

Larry Finger (1):
      b43legacy: Fix crash on unload when firmware not available

Luciano Coelho (1):
      wl12xx: use module_platform_driver

Rafał Miłecki (1):
      bcma: change delays to follow timers-howto guide

Stone Piao (14):
      mwifiex: fix coding style issue
      mwifiex: implement cfg80211 mgmt_tx handler
      mwifiex: advertise mgmt_stype to cfg80211
      mwifiex: implement cfg80211 mgmt_frame_register handler
      mwifiex: report received management frames to cfg80211
      mwifiex: implement remain_on_channel and cancel_remain_on_channel
      mwifiex: process remain on channel expired event
      mwifiex: append each IE into a seperate IE buffer
      mwifiex: send firmware initialization commands synchronously
      mwifiex: add P2P interface
      mwifiex: add support for P2P client in interface type change
      mwifiex: add support for P2P GO in interface type change
      mwifiex: parse P2P IEs from beacon_data
      mwifiex: set txpd when send a mgmt frame for AP and GO mode

Sujith Manoharan (4):
      ath9k: Handle errors properly in MCI initialization
      ath9k: Add a debugfs file to adjust antenna diversity
      ath9k: Fix chainmask selection for AR9462
      ath9k: Fix BTCOEX weight initialization

Sven Eckelmann (1):
      ath9k_hw: Handle AR_INTR_SYNC_HOST1_(FATAL|PERR) on AR9003

Thomas Wagner (1):
      ath9k: Fix rx filtering issue for older chips

Tim Gardner (1):
      wlcore: Declare MODULE_FIRMWARE usage

Wei Yongjun (2):
      wl12xx: remove duplicated include from main.c
      mwifiex: convert to use le16_add_cpu()

Yair Shapira (2):
      wl18xx: number_of_assembled_ant5 indicates if A band is enabled
      wlcore/wl18xx: add phy_fw_version_str to debugfs driver_state

 drivers/bcma/core.c                                |    2 +-
 drivers/bcma/driver_chipcommon_pmu.c               |    5 +-
 drivers/bcma/driver_pci.c                          |    6 +-
 drivers/bcma/driver_pci_host.c                     |    8 +-
 drivers/net/wireless/ath/ath9k/ar9003_mac.c        |   17 +
 drivers/net/wireless/ath/ath9k/ar9003_mci.c        |   21 +-
 drivers/net/wireless/ath/ath9k/ar9003_mci.h        |    8 +-
 drivers/net/wireless/ath/ath9k/ar9003_phy.c        |    2 +-
 drivers/net/wireless/ath/ath9k/ath9k.h             |    2 +
 drivers/net/wireless/ath/ath9k/btcoex.c            |   65 +-
 drivers/net/wireless/ath/ath9k/btcoex.h            |    3 +-
 drivers/net/wireless/ath/ath9k/debug.c             |   55 +-
 drivers/net/wireless/ath/ath9k/gpio.c              |    7 +-
 drivers/net/wireless/ath/ath9k/htc_drv_gpio.c      |    2 +-
 drivers/net/wireless/ath/ath9k/mci.c               |   11 +-
 drivers/net/wireless/ath/ath9k/recv.c              |    4 +-
 drivers/net/wireless/ath/ath9k/wow.c               |    2 +-
 drivers/net/wireless/ath/ath9k/xmit.c              |    4 +
 drivers/net/wireless/b43legacy/main.c              |    2 +
 drivers/net/wireless/brcm80211/brcmfmac/dhd.h      |   33 +-
 .../net/wireless/brcm80211/brcmfmac/dhd_common.c   |   46 +
 drivers/net/wireless/brcm80211/brcmfmac/usb.c      |   48 +-
 .../net/wireless/brcm80211/brcmfmac/wl_cfg80211.c  | 2416 +++++++++++++-------
 .../net/wireless/brcm80211/brcmfmac/wl_cfg80211.h  |   81 +-
 drivers/net/wireless/mwifiex/11n_rxreorder.c       |   49 +-
 drivers/net/wireless/mwifiex/11n_rxreorder.h       |    5 +
 drivers/net/wireless/mwifiex/cfg80211.c            |  386 +++-
 drivers/net/wireless/mwifiex/cmdevt.c              |    2 +
 drivers/net/wireless/mwifiex/decl.h                |    6 +-
 drivers/net/wireless/mwifiex/fw.h                  |   43 +
 drivers/net/wireless/mwifiex/ie.c                  |   86 +-
 drivers/net/wireless/mwifiex/init.c                |   14 +-
 drivers/net/wireless/mwifiex/ioctl.h               |    2 +
 drivers/net/wireless/mwifiex/main.c                |   41 +-
 drivers/net/wireless/mwifiex/main.h                |   36 +
 drivers/net/wireless/mwifiex/scan.c                |    5 +-
 drivers/net/wireless/mwifiex/sta_cmd.c             |   92 +-
 drivers/net/wireless/mwifiex/sta_cmdresp.c         |   39 +
 drivers/net/wireless/mwifiex/sta_event.c           |   12 +
 drivers/net/wireless/mwifiex/sta_ioctl.c           |   59 +
 drivers/net/wireless/mwifiex/sta_rx.c              |    6 +
 drivers/net/wireless/mwifiex/sta_tx.c              |   12 +-
 drivers/net/wireless/mwifiex/uap_cmd.c             |   22 +
 drivers/net/wireless/mwifiex/uap_txrx.c            |   15 +
 drivers/net/wireless/mwifiex/util.c                |   40 +
 drivers/net/wireless/mwifiex/wmm.c                 |    9 +-
 drivers/net/wireless/ti/wl12xx/main.c              |   79 +-
 drivers/net/wireless/ti/wl12xx/wl12xx.h            |    7 +
 drivers/net/wireless/ti/wl18xx/debugfs.c           |    2 +-
 drivers/net/wireless/ti/wl18xx/main.c              |  128 +-
 drivers/net/wireless/ti/wl18xx/wl18xx.h            |    7 +
 drivers/net/wireless/ti/wlcore/cmd.c               |   21 +-
 drivers/net/wireless/ti/wlcore/cmd.h               |    5 -
 drivers/net/wireless/ti/wlcore/conf.h              |    3 +-
 drivers/net/wireless/ti/wlcore/debug.h             |   16 +-
 drivers/net/wireless/ti/wlcore/debugfs.c           |   32 +-
 drivers/net/wireless/ti/wlcore/init.c              |   12 +-
 drivers/net/wireless/ti/wlcore/io.h                |    4 +-
 drivers/net/wireless/ti/wlcore/main.c              |  366 +--
 drivers/net/wireless/ti/wlcore/ps.c                |   10 +-
 drivers/net/wireless/ti/wlcore/rx.c                |    2 +-
 drivers/net/wireless/ti/wlcore/scan.c              |   20 +-
 drivers/net/wireless/ti/wlcore/spi.c               |   10 +-
 drivers/net/wireless/ti/wlcore/testmode.c          |    4 +-
 drivers/net/wireless/ti/wlcore/tx.c                |   51 +-
 drivers/net/wireless/ti/wlcore/wlcore.h            |   23 +-
 drivers/net/wireless/ti/wlcore/wlcore_i.h          |   13 +-
 67 files changed, 3271 insertions(+), 1375 deletions(-)
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Dave Jones @ 2012-10-01 19:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Chris Clayton, David Miller, netdev, gpiez
In-Reply-To: <1349082950.12401.669.camel@edumazet-glaptop>

On Mon, Oct 01, 2012 at 11:15:50AM +0200, Eric Dumazet wrote:
 > > 
 > > $ netstat -s
 > > Icmp:
 > >      4 ICMP messages received
 > >      4 input ICMP message failed.
 > >      ICMP input histogram:
 > >          echo replies: 4
 > 
 > So icmp replies come back and are delivered to host instead of being
 > forwarded.
 > 
 > I wonder if MASQUERADE broke...

I hit something that sounds just like this a few months back..
http://lists.openwall.net/netdev/2012/07/25/53

It "went away" a few builds later, but I've seen it happen
again from time to time.

	Dave

^ permalink raw reply

* [PATCH] iproute2: add support for tcp_metrics
From: Julian Anastasov @ 2012-10-01 19:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

	ip tcp_metrics/tcpmetrics

	We support get/del for a single entry and dump for show/flush.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/linux/tcp_metrics.h |   54 ++++++
 ip/Makefile                 |    2 +-
 ip/ip.c                     |    4 +-
 ip/ip_common.h              |    1 +
 ip/tcp_metrics.c            |  426 +++++++++++++++++++++++++++++++++++++++++++
 man/man8/Makefile           |    3 +-
 man/man8/ip-tcp_metrics.8   |  143 +++++++++++++++
 man/man8/ip.8               |    7 +-
 8 files changed, 636 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/tcp_metrics.h
 create mode 100644 ip/tcp_metrics.c
 create mode 100644 man/man8/ip-tcp_metrics.8

diff --git a/include/linux/tcp_metrics.h b/include/linux/tcp_metrics.h
new file mode 100644
index 0000000..cb5157b
--- /dev/null
+++ b/include/linux/tcp_metrics.h
@@ -0,0 +1,54 @@
+/* tcp_metrics.h - TCP Metrics Interface */
+
+#ifndef _LINUX_TCP_METRICS_H
+#define _LINUX_TCP_METRICS_H
+
+#include <linux/types.h>
+
+/* NETLINK_GENERIC related info
+ */
+#define TCP_METRICS_GENL_NAME		"tcp_metrics"
+#define TCP_METRICS_GENL_VERSION	0x1
+
+enum tcp_metric_index {
+	TCP_METRIC_RTT,
+	TCP_METRIC_RTTVAR,
+	TCP_METRIC_SSTHRESH,
+	TCP_METRIC_CWND,
+	TCP_METRIC_REORDERING,
+
+	/* Always last.  */
+	__TCP_METRIC_MAX,
+};
+
+#define TCP_METRIC_MAX	(__TCP_METRIC_MAX - 1)
+
+enum {
+	TCP_METRICS_ATTR_UNSPEC,
+	TCP_METRICS_ATTR_ADDR_IPV4,		/* u32 */
+	TCP_METRICS_ATTR_ADDR_IPV6,		/* binary */
+	TCP_METRICS_ATTR_AGE,			/* msecs */
+	TCP_METRICS_ATTR_TW_TSVAL,		/* u32, raw, rcv tsval */
+	TCP_METRICS_ATTR_TW_TS_STAMP,		/* s32, sec age */
+	TCP_METRICS_ATTR_VALS,			/* nested +1, u32 */
+	TCP_METRICS_ATTR_FOPEN_MSS,		/* u16 */
+	TCP_METRICS_ATTR_FOPEN_SYN_DROPS,	/* u16, count of drops */
+	TCP_METRICS_ATTR_FOPEN_SYN_DROP_TS,	/* msecs age */
+	TCP_METRICS_ATTR_FOPEN_COOKIE,		/* binary */
+
+	__TCP_METRICS_ATTR_MAX,
+};
+
+#define TCP_METRICS_ATTR_MAX	(__TCP_METRICS_ATTR_MAX - 1)
+
+enum {
+	TCP_METRICS_CMD_UNSPEC,
+	TCP_METRICS_CMD_GET,
+	TCP_METRICS_CMD_DEL,
+
+	__TCP_METRICS_CMD_MAX,
+};
+
+#define TCP_METRICS_CMD_MAX	(__TCP_METRICS_CMD_MAX - 1)
+
+#endif /* _LINUX_TCP_METRICS_H */
diff --git a/ip/Makefile b/ip/Makefile
index 6a518f8..e3c991f 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -3,7 +3,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
     ipmaddr.o ipmonitor.o ipmroute.o ipprefix.o iptuntap.o \
     ipxfrm.o xfrm_state.o xfrm_policy.o xfrm_monitor.o \
     iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
-    iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o
+    iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o tcp_metrics.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/ip.c b/ip/ip.c
index df06d3e..e0f7e60 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -45,7 +45,7 @@ static void usage(void)
 "       ip [ -force ] -batch filename\n"
 "where  OBJECT := { link | addr | addrlabel | route | rule | neigh | ntable |\n"
 "                   tunnel | tuntap | maddr | mroute | mrule | monitor | xfrm |\n"
-"                   netns | l2tp }\n"
+"                   netns | l2tp | tcp_metrics }\n"
 "       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
 "                    -f[amily] { inet | inet6 | ipx | dnet | bridge | link } |\n"
 "                    -l[oops] { maximum-addr-flush-attempts } |\n"
@@ -78,6 +78,8 @@ static const struct cmd {
 	{ "tunl",	do_iptunnel },
 	{ "tuntap",	do_iptuntap },
 	{ "tap",	do_iptuntap },
+	{ "tcpmetrics",	do_tcp_metrics },
+	{ "tcp_metrics",do_tcp_metrics },
 	{ "monitor",	do_ipmonitor },
 	{ "xfrm",	do_xfrm },
 	{ "mroute",	do_multiroute },
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 5fa2cc0..2fd66b7 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -42,6 +42,7 @@ extern int do_multirule(int argc, char **argv);
 extern int do_netns(int argc, char **argv);
 extern int do_xfrm(int argc, char **argv);
 extern int do_ipl2tp(int argc, char **argv);
+extern int do_tcp_metrics(int argc, char **argv);
 
 static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
 {
diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
new file mode 100644
index 0000000..cd6d60c
--- /dev/null
+++ b/ip/tcp_metrics.c
@@ -0,0 +1,426 @@
+/*
+ * tcp_metrics.c	"ip tcp_metrics/tcpmetrics"
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		version 2 as published by the Free Software Foundation;
+ *
+ * Authors:	Julian Anastasov <ja@ssi.bg>, August 2012
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <arpa/inet.h>
+#include <sys/ioctl.h>
+#include <linux/if.h>
+
+#include <linux/genetlink.h>
+#include <linux/tcp_metrics.h>
+
+#include "utils.h"
+#include "ip_common.h"
+#include "libgenl.h"
+
+static void usage(void)
+{
+	fprintf(stderr, "Usage: ip tcp_metrics/tcpmetrics { COMMAND | help }\n");
+	fprintf(stderr, "       ip tcp_metrics { show | flush } SELECTOR\n");
+	fprintf(stderr, "       ip tcp_metrics delete [ address ] ADDRESS\n");
+	fprintf(stderr, "SELECTOR := [ [ address ] PREFIX ]\n");
+	exit(-1);
+}
+
+/* netlink socket */
+static struct rtnl_handle grth = { .fd = -1 };
+static int genl_family = -1;
+
+#define TCPM_REQUEST(_req, _bufsiz, _cmd, _flags) \
+	GENL_REQUEST(_req, _bufsiz, genl_family, 0, \
+		     TCP_METRICS_GENL_VERSION, _cmd, _flags)
+
+#define CMD_LIST	0x0001	/* list, lst, show		*/
+#define CMD_DEL		0x0002	/* delete, remove		*/
+#define CMD_FLUSH	0x0004	/* flush			*/
+
+static struct {
+	char	*name;
+	int	code;
+} cmds[] = {
+	{	"list",		CMD_LIST	},
+	{	"lst",		CMD_LIST	},
+	{	"show",		CMD_LIST	},
+	{	"delete",	CMD_DEL		},
+	{	"remove",	CMD_DEL		},
+	{	"flush",	CMD_FLUSH	},
+};
+
+static char *metric_name[TCP_METRIC_MAX + 1] = {
+	[TCP_METRIC_RTT]		= "rtt",
+	[TCP_METRIC_RTTVAR]		= "rttvar",
+	[TCP_METRIC_SSTHRESH]		= "ssthresh",
+	[TCP_METRIC_CWND]		= "cwnd",
+	[TCP_METRIC_REORDERING]		= "reordering",
+};
+
+static struct
+{
+	int flushed;
+	char *flushb;
+	int flushp;
+	int flushe;
+	int cmd;
+	inet_prefix addr;
+} f;
+
+static int flush_update(void)
+{
+	if (rtnl_send_check(&grth, f.flushb, f.flushp) < 0) {
+		perror("Failed to send flush request");
+		return -1;
+	}
+	f.flushp = 0;
+	return 0;
+}
+
+static int process_msg(const struct sockaddr_nl *who, struct nlmsghdr *n,
+		       void *arg)
+{
+	FILE *fp = (FILE *) arg;
+	struct genlmsghdr *ghdr;
+	struct rtattr *attrs[TCP_METRICS_ATTR_MAX + 1], *a;
+	int len = n->nlmsg_len;
+	char abuf[256];
+	inet_prefix addr;
+	int family, i, atype;
+
+	if (n->nlmsg_type != genl_family)
+		return -1;
+
+	len -= NLMSG_LENGTH(GENL_HDRLEN);
+	if (len < 0)
+		return -1;
+
+	ghdr = NLMSG_DATA(n);
+	if (ghdr->cmd != TCP_METRICS_CMD_GET)
+		return 0;
+
+	parse_rtattr(attrs, TCP_METRICS_ATTR_MAX, (void *) ghdr + GENL_HDRLEN,
+		     len);
+
+	a = attrs[TCP_METRICS_ATTR_ADDR_IPV4];
+	if (a) {
+		if (f.addr.family && f.addr.family != AF_INET)
+			return 0;
+		memcpy(&addr.data, RTA_DATA(a), 4);
+		addr.bytelen = 4;
+		family = AF_INET;
+		atype = TCP_METRICS_ATTR_ADDR_IPV4;
+	} else {
+		a = attrs[TCP_METRICS_ATTR_ADDR_IPV6];
+		if (a) {
+			if (f.addr.family && f.addr.family != AF_INET6)
+				return 0;
+			memcpy(&addr.data, RTA_DATA(a), 16);
+			addr.bytelen = 16;
+			family = AF_INET6;
+			atype = TCP_METRICS_ATTR_ADDR_IPV6;
+		} else
+			return 0;
+	}
+
+	if (f.addr.family && f.addr.bitlen >= 0 &&
+	    inet_addr_match(&addr, &f.addr, f.addr.bitlen))
+		return 0;
+
+	if (f.flushb) {
+		struct nlmsghdr *fn;
+		TCPM_REQUEST(req2, 128, TCP_METRICS_CMD_DEL, NLM_F_REQUEST);
+
+		addattr_l(&req2.n, sizeof(req2), atype, &addr.data,
+			  addr.bytelen);
+
+		if (NLMSG_ALIGN(f.flushp) + req2.n.nlmsg_len > f.flushe) {
+			if (flush_update())
+				return -1;
+		}
+		fn = (struct nlmsghdr *) (f.flushb + NLMSG_ALIGN(f.flushp));
+		memcpy(fn, &req2.n, req2.n.nlmsg_len);
+		fn->nlmsg_seq = ++grth.seq;
+		f.flushp = (((char *) fn) + req2.n.nlmsg_len) - f.flushb;
+		f.flushed++;
+		if (show_stats < 2)
+			return 0;
+	}
+
+	if (f.cmd & (CMD_DEL | CMD_FLUSH))
+		fprintf(fp, "Deleted ");
+
+	fprintf(fp, "%s",
+		format_host(family, RTA_PAYLOAD(a), &addr.data,
+			    abuf, sizeof(abuf)));
+
+	a = attrs[TCP_METRICS_ATTR_AGE];
+	if (a) {
+		__u64 val = rta_getattr_u64(a);
+
+		fprintf(fp, " age %llu.%03llusec",
+			val / 1000, val % 1000);
+	}
+
+	a = attrs[TCP_METRICS_ATTR_TW_TS_STAMP];
+	if (a) {
+		__s32 val = (__s32) rta_getattr_u32(a);
+		__u32 tsval;
+
+		a = attrs[TCP_METRICS_ATTR_TW_TSVAL];
+		tsval = a ? rta_getattr_u32(a) : 0;
+		fprintf(fp, " tw_ts %u/%dsec ago", tsval, val);
+	}
+
+	a = attrs[TCP_METRICS_ATTR_VALS];
+	if (a) {
+		struct rtattr *m[TCP_METRIC_MAX + 1 + 1];
+
+		parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a);
+
+		for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
+			__u32 val;
+
+			a = m[i + 1];
+			if (!a)
+				continue;
+			if (metric_name[i])
+				fprintf(fp, " %s ", metric_name[i]);
+			else
+				fprintf(fp, " metric_%d ", i);
+			val = rta_getattr_u32(a);
+			switch (i) {
+			case TCP_METRIC_RTT:
+			case TCP_METRIC_RTTVAR:
+				fprintf(fp, "%ums", val);
+				break;
+			case TCP_METRIC_SSTHRESH:
+			case TCP_METRIC_CWND:
+			case TCP_METRIC_REORDERING:
+			default:
+				fprintf(fp, "%u", val);
+				break;
+			}
+		}
+	}
+
+	a = attrs[TCP_METRICS_ATTR_FOPEN_MSS];
+	if (a)
+		fprintf(fp, " fo_mss %u", rta_getattr_u16(a));
+
+	a = attrs[TCP_METRICS_ATTR_FOPEN_SYN_DROPS];
+	if (a) {
+		__u16 syn_loss = rta_getattr_u16(a);
+		__u64 ts;
+
+		a = attrs[TCP_METRICS_ATTR_FOPEN_SYN_DROP_TS];
+		ts = a ? rta_getattr_u64(a) : 0;
+
+		fprintf(fp, " fo_syn_drops %u/%llu.%03llusec ago",
+			syn_loss, ts / 1000, ts % 1000);
+	}
+
+	a = attrs[TCP_METRICS_ATTR_FOPEN_COOKIE];
+	if (a) {
+		char cookie[32 + 1];
+		unsigned char *ptr = RTA_DATA(a);
+		int i, max = RTA_PAYLOAD(a);
+
+		if (max > 16)
+			max = 16;
+		cookie[0] = 0;
+		for (i = 0; i < max; i++)
+			sprintf(cookie + i + i, "%02x", ptr[i]);
+		fprintf(fp, " fo_cookie %s", cookie);
+	}
+
+	fprintf(fp, "\n");
+
+	fflush(fp);
+	return 0;
+}
+
+static int tcpm_do_cmd(int cmd, int argc, char **argv)
+{
+	TCPM_REQUEST(req, 1024, TCP_METRICS_CMD_GET, NLM_F_REQUEST);
+	int atype = -1;
+	int ack;
+
+	memset(&f, 0, sizeof(f));
+	f.addr.bitlen = -1;
+	f.addr.family = preferred_family;
+
+	switch (preferred_family) {
+	case AF_UNSPEC:
+	case AF_INET:
+	case AF_INET6:
+		break;
+	default:
+		fprintf(stderr, "Unsupported family:%d\n", preferred_family);
+		return -1;
+	}
+
+	for (; argc > 0; argc--, argv++) {
+		char *who = "address";
+
+		if (strcmp(*argv, "addr") == 0 ||
+		    strcmp(*argv, "address") == 0) {
+			who = *argv;
+			NEXT_ARG();
+		}
+		if (matches(*argv, "help") == 0)
+			usage();
+		if (f.addr.bitlen >= 0)
+			duparg2(who, *argv);
+
+		get_prefix(&f.addr, *argv, preferred_family);
+		if (f.addr.bytelen && f.addr.bytelen * 8 == f.addr.bitlen) {
+			if (f.addr.family == AF_INET)
+				atype = TCP_METRICS_ATTR_ADDR_IPV4;
+			else if (f.addr.family == AF_INET6)
+				atype = TCP_METRICS_ATTR_ADDR_IPV6;
+		}
+		if ((CMD_DEL & cmd) && atype < 0) {
+			fprintf(stderr, "Error: a specific IP address is expected rather than \"%s\"\n",
+				*argv);
+			return -1;
+		}
+
+		argc--; argv++;
+	}
+
+	if (cmd == CMD_DEL && atype < 0)
+		missarg("address");
+
+	/* flush for exact address ? Single del */
+	if (cmd == CMD_FLUSH && atype >= 0)
+		cmd = CMD_DEL;
+
+	/* flush for all addresses ? Single del without address */
+	if (cmd == CMD_FLUSH && f.addr.bitlen <= 0 &&
+	    preferred_family == AF_UNSPEC) {
+		cmd = CMD_DEL;
+		req.g.cmd = TCP_METRICS_CMD_DEL;
+		ack = 1;
+	} else if (cmd == CMD_DEL) {
+		req.g.cmd = TCP_METRICS_CMD_DEL;
+		ack = 1;
+	} else {	/* CMD_FLUSH, CMD_LIST */
+		ack = 0;
+	}
+
+	if (genl_family < 0) {
+		if (rtnl_open_byproto(&grth, 0, NETLINK_GENERIC) < 0) {
+			fprintf(stderr, "Cannot open generic netlink socket\n");
+			exit(1);
+		}
+		genl_family = genl_resolve_family(&grth,
+						  TCP_METRICS_GENL_NAME);
+		if (genl_family < 0)
+			exit(1);
+	}
+
+	if (!(cmd & CMD_FLUSH) && (atype >= 0 || (cmd & CMD_DEL))) {
+		if (ack)
+			req.n.nlmsg_flags |= NLM_F_ACK;
+		if (atype >= 0)
+			addattr_l(&req.n, sizeof(req), atype, &f.addr.data,
+				  f.addr.bytelen);
+	} else {
+		req.n.nlmsg_flags |= NLM_F_DUMP;
+	}
+
+	f.cmd = cmd;
+	if (cmd & CMD_FLUSH) {
+		int round = 0;
+		char flushb[4096-512];
+
+		f.flushb = flushb;
+		f.flushp = 0;
+		f.flushe = sizeof(flushb);
+
+		for (;;) {
+			req.n.nlmsg_seq = grth.dump = ++grth.seq;
+			if (rtnl_send(&grth, &req, req.n.nlmsg_len) < 0) {
+				perror("Failed to send flush request");
+				exit(1);
+			}
+			f.flushed = 0;
+			if (rtnl_dump_filter(&grth, process_msg, stdout) < 0) {
+				fprintf(stderr, "Flush terminated\n");
+				exit(1);
+			}
+			if (f.flushed == 0) {
+				if (round == 0) {
+					fprintf(stderr, "Nothing to flush.\n");
+				} else if (show_stats)
+					printf("*** Flush is complete after %d round%s ***\n",
+					       round, round > 1 ? "s" : "");
+				fflush(stdout);
+				return 0;
+			}
+			round++;
+			if (flush_update() < 0)
+				exit(1);
+			if (show_stats) {
+				printf("\n*** Round %d, deleting %d entries ***\n",
+				       round, f.flushed);
+				fflush(stdout);
+			}
+		}
+		return 0;
+	}
+
+	if (ack) {
+		if (rtnl_talk(&grth, &req.n, 0, 0, NULL) < 0)
+			return -2;
+	} else if (atype >= 0) {
+		if (rtnl_talk(&grth, &req.n, 0, 0, &req.n) < 0)
+			return -2;
+		if (process_msg(NULL, &req.n, stdout) < 0) {
+			fprintf(stderr, "Dump terminated\n");
+			exit(1);
+		}
+	} else {
+		req.n.nlmsg_seq = grth.dump = ++grth.seq;
+		if (rtnl_send(&grth, &req, req.n.nlmsg_len) < 0) {
+			perror("Failed to send dump request");
+			exit(1);
+		}
+
+		if (rtnl_dump_filter(&grth, process_msg, stdout) < 0) {
+			fprintf(stderr, "Dump terminated\n");
+			exit(1);
+		}
+	}
+	return 0;
+}
+
+int do_tcp_metrics(int argc, char **argv)
+{
+	int i;
+
+	if (argc < 1)
+		return tcpm_do_cmd(CMD_LIST, 0, NULL);
+	for (i = 0; i < ARRAY_SIZE(cmds); i++) {
+		if (matches(argv[0], cmds[i].name) == 0)
+			return tcpm_do_cmd(cmds[i].code, argc-1, argv+1);
+	}
+	if (matches(argv[0], "help") == 0)
+		usage();
+
+	fprintf(stderr, "Command \"%s\" is unknown, "
+			"try \"ip tcp_metrics help\".\n", *argv);
+	exit(-1);
+}
+
diff --git a/man/man8/Makefile b/man/man8/Makefile
index 4ed3eab..aaf1729 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -8,7 +8,8 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 ss.8 \
 	bridge.8 rtstat.8 ctstat.8 nstat.8 routef.8 \
 	ip-address.8 ip-addrlabel.8 ip-l2tp.8 ip-link.8 \
 	ip-maddress.8 ip-monitor.8 ip-mroute.8 ip-neighbour.8 \
-	ip-netns.8 ip-ntable.8 ip-route.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8
+	ip-netns.8 ip-ntable.8 ip-route.8 ip-rule.8 ip-tunnel.8 ip-xfrm.8 \
+	ip-tcp_metrics.8
 
 all: $(TARGETS)
 
diff --git a/man/man8/ip-tcp_metrics.8 b/man/man8/ip-tcp_metrics.8
new file mode 100644
index 0000000..1aa4d45
--- /dev/null
+++ b/man/man8/ip-tcp_metrics.8
@@ -0,0 +1,143 @@
+.TH "IP\-TCP_METRICS" 8 "23 Aug 2012" "iproute2" "Linux"
+.SH "NAME"
+ip-tcp_metrics \- management for TCP Metrics
+.SH "SYNOPSIS"
+.sp
+.ad l
+.in +8
+.ti -8
+.B ip
+.RI "[ " OPTIONS " ]"
+.B tcp_metrics
+.RI "{ " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.BR "ip tcp_metrics" " { " show " | " flush " }
+.IR SELECTOR
+
+.ti -8
+.BR "ip tcp_metrics delete " [ " address " ]
+.IR ADDRESS
+
+.ti -8
+.IR SELECTOR " := "
+.RB "[ [ " address " ] "
+.IR PREFIX " ]"
+
+.SH "DESCRIPTION"
+.B ip tcp_metrics
+is used to manipulate entries in the kernel that keep TCP information
+for IPv4 and IPv6 destinations. The entries are created when
+TCP sockets want to share information for destinations and are
+stored in a cache keyed by the destination address. The saved
+information may include values for metrics (initially obtained from
+routes), recent TSVAL for TIME-WAIT recycling purposes, state for the
+Fast Open feature, etc.
+For performance reasons the cache cannot grow above the configured limit
+and the older entries are replaced with fresh information, sometimes
+reclaimed and used for new destinations. The kernel never removes
+entries; they can be flushed only with this tool.
+
+.SS ip tcp_metrics show - show cached entries
+
+.TP
+.BI address " PREFIX " (default)
+IPv4/IPv6 prefix or address. If no prefix is provided all entries are shown.
+
+.LP
+The output may contain the following information:
+
+.BI age " <S.MMM>" sec
+- time since the entry was created, reset or updated with metrics
+from sockets. The entry is reset and refreshed on use with metrics from
+the route if the metrics were not updated in the last hour. Not all cached
+values reset the age on update.
+
+.BI cwnd " <N>"
+- CWND metric value
+
+.BI fo_cookie " <HEX-STRING>"
+- Cookie value received in SYN-ACK to be used by Fast Open for next SYNs
+
+.BI fo_mss " <N>"
+- MSS value received in SYN-ACK to be used by Fast Open for next SYNs
+
+.BI fo_syn_drops " <N>/<S.MMM>" "sec ago"
+- Number of drops of initial outgoing Fast Open SYNs with data
+detected by monitoring the received SYN-ACK after SYN retransmission.
+The seconds show the time since the last SYN drop and, together with
+the drop count, can be used to disable Fast Open for some time.
+
+.BI reordering " <N>"
+- Reordering metric value
+
+.BI rtt " <N>" ms
+- RTT metric value
+
+.BI rttvar " <N>" ms
+- RTTVAR metric value
+
+.BI ssthresh " <SSTHRESH>"
+- SSTHRESH metric value
+
+.BI tw_ts " <TSVAL>/<SEC>" "sec ago"
+- recent TSVAL and the seconds after saving it into TIME-WAIT socket
+
+.SS ip tcp_metrics delete - delete single entry
+
+.TP
+.BI address " ADDRESS " (default)
+IPv4/IPv6 address. The address is a required argument.
+
+.SS ip tcp_metrics flush - flush entries
+This command flushes the entries selected by some criteria.
+
+.PP
+This command has the same arguments as
+.B show.
+
+.SH "EXAMPLES"
+.PP
+ip tcp_metrics show address 192.168.0.0/24
+.RS 4
+Shows the entries for destinations from the subnet.
+.RE
+.PP
+ip tcp_metrics show 192.168.0.0/24
+.RS 4
+The same; the address keyword is optional.
+.RE
+.PP
+ip tcp_metrics
+.RS 4
+Showing all entries is the default action.
+.RE
+.PP
+ip tcp_metrics delete 192.168.0.1
+.RS 4
+Removes the entry for 192.168.0.1 from cache.
+.RE
+.PP
+ip tcp_metrics flush 192.168.0.0/24
+.RS 4
+Removes entries for destinations from the subnet.
+.RE
+.PP
+ip tcp_metrics flush all
+.RS 4
+Removes all entries from cache
+.RE
+.PP
+ip -6 tcp_metrics flush all
+.RS 4
+Removes all IPv6 entries from cache keeping the IPv4 entries.
+.RE
+
+.SH SEE ALSO
+.br
+.BR ip (8)
+
+.SH AUTHOR
+Original Manpage by Julian Anastasov <ja@ssi.bg>
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 4db8a67..9063049 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -15,7 +15,7 @@ ip \- show / manipulate routing, devices, policy routing and tunnels
 .IR OBJECT " := { "
 .BR link " | " addr " | " addrlabel " | " route " | " rule " | " neigh " | "\
  ntable " | " tunnel " | " tuntap " | " maddr " | "  mroute " | " mrule " | "\
- monitor " | " xfrm " | " netns " | "  l2tp " }"
+ monitor " | " xfrm " | " netns " | "  l2tp " | "  tcp_metrics " }"
 .sp
 
 .ti -8
@@ -161,6 +161,10 @@ host addresses.
 - rule in routing policy database.
 
 .TP
+.B tcp_metrics/tcpmetrics
+- manage TCP Metrics
+
+.TP
 .B tunnel
 - tunnel over IP.
 
@@ -220,6 +224,7 @@ was written by Alexey N. Kuznetsov and added in Linux 2.2.
 .BR ip-ntable (8),
 .BR ip-route (8),
 .BR ip-rule (8),
+.BR ip-tcp_metrics (8),
 .BR ip-tunnel (8),
 .BR ip-xfrm (8)
 .br
-- 
1.7.3.4
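
For readers following do_tcp_metrics() above: the loop accepts abbreviated command names because matches() treats the argument as a prefix of each table entry and the first hit wins. Below is a minimal userspace sketch of that idea. The names prefix_matches(), lookup_cmd(), cmd_table and the CMD_* values are hypothetical and chosen for illustration; they are not taken from the patch, whose cmds[] table is defined elsewhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command codes; the patch defines its own CMD_* values. */
enum { CMD_UNKNOWN = -1, CMD_LIST, CMD_DEL, CMD_FLUSH };

struct cmd_entry {
	const char *name;
	int code;
};

/* Assumed table contents, for illustration only. */
const struct cmd_entry cmd_table[] = {
	{ "show",   CMD_LIST  },
	{ "list",   CMD_LIST  },
	{ "delete", CMD_DEL   },
	{ "flush",  CMD_FLUSH },
};

/* Return 0 when arg is a prefix of (or equal to) name, non-zero otherwise. */
int prefix_matches(const char *arg, const char *name)
{
	size_t len = strlen(arg);

	if (len > strlen(name))
		return -1;
	return memcmp(name, arg, len);
}

/* Walk the table in order; the first entry that arg abbreviates wins. */
int lookup_cmd(const char *arg)
{
	size_t i;

	for (i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
		if (prefix_matches(arg, cmd_table[i].name) == 0)
			return cmd_table[i].code;
	}
	return CMD_UNKNOWN;
}
```

With this sketch, lookup_cmd("sh") and lookup_cmd("l") both resolve to CMD_LIST, while lookup_cmd("flushx") is rejected because the argument is longer than any table entry it could abbreviate. Table order matters whenever two names share a prefix.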

^ permalink raw reply related

* Re: Possible networking regression in 3.6.0
From: Chris Clayton @ 2012-10-01 19:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Captain Obvious, David Miller, netdev, gpiez
In-Reply-To: <1349119299.12401.719.camel@edumazet-glaptop>



On 10/01/12 20:21, Eric Dumazet wrote:
> On Mon, 2012-10-01 at 20:34 +0200, Captain Obvious wrote:
>> Eric Dumazet <eric.dumazet@gmail.com> :
>> [...]
>>>>> Could you send
>>>>>
>>>>> iptables -t -nat -nvL
>>>>
>>>> $ iptables -t -nat -nvL
>>                    ^ typo
>>
>> Please try "iptables -t nat -nvL" as was also suggested.
>>
>
> Oh well, good catch ;)
>
> And for conntrack -L, please Chris add CONFIG_NF_CT_NETLINK=m to your
> kernel .config
>

$ conntrack -L
unknown  2 566 src=192.168.0.1 dst=224.0.0.1 [UNREPLIED] src=224.0.0.1 
dst=192.168.0.1 use=1
icmp     1 25 src=192.168.200.1 dst=192.168.0.1 type=8 code=0 id=512 
src=192.168.0.1 dst=192.168.0.40 type=0 code=0 id=512 use=1
conntrack v1.2.2 (conntrack-tools): 2 flow entries have been shown.

>
>

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: David Miller @ 2012-10-01 20:01 UTC (permalink / raw)
  To: davej; +Cc: eric.dumazet, chris2553, netdev, gpiez
In-Reply-To: <20121001193434.GA14236@redhat.com>

From: Dave Jones <davej@redhat.com>
Date: Mon, 1 Oct 2012 15:34:34 -0400

> On Mon, Oct 01, 2012 at 11:15:50AM +0200, Eric Dumazet wrote:
>  > > 
>  > > $ netstat -s
>  > > Icmp:
>  > >      4 ICMP messages received
>  > >      4 input ICMP message failed.
>  > >      ICMP input histogram:
>  > >          echo replies: 4
>  > 
>  > So icmp replies come back and are delivered to host instead of being
>  > forwarded.
>  > 
>  > I wonder if MASQUERADE broke...
> 
> I hit something that sounds just like this a few months back..
> http://lists.openwall.net/netdev/2012/07/25/53
> 
> It "went away" a few builds later, but I've seen it happen
> again from time to time.

Yep, I remember that report.

If you can find a way to more reliably trigger the case, that would
help us immensely.

^ permalink raw reply

* Re: pull request: wireless-next 2012-10-01
From: David Miller @ 2012-10-01 20:00 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20121001192448.GA1944@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Mon, 1 Oct 2012 15:24:49 -0400

> commit e4d680c706284ca0413a84bd2a28fda76b360904
> 
> In my previous pull request, I mentioned that I had a few stragglers.
> They are all driver updates: ti, brcmfmac, ath9k, one for bcma and
> one for b43legacy.  I merged them late last week and was letting them
> soak in linux-next.  I hope they can still make 3.7.
> 
> Please let me know if there are problems!

Pulled, thanks John.

^ permalink raw reply

* Re: Possible networking regression in 3.6.0
From: Eric Dumazet @ 2012-10-01 20:04 UTC (permalink / raw)
  To: David Miller; +Cc: davej, chris2553, netdev, gpiez
In-Reply-To: <20121001.160115.1816241312626722150.davem@davemloft.net>

On Mon, 2012-10-01 at 16:01 -0400, David Miller wrote:
> From: Dave Jones <davej@redhat.com>
> Date: Mon, 1 Oct 2012 15:34:34 -0400
> 
> > On Mon, Oct 01, 2012 at 11:15:50AM +0200, Eric Dumazet wrote:
> >  > > 
> >  > > $ netstat -s
> >  > > Icmp:
> >  > >      4 ICMP messages received
> >  > >      4 input ICMP message failed.
> >  > >      ICMP input histogram:
> >  > >          echo replies: 4
> >  > 
> >  > So icmp replies come back and are delivered to host instead of being
> >  > forwarded.
> >  > 
> >  > I wonder if MASQUERADE broke...
> > 
> > I hit something that sounds just like this a few months back..
> > http://lists.openwall.net/netdev/2012/07/25/53
> > 
> > It "went away" a few builds later, but I've seen it happen
> > again from time to time.
> 
> Yep, I remember that report.
> 
> If you can find a way to more reliably trigger the case, that would
> help us immensely.

I am building a KMEMCHECK kernel, as a last try before my night ;)

^ permalink raw reply

* Re: [PATCH net-next v2 0/2] bnx2x: net-next FW upgrade
From: David Miller @ 2012-10-01 20:46 UTC (permalink / raw)
  To: dmitry; +Cc: netdev
In-Reply-To: <1349099180-5606-1-git-send-email-dmitry@broadcom.com>

From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Mon, 1 Oct 2012 15:46:18 +0200

> This is a respin of the previous series. It contains 2 patches:
> the first allows the bnx2x and cnic drivers to
> utilize the recently submitted bnx2x FW 7.8.2, while the second advances
> the bnx2x version to 1.78.00-0.
> 
> Changes since V1:
> 	Integrated changes to CNIC driver for FW 7.8.2.

Applied.

^ permalink raw reply

* Re: [PATCH net] use skb_end_offset() in skb_try_coalesce()
From: David Miller @ 2012-10-01 20:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: wpan, netdev
In-Reply-To: <1349112517.12401.715.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 01 Oct 2012 19:28:37 +0200

> On Sat, 2012-09-29 at 14:15 +0800, Weiping Pan wrote:
>> Commit ec47ea824774(skb: Add inline helper for getting the skb end offset from
>> head) introduces this helper function, skb_end_offset(),
>> we should make use of it.
>> 
>> Signed-off-by: Weiping Pan <wpan@redhat.com>
>> ---
>>  net/core/skbuff.c |    3 +--
>>  1 files changed, 1 insertions(+), 2 deletions(-)
>> 
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index e33ebae..86f040a 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3488,8 +3488,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>>  		    skb_shinfo(from)->nr_frags > MAX_SKB_FRAGS)
>>  			return false;
>>  
>> -		delta = from->truesize -
>> -			SKB_TRUESIZE(skb_end_pointer(from) - from->head);
>> +		delta = from->truesize - SKB_TRUESIZE(skb_end_offset(from));
>>  	}
>>  
>>  	WARN_ON_ONCE(delta < len);
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied.
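
A small userspace sketch of why the one-line replacement in the patch above is equivalent to the two lines it removes. It models only the pointer-based buffer layout, and struct mock_skb and the mock_* helpers are hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Simplified stand-in for struct sk_buff: only the fields the helpers use. */
struct mock_skb {
	unsigned char *head; /* start of the linear data buffer */
	unsigned char *end;  /* one past the end of the linear data buffer */
};

/* Pointer form: where the linear buffer ends. */
unsigned char *mock_skb_end_pointer(const struct mock_skb *skb)
{
	return skb->end;
}

/* Offset form: distance from head to end, which is the value the
 * removed lines computed by hand as skb_end_pointer(from) - from->head. */
unsigned int mock_skb_end_offset(const struct mock_skb *skb)
{
	return (unsigned int)(skb->end - skb->head);
}
```

In this model, mock_skb_end_offset(skb) always equals mock_skb_end_pointer(skb) - skb->head, so passing either expression to a size macro gives the same result.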

^ permalink raw reply

* Re: [PATCH net-next v2] be2net: fix vfs enumeration
From: David Miller @ 2012-10-01 20:46 UTC (permalink / raw)
  To: ivecera; +Cc: netdev, sathya.perla, subbu.seetharaman, ajit.khaparde, sfr
In-Reply-To: <1349092615-31894-1-git-send-email-ivecera@redhat.com>

From: Ivan Vecera <ivecera@redhat.com>
Date: Mon,  1 Oct 2012 13:56:55 +0200

> Current VFs enumeration algorithm used in be_find_vfs does not take domain
> number into the match. The match found in igb/ixgbe is more elegant and
> safe.
> 
> This 2nd version uses pci_physfn instead of checking dev->physfn directly.
> 
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2] ipv6: del unreachable route when an addr is deleted on lo
From: David Miller @ 2012-10-01 20:49 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: eric.dumazet, netdev, yoshfuji
In-Reply-To: <1348653895-4027-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 26 Sep 2012 12:04:55 +0200

> When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), two routes
> are added:
>  - one in the local table:
>     local 2002::1 via :: dev lo  proto none  metric 0
>  - one in the main table (for the prefix):
>     unreachable 2002::1 dev lo  proto kernel  metric 256  error -101
> 
> When the address is deleted, the route inserted in the main table remains
> because we use rt6_lookup(), which returns NULL when dst->error is set, which
> is the case here! Thus, it is better to use ip6_route_lookup() to avoid this
> kind of filter.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> v2: rt cannot be NULL, so adjust the test after ip6_route_lookup()

Applied, thanks.
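
A toy model of the behaviour described in the commit message above: a lookup that hides routes whose error field is set can never find the unreachable route, so a delete built on it silently does nothing. Everything here (struct mock_route and the mock_* functions) is hypothetical and only mirrors the description; it is not the kernel's rt6_lookup() or ip6_route_lookup().

```c
#include <stddef.h>

/* Toy route entry; not the kernel's route structure. */
struct mock_route {
	int installed; /* non-zero while the route is in the table */
	int error;     /* non-zero for e.g. an "unreachable" route */
};

/* Models a lookup that hides routes whose error field is set. */
struct mock_route *mock_lookup_filtered(struct mock_route *rt)
{
	if (!rt->installed || rt->error)
		return NULL;
	return rt;
}

/* Models a lookup that returns the route regardless of its error field. */
struct mock_route *mock_lookup_unfiltered(struct mock_route *rt)
{
	return rt->installed ? rt : NULL;
}

/* Remove the route if the chosen lookup finds it; return 1 on removal. */
int mock_delete_route(struct mock_route *rt,
		      struct mock_route *(*lookup)(struct mock_route *))
{
	struct mock_route *found = lookup(rt);

	if (!found)
		return 0;
	found->installed = 0;
	return 1;
}
```

For a route with error set to -101 (the value shown in the report), mock_delete_route() leaves the entry installed when given the filtered lookup and removes it when given the unfiltered one.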

^ permalink raw reply

* Re: [RFC PATCH] ipv6: don't add link local route when there is no link local address
From: David Miller @ 2012-10-01 20:55 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, yoshfuji
In-Reply-To: <1348664962-4018-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 26 Sep 2012 15:09:22 +0200

> When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), a route
> to fe80::/64 is added in the main table:
>   unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
> 
> This route does not match any prefix (no fe80:: address on lo). In fact,
> addrconf_dev_config() will not add link local address because this function
> filters interfaces by type. If the link local address is added manually, the
> route to the link local prefix will be automatically added by
> addrconf_add_linklocal().
> Note also, that this route is not deleted when the address is removed.
> 
> After looking at the code, it seems that addrconf_add_lroute() is redundant with
> addrconf_add_linklocal(), because this function will add the link local route
> when the link local address is configured.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

This change looks correct, however:

> @@ -2489,7 +2479,6 @@ static void addrconf_sit_config(struct net_device *dev)
>  
>  	if (dev->flags&IFF_POINTOPOINT) {
>  		addrconf_add_mroute(dev);
> -		addrconf_add_lroute(dev);
>  	} else
>  		sit_route_add(dev);

now that the if() branch is a single statement, please remove the
curly braces.

Thanks.
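
For illustration, the requested shape once the braces are dropped might look like the sketch below. The mock_* names and MOCK_IFF_POINTOPOINT are hypothetical stand-ins so the fragment compiles outside the kernel tree; the stubs only report which branch ran.

```c
/* Hypothetical stand-ins so the fragment compiles outside the kernel tree. */
#define MOCK_IFF_POINTOPOINT 0x10

struct mock_net_device {
	unsigned int flags;
};

enum mock_action { MOCK_ADD_MROUTE, MOCK_SIT_ROUTE_ADD };

/* Each stub reports which path was taken instead of touching routes. */
enum mock_action mock_addrconf_add_mroute(struct mock_net_device *dev)
{
	(void)dev;
	return MOCK_ADD_MROUTE;
}

enum mock_action mock_sit_route_add(struct mock_net_device *dev)
{
	(void)dev;
	return MOCK_SIT_ROUTE_ADD;
}

enum mock_action mock_sit_config_tail(struct mock_net_device *dev)
{
	/* Both arms are single statements, so neither takes curly braces. */
	if (dev->flags & MOCK_IFF_POINTOPOINT)
		return mock_addrconf_add_mroute(dev);
	else
		return mock_sit_route_add(dev);
}
```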

^ permalink raw reply
