Netdev List
* Re: [PATCH net-next V4 00/13] Add basic VLAN support to bridges
From: Vitalii Demianets @ 2012-12-20 10:08 UTC (permalink / raw)
  To: Andrew Collins
  Cc: Vlad Yasevich, netdev, shemminger, davem, or.gerlitz, jhs, mst,
	erdnetdev, jiri
In-Reply-To: <CAKTPYJTAB-oOW5UE9EbNxwA+XbhmJu1FLrvq_mU8B1Qi6trxeA@mail.gmail.com>

On Thursday 20 December 2012 00:54:27 Andrew Collins wrote:
> On Wed, Dec 19, 2012 at 10:48 AM, Vlad Yasevich <vyasevic@redhat.com> wrote:
> > This series of patches provides an ability to add VLANs to the bridge
> > ports.  This is similar to what can be found in most switches.  The
> > bridge port may have any number of VLANs added to it including vlan 0
> > priority tagged traffic.  When vlans are added to the port, only traffic
> > tagged with a particular vlan will be forwarded over this port.  Additionally,
> > vlan ids are added to FDB entries and become part of the lookup.  This
> > way we correctly identify the FDB entry.
>
> This is likely well beyond the scope of this change, but I figured I'd
> throw out the question anyway.  This changeset looks to bring the
> Linux bridging code closer to the 802.1Q-2005 definition of a bridge,
> which is nice to see, I'm curious if this changeset also opens up the
> possibility of supporting MSTP in the future?  The big thing I see
> missing is per-VLAN port state, although I'm not very familiar with
> the current STP/bridge interactions.  Has anyone put any thought into
> what other necessary bridge pieces might be missing for MSTP support?

I think that to be compatible with 802.1Q-2005 we need the following pieces:
1) Support for multiple FIDs (the 802.1Q term for FDB). This means that the
kernel should support several independent FDBs on a single bridge. The
802.1Q-2005 standard requires the number of supported FDBs to be no less than
the number of different MSTIs the implementation supports;
2) VLAN-to-FDB mapping should be introduced;
3) Support of Multiple Spanning Tree Instances (MSTIs);
4) FDB-to-MSTI mapping should be introduced;
5) And finally, per-MST port states should be implemented.
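
A rough sketch of how items 2, 4 and 5 could chain together, written as plain userspace C and not as kernel code. Every name and table size here (mst_bridge, MAX_FID, MAX_MSTI, MAX_PORTS and the helpers) is hypothetical and only serves to illustrate the VLAN -> FDB -> MSTI -> port state lookup:

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_N_VID 4096 /* 12-bit VLAN ID space */
#define MAX_FID    64   /* hypothetical number of independent FDBs */
#define MAX_MSTI   16   /* hypothetical number of spanning tree instances */
#define MAX_PORTS  32   /* hypothetical port limit */

enum port_state { PS_DISCARDING = 0, PS_LEARNING, PS_FORWARDING };

struct mst_bridge {
	uint8_t vid_to_fid[VLAN_N_VID];          /* item 2: VLAN-to-FDB map */
	uint8_t fid_to_msti[MAX_FID];            /* item 4: FDB-to-MSTI map */
	uint8_t port_state[MAX_PORTS][MAX_MSTI]; /* item 5: per-MST states  */
};

/* Each setter returns 0 on success and -1 if an index is out of range. */
static inline int mst_map_vid(struct mst_bridge *br, unsigned vid, unsigned fid)
{
	if (vid >= VLAN_N_VID || fid >= MAX_FID)
		return -1;
	br->vid_to_fid[vid] = (uint8_t)fid;
	return 0;
}

static inline int mst_map_fid(struct mst_bridge *br, unsigned fid, unsigned msti)
{
	if (fid >= MAX_FID || msti >= MAX_MSTI)
		return -1;
	br->fid_to_msti[fid] = (uint8_t)msti;
	return 0;
}

static inline int mst_set_port_state(struct mst_bridge *br, unsigned port,
				     unsigned msti, enum port_state st)
{
	if (port >= MAX_PORTS || msti >= MAX_MSTI)
		return -1;
	br->port_state[port][msti] = (uint8_t)st;
	return 0;
}

/* Resolve the state that applies to a frame with this VID on this port. */
static inline enum port_state mst_port_state(const struct mst_bridge *br,
					     unsigned port, unsigned vid)
{
	assert(port < MAX_PORTS && vid < VLAN_N_VID);
	unsigned fid = br->vid_to_fid[vid];
	unsigned msti = br->fid_to_msti[fid];
	return (enum port_state)br->port_state[port][msti];
}
```

With a zero-initialised bridge, every VLAN falls into FID 0 and instance 0 in this sketch, so the same port can forward for one VLAN and discard for another once the maps are filled in.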

> obviously something to handle the MSTP protocol itself would need to exist
> as well

Please look here: http://sourceforge.net/projects/mstpd/


* Re: [PATCH] bridge: call br_netpoll_disable in br_add_if
From: Cong Wang @ 2012-12-20 10:33 UTC (permalink / raw)
  To: Gao feng; +Cc: netdev, shemminger, davem
In-Reply-To: <1355996503-19318-1-git-send-email-gaofeng@cn.fujitsu.com>

On Thu, 2012-12-20 at 17:41 +0800, Gao feng wrote:
> When netdev_set_master fails in br_add_if, we should
> call br_netpoll_disable to do some cleanup, such
> as freeing the memory of the struct netpoll which was
> allocated in br_netpoll_enable.
> 
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>

Looks good!

Acked-by: Cong Wang <amwang@redhat.com>
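
For readers following along, the cleanup pattern the patch description refers to can be sketched in userspace C. This is not the bridge code: fake_port, fake_netpoll_enable(), fake_netpoll_disable(), fake_set_master() and fake_add_if() are hypothetical stand-ins, and the only point is that a failure after the allocation has to unwind through the matching disable step:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_port {
	void *np; /* stands in for the struct netpoll allocated on enable */
};

static inline int fake_netpoll_enable(struct fake_port *p)
{
	p->np = malloc(64);
	return p->np ? 0 : -ENOMEM;
}

static inline void fake_netpoll_disable(struct fake_port *p)
{
	free(p->np);
	p->np = NULL;
}

/* Stand-in for the step that may fail after netpoll was enabled. */
static inline int fake_set_master(int should_fail)
{
	return should_fail ? -EBUSY : 0;
}

static inline int fake_add_if(struct fake_port *p, int master_fails)
{
	int err;

	assert(p != NULL);

	err = fake_netpoll_enable(p);
	if (err)
		goto out;

	err = fake_set_master(master_fails);
	if (err)
		goto err_netpoll; /* without this label the allocation leaks */

	return 0;

err_netpoll:
	fake_netpoll_disable(p);
out:
	return err;
}
```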


* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Yurij M. Plotnikov @ 2012-12-20 11:22 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky
In-Reply-To: <20121220073445.GM18940@secunet.com>

On 12/20/12 11:34, Steffen Klassert wrote:
> On Wed, Dec 19, 2012 at 07:37:44PM +0000, Ben Hutchings wrote:
>    
>> On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote:
>>      
>>> On 12/19/12 17:35, Ben Hutchings wrote:
>>>        
>>>> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
>>>>
>>>>          
>>>>> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
>>>>> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
>>>>> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
>>>>> same and packet is always sent with "Don't Fragment" bit in case of
>>>>> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
>>>>>
>>>>>            
>>>> You could try reverting:
>>>>
>>>> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
>>>> Author: Steffen Klassert<steffen.klassert@secunet.com>
>>>> Date:   Mon Oct 8 00:56:54 2012 +0000
>>>>
>>>>       ipv4: Don't report stale pmtu values to userspace
>>>>
>>>>       We report cached pmtu values even if they are already expired.
>>>>       Change this to not report these values after they are expired
>>>>       and fix a race in the expire time calculation, as suggested by
>>>>       Eric Dumazet.
>>>>
>>>> Still, PMTU information is not supposed to expire for 10 minutes...
>>>>
>>>>
>>>>          
>>> With reverted commit there is no such problem on 3.7.1: IP_MTU is
>>> updated and DF is set only for the first packet in case of
>>> IP_PMTUDISC_WANT.
>>>        
>> [...]
>>
>> So it looks like something is going wrong with the expiry calculation
>> here.
>>
>> This change shouldn't affect the PMTU actually used by the kernel, but
>> could affect Onload since that relies on netlink route updates to keep
>> in synch.  You didn't say you were using Onload, but if you are then we
>> should not bother netdev with this until we can demonstrate a problem
>> that involves only the kernel stack.
>>
>>      
> I'm really surprised that this change can have such an effect,
> it changes nothing in the kernel's pmtu handling. When looking
> at the code, I found that we may report a mtu value from a stale
> dst_entry when we query the mtu value with the IP_MTU socket
> option. But a subsequent send() should update the socket cached
> dst_entry, so at most one packet should be affected.
>
> Does the patch below change anything?
>
>
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 3c9d208..1049ce0 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
>   	{
>   		struct dst_entry *dst;
>   		val = 0;
> -		dst = sk_dst_get(sk);
> +		dst = sk_dst_check(sk, 0);
>   		if (dst) {
>   			val = dst_mtu(dst);
>   			dst_release(dst);
>    
With this patch kernel 3.7.1 works perfectly. All described problems are
fixed.
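
For anyone wanting to reproduce the observation from userspace, a minimal sketch of the socket calls involved follows. It assumes Linux and uses only the IP_MTU_DISCOVER and IP_MTU options discussed above; the helper names (set_pmtu_mode, get_pmtu_mode, get_path_mtu) are made up for this example:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Select the PMTU discovery mode, e.g. IP_PMTUDISC_WANT or IP_PMTUDISC_DO.
 * Returns 0 on success, -1 with errno set on failure. */
static inline int set_pmtu_mode(int fd, int mode)
{
	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}

/* Read back the current mode, or -1 on error. */
static inline int get_pmtu_mode(int fd)
{
	int mode = -1;
	socklen_t len = sizeof(mode);

	if (getsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, &len) < 0)
		return -1;
	return mode;
}

/* Query the path MTU the kernel reports for this socket via IP_MTU.
 * Meant for a connected socket; returns -1 with errno set on error. */
static inline int get_path_mtu(int fd)
{
	int mtu = 0;
	socklen_t len = sizeof(mtu);

	if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
		return -1;
	return mtu;
}
```

A test along the lines of the report would set IP_PMTUDISC_WANT on a connected SOCK_DGRAM socket, send a datagram larger than the path MTU, and then compare get_path_mtu() before and after.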


* Re: [PATCH] net: ipv4: route: fixed a coding style issues net: ipv4: tcp: fixed a coding style issues
From: Nicolas Dichtel @ 2012-12-20 12:07 UTC (permalink / raw)
  To: Stefan Hasko
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <1355990910-3688-1-git-send-email-hasko.stevo@gmail.com>

On 20/12/2012 09:08, Stefan Hasko wrote:
> Fix coding style issues.
>
> Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
> ---
>   net/ipv4/route.c |  125 ++++++++++++++++++-------------
>   net/ipv4/tcp.c   |  218 +++++++++++++++++++++++++++++++-----------------------
>   2 files changed, 200 insertions(+), 143 deletions(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 844a9ef..fff7ce6 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -20,7 +20,7 @@
>    *		Alan Cox	:	Added BSD route gw semantics
>    *		Alan Cox	:	Super /proc >4K
>    *		Alan Cox	:	MTU in route table
> - *		Alan Cox	: 	MSS actually. Also added the window
> + *		Alan Cox	:	MSS actually. Also added the window
>    *					clamper.
>    *		Sam Lantinga	:	Fixed route matching in rt_del()
>    *		Alan Cox	:	Routing cache support.
> @@ -31,30 +31,35 @@
>    *	Miquel van Smoorenburg	:	BSD API fixes.
>    *	Miquel van Smoorenburg	:	Metrics.
>    *		Alan Cox	:	Use __u32 properly
> - *		Alan Cox	:	Aligned routing errors more closely with BSD
> + *		Alan Cox	:	Aligned routing errors more
> + *					closely with BSD
>    *					our system is still very different.
>    *		Alan Cox	:	Faster /proc handling
> - *	Alexey Kuznetsov	:	Massive rework to support tree based routing,
> + *	Alexey Kuznetsov	:	Massive rework to support
> + *					tree based routing,
>    *					routing caches and better behaviour.
>    *
>    *		Olaf Erb	:	irtt wasn't being copied right.
>    *		Bjorn Ekwall	:	Kerneld route support.
>    *		Alan Cox	:	Multicast fixed (I hope)
> - * 		Pavel Krauz	:	Limited broadcast fixed
> + *		Pavel Krauz	:	Limited broadcast fixed
>    *		Mike McLagan	:	Routing by source
>    *	Alexey Kuznetsov	:	End of old history. Split to fib.c and
>    *					route.c and rewritten from scratch.
>    *		Andi Kleen	:	Load-limit warning messages.
> - *	Vitaly E. Lavrov	:	Transparent proxy revived after year coma.
> + *	Vitaly E. Lavrov	:	Transparent proxy revived
> + *					after year coma.
>    *	Vitaly E. Lavrov	:	Race condition in ip_route_input_slow.
> - *	Tobias Ringstrom	:	Uninitialized res.type in ip_route_output_slow.
> + *	Tobias Ringstrom	:	Uninitialized res.type in
> + *					ip_route_output_slow.
>    *	Vladimir V. Ivanov	:	IP rule info (flowid) is really useful.
>    *		Marc Boucher	:	routing by fwmark
>    *	Robert Olsson		:	Added rt_cache statistics
>    *	Arnaldo C. Melo		:	Convert proc stuff to seq_file
> - *	Eric Dumazet		:	hashed spinlocks and rt_check_expire() fixes.
> - * 	Ilia Sotnikov		:	Ignore TOS on PMTUD and Redirect
> - * 	Ilia Sotnikov		:	Removed TOS from hash calculations
> + *	Eric Dumazet		:	hashed spinlocks and
> + *					rt_check_expire() fixes.
> + *	Ilia Sotnikov		:	Ignore TOS on PMTUD and Redirect
> + *	Ilia Sotnikov		:	Removed TOS from hash calculations
>    *
>    *		This program is free software; you can redistribute it and/or
>    *		modify it under the terms of the GNU General Public License
> @@ -65,7 +70,7 @@
>   #define pr_fmt(fmt) "IPv4: " fmt
>
>   #include <linux/module.h>
> -#include <asm/uaccess.h>
> +#include <linux/uaccess.h>
>   #include <linux/bitops.h>
>   #include <linux/types.h>
>   #include <linux/kernel.h>
> @@ -139,7 +144,8 @@ static unsigned int	 ipv4_default_advmss(const struct dst_entry *dst);
>   static unsigned int	 ipv4_mtu(const struct dst_entry *dst);
>   static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
>   static void		 ipv4_link_failure(struct sk_buff *skb);
> -static void		 ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
> +static void		 ip_rt_update_pmtu(struct dst_entry *dst,
> +					   struct sock *sk,
>   					   struct sk_buff *skb, u32 mtu);
>   static void		 ip_do_redirect(struct dst_entry *dst, struct sock *sk,
>   					struct sk_buff *skb);
> @@ -291,12 +297,17 @@ static int rt_cpu_seq_show(struct seq_file *seq, void *v)
>   	struct rt_cache_stat *st = v;
>
>   	if (v == SEQ_START_TOKEN) {
> -		seq_printf(seq, "entries  in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src  out_hit out_slow_tot out_slow_mc  gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
> +		seq_printf(seq, "entries  in_hit in_slow_tot in_slow_mc "
> +				"in_no_route in_brd in_martian_dst "
> +				"in_martian_src  out_hit out_slow_tot "
> +				"out_slow_mc  gc_total gc_ignored "
> +				"gc_goal_miss gc_dst_overflow in_hlist_search "
> +				"out_hlist_search\n");
checkpatch will warn you about this one, with something like:
"WARNING: quoted string split across lines".
Not breaking such lines makes it easier to grep for the pattern.

Nicolas


* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
From: Jamal Hadi Salim @ 2012-12-20 12:35 UTC (permalink / raw)
  To: Yury Stankevich
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D2D229.6040802@gmail.com>


Could be your setup. I didn't do a lot of testing, but
from my notes (running a different kernel at the moment):

#try to point to everything (no iptables setup)
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
        flowid 23:23 action xt -j CONNMARK --restore-mark
#let it run for 1 sec then display with
tc -s filter show dev eth0 parent ffff:

----
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 23:23
   match 00000000/00000000 at 0
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 1 ref 1 bind 1 installed 3 sec used 1 sec
	Action statistics:
	Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
----

cheers,
jamal

On 12-12-20 03:54 AM, Yury Stankevich wrote:
> 19.12.2012 15:56, Jamal Hadi Salim wrote:
>> Hasan/Yury, if you test this please use the latest iproute2 with only
>> the first patch I posted (originally from Hasan). Hasan please use that
>> patch not your version - if theres anything wrong we can find out sooner
>> before the patch becomes final.
>
> Hello,
> 3.7.1 kernel with 3.7.0 iproute,
> patch-xt, xt-p1 + linkage fix were applied
> the command completes successfully, but doesn't actually work.
>
> command:
> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>              action xt -j CONNMARK --restore-mark \
>              action mirred egress redirect dev ifb0
> then i use filter:
>
> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
> 1:102
>
> iptables line:
> iptables -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
> bytes -j CONNMARK --set-mark 0xa
>
> once i run a test to download a 300K file,
> from the iptables counters i can see that the rule in POSTROUTING is triggered,
> but from `tc -s qdisc show dev ifb0` i see that no packets were sent to
> 1:102 flow.
>
> btw,
> tc -p -s filter show dev ifb0 parent 1:
> do not show stats `(rule hit 416 success 0)` for this (filter protocol
> ip pref 2 fw handle 0xa classid 1:102) rule.
>
>
>


* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Steffen Klassert @ 2012-12-20 12:35 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky
In-Reply-To: <50D2F4E5.4050904@oktetlabs.ru>

On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote:
> On 12/20/12 11:34, Steffen Klassert wrote:
> >
> >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> >index 3c9d208..1049ce0 100644
> >--- a/net/ipv4/ip_sockglue.c
> >+++ b/net/ipv4/ip_sockglue.c
> >@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
> >  	{
> >  		struct dst_entry *dst;
> >  		val = 0;
> >-		dst = sk_dst_get(sk);
> >+		dst = sk_dst_check(sk, 0);
> >  		if (dst) {
> >  			val = dst_mtu(dst);
> >  			dst_release(dst);
> With this patch kernel 3.7.1 works perfect. All described problems
> are fixed.

Thanks for testing!

I'm not sure if we can use this as a fix. I think with this patch it
could happen that we return -ENOTCONN instead of a pmtu value on a
connected socket. Perhaps it is better to update the cached dst_entry in
ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some
investigation.

Anyway, it is still odd that reverting my other patch 'fixes'
this issue too.


* RE: TCP delayed ACK heuristic
From: Cong Wang @ 2012-12-20 12:41 UTC (permalink / raw)
  To: David Laight
  Cc: David Miller, rick.jones2, netdev, greearb, eric.dumazet,
	shemminger, tgraf
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70FC@saturn3.aculab.com>

On Thu, 2012-12-20 at 09:57 +0000, David Laight wrote:
> > So, can we at least have a sysctl to control the timeout of the delayed
> > ACK? I mean the minimum 40ms. TCP_QUICKACK can help too, but it requires
> > the receiver to modify the application and has to be set every time when
> > calling recv().
> 
> A sysctl is inappropriate - it affects the entire TCP protocol stack.
> 
> You want different behaviour for different remote hosts (probably
> different subnets).
> In particular your local subnet is unlikely to have packet loss
> and very likely to have a very low RTT.
> 
> AFAICT a lot of the recent 'tuning' has been done for web/ftp
> servers that are very remote from the client. These connections
> are also request-response ones - quite often with large responses.
> 
> IMHO This has been to the detriment of local connections.
> 

A customer prefers faster response in their low-loss environment, 40ms
is not good. Of course, they are supposed to know their environment when
they tune this.

Or maybe a sysctl equivalent to TCP_QUICKACK?
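
To illustrate the application-side cost mentioned in the quoted text, here is a hedged sketch of what a receiver has to do today. It assumes Linux, where tcp(7) documents TCP_QUICKACK as a non-permanent flag; quickack_arm() and recv_quickack() are hypothetical helper names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask the kernel to leave delayed-ACK mode for now. Returns 0 or -1. */
static inline int quickack_arm(int fd)
{
	int one = 1;

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
}

/* recv() wrapper that re-arms TCP_QUICKACK after every successful read,
 * because the kernel may fall back to delayed ACKs on its own. */
static inline ssize_t recv_quickack(int fd, void *buf, size_t len, int flags)
{
	ssize_t n = recv(fd, buf, len, flags);

	if (n > 0)
		(void)quickack_arm(fd); /* best effort, errors are ignored */
	return n;
}
```

Every recv() call site in the application has to be switched to the wrapper, which is the modification cost that a system-wide or per-route knob would avoid.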


* Re: [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 12:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
	annie li, xen-devel@lists.xensource.com
In-Reply-To: <1355933869.21834.13.camel@edumazet-glaptop>


Wednesday, December 19, 2012, 5:17:49 PM, you wrote:

> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:

>> Hi Ian,
>> 
>> It ran overnight and i haven't seen the warn_once trigger.
>> (but i also didn't with the previous patch)
>> 

> As I said, the minimum value to not trigger the warning was what Ian's
> patch was doing, but it was still not an accurate estimation.

> Doing the real accounting might trigger slow transfers, or dropped
> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.

> So the real question was: if accounting for full pages, do your
> applications run as smoothly as before, with no huge performance
> regression?

OK, I have added some extra debug info (see the diffs below). The code still uses the old calculation for truesize (in the hope of triggering the warn_on_once again), but also calculates the variants IanC came up with.

I haven't got a clear test case to trigger the warn_on_once; it happens just every once in a while during my normal usage, and I'm not a netperf expert :-)
So at the moment I haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light:

- The first variant (current code) seems to be the most efficient and a good estimation *most* of the time, but sometimes triggers the warn_on_once in skb_try_coalesce.
- The first variant (current code) seems to always subtract from the truesize for small packets.
- The second variant seems to keep the truesize as is for most of the small network traffic, and it also seems to work OK for larger packets.
- The third variant seems to be a pretty wasteful estimation.

So the last variant seems to be rather wasteful, and the second one the most accurate so far.

Eric:
     From the warn_on_once, delta should not be smaller than len, but probably they should be as close together as possible.
     When you say "accurate estimation", what would be an acceptable difference between DELTA and LEN?



[  116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
[  117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
[  117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
[  117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
[  117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
[  117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
[  117.539403] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
[  117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
[  117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
[  121.981917] net_ratelimit: 27 callbacks suppressed
[  121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
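
The arithmetic behind the variant1/variant2/variant3 columns in the log above can be reproduced with a few lines of standalone C. The constants are inferred from the logged values and are not taken from the driver headers here: a copy threshold of 256 (42 - 256 = -214) and a page size of 4096. The helper names are hypothetical:

```c
#include <assert.h>

#define RX_COPY_THRESHOLD  256  /* inferred from the log: 42 - 256 = -214 */
#define NETFRONT_PAGE_SIZE 4096 /* inferred from variant3 with nr_frags:1 */

struct truesize_est {
	long delta;    /* amount added to skb->truesize    */
	long truesize; /* resulting truesize, as in "(...)" */
};

static inline struct truesize_est est(long before, long delta)
{
	struct truesize_est e = { delta, before + delta };
	return e;
}

/* variant1: current code, data_len minus the copy threshold. */
static inline struct truesize_est truesize_variant1(long before, long data_len)
{
	return est(before, data_len - RX_COPY_THRESHOLD);
}

/* variant2: data_len minus the bytes actually pulled into the linear area. */
static inline struct truesize_est truesize_variant2(long before, long data_len,
						    long pull_to)
{
	assert(pull_to >= 0 && pull_to <= data_len);
	return est(before, data_len - pull_to);
}

/* variant3: one full page per fragment. */
static inline struct truesize_est truesize_variant3(long before, long nr_frags)
{
	return est(before, NETFRONT_PAGE_SIZE * nr_frags);
}
```

For the data_len:4410 line, variant2 reports 4304, which implies a pull_to of 106 for that packet; for the 42-byte packets pull_to equals data_len, so variant2 leaves truesize at 896.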



diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c26e28b..8833e38 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
        struct sk_buff_head tmpq;
        unsigned long flags;
        int err;
+       int tsz,len;

        spin_lock(&np->rx_lock);

@@ -1037,9 +1038,22 @@ err:
                 * receive throughout using the standard receive
                 * buffer size was cut by 25%(!!!).
                 */
-               skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
+
+
+
+
+                tsz = skb->truesize;
+                len = skb->len;
+                /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
+                skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
                skb->len += skb->data_len;

+               net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
+                        skb->dev->name, skb->dev->mtu, skb->data_len, len,  skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
+                        skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
+                        skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
+                        PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
+
                if (rx->flags & XEN_NETRXF_csum_blank)
                        skb->ip_summed = CHECKSUM_PARTIAL;
                else if (rx->flags & XEN_NETRXF_data_validated)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3ab989b..6d0cd86 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,

        WARN_ON_ONCE(delta < len);

+       if(delta < len) {
+               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
+                        to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
+       }
+
+       if (delta > len && delta - len > 100) {
+               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
+                        to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
+       }
+
        memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
               skb_shinfo(from)->frags,
               skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));


* [PATCH net] net/vxlan: Use the underlying device index when joining/leaving multicast groups
From: Yan Burman @ 2012-12-20 13:36 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, ogerlitz, Yan Burman

The socket calls from vxlan to join/leave a multicast group aren't
using the index of the underlying device; as a result the stack uses
the first interface that is up. This results in vxlan being non-functional
over a device which isn't the first to be up.
Fix this by passing the iflink field of the vxlan instance
to the multicast calls.

Signed-off-by: Yan Burman <yanb@mellanox.com>
---
 drivers/net/vxlan.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3b3fdf6..40f2cc1 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -505,7 +505,8 @@ static int vxlan_join_group(struct net_device *dev)
 	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct sock *sk = vn->sock->sk;
 	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr = vxlan->gaddr,
+		.imr_multiaddr.s_addr	= vxlan->gaddr,
+		.imr_ifindex		= vxlan->link,
 	};
 	int err;
 
@@ -532,7 +533,8 @@ static int vxlan_leave_group(struct net_device *dev)
 	int err = 0;
 	struct sock *sk = vn->sock->sk;
 	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr = vxlan->gaddr,
+		.imr_multiaddr.s_addr	= vxlan->gaddr,
+		.imr_ifindex		= vxlan->link,
 	};
 
 	/* Only leave group when last vxlan is done. */
-- 
1.7.11.3
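
The same idea seen from userspace, as a hedged sketch: when joining an IPv4 multicast group with struct ip_mreqn, leaving imr_ifindex at 0 lets the stack pick the interface, while setting it pins the membership to one device. fill_mreqn() and join_group_on() are hypothetical helper names and are not part of the vxlan driver:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill an ip_mreqn for the given dotted-quad group and interface index.
 * Returns 0 on success, -1 if the group string is not a valid IPv4 address. */
static inline int fill_mreqn(struct ip_mreqn *mreq, const char *group,
			     int ifindex)
{
	assert(mreq != NULL);
	memset(mreq, 0, sizeof(*mreq)); /* imr_address stays INADDR_ANY */

	if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
		return -1;
	mreq->imr_ifindex = ifindex; /* 0 would leave the choice to the stack */
	return 0;
}

/* Join the group on one specific interface. Returns 0 or -1. */
static inline int join_group_on(int fd, const char *group, int ifindex)
{
	struct ip_mreqn mreq;

	if (fill_mreqn(&mreq, group, ifindex) < 0)
		return -1;
	return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
			  &mreq, sizeof(mreq));
}
```

In a real program the index would typically come from if_nametoindex() in <net/if.h>, for the device the traffic is expected on.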


* Re: [PATCH net-next V4 02/13] bridge: Add vlan filtering infrastructure
From: Shmulik Ladkani @ 2012-12-20 13:39 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-3-git-send-email-vyasevic@redhat.com>

Hi Vlad,

On Wed, 19 Dec 2012 12:48:13 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> +static void nbp_vlan_flush(struct net_bridge_port *p)
> +{
> +	struct net_port_vlan *pve;
> +	struct net_port_vlan *tmp;
> +
> +	ASSERT_RTNL();
> +
> +	list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)
> +		nbp_vlan_delete(p, pve->vid, BRIDGE_FLAGS_SELF);

Why would you want to clear the "bridge master port" association from this
vlan in the event of NBP destruction?
The "bridge port" may still be a member of this vlan, may it not?
It seems the flags argument should be 0.

> +#define BR_VID_HASH_SIZE (1<<6)
> +#define br_vlan_hash(vid) ((vid) % (BR_VID_HASH_SIZE - 1))

Did you mean:                       & (BR_VID_HASH_SIZE - 1)
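
A small standalone check of the difference, in case it helps. Only BR_VID_HASH_SIZE is taken from the patch; the two helper functions and count_used_buckets() are made up for this illustration:

```c
#include <assert.h>

#define BR_VID_HASH_SIZE (1 << 6)

/* As posted: modulo 63, so results are always in 0..62. */
static inline unsigned int vlan_hash_mod(unsigned int vid)
{
	return vid % (BR_VID_HASH_SIZE - 1);
}

/* As suggested: mask with 63, so results cover 0..63. */
static inline unsigned int vlan_hash_mask(unsigned int vid)
{
	return vid & (BR_VID_HASH_SIZE - 1);
}

/* Count how many of the 64 buckets are hit by VIDs 0..4095. */
static inline unsigned int count_used_buckets(unsigned int (*hash)(unsigned int))
{
	unsigned char used[BR_VID_HASH_SIZE] = {0};
	unsigned int vid, n = 0;

	for (vid = 0; vid < 4096; vid++) {
		unsigned int b = hash(vid);

		assert(b < BR_VID_HASH_SIZE);
		if (!used[b]) {
			used[b] = 1;
			n++;
		}
	}
	return n;
}
```

With the modulo form the last bucket of the table is never used, and the hash costs a division where the mask is a single AND.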

Regards,
Shmulik


* Re: Network namespace bugs in L2TP
From: Tom Parkin @ 2012-12-20 13:52 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev
In-Reply-To: <87r4mt4um7.fsf@xmission.com>


Hi Eric,

On Thu, Dec 13, 2012 at 11:31:12AM -0800, Eric W. Biederman wrote:
> Tom Parkin <tparkin@katalix.com> writes:
> 
> > On Wed, Dec 12, 2012 at 11:44:36AM -0800, Eric W. Biederman wrote:
> >> Tom Parkin <tparkin@katalix.com> writes:
> > I think that raises a question in the case of the L2TP tunnel sockets,
> > though.  Currently l2tp_tunnel_sock_create uses the namespace of the
> > current process for the socket.  The alternative is to pass in the
> > desired namespace from l2tp_tunnel_create -- and this makes sense, I
> > think.
> >
> > However, when l2tp_tunnel_create is called from the netlink code, the
> > namespace passed is that of the netlink socket.  At the risk of sounding
> > silly, what's the benefit of using the netlink socket namespace over the
> > process namespace in this case?
> 
> Using the netlink socket namespace ensures that if the netlink socket is
> passed between processes the semantics of sending messages down the
> netlink socket don't change.
> 
> There is another thread on netdev discussing another variant of this
> right now.  For some cases it is just a waste of resources to have one
> copy of a daemon per network namespace.  In which case a controlling
> daemon will open one netlink socket per network namespace and send
> commands down the appropriate socket for the network namespace the
> daemon wishes to control.

Yes, I saw that other thread.  Thanks for the clarification on this
point.

> > But that doesn't seem too unreasonable.  A user would have to take
> > explicit action to create an L2TP tunnel socket, and it might seem
> > reasonable for that socket to keep the namespace alive until the user
> > explicitly tears it down again.
> 
> Sending a netlink message to tear down the socket is not unreasonable.
> 
> Having a reference counting loop such that it is possible to close all
> other sockets and all other references to a network namespace and not
> have the network namespace go away because the L2TP tunnel socket holds
> a reference to the unreachable and unusable network namespace is
> unreasonable.
> 
> We handle this with arp and icmp control sockets by not creating a
> reference count.  And having a pernet cleanup routine clean up those
> sockets.  Assuming I am right about the reference counting loop being
> possible this is something to look at.

Yep, OK.  I hadn't appreciated the namespace could become inaccessible!

I've done some digging and I believe there is an issue with the
reference counting for the unmanaged tunnel sockets -- certainly I am
able to leak netns resources here.

I've been working on a patchset which I hope will address these issues
in l2tp_core.  I'm stress testing it now and hope to post to netdev
soon for review.

Thanks again for your help.

Tom
-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

^ permalink raw reply

* Re: [PATCH] 8139cp: Prevent dev_close/cp_interrupt race on MTU change
From: John Greene @ 2012-12-20 13:55 UTC (permalink / raw)
  To: David Woodhouse; +Cc: David Miller, netdev
In-Reply-To: <1355950547.18919.93.camel@shinybook.infradead.org>

On 12/19/2012 03:55 PM, David Woodhouse wrote:
> On Wed, 2012-12-19 at 12:40 -0800, David Miller wrote:
>> You sent this as a "request for testing" last week, but I saw
>> no testing on real hardware whatsoever.
>
> Thanks for the reminder :)
>
> Seems to work fine here. I haven't confirmed whether I actually see the
> race or not but changing MTU on a live device works fine, even when it's
> being ping-flooded.
>
> Tested-by: David Woodhouse <David.Woodhouse@intel.com>
>
Thanks all. Happy holidays!

-- 
John Greene

^ permalink raw reply

* skb->cb size checks (was Re: [PATCH 00/17] ATM fixes for pppoatm/br2684)
From: David Woodhouse @ 2012-12-20 14:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20121201.204906.1703696018528746748.davem@davemloft.net>

On Sat, 2012-12-01 at 20:49 -0500, David Miller wrote:
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Sun, 02 Dec 2012 00:40:47 +0000
> 
> > On Sat, 2012-12-01 at 17:33 +0000, David Woodhouse wrote:
> >> 
> >> Very glad I added the BUILD_BUG_ON on the cb struct size now. Perhaps
> >> there should be a generic helper for that? Something like
> >>  skb_cb_cast(struct foo_cb, skb) could do it automatically...?
> > 
> > Something like this, perhaps? Using skb_cast_cb() would then make it
> > fairly much impossible to accidentally overflow the size of the skb cb.
> 
> I actually prefer what we do now, which is do the BUILD_BUG_ON()
> once in the subsystem specific code, usually the initializer.
> 
> It's part of creating a new SKB cb, adding that assertion somewhere.

I looked harder at this, and should follow up before it actually does
fall out of the cracks in my brain and get completely forgotten.

Basically, you lie :)

What we *actually* do now, in about two-thirds of cases¹ even in net/
code (I didn't even look at drivers, which I expect to be worse), is use
skb->cb without any form of automatic size check at all. No manual
BUILD_BUG_ON() or anything.

Admittedly, in almost all cases that *isn't* a real problem, because the
structure *isn't* too big for skb->cb and it's all fine. But as a matter
of principle we probably *should* be doing those checks. Just in *case*
someone comes along and adds something stupid to the structure.

So... should we:
 - Ignore the "problem" and leave things as they are.

 - Go through and fix the 2/3 of offending net/ code and then the
   drivers too, *without* making the generic 'dereference and automatic
   check' macro that I think would simplify that and help to keep us
   honest in future.
or
 - Let me add something like the skb_cast_cb() macro I wanted, then use
   it in all the offending code I can find.
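
A minimal userspace sketch of what such a checked-cast helper could look like. It is not the macro from the earlier patch: the stand-in struct sk_buff, the 48-byte cb[] size, and struct foo_cb are assumptions made for illustration, and the macro relies on GNU C statement expressions (gcc/clang).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct sk_buff (assumption): only the
 * control buffer matters for this sketch. */
struct sk_buff {
	char cb[48] __attribute__((aligned(8)));
};

/* Hypothetical helper: cast skb->cb to a subsystem-private type and
 * fail the build if that type does not fit in the control buffer. */
#define skb_cast_cb(type, skb)                                        \
	({                                                            \
		_Static_assert(sizeof(type) <= sizeof((skb)->cb),     \
			       #type " does not fit in skb->cb");     \
		(type *)(skb)->cb;                                    \
	})

/* Example subsystem control block (hypothetical). */
struct foo_cb {
	unsigned long seq;
	unsigned int flags;
};

/* Per-subsystem accessor: every use of the cb goes through the check. */
static inline struct foo_cb *foo_cb(struct sk_buff *skb)
{
	assert(skb != NULL);
	return skb_cast_cb(struct foo_cb, skb);
}
```

With this shape the size check travels with every dereference, so growing struct foo_cb past the cb size breaks the build at the accessor instead of relying on a separate BUILD_BUG_ON() that someone has to remember to add.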

-- 
dwmw2

¹ http://www.spinics.net/lists/netdev/msg218642.html


^ permalink raw reply

* Lockdep warning in vxlan
From: Yan Burman @ 2012-12-20 14:00 UTC (permalink / raw)
  To: shemminger, netdev, Yan Burman

Hi.

When working with vxlan from current net-next, I got a lockdep warning
(below).
It seems to happen when I have host B pinging host A and, while the pings
continue, I do "ip link del" on the vxlan interface on host A. The lockdep
warning is on host A.
Tell me if you need some more info.

=============================================
[ INFO: possible recursive locking detected ]
3.7.0+ #24 Not tainted
---------------------------------------------
swapper/1/0 is trying to acquire lock:
  (&n->lock){++--..}, at: [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0

but task is already holding lock:
  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&n->lock);
   lock(&n->lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

4 locks held by swapper/1/0:
  #0:  (((&n->timer))){+.-...}, at: [<ffffffff8104b350>] call_timer_fn+0x0/0x1c0
  #1:  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280
  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81395400>] dev_queue_xmit+0x0/0x5d0
  #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff813cb41e>] ip_finish_output+0x13e/0x640

stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0+ #24
Call Trace:
  <IRQ>  [<ffffffff8108c7ac>] validate_chain+0xdcc/0x11f0
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff81120565>] ? kmem_cache_free+0xe5/0x1c0
  [<ffffffff8108d570>] __lock_acquire+0x440/0xc30
  [<ffffffff813c3570>] ? inet_getpeer+0x40/0x600
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8108ddf5>] lock_acquire+0x95/0x140
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff81448d4b>] _raw_write_lock_bh+0x3b/0x50
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0
  [<ffffffff8139f99b>] neigh_resolve_output+0x16b/0x270
  [<ffffffff813cb62d>] ip_finish_output+0x34d/0x640
  [<ffffffff813cb41e>] ? ip_finish_output+0x13e/0x640
  [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
  [<ffffffff813cb9a0>] ip_output+0x80/0xf0
  [<ffffffff813ca368>] ip_local_out+0x28/0x80
  [<ffffffffa046f25a>] vxlan_xmit+0x66a/0xbec [vxlan]
  [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
  [<ffffffff81394a50>] ? skb_gso_segment+0x2b0/0x2b0
  [<ffffffff81449355>] ? _raw_spin_unlock_irqrestore+0x65/0x80
  [<ffffffff81394c57>] ? dev_queue_xmit_nit+0x207/0x270
  [<ffffffff813950c8>] dev_hard_start_xmit+0x298/0x5d0
  [<ffffffff813956f3>] dev_queue_xmit+0x2f3/0x5d0
  [<ffffffff81395400>] ? dev_hard_start_xmit+0x5d0/0x5d0
  [<ffffffff813f5788>] arp_xmit+0x58/0x60
  [<ffffffff813f59db>] arp_send+0x3b/0x40
  [<ffffffff813f6424>] arp_solicit+0x204/0x280
  [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
  [<ffffffff8139f515>] neigh_probe+0x45/0x70
  [<ffffffff813a1c10>] neigh_timer_handler+0x1a0/0x2a0
  [<ffffffff8104b3cf>] call_timer_fn+0x7f/0x1c0
  [<ffffffff8104b350>] ? detach_if_pending+0x120/0x120
  [<ffffffff8104b748>] run_timer_softirq+0x238/0x2b0
  [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
  [<ffffffff81043e51>] __do_softirq+0x101/0x280
  [<ffffffff814518cc>] call_softirq+0x1c/0x30
  [<ffffffff81003b65>] do_softirq+0x85/0xc0
  [<ffffffff81043a7e>] irq_exit+0x9e/0xc0
  [<ffffffff810264f8>] smp_apic_timer_interrupt+0x68/0xa0
  [<ffffffff8145122f>] apic_timer_interrupt+0x6f/0x80
  <EOI>  [<ffffffff8100a054>] ? mwait_idle+0xa4/0x1c0
  [<ffffffff8100a04b>] ? mwait_idle+0x9b/0x1c0
  [<ffffffff8100a6a9>] cpu_idle+0x89/0xe0
  [<ffffffff81441127>] start_secondary+0x1b2/0x1b6

Hope this helps
Yan

^ permalink raw reply

* Re: [PATCH net-next V4 03/13] bridge: Validate that vlan is permitted on ingress
From: Shmulik Ladkani @ 2012-12-20 14:07 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-4-git-send-email-vyasevic@redhat.com>

Hi Vlad,

On Wed, 19 Dec 2012 12:48:14 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> +static bool br_allowed_ingress(struct net_bridge_port *p, struct sk_buff *skb)
> +{
> +	struct net_port_vlan *pve;
> +	u16 vid;
> +
> +	/* If there are no vlan in the permitted list, all packets are
> +	 * permitted.
> +	 */
> +	if (list_empty(&p->vlan_list))
> +		return true;

I assumed the default policy would be Drop in such case, otherwise
leaking between vlan domains is possible.
Or maybe, ingress policy when port isn't a member of ingress VID should
be configurable (drop/allow).

> +	vid = br_get_vlan(skb);
> +	pve = nbp_vlan_find(p, vid);

Why search by iterating through NBP's vlan_list?
You know the VID (hence may fetch the net_bridge_vlan from the hash), so
why don't you directly consult the net_bridge_vlan's port_bitmap?
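
A rough userspace sketch of the two suggestions above: a per-VID port bitmap consulted directly, plus a configurable policy for ports with no VLANs. All type and function names here are hypothetical and not taken from the patch series; the table is indexed by VID rather than hashed, and ports are limited to 64 to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_N_VID   4096
#define VLAN_VID_MASK 0x0fff

/* What to do on ingress when the port has no VLANs configured. */
enum ingress_policy { INGRESS_ALLOW_ALL, INGRESS_DROP };

/* Hypothetical bridge state: one 64-bit port bitmap per VID. */
struct bridge {
	uint64_t port_bitmap[VLAN_N_VID];
};

/* Hypothetical port state. */
struct port {
	unsigned int index;               /* bit position, must be < 64 */
	unsigned int num_vlans;           /* VLANs this port is a member of */
	enum ingress_policy empty_policy; /* used when num_vlans == 0 */
};

/* Make the port a member of the given VLAN. */
static void vlan_add_port(struct bridge *br, struct port *p, uint16_t vid)
{
	uint64_t bit;

	assert(p->index < 64);
	bit = UINT64_C(1) << p->index;
	vid &= VLAN_VID_MASK;
	if (!(br->port_bitmap[vid] & bit)) {
		br->port_bitmap[vid] |= bit;
		p->num_vlans++;
	}
}

/* Ingress check: a single bitmap test instead of walking a list. */
static bool allowed_ingress(const struct bridge *br, const struct port *p,
			    uint16_t vid)
{
	if (p->num_vlans == 0)
		return p->empty_policy == INGRESS_ALLOW_ALL;
	return (br->port_bitmap[vid & VLAN_VID_MASK] &
		(UINT64_C(1) << p->index)) != 0;
}
```

The lookup cost is then independent of how many VLANs a port belongs to, and the drop/allow choice for an unconfigured port becomes an explicit per-port setting rather than a hard-coded "allow".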

> @@ -54,6 +74,9 @@ int br_handle_frame_finish(struct sk_buff *skb)
>  	if (!p || p->state == BR_STATE_DISABLED)
>  		goto drop;
>  
> +	if (!br_allowed_ingress(p, skb))
> +		goto drop;
> +

This condition should also be incorporated upon "ingress" at the "bridge
master port" (that is, early at br_dev_xmit).
Think of the "bridge master port" as yet another port:
upon "ingress" (meaning, tx packets from the ip stack), we should
also enforce any ingress permission rules.

Regards,
Shmulik

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 14:23 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
	xen-devel@lists.xensource.com, Ian Campbell,
	Konrad Rzeszutek Wilk
In-Reply-To: <1797374383.20121220135139@eikelenboom.it>


Thursday, December 20, 2012, 1:51:39 PM, you wrote:


> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:

>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:

>>> Hi Ian,
>>> 
>>> It ran overnight and i haven't seen the warn_once trigger.
>>> (but i also didn't with the previous patch)
>>> 

>> As I said, the minimum value to not trigger the warning was what Ian's
>> patch was doing, but it was still not an accurate estimation.

>> Doing the real accounting might trigger slow transfers, or dropped
>> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.

>> So the real question was : If accounting for full pages, do your
>> applications run as smoothly as before, with no huge performance
>> regression ?

> Ok i have added some extra debug info (see diffs below), the code still uses the old calculation for truesize (in the hope to trigger the warn_on_once again), but also calculates the variants IanC came up with.

> I haven't got a clear test case to trigger the warn_on_once, it happens just every once in a while during my normal usage and i'm not a netperf expert :-)
> So at the moment i haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..

> - The first variant (current code) seems to be the most efficient and a good estimation *most* of the time, but sometimes triggers the warn_on_once in skb_try_coalesce.
> - The first variant (current code) seems to always subtract from the truesize for small packets.
> - The second variant always seems to keep the truesize as is for most of the small network traffic, but it also seems to work ok for larger packets.
> - The third variant seems to be a pretty wasteful estimation.

> So the last variant seems to be rather wasteful, and the second one the most accurate so far.

> Eric:
>      From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
>      When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?



> [  116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
> [  117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
> [  117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
> [  117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
> [  117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
> [  117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
> [  117.539403] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
> [  117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
> [  117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
> [  121.981917] net_ratelimit: 27 callbacks suppressed
> [  121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)



> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index c26e28b..8833e38 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>         struct sk_buff_head tmpq;
>         unsigned long flags;
>         int err;
> +       int tsz,len;

>         spin_lock(&np->rx_lock);

> @@ -1037,9 +1038,22 @@ err:
>                  * receive throughout using the standard receive
>                  * buffer size was cut by 25%(!!!).
>                  */
> -               skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> +
> +
> +
> +
> +                tsz = skb->truesize;
> +                len = skb->len;
> +                /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
> +                skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>                 skb->len += skb->data_len;

> +               net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
> +                        skb->dev->name, skb->dev->mtu, skb->data_len, len,  skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
> +                        skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
> +                        skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
> +                        PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
> +
>                 if (rx->flags & XEN_NETRXF_csum_blank)
>                         skb->ip_summed = CHECKSUM_PARTIAL;
>                 else if (rx->flags & XEN_NETRXF_data_validated)
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3ab989b..6d0cd86 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,

>         WARN_ON_ONCE(delta < len);

> +       if(delta < len) {
> +               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
> +                        to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
> +       }
> +
> +       if (delta > len && delta - len > 100) {
> +               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
> +                        to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
> +       }
> +
>         memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>                skb_shinfo(from)->frags,
>                skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));



Ok i succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was just rate limited away for the offending packet :(

Dec 20 15:17:33 media kernel: [  393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [  393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [  393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [  393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [  393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  394.940211] ------------[ cut here ]------------
Dec 20 15:17:35 media kernel: [  394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
Dec 20 15:17:35 media kernel: [  394.940282] Modules linked in:
Dec 20 15:17:35 media kernel: [  394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
Dec 20 15:17:35 media kernel: [  394.940330] Call Trace:
Dec 20 15:17:35 media kernel: [  394.940343]  <IRQ>  [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
Dec 20 15:17:35 media kernel: [  394.940384]  [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
Dec 20 15:17:35 media kernel: [  394.940409]  [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
Dec 20 15:17:35 media kernel: [  394.940434]  [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
Dec 20 15:17:35 media kernel: [  394.940458]  [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
Dec 20 15:17:35 media kernel: [  394.940481]  [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
Dec 20 15:17:35 media kernel: [  394.940504]  [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
Dec 20 15:17:35 media kernel: [  394.940528]  [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
Dec 20 15:17:35 media kernel: [  394.940551]  [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
Dec 20 15:17:35 media kernel: [  394.940576]  [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
Dec 20 15:17:35 media kernel: [  394.940600]  [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
Dec 20 15:17:35 media kernel: [  394.940623]  [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
Dec 20 15:17:35 media kernel: [  394.940666]  [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
Dec 20 15:17:35 media kernel: [  394.940694]  [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
Dec 20 15:17:35 media kernel: [  394.940720]  [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
Dec 20 15:17:35 media kernel: [  394.940745]  [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
Dec 20 15:17:35 media kernel: [  394.940784]  [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
Dec 20 15:17:35 media kernel: [  394.940807]  [<ffffffff818e4b68>] ip_rcv+0x218/0x300
Dec 20 15:17:35 media kernel: [  394.940829]  [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
Dec 20 15:17:35 media kernel: [  394.940853]  [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
Dec 20 15:17:35 media kernel: [  394.940889]  [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
Dec 20 15:17:35 media kernel: [  394.940914]  [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
Dec 20 15:17:35 media kernel: [  394.940939]  [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
Dec 20 15:17:35 media kernel: [  394.940964]  [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
Dec 20 15:17:35 media kernel: [  394.941000]  [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
Dec 20 15:17:35 media kernel: [  394.941024]  [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
Dec 20 15:17:35 media kernel: [  394.941046]  [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
Dec 20 15:17:35 media kernel: [  394.941075]  [<ffffffff8184ed66>] net_rx_action+0x136/0x260
Dec 20 15:17:35 media kernel: [  394.941098]  [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
Dec 20 15:17:35 media kernel: [  394.941133]  [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
Dec 20 15:17:35 media kernel: [  394.941157]  [<ffffffff819b623c>] call_softirq+0x1c/0x30
Dec 20 15:17:35 media kernel: [  394.941179]  [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
Dec 20 15:17:35 media kernel: [  394.941201]  [<ffffffff8107041e>] irq_exit+0x9e/0xd0
Dec 20 15:17:35 media kernel: [  394.941235]  [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
Dec 20 15:17:35 media kernel: [  394.941259]  [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
Dec 20 15:17:35 media kernel: [  394.941279]  <EOI>  [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Dec 20 15:17:35 media kernel: [  394.941318]  [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Dec 20 15:17:35 media kernel: [  394.941356]  [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
Dec 20 15:17:35 media kernel: [  394.941381]  [<ffffffff810092b2>] ? check_events+0x12/0x20
Dec 20 15:17:35 media kernel: [  394.941405]  [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
Dec 20 15:17:35 media kernel: [  394.941432]  [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
Dec 20 15:17:35 media kernel: [  394.941473]  [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
Dec 20 15:17:35 media kernel: [  394.941507]  [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
Dec 20 15:17:35 media kernel: [  394.941533]  [<ffffffff819b2434>] ? __schedule+0x444/0x880
Dec 20 15:17:35 media kernel: [  394.941555]  [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
Dec 20 15:17:35 media kernel: [  394.941580]  [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
Dec 20 15:17:35 media kernel: [  394.941614]  [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
Dec 20 15:17:35 media kernel: [  394.941638]  [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
Dec 20 15:17:35 media kernel: [  394.941663]  [<ffffffff819b2904>] ? schedule+0x24/0x70
Dec 20 15:17:35 media kernel: [  394.941697]  [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
Dec 20 15:17:35 media kernel: [  394.941725]  [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
Dec 20 15:17:35 media kernel: [  394.941748]  [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
Dec 20 15:17:35 media kernel: [  394.941770]  [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
Dec 20 15:17:35 media kernel: [  394.941808]  [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
Dec 20 15:17:35 media kernel: [  394.941833]  [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
Dec 20 15:17:35 media kernel: [  394.941856]  [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
Dec 20 15:17:35 media kernel: [  394.941878]  [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
Dec 20 15:17:35 media kernel: [  394.941913]  [<ffffffff810b5b17>] ? lock_release+0x117/0x250
Dec 20 15:17:35 media kernel: [  394.941938]  [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
Dec 20 15:17:35 media kernel: [  394.941959]  [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
Dec 20 15:17:35 media kernel: [  394.941981]  [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
Dec 20 15:17:35 media kernel: [  394.942015]  [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
Dec 20 15:17:35 media kernel: [  394.942036] ---[ end trace 6f3a832c9e91c8af ]---
Dec 20 15:17:35 media kernel: [  394.942056] to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
Dec 20 15:17:35 media kernel: [  394.968199] to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
Dec 20 15:17:35 media kernel: [  395.262814] net_ratelimit: 371 callbacks suppressed
Dec 20 15:17:35 media kernel: [  395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
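
For reference, the three truesize variants printed by the debug patch can be reproduced with a small userspace sketch. The constants follow the values visible in the logs (a 256-byte copy threshold and 4096-byte pages); the struct and function names are made up for this illustration, and pull_to is passed in because it varies per packet in the logs.

```c
#include <assert.h>

#define RX_COPY_THRESHOLD 256   /* matches variant1 deltas in the logs */
#define PAGE_SIZE_BYTES   4096  /* matches variant3 deltas in the logs */

/* Resulting truesize for each of the three accounting variants. */
struct truesize_variants {
	int v1; /* current code: data_len - RX_COPY_THRESHOLD */
	int v2; /* data_len - pull_to */
	int v3; /* one full page per fragment */
};

static struct truesize_variants truesize_after(int truesize_before,
					       int data_len, int pull_to,
					       int nr_frags)
{
	struct truesize_variants r;

	assert(pull_to >= 0 && pull_to <= data_len);
	r.v1 = truesize_before + data_len - RX_COPY_THRESHOLD;
	r.v2 = truesize_before + data_len - pull_to;
	r.v3 = truesize_before + PAGE_SIZE_BYTES * nr_frags;
	return r;
}
```

Calling truesize_after(896, 42, 42, 1) gives 682, 896 and 4992, which matches the data_len:42 lines above: variant 1 shrinks truesize below its starting value for small packets, variant 2 leaves it unchanged, and variant 3 charges a full page.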

^ permalink raw reply

* [PATCH] net: ipv4: route: fix coding style issues net: ipv4: tcp: fix coding style issues
From: Stefan Hasko @ 2012-12-20 14:28 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev
  Cc: linux-kernel, Stefan Hasko

Fix coding style issues.

Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
---
 net/ipv4/route.c |  119 ++++++++++++++++-------------
 net/ipv4/tcp.c   |  218 +++++++++++++++++++++++++++++++-----------------------
 2 files changed, 194 insertions(+), 143 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 844a9ef..29678e5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -20,7 +20,7 @@
  *		Alan Cox	:	Added BSD route gw semantics
  *		Alan Cox	:	Super /proc >4K
  *		Alan Cox	:	MTU in route table
- *		Alan Cox	: 	MSS actually. Also added the window
+ *		Alan Cox	:	MSS actually. Also added the window
  *					clamper.
  *		Sam Lantinga	:	Fixed route matching in rt_del()
  *		Alan Cox	:	Routing cache support.
@@ -31,30 +31,35 @@
  *	Miquel van Smoorenburg	:	BSD API fixes.
  *	Miquel van Smoorenburg	:	Metrics.
  *		Alan Cox	:	Use __u32 properly
- *		Alan Cox	:	Aligned routing errors more closely with BSD
+ *		Alan Cox	:	Aligned routing errors more
+ *					closely with BSD
  *					our system is still very different.
  *		Alan Cox	:	Faster /proc handling
- *	Alexey Kuznetsov	:	Massive rework to support tree based routing,
+ *	Alexey Kuznetsov	:	Massive rework to support
+ *					tree based routing,
  *					routing caches and better behaviour.
  *
  *		Olaf Erb	:	irtt wasn't being copied right.
  *		Bjorn Ekwall	:	Kerneld route support.
  *		Alan Cox	:	Multicast fixed (I hope)
- * 		Pavel Krauz	:	Limited broadcast fixed
+ *		Pavel Krauz	:	Limited broadcast fixed
  *		Mike McLagan	:	Routing by source
  *	Alexey Kuznetsov	:	End of old history. Split to fib.c and
  *					route.c and rewritten from scratch.
  *		Andi Kleen	:	Load-limit warning messages.
- *	Vitaly E. Lavrov	:	Transparent proxy revived after year coma.
+ *	Vitaly E. Lavrov	:	Transparent proxy revived
+ *					after year coma.
  *	Vitaly E. Lavrov	:	Race condition in ip_route_input_slow.
- *	Tobias Ringstrom	:	Uninitialized res.type in ip_route_output_slow.
+ *	Tobias Ringstrom	:	Uninitialized res.type in
+ *					ip_route_output_slow.
  *	Vladimir V. Ivanov	:	IP rule info (flowid) is really useful.
  *		Marc Boucher	:	routing by fwmark
  *	Robert Olsson		:	Added rt_cache statistics
  *	Arnaldo C. Melo		:	Convert proc stuff to seq_file
- *	Eric Dumazet		:	hashed spinlocks and rt_check_expire() fixes.
- * 	Ilia Sotnikov		:	Ignore TOS on PMTUD and Redirect
- * 	Ilia Sotnikov		:	Removed TOS from hash calculations
+ *	Eric Dumazet		:	hashed spinlocks and
+ *					rt_check_expire() fixes.
+ *	Ilia Sotnikov		:	Ignore TOS on PMTUD and Redirect
+ *	Ilia Sotnikov		:	Removed TOS from hash calculations
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -65,7 +70,7 @@
 #define pr_fmt(fmt) "IPv4: " fmt
 
 #include <linux/module.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 #include <linux/bitops.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
@@ -139,7 +144,8 @@ static unsigned int	 ipv4_default_advmss(const struct dst_entry *dst);
 static unsigned int	 ipv4_mtu(const struct dst_entry *dst);
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
 static void		 ipv4_link_failure(struct sk_buff *skb);
-static void		 ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
+static void		 ip_rt_update_pmtu(struct dst_entry *dst,
+					   struct sock *sk,
 					   struct sk_buff *skb, u32 mtu);
 static void		 ip_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
@@ -291,12 +297,11 @@ static int rt_cpu_seq_show(struct seq_file *seq, void *v)
 	struct rt_cache_stat *st = v;
 
 	if (v == SEQ_START_TOKEN) {
-		seq_printf(seq, "entries  in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src  out_hit out_slow_tot out_slow_mc  gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
+		seq_printf(seq, "entries in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
 		return 0;
 	}
 
-	seq_printf(seq,"%08x  %08x %08x %08x %08x %08x %08x %08x "
-		   " %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
+	seq_printf(seq, "%08x  %08x %08x %08x %08x %08x %08x %08x  %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
 		   dst_entries_get_slow(&ipv4_dst_ops),
 		   st->in_hit,
 		   st->in_slow_tot,
@@ -657,8 +662,8 @@ out_unlock:
 	return;
 }
 
-static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flowi4 *fl4,
-			     bool kill_route)
+static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb,
+			     struct flowi4 *fl4, bool kill_route)
 {
 	__be32 new_gw = icmp_hdr(skb)->un.gateway;
 	__be32 old_gw = ip_hdr(skb)->saddr;
@@ -695,7 +700,8 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 	if (!IN_DEV_SHARED_MEDIA(in_dev)) {
 		if (!inet_addr_onlink(in_dev, new_gw, old_gw))
 			goto reject_redirect;
-		if (IN_DEV_SEC_REDIRECTS(in_dev) && ip_fib_check_default(new_gw, dev))
+		if (IN_DEV_SEC_REDIRECTS(in_dev) &&
+		    ip_fib_check_default(new_gw, dev))
 			goto reject_redirect;
 	} else {
 		if (inet_addr_type(net, new_gw) != RTN_UNICAST)
@@ -737,7 +743,8 @@ reject_redirect:
 	;
 }
 
-static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb)
+static void ip_do_redirect(struct dst_entry *dst, struct sock *sk,
+			   struct sk_buff *skb)
 {
 	struct rtable *rt;
 	struct flowi4 fl4;
@@ -1202,11 +1209,11 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
 	struct rtable *orig, *prev, **p;
 	bool ret = true;
 
-	if (rt_is_input_route(rt)) {
+	if (rt_is_input_route(rt))
 		p = (struct rtable **)&nh->nh_rth_input;
-	} else {
+	else
 		p = (struct rtable **)__this_cpu_ptr(nh->nh_pcpu_rth_output);
-	}
+
 	orig = *p;
 
 	prev = cmpxchg(p, orig, rt);
@@ -1359,17 +1366,17 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 #endif
 	rth->dst.output = ip_rt_bug;
 
-	rth->rt_genid	= rt_genid(dev_net(dev));
-	rth->rt_flags	= RTCF_MULTICAST;
-	rth->rt_type	= RTN_MULTICAST;
-	rth->rt_is_input= 1;
-	rth->rt_iif	= 0;
-	rth->rt_pmtu	= 0;
-	rth->rt_gateway	= 0;
+	rth->rt_genid = rt_genid(dev_net(dev));
+	rth->rt_flags = RTCF_MULTICAST;
+	rth->rt_type = RTN_MULTICAST;
+	rth->rt_is_input = 1;
+	rth->rt_iif = 0;
+	rth->rt_pmtu = 0;
+	rth->rt_gateway = 0;
 	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
 	if (our) {
-		rth->dst.input= ip_local_deliver;
+		rth->dst.input = ip_local_deliver;
 		rth->rt_flags |= RTCF_LOCAL;
 	}
 
@@ -1488,8 +1495,8 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->rt_flags = flags;
 	rth->rt_type = res->type;
 	rth->rt_is_input = 1;
-	rth->rt_iif 	= 0;
-	rth->rt_pmtu	= 0;
+	rth->rt_iif = 0;
+	rth->rt_pmtu = 0;
 	rth->rt_gateway	= 0;
 	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
@@ -1649,25 +1656,25 @@ local_input:
 	if (!rth)
 		goto e_nobufs;
 
-	rth->dst.input= ip_local_deliver;
-	rth->dst.output= ip_rt_bug;
+	rth->dst.input = ip_local_deliver;
+	rth->dst.output = ip_rt_bug;
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	rth->dst.tclassid = itag;
 #endif
 
 	rth->rt_genid = rt_genid(net);
-	rth->rt_flags 	= flags|RTCF_LOCAL;
-	rth->rt_type	= res.type;
+	rth->rt_flags = flags|RTCF_LOCAL;
+	rth->rt_type = res.type;
 	rth->rt_is_input = 1;
-	rth->rt_iif	= 0;
-	rth->rt_pmtu	= 0;
+	rth->rt_iif = 0;
+	rth->rt_pmtu = 0;
 	rth->rt_gateway	= 0;
 	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
 	if (res.type == RTN_UNREACHABLE) {
-		rth->dst.input= ip_error;
-		rth->dst.error= -err;
-		rth->rt_flags 	&= ~RTCF_LOCAL;
+		rth->dst.input = ip_error;
+		rth->dst.error = -err;
+		rth->rt_flags &= ~RTCF_LOCAL;
 	}
 	if (do_cache)
 		rt_cache_route(&FIB_RES_NH(res), rth);
@@ -1772,7 +1779,8 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		return ERR_PTR(-EINVAL);
 
 	if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev)))
-		if (ipv4_is_loopback(fl4->saddr) && !(dev_out->flags & IFF_LOOPBACK))
+		if (ipv4_is_loopback(fl4->saddr) &&
+		    !(dev_out->flags & IFF_LOOPBACK))
 			return ERR_PTR(-EINVAL);
 
 	if (ipv4_is_lbcast(fl4->daddr))
@@ -1919,7 +1927,9 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		if (fl4->flowi4_oif == 0 &&
 		    (ipv4_is_multicast(fl4->daddr) ||
 		     ipv4_is_lbcast(fl4->daddr))) {
-			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+			/* It is equivalent to
+			 * inet_addr_type(saddr) == RTN_LOCAL
+			 */
 			dev_out = __ip_dev_find(net, fl4->saddr, false);
 			if (dev_out == NULL)
 				goto out;
@@ -1944,7 +1954,9 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		}
 
 		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
-			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+			/* It is equivalent to
+			 * inet_addr_type(saddr) == RTN_LOCAL
+			 */
 			if (!__ip_dev_find(net, fl4->saddr, false))
 				goto out;
 		}
@@ -1972,7 +1984,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		if (fl4->saddr) {
 			if (ipv4_is_multicast(fl4->daddr))
 				fl4->saddr = inet_select_addr(dev_out, 0,
-							      fl4->flowi4_scope);
+							     fl4->flowi4_scope);
 			else if (!fl4->daddr)
 				fl4->saddr = inet_select_addr(dev_out, 0,
 							      RT_SCOPE_HOST);
@@ -2061,7 +2073,8 @@ out:
 }
 EXPORT_SYMBOL_GPL(__ip_route_output_key);
 
-static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst, u32 cookie)
+static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst,
+						  u32 cookie)
 {
 	return NULL;
 }
@@ -2073,7 +2086,8 @@ static unsigned int ipv4_blackhole_mtu(const struct dst_entry *dst)
 	return mtu ? : dst->dev->mtu;
 }
 
-static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
+static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst,
+					  struct sock *sk,
 					  struct sk_buff *skb, u32 mtu)
 {
 }
@@ -2101,7 +2115,8 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
 	.neigh_lookup		=	ipv4_neigh_lookup,
 };
 
-struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_orig)
+struct dst_entry *ipv4_blackhole_route(struct net *net,
+				       struct dst_entry *dst_orig)
 {
 	struct rtable *ort = (struct rtable *) dst_orig;
 	struct rtable *rt;
@@ -2265,7 +2280,8 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
-static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
+static int inet_rtm_getroute(struct sk_buff *in_skb,
+			     struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct rtmsg *rtm;
@@ -2297,7 +2313,9 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void
 	skb_reset_mac_header(skb);
 	skb_reset_network_header(skb);
 
-	/* Bugfix: need to give ip_route_input enough of an IP header to not gag. */
+	/* Bugfix: need to give ip_route_input enough
+	 * of an IP header to not gag.
+	 */
 	ip_hdr(skb)->protocol = IPPROTO_ICMP;
 	skb_reserve(skb, MAX_HEADER + sizeof(struct iphdr));
 
@@ -2596,7 +2614,8 @@ int __init ip_rt_init(void)
 	int rc = 0;
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
-	ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct), __alignof__(struct ip_rt_acct));
+	ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct),
+				    __alignof__(struct ip_rt_acct));
 	if (!ip_rt_acct)
 		panic("IP: failed to allocate ip_rt_acct\n");
 #endif
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1ca2536..12fadb2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -45,7 +45,7 @@
  *					escape still
  *		Alan Cox	:	Fixed another acking RST frame bug.
  *					Should stop LAN workplace lockups.
- *		Alan Cox	: 	Some tidyups using the new skb list
+ *		Alan Cox	:	Some tidyups using the new skb list
  *					facilities
  *		Alan Cox	:	sk->keepopen now seems to work
  *		Alan Cox	:	Pulls options out correctly on accepts
@@ -160,7 +160,8 @@
  *					generates them.
  *		Alan Cox	:	Cache last socket.
  *		Alan Cox	:	Per route irtt.
- *		Matt Day	:	poll()->select() match BSD precisely on error
+ *		Matt Day	:	poll()->select() match BSD precisely
+ *					on error
  *		Alan Cox	:	New buffers
  *		Marc Tamsky	:	Various sk->prot->retransmits and
  *					sk->retransmits misupdating fixed.
@@ -168,9 +169,9 @@
  *					and TCP syn retries gets used now.
  *		Mark Yarvis	:	In tcp_read_wakeup(), don't send an
  *					ack if state is TCP_CLOSED.
- *		Alan Cox	:	Look up device on a retransmit - routes may
- *					change. Doesn't yet cope with MSS shrink right
- *					but it's a start!
+ *		Alan Cox	:	Look up device on a retransmit - routes
+ *					may change. Doesn't yet cope with MSS
+ *					shrink right but it's a start!
  *		Marc Tamsky	:	Closing in closing fixes.
  *		Mike Shaver	:	RFC1122 verifications.
  *		Alan Cox	:	rcv_saddr errors.
@@ -199,7 +200,7 @@
  *					tcp_do_sendmsg to avoid burstiness.
  *		Eric Schenk	:	Fix fast close down bug with
  *					shutdown() followed by close().
- *		Andi Kleen 	:	Make poll agree with SIGIO
+ *		Andi Kleen	:	Make poll agree with SIGIO
  *	Salvatore Sanfilippo	:	Support SO_LINGER with linger == 1 and
  *					lingertime == 0 (RFC 793 ABORT Call)
  *	Hirokazu Takahashi	:	Use copy_from_user() instead of
@@ -268,6 +269,7 @@
 #include <linux/crypto.h>
 #include <linux/time.h>
 #include <linux/slab.h>
+#include <linux/uaccess.h>
 
 #include <net/icmp.h>
 #include <net/inet_common.h>
@@ -277,7 +279,6 @@
 #include <net/netdma.h>
 #include <net/sock.h>
 
-#include <asm/uaccess.h>
 #include <asm/ioctls.h>
 
 int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
@@ -286,22 +287,20 @@ struct percpu_counter tcp_orphan_count;
 EXPORT_SYMBOL_GPL(tcp_orphan_count);
 
 int sysctl_tcp_wmem[3] __read_mostly;
-int sysctl_tcp_rmem[3] __read_mostly;
+EXPORT_SYMBOL(sysctl_tcp_wmem);
 
+int sysctl_tcp_rmem[3] __read_mostly;
 EXPORT_SYMBOL(sysctl_tcp_rmem);
-EXPORT_SYMBOL(sysctl_tcp_wmem);
 
 atomic_long_t tcp_memory_allocated;	/* Current allocated memory. */
 EXPORT_SYMBOL(tcp_memory_allocated);
 
-/*
- * Current number of TCP sockets.
+/* Current number of TCP sockets.
  */
 struct percpu_counter tcp_sockets_allocated;
 EXPORT_SYMBOL(tcp_sockets_allocated);
 
-/*
- * TCP splice context
+/* TCP splice context
  */
 struct tcp_splice_state {
 	struct pipe_inode_info *pipe;
@@ -309,8 +308,7 @@ struct tcp_splice_state {
 	unsigned int flags;
 };
 
-/*
- * Pressure flag: try to collapse.
+/* Pressure flag: try to collapse.
  * Technical note: it is used by multiple contexts non atomically.
  * All the __sk_mem_schedule() is of this nature: accounting
  * is strict, actions are advisory and have some latency.
@@ -430,8 +428,7 @@ void tcp_init_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
-/*
- *	Wait for a TCP event.
+/*	Wait for a TCP event.
  *
  *	Note that we don't need to lock the socket, as the upper poll layers
  *	take care of normal races (between the test and the event) and we don't
@@ -454,8 +451,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
 	mask = 0;
 
-	/*
-	 * POLLHUP is certainly not done right. But poll() doesn't
+	/* POLLHUP is certainly not done right. But poll() doesn't
 	 * have a notion of HUP in just one direction, and for a
 	 * socket the read side is more interesting.
 	 *
@@ -498,7 +494,8 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
 		/* Potential race condition. If read of tp below will
 		 * escape above sk->sk_state, we can be illegally awaken
-		 * in SYN_* states. */
+		 * in SYN_* states.
+		 */
 		if (tp->rcv_nxt - tp->copied_seq >= target)
 			mask |= POLLIN | POLLRDNORM;
 
@@ -509,14 +506,15 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 				set_bit(SOCK_ASYNC_NOSPACE,
 					&sk->sk_socket->flags);
 				set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-
-				/* Race breaker. If space is freed after
-				 * wspace test but before the flags are set,
-				 * IO signal will be lost.
-				 */
-				if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk))
-					mask |= POLLOUT | POLLWRNORM;
 			}
+
+			/* Race breaker. If space is freed after
+			 * wspace test but before the flags are set,
+			 * IO signal will be lost.
+			 */
+			if (sk_stream_wspace(sk) >=
+			    sk_stream_min_wspace(sk))
+				mask |= POLLOUT | POLLWRNORM;
 		} else
 			mask |= POLLOUT | POLLWRNORM;
 
@@ -634,7 +632,7 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
 
 		tcp_mark_urg(tp, flags);
 		__tcp_push_pending_frames(sk, mss_now,
-					  (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
+				(flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
 	}
 }
 
@@ -839,6 +837,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 	int err;
 	ssize_t copied;
 	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
+	int ass_res = 0;
 
 	/* Wait for a connection to finish. One exception is TCP Fast Open
 	 * (passive side) where data is allowed to be sent before a connection
@@ -846,7 +845,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 	 */
 	if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
 	    !tcp_passive_fastopen(sk)) {
-		if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
+		ass_res = (err = sk_stream_wait_connect(sk, &timeo));
+		if (ass_res != 0)
 			goto out_err;
 	}
 
@@ -864,7 +864,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 		int copy, i;
 		bool can_coalesce;
 
-		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
+		ass_res = (copy = size_goal - skb->len);
+		if (!tcp_send_head(sk) || ass_res <= 0) {
 new_segment:
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;
@@ -911,7 +912,9 @@ new_segment:
 
 		copied += copy;
 		offset += copy;
-		if (!(size -= copy))
+
+		ass_res = (size -= copy);
+		if (!ass_res)
 			goto out;
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
@@ -929,7 +932,8 @@ wait_for_sndbuf:
 wait_for_memory:
 		tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
 
-		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
+		ass_res = (err = sk_stream_wait_memory(sk, &timeo));
+		if (ass_res != 0)
 			goto do_error;
 
 		mss_now = tcp_send_mss(sk, &size_goal, flags);
@@ -1029,6 +1033,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
 	bool sg;
 	long timeo;
+	int ass_res = 0;
 
 	lock_sock(sk);
 
@@ -1050,7 +1055,8 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	 */
 	if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
 	    !tcp_passive_fastopen(sk)) {
-		if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
+		ass_res = (err = sk_stream_wait_connect(sk, &timeo));
+		if (ass_res != 0)
 			goto do_error;
 	}
 
@@ -1099,7 +1105,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		}
 
 		while (seglen > 0) {
-			int copy = 0;
+			int copy = 0, ass_res = 0;
 			int max = size_goal;
 
 			skb = tcp_write_queue_tail(sk);
@@ -1123,8 +1129,7 @@ new_segment:
 				if (!skb)
 					goto wait_for_memory;
 
-				/*
-				 * Check whether we can use HW checksum.
+				/* Check whether we can use HW checksum.
 				 */
 				if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
 					skb->ip_summed = CHECKSUM_PARTIAL;
@@ -1162,7 +1167,8 @@ new_segment:
 					merge = false;
 				}
 
-				copy = min_t(int, copy, pfrag->size - pfrag->offset);
+				copy = min_t(int, copy,
+					pfrag->size - pfrag->offset);
 
 				if (!sk_wmem_schedule(sk, copy))
 					goto wait_for_memory;
@@ -1176,7 +1182,8 @@ new_segment:
 
 				/* Update the skb. */
 				if (merge) {
-					skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
+					skb_frag_size_add(
+					  &skb_shinfo(skb)->frags[i - 1], copy);
 				} else {
 					skb_fill_page_desc(skb, i, pfrag->page,
 							   pfrag->offset, copy);
@@ -1194,15 +1201,19 @@ new_segment:
 
 			from += copy;
 			copied += copy;
-			if ((seglen -= copy) == 0 && iovlen == 0)
+			ass_res = (seglen -= copy);
+			if (ass_res == 0 && iovlen == 0)
 				goto out;
 
-			if (skb->len < max || (flags & MSG_OOB) || unlikely(tp->repair))
+			if (skb->len < max ||
+			   (flags & MSG_OOB) ||
+			   unlikely(tp->repair))
 				continue;
 
 			if (forced_push(tp)) {
 				tcp_mark_push(tp, skb);
-				__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
+				__tcp_push_pending_frames(sk, mss_now,
+					TCP_NAGLE_PUSH);
 			} else if (skb == tcp_send_head(sk))
 				tcp_push_one(sk, mss_now);
 			continue;
@@ -1211,9 +1222,11 @@ wait_for_sndbuf:
 			set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
 			if (copied)
-				tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
+				tcp_push(sk, flags & ~MSG_MORE,
+					mss_now, TCP_NAGLE_PUSH);
 
-			if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
+			ass_res = (err = sk_stream_wait_memory(sk, &timeo));
+			if (ass_res != 0)
 				goto do_error;
 
 			mss_now = tcp_send_mss(sk, &size_goal, flags);
@@ -1246,8 +1259,7 @@ out_err:
 }
 EXPORT_SYMBOL(tcp_sendmsg);
 
-/*
- *	Handle reading urgent data. BSD has very simple semantics for
+/*	Handle reading urgent data. BSD has very simple semantics for
  *	this, no blocking and very strange errors 8)
  */
 
@@ -1333,7 +1345,8 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
 	if (inet_csk_ack_scheduled(sk)) {
 		const struct inet_connection_sock *icsk = inet_csk(sk);
 		   /* Delayed ACKs frequently hit locked sockets during bulk
-		    * receive. */
+		    * receive.
+		    */
 		if (icsk->icsk_ack.blocked ||
 		    /* Once-per-two-segments ACK was not sent by tcp_input.c */
 		    tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
@@ -1366,7 +1379,8 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
 
 			/* Send ACK now, if this read freed lots of space
 			 * in our buffer. Certainly, new_window is new window.
-			 * We can advertise it now, if it is not less than current one.
+			 * We can advertise it now, if it is not less than
+			 * current one.
 			 * "Lots" means "at least twice" here.
 			 */
 			if (new_window && new_window >= 2 * rcv_window_now)
@@ -1385,7 +1399,8 @@ static void tcp_prequeue_process(struct sock *sk)
 	NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPPREQUEUED);
 
 	/* RX process wants to run with disabled BHs, though it is not
-	 * necessary */
+	 * necessary
+	 */
 	local_bh_disable();
 	while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
 		sk_backlog_rcv(sk, skb);
@@ -1445,8 +1460,7 @@ static inline struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 	return NULL;
 }
 
-/*
- * This routine provides an alternative to tcp_recvmsg() for routines
+/* This routine provides an alternative to tcp_recvmsg() for routines
  * that would like to handle copying from skbuffs directly in 'sendfile'
  * fashion.
  * Note:
@@ -1526,8 +1540,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 }
 EXPORT_SYMBOL(tcp_read_sock);
 
-/*
- *	This routine copies from a sock struct into the user buffer.
+/*	This routine copies from a sock struct into the user buffer.
  *
  *	Technical note: in 2.3 we work on _locked_ socket, so that
  *	tricks with *seq access order and skb->users are not required.
@@ -1610,12 +1623,15 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	do {
 		u32 offset;
 
-		/* Are we at urgent data? Stop if we have read anything or have SIGURG pending. */
+		/* Are we at urgent data? Stop if we have read
+		 * anything or have SIGURG pending.
+		 */
 		if (tp->urg_data && tp->urg_seq == *seq) {
 			if (copied)
 				break;
 			if (signal_pending(current)) {
-				copied = timeo ? sock_intr_errno(timeo) : -EAGAIN;
+				copied = timeo ?
+					sock_intr_errno(timeo) : -EAGAIN;
 				break;
 			}
 		}
@@ -1744,7 +1760,8 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				tcp_service_net_dma(sk, true);
 				tcp_cleanup_rbuf(sk, copied);
 			} else
-				dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+				dma_async_memcpy_issue_pending(
+					tp->ucopy.dma_chan);
 		}
 #endif
 		if (copied >= target) {
@@ -1760,12 +1777,15 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 #endif
 
 		if (user_recv) {
-			int chunk;
+			int chunk, ass_res = 0;
 
 			/* __ Restore normal policy in scheduler __ */
 
-			if ((chunk = len - tp->ucopy.len) != 0) {
-				NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG, chunk);
+			ass_res = (chunk = len - tp->ucopy.len);
+			if (ass_res != 0) {
+				NET_ADD_STATS_USER(sock_net(sk),
+					LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG,
+					chunk);
 				len -= chunk;
 				copied += chunk;
 			}
@@ -1775,8 +1795,11 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 do_prequeue:
 				tcp_prequeue_process(sk);
 
-				if ((chunk = len - tp->ucopy.len) != 0) {
-					NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
+				ass_res = (chunk = len - tp->ucopy.len);
+				if (ass_res != 0) {
+					NET_ADD_STATS_USER(sock_net(sk),
+					    LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE,
+					    chunk);
 					len -= chunk;
 					copied += chunk;
 				}
@@ -1791,7 +1814,7 @@ do_prequeue:
 		}
 		continue;
 
-	found_ok_skb:
+found_ok_skb:
 		/* Ok so how much can we use? */
 		used = skb->len - offset;
 		if (len < used)
@@ -1800,19 +1823,18 @@ do_prequeue:
 		/* Do we have urgent data here? */
 		if (tp->urg_data) {
 			u32 urg_offset = tp->urg_seq - *seq;
-			if (urg_offset < used) {
-				if (!urg_offset) {
-					if (!sock_flag(sk, SOCK_URGINLINE)) {
-						++*seq;
-						urg_hole++;
-						offset++;
-						used--;
-						if (!used)
-							goto skip_copy;
-					}
-				} else
-					used = urg_offset;
+			if (urg_offset < used && !urg_offset) {
+				if (!sock_flag(sk, SOCK_URGINLINE)) {
+					++*seq;
+					urg_hole++;
+					offset++;
+					used--;
+					if (!used)
+						goto skip_copy;
+				}
 			}
+			if (urg_offset < used && urg_offset)
+				used = urg_offset;
 		}
 
 		if (!(flags & MSG_TRUNC)) {
@@ -1821,7 +1843,9 @@ do_prequeue:
 				tp->ucopy.dma_chan = net_dma_find_channel();
 
 			if (tp->ucopy.dma_chan) {
-				tp->ucopy.dma_cookie = dma_skb_copy_datagram_iovec(
+				tp->ucopy.dma_cookie =
+					dma_skb_copy_datagram_iovec(
+
 					tp->ucopy.dma_chan, skb, offset,
 					msg->msg_iov, used,
 					tp->ucopy.pinned_list);
@@ -1837,7 +1861,8 @@ do_prequeue:
 					break;
 				}
 
-				dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+				dma_async_memcpy_issue_pending(
+					tp->ucopy.dma_chan);
 
 				if ((offset + used) == skb->len)
 					copied_early = true;
@@ -1878,7 +1903,7 @@ skip_copy:
 		}
 		continue;
 
-	found_fin_ok:
+found_fin_ok:
 		/* Process the FIN. */
 		++*seq;
 		if (!(flags & MSG_PEEK)) {
@@ -1890,14 +1915,17 @@ skip_copy:
 
 	if (user_recv) {
 		if (!skb_queue_empty(&tp->ucopy.prequeue)) {
-			int chunk;
+			int chunk, ass_res = 0;
 
 			tp->ucopy.len = copied > 0 ? len : 0;
 
 			tcp_prequeue_process(sk);
 
-			if (copied > 0 && (chunk = len - tp->ucopy.len) != 0) {
-				NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
+			ass_res = (chunk = len - tp->ucopy.len);
+			if (copied > 0 && ass_res != 0) {
+				NET_ADD_STATS_USER(sock_net(sk),
+					LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE,
+					chunk);
 				len -= chunk;
 				copied += chunk;
 			}
@@ -1971,13 +1999,13 @@ void tcp_set_state(struct sock *sk, int state)
 	sk->sk_state = state;
 
 #ifdef STATE_TRACE
-	SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk, statename[oldstate], statename[state]);
+	SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk,
+		statename[oldstate], statename[state]);
 #endif
 }
 EXPORT_SYMBOL_GPL(tcp_set_state);
 
-/*
- *	State processing on a close. This implements the state shift for
+/*	State processing on a close. This implements the state shift for
  *	sending our FIN frame. Note that we only send a FIN for some
  *	states. A shutdown() may have already sent the FIN, or we may be
  *	closed.
@@ -2009,8 +2037,7 @@ static int tcp_close_state(struct sock *sk)
 	return next & TCP_ACTION_FIN;
 }
 
-/*
- *	Shutdown the sending side of a connection. Much like close except
+/*	Shutdown the sending side of a connection. Much like close except
  *	that we don't receive shut down or sock_set_flag(sk, SOCK_DEAD).
  */
 
@@ -2125,7 +2152,7 @@ void tcp_close(struct sock *sk, long timeout)
 		 * required by specs (TCP_ESTABLISHED, TCP_CLOSE_WAIT, when
 		 * they look as CLOSING or LAST_ACK for Linux)
 		 * Probably, I missed some more holelets.
-		 * 						--ANK
+		 *                                             --ANK
 		 * XXX (TFO) - To start off we don't support SYN+ACK+FIN
 		 * in a single packet! (May consider it later but will
 		 * probably need API support or TCP_CORK SYN-ACK until
@@ -2235,6 +2262,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	int err = 0;
+	int ass_res = 0;
 	int old_state = sk->sk_state;
 
 	if (old_state != TCP_CLOSE)
@@ -2272,7 +2300,8 @@ int tcp_disconnect(struct sock *sk, int flags)
 	sk->sk_shutdown = 0;
 	sock_reset_flag(sk, SOCK_DONE);
 	tp->srtt = 0;
-	if ((tp->write_seq += tp->max_window + 2) == 0)
+	ass_res = (tp->write_seq += tp->max_window + 2);
+	if (ass_res == 0)
 		tp->write_seq = 1;
 	icsk->icsk_backoff = 0;
 	tp->snd_cwnd = 2;
@@ -2358,8 +2387,7 @@ static int tcp_repair_options_est(struct tcp_sock *tp,
 	return 0;
 }
 
-/*
- *	Socket option code for TCP.
+/*	Socket option code for TCP.
  */
 static int do_tcp_setsockopt(struct sock *sk, int level,
 		int optname, char __user *optval, unsigned int optlen)
@@ -2491,7 +2519,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 	case TCP_MAXSEG:
 		/* Values greater than interface MTU won't take effect. However
 		 * at the point when this call is done we typically don't yet
-		 * know which interface is going to be used */
+		 * know which interface is going to be used
+		 */
+
 		if (val < TCP_MIN_MSS || val > MAX_TCP_WINDOW) {
 			err = -EINVAL;
 			break;
@@ -2509,6 +2539,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 			 * an explicit push, which overrides even TCP_CORK
 			 * for currently queued segments.
 			 */
+
 			tp->nonagle |= TCP_NAGLE_OFF|TCP_NAGLE_PUSH;
 			tcp_push_pending_frames(sk);
 		} else {
@@ -2786,7 +2817,8 @@ void tcp_get_info(const struct sock *sk, struct tcp_info *info)
 	info->tcpi_fackets = tp->fackets_out;
 
 	info->tcpi_last_data_sent = jiffies_to_msecs(now - tp->lsndtime);
-	info->tcpi_last_data_recv = jiffies_to_msecs(now - icsk->icsk_ack.lrcvtime);
+	info->tcpi_last_data_recv =
+		 jiffies_to_msecs(now - icsk->icsk_ack.lrcvtime);
 	info->tcpi_last_ack_recv = jiffies_to_msecs(now - tp->rcv_tstamp);
 
 	info->tcpi_pmtu = icsk->icsk_pmtu_cookie;
@@ -3378,12 +3410,12 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
 }
 EXPORT_SYMBOL(tcp_md5_hash_skb_data);
 
-int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *key)
+int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *k)
 {
 	struct scatterlist sg;
 
-	sg_init_one(&sg, key->key, key->keylen);
-	return crypto_hash_update(&hp->md5_desc, &sg, key->keylen);
+	sg_init_one(&sg, k->key, k->keylen);
+	return crypto_hash_update(&hp->md5_desc, &sg, k->keylen);
 }
 EXPORT_SYMBOL(tcp_md5_hash_key);
 
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net-next V4 04/13] bridge: Verify that a vlan is allowed to egress on give port
From: Shmulik Ladkani @ 2012-12-20 14:28 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-5-git-send-email-vyasevic@redhat.com>

Hi Vlad,

On Wed, 19 Dec 2012 12:48:15 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>  /* Don't forward packets to originating port or forwarding diasabled */
>  static inline int should_deliver(const struct net_bridge_port *p,
>  				 const struct sk_buff *skb)
>  {
>  	return (((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
> +		br_allowed_egress(p, skb) &&
>  		p->state == BR_STATE_FORWARDING);
>  }

This should also be incorporated into 'br_pass_frame_up' somehow.

Egress permission when leaving the bridge towards IP stack ("egress"
on the "bridge master port" from bridging point-of-view) should be
validated according to master port's membership.
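
To make the idea concrete, here is a rough user-space sketch, not actual
bridge code. It assumes the master port carries a VLAN membership bitmap
like any other port; the names vlan_membership, vlan_add, vlan_allowed and
may_pass_frame_up are made up for illustration and do not exist in the
patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_N_VID	4096
#define WORD_BITS	64

/* Hypothetical per-port VLAN membership bitmap (one bit per VLAN id). */
struct vlan_membership {
	uint64_t bits[VLAN_N_VID / WORD_BITS];
};

/* Mark 'vid' as a member VLAN of the port. */
static void vlan_add(struct vlan_membership *m, uint16_t vid)
{
	assert(vid < VLAN_N_VID);
	m->bits[vid / WORD_BITS] |= UINT64_C(1) << (vid % WORD_BITS);
}

/* Return true if 'vid' is a member VLAN of the port. */
static bool vlan_allowed(const struct vlan_membership *m, uint16_t vid)
{
	if (vid >= VLAN_N_VID)
		return false;
	return (m->bits[vid / WORD_BITS] >> (vid % WORD_BITS)) & 1;
}

/* Hypothetical check to run before handing a frame to the IP stack:
 * treat the bridge master as one more port and apply its own
 * membership as the egress filter.
 */
static bool may_pass_frame_up(const struct vlan_membership *master,
			      uint16_t vid)
{
	return vlan_allowed(master, vid);
}
```

With something like this, a frame tagged with a VLAN the master port is
not a member of would be dropped instead of being delivered locally.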

Regards,
Shmulik

^ permalink raw reply

* [PATCH 1/3] iproute2: distinguish permanent and temporary mdb entries
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, bridge, Cong Wang

This patch adds a flag to mdb entries so that we can distinguish
permanent entries from temporary ones.
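
As a minimal illustration of how the flag is meant to be used, the sketch
below mirrors the keyword handling and the printing logic from the diff.
It is a simplified stand-in: struct mdb_entry_sketch, mdb_apply_keyword
and mdb_state_name are hypothetical names, and the RTM_NEWMDB check from
mdb_modify() is left out.

```c
#include <stdint.h>
#include <string.h>

#define MDB_TEMPORARY 0
#define MDB_PERMANENT 1

/* Simplified stand-in for struct br_mdb_entry: only the fields used here. */
struct mdb_entry_sketch {
	uint32_t ifindex;
	uint8_t state;
};

/* Apply an optional trailing keyword to the entry.
 * Returns 0 if the keyword was recognised, -1 otherwise.
 */
static int mdb_apply_keyword(struct mdb_entry_sketch *e, const char *kw)
{
	if (strcmp(kw, "permanent") == 0) {
		e->state |= MDB_PERMANENT;
		return 0;
	}
	if (strcmp(kw, "temp") == 0)
		return 0;	/* temporary is the default, nothing to set */
	return -1;
}

/* Name printed at the end of each mdb line. */
static const char *mdb_state_name(const struct mdb_entry_sketch *e)
{
	return (e->state & MDB_PERMANENT) ? "permanent" : "temp";
}
```

An entry created without either keyword keeps state == MDB_TEMPORARY and
is therefore shown as "temp".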

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 bridge/mdb.c              |   24 +++++++++++++++---------
 include/linux/if_bridge.h |    3 +++
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 121ce9c..6217c5f 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -28,7 +28,7 @@ int filter_index;
 
 static void usage(void)
 {
-	fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP\n");
+	fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp]\n");
 	fprintf(stderr, "       bridge mdb {show} [ dev DEV ]\n");
 	exit(-1);
 }
@@ -53,13 +53,15 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e)
 	SPRINT_BUF(abuf);
 
 	if (e->addr.proto == htons(ETH_P_IP))
-		fprintf(f, "bridge %s port %s group %s\n", ll_index_to_name(ifindex),
+		fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
 			ll_index_to_name(e->ifindex),
-			inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)));
+			inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)),
+			(e->state & MDB_PERMANENT) ? "permanent" : "temp");
 	else
-		fprintf(f, "bridge %s port %s group %s\n", ll_index_to_name(ifindex),
+		fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
 			ll_index_to_name(e->ifindex),
-			inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)));
+			inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)),
+			(e->state & MDB_PERMANENT) ? "permanent" : "temp");
 }
 
 static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr)
@@ -179,11 +181,15 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
 		} else if (strcmp(*argv, "grp") == 0) {
 			NEXT_ARG();
 			grp = *argv;
+		} else if (strcmp(*argv, "port") == 0) {
+			NEXT_ARG();
+			p = *argv;
+		} else if (strcmp(*argv, "permanent") == 0) {
+			if (cmd == RTM_NEWMDB)
+				entry.state |= MDB_PERMANENT;
+		} else if (strcmp(*argv, "temp") == 0) {
+			;/* nothing */
 		} else {
-			if (strcmp(*argv, "port") == 0) {
-				NEXT_ARG();
-				p = *argv;
-			}
 			if (matches(*argv, "help") == 0)
 				usage();
 		}
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b3b6a67..aac8b8c 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -163,6 +163,9 @@ struct br_port_msg {
 
 struct br_mdb_entry {
 	__u32 ifindex;
+#define MDB_TEMPORARY 0
+#define MDB_PERMANENT 1
+	__u8 state;
 	struct {
 		union {
 			__be32	ip4;
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 2/3] iproute2: update help info of bridge command
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
  To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger
In-Reply-To: <1356013915-20835-1-git-send-email-amwang@redhat.com>

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 bridge/bridge.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index 1fcd365..1d59a1e 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -27,7 +27,7 @@ static void usage(void)
 {
 	fprintf(stderr,
 "Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
-"where  OBJECT := { fdb |  monitor }\n"
+"where  OBJECT := { fdb |  mdb | monitor }\n"
 "       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails]\n" );
 	exit(-1);
 }
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 3/3] iproute2: make `bridge mdb` output consistent with input
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
  To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger
In-Reply-To: <1356013915-20835-1-git-send-email-amwang@redhat.com>

bridge -> dev
group -> grp

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 bridge/mdb.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 6217c5f..81d479b 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -53,12 +53,12 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e)
 	SPRINT_BUF(abuf);
 
 	if (e->addr.proto == htons(ETH_P_IP))
-		fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
+		fprintf(f, "dev %s port %s grp %s %s\n", ll_index_to_name(ifindex),
 			ll_index_to_name(e->ifindex),
 			inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)),
 			(e->state & MDB_PERMANENT) ? "permanent" : "temp");
 	else
-		fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
+		fprintf(f, "dev %s port %s grp %s %s\n", ll_index_to_name(ifindex),
 			ll_index_to_name(e->ifindex),
 			inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)),
 			(e->state & MDB_PERMANENT) ? "permanent" : "temp");
-- 
1.7.7.6

^ permalink raw reply related

* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 14:58 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
	xen-devel@lists.xensource.com, Ian Campbell,
	Konrad Rzeszutek Wilk
In-Reply-To: <1457826869.20121220152326@eikelenboom.it>


Thursday, December 20, 2012, 3:23:26 PM, you wrote:


> Thursday, December 20, 2012, 1:51:39 PM, you wrote:


>> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:

>>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:

>>>> Hi Ian,
>>>> 
>>>> It ran overnight and i haven't seen the warn_once trigger.
>>>> (but i also didn't with the previous patch)
>>>> 

>>> As I said, the minimum value to not trigger the warning was what Ian's
>>> patch was doing, but it was still not an accurate estimation.

>>> Doing the real accounting might trigger slow transfers, or dropped
>>> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.

>>> So the real question was: if accounting for full pages, do your
>>> applications run as smoothly as before, with no huge performance
>>> regression?

>> OK, I have added some extra debug info (see the diffs below). The code still uses the old calculation for truesize (in the hope of triggering the warn_on_once again), but also calculates the variants IanC came up with.

>> I haven't got a clear test case to trigger the warn_on_once; it happens just every once in a while during my normal usage and I'm not a netperf expert :-)
>> So at the moment I haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..

>> - The first variant (current code) seems to be the most efficient and a good estimation *most* of the time, but sometimes triggers the warn_on_once in skb_try_coalesce.
>> - The first variant (current code) seems to always subtract from the truesize for small packets.
>> - The second variant seems to keep the truesize as is for most of the small network traffic, and it also seems to work OK for larger packets.
>> - The third variant seems to be a pretty wasteful estimation.

>> So the last variant seems to be rather wasteful, and the second one the most accurate so far.

>> Eric:
>>      From the warn_on_once, delta should not be smaller than len, but probably they should be as close together as possible.
>>      When you say "accurate estimation", what would be an acceptable difference between DELTA and LEN?



>> [  116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
>> [  117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
>> [  117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
>> [  117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
>> [  117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
>> [  117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
>> [  117.539403] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
>> [  117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
>> [  117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
>> [  121.981917] net_ratelimit: 27 callbacks suppressed
>> [  121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)



>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index c26e28b..8833e38 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>>         struct sk_buff_head tmpq;
>>         unsigned long flags;
>>         int err;
>> +       int tsz,len;

>>         spin_lock(&np->rx_lock);

>> @@ -1037,9 +1038,22 @@ err:
>>                  * receive throughout using the standard receive
>>                  * buffer size was cut by 25%(!!!).
>>                  */
>> -               skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> +
>> +
>> +
>> +
>> +                tsz = skb->truesize;
>> +                len = skb->len;
>> +                /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
>> +                skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>>                 skb->len += skb->data_len;

>> +               net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
>> +                        skb->dev->name, skb->dev->mtu, skb->data_len, len,  skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
>> +                        skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
>> +                        skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
>> +                        PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
>> +
>>                 if (rx->flags & XEN_NETRXF_csum_blank)
>>                         skb->ip_summed = CHECKSUM_PARTIAL;
>>                 else if (rx->flags & XEN_NETRXF_data_validated)
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 3ab989b..6d0cd86 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,

>>         WARN_ON_ONCE(delta < len);

>> +       if(delta < len) {
>> +               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> +                        to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> +       }
>> +
>> +       if (delta > len && delta - len > 100) {
>> +               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> +                        to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> +       }
>> +
>>         memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>>                skb_shinfo(from)->frags,
>>                skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));



> OK, I succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was rate limited away for the offending packet :(

> Dec 20 15:17:33 media kernel: [  393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [  393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [  393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [  393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [  393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  394.940211] ------------[ cut here ]------------
> Dec 20 15:17:35 media kernel: [  394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
> Dec 20 15:17:35 media kernel: [  394.940282] Modules linked in:
> Dec 20 15:17:35 media kernel: [  394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
> Dec 20 15:17:35 media kernel: [  394.940330] Call Trace:
> Dec 20 15:17:35 media kernel: [  394.940343]  <IRQ>  [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
> Dec 20 15:17:35 media kernel: [  394.940384]  [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
> Dec 20 15:17:35 media kernel: [  394.940409]  [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
> Dec 20 15:17:35 media kernel: [  394.940434]  [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
> Dec 20 15:17:35 media kernel: [  394.940458]  [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
> Dec 20 15:17:35 media kernel: [  394.940481]  [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
> Dec 20 15:17:35 media kernel: [  394.940504]  [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
> Dec 20 15:17:35 media kernel: [  394.940528]  [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [  394.940551]  [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
> Dec 20 15:17:35 media kernel: [  394.940576]  [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
> Dec 20 15:17:35 media kernel: [  394.940600]  [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [  394.940623]  [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
> Dec 20 15:17:35 media kernel: [  394.940666]  [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [  394.940694]  [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
> Dec 20 15:17:35 media kernel: [  394.940720]  [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
> Dec 20 15:17:35 media kernel: [  394.940745]  [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
> Dec 20 15:17:35 media kernel: [  394.940784]  [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
> Dec 20 15:17:35 media kernel: [  394.940807]  [<ffffffff818e4b68>] ip_rcv+0x218/0x300
> Dec 20 15:17:35 media kernel: [  394.940829]  [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
> Dec 20 15:17:35 media kernel: [  394.940853]  [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
> Dec 20 15:17:35 media kernel: [  394.940889]  [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
> Dec 20 15:17:35 media kernel: [  394.940914]  [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
> Dec 20 15:17:35 media kernel: [  394.940939]  [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
> Dec 20 15:17:35 media kernel: [  394.940964]  [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
> Dec 20 15:17:35 media kernel: [  394.941000]  [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
> Dec 20 15:17:35 media kernel: [  394.941024]  [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
> Dec 20 15:17:35 media kernel: [  394.941046]  [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [  394.941075]  [<ffffffff8184ed66>] net_rx_action+0x136/0x260
> Dec 20 15:17:35 media kernel: [  394.941098]  [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
> Dec 20 15:17:35 media kernel: [  394.941133]  [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
> Dec 20 15:17:35 media kernel: [  394.941157]  [<ffffffff819b623c>] call_softirq+0x1c/0x30
> Dec 20 15:17:35 media kernel: [  394.941179]  [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
> Dec 20 15:17:35 media kernel: [  394.941201]  [<ffffffff8107041e>] irq_exit+0x9e/0xd0
> Dec 20 15:17:35 media kernel: [  394.941235]  [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
> Dec 20 15:17:35 media kernel: [  394.941259]  [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
> Dec 20 15:17:35 media kernel: [  394.941279]  <EOI>  [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [  394.941318]  [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [  394.941356]  [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
> Dec 20 15:17:35 media kernel: [  394.941381]  [<ffffffff810092b2>] ? check_events+0x12/0x20
> Dec 20 15:17:35 media kernel: [  394.941405]  [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
> Dec 20 15:17:35 media kernel: [  394.941432]  [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
> Dec 20 15:17:35 media kernel: [  394.941473]  [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
> Dec 20 15:17:35 media kernel: [  394.941507]  [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
> Dec 20 15:17:35 media kernel: [  394.941533]  [<ffffffff819b2434>] ? __schedule+0x444/0x880
> Dec 20 15:17:35 media kernel: [  394.941555]  [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [  394.941580]  [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [  394.941614]  [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [  394.941638]  [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
> Dec 20 15:17:35 media kernel: [  394.941663]  [<ffffffff819b2904>] ? schedule+0x24/0x70
> Dec 20 15:17:35 media kernel: [  394.941697]  [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
> Dec 20 15:17:35 media kernel: [  394.941725]  [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [  394.941748]  [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
> Dec 20 15:17:35 media kernel: [  394.941770]  [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
> Dec 20 15:17:35 media kernel: [  394.941808]  [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
> Dec 20 15:17:35 media kernel: [  394.941833]  [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
> Dec 20 15:17:35 media kernel: [  394.941856]  [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
> Dec 20 15:17:35 media kernel: [  394.941878]  [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
> Dec 20 15:17:35 media kernel: [  394.941913]  [<ffffffff810b5b17>] ? lock_release+0x117/0x250
> Dec 20 15:17:35 media kernel: [  394.941938]  [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
> Dec 20 15:17:35 media kernel: [  394.941959]  [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
> Dec 20 15:17:35 media kernel: [  394.941981]  [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
> Dec 20 15:17:35 media kernel: [  394.942015]  [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
> Dec 20 15:17:35 media kernel: [  394.942036] ---[ end trace 6f3a832c9e91c8af ]---
> Dec 20 15:17:35 media kernel: [  394.942056] to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
> Dec 20 15:17:35 media kernel: [  394.968199] to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
> Dec 20 15:17:35 media kernel: [  395.262814] net_ratelimit: 371 callbacks suppressed
> Dec 20 15:17:35 media kernel: [  395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)


Hmm, perhaps a better example. I have indented the log lines and marked some possibly interesting points:

        Dec 20 14:12:57 media kernel: [  794.895136] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
        Dec 20 14:12:57 media kernel: [  794.895431] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
        Dec 20 14:12:57 media kernel: [  794.895616] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18824(19720) variant3:24576(25472)
        Dec 20 14:12:57 media kernel: [  794.895804] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
        Dec 20 14:12:57 media kernel: [  794.895823] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:3 variant1:7050(7946) variant2:7050(7946) variant3:12288(13184)
        Dec 20 14:12:57 media kernel: [  794.895868] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:3
        Dec 20 14:12:57 media kernel: [  794.896133] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
        Dec 20 14:12:57 media kernel: [  794.896152] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
        Dec 20 14:12:57 media kernel: [  794.896200] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:1
        Dec 20 14:12:57 media kernel: [  794.907232] eth0: mtu:1500 data_len:23234 len before:0 len after:23234 truesize before:896 truesize after:23874 nr_frags:7 variant1:22978(23874) variant2:22978(23874) variant3:28672(29568)
        Dec 20 14:12:57 media kernel: [  794.907517] eth0: mtu:1500 data_len:24682 len before:0 len after:24682 truesize before:896 truesize after:25322 nr_frags:7 variant1:24426(25322) variant2:24426(25322) variant3:28672(29568)
        Dec 20 14:12:57 media kernel: [  794.907693] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)
        Dec 20 14:12:57 media kernel: [  794.907882] eth0: mtu:1500 data_len:14546 len before:0 len after:14546 truesize before:896 truesize after:15186 nr_frags:5 variant1:14290(15186) variant2:14290(15186) variant3:20480(21376)
        Dec 20 14:12:57 media kernel: [  794.907901] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
        Dec 20 14:12:57 media kernel: [  794.907938] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:13482 len:13032 from->truesize:13738 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:4
        Dec 20 14:12:57 media kernel: [  794.908191] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28880(29776) variant3:36864(37760)
        Dec 20 14:12:57 media kernel: [  794.908386] eth0: mtu:1500 data_len:30474 len before:0 len after:30474 truesize before:896 truesize after:31114 nr_frags:8 variant1:30218(31114) variant2:30218(31114) variant3:32768(33664)

A1) Here we have a packet with data_len: 5858, truesize set to 6498 and nr_frags: 2
        Dec 20 14:12:57 media kernel: [  794.908560] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:2 variant1:5602(6498) variant2:5602(6498) variant3:8192(9088)

        Dec 20 14:12:57 media kernel: [  794.908581] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)

A2) That seems to end up in skb_try_coalesce: from->nr_frags is still 2 and delta >> LEN in this case, so no warning, but perhaps wasteful?
        Dec 20 14:12:57 media kernel: [  794.908616] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:6242 len:5792 from->truesize:6498 skb_headlen(from):190 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2

        Dec 20 14:12:57 media kernel: [  794.908834] eth0: mtu:1500 data_len:33370 len before:0 len after:33370 truesize before:896 truesize after:34010 nr_frags:9 variant1:33114(34010) variant2:33114(34010) variant3:36864(37760)

B1) Here we again have a packet with data_len: 5858 and truesize set to 6498, but nr_frags: 3 this time.
        Dec 20 14:12:57 media kernel: [  794.908992] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:3 variant1:5602(6498) variant2:5792(6688) variant3:12288(13184)
        Dec 20 14:12:57 media kernel: [  794.909012] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:8 variant1:28770(29666) variant2:28770(29666) variant3:32768(33664)

B2) That seems to end up in skb_try_coalesce: from->nr_frags is now 2 instead of 3 and delta < LEN in this case, so it would have triggered the warn_on_once.
        Dec 20 14:12:57 media kernel: [  794.909040] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:5602 len:5792 from->truesize:6498 skb_headlen(from):0 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2

        Dec 20 14:12:57 media kernel: [  794.909673] eth0: mtu:1500 data_len:1514 len before:0 len after:1514 truesize before:896 truesize after:2154 nr_frags:1 variant1:1258(2154) variant2:1258(2154) variant3:4096(4992)
        Dec 20 14:12:57 media kernel: [  794.909692] eth0: mtu:1500 data_len:522 len before:0 len after:522 truesize before:896 truesize after:1162 nr_frags:1 variant1:266(1162) variant2:266(1162) variant3:4096(4992)
        Dec 20 14:12:57 media kernel: [  794.909736] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:906 len:456 from->truesize:1162 skb_headlen(from):190 skb_shinfo(to)->nr_frags:2 skb_shinfo(from)->nr_frags:1
        Dec 20 14:12:57 media kernel: [  794.910205] eth0: mtu:1500 data_len:36266 len before:0 len after:36266 truesize before:896 truesize after:36906 nr_frags:10 variant1:36010(36906) variant2:36010(36906) variant3:40960(41856)
        Dec 20 14:12:57 media kernel: [  794.910706] eth0: mtu:1500 data_len:37714 len before:0 len after:37714 truesize before:896 truesize after:38354 nr_frags:10 variant1:37458(38354) variant2:37458(38354) variant3:40960(41856)
        Dec 20 14:12:57 media kernel: [  794.911472] eth0: mtu:1500 data_len:27578 len before:0 len after:27578 truesize before:896 truesize after:28218 nr_frags:8 variant1:27322(28218) variant2:27322(28218) variant3:32768(33664)
        Dec 20 14:12:57 media kernel: [  794.911695] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28770(29666) variant3:36864(37760)
        Dec 20 14:12:57 media kernel: [  795.015511] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
        Dec 20 14:12:57 media kernel: [  795.015585] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:10 skb_shinfo(from)->nr_frags:1
        Dec 20 14:12:57 media kernel: [  795.015641] eth0: mtu:1500 data_len:10202 len before:0 len after:10202 truesize before:896 truesize after:10842 nr_frags:4 variant1:9946(10842) variant2:9946(10842) variant3:16384(17280)
        Dec 20 14:12:57 media kernel: [  795.015657] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
        Dec 20 14:12:58 media kernel: [  795.817824] net_ratelimit: 9 callbacks suppressed

--
Sander

^ permalink raw reply

* Re: [PATCH] net: ipv4: route: fixed a coding style issues net: ipv4: tcp: fixed a coding style issues
From: Eric Dumazet @ 2012-12-20 15:23 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: Stefan Hasko, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <50D2FF86.3000603@6wind.com>

On Thu, 2012-12-20 at 13:07 +0100, Nicolas Dichtel wrote:
> On 20/12/2012 09:08, Stefan Hasko wrote:

> > +				"out_hlist_search\n");
> checkpatch will warn you about this one, something like:
> "WARNING: quoted string split across lines".
> Not breaking such lines makes it easier to grep for the pattern.


Yes.

Could we please leave this file as is for at least 2 years?

We had a lot of recent changes, and fixes are likely to be needed.

Such "coding style" patches are a real pain when trying to fix bugs,
especially dealing with stable/old kernels.

Thanks

^ permalink raw reply

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
From: Yury Stankevich @ 2012-12-20 14:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D305FD.7000901@mojatatu.com>

interesting,

#tc -s filter show dev usb0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
0 terminal flowid ???  (rule hit 707 success 707)
  match 00000000/00000000 at 0 (success 707 )
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 5 ref 1 bind 1 installed 394 sec used 11 sec
	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device ifb0) stolen
 	index 5 ref 1 bind 1 installed 394 sec used 11 sec
 	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

So it looks like packets were sent to the CONNMARK target.

But...
I made an iptables rule to log packets with the 0xa mark:

Chain PREROUTING (policy ACCEPT 1308 packets, 848K bytes)
 pkts bytes target     prot opt in     out     source
destination
    0     0 NFLOG      all  --  *      *       0.0.0.0/0
0.0.0.0/0            mark match 0xa nflog-group 1

Chain POSTROUTING (policy ACCEPT 1240 packets, 550K bytes)
 pkts bytes target     prot opt in     out     source
destination
    1    40 CONNMARK   tcp  --  *      *       0.0.0.0/0
0.0.0.0/0            tcp dpt:80 connmark match  0x0 connbytes 204800
connbytes mode bytes connbytes direction both CONNMARK set 0xa

The idea is:
I start a download, and the POSTROUTING rule must fire once I have
downloaded more than ~200K.
The tc filter's call to CONNMARK restore must then restore the mark (0xa)
for packets belonging to this connection.
So I expect the PREROUTING rule to notice the restored mark, but it
doesn't.
Maybe I am missing something?


On 20.12.2012 16:35, Jamal Hadi Salim wrote:
> 
> Could be your setup. I didn't do a lot of testing but
> from my notes (running different kernel at the moment):
> 
> #try to point to everything (no iptables setup)
> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid
> 23:23 action xt -j CONNMARK --restore-mark
> #let it run for 1 sec, then display with
> tc -s filter show dev eth0 parent ffff:
> 
> ----
> filter protocol ip pref 49152 u32
> filter protocol ip pref 49152 u32 fh 800: ht divisor 1
> filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
> 0 flowid 23:23
>   match 00000000/00000000 at 0
>     action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
>     target  CONNMARK restore
>     index 1 ref 1 bind 1 installed 3 sec used 1 sec
>     Action statistics:
>     Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
> ----
> 
> cheers,
> jamal
> 
> On 12-12-20 03:54 AM, Yury Stankevich wrote:
>> On 19.12.2012 15:56, Jamal Hadi Salim wrote:
>>> Hasan/Yury, if you test this please use the latest iproute2 with only
>>> the first patch I posted (originally from Hasan). Hasan please use that
>>> patch not your version - if there's anything wrong we can find out sooner
>>> before the patch becomes final.
>>
>> Hello,
>> 3.7.1 kernel with 3.7.0 iproute,
>> patch-xt, xt-p1 + linkage fix were applied.
>> The command completes successfully, but doesn't actually work.
>>
>> command:
>> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>>              action xt -j CONNMARK --restore-mark \
>>              action mirred egress redirect dev ifb0
>> then I use this filter:
>>
>> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
>> 1:102
>>
>> iptables line:
>> iptables -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
>> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
>> bytes -j CONNMARK --set-mark 0xa
>>
>> Once I run a test downloading a 300K file,
>> the iptables counters show that the POSTROUTING rule is triggered,
>> but `tc -s qdisc show dev ifb0` shows that no packets were sent to the
>> 1:102 flow.
>>
>> btw,
>> tc -p -s filter show dev ifb0 parent 1:
>> does not show stats `(rule hit 416 success 0)` for this (filter protocol
>> ip pref 2 fw handle 0xa classid 1:102) rule.
>>
>>
>>
> 


-- 
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>

^ permalink raw reply

* Re: [PATCH net-next V4 02/13] bridge: Add vlan filtering infrastructure
From: Vlad Yasevich @ 2012-12-20 15:31 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <20121220153913.11a10fd0@pixies.home.jungo.com>

On 12/20/2012 08:39 AM, Shmulik Ladkani wrote:
> Hi Vlad,
>
> On Wed, 19 Dec 2012 12:48:13 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>> +static void nbp_vlan_flush(struct net_bridge_port *p)
>> +{
>> +	struct net_port_vlan *pve;
>> +	struct net_port_vlan *tmp;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)
>> +		nbp_vlan_delete(p, pve->vid, BRIDGE_FLAGS_SELF);
>
> Why would you want to clear "bridge master port" association from this
> vlan, in the event of NBP destruction?
> The "bridge port" may still be a member of this vlan, may it not?
> Seems flags argument should be 0.

This ends up getting fixed later, but you are right.  This should be 0.

>
>> +#define BR_VID_HASH_SIZE (1<<6)
>> +#define br_vlan_hash(vid) ((vid) % (BR_VID_HASH_SIZE - 1))
>
> Did you mean:                       & (BR_VID_HASH_SIZE - 1)

yes.

thanks
-vlad

>
> Regards,
> Shmulik
>

^ permalink raw reply

