Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Yurij M. Plotnikov @ 2012-12-20 11:22 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky
In-Reply-To: <20121220073445.GM18940@secunet.com>

On 12/20/12 11:34, Steffen Klassert wrote:
> On Wed, Dec 19, 2012 at 07:37:44PM +0000, Ben Hutchings wrote:
>    
>> On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote:
>>      
>>> On 12/19/12 17:35, Ben Hutchings wrote:
>>>        
>>>> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
>>>>
>>>>          
>>>>> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
>>>>> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
>>>>> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
>>>>> same and packet is always sent with "Don't Fragment" bit in case of
>>>>> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
>>>>>
>>>>>            
>>>> You could try reverting:
>>>>
>>>> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
>>>> Author: Steffen Klassert<steffen.klassert@secunet.com>
>>>> Date:   Mon Oct 8 00:56:54 2012 +0000
>>>>
>>>>       ipv4: Don't report stale pmtu values to userspace
>>>>
>>>>       We report cached pmtu values even if they are already expired.
>>>>       Change this to not report these values after they are expired
>>>>       and fix a race in the expire time calculation, as suggested by
>>>>       Eric Dumazet.
>>>>
>>>> Still, PMTU information is not supposed to expire for 10 minutes...
>>>>
>>>>
>>>>          
>>> With reverted commit there is no such problem on 3.7.1: IP_MTU is
>>> updated and DF is set only for the first packet in case of
>>> IP_PMTUDISC_WANT.
>>>        
>> [...]
>>
>> So it looks like something is going wrong with the expiry calculation
>> here.
>>
>> This change shouldn't affect the PMTU actually used by the kernel, but
>> could affect Onload since that relies on netlink route updates to keep
>> in synch.  You didn't say you were using Onload, but if you are then we
>> should not bother netdev with this until we can demonstrate a problem
>> that involves only the kernel stack.
>>
>>      
> I'm really surprised that this change can have such an effect,
> it changes nothing at the kernels pmtu handling. When looking
> at the code, I found that we may report a mtu value from a stale
> dst_entry when we query the mtu value with the IP_MTU socket
> option. But a subsequent send() should update the socket cached
> dst_entry, so at most one packet should be affected.
>
> Does the patch below change anything?
>
>
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 3c9d208..1049ce0 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
>   	{
>   		struct dst_entry *dst;
>   		val = 0;
> -		dst = sk_dst_get(sk);
> +		dst = sk_dst_check(sk, 0);
>   		if (dst) {
>   			val = dst_mtu(dst);
>   			dst_release(dst);
>    
With this patch kernel 3.7.1 works perfect. All described problems are 
fixed.

^ permalink raw reply

* Re: [PATCH] net: ipv4: route: fixed a coding style issues net: ipv4: tcp: fixed a coding style issues
From: Nicolas Dichtel @ 2012-12-20 12:07 UTC (permalink / raw)
  To: Stefan Hasko
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <1355990910-3688-1-git-send-email-hasko.stevo@gmail.com>

Le 20/12/2012 09:08, Stefan Hasko a écrit :
> Fix a coding style issues.
>
> Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
> ---
>   net/ipv4/route.c |  125 ++++++++++++++++++-------------
>   net/ipv4/tcp.c   |  218 +++++++++++++++++++++++++++++++-----------------------
>   2 files changed, 200 insertions(+), 143 deletions(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 844a9ef..fff7ce6 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -20,7 +20,7 @@
>    *		Alan Cox	:	Added BSD route gw semantics
>    *		Alan Cox	:	Super /proc >4K
>    *		Alan Cox	:	MTU in route table
> - *		Alan Cox	: 	MSS actually. Also added the window
> + *		Alan Cox	:	MSS actually. Also added the window
>    *					clamper.
>    *		Sam Lantinga	:	Fixed route matching in rt_del()
>    *		Alan Cox	:	Routing cache support.
> @@ -31,30 +31,35 @@
>    *	Miquel van Smoorenburg	:	BSD API fixes.
>    *	Miquel van Smoorenburg	:	Metrics.
>    *		Alan Cox	:	Use __u32 properly
> - *		Alan Cox	:	Aligned routing errors more closely with BSD
> + *		Alan Cox	:	Aligned routing errors more
> + *					closely with BSD
>    *					our system is still very different.
>    *		Alan Cox	:	Faster /proc handling
> - *	Alexey Kuznetsov	:	Massive rework to support tree based routing,
> + *	Alexey Kuznetsov	:	Massive rework to support
> + *					tree based routing,
>    *					routing caches and better behaviour.
>    *
>    *		Olaf Erb	:	irtt wasn't being copied right.
>    *		Bjorn Ekwall	:	Kerneld route support.
>    *		Alan Cox	:	Multicast fixed (I hope)
> - * 		Pavel Krauz	:	Limited broadcast fixed
> + *		Pavel Krauz	:	Limited broadcast fixed
>    *		Mike McLagan	:	Routing by source
>    *	Alexey Kuznetsov	:	End of old history. Split to fib.c and
>    *					route.c and rewritten from scratch.
>    *		Andi Kleen	:	Load-limit warning messages.
> - *	Vitaly E. Lavrov	:	Transparent proxy revived after year coma.
> + *	Vitaly E. Lavrov	:	Transparent proxy revived
> + *					after year coma.
>    *	Vitaly E. Lavrov	:	Race condition in ip_route_input_slow.
> - *	Tobias Ringstrom	:	Uninitialized res.type in ip_route_output_slow.
> + *	Tobias Ringstrom	:	Uninitialized res.type in
> + *					ip_route_output_slow.
>    *	Vladimir V. Ivanov	:	IP rule info (flowid) is really useful.
>    *		Marc Boucher	:	routing by fwmark
>    *	Robert Olsson		:	Added rt_cache statistics
>    *	Arnaldo C. Melo		:	Convert proc stuff to seq_file
> - *	Eric Dumazet		:	hashed spinlocks and rt_check_expire() fixes.
> - * 	Ilia Sotnikov		:	Ignore TOS on PMTUD and Redirect
> - * 	Ilia Sotnikov		:	Removed TOS from hash calculations
> + *	Eric Dumazet		:	hashed spinlocks and
> + *					rt_check_expire() fixes.
> + *	Ilia Sotnikov		:	Ignore TOS on PMTUD and Redirect
> + *	Ilia Sotnikov		:	Removed TOS from hash calculations
>    *
>    *		This program is free software; you can redistribute it and/or
>    *		modify it under the terms of the GNU General Public License
> @@ -65,7 +70,7 @@
>   #define pr_fmt(fmt) "IPv4: " fmt
>
>   #include <linux/module.h>
> -#include <asm/uaccess.h>
> +#include <linux/uaccess.h>
>   #include <linux/bitops.h>
>   #include <linux/types.h>
>   #include <linux/kernel.h>
> @@ -139,7 +144,8 @@ static unsigned int	 ipv4_default_advmss(const struct dst_entry *dst);
>   static unsigned int	 ipv4_mtu(const struct dst_entry *dst);
>   static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
>   static void		 ipv4_link_failure(struct sk_buff *skb);
> -static void		 ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
> +static void		 ip_rt_update_pmtu(struct dst_entry *dst,
> +					   struct sock *sk,
>   					   struct sk_buff *skb, u32 mtu);
>   static void		 ip_do_redirect(struct dst_entry *dst, struct sock *sk,
>   					struct sk_buff *skb);
> @@ -291,12 +297,17 @@ static int rt_cpu_seq_show(struct seq_file *seq, void *v)
>   	struct rt_cache_stat *st = v;
>
>   	if (v == SEQ_START_TOKEN) {
> -		seq_printf(seq, "entries  in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src  out_hit out_slow_tot out_slow_mc  gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
> +		seq_printf(seq, "entries  in_hit in_slow_tot in_slow_mc "
> +				"in_no_route in_brd in_martian_dst "
> +				"in_martian_src  out_hit out_slow_tot "
> +				"out_slow_mc  gc_total gc_ignored "
> +				"gc_goal_miss gc_dst_overflow in_hlist_search "
> +				"out_hlist_search\n");
checkpatch will warn you about this one, something like:
"WARNING: quoted string split across lines".
Not breaking such line ease to grep the pattern.

Nicolas

^ permalink raw reply

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
From: Jamal Hadi Salim @ 2012-12-20 12:35 UTC (permalink / raw)
  To: Yury Stankevich
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D2D229.6040802@gmail.com>


Could be your setup. I didnt do a lot of testing but
from my notes (running different kernel at the moment):

#try to point to everything (no iptables setup)
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid 
23:23 action xt -j CONNMARK --restore-mark
#let it run for a 1 sec then display with
tc -s filter show dev eth0 parent ffff:

----
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 
0 flowid 23:23
   match 00000000/00000000 at 0
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 1 ref 1 bind 1 installed 3 sec used 1 sec
	Action statistics:
	Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
----

cheers,
jamal

On 12-12-20 03:54 AM, Yury Stankevich wrote:
> 19.12.2012 15:56, Jamal Hadi Salim пишет:
>> Hasan/Yury, if you test this please use the latest iproute2 with only
>> the first patch I posted (originally from Hasan). Hasan please use that
>> patch not your version - if theres anything wrong we can find out sooner
>> before the patch becomes final.
>
> Hello,
> 3.7.1 kernel with 3.7.0 iproute,
> patch-xt, xt-p1 + linkage fix was applyed
> command successfully performed, but actually doesn't work.
>
> command:
> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>              action xt -j CONNMARK --restore-mark \
>              action mirred egress redirect dev ifb0
> then i use filter:
>
> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
> 1:102
>
> iptables line:
> iptable -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
> bytes -j CONNMARK --set-mark 0xa
>
> once i run a test to download 300K file,
> from iptables counters i can see that rule in POSTROUTING is triggered,
> but from `tc -s qdisc show dev ifb0` i see that no packets was sent to
> 1:102 flow.
>
> btw,
> tc -p -s filter show dev ifb0 parent 1:
> do not show stats `(rule hit 416 success 0)` for this (filter protocol
> ip pref 2 fw handle 0xa classid 1:102) rule.
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Steffen Klassert @ 2012-12-20 12:35 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky
In-Reply-To: <50D2F4E5.4050904@oktetlabs.ru>

On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote:
> On 12/20/12 11:34, Steffen Klassert wrote:
> >
> >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> >index 3c9d208..1049ce0 100644
> >--- a/net/ipv4/ip_sockglue.c
> >+++ b/net/ipv4/ip_sockglue.c
> >@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
> >  	{
> >  		struct dst_entry *dst;
> >  		val = 0;
> >-		dst = sk_dst_get(sk);
> >+		dst = sk_dst_check(sk, 0);
> >  		if (dst) {
> >  			val = dst_mtu(dst);
> >  			dst_release(dst);
> With this patch kernel 3.7.1 works perfect. All described problems
> are fixed.

Thanks for testing!

I'm not sure if we can't use this as a fix. I think with this patch it
could happen that we return -ENOTCONN instead of a pmtu value on a
connected socket. Perhaps it is better to update the cached dst_entry in
ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some
investigation.

Anyway, it is still odd that reverting my other patch 'fixes'
this issue too.

^ permalink raw reply

* RE: TCP delayed ACK heuristic
From: Cong Wang @ 2012-12-20 12:41 UTC (permalink / raw)
  To: David Laight
  Cc: David Miller, rick.jones2, netdev, greearb, eric.dumazet,
	shemminger, tgraf
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70FC@saturn3.aculab.com>

On Thu, 2012-12-20 at 09:57 +0000, David Laight wrote:
> > So, can we at least have a sysctl to control the timeout of the delayed
> > ACK? I mean the minimum 40ms. TCP_QUICKACK can help too, but it requires
> > the receiver to modify the application and has to be set every time when
> > calling recv().
> 
> A sysctl in inappropriate - it affects the entire TCP protocol stack.
> 
> You want different behaviour for different remote hosts (probably
> different subnets).
> In particular your local subnet is unlikely to have packet loss
> and very likely to have a very low RTT.
> 
> AFAICT a lot of the recent 'tuning' has been done for web/ftp
> servers that are very remote from the client. These connections
> are also request-response ones - quite often with large responses.
> 
> IMHO This has been to the detriment of local connections.
> 

A customer prefers faster response in their low-loss environment, 40ms
is not good. Of course, they are supposed to know their environment when
they tune this.

Or maybe a sysctl equals to TCP_QUICKACK?

^ permalink raw reply

* Re: [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 12:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
	annie li, xen-devel@lists.xensource.com
In-Reply-To: <1355933869.21834.13.camel@edumazet-glaptop>


Wednesday, December 19, 2012, 5:17:49 PM, you wrote:

> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:

>> Hi Ian,
>> 
>> It ran overnight and i haven't seen the warn_once trigger.
>> (but i also didn't with the previous patch)
>> 

> As I said, the miminum value to not trigger the warning was what Ian
> patch was doing, but it was still a not accurate estimation.

> Doing the real accounting might trigger slow transferts, or dropped
> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.

> So the real question was : If accounting for full pages, is your
> applications run as smooth as before, with no huge performance
> regression ?

Ok i have added some extra debug info (see diff's below), the code still uses the old calculation for truesize (in the hope to trigger the warn_on_once again), but also calculates the variants IanC came up with.

I haven't got a clear test case to trigger the warn_on_once, it happens just every once in a while during my normal usage and i'm not a netperf expert :-)
So at the moment i haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..

- The first variant (current code) seems to be the most effcient and a good estimation *most* of the the, but sometimes triggers the warn_on_once in skb_try_coalesce.
- The first variant (current code) seems to always substract from the truesize for small packets.
- The second variant always seems keep the truesize as is for most of the small network traffic, but it also seems to work ok for larger packets.
- The third variant seems to be a pretty wasteful estimation.

So the last variant seems to be rather wasteful, and the second one the most accurate so far.

Eric:
     From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
     When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?



[  116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[  117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
[  117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
[  117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
[  117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
[  117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
[  117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
[  117.539403] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
[  117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
[  117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
[  121.981917] net_ratelimit: 27 callbacks suppressed
[  121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[  126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)



diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c26e28b..8833e38 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
        struct sk_buff_head tmpq;
        unsigned long flags;
        int err;
+       int tsz,len;

        spin_lock(&np->rx_lock);

@@ -1037,9 +1038,22 @@ err:
                 * receive throughout using the standard receive
                 * buffer size was cut by 25%(!!!).
                 */
-               skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
+
+
+
+
+                tsz = skb->truesize;
+                len = skb->len;
+                /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
+                skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
                skb->len += skb->data_len;

+               net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
+                        skb->dev->name, skb->dev->mtu, skb->data_len, len,  skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
+                        skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
+                        skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
+                        PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
+
                if (rx->flags & XEN_NETRXF_csum_blank)
                        skb->ip_summed = CHECKSUM_PARTIAL;
                else if (rx->flags & XEN_NETRXF_data_validated)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3ab989b..6d0cd86 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,

        WARN_ON_ONCE(delta < len);

+       if(delta < len) {
+               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
+                        to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
+       }
+
+       if (delta > len && delta - len > 100) {
+               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
+                        to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
+       }
+
        memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
               skb_shinfo(from)->frags,
               skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));

^ permalink raw reply related

* [PATCH net] net/vxlan: Use the underlying device index when joining/leaving multicast groups
From: Yan Burman @ 2012-12-20 13:36 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, ogerlitz, Yan Burman

The socket calls from vxlan to join/leave multicast group aren't
using the index of the underlying device, as a result the stack uses
the first interface that is up. This results in vxlan being non functional
over a device which isn't the 1st to be up.
Fix this by providing the iflink field to the vxlan instance
to the multicast calls.

Signed-off-by: Yan Burman <yanb@mellanox.com>
---
 drivers/net/vxlan.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3b3fdf6..40f2cc1 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -505,7 +505,8 @@ static int vxlan_join_group(struct net_device *dev)
 	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct sock *sk = vn->sock->sk;
 	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr = vxlan->gaddr,
+		.imr_multiaddr.s_addr	= vxlan->gaddr,
+		.imr_ifindex		= vxlan->link,
 	};
 	int err;
 
@@ -532,7 +533,8 @@ static int vxlan_leave_group(struct net_device *dev)
 	int err = 0;
 	struct sock *sk = vn->sock->sk;
 	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr = vxlan->gaddr,
+		.imr_multiaddr.s_addr	= vxlan->gaddr,
+		.imr_ifindex		= vxlan->link,
 	};
 
 	/* Only leave group when last vxlan is done. */
-- 
1.7.11.3

^ permalink raw reply related

* Re: [PATCH net-next V4 02/13] bridge: Add vlan filtering infrastructure
From: Shmulik Ladkani @ 2012-12-20 13:39 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-3-git-send-email-vyasevic@redhat.com>

Hi Vlad,

On Wed, 19 Dec 2012 12:48:13 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> +static void nbp_vlan_flush(struct net_bridge_port *p)
> +{
> +	struct net_port_vlan *pve;
> +	struct net_port_vlan *tmp;
> +
> +	ASSERT_RTNL();
> +
> +	list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)
> +		nbp_vlan_delete(p, pve->vid, BRIDGE_FLAGS_SELF);

Why would you want to clear "bridge master port" association from this
vlan, in the event of NBP destruction?
The "bridge port" may still be a member of this vlan, doesn't it?
Seems flags argument should be 0.

> +#define BR_VID_HASH_SIZE (1<<6)
> +#define br_vlan_hash(vid) ((vid) % (BR_VID_HASH_SIZE - 1))

Did you mean:                       & (BR_VID_HASH_SIZE - 1)

Regards,
Shmulik

^ permalink raw reply

* Re: Network namespace bugs in L2TP
From: Tom Parkin @ 2012-12-20 13:52 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev
In-Reply-To: <87r4mt4um7.fsf@xmission.com>

[-- Attachment #1: Type: text/plain, Size: 2955 bytes --]

Hi Eric,

On Thu, Dec 13, 2012 at 11:31:12AM -0800, Eric W. Biederman wrote:
> Tom Parkin <tparkin@katalix.com> writes:
> 
> > On Wed, Dec 12, 2012 at 11:44:36AM -0800, Eric W. Biederman wrote:
> >> Tom Parkin <tparkin@katalix.com> writes:
> > I think that raises a question in the case of the L2TP tunnel sockets,
> > though.  Currently l2tp_tunnel_sock_create uses the namespace of the
> > current process for the socket.  The alternative is to pass in the
> > desired namespace from l2tp_tunnel_create -- and this makes sense, I
> > think.
> >
> > However, when l2tp_tunnel_create is called from the netlink code, the
> > namespace passed is that of the netlink socket.  At the risk of sounding
> > silly, what's the benefit of using the netlink socket namespace over the
> > process namespace in this case?
> 
> Using the netlink socket namespace ensure that if the netlink socket is
> passed between processes the semantics of sending messages down the
> netlink socket don't change.
> 
> There is another thread on netdev discussing another variant of this
> right now.  For some cases it is just a waste of resources to have one
> copy of a daemon per network namespace.  In which case a controlling
> daemon will open one netlink socket per network namespace and send
> commands down the appropriate socket for the network namespace the
> daemon wishes to control.

Yes, I saw that other thread.  Thanks for the clarification on this
point.

> > But that doesn't seem too unreasonable.  A user would have to take
> > explicit action to create an L2TP tunnel socket, and it might seem
> > reasonable for that socket to keep the namespace alive until the user
> > explicitly tears it down again.
> 
> Sending a netlink message to tear down the socket is not unreasonable.
> 
> Having a reference counting loop such that it is possible to close all
> other sockets and all other references to a network namespace and not
> have the network namespace go away because the L2TP tunnel socket holds
> a reference to the unreachable and unuusable network namespace is
> unreasonable.
> 
> We handle this with arp and icmp control sockets by not creating a
> reference count.  And having a pernet cleanup routing clean up those
> sockets.  Assuming I am right about the reference counting loop being
> possible this is something to look at.

Yep, OK.  I hadn't appreciated the namespace could become inaccessible!

I've done some digging and I believe there is an issue with the
reference counting for the unmanaged tunnel sockets -- certainly I am
able to leak netns resources here.

I've been working on a patchset which I hope will address these issues
in l2tp_core.  I'm stress testing it now and hope to post to netdev
soon for review.

Thanks again for your help.

Tom
-- 
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH] 8139cp: Prevent dev_close/cp_interrupt race on MTU change
From: John Greene @ 2012-12-20 13:55 UTC (permalink / raw)
  To: David Woodhouse; +Cc: David Miller, netdev
In-Reply-To: <1355950547.18919.93.camel@shinybook.infradead.org>

On 12/19/2012 03:55 PM, David Woodhouse wrote:
> On Wed, 2012-12-19 at 12:40 -0800, David Miller wrote:
>> You sent this as a "request for testing" last week, but I saw
>> no testing on real hardware whatsoever.
>
> Thanks for the reminder :)
>
> Seems to work fine here. I haven't confirmed whether I actually see the
> race or not but changing MTU on a live device works fine, even when it's
> being ping-flooded.
>
> Tested-by: David Woodhouse <David.Woodhouse@intel.com>
>
Thanks all. Happy holidays!

-- 
John Greene

^ permalink raw reply

* skb->cb size checks (was Re: [PATCH 00/17] ATM fixes for pppoatm/br2684)
From: David Woodhouse @ 2012-12-20 14:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20121201.204906.1703696018528746748.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1996 bytes --]

On Sat, 2012-12-01 at 20:49 -0500, David Miller wrote:
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Sun, 02 Dec 2012 00:40:47 +0000
> 
> > On Sat, 2012-12-01 at 17:33 +0000, David Woodhouse wrote:
> >> 
> >> Very glad I added the BUILD_BUG_ON on the cb struct size now. Perhaps
> >> there should be a generic helper for that? Something like
> >>  skb_cb_cast(struct foo_cb, skb) could do it automatically...?
> > 
> > Something like this, perhaps? Using skb_cast_cb() would then make it
> > fairly much impossible to accidentally overflow the size of the skb cb.
> 
> I actually prefer what we do now, which is do the BUILD_BUG_ON()
> once in the subsystem specific code, usually the initializer.
> 
> It's part of creating a new SKB cb, adding that assertion somewhere.

I looked harder at this, and should follow up before it actually does
fall out of the cracks in my brain and get completely forgotten.

Basically, you lie :)

What we *actually* do now, in about two-thirds of cases¹ even in net/
code (I didn't even look at drivers, which I expect to be worse), is use
skb->cb without any form of automatic size check at all. No manual
BUILD_BUG_ON() or anything.

Admittedly, in almost all cases that *isn't* a real problem, because the
structure *isn't* too big for skb->cb and it's all fine. But as a matter
of principle we probably *should* be doing those checks. Just in *case*
someone comes along and adds something stupid to the structure.

So... should we:
 - Ignore the "problem" and leave things as they are.

 - Go through and fix the 2/3 of offending net/ code and then the
   drivers too, *without* making the generic 'deference and automatic
   check' macro that I think would simplify that and help to keep us
   honest in future.
or
 - Let me add something like the skb_cast_cb() macro I wanted, then use
   it in all the offending code I can find.

-- 
dwmw2

¹ http://www.spinics.net/lists/netdev/msg218642.html


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply

* Lockdep warning in vxlan
From: Yan Burman @ 2012-12-20 14:00 UTC (permalink / raw)
  To: shemminger, netdev, Yan Burman

Hi.

When working with vxlan from current net-next, I got a lockdep warning 
(below).
It seems to happen when I have host B pinging host A and while the pings 
continue,
I do "ip link del" on the vxlan interface on host A. The lockdep warning 
is on host A.
Tell me if you need some more info.

=============================================
[ INFO: possible recursive locking detected ]
3.7.0+ #24 Not tainted
---------------------------------------------
swapper/1/0 is trying to acquire lock:
  (&n->lock){++--..}, at: [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0

but task is already holding lock:
  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&n->lock);
   lock(&n->lock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

4 locks held by swapper/1/0:
  #0:  (((&n->timer))){+.-...}, at: [<ffffffff8104b350>] 
call_timer_fn+0x0/0x1c0
  #1:  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280
  #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81395400>] 
dev_queue_xmit+0x0/0x5d0
  #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff813cb41e>] 
ip_finish_output+0x13e/0x640

stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0+ #24
Call Trace:
  <IRQ>  [<ffffffff8108c7ac>] validate_chain+0xdcc/0x11f0
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff81120565>] ? kmem_cache_free+0xe5/0x1c0
  [<ffffffff8108d570>] __lock_acquire+0x440/0xc30
  [<ffffffff813c3570>] ? inet_getpeer+0x40/0x600
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8108ddf5>] lock_acquire+0x95/0x140
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
  [<ffffffff81448d4b>] _raw_write_lock_bh+0x3b/0x50
  [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
  [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0
  [<ffffffff8139f99b>] neigh_resolve_output+0x16b/0x270
  [<ffffffff813cb62d>] ip_finish_output+0x34d/0x640
  [<ffffffff813cb41e>] ? ip_finish_output+0x13e/0x640
  [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
  [<ffffffff813cb9a0>] ip_output+0x80/0xf0
  [<ffffffff813ca368>] ip_local_out+0x28/0x80
  [<ffffffffa046f25a>] vxlan_xmit+0x66a/0xbec [vxlan]
  [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
  [<ffffffff81394a50>] ? skb_gso_segment+0x2b0/0x2b0
  [<ffffffff81449355>] ? _raw_spin_unlock_irqrestore+0x65/0x80
  [<ffffffff81394c57>] ? dev_queue_xmit_nit+0x207/0x270
  [<ffffffff813950c8>] dev_hard_start_xmit+0x298/0x5d0
  [<ffffffff813956f3>] dev_queue_xmit+0x2f3/0x5d0
  [<ffffffff81395400>] ? dev_hard_start_xmit+0x5d0/0x5d0
  [<ffffffff813f5788>] arp_xmit+0x58/0x60
  [<ffffffff813f59db>] arp_send+0x3b/0x40
  [<ffffffff813f6424>] arp_solicit+0x204/0x280
  [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
  [<ffffffff8139f515>] neigh_probe+0x45/0x70
  [<ffffffff813a1c10>] neigh_timer_handler+0x1a0/0x2a0
  [<ffffffff8104b3cf>] call_timer_fn+0x7f/0x1c0
  [<ffffffff8104b350>] ? detach_if_pending+0x120/0x120
  [<ffffffff8104b748>] run_timer_softirq+0x238/0x2b0
  [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
  [<ffffffff81043e51>] __do_softirq+0x101/0x280
  [<ffffffff814518cc>] call_softirq+0x1c/0x30
  [<ffffffff81003b65>] do_softirq+0x85/0xc0
  [<ffffffff81043a7e>] irq_exit+0x9e/0xc0
  [<ffffffff810264f8>] smp_apic_timer_interrupt+0x68/0xa0
  [<ffffffff8145122f>] apic_timer_interrupt+0x6f/0x80
  <EOI>  [<ffffffff8100a054>] ? mwait_idle+0xa4/0x1c0
  [<ffffffff8100a04b>] ? mwait_idle+0x9b/0x1c0
  [<ffffffff8100a6a9>] cpu_idle+0x89/0xe0
  [<ffffffff81441127>] start_secondary+0x1b2/0x1b6

Hope this helps
Yan

^ permalink raw reply

* Re: [PATCH net-next V4 03/13] bridge: Validate that vlan is permitted on ingress
From: Shmulik Ladkani @ 2012-12-20 14:07 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-4-git-send-email-vyasevic@redhat.com>

Hi Vlad,

On Wed, 19 Dec 2012 12:48:14 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> +static bool br_allowed_ingress(struct net_bridge_port *p, struct sk_buff *skb)
> +{
> +	struct net_port_vlan *pve;
> +	u16 vid;
> +
> +	/* If there are no vlan in the permitted list, all packets are
> +	 * permitted.
> +	 */
> +	if (list_empty(&p->vlan_list))
> +		return true;

I assumed the default policy would be Drop in such case, otherwise
leaking between vlan domains is possible.
Or maybe, ingress policy when port isn't a member of ingress VID should
be configurable (drop/allow).

> +	vid = br_get_vlan(skb);
> +	pve = nbp_vlan_find(p, vid);

Why search by iterating through NBP's vlan_list?
You know the VID (hence may fetch the net_bridge_vlan from the hash), so
why don't you directly consult the net_bridge_vlan's port_bitmap?

> @@ -54,6 +74,9 @@ int br_handle_frame_finish(struct sk_buff *skb)
>  	if (!p || p->state == BR_STATE_DISABLED)
>  		goto drop;
>  
> +	if (!br_allowed_ingress(p, skb))
> +		goto drop;
> +

This condition should be also encorporated upon "ingress" at the "bridge
master port" (that is, early at br_dev_xmit).
Think of the "bridge master port" as yet another port:
upon "ingress" (meaning, tx packets from the ip stack), we should
also enforce any ingress permission rules.

Regards,
Shmulik

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 14:23 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
	xen-devel@lists.xensource.com, Ian Campbell,
	Konrad Rzeszutek Wilk
In-Reply-To: <1797374383.20121220135139@eikelenboom.it>


Thursday, December 20, 2012, 1:51:39 PM, you wrote:


> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:

>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:

>>> Hi Ian,
>>> 
>>> It ran overnight and i haven't seen the warn_once trigger.
>>> (but i also didn't with the previous patch)
>>> 

>> As I said, the miminum value to not trigger the warning was what Ian
>> patch was doing, but it was still a not accurate estimation.

>> Doing the real accounting might trigger slow transferts, or dropped
>> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.

>> So the real question was : If accounting for full pages, is your
>> applications run as smooth as before, with no huge performance
>> regression ?

> Ok i have added some extra debug info (see diff's below), the code still uses the old calculation for truesize (in the hope to trigger the warn_on_once again), but also calculates the variants IanC came up with.

> I haven't got a clear test case to trigger the warn_on_once, it happens just every once in a while during my normal usage and i'm not a netperf expert :-)
> So at the moment i haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..

> - The first variant (current code) seems to be the most effcient and a good estimation *most* of the the, but sometimes triggers the warn_on_once in skb_try_coalesce.
> - The first variant (current code) seems to always substract from the truesize for small packets.
> - The second variant always seems keep the truesize as is for most of the small network traffic, but it also seems to work ok for larger packets.
> - The third variant seems to be a pretty wasteful estimation.

> So the last variant seems to be rather wasteful, and the second one the most accurate so far.

> Eric:
>      From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
>      When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?



> [  116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [  117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
> [  117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
> [  117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
> [  117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
> [  117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
> [  117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
> [  117.539403] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
> [  117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
> [  117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
> [  121.981917] net_ratelimit: 27 callbacks suppressed
> [  121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [  126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)



> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index c26e28b..8833e38 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>         struct sk_buff_head tmpq;
>         unsigned long flags;
>         int err;
> +       int tsz,len;

>         spin_lock(&np->rx_lock);

> @@ -1037,9 +1038,22 @@ err:
>                  * receive throughout using the standard receive
>                  * buffer size was cut by 25%(!!!).
>                  */
> -               skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> +
> +
> +
> +
> +                tsz = skb->truesize;
> +                len = skb->len;
> +                /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
> +                skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>                 skb->len += skb->data_len;

> +               net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
> +                        skb->dev->name, skb->dev->mtu, skb->data_len, len,  skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
> +                        skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
> +                        skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
> +                        PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
> +
>                 if (rx->flags & XEN_NETRXF_csum_blank)
>                         skb->ip_summed = CHECKSUM_PARTIAL;
>                 else if (rx->flags & XEN_NETRXF_data_validated)
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3ab989b..6d0cd86 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,

>         WARN_ON_ONCE(delta < len);

> +       if(delta < len) {
> +               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
> +                        to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
> +       }
> +
+       if (delta >> len && delta - len > 100) {
> +               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
> +                        to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
> +       }
> +
>         memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>                skb_shinfo(from)->frags,
>                skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));



Ok i succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was just rate limited away for the offending packet :(

Dec 20 15:17:33 media kernel: [  393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [  393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [  393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [  393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [  393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  394.940211] ------------[ cut here ]------------
Dec 20 15:17:35 media kernel: [  394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
Dec 20 15:17:35 media kernel: [  394.940282] Modules linked in:
Dec 20 15:17:35 media kernel: [  394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
Dec 20 15:17:35 media kernel: [  394.940330] Call Trace:
Dec 20 15:17:35 media kernel: [  394.940343]  <IRQ>  [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
Dec 20 15:17:35 media kernel: [  394.940384]  [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
Dec 20 15:17:35 media kernel: [  394.940409]  [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
Dec 20 15:17:35 media kernel: [  394.940434]  [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
Dec 20 15:17:35 media kernel: [  394.940458]  [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
Dec 20 15:17:35 media kernel: [  394.940481]  [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
Dec 20 15:17:35 media kernel: [  394.940504]  [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
Dec 20 15:17:35 media kernel: [  394.940528]  [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
Dec 20 15:17:35 media kernel: [  394.940551]  [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
Dec 20 15:17:35 media kernel: [  394.940576]  [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
Dec 20 15:17:35 media kernel: [  394.940600]  [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
Dec 20 15:17:35 media kernel: [  394.940623]  [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
Dec 20 15:17:35 media kernel: [  394.940666]  [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
Dec 20 15:17:35 media kernel: [  394.940694]  [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
Dec 20 15:17:35 media kernel: [  394.940720]  [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
Dec 20 15:17:35 media kernel: [  394.940745]  [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
Dec 20 15:17:35 media kernel: [  394.940784]  [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
Dec 20 15:17:35 media kernel: [  394.940807]  [<ffffffff818e4b68>] ip_rcv+0x218/0x300
Dec 20 15:17:35 media kernel: [  394.940829]  [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
Dec 20 15:17:35 media kernel: [  394.940853]  [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
Dec 20 15:17:35 media kernel: [  394.940889]  [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
Dec 20 15:17:35 media kernel: [  394.940914]  [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
Dec 20 15:17:35 media kernel: [  394.940939]  [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
Dec 20 15:17:35 media kernel: [  394.940964]  [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
Dec 20 15:17:35 media kernel: [  394.941000]  [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
Dec 20 15:17:35 media kernel: [  394.941024]  [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
Dec 20 15:17:35 media kernel: [  394.941046]  [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
Dec 20 15:17:35 media kernel: [  394.941075]  [<ffffffff8184ed66>] net_rx_action+0x136/0x260
Dec 20 15:17:35 media kernel: [  394.941098]  [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
Dec 20 15:17:35 media kernel: [  394.941133]  [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
Dec 20 15:17:35 media kernel: [  394.941157]  [<ffffffff819b623c>] call_softirq+0x1c/0x30
Dec 20 15:17:35 media kernel: [  394.941179]  [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
Dec 20 15:17:35 media kernel: [  394.941201]  [<ffffffff8107041e>] irq_exit+0x9e/0xd0
Dec 20 15:17:35 media kernel: [  394.941235]  [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
Dec 20 15:17:35 media kernel: [  394.941259]  [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
Dec 20 15:17:35 media kernel: [  394.941279]  <EOI>  [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Dec 20 15:17:35 media kernel: [  394.941318]  [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Dec 20 15:17:35 media kernel: [  394.941356]  [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
Dec 20 15:17:35 media kernel: [  394.941381]  [<ffffffff810092b2>] ? check_events+0x12/0x20
Dec 20 15:17:35 media kernel: [  394.941405]  [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
Dec 20 15:17:35 media kernel: [  394.941432]  [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
Dec 20 15:17:35 media kernel: [  394.941473]  [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
Dec 20 15:17:35 media kernel: [  394.941507]  [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
Dec 20 15:17:35 media kernel: [  394.941533]  [<ffffffff819b2434>] ? __schedule+0x444/0x880
Dec 20 15:17:35 media kernel: [  394.941555]  [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
Dec 20 15:17:35 media kernel: [  394.941580]  [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
Dec 20 15:17:35 media kernel: [  394.941614]  [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
Dec 20 15:17:35 media kernel: [  394.941638]  [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
Dec 20 15:17:35 media kernel: [  394.941663]  [<ffffffff819b2904>] ? schedule+0x24/0x70
Dec 20 15:17:35 media kernel: [  394.941697]  [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
Dec 20 15:17:35 media kernel: [  394.941725]  [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
Dec 20 15:17:35 media kernel: [  394.941748]  [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
Dec 20 15:17:35 media kernel: [  394.941770]  [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
Dec 20 15:17:35 media kernel: [  394.941808]  [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
Dec 20 15:17:35 media kernel: [  394.941833]  [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
Dec 20 15:17:35 media kernel: [  394.941856]  [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
Dec 20 15:17:35 media kernel: [  394.941878]  [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
Dec 20 15:17:35 media kernel: [  394.941913]  [<ffffffff810b5b17>] ? lock_release+0x117/0x250
Dec 20 15:17:35 media kernel: [  394.941938]  [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
Dec 20 15:17:35 media kernel: [  394.941959]  [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
Dec 20 15:17:35 media kernel: [  394.941981]  [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
Dec 20 15:17:35 media kernel: [  394.942015]  [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
Dec 20 15:17:35 media kernel: [  394.942036] ---[ end trace 6f3a832c9e91c8af ]---
Dec 20 15:17:35 media kernel: [  394.942056] to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
Dec 20 15:17:35 media kernel: [  394.968199] to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
Dec 20 15:17:35 media kernel: [  395.262814] net_ratelimit: 371 callbacks suppressed
Dec 20 15:17:35 media kernel: [  395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [  395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)

^ permalink raw reply

* [PATCH] net: ipv4: route: fix coding style issues net: ipv4: tcp: fix coding style issues
From: Stefan Hasko @ 2012-12-20 14:28 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev
  Cc: linux-kernel, Stefan Hasko

Fix a coding style issues.

Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
---
 net/ipv4/route.c |  119 ++++++++++++++++-------------
 net/ipv4/tcp.c   |  218 +++++++++++++++++++++++++++++++-----------------------
 2 files changed, 194 insertions(+), 143 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 844a9ef..29678e5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -20,7 +20,7 @@
  *		Alan Cox	:	Added BSD route gw semantics
  *		Alan Cox	:	Super /proc >4K
  *		Alan Cox	:	MTU in route table
- *		Alan Cox	: 	MSS actually. Also added the window
+ *		Alan Cox	:	MSS actually. Also added the window
  *					clamper.
  *		Sam Lantinga	:	Fixed route matching in rt_del()
  *		Alan Cox	:	Routing cache support.
@@ -31,30 +31,35 @@
  *	Miquel van Smoorenburg	:	BSD API fixes.
  *	Miquel van Smoorenburg	:	Metrics.
  *		Alan Cox	:	Use __u32 properly
- *		Alan Cox	:	Aligned routing errors more closely with BSD
+ *		Alan Cox	:	Aligned routing errors more
+ *					closely with BSD
  *					our system is still very different.
  *		Alan Cox	:	Faster /proc handling
- *	Alexey Kuznetsov	:	Massive rework to support tree based routing,
+ *	Alexey Kuznetsov	:	Massive rework to support
+ *					tree based routing,
  *					routing caches and better behaviour.
  *
  *		Olaf Erb	:	irtt wasn't being copied right.
  *		Bjorn Ekwall	:	Kerneld route support.
  *		Alan Cox	:	Multicast fixed (I hope)
- * 		Pavel Krauz	:	Limited broadcast fixed
+ *		Pavel Krauz	:	Limited broadcast fixed
  *		Mike McLagan	:	Routing by source
  *	Alexey Kuznetsov	:	End of old history. Split to fib.c and
  *					route.c and rewritten from scratch.
  *		Andi Kleen	:	Load-limit warning messages.
- *	Vitaly E. Lavrov	:	Transparent proxy revived after year coma.
+ *	Vitaly E. Lavrov	:	Transparent proxy revived
+ *					after year coma.
  *	Vitaly E. Lavrov	:	Race condition in ip_route_input_slow.
- *	Tobias Ringstrom	:	Uninitialized res.type in ip_route_output_slow.
+ *	Tobias Ringstrom	:	Uninitialized res.type in
+ *					ip_route_output_slow.
  *	Vladimir V. Ivanov	:	IP rule info (flowid) is really useful.
  *		Marc Boucher	:	routing by fwmark
  *	Robert Olsson		:	Added rt_cache statistics
  *	Arnaldo C. Melo		:	Convert proc stuff to seq_file
- *	Eric Dumazet		:	hashed spinlocks and rt_check_expire() fixes.
- * 	Ilia Sotnikov		:	Ignore TOS on PMTUD and Redirect
- * 	Ilia Sotnikov		:	Removed TOS from hash calculations
+ *	Eric Dumazet		:	hashed spinlocks and
+ *					rt_check_expire() fixes.
+ *	Ilia Sotnikov		:	Ignore TOS on PMTUD and Redirect
+ *	Ilia Sotnikov		:	Removed TOS from hash calculations
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -65,7 +70,7 @@
 #define pr_fmt(fmt) "IPv4: " fmt
 
 #include <linux/module.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 #include <linux/bitops.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
@@ -139,7 +144,8 @@ static unsigned int	 ipv4_default_advmss(const struct dst_entry *dst);
 static unsigned int	 ipv4_mtu(const struct dst_entry *dst);
 static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
 static void		 ipv4_link_failure(struct sk_buff *skb);
-static void		 ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
+static void		 ip_rt_update_pmtu(struct dst_entry *dst,
+					   struct sock *sk,
 					   struct sk_buff *skb, u32 mtu);
 static void		 ip_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
@@ -291,12 +297,11 @@ static int rt_cpu_seq_show(struct seq_file *seq, void *v)
 	struct rt_cache_stat *st = v;
 
 	if (v == SEQ_START_TOKEN) {
-		seq_printf(seq, "entries  in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src  out_hit out_slow_tot out_slow_mc  gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
+		seq_printf(seq, "entries in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
 		return 0;
 	}
 
-	seq_printf(seq,"%08x  %08x %08x %08x %08x %08x %08x %08x "
-		   " %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
+		seq_printf(seq, "%08x  %08x %08x %08x %08x %08x %08x %08x  %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
 		   dst_entries_get_slow(&ipv4_dst_ops),
 		   st->in_hit,
 		   st->in_slow_tot,
@@ -657,8 +662,8 @@ out_unlock:
 	return;
 }
 
-static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flowi4 *fl4,
-			     bool kill_route)
+static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb,
+			     struct flowi4 *fl4, bool kill_route)
 {
 	__be32 new_gw = icmp_hdr(skb)->un.gateway;
 	__be32 old_gw = ip_hdr(skb)->saddr;
@@ -695,7 +700,8 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 	if (!IN_DEV_SHARED_MEDIA(in_dev)) {
 		if (!inet_addr_onlink(in_dev, new_gw, old_gw))
 			goto reject_redirect;
-		if (IN_DEV_SEC_REDIRECTS(in_dev) && ip_fib_check_default(new_gw, dev))
+		if (IN_DEV_SEC_REDIRECTS(in_dev) &&
+		    ip_fib_check_default(new_gw, dev))
 			goto reject_redirect;
 	} else {
 		if (inet_addr_type(net, new_gw) != RTN_UNICAST)
@@ -737,7 +743,8 @@ reject_redirect:
 	;
 }
 
-static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb)
+static void ip_do_redirect(struct dst_entry *dst, struct sock *sk,
+			   struct sk_buff *skb)
 {
 	struct rtable *rt;
 	struct flowi4 fl4;
@@ -1202,11 +1209,11 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
 	struct rtable *orig, *prev, **p;
 	bool ret = true;
 
-	if (rt_is_input_route(rt)) {
+	if (rt_is_input_route(rt))
 		p = (struct rtable **)&nh->nh_rth_input;
-	} else {
+	else
 		p = (struct rtable **)__this_cpu_ptr(nh->nh_pcpu_rth_output);
-	}
+
 	orig = *p;
 
 	prev = cmpxchg(p, orig, rt);
@@ -1359,17 +1366,17 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 #endif
 	rth->dst.output = ip_rt_bug;
 
-	rth->rt_genid	= rt_genid(dev_net(dev));
-	rth->rt_flags	= RTCF_MULTICAST;
-	rth->rt_type	= RTN_MULTICAST;
-	rth->rt_is_input= 1;
-	rth->rt_iif	= 0;
-	rth->rt_pmtu	= 0;
-	rth->rt_gateway	= 0;
+	rth->rt_genid = rt_genid(dev_net(dev));
+	rth->rt_flags = RTCF_MULTICAST;
+	rth->rt_type = RTN_MULTICAST;
+	rth->rt_is_input = 1;
+	rth->rt_iif = 0;
+	rth->rt_pmtu = 0;
+	rth->rt_gateway = 0;
 	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
 	if (our) {
-		rth->dst.input= ip_local_deliver;
+		rth->dst.input = ip_local_deliver;
 		rth->rt_flags |= RTCF_LOCAL;
 	}
 
@@ -1488,8 +1495,8 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->rt_flags = flags;
 	rth->rt_type = res->type;
 	rth->rt_is_input = 1;
-	rth->rt_iif 	= 0;
-	rth->rt_pmtu	= 0;
+	rth->rt_iif = 0;
+	rth->rt_pmtu = 0;
 	rth->rt_gateway	= 0;
 	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
@@ -1649,25 +1656,25 @@ local_input:
 	if (!rth)
 		goto e_nobufs;
 
-	rth->dst.input= ip_local_deliver;
-	rth->dst.output= ip_rt_bug;
+	rth->dst.input = ip_local_deliver;
+	rth->dst.output = ip_rt_bug;
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	rth->dst.tclassid = itag;
 #endif
 
 	rth->rt_genid = rt_genid(net);
-	rth->rt_flags 	= flags|RTCF_LOCAL;
-	rth->rt_type	= res.type;
+	rth->rt_flags = flags|RTCF_LOCAL;
+	rth->rt_type = res.type;
 	rth->rt_is_input = 1;
-	rth->rt_iif	= 0;
-	rth->rt_pmtu	= 0;
+	rth->rt_iif = 0;
+	rth->rt_pmtu = 0;
 	rth->rt_gateway	= 0;
 	rth->rt_uses_gateway = 0;
 	INIT_LIST_HEAD(&rth->rt_uncached);
 	if (res.type == RTN_UNREACHABLE) {
-		rth->dst.input= ip_error;
-		rth->dst.error= -err;
-		rth->rt_flags 	&= ~RTCF_LOCAL;
+		rth->dst.input = ip_error;
+		rth->dst.error = -err;
+		rth->rt_flags &= ~RTCF_LOCAL;
 	}
 	if (do_cache)
 		rt_cache_route(&FIB_RES_NH(res), rth);
@@ -1772,7 +1779,8 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		return ERR_PTR(-EINVAL);
 
 	if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev)))
-		if (ipv4_is_loopback(fl4->saddr) && !(dev_out->flags & IFF_LOOPBACK))
+		if (ipv4_is_loopback(fl4->saddr) &&
+		    !(dev_out->flags & IFF_LOOPBACK))
 			return ERR_PTR(-EINVAL);
 
 	if (ipv4_is_lbcast(fl4->daddr))
@@ -1919,7 +1927,9 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		if (fl4->flowi4_oif == 0 &&
 		    (ipv4_is_multicast(fl4->daddr) ||
 		     ipv4_is_lbcast(fl4->daddr))) {
-			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+			/* It is equivalent to
+			 * inet_addr_type(saddr) == RTN_LOCAL
+			 */
 			dev_out = __ip_dev_find(net, fl4->saddr, false);
 			if (dev_out == NULL)
 				goto out;
@@ -1944,7 +1954,9 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		}
 
 		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
-			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+			/* It is equivalent to
+			 * inet_addr_type(saddr) == RTN_LOCAL
+			 */
 			if (!__ip_dev_find(net, fl4->saddr, false))
 				goto out;
 		}
@@ -1972,7 +1984,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		if (fl4->saddr) {
 			if (ipv4_is_multicast(fl4->daddr))
 				fl4->saddr = inet_select_addr(dev_out, 0,
-							      fl4->flowi4_scope);
+							     fl4->flowi4_scope);
 			else if (!fl4->daddr)
 				fl4->saddr = inet_select_addr(dev_out, 0,
 							      RT_SCOPE_HOST);
@@ -2061,7 +2073,8 @@ out:
 }
 EXPORT_SYMBOL_GPL(__ip_route_output_key);
 
-static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst, u32 cookie)
+static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst,
+						  u32 cookie)
 {
 	return NULL;
 }
@@ -2073,7 +2086,8 @@ static unsigned int ipv4_blackhole_mtu(const struct dst_entry *dst)
 	return mtu ? : dst->dev->mtu;
 }
 
-static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
+static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst,
+					  struct sock *sk,
 					  struct sk_buff *skb, u32 mtu)
 {
 }
@@ -2101,7 +2115,8 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
 	.neigh_lookup		=	ipv4_neigh_lookup,
 };
 
-struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_orig)
+struct dst_entry *ipv4_blackhole_route(struct net *net,
+				       struct dst_entry *dst_orig)
 {
 	struct rtable *ort = (struct rtable *) dst_orig;
 	struct rtable *rt;
@@ -2265,7 +2280,8 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
-static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
+static int inet_rtm_getroute(struct sk_buff *in_skb,
+			     struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(in_skb->sk);
 	struct rtmsg *rtm;
@@ -2297,7 +2313,9 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void
 	skb_reset_mac_header(skb);
 	skb_reset_network_header(skb);
 
-	/* Bugfix: need to give ip_route_input enough of an IP header to not gag. */
+	/* Bugfix: need to give ip_route_input enough
+	 * of an IP header to not gag.
+	 */
 	ip_hdr(skb)->protocol = IPPROTO_ICMP;
 	skb_reserve(skb, MAX_HEADER + sizeof(struct iphdr));
 
@@ -2596,7 +2614,8 @@ int __init ip_rt_init(void)
 	int rc = 0;
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
-	ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct), __alignof__(struct ip_rt_acct));
+	ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct),
+				    __alignof__(struct ip_rt_acct));
 	if (!ip_rt_acct)
 		panic("IP: failed to allocate ip_rt_acct\n");
 #endif
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1ca2536..12fadb2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -45,7 +45,7 @@
  *					escape still
  *		Alan Cox	:	Fixed another acking RST frame bug.
  *					Should stop LAN workplace lockups.
- *		Alan Cox	: 	Some tidyups using the new skb list
+ *		Alan Cox	:	Some tidyups using the new skb list
  *					facilities
  *		Alan Cox	:	sk->keepopen now seems to work
  *		Alan Cox	:	Pulls options out correctly on accepts
@@ -160,7 +160,8 @@
  *					generates them.
  *		Alan Cox	:	Cache last socket.
  *		Alan Cox	:	Per route irtt.
- *		Matt Day	:	poll()->select() match BSD precisely on error
+ *		Matt Day	:	poll()->select() match BSD precisely
+ *					on error
  *		Alan Cox	:	New buffers
  *		Marc Tamsky	:	Various sk->prot->retransmits and
  *					sk->retransmits misupdating fixed.
@@ -168,9 +169,9 @@
  *					and TCP syn retries gets used now.
  *		Mark Yarvis	:	In tcp_read_wakeup(), don't send an
  *					ack if state is TCP_CLOSED.
- *		Alan Cox	:	Look up device on a retransmit - routes may
- *					change. Doesn't yet cope with MSS shrink right
- *					but it's a start!
+ *		Alan Cox	:	Look up device on a retransmit - routes
+ *					may change. Doesn't yet cope with MSS
+ *					shrink right but it's a start!
  *		Marc Tamsky	:	Closing in closing fixes.
  *		Mike Shaver	:	RFC1122 verifications.
  *		Alan Cox	:	rcv_saddr errors.
@@ -199,7 +200,7 @@
  *					tcp_do_sendmsg to avoid burstiness.
  *		Eric Schenk	:	Fix fast close down bug with
  *					shutdown() followed by close().
- *		Andi Kleen 	:	Make poll agree with SIGIO
+ *		Andi Kleen	:	Make poll agree with SIGIO
  *	Salvatore Sanfilippo	:	Support SO_LINGER with linger == 1 and
  *					lingertime == 0 (RFC 793 ABORT Call)
  *	Hirokazu Takahashi	:	Use copy_from_user() instead of
@@ -268,6 +269,7 @@
 #include <linux/crypto.h>
 #include <linux/time.h>
 #include <linux/slab.h>
+#include <linux/uaccess.h>
 
 #include <net/icmp.h>
 #include <net/inet_common.h>
@@ -277,7 +279,6 @@
 #include <net/netdma.h>
 #include <net/sock.h>
 
-#include <asm/uaccess.h>
 #include <asm/ioctls.h>
 
 int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
@@ -286,22 +287,20 @@ struct percpu_counter tcp_orphan_count;
 EXPORT_SYMBOL_GPL(tcp_orphan_count);
 
 int sysctl_tcp_wmem[3] __read_mostly;
-int sysctl_tcp_rmem[3] __read_mostly;
+EXPORT_SYMBOL(sysctl_tcp_wmem);
 
+int sysctl_tcp_rmem[3] __read_mostly;
 EXPORT_SYMBOL(sysctl_tcp_rmem);
-EXPORT_SYMBOL(sysctl_tcp_wmem);
 
 atomic_long_t tcp_memory_allocated;	/* Current allocated memory. */
 EXPORT_SYMBOL(tcp_memory_allocated);
 
-/*
- * Current number of TCP sockets.
+/* Current number of TCP sockets.
  */
 struct percpu_counter tcp_sockets_allocated;
 EXPORT_SYMBOL(tcp_sockets_allocated);
 
-/*
- * TCP splice context
+/* TCP splice context
  */
 struct tcp_splice_state {
 	struct pipe_inode_info *pipe;
@@ -309,8 +308,7 @@ struct tcp_splice_state {
 	unsigned int flags;
 };
 
-/*
- * Pressure flag: try to collapse.
+/* Pressure flag: try to collapse.
  * Technical note: it is used by multiple contexts non atomically.
  * All the __sk_mem_schedule() is of this nature: accounting
  * is strict, actions are advisory and have some latency.
@@ -430,8 +428,7 @@ void tcp_init_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
-/*
- *	Wait for a TCP event.
+/*	Wait for a TCP event.
  *
  *	Note that we don't need to lock the socket, as the upper poll layers
  *	take care of normal races (between the test and the event) and we don't
@@ -454,8 +451,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
 	mask = 0;
 
-	/*
-	 * POLLHUP is certainly not done right. But poll() doesn't
+	/* POLLHUP is certainly not done right. But poll() doesn't
 	 * have a notion of HUP in just one direction, and for a
 	 * socket the read side is more interesting.
 	 *
@@ -498,7 +494,8 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 
 		/* Potential race condition. If read of tp below will
 		 * escape above sk->sk_state, we can be illegally awaken
-		 * in SYN_* states. */
+		 * in SYN_* states.
+		 */
 		if (tp->rcv_nxt - tp->copied_seq >= target)
 			mask |= POLLIN | POLLRDNORM;
 
@@ -509,14 +506,15 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 				set_bit(SOCK_ASYNC_NOSPACE,
 					&sk->sk_socket->flags);
 				set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-
-				/* Race breaker. If space is freed after
-				 * wspace test but before the flags are set,
-				 * IO signal will be lost.
-				 */
-				if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk))
-					mask |= POLLOUT | POLLWRNORM;
 			}
+
+			/* Race breaker. If space is freed after
+			 * wspace test but before the flags are set,
+			 * IO signal will be lost.
+			 */
+			if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)
+			 && sk_stream_wspace(sk) >= sk_stream_min_wspace(sk))
+				mask |= POLLOUT | POLLWRNORM;
 		} else
 			mask |= POLLOUT | POLLWRNORM;
 
@@ -634,7 +632,7 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
 
 		tcp_mark_urg(tp, flags);
 		__tcp_push_pending_frames(sk, mss_now,
-					  (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
+				(flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
 	}
 }
 
@@ -839,6 +837,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 	int err;
 	ssize_t copied;
 	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
+	int ass_res = 0;
 
 	/* Wait for a connection to finish. One exception is TCP Fast Open
 	 * (passive side) where data is allowed to be sent before a connection
@@ -846,7 +845,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 	 */
 	if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
 	    !tcp_passive_fastopen(sk)) {
-		if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
+		ass_res = (err = sk_stream_wait_connect(sk, &timeo));
+		if (ass_res != 0)
 			goto out_err;
 	}
 
@@ -864,7 +864,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 		int copy, i;
 		bool can_coalesce;
 
-		if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
+		ass_res = (copy = size_goal - skb->len);
+		if (!tcp_send_head(sk) || ass_res <= 0) {
 new_segment:
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;
@@ -911,7 +912,9 @@ new_segment:
 
 		copied += copy;
 		offset += copy;
-		if (!(size -= copy))
+
+		ass_res = (size -= copy);
+		if (!ass_res)
 			goto out;
 
 		if (skb->len < size_goal || (flags & MSG_OOB))
@@ -929,7 +932,8 @@ wait_for_sndbuf:
 wait_for_memory:
 		tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
 
-		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
+		ass_res = (err = sk_stream_wait_memory(sk, &timeo));
+		if (ass_res != 0)
 			goto do_error;
 
 		mss_now = tcp_send_mss(sk, &size_goal, flags);
@@ -1029,6 +1033,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
 	bool sg;
 	long timeo;
+	int ass_res = 0;
 
 	lock_sock(sk);
 
@@ -1050,7 +1055,8 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	 */
 	if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
 	    !tcp_passive_fastopen(sk)) {
-		if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
+		ass_res = (err = sk_stream_wait_connect(sk, &timeo));
+		if (ass_res != 0)
 			goto do_error;
 	}
 
@@ -1099,7 +1105,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		}
 
 		while (seglen > 0) {
-			int copy = 0;
+			int copy = 0, ass_res = 0;
 			int max = size_goal;
 
 			skb = tcp_write_queue_tail(sk);
@@ -1123,8 +1129,7 @@ new_segment:
 				if (!skb)
 					goto wait_for_memory;
 
-				/*
-				 * Check whether we can use HW checksum.
+				/* Check whether we can use HW checksum.
 				 */
 				if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
 					skb->ip_summed = CHECKSUM_PARTIAL;
@@ -1162,7 +1167,8 @@ new_segment:
 					merge = false;
 				}
 
-				copy = min_t(int, copy, pfrag->size - pfrag->offset);
+				copy = min_t(int, copy,
+					pfrag->size - pfrag->offset);
 
 				if (!sk_wmem_schedule(sk, copy))
 					goto wait_for_memory;
@@ -1176,7 +1182,8 @@ new_segment:
 
 				/* Update the skb. */
 				if (merge) {
-					skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
+					skb_frag_size_add(
+					  &skb_shinfo(skb)->frags[i - 1], copy);
 				} else {
 					skb_fill_page_desc(skb, i, pfrag->page,
 							   pfrag->offset, copy);
@@ -1194,15 +1201,19 @@ new_segment:
 
 			from += copy;
 			copied += copy;
-			if ((seglen -= copy) == 0 && iovlen == 0)
+			ass_res = (seglen -= copy);
+			if (ass_res == 0 && iovlen == 0)
 				goto out;
 
-			if (skb->len < max || (flags & MSG_OOB) || unlikely(tp->repair))
+			if (skb->len < max ||
+			   (flags & MSG_OOB) ||
+			   unlikely(tp->repair))
 				continue;
 
 			if (forced_push(tp)) {
 				tcp_mark_push(tp, skb);
-				__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
+				__tcp_push_pending_frames(sk, mss_now,
+					TCP_NAGLE_PUSH);
 			} else if (skb == tcp_send_head(sk))
 				tcp_push_one(sk, mss_now);
 			continue;
@@ -1211,9 +1222,11 @@ wait_for_sndbuf:
 			set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
 			if (copied)
-				tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
+				tcp_push(sk, flags & ~MSG_MORE,
+					mss_now, TCP_NAGLE_PUSH);
 
-			if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
+			ass_res = (err = sk_stream_wait_memory(sk, &timeo));
+			if (ass_res != 0)
 				goto do_error;
 
 			mss_now = tcp_send_mss(sk, &size_goal, flags);
@@ -1246,8 +1259,7 @@ out_err:
 }
 EXPORT_SYMBOL(tcp_sendmsg);
 
-/*
- *	Handle reading urgent data. BSD has very simple semantics for
+/*	Handle reading urgent data. BSD has very simple semantics for
  *	this, no blocking and very strange errors 8)
  */
 
@@ -1333,7 +1345,8 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
 	if (inet_csk_ack_scheduled(sk)) {
 		const struct inet_connection_sock *icsk = inet_csk(sk);
 		   /* Delayed ACKs frequently hit locked sockets during bulk
-		    * receive. */
+		    * receive.
+		    */
 		if (icsk->icsk_ack.blocked ||
 		    /* Once-per-two-segments ACK was not sent by tcp_input.c */
 		    tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
@@ -1366,7 +1379,8 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
 
 			/* Send ACK now, if this read freed lots of space
 			 * in our buffer. Certainly, new_window is new window.
-			 * We can advertise it now, if it is not less than current one.
+			 * We can advertise it now, if it is not less than
+			 * current one.
 			 * "Lots" means "at least twice" here.
 			 */
 			if (new_window && new_window >= 2 * rcv_window_now)
@@ -1385,7 +1399,8 @@ static void tcp_prequeue_process(struct sock *sk)
 	NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPPREQUEUED);
 
 	/* RX process wants to run with disabled BHs, though it is not
-	 * necessary */
+	 * necessary
+	 */
 	local_bh_disable();
 	while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
 		sk_backlog_rcv(sk, skb);
@@ -1445,8 +1460,7 @@ static inline struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 	return NULL;
 }
 
-/*
- * This routine provides an alternative to tcp_recvmsg() for routines
+/* This routine provides an alternative to tcp_recvmsg() for routines
  * that would like to handle copying from skbuffs directly in 'sendfile'
  * fashion.
  * Note:
@@ -1526,8 +1540,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 }
 EXPORT_SYMBOL(tcp_read_sock);
 
-/*
- *	This routine copies from a sock struct into the user buffer.
+/*	This routine copies from a sock struct into the user buffer.
  *
  *	Technical note: in 2.3 we work on _locked_ socket, so that
  *	tricks with *seq access order and skb->users are not required.
@@ -1610,12 +1623,15 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	do {
 		u32 offset;
 
-		/* Are we at urgent data? Stop if we have read anything or have SIGURG pending. */
+		/* Are we at urgent data? Stop if we have read
+		 * anything or have SIGURG pending.
+		 */
 		if (tp->urg_data && tp->urg_seq == *seq) {
 			if (copied)
 				break;
 			if (signal_pending(current)) {
-				copied = timeo ? sock_intr_errno(timeo) : -EAGAIN;
+				copied = timeo ?
+					sock_intr_errno(timeo) : -EAGAIN;
 				break;
 			}
 		}
@@ -1744,7 +1760,8 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				tcp_service_net_dma(sk, true);
 				tcp_cleanup_rbuf(sk, copied);
 			} else
-				dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+				dma_async_memcpy_issue_pending(
+					tp->ucopy.dma_chan);
 		}
 #endif
 		if (copied >= target) {
@@ -1760,12 +1777,15 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 #endif
 
 		if (user_recv) {
-			int chunk;
+			int chunk, ass_res = 0;
 
 			/* __ Restore normal policy in scheduler __ */
 
-			if ((chunk = len - tp->ucopy.len) != 0) {
-				NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG, chunk);
+			ass_res = (chunk = len - tp->ucopy.len);
+			if (ass_res != 0) {
+				NET_ADD_STATS_USER(sock_net(sk),
+					LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG,
+					chunk);
 				len -= chunk;
 				copied += chunk;
 			}
@@ -1775,8 +1795,11 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 do_prequeue:
 				tcp_prequeue_process(sk);
 
-				if ((chunk = len - tp->ucopy.len) != 0) {
-					NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
+				ass_res = (chunk = len - tp->ucopy.len);
+				if (ass_res != 0) {
+					NET_ADD_STATS_USER(sock_net(sk),
+					    LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE,
+					    chunk);
 					len -= chunk;
 					copied += chunk;
 				}
@@ -1791,7 +1814,7 @@ do_prequeue:
 		}
 		continue;
 
-	found_ok_skb:
+found_ok_skb:
 		/* Ok so how much can we use? */
 		used = skb->len - offset;
 		if (len < used)
@@ -1800,19 +1823,18 @@ do_prequeue:
 		/* Do we have urgent data here? */
 		if (tp->urg_data) {
 			u32 urg_offset = tp->urg_seq - *seq;
-			if (urg_offset < used) {
-				if (!urg_offset) {
-					if (!sock_flag(sk, SOCK_URGINLINE)) {
-						++*seq;
-						urg_hole++;
-						offset++;
-						used--;
-						if (!used)
-							goto skip_copy;
-					}
-				} else
-					used = urg_offset;
+			if (urg_offset < used && !urg_offset) {
+				if (!sock_flag(sk, SOCK_URGINLINE)) {
+					++*seq;
+					urg_hole++;
+					offset++;
+					used--;
+					if (!used)
+						goto skip_copy;
+				}
 			}
+			if (urg_offset < used && urg_offset)
+				used = urg_offset;
 		}
 
 		if (!(flags & MSG_TRUNC)) {
@@ -1821,7 +1843,9 @@ do_prequeue:
 				tp->ucopy.dma_chan = net_dma_find_channel();
 
 			if (tp->ucopy.dma_chan) {
-				tp->ucopy.dma_cookie = dma_skb_copy_datagram_iovec(
+				tp->ucopy.dma_cookie =
+					dma_skb_copy_datagram_iovec(
+
 					tp->ucopy.dma_chan, skb, offset,
 					msg->msg_iov, used,
 					tp->ucopy.pinned_list);
@@ -1837,7 +1861,8 @@ do_prequeue:
 					break;
 				}
 
-				dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+				dma_async_memcpy_issue_pending(
+					tp->ucopy.dma_chan);
 
 				if ((offset + used) == skb->len)
 					copied_early = true;
@@ -1878,7 +1903,7 @@ skip_copy:
 		}
 		continue;
 
-	found_fin_ok:
+found_fin_ok:
 		/* Process the FIN. */
 		++*seq;
 		if (!(flags & MSG_PEEK)) {
@@ -1890,14 +1915,17 @@ skip_copy:
 
 	if (user_recv) {
 		if (!skb_queue_empty(&tp->ucopy.prequeue)) {
-			int chunk;
+			int chunk, ass_res = 0;
 
 			tp->ucopy.len = copied > 0 ? len : 0;
 
 			tcp_prequeue_process(sk);
 
-			if (copied > 0 && (chunk = len - tp->ucopy.len) != 0) {
-				NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
+			ass_res = (chunk = len - tp->ucopy.len);
+			if (copied > 0 && ass_res != 0) {
+				NET_ADD_STATS_USER(sock_net(sk),
+					LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE,
+					chunk);
 				len -= chunk;
 				copied += chunk;
 			}
@@ -1971,13 +1999,13 @@ void tcp_set_state(struct sock *sk, int state)
 	sk->sk_state = state;
 
 #ifdef STATE_TRACE
-	SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk, statename[oldstate], statename[state]);
+	SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk,
+		statename[oldstate], statename[state]);
 #endif
 }
 EXPORT_SYMBOL_GPL(tcp_set_state);
 
-/*
- *	State processing on a close. This implements the state shift for
+/*	State processing on a close. This implements the state shift for
  *	sending our FIN frame. Note that we only send a FIN for some
  *	states. A shutdown() may have already sent the FIN, or we may be
  *	closed.
@@ -2009,8 +2037,7 @@ static int tcp_close_state(struct sock *sk)
 	return next & TCP_ACTION_FIN;
 }
 
-/*
- *	Shutdown the sending side of a connection. Much like close except
+/*	Shutdown the sending side of a connection. Much like close except
  *	that we don't receive shut down or sock_set_flag(sk, SOCK_DEAD).
  */
 
@@ -2125,7 +2152,7 @@ void tcp_close(struct sock *sk, long timeout)
 		 * required by specs (TCP_ESTABLISHED, TCP_CLOSE_WAIT, when
 		 * they look as CLOSING or LAST_ACK for Linux)
 		 * Probably, I missed some more holelets.
-		 * 						--ANK
+		 *                                             --ANK
 		 * XXX (TFO) - To start off we don't support SYN+ACK+FIN
 		 * in a single packet! (May consider it later but will
 		 * probably need API support or TCP_CORK SYN-ACK until
@@ -2235,6 +2262,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	int err = 0;
+	int ass_res = 0;
 	int old_state = sk->sk_state;
 
 	if (old_state != TCP_CLOSE)
@@ -2272,7 +2300,8 @@ int tcp_disconnect(struct sock *sk, int flags)
 	sk->sk_shutdown = 0;
 	sock_reset_flag(sk, SOCK_DONE);
 	tp->srtt = 0;
-	if ((tp->write_seq += tp->max_window + 2) == 0)
+	ass_res = (tp->write_seq += tp->max_window + 2);
+	if (ass_res == 0)
 		tp->write_seq = 1;
 	icsk->icsk_backoff = 0;
 	tp->snd_cwnd = 2;
@@ -2358,8 +2387,7 @@ static int tcp_repair_options_est(struct tcp_sock *tp,
 	return 0;
 }
 
-/*
- *	Socket option code for TCP.
+/*	Socket option code for TCP.
  */
 static int do_tcp_setsockopt(struct sock *sk, int level,
 		int optname, char __user *optval, unsigned int optlen)
@@ -2491,7 +2519,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 	case TCP_MAXSEG:
 		/* Values greater than interface MTU won't take effect. However
 		 * at the point when this call is done we typically don't yet
-		 * know which interface is going to be used */
+		 * know which interface is going to be used
+		 */
+
 		if (val < TCP_MIN_MSS || val > MAX_TCP_WINDOW) {
 			err = -EINVAL;
 			break;
@@ -2509,6 +2539,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 			 * an explicit push, which overrides even TCP_CORK
 			 * for currently queued segments.
 			 */
+
 			tp->nonagle |= TCP_NAGLE_OFF|TCP_NAGLE_PUSH;
 			tcp_push_pending_frames(sk);
 		} else {
@@ -2786,7 +2817,8 @@ void tcp_get_info(const struct sock *sk, struct tcp_info *info)
 	info->tcpi_fackets = tp->fackets_out;
 
 	info->tcpi_last_data_sent = jiffies_to_msecs(now - tp->lsndtime);
-	info->tcpi_last_data_recv = jiffies_to_msecs(now - icsk->icsk_ack.lrcvtime);
+	info->tcpi_last_data_recv =
+		 jiffies_to_msecs(now - icsk->icsk_ack.lrcvtime);
 	info->tcpi_last_ack_recv = jiffies_to_msecs(now - tp->rcv_tstamp);
 
 	info->tcpi_pmtu = icsk->icsk_pmtu_cookie;
@@ -3378,12 +3410,12 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
 }
 EXPORT_SYMBOL(tcp_md5_hash_skb_data);
 
-int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *key)
+int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *k)
 {
 	struct scatterlist sg;
 
-	sg_init_one(&sg, key->key, key->keylen);
-	return crypto_hash_update(&hp->md5_desc, &sg, key->keylen);
+	sg_init_one(&sg, k->key, k->keylen);
+	return crypto_hash_update(&hp->md5_desc, &sg, k->keylen);
 }
 EXPORT_SYMBOL(tcp_md5_hash_key);
 
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net-next V4 04/13] bridge: Verify that a vlan is allowed to egress on give port
From: Shmulik Ladkani @ 2012-12-20 14:28 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-5-git-send-email-vyasevic@redhat.com>

Hi Vlad,

On Wed, 19 Dec 2012 12:48:15 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>  /* Don't forward packets to originating port or forwarding diasabled */
>  static inline int should_deliver(const struct net_bridge_port *p,
>  				 const struct sk_buff *skb)
>  {
>  	return (((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
> +		br_allowed_egress(p, skb) &&
>  		p->state == BR_STATE_FORWARDING);
>  }

This should be also encorporated into 'br_pass_frame_up' somehow.

Egress permission when leaving the bridge towards IP stack ("egress"
on the "bridge master port" from bridging point-of-view) should be
validated according to master port's membership.

Regards,
Shmulik

^ permalink raw reply

* [PATCH 1/3] iproute2: distinguish permanent and temporary mdb entries
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, bridge, Cong Wang

This patch adds a flag to mdb entries so that we can distinguish
permanent entries with temporary ones.

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 bridge/mdb.c              |   24 +++++++++++++++---------
 include/linux/if_bridge.h |    3 +++
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 121ce9c..6217c5f 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -28,7 +28,7 @@ int filter_index;
 
 static void usage(void)
 {
-	fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP\n");
+	fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp]\n");
 	fprintf(stderr, "       bridge mdb {show} [ dev DEV ]\n");
 	exit(-1);
 }
@@ -53,13 +53,15 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e)
 	SPRINT_BUF(abuf);
 
 	if (e->addr.proto == htons(ETH_P_IP))
-		fprintf(f, "bridge %s port %s group %s\n", ll_index_to_name(ifindex),
+		fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
 			ll_index_to_name(e->ifindex),
-			inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)));
+			inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)),
+			(e->state & MDB_PERMANENT) ? "permanent" : "temp");
 	else
-		fprintf(f, "bridge %s port %s group %s\n", ll_index_to_name(ifindex),
+		fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
 			ll_index_to_name(e->ifindex),
-			inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)));
+			inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)),
+			(e->state & MDB_PERMANENT) ? "permanent" : "temp");
 }
 
 static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr)
@@ -179,11 +181,15 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
 		} else if (strcmp(*argv, "grp") == 0) {
 			NEXT_ARG();
 			grp = *argv;
+		} else if (strcmp(*argv, "port") == 0) {
+			NEXT_ARG();
+			p = *argv;
+		} else if (strcmp(*argv, "permanent") == 0) {
+			if (cmd == RTM_NEWMDB)
+				entry.state |= MDB_PERMANENT;
+		} else if (strcmp(*argv, "temp") == 0) {
+			;/* nothing */
 		} else {
-			if (strcmp(*argv, "port") == 0) {
-				NEXT_ARG();
-				p = *argv;
-			}
 			if (matches(*argv, "help") == 0)
 				usage();
 		}
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b3b6a67..aac8b8c 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -163,6 +163,9 @@ struct br_port_msg {
 
 struct br_mdb_entry {
 	__u32 ifindex;
+#define MDB_TEMPORARY 0
+#define MDB_PERMANENT 1
+	__u8 state;
 	struct {
 		union {
 			__be32	ip4;
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 2/3] iproute2: update help info of bridge command
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
  To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger
In-Reply-To: <1356013915-20835-1-git-send-email-amwang@redhat.com>

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 bridge/bridge.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index 1fcd365..1d59a1e 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -27,7 +27,7 @@ static void usage(void)
 {
 	fprintf(stderr,
 "Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
-"where  OBJECT := { fdb |  monitor }\n"
+"where  OBJECT := { fdb |  mdb | monitor }\n"
 "       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails]\n" );
 	exit(-1);
 }
-- 
1.7.7.6

^ permalink raw reply related

* [PATCH 3/3] iproute2: make `bridge mdb` output consistent with input
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
  To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger
In-Reply-To: <1356013915-20835-1-git-send-email-amwang@redhat.com>

bridge -> dev
group -> grp

Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 bridge/mdb.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 6217c5f..81d479b 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -53,12 +53,12 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e)
 	SPRINT_BUF(abuf);
 
 	if (e->addr.proto == htons(ETH_P_IP))
-		fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
+		fprintf(f, "dev %s port %s grp %s %s\n", ll_index_to_name(ifindex),
 			ll_index_to_name(e->ifindex),
 			inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)),
 			(e->state & MDB_PERMANENT) ? "permanent" : "temp");
 	else
-		fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
+		fprintf(f, "dev %s port %s grp %s %s\n", ll_index_to_name(ifindex),
 			ll_index_to_name(e->ifindex),
 			inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)),
 			(e->state & MDB_PERMANENT) ? "permanent" : "temp");
-- 
1.7.7.6

^ permalink raw reply related

* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 14:58 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
	xen-devel@lists.xensource.com, Ian Campbell,
	Konrad Rzeszutek Wilk
In-Reply-To: <1457826869.20121220152326@eikelenboom.it>


Thursday, December 20, 2012, 3:23:26 PM, you wrote:


> Thursday, December 20, 2012, 1:51:39 PM, you wrote:


>> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:

>>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:

>>>> Hi Ian,
>>>> 
>>>> It ran overnight and i haven't seen the warn_once trigger.
>>>> (but i also didn't with the previous patch)
>>>> 

>>> As I said, the miminum value to not trigger the warning was what Ian
>>> patch was doing, but it was still a not accurate estimation.

>>> Doing the real accounting might trigger slow transferts, or dropped
>>> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.

>>> So the real question was : If accounting for full pages, is your
>>> applications run as smooth as before, with no huge performance
>>> regression ?

>> Ok i have added some extra debug info (see diff's below), the code still uses the old calculation for truesize (in the hope to trigger the warn_on_once again), but also calculates the variants IanC came up with.

>> I haven't got a clear test case to trigger the warn_on_once, it happens just every once in a while during my normal usage and i'm not a netperf expert :-)
>> So at the moment i haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..

>> - The first variant (current code) seems to be the most effcient and a good estimation *most* of the the, but sometimes triggers the warn_on_once in skb_try_coalesce.
>> - The first variant (current code) seems to always substract from the truesize for small packets.
>> - The second variant always seems keep the truesize as is for most of the small network traffic, but it also seems to work ok for larger packets.
>> - The third variant seems to be a pretty wasteful estimation.

>> So the last variant seems to be rather wasteful, and the second one the most accurate so far.

>> Eric:
>>      From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
>>      When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?



>> [  116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [  117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
>> [  117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
>> [  117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
>> [  117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
>> [  117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
>> [  117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
>> [  117.539403] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
>> [  117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
>> [  117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
>> [  121.981917] net_ratelimit: 27 callbacks suppressed
>> [  121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [  126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)



>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index c26e28b..8833e38 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>>         struct sk_buff_head tmpq;
>>         unsigned long flags;
>>         int err;
>> +       int tsz,len;

>>         spin_lock(&np->rx_lock);

>> @@ -1037,9 +1038,22 @@ err:
>>                  * receive throughout using the standard receive
>>                  * buffer size was cut by 25%(!!!).
>>                  */
>> -               skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> +
>> +
>> +
>> +
>> +                tsz = skb->truesize;
>> +                len = skb->len;
>> +                /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
>> +                skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>>                 skb->len += skb->data_len;

>> +               net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
>> +                        skb->dev->name, skb->dev->mtu, skb->data_len, len,  skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
>> +                        skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
>> +                        skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
>> +                        PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
>> +
>>                 if (rx->flags & XEN_NETRXF_csum_blank)
>>                         skb->ip_summed = CHECKSUM_PARTIAL;
>>                 else if (rx->flags & XEN_NETRXF_data_validated)
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 3ab989b..6d0cd86 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,

>>         WARN_ON_ONCE(delta < len);

>> +       if(delta < len) {
>> +               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> +                        to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> +       }
>> +
+       if (delta >>> len && delta - len > 100) {
>> +               net_warn_ratelimited("to: %s from: %s  skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> +                        to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> +       }
>> +
>>         memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>>                skb_shinfo(from)->frags,
>>                skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));



> Ok i succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was just rate limited away for the offending packet :(

> Dec 20 15:17:33 media kernel: [  393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [  393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [  393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [  393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [  393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  394.940211] ------------[ cut here ]------------
> Dec 20 15:17:35 media kernel: [  394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
> Dec 20 15:17:35 media kernel: [  394.940282] Modules linked in:
> Dec 20 15:17:35 media kernel: [  394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
> Dec 20 15:17:35 media kernel: [  394.940330] Call Trace:
> Dec 20 15:17:35 media kernel: [  394.940343]  <IRQ>  [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
> Dec 20 15:17:35 media kernel: [  394.940384]  [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
> Dec 20 15:17:35 media kernel: [  394.940409]  [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
> Dec 20 15:17:35 media kernel: [  394.940434]  [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
> Dec 20 15:17:35 media kernel: [  394.940458]  [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
> Dec 20 15:17:35 media kernel: [  394.940481]  [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
> Dec 20 15:17:35 media kernel: [  394.940504]  [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
> Dec 20 15:17:35 media kernel: [  394.940528]  [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [  394.940551]  [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
> Dec 20 15:17:35 media kernel: [  394.940576]  [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
> Dec 20 15:17:35 media kernel: [  394.940600]  [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [  394.940623]  [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
> Dec 20 15:17:35 media kernel: [  394.940666]  [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [  394.940694]  [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
> Dec 20 15:17:35 media kernel: [  394.940720]  [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
> Dec 20 15:17:35 media kernel: [  394.940745]  [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
> Dec 20 15:17:35 media kernel: [  394.940784]  [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
> Dec 20 15:17:35 media kernel: [  394.940807]  [<ffffffff818e4b68>] ip_rcv+0x218/0x300
> Dec 20 15:17:35 media kernel: [  394.940829]  [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
> Dec 20 15:17:35 media kernel: [  394.940853]  [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
> Dec 20 15:17:35 media kernel: [  394.940889]  [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
> Dec 20 15:17:35 media kernel: [  394.940914]  [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
> Dec 20 15:17:35 media kernel: [  394.940939]  [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
> Dec 20 15:17:35 media kernel: [  394.940964]  [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
> Dec 20 15:17:35 media kernel: [  394.941000]  [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
> Dec 20 15:17:35 media kernel: [  394.941024]  [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
> Dec 20 15:17:35 media kernel: [  394.941046]  [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [  394.941075]  [<ffffffff8184ed66>] net_rx_action+0x136/0x260
> Dec 20 15:17:35 media kernel: [  394.941098]  [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
> Dec 20 15:17:35 media kernel: [  394.941133]  [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
> Dec 20 15:17:35 media kernel: [  394.941157]  [<ffffffff819b623c>] call_softirq+0x1c/0x30
> Dec 20 15:17:35 media kernel: [  394.941179]  [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
> Dec 20 15:17:35 media kernel: [  394.941201]  [<ffffffff8107041e>] irq_exit+0x9e/0xd0
> Dec 20 15:17:35 media kernel: [  394.941235]  [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
> Dec 20 15:17:35 media kernel: [  394.941259]  [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
> Dec 20 15:17:35 media kernel: [  394.941279]  <EOI>  [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [  394.941318]  [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [  394.941356]  [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
> Dec 20 15:17:35 media kernel: [  394.941381]  [<ffffffff810092b2>] ? check_events+0x12/0x20
> Dec 20 15:17:35 media kernel: [  394.941405]  [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
> Dec 20 15:17:35 media kernel: [  394.941432]  [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
> Dec 20 15:17:35 media kernel: [  394.941473]  [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
> Dec 20 15:17:35 media kernel: [  394.941507]  [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
> Dec 20 15:17:35 media kernel: [  394.941533]  [<ffffffff819b2434>] ? __schedule+0x444/0x880
> Dec 20 15:17:35 media kernel: [  394.941555]  [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [  394.941580]  [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [  394.941614]  [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [  394.941638]  [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
> Dec 20 15:17:35 media kernel: [  394.941663]  [<ffffffff819b2904>] ? schedule+0x24/0x70
> Dec 20 15:17:35 media kernel: [  394.941697]  [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
> Dec 20 15:17:35 media kernel: [  394.941725]  [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [  394.941748]  [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
> Dec 20 15:17:35 media kernel: [  394.941770]  [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
> Dec 20 15:17:35 media kernel: [  394.941808]  [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
> Dec 20 15:17:35 media kernel: [  394.941833]  [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
> Dec 20 15:17:35 media kernel: [  394.941856]  [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
> Dec 20 15:17:35 media kernel: [  394.941878]  [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
> Dec 20 15:17:35 media kernel: [  394.941913]  [<ffffffff810b5b17>] ? lock_release+0x117/0x250
> Dec 20 15:17:35 media kernel: [  394.941938]  [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
> Dec 20 15:17:35 media kernel: [  394.941959]  [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
> Dec 20 15:17:35 media kernel: [  394.941981]  [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
> Dec 20 15:17:35 media kernel: [  394.942015]  [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
> Dec 20 15:17:35 media kernel: [  394.942036] ---[ end trace 6f3a832c9e91c8af ]---
> Dec 20 15:17:35 media kernel: [  394.942056] to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
> Dec 20 15:17:35 media kernel: [  394.968199] to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
> Dec 20 15:17:35 media kernel: [  395.262814] net_ratelimit: 371 callbacks suppressed
> Dec 20 15:17:35 media kernel: [  395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [  395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)


Hmm perhaps a better example, i have indented some perhaps interesting points:

        Dec 20 14:12:57 media kernel: [  794.895136] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
        Dec 20 14:12:57 media kernel: [  794.895431] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
        Dec 20 14:12:57 media kernel: [  794.895616] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18824(19720) variant3:24576(25472)
        Dec 20 14:12:57 media kernel: [  794.895804] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
        Dec 20 14:12:57 media kernel: [  794.895823] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:3 variant1:7050(7946) variant2:7050(7946) variant3:12288(13184)
        Dec 20 14:12:57 media kernel: [  794.895868] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:3
        Dec 20 14:12:57 media kernel: [  794.896133] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
        Dec 20 14:12:57 media kernel: [  794.896152] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
        Dec 20 14:12:57 media kernel: [  794.896200] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:1
        Dec 20 14:12:57 media kernel: [  794.907232] eth0: mtu:1500 data_len:23234 len before:0 len after:23234 truesize before:896 truesize after:23874 nr_frags:7 variant1:22978(23874) variant2:22978(23874) variant3:28672(29568)
        Dec 20 14:12:57 media kernel: [  794.907517] eth0: mtu:1500 data_len:24682 len before:0 len after:24682 truesize before:896 truesize after:25322 nr_frags:7 variant1:24426(25322) variant2:24426(25322) variant3:28672(29568)
        Dec 20 14:12:57 media kernel: [  794.907693] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)
        Dec 20 14:12:57 media kernel: [  794.907882] eth0: mtu:1500 data_len:14546 len before:0 len after:14546 truesize before:896 truesize after:15186 nr_frags:5 variant1:14290(15186) variant2:14290(15186) variant3:20480(21376)
        Dec 20 14:12:57 media kernel: [  794.907901] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
        Dec 20 14:12:57 media kernel: [  794.907938] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:13482 len:13032 from->truesize:13738 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:4
        Dec 20 14:12:57 media kernel: [  794.908191] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28880(29776) variant3:36864(37760)
        Dec 20 14:12:57 media kernel: [  794.908386] eth0: mtu:1500 data_len:30474 len before:0 len after:30474 truesize before:896 truesize after:31114 nr_frags:8 variant1:30218(31114) variant2:30218(31114) variant3:32768(33664)

A1) Here we have a packet data_len: 5858 and truesize set to 6498 and nr_frags: 2
        Dec 20 14:12:57 media kernel: [  794.908560] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:2 variant1:5602(6498) variant2:5602(6498) variant3:8192(9088)

        Dec 20 14:12:57 media kernel: [  794.908581] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)

A2) That seems to end up in skb_try_coalesce, from->nr_frags is still 2, delta >> LEN in this case, no warning but perhaps wasteful ?
        Dec 20 14:12:57 media kernel: [  794.908616] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:6242 len:5792 from->truesize:6498 skb_headlen(from):190 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2

        Dec 20 14:12:57 media kernel: [  794.908834] eth0: mtu:1500 data_len:33370 len before:0 len after:33370 truesize before:896 truesize after:34010 nr_frags:9 variant1:33114(34010) variant2:33114(34010) variant3:36864(37760)

B1) Here we have again a packet data_len: 5858 and truesize set to 6498, but nr_frags: 3 this time.
        Dec 20 14:12:57 media kernel: [  794.908992] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:3 variant1:5602(6498) variant2:5792(6688) variant3:12288(13184)
        Dec 20 14:12:57 media kernel: [  794.909012] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:8 variant1:28770(29666) variant2:28770(29666) variant3:32768(33664)

B2) That seems to end up in skb_try_coalesce, from->nr_frags is now 2 instead of 3, delta < LEN in this case, so it would have triggered the warn_on_once
        Dec 20 14:12:57 media kernel: [  794.909040] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA < LEN delta:5602 len:5792 from->truesize:6498 skb_headlen(from):0 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2

        Dec 20 14:12:57 media kernel: [  794.909673] eth0: mtu:1500 data_len:1514 len before:0 len after:1514 truesize before:896 truesize after:2154 nr_frags:1 variant1:1258(2154) variant2:1258(2154) variant3:4096(4992)
        Dec 20 14:12:57 media kernel: [  794.909692] eth0: mtu:1500 data_len:522 len before:0 len after:522 truesize before:896 truesize after:1162 nr_frags:1 variant1:266(1162) variant2:266(1162) variant3:4096(4992)
        Dec 20 14:12:57 media kernel: [  794.909736] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:906 len:456 from->truesize:1162 skb_headlen(from):190 skb_shinfo(to)->nr_frags:2 skb_shinfo(from)->nr_frags:1
        Dec 20 14:12:57 media kernel: [  794.910205] eth0: mtu:1500 data_len:36266 len before:0 len after:36266 truesize before:896 truesize after:36906 nr_frags:10 variant1:36010(36906) variant2:36010(36906) variant3:40960(41856)
        Dec 20 14:12:57 media kernel: [  794.910706] eth0: mtu:1500 data_len:37714 len before:0 len after:37714 truesize before:896 truesize after:38354 nr_frags:10 variant1:37458(38354) variant2:37458(38354) variant3:40960(41856)
        Dec 20 14:12:57 media kernel: [  794.911472] eth0: mtu:1500 data_len:27578 len before:0 len after:27578 truesize before:896 truesize after:28218 nr_frags:8 variant1:27322(28218) variant2:27322(28218) variant3:32768(33664)
        Dec 20 14:12:57 media kernel: [  794.911695] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28770(29666) variant3:36864(37760)
        Dec 20 14:12:57 media kernel: [  795.015511] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
        Dec 20 14:12:57 media kernel: [  795.015585] skbuff: to: (null) from: (null)  skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:10 skb_shinfo(from)->nr_frags:1
        Dec 20 14:12:57 media kernel: [  795.015641] eth0: mtu:1500 data_len:10202 len before:0 len after:10202 truesize before:896 truesize after:10842 nr_frags:4 variant1:9946(10842) variant2:9946(10842) variant3:16384(17280)
        Dec 20 14:12:57 media kernel: [  795.015657] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
        Dec 20 14:12:58 media kernel: [  795.817824] net_ratelimit: 9 callbacks suppressed

--
Sander

^ permalink raw reply

* Re: [PATCH] net: ipv4: route: fixed a coding style issues net: ipv4: tcp: fixed a coding style issues
From: Eric Dumazet @ 2012-12-20 15:23 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: Stefan Hasko, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <50D2FF86.3000603@6wind.com>

On Thu, 2012-12-20 at 13:07 +0100, Nicolas Dichtel wrote:
> Le 20/12/2012 09:08, Stefan Hasko a écrit :

> > +				"out_hlist_search\n");
> checkpatch will warn you about this one, something like:
> "WARNING: quoted string split across lines".
> Not breaking such line ease to grep the pattern.


Yes.

Could we please leave this file as is for at least 2 years ?

We had a lot of recent changes and probable fixes are expected.

Such "coding style" patches are a real pain when trying to fix bugs,
especially dealing with stable/old kernels.

Thanks

^ permalink raw reply

* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
From: Yury Stankevich @ 2012-12-20 14:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D305FD.7000901@mojatatu.com>

interesting,

#tc -s filter show dev usb0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
0 terminal flowid ???  (rule hit 707 success 707)
  match 00000000/00000000 at 0 (success 707 )
	action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
	target  CONNMARK restore
	index 5 ref 1 bind 1 installed 394 sec used 11 sec
	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: mirred (Egress Redirect to device ifb0) stolen
 	index 5 ref 1 bind 1 installed 394 sec used 11 sec
 	Action statistics:
	Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

so, looks like packets was sent to CONNMARK target.

but...
i make a iptables rule to log packets with 0xa mark:

Chain PREROUTING (policy ACCEPT 1308 packets, 848K bytes)
 pkts bytes target     prot opt in     out     source
destination
    0     0 NFLOG      all  --  *      *       0.0.0.0/0
0.0.0.0/0            mark match 0xa nflog-group 1

Chain POSTROUTING (policy ACCEPT 1240 packets, 550K bytes)
 pkts bytes target     prot opt in     out     source
destination
    1    40 CONNMARK   tcp  --  *      *       0.0.0.0/0
0.0.0.0/0            tcp dpt:80 connmark match  0x0 connbytes 204800
connbytes mode bytes connbytes direction both CONNMARK set 0xa

idea is:
i run downloading, rule from POSTROUTING must fire if i download more
than ~200K,
tc filter call to CONNMARK restore, must restore mark (0xa) for packets
belong to this connection.
so i expect, that PREROUTING rule must notice the restored mark, but it
doesn't.
maybe i miss something ?


20.12.2012 16:35, Jamal Hadi Salim пишет:
> 
> Could be your setup. I didnt do a lot of testing but
> from my notes (running different kernel at the moment):
> 
> #try to point to everything (no iptables setup)
> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid
> 23:23 action xt -j CONNMARK --restore-mark
> #let it run for a 1 sec then display with
> tc -s filter show dev eth0 parent ffff:
> 
> ----
> filter protocol ip pref 49152 u32
> filter protocol ip pref 49152 u32 fh 800: ht divisor 1
> filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
> 0 flowid 23:23
>   match 00000000/00000000 at 0
>     action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
>     target  CONNMARK restore
>     index 1 ref 1 bind 1 installed 3 sec used 1 sec
>     Action statistics:
>     Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
>     backlog 0b 0p requeues 0
> ----
> 
> cheers,
> jamal
> 
> On 12-12-20 03:54 AM, Yury Stankevich wrote:
>> 19.12.2012 15:56, Jamal Hadi Salim пишет:
>>> Hasan/Yury, if you test this please use the latest iproute2 with only
>>> the first patch I posted (originally from Hasan). Hasan please use that
>>> patch not your version - if theres anything wrong we can find out sooner
>>> before the patch becomes final.
>>
>> Hello,
>> 3.7.1 kernel with 3.7.0 iproute,
>> patch-xt, xt-p1 + linkage fix was applyed
>> command successfully performed, but actually doesn't work.
>>
>> command:
>> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>>              action xt -j CONNMARK --restore-mark \
>>              action mirred egress redirect dev ifb0
>> then i use filter:
>>
>> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
>> 1:102
>>
>> iptables line:
>> iptable -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
>> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
>> bytes -j CONNMARK --set-mark 0xa
>>
>> once i run a test to download 300K file,
>> from iptables counters i can see that rule in POSTROUTING is triggered,
>> but from `tc -s qdisc show dev ifb0` i see that no packets was sent to
>> 1:102 flow.
>>
>> btw,
>> tc -p -s filter show dev ifb0 parent 1:
>> do not show stats `(rule hit 416 success 0)` for this (filter protocol
>> ip pref 2 fw handle 0xa classid 1:102) rule.
>>
>>
>>
> 


-- 
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>

^ permalink raw reply

* Re: [PATCH net-next V4 02/13] bridge: Add vlan filtering infrastructure
From: Vlad Yasevich @ 2012-12-20 15:31 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <20121220153913.11a10fd0@pixies.home.jungo.com>

On 12/20/2012 08:39 AM, Shmulik Ladkani wrote:
> Hi Vlad,
>
> On Wed, 19 Dec 2012 12:48:13 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>> +static void nbp_vlan_flush(struct net_bridge_port *p)
>> +{
>> +	struct net_port_vlan *pve;
>> +	struct net_port_vlan *tmp;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)
>> +		nbp_vlan_delete(p, pve->vid, BRIDGE_FLAGS_SELF);
>
> Why would you want to clear "bridge master port" association from this
> vlan, in the event of NBP destruction?
> The "bridge port" may still be a member of this vlan, doesn't it?
> Seems flags argument should be 0.

This ends up getting fixed later, but you are right.  This should be 0.

>
>> +#define BR_VID_HASH_SIZE (1<<6)
>> +#define br_vlan_hash(vid) ((vid) % (BR_VID_HASH_SIZE - 1))
>
> Did you mean:                       & (BR_VID_HASH_SIZE - 1)

yes.

thanks
-vlad

>
> Regards,
> Shmulik
>

^ permalink raw reply

* Re: [PATCH] net: ipv4: route: fix coding style issues net: ipv4: tcp: fix coding style issues
From: Eric Dumazet @ 2012-12-20 15:31 UTC (permalink / raw)
  To: Stefan Hasko
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <1356013685-31649-1-git-send-email-hasko.stevo@gmail.com>

On Thu, 2012-12-20 at 15:28 +0100, Stefan Hasko wrote:
> Fix a coding style issues.
> 
> Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
> ---
>  net/ipv4/route.c |  119 ++++++++++++++++-------------
>  net/ipv4/tcp.c   |  218 +++++++++++++++++++++++++++++++-----------------------
>  2 files changed, 194 insertions(+), 143 deletions(-)


I Nack this patch and any such patches in net/core, net/ipv4 trees for a
while.

We had too many recent changes and probably need a bunch of real fixes.

Thanks

^ permalink raw reply

* Re: [PATCH] xen/netfront: improve truesize tracking
From: Eric Dumazet @ 2012-12-20 15:39 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
	annie li, xen-devel@lists.xensource.com
In-Reply-To: <1797374383.20121220135139@eikelenboom.it>

On Thu, 2012-12-20 at 13:51 +0100, Sander Eikelenboom wrote:

> Eric:
>      From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
>      When you say "accurate estimation", what would be a acceptable difference between DELTA and LEN ?

I would use the most exact value, which is :

   skb->truesize += nr_frags * PAGE_SIZE;

Then, if we can spot later a regression in some stacks, adapt the
limiting parameters. I did a lot of work in GRO and TCP stack to reduce
the memory, and further changes are possible.

We really want to account memory, because we want to control how memory
is used on our machines and don't let some users use more than the
amount that was allowed to them.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox