* Re: [PATCH net-next V4 00/13] Add basic VLAN support to bridges
From: Vitalii Demianets @ 2012-12-20 10:08 UTC
To: Andrew Collins
Cc: Vlad Yasevich, netdev, shemminger, davem, or.gerlitz, jhs, mst,
erdnetdev, jiri
In-Reply-To: <CAKTPYJTAB-oOW5UE9EbNxwA+XbhmJu1FLrvq_mU8B1Qi6trxeA@mail.gmail.com>
On Thursday 20 December 2012 00:54:27 Andrew Collins wrote:
> On Wed, Dec 19, 2012 at 10:48 AM, Vlad Yasevich <vyasevic@redhat.com> wrote:
> > This series of patches provides an ability to add VLANs to the bridge
> > ports. This is similar to what can be found in most switches. The
> > bridge port may have any number of VLANs added to it including vlan 0
> > priority tagged traffic. When vlans are added to the port, only traffic
> > tagged with a particular vlan will be forwarded over this port. Additionally,
> > vlan ids are added to FDB entries and become part of the lookup. This
> > way we correctly identify the FDB entry.
>
> This is likely well beyond the scope of this change, but I figured I'd
> throw out the question anyway. This changeset looks to bring the
> Linux bridging code closer to the 802.1Q-2005 definition of a bridge,
> which is nice to see, I'm curious if this changeset also opens up the
> possibility of supporting MSTP in the future? The big thing I see
> missing is per-VLAN port state, although I'm not very familiar with
> the current STP/bridge interactions. Has anyone put any thought into
> what other necessary bridge pieces might be missing for MSTP support?
I think that to be compatible with 802.1Q-2005 we need the following pieces:
1) Support for multiple FIDs (in 802.1Q terms, each FID identifies a separate
filtering database, i.e. an FDB). This means that the kernel should support
several independent FDBs on a single bridge. The 802.1Q-2005 standard
requires the number of supported FDBs to be no less than the number of
different MSTIs the implementation supports;
2) A VLAN-to-FDB mapping;
3) Support for Multiple Spanning Tree Instances (MSTIs);
4) An FDB-to-MSTI mapping;
5) And finally, per-MSTI port states.
> obviously something to handle the MSTP protocol itself would need to exist
> as well
Please look here: http://sourceforge.net/projects/mstpd/
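A minimal C sketch of the data model implied by items 2, 4 and 5 above: each VID maps to a FID, each FID maps to an MSTI, and each port keeps one state per MSTI. All names and sizes here are hypothetical (the bridge code has no such structures), and using index 0 for the CIST is a modeling choice, not something taken from the patch set.

```c
#include <stdint.h>

/* Hypothetical sizes: this model supports 8 MSTIs and, as 802.1Q-2005
 * requires, at least as many FDBs (FIDs) as MSTIs. */
#define MODEL_VID_COUNT 4096
#define MODEL_MAX_MSTI  8
#define MODEL_MAX_FID   MODEL_MAX_MSTI

enum model_port_state { PS_DISCARDING, PS_LEARNING, PS_FORWARDING };

struct model_bridge {
	uint8_t vid_to_fid[MODEL_VID_COUNT]; /* item 2: VLAN-to-FDB mapping */
	uint8_t fid_to_msti[MODEL_MAX_FID];  /* item 4: FDB-to-MSTI mapping */
};

struct model_port {
	/* item 5: one state per MSTI; index 0 stands for the CIST here */
	enum model_port_state state[MODEL_MAX_MSTI];
};

/* Resolve the state that applies to a frame with VLAN id `vid` on port p:
 * VID -> FID -> MSTI -> per-MSTI port state. Out-of-range mappings are
 * treated as discarding. */
static enum model_port_state
model_port_state_for_vid(const struct model_bridge *br,
			 const struct model_port *p, uint16_t vid)
{
	uint8_t fid = br->vid_to_fid[vid & 0xfff];
	uint8_t msti;

	if (fid >= MODEL_MAX_FID)
		return PS_DISCARDING;
	msti = br->fid_to_msti[fid];
	if (msti >= MODEL_MAX_MSTI)
		return PS_DISCARDING;
	return p->state[msti];
}
```

With this layout, moving a port to forwarding in one MSTI affects every VLAN whose FID maps to that MSTI, which is the property MSTP relies on.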
* Re: [PATCH] bridge: call br_netpoll_disable in br_add_if
From: Cong Wang @ 2012-12-20 10:33 UTC
To: Gao feng; +Cc: netdev, shemminger, davem
In-Reply-To: <1355996503-19318-1-git-send-email-gaofeng@cn.fujitsu.com>
On Thu, 2012-12-20 at 17:41 +0800, Gao feng wrote:
> When netdev_set_master fails in br_add_if, we should
> call br_netpoll_disable to do some cleanup jobs, such
> as freeing the memory of the struct netpoll which was
> allocated in br_netpoll_enable.
>
> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Looks good!
Acked-by: Cong Wang <amwang@redhat.com>
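The fix follows the usual kernel error-unwinding idiom: each step that succeeds gets a label that undoes it, and a later failure jumps to the label that unwinds everything done so far, in reverse order. A minimal sketch with hypothetical stand-ins (`model_netpoll_enable()` and friends, not the real bridge functions) that counts outstanding netpoll allocations so a leak would be visible:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real br_netpoll_enable(), netdev_set_master()
 * and br_netpoll_disable() take bridge/port arguments. */
static int netpoll_allocs; /* outstanding struct netpoll allocations */

static int model_netpoll_enable(void)
{
	netpoll_allocs++;          /* pretend we allocated a struct netpoll */
	return 0;
}

static void model_netpoll_disable(void)
{
	netpoll_allocs--;          /* ...and freed it again */
}

static int model_set_master(bool fail)
{
	return fail ? -EBUSY : 0;  /* any negative errno would do here */
}

static int model_add_if(bool master_fails)
{
	int err;

	err = model_netpoll_enable();
	if (err)
		goto err_out;

	err = model_set_master(master_fails);
	if (err)
		goto err_netpoll; /* the fix: undo netpoll before bailing out */

	return 0;

err_netpoll:
	model_netpoll_disable();
err_out:
	return err;
}
```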
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Yurij M. Plotnikov @ 2012-12-20 11:22 UTC
To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky
In-Reply-To: <20121220073445.GM18940@secunet.com>
On 12/20/12 11:34, Steffen Klassert wrote:
> On Wed, Dec 19, 2012 at 07:37:44PM +0000, Ben Hutchings wrote:
>
>> On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote:
>>
>>> On 12/19/12 17:35, Ben Hutchings wrote:
>>>
>>>> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
>>>>
>>>>
>>>>> On kernel 3.7.1 I see strange behaviour of the IP_MTU_DISCOVER socket
>>>>> option. On a SOCK_DGRAM socket, the IP_PMTUDISC_DO and IP_PMTUDISC_WANT
>>>>> values behave the same: with IP_PMTUDISC_WANT, packets are always sent
>>>>> with the "Don't Fragment" bit set. Also, the value of the IP_MTU socket
>>>>> option is not updated.
>>>>>
>>>>>
>>>> You could try reverting:
>>>>
>>>> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
>>>> Author: Steffen Klassert<steffen.klassert@secunet.com>
>>>> Date: Mon Oct 8 00:56:54 2012 +0000
>>>>
>>>> ipv4: Don't report stale pmtu values to userspace
>>>>
>>>> We report cached pmtu values even if they are already expired.
>>>> Change this to not report these values after they are expired
>>>> and fix a race in the expire time calculation, as suggested by
>>>> Eric Dumazet.
>>>>
>>>> Still, PMTU information is not supposed to expire for 10 minutes...
>>>>
>>>>
>>>>
>>> With the commit reverted there is no such problem on 3.7.1: IP_MTU is
>>> updated, and with IP_PMTUDISC_WANT DF is set only for the first
>>> packet.
>>>
>> [...]
>>
>> So it looks like something is going wrong with the expiry calculation
>> here.
>>
>> This change shouldn't affect the PMTU actually used by the kernel, but
>> could affect Onload since that relies on netlink route updates to keep
>> in synch. You didn't say you were using Onload, but if you are then we
>> should not bother netdev with this until we can demonstrate a problem
>> that involves only the kernel stack.
>>
>>
> I'm really surprised that this change can have such an effect;
> it changes nothing in the kernel's pmtu handling. When looking
> at the code, I found that we may report a mtu value from a stale
> dst_entry when we query the mtu value with the IP_MTU socket
> option. But a subsequent send() should update the socket cached
> dst_entry, so at most one packet should be affected.
>
> Does the patch below change anything?
>
>
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 3c9d208..1049ce0 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
> {
> struct dst_entry *dst;
> val = 0;
> - dst = sk_dst_get(sk);
> + dst = sk_dst_check(sk, 0);
> if (dst) {
> val = dst_mtu(dst);
> dst_release(dst);
>
With this patch kernel 3.7.1 works perfectly. All described problems are
fixed.
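For anyone wanting to reproduce this, a minimal userspace sketch of the socket-option sequence discussed in this thread (an assumption about how such a test might look, not Yurij's actual test program): set IP_MTU_DISCOVER on a UDP socket, connect it, then read IP_MTU. Checking whether DF is actually set on the wire still needs a packet capture.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Query IP_MTU on a UDP socket connected to ip:port after setting
 * IP_MTU_DISCOVER to `pmtudisc` (e.g. IP_PMTUDISC_WANT or IP_PMTUDISC_DO).
 * Returns the MTU, or -errno on failure. */
static int udp_query_ip_mtu(const char *ip, unsigned short port, int pmtudisc)
{
	struct sockaddr_in dst;
	socklen_t len;
	int fd, mtu, ret;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER,
		       &pmtudisc, sizeof(pmtudisc)) < 0)
		goto fail;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1) {
		errno = EINVAL;
		goto fail;
	}

	/* IP_MTU reports the MTU of the socket's cached route, so the socket
	 * must be connected; otherwise the kernel returns ENOTCONN. */
	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
		goto fail;

	len = sizeof(mtu);
	if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
		goto fail;

	close(fd);
	return mtu;

fail:
	ret = -errno;   /* save errno before close() can overwrite it */
	close(fd);
	return ret;
}
```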
* Re: [PATCH] net: ipv4: route: fixed a coding style issues net: ipv4: tcp: fixed a coding style issues
From: Nicolas Dichtel @ 2012-12-20 12:07 UTC
To: Stefan Hasko
Cc: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <1355990910-3688-1-git-send-email-hasko.stevo@gmail.com>
On 20/12/2012 09:08, Stefan Hasko wrote:
> Fix a coding style issues.
>
> Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
> ---
> net/ipv4/route.c | 125 ++++++++++++++++++-------------
> net/ipv4/tcp.c | 218 +++++++++++++++++++++++++++++++-----------------------
> 2 files changed, 200 insertions(+), 143 deletions(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 844a9ef..fff7ce6 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -20,7 +20,7 @@
> * Alan Cox : Added BSD route gw semantics
> * Alan Cox : Super /proc >4K
> * Alan Cox : MTU in route table
> - * Alan Cox : MSS actually. Also added the window
> + * Alan Cox : MSS actually. Also added the window
> * clamper.
> * Sam Lantinga : Fixed route matching in rt_del()
> * Alan Cox : Routing cache support.
> @@ -31,30 +31,35 @@
> * Miquel van Smoorenburg : BSD API fixes.
> * Miquel van Smoorenburg : Metrics.
> * Alan Cox : Use __u32 properly
> - * Alan Cox : Aligned routing errors more closely with BSD
> + * Alan Cox : Aligned routing errors more
> + * closely with BSD
> * our system is still very different.
> * Alan Cox : Faster /proc handling
> - * Alexey Kuznetsov : Massive rework to support tree based routing,
> + * Alexey Kuznetsov : Massive rework to support
> + * tree based routing,
> * routing caches and better behaviour.
> *
> * Olaf Erb : irtt wasn't being copied right.
> * Bjorn Ekwall : Kerneld route support.
> * Alan Cox : Multicast fixed (I hope)
> - * Pavel Krauz : Limited broadcast fixed
> + * Pavel Krauz : Limited broadcast fixed
> * Mike McLagan : Routing by source
> * Alexey Kuznetsov : End of old history. Split to fib.c and
> * route.c and rewritten from scratch.
> * Andi Kleen : Load-limit warning messages.
> - * Vitaly E. Lavrov : Transparent proxy revived after year coma.
> + * Vitaly E. Lavrov : Transparent proxy revived
> + * after year coma.
> * Vitaly E. Lavrov : Race condition in ip_route_input_slow.
> - * Tobias Ringstrom : Uninitialized res.type in ip_route_output_slow.
> + * Tobias Ringstrom : Uninitialized res.type in
> + * ip_route_output_slow.
> * Vladimir V. Ivanov : IP rule info (flowid) is really useful.
> * Marc Boucher : routing by fwmark
> * Robert Olsson : Added rt_cache statistics
> * Arnaldo C. Melo : Convert proc stuff to seq_file
> - * Eric Dumazet : hashed spinlocks and rt_check_expire() fixes.
> - * Ilia Sotnikov : Ignore TOS on PMTUD and Redirect
> - * Ilia Sotnikov : Removed TOS from hash calculations
> + * Eric Dumazet : hashed spinlocks and
> + * rt_check_expire() fixes.
> + * Ilia Sotnikov : Ignore TOS on PMTUD and Redirect
> + * Ilia Sotnikov : Removed TOS from hash calculations
> *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of the GNU General Public License
> @@ -65,7 +70,7 @@
> #define pr_fmt(fmt) "IPv4: " fmt
>
> #include <linux/module.h>
> -#include <asm/uaccess.h>
> +#include <linux/uaccess.h>
> #include <linux/bitops.h>
> #include <linux/types.h>
> #include <linux/kernel.h>
> @@ -139,7 +144,8 @@ static unsigned int ipv4_default_advmss(const struct dst_entry *dst);
> static unsigned int ipv4_mtu(const struct dst_entry *dst);
> static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
> static void ipv4_link_failure(struct sk_buff *skb);
> -static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
> +static void ip_rt_update_pmtu(struct dst_entry *dst,
> + struct sock *sk,
> struct sk_buff *skb, u32 mtu);
> static void ip_do_redirect(struct dst_entry *dst, struct sock *sk,
> struct sk_buff *skb);
> @@ -291,12 +297,17 @@ static int rt_cpu_seq_show(struct seq_file *seq, void *v)
> struct rt_cache_stat *st = v;
>
> if (v == SEQ_START_TOKEN) {
> - seq_printf(seq, "entries in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
> + seq_printf(seq, "entries in_hit in_slow_tot in_slow_mc "
> + "in_no_route in_brd in_martian_dst "
> + "in_martian_src out_hit out_slow_tot "
> + "out_slow_mc gc_total gc_ignored "
> + "gc_goal_miss gc_dst_overflow in_hlist_search "
> + "out_hlist_search\n");
checkpatch will warn you about this one, with something like:
"WARNING: quoted string split across lines".
Not breaking such lines makes it easier to grep for the pattern.
Nicolas
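Because C joins adjacent string literals at compile time, both spellings below produce byte-for-byte the same header; the only difference is in the source. The kernel coding style asks that user-visible strings not be broken precisely so they stay greppable: a search for a phrase that spans a split, such as `in_slow_mc in_no_route`, finds the first form but not the second. This is a standalone userspace illustration, not the route.c code.

```c
#include <string.h>

/* One literal: the whole header can be found with a single grep. */
static const char header_one_line[] =
	"entries in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n";

/* Split into adjacent literals, as in the patch. The compiler concatenates
 * them, but a grep for a phrase spanning a split no longer matches. */
static const char header_split[] =
	"entries in_hit in_slow_tot in_slow_mc "
	"in_no_route in_brd in_martian_dst "
	"in_martian_src out_hit out_slow_tot "
	"out_slow_mc gc_total gc_ignored "
	"gc_goal_miss gc_dst_overflow in_hlist_search "
	"out_hlist_search\n";
```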
* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
From: Jamal Hadi Salim @ 2012-12-20 12:35 UTC
To: Yury Stankevich
Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt,
netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D2D229.6040802@gmail.com>
Could be your setup. I didn't do a lot of testing, but here is what
I have in my notes (running a different kernel at the moment):
# try to point to everything (no iptables setup)
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
  flowid 23:23 action xt -j CONNMARK --restore-mark
# let it run for 1 sec, then display with
tc -s filter show dev eth0 parent ffff:
----
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 23:23
match 00000000/00000000 at 0
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target CONNMARK restore
index 1 ref 1 bind 1 installed 3 sec used 1 sec
Action statistics:
Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
----
cheers,
jamal
On 12-12-20 03:54 AM, Yury Stankevich wrote:
> On 19.12.2012 15:56, Jamal Hadi Salim wrote:
>> Hasan/Yury, if you test this please use the latest iproute2 with only
>> the first patch I posted (originally from Hasan). Hasan please use that
>> patch not your version - if theres anything wrong we can find out sooner
>> before the patch becomes final.
>
> Hello,
> I'm using a 3.7.1 kernel with iproute 3.7.0, with patch-xt,
> xt-p1 and the linkage fix applied.
> The command runs successfully, but doesn't actually work.
>
> command:
> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
> action xt -j CONNMARK --restore-mark \
> action mirred egress redirect dev ifb0
> then I use this filter:
>
> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw \
>   flowid 1:102
>
> iptables rule:
> iptables -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0 \
>   -m connbytes --connbytes 204800: --connbytes-dir both \
>   --connbytes-mode bytes -j CONNMARK --set-mark 0xa
>
> Once I run a test that downloads a 300K file, the iptables counters
> show that the rule in POSTROUTING is triggered, but
> `tc -s qdisc show dev ifb0` shows that no packets were sent to
> the 1:102 flow.
>
> BTW,
> tc -p -s filter show dev ifb0 parent 1:
> shows `(rule hit 416 success 0)` for this rule (filter protocol
> ip pref 2 fw handle 0xa classid 1:102), i.e. it never matches.
>
>
>
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
From: Steffen Klassert @ 2012-12-20 12:35 UTC
To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky
In-Reply-To: <50D2F4E5.4050904@oktetlabs.ru>
On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote:
> On 12/20/12 11:34, Steffen Klassert wrote:
> >
> >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> >index 3c9d208..1049ce0 100644
> >--- a/net/ipv4/ip_sockglue.c
> >+++ b/net/ipv4/ip_sockglue.c
> >@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
> > {
> > struct dst_entry *dst;
> > val = 0;
> >- dst = sk_dst_get(sk);
> >+ dst = sk_dst_check(sk, 0);
> > if (dst) {
> > val = dst_mtu(dst);
> > dst_release(dst);
> With this patch kernel 3.7.1 works perfectly. All described problems
> are fixed.
Thanks for testing!
I'm not sure we can use this as a fix. I think with this patch it
could happen that we return -ENOTCONN instead of a pmtu value on a
connected socket. Perhaps it is better to update the cached dst_entry in
ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some
investigation.
Anyway, it is still odd that reverting my other patch 'fixes'
this issue too.
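Steffen's -ENOTCONN concern can be illustrated with a toy model (plain C; none of these types exist in the kernel). `sk_dst_get()` hands back whatever route the socket has cached, while `sk_dst_check()` drops a cached route that is obsolete and rejected by its check callback, returning NULL; the IP_MTU getsockopt path then reports -ENOTCONN even though the socket is connected. The model collapses the `dst->ops->check()` callback into a single `obsolete` flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a socket caches one route, which may have gone stale. */
struct toy_dst { int mtu; bool obsolete; };
struct toy_sock { struct toy_dst *dst_cache; };

/* Like sk_dst_get(): return the cached route, stale or not. */
static struct toy_dst *toy_dst_get(struct toy_sock *sk)
{
	return sk->dst_cache;
}

/* Like sk_dst_check(): forget an obsolete route and return NULL. */
static struct toy_dst *toy_dst_check(struct toy_sock *sk)
{
	if (sk->dst_cache && sk->dst_cache->obsolete)
		sk->dst_cache = NULL;
	return sk->dst_cache;
}

/* IP_MTU-style query: no cached route means -ENOTCONN. */
static int toy_get_ip_mtu(struct toy_sock *sk, bool use_check)
{
	struct toy_dst *dst = use_check ? toy_dst_check(sk) : toy_dst_get(sk);

	return dst ? dst->mtu : -ENOTCONN;
}
```

So the stale entry yields an outdated MTU with the old code and -ENOTCONN with the patch, until the next send() repopulates the cache.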
* RE: TCP delayed ACK heuristic
From: Cong Wang @ 2012-12-20 12:41 UTC
To: David Laight
Cc: David Miller, rick.jones2, netdev, greearb, eric.dumazet,
shemminger, tgraf
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70FC@saturn3.aculab.com>
On Thu, 2012-12-20 at 09:57 +0000, David Laight wrote:
> > So, can we at least have a sysctl to control the timeout of the delayed
> > ACK? I mean the minimum 40ms. TCP_QUICKACK can help too, but it requires
> > the receiver to modify the application and has to be set every time when
> > calling recv().
>
> A sysctl is inappropriate - it affects the entire TCP protocol stack.
>
> You want different behaviour for different remote hosts (probably
> different subnets).
> In particular your local subnet is unlikely to have packet loss
> and very likely to have a very low RTT.
>
> AFAICT a lot of the recent 'tuning' has been done for web/ftp
> servers that are very remote from the client. These connections
> are also request-response ones - quite often with large responses.
>
> IMHO This has been to the detriment of local connections.
>
A customer wants faster responses in their low-loss environment; a
40ms delay is too long. Of course, they are expected to know their
environment when they tune this.
Or maybe a sysctl equivalent to TCP_QUICKACK?
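The per-application alternative mentioned above looks roughly like this. tcp(7) documents TCP_QUICKACK as not permanent (the stack may leave quickack mode again on its own), which is why it has to be re-armed around every read. A minimal sketch; `enable_quickack()` and `recv_quickack()` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask for immediate ACKs on this TCP socket. */
static int enable_quickack(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
}

/* Hypothetical helper: TCP_QUICKACK is not sticky, so re-arm it before
 * each read. Failing to set it only means ACKs may be delayed again. */
static ssize_t recv_quickack(int fd, void *buf, size_t len, int flags)
{
	(void)enable_quickack(fd);
	return recv(fd, buf, len, flags);
}
```

This is the application change the thread wants to avoid; it only shows the mechanics, not whether a global knob would be better.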
* Re: [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 12:51 UTC
To: Eric Dumazet
Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
In-Reply-To: <1355933869.21834.13.camel@edumazet-glaptop>
Wednesday, December 19, 2012, 5:17:49 PM, you wrote:
> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:
>> Hi Ian,
>>
>> It ran overnight and I haven't seen the warn_once trigger.
>> (but I also didn't with the previous patch)
>>
> As I said, Ian's patch used the minimum value that does not trigger the
> warning, but it was still not an accurate estimation.
> Doing the real accounting might cause slow transfers, or dropped
> packets because socket limits (SNDBUF / RCVBUF) are hit sooner.
> So the real question was: when accounting for full pages, do your
> applications run as smoothly as before, with no huge performance
> regression?
OK, I have added some extra debug info (see the diffs below). The code still uses the old truesize calculation (in the hope of triggering the warn_on_once again), but also calculates the variants IanC came up with.
I don't have a clear test case that triggers the warn_on_once; it just happens every once in a while during my normal usage, and I'm not a netperf expert :-)
So far I haven't been able to trigger the warn_on_once again, but the results do seem to shed some light:
- The first variant (current code) seems to be the most efficient and a good estimation *most* of the time, but sometimes triggers the warn_on_once in skb_try_coalesce.
- The first variant (current code) always seems to subtract from the truesize for small packets.
- The second variant seems to keep the truesize as is for most small network traffic, while also working OK for larger packets.
- The third variant seems to be a pretty wasteful estimation.
So the last variant seems rather wasteful, and the second one the most accurate so far.
Eric:
From the warn_on_once, delta should not be smaller than len, but they should probably be as close together as possible.
When you say "accurate estimation", what would be an acceptable difference between DELTA and LEN?
[ 116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
[ 117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
[ 117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
[ 117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
[ 117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
[ 117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
[ 117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
[ 117.539403] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
[ 117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
[ 117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
[ 121.981917] net_ratelimit: 27 callbacks suppressed
[ 121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
[ 126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
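The three variants in the log can be reproduced by hand. A small arithmetic sketch, assuming RX_COPY_THRESHOLD is 256 (consistent with the logged 42 - 256 = -214) and a 4096-byte PAGE_SIZE. `pull_to` is not logged, so the values used in the checks (42 and 106) are back-computed from the logged variant2 deltas, which are `data_len - pull_to`.

```c
/* Assumed constants: RX_COPY_THRESHOLD as used by xen-netfront (256, which
 * matches the logged 42 - 256 = -214) and a 4096-byte PAGE_SIZE. */
#define MODEL_RX_COPY_THRESHOLD 256
#define MODEL_PAGE_SIZE 4096

struct truesize_deltas {
	int v1; /* data_len - RX_COPY_THRESHOLD (current code) */
	int v2; /* data_len - pull_to */
	int v3; /* PAGE_SIZE * nr_frags (account full pages) */
};

/* Amount each variant would add to skb->truesize. */
static struct truesize_deltas
compute_truesize_deltas(int data_len, int pull_to, int nr_frags)
{
	struct truesize_deltas d = {
		.v1 = data_len - MODEL_RX_COPY_THRESHOLD,
		.v2 = data_len - pull_to,
		.v3 = MODEL_PAGE_SIZE * nr_frags,
	};
	return d;
}
```

For a 42-byte packet starting from truesize 896, v1 gives 896 - 214 = 682 and v3 gives 896 + 4096 = 4992, matching the first log line; any packet shorter than the threshold makes v1 negative, which is the "always subtracts" behaviour described above.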
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c26e28b..8833e38 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
struct sk_buff_head tmpq;
unsigned long flags;
int err;
+ int tsz,len;
spin_lock(&np->rx_lock);
@@ -1037,9 +1038,22 @@ err:
* receive throughout using the standard receive
* buffer size was cut by 25%(!!!).
*/
- skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
+
+
+
+
+ tsz = skb->truesize;
+ len = skb->len;
+ /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
+ skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
skb->len += skb->data_len;
+ net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
+ skb->dev->name, skb->dev->mtu, skb->data_len, len, skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
+ skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
+ skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
+ PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
+
if (rx->flags & XEN_NETRXF_csum_blank)
skb->ip_summed = CHECKSUM_PARTIAL;
else if (rx->flags & XEN_NETRXF_data_validated)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3ab989b..6d0cd86 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
WARN_ON_ONCE(delta < len);
+ if(delta < len) {
+ net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
+ to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
+ }
+
+ if (delta > len && delta - len > 100) {
+ net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
+ to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
+ }
+
memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
skb_shinfo(from)->frags,
skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
* [PATCH net] net/vxlan: Use the underlying device index when joining/leaving multicast groups
From: Yan Burman @ 2012-12-20 13:36 UTC
To: shemminger; +Cc: netdev, ogerlitz, Yan Burman
The socket calls vxlan uses to join/leave multicast groups don't pass
the index of the underlying device; as a result, the stack uses the
first interface that is up. This makes vxlan non-functional over a
device that isn't the first one to be up.
Fix this by passing the link field of the vxlan instance (the
underlying device's ifindex) to the multicast calls.
Signed-off-by: Yan Burman <yanb@mellanox.com>
---
drivers/net/vxlan.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3b3fdf6..40f2cc1 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -505,7 +505,8 @@ static int vxlan_join_group(struct net_device *dev)
struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
struct sock *sk = vn->sock->sk;
struct ip_mreqn mreq = {
- .imr_multiaddr.s_addr = vxlan->gaddr,
+ .imr_multiaddr.s_addr = vxlan->gaddr,
+ .imr_ifindex = vxlan->link,
};
int err;
@@ -532,7 +533,8 @@ static int vxlan_leave_group(struct net_device *dev)
int err = 0;
struct sock *sk = vn->sock->sk;
struct ip_mreqn mreq = {
- .imr_multiaddr.s_addr = vxlan->gaddr,
+ .imr_multiaddr.s_addr = vxlan->gaddr,
+ .imr_ifindex = vxlan->link,
};
/* Only leave group when last vxlan is done. */
--
1.7.11.3
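The same knob is available to userspace through `struct ip_mreqn` (see ip(7)): setting `imr_ifindex` pins the group membership to one interface, while leaving it 0 with `imr_address` set to INADDR_ANY lets the kernel choose the interface itself, which is what the commit message describes going wrong for vxlan. A minimal sketch; `fill_mreqn()` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill an ip_mreqn for IP_ADD_MEMBERSHIP or
 * IP_DROP_MEMBERSHIP on the interface with index `ifindex`.
 * Returns 0 on success, -1 if `group` is not a dotted-quad address. */
static int fill_mreqn(struct ip_mreqn *mreq, const char *group, int ifindex)
{
	memset(mreq, 0, sizeof(*mreq));
	if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
		return -1;
	mreq->imr_address.s_addr = htonl(INADDR_ANY);
	mreq->imr_ifindex = ifindex; /* 0 would mean "any interface" */
	return 0;
}
```

The filled request would then be passed as `setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &m, sizeof(m))`, with the index typically obtained from `if_nametoindex()`.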
* Re: [PATCH net-next V4 02/13] bridge: Add vlan filtering infrastructure
From: Shmulik Ladkani @ 2012-12-20 13:39 UTC
To: Vlad Yasevich
Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-3-git-send-email-vyasevic@redhat.com>
Hi Vlad,
On Wed, 19 Dec 2012 12:48:13 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> +static void nbp_vlan_flush(struct net_bridge_port *p)
> +{
> + struct net_port_vlan *pve;
> + struct net_port_vlan *tmp;
> +
> + ASSERT_RTNL();
> +
> + list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)
> + nbp_vlan_delete(p, pve->vid, BRIDGE_FLAGS_SELF);
Why would you want to clear the "bridge master port" association from this
vlan when the NBP is destroyed?
The "bridge port" may still be a member of this vlan, can't it?
It seems the flags argument should be 0.
> +#define BR_VID_HASH_SIZE (1<<6)
> +#define br_vlan_hash(vid) ((vid) % (BR_VID_HASH_SIZE - 1))
Did you mean: & (BR_VID_HASH_SIZE - 1)
Regards,
Shmulik
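The hash point is easy to check numerically. With BR_VID_HASH_SIZE = 64, `vid % 63` only yields 0..62, so bucket 63 is never used (and, for example, VID 63 collides with VID 0), whereas `vid & 63` spreads VIDs over all 64 buckets because 64 is a power of two. The two macro names below are hypothetical variants of the patch's br_vlan_hash().

```c
#include <stdbool.h>

#define BR_VID_HASH_SIZE (1 << 6)  /* 64 buckets, as in the patch */

/* As posted: modulo 63, so bucket 63 is never produced. */
#define br_vlan_hash_mod(vid)  ((vid) % (BR_VID_HASH_SIZE - 1))
/* As suggested: mask with 63, valid because the size is a power of two. */
#define br_vlan_hash_mask(vid) ((vid) & (BR_VID_HASH_SIZE - 1))

/* Count how many distinct buckets a hash produces over VIDs 0..4095. */
static int buckets_used(bool use_mask)
{
	bool seen[BR_VID_HASH_SIZE] = { false };
	int vid, n = 0;

	for (vid = 0; vid < 4096; vid++) {
		int b = use_mask ? br_vlan_hash_mask(vid) : br_vlan_hash_mod(vid);

		if (!seen[b]) {
			seen[b] = true;
			n++;
		}
	}
	return n;
}
```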
* Re: Network namespace bugs in L2TP
From: Tom Parkin @ 2012-12-20 13:52 UTC
To: Eric W. Biederman; +Cc: netdev
In-Reply-To: <87r4mt4um7.fsf@xmission.com>
Hi Eric,
On Thu, Dec 13, 2012 at 11:31:12AM -0800, Eric W. Biederman wrote:
> Tom Parkin <tparkin@katalix.com> writes:
>
> > On Wed, Dec 12, 2012 at 11:44:36AM -0800, Eric W. Biederman wrote:
> >> Tom Parkin <tparkin@katalix.com> writes:
> > I think that raises a question in the case of the L2TP tunnel sockets,
> > though. Currently l2tp_tunnel_sock_create uses the namespace of the
> > current process for the socket. The alternative is to pass in the
> > desired namespace from l2tp_tunnel_create -- and this makes sense, I
> > think.
> >
> > However, when l2tp_tunnel_create is called from the netlink code, the
> > namespace passed is that of the netlink socket. At the risk of sounding
> > silly, what's the benefit of using the netlink socket namespace over the
> > process namespace in this case?
>
> Using the netlink socket namespace ensures that if the netlink socket is
> passed between processes the semantics of sending messages down the
> netlink socket don't change.
>
> There is another thread on netdev discussing another variant of this
> right now. For some cases it is just a waste of resources to have one
> copy of a daemon per network namespace. In which case a controlling
> daemon will open one netlink socket per network namespace and send
> commands down the appropriate socket for the network namespace the
> daemon wishes to control.
Yes, I saw that other thread. Thanks for the clarification on this
point.
> > But that doesn't seem too unreasonable. A user would have to take
> > explicit action to create an L2TP tunnel socket, and it might seem
> > reasonable for that socket to keep the namespace alive until the user
> > explicitly tears it down again.
>
> Sending a netlink message to tear down the socket is not unreasonable.
>
> Having a reference counting loop such that it is possible to close all
> other sockets and all other references to a network namespace and not
> have the network namespace go away because the L2TP tunnel socket holds
> a reference to the unreachable and unusable network namespace is
> unreasonable.
>
> We handle this for the arp and icmp control sockets by not taking a
> reference count, and by having a pernet cleanup routine clean up those
> sockets. Assuming I am right about the reference counting loop being
> possible this is something to look at.
Yep, OK. I hadn't appreciated the namespace could become inaccessible!
I've done some digging and I believe there is an issue with the
reference counting for the unmanaged tunnel sockets -- certainly I am
able to leak netns resources here.
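The reference-counting loop Eric describes, and the way the arp/icmp control sockets avoid it, can be sketched as a toy model (plain C; none of these types or functions exist in the kernel). A socket that holds a counted reference on its own namespace keeps the count from ever reaching zero once every user is gone; an uncounted socket lets the count drop to zero, at which point a per-namespace exit routine tears the socket down.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_sock;

struct toy_netns {
	int refcnt;
	bool freed;
	struct toy_sock *ctl;  /* kernel-internal socket living in this netns */
};

struct toy_sock {
	struct toy_netns *net;
	bool alive;
};

/* Stand-in for a pernet exit routine: tear down internal sockets. */
static void toy_pernet_exit(struct toy_netns *ns)
{
	if (ns->ctl)
		ns->ctl->alive = false;
	ns->freed = true;
}

static void toy_netns_put(struct toy_netns *ns)
{
	if (--ns->refcnt == 0)
		toy_pernet_exit(ns);
}

/* Create an internal socket; `counted` says whether it pins the netns. */
static void toy_sock_create(struct toy_netns *ns, struct toy_sock *sk,
			    bool counted)
{
	sk->net = ns;
	sk->alive = true;
	ns->ctl = sk;
	if (counted)
		ns->refcnt++;   /* the socket now keeps its own netns alive */
}
```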
I've been working on a patchset which I hope will address these issues
in l2tp_core. I'm stress testing it now and hope to post to netdev
soon for review.
Thanks again for your help.
Tom
--
Tom Parkin
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
* Re: [PATCH] 8139cp: Prevent dev_close/cp_interrupt race on MTU change
From: John Greene @ 2012-12-20 13:55 UTC
To: David Woodhouse; +Cc: David Miller, netdev
In-Reply-To: <1355950547.18919.93.camel@shinybook.infradead.org>
On 12/19/2012 03:55 PM, David Woodhouse wrote:
> On Wed, 2012-12-19 at 12:40 -0800, David Miller wrote:
>> You sent this as a "request for testing" last week, but I saw
>> no testing on real hardware whatsoever.
>
> Thanks for the reminder :)
>
> Seems to work fine here. I haven't confirmed whether I actually see the
> race or not but changing MTU on a live device works fine, even when it's
> being ping-flooded.
>
> Tested-by: David Woodhouse <David.Woodhouse@intel.com>
>
Thanks all. Happy holidays!
--
John Greene
* skb->cb size checks (was Re: [PATCH 00/17] ATM fixes for pppoatm/br2684)
From: David Woodhouse @ 2012-12-20 14:03 UTC
To: David Miller; +Cc: netdev
In-Reply-To: <20121201.204906.1703696018528746748.davem@davemloft.net>
On Sat, 2012-12-01 at 20:49 -0500, David Miller wrote:
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Sun, 02 Dec 2012 00:40:47 +0000
>
> > On Sat, 2012-12-01 at 17:33 +0000, David Woodhouse wrote:
> >>
> >> Very glad I added the BUILD_BUG_ON on the cb struct size now. Perhaps
> >> there should be a generic helper for that? Something like
> >> skb_cb_cast(struct foo_cb, skb) could do it automatically...?
> >
> > Something like this, perhaps? Using skb_cast_cb() would then make it
> > fairly much impossible to accidentally overflow the size of the skb cb.
>
> I actually prefer what we do now, which is do the BUILD_BUG_ON()
> once in the subsystem specific code, usually the initializer.
>
> It's part of creating a new SKB cb, adding that assertion somewhere.
I looked harder at this, and should follow up before it actually does
fall out of the cracks in my brain and get completely forgotten.
Basically, you lie :)
What we *actually* do now, in about two-thirds of cases¹ even in net/
code (I didn't even look at drivers, which I expect to be worse), is use
skb->cb without any form of automatic size check at all. No manual
BUILD_BUG_ON() or anything.
Admittedly, in almost all cases that *isn't* a real problem, because the
structure *isn't* too big for skb->cb and it's all fine. But as a matter
of principle we probably *should* be doing those checks. Just in *case*
someone comes along and adds something stupid to the structure.
So... should we:
- Ignore the "problem" and leave things as they are.
- Go through and fix the 2/3 of offending net/ code and then the
drivers too, *without* making the generic 'dereference and automatic
check' macro that I think would simplify that and help to keep us
honest in future.
or
- Let me add something like the skb_cast_cb() macro I wanted, then use
it in all the offending code I can find.
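The third option could look roughly like the following userspace sketch. `struct fake_skb`, `SKB_CAST_CB()` and `struct foo_cb` are hypothetical names, not the macro from the patch discussed in this thread; the only detail borrowed from the kernel is that `skb->cb` is a 48-byte, 8-byte-aligned buffer. The size check sits inside the cast itself, so every user gets it automatically instead of relying on a separate `BUILD_BUG_ON()` in an init function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff: only the control buffer
 * matters here (48 bytes, 8-byte aligned, as in the kernel). */
struct fake_skb {
	_Alignas(8) char cb[48];
};

/*
 * Hypothetical checked accessor: evaluates to (skb)->cb cast to TYPE *,
 * but fails to compile if TYPE does not fit in cb. The check uses the
 * "array of negative size" trick so that it can live inside an
 * expression, where a _Static_assert declaration cannot. Like the
 * kernel (built with -fno-strict-aliasing), it type-puns the buffer.
 */
#define SKB_CAST_CB(type, skb)						\
	((void)sizeof(char[sizeof(type) <= sizeof((skb)->cb) ? 1 : -1]), \
	 (type *)(void *)(skb)->cb)

/* Example protocol-private control block; it fits comfortably. */
struct foo_cb {
	uint32_t seq;
	uint16_t flags;
};

static void foo_set_seq(struct fake_skb *skb, uint32_t seq)
{
	SKB_CAST_CB(struct foo_cb, skb)->seq = seq;
}

static uint32_t foo_get_seq(struct fake_skb *skb)
{
	return SKB_CAST_CB(struct foo_cb, skb)->seq;
}
```

If someone later grows `struct foo_cb` past 48 bytes, every `SKB_CAST_CB(struct foo_cb, ...)` site stops compiling, which is the "keep us honest" property the mail asks for.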
--
dwmw2
¹ http://www.spinics.net/lists/netdev/msg218642.html
* Lockdep warning in vxlan
From: Yan Burman @ 2012-12-20 14:00 UTC (permalink / raw)
To: shemminger, netdev, Yan Burman
Hi.
When working with vxlan from current net-next, I got a lockdep warning
(below).
It seems to happen when host B is pinging host A and, while the pings
continue, I run "ip link del" on the vxlan interface on host A. The
lockdep warning is on host A.
Tell me if you need some more info.
=============================================
[ INFO: possible recursive locking detected ]
3.7.0+ #24 Not tainted
---------------------------------------------
swapper/1/0 is trying to acquire lock:
(&n->lock){++--..}, at: [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0
but task is already holding lock:
(&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&n->lock);
lock(&n->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by swapper/1/0:
#0: (((&n->timer))){+.-...}, at: [<ffffffff8104b350>] call_timer_fn+0x0/0x1c0
#1: (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280
#2: (rcu_read_lock_bh){.+....}, at: [<ffffffff81395400>] dev_queue_xmit+0x0/0x5d0
#3: (rcu_read_lock_bh){.+....}, at: [<ffffffff813cb41e>] ip_finish_output+0x13e/0x640
stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0+ #24
Call Trace:
<IRQ> [<ffffffff8108c7ac>] validate_chain+0xdcc/0x11f0
[<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
[<ffffffff81120565>] ? kmem_cache_free+0xe5/0x1c0
[<ffffffff8108d570>] __lock_acquire+0x440/0xc30
[<ffffffff813c3570>] ? inet_getpeer+0x40/0x600
[<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
[<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
[<ffffffff8108ddf5>] lock_acquire+0x95/0x140
[<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
[<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
[<ffffffff81448d4b>] _raw_write_lock_bh+0x3b/0x50
[<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
[<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0
[<ffffffff8139f99b>] neigh_resolve_output+0x16b/0x270
[<ffffffff813cb62d>] ip_finish_output+0x34d/0x640
[<ffffffff813cb41e>] ? ip_finish_output+0x13e/0x640
[<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
[<ffffffff813cb9a0>] ip_output+0x80/0xf0
[<ffffffff813ca368>] ip_local_out+0x28/0x80
[<ffffffffa046f25a>] vxlan_xmit+0x66a/0xbec [vxlan]
[<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
[<ffffffff81394a50>] ? skb_gso_segment+0x2b0/0x2b0
[<ffffffff81449355>] ? _raw_spin_unlock_irqrestore+0x65/0x80
[<ffffffff81394c57>] ? dev_queue_xmit_nit+0x207/0x270
[<ffffffff813950c8>] dev_hard_start_xmit+0x298/0x5d0
[<ffffffff813956f3>] dev_queue_xmit+0x2f3/0x5d0
[<ffffffff81395400>] ? dev_hard_start_xmit+0x5d0/0x5d0
[<ffffffff813f5788>] arp_xmit+0x58/0x60
[<ffffffff813f59db>] arp_send+0x3b/0x40
[<ffffffff813f6424>] arp_solicit+0x204/0x280
[<ffffffff813a1a70>] ? neigh_add+0x310/0x310
[<ffffffff8139f515>] neigh_probe+0x45/0x70
[<ffffffff813a1c10>] neigh_timer_handler+0x1a0/0x2a0
[<ffffffff8104b3cf>] call_timer_fn+0x7f/0x1c0
[<ffffffff8104b350>] ? detach_if_pending+0x120/0x120
[<ffffffff8104b748>] run_timer_softirq+0x238/0x2b0
[<ffffffff813a1a70>] ? neigh_add+0x310/0x310
[<ffffffff81043e51>] __do_softirq+0x101/0x280
[<ffffffff814518cc>] call_softirq+0x1c/0x30
[<ffffffff81003b65>] do_softirq+0x85/0xc0
[<ffffffff81043a7e>] irq_exit+0x9e/0xc0
[<ffffffff810264f8>] smp_apic_timer_interrupt+0x68/0xa0
[<ffffffff8145122f>] apic_timer_interrupt+0x6f/0x80
<EOI> [<ffffffff8100a054>] ? mwait_idle+0xa4/0x1c0
[<ffffffff8100a04b>] ? mwait_idle+0x9b/0x1c0
[<ffffffff8100a6a9>] cpu_idle+0x89/0xe0
[<ffffffff81441127>] start_secondary+0x1b2/0x1b6
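In the trace, `arp_solicit()` holds one neighbour entry's `n->lock`, and then, via `arp_send()`, `vxlan_xmit()` and `ip_finish_output()`, `__neigh_event_send()` takes `n->lock` of what is most likely a different neighbour entry. lockdep tracks locks by class rather than by instance, so the two look like the same lock. The toy model below is hypothetical and far simpler than lockdep; it only illustrates why same-class nesting is flagged, and why a nesting annotation (a distinct subclass, the "lock nesting notation" the report mentions) would silence it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_MAX_HELD 16

/* Hypothetical lock "class": every neighbour entry's n->lock shares one
 * class, no matter which neighbour object it belongs to. */
struct toy_lock_class {
	const char *name;
};

struct toy_held_lock {
	const struct toy_lock_class *cls;
	unsigned int subclass;	/* 0 unless annotated with a nesting level */
};

static struct toy_held_lock toy_held[TOY_MAX_HELD];
static int toy_nr_held;

/*
 * Record an acquisition. Returns false and prints a report if a lock of
 * the same class and subclass is already held, the condition behind
 * "possible recursive locking detected".
 */
static bool toy_acquire(const struct toy_lock_class *cls,
			unsigned int subclass)
{
	for (int i = 0; i < toy_nr_held; i++) {
		if (toy_held[i].cls == cls && toy_held[i].subclass == subclass) {
			printf("possible recursive locking detected: %s/%u\n",
			       cls->name, subclass);
			return false;
		}
	}
	assert(toy_nr_held < TOY_MAX_HELD);
	toy_held[toy_nr_held].cls = cls;
	toy_held[toy_nr_held].subclass = subclass;
	toy_nr_held++;
	return true;
}

/* Forget everything currently held (end of the simulated call path). */
static void toy_release_all(void)
{
	toy_nr_held = 0;
}

/* One class shared by both neighbour entries in the call path. */
static const struct toy_lock_class neigh_lock_class = { "&n->lock" };
```

Acquiring `neigh_lock_class` twice at subclass 0 reproduces the report; acquiring the inner one at subclass 1 does not.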
Hope this helps
Yan
* Re: [PATCH net-next V4 03/13] bridge: Validate that vlan is permitted on ingress
From: Shmulik Ladkani @ 2012-12-20 14:07 UTC (permalink / raw)
To: Vlad Yasevich
Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-4-git-send-email-vyasevic@redhat.com>
Hi Vlad,
On Wed, 19 Dec 2012 12:48:14 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> +static bool br_allowed_ingress(struct net_bridge_port *p, struct sk_buff *skb)
> +{
> + struct net_port_vlan *pve;
> + u16 vid;
> +
> + /* If there are no vlan in the permitted list, all packets are
> + * permitted.
> + */
> + if (list_empty(&p->vlan_list))
> + return true;
I assumed the default policy would be Drop in such a case; otherwise,
leaking between VLAN domains is possible.
Or maybe the ingress policy when the port isn't a member of the ingress
VID should be configurable (drop/allow).
> + vid = br_get_vlan(skb);
> + pve = nbp_vlan_find(p, vid);
Why search by iterating through NBP's vlan_list?
You know the VID (hence may fetch the net_bridge_vlan from the hash), so
why don't you directly consult the net_bridge_vlan's port_bitmap?
> @@ -54,6 +74,9 @@ int br_handle_frame_finish(struct sk_buff *skb)
> if (!p || p->state == BR_STATE_DISABLED)
> goto drop;
>
> + if (!br_allowed_ingress(p, skb))
> + goto drop;
> +
This condition should also be incorporated upon "ingress" at the "bridge
master port" (that is, early in br_dev_xmit).
Think of the "bridge master port" as yet another port:
upon "ingress" (meaning TX packets from the IP stack), we should
also enforce any ingress permission rules.
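The review points above (a default policy for ports with no VLANs configured, a per-VID port bitmap instead of walking each port's `vlan_list`, and treating the bridge master device as just another port) can be combined in a small userspace sketch. All names (`vs_bridge`, `vs_allowed_ingress()`, `VS_MASTER_PORT` as port 0, the 64-port limit) are hypothetical and do not correspond to the patch's `net_bridge_vlan` structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VS_VLAN_N_VID	4096	/* 12-bit VLAN ID space */
#define VS_MAX_PORTS	64	/* hypothetical: one bit per port in a u64 */
#define VS_MASTER_PORT	0	/* hypothetical: the bridge device as port 0 */

enum vs_ingress_policy { VS_INGRESS_DROP, VS_INGRESS_ALLOW };

struct vs_bridge {
	uint64_t vlan_ports[VS_VLAN_N_VID];	/* per-VID membership bitmap */
	uint64_t ports_with_vlans;		/* ports with any VLAN set */
	enum vs_ingress_policy unconfigured;	/* policy for ports with none */
};

static void vs_bridge_init(struct vs_bridge *br, enum vs_ingress_policy p)
{
	memset(br, 0, sizeof(*br));
	br->unconfigured = p;
}

/* Make PORT a member of VID. */
static void vs_vlan_add(struct vs_bridge *br, uint16_t vid, unsigned int port)
{
	assert(vid < VS_VLAN_N_VID && port < VS_MAX_PORTS);
	br->vlan_ports[vid] |= UINT64_C(1) << port;
	br->ports_with_vlans |= UINT64_C(1) << port;
}

/* O(1) check: index by VID and test the port's bit, no list walk.
 * The transmit path would call this with VS_MASTER_PORT. */
static bool vs_allowed_ingress(const struct vs_bridge *br, unsigned int port,
			       uint16_t vid)
{
	uint64_t bit;

	assert(vid < VS_VLAN_N_VID && port < VS_MAX_PORTS);
	bit = UINT64_C(1) << port;
	if (!(br->ports_with_vlans & bit))
		return br->unconfigured == VS_INGRESS_ALLOW;
	return (br->vlan_ports[vid] & bit) != 0;
}
```

A fixed array indexed by VID trades 32 KiB per bridge for constant-time lookups; the real code could equally use the existing VID hash, as the review suggests.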
Regards,
Shmulik
* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 14:23 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
xen-devel@lists.xensource.com, Ian Campbell,
Konrad Rzeszutek Wilk
In-Reply-To: <1797374383.20121220135139@eikelenboom.it>
Thursday, December 20, 2012, 1:51:39 PM, you wrote:
> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:
>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:
>>> Hi Ian,
>>>
>>> It ran overnight and i haven't seen the warn_once trigger.
>>> (but i also didn't with the previous patch)
>>>
>> As I said, the minimum value to not trigger the warning was what Ian's
>> patch was doing, but it was still not an accurate estimation.
>> Doing the real accounting might trigger slow transfers, or dropped
>> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.
>> So the real question was: if accounting for full pages, do your
>> applications run as smoothly as before, with no huge performance
>> regression?
> OK, I have added some extra debug info (see the diffs below). The code still uses the old calculation for truesize (in the hope of triggering the warn_on_once again), but also calculates the variants IanC came up with.
> I don't have a clear test case to trigger the warn_on_once; it just happens every once in a while during my normal usage, and I'm not a netperf expert :-)
> So far I haven't been able to trigger the warn_on_once again, but the results do seem to shed some light:
> - The first variant (current code) seems to be the most efficient and a good estimation *most* of the time, but sometimes triggers the warn_on_once in skb_try_coalesce.
> - The first variant (current code) always seems to subtract from the truesize for small packets.
> - The second variant seems to keep the truesize as-is for most small-packet traffic, but also seems to work OK for larger packets.
> - The third variant seems to be a pretty wasteful estimation.
> So the last variant seems to be rather wasteful, and the second one the most accurate so far.
> Eric:
> From the warn_on_once, delta should not be smaller than len, but ideally they should be as close together as possible.
> When you say "accurate estimation", what would be an acceptable difference between DELTA and LEN?
> [ 116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
> [ 117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
> [ 117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
> [ 117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
> [ 117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
> [ 117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
> [ 117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
> [ 117.539403] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
> [ 117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
> [ 117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
> [ 121.981917] net_ratelimit: 27 callbacks suppressed
> [ 121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> [ 126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index c26e28b..8833e38 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
> struct sk_buff_head tmpq;
> unsigned long flags;
> int err;
> + int tsz,len;
> spin_lock(&np->rx_lock);
> @@ -1037,9 +1038,22 @@ err:
> * receive throughout using the standard receive
> * buffer size was cut by 25%(!!!).
> */
> - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> +
> +
> +
> +
> + tsz = skb->truesize;
> + len = skb->len;
> + /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
> + skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
> skb->len += skb->data_len;
> + net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
> + skb->dev->name, skb->dev->mtu, skb->data_len, len, skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
> + skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
> + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
> + PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
> +
> if (rx->flags & XEN_NETRXF_csum_blank)
> skb->ip_summed = CHECKSUM_PARTIAL;
> else if (rx->flags & XEN_NETRXF_data_validated)
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3ab989b..6d0cd86 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> WARN_ON_ONCE(delta < len);
> + if(delta < len) {
> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
> + to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
> + }
> +
> + if (delta > len && delta - len > 100) {
> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
> + to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
> + }
> +
> memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
> skb_shinfo(from)->frags,
> skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
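The three truesize increments printed by the debug patch can be reproduced in a short sketch. The constants are assumptions: an `RX_COPY_THRESHOLD` of 256 is consistent with the logged variant1 values (42 minus 256 gives -214), and `PAGE_SIZE` is taken as 4096. `pull_to` is passed in rather than derived, because it depends on the first receive slot; the value 106 used for the 4410-byte packet is inferred from the logged variant2 (4410 minus 4304).

```c
#include <assert.h>

/* Assumed values, consistent with the logs in this thread. */
#define SKETCH_RX_COPY_THRESHOLD	256
#define SKETCH_PAGE_SIZE		4096

/* Amount each variant adds to skb->truesize. */
struct truesize_variants {
	int v1;	/* current code: data_len - RX_COPY_THRESHOLD */
	int v2;	/* data_len - pull_to */
	int v3;	/* PAGE_SIZE * nr_frags */
};

/* Mirrors the three increments printed by the debug patch. */
static struct truesize_variants truesize_deltas(int data_len, int pull_to,
						int nr_frags)
{
	struct truesize_variants v = {
		.v1 = data_len - SKETCH_RX_COPY_THRESHOLD,
		.v2 = data_len - pull_to,
		.v3 = SKETCH_PAGE_SIZE * nr_frags,
	};
	return v;
}
```

For small packets, variant2 minus variant1 equals RX_COPY_THRESHOLD minus pull_to (214 bytes for the 42-byte frame), which is how far the current code drops below the "keep as-is" estimate (682 versus 896).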
OK, I succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was rate-limited away for the offending packet :(
Dec 20 15:17:33 media kernel: [ 393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [ 393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [ 393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [ 393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:33 media kernel: [ 393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 394.940211] ------------[ cut here ]------------
Dec 20 15:17:35 media kernel: [ 394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
Dec 20 15:17:35 media kernel: [ 394.940282] Modules linked in:
Dec 20 15:17:35 media kernel: [ 394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
Dec 20 15:17:35 media kernel: [ 394.940330] Call Trace:
Dec 20 15:17:35 media kernel: [ 394.940343] <IRQ> [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
Dec 20 15:17:35 media kernel: [ 394.940384] [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
Dec 20 15:17:35 media kernel: [ 394.940409] [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
Dec 20 15:17:35 media kernel: [ 394.940434] [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
Dec 20 15:17:35 media kernel: [ 394.940458] [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
Dec 20 15:17:35 media kernel: [ 394.940481] [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
Dec 20 15:17:35 media kernel: [ 394.940504] [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
Dec 20 15:17:35 media kernel: [ 394.940528] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
Dec 20 15:17:35 media kernel: [ 394.940551] [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
Dec 20 15:17:35 media kernel: [ 394.940576] [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
Dec 20 15:17:35 media kernel: [ 394.940600] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
Dec 20 15:17:35 media kernel: [ 394.940623] [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
Dec 20 15:17:35 media kernel: [ 394.940666] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
Dec 20 15:17:35 media kernel: [ 394.940694] [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
Dec 20 15:17:35 media kernel: [ 394.940720] [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
Dec 20 15:17:35 media kernel: [ 394.940745] [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
Dec 20 15:17:35 media kernel: [ 394.940784] [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
Dec 20 15:17:35 media kernel: [ 394.940807] [<ffffffff818e4b68>] ip_rcv+0x218/0x300
Dec 20 15:17:35 media kernel: [ 394.940829] [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
Dec 20 15:17:35 media kernel: [ 394.940853] [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
Dec 20 15:17:35 media kernel: [ 394.940889] [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
Dec 20 15:17:35 media kernel: [ 394.940914] [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
Dec 20 15:17:35 media kernel: [ 394.940939] [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
Dec 20 15:17:35 media kernel: [ 394.940964] [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
Dec 20 15:17:35 media kernel: [ 394.941000] [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
Dec 20 15:17:35 media kernel: [ 394.941024] [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
Dec 20 15:17:35 media kernel: [ 394.941046] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
Dec 20 15:17:35 media kernel: [ 394.941075] [<ffffffff8184ed66>] net_rx_action+0x136/0x260
Dec 20 15:17:35 media kernel: [ 394.941098] [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
Dec 20 15:17:35 media kernel: [ 394.941133] [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
Dec 20 15:17:35 media kernel: [ 394.941157] [<ffffffff819b623c>] call_softirq+0x1c/0x30
Dec 20 15:17:35 media kernel: [ 394.941179] [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
Dec 20 15:17:35 media kernel: [ 394.941201] [<ffffffff8107041e>] irq_exit+0x9e/0xd0
Dec 20 15:17:35 media kernel: [ 394.941235] [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
Dec 20 15:17:35 media kernel: [ 394.941259] [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
Dec 20 15:17:35 media kernel: [ 394.941279] <EOI> [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Dec 20 15:17:35 media kernel: [ 394.941318] [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
Dec 20 15:17:35 media kernel: [ 394.941356] [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
Dec 20 15:17:35 media kernel: [ 394.941381] [<ffffffff810092b2>] ? check_events+0x12/0x20
Dec 20 15:17:35 media kernel: [ 394.941405] [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
Dec 20 15:17:35 media kernel: [ 394.941432] [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
Dec 20 15:17:35 media kernel: [ 394.941473] [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
Dec 20 15:17:35 media kernel: [ 394.941507] [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
Dec 20 15:17:35 media kernel: [ 394.941533] [<ffffffff819b2434>] ? __schedule+0x444/0x880
Dec 20 15:17:35 media kernel: [ 394.941555] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
Dec 20 15:17:35 media kernel: [ 394.941580] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
Dec 20 15:17:35 media kernel: [ 394.941614] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
Dec 20 15:17:35 media kernel: [ 394.941638] [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
Dec 20 15:17:35 media kernel: [ 394.941663] [<ffffffff819b2904>] ? schedule+0x24/0x70
Dec 20 15:17:35 media kernel: [ 394.941697] [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
Dec 20 15:17:35 media kernel: [ 394.941725] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
Dec 20 15:17:35 media kernel: [ 394.941748] [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
Dec 20 15:17:35 media kernel: [ 394.941770] [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
Dec 20 15:17:35 media kernel: [ 394.941808] [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
Dec 20 15:17:35 media kernel: [ 394.941833] [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
Dec 20 15:17:35 media kernel: [ 394.941856] [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
Dec 20 15:17:35 media kernel: [ 394.941878] [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
Dec 20 15:17:35 media kernel: [ 394.941913] [<ffffffff810b5b17>] ? lock_release+0x117/0x250
Dec 20 15:17:35 media kernel: [ 394.941938] [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
Dec 20 15:17:35 media kernel: [ 394.941959] [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
Dec 20 15:17:35 media kernel: [ 394.941981] [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
Dec 20 15:17:35 media kernel: [ 394.942015] [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
Dec 20 15:17:35 media kernel: [ 394.942036] ---[ end trace 6f3a832c9e91c8af ]---
Dec 20 15:17:35 media kernel: [ 394.942056] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
Dec 20 15:17:35 media kernel: [ 394.968199] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
Dec 20 15:17:35 media kernel: [ 395.262814] net_ratelimit: 371 callbacks suppressed
Dec 20 15:17:35 media kernel: [ 395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Dec 20 15:17:35 media kernel: [ 395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
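The two `DELTA < LEN` lines can be checked by hand against the warning condition. The sketch below is hypothetical and only reproduces the arithmetic: `delta` is `from->truesize` minus a fixed per-skb overhead, and `WARN_ON_ONCE(delta < len)` fires when that remainder does not cover `from->len`. The overheads are inferred from the logs rather than computed from kernel struct sizes: 896 bytes for the headless skbs (23874 minus 22978) and 256 bytes for the skb with a 190-byte linear head in the earlier `DELTA - LEN > 100` line (7946 minus 7690).

```c
#include <assert.h>
#include <stdbool.h>

/* Part of from->truesize left after the fixed per-skb overhead; the
 * overhead is a parameter because it depends on kernel struct sizes. */
static int coalesce_delta(int from_truesize, int fixed_overhead)
{
	return from_truesize - fixed_overhead;
}

/* True when WARN_ON_ONCE(delta < len) would fire. */
static bool coalesce_would_warn(int from_truesize, int fixed_overhead,
				int len)
{
	return coalesce_delta(from_truesize, fixed_overhead) < len;
}
```

With these inputs, both logged coalesces come out exactly 190 bytes short (23168 minus 22978, and 14480 minus 14290), while the skb with a linear head has 450 bytes to spare.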
* [PATCH] net: ipv4: route: fix coding style issues net: ipv4: tcp: fix coding style issues
From: Stefan Hasko @ 2012-12-20 14:28 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev
Cc: linux-kernel, Stefan Hasko
Fix coding style issues.
Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
---
net/ipv4/route.c | 119 ++++++++++++++++-------------
net/ipv4/tcp.c | 218 +++++++++++++++++++++++++++++++-----------------------
2 files changed, 194 insertions(+), 143 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 844a9ef..29678e5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -20,7 +20,7 @@
* Alan Cox : Added BSD route gw semantics
* Alan Cox : Super /proc >4K
* Alan Cox : MTU in route table
- * Alan Cox : MSS actually. Also added the window
+ * Alan Cox : MSS actually. Also added the window
* clamper.
* Sam Lantinga : Fixed route matching in rt_del()
* Alan Cox : Routing cache support.
@@ -31,30 +31,35 @@
* Miquel van Smoorenburg : BSD API fixes.
* Miquel van Smoorenburg : Metrics.
* Alan Cox : Use __u32 properly
- * Alan Cox : Aligned routing errors more closely with BSD
+ * Alan Cox : Aligned routing errors more
+ * closely with BSD
* our system is still very different.
* Alan Cox : Faster /proc handling
- * Alexey Kuznetsov : Massive rework to support tree based routing,
+ * Alexey Kuznetsov : Massive rework to support
+ * tree based routing,
* routing caches and better behaviour.
*
* Olaf Erb : irtt wasn't being copied right.
* Bjorn Ekwall : Kerneld route support.
* Alan Cox : Multicast fixed (I hope)
- * Pavel Krauz : Limited broadcast fixed
+ * Pavel Krauz : Limited broadcast fixed
* Mike McLagan : Routing by source
* Alexey Kuznetsov : End of old history. Split to fib.c and
* route.c and rewritten from scratch.
* Andi Kleen : Load-limit warning messages.
- * Vitaly E. Lavrov : Transparent proxy revived after year coma.
+ * Vitaly E. Lavrov : Transparent proxy revived
+ * after year coma.
* Vitaly E. Lavrov : Race condition in ip_route_input_slow.
- * Tobias Ringstrom : Uninitialized res.type in ip_route_output_slow.
+ * Tobias Ringstrom : Uninitialized res.type in
+ * ip_route_output_slow.
* Vladimir V. Ivanov : IP rule info (flowid) is really useful.
* Marc Boucher : routing by fwmark
* Robert Olsson : Added rt_cache statistics
* Arnaldo C. Melo : Convert proc stuff to seq_file
- * Eric Dumazet : hashed spinlocks and rt_check_expire() fixes.
- * Ilia Sotnikov : Ignore TOS on PMTUD and Redirect
- * Ilia Sotnikov : Removed TOS from hash calculations
+ * Eric Dumazet : hashed spinlocks and
+ * rt_check_expire() fixes.
+ * Ilia Sotnikov : Ignore TOS on PMTUD and Redirect
+ * Ilia Sotnikov : Removed TOS from hash calculations
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -65,7 +70,7 @@
#define pr_fmt(fmt) "IPv4: " fmt
#include <linux/module.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
#include <linux/bitops.h>
#include <linux/types.h>
#include <linux/kernel.h>
@@ -139,7 +144,8 @@ static unsigned int ipv4_default_advmss(const struct dst_entry *dst);
static unsigned int ipv4_mtu(const struct dst_entry *dst);
static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
static void ipv4_link_failure(struct sk_buff *skb);
-static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
+static void ip_rt_update_pmtu(struct dst_entry *dst,
+ struct sock *sk,
struct sk_buff *skb, u32 mtu);
static void ip_do_redirect(struct dst_entry *dst, struct sock *sk,
struct sk_buff *skb);
@@ -291,12 +297,11 @@ static int rt_cpu_seq_show(struct seq_file *seq, void *v)
struct rt_cache_stat *st = v;
if (v == SEQ_START_TOKEN) {
- seq_printf(seq, "entries in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
+ seq_printf(seq, "entries in_hit in_slow_tot in_slow_mc in_no_route in_brd in_martian_dst in_martian_src out_hit out_slow_tot out_slow_mc gc_total gc_ignored gc_goal_miss gc_dst_overflow in_hlist_search out_hlist_search\n");
return 0;
}
- seq_printf(seq,"%08x %08x %08x %08x %08x %08x %08x %08x "
- " %08x %08x %08x %08x %08x %08x %08x %08x %08x \n",
+ seq_printf(seq, "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
dst_entries_get_slow(&ipv4_dst_ops),
st->in_hit,
st->in_slow_tot,
@@ -657,8 +662,8 @@ out_unlock:
return;
}
-static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flowi4 *fl4,
- bool kill_route)
+static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb,
+ struct flowi4 *fl4, bool kill_route)
{
__be32 new_gw = icmp_hdr(skb)->un.gateway;
__be32 old_gw = ip_hdr(skb)->saddr;
@@ -695,7 +700,8 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
if (!IN_DEV_SHARED_MEDIA(in_dev)) {
if (!inet_addr_onlink(in_dev, new_gw, old_gw))
goto reject_redirect;
- if (IN_DEV_SEC_REDIRECTS(in_dev) && ip_fib_check_default(new_gw, dev))
+ if (IN_DEV_SEC_REDIRECTS(in_dev) &&
+ ip_fib_check_default(new_gw, dev))
goto reject_redirect;
} else {
if (inet_addr_type(net, new_gw) != RTN_UNICAST)
@@ -737,7 +743,8 @@ reject_redirect:
;
}
-static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb)
+static void ip_do_redirect(struct dst_entry *dst, struct sock *sk,
+ struct sk_buff *skb)
{
struct rtable *rt;
struct flowi4 fl4;
@@ -1202,11 +1209,11 @@ static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
struct rtable *orig, *prev, **p;
bool ret = true;
- if (rt_is_input_route(rt)) {
+ if (rt_is_input_route(rt))
p = (struct rtable **)&nh->nh_rth_input;
- } else {
+ else
p = (struct rtable **)__this_cpu_ptr(nh->nh_pcpu_rth_output);
- }
+
orig = *p;
prev = cmpxchg(p, orig, rt);
@@ -1359,17 +1366,17 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
#endif
rth->dst.output = ip_rt_bug;
- rth->rt_genid = rt_genid(dev_net(dev));
- rth->rt_flags = RTCF_MULTICAST;
- rth->rt_type = RTN_MULTICAST;
- rth->rt_is_input= 1;
- rth->rt_iif = 0;
- rth->rt_pmtu = 0;
- rth->rt_gateway = 0;
+ rth->rt_genid = rt_genid(dev_net(dev));
+ rth->rt_flags = RTCF_MULTICAST;
+ rth->rt_type = RTN_MULTICAST;
+ rth->rt_is_input = 1;
+ rth->rt_iif = 0;
+ rth->rt_pmtu = 0;
+ rth->rt_gateway = 0;
rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
if (our) {
- rth->dst.input= ip_local_deliver;
+ rth->dst.input = ip_local_deliver;
rth->rt_flags |= RTCF_LOCAL;
}
@@ -1488,8 +1495,8 @@ static int __mkroute_input(struct sk_buff *skb,
rth->rt_flags = flags;
rth->rt_type = res->type;
rth->rt_is_input = 1;
- rth->rt_iif = 0;
- rth->rt_pmtu = 0;
+ rth->rt_iif = 0;
+ rth->rt_pmtu = 0;
rth->rt_gateway = 0;
rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
@@ -1649,25 +1656,25 @@ local_input:
if (!rth)
goto e_nobufs;
- rth->dst.input= ip_local_deliver;
- rth->dst.output= ip_rt_bug;
+ rth->dst.input = ip_local_deliver;
+ rth->dst.output = ip_rt_bug;
#ifdef CONFIG_IP_ROUTE_CLASSID
rth->dst.tclassid = itag;
#endif
rth->rt_genid = rt_genid(net);
- rth->rt_flags = flags|RTCF_LOCAL;
- rth->rt_type = res.type;
+ rth->rt_flags = flags|RTCF_LOCAL;
+ rth->rt_type = res.type;
rth->rt_is_input = 1;
- rth->rt_iif = 0;
- rth->rt_pmtu = 0;
+ rth->rt_iif = 0;
+ rth->rt_pmtu = 0;
rth->rt_gateway = 0;
rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
if (res.type == RTN_UNREACHABLE) {
- rth->dst.input= ip_error;
- rth->dst.error= -err;
- rth->rt_flags &= ~RTCF_LOCAL;
+ rth->dst.input = ip_error;
+ rth->dst.error = -err;
+ rth->rt_flags &= ~RTCF_LOCAL;
}
if (do_cache)
rt_cache_route(&FIB_RES_NH(res), rth);
@@ -1772,7 +1779,8 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
return ERR_PTR(-EINVAL);
if (likely(!IN_DEV_ROUTE_LOCALNET(in_dev)))
- if (ipv4_is_loopback(fl4->saddr) && !(dev_out->flags & IFF_LOOPBACK))
+ if (ipv4_is_loopback(fl4->saddr) &&
+ !(dev_out->flags & IFF_LOOPBACK))
return ERR_PTR(-EINVAL);
if (ipv4_is_lbcast(fl4->daddr))
@@ -1919,7 +1927,9 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
if (fl4->flowi4_oif == 0 &&
(ipv4_is_multicast(fl4->daddr) ||
ipv4_is_lbcast(fl4->daddr))) {
- /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+ /* It is equivalent to
+ * inet_addr_type(saddr) == RTN_LOCAL
+ */
dev_out = __ip_dev_find(net, fl4->saddr, false);
if (dev_out == NULL)
goto out;
@@ -1944,7 +1954,9 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
}
if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
- /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+ /* It is equivalent to
+ * inet_addr_type(saddr) == RTN_LOCAL
+ */
if (!__ip_dev_find(net, fl4->saddr, false))
goto out;
}
@@ -1972,7 +1984,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
if (fl4->saddr) {
if (ipv4_is_multicast(fl4->daddr))
fl4->saddr = inet_select_addr(dev_out, 0,
- fl4->flowi4_scope);
+ fl4->flowi4_scope);
else if (!fl4->daddr)
fl4->saddr = inet_select_addr(dev_out, 0,
RT_SCOPE_HOST);
@@ -2061,7 +2073,8 @@ out:
}
EXPORT_SYMBOL_GPL(__ip_route_output_key);
-static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst, u32 cookie)
+static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst,
+ u32 cookie)
{
return NULL;
}
@@ -2073,7 +2086,8 @@ static unsigned int ipv4_blackhole_mtu(const struct dst_entry *dst)
return mtu ? : dst->dev->mtu;
}
-static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
+static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst,
+ struct sock *sk,
struct sk_buff *skb, u32 mtu)
{
}
@@ -2101,7 +2115,8 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
.neigh_lookup = ipv4_neigh_lookup,
};
-struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_orig)
+struct dst_entry *ipv4_blackhole_route(struct net *net,
+ struct dst_entry *dst_orig)
{
struct rtable *ort = (struct rtable *) dst_orig;
struct rtable *rt;
@@ -2265,7 +2280,8 @@ nla_put_failure:
return -EMSGSIZE;
}
-static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
+static int inet_rtm_getroute(struct sk_buff *in_skb,
+ struct nlmsghdr *nlh, void *arg)
{
struct net *net = sock_net(in_skb->sk);
struct rtmsg *rtm;
@@ -2297,7 +2313,9 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void
skb_reset_mac_header(skb);
skb_reset_network_header(skb);
- /* Bugfix: need to give ip_route_input enough of an IP header to not gag. */
+ /* Bugfix: need to give ip_route_input enough
+ * of an IP header to not gag.
+ */
ip_hdr(skb)->protocol = IPPROTO_ICMP;
skb_reserve(skb, MAX_HEADER + sizeof(struct iphdr));
@@ -2596,7 +2614,8 @@ int __init ip_rt_init(void)
int rc = 0;
#ifdef CONFIG_IP_ROUTE_CLASSID
- ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct), __alignof__(struct ip_rt_acct));
+ ip_rt_acct = __alloc_percpu(256 * sizeof(struct ip_rt_acct),
+ __alignof__(struct ip_rt_acct));
if (!ip_rt_acct)
panic("IP: failed to allocate ip_rt_acct\n");
#endif
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1ca2536..12fadb2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -45,7 +45,7 @@
* escape still
* Alan Cox : Fixed another acking RST frame bug.
* Should stop LAN workplace lockups.
- * Alan Cox : Some tidyups using the new skb list
+ * Alan Cox : Some tidyups using the new skb list
* facilities
* Alan Cox : sk->keepopen now seems to work
* Alan Cox : Pulls options out correctly on accepts
@@ -160,7 +160,8 @@
* generates them.
* Alan Cox : Cache last socket.
* Alan Cox : Per route irtt.
- * Matt Day : poll()->select() match BSD precisely on error
+ * Matt Day : poll()->select() match BSD precisely
+ * on error
* Alan Cox : New buffers
* Marc Tamsky : Various sk->prot->retransmits and
* sk->retransmits misupdating fixed.
@@ -168,9 +169,9 @@
* and TCP syn retries gets used now.
* Mark Yarvis : In tcp_read_wakeup(), don't send an
* ack if state is TCP_CLOSED.
- * Alan Cox : Look up device on a retransmit - routes may
- * change. Doesn't yet cope with MSS shrink right
- * but it's a start!
+ * Alan Cox : Look up device on a retransmit - routes
+ * may change. Doesn't yet cope with MSS
+ * shrink right but it's a start!
* Marc Tamsky : Closing in closing fixes.
* Mike Shaver : RFC1122 verifications.
* Alan Cox : rcv_saddr errors.
@@ -199,7 +200,7 @@
* tcp_do_sendmsg to avoid burstiness.
* Eric Schenk : Fix fast close down bug with
* shutdown() followed by close().
- * Andi Kleen : Make poll agree with SIGIO
+ * Andi Kleen : Make poll agree with SIGIO
* Salvatore Sanfilippo : Support SO_LINGER with linger == 1 and
* lingertime == 0 (RFC 793 ABORT Call)
* Hirokazu Takahashi : Use copy_from_user() instead of
@@ -268,6 +269,7 @@
#include <linux/crypto.h>
#include <linux/time.h>
#include <linux/slab.h>
+#include <linux/uaccess.h>
#include <net/icmp.h>
#include <net/inet_common.h>
@@ -277,7 +279,6 @@
#include <net/netdma.h>
#include <net/sock.h>
-#include <asm/uaccess.h>
#include <asm/ioctls.h>
int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
@@ -286,22 +287,20 @@ struct percpu_counter tcp_orphan_count;
EXPORT_SYMBOL_GPL(tcp_orphan_count);
int sysctl_tcp_wmem[3] __read_mostly;
-int sysctl_tcp_rmem[3] __read_mostly;
+EXPORT_SYMBOL(sysctl_tcp_wmem);
+int sysctl_tcp_rmem[3] __read_mostly;
EXPORT_SYMBOL(sysctl_tcp_rmem);
-EXPORT_SYMBOL(sysctl_tcp_wmem);
atomic_long_t tcp_memory_allocated; /* Current allocated memory. */
EXPORT_SYMBOL(tcp_memory_allocated);
-/*
- * Current number of TCP sockets.
+/* Current number of TCP sockets.
*/
struct percpu_counter tcp_sockets_allocated;
EXPORT_SYMBOL(tcp_sockets_allocated);
-/*
- * TCP splice context
+/* TCP splice context
*/
struct tcp_splice_state {
struct pipe_inode_info *pipe;
@@ -309,8 +308,7 @@ struct tcp_splice_state {
unsigned int flags;
};
-/*
- * Pressure flag: try to collapse.
+/* Pressure flag: try to collapse.
* Technical note: it is used by multiple contexts non atomically.
* All the __sk_mem_schedule() is of this nature: accounting
* is strict, actions are advisory and have some latency.
@@ -430,8 +428,7 @@ void tcp_init_sock(struct sock *sk)
}
EXPORT_SYMBOL(tcp_init_sock);
-/*
- * Wait for a TCP event.
+/* Wait for a TCP event.
*
* Note that we don't need to lock the socket, as the upper poll layers
* take care of normal races (between the test and the event) and we don't
@@ -454,8 +451,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
mask = 0;
- /*
- * POLLHUP is certainly not done right. But poll() doesn't
+ /* POLLHUP is certainly not done right. But poll() doesn't
* have a notion of HUP in just one direction, and for a
* socket the read side is more interesting.
*
@@ -498,7 +494,8 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
/* Potential race condition. If read of tp below will
* escape above sk->sk_state, we can be illegally awaken
- * in SYN_* states. */
+ * in SYN_* states.
+ */
if (tp->rcv_nxt - tp->copied_seq >= target)
mask |= POLLIN | POLLRDNORM;
@@ -509,14 +506,14 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
set_bit(SOCK_ASYNC_NOSPACE,
&sk->sk_socket->flags);
set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-
- /* Race breaker. If space is freed after
- * wspace test but before the flags are set,
- * IO signal will be lost.
- */
- if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk))
- mask |= POLLOUT | POLLWRNORM;
}
+
+ /* Race breaker. If space is freed after
+ * wspace test but before the flags are set,
+ * IO signal will be lost.
+ */
+ if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk))
+ mask |= POLLOUT | POLLWRNORM;
} else
mask |= POLLOUT | POLLWRNORM;
@@ -634,7 +632,7 @@ static inline void tcp_push(struct sock *sk, int flags, int mss_now,
tcp_mark_urg(tp, flags);
__tcp_push_pending_frames(sk, mss_now,
- (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
+ (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
}
}
@@ -839,6 +837,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
int err;
ssize_t copied;
long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
+ int ass_res = 0;
/* Wait for a connection to finish. One exception is TCP Fast Open
* (passive side) where data is allowed to be sent before a connection
@@ -846,7 +845,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
*/
if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
!tcp_passive_fastopen(sk)) {
- if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
+ ass_res = (err = sk_stream_wait_connect(sk, &timeo));
+ if (ass_res != 0)
goto out_err;
}
@@ -864,7 +864,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
int copy, i;
bool can_coalesce;
- if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
+ copy = tcp_send_head(sk) ? size_goal - skb->len : 0;
+ if (copy <= 0) {
new_segment:
if (!sk_stream_memory_free(sk))
goto wait_for_sndbuf;
@@ -911,7 +912,9 @@ new_segment:
copied += copy;
offset += copy;
- if (!(size -= copy))
+
+ ass_res = (size -= copy);
+ if (!ass_res)
goto out;
if (skb->len < size_goal || (flags & MSG_OOB))
@@ -929,7 +932,8 @@ wait_for_sndbuf:
wait_for_memory:
tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
- if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
+ ass_res = (err = sk_stream_wait_memory(sk, &timeo));
+ if (ass_res != 0)
goto do_error;
mss_now = tcp_send_mss(sk, &size_goal, flags);
@@ -1029,6 +1033,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
bool sg;
long timeo;
+ int ass_res = 0;
lock_sock(sk);
@@ -1050,7 +1055,8 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
*/
if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
!tcp_passive_fastopen(sk)) {
- if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
+ ass_res = (err = sk_stream_wait_connect(sk, &timeo));
+ if (ass_res != 0)
goto do_error;
}
@@ -1099,7 +1105,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
}
while (seglen > 0) {
- int copy = 0;
+ int copy = 0, ass_res = 0;
int max = size_goal;
skb = tcp_write_queue_tail(sk);
@@ -1123,8 +1129,7 @@ new_segment:
if (!skb)
goto wait_for_memory;
- /*
- * Check whether we can use HW checksum.
+ /* Check whether we can use HW checksum.
*/
if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
skb->ip_summed = CHECKSUM_PARTIAL;
@@ -1162,7 +1167,8 @@ new_segment:
merge = false;
}
- copy = min_t(int, copy, pfrag->size - pfrag->offset);
+ copy = min_t(int, copy,
+ pfrag->size - pfrag->offset);
if (!sk_wmem_schedule(sk, copy))
goto wait_for_memory;
@@ -1176,7 +1182,8 @@ new_segment:
/* Update the skb. */
if (merge) {
- skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
+ skb_frag_size_add(
+ &skb_shinfo(skb)->frags[i - 1], copy);
} else {
skb_fill_page_desc(skb, i, pfrag->page,
pfrag->offset, copy);
@@ -1194,15 +1201,19 @@ new_segment:
from += copy;
copied += copy;
- if ((seglen -= copy) == 0 && iovlen == 0)
+ ass_res = (seglen -= copy);
+ if (ass_res == 0 && iovlen == 0)
goto out;
- if (skb->len < max || (flags & MSG_OOB) || unlikely(tp->repair))
+ if (skb->len < max ||
+ (flags & MSG_OOB) ||
+ unlikely(tp->repair))
continue;
if (forced_push(tp)) {
tcp_mark_push(tp, skb);
- __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
+ __tcp_push_pending_frames(sk, mss_now,
+ TCP_NAGLE_PUSH);
} else if (skb == tcp_send_head(sk))
tcp_push_one(sk, mss_now);
continue;
@@ -1211,9 +1222,11 @@ wait_for_sndbuf:
set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
if (copied)
- tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
+ tcp_push(sk, flags & ~MSG_MORE,
+ mss_now, TCP_NAGLE_PUSH);
- if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
+ ass_res = (err = sk_stream_wait_memory(sk, &timeo));
+ if (ass_res != 0)
goto do_error;
mss_now = tcp_send_mss(sk, &size_goal, flags);
@@ -1246,8 +1259,7 @@ out_err:
}
EXPORT_SYMBOL(tcp_sendmsg);
-/*
- * Handle reading urgent data. BSD has very simple semantics for
+/* Handle reading urgent data. BSD has very simple semantics for
* this, no blocking and very strange errors 8)
*/
@@ -1333,7 +1345,8 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
if (inet_csk_ack_scheduled(sk)) {
const struct inet_connection_sock *icsk = inet_csk(sk);
/* Delayed ACKs frequently hit locked sockets during bulk
- * receive. */
+ * receive.
+ */
if (icsk->icsk_ack.blocked ||
/* Once-per-two-segments ACK was not sent by tcp_input.c */
tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
@@ -1366,7 +1379,8 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
/* Send ACK now, if this read freed lots of space
* in our buffer. Certainly, new_window is new window.
- * We can advertise it now, if it is not less than current one.
+ * We can advertise it now, if it is not less than
+ * current one.
* "Lots" means "at least twice" here.
*/
if (new_window && new_window >= 2 * rcv_window_now)
@@ -1385,7 +1399,8 @@ static void tcp_prequeue_process(struct sock *sk)
NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPPREQUEUED);
/* RX process wants to run with disabled BHs, though it is not
- * necessary */
+ * necessary
+ */
local_bh_disable();
while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
sk_backlog_rcv(sk, skb);
@@ -1445,8 +1460,7 @@ static inline struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
return NULL;
}
-/*
- * This routine provides an alternative to tcp_recvmsg() for routines
+/* This routine provides an alternative to tcp_recvmsg() for routines
* that would like to handle copying from skbuffs directly in 'sendfile'
* fashion.
* Note:
@@ -1526,8 +1540,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
}
EXPORT_SYMBOL(tcp_read_sock);
-/*
- * This routine copies from a sock struct into the user buffer.
+/* This routine copies from a sock struct into the user buffer.
*
* Technical note: in 2.3 we work on _locked_ socket, so that
* tricks with *seq access order and skb->users are not required.
@@ -1610,12 +1623,15 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
do {
u32 offset;
- /* Are we at urgent data? Stop if we have read anything or have SIGURG pending. */
+ /* Are we at urgent data? Stop if we have read
+ * anything or have SIGURG pending.
+ */
if (tp->urg_data && tp->urg_seq == *seq) {
if (copied)
break;
if (signal_pending(current)) {
- copied = timeo ? sock_intr_errno(timeo) : -EAGAIN;
+ copied = timeo ?
+ sock_intr_errno(timeo) : -EAGAIN;
break;
}
}
@@ -1744,7 +1760,8 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
tcp_service_net_dma(sk, true);
tcp_cleanup_rbuf(sk, copied);
} else
- dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+ dma_async_memcpy_issue_pending(
+ tp->ucopy.dma_chan);
}
#endif
if (copied >= target) {
@@ -1760,12 +1777,15 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
#endif
if (user_recv) {
- int chunk;
+ int chunk, ass_res = 0;
/* __ Restore normal policy in scheduler __ */
- if ((chunk = len - tp->ucopy.len) != 0) {
- NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG, chunk);
+ ass_res = (chunk = len - tp->ucopy.len);
+ if (ass_res != 0) {
+ NET_ADD_STATS_USER(sock_net(sk),
+ LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG,
+ chunk);
len -= chunk;
copied += chunk;
}
@@ -1775,8 +1795,11 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
do_prequeue:
tcp_prequeue_process(sk);
- if ((chunk = len - tp->ucopy.len) != 0) {
- NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
+ ass_res = (chunk = len - tp->ucopy.len);
+ if (ass_res != 0) {
+ NET_ADD_STATS_USER(sock_net(sk),
+ LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE,
+ chunk);
len -= chunk;
copied += chunk;
}
@@ -1791,7 +1814,7 @@ do_prequeue:
}
continue;
- found_ok_skb:
+found_ok_skb:
/* Ok so how much can we use? */
used = skb->len - offset;
if (len < used)
@@ -1800,19 +1823,18 @@ do_prequeue:
/* Do we have urgent data here? */
if (tp->urg_data) {
u32 urg_offset = tp->urg_seq - *seq;
- if (urg_offset < used) {
- if (!urg_offset) {
- if (!sock_flag(sk, SOCK_URGINLINE)) {
- ++*seq;
- urg_hole++;
- offset++;
- used--;
- if (!used)
- goto skip_copy;
- }
- } else
- used = urg_offset;
+ if (urg_offset < used && !urg_offset) {
+ if (!sock_flag(sk, SOCK_URGINLINE)) {
+ ++*seq;
+ urg_hole++;
+ offset++;
+ used--;
+ if (!used)
+ goto skip_copy;
+ }
}
+ if (urg_offset < used && urg_offset)
+ used = urg_offset;
}
if (!(flags & MSG_TRUNC)) {
@@ -1821,7 +1843,9 @@ do_prequeue:
tp->ucopy.dma_chan = net_dma_find_channel();
if (tp->ucopy.dma_chan) {
- tp->ucopy.dma_cookie = dma_skb_copy_datagram_iovec(
+ tp->ucopy.dma_cookie =
+ dma_skb_copy_datagram_iovec(
+
tp->ucopy.dma_chan, skb, offset,
msg->msg_iov, used,
tp->ucopy.pinned_list);
@@ -1837,7 +1861,8 @@ do_prequeue:
break;
}
- dma_async_memcpy_issue_pending(tp->ucopy.dma_chan);
+ dma_async_memcpy_issue_pending(
+ tp->ucopy.dma_chan);
if ((offset + used) == skb->len)
copied_early = true;
@@ -1878,7 +1903,7 @@ skip_copy:
}
continue;
- found_fin_ok:
+found_fin_ok:
/* Process the FIN. */
++*seq;
if (!(flags & MSG_PEEK)) {
@@ -1890,14 +1915,17 @@ skip_copy:
if (user_recv) {
if (!skb_queue_empty(&tp->ucopy.prequeue)) {
- int chunk;
+ int chunk, ass_res = 0;
tp->ucopy.len = copied > 0 ? len : 0;
tcp_prequeue_process(sk);
- if (copied > 0 && (chunk = len - tp->ucopy.len) != 0) {
- NET_ADD_STATS_USER(sock_net(sk), LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
+ ass_res = (chunk = len - tp->ucopy.len);
+ if (copied > 0 && ass_res != 0) {
+ NET_ADD_STATS_USER(sock_net(sk),
+ LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE,
+ chunk);
len -= chunk;
copied += chunk;
}
@@ -1971,13 +1999,13 @@ void tcp_set_state(struct sock *sk, int state)
sk->sk_state = state;
#ifdef STATE_TRACE
- SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk, statename[oldstate], statename[state]);
+ SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk,
+ statename[oldstate], statename[state]);
#endif
}
EXPORT_SYMBOL_GPL(tcp_set_state);
-/*
- * State processing on a close. This implements the state shift for
+/* State processing on a close. This implements the state shift for
* sending our FIN frame. Note that we only send a FIN for some
* states. A shutdown() may have already sent the FIN, or we may be
* closed.
@@ -2009,8 +2037,7 @@ static int tcp_close_state(struct sock *sk)
return next & TCP_ACTION_FIN;
}
-/*
- * Shutdown the sending side of a connection. Much like close except
+/* Shutdown the sending side of a connection. Much like close except
* that we don't receive shut down or sock_set_flag(sk, SOCK_DEAD).
*/
@@ -2125,7 +2152,7 @@ void tcp_close(struct sock *sk, long timeout)
* required by specs (TCP_ESTABLISHED, TCP_CLOSE_WAIT, when
* they look as CLOSING or LAST_ACK for Linux)
* Probably, I missed some more holelets.
- * --ANK
+ * --ANK
* XXX (TFO) - To start off we don't support SYN+ACK+FIN
* in a single packet! (May consider it later but will
* probably need API support or TCP_CORK SYN-ACK until
@@ -2235,6 +2262,7 @@ int tcp_disconnect(struct sock *sk, int flags)
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
int err = 0;
+ int ass_res = 0;
int old_state = sk->sk_state;
if (old_state != TCP_CLOSE)
@@ -2272,7 +2300,8 @@ int tcp_disconnect(struct sock *sk, int flags)
sk->sk_shutdown = 0;
sock_reset_flag(sk, SOCK_DONE);
tp->srtt = 0;
- if ((tp->write_seq += tp->max_window + 2) == 0)
+ ass_res = (tp->write_seq += tp->max_window + 2);
+ if (ass_res == 0)
tp->write_seq = 1;
icsk->icsk_backoff = 0;
tp->snd_cwnd = 2;
@@ -2358,8 +2387,7 @@ static int tcp_repair_options_est(struct tcp_sock *tp,
return 0;
}
-/*
- * Socket option code for TCP.
+/* Socket option code for TCP.
*/
static int do_tcp_setsockopt(struct sock *sk, int level,
int optname, char __user *optval, unsigned int optlen)
@@ -2491,7 +2519,9 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_MAXSEG:
/* Values greater than interface MTU won't take effect. However
* at the point when this call is done we typically don't yet
- * know which interface is going to be used */
+ * know which interface is going to be used
+ */
+
if (val < TCP_MIN_MSS || val > MAX_TCP_WINDOW) {
err = -EINVAL;
break;
@@ -2509,6 +2539,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
* an explicit push, which overrides even TCP_CORK
* for currently queued segments.
*/
+
tp->nonagle |= TCP_NAGLE_OFF|TCP_NAGLE_PUSH;
tcp_push_pending_frames(sk);
} else {
@@ -2786,7 +2817,8 @@ void tcp_get_info(const struct sock *sk, struct tcp_info *info)
info->tcpi_fackets = tp->fackets_out;
info->tcpi_last_data_sent = jiffies_to_msecs(now - tp->lsndtime);
- info->tcpi_last_data_recv = jiffies_to_msecs(now - icsk->icsk_ack.lrcvtime);
+ info->tcpi_last_data_recv =
+ jiffies_to_msecs(now - icsk->icsk_ack.lrcvtime);
info->tcpi_last_ack_recv = jiffies_to_msecs(now - tp->rcv_tstamp);
info->tcpi_pmtu = icsk->icsk_pmtu_cookie;
@@ -3378,12 +3410,12 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
}
EXPORT_SYMBOL(tcp_md5_hash_skb_data);
-int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *key)
+int tcp_md5_hash_key(struct tcp_md5sig_pool *hp, const struct tcp_md5sig_key *k)
{
struct scatterlist sg;
- sg_init_one(&sg, key->key, key->keylen);
- return crypto_hash_update(&hp->md5_desc, &sg, key->keylen);
+ sg_init_one(&sg, k->key, k->keylen);
+ return crypto_hash_update(&hp->md5_desc, &sg, k->keylen);
}
EXPORT_SYMBOL(tcp_md5_hash_key);
--
1.7.10.4
^ permalink raw reply related
* Re: [PATCH net-next V4 04/13] bridge: Verify that a vlan is allowed to egress on give port
From: Shmulik Ladkani @ 2012-12-20 14:28 UTC (permalink / raw)
To: Vlad Yasevich
Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <1355939304-21804-5-git-send-email-vyasevic@redhat.com>
Hi Vlad,
On Wed, 19 Dec 2012 12:48:15 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> /* Don't forward packets to originating port or forwarding diasabled */
> static inline int should_deliver(const struct net_bridge_port *p,
> const struct sk_buff *skb)
> {
> return (((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
> + br_allowed_egress(p, skb) &&
> p->state == BR_STATE_FORWARDING);
> }
This should also be incorporated into 'br_pass_frame_up' somehow.
Egress permission when leaving the bridge towards the IP stack ("egress"
on the "bridge master port" from the bridging point of view) should be
validated according to the master port's VLAN membership.
Regards,
Shmulik
^ permalink raw reply
* [PATCH 1/3] iproute2: distinguish permanent and temporary mdb entries
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, bridge, Cong Wang
This patch adds a flag to mdb entries so that we can distinguish
permanent entries from temporary ones.
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
bridge/mdb.c | 24 +++++++++++++++---------
include/linux/if_bridge.h | 3 +++
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 121ce9c..6217c5f 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -28,7 +28,7 @@ int filter_index;
static void usage(void)
{
- fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP\n");
+ fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp]\n");
fprintf(stderr, " bridge mdb {show} [ dev DEV ]\n");
exit(-1);
}
@@ -53,13 +53,15 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e)
SPRINT_BUF(abuf);
if (e->addr.proto == htons(ETH_P_IP))
- fprintf(f, "bridge %s port %s group %s\n", ll_index_to_name(ifindex),
+ fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
- inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)));
+ inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)),
+ (e->state & MDB_PERMANENT) ? "permanent" : "temp");
else
- fprintf(f, "bridge %s port %s group %s\n", ll_index_to_name(ifindex),
+ fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
- inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)));
+ inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)),
+ (e->state & MDB_PERMANENT) ? "permanent" : "temp");
}
static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr)
@@ -179,11 +181,15 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
} else if (strcmp(*argv, "grp") == 0) {
NEXT_ARG();
grp = *argv;
+ } else if (strcmp(*argv, "port") == 0) {
+ NEXT_ARG();
+ p = *argv;
+ } else if (strcmp(*argv, "permanent") == 0) {
+ if (cmd == RTM_NEWMDB)
+ entry.state |= MDB_PERMANENT;
+ } else if (strcmp(*argv, "temp") == 0) {
+ ;/* nothing */
} else {
- if (strcmp(*argv, "port") == 0) {
- NEXT_ARG();
- p = *argv;
- }
if (matches(*argv, "help") == 0)
usage();
}
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b3b6a67..aac8b8c 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -163,6 +163,9 @@ struct br_port_msg {
struct br_mdb_entry {
__u32 ifindex;
+#define MDB_TEMPORARY 0
+#define MDB_PERMANENT 1
+ __u8 state;
struct {
union {
__be32 ip4;
--
1.7.7.6
^ permalink raw reply related
* [PATCH 2/3] iproute2: update help info of bridge command
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger
In-Reply-To: <1356013915-20835-1-git-send-email-amwang@redhat.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
bridge/bridge.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/bridge/bridge.c b/bridge/bridge.c
index 1fcd365..1d59a1e 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -27,7 +27,7 @@ static void usage(void)
{
fprintf(stderr,
"Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
-"where OBJECT := { fdb | monitor }\n"
+"where OBJECT := { fdb | mdb | monitor }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails]\n" );
exit(-1);
}
--
1.7.7.6
^ permalink raw reply related
* [PATCH 3/3] iproute2: make `bridge mdb` output consistent with input
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger
In-Reply-To: <1356013915-20835-1-git-send-email-amwang@redhat.com>
bridge -> dev
group -> grp
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
bridge/mdb.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 6217c5f..81d479b 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -53,12 +53,12 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e)
SPRINT_BUF(abuf);
if (e->addr.proto == htons(ETH_P_IP))
- fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
+ fprintf(f, "dev %s port %s grp %s %s\n", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)),
(e->state & MDB_PERMANENT) ? "permanent" : "temp");
else
- fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
+ fprintf(f, "dev %s port %s grp %s %s\n", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)),
(e->state & MDB_PERMANENT) ? "permanent" : "temp");
--
1.7.7.6
^ permalink raw reply related
* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 14:58 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
xen-devel@lists.xensource.com, Ian Campbell,
Konrad Rzeszutek Wilk
In-Reply-To: <1457826869.20121220152326@eikelenboom.it>
Thursday, December 20, 2012, 3:23:26 PM, you wrote:
> Thursday, December 20, 2012, 1:51:39 PM, you wrote:
>> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:
>>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:
>>>> Hi Ian,
>>>>
>>>> It ran overnight and i haven't seen the warn_once trigger.
>>>> (but i also didn't with the previous patch)
>>>>
>>> As I said, the minimum value that would not trigger the warning was what
>>> Ian's patch was doing, but it was still not an accurate estimation.
>>> Doing the real accounting might trigger slow transfers, or dropped
>>> packets because socket limits (SNDBUF / RCVBUF) are hit sooner.
>>> So the real question was: if we account for full pages, do your
>>> applications run as smoothly as before, with no huge performance
>>> regression?
>> OK, I have added some extra debug info (see the diffs below). The code still uses the old truesize calculation (in the hope of triggering the warn_on_once again), but it also calculates the variants IanC came up with.
>> I don't have a clear test case to trigger the warn_on_once; it just happens every once in a while during my normal usage, and I'm not a netperf expert :-)
>> So I haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light:
>> - The first variant (current code) seems to be the most efficient and a good estimate *most* of the time, but it sometimes triggers the warn_on_once in skb_try_coalesce.
>> - The first variant (current code) always seems to subtract from the truesize for small packets (e.g. -214 for data_len 42).
>> - The second variant seems to leave the truesize unchanged for most small packets, and it also seems to work OK for larger packets.
>> - The third variant (a full PAGE_SIZE per frag) seems to be a pretty wasteful estimate.
>> So the second variant seems to be the most accurate so far.
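Here is a minimal sketch of the three truesize increments compared above, following the expressions in the debug hunk of xennet_poll() below. RX_COPY_THRESHOLD = 256 and a 4096-byte page are assumed because they reproduce the logged numbers. `PAGE_SIZE_BYTES`, `struct truesize_variants` and `truesize_deltas()` are hypothetical names. `pull_to` is passed in because it varies per packet: the 4410-byte packet in the log implies pull_to = 106.

```c
#include <assert.h>

/* Assumed values; they reproduce variant1 = data_len - 256 and
 * variant3 = 4096 * nr_frags in the logs. */
#define RX_COPY_THRESHOLD 256
#define PAGE_SIZE_BYTES 4096 /* stands in for the kernel's PAGE_SIZE */

/* Truesize increments computed by the debug patch (hypothetical type). */
struct truesize_variants {
    int v1; /* current code: data_len - RX_COPY_THRESHOLD */
    int v2; /* data_len - pull_to */
    int v3; /* full pages: PAGE_SIZE * nr_frags */
};

static struct truesize_variants truesize_deltas(int data_len, int pull_to,
                                                int nr_frags)
{
    struct truesize_variants v;

    assert(data_len >= 0 && pull_to >= 0 && nr_frags >= 0);
    v.v1 = data_len - RX_COPY_THRESHOLD; /* negative for small packets */
    v.v2 = data_len - pull_to;           /* 0 when everything was pulled */
    v.v3 = PAGE_SIZE_BYTES * nr_frags;   /* one whole page per fragment */
    return v;
}
```

For data_len 42, pull_to 42 and one fragment, this yields -214, 0 and 4096. Added to the initial truesize of 896, that gives the 682 / 896 / 4992 seen in the first log lines.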
>> Eric:
>> Judging from WARN_ON_ONCE(delta < len), delta should not be smaller than len, but they should probably be as close together as possible.
>> When you say "accurate estimation", what would be an acceptable difference between delta and len?
>> [ 116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
>> [ 117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
>> [ 117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
>> [ 117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
>> [ 117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
>> [ 117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
>> [ 117.539403] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
>> [ 117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
>> [ 117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
>> [ 121.981917] net_ratelimit: 27 callbacks suppressed
>> [ 121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index c26e28b..8833e38 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>> struct sk_buff_head tmpq;
>> unsigned long flags;
>> int err;
>> + int tsz,len;
>> spin_lock(&np->rx_lock);
>> @@ -1037,9 +1038,22 @@ err:
>> * receive throughout using the standard receive
>> * buffer size was cut by 25%(!!!).
>> */
>> - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> +
>> +
>> +
>> +
>> + tsz = skb->truesize;
>> + len = skb->len;
>> + /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
>> + skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> skb->len += skb->data_len;
>> + net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
>> + skb->dev->name, skb->dev->mtu, skb->data_len, len, skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
>> + skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
>> + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
>> + PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
>> +
>> if (rx->flags & XEN_NETRXF_csum_blank)
>> skb->ip_summed = CHECKSUM_PARTIAL;
>> else if (rx->flags & XEN_NETRXF_data_validated)
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 3ab989b..6d0cd86 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>> WARN_ON_ONCE(delta < len);
>> + if(delta < len) {
>> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> + to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> + }
>> +
>> + if (delta > len && delta - len > 100) {
>> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> + to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> + }
>> +
>> memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>> skb_shinfo(from)->frags,
>> skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
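The two debug conditions added to skb_try_coalesce() amount to a three-way classification of delta against len. The sketch below expresses that classification. `classify_coalesce()` and the enum are hypothetical names, and the 100-byte threshold is taken from the hunk:

```c
#include <assert.h>

/* Outcome of the two debug checks added to skb_try_coalesce(). */
enum coalesce_check {
    COALESCE_OK,         /* len <= delta <= len + 100 */
    COALESCE_UNDERCOUNT, /* delta < len: WARN_ON_ONCE fires */
    COALESCE_OVERCOUNT,  /* delta - len > 100: logged as possibly wasteful */
};

static enum coalesce_check classify_coalesce(int delta, int len)
{
    assert(len >= 0);
    if (delta < len)
        return COALESCE_UNDERCOUNT;
    if (delta - len > 100) /* threshold hard-coded in the debug patch */
        return COALESCE_OVERCOUNT;
    return COALESCE_OK;
}
```

For example, delta 22978 with len 23168 is an undercount, and delta 7690 with len 7240 is an overcount.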
> OK, I succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was rate limited away for the offending packet :(
> Dec 20 15:17:33 media kernel: [ 393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 394.940211] ------------[ cut here ]------------
> Dec 20 15:17:35 media kernel: [ 394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
> Dec 20 15:17:35 media kernel: [ 394.940282] Modules linked in:
> Dec 20 15:17:35 media kernel: [ 394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
> Dec 20 15:17:35 media kernel: [ 394.940330] Call Trace:
> Dec 20 15:17:35 media kernel: [ 394.940343] <IRQ> [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
> Dec 20 15:17:35 media kernel: [ 394.940384] [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
> Dec 20 15:17:35 media kernel: [ 394.940409] [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
> Dec 20 15:17:35 media kernel: [ 394.940434] [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
> Dec 20 15:17:35 media kernel: [ 394.940458] [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
> Dec 20 15:17:35 media kernel: [ 394.940481] [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
> Dec 20 15:17:35 media kernel: [ 394.940504] [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
> Dec 20 15:17:35 media kernel: [ 394.940528] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940551] [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
> Dec 20 15:17:35 media kernel: [ 394.940576] [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
> Dec 20 15:17:35 media kernel: [ 394.940600] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940623] [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940666] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [ 394.940694] [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
> Dec 20 15:17:35 media kernel: [ 394.940720] [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
> Dec 20 15:17:35 media kernel: [ 394.940745] [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
> Dec 20 15:17:35 media kernel: [ 394.940784] [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
> Dec 20 15:17:35 media kernel: [ 394.940807] [<ffffffff818e4b68>] ip_rcv+0x218/0x300
> Dec 20 15:17:35 media kernel: [ 394.940829] [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
> Dec 20 15:17:35 media kernel: [ 394.940853] [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
> Dec 20 15:17:35 media kernel: [ 394.940889] [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
> Dec 20 15:17:35 media kernel: [ 394.940914] [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
> Dec 20 15:17:35 media kernel: [ 394.940939] [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
> Dec 20 15:17:35 media kernel: [ 394.940964] [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
> Dec 20 15:17:35 media kernel: [ 394.941000] [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
> Dec 20 15:17:35 media kernel: [ 394.941024] [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
> Dec 20 15:17:35 media kernel: [ 394.941046] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [ 394.941075] [<ffffffff8184ed66>] net_rx_action+0x136/0x260
> Dec 20 15:17:35 media kernel: [ 394.941098] [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
> Dec 20 15:17:35 media kernel: [ 394.941133] [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
> Dec 20 15:17:35 media kernel: [ 394.941157] [<ffffffff819b623c>] call_softirq+0x1c/0x30
> Dec 20 15:17:35 media kernel: [ 394.941179] [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
> Dec 20 15:17:35 media kernel: [ 394.941201] [<ffffffff8107041e>] irq_exit+0x9e/0xd0
> Dec 20 15:17:35 media kernel: [ 394.941235] [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
> Dec 20 15:17:35 media kernel: [ 394.941259] [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
> Dec 20 15:17:35 media kernel: [ 394.941279] <EOI> [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [ 394.941318] [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [ 394.941356] [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
> Dec 20 15:17:35 media kernel: [ 394.941381] [<ffffffff810092b2>] ? check_events+0x12/0x20
> Dec 20 15:17:35 media kernel: [ 394.941405] [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
> Dec 20 15:17:35 media kernel: [ 394.941432] [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
> Dec 20 15:17:35 media kernel: [ 394.941473] [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
> Dec 20 15:17:35 media kernel: [ 394.941507] [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
> Dec 20 15:17:35 media kernel: [ 394.941533] [<ffffffff819b2434>] ? __schedule+0x444/0x880
> Dec 20 15:17:35 media kernel: [ 394.941555] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [ 394.941580] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [ 394.941614] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [ 394.941638] [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
> Dec 20 15:17:35 media kernel: [ 394.941663] [<ffffffff819b2904>] ? schedule+0x24/0x70
> Dec 20 15:17:35 media kernel: [ 394.941697] [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
> Dec 20 15:17:35 media kernel: [ 394.941725] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [ 394.941748] [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
> Dec 20 15:17:35 media kernel: [ 394.941770] [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
> Dec 20 15:17:35 media kernel: [ 394.941808] [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
> Dec 20 15:17:35 media kernel: [ 394.941833] [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
> Dec 20 15:17:35 media kernel: [ 394.941856] [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
> Dec 20 15:17:35 media kernel: [ 394.941878] [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
> Dec 20 15:17:35 media kernel: [ 394.941913] [<ffffffff810b5b17>] ? lock_release+0x117/0x250
> Dec 20 15:17:35 media kernel: [ 394.941938] [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
> Dec 20 15:17:35 media kernel: [ 394.941959] [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
> Dec 20 15:17:35 media kernel: [ 394.941981] [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
> Dec 20 15:17:35 media kernel: [ 394.942015] [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
> Dec 20 15:17:35 media kernel: [ 394.942036] ---[ end trace 6f3a832c9e91c8af ]---
> Dec 20 15:17:35 media kernel: [ 394.942056] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
> Dec 20 15:17:35 media kernel: [ 394.968199] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
> Dec 20 15:17:35 media kernel: [ 395.262814] net_ratelimit: 371 callbacks suppressed
> Dec 20 15:17:35 media kernel: [ 395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Hmm, perhaps a better example; I have marked some potentially interesting points (A1 to B2):
Dec 20 14:12:57 media kernel: [ 794.895136] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.895431] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.895616] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18824(19720) variant3:24576(25472)
Dec 20 14:12:57 media kernel: [ 794.895804] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 794.895823] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:3 variant1:7050(7946) variant2:7050(7946) variant3:12288(13184)
Dec 20 14:12:57 media kernel: [ 794.895868] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:3
Dec 20 14:12:57 media kernel: [ 794.896133] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.896152] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.896200] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 794.907232] eth0: mtu:1500 data_len:23234 len before:0 len after:23234 truesize before:896 truesize after:23874 nr_frags:7 variant1:22978(23874) variant2:22978(23874) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907517] eth0: mtu:1500 data_len:24682 len before:0 len after:24682 truesize before:896 truesize after:25322 nr_frags:7 variant1:24426(25322) variant2:24426(25322) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907693] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907882] eth0: mtu:1500 data_len:14546 len before:0 len after:14546 truesize before:896 truesize after:15186 nr_frags:5 variant1:14290(15186) variant2:14290(15186) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.907901] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 794.907938] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:13482 len:13032 from->truesize:13738 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:4
Dec 20 14:12:57 media kernel: [ 794.908191] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28880(29776) variant3:36864(37760)
Dec 20 14:12:57 media kernel: [ 794.908386] eth0: mtu:1500 data_len:30474 len before:0 len after:30474 truesize before:896 truesize after:31114 nr_frags:8 variant1:30218(31114) variant2:30218(31114) variant3:32768(33664)
A1) Here we have a packet with data_len 5858, truesize set to 6498 and nr_frags 2:
Dec 20 14:12:57 media kernel: [ 794.908560] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:2 variant1:5602(6498) variant2:5602(6498) variant3:8192(9088)
Dec 20 14:12:57 media kernel: [ 794.908581] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)
A2) That seems to end up in skb_try_coalesce; from->nr_frags is still 2 and delta is well above len (6242 vs 5792), so no warning, but perhaps wasteful?
Dec 20 14:12:57 media kernel: [ 794.908616] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:6242 len:5792 from->truesize:6498 skb_headlen(from):190 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2
Dec 20 14:12:57 media kernel: [ 794.908834] eth0: mtu:1500 data_len:33370 len before:0 len after:33370 truesize before:896 truesize after:34010 nr_frags:9 variant1:33114(34010) variant2:33114(34010) variant3:36864(37760)
B1) Here we again have a packet with data_len 5858 and truesize set to 6498, but this time nr_frags is 3:
Dec 20 14:12:57 media kernel: [ 794.908992] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:3 variant1:5602(6498) variant2:5792(6688) variant3:12288(13184)
Dec 20 14:12:57 media kernel: [ 794.909012] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:8 variant1:28770(29666) variant2:28770(29666) variant3:32768(33664)
B2) That seems to end up in skb_try_coalesce; from->nr_frags is now 2 instead of 3 and delta < len (5602 vs 5792), so it would have triggered the warn_on_once:
Dec 20 14:12:57 media kernel: [ 794.909040] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:5602 len:5792 from->truesize:6498 skb_headlen(from):0 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2
Dec 20 14:12:57 media kernel: [ 794.909673] eth0: mtu:1500 data_len:1514 len before:0 len after:1514 truesize before:896 truesize after:2154 nr_frags:1 variant1:1258(2154) variant2:1258(2154) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.909692] eth0: mtu:1500 data_len:522 len before:0 len after:522 truesize before:896 truesize after:1162 nr_frags:1 variant1:266(1162) variant2:266(1162) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.909736] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:906 len:456 from->truesize:1162 skb_headlen(from):190 skb_shinfo(to)->nr_frags:2 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 794.910205] eth0: mtu:1500 data_len:36266 len before:0 len after:36266 truesize before:896 truesize after:36906 nr_frags:10 variant1:36010(36906) variant2:36010(36906) variant3:40960(41856)
Dec 20 14:12:57 media kernel: [ 794.910706] eth0: mtu:1500 data_len:37714 len before:0 len after:37714 truesize before:896 truesize after:38354 nr_frags:10 variant1:37458(38354) variant2:37458(38354) variant3:40960(41856)
Dec 20 14:12:57 media kernel: [ 794.911472] eth0: mtu:1500 data_len:27578 len before:0 len after:27578 truesize before:896 truesize after:28218 nr_frags:8 variant1:27322(28218) variant2:27322(28218) variant3:32768(33664)
Dec 20 14:12:57 media kernel: [ 794.911695] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28770(29666) variant3:36864(37760)
Dec 20 14:12:57 media kernel: [ 795.015511] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 795.015585] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:10 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 795.015641] eth0: mtu:1500 data_len:10202 len before:0 len after:10202 truesize before:896 truesize after:10842 nr_frags:4 variant1:9946(10842) variant2:9946(10842) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 795.015657] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
Dec 20 14:12:58 media kernel: [ 795.817824] net_ratelimit: 9 callbacks suppressed
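In every skb_try_coalesce line above, from->truesize - delta is 256 when skb_headlen(from) is 190 and 896 when it is 0. The sketch below takes that as an assumption, inferred from these logs rather than read from skbuff.c. Under that assumption, A2 and B2 differ only in which overhead is subtracted from the same truesize of 6498. It uses hypothetical names:

```c
#include <assert.h>

/* Assumed overheads: from->truesize - delta as observed in the logs above. */
#define OVERHEAD_HEAD_NONEMPTY 256 /* lines with skb_headlen(from) == 190 */
#define OVERHEAD_HEAD_EMPTY    896 /* lines with skb_headlen(from) == 0 */

/* Hypothetical helper reproducing the logged delta from from->truesize. */
static int logged_delta(int from_truesize, int from_headlen)
{
    assert(from_truesize >= 0 && from_headlen >= 0);
    return from_truesize - (from_headlen != 0 ? OVERHEAD_HEAD_NONEMPTY
                                              : OVERHEAD_HEAD_EMPTY);
}
```

With len = 5792 in both cases, A2 lands 450 bytes above len, while B2 lands 190 bytes below it. The second case is the one WARN_ON_ONCE(delta < len) catches.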
--
Sander
* Re: [PATCH] net: ipv4: route: fixed a coding style issues net: ipv4: tcp: fixed a coding style issues
From: Eric Dumazet @ 2012-12-20 15:23 UTC
To: nicolas.dichtel
Cc: Stefan Hasko, David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <50D2FF86.3000603@6wind.com>
On Thu, 2012-12-20 at 13:07 +0100, Nicolas Dichtel wrote:
> On 20/12/2012 09:08, Stefan Hasko wrote:
> > + "out_hlist_search\n");
> checkpatch will warn you about this one, something like:
> "WARNING: quoted string split across lines".
> Not breaking such lines makes it easier to grep for the pattern.
Yes.
Could we please leave this file as is for at least 2 years?
We have had a lot of recent changes, and more fixes are likely.
Such "coding style" patches are a real pain when trying to fix bugs,
especially dealing with stable/old kernels.
Thanks
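Nicolas's point can be shown in plain C. A split literal compiles to the same string, but a grep for the full message no longer finds it in the source. The sketch below uses a made-up message, not the string from Stefan's patch:

```c
#include <assert.h>
#include <string.h>

/*
 * Adjacent string literals are concatenated at compile time, so both
 * arrays hold the same text. Only whole_msg, however, can be found by
 * grepping the source for the full message. The text is a made-up example.
 */
static const char split_msg[] = "example: lookup failed in "
                                "some_function\n";
static const char whole_msg[] = "example: lookup failed in some_function\n";

static int messages_identical(void)
{
    /* Same length (including the terminating NUL) and same bytes. */
    assert(sizeof(split_msg) == sizeof(whole_msg));
    return strcmp(split_msg, whole_msg) == 0;
}
```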
* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
From: Yury Stankevich @ 2012-12-20 14:59 UTC
To: Jamal Hadi Salim
Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt,
netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D305FD.7000901@mojatatu.com>
Interesting:
#tc -s filter show dev usb0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 0 terminal flowid ??? (rule hit 707 success 707)
match 00000000/00000000 at 0 (success 707 )
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target CONNMARK restore
index 5 ref 1 bind 1 installed 394 sec used 11 sec
Action statistics:
Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
action order 2: mirred (Egress Redirect to device ifb0) stolen
index 5 ref 1 bind 1 installed 394 sec used 11 sec
Action statistics:
Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
So it looks like packets were sent to the CONNMARK target.
But...
I made an iptables rule to log packets with the 0xa mark:
Chain PREROUTING (policy ACCEPT 1308 packets, 848K bytes)
pkts bytes target prot opt in out source destination
0 0 NFLOG all -- * * 0.0.0.0/0 0.0.0.0/0 mark match 0xa nflog-group 1
Chain POSTROUTING (policy ACCEPT 1240 packets, 550K bytes)
pkts bytes target prot opt in out source destination
1 40 CONNMARK tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 connmark match 0x0 connbytes 204800 connbytes mode bytes connbytes direction both CONNMARK set 0xa
The idea is:
I start a download; the POSTROUTING rule should fire once I have downloaded
more than ~200K.
The tc filter calls CONNMARK restore, which should restore the mark (0xa) on
packets belonging to this connection.
So I expect the PREROUTING rule to see the restored mark, but it doesn't.
Am I missing something?
On 20.12.2012 16:35, Jamal Hadi Salim wrote:
>
> It could be your setup. I didn't do a lot of testing, but
> from my notes (I'm running a different kernel at the moment):
>
> # try to match everything (no iptables setup)
> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
> flowid 23:23 action xt -j CONNMARK --restore-mark
> # let it run for 1 sec, then display with
> tc -s filter show dev eth0 parent ffff:
>
> ----
> filter protocol ip pref 49152 u32
> filter protocol ip pref 49152 u32 fh 800: ht divisor 1
> filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 23:23
> match 00000000/00000000 at 0
> action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
> target CONNMARK restore
> index 1 ref 1 bind 1 installed 3 sec used 1 sec
> Action statistics:
> Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> ----
>
> cheers,
> jamal
>
> On 12-12-20 03:54 AM, Yury Stankevich wrote:
>> On 19.12.2012 15:56, Jamal Hadi Salim wrote:
>>> Hasan/Yury, if you test this, please use the latest iproute2 with only
>>> the first patch I posted (originally from Hasan). Hasan, please use that
>>> patch, not your version; if there's anything wrong, we can find out sooner,
>>> before the patch becomes final.
>>
>> Hello,
>> 3.7.1 kernel with iproute 3.7.0;
>> patch-xt, xt-p1 and the linkage fix were applied.
>> The command completes successfully, but it doesn't actually work.
>>
>> Command:
>> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>> action xt -j CONNMARK --restore-mark \
>> action mirred egress redirect dev ifb0
>> Then I use this filter:
>>
>> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw \
>>   flowid 1:102
>>
>> iptables line:
>> iptables -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0 \
>>   -m connbytes --connbytes 204800: --connbytes-dir both \
>>   --connbytes-mode bytes -j CONNMARK --set-mark 0xa
>>
>> Once I run a test downloading a 300K file, the iptables counters show
>> that the POSTROUTING rule is triggered, but `tc -s qdisc show dev ifb0`
>> shows that no packets were sent to the 1:102 flow.
>>
>> BTW,
>> tc -p -s filter show dev ifb0 parent 1:
>> does not show stats such as `(rule hit 416 success 0)` for this rule
>> (filter protocol ip pref 2 fw handle 0xa classid 1:102).
>>
>>
>>
>
--
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>
* Re: [PATCH net-next V4 02/13] bridge: Add vlan filtering infrastructure
From: Vlad Yasevich @ 2012-12-20 15:31 UTC (permalink / raw)
To: Shmulik Ladkani
Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <20121220153913.11a10fd0@pixies.home.jungo.com>
On 12/20/2012 08:39 AM, Shmulik Ladkani wrote:
> Hi Vlad,
>
> On Wed, 19 Dec 2012 12:48:13 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>> +static void nbp_vlan_flush(struct net_bridge_port *p)
>> +{
>> + struct net_port_vlan *pve;
>> + struct net_port_vlan *tmp;
>> +
>> + ASSERT_RTNL();
>> +
>> + list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)
>> + nbp_vlan_delete(p, pve->vid, BRIDGE_FLAGS_SELF);
>
> Why would you want to clear the "bridge master port" association from this
> vlan in the event of NBP destruction?
> The "bridge port" may still be a member of this vlan, right?
> It seems the flags argument should be 0.
This ends up getting fixed later, but you are right. This should be 0.
>
>> +#define BR_VID_HASH_SIZE (1<<6)
>> +#define br_vlan_hash(vid) ((vid) % (BR_VID_HASH_SIZE - 1))
>
> Did you mean: & (BR_VID_HASH_SIZE - 1)
yes.
thanks
-vlad
>
> Regards,
> Shmulik
>