* [PATCH 1/3] iproute2: distinguish permanent and temporary mdb entries
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, bridge, Cong Wang
This patch adds a flag to mdb entries so that we can distinguish
permanent entries from temporary ones.
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
bridge/mdb.c | 24 +++++++++++++++---------
include/linux/if_bridge.h | 3 +++
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 121ce9c..6217c5f 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -28,7 +28,7 @@ int filter_index;
static void usage(void)
{
- fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP\n");
+ fprintf(stderr, "Usage: bridge mdb { add | del } dev DEV port PORT grp GROUP [permanent | temp]\n");
fprintf(stderr, " bridge mdb {show} [ dev DEV ]\n");
exit(-1);
}
@@ -53,13 +53,15 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e)
SPRINT_BUF(abuf);
if (e->addr.proto == htons(ETH_P_IP))
- fprintf(f, "bridge %s port %s group %s\n", ll_index_to_name(ifindex),
+ fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
- inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)));
+ inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)),
+ (e->state & MDB_PERMANENT) ? "permanent" : "temp");
else
- fprintf(f, "bridge %s port %s group %s\n", ll_index_to_name(ifindex),
+ fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
- inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)));
+ inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)),
+ (e->state & MDB_PERMANENT) ? "permanent" : "temp");
}
static void br_print_mdb_entry(FILE *f, int ifindex, struct rtattr *attr)
@@ -179,11 +181,15 @@ static int mdb_modify(int cmd, int flags, int argc, char **argv)
} else if (strcmp(*argv, "grp") == 0) {
NEXT_ARG();
grp = *argv;
+ } else if (strcmp(*argv, "port") == 0) {
+ NEXT_ARG();
+ p = *argv;
+ } else if (strcmp(*argv, "permanent") == 0) {
+ if (cmd == RTM_NEWMDB)
+ entry.state |= MDB_PERMANENT;
+ } else if (strcmp(*argv, "temp") == 0) {
+ ;/* nothing */
} else {
- if (strcmp(*argv, "port") == 0) {
- NEXT_ARG();
- p = *argv;
- }
if (matches(*argv, "help") == 0)
usage();
}
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b3b6a67..aac8b8c 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -163,6 +163,9 @@ struct br_port_msg {
struct br_mdb_entry {
__u32 ifindex;
+#define MDB_TEMPORARY 0
+#define MDB_PERMANENT 1
+ __u8 state;
struct {
union {
__be32 ip4;
--
1.7.7.6
^ permalink raw reply related
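The keyword handling and the printed suffix from this patch can be sketched in isolation as follows. This is only an illustration: `mdb_parse_state()` and `mdb_state_name()` are hypothetical helpers that do not exist in iproute2, and the sketch leaves out the `cmd == RTM_NEWMDB` check that the patch applies before setting the flag.

```c
#include <stdint.h>
#include <string.h>

/* Flag values as added to include/linux/if_bridge.h by the patch. */
#define MDB_TEMPORARY 0
#define MDB_PERMANENT 1

/*
 * Hypothetical helper (not part of iproute2): apply a state keyword
 * to *state. Returns 0 if kw was a state keyword, -1 otherwise.
 */
static int mdb_parse_state(const char *kw, uint8_t *state)
{
	if (strcmp(kw, "permanent") == 0) {
		*state |= MDB_PERMANENT;
		return 0;
	}
	if (strcmp(kw, "temp") == 0)
		return 0;	/* temporary is the default, nothing to set */
	return -1;
}

/* Hypothetical helper: the suffix print_mdb_entry() prints for a state. */
static const char *mdb_state_name(uint8_t state)
{
	return (state & MDB_PERMANENT) ? "permanent" : "temp";
}
```

Since `MDB_TEMPORARY` is 0, an entry is temporary unless the `permanent` keyword is given, which is why the `temp` branch has nothing to do.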
* [PATCH 2/3] iproute2: update help info of bridge command
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger
In-Reply-To: <1356013915-20835-1-git-send-email-amwang@redhat.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
bridge/bridge.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/bridge/bridge.c b/bridge/bridge.c
index 1fcd365..1d59a1e 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -27,7 +27,7 @@ static void usage(void)
{
fprintf(stderr,
"Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
-"where OBJECT := { fdb | monitor }\n"
+"where OBJECT := { fdb | mdb | monitor }\n"
" OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails]\n" );
exit(-1);
}
--
1.7.7.6
^ permalink raw reply related
* [PATCH 3/3] iproute2: make `bridge mdb` output consistent with input
From: Cong Wang @ 2012-12-20 14:31 UTC (permalink / raw)
To: netdev; +Cc: bridge, Cong Wang, Stephen Hemminger
In-Reply-To: <1356013915-20835-1-git-send-email-amwang@redhat.com>
bridge -> dev
group -> grp
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
bridge/mdb.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/bridge/mdb.c b/bridge/mdb.c
index 6217c5f..81d479b 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -53,12 +53,12 @@ static void print_mdb_entry(FILE *f, int ifindex, struct br_mdb_entry *e)
SPRINT_BUF(abuf);
if (e->addr.proto == htons(ETH_P_IP))
- fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
+ fprintf(f, "dev %s port %s grp %s %s\n", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
inet_ntop(AF_INET, &e->addr.u.ip4, abuf, sizeof(abuf)),
(e->state & MDB_PERMANENT) ? "permanent" : "temp");
else
- fprintf(f, "bridge %s port %s group %s %s\n", ll_index_to_name(ifindex),
+ fprintf(f, "dev %s port %s grp %s %s\n", ll_index_to_name(ifindex),
ll_index_to_name(e->ifindex),
inet_ntop(AF_INET6, &e->addr.u.ip6, abuf, sizeof(abuf)),
(e->state & MDB_PERMANENT) ? "permanent" : "temp");
--
1.7.7.6
^ permalink raw reply related
* Re: [Xen-devel] [PATCH] xen/netfront: improve truesize tracking
From: Sander Eikelenboom @ 2012-12-20 14:58 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Eric Dumazet, netdev@vger.kernel.org, annie li,
xen-devel@lists.xensource.com, Ian Campbell,
Konrad Rzeszutek Wilk
In-Reply-To: <1457826869.20121220152326@eikelenboom.it>
Thursday, December 20, 2012, 3:23:26 PM, you wrote:
> Thursday, December 20, 2012, 1:51:39 PM, you wrote:
>> Wednesday, December 19, 2012, 5:17:49 PM, you wrote:
>>> On Wed, 2012-12-19 at 12:34 +0100, Sander Eikelenboom wrote:
>>>> Hi Ian,
>>>>
>>>> It ran overnight and i haven't seen the warn_once trigger.
>>>> (but i also didn't with the previous patch)
>>>>
>>> As I said, the minimum value to not trigger the warning was what Ian's
>>> patch was doing, but it was still not an accurate estimation.
>>> Doing the real accounting might trigger slow transfers, or dropped
>>> packets because of socket limits (SNDBUF / RCVBUF) being hit sooner.
>>> So the real question was: if accounting for full pages, do your
>>> applications run as smoothly as before, with no huge performance
>>> regression?
>> Ok i have added some extra debug info (see diff's below), the code still uses the old calculation for truesize (in the hope to trigger the warn_on_once again), but also calculates the variants IanC came up with.
>> I haven't got a clear test case to trigger the warn_on_once, it happens just every once in a while during my normal usage and i'm not a netperf expert :-)
>> So at the moment i haven't been able to trigger the warn_on_once yet, but the results so far do seem to shed some light ..
>> - The first variant (current code) seems to be the most efficient and a good estimation *most* of the time, but sometimes triggers the warn_on_once in skb_try_coalesce.
>> - The first variant (current code) seems to always subtract from the truesize for small packets.
>> - The second variant always seems to keep the truesize as is for most of the small network traffic, but it also seems to work ok for larger packets.
>> - The third variant seems to be a pretty wasteful estimation.
>> So the last variant seems to be rather wasteful, and the second one the most accurate so far.
>> Eric:
>> From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
>> When you say "accurate estimation", what would be an acceptable difference between DELTA and LEN ?
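For reference, the three truesize increments compared in the debug output can be written as a small userspace sketch. The names `truesize_variants` and `truesize_deltas()` are made up for illustration, and both constants are assumptions read off the logged numbers: `data_len:42` gives `variant1:-214`, so `RX_COPY_THRESHOLD` looks to be 256, and `variant3` grows by 4096 per fragment.

```c
/* Assumed values, inferred from the logged numbers in this thread. */
#define SKETCH_RX_COPY_THRESHOLD 256
#define SKETCH_PAGE_SIZE 4096

/* Increment each variant would add to skb->truesize. */
struct truesize_variants {
	int v1;	/* current code: data_len - RX_COPY_THRESHOLD */
	int v2;	/* data_len - pull_to */
	int v3;	/* PAGE_SIZE * nr_frags */
};

/* pull_to is an input: the debug patch reads it from NETFRONT_SKB_CB(skb). */
static struct truesize_variants truesize_deltas(int data_len, int pull_to,
						int nr_frags)
{
	struct truesize_variants v = {
		.v1 = data_len - SKETCH_RX_COPY_THRESHOLD,
		.v2 = data_len - pull_to,
		.v3 = SKETCH_PAGE_SIZE * nr_frags,
	};
	return v;
}
```

With `data_len` 42, `pull_to` 42 and one fragment this gives -214, 0 and 4096, matching the `variant1`, `variant2` and `variant3` fields logged for the small packets.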
>> [ 116.965062] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 117.094538] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.094707] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.094869] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.095058] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.095216] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096102] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096311] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.096373] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.150398] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.150459] eth0: mtu:1500 data_len:54 len before:0 len after:54 truesize before:896 truesize after:694 nr_frags:1 variant1:-202(694) variant2:0(896) variant3:4096(4992)
>> [ 117.536901] eth0: mtu:1500 data_len:53642 len before:0 len after:53642 truesize before:896 truesize after:54282 nr_frags:14 variant1:53386(54282) variant2:53386(54282) variant3:57344(58240)
>> [ 117.537463] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
>> [ 117.537915] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
>> [ 117.538543] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18634(19530) variant3:24576(25472)
>> [ 117.539223] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
>> [ 117.539283] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:2 variant1:7050(7946) variant2:7050(7946) variant3:8192(9088)
>> [ 117.539403] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:2
>> [ 117.540035] eth0: mtu:1500 data_len:4410 len before:0 len after:4410 truesize before:896 truesize after:5050 nr_frags:3 variant1:4154(5050) variant2:4304(5200) variant3:12288(13184)
>> [ 117.540153] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
>> [ 121.981917] net_ratelimit: 27 callbacks suppressed
>> [ 121.981960] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 122.985019] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 123.988308] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 124.991961] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 125.995003] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> [ 126.998324] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index c26e28b..8833e38 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -964,6 +964,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
>> struct sk_buff_head tmpq;
>> unsigned long flags;
>> int err;
>> + int tsz,len;
>> spin_lock(&np->rx_lock);
>> @@ -1037,9 +1038,22 @@ err:
>> * receive throughout using the standard receive
>> * buffer size was cut by 25%(!!!).
>> */
>> - skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> +
>> +
>> +
>> +
>> + tsz = skb->truesize;
>> + len = skb->len;
>> + /* skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags; */
>> + skb->truesize += skb->data_len - RX_COPY_THRESHOLD;
>> skb->len += skb->data_len;
>> + net_warn_ratelimited("%s: mtu:%d data_len:%d len before:%d len after:%d truesize before:%d truesize after:%d nr_frags:%d variant1:%d(%d) variant2:%d(%d) variant3:%d(%d) \n",
>> + skb->dev->name, skb->dev->mtu, skb->data_len, len, skb->len,tsz, skb->truesize, skb_shinfo(skb)->nr_frags,
>> + skb->data_len - RX_COPY_THRESHOLD, tsz + skb->data_len - RX_COPY_THRESHOLD ,
>> + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to, tsz + skb->data_len - NETFRONT_SKB_CB(skb)->pull_to,
>> + PAGE_SIZE * skb_shinfo(skb)->nr_frags, tsz + (PAGE_SIZE * skb_shinfo(skb)->nr_frags));
>> +
>> if (rx->flags & XEN_NETRXF_csum_blank)
>> skb->ip_summed = CHECKSUM_PARTIAL;
>> else if (rx->flags & XEN_NETRXF_data_validated)
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 3ab989b..6d0cd86 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3471,6 +3471,16 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>> WARN_ON_ONCE(delta < len);
>> + if(delta < len) {
>> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA < LEN delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> + to->dev->name, from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> + }
>> +
>> + if (delta > len && delta - len > 100) {
>> + net_warn_ratelimited("to: %s from: %s skb_try_coalesce: DELTA - LEN > 100 delta:%d len:%d from->truesize:%d skb_headlen(from):%d skb_shinfo(to)->nr_frags:%d skb_shinfo(from)->nr_frags:%d \n",
>> + to->dev->name,from->dev->name, delta, len, from->truesize, skb_headlen(from), skb_shinfo(to)->nr_frags, skb_shinfo(from)->nr_frags);
>> + }
>> +
>> memcpy(skb_shinfo(to)->frags + skb_shinfo(to)->nr_frags,
>> skb_shinfo(from)->frags,
>> skb_shinfo(from)->nr_frags * sizeof(skb_frag_t));
> Ok i succeeded in triggering the warn_on_once, but it seems the extra debug info from netfront was just rate limited away for the offending packet :(
> Dec 20 15:17:33 media kernel: [ 393.464062] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.464438] eth0: mtu:1500 data_len:762 len before:0 len after:762 truesize before:896 truesize after:1402 nr_frags:1 variant1:506(1402) variant2:506(1402) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.465083] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.466114] eth0: mtu:1500 data_len:118 len before:0 len after:118 truesize before:896 truesize after:758 nr_frags:1 variant1:-138(758) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:33 media kernel: [ 393.467336] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 394.940211] ------------[ cut here ]------------
> Dec 20 15:17:35 media kernel: [ 394.940259] WARNING: at net/core/skbuff.c:3472 skb_try_coalesce+0x3fc/0x470()
> Dec 20 15:17:35 media kernel: [ 394.940282] Modules linked in:
> Dec 20 15:17:35 media kernel: [ 394.940306] Pid: 2632, comm: glusterfs Not tainted 3.7.0-rc0-20121220-netfrontdebug1 #1
> Dec 20 15:17:35 media kernel: [ 394.940330] Call Trace:
> Dec 20 15:17:35 media kernel: [ 394.940343] <IRQ> [<ffffffff8106889a>] warn_slowpath_common+0x7a/0xb0
> Dec 20 15:17:35 media kernel: [ 394.940384] [<ffffffff810688e5>] warn_slowpath_null+0x15/0x20
> Dec 20 15:17:35 media kernel: [ 394.940409] [<ffffffff8184298c>] skb_try_coalesce+0x3fc/0x470
> Dec 20 15:17:35 media kernel: [ 394.940434] [<ffffffff818fb049>] tcp_try_coalesce+0x69/0xc0
> Dec 20 15:17:35 media kernel: [ 394.940458] [<ffffffff818fb0f4>] tcp_queue_rcv+0x54/0x100
> Dec 20 15:17:35 media kernel: [ 394.940481] [<ffffffff8190029f>] ? tcp_mtup_init+0x2f/0x90
> Dec 20 15:17:35 media kernel: [ 394.940504] [<ffffffff818ffbdb>] tcp_rcv_established+0x2bb/0x6a0
> Dec 20 15:17:35 media kernel: [ 394.940528] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940551] [<ffffffff81907985>] tcp_v4_do_rcv+0x135/0x480
> Dec 20 15:17:35 media kernel: [ 394.940576] [<ffffffff819b3532>] ? _raw_spin_lock_nested+0x42/0x50
> Dec 20 15:17:35 media kernel: [ 394.940600] [<ffffffff8190839f>] ? tcp_v4_rcv+0x6cf/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940623] [<ffffffff8190862d>] tcp_v4_rcv+0x95d/0xb10
> Dec 20 15:17:35 media kernel: [ 394.940666] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [ 394.940694] [<ffffffff818e4d6a>] ip_local_deliver_finish+0x11a/0x230
> Dec 20 15:17:35 media kernel: [ 394.940720] [<ffffffff818e4c95>] ? ip_local_deliver_finish+0x45/0x230
> Dec 20 15:17:35 media kernel: [ 394.940745] [<ffffffff818e4eb8>] ip_local_deliver+0x38/0x80
> Dec 20 15:17:35 media kernel: [ 394.940784] [<ffffffff818e447a>] ip_rcv_finish+0x15a/0x630
> Dec 20 15:17:35 media kernel: [ 394.940807] [<ffffffff818e4b68>] ip_rcv+0x218/0x300
> Dec 20 15:17:35 media kernel: [ 394.940829] [<ffffffff8184bf2d>] __netif_receive_skb+0x65d/0x8d0
> Dec 20 15:17:35 media kernel: [ 394.940853] [<ffffffff8184ba15>] ? __netif_receive_skb+0x145/0x8d0
> Dec 20 15:17:35 media kernel: [ 394.940889] [<ffffffff810b192d>] ? trace_hardirqs_on+0xd/0x10
> Dec 20 15:17:35 media kernel: [ 394.940914] [<ffffffff810fecbb>] ? free_hot_cold_page+0x1ab/0x1e0
> Dec 20 15:17:35 media kernel: [ 394.940939] [<ffffffff8184e4f8>] netif_receive_skb+0x28/0xf0
> Dec 20 15:17:35 media kernel: [ 394.940964] [<ffffffff81843e83>] ? __pskb_pull_tail+0x253/0x340
> Dec 20 15:17:35 media kernel: [ 394.941000] [<ffffffff8164fbb5>] xennet_poll+0xae5/0xed0
> Dec 20 15:17:35 media kernel: [ 394.941024] [<ffffffff81080081>] ? wake_up_worker+0x1/0x30
> Dec 20 15:17:35 media kernel: [ 394.941046] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [ 394.941075] [<ffffffff8184ed66>] net_rx_action+0x136/0x260
> Dec 20 15:17:35 media kernel: [ 394.941098] [<ffffffff81070551>] ? __do_softirq+0x71/0x1a0
> Dec 20 15:17:35 media kernel: [ 394.941133] [<ffffffff810705a9>] __do_softirq+0xc9/0x1a0
> Dec 20 15:17:35 media kernel: [ 394.941157] [<ffffffff819b623c>] call_softirq+0x1c/0x30
> Dec 20 15:17:35 media kernel: [ 394.941179] [<ffffffff8100fdc5>] do_softirq+0x85/0xf0
> Dec 20 15:17:35 media kernel: [ 394.941201] [<ffffffff8107041e>] irq_exit+0x9e/0xd0
> Dec 20 15:17:35 media kernel: [ 394.941235] [<ffffffff81430b1f>] xen_evtchn_do_upcall+0x2f/0x40
> Dec 20 15:17:35 media kernel: [ 394.941259] [<ffffffff819b629e>] xen_do_hypervisor_callback+0x1e/0x30
> Dec 20 15:17:35 media kernel: [ 394.941279] <EOI> [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [ 394.941318] [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
> Dec 20 15:17:35 media kernel: [ 394.941356] [<ffffffff8100890d>] ? xen_force_evtchn_callback+0xd/0x10
> Dec 20 15:17:35 media kernel: [ 394.941381] [<ffffffff810092b2>] ? check_events+0x12/0x20
> Dec 20 15:17:35 media kernel: [ 394.941405] [<ffffffff81009259>] ? xen_irq_enable_direct_reloc+0x4/0x4
> Dec 20 15:17:35 media kernel: [ 394.941432] [<ffffffff819b3f6c>] ? _raw_spin_unlock_irq+0x3c/0x70
> Dec 20 15:17:35 media kernel: [ 394.941473] [<ffffffff81095f83>] ? finish_task_switch+0x83/0xe0
> Dec 20 15:17:35 media kernel: [ 394.941507] [<ffffffff81095f46>] ? finish_task_switch+0x46/0xe0
> Dec 20 15:17:35 media kernel: [ 394.941533] [<ffffffff819b2434>] ? __schedule+0x444/0x880
> Dec 20 15:17:35 media kernel: [ 394.941555] [<ffffffff810b2fbc>] ? validate_chain+0x13c/0x1300
> Dec 20 15:17:35 media kernel: [ 394.941580] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [ 394.941614] [<ffffffff810b4c4b>] ? __lock_acquire+0x46b/0xdd0
> Dec 20 15:17:35 media kernel: [ 394.941638] [<ffffffff819aff95>] ? __mutex_unlock_slowpath+0x135/0x1d0
> Dec 20 15:17:35 media kernel: [ 394.941663] [<ffffffff819b2904>] ? schedule+0x24/0x70
> Dec 20 15:17:35 media kernel: [ 394.941697] [<ffffffff819b179d>] ? schedule_hrtimeout_range_clock+0x11d/0x140
> Dec 20 15:17:35 media kernel: [ 394.941725] [<ffffffff810b5688>] ? lock_acquire+0xd8/0x100
> Dec 20 15:17:35 media kernel: [ 394.941748] [<ffffffff8118a558>] ? ep_poll+0xf8/0x3a0
> Dec 20 15:17:35 media kernel: [ 394.941770] [<ffffffff819b4015>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
> Dec 20 15:17:35 media kernel: [ 394.941808] [<ffffffff810b1818>] ? trace_hardirqs_on_caller+0xf8/0x200
> Dec 20 15:17:35 media kernel: [ 394.941833] [<ffffffff819b17ce>] ? schedule_hrtimeout_range+0xe/0x10
> Dec 20 15:17:35 media kernel: [ 394.941856] [<ffffffff8118a75a>] ? ep_poll+0x2fa/0x3a0
> Dec 20 15:17:35 media kernel: [ 394.941878] [<ffffffff81098630>] ? try_to_wake_up+0x310/0x310
> Dec 20 15:17:35 media kernel: [ 394.941913] [<ffffffff810b5b17>] ? lock_release+0x117/0x250
> Dec 20 15:17:35 media kernel: [ 394.941938] [<ffffffff81165fd7>] ? fget_light+0xd7/0x140
> Dec 20 15:17:35 media kernel: [ 394.941959] [<ffffffff81165f3a>] ? fget_light+0x3a/0x140
> Dec 20 15:17:35 media kernel: [ 394.941981] [<ffffffff8118a8ce>] ? sys_epoll_wait+0xce/0xe0
> Dec 20 15:17:35 media kernel: [ 394.942015] [<ffffffff819b4e69>] ? system_call_fastpath+0x16/0x1b
> Dec 20 15:17:35 media kernel: [ 394.942036] ---[ end trace 6f3a832c9e91c8af ]---
> Dec 20 15:17:35 media kernel: [ 394.942056] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:22978 len:23168 from->truesize:23874 skb_headlen(from):0 skb_shinfo(to)->nr_frags:4 skb_shinfo(from)->nr_frags:6
> Dec 20 15:17:35 media kernel: [ 394.968199] to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:14290 len:14480 from->truesize:15186 skb_headlen(from):0 skb_shinfo(to)->nr_frags:13 skb_shinfo(from)->nr_frags:4
> Dec 20 15:17:35 media kernel: [ 395.262814] net_ratelimit: 371 callbacks suppressed
> Dec 20 15:17:35 media kernel: [ 395.262858] eth0: mtu:1500 data_len:90 len before:0 len after:90 truesize before:896 truesize after:730 nr_frags:1 variant1:-166(730) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.264767] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.266193] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.268422] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.271617] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.274794] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.278104] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.281319] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.284454] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.287797] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
> Dec 20 15:17:35 media kernel: [ 395.291121] eth0: mtu:1500 data_len:66 len before:0 len after:66 truesize before:896 truesize after:706 nr_frags:1 variant1:-190(706) variant2:0(896) variant3:4096(4992)
Hmm perhaps a better example, i have indented some perhaps interesting points:
Dec 20 14:12:57 media kernel: [ 794.895136] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.895431] eth0: mtu:1500 data_len:17442 len before:0 len after:17442 truesize before:896 truesize after:18082 nr_frags:5 variant1:17186(18082) variant2:17186(18082) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.895616] eth0: mtu:1500 data_len:18890 len before:0 len after:18890 truesize before:896 truesize after:19530 nr_frags:6 variant1:18634(19530) variant2:18824(19720) variant3:24576(25472)
Dec 20 14:12:57 media kernel: [ 794.895804] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 794.895823] eth0: mtu:1500 data_len:7306 len before:0 len after:7306 truesize before:896 truesize after:7946 nr_frags:3 variant1:7050(7946) variant2:7050(7946) variant3:12288(13184)
Dec 20 14:12:57 media kernel: [ 794.895868] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:7690 len:7240 from->truesize:7946 skb_headlen(from):190 skb_shinfo(to)->nr_frags:5 skb_shinfo(from)->nr_frags:3
Dec 20 14:12:57 media kernel: [ 794.896133] eth0: mtu:1500 data_len:15994 len before:0 len after:15994 truesize before:896 truesize after:16634 nr_frags:5 variant1:15738(16634) variant2:15738(16634) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.896152] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.896200] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 794.907232] eth0: mtu:1500 data_len:23234 len before:0 len after:23234 truesize before:896 truesize after:23874 nr_frags:7 variant1:22978(23874) variant2:22978(23874) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907517] eth0: mtu:1500 data_len:24682 len before:0 len after:24682 truesize before:896 truesize after:25322 nr_frags:7 variant1:24426(25322) variant2:24426(25322) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907693] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)
Dec 20 14:12:57 media kernel: [ 794.907882] eth0: mtu:1500 data_len:14546 len before:0 len after:14546 truesize before:896 truesize after:15186 nr_frags:5 variant1:14290(15186) variant2:14290(15186) variant3:20480(21376)
Dec 20 14:12:57 media kernel: [ 794.907901] eth0: mtu:1500 data_len:13098 len before:0 len after:13098 truesize before:896 truesize after:13738 nr_frags:4 variant1:12842(13738) variant2:12842(13738) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 794.907938] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:13482 len:13032 from->truesize:13738 skb_headlen(from):190 skb_shinfo(to)->nr_frags:6 skb_shinfo(from)->nr_frags:4
Dec 20 14:12:57 media kernel: [ 794.908191] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28880(29776) variant3:36864(37760)
Dec 20 14:12:57 media kernel: [ 794.908386] eth0: mtu:1500 data_len:30474 len before:0 len after:30474 truesize before:896 truesize after:31114 nr_frags:8 variant1:30218(31114) variant2:30218(31114) variant3:32768(33664)
A1) Here we have a packet data_len: 5858 and truesize set to 6498 and nr_frags: 2
Dec 20 14:12:57 media kernel: [ 794.908560] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:2 variant1:5602(6498) variant2:5602(6498) variant3:8192(9088)
Dec 20 14:12:57 media kernel: [ 794.908581] eth0: mtu:1500 data_len:26130 len before:0 len after:26130 truesize before:896 truesize after:26770 nr_frags:7 variant1:25874(26770) variant2:25874(26770) variant3:28672(29568)
A2) That seems to end up in skb_try_coalesce, from->nr_frags is still 2, delta >> LEN in this case, no warning but perhaps wasteful ?
Dec 20 14:12:57 media kernel: [ 794.908616] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:6242 len:5792 from->truesize:6498 skb_headlen(from):190 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2
Dec 20 14:12:57 media kernel: [ 794.908834] eth0: mtu:1500 data_len:33370 len before:0 len after:33370 truesize before:896 truesize after:34010 nr_frags:9 variant1:33114(34010) variant2:33114(34010) variant3:36864(37760)
B1) Here we have again a packet data_len: 5858 and truesize set to 6498, but nr_frags: 3 this time.
Dec 20 14:12:57 media kernel: [ 794.908992] eth0: mtu:1500 data_len:5858 len before:0 len after:5858 truesize before:896 truesize after:6498 nr_frags:3 variant1:5602(6498) variant2:5792(6688) variant3:12288(13184)
Dec 20 14:12:57 media kernel: [ 794.909012] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:8 variant1:28770(29666) variant2:28770(29666) variant3:32768(33664)
B2) That seems to end up in skb_try_coalesce, from->nr_frags is now 2 instead of 3, delta < LEN in this case, so it would have triggered the warn_on_once
Dec 20 14:12:57 media kernel: [ 794.909040] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA < LEN delta:5602 len:5792 from->truesize:6498 skb_headlen(from):0 skb_shinfo(to)->nr_frags:9 skb_shinfo(from)->nr_frags:2
Dec 20 14:12:57 media kernel: [ 794.909673] eth0: mtu:1500 data_len:1514 len before:0 len after:1514 truesize before:896 truesize after:2154 nr_frags:1 variant1:1258(2154) variant2:1258(2154) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.909692] eth0: mtu:1500 data_len:522 len before:0 len after:522 truesize before:896 truesize after:1162 nr_frags:1 variant1:266(1162) variant2:266(1162) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 794.909736] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:906 len:456 from->truesize:1162 skb_headlen(from):190 skb_shinfo(to)->nr_frags:2 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 794.910205] eth0: mtu:1500 data_len:36266 len before:0 len after:36266 truesize before:896 truesize after:36906 nr_frags:10 variant1:36010(36906) variant2:36010(36906) variant3:40960(41856)
Dec 20 14:12:57 media kernel: [ 794.910706] eth0: mtu:1500 data_len:37714 len before:0 len after:37714 truesize before:896 truesize after:38354 nr_frags:10 variant1:37458(38354) variant2:37458(38354) variant3:40960(41856)
Dec 20 14:12:57 media kernel: [ 794.911472] eth0: mtu:1500 data_len:27578 len before:0 len after:27578 truesize before:896 truesize after:28218 nr_frags:8 variant1:27322(28218) variant2:27322(28218) variant3:32768(33664)
Dec 20 14:12:57 media kernel: [ 794.911695] eth0: mtu:1500 data_len:29026 len before:0 len after:29026 truesize before:896 truesize after:29666 nr_frags:9 variant1:28770(29666) variant2:28770(29666) variant3:36864(37760)
Dec 20 14:12:57 media kernel: [ 795.015511] eth0: mtu:1500 data_len:1018 len before:0 len after:1018 truesize before:896 truesize after:1658 nr_frags:1 variant1:762(1658) variant2:762(1658) variant3:4096(4992)
Dec 20 14:12:57 media kernel: [ 795.015585] skbuff: to: (null) from: (null) skb_try_coalesce: DELTA - LEN > 100 delta:1402 len:952 from->truesize:1658 skb_headlen(from):190 skb_shinfo(to)->nr_frags:10 skb_shinfo(from)->nr_frags:1
Dec 20 14:12:57 media kernel: [ 795.015641] eth0: mtu:1500 data_len:10202 len before:0 len after:10202 truesize before:896 truesize after:10842 nr_frags:4 variant1:9946(10842) variant2:9946(10842) variant3:16384(17280)
Dec 20 14:12:57 media kernel: [ 795.015657] eth0: mtu:1500 data_len:42 len before:0 len after:42 truesize before:896 truesize after:682 nr_frags:1 variant1:-214(682) variant2:0(896) variant3:4096(4992)
Dec 20 14:12:58 media kernel: [ 795.817824] net_ratelimit: 9 callbacks suppressed
--
Sander
^ permalink raw reply
* Re: [PATCH] net: ipv4: route: fixed a coding style issues net: ipv4: tcp: fixed a coding style issues
From: Eric Dumazet @ 2012-12-20 15:23 UTC (permalink / raw)
To: nicolas.dichtel
Cc: Stefan Hasko, David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <50D2FF86.3000603@6wind.com>
On Thu, 2012-12-20 at 13:07 +0100, Nicolas Dichtel wrote:
> On 20/12/2012 09:08, Stefan Hasko wrote:
> > + "out_hlist_search\n");
> checkpatch will warn you about this one, something like:
> "WARNING: quoted string split across lines".
> Not breaking such lines makes it easier to grep for the pattern.
Yes.
Could we please leave this file as is for at least 2 years ?
We had a lot of recent changes and probable fixes are expected.
Such "coding style" patches are a real pain when trying to fix bugs,
especially dealing with stable/old kernels.
Thanks
^ permalink raw reply
* Re: [PATCH] pkt_sched: act_xt support new Xtables interface
From: Yury Stankevich @ 2012-12-20 14:59 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Hasan Chowdhury, Stephen Hemminger, Jan Engelhardt,
netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <50D305FD.7000901@mojatatu.com>
interesting,
#tc -s filter show dev usb0 parent ffff:
filter protocol ip pref 49152 u32
filter protocol ip pref 49152 u32 fh 800: ht divisor 1
filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
0 terminal flowid ??? (rule hit 707 success 707)
match 00000000/00000000 at 0 (success 707 )
action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
target CONNMARK restore
index 5 ref 1 bind 1 installed 394 sec used 11 sec
Action statistics:
Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
action order 2: mirred (Egress Redirect to device ifb0) stolen
index 5 ref 1 bind 1 installed 394 sec used 11 sec
Action statistics:
Sent 783783 bytes 707 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
So it looks like packets were sent to the CONNMARK target.
But...
I made an iptables rule to log packets with the 0xa mark:
Chain PREROUTING (policy ACCEPT 1308 packets, 848K bytes)
pkts bytes target prot opt in out source
destination
0 0 NFLOG all -- * * 0.0.0.0/0
0.0.0.0/0 mark match 0xa nflog-group 1
Chain POSTROUTING (policy ACCEPT 1240 packets, 550K bytes)
pkts bytes target prot opt in out source
destination
1 40 CONNMARK tcp -- * * 0.0.0.0/0
0.0.0.0/0 tcp dpt:80 connmark match 0x0 connbytes 204800
connbytes mode bytes connbytes direction both CONNMARK set 0xa
The idea is:
I start a download; the rule in POSTROUTING must fire once I download more
than ~200K,
and the tc filter's call to CONNMARK restore must restore the mark (0xa) for packets
belonging to this connection.
So I expect the PREROUTING rule to notice the restored mark, but it
doesn't.
Maybe I am missing something?
20.12.2012 16:35, Jamal Hadi Salim wrote:
>
> Could be your setup. I didnt do a lot of testing but
> from my notes (running different kernel at the moment):
>
> #try to point to everything (no iptables setup)
> tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid
> 23:23 action xt -j CONNMARK --restore-mark
> #let it run for a 1 sec then display with
> tc -s filter show dev eth0 parent ffff:
>
> ----
> filter protocol ip pref 49152 u32
> filter protocol ip pref 49152 u32 fh 800: ht divisor 1
> filter protocol ip pref 49152 u32 fh 800::800 order 2048 key ht 800 bkt
> 0 flowid 23:23
> match 00000000/00000000 at 0
> action order 1: tablename: mangle hook: NF_IP_PRE_ROUTING
> target CONNMARK restore
> index 1 ref 1 bind 1 installed 3 sec used 1 sec
> Action statistics:
> Sent 280 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> ----
>
> cheers,
> jamal
>
> On 12-12-20 03:54 AM, Yury Stankevich wrote:
>> 19.12.2012 15:56, Jamal Hadi Salim wrote:
>>> Hasan/Yury, if you test this please use the latest iproute2 with only
>>> the first patch I posted (originally from Hasan). Hasan please use that
>>> patch not your version - if theres anything wrong we can find out sooner
>>> before the patch becomes final.
>>
>> Hello,
>> 3.7.1 kernel with 3.7.0 iproute,
>> patch-xt, xt-p1 + linkage fix were applied.
>> The command completed successfully, but it doesn't actually work.
>>
>> command:
>> tc filter add dev $dev parent ffff: protocol ip u32 match u32 0 0 \
>> action xt -j CONNMARK --restore-mark \
>> action mirred egress redirect dev ifb0
>> then i use filter:
>>
>> tc filter add dev ifb0 protocol ip parent 1: prio 2 handle 0xa fw flowid
>> 1:102
>>
>> iptables line:
>> iptables -t mangle -A POSTROUTING -p tcp --dport 80 -m connmark --mark 0
>> -m connbytes --connbytes 204800: --connbytes-dir both --connbytes-mode
>> bytes -j CONNMARK --set-mark 0xa
>>
>> Once I run a test to download a 300K file,
>> I can see from the iptables counters that the rule in POSTROUTING is triggered,
>> but from `tc -s qdisc show dev ifb0` I see that no packets were sent to
>> the 1:102 flow.
>>
>> btw,
>> tc -p -s filter show dev ifb0 parent 1:
>> does not show stats `(rule hit 416 success 0)` for this (filter protocol
>> ip pref 2 fw handle 0xa classid 1:102) rule.
>>
>>
>>
>
--
Linux registered user #402966 // pub 1024D/E99AF373 <pgp.mit.edu>
^ permalink raw reply
* Re: [PATCH net-next V4 02/13] bridge: Add vlan filtering infrastructure
From: Vlad Yasevich @ 2012-12-20 15:31 UTC (permalink / raw)
To: Shmulik Ladkani
Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <20121220153913.11a10fd0@pixies.home.jungo.com>
On 12/20/2012 08:39 AM, Shmulik Ladkani wrote:
> Hi Vlad,
>
> On Wed, 19 Dec 2012 12:48:13 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>> +static void nbp_vlan_flush(struct net_bridge_port *p)
>> +{
>> + struct net_port_vlan *pve;
>> + struct net_port_vlan *tmp;
>> +
>> + ASSERT_RTNL();
>> +
>> + list_for_each_entry_safe(pve, tmp, &p->vlan_list, list)
>> + nbp_vlan_delete(p, pve->vid, BRIDGE_FLAGS_SELF);
>
> Why would you want to clear "bridge master port" association from this
> vlan, in the event of NBP destruction?
> The "bridge port" may still be a member of this vlan, doesn't it?
> Seems flags argument should be 0.
This ends up getting fixed later, but you are right. This should be 0.
>
>> +#define BR_VID_HASH_SIZE (1<<6)
>> +#define br_vlan_hash(vid) ((vid) % (BR_VID_HASH_SIZE - 1))
>
> Did you mean: & (BR_VID_HASH_SIZE - 1)
yes.
thanks
-vlad
>
> Regards,
> Shmulik
>
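The difference between the two macro forms can be seen in a small userspace sketch. Only BR_VID_HASH_SIZE comes from the patch; the function names and the max_bucket() helper are made up for illustration. With the modulo form the last bucket is never selected, while the mask form covers all 64 buckets because the size is a power of two.

```c
#include <stdint.h>

#define BR_VID_HASH_SIZE (1 << 6)

/* Form from the patch: modulo by 63, so bucket 63 is never selected. */
static unsigned int vlan_hash_mod(uint16_t vid)
{
    return vid % (BR_VID_HASH_SIZE - 1);
}

/* Suggested form: mask with 63, valid because the size is a power of two. */
static unsigned int vlan_hash_mask(uint16_t vid)
{
    return vid & (BR_VID_HASH_SIZE - 1);
}

/* Hypothetical helper: highest bucket index reached over all 12-bit VIDs. */
static unsigned int max_bucket(unsigned int (*hash)(uint16_t))
{
    unsigned int max = 0;

    for (uint16_t vid = 0; vid < 4096; vid++) {
        unsigned int b = hash(vid);

        if (b > max)
            max = b;
    }
    return max;
}
```

For example, VID 63 lands in bucket 0 with the modulo form and in bucket 63 with the mask form.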
^ permalink raw reply
* Re: [PATCH] net: ipv4: route: fix coding style issues net: ipv4: tcp: fix coding style issues
From: Eric Dumazet @ 2012-12-20 15:31 UTC (permalink / raw)
To: Stefan Hasko
Cc: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <1356013685-31649-1-git-send-email-hasko.stevo@gmail.com>
On Thu, 2012-12-20 at 15:28 +0100, Stefan Hasko wrote:
> Fix a coding style issues.
>
> Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
> ---
> net/ipv4/route.c | 119 ++++++++++++++++-------------
> net/ipv4/tcp.c | 218 +++++++++++++++++++++++++++++++-----------------------
> 2 files changed, 194 insertions(+), 143 deletions(-)
I Nack this patch and any such patches in net/core, net/ipv4 trees for a
while.
We had too many recent changes and probably need a bunch of real fixes.
Thanks
^ permalink raw reply
* Re: [PATCH] xen/netfront: improve truesize tracking
From: Eric Dumazet @ 2012-12-20 15:39 UTC (permalink / raw)
To: Sander Eikelenboom
Cc: Ian Campbell, netdev@vger.kernel.org, Konrad Rzeszutek Wilk,
annie li, xen-devel@lists.xensource.com
In-Reply-To: <1797374383.20121220135139@eikelenboom.it>
On Thu, 2012-12-20 at 13:51 +0100, Sander Eikelenboom wrote:
> Eric:
> From the warn_on_once, delta should be smaller than len, but probably they should be as close together as possible.
> When you say "accurate estimation", what would be an acceptable difference between DELTA and LEN ?
I would use the most exact value, which is :
skb->truesize += nr_frags * PAGE_SIZE;
Then, if we can spot later a regression in some stacks, adapt the
limiting parameters. I did a lot of work in GRO and TCP stack to reduce
the memory, and further changes are possible.
We really want to account memory, because we want to control how memory
is used on our machines and don't let some users use more than the
amount that was allowed to them.
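As a worked illustration of the formula above, here is a minimal userspace sketch. It assumes 4 KiB pages, and the macro and function names are invented for the example. The "variant3" figures in the debug log earlier in this thread look consistent with this accounting (truesize before: 896).

```c
#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Charge one full page per fragment on top of the existing truesize. */
static unsigned int truesize_after(unsigned int truesize_before,
                                   unsigned int nr_frags)
{
    return truesize_before + nr_frags * SKETCH_PAGE_SIZE;
}
```

With truesize_before = 896, three fragments give 896 + 3 * 4096 = 13184 and a single fragment gives 4992, whatever the actual data_len is.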
^ permalink raw reply
* Re: [PATCH net-next V4 03/13] bridge: Validate that vlan is permitted on ingress
From: Vlad Yasevich @ 2012-12-20 15:41 UTC (permalink / raw)
To: Shmulik Ladkani
Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <20121220160713.30cdfc05@pixies.home.jungo.com>
On 12/20/2012 09:07 AM, Shmulik Ladkani wrote:
> Hi Vlad,
>
> On Wed, 19 Dec 2012 12:48:14 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>> +static bool br_allowed_ingress(struct net_bridge_port *p, struct sk_buff *skb)
>> +{
>> + struct net_port_vlan *pve;
>> + u16 vid;
>> +
>> + /* If there are no vlan in the permitted list, all packets are
>> + * permitted.
>> + */
>> + if (list_empty(&p->vlan_list))
>> + return true;
>
> I assumed the default policy would be Drop in such case, otherwise
> leaking between vlan domains is possible.
> Or maybe, ingress policy when port isn't a member of ingress VID should
> be configurable (drop/allow).
We have to default to allow since we want to retain original bridge
functionality if there is no configuration.
>
>> + vid = br_get_vlan(skb);
>> + pve = nbp_vlan_find(p, vid);
>
> Why search by iterating through NBP's vlan_list?
> You know the VID (hence may fetch the net_bridge_vlan from the hash), so
> why don't you directly consult the net_bridge_vlan's port_bitmap?
It's an alternative... I am betting that this port isn't in too many
vlans and that searching the list might be faster.
>
>> @@ -54,6 +74,9 @@ int br_handle_frame_finish(struct sk_buff *skb)
>> if (!p || p->state == BR_STATE_DISABLED)
>> goto drop;
>>
>> + if (!br_allowed_ingress(p, skb))
>> + goto drop;
>> +
>
> This condition should also be incorporated upon "ingress" at the "bridge
> master port" (that is, early at br_dev_xmit).
> Think of the "bridge master port" as yet another port:
> upon "ingress" (meaning, tx packets from the ip stack), we should
> also enforce any ingress permission rules.
>
I've tried that before and now can't think of a reason why I rejected
it. I'll try to remember...
Thanks
-vlad
> Regards,
> Shmulik
>
^ permalink raw reply
* Re: [PATCH net-next V4 03/13] bridge: Validate that vlan is permitted on ingress
From: Shmulik Ladkani @ 2012-12-20 16:24 UTC (permalink / raw)
To: vyasevic; +Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <50D331A8.3080206@redhat.com>
On Thu, 20 Dec 2012 10:41:28 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> >> +static bool br_allowed_ingress(struct net_bridge_port *p, struct sk_buff *skb)
> >> +{
> >> + struct net_port_vlan *pve;
> >> + u16 vid;
> >> +
> >> + /* If there are no vlan in the permitted list, all packets are
> >> + * permitted.
> >> + */
> >> + if (list_empty(&p->vlan_list))
> >> + return true;
> >
> > I assumed the default policy would be Drop in such case, otherwise
> > leaking between vlan domains is possible.
> > Or maybe, ingress policy when port isn't a member of ingress VID should
> > be configurable (drop/allow).
>
> We have to default to allow since we want to retain original bridge
> functionality if there is no configuration.
Ok; so having the port not a member of ANY vlan is a "port vlan
disabled" configuration knob, and as such, it is a member of ANY vlan,
meaning that:
(1) every "non-vlan port" is connected to any other "non-vlan port"
(2) frame ingress on a "non-vlan" port may egress on a "vlan enabled"
port, depending on the ingress VID and the port-membership map of the
egress port
(and thus, PVID should be defined even to "non-vlan" ports, for the
case where untagged frame is received on the non-vlan port)
(3) frame ingress on a "vlan-enabled" port would always egress on
"non-vlan" ports
Seems ok.
However this is an additional nuance that might not be expected by the
user configuring the bridge; maybe this needs some clarification.
> >> + vid = br_get_vlan(skb);
> >> + pve = nbp_vlan_find(p, vid);
> >
> > Why search by iterating through NBP's vlan_list?
> > You know the VID (hence may fetch the net_bridge_vlan from the hash), so
> > why don't you directly consult the net_bridge_vlan's port_bitmap?
>
> It's an alternative... I am betting that this port isn't in too many
> vlans and that searching the list might be faster.
I assumed the opposite: finding the hash bucket is just a bitwise mask,
and the number of items in a bucket would rarely be greater than 1.
I expect such code to be shorter, but this needs to be verified.
Regards,
Shmulik
^ permalink raw reply
* Re: [PATCH net] net/vxlan: Use the underlying device index when joining/leaving multicast groups
From: Stephen Hemminger @ 2012-12-20 16:26 UTC (permalink / raw)
To: Yan Burman; +Cc: netdev, ogerlitz
In-Reply-To: <1356010568-21644-1-git-send-email-yanb@mellanox.com>
On Thu, 20 Dec 2012 15:36:08 +0200
Yan Burman <yanb@mellanox.com> wrote:
> The socket calls from vxlan to join/leave multicast group aren't
> using the index of the underlying device, as a result the stack uses
> the first interface that is up. This results in vxlan being non functional
> over a device which isn't the 1st to be up.
> Fix this by providing the iflink field to the vxlan instance
> to the multicast calls.
>
> Signed-off-by: Yan Burman <yanb@mellanox.com>
> ---
> drivers/net/vxlan.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 3b3fdf6..40f2cc1 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -505,7 +505,8 @@ static int vxlan_join_group(struct net_device *dev)
> struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
> struct sock *sk = vn->sock->sk;
> struct ip_mreqn mreq = {
> - .imr_multiaddr.s_addr = vxlan->gaddr,
> + .imr_multiaddr.s_addr = vxlan->gaddr,
> + .imr_ifindex = vxlan->link,
> };
> int err;
>
> @@ -532,7 +533,8 @@ static int vxlan_leave_group(struct net_device *dev)
> int err = 0;
> struct sock *sk = vn->sock->sk;
> struct ip_mreqn mreq = {
> - .imr_multiaddr.s_addr = vxlan->gaddr,
> + .imr_multiaddr.s_addr = vxlan->gaddr,
> + .imr_ifindex = vxlan->link,
> };
>
> /* Only leave group when last vxlan is done. */
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
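For readers less familiar with struct ip_mreqn, the effect of the added field can be sketched from userspace. The make_mreq() helper is hypothetical and not part of the patch; it only shows that the request carries an explicit interface index next to the group address.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* struct ip_mreqn is a Linux/glibc extension */
#endif
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Build a membership request bound to one interface (hypothetical helper). */
static struct ip_mreqn make_mreq(const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    memset(&mreq, 0, sizeof(mreq));
    inet_pton(AF_INET, group, &mreq.imr_multiaddr);
    mreq.imr_ifindex = ifindex; /* 0 would leave the choice to the stack */
    return mreq;
}
```

A request built this way would typically be passed to setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)); leaving imr_ifindex at 0 is what lets the stack pick an interface on its own, which is the behaviour the patch description complains about.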
^ permalink raw reply
* Re: Lockdep warning in vxlan
From: Stephen Hemminger @ 2012-12-20 16:34 UTC (permalink / raw)
To: Yan Burman; +Cc: netdev
In-Reply-To: <50D31A00.7060905@mellanox.com>
On Thu, 20 Dec 2012 16:00:32 +0200
Yan Burman <yanb@mellanox.com> wrote:
> Hi.
>
> When working with vxlan from current net-next, I got a lockdep warning
> (below).
> It seems to happen when I have host B pinging host A and while the pings
> continue,
> I do "ip link del" on the vxlan interface on host A. The lockdep warning
> is on host A.
> Tell me if you need some more info.
>
Looks like the case of nested ARP requests, the initial request is coming
from neigh_timer (ARP retransmit), but inside neigh_probe the lock
is dropped?
^ permalink raw reply
* Re: [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets
From: Michael Kerrisk (man-pages) @ 2012-12-20 16:50 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, piergiorgio.beruto, netdev
In-Reply-To: <1329902695.18384.101.camel@edumazet-laptop>
On Wed, Feb 22, 2012 at 10:24 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Piergiorgio Beruto expressed the need to fetch size of first datagram in
> queue for AF_UNIX sockets and suggested a patch against SIOCINQ ioctl.
>
> I suggested instead to implement MSG_TRUNC support as a recv() input
> flag, as already done for RAW, UDP & NETLINK sockets.
>
> len = recv(fd, &byte, 1, MSG_PEEK | MSG_TRUNC);
>
> MSG_TRUNC asks recv() to return the real length of the packet, even when
> it was longer than the passed buffer.
>
> There is risk that a userland application used MSG_TRUNC by accident
> (since it had no effect on af_unix sockets) and this might break after
> this patch.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
> CC: Michael Kerrisk <mtk.manpages@gmail.com>
Patch below to man-pages applied.
Thanks for CCing me, Eric.
Cheers,
Michael
--- a/man2/recv.2
+++ b/man2/recv.2
@@ -264,7 +264,7 @@ subsequent receive call will return the same data.
For raw
.RB ( AF_PACKET ),
Internet datagram (since Linux 2.4.27/2.6.8),
-and netlink (since Linux 2.6.22) sockets:
+netlink (since Linux 2.6.22) and UNIX datagram (since Linux 3.4) sockets:
return the real length of the packet or datagram,
even when it was longer than the passed buffer.
Not implemented for UNIX domain
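The recv() call quoted above can be tried on an AF_UNIX datagram socket pair. This is a minimal sketch, assuming a kernel with this change (Linux 3.4 or later per the man-page text); the function names and the 256-byte demo limit are made up for the example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Length of the next queued datagram on fd, without consuming it; -1 on error. */
static ssize_t peek_dgram_len(int fd)
{
    char byte;

    return recv(fd, &byte, 1, MSG_PEEK | MSG_TRUNC);
}

/* Hypothetical demo: queue one datagram of len bytes and peek at its size. */
static ssize_t demo_peek(size_t len)
{
    int sv[2];
    char buf[256] = { 0 };
    ssize_t n = -1;

    if (len > sizeof(buf) || socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (send(sv[0], buf, len, 0) == (ssize_t)len)
        n = peek_dgram_len(sv[1]);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

On kernels without MSG_TRUNC support for AF_UNIX, the same call would be expected to return at most 1, the size of the buffer.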
^ permalink raw reply
* Re: [PATCH net-next V4 03/13] bridge: Validate that vlan is permitted on ingress
From: Vlad Yasevich @ 2012-12-20 16:54 UTC (permalink / raw)
To: Shmulik Ladkani
Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst, erdnetdev, jiri
In-Reply-To: <20121220182402.6143fcb1@pixies.home.jungo.com>
On 12/20/2012 11:24 AM, Shmulik Ladkani wrote:
> On Thu, 20 Dec 2012 10:41:28 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
>>>> +static bool br_allowed_ingress(struct net_bridge_port *p, struct sk_buff *skb)
>>>> +{
>>>> + struct net_port_vlan *pve;
>>>> + u16 vid;
>>>> +
>>>> + /* If there are no vlan in the permitted list, all packets are
>>>> + * permitted.
>>>> + */
>>>> + if (list_empty(&p->vlan_list))
>>>> + return true;
>>>
>>> I assumed the default policy would be Drop in such case, otherwise
>>> leaking between vlan domains is possible.
>>> Or maybe, ingress policy when port isn't a member of ingress VID should
>>> be configurable (drop/allow).
>>
>> We have to default to allow since we want to retain original bridge
>> functionality if there is no configuration.
>
> Ok; so having the port not a member of ANY vlan is a "port vlan
> disabled" configuration knob, and as such, it is a member of ANY vlan,
> meaning that:
>
> (1) every "non-vlan port" is connected to any other "non-vlan port"
Technically, it would be connected to every other port.
> (2) frame ingress on a "non-vlan" port may egress on a "vlan enabled"
> port, depending on the ingress VID and the port-membership map of the
> egress port
> (and thus, PVID should be defined even to "non-vlan" ports, for the
> case where untagged frame is received on the non-vlan port)
Sort of. The way I did it (testing now), is like this:
if there is egress policy
apply policy and forward.
else if there was ingress policy (pvid)
apply it and forward
else
forward as is (old bridge behavior).
This way if there was a pvid on an ingress port and nothing on egress,
then pvid applies. If there was nothing configured on ingress port,
but we have an egress policy, we'll apply any vlan information from
the frame to egress policy. In this case, ingress untagged traffic
would be assigned vlan 0.
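The ordering above could be sketched in C roughly as follows. The enum and function names are invented for illustration and are not taken from the patch series; the sketch only captures which policy wins, not how it is applied.

```c
#include <stdbool.h>

enum fwd_rule {
    FWD_EGRESS_POLICY, /* egress port has vlan configuration */
    FWD_INGRESS_PVID,  /* only the ingress port has a pvid */
    FWD_AS_IS          /* no vlan configuration: old bridge behavior */
};

/* Egress policy takes precedence, then ingress pvid, else forward unchanged. */
static enum fwd_rule pick_rule(bool egress_has_policy, bool ingress_has_pvid)
{
    if (egress_has_policy)
        return FWD_EGRESS_POLICY;
    if (ingress_has_pvid)
        return FWD_INGRESS_PVID;
    return FWD_AS_IS;
}
```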
> (3) frame ingress on a "vlan-enabled" port would always egress on
> "non-vlan" ports
yes and they would egress based on their ingress policy.
>
> Seems ok.
> However this is an additional nuance that might not be expected by the
> user configuring the bridge; maybe this needs some clarification.
I'll try to document things sufficiently. This hybrid approach may
produce some unintended results. We could always remove it or introduce
the tunable to change default policy to drop once vlan configuration is
in effect.
>
>>>> + vid = br_get_vlan(skb);
>>>> + pve = nbp_vlan_find(p, vid);
>>>
>>> Why search by iterating through NBP's vlan_list?
>>> You know the VID (hence may fetch the net_bridge_vlan from the hash), so
>>> why don't you directly consult the net_bridge_vlan's port_bitmap?
>>
>> It's an alternative... I am betting that this port isn't in too many
>> vlans and that searching the list might be faster.
>
> I assumed the opposite: finding the hash bucket is just a bitwise mask,
> and the number of items in a bucket would rarely be greater than 1.
> I expect such code to be shorter, but this needs to be verified.
I'll try to set something up, but that will probably be next year...
-vlad
>
> Regards,
> Shmulik
>
^ permalink raw reply
* Re: [PATCH net-next V4 00/13] Add basic VLAN support to bridges
From: Stephen Hemminger @ 2012-12-20 17:07 UTC (permalink / raw)
To: Vitalii Demianets
Cc: Andrew Collins, Vlad Yasevich, netdev, davem, or.gerlitz, jhs,
mst, erdnetdev, jiri
In-Reply-To: <201212201208.14204.vitas@nppfactor.kiev.ua>
On Thu, 20 Dec 2012 12:08:13 +0200
Vitalii Demianets <vitas@nppfactor.kiev.ua> wrote:
> On Thursday 20 December 2012 00:54:27 Andrew Collins wrote:
> > On Wed, Dec 19, 2012 at 10:48 AM, Vlad Yasevich <vyasevic@redhat.com> wrote:
> > > This series of patches provides an ability to add VLANs to the bridge
> > > ports. This is similar to what can be found in most switches. The
> > > bridge port may have any number of VLANs added to it including vlan 0
> > > priority tagged traffic. When vlans are added to the port, only traffic
> > > tagged with particular vlan will forwarded over this port. Additionally,
> > > vlan ids are added to FDB entries and become part of the lookup. This
> > > way we correctly identify the FDB entry.
> >
> > This is likely well beyond the scope of this change, but I figured I'd
> > throw out the question anyway. This changeset looks to bring the
> > Linux bridging code closer to the 802.1Q-2005 definition of a bridge,
> > which is nice to see, I'm curious if this changeset also opens up the
> > possibility of supporting MSTP in the future? The big thing I see
> > missing is per-VLAN port state, although I'm not very familiar with
> > the current STP/bridge interactions. Has anyone put any thought into
> > what other necessary bridge pieces might be missing for MSTP support?
>
> I think, to be compatible with 802.1Q-2005 we need the following pieces:
> 1) Multiple FIDs (it is 802.1Q term for FDB) support. It means that kernel
> should support several independent FDBs on a single bridge. The 802.1Q-2005
> standard requires the number of supported FDBs to be no less than the number
> of different MSTIs the implementation supports;
> 2) VLAN-to-FDB mapping should be introduced;
> 3) Support of Multiple Spanning Tree Instances (MSTIs);
> 4) FDB-to-MSTI mapping should be introduced;
> 5) And finally, per-MST port states should be implemented.
>
> > obviously something to handle the MSTP protocol itself would need to exist
> as well
>
> Please look here: http://sourceforge.net/projects/mstpd/
A couple of points:
* How does this compare with features/functionality of commercial
hardware bridges?
* Is this as simple as possible? It looks like there is creeping-featurism
here. I am all for a simple extension to allow bridge vlan filtering, but
not the added complexity of "let's teach bridges all about all possible
things any user might want to do with vlans."
^ permalink raw reply
* Re: at91sam9260 MACB problem with IP fragmentation
From: Nicolas Ferre @ 2012-12-20 17:51 UTC (permalink / raw)
To: Erwin Rol
Cc: linux-kernel, Havard Skinnemoen, linux-arm-kernel, matteo.fortini,
netdev
In-Reply-To: <50D2D7BD.3030801@erwinrol.com>
On 12/20/2012 10:17 AM, Erwin Rol :
> Hallo Nicolas,
>
> On 6-12-2012 14:27, Nicolas Ferre wrote:
>> Erwin,
>>
>> On 12/06/2012 12:32 PM, Erwin Rol :
>>> Hello Nicolas, Havard, all,
>>>
>>> I have a very obscure problem with a at91sam9260 board (almost 1 to 1
>>> copy of the Atmel EK).
>>>
>>> The MACB seems to stall when I use large (>2 * MTU) UDP datagrams. The
>>> test case is that a udp echo client (PC) sends datagrams with increasing
>>> length to the AT91 until the max length of the UDP datagram is reached.
>>> When there is no IP fragmentation everything is fine, but when the
>>> datagrams are starting to get fragmented the AT91 will not reply
>>> anymore. But as soon as some network traffic happens it goes on again,
>>> and none of the data is lost.
>
> <snip>
>
>>> I tried several kernels including the test version from Nicolas that he
>>> posted on LKML in October. They all show the same effect.
>>
>> [..]
>>
>> It seems that Matteo has the same behavior: check here:
>> http://www.spinics.net/lists/netdev/msg218951.html
>
> I tried Matteo's patch and it seems to work. But I don't know if the
> patch is really the right solution. I checked again with wireshark and
> it really seems the sending that stalls not the receiving. But as soon
> as a ethernet frame is received the sending "un-stalls". So maybe the
> patch just causes an MACB IRQ at certain moments that causes the sending
> to continue?
Any digging is interesting for me.
>> I am working on the macb driver right now, so I will try to reproduce
>> and track this issue on my side.
>
> Any luck reproducing it ?
Yes, I see unexpected things happening, but as I am connected to a whole
company network, maybe some broadcast packets are unlocking the
interface...
Anyway, I am continuing to investigate.
Best regards,
--
Nicolas Ferre
^ permalink raw reply
* Re: Lockdep warning in vxlan
From: Eric Dumazet @ 2012-12-20 18:16 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Yan Burman, netdev
In-Reply-To: <20121220083436.0c7fc33f@nehalam.linuxnetplumber.net>
On Thu, 2012-12-20 at 08:34 -0800, Stephen Hemminger wrote:
> On Thu, 20 Dec 2012 16:00:32 +0200
> Yan Burman <yanb@mellanox.com> wrote:
>
> > Hi.
> >
> > When working with vxlan from current net-next, I got a lockdep warning
> > (below).
> > It seems to happen when I have host B pinging host A and while the pings
> > continue,
> > I do "ip link del" on the vxlan interface on host A. The lockdep warning
> > is on host A.
> > Tell me if you need some more info.
> >
>
> Looks like the case of nested ARP requests, the initial request is coming
> from neigh_timer (ARP retransmit), but inside neigh_probe the lock
> is dropped?
Bug is from arp_solicit(), releasing the lock after arp_send()
It's used to protect neigh->ha
We could instead copy neigh->ha, without taking n->lock but ha_lock
seqlock, using neigh_ha_snapshot() helper
Yan, could you test the following patch ?
Thanks
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index ce6fbdf..1169ed4 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -321,7 +321,7 @@ static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb)
static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
{
__be32 saddr = 0;
- u8 *dst_ha = NULL;
+ u8 dst_ha[MAX_ADDR_LEN];
struct net_device *dev = neigh->dev;
__be32 target = *(__be32 *)neigh->primary_key;
int probes = atomic_read(&neigh->probes);
@@ -363,9 +363,9 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
if (probes < 0) {
if (!(neigh->nud_state & NUD_VALID))
pr_debug("trying to ucast probe in NUD_INVALID\n");
- dst_ha = neigh->ha;
- read_lock_bh(&neigh->lock);
+ neigh_ha_snapshot(dst_ha, neigh, dev);
} else {
+ memset(dst_ha, 0, dev->addr_len);
probes -= neigh->parms->app_probes;
if (probes < 0) {
#ifdef CONFIG_ARPD
@@ -377,8 +377,6 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
arp_send(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
dst_ha, dev->dev_addr, NULL);
- if (dst_ha)
- read_unlock_bh(&neigh->lock);
}
static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
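For readers who have not met the seqlock snapshot pattern that neigh_ha_snapshot() relies on, here is a userspace sketch of the idea. It is not the kernel implementation: it uses C11 atomics instead of read_seqbegin()/read_seqretry(), assumes a single writer, and all names (ha_box, ha_update, ha_snapshot, demo_roundtrip) are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define SKETCH_ADDR_LEN 32 /* stands in for MAX_ADDR_LEN */

struct ha_box {
    atomic_uint seq; /* even: stable, odd: update in progress */
    unsigned char ha[SKETCH_ADDR_LEN];
};

/* Writer side (single writer assumed): bump seq around the update. */
static void ha_update(struct ha_box *b, const unsigned char *src, size_t len)
{
    assert(len <= SKETCH_ADDR_LEN);
    atomic_fetch_add(&b->seq, 1);
    memcpy(b->ha, src, len);
    atomic_fetch_add(&b->seq, 1);
}

/* Reader side: copy without a lock, retry if a writer interfered. */
static void ha_snapshot(unsigned char *dst, struct ha_box *b, size_t len)
{
    unsigned int start;

    assert(len <= SKETCH_ADDR_LEN);
    do {
        start = atomic_load(&b->seq);
        memcpy(dst, b->ha, len);
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1) || atomic_load(&b->seq) != start);
}

/* Hypothetical round trip: returns 1 if the snapshot matches what was written. */
static int demo_roundtrip(void)
{
    static struct ha_box box; /* zero-initialised: seq starts even */
    const unsigned char mac[6] = { 0x02, 0, 0, 0, 0, 0x01 };
    unsigned char out[6] = { 0 };

    ha_update(&box, mac, sizeof(mac));
    ha_snapshot(out, &box, sizeof(out));
    return memcmp(out, mac, sizeof(mac)) == 0;
}
```

The point of the pattern is that the reader ends up with a private copy of the address and holds no lock across the later arp_send() call.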
^ permalink raw reply related
* Re: Lockdep warning in vxlan
From: Stephen Hemminger @ 2012-12-20 18:22 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Yan Burman, netdev
In-Reply-To: <1356027360.21834.2973.camel@edumazet-glaptop>
On Thu, 20 Dec 2012 10:16:00 -0800
Eric Dumazet <erdnetdev@gmail.com> wrote:
> On Thu, 2012-12-20 at 08:34 -0800, Stephen Hemminger wrote:
> > On Thu, 20 Dec 2012 16:00:32 +0200
> > Yan Burman <yanb@mellanox.com> wrote:
> >
> > > Hi.
> > >
> > > When working with vxlan from current net-next, I got a lockdep warning
> > > (below).
> > > It seems to happen when I have host B pinging host A and while the pings
> > > continue,
> > > I do "ip link del" on the vxlan interface on host A. The lockdep warning
> > > is on host A.
> > > Tell me if you need some more info.
> > >
> >
> > Looks like the case of nested ARP requests, the initial request is coming
> > from neigh_timer (ARP retransmit), but inside neigh_probe the lock
> > is dropped?
>
> Bug is from arp_solicit(), releasing the lock after arp_send()
>
> It's used to protect neigh->ha
>
> We could instead copy neigh->ha, without taking n->lock but ha_lock
> seqlock, using neigh_ha_snapshot() helper
>
> Yan, could you test the following patch ?
>
> Thanks
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index ce6fbdf..1169ed4 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -321,7 +321,7 @@ static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb)
> static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
> {
> __be32 saddr = 0;
> - u8 *dst_ha = NULL;
> + u8 dst_ha[MAX_ADDR_LEN];
> struct net_device *dev = neigh->dev;
> __be32 target = *(__be32 *)neigh->primary_key;
> int probes = atomic_read(&neigh->probes);
> @@ -363,9 +363,9 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
> if (probes < 0) {
> if (!(neigh->nud_state & NUD_VALID))
> pr_debug("trying to ucast probe in NUD_INVALID\n");
> - dst_ha = neigh->ha;
> - read_lock_bh(&neigh->lock);
> + neigh_ha_snapshot(dst_ha, neigh, dev);
> } else {
> + memset(dst_ha, 0, dev->addr_len);
> probes -= neigh->parms->app_probes;
> if (probes < 0) {
> #ifdef CONFIG_ARPD
> @@ -377,8 +377,6 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
>
> arp_send(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
> dst_ha, dev->dev_addr, NULL);
> - if (dst_ha)
> - read_unlock_bh(&neigh->lock);
> }
>
> static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
I like this. Getting rid of yet another read lock
^ permalink raw reply
* Re: [PATCH 3/3] iproute2: make `bridge mdb` output consistent with input
From: Stephen Hemminger @ 2012-12-20 18:58 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, bridge
In-Reply-To: <1356013915-20835-3-git-send-email-amwang@redhat.com>
On Thu, 20 Dec 2012 22:31:55 +0800
Cong Wang <amwang@redhat.com> wrote:
> bridge -> dev
> group -> grp
>
All three patches accepted for next version of iproute2.
^ permalink raw reply
* [PATCH] bnx2x: use ARRAY_SIZE where possible
From: Sasha Levin @ 2012-12-20 19:11 UTC (permalink / raw)
To: Eilon Greenstein, netdev, linux-kernel; +Cc: Sasha Levin
In-Reply-To: <1356030701-16284-1-git-send-email-sasha.levin@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
index 09096b4..cb41f54 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
@@ -3659,7 +3659,7 @@ static void bnx2x_warpcore_enable_AN_KR2(struct bnx2x_phy *phy,
bnx2x_cl45_read_or_write(bp, phy, MDIO_WC_DEVAD,
MDIO_WC_REG_CL49_USERB0_CTRL, (3<<6));
- for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+ for (i = 0; i < ARRAY_SIZE(reg_set); i++)
bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
reg_set[i].val);
@@ -3713,7 +3713,7 @@ static void bnx2x_warpcore_enable_AN_KR(struct bnx2x_phy *phy,
};
DP(NETIF_MSG_LINK, "Enable Auto Negotiation for KR\n");
/* Set to default registers that may be overriden by 10G force */
- for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+ for (i = 0; i < ARRAY_SIZE(reg_set); i++)
bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
reg_set[i].val);
@@ -3854,7 +3854,7 @@ static void bnx2x_warpcore_set_10G_KR(struct bnx2x_phy *phy,
{MDIO_PMA_DEVAD, MDIO_WC_REG_PMD_KR_CONTROL, 0x2}
};
- for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+ for (i = 0; i < ARRAY_SIZE(reg_set); i++)
bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
reg_set[i].val);
@@ -4242,7 +4242,7 @@ static void bnx2x_warpcore_clear_regs(struct bnx2x_phy *phy,
bnx2x_cl45_read_or_write(bp, phy, MDIO_WC_DEVAD,
MDIO_WC_REG_RX66_CONTROL, (3<<13));
- for (i = 0; i < sizeof(wc_regs)/sizeof(struct bnx2x_reg_set); i++)
+ for (i = 0; i < ARRAY_SIZE(wc_regs); i++)
bnx2x_cl45_write(bp, phy, wc_regs[i].devad, wc_regs[i].reg,
wc_regs[i].val);
@@ -9520,7 +9520,7 @@ static void bnx2x_save_848xx_spirom_version(struct bnx2x_phy *phy,
} else {
/* For 32-bit registers in 848xx, access via MDIO2ARM i/f. */
/* (1) set reg 0xc200_0014(SPI_BRIDGE_CTRL_2) to 0x03000000 */
- for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set);
+ for (i = 0; i < ARRAY_SIZE(reg_set);
i++)
bnx2x_cl45_write(bp, phy, reg_set[i].devad,
reg_set[i].reg, reg_set[i].val);
@@ -9592,7 +9592,7 @@ static void bnx2x_848xx_set_led(struct bnx2x *bp,
MDIO_PMA_DEVAD,
MDIO_PMA_REG_8481_LINK_SIGNAL, val);
- for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+ for (i = 0; i < ARRAY_SIZE(reg_set); i++)
bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
reg_set[i].val);
@@ -13395,7 +13395,7 @@ static void bnx2x_disable_kr2(struct link_params *params,
};
DP(NETIF_MSG_LINK, "Disabling 20G-KR2\n");
- for (i = 0; i < sizeof(reg_set)/sizeof(struct bnx2x_reg_set); i++)
+ for (i = 0; i < ARRAY_SIZE(reg_set); i++)
bnx2x_cl45_write(bp, phy, reg_set[i].devad, reg_set[i].reg,
reg_set[i].val);
vars->link_attr_sync &= ~LINK_ATTR_SYNC_KR2_ENABLE;
--
1.8.0
^ permalink raw reply related
* Re: [PATCH 4/4] net/smsc911x: Provide common clock functionality
From: Linus Walleij @ 2012-12-20 19:12 UTC (permalink / raw)
To: Lee Jones
Cc: linux-arm-kernel, linux-kernel, arnd, linus.walleij,
Steve Glendinning, netdev, Robert Marklund
In-Reply-To: <1355937587-31730-4-git-send-email-lee.jones@linaro.org>
On Wed, Dec 19, 2012 at 6:19 PM, Lee Jones <lee.jones@linaro.org> wrote:
> Some platforms provide clocks which require enabling before the
> SMSC911x chip will power on. This patch uses the new common clk
> framework to do just that. If no clock is provided, it will just
> be ignored and the driver will continue to assume that no clock
> is required for the chip to run successfully.
>
> Cc: Steve Glendinning <steve.glendinning@shawell.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Lee Jones <lee.jones@linaro.org>
Seems to me like it'll do the trick.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Yours,
Linus Walleij
^ permalink raw reply
* [PATCH] wireless: mwifiex: remove unreachable paths
From: Sasha Levin @ 2012-12-20 19:11 UTC (permalink / raw)
To: Bing Zhao, John W. Linville, linux-wireless, netdev, linux-kernel
Cc: Sasha Levin
In-Reply-To: <1356030701-16284-1-git-send-email-sasha.levin@oracle.com>
We know 'firmware' is non-NULL from the beginning of mwifiex_prog_fw_w_helper,
remove all !firmware paths from the rest of the function.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
drivers/net/wireless/mwifiex/usb.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/drivers/net/wireless/mwifiex/usb.c b/drivers/net/wireless/mwifiex/usb.c
index 63ac9f2..8bd7098 100644
--- a/drivers/net/wireless/mwifiex/usb.c
+++ b/drivers/net/wireless/mwifiex/usb.c
@@ -836,23 +836,14 @@ static int mwifiex_prog_fw_w_helper(struct mwifiex_adapter *adapter,
dlen = 0;
} else {
/* copy the header of the fw_data to get the length */
- if (firmware)
- memcpy(&fwdata->fw_hdr, &firmware[tlen],
- sizeof(struct fw_header));
- else
- mwifiex_get_fw_data(adapter, tlen,
- sizeof(struct fw_header),
- (u8 *)&fwdata->fw_hdr);
+ memcpy(&fwdata->fw_hdr, &firmware[tlen],
+ sizeof(struct fw_header));
dlen = le32_to_cpu(fwdata->fw_hdr.data_len);
dnld_cmd = le32_to_cpu(fwdata->fw_hdr.dnld_cmd);
tlen += sizeof(struct fw_header);
- if (firmware)
- memcpy(fwdata->data, &firmware[tlen], dlen);
- else
- mwifiex_get_fw_data(adapter, tlen, dlen,
- (u8 *)fwdata->data);
+ memcpy(fwdata->data, &firmware[tlen], dlen);
fwdata->seq_num = cpu_to_le32(fw_seqnum);
tlen += dlen;
--
1.8.0
^ permalink raw reply related
* Re: [PATCH 4/4] net/smsc911x: Provide common clock functionality
From: Russell King - ARM Linux @ 2012-12-20 19:24 UTC (permalink / raw)
To: Linus Walleij
Cc: Lee Jones, Steve Glendinning, Robert Marklund, linus.walleij,
arnd, netdev, linux-kernel, linux-arm-kernel
In-Reply-To: <CACRpkda79O_b3Z8g7Sy7vMtW9neZU4x-Z=iEQgjqu4X5tFKyhw@mail.gmail.com>
On Thu, Dec 20, 2012 at 08:12:08PM +0100, Linus Walleij wrote:
> On Wed, Dec 19, 2012 at 6:19 PM, Lee Jones <lee.jones@linaro.org> wrote:
>
> > Some platforms provide clocks which require enabling before the
> > SMSC911x chip will power on. This patch uses the new common clk
> > framework to do just that. If no clock is provided, it will just
> > be ignored and the driver will continue to assume that no clock
> > is required for the chip to run successfully.
> >
> > Cc: Steve Glendinning <steve.glendinning@shawell.net>
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
>
> Seems to me like it'll do the trick.
> Acked-by: Linus Walleij <linus.walleij@linaro.org>
This looks fairly dangerous. What about those platforms which use this
driver, but don't provide a clock for it?
It looks like this will result in those platforms losing their ethernet
support. There's at least a bunch of the ARM evaluation boards which
make use of this driver...
^ permalink raw reply
* NAPI documentation needed
From: Rafał Miłecki @ 2012-12-20 19:39 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller
I wanted to report a problem I encountered during bgmac driver development.
At the very beginning I implemented interrupt handling using a threaded IRQ
(request_threaded_irq). I didn't know about NAPI until someone pointed
out that mistake to me. So I decided to rewrite the IRQ handling to use NAPI.
I've found the following documents:
http://www.linuxfoundation.org/collaborate/workgroups/networking/napi
ftp://robur.slu.se/pub/Linux/net-development/NAPI/README
ftp://robur.slu.se/pub/Linux/net-development/NAPI/NAPI_HOWTO.txt
ftp://robur.slu.se/pub/Linux/net-development/NAPI/converting-to-NAPI.txt~
but nothing really official sitting in the kernel's Documentation dir.
So I started using the documents I found, but then noticed they are quite outdated.
1) We don't have netif_rx_schedule and netif_rx_complete anymore.
2) We don't set poll and weight manually anymore but use netif_napi_add.
3) The return type and arguments of poll have changed. Neither of the
following is up to date:
static void my_poll (struct net_device *dev, int *budget)
int (*poll)(struct net_device *dev, int *budget);
It would be great if someone with NAPI knowledge could document it in
the kernel. It would be really helpful for new network driver developers.
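For comparison with the outdated prototypes above, the poll callback
currently has the shape int (*poll)(struct napi_struct *napi, int budget):
it processes at most budget packets, returns how many it handled, and
calls napi_complete() only when it ran out of work before using up the
budget. Below is a minimal userspace sketch of that contract. It is not
kernel code: struct mock_napi and every mock_* function are hypothetical
stand-ins invented for illustration, modelling napi_struct,
napi_schedule() and napi_complete() in a heavily simplified way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct napi_struct. */
struct mock_napi {
	int  rx_pending;   /* packets waiting in the pretend RX ring */
	bool scheduled;    /* true while the device is in polling mode */
	bool irq_enabled;  /* pretend device RX interrupt mask state */
};

/* Models what a hard IRQ handler does: mask RX IRQs, then napi_schedule(). */
static void mock_irq_handler(struct mock_napi *napi)
{
	napi->irq_enabled = false;
	napi->scheduled = true;
}

/* Models napi_complete(): leave polling mode. */
static void mock_napi_complete(struct mock_napi *napi)
{
	napi->scheduled = false;
}

/*
 * Same shape as the current poll callback: takes the NAPI context and a
 * budget by value, and returns the amount of work actually done.
 */
static int mock_poll(struct mock_napi *napi, int budget)
{
	int work_done = 0;

	assert(budget > 0);

	/* "Receive" packets until the ring is empty or the budget is spent. */
	while (work_done < budget && napi->rx_pending > 0) {
		napi->rx_pending--;
		work_done++;
	}

	/* Ring drained before the budget ran out: stop polling, unmask IRQs. */
	if (work_done < budget) {
		mock_napi_complete(napi);
		napi->irq_enabled = true;
	}

	return work_done;
}
```

In a real driver the callback would be registered with netif_napi_add()
rather than by assigning poll and weight fields directly, and the
interrupt masking would be done through device registers.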
--
Rafał
^ permalink raw reply