Netdev List
 help / color / mirror / Atom feed
* Re: linux-next: build failure after merge of the net-next tree
From: David Miller @ 2017-09-22  1:37 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel, pabeni
In-Reply-To: <20170922110355.6a631fc5@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Fri, 22 Sep 2017 11:03:55 +1000

> After merging the net-next tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> net/ipv4/fib_frontend.c: In function 'fib_validate_source':
> net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
>    if (net->ipv4.fib_has_custom_local_routes)
>                 ^
> net/ipv4/fib_frontend.c: In function 'inet_rtm_newroute':
> net/ipv4/fib_frontend.c:773:12: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
>    net->ipv4.fib_has_custom_local_routes = true;
>             ^
> 
> Caused by commit
> 
>   6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.")

Paolo, it seems this struct member should be placed outside of
IP_MULTIPLE_TABLES protection, since users can insert custom
local routes even without that set.

So I'm installing the following fix for this:

====================
[PATCH] ipv4: Move fib_has_custom_local_routes outside of IP_MULTIPLE_TABLES.

> net/ipv4/fib_frontend.c: In function 'fib_validate_source':
> net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
>    if (net->ipv4.fib_has_custom_local_routes)
>                 ^
> net/ipv4/fib_frontend.c: In function 'inet_rtm_newroute':
> net/ipv4/fib_frontend.c:773:12: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
>    net->ipv4.fib_has_custom_local_routes = true;
>             ^

Fixes: 6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/netns/ipv4.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20720721da4b..8387f099115e 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -49,10 +49,10 @@ struct netns_ipv4 {
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	struct fib_rules_ops	*rules_ops;
 	bool			fib_has_custom_rules;
-	bool			fib_has_custom_local_routes;
 	struct fib_table __rcu	*fib_main;
 	struct fib_table __rcu	*fib_default;
 #endif
+	bool			fib_has_custom_local_routes;
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	int			fib_num_tclassid_users;
 #endif
-- 
2.13.5

^ permalink raw reply related

* (unknown), 
From: unsubscribe.me @ 2017-09-22  1:22 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: 7172596099.doc --]
[-- Type: application/msword, Size: 55811 bytes --]

^ permalink raw reply

* Re: [PATCH net] net/ncsi: Don't assume last available channel exists
From: David Miller @ 2017-09-22  1:11 UTC (permalink / raw)
  To: sam; +Cc: netdev, linux-kernel
In-Reply-To: <1506042000.1377.21.camel@mendozajonas.com>

From: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Date: Fri, 22 Sep 2017 11:00:00 +1000

> If we haven't configured a channel yet (or are in the process of doing
> so) we won't have a hot_channel - does it make more sense to
> - check against the hot_channel as currently done,
> - only check the filter size at configure time for /each/ channel,
> - only conditionally enable the .ndo_vlan_rx_add_vid net_device callback
> once we've configured a channel (eg. for ftgmac100 in the
> ftgmac100_ncsi_handler() callback?)

The last isn't so feasible.

The device shouldn't be marked attached until a channel is available,
because it seems like communication cannot occur until one is.  Right?

You could experiment with netif_device_detach()/netif_device_attach().

When the device is in the detached state, callbacks such as
->ndo_vlan_rx_add_vid() will not be invoked.

^ permalink raw reply

* linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2017-09-22  1:03 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List, Paolo Abeni

Hi all,

After merging the net-next tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

net/ipv4/fib_frontend.c: In function 'fib_validate_source':
net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
   if (net->ipv4.fib_has_custom_local_routes)
                ^
net/ipv4/fib_frontend.c: In function 'inet_rtm_newroute':
net/ipv4/fib_frontend.c:773:12: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
   net->ipv4.fib_has_custom_local_routes = true;
            ^

Caused by commit

  6e617de84e87 ("net: avoid a full fib lookup when rp_filter is disabled.")

This build has CONFIG_IP_MULTIPLE_TABLES unset.

I have reverted that commit for today.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* Re: [PATCH net] net/ncsi: Don't assume last available channel exists
From: Samuel Mendoza-Jonas @ 2017-09-22  1:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20170920.160519.764603868579556859.davem@davemloft.net>

On Wed, 2017-09-20 at 16:05 -0700, David Miller wrote:
> From: Samuel Mendoza-Jonas <sam@mendozajonas.com>
> Date: Wed, 20 Sep 2017 14:12:51 +1000
> 
> > When handling new VLAN tags in NCSI we check the maximum allowed number
> > of filters on the last active ("hot") channel. However if the 'add'
> > callback is called before NCSI has configured a channel, this causes a
> > NULL dereference.
> > 
> > Check that we actually have a hot channel, and warn if it is missing.
> > 
> > Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
> 
> Well, a few things.
> 
> We should not allow this driver method to be invoked in the first
> place if the device is not in a state where the operation is legal
> yet.
> 
> Second of all, if !ndp is true, you should not return 0 to indicate
> success.
> 
> But more fundamentally, we should block this method from being
> callable unless the device is in the proper state and can complete the
> operation.
> 
> Seriously, these checks should be completely unnecessary if those
> issues are handled properly.

Good point, this made me step back and reconsider the problem here.

The ncsi_vlan_rx_add_vid() callback exists because the NCSI driver needs
to know which VLAN IDs are set, but we don't have a straightforward way
of accessing that information later in ncsi_configure_channel() - as
opposed to the MAC address for example which is just accessed via
dev->dev_addr. Instead they're kept track of in the ndp->vlan_vids list
and the NCSI driver reconfigures any channels it knows about.

So in that sense the NCSI device *is* ready to handle the operation. The
hot_channel fix here was to check if we were exceeding the maximum number
of VLAN filters supported by the remote channel. That itself is bit
debatable since different channels could support different numbers of
filters but right now the NCSI driver only supports one active channel at
a time.

If we haven't configured a channel yet (or are in the process of doing
so) we won't have a hot_channel - does it make more sense to
- check against the hot_channel as currently done,
- only check the filter size at configure time for /each/ channel,
- only conditionally enable the .ndo_vlan_rx_add_vid net_device callback
once we've configured a channel (eg. for ftgmac100 in the
ftgmac100_ncsi_handler() callback?)

Thanks,
Sam

^ permalink raw reply

* Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on
From: Eric Dumazet @ 2017-09-22  0:37 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Paweł Staszewski, Paolo Abeni, Jesper Dangaard Brouer,
	Linux Kernel Network Developers, Alexander Duyck
In-Reply-To: <7dbc2dd1-eec9-0bd3-7255-4fe6026847aa@gmail.com>

On Thu, 2017-09-21 at 15:07 -0700, Florian Fainelli wrote:
> On 09/21/2017 02:54 PM, Eric Dumazet wrote:
> > On Thu, 2017-09-21 at 14:41 -0700, Florian Fainelli wrote:
> > 
> >> Would not this apply to pretty much any stacked device setup though? It
> >> seems like any network device that just queues up its packet on another
> >> physical device for actual transmission may need that (e.g: DSA, bond,
> >> team, more.?)
> > 
> > We support bonding and team already.
> 
> Right, so that seems to mostly leave us with DSA at least. What about
> other devices that also have IFF_NO_QUEUE set?

It wont work.

loopback has IFF_NO_QUEUE, but you need to keep dst on skbs...

^ permalink raw reply

* [PATCH net] net: qualcomm: rmnet: Fix rcu splat in rmnet_is_real_dev_registered
From: Subash Abhinov Kasiviswanathan @ 2017-09-22  0:00 UTC (permalink / raw)
  To: xiaolong.ye, davem, netdev; +Cc: Subash Abhinov Kasiviswanathan

Xiaolong reported a suspicious rcu_dereference_check in the device
unregister notifier callback. Since we do not dereference the
rx_handler_data, it's ok to just check for the value of the pointer.
Note that this section is already protected by rtnl_lock.

[  101.364846] WARNING: suspicious RCU usage
[  101.365654] 4.13.0-rc6-01701-gceed73a #1 Not tainted
[  101.370873] -----------------------------
[  101.372472] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:57 suspicious rcu_dereference_check() usage!
[  101.374427]
[  101.374427] other info that might help us debug this:
[  101.374427]
[  101.387491]
[  101.387491] rcu_scheduler_active = 2, debug_locks = 1
[  101.389368] 1 lock held by trinity-main/2809:
[  101.390736]  #0:  (rtnl_mutex){+.+.+.}, at: [<8146085b>] rtnl_lock+0xf/0x11
[  101.395482]
[  101.395482] stack backtrace:
[  101.396948] CPU: 0 PID: 2809 Comm: trinity-main Not tainted 4.13.0-rc6-01701-gceed73a #1
[  101.398857] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[  101.401079] Call Trace:
[  101.401656]  dump_stack+0xa1/0xeb
[  101.402871]  lockdep_rcu_suspicious+0xc7/0xd0
[  101.403665]  rmnet_is_real_dev_registered+0x40/0x4e
[  101.405199]  rmnet_config_notify_cb+0x2c/0x142
[  101.406344]  ? wireless_nlevent_flush+0x47/0x71
[  101.407385]  notifier_call_chain+0x2d/0x47
[  101.408645]  raw_notifier_call_chain+0xc/0xe
[  101.409882]  call_netdevice_notifiers_info+0x41/0x49
[  101.411402]  call_netdevice_notifiers+0xc/0xe
[  101.412713]  rollback_registered_many+0x268/0x36e
[  101.413702]  rollback_registered+0x39/0x56
[  101.414965]  unregister_netdevice_queue+0x79/0x88
[  101.415908]  unregister_netdev+0x16/0x1d

Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Reported-by: kernel test robot <xiaolong.ye@intel.com>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 98f2255..1e33aea 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -51,10 +51,7 @@ struct rmnet_walk_data {
 
 static int rmnet_is_real_dev_registered(const struct net_device *real_dev)
 {
-	rx_handler_func_t *rx_handler;
-
-	rx_handler = rcu_dereference(real_dev->rx_handler);
-	return (rx_handler == rmnet_rx_handler);
+	return rcu_access_pointer(real_dev->rx_handler) == rmnet_rx_handler;
 }
 
 /* Needs rtnl lock */
-- 
1.9.1

^ permalink raw reply related

* [Patch net-next] net_sched: use idr to allocate u32 filter handles
From: Cong Wang @ 2017-09-21 23:43 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Chris Mi, Jamal Hadi Salim

Cc: Chris Mi <chrism@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_u32.c | 108 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 67 insertions(+), 41 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 10b8d851fc6b..316b8a791b13 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -46,6 +46,7 @@
 #include <net/act_api.h>
 #include <net/pkt_cls.h>
 #include <linux/netdevice.h>
+#include <linux/idr.h>
 
 struct tc_u_knode {
 	struct tc_u_knode __rcu	*next;
@@ -82,6 +83,7 @@ struct tc_u_hnode {
 	struct tc_u_common	*tp_c;
 	int			refcnt;
 	unsigned int		divisor;
+	struct idr		handle_idr;
 	struct rcu_head		rcu;
 	/* The 'ht' field MUST be the last field in structure to allow for
 	 * more entries allocated at end of structure.
@@ -93,7 +95,7 @@ struct tc_u_common {
 	struct tc_u_hnode __rcu	*hlist;
 	struct Qdisc		*q;
 	int			refcnt;
-	u32			hgenerator;
+	struct idr		handle_idr;
 	struct hlist_node	hnode;
 	struct rcu_head		rcu;
 };
@@ -311,19 +313,19 @@ static void *u32_get(struct tcf_proto *tp, u32 handle)
 	return u32_lookup_key(ht, handle);
 }
 
-static u32 gen_new_htid(struct tc_u_common *tp_c)
+static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
 {
-	int i = 0x800;
+	unsigned long idr_index;
+	int err;
 
-	/* hgenerator only used inside rtnl lock it is safe to increment
+	/* This is only used inside rtnl lock it is safe to increment
 	 * without read _copy_ update semantics
 	 */
-	do {
-		if (++tp_c->hgenerator == 0x7FF)
-			tp_c->hgenerator = 1;
-	} while (--i > 0 && u32_lookup_ht(tp_c, (tp_c->hgenerator|0x800)<<20));
-
-	return i > 0 ? (tp_c->hgenerator|0x800)<<20 : 0;
+	err = idr_alloc_ext(&tp_c->handle_idr, ptr, &idr_index,
+			    1, 0x800, GFP_KERNEL);
+	if (err)
+		return 0;
+	return (u32)(idr_index | 0x800) << 20;
 }
 
 static struct hlist_head *tc_u_common_hash;
@@ -366,8 +368,9 @@ static int u32_init(struct tcf_proto *tp)
 		return -ENOBUFS;
 
 	root_ht->refcnt++;
-	root_ht->handle = tp_c ? gen_new_htid(tp_c) : 0x80000000;
+	root_ht->handle = tp_c ? gen_new_htid(tp_c, root_ht) : 0x80000000;
 	root_ht->prio = tp->prio;
+	idr_init(&root_ht->handle_idr);
 
 	if (tp_c == NULL) {
 		tp_c = kzalloc(sizeof(*tp_c), GFP_KERNEL);
@@ -377,6 +380,7 @@ static int u32_init(struct tcf_proto *tp)
 		}
 		tp_c->q = tp->q;
 		INIT_HLIST_NODE(&tp_c->hnode);
+		idr_init(&tp_c->handle_idr);
 
 		h = tc_u_hash(tp);
 		hlist_add_head(&tp_c->hnode, &tc_u_common_hash[h]);
@@ -565,6 +569,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 					 rtnl_dereference(n->next));
 			tcf_unbind_filter(tp, &n->res);
 			u32_remove_hw_knode(tp, n->handle);
+			idr_remove_ext(&ht->handle_idr, n->handle);
 			call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
 		}
 	}
@@ -586,6 +591,8 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht)
 	     hn = &phn->next, phn = rtnl_dereference(*hn)) {
 		if (phn == ht) {
 			u32_clear_hw_hnode(tp, ht);
+			idr_destroy(&ht->handle_idr);
+			idr_remove_ext(&tp_c->handle_idr, ht->handle);
 			RCU_INIT_POINTER(*hn, ht->next);
 			kfree_rcu(ht, rcu);
 			return 0;
@@ -633,6 +640,7 @@ static void u32_destroy(struct tcf_proto *tp)
 			kfree_rcu(ht, rcu);
 		}
 
+		idr_destroy(&tp_c->handle_idr);
 		kfree(tp_c);
 	}
 
@@ -701,27 +709,21 @@ static int u32_delete(struct tcf_proto *tp, void *arg, bool *last)
 	return ret;
 }
 
-#define NR_U32_NODE (1<<12)
-static u32 gen_new_kid(struct tc_u_hnode *ht, u32 handle)
+static u32 gen_new_kid(struct tc_u_hnode *ht, u32 htid)
 {
-	struct tc_u_knode *n;
-	unsigned long i;
-	unsigned long *bitmap = kzalloc(BITS_TO_LONGS(NR_U32_NODE) * sizeof(unsigned long),
-					GFP_KERNEL);
-	if (!bitmap)
-		return handle | 0xFFF;
-
-	for (n = rtnl_dereference(ht->ht[TC_U32_HASH(handle)]);
-	     n;
-	     n = rtnl_dereference(n->next))
-		set_bit(TC_U32_NODE(n->handle), bitmap);
-
-	i = find_next_zero_bit(bitmap, NR_U32_NODE, 0x800);
-	if (i >= NR_U32_NODE)
-		i = find_next_zero_bit(bitmap, NR_U32_NODE, 1);
+	unsigned long idr_index;
+	u32 start = htid | 0x800;
+	u32 max = htid | 0xFFF;
+	u32 min = htid;
+
+	if (idr_alloc_ext(&ht->handle_idr, NULL, &idr_index,
+			  start, max + 1, GFP_KERNEL)) {
+		if (idr_alloc_ext(&ht->handle_idr, NULL, &idr_index,
+				  min + 1, max + 1, GFP_KERNEL))
+			return max;
+	}
 
-	kfree(bitmap);
-	return handle | (i >= NR_U32_NODE ? 0xFFF : i);
+	return (u32)idr_index;
 }
 
 static const struct nla_policy u32_policy[TCA_U32_MAX + 1] = {
@@ -806,6 +808,7 @@ static void u32_replace_knode(struct tcf_proto *tp, struct tc_u_common *tp_c,
 		if (pins->handle == n->handle)
 			break;
 
+	idr_replace_ext(&ht->handle_idr, n, n->handle);
 	RCU_INIT_POINTER(n->next, pins->next);
 	rcu_assign_pointer(*ins, n);
 }
@@ -937,22 +940,33 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 			return -EINVAL;
 		if (TC_U32_KEY(handle))
 			return -EINVAL;
-		if (handle == 0) {
-			handle = gen_new_htid(tp->data);
-			if (handle == 0)
-				return -ENOMEM;
-		}
 		ht = kzalloc(sizeof(*ht) + divisor*sizeof(void *), GFP_KERNEL);
 		if (ht == NULL)
 			return -ENOBUFS;
+		if (handle == 0) {
+			handle = gen_new_htid(tp->data, ht);
+			if (handle == 0) {
+				kfree(ht);
+				return -ENOMEM;
+			}
+		} else {
+			err = idr_alloc_ext(&tp_c->handle_idr, ht, NULL,
+					    handle, handle + 1, GFP_KERNEL);
+			if (err) {
+				kfree(ht);
+				return err;
+			}
+		}
 		ht->tp_c = tp_c;
 		ht->refcnt = 1;
 		ht->divisor = divisor;
 		ht->handle = handle;
 		ht->prio = tp->prio;
+		idr_init(&ht->handle_idr);
 
 		err = u32_replace_hw_hnode(tp, ht, flags);
 		if (err) {
+			idr_remove_ext(&tp_c->handle_idr, handle);
 			kfree(ht);
 			return err;
 		}
@@ -986,24 +1000,33 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 		if (TC_U32_HTID(handle) && TC_U32_HTID(handle^htid))
 			return -EINVAL;
 		handle = htid | TC_U32_NODE(handle);
+		err = idr_alloc_ext(&ht->handle_idr, NULL, NULL,
+				    handle, handle + 1,
+				    GFP_KERNEL);
+		if (err)
+			return err;
 	} else
 		handle = gen_new_kid(ht, htid);
 
-	if (tb[TCA_U32_SEL] == NULL)
-		return -EINVAL;
+	if (tb[TCA_U32_SEL] == NULL) {
+		err = -EINVAL;
+		goto erridr;
+	}
 
 	s = nla_data(tb[TCA_U32_SEL]);
 
 	n = kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL);
-	if (n == NULL)
-		return -ENOBUFS;
+	if (n == NULL) {
+		err = -ENOBUFS;
+		goto erridr;
+	}
 
 #ifdef CONFIG_CLS_U32_PERF
 	size = sizeof(struct tc_u32_pcnt) + s->nkeys * sizeof(u64);
 	n->pf = __alloc_percpu(size, __alignof__(struct tc_u32_pcnt));
 	if (!n->pf) {
-		kfree(n);
-		return -ENOBUFS;
+		err = -ENOBUFS;
+		goto errfree;
 	}
 #endif
 
@@ -1066,9 +1089,12 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 errout:
 	tcf_exts_destroy(&n->exts);
 #ifdef CONFIG_CLS_U32_PERF
+errfree:
 	free_percpu(n->pf);
 #endif
 	kfree(n);
+erridr:
+	idr_remove_ext(&ht->handle_idr, handle);
 	return err;
 }
 
-- 
2.13.0

^ permalink raw reply related

* [Patch net-next] net_sched: use idr to allocate basic filter handles
From: Cong Wang @ 2017-09-21 23:43 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Chris Mi, Jamal Hadi Salim

Cc: Chris Mi <chrism@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_basic.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index d89ebafd2239..29da19eb6447 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -17,13 +17,14 @@
 #include <linux/errno.h>
 #include <linux/rtnetlink.h>
 #include <linux/skbuff.h>
+#include <linux/idr.h>
 #include <net/netlink.h>
 #include <net/act_api.h>
 #include <net/pkt_cls.h>
 
 struct basic_head {
-	u32			hgenerator;
 	struct list_head	flist;
+	struct idr		handle_idr;
 	struct rcu_head		rcu;
 };
 
@@ -78,6 +79,7 @@ static int basic_init(struct tcf_proto *tp)
 	if (head == NULL)
 		return -ENOBUFS;
 	INIT_LIST_HEAD(&head->flist);
+	idr_init(&head->handle_idr);
 	rcu_assign_pointer(tp->root, head);
 	return 0;
 }
@@ -99,8 +101,10 @@ static void basic_destroy(struct tcf_proto *tp)
 	list_for_each_entry_safe(f, n, &head->flist, link) {
 		list_del_rcu(&f->link);
 		tcf_unbind_filter(tp, &f->res);
+		idr_remove_ext(&head->handle_idr, f->handle);
 		call_rcu(&f->rcu, basic_delete_filter);
 	}
+	idr_destroy(&head->handle_idr);
 	kfree_rcu(head, rcu);
 }
 
@@ -111,6 +115,7 @@ static int basic_delete(struct tcf_proto *tp, void *arg, bool *last)
 
 	list_del_rcu(&f->link);
 	tcf_unbind_filter(tp, &f->res);
+	idr_remove_ext(&head->handle_idr, f->handle);
 	call_rcu(&f->rcu, basic_delete_filter);
 	*last = list_empty(&head->flist);
 	return 0;
@@ -154,6 +159,7 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	struct nlattr *tb[TCA_BASIC_MAX + 1];
 	struct basic_filter *fold = (struct basic_filter *) *arg;
 	struct basic_filter *fnew;
+	unsigned long idr_index;
 
 	if (tca[TCA_OPTIONS] == NULL)
 		return -EINVAL;
@@ -179,30 +185,31 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	err = -EINVAL;
 	if (handle) {
 		fnew->handle = handle;
-	} else if (fold) {
-		fnew->handle = fold->handle;
+		if (!fold) {
+			err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
+					    handle, handle + 1, GFP_KERNEL);
+			if (err)
+				goto errout;
+		}
 	} else {
-		unsigned int i = 0x80000000;
-		do {
-			if (++head->hgenerator == 0x7FFFFFFF)
-				head->hgenerator = 1;
-		} while (--i > 0 && basic_get(tp, head->hgenerator));
-
-		if (i <= 0) {
-			pr_err("Insufficient number of handles\n");
+		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
+				    1, 0x80000000, GFP_KERNEL);
+		if (err)
 			goto errout;
-		}
-
-		fnew->handle = head->hgenerator;
+		fnew->handle = idr_index;
 	}
 
 	err = basic_set_parms(net, tp, fnew, base, tb, tca[TCA_RATE], ovr);
-	if (err < 0)
+	if (err < 0) {
+		if (!fold)
+			idr_remove_ext(&head->handle_idr, fnew->handle);
 		goto errout;
+	}
 
 	*arg = fnew;
 
 	if (fold) {
+		idr_replace_ext(&head->handle_idr, fnew, fnew->handle);
 		list_replace_rcu(&fold->link, &fnew->link);
 		tcf_unbind_filter(tp, &fold->res);
 		call_rcu(&fold->rcu, basic_delete_filter);
-- 
2.13.0

^ permalink raw reply related

* [Patch net-next] net_sched: use idr to allocate bpf filter handles
From: Cong Wang @ 2017-09-21 23:43 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Daniel Borkmann, Chris Mi, Jamal Hadi Salim

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Chris Mi <chrism@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_bpf.c | 57 ++++++++++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 29 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 520c5027646a..4fd352a1778e 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -17,6 +17,7 @@
 #include <linux/skbuff.h>
 #include <linux/filter.h>
 #include <linux/bpf.h>
+#include <linux/idr.h>
 
 #include <net/rtnetlink.h>
 #include <net/pkt_cls.h>
@@ -32,7 +33,7 @@ MODULE_DESCRIPTION("TC BPF based classifier");
 
 struct cls_bpf_head {
 	struct list_head plist;
-	u32 hgen;
+	struct idr handle_idr;
 	struct rcu_head rcu;
 };
 
@@ -238,6 +239,7 @@ static int cls_bpf_init(struct tcf_proto *tp)
 		return -ENOBUFS;
 
 	INIT_LIST_HEAD_RCU(&head->plist);
+	idr_init(&head->handle_idr);
 	rcu_assign_pointer(tp->root, head);
 
 	return 0;
@@ -264,6 +266,9 @@ static void cls_bpf_delete_prog_rcu(struct rcu_head *rcu)
 
 static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog)
 {
+	struct cls_bpf_head *head = rtnl_dereference(tp->root);
+
+	idr_remove_ext(&head->handle_idr, prog->handle);
 	cls_bpf_stop_offload(tp, prog);
 	list_del_rcu(&prog->link);
 	tcf_unbind_filter(tp, &prog->res);
@@ -287,6 +292,7 @@ static void cls_bpf_destroy(struct tcf_proto *tp)
 	list_for_each_entry_safe(prog, tmp, &head->plist, link)
 		__cls_bpf_delete(tp, prog);
 
+	idr_destroy(&head->handle_idr);
 	kfree_rcu(head, rcu);
 }
 
@@ -421,27 +427,6 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp,
 	return 0;
 }
 
-static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
-				   struct cls_bpf_head *head)
-{
-	unsigned int i = 0x80000000;
-	u32 handle;
-
-	do {
-		if (++head->hgen == 0x7FFFFFFF)
-			head->hgen = 1;
-	} while (--i > 0 && cls_bpf_get(tp, head->hgen));
-
-	if (unlikely(i == 0)) {
-		pr_err("Insufficient number of handles\n");
-		handle = 0;
-	} else {
-		handle = head->hgen;
-	}
-
-	return handle;
-}
-
 static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 			  struct tcf_proto *tp, unsigned long base,
 			  u32 handle, struct nlattr **tca,
@@ -451,6 +436,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	struct cls_bpf_prog *oldprog = *arg;
 	struct nlattr *tb[TCA_BPF_MAX + 1];
 	struct cls_bpf_prog *prog;
+	unsigned long idr_index;
 	int ret;
 
 	if (tca[TCA_OPTIONS] == NULL)
@@ -476,21 +462,30 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 		}
 	}
 
-	if (handle == 0)
-		prog->handle = cls_bpf_grab_new_handle(tp, head);
-	else
+	if (handle == 0) {
+		ret = idr_alloc_ext(&head->handle_idr, prog, &idr_index,
+				    1, 0x80000000, GFP_KERNEL);
+		if (ret)
+			goto errout;
+		prog->handle = idr_index;
+	} else {
+		if (!oldprog) {
+			ret = idr_alloc_ext(&head->handle_idr, prog, &idr_index,
+					    handle, handle + 1, GFP_KERNEL);
+			if (ret)
+				goto errout;
+		}
 		prog->handle = handle;
-	if (prog->handle == 0) {
-		ret = -EINVAL;
-		goto errout;
 	}
 
 	ret = cls_bpf_set_parms(net, tp, prog, base, tb, tca[TCA_RATE], ovr);
 	if (ret < 0)
-		goto errout;
+		goto errout_idr;
 
 	ret = cls_bpf_offload(tp, prog, oldprog);
 	if (ret) {
+		if (!oldprog)
+			idr_remove_ext(&head->handle_idr, prog->handle);
 		__cls_bpf_delete_prog(prog);
 		return ret;
 	}
@@ -499,6 +494,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 		prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
 	if (oldprog) {
+		idr_replace_ext(&head->handle_idr, prog, handle);
 		list_replace_rcu(&oldprog->link, &prog->link);
 		tcf_unbind_filter(tp, &oldprog->res);
 		call_rcu(&oldprog->rcu, cls_bpf_delete_prog_rcu);
@@ -509,6 +505,9 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 	*arg = prog;
 	return 0;
 
+errout_idr:
+	if (!oldprog)
+		idr_remove_ext(&head->handle_idr, prog->handle);
 errout:
 	tcf_exts_destroy(&prog->exts);
 	kfree(prog);
-- 
2.13.0

^ permalink raw reply related

* [net-next:master 42/46] net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
From: kbuild test robot @ 2017-09-21 23:27 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: kbuild-all, netdev

[-- Attachment #1: Type: text/plain, Size: 2244 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   b6cd4b5895848968e8fee93fc5e3dc8babc40b9e
commit: 6e617de84e87d626d1e976fc30e1322239fd4d2d [42/46] net: avoid a full fib lookup when rp_filter is disabled.
config: x86_64-kexec (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        git checkout 6e617de84e87d626d1e976fc30e1322239fd4d2d
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net/ipv4/fib_frontend.c: In function 'fib_validate_source':
>> net/ipv4/fib_frontend.c:411:16: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
      if (net->ipv4.fib_has_custom_local_routes)
                   ^
   net/ipv4/fib_frontend.c: In function 'inet_rtm_newroute':
   net/ipv4/fib_frontend.c:773:12: error: 'struct netns_ipv4' has no member named 'fib_has_custom_local_routes'
      net->ipv4.fib_has_custom_local_routes = true;
               ^

vim +411 net/ipv4/fib_frontend.c

   395	
   396	/* Ignore rp_filter for packets protected by IPsec. */
   397	int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
   398				u8 tos, int oif, struct net_device *dev,
   399				struct in_device *idev, u32 *itag)
   400	{
   401		int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
   402		struct net *net = dev_net(dev);
   403	
   404		if (!r && !fib_num_tclassid_users(net) &&
   405		    (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
   406			if (IN_DEV_ACCEPT_LOCAL(idev))
   407				goto ok;
   408			/* if no local routes are added from user space we can check
   409			 * for local addresses looking-up the ifaddr table
   410			 */
 > 411			if (net->ipv4.fib_has_custom_local_routes)
   412				goto full_check;
   413			if (inet_lookup_ifaddr_rcu(net, src))
   414				return -EINVAL;
   415	
   416	ok:
   417			*itag = 0;
   418			return 0;
   419		}
   420	
   421	full_check:
   422		return __fib_validate_source(skb, src, dst, tos, oif, dev, r, idev, itag);
   423	}
   424	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26118 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] bpf/verifier: improve disassembly of BPF_END instructions
From: Y Song @ 2017-09-21 23:11 UTC (permalink / raw)
  To: Edward Cree; +Cc: Alexei Starovoitov, Daniel Borkmann, David Miller, netdev
In-Reply-To: <46aa4442-b8ed-e4c1-4897-8f650e23d448@solarflare.com>

On Thu, Sep 21, 2017 at 12:58 PM, Edward Cree <ecree@solarflare.com> wrote:
> On 21/09/17 20:44, Alexei Starovoitov wrote:
>> On Thu, Sep 21, 2017 at 09:29:33PM +0200, Daniel Borkmann wrote:
>>> More intuitive, but agree on the from_be/le. Maybe we should
>>> just drop the "to_" prefix altogether, and leave the rest as is since
>>> it's not surrounded by braces, it's also not a cast but rather an op.
> That works for me.
>> 'be16 r4' is ambiguous regarding upper bits.
>>
>> what about my earlier suggestion:
>> r4 = (be16) (u16) r4
>> r4 = (le64) (u64) r4
>>
>> It will be pretty clear what instruction is doing (that upper bits become zero).
> Trouble with that is that's very *not* what C will do with those casts
>  and it doesn't really capture the bidirectional/symmetry thing.  The
>  closest I could see with that is something like `r4 = (be16/u16) r4`,
>  but that's quite an ugly mongrel.
> I think Daniel's idea of `be16`, `le32` etc one-arg opcodes is the
>  cleanest and clearest.  Should it be
>     r4 = be16 r4
>  or just
>     be16 r4
> ?  Personally I incline towards the latter, but admit it doesn't really
>  match the syntax of other opcodes.

I did some quick prototyping in llvm to make sure we have a syntax
llvm is happy. Apparently, llvm does not like the syntax
   r4 = be16 r4    or    r4 = (be16) (u16) r4.

In llvm:utils/TableGen/AsmMatcherEmitter.cpp:

    // Verify that any operand is only mentioned once.
    // We reject aliases and ignore instructions for now.
    if (Tok[0] == '$' && !OperandNames.insert(Tok).second) {
      if (!Hack)
        PrintFatalError(TheDef->getLoc(),
                        "ERROR: matchable with tied operand '" + Tok +
                        "' can never be matched!");
      // FIXME: Should reject these.  The ARM backend hits this with $lane in a
      // bunch of instructions.  It is unclear what the right answer is.
      DEBUG({
        errs() << "warning: '" << TheDef->getName() << "': "
               << "ignoring instruction with tied operand '"
               << Tok << "'\n";
      });
      return false;
    }

Later on, such insn will be ignored in table matching and assember
will not work any more.

Note that here bswap16/32/64 require source and destination register
must be the same.

So it looks like "be16/be32/be64/le16/le32/le64 #register" is a good idea.
We could use "be16 (u16)#register", but not sure whether extra u16
conversion adds value or
rather adds more confusion.

>
> To shed a few more bikes, I did also wonder about the BPF_NEG opcode,
>  which (if I'm reading the code correctly) currently renders as
>     r4 = neg r4 0
>     (u32) r4 = neg (u32) r4 0
> That printing of the insn->imm, while harmless, is needless and
>  potentially confusing.  Should we get rid of it?

Currently, llvm does not issue "neg" insn yet. Maybe you can issue
     r3 = -r4          // 64bit
     r3 = (u32) -r4 // 32bit

This matches what interpreter does. This will be similar to other ALU
operations.

^ permalink raw reply

* [PATCH] cnic: Fix an error handling path in 'cnic_alloc_bnx2x_resc()'
From: Christophe JAILLET @ 2017-09-21 23:01 UTC (permalink / raw)
  To: jmaxwell37, davem, stephen, danielj, parav
  Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET

All the error handling paths 'goto error', except this one.
We should also go to error in this case, or some resources will be
leaking.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 drivers/net/ethernet/broadcom/cnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index cec94bbb2ea5..8bc126a156e8 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -1278,7 +1278,7 @@ static int cnic_alloc_bnx2x_resc(struct cnic_dev *dev)
 
 	ret = cnic_alloc_dma(dev, kwq_16_dma, pages, 0);
 	if (ret)
-		return -ENOMEM;
+		goto error;
 
 	n = CNIC_PAGE_SIZE / CNIC_KWQ16_DATA_SIZE;
 	for (i = 0, j = 0; i < cp->max_cid_space; i++) {
-- 
2.11.0

^ permalink raw reply related

* [PATCH] wireless: iwlegacy:  make const array static to shink object code size Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit
From: Colin King @ 2017-09-21 22:56 UTC (permalink / raw)
  To: Stanislaw Gruszka, Kalle Valo, linux-wireless, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Don't populate const array ac_to_fifo on the stack in an inlined
function, instead make it static.  Makes the object code smaller
by over 800 bytes:

   text	   data	    bss	    dec	    hex	filename
 159029	  33154	   1216	 193399	  2f377	4965-mac.o

   text	   data	    bss	    dec	    hex	filename
 158122	  33250	   1216	 192588	  2f04c	4965-mac.o

(gcc version 7.2.0 x86_64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/intel/iwlegacy/4965-mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlegacy/4965-mac.c b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
index de9b6522c43f..65eba2c24292 100644
--- a/drivers/net/wireless/intel/iwlegacy/4965-mac.c
+++ b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
@@ -1480,7 +1480,7 @@ il4965_get_ac_from_tid(u16 tid)
 static inline int
 il4965_get_fifo_from_tid(u16 tid)
 {
-	const u8 ac_to_fifo[] = {
+	static const u8 ac_to_fifo[] = {
 		IL_TX_FIFO_VO,
 		IL_TX_FIFO_VI,
 		IL_TX_FIFO_BE,
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH 1/1] net: ti: netcp: use setup_timer
From: David Miller @ 2017-09-21 22:49 UTC (permalink / raw)
  To: allen.lkml; +Cc: linux-kernel, w-kwok2, m-karicheri2, netdev
In-Reply-To: <1505998978-14898-1-git-send-email-allen.lkml@gmail.com>

From: Allen Pais <allen.lkml@gmail.com>
Date: Thu, 21 Sep 2017 18:32:58 +0530

>     Use setup_timer function instead of initializing timer with the
>     function and data fields.
> 
> Signed-off-by: Allen Pais <allen.lkml@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/1] net: usb: catc: use setup_timer() helper
From: David Miller @ 2017-09-21 22:49 UTC (permalink / raw)
  To: allen.lkml; +Cc: linux-kernel, linux-usb, netdev
In-Reply-To: <1505998455-14776-1-git-send-email-allen.lkml@gmail.com>

From: Allen Pais <allen.lkml@gmail.com>
Date: Thu, 21 Sep 2017 18:24:15 +0530

>     Use setup_timer function instead of initializing timer with the
>     function and data fields.
> 
> Signed-off-by: Allen Pais <allen.lkml@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/1] net: wan : hdlc: use setup_timer() helper
From: David Miller @ 2017-09-21 22:49 UTC (permalink / raw)
  To: allen.lkml; +Cc: linux-kernel, khc, netdev
In-Reply-To: <1505998075-12969-1-git-send-email-allen.lkml@gmail.com>

From: Allen Pais <allen.lkml@gmail.com>
Date: Thu, 21 Sep 2017 18:17:55 +0530

>     Use setup_timer function instead of initializing timer with the
>     function and data fields.
> 
> Signed-off-by: Allen Pais <allen.lkml@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/1] net:nfc: use setup_timer
From: David Miller @ 2017-09-21 22:48 UTC (permalink / raw)
  To: allen.lkml; +Cc: linux-kernel, sameo, netdev
In-Reply-To: <1505991573-30298-1-git-send-email-allen.lkml@gmail.com>

From: Allen Pais <allen.lkml@gmail.com>
Date: Thu, 21 Sep 2017 16:29:33 +0530

>     Use setup_timer function instead of initializing timer with the
>     function and data fields.
> 
> Signed-off-by: Allen Pais <allen.lkml@gmail.com>

Applied.

^ permalink raw reply

* Re: [net-next v3] bridge: trigger RTM_NEWLINK when interface is modified by bridge ioctl
From: David Miller @ 2017-09-21 22:45 UTC (permalink / raw)
  To: vincent; +Cc: stephen, dsahern, bridge, netdev
In-Reply-To: <20170921100525.20395-1-vincent@bernat.im>

From: Vincent Bernat <vincent@bernat.im>
Date: Thu, 21 Sep 2017 12:05:25 +0200

> Currently, there is a difference in netlink events received when an
> interface is modified through bridge ioctl() or through netlink. This
> patch generates additional events when an interface is added to or
> removed from a bridge via ioctl().
...
> Signed-off-by: Vincent Bernat <vincent@bernat.im>

Applied.

^ permalink raw reply

* [PATCH net-next v2] bpf: Optimize lpm trie delete
From: Craig Gallek @ 2017-09-21 22:43 UTC (permalink / raw)
  To: Daniel Mack, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller; +Cc: netdev

From: Craig Gallek <kraig@google.com>

Before the delete operator was added, this datastructure maintained
an invariant that intermediate nodes were only present when necessary
to build the tree.  This patch updates the delete operation to reinstate
that invariant by removing unnecessary intermediate nodes after a node is
removed and thus keeping the tree structure at a minimal size.

Suggested-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Craig Gallek <kraig@google.com>
---
v2:
  I really failed the interview on this one.  v1 did not collapse all
  extra intermediate node possibilities.  I believe this one does by
	additionally tracking node parent information.  See comments.

 kernel/bpf/lpm_trie.c | 71 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 9d58a576b2ae..34d8a690ea05 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -394,8 +394,8 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 {
 	struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
 	struct bpf_lpm_trie_key *key = _key;
-	struct lpm_trie_node __rcu **trim;
-	struct lpm_trie_node *node;
+	struct lpm_trie_node __rcu **trim, **trim2;
+	struct lpm_trie_node *node, *parent;
 	unsigned long irq_flags;
 	unsigned int next_bit;
 	size_t matchlen = 0;
@@ -407,31 +407,26 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 	raw_spin_lock_irqsave(&trie->lock, irq_flags);
 
 	/* Walk the tree looking for an exact key/length match and keeping
-	 * track of where we could begin trimming the tree.  The trim-point
-	 * is the sub-tree along the walk consisting of only single-child
-	 * intermediate nodes and ending at a leaf node that we want to
-	 * remove.
+	 * track of the path we traverse.  We will need to know the node
+	 * we wish to delete, and the slot that points to the node we want
+	 * to delete.  We may also need to know the nodes parent and the
+	 * slot that contains it.
 	 */
 	trim = &trie->root;
-	node = rcu_dereference_protected(
-		trie->root, lockdep_is_held(&trie->lock));
-	while (node) {
+	trim2 = trim;
+	parent = NULL;
+	while ((node = rcu_dereference_protected(
+		       *trim, lockdep_is_held(&trie->lock)))) {
 		matchlen = longest_prefix_match(trie, node, key);
 
 		if (node->prefixlen != matchlen ||
 		    node->prefixlen == key->prefixlen)
 			break;
 
+		parent = node;
+		trim2 = trim;
 		next_bit = extract_bit(key->data, node->prefixlen);
-		/* If we hit a node that has more than one child or is a valid
-		 * prefix itself, do not remove it. Reset the root of the trim
-		 * path to its descendant on our path.
-		 */
-		if (!(node->flags & LPM_TREE_NODE_FLAG_IM) ||
-		    (node->child[0] && node->child[1]))
-			trim = &node->child[next_bit];
-		node = rcu_dereference_protected(
-			node->child[next_bit], lockdep_is_held(&trie->lock));
+		trim = &node->child[next_bit];
 	}
 
 	if (!node || node->prefixlen != key->prefixlen ||
@@ -442,27 +437,47 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 
 	trie->n_entries--;
 
-	/* If the node we are removing is not a leaf node, simply mark it
+	/* If the node we are removing has two children, simply mark it
 	 * as intermediate and we are done.
 	 */
-	if (rcu_access_pointer(node->child[0]) ||
+	if (rcu_access_pointer(node->child[0]) &&
 	    rcu_access_pointer(node->child[1])) {
 		node->flags |= LPM_TREE_NODE_FLAG_IM;
 		goto out;
 	}
 
-	/* trim should now point to the slot holding the start of a path from
-	 * zero or more intermediate nodes to our leaf node for deletion.
+	/* If the parent of the node we are about to delete is an intermediate
+	 * node, and the deleted node doesn't have any children, we can delete
+	 * the intermediate parent as well and promote its other child
+	 * up the tree.  Doing this maintains the invariant that all
+	 * intermediate nodes have exactly 2 children and that there are no
+	 * unnecessary intermediate nodes in the tree.
 	 */
-	while ((node = rcu_dereference_protected(
-			*trim, lockdep_is_held(&trie->lock)))) {
-		RCU_INIT_POINTER(*trim, NULL);
-		trim = rcu_access_pointer(node->child[0]) ?
-			&node->child[0] :
-			&node->child[1];
+	if (parent && (parent->flags & LPM_TREE_NODE_FLAG_IM) &&
+	    !node->child[0] && !node->child[1]) {
+		if (node == rcu_access_pointer(parent->child[0]))
+			rcu_assign_pointer(
+				*trim2, rcu_access_pointer(parent->child[1]));
+		else
+			rcu_assign_pointer(
+				*trim2, rcu_access_pointer(parent->child[0]));
+		kfree_rcu(parent, rcu);
 		kfree_rcu(node, rcu);
+		goto out;
 	}
 
+	/* The node we are removing has either zero or one child. If there
+	 * is a child, move it into the removed node's slot then delete
+	 * the node.  Otherwise just clear the slot and delete the node.
+	 */
+	if (node->child[0])
+		rcu_assign_pointer(*trim, rcu_access_pointer(node->child[0]));
+	else if (node->child[1])
+		rcu_assign_pointer(*trim, rcu_access_pointer(node->child[1]));
+	else
+		RCU_INIT_POINTER(*trim, NULL);
+	kfree_rcu(node, rcu);
+
 out:
 	raw_spin_unlock_irqrestore(&trie->lock, irq_flags);
 
-- 
2.14.1.821.g8fa685d3b7-goog

^ permalink raw reply related

* Re: [PATCH net-next 12/14] gtp: Configuration for zero UDP checksum
From: Tom Herbert @ 2017-09-21 22:41 UTC (permalink / raw)
  To: Harald Welte
  Cc: Tom Herbert, David Miller, Linux Kernel Network Developers,
	Pablo Neira Ayuso, Rohit Seth
In-Reply-To: <20170921015522.sv5netrlcj5vebys@nataraja>

On Wed, Sep 20, 2017 at 6:55 PM, Harald Welte <laforge@gnumonks.org> wrote:
> Hi Tom,
>
> On Wed, Sep 20, 2017 at 11:09:29AM -0700, Tom Herbert wrote:
>> On Mon, Sep 18, 2017 at 9:24 PM, David Miller <davem@davemloft.net> wrote:
>> > From: Tom Herbert <tom@quantonium.net>
>> >> Add configuration to control use of zero checksums on transmit for both
>> >> IPv4 and IPv6, and control over accepting zero IPv6 checksums on
>> >> receive.
>> >
>> > I thought we were trying to move away from this special case of allowing
>> > zero UDP checksums with tunnels, especially for ipv6.
>>
>> I don't have a strong preference either way. I like consistency with
>> VXLAN and foo/UDP, but I guess it's not required. Interestingly, since
>> GTP only carries IP, IPv6 zero checksums are actually safer here than
>> VXLAN or GRE/UDP.
>
> Just for the record: I don't care either way and I defer to the kernel
> networking developers to decide if they want to have zero UDP checksum
> in GTP or not.
>
> The 3GPP specs don't say anything about UDP checksums.  So there's no
> requirement to use them, and hence operation without UDP checksums
> should be compliant.  Cisco GTP implementation has udp checksumming
> configurable, so other implementations also seem to provide both ways.
>
> In general, I would argue one wants UDP checksumming of GTP in all
> setups, as while the inner IP packet might be protected, the GTP header
> itself is not, and that's what contains important data suhc as the TEID
> (Tunnel Endpoint ID).  But that's of course just my personal opinion,
> and I'm not saying we should prevent people from using lower protection
> if that's what they want.
>
The tradeoffs and requirements of zero UDP6 checksums are discussed at
length in RFC6935 and RFC6936. Given other implementations make it
configurable it should also be here.

Tom

> --
> - Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
> ============================================================================
> "Privacy in residential applications is a desirable marketing option."
>                                                   (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* Re: [PATCH 2/2] ip_tunnel: add mpls over gre encapsulation
From: Roopa Prabhu @ 2017-09-21 22:35 UTC (permalink / raw)
  To: Amine Kherbouche; +Cc: netdev@vger.kernel.org, xeb, David Lamparter
In-Reply-To: <1505985924-12479-3-git-send-email-amine.kherbouche@6wind.com>

On Thu, Sep 21, 2017 at 2:25 AM, Amine Kherbouche
<amine.kherbouche@6wind.com> wrote:
> This commit introduces the MPLSoGRE support (RFC 4023), using ip tunnel
> API.
>
> Encap:
>   - Add a new iptunnel type mpls.
>
> Decap:
>   - pull gre hdr and call mpls_forward().
>
> Signed-off-by: Amine Kherbouche <amine.kherbouche@6wind.com>
> ---
>  include/net/gre.h              |  3 +++
>  include/uapi/linux/if_tunnel.h |  1 +
>  net/ipv4/gre_demux.c           | 22 ++++++++++++++++++++++
>  net/ipv4/ip_gre.c              |  9 +++++++++
>  net/ipv6/ip6_gre.c             |  7 +++++++
>  net/mpls/af_mpls.c             | 37 +++++++++++++++++++++++++++++++++++++
>  6 files changed, 79 insertions(+)
>
> diff --git a/include/net/gre.h b/include/net/gre.h
> index d25d836..88a8343 100644
> --- a/include/net/gre.h
> +++ b/include/net/gre.h
> @@ -35,6 +35,9 @@ struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
>                                        u8 name_assign_type);
>  int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
>                      bool *csum_err, __be16 proto, int nhs);
> +#if IS_ENABLED(CONFIG_MPLS)
> +int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len);
> +#endif
>
>  static inline int gre_calc_hlen(__be16 o_flags)
>  {
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 2e52088..a2f48c0 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -84,6 +84,7 @@ enum tunnel_encap_types {
>         TUNNEL_ENCAP_NONE,
>         TUNNEL_ENCAP_FOU,
>         TUNNEL_ENCAP_GUE,
> +       TUNNEL_ENCAP_MPLS,
>  };
>
>  #define TUNNEL_ENCAP_FLAG_CSUM         (1<<0)
> diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
> index b798862..a6a937e 100644
> --- a/net/ipv4/gre_demux.c
> +++ b/net/ipv4/gre_demux.c
> @@ -23,6 +23,9 @@
>  #include <linux/netdevice.h>
>  #include <linux/if_tunnel.h>
>  #include <linux/spinlock.h>
> +#if IS_ENABLED(CONFIG_MPLS)
> +#include <linux/mpls.h>
> +#endif
>  #include <net/protocol.h>
>  #include <net/gre.h>
>
> @@ -122,6 +125,25 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
>  }
>  EXPORT_SYMBOL(gre_parse_header);
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len)
> +{
> +       if (unlikely(!pskb_may_pull(skb, gre_hdr_len)))
> +               goto drop;
> +
> +       /* Pop GRE hdr and reset the skb */
> +       skb_pull(skb, gre_hdr_len);
> +       skb_reset_network_header(skb);
> +
> +       mpls_forward(skb, skb->dev, NULL, NULL);
> +
> +       return 0;
> +drop:
> +       return NET_RX_DROP;
> +}
> +EXPORT_SYMBOL(mpls_gre_rcv);
> +#endif
> +
>  static int gre_rcv(struct sk_buff *skb)
>  {
>         const struct gre_protocol *proto;
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 9cee986..dd4431c 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -412,10 +412,19 @@ static int gre_rcv(struct sk_buff *skb)
>                         return 0;
>         }
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +       if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
> +               if (mpls_gre_rcv(skb, hdr_len))
> +                       goto drop;
> +               return 0;
> +       }
> +#endif
> +
>         if (ipgre_rcv(skb, &tpi, hdr_len) == PACKET_RCVD)
>                 return 0;
>
>         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
> +
>  drop:
>         kfree_skb(skb);
>         return 0;
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index c82d41e..e52396d 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -476,6 +476,13 @@ static int gre_rcv(struct sk_buff *skb)
>         if (hdr_len < 0)
>                 goto drop;
>
> +#if IS_ENABLED(CONFIG_MPLS)
> +       if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
> +               if (mpls_gre_rcv(skb, hdr_len))
> +                       goto drop;
> +               return 0;
> +       }
> +#endif
>         if (iptunnel_pull_header(skb, hdr_len, tpi.proto, false))
>                 goto drop;
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 36ea2ad..060ed07 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -16,6 +16,7 @@
>  #include <net/arp.h>
>  #include <net/ip_fib.h>
>  #include <net/netevent.h>
> +#include <net/ip_tunnels.h>
>  #include <net/netns/generic.h>
>  #if IS_ENABLED(CONFIG_IPV6)
>  #include <net/ipv6.h>
> @@ -39,6 +40,40 @@ static int one = 1;
>  static int label_limit = (1 << 20) - 1;
>  static int ttl_max = 255;
>
> +size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
> +{
> +       return sizeof(struct mpls_shim_hdr);
> +}
> +
> +int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
> +                           u8 *protocol, struct flowi4 *fl4)
> +{
> +       return 0;
> +}


any reason you are supporting only rx ?


> +
> +static const struct ip_tunnel_encap_ops mpls_iptun_ops = {
> +       .encap_hlen = ipgre_mpls_encap_hlen,
> +       .build_header = ipgre_mpls_build_header,
> +};
> +
> +int ipgre_tunnel_encap_add_mpls_ops(void)
> +{
> +       int ret;
> +
> +       ret = ip_tunnel_encap_add_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
> +       if (ret < 0) {
> +               pr_err("can't add mplsgre ops\n");
> +               return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +static void ipgre_tunnel_encap_del_mpls_ops(void)
> +{
> +       ip_tunnel_encap_del_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
> +}
> +
>  static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
>                        struct nlmsghdr *nlh, struct net *net, u32 portid,
>                        unsigned int nlm_flags);
> @@ -2486,6 +2521,7 @@ static int __init mpls_init(void)
>                       0);
>         rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
>                       mpls_netconf_dump_devconf, 0);
> +       ipgre_tunnel_encap_add_mpls_ops();
>         err = 0;
>  out:
>         return err;
> @@ -2503,6 +2539,7 @@ static void __exit mpls_exit(void)
>         dev_remove_pack(&mpls_packet_type);
>         unregister_netdevice_notifier(&mpls_dev_notifier);
>         unregister_pernet_subsys(&mpls_net_ops);
> +       ipgre_tunnel_encap_del_mpls_ops();
>  }
>  module_exit(mpls_exit);
>
> --
> 2.1.4
>

^ permalink raw reply

* Re: [PATCH net-next] cxgb4: avoid stall while shutting down the adapter
From: David Miller @ 2017-09-21 22:35 UTC (permalink / raw)
  To: ganeshgr; +Cc: netdev, nirranjan, indranil, venkatesh
In-Reply-To: <1505978447-14944-1-git-send-email-ganeshgr@chelsio.com>

From: Ganesh Goudar <ganeshgr@chelsio.com>
Date: Thu, 21 Sep 2017 12:50:47 +0530

> do not wait for completion while deleting the filters
> when the adapter is shutting down because we may not get
> the response as interrupts will be disabled.
> 
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next 1/1] net/smc: parameter cleanup in smc_cdc_get_free_slot()
From: David Miller @ 2017-09-21 22:33 UTC (permalink / raw)
  To: ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA,
	jwi-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	schwidefsky-tA70FqPdS9bQT0dZR+AlfA,
	heiko.carstens-tA70FqPdS9bQT0dZR+AlfA,
	raspl-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
In-Reply-To: <20170921071734.16977-1-ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

From: Ursula Braun <ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date: Thu, 21 Sep 2017 09:17:34 +0200

> Use the smc_connection as first parameter with smc_cdc_get_free_slot().
> This is just a small code cleanup, no functional change.
> 
> Signed-off-by: Ursula Braun <ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net 0/9] net/smc: bug fixes 2017-09-20
From: David Miller @ 2017-09-21 22:31 UTC (permalink / raw)
  To: ubraun
  Cc: netdev, linux-rdma, linux-s390, jwi, schwidefsky, heiko.carstens,
	raspl
In-Reply-To: <20170921071634.16883-1-ubraun@linux.vnet.ibm.com>

From: Ursula Braun <ubraun@linux.vnet.ibm.com>
Date: Thu, 21 Sep 2017 09:16:25 +0200

> here is a collection of small smc-patches built for net fixing
> smc problems in different areas.

Series applied, thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox