Netdev List
* Re: [PATCH 1/2 net-2.6.25] [IPV4] Remove extra argument from arp_ignore.
From: Denis V. Lunev @ 2008-01-14 14:43 UTC (permalink / raw)
  To: David Miller; +Cc: den, netdev, devel
In-Reply-To: <20080114.063753.42458530.davem@davemloft.net>

David Miller wrote:
> This patch series is numbered, but your other patch series sent a few
> moments ago had no sequence numbers in the subject lines or changelog.
> 
> How can I know what order to apply those in and do they need to go in
> before or after this set?
> 
> I shouldn't have to ask questions like this, so please help avoid
> confusion of this nature in the future.
> 
> Thanks.
> 

The non-numbered patches are independent of each other.

The numbered ones depend on each other, but not on the rest.

Sorry for the inconvenience :(

* Re: [PATCH 1/2 net-2.6.25] [IPV4] Remove extra argument from arp_ignore.
From: David Miller @ 2008-01-14 14:37 UTC (permalink / raw)
  To: den; +Cc: netdev, devel
In-Reply-To: <1200319481-18459-1-git-send-email-den@openvz.org>


This patch series is numbered, but your other patch series sent a few
moments ago had no sequence numbers in the subject lines or changelog.

How can I know what order to apply those in and do they need to go in
before or after this set?

I shouldn't have to ask questions like this, so please help avoid
confusion of this nature in the future.

Thanks.

* [PATCH 1/2 net-2.6.25] [IPV4] Remove extra argument from arp_ignore.
From: Denis V. Lunev @ 2008-01-14 14:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev

arp_ignore takes both dev and in_dev. dev is used only for the call to
inet_confirm_addr.

inet_confirm_addr, in turn, either gets in_dev from the device passed or
iterates over all network devices if the device passed is NULL. It seems
logical to directly pass in_dev into inet_confirm_addr.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
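For illustration only: a minimal user-space sketch of the calling convention
described above, where a non-NULL in_dev restricts the check to one interface
and NULL means "any interface". All toy_* names are hypothetical stand-ins,
not kernel types or APIs, and locking/RCU is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct in_device / struct net_device. */
struct toy_in_device {
	uint32_t addr;			/* single local address, 0 = none */
};

struct toy_net_device {
	struct toy_in_device *ip_ptr;	/* NULL if IPv4 is not configured */
};

/* Per-interface check: return the address if it is configured here, else 0. */
static uint32_t toy_confirm_addr_indev(const struct toy_in_device *in_dev,
				       uint32_t local)
{
	return (in_dev->addr == local) ? local : 0;
}

/*
 * in_dev != NULL: check only that interface.
 * in_dev == NULL: wildcard, walk every device in the table.
 */
static uint32_t toy_confirm_addr(const struct toy_in_device *in_dev,
				 const struct toy_net_device *devs,
				 size_t ndevs, uint32_t local)
{
	size_t i;

	assert(devs != NULL || ndevs == 0);

	if (in_dev != NULL)
		return toy_confirm_addr_indev(in_dev, local);

	for (i = 0; i < ndevs; i++) {
		uint32_t addr;

		if (devs[i].ip_ptr == NULL)
			continue;
		addr = toy_confirm_addr_indev(devs[i].ip_ptr, local);
		if (addr)
			return addr;
	}
	return 0;
}
```

The caller already holds the in_dev, so passing it directly avoids converting
back from the device inside the callee.
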
 include/linux/inetdevice.h |    2 +-
 net/ipv4/arp.c             |   11 +++++------
 net/ipv4/devinet.c         |   17 ++++++-----------
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index b3c5081..45f3731 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -135,7 +135,7 @@ extern int		devinet_ioctl(unsigned int cmd, void __user *);
 extern void		devinet_init(void);
 extern struct in_device	*inetdev_by_index(int);
 extern __be32		inet_select_addr(const struct net_device *dev, __be32 dst, int scope);
-extern __be32		inet_confirm_addr(const struct net_device *dev, __be32 dst, __be32 local, int scope);
+extern __be32		inet_confirm_addr(struct in_device *in_dev, __be32 dst, __be32 local, int scope);
 extern struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix, __be32 mask);
 
 static __inline__ int inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index e944d98..f38e1a9 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -382,8 +382,7 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 		read_unlock_bh(&neigh->lock);
 }
 
-static int arp_ignore(struct in_device *in_dev, struct net_device *dev,
-		      __be32 sip, __be32 tip)
+static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
 {
 	int scope;
 
@@ -403,7 +402,7 @@ static int arp_ignore(struct in_device *in_dev, struct net_device *dev,
 	case 3:	/* Do not reply for scope host addresses */
 		sip = 0;
 		scope = RT_SCOPE_LINK;
-		dev = NULL;
+		in_dev = NULL;
 		break;
 	case 4:	/* Reserved */
 	case 5:
@@ -415,7 +414,7 @@ static int arp_ignore(struct in_device *in_dev, struct net_device *dev,
 	default:
 		return 0;
 	}
-	return !inet_confirm_addr(dev, sip, tip, scope);
+	return !inet_confirm_addr(in_dev, sip, tip, scope);
 }
 
 static int arp_filter(__be32 sip, __be32 tip, struct net_device *dev)
@@ -807,7 +806,7 @@ static int arp_process(struct sk_buff *skb)
 	if (sip == 0) {
 		if (arp->ar_op == htons(ARPOP_REQUEST) &&
 		    inet_addr_type(&init_net, tip) == RTN_LOCAL &&
-		    !arp_ignore(in_dev,dev,sip,tip))
+		    !arp_ignore(in_dev, sip, tip))
 			arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
 				 dev->dev_addr, sha);
 		goto out;
@@ -825,7 +824,7 @@ static int arp_process(struct sk_buff *skb)
 				int dont_send = 0;
 
 				if (!dont_send)
-					dont_send |= arp_ignore(in_dev,dev,sip,tip);
+					dont_send |= arp_ignore(in_dev,sip,tip);
 				if (!dont_send && IN_DEV_ARPFILTER(in_dev))
 					dont_send |= arp_filter(sip,tip,dev);
 				if (!dont_send)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 03db15b..dc1665a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -968,24 +968,19 @@ static __be32 confirm_addr_indev(struct in_device *in_dev, __be32 dst,
 
 /*
  * Confirm that local IP address exists using wildcards:
- * - dev: only on this interface, 0=any interface
+ * - in_dev: only on this interface, 0=any interface
  * - dst: only in the same subnet as dst, 0=any dst
  * - local: address, 0=autoselect the local address
  * - scope: maximum allowed scope value for the local address
  */
-__be32 inet_confirm_addr(const struct net_device *dev, __be32 dst, __be32 local, int scope)
+__be32 inet_confirm_addr(struct in_device *in_dev,
+			 __be32 dst, __be32 local, int scope)
 {
 	__be32 addr = 0;
-	struct in_device *in_dev;
-
-	if (dev) {
-		rcu_read_lock();
-		if ((in_dev = __in_dev_get_rcu(dev)))
-			addr = confirm_addr_indev(in_dev, dst, local, scope);
-		rcu_read_unlock();
+	struct net_device *dev;
 
-		return addr;
-	}
+	if (in_dev != NULL)
+		return confirm_addr_indev(in_dev, dst, local, scope);
 
 	read_lock(&dev_base_lock);
 	rcu_read_lock();
-- 
1.5.3.rc5


* [PATCH 2/2 net-2.6.25] [NETNS] Process inet_confirm_addr in the correct namespace.
From: Denis V. Lunev @ 2008-01-14 14:04 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev
In-Reply-To: <1200319481-18459-1-git-send-email-den@openvz.org>

inet_confirm_addr can be called with NULL in_dev from arp_ignore iff
scope is RT_SCOPE_LINK.

Let's always pass the device and check for the RT_SCOPE_LINK scope inside
inet_confirm_addr. This lets us take the network namespace from in_device
without the need for an additional argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
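For illustration only: a user-space sketch of the idea described above, where
the interface is always passed in, the scope decides between a single-interface
check and a walk over all devices, and the namespace is derived from the
interface. The toy_* types and the TOY_SCOPE_* values are hypothetical; the
flat device table filtered by namespace is an assumption standing in for a
per-namespace device list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_SCOPE_HOST, TOY_SCOPE_LINK };	/* hypothetical scope values */

struct toy_net { int id; };
struct toy_net_device;

struct toy_in_device {
	struct toy_net_device *dev;	/* back-pointer to the device */
	uint32_t addr;
};

struct toy_net_device {
	struct toy_net *nd_net;		/* owning namespace */
	struct toy_in_device *ip_ptr;
};

static uint32_t toy_confirm_addr(struct toy_in_device *in_dev, int scope,
				 struct toy_net_device *devs, size_t ndevs,
				 uint32_t local)
{
	struct toy_net *net;
	size_t i;

	/* The interface is never NULL in this scheme. */
	assert(in_dev != NULL && in_dev->dev != NULL);

	if (scope != TOY_SCOPE_LINK)
		return (in_dev->addr == local) ? local : 0;

	/* Wildcard case: only look at devices in the caller's namespace. */
	net = in_dev->dev->nd_net;
	for (i = 0; i < ndevs; i++) {
		if (devs[i].nd_net != net || devs[i].ip_ptr == NULL)
			continue;
		if (devs[i].ip_ptr->addr == local)
			return local;
	}
	return 0;
}
```

Because the namespace comes from the interface itself, an address configured
in another namespace is never confirmed.
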
 net/ipv4/arp.c     |    1 -
 net/ipv4/devinet.c |    6 ++++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index f38e1a9..b715ec0 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -402,7 +400,6 @@ static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
 	case 3:	/* Do not reply for scope host addresses */
 		sip = 0;
 		scope = RT_SCOPE_LINK;
-		in_dev = NULL;
 		break;
 	case 4:	/* Reserved */
 	case 5:
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index dc1665a..4569c69 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -978,13 +978,15 @@ __be32 inet_confirm_addr(struct in_device *in_dev,
 {
 	__be32 addr = 0;
 	struct net_device *dev;
+	struct net *net;
 
-	if (in_dev != NULL)
+	if (scope != RT_SCOPE_LINK)
 		return confirm_addr_indev(in_dev, dst, local, scope);
 
+	net = in_dev->dev->nd_net;
 	read_lock(&dev_base_lock);
 	rcu_read_lock();
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev(net, dev) {
 		if ((in_dev = __in_dev_get_rcu(dev))) {
 			addr = confirm_addr_indev(in_dev, dst, local, scope);
 			if (addr)
-- 
1.5.3.rc5


* [PATCH net-2.6.25] [ARP] Move inet_addr_type call after simple error checks in arp_contructor.
From: Denis V. Lunev @ 2008-01-14 13:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, Denis V. Lunev

The neighbour entry will be destroyed in the case of an error, so it is
pointless to perform a costly routing table lookup in this case.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
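For illustration only: a small user-space sketch of the ordering argument
above, with cheap validity checks first and the expensive lookup only on the
success path. toy_addr_type is a hypothetical stand-in for the routing table
lookup, and the counter exists only to make the saved work visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int toy_lookup_calls;	/* how often the "costly" lookup has run */

/* Hypothetical stand-in for a routing table lookup. */
static int toy_addr_type(unsigned int addr)
{
	toy_lookup_calls++;
	return (addr == 0) ? 0 : 1;
}

struct toy_neigh {
	unsigned int addr;
	int type;
};

/* in_dev == NULL models a device without IPv4 configuration. */
static int toy_constructor(struct toy_neigh *neigh, const void *in_dev)
{
	assert(neigh != NULL);

	if (in_dev == NULL)
		return -EINVAL;		/* fail before doing any lookup */

	neigh->type = toy_addr_type(neigh->addr);
	return 0;
}
```

On the error path the entry is left untouched and the lookup counter does not
move.
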
 net/ipv4/arp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index b715ec0..49c24ff 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -235,8 +235,6 @@ static int arp_constructor(struct neighbour *neigh)
 	struct in_device *in_dev;
 	struct neigh_parms *parms;
 
-	neigh->type = inet_addr_type(&init_net, addr);
-
 	rcu_read_lock();
 	in_dev = __in_dev_get_rcu(dev);
 	if (in_dev == NULL) {
@@ -244,6 +242,8 @@ static int arp_constructor(struct neighbour *neigh)
 		return -EINVAL;
 	}
 
+	neigh->type = inet_addr_type(&init_net, addr);
+
 	parms = in_dev->arp_parms;
 	__neigh_parms_put(neigh->parms);
 	neigh->parms = neigh_parms_clone(parms);
-- 
1.5.3.rc5


* [PATCH net-2.6.25] [ARP] Remove overkill checks from neigh_param_alloc.
From: Denis V. Lunev @ 2008-01-14 13:48 UTC (permalink / raw)
  To: davem
  Cc: netdev, devel,
	den1/0006-ARP-Move-inet_addr_type-call-after-simple-error-ch.patch,
	Denis V. Lunev, Pavel Emelyanov
In-Reply-To: <1200318504-18215-2-git-send-email-den@openvz.org>

A valid network device is always passed into neigh_parms_alloc, so remove the
extra check for dev == NULL. Additionally, clean up the bogus netns assignment.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
---
 net/core/neighbour.c |   18 +++++++-----------
 1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index af49137..32f1a23 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1301,10 +1301,7 @@ struct neigh_parms *neigh_parms_alloc(struct net_device *dev,
 	struct neigh_parms *p, *ref;
 	struct net *net;
 
-	net = &init_net;
-	if (dev)
-		net = dev->nd_net;
-
+	net = dev->nd_net;
 	ref = lookup_neigh_params(tbl, net, 0);
 	if (!ref)
 		return NULL;
@@ -1316,15 +1313,14 @@ struct neigh_parms *neigh_parms_alloc(struct net_device *dev,
 		INIT_RCU_HEAD(&p->rcu_head);
 		p->reachable_time =
 				neigh_rand_reach_time(p->base_reachable_time);
-		if (dev) {
-			if (dev->neigh_setup && dev->neigh_setup(dev, p)) {
-				kfree(p);
-				return NULL;
-			}
 
-			dev_hold(dev);
-			p->dev = dev;
+		if (dev->neigh_setup && dev->neigh_setup(dev, p)) {
+			kfree(p);
+			return NULL;
 		}
+
+		dev_hold(dev);
+		p->dev = dev;
 		p->net = hold_net(net);
 		p->sysctl_table = NULL;
 		write_lock_bh(&tbl->lock);
-- 
1.5.3.rc5


* [PATCH net-2.6.25] [IPV4] fib_rules_unregister is essentially void.
From: Denis V. Lunev @ 2008-01-14 13:48 UTC (permalink / raw)
  To: davem
  Cc: netdev, devel,
	den1/0006-ARP-Move-inet_addr_type-call-after-simple-error-ch.patch,
	Denis V. Lunev
In-Reply-To: <1200318504-18215-1-git-send-email-den@openvz.org>

fib_rules_unregister is called only after a successful fib_rules_register, and
its return code is never checked.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/net/fib_rules.h |    2 +-
 net/core/fib_rules.c    |   21 ++++-----------------
 2 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 88f870f..34349f9 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -104,7 +104,7 @@ static inline u32 frh_get_table(struct fib_rule_hdr *frh, struct nlattr **nla)
 }
 
 extern int fib_rules_register(struct net *, struct fib_rules_ops *);
-extern int fib_rules_unregister(struct net *, struct fib_rules_ops *);
+extern void fib_rules_unregister(struct net *, struct fib_rules_ops *);
 extern void                     fib_rules_cleanup_ops(struct fib_rules_ops *);
 
 extern int			fib_rules_lookup(struct fib_rules_ops *,
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 0eecf4c..42ccaf5 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -115,29 +115,16 @@ void fib_rules_cleanup_ops(struct fib_rules_ops *ops)
 }
 EXPORT_SYMBOL_GPL(fib_rules_cleanup_ops);
 
-int fib_rules_unregister(struct net *net, struct fib_rules_ops *ops)
+void fib_rules_unregister(struct net *net, struct fib_rules_ops *ops)
 {
-	int err = 0;
-	struct fib_rules_ops *o;
 
 	spin_lock(&net->rules_mod_lock);
-	list_for_each_entry(o, &net->rules_ops, list) {
-		if (o == ops) {
-			list_del_rcu(&o->list);
-			fib_rules_cleanup_ops(ops);
-			goto out;
-		}
-	}
-
-	err = -ENOENT;
-out:
+	list_del_rcu(&ops->list);
+	fib_rules_cleanup_ops(ops);
 	spin_unlock(&net->rules_mod_lock);
 
 	synchronize_rcu();
-	if (!err)
-		release_net(net);
-
-	return err;
+	release_net(net);
 }
 
 EXPORT_SYMBOL_GPL(fib_rules_unregister);
-- 
1.5.3.rc5


* [PATCH net-2.6.25] [NETNS] Make arp code network namespace consistent.
From: Denis V. Lunev @ 2008-01-14 13:48 UTC (permalink / raw)
  To: davem
  Cc: netdev, devel,
	den1/0006-ARP-Move-inet_addr_type-call-after-simple-error-ch.patch,
	Denis V. Lunev

Some functions in arp.c take the network namespace as an argument. Using
init_net inside these functions is simply inconsistent. Fix this.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/arp.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 49c24ff..0db7d49 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -969,13 +969,13 @@ static int arp_req_set_public(struct net *net, struct arpreq *r,
 	if (mask && mask != htonl(0xFFFFFFFF))
 		return -EINVAL;
 	if (!dev && (r->arp_flags & ATF_COM)) {
-		dev = dev_getbyhwaddr(&init_net, r->arp_ha.sa_family,
+		dev = dev_getbyhwaddr(net, r->arp_ha.sa_family,
 				r->arp_ha.sa_data);
 		if (!dev)
 			return -ENODEV;
 	}
 	if (mask) {
-		if (pneigh_lookup(&arp_tbl, &init_net, &ip, dev, 1) == NULL)
+		if (pneigh_lookup(&arp_tbl, net, &ip, dev, 1) == NULL)
 			return -ENOBUFS;
 		return 0;
 	}
@@ -1084,7 +1084,7 @@ static int arp_req_delete_public(struct net *net, struct arpreq *r,
 	__be32 mask = ((struct sockaddr_in *)&r->arp_netmask)->sin_addr.s_addr;
 
 	if (mask == htonl(0xFFFFFFFF))
-		return pneigh_delete(&arp_tbl, &init_net, &ip, dev);
+		return pneigh_delete(&arp_tbl, net, &ip, dev);
 
 	if (mask)
 		return -EINVAL;
@@ -1162,7 +1162,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	rtnl_lock();
 	if (r.arp_dev[0]) {
 		err = -ENODEV;
-		if ((dev = __dev_get_by_name(&init_net, r.arp_dev)) == NULL)
+		if ((dev = __dev_get_by_name(net, r.arp_dev)) == NULL)
 			goto out;
 
 		/* Mmmm... It is wrong... ARPHRD_NETROM==0 */
-- 
1.5.3.rc5


* [PATCH net-2.6.25] [ARP] neigh_parms_put(destroy) are essentially local to core/neighbour.c.
From: Denis V. Lunev @ 2008-01-14 13:48 UTC (permalink / raw)
  To: davem
  Cc: netdev, devel,
	den1/0006-ARP-Move-inet_addr_type-call-after-simple-error-ch.patch,
	Denis V. Lunev
In-Reply-To: <1200318504-18215-4-git-send-email-den@openvz.org>

Make them static.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/net/neighbour.h |    7 -------
 net/core/neighbour.c    |   11 ++++++++++-
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a0d42a5..ebbfb50 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -213,7 +213,6 @@ extern struct neighbour 	*neigh_event_ns(struct neigh_table *tbl,
 
 extern struct neigh_parms	*neigh_parms_alloc(struct net_device *dev, struct neigh_table *tbl);
 extern void			neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms);
-extern void			neigh_parms_destroy(struct neigh_parms *parms);
 extern unsigned long		neigh_rand_reach_time(unsigned long base);
 
 extern void			pneigh_enqueue(struct neigh_table *tbl, struct neigh_parms *p,
@@ -254,12 +253,6 @@ static inline void __neigh_parms_put(struct neigh_parms *parms)
 	atomic_dec(&parms->refcnt);
 }
 
-static inline void neigh_parms_put(struct neigh_parms *parms)
-{
-	if (atomic_dec_and_test(&parms->refcnt))
-		neigh_parms_destroy(parms);
-}
-
 static inline struct neigh_parms *neigh_parms_clone(struct neigh_parms *parms)
 {
 	atomic_inc(&parms->refcnt);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 9b0b773..41394db 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -55,6 +55,8 @@
 
 #define PNEIGH_HASHMASK		0xF
 
+static inline void neigh_parms_put(struct neigh_parms *parms);
+
 static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
@@ -1348,7 +1350,7 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 	NEIGH_PRINTK1("neigh_parms_release: not found\n");
 }
 
-void neigh_parms_destroy(struct neigh_parms *parms)
+static void neigh_parms_destroy(struct neigh_parms *parms)
 {
 	release_net(parms->net);
 	if (parms->dev)
@@ -1356,6 +1358,13 @@ void neigh_parms_destroy(struct neigh_parms *parms)
 	kfree(parms);
 }
 
+static inline void neigh_parms_put(struct neigh_parms *parms)
+{
+	if (atomic_dec_and_test(&parms->refcnt))
+		neigh_parms_destroy(parms);
+}
+
+
 static struct lock_class_key neigh_table_proxy_queue_class;
 
 void neigh_table_init_no_netlink(struct neigh_table *tbl)
-- 
1.5.3.rc5


* [PATCH net-2.6.25] [ARP] Remove forward declaration of neigh_changeaddr.
From: Denis V. Lunev @ 2008-01-14 13:48 UTC (permalink / raw)
  To: davem
  Cc: netdev, devel,
	den1/0006-ARP-Move-inet_addr_type-call-after-simple-error-ch.patch,
	Denis V. Lunev
In-Reply-To: <1200318504-18215-3-git-send-email-den@openvz.org>

No need for this. It is already declared in neighbour.h.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/core/neighbour.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2eab6a5..9b0b773 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -59,7 +59,6 @@ static void neigh_timer_handler(unsigned long arg);
 static void __neigh_notify(struct neighbour *n, int type, int flags);
 static void neigh_update_notify(struct neighbour *neigh);
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
-void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 
 static struct neigh_table *neigh_tables;
 #ifdef CONFIG_PROC_FS
-- 
1.5.3.rc5


* Re: [PATCH net-2.6.25 4/4][NETNS][RAW]: Create the /proc/net/raw(6) in each namespace.
From: David Miller @ 2008-01-14 13:37 UTC (permalink / raw)
  To: xemul; +Cc: netdev, devel
In-Reply-To: <478B5FE4.3020602@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Mon, 14 Jan 2008 16:13:08 +0300

> To do so, just register the proper subsystem and create files in
> ->init callbacks.
> 
> No other special per-namespace handling for raw sockets is required.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

* Re: [PATCH net-2.6.25 3/4][NETNS][RAW]: Eliminate explicit init_net references.
From: David Miller @ 2008-01-14 13:37 UTC (permalink / raw)
  To: xemul; +Cc: netdev, devel
In-Reply-To: <478B5F75.70608@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Mon, 14 Jan 2008 16:11:17 +0300

> Happily, in all the remaining places (->bind callbacks only) that require
> the struct net, we have a socket, so get the net from it.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

* Re: [PATCH net-2.6.25 2/4][NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list.
From: David Miller @ 2008-01-14 13:37 UTC (permalink / raw)
  To: xemul; +Cc: netdev, devel
In-Reply-To: <478B5EDA.4000407@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Mon, 14 Jan 2008 16:08:42 +0300

> Pull the struct net pointer up to the showing functions
> to filter the sockets depending on their namespaces.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

* Re: [PATCH net-2.6.25 1/4][NETNS][RAW]: Make ipv[46] raw sockets lookup namespaces aware.
From: David Miller @ 2008-01-14 13:36 UTC (permalink / raw)
  To: xemul; +Cc: netdev, devel
In-Reply-To: <478B5DBB.6020603@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Mon, 14 Jan 2008 16:03:55 +0300

> This only requires passing the appropriate struct net pointer
> into __raw_v[46]_lookup and skipping sockets that do not belong
> to the needed namespace.
> 
> The proper net is taken from skb->dev in all cases.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

Applied.

* [PATCH net-2.6.25 4/4][NETNS][RAW]: Create the /proc/net/raw(6) in each namespace.
From: Pavel Emelyanov @ 2008-01-14 13:13 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel

To do so, just register the proper subsystem and create files in
->init callbacks.

No other special per-namespace handling for raw sockets is required.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
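For illustration only: a user-space model of the per-namespace init/exit
callback pattern described above. All toy_* names are hypothetical and this is
not the kernel's pernet implementation; in particular, the sketch only covers
namespaces that already exist at registration time and ignores namespaces
created later.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NETS 4

struct toy_net {
	int has_raw_file;	/* models a per-namespace /proc/net/raw entry */
};

struct toy_pernet_ops {
	int  (*init)(struct toy_net *net);
	void (*exit)(struct toy_net *net);
};

static struct toy_net *toy_nets[TOY_MAX_NETS];
static size_t toy_nr_nets;

/* Run ->init for every known namespace; undo completed ones on failure. */
static int toy_register_pernet(const struct toy_pernet_ops *ops)
{
	size_t i;

	assert(ops != NULL && ops->init != NULL && ops->exit != NULL);

	for (i = 0; i < toy_nr_nets; i++) {
		int err = ops->init(toy_nets[i]);

		if (err) {
			while (i-- > 0)
				ops->exit(toy_nets[i]);
			return err;
		}
	}
	return 0;
}

/* Run ->exit for every known namespace. */
static void toy_unregister_pernet(const struct toy_pernet_ops *ops)
{
	size_t i;

	for (i = 0; i < toy_nr_nets; i++)
		ops->exit(toy_nets[i]);
}

static int toy_raw_init_net(struct toy_net *net)
{
	net->has_raw_file = 1;	/* "create" the file in this namespace */
	return 0;
}

static void toy_raw_exit_net(struct toy_net *net)
{
	net->has_raw_file = 0;	/* "remove" it again */
}

static const struct toy_pernet_ops toy_raw_net_ops = {
	.init = toy_raw_init_net,
	.exit = toy_raw_exit_net,
};
```

The subsystem's own init and exit routines then shrink to one register and one
unregister call, and the file handling lives in the callbacks.
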
 net/ipv4/raw.c |   22 +++++++++++++++++++---
 net/ipv6/raw.c |   22 +++++++++++++++++++---
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 206c869..91a5218 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -1003,15 +1003,31 @@ static const struct file_operations raw_seq_fops = {
 	.release = seq_release_net,
 };
 
-int __init raw_proc_init(void)
+static __net_init int raw_init_net(struct net *net)
 {
-	if (!proc_net_fops_create(&init_net, "raw", S_IRUGO, &raw_seq_fops))
+	if (!proc_net_fops_create(net, "raw", S_IRUGO, &raw_seq_fops))
 		return -ENOMEM;
+
 	return 0;
 }
 
+static __net_exit void raw_exit_net(struct net *net)
+{
+	proc_net_remove(net, "raw");
+}
+
+static __net_initdata struct pernet_operations raw_net_ops = {
+	.init = raw_init_net,
+	.exit = raw_exit_net,
+};
+
+int __init raw_proc_init(void)
+{
+	return register_pernet_subsys(&raw_net_ops);
+}
+
 void __init raw_proc_exit(void)
 {
-	proc_net_remove(&init_net, "raw");
+	unregister_pernet_subsys(&raw_net_ops);
 }
 #endif /* CONFIG_PROC_FS */
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 970529e..4d88055 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1270,16 +1270,32 @@ static const struct file_operations raw6_seq_fops = {
 	.release =	seq_release_net,
 };
 
-int __init raw6_proc_init(void)
+static int raw6_init_net(struct net *net)
 {
-	if (!proc_net_fops_create(&init_net, "raw6", S_IRUGO, &raw6_seq_fops))
+	if (!proc_net_fops_create(net, "raw6", S_IRUGO, &raw6_seq_fops))
 		return -ENOMEM;
+
 	return 0;
 }
 
+static void raw6_exit_net(struct net *net)
+{
+	proc_net_remove(net, "raw6");
+}
+
+static struct pernet_operations raw6_net_ops = {
+	.init = raw6_init_net,
+	.exit = raw6_exit_net,
+};
+
+int __init raw6_proc_init(void)
+{
+	return register_pernet_subsys(&raw6_net_ops);
+}
+
 void raw6_proc_exit(void)
 {
-	proc_net_remove(&init_net, "raw6");
+	unregister_pernet_subsys(&raw6_net_ops);
 }
 #endif	/* CONFIG_PROC_FS */
 
-- 
1.5.3.4


* [PATCH net-2.6.25 3/4][NETNS][RAW]: Eliminate explicit init_net references.
From: Pavel Emelyanov @ 2008-01-14 13:11 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel

Happily, in all the remaining places (->bind callbacks only) that require
the struct net, we have a socket, so get the net from it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
 net/ipv4/raw.c |    2 +-
 net/ipv6/raw.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 4e95372..206c869 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -625,7 +625,7 @@ static int raw_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 
 	if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_in))
 		goto out;
-	chk_addr_ret = inet_addr_type(&init_net, addr->sin_addr.s_addr);
+	chk_addr_ret = inet_addr_type(sk->sk_net, addr->sin_addr.s_addr);
 	ret = -EADDRNOTAVAIL;
 	if (addr->sin_addr.s_addr && chk_addr_ret != RTN_LOCAL &&
 	    chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST)
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 026fa91..970529e 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -291,7 +291,7 @@ static int rawv6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 			if (!sk->sk_bound_dev_if)
 				goto out;
 
-			dev = dev_get_by_index(&init_net, sk->sk_bound_dev_if);
+			dev = dev_get_by_index(sk->sk_net, sk->sk_bound_dev_if);
 			if (!dev) {
 				err = -ENODEV;
 				goto out;
@@ -304,7 +304,7 @@ static int rawv6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		v4addr = LOOPBACK4_IPV6;
 		if (!(addr_type & IPV6_ADDR_MULTICAST))	{
 			err = -EADDRNOTAVAIL;
-			if (!ipv6_chk_addr(&init_net, &addr->sin6_addr,
+			if (!ipv6_chk_addr(sk->sk_net, &addr->sin6_addr,
 					   dev, 0)) {
 				if (dev)
 					dev_put(dev);
-- 
1.5.3.4


* [PATCH net-2.6.25 2/4][NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list.
From: Pavel Emelyanov @ 2008-01-14 13:08 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel

Pull the struct net pointer up to the showing functions
to filter the sockets depending on their namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
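For illustration only: a user-space sketch of the filtering described above,
where a socket is listed only if both its namespace and its address family
match the iterator state. The toy_* types are hypothetical stand-ins for the
socket and namespace structures, and the hash buckets are reduced to a single
linked list.

```c
#include <assert.h>
#include <stddef.h>

struct toy_net { int id; };

struct toy_sock {
	struct toy_net *sk_net;		/* namespace the socket belongs to */
	unsigned short sk_family;
	struct toy_sock *next;
};

/* A socket is shown only if namespace AND family both match. */
static int toy_sock_matches(const struct toy_sock *sk,
			    const struct toy_net *net, unsigned short family)
{
	return sk->sk_net == net && sk->sk_family == family;
}

/* Return the first matching socket at or after sk, or NULL if none. */
static struct toy_sock *toy_first_match(struct toy_sock *sk,
					const struct toy_net *net,
					unsigned short family)
{
	assert(net != NULL);

	while (sk != NULL && !toy_sock_matches(sk, net, family))
		sk = sk->next;
	return sk;
}
```

Negating the match gives the skip condition for an iterator loop: keep
advancing while the namespace differs or the family differs.
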
 include/net/raw.h |    3 ++-
 net/ipv4/raw.c    |   20 ++++++++++++--------
 net/ipv6/raw.c    |    4 ++--
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/net/raw.h b/include/net/raw.h
index 4d1aba0..cca81d8 100644
--- a/include/net/raw.h
+++ b/include/net/raw.h
@@ -39,6 +39,7 @@ extern int  raw_proc_init(void);
 extern void raw_proc_exit(void);
 
 struct raw_iter_state {
+	struct seq_net_private p;
 	int bucket;
 	unsigned short family;
 	struct raw_hashinfo *h;
@@ -48,7 +49,7 @@ struct raw_iter_state {
 void *raw_seq_start(struct seq_file *seq, loff_t *pos);
 void *raw_seq_next(struct seq_file *seq, void *v, loff_t *pos);
 void raw_seq_stop(struct seq_file *seq, void *v);
-int raw_seq_open(struct file *file, struct raw_hashinfo *h,
+int raw_seq_open(struct inode *ino, struct file *file, struct raw_hashinfo *h,
 		unsigned short family);
 
 #endif
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index a490a9d..4e95372 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -860,7 +860,8 @@ static struct sock *raw_get_first(struct seq_file *seq)
 		struct hlist_node *node;
 
 		sk_for_each(sk, node, &state->h->ht[state->bucket])
-			if (sk->sk_family == state->family)
+			if (sk->sk_net == state->p.net &&
+					sk->sk_family == state->family)
 				goto found;
 	}
 	sk = NULL;
@@ -876,7 +877,8 @@ static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk)
 		sk = sk_next(sk);
 try_again:
 		;
-	} while (sk && sk->sk_family != state->family);
+	} while (sk && (sk->sk_net != state->p.net ||
+			sk->sk_family != state->family));
 
 	if (!sk && ++state->bucket < RAW_HTABLE_SIZE) {
 		sk = sk_head(&state->h->ht[state->bucket]);
@@ -970,16 +972,18 @@ static const struct seq_operations raw_seq_ops = {
 	.show  = raw_seq_show,
 };
 
-int raw_seq_open(struct file *file, struct raw_hashinfo *h,
+int raw_seq_open(struct inode *ino, struct file *file, struct raw_hashinfo *h,
 		unsigned short family)
 {
+	int err;
 	struct raw_iter_state *i;
 
-	i = __seq_open_private(file, &raw_seq_ops,
+	err = seq_open_net(ino, file, &raw_seq_ops,
 			sizeof(struct raw_iter_state));
-	if (i == NULL)
-		return -ENOMEM;
+	if (err < 0)
+		return err;
 
+	i = raw_seq_private((struct seq_file *)file->private_data);
 	i->h = h;
 	i->family = family;
 	return 0;
@@ -988,7 +992,7 @@ EXPORT_SYMBOL_GPL(raw_seq_open);
 
 static int raw_v4_seq_open(struct inode *inode, struct file *file)
 {
-	return raw_seq_open(file, &raw_v4_hashinfo, PF_INET);
+	return raw_seq_open(inode, file, &raw_v4_hashinfo, PF_INET);
 }
 
 static const struct file_operations raw_seq_fops = {
@@ -996,7 +1000,7 @@ static const struct file_operations raw_seq_fops = {
 	.open	 = raw_v4_seq_open,
 	.read	 = seq_read,
 	.llseek	 = seq_lseek,
-	.release = seq_release_private,
+	.release = seq_release_net,
 };
 
 int __init raw_proc_init(void)
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 6f20086..026fa91 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1259,7 +1259,7 @@ static const struct seq_operations raw6_seq_ops = {
 
 static int raw6_seq_open(struct inode *inode, struct file *file)
 {
-	return raw_seq_open(file, &raw_v6_hashinfo, PF_INET6);
+	return raw_seq_open(inode, file, &raw_v6_hashinfo, PF_INET6);
 }
 
 static const struct file_operations raw6_seq_fops = {
@@ -1267,7 +1267,7 @@ static const struct file_operations raw6_seq_fops = {
 	.open =		raw6_seq_open,
 	.read =		seq_read,
 	.llseek =	seq_lseek,
-	.release =	seq_release_private,
+	.release =	seq_release_net,
 };
 
 int __init raw6_proc_init(void)
-- 
1.5.3.4


* [PATCH net-2.6.25 1/4][NETNS][RAW]: Make ipv[46] raw sockets lookup namespaces aware.
From: Pavel Emelyanov @ 2008-01-14 13:03 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel

This only requires passing the appropriate struct net pointer
into __raw_v[46]_lookup and skipping sockets that do not belong
to the needed namespace.

The proper net is taken from skb->dev in all cases.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---
 net/ipv4/raw.c |   21 +++++++++++++--------
 net/ipv6/raw.c |   18 +++++++++++++-----
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 747911a..a490a9d 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -116,16 +116,15 @@ static void raw_v4_unhash(struct sock *sk)
 	raw_unhash_sk(sk, &raw_v4_hashinfo);
 }
 
-static struct sock *__raw_v4_lookup(struct sock *sk, unsigned short num,
-			     __be32 raddr, __be32 laddr,
-			     int dif)
+static struct sock *__raw_v4_lookup(struct net *net, struct sock *sk,
+		unsigned short num, __be32 raddr, __be32 laddr, int dif)
 {
 	struct hlist_node *node;
 
 	sk_for_each_from(sk, node) {
 		struct inet_sock *inet = inet_sk(sk);
 
-		if (inet->num == num 					&&
+		if (sk->sk_net == net && inet->num == num 		&&
 		    !(inet->daddr && inet->daddr != raddr) 		&&
 		    !(inet->rcv_saddr && inet->rcv_saddr != laddr)	&&
 		    !(sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif))
@@ -169,12 +168,15 @@ static int raw_v4_input(struct sk_buff *skb, struct iphdr *iph, int hash)
 	struct sock *sk;
 	struct hlist_head *head;
 	int delivered = 0;
+	struct net *net;
 
 	read_lock(&raw_v4_hashinfo.lock);
 	head = &raw_v4_hashinfo.ht[hash];
 	if (hlist_empty(head))
 		goto out;
-	sk = __raw_v4_lookup(__sk_head(head), iph->protocol,
+
+	net = skb->dev->nd_net;
+	sk = __raw_v4_lookup(net, __sk_head(head), iph->protocol,
 			     iph->saddr, iph->daddr,
 			     skb->dev->ifindex);
 
@@ -187,7 +189,7 @@ static int raw_v4_input(struct sk_buff *skb, struct iphdr *iph, int hash)
 			if (clone)
 				raw_rcv(sk, clone);
 		}
-		sk = __raw_v4_lookup(sk_next(sk), iph->protocol,
+		sk = __raw_v4_lookup(net, sk_next(sk), iph->protocol,
 				     iph->saddr, iph->daddr,
 				     skb->dev->ifindex);
 	}
@@ -273,6 +275,7 @@ void raw_icmp_error(struct sk_buff *skb, int protocol, u32 info)
 	int hash;
 	struct sock *raw_sk;
 	struct iphdr *iph;
+	struct net *net;
 
 	hash = protocol & (RAW_HTABLE_SIZE - 1);
 
@@ -280,8 +283,10 @@ void raw_icmp_error(struct sk_buff *skb, int protocol, u32 info)
 	raw_sk = sk_head(&raw_v4_hashinfo.ht[hash]);
 	if (raw_sk != NULL) {
 		iph = (struct iphdr *)skb->data;
-		while ((raw_sk = __raw_v4_lookup(raw_sk, protocol, iph->daddr,
-						iph->saddr,
+		net = skb->dev->nd_net;
+
+		while ((raw_sk = __raw_v4_lookup(net, raw_sk, protocol,
+						iph->daddr, iph->saddr,
 						skb->dev->ifindex)) != NULL) {
 			raw_err(raw_sk, skb, info);
 			raw_sk = sk_next(raw_sk);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index cb0b110..6f20086 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -76,8 +76,9 @@ static void raw_v6_unhash(struct sock *sk)
 }
 
 
-static struct sock *__raw_v6_lookup(struct sock *sk, unsigned short num,
-		struct in6_addr *loc_addr, struct in6_addr *rmt_addr, int dif)
+static struct sock *__raw_v6_lookup(struct net *net, struct sock *sk,
+		unsigned short num, struct in6_addr *loc_addr,
+		struct in6_addr *rmt_addr, int dif)
 {
 	struct hlist_node *node;
 	int is_multicast = ipv6_addr_is_multicast(loc_addr);
@@ -86,6 +87,9 @@ static struct sock *__raw_v6_lookup(struct sock *sk, unsigned short num,
 		if (inet_sk(sk)->num == num) {
 			struct ipv6_pinfo *np = inet6_sk(sk);
 
+			if (sk->sk_net != net)
+				continue;
+
 			if (!ipv6_addr_any(&np->daddr) &&
 			    !ipv6_addr_equal(&np->daddr, rmt_addr))
 				continue;
@@ -165,6 +169,7 @@ static int ipv6_raw_deliver(struct sk_buff *skb, int nexthdr)
 	struct sock *sk;
 	int delivered = 0;
 	__u8 hash;
+	struct net *net;
 
 	saddr = &ipv6_hdr(skb)->saddr;
 	daddr = saddr + 1;
@@ -182,7 +187,8 @@ static int ipv6_raw_deliver(struct sk_buff *skb, int nexthdr)
 	if (sk == NULL)
 		goto out;
 
-	sk = __raw_v6_lookup(sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);
+	net = skb->dev->nd_net;
+	sk = __raw_v6_lookup(net, sk, nexthdr, daddr, saddr, IP6CB(skb)->iif);
 
 	while (sk) {
 		int filtered;
@@ -225,7 +231,7 @@ static int ipv6_raw_deliver(struct sk_buff *skb, int nexthdr)
 				rawv6_rcv(sk, clone);
 			}
 		}
-		sk = __raw_v6_lookup(sk_next(sk), nexthdr, daddr, saddr,
+		sk = __raw_v6_lookup(net, sk_next(sk), nexthdr, daddr, saddr,
 				     IP6CB(skb)->iif);
 	}
 out:
@@ -359,6 +365,7 @@ void raw6_icmp_error(struct sk_buff *skb, int nexthdr,
 	struct sock *sk;
 	int hash;
 	struct in6_addr *saddr, *daddr;
+	struct net *net;
 
 	hash = nexthdr & (RAW_HTABLE_SIZE - 1);
 
@@ -367,8 +374,9 @@ void raw6_icmp_error(struct sk_buff *skb, int nexthdr,
 	if (sk != NULL) {
 		saddr = &ipv6_hdr(skb)->saddr;
 		daddr = &ipv6_hdr(skb)->daddr;
+		net = skb->dev->nd_net;
 
-		while ((sk = __raw_v6_lookup(sk, nexthdr, saddr, daddr,
+		while ((sk = __raw_v6_lookup(net, sk, nexthdr, saddr, daddr,
 						IP6CB(skb)->iif))) {
 			rawv6_err(sk, skb, NULL, type, code,
 					inner_offset, info);
-- 
1.5.3.4
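
For readers following the diff above, the effect of the extra "net"
argument can be sketched outside the kernel. This is a minimal sketch
only: struct mock_sock and mock_raw_lookup() are hypothetical stand-ins
invented for illustration, not the kernel's struct sock or
__raw_v4_lookup(). It shows a socket in another namespace being skipped
even when protocol, addresses and interface all match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's struct net and struct sock. */
struct net { int id; };

struct mock_sock {
	struct net *sk_net;      /* namespace the socket belongs to */
	uint16_t num;            /* protocol number */
	uint32_t daddr;          /* connected peer, 0 = wildcard */
	uint32_t rcv_saddr;      /* bound local address, 0 = wildcard */
	int bound_dev_if;        /* bound interface, 0 = wildcard */
	struct mock_sock *next;  /* next socket in the hash chain */
};

/*
 * Walk the chain starting at sk and return the first socket that lives
 * in the given namespace and matches protocol, addresses and interface.
 */
static struct mock_sock *mock_raw_lookup(struct net *net, struct mock_sock *sk,
		uint16_t num, uint32_t raddr, uint32_t laddr, int dif)
{
	assert(net != NULL);

	for (; sk != NULL; sk = sk->next) {
		if (sk->sk_net == net && sk->num == num &&
		    !(sk->daddr && sk->daddr != raddr) &&
		    !(sk->rcv_saddr && sk->rcv_saddr != laddr) &&
		    !(sk->bound_dev_if && sk->bound_dev_if != dif))
			return sk;
	}
	return NULL;
}
```

As in the patch, the caller continues the walk by passing the next
socket in the chain back in with the same "net".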



* Re: [PATCH 2.6.23+] ingress classify to [nf]mark
From: jamal @ 2008-01-14 12:56 UTC (permalink / raw)
  To: mahatma; +Cc: netdev
In-Reply-To: <478B8250.90602@bspu.unibel.by>

On Mon, 2008-14-01 at 13:40 -0200, Dzianis Kahanovich wrote:
> jamal wrote:

> Yes, I only did it out of inertia after "#define tc_index mark".

And I am afraid this bothers me greatly.
You already have ways to achieve what you need by setting the proper
policy; the only difference in configuration is one extra policy line you
have to type in. Adding yet another #ifdef is really going overboard.

> I do not understand why "tc_index" is changed at this point: 1) this is
> ingress, and 2) the action is "OK". Will "tc_index" not be changed after
> "tc filter add dev eth0 parent ffff: ... flowid 1:1 action continue"? In
> general, is tc_index useful in ingress? (Maybe tc_index is used in
> [nf]mark style, but even in netfilter that feature migrates; IMHO, maybe
> from time to time I just do not look in the right place.)

tc_index could be used for classification actually. If you "continue"
you could hit another classifier which looks at it.

> Sorry, I just shifted focus from the existing "tc_index=..." to the general behaviour ;)

> [...]
> > Please refer to what I said above; if what I said still doesn't make
> > sense I can create (the simple) patch.
> 
> A bit vague... sorry...

I mean:

#ifdef CONFIG_NET_CLS_ACT
.... leave this part alone which already sets tc_index ...
#else
...set tc_index and mark here ...
#endif

And once we have a metadata action, we remove the setting of tc_index
from the #ifdef CONFIG_NET_CLS_ACT branch.
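
To make the layout concrete, here is a minimal sketch of that #ifdef
split, compilable outside the kernel. Everything named mock_* or MOCK_*
is a hypothetical stand-in, and setting mark to the full classid is an
assumption made only for illustration; the thread does not say which
value the patch stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields discussed here. */
struct mock_skb {
	uint16_t tc_index;
	uint32_t mark;
};

/* Low 16 bits (minor number) of a classid. */
#define MOCK_TC_H_MIN(h) ((h) & 0x0000FFFFU)

/* Tag a packet after the ingress classifier returned "classid". */
static void mock_ingress_tag(struct mock_skb *skb, uint32_t classid)
{
	assert(skb != NULL);
#ifdef CONFIG_NET_CLS_ACT
	/* Left alone: this branch already sets tc_index; an action may set mark. */
	skb->tc_index = MOCK_TC_H_MIN(classid);
#else
	/* No actions compiled in: set both tc_index and mark here. */
	skb->tc_index = MOCK_TC_H_MIN(classid);
	skb->mark = classid; /* assumption: the full classid goes into mark */
#endif
}
```

Built without CONFIG_NET_CLS_ACT defined, a classid of 1:2 (0x00010002)
leaves tc_index at 2 and mark at 0x00010002.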

Did that make sense?

cheers,
jamal



* Re: [PATCH 2/9] get rid of unused revision element
From: David Miller @ 2008-01-14 12:06 UTC (permalink / raw)
  To: Robert.Olsson; +Cc: shemminger, robert.olsson, netdev, stephen.hemminger
In-Reply-To: <18315.19232.437801.553809@robur.slu.se>

From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Mon, 14 Jan 2008 12:44:32 +0100

>  The idea was to have a selective flush of route cache entries when
>  a fib insert/delete happened. From what I remember you added another/
>  better solution. Just a list with route cache entries pointing to parent 
>  route. So yes this was obsoleted by your/our effort to avoid total 
>  flushing of the route cache. Unfinished work.

Yes, that's right.  The synchronization was very hard.

But there is another issue, see below....

>  According to  http://bgpupdates.potaroo.net/instability/bgpupd.html
>  (last on the page) we currently flush the route cache 2.80 times per second
>  when using full Internet routing with Linux. Maybe we're forced to pick 
>  up this thread again someday.

This proves we need to solve this problem.

The reason I've never gone back to that work is that I didn't
want to do it while we still had multiple FIB data structure
implementations.

Someone needs to go over whatever deficiencies exist in fib_trie
vs. fib_hash so that we can delete fib_hash and move over to using
fib_trie always.  It makes no sense to implement everything
interfacing into that code twice.

There was a full consensus that this was the way to move forward,
we just need the dirty work to be done.

If someone wants to show their gratitude for my getting rid of
the multipath cached routing code, the above work would be a
great way to do so (hint hint) :-)


* Re: Netconf at conf.au 2008?
From: David Miller @ 2008-01-14 11:57 UTC (permalink / raw)
  To: gdt; +Cc: netdev
In-Reply-To: <1200308480.5882.39.camel@andromache>

From: Glen Turner <gdt@gdt.id.au>
Date: Mon, 14 Jan 2008 21:31:20 +1030

> If you have trouble hosting netconf next year please get in
> touch.  Although AARNet could not sponsor the flights or
> accommodation we could provide everything else you require.

I appreciate the offer, but the funding is the one and only issue.

I have more ways to get conference facilities and network connectivity
for them than I need.

> so if you and the netdev people were willing to hold a one-day
> training/workshop of interest to academic and research network
> users (say network host tuning) before/after your meeting then
> we'd have no trouble getting sponsorship via the training
> event that would be sufficient to cover the airfares and
> accommodation for the larger event.

We're really not interested in things like this, but thanks for
mentioning it.

At best what we do is allow one or two representatives from a major
sponsoring party to attend and give a presentation (netconf is invite
only, so this is a big deal).  And that presentation must be on
practical work the presenter has done or is doing with the Linux
networking upstream (ie. it can't be a "our company needs the Linux
networking to do X", some sales/marketing pitch, or "here's cool
proprietary Y we're doing with Linux")


* Re: [PATCH 2/9] get rid of unused revision element
From: Robert Olsson @ 2008-01-14 11:44 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, robert.olsson, netdev, stephen.hemminger
In-Reply-To: <20080112.205059.223738374.davem@davemloft.net>


David Miller writes:

 > > The revision element must have been part of an earlier design,
 > > because currently it is set but never used.
 > > 
 > > Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
 > 
 > I suspect Robert wanted to play around with some generation
 > ID optimizations but never got around to it.

 The idea was to have a selective flush of route cache entries when
 a fib insert/delete happened. From what I remember you added another/
 better solution. Just a list with route cache entries pointing to parent 
 route. So yes this was obsoleted by your/our effort to avoid total 
 flushing of the route cache. Unfinished work.
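
 A rough sketch of that "list of cache entries pointing to the parent
 route" idea follows. It is only an illustration: every mock_* name is
 hypothetical, and it leaves out all locking/RCU, which is where the
 real difficulty lies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route cache entry, chained on the route it was built from. */
struct mock_rt_cache_entry {
	int valid;                              /* 1 while usable for lookups */
	struct mock_rt_cache_entry *next_dep;   /* next entry of the same route */
};

/* Hypothetical FIB route that remembers its dependent cache entries. */
struct mock_fib_route {
	struct mock_rt_cache_entry *deps;
};

/* Record that cache entry e was derived from route r. */
static void mock_cache_attach(struct mock_fib_route *r,
			      struct mock_rt_cache_entry *e)
{
	assert(r != NULL && e != NULL);
	e->valid = 1;
	e->next_dep = r->deps;
	r->deps = e;
}

/* Selective flush: invalidate only the entries derived from route r. */
static int mock_fib_route_flush(struct mock_fib_route *r)
{
	int flushed = 0;

	assert(r != NULL);
	while (r->deps != NULL) {
		struct mock_rt_cache_entry *e = r->deps;

		r->deps = e->next_dep;
		e->next_dep = NULL;
		e->valid = 0;
		flushed++;
	}
	return flushed;
}
```

 Deleting one route then touches only its own dependents instead of
 the whole cache.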
 
 According to  http://bgpupdates.potaroo.net/instability/bgpupd.html
 (last on the page) we currently flush the route cache 2.80 times per second
 when using full Internet routing with Linux. Maybe we're forced to pick 
 up this thread again someday.

 Cheers.
					--ro
 
 


* Re: [PATCH 2.6.23+] ingress classify to [nf]mark
From: Dzianis Kahanovich @ 2008-01-14 15:40 UTC (permalink / raw)
  To: netdev; +Cc: hadi, mahatma
In-Reply-To: <1200253484.4427.33.camel@localhost>

jamal wrote:

>> I have doubts only about "action continue".
>> For "and/or" behaviour, one of the best uses is (example):
> 
> I don't think you should be touching the action part at all, primarily
> because actions can set the mark after classification.

Yes, I only did it out of inertia after "#define tc_index mark".

I do not understand why "tc_index" is changed at this point: 1) this is
ingress, and 2) the action is "OK". Will "tc_index" not be changed after
"tc filter add dev eth0 parent ffff: ... flowid 1:1 action continue"? In
general, is tc_index useful in ingress? (Maybe tc_index is used in
[nf]mark style, but even in netfilter that feature migrates; IMHO, maybe
from time to time I just do not look in the right place.)

Sorry, I just shifted focus from the existing "tc_index=..." to the general behaviour ;)

[...]
> Please refer to what I said above; if what I said still doesn't make
> sense I can create (the simple) patch.

A bit vague... sorry...

-- 
WBR,
Denis Kaganovich,  mahatma@eu.by  http://mahatma.bspu.unibel.by


* Re: [PATCH 9/9] fix sparse warnings
From: Robert Olsson @ 2008-01-14 11:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, David Miller, Robert Olsson, netdev
In-Reply-To: <20080112130824.54cc1fc3@deepthought>


Thanks for hacking on and improving the trie... here is another idea that
could also be tested. If we look into the routing table we see that most
leaves have only one prefix:

Main:
        Aver depth:     2.57
        Max depth:      7
        Leaves:         231173

ip route | wc -l 
241649

That's 231173/241649 = 96% with the current Internet routing.

How about having a fastpath that stores one entry directly in the
leaf struct, to avoid loading the leaf_info list in most cases?

One would expect both lookup and dump to improve.
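
Roughly, the layout I have in mind could look like the sketch below.
The mock_* structs are hypothetical and much simpler than the real
struct leaf / struct leaf_info in fib_trie (no RCU, no hlist, and an
exact prefix-length match instead of the real lookup rules); the point
is only that the common single-prefix case never leaves the leaf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-prefix info attached to a leaf. */
struct mock_leaf_info {
	int plen;                     /* prefix length */
	int nexthop_id;               /* hypothetical payload */
	struct mock_leaf_info *next;  /* further prefixes; NULL in the common case */
};

/* Hypothetical leaf with its first prefix stored inline. */
struct mock_leaf {
	uint32_t key;
	struct mock_leaf_info first;
};

/* Find the info for an exact prefix length on this leaf, or NULL. */
static const struct mock_leaf_info *
mock_leaf_find(const struct mock_leaf *l, int plen)
{
	const struct mock_leaf_info *li;

	assert(l != NULL);

	/* Fast path: the inline entry shares memory with the leaf itself. */
	if (l->first.plen == plen)
		return &l->first;

	/* Slow path: only leaves with more than one prefix have a chain. */
	for (li = l->first.next; li != NULL; li = li->next)
		if (li->plen == plen)
			return li;
	return NULL;
}
```

The cost is a somewhat larger leaf, which would have to be weighed
against the saved pointer chase.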

Cheers.
					--ro



Stephen Hemminger writes:

 > Remember that the code should be optimized for lookup, not management
 > operations. We ran into this during testing (the test suite was looking
 > for number of routes), that's why I put in the size field.
 > 
 > The existing dump code is really slow:
 > 
 > 1) FIB_TRIE   Under KVM:
 >      load 164393 routes		12.436 sec
 >      ip route | wc -l		12.569 sec
 >      grep /proc/net/route	25.357 sec
 > 
 > 99% of the cpu time is spent in nextleaf() during these dump operations.
 > 
 > 2) FIB_HASH 	Under KVM:
 >      load 164393 routes		10.833 sec
 >      ip route | wc -l		1.981 sec
 >      grep /proc/net/route	0.204 sec


* Re: Netconf at conf.au 2008?
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2008-01-14 11:06 UTC (permalink / raw)
  To: madduck; +Cc: netdev, yoshfuji
In-Reply-To: <20080113181751.GA15063@lapse.madduck.net>

In article <20080113181751.GA15063@lapse.madduck.net> (at Sun, 13 Jan 2008 19:17:51 +0100), martin f krafft <madduck@madduck.net> says:

> also sprach Andy Johnson <johnsonzjo@gmail.com> [2008.01.12.0752 +0100]:
> > I saw somewhere (maybe in this mailing list a while ago) that
> > there might be a  Linux Kernel Developers' Netconf conference  at
> > conf.au 2008.
> 
> I think you may be mixing things up, and it may be my fault in ways.
> I am developing netconf: http://netconf.alioth.debian.org. I am
> aware of the NETCONF protocol and have considered renaming my
> project, but looking around, it seemed to me that NETCONF isn't
> really all that active, and so I chose to keep the name. If people
> think that wasn't wise, I'm willing to listen...

Very confusing to me...

--yoshfuji


