netdev.vger.kernel.org archive mirror
* [RFC NET 00/04]: Increase number of possible routing tables
@ 2006-07-03  7:52 Patrick McHardy
  2006-07-03  7:53 ` [RFC NET 01/04]: Use u32 for routing table IDs Patrick McHardy
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03  7:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, greearb, Patrick McHardy

I took on Ben's challenge to increase the number of possible routing tables;
these are the resulting patches.

The table IDs are changed to 32 bit values and are contained in a new netlink
routing attribute. For compatibility rtm_table in struct rtmsg can still be
used to access the first 255 tables and contains the low 8 bits of the table
ID in case of dumps. Unfortunately there are no invalid values for rtm_table,
so the best userspace can do in case of a new iproute version that tries to
access tables > 255 on an old kernel is to use RT_TABLE_UNSPEC (0) for rtm_table,
which will make the kernel allocate an empty table instead of silently adding
routes to a more or less random table. The iproute patch will follow shortly.
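
The fallback described above could look roughly like this on the userspace
side. This is only a sketch, not the iproute patch: struct table_request and
encode_table() are made-up names, and it assumes the full ID is always sent
in the new 32 bit attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RT_TABLE_UNSPEC 0	/* same value as in rtnetlink.h */

/* Hypothetical holder for the two places a table ID can be carried. */
struct table_request {
	uint8_t  rtm_table;	/* legacy 8 bit field in struct rtmsg */
	bool     has_attr;	/* whether the 32 bit attribute is sent */
	uint32_t attr_table;	/* full table ID carried in the attribute */
};

/*
 * IDs that fit in 8 bits also go into rtm_table, so old kernels still
 * see them. Larger IDs leave rtm_table at RT_TABLE_UNSPEC, so a kernel
 * that ignores the attribute does not pick an unrelated existing table.
 */
static struct table_request encode_table(uint32_t id)
{
	struct table_request req = { RT_TABLE_UNSPEC, true, id };

	if (id <= UINT8_MAX)
		req.rtm_table = (uint8_t)id;
	assert(id <= UINT8_MAX || req.rtm_table == RT_TABLE_UNSPEC);
	return req;
}
```

With this, encode_table(254) fills both fields with 254, while
encode_table(1000) sends 0 in rtm_table and 1000 in the attribute.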

The hash tables are statically sized since on-the-fly resizing would require
introducing locking in the packet processing path (currently we need none).
If this is a problem, we could attach table references directly to rules;
since tables are never deleted or freed, this would be a simple change.
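
A statically sized table hash can be sketched in plain userspace C as
follows. The names (struct table, table_hash, table_lookup, table_insert)
are illustrative only, and the sketch is single-threaded: it leaves out
the RCU list primitives the kernel code relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_HASHSZ 256	/* fixed power of two; never resized */

/* Hypothetical stand-in for a routing table, chained per bucket. */
struct table {
	struct table *next;
	uint32_t id;
};

static struct table *table_hash[TABLE_HASHSZ];

/* Map a 32 bit table ID to a bucket by masking its low bits. */
static unsigned int table_bucket(uint32_t id)
{
	return id & (TABLE_HASHSZ - 1);
}

/* Walk one bucket chain; returns NULL if the table does not exist. */
static struct table *table_lookup(uint32_t id)
{
	struct table *t;

	for (t = table_hash[table_bucket(id)]; t != NULL; t = t->next)
		if (t->id == id)
			return t;
	return NULL;
}

/* Link a caller-allocated table at the head of its bucket. */
static void table_insert(struct table *t)
{
	unsigned int h = table_bucket(t->id);

	assert(table_lookup(t->id) == NULL);	/* IDs must be unique */
	t->next = table_hash[h];
	table_hash[h] = t;
}
```

Because entries are only ever added and the bucket array never moves, a
reader needs no lock to walk a chain, which is the property the static
sizing is meant to preserve.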

One spot is still missing (nl_fib_lookup), so these patches are purely an RFC
for now. Tested only with IPv4. I converted DECNET as well, mainly to keep it
in sync and because iteration over all possible table values, as done in many
spots, has an unacceptable overhead with 32 bit values.
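
To put a rough number on that overhead, the two helpers below compare
probing every possible ID with walking a fixed set of hash buckets. Both
functions and the bucket count of 256 are assumptions made for the sake of
the comparison, not code from the patches.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_BUCKETS 256u	/* assumed bucket count for the sketch */

/* Probes needed to visit every table by trying each possible ID in turn. */
static uint64_t probes_by_id(unsigned int id_bits)
{
	assert(id_bits >= 1 && id_bits <= 32);
	return (UINT64_C(1) << id_bits) - 1;	/* IDs 1 .. 2^bits - 1 */
}

/* Steps needed to visit every table by walking the bucket chains. */
static uint64_t steps_by_bucket(uint64_t existing_tables)
{
	return HASH_BUCKETS + existing_tables;
}
```

With 8 bit IDs a full scan is 255 probes; with 32 bit IDs it is
4294967295, while a bucket walk over three existing tables takes 259 steps
regardless of the ID width.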


 include/linux/rtnetlink.h |   11 +++
 include/net/dn_fib.h      |    7 +-
 include/net/ip_fib.h      |   39 ++++---------
 net/decnet/dn_fib.c       |   62 ++-------------------
 net/decnet/dn_route.c     |    1 
 net/decnet/dn_rules.c     |   12 ++--
 net/decnet/dn_table.c     |  133 ++++++++++++++++++++++++++++++++--------------
 net/ipv4/fib_frontend.c   |  115 +++++++++++++++++++++++++--------------
 net/ipv4/fib_hash.c       |   30 +++++-----
 net/ipv4/fib_lookup.h     |    4 -
 net/ipv4/fib_rules.c      |   18 +++---
 net/ipv4/fib_semantics.c  |    5 +
 net/ipv4/fib_trie.c       |   32 +++++------
 net/ipv4/route.c          |    1 
 net/ipv6/route.c          |    1 
 15 files changed, 255 insertions(+), 216 deletions(-)

Patrick McHardy:
      [NET]: Use u32 for routing table IDs
      [NET]: Introduce RTA_TABLE routing attribute
      [IPV4]: Increase number of possible routing tables to 2^32
      [DECNET]: Increase number of possible routing tables to 2^32


* [RFC NET 01/04]: Use u32 for routing table IDs
  2006-07-03  7:52 [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
@ 2006-07-03  7:53 ` Patrick McHardy
  2006-07-03  7:53 ` [RFC NET 02/04]: Introduce RTA_TABLE routing attribute Patrick McHardy
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03  7:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, greearb, Patrick McHardy

[NET]: Use u32 for routing table IDs

Use u32 for routing table IDs in net/ipv4 and net/decnet in preparation for
supporting a larger number of routing tables. No functional changes are
made by this patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 7e3ac412e095b5a4d29a7244d5cee795267a7c6a
tree cfcec90b10614b7a1401459f2ea0abec4e189799
parent f82bc1762e0e74b7e0040a4d83be06d32c37fc2e
author Patrick McHardy <kaber@trash.net> Mon, 03 Jul 2006 07:43:07 +0200
committer Patrick McHardy <kaber@trash.net> Mon, 03 Jul 2006 07:43:07 +0200

 include/net/dn_fib.h     |    4 ++--
 include/net/ip_fib.h     |   14 +++++++-------
 net/decnet/dn_fib.c      |    6 +++---
 net/decnet/dn_rules.c    |    4 ++--
 net/decnet/dn_table.c    |   10 +++++-----
 net/ipv4/fib_frontend.c  |    8 ++++----
 net/ipv4/fib_hash.c      |    4 ++--
 net/ipv4/fib_lookup.h    |    4 ++--
 net/ipv4/fib_rules.c     |    8 ++++----
 net/ipv4/fib_semantics.c |    4 ++--
 net/ipv4/fib_trie.c      |    6 +++---
 11 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/include/net/dn_fib.h b/include/net/dn_fib.h
index a15dcf0..9464f48 100644
--- a/include/net/dn_fib.h
+++ b/include/net/dn_fib.h
@@ -94,7 +94,7 @@ #define DN_FIB_INFO(f) ((f)->fn_info)
 
 
 struct dn_fib_table {
-	int n;
+	u32 n;
 
 	int (*insert)(struct dn_fib_table *t, struct rtmsg *r, 
 			struct dn_kern_rta *rta, struct nlmsghdr *n, 
@@ -137,7 +137,7 @@ extern int dn_fib_sync_up(struct net_dev
 /*
  * dn_tables.c
  */
-extern struct dn_fib_table *dn_fib_get_table(int n, int creat);
+extern struct dn_fib_table *dn_fib_get_table(u32 n, int creat);
 extern struct dn_fib_table *dn_fib_empty_table(void);
 extern void dn_fib_table_init(void);
 extern void dn_fib_table_cleanup(void);
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index a095d1d..ddc3ced 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -149,7 +149,7 @@ #define FIB_RES_NETMASK(res)	        (0)
 #endif /* CONFIG_IP_ROUTE_MULTIPATH_WRANDOM */
 
 struct fib_table {
-	unsigned char	tb_id;
+	u32		tb_id;
 	unsigned	tb_stamp;
 	int		(*tb_lookup)(struct fib_table *tb, const struct flowi *flp, struct fib_result *res);
 	int		(*tb_insert)(struct fib_table *table, struct rtmsg *r,
@@ -172,14 +172,14 @@ #ifndef CONFIG_IP_MULTIPLE_TABLES
 extern struct fib_table *ip_fib_local_table;
 extern struct fib_table *ip_fib_main_table;
 
-static inline struct fib_table *fib_get_table(int id)
+static inline struct fib_table *fib_get_table(u32 id)
 {
 	if (id != RT_TABLE_LOCAL)
 		return ip_fib_main_table;
 	return ip_fib_local_table;
 }
 
-static inline struct fib_table *fib_new_table(int id)
+static inline struct fib_table *fib_new_table(u32 id)
 {
 	return fib_get_table(id);
 }
@@ -204,10 +204,10 @@ #define ip_fib_main_table (fib_tables[RT
 
 extern struct fib_table * fib_tables[RT_TABLE_MAX+1];
 extern int fib_lookup(const struct flowi *flp, struct fib_result *res);
-extern struct fib_table *__fib_new_table(int id);
+extern struct fib_table *__fib_new_table(u32 id);
 extern void fib_rule_put(struct fib_rule *r);
 
-static inline struct fib_table *fib_get_table(int id)
+static inline struct fib_table *fib_get_table(u32 id)
 {
 	if (id == 0)
 		id = RT_TABLE_MAIN;
@@ -215,7 +215,7 @@ static inline struct fib_table *fib_get_
 	return fib_tables[id];
 }
 
-static inline struct fib_table *fib_new_table(int id)
+static inline struct fib_table *fib_new_table(u32 id)
 {
 	if (id == 0)
 		id = RT_TABLE_MAIN;
@@ -248,7 +248,7 @@ extern int fib_convert_rtentry(int cmd, 
 extern u32  __fib_res_prefsrc(struct fib_result *res);
 
 /* Exported by fib_hash.c */
-extern struct fib_table *fib_hash_init(int id);
+extern struct fib_table *fib_hash_init(u32 id);
 
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 /* Exported by fib_rules.c */
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index 0375077..f4c1c1e 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -534,8 +534,8 @@ int dn_fib_rtm_newroute(struct sk_buff *
 
 int dn_fib_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	int t;
-	int s_t;
+	u32 t;
+	u32 s_t;
 	struct dn_fib_table *tb;
 
 	if (NLMSG_PAYLOAD(cb->nlh, 0) >= sizeof(struct rtmsg) &&
@@ -765,7 +765,7 @@ void dn_fib_flush(void)
 {
         int flushed = 0;
         struct dn_fib_table *tb;
-        int id;
+        u32 id;
 
         for(id = RT_TABLE_MAX; id > 0; id--) {
                 if ((tb = dn_fib_get_table(id, 0)) == NULL)
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 06e785f..16ca66b 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -43,7 +43,7 @@ struct dn_fib_rule
 	struct hlist_node	r_hlist;
 	atomic_t		r_clntref;
 	u32			r_preference;
-	unsigned char		r_table;
+	u32			r_table;
 	unsigned char		r_action;
 	unsigned char		r_dst_len;
 	unsigned char		r_src_len;
@@ -130,7 +130,7 @@ int dn_fib_rtm_newrule(struct sk_buff *s
 	struct rtmsg *rtm = NLMSG_DATA(nlh);
 	struct dn_fib_rule *r, *new_r, *last = NULL;
 	struct hlist_node *node = NULL;
-	unsigned char table_id;
+	u32 table_id;
 
 	if (rtm->rtm_src_len > 16 || rtm->rtm_dst_len > 16)
 		return -EINVAL;
diff --git a/net/decnet/dn_table.c b/net/decnet/dn_table.c
index 37d9d0a..7de6a88 100644
--- a/net/decnet/dn_table.c
+++ b/net/decnet/dn_table.c
@@ -268,7 +268,7 @@ static int dn_fib_nh_match(struct rtmsg 
 }
 
 static int dn_fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
-                        u8 tb_id, u8 type, u8 scope, void *dst, int dst_len,
+                        u32 tb_id, u8 type, u8 scope, void *dst, int dst_len,
                         struct dn_fib_info *fi, unsigned int flags)
 {
         struct rtmsg *rtm;
@@ -331,7 +331,7 @@ rtattr_failure:
 }
 
 
-static void dn_rtmsg_fib(int event, struct dn_fib_node *f, int z, int tb_id,
+static void dn_rtmsg_fib(int event, struct dn_fib_node *f, int z, u32 tb_id,
                         struct nlmsghdr *nlh, struct netlink_skb_parms *req)
 {
         struct sk_buff *skb;
@@ -744,7 +744,7 @@ out:
 }
 
 
-struct dn_fib_table *dn_fib_get_table(int n, int create)
+struct dn_fib_table *dn_fib_get_table(u32 n, int create)
 {
         struct dn_fib_table *t;
 
@@ -781,7 +781,7 @@ struct dn_fib_table *dn_fib_get_table(in
         return t;
 }
 
-static void dn_fib_del_tree(int n)
+static void dn_fib_del_tree(u32 n)
 {
 	struct dn_fib_table *t;
 
@@ -795,7 +795,7 @@ static void dn_fib_del_tree(int n)
 
 struct dn_fib_table *dn_fib_empty_table(void)
 {
-        int id;
+        u32 id;
 
         for(id = RT_TABLE_MIN; id <= RT_TABLE_MAX; id++)
                 if (dn_fib_tables[id] == NULL)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index ba2a707..4d5429a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -61,7 +61,7 @@ #define RT_TABLE_MIN 1
 
 struct fib_table *fib_tables[RT_TABLE_MAX+1];
 
-struct fib_table *__fib_new_table(int id)
+struct fib_table *__fib_new_table(u32 id)
 {
 	struct fib_table *tb;
 
@@ -81,7 +81,7 @@ static void fib_flush(void)
 	int flushed = 0;
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	struct fib_table *tb;
-	int id;
+	u32 id;
 
 	for (id = RT_TABLE_MAX; id>0; id--) {
 		if ((tb = fib_get_table(id))==NULL)
@@ -332,8 +332,8 @@ int inet_rtm_newroute(struct sk_buff *sk
 
 int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	int t;
-	int s_t;
+	u32 t;
+	u32 s_t;
 	struct fib_table *tb;
 
 	if (NLMSG_PAYLOAD(cb->nlh, 0) >= sizeof(struct rtmsg) &&
diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 3c1d32a..4b79173 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -766,9 +766,9 @@ static int fn_hash_dump(struct fib_table
 }
 
 #ifdef CONFIG_IP_MULTIPLE_TABLES
-struct fib_table * fib_hash_init(int id)
+struct fib_table * fib_hash_init(u32 id)
 #else
-struct fib_table * __init fib_hash_init(int id)
+struct fib_table * __init fib_hash_init(u32 id)
 #endif
 {
 	struct fib_table *tb;
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index ef6609e..ddd5249 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -30,11 +30,11 @@ extern struct fib_info *fib_create_info(
 extern int fib_nh_match(struct rtmsg *r, struct nlmsghdr *,
 			struct kern_rta *rta, struct fib_info *fi);
 extern int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
-			 u8 tb_id, u8 type, u8 scope, void *dst,
+			 u32 tb_id, u8 type, u8 scope, void *dst,
 			 int dst_len, u8 tos, struct fib_info *fi,
 			 unsigned int);
 extern void rtmsg_fib(int event, u32 key, struct fib_alias *fa,
-		      int z, int tb_id,
+		      int z, u32 tb_id,
 		      struct nlmsghdr *n, struct netlink_skb_parms *req);
 extern struct fib_alias *fib_find_alias(struct list_head *fah,
 					u8 tos, u32 prio);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 6c642d1..71ede02 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -56,7 +56,7 @@ struct fib_rule
 	struct hlist_node hlist;
 	atomic_t	r_clntref;
 	u32		r_preference;
-	unsigned char	r_table;
+	u32		r_table;
 	unsigned char	r_action;
 	unsigned char	r_dst_len;
 	unsigned char	r_src_len;
@@ -145,7 +145,7 @@ #endif
 
 static struct fib_table *fib_empty_table(void)
 {
-	int id;
+	u32 id;
 
 	for (id = 1; id <= RT_TABLE_MAX; id++)
 		if (fib_tables[id] == NULL)
@@ -177,7 +177,7 @@ int inet_rtm_newrule(struct sk_buff *skb
 	struct rtmsg *rtm = NLMSG_DATA(nlh);
 	struct fib_rule *r, *new_r, *last = NULL;
 	struct hlist_node *node = NULL;
-	unsigned char table_id;
+	u32 table_id;
 
 	if (rtm->rtm_src_len > 32 || rtm->rtm_dst_len > 32 ||
 	    (rtm->rtm_tos & ~IPTOS_TOS_MASK))
@@ -320,7 +320,7 @@ #endif
 		    (r->r_ifindex && r->r_ifindex != flp->iif))
 			continue;
 
-FRprintk("tb %d r %d ", r->r_table, r->r_action);
+FRprintk("tb %u r %d ", r->r_table, r->r_action);
 		switch (r->r_action) {
 		case RTN_UNICAST:
 			policy = r;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 5f87533..84537df 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -273,7 +273,7 @@ int ip_fib_check_default(u32 gw, struct 
 }
 
 void rtmsg_fib(int event, u32 key, struct fib_alias *fa,
-	       int z, int tb_id,
+	       int z, u32 tb_id,
 	       struct nlmsghdr *n, struct netlink_skb_parms *req)
 {
 	struct sk_buff *skb;
@@ -940,7 +940,7 @@ u32 __fib_res_prefsrc(struct fib_result 
 
 int
 fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event,
-	      u8 tb_id, u8 type, u8 scope, void *dst, int dst_len, u8 tos,
+	      u32 tb_id, u8 type, u8 scope, void *dst, int dst_len, u8 tos,
 	      struct fib_info *fi, unsigned int flags)
 {
 	struct rtmsg *rtm;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 1cb6530..3936f16 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1148,7 +1148,7 @@ fn_trie_insert(struct fib_table *tb, str
 
 	key = ntohl(key);
 
-	pr_debug("Insert table=%d %08x/%d\n", tb->tb_id, key, plen);
+	pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
 
 	mask = ntohl(inet_make_mask(plen));
 
@@ -1943,9 +1943,9 @@ out:
 /* Fix more generic FIB names for init later */
 
 #ifdef CONFIG_IP_MULTIPLE_TABLES
-struct fib_table * fib_hash_init(int id)
+struct fib_table * fib_hash_init(u32 id)
 #else
-struct fib_table * __init fib_hash_init(int id)
+struct fib_table * __init fib_hash_init(u32 id)
 #endif
 {
 	struct fib_table *tb;


* [RFC NET 02/04]: Introduce RTA_TABLE routing attribute
  2006-07-03  7:52 [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
  2006-07-03  7:53 ` [RFC NET 01/04]: Use u32 for routing table IDs Patrick McHardy
@ 2006-07-03  7:53 ` Patrick McHardy
  2006-07-03  7:53 ` [RFC IPV4 03/04]: Increase number of possible routing tables to 2^32 Patrick McHardy
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03  7:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, greearb, Patrick McHardy

[NET]: Introduce RTA_TABLE routing attribute

Introduce RTA_TABLE routing attribute to hold 32 bit routing table IDs.
Userspace compatibility is provided by continuing to accept and send the
rtm_table field, but because of its limited size it can only carry the
low 8 bits of the table ID. This implies that if larger IDs are used,
_all_ userspace programs using them need to use RTA_TABLE.
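
The decoding rule can be illustrated with a small userspace sketch. struct
parsed_route, route_table_id() and legacy_table_field() are hypothetical
names; the only behaviour taken from this patch is "use the attribute if
present, otherwise fall back to rtm_table".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed message: legacy field plus optional attribute. */
struct parsed_route {
	uint8_t rtm_table;		/* low 8 bits of the table ID */
	const uint32_t *rta_table;	/* full ID, or NULL if absent */
};

/* Prefer the 32 bit attribute; fall back to the legacy 8 bit field. */
static uint32_t route_table_id(const struct parsed_route *r)
{
	assert(r != NULL);
	return r->rta_table ? *r->rta_table : r->rtm_table;
}

/* What a dump carries in rtm_table for a given table ID. */
static uint8_t legacy_table_field(uint32_t id)
{
	return (uint8_t)(id & 0xff);
}
```

For table 261 the legacy field holds 5, so a program that only reads
rtm_table would attribute the route to table 5. This is why every program
has to switch to RTA_TABLE once larger IDs are in use.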

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 8cf1ae7345f935350dede855381dfbb620cabc1c
tree d674b6bb251dda0fcb915db0dbca98cda7779348
parent 7e3ac412e095b5a4d29a7244d5cee795267a7c6a
author Patrick McHardy <kaber@trash.net> Mon, 03 Jul 2006 08:13:14 +0200
committer Patrick McHardy <kaber@trash.net> Mon, 03 Jul 2006 08:13:14 +0200

 include/linux/rtnetlink.h |    8 ++++++++
 net/decnet/dn_fib.c       |    7 ++++---
 net/decnet/dn_route.c     |    1 +
 net/decnet/dn_rules.c     |    6 ++++--
 net/decnet/dn_table.c     |    1 +
 net/ipv4/fib_frontend.c   |    7 ++++---
 net/ipv4/fib_rules.c      |    6 ++++--
 net/ipv4/fib_semantics.c  |    1 +
 net/ipv4/route.c          |    1 +
 net/ipv6/route.c          |    1 +
 10 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index facd9ee..8f6efff 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -263,6 +263,7 @@ enum rtattr_type_t
 	RTA_CACHEINFO,
 	RTA_SESSION,
 	RTA_MP_ALGO,
+	RTA_TABLE,
 	__RTA_MAX
 };
 
@@ -1065,6 +1066,13 @@ #define BUG_TRAP(x) do { \
 	} \
 } while(0)
 
+static inline u32 rtm_get_table(struct rtmsg *rtm, struct rtattr **rta)
+{
+	return RTA_GET_U32(rta[RTA_TABLE-1]);
+rtattr_failure:
+	return rtm->rtm_table;
+}
+
 #endif /* __KERNEL__ */
 
 
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index f4c1c1e..a43e59b 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -491,7 +491,8 @@ static int dn_fib_check_attr(struct rtms
 		if (attr) {
 			if (RTA_PAYLOAD(attr) < 4 && RTA_PAYLOAD(attr) != 2)
 				return -EINVAL;
-			if (i != RTA_MULTIPATH && i != RTA_METRICS)
+			if (i != RTA_MULTIPATH && i != RTA_METRICS &&
+			    i != RTA_TABLE)
 				rta[i-1] = (struct rtattr *)RTA_DATA(attr);
 		}
 	}
@@ -508,7 +509,7 @@ int dn_fib_rtm_delroute(struct sk_buff *
 	if (dn_fib_check_attr(r, rta))
 		return -EINVAL;
 
-	tb = dn_fib_get_table(r->rtm_table, 0);
+	tb = dn_fib_get_table(rtm_get_table(r, rta), 0);
 	if (tb)
 		return tb->delete(tb, r, (struct dn_kern_rta *)rta, nlh, &NETLINK_CB(skb));
 
@@ -524,7 +525,7 @@ int dn_fib_rtm_newroute(struct sk_buff *
 	if (dn_fib_check_attr(r, rta))
 		return -EINVAL;
 
-	tb = dn_fib_get_table(r->rtm_table, 1);
+	tb = dn_fib_get_table(rtm_get_table(r, rta), 1);
 	if (tb) 
 		return tb->insert(tb, r, (struct dn_kern_rta *)rta, nlh, &NETLINK_CB(skb));
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 1355614..2c5bc4e 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1480,6 +1480,7 @@ static int dn_rt_fill_info(struct sk_buf
 	r->rtm_src_len = 0;
 	r->rtm_tos = 0;
 	r->rtm_table = RT_TABLE_MAIN;
+	RTA_PUT_U32(skb, RTA_TABLE, RT_TABLE_MAIN);
 	r->rtm_type = rt->rt_type;
 	r->rtm_flags = (rt->rt_flags & ~0xFFFF) | RTM_F_CLONED;
 	r->rtm_scope = RT_SCOPE_UNIVERSE;
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 16ca66b..d274d59 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -77,6 +77,7 @@ int dn_fib_rtm_delrule(struct sk_buff *s
 	struct rtmsg *rtm = NLMSG_DATA(nlh);
 	struct dn_fib_rule *r;
 	struct hlist_node *node;
+	u32 table = rtm_get_table(rtm, rta);
 	int err = -ESRCH;
 
 	hlist_for_each_entry(r, node, &dn_fib_rules, r_hlist) {
@@ -90,7 +91,7 @@ #endif
 			(!rtm->rtm_type || rtm->rtm_type == r->r_action) &&
 			(!rta[RTA_PRIORITY-1] || memcmp(RTA_DATA(rta[RTA_PRIORITY-1]), &r->r_preference, 4) == 0) &&
 			(!rta[RTA_IIF-1] || rtattr_strcmp(rta[RTA_IIF-1], r->r_ifname) == 0) &&
-			(!rtm->rtm_table || (r && rtm->rtm_table == r->r_table))) {
+			(!table || (r && table == r->r_table))) {
 
 			err = -EPERM;
 			if (r == &default_rule)
@@ -141,7 +142,7 @@ int dn_fib_rtm_newrule(struct sk_buff *s
 	if (rtm->rtm_type == RTN_NAT)
 		return -EINVAL;
 
-	table_id = rtm->rtm_table;
+	table_id = rtm_get_table(rtm, rta);
 	if (table_id == RT_TABLE_UNSPEC) {
 		struct dn_fib_table *tb;
 		if (rtm->rtm_type == RTN_UNICAST) {
@@ -365,6 +366,7 @@ #ifdef CONFIG_DECNET_ROUTE_FWMARK
 		RTA_PUT(skb, RTA_PROTOINFO, 4, &r->r_fwmark);
 #endif
 	rtm->rtm_table = r->r_table;
+	RTA_PUT_U32(skb, RTA_TABLE, r->r_table);
 	rtm->rtm_protocol = 0;
 	rtm->rtm_scope = 0;
 	rtm->rtm_type = r->r_action;
diff --git a/net/decnet/dn_table.c b/net/decnet/dn_table.c
index 7de6a88..b165282 100644
--- a/net/decnet/dn_table.c
+++ b/net/decnet/dn_table.c
@@ -282,6 +282,7 @@ static int dn_fib_dump_info(struct sk_bu
         rtm->rtm_src_len = 0;
         rtm->rtm_tos = 0;
         rtm->rtm_table = tb_id;
+	RTA_PUT_U32(skb, RTA_TABLE, tb_id);
         rtm->rtm_flags = fi->fib_flags;
         rtm->rtm_scope = scope;
 	rtm->rtm_type  = type;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d5429a..2f54f22 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -293,7 +293,8 @@ static int inet_check_attr(struct rtmsg 
 		if (attr) {
 			if (RTA_PAYLOAD(attr) < 4)
 				return -EINVAL;
-			if (i != RTA_MULTIPATH && i != RTA_METRICS)
+			if (i != RTA_MULTIPATH && i != RTA_METRICS &&
+			    i != RTA_TABLE)
 				*rta = (struct rtattr*)RTA_DATA(attr);
 		}
 	}
@@ -309,7 +310,7 @@ int inet_rtm_delroute(struct sk_buff *sk
 	if (inet_check_attr(r, rta))
 		return -EINVAL;
 
-	tb = fib_get_table(r->rtm_table);
+	tb = fib_get_table(rtm_get_table(r, rta));
 	if (tb)
 		return tb->tb_delete(tb, r, (struct kern_rta*)rta, nlh, &NETLINK_CB(skb));
 	return -ESRCH;
@@ -324,7 +325,7 @@ int inet_rtm_newroute(struct sk_buff *sk
 	if (inet_check_attr(r, rta))
 		return -EINVAL;
 
-	tb = fib_new_table(r->rtm_table);
+	tb = fib_new_table(rtm_get_table(r, rta));
 	if (tb)
 		return tb->tb_insert(tb, r, (struct kern_rta*)rta, nlh, &NETLINK_CB(skb));
 	return -ENOBUFS;
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 71ede02..e6d1f5a 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -111,6 +111,7 @@ int inet_rtm_delrule(struct sk_buff *skb
 	struct rtmsg *rtm = NLMSG_DATA(nlh);
 	struct fib_rule *r;
 	struct hlist_node *node;
+	u32 table = rtm_get_table(rtm, rta);
 	int err = -ESRCH;
 
 	hlist_for_each_entry(r, node, &fib_rules, hlist) {
@@ -125,7 +126,7 @@ #endif
 		    (!rtm->rtm_type || rtm->rtm_type == r->r_action) &&
 		    (!rta[RTA_PRIORITY-1] || memcmp(RTA_DATA(rta[RTA_PRIORITY-1]), &r->r_preference, 4) == 0) &&
 		    (!rta[RTA_IIF-1] || rtattr_strcmp(rta[RTA_IIF-1], r->r_ifname) == 0) &&
-		    (!rtm->rtm_table || (r && rtm->rtm_table == r->r_table))) {
+		    (!table || (r && table == r->r_table))) {
 			err = -EPERM;
 			if (r == &local_rule)
 				break;
@@ -186,7 +187,7 @@ int inet_rtm_newrule(struct sk_buff *skb
 	if (rta[RTA_IIF-1] && RTA_PAYLOAD(rta[RTA_IIF-1]) > IFNAMSIZ)
 		return -EINVAL;
 
-	table_id = rtm->rtm_table;
+	table_id = rtm_get_table(rtm, rta);
 	if (table_id == RT_TABLE_UNSPEC) {
 		struct fib_table *table;
 		if (rtm->rtm_type == RTN_UNICAST) {
@@ -403,6 +404,7 @@ #ifdef CONFIG_IP_ROUTE_FWMARK
 		RTA_PUT(skb, RTA_PROTOINFO, 4, &r->r_fwmark);
 #endif
 	rtm->rtm_table = r->r_table;
+	RTA_PUT_U32(skb, RTA_TABLE, r->r_table);
 	rtm->rtm_protocol = 0;
 	rtm->rtm_scope = 0;
 	rtm->rtm_type = r->r_action;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 84537df..3c45256 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -954,6 +954,7 @@ fib_dump_info(struct sk_buff *skb, u32 p
 	rtm->rtm_src_len = 0;
 	rtm->rtm_tos = tos;
 	rtm->rtm_table = tb_id;
+	RTA_PUT_U32(skb, RTA_TABLE, tb_id);
 	rtm->rtm_type = type;
 	rtm->rtm_flags = fi->fib_flags;
 	rtm->rtm_scope = scope;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index da44fab..7ef78f3 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2638,6 +2638,7 @@ #endif
 	r->rtm_src_len	= 0;
 	r->rtm_tos	= rt->fl.fl4_tos;
 	r->rtm_table	= RT_TABLE_MAIN;
+	RTA_PUT_U32(skb, RTA_TABLE, RT_TABLE_MAIN);
 	r->rtm_type	= rt->rt_type;
 	r->rtm_scope	= RT_SCOPE_UNIVERSE;
 	r->rtm_protocol = RTPROT_UNSPEC;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 87c39c9..4410282 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1745,6 +1745,7 @@ static int rt6_fill_node(struct sk_buff 
 	rtm->rtm_src_len = rt->rt6i_src.plen;
 	rtm->rtm_tos = 0;
 	rtm->rtm_table = RT_TABLE_MAIN;
+	RTA_PUT_U32(skb, RTA_TABLE, RT_TABLE_MAIN);
 	if (rt->rt6i_flags&RTF_REJECT)
 		rtm->rtm_type = RTN_UNREACHABLE;
 	else if (rt->rt6i_dev && (rt->rt6i_dev->flags&IFF_LOOPBACK))


* [RFC IPV4 03/04]: Increase number of possible routing tables to 2^32
  2006-07-03  7:52 [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
  2006-07-03  7:53 ` [RFC NET 01/04]: Use u32 for routing table IDs Patrick McHardy
  2006-07-03  7:53 ` [RFC NET 02/04]: Introduce RTA_TABLE routing attribute Patrick McHardy
@ 2006-07-03  7:53 ` Patrick McHardy
  2006-07-03  7:53 ` [RFC DECNET 04/04]: " Patrick McHardy
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03  7:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, greearb, Patrick McHardy

[IPV4]: Increase number of possible routing tables to 2^32

Increase the number of possible routing tables to 2^32 by replacing the
fixed-size array of tables with a hash table.
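
A quick size comparison shows why the array cannot simply be enlarged. The
helpers below are illustrative and not part of the patch; they count only
the pointer slots, not the tables themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a flat array with one pointer slot per possible ID. */
static uint64_t flat_array_bytes(unsigned int id_bits)
{
	assert(id_bits <= 32);
	return (UINT64_C(1) << id_bits) * sizeof(void *);
}

/* Bytes needed for the bucket heads of a fixed-size hash table. */
static uint64_t hash_heads_bytes(unsigned int buckets)
{
	return (uint64_t)buckets * sizeof(void *);
}
```

With 8-byte pointers, flat_array_bytes(32) is 32 GiB, whereas
hash_heads_bytes(256) is 2 KiB, the same as the old 256-entry array.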

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit aab791510bc6fb2392ac361b0375f60a24b02659
tree f198102cd18b1a6233c573cf4f1f5f0b8827e724
parent 8cf1ae7345f935350dede855381dfbb620cabc1c
author Patrick McHardy <kaber@trash.net> Mon, 03 Jul 2006 09:21:22 +0200
committer Patrick McHardy <kaber@trash.net> Mon, 03 Jul 2006 09:21:22 +0200

 include/linux/rtnetlink.h |    3 -
 include/net/ip_fib.h      |   25 ++---------
 net/ipv4/fib_frontend.c   |  100 ++++++++++++++++++++++++++++++---------------
 net/ipv4/fib_hash.c       |   26 ++++++------
 net/ipv4/fib_rules.c      |    4 +-
 net/ipv4/fib_trie.c       |   26 ++++++------
 6 files changed, 100 insertions(+), 84 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 8f6efff..c1217e4 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -238,9 +238,8 @@ enum rt_class_t
 	RT_TABLE_DEFAULT=253,
 	RT_TABLE_MAIN=254,
 	RT_TABLE_LOCAL=255,
-	__RT_TABLE_MAX
 };
-#define RT_TABLE_MAX (__RT_TABLE_MAX - 1)
+#define RT_TABLE_MAX 0xFFFFFFFF
 
 
 
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ddc3ced..4b764e2 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -149,6 +149,7 @@ #define FIB_RES_NETMASK(res)	        (0)
 #endif /* CONFIG_IP_ROUTE_MULTIPATH_WRANDOM */
 
 struct fib_table {
+	struct hlist_node tb_hlist;
 	u32		tb_id;
 	unsigned	tb_stamp;
 	int		(*tb_lookup)(struct fib_table *tb, const struct flowi *flp, struct fib_result *res);
@@ -199,30 +200,14 @@ static inline void fib_select_default(co
 }
 
 #else /* CONFIG_IP_MULTIPLE_TABLES */
-#define ip_fib_local_table (fib_tables[RT_TABLE_LOCAL])
-#define ip_fib_main_table (fib_tables[RT_TABLE_MAIN])
+#define ip_fib_local_table fib_get_table(RT_TABLE_LOCAL)
+#define ip_fib_main_table fib_get_table(RT_TABLE_MAIN)
 
-extern struct fib_table * fib_tables[RT_TABLE_MAX+1];
 extern int fib_lookup(const struct flowi *flp, struct fib_result *res);
-extern struct fib_table *__fib_new_table(u32 id);
+extern struct fib_table *fib_new_table(u32 id);
+extern struct fib_table *fib_get_table(u32 id);
 extern void fib_rule_put(struct fib_rule *r);
 
-static inline struct fib_table *fib_get_table(u32 id)
-{
-	if (id == 0)
-		id = RT_TABLE_MAIN;
-
-	return fib_tables[id];
-}
-
-static inline struct fib_table *fib_new_table(u32 id)
-{
-	if (id == 0)
-		id = RT_TABLE_MAIN;
-
-	return fib_tables[id] ? : __fib_new_table(id);
-}
-
 extern void fib_select_default(const struct flowi *flp, struct fib_result *res);
 
 #endif /* CONFIG_IP_MULTIPLE_TABLES */
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 2f54f22..3c49e6b 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -36,6 +36,7 @@ #include <linux/if_arp.h>
 #include <linux/skbuff.h>
 #include <linux/netlink.h>
 #include <linux/init.h>
+#include <linux/list.h>
 
 #include <net/ip.h>
 #include <net/protocol.h>
@@ -50,48 +51,67 @@ #define FFprint(a...) printk(KERN_DEBUG 
 
 #ifndef CONFIG_IP_MULTIPLE_TABLES
 
-#define RT_TABLE_MIN RT_TABLE_MAIN
-
 struct fib_table *ip_fib_local_table;
 struct fib_table *ip_fib_main_table;
 
-#else
+#define FIB_TABLE_HASHSZ 1
+static struct hlist_head fib_table_hash[FIB_TABLE_HASHSZ];
 
-#define RT_TABLE_MIN 1
+#else
 
-struct fib_table *fib_tables[RT_TABLE_MAX+1];
+#define FIB_TABLE_HASHSZ 256
+static struct hlist_head fib_table_hash[FIB_TABLE_HASHSZ];
 
-struct fib_table *__fib_new_table(u32 id)
+struct fib_table *fib_new_table(u32 id)
 {
 	struct fib_table *tb;
+	unsigned int h;
 
+	if (id == 0)
+		id = RT_TABLE_MAIN;
+	tb = fib_get_table(id);
+	if (tb)
+		return tb;
 	tb = fib_hash_init(id);
 	if (!tb)
 		return NULL;
-	fib_tables[id] = tb;
+	h = id & (FIB_TABLE_HASHSZ - 1);
+	hlist_add_head_rcu(&tb->tb_hlist, &fib_table_hash[h]);
 	return tb;
 }
 
+struct fib_table *fib_get_table(u32 id)
+{
+	struct fib_table *tb;
+	struct hlist_node *node;
+	unsigned int h;
 
+	if (id == 0)
+		id = RT_TABLE_MAIN;
+	h = id & (FIB_TABLE_HASHSZ - 1);
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(tb, node, &fib_table_hash[h], tb_hlist) {
+		if (tb->tb_id == id) {
+			rcu_read_unlock();
+			return tb;
+		}
+	}
+	rcu_read_unlock();
+	return NULL;
+}
 #endif /* CONFIG_IP_MULTIPLE_TABLES */
 
-
 static void fib_flush(void)
 {
 	int flushed = 0;
-#ifdef CONFIG_IP_MULTIPLE_TABLES
 	struct fib_table *tb;
-	u32 id;
+	struct hlist_node *node;
+	unsigned int h;
 
-	for (id = RT_TABLE_MAX; id>0; id--) {
-		if ((tb = fib_get_table(id))==NULL)
-			continue;
-		flushed += tb->tb_flush(tb);
+	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
+		hlist_for_each_entry(tb, node, &fib_table_hash[h], tb_hlist)
+			flushed += tb->tb_flush(tb);
 	}
-#else /* CONFIG_IP_MULTIPLE_TABLES */
-	flushed += ip_fib_main_table->tb_flush(ip_fib_main_table);
-	flushed += ip_fib_local_table->tb_flush(ip_fib_local_table);
-#endif /* CONFIG_IP_MULTIPLE_TABLES */
 
 	if (flushed)
 		rt_cache_flush(-1);
@@ -333,29 +353,35 @@ int inet_rtm_newroute(struct sk_buff *sk
 
 int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	u32 t;
-	u32 s_t;
+	unsigned int h, s_h;
+	unsigned int e = 0, s_e;
 	struct fib_table *tb;
+	struct hlist_node *node;
 
 	if (NLMSG_PAYLOAD(cb->nlh, 0) >= sizeof(struct rtmsg) &&
 	    ((struct rtmsg*)NLMSG_DATA(cb->nlh))->rtm_flags&RTM_F_CLONED)
 		return ip_rt_dump(skb, cb);
 
-	s_t = cb->args[0];
-	if (s_t == 0)
-		s_t = cb->args[0] = RT_TABLE_MIN;
-
-	for (t=s_t; t<=RT_TABLE_MAX; t++) {
-		if (t < s_t) continue;
-		if (t > s_t)
-			memset(&cb->args[1], 0, sizeof(cb->args)-sizeof(cb->args[0]));
-		if ((tb = fib_get_table(t))==NULL)
-			continue;
-		if (tb->tb_dump(tb, skb, cb) < 0) 
-			break;
+	s_h = cb->args[0];
+	s_e = cb->args[1];
+
+	for (h = s_h; h < FIB_TABLE_HASHSZ; h++) {
+		e = 0;
+		hlist_for_each_entry(tb, node, &fib_table_hash[h], tb_hlist) {
+			if (e < s_e)
+				goto next;
+			if (e > s_e)
+				memset(&cb->args[1], 0, sizeof(cb->args) -
+				                 2 * sizeof(cb->args[0]));
+			if (tb->tb_dump(tb, skb, cb) < 0)
+				goto out;
+next:
+			e++;
+		}
 	}
-
-	cb->args[0] = t;
+out:
+	cb->args[1] = e;
+	cb->args[0] = h;
 
 	return skb->len;
 }
@@ -653,9 +679,15 @@ static struct notifier_block fib_netdev_
 
 void __init ip_fib_init(void)
 {
+	unsigned int i;
+
+	for (i = 0; i < FIB_TABLE_HASHSZ; i++)
+		INIT_HLIST_HEAD(&fib_table_hash[i]);
 #ifndef CONFIG_IP_MULTIPLE_TABLES
 	ip_fib_local_table = fib_hash_init(RT_TABLE_LOCAL);
+	hlist_add_head_rcu(&ip_fib_local_table->tb_hlist, &fib_table_hash[0]);
 	ip_fib_main_table  = fib_hash_init(RT_TABLE_MAIN);
+	hlist_add_head_rcu(&ip_fib_main_table->tb_hlist, &fib_table_hash[0]);
 #else
 	fib_rules_init();
 #endif
diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 4b79173..fcbf2d6 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -685,7 +685,7 @@ fn_hash_dump_bucket(struct sk_buff *skb,
 	struct fib_node *f;
 	int i, s_i;
 
-	s_i = cb->args[3];
+	s_i = cb->args[4];
 	i = 0;
 	hlist_for_each_entry(f, node, head, fn_hash) {
 		struct fib_alias *fa;
@@ -705,14 +705,14 @@ fn_hash_dump_bucket(struct sk_buff *skb,
 					  fa->fa_tos,
 					  fa->fa_info,
 					  NLM_F_MULTI) < 0) {
-				cb->args[3] = i;
+				cb->args[4] = i;
 				return -1;
 			}
 		next:
 			i++;
 		}
 	}
-	cb->args[3] = i;
+	cb->args[4] = i;
 	return skb->len;
 }
 
@@ -723,21 +723,21 @@ fn_hash_dump_zone(struct sk_buff *skb, s
 {
 	int h, s_h;
 
-	s_h = cb->args[2];
+	s_h = cb->args[3];
 	for (h=0; h < fz->fz_divisor; h++) {
 		if (h < s_h) continue;
 		if (h > s_h)
-			memset(&cb->args[3], 0,
-			       sizeof(cb->args) - 3*sizeof(cb->args[0]));
+			memset(&cb->args[4], 0,
+			       sizeof(cb->args) - 4*sizeof(cb->args[0]));
 		if (fz->fz_hash == NULL ||
 		    hlist_empty(&fz->fz_hash[h]))
 			continue;
 		if (fn_hash_dump_bucket(skb, cb, tb, fz, &fz->fz_hash[h])<0) {
-			cb->args[2] = h;
+			cb->args[3] = h;
 			return -1;
 		}
 	}
-	cb->args[2] = h;
+	cb->args[3] = h;
 	return skb->len;
 }
 
@@ -747,21 +747,21 @@ static int fn_hash_dump(struct fib_table
 	struct fn_zone *fz;
 	struct fn_hash *table = (struct fn_hash*)tb->tb_data;
 
-	s_m = cb->args[1];
+	s_m = cb->args[2];
 	read_lock(&fib_hash_lock);
 	for (fz = table->fn_zone_list, m=0; fz; fz = fz->fz_next, m++) {
 		if (m < s_m) continue;
 		if (m > s_m)
-			memset(&cb->args[2], 0,
-			       sizeof(cb->args) - 2*sizeof(cb->args[0]));
+			memset(&cb->args[3], 0,
+			       sizeof(cb->args) - 3*sizeof(cb->args[0]));
 		if (fn_hash_dump_zone(skb, cb, tb, fz) < 0) {
-			cb->args[1] = m;
+			cb->args[2] = m;
 			read_unlock(&fib_hash_lock);
 			return -1;
 		}
 	}
 	read_unlock(&fib_hash_lock);
-	cb->args[1] = m;
+	cb->args[2] = m;
 	return skb->len;
 }
 
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index e6d1f5a..a41ab4b 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -149,8 +149,8 @@ static struct fib_table *fib_empty_table
 	u32 id;
 
 	for (id = 1; id <= RT_TABLE_MAX; id++)
-		if (fib_tables[id] == NULL)
-			return __fib_new_table(id);
+		if (fib_get_table(id) == NULL)
+			return fib_new_table(id);
 	return NULL;
 }
 
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 3936f16..92b1d77 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1848,7 +1848,7 @@ static int fn_trie_dump_fa(t_key key, in
 
 	u32 xkey = htonl(key);
 
-	s_i = cb->args[3];
+	s_i = cb->args[4];
 	i = 0;
 
 	/* rcu_read_lock is hold by caller */
@@ -1870,12 +1870,12 @@ static int fn_trie_dump_fa(t_key key, in
 				  plen,
 				  fa->fa_tos,
 				  fa->fa_info, 0) < 0) {
-			cb->args[3] = i;
+			cb->args[4] = i;
 			return -1;
 		}
 		i++;
 	}
-	cb->args[3] = i;
+	cb->args[4] = i;
 	return skb->len;
 }
 
@@ -1886,14 +1886,14 @@ static int fn_trie_dump_plen(struct trie
 	struct list_head *fa_head;
 	struct leaf *l = NULL;
 
-	s_h = cb->args[2];
+	s_h = cb->args[3];
 
 	for (h = 0; (l = nextleaf(t, l)) != NULL; h++) {
 		if (h < s_h)
 			continue;
 		if (h > s_h)
-			memset(&cb->args[3], 0,
-			       sizeof(cb->args) - 3*sizeof(cb->args[0]));
+			memset(&cb->args[4], 0,
+			       sizeof(cb->args) - 4*sizeof(cb->args[0]));
 
 		fa_head = get_fa_head(l, plen);
 
@@ -1904,11 +1904,11 @@ static int fn_trie_dump_plen(struct trie
 			continue;
 
 		if (fn_trie_dump_fa(l->key, plen, fa_head, tb, skb, cb)<0) {
-			cb->args[2] = h;
+			cb->args[3] = h;
 			return -1;
 		}
 	}
-	cb->args[2] = h;
+	cb->args[3] = h;
 	return skb->len;
 }
 
@@ -1917,23 +1917,23 @@ static int fn_trie_dump(struct fib_table
 	int m, s_m;
 	struct trie *t = (struct trie *) tb->tb_data;
 
-	s_m = cb->args[1];
+	s_m = cb->args[2];
 
 	rcu_read_lock();
 	for (m = 0; m <= 32; m++) {
 		if (m < s_m)
 			continue;
 		if (m > s_m)
-			memset(&cb->args[2], 0,
-				sizeof(cb->args) - 2*sizeof(cb->args[0]));
+			memset(&cb->args[3], 0,
+				sizeof(cb->args) - 3*sizeof(cb->args[0]));
 
 		if (fn_trie_dump_plen(t, 32-m, tb, skb, cb)<0) {
-			cb->args[1] = m;
+			cb->args[2] = m;
 			goto out;
 		}
 	}
 	rcu_read_unlock();
-	cb->args[1] = m;
+	cb->args[2] = m;
 	return skb->len;
 out:
 	rcu_read_unlock();

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC DECNET 04/04]: Increase number of possible routing tables to 2^32
  2006-07-03  7:52 [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
                   ` (2 preceding siblings ...)
  2006-07-03  7:53 ` [RFC IPV4 03/04]: Increase number of possible routing tables to 2^32 Patrick McHardy
@ 2006-07-03  7:53 ` Patrick McHardy
  2006-07-03 11:20   ` Steven Whitehouse
  2006-07-03  9:23 ` [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
  2006-07-07  8:05 ` Patrick McHardy
  5 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03  7:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, greearb, Patrick McHardy

[DECNET]: Increase number of possible routing tables to 2^32

Increase the number of possible routing tables to 2^32 by replacing the
fixed-size array of tables with a hash table.
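
As a rough illustration of that change, here is a minimal userspace sketch of
a table lookup keyed by the low bits of a 32 bit ID. struct table, table_hash
and table_get are hypothetical names, plain pointers stand in for the kernel's
hlist, and the RCU and locking used in the actual patch are left out.

```c
#include <stdint.h>
#include <stdlib.h>

#define TABLE_HASHSZ 256	/* must be a power of two for the mask below */

struct table {
	struct table *next;	/* chain of tables sharing one bucket */
	uint32_t id;		/* full 32 bit table ID */
};

static struct table *table_hash[TABLE_HASHSZ];

/*
 * Return the table with this ID. If it does not exist and create is
 * nonzero, allocate it and link it at the head of its bucket.
 * Returns NULL if the table is absent or allocation fails.
 */
static struct table *table_get(uint32_t id, int create)
{
	unsigned int h = id & (TABLE_HASHSZ - 1);
	struct table *t;

	for (t = table_hash[h]; t != NULL; t = t->next)
		if (t->id == id)
			return t;

	if (!create)
		return NULL;

	t = calloc(1, sizeof(*t));
	if (t == NULL)
		return NULL;
	t->id = id;
	t->next = table_hash[h];
	table_hash[h] = t;
	return t;
}
```

IDs that differ only above the low 8 bits (for example 44 and 300) share a
bucket, so the lookup has to compare the full ID while walking the chain.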

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 2bafd208cbec6b6291662bf39d94f1f9e3a54e31
tree 1b922ca700a00f4fcb97d7567d85bd9d49b5bc90
parent aab791510bc6fb2392ac361b0375f60a24b02659
author Patrick McHardy <kaber@trash.net> Mon, 03 Jul 2006 09:21:50 +0200
committer Patrick McHardy <kaber@trash.net> Mon, 03 Jul 2006 09:21:50 +0200

 include/net/dn_fib.h  |    3 -
 net/decnet/dn_fib.c   |   49 --------------------
 net/decnet/dn_rules.c |    2 -
 net/decnet/dn_table.c |  122 +++++++++++++++++++++++++++++++++++--------------
 4 files changed, 90 insertions(+), 86 deletions(-)

diff --git a/include/net/dn_fib.h b/include/net/dn_fib.h
index 9464f48..8098bdd 100644
--- a/include/net/dn_fib.h
+++ b/include/net/dn_fib.h
@@ -94,6 +94,7 @@ #define DN_FIB_INFO(f) ((f)->fn_info)
 
 
 struct dn_fib_table {
+	struct hlist_node hlist;
 	u32 n;
 
 	int (*insert)(struct dn_fib_table *t, struct rtmsg *r, 
@@ -179,8 +180,6 @@ static inline void dn_fib_res_put(struct
 		dn_fib_rule_put(res->r);
 }
 
-extern struct dn_fib_table *dn_fib_tables[];
-
 #else /* Endnode */
 
 #define dn_fib_init()  do { } while(0)
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index a43e59b..7127d36 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -532,39 +532,6 @@ int dn_fib_rtm_newroute(struct sk_buff *
 	return -ENOBUFS;
 }
 
-
-int dn_fib_dump(struct sk_buff *skb, struct netlink_callback *cb)
-{
-	u32 t;
-	u32 s_t;
-	struct dn_fib_table *tb;
-
-	if (NLMSG_PAYLOAD(cb->nlh, 0) >= sizeof(struct rtmsg) &&
-		((struct rtmsg *)NLMSG_DATA(cb->nlh))->rtm_flags&RTM_F_CLONED)
-			return dn_cache_dump(skb, cb);
-
-	s_t = cb->args[0];
-	if (s_t == 0)
-		s_t = cb->args[0] = RT_MIN_TABLE;
-
-	for(t = s_t; t <= RT_TABLE_MAX; t++) {
-		if (t < s_t)
-			continue;
-		if (t > s_t)
-			memset(&cb->args[1], 0,
-			       sizeof(cb->args) - sizeof(cb->args[0]));
-		tb = dn_fib_get_table(t, 0);
-		if (tb == NULL)
-			continue;
-		if (tb->dump(tb, skb, cb) < 0)
-			break;
-	}
-
-	cb->args[0] = t;
-
-	return skb->len;
-}
-
 static void fib_magic(int cmd, int type, __le16 dst, int dst_len, struct dn_ifaddr *ifa)
 {
 	struct dn_fib_table *tb;
@@ -762,22 +729,6 @@ int dn_fib_sync_up(struct net_device *de
         return ret;
 }
 
-void dn_fib_flush(void)
-{
-        int flushed = 0;
-        struct dn_fib_table *tb;
-        u32 id;
-
-        for(id = RT_TABLE_MAX; id > 0; id--) {
-                if ((tb = dn_fib_get_table(id, 0)) == NULL)
-                        continue;
-                flushed += tb->flush(tb);
-        }
-
-        if (flushed)
-                dn_rt_cache_flush(-1);
-}
-
 static struct notifier_block dn_fib_dnaddr_notifier = {
 	.notifier_call = dn_fib_dnaddr_event,
 };
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index d274d59..6d1752a 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -273,7 +273,7 @@ unsigned dnet_addr_type(__le16 addr)
 	struct flowi fl = { .nl_u = { .dn_u = { .daddr = addr } } };
 	struct dn_fib_res res;
 	unsigned ret = RTN_UNICAST;
-	struct dn_fib_table *tb = dn_fib_tables[RT_TABLE_LOCAL];
+	struct dn_fib_table *tb = dn_fib_get_table(RT_TABLE_LOCAL, 0);
 
 	res.r = NULL;
 
diff --git a/net/decnet/dn_table.c b/net/decnet/dn_table.c
index b165282..0c3417c 100644
--- a/net/decnet/dn_table.c
+++ b/net/decnet/dn_table.c
@@ -74,9 +74,9 @@ #define DN_FIB_SCAN_KEY(f, fp, key) \
 for( ; ((f) = *(fp)) != NULL && dn_key_eq((f)->fn_key, (key)); (fp) = &(f)->fn_next)
 
 #define RT_TABLE_MIN 1
-
+#define DN_FIB_TABLE_HASHSZ 256
+static struct hlist_head dn_fib_table_hash[DN_FIB_TABLE_HASHSZ];
 static DEFINE_RWLOCK(dn_fib_tables_lock);
-struct dn_fib_table *dn_fib_tables[RT_TABLE_MAX + 1];
 
 static kmem_cache_t *dn_hash_kmem __read_mostly;
 static int dn_fib_hash_zombies;
@@ -365,7 +365,7 @@ static __inline__ int dn_hash_dump_bucke
 {
 	int i, s_i;
 
-	s_i = cb->args[3];
+	s_i = cb->args[4];
 	for(i = 0; f; i++, f = f->fn_next) {
 		if (i < s_i)
 			continue;
@@ -378,11 +378,11 @@ static __inline__ int dn_hash_dump_bucke
 				(f->fn_state & DN_S_ZOMBIE) ? 0 : f->fn_type,
 				f->fn_scope, &f->fn_key, dz->dz_order, 
 				f->fn_info, NLM_F_MULTI) < 0) {
-			cb->args[3] = i;
+			cb->args[4] = i;
 			return -1;
 		}
 	}
-	cb->args[3] = i;
+	cb->args[4] = i;
 	return skb->len;
 }
 
@@ -393,20 +393,20 @@ static __inline__ int dn_hash_dump_zone(
 {
 	int h, s_h;
 
-	s_h = cb->args[2];
+	s_h = cb->args[3];
 	for(h = 0; h < dz->dz_divisor; h++) {
 		if (h < s_h)
 			continue;
 		if (h > s_h)
-			memset(&cb->args[3], 0, sizeof(cb->args) - 3*sizeof(cb->args[0]));
+			memset(&cb->args[4], 0, sizeof(cb->args) - 4*sizeof(cb->args[0]));
 		if (dz->dz_hash == NULL || dz->dz_hash[h] == NULL)
 			continue;
 		if (dn_hash_dump_bucket(skb, cb, tb, dz, dz->dz_hash[h]) < 0) {
-			cb->args[2] = h;
+			cb->args[3] = h;
 			return -1;
 		}
 	}
-	cb->args[2] = h;
+	cb->args[3] = h;
 	return skb->len;
 }
 
@@ -417,26 +417,61 @@ static int dn_fib_table_dump(struct dn_f
 	struct dn_zone *dz;
 	struct dn_hash *table = (struct dn_hash *)tb->data;
 
-	s_m = cb->args[1];
+	s_m = cb->args[2];
 	read_lock(&dn_fib_tables_lock);
 	for(dz = table->dh_zone_list, m = 0; dz; dz = dz->dz_next, m++) {
 		if (m < s_m)
 			continue;
 		if (m > s_m)
-			memset(&cb->args[2], 0, sizeof(cb->args) - 2*sizeof(cb->args[0]));
+			memset(&cb->args[3], 0, sizeof(cb->args) - 3*sizeof(cb->args[0]));
 
 		if (dn_hash_dump_zone(skb, cb, tb, dz) < 0) {
-			cb->args[1] = m;
+			cb->args[2] = m;
 			read_unlock(&dn_fib_tables_lock);
 			return -1;
 		}
 	}
 	read_unlock(&dn_fib_tables_lock);
-	cb->args[1] = m;
+	cb->args[2] = m;
 
         return skb->len;
 }
 
+int dn_fib_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	unsigned int h, s_h;
+	unsigned int e = 0, s_e;
+	struct dn_fib_table *tb;
+	struct hlist_node *node;
+
+	if (NLMSG_PAYLOAD(cb->nlh, 0) >= sizeof(struct rtmsg) &&
+		((struct rtmsg *)NLMSG_DATA(cb->nlh))->rtm_flags&RTM_F_CLONED)
+			return dn_cache_dump(skb, cb);
+
+	s_h = cb->args[0];
+	s_e = cb->args[1];
+
+	for (h = s_h; h < DN_FIB_TABLE_HASHSZ; h++) {
+		e = 0;
+		hlist_for_each_entry(tb, node, &dn_fib_table_hash[h], hlist) {
+			if (e < s_e)
+				goto next;
+			if (e > s_e)
+				memset(&cb->args[2], 0, sizeof(cb->args) -
+				                 2 * sizeof(cb->args[0]));
+			if (tb->dump(tb, skb, cb) < 0)
+				goto out;
+next:
+			e++;
+		}
+	}
+out:
+	cb->args[1] = e;
+	cb->args[0] = h;
+
+	return skb->len;
+}
+
 static int dn_fib_table_insert(struct dn_fib_table *tb, struct rtmsg *r, struct dn_kern_rta *rta, struct nlmsghdr *n, struct netlink_skb_parms *req)
 {
 	struct dn_hash *table = (struct dn_hash *)tb->data;
@@ -748,6 +783,8 @@ out:
 struct dn_fib_table *dn_fib_get_table(u32 n, int create)
 {
         struct dn_fib_table *t;
+	struct hlist_node *node;
+	unsigned int h;
 
         if (n < RT_TABLE_MIN)
                 return NULL;
@@ -755,8 +792,15 @@ struct dn_fib_table *dn_fib_get_table(u3
         if (n > RT_TABLE_MAX)
                 return NULL;
 
-        if (dn_fib_tables[n]) 
-                return dn_fib_tables[n];
+	h = n & (DN_FIB_TABLE_HASHSZ - 1);
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(t, node, &dn_fib_table_hash[h], hlist) {
+		if (t->n == n) {
+			rcu_read_unlock();
+			return t;
+		}
+	}
+	rcu_read_unlock();
 
         if (!create)
                 return NULL;
@@ -777,33 +821,37 @@ struct dn_fib_table *dn_fib_get_table(u3
         t->flush  = dn_fib_table_flush;
         t->dump = dn_fib_table_dump;
 	memset(t->data, 0, sizeof(struct dn_hash));
-        dn_fib_tables[n] = t;
+	hlist_add_head_rcu(&t->hlist, &dn_fib_table_hash[h]);
 
         return t;
 }
 
-static void dn_fib_del_tree(u32 n)
-{
-	struct dn_fib_table *t;
-
-	write_lock(&dn_fib_tables_lock);
-	t = dn_fib_tables[n];
-	dn_fib_tables[n] = NULL;
-	write_unlock(&dn_fib_tables_lock);
-
-	kfree(t);
-}
-
 struct dn_fib_table *dn_fib_empty_table(void)
 {
         u32 id;
 
         for(id = RT_TABLE_MIN; id <= RT_TABLE_MAX; id++)
-                if (dn_fib_tables[id] == NULL)
+		if (dn_fib_get_table(id, 0) == NULL)
                         return dn_fib_get_table(id, 1);
         return NULL;
 }
 
+void dn_fib_flush(void)
+{
+        int flushed = 0;
+        struct dn_fib_table *tb;
+	struct hlist_node *node;
+	unsigned int h;
+
+	for (h = 0; h < DN_FIB_TABLE_HASHSZ; h++) {
+		hlist_for_each_entry(tb, node, &dn_fib_table_hash[h], hlist)
+	                flushed += tb->flush(tb);
+        }
+
+        if (flushed)
+                dn_rt_cache_flush(-1);
+}
+
 void __init dn_fib_table_init(void)
 {
 	dn_hash_kmem = kmem_cache_create("dn_fib_info_cache",
@@ -814,10 +862,16 @@ void __init dn_fib_table_init(void)
 
 void __exit dn_fib_table_cleanup(void)
 {
-	int i;
-
-	for (i = RT_TABLE_MIN; i <= RT_TABLE_MAX; ++i)
-		dn_fib_del_tree(i);
+	struct dn_fib_table *t;
+	struct hlist_node *node, *next;
+	unsigned int h;
 
-	return;
+	write_lock(&dn_fib_tables_lock);
+	for (h = 0; h < DN_FIB_TABLE_HASHSZ; h++)
+		hlist_for_each_entry_safe(t, node, next, &dn_fib_table_hash[h],
+		                          hlist) {
+			hlist_del(&t->hlist);
+			kfree(t);
+	}
+	write_unlock(&dn_fib_tables_lock);
 }

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-03  7:52 [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
                   ` (3 preceding siblings ...)
  2006-07-03  7:53 ` [RFC DECNET 04/04]: " Patrick McHardy
@ 2006-07-03  9:23 ` Patrick McHardy
  2006-07-03  9:38   ` Patrick McHardy
  2006-07-07  8:05 ` Patrick McHardy
  5 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03  9:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, greearb, Stephen Hemminger

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

Patrick McHardy wrote:
> I took on Ben's challenge to increase the number of possible routing tables,
> these are the resulting patches.
> 
> The table IDs are changed to 32 bit values and are contained in a new netlink
> routing attribute. For compatibility rtm_table in struct rtmsg can still be
> used to access the first 255 tables and contains the low 8 bit of the table
> ID in case of dumps. Unfortunately there are no invalid values for rtm_table,
> so the best userspace can do in case of a new iproute version that tries to
> access tables > 255 on an old kernel is to use RTM_UNSPEC (0) for rtm_table,
> which will make the kernel allocate an empty table instead of silently adding
> routes to a more or less random table. The iproute patch will follow shortly.

Actually that last part wasn't entirely true. The last couple of
releases of the kernel include the inet_check_attr function,
which (unwillingly) breaks with the tradition of ignoring
unknown attributes and signals an error on receiving the RTA_TABLE
attribute. So the iproute patch only includes the RTA_TABLE
attribute when the table ID is > 255, in which case rtm_table
is set to RT_TABLE_UNSPEC. Old kernels will still have the
behaviour I described above. The patch has been tested to
behave as expected on both patched and unpatched kernels.


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 10047 bytes --]

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5e33a20..7573c62 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -238,9 +238,8 @@ enum rt_class_t
 	RT_TABLE_DEFAULT=253,
 	RT_TABLE_MAIN=254,
 	RT_TABLE_LOCAL=255,
-	__RT_TABLE_MAX
 };
-#define RT_TABLE_MAX (__RT_TABLE_MAX - 1)
+#define RT_TABLE_MAX 0xFFFFFFFF
 
 
 
@@ -263,6 +262,7 @@ enum rtattr_type_t
 	RTA_CACHEINFO,
 	RTA_SESSION,
 	RTA_MP_ALGO,
+	RTA_TABLE,
 	__RTA_MAX
 };
 
diff --git a/include/rt_names.h b/include/rt_names.h
index 2d9ef10..07a10e0 100644
--- a/include/rt_names.h
+++ b/include/rt_names.h
@@ -5,7 +5,7 @@ #include <asm/types.h>
 
 char* rtnl_rtprot_n2a(int id, char *buf, int len);
 char* rtnl_rtscope_n2a(int id, char *buf, int len);
-char* rtnl_rttable_n2a(int id, char *buf, int len);
+char* rtnl_rttable_n2a(__u32 id, char *buf, int len);
 char* rtnl_rtrealm_n2a(int id, char *buf, int len);
 char* rtnl_dsfield_n2a(int id, char *buf, int len);
 int rtnl_rtprot_a2n(__u32 *id, char *arg);
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 1fe4a69..8b286b0 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -32,4 +32,12 @@ extern int do_multiaddr(int argc, char *
 extern int do_multiroute(int argc, char **argv);
 extern int do_xfrm(int argc, char **argv);
 
+static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
+{
+	__u32 table = r->rtm_table;
+	if (tb[RTA_TABLE])
+		table = *(__u32*) RTA_DATA(tb[RTA_TABLE]);
+	return table;
+}
+
 extern struct rtnl_handle rth;
diff --git a/ip/iproute.c b/ip/iproute.c
index a43c09e..4ebe617 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -75,7 +75,8 @@ static void usage(void)
 
 static struct
 {
-	int tb;
+	__u32 tb;
+	int cloned;
 	int flushed;
 	char *flushb;
 	int flushp;
@@ -125,6 +126,7 @@ int print_route(const struct sockaddr_nl
 	inet_prefix prefsrc;
 	inet_prefix via;
 	int host_len = -1;
+	__u32 table;
 	SPRINT_BUF(b1);
 	
 
@@ -151,27 +153,23 @@ int print_route(const struct sockaddr_nl
 		host_len = 80;
 
 	if (r->rtm_family == AF_INET6) {
+		if (filter.cloned) {
+			if (!(r->rtm_flags&RTM_F_CLONED))
+				return 0;
+		}
 		if (filter.tb) {
-			if (filter.tb < 0) {
-				if (!(r->rtm_flags&RTM_F_CLONED))
-					return 0;
-			} else {
-				if (r->rtm_flags&RTM_F_CLONED)
+			if (r->rtm_flags&RTM_F_CLONED)
+				return 0;
+			if (filter.tb == RT_TABLE_LOCAL) {
+				if (r->rtm_type != RTN_LOCAL)
 					return 0;
-				if (filter.tb == RT_TABLE_LOCAL) {
-					if (r->rtm_type != RTN_LOCAL)
-						return 0;
-				} else if (filter.tb == RT_TABLE_MAIN) {
-					if (r->rtm_type == RTN_LOCAL)
-						return 0;
-				} else {
+			} else if (filter.tb == RT_TABLE_MAIN) {
+				if (r->rtm_type == RTN_LOCAL)
 					return 0;
-				}
+			} else {
+				return 0;
 			}
 		}
-	} else {
-		if (filter.tb > 0 && filter.tb != r->rtm_table)
-			return 0;
 	}
 	if ((filter.protocol^r->rtm_protocol)&filter.protocolmask)
 		return 0;
@@ -225,6 +223,10 @@ int print_route(const struct sockaddr_nl
 			memcpy(&prefsrc.data, RTA_DATA(tb[RTA_PREFSRC]), host_len/8);
 	}
 
+	table = rtm_get_table(r, tb);
+	if (r->rtm_family == AF_INET && filter.tb > 0 && filter.tb != table)
+		return 0;
+
 	if (filter.rdst.family && inet_addr_match(&dst, &filter.rdst, filter.rdst.bitlen))
 		return 0;
 	if (filter.mdst.family && filter.mdst.bitlen >= 0 &&
@@ -354,8 +356,8 @@ int print_route(const struct sockaddr_nl
 		fprintf(fp, "dev %s ", ll_index_to_name(*(int*)RTA_DATA(tb[RTA_OIF])));
 
 	if (!(r->rtm_flags&RTM_F_CLONED)) {
-		if (r->rtm_table != RT_TABLE_MAIN && !filter.tb)
-			fprintf(fp, " table %s ", rtnl_rttable_n2a(r->rtm_table, b1, sizeof(b1)));
+		if (table != RT_TABLE_MAIN && !filter.tb)
+			fprintf(fp, " table %s ", rtnl_rttable_n2a(table, b1, sizeof(b1)));
 		if (r->rtm_protocol != RTPROT_BOOT && filter.protocolmask != -1)
 			fprintf(fp, " proto %s ", rtnl_rtprot_n2a(r->rtm_protocol, b1, sizeof(b1)));
 		if (r->rtm_scope != RT_SCOPE_UNIVERSE && filter.scopemask != -1)
@@ -840,7 +842,12 @@ #endif
 			NEXT_ARG();
 			if (rtnl_rttable_a2n(&tid, *argv))
 				invarg("\"table\" value is invalid\n", *argv);
-			req.r.rtm_table = tid;
+			if (tid < 256)
+				req.r.rtm_table = tid;
+			else {
+				req.r.rtm_table = RT_TABLE_UNSPEC;
+				addattr32(&req.n, sizeof(req), RTA_TABLE, tid);
+			}
 			table_ok = 1;
 		} else if (strcmp(*argv, "dev") == 0 ||
 			   strcmp(*argv, "oif") == 0) {
@@ -1022,7 +1029,7 @@ static int iproute_list_or_flush(int arg
 			filter.tb = tid;
 		} else if (matches(*argv, "cached") == 0 ||
 			   matches(*argv, "cloned") == 0) {
-			filter.tb = -1;
+			filter.cloned = 1;
 		} else if (strcmp(*argv, "tos") == 0 ||
 			   matches(*argv, "dsfield") == 0) {
 			__u32 tos;
diff --git a/ip/iprule.c b/ip/iprule.c
index ccf699f..6caf573 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -27,6 +27,7 @@ #include <string.h>
 
 #include "rt_names.h"
 #include "utils.h"
+#include "ip_common.h"
 
 extern struct rtnl_handle rth;
 
@@ -51,6 +52,7 @@ static int print_rule(const struct socka
 	struct rtmsg *r = NLMSG_DATA(n);
 	int len = n->nlmsg_len;
 	int host_len = -1;
+	__u32 table;
 	struct rtattr * tb[RTA_MAX+1];
 	char abuf[256];
 	SPRINT_BUF(b1);
@@ -129,8 +131,9 @@ static int print_rule(const struct socka
 		fprintf(fp, "iif %s ", (char*)RTA_DATA(tb[RTA_IIF]));
 	}
 
-	if (r->rtm_table)
-		fprintf(fp, "lookup %s ", rtnl_rttable_n2a(r->rtm_table, b1, sizeof(b1)));
+	table = rtm_get_table(r, tb);
+	if (table)
+		fprintf(fp, "lookup %s ", rtnl_rttable_n2a(table, b1, sizeof(b1)));
 
 	if (tb[RTA_FLOW]) {
 		__u32 to = *(__u32*)RTA_DATA(tb[RTA_FLOW]);
@@ -257,7 +260,12 @@ static int iprule_modify(int cmd, int ar
 			NEXT_ARG();
 			if (rtnl_rttable_a2n(&tid, *argv))
 				invarg("invalid table ID\n", *argv);
-			req.r.rtm_table = tid;
+			if (tid < 256)
+				req.r.rtm_table = tid;
+			else {
+				req.r.rtm_table = RT_TABLE_UNSPEC;
+				addattr32(&req.n, sizeof(req), RTA_TABLE, tid);
+			}
 			table_ok = 1;
 		} else if (strcmp(*argv, "dev") == 0 ||
 			   strcmp(*argv, "iif") == 0) {
diff --git a/lib/rt_names.c b/lib/rt_names.c
index 05046c2..2ff984a 100644
--- a/lib/rt_names.c
+++ b/lib/rt_names.c
@@ -23,6 +23,51 @@ #include <linux/rtnetlink.h>
 
 #include "rt_names.h"
 
+struct rtnl_hash_entry {
+	struct rtnl_hash_entry *next;
+	unsigned int		id;
+	char *			name;
+};
+
+static void
+rtnl_hash_initialize(char *file, struct rtnl_hash_entry **hash, int size)
+{
+	struct rtnl_hash_entry *entry;
+	char buf[512];
+	FILE *fp;
+
+	fp = fopen(file, "r");
+	if (!fp)
+		return;
+	while (fgets(buf, sizeof(buf), fp)) {
+		char *p = buf;
+		int id;
+		char namebuf[512];
+
+		while (*p == ' ' || *p == '\t')
+			p++;
+		if (*p == '#' || *p == '\n' || *p == 0)
+			continue;
+		if (sscanf(p, "0x%x %s\n", &id, namebuf) != 2 &&
+		    sscanf(p, "0x%x %s #", &id, namebuf) != 2 &&
+		    sscanf(p, "%d %s\n", &id, namebuf) != 2 &&
+		    sscanf(p, "%d %s #", &id, namebuf) != 2) {
+			fprintf(stderr, "Database %s is corrupted at %s\n",
+				file, p);
+			return;
+		}
+
+		if (id<0)
+			continue;
+		entry = malloc(sizeof(*entry));
+		entry->id   = id;
+		entry->name = strdup(namebuf);
+		entry->next = hash[id & (size - 1)];
+		hash[id & (size - 1)] = entry;
+	}
+	fclose(fp);
+}
+
 static void rtnl_tab_initialize(char *file, char **tab, int size)
 {
 	char buf[512];
@@ -57,7 +102,6 @@ static void rtnl_tab_initialize(char *fi
 	fclose(fp);
 }
 
-
 static char * rtnl_rtprot_tab[256] = {
 	[RTPROT_UNSPEC] = "none",
 	[RTPROT_REDIRECT] ="redirect",
@@ -266,9 +310,14 @@ int rtnl_rtrealm_a2n(__u32 *id, char *ar
 }
 
 
+static struct rtnl_hash_entry dflt_table_entry  = { .id = 253, .name = "default" };
+static struct rtnl_hash_entry main_table_entry  = { .id = 254, .name = "main" };
+static struct rtnl_hash_entry local_table_entry = { .id = 255, .name = "local" };
 
-static char * rtnl_rttable_tab[256] = {
-	"unspec",
+static struct rtnl_hash_entry * rtnl_rttable_hash[256] = {
+	[253] = &dflt_table_entry,
+	[254] = &main_table_entry,
+	[255] = &local_table_entry,
 };
 
 static int rtnl_rttable_init;
@@ -276,26 +325,26 @@ static int rtnl_rttable_init;
 static void rtnl_rttable_initialize(void)
 {
 	rtnl_rttable_init = 1;
-	rtnl_rttable_tab[255] = "local";
-	rtnl_rttable_tab[254] = "main";
-	rtnl_rttable_tab[253] = "default";
-	rtnl_tab_initialize("/etc/iproute2/rt_tables",
-			    rtnl_rttable_tab, 256);
+	rtnl_hash_initialize("/etc/iproute2/rt_tables",
+			     rtnl_rttable_hash, 256);
 }
 
-char * rtnl_rttable_n2a(int id, char *buf, int len)
+char * rtnl_rttable_n2a(__u32 id, char *buf, int len)
 {
-	if (id<0 || id>=256) {
-		snprintf(buf, len, "%d", id);
+	struct rtnl_hash_entry *entry;
+
+	if (id >= RT_TABLE_MAX) {
+		snprintf(buf, len, "%u", id);
 		return buf;
 	}
-	if (!rtnl_rttable_tab[id]) {
-		if (!rtnl_rttable_init)
-			rtnl_rttable_initialize();
-	}
-	if (rtnl_rttable_tab[id])
-		return rtnl_rttable_tab[id];
-	snprintf(buf, len, "%d", id);
+	if (!rtnl_rttable_init)
+		rtnl_rttable_initialize();
+	entry = rtnl_rttable_hash[id & 255];
+	while (entry && entry->id != id)
+		entry = entry->next;
+	if (entry)
+		return entry->name;
+	snprintf(buf, len, "%u", id);
 	return buf;
 }
 
@@ -303,8 +352,9 @@ int rtnl_rttable_a2n(__u32 *id, char *ar
 {
 	static char *cache = NULL;
 	static unsigned long res;
+	struct rtnl_hash_entry *entry;
 	char *end;
-	int i;
+	__u32 i;
 
 	if (cache && strcmp(cache, arg) == 0) {
 		*id = res;
@@ -315,9 +365,11 @@ int rtnl_rttable_a2n(__u32 *id, char *ar
 		rtnl_rttable_initialize();
 
 	for (i=0; i<256; i++) {
-		if (rtnl_rttable_tab[i] &&
-		    strcmp(rtnl_rttable_tab[i], arg) == 0) {
-			cache = rtnl_rttable_tab[i];
+		entry = rtnl_rttable_hash[i];
+		while (entry && strcmp(entry->name, arg))
+			entry = entry->next;
+		if (entry) {
+			cache = entry->name;
 			res = entry->id;
 			*id = res;
 			return 0;
@@ -325,7 +377,7 @@ int rtnl_rttable_a2n(__u32 *id, char *ar
 	}
 
 	i = strtoul(arg, &end, 0);
-	if (!end || end == arg || *end || i > 255)
+	if (!end || end == arg || *end || i > RT_TABLE_MAX)
 		return -1;
 	*id = i;
 	return 0;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-03  9:23 ` [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
@ 2006-07-03  9:38   ` Patrick McHardy
  2006-07-03 11:34     ` Thomas Graf
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03  9:38 UTC (permalink / raw)
  To: netdev; +Cc: davem, greearb, Stephen Hemminger

Patrick McHardy wrote:
> Patrick McHardy wrote:
> 
>>I took on Ben's challenge to increase the number of possible routing tables,
>>these are the resulting patches.
>>
>>The table IDs are changed to 32 bit values and are contained in a new netlink
>>routing attribute. For compatibility rtm_table in struct rtmsg can still be
>>used to access the first 255 tables and contains the low 8 bit of the table
>>ID in case of dumps. Unfortunately there are no invalid values for rtm_table,
>>so the best userspace can do in case of a new iproute version that tries to
>>access tables > 255 on an old kernel is to use RTM_UNSPEC (0) for rtm_table,
>>which will make the kernel allocate an empty table instead of silently adding
>>routes to a more or less random table. The iproute patch will follow shortly.
> 
> 
> Actually that last part wasn't entirely true. The last couple of
> releases of the kernel include the inet_check_attr function,
> which (unwillingly) breaks with the tradition of ignoring
> unknown attributes and signals an error on receiving the RTA_TABLE
> attribute. So the iproute patch only includes the RTA_TABLE
> attribute when the table ID is > 255, in which case rtm_table
> is set to RT_TABLE_UNSPEC. Old kernels will still have the
> behaviour I described above. The patch has been tested to
> behave as expected on both patched and unpatched kernels.

That wasn't entirely true either: it's not inet_check_attr but
rtnetlink_rcv_message that aborts, and it does this on all
kernels. Somehow I thought unknown attributes were usually
ignored... anyway, this is a good thing in this case as it
will avoid unexpected behaviour and simply return an error
on kernels where this feature is not available.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC DECNET 04/04]: Increase number of possible routing tables to 2^32
  2006-07-03  7:53 ` [RFC DECNET 04/04]: " Patrick McHardy
@ 2006-07-03 11:20   ` Steven Whitehouse
  2006-07-03 11:21     ` Patrick McHardy
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Whitehouse @ 2006-07-03 11:20 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, greearb, Patrick Caulfield

Hi,

On Mon, Jul 03, 2006 at 09:53:05AM +0200, Patrick McHardy wrote:
> [DECNET]: Increase number of possible routing tables to 2^32
> 
> Increase the number of possible routing tables to 2^32 by replacing the
> fixed-size array of tables with a hash table.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>
>
I've had a quick look through the DECnet parts of this and it looks good,
although I've not had a chance to test it at all. Please cc Patrick Caulfield
on DECnet changes.

Steve.
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC DECNET 04/04]: Increase number of possible routing tables to 2^32
  2006-07-03 11:20   ` Steven Whitehouse
@ 2006-07-03 11:21     ` Patrick McHardy
  0 siblings, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03 11:21 UTC (permalink / raw)
  To: Steven Whitehouse; +Cc: davem, netdev, greearb, Patrick Caulfield

Steven Whitehouse wrote:
> Hi,
> 
> On Mon, Jul 03, 2006 at 09:53:05AM +0200, Patrick McHardy wrote:
> 
>>[DECNET]: Increase number of possible routing tables to 2^32
>>
>>Increase the number of possible routing tables to 2^32 by replacing the
>>fixed-size array of tables with a hash table.
>>
>>Signed-off-by: Patrick McHardy <kaber@trash.net>
>>
> I've had a quick look through the DECnet parts of this and it looks good,
> although I've not had a chance to test it at all. Please cc Patrick Caulfield
> on DECnet changes.

Thanks, will do on the next submission.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-03  9:38   ` Patrick McHardy
@ 2006-07-03 11:34     ` Thomas Graf
  2006-07-03 11:36       ` Patrick McHardy
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Graf @ 2006-07-03 11:34 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, davem, greearb, Stephen Hemminger

* Patrick McHardy <kaber@trash.net> 2006-07-03 11:38
> That wasn't entirely true either: it's not inet_check_attr but
> rtnetlink_rcv_message that aborts, and it does this on all
> kernels. Somehow I thought unknown attributes were usually
> ignored...

This only applies to the first level of rtnetlink attributes,
when using rtattr_parse() unknown attributes are ignored.

Once this ugly rta_buf has disappeared it will become more
consistent.

Patches look good to me except that new iproute binaries
won't work with older kernels anymore?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-03 11:34     ` Thomas Graf
@ 2006-07-03 11:36       ` Patrick McHardy
  2006-07-03 11:41         ` Thomas Graf
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2006-07-03 11:36 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev, davem, greearb, Stephen Hemminger

Thomas Graf wrote:
> * Patrick McHardy <kaber@trash.net> 2006-07-03 11:38
> 
>>That wasn't entirely true either: it's not inet_check_attr but
>>rtnetlink_rcv_message that aborts, and it does this on all
>>kernels. Somehow I thought unknown attributes were usually
>>ignored...
> 
> 
> This only applies to the first level of rtnetlink attributes,
> when using rtattr_parse() unknown attributes are ignored.
> 
> Once this ugly rta_buf has disappeared it will become more
> consistent.
> 
> Patches look good to me except that new iproute binaries
> won't work with older kernels anymore?

They will as long as this feature isn't used, the RTA_TABLE
attribute is only added to the message when the table id
is > 255. Worked fine during my tests, or are you referring
to something else?
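
As a rough sketch of that compatibility rule (not the actual iproute or kernel code): struct route_req, req_set_table(), req_get_table() and dump_rtm_table() below are hypothetical and only model the two places a table ID can live, the 8-bit rtm_table field and the optional 32-bit RTA_TABLE attribute. TABLE_UNSPEC stands for the value 0 mentioned in the cover letter.

```c
#include <stdint.h>

#define TABLE_UNSPEC 0	/* the value 0 the cover letter suggests for rtm_table */

/* Hypothetical, simplified view of a route request. */
struct route_req {
	uint8_t  rtm_table;	/* legacy 8-bit field in struct rtmsg */
	int      has_rta_table;	/* nonzero if an RTA_TABLE attribute is attached */
	uint32_t rta_table;	/* payload of that attribute */
};

/* Sender side: only attach the attribute when the ID does not fit in 8 bits. */
static void req_set_table(struct route_req *req, uint32_t id)
{
	if (id <= 255) {
		req->rtm_table = (uint8_t)id;
		req->has_rta_table = 0;
		req->rta_table = 0;
	} else {
		/* An old kernel sees table 0 instead of a truncated ID. */
		req->rtm_table = TABLE_UNSPEC;
		req->has_rta_table = 1;
		req->rta_table = id;
	}
}

/* Receiver side: prefer the attribute, fall back to the legacy field. */
static uint32_t req_get_table(const struct route_req *req)
{
	return req->has_rta_table ? req->rta_table : req->rtm_table;
}

/* On dumps the cover letter says rtm_table carries the low 8 bits. */
static uint8_t dump_rtm_table(uint32_t id)
{
	return (uint8_t)(id & 0xff);
}
```

With this shape, a request for table 200 is indistinguishable from what an old binary would send, which is why new binaries keep working on old kernels as long as no table above 255 is used.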



* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-03 11:36       ` Patrick McHardy
@ 2006-07-03 11:41         ` Thomas Graf
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Graf @ 2006-07-03 11:41 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, davem, greearb, Stephen Hemminger

* Patrick McHardy <kaber@trash.net> 2006-07-03 13:36
> They will as long as this feature isn't used, the RTA_TABLE
> attribute is only added to the message when the table id
> is > 255. Worked fine during my tests, or are you referring
> to something else?

Perfect, I said nothing :)


* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-03  7:52 [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
                   ` (4 preceding siblings ...)
  2006-07-03  9:23 ` [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
@ 2006-07-07  8:05 ` Patrick McHardy
  2006-07-07 18:13   ` Ben Greear
  5 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2006-07-07  8:05 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, greearb, Thomas Graf, Robert.Olsson

Patrick McHardy wrote:
> I took on Ben's challenge to increase the number of possible routing tables,
> these are the resulting patches.
> 
> The table IDs are changed to 32 bit values and are contained in a new netlink
> routing attribute. For compatibility rtm_table in struct rtmsg can still be
> used to access the first 255 tables and contains the low 8 bit of the table
> ID in case of dumps. Unfortunately there are no invalid values for rtm_table,
> so the best userspace can do in case of a new iproute version that tries to
> access tables > 255 on an old kernel is to use RTM_UNSPEC (0) for rtm_table,
> which will make the kernel allocate an empty table instead of silently adding
> routes to a more or less random table. The iproute patch will follow shortly.
> 
> The hash tables are statically sized since on-the-fly resizing would require
> introducing locking in the packet processing path (currently we need none),
> if this is a problem we could just directly attach table references to rules,
> since tables are never deleted or freed this would be a simple change.
> 
> One spot is still missing (nl_fib_lookup), so these patches are purely a RFC
> for now. Tested only with IPv4, I mainly converted DECNET as well to keep it
> in sync and because iteration over all possible table values, as done in many
> spots, has an unacceptable overhead with 32 bit values.


Since there were no objections, I would like to finalize this patch by
taking care of nl_fib_lookup. Since it was introduced as a debugging
interface for fib_trie and the interface definitions are not even
public (contained in include/net), I wonder if anyone really cares about
backwards compatibility or if I can just change it.

Robert, Thomas, you are the only two users of the interface I'm aware
of, what do you think?



* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-07  8:05 ` Patrick McHardy
@ 2006-07-07 18:13   ` Ben Greear
  2006-07-07 19:58     ` Patrick McHardy
  0 siblings, 1 reply; 20+ messages in thread
From: Ben Greear @ 2006-07-07 18:13 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, Thomas Graf, Robert.Olsson

Patrick McHardy wrote:
> Patrick McHardy wrote:
> 
>>I took on Ben's challenge to increase the number of possible routing tables,
>>these are the resulting patches.

I am seeing problems, though they could be with the way I'm using the tool
or perhaps I patched the kernel incorrectly.

I applied the 3 patches to 2.6.17. All patches applied without problem,
but with a few lines of fuzz.  I get the same behaviour with and
without the new 'ip' patches applied.

If I do an 'ip ru show', then I see lots of tables, though not all it seems.
(I have not tried beyond 205 yet).  But, if I do an 'ip route show table XX',
then I see nothing or incorrect values.

For my test, I am creating 200 virtual interfaces (mac-vlans in my case, but
802.1q should work equally well.)  I am giving them all IP addrs on the same
subnet, and a routing table for each source IP addr.

The commands I run to generate the routing tables are found in this file:
http://www.candelatech.com/oss/gc.txt

When I change back to kernel 2.6.16.16 with only my patchset applied, things
seem to be working, so it looks like an issue with the new kernel patches.
I can provide access to this machine as well as my full patch set, etc...

For whatever reason, table 5 does appear in a bizarre fashion:

[greear@grok lanforge]$ more ~/tmp/ip.txt
[root@sb65g2 lanforge]# ip route show table 5
10.1.2.0/24 via 10.1.2.2 dev eth1#0
default via 10.1.2.1 dev eth1#0
[root@sb65g2 lanforge]# ip route show table 4
[root@sb65g2 lanforge]# ip route show table 3
[root@sb65g2 lanforge]# ip route show table 2
[root@sb65g2 lanforge]# ip route show table 1
[root@sb65g2 lanforge]# ip route show table 0
10.1.2.0/24 via 10.1.2.2 dev eth1#0  table 5
default via 10.1.2.1 dev eth1#0  table 5

#  Here is a listing of 'ip ru show'.
[greear@grok lanforge]$ more ~/tmp/ru.txt
0:      from all lookup local
31203:  from 10.1.2.144 lookup 147
31204:  from 10.1.2.143 lookup 146
31205:  from 10.1.2.142 lookup 145
31206:  from 10.1.2.141 lookup 144
31207:  from 10.1.2.140 lookup 143
31208:  from 10.1.2.139 lookup 142
31209:  from 10.1.2.138 lookup 141
31210:  from 10.1.2.137 lookup 140
31211:  from 10.1.2.136 lookup 139
31212:  from 10.1.2.135 lookup 138
31213:  from 10.1.2.134 lookup 137
31214:  from 10.1.2.133 lookup 136
31215:  from 10.1.2.132 lookup 135
31216:  from 10.1.2.131 lookup 134
31217:  from 10.1.2.130 lookup 133
31218:  from 10.1.2.129 lookup 132
31219:  from 10.1.2.128 lookup 131
31220:  from 10.1.2.127 lookup 130
31221:  from 10.1.2.126 lookup 129
31222:  from 10.1.2.125 lookup 128
31223:  from 10.1.2.124 lookup 127
31224:  from 10.1.2.123 lookup 126
31225:  from 10.1.2.122 lookup 125
31226:  from 10.1.2.121 lookup 124
31227:  from 10.1.2.120 lookup 123
31228:  from 10.1.2.119 lookup 122
31229:  from 10.1.2.118 lookup 121
31230:  from 10.1.2.117 lookup 120
31231:  from 10.1.2.116 lookup 119
31232:  from 10.1.2.115 lookup 118
31233:  from 10.1.2.114 lookup 117
31234:  from 10.1.2.113 lookup 116
31235:  from 10.1.2.201 lookup 204
31236:  from 10.1.2.200 lookup 203
31237:  from 10.1.2.199 lookup 202
31238:  from 10.1.2.198 lookup 201
31239:  from 10.1.2.197 lookup 200
31240:  from 10.1.2.196 lookup 199
31241:  from 10.1.2.195 lookup 198
31242:  from 10.1.2.112 lookup 115
31243:  from 10.1.2.111 lookup 114
31244:  from 10.1.2.110 lookup 113
31245:  from 10.1.2.109 lookup 112
31246:  from 10.1.2.108 lookup 111
31247:  from 10.1.2.107 lookup 110
31248:  from 10.1.2.106 lookup 109
31249:  from 10.1.2.105 lookup 108
31250:  from 10.1.2.104 lookup 107
31251:  from 10.1.2.103 lookup 106
31252:  from 10.1.2.102 lookup 105
31253:  from 10.1.2.101 lookup 104
31254:  from 10.1.2.100 lookup 103
31255:  from 10.1.2.99 lookup 102
31256:  from 10.1.2.98 lookup 101
31257:  from 10.1.2.97 lookup 100
31258:  from 10.1.2.96 lookup 99
31259:  from 10.1.2.95 lookup 98
31260:  from 10.1.2.94 lookup 97
31261:  from 10.1.2.93 lookup 96
31262:  from 10.1.2.92 lookup 95
31263:  from 10.1.2.91 lookup 94
31264:  from 10.1.2.90 lookup 93
31265:  from 10.1.2.89 lookup 92
31266:  from 10.1.2.88 lookup 91
31267:  from 10.1.2.87 lookup 90
31268:  from 10.1.2.86 lookup 89
31269:  from 10.1.2.85 lookup 88
31270:  from 10.1.2.84 lookup 87
31271:  from 10.1.2.83 lookup 86
31272:  from 10.1.2.82 lookup 85
31273:  from 10.1.2.81 lookup 84
31274:  from 10.1.2.80 lookup 83
31275:  from 10.1.2.79 lookup 82

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-07 18:13   ` Ben Greear
@ 2006-07-07 19:58     ` Patrick McHardy
  2006-07-07 23:59       ` David Miller
  2006-07-08  1:07       ` Ben Greear
  0 siblings, 2 replies; 20+ messages in thread
From: Patrick McHardy @ 2006-07-07 19:58 UTC (permalink / raw)
  To: Ben Greear; +Cc: davem, netdev, Thomas Graf, Robert.Olsson

[-- Attachment #1: Type: text/plain, Size: 1025 bytes --]

Ben Greear wrote:
> Patrick McHardy wrote:
> 
>>> I took on Ben's challenge to increase the number of possible routing
>>> tables, these are the resulting patches.
> 
> 
> I am seeing problems, though they could be with the way I'm using the tool
> or perhaps I patched the kernel incorrectly.
> 
> I applied the 3 patches to 2.6.17. All patches applied without problem,
> but with a few lines of fuzz.  I get the same behaviour with and
> without the new 'ip' patches applied.
> 
> If I do an 'ip ru show', then I see lots of tables, though not all it
> seems. (I have not tried beyond 205 yet).  But, if I do an
> 'ip route show table XX', then I see nothing or incorrect values.

My patches introduced a bug when dumping tables which could lead to
incorrect routes being dumped. A second bug (that already existed)
makes the kernel fail when dumping more rules than fit in a skb.
I think I've already seen the patch to address the second problem
a short time ago sent by someone else. Anyway, this patch should
fix both.


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1526 bytes --]

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 3c49e6b..6e1aaa4 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -357,6 +357,7 @@ int inet_dump_fib(struct sk_buff *skb, s
 	unsigned int e = 0, s_e;
 	struct fib_table *tb;
 	struct hlist_node *node;
+	int dumped = 0;
 
 	if (NLMSG_PAYLOAD(cb->nlh, 0) >= sizeof(struct rtmsg) &&
 	    ((struct rtmsg*)NLMSG_DATA(cb->nlh))->rtm_flags&RTM_F_CLONED)
@@ -365,16 +366,17 @@ int inet_dump_fib(struct sk_buff *skb, s
 	s_h = cb->args[0];
 	s_e = cb->args[1];
 
-	for (h = s_h; h < FIB_TABLE_HASHSZ; h++) {
+	for (h = s_h; h < FIB_TABLE_HASHSZ; h++, s_e = 0) {
 		e = 0;
 		hlist_for_each_entry(tb, node, &fib_table_hash[h], tb_hlist) {
 			if (e < s_e)
 				goto next;
-			if (e > s_e)
-				memset(&cb->args[1], 0, sizeof(cb->args) -
+			if (dumped)
+				memset(&cb->args[2], 0, sizeof(cb->args) -
 				                 2 * sizeof(cb->args[0]));
 			if (tb->tb_dump(tb, skb, cb) < 0) 
 				goto out;
+			dumped = 1;
 next:
 			e++;
 		}
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index a41ab4b..6f33f12 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -459,13 +459,13 @@ int inet_dump_rules(struct sk_buff *skb,
 
 	rcu_read_lock();
 	hlist_for_each_entry(r, node, &fib_rules, hlist) {
-
 		if (idx < s_idx)
-			continue;
+			goto next;
 		if (inet_fill_rule(skb, r, NETLINK_CB(cb->skb).pid,
 				   cb->nlh->nlmsg_seq,
 				   RTM_NEWRULE, NLM_F_MULTI) < 0)
 			break;
+next:
 		idx++;
 	}
 	rcu_read_unlock();
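
The resume logic that the diff above repairs can be modelled outside the kernel. The following is only a sketch under simplifying assumptions: struct table_set, struct dump_cursor and dump_tables() are hypothetical, each table is emitted as a single ID, and the room argument plays the role of the space left in the skb. The cursor fields correspond loosely to cb->args[0] and cb->args[1].

```c
#include <stddef.h>

#define NBUCKETS 4
#define MAXCHAIN 3

/* Hypothetical table set: chain_len[h] tables hang off bucket h. */
struct table_set {
	int chain_len[NBUCKETS];
	unsigned int id[NBUCKETS][MAXCHAIN];
};

struct dump_cursor {
	int h;	/* bucket to resume in, like cb->args[0] */
	int e;	/* entry within that bucket, like cb->args[1] */
};

/*
 * Copy up to room table IDs into out, resuming at *cur.
 * Returns the number of IDs written; 0 means the dump is complete.
 */
static size_t dump_tables(const struct table_set *ts, struct dump_cursor *cur,
			  unsigned int *out, size_t room)
{
	size_t n = 0;
	int s_e = cur->e;
	int h, e = 0;

	/* s_e only applies to the first bucket visited, so reset it per bucket. */
	for (h = cur->h; h < NBUCKETS; h++, s_e = 0) {
		for (e = 0; e < ts->chain_len[h]; e++) {
			if (e < s_e)
				continue;	/* dumped earlier; the for loop still counts it */
			if (n == room)
				goto out;	/* no room left: remember where we stopped */
			out[n++] = ts->id[h][e];
		}
	}
out:
	cur->h = h;
	cur->e = e;
	return n;
}
```

Without the `s_e = 0` in the outer loop, the entry offset saved for one bucket would also be applied to every later bucket and silently skip their leading entries. The inner loop here can use `continue` because a C for loop increments e by itself; in the kernel's hlist iteration the counter is incremented by hand at the end of the body, which is why the fib_rules.c hunk replaces `continue` with `goto next`.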


* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-07 19:58     ` Patrick McHardy
@ 2006-07-07 23:59       ` David Miller
  2006-07-08  2:45         ` Patrick McHardy
  2006-07-08  1:07       ` Ben Greear
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2006-07-07 23:59 UTC (permalink / raw)
  To: kaber; +Cc: greearb, netdev, tgraf, Robert.Olsson

From: Patrick McHardy <kaber@trash.net>
Date: Fri, 07 Jul 2006 21:58:31 +0200

> My patches introduced a bug when dumping tables which could lead to
> incorrect routes being dumped. A second bug (that already existed)
> makes the kernel fail when dumping more rules than fit in a skb.
> I think I've already seen the patch to address the second problem
> a short time ago sent by someone else. Anyway, this patch should
> fix both.

Nice work Patrick.

You guys have a lot of time to flesh out any remaining issues and
failures, and then submit this for 2.6.19

Thanks again.


* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-07 19:58     ` Patrick McHardy
  2006-07-07 23:59       ` David Miller
@ 2006-07-08  1:07       ` Ben Greear
  2006-07-08  2:48         ` Patrick McHardy
  1 sibling, 1 reply; 20+ messages in thread
From: Ben Greear @ 2006-07-08  1:07 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, Thomas Graf, Robert.Olsson

Patrick McHardy wrote:
> Ben Greear wrote:
> 
>>Patrick McHardy wrote:
>>
>>
>>>>I took on Ben's challenge to increase the number of possible routing
>>>>tables, these are the resulting patches.
>>
>>
>>I am seeing problems, though they could be with the way I'm using the tool
>>or perhaps I patched the kernel incorrectly.
>>
>>I applied the 3 patches to 2.6.17. All patches applied without problem,
>>but with a few lines of fuzz.  I get the same behaviour with and
>>without the new 'ip' patches applied.
>>
>>If I do an 'ip ru show', then I see lots of tables, though not all it
>>seems. (I have not tried beyond 205 yet).  But, if I do an
>>'ip route show table XX', then I see nothing or incorrect values.
> 
> 
> My patches introduced a bug when dumping tables which could lead to
> incorrect routes being dumped. A second bug (that already existed)
> makes the kernel fail when dumping more rules than fit in a skb.
> I think I've already seen the patch to address the second problem
> a short time ago sent by someone else. Anyway, this patch should
> fix both.

With this patch applied everything is looking much better.  I currently
have 400+ interfaces and one routing table per interface, and traffic
is passing as expected.

This is probably due to my own application polling interfaces for
stat updates...but I am seeing over 50% usage (with more system than user-space)
in this setup on an otherwise lightly loaded system.  top shows no process averaging
more than about 2% CPU (and only 2-3 are above 0.0 typically), which I find
a little strange.  load is around 3.0.

I'll dig into my code and see if I can tune the stat-gathering logic a bit...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-07 23:59       ` David Miller
@ 2006-07-08  2:45         ` Patrick McHardy
  0 siblings, 0 replies; 20+ messages in thread
From: Patrick McHardy @ 2006-07-08  2:45 UTC (permalink / raw)
  To: David Miller; +Cc: greearb, netdev, tgraf, Robert.Olsson

David Miller wrote:
> Nice work Patrick.
> 
> You guys have a lot of time to flesh out any remaining issues and
> failures, and then submit this for 2.6.19

Will do, I already expected to miss the deadline :)



* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-08  1:07       ` Ben Greear
@ 2006-07-08  2:48         ` Patrick McHardy
  2006-07-08  5:06           ` Ben Greear
  0 siblings, 1 reply; 20+ messages in thread
From: Patrick McHardy @ 2006-07-08  2:48 UTC (permalink / raw)
  To: Ben Greear; +Cc: davem, netdev, Thomas Graf, Robert.Olsson

Ben Greear wrote:
> With this patch applied everything is looking much better.  I currently
> have 400+ interfaces and one routing table per interface, and traffic
> is passing as expected.
> 
> This is probably due to my own application polling interfaces for
> stat updates...but I am seeing over 50% usage (with more system than
> user-space) in this setup on an otherwise lightly loaded system.  top
> shows no process averaging more than about 2% CPU (and only 2-3 are
> above 0.0 typically), which I find a little strange.  load is around 3.0.

I can't imagine this being related to the increased number of
routing tables, with a number of entries slightly (not even two
times) over the hash size it shouldn't make that much of a
difference. It may of course be a bug, but I don't see it.
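
To put a number on that argument, here is a small sketch. It assumes a hash size of 256 and a simple mask as the hash function; neither is stated in this message, and TABLE_HASHSZ, table_hash() and longest_chain() are hypothetical names for the example. The 400 consecutive table IDs correspond to the setup described above.

```c
#include <stdint.h>

#define TABLE_HASHSZ 256	/* assumed size; must be a power of two */

/* Assumed hash: keep the low bits of the table ID. */
static unsigned int table_hash(uint32_t id)
{
	return id & (TABLE_HASHSZ - 1);
}

/* Longest chain that results from hashing table IDs first..first+count-1. */
static unsigned int longest_chain(uint32_t first, uint32_t count)
{
	unsigned int len[TABLE_HASHSZ] = { 0 };
	unsigned int max = 0;
	uint32_t i;

	for (i = 0; i < count; i++) {
		unsigned int h = table_hash(first + i);

		if (++len[h] > max)
			max = len[h];
	}
	return max;
}
```

Under these assumptions, longest_chain(1, 400) is 2, so a lookup walks at most two entries per bucket, which supports the expectation that the extra tables alone should not explain the load.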



* Re: [RFC NET 00/04]: Increase number of possible routing tables
  2006-07-08  2:48         ` Patrick McHardy
@ 2006-07-08  5:06           ` Ben Greear
  0 siblings, 0 replies; 20+ messages in thread
From: Ben Greear @ 2006-07-08  5:06 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: davem, netdev, Thomas Graf, Robert.Olsson

Patrick McHardy wrote:
> Ben Greear wrote:
> 
>>With this patch applied everything is looking much better.  I currently
>>have 400+ interfaces and one routing table per interface, and traffic
>>is passing as expected.
>>
>>This is probably due to my own application polling interfaces for
>>stat updates...but I am seeing over 50% usage (with more system than
>>user-space) in this setup on an otherwise lightly loaded system.  top
>>shows no process averaging more than about 2% CPU (and only 2-3 are
>>above 0.0 typically), which I find a little strange.  load is around 3.0.
> 
> 
> I can't imagine this being related to the increased number of
> routing tables, with a number of entries slightly (not even two
> times) over the hash size it shouldn't make that much of a
> difference. It may of course be a bug, but I don't see it.

I think it was my polling logic that was the problem.  I fixed it up to
be more clever and the load went away.

Ben



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



end of thread, other threads:[~2006-07-08  5:06 UTC | newest]

Thread overview: 20+ messages
2006-07-03  7:52 [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
2006-07-03  7:53 ` [RFC NET 01/04]: Use u32 for routing table IDs Patrick McHardy
2006-07-03  7:53 ` [RFC NET 02/04]: Introduce RTA_TABLE routing attribute Patrick McHardy
2006-07-03  7:53 ` [RFC IPV4 03/04]: Increase number of possible routing tables to 2^32 Patrick McHardy
2006-07-03  7:53 ` [RFC DECNET 04/04]: " Patrick McHardy
2006-07-03 11:20   ` Steven Whitehouse
2006-07-03 11:21     ` Patrick McHardy
2006-07-03  9:23 ` [RFC NET 00/04]: Increase number of possible routing tables Patrick McHardy
2006-07-03  9:38   ` Patrick McHardy
2006-07-03 11:34     ` Thomas Graf
2006-07-03 11:36       ` Patrick McHardy
2006-07-03 11:41         ` Thomas Graf
2006-07-07  8:05 ` Patrick McHardy
2006-07-07 18:13   ` Ben Greear
2006-07-07 19:58     ` Patrick McHardy
2006-07-07 23:59       ` David Miller
2006-07-08  2:45         ` Patrick McHardy
2006-07-08  1:07       ` Ben Greear
2006-07-08  2:48         ` Patrick McHardy
2006-07-08  5:06           ` Ben Greear
