* ipv6 hitting route max_size
From: Simon Kirby @ 2011-06-06 21:37 UTC
To: netdev, YOSHIFUJI Hideaki

Hello,

/proc/sys/net/ipv4/route/max_size is the maximum size of the ipv4 route
_cache_, which is based on hash table size, which in turn is based on RAM
size. The route _cache_ (e.g. ip -4 route show cache) is not allowed to
grow beyond this size. gc_min_interval_ms was added to allow garbage
collection to happen often enough that this limit would not be reached
even under spoofed-address attacks (which we used to see happen before).

/proc/sys/net/ipv6/route/max_size and a number of similar GC knobs exist,
but max_size seems to limit the size of the v6 route table, not the v6
route cache. net/ipv6/route.c:2829 just sets this to 4096:

	net->ipv6.sysctl.ip6_rt_max_size = 4096;

If I set up quagga and ipv6 bgp peering to the Internets, I get about 6075
routes today, exceeding this limit. This causes zebra to log errors such
as this when it tries to add the routes to the kernel:

	netlink-cmd error: Cannot allocate memory, type=RTM_NEWROUTE(24), seq=27089196, pid=0

This goes away if I increase /proc/sys/net/ipv6/route/max_size.

Is this cache limit somehow tied to route entries by some (un)intentional
IPv6 feature?

Reproduce with something like this (bash, 2.6.32 or 2.6.39 or similar):

	for ((i = 0;i < 4200;i++)); do ip route add unreachable 2000::$i; done

Note that 4100 succeeds on my box, so something else is also happening.

Simon-

* Re: ipv6 hitting route max_size
From: David Miller @ 2011-06-06 22:01 UTC
To: sim; +Cc: netdev, yoshfuji

From: Simon Kirby <sim@hostway.ca>
Date: Mon, 6 Jun 2011 14:37:27 -0700

> /proc/sys/net/ipv6/route/max_size and a number of similar GC knobs exist,
> but max_size seems to limit the size of the v6 route table, not the v6
> route cache.

There is no v6 route cache.

Instead of a routing cache, ipv6 route lookups "clone" new routes into
the same data structure the route table is stored in.

> Is this cache limit somehow tied to route entries by some (un)intentional
> IPv6 feature?

Again, there is no cache. The same data structure holds the routing
table and the cloned routes created by lookups. There is no separation.

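To make the consequence of that shared data structure concrete, here is a minimal sketch in C. It is a toy model, not kernel code: the names `toy_rt6_table`, `toy_add_entry` and `toy_load_routes` are hypothetical, and the model assumes a hard limit with no garbage collection, so it does not reproduce the slack Simon observed (about 4100 routes still succeeding). It only shows why a single counter covering both configured routes and lookup clones caps a 6075-route BGP table at 4096 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical names): ONE counter covers both configured
 * routes and the clones created by lookups, because both live in the
 * same structure. */
struct toy_rt6_table {
	size_t entries;   /* configured routes + clones, counted together */
	size_t max_size;  /* stands in for net.ipv6.route.max_size */
};

/* Returns true if the entry was accepted, false if the limit was hit.
 * In the thread, the failure surfaces as "Cannot allocate memory". */
static bool toy_add_entry(struct toy_rt6_table *t)
{
	assert(t != NULL);
	if (t->entries >= t->max_size)
		return false;
	t->entries++;
	return true;
}

/* Try to install `wanted` routes; return how many were accepted. */
static size_t toy_load_routes(struct toy_rt6_table *t, size_t wanted)
{
	size_t accepted = 0;

	for (size_t i = 0; i < wanted; i++)
		if (toy_add_entry(t))
			accepted++;
	return accepted;
}
```

With `max_size` set to 4096, calling `toy_load_routes(&t, 6075)` accepts only the first 4096 entries, which mirrors the zebra errors reported in the first message.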
* Re: ipv6 hitting route max_size
From: Simon Kirby @ 2011-06-06 23:15 UTC
To: David Miller; +Cc: netdev, yoshfuji

On Mon, Jun 06, 2011 at 03:01:42PM -0700, David Miller wrote:

> From: Simon Kirby <sim@hostway.ca>
> Date: Mon, 6 Jun 2011 14:37:27 -0700
>
> > /proc/sys/net/ipv6/route/max_size and a number of similar GC knobs exist,
> > but max_size seems to limit the size of the v6 route table, not the v6
> > route cache.
>
> There is no v6 route cache.
>
> Instead of a routing cache, ipv6 route lookups "clone" new routes into
> the same data structure the route table is stored in.

Ok, makes sense, but the result is now that ipv4 loads a full Internet
table with no adjustments, while ipv6 does not. Would it make sense to
change 4096 to 1048576, or would it be better to count only clones of
the actual route or something along those lines?

Simon-

* Re: ipv6 hitting route max_size
From: David Miller @ 2011-06-06 23:28 UTC
To: sim; +Cc: netdev, yoshfuji

From: Simon Kirby <sim@hostway.ca>
Date: Mon, 6 Jun 2011 16:15:21 -0700

> On Mon, Jun 06, 2011 at 03:01:42PM -0700, David Miller wrote:
>
>> From: Simon Kirby <sim@hostway.ca>
>> Date: Mon, 6 Jun 2011 14:37:27 -0700
>>
>> > /proc/sys/net/ipv6/route/max_size and a number of similar GC knobs exist,
>> > but max_size seems to limit the size of the v6 route table, not the v6
>> > route cache.
>>
>> There is no v6 route cache.
>>
>> Instead of a routing cache, ipv6 route lookups "clone" new routes into
>> the same data structure the route table is stored in.
>
> Ok, makes sense, but the result is now that ipv4 loads a full Internet
> table with no adjustments, while ipv6 does not. Would it make sense to
> change 4096 to 1048576, or would it be better to count only clones of
> the actual route or something along those lines?

The latter is probably the way to handle this problem.

* Re: ipv6 hitting route max_size
From: David Miller @ 2011-06-07 7:56 UTC
To: sim; +Cc: netdev, yoshfuji

From: Simon Kirby <sim@hostway.ca>
Date: Mon, 6 Jun 2011 16:15:21 -0700

> Ok, makes sense, but the result is now that ipv4 loads a full Internet
> table with no adjustments, while ipv6 does not. Would it make sense to
> change 4096 to 1048576, or would it be better to count only clones of
> the actual route or something along those lines?

Simon can you give this patch a try?

diff --git a/include/net/dst.h b/include/net/dst.h
index 7d15d23..e12ddfb 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -77,6 +77,7 @@ struct dst_entry {
 #define DST_NOPOLICY		0x0004
 #define DST_NOHASH		0x0008
 #define DST_NOCACHE		0x0010
+#define DST_NOCOUNT		0x0020
 	union {
 		struct dst_entry	*next;
 		struct rtable __rcu	*rt_next;
diff --git a/net/core/dst.c b/net/core/dst.c
index 9ccca03..6135f36 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -190,7 +190,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
 	dst->lastuse = jiffies;
 	dst->flags = flags;
 	dst->next = NULL;
-	dst_entries_add(ops, 1);
+	if (!(flags & DST_NOCOUNT))
+		dst_entries_add(ops, 1);
 	return dst;
 }
 EXPORT_SYMBOL(dst_alloc);
@@ -243,7 +244,8 @@ again:
 		neigh_release(neigh);
 	}
 
-	dst_entries_add(dst->ops, -1);
+	if (!(dst->flags & DST_NOCOUNT))
+		dst_entries_add(dst->ops, -1);
 
 	if (dst->ops->destroy)
 		dst->ops->destroy(dst);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index de2b1de..7fb44b0 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -228,7 +228,8 @@ static struct rt6_info ip6_blk_hole_entry_template = {
 
 /* allocate dst with ip6_dst_ops */
 static inline struct rt6_info *ip6_dst_alloc(struct dst_ops *ops,
-					     struct net_device *dev)
+					     struct net_device *dev,
+					     int flags)
 {
 	struct rt6_info *rt = dst_alloc(ops, dev, 0, 0, 0);
 
@@ -1042,7 +1043,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	if (unlikely(idev == NULL))
 		return NULL;
 
-	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, dev);
+	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, dev, 0);
 	if (unlikely(rt == NULL)) {
 		in6_dev_put(idev);
 		goto out;
@@ -1214,7 +1215,7 @@ int ip6_route_add(struct fib6_config *cfg)
 		goto out;
 	}
 
-	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, NULL);
+	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, NULL, DST_NOCOUNT);
 
 	if (rt == NULL) {
 		err = -ENOMEM;
@@ -1734,7 +1735,7 @@ static struct rt6_info * ip6_rt_copy(struct rt6_info *ort)
 {
 	struct net *net = dev_net(ort->rt6i_dev);
 	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
-					    ort->dst.dev);
+					    ort->dst.dev, 0);
 
 	if (rt) {
 		rt->dst.input = ort->dst.input;
@@ -2013,7 +2014,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 {
 	struct net *net = dev_net(idev->dev);
 	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
-					    net->loopback_dev);
+					    net->loopback_dev, 0);
 	struct neighbour *neigh;
 
 	if (rt == NULL) {

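The accounting idea behind the patch can be sketched outside the kernel. The following is a toy model under stated assumptions, not the kernel implementation: `toy_dst_ops`, `toy_dst`, `toy_dst_alloc` and `toy_dst_destroy` are hypothetical names, the limit check is a plain comparison rather than the kernel's garbage-collection path, and the model forwards the flag into the allocator (the `ip6_dst_alloc` hunk as posted still shows a literal `0` as the last `dst_alloc` argument). Only the flag value 0x0020 is taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_DST_NOCOUNT 0x0020	/* same value as DST_NOCOUNT in the patch */

/* Hypothetical stand-ins for struct dst_ops / struct dst_entry. */
struct toy_dst_ops {
	long entries;	/* counted entries only */
	long max_size;	/* assumed hard limit on counted entries */
};

struct toy_dst {
	struct toy_dst_ops *ops;
	unsigned int flags;
};

/* Allocate an entry. The limit is checked against the counter for every
 * allocation, but entries flagged TOY_DST_NOCOUNT never advance it. */
static struct toy_dst *toy_dst_alloc(struct toy_dst_ops *ops,
				     unsigned int flags)
{
	struct toy_dst *dst;

	assert(ops != NULL);
	if (ops->entries >= ops->max_size)
		return NULL;

	dst = malloc(sizeof(*dst));
	if (dst == NULL)
		return NULL;

	dst->ops = ops;
	dst->flags = flags;
	if (!(flags & TOY_DST_NOCOUNT))
		ops->entries++;
	return dst;
}

/* Free an entry, undoing the accounting only if it was counted. */
static void toy_dst_destroy(struct toy_dst *dst)
{
	if (dst == NULL)
		return;
	if (!(dst->flags & TOY_DST_NOCOUNT))
		dst->ops->entries--;
	free(dst);
}
```

In this model, routes added with `TOY_DST_NOCOUNT` (the analogue of `ip6_route_add()` in the patch) leave the counter at zero however many are installed, while entries allocated with flags 0 (the analogue of clones and address routes) still count toward the limit. The symmetric check in the destroy path matters: without it, freeing an uncounted entry would drive the counter negative.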
* Re: ipv6 hitting route max_size
From: Simon Kirby @ 2011-06-09 4:40 UTC
To: David Miller; +Cc: netdev, yoshfuji

On Tue, Jun 07, 2011 at 12:56:45AM -0700, David Miller wrote:

> From: Simon Kirby <sim@hostway.ca>
> Date: Mon, 6 Jun 2011 16:15:21 -0700
>
> > Ok, makes sense, but the result is now that ipv4 loads a full Internet
> > table with no adjustments, while ipv6 does not. Would it make sense to
> > change 4096 to 1048576, or would it be better to count only clones of
> > the actual route or something along those lines?
>
> Simon can you give this patch a try?

Didn't apply to 2.6.39, so I tried 3.0-rc2, but I get an Oops when
running the example reproduction case I gave before (

	for ((i = 0;i < 4200;i++)); do ip route add unreachable 2000::$i; done

) both with and without your patch applied:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffff8143e2b7>] ip6_route_add+0xe7/0x6b0
PGD 3ed7c8067 PUD 3ed5a1067 PMD 0
Oops: 0002 [#1] SMP
CPU 0
Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack tg3 e100 libphy

Pid: 8932, comm: ip Not tainted 3.0.0-rc2-amd64-net #1 To Be Filled By O.E.M. To Be Filled By O.E.M./TYAN High-End Dual AMD Opteron, S2882
RIP: 0010:[<ffffffff8143e2b7>] [<ffffffff8143e2b7>] ip6_route_add+0xe7/0x6b0
RSP: 0018:ffff8803e59939f8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8803e5993a58 RCX: 0000000000000038
RDX: 00000000000000a0 RSI: 0000000000000008 RDI: 00000000000000a0
RBP: ffffffff817b3300 R08: ffffffff816c8980 R09: 0000000000000000
R10: 0000000000000001 R11: dead000000200200 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00000000fffffff4
FS: 00007f5f11908700(0000) GS:ffff8803ffc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000003edfdd000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 8932, threadinfo ffff8803e5992000, task ffff8803ed70dfa0)
Stack:
 0000000000000000 0000000000000000 ffff8803eecd2a80 0000000000000000
 0000000000000000 ffff8803eedabe00 ffff8803ee671600 ffffffff813b8a30
 ffff8803fe00ac00 ffff8803e5993b50 0000000000000000 ffffffff8143e89c
Call Trace:
 [<ffffffff813b8a30>] ? rtnetlink_rcv+0x30/0x30
 [<ffffffff8143e89c>] ? inet6_rtm_newroute+0x1c/0x30
 [<ffffffff813cb3b9>] ? netlink_rcv_skb+0x89/0xb0
 [<ffffffff813b8a1f>] ? rtnetlink_rcv+0x1f/0x30
 [<ffffffff813cb013>] ? netlink_unicast+0x283/0x2d0
 [<ffffffff813cb930>] ? netlink_sendmsg+0x230/0x390
 [<ffffffff8139639b>] ? sock_sendmsg+0xab/0xe0
 [<ffffffff810925eb>] ? __alloc_pages_nodemask+0x10b/0x700
 [<ffffffff810a3fc2>] ? __do_fault+0x3e2/0x4c0
 [<ffffffff81395b9e>] ? move_addr_to_kernel+0x2e/0x40
 [<ffffffff813a1fd9>] ? verify_iovec+0x69/0xd0
 [<ffffffff813972e2>] ? __sys_sendmsg+0x172/0x300
 [<ffffffff81027465>] ? do_page_fault+0x1a5/0x430
 [<ffffffff813cb6be>] ? netlink_autobind+0x8e/0xd0
 [<ffffffff81395bfc>] ? move_addr_to_user+0x4c/0x60
 [<ffffffff81396f55>] ? sys_getsockname+0xd5/0xe0
 [<ffffffff81397634>] ? sys_sendmsg+0x44/0x80
 [<ffffffff814a35bb>] ? system_call_fastpath+0x16/0x1b
Code: 31 c9 31 d2 45 31 c0 31 f6 41 bf f4 ff ff ff e8 b0 2d f7 ff 48 8d 90 a0 00 00 00 49 89 c4 b9 38 00 00 00 31 c0 4d 85 e4 48 89 d7 <f3> ab 0f 84 06 03 00 00 66 41 c7 44 24 6a ff ff 31 c0 f6 43 16
RIP [<ffffffff8143e2b7>] ip6_route_add+0xe7/0x6b0
 RSP <ffff8803e59939f8>
CR2: 00000000000000a0
---[ end trace 370907621d87fefc ]---

I don't see many changes to ip6_route_add other than c3968a857a6b6c3.
Checking shortly once I get a git tree on this box, but no ipmi and I'm
remote at the moment.

Btw, maybe rt6_alloc_clone or rt6_alloc_cow needs to clear the
DST_NOCOUNT flag from rt->dst.flags for it to count any of them? Didn't
verify.

Simon-

* Re: ipv6 hitting route max_size
From: David Miller @ 2011-06-24 21:35 UTC
To: sim; +Cc: netdev, yoshfuji

From: Simon Kirby <sim@hostway.ca>
Date: Wed, 8 Jun 2011 21:40:42 -0700

> Didn't apply to 2.6.39, so I tried 3.0-rc2, but I get an Oops when
> running the example reproduction case I gave before (

I'll try to debug this.

* Re: ipv6 hitting route max_size
From: David Miller @ 2011-06-24 21:57 UTC
To: sim; +Cc: netdev, yoshfuji

From: Simon Kirby <sim@hostway.ca>
Date: Wed, 8 Jun 2011 21:40:42 -0700

> Didn't apply to 2.6.39, so I tried 3.0-rc2, but I get an Oops when
> running the example reproduction case I gave before (
>
> for ((i = 0;i < 4200;i++)); do ip route add unreachable 2000::$i; done
>
> ) both with and without your patch applied:

I tried to reproduce this with Linus's current tree but I cannot.

Here is what I did:

-------------------- hex.c --------------------
#include <stdio.h>

int main(void)
{
	int i;

	for (i = 0; i < 0x4200; i++) {
		printf("%04x ", i);
	}
	printf("\n");
	return 0;
}
-------------------- hex.c --------------------

bash$ gcc -o hex hex.c
bash$ for i in $(./hex); do ip route add unreachable 2000::$i; done
bash$

It takes a bit of time to run, but no crash. :-)

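One detail worth keeping in mind when comparing the two reproductions: the groups of an IPv6 address are hexadecimal, so the bash loop's decimal counter produces addresses such as 2000::4199 whose last group is 0x4199, while hex.c iterates up to 0x4200, which is 16896 routes rather than 4200. The sketch below only illustrates that arithmetic; the helper names `last_group_value` and `hex_suffix` and the two enum constants are hypothetical and appear nowhere in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Value of the last 16-bit group when `digits` is appended to "2000::".
 * IPv6 text groups are hexadecimal, so "4199" means 0x4199, not 4199. */
static unsigned long last_group_value(const char *digits)
{
	assert(digits != NULL);
	return strtoul(digits, NULL, 16);
}

/* Format route number i the way hex.c prints it ("%04x").
 * Returns the number of characters written, excluding the NUL. */
static int hex_suffix(unsigned int i, char *buf, size_t len)
{
	assert(buf != NULL && len >= 5);
	return snprintf(buf, len, "%04x", i);
}

/* Iteration counts of the two loops quoted in the thread. */
enum {
	BASH_LOOP_ROUTES = 4200,	/* for ((i = 0; i < 4200; i++)) */
	HEX_LOOP_ROUTES  = 0x4200	/* for (i = 0; i < 0x4200; i++) */
};
```

Both loops generate distinct, valid addresses, so either should exceed the 4096 default; the hex.c variant simply installs roughly four times as many routes.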