* [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table @ 2013-03-07 21:42 Hannes Frederic Sowa 2013-03-08 5:57 ` Hannes Frederic Sowa 0 siblings, 1 reply; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-07 21:42 UTC (permalink / raw) To: netdev; +Cc: eric.dumazet, yoshfuji ipv6_addr_hash can yield the exact same hash value for countless ip addresses from one /64 subnet. Let's make it a bit harder to create a long list of reassembly queues in the hash table by using a stronger hash. I spotted this just by looking at the source and did not verify if it is really the case, so this patch has RFC status. But the jhash change should not really hurt anyway. This patch is only compile tested. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> --- net/ipv6/reassembly.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 3c6a772..4d39cf3 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -79,7 +79,7 @@ unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr, { u32 c; - c = jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr), + c = jhash_3words(ipv6_addr_jhash(saddr), ipv6_addr_hash(daddr), (__force u32)id, rnd); return c & (INETFRAGS_HASHSZ - 1); -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-07 21:42 [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table Hannes Frederic Sowa @ 2013-03-08 5:57 ` Hannes Frederic Sowa 2013-03-08 13:04 ` Hannes Frederic Sowa 0 siblings, 1 reply; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-08 5:57 UTC (permalink / raw) To: netdev, eric.dumazet, yoshfuji On Thu, Mar 07, 2013 at 10:42:11PM +0100, Hannes Frederic Sowa wrote: > ipv6_addr_hash can yield the exact same hash value for countless ip > addresses from one /64 subnet. Let's make it a bit harder to create a > long list of reassembly queues in the hash table by using a stronger hash. > > I spotted this just by looking at the source and did not verify if it > is really the case, so this patch has RFC status. But the jhash change > should not really hurt anyway. This patch is only compile tested. Hmpf, I haven't seen Eric's change(279e9f2: ipv6: optimize inet6_hash_frag()) and tried to compare a v3.8 against a net-next kernel. At a first sight, it seems in some kind of denial of service scenario the relative amount of time spend in inet_frag_find increased, but I will do more comparable benchmarks later today (could be also because of other changes). I just wanted to let you know that I will do more research on this one and that you shouldn't apply this patch for now. I also noticed that this function is also used by netfilter in the forwarding path. Perhaps a jenkins hash on the destination address would be better, too. Perhaps a netfilter specific hash function could also be used. Greetings, Hannes ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 5:57 ` Hannes Frederic Sowa @ 2013-03-08 13:04 ` Hannes Frederic Sowa 2013-03-08 14:53 ` Eric Dumazet 0 siblings, 1 reply; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-08 13:04 UTC (permalink / raw) To: netdev, eric.dumazet, yoshfuji On Fri, Mar 08, 2013 at 06:57:18AM +0100, Hannes Frederic Sowa wrote: > On Thu, Mar 07, 2013 at 10:42:11PM +0100, Hannes Frederic Sowa wrote: > > ipv6_addr_hash can yield the exact same hash value for countless ip > > addresses from one /64 subnet. Let's make it a bit harder to create a > > long list of reassembly queues in the hash table by using a stronger hash. > > > > I spotted this just by looking at the source and did not verify if it > > is really the case, so this patch has RFC status. But the jhash change > > should not really hurt anyway. This patch is only compile tested. > > Hmpf, I haven't seen Eric's change(279e9f2: ipv6: optimize > inet6_hash_frag()) and tried to compare a v3.8 against a net-next > kernel. At a first sight, it seems in some kind of denial of service > scenario the relative amount of time spend in inet_frag_find increased, > but I will do more comparable benchmarks later today (could be also > because of other changes). I just wanted to let you know that I will do > more research on this one and that you shouldn't apply this patch for now. > > I also noticed that this function is also used by netfilter in the > forwarding path. Perhaps a jenkins hash on the destination address would > be better, too. Perhaps a netfilter specific hash function could also > be used. Getting benchmarks on my box is actually pretty hairy without burning it down. :) Generating a flow of fragmented packets with frag_id set to zero and ipv6 addresses which generate the same ipv6_addr_hash I gathered this from a debug printk in inet6_hash_frag (current version from net-next): [ 25.992202] FRAG6: hash -635286572 [ 25.992218] FRAG6: hash -635286572 [ 25.992237] FRAG6: hash -635286572 [ 25.992253] FRAG6: hash -635286572 [ 25.992273] FRAG6: hash -635286572 [ 25.992289] FRAG6: hash -635286572 [ 25.992309] FRAG6: hash -635286572 [ 25.992327] FRAG6: hash -635286572 [ 25.992347] FRAG6: hash -635286572 [ 25.992363] FRAG6: hash -635286572 [ 25.992382] FRAG6: hash -635286572 [ 25.992398] FRAG6: hash -635286572 With the patches reverted: [ 39.770790] FRAG6: hash 1323153854 [ 39.773691] FRAG6: hash 1323153854 [ 39.776413] FRAG6: hash 977187067 [ 39.778782] FRAG6: hash 977187067 [ 39.781298] FRAG6: hash -1226453562 [ 39.784043] FRAG6: hash -1226453562 [ 39.786443] FRAG6: hash -923346920 [ 39.789431] FRAG6: hash -923346920 [ 39.792205] FRAG6: hash 282862966 [ 39.794918] FRAG6: hash 282862966 [ 39.797749] FRAG6: hash -1059175404 [ 39.800054] FRAG6: hash -1059175404 [ 39.802904] FRAG6: hash -1085381868 It is pretty simple to generate a source of different addresses that yield to the same hash value. So I am in favour to revert the optimization changes in 279e9f2. It seems that even ipv6_addr_jhash is not strong enough when used in the forwarding path for reassembly, because of the xor of the most significant 32 bits of the address. Quick test program is here: https://gist.github.com/hannes/5116331 Ok if I send a patch later to revert this change? What do you think about increasing the size of the fragmentation queue hash table INETFRAGS_HASHSZ, too? Thanks, Hannes ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 13:04 ` Hannes Frederic Sowa @ 2013-03-08 14:53 ` Eric Dumazet 2013-03-08 15:08 ` Hannes Frederic Sowa 0 siblings, 1 reply; 25+ messages in thread From: Eric Dumazet @ 2013-03-08 14:53 UTC (permalink / raw) To: Hannes Frederic Sowa; +Cc: netdev, yoshfuji On Fri, 2013-03-08 at 14:04 +0100, Hannes Frederic Sowa wrote: > Ok if I send a patch later to revert this change? What do you think about > increasing the size of the fragmentation queue hash table INETFRAGS_HASHSZ, > too? Thats because reassembly hash was so small, and number of in flight reassembly is so small anyway, that I felt not worth having so many instructions to compute a hash that is truncated to 6 bits In real life I always advocate _not_ using fragments, I dont know why so many people try to used them. No matter how you hash, a hacker can easily fill your defrag unit with not complete datagrams, so what's the point ? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 14:53 ` Eric Dumazet @ 2013-03-08 15:08 ` Hannes Frederic Sowa 2013-03-08 15:23 ` Eric Dumazet 0 siblings, 1 reply; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-08 15:08 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, yoshfuji On Fri, Mar 08, 2013 at 06:53:06AM -0800, Eric Dumazet wrote: > No matter how you hash, a hacker can easily fill your defrag unit with > not complete datagrams, so what's the point ? I want to harden reassembly logic against all fragments being put in the same hash bucket because of malicious traffic and thus creating long list traversals in the fragment queue hash table. I totally agree that fragments should be avoided if possible. Thanks, Hannes ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 15:08 ` Hannes Frederic Sowa @ 2013-03-08 15:23 ` Eric Dumazet 2013-03-08 15:54 ` Hannes Frederic Sowa ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Eric Dumazet @ 2013-03-08 15:23 UTC (permalink / raw) To: Hannes Frederic Sowa; +Cc: netdev, yoshfuji On Fri, 2013-03-08 at 16:08 +0100, Hannes Frederic Sowa wrote: > On Fri, Mar 08, 2013 at 06:53:06AM -0800, Eric Dumazet wrote: > > No matter how you hash, a hacker can easily fill your defrag unit with > > not complete datagrams, so what's the point ? > > I want to harden reassembly logic against all fragments being put in > the same hash bucket because of malicious traffic and thus creating > long list traversals in the fragment queue hash table. Note that the long traversal was a real issue with TCP (thats why I introduced ipv6_addr_jhash()), as a single ehash slot could contains thousand of sockets. But with fragments, we should just limit the depth of any particular slot, and drop above a particular threshold. reassembly is a best effort mechanism, better make sure it doesnt use all our cpu cycles. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 15:23 ` Eric Dumazet @ 2013-03-08 15:54 ` Hannes Frederic Sowa 2013-03-08 16:15 ` Eric Dumazet 2013-03-09 15:19 ` Hannes Frederic Sowa 2013-03-08 20:53 ` Hannes Frederic Sowa 2013-03-13 1:27 ` Hannes Frederic Sowa 2 siblings, 2 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-08 15:54 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, yoshfuji On Fri, Mar 08, 2013 at 07:23:39AM -0800, Eric Dumazet wrote: > On Fri, 2013-03-08 at 16:08 +0100, Hannes Frederic Sowa wrote: > > On Fri, Mar 08, 2013 at 06:53:06AM -0800, Eric Dumazet wrote: > > > No matter how you hash, a hacker can easily fill your defrag unit with > > > not complete datagrams, so what's the point ? > > > > I want to harden reassembly logic against all fragments being put in > > the same hash bucket because of malicious traffic and thus creating > > long list traversals in the fragment queue hash table. > > Note that the long traversal was a real issue with TCP (thats why I > introduced ipv6_addr_jhash()), as a single ehash slot could contains > thousand of sockets. > > But with fragments, we should just limit the depth of any particular > slot, and drop above a particular threshold. > > reassembly is a best effort mechanism, better make sure it doesnt use > all our cpu cycles. Hm, I have to think about it, especially because it is used in the netfilter code. There could be some fairness issues if packets get dropped in netfilter if reassembly could not be performed. I'll check this. Btw. did s.o. have a look at skb->rxhash? I just started looking after its many users but perhaps someone did this job already. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 15:54 ` Hannes Frederic Sowa @ 2013-03-08 16:15 ` Eric Dumazet 2013-03-08 16:18 ` Hannes Frederic Sowa 2013-03-09 15:19 ` Hannes Frederic Sowa 1 sibling, 1 reply; 25+ messages in thread From: Eric Dumazet @ 2013-03-08 16:15 UTC (permalink / raw) To: Hannes Frederic Sowa; +Cc: netdev, yoshfuji On Fri, 2013-03-08 at 16:54 +0100, Hannes Frederic Sowa wrote: > Btw. did s.o. have a look at skb->rxhash? I just started looking after > its many users but perhaps someone did this job already. You mean for IPv6 ? Most of us use the rxhash provided by the NIC, our software fallback is a best effort as well for RPS or some qdisc. If some hacker specifically cook packets hashing to same rxhash, thats fine. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 16:15 ` Eric Dumazet @ 2013-03-08 16:18 ` Hannes Frederic Sowa 0 siblings, 0 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-08 16:18 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, yoshfuji On Fri, Mar 08, 2013 at 08:15:10AM -0800, Eric Dumazet wrote: > On Fri, 2013-03-08 at 16:54 +0100, Hannes Frederic Sowa wrote: > > > Btw. did s.o. have a look at skb->rxhash? I just started looking after > > its many users but perhaps someone did this job already. > > You mean for IPv6 ? Yes, because of the usage of ipv6_addr_hash in skb_flow_dissect. > Most of us use the rxhash provided by the NIC, our software fallback is > a best effort as well for RPS or some qdisc. > > If some hacker specifically cook packets hashing to same rxhash, thats > fine. Ok, good to know. Thanks! ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 15:54 ` Hannes Frederic Sowa 2013-03-08 16:15 ` Eric Dumazet @ 2013-03-09 15:19 ` Hannes Frederic Sowa 1 sibling, 0 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-09 15:19 UTC (permalink / raw) To: Eric Dumazet, netdev, yoshfuji On Fri, Mar 08, 2013 at 04:54:04PM +0100, Hannes Frederic Sowa wrote: > On Fri, Mar 08, 2013 at 07:23:39AM -0800, Eric Dumazet wrote: > > On Fri, 2013-03-08 at 16:08 +0100, Hannes Frederic Sowa wrote: > > > On Fri, Mar 08, 2013 at 06:53:06AM -0800, Eric Dumazet wrote: > > > > No matter how you hash, a hacker can easily fill your defrag unit with > > > > not complete datagrams, so what's the point ? > > > > > > I want to harden reassembly logic against all fragments being put in > > > the same hash bucket because of malicious traffic and thus creating > > > long list traversals in the fragment queue hash table. > > > > Note that the long traversal was a real issue with TCP (thats why I > > introduced ipv6_addr_jhash()), as a single ehash slot could contains > > thousand of sockets. > > > > But with fragments, we should just limit the depth of any particular > > slot, and drop above a particular threshold. > > > > reassembly is a best effort mechanism, better make sure it doesnt use > > all our cpu cycles. > > Hm, I have to think about it, especially because it is used in the netfilter > code. There could be some fairness issues if packets get dropped in netfilter > if reassembly could not be performed. I'll check this. There should be no fairness issues in the forwarding path because legitimate traffic should send a reasonable random 32bit fragmentation id which is a direct input to jhash. I thought about the list length limit this morning and I believe we have to dynamically compute it (maybe on sysctl change), because I bet people want to have their fragmentation cache utilized if they higher the memory thresholds (maybe dns resolvers with dnssec employed). The missing value is the average skb->truesize so I'll have to assume one. Any pointers on this? We could also export the limit to userspace and have the users deal with it. But I am not in favour of this solution. Should we do reclamation in the hash buckets (I would have to switch from hlist to list) or just drop incoming fragments (this should be fairly easy). Currently we discard fragments in lru fashion, so I think reclamation would be the way to go, too. Thanks, Hannes ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 15:23 ` Eric Dumazet 2013-03-08 15:54 ` Hannes Frederic Sowa @ 2013-03-08 20:53 ` Hannes Frederic Sowa 2013-03-13 1:27 ` Hannes Frederic Sowa 2 siblings, 0 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-08 20:53 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, yoshfuji On Fri, Mar 08, 2013 at 07:23:39AM -0800, Eric Dumazet wrote: > On Fri, 2013-03-08 at 16:08 +0100, Hannes Frederic Sowa wrote: > > On Fri, Mar 08, 2013 at 06:53:06AM -0800, Eric Dumazet wrote: > > > No matter how you hash, a hacker can easily fill your defrag unit with > > > not complete datagrams, so what's the point ? > > > > I want to harden reassembly logic against all fragments being put in > > the same hash bucket because of malicious traffic and thus creating > > long list traversals in the fragment queue hash table. > > Note that the long traversal was a real issue with TCP (thats why I > introduced ipv6_addr_jhash()), as a single ehash slot could contains > thousand of sockets. > > But with fragments, we should just limit the depth of any particular > slot, and drop above a particular threshold. > > reassembly is a best effort mechanism, better make sure it doesnt use > all our cpu cycles. On my VM I counted 17500 iterations in one hash bucket and maxing out one CPU until I got rcu stalls and NMIs. In comparison: If I use the old hasing code I have max iterations of 370 and don't expirience any rcu stalls or NMIs (seems to be around 17500/64+-epsilon). I have not yet drawn my conclusion on this, yet, but I agree some list length limiting code would be useful independent of the hash function. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-08 15:23 ` Eric Dumazet 2013-03-08 15:54 ` Hannes Frederic Sowa 2013-03-08 20:53 ` Hannes Frederic Sowa @ 2013-03-13 1:27 ` Hannes Frederic Sowa 2013-03-13 1:31 ` Hannes Frederic Sowa 2013-03-13 5:29 ` Eric Dumazet 2 siblings, 2 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-13 1:27 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, yoshfuji, brouer [cc'ing Jesper, too] On Fri, Mar 08, 2013 at 07:23:39AM -0800, Eric Dumazet wrote: > On Fri, 2013-03-08 at 16:08 +0100, Hannes Frederic Sowa wrote: > > On Fri, Mar 08, 2013 at 06:53:06AM -0800, Eric Dumazet wrote: > > > No matter how you hash, a hacker can easily fill your defrag unit with > > > not complete datagrams, so what's the point ? > > > > I want to harden reassembly logic against all fragments being put in > > the same hash bucket because of malicious traffic and thus creating > > long list traversals in the fragment queue hash table. > > Note that the long traversal was a real issue with TCP (thats why I > introduced ipv6_addr_jhash()), as a single ehash slot could contains > thousand of sockets. > > But with fragments, we should just limit the depth of any particular > slot, and drop above a particular threshold. [PATCH net-next RFC] inet: add max_depth to limit list length in inet_frags hash This does implement trivial drop for fragments where the hash queue is above some limit. I calculate the limit as follow: I averaged the folowing formula max_depth = max_threshold / INETFRAGS_HASHSZ / rounded up (SKB_TRUELEN(0) sizeof(struct ipq or struct frag_queue)) to max_threshold >> 15 So we start with a maximum list length of 128. I think we could halve this value to 64, but because I have no real performance data I left it at this higher value for now. This patch does only protect IPv6 (and not netfilter ipv6 defragmentation) and will switch off limit checking if max_depth is zero. I'll rewrite the check if we agree that this simple solution is the way to go (simple drop) and will clamp the minimum value to 1 as soon as I also migrated ipv4 and netfilter to the new sysctl handler. When testing this patch: Disable netfilter defragmenation for ipv6 on your machine if you test this patch, otherwise you won't see the improvment. Machine now runs smoothly under fragmentation dos. Ok if I target this patch for net next time because the hashing changes are in there already? Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> --- include/net/inet_frag.h | 13 +++++++++++++ net/ipv4/inet_fragment.c | 25 ++++++++++++++++++++++++- net/ipv6/reassembly.c | 6 +++++- 3 files changed, 42 insertions(+), 2 deletions(-) diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index 76c3fe5..9ba6ada 100644 --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -17,6 +17,7 @@ struct netns_frags { int timeout; int high_thresh; int low_thresh; + int max_depth; }; struct inet_frag_queue { @@ -43,6 +44,11 @@ struct inet_frag_queue { #define INETFRAGS_HASHSZ 64 +/* max_depth = max_threshold / INETFRAGS_HASHSZ / rounded up (SKB_TRUELEN(0) + + * sizeof(struct ipq or struct frag_queue)) + */ +#define INETFRAGS_MAXDEPTH_SHIFT 15 + struct inet_frags { struct hlist_head hash[INETFRAGS_HASHSZ]; /* This rwlock is a global lock (seperate per IPv4, IPv6 and @@ -144,4 +150,11 @@ static inline void inet_frag_lru_add(struct netns_frags *nf, list_add_tail(&q->lru_list, &nf->lru_list); spin_unlock(&nf->lru_lock); } + +#ifdef CONFIG_SYSCTL +int inet_frag_update_high_thresh(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos); +#endif + #endif diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 245ae07..92f1fdd 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -277,6 +277,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, __releases(&f->lock) { struct inet_frag_queue *q; + int depth = 0; hlist_for_each_entry(q, &f->hash[hash], list) { if (q->net == nf && f->match(q, key)) { @@ -284,9 +285,31 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, read_unlock(&f->lock); return q; } + depth++; } read_unlock(&f->lock); - return inet_frag_create(nf, f, key); + if (!nf->max_depth || depth <= nf->max_depth) + return inet_frag_create(nf, f, key); + else + return NULL; } EXPORT_SYMBOL(inet_frag_find); + +#ifdef CONFIG_SYSCTL +int inet_frag_update_high_thresh(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos) +{ + int ret; + ret = proc_dointvec(table, write, buffer, lenp, ppos); + + if (!ret && write && table->extra1) { + int *data = table->data; + int *max_depth = table->extra1; + *max_depth = *data >> INETFRAGS_MAXDEPTH_SHIFT; + } + return ret; +} +EXPORT_SYMBOL(inet_frag_update_high_thresh); +#endif diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 3c6a772..84b35f6 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -558,7 +558,8 @@ static struct ctl_table ip6_frags_ns_ctl_table[] = { .data = &init_net.ipv6.frags.high_thresh, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = inet_frag_update_high_thresh, + .extra1 = &init_net.ipv6.frags.max_depth }, { .procname = "ip6frag_low_thresh", @@ -600,6 +601,7 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net) goto err_alloc; table[0].data = &net->ipv6.frags.high_thresh; + table[0].extra1 = &net->ipv6.frags.max_depth; table[1].data = &net->ipv6.frags.low_thresh; table[2].data = &net->ipv6.frags.timeout; @@ -670,6 +672,8 @@ static int __net_init ipv6_frags_init_net(struct net *net) net->ipv6.frags.high_thresh = IPV6_FRAG_HIGH_THRESH; net->ipv6.frags.low_thresh = IPV6_FRAG_LOW_THRESH; net->ipv6.frags.timeout = IPV6_FRAG_TIMEOUT; + net->ipv6.frags.max_depth = + IPV6_FRAG_HIGH_THRESH >> INETFRAGS_MAXDEPTH_SHIFT; inet_frags_init_net(&net->ipv6.frags); -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-13 1:27 ` Hannes Frederic Sowa @ 2013-03-13 1:31 ` Hannes Frederic Sowa 2013-03-13 5:29 ` Eric Dumazet 1 sibling, 0 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-13 1:31 UTC (permalink / raw) To: Eric Dumazet, netdev, yoshfuji, brouer On Wed, Mar 13, 2013 at 02:27:15AM +0100, Hannes Frederic Sowa wrote: > Disable netfilter defragmenation for ipv6 on your machine if you test > this patch, otherwise you won't see the improvment. Machine now runs > smoothly under fragmentation dos. Just to clarify: When I wrote dos above, I actually meant a dos where the addresses have a common ipv6_addr_hash. A simple test tool I wrote for testing is here: https://gist.github.com/hannes/5116331 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-13 1:27 ` Hannes Frederic Sowa 2013-03-13 1:31 ` Hannes Frederic Sowa @ 2013-03-13 5:29 ` Eric Dumazet 2013-03-14 1:37 ` Hannes Frederic Sowa 1 sibling, 1 reply; 25+ messages in thread From: Eric Dumazet @ 2013-03-13 5:29 UTC (permalink / raw) To: Hannes Frederic Sowa; +Cc: netdev, yoshfuji, brouer On Wed, 2013-03-13 at 02:27 +0100, Hannes Frederic Sowa wrote: > > [PATCH net-next RFC] inet: add max_depth to limit list length in inet_frags hash > > This does implement trivial drop for fragments where the hash queue > is above some limit. > > I calculate the limit as follow: > > I averaged the folowing formula > > max_depth = max_threshold / INETFRAGS_HASHSZ / rounded up (SKB_TRUELEN(0) > sizeof(struct ipq or struct frag_queue)) > > to > > max_threshold >> 15 > > So we start with a maximum list length of 128. I think we could halve > this value to 64, but because I have no real performance data I left it > at this higher value for now. > > This patch does only protect IPv6 (and not netfilter ipv6 defragmentation) > and will switch off limit checking if max_depth is zero. I'll rewrite > the check if we agree that this simple solution is the way to go (simple > drop) and will clamp the minimum value to 1 as soon as I also migrated > ipv4 and netfilter to the new sysctl handler. > > When testing this patch: > > Disable netfilter defragmenation for ipv6 on your machine if you test > this patch, otherwise you won't see the improvment. Machine now runs > smoothly under fragmentation dos. > > Ok if I target this patch for net next time because the hashing changes > are in there already? > > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> > --- > include/net/inet_frag.h | 13 +++++++++++++ > net/ipv4/inet_fragment.c | 25 ++++++++++++++++++++++++- > net/ipv6/reassembly.c | 6 +++++- > 3 files changed, 42 insertions(+), 2 deletions(-) > > diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h > index 76c3fe5..9ba6ada 100644 > --- a/include/net/inet_frag.h > +++ b/include/net/inet_frag.h > @@ -17,6 +17,7 @@ struct netns_frags { > int timeout; > int high_thresh; > int low_thresh; > + int max_depth; > }; > > struct inet_frag_queue { > @@ -43,6 +44,11 @@ struct inet_frag_queue { > > #define INETFRAGS_HASHSZ 64 > > +/* max_depth = max_threshold / INETFRAGS_HASHSZ / rounded up (SKB_TRUELEN(0) + > + * sizeof(struct ipq or struct frag_queue)) > + */ > +#define INETFRAGS_MAXDEPTH_SHIFT 15 > + This looks like a bit complex to me, as we cant change INETFRAGS_HASHSZ from userland. I would just a use a predefined max depth of 128, or even less. If we have 128 items in a hash chain, then something is really wrong with our choices. > diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c > index 245ae07..92f1fdd 100644 > --- a/net/ipv4/inet_fragment.c > +++ b/net/ipv4/inet_fragment.c > @@ -277,6 +277,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, > __releases(&f->lock) > { > struct inet_frag_queue *q; > + int depth = 0; > > hlist_for_each_entry(q, &f->hash[hash], list) { > if (q->net == nf && f->match(q, key)) { > @@ -284,9 +285,31 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, > read_unlock(&f->lock); > return q; > } > + depth++; > } > read_unlock(&f->lock); > > - return inet_frag_create(nf, f, key); > + if (!nf->max_depth || depth <= nf->max_depth) > + return inet_frag_create(nf, f, key); > + else > + return NULL; > } I would issue a one one time warning in syslog when depth exceeds the limit. Thanks ! ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-13 5:29 ` Eric Dumazet @ 2013-03-14 1:37 ` Hannes Frederic Sowa 2013-03-14 4:36 ` Stephen Hemminger 2013-03-14 7:10 ` Jesper Dangaard Brouer 0 siblings, 2 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-14 1:37 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, yoshfuji, brouer On Wed, Mar 13, 2013 at 06:29:28AM +0100, Eric Dumazet wrote: > I would issue a one one time warning in syslog when depth exceeds the > limit. I addressed your suggestion to simplify this patch. I decided against a once message but used net_ratelimit() (as it was already used by the warning about no memory available). I don't have a strong opinion on that, just thought it could be a recurring event which would be worth reporting again because it should only happen on strange/malicous traffic patterns where admins should act. I based this patch on the net tree. Thanks! [PATCH net] inet: limit length of fragment queue hash table bucket lists This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is choosen somewhat arbitrary and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect from list iteration eating considerable amounts of cpu. If we reach the maximum length in one hash bucket a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> --- include/net/inet_frag.h | 30 ++++++++++++++++++++++++++++++ net/ipv4/inet_fragment.c | 7 ++++++- net/ipv4/ip_fragment.c | 9 ++------- net/ipv6/netfilter/nf_conntrack_reasm.c | 10 ++++------ net/ipv6/reassembly.c | 6 ++++-- 5 files changed, 46 insertions(+), 16 deletions(-) diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index 76c3fe5..0350468 100644 --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -43,6 +43,13 @@ struct inet_frag_queue { #define INETFRAGS_HASHSZ 64 +/* averaged: + * max_depth = default ipfrag_high_thresh / INETFRAGS_HASHSZ / + * rounded up (SKB_TRUELEN(0) + sizeof(struct ipq or + * struct frag_queue)) + */ +#define INETFRAGS_MAXDEPTH 128 + struct inet_frags { struct hlist_head hash[INETFRAGS_HASHSZ]; /* This rwlock is a global lock (seperate per IPv4, IPv6 and @@ -77,6 +84,29 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, struct inet_frags *f, void *key, unsigned int hash) __releases(&f->lock); +#define INET_FRAG_FIND_CHECK(val) \ + ({ \ + static const char ___mem[] = \ + KERN_ERR pr_fmt( \ + "inet_frag_find: No memory left." \ + " Dropping fragment.\n"); \ + static const char ___limit[] = \ + KERN_WARNING pr_fmt( \ + "inet_frag_find: Fragment hash bucket" \ + " list length grew above limit " \ + __stringify(INETFRAGS_MAXDEPTH) \ + ". Dropping fragment.\n"); \ + bool ___b = true; \ + if (IS_ERR_OR_NULL(val)) { \ + ___b = false; \ + if (PTR_ERR(val) == -ENOBUFS) \ + LIMIT_NETDEBUG(___limit); \ + else \ + LIMIT_NETDEBUG(___mem); \ + } \ + ___b; \ + }) + static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f) { if (atomic_dec_and_test(&q->refcnt)) diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 245ae07..0022a3e 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -277,6 +277,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, __releases(&f->lock) { struct inet_frag_queue *q; + int depth = 0; hlist_for_each_entry(q, &f->hash[hash], list) { if (q->net == nf && f->match(q, key)) { @@ -284,9 +285,13 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, read_unlock(&f->lock); return q; } + depth++; } read_unlock(&f->lock); - return inet_frag_create(nf, f, key); + if (depth <= INETFRAGS_MAXDEPTH) + return inet_frag_create(nf, f, key); + else + return ERR_PTR(-ENOBUFS); } EXPORT_SYMBOL(inet_frag_find); diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index b6d30ac..8533316 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -292,14 +292,9 @@ static inline struct ipq *ip_find(struct net *net, struct iphdr *iph, u32 user) hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol); q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash); - if (q == NULL) - goto out_nomem; - + if (!INET_FRAG_FIND_CHECK(q)) + return NULL; return container_of(q, struct ipq, q); - -out_nomem: - LIMIT_NETDEBUG(KERN_ERR pr_fmt("ip_frag_create: no memory left !\n")); - return NULL; } /* Is the fragment too far ahead to be part of ipq? */ diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 54087e9..f56468b 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -14,6 +14,8 @@ * 2 of the License, or (at your option) any later version. */ +#define pr_fmt(fmt) "IPv6-nf: " fmt + #include <linux/errno.h> #include <linux/types.h> #include <linux/string.h> @@ -180,13 +182,9 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id, q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash); local_bh_enable(); - if (q == NULL) - goto oom; - + if (!INET_FRAG_FIND_CHECK(q)) + return NULL; return container_of(q, struct frag_queue, q); - -oom: - return NULL; } diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 3c6a772..7dd0841 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -26,6 +26,9 @@ * YOSHIFUJI,H. @USAGI Always remove fragment header to * calculate ICV correctly. */ + +#define pr_fmt(fmt) "IPv6: " fmt + #include <linux/errno.h> #include <linux/types.h> #include <linux/string.h> @@ -185,9 +188,8 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src, const struct in6 hash = inet6_hash_frag(id, src, dst, ip6_frags.rnd); q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash); - if (q == NULL) + if (!INET_FRAG_FIND_CHECK(q)) return NULL; - return container_of(q, struct frag_queue, q); } -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 1:37 ` Hannes Frederic Sowa @ 2013-03-14 4:36 ` Stephen Hemminger 2013-03-14 7:14 ` Hannes Frederic Sowa 2013-03-14 7:10 ` Jesper Dangaard Brouer 1 sibling, 1 reply; 25+ messages in thread From: Stephen Hemminger @ 2013-03-14 4:36 UTC (permalink / raw) To: Eric Dumazet, netdev, yoshfuji, brouer > +#define INET_FRAG_FIND_CHECK(val) \ > + ({ \ > + static const char ___mem[] = \ > + KERN_ERR pr_fmt( \ > + "inet_frag_find: No memory left." \ > + " Dropping fragment.\n"); \ > + static const char ___limit[] = \ > + KERN_WARNING pr_fmt( \ > + "inet_frag_find: Fragment hash bucket" \ > + " list length grew above limit " \ > + __stringify(INETFRAGS_MAXDEPTH) \ > + ". Dropping fragment.\n"); \ > + bool ___b = true; \ > + if (IS_ERR_OR_NULL(val)) { \ > + ___b = false; \ > + if (PTR_ERR(val) == -ENOBUFS) \ > + LIMIT_NETDEBUG(___limit); \ > + else \ > + LIMIT_NETDEBUG(___mem); \ > + } \ > + ___b; \ > + }) > + Big macros suck, write it as an inline function or better yet a real function. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 4:36 ` Stephen Hemminger @ 2013-03-14 7:14 ` Hannes Frederic Sowa 2013-03-14 9:47 ` David Laight 0 siblings, 1 reply; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-14 7:14 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Eric Dumazet, netdev, yoshfuji, brouer On Wed, Mar 13, 2013 at 09:36:52PM -0700, Stephen Hemminger wrote: > > +#define INET_FRAG_FIND_CHECK(val) \ > > + ({ \ > > + static const char ___mem[] = \ > > + KERN_ERR pr_fmt( \ > > + "inet_frag_find: No memory left." \ > > + " Dropping fragment.\n"); \ > > + static const char ___limit[] = \ > > + KERN_WARNING pr_fmt( \ > > + "inet_frag_find: Fragment hash bucket" \ > > + " list length grew above limit " \ > > + __stringify(INETFRAGS_MAXDEPTH) \ > > + ". Dropping fragment.\n"); \ > > + bool ___b = true; \ > > + if (IS_ERR_OR_NULL(val)) { \ > > + ___b = false; \ > > + if (PTR_ERR(val) == -ENOBUFS) \ > > + LIMIT_NETDEBUG(___limit); \ > > + else \ > > + LIMIT_NETDEBUG(___mem); \ > > + } \ > > + ___b; \ > > + }) > > + > > Big macros suck, write it as an inline function or better yet a real function. I switched to the macro to have string expansion with pr_fmt. So it is visible from the dmesg if IPv4, IPv6 or IPv6-nf did generate the message. This could be done with a function, too, but would require a bit more string handling. ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 7:14 ` Hannes Frederic Sowa @ 2013-03-14 9:47 ` David Laight 2013-03-14 10:34 ` Eric Dumazet 2013-03-14 12:34 ` Hannes Frederic Sowa 0 siblings, 2 replies; 25+ messages in thread From: David Laight @ 2013-03-14 9:47 UTC (permalink / raw) To: Hannes Frederic Sowa, Stephen Hemminger Cc: Eric Dumazet, netdev, yoshfuji, brouer > On Wed, Mar 13, 2013 at 09:36:52PM -0700, Stephen Hemminger wrote: > > > +#define INET_FRAG_FIND_CHECK(val) \ > > > + ({ \ > > > + static const char ___mem[] = \ > > > + KERN_ERR pr_fmt( \ > > > + "inet_frag_find: No memory left." \ > > > + " Dropping fragment.\n"); \ > > > + static const char ___limit[] = \ > > > + KERN_WARNING pr_fmt( \ > > > + "inet_frag_find: Fragment hash bucket" \ > > > + " list length grew above limit " \ > > > + __stringify(INETFRAGS_MAXDEPTH) \ > > > + ". Dropping fragment.\n"); \ > > > + bool ___b = true; \ > > > + if (IS_ERR_OR_NULL(val)) { \ > > > + ___b = false; \ > > > + if (PTR_ERR(val) == -ENOBUFS) \ > > > + LIMIT_NETDEBUG(___limit); \ > > > + else \ > > > + LIMIT_NETDEBUG(___mem); \ > > > + } \ > > > + ___b; \ > > > + }) > > > + > > > > Big macros suck, write it as an inline function or better yet a real function. > > I switched to the macro to have string expansion with pr_fmt. So it is visible > from the dmesg if IPv4, IPv6 or IPv6-nf did generate the message. This could > be done with a function, too, but would require a bit more string handling. I'd guess it would be best to have the IS_ERR_OR_NULL() inline calling a real function on error. I'd also have thought that INETFRAGS_MAXDEPTH should be run-time tunable. David ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 9:47 ` David Laight @ 2013-03-14 10:34 ` Eric Dumazet 2013-03-14 12:34 ` Hannes Frederic Sowa 1 sibling, 0 replies; 25+ messages in thread From: Eric Dumazet @ 2013-03-14 10:34 UTC (permalink / raw) To: David Laight Cc: Hannes Frederic Sowa, Stephen Hemminger, netdev, yoshfuji, brouer On Thu, 2013-03-14 at 09:47 +0000, David Laight wrote: > I'd also have thought that INETFRAGS_MAXDEPTH should be run-time tunable. The useful thing would be to be able to resize the hash table, for some heavy user cases. Check net/ipv4/tcp_metrics.c : Not only we use a small depth, but is not tunable. #define TCP_METRICS_RECLAIM_DEPTH 5 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 9:47 ` David Laight 2013-03-14 10:34 ` Eric Dumazet @ 2013-03-14 12:34 ` Hannes Frederic Sowa 1 sibling, 0 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-14 12:34 UTC (permalink / raw) To: David Laight; +Cc: Stephen Hemminger, Eric Dumazet, netdev, yoshfuji, brouer On Thu, Mar 14, 2013 at 09:47:24AM -0000, David Laight wrote: > > On Wed, Mar 13, 2013 at 09:36:52PM -0700, Stephen Hemminger wrote: > > > > +#define INET_FRAG_FIND_CHECK(val) \ > > > > + ({ \ > > > > + static const char ___mem[] = \ > > > > + KERN_ERR pr_fmt( \ > > > > + "inet_frag_find: No memory left." \ > > > > + " Dropping fragment.\n"); \ > > > > + static const char ___limit[] = \ > > > > + KERN_WARNING pr_fmt( \ > > > > + "inet_frag_find: Fragment hash bucket" \ > > > > + " list length grew above limit " \ > > > > + __stringify(INETFRAGS_MAXDEPTH) \ > > > > + ". Dropping fragment.\n"); \ > > > > + bool ___b = true; \ > > > > + if (IS_ERR_OR_NULL(val)) { \ > > > > + ___b = false; \ > > > > + if (PTR_ERR(val) == -ENOBUFS) \ > > > > + LIMIT_NETDEBUG(___limit); \ > > > > + else \ > > > > + LIMIT_NETDEBUG(___mem); \ > > > > + } \ > > > > + ___b; \ > > > > + }) > > > > + > > > > > > Big macros suck, write it as an inline function or better yet a real function. > > > > I switched to the macro to have string expansion with pr_fmt. So it is visible > > from the dmesg if IPv4, IPv6 or IPv6-nf did generate the message. This could > > be done with a function, too, but would require a bit more string handling. > > I'd guess it would be best to have the IS_ERR_OR_NULL() inline calling a > real function on error. I'll explore later on how to achieve this. My rational was to not do dynamic string handling to build the error messages. Perhaps I can achieve both at the same time. :) > I'd also have thought that INETFRAGS_MAXDEPTH should be run-time tunable. 128 is actually a pretty high limit. Have you seen the patch I posted before this one? I tried to come up with a solution to dynamically calculate max_depth based on the max_thresh of the fragmentation cache. What do you think about that solution? Thanks! ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 1:37 ` Hannes Frederic Sowa 2013-03-14 4:36 ` Stephen Hemminger @ 2013-03-14 7:10 ` Jesper Dangaard Brouer 2013-03-14 7:23 ` Hannes Frederic Sowa 1 sibling, 1 reply; 25+ messages in thread From: Jesper Dangaard Brouer @ 2013-03-14 7:10 UTC (permalink / raw) To: Hannes Frederic Sowa; +Cc: Eric Dumazet, netdev, yoshfuji On Thu, 2013-03-14 at 02:37 +0100, Hannes Frederic Sowa wrote: > [PATCH net] inet: limit length of fragment queue hash table bucket lists > > This patch introduces a constant limit of the fragment queue hash > table bucket list lengths. Currently the limit 128 is choosen somewhat > arbitrary and just ensures that we can fill up the fragment cache with > empty packets up to the default ip_frag_high_thresh limits. It should > just protect from list iteration eating considerable amounts of cpu. > > If we reach the maximum length in one hash bucket a warning is printed. > This is implemented on the caller side of inet_frag_find to distinguish > between the different users of inet_fragment.c. I like the idea of having a safe guard on the fragment queue hash table bucket list lengths. But I'm considering another cleanup/evictor strategy, where we drop the LRU list, and do frag eviction on a hash bucket level (which will be more cache optimal). This strategy would also involve a list length limit. -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 7:10 ` Jesper Dangaard Brouer @ 2013-03-14 7:23 ` Hannes Frederic Sowa 2013-03-14 7:28 ` Hannes Frederic Sowa 0 siblings, 1 reply; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-14 7:23 UTC (permalink / raw) To: Jesper Dangaard Brouer; +Cc: Eric Dumazet, netdev, yoshfuji On Thu, Mar 14, 2013 at 08:10:40AM +0100, Jesper Dangaard Brouer wrote: > On Thu, 2013-03-14 at 02:37 +0100, Hannes Frederic Sowa wrote: > > > [PATCH net] inet: limit length of fragment queue hash table bucket lists > > > > This patch introduces a constant limit of the fragment queue hash > > table bucket list lengths. Currently the limit 128 is choosen somewhat > > arbitrary and just ensures that we can fill up the fragment cache with > > empty packets up to the default ip_frag_high_thresh limits. It should > > just protect from list iteration eating considerable amounts of cpu. > > > > If we reach the maximum length in one hash bucket a warning is printed. > > This is implemented on the caller side of inet_frag_find to distinguish > > between the different users of inet_fragment.c. > > I like the idea of having a safe guard on the fragment queue hash table > bucket list lengths. But I'm considering another cleanup/evictor > strategy, where we drop the LRU list, and do frag eviction on a hash > bucket level (which will be more cache optimal). This strategy would > also involve a list length limit. I would try to get a simple guard into v3.9. In 3.9 the hashing of the key of ipv6 fragments changed in such a way that an attacker could generate fragments which would end up in just one hash chain, thus eating a lot of cpu time because of list traversal. Later on, when you post your patches we could simply revert/update this safeguard to your version. Thanks, Hannes ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 7:23 ` Hannes Frederic Sowa @ 2013-03-14 7:28 ` Hannes Frederic Sowa 2013-03-14 9:18 ` Jesper Dangaard Brouer 0 siblings, 1 reply; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-14 7:28 UTC (permalink / raw) To: Jesper Dangaard Brouer, Eric Dumazet, netdev, yoshfuji On Thu, Mar 14, 2013 at 08:23:41AM +0100, Hannes Frederic Sowa wrote: > On Thu, Mar 14, 2013 at 08:10:40AM +0100, Jesper Dangaard Brouer wrote: > > On Thu, 2013-03-14 at 02:37 +0100, Hannes Frederic Sowa wrote: > > > > > [PATCH net] inet: limit length of fragment queue hash table bucket lists > > > > > > This patch introduces a constant limit of the fragment queue hash > > > table bucket list lengths. Currently the limit 128 is choosen somewhat > > > arbitrary and just ensures that we can fill up the fragment cache with > > > empty packets up to the default ip_frag_high_thresh limits. It should > > > just protect from list iteration eating considerable amounts of cpu. > > > > > > If we reach the maximum length in one hash bucket a warning is printed. > > > This is implemented on the caller side of inet_frag_find to distinguish > > > between the different users of inet_fragment.c. > > > > I like the idea of having a safe guard on the fragment queue hash table > > bucket list lengths. But I'm considering another cleanup/evictor > > strategy, where we drop the LRU list, and do frag eviction on a hash > > bucket level (which will be more cache optimal). This strategy would > > also involve a list length limit. > > I would try to get a simple guard into v3.9. In 3.9 the hashing of the key > of ipv6 fragments changed in such a way that an attacker could generate > fragments which would end up in just one hash chain, thus eating a lot > of cpu time because of list traversal. Later on, when you post your > patches we could simply revert/update this safeguard to your version. I just wanted to mention that if you plan to target v3.9 with some patches we could simply drop this patch. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 7:28 ` Hannes Frederic Sowa @ 2013-03-14 9:18 ` Jesper Dangaard Brouer 2013-03-14 12:45 ` Hannes Frederic Sowa 0 siblings, 1 reply; 25+ messages in thread From: Jesper Dangaard Brouer @ 2013-03-14 9:18 UTC (permalink / raw) To: Hannes Frederic Sowa; +Cc: Eric Dumazet, netdev, yoshfuji On Thu, 2013-03-14 at 08:28 +0100, Hannes Frederic Sowa wrote: > On Thu, Mar 14, 2013 at 08:23:41AM +0100, Hannes Frederic Sowa wrote: > > On Thu, Mar 14, 2013 at 08:10:40AM +0100, Jesper Dangaard Brouer wrote: > > > On Thu, 2013-03-14 at 02:37 +0100, Hannes Frederic Sowa wrote: > > > > > > > [PATCH net] inet: limit length of fragment queue hash table bucket lists > > > > > > > > This patch introduces a constant limit of the fragment queue hash > > > > table bucket list lengths. Currently the limit 128 is choosen somewhat > > > > arbitrary and just ensures that we can fill up the fragment cache with > > > > empty packets up to the default ip_frag_high_thresh limits. It should > > > > just protect from list iteration eating considerable amounts of cpu. > > > > > > > > If we reach the maximum length in one hash bucket a warning is printed. > > > > This is implemented on the caller side of inet_frag_find to distinguish > > > > between the different users of inet_fragment.c. > > > > > > I like the idea of having a safe guard on the fragment queue hash table > > > bucket list lengths. But I'm considering another cleanup/evictor > > > strategy, where we drop the LRU list, and do frag eviction on a hash > > > bucket level (which will be more cache optimal). This strategy would > > > also involve a list length limit. > > > > I would try to get a simple guard into v3.9. In 3.9 the hashing of the key > > of ipv6 fragments changed in such a way that an attacker could generate > > fragments which would end up in just one hash chain, thus eating a lot > > of cpu time because of list traversal. Later on, when you post your > > patches we could simply revert/update this safeguard to your version. > > I just wanted to mention that if you plan to target v3.9 with some patches we > could simply drop this patch. I will start working on this as soon as Netfilter Workshop is over and I have recovered from organizing this event. DaveM told me I needed to finish my frag patches first, before I could implement all the other cool ideas we have come up with during the workshop ;-) But I don't know if my frag changes can make v3.9, maybe v3.10 is more realistic? In which case we should use your patch to close this problem on v3.9 IMHO. -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table 2013-03-14 9:18 ` Jesper Dangaard Brouer @ 2013-03-14 12:45 ` Hannes Frederic Sowa 0 siblings, 0 replies; 25+ messages in thread From: Hannes Frederic Sowa @ 2013-03-14 12:45 UTC (permalink / raw) To: Jesper Dangaard Brouer; +Cc: Eric Dumazet, netdev, yoshfuji On Thu, Mar 14, 2013 at 10:18:10AM +0100, Jesper Dangaard Brouer wrote: > I will start working on this as soon as Netfilter Workshop is over and I > have recovered from organizing this event. DaveM told me I needed to > finish my frag patches first, before I could implement all the other > cool ideas we have come up with during the workshop ;-) Yeah, I have had a look at your patches, looks promising. Definitely worth that you get some pressure to finish them. :) > But I don't know if my frag changes can make v3.9, maybe v3.10 is more > realistic? In which case we should use your patch to close this problem > on v3.9 IMHO. Sounds good. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2013-03-14 12:45 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-03-07 21:42 [PATCH RFC] ipv6: use stronger hash for reassembly queue hash table Hannes Frederic Sowa 2013-03-08 5:57 ` Hannes Frederic Sowa 2013-03-08 13:04 ` Hannes Frederic Sowa 2013-03-08 14:53 ` Eric Dumazet 2013-03-08 15:08 ` Hannes Frederic Sowa 2013-03-08 15:23 ` Eric Dumazet 2013-03-08 15:54 ` Hannes Frederic Sowa 2013-03-08 16:15 ` Eric Dumazet 2013-03-08 16:18 ` Hannes Frederic Sowa 2013-03-09 15:19 ` Hannes Frederic Sowa 2013-03-08 20:53 ` Hannes Frederic Sowa 2013-03-13 1:27 ` Hannes Frederic Sowa 2013-03-13 1:31 ` Hannes Frederic Sowa 2013-03-13 5:29 ` Eric Dumazet 2013-03-14 1:37 ` Hannes Frederic Sowa 2013-03-14 4:36 ` Stephen Hemminger 2013-03-14 7:14 ` Hannes Frederic Sowa 2013-03-14 9:47 ` David Laight 2013-03-14 10:34 ` Eric Dumazet 2013-03-14 12:34 ` Hannes Frederic Sowa 2013-03-14 7:10 ` Jesper Dangaard Brouer 2013-03-14 7:23 ` Hannes Frederic Sowa 2013-03-14 7:28 ` Hannes Frederic Sowa 2013-03-14 9:18 ` Jesper Dangaard Brouer 2013-03-14 12:45 ` Hannes Frederic Sowa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).