From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [net-next PATCH 2/3] net: fix enforcing of fragment queue hash list depth
Date: Fri, 19 Apr 2013 03:11:27 -0700
Message-ID: <1366366287.3205.98.camel@edumazet-glaptop>
References: <20130418213637.14296.43143.stgit@dragon> <20130418213732.14296.36026.stgit@dragon>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: "David S. Miller", Hannes Frederic Sowa, netdev@vger.kernel.org
To: Jesper Dangaard Brouer
Return-path:
Received: from mail-pd0-f176.google.com ([209.85.192.176]:53413 "EHLO mail-pd0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754804Ab3DSKLb (ORCPT ); Fri, 19 Apr 2013 06:11:31 -0400
Received: by mail-pd0-f176.google.com with SMTP id r11so2137536pdi.7 for ; Fri, 19 Apr 2013 03:11:30 -0700 (PDT)
In-Reply-To: <20130418213732.14296.36026.stgit@dragon>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, 2013-04-18 at 23:38 +0200, Jesper Dangaard Brouer wrote:
> I have found an issue with commit:
>
> commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055
> Author: Hannes Frederic Sowa
> Date: Fri Mar 15 11:32:30 2013 +0000
>
> inet: limit length of fragment queue hash table bucket lists
>
> There is a connection between the fixed 128 hash depth limit and the
> frag mem limit/thresh settings, which limits how high the thresh can
> be set.
>
> The 128 elems hash depth limit results in bad behaviour if the mem limit
> thresholds are increased, via /proc/sys/net ::
>
> /proc/sys/net/ipv4/ipfrag_high_thresh
> /proc/sys/net/ipv4/ipfrag_low_thresh
>
> If we increase the thresh, to something allowing 128 elements in each
> bucket, which is not that high given the hash array size of 64
> (64*128=8192), e.g.
> big MTU frags (2944(truesize)+208(ipq))*8192(max elems)=25755648
> small frags ( 896(truesize)+208(ipq))*8192(max elems)=9043968
>
> The problem with commit 5a3da1fe (inet: limit length of fragment queue
> hash table bucket lists) is that, once we hit the limit, we *keep*
> the existing frag queues, not allowing new frag queues to be created.
> Thus, an attacker can effectively block handling of fragments for 30
> sec (as each frag queue has a timeout of 30 sec).
>
> Even without increasing the limit, as Hannes showed, an attacker on
> IPv6 can "attack" a specific hash bucket, and via that change, can
> block/drop new fragments also (trying to) utilize this bucket.
>
> Summary:
> With the default mem limit/thresh settings, this is not a general
> problem, but adjusting the thresh limits results in somewhat
> unexpected behavior.
>
> Proposed solution:
> IMHO instead of keeping existing frag queues, we should kill one of
> the frag queues in the hash instead.

This strategy won't really help against DDoS attacks. No frag will ever
complete.

I am not sure it's worth adding extra complexity.

>
> Implementation complications:
> Killing of frag queues while only holding the hash bucket lock, and
> not the frag queue lock, complicates the implementation, as we race
> and can end up (trying to) remove the hash element twice (resulting in
> an oops). This has been addressed by using hlist_del_init() and a
> hlist_unhashed() check in fq_unlink_hash().
>
> Extra:
> * Added new sysctl "max_hash_depth" option, to allow users to adjust the hash
> depth along with adjusting the thresh limits.
> * Change max hash depth to 32, thus limit handling to approx 2048 frag queues.
>
> Signed-off-by: Jesper Dangaard Brouer
> ---
>
> include/net/inet_frag.h                 |  9 +---
> net/ipv4/inet_fragment.c                | 64 ++++++++++++++++++++-----------
> net/ipv4/ip_fragment.c                  | 13 +++++-
> net/ipv6/netfilter/nf_conntrack_reasm.c |  5 +-
> net/ipv6/reassembly.c                   | 15 ++++++-
> 5 files changed, 68 insertions(+), 38 deletions(-)

Hmm... adding a new sysctl without documentation is a clear sign you'll
be the only user of it.

You are also setting a default limit of 32, which is more likely to hit
the problem than the current value of 128.

We know the real solution is to have a correctly sized hash table, so
why add a temporary sysctl?

As soon as /proc/sys/net/ipv4/ipfrag_high_thresh is changed, a resize
should be attempted.

But the max depth itself should be a reasonable value, and doesn't need
to be tuned IMHO.

The 64-slot hash table was chosen years ago, when machines had three
orders of magnitude less RAM than today.

Before hash resizing, I would just bump the hash size to something more
reasonable like 1024.

That would allow an admin to set /proc/sys/net/ipv4/ipfrag_high_thresh
to a quite large value.
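
To make the sizing argument concrete, here is a rough arithmetic sketch
of how bucket count, depth limit and ipfrag_high_thresh relate. It is
only an illustration, not kernel code: the helper names are hypothetical,
the per-queue cost uses the small-frag figures quoted in the patch
description (896 truesize + 208 ipq), and rounding the bucket count up to
a power of two is an assumption about how a resize might pick a size.

```python
# Back-of-the-envelope sizing for the fragment queue hash table.
# All helper names are hypothetical; the byte sizes are the small-frag
# figures quoted in the thread, not values read from a running kernel.

IPQ_SIZE = 208        # per-queue overhead (ipq), as quoted
SMALL_TRUESIZE = 896  # truesize of a small fragment, as quoted


def max_frag_queues(buckets, depth):
    """Upper bound on frag queues when every bucket is filled to the depth limit."""
    return buckets * depth


def mem_for_full_table(buckets, depth, truesize=SMALL_TRUESIZE, ipq=IPQ_SIZE):
    """Memory needed before the depth limit is reached in every bucket."""
    return max_frag_queues(buckets, depth) * (truesize + ipq)


def buckets_for_thresh(high_thresh, depth, truesize=SMALL_TRUESIZE, ipq=IPQ_SIZE):
    """Smallest power-of-two bucket count for which high_thresh is reached
    no later than the depth limit (assumed resize policy, not the kernel's)."""
    queues = high_thresh // (truesize + ipq)
    needed = -(-queues // depth)  # ceiling division
    size = 1
    while size < needed:
        size *= 2
    return size


if __name__ == "__main__":
    for buckets, depth in ((64, 128), (64, 32), (1024, 128)):
        print(buckets, depth,
              max_frag_queues(buckets, depth),
              mem_for_full_table(buckets, depth))
```

With 64 buckets, a depth of 128 caps the table at 8192 queues and a depth
of 32 caps it at 2048. With 1024 buckets and a depth of 128 the cap is
131072 queues, which leaves far more room for raising the thresh before
the depth limit gets in the way.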