Re: [net-next PATCH 2/3] net: fix enforcing of fragment queue hash list depth

Netdev List
 help / color / mirror / Atom feed

From: Eric Dumazet <eric.dumazet@gmail.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	netdev@vger.kernel.org
Subject: Re: [net-next PATCH 2/3] net: fix enforcing of fragment queue hash list depth
Date: Fri, 19 Apr 2013 03:11:27 -0700	[thread overview]
Message-ID: <1366366287.3205.98.camel@edumazet-glaptop> (raw)
In-Reply-To: <20130418213732.14296.36026.stgit@dragon>

On Thu, 2013-04-18 at 23:38 +0200, Jesper Dangaard Brouer wrote:
> I have found an issues with commit:
> 
>  commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055
>  Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
>  Date:   Fri Mar 15 11:32:30 2013 +0000
> 
>     inet: limit length of fragment queue hash table bucket lists
> 
> There is a connection between the fixed 128 hash depth limit and the
> frag mem limit/thresh settings, which limits how high the thresh can
> be set.
> 
> The 128 elems hash depth limit, results in bad behaviour if mem limit
> thresh holds are increased, via /proc/sys/net ::
> 
>  /proc/sys/net/ipv4/ipfrag_high_thresh
>  /proc/sys/net/ipv4/ipfrag_low_thresh
> 
> If we increase the thresh, to something allowing 128 elements in each
> bucket, which is not that high given the hash array size of 64
> (64*128=8192), e.g.
>   big MTU frags (2944(truesize)+208(ipq))*8192(max elems)=25755648
>   small frags   ( 896(truesize)+208(ipq))*8192(max elems)=9043968
> 
> The problem with commit 5a3da1fe (inet: limit length of fragment queue
> hash table bucket lists) is that, once we hit the limit, the we *keep*
> the existing frag queues, not allowing new frag queues to be created.
> Thus, an attacker can effectivly block handling of fragments for 30
> sec (as each frag queue have a timeout of 30 sec).
> 
> Even without increasing the limit, as Hannes showed, an attacker on
> IPv6 can "attack" a specific hash bucket, and via that change, can
> block/drop new fragments also (trying to) utilize this bucket.
> 
> Summary:
>  With the default mem limit/thresh settings, this is not general
> problem, but adjusting the thresh limits result in some-what
> unexpected behavior.
> 
> Proposed solution:
>  IMHO instead of keeping existing frag queues, we should kill one of
> the frag queues in the hash instead.


This strategy wont really help DDOS attacks. No frag will ever complete.

I am not sure its worth adding extra complexity.

> 
> Implementation complications:
>  Killing of frag queues while only holding the hash bucket lock, and
> not the frag queue lock, complicates the implementation, as we race
> and can end up (trying to) remove the hash element twice (resulting in
> an oops). This have been addressed by using hlist_del_init() and a
> hlist_unhashed() check in fq_unlink_hash().
> 
> Extra:
> * Added new sysctl "max_hash_depth" option, to allow users to adjust the hash
>   depth along with adjusting the thresh limits.
> * Change max hash depth to 32, thus limit handling to approx 2048 frag queues.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
> 
>  include/net/inet_frag.h                 |    9 +---
>  net/ipv4/inet_fragment.c                |   64 ++++++++++++++++++++-----------
>  net/ipv4/ip_fragment.c                  |   13 +++++-
>  net/ipv6/netfilter/nf_conntrack_reasm.c |    5 +-
>  net/ipv6/reassembly.c                   |   15 ++++++-
>  5 files changed, 68 insertions(+), 38 deletions(-)

Hmm... adding a new sysctl without documentation is a clear sign you'll
be the only user of it.

You are also setting a default limit of 32, more likely to hit the
problem than current 128 value.

We know the real solution is to have a correctly sized hash table, so
why adding a temporary sysctl ?

As soon as /proc/sys/net/ipv4/ipfrag_high_thresh is changed, a resize
should be attempted.

But the max depth itself should be a reasonable value, and doesn't need
to be tuned IMHO.

The 64 slots hash table was chosen years ago, when machines had 3 order
of magnitude less ram than today.

Before hash resizing, I would just bump hash size to something more
reasonable like 1024.

That would allow some admin to set /proc/sys/net/ipv4/ipfrag_high_thresh
to a quite large value.

next prev parent reply	other threads:[~2013-04-19 10:11 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-18 21:37 [net-next PATCH 0/3] net: frag code fixes and RFC for LRU removal Jesper Dangaard Brouer
2013-04-18 21:37 ` [net-next PATCH 1/3] net: fix race bug in fragmentation create code Jesper Dangaard Brouer
2013-04-19  1:00   ` Hannes Frederic Sowa
2013-04-19  8:09     ` Jesper Dangaard Brouer
2013-04-18 21:38 ` [net-next PATCH 2/3] net: fix enforcing of fragment queue hash list depth Jesper Dangaard Brouer
2013-04-19  0:52   ` Hannes Frederic Sowa
2013-04-19 10:11   ` Eric Dumazet [this message]
2013-04-19 10:41     ` David Laight
2013-04-19 11:14       ` Eric Dumazet
2013-04-19 12:19     ` Jesper Dangaard Brouer
2013-04-19 12:45       ` Hannes Frederic Sowa
2013-04-19 14:29         ` Jesper Dangaard Brouer
2013-04-19 15:06           ` Hannes Frederic Sowa
2013-04-19 19:44           ` Hannes Frederic Sowa
2013-04-22  9:10             ` Jesper Dangaard Brouer
2013-04-22 14:54               ` Hannes Frederic Sowa
2013-04-22 16:30                 ` Jesper Dangaard Brouer
2013-04-22 17:49                 ` Jesper Dangaard Brouer
2013-04-23  0:20                   ` Hannes Frederic Sowa
2013-04-23 14:19                     ` Jesper Dangaard Brouer
2013-04-23 20:54                       ` Hannes Frederic Sowa
2013-04-19 14:42       ` Eric Dumazet
2013-04-19 14:45       ` Eric Dumazet
2013-04-19 14:45       ` Eric Dumazet
2013-04-19 14:49       ` Eric Dumazet
2013-04-24 13:35         ` Jesper Dangaard Brouer
2013-04-24 15:05           ` Eric Dumazet
2013-04-18 21:39 ` [RFC net-next PATCH 3/3] net: remove fragmentation LRU list system Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1366366287.3205.98.camel@edumazet-glaptop \
    --to=eric.dumazet@gmail.com \
    --cc=brouer@redhat.com \
    --cc=davem@davemloft.net \
    --cc=hannes@stressinduktion.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox