Re: [net-next PATCH 2/3] net: fix enforcing of fragment queue hash list depth

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org
Subject: Re: [net-next PATCH 2/3] net: fix enforcing of fragment queue hash list depth
Date: Mon, 22 Apr 2013 19:49:12 +0200
Message-ID: <1366652952.26911.334.camel@localhost>
In-Reply-To: <20130422145431.GA26838@order.stressinduktion.org>

On Mon, 2013-04-22 at 16:54 +0200, Hannes Frederic Sowa wrote:
> On Mon, Apr 22, 2013 at 11:10:34AM +0200, Jesper Dangaard Brouer wrote:
> > (To avoid pissing people off) I acknowledge that we should change the
> > hash size, as it's ridiculously small with 64 entries.
> > 
> > But your mem limit assumption and hash depth limit assumptions are
> > broken, because the mem limit is per netns (network namespace).
> > Thus, starting more netns instances will break these assumptions.
> 
> Oh, I see. :/
> 
> At first I thought we should make the fragment hash per namespace too,
> to provide better isolation in case of lxc. But then each chrome tab
> would allocate its own fragment cache, too. Hmm... but people using
> namespaces have plenty much memory, don't they? We could also provide
> an inet_fragment namespace. ;)

I'm wondering if we could do the opposite, move the mem limit and LRU
list "out-of" the netns?
Either way, this would make the relationship of the mem limit and hash
size more sane.

> > The dangerous part of your change (commit 5a3da1fe) is that you keep the
> > existing frag queues (and don't allow new frag queues to be created).
> > The attacker's fragments will never finish (timeout 30 sec), while valid
> > fragments will complete and "exit" the queue, thus the end result is that
> > the hash bucket is filled with the attacker's invalid/incomplete fragments.
> 
> I would not mind if your change gets accepted (I have not completely
> reviewed it yet), but I have my doubts if it is an advantage to the
> current solution.
> 
> First off, I think an attacker can keep the fragment cache pretty much
> filled up with little cost. The current implementation has the grace
> period where no new fragments will be accepted after the DoS, this is
> solved by your patch. But the change makes it easier for an attacker to
> evict "valid" fragments from the cache in the first 30 seconds of the
> DoS, too.

The "grace period" (where no new fragments will be accepted) is quite
harmful.  By creating just 3 netns (3x 4MB mem limit) we can make all
queues reach 128 entries, resulting in a "grace period" of 30 sec where
no frags are possible. (min frag size is 1108 bytes with my trafgen
script).
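
As a rough sanity check on the "3 netns" figure, here is a small
back-of-the-envelope calculation. It is only a sketch: it assumes that
"4MB" means 4 MiB and that every attacker queue is charged exactly the
1108-byte minimum fragment size (the real accounting may charge more
per queue). The function names are made up for this illustration.

```python
import math

HASH_BUCKETS = 64                    # current hash size (from this thread)
MAX_DEPTH = 128                      # per-bucket depth limit (from this thread)
NETNS_MEM_LIMIT = 4 * 1024 * 1024    # 4MB per netns, ASSUMED to mean 4 MiB
MIN_FRAG_COST = 1108                 # bytes; ASSUMED memory charge per one-fragment queue


def queues_allowed(n_netns, limit=NETNS_MEM_LIMIT, cost=MIN_FRAG_COST):
    """Number of minimal one-fragment queues that n_netns mem limits permit."""
    return n_netns * (limit // cost)


def netns_needed(buckets=HASH_BUCKETS, depth=MAX_DEPTH,
                 limit=NETNS_MEM_LIMIT, cost=MIN_FRAG_COST):
    """Smallest number of netns whose combined mem limit is large enough
    to fill every bucket of the shared hash table up to the depth limit."""
    bytes_needed = buckets * depth * cost
    return math.ceil(bytes_needed / limit)


if __name__ == "__main__":
    print("slots in the shared hash table:", HASH_BUCKETS * MAX_DEPTH)
    for n in (1, 2, 3):
        print(n, "netns allow", queues_allowed(n), "minimal queues")
    print("netns needed to fill every bucket:", netns_needed())
```

Under these assumptions, one or two netns run into the mem limit before
all 64 x 128 slots are occupied, while three netns do not.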

> I am not sure whether the current fragmentation handling or your solution
> does perform better in real world (or if it actually matters).
> 
> Nonetheless it does add a bit more complexity and a new sysctl which does
> expose something the kernel should know how to do better.

Well, actually I don't like exposing the max_hash_depth sysctl, it was a
wrong idea/move.

I like Eric's idea of resizing the hash based on the max thresh,
unfortunately this does not make sense when the max thresh is per netns
and the hash table is global.
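
To illustrate the mismatch, a minimal sketch of what sizing the hash
from the max thresh could look like. This is not Eric's proposal as
posted; the sizing rule, the target depth of 8 and the function names
are assumptions made up for the example, and the 1108-byte cost is the
minimum fragment size from my trafgen test.

```python
def roundup_pow_of_two(n):
    """Round n up to the next power of two (returns 1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def buckets_for_thresh(high_thresh, min_frag_cost=1108, target_depth=8):
    """Hypothetical sizing rule: enough buckets that a table full of
    minimal queues averages about target_depth entries per bucket."""
    max_queues = high_thresh // min_frag_cost
    wanted = -(-max_queues // target_depth)   # ceiling division
    return roundup_pow_of_two(max(1, wanted))


def worst_case_avg_depth(n_netns, high_thresh, min_frag_cost=1108, target_depth=8):
    """Average depth when the table is sized for ONE netns' thresh but
    n_netns each fill their own mem limit into the shared table."""
    buckets = buckets_for_thresh(high_thresh, min_frag_cost, target_depth)
    return n_netns * (high_thresh // min_frag_cost) / buckets


if __name__ == "__main__":
    thresh = 4 * 1024 * 1024   # 4MB, assumed to mean 4 MiB
    print("buckets:", buckets_for_thresh(thresh))
    for n in (1, 3, 10):
        print(n, "netns -> avg depth", round(worst_case_avg_depth(n, thresh), 1))
```

With a global table, the depth the sizing rule aims for only holds for
a single netns; each extra netns adds its full thresh on top.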

I'm also wondering: is it really worth the complexity of having a depth
limit on this hash table?  Is it that important?  The mem limit should
at some point kick in and save the day anyhow (before, without per hash
bucket locking, it might have made sense).
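
To make the depth-limit trade-off concrete, here is a toy model of a
single hash bucket, comparing "refuse new queues when the bucket is
full" with "evict the oldest queue". It is a simplification, not the
kernel code: time moves in 1 sec steps, the attacker sends one burst at
t=0, valid datagrams complete instantly once admitted, and the burst
size and valid rate are arbitrary numbers picked for the example.
Because valid queues never linger in this model, it cannot show the
cost Hannes points out (an attacker evicting valid in-progress queues).

```python
from collections import deque

TIMEOUT = 30      # sec before an incomplete queue expires (from this thread)
MAX_DEPTH = 128   # per-bucket depth limit (from this thread)


def simulate(policy, seconds=60, attack_burst=200, valid_per_sec=5):
    """Return (accepted, refused) counts of valid datagrams.

    policy "refuse": a full bucket rejects any new queue.
    policy "evict":  a full bucket drops its oldest queue to make room.
    """
    bucket = deque()            # creation times of attacker queues, oldest first
    accepted = refused = 0
    for t in range(seconds):
        # Expire attacker queues that have been incomplete for TIMEOUT sec.
        while bucket and t - bucket[0] >= TIMEOUT:
            bucket.popleft()
        # The attacker sends its whole burst in the first second only.
        for _ in range(attack_burst if t == 0 else 0):
            if len(bucket) >= MAX_DEPTH:
                if policy == "refuse":
                    continue
                bucket.popleft()
            bucket.append(t)
        # Valid senders: an admitted datagram completes at once and leaves.
        for _ in range(valid_per_sec):
            if len(bucket) >= MAX_DEPTH:
                if policy == "refuse":
                    refused += 1
                    continue
                bucket.popleft()
            accepted += 1
    return accepted, refused


if __name__ == "__main__":
    for policy in ("refuse", "evict"):
        print(policy, simulate(policy))
```

In this model the "refuse" policy locks out valid traffic for the full
30 sec after a single burst, which is the "grace period" effect
described above.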

> > Besides, after we have implemented per hash bucket locking (in my change
> > commit 19952cc4 "net: frag queue per hash bucket locking").
> > Then, I don't think it is a big problem that a single hash bucket is
> > being "attacked".
> 
> I don't know, I wouldn't say so. The contention point is now the per
> hash bucket lock but it should show the same symptoms as before.
> 
> In my opinion we should start resizing the hash table irrespective of
> the namespace limits (one needs CAP_NET_ADMIN to connect a netns to
> the outside world, I think) and try to move forward with Patch 3. This
> patch 2 would then only be a dependency and would introduce the eviction
> strategy you need for patch 3. But the focus should be on the removal of the
> lru cleanup. What do you think?

I agree, increasing the hash table's size makes sense, as it's
ridiculously small with 64 entries.

Yes, removal of the (per netns) "global" LRU list should be the real
focus.  This was just a dependency for introducing the eviction strategy
I needed for patch 3.

But the eviction strategy in patch 3 is actually also "broken", because
the mem limit is per netns and we do eviction based on the hash table
shared between all netns... I'm thinking of going back to my original
idea of simply doing LRU lists per CPU.

--Jesper

Thread overview: 28+ messages
2013-04-18 21:37 [net-next PATCH 0/3] net: frag code fixes and RFC for LRU removal Jesper Dangaard Brouer
2013-04-18 21:37 ` [net-next PATCH 1/3] net: fix race bug in fragmentation create code Jesper Dangaard Brouer
2013-04-19  1:00   ` Hannes Frederic Sowa
2013-04-19  8:09     ` Jesper Dangaard Brouer
2013-04-18 21:38 ` [net-next PATCH 2/3] net: fix enforcing of fragment queue hash list depth Jesper Dangaard Brouer
2013-04-19  0:52   ` Hannes Frederic Sowa
2013-04-19 10:11   ` Eric Dumazet
2013-04-19 10:41     ` David Laight
2013-04-19 11:14       ` Eric Dumazet
2013-04-19 12:19     ` Jesper Dangaard Brouer
2013-04-19 12:45       ` Hannes Frederic Sowa
2013-04-19 14:29         ` Jesper Dangaard Brouer
2013-04-19 15:06           ` Hannes Frederic Sowa
2013-04-19 19:44           ` Hannes Frederic Sowa
2013-04-22  9:10             ` Jesper Dangaard Brouer
2013-04-22 14:54               ` Hannes Frederic Sowa
2013-04-22 16:30                 ` Jesper Dangaard Brouer
2013-04-22 17:49                 ` Jesper Dangaard Brouer [this message]
2013-04-23  0:20                   ` Hannes Frederic Sowa
2013-04-23 14:19                     ` Jesper Dangaard Brouer
2013-04-23 20:54                       ` Hannes Frederic Sowa
2013-04-19 14:42       ` Eric Dumazet
2013-04-19 14:45       ` Eric Dumazet
2013-04-19 14:45       ` Eric Dumazet
2013-04-19 14:49       ` Eric Dumazet
2013-04-24 13:35         ` Jesper Dangaard Brouer
2013-04-24 15:05           ` Eric Dumazet
2013-04-18 21:39 ` [RFC net-next PATCH 3/3] net: remove fragmentation LRU list system Jesper Dangaard Brouer