From: josh@joshtriplett.org
To: Thomas Graf <tgraf@suug.ch>
Cc: David Miller <davem@davemloft.net>,
	kaber@trash.net, paulmck@linux.vnet.ibm.com,
	alexei.starovoitov@gmail.com, herbert@gondor.apana.org.au,
	ying.xue@windriver.com, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org
Subject: Re: Ottawa and slow hash-table resize
Date: Tue, 24 Feb 2015 10:33:46 -0800
Message-ID: <20150224183346.GA10713@cloud>
In-Reply-To: <20150224175014.GA29802@casper.infradead.org>

On Tue, Feb 24, 2015 at 05:50:14PM +0000, Thomas Graf wrote:
> On 02/24/15 at 12:09pm, David Miller wrote:
> > And having a flood of 1 million new TCP connections all at once
> > shouldn't knock us over.
> > 
> > Therefore, we will need to find a way to handle this problem without
> > being able to block on insert.
> 
> One possible way to handle this is to have users like TCP grow
> more quickly than 2x. Maybe start with a 16x growth factor and
> shrink the factor slowly from there, using a log function. (No,
> we do not want rhashtable congestion control algos ;-)
> 
> > Thinking about this: if inserts occur during a pending resize and
> > the table's nelems has already exceeded even the grow threshold
> > for the new table, it makes no sense to allow these async inserts,
> > as they will only make the resize take longer and prolong the pain.
> 
> Let's say we start with an initial table size of 16K (we can make
> this dependent on system memory) and we grow by 8x. New inserts go
> into the new table immediately, so as soon as we have 12K entries
> we'll grow right to 128K buckets. As we grow above 75K we'll start
> growing to 1024K buckets. New entries already go to the 1024K
> buckets at this point, given that the first grow cycle should be
> fast. The 2nd grow cycle would take an estimated 6 RCU grace
> periods. This would also still give us a max of 8K bucket locks,
> which should be good enough as well.
> 
> Just thinking this out loud. Still working on this.

I agree.  Client systems should start with the smallest possible table
size and memory usage (just enough for dozens or hundreds of
connections), and possibly never grow past that.  Any system processing
a non-trivial number of connections, however, wants to grow very
quickly to a substantial number of buckets.  The unzipping algorithm
works just fine for any integer growth factor; it just gets a bit more
complicated.
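
To make that concrete, a growth schedule along those lines might look
like the following untested userspace sketch.  The 16x starting factor
and the halving breakpoints are made up for illustration (rhashtable
itself grows by 2x today):

/* Untested sketch: start with an aggressive growth factor and shrink
 * it as the table gets larger.  All constants here are invented. */
#include <stddef.h>

static size_t next_table_size(size_t cur_buckets)
{
        size_t factor = 16;

        /* Halve the growth factor for every factor-of-256 increase in
         * table size, but never grow by less than 2x. */
        for (size_t s = cur_buckets; s >= 256 && factor > 2; s >>= 8)
                factor >>= 1;

        return cur_buckets * factor;    /* sizes stay powers of two */
}

With these made-up breakpoints, a 16K-bucket table would grow by 8x to
128K, much like the example above, while very large tables would fall
back toward 2x.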

One nice thing is that the resize algorithm very quickly allocates the
new buckets and sets up the head pointers such that the new table can be
used for inserts almost immediately, *without* a synchronize_rcu.  Only
the bucket unzipping process takes a non-trivial amount of time
(including one or more synchronize_rcu calls).  And the newly inserted
entries will go directly to the appropriate buckets, so they'll take
advantage of the larger table size.
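
In rough single-threaded C (names invented; the real code of course
does this under RCU with per-bucket locks), the insert path during a
resize looks something like:

#include <stddef.h>

struct entry {
        struct entry *next;
        unsigned int hash;
};

struct bucket_table {
        size_t size;                    /* buckets, power of two */
        struct entry **buckets;
};

struct table {
        struct bucket_table *tbl;       /* current table */
        struct bucket_table *future_tbl;/* non-NULL during a resize */
};

static void insert(struct table *ht, struct entry *e)
{
        /* New entries go straight into the future table, so they land
         * in their final buckets and never need to be unzipped. */
        struct bucket_table *tbl = ht->future_tbl ? ht->future_tbl
                                                  : ht->tbl;
        size_t idx = e->hash & (tbl->size - 1);

        e->next = tbl->buckets[idx];
        tbl->buckets[idx] = e;
}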

> > On one hand I like the async resize because it means that an insert
> > that triggers the resize doesn't incur a huge latency spike since
> > it was simply unlucky to be the resize trigger event.  The async
> > resize smooths out the cost of the resize across the system.
> > 
> > This scheme works really well if, on average, the resize operation
> > completes before enough subsequent inserts occur to exceed even
> > the resized table's resize threshold.
> > 
> > So I think what I'm getting at is that we can allow parallel inserts
> > but only up to the point where the resized table's thresholds are
> > exceeded.
> > 
> > Looking at how to implement this, I think that there is too much
> > configurability to this code.  There is no reason to have indirect
> > calls for the grow decision.  This should be a quick test, but it's
> > not, because we go through ->grow_decision.  It should just be
> > rht_grow_above_75 or whatever, and inline this crap!
> > 
> > Nobody even uses this indirection capability; it's therefore
> > over-engineered :-)
> 
> Another option is to call the grow_decision only once every N
> inserts or removals (32? 64?) and handle updates in batches.

If we have a means of tracking the number of inserts, we already have
everything needed to make the decision, which is just a single
comparison.  There's no need to batch, since deciding *whether* to
check would itself require a comparison.
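
Something like this untested sketch, which mirrors the shape of
rht_grow_above_75() with simplified stand-in types:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ht {
        atomic_size_t nelems;   /* current number of entries */
        size_t size;            /* current number of buckets */
};

static inline bool grow_above_75(struct ht *ht)
{
        /* Expand once the table exceeds 75% load: nelems > 3/4 * size.
         * One atomic read and one compare per insert; cheap enough
         * that batching the check buys nothing. */
        return atomic_load(&ht->nelems) > ht->size / 4 * 3;
}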

I do think this should just use the same growth function everywhere
until a user comes along that needs something different.

- Josh Triplett

