Re: Ottawa and slow hash-table resize

From: Thomas Graf <tgraf@suug.ch>
To: David Miller <davem@davemloft.net>
Cc: kaber@trash.net, paulmck@linux.vnet.ibm.com,
	josh@joshtriplett.org, alexei.starovoitov@gmail.com,
	herbert@gondor.apana.org.au, ying.xue@windriver.com,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org
Subject: Re: Ottawa and slow hash-table resize
Date: Tue, 24 Feb 2015 17:50:14 +0000
Message-ID: <20150224175014.GA29802@casper.infradead.org>
In-Reply-To: <20150224.120944.866231994361475327.davem@davemloft.net>

On 02/24/15 at 12:09pm, David Miller wrote:
> And having a flood of 1 million new TCP connections all at once
> shouldn't knock us over.
> 
> Therefore, we will need to find a way to handle this problem without
> being able to block on insert.

One possible way to handle this is to have users like TCP grow
quicker than 2x. Maybe start with 16x and grow slower and slower
using a log function. (No, we do not want rhashtable congestion
control algos ;-)
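
A minimal sketch of what such a decaying growth factor could look
like. Everything here is an assumption for illustration: the names
grow_shift() and next_size(), the BASE_ORDER cutoff, and the rule of
dropping one bit of shift for every four doublings of the table.
None of it is existing rhashtable code.

```c
#include <stddef.h>

/* Assumed cutoff: tables of up to 2^10 buckets grow by the full 16x. */
#define BASE_ORDER 10

/* Hypothetical helper: floor(log2(n)) for n > 0, and 0 for n == 0. */
static unsigned int ilog2_floor(size_t n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/*
 * Hypothetical policy: return the shift to apply on the next grow.
 * Small tables grow by 16x (shift 4). The shift drops by one for
 * every four doublings beyond BASE_ORDER and never goes below 1,
 * so large tables fall back to plain 2x growth.
 */
static unsigned int grow_shift(size_t buckets)
{
	unsigned int order = ilog2_floor(buckets);
	unsigned int steps;

	if (order <= BASE_ORDER)
		return 4;

	steps = (order - BASE_ORDER) / 4;
	return steps >= 3 ? 1 : 4 - steps;
}

/* Bucket count after one grow step (overflow is not handled here). */
static size_t next_size(size_t buckets)
{
	return buckets << grow_shift(buckets);
}
```

With these assumed constants a 16K table grows by 8x to 128K, a 128K
table grows by 8x to 1024K, and anything from 4M buckets up only
doubles.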

> Thinking about this, if inserts occur during a pending resize, if the
> nelems of the table has exceeded even the grow threshold for the new
> table, it makes no sense to allow these async inserts as they are
> going to make the resize take longer and prolong the pain.

Let's say we start with an initial table size of 16K (we can make
this system memory dependent) and we grow by 8x. New inserts go
into the new table immediately, so as soon as we have 12K entries
we'll grow right to 128K buckets. As we grow above 75K we'll start
growing to 1024K buckets. New entries already go to the 1024K
buckets at this point, given that the first grow cycle should be
fast. The 2nd grow cycle would take an estimated 6 RCU grace
periods. This would also still give us a max of 8K bucket locks,
which should be good enough as well.
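
The arithmetic above can be sketched as follows, assuming a strict
75% grow threshold and a fixed 8x step. The function names and
GROW_SHIFT are made up for this example. Note that with an exact
75% threshold the 128K table is left at 96K entries, somewhat above
the 75K figure quoted above.

```c
#include <stddef.h>

/* Assumed fixed growth step: 8x per grow cycle. */
#define GROW_SHIFT 3

/* Nonzero once nelems exceeds 75% of the bucket count. */
static int grow_above_75(size_t nelems, size_t buckets)
{
	return nelems > buckets / 4 * 3;
}

/*
 * Hypothetical helper: size of the table that new inserts should
 * target for a given element count, following chained grows until
 * the element count is back under the 75% threshold.
 */
static size_t target_buckets(size_t nelems, size_t buckets)
{
	while (grow_above_75(nelems, buckets))
		buckets <<= GROW_SHIFT;
	return buckets;
}
```

Starting from 16K buckets, the first entry past 12K targets the 128K
table, and the first entry past 96K targets the 1024K table.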

Just thinking this out loud. Still working on this.

> On one hand I like the async resize because it means that an insert
> that triggers the resize doesn't incur a huge latency spike since
> it was simply unlucky to be the resize trigger event.  The async
> resize smoothes out the cost of the resize across the system.
> 
> This scheme works really well if, on average, the resize operation
> completes before enough subsequent inserts occur to exceed even
> the resized tables resize threshold.
> 
> So I think what I'm getting at is that we can allow parallel inserts
> but only up until the point where the resized tables thresholds are
> exceeded.
> 
> Looking at how to implement this, I think that there is too much
> configurability to this code.  There is no reason to have indirect
> calls for the grow decision.  This should be a quick test, but it's
> not because we go through ->grow_decision.  It should just be
> rht_grow_above_75 or whatever, and inline this crap!
> 
> Nobody even uses this indirection capability, it's therefore over
> engineered :-)

Another option is to only call the grow_decision once every N inserts
or removals (32? 64?) and handle updates as batches. No objection
to ditching the grow/shrink function for now though. Not sure
anyone actually needs different growth semantics.
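
A rough sketch of the batching idea, with the threshold test inlined
rather than called through a function pointer. The struct, the
field names, and the CHECK_BATCH value of 64 are all assumptions for
illustration, and the sketch ignores locking and atomics entirely.

```c
#include <stddef.h>

/* Assumed batch size: re-evaluate the threshold every 64 updates. */
#define CHECK_BATCH 64

/* Hypothetical stand-in for the table state, not the real struct. */
struct table_sketch {
	size_t buckets;           /* current bucket count */
	size_t nelems;            /* current element count */
	unsigned int since_check; /* updates since the last threshold test */
	int wants_grow;           /* result of the last threshold test */
};

/* Inlined 75% test, no indirect call involved. */
static inline int sketch_should_grow(const struct table_sketch *t)
{
	return t->nelems > t->buckets / 4 * 3;
}

/*
 * Account for one insert (is_insert != 0) or one removal.
 * Returns 1 if this update triggered a threshold test, 0 if it was
 * only counted towards the current batch.
 */
static int account_update(struct table_sketch *t, int is_insert)
{
	if (is_insert)
		t->nelems++;
	else if (t->nelems)
		t->nelems--;

	if (++t->since_check < CHECK_BATCH)
		return 0;

	t->since_check = 0;
	t->wants_grow = sketch_should_grow(t);
	return 1;
}
```

The trade-off is that the table can overshoot the threshold by up to
CHECK_BATCH - 1 entries before a grow is requested.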
