Re: Ottawa and slow hash-table resize

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: David Miller <davem@davemloft.net>
To: paulmck@linux.vnet.ibm.com
Cc: tgraf@suug.ch, josh@joshtriplett.org,
	alexei.starovoitov@gmail.com, herbert@gondor.apana.org.au,
	kaber@trash.net, ying.xue@windriver.com, netdev@vger.kernel.org,
	netfilter-devel@vger.kernel.org
Subject: Re: Ottawa and slow hash-table resize
Date: Mon, 23 Feb 2015 17:32:52 -0500 (EST)	[thread overview]
Message-ID: <20150223.173252.397503088215638994.davem@davemloft.net> (raw)
In-Reply-To: <20150223215248.GA15405@linux.vnet.ibm.com>

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Mon, 23 Feb 2015 13:52:49 -0800

> On Mon, Feb 23, 2015 at 09:03:58PM +0000, Thomas Graf wrote:
>> On 02/23/15 at 11:12am, josh@joshtriplett.org wrote:
>> > In theory, resizes should only take the locks for the buckets they're
>> > currently unzipping, and adds should take those same locks.  Neither one
>> > should take a whole-table lock, other than resize excluding concurrent
>> > resizes.  Is that still insufficient?
>> 
>> Correct, this is what happens. The problem is basically that
>> if we insert from atomic context we cannot slow down inserts
>> and the table may not grow quickly enough.
>> 
>> > Yeah, the add/remove statistics used for tracking would need some
>> > special handling to avoid being a table-wide bottleneck.
>> 
>> Daniel is working on a patch to do per-cpu element counting
>> with a batched update cycle.
> 
> One approach is simply to count only when a resize operation is in
> flight.  Another is to keep a per-bucket count, which can be summed
> at the beginning of the next resize operation.

I think we should think exactly about what we should do when someone
loops non-stop adding 1 million entries to the hash table and the
initial table size is very small.

This is a common use case for at least one of the current rhashtable
users (nft_hash).  When you load an nftables rule with a large set
of IP addresses attached, this is what happens.

Yes I understand that nftables could give a hint and start with a
larger hash size from the start when it knows this is going to happen,
but I still believe that we should behave reasonably when starting
from a small table.

I'd say that with the way things work right now, in this situation it
actually hurts to allow asynchronous inserts during a resize.  Because
we end up with extremely long hash table chains, and thus make the
resize work and the lookups both take an excruciatingly long amount of
time to complete.

I just did a quick scan of all code paths that do inserts into an
rhashtable, and it seems like all of them can easily block.  So why
don't we do that?  Make inserts sleep on an rhashtable expansion
waitq.

There could even be a counter of pending inserts, so the expander can
decide to expand further before waking the inserting threads up.

next prev parent reply	other threads:[~2015-02-23 22:32 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-23 18:49 Ottawa and slow hash-table resize Paul E. McKenney
2015-02-23 19:12 ` josh
2015-02-23 21:03   ` Thomas Graf
2015-02-23 21:52     ` Paul E. McKenney
2015-02-23 22:32       ` David Miller [this message]
2015-02-23 23:06         ` Paul E. McKenney
2015-02-24  8:37           ` Thomas Graf
2015-02-24 10:39             ` Patrick McHardy
2015-02-24 10:46               ` David Laight
2015-02-24 10:48                 ` Patrick McHardy
2015-02-24 17:09               ` David Miller
2015-02-24 17:50                 ` Thomas Graf
2015-02-24 18:26                   ` David Miller
2015-02-24 18:45                     ` josh
2015-02-24 22:34                       ` Thomas Graf
2015-02-25  8:56                         ` Herbert Xu
2015-02-25 17:38                           ` Thomas Graf
2015-02-24 18:33                   ` josh
2015-02-25  8:55                 ` Herbert Xu
2015-02-25 17:38                   ` Thomas Graf
2015-02-23 21:00 ` Thomas Graf
2015-02-23 22:35   ` Paul E. McKenney
2015-02-24  8:59 ` Thomas Graf
2015-02-24  9:38   ` Daniel Borkmann
2015-02-24 10:42     ` Patrick McHardy
2015-02-24 16:14       ` Josh Hunt
2015-02-24 16:25         ` Patrick McHardy
2015-02-24 16:57           ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2015-02-23 22:17 Alexei Starovoitov
2015-02-23 22:34 ` David Miller
2015-02-23 22:37 ` Paul E. McKenney
2015-02-23 23:07 Alexei Starovoitov
2015-02-23 23:15 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150223.173252.397503088215638994.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=alexei.starovoitov@gmail.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=josh@joshtriplett.org \
    --cc=kaber@trash.net \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tgraf@suug.ch \
    --cc=ying.xue@windriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).