netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josh Triplett <josh@joshtriplett.org>
To: Patrick McHardy <kaber@trash.net>
Cc: pablo@netfilter.org, netfilter-devel@vger.kernel.org,
	paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH] netfilter: nft_hash: bug fixes and resizing
Date: Thu, 27 Feb 2014 07:47:46 -0800	[thread overview]
Message-ID: <20140227154746.GA26756@thin> (raw)
In-Reply-To: <1393512980-28780-2-git-send-email-kaber@trash.net>

On Thu, Feb 27, 2014 at 02:56:20PM +0000, Patrick McHardy wrote:
> The hash set type is very broken and was never meant to be merged in this
> state. Missing RCU synchronization on element removal, leaking chain
> refcounts when used as a verdict map, races during lookups, a fixed table
> size are probably just some of the problems. Luckily it is currently
> never chosen by the kernel when the rbtree type is also available.
> 
> Rewrite it to be usable.
> 
> The new implementation supports automatic hash table resizing using RCU,
> based on Paul McKenney's and Josh Triplett's algorithm "Optimized Resizing
> For RCU-Protected Hash Tables" described in [1].
> 
> Resizing doesn't require a second list head in the elements, it works by
> chosing a hash function that remaps elements to a predictable set of buckets,
> only resizing by integral factors and
> 
> - during expansion: linking new buckets to the old bucket that contains
>   elements for any of the new buckets, thereby creating imprecise chains,
>   then incrementally seperating the elements until the new buckets only
>   contain elements that hash directly to them.
> 
> - during shrinking: linking the hash chains of all old buckets that hash
>   to the same new bucket to form a single chain.
> 
> Expansion requires at most the number of elements in the longest hash chain
> grace periods, shrinking requires a single grace period.
> 
> Due to the requirement of having hash chains/elements linked to multiple
> buckets during resizing, homemade single linked lists are used instead of
> the existing list helpers, that don't support this in a clean fashion.
> As a side effect, the amount of memory required per element is reduced by
> one pointer.
> 
> Expansion is triggered when the load factors exceeds 75%, shrinking when
> the load factor goes below 30%. Both operations are allowed to fail and
> will be retried on the next insertion or removal if their respective
> conditions still hold.

Since this hash table uses chaining, does it really make sense to expand
at 75% load?  You might just want to expand at 100%, which is even
easier to check for.  But that seems like a question for benchmarks to
answer.

> [1] http://dl.acm.org/citation.cfm?id=2002181.2002192
> 
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Signed-off-by: Patrick McHardy <kaber@trash.net>

This looks correct to me.  Thank you very much for this work!

One suggestion and one minor micro-optimization (unrelated to the
algorithm implementation) below.  With those changed:
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

> +	nft_hash_for_each_entry(he, he->next) {
> +		if (nft_hash_data(&he->key, ntbl->size, set->klen) != h)
> +			continue;
> +		next = he;
> +		break;
> +	}

> +		nft_hash_for_each_entry(he, tbl->buckets[h]) {
> +			if (nft_hash_data(&he->key, ntbl->size, set->klen) != i)
> +				continue;
> +			RCU_INIT_POINTER(ntbl->buckets[i], he);
> +			break;
> +		}

In both of these cases, you could reverse the sense of the if with
s/!=/==/, move the "statement; break;" into the if, and the continue
would become redundant.  (You then wouldn't even need the braces around
the loop body.)

Second, in the load-factor calculation:

> +	/* Expand table when exceeding 75% load */
> +	if (tbl->elements * 4 / 3 > tbl->size)
> +		nft_hash_tbl_expand(set, priv);

I just checked, and GCC ends up using an imul to implement this, due to
the division by 3.  I'd suggest rewriting it to:

if (tbl->elements > (tbl->size >> 2) * 3)

Dividing tbl->size by 4 first avoids the possibility of integer
overflow, and GCC translates the *3 into a single lea instruction.

(Also, do you need an NFT_HASH_MAX_SIZE here?)

Similar considerations apply to the calculation for shrinking.

- Josh Triplett

  reply	other threads:[~2014-02-27 15:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-27 14:56 [PATCH] netfilter: hash resizing Patrick McHardy
2014-02-27 14:56 ` [PATCH] netfilter: nft_hash: bug fixes and resizing Patrick McHardy
2014-02-27 15:47   ` Josh Triplett [this message]
2014-02-27 16:06     ` Patrick McHardy
2014-02-27 19:47   ` Paul E. McKenney
2014-02-28 11:28     ` Patrick McHardy
2014-02-28 16:09       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140227154746.GA26756@thin \
    --to=josh@joshtriplett.org \
    --cc=kaber@trash.net \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).