Re: RFC: nftables set selection

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Patrick McHardy <kaber@trash.net>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: RFC: nftables set selection
Date: Thu, 23 Jan 2014 11:14:29 +0100	[thread overview]
Message-ID: <20140123101429.GA4391@localhost> (raw)
In-Reply-To: <20140122105136.GB8764@macbook.localnet>

On Wed, Jan 22, 2014 at 10:51:37AM +0000, Patrick McHardy wrote:
> On Wed, Jan 22, 2014 at 11:12:59AM +0100, Pablo Neira Ayuso wrote:
> 
> Well, assuming (for a constant set) userspace provides the information:
> 
> - key size 4 bytes
> - number of keys 64
> - key range 192.168.0.0-192.168.1.200
> - number of clusters (just for distance 256 included here) 2
> 
> So the kernel would ask each set for an appromixation of expected
> memory use and performance:
> 
> - rbtree:
>   mem = sizeof(struct nft_rbtree) + 64 * sizeof(struct nft_ebtree_elem)
>       = 3080b
>   lookup = O(log N)
> 
> - hash:
>   size = sizeof(struct nft_hash) + 4/3 * 64 * sizeof(struct hlist_head) +
>          64 * sizeof(struct nft_hash_elem)
>        = 2746b
>   lookup = O(1)
> 
> - amtrie:
>   size = sizeof(struct nft_amtrie) +
> 	 sizeof(struct nft_amtrie_node) +
>          num_clusters * (key_size - 2) * (sizeof(struct nft_amtrie_node) +
> 					  sizeof(struct nft_amtrie_node *) +
> 	 num_clusters * (sizeof(struct nft_amtrie_node)
> 	
>        = 280b
> 
>   lookup = O(1)

Thanks for the detailed example.

> Some explanation here: the calculation is a worst case calculation since
> it can't determine exactly how much memory will be used. Each node on
> each level can store up to 256 elements or 8 bits of the key.
>
> Intermediate nodes have one pointer for each child node, leaf nodes only
> contain the bitmap. Since we don't know whether the leaf nodes
> (clusters of size 256) can share intermediate nodes, we assume they can't.
>
> So there is the tree root, pointing to a root node, with paths to
> two levels of intermediate nodes, echo one pointing to a leaf node.
>
> - obmap:
>   not usable since range too large

What information the obmap uses to decide if the range fits?

> So based on both size and memory usage the array mapped trie would win.
> This doesn't factor in constant costs, which are also smaller for the
> array mapped trie than for hashes. For this I'll probably just use some
> constant factor.

I see, with the new amtrie we need the memory consumption
approximation, otherwise I guess we would need to simulate the
addition of that element to know the precise memory consumption that
resulted from inserting that element.

> > Another possibility that I've been considering is that nft makes the
> > set type selection and tuning from userspace. The idea is that the
> > kernel provides a listing via netlink of the existing set types,
> > including a description of the available set type parameters and
> > performance information (memory consumption per element, lookup time
> > performance metrics). Based on that information and the abstract set
> > description, userspace can decide what set type suits better for your
> > needs.
> 
> That's probably even harder than the other way around. As you can see
> in the amtrie example, but it applies to obmap even more, there is not
> always a constant cost per element. The obmap has completely constant
> costs, but is only usable for small ranges, the amtrie has memory costs
> which depend on the amount of clusters of size 256.
>
> I think its not too unreasonable to expect that most algorithms we would
> use require similar information. Compressed tries would probably slightly
> differ, but we'd be either using trees, tries or bitmaps I suppose.
>
> > One of the things that I like the most in nftables is that the
> > "intelligence" resides in userspace, so, once we have the generic
> > kernel infrastructure in place, you only need to update your userspace
> > tool to get new features. Moreover, if we improve the set selection
> > logic from userspace, users won't need to wait until distributors
> > upgrade their kernel to get a better set selection/tuning and I think
> > we'll end up having quite some debate on the set selection algorithm
> > as it will be the key to improve performance.
> > 
> > So, why don't we perform this set selection/tuning from userspace?
> 
> I believe statistical properties of data can be described easier than
> the properties of the algorithms, especially wrt. memory usage.

Wouldn't the approximated memory consumption and lookup be enough to
perform a good set selection from userspace?

next prev parent reply	other threads:[~2014-01-23 10:32 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-21 17:07 RFC: nftables set selection Patrick McHardy
2014-01-22 10:12 ` Pablo Neira Ayuso
2014-01-22 10:51   ` Patrick McHardy
2014-01-23 10:14     ` Pablo Neira Ayuso [this message]
2014-01-23 11:16       ` Patrick McHardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140123101429.GA4391@localhost \
    --to=pablo@netfilter.org \
    --cc=kaber@trash.net \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).