All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Patrick McHardy <kaber@trash.net>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: RFC: nftables set selection
Date: Thu, 23 Jan 2014 11:14:29 +0100	[thread overview]
Message-ID: <20140123101429.GA4391@localhost> (raw)
In-Reply-To: <20140122105136.GB8764@macbook.localnet>

On Wed, Jan 22, 2014 at 10:51:37AM +0000, Patrick McHardy wrote:
> On Wed, Jan 22, 2014 at 11:12:59AM +0100, Pablo Neira Ayuso wrote:
> 
> Well, assuming (for a constant set) userspace provides the information:
> 
> - key size 4 bytes
> - number of keys 64
> - key range 192.168.0.0-192.168.1.200
> - number of clusters (just for distance 256 included here) 2
> 
> So the kernel would ask each set for an appromixation of expected
> memory use and performance:
> 
> - rbtree:
>   mem = sizeof(struct nft_rbtree) + 64 * sizeof(struct nft_ebtree_elem)
>       = 3080b
>   lookup = O(log N)
> 
> - hash:
>   size = sizeof(struct nft_hash) + 4/3 * 64 * sizeof(struct hlist_head) +
>          64 * sizeof(struct nft_hash_elem)
>        = 2746b
>   lookup = O(1)
> 
> - amtrie:
>   size = sizeof(struct nft_amtrie) +
> 	 sizeof(struct nft_amtrie_node) +
>          num_clusters * (key_size - 2) * (sizeof(struct nft_amtrie_node) +
> 					  sizeof(struct nft_amtrie_node *) +
> 	 num_clusters * (sizeof(struct nft_amtrie_node)
> 	
>        = 280b
> 
>   lookup = O(1)

Thanks for the detailed example.

> Some explanation here: the calculation is a worst case calculation since
> it can't determine exactly how much memory will be used. Each node on
> each level can store up to 256 elements or 8 bits of the key.
>
> Intermediate nodes have one pointer for each child node, leaf nodes only
> contain the bitmap. Since we don't know whether the leaf nodes
> (clusters of size 256) can share intermediate nodes, we assume they can't.
>
> So there is the tree root, pointing to a root node, with paths to
> two levels of intermediate nodes, echo one pointing to a leaf node.
>
> - obmap:
>   not usable since range too large

What information the obmap uses to decide if the range fits?

> So based on both size and memory usage the array mapped trie would win.
> This doesn't factor in constant costs, which are also smaller for the
> array mapped trie than for hashes. For this I'll probably just use some
> constant factor.

I see, with the new amtrie we need the memory consumption
approximation, otherwise I guess we would need to simulate the
addition of that element to know the precise memory consumption that
resulted from inserting that element.

> > Another possibility that I've been considering is that nft makes the
> > set type selection and tuning from userspace. The idea is that the
> > kernel provides a listing via netlink of the existing set types,
> > including a description of the available set type parameters and
> > performance information (memory consumption per element, lookup time
> > performance metrics). Based on that information and the abstract set
> > description, userspace can decide what set type suits better for your
> > needs.
> 
> That's probably even harder than the other way around. As you can see
> in the amtrie example, but it applies to obmap even more, there is not
> always a constant cost per element. The obmap has completely constant
> costs, but is only usable for small ranges, the amtrie has memory costs
> which depend on the amount of clusters of size 256.
>
> I think its not too unreasonable to expect that most algorithms we would
> use require similar information. Compressed tries would probably slightly
> differ, but we'd be either using trees, tries or bitmaps I suppose.
>
> > One of the things that I like the most in nftables is that the
> > "intelligence" resides in userspace, so, once we have the generic
> > kernel infrastructure in place, you only need to update your userspace
> > tool to get new features. Moreover, if we improve the set selection
> > logic from userspace, users won't need to wait until distributors
> > upgrade their kernel to get a better set selection/tuning and I think
> > we'll end up having quite some debate on the set selection algorithm
> > as it will be the key to improve performance.
> > 
> > So, why don't we perform this set selection/tuning from userspace?
> 
> I believe statistical properties of data can be described easier than
> the properties of the algorithms, especially wrt. memory usage.

Wouldn't the approximated memory consumption and lookup be enough to
perform a good set selection from userspace?

  reply	other threads:[~2014-01-23 10:32 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-21 17:07 RFC: nftables set selection Patrick McHardy
2014-01-22 10:12 ` Pablo Neira Ayuso
2014-01-22 10:51   ` Patrick McHardy
2014-01-23 10:14     ` Pablo Neira Ayuso [this message]
2014-01-23 11:16       ` Patrick McHardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140123101429.GA4391@localhost \
    --to=pablo@netfilter.org \
    --cc=kaber@trash.net \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.