From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Patrick McHardy <kaber@trash.net>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: RFC: nftables set selection
Date: Thu, 23 Jan 2014 11:14:29 +0100 [thread overview]
Message-ID: <20140123101429.GA4391@localhost> (raw)
In-Reply-To: <20140122105136.GB8764@macbook.localnet>
On Wed, Jan 22, 2014 at 10:51:37AM +0000, Patrick McHardy wrote:
> On Wed, Jan 22, 2014 at 11:12:59AM +0100, Pablo Neira Ayuso wrote:
>
> Well, assuming (for a constant set) userspace provides the information:
>
> - key size 4 bytes
> - number of keys 64
> - key range 192.168.0.0-192.168.1.200
> - number of clusters (just for distance 256 included here) 2
>
> So the kernel would ask each set for an appromixation of expected
> memory use and performance:
>
> - rbtree:
> mem = sizeof(struct nft_rbtree) + 64 * sizeof(struct nft_ebtree_elem)
> = 3080b
> lookup = O(log N)
>
> - hash:
> size = sizeof(struct nft_hash) + 4/3 * 64 * sizeof(struct hlist_head) +
> 64 * sizeof(struct nft_hash_elem)
> = 2746b
> lookup = O(1)
>
> - amtrie:
> size = sizeof(struct nft_amtrie) +
> sizeof(struct nft_amtrie_node) +
> num_clusters * (key_size - 2) * (sizeof(struct nft_amtrie_node) +
> sizeof(struct nft_amtrie_node *) +
> num_clusters * (sizeof(struct nft_amtrie_node)
>
> = 280b
>
> lookup = O(1)
Thanks for the detailed example.
> Some explanation here: the calculation is a worst case calculation since
> it can't determine exactly how much memory will be used. Each node on
> each level can store up to 256 elements or 8 bits of the key.
>
> Intermediate nodes have one pointer for each child node, leaf nodes only
> contain the bitmap. Since we don't know whether the leaf nodes
> (clusters of size 256) can share intermediate nodes, we assume they can't.
>
> So there is the tree root, pointing to a root node, with paths to
> two levels of intermediate nodes, echo one pointing to a leaf node.
>
> - obmap:
> not usable since range too large
What information the obmap uses to decide if the range fits?
> So based on both size and memory usage the array mapped trie would win.
> This doesn't factor in constant costs, which are also smaller for the
> array mapped trie than for hashes. For this I'll probably just use some
> constant factor.
I see, with the new amtrie we need the memory consumption
approximation, otherwise I guess we would need to simulate the
addition of that element to know the precise memory consumption that
resulted from inserting that element.
> > Another possibility that I've been considering is that nft makes the
> > set type selection and tuning from userspace. The idea is that the
> > kernel provides a listing via netlink of the existing set types,
> > including a description of the available set type parameters and
> > performance information (memory consumption per element, lookup time
> > performance metrics). Based on that information and the abstract set
> > description, userspace can decide what set type suits better for your
> > needs.
>
> That's probably even harder than the other way around. As you can see
> in the amtrie example, but it applies to obmap even more, there is not
> always a constant cost per element. The obmap has completely constant
> costs, but is only usable for small ranges, the amtrie has memory costs
> which depend on the amount of clusters of size 256.
>
> I think its not too unreasonable to expect that most algorithms we would
> use require similar information. Compressed tries would probably slightly
> differ, but we'd be either using trees, tries or bitmaps I suppose.
>
> > One of the things that I like the most in nftables is that the
> > "intelligence" resides in userspace, so, once we have the generic
> > kernel infrastructure in place, you only need to update your userspace
> > tool to get new features. Moreover, if we improve the set selection
> > logic from userspace, users won't need to wait until distributors
> > upgrade their kernel to get a better set selection/tuning and I think
> > we'll end up having quite some debate on the set selection algorithm
> > as it will be the key to improve performance.
> >
> > So, why don't we perform this set selection/tuning from userspace?
>
> I believe statistical properties of data can be described easier than
> the properties of the algorithms, especially wrt. memory usage.
Wouldn't the approximated memory consumption and lookup be enough to
perform a good set selection from userspace?
next prev parent reply other threads:[~2014-01-23 10:32 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-21 17:07 RFC: nftables set selection Patrick McHardy
2014-01-22 10:12 ` Pablo Neira Ayuso
2014-01-22 10:51 ` Patrick McHardy
2014-01-23 10:14 ` Pablo Neira Ayuso [this message]
2014-01-23 11:16 ` Patrick McHardy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140123101429.GA4391@localhost \
--to=pablo@netfilter.org \
--cc=kaber@trash.net \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).