From: Joanne Koong <joannekoong@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation
Date: Mon, 20 Sep 2021 15:52:50 -0700 [thread overview]
Message-ID: <17d7b319-01d0-163e-57b6-c385d38cc9ad@fb.com> (raw)
In-Reply-To: <CAEf4BzaA2QCmcc0nZqNbAqMdabqhjE5X_Nh59QjP8kd=iGH5GA@mail.gmail.com>
My previous email replied to Alexei's email before I saw Andrii's new
email, so please
feel free to disregard my previous email.
On 9/20/21 1:58 PM, Andrii Nakryiko wrote:
> On Fri, Sep 17, 2021 at 6:08 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Mon, Sep 13, 2021 at 09:04:30PM -0700, Joanne Koong wrote:
>>> +
>>> +/* For bloom filter maps, the next 4 bits represent how many hashes to use.
>>> + * The maximum number of hash functions supported is 15. If this is not set,
>>> + * the default number of hash functions used will be 5.
>>> + */
>>> + BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
>>> + BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
>>> + BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
>>> + BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),
>> The bit selection is unintuitive.
>> Since key_size has to be zero may be used that instead to indicate the number of hash
>> functions in the rare case when 5 is not good enough?
> Hm... I was initially thinking about proposing something like that,
> but it felt a bit ugly at the time. But now thinking about this a bit
> more, I think this would be a bit more meaningful if we change the
> terminology a bit. Instead of saying that Bloom filter has values and
> no keys, we actually have keys and no values. So all those bytes that
> are hashed are treated as keys (which is actually how sets are
> implemented on top of maps, where you have keys and no values, or at
> least the value is always true).
>
> So with that we'll have key/key_size to specify number of bytes that
> needs to be hashed (and it's type info). And then we can squint a bit
> and say that number of hashes are specified by value_size, as in
> values are those nr_hash bits that we set in Bloom filter.
>
> Still a bit of terminology stretch, but won't necessitate those
> specialized fields just for Bloom filter map. But if default value is
> going to be good enough for most cases and most cases won't need to
> adjust number of hashes, this seems to be pretty clean to me.
With having bloom filter map keys instead of values, I think this would
lead to messier code in the kernel for handling map_lookup_elem
and map_update_elem calls, due to the fact that the bloom filter map
is a non-associative map and the current APIs for non-associative map types
(peek_elem/push_elem/pop_elem) all have the map data as the value and
not the key.
For example, for map_update_elem, the API from the eBPF program side is
int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64
flags);
This would require us to either
a) Add some custom logic in syscall.c so that we bypass the
copy_from_bpfptr call on
bloom filter map values (necessary because memcpying 0 bytes still
requires the src pointer
to be valid), which would allow us to pass in a NULL value
b) Add a new function like
int (*map_push_key)(struct bpf_map *map, void *key, u64 flags)
that eBPF programs can call instead of map_update_elem.
or
c) Try to repurpose the existing map_push_elem API by passing in the key
instead of the value,
which would lead to inconsistent use of the API
I think if we could change the non-associative map types (currently only
stack maps and queue
maps, I believe) to have their data be a key instead of a value, and
have the pop/peek APIs use
keys instead of values, then this would be cleaner, since we could then
just use the existing peek/pop
APIs.
>> Or use inner_map_fd since there is no possibility of having an inner map in bloomfilter.
>> It could be a union:
>> __u32 max_entries; /* max number of entries in a map */
>> __u32 map_flags; /* BPF_MAP_CREATE related
>> * flags defined above.
>> */
>> union {
>> __u32 inner_map_fd; /* fd pointing to the inner map */
>> __u32 nr_hash_funcs; /* or number of hash functions */
>> };
> This works as well. A bit more Bloom filter-only terminology
> throughout UAPI and libbpf, but I'd be fine with that as well.
>
Great, it looks like this is the consensus - I will go with this option
for v3!
>> __u32 numa_node; /* numa node */
>>
>>> +struct bpf_bloom_filter {
>>> + struct bpf_map map;
>>> + u32 bit_array_mask;
>>> + u32 hash_seed;
>>> + /* If the size of the values in the bloom filter is u32 aligned,
>>> + * then it is more performant to use jhash2 as the underlying hash
>>> + * function, else we use jhash. This tracks the number of u32s
>>> + * in an u32-aligned value size. If the value size is not u32 aligned,
>>> + * this will be 0.
>>> + */
>>> + u32 aligned_u32_count;
>> what is the performance difference?
>> May be we enforce 4-byte sized value for simplicity?
> This might be a bit too surprising, especially if keys are just some
> strings, where people might not expect that it has to 4-byte multiple
> size. And debugging this without extra tooling (like retsnoop) is
> going to be nightmarish.
>
> If the performance diff is huge and that if/else logic is
> unacceptable, we can also internally pad with up to 3 zero bytes and
> include those into the hash.
I think the if/else logic is unavoidable if we support non 4-byte
aligned value sizes,
unless we are okay with truncating any remainder bytes of non 4-byte
aligned values
and stipulating that a bloom filter map value size has to be greater
than 4 bytes (these
conditions would allow us to use jhash2 for every value without an
if/else check). If we
internally pad, we will have to pad on every update and lookup, which
would also
require an if/else.
Thanks for the comments and reviews, Alexei and Andrii. They are much
appreciated!
next prev parent reply other threads:[~2021-09-20 22:54 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-14 4:04 [PATCH v2 bpf-next 0/4] Implement bloom filter map Joanne Koong
2021-09-14 4:04 ` [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation Joanne Koong
2021-09-17 17:01 ` Alexei Starovoitov
2021-09-20 20:58 ` Andrii Nakryiko
2021-09-20 22:52 ` Joanne Koong [this message]
2021-09-20 23:21 ` Andrii Nakryiko
2021-09-20 21:03 ` Joanne Koong
2021-09-17 21:48 ` Andrii Nakryiko
2021-09-14 4:04 ` [PATCH v2 bpf-next 2/4] selftests/bpf: Add bloom filter map test cases Joanne Koong
2021-09-14 4:04 ` [PATCH v2 bpf-next 3/4] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
2021-09-14 4:04 ` [PATCH v2 bpf-next 4/4] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=17d7b319-01d0-163e-57b6-c385d38cc9ad@fb.com \
--to=joannekoong@fb.com \
--cc=Kernel-team@fb.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=bpf@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox