From: Joanne Koong <joannekoong@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
Date: Fri, 15 Oct 2021 16:35:02 -0700
Message-ID: <02a28958-0015-e1cf-84a5-cb7d5717fdaf@fb.com>
In-Reply-To: <CAEf4Bzanj_rGR4Y1iQB=TLb8ud3m9_W6JEQx9sW=auFMV3QGRg@mail.gmail.com>

On 10/8/21 7:54 PM, Andrii Nakryiko wrote:

> On Wed, Oct 6, 2021 at 3:37 PM Joanne Koong <joannekoong@fb.com> wrote:
>> On 10/6/21 3:21 PM, Joanne Koong wrote:
>>
>>> This patch adds benchmark tests for the throughput (for lookups + updates)
>>> and the false positive rate of bloom filter lookups, as well as some
>>> minor refactoring of the bash script for running the benchmarks.
>>>
>>> These benchmarks show that as the number of hash functions increases,
>>> the throughput and the false positive rate of the bloom filter decrease.
>>> From the benchmark data, the approximate average false-positive rates for
>>> 8-byte values are roughly as follows:
>>>
>>> 1 hash function = ~30%
>>> 2 hash functions = ~15%
>>> 3 hash functions = ~5%
>>> 4 hash functions = ~2.5%
>>> 5 hash functions = ~1%
>>> 6 hash functions = ~0.5%
>>> 7 hash functions = ~0.35%
>>> 8 hash functions = ~0.15%
>>> 9 hash functions = ~0.1%
>>> 10 hash functions = ~0%
>>>
>>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
>>> ---
>>>    tools/testing/selftests/bpf/Makefile          |   6 +-
>>>    tools/testing/selftests/bpf/bench.c           |  37 ++
>>>    tools/testing/selftests/bpf/bench.h           |   3 +
>>>    .../bpf/benchs/bench_bloom_filter_map.c       | 411 ++++++++++++++++++
>>>    .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
>>>    .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
>>>    .../selftests/bpf/benchs/run_common.sh        |  48 ++
>>>    tools/testing/selftests/bpf/bpf_util.h        |  11 +
>>>    .../selftests/bpf/progs/bloom_filter_bench.c  | 146 +++++++
>>>    9 files changed, 690 insertions(+), 30 deletions(-)
>>>    create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
>>>    create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
>>>    create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
>>>    create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_bench.c
>>>
[...]

>>> +
>>> +SEC("fentry/__x64_sys_getpgid")
>>> +int prog_bloom_filter_hashmap_lookup(void *ctx)
>>> +{
>>> +     __u64 *result;
>>> +     int i, j, err;
>>> +
>>> +     __u32 val[64] = {0};
>>> +
>>> +     for (i = 0; i < 1024; i++) {
>>> +             for (j = 0; j < value_sz_nr_u32s && j < 64; j++)
>>> +                     val[j] = bpf_get_prandom_u32();
>>> +
>> I tried out prepopulating these random values from the userspace side
>> (where we generate a random sequence of 10000 bytes and put that
>> in a bpf array map, then iterate through the bpf array map in the bpf
>> program; when I tried implementing it using a global array, I saw
>> verifier errors when indexing into the array).
>>
>> Additionally, this slows down the bench program as well, since we need
>> to generate all of these random values in the setup() portion of the
>> program.
>> I'm not convinced that prepopulating the values ahead of time nets us
>> anything - if the concern is that this slows down the bpf program,
>> I think this slows down the program in both the hashmap with and without
>> bloom filter cases; since we are mainly only interested in the delta between
>> these two scenarios, I don't think this ends up mattering that much.
> So imagine that a hashmap benchmark takes 10ms per iteration, and
> bloom filter + hashmap takes 5ms. That's a 2x difference, right? Now
> imagine that random values generation takes another 5ms, so actually
> you measure 15ms vs 10ms run time. Now, suddenly, you have measured
> just a 1.5x difference.
Yeah, you're right - in this case, the calls to bpf_get_prandom_u32()
are time-intensive enough to have a measurable impact on the difference. I
guess we could say that the delta is a conservative lower bound estimate -
that this shows the bloom filter is at least X% faster on average,
but I agree that it'd be great to get a more accurate estimate of the
speed improvement.

What do you think about expanding the benchmark framework to let the
user program control when an iteration is finished? Right now, every
iteration runs for 1 sec, and we calculate how many hits+drops have
occurred within that 1 sec. This doesn't work great when we need to
prepopulate the data in advance, since we don't know how much data is
needed (eg 1 million entries might be enough on some servers, while
other more powerful servers will finish going through the 1 million
entries before the timer is triggered - one option is to keep reusing
the same data until the timer triggers, but this runs into potential
issues where the hits/drops stats can overflow, especially since they
monotonically increase between iterations); we could err on the side of
overpopulating to make sure there will always be enough entries, but
then this leads to irritatingly long setup times.

A more ideal setup would be something where we prepopulate the data to
1 million entries, then in each iteration of the benchmark go through
the 1 million entries, timing how long it takes to run through them
with vs. without the bloom filter. This also leads to slightly more
accurate results, since now we also don't have to spend time logging
each hit/drop in the corresponding per-cpu stats. I'm thinking about
this mostly in the context of the bloom filter, but having the option
of running benchmarks this way could be useful for other maps in the
future as well.

What are your thoughts?


>
> But it's ok, feel free to just keep the benchmark as is.
>
[...]


