From: Anton Protopopov <aspsk@isovalent.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
John Fastabend <john.fastabend@gmail.com>
Subject: Re: [PATCH bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4
Date: Sat, 1 Apr 2023 19:25:44 +0000 [thread overview]
Message-ID: <ZCiFONQVTvRia3nY@zh-lab-node-5> (raw)
In-Reply-To: <20230401162003.kkkx7ynlu7a2msn6@dhcp-172-26-102-232.dhcp.thefacebook.com>
On Sat, Apr 01, 2023 at 09:20:03AM -0700, Alexei Starovoitov wrote:
> On Sat, Apr 01, 2023 at 10:10:50AM +0000, Anton Protopopov wrote:
> > The BPF hashmap uses the jhash() hash function. There is an optimized version
> > of this hash function, jhash2(), which may be used when the key size is a
> > multiple of 4. Apply this optimization to the hashmap in the same way as it
> > is already done in the bloom filter map.
> >
> > In practice the optimization is noticeable only for smaller key sizes, which
> > are, however, sufficient for many applications. An example is given in the
> > following table of measurements (a hashmap of 65536 elements was used):
> >
> > --------------------------------------------------------------------
> > | key_size | fullness | lookups /sec | lookups (opt) /sec | gain |
> > --------------------------------------------------------------------
> > | 4 | 25% | 42.990M | 46.000M | 7.0% |
> > | 4 | 50% | 37.910M | 39.094M | 3.1% |
> > | 4 | 75% | 34.486M | 36.124M | 4.7% |
> > | 4 | 100% | 31.760M | 32.719M | 3.0% |
> > --------------------------------------------------------------------
> > | 8 | 25% | 43.855M | 49.626M | 13.2% |
> > | 8 | 50% | 38.328M | 42.152M | 10.0% |
> > | 8 | 75% | 34.483M | 38.088M | 10.5% |
> > | 8 | 100% | 31.306M | 34.686M | 10.8% |
> > --------------------------------------------------------------------
> > | 12 | 25% | 38.398M | 43.770M | 14.0% |
> > | 12 | 50% | 33.336M | 37.712M | 13.1% |
> > | 12 | 75% | 29.917M | 34.440M | 15.1% |
> > | 12 | 100% | 27.322M | 30.480M | 11.6% |
> > --------------------------------------------------------------------
> > | 16 | 25% | 41.491M | 41.921M | 1.0% |
> > | 16 | 50% | 36.206M | 36.474M | 0.7% |
> > | 16 | 75% | 32.529M | 33.027M | 1.5% |
> > | 16 | 100% | 29.581M | 30.325M | 2.5% |
> > --------------------------------------------------------------------
> > | 20 | 25% | 34.240M | 36.787M | 7.4% |
> > | 20 | 50% | 30.328M | 32.663M | 7.7% |
> > | 20 | 75% | 27.536M | 29.354M | 6.6% |
> > | 20 | 100% | 24.847M | 26.505M | 6.7% |
> > --------------------------------------------------------------------
> > | 24 | 25% | 36.329M | 40.608M | 11.8% |
> > | 24 | 50% | 31.444M | 35.059M | 11.5% |
> > | 24 | 75% | 28.426M | 31.452M | 10.6% |
> > | 24 | 100% | 26.278M | 28.741M | 9.4% |
> > --------------------------------------------------------------------
> > | 28 | 25% | 31.540M | 31.944M | 1.3% |
> > | 28 | 50% | 27.739M | 28.063M | 1.2% |
> > | 28 | 75% | 24.993M | 25.814M | 3.3% |
> > | 28 | 100% | 23.513M | 23.500M | -0.1% |
> > --------------------------------------------------------------------
> > | 32 | 25% | 32.116M | 33.953M | 5.7% |
> > | 32 | 50% | 28.879M | 29.859M | 3.4% |
> > | 32 | 75% | 26.227M | 26.948M | 2.7% |
> > | 32 | 100% | 23.829M | 24.613M | 3.3% |
> > --------------------------------------------------------------------
> > | 64 | 25% | 22.535M | 22.554M | 0.1% |
> > | 64 | 50% | 20.471M | 20.675M | 1.0% |
> > | 64 | 75% | 19.077M | 19.146M | 0.4% |
> > | 64 | 100% | 17.710M | 18.131M | 2.4% |
> > --------------------------------------------------------------------
> >
> > The following script was used to gather the results (SMT and frequency scaling disabled):
> >
> >   cd tools/testing/selftests/bpf
> >   for key_size in 4 8 12 16 20 24 28 32 64; do
> >           for nr_entries in `seq 16384 16384 65536`; do
> >                   fullness=$(printf '%3s' $((nr_entries*100/65536)))
> >                   echo -n "key_size=$key_size: $fullness% full: "
> >                   sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=$key_size --nr_entries=$nr_entries --max_entries=65536 --nr_loops=2000000 --map_flags=0x40 | grep cpu
> >           done
> >           echo
> >   done
> >
> > Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
> > ---
> > kernel/bpf/hashtab.c | 29 ++++++++++++++++++-----------
> > 1 file changed, 18 insertions(+), 11 deletions(-)
> >
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 96b645bba3a4..eb804815f7c3 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -103,6 +103,7 @@ struct bpf_htab {
> > u32 n_buckets; /* number of hash buckets */
> > u32 elem_size; /* size of each element in bytes */
> > u32 hashrnd;
> > + u32 key_size_u32;
> > struct lock_class_key lockdep_key;
> > int __percpu *map_locked[HASHTAB_MAP_LOCK_COUNT];
> > };
> > @@ -510,6 +511,10 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
> > else
> > htab->elem_size += round_up(htab->map.value_size, 8);
> >
> > + /* optimize hash computations if key_size is divisible by 4 */
> > + if ((attr->key_size & (sizeof(u32) - 1)) == 0)
> > + htab->key_size_u32 = attr->key_size / sizeof(u32);
>
> Please use & 3 and / 4.
> sizeof(u32) is not going to change.
Yes, sure
> > +
> > err = -E2BIG;
> > /* prevent zero size kmalloc and check for u32 overflow */
> > if (htab->n_buckets == 0 ||
> > @@ -605,9 +610,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
> > return ERR_PTR(err);
> > }
> >
> > -static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
> > +static inline u32 htab_map_hash(const struct bpf_htab *htab, const void *key, u32 key_len)
> > {
> > - return jhash(key, key_len, hashrnd);
> > + if (likely(htab->key_size_u32))
> > + return jhash2(key, htab->key_size_u32, htab->hashrnd);
> > + return jhash(key, key_len, htab->hashrnd);
>
> Could you measure the speed when &3 and /4 is done in the hot path ?
> I would expect the performance to be the same or faster,
> since extra load is gone.
I don't see any visible difference (I tested both the "&3 and /4" and the
"%4==0 and /4" variants). Do you still prefer computing this in the hot path
over using the cached htab->key_size_u32?
Thread overview: 5+ messages
2023-04-01 10:10 [PATCH bpf-next] bpf: optimize hashmap lookups when key_size is divisible by 4 Anton Protopopov
2023-04-01 16:20 ` Alexei Starovoitov
2023-04-01 19:25 ` Anton Protopopov [this message]
2023-04-01 19:35 ` Alexei Starovoitov
2023-04-01 19:53 ` Anton Protopopov