Re: [PATCH v6 bpf-next] selftests/bpf: Add benchmark for local_storage get


From: John Fastabend <john.fastabend@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	John Fastabend <john.fastabend@gmail.com>
Cc: Martin KaFai Lau <kafai@fb.com>,
	Dave Marchevsky <davemarchevsky@fb.com>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v6 bpf-next] selftests/bpf: Add benchmark for local_storage get
Date: Wed, 22 Jun 2022 20:25:33 -0700
Message-ID: <62b3dd2d8f81c_6a3b22085b@john.notmuch>
In-Reply-To: <CAADnVQJh0BKP3+WHQ+EmEvw3h=QS9xgR=8uk_3Zb6321xhMjtg@mail.gmail.com>

Alexei Starovoitov wrote:
> On Wed, Jun 22, 2022 at 6:26 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Martin KaFai Lau wrote:
> > > On Tue, Jun 21, 2022 at 10:49:46PM -0700, John Fastabend wrote:
> > > > Martin KaFai Lau wrote:
> > > > > On Tue, Jun 21, 2022 at 12:17:54PM -0700, John Fastabend wrote:
> > > > > > > Hashmap Control
> > > > > > > ===============
> > > > > > >         num keys: 10
> > > > > > > hashmap (control) sequential    get:  hits throughput: 20.900 ± 0.334 M ops/s, hits latency: 47.847 ns/op, important_hits throughput: 20.900 ± 0.334 M ops/s
> > > > > > >
> > > > > > >         num keys: 1000
> > > > > > > hashmap (control) sequential    get:  hits throughput: 13.758 ± 0.219 M ops/s, hits latency: 72.683 ns/op, important_hits throughput: 13.758 ± 0.219 M ops/s
> > > > > > >
> > > > > > >         num keys: 10000
> > > > > > > hashmap (control) sequential    get:  hits throughput: 6.995 ± 0.034 M ops/s, hits latency: 142.959 ns/op, important_hits throughput: 6.995 ± 0.034 M ops/s
> > > > > > >
> > > > > > >         num keys: 100000
> > > > > > > hashmap (control) sequential    get:  hits throughput: 4.452 ± 0.371 M ops/s, hits latency: 224.635 ns/op, important_hits throughput: 4.452 ± 0.371 M ops/s
> > > > > > >
> > > > > > >         num keys: 4194304
> > > > > > > hashmap (control) sequential    get:  hits throughput: 3.043 ± 0.033 M ops/s, hits latency: 328.587 ns/op, important_hits throughput: 3.043 ± 0.033 M ops/s
> > > > > > >
> > > > > >
> > > > > > Why is the hashmap lookup not constant with the number of keys? It looks
> > > > > > like it's prepopulated without collisions, so I wouldn't expect any
> > > > > > extra ops on the lookup side after looking at the code quickly.
> > > > > It may be due to the cpu-cache misses as the map grows.
> > > >
> > > > Maybe, but values are just ints, so even 1k * 4B = 4kB should fit in
> > > > cache on an otherwise unused server-class system. It would be more
> > > > believable (to me at least) if the drop-off happened at 100k or
> > > > more.
> > > It is not only value (and key) size.  There is overhead.
> > > htab_elem alone is 48 bytes.  Key and value also need to be 8-byte aligned.
> > >
> >
> > Right, late night math didn't add up. Now I'm wondering if we can make
> > hashmap behave much better; that drop-off is looking really ugly.
> >
> > > From a random machine:
> > > lscpu -C
> > > NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL  SETS PHY-LINE COHERENCY-SIZE
> > > L1d       32K     576K    8 Data            1    64        1             64
> > > L1i       32K     576K    8 Instruction     1    64        1             64
> > > L2         1M      18M   16 Unified         2  1024        1             64
> > > L3      24.8M    24.8M   11 Unified         3 36864        1             64
> >
> > Could you do a couple more data points then, num keys=100,200,400? I would
> > expect those to fit in the cache and be the same as 10 by the cache theory. I
> > could try as well, but it's looking like Friday before I have a spare moment.
> 
> I think the benchmark achieved its goal :)
> It generated plenty of interesting data.
> Pulling random out of the hot loop and any other improvements
> can be done as follow-ups.
> Pushed it to bpf-next.

Yep, just realized I hadn't ACK'd it yet. Thanks for the patches; my
guess is we can improve the hashmap a bunch. FWIW we use the
hashmap as described here, so I'm wondering if we need to cut over to
task storage or just make hashmap better at large numbers of keys.
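
For reference, a rough sketch of the footprint arithmetic discussed
above. It assumes 4-byte int keys and values (the key size is an
assumption; only the value type is stated in the thread), the 48-byte
htab_elem and 8-byte alignment Martin mentions, and the cache sizes
from the quoted lscpu output. It ignores the bucket array and any
allocator overhead, so the real footprint would be larger. The helper
names are made up for illustration.

```python
# Back-of-the-envelope estimate of hashmap element footprint vs. cache size.
# All constants come from the thread or are labeled assumptions.

HTAB_ELEM_BYTES = 48  # "htab_elem alone is 48 bytes" (from the thread)
ALIGN = 8             # key and value are each rounded up to 8 bytes

# Per-core cache sizes from the quoted lscpu output (ONE-SIZE column).
CACHES = [
    ("L1d", 32 * 1024),
    ("L2", 1024 * 1024),
    ("L3", int(24.8 * 1024 * 1024)),
]


def round_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def elem_footprint(key_size=4, value_size=4):
    """Bytes per element; 4-byte key and value are assumptions."""
    return HTAB_ELEM_BYTES + round_up(key_size) + round_up(value_size)


def smallest_fit(num_keys):
    """Name of the smallest cache level that could hold all elements."""
    total = num_keys * elem_footprint()
    for name, size in CACHES:
        if total <= size:
            return name
    return "DRAM"


if __name__ == "__main__":
    for n in (10, 1000, 10000, 100000, 4194304):
        total = n * elem_footprint()
        print(f"num keys {n:>8}: {total:>10} bytes, fits in {smallest_fit(n)}")
```

Under these assumptions each element takes 64 bytes, so 1000 keys
(about 62.5 KiB) already exceed the 32K L1d, which is at least
consistent with the cache-miss explanation.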

