linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Matt Fleming <matt@readmodwrite.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Shuah Khan <shuah@kernel.org>,
	 kernel-team <kernel-team@cloudflare.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	 "open list:KERNEL SELFTEST FRAMEWORK"
	<linux-kselftest@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,  bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	 Yonghong Song <yonghong.song@linux.dev>,
	Network Development <netdev@vger.kernel.org>,
	 Matt Fleming <mfleming@cloudflare.com>
Subject: Re: [PATCH bpf-next v3] selftests/bpf: Add LPM trie microbenchmarks
Date: Thu, 31 Jul 2025 09:41:22 -0700	[thread overview]
Message-ID: <CAADnVQLnicTicjJhH8gUJK+mpngg5rVoJuQGMiypwtmyC01ZOw@mail.gmail.com> (raw)
In-Reply-To: <CAENh_SS2R3aQByV_=WRCO=ZHknk_+pV7RhXA4qx5OGMBN1SnOA@mail.gmail.com>

On Tue, Jul 29, 2025 at 6:56 AM Matt Fleming <matt@readmodwrite.com> wrote:
>
> On Mon, Jul 28, 2025 at 3:35 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > Please make a full description of what the test does,
> > since it's not trivial to decipher from the code.
> > If I'm reading it correctly, first, the user space
> > makes the LPM completely full and then lookup/update
> > use random key-s within range.
>
> Yep, that's correct. I'll provide a more comprehensive description in v4.
>
> > But delete looks different. It seems the kernel delete
> > operation can interleave with user space refilling the LPM,
> > so ...
> >
> > >   lookup: throughput    7.423 ± 0.023 M ops/s (  7.423M ops/prod), latency  134.710 ns/op
> > >   update: throughput    2.643 ± 0.015 M ops/s (  2.643M ops/prod), latency  378.310 ns/op
> > >   delete: throughput    0.712 ± 0.008 M ops/s (  0.712M ops/prod), latency 1405.152 ns/op
> >
> > this comparison doesn't look like apples to apples,
> > since delete will include user space refill time.
>
> Yeah, you're right. Though we only measure the delete time from in the
> BPF prog, delete throughput is essentially blocked while the refill
> happens and because things are measured with a 1-second timer
> (bench.c:sigalarm_handler) the refill time gets accounted for anyway.
> I could try extrapolating the delete time like I've done for the free
> op, i.e. we calculate how many ops were completed in what fraction of
> a second.
>
> > >     free: throughput    0.574 ± 0.003 K ops/s (  0.574K ops/prod), latency    1.743 ms/op
> >
> > Does this measure the free-ing of full LPM ?
>
> Yes, this measures the total time to free every element in the trie.
>
> > > +static void __lpm_validate(void)
> >
> > why double underscore ?
> > Just lpm_validate() ?
>
> The double underscore is used for the common validation parts, but
> I'll rename this to include "_common()" (along with all other uses of
> __).

No. It's also wrong.
Double underscore or _common suffix are both meaningless.
The helper name should describe what it's for.

> > > +       /*
> > > +        * Ideally we'd have access to the map ID but that's already
> > > +        * freed before we enter trie_free().
> > > +        */
> > > +       BPF_CORE_READ_STR_INTO(&name, map, name);
> > > +       if (bpf_strncmp(name, BPF_OBJ_NAME_LEN, "trie_free_map"))
> > > +               return 0;
> > > +
> > > +       val = bpf_ktime_get_ns();
> > > +       bpf_map_update_elem(&latency_free_start, &map, &val, BPF_ANY);
> >
> > Looks like there is only one lpm map.
> > What's the point of that extra map ?
> > Just have a global var for start time ?
>
> Sure, I can make this a global variable for the start time instead.
>
> > bpf_get_prandom_u32() is not free
> > and modulo operation isn't free either.
> > The benchmark includes their time.
> > It's ok to have it, but add a mode where the bench
> > tests linear lookup/update too with simple key.data++
>
> Good idea.
>
> > Since the map suppose to full before we start all keys
> > should be there, right?
>
> Yes.
>
> > Let's add a sanity check that update() succeeds.
>
> Will do.
>
> > > +static int delete (__u32 index, bool *need_refill)
> > > +{
> > > +       struct trie_key key = {
> > > +               .data = deleted_entries,
> > > +               .prefixlen = prefixlen,
> > > +       };
> > > +
> > > +       bpf_map_delete_elem(&trie_map, &key);
> > > +
> > > +       /* Do we need to refill the map? */
> > > +       if (++deleted_entries == nr_entries) {
> > > +               /*
> > > +                * Atomicity isn't required because DELETE only supports
> > > +                * one producer running concurrently. What we need is a
> > > +                * way to track how many entries have been deleted from
> > > +                * the trie between consecutive invocations of the BPF
> > > +                * prog because a single bpf_loop() call might not
> > > +                * delete all entries, e.g. when NR_LOOPS < nr_entries.
> > > +                */
> > > +               deleted_entries = 0;
> > > +               *need_refill = true;
> > > +               return 1;
> >
> > This early break is another reason that makes
> > 'delete' op different from 'lookup/update'.
> > Pls make all 3 consistent, so they can be compared to each other.
>
> Hmm.. I'm not quite sure how to do that. lookup/update don't modify
> the number of entries in the map whereas delete does (update only
> updates existing entries, it doesn't create new ones). So when the map
> is empty it needs to be refilled. You're right that somehow the time
> to refill needs to be removed from the delete throughput/latency
> numbers but fundamentally these 3 ops are not the same and I don't see
> how to treat them as such.

well, random-key update when the map is full is also quite different from
random-key update when the map is empty.

Instead doing an update from user space do timed ops:
1 start with empty map, update (aka insert) all keys sequentially
2 lookup all sequentially
3 delete all sequentially
4 update (aka insert) all sequentially
5 lookup random
6 update random
7 delete all random

The elapsed time for 1 and 4 should be exactly the same.
While all others might have differences,
but all can be compared to each other and reasoned about.

  reply	other threads:[~2025-07-31 16:41 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-22 15:01 [PATCH bpf-next v3] selftests/bpf: Add LPM trie microbenchmarks Matt Fleming
2025-07-28 14:34 ` Alexei Starovoitov
2025-07-29 13:56   ` Matt Fleming
2025-07-31 16:41     ` Alexei Starovoitov [this message]
2025-08-08 14:21       ` Matt Fleming
2025-08-08 16:42         ` Alexei Starovoitov
2025-08-13 16:59   ` Jesper Dangaard Brouer
2025-08-08 15:51 ` Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAADnVQLnicTicjJhH8gUJK+mpngg5rVoJuQGMiypwtmyC01ZOw@mail.gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=hawk@kernel.org \
    --cc=kernel-team@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=matt@readmodwrite.com \
    --cc=mfleming@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=shuah@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).