public inbox for bpf@vger.kernel.org
From: Anton Protopopov <aspsk@isovalent.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Joe Stringer <joe@isovalent.com>,
	John Fastabend <john.fastabend@gmail.com>
Subject: Re: [PATCH bpf-next 1/2] bpf: add new map ops ->map_pressure
Date: Thu, 1 Jun 2023 18:18:44 +0000	[thread overview]
Message-ID: <ZHjhBFLLnUcSy9Tt@zh-lab-node-5> (raw)
In-Reply-To: <CAADnVQ+67FF=JsxTDxoo2XL8zSh5Y3xptGee8vBj8OwP3b=aew@mail.gmail.com>

On Thu, Jun 01, 2023 at 09:39:43AM -0700, Alexei Starovoitov wrote:
> On Thu, Jun 1, 2023 at 12:30 AM Anton Protopopov <aspsk@isovalent.com> wrote:
> >
> > On Wed, May 31, 2023 at 11:24:29AM -0700, Alexei Starovoitov wrote:
> > > On Wed, May 31, 2023 at 11:05:10AM +0000, Anton Protopopov wrote:
> > > >  static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
> > > >  {
> > > >     htab_put_fd_value(htab, l);
> > > >
> > > > +   dec_elem_count(htab);
> > > > +
> > > >     if (htab_is_prealloc(htab)) {
> > > >             check_and_free_fields(htab, l);
> > > >             __pcpu_freelist_push(&htab->freelist, &l->fnode);
> > > >     } else {
> > > > -           dec_elem_count(htab);
> > > >             htab_elem_free(htab, l);
> > > >     }
> > > >  }
> > > > @@ -1006,6 +1024,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
> > > >                     if (!l)
> > > >                             return ERR_PTR(-E2BIG);
> > > >                     l_new = container_of(l, struct htab_elem, fnode);
> > > > +                   inc_elem_count(htab);
> > >
> > > The current use_percpu_counter heuristic is far from perfect. It works for some cases,
> > > but will surely get bad as the comment next to PERCPU_COUNTER_BATCH is trying to say.
> > > Hence, there is a big performance risk doing inc/dec everywhere.
> > > Hence, this is a nack: we cannot decrease performance of various maps for the few folks
> > > who want to see map stats.
> >
> > This patch adds some inc/dec only for preallocated hashtabs and doesn't change
> > code for BPF_F_NO_PREALLOC (they already do incs/decs where needed). And for
> > preallocated hashtabs we don't need to compare counters,
> 
> exactly. that's why I don't like to add inc/dec that serves no purpose
> other than stats.
> 
> > so a raw (non-batch)
> > percpu counter may be used for this case.
> 
> and you can do it inside your own bpf prog.
> 
> > > If you want to see "pressure", please switch cilium to use bpf_mem_alloc htab and
> > > use tracing style direct 'struct bpf_htab' access like progs/map_ptr_kern.c is demonstrating.
> > > No kernel patches needed.
> > > Then bpf_prog_run such tracing prog and read all internal map info.
> > > It's less convenient than exposing things in uapi, but not being uapi is the point.
> >
> > Thanks for the pointers, this makes sense. However, this doesn't work for LRU
> > which is always pre-allocated. Would it be ok if we add non-batch percpu
> > counter for !BPF_F_NO_PREALLOC case and won't expose it directly to userspace?
> 
> LRU logic doesn't kick in until the map is full.

In fact, it can: a reproducible example is in the self-test from this patch
series. In the test, N threads try to insert random values for keys 1..3000
simultaneously. As a result, the map may contain any number of elements,
typically 100 to 1000 (never the full 3000, which is itself less than the map
size). So a user can't even roughly estimate the number of elements in the LRU
map from the number of updates (with unique keys). A per-cpu counter
inc/dec'ed from the kernel side would solve this.

> If your LRU map is not full you shouldn't be using LRU in the first place.

This makes sense, yes, especially since LRU evictions may happen seemingly at
random, before the map is full. I will hold this patch back until we
investigate whether we can replace the LRU maps with hashes.

Thanks for the comments!

Thread overview: 19+ messages
2023-05-31 11:05 [PATCH bpf-next 0/2] add mechanism to report map pressure Anton Protopopov
2023-05-31 11:05 ` [PATCH bpf-next 1/2] bpf: add new map ops ->map_pressure Anton Protopopov
2023-05-31 18:24   ` Alexei Starovoitov
2023-06-01  7:31     ` Anton Protopopov
2023-06-01 16:39       ` Alexei Starovoitov
2023-06-01 18:18         ` Anton Protopopov [this message]
2023-06-01 18:24           ` Alexei Starovoitov
2023-06-02  0:40             ` Alexei Starovoitov
2023-06-02 14:21               ` Anton Protopopov
2023-06-02 16:23                 ` Alexei Starovoitov
2023-06-06  7:49                   ` Anton Protopopov
2023-06-24  0:00                   ` John Fastabend
2023-06-26 16:29                     ` Anton Protopopov
2023-06-28 13:17             ` Anton Protopopov
2023-06-01  0:44   ` kernel test robot
2023-06-01  7:50     ` Anton Protopopov
2023-06-13  8:23       ` Yujie Liu
2023-06-13  8:30         ` Anton Protopopov
2023-05-31 11:05 ` [PATCH bpf-next 2/2] selftests/bpf: test map pressure Anton Protopopov
