From: Anton Protopopov <aspsk@isovalent.com>
To: Hou Tao <houtao@huaweicloud.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>,
bpf@vger.kernel.org
Subject: Re: [PATCH v4 bpf-next 5/6] selftests/bpf: test map percpu stats
Date: Thu, 6 Jul 2023 12:57:16 +0000 [thread overview]
Message-ID: <ZKa6LHj295dY7G+q@zh-lab-node-5> (raw)
In-Reply-To: <43425377-667b-ab01-951a-0513ef79a59d@huaweicloud.com>
On Thu, Jul 06, 2023 at 08:21:17PM +0800, Hou Tao wrote:
> Hi,
>
> On 7/6/2023 7:54 PM, Anton Protopopov wrote:
> > On Thu, Jul 06, 2023 at 06:49:02PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 7/6/2023 12:01 AM, Anton Protopopov wrote:
> >>> Add a new map test, map_percpu_stats.c, which is checking the correctness of
> >>> map's percpu elements counters. For supported maps the test upserts a number
> >>> of elements, checks the correctness of the counters, then deletes all the
> >>> elements and checks again that the counters sum drops down to zero.
> >>>
> >>> The following map types are tested:
> >>>
> >>> * BPF_MAP_TYPE_HASH, BPF_F_NO_PREALLOC
> >>> * BPF_MAP_TYPE_PERCPU_HASH, BPF_F_NO_PREALLOC
> >>> * BPF_MAP_TYPE_HASH,
> >>> * BPF_MAP_TYPE_PERCPU_HASH,
> >>> * BPF_MAP_TYPE_LRU_HASH
> >>> * BPF_MAP_TYPE_LRU_PERCPU_HASH
> >>> * BPF_MAP_TYPE_LRU_HASH, BPF_F_NO_COMMON_LRU
> >>> * BPF_MAP_TYPE_LRU_PERCPU_HASH, BPF_F_NO_COMMON_LRU
> >>> * BPF_MAP_TYPE_HASH_OF_MAPS
> >>>
> >>> Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
> >> Acked-by: Hou Tao <houtao1@huawei.com>
> >>
> >> With two nits below.
> > Thanks, fixed both for v5.
> Great.
> SNIP
> > +static void delete_and_lookup_batch(int map_fd, void *keys, __u32 count)
> > +{
> > + static __u8 values[(8 << 10) * MAX_ENTRIES];
> > + void *in_batch = NULL, *out_batch;
> > + __u32 save_count = count;
> > + int ret;
> > +
> > + ret = bpf_map_lookup_and_delete_batch(map_fd,
> > + &in_batch, &out_batch,
> > + keys, values, &count,
> > + NULL);
> > +
> > + /*
> > + * Despite what uapi header says, lookup_and_delete_batch will return
> > + * -ENOENT in case we successfully have deleted all elements, so check
> > + * this separately
> > + */
> >> It seems it is a bug in __htab_map_lookup_and_delete_batch(). I could
> >> post a patch to fix it if you don't plan to do that by yourself.
> > This should be as simple as
> >
> > @@ -1876,7 +1876,8 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> > total += bucket_cnt;
> > batch++;
> > if (batch >= htab->n_buckets) {
> > - ret = -ENOENT;
> > + if (!total)
> > + ret = -ENOENT;
> > goto after_loop;
> > }
> > goto again;
>
> No. I think changing it to "if (max_count > total) ret = -ENOENT;" will
> be more appropriate, because it means the requested count couldn't been
> fulfilled and it is also consistent with the comments in
> include/uapi/linux/bpf.h
Say, I have a map of size N and I don't know how many entries there are.
Then I will do
count=N
lookup_and_delete(&count)
In this case we will walk through the whole map, reach the 'batch >=
htab->n_buckets', and set the count to the number of elements we read.
(If, in opposite, there's no space to read a whole bucket, then we check this
above and return -ENOSPC.)
> > However, this might be already utilized by some apps to check that they've read
> > all entries. Two local examples are map_tests/map_in_map_batch_ops.c and
> > map_tests/htab_map_batch_ops.c. Another example I know is from BCC tools:
> > https://github.com/iovisor/bcc/blob/master/libbpf-tools/map_helpers.c#L58
> I think these use cases will be fine. Because when the last element has
> been successfully iterated and returned, the out_batch is also updated,
> so if the batch op is called again, -ENOENT will be returned.
> >
> > Can we update comments in include/uapi/linux/bpf.h?
> I think the comments are correct.
Currently we return -ENOENT as an indicator that (a) 'in_batch' is out of
bounds (b) we reached the end of map. So technically, this is an optimization,
as if we read elements in a loop by passing 'in_batch', 'out_batch', even if we
return 0 in case (b), the next syscall would return -ENOENT, because the new
'in_batch' would point to out of bounds.
This also makes sense for a map which is empty: we reached the end of map,
didn't find any elements, so we're returning -ENOENT (in contrast with saying
"all is ok, we read 0 elements").
So from my point of view -ENOENT makes sense. However, comments say "Returns
zero on success" which doesn't look true to me as I think that reading the
whole map in one syscall is a success :)
next prev parent reply other threads:[~2023-07-06 12:56 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-05 16:01 [PATCH v4 bpf-next 0/6] bpf: add percpu stats for bpf_map Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 1/6] bpf: add percpu stats for bpf_map elements insertions/deletions Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 2/6] bpf: add a new kfunc to return current bpf_map elements count Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 3/6] bpf: populate the per-cpu insertions/deletions counters for hashmaps Anton Protopopov
2023-07-06 1:24 ` Alexei Starovoitov
2023-07-06 5:47 ` Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 4/6] bpf: make preloaded map iterators to display map elements count Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 5/6] selftests/bpf: test map percpu stats Anton Protopopov
2023-07-06 10:49 ` Hou Tao
2023-07-06 11:54 ` Anton Protopopov
2023-07-06 12:21 ` Hou Tao
2023-07-06 12:57 ` Anton Protopopov [this message]
2023-07-07 1:41 ` Hou Tao
2023-07-07 7:28 ` Anton Protopopov
2023-07-07 9:40 ` Hou Tao
2023-07-07 11:06 ` Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 6/6] selftests/bpf: check that ->elem_count is non-zero for the hash map Anton Protopopov
2023-07-06 1:26 ` Alexei Starovoitov
2023-07-06 5:44 ` Anton Protopopov
2023-07-06 17:03 ` Alexei Starovoitov
2023-07-06 17:43 ` Anton Protopopov
2023-07-06 17:48 ` Alexei Starovoitov
2023-07-06 18:35 ` Anton Protopopov
2023-07-06 20:33 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZKa6LHj295dY7G+q@zh-lab-node-5 \
--to=aspsk@isovalent.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=houtao@huaweicloud.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@linux.dev \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.