From: Anton Protopopov <aspsk@isovalent.com>
To: Hou Tao <houtao@huaweicloud.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>,
bpf@vger.kernel.org
Subject: Re: [PATCH v4 bpf-next 5/6] selftests/bpf: test map percpu stats
Date: Fri, 7 Jul 2023 07:28:18 +0000 [thread overview]
Message-ID: <ZKe+kudf9kSQatjB@zh-lab-node-5> (raw)
In-Reply-To: <6bf45335-67ef-4eb0-0e97-c3b3ee55a451@huaweicloud.com>
On Fri, Jul 07, 2023 at 09:41:03AM +0800, Hou Tao wrote:
> Hi,
>
> On 7/6/2023 8:57 PM, Anton Protopopov wrote:
> > On Thu, Jul 06, 2023 at 08:21:17PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 7/6/2023 7:54 PM, Anton Protopopov wrote:
> >>
> SNIP
> >>> +static void delete_and_lookup_batch(int map_fd, void *keys, __u32 count)
> >>> +{
> >>> + static __u8 values[(8 << 10) * MAX_ENTRIES];
> >>> + void *in_batch = NULL, *out_batch;
> >>> + __u32 save_count = count;
> >>> + int ret;
> >>> +
> >>> + ret = bpf_map_lookup_and_delete_batch(map_fd,
> >>> + &in_batch, &out_batch,
> >>> + keys, values, &count,
> >>> + NULL);
> >>> +
> >>> + /*
> >>> + * Despite what uapi header says, lookup_and_delete_batch will return
> >>> + * -ENOENT in case we successfully have deleted all elements, so check
> >>> + * this separately
> >>> + */
> >>>> It seems it is a bug in __htab_map_lookup_and_delete_batch(). I could
> >>>> post a patch to fix it if you don't plan to do that by yourself.
> >>> This should be as simple as
> >>>
> >>> @@ -1876,7 +1876,8 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >>> total += bucket_cnt;
> >>> batch++;
> >>> if (batch >= htab->n_buckets) {
> >>> - ret = -ENOENT;
> >>> + if (!total)
> >>> + ret = -ENOENT;
> >>> goto after_loop;
> >>> }
> >>> goto again;
> >> No. I think changing it to "if (max_count > total) ret = -ENOENT;" will
> >> be more appropriate, because it means the requested count couldn't been
> >> fulfilled and it is also consistent with the comments in
> >> include/uapi/linux/bpf.h
> > Say, I have a map of size N and I don't know how many entries there are.
> > Then I will do
> >
> > count=N
> > lookup_and_delete(&count)
> >
> > In this case we will walk through the whole map, reach the 'batch >=
> > htab->n_buckets', and set the count to the number of elements we read.
> >
> > (If, in opposite, there's no space to read a whole bucket, then we check this
> > above and return -ENOSPC.)
> >
> >>> However, this might be already utilized by some apps to check that they've read
> >>> all entries. Two local examples are map_tests/map_in_map_batch_ops.c and
> >>> map_tests/htab_map_batch_ops.c. Another example I know is from BCC tools:
> >>> https://github.com/iovisor/bcc/blob/master/libbpf-tools/map_helpers.c#L58
> >> I think these use cases will be fine. Because when the last element has
> >> been successfully iterated and returned, the out_batch is also updated,
> >> so if the batch op is called again, -ENOENT will be returned.
> >>> Can we update comments in include/uapi/linux/bpf.h?
> >> I think the comments are correct.
> > Currently we return -ENOENT as an indicator that (a) 'in_batch' is out of
> > bounds (b) we reached the end of map. So technically, this is an optimization,
> > as if we read elements in a loop by passing 'in_batch', 'out_batch', even if we
> > return 0 in case (b), the next syscall would return -ENOENT, because the new
> > 'in_batch' would point to out of bounds.
> >
> > This also makes sense for a map which is empty: we reached the end of map,
> > didn't find any elements, so we're returning -ENOENT (in contrast with saying
> > "all is ok, we read 0 elements").
> >
> > So from my point of view -ENOENT makes sense. However, comments say "Returns
> > zero on success" which doesn't look true to me as I think that reading the
> > whole map in one syscall is a success :)
>
> I get your point. The current implementation of BPF_MAP_LOOKUP_BATCH
> does the following two things:
> 1) returns 0 when the whole map has not been iterated but there is no
> space for current bucket.
The algorithm works per bucket. For a bucket number X it checks if there is
enough space in the output buffer to store all bucket elements. If there is,
ok, go to the next bucket. If not, then it checks if any elements were written
already [from previous buckets]. If not, then it returns -ENOSPC, meaning,
"you've asked to copy at most N elements, but I can only copy M > N, not less,
please provide a bigger buffer."
> 2) doesn't return 0 when the whole map has been iterated successfully
> (and the requested count is fulfilled)
>
> For 1) I prefer to update the comments in uapi. If instead we fix the
> implementation, we may break the existed users which need to check
> ENOSPC to continue the batch op.
> For 2) I don't have a preference. Both updating the comments and
> implementation are fine to me.
>
> WDYT ?
>
I think that (1) is perfectly fine, -ENOSPC is returned only when we can't copy
elements, which is an error.
The (2) requires updating docs. The API is similar to get_next_key, and docs
can be updated in the same way. By updating docs we're not changing any uapi,
right?
next prev parent reply other threads:[~2023-07-07 7:27 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-05 16:01 [PATCH v4 bpf-next 0/6] bpf: add percpu stats for bpf_map Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 1/6] bpf: add percpu stats for bpf_map elements insertions/deletions Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 2/6] bpf: add a new kfunc to return current bpf_map elements count Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 3/6] bpf: populate the per-cpu insertions/deletions counters for hashmaps Anton Protopopov
2023-07-06 1:24 ` Alexei Starovoitov
2023-07-06 5:47 ` Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 4/6] bpf: make preloaded map iterators to display map elements count Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 5/6] selftests/bpf: test map percpu stats Anton Protopopov
2023-07-06 10:49 ` Hou Tao
2023-07-06 11:54 ` Anton Protopopov
2023-07-06 12:21 ` Hou Tao
2023-07-06 12:57 ` Anton Protopopov
2023-07-07 1:41 ` Hou Tao
2023-07-07 7:28 ` Anton Protopopov [this message]
2023-07-07 9:40 ` Hou Tao
2023-07-07 11:06 ` Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 6/6] selftests/bpf: check that ->elem_count is non-zero for the hash map Anton Protopopov
2023-07-06 1:26 ` Alexei Starovoitov
2023-07-06 5:44 ` Anton Protopopov
2023-07-06 17:03 ` Alexei Starovoitov
2023-07-06 17:43 ` Anton Protopopov
2023-07-06 17:48 ` Alexei Starovoitov
2023-07-06 18:35 ` Anton Protopopov
2023-07-06 20:33 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZKe+kudf9kSQatjB@zh-lab-node-5 \
--to=aspsk@isovalent.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=haoluo@google.com \
--cc=houtao@huaweicloud.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@linux.dev \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.