Re: [PATCH v4 bpf-next 5/6] selftests/bpf: test map percpu stats

BPF List
 help / color / mirror / Atom feed

From: Anton Protopopov <aspsk@isovalent.com>
To: Hou Tao <houtao@huaweicloud.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>,
	bpf@vger.kernel.org
Subject: Re: [PATCH v4 bpf-next 5/6] selftests/bpf: test map percpu stats
Date: Fri, 7 Jul 2023 11:06:55 +0000	[thread overview]
Message-ID: <ZKfxz2BB83jGwQJG@zh-lab-node-5> (raw)
In-Reply-To: <25b0dd44-ecd5-989a-63b5-d0156b5e4b34@huaweicloud.com>

On Fri, Jul 07, 2023 at 05:40:28PM +0800, Hou Tao wrote:
> Hi,
> 
> On 7/7/2023 3:28 PM, Anton Protopopov wrote:
> > On Fri, Jul 07, 2023 at 09:41:03AM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 7/6/2023 8:57 PM, Anton Protopopov wrote:
> >>> On Thu, Jul 06, 2023 at 08:21:17PM +0800, Hou Tao wrote:
> >>>> Hi,
> >>>>
> >>>> On 7/6/2023 7:54 PM, Anton Protopopov wrote:
> >>>>
> >> SNIP
> >>>>> +static void delete_and_lookup_batch(int map_fd, void *keys, __u32 count)
> >>>>> +{
> >>>>> +	static __u8 values[(8 << 10) * MAX_ENTRIES];
> >>>>> +	void *in_batch = NULL, *out_batch;
> >>>>> +	__u32 save_count = count;
> >>>>> +	int ret;
> >>>>> +
> >>>>> +	ret = bpf_map_lookup_and_delete_batch(map_fd,
> >>>>> +					      &in_batch, &out_batch,
> >>>>> +					      keys, values, &count,
> >>>>> +					      NULL);
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * Despite what uapi header says, lookup_and_delete_batch will return
> >>>>> +	 * -ENOENT in case we successfully have deleted all elements, so check
> >>>>> +	 * this separately
> >>>>> +	 */
> >>>>>> It seems it is a bug in __htab_map_lookup_and_delete_batch(). I could
> >>>>>> post a patch to fix it if you don't plan to do that by yourself.
> >>>>> This should be as simple as
> >>>>>
> >>>>> @@ -1876,7 +1876,8 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> >>>>>         total += bucket_cnt;
> >>>>>         batch++;
> >>>>>         if (batch >= htab->n_buckets) {
> >>>>> -               ret = -ENOENT;
> >>>>> +               if (!total)
> >>>>> +                       ret = -ENOENT;
> >>>>>                 goto after_loop;
> >>>>>         }
> >>>>>         goto again;
> >>>> No. I think changing it to "if (max_count > total) ret = -ENOENT;" will
> >>>> be more appropriate, because it means the requested count couldn't been
> >>>> fulfilled and it is also consistent with the comments in 
> >>>> include/uapi/linux/bpf.h
> >>> Say, I have a map of size N and I don't know how many entries there are.
> >>> Then I will do
> >>>
> >>>     count=N
> >>>     lookup_and_delete(&count)
> >>>
> >>> In this case we will walk through the whole map, reach the 'batch >=
> >>> htab->n_buckets', and set the count to the number of elements we read.
> >>>
> >>> (If, in opposite, there's no space to read a whole bucket, then we check this
> >>> above and return -ENOSPC.)
> >>>
> >>>>> However, this might be already utilized by some apps to check that they've read
> >>>>> all entries. Two local examples are map_tests/map_in_map_batch_ops.c and
> >>>>> map_tests/htab_map_batch_ops.c. Another example I know is from BCC tools:
> >>>>> https://github.com/iovisor/bcc/blob/master/libbpf-tools/map_helpers.c#L58
> >>>> I think these use cases will be fine. Because when the last element has
> >>>> been successfully iterated and returned, the out_batch is also updated,
> >>>> so if the batch op is called again, -ENOENT will be returned.
> >>>>> Can we update comments in include/uapi/linux/bpf.h?
> >>>> I think the comments are correct.
> >>> Currently we return -ENOENT as an indicator that (a) 'in_batch' is out of
> >>> bounds (b) we reached the end of map. So technically, this is an optimization,
> >>> as if we read elements in a loop by passing 'in_batch', 'out_batch', even if we
> >>> return 0 in case (b), the next syscall would return -ENOENT, because the new
> >>> 'in_batch' would point to out of bounds.
> >>>
> >>> This also makes sense for a map which is empty: we reached the end of map,
> >>> didn't find any elements, so we're returning -ENOENT (in contrast with saying
> >>> "all is ok, we read 0 elements").
> >>>
> >>> So from my point of view -ENOENT makes sense. However, comments say "Returns
> >>> zero on success" which doesn't look true to me as I think that reading the
> >>> whole map in one syscall is a success :)
> >> I get your point. The current implementation of BPF_MAP_LOOKUP_BATCH
> >> does the following two things:
> >> 1) returns 0 when the whole map has not been iterated but there is no
> >> space for current bucket.
> > The algorithm works per bucket. For a bucket number X it checks if there is
> > enough space in the output buffer to store all bucket elements. If there is,
> > ok, go to the next bucket. If not, then it checks if any elements were written
> > already [from previous buckets]. If not, then it returns -ENOSPC, meaning,
> > "you've asked to copy at most N elements, but I can only copy M > N, not less,
> > please provide a bigger buffer."
> Yes.
> >
> >> 2) doesn't return 0 when the whole map has been iterated successfully
> >> (and the requested count is fulfilled)
> >>
> >> For 1) I prefer to update the comments in uapi. If instead we fix the
> >> implementation, we may break the existed users which need to check
> >> ENOSPC to continue the batch op.
> >> For 2) I don't have a preference. Both updating the comments and
> >> implementation are fine to me.
> >>
> >> WDYT ?
> >>
> > I think that (1) is perfectly fine, -ENOSPC is returned only when we can't copy
> > elements, which is an error.
> 
> Maybe I misinterpreted the comments in bpf.h. As said in the comment:
> "On success, *count* elements from the map are copied into the user
> buffer",  I think the count here means the value of count which is used
> as input instead of output.

Yes, also may be updated, as this is actually "up to *count*" (*count* is an
output parameter which is not mentioned in the LOOKUP_BATCH description, only
in LOOKUP_AND_DELETE_BATCH). On the other side, the LOOKUP_AND_DELETE_BATCH
comment says "delete at least count" which is not true as well.

> > The (2) requires updating docs. The API is similar to get_next_key, and docs
> > can be updated in the same way. By updating docs we're not changing any uapi,
> > right?
> I think it is fine.

next prev parent reply	other threads:[~2023-07-07 11:06 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-05 16:01 [PATCH v4 bpf-next 0/6] bpf: add percpu stats for bpf_map Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 1/6] bpf: add percpu stats for bpf_map elements insertions/deletions Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 2/6] bpf: add a new kfunc to return current bpf_map elements count Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 3/6] bpf: populate the per-cpu insertions/deletions counters for hashmaps Anton Protopopov
2023-07-06  1:24   ` Alexei Starovoitov
2023-07-06  5:47     ` Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 4/6] bpf: make preloaded map iterators to display map elements count Anton Protopopov
2023-07-05 16:01 ` [PATCH v4 bpf-next 5/6] selftests/bpf: test map percpu stats Anton Protopopov
2023-07-06 10:49   ` Hou Tao
2023-07-06 11:54     ` Anton Protopopov
2023-07-06 12:21       ` Hou Tao
2023-07-06 12:57         ` Anton Protopopov
2023-07-07  1:41           ` Hou Tao
2023-07-07  7:28             ` Anton Protopopov
2023-07-07  9:40               ` Hou Tao
2023-07-07 11:06                 ` Anton Protopopov [this message]
2023-07-05 16:01 ` [PATCH v4 bpf-next 6/6] selftests/bpf: check that ->elem_count is non-zero for the hash map Anton Protopopov
2023-07-06  1:26   ` Alexei Starovoitov
2023-07-06  5:44     ` Anton Protopopov
2023-07-06 17:03       ` Alexei Starovoitov
2023-07-06 17:43         ` Anton Protopopov
2023-07-06 17:48           ` Alexei Starovoitov
2023-07-06 18:35             ` Anton Protopopov
2023-07-06 20:33               ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZKfxz2BB83jGwQJG@zh-lab-node-5 \
    --to=aspsk@isovalent.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=houtao@huaweicloud.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox