* [PATCH v3 bpf 0/2] bpf: skip non exist keys in generic_map_lookup_batch
From: Yan Zhai @ 2025-02-10 7:22 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Mykola Lysenko, Shuah Khan, Yan Zhai, Brian Vazquez, linux-kernel,
linux-kselftest, kernel-team, Hou Tao
generic_map_lookup_batch currently retries bpf_map_copy_value() several
times when it fails with ENOENT, and then returns EINTR. The next batch
would start from the same location, on the assumption that the failure
is transient. This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type can legitimately contain such holes. Once
such a hole shows up, generic batch lookup can no longer make progress
and always fails with EINTR.
This patch fixes this behavior by skipping nonexistent keys, so EINTR is
no longer returned.
V2->V3: deleted an unused macro
V1->V2: split the fix and selftests; fixed a few selftest issues.
V2: https://lore.kernel.org/bpf/cover.1738905497.git.yan@cloudflare.com/
V1: https://lore.kernel.org/bpf/Z6OYbS4WqQnmzi2z@debian.debian/
Yan Zhai (2):
bpf: skip non exist keys in generic_map_lookup_batch
selftests: bpf: test batch lookup on array of maps with holes
kernel/bpf/syscall.c | 18 ++----
.../bpf/map_tests/map_in_map_batch_ops.c | 62 +++++++++++++------
2 files changed, 49 insertions(+), 31 deletions(-)
--
2.39.5
* [PATCH v3 bpf 1/2] bpf: skip non exist keys in generic_map_lookup_batch
From: Yan Zhai @ 2025-02-10 7:22 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Mykola Lysenko, Shuah Khan, Yan Zhai, Brian Vazquez, linux-kernel,
linux-kselftest, kernel-team, Hou Tao
generic_map_lookup_batch currently retries bpf_map_copy_value() several
times when it fails with ENOENT, and then returns EINTR. The next batch
would start from the same location, on the assumption that the failure
is transient. This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type can legitimately contain such holes. Once
such a hole shows up, generic batch lookup can no longer make progress
and always fails with EINTR.
Instead, do not retry in generic_map_lookup_batch: if it finds a
nonexistent element, skip to the next key. This simple solution comes
at a price: transient errors may not be recovered, and the iteration
might cycle back to the first key under concurrent deletion. For example,
Hou Tao <houtao@huaweicloud.com> pointed out the following scenario:
For LPM trie map:
(1) ->map_get_next_key(map, prev_key, key) returns a valid key
(2) bpf_map_copy_value() returns -ENOENT
    This means the key was deleted concurrently.
(3) goto next_key
    It swaps prev_key and key.
(4) ->map_get_next_key(map, prev_key, key) again
    prev_key now points to a nonexistent key. The LPM trie treats this
    just like the prev_key == NULL case, so the returned key is a
    duplicate.
With the retry logic, the iteration can continue to the key next to the
deleted one. But if we directly skip to the next key, the iteration loop
would restart from the first key for the lpm_trie type.
However, not all races can be recovered. For example, if the current key
is deleted after, rather than before, bpf_map_copy_value(), or if
prev_key also gets deleted, the loop will still restart from the first
key for lpm_trie anyway. For generic lookup it might be better to stay
simple, i.e. just skip to the next key. To guarantee that the output
keys are not duplicated, it is better to implement map-type-specific
batch operations, which can properly lock the trie and synchronize with
concurrent mutators.
Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Hou Tao <houtao1@huawei.com>
---
v2->v3: deleted an unused macro
v1->v2: incorporated more useful information into the commit message.
---
kernel/bpf/syscall.c | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c420edbfb7c8..e5f1c7fd0ba7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1968,8 +1968,6 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
return err;
}
-#define MAP_LOOKUP_RETRIES 3
-
int generic_map_lookup_batch(struct bpf_map *map,
const union bpf_attr *attr,
union bpf_attr __user *uattr)
@@ -1979,8 +1977,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
void __user *values = u64_to_user_ptr(attr->batch.values);
void __user *keys = u64_to_user_ptr(attr->batch.keys);
void *buf, *buf_prevkey, *prev_key, *key, *value;
- int err, retry = MAP_LOOKUP_RETRIES;
u32 value_size, cp, max_count;
+ int err;
if (attr->batch.elem_flags & ~BPF_F_LOCK)
return -EINVAL;
@@ -2026,14 +2024,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
err = bpf_map_copy_value(map, key, value,
attr->batch.elem_flags);
- if (err == -ENOENT) {
- if (retry) {
- retry--;
- continue;
- }
- err = -EINTR;
- break;
- }
+ if (err == -ENOENT)
+ goto next_key;
if (err)
goto free_buf;
@@ -2048,12 +2040,12 @@ int generic_map_lookup_batch(struct bpf_map *map,
goto free_buf;
}
+ cp++;
+next_key:
if (!prev_key)
prev_key = buf_prevkey;
swap(prev_key, key);
- retry = MAP_LOOKUP_RETRIES;
- cp++;
cond_resched();
}
--
2.39.5
* [PATCH v3 bpf 2/2] selftests: bpf: test batch lookup on array of maps with holes
From: Yan Zhai @ 2025-02-10 7:22 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Mykola Lysenko, Shuah Khan, Yan Zhai, Brian Vazquez, linux-kernel,
linux-kselftest, kernel-team, Hou Tao
Iterating through an array of maps may encounter nonexistent keys. The
batch operation should not fail when this happens.
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Hou Tao <houtao1@huawei.com>
---
.../bpf/map_tests/map_in_map_batch_ops.c | 62 +++++++++++++------
1 file changed, 44 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/bpf/map_tests/map_in_map_batch_ops.c b/tools/testing/selftests/bpf/map_tests/map_in_map_batch_ops.c
index 66191ae9863c..79c3ccadb962 100644
--- a/tools/testing/selftests/bpf/map_tests/map_in_map_batch_ops.c
+++ b/tools/testing/selftests/bpf/map_tests/map_in_map_batch_ops.c
@@ -120,11 +120,12 @@ static void validate_fetch_results(int outer_map_fd,
static void fetch_and_validate(int outer_map_fd,
struct bpf_map_batch_opts *opts,
- __u32 batch_size, bool delete_entries)
+ __u32 batch_size, bool delete_entries,
+ bool has_holes)
{
- __u32 *fetched_keys, *fetched_values, total_fetched = 0;
- __u32 batch_key = 0, fetch_count, step_size;
- int err, max_entries = OUTER_MAP_ENTRIES;
+ int err, max_entries = OUTER_MAP_ENTRIES - !!has_holes;
+ __u32 *fetched_keys, *fetched_values, total_fetched = 0, i;
+ __u32 batch_key = 0, fetch_count, step_size = batch_size;
__u32 value_size = sizeof(__u32);
/* Total entries needs to be fetched */
@@ -134,9 +135,8 @@ static void fetch_and_validate(int outer_map_fd,
"Memory allocation failed for fetched_keys or fetched_values",
"error=%s\n", strerror(errno));
- for (step_size = batch_size;
- step_size <= max_entries;
- step_size += batch_size) {
+ /* hash map may not always return full batch */
+ for (i = 0; i < OUTER_MAP_ENTRIES; i++) {
fetch_count = step_size;
err = delete_entries
? bpf_map_lookup_and_delete_batch(outer_map_fd,
@@ -155,6 +155,7 @@ static void fetch_and_validate(int outer_map_fd,
if (err && errno == ENOSPC) {
/* Fetch again with higher batch size */
total_fetched = 0;
+ step_size += batch_size;
continue;
}
@@ -184,18 +185,19 @@ static void fetch_and_validate(int outer_map_fd,
}
static void _map_in_map_batch_ops(enum bpf_map_type outer_map_type,
- enum bpf_map_type inner_map_type)
+ enum bpf_map_type inner_map_type,
+ bool has_holes)
{
+ __u32 max_entries = OUTER_MAP_ENTRIES - !!has_holes;
__u32 *outer_map_keys, *inner_map_fds;
- __u32 max_entries = OUTER_MAP_ENTRIES;
LIBBPF_OPTS(bpf_map_batch_opts, opts);
__u32 value_size = sizeof(__u32);
int batch_size[2] = {5, 10};
__u32 map_index, op_index;
int outer_map_fd, ret;
- outer_map_keys = calloc(max_entries, value_size);
- inner_map_fds = calloc(max_entries, value_size);
+ outer_map_keys = calloc(OUTER_MAP_ENTRIES, value_size);
+ inner_map_fds = calloc(OUTER_MAP_ENTRIES, value_size);
CHECK((!outer_map_keys || !inner_map_fds),
"Memory allocation failed for outer_map_keys or inner_map_fds",
"error=%s\n", strerror(errno));
@@ -209,6 +211,24 @@ static void _map_in_map_batch_ops(enum bpf_map_type outer_map_type,
((outer_map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
? 9 : 1000) - map_index;
+ /* This condition is only meaningful for array of maps.
+ *
+ * max_entries == OUTER_MAP_ENTRIES - 1 if it is true. Say
+ * max_entries is short for n, then outer_map_keys looks like:
+ *
+ * [n, n-1, ... 2, 1]
+ *
+ * We change it to
+ *
+ * [n, n-1, ... 2, 0]
+ *
+ * So it will leave key 1 as a hole. It will serve to test the
+ * correctness when batch on an array: a "non-exist" key might be
+ * actually allocated and returned from key iteration.
+ */
+ if (has_holes)
+ outer_map_keys[max_entries - 1]--;
+
/* batch operation - map_update */
ret = bpf_map_update_batch(outer_map_fd, outer_map_keys,
inner_map_fds, &max_entries, &opts);
@@ -219,15 +239,17 @@ static void _map_in_map_batch_ops(enum bpf_map_type outer_map_type,
/* batch operation - map_lookup */
for (op_index = 0; op_index < 2; ++op_index)
fetch_and_validate(outer_map_fd, &opts,
- batch_size[op_index], false);
+ batch_size[op_index], false,
+ has_holes);
/* batch operation - map_lookup_delete */
if (outer_map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
fetch_and_validate(outer_map_fd, &opts,
- max_entries, true /*delete*/);
+ max_entries, true /*delete*/,
+ has_holes);
/* close all map fds */
- for (map_index = 0; map_index < max_entries; map_index++)
+ for (map_index = 0; map_index < OUTER_MAP_ENTRIES; map_index++)
close(inner_map_fds[map_index]);
close(outer_map_fd);
@@ -237,16 +259,20 @@ static void _map_in_map_batch_ops(enum bpf_map_type outer_map_type,
void test_map_in_map_batch_ops_array(void)
{
- _map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_ARRAY);
+ _map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_ARRAY, false);
printf("%s:PASS with inner ARRAY map\n", __func__);
- _map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_HASH);
+ _map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_HASH, false);
printf("%s:PASS with inner HASH map\n", __func__);
+ _map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_ARRAY, true);
+ printf("%s:PASS with inner ARRAY map with holes\n", __func__);
+ _map_in_map_batch_ops(BPF_MAP_TYPE_ARRAY_OF_MAPS, BPF_MAP_TYPE_HASH, true);
+ printf("%s:PASS with inner HASH map with holes\n", __func__);
}
void test_map_in_map_batch_ops_hash(void)
{
- _map_in_map_batch_ops(BPF_MAP_TYPE_HASH_OF_MAPS, BPF_MAP_TYPE_ARRAY);
+ _map_in_map_batch_ops(BPF_MAP_TYPE_HASH_OF_MAPS, BPF_MAP_TYPE_ARRAY, false);
printf("%s:PASS with inner ARRAY map\n", __func__);
- _map_in_map_batch_ops(BPF_MAP_TYPE_HASH_OF_MAPS, BPF_MAP_TYPE_HASH);
+ _map_in_map_batch_ops(BPF_MAP_TYPE_HASH_OF_MAPS, BPF_MAP_TYPE_HASH, false);
printf("%s:PASS with inner HASH map\n", __func__);
}
--
2.39.5
* Re: [PATCH v3 bpf 1/2] bpf: skip non exist keys in generic_map_lookup_batch
From: Jiri Olsa @ 2025-02-10 9:19 UTC (permalink / raw)
To: Yan Zhai
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Mykola Lysenko, Shuah Khan, Brian Vazquez, linux-kernel,
linux-kselftest, kernel-team, Hou Tao
On Sun, Feb 09, 2025 at 11:22:35PM -0800, Yan Zhai wrote:
> Rather, do not retry in generic_map_lookup_batch. If it finds a non
> existing element, skip to the next key. This simple solution comes with
> a price that transient errors may not be recovered, and the iteration
> might cycle back to the first key under parallel deletion. For example,
probably stupid question, but why not keep the retry logic and when
it fails then instead of returning EINTR just jump to the next key
jirka
* Re: [PATCH v3 bpf 1/2] bpf: skip non exist keys in generic_map_lookup_batch
From: Brian Vazquez @ 2025-02-10 14:47 UTC (permalink / raw)
To: Jiri Olsa
Cc: Yan Zhai, bpf, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Mykola Lysenko, Shuah Khan,
linux-kernel, linux-kselftest, kernel-team, Hou Tao
On Mon, Feb 10, 2025 at 4:19 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Sun, Feb 09, 2025 at 11:22:35PM -0800, Yan Zhai wrote:
> > Rather, do not retry in generic_map_lookup_batch. If it finds a non
> > existing element, skip to the next key. This simple solution comes with
> > a price that transient errors may not be recovered, and the iteration
> > might cycle back to the first key under parallel deletion. For example,
>
> probably stupid question, but why not keep the retry logic and when
> it fails then instead of returning EINTR just jump to the next key
>
> jirka
+1, keeping the retry logic but moving to the next key on error sounds
like a sensible approach.
* Re: [PATCH v3 bpf 1/2] bpf: skip non exist keys in generic_map_lookup_batch
From: Yan Zhai @ 2025-02-10 16:21 UTC (permalink / raw)
To: Brian Vazquez
Cc: Jiri Olsa, bpf, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Mykola Lysenko, Shuah Khan,
linux-kernel, linux-kselftest, kernel-team, Hou Tao
Hi Brian, Jiri
thanks for the comments.
On Mon, Feb 10, 2025 at 8:47 AM Brian Vazquez <brianvv@google.com> wrote:
>
> On Mon, Feb 10, 2025 at 4:19 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Sun, Feb 09, 2025 at 11:22:35PM -0800, Yan Zhai wrote:
> > > Rather, do not retry in generic_map_lookup_batch. If it finds a non
> > > existing element, skip to the next key. This simple solution comes with
> > > a price that transient errors may not be recovered, and the iteration
> > > might cycle back to the first key under parallel deletion. For example,
> >
> > probably stupid question, but why not keep the retry logic and when
> > it fails then instead of returning EINTR just jump to the next key
> >
> > jirka
>
> +1, keeping the retry logic but moving to the next key on error sounds
> like a sensible approach.
>
I made the trade-off because the retry would consistently fail for the
array of maps, so retrying merely wastes cycles. Reading these maps from
userspace is already pretty slow today (we read them for
accounting/monitoring purposes), so it is nice to save a few cycles,
especially for sparse maps. For example, we use inner maps to store
protocol-specific actions in an array of maps with 256 slots, but usually
only a few common protocols like TCP/UDP/ICMP are populated, leaving most
slots as holes. On the other hand, I personally feel it is really
"fragile" if users rely heavily on this logic to survive concurrent
lookup and deletion. Would it make more sense to provide a concurrency
guarantee with map-specific ops, as the hash map does?
best
Yan
* Re: [PATCH v3 bpf 1/2] bpf: skip non exist keys in generic_map_lookup_batch
From: Jiri Olsa @ 2025-02-12 17:04 UTC (permalink / raw)
To: Yan Zhai
Cc: Brian Vazquez, Jiri Olsa, bpf, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Mykola Lysenko, Shuah Khan,
linux-kernel, linux-kselftest, kernel-team, Hou Tao
On Mon, Feb 10, 2025 at 10:21:38AM -0600, Yan Zhai wrote:
> Hi Brian, Jiri
>
> thanks for the comments.
>
> On Mon, Feb 10, 2025 at 8:47 AM Brian Vazquez <brianvv@google.com> wrote:
> >
> > On Mon, Feb 10, 2025 at 4:19 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Sun, Feb 09, 2025 at 11:22:35PM -0800, Yan Zhai wrote:
> > > > Rather, do not retry in generic_map_lookup_batch. If it finds a non
> > > > existing element, skip to the next key. This simple solution comes with
> > > > a price that transient errors may not be recovered, and the iteration
> > > > might cycle back to the first key under parallel deletion. For example,
> > >
> > > probably stupid question, but why not keep the retry logic and when
> > > it fails then instead of returning EINTR just jump to the next key
> > >
> > > jirka
> >
> > +1, keeping the retry logic but moving to the next key on error sounds
> > like a sensible approach.
> >
> I made the trade off since retry would consistently fail for the array
> of maps, so it is merely wasting cycles to ever do so. It is already
> pretty slow to read these maps today from userspace (for us we read
> them for accounting/monitoring purposes), so it is nice to save a few
> cycles especially for sparse maps. E.g. We use inner maps to store
> protocol specific actions in an array of maps with 256 slots, but
> usually only a few common protocols like TCP/UDP/ICMP are populated,
> leaving most "holes". On the other hand, I personally feel it is
> really "fragile" if users rely heavily on this logic to survive
> concurrent lookup and deletion. Would it make more sense to provide
> concurrency guarantee with map specific ops like hash map?
Brian, any details on the EINTR path? Is that just to survive concurrent
batch lookup and delete?
If that's an important use case, I guess a map-specific function would be
possible, because it's broken for maps with holes as you described.
thanks,
jirka
* Re: [PATCH v3 bpf 0/2] bpf: skip non exist keys in generic_map_lookup_batch
From: patchwork-bot+netdevbpf @ 2025-02-19 1:40 UTC (permalink / raw)
To: Yan Zhai
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, eddyz87,
song, yonghong.song, kpsingh, sdf, haoluo, jolsa, mykolal, shuah,
brianvv, linux-kernel, linux-kselftest, kernel-team, houtao
Hello:
This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:
Here is the summary with links:
- [v3,bpf,1/2] bpf: skip non exist keys in generic_map_lookup_batch
https://git.kernel.org/bpf/bpf/c/5644c6b50ffe
- [v3,bpf,2/2] selftests: bpf: test batch lookup on array of maps with holes
https://git.kernel.org/bpf/bpf/c/d66b7739176d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html