public inbox for linux-kernel@vger.kernel.org
* [PATCH bpf-next 0/3] bpf: Fix unintended eviction when updating lru hash maps
@ 2025-12-02 15:30 Leon Hwang
  2025-12-02 15:30 ` [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps Leon Hwang
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Leon Hwang @ 2025-12-02 15:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Leon Hwang, Saket Kumar Bhaskar, David S . Miller,
	linux-kernel, linux-kselftest, kernel-patches-bot

This unintended LRU eviction issue was observed while developing the
selftest for
"[PATCH bpf-next v10 0/8] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps" [1].

When updating an existing element in lru_hash or lru_percpu_hash maps,
the current implementation calls prealloc_lru_pop() to get a new node
before checking if the key already exists. If the map is full, this
triggers LRU eviction and removes an existing element, even though the
update operation only needs to modify the value in-place.

In the selftest, the workaround was to reserve an extra entry so that
__htab_lru_percpu_map_update_elem() does not trigger eviction.
However, the underlying issue remains problematic because:

1. Users may unexpectedly lose entries when updating existing keys in a
   full map.
2. The eviction overhead is unnecessary for existing key updates.

This patchset fixes the issue by first checking if the key exists before
allocating a new node. If the key is found, update the value in-place,
refresh the LRU reference, and return immediately without triggering any
eviction. Only proceed with node allocation if the key does not exist.
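
The ordering difference described above can be pictured with a toy model.
This is only a sketch under loud assumptions: struct toy_map and every
toy_* helper are hypothetical names invented for illustration, slot order
stands in for LRU age, and none of the kernel's locking, per-CPU LRU lists
or reference bits is modelled. It shows only that popping a node before the
lookup costs a full map one unrelated entry, while looking up first does
not. The in-place write in the second variant mirrors what this series
proposes.

```c
#include <stddef.h>

#define TOY_CAP 4

/* Toy fixed-capacity map; slot 0 is treated as the least recently used. */
struct toy_map {
	int keys[TOY_CAP];
	long vals[TOY_CAP];
	size_t used;
};

/* Return the slot index holding key, or -1 if the key is absent. */
static int toy_find(const struct toy_map *m, int key)
{
	for (size_t i = 0; i < m->used; i++)
		if (m->keys[i] == key)
			return (int)i;
	return -1;
}

/* Remove the entry at idx and shift younger entries down by one slot. */
static void toy_remove_at(struct toy_map *m, size_t idx)
{
	for (size_t i = idx + 1; i < m->used; i++) {
		m->keys[i - 1] = m->keys[i];
		m->vals[i - 1] = m->vals[i];
	}
	m->used--;
}

/* Append as the most recently used entry; caller guarantees free space. */
static void toy_append(struct toy_map *m, int key, long val)
{
	m->keys[m->used] = key;
	m->vals[m->used] = val;
	m->used++;
}

/* Models the current ordering: obtain a node first, look up afterwards. */
static void toy_update_alloc_first(struct toy_map *m, int key, long val)
{
	int idx;

	if (m->used == TOY_CAP)
		toy_remove_at(m, 0);	/* eviction may drop an unrelated key */

	idx = toy_find(m, key);
	if (idx >= 0)
		toy_remove_at(m, (size_t)idx);	/* old element is replaced */
	toy_append(m, key, val);
}

/* Models the proposed ordering: look up first, evict only for new keys. */
static void toy_update_lookup_first(struct toy_map *m, int key, long val)
{
	int idx = toy_find(m, key);

	if (idx >= 0) {
		m->vals[idx] = val;	/* existing key: no node needed */
		return;
	}
	if (m->used == TOY_CAP)
		toy_remove_at(m, 0);
	toy_append(m, key, val);
}
```

In this model, filling the map with keys 0..3 and then updating key 3 through
toy_update_alloc_first() leaves only three entries and key 0 is gone; the same
update through toy_update_lookup_first() keeps all four.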

Links:
[1] https://lore.kernel.org/bpf/20251117162033.6296-1-leon.hwang@linux.dev/

Leon Hwang (3):
  bpf: Avoid unintended eviction when updating lru_hash maps
  bpf: Avoid unintended eviction when updating lru_percpu_hash maps
  selftests/bpf: Add tests to verify no unintended eviction when
    updating lru hash maps

 kernel/bpf/hashtab.c                          | 43 +++++++++++
 .../selftests/bpf/prog_tests/htab_update.c    | 73 +++++++++++++++++++
 2 files changed, 116 insertions(+)

--
2.52.0



* [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps
  2025-12-02 15:30 [PATCH bpf-next 0/3] bpf: Fix unintended eviction when updating lru hash maps Leon Hwang
@ 2025-12-02 15:30 ` Leon Hwang
  2025-12-02 18:10   ` Alexei Starovoitov
  2025-12-02 15:30 ` [PATCH bpf-next 2/3] bpf: Avoid unintended eviction when updating lru_percpu_hash maps Leon Hwang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Leon Hwang @ 2025-12-02 15:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Leon Hwang, Saket Kumar Bhaskar, David S . Miller,
	linux-kernel, linux-kselftest, kernel-patches-bot

When updating an existing element in lru_hash maps, the current
implementation always calls prealloc_lru_pop() to get a new node before
checking if the key already exists. If the map is full, this triggers
LRU eviction and removes an existing element, even though the update
operation only needs to modify the value of an existing key in-place.

This is problematic because:
1. Users may unexpectedly lose entries when doing simple value updates
2. The eviction overhead is unnecessary for existing key updates

Fix this by first checking if the key exists before allocating a new
node. If the key is found, update the value in-place, refresh the LRU
reference, and return immediately without triggering any eviction.

Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 kernel/bpf/hashtab.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index c8a9b27f8663..fb624aa76573 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1207,6 +1207,27 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
+	ret = htab_lock_bucket(b, &flags);
+	if (ret)
+		goto err_lock_bucket;
+
+	l_old = lookup_elem_raw(head, hash, key, key_size);
+
+	ret = check_flags(htab, l_old, map_flags);
+	if (ret)
+		goto err;
+
+	if (l_old) {
+		bpf_lru_node_set_ref(&l_old->lru_node);
+		copy_map_value(&htab->map, htab_elem_value(l_old, map->key_size), value);
+		check_and_free_fields(htab, l_old);
+	}
+
+	htab_unlock_bucket(b, flags);
+
+	if (l_old)
+		return 0;
+
 	/* For LRU, we need to alloc before taking bucket's
 	 * spinlock because getting free nodes from LRU may need
 	 * to remove older elements from htab and this removal
-- 
2.52.0



* [PATCH bpf-next 2/3] bpf: Avoid unintended eviction when updating lru_percpu_hash maps
  2025-12-02 15:30 [PATCH bpf-next 0/3] bpf: Fix unintended eviction when updating lru hash maps Leon Hwang
  2025-12-02 15:30 ` [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps Leon Hwang
@ 2025-12-02 15:30 ` Leon Hwang
  2025-12-02 15:30 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests to verify no unintended eviction when updating lru hash maps Leon Hwang
  2025-12-02 20:44 ` [syzbot ci] Re: bpf: Fix " syzbot ci
  3 siblings, 0 replies; 8+ messages in thread
From: Leon Hwang @ 2025-12-02 15:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Leon Hwang, Saket Kumar Bhaskar, David S . Miller,
	linux-kernel, linux-kselftest, kernel-patches-bot

Similar to the previous fix for lru_hash maps, the lru_percpu_hash map
implementation also suffers from unnecessary eviction when updating
existing elements.

When updating a key that already exists in a full lru_percpu_hash map,
the current code path calls prealloc_lru_pop() before checking for the
existing key (unless map_flags is BPF_EXIST). This can evict an unrelated
element even though the update is just modifying the per-CPU value of an
existing entry.

Fix this by looking up the key first. If found, update the per-CPU value
in-place using pcpu_copy_value(), refresh the LRU reference, and return
early. Only proceed with node allocation if the key does not exist.

Fixes: 8f8449384ec3 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 kernel/bpf/hashtab.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index fb624aa76573..af54fc3a9ba9 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1358,6 +1358,28 @@ static long __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
+	ret = htab_lock_bucket(b, &flags);
+	if (ret)
+		goto err_lock_bucket;
+
+	l_old = lookup_elem_raw(head, hash, key, key_size);
+
+	ret = check_flags(htab, l_old, map_flags);
+	if (ret)
+		goto err;
+
+	if (l_old) {
+		bpf_lru_node_set_ref(&l_old->lru_node);
+		/* per-cpu hash map can update value in-place */
+		pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size),
+				value, onallcpus);
+	}
+
+	htab_unlock_bucket(b, flags);
+
+	if (l_old)
+		return 0;
+
 	/* For LRU, we need to alloc before taking bucket's
 	 * spinlock because LRU's elem alloc may need
 	 * to remove older elem from htab and this removal
-- 
2.52.0



* [PATCH bpf-next 3/3] selftests/bpf: Add tests to verify no unintended eviction when updating lru hash maps
  2025-12-02 15:30 [PATCH bpf-next 0/3] bpf: Fix unintended eviction when updating lru hash maps Leon Hwang
  2025-12-02 15:30 ` [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps Leon Hwang
  2025-12-02 15:30 ` [PATCH bpf-next 2/3] bpf: Avoid unintended eviction when updating lru_percpu_hash maps Leon Hwang
@ 2025-12-02 15:30 ` Leon Hwang
  2025-12-02 15:56   ` bot+bpf-ci
  2025-12-02 20:44 ` [syzbot ci] Re: bpf: Fix " syzbot ci
  3 siblings, 1 reply; 8+ messages in thread
From: Leon Hwang @ 2025-12-02 15:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Leon Hwang, Saket Kumar Bhaskar, David S . Miller,
	linux-kernel, linux-kselftest, kernel-patches-bot

Add two tests to verify that updating an existing element in LRU hash
maps does not cause unintended eviction of other elements.

Each test creates an lru_hash or lru_percpu_hash map with max_entries slots
and populates all of them. It then updates an existing key and verifies that:
1. The update succeeds without error
2. The updated key has the new value
3. All other keys still exist with their original values

This validates the fix that prevents unnecessary LRU eviction when
updating existing elements in full LRU hash maps.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/htab_update.c    | 73 +++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
index d0b405eb2966..bd29a915bb05 100644
--- a/tools/testing/selftests/bpf/prog_tests/htab_update.c
+++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
@@ -143,3 +143,76 @@ void test_htab_update(void)
 	if (test__start_subtest("concurrent_update"))
 		test_concurrent_update();
 }
+
+static void test_lru_hash_map_update_elem(enum bpf_map_type map_type)
+{
+	int err, map_fd, i, key, nr_cpus, max_entries = 128;
+	u64 *values, value = 0xDEADC0DE;
+
+	nr_cpus = libbpf_num_possible_cpus();
+	if (!ASSERT_GT(nr_cpus, 0, "libbpf_num_possible_cpus"))
+		return;
+
+	values = calloc(nr_cpus, sizeof(u64));
+	if (!ASSERT_OK_PTR(values, "calloc values"))
+		return;
+	for (i = 0; i < nr_cpus; i++)
+		values[i] = value;
+
+	map_fd = bpf_map_create(map_type, "test_lru", sizeof(int), sizeof(u64), max_entries, NULL);
+	if (!ASSERT_GE(map_fd, 0, "bpf_map_create")) {
+		free(values);
+		return;
+	}
+
+	/* populate all slots */
+	for (key = 0; key < max_entries; key++) {
+		err = bpf_map_update_elem(map_fd, &key, values, 0);
+		if (!ASSERT_OK(err, "bpf_map_update_elem"))
+			goto out;
+	}
+
+	/* LRU eviction should not happen */
+
+	key = 0;
+	memset(values, 0, nr_cpus * sizeof(u64));
+	err = bpf_map_update_elem(map_fd, &key, values, 0);
+	if (!ASSERT_OK(err, "bpf_map_update_elem"))
+		goto out;
+
+	err = bpf_map_lookup_elem(map_fd, &key, values);
+	if (!ASSERT_OK(err, "bpf_map_lookup_elem"))
+		goto out;
+	if (!ASSERT_EQ(*values, 0, "bpf_map_lookup_elem value"))
+		goto out;
+
+	for (key = 1; key < max_entries; key++) {
+		err = bpf_map_lookup_elem(map_fd, &key, values);
+		if (!ASSERT_OK(err, "bpf_map_lookup_elem"))
+			goto out;
+		if (!ASSERT_EQ(*values, value, "bpf_map_lookup_elem value"))
+			goto out;
+	}
+
+out:
+	close(map_fd);
+	free(values);
+}
+
+static void test_update_lru_hash_map(void)
+{
+	test_lru_hash_map_update_elem(BPF_MAP_TYPE_LRU_HASH);
+}
+
+static void test_update_lru_percpu_hash_map(void)
+{
+	test_lru_hash_map_update_elem(BPF_MAP_TYPE_LRU_PERCPU_HASH);
+}
+
+void test_update_lru_hash_maps(void)
+{
+	if (test__start_subtest("lru_hash"))
+		test_update_lru_hash_map();
+	if (test__start_subtest("lru_percpu_hash"))
+		test_update_lru_percpu_hash_map();
+}
-- 
2.52.0



* Re: [PATCH bpf-next 3/3] selftests/bpf: Add tests to verify no unintended eviction when updating lru hash maps
  2025-12-02 15:30 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests to verify no unintended eviction when updating lru hash maps Leon Hwang
@ 2025-12-02 15:56   ` bot+bpf-ci
  0 siblings, 0 replies; 8+ messages in thread
From: bot+bpf-ci @ 2025-12-02 15:56 UTC (permalink / raw)
  To: leon.hwang, bpf
  Cc: ast, daniel, andrii, martin.lau, eddyz87, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah, leon.hwang,
	skb99, davem, linux-kernel, linux-kselftest, kernel-patches-bot,
	ast, andrii, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai


> diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
> index d0b405eb2..bd29a915b 100644
> --- a/tools/testing/selftests/bpf/prog_tests/htab_update.c
> +++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c

[ ... ]

> +static void test_lru_hash_map_update_elem(enum bpf_map_type map_type)
> +{
> +	int err, map_fd, i, key, nr_cpus, max_entries = 128;
> +	u64 *values, value = 0xDEADC0DE;
> +
> +	nr_cpus = libbpf_num_possible_cpus();
> +	if (!ASSERT_GT(nr_cpus, 0, "libbpf_num_possible_cpus"))
> +		return;
> +
> +	values = calloc(nr_cpus, sizeof(u64));

[ ... ]

> +	err = bpf_map_lookup_elem(map_fd, &key, values);
> +	if (!ASSERT_OK(err, "bpf_map_lookup_elem"))
> +		goto out;
> +	if (!ASSERT_EQ(*values, 0, "bpf_map_lookup_elem value"))
> +		goto out;
> +
> +	for (key = 1; key < max_entries; key++) {
> +		err = bpf_map_lookup_elem(map_fd, &key, values);
> +		if (!ASSERT_OK(err, "bpf_map_lookup_elem"))
> +			goto out;
> +		if (!ASSERT_EQ(*values, value, "bpf_map_lookup_elem value"))
> +			goto out;
> +	}

When test_lru_hash_map_update_elem() is called with
BPF_MAP_TYPE_LRU_PERCPU_HASH via test_update_lru_percpu_hash_map(),
should all CPU values be checked rather than just *values?

For percpu maps, bpf_map_lookup_elem() returns an array with one value
per CPU. The test allocates space for nr_cpus values but only verifies
the first element. This means values for CPUs 1 through nr_cpus-1 are
not validated.

Looking at tools/testing/selftests/bpf/prog_tests/lookup_and_delete.c,
the pattern for percpu maps is to loop through all CPUs:

    for (i = 0; i < nr_cpus; i++) {
        val = value[i];
        /* check val */
    }

For BPF_MAP_TYPE_LRU_HASH, checking only *values is correct, but for
BPF_MAP_TYPE_LRU_PERCPU_HASH the test may pass even if the values for
CPUs other than the first one are incorrect.
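
A minimal sketch of the kind of check being asked for, assuming a
hypothetical helper named percpu_values_all_eq() (it is not part of the
patch or of the selftest framework). The caller would pass nr_cpus for a
per-CPU map and 1 for a plain lru_hash map, whose lookup fills only a
single value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true only if each of the first nr_cpus
 * slots of a lookup buffer equals expected. An empty range is rejected
 * so that a bad nr_cpus cannot pass silently.
 */
static bool percpu_values_all_eq(const uint64_t *values, int nr_cpus,
				 uint64_t expected)
{
	if (nr_cpus <= 0)
		return false;

	for (int i = 0; i < nr_cpus; i++)
		if (values[i] != expected)
			return false;

	return true;
}
```

In the selftest this could replace the single *values comparison, for
example wrapped in ASSERT_TRUE(), with the slot count chosen from map_type.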

> +
> +out:
> +	close(map_fd);
> +	free(values);
> +}
> +
> +static void test_update_lru_hash_map(void)
> +{
> +	test_lru_hash_map_update_elem(BPF_MAP_TYPE_LRU_HASH);
> +}
> +
> +static void test_update_lru_percpu_hash_map(void)
> +{
> +	test_lru_hash_map_update_elem(BPF_MAP_TYPE_LRU_PERCPU_HASH);
> +}

[ ... ]


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19864460617


* Re: [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps
  2025-12-02 15:30 ` [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps Leon Hwang
@ 2025-12-02 18:10   ` Alexei Starovoitov
  2025-12-03 15:14     ` Leon Hwang
  0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2025-12-02 18:10 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Saket Kumar Bhaskar, David S . Miller, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot

On Tue, Dec 2, 2025 at 7:31 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> When updating an existing element in lru_hash maps, the current
> implementation always calls prealloc_lru_pop() to get a new node before
> checking if the key already exists. If the map is full, this triggers
> LRU eviction and removes an existing element, even though the update
> operation only needs to modify the value of an existing key in-place.
>
> This is problematic because:
> 1. Users may unexpectedly lose entries when doing simple value updates
> 2. The eviction overhead is unnecessary for existing key updates
>
> Fix this by first checking if the key exists before allocating a new
> node. If the key is found, update the value in-place, refresh the LRU
> reference, and return immediately without triggering any eviction.
>
> Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  kernel/bpf/hashtab.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index c8a9b27f8663..fb624aa76573 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1207,6 +1207,27 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
>         b = __select_bucket(htab, hash);
>         head = &b->head;
>
> +       ret = htab_lock_bucket(b, &flags);
> +       if (ret)
> +               goto err_lock_bucket;
> +
> +       l_old = lookup_elem_raw(head, hash, key, key_size);
> +
> +       ret = check_flags(htab, l_old, map_flags);
> +       if (ret)
> +               goto err;
> +
> +       if (l_old) {
> +               bpf_lru_node_set_ref(&l_old->lru_node);
> +               copy_map_value(&htab->map, htab_elem_value(l_old, map->key_size), value);
> +               check_and_free_fields(htab, l_old);
> +       }

We cannot do this. It breaks the atomicity of the update.
We added htab_map_update_elem_in_place() for a very specific case.
See
https://lore.kernel.org/all/20250401062250.543403-1-houtao@huaweicloud.com/
and discussion in v1,v2.

We cannot do in-place updates for other map types.
It will break user expectations.

pw-bot: cr
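
One way to picture the atomicity concern is sketched below. It is an
interpretation, not something spelled out in the message above: the
assumption is that "atomic" means a concurrent reader sees either the
complete old value or the complete new value. All names (struct toy_value,
struct toy_elem, the inplace_* and replace_publish() helpers) are
hypothetical, and the code is single-threaded on purpose: the in-place
write is split into two steps so that the intermediate state a reader
could observe is visible deterministically. A real implementation would
need RCU or atomic pointer publication, which this sketch does not model.

```c
#include <stdint.h>

/* Toy value with an invariant that writers maintain: b == a. */
struct toy_value {
	uint64_t a;
	uint64_t b;
};

struct toy_elem {
	struct toy_value val;
};

/* In-place update, first half: only field a has been written so far. */
static void inplace_step1(struct toy_elem *e, const struct toy_value *v)
{
	e->val.a = v->a;
}

/* In-place update, second half: field b catches up. */
static void inplace_step2(struct toy_elem *e, const struct toy_value *v)
{
	e->val.b = v->b;
}

/*
 * Replace-style update: fill a fresh element completely, then switch the
 * slot to it with one pointer store. A reader still holding the old
 * pointer keeps seeing the intact old value.
 */
static void replace_publish(struct toy_elem **slot, struct toy_elem *fresh,
			    const struct toy_value *v)
{
	fresh->val = *v;
	*slot = fresh;
}
```

Between inplace_step1() and inplace_step2() the element holds a mix of old
and new fields, which is the state the replace-style path never exposes.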


* [syzbot ci] Re: bpf: Fix unintended eviction when updating lru hash maps
  2025-12-02 15:30 [PATCH bpf-next 0/3] bpf: Fix unintended eviction when updating lru hash maps Leon Hwang
                   ` (2 preceding siblings ...)
  2025-12-02 15:30 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests to verify no unintended eviction when updating lru hash maps Leon Hwang
@ 2025-12-02 20:44 ` syzbot ci
  3 siblings, 0 replies; 8+ messages in thread
From: syzbot ci @ 2025-12-02 20:44 UTC (permalink / raw)
  To: andrii, ast, bpf, daniel, davem, eddyz87, haoluo, john.fastabend,
	jolsa, kernel-patches-bot, kpsingh, leon.hwang, linux-kernel,
	linux-kselftest, martin.lau, sdf, shuah, skb99, song,
	yonghong.song
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v1] bpf: Fix unintended eviction when updating lru hash maps
https://lore.kernel.org/all/20251202153032.10118-1-leon.hwang@linux.dev
* [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps
* [PATCH bpf-next 2/3] bpf: Avoid unintended eviction when updating lru_percpu_hash maps
* [PATCH bpf-next 3/3] selftests/bpf: Add tests to verify no unintended eviction when updating lru hash maps

and found the following issue:
general protection fault in bpf_lru_push_free

Full report is available here:
https://ci.syzbot.org/series/64db9547-852d-4c56-a3a1-3d18f254330c

***

general protection fault in bpf_lru_push_free

tree:      bpf-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base:      5262cb23393f7e86a64d1a45eeaa8a6f99f03d10
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/702a7a8c-4385-4089-a777-8b47155f8794/config
C repro:   https://ci.syzbot.org/findings/0cffd810-925f-42f4-88d2-8d21195f341e/c_repro
syz repro: https://ci.syzbot.org/findings/0cffd810-925f-42f4-88d2-8d21195f341e/syz_repro

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
CPU: 0 UID: 0 PID: 5953 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:bpf_percpu_lru_push_free kernel/bpf/bpf_lru_list.c:539 [inline]
RIP: 0010:bpf_lru_push_free+0x6e/0xbb0 kernel/bpf/bpf_lru_list.c:551
Code: 01 0f 85 e4 00 00 00 4c 89 f0 48 c1 e8 03 80 3c 28 00 74 08 4c 89 f7 e8 c0 82 42 00 4d 8b 3e 4c 8d 73 10 4c 89 f0 48 c1 e8 03 <0f> b6 04 28 84 c0 0f 85 5b 09 00 00 45 0f b7 36 bf 08 00 00 00 44
RSP: 0018:ffffc900046d7b48 EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000010 RCX: ffff888112941d00
RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff888117c02300
RBP: dffffc0000000000 R08: ffffffff8f7cee77 R09: 1ffffffff1ef9dce
R10: dffffc0000000000 R11: fffffbfff1ef9dcf R12: 0000000000000002
R13: 00000000fffffffe R14: 0000000000000020 R15: 0000607d55cf6a80
FS:  000055555edc7500(0000) GS:ffff88818eb38000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2ed63fff CR3: 000000010bca0000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 htab_lru_push_free kernel/bpf/hashtab.c:1183 [inline]
 htab_lru_map_update_elem+0x33e/0xa90 kernel/bpf/hashtab.c:1266
 bpf_map_update_value+0x751/0x920 kernel/bpf/syscall.c:294
 map_update_elem+0x355/0x4b0 kernel/bpf/syscall.c:1817
 __sys_bpf+0x619/0x860 kernel/bpf/syscall.c:6150
 __do_sys_bpf kernel/bpf/syscall.c:6272 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:6270 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6270
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe41118f7c9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcd4179988 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00007fe4113e5fa0 RCX: 00007fe41118f7c9
RDX: 0000000000000020 RSI: 0000200000000800 RDI: 0000000000000002
RBP: 00007fe4111f297f R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fe4113e5fa0 R14: 00007fe4113e5fa0 R15: 0000000000000003
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:bpf_percpu_lru_push_free kernel/bpf/bpf_lru_list.c:539 [inline]
RIP: 0010:bpf_lru_push_free+0x6e/0xbb0 kernel/bpf/bpf_lru_list.c:551
Code: 01 0f 85 e4 00 00 00 4c 89 f0 48 c1 e8 03 80 3c 28 00 74 08 4c 89 f7 e8 c0 82 42 00 4d 8b 3e 4c 8d 73 10 4c 89 f0 48 c1 e8 03 <0f> b6 04 28 84 c0 0f 85 5b 09 00 00 45 0f b7 36 bf 08 00 00 00 44
RSP: 0018:ffffc900046d7b48 EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000010 RCX: ffff888112941d00
RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff888117c02300
RBP: dffffc0000000000 R08: ffffffff8f7cee77 R09: 1ffffffff1ef9dce
R10: dffffc0000000000 R11: fffffbfff1ef9dcf R12: 0000000000000002
R13: 00000000fffffffe R14: 0000000000000020 R15: 0000607d55cf6a80
FS:  000055555edc7500(0000) GS:ffff88818eb38000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2ed63fff CR3: 000000010bca0000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
   0:	01 0f                	add    %ecx,(%rdi)
   2:	85 e4                	test   %esp,%esp
   4:	00 00                	add    %al,(%rax)
   6:	00 4c 89 f0          	add    %cl,-0x10(%rcx,%rcx,4)
   a:	48 c1 e8 03          	shr    $0x3,%rax
   e:	80 3c 28 00          	cmpb   $0x0,(%rax,%rbp,1)
  12:	74 08                	je     0x1c
  14:	4c 89 f7             	mov    %r14,%rdi
  17:	e8 c0 82 42 00       	call   0x4282dc
  1c:	4d 8b 3e             	mov    (%r14),%r15
  1f:	4c 8d 73 10          	lea    0x10(%rbx),%r14
  23:	4c 89 f0             	mov    %r14,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	0f b6 04 28          	movzbl (%rax,%rbp,1),%eax <-- trapping instruction
  2e:	84 c0                	test   %al,%al
  30:	0f 85 5b 09 00 00    	jne    0x991
  36:	45 0f b7 36          	movzwl (%r14),%r14d
  3a:	bf 08 00 00 00       	mov    $0x8,%edi
  3f:	44                   	rex.R


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.


* Re: [PATCH bpf-next 1/3] bpf: Avoid unintended eviction when updating lru_hash maps
  2025-12-02 18:10   ` Alexei Starovoitov
@ 2025-12-03 15:14     ` Leon Hwang
  0 siblings, 0 replies; 8+ messages in thread
From: Leon Hwang @ 2025-12-03 15:14 UTC (permalink / raw)
  To: Alexei Starovoitov, houtao1
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Saket Kumar Bhaskar, David S . Miller, LKML,
	open list:KERNEL SELFTEST FRAMEWORK, kernel-patches-bot



On 2025/12/3 02:10, Alexei Starovoitov wrote:
> On Tue, Dec 2, 2025 at 7:31 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> When updating an existing element in lru_hash maps, the current
>> implementation always calls prealloc_lru_pop() to get a new node before
>> checking if the key already exists. If the map is full, this triggers
>> LRU eviction and removes an existing element, even though the update
>> operation only needs to modify the value of an existing key in-place.
>>
>> This is problematic because:
>> 1. Users may unexpectedly lose entries when doing simple value updates
>> 2. The eviction overhead is unnecessary for existing key updates
>>
>> Fix this by first checking if the key exists before allocating a new
>> node. If the key is found, update the value in-place, refresh the LRU
>> reference, and return immediately without triggering any eviction.
>>
>> Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH")
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  kernel/bpf/hashtab.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index c8a9b27f8663..fb624aa76573 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -1207,6 +1207,27 @@ static long htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value
>>         b = __select_bucket(htab, hash);
>>         head = &b->head;
>>
>> +       ret = htab_lock_bucket(b, &flags);
>> +       if (ret)
>> +               goto err_lock_bucket;
>> +
>> +       l_old = lookup_elem_raw(head, hash, key, key_size);
>> +
>> +       ret = check_flags(htab, l_old, map_flags);
>> +       if (ret)
>> +               goto err;
>> +
>> +       if (l_old) {
>> +               bpf_lru_node_set_ref(&l_old->lru_node);
>> +               copy_map_value(&htab->map, htab_elem_value(l_old, map->key_size), value);
>> +               check_and_free_fields(htab, l_old);
>> +       }
> 
> We cannot do this. It breaks the atomicity of the update.
> We added htab_map_update_elem_in_place() for a very specific case.
> See
> https://lore.kernel.org/all/20250401062250.543403-1-houtao@huaweicloud.com/
> and discussion in v1,v2.
> 
> We cannot do in-place updates for other map types.
> It will break user expectations.
> 

After going through the patch set and the related discussions, I
understand the concerns around breaking update atomicity.

I'll look into alternative approaches to address this issue without
violating the expected atomic semantics.

Thanks,
Leon


