* [PATCH] bpf: restrict verifier access to bpf_lru_node.ref
@ 2025-07-15 7:57 Shankari Anand
2025-07-15 14:49 ` Alexei Starovoitov
0 siblings, 1 reply; 5+ messages in thread
From: Shankari Anand @ 2025-07-15 7:57 UTC (permalink / raw)
To: bpf, linux-kernel
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shankari Anand, syzbot+ad4661d6ca888ce7fe11
syzbot reported a data race on the `ref` field of `struct bpf_lru_node`:
https://syzkaller.appspot.com/bug?extid=ad4661d6ca888ce7fe11
This race arises when user programs read the `.ref` field from a BPF map
that uses LRU logic, potentially exposing unprotected state.
Accesses to `ref` are already wrapped with READ_ONCE() and WRITE_ONCE().
However, the BPF verifier currently allows unprivileged programs to
read this field via a BTF-enabled pointer, bypassing internal assumptions.
To mitigate this, the verifier is updated to disallow access
to the `.ref` field in `struct bpf_lru_node`.
This is done by checking both the base type and field name
in `check_ptr_to_btf_access()` and returning -EACCES if matched.
Reported-by: syzbot+ad4661d6ca888ce7fe11@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6847e661.a70a0220.27c366.005d.GAE@google.com/T/
Signed-off-by: Shankari Anand <shankari.ak0208@gmail.com>
---
kernel/bpf/verifier.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 169845710c7e..775ce454268c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7159,6 +7159,19 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
}
ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag, &field_name);
+
+ /* Block access to sensitive kernel-internal fields */
+ if (field_name && reg->btf && btf_is_kernel(reg->btf)) {
+ const struct btf_type *base_type = btf_type_by_id(reg->btf, reg->btf_id);
+ const char *type_name = btf_name_by_offset(reg->btf, base_type->name_off);
+
+ if (strcmp(type_name, "bpf_lru_node") == 0 &&
+ strcmp(field_name, "ref") == 0) {
+ verbose(env,
+ "access to field 'ref' in struct bpf_lru_node is not allowed\n");
+ return -EACCES;
+ }
+ }
}
if (ret < 0)
base-commit: 155a3c003e555a7300d156a5252c004c392ec6b0
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] bpf: restrict verifier access to bpf_lru_node.ref
2025-07-15 7:57 [PATCH] bpf: restrict verifier access to bpf_lru_node.ref Shankari Anand
@ 2025-07-15 14:49 ` Alexei Starovoitov
2025-07-15 21:26 ` Martin KaFai Lau
0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2025-07-15 14:49 UTC (permalink / raw)
To: Shankari Anand, Martin KaFai Lau
Cc: bpf, LKML, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
syzbot+ad4661d6ca888ce7fe11
On Tue, Jul 15, 2025 at 12:58 AM Shankari Anand
<shankari.ak0208@gmail.com> wrote:
>
> syzbot reported a data race on the `ref` field of `struct bpf_lru_node`:
> https://syzkaller.appspot.com/bug?extid=ad4661d6ca888ce7fe11
>
> This race arises when user programs read the `.ref` field from a BPF map
> that uses LRU logic, potentially exposing unprotected state.
>
> Accesses to `ref` are already wrapped with READ_ONCE() and WRITE_ONCE().
> However, the BPF verifier currently allows unprivileged programs to
> read this field via a BTF-enabled pointer, bypassing internal assumptions.
>
> To mitigate this, the verifier is updated to disallow access
> to the `.ref` field in `struct bpf_lru_node`.
> This is done by checking both the base type and field name
> in `check_ptr_to_btf_access()` and returning -EACCES if matched.
>
> Reported-by: syzbot+ad4661d6ca888ce7fe11@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/6847e661.a70a0220.27c366.005d.GAE@google.com/T/
> Signed-off-by: Shankari Anand <shankari.ak0208@gmail.com>
> ---
> kernel/bpf/verifier.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 169845710c7e..775ce454268c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7159,6 +7159,19 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> }
>
> ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag, &field_name);
> +
> + /* Block access to sensitive kernel-internal fields */
This makes no sense. Tracing bpf progs are allowed to read
all kernel internal data fields.
Also you misread the kcsan report.
It says that 'read' comes from:
read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
which is reading hash and key of htab_elem while
write side actually writes hash too:
*(u32 *)((void *)node + lru->hash_offset) = hash;
Martin,
is it really possible for these read/write to race ?
--
pw-bot: cr
* Re: [PATCH] bpf: restrict verifier access to bpf_lru_node.ref
2025-07-15 14:49 ` Alexei Starovoitov
@ 2025-07-15 21:26 ` Martin KaFai Lau
2025-07-16 6:32 ` Shankari Anand
0 siblings, 1 reply; 5+ messages in thread
From: Martin KaFai Lau @ 2025-07-15 21:26 UTC (permalink / raw)
To: Alexei Starovoitov, Shankari Anand
Cc: bpf, LKML, Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
syzbot+ad4661d6ca888ce7fe11
On 7/15/25 7:49 AM, Alexei Starovoitov wrote:
> Also you misread the kcsan report.
>
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
> lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
>
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;
>
> Martin,
> is it really possible for these read/write to race ?
I think it is possible. The elem in the lru's freelist currently does not wait
for a rcu gp before reuse. There is a chance that the rcu reader is still
reading the hash value that was put in the freelist, while the writer is reusing
and updating it.
I think the percpu_freelist used in the regular hashmap should have similar
behavior, so may be worth finding a common solution, such as waiting for a rcu
gp before reusing it.
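
Sketched in kernel-style C (heavily simplified; the real call chains run through kernel/bpf/hashtab.c and kernel/bpf/bpf_lru_list.c, and the names below are only approximations of them), the window looks roughly like:

```c
/* CPU 1: RCU reader still walking the hash bucket.  It found the
 * element before it was freed, and is dereferencing its fields.
 */
rcu_read_lock();
l = lookup_nulls_elem_raw(head, hash, key, key_size, n_buckets);
/* ... reads l->hash and l->key ... */
rcu_read_unlock();

/* CPU 0: writer pops the same element off the LRU freelist and
 * reuses it immediately -- no RCU grace period since it was freed.
 */
node = bpf_lru_pop_free(&htab->lru, hash);
*(u32 *)((void *)node + lru->hash_offset) = hash;  /* races with the read of l->hash */
```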
* Re: [PATCH] bpf: restrict verifier access to bpf_lru_node.ref
2025-07-15 21:26 ` Martin KaFai Lau
@ 2025-07-16 6:32 ` Shankari Anand
2025-07-16 20:02 ` Martin KaFai Lau
0 siblings, 1 reply; 5+ messages in thread
From: Shankari Anand @ 2025-07-16 6:32 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Alexei Starovoitov, bpf, LKML, Martin KaFai Lau,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
syzbot+ad4661d6ca888ce7fe11
Hello,
>
>
> Also you misread the kcsan report.
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
> lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;
Thanks for the clarification. I misattributed the race to the ref
field, but the KCSAN report indeed points to a data race between a
reader, lookup_nulls_elem_raw(), accessing the hash or key fields, and
a writer, bpf_lru_pop_free(), reinitializing and reusing the same
element from the LRU freelist without waiting for an RCU grace period.
> I think it is possible. The elem in the lru's freelist currently does not wait
> for a rcu gp before reuse. There is a chance that the rcu reader is still
> reading the hash value that was put in the freelist, while the writer is reusing
> and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a rcu
> gp before reusing it.
To resolve this, would it make sense to ensure that elements popped
from the free list are only reused after a grace period, similar to
how other parts of the kernel manage safe object reuse?
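
As a rough sketch of that idea (hypothetical helper names; `struct htab_elem` already carries an rcu_head in a union, so this only shows the shape of a call_rcu()-deferred reuse, not a drop-in fix):

```c
/* Hypothetical sketch: instead of pushing a deleted element straight
 * back onto the LRU freelist, defer the push until an RCU grace
 * period has elapsed, so readers that still hold a pointer never
 * observe the element being reinitialized.
 */
static void htab_elem_reuse_rcu(struct rcu_head *head)
{
	struct htab_elem *elem = container_of(head, struct htab_elem, rcu);

	/* elem_to_htab() is assumed here for illustration only.
	 * Only now is it safe to make elem available for reuse.
	 */
	bpf_lru_push_free(&elem_to_htab(elem)->lru, &elem->lru_node);
}

/* on delete, instead of bpf_lru_push_free() directly: */
call_rcu(&elem->rcu, htab_elem_reuse_rcu);
```

This would trade reuse latency for safety; as Martin notes above, the same consideration applies to the percpu_freelist in the regular hashmap.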
--
Regards,
Shankari
* Re: [PATCH] bpf: restrict verifier access to bpf_lru_node.ref
2025-07-16 6:32 ` Shankari Anand
@ 2025-07-16 20:02 ` Martin KaFai Lau
0 siblings, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2025-07-16 20:02 UTC (permalink / raw)
To: Shankari Anand
Cc: Alexei Starovoitov, bpf, LKML, Martin KaFai Lau,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
syzbot+ad4661d6ca888ce7fe11, Hou Tao
On 7/15/25 11:32 PM, Shankari Anand wrote:
>> I think the percpu_freelist used in the regular hashmap should have similar
>> behavior, so may be worth finding a common solution, such as waiting for a rcu
>> gp before reusing it.
> To resolve this, would it make sense to ensure that elements popped
> from the free list are only reused after a grace period? Similar to
> how other parts of the kernel manage safe object reuse.
The reuse behavior has been there for a long time. It had been discussed
before. Please go back to those threads for the background
and the direction that it is going. This thread is a good start:
https://lore.kernel.org/bpf/20250204082848.13471-3-hotforest@gmail.com/
end of thread, other threads:[~2025-07-16 20:02 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-15 7:57 [PATCH] bpf: restrict verifier access to bpf_lru_node.ref Shankari Anand
2025-07-15 14:49 ` Alexei Starovoitov
2025-07-15 21:26 ` Martin KaFai Lau
2025-07-16 6:32 ` Shankari Anand
2025-07-16 20:02 ` Martin KaFai Lau