* [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
@ 2024-10-22 1:45 Byeonguk Jeong
2024-10-22 9:43 ` Toke Høiland-Jørgensen
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Byeonguk Jeong @ 2024-10-22 1:45 UTC (permalink / raw)
To: Daniel Borkmann, Yonghong Song; +Cc: bpf, linux-kernel
trie_get_next_key() allocates a node stack of size trie->max_prefixlen,
but it writes (trie->max_prefixlen + 1) nodes to the stack when the trie has
a full path from the root to a leaf. For example, consider a trie with
max_prefixlen of 8 and nodes with keys 0x00/0, 0x00/1, 0x00/2, ...
0x00/8 inserted. Subsequent calls to trie_get_next_key() with a _key whose
.prefixlen = 8 cause 9 nodes to be written to a node stack of size 8.
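A minimal sketch of the counting argument above, not kernel code. It assumes a single root-to-leaf path with one node for each prefix length from 0 to max_prefixlen; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not the kernel implementation: count the nodes on a
 * full root-to-leaf path, one node for each prefix length 0..max_prefixlen.
 */
static size_t nodes_on_full_path(size_t max_prefixlen)
{
	size_t pushed = 0;
	size_t prefixlen;

	for (prefixlen = 0; prefixlen <= max_prefixlen; prefixlen++)
		pushed++;	/* stands in for one push onto the node stack */

	return pushed;
}

/* True when a stack with `capacity` slots can hold the whole path. */
static bool stack_fits(size_t capacity, size_t max_prefixlen)
{
	return nodes_on_full_path(max_prefixlen) <= capacity;
}
```

With max_prefixlen = 8 the path holds 9 nodes, so a stack of 8 slots is one slot short and a stack of 9 slots fits.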
Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
---
kernel/bpf/lpm_trie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 0218a5132ab5..9b60eda0f727 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -655,7 +655,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
if (!key || key->prefixlen > trie->max_prefixlen)
goto find_leftmost;
- node_stack = kmalloc_array(trie->max_prefixlen,
+ node_stack = kmalloc_array(trie->max_prefixlen + 1,
sizeof(struct lpm_trie_node *),
GFP_ATOMIC | __GFP_NOWARN);
if (!node_stack)
--
2.43.5
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-22 1:45 [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key() Byeonguk Jeong
@ 2024-10-22 9:43 ` Toke Høiland-Jørgensen
2024-10-22 19:51 ` Alexei Starovoitov
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-10-22 9:43 UTC (permalink / raw)
To: Byeonguk Jeong, Daniel Borkmann, Yonghong Song; +Cc: bpf, linux-kernel
Byeonguk Jeong <jungbu2855@gmail.com> writes:
> trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
> while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
> full paths from the root to leaves. For example, consider a trie with
> max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ...
> 0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with
> .prefixlen = 8 make 9 nodes be written on the node stack with size 8.
>
> Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
Makes sense!
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-22 1:45 [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key() Byeonguk Jeong
2024-10-22 9:43 ` Toke Høiland-Jørgensen
@ 2024-10-22 19:51 ` Alexei Starovoitov
2024-10-23 1:29 ` Byeonguk Jeong
2024-10-23 8:44 ` Byeonguk Jeong
2024-10-23 2:03 ` Hou Tao
2024-10-24 9:08 ` [PATCH] selftests/bpf: Add test for trie_get_next_key() Byeonguk Jeong
3 siblings, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2024-10-22 19:51 UTC (permalink / raw)
To: Byeonguk Jeong; +Cc: Daniel Borkmann, Yonghong Song, bpf, LKML
On Mon, Oct 21, 2024 at 6:49 PM Byeonguk Jeong <jungbu2855@gmail.com> wrote:
>
> trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
> while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
> full paths from the root to leaves. For example, consider a trie with
> max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ...
> 0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with
> .prefixlen = 8 make 9 nodes be written on the node stack with size 8.
Hmm. It sounds possible, but pls demonstrate it with a selftest.
With the amount of fuzzing I'm surprised it was not discovered earlier.
pw-bot: cr
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-22 19:51 ` Alexei Starovoitov
@ 2024-10-23 1:29 ` Byeonguk Jeong
2024-10-23 8:44 ` Byeonguk Jeong
1 sibling, 0 replies; 16+ messages in thread
From: Byeonguk Jeong @ 2024-10-23 1:29 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Daniel Borkmann, Yonghong Song, bpf, LKML
On Tue, Oct 22, 2024 at 12:51:05PM -0700, Alexei Starovoitov wrote:
> On Mon, Oct 21, 2024 at 6:49 PM Byeonguk Jeong <jungbu2855@gmail.com> wrote:
> >
> > trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
> > while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
> > full paths from the root to leaves. For example, consider a trie with
> > max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ...
> > 0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with
> > .prefixlen = 8 make 9 nodes be written on the node stack with size 8.
>
> Hmm. It sounds possible, but pls demonstrate it with a selftest.
> With the amount of fuzzing I'm surprised it was not discovered earlier.
>
> pw-bot: cr
With the simple test below, the kernel crashes within a minute; on
KFENCE-enabled kernels the bug is easy to detect.
#!/bin/bash
bpftool map create /sys/fs/bpf/lpm type lpm_trie key 5 value 1 \
entries 16 flags 0x1 name lpm
for i in {0..8}; do
bpftool map update pinned /sys/fs/bpf/lpm \
key hex 0$i 00 00 00 00 \
value hex 00 any
done
while true; do
bpftool map dump pinned /sys/fs/bpf/lpm
done
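For readers decoding the script, here is a sketch of how the 5-byte key is laid out. It assumes the usual LPM trie key format of a 4-byte prefixlen in host byte order followed by the data bytes (one byte here); the helper names are hypothetical. On a little-endian machine, `key hex 0$i 00 00 00 00` then reads as prefixlen = i with data 0x00.

```c
#include <stdint.h>
#include <string.h>

#define LPM_KEY_SIZE 5	/* "key 5" in the bpftool command */

/*
 * Hypothetical helper: fill a 5-byte key with a 4-byte prefixlen (host
 * byte order) followed by one data byte.
 */
static void build_lpm_key(uint8_t out[LPM_KEY_SIZE], uint32_t prefixlen,
			  uint8_t data)
{
	memcpy(out, &prefixlen, sizeof(prefixlen));
	out[sizeof(prefixlen)] = data;
}

/* Largest prefix length, in bits, that the data part of the key can hold. */
static uint32_t max_prefixlen_for(size_t key_size)
{
	return 8 * (uint32_t)(key_size - sizeof(uint32_t));
}
```

For a key size of 5 this gives a maximum prefix length of 8, so the loop over i in 0..8 inserts nine distinct prefix lengths.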
In my environment (6.12-rc4, with CONFIG_KFENCE), dmesg gave me this
message as expected.
[ 463.141394] BUG: KFENCE: out-of-bounds write in trie_get_next_key+0x2f2/0x670
[ 463.143422] Out-of-bounds write at 0x0000000095bc45ea (256B right of kfence-#156):
[ 463.144438] trie_get_next_key+0x2f2/0x670
[ 463.145439] map_get_next_key+0x261/0x410
[ 463.146444] __sys_bpf+0xad4/0x1170
[ 463.147438] __x64_sys_bpf+0x74/0xc0
[ 463.148431] do_syscall_64+0x79/0x150
[ 463.149425] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 463.151436] kfence-#156: 0x00000000279749c1-0x0000000034dc4abb, size=256, cache=kmalloc-256
[ 463.153414] allocated by task 2021 on cpu 2 at 463.140440s (0.012974s ago):
[ 463.154413] trie_get_next_key+0x252/0x670
[ 463.155411] map_get_next_key+0x261/0x410
[ 463.156402] __sys_bpf+0xad4/0x1170
[ 463.157390] __x64_sys_bpf+0x74/0xc0
[ 463.158386] do_syscall_64+0x79/0x150
[ 463.159372] entry_SYSCALL_64_after_hwframe+0x76/0x7e
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-22 1:45 [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key() Byeonguk Jeong
2024-10-22 9:43 ` Toke Høiland-Jørgensen
2024-10-22 19:51 ` Alexei Starovoitov
@ 2024-10-23 2:03 ` Hou Tao
2024-10-23 7:30 ` Byeonguk Jeong
2024-10-24 9:08 ` [PATCH] selftests/bpf: Add test for trie_get_next_key() Byeonguk Jeong
3 siblings, 1 reply; 16+ messages in thread
From: Hou Tao @ 2024-10-23 2:03 UTC (permalink / raw)
To: Byeonguk Jeong, Daniel Borkmann, Yonghong Song; +Cc: bpf, linux-kernel
On 10/22/2024 9:45 AM, Byeonguk Jeong wrote:
> trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
> while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
> full paths from the root to leaves. For example, consider a trie with
> max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ...
> 0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with
> .prefixlen = 8 make 9 nodes be written on the node stack with size 8.
>
> Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
> ---
Tested-by: Hou Tao <houtao1@huawei.com>
Without the fix, there is a KASAN report, as shown below, when dumping
all keys in the lpm-trie through bpf_map_get_next_key().
However, I have a dumb question: does it make sense to reject an
element with prefixlen = 0? I can't think of a use case where a
zero-length prefix would be useful.
==================================================================
BUG: KASAN: slab-out-of-bounds in trie_get_next_key+0x133/0x530
Write of size 8 at addr ffff8881076c2fc0 by task test_lpm_trie.b/446
CPU: 0 UID: 0 PID: 446 Comm: test_lpm_trie.b Not tainted 6.11.0+ #52
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0xb0
print_report+0xce/0x610
? trie_get_next_key+0x133/0x530
? kasan_complete_mode_report_info+0x3c/0x200
? trie_get_next_key+0x133/0x530
kasan_report+0x9c/0xd0
? trie_get_next_key+0x133/0x530
__asan_store8+0x81/0xb0
trie_get_next_key+0x133/0x530
__sys_bpf+0x1b03/0x3140
? __pfx___sys_bpf+0x10/0x10
? __pfx_vfs_write+0x10/0x10
? find_held_lock+0x8e/0xb0
? ksys_write+0xee/0x180
? syscall_exit_to_user_mode+0xb3/0x220
? mark_held_locks+0x28/0x90
? mark_held_locks+0x28/0x90
__x64_sys_bpf+0x45/0x60
x64_sys_call+0x1b2a/0x20d0
do_syscall_64+0x5d/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f9c5e9c9c5d
......
</TASK>
Allocated by task 446:
kasan_save_stack+0x28/0x50
kasan_save_track+0x14/0x30
kasan_save_alloc_info+0x36/0x40
__kasan_kmalloc+0x84/0xa0
__kmalloc_noprof+0x214/0x540
trie_get_next_key+0xa7/0x530
__sys_bpf+0x1b03/0x3140
__x64_sys_bpf+0x45/0x60
x64_sys_call+0x1b2a/0x20d0
do_syscall_64+0x5d/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The buggy address belongs to the object at ffff8881076c2f80
which belongs to the cache kmalloc-rnd-09-64 of size 64
The buggy address is located 0 bytes to the right of
allocated 64-byte region [ffff8881076c2f80, ffff8881076c2fc0)
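A quick arithmetic check on the report above, under the assumption (not stated in the message) that the test map used max_prefixlen = 8 on a 64-bit kernel: eight 8-byte pointers fill the 64-byte allocation, so slot index 8 starts at the first byte past the region. The helper name is hypothetical.

```c
#include <stdint.h>

#define PTR_SIZE 8ULL	/* assumption: 64-bit kernel, 8-byte pointers */

/* Address of slot `index` in an array of pointers starting at `base`. */
static uint64_t slot_address(uint64_t base, uint64_t index)
{
	return base + index * PTR_SIZE;
}
```

With the base ffff8881076c2f80 from the report, slot 8 lands at ffff8881076c2fc0, which matches the reported write address.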
> kernel/bpf/lpm_trie.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0218a5132ab5..9b60eda0f727 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -655,7 +655,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
> if (!key || key->prefixlen > trie->max_prefixlen)
> goto find_leftmost;
>
> - node_stack = kmalloc_array(trie->max_prefixlen,
> + node_stack = kmalloc_array(trie->max_prefixlen + 1,
> sizeof(struct lpm_trie_node *),
> GFP_ATOMIC | __GFP_NOWARN);
> if (!node_stack)
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-23 2:03 ` Hou Tao
@ 2024-10-23 7:30 ` Byeonguk Jeong
2024-10-23 9:59 ` Hou Tao
0 siblings, 1 reply; 16+ messages in thread
From: Byeonguk Jeong @ 2024-10-23 7:30 UTC (permalink / raw)
To: Hou Tao; +Cc: Daniel Borkmann, Yonghong Song, bpf, linux-kernel
On Wed, Oct 23, 2024 at 10:03:44AM +0800, Hou Tao wrote:
>
> Without the fix, there will be KASAN report as show below when dumping
> all keys in the lpm-trie through bpf_map_get_next_key().
Thank you for testing.
>
> However, I have a dumb question: does it make sense to reject the
> element with prefixlen = 0 ? Because I can't think of a use case where a
> zero-length prefix will be useful.
With prefixlen = 0, it would always return -ENOENT, I think. Maybe it is
good to reject it earlier!
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-22 19:51 ` Alexei Starovoitov
2024-10-23 1:29 ` Byeonguk Jeong
@ 2024-10-23 8:44 ` Byeonguk Jeong
1 sibling, 0 replies; 16+ messages in thread
From: Byeonguk Jeong @ 2024-10-23 8:44 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Daniel Borkmann, Yonghong Song, bpf, LKML
On Tue, Oct 22, 2024 at 12:51:05PM -0700, Alexei Starovoitov wrote:
> On Mon, Oct 21, 2024 at 6:49 PM Byeonguk Jeong <jungbu2855@gmail.com> wrote:
> >
> > trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
> > while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
> > full paths from the root to leaves. For example, consider a trie with
> > max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ...
> > 0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with
> > .prefixlen = 8 make 9 nodes be written on the node stack with size 8.
>
> Hmm. It sounds possible, but pls demonstrate it with a selftest.
> With the amount of fuzzing I'm surprised it was not discovered earlier.
>
> pw-bot: cr
I am sending this again because lkml did not understand the previous one,
which was 8B encoded.
With the simple test below, the kernel crashes within a minute; on
KFENCE-enabled kernels the bug is easy to detect.
#!/bin/bash
bpftool map create /sys/fs/bpf/lpm type lpm_trie key 5 value 1 \
entries 16 flags 0x1 name lpm
for i in {0..8}; do
bpftool map update pinned /sys/fs/bpf/lpm \
key hex 0$i 00 00 00 00 \
value hex 00 any
done
while true; do
bpftool map dump pinned /sys/fs/bpf/lpm
done
In my environment (6.12-rc4, with CONFIG_KFENCE), dmesg gave me this
message as expected.
[ 463.141394] BUG: KFENCE: out-of-bounds write in trie_get_next_key+0x2f2/0x670
[ 463.143422] Out-of-bounds write at 0x0000000095bc45ea (256B right of kfence-#156):
[ 463.144438] trie_get_next_key+0x2f2/0x670
[ 463.145439] map_get_next_key+0x261/0x410
[ 463.146444] __sys_bpf+0xad4/0x1170
[ 463.147438] __x64_sys_bpf+0x74/0xc0
[ 463.148431] do_syscall_64+0x79/0x150
[ 463.149425] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 463.151436] kfence-#156: 0x00000000279749c1-0x0000000034dc4abb, size=256, cache=kmalloc-256
[ 463.153414] allocated by task 2021 on cpu 2 at 463.140440s (0.012974s ago):
[ 463.154413] trie_get_next_key+0x252/0x670
[ 463.155411] map_get_next_key+0x261/0x410
[ 463.156402] __sys_bpf+0xad4/0x1170
[ 463.157390] __x64_sys_bpf+0x74/0xc0
[ 463.158386] do_syscall_64+0x79/0x150
[ 463.159372] entry_SYSCALL_64_after_hwframe+0x76/0x7e
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-23 7:30 ` Byeonguk Jeong
@ 2024-10-23 9:59 ` Hou Tao
2024-10-24 1:48 ` Byeonguk Jeong
0 siblings, 1 reply; 16+ messages in thread
From: Hou Tao @ 2024-10-23 9:59 UTC (permalink / raw)
To: Byeonguk Jeong; +Cc: Daniel Borkmann, Yonghong Song, bpf, linux-kernel
Hi,
On 10/23/2024 3:30 PM, Byeonguk Jeong wrote:
> On Wed, Oct 23, 2024 at 10:03:44AM +0800, Hou Tao wrote:
>> Without the fix, there will be KASAN report as show below when dumping
>> all keys in the lpm-trie through bpf_map_get_next_key().
> Thank you for testing.
Alexei suggested adding a bpf selftest for the patch. I think you
could use the code in lpm_trie_map_batch_ops.c [1] or similar as a
reference and add a new file that uses bpf_map_get_next_key() to
demonstrate the out-of-bounds problem. The test can be run with
./test_maps. The procedure is documented in [2].
[1]: tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
[2]: https://github.com/torvalds/linux/blob/master/Documentation/bpf/bpf_devel_QA.rst
>
>> However, I have a dumb question: does it make sense to reject the
>> element with prefixlen = 0 ? Because I can't think of a use case where a
>> zero-length prefix will be useful.
> With prefixlen = 0, it would always return -ENOENT, I think. Maybe it is
> good to reject it earlier!
>
Which procedure will return -ENOENT? I think the element with
prefixlen = 0 could still be found through a key with prefixlen = 0.
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-23 9:59 ` Hou Tao
@ 2024-10-24 1:48 ` Byeonguk Jeong
2024-10-24 3:19 ` Hou Tao
0 siblings, 1 reply; 16+ messages in thread
From: Byeonguk Jeong @ 2024-10-24 1:48 UTC (permalink / raw)
To: Hou Tao; +Cc: Daniel Borkmann, Yonghong Song, bpf, linux-kernel
Hi,
On Wed, Oct 23, 2024 at 05:59:53PM +0800, Hou Tao wrote:
> Alexei suggested adding a bpf self-test for the patch. I think you
> could reference the code in lpm_trie_map_batch_ops.c [1] or similar and
> add a new file that uses bpf_map_get_next_key to demonstrate the
> out-of-bound problem. The test can be run by ./test_maps. There is some
> document for the procedure in [2].
>
> [1]: tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
> [2]:
> https://github.com/torvalds/linux/blob/master/Documentation/bpf/bpf_devel_QA.rst
Okay, I will add a new test. Thanks for the detailed guidance.
> Which procedure will return -ENOENT ? I think the element with
> prefixlen=0 could still be found through the key with prefixlen = 0.
I mean, BPF_MAP_GET_NEXT_KEY with .prefixlen = 0 would give us -ENOENT,
as it follows postorder. BPF_MAP_LOOKUP_ELEM still finds the element
with prefixlen 0 through the key with prefixlen 0, as you said.
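A small sketch of the postorder point, not the kernel code. It models only the single-chain case from the patch description (nodes with prefix lengths 0..max_prefixlen, each the only child of the previous one); the function name is hypothetical.

```c
#include <errno.h>

/*
 * Hypothetical model of postorder iteration over a single chain of nodes
 * with prefix lengths 0..max_prefixlen. Children are visited before their
 * parents, so the root (prefixlen 0) is visited last. Returns the prefix
 * length of the next key, or -ENOENT when the current key is the root.
 */
static int next_prefixlen_in_chain(int cur_prefixlen)
{
	if (cur_prefixlen <= 0)
		return -ENOENT;

	return cur_prefixlen - 1;	/* the parent follows its only child */
}
```

In this model the iteration order is 8, 7, ..., 1, 0, and asking for the key after prefixlen 0 yields -ENOENT because nothing follows the root in postorder.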
* Re: [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key()
2024-10-24 1:48 ` Byeonguk Jeong
@ 2024-10-24 3:19 ` Hou Tao
0 siblings, 0 replies; 16+ messages in thread
From: Hou Tao @ 2024-10-24 3:19 UTC (permalink / raw)
To: Byeonguk Jeong; +Cc: Daniel Borkmann, Yonghong Song, bpf, linux-kernel
On 10/24/2024 9:48 AM, Byeonguk Jeong wrote:
> Hi,
>
> On Wed, Oct 23, 2024 at 05:59:53PM +0800, Hou Tao wrote:
>> Alexei suggested adding a bpf self-test for the patch. I think you
>> could reference the code in lpm_trie_map_batch_ops.c [1] or similar and
>> add a new file that uses bpf_map_get_next_key to demonstrate the
>> out-of-bound problem. The test can be run by ./test_maps. There is some
>> document for the procedure in [2].
>>
>> [1]: tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
>> [2]:
>> https://github.com/torvalds/linux/blob/master/Documentation/bpf/bpf_devel_QA.rst
> Okay, I will add a new test. Thanks for the detailed guideline.
>
>> Which procedure will return -ENOENT ? I think the element with
>> prefixlen=0 could still be found through the key with prefixlen = 0.
> I mean, BPF_MAP_GET_NEXT_KEY with .prefixlen = 0 would give us -ENOENT,
> as it follows postorder. BPF_MAP_LOOKUP_ELEM still find the element
> with prefixlen 0 through the key with prefixlen 0 as you said.
I see. But considering the element with .prefixlen = 0 is the last one
in the map, returning -ENOENT is expected.
* [PATCH] selftests/bpf: Add test for trie_get_next_key()
2024-10-22 1:45 [PATCH] bpf: Fix out-of-bounds write in trie_get_next_key() Byeonguk Jeong
` (2 preceding siblings ...)
2024-10-23 2:03 ` Hou Tao
@ 2024-10-24 9:08 ` Byeonguk Jeong
2024-10-24 9:41 ` Daniel Borkmann
2024-10-25 11:53 ` Hou Tao
3 siblings, 2 replies; 16+ messages in thread
From: Byeonguk Jeong @ 2024-10-24 9:08 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Daniel Borkmann, Yonghong Song, bpf, LKML
Add a test for the out-of-bounds write in trie_get_next_key() when a full
path from root to leaf exists and bpf_map_get_next_key() is called
with the leaf node. It may crash the kernel on failure, so please
run it in a VM.
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
---
.../bpf/map_tests/lpm_trie_map_get_next_key.c | 115 ++++++++++++++++++
1 file changed, 115 insertions(+)
create mode 100644 tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
diff --git a/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c b/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
new file mode 100644
index 000000000000..85b916b69411
--- /dev/null
+++ b/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * WARNING
+ * -------
+ * This test suite may crash the kernel, thus should be run in a VM.
+ */
+
+#define _GNU_SOURCE
+#include <linux/bpf.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include <test_maps.h>
+
+struct test_lpm_key {
+ __u32 prefix;
+ __u32 data;
+};
+
+struct get_next_key_ctx {
+ struct test_lpm_key key;
+ bool start;
+ bool stop;
+ int map_fd;
+ int loop;
+};
+
+static void *get_next_key_fn(void *arg)
+{
+ struct get_next_key_ctx *ctx = arg;
+ struct test_lpm_key next_key;
+ int i;
+
+ while (!ctx->start)
+ usleep(1);
+
+ while (!ctx->stop && i++ < ctx->loop)
+ bpf_map_get_next_key(ctx->map_fd, &ctx->key, &next_key);
+
+ return NULL;
+}
+
+static void abort_get_next_key(struct get_next_key_ctx *ctx, pthread_t *tids,
+ unsigned int nr)
+{
+ unsigned int i;
+
+ ctx->stop = true;
+ ctx->start = true;
+ for (i = 0; i < nr; i++)
+ pthread_join(tids[i], NULL);
+}
+
+/* This test aims to prevent future regressions. As long as the kernel does
+ * not panic, it is considered a success.
+ */
+void test_lpm_trie_map_get_next_key(void)
+{
+#define MAX_NR_THREADS 256
+ LIBBPF_OPTS(bpf_map_create_opts, create_opts,
+ .map_flags = BPF_F_NO_PREALLOC);
+ struct test_lpm_key key = {};
+ __u32 val = 0;
+ int map_fd;
+ const __u32 max_prefixlen = 8 * (sizeof(key) - sizeof(key.prefix));
+ const __u32 max_entries = max_prefixlen + 1;
+ unsigned int i, nr = MAX_NR_THREADS, loop = 4096;
+ pthread_t tids[MAX_NR_THREADS];
+ struct get_next_key_ctx ctx;
+ int err;
+
+ map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm_trie_map",
+ sizeof(struct test_lpm_key), sizeof(__u32),
+ max_entries, &create_opts);
+ CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
+ strerror(errno));
+
+ for (i = 0; i <= max_prefixlen; i++) {
+ key.prefix = i;
+ err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
+ CHECK(err, "bpf_map_update_elem()", "error:%s\n",
+ strerror(errno));
+ }
+
+ ctx.start = false;
+ ctx.stop = false;
+ ctx.map_fd = map_fd;
+ ctx.loop = loop;
+ memcpy(&ctx.key, &key, sizeof(key));
+
+ for (i = 0; i < nr; i++) {
+ err = pthread_create(&tids[i], NULL, get_next_key_fn, &ctx);
+ if (err) {
+ abort_get_next_key(&ctx, tids, i);
+ CHECK(err, "pthread_create", "error %d\n", err);
+ }
+ }
+
+ ctx.start = true;
+ for (i = 0; i < nr; i++)
+ pthread_join(tids[i], NULL);
+
+ printf("%s:PASS\n", __func__);
+
+ close(map_fd);
+}
--
2.43.5
* Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
2024-10-24 9:08 ` [PATCH] selftests/bpf: Add test for trie_get_next_key() Byeonguk Jeong
@ 2024-10-24 9:41 ` Daniel Borkmann
2024-10-24 22:26 ` Byeonguk Jeong
2024-10-25 11:53 ` Hou Tao
1 sibling, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2024-10-24 9:41 UTC (permalink / raw)
To: Byeonguk Jeong, Alexei Starovoitov; +Cc: Yonghong Song, bpf, LKML
Hi Byeonguk,
On 10/24/24 11:08 AM, Byeonguk Jeong wrote:
> Add a test for out-of-bounds write in trie_get_next_key() when a full
> path from root to leaf exists and bpf_map_get_next_key() is called
> with the leaf node. It may crashes the kernel on failure, so please
> run in a VM.
>
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
Could you submit the fix + this selftest as a 2-patch series, otherwise BPF CI
cannot test both in combination (pls make sure subject has [PATCH bpf] so that
our CI adds this on top of the bpf tree).
Right now the CI selftest build threw an error:
/tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c: In function ‘test_lpm_trie_map_get_next_key’:
/tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c:84:9: error: format not a string literal and no format arguments [-Werror=format-security]
84 | CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
| ^~~~~
TEST-OBJ [test_maps] task_storage_map.test.o
TEST-OBJ [test_progs] access_variable_array.test.o
cc1: all warnings being treated as errors
TEST-OBJ [test_progs] align.test.o
TEST-OBJ [test_progs] arena_atomics.test.o
make: *** [Makefile:765: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/lpm_trie_map_get_next_key.test.o] Error 1
make: *** Waiting for unfinished jobs....
GEN-SKEL [test_progs-no_alu32] test_usdt.skel.h
make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
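A sketch of why the build error above appears, assuming CHECK() takes its arguments in the order (condition, tag, format, ...), as the other CHECK() calls in the patch suggest. With the tag and format merged into one string, strerror(errno) ends up in the format position, which is not a string literal. The function below is a hypothetical stand-in that writes into a buffer instead of printing and exiting; it is not the selftest macro.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a CHECK-style helper with the argument order
 * (condition, tag, format, ...). Returns 1 and fills buf on failure,
 * returns 0 and leaves buf empty otherwise.
 */
static int sketch_check(char *buf, size_t len, int condition,
			const char *tag, const char *format, ...)
{
	va_list ap;
	int n;

	if (len)
		buf[0] = '\0';
	if (!condition)
		return 0;

	n = snprintf(buf, len, "FAIL:%s ", tag);
	if (n < 0 || (size_t)n >= len)
		return 1;

	/* The format string is a separate argument from the tag. */
	va_start(ap, format);
	vsnprintf(buf + n, len - n, format, ap);
	va_end(ap);
	return 1;
}
```

A call shaped like `sketch_check(buf, sizeof(buf), map_fd == -1, "bpf_map_create()", "error:%s\n", strerror(errno))` keeps the tag and the format apart, so the format argument stays a literal.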
Also on quick glance, please use ASSERT_*() macros instead of CHECK() as the
latter is deprecated.
Thanks,
Daniel
* Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
2024-10-24 9:41 ` Daniel Borkmann
@ 2024-10-24 22:26 ` Byeonguk Jeong
2024-10-25 11:54 ` Hou Tao
0 siblings, 1 reply; 16+ messages in thread
From: Byeonguk Jeong @ 2024-10-24 22:26 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: Alexei Starovoitov, Yonghong Song, bpf, LKML
Hi Daniel,
Okay, I will submit them in a series of patches. Btw, ASSERT_* macros
are not defined for map_tests. Should I add the definitions for them,
or just go with CHECK?
Thanks,
Byeonguk
On Thu, Oct 24, 2024 at 11:41:19AM +0200, Daniel Borkmann wrote:
> Hi Byeonguk,
>
> On 10/24/24 11:08 AM, Byeonguk Jeong wrote:
> > Add a test for out-of-bounds write in trie_get_next_key() when a full
> > path from root to leaf exists and bpf_map_get_next_key() is called
> > with the leaf node. It may crashes the kernel on failure, so please
> > run in a VM.
> >
> > Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
>
> Could you submit the fix + this selftest as a 2-patch series, otherwise BPF CI
> cannot test both in combination (pls make sure subject has [PATCH bpf] so that
> our CI adds this on top of the bpf tree).
>
> Right now the CI selftest build threw an error:
>
> /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c: In function ‘test_lpm_trie_map_get_next_key’:
> /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c:84:9: error: format not a string literal and no format arguments [-Werror=format-security]
> 84 | CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
> | ^~~~~
> TEST-OBJ [test_maps] task_storage_map.test.o
> TEST-OBJ [test_progs] access_variable_array.test.o
> cc1: all warnings being treated as errors
> TEST-OBJ [test_progs] align.test.o
> TEST-OBJ [test_progs] arena_atomics.test.o
> make: *** [Makefile:765: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/lpm_trie_map_get_next_key.test.o] Error 1
> make: *** Waiting for unfinished jobs....
> GEN-SKEL [test_progs-no_alu32] test_usdt.skel.h
> make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
>
> Also on quick glance, please use ASSERT_*() macros instead of CHECK() as the
> latter is deprecated.
>
> Thanks,
> Daniel
* Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
2024-10-24 9:08 ` [PATCH] selftests/bpf: Add test for trie_get_next_key() Byeonguk Jeong
2024-10-24 9:41 ` Daniel Borkmann
@ 2024-10-25 11:53 ` Hou Tao
1 sibling, 0 replies; 16+ messages in thread
From: Hou Tao @ 2024-10-25 11:53 UTC (permalink / raw)
To: Byeonguk Jeong, Alexei Starovoitov
Cc: Daniel Borkmann, Yonghong Song, bpf, LKML
Hi,
On 10/24/2024 5:08 PM, Byeonguk Jeong wrote:
> Add a test for out-of-bounds write in trie_get_next_key() when a full
> path from root to leaf exists and bpf_map_get_next_key() is called
> with the leaf node. It may crashes the kernel on failure, so please
> run in a VM.
>
> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
> ---
> .../bpf/map_tests/lpm_trie_map_get_next_key.c | 115 ++++++++++++++++++
> 1 file changed, 115 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
>
> diff --git a/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c b/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
> new file mode 100644
> index 000000000000..85b916b69411
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * WARNING
> + * -------
> + * This test suite may crash the kernel, thus should be run in a VM.
> + */
> +
The comment above is unnecessary; please remove it.
> +#define _GNU_SOURCE
> +#include <linux/bpf.h>
> +#include <stdio.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +#include <errno.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <pthread.h>
> +
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include <test_maps.h>
> +
> +struct test_lpm_key {
> + __u32 prefix;
> + __u32 data;
> +};
> +
> +struct get_next_key_ctx {
> + struct test_lpm_key key;
> + bool start;
> + bool stop;
> + int map_fd;
> + int loop;
> +};
> +
> +static void *get_next_key_fn(void *arg)
> +{
> + struct get_next_key_ctx *ctx = arg;
> + struct test_lpm_key next_key;
> + int i;
int i = 0;
> +
> + while (!ctx->start)
> + usleep(1);
> +
> + while (!ctx->stop && i++ < ctx->loop)
> + bpf_map_get_next_key(ctx->map_fd, &ctx->key, &next_key);
> +
> + return NULL;
> +}
> +
> +static void abort_get_next_key(struct get_next_key_ctx *ctx, pthread_t *tids,
> + unsigned int nr)
> +{
> + unsigned int i;
> +
> + ctx->stop = true;
> + ctx->start = true;
> + for (i = 0; i < nr; i++)
> + pthread_join(tids[i], NULL);
> +}
> +
> +/* This test aims to prevent regression of future. As long as the kernel does
> + * not panic, it is considered as success.
> + */
> +void test_lpm_trie_map_get_next_key(void)
> +{
> +#define MAX_NR_THREADS 256
Are 8 threads sufficient to reproduce the problem?
> + LIBBPF_OPTS(bpf_map_create_opts, create_opts,
> + .map_flags = BPF_F_NO_PREALLOC);
> + struct test_lpm_key key = {};
> + __u32 val = 0;
> + int map_fd;
> + const __u32 max_prefixlen = 8 * (sizeof(key) - sizeof(key.prefix));
> + const __u32 max_entries = max_prefixlen + 1;
> + unsigned int i, nr = MAX_NR_THREADS, loop = 4096;
> + pthread_t tids[MAX_NR_THREADS];
> + struct get_next_key_ctx ctx;
> + int err;
> +
> + map_fd = bpf_map_create(BPF_MAP_TYPE_LPM_TRIE, "lpm_trie_map",
> + sizeof(struct test_lpm_key), sizeof(__u32),
> + max_entries, &create_opts);
> + CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
> + strerror(errno));
CHECK(map_fd == -1, "bpf_map_create()", "error:%s\n", strerror(errno));
It seems you didn't build-test it.
> +
> + for (i = 0; i <= max_prefixlen; i++) {
> + key.prefix = i;
> + err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
> + CHECK(err, "bpf_map_update_elem()", "error:%s\n",
> + strerror(errno));
> + }
> +
> + ctx.start = false;
> + ctx.stop = false;
> + ctx.map_fd = map_fd;
> + ctx.loop = loop;
> + memcpy(&ctx.key, &key, sizeof(key));
> +
> + for (i = 0; i < nr; i++) {
> + err = pthread_create(&tids[i], NULL, get_next_key_fn, &ctx);
> + if (err) {
> + abort_get_next_key(&ctx, tids, i);
> + CHECK(err, "pthread_create", "error %d\n", err);
> + }
> + }
> +
> + ctx.start = true;
> + for (i = 0; i < nr; i++)
> + pthread_join(tids[i], NULL);
> +
> + printf("%s:PASS\n", __func__);
> +
> + close(map_fd);
> +}
* Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
2024-10-24 22:26 ` Byeonguk Jeong
@ 2024-10-25 11:54 ` Hou Tao
2024-10-25 12:03 ` Daniel Borkmann
0 siblings, 1 reply; 16+ messages in thread
From: Hou Tao @ 2024-10-25 11:54 UTC (permalink / raw)
To: Byeonguk Jeong, Daniel Borkmann
Cc: Alexei Starovoitov, Yonghong Song, bpf, LKML
Hi,
On 10/25/2024 6:26 AM, Byeonguk Jeong wrote:
> Hi Daniel,
>
> Okay, I will submit them in a series of patches. Btw, ASSERT_* macros
> are not defined for map_tests. Should I add the definitions for them,
> or just go with CHECK?
For tests in map_tests, I think using CHECK() will be fine.
>
> Thanks,
> Byeonguk
>
> On Thu, Oct 24, 2024 at 11:41:19AM +0200, Daniel Borkmann wrote:
>> Hi Byeonguk,
>>
>> On 10/24/24 11:08 AM, Byeonguk Jeong wrote:
>>> Add a test for out-of-bounds write in trie_get_next_key() when a full
>>> path from root to leaf exists and bpf_map_get_next_key() is called
>>> with the leaf node. It may crashes the kernel on failure, so please
>>> run in a VM.
>>>
>>> Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
>> Could you submit the fix + this selftest as a 2-patch series, otherwise BPF CI
>> cannot test both in combination (pls make sure subject has [PATCH bpf] so that
>> our CI adds this on top of the bpf tree).
>>
>> Right now the CI selftest build threw an error:
>>
>> /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c: In function ‘test_lpm_trie_map_get_next_key’:
>> /tmp/work/bpf/bpf/tools/testing/selftests/bpf/map_tests/lpm_trie_map_get_next_key.c:84:9: error: format not a string literal and no format arguments [-Werror=format-security]
>> 84 | CHECK(map_fd == -1, "bpf_map_create(), error:%s\n",
>> | ^~~~~
>> TEST-OBJ [test_maps] task_storage_map.test.o
>> TEST-OBJ [test_progs] access_variable_array.test.o
>> cc1: all warnings being treated as errors
>> TEST-OBJ [test_progs] align.test.o
>> TEST-OBJ [test_progs] arena_atomics.test.o
>> make: *** [Makefile:765: /tmp/work/bpf/bpf/tools/testing/selftests/bpf/lpm_trie_map_get_next_key.test.o] Error 1
>> make: *** Waiting for unfinished jobs....
>> GEN-SKEL [test_progs-no_alu32] test_usdt.skel.h
>> make: Leaving directory '/tmp/work/bpf/bpf/tools/testing/selftests/bpf'
>>
>> Also on quick glance, please use ASSERT_*() macros instead of CHECK() as the
>> latter is deprecated.
>>
>> Thanks,
>> Daniel
* Re: [PATCH] selftests/bpf: Add test for trie_get_next_key()
2024-10-25 11:54 ` Hou Tao
@ 2024-10-25 12:03 ` Daniel Borkmann
0 siblings, 0 replies; 16+ messages in thread
From: Daniel Borkmann @ 2024-10-25 12:03 UTC (permalink / raw)
To: Hou Tao, Byeonguk Jeong; +Cc: Alexei Starovoitov, Yonghong Song, bpf, LKML
On 10/25/24 1:54 PM, Hou Tao wrote:
> On 10/25/2024 6:26 AM, Byeonguk Jeong wrote:
>>
>> Okay, I will submit them in a series of patches. Btw, ASSERT_* macros
>> are not defined for map_tests. Should I add the definitions for them,
>> or just go with CHECK?
>
> For tests in map_tests, I think using CHECK() will be fine.
Given there is no alternative infra, agree. Would be nice to convert this
over at some point.
Best,
Daniel