* bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Florian Westphal @ 2023-04-20 12:03 UTC (permalink / raw)
To: bpf; +Cc: Eduard Zingerman, Dave Marchevsky
Hello,
while working on bpf-netfilter test cases I found that test_progs
never invokes bpf_test_run().
After applying the following small patch it gets called as expected.
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index 47e9e076bc8f..e2a1bdc5a570 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -587,7 +587,7 @@ void run_subtest(struct test_loader *tester,
/* For some reason test_verifier executes programs
* with all capabilities restored. Do the same here.
*/
- if (!restore_capabilities(&caps))
+ if (restore_capabilities(&caps))
goto tobj_cleanup;
do_prog_test_run(bpf_program__fd(tprog), &retval);
... but then the output just hangs. With KASAN enabled I see the following splat,
followed by a refcount saturation warning:
BUG: KASAN: slab-out-of-bounds in bpf_refcount_acquire_impl+0x63/0xd0
Write of size 4 at addr ffff8881072b34e8 by task new_name/12847
CPU: 1 PID: 12847 Comm: new_name Not tainted 6.3.0-rc6+ #129
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x32/0x40
print_address_description.constprop.0+0x2b/0x3d0
? bpf_refcount_acquire_impl+0x63/0xd0
print_report+0xb0/0x260
? bpf_refcount_acquire_impl+0x63/0xd0
? kasan_addr_to_slab+0x9/0x70
? bpf_refcount_acquire_impl+0x63/0xd0
kasan_report+0xad/0xd0
? bpf_refcount_acquire_impl+0x63/0xd0
kasan_check_range+0x13b/0x190
bpf_refcount_acquire_impl+0x63/0xd0
bpf_prog_affcc6c9d58016ca___insert_in_tree_and_list+0x54/0x131
bpf_prog_795203cdef4805f4_insert_and_remove_tree_true_list_true+0x15/0x11b
bpf_test_run+0x2a0/0x5f0
? bpf_test_timer_continue+0x430/0x430
? kmem_cache_alloc+0xe5/0x260
? kasan_set_track+0x21/0x30
? krealloc+0x9e/0xe0
bpf_prog_test_run_skb+0x890/0x1270
? __kmem_cache_free+0x9f/0x170
? bpf_prog_test_run_raw_tp+0x570/0x570
? __fget_light+0x52/0x4d0
? map_update_elem+0x680/0x680
__sys_bpf+0x75e/0xd10
? bpf_link_by_id+0xa0/0xa0
? rseq_get_rseq_cs+0x67/0x650
? __blkcg_punt_bio_submit+0x1b0/0x1b0
__x64_sys_bpf+0x6f/0xb0
do_syscall_64+0x3a/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f2f6a8b392d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 e4 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe46938328 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000007150690 RCX: 00007f2f6a8b392d
RDX: 0000000000000050 RSI: 00007ffe46938360 RDI: 000000000000000a
RBP: 00007ffe46938340 R08: 0000000000000064 R09: 00007ffe46938360
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffe46938b78 R14: 0000000000e09db0 R15: 00007f2f6a9ff000
</TASK>
I can also reproduce this on bpf-next/780c69830ec6b27e0224586ce26bc89552fcf163.
Is this a known bug?
If you can't reproduce this I can make .config available.
Thanks.
* Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Eduard Zingerman @ 2023-04-20 12:44 UTC (permalink / raw)
To: Florian Westphal, bpf; +Cc: Dave Marchevsky
On Thu, 2023-04-20 at 14:03 +0200, Florian Westphal wrote:
> Hello,
>
> while working on bpf-netfilter test cases I found that test_progs
> never invokes bpf_test_run().
>
> After applying the following small patch it gets called as expected.
>
> diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
> index 47e9e076bc8f..e2a1bdc5a570 100644
> --- a/tools/testing/selftests/bpf/test_loader.c
> +++ b/tools/testing/selftests/bpf/test_loader.c
> @@ -587,7 +587,7 @@ void run_subtest(struct test_loader *tester,
> /* For some reason test_verifier executes programs
> * with all capabilities restored. Do the same here.
> */
> - if (!restore_capabilities(&caps))
> + if (restore_capabilities(&caps))
> goto tobj_cleanup;
>
> do_prog_test_run(bpf_program__fd(tprog), &retval);
>
> ... but then the output just hangs. With KASAN enabled I see the following splat,
> followed by a refcount saturation warning:
>
> BUG: KASAN: slab-out-of-bounds in bpf_refcount_acquire_impl+0x63/0xd0
> Write of size 4 at addr ffff8881072b34e8 by task new_name/12847
>
> CPU: 1 PID: 12847 Comm: new_name Not tainted 6.3.0-rc6+ #129
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x32/0x40
> print_address_description.constprop.0+0x2b/0x3d0
> ? bpf_refcount_acquire_impl+0x63/0xd0
> print_report+0xb0/0x260
> ? bpf_refcount_acquire_impl+0x63/0xd0
> ? kasan_addr_to_slab+0x9/0x70
> ? bpf_refcount_acquire_impl+0x63/0xd0
> kasan_report+0xad/0xd0
> ? bpf_refcount_acquire_impl+0x63/0xd0
> kasan_check_range+0x13b/0x190
> bpf_refcount_acquire_impl+0x63/0xd0
> bpf_prog_affcc6c9d58016ca___insert_in_tree_and_list+0x54/0x131
> bpf_prog_795203cdef4805f4_insert_and_remove_tree_true_list_true+0x15/0x11b
> bpf_test_run+0x2a0/0x5f0
> ? bpf_test_timer_continue+0x430/0x430
> ? kmem_cache_alloc+0xe5/0x260
> ? kasan_set_track+0x21/0x30
> ? krealloc+0x9e/0xe0
> bpf_prog_test_run_skb+0x890/0x1270
> ? __kmem_cache_free+0x9f/0x170
> ? bpf_prog_test_run_raw_tp+0x570/0x570
> ? __fget_light+0x52/0x4d0
> ? map_update_elem+0x680/0x680
> __sys_bpf+0x75e/0xd10
> ? bpf_link_by_id+0xa0/0xa0
> ? rseq_get_rseq_cs+0x67/0x650
> ? __blkcg_punt_bio_submit+0x1b0/0x1b0
> __x64_sys_bpf+0x6f/0xb0
> do_syscall_64+0x3a/0x80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f2f6a8b392d
> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 e4 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffe46938328 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> RAX: ffffffffffffffda RBX: 0000000007150690 RCX: 00007f2f6a8b392d
> RDX: 0000000000000050 RSI: 00007ffe46938360 RDI: 000000000000000a
> RBP: 00007ffe46938340 R08: 0000000000000064 R09: 00007ffe46938360
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 00007ffe46938b78 R14: 0000000000e09db0 R15: 00007f2f6a9ff000
> </TASK>
>
> I can also reproduce this on bpf-next/780c69830ec6b27e0224586ce26bc89552fcf163.
> Is this a known bug?
Hi Florian,
Thank you for the report, that's my bug :(
After the suggested change I can run ./test_progs to the end
(it takes a few minutes, though). One test fails: verifier_array_access,
because the map it uses is not populated with values (as it was when this
test was part of ./test_verifier).
However, in the middle of execution I do see a trace similar to yours:
[ 82.757127] ------------[ cut here ]------------
[ 82.757667] refcount_t: saturated; leaking memory.
[ 82.758098] WARNING: CPU: 0 PID: 176 at lib/refcount.c:22 refcount_warn_saturate+0x61/0xe0
[ 82.758775] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE)]
[ 82.759369] CPU: 0 PID: 176 Comm: new_name Tainted: G W OE 6.3.0-rc6-01631-g780c69830ec6 #474
[ 82.760145] RIP: 0010:refcount_warn_saturate+0x61/0xe0
[ 82.760578] Code: 05 84 3a 34 01 01 e8 be 1a b5 ff 0f 0b c3 80 3d 78 3a 34 01 00 75 d7 48 c7 c7 d0 b0 0d 82 c6 05 68 3a 34 01 01 e8 9f 1a b5 ff <0f> 0b c3 80 3d 58 3a 34 01 00 75 b8 48 c7 c7 f8
[ 82.762066] RSP: 0018:ffffc90000ac7c80 EFLAGS: 00010282
[ 82.762491] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 82.763078] RDX: 0000000000000202 RSI: 00000000ffffffea RDI: 0000000000000001
[ 82.763674] RBP: ffffc90000ac7cb0 R08: ffffffff82745808 R09: 00000000ffffdfff
[ 82.764279] R10: ffffffff82665820 R11: ffffffff82715820 R12: ffff888102bdb128
[ 82.764888] R13: ffffc9000011d048 R14: ffff888102bdb128 R15: 0000000000000000
[ 82.765490] FS: 00007fde45dd1b80(0000) GS:ffff88817bc00000(0000) knlGS:0000000000000000
[ 82.766183] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 82.766662] CR2: 00007fde455ce000 CR3: 0000000102f55000 CR4: 00000000003506b0
[ 82.767226] Call Trace:
[ 82.767430] <TASK>
[ 82.767618] bpf_refcount_acquire_impl+0x3a/0x50
[ 82.767995] bpf_prog_a89006de37d09e06___insert_in_tree_and_list+0x54/0x131
[ 82.768556] bpf_prog_7c093a5d96bc51b4_insert_and_remove_tree_false_list_false+0x15/0xf2
[ 82.769195] bpf_test_run+0x17f/0x300
[ 82.769599] bpf_prog_test_run_skb+0x35c/0x700
[ 82.770014] __sys_bpf+0xa0b/0x2ca0
[ 82.770328] __x64_sys_bpf+0x1a/0x20
[ 82.770680] do_syscall_64+0x35/0x80
[ 82.771019] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 82.771474] RIP: 0033:0x7fde45ed45a9
[ 82.771805] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 08 0d 08
[ 82.773409] RSP: 002b:00007ffe2a7fee68 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 82.774088] RAX: ffffffffffffffda RBX: 000055b0c518f5f0 RCX: 00007fde45ed45a9
[ 82.774707] RDX: 0000000000000050 RSI: 00007ffe2a7feeb0 RDI: 000000000000000a
[ 82.775318] RBP: 00007ffe2a7fee80 R08: 0000000000000064 R09: 00007ffe2a7feeb0
[ 82.775924] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 82.776534] R13: 00007
Could you please share your config?
I'd like to reproduce the hang.
Thanks,
Eduard
>
> If you can't reproduce this I can make .config available.
>
> Thanks.
* Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Florian Westphal @ 2023-04-20 12:52 UTC (permalink / raw)
To: Eduard Zingerman; +Cc: Florian Westphal, bpf, Dave Marchevsky
Eduard Zingerman <eddyz87@gmail.com> wrote:
> > BUG: KASAN: slab-out-of-bounds in bpf_refcount_acquire_impl+0x63/0xd0
> > Write of size 4 at addr ffff8881072b34e8 by task new_name/12847
> >
> > CPU: 1 PID: 12847 Comm: new_name Not tainted 6.3.0-rc6+ #129
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x32/0x40
> > print_address_description.constprop.0+0x2b/0x3d0
> > ? bpf_refcount_acquire_impl+0x63/0xd0
> > print_report+0xb0/0x260
> > ? bpf_refcount_acquire_impl+0x63/0xd0
> > ? kasan_addr_to_slab+0x9/0x70
> > ? bpf_refcount_acquire_impl+0x63/0xd0
> > kasan_report+0xad/0xd0
> > ? bpf_refcount_acquire_impl+0x63/0xd0
> > kasan_check_range+0x13b/0x190
> > bpf_refcount_acquire_impl+0x63/0xd0
> > bpf_prog_affcc6c9d58016ca___insert_in_tree_and_list+0x54/0x131
> > bpf_prog_795203cdef4805f4_insert_and_remove_tree_true_list_true+0x15/0x11b
> > bpf_test_run+0x2a0/0x5f0
> > ? bpf_test_timer_continue+0x430/0x430
> > ? kmem_cache_alloc+0xe5/0x260
> > ? kasan_set_track+0x21/0x30
> > ? krealloc+0x9e/0xe0
> > bpf_prog_test_run_skb+0x890/0x1270
> > ? __kmem_cache_free+0x9f/0x170
> > ? bpf_prog_test_run_raw_tp+0x570/0x570
> > ? __fget_light+0x52/0x4d0
> > ? map_update_elem+0x680/0x680
> > __sys_bpf+0x75e/0xd10
> > ? bpf_link_by_id+0xa0/0xa0
> > ? rseq_get_rseq_cs+0x67/0x650
> > ? __blkcg_punt_bio_submit+0x1b0/0x1b0
> > __x64_sys_bpf+0x6f/0xb0
> > do_syscall_64+0x3a/0x80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > RIP: 0033:0x7f2f6a8b392d
> > Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 e4 0c 00 f7 d8 64 89 01 48
> > RSP: 002b:00007ffe46938328 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> > RAX: ffffffffffffffda RBX: 0000000007150690 RCX: 00007f2f6a8b392d
> > RDX: 0000000000000050 RSI: 00007ffe46938360 RDI: 000000000000000a
> > RBP: 00007ffe46938340 R08: 0000000000000064 R09: 00007ffe46938360
> > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > R13: 00007ffe46938b78 R14: 0000000000e09db0 R15: 00007f2f6a9ff000
> > </TASK>
> >
> > I can also reproduce this on bpf-next/780c69830ec6b27e0224586ce26bc89552fcf163.
> > Is this a known bug?
>
> Hi Florian,
>
> Thank you for the report, that's my bug :(
>
> After the suggested change I can run ./test_progs to the end
> (it takes a few minutes, though). One test fails: verifier_array_access,
> because the map it uses is not populated with values (as it was when this
> test was part of ./test_verifier).
Right, I see that failure too.
> However, in the middle of execution I do see a trace similar to yours:
I see this as well; to get to it quicker:
./test_progs --allow=refcounted_kptr
> [ 82.757127] ------------[ cut here ]------------
> [ 82.757667] refcount_t: saturated; leaking memory.
> [ 82.758098] WARNING: CPU: 0 PID: 176 at lib/refcount.c:22 refcount_warn_saturate+0x61/0xe0
I get this one right after the kasan splat.
> Could you please share your config?
> I'd like to reproduce the hang.
It hangs for me if I just rerun
./test_progs --allow=refcounted_kptr
a couple of times (maybe once per cpu...?).
I'll send the config if that doesn't hang for you.
* Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Eduard Zingerman @ 2023-04-20 13:00 UTC (permalink / raw)
To: Florian Westphal; +Cc: bpf, Dave Marchevsky
On Thu, 2023-04-20 at 14:52 +0200, Florian Westphal wrote:
> Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > BUG: KASAN: slab-out-of-bounds in bpf_refcount_acquire_impl+0x63/0xd0
> > > Write of size 4 at addr ffff8881072b34e8 by task new_name/12847
> > >
> > > CPU: 1 PID: 12847 Comm: new_name Not tainted 6.3.0-rc6+ #129
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl+0x32/0x40
> > > print_address_description.constprop.0+0x2b/0x3d0
> > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > print_report+0xb0/0x260
> > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > ? kasan_addr_to_slab+0x9/0x70
> > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > kasan_report+0xad/0xd0
> > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > kasan_check_range+0x13b/0x190
> > > bpf_refcount_acquire_impl+0x63/0xd0
> > > bpf_prog_affcc6c9d58016ca___insert_in_tree_and_list+0x54/0x131
> > > bpf_prog_795203cdef4805f4_insert_and_remove_tree_true_list_true+0x15/0x11b
> > > bpf_test_run+0x2a0/0x5f0
> > > ? bpf_test_timer_continue+0x430/0x430
> > > ? kmem_cache_alloc+0xe5/0x260
> > > ? kasan_set_track+0x21/0x30
> > > ? krealloc+0x9e/0xe0
> > > bpf_prog_test_run_skb+0x890/0x1270
> > > ? __kmem_cache_free+0x9f/0x170
> > > ? bpf_prog_test_run_raw_tp+0x570/0x570
> > > ? __fget_light+0x52/0x4d0
> > > ? map_update_elem+0x680/0x680
> > > __sys_bpf+0x75e/0xd10
> > > ? bpf_link_by_id+0xa0/0xa0
> > > ? rseq_get_rseq_cs+0x67/0x650
> > > ? __blkcg_punt_bio_submit+0x1b0/0x1b0
> > > __x64_sys_bpf+0x6f/0xb0
> > > do_syscall_64+0x3a/0x80
> > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > RIP: 0033:0x7f2f6a8b392d
> > > Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 e4 0c 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007ffe46938328 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> > > RAX: ffffffffffffffda RBX: 0000000007150690 RCX: 00007f2f6a8b392d
> > > RDX: 0000000000000050 RSI: 00007ffe46938360 RDI: 000000000000000a
> > > RBP: 00007ffe46938340 R08: 0000000000000064 R09: 00007ffe46938360
> > > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > R13: 00007ffe46938b78 R14: 0000000000e09db0 R15: 00007f2f6a9ff000
> > > </TASK>
> > >
> > > I can also reproduce this on bpf-next/780c69830ec6b27e0224586ce26bc89552fcf163.
> > > Is this a known bug?
> >
> > Hi Florian,
> >
> > Thank you for the report, that's my bug :(
> >
> > After the suggested change I can run ./test_progs to the end
> > (it takes a few minutes, though). One test fails: verifier_array_access,
> > because the map it uses is not populated with values (as it was when this
> > test was part of ./test_verifier).
>
> Right, I see that failure too.
> > However, in the middle of execution I do see a trace similar to yours:
>
> I see this as well; to get to it quicker:
> ./test_progs --allow=refcounted_kptr
>
>
> > [ 82.757127] ------------[ cut here ]------------
> > [ 82.757667] refcount_t: saturated; leaking memory.
> > [ 82.758098] WARNING: CPU: 0 PID: 176 at lib/refcount.c:22 refcount_warn_saturate+0x61/0xe0
>
> I get this one right after the kasan splat.
>
> > Could you please share your config?
> > I'd like to reproduce the hang.
>
> It hangs for me if I just rerun
> ./test_progs --allow=refcounted_kptr
>
> a couple of times (maybe once per cpu...?).
>
> I'll send the config if that doesn't hang for you.
OK, I got the hang after executing the following a couple of times:
for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
And here is a deadlock warning in dmesg:
[ 1200.463933] ============================================
[ 1200.464382] WARNING: possible recursive locking detected
[ 1200.464546] 6.3.0-rc6-01631-g780c69830ec6 #474 Tainted: G W OE
[ 1200.464546] --------------------------------------------
[ 1200.464546] test_progs/4311 is trying to acquire lock:
[ 1200.464546] ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: btf_put+0x36/0x80
[ 1200.464546]
[ 1200.464546] but task is already holding lock:
[ 1200.464546] ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: bpf_find_btf_id+0xeb/0x1b0
[ 1200.464546]
[ 1200.464546] other info that might help us debug this:
[ 1200.464546] Possible unsafe locking scenario:
[ 1200.464546]
[ 1200.464546] CPU0
[ 1200.464546] ----
[ 1200.464546] lock(btf_idr_lock);
[ 1200.464546] lock(btf_idr_lock);
[ 1200.464546]
[ 1200.464546] *** DEADLOCK ***
[ 1200.464546]
[ 1200.464546] May be due to missing lock nesting notation
[ 1200.464546]
[ 1200.464546] 1 lock held by test_progs/4311:
[ 1200.464546] #0: ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: bpf_find_btf_id+0xeb/0x1b0
[ 1200.464546]
[ 1200.464546] stack backtrace:
[ 1200.464546] CPU: 2 PID: 4311 Comm: test_progs Tainted: G W OE 6.3.0-rc6-01631-g780c69830ec6 #474
[ 1200.464546] Call Trace:
[ 1200.464546] <TASK>
[ 1200.464546] dump_stack_lvl+0x47/0x70
[ 1200.464546] __lock_acquire+0x88b/0x2710
[ 1200.464546] ? __lock_acquire+0x350/0x2710
[ 1200.464546] lock_acquire+0xca/0x2c0
[ 1200.464546] ? btf_put+0x36/0x80
[ 1200.464546] ? lock_acquire+0xda/0x2c0
[ 1200.464546] _raw_spin_lock_irqsave+0x3e/0x60
[ 1200.464546] ? btf_put+0x36/0x80
[ 1200.464546] btf_put+0x36/0x80
[ 1200.464546] bpf_find_btf_id+0xf3/0x1b0
[ 1200.464546] btf_parse_fields+0x570/0xbf0
[ 1200.464546] ? lock_release+0x139/0x280
[ 1200.464546] ? __bpf_map_area_alloc+0xaa/0xd0
[ 1200.464546] ? __kmem_cache_alloc_node+0x14a/0x220
[ 1200.464546] ? rcu_is_watching+0xd/0x40
[ 1200.464546] ? __kmalloc_node+0xcb/0x140
[ 1200.464546] map_check_btf+0x9c/0x260
[ 1200.464546] __sys_bpf+0x274b/0x2ca0
[ 1200.464546] ? lock_release+0x139/0x280
[ 1200.464546] __x64_sys_bpf+0x1a/0x20
[ 1200.464546] do_syscall_64+0x35/0x80
[ 1200.464546] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 1200.464546] RIP: 0033:0x7f659b7a15a9
[ 1200.464546] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 08 0d 08
[ 1200.464546] RSP: 002b:00007ffe984a7e88 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[ 1200.464546] RAX: ffffffffffffffda RBX: 000055ebdcfae150 RCX: 00007f659b7a15a9
[ 1200.464546] RDX: 0000000000000048 RSI: 00007ffe984a7f10 RDI: 0000000000000000
[ 1200.464546] RBP: 00007ffe984a7ea0 R08: 00007ffe984a8000 R09: 00007ffe984a7f10
[ 1200.464546] R10: 0000000000000001 R11: 0000000000000206 R12: 0000000000000000
[ 1200.464546] R13: 00007ffe984a8740 R14: 000055ebd91894f0 R15: 00007f659b8f0020
[ 1200.464546] </TASK>
[ 1224.297533] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [test_progs:4315]
[ 1224.297533] Modules linked in: [last unloaded: bpf_testmod(OE)]
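The lockdep report boils down to bpf_find_btf_id() calling btf_put() on an
error path while still holding btf_idr_lock. A minimal userspace analogue of
that self-deadlock (a sketch only; a pthread mutex stands in for the kernel
spinlock, and the stub bodies are hypothetical):

#include <pthread.h>

static pthread_mutex_t btf_idr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for btf_put(): it also needs btf_idr_lock to drop the
 * idr entry once the refcount hits zero. */
static void btf_put(void)
{
	pthread_mutex_lock(&btf_idr_lock);   /* second acquisition */
	pthread_mutex_unlock(&btf_idr_lock);
}

/* Stand-in for the bpf_find_btf_id() path in the trace above: the
 * task still owns btf_idr_lock when btf_put() tries to take it. */
static void bpf_find_btf_id(void)
{
	pthread_mutex_lock(&btf_idr_lock);
	btf_put();	/* with the default mutex type this self-deadlocks */
	pthread_mutex_unlock(&btf_idr_lock);
}

int main(void)
{
	bpf_find_btf_id();
	return 0;
}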
The refcounted_kptr tests do use __retval annotations.
So we have a testing framework issue that masks a BTF parsing issue...
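For readers unfamiliar with the annotation, the convention looks roughly like
this (a sketch using the macros from tools/testing/selftests/bpf/progs/bpf_misc.h;
the program name and body are hypothetical stand-ins, not one of the real
refcounted_kptr tests):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

char _license[] SEC("license") = "GPL";

SEC("tc")
__success __retval(0)	/* test_loader should execute the prog via
			 * bpf_prog_test_run() and check that rc == 0 */
long retval_convention_sketch(void *ctx)
{
	return 0;
}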
* Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Eduard Zingerman @ 2023-04-20 13:14 UTC (permalink / raw)
To: Florian Westphal; +Cc: bpf, Dave Marchevsky
On Thu, 2023-04-20 at 16:00 +0300, Eduard Zingerman wrote:
> On Thu, 2023-04-20 at 14:52 +0200, Florian Westphal wrote:
> > Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > BUG: KASAN: slab-out-of-bounds in bpf_refcount_acquire_impl+0x63/0xd0
> > > > Write of size 4 at addr ffff8881072b34e8 by task new_name/12847
> > > >
> > > > CPU: 1 PID: 12847 Comm: new_name Not tainted 6.3.0-rc6+ #129
> > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> > > > Call Trace:
> > > > <TASK>
> > > > dump_stack_lvl+0x32/0x40
> > > > print_address_description.constprop.0+0x2b/0x3d0
> > > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > > print_report+0xb0/0x260
> > > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > > ? kasan_addr_to_slab+0x9/0x70
> > > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > > kasan_report+0xad/0xd0
> > > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > > kasan_check_range+0x13b/0x190
> > > > bpf_refcount_acquire_impl+0x63/0xd0
> > > > bpf_prog_affcc6c9d58016ca___insert_in_tree_and_list+0x54/0x131
> > > > bpf_prog_795203cdef4805f4_insert_and_remove_tree_true_list_true+0x15/0x11b
> > > > bpf_test_run+0x2a0/0x5f0
> > > > ? bpf_test_timer_continue+0x430/0x430
> > > > ? kmem_cache_alloc+0xe5/0x260
> > > > ? kasan_set_track+0x21/0x30
> > > > ? krealloc+0x9e/0xe0
> > > > bpf_prog_test_run_skb+0x890/0x1270
> > > > ? __kmem_cache_free+0x9f/0x170
> > > > ? bpf_prog_test_run_raw_tp+0x570/0x570
> > > > ? __fget_light+0x52/0x4d0
> > > > ? map_update_elem+0x680/0x680
> > > > __sys_bpf+0x75e/0xd10
> > > > ? bpf_link_by_id+0xa0/0xa0
> > > > ? rseq_get_rseq_cs+0x67/0x650
> > > > ? __blkcg_punt_bio_submit+0x1b0/0x1b0
> > > > __x64_sys_bpf+0x6f/0xb0
> > > > do_syscall_64+0x3a/0x80
> > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > RIP: 0033:0x7f2f6a8b392d
> > > > Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 e4 0c 00 f7 d8 64 89 01 48
> > > > RSP: 002b:00007ffe46938328 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> > > > RAX: ffffffffffffffda RBX: 0000000007150690 RCX: 00007f2f6a8b392d
> > > > RDX: 0000000000000050 RSI: 00007ffe46938360 RDI: 000000000000000a
> > > > RBP: 00007ffe46938340 R08: 0000000000000064 R09: 00007ffe46938360
> > > > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > > R13: 00007ffe46938b78 R14: 0000000000e09db0 R15: 00007f2f6a9ff000
> > > > </TASK>
> > > >
> > > > I can also reproduce this on bpf-next/780c69830ec6b27e0224586ce26bc89552fcf163.
> > > > Is this a known bug?
> > >
> > > Hi Florian,
> > >
> > > Thank you for the report, that's my bug :(
> > >
> > > After the suggested change I can run ./test_progs to the end
> > > (it takes a few minutes, though). One test fails: verifier_array_access,
> > > because the map it uses is not populated with values (as it was when this
> > > test was part of ./test_verifier).
> >
> > Right, I see that failure too.
> > > However, in the middle of execution I do see a trace similar to yours:
> >
> > I see this as well; to get to it quicker:
> > ./test_progs --allow=refcounted_kptr
> >
> >
> > > [ 82.757127] ------------[ cut here ]------------
> > > [ 82.757667] refcount_t: saturated; leaking memory.
> > > [ 82.758098] WARNING: CPU: 0 PID: 176 at lib/refcount.c:22 refcount_warn_saturate+0x61/0xe0
> >
> > I get this one right after the kasan splat.
> >
> > > Could you please share your config?
> > > I'd like to reproduce the hang.
> >
> > It hangs for me if I just rerun
> > ./test_progs --allow=refcounted_kptr
> >
> > a couple of times (maybe once per cpu...?).
> >
> > I'll send the config if that doesn't hang for you.
>
> OK, I got the hang after executing the following a couple of times:
>
> for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
>
> And here is a deadlock warning in dmesg:
>
> [ 1200.463933] ============================================
> [ 1200.464382] WARNING: possible recursive locking detected
> [ 1200.464546] 6.3.0-rc6-01631-g780c69830ec6 #474 Tainted: G W OE
> [ 1200.464546] --------------------------------------------
> [ 1200.464546] test_progs/4311 is trying to acquire lock:
> [ 1200.464546] ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: btf_put+0x36/0x80
> [ 1200.464546]
> [ 1200.464546] but task is already holding lock:
> [ 1200.464546] ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: bpf_find_btf_id+0xeb/0x1b0
> [ 1200.464546]
> [ 1200.464546] other info that might help us debug this:
> [ 1200.464546] Possible unsafe locking scenario:
> [ 1200.464546]
> [ 1200.464546] CPU0
> [ 1200.464546] ----
> [ 1200.464546] lock(btf_idr_lock);
> [ 1200.464546] lock(btf_idr_lock);
> [ 1200.464546]
> [ 1200.464546] *** DEADLOCK ***
> [ 1200.464546]
> [ 1200.464546] May be due to missing lock nesting notation
> [ 1200.464546]
> [ 1200.464546] 1 lock held by test_progs/4311:
> [ 1200.464546] #0: ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: bpf_find_btf_id+0xeb/0x1b0
> [ 1200.464546]
> [ 1200.464546] stack backtrace:
> [ 1200.464546] CPU: 2 PID: 4311 Comm: test_progs Tainted: G W OE 6.3.0-rc6-01631-g780c69830ec6 #474
> [ 1200.464546] Call Trace:
> [ 1200.464546] <TASK>
> [ 1200.464546] dump_stack_lvl+0x47/0x70
> [ 1200.464546] __lock_acquire+0x88b/0x2710
> [ 1200.464546] ? __lock_acquire+0x350/0x2710
> [ 1200.464546] lock_acquire+0xca/0x2c0
> [ 1200.464546] ? btf_put+0x36/0x80
> [ 1200.464546] ? lock_acquire+0xda/0x2c0
> [ 1200.464546] _raw_spin_lock_irqsave+0x3e/0x60
> [ 1200.464546] ? btf_put+0x36/0x80
> [ 1200.464546] btf_put+0x36/0x80
> [ 1200.464546] bpf_find_btf_id+0xf3/0x1b0
> [ 1200.464546] btf_parse_fields+0x570/0xbf0
> [ 1200.464546] ? lock_release+0x139/0x280
> [ 1200.464546] ? __bpf_map_area_alloc+0xaa/0xd0
> [ 1200.464546] ? __kmem_cache_alloc_node+0x14a/0x220
> [ 1200.464546] ? rcu_is_watching+0xd/0x40
> [ 1200.464546] ? __kmalloc_node+0xcb/0x140
> [ 1200.464546] map_check_btf+0x9c/0x260
> [ 1200.464546] __sys_bpf+0x274b/0x2ca0
> [ 1200.464546] ? lock_release+0x139/0x280
> [ 1200.464546] __x64_sys_bpf+0x1a/0x20
> [ 1200.464546] do_syscall_64+0x35/0x80
> [ 1200.464546] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> [ 1200.464546] RIP: 0033:0x7f659b7a15a9
> [ 1200.464546] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 08 0d 08
> [ 1200.464546] RSP: 002b:00007ffe984a7e88 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> [ 1200.464546] RAX: ffffffffffffffda RBX: 000055ebdcfae150 RCX: 00007f659b7a15a9
> [ 1200.464546] RDX: 0000000000000048 RSI: 00007ffe984a7f10 RDI: 0000000000000000
> [ 1200.464546] RBP: 00007ffe984a7ea0 R08: 00007ffe984a8000 R09: 00007ffe984a7f10
> [ 1200.464546] R10: 0000000000000001 R11: 0000000000000206 R12: 0000000000000000
> [ 1200.464546] R13: 00007ffe984a8740 R14: 000055ebd91894f0 R15: 00007f659b8f0020
> [ 1200.464546] </TASK>
> [ 1224.297533] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [test_progs:4315]
> [ 1224.297533] Modules linked in: [last unloaded: bpf_testmod(OE)]
>
> The refcounted_kptr tests do use __retval annotations.
> So we have a testing framework issue that masks a BTF parsing issue...
>
And when executed on a freshly started VM (before the saturation warning)
the error is different:
[ 15.673749] ------------[ cut here ]------------
[ 15.674384] refcount_t: addition on 0; use-after-free.
[ 15.674878] WARNING: CPU: 2 PID: 119 at lib/refcount.c:25 refcount_warn_saturate+0x80/0xe0
[ 15.675613] Modules linked in: bpf_testmod(OE)
[ 15.676019] CPU: 2 PID: 119 Comm: test_progs Tainted: G OE 6.3.0-rc6-01631-g780c69830ec6 #474
[ 15.676859] RIP: 0010:refcount_warn_saturate+0x80/0xe0
[ 15.677315] Code: 05 68 3a 34 01 01 e8 9f 1a b5 ff 0f 0b c3 80 3d 58 3a 34 01 00 75 b8 48 c7 c7 f8 b0 0d 82 c6 05 48 3a 34 01 01 e8 80 1a b5 ff <0f> 0b c3 80 3d 3b 3a 34 01 00 75 99 48 c7 c7 db
[ 15.678935] RSP: 0018:ffffc9000031bc80 EFLAGS: 00010282
[ 15.679411] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 15.680047] RDX: 0000000000000202 RSI: 00000000ffffffea RDI: 0000000000000001
[ 15.680664] RBP: ffffc9000031bcb0 R08: ffffffff82745808 R09: 00000000ffffdfff
[ 15.681289] R10: ffffffff82665820 R11: ffffffff82715820 R12: ffff888102b7a428
[ 15.681957] R13: ffffc90000159048 R14: ffff888102b7a428 R15: 0000000000000000
[ 15.682592] FS: 00007fd26beeeb80(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000
[ 15.683272] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.683764] CR2: 0000558f51f65cd3 CR3: 0000000104034000 CR4: 00000000003506a0
[ 15.684383] Call Trace:
[ 15.684611] <TASK>
[ 15.684799] bpf_refcount_acquire_impl+0x4a/0x50
[ 15.685214] bpf_prog_b3419cf1b56e14c4___insert_in_tree_and_list+0x54/0x131
[ 15.685808] bpf_prog_b5014ed510a2fac6_insert_and_remove_tree_true_list_true+0x15/0x11b
[ 15.686562] bpf_test_run+0x17f/0x300
* Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Eduard Zingerman @ 2023-04-20 23:22 UTC (permalink / raw)
To: Florian Westphal; +Cc: bpf, Dave Marchevsky
On Thu, 2023-04-20 at 16:14 +0300, Eduard Zingerman wrote:
> On Thu, 2023-04-20 at 16:00 +0300, Eduard Zingerman wrote:
> > On Thu, 2023-04-20 at 14:52 +0200, Florian Westphal wrote:
> > > Eduard Zingerman <eddyz87@gmail.com> wrote:
> > > > > BUG: KASAN: slab-out-of-bounds in bpf_refcount_acquire_impl+0x63/0xd0
> > > > > Write of size 4 at addr ffff8881072b34e8 by task new_name/12847
> > > > >
> > > > > CPU: 1 PID: 12847 Comm: new_name Not tainted 6.3.0-rc6+ #129
> > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> > > > > Call Trace:
> > > > > <TASK>
> > > > > dump_stack_lvl+0x32/0x40
> > > > > print_address_description.constprop.0+0x2b/0x3d0
> > > > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > > > print_report+0xb0/0x260
> > > > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > > > ? kasan_addr_to_slab+0x9/0x70
> > > > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > > > kasan_report+0xad/0xd0
> > > > > ? bpf_refcount_acquire_impl+0x63/0xd0
> > > > > kasan_check_range+0x13b/0x190
> > > > > bpf_refcount_acquire_impl+0x63/0xd0
> > > > > bpf_prog_affcc6c9d58016ca___insert_in_tree_and_list+0x54/0x131
> > > > > bpf_prog_795203cdef4805f4_insert_and_remove_tree_true_list_true+0x15/0x11b
> > > > > bpf_test_run+0x2a0/0x5f0
> > > > > ? bpf_test_timer_continue+0x430/0x430
> > > > > ? kmem_cache_alloc+0xe5/0x260
> > > > > ? kasan_set_track+0x21/0x30
> > > > > ? krealloc+0x9e/0xe0
> > > > > bpf_prog_test_run_skb+0x890/0x1270
> > > > > ? __kmem_cache_free+0x9f/0x170
> > > > > ? bpf_prog_test_run_raw_tp+0x570/0x570
> > > > > ? __fget_light+0x52/0x4d0
> > > > > ? map_update_elem+0x680/0x680
> > > > > __sys_bpf+0x75e/0xd10
> > > > > ? bpf_link_by_id+0xa0/0xa0
> > > > > ? rseq_get_rseq_cs+0x67/0x650
> > > > > ? __blkcg_punt_bio_submit+0x1b0/0x1b0
> > > > > __x64_sys_bpf+0x6f/0xb0
> > > > > do_syscall_64+0x3a/0x80
> > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > RIP: 0033:0x7f2f6a8b392d
> > > > > Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 e4 0c 00 f7 d8 64 89 01 48
> > > > > RSP: 002b:00007ffe46938328 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> > > > > RAX: ffffffffffffffda RBX: 0000000007150690 RCX: 00007f2f6a8b392d
> > > > > RDX: 0000000000000050 RSI: 00007ffe46938360 RDI: 000000000000000a
> > > > > RBP: 00007ffe46938340 R08: 0000000000000064 R09: 00007ffe46938360
> > > > > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > > > R13: 00007ffe46938b78 R14: 0000000000e09db0 R15: 00007f2f6a9ff000
> > > > > </TASK>
> > > > >
> > > > > I can also reproduce this on bpf-next/780c69830ec6b27e0224586ce26bc89552fcf163.
> > > > > Is this a known bug?
> > > >
> > > > Hi Florian,
> > > >
> > > > Thank you for the report, that's my bug :(
> > > >
> > > > After the suggested change I can run ./test_progs to the end
> > > > (it takes a few minutes, though). One test fails: verifier_array_access,
> > > > because the map it uses is not populated with values (as it was when this
> > > > test was part of ./test_verifier).
> > >
> > > Right, I see that failure too.
> > > > However, in the middle of execution I do see a trace similar to yours:
> > >
> > > I see this as well; to get to it quicker:
> > > ./test_progs --allow=refcounted_kptr
> > >
> > >
> > > > [ 82.757127] ------------[ cut here ]------------
> > > > [ 82.757667] refcount_t: saturated; leaking memory.
> > > > [ 82.758098] WARNING: CPU: 0 PID: 176 at lib/refcount.c:22 refcount_warn_saturate+0x61/0xe0
> > >
> > > I get this one right after the kasan splat.
> > >
> > > > Could you please share your config?
> > > > I'd like to reproduce the hang.
> > >
> > > It hangs for me if I just rerun
> > > ./test_progs --allow=refcounted_kptr
> > >
> > > a couple of times (maybe once per cpu...?).
> > >
> > > I'll send the config if that doesn't hang for you.
> >
> > OK, I got the hang after executing the following a couple of times:
> >
> > for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
> >
> > And here is a deadlock warning in dmesg:
> >
> > [ 1200.463933] ============================================
> > [ 1200.464382] WARNING: possible recursive locking detected
> > [ 1200.464546] 6.3.0-rc6-01631-g780c69830ec6 #474 Tainted: G W OE
> > [ 1200.464546] --------------------------------------------
> > [ 1200.464546] test_progs/4311 is trying to acquire lock:
> > [ 1200.464546] ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: btf_put+0x36/0x80
> > [ 1200.464546]
> > [ 1200.464546] but task is already holding lock:
> > [ 1200.464546] ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: bpf_find_btf_id+0xeb/0x1b0
> > [ 1200.464546]
> > [ 1200.464546] other info that might help us debug this:
> > [ 1200.464546] Possible unsafe locking scenario:
> > [ 1200.464546]
> > [ 1200.464546] CPU0
> > [ 1200.464546] ----
> > [ 1200.464546] lock(btf_idr_lock);
> > [ 1200.464546] lock(btf_idr_lock);
> > [ 1200.464546]
> > [ 1200.464546] *** DEADLOCK ***
> > [ 1200.464546]
> > [ 1200.464546] May be due to missing lock nesting notation
> > [ 1200.464546]
> > [ 1200.464546] 1 lock held by test_progs/4311:
> > [ 1200.464546] #0: ffffffff8276f518 (btf_idr_lock){+.-.}-{2:2}, at: bpf_find_btf_id+0xeb/0x1b0
> > [ 1200.464546]
> > [ 1200.464546] stack backtrace:
> > [ 1200.464546] CPU: 2 PID: 4311 Comm: test_progs Tainted: G W OE 6.3.0-rc6-01631-g780c69830ec6 #474
> > [ 1200.464546] Call Trace:
> > [ 1200.464546] <TASK>
> > [ 1200.464546] dump_stack_lvl+0x47/0x70
> > [ 1200.464546] __lock_acquire+0x88b/0x2710
> > [ 1200.464546] ? __lock_acquire+0x350/0x2710
> > [ 1200.464546] lock_acquire+0xca/0x2c0
> > [ 1200.464546] ? btf_put+0x36/0x80
> > [ 1200.464546] ? lock_acquire+0xda/0x2c0
> > [ 1200.464546] _raw_spin_lock_irqsave+0x3e/0x60
> > [ 1200.464546] ? btf_put+0x36/0x80
> > [ 1200.464546] btf_put+0x36/0x80
> > [ 1200.464546] bpf_find_btf_id+0xf3/0x1b0
> > [ 1200.464546] btf_parse_fields+0x570/0xbf0
> > [ 1200.464546] ? lock_release+0x139/0x280
> > [ 1200.464546] ? __bpf_map_area_alloc+0xaa/0xd0
> > [ 1200.464546] ? __kmem_cache_alloc_node+0x14a/0x220
> > [ 1200.464546] ? rcu_is_watching+0xd/0x40
> > [ 1200.464546] ? __kmalloc_node+0xcb/0x140
> > [ 1200.464546] map_check_btf+0x9c/0x260
> > [ 1200.464546] __sys_bpf+0x274b/0x2ca0
> > [ 1200.464546] ? lock_release+0x139/0x280
> > [ 1200.464546] __x64_sys_bpf+0x1a/0x20
> > [ 1200.464546] do_syscall_64+0x35/0x80
> > [ 1200.464546] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > [ 1200.464546] RIP: 0033:0x7f659b7a15a9
> > [ 1200.464546] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 08 0d 08
> > [ 1200.464546] RSP: 002b:00007ffe984a7e88 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
> > [ 1200.464546] RAX: ffffffffffffffda RBX: 000055ebdcfae150 RCX: 00007f659b7a15a9
> > [ 1200.464546] RDX: 0000000000000048 RSI: 00007ffe984a7f10 RDI: 0000000000000000
> > [ 1200.464546] RBP: 00007ffe984a7ea0 R08: 00007ffe984a8000 R09: 00007ffe984a7f10
> > [ 1200.464546] R10: 0000000000000001 R11: 0000000000000206 R12: 0000000000000000
> > [ 1200.464546] R13: 00007ffe984a8740 R14: 000055ebd91894f0 R15: 00007f659b8f0020
> > [ 1200.464546] </TASK>
> > [ 1224.297533] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [test_progs:4315]
> > [ 1224.297533] Modules linked in: [last unloaded: bpf_testmod(OE)]
> >
> > The refcounted_kptr tests do use __retval annotations.
> > So we have a testing framework issue that masks a BTF parsing issue...
> >
>
> And when executed on a freshly started VM (before the saturation warning)
> the error is different:
>
> [ 15.673749] ------------[ cut here ]------------
> [ 15.674384] refcount_t: addition on 0; use-after-free.
> [ 15.674878] WARNING: CPU: 2 PID: 119 at lib/refcount.c:25 refcount_warn_saturate+0x80/0xe0
> [ 15.675613] Modules linked in: bpf_testmod(OE)
> [ 15.676019] CPU: 2 PID: 119 Comm: test_progs Tainted: G OE 6.3.0-rc6-01631-g780c69830ec6 #474
> [ 15.676859] RIP: 0010:refcount_warn_saturate+0x80/0xe0
> [ 15.677315] Code: 05 68 3a 34 01 01 e8 9f 1a b5 ff 0f 0b c3 80 3d 58 3a 34 01 00 75 b8 48 c7 c7 f8 b0 0d 82 c6 05 48 3a 34 01 01 e8 80 1a b5 ff <0f> 0b c3 80 3d 3b 3a 34 01 00 75 99 48 c7 c7 db
> [ 15.678935] RSP: 0018:ffffc9000031bc80 EFLAGS: 00010282
> [ 15.679411] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 15.680047] RDX: 0000000000000202 RSI: 00000000ffffffea RDI: 0000000000000001
> [ 15.680664] RBP: ffffc9000031bcb0 R08: ffffffff82745808 R09: 00000000ffffdfff
> [ 15.681289] R10: ffffffff82665820 R11: ffffffff82715820 R12: ffff888102b7a428
> [ 15.681957] R13: ffffc90000159048 R14: ffff888102b7a428 R15: 0000000000000000
> [ 15.682592] FS: 00007fd26beeeb80(0000) GS:ffff88817bd00000(0000) knlGS:0000000000000000
> [ 15.683272] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 15.683764] CR2: 0000558f51f65cd3 CR3: 0000000104034000 CR4: 00000000003506a0
> [ 15.684383] Call Trace:
> [ 15.684611] <TASK>
> [ 15.684799] bpf_refcount_acquire_impl+0x4a/0x50
> [ 15.685214] bpf_prog_b3419cf1b56e14c4___insert_in_tree_and_list+0x54/0x131
> [ 15.685808] bpf_prog_b5014ed510a2fac6_insert_and_remove_tree_true_list_true+0x15/0x11b
> [ 15.686562] bpf_test_run+0x17f/0x300
>
Looking at the error and at the test source code, it appears to me
that there is an API misuse of the `refcount_t` type.
The `bpf_refcount_acquire` invocation in the test expands to a call to
bpf_refcount_acquire_impl(), which treats the passed pointer as a value of
`refcount_t`:
/* Description
* Increment the refcount on a refcounted local kptr, turning the
* non-owning reference input into an owning reference in the process.
*
* The 'meta' parameter is rewritten by the verifier, no need for BPF
* program to set it.
* Returns
* An owning reference to the object pointed to by 'kptr'
*/
extern void *bpf_refcount_acquire_impl(void *kptr, void *meta) __ksym;
__bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
{
...
refcount_inc((refcount_t *)ref);
return (void *)p__refcounted_kptr;
}
The comment for `refcount_inc` says:
/**
* ...
* Will WARN if the refcount is 0, as this represents a possible use-after-free
* condition.
*/
static inline void refcount_inc(refcount_t *r)
And looking at block/blk-core.c as an example, refcount_t is supposed
to be used as follows:
- upon object creation its refcount is set to 1:
refcount_set(&q->refs, 1);
- when reference is added the refcount is incremented:
refcount_inc(&q->refs);
- when reference is removed the refcount is decremented and checked:
if (refcount_dec_and_test(&q->refs))
blk_free_queue(q);
So, 0 is not actually a valid value for a refcount_t instance. And I
don't see any calls to refcount_set() in kernel/bpf/helpers.c, which
implements bpf_refcount_acquire_impl().
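To make the suspected misuse concrete, here is a userspace analogue of the
lifecycle above (a sketch only; refcount_t is kernel-only, so a plain int
mimics the warning semantics quoted from refcount_inc):

#include <stdio.h>

struct obj { int refs; };	/* stand-in for an object embedding refcount_t */

static void refcount_set(struct obj *o, int n) { o->refs = n; }

static void refcount_inc(struct obj *o)
{
	if (o->refs == 0)	/* mirrors the WARN described above */
		fprintf(stderr, "refcount_t: addition on 0; use-after-free.\n");
	o->refs++;
}

int main(void)
{
	struct obj ok = {0}, bad = {0};

	refcount_set(&ok, 1);	/* correct lifecycle: creation sets refs to 1 */
	refcount_inc(&ok);	/* taking an extra reference is then fine */

	refcount_inc(&bad);	/* refs never initialized: triggers the same
				 * "addition on 0" warning seen in the splat */
	return 0;
}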
Dave, I see that bpf_refcount_acquire_impl() was recently added by you,
could you please take a look?
The TL;DR of the thread:
- __retval is currently ignored;
- to fix it apply the patch below;
- running the following command several times produces lots
of nasty errors in dmesg:
$ for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index 47e9e076bc8f..e2a1bdc5a570 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -587,7 +587,7 @@ void run_subtest(struct test_loader *tester,
/* For some reason test_verifier executes programs
* with all capabilities restored. Do the same here.
*/
- if (!restore_capabilities(&caps))
+ if (restore_capabilities(&caps))
goto tobj_cleanup;
do_prog_test_run(bpf_program__fd(tprog), &retval);
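To spell out the control-flow bug the patch fixes, a minimal userspace sketch
(assuming restore_capabilities() returns 0 on success, which the fixed
condition implies; the stub bodies are hypothetical):

#include <stdio.h>

/* Stub for the helper in test_loader.c; assumed to return 0 on
 * success and a negative value on failure. */
static int restore_capabilities(void *caps) { (void)caps; return 0; }

static void do_prog_test_run(void) { printf("bpf_test_run() reached\n"); }

int main(void)
{
	void *caps = NULL;

	/* Buggy form: on success the condition !0 is true, so we jump
	 * straight to cleanup and the test run is silently skipped. */
	if (!restore_capabilities(caps))
		goto tobj_cleanup;

	do_prog_test_run();
tobj_cleanup:
	return 0;
}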
* Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Dave Marchevsky @ 2023-04-20 23:54 UTC (permalink / raw)
To: Eduard Zingerman, Florian Westphal; +Cc: bpf, Dave Marchevsky
Hi Eduard and Florian, thanks for finding this issue. I am looking into the
bpf_refcount side of things, and IIUC Eduard just submitted a series fixing
__retval / test infra.
I'm having trouble reproducing the refcount-specific splats; would you mind
sharing your .config?
On 4/20/23 7:22 PM, Eduard Zingerman wrote:
[...]
> Looking at the error and at the test source code, it appears to me
> that there is an API misuse of the `refcount_t` type.
>
> The `bpf_refcount_acquire` invocation in the test expands to a call to
> bpf_refcount_acquire_impl(), which treats the passed pointer as a value of
> `refcount_t`:
>
> /* Description
> * Increment the refcount on a refcounted local kptr, turning the
> * non-owning reference input into an owning reference in the process.
> *
> * The 'meta' parameter is rewritten by the verifier, no need for BPF
> * program to set it.
> * Returns
> * An owning reference to the object pointed to by 'kptr'
> */
> extern void *bpf_refcount_acquire_impl(void *kptr, void *meta) __ksym;
>
> __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
> {
> ...
> refcount_inc((refcount_t *)ref);
> return (void *)p__refcounted_kptr;
> }
>
> The comment for `refcount_inc` says:
>
> /**
> * ...
> * Will WARN if the refcount is 0, as this represents a possible use-after-free
> * condition.
> */
> static inline void refcount_inc(refcount_t *r)
>
> And looking at block/blk-core.c as an example, refcount_t is supposed
> to be used as follows:
> - upon object creation its refcount is set to 1:
> refcount_set(&q->refs, 1);
> - when reference is added the refcount is incremented:
> refcount_inc(&q->refs);
> - when reference is removed the refcount is decremented and checked:
> if (refcount_dec_and_test(&q->refs))
> blk_free_queue(q);
>
> So, 0 is not actually a valid value for a refcount_t instance. And I
> don't see any calls to refcount_set() in kernel/bpf/helpers.c, which
> implements bpf_refcount_acquire_impl().
>
> Dave, I see that bpf_refcount_acquire_impl() was recently added by you,
> could you please take a look?
>
refcount_set is called from bpf_obj_init_field in include/linux/bpf.h.
Maybe there is some scenario where the bpf_refcount doesn't pass through
bpf_obj_init_field... Regardless, digging now.
> The TL;DR of the thread:
> - __retval is currently ignored;
> - to fix it apply the patch below;
> - running the following command several times produces lots
> of nasty errors in dmesg:
>
> $ for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
>
> diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
> index 47e9e076bc8f..e2a1bdc5a570 100644
> --- a/tools/testing/selftests/bpf/test_loader.c
> +++ b/tools/testing/selftests/bpf/test_loader.c
> @@ -587,7 +587,7 @@ void run_subtest(struct test_loader *tester,
> /* For some reason test_verifier executes programs
> * with all capabilities restored. Do the same here.
> */
> - if (!restore_capabilities(&caps))
> + if (restore_capabilities(&caps))
> goto tobj_cleanup;
>
> do_prog_test_run(bpf_program__fd(tprog), &retval);
>
* Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Eduard Zingerman @ 2023-04-21 0:00 UTC (permalink / raw)
To: Dave Marchevsky, Florian Westphal; +Cc: bpf, Dave Marchevsky
On Thu, 2023-04-20 at 19:54 -0400, Dave Marchevsky wrote:
> Hi Eduard and Florian, thanks for finding this issue. I am looking into the
> bpf_refcount side of things, and IIUC Eduard just submitted a series fixing
> __retval / test infra.
>
> I'm having trouble reproducing the refcount-specific splats; would you mind
> sharing your .config?
Here is what I use:
https://gist.github.com/eddyz87/6c13e1783b5ae4b11b2d9e29fbe5ee49
>
> On 4/20/23 7:22 PM, Eduard Zingerman wrote:
>
> [...]
> > Looking at the error and at the test source code, it appears to me
> > that there is an API misuse of the `refcount_t` type.
> >
> > The `bpf_refcount_acquire` invocation in the test expands to a call to
> > bpf_refcount_acquire_impl(), which treats the passed pointer as a value of
> > `refcount_t`:
> >
> > /* Description
> > * Increment the refcount on a refcounted local kptr, turning the
> > * non-owning reference input into an owning reference in the process.
> > *
> > * The 'meta' parameter is rewritten by the verifier, no need for BPF
> > * program to set it.
> > * Returns
> > * An owning reference to the object pointed to by 'kptr'
> > */
> > extern void *bpf_refcount_acquire_impl(void *kptr, void *meta) __ksym;
> >
> > __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
> > {
> > ...
> > refcount_inc((refcount_t *)ref);
> > return (void *)p__refcounted_kptr;
> > }
> >
> > The comment for `refcount_inc` says:
> >
> > /**
> > * ...
> > * Will WARN if the refcount is 0, as this represents a possible use-after-free
> > * condition.
> > */
> > static inline void refcount_inc(refcount_t *r)
> >
> > And looking at block/blk-core.c as an example, refcount_t is supposed
> > to be used as follows:
> > - upon object creation its refcount is set to 1:
> > refcount_set(&q->refs, 1);
> > - when reference is added the refcount is incremented:
> > refcount_inc(&q->refs);
> > - when reference is removed the refcount is decremented and checked:
> > if (refcount_dec_and_test(&q->refs))
> > blk_free_queue(q);
> >
> > So, 0 is not actually a valid value for a refcount_t instance. And I
> > don't see any calls to refcount_set() in kernel/bpf/helpers.c, which
> > implements bpf_refcount_acquire_impl().
> >
> > Dave, I see that bpf_refcount_acquire_impl() was recently added by you,
> > could you please take a look?
> >
>
> refcount_set is called from bpf_obj_init_field in include/linux/bpf.h.
> Maybe there is some scenario where the bpf_refcount doesn't pass through
> bpf_obj_init_field... Regardless, digging now.
Oh, sorry, missed that.
I should have added some printks before posting...
>
> > The TL;DR of the thread:
> > - __retval is currently ignored;
> > - to fix it apply the patch below;
> > - running the following command several times produces lots
> > of nasty errors in dmesg:
> >
> > $ for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
> >
> > diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
> > index 47e9e076bc8f..e2a1bdc5a570 100644
> > --- a/tools/testing/selftests/bpf/test_loader.c
> > +++ b/tools/testing/selftests/bpf/test_loader.c
> > @@ -587,7 +587,7 @@ void run_subtest(struct test_loader *tester,
> > /* For some reason test_verifier executes programs
> > * with all capabilities restored. Do the same here.
> > */
> > - if (!restore_capabilities(&caps))
> > + if (restore_capabilities(&caps))
> > goto tobj_cleanup;
> >
> > do_prog_test_run(bpf_program__fd(tprog), &retval);
> >
* Re: bpf-next hang+kasan uaf refcount acquire splat when running test_progs
From: Dave Marchevsky @ 2023-04-21 0:09 UTC (permalink / raw)
To: Eduard Zingerman, Florian Westphal; +Cc: bpf, Dave Marchevsky
On 4/20/23 8:00 PM, Eduard Zingerman wrote:
> On Thu, 2023-04-20 at 19:54 -0400, Dave Marchevsky wrote:
>> Hi Eduard and Florian, thanks for finding this issue. I am looking into the
>> bpf_refcount side of things, and IIUC Eduard just submitted a series fixing
>> __retval / test infra.
>>
>> I'm having trouble reproducing the refcount-specific splats; would you mind
>> sharing your .config?
>
> Here is what I use:
> https://gist.github.com/eddyz87/6c13e1783b5ae4b11b2d9e29fbe5ee49
After looking at your __retval fix series I understand why
I couldn't repro. I didn't realize the __retval issue was that
the tests weren't being run _at all_. I thought they were just
always reporting success. My mistake, able to repro now.