* [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
@ 2024-12-18 9:21 Abel Wu
2024-12-19 2:45 ` Yonghong Song
0 siblings, 1 reply; 8+ messages in thread
From: Abel Wu @ 2024-12-18 9:21 UTC (permalink / raw)
To: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
David Vernet
Cc: Abel Wu, open list:BPF [STORAGE & CGROUPS], open list
The following commit:

  bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")

first introduced deadlock prevention for fentry/fexit programs attaching
to the bpf_task_storage helpers. The v6 revision of that commit also
applied the same logic to the map free path.

Later, bpf_cgrp_storage was introduced in:

  c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")

It faces the same issue as bpf_task_storage, but instead of its busy
counter, NULL was passed to bpf_local_storage_map_free(), which opened
a window for deadlock:
<TASK>
_raw_spin_lock_irqsave+0x3d/0x50
bpf_local_storage_update+0xd1/0x460
bpf_cgrp_storage_get+0x109/0x130
bpf_prog_72026450ec387477_cgrp_ptr+0x38/0x5e
bpf_trace_run1+0x84/0x100
cgroup_storage_ptr+0x4c/0x60
bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160
bpf_selem_unlink_storage+0x6f/0x110
bpf_local_storage_map_free+0xa2/0x110
bpf_map_free_deferred+0x5b/0x90
process_one_work+0x17c/0x390
worker_thread+0x251/0x360
kthread+0xd2/0x100
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
[ Since the verifier treats 'void *' as a scalar, which
  prevents me from getting a 'struct cgroup *' pointer,
  I added a raw tracepoint in cgroup_storage_ptr() to
  help reproduce this issue. ]

Although it is tricky to reproduce, the risk of deadlock exists and is
worth fixing: pass the map's busy counter to the free procedure so it
can be properly incremented before taking the storage/smap locks.
Fixes: c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
kernel/bpf/bpf_cgrp_storage.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index 20f05de92e9c..7996fcea3755 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -154,7 +154,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
static void cgroup_storage_map_free(struct bpf_map *map)
{
- bpf_local_storage_map_free(map, &cgroup_cache, NULL);
+ bpf_local_storage_map_free(map, &cgroup_cache, &bpf_cgrp_storage_busy);
}
/* *gfp_flags* is a hidden argument provided by the verifier */
--
2.37.3
^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
  2024-12-18  9:21 [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage Abel Wu
@ 2024-12-19  2:45 ` Yonghong Song
  2024-12-19 12:38   ` Abel Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Yonghong Song @ 2024-12-19  2:45 UTC (permalink / raw)
To: Abel Wu, Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Song Liu, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Vernet
Cc: open list:BPF [STORAGE & CGROUPS], open list

On 12/18/24 1:21 AM, Abel Wu wrote:
> The following commit
> bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
> first introduced deadlock prevention for fentry/fexit programs attaching
> on bpf_task_storage helpers. That commit also employed the logic in map
> free path in its v6 version.
>
> [...]
>
> [ Since the verifier treats 'void *' as scalar which
>   prevents me from getting a pointer to 'struct cgroup *',
>   I added a raw tracepoint in cgroup_storage_ptr() to
>   help reproducing this issue. ]
>
> Although it is tricky to reproduce, the risk of deadlock exists and
> worthy of a fix, by passing its busy counter to the free procedure so
> it can be properly incremented before storage/smap locking.

The above stack trace and explanation does not show that we will have
a potential dead lock here. You mentioned that it is tricky to reproduce,
does it mean that you have done some analysis or coding to reproduce it?
Could you share the details on why you think we may have deadlock here?

> [...]
* Re: Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
  2024-12-19  2:45 ` Yonghong Song
@ 2024-12-19 12:38   ` Abel Wu
  2024-12-19 18:39     ` Yonghong Song
  0 siblings, 1 reply; 8+ messages in thread
From: Abel Wu @ 2024-12-19 12:38 UTC (permalink / raw)
To: Yonghong Song, Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Song Liu, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Vernet
Cc: open list:BPF [STORAGE & CGROUPS], open list

Hi Yonghong,

On 12/19/24 10:45 AM, Yonghong Song Wrote:
> [...]
>
> The above stack trace and explanation does not show that we will have
> a potential dead lock here. You mentioned that it is tricky to reproduce,
> does it mean that you have done some analysis or coding to reproduce it?
> Could you share the details on why you think we may have deadlock here?

The stack is A-A deadlocked: cgroup_storage_ptr() is called with
storage->lock held, while the bpf_prog attaching on this function
also tries to acquire the same lock by calling bpf_cgrp_storage_get(),
thus leading to an AA deadlock.

The tricky part is, instead of attaching on cgroup_storage_ptr()
directly, I added a tracepoint inside it to hook:

------
diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
index 20f05de92e9c..679209d4f88f 100644
--- a/kernel/bpf/bpf_cgrp_storage.c
+++ b/kernel/bpf/bpf_cgrp_storage.c
@@ -40,6 +40,8 @@ static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
 {
 	struct cgroup *cg = owner;
 
+	trace_cgroup_ptr(cg);
+
 	return &cg->bpf_cgrp_storage;
 }
------

The reason for doing so is that typecasting from 'void *owner' to
'struct cgroup *' will be rejected by the verifier. But there could
be other ways to obtain a pointer to the @owner cgroup too, making
the deadlock possible.

Thanks,
	Abel

> [...]
* Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
  2024-12-19 12:38   ` Abel Wu
@ 2024-12-19 18:39     ` Yonghong Song
  2024-12-19 18:43       ` Alexei Starovoitov
  2024-12-21  2:26       ` Abel Wu
  0 siblings, 2 replies; 8+ messages in thread
From: Yonghong Song @ 2024-12-19 18:39 UTC (permalink / raw)
To: Abel Wu, Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Song Liu, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Vernet
Cc: open list:BPF [STORAGE & CGROUPS], open list

On 12/19/24 4:38 AM, Abel Wu wrote:
> Hi Yonghong,
>
> [...]
>
> The stack is A-A deadlocked: cgroup_storage_ptr() is called with
> storage->lock held, while the bpf_prog attaching on this function
> also tries to acquire the same lock by calling bpf_cgrp_storage_get()
> thus leading to a AA deadlock.
>
> [...]
>
> The reason doing so is that typecasting from 'void *owner' to
> 'struct cgroup *' will be rejected by the verifier. But there
> could be other ways to obtain a pointer to the @owner cgroup
> too, making the deadlock possible.

I checked the call stack and what you described is indeed the case.
In function bpf_selem_unlink_storage(), local_storage->lock is
held before calling bpf_selem_unlink_storage_nolock/cgroup_storage_ptr.
If there is a fentry/tracepoint on cgroup_storage_ptr, then we could
have a deadlock as you described above.

As you mentioned, it is tricky to reproduce. fentry on cgroup_storage_ptr
does not work due to the func signature:

  struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)

Even if we supported 'void *' for fentry and did bpf_rdonly_cast()
to cast 'void *owner' to 'struct cgroup *owner', the owner cannot be
passed to a helper/kfunc.

Your fix looks good, but it would be great to have a reproducer.
One possibility is to find a function which can be fentried within
local_storage->lock. If you know the cgroup id, in the bpf program you
can use bpf_cgroup_from_id() to get a trusted cgroup ptr from the id,
and then use that cgroup ptr to do bpf_cgrp_storage_get() etc., which
should be able to trigger the deadlock. Could you give it a try?

Also, in your commit message, it would be great if you could illustrate
where each lock happens, e.g.:

  local_storage->lock
    bpf_selem_unlink_storage_nolock.constprop.0
    ...
      bpf_local_storage_update
      ...
      local_storage->lock
      ...

> [...]
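[ Editor's note: a rough, untested sketch of the reproducer suggested
above. The fentry attach point, map name, and the loader-set cgid
variable are all hypothetical; the target just needs to be some
traceable function reached while local_storage->lock is held. ]

```c
/* Sketch only -- not compiled or verified here. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} cgrp_map SEC(".maps");

const volatile __u64 cgid;	/* cgroup id, filled in by the loader */

struct cgroup *bpf_cgroup_from_id(__u64 cgid) __ksym;
void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

/* Hypothetical attach point: any traceable function called with
 * local_storage->lock held would do. */
SEC("fentry/bpf_selem_unlink_storage_nolock")
int BPF_PROG(reenter_storage)
{
	struct cgroup *cgrp = bpf_cgroup_from_id(cgid);

	if (!cgrp)
		return 0;
	/* Re-enters bpf_local_storage locking; without the busy
	 * counter on the free path this can self-deadlock. */
	bpf_cgrp_storage_get(&cgrp_map, cgrp, 0,
			     BPF_LOCAL_STORAGE_GET_F_CREATE);
	bpf_cgroup_release(cgrp);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```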
* Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
  2024-12-19 18:39     ` Yonghong Song
@ 2024-12-19 18:43       ` Alexei Starovoitov
  2024-12-19 18:57         ` Yonghong Song
  2024-12-21  2:26       ` Abel Wu
  1 sibling, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2024-12-19 18:43 UTC (permalink / raw)
To: Yonghong Song
Cc: Abel Wu, Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Song Liu, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Vernet,
	open list:BPF [STORAGE & CGROUPS], open list

On Thu, Dec 19, 2024 at 10:39 AM Yonghong Song <yonghong.song@linux.dev> wrote:
> [...]
>
> Your fix looks good but it would be great to have a reproducer.
> One possibility is to find a function which can be fentried within
> local_storage->lock. If you know the cgroup id, in bpf program you
> can use bpf_cgroup_from_id() to get a trusted cgroup ptr from the id.
> and then you can use that cgroup ptr to do bpf_cgrp_storage_get() etc.
> which should be able to triger deadlock. Could you give a try?

I'd rather mark a set of functions as notrace to avoid this situation
or add:

  CFLAGS_REMOVE_bpf_cgrp_storage.o = $(CC_FLAGS_FTRACE)
* Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
  2024-12-19 18:43       ` Alexei Starovoitov
@ 2024-12-19 18:57         ` Yonghong Song
  2025-01-26  9:19           ` Abel Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Yonghong Song @ 2024-12-19 18:57 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Abel Wu, Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Eduard Zingerman, Song Liu, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Vernet,
	open list:BPF [STORAGE & CGROUPS], open list

On 12/19/24 10:43 AM, Alexei Starovoitov wrote:
> On Thu, Dec 19, 2024 at 10:39 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>> [...]
>>
>> Your fix looks good but it would be great to have a reproducer.
>> [...]
>
> I'd rather mark a set of functions as notrace to avoid this situation
> or add:
> CFLAGS_REMOVE_bpf_cgrp_storage.o = $(CC_FLAGS_FTRACE)

If we go through the CFLAGS_REMOVE path, we need to do

  CFLAGS_REMOVE_bpf_local_storage.o = $(CC_FLAGS_FTRACE)

as well, since bpf_selem_unlink_storage_nolock() calls a few functions
which, if fentry traced, could trigger a similar issue (with the
bpf_cgroup_from_id() approach).
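[ Editor's note: for reference, the knob being discussed is kbuild's
per-object CFLAGS_REMOVE. Both removals together would look roughly
like this in kernel/bpf/Makefile -- an illustrative sketch, not a
proposed patch. ]

```make
# kernel/bpf/Makefile (sketch): build these objects without ftrace
# instrumentation so fentry/fexit cannot attach to their functions.
CFLAGS_REMOVE_bpf_cgrp_storage.o  = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_bpf_local_storage.o = $(CC_FLAGS_FTRACE)
```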
* Re: Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage 2024-12-19 18:57 ` Yonghong Song @ 2025-01-26 9:19 ` Abel Wu 0 siblings, 0 replies; 8+ messages in thread From: Abel Wu @ 2025-01-26 9:19 UTC (permalink / raw) To: Yonghong Song, Alexei Starovoitov Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman, Song Liu, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Vernet, open list:BPF [STORAGE & CGROUPS], open list Hi Alexei, Yonghong, I am really sorry that I didn't notice the failure of sending out this reply that day until Martin reminds me in v2.. The bpf_local_storage (esp. bpf_cgrp_storage) is becoming one of the most important facility in our containerized infrastructure due to its excellent scalability and performance. We (Bytedance) are migrating our bpf progs, which provide container insights, to use bpf_cgrp_storage, so it is important if there is a way to profile its internals to see how it behaves under real workloads. On 12/20/24 2:57 AM, Yonghong Song Wrote: > > > > On 12/19/24 10:43 AM, Alexei Starovoitov wrote: >> On Thu, Dec 19, 2024 at 10:39 AM Yonghong Song <yonghong.song@linux.dev> wrote: >>> >>> >>> >>> On 12/19/24 4:38 AM, Abel Wu wrote: >>>> Hi Yonghong, >>>> >>>> On 12/19/24 10:45 AM, Yonghong Song Wrote: >>>>> >>>>> >>>>> On 12/18/24 1:21 AM, Abel Wu wrote: >>>>>> The following commit >>>>>> bc235cdb423a ("bpf: Prevent deadlock from recursive >>>>>> bpf_task_storage_[get|delete]") >>>>>> first introduced deadlock prevention for fentry/fexit programs >>>>>> attaching >>>>>> on bpf_task_storage helpers. That commit also employed the logic in map >>>>>> free path in its v6 version. 
>>>>>> >>>>>> Later bpf_cgrp_storage was first introduced in >>>>>> c4bcfb38a95e ("bpf: Implement cgroup storage available to >>>>>> non-cgroup-attached bpf progs") >>>>>> which faces the same issue as bpf_task_storage, instead of its busy >>>>>> counter, NULL was passed to bpf_local_storage_map_free() which opened >>>>>> a window to cause deadlock: >>>>>> >>>>>> <TASK> >>>>>> _raw_spin_lock_irqsave+0x3d/0x50 >>>>>> bpf_local_storage_update+0xd1/0x460 >>>>>> bpf_cgrp_storage_get+0x109/0x130 >>>>>> bpf_prog_72026450ec387477_cgrp_ptr+0x38/0x5e >>>>>> bpf_trace_run1+0x84/0x100 >>>>>> cgroup_storage_ptr+0x4c/0x60 >>>>>> bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160 >>>>>> bpf_selem_unlink_storage+0x6f/0x110 >>>>>> bpf_local_storage_map_free+0xa2/0x110 >>>>>> bpf_map_free_deferred+0x5b/0x90 >>>>>> process_one_work+0x17c/0x390 >>>>>> worker_thread+0x251/0x360 >>>>>> kthread+0xd2/0x100 >>>>>> ret_from_fork+0x34/0x50 >>>>>> ret_from_fork_asm+0x1a/0x30 >>>>>> </TASK> >>>>>> >>>>>> [ Since the verifier treats 'void *' as scalar which >>>>>> prevents me from getting a pointer to 'struct cgroup *', >>>>>> I added a raw tracepoint in cgroup_storage_ptr() to >>>>>> help reproducing this issue. ] >>>>>> >>>>>> Although it is tricky to reproduce, the risk of deadlock exists and >>>>>> worthy of a fix, by passing its busy counter to the free procedure so >>>>>> it can be properly incremented before storage/smap locking. >>>>> The above stack trace and explanation does not show that we will have >>>>> a potential dead lock here. You mentioned that it is tricky to >>>>> reproduce, >>>>> does it mean that you have done some analysis or coding to reproduce it? >>>>> Could you share the details on why you think we may have deadlock here? 
>>>> The stack is A-A deadlocked: cgroup_storage_ptr() is called with >>>> storage->lock held, while the bpf_prog attaching on this function >>>> also tries to acquire the same lock by calling bpf_cgrp_storage_get() >>>> thus leading to a AA deadlock. >>>> >>>> The tricky part is, instead of attaching on cgroup_storage_ptr() >>>> directly, I added a tracepoint inside it to hook: >>>> >>>> ------ >>>> diff --git a/kernel/bpf/bpf_cgrp_storage.c >>>> b/kernel/bpf/bpf_cgrp_storage.c >>>> index 20f05de92e9c..679209d4f88f 100644 >>>> --- a/kernel/bpf/bpf_cgrp_storage.c >>>> +++ b/kernel/bpf/bpf_cgrp_storage.c >>>> @@ -40,6 +40,8 @@ static struct bpf_local_storage __rcu >>>> **cgroup_storage_ptr(void *owner) >>>> { >>>> struct cgroup *cg = owner; >>>> >>>> + trace_cgroup_ptr(cg); >>>> + >>>> return &cg->bpf_cgrp_storage; >>>> } >>>> >>>> ------ >>>> >>>> The reason doing so is that typecasting from 'void *owner' to >>>> 'struct cgroup *' will be rejected by the verifier. But there >>>> could be other ways to obtain a pointer to the @owner cgroup >>>> too, making the deadlock possible. >>> I checked the callstack and what you described indeed the case. >>> In function bpf_selem_unlink_storage(), local_storage->lock is >>> held before calling bpf_selem_unlink_storage_nolock/cgroup_storage_ptr. >>> If there is a fentry/tracepoint on the cgroup_storage_ptr and then we could >>> have a deadlock as you described in the above. >>> >>> As you mentioned, it is tricky to reproduce. fentry on cgroup_storage_ptr >>> does not work due to func signature: >>> struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner) >>> Even say we support 'void *' for fentry and we do bpf_rdonly_cast() >>> to cast 'void *owner' to 'struct cgroup *owner', and owner cannot be >>> passed to helper/kfunc. >>> >>> Your fix looks good but it would be great to have a reproducer. >>> One possibility is to find a function which can be fentried within >>> local_storage->lock. 
>>> If you know the cgroup id, in a bpf program you
>>> can use bpf_cgroup_from_id() to get a trusted cgroup ptr from the id,
>>> and then you can use that cgroup ptr to do bpf_cgrp_storage_get() etc.,
>>> which should be able to trigger the deadlock. Could you give it a try?
>>
>> I'd rather mark a set of functions as notrace to avoid this situation,
>> or add:
>>   CFLAGS_REMOVE_bpf_cgrp_storage.o = $(CC_FLAGS_FTRACE)
>
> If we go through the CFLAGS_REMOVE path, we need to do
>
>   CFLAGS_REMOVE_bpf_local_storage.o = $(CC_FLAGS_FTRACE)
>
> as well, since bpf_selem_unlink_storage_nolock() calls a few functions
> which, if fentry traced, could trigger a similar issue (with the
> bpf_cgroup_from_id() approach).

If we go down this path, shall we also do the same to the other local
storages? It would be a little weird to disable tracing only on this one
while allowing it on the others. And once we have removed the trace flags
for all of bpf_*_storage, the previous efforts to avoid recursion on bpf
local storages would no longer be needed, IMHO.

After carefully (or not?) examining all the critical sections inside
bpf_local_storage.c, I didn't find any traceable points that can cause
deadlock if the busy counter gets properly incremented (and the cgroup
storage's counter is the only one not doing so).

I totally agree that disabling tracing on the local storages is the ideal
solution to get rid of deadlock issues, but can we just fix this counter
problem in order to retain observability into the internals of local
storage? I believe it will play a more and more important role in modern
containerized infrastructures.

Thanks & Best Regards,
	Abel

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Re: [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage
  2024-12-19 18:39 ` Yonghong Song
  2024-12-19 18:43   ` Alexei Starovoitov
@ 2024-12-21  2:26 ` Abel Wu
  1 sibling, 0 replies; 8+ messages in thread
From: Abel Wu @ 2024-12-21 2:26 UTC (permalink / raw)
To: Yonghong Song, Martin KaFai Lau, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman, Song Liu,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David Vernet
Cc: open list:BPF [STORAGE & CGROUPS], open list

On 12/20/24 2:39 AM, Yonghong Song Wrote:
>
> On 12/19/24 4:38 AM, Abel Wu wrote:
>> Hi Yonghong,
>>
>> On 12/19/24 10:45 AM, Yonghong Song Wrote:
>>>
>>> On 12/18/24 1:21 AM, Abel Wu wrote:
>>>> The following commit
>>>> bc235cdb423a ("bpf: Prevent deadlock from recursive bpf_task_storage_[get|delete]")
>>>> first introduced deadlock prevention for fentry/fexit programs attaching
>>>> on bpf_task_storage helpers. That commit also employed the logic in map
>>>> free path in its v6 version.
>>>>
>>>> Later bpf_cgrp_storage was first introduced in
>>>> c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
>>>> which faces the same issue as bpf_task_storage, instead of its busy
>>>> counter, NULL was passed to bpf_local_storage_map_free() which opened
>>>> a window to cause deadlock:
>>>>
>>>> <TASK>
>>>> _raw_spin_lock_irqsave+0x3d/0x50
>>>> bpf_local_storage_update+0xd1/0x460
>>>> bpf_cgrp_storage_get+0x109/0x130
>>>> bpf_prog_72026450ec387477_cgrp_ptr+0x38/0x5e
>>>> bpf_trace_run1+0x84/0x100
>>>> cgroup_storage_ptr+0x4c/0x60
>>>> bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160
>>>> bpf_selem_unlink_storage+0x6f/0x110
>>>> bpf_local_storage_map_free+0xa2/0x110
>>>> bpf_map_free_deferred+0x5b/0x90
>>>> process_one_work+0x17c/0x390
>>>> worker_thread+0x251/0x360
>>>> kthread+0xd2/0x100
>>>> ret_from_fork+0x34/0x50
>>>> ret_from_fork_asm+0x1a/0x30
>>>> </TASK>
>>>>
>>>> [ Since the verifier treats 'void *' as scalar which
>>>> prevents me from getting a pointer to 'struct cgroup *',
>>>> I added a raw tracepoint in cgroup_storage_ptr() to
>>>> help reproducing this issue. ]
>>>>
>>>> Although it is tricky to reproduce, the risk of deadlock exists and
>>>> worthy of a fix, by passing its busy counter to the free procedure so
>>>> it can be properly incremented before storage/smap locking.
>>>
>>> The above stack trace and explanation does not show that we will have
>>> a potential dead lock here. You mentioned that it is tricky to reproduce,
>>> does it mean that you have done some analysis or coding to reproduce it?
>>> Could you share the details on why you think we may have deadlock here?
>>
>> The stack is A-A deadlocked: cgroup_storage_ptr() is called with
>> storage->lock held, while the bpf_prog attached to this function
>> also tries to acquire the same lock by calling bpf_cgrp_storage_get(),
>> thus leading to an AA deadlock.
>>
>> The tricky part is that, instead of attaching to cgroup_storage_ptr()
>> directly, I added a tracepoint inside it to hook:
>>
>> ------
>> diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
>> index 20f05de92e9c..679209d4f88f 100644
>> --- a/kernel/bpf/bpf_cgrp_storage.c
>> +++ b/kernel/bpf/bpf_cgrp_storage.c
>> @@ -40,6 +40,8 @@ static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
>>  {
>>  	struct cgroup *cg = owner;
>>
>> +	trace_cgroup_ptr(cg);
>> +
>>  	return &cg->bpf_cgrp_storage;
>>  }
>>
>> ------
>>
>> The reason for doing so is that typecasting from 'void *owner' to
>> 'struct cgroup *' will be rejected by the verifier. But there
>> could be other ways to obtain a pointer to the @owner cgroup
>> too, making the deadlock possible.
>
> I checked the callstack and what you described is indeed the case.
> In function bpf_selem_unlink_storage(), local_storage->lock is
> held before calling bpf_selem_unlink_storage_nolock/cgroup_storage_ptr.
> If there is a fentry/tracepoint on cgroup_storage_ptr, then we could
> have a deadlock as you described above.
>
> As you mentioned, it is tricky to reproduce. fentry on cgroup_storage_ptr
> does not work due to its func signature:
>   struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
> Even if we supported 'void *' for fentry and did bpf_rdonly_cast()
> to cast 'void *owner' to 'struct cgroup *owner', owner still could not
> be passed to a helper/kfunc.
>
> Your fix looks good but it would be great to have a reproducer.
> One possibility is to find a function which can be fentried within
> local_storage->lock. If you know the cgroup id, in a bpf program you
> can use bpf_cgroup_from_id() to get a trusted cgroup ptr from the id,
> and then you can use that cgroup ptr to do bpf_cgrp_storage_get() etc.,
> which should be able to trigger the deadlock. Could you give it a try?

Sure.

>
> Also, in your commit message, it would be great if you can illustrate
> where each lock happens, e.g.
> local_storage->lock
>   bpf_selem_unlink_storage_nolock.constprop.0
>   ...
>     bpf_local_storage_update
>     ...
>       local_storage->lock
> ...

OK, will update.

>>
>> Thanks,
>> Abel
>>
>>>
>>>>
>>>> Fixes: c4bcfb38a95e ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs")
>>>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
>>>> ---
>>>>  kernel/bpf/bpf_cgrp_storage.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/bpf_cgrp_storage.c b/kernel/bpf/bpf_cgrp_storage.c
>>>> index 20f05de92e9c..7996fcea3755 100644
>>>> --- a/kernel/bpf/bpf_cgrp_storage.c
>>>> +++ b/kernel/bpf/bpf_cgrp_storage.c
>>>> @@ -154,7 +154,7 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>>>>  static void cgroup_storage_map_free(struct bpf_map *map)
>>>>  {
>>>> -	bpf_local_storage_map_free(map, &cgroup_cache, NULL);
>>>> +	bpf_local_storage_map_free(map, &cgroup_cache, &bpf_cgrp_storage_busy);
>>>>  }
>>>>
>>>>  /* *gfp_flags* is a hidden argument provided by the verifier */
>>>
>>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread
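Yonghong's suggested reproducer direction could look roughly like the kernel-BPF sketch below. This is an untested illustration, not a known-working program: the fentry attach point `some_func_called_under_storage_lock` is a placeholder for whatever traceable function turns out to run while local_storage->lock is held, and `target_cgid` is an assumed variable that userspace would fill in with a real cgroup id before loading.

```c
/* Hypothetical reproducer sketch (BPF side); compiles only against
 * kernel/libbpf headers and is NOT verified to trigger the bug as-is. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} cgrp_map SEC(".maps");

/* Assumed: set by the loader to the id of a cgroup whose storage is
 * about to be freed. */
const volatile __u64 target_cgid;

extern struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
extern void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

/* Placeholder attach point, per the discussion above: any traceable
 * function reached while local_storage->lock is held would do. */
SEC("fentry/some_func_called_under_storage_lock")
int BPF_PROG(reenter_storage)
{
	struct cgroup *cgrp;

	cgrp = bpf_cgroup_from_id(target_cgid);
	if (!cgrp)
		return 0;
	/* Re-enters bpf_local_storage_update() while the free path holds
	 * storage->lock; without the busy counter this self-deadlocks. */
	bpf_cgrp_storage_get(&cgrp_map, cgrp, 0,
			     BPF_LOCAL_STORAGE_GET_F_CREATE);
	bpf_cgroup_release(cgrp);
	return 0;
}
```

With the fix applied, `bpf_cgrp_storage_get()` would instead return NULL here because the busy counter is already elevated by the free path.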
end of thread, other threads: [~2025-01-26  9:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-18  9:21 [PATCH bpf] bpf: Fix deadlock when freeing cgroup storage Abel Wu
2024-12-19  2:45 ` Yonghong Song
2024-12-19 12:38   ` Abel Wu
2024-12-19 18:39     ` Yonghong Song
2024-12-19 18:43       ` Alexei Starovoitov
2024-12-19 18:57         ` Yonghong Song
2025-01-26  9:19           ` Abel Wu
2024-12-21  2:26       ` Abel Wu