* [bpf-next v2 1/2] bpf: hash map, avoid deadlock with suitable hash mask
@ 2022-12-17 15:02 xiangxia.m.yue
  2022-12-17 15:02 ` [bpf-next v2 2/2] selftests/bpf: add test case for htab map xiangxia.m.yue
  0 siblings, 1 reply; 4+ messages in thread
From: xiangxia.m.yue @ 2022-12-17 15:02 UTC
To: bpf
Cc: Tonghao Zhang, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Hou Tao

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

A deadlock can still occur when the hash map is accessed from both NMI
and non-NMI context, because the NMI path may reach the same bucket
through a different map_locked index.

For example, with .max_entries = 2, we update the hash map with key = 4
and, on the same CPU, a bpf prog running in NMI context (nmi_handle())
updates the hash map with key = 20. Both keys have the same bucket index
but different map_locked indexes, so the per-CPU map_locked check does
not detect the re-entry.

To fix this issue, mask the hash with the smaller of
HASHTAB_MAP_LOCK_MASK and n_buckets - 1, so that keys sharing a bucket
always share a map_locked index.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/hashtab.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 5aa2b5525f79..974f104f47a0 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -152,7 +152,7 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
 {
 	unsigned long flags;
 
-	hash = hash & HASHTAB_MAP_LOCK_MASK;
+	hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);
 
 	preempt_disable();
 	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
@@ -171,7 +171,7 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
 				      struct bucket *b, u32 hash,
 				      unsigned long flags)
 {
-	hash = hash & HASHTAB_MAP_LOCK_MASK;
+	hash = hash & min_t(u32, HASHTAB_MAP_LOCK_MASK, htab->n_buckets - 1);
 	raw_spin_unlock_irqrestore(&b->raw_lock, flags);
 	__this_cpu_dec(*(htab->map_locked[hash]));
 	preempt_enable();
-- 
2.27.0
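
The index arithmetic behind this patch can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not kernel code: it
assumes HASHTAB_MAP_LOCK_COUNT is 8 (so the mask is 7) and that a map with
max_entries = 2 ends up with n_buckets = 2. The hash values 0x2 and 0x6 are
hypothetical stand-ins chosen to show the collision; they are not the real
hashes of keys 4 and 20. The helper names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the 8 per-CPU map_locked counters in hashtab.c. */
#define HASHTAB_MAP_LOCK_COUNT 8u
#define HASHTAB_MAP_LOCK_MASK  (HASHTAB_MAP_LOCK_COUNT - 1u)

/* Bucket selection; n_buckets is assumed to be a power of two. */
static uint32_t bucket_index(uint32_t hash, uint32_t n_buckets)
{
	return hash & (n_buckets - 1u);
}

/* map_locked index before the patch: always the low three bits. */
static uint32_t lock_index_old(uint32_t hash)
{
	return hash & HASHTAB_MAP_LOCK_MASK;
}

/* map_locked index after the patch: never wider than the bucket mask. */
static uint32_t lock_index_new(uint32_t hash, uint32_t n_buckets)
{
	uint32_t bucket_mask = n_buckets - 1u;
	uint32_t mask = bucket_mask < HASHTAB_MAP_LOCK_MASK ?
			bucket_mask : HASHTAB_MAP_LOCK_MASK;

	return hash & mask;
}

/* Hypothetical hashes for the task and NMI updates (not real jhash output). */
static void check_example(void)
{
	const uint32_t n_buckets = 2;
	const uint32_t hash_task = 0x2, hash_nmi = 0x6;

	/* Both updates land in the same bucket, so they share one raw_lock. */
	assert(bucket_index(hash_task, n_buckets) ==
	       bucket_index(hash_nmi, n_buckets));

	/* Old mask: different counters, so the re-entry goes unnoticed. */
	assert(lock_index_old(hash_task) != lock_index_old(hash_nmi));

	/* New mask: same counter, so the nested update can be rejected. */
	assert(lock_index_new(hash_task, n_buckets) ==
	       lock_index_new(hash_nmi, n_buckets));
}
```

Because both masks have the form 2^k - 1, the smaller one only keeps bits that
the bucket mask also keeps, which is why two hashes in the same bucket cannot
disagree on the map_locked index after the change. For maps with 8 or more
buckets the result is the same as before the patch.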
* [bpf-next v2 2/2] selftests/bpf: add test case for htab map
  2022-12-17 15:02 [bpf-next v2 1/2] bpf: hash map, avoid deadlock with suitable hash mask xiangxia.m.yue
@ 2022-12-17 15:02 ` xiangxia.m.yue
  2022-12-17 17:37   ` Yonghong Song
  0 siblings, 1 reply; 4+ messages in thread
From: xiangxia.m.yue @ 2022-12-17 15:02 UTC
To: bpf
Cc: Tonghao Zhang, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Hou Tao

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This test shows how to reproduce the deadlock in a special case.
We update the htab map in task and NMI context. The task can be
interrupted by an NMI; if the same map bucket is already locked,
there will be a deadlock.

* map max_entries is 2.
* NMI context uses key 4 and task context uses key 20.
* so the bucket index is the same but the map_locked index is different.

The selftest uses perf to produce the NMI and attaches fentry to
nmi_handle. Note that bpf_overflow_handler checks bpf_prog_active, and
the bpf map update syscall increments this counter in
bpf_disable_instrumentation, so a perf_event program would not run while
the syscall is updating the map. An fentry program on nmi_handle that
updates the hash map reproduces the issue instead.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Hou Tao <houtao1@huawei.com>
---
 tools/testing/selftests/bpf/DENYLIST.aarch64  |  1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |  1 +
 .../selftests/bpf/prog_tests/htab_deadlock.c  | 75 +++++++++++++++++++
 .../selftests/bpf/progs/htab_deadlock.c       | 30 ++++++++
 4 files changed, 107 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_deadlock.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_deadlock.c

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index 99cc33c51eaa..87e8fc9c9df2 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -24,6 +24,7 @@ fexit_test                      # fexit_attach unexpected error
 get_func_args_test              # get_func_args_test__attach unexpected error: -524 (errno 524) (trampoline)
 get_func_ip_test                # get_func_ip_test__attach unexpected error: -524 (errno 524) (trampoline)
 htab_update/reenter_update
+htab_deadlock                   # failed to find kernel BTF type ID of 'nmi_handle': -3 (trampoline)
 kfree_skb                       # attach fentry unexpected error: -524 (trampoline)
 kfunc_call/subprog              # extern (var ksym) 'bpf_prog_active': not found in kernel BTF
 kfunc_call/subprog_lskel        # skel unexpected error: -2
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 585fcf73c731..735239b31050 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -26,6 +26,7 @@ get_func_args_test              # trampoline
 get_func_ip_test                # get_func_ip_test__attach unexpected error: -524 (trampoline)
 get_stack_raw_tp                # user_stack corrupted user stack (no backchain userspace)
 htab_update                     # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
+htab_deadlock                   # failed to find kernel BTF type ID of 'nmi_handle': -3 (trampoline)
 kfree_skb                       # attach fentry unexpected error: -524 (trampoline)
 kfunc_call                      # 'bpf_prog_active': not found in kernel BTF (?)
 kfunc_dynptr_param              # JIT does not support calling kernel function (kfunc)
diff --git a/tools/testing/selftests/bpf/prog_tests/htab_deadlock.c b/tools/testing/selftests/bpf/prog_tests/htab_deadlock.c
new file mode 100644
index 000000000000..137dce8f1346
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/htab_deadlock.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 DiDi Global Inc. */
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <test_progs.h>
+
+#include "htab_deadlock.skel.h"
+
+static int perf_event_open(void)
+{
+	struct perf_event_attr attr = {0};
+	int pfd;
+
+	/* create perf event on CPU 0 */
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.config = PERF_COUNT_HW_CPU_CYCLES;
+	attr.freq = 1;
+	attr.sample_freq = 1000;
+	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+
+	return pfd >= 0 ? pfd : -errno;
+}
+
+void test_htab_deadlock(void)
+{
+	unsigned int val = 0, key = 20;
+	struct bpf_link *link = NULL;
+	struct htab_deadlock *skel;
+	int err, i, pfd;
+	cpu_set_t cpus;
+
+	skel = htab_deadlock__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	err = htab_deadlock__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto clean_skel;
+
+	/* NMI events. */
+	pfd = perf_event_open();
+	if (pfd < 0) {
+		if (pfd == -ENOENT || pfd == -EOPNOTSUPP) {
+			printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n", __func__);
+			test__skip();
+			goto clean_skel;
+		}
+		if (!ASSERT_GE(pfd, 0, "perf_event_open"))
+			goto clean_skel;
+	}
+
+	link = bpf_program__attach_perf_event(skel->progs.bpf_empty, pfd);
+	if (!ASSERT_OK_PTR(link, "attach_perf_event"))
+		goto clean_pfd;
+
+	/* Pinned on CPU 0 */
+	CPU_ZERO(&cpus);
+	CPU_SET(0, &cpus);
+	pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
+
+	/* update bpf map concurrently on CPU0 in NMI and Task context.
+	 * there should be no kernel deadlock.
+	 */
+	for (i = 0; i < 100000; i++)
+		bpf_map_update_elem(bpf_map__fd(skel->maps.htab),
+				    &key, &val, BPF_ANY);
+
+	bpf_link__destroy(link);
+clean_pfd:
+	close(pfd);
+clean_skel:
+	htab_deadlock__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/htab_deadlock.c b/tools/testing/selftests/bpf/progs/htab_deadlock.c
new file mode 100644
index 000000000000..72178f073667
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_deadlock.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 DiDi Global Inc. */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 2);
+	__uint(map_flags, BPF_F_ZERO_SEED);
+	__type(key, unsigned int);
+	__type(value, unsigned int);
+} htab SEC(".maps");
+
+SEC("fentry/nmi_handle")
+int bpf_nmi_handle(struct pt_regs *regs)
+{
+	unsigned int val = 0, key = 4;
+
+	bpf_map_update_elem(&htab, &key, &val, BPF_ANY);
+	return 0;
+}
+
+SEC("perf_event")
+int bpf_empty(struct pt_regs *regs)
+{
+	return 0;
+}
-- 
2.27.0
* Re: [bpf-next v2 2/2] selftests/bpf: add test case for htab map
  2022-12-17 15:02 ` [bpf-next v2 2/2] selftests/bpf: add test case for htab map xiangxia.m.yue
@ 2022-12-17 17:37   ` Yonghong Song
  2022-12-19  2:36     ` Tonghao Zhang
  0 siblings, 1 reply; 4+ messages in thread
From: Yonghong Song @ 2022-12-17 17:37 UTC
To: xiangxia.m.yue, bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Hou Tao

On 12/17/22 7:02 AM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This test shows how to reproduce the deadlock in a special case.
> We update the htab map in task and NMI context. The task can be
> interrupted by an NMI; if the same map bucket is already locked,
> there will be a deadlock.
>
> * map max_entries is 2.
> * NMI context uses key 4 and task context uses key 20.
> * so the bucket index is the same but the map_locked index is different.
>
> The selftest uses perf to produce the NMI and attaches fentry to
> nmi_handle. Note that bpf_overflow_handler checks bpf_prog_active, and
> the bpf map update syscall increments this counter in
> bpf_disable_instrumentation, so a perf_event program would not run while
> the syscall is updating the map. An fentry program on nmi_handle that
> updates the hash map reproduces the issue instead.
>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Song Liu <song@kernel.org>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Hou Tao <houtao1@huawei.com>

Ack with a small nit below.

Acked-by: Yonghong Song <yhs@fb.com>

> ---
>  tools/testing/selftests/bpf/DENYLIST.aarch64  |  1 +
>  tools/testing/selftests/bpf/DENYLIST.s390x    |  1 +
>  .../selftests/bpf/prog_tests/htab_deadlock.c  | 75 +++++++++++++++++++
>  .../selftests/bpf/progs/htab_deadlock.c       | 30 ++++++++
>  4 files changed, 107 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_deadlock.c
>  create mode 100644 tools/testing/selftests/bpf/progs/htab_deadlock.c
>
[...]
> diff --git a/tools/testing/selftests/bpf/progs/htab_deadlock.c b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> new file mode 100644
> index 000000000000..72178f073667
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> @@ -0,0 +1,30 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 DiDi Global Inc. */
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_HASH);
> +	__uint(max_entries, 2);
> +	__uint(map_flags, BPF_F_ZERO_SEED);
> +	__type(key, unsigned int);
> +	__type(value, unsigned int);
> +} htab SEC(".maps");
> +
> +SEC("fentry/nmi_handle")

nmi_handle() is a static function. In my setup it is not inlined,
but if it is inlined, the test will succeed regardless of the
previous fix. We currently have no mechanism to detect such
situations, so I am okay with the test. But it would be good
if you could add a small comment to explain this caveat.

> +int bpf_nmi_handle(struct pt_regs *regs)
> +{
> +	unsigned int val = 0, key = 4;
> +
> +	bpf_map_update_elem(&htab, &key, &val, BPF_ANY);
> +	return 0;
> +}
> +
> +SEC("perf_event")
> +int bpf_empty(struct pt_regs *regs)
> +{
> +	return 0;
> +}
* Re: [bpf-next v2 2/2] selftests/bpf: add test case for htab map
  2022-12-17 17:37   ` Yonghong Song
@ 2022-12-19  2:36     ` Tonghao Zhang
  0 siblings, 0 replies; 4+ messages in thread
From: Tonghao Zhang @ 2022-12-19 2:36 UTC
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Hou Tao

On Sun, Dec 18, 2022 at 1:38 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 12/17/22 7:02 AM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This test shows how to reproduce the deadlock in a special case.
> > We update the htab map in task and NMI context. The task can be
> > interrupted by an NMI; if the same map bucket is already locked,
> > there will be a deadlock.
> >
> > * map max_entries is 2.
> > * NMI context uses key 4 and task context uses key 20.
> > * so the bucket index is the same but the map_locked index is different.
> >
> > The selftest uses perf to produce the NMI and attaches fentry to
> > nmi_handle. Note that bpf_overflow_handler checks bpf_prog_active, and
> > the bpf map update syscall increments this counter in
> > bpf_disable_instrumentation, so a perf_event program would not run while
> > the syscall is updating the map. An fentry program on nmi_handle that
> > updates the hash map reproduces the issue instead.
> >
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Andrii Nakryiko <andrii@kernel.org>
> > Cc: Martin KaFai Lau <martin.lau@linux.dev>
> > Cc: Song Liu <song@kernel.org>
> > Cc: Yonghong Song <yhs@fb.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: KP Singh <kpsingh@kernel.org>
> > Cc: Stanislav Fomichev <sdf@google.com>
> > Cc: Hao Luo <haoluo@google.com>
> > Cc: Jiri Olsa <jolsa@kernel.org>
> > Cc: Hou Tao <houtao1@huawei.com>
>
> Ack with a small nit below.
>
> Acked-by: Yonghong Song <yhs@fb.com>
>
> > ---
> >  tools/testing/selftests/bpf/DENYLIST.aarch64  |  1 +
> >  tools/testing/selftests/bpf/DENYLIST.s390x    |  1 +
> >  .../selftests/bpf/prog_tests/htab_deadlock.c  | 75 +++++++++++++++++++
> >  .../selftests/bpf/progs/htab_deadlock.c       | 30 ++++++++
> >  4 files changed, 107 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_deadlock.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/htab_deadlock.c
> >
> [...]
> > diff --git a/tools/testing/selftests/bpf/progs/htab_deadlock.c b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> > new file mode 100644
> > index 000000000000..72178f073667
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> > @@ -0,0 +1,30 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2022 DiDi Global Inc. */
> > +#include <linux/bpf.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +struct {
> > +	__uint(type, BPF_MAP_TYPE_HASH);
> > +	__uint(max_entries, 2);
> > +	__uint(map_flags, BPF_F_ZERO_SEED);
> > +	__type(key, unsigned int);
> > +	__type(value, unsigned int);
> > +} htab SEC(".maps");
> > +
> > +SEC("fentry/nmi_handle")
>
> nmi_handle() is a static function. In my setup it is not inlined,
> but if it is inlined, the test will succeed regardless of the
> previous fix. We currently have no mechanism to detect such
> situations, so I am okay with the test. But it would be good
> if you could add a small comment to explain this caveat.

Ok, thanks.

> > +int bpf_nmi_handle(struct pt_regs *regs)
> > +{
> > +	unsigned int val = 0, key = 4;
> > +
> > +	bpf_map_update_elem(&htab, &key, &val, BPF_ANY);
> > +	return 0;
> > +}
> > +
> > +SEC("perf_event")
> > +int bpf_empty(struct pt_regs *regs)
> > +{
> > +	return 0;
> > +}

-- 
Best regards, Tonghao