* [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
@ 2025-06-20 11:38 Arnd Bergmann
  2025-06-23 21:32 ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2025-06-20 11:38 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor
  Cc: Arnd Bergmann, John Fastabend, Martin KaFai Lau, Eduard Zingerman,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, linux-kernel, llvm

From: Arnd Bergmann <arnd@arndb.de>

clang versions before version 18 manage to badly optimize the bpf
verifier, with lots of variable spills leading to excessive stack
usage in addition to likely rather slow code:

kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than]

Turn off the sanitizer in the two functions that suffer the most from
this when using one of the affected clang versions.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 kernel/bpf/verifier.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2fa797a6d6a2..7724c7a56d79 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
 	return 0;
 }
 
-static int do_check(struct bpf_verifier_env *env)
+#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100
+/* old clang versions cause excessive stack usage here */
+#define __workaround_kasan __disable_sanitizer_instrumentation
+#else
+#define __workaround_kasan
+#endif
+
+static __workaround_kasan int do_check(struct bpf_verifier_env *env)
 {
 	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
 	struct bpf_verifier_state *state = env->cur_state;
@@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
 /* Do various post-verification rewrites in a single program pass.
  * These rewrites simplify JIT and interpreter implementations.
  */
-static int do_misc_fixups(struct bpf_verifier_env *env)
+static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env)
 {
 	struct bpf_prog *prog = env->prog;
 	enum bpf_attach_type eatype = prog->expected_attach_type;
-- 
2.39.5

^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
  2025-06-20 11:38 [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang Arnd Bergmann
@ 2025-06-23 21:32 ` Alexei Starovoitov
  2025-07-01 20:03   ` Yonghong Song
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2025-06-23 21:32 UTC (permalink / raw)
  To: Arnd Bergmann, Yonghong Song
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Nathan Chancellor, Arnd Bergmann, John Fastabend,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Nick Desaulniers,
	Bill Wendling, Justin Stitt, Kumar Kartikeya Dwivedi,
	Luis Gerhorst, bpf, LKML, clang-built-linux

On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> clang versions before version 18 manage to badly optimize the bpf
> verifier, with lots of variable spills leading to excessive stack
> usage in addition to likely rather slow code:
>
> kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
> kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than]
>
> Turn off the sanitizer in the two functions that suffer the most from
> this when using one of the affected clang version.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  kernel/bpf/verifier.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2fa797a6d6a2..7724c7a56d79 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
>         return 0;
>  }
>
> -static int do_check(struct bpf_verifier_env *env)
> +#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100
> +/* old clang versions cause excessive stack usage here */
> +#define __workaround_kasan __disable_sanitizer_instrumentation
> +#else
> +#define __workaround_kasan
> +#endif
> +
> +static __workaround_kasan int do_check(struct bpf_verifier_env *env)

This looks too hacky for a workaround.
Let's figure out what's causing such excessive stack usage and fix it.
We did some of this work in
commit 6f606ffd6dd7 ("bpf: Move insn_buf[16] to bpf_verifier_env")
and similar.
Looks like it wasn't enough or more stack usage crept in since then.

Also make sure you're using the latest bpf-next.
A bunch of code was moved out of do_check().
So I bet the current bpf-next/master doesn't have a problem
with this particular function.
In my kasan build do_check() is now fully inlined.
do_check_common() is not and it's using 512 bytes of stack.

>  {
>         bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
>         struct bpf_verifier_state *state = env->cur_state;
> @@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
>  /* Do various post-verification rewrites in a single program pass.
>   * These rewrites simplify JIT and interpreter implementations.
>   */
> -static int do_misc_fixups(struct bpf_verifier_env *env)
> +static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env)

This one is using 832 bytes of stack with kasan.
Which is indeed high.
A big chunk seems to be coming from chk_and_sdiv[] and chk_and_smod[].

Yonghong,
looks like you contributed that piece of code.
Pls see how to reduce stack size here.
Daniel used this pattern in earlier commits. Looks like
we took it too far.

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang 2025-06-23 21:32 ` Alexei Starovoitov @ 2025-07-01 20:03 ` Yonghong Song 2025-07-01 20:45 ` Andrii Nakryiko 0 siblings, 1 reply; 7+ messages in thread From: Yonghong Song @ 2025-07-01 20:03 UTC (permalink / raw) To: Alexei Starovoitov, Arnd Bergmann Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor, Arnd Bergmann, John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Nick Desaulniers, Bill Wendling, Justin Stitt, Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML, clang-built-linux On 6/23/25 2:32 PM, Alexei Starovoitov wrote: > On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote: >> From: Arnd Bergmann <arnd@arndb.de> >> >> clang versions before version 18 manage to badly optimize the bpf >> verifier, with lots of variable spills leading to excessive stack >> usage in addition to likely rather slow code: >> >> kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than] >> kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than] >> >> Turn off the sanitizer in the two functions that suffer the most from >> this when using one of the affected clang version. 
>> >> Signed-off-by: Arnd Bergmann <arnd@arndb.de> >> --- >> kernel/bpf/verifier.c | 11 +++++++++-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> index 2fa797a6d6a2..7724c7a56d79 100644 >> --- a/kernel/bpf/verifier.c >> +++ b/kernel/bpf/verifier.c >> @@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state) >> return 0; >> } >> >> -static int do_check(struct bpf_verifier_env *env) >> +#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100 >> +/* old clang versions cause excessive stack usage here */ >> +#define __workaround_kasan __disable_sanitizer_instrumentation >> +#else >> +#define __workaround_kasan >> +#endif >> + >> +static __workaround_kasan int do_check(struct bpf_verifier_env *env) > This looks too hacky for a workaround. > Let's figure out what's causing such excessive stack usage and fix it. > We did some of this work in > commit 6f606ffd6dd7 ("bpf: Move insn_buf[16] to bpf_verifier_env") > and similar. > Looks like it wasn't enough or more stack usage crept in since then. > > Also make sure you're using the latest bpf-next. > A bunch of code was moved out of do_check(). > So I bet the current bpf-next/master doesn't have a problem > with this particular function. > In my kasan build do_check() is now fully inlined. > do_check_common() is not and it's using 512 bytes of stack. > >> { >> bool pop_log = !(env->log.level & BPF_LOG_LEVEL2); >> struct bpf_verifier_state *state = env->cur_state; >> @@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat >> /* Do various post-verification rewrites in a single program pass. >> * These rewrites simplify JIT and interpreter implementations. >> */ >> -static int do_misc_fixups(struct bpf_verifier_env *env) >> +static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env) > This one is using 832 byte of stack with kasan. 
> Which is indeed high.
> Big chunk seems to be coming from chk_and_sdiv[] and chk_and_smod[].
>
> Yonghong,
> looks like you contributed that piece of code.
> Pls see how to reduce stack size here.
> Daniel used this pattern in earlier commits. Looks like
> we took it too far.

With llvm17, I got the following error:

/home/yhs/work/bpf-next/kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
 24491 | int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
       |     ^
/home/yhs/work/bpf-next/kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check' [-Werror,-Wframe-larger-than]
 19921 | static int do_check(struct bpf_verifier_env *env)
       |            ^
2 errors generated.

I checked the IR and found the following memory allocations, which may contribute
to the excessive stack usage:

attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 {
entry:
  %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854
  %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855
  %cnt.i = alloca i32, align 4, !DIAssignID !19856
  %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857
  %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858
  %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859
  %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860
  %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861
  %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862
  %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863
  %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864
  %target_size.i = alloca i32, align 4, !DIAssignID !19865
  %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866
  %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867
  %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868
  %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869
  %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870
  %fd.i = alloca i32, align 4, !DIAssignID !19871
  %log_true_size = alloca i32, align 4, !DIAssignID !19872
  ...

So yes, chk_and_{div,mod,sdiv,smod} consume quite some stack and
can be converted to runtime allocation, but that is not enough for the 1280
stack limit; we need to convert more stack usage to memory
allocation. Will try to have a uniform way to convert
'alloca [<num> x %struct.bpf_insn]' to runtime allocation.

^ permalink raw reply [flat|nested] 7+ messages in thread
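As a rough cross-check of the listing above, the struct bpf_insn arrays can be tallied by hand. The sketch below is only an estimate under stated assumptions: `struct insn` is a simplified stand-in that assumes the 8-byte UAPI layout of struct bpf_insn, the table counts only the bpf_insn allocas visible in the listing (not `desc`, the i32 arrays, or anything behind the "..."), and KASAN redzones and alignment padding are ignored. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct bpf_insn; assumed to be 8 bytes. */
struct insn {
	uint8_t code;
	uint8_t regs;	/* dst_reg and src_reg nibbles */
	int16_t off;
	int32_t imm;
};

static_assert(sizeof(struct insn) == 8, "stand-in must match the assumed size");

/* Element counts of the bpf_insn allocas visible in the listing, in order. */
static const size_t insn_alloca_len[] = {
	2,	/* zext_patch */
	4,	/* rnd_hi32_patch */
	3,	/* patch.i766 */
	1,	/* chk_and_sdiv */
	1,	/* chk_and_smod */
	4,	/* chk_and_div */
	4,	/* chk_and_mod */
	8,	/* chk_and_sdiv343 */
	9,	/* chk_and_smod472 */
	2,	/* patch */
	2,	/* patch355 */
	1,	/* ja */
};

/* Total number of instructions held in those arrays. */
static size_t insn_alloca_count(void)
{
	size_t i, total = 0;

	for (i = 0; i < sizeof(insn_alloca_len) / sizeof(insn_alloca_len[0]); i++)
		total += insn_alloca_len[i];
	return total;
}

/* Lower bound on the stack bytes those arrays occupy (no redzones, no padding). */
static size_t insn_alloca_bytes(void)
{
	return insn_alloca_count() * sizeof(struct insn);
}
```

Under these assumptions the visible arrays hold 41 instructions, or 328 bytes before any sanitizer overhead, which is well short of the reported 2552-byte frame and fits the remark that converting these arrays alone is not enough.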
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang 2025-07-01 20:03 ` Yonghong Song @ 2025-07-01 20:45 ` Andrii Nakryiko 2025-07-01 21:28 ` Yonghong Song 0 siblings, 1 reply; 7+ messages in thread From: Andrii Nakryiko @ 2025-07-01 20:45 UTC (permalink / raw) To: Yonghong Song Cc: Alexei Starovoitov, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor, Arnd Bergmann, John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Nick Desaulniers, Bill Wendling, Justin Stitt, Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML, clang-built-linux On Tue, Jul 1, 2025 at 1:03 PM Yonghong Song <yonghong.song@linux.dev> wrote: > > > > On 6/23/25 2:32 PM, Alexei Starovoitov wrote: > > On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote: > >> From: Arnd Bergmann <arnd@arndb.de> > >> > >> clang versions before version 18 manage to badly optimize the bpf > >> verifier, with lots of variable spills leading to excessive stack > >> usage in addition to likely rather slow code: > >> > >> kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than] > >> kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than] > >> > >> Turn off the sanitizer in the two functions that suffer the most from > >> this when using one of the affected clang version. 
> >> > >> Signed-off-by: Arnd Bergmann <arnd@arndb.de> > >> --- > >> kernel/bpf/verifier.c | 11 +++++++++-- > >> 1 file changed, 9 insertions(+), 2 deletions(-) > >> > >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > >> index 2fa797a6d6a2..7724c7a56d79 100644 > >> --- a/kernel/bpf/verifier.c > >> +++ b/kernel/bpf/verifier.c > >> @@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state) > >> return 0; > >> } > >> > >> -static int do_check(struct bpf_verifier_env *env) > >> +#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100 > >> +/* old clang versions cause excessive stack usage here */ > >> +#define __workaround_kasan __disable_sanitizer_instrumentation > >> +#else > >> +#define __workaround_kasan > >> +#endif > >> + > >> +static __workaround_kasan int do_check(struct bpf_verifier_env *env) > > This looks too hacky for a workaround. > > Let's figure out what's causing such excessive stack usage and fix it. > > We did some of this work in > > commit 6f606ffd6dd7 ("bpf: Move insn_buf[16] to bpf_verifier_env") > > and similar. > > Looks like it wasn't enough or more stack usage crept in since then. > > > > Also make sure you're using the latest bpf-next. > > A bunch of code was moved out of do_check(). > > So I bet the current bpf-next/master doesn't have a problem > > with this particular function. > > In my kasan build do_check() is now fully inlined. > > do_check_common() is not and it's using 512 bytes of stack. > > > >> { > >> bool pop_log = !(env->log.level & BPF_LOG_LEVEL2); > >> struct bpf_verifier_state *state = env->cur_state; > >> @@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat > >> /* Do various post-verification rewrites in a single program pass. > >> * These rewrites simplify JIT and interpreter implementations. 
> >> */ > >> -static int do_misc_fixups(struct bpf_verifier_env *env) > >> +static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env) > > This one is using 832 byte of stack with kasan. > > Which is indeed high. > > Big chunk seems to be coming from chk_and_sdiv[] and chk_and_smod[]. > > > > Yonghong, > > looks like you contributed that piece of code. > > Pls see how to reduce stack size here. > > Daniel used this pattern in earlier commits. Looks like > > we took it too far. > > With llvm17, I got the following error: > > /home/yhs/work/bpf-next/kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check' [- > Werror,-Wframe-larger-than] > 24491 | int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size) > | ^ > /home/yhs/work/bpf-next/kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check' [- > Werror,-Wframe-larger-than] > 19921 | static int do_check(struct bpf_verifier_env *env) > | ^ > 2 errors generated. 
> > I checked IR and found the following memory allocations which may contribute > excessive stack usage: > > attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 { > entry: > %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854 > %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855 > %cnt.i = alloca i32, align 4, !DIAssignID !19856 > %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857 > %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858 > %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859 > %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860 > %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861 > %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862 > %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863 > %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864 > %target_size.i = alloca i32, align 4, !DIAssignID !19865 > %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866 > %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867 > %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868 > %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869 > %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870 > %fd.i = alloca i32, align 4, !DIAssignID !19871 > %log_true_size = alloca i32, align 4, !DIAssignID !19872 > ... > > So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and > can be coverted to runtime allocation but that is not enough for 1280 > stack limit, we need to do more conversion from stack to memory > allocation. Will try to have uniform way to convert > 'alloca [<num> x %struct.bpf_insn]' to runtime allocation. > Do we need to go all the way to dynamic allocation? 
See env->insns_buf (which some parts of this function are
already using for constructing instruction patches); let's just converge on that?
It pre-allocates space for 32 instructions, which should be sufficient for
all the use cases, no?

^ permalink raw reply [flat|nested] 7+ messages in thread
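The idea of converging on one pre-allocated buffer can be sketched roughly as below. This is a standalone illustration, not the verifier code: `struct insn`, `struct env`, `scratch`, `emit()` and `build_patch()` are hypothetical stand-ins, the capacity of 32 is taken from the message above, and the opcode and register encodings are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct bpf_insn. */
struct insn {
	uint8_t code;
	uint8_t regs;	/* low nibble: dst, high nibble: src (illustrative) */
	int16_t off;
	int32_t imm;
};

#define SCRATCH_LEN 32	/* capacity quoted in the thread */

/* Hypothetical stand-in for the verifier environment, allocated once off-stack. */
struct env {
	struct insn scratch[SCRATCH_LEN];
};

/* Append one instruction to the scratch buffer and return the new count. */
static size_t emit(struct env *env, size_t cnt, uint8_t code, uint8_t regs,
		   int32_t imm)
{
	assert(cnt < SCRATCH_LEN);
	env->scratch[cnt].code = code;
	env->scratch[cnt].regs = regs;
	env->scratch[cnt].off = 0;
	env->scratch[cnt].imm = imm;
	return cnt + 1;
}

/*
 * Build a two-instruction patch directly in env->scratch instead of in a
 * function-local array; returns the number of instructions written.
 */
static size_t build_patch(struct env *env, uint8_t src_reg)
{
	size_t cnt = 0;

	/* mov64 r11, src_reg */
	cnt = emit(env, cnt, 0xbf, (uint8_t)((src_reg << 4) | 11), 0);
	/* mov64 r0, 0 */
	cnt = emit(env, cnt, 0xb7, 0, 0);
	return cnt;
}
```

Because every rewrite reuses the same buffer, the caller's frame holds only a pointer and a count, whatever the number of distinct patches the function can emit. The trade-off is that only one patch can be under construction at a time.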
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang 2025-07-01 20:45 ` Andrii Nakryiko @ 2025-07-01 21:28 ` Yonghong Song 2025-07-02 7:48 ` Arnd Bergmann 0 siblings, 1 reply; 7+ messages in thread From: Yonghong Song @ 2025-07-01 21:28 UTC (permalink / raw) To: Andrii Nakryiko Cc: Alexei Starovoitov, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor, Arnd Bergmann, John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Nick Desaulniers, Bill Wendling, Justin Stitt, Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML, clang-built-linux On 7/1/25 1:45 PM, Andrii Nakryiko wrote: > On Tue, Jul 1, 2025 at 1:03 PM Yonghong Song <yonghong.song@linux.dev> wrote: >> >> >> On 6/23/25 2:32 PM, Alexei Starovoitov wrote: >>> On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote: >>>> From: Arnd Bergmann <arnd@arndb.de> >>>> >>>> clang versions before version 18 manage to badly optimize the bpf >>>> verifier, with lots of variable spills leading to excessive stack >>>> usage in addition to likely rather slow code: >>>> >>>> kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than] >>>> kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than] >>>> >>>> Turn off the sanitizer in the two functions that suffer the most from >>>> this when using one of the affected clang version. 
>>>> >>>> Signed-off-by: Arnd Bergmann <arnd@arndb.de> >>>> --- >>>> kernel/bpf/verifier.c | 11 +++++++++-- >>>> 1 file changed, 9 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >>>> index 2fa797a6d6a2..7724c7a56d79 100644 >>>> --- a/kernel/bpf/verifier.c >>>> +++ b/kernel/bpf/verifier.c >>>> @@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state) >>>> return 0; >>>> } >>>> >>>> -static int do_check(struct bpf_verifier_env *env) >>>> +#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100 >>>> +/* old clang versions cause excessive stack usage here */ >>>> +#define __workaround_kasan __disable_sanitizer_instrumentation >>>> +#else >>>> +#define __workaround_kasan >>>> +#endif >>>> + >>>> +static __workaround_kasan int do_check(struct bpf_verifier_env *env) >>> This looks too hacky for a workaround. >>> Let's figure out what's causing such excessive stack usage and fix it. >>> We did some of this work in >>> commit 6f606ffd6dd7 ("bpf: Move insn_buf[16] to bpf_verifier_env") >>> and similar. >>> Looks like it wasn't enough or more stack usage crept in since then. >>> >>> Also make sure you're using the latest bpf-next. >>> A bunch of code was moved out of do_check(). >>> So I bet the current bpf-next/master doesn't have a problem >>> with this particular function. >>> In my kasan build do_check() is now fully inlined. >>> do_check_common() is not and it's using 512 bytes of stack. >>> >>>> { >>>> bool pop_log = !(env->log.level & BPF_LOG_LEVEL2); >>>> struct bpf_verifier_state *state = env->cur_state; >>>> @@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat >>>> /* Do various post-verification rewrites in a single program pass. >>>> * These rewrites simplify JIT and interpreter implementations. 
>>>> */ >>>> -static int do_misc_fixups(struct bpf_verifier_env *env) >>>> +static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env) >>> This one is using 832 byte of stack with kasan. >>> Which is indeed high. >>> Big chunk seems to be coming from chk_and_sdiv[] and chk_and_smod[]. >>> >>> Yonghong, >>> looks like you contributed that piece of code. >>> Pls see how to reduce stack size here. >>> Daniel used this pattern in earlier commits. Looks like >>> we took it too far. >> With llvm17, I got the following error: >> >> /home/yhs/work/bpf-next/kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check' [- >> Werror,-Wframe-larger-than] >> 24491 | int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size) >> | ^ >> /home/yhs/work/bpf-next/kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check' [- >> Werror,-Wframe-larger-than] >> 19921 | static int do_check(struct bpf_verifier_env *env) >> | ^ >> 2 errors generated. 
>> >> I checked IR and found the following memory allocations which may contribute >> excessive stack usage: >> >> attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 { >> entry: >> %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854 >> %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855 >> %cnt.i = alloca i32, align 4, !DIAssignID !19856 >> %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857 >> %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858 >> %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859 >> %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860 >> %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861 >> %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862 >> %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863 >> %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864 >> %target_size.i = alloca i32, align 4, !DIAssignID !19865 >> %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866 >> %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867 >> %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868 >> %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869 >> %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870 >> %fd.i = alloca i32, align 4, !DIAssignID !19871 >> %log_true_size = alloca i32, align 4, !DIAssignID !19872 >> ... >> >> So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and >> can be coverted to runtime allocation but that is not enough for 1280 >> stack limit, we need to do more conversion from stack to memory >> allocation. Will try to have uniform way to convert >> 'alloca [<num> x %struct.bpf_insn]' to runtime allocation. >> > Do we need to go all the way to dynamic allocation? 
See env->insns_buf
> (which some parts of this function are already using for constructing
> instruction patch), let's just converge on that? It pre-allocates
> space for 32 instructions, should be sufficient for all the use cases,
> no?

Makes sense. This is much better. Thanks!

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang 2025-07-01 21:28 ` Yonghong Song @ 2025-07-02 7:48 ` Arnd Bergmann 2025-07-02 14:14 ` Yonghong Song 0 siblings, 1 reply; 7+ messages in thread From: Arnd Bergmann @ 2025-07-02 7:48 UTC (permalink / raw) To: Yonghong Song, Andrii Nakryiko Cc: Alexei Starovoitov, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor, John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Nick Desaulniers, Bill Wendling, Justin Stitt, Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML, clang-built-linux On Tue, Jul 1, 2025, at 23:28, Yonghong Song wrote: > On 7/1/25 1:45 PM, Andrii Nakryiko wrote: >> On Tue, Jul 1, 2025 at 1:03 PM Yonghong Song <yonghong.song@linux.dev> wrote: >>> On 6/23/25 2:32 PM, Alexei Starovoitov wrote: >>>> On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote: >>>>> From: Arnd Bergmann <arnd@arndb.de> >>> >>> I checked IR and found the following memory allocations which may contribute >>> excessive stack usage: >>> >>> attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 { >>> entry: >>> %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854 >>> %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855 >>> %cnt.i = alloca i32, align 4, !DIAssignID !19856 >>> %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857 >>> %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858 >>> %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859 >>> %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860 >>> %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861 >>> %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862 >>> %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863 
>>> %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864
>>> %target_size.i = alloca i32, align 4, !DIAssignID !19865
>>> %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866
>>> %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867
>>> %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868
>>> %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869
>>> %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870
>>> %fd.i = alloca i32, align 4, !DIAssignID !19871
>>> %log_true_size = alloca i32, align 4, !DIAssignID !19872
>>> ...
>>>
>>> So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and
>>> can be coverted to runtime allocation but that is not enough for 1280
>>> stack limit, we need to do more conversion from stack to memory
>>> allocation. Will try to have uniform way to convert
>>> 'alloca [<num> x %struct.bpf_insn]' to runtime allocation.
>>>
>> Do we need to go all the way to dynamic allocation? See env->insns_buf
>> (which some parts of this function are already using for constructing
>> instruction patch), let's just converge on that? It pre-allocates
>> space for 32 instructions, should be sufficient for all the use cases,
>> no?
>
> Make sense. This is much better. Thanks!

I'm not sure if that actually helps on the old clang version. As far
as I understood it in my initial analysis, the problem in the

	struct bpf_insn chk_and_sdiv[] = {
		/* [R,W]x sdiv 0 -> 0
		 * LLONG_MIN sdiv -1 -> LLONG_MIN
		 * INT_MIN sdiv -1 -> INT_MIN
		 */
		BPF_MOV64_REG(BPF_REG_AX, insn->src_reg),
		...
	}

construct is not the chk_and_sdiv[] array itself but the
struct initializer in the BPF_MOV64_REG() macro, which leads to
having two copies of the struct on the stack and then copying
between them. In gcc or clang-18+, these all get folded
into a single object on the stack.

(Disclaimer: I don't understand anything about how clang
actually works internally, the above is only speculation on
my side, based on the assembler output)

     Arnd

^ permalink raw reply [flat|nested] 7+ messages in thread
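The source-level pattern described above can be sketched in isolation. This is a simplified illustration, not the verifier code: `struct insn`, `MOV64_REG()`, `MOV64_IMM()` and `REG_AX` are hypothetical stand-ins for struct bpf_insn, the BPF_MOV64_* macros and BPF_REG_AX, and the encodings are illustrative. Whether a given compiler keeps a separate temporary for each compound literal is a code-generation detail that the sketch does not demonstrate; it only shows that the two forms produce identical instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct bpf_insn (no padding bytes). */
struct insn {
	uint8_t code;
	uint8_t regs;	/* low nibble: dst, high nibble: src (illustrative) */
	int16_t off;
	int32_t imm;
};

#define REG_AX 11	/* stand-in for BPF_REG_AX */

/* Each macro expands to a compound literal, i.e. an unnamed struct object. */
#define MOV64_REG(dst, src) \
	((struct insn){ .code = 0xbf, .regs = (uint8_t)(((src) << 4) | (dst)) })
#define MOV64_IMM(dst, val) \
	((struct insn){ .code = 0xb7, .regs = (uint8_t)(dst), .imm = (val) })

/* Pattern under discussion: a local array initialized from compound literals. */
static int patch_with_initializer(struct insn *out, int src_reg)
{
	struct insn patch[] = {
		MOV64_REG(REG_AX, src_reg),
		MOV64_IMM(0, 0),
	};

	memcpy(out, patch, sizeof(patch));
	return (int)(sizeof(patch) / sizeof(patch[0]));
}

/* Alternative: write each field straight into a caller-provided buffer. */
static int patch_in_place(struct insn *out, int src_reg)
{
	int cnt = 0;

	out[cnt].code = 0xbf;
	out[cnt].regs = (uint8_t)((src_reg << 4) | REG_AX);
	out[cnt].off = 0;
	out[cnt].imm = 0;
	cnt++;

	out[cnt].code = 0xb7;
	out[cnt].regs = 0;
	out[cnt].off = 0;
	out[cnt].imm = 0;
	cnt++;

	return cnt;
}
```

The second form names no intermediate struct objects at all, so there is nothing for the compiler to fold. The cost is more verbose source than the macro-based initializer.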
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang 2025-07-02 7:48 ` Arnd Bergmann @ 2025-07-02 14:14 ` Yonghong Song 0 siblings, 0 replies; 7+ messages in thread From: Yonghong Song @ 2025-07-02 14:14 UTC (permalink / raw) To: Arnd Bergmann, Andrii Nakryiko Cc: Alexei Starovoitov, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor, John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Nick Desaulniers, Bill Wendling, Justin Stitt, Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML, clang-built-linux On 7/2/25 12:48 AM, Arnd Bergmann wrote: > On Tue, Jul 1, 2025, at 23:28, Yonghong Song wrote: >> On 7/1/25 1:45 PM, Andrii Nakryiko wrote: >>> On Tue, Jul 1, 2025 at 1:03 PM Yonghong Song <yonghong.song@linux.dev> wrote: >>>> On 6/23/25 2:32 PM, Alexei Starovoitov wrote: >>>>> On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote: >>>>>> From: Arnd Bergmann <arnd@arndb.de> >>>> I checked IR and found the following memory allocations which may contribute >>>> excessive stack usage: >>>> >>>> attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 { >>>> entry: >>>> %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854 >>>> %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855 >>>> %cnt.i = alloca i32, align 4, !DIAssignID !19856 >>>> %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857 >>>> %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858 >>>> %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859 >>>> %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860 >>>> %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861 >>>> %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862 >>>> %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], 
align 16, !DIAssignID !19863 >>>> %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864 >>>> %target_size.i = alloca i32, align 4, !DIAssignID !19865 >>>> %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866 >>>> %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867 >>>> %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868 >>>> %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869 >>>> %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870 >>>> %fd.i = alloca i32, align 4, !DIAssignID !19871 >>>> %log_true_size = alloca i32, align 4, !DIAssignID !19872 >>>> ... >>>> >>>> So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and >>>> can be coverted to runtime allocation but that is not enough for 1280 >>>> stack limit, we need to do more conversion from stack to memory >>>> allocation. Will try to have uniform way to convert >>>> 'alloca [<num> x %struct.bpf_insn]' to runtime allocation. >>>> >>> Do we need to go all the way to dynamic allocation? See env->insns_buf >>> (which some parts of this function are already using for constructing >>> instruction patch), let's just converge on that? It pre-allocates >>> space for 32 instructions, should be sufficient for all the use cases, >>> no? >> Make sense. This is much better. Thanks! > I'm not sure if that actually helps on the old clang version, as far > as I understood it in my initial analysis, the problem in the > > struct bpf_insn chk_and_sdiv[] = { > /* [R,W]x sdiv 0 -> 0 > * LLONG_MIN sdiv -1 -> LLONG_MIN > * INT_MIN sdiv -1 -> INT_MIN > */ > BPF_MOV64_REG(BPF_REG_AX, insn->src_reg), > ... > } > > construct is not the chk_and_sdiv[] array itself but the > struct initializer in the BPF_MOV64_REG() macro that leads to > having two copies of the struct on the stack and then copying > between them. In gcc or clang-18+, these all get folded > into a single object on the stack. 
See https://lore.kernel.org/bpf/20250702053332.1991516-1-yonghong.song@linux.dev/.
The above 'struct bpf_insn chk_and_sdiv[] = { ... }' will be removed, so it
will no longer consume any stack. Instead, we use the scratch space
in bpf_verifier_env.

>
> (Disclaimer: I don't understand anything about how clang
> actually works internally, the above is only speculation on
> my side, based on the assembler output)
>
>      Arnd

^ permalink raw reply [flat|nested] 7+ messages in thread