* [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
@ 2025-06-20 11:38 Arnd Bergmann
2025-06-23 21:32 ` Alexei Starovoitov
0 siblings, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2025-06-20 11:38 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Nathan Chancellor
Cc: Arnd Bergmann, John Fastabend, Martin KaFai Lau, Eduard Zingerman,
Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Jiri Olsa, Nick Desaulniers, Bill Wendling, Justin Stitt,
Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, linux-kernel, llvm
From: Arnd Bergmann <arnd@arndb.de>
clang versions before version 18 manage to badly optimize the bpf
verifier, with lots of variable spills leading to excessive stack
usage in addition to likely rather slow code:
kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than]
Turn off the sanitizer in the two functions that suffer the most from
this when using one of the affected clang versions.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
kernel/bpf/verifier.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2fa797a6d6a2..7724c7a56d79 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
return 0;
}
-static int do_check(struct bpf_verifier_env *env)
+#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100
+/* old clang versions cause excessive stack usage here */
+#define __workaround_kasan __disable_sanitizer_instrumentation
+#else
+#define __workaround_kasan
+#endif
+
+static __workaround_kasan int do_check(struct bpf_verifier_env *env)
{
bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
struct bpf_verifier_state *state = env->cur_state;
@@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
/* Do various post-verification rewrites in a single program pass.
* These rewrites simplify JIT and interpreter implementations.
*/
-static int do_misc_fixups(struct bpf_verifier_env *env)
+static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env)
{
struct bpf_prog *prog = env->prog;
enum bpf_attach_type eatype = prog->expected_attach_type;
--
2.39.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
2025-06-20 11:38 [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang Arnd Bergmann
@ 2025-06-23 21:32 ` Alexei Starovoitov
2025-07-01 20:03 ` Yonghong Song
0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2025-06-23 21:32 UTC (permalink / raw)
To: Arnd Bergmann, Yonghong Song
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Nathan Chancellor, Arnd Bergmann, John Fastabend,
Martin KaFai Lau, Eduard Zingerman, Song Liu, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Nick Desaulniers,
Bill Wendling, Justin Stitt, Kumar Kartikeya Dwivedi,
Luis Gerhorst, bpf, LKML, clang-built-linux
On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> clang versions before version 18 manage to badly optimize the bpf
> verifier, with lots of variable spills leading to excessive stack
> usage in addition to likely rather slow code:
>
> kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
> kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than]
>
> Turn off the sanitizer in the two functions that suffer the most from
> this when using one of the affected clang versions.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> kernel/bpf/verifier.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2fa797a6d6a2..7724c7a56d79 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> return 0;
> }
>
> -static int do_check(struct bpf_verifier_env *env)
> +#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100
> +/* old clang versions cause excessive stack usage here */
> +#define __workaround_kasan __disable_sanitizer_instrumentation
> +#else
> +#define __workaround_kasan
> +#endif
> +
> +static __workaround_kasan int do_check(struct bpf_verifier_env *env)
This looks too hacky for a workaround.
Let's figure out what's causing such excessive stack usage and fix it.
We did some of this work in
commit 6f606ffd6dd7 ("bpf: Move insn_buf[16] to bpf_verifier_env")
and similar.
Looks like it wasn't enough or more stack usage crept in since then.
Also make sure you're using the latest bpf-next.
A bunch of code was moved out of do_check().
So I bet the current bpf-next/master doesn't have a problem
with this particular function.
In my kasan build do_check() is now fully inlined.
do_check_common() is not and it's using 512 bytes of stack.
> {
> bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
> struct bpf_verifier_state *state = env->cur_state;
> @@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
> /* Do various post-verification rewrites in a single program pass.
> * These rewrites simplify JIT and interpreter implementations.
> */
> -static int do_misc_fixups(struct bpf_verifier_env *env)
> +static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env)
This one is using 832 bytes of stack with kasan.
Which is indeed high.
Big chunk seems to be coming from chk_and_sdiv[] and chk_and_smod[].
Yonghong,
looks like you contributed that piece of code.
Pls see how to reduce stack size here.
Daniel used this pattern in earlier commits. Looks like
we took it too far.
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
2025-06-23 21:32 ` Alexei Starovoitov
@ 2025-07-01 20:03 ` Yonghong Song
2025-07-01 20:45 ` Andrii Nakryiko
0 siblings, 1 reply; 7+ messages in thread
From: Yonghong Song @ 2025-07-01 20:03 UTC (permalink / raw)
To: Alexei Starovoitov, Arnd Bergmann
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Nathan Chancellor, Arnd Bergmann, John Fastabend,
Martin KaFai Lau, Eduard Zingerman, Song Liu, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Nick Desaulniers,
Bill Wendling, Justin Stitt, Kumar Kartikeya Dwivedi,
Luis Gerhorst, bpf, LKML, clang-built-linux
On 6/23/25 2:32 PM, Alexei Starovoitov wrote:
> On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> clang versions before version 18 manage to badly optimize the bpf
>> verifier, with lots of variable spills leading to excessive stack
>> usage in addition to likely rather slow code:
>>
>> kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
>> kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than]
>>
>> Turn off the sanitizer in the two functions that suffer the most from
>> this when using one of the affected clang versions.
>>
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> ---
>> kernel/bpf/verifier.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 2fa797a6d6a2..7724c7a56d79 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
>> return 0;
>> }
>>
>> -static int do_check(struct bpf_verifier_env *env)
>> +#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100
>> +/* old clang versions cause excessive stack usage here */
>> +#define __workaround_kasan __disable_sanitizer_instrumentation
>> +#else
>> +#define __workaround_kasan
>> +#endif
>> +
>> +static __workaround_kasan int do_check(struct bpf_verifier_env *env)
> This looks too hacky for a workaround.
> Let's figure out what's causing such excessive stack usage and fix it.
> We did some of this work in
> commit 6f606ffd6dd7 ("bpf: Move insn_buf[16] to bpf_verifier_env")
> and similar.
> Looks like it wasn't enough or more stack usage crept in since then.
>
> Also make sure you're using the latest bpf-next.
> A bunch of code was moved out of do_check().
> So I bet the current bpf-next/master doesn't have a problem
> with this particular function.
> In my kasan build do_check() is now fully inlined.
> do_check_common() is not and it's using 512 bytes of stack.
>
>> {
>> bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
>> struct bpf_verifier_state *state = env->cur_state;
>> @@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
>> /* Do various post-verification rewrites in a single program pass.
>> * These rewrites simplify JIT and interpreter implementations.
>> */
>> -static int do_misc_fixups(struct bpf_verifier_env *env)
>> +static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env)
> This one is using 832 bytes of stack with kasan.
> Which is indeed high.
> Big chunk seems to be coming from chk_and_sdiv[] and chk_and_smod[].
>
> Yonghong,
> looks like you contributed that piece of code.
> Pls see how to reduce stack size here.
> Daniel used this pattern in earlier commits. Looks like
> we took it too far.
With llvm17, I got the following error:
/home/yhs/work/bpf-next/kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
24491 | int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
| ^
/home/yhs/work/bpf-next/kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check' [-Werror,-Wframe-larger-than]
19921 | static int do_check(struct bpf_verifier_env *env)
| ^
2 errors generated.
I checked IR and found the following memory allocations which may contribute
to excessive stack usage:
attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 {
entry:
%zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854
%rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855
%cnt.i = alloca i32, align 4, !DIAssignID !19856
%patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857
%chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858
%chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859
%chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860
%chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861
%chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862
%chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863
%desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864
%target_size.i = alloca i32, align 4, !DIAssignID !19865
%patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866
%patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867
%ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868
%ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869
%ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870
%fd.i = alloca i32, align 4, !DIAssignID !19871
%log_true_size = alloca i32, align 4, !DIAssignID !19872
...
So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and
can be converted to runtime allocation but that is not enough for 1280
stack limit, we need to do more conversion from stack to memory
allocation. Will try to have uniform way to convert
'alloca [<num> x %struct.bpf_insn]' to runtime allocation.
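As a rough check of the "not enough" point, the raw size of the instruction arrays in the IR listing above can be added up. This is only a sketch: `struct insn_sketch` is a standalone stand-in assumed to match the 8-byte layout of `struct bpf_insn`, the element counts are copied from the listing, and alignment padding and sanitizer redzones are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standalone stand-in for struct bpf_insn (assumption: same 8-byte layout). */
struct insn_sketch {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

_Static_assert(sizeof(struct insn_sketch) == 8, "expected an 8-byte instruction");

/* Element counts of each bpf_insn alloca in the listing, in order:
 * zext_patch, rnd_hi32_patch, patch.i766, chk_and_sdiv, chk_and_smod,
 * chk_and_div, chk_and_mod, chk_and_sdiv343, chk_and_smod472,
 * patch.i, patch355.i, ja (a single instruction).
 */
static const size_t insn_array_lens[] = { 2, 4, 3, 1, 1, 4, 4, 8, 9, 2, 2, 1 };

/* Sum of the raw array sizes in bytes, without padding or redzones. */
static size_t insn_array_stack_bytes(void)
{
	size_t total = 0;
	size_t n = sizeof(insn_array_lens) / sizeof(insn_array_lens[0]);

	for (size_t i = 0; i < n; i++)
		total += insn_array_lens[i] * sizeof(struct insn_sketch);
	assert(total > 0);
	return total;
}
```

Under these assumptions the arrays hold 41 instructions in total, a few hundred bytes of raw data against a reported 2552-byte frame, which is consistent with the remark that converting these arrays alone does not reach the 1280-byte limit.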
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
2025-07-01 20:03 ` Yonghong Song
@ 2025-07-01 20:45 ` Andrii Nakryiko
2025-07-01 21:28 ` Yonghong Song
0 siblings, 1 reply; 7+ messages in thread
From: Andrii Nakryiko @ 2025-07-01 20:45 UTC (permalink / raw)
To: Yonghong Song
Cc: Alexei Starovoitov, Arnd Bergmann, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor,
Arnd Bergmann, John Fastabend, Martin KaFai Lau, Eduard Zingerman,
Song Liu, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML,
clang-built-linux
On Tue, Jul 1, 2025 at 1:03 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 6/23/25 2:32 PM, Alexei Starovoitov wrote:
> > On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >> From: Arnd Bergmann <arnd@arndb.de>
> >>
> >> clang versions before version 18 manage to badly optimize the bpf
> >> verifier, with lots of variable spills leading to excessive stack
> >> usage in addition to likely rather slow code:
> >>
> >> kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
> >> kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than]
> >>
> >> Turn off the sanitizer in the two functions that suffer the most from
> >> this when using one of the affected clang versions.
> >>
> >> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> >> ---
> >> kernel/bpf/verifier.c | 11 +++++++++--
> >> 1 file changed, 9 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 2fa797a6d6a2..7724c7a56d79 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
> >> return 0;
> >> }
> >>
> >> -static int do_check(struct bpf_verifier_env *env)
> >> +#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100
> >> +/* old clang versions cause excessive stack usage here */
> >> +#define __workaround_kasan __disable_sanitizer_instrumentation
> >> +#else
> >> +#define __workaround_kasan
> >> +#endif
> >> +
> >> +static __workaround_kasan int do_check(struct bpf_verifier_env *env)
> > This looks too hacky for a workaround.
> > Let's figure out what's causing such excessive stack usage and fix it.
> > We did some of this work in
> > commit 6f606ffd6dd7 ("bpf: Move insn_buf[16] to bpf_verifier_env")
> > and similar.
> > Looks like it wasn't enough or more stack usage crept in since then.
> >
> > Also make sure you're using the latest bpf-next.
> > A bunch of code was moved out of do_check().
> > So I bet the current bpf-next/master doesn't have a problem
> > with this particular function.
> > In my kasan build do_check() is now fully inlined.
> > do_check_common() is not and it's using 512 bytes of stack.
> >
> >> {
> >> bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
> >> struct bpf_verifier_state *state = env->cur_state;
> >> @@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
> >> /* Do various post-verification rewrites in a single program pass.
> >> * These rewrites simplify JIT and interpreter implementations.
> >> */
> >> -static int do_misc_fixups(struct bpf_verifier_env *env)
> >> +static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env)
> > > This one is using 832 bytes of stack with kasan.
> > Which is indeed high.
> > Big chunk seems to be coming from chk_and_sdiv[] and chk_and_smod[].
> >
> > Yonghong,
> > looks like you contributed that piece of code.
> > Pls see how to reduce stack size here.
> > Daniel used this pattern in earlier commits. Looks like
> > we took it too far.
>
> With llvm17, I got the following error:
>
> /home/yhs/work/bpf-next/kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
> 24491 | int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
> | ^
> /home/yhs/work/bpf-next/kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check' [-Werror,-Wframe-larger-than]
> 19921 | static int do_check(struct bpf_verifier_env *env)
> | ^
> 2 errors generated.
>
> I checked IR and found the following memory allocations which may contribute
> to excessive stack usage:
>
> attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 {
> entry:
> %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854
> %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855
> %cnt.i = alloca i32, align 4, !DIAssignID !19856
> %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857
> %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858
> %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859
> %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860
> %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861
> %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862
> %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863
> %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864
> %target_size.i = alloca i32, align 4, !DIAssignID !19865
> %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866
> %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867
> %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868
> %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869
> %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870
> %fd.i = alloca i32, align 4, !DIAssignID !19871
> %log_true_size = alloca i32, align 4, !DIAssignID !19872
> ...
>
> So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and
> can be converted to runtime allocation but that is not enough for 1280
> stack limit, we need to do more conversion from stack to memory
> allocation. Will try to have uniform way to convert
> 'alloca [<num> x %struct.bpf_insn]' to runtime allocation.
>
Do we need to go all the way to dynamic allocation? See env->insns_buf
(which some parts of this function are already using for constructing
instruction patch), let's just converge on that? It pre-allocates
space for 32 instructions, should be sufficient for all the use cases,
no?
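A minimal sketch of the pattern suggested here: every patch is built in one scratch area owned by the environment object, so no patch needs its own local array. All names (`env_sketch`, `scratch`, `SCRATCH_INSNS`, `emit`, `build_guarded_patch`) are hypothetical stand-ins and not the verifier's actual definitions; only the capacity of 32 instructions is taken from the message above.

```c
#include <assert.h>
#include <stdint.h>

#define SCRATCH_INSNS 32	/* capacity quoted in the thread; the name is hypothetical */

/* Standalone stand-in for struct bpf_insn (assumption: same 8-byte layout). */
struct insn_sketch {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Stand-in for the verifier environment: one scratch area shared by all
 * patch builders instead of one local array per patch.
 */
struct env_sketch {
	struct insn_sketch scratch[SCRATCH_INSNS];
};

/* Write one instruction field by field into a slot of the scratch area. */
static void emit(struct insn_sketch *slot, uint8_t code, uint8_t dst,
		 uint8_t src, int16_t off, int32_t imm)
{
	slot->code = code;
	slot->dst_reg = dst;
	slot->src_reg = src;
	slot->off = off;
	slot->imm = imm;
}

/* Build "nr_guards placeholder instructions + the original instruction"
 * in env->scratch. Returns the patch length, or -1 if it would not fit.
 */
static int build_guarded_patch(struct env_sketch *env,
			       const struct insn_sketch *orig, int nr_guards)
{
	int cnt = 0;

	assert(env && orig);
	if (nr_guards < 0 || nr_guards + 1 > SCRATCH_INSNS)
		return -1;
	for (int i = 0; i < nr_guards; i++)
		emit(&env->scratch[cnt++], 0, 0, 0, 0, 0);	/* placeholder guard */
	env->scratch[cnt++] = *orig;
	return cnt;
}
```

The explicit capacity check stands in for the "should be sufficient for all the use cases" question: the largest array in the IR listing has 9 elements, which fits comfortably in a 32-slot area.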
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
2025-07-01 20:45 ` Andrii Nakryiko
@ 2025-07-01 21:28 ` Yonghong Song
2025-07-02 7:48 ` Arnd Bergmann
0 siblings, 1 reply; 7+ messages in thread
From: Yonghong Song @ 2025-07-01 21:28 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Arnd Bergmann, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor,
Arnd Bergmann, John Fastabend, Martin KaFai Lau, Eduard Zingerman,
Song Liu, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML,
clang-built-linux
On 7/1/25 1:45 PM, Andrii Nakryiko wrote:
> On Tue, Jul 1, 2025 at 1:03 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>> On 6/23/25 2:32 PM, Alexei Starovoitov wrote:
>>> On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote:
>>>> From: Arnd Bergmann <arnd@arndb.de>
>>>>
>>>> clang versions before version 18 manage to badly optimize the bpf
>>>> verifier, with lots of variable spills leading to excessive stack
>>>> usage in addition to likely rather slow code:
>>>>
>>>> kernel/bpf/verifier.c:23936:5: error: stack frame size (2096) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
>>>> kernel/bpf/verifier.c:21563:12: error: stack frame size (1984) exceeds limit (1280) in 'do_misc_fixups' [-Werror,-Wframe-larger-than]
>>>>
>>>> Turn off the sanitizer in the two functions that suffer the most from
>>>> this when using one of the affected clang versions.
>>>>
>>>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>>> ---
>>>> kernel/bpf/verifier.c | 11 +++++++++--
>>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 2fa797a6d6a2..7724c7a56d79 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -19810,7 +19810,14 @@ static int do_check_insn(struct bpf_verifier_env *env, bool *do_print_state)
>>>> return 0;
>>>> }
>>>>
>>>> -static int do_check(struct bpf_verifier_env *env)
>>>> +#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 180100
>>>> +/* old clang versions cause excessive stack usage here */
>>>> +#define __workaround_kasan __disable_sanitizer_instrumentation
>>>> +#else
>>>> +#define __workaround_kasan
>>>> +#endif
>>>> +
>>>> +static __workaround_kasan int do_check(struct bpf_verifier_env *env)
>>> This looks too hacky for a workaround.
>>> Let's figure out what's causing such excessive stack usage and fix it.
>>> We did some of this work in
>>> commit 6f606ffd6dd7 ("bpf: Move insn_buf[16] to bpf_verifier_env")
>>> and similar.
>>> Looks like it wasn't enough or more stack usage crept in since then.
>>>
>>> Also make sure you're using the latest bpf-next.
>>> A bunch of code was moved out of do_check().
>>> So I bet the current bpf-next/master doesn't have a problem
>>> with this particular function.
>>> In my kasan build do_check() is now fully inlined.
>>> do_check_common() is not and it's using 512 bytes of stack.
>>>
>>>> {
>>>> bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
>>>> struct bpf_verifier_state *state = env->cur_state;
>>>> @@ -21817,7 +21824,7 @@ static int add_hidden_subprog(struct bpf_verifier_env *env, struct bpf_insn *pat
>>>> /* Do various post-verification rewrites in a single program pass.
>>>> * These rewrites simplify JIT and interpreter implementations.
>>>> */
>>>> -static int do_misc_fixups(struct bpf_verifier_env *env)
>>>> +static __workaround_kasan int do_misc_fixups(struct bpf_verifier_env *env)
>>> This one is using 832 bytes of stack with kasan.
>>> Which is indeed high.
>>> Big chunk seems to be coming from chk_and_sdiv[] and chk_and_smod[].
>>>
>>> Yonghong,
>>> looks like you contributed that piece of code.
>>> Pls see how to reduce stack size here.
>>> Daniel used this pattern in earlier commits. Looks like
>>> we took it too far.
>> With llvm17, I got the following error:
>>
>> /home/yhs/work/bpf-next/kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check' [-Werror,-Wframe-larger-than]
>> 24491 | int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
>> | ^
>> /home/yhs/work/bpf-next/kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check' [-Werror,-Wframe-larger-than]
>> 19921 | static int do_check(struct bpf_verifier_env *env)
>> | ^
>> 2 errors generated.
>>
>> I checked IR and found the following memory allocations which may contribute
>> to excessive stack usage:
>>
>> attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 {
>> entry:
>> %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854
>> %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855
>> %cnt.i = alloca i32, align 4, !DIAssignID !19856
>> %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857
>> %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858
>> %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859
>> %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860
>> %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861
>> %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862
>> %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863
>> %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864
>> %target_size.i = alloca i32, align 4, !DIAssignID !19865
>> %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866
>> %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867
>> %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868
>> %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869
>> %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870
>> %fd.i = alloca i32, align 4, !DIAssignID !19871
>> %log_true_size = alloca i32, align 4, !DIAssignID !19872
>> ...
>>
>> So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and
>> can be converted to runtime allocation but that is not enough for 1280
>> stack limit, we need to do more conversion from stack to memory
>> allocation. Will try to have uniform way to convert
>> 'alloca [<num> x %struct.bpf_insn]' to runtime allocation.
>>
> Do we need to go all the way to dynamic allocation? See env->insns_buf
> (which some parts of this function are already using for constructing
> instruction patch), let's just converge on that? It pre-allocates
> space for 32 instructions, should be sufficient for all the use cases,
> no?
Makes sense. This is much better. Thanks!
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
2025-07-01 21:28 ` Yonghong Song
@ 2025-07-02 7:48 ` Arnd Bergmann
2025-07-02 14:14 ` Yonghong Song
0 siblings, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2025-07-02 7:48 UTC (permalink / raw)
To: Yonghong Song, Andrii Nakryiko
Cc: Alexei Starovoitov, Arnd Bergmann, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor,
John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML,
clang-built-linux
On Tue, Jul 1, 2025, at 23:28, Yonghong Song wrote:
> On 7/1/25 1:45 PM, Andrii Nakryiko wrote:
>> On Tue, Jul 1, 2025 at 1:03 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>> On 6/23/25 2:32 PM, Alexei Starovoitov wrote:
>>>> On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote:
>>>>> From: Arnd Bergmann <arnd@arndb.de>
>>>
>>> I checked IR and found the following memory allocations which may contribute
>>> to excessive stack usage:
>>>
>>> attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 {
>>> entry:
>>> %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854
>>> %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855
>>> %cnt.i = alloca i32, align 4, !DIAssignID !19856
>>> %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857
>>> %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858
>>> %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859
>>> %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860
>>> %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861
>>> %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862
>>> %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863
>>> %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864
>>> %target_size.i = alloca i32, align 4, !DIAssignID !19865
>>> %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866
>>> %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867
>>> %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868
>>> %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869
>>> %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870
>>> %fd.i = alloca i32, align 4, !DIAssignID !19871
>>> %log_true_size = alloca i32, align 4, !DIAssignID !19872
>>> ...
>>>
>>> So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and
>>> can be converted to runtime allocation but that is not enough for 1280
>>> stack limit, we need to do more conversion from stack to memory
>>> allocation. Will try to have uniform way to convert
>>> 'alloca [<num> x %struct.bpf_insn]' to runtime allocation.
>>>
>> Do we need to go all the way to dynamic allocation? See env->insns_buf
>> (which some parts of this function are already using for constructing
>> instruction patch), let's just converge on that? It pre-allocates
>> space for 32 instructions, should be sufficient for all the use cases,
>> no?
>
> Makes sense. This is much better. Thanks!
I'm not sure if that actually helps on the old clang version, as far
as I understood it in my initial analysis, the problem in the
struct bpf_insn chk_and_sdiv[] = {
/* [R,W]x sdiv 0 -> 0
* LLONG_MIN sdiv -1 -> LLONG_MIN
* INT_MIN sdiv -1 -> INT_MIN
*/
BPF_MOV64_REG(BPF_REG_AX, insn->src_reg),
...
}
construct is not the chk_and_sdiv[] array itself but the
struct initializer in the BPF_MOV64_REG() macro that leads to
having two copies of the struct on the stack and then copying
between them. In gcc or clang-18+, these all get folded
into a single object on the stack.
(Disclaimer: I don't understand anything about how clang
actually works internally, the above is only speculation on
my side, based on the assembler output)
Arnd
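The two construction styles discussed in this message can be put side by side. This is a hedged, standalone sketch: `struct insn_sketch`, `SK_MOV64_REG()`, `SK_OP_MOV64_REG` and `SK_REG_AX` are hypothetical stand-ins modelled on the description above, not the kernel's definitions, and the code only shows that both styles yield the same instruction. Whether a given compiler keeps an extra stack copy for the compound literal is not something the sketch can demonstrate.

```c
#include <assert.h>
#include <stdint.h>

/* Standalone stand-in for struct bpf_insn (assumption: same 8-byte layout). */
struct insn_sketch {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Illustrative opcode and register values, assumed for this sketch. */
#define SK_OP_MOV64_REG	0xbf
#define SK_REG_AX	11

/* Compound-literal style: each use creates an unnamed temporary struct. */
#define SK_MOV64_REG(DST, SRC)				\
	((struct insn_sketch) {				\
		.code = SK_OP_MOV64_REG,		\
		.dst_reg = (DST),			\
		.src_reg = (SRC),			\
		.off = 0,				\
		.imm = 0 })

/* Style 1: a local array initialized from compound literals. The array is
 * on this function's stack; a compiler that does not fold the temporary
 * into the array element may keep both copies.
 */
static struct insn_sketch first_insn_from_local_array(uint8_t src_reg)
{
	struct insn_sketch patch[] = {
		SK_MOV64_REG(SK_REG_AX, src_reg),
	};

	return patch[0];
}

/* Style 2: store the fields directly into caller-provided storage, so the
 * function itself needs neither a local array nor a temporary struct.
 */
static void first_insn_into_buffer(struct insn_sketch *buf, uint8_t src_reg)
{
	assert(buf);
	buf[0].code = SK_OP_MOV64_REG;
	buf[0].dst_reg = SK_REG_AX;
	buf[0].src_reg = src_reg;
	buf[0].off = 0;
	buf[0].imm = 0;
}

/* Field-wise comparison of two instructions. */
static int same_insn(const struct insn_sketch *a, const struct insn_sketch *b)
{
	return a->code == b->code && a->dst_reg == b->dst_reg &&
	       a->src_reg == b->src_reg && a->off == b->off && a->imm == b->imm;
}
```

Comparing the assembler output of the two functions on an affected and an unaffected compiler would be one way to test the speculation about duplicated stack objects.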
* Re: [PATCH] bpf: turn off sanitizer in do_misc_fixups for old clang
2025-07-02 7:48 ` Arnd Bergmann
@ 2025-07-02 14:14 ` Yonghong Song
0 siblings, 0 replies; 7+ messages in thread
From: Yonghong Song @ 2025-07-02 14:14 UTC (permalink / raw)
To: Arnd Bergmann, Andrii Nakryiko
Cc: Alexei Starovoitov, Arnd Bergmann, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Nathan Chancellor,
John Fastabend, Martin KaFai Lau, Eduard Zingerman, Song Liu,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Nick Desaulniers, Bill Wendling, Justin Stitt,
Kumar Kartikeya Dwivedi, Luis Gerhorst, bpf, LKML,
clang-built-linux
On 7/2/25 12:48 AM, Arnd Bergmann wrote:
> On Tue, Jul 1, 2025, at 23:28, Yonghong Song wrote:
>> On 7/1/25 1:45 PM, Andrii Nakryiko wrote:
>>> On Tue, Jul 1, 2025 at 1:03 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> On 6/23/25 2:32 PM, Alexei Starovoitov wrote:
>>>>> On Fri, Jun 20, 2025 at 4:38 AM Arnd Bergmann <arnd@kernel.org> wrote:
>>>>>> From: Arnd Bergmann <arnd@arndb.de>
>>>> I checked IR and found the following memory allocations which may contribute
>>>> to excessive stack usage:
>>>>
>>>> attr.coerce1, i32 noundef %uattr_size) local_unnamed_addr #0 align 16 !dbg !19800 {
>>>> entry:
>>>> %zext_patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19854
>>>> %rnd_hi32_patch.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19855
>>>> %cnt.i = alloca i32, align 4, !DIAssignID !19856
>>>> %patch.i766 = alloca [3 x %struct.bpf_insn], align 16, !DIAssignID !19857
>>>> %chk_and_sdiv.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19858
>>>> %chk_and_smod.i = alloca [1 x %struct.bpf_insn], align 4, !DIAssignID !19859
>>>> %chk_and_div.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19860
>>>> %chk_and_mod.i = alloca [4 x %struct.bpf_insn], align 16, !DIAssignID !19861
>>>> %chk_and_sdiv343.i = alloca [8 x %struct.bpf_insn], align 16, !DIAssignID !19862
>>>> %chk_and_smod472.i = alloca [9 x %struct.bpf_insn], align 16, !DIAssignID !19863
>>>> %desc.i = alloca %struct.bpf_jit_poke_descriptor, align 8, !DIAssignID !19864
>>>> %target_size.i = alloca i32, align 4, !DIAssignID !19865
>>>> %patch.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19866
>>>> %patch355.i = alloca [2 x %struct.bpf_insn], align 16, !DIAssignID !19867
>>>> %ja.i = alloca %struct.bpf_insn, align 8, !DIAssignID !19868
>>>> %ret_insn.i.i = alloca [8 x i32], align 16, !DIAssignID !19869
>>>> %ret_prog.i.i = alloca [8 x i32], align 16, !DIAssignID !19870
>>>> %fd.i = alloca i32, align 4, !DIAssignID !19871
>>>> %log_true_size = alloca i32, align 4, !DIAssignID !19872
>>>> ...
>>>>
>>>> So yes, chk_and_{div,mod,sdiv,smod} consumes quite some stack and
>>>> can be converted to runtime allocation but that is not enough for 1280
>>>> stack limit, we need to do more conversion from stack to memory
>>>> allocation. Will try to have uniform way to convert
>>>> 'alloca [<num> x %struct.bpf_insn]' to runtime allocation.
>>>>
>>> Do we need to go all the way to dynamic allocation? See env->insns_buf
>>> (which some parts of this function are already using for constructing
>>> instruction patch), let's just converge on that? It pre-allocates
>>> space for 32 instructions, should be sufficient for all the use cases,
>>> no?
>> Makes sense. This is much better. Thanks!
> I'm not sure if that actually helps on the old clang version, as far
> as I understood it in my initial analysis, the problem in the
>
> struct bpf_insn chk_and_sdiv[] = {
> /* [R,W]x sdiv 0 -> 0
> * LLONG_MIN sdiv -1 -> LLONG_MIN
> * INT_MIN sdiv -1 -> INT_MIN
> */
> BPF_MOV64_REG(BPF_REG_AX, insn->src_reg),
> ...
> }
>
> construct is not the chk_and_sdiv[] array itself but the
> struct initializer in the BPF_MOV64_REG() macro that leads to
> having two copies of the struct on the stack and then copying
> between them. In gcc or clang-18+, these all get folded
> into a single object on the stack.
See https://lore.kernel.org/bpf/20250702053332.1991516-1-yonghong.song@linux.dev/.
The above 'struct bpf_insn chk_and_sdiv[] = { ... }' will be removed so
there will not be stack consumption any more for it. Instead, we use
the scratch space in bpf_verifier_env.
>
> (Disclaimer: I don't understand anything about how clang
> actually works internally, the above is only speculation on
> my side, based on the assembler output)
>
> Arnd