* riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
@ 2025-05-22 13:29 Naresh Kamboju
2025-05-22 16:48 ` Kent Overstreet
0 siblings, 1 reply; 14+ messages in thread
From: Naresh Kamboju @ 2025-05-22 13:29 UTC (permalink / raw)
To: linux-bcache, open list, lkft-triage, Linux Regressions
Cc: kent.overstreet, Arnd Bergmann, Dan Carpenter, Anders Roxell
Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
tag next-20250516 and next-20250522.
First seen on the next-20250516
Good: next-20250515
Bad: next-20250516
Regressions found on riscv:
- build/gcc-13-allyesconfig
Regression Analysis:
- New regression? Yes
- Reproducible? Yes
Build regression: riscv gcc-13 allyesconfig error the frame size of
2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
## Build log
fs/bcachefs/data_update.c: In function '__bch2_data_update_index_update':
fs/bcachefs/data_update.c:464:1: error: the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  464 | }
      | ^
cc1: all warnings being treated as errors
## Source
* Kernel version: 6.15.0-rc7
* Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
* Git sha: 460178e842c7a1e48a06df684c66eb5fd630bcf7
* Git describe: next-20250522
## Build
* Build log: https://qa-reports.linaro.org/api/testruns/28521854/log_file/
* Build history:
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250522/testrun/28521854/suite/build/test/gcc-13-allyesconfig/history/
* Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2xRoAAw5dl69AvvHb8oZ3pL1SFx/
* Kernel config:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2xRoAAw5dl69AvvHb8oZ3pL1SFx/config
--
Linaro LKFT
https://lkft.linaro.org
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-22 13:29 riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Naresh Kamboju
@ 2025-05-22 16:48 ` Kent Overstreet
2025-05-23 13:19 ` Naresh Kamboju
0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2025-05-22 16:48 UTC (permalink / raw)
To: Naresh Kamboju
Cc: linux-bcache, open list, lkft-triage, Linux Regressions, Arnd Bergmann, Dan Carpenter, Anders Roxell

On Thu, May 22, 2025 at 06:59:53PM +0530, Naresh Kamboju wrote:
> Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
> tag next-20250516 and next-20250522.
>
> First seen on the next-20250516
> Good: next-20250515
> Bad: next-20250516
>
> Regressions found on riscv:
> - build/gcc-13-allyesconfig
>
> Regression Analysis:
> - New regression? Yes
> - Reproducible? Yes
>
> Build regression: riscv gcc-13 allyesconfig error the frame size of
> 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

Is this a kmsan build? kmsan seems to inflate stack usage by quite a
lot.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-22 16:48 ` Kent Overstreet
@ 2025-05-23 13:19 ` Naresh Kamboju
2025-05-23 13:49 ` Arnd Bergmann
0 siblings, 1 reply; 14+ messages in thread
From: Naresh Kamboju @ 2025-05-23 13:19 UTC (permalink / raw)
To: Kent Overstreet
Cc: linux-bcache, open list, lkft-triage, Linux Regressions, Arnd Bergmann, Dan Carpenter, Anders Roxell

On Thu, 22 May 2025 at 22:18, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Thu, May 22, 2025 at 06:59:53PM +0530, Naresh Kamboju wrote:
> > Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
> > tag next-20250516 and next-20250522.
> >
> > First seen on the next-20250516
> > Good: next-20250515
> > Bad: next-20250516
> >
> > Regressions found on riscv:
> > - build/gcc-13-allyesconfig
> >
> > Regression Analysis:
> > - New regression? Yes
> > - Reproducible? Yes
> >
> > Build regression: riscv gcc-13 allyesconfig error the frame size of
> > 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
>
> Is this a kmsan build? kmsan seems to inflate stack usage by quite a
> lot.

This is an allyesconfig build, which has KASAN enabled.

CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
CONFIG_KASAN=y
CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
CONFIG_KASAN_GENERIC=y
# CONFIG_KASAN_OUTLINE is not set
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=y
CONFIG_KASAN_VMALLOC=y
CONFIG_KASAN_KUNIT_TEST=y
CONFIG_KASAN_EXTRA_INFO=y

- Naresh
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-23 13:19 ` Naresh Kamboju
@ 2025-05-23 13:49 ` Arnd Bergmann
2025-05-23 14:08 ` Kent Overstreet
0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2025-05-23 13:49 UTC (permalink / raw)
To: Naresh Kamboju, Kent Overstreet
Cc: linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
> On Thu, 22 May 2025 at 22:18, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>>
>> On Thu, May 22, 2025 at 06:59:53PM +0530, Naresh Kamboju wrote:
>> > Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
>> > tag next-20250516 and next-20250522.
>> >
>> > First seen on the next-20250516
>> > Good: next-20250515
>> > Bad: next-20250516
>> >
>> > Regressions found on riscv:
>> > - build/gcc-13-allyesconfig
>> >
>> > Regression Analysis:
>> > - New regression? Yes
>> > - Reproducible? Yes
>> >
>> > Build regression: riscv gcc-13 allyesconfig error the frame size of
>> > 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
>>
>> Is this a kmsan build? kmsan seems to inflate stack usage by quite a
>> lot.

KMSAN is currently a clang-only feature.

> This is an allyesconfig build, which has KASAN enabled.
>
> CONFIG_HAVE_ARCH_KASAN=y
> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> CONFIG_CC_HAS_KASAN_GENERIC=y
> CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
> CONFIG_KASAN=y
> CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
> CONFIG_KASAN_GENERIC=y
> # CONFIG_KASAN_OUTLINE is not set
> CONFIG_KASAN_INLINE=y
> CONFIG_KASAN_STACK=y

I reproduced the problem locally and found this to go down to
1440 bytes after I turn off KASAN_STACK. next-20250523 has
some changes that take the number down further to 1136 without
KASAN_STACK, or 1552 with KASAN_STACK.

I've turned bcachefs with kasan-stack on for my randconfig
builds again to see if there are any remaining corner cases.

Arnd
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-23 13:49 ` Arnd Bergmann
@ 2025-05-23 14:08 ` Kent Overstreet
2025-05-23 15:17 ` Arnd Bergmann
0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2025-05-23 14:08 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025 at 03:49:54PM +0200, Arnd Bergmann wrote:
> On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
> > On Thu, 22 May 2025 at 22:18, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >>
> >> On Thu, May 22, 2025 at 06:59:53PM +0530, Naresh Kamboju wrote:
> >> > Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
> >> > tag next-20250516 and next-20250522.
> >> >
> >> > First seen on the next-20250516
> >> > Good: next-20250515
> >> > Bad: next-20250516
> >> >
> >> > Regressions found on riscv:
> >> > - build/gcc-13-allyesconfig
> >> >
> >> > Regression Analysis:
> >> > - New regression? Yes
> >> > - Reproducible? Yes
> >> >
> >> > Build regression: riscv gcc-13 allyesconfig error the frame size of
> >> > 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
> >>
> >> Is this a kmsan build? kmsan seems to inflate stack usage by quite a
> >> lot.
>
> KMSAN is currently a clang-only feature.
>
> > This is an allyesconfig build, which has KASAN enabled.
> >
> > CONFIG_HAVE_ARCH_KASAN=y
> > CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> > CONFIG_CC_HAS_KASAN_GENERIC=y
> > CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
> > CONFIG_KASAN=y
> > CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
> > CONFIG_KASAN_GENERIC=y
> > # CONFIG_KASAN_OUTLINE is not set
> > CONFIG_KASAN_INLINE=y
> > CONFIG_KASAN_STACK=y
>
> I reproduced the problem locally and found this to go down to
> 1440 bytes after I turn off KASAN_STACK. next-20250523 has
> some changes that take the number down further to 1136 without
> KASAN_STACK, or 1552 with KASAN_STACK.
>
> I've turned bcachefs with kasan-stack on for my randconfig
> builds again to see if there are any remaining corner cases.

Thanks for the numbers - that does still seem high, I'll have to have a
look with pahole.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-23 14:08 ` Kent Overstreet
@ 2025-05-23 15:17 ` Arnd Bergmann
2025-05-23 17:11 ` Kent Overstreet
0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2025-05-23 15:17 UTC (permalink / raw)
To: Kent Overstreet
Cc: Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025, at 16:08, Kent Overstreet wrote:
> On Fri, May 23, 2025 at 03:49:54PM +0200, Arnd Bergmann wrote:
>> On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
>
>> I reproduced the problem locally and found this to go down to
>> 1440 bytes after I turn off KASAN_STACK. next-20250523 has
>> some changes that take the number down further to 1136 without
>> KASAN_STACK, or 1552 with KASAN_STACK.
>>
>> I've turned bcachefs with kasan-stack on for my randconfig
>> builds again to see if there are any remaining corner cases.
>
> Thanks for the numbers - that does still seem high, I'll have to have a
> look with pahole.

I agree it's still larger than it should be: having more than
a few hundred bytes on a function usually means that there is
both the risk for actual overflow and general inefficiency if
all the stack data gets accessed as well.

It's probably not actually structure data though, but a
combination of effects:

- KASAN_STACK adds extra redzones for each variable
- KASAN_STACK further prevents stack slots from getting
  reused inside one function, in order to better pinpoint
  which instance caused problems like out-of-scope access
- passing structures by value causes them to be put on
  the stack on some architectures, even when the structure
  size is only one or two registers
- sanitizers turn off optimizations that lead to better
  stack usage
- in some cases, the missed optimization ends up causing
  local variables to get spilled to the stack many times
  because of a combination of all the above.

The good news is that so far my randconfig builds have not
shown any more stack frame warnings on next-20250523 with
bcachefs force-enabled, now 55 builds into the change,
across arm32/arm64/x86 using gcc-15.1.

Arnd
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-23 15:17 ` Arnd Bergmann
@ 2025-05-23 17:11 ` Kent Overstreet
2025-05-23 18:01 ` Arnd Bergmann
0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2025-05-23 17:11 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:
> On Fri, May 23, 2025, at 16:08, Kent Overstreet wrote:
> > On Fri, May 23, 2025 at 03:49:54PM +0200, Arnd Bergmann wrote:
> >> On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
> >
> >> I reproduced the problem locally and found this to go down to
> >> 1440 bytes after I turn off KASAN_STACK. next-20250523 has
> >> some changes that take the number down further to 1136 without
> >> KASAN_STACK, or 1552 with KASAN_STACK.
> >>
> >> I've turned bcachefs with kasan-stack on for my randconfig
> >> builds again to see if there are any remaining corner cases.
> >
> > Thanks for the numbers - that does still seem high, I'll have to have a
> > look with pahole.
>
> I agree it's still larger than it should be: having more than
> a few hundred bytes on a function usually means that there is
> both the risk for actual overflow and general inefficiency if
> all the stack data gets accessed as well.
>
> It's probably not actually structure data though, but a
> combination of effects:
>
> - KASAN_STACK adds extra redzones for each variable
> - KASAN_STACK further prevents stack slots from getting
>   reused inside one function, in order to better pinpoint
>   which instance caused problems like out-of-scope access
> - passing structures by value causes them to be put on
>   the stack on some architectures, even when the structure
>   size is only one or two registers

We mainly do this with bkey_s_c, which is just two words: on x86_64,
that gets passed in registers. Is riscv different?

> - sanitizers turn off optimizations that lead to better
>   stack usage
> - in some cases, the missed optimization ends up causing
>   local variables to get spilled to the stack many times
>   because of a combination of all the above.

Yeesh.

I suspect we should be running with a larger stack when the sanitizers
are running, and perhaps tweak the warnings accordingly. I did a bunch
of stack usage work after I found a kmsan build was blowing out the
stack, but then running with max stack usage tracing enabled showed it
to be a largely non issue on non-sanitizer builds, IIRC.

> The good news is that so far my randconfig builds have not
> shown any more stack frame warnings on next-20250523 with
> bcachefs force-enabled, now 55 builds into the change,
> across arm32/arm64/x86 using gcc-15.1.

Good to know, thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
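To make the by-value question concrete, here is a minimal sketch, assuming a hypothetical two-pointer struct (`struct key_ref`, standing in for something shaped like the two-word bkey_s_c mentioned above; the field names are not taken from bcachefs). Whether the by-value form travels in registers or through a stack copy depends on the target ABI and on sanitizer settings, so the sketch only shows the two calling styles side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-word handle; the field names are illustrative only. */
struct key_ref {
	const unsigned char *key;
	const unsigned char *val;
};

/* Two machine words, the size under discussion in the thread. */
_Static_assert(sizeof(struct key_ref) == 2 * sizeof(void *),
	       "key_ref is expected to be exactly two pointers");

/* By value: the ABI decides whether this uses registers or a stack copy. */
static ptrdiff_t span_by_value(struct key_ref r)
{
	return r.val - r.key;
}

/* By pointer: one argument register, but the caller's struct must live in memory. */
static ptrdiff_t span_by_pointer(const struct key_ref *r)
{
	assert(r != NULL);
	return r->val - r->key;
}
```

Compiling a file like this with GCC's `-fstack-usage` option writes a `.su` file with per-function frame sizes, which is one way to compare the two styles on a given target and configuration.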
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-23 17:11 ` Kent Overstreet
@ 2025-05-23 18:01 ` Arnd Bergmann
2025-05-25 17:18 ` David Laight
0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2025-05-23 18:01 UTC (permalink / raw)
To: Kent Overstreet
Cc: Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025, at 19:11, Kent Overstreet wrote:
> On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:
>>
>> - KASAN_STACK adds extra redzones for each variable
>> - KASAN_STACK further prevents stack slots from getting
>>   reused inside one function, in order to better pinpoint
>>   which instance caused problems like out-of-scope access
>> - passing structures by value causes them to be put on
>>   the stack on some architectures, even when the structure
>>   size is only one or two registers
>
> We mainly do this with bkey_s_c, which is just two words: on x86_64,
> that gets passed in registers. Is riscv different?

Not sure, I think it's mostly older ABIs that are limited,
either not passing structures in registers at all, or only
possibly one but not two of them.

>> - sanitizers turn off optimizations that lead to better
>>   stack usage
>> - in some cases, the missed optimization ends up causing
>>   local variables to get spilled to the stack many times
>>   because of a combination of all the above.
>
> Yeesh.
>
> I suspect we should be running with a larger stack when the sanitizers
> are running, and perhaps tweak the warnings accordingly. I did a bunch
> of stack usage work after I found a kmsan build was blowing out the
> stack, but then running with max stack usage tracing enabled showed it
> to be a largely non issue on non-sanitizer builds, IIRC.

Enabling KASAN does double the available stack space. However, I don't
think we should use that as an excuse to raise the per-function
warning limit, because

- the majority of all function stacks do not grow that much when
  sanitizers are enabled
- allmodconfig enables KASAN and should still catch mistakes
  where a driver accidentally puts a large structure on the stack
- 2KB on 64-bit targets is a really large limit. At some point
  in the past I had a series that lowered the limit to 1536 bytes
  for 64-bit targets, but I never managed to get all the changes
  merged.

Arnd
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-23 18:01 ` Arnd Bergmann
@ 2025-05-25 17:18 ` David Laight
2025-05-25 17:36 ` Kent Overstreet
0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2025-05-25 17:18 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Kent Overstreet, Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, 23 May 2025 20:01:33 +0200
"Arnd Bergmann" <arnd@arndb.de> wrote:

> On Fri, May 23, 2025, at 19:11, Kent Overstreet wrote:
> > On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:
> >>
> >> - KASAN_STACK adds extra redzones for each variable
> >> - KASAN_STACK further prevents stack slots from getting
> >>   reused inside one function, in order to better pinpoint
> >>   which instance caused problems like out-of-scope access
> >> - passing structures by value causes them to be put on
> >>   the stack on some architectures, even when the structure
> >>   size is only one or two registers
> >
> > We mainly do this with bkey_s_c, which is just two words: on x86_64,
> > that gets passed in registers. Is riscv different?
>
> Not sure, I think it's mostly older ABIs that are limited,
> either not passing structures in registers at all, or only
> possibly one but not two of them.
>
> >> - sanitizers turn off optimizations that lead to better
> >>   stack usage
> >> - in some cases, the missed optimization ends up causing
> >>   local variables to get spilled to the stack many times
> >>   because of a combination of all the above.
> >
> > Yeesh.
> >
> > I suspect we should be running with a larger stack when the sanitizers
> > are running, and perhaps tweak the warnings accordingly. I did a bunch
> > of stack usage work after I found a kmsan build was blowing out the
> > stack, but then running with max stack usage tracing enabled showed it
> > to be a largely non issue on non-sanitizer builds, IIRC.
>
> Enabling KASAN does double the available stack space. However, I don't
> think we should use that as an excuse to raise the per-function
> warning limit, because
>
> - the majority of all function stacks do not grow that much when
>   sanitizers are enabled
> - allmodconfig enables KASAN and should still catch mistakes
>   where a driver accidentally puts a large structure on the stack

That is rather annoying when you want to look at the generated code :-(

> - 2KB on 64-bit targets is a really large limit. At some point
>   in the past I had a series that lowered the limit to 1536 bytes
>   for 64-bit targets, but I never managed to get all the changes
>   merged.

I've a cunning plan to do a proper static analysis of stack usage.
It is a 'simple' matter of getting objtool to output all calls with
the stack offset.
Indirect calls need the function hashes from fine-ibt, but also need
clang to support 'hash seeds' to disambiguate all the void (*)(void *)
functions.
That'll first barf at all recursion, and then, I expect, show a massive
stack use inside snprintf() in some error path.

Just need a big stack of 'round tuits'.

David

>
>
> Arnd
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-25 17:18 ` David Laight
@ 2025-05-25 17:36 ` Kent Overstreet
2025-05-25 17:47 ` David Laight
2025-05-25 19:25 ` Steven Rostedt
0 siblings, 2 replies; 14+ messages in thread
From: Kent Overstreet @ 2025-05-25 17:36 UTC (permalink / raw)
To: David Laight
Cc: Arnd Bergmann, Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell, Steven Rostedt

+cc Steve

On Sun, May 25, 2025 at 06:18:42PM +0100, David Laight wrote:
> On Fri, 23 May 2025 20:01:33 +0200
> "Arnd Bergmann" <arnd@arndb.de> wrote:
>
> > On Fri, May 23, 2025, at 19:11, Kent Overstreet wrote:
> > > On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:
> > >>
> > >> - KASAN_STACK adds extra redzones for each variable
> > >> - KASAN_STACK further prevents stack slots from getting
> > >>   reused inside one function, in order to better pinpoint
> > >>   which instance caused problems like out-of-scope access
> > >> - passing structures by value causes them to be put on
> > >>   the stack on some architectures, even when the structure
> > >>   size is only one or two registers
> > >
> > > We mainly do this with bkey_s_c, which is just two words: on x86_64,
> > > that gets passed in registers. Is riscv different?
> >
> > Not sure, I think it's mostly older ABIs that are limited,
> > either not passing structures in registers at all, or only
> > possibly one but not two of them.
> >
> > >> - sanitizers turn off optimizations that lead to better
> > >>   stack usage
> > >> - in some cases, the missed optimization ends up causing
> > >>   local variables to get spilled to the stack many times
> > >>   because of a combination of all the above.
> > >
> > > Yeesh.
> > >
> > > I suspect we should be running with a larger stack when the sanitizers
> > > are running, and perhaps tweak the warnings accordingly. I did a bunch
> > > of stack usage work after I found a kmsan build was blowing out the
> > > stack, but then running with max stack usage tracing enabled showed it
> > > to be a largely non issue on non-sanitizer builds, IIRC.
> >
> > Enabling KASAN does double the available stack space. However, I don't
> > think we should use that as an excuse to raise the per-function
> > warning limit, because
> >
> > - the majority of all function stacks do not grow that much when
> >   sanitizers are enabled
> > - allmodconfig enables KASAN and should still catch mistakes
> >   where a driver accidentally puts a large structure on the stack
>
> That is rather annoying when you want to look at the generated code :-(
>
> > - 2KB on 64-bit targets is a really large limit. At some point
> >   in the past I had a series that lowered the limit to 1536 bytes
> >   for 64-bit targets, but I never managed to get all the changes
> >   merged.
>
> I've a cunning plan to do a proper static analysis of stack usage.
> It is a 'simple' matter of getting objtool to output all calls with
> the stack offset.
> Indirect calls need the function hashes from fine-ibt, but also need
> clang to support 'hash seeds' to disambiguate all the void (*)(void *)
> functions.
> That'll first barf at all recursion, and then, I expect, show a massive
> stack use inside snprintf() in some error path.

I suspect recursion will make the results you get with that approach
useless.

We already have "trace max stack", but that only checks at process exit,
so it doesn't tell you much.

We could do better with tracing - just inject a trampoline that checks
the current stack usage against the maximum stack usage we've seen, and
emits a trace event with a stack trace if it's greater.

(and now Steve's going to tell us he's already done this :)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-25 17:36 ` Kent Overstreet
@ 2025-05-25 17:47 ` David Laight
2025-05-25 18:10 ` Kent Overstreet
2025-05-25 19:25 ` Steven Rostedt
1 sibling, 1 reply; 14+ messages in thread
From: David Laight @ 2025-05-25 17:47 UTC (permalink / raw)
To: Kent Overstreet
Cc: Arnd Bergmann, Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell, Steven Rostedt

On Sun, 25 May 2025 13:36:16 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:

> +cc Steve
...
> > I've a cunning plan to do a proper static analysis of stack usage.
> > It is a 'simple' matter of getting objtool to output all calls with
> > the stack offset.
> > Indirect calls need the function hashes from fine-ibt, but also need
> > clang to support 'hash seeds' to disambiguate all the void (*)(void *)
> > functions.
> > That'll first barf at all recursion, and then, I expect, show a massive
> > stack use inside snprintf() in some error path.
>
> I suspect recursion will make the results you get with that approach
> useless.

Recursion is an issue, but the kernel really doesn't support recursion.
So you actually want to know the possible recursion loops anyway.
I suspect (hope) most will be the 'recurses only once' type.
If not they need some other bound.

> We already have "trace max stack", but that only checks at process exit,
> so it doesn't tell you much.
>
> We could do better with tracing - just inject a trampoline that checks
> the current stack usage against the maximum stack usage we've seen, and
> emits a trace event with a stack trace if it's greater.

Both of those only tell you the stack you've used.
The static analysis will show you the stack 'you might use'.
Which is really much more important.

I did this for an embedded system a long time ago.
The outcome was that we didn't have enough memory to allocate the
'worst case' stacks!

David
^ permalink raw reply [flat|nested] 14+ messages in thread
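The static analysis described above can be pictured as a longest-path walk over the call graph. The following is a minimal sketch under stated assumptions, not the objtool-based tool proposed in the thread: the `struct func` table, `worst_case_stack()` and the demo graphs are all hypothetical, frame sizes are made-up numbers, and indirect calls are ignored. It returns the worst-case stack bytes from a root function, or -1 when a recursion loop is reachable and some other bound would be needed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS   8
#define MAX_CALLEES 4

/* Hypothetical call-graph node: own frame size plus direct callees. */
struct func {
	const char *name;
	long frame;                /* bytes this function puts on the stack */
	int callees[MAX_CALLEES];  /* indices into the function table */
	size_t ncallees;
};

enum visit { UNVISITED = 0, IN_PROGRESS, DONE };

/* Depth-first walk: worst-case bytes from f downward, or -1 on recursion. */
static long walk(const struct func *tbl, int f, enum visit *state, long *memo)
{
	if (state[f] == IN_PROGRESS)
		return -1;         /* f is already on the current path: a loop */
	if (state[f] == DONE)
		return memo[f];

	state[f] = IN_PROGRESS;
	long deepest = 0;
	for (size_t i = 0; i < tbl[f].ncallees; i++) {
		long d = walk(tbl, tbl[f].callees[i], state, memo);
		if (d < 0) {
			deepest = -1;  /* a loop is reachable from f */
			break;
		}
		if (d > deepest)
			deepest = d;
	}
	memo[f] = deepest < 0 ? -1 : tbl[f].frame + deepest;
	state[f] = DONE;
	return memo[f];
}

static long worst_case_stack(const struct func *tbl, size_t n, int root)
{
	enum visit state[MAX_FUNCS] = { UNVISITED };
	long memo[MAX_FUNCS] = { 0 };

	assert(n <= MAX_FUNCS && root >= 0 && (size_t)root < n);
	return walk(tbl, root, state, memo);
}

/* Made-up graph: a() calls b() and c(); c() calls b(). */
static const struct func demo_tree[] = {
	{ "a", 100, { 1, 2 }, 2 },
	{ "b",  50, { 0 },    0 },
	{ "c", 200, { 1 },    1 },
};

/* Made-up recursion: x() calls y(), y() calls x(). */
static const struct func demo_loop[] = {
	{ "x", 64, { 1 }, 1 },
	{ "y", 64, { 0 }, 1 },
};
```

With these tables, `worst_case_stack(demo_tree, 3, 0)` follows a → c → b for 100 + 200 + 50 bytes, while `worst_case_stack(demo_loop, 2, 0)` reports -1, which is the "barf at all recursion" case.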
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-25 17:47 ` David Laight
@ 2025-05-25 18:10 ` Kent Overstreet
0 siblings, 0 replies; 14+ messages in thread
From: Kent Overstreet @ 2025-05-25 18:10 UTC (permalink / raw)
To: David Laight
Cc: Arnd Bergmann, Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell, Steven Rostedt

On Sun, May 25, 2025 at 06:47:57PM +0100, David Laight wrote:
> On Sun, 25 May 2025 13:36:16 -0400
> Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> > +cc Steve
> ...
> > > I've a cunning plan to do a proper static analysis of stack usage.
> > > It is a 'simple' matter of getting objtool to output all calls with
> > > the stack offset.
> > > Indirect calls need the function hashes from fine-ibt, but also need
> > > clang to support 'hash seeds' to disambiguate all the void (*)(void *)
> > > functions.
> > > That'll first barf at all recursion, and then, I expect, show a massive
> > > stack use inside snprintf() in some error path.
> >
> > I suspect recursion will make the results you get with that approach
> > useless.
>
> Recursion is an issue, but the kernel really doesn't support recursion.
> So you actually want to know the possible recursion loops anyway.
> I suspect (hope) most will be the 'recurses only once' type.
> If not they need some other bound.

Recursion is a fact of life when you get different subsystems
interacting in unpredictable ways.

You can be in one filesystem, and then end up in a fault handler (gup(),
or a simple copy to/from user), and then end up in a completely
different filesystem - and then you call into the block layer, or
networking if it's NFS.

Static analysis might get you some useful data within a subsystem, but
it won't tell you much about the kernel as a whole as people are
actually running it.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-25 17:36 ` Kent Overstreet
2025-05-25 17:47 ` David Laight
@ 2025-05-25 19:25 ` Steven Rostedt
2025-05-25 20:04 ` Kent Overstreet
1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2025-05-25 19:25 UTC (permalink / raw)
To: Kent Overstreet
Cc: David Laight, Arnd Bergmann, Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Sun, 25 May 2025 13:36:16 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:

> We already have "trace max stack", but that only checks at process exit,
> so it doesn't tell you much.

Nope, it traces the stack at every function call, but it misses the leaf
functions and also doesn't check interrupts as they may use a different
stack.

>
> We could do better with tracing - just inject a trampoline that checks
> the current stack usage against the maximum stack usage we've seen, and
> emits a trace event with a stack trace if it's greater.
>
> (and now Steve's going to tell us he's already done this :)

Close ;-)

 # echo 1 > /proc/sys/kernel/stack_tracer_enabled

Wait.

 # cat /sys/kernel/tracing/stack_trace
        Depth    Size   Location    (33 entries)
        -----    ----   --------
  0)     8360      48   __msecs_to_jiffies+0x9/0x30
  1)     8312     104   update_group_capacity+0x95/0x970
  2)     8208     520   update_sd_lb_stats.constprop.0+0x278/0x2f40
  3)     7688     416   sched_balance_find_src_group+0x96/0xe30
  4)     7272     512   sched_balance_rq+0x53f/0x2fe0
  5)     6760     344   sched_balance_newidle+0x6c1/0x1310
  6)     6416      80   pick_next_task_fair+0x55/0xe60
  7)     6336     328   __schedule+0x8a5/0x33d0
  8)     6008      32   schedule+0xe2/0x3b0
  9)     5976      32   io_schedule+0x8f/0xf0
 10)     5944     264   rq_qos_wait+0x12a/0x200
 11)     5680     144   wbt_wait+0x159/0x260
 12)     5536      40   __rq_qos_throttle+0x50/0x90
 13)     5496     320   blk_mq_submit_bio+0x70b/0x1ff0
 14)     5176     240   __submit_bio+0x1b3/0x600
 15)     4936     248   submit_bio_noacct_nocheck+0x546/0xca0
 16)     4688     144   ext4_bio_write_folio+0x69d/0x1870
 17)     4544      64   mpage_submit_folio+0x14c/0x2b0
 18)     4480      96   mpage_process_page_bufs+0x392/0x7a0
 19)     4384     632   mpage_prepare_extent_to_map+0xa5b/0x1080
 20)     3752     496   ext4_do_writepages+0x8af/0x2ee0
 21)     3256     304   ext4_writepages+0x26f/0x5c0
 22)     2952     344   do_writepages+0x183/0x7c0
 23)     2608     152   __writeback_single_inode+0x114/0xb00
 24)     2456     744   writeback_sb_inodes+0x52b/0xdf0
 25)     1712     168   __writeback_inodes_wb+0xf4/0x270
 26)     1544     312   wb_writeback+0x547/0x800
 27)     1232     328   wb_workfn+0x7b1/0xbc0
 28)      904     352   process_one_work+0x85a/0x1450
 29)      552     176   worker_thread+0x5b7/0xf80
 30)      376     168   kthread+0x371/0x720
 31)      208      32   ret_from_fork+0x34/0x70
 32)      176     176   ret_from_fork_asm+0x1a/0x30

The code that does this is in kernel/trace/trace_stack.c

It simply attaches to the function tracer and at every function checks the
current stack size.

Hmm, I need to update this because today we even pass the stack pointer via
the ftrace_regs if the arch supports it. Using that would allow me to get
rid of the hack:

static void check_stack(unsigned long ip, unsigned long *stack)
{
[..]
	this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
	this_size = THREAD_SIZE - this_size;
[..]

static void
stack_trace_call(unsigned long ip, unsigned long parent_ip,
		 struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	unsigned long stack;
[..]
	check_stack(ip, &stack);

-- Steve
^ permalink raw reply [flat|nested] 14+ messages in thread
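The two lines quoted from check_stack() rely on the stack being THREAD_SIZE-aligned and growing down: the low bits of an address inside the stack give its offset from the bottom of the region, so the used portion is THREAD_SIZE minus that offset. A minimal userspace sketch of that arithmetic follows, together with the "is this a new maximum?" check discussed earlier in the thread. `DEMO_THREAD_SIZE`, `stack_bytes_used()`, `update_max()` and `max_seen` are hypothetical names for illustration, not kernel API, and the 16 KiB size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's THREAD_SIZE (assumed 16 KiB here). */
#define DEMO_THREAD_SIZE 16384UL

/*
 * Bytes in use on a downward-growing stack whose base is aligned to
 * thread_size: mask out the offset from the bottom of the region and
 * subtract it from the region size.
 */
static unsigned long stack_bytes_used(uintptr_t sp, unsigned long thread_size)
{
	/* The mask only works for power-of-two sizes. */
	assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);

	unsigned long offset = (unsigned long)(sp & (thread_size - 1));
	return thread_size - offset;
}

/* High-water mark seen so far (hypothetical global for the sketch). */
static unsigned long max_seen;

/* Returns 1 when `used` sets a new maximum, the point where a tracer would save a backtrace. */
static int update_max(unsigned long used)
{
	if (used <= max_seen)
		return 0;
	max_seen = used;
	return 1;
}
```

For example, with a region starting at the made-up address 0x10000, an address 8024 bytes above the bottom gives 16384 - 8024 = 8360 bytes in use.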
* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
2025-05-25 19:25 ` Steven Rostedt
@ 2025-05-25 20:04 ` Kent Overstreet
0 siblings, 0 replies; 14+ messages in thread
From: Kent Overstreet @ 2025-05-25 20:04 UTC (permalink / raw)
To: Steven Rostedt
Cc: David Laight, Arnd Bergmann, Naresh Kamboju, linux-bcache, open list, lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Sun, May 25, 2025 at 03:25:02PM -0400, Steven Rostedt wrote:
> On Sun, 25 May 2025 13:36:16 -0400
> Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> > We already have "trace max stack", but that only checks at process exit,
> > so it doesn't tell you much.
>
> Nope, it traces the stack at every function call, but it misses the leaf
> functions and also doesn't check interrupts as they may use a different
> stack.

I was thinking of DEBUG_STACK_USAGE :)

> > We could do better with tracing - just inject a trampoline that checks
> > the current stack usage against the maximum stack usage we've seen, and
> > emits a trace event with a stack trace if it's greater.
> >
> > (and now Steve's going to tell us he's already done this :)
>
> Close ;-)
>
> # echo 1 > /proc/sys/kernel/stack_tracer_enabled
>
> Wait.
>
> # cat /sys/kernel/tracing/stack_trace
>         Depth    Size   Location    (33 entries)
>         -----    ----   --------
>   0)     8360      48   __msecs_to_jiffies+0x9/0x30
>   1)     8312     104   update_group_capacity+0x95/0x970
>   2)     8208     520   update_sd_lb_stats.constprop.0+0x278/0x2f40
>   3)     7688     416   sched_balance_find_src_group+0x96/0xe30
>   4)     7272     512   sched_balance_rq+0x53f/0x2fe0
>   5)     6760     344   sched_balance_newidle+0x6c1/0x1310
>   6)     6416      80   pick_next_task_fair+0x55/0xe60
>   7)     6336     328   __schedule+0x8a5/0x33d0
>   8)     6008      32   schedule+0xe2/0x3b0
>   9)     5976      32   io_schedule+0x8f/0xf0
>  10)     5944     264   rq_qos_wait+0x12a/0x200
>  11)     5680     144   wbt_wait+0x159/0x260
>  12)     5536      40   __rq_qos_throttle+0x50/0x90
>  13)     5496     320   blk_mq_submit_bio+0x70b/0x1ff0
>  14)     5176     240   __submit_bio+0x1b3/0x600
>  15)     4936     248   submit_bio_noacct_nocheck+0x546/0xca0
>  16)     4688     144   ext4_bio_write_folio+0x69d/0x1870
>  17)     4544      64   mpage_submit_folio+0x14c/0x2b0
>  18)     4480      96   mpage_process_page_bufs+0x392/0x7a0
>  19)     4384     632   mpage_prepare_extent_to_map+0xa5b/0x1080
>  20)     3752     496   ext4_do_writepages+0x8af/0x2ee0
>  21)     3256     304   ext4_writepages+0x26f/0x5c0
>  22)     2952     344   do_writepages+0x183/0x7c0
>  23)     2608     152   __writeback_single_inode+0x114/0xb00
>  24)     2456     744   writeback_sb_inodes+0x52b/0xdf0
>  25)     1712     168   __writeback_inodes_wb+0xf4/0x270
>  26)     1544     312   wb_writeback+0x547/0x800
>  27)     1232     328   wb_workfn+0x7b1/0xbc0
>  28)      904     352   process_one_work+0x85a/0x1450
>  29)      552     176   worker_thread+0x5b7/0xf80
>  30)      376     168   kthread+0x371/0x720
>  31)      208      32   ret_from_fork+0x34/0x70
>  32)      176     176   ret_from_fork_asm+0x1a/0x30

Nice!
This is exactly what I was looking for :)

        Depth    Size   Location    (48 entries)
        -----    ----   --------
  0)     7728      48   __update_load_avg_se+0x9/0x440
  1)     7680      80   update_load_avg+0x25f/0x2b0
  2)     7600      56   set_next_task_fair+0x232/0x290
  3)     7544      48   pick_next_task_fair+0xcf/0x1a0
  4)     7496     120   __schedule+0x284/0xe80
  5)     7376      16   preempt_schedule_irq+0x33/0x50
  6)     7360     136   asm_common_interrupt+0x26/0x40
  7)     7224      48   get_symbol_offset+0x43/0x70
  8)     7176      56   kallsyms_lookup_buildid+0x55/0xf0
  9)     7120      88   __sprint_symbol.isra.0+0x48/0xf0
 10)     7032     720   symbol_string+0xf1/0x120
 11)     6312     120   vsnprintf+0x3dc/0x5d0
 12)     6192     128   bch2_prt_printf+0x57/0x140
 13)     6064      64   bch2_prt_task_backtrace+0x71/0xc0
 14)     6000      40   print_cycle+0x71/0xa0
 15)     5960     104   trace_would_deadlock+0xb6/0x150
 16)     5856     128   break_cycle+0xfe/0x260
 17)     5728     368   bch2_check_for_deadlock+0x35f/0x5f0
 18)     5360      96   six_lock_slowpath.isra.0+0x204/0x4c0
 19)     5264      96   __bch2_btree_node_get+0x384/0x5b0
 20)     5168     336   bch2_btree_path_traverse_one+0x7a5/0xd60
 21)     4832     232   bch2_btree_iter_peek_slot+0x104/0x7f0
 22)     4600     216   btree_key_cache_fill+0xcf/0x1a0
 23)     4384      72   bch2_btree_path_traverse_cached+0x2bd/0x310
 24)     4312     336   bch2_btree_path_traverse_one+0x705/0xd60
 25)     3976     232   bch2_btree_iter_peek_slot+0x104/0x7f0
 26)     3744     424   bch2_check_discard_freespace_key+0x172/0x5e0
 27)     3320     224   bch2_bucket_alloc_freelist+0x422/0x610
 28)     3096      88   bch2_bucket_alloc_trans+0x1f3/0x3a0
 29)     3008     168   bch2_bucket_alloc_set_trans+0xf1/0x360
 30)     2840     184   __open_bucket_add_buckets+0x40b/0x660
 31)     2656      40   open_bucket_add_buckets+0x72/0xf0
 32)     2616     280   bch2_alloc_sectors_start_trans+0x76d/0xd00
 33)     2336     424   __bch2_write+0x1d1/0x11d0
 34)     1912     168   __bch2_writepage+0x3b2/0x790
 35)     1744      72   write_cache_pages+0x5c/0xa0
 36)     1672     176   bch2_writepages+0x67/0xc0
 37)     1496     184   do_writepages+0xcc/0x240
 38)     1312      64   __writeback_single_inode+0x41/0x320
 39)     1248     456   writeback_sb_inodes+0x216/0x4e0
 40)      792      64   __writeback_inodes_wb+0x4c/0xe0
 41)      728     168   wb_writeback+0x19c/0x310
 42)      560     136   wb_workfn+0x2a4/0x400
 43)      424      64   process_one_work+0x18c/0x330
 44)      360      72   worker_thread+0x252/0x3a0
 45)      288      80   kthread+0xf9/0x210
 46)      208      32   ret_from_fork+0x31/0x50
 47)      176     176   ret_from_fork_asm+0x11/0x20

^ permalink raw reply [flat|nested] 14+ messages in thread
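In both traces in this thread the columns appear to be related in a simple way: each Depth is that row's Size plus the Depth of the row beneath it, so Depth is a running total of frame sizes counted up from the bottom of the stack. The sketch below checks that relation on the last five rows of the 48-entry trace. The struct and function names are hypothetical and used only for illustration; they are not part of the stack tracer.

```c
#include <stddef.h>

/* One row of the table: cumulative depth and per-frame size, in bytes. */
struct stack_row {
	unsigned int depth;
	unsigned int size;
};

/*
 * Return 1 if every row's depth equals its own size plus the depth of the
 * row below it (the next array element). The last row has nothing below it,
 * so its depth must equal its size. Return 0 otherwise.
 */
static int rows_are_consistent(const struct stack_row *rows, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int below = (i + 1 < n) ? rows[i + 1].depth : 0;

		if (rows[i].depth != rows[i].size + below)
			return 0;
	}
	return 1;
}

/* Entries 43..47 of the 48-entry trace, copied from the table. */
static const struct stack_row sample_rows[] = {
	{ 424,  64 },	/* process_one_work */
	{ 360,  72 },	/* worker_thread */
	{ 288,  80 },	/* kthread */
	{ 208,  32 },	/* ret_from_fork */
	{ 176, 176 },	/* ret_from_fork_asm */
};
```

Read this way, the Size column shows where the stack goes: symbol_string at 720 bytes is the largest single frame in the 48-entry trace.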
end of thread, other threads:[~2025-05-25 20:05 UTC | newest]

Thread overview: 14+ messages
2025-05-22 13:29 riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Naresh Kamboju
2025-05-22 16:48 ` Kent Overstreet
2025-05-23 13:19 ` Naresh Kamboju
2025-05-23 13:49 ` Arnd Bergmann
2025-05-23 14:08 ` Kent Overstreet
2025-05-23 15:17 ` Arnd Bergmann
2025-05-23 17:11 ` Kent Overstreet
2025-05-23 18:01 ` Arnd Bergmann
2025-05-25 17:18 ` David Laight
2025-05-25 17:36 ` Kent Overstreet
2025-05-25 17:47 ` David Laight
2025-05-25 18:10 ` Kent Overstreet
2025-05-25 19:25 ` Steven Rostedt
2025-05-25 20:04 ` Kent Overstreet