Linux bcache driver list
 help / color / mirror / Atom feed
* riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
@ 2025-05-22 13:29 Naresh Kamboju
  2025-05-22 16:48 ` Kent Overstreet
  0 siblings, 1 reply; 14+ messages in thread
From: Naresh Kamboju @ 2025-05-22 13:29 UTC (permalink / raw)
  To: linux-bcache, open list, lkft-triage, Linux Regressions
  Cc: kent.overstreet, Arnd Bergmann, Dan Carpenter, Anders Roxell

Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
tag next-20250516 and next-20250522.

First seen on the next-20250516
 Good: next-20250515
 Bad:  next-20250516

Regressions found on riscv:
 - build/gcc-13-allyesconfig

Regression Analysis:
 - New regression? Yes
 - Reproducible? Yes

Build regression: riscv gcc-13 allyesconfig error the frame size of
2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>

## Build log
fs/bcachefs/data_update.c: In function '__bch2_data_update_index_update':
fs/bcachefs/data_update.c:464:1: error: the frame size of 2064 bytes
is larger than 2048 bytes [-Werror=frame-larger-than=]
  464 | }
      | ^
cc1: all warnings being treated as errors


## Source
* Kernel version: 6.15.0-rc7
* Git tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git
* Git sha: 460178e842c7a1e48a06df684c66eb5fd630bcf7
* Git describe: next-20250522

## Build
* Build log: https://qa-reports.linaro.org/api/testruns/28521854/log_file/
* Build history:
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250522/testrun/28521854/suite/build/test/gcc-13-allyesconfig/history/
* Build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2xRoAAw5dl69AvvHb8oZ3pL1SFx/
* Kernel config:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2xRoAAw5dl69AvvHb8oZ3pL1SFx/config

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-22 13:29 riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Naresh Kamboju
@ 2025-05-22 16:48 ` Kent Overstreet
  2025-05-23 13:19   ` Naresh Kamboju
  0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2025-05-22 16:48 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: linux-bcache, open list, lkft-triage, Linux Regressions,
	Arnd Bergmann, Dan Carpenter, Anders Roxell

On Thu, May 22, 2025 at 06:59:53PM +0530, Naresh Kamboju wrote:
> Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
> tag next-20250516 and next-20250522.
> 
> First seen on the next-20250516
>  Good: next-20250515
>  Bad:  next-20250516
> 
> Regressions found on riscv:
>  - build/gcc-13-allyesconfig
> 
> Regression Analysis:
>  - New regression? Yes
>  - Reproducible? Yes
> 
> Build regression: riscv gcc-13 allyesconfig error the frame size of
> 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

Is this a kmsan build? kmsan seems to inflate stack usage by quite a
lot.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-22 16:48 ` Kent Overstreet
@ 2025-05-23 13:19   ` Naresh Kamboju
  2025-05-23 13:49     ` Arnd Bergmann
  0 siblings, 1 reply; 14+ messages in thread
From: Naresh Kamboju @ 2025-05-23 13:19 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-bcache, open list, lkft-triage, Linux Regressions,
	Arnd Bergmann, Dan Carpenter, Anders Roxell

On Thu, 22 May 2025 at 22:18, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Thu, May 22, 2025 at 06:59:53PM +0530, Naresh Kamboju wrote:
> > Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
> > tag next-20250516 and next-20250522.
> >
> > First seen on the next-20250516
> >  Good: next-20250515
> >  Bad:  next-20250516
> >
> > Regressions found on riscv:
> >  - build/gcc-13-allyesconfig
> >
> > Regression Analysis:
> >  - New regression? Yes
> >  - Reproducible? Yes
> >
> > Build regression: riscv gcc-13 allyesconfig error the frame size of
> > 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
>
> Is this a kmsan build? kmsan seems to inflate stack usage by quite a
> lot.

This is allyesconfig build which has KASAN builds.

CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
CONFIG_KASAN=y
CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
CONFIG_KASAN_GENERIC=y
# CONFIG_KASAN_OUTLINE is not set
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_STACK=y
CONFIG_KASAN_VMALLOC=y
CONFIG_KASAN_KUNIT_TEST=y
CONFIG_KASAN_EXTRA_INFO=y

- Naresh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-23 13:19   ` Naresh Kamboju
@ 2025-05-23 13:49     ` Arnd Bergmann
  2025-05-23 14:08       ` Kent Overstreet
  0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2025-05-23 13:49 UTC (permalink / raw)
  To: Naresh Kamboju, Kent Overstreet
  Cc: linux-bcache, open list, lkft-triage, Linux Regressions,
	Dan Carpenter, Anders Roxell

On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
> On Thu, 22 May 2025 at 22:18, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>>
>> On Thu, May 22, 2025 at 06:59:53PM +0530, Naresh Kamboju wrote:
>> > Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
>> > tag next-20250516 and next-20250522.
>> >
>> > First seen on the next-20250516
>> >  Good: next-20250515
>> >  Bad:  next-20250516
>> >
>> > Regressions found on riscv:
>> >  - build/gcc-13-allyesconfig
>> >
>> > Regression Analysis:
>> >  - New regression? Yes
>> >  - Reproducible? Yes
>> >
>> > Build regression: riscv gcc-13 allyesconfig error the frame size of
>> > 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
>>
>> Is this a kmsan build? kmsan seems to inflate stack usage by quite a
>> lot.

KMSAN is currently a clang-only feature.

> This is allyesconfig build which has KASAN builds.
>
> CONFIG_HAVE_ARCH_KASAN=y
> CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> CONFIG_CC_HAS_KASAN_GENERIC=y
> CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
> CONFIG_KASAN=y
> CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
> CONFIG_KASAN_GENERIC=y
> # CONFIG_KASAN_OUTLINE is not set
> CONFIG_KASAN_INLINE=y
> CONFIG_KASAN_STACK=y

I reproduced the problem locally and found this to go down to
1440 bytes after I turn off KASAN_STACK. next-20250523 has
some changes that take the number down further to 1136 with
KASAN_STACK and or 1552 with KASAN_STACK.

I've turned bcachefs with kasan-stack on for my randconfig
builds again to see if there are any remaining corner cases.

     Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-23 13:49     ` Arnd Bergmann
@ 2025-05-23 14:08       ` Kent Overstreet
  2025-05-23 15:17         ` Arnd Bergmann
  0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2025-05-23 14:08 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Naresh Kamboju, linux-bcache, open list, lkft-triage,
	Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025 at 03:49:54PM +0200, Arnd Bergmann wrote:
> On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
> > On Thu, 22 May 2025 at 22:18, Kent Overstreet <kent.overstreet@linux.dev> wrote:
> >>
> >> On Thu, May 22, 2025 at 06:59:53PM +0530, Naresh Kamboju wrote:
> >> > Regressions on riscv allyesconfig build failed with gcc-13 on the Linux next
> >> > tag next-20250516 and next-20250522.
> >> >
> >> > First seen on the next-20250516
> >> >  Good: next-20250515
> >> >  Bad:  next-20250516
> >> >
> >> > Regressions found on riscv:
> >> >  - build/gcc-13-allyesconfig
> >> >
> >> > Regression Analysis:
> >> >  - New regression? Yes
> >> >  - Reproducible? Yes
> >> >
> >> > Build regression: riscv gcc-13 allyesconfig error the frame size of
> >> > 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
> >>
> >> Is this a kmsan build? kmsan seems to inflate stack usage by quite a
> >> lot.
> 
> KMSAN is currently a clang-only feature.
> 
> > This is allyesconfig build which has KASAN builds.
> >
> > CONFIG_HAVE_ARCH_KASAN=y
> > CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
> > CONFIG_CC_HAS_KASAN_GENERIC=y
> > CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
> > CONFIG_KASAN=y
> > CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
> > CONFIG_KASAN_GENERIC=y
> > # CONFIG_KASAN_OUTLINE is not set
> > CONFIG_KASAN_INLINE=y
> > CONFIG_KASAN_STACK=y
> 
> I reproduced the problem locally and found this to go down to
> 1440 bytes after I turn off KASAN_STACK. next-20250523 has
> some changes that take the number down further to 1136 with
> KASAN_STACK and or 1552 with KASAN_STACK.
> 
> I've turned bcachefs with kasan-stack on for my randconfig
> builds again to see if there are any remaining corner cases.

Thanks for the numbers - that does still seem high, I'll have to have a
look with pahole.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-23 14:08       ` Kent Overstreet
@ 2025-05-23 15:17         ` Arnd Bergmann
  2025-05-23 17:11           ` Kent Overstreet
  0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2025-05-23 15:17 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Naresh Kamboju, linux-bcache, open list, lkft-triage,
	Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025, at 16:08, Kent Overstreet wrote:
> On Fri, May 23, 2025 at 03:49:54PM +0200, Arnd Bergmann wrote:
>> On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
>
>> I reproduced the problem locally and found this to go down to
>> 1440 bytes after I turn off KASAN_STACK. next-20250523 has
>> some changes that take the number down further to 1136 with
>> KASAN_STACK and or 1552 with KASAN_STACK.
>> 
>> I've turned bcachefs with kasan-stack on for my randconfig
>> builds again to see if there are any remaining corner cases.
>
> Thanks for the numbers - that does still seem high, I'll have to have a
> look with pahole.

I agree it's still larger than it should be: having more than
a few hundred bytes on a function usually means that there is
both the risk for actual overflow and general inefficiency if
all the stack data gets accessed as well.

It's probably not actually structure data though, but a
combination of effects:

- KASAN_STACK adds extra redzones for each variable
- KASAN_STACK further prevents stack slots from getting
  reused inside one function, in order to better pinpoint
  which instance caused problems like out-of-scope access
- passing structures by value causes them to be put on
  the stack on some architectures, even when the structure
  size is only one or two registers
- sanitizers turn off optimizations that lead to better
  stack usage
- in some cases, the missed optimization ends up causing
  local variables to get spilled to the stack many times
  because of a combination of all the above.

The good news is that so far my randconfig builds have not
shown any more stack frame warnings on next-20230523 with
bcachefs force-enabled, now 55 builds into the change,
across arm32/arm64/x86 using gcc-15.1.

       Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-23 15:17         ` Arnd Bergmann
@ 2025-05-23 17:11           ` Kent Overstreet
  2025-05-23 18:01             ` Arnd Bergmann
  0 siblings, 1 reply; 14+ messages in thread
From: Kent Overstreet @ 2025-05-23 17:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Naresh Kamboju, linux-bcache, open list, lkft-triage,
	Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:
> On Fri, May 23, 2025, at 16:08, Kent Overstreet wrote:
> > On Fri, May 23, 2025 at 03:49:54PM +0200, Arnd Bergmann wrote:
> >> On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
> >
> >> I reproduced the problem locally and found this to go down to
> >> 1440 bytes after I turn off KASAN_STACK. next-20250523 has
> >> some changes that take the number down further to 1136 with
> >> KASAN_STACK and or 1552 with KASAN_STACK.
> >> 
> >> I've turned bcachefs with kasan-stack on for my randconfig
> >> builds again to see if there are any remaining corner cases.
> >
> > Thanks for the numbers - that does still seem high, I'll have to have a
> > look with pahole.
> 
> I agree it's still larger than it should be: having more than
> a few hundred bytes on a function usually means that there is
> both the risk for actual overflow and general inefficiency if
> all the stack data gets accessed as well.
> 
> It's probably not actually structure data though, but a
> combination of effects:
> 
> - KASAN_STACK adds extra redzones for each variable
> - KASAN_STACK further prevents stack slots from getting
>   reused inside one function, in order to better pinpoint
>   which instance caused problems like out-of-scope access
> - passing structures by value causes them to be put on
>   the stack on some architectures, even when the structure
>   size is only one or two registers

We mainly do this with bkey_s_c, which is just two words: on x86_64,
that gets passed in registers. Is riscv different?

> - sanitizers turn off optimizations that lead to better
>   stack usage
> - in some cases, the missed optimization ends up causing
>   local variables to get spilled to the stack many times
>   because of a combination of all the above.

Yeesh.

I suspect we should be running with a larger stack when the sanitizers
are running, and perhaps tweak the warnings accordingly. I did a bunch
of stack usage work after I found a kmsan build was blowing out the
stack, but then running with max stack usage tracing enabled showed it
to be a largely non issue on non-sanitizer builds, IIRC.

> The good news is that so far my randconfig builds have not
> shown any more stack frame warnings on next-20230523 with
> bcachefs force-enabled, now 55 builds into the change,
> across arm32/arm64/x86 using gcc-15.1.

Good to know, thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-23 17:11           ` Kent Overstreet
@ 2025-05-23 18:01             ` Arnd Bergmann
  2025-05-25 17:18               ` David Laight
  0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2025-05-23 18:01 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Naresh Kamboju, linux-bcache, open list, lkft-triage,
	Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, May 23, 2025, at 19:11, Kent Overstreet wrote:
> On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:
>> 
>> - KASAN_STACK adds extra redzones for each variable
>> - KASAN_STACK further prevents stack slots from getting
>>   reused inside one function, in order to better pinpoint
>>   which instance caused problems like out-of-scope access
>> - passing structures by value causes them to be put on
>>   the stack on some architectures, even when the structure
>>   size is only one or two registers
>
> We mainly do this with bkey_s_c, which is just two words: on x86_64,
> that gets passed in registers. Is riscv different?

Not sure, I think it's mostly older ABIs that are limited,
either not passing structures in registers at all, or only
possibly one but not two of them.

>> - sanitizers turn off optimizations that lead to better
>>   stack usage
>> - in some cases, the missed optimization ends up causing
>>   local variables to get spilled to the stack many times
>>   because of a combination of all the above.
>
> Yeesh.
>
> I suspect we should be running with a larger stack when the sanitizers
> are running, and perhaps tweak the warnings accordingly. I did a bunch
> of stack usage work after I found a kmsan build was blowing out the
> stack, but then running with max stack usage tracing enabled showed it
> to be a largely non issue on non-sanitizer builds, IIRC.

Enabling KASAN does double the available stack space. However, I don't
think we should use that as an excuse to raise the per-function
warning limit, because

 - the majority of all function stacks do not grow that much when
   sanitizers are enabled
 - allmodconfig enables KASAN and should still catch mistakes
   where a driver accidentally puts a large structure on the stack
 - 2KB on 64-bit targes is a really large limit. At some point
   in the past I had a series that lowered the limit to 1536 byte
   for 64-bit targets, but I never managed to get all the changes
   merged.
  

     Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-23 18:01             ` Arnd Bergmann
@ 2025-05-25 17:18               ` David Laight
  2025-05-25 17:36                 ` Kent Overstreet
  0 siblings, 1 reply; 14+ messages in thread
From: David Laight @ 2025-05-25 17:18 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Kent Overstreet, Naresh Kamboju, linux-bcache, open list,
	lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell

On Fri, 23 May 2025 20:01:33 +0200
"Arnd Bergmann" <arnd@arndb.de> wrote:

> On Fri, May 23, 2025, at 19:11, Kent Overstreet wrote:
> > On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:  
> >> 
> >> - KASAN_STACK adds extra redzones for each variable
> >> - KASAN_STACK further prevents stack slots from getting
> >>   reused inside one function, in order to better pinpoint
> >>   which instance caused problems like out-of-scope access
> >> - passing structures by value causes them to be put on
> >>   the stack on some architectures, even when the structure
> >>   size is only one or two registers  
> >
> > We mainly do this with bkey_s_c, which is just two words: on x86_64,
> > that gets passed in registers. Is riscv different?  
> 
> Not sure, I think it's mostly older ABIs that are limited,
> either not passing structures in registers at all, or only
> possibly one but not two of them.
> 
> >> - sanitizers turn off optimizations that lead to better
> >>   stack usage
> >> - in some cases, the missed optimization ends up causing
> >>   local variables to get spilled to the stack many times
> >>   because of a combination of all the above.  
> >
> > Yeesh.
> >
> > I suspect we should be running with a larger stack when the sanitizers
> > are running, and perhaps tweak the warnings accordingly. I did a bunch
> > of stack usage work after I found a kmsan build was blowing out the
> > stack, but then running with max stack usage tracing enabled showed it
> > to be a largely non issue on non-sanitizer builds, IIRC.  
> 
> Enabling KASAN does double the available stack space. However, I don't
> think we should use that as an excuse to raise the per-function
> warning limit, because
> 
>  - the majority of all function stacks do not grow that much when
>    sanitizers are enabled
>  - allmodconfig enables KASAN and should still catch mistakes
>    where a driver accidentally puts a large structure on the stack

That is rather annoying when you want to look at the generated code :-(

>  - 2KB on 64-bit targes is a really large limit. At some point
>    in the past I had a series that lowered the limit to 1536 byte
>    for 64-bit targets, but I never managed to get all the changes
>    merged.

I've a cunning plan to do a proper static analysis of stack usage.
It is a 'simple' matter of getting objtool to output all calls with
the stack offset.
Indirect calls need the function hashes from fine-ibt, but also need
clang to support 'hash seeds' to disambiguate all the void (*)(void *)
functions.
That'll first barf at all recursion, and then, I expect, show a massive
stack use inside snprintf() in some error path.

Just need a big stack of 'round tuits'.

	David

>   
> 
>      Arnd
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-25 17:18               ` David Laight
@ 2025-05-25 17:36                 ` Kent Overstreet
  2025-05-25 17:47                   ` David Laight
  2025-05-25 19:25                   ` Steven Rostedt
  0 siblings, 2 replies; 14+ messages in thread
From: Kent Overstreet @ 2025-05-25 17:36 UTC (permalink / raw)
  To: David Laight
  Cc: Arnd Bergmann, Naresh Kamboju, linux-bcache, open list,
	lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell,
	Steven Rostedt

+cc Steve

On Sun, May 25, 2025 at 06:18:42PM +0100, David Laight wrote:
> On Fri, 23 May 2025 20:01:33 +0200
> "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
> > On Fri, May 23, 2025, at 19:11, Kent Overstreet wrote:
> > > On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:  
> > >> 
> > >> - KASAN_STACK adds extra redzones for each variable
> > >> - KASAN_STACK further prevents stack slots from getting
> > >>   reused inside one function, in order to better pinpoint
> > >>   which instance caused problems like out-of-scope access
> > >> - passing structures by value causes them to be put on
> > >>   the stack on some architectures, even when the structure
> > >>   size is only one or two registers  
> > >
> > > We mainly do this with bkey_s_c, which is just two words: on x86_64,
> > > that gets passed in registers. Is riscv different?  
> > 
> > Not sure, I think it's mostly older ABIs that are limited,
> > either not passing structures in registers at all, or only
> > possibly one but not two of them.
> > 
> > >> - sanitizers turn off optimizations that lead to better
> > >>   stack usage
> > >> - in some cases, the missed optimization ends up causing
> > >>   local variables to get spilled to the stack many times
> > >>   because of a combination of all the above.  
> > >
> > > Yeesh.
> > >
> > > I suspect we should be running with a larger stack when the sanitizers
> > > are running, and perhaps tweak the warnings accordingly. I did a bunch
> > > of stack usage work after I found a kmsan build was blowing out the
> > > stack, but then running with max stack usage tracing enabled showed it
> > > to be a largely non issue on non-sanitizer builds, IIRC.  
> > 
> > Enabling KASAN does double the available stack space. However, I don't
> > think we should use that as an excuse to raise the per-function
> > warning limit, because
> > 
> >  - the majority of all function stacks do not grow that much when
> >    sanitizers are enabled
> >  - allmodconfig enables KASAN and should still catch mistakes
> >    where a driver accidentally puts a large structure on the stack
> 
> That is rather annoying when you want to look at the generated code :-(
> 
> >  - 2KB on 64-bit targes is a really large limit. At some point
> >    in the past I had a series that lowered the limit to 1536 byte
> >    for 64-bit targets, but I never managed to get all the changes
> >    merged.
> 
> I've a cunning plan to do a proper static analysis of stack usage.
> It is a 'simple' matter of getting objtool to output all calls with
> the stack offset.
> Indirect calls need the function hashes from fine-ibt, but also need
> clang to support 'hash seeds' to disambiguate all the void (*)(void *)
> functions.
> That'll first barf at all recursion, and then, I expect, show a massive
> stack use inside snprintf() in some error path.

I suspect recursion will make the results you get with that approach
useless.

We already have "trace max stack", but that only checks at process exit,
so it doesn't tell you much.

We could do better with tracing - just inject a trampoline that checks
the current stack usage against the maximum stack usage we've seen, and
emits a trace event with a stack trace if it's greater.

(and now Steve's going to tell us he's already done this :)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-25 17:36                 ` Kent Overstreet
@ 2025-05-25 17:47                   ` David Laight
  2025-05-25 18:10                     ` Kent Overstreet
  2025-05-25 19:25                   ` Steven Rostedt
  1 sibling, 1 reply; 14+ messages in thread
From: David Laight @ 2025-05-25 17:47 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Arnd Bergmann, Naresh Kamboju, linux-bcache, open list,
	lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell,
	Steven Rostedt

On Sun, 25 May 2025 13:36:16 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:

> +cc Steve
...
> > I've a cunning plan to do a proper static analysis of stack usage.
> > It is a 'simple' matter of getting objtool to output all calls with
> > the stack offset.
> > Indirect calls need the function hashes from fine-ibt, but also need
> > clang to support 'hash seeds' to disambiguate all the void (*)(void *)
> > functions.
> > That'll first barf at all recursion, and then, I expect, show a massive
> > stack use inside snprintf() in some error path.  
> 
> I suspect recursion will make the results you get with that approach
> useless.

Recursion is an issue, but the kernel really doesn't support recursion.
So you actually want to know the possible recursion loops anyway.
I suspect (hope) most will be the 'recurses only once' type.
If not they need some other bound.

> We already have "trace max stack", but that only checks at process exit,
> so it doesn't tell you much.
> 
> We could do better with tracing - just inject a trampoline that checks
> the current stack usage against the maximum stack usage we've seen, and
> emits a trace event with a stack trace if it's greater.

Both those only tells you the stack you've used.
The static analysis will show you the stack 'you might use'.
Which is really much more important.

I did this for an embedded system a long time ago.
The outcome was that we didn't have enough memory to allocate the
'worst case' stacks!

	David



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-25 17:47                   ` David Laight
@ 2025-05-25 18:10                     ` Kent Overstreet
  0 siblings, 0 replies; 14+ messages in thread
From: Kent Overstreet @ 2025-05-25 18:10 UTC (permalink / raw)
  To: David Laight
  Cc: Arnd Bergmann, Naresh Kamboju, linux-bcache, open list,
	lkft-triage, Linux Regressions, Dan Carpenter, Anders Roxell,
	Steven Rostedt

On Sun, May 25, 2025 at 06:47:57PM +0100, David Laight wrote:
> On Sun, 25 May 2025 13:36:16 -0400
> Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
> > +cc Steve
> ...
> > > I've a cunning plan to do a proper static analysis of stack usage.
> > > It is a 'simple' matter of getting objtool to output all calls with
> > > the stack offset.
> > > Indirect calls need the function hashes from fine-ibt, but also need
> > > clang to support 'hash seeds' to disambiguate all the void (*)(void *)
> > > functions.
> > > That'll first barf at all recursion, and then, I expect, show a massive
> > > stack use inside snprintf() in some error path.  
> > 
> > I suspect recursion will make the results you get with that approach
> > useless.
> 
> Recursion is an issue, but the kernel really doesn't support recursion.
> So you actually want to know the possible recursion loops anyway.
> I suspect (hope) most will be the 'recurses only once' type.
> If not they need some other bound.

Recursion is a fact of life when you get different subsystems
interacting in unpredictable ways.

You can be in one filesystem, and then end up in a fault handler (gup(),
or a simple copy to/from user), and then end up in a completely
different filesystem - and then you call into the block layer, or
networking if it's NFS.

Static analysis might get you some useful data within a subsystem, but
it won't tell you much about the kernel as a whole as people are
actually running it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-25 17:36                 ` Kent Overstreet
  2025-05-25 17:47                   ` David Laight
@ 2025-05-25 19:25                   ` Steven Rostedt
  2025-05-25 20:04                     ` Kent Overstreet
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2025-05-25 19:25 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: David Laight, Arnd Bergmann, Naresh Kamboju, linux-bcache,
	open list, lkft-triage, Linux Regressions, Dan Carpenter,
	Anders Roxell

On Sun, 25 May 2025 13:36:16 -0400
Kent Overstreet <kent.overstreet@linux.dev> wrote:

> We already have "trace max stack", but that only checks at process exit,
> so it doesn't tell you much.

Nope, it traces the stack at every function call, but it misses the leaf
functions and also doesn't check interrupts as they may use a different
stack.

> 
> We could do better with tracing - just inject a trampoline that checks
> the current stack usage against the maximum stack usage we've seen, and
> emits a trace event with a stack trace if it's greater.
> 
> (and now Steve's going to tell us he's already done this :)

Close ;-)

# echo 1 > /proc/sys/kernel/stack_tracer_enabled

Wait.

# cat /sys/kernel/tracing/stack_trace
        Depth    Size   Location    (33 entries)
        -----    ----   --------
  0)     8360      48   __msecs_to_jiffies+0x9/0x30
  1)     8312     104   update_group_capacity+0x95/0x970
  2)     8208     520   update_sd_lb_stats.constprop.0+0x278/0x2f40
  3)     7688     416   sched_balance_find_src_group+0x96/0xe30
  4)     7272     512   sched_balance_rq+0x53f/0x2fe0
  5)     6760     344   sched_balance_newidle+0x6c1/0x1310
  6)     6416      80   pick_next_task_fair+0x55/0xe60
  7)     6336     328   __schedule+0x8a5/0x33d0
  8)     6008      32   schedule+0xe2/0x3b0
  9)     5976      32   io_schedule+0x8f/0xf0
 10)     5944     264   rq_qos_wait+0x12a/0x200
 11)     5680     144   wbt_wait+0x159/0x260
 12)     5536      40   __rq_qos_throttle+0x50/0x90
 13)     5496     320   blk_mq_submit_bio+0x70b/0x1ff0
 14)     5176     240   __submit_bio+0x1b3/0x600
 15)     4936     248   submit_bio_noacct_nocheck+0x546/0xca0
 16)     4688     144   ext4_bio_write_folio+0x69d/0x1870
 17)     4544      64   mpage_submit_folio+0x14c/0x2b0
 18)     4480      96   mpage_process_page_bufs+0x392/0x7a0
 19)     4384     632   mpage_prepare_extent_to_map+0xa5b/0x1080
 20)     3752     496   ext4_do_writepages+0x8af/0x2ee0
 21)     3256     304   ext4_writepages+0x26f/0x5c0
 22)     2952     344   do_writepages+0x183/0x7c0
 23)     2608     152   __writeback_single_inode+0x114/0xb00
 24)     2456     744   writeback_sb_inodes+0x52b/0xdf0
 25)     1712     168   __writeback_inodes_wb+0xf4/0x270
 26)     1544     312   wb_writeback+0x547/0x800
 27)     1232     328   wb_workfn+0x7b1/0xbc0
 28)      904     352   process_one_work+0x85a/0x1450
 29)      552     176   worker_thread+0x5b7/0xf80
 30)      376     168   kthread+0x371/0x720
 31)      208      32   ret_from_fork+0x34/0x70
 32)      176     176   ret_from_fork_asm+0x1a/0x30


The code that does this is in kernel/trace/trace_stack.c

It simply attaches to the function tracer and at ever function checks the
current stack size.

Hmm, I need to update this because today we even pass the stack pointer via
the ftrace_regs if the arch supports it. Using that would allow me to get
rid of the hack:


static void check_stack(unsigned long ip, unsigned long *stack)
{
	[..]
	this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
	this_size = THREAD_SIZE - this_size;


	unsigned long stack;

[..]

static void
stack_trace_call(unsigned long ip, unsigned long parent_ip,
		 struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	unsigned long stack;
	[..]

	check_stack(ip, &stack);


-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
  2025-05-25 19:25                   ` Steven Rostedt
@ 2025-05-25 20:04                     ` Kent Overstreet
  0 siblings, 0 replies; 14+ messages in thread
From: Kent Overstreet @ 2025-05-25 20:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Laight, Arnd Bergmann, Naresh Kamboju, linux-bcache,
	open list, lkft-triage, Linux Regressions, Dan Carpenter,
	Anders Roxell

On Sun, May 25, 2025 at 03:25:02PM -0400, Steven Rostedt wrote:
> On Sun, 25 May 2025 13:36:16 -0400
> Kent Overstreet <kent.overstreet@linux.dev> wrote:
> 
> > We already have "trace max stack", but that only checks at process exit,
> > so it doesn't tell you much.
> 
> Nope, it traces the stack at every function call, but it misses the leaf
> functions and also doesn't check interrupts as they may use a different
> stack.

I was thinking of DEBUG_STACK_USAGE :)

> > We could do better with tracing - just inject a trampoline that checks
> > the current stack usage against the maximum stack usage we've seen, and
> > emits a trace event with a stack trace if it's greater.
> > 
> > (and now Steve's going to tell us he's already done this :)
> 
> Close ;-)
> 
> # echo 1 > /proc/sys/kernel/stack_tracer_enabled
> 
> Wait.
> 
> # cat /sys/kernel/tracing/stack_trace
>         Depth    Size   Location    (33 entries)
>         -----    ----   --------
>   0)     8360      48   __msecs_to_jiffies+0x9/0x30
>   1)     8312     104   update_group_capacity+0x95/0x970
>   2)     8208     520   update_sd_lb_stats.constprop.0+0x278/0x2f40
>   3)     7688     416   sched_balance_find_src_group+0x96/0xe30
>   4)     7272     512   sched_balance_rq+0x53f/0x2fe0
>   5)     6760     344   sched_balance_newidle+0x6c1/0x1310
>   6)     6416      80   pick_next_task_fair+0x55/0xe60
>   7)     6336     328   __schedule+0x8a5/0x33d0
>   8)     6008      32   schedule+0xe2/0x3b0
>   9)     5976      32   io_schedule+0x8f/0xf0
>  10)     5944     264   rq_qos_wait+0x12a/0x200
>  11)     5680     144   wbt_wait+0x159/0x260
>  12)     5536      40   __rq_qos_throttle+0x50/0x90
>  13)     5496     320   blk_mq_submit_bio+0x70b/0x1ff0
>  14)     5176     240   __submit_bio+0x1b3/0x600
>  15)     4936     248   submit_bio_noacct_nocheck+0x546/0xca0
>  16)     4688     144   ext4_bio_write_folio+0x69d/0x1870
>  17)     4544      64   mpage_submit_folio+0x14c/0x2b0
>  18)     4480      96   mpage_process_page_bufs+0x392/0x7a0
>  19)     4384     632   mpage_prepare_extent_to_map+0xa5b/0x1080
>  20)     3752     496   ext4_do_writepages+0x8af/0x2ee0
>  21)     3256     304   ext4_writepages+0x26f/0x5c0
>  22)     2952     344   do_writepages+0x183/0x7c0
>  23)     2608     152   __writeback_single_inode+0x114/0xb00
>  24)     2456     744   writeback_sb_inodes+0x52b/0xdf0
>  25)     1712     168   __writeback_inodes_wb+0xf4/0x270
>  26)     1544     312   wb_writeback+0x547/0x800
>  27)     1232     328   wb_workfn+0x7b1/0xbc0
>  28)      904     352   process_one_work+0x85a/0x1450
>  29)      552     176   worker_thread+0x5b7/0xf80
>  30)      376     168   kthread+0x371/0x720
>  31)      208      32   ret_from_fork+0x34/0x70
>  32)      176     176   ret_from_fork_asm+0x1a/0x30

Nice! This is exactly what I was looking for :)

        Depth    Size   Location    (48 entries)
        -----    ----   --------
  0)     7728      48   __update_load_avg_se+0x9/0x440
  1)     7680      80   update_load_avg+0x25f/0x2b0
  2)     7600      56   set_next_task_fair+0x232/0x290
  3)     7544      48   pick_next_task_fair+0xcf/0x1a0
  4)     7496     120   __schedule+0x284/0xe80
  5)     7376      16   preempt_schedule_irq+0x33/0x50
  6)     7360     136   asm_common_interrupt+0x26/0x40
  7)     7224      48   get_symbol_offset+0x43/0x70
  8)     7176      56   kallsyms_lookup_buildid+0x55/0xf0
  9)     7120      88   __sprint_symbol.isra.0+0x48/0xf0
 10)     7032     720   symbol_string+0xf1/0x120
 11)     6312     120   vsnprintf+0x3dc/0x5d0
 12)     6192     128   bch2_prt_printf+0x57/0x140
 13)     6064      64   bch2_prt_task_backtrace+0x71/0xc0
 14)     6000      40   print_cycle+0x71/0xa0
 15)     5960     104   trace_would_deadlock+0xb6/0x150
 16)     5856     128   break_cycle+0xfe/0x260
 17)     5728     368   bch2_check_for_deadlock+0x35f/0x5f0
 18)     5360      96   six_lock_slowpath.isra.0+0x204/0x4c0
 19)     5264      96   __bch2_btree_node_get+0x384/0x5b0
 20)     5168     336   bch2_btree_path_traverse_one+0x7a5/0xd60
 21)     4832     232   bch2_btree_iter_peek_slot+0x104/0x7f0
 22)     4600     216   btree_key_cache_fill+0xcf/0x1a0
 23)     4384      72   bch2_btree_path_traverse_cached+0x2bd/0x310
 24)     4312     336   bch2_btree_path_traverse_one+0x705/0xd60
 25)     3976     232   bch2_btree_iter_peek_slot+0x104/0x7f0
 26)     3744     424   bch2_check_discard_freespace_key+0x172/0x5e0
 27)     3320     224   bch2_bucket_alloc_freelist+0x422/0x610
 28)     3096      88   bch2_bucket_alloc_trans+0x1f3/0x3a0
 29)     3008     168   bch2_bucket_alloc_set_trans+0xf1/0x360
 30)     2840     184   __open_bucket_add_buckets+0x40b/0x660
 31)     2656      40   open_bucket_add_buckets+0x72/0xf0
 32)     2616     280   bch2_alloc_sectors_start_trans+0x76d/0xd00
 33)     2336     424   __bch2_write+0x1d1/0x11d0
 34)     1912     168   __bch2_writepage+0x3b2/0x790
 35)     1744      72   write_cache_pages+0x5c/0xa0
 36)     1672     176   bch2_writepages+0x67/0xc0
 37)     1496     184   do_writepages+0xcc/0x240
 38)     1312      64   __writeback_single_inode+0x41/0x320
 39)     1248     456   writeback_sb_inodes+0x216/0x4e0
 40)      792      64   __writeback_inodes_wb+0x4c/0xe0
 41)      728     168   wb_writeback+0x19c/0x310
 42)      560     136   wb_workfn+0x2a4/0x400
 43)      424      64   process_one_work+0x18c/0x330
 44)      360      72   worker_thread+0x252/0x3a0
 45)      288      80   kthread+0xf9/0x210
 46)      208      32   ret_from_fork+0x31/0x50
 47)      176     176   ret_from_fork_asm+0x11/0x20

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-05-25 20:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22 13:29 riscv gcc-13 allyesconfig error the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Naresh Kamboju
2025-05-22 16:48 ` Kent Overstreet
2025-05-23 13:19   ` Naresh Kamboju
2025-05-23 13:49     ` Arnd Bergmann
2025-05-23 14:08       ` Kent Overstreet
2025-05-23 15:17         ` Arnd Bergmann
2025-05-23 17:11           ` Kent Overstreet
2025-05-23 18:01             ` Arnd Bergmann
2025-05-25 17:18               ` David Laight
2025-05-25 17:36                 ` Kent Overstreet
2025-05-25 17:47                   ` David Laight
2025-05-25 18:10                     ` Kent Overstreet
2025-05-25 19:25                   ` Steven Rostedt
2025-05-25 20:04                     ` Kent Overstreet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox