From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Corentin Labbe <clabbe.montjoie@gmail.com>,
Linus Walleij <linus.walleij@linaro.org>,
Arnd Bergmann <arnd@arndb.de>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: boot flooded with unwind: Index not found
Date: Wed, 9 Mar 2022 00:01:26 +0000
Message-ID: <YifuVmkcb1ie7bzk@shell.armlinux.org.uk>
In-Reply-To: <Yh9TdbWwHX/5Bhmt@shell.armlinux.org.uk>

On Wed, Mar 02, 2022 at 11:22:29AM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 02, 2022 at 12:19:40PM +0100, Ard Biesheuvel wrote:
> > On Wed, 2 Mar 2022 at 12:12, Russell King (Oracle)
> > <linux@armlinux.org.uk> wrote:
> > >
> > > On Wed, Mar 02, 2022 at 11:09:49AM +0100, Corentin Labbe wrote:
> > > > The crash disappeared (but the suspicious RCU usage is still here).
> > >
> > > As the trace on those is:
> > >
> > > [ 0.239629] unwind_backtrace from show_stack+0x10/0x14
> > > [ 0.239654] show_stack from init_stack+0x1c54/0x2000
> > >
> > > unwind_backtrace() and show_stack() are both C code, the compiler will
> > > emit the unwind information for it. show_stack() isn't called from
> > > assembly code, only from C code, so the next function's unwind
> > > information should also be generated by the compiler.
> > >
> > > However, init_stack is not a function - it's an array of unsigned long.
> > > There is no way this should appear in the trace, and this suggests that
> > > the unwind of show_stack() has gone wrong.
> > >
> > > I don't see anything obvious in Ard's changes that would cause that
> > > though.
> > >
> > > Did it used to work fine with previous versions of linux-next - those
> > > versions where we had Ard's "arm-vmap-stacks-v6" tag merged in
> > > (commit 2fa394824493) and did this only appear when I merged
> > > "arm-ftrace-for-rmk" (commit 74aaaa1e9bba) ? Did merging
> > > "arm-ftrace-for-rmk" cause any change in your .config?
> > >
> >
> > I can reproduce the RCU warnings, and I have tracked this down to the
> > change I made to return_address() for the graph tracer, which I
> > thought was justified after removing the call to
> > kernel_text_address():
> >
> > --- a/arch/arm/include/asm/ftrace.h
> > +++ b/arch/arm/include/asm/ftrace.h
> > @@ -35,26 +35,8 @@ static inline unsigned long
> > ftrace_call_adjust(unsigned long addr)
> >
> > #ifndef __ASSEMBLY__
> >
> > -#if defined(CONFIG_FRAME_POINTER) && !defined(CONFIG_ARM_UNWIND)
> > -/*
> > - * return_address uses walk_stackframe to do it's work. If both
> > - * CONFIG_FRAME_POINTER=y and CONFIG_ARM_UNWIND=y walk_stackframe uses unwind
> > - * information. For this to work in the function tracer many functions would
> > - * have to be marked with __notrace. So for now just depend on
> > - * !CONFIG_ARM_UNWIND.
> > - */
> > -
> > void *return_address(unsigned int);
> >
> > -#else
> > -
> > -static inline void *return_address(unsigned int level)
> > -{
> > - return NULL;
> > -}
> > -
> > -#endif
> > -
> > #define ftrace_return_address(n) return_address(n)
> >
> > #define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
> >
> > However, the function graph tracer works happily with this bit
> > reverted, and so that is probably the best course of action here.
> >
> > I have already sent the patch that reintroduces the
> > kernel_text_address() check - would you prefer a v2 of that one with
> > this change incorporated? Or a second patch that just reverts the
> > above? (Given that the bogus dereference was invoked from
> > return_address() as well, I suspect that this change would make the
> > get_kernel_nofault() change I proposed in this thread redundant)
>
> I'd prefer patches on top of my devel-stable branch, thanks.

To reiterate what I've just put on IRC: we have not got to the bottom
of this problem yet. It still very much exists.

There seems to be something of a fundamental issue with the unwinder:
it now appears to go wrong and fail to unwind beyond a couple of
functions, and the address it comes out with appears to be incorrect.
I've only just discovered this because I created my very own bug, and
yet again, the timing sucks with the proximity of the merge window.

I'm getting:

[ 13.198803] [<c0017728>] (unwind_backtrace) from [<c0012828>] (show_stack+0x10/0x14)
[ 13.198820] [<c0012828>] (show_stack) from [<c2be78d4>] (0xc2be78d4)

for the WARN_ON() stacktrace, and the address that apparently called
show_stack() is most definitely rubbish. This makes any WARN_ON()
condition undebuggable.

This happens both with 9183/1 and 9184/1 applied on top of your
"arm-ftrace-for-rmk" tag and with just the "arm-vmap-stacks-v6" tag.
This seems to point at one of these patches breaking the unwinder:

a1c510d0adc6 ARM: implement support for vmap'ed stacks
532319b9c418 ARM: unwind: disregard unwind info before stack frame is set up
4ab6827081c6 ARM: unwind: dump exception stack from calling frame
b6506981f880 ARM: unwind: support unwinding across multiple stacks

Given that the unwinder is broken, I wonder whether 9183/1 and 9184/1
are actually required.

I did try to point this problem out a few emails back:

"As the trace on those is:

[ 0.239629] unwind_backtrace from show_stack+0x10/0x14
[ 0.239654] show_stack from init_stack+0x1c54/0x2000

unwind_backtrace() and show_stack() are both C code, the compiler will
emit the unwind information for it. show_stack() isn't called from
assembly code, only from C code, so the next function's unwind
information should also be generated by the compiler.

However, init_stack is not a function - it's an array of unsigned long.
There is no way this should appear in the trace, and this suggests that
the unwind of show_stack() has gone wrong."

In Corentin's case, there is no way init_stack should ever appear in
the stack trace. In my case, it's not init_stack, but 0xc2be78d4.

Can you please try a dummy WARN_ON(1) in your kernel?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel