linux-arm-kernel.lists.infradead.org archive mirror
* Bad mode in undefined instruction handler detected
From: Patrick Doyle @ 2016-03-29 14:55 UTC
  To: linux-arm-kernel

Hello folks...
I am looking for a clue as to how I might debug the following kernel oops:

Bad mode in undefined instruction handler detected
Internal error: Oops - bad mode: 0 [#1] ARM
Modules linked in: usb_f_eem g_ether u_ether libcomposite atmel_usba_udc
CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0-linux4sam_5.1 #4
Hardware name: Atmel SAMA5
task: c11440b8 ti: c1140000 task.ti: c1140000
PC is at _einittext+0x3f9456e8/0xfffdb400
LR is at 0xc0c565c8
pc : [<ffff0024>]    lr : [<c0c565c8>]    psr: 600e0192
sp : c1141f30  ip : 00000019  fp : c11420d0
r10: c1147548  r9 : 00000132  r8 : b0a5c253
r7 : 00000132  r6 : b0a5bacb  r5 : 00000000  r4 : c1162d10
r3 : 00000132  r2 : b0a5c253  r1 : fffffff9  r0 : c1141f78
Flags: nZCv  IRQs off  FIQs on  Mode IRQ_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 24804059  DAC: 00000015
Process swapper (pid: 0, stack limit = 0xc1140208)
Stack: (0xc1141f30 to 0xc1142000)
1f20:                                     c1141f78 fffffff9 b0a5c253 00000132
1f40: c1162d10 00000000 b0a5bacb 00000132 b0a5c253 00000132 c1147548 c11420d0
1f60: 00000019 c1141f30 c0c565c8 ffff0024 600e0192 ffffffff b0a5c253 00000132
1f80: 00000001 c1140000 c11420c8 c1162d10 c11631c4 00000001 c1162d08 c1147548
1fa0: c11420d0 c003c9f0 00000000 c116d57b 00000000 c1142000 00000000 c0686c34
1fc0: ffffffff ffffffff c0686680 00000000 00000000 c06ad280 c116db14 c1142078
1fe0: c06ad27c c114518c 20004059 410fc051 00000000 20008078 00000000 00000000
Code: ea000481 ea000400 ea000487 e7fddef1 (e7fddef1)
---[ end trace bf564fba2bcf8264 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!

This reproduces quite regularly on my (custom SAMA5) development board
when transferring data from a camera (wired to the ISI port) to an
external host via USB.  So the problem could be related to the camera
driver, the USB driver, or something completely different (such as a
memory error that happens to occur with the same symptoms every time I
run my code for 5 to 10 minutes, despite the fact that I am unable to
induce a memory error by running stress, stress-ng, or memtester).

Any clues as to how I might parse this oops and ultimately track down
the root cause of this would be greatly appreciated.

Thanks.

--wpd

* Bad mode in undefined instruction handler detected
From: Russell King - ARM Linux @ 2016-03-29 16:59 UTC
  To: linux-arm-kernel

On Tue, Mar 29, 2016 at 10:55:07AM -0400, Patrick Doyle wrote:
> Hello folks...
> I am looking for a clue as to how I might debug the following kernel oops:

I'm not sure even I can help on this one, it looks extremely weird.

> Bad mode in undefined instruction handler detected
> Internal error: Oops - bad mode: 0 [#1] ARM
> Modules linked in: usb_f_eem g_ether u_ether libcomposite atmel_usba_udc
> CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0-linux4sam_5.1 #4
> Hardware name: Atmel SAMA5
> task: c11440b8 ti: c1140000 task.ti: c1140000
> PC is at _einittext+0x3f9456e8/0xfffdb400
> LR is at 0xc0c565c8
> pc : [<ffff0024>]    lr : [<c0c565c8>]    psr: 600e0192
> sp : c1141f30  ip : 00000019  fp : c11420d0
> r10: c1147548  r9 : 00000132  r8 : b0a5c253
> r7 : 00000132  r6 : b0a5bacb  r5 : 00000000  r4 : c1162d10
> r3 : 00000132  r2 : b0a5c253  r1 : fffffff9  r0 : c1141f78
> Flags: nZCv  IRQs off  FIQs on  Mode IRQ_32  ISA ARM  Segment kernel

So what this is saying is that we entered the undefined instruction
handler due to the instruction at PC=0xffff0024 with PSR=0x600e0192,
where the PSR value indicates that we were in 32-bit IRQ mode.
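
As a rough illustration of that reading of the PSR, the sketch below decodes
0x600e0192 by hand. The bit positions are those of the ARMv7-A CPSR; the
function and table names are made up for this example and are not kernel
identifiers.

```python
# Decode an ARM (A32) program status register value as printed in an oops.
# Only the fields shown on the "Flags:" line are handled here.

MODES = {
    0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
    0x17: "ABT", 0x1B: "UND", 0x1F: "SYS",
}

def decode_psr(psr):
    # Upper case means the condition flag is set, lower case means clear,
    # which mirrors the convention used in the oops output.
    flags = "".join(
        name.upper() if psr & (1 << bit) else name.lower()
        for name, bit in (("n", 31), ("z", 30), ("c", 29), ("v", 28))
    )
    return {
        "flags": flags,
        "irqs_masked": bool(psr & (1 << 7)),   # I bit
        "fiqs_masked": bool(psr & (1 << 6)),   # F bit
        "thumb": bool(psr & (1 << 5)),         # T bit
        "mode": MODES.get(psr & 0x1F, "unknown"),
    }

info = decode_psr(0x600E0192)
print(info)
```

For this value the sketch reports flags nZCv, IRQs masked, FIQs unmasked, ARM
state and IRQ mode, which agrees with the "Flags:" line of the oops.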

Entering the undefined instruction handler from PC=0xffff0024 is
intentional, since we fill the vectors page with an instruction
(0xe7fddef1) which is guaranteed to cause an undefined instruction
exception.
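
One way to convince yourself that 0xe7fddef1 really is an undefined
instruction is to match it against the A32 UDF encoding (condition 1110,
bits [27:20] = 0111 1111, bits [7:4] = 1111). The helper below is a
hypothetical sketch for this thread, not something taken from the kernel.

```python
def decode_udf(word):
    """Return the 16-bit immediate if `word` is an A32 UDF, else None."""
    # Fixed bits of the UDF encoding: 1110 0111 1111 .... .... .... 1111 ....
    if (word & 0xFFF000F0) != 0xE7F000F0:
        return None
    imm12 = (word >> 8) & 0xFFF   # bits [19:8]
    imm4 = word & 0xF             # bits [3:0]
    return (imm12 << 4) | imm4

for word in (0xEA000481, 0xE7FDDEF1):
    imm = decode_udf(word)
    print(f"{word:08x}:", "not UDF" if imm is None else f"udf #{imm:#x}")
```

The first word (a branch from the "Code:" line) is rejected, while 0xe7fddef1
matches the UDF pattern with an immediate of 0xdde1.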

The problem is twofold:

(a) how did we end up executing code at 0xffff0024?
(b) how did we execute this code while in IRQ mode?

> Control: 10c53c7d  Table: 24804059  DAC: 00000015
> Process swapper (pid: 0, stack limit = 0xc1140208)
> Stack: (0xc1141f30 to 0xc1142000)
> 1f20:                                     c1141f78 fffffff9 b0a5c253 00000132
> 1f40: c1162d10 00000000 b0a5bacb 00000132 b0a5c253 00000132 c1147548 c11420d0
> 1f60: 00000019 c1141f30 c0c565c8 ffff0024 600e0192 ffffffff

These stacked values are the register state printed out above, so we can
ignore these (this gets printed because we're unable to properly save
the SVC stack pointer, which is why these exceptions are fatal.)
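
To check that correspondence mechanically, the sketch below lays the first
18 stacked words over the 32-bit ARM struct pt_regs layout (r0-r10, fp, ip,
sp, lr, pc, psr, orig_r0). Two assumptions are baked in: the upper address
bits come from the "Stack: (0xc1141f30 to 0xc1142000)" line, and a short
first row is right-aligned within its 8-word line. The helper names are
invented for the example.

```python
PT_REGS = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9",
           "r10", "fp", "ip", "sp", "lr", "pc", "psr", "orig_r0"]

DUMP = """\
1f20:                                     c1141f78 fffffff9 b0a5c253 00000132
1f40: c1162d10 00000000 b0a5bacb 00000132 b0a5c253 00000132 c1147548 c11420d0
1f60: 00000019 c1141f30 c0c565c8 ffff0024 600e0192 ffffffff b0a5c253 00000132
"""

def parse_dump(text, high_bits=0xC1140000):
    """Map each dumped word to its full address."""
    words = {}
    for line in text.splitlines():
        offset, _, rest = line.partition(":")
        values = rest.split()
        # ASSUMPTION: a row with fewer than 8 words is right-aligned.
        addr = high_bits + int(offset, 16) + 4 * (8 - len(values))
        for value in values:
            words[addr] = int(value, 16)
            addr += 4
    return words

words = parse_dump(DUMP)
sp = 0xC1141F30
regs = {name: words[sp + 4 * i] for i, name in enumerate(PT_REGS)}
parent_sp = sp + 4 * len(PT_REGS)   # first word past the saved registers

for name in ("r0", "sp", "lr", "pc", "psr"):
    print(f"{name:4s} {regs[name]:08x}")
print(f"parent context starts at {parent_sp:08x}")
```

Under those assumptions the saved registers end at 0xc1141f78, which is also
the value held in r0, and everything from there to the top of the stack
belongs to the parent context.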

> 1f60:                                                       b0a5c253 00000132
> 1f80: 00000001 c1140000 c11420c8 c1162d10 c11631c4 00000001 c1162d08 c1147548
> 1fa0: c11420d0 c003c9f0 00000000 c116d57b 00000000 c1142000 00000000 c0686c34
> 1fc0: ffffffff ffffffff c0686680 00000000 00000000 c06ad280 c116db14 c1142078
> 1fe0: c06ad27c c114518c 20004059 410fc051 00000000 20008078 00000000 00000000

This is the stack state for the parent context, and since we don't have
a backtrace (due to no frame pointer) this becomes guesswork. Frankly,
I'm not going to waste my time guessing, and at this point I ask you to
reproduce on a kernel _with_ frame pointers enabled.

You should then get proper stack frames, which will take some of the
guesswork out of working out which values in the above stack dump are
addresses and which aren't.
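
For what it is worth, the guessing that frame pointers avoid looks roughly
like the sketch below: scan the parent stack for words that fall inside a
plausible kernel text range. The upper bound is derived from the oops line
"PC is at _einittext+0x3f9456e8"; the lower bound of 0xc0008000 is purely an
assumption (a common 32-bit ARM text start), and a hit is only a candidate
that would still have to be looked up in System.map.

```python
PC = 0xFFFF0024
EINITTEXT = PC - 0x3F9456E8   # from "PC is at _einittext+0x3f9456e8"
TEXT_START = 0xC0008000       # ASSUMPTION: typical start of kernel text

PARENT_BASE = 0xC1141F78      # first word after the saved registers
PARENT_WORDS = """
b0a5c253 00000132
00000001 c1140000 c11420c8 c1162d10 c11631c4 00000001 c1162d08 c1147548
c11420d0 c003c9f0 00000000 c116d57b 00000000 c1142000 00000000 c0686c34
ffffffff ffffffff c0686680 00000000 00000000 c06ad280 c116db14 c1142078
c06ad27c c114518c 20004059 410fc051 00000000 20008078 00000000 00000000
""".split()

# Keep (stack address, value) pairs whose value lies in the assumed range.
candidates = [
    (PARENT_BASE + 4 * i, int(word, 16))
    for i, word in enumerate(PARENT_WORDS)
    if TEXT_START <= int(word, 16) < EINITTEXT
]

print(f"_einittext = {EINITTEXT:08x}")
for addr, value in candidates:
    print(f"{addr:08x}: {value:08x}")
```

The heuristic cannot tell a genuine return address from a stale value or a
data pointer that happens to land in the range, which is exactly why a
frame-pointer backtrace is preferable.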

This is a nice example where _not_ having frame pointers makes things
stupidly difficult to trace - IMHO GCC must _never_ remove support for
frame pointers on ARM for this very reason.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

* Bad mode in undefined instruction handler detected
From: Patrick Doyle @ 2016-03-29 18:22 UTC
  To: linux-arm-kernel

Russell,
Thank you for your insights...

On Tue, Mar 29, 2016 at 12:59 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> This is the stack state for the parent context, and since we don't have
> a backtrace (due to no frame pointer) this becomes guesswork. Frankly,
> I'm not going to waste my time guessing, and at this point I ask you to
> reproduce on a kernel _with_ frame pointers enabled.
>
> You should then get proper stack frames, which will take some of the
> guesswork out of working out which values in the above stack dump are
> addresses and which aren't.
Hmmm... inexplicably, compiling with frame pointers enabled doesn't
seem to be an option for me :-(

My Kernel Hacking submenu (based on the 4.1 branch of the linux-at91
kernel source tree) goes from

"Filter access to /dev/mem"
to
"Enable stack unwinding support (EXPERIMENTAL)"

with no option to set CONFIG_FRAME_POINTER in between.  Could that be
because FRAME_POINTER looks like:

config FRAME_POINTER
    bool
    depends on !THUMB2_KERNEL
    default y if !ARM_UNWIND || FUNCTION_GRAPH_TRACER
    help
      If you say N here, the resulting kernel will be slightly smaller and
      faster. However, if neither FRAME_POINTER nor ARM_UNWIND are enabled,
      when a problem occurs with the kernel, the information that is
      reported is severely limited.

-- that is, there is no prompt for the "bool" option?  Is this an
either/or choice for ARM_UNWIND vs FRAME_POINTER?

I'm going to try adding a prompt for FRAME_POINTER in my tree and see
if that gives me the ability to enable it.

--wpd

* Bad mode in undefined instruction handler detected
From: Russell King - ARM Linux @ 2016-03-29 18:29 UTC
  To: linux-arm-kernel

On Tue, Mar 29, 2016 at 02:22:29PM -0400, Patrick Doyle wrote:
> Hmmm... inexplicably, compiling with frame pointers enabled doesn't
> seem to be an option for me :-(

Right: the Kconfig system is set up to use a frame pointer whenever
the configuration options allow it, since not having one hurts the
ability to get sensible debugging output from kernel crashes.

> with no option to set CONFIG_FRAME_POINTER in between.  Could that be
> because FRAME_POINTER looks like:
> 
> config FRAME_POINTER
>     bool
>     depends on !THUMB2_KERNEL
>     default y if !ARM_UNWIND || FUNCTION_GRAPH_TRACER
>     help
>       If you say N here, the resulting kernel will be slightly smaller and
>       faster. However, if neither FRAME_POINTER nor ARM_UNWIND are enabled,
>       when a problem occurs with the kernel, the information that is
>       reported is severely limited.
> 
> -- that is, there is no prompt for the "bool" option?  Is this an
> either/or choice for ARM_UNWIND vs FRAME_POINTER?

THUMB2_KERNEL is an option, which I believe you already have disabled.
So, the offending option is probably ARM_UNWIND.  You are provided with
the ARM_UNWIND option, and you need to say no to that.  Don't try to
build a kernel with frame pointers and unwinding together - it'll
probably fail, or if you succeed in building it, it probably won't
work correctly since that's a combination we don't support.
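
The quoted Kconfig entry has no prompt, so its value follows from its
"depends on" and "default" lines alone. The function below is a simplified
model of just that entry, written for illustration; it is not how Kconfig
itself is implemented and it ignores any other definition or "select" of
FRAME_POINTER that a given tree may contain.

```python
def frame_pointer_enabled(thumb2_kernel, arm_unwind, function_graph_tracer):
    """Model the promptless FRAME_POINTER entry quoted above."""
    # depends on !THUMB2_KERNEL
    if thumb2_kernel:
        return False
    # default y if !ARM_UNWIND || FUNCTION_GRAPH_TRACER
    return (not arm_unwind) or function_graph_tracer

# ARM_UNWIND=y, no function graph tracer: frame pointers stay off.
print(frame_pointer_enabled(False, True, False))
# ARM_UNWIND=n: frame pointers are turned on by the default.
print(frame_pointer_enabled(False, False, False))
```

In this model, saying no to ARM_UNWIND is what flips FRAME_POINTER on for a
non-Thumb-2 kernel, without any prompt being needed.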

* Bad mode in undefined instruction handler detected
From: Patrick Doyle @ 2016-03-29 19:18 UTC
  To: linux-arm-kernel

On Tue, Mar 29, 2016 at 2:29 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> THUMB2_KERNEL is an option, which I believe you already have disabled.
> So, the offending option is probably ARM_UNWIND.  You are provided with
> the ARM_UNWIND option, and you need to say no to that.  Don't try to
> build a kernel with frame pointers and unwinding together - it'll
> probably fail, or if you succeed in building it, it probably won't
> work correctly since that's a combination we don't support.
>
Right, I can confirm that building with both produces a kernel that
doesn't boot (for me anyway).  Unfortunately, enabling FRAME_POINTER
instead of ARM_UNWIND produced a kernel which, when it crashed, only
told me:

Unable to handle kernel paging request at virtual address e5912144
pgd = c0004000
[e5912144] *pgd=00000000
Internal error: Oops: 5 [#1] ARM

Is there something else I need to enable to get a more verbose dump?
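
Even this short dump carries some information. The number after "Oops:" is
the fault status value, and the sketch below decodes it on the assumption
that the kernel uses the classic short-descriptor (non-LPAE) page table
format. The table covers only the common status codes and the names are
invented for this example.

```python
FAULT_STATUS = {
    0b00001: "alignment fault",
    0b00101: "translation fault (section)",
    0b00111: "translation fault (page)",
    0b01001: "domain fault (section)",
    0b01011: "domain fault (page)",
    0b01101: "permission fault (section)",
    0b01111: "permission fault (page)",
}

def decode_fsr(fsr):
    # The status field is split: bit 10 is its top bit, bits [3:0] the rest.
    status = ((fsr >> 10) & 1) << 4 | (fsr & 0xF)
    return {
        "status": FAULT_STATUS.get(status, "other"),
        "write": bool(fsr & (1 << 11)),   # WnR: set for a write access
    }

fault = decode_fsr(0x5)
print(fault)
```

Under that assumption, "Oops: 5" is a read that hit a first-level (section)
translation fault, which fits the "*pgd=00000000" line: nothing is mapped at
0xe5912144.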

--wpd

* Bad mode in undefined instruction handler detected
From: Russell King - ARM Linux @ 2016-03-29 20:04 UTC
  To: linux-arm-kernel

On Tue, Mar 29, 2016 at 03:18:08PM -0400, Patrick Doyle wrote:
> On Tue, Mar 29, 2016 at 2:29 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > THUMB2_KERNEL is an option, which I believe you already have disabled.
> > So, the offending option is probably ARM_UNWIND.  You are provided with
> > the ARM_UNWIND option, and you need to say no to that.  Don't try to
> > build a kernel with frame pointers and unwinding together - it'll
> > probably fail, or if you succeed in building it, it probably won't
> > work correctly since that's a combination we don't support.
> >
> Right, I can confirm that building with both produces a kernel that
> doesn't boot (for me anyway).  Unfortunately, enabling FRAME_POINTER
> instead of ARM_UNWIND produced a kernel which, when it crashed only
> told me:
> 
> Unable to handle kernel paging request at virtual address e5912144
> pgd = c0004000
> [e5912144] *pgd=00000000
> Internal error: Oops: 5 [#1] ARM
> 
> Is there something else I need to enable to get a more verbose dump?

Sounds like you're getting memory corruption, and this time it's
(possibly) overwriting bits of the kernel responsible for dumping
this data.
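
A tentative side observation, not a conclusion drawn anywhere in this thread:
the faulting address 0xe5912144 happens to be a valid A32 load instruction
encoding. The hypothetical decoder below handles only LDR/STR with a plain
immediate offset, which is enough to show this.

```python
def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR (immediate offset) word, or return None."""
    # Reject the unconditional space and anything that is not bits[27:25]=010.
    if (word >> 28) == 0xF or ((word >> 25) & 0b111) != 0b010:
        return None
    p, u, b, w, l = ((word >> n) & 1 for n in (24, 23, 22, 21, 20))
    if not p or w:
        return None   # pre/post-indexed forms are not handled in this sketch
    op = ("ldr" if l else "str") + ("b" if b else "")
    rn = (word >> 16) & 0xF   # base register
    rt = (word >> 12) & 0xF   # transfer register
    imm = word & 0xFFF
    sign = "" if u else "-"
    return f"{op} r{rt}, [r{rn}, #{sign}{imm:#x}]"

print(decode_ldr_str_imm(0xE5912144))
```

A pointer whose value looks like an instruction word could be a coincidence,
but it would also be consistent with something having been overwritten, so it
may be worth keeping in mind while hunting for the corruption.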

If it only happens when using the camera, it sounds to me (at least)
as though the camera driver is buggy, and that's outside of my
expertise.  That's guess-work though.

I'm afraid that I don't think I can help in this case, sorry.

* Bad mode in undefined instruction handler detected
From: Patrick Doyle @ 2016-03-29 21:19 UTC
  To: linux-arm-kernel

On Tue, Mar 29, 2016 at 4:04 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Mar 29, 2016 at 03:18:08PM -0400, Patrick Doyle wrote:
>> Unable to handle kernel paging request at virtual address e5912144
>> pgd = c0004000
>> [e5912144] *pgd=00000000
>> Internal error: Oops: 5 [#1] ARM
>>
>> Is there something else I need to enable to get a more verbose dump?
>
> Sounds like you're getting memory corruption, and this time it's
> (possibly) overwriting bits of the kernel responsible for dumping
> this data.
>
> If it only happens when using the camera, it sounds to me (at least)
> as though the camera driver is buggy, and that's outside of my
> expertise.  That's guess-work though.
>
> I'm afraid that I don't think I can help in this case, sorry.
Thanks for your help and insight.  It does happen in other contexts
besides the use of the camera.  I'll keep poking around and see what I
can find.

Thanks again.

--wpd
