* strange, spurious seeming vector exception on pxa300
@ 2009-12-01 22:13 Yeasah Pell
2009-12-02 6:00 ` Eric Miao
0 siblings, 1 reply; 7+ messages in thread
From: Yeasah Pell @ 2009-12-01 22:13 UTC (permalink / raw)
To: linux-arm-kernel
Has anybody ever seen vector exceptions happen on an ARM (xscale,
pxa300) without 26-bit mode being used? I have some application and
kernel code which appears to work on most hardware, but we have at least
one board which causes periodic messages:
Unhandled fault: vector exception (0x010) at 0x412c8a90
(I also fudged the fault handler a bit to dump the SPSR: 0x80000010)
These messages correspond with a SEGV being sent to the application. The
code address is always the same, but the instruction in question is just
an ordinary load just like many others surrounding it. It's in a hot
path so it's being run successfully for thousands of iterations before
the problem manifests. Running in gdb and ignoring the SEGV causes the
application to continue normally, so apparently the load is successful
(the particular load operation in question is critical to proper
operation of the app)
The definition of a vector exception as I understand it seems to be at
odds with the context in which the exception is being generated (for one
thing the CPU's not in 26-bit mode, and for another thing the data
address is nowhere near the exception vectors), so it seems like it
might be spurious somehow.
If anybody has any theory how this might happen other than some kind of
hardware fault, please let me know! It's driving me absolutely nuts.
Thanks,
Yeasah Pell
* strange, spurious seeming vector exception on pxa300
2009-12-01 22:13 strange, spurious seeming vector exception on pxa300 Yeasah Pell
@ 2009-12-02 6:00 ` Eric Miao
2009-12-02 6:07 ` Eric Miao
0 siblings, 1 reply; 7+ messages in thread
From: Eric Miao @ 2009-12-02 6:00 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 2, 2009 at 6:13 AM, Yeasah Pell <yeasah@comrex.com> wrote:
> Has anybody ever seen vector exceptions happen on an ARM (xscale, pxa300)
> without 26-bit mode being used? I have some application and kernel code
> which appears to work on most hardware, but we have at least one board which
> causes periodic messages:
>
> Unhandled fault: vector exception (0x010) at 0x412c8a90
>
> (I also fudged the fault handler a bit to dump the SPSR: 0x80000010)
Never had such exceptions. This is weird, SPSR[4] == 1 indicates a 32-bit mode.
* strange, spurious seeming vector exception on pxa300
2009-12-02 6:00 ` Eric Miao
@ 2009-12-02 6:07 ` Eric Miao
2009-12-02 14:40 ` Yeasah Pell
0 siblings, 1 reply; 7+ messages in thread
From: Eric Miao @ 2009-12-02 6:07 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 2, 2009 at 2:00 PM, Eric Miao <eric.y.miao@gmail.com> wrote:
> On Wed, Dec 2, 2009 at 6:13 AM, Yeasah Pell <yeasah@comrex.com> wrote:
>> Has anybody ever seen vector exceptions happen on an ARM (xscale, pxa300)
>> without 26-bit mode being used? I have some application and kernel code
>> which appears to work on most hardware, but we have at least one board which
>> causes periodic messages:
>>
>> Unhandled fault: vector exception (0x010) at 0x412c8a90
>>
>> (I also fudged the fault handler a bit to dump the SPSR: 0x80000010)
>
> Never had such exceptions. This is weird, SPSR[4] == 1 indicates a 32-bit mode.
When the processor is in a 32-bit configuration (PROG32 is active) and
in a 26-bit mode (CPSR[4] == 0),
data access (but not instruction fetches) to the exception vectors
(address 0x0 to 0x1f) causes a data abort.
This is known as a vector exception.
This is what the manual explains; it seems to be something related to 26-bit mode.
What's your compiling environment and flags for your application?
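The condition quoted from the manual can be written down as a small predicate. What follows is only a sketch in Python: the function and constant names are made up for illustration, and it assumes the PROG32 configuration is active. Fed the SPSR and fault address from the original report, it says the documented condition is not met.

```python
# Sketch of the architectural "vector exception" condition quoted above.
# All names are illustrative; they are not taken from the kernel or the manual.

PSR_MODE_32BIT = 1 << 4   # PSR bit 4: set in 32-bit modes, clear in 26-bit modes
VECTORS_END = 0x1F        # the exception vectors occupy 0x00..0x1f


def vector_exception_expected(psr: int, addr: int, is_data_access: bool = True) -> bool:
    """True if an access at addr, made in the mode described by psr,
    meets the documented condition (PROG32 assumed active)."""
    in_26bit_mode = (psr & PSR_MODE_32BIT) == 0
    hits_vectors = 0 <= addr <= VECTORS_END
    return is_data_access and in_26bit_mode and hits_vectors


# Values from the report: SPSR 0x80000010, fault address 0x412c8a90.
print(vector_exception_expected(0x80000010, 0x412C8A90))  # False
```

Both halves of the condition fail for the reported values: bit 4 of the SPSR is set, and the address is far outside 0x0 to 0x1f.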
* strange, spurious seeming vector exception on pxa300
2009-12-02 6:07 ` Eric Miao
@ 2009-12-02 14:40 ` Yeasah Pell
2009-12-02 15:50 ` Russell King - ARM Linux
0 siblings, 1 reply; 7+ messages in thread
From: Yeasah Pell @ 2009-12-02 14:40 UTC (permalink / raw)
To: linux-arm-kernel
Eric Miao wrote:
> On Wed, Dec 2, 2009 at 2:00 PM, Eric Miao <eric.y.miao@gmail.com> wrote:
>
>> On Wed, Dec 2, 2009 at 6:13 AM, Yeasah Pell <yeasah@comrex.com> wrote:
>>
>>> Has anybody ever seen vector exceptions happen on an ARM (xscale, pxa300)
>>> without 26-bit mode being used? I have some application and kernel code
>>> which appears to work on most hardware, but we have at least one board which
>>> causes periodic messages:
>>>
>>> Unhandled fault: vector exception (0x010) at 0x412c8a90
>>>
>>> (I also fudged the fault handler a bit to dump the SPSR: 0x80000010)
>>>
>> Never had such exceptions. This is weird, SPSR[4] == 1 indicates a 32-bit mode.
>>
>
> When the processor is in a 32-bit configuration (PROG32 is active) and
> in a 26-bit mode (CPSR[4] == 0),
> data access (but not instruction fetches) to the exception vectors
> (address 0x0 to 0x1f) causes a data abort.
> This is known as a vector exception.
>
> This is what the manual explains; it seems to be something related to 26-bit mode.
> What's your compiling environment and flags for your application?
>
Hi, Eric -- thanks for the reply.
It's a crosstool-ng generated toolchain w/gcc 4.3.2. The optimization
flags are '-mcpu=xscale -funroll-loops -O3', but it has been observed on
debug builds which lack these flags as well.
There's no 26-bit code in the system that I'm aware of, certainly not in
the application where the exception occurs. As you can see from the
saved CPSR, the processor isn't in 26-bit mode at the time of the
exception anyway. And even if it was, the load is from 0x412c8a90
(etc.), not 0x0-0x1f. From what I've seen in the ARM architecture manual
(mostly the part that you've copied above), this operation should not be
able to cause such an exception, so I'm wondering if there is some
alternate condition that can lead to this kind of exception.
In gdb, things look like this (after the SEGV from the fault is received
by the target):
(gdb) info registers
r0 0x0 0
r1 0x412c8a04 1093437956
r2 0x0 0
r3 0x401c57f8 1075599352
r4 0x4029457c 1076446588
r5 0x9 9
r6 0x40390000 1077477376
r7 0x412c94e0 1093440736
r8 0x40390150 1077477712
r9 0x3d0f00 4001536
r10 0x4037a6bc 1077388988
r11 0x412c8b84 1093438340
r12 0x401d6c20 1075670048
sp 0x412c8a2c 0x412c8a2c
lr 0x4029603c 1076453436
pc 0x400ec47c 0x400ec47c <f1+172>
fps 0x0 0
cpsr 0x60000010 1610612752
(gdb) disassemble 0x400ec47c
Dump of assembler code for function f1:
0x400ec3d0 <f1+0>: mov r12, sp
0x400ec3d4 <f1+4>: push {r4, r5, r6, r7, r8, r9, r10, r11, r12, lr, pc}
0x400ec3d8 <f1+8>: ldr r4, [pc, #3508] ; 0x400ed194 <f1+3524>
0x400ec3dc <f1+12>: sub r11, r12, #4 ; 0x4
0x400ec3e0 <f1+16>: ldr lr, [pc, #3504] ; 0x400ed198 <f1+3528>
0x400ec3e4 <f1+20>: ldr r12, [pc, #3504] ; 0x400ed19c <f1+3532>
0x400ec3e8 <f1+24>: add r3, pc, r4
0x400ec3ec <f1+28>: sub sp, sp, #304 ; 0x130
0x400ec3f0 <f1+32>: str r3, [r11, #-296]
0x400ec3f4 <f1+36>: ldr r4, [r3, r12]
0x400ec3f8 <f1+40>: add lr, r3, lr
0x400ec3fc <f1+44>: ldr r12, [r11, #-296]
0x400ec400 <f1+48>: ldr r3, [pc, #3480] ; 0x400ed1a0 <f1+3536>
0x400ec404 <f1+52>: str r0, [r11, #-244]
0x400ec408 <f1+56>: sub r0, r11, #40 ; 0x28
0x400ec40c <f1+60>: add r3, r12, r3
0x400ec410 <f1+64>: sub r12, r11, #140 ; 0x8c
0x400ec414 <f1+68>: str r4, [r11, #-148]
0x400ec418 <f1+72>: str lr, [r11, #-144]
0x400ec41c <f1+76>: stmib r12, {r3, sp}
0x400ec420 <f1+80>: str r0, [r11, #-140]
0x400ec424 <f1+84>: sub r0, r11, #172 ; 0xac
0x400ec428 <f1+88>: str r1, [r11, #-248]
0x400ec42c <f1+92>: str r2, [r11, #-252]
0x400ec430 <f1+96>: bl 0x400e1c60 <_init+1048>
0x400ec434 <f1+100>: ldr r1, [r11, #-248] ; beginning of "actual" function code
0x400ec438 <f1+104>: cmp r1, #0 ; 0x0 ; this is expected to be always unequal
0x400ec43c <f1+108>: streq r1, [r11, #-228]
0x400ec440 <f1+112>: beq 0x400ec47c <f1+172>
0x400ec444 <f1+116>: ldr r3, [pc, #3416] ; 0x400ed1a4 <f1+3540>
0x400ec448 <f1+120>: ldr r2, [r11, #-296]
0x400ec44c <f1+124>: ldr lr, [pc, #3412] ; 0x400ed1a8 <f1+3544>
0x400ec450 <f1+128>: mov r0, r1
0x400ec454 <f1+132>: ldr r1, [r2, r3]
0x400ec458 <f1+136>: mov r3, #0 ; 0x0
0x400ec45c <f1+140>: ldr r2, [r2, lr]
0x400ec460 <f1+144>: bl 0x400e3370 <_init+6952>
0x400ec464 <f1+148>: cmp r0, #0 ; 0x0 ; this is expected to be always equal
0x400ec468 <f1+152>: ldrne r12, [r11, #-244]
0x400ec46c <f1+156>: movne r3, #1 ; 0x1
0x400ec470 <f1+160>: str r0, [r11, #-228]
0x400ec474 <f1+164>: strne r3, [r12, #16]
0x400ec478 <f1+168>: strne r3, [r12, #8]
0x400ec47c <f1+172>: ldr r1, [r11, #-244] ; this throws an exception once in many thousand iterations
0x400ec480 <f1+176>: ldr r0, [r1, #16]
...
The compare at 0x400ec438 is expected to be unequal (and the register
state shown above confirms this at the time of the exception), and the
compare at 0x400ec464 is expected to be equal (again the register state
confirms this). So we know the path of execution must have included, for
example, 0x400ec448, which is substantially the same kind of operation as
the one which causes the exception: a plain register load from the same
page in memory.
I noticed that the instruction that throws the exception is a branch
target (of the beq at 0x400ec440). Inserting a nop at the location where
the exception is thrown appears to avoid the problem at any timescale
that I can detect (many hours at least, versus up to a few minutes that
it takes to fail without it) -- but inserting a nop at any other location
in the function doesn't seem effective. Perhaps I will try running this
test with branch prediction disabled -- assuming that doesn't hurt
performance so much that the test cannot be run.
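As a cross-check of the addresses involved, the arithmetic can be sketched in a few lines of Python. The register value and offsets are copied from the dump and disassembly above. The helper name is illustrative, and the 4 KiB page size is an assumption about the mapping rather than something stated in the dump.

```python
# Effective addresses of two loads in f1, using r11 from the register dump.
R11 = 0x412C8B84
FAULT_ADDR = 0x412C8A90   # address printed in the "Unhandled fault" message
PAGE_SHIFT = 12           # assumption: 4 KiB pages


def effective_address(base: int, offset: int) -> int:
    """Base register plus immediate offset, wrapped to 32 bits."""
    return (base + offset) & 0xFFFFFFFF


faulting = effective_address(R11, -244)   # ldr r1, [r11, #-244] at f1+172
earlier = effective_address(R11, -296)    # ldr r2, [r11, #-296] at f1+120

print(hex(faulting), faulting == FAULT_ADDR)               # 0x412c8a90 True
print(hex(earlier))                                        # 0x412c8a5c
same_page = (faulting >> PAGE_SHIFT) == (earlier >> PAGE_SHIFT)
print(same_page)                                           # True
```

The faulting load's address matches the reported fault address exactly, and the earlier load on the same path reads from the same page under the 4 KiB assumption.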
* strange, spurious seeming vector exception on pxa300
2009-12-02 14:40 ` Yeasah Pell
@ 2009-12-02 15:50 ` Russell King - ARM Linux
2009-12-02 16:04 ` Daniel Mack
0 siblings, 1 reply; 7+ messages in thread
From: Russell King - ARM Linux @ 2009-12-02 15:50 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 02, 2009 at 09:40:41AM -0500, Yeasah Pell wrote:
> Eric Miao wrote:
>> On Wed, Dec 2, 2009 at 2:00 PM, Eric Miao <eric.y.miao@gmail.com> wrote:
>>> On Wed, Dec 2, 2009 at 6:13 AM, Yeasah Pell <yeasah@comrex.com> wrote:
>>>> Has anybody ever seen vector exceptions happen on an ARM (xscale, pxa300)
>>>> without 26-bit mode being used? I have some application and kernel code
>>>> which appears to work on most hardware, but we have at least one board which
>>>> causes periodic messages:
>>>>
>>>> Unhandled fault: vector exception (0x010) at 0x412c8a90
FSR=0x010, which decodes as: Read. Domain 1. Status 0.
>>>> (I also fudged the fault handler a bit to dump the SPSR: 0x80000010)
NZCV=1000, 32-bit, user mode.
> There's no 26-bit code in the system that I'm aware of, certainly not in
> the application where the exception occurs. As you can see from the
> saved CPSR, the processor isn't in 26-bit mode at the time of the
> exception anyway. And even if it was, the load is from 0x412c8a90
> (etc.), not 0x0-0x1f. From what I've seen in the ARM architecture manual
> (mostly the part that you've copied above), this operation should not be
> able to cause such an exception, so I'm wondering if there is some
> alternate condition that can lead to this kind of exception.
>
> In gdb, things look like this (after the SEGV from the fault is received
> by the target):
>
> (gdb) info registers
> r0 0x0 0
> r1 0x412c8a04 1093437956
> r2 0x0 0
> r3 0x401c57f8 1075599352
> r4 0x4029457c 1076446588
> r5 0x9 9
> r6 0x40390000 1077477376
> r7 0x412c94e0 1093440736
> r8 0x40390150 1077477712
> r9 0x3d0f00 4001536
> r10 0x4037a6bc 1077388988
> r11 0x412c8b84 1093438340
0x412c8a90 (the fault value) + 244 = 0x412c8b84, which is the r11 value.
So that's consistent with the load at f1+172 (ldr r1, [r11, #-244]).
> r12 0x401d6c20 1075670048
> sp 0x412c8a2c 0x412c8a2c
> lr 0x4029603c 1076453436
> pc 0x400ec47c 0x400ec47c <f1+172>
> fps 0x0 0
> cpsr 0x60000010 1610612752
CPSR says NZCV=0110 (zero, carry). 32-bit user mode.
Given that the conditions are clearly wrong for a vector exception, I would
say that you're hitting some kind of hardware bug - maybe caused by a dirty
power supply to the PXA, causing it to misbehave?
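The field decoding done by hand above can be reproduced with a short Python sketch. The layout assumed here is the ARMv5 one: PSR flags in bits 31..28 and mode in bits 4..0, FSR domain in bits 7..4 and status in bits 3..0. XScale defines further FSR bits that this sketch ignores, and it makes no attempt to derive the read/write direction. The function names are illustrative.

```python
# Decode the PSR and FSR fields read off by hand in the message above.
# Assumed layout: ARMv5 PSR (NZCV in bits 31..28, mode in bits 4..0) and
# ARMv5 FSR (domain in bits 7..4, status in bits 3..0).

MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abort",
    0b11011: "undef",
    0b11111: "system",
}


def decode_psr(psr: int):
    """Return (NZCV as a bit string, '32-bit' or '26-bit', mode name)."""
    flags = "".join(str((psr >> bit) & 1) for bit in (31, 30, 29, 28))
    mode_bits = psr & 0x1F
    width = "32-bit" if mode_bits & 0x10 else "26-bit"
    return flags, width, MODES.get(mode_bits, "unknown")


def decode_fsr(fsr: int):
    """Return the domain and status fields of a fault status value."""
    return {"domain": (fsr >> 4) & 0xF, "status": fsr & 0xF}


print(decode_psr(0x80000010))  # ('1000', '32-bit', 'user')   SPSR at the fault
print(decode_psr(0x60000010))  # ('0110', '32-bit', 'user')   CPSR seen in gdb
print(decode_fsr(0x010))       # {'domain': 1, 'status': 0}
```

Status 0 is the encoding reported as "vector exception", which is why the kernel message uses that name even though neither PSR value shows a 26-bit mode.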
* strange, spurious seeming vector exception on pxa300
2009-12-02 15:50 ` Russell King - ARM Linux
@ 2009-12-02 16:04 ` Daniel Mack
2009-12-02 16:39 ` Yeasah Pell
0 siblings, 1 reply; 7+ messages in thread
From: Daniel Mack @ 2009-12-02 16:04 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Dec 02, 2009 at 03:50:57PM +0000, Russell King - ARM Linux wrote:
> > r12 0x401d6c20 1075670048
> > sp 0x412c8a2c 0x412c8a2c
> > lr 0x4029603c 1076453436
> > pc 0x400ec47c 0x400ec47c <f1+172>
> > fps 0x0 0
> > cpsr 0x60000010 1610612752
>
> CPSR says NZCV=0110 (zero, carry). 32-bit user mode.
>
> Given that the conditions are clearly wrong for a vector exception, I would
> say that you're hitting some kind of hardware bug - maybe caused by a dirty
> power supply to the PXA, causing it to misbehave?
We've had trouble of that kind as well some months ago with an early
prototype. It wasn't an exception we got, but the bug was clearly
hitting the same code path every single time, so this issue might be
related. Eventually it went away with a new board revision which made wire
patching around the DDR SDRAM unnecessary (i.e., cleaner signal paths).
Strangely enough, I would have expected such flaws to cause processor
misbehaviour of all sorts, totally random and unpredictable. The fact that
it was the same function we always ended up in is still some kind of
miracle I can't explain.
Daniel
* strange, spurious seeming vector exception on pxa300
2009-12-02 16:04 ` Daniel Mack
@ 2009-12-02 16:39 ` Yeasah Pell
0 siblings, 0 replies; 7+ messages in thread
From: Yeasah Pell @ 2009-12-02 16:39 UTC (permalink / raw)
To: linux-arm-kernel
Daniel Mack wrote:
> On Wed, Dec 02, 2009 at 03:50:57PM +0000, Russell King - ARM Linux wrote:
>>
>>
>> Given that the conditions are clearly wrong for a vector exception, I would
>> say that you're hitting some kind of hardware bug - maybe caused by a dirty
>> power supply to the PXA, causing it to misbehave?
>>
Of course I desperately want to believe this, as then I can ignore all
this insanity and move on, but it's a bit difficult to swallow that some
kind of hardware issue would cause a failure that is so consistent (same
instruction every time, regardless of physical/virtual memory locations,
compilation flags, etc.) yet caused absolutely no other perceptible
problems...
> We've had trouble of that kind as well some months ago with an early
> prototype. It wasn't an exception we got, but the bug was clearly
> hitting the same code path every single time, so this issue might be
> related. Eventually it went away with a new board revision which made wire
> patching around the DDR SDRAM unnecessary (i.e., cleaner signal paths).
>
> Strangely enough, I would have expected such flaws to cause processor
> misbehaviour of all sorts, totally random and unpredictable. The fact that
> it was the same function we always ended up in is still some kind of
> miracle I can't explain.
>
> Daniel
>
...and having some anecdotal evidence of that kind of situation
happening is helpful.
Thanks, guys.