* [parisc-linux] 32 bit compiler bug causing kernel crashes
@ 2000-09-15 12:06 John Marvin
2000-09-15 17:43 ` David Huggins-Daines
2000-09-27 21:45 ` David Huggins-Daines
0 siblings, 2 replies; 3+ messages in thread
From: John Marvin @ 2000-09-15 12:06 UTC
To: parisc-linux
I've been investigating a problem that was leading to the kernel executing
a break 0,0 (executing 0) at random times. I've tracked the problem down
to a compiler bug.
When the kernel hit the break instruction, it was always at the same
location in the kernel (in __rpc_execute() in net/sunrpc/sched.c).
Since the 0 wasn't on a cache line boundary, and since it was in
kernel text (which isn't modified after palo loads it), I suspected
that the problem was not a cache flush bug, but was instead either
someone directly scribbling on the kernel, or someone DMAing
into it. In order to eliminate the first possibility, I modified the
kernel VM mappings to make the kernel text read-only (and added some
code in the trap handler to catch the fault instead of passing it to
do_page_fault()). Well, I caught some code in the act, and it was
quite close to the instruction that was being zeroed:
c0205b24 <.L1770>:
1---> c0205b24: 0c 86 12 80 stw r6,0(sr0,r4)
c0205b28: 40 73 01 08 ldb 84(sr0,r3),r19
c0205b2c: 08 b3 02 13 and r19,r5,r19
c0205b30: 86 60 20 0a cmpib,=,n 0,r19,c0205b3c <.L1809>
2---> c0205b34: e8 5f 1b 85 b,l c02058fc <__rpc_atrun+0x40>,rp
c0205b38: 34 42 3f d1 ldo -18(rp),rp
c0205b3c <.L1809>:
3---> c0205b3c: 0d 00 12 80 stw r0,0(sr0,r8)
"1--->" above points to the instruction that was being zeroed in error.
"2--->" above is a branch to schedule() (in kernel/sched.c).
"3--->" above is the instruction caught writing into "1--->" above;
r8 contained 0xc0205b24. So I wondered how it got that value, since
it should be pointing to the current task structure (current->).
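The read-only-text trick used above to catch the culprit can be tried out in user space. The sketch below is only an analogue, assuming a POSIX system with `mmap`, `mprotect`, and `SA_SIGINFO` signal handlers; it is not the kernel change described in the message, and the helper name `catch_stray_write` is hypothetical. A page of dummy bytes stands in for kernel text, is made read-only, and a stray store of zero into it is caught by a SIGSEGV handler that records the faulting address, much as the trap-handler hook did.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS on glibc */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;
static void *volatile fault_addr;

/* SIGSEGV handler: remember the address the offending store targeted,
 * then jump back instead of re-executing the faulting instruction. */
static void on_write_fault(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    fault_addr = info->si_addr;
    siglongjmp(recover_point, 1);
}

/* Hypothetical helper: map one page of dummy "text", make it read-only,
 * then perform a stray store of zero at `offset`.  Returns the page offset
 * at which the fault was reported, or -1 if nothing was caught. */
long catch_stray_write(size_t offset)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (offset >= page)
        return -1;

    unsigned char *text = mmap(NULL, page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (text == MAP_FAILED)
        return -1;
    memset(text, 0xAA, page);                   /* stand-in for real code */
    if (mprotect(text, page, PROT_READ) != 0) { /* text is now read-only  */
        munmap(text, page);
        return -1;
    }

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_sa);

    fault_addr = NULL;
    if (sigsetjmp(recover_point, 1) == 0) {
        volatile unsigned char *target = text + offset;
        *target = 0;                            /* the scribble */
    }
    sigaction(SIGSEGV, &old_sa, NULL);          /* restore previous handler */

    long result = -1;
    /* Report the fault only if the page really was left intact. */
    if (fault_addr != NULL && text[offset] == 0xAA)
        result = (long)((uintptr_t)fault_addr - (uintptr_t)text);
    munmap(text, page);
    return result;
}
```

For example, `catch_stray_write(0x24)` should report a fault at page offset 0x24 while the protected page keeps its original contents, which is the user-space equivalent of catching the store at "3--->" before it lands.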
I thought perhaps there was still some old "r8 hack" code around.
Just yesterday I noticed some cruft in entry.S dealing with r8, but
although superfluous, it was not the problem, since the registers were
saved before r8 was used. I checked all of the trap paths and couldn't
find anywhere that r8 was being trashed.
So then I wondered how r8 was always getting 0xc0205b24, and figured
that the value must be used somewhere. I noticed the .L1770 label and
figured that it must be there for a reason, but I couldn't find a
branch to it. Then I noticed the ldo -18(rp),rp after the branch
to schedule. Ooooh, score a point for gcc. That's an optimization I've
never seen in the HP-UX compiler. schedule() is called at the bottom of
a loop, so this modification of rp in the delay slot causes schedule()
to return to the top of the loop, i.e. <.L1770>: b,l leaves the address
after its delay slot (0xc0205b3c) in rp, and the ldo subtracts 0x18
from it. This means that 0xc0205b24 is in r2 (rp) when schedule() is
called. So I decided to look at schedule() to see if I could get a clue
how the value of r2 was getting transferred into r8. Here is the code
at the beginning of schedule():
c0114404 <schedule>:
c0114404: 08 03 02 41 copy r3,r1
c0114408: 08 1e 02 43 copy sp,r3
4---> c011440c: 0c 68 12 90 stw r8,8(sr0,r3)
c0114410: 08 1e 02 48 copy sp,r8
c0114414: 6b c2 3f d9 stw rp,-14(sr0,sp)
c0114418: 08 08 02 53 copy r8,r19
5---> c011441c: 6f c1 01 00 stw,ma r1,80(sr0,sp)
This is not good. At "4--->" above r8 is being saved above the stack
pointer, i.e. before the stack pointer is incremented at "5--->" above.
This is a compiler bug, and I rebuilt my compiler from top of branch
sources to make sure that it is still there. It is.
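For many of you this is obvious, but just to finish this long-winded
story: r3 was just copied from sp, so "4--->" stores r8 at sp+8, and
the stack grows upward on PA-RISC. If an interrupt comes in between
"4--->" and "5--->" above, the stored value of r8 will get trashed
because the stack pointer is still pointing below it. When an interrupt
comes in, a trap frame will be stored starting at the stack pointer,
and guess what register is going to be saved at sp+8? Yep, r2, which
still holds 0xc0205b24. When schedule() later reloads r8 from that
slot, r8 ends up pointing at <.L1770>, and "3--->" zeroes it.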
John Marvin
jsm@fc.hp.com
* Re: [parisc-linux] 32 bit compiler bug causing kernel crashes
From: David Huggins-Daines @ 2000-09-15 17:43 UTC
To: John Marvin; +Cc: parisc-linux
John Marvin <jsm@udlkern.fc.hp.com> writes:
> I've been investigating a problem that was leading to the kernel executing
> a break 0,0 (executing 0) at random times. I've tracked the problem down
> to a compiler bug.
Cool. Strangely enough, I don't see this in 2.3.99pre8, only in 2.4.
The problem with userspace executing zero at random times is obviously
something else.
> So here is the code at the beginning of schedule():
>
> c0114404 <schedule>:
> c0114404: 08 03 02 41 copy r3,r1
> c0114408: 08 1e 02 43 copy sp,r3
Hm. This bit above puzzles me, because I thought the kernel was
either compiled with -fomit-frame-pointer, or with sufficient
optimization that the frame pointer would be omitted anyway.
> 4---> c011440c: 0c 68 12 90 stw r8,8(sr0,r3)
> c0114410: 08 1e 02 48 copy sp,r8
> c0114414: 6b c2 3f d9 stw rp,-14(sr0,sp)
> c0114418: 08 08 02 53 copy r8,r19
> 5---> c011441c: 6f c1 01 00 stw,ma r1,80(sr0,sp)
Wow. That is some unbelievably screwed-up register spilling. In fact,
I suspect that this may be a reload bug.
--
dhd@linuxcare.com, http://www.linuxcare.com/
Linuxcare. Support for the revolution.
* Re: [parisc-linux] 32 bit compiler bug causing kernel crashes
From: David Huggins-Daines @ 2000-09-27 21:45 UTC
To: John Marvin; +Cc: parisc-linux
John Marvin <jsm@udlkern.fc.hp.com> writes:
> This is not good. At "4--->" above r8 is being saved above the stack
> pointer, i.e. before the stack pointer is incremented at "5--->" above.
> This is a compiler bug, and I rebuilt my compiler from top of branch
> sources to make sure that it is still there. It is.
More follow-up on this. I've isolated the optimization flag that
causes this: -fschedule-insns2 (or, as it's known internally,
'flag_schedule_insns_after_reload'). If you think about it, that
makes perfect sense given the description of the problem, since the
references to memory off the stack pointer are generated in reload.
Somehow I guess this pass has to be made aware that it can't reorder
references to the stack pointer and frame pointer with respect to each
other ... or something.
I'll keep investigating this when I have time but I don't see an
immediate solution to it.
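The ordering constraint described above can be phrased as a simple check: on an upward-growing stack, no ordinary store may target an address at or above the current stack pointer, because an interrupt may put its trap frame there. The sketch below is a hypothetical checker over a hand-written model of the prologue, not how GCC's scheduler represents dependencies; the instruction encoding and all names are invented for illustration, and register copies other than sp into r3 are left out. Given the order from the disassembly it flags `stw r8,8(sr0,r3)`; given an order that keeps that store after the `stw,ma`, it reports no violation.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the prologue instructions involved. */
enum op_kind {
    OP_COPY_SP_TO_FP,  /* copy sp,r3                                     */
    OP_STORE,          /* stw rX,off(base): plain store                  */
    OP_STORE_MA        /* stw,ma rX,off(sp): store at sp, then sp += off */
};

enum base_reg { BASE_SP, BASE_FP };

struct insn {
    enum op_kind kind;
    enum base_reg base;  /* only used by OP_STORE */
    long offset;
};

/* The PA-RISC stack grows upward, so every byte at or above sp is
 * unallocated and may be overwritten by an interrupt's trap frame.
 * Returns the index of the first plain store that lands there, or -1. */
long first_store_above_sp(const struct insn *code, size_t n)
{
    long sp = 0;   /* addresses are relative to sp on entry */
    long fp = 0;   /* r3, meaningful once copied from sp    */

    for (size_t i = 0; i < n; i++) {
        switch (code[i].kind) {
        case OP_COPY_SP_TO_FP:
            fp = sp;
            break;
        case OP_STORE: {
            long addr = (code[i].base == BASE_SP ? sp : fp) + code[i].offset;
            if (addr >= sp)
                return (long)i;
            break;
        }
        case OP_STORE_MA:
            /* store and sp update happen in one instruction: no window */
            sp += code[i].offset;
            break;
        }
    }
    return -1;
}
```

Moving `stw rp,-14(sr0,sp)` before the `stw,ma` in the safe order keeps its offset valid, since it still addresses the caller's frame through the unadjusted sp.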