Re: bcm33xx port

Linux MIPS Architecture development

* Re: bcm33xx port
       [not found] ` <Pine.LNX.4.55.0806080342310.15673@cliff.in.clinika.pl>
@ 2008-06-08  4:32   ` Luke -Jr
  2008-06-08 12:53     ` Maciej W. Rozycki
  0 siblings, 1 reply; 12+ messages in thread
From: Luke -Jr @ 2008-06-08  4:32 UTC
  To: Maciej W. Rozycki; +Cc: linux-kernel, linux-mips

On Saturday 07 June 2008, Maciej W. Rozycki wrote:
> On Sat, 7 Jun 2008, Luke -Jr wrote:
> > > I'm not too up on MIPS but there're a few things in the log which stand
> > > out to me:
> > >
> > > Determined physical RAM map:
> > >  memory: 00fa0000 @ 00000000 (usable)
> > > User-defined physical RAM map:
> > >  memory: 007a1200 @ 00000000 (usable)
> > >
> > > Can you confirm these sizes and locations for RAM?  Does anything
> > > change if you don't force the size constraint?
> >
> > According to
> > http://research.msrg.utoronto.ca/ece344/2007s/os161/mips.html , MIPS has
> > a pretty odd memory layout, and I'm honestly not sure how Linux usually
> > handles it. I don't feel competent to try and summarize the details on
> > that page here.
>
>  Nothing odd about the memory layout I would say unless you want to go
> beyond 512MB with a 32-bit system which is not the case here.

Well, I always imagined memory layout as being a simple flat range from 0 to 
all_memory_in_system, but this is my first experience with it at such a low 
level, so I guess I don't know what's "odd" or "normal".

> > > CPU frequency 32.00 MHz
> > >
> > > Really?  Is your bootloader setting the CPU up correctly before handing
> > > control to Linux?
> >
> > The CPU is 200 MHz, I believe. The bootloader is just a part of VxWorks,
> > not really meant to boot anything else.
>
>  CFE is pretty much standard for Broadcom platforms and far from being
> specific to VxWorks.

VxWorks, including the boot loader, is not CFE as far as I am aware. If you're 
referring to the "CFEv2" in the log, that appears to be the default of a 
switch (eg, if Linux doesn't detect anything else).

>  I'd be more concerned about:
>
> Calibrating delay loop (skipped)... 0.00 BogoMIPS preset

The calibration code was crashing, so I set it to a fixed 1 value.
Worst case, some code won't delay as long as it wants to, right?

> > > Reserved instruction in kernel code[#1]:
> > >
> > > You're compiling with an appropriate -march switch?
> >
> > I believe so... It appears to be a "reserved instruction" only because of
> > the memory area it tries to access. The instruction in question is "store
> > word", nothing complex.
>
>  You have got something seriously broken -- __bzero traps exceptions on
> stores for graceful recovery as user addresses may be accessed as is the
> case here.  If the reserved instruction exception handler is reached, then
> clearly the store instruction is not the immediate cause.

What else could it be?

* Re: bcm33xx port
  2008-06-08  4:32   ` bcm33xx port Luke -Jr
@ 2008-06-08 12:53     ` Maciej W. Rozycki
  2008-06-08 18:56       ` Luke -Jr
  0 siblings, 1 reply; 12+ messages in thread
From: Maciej W. Rozycki @ 2008-06-08 12:53 UTC
  To: Luke -Jr; +Cc: linux-kernel, linux-mips

On Sat, 7 Jun 2008, Luke -Jr wrote:

> Well, I always imagined memory layout as being a simple flat range from 0 to 
> all_memory_in_system, but this is my first experience with it at such a low 
> level, so I guess I don't know what's "odd" or "normal".

 You mean the layout of virtual memory?  Well, have a look at what the
Alpha defines as sparse memory for something certainly less
straightforward than what MIPS segments are.  Anyway, what's reported here
is physical memory and there is nothing special about it.
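
To put numbers on the two RAM map lines quoted earlier in the thread, here
is a small C sketch that converts the reported sizes.  The function names
are made up for illustration and are not kernel code.

```c
#include <stdint.h>

/* Size from a "memory: SIZE @ BASE" boot log line, in whole KiB. */
uint32_t ram_size_kib(uint32_t size_bytes)
{
	return size_bytes / 1024u;
}

/* First physical address past the end of a region that starts at base. */
uint32_t ram_end(uint32_t base, uint32_t size_bytes)
{
	return base + size_bytes;
}
```

With the values from the log, 0x00fa0000 comes out as 16000 KiB, and
0x007a1200 is exactly 8000000 bytes (7812 whole KiB), both starting at
physical address 0.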

> VxWorks, including the boot loader, is not CFE as far as I am aware. If you're 
> referring to the "CFEv2" in the log, that appears to be the default of a 
> switch (eg, if Linux doesn't detect anything else).

 That message is not included in the standard kernel -- how can I know it
is meaningless?  As I wrote CFE is standard Broadcom firmware.

> The calibration code was crashing, so I set it to a fixed 1 value.
> Worst case, some code won't delay as long as it wants to, right?

 That's grossly wrong.  If you need to preset it for the time being till
you debug calibration, then for a MIPS processor assume one instruction
per clock tick and two instructions per loop -- that may not be entirely
correct, but is a good approximation.  Otherwise you risk peripheral
devices not being driven correctly, with all sorts of nasty results.

> >  You have got something seriously broken -- __bzero traps exceptions on
> > stores for graceful recovery as user addresses may be accessed as is the
> > case here.  If the reserved instruction exception handler is reached, then
> > clearly the store instruction is not the immediate cause.
> 
> What else could it be?

 Well, you've got the system and I have no crystal ball.  You have means
to debug it.  See how control is passed to the RI exception.  Find which 
of the TLB exceptions happens and how it proceeds.  Etc...

  Maciej

* Re: bcm33xx port
  2008-06-08 12:53     ` Maciej W. Rozycki
@ 2008-06-08 18:56       ` Luke -Jr
  2008-06-08 19:53         ` Kevin D. Kissell
  2008-06-08 19:59         ` Maciej W. Rozycki
  0 siblings, 2 replies; 12+ messages in thread
From: Luke -Jr @ 2008-06-08 18:56 UTC
  To: Maciej W. Rozycki; +Cc: linux-kernel, linux-mips

On Sunday 08 June 2008, Maciej W. Rozycki wrote:
> On Sat, 7 Jun 2008, Luke -Jr wrote:
> > VxWorks, including the boot loader, is not CFE as far as I am aware. If
> > you're referring to the "CFEv2" in the log, that appears to be the
> > default of a switch (eg, if Linux doesn't detect anything else).
>
>  That message is not included in the standard kernel -- how can I know it
> is meaningless?  As I wrote CFE is standard Broadcom firmware.

It's not? Guess it came from the bcm63xx patches OpenWrt has that I'm using as 
a base for this... Either way, it seems unlikely something claiming to 
be "VxWorks System Boot" is a standard firmware.

> > The calibration code was crashing, so I set it to a fixed 1 value.
> > Worst case, some code won't delay as long as it wants to, right?
>
>  That's grossly wrong.  If you need to preset it for the time being till
> you debug calibration, then for a MIPS processor assume one instruction
> per clock tick and two instructions per loop -- that may not be entirely
> correct, but is a good approximation.  Otherwise you risk peripheral
> devices are not driven correctly with all sorts of the nasty results.

Meaning this?
	preset_lpj = loops_per_jiffy = 2;

> > >  You have got something seriously broken -- __bzero traps exceptions on
> > > stores for graceful recovery as user addresses may be accessed as is
> > > the case here.  If the reserved instruction exception handler is
> > > reached, then clearly the store instruction is not the immediate cause.
> >
> > What else could it be?
>
>  Well, you've got the system and I have no crystal ball.  You have means
> to debug it.  See how control is passed to the RI exception.  Find which
> of the TLB exceptions happens and how it proceeds.  Etc...

Unfortunately, I don't understand how to "see how control is passed" or
how to find TLB exceptions... Could you point me in the right direction to learn
about this?

On Sunday 08 June 2008, Kevin D. Kissell wrote:
> The universe of possible failures is large.  The two most likely categories
> are (a) configuring the build for a variant of the architecture (64-bit,
> MIPS32R2) that your hardware doesn't support - this is what Maciej was
> referring to,

CONFIG_CPU_MIPS32_R1=y

> and (b) control being transferred to a block of memory that isn't actually
> code, as can happen if exception vectors or global pointers-to-functions
> aren't set up correctly, or if the kernel stack is being corrupted.   When
> you say "the instruction in question is a store word", how do you know that? 

The RI error spits out a bunch of info, including epc which presumably points 
to the instruction causing the problem: ac85ffc0; this is 'sw a1,-64(a0)'

Luke

* Re: bcm33xx port
  2008-06-08 18:56       ` Luke -Jr
@ 2008-06-08 19:53         ` Kevin D. Kissell
  2008-06-08 20:14           ` Maciej W. Rozycki
  2008-06-08 20:20           ` Luke -Jr
  2008-06-08 19:59         ` Maciej W. Rozycki
  1 sibling, 2 replies; 12+ messages in thread
From: Kevin D. Kissell @ 2008-06-08 19:53 UTC
  To: Luke -Jr; +Cc: Maciej W. Rozycki, linux-kernel, linux-mips

Luke -Jr wrote:
> On Sunday 08 June 2008, Kevin D. Kissell wrote:
>   
>> and (b) control being transferred to a block of memory that isn't actually
>> code, as can happen if exception vectors or global pointers-to-functions
>> aren't set up correctly, or if the kernel stack is being corrupted.   When
>> you say "the instruction in question is a store word", how do you know that? 
>>     
>
> The RI error spits out a bunch of info, including epc which presumably points 
> to the instruction causing the problem: ac85ffc0; this is 'sw a1,-64(a0)'
>   
But unless the processor itself is actually defective, there is no way that
a  SW instruction can cause an RI exception.  Sometimes a kernel crash
is so violent that the kernel stack frame cannot be reliably decoded by
the crash dump code, and this would appear to be one of those cases.
I find the address of 0xac85ffc0 to be a bit suspicious, myself.  That's
a kseg1 (non-cacheable identity map) address for physical address
0x0c85ffc0, which would be legitimate (though suspicious) if you had
256MB of RAM, but the boot log quote you posted earlier suggests
that you've only got 16M.  Is there really memory of some kind at
that address?  Are you calling routines in a boot ROM from Linux?
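
The address arithmetic in the paragraph above can be sketched in C as
follows.  The macro and function names are illustrative only, and the 16 MB
figure is the RAM size suggested by the boot log.  Note that later replies
in this thread point out that 0xac85ffc0 is the instruction word rather
than an address, so the sketch only shows how a kseg1 address would map.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIPS32 unmapped kernel segments: 512 MB each. */
#define KSEG0_BASE 0x80000000u	/* cached, unmapped */
#define KSEG1_BASE 0xa0000000u	/* uncached, unmapped */
#define KSEG2_BASE 0xc0000000u	/* mapped through the TLB */

/* True if vaddr lies in the uncached identity-mapped window. */
bool is_kseg1(uint32_t vaddr)
{
	return vaddr >= KSEG1_BASE && vaddr < KSEG2_BASE;
}

/* kseg0 and kseg1 both map to physical memory by dropping the top 3 bits. */
uint32_t unmapped_to_phys(uint32_t vaddr)
{
	return vaddr & 0x1fffffffu;
}

/* True if paddr is backed by RAM that starts at physical address 0. */
bool phys_in_ram(uint32_t paddr, uint32_t ram_bytes)
{
	return paddr < ram_bytes;
}
```

Read as an address, 0xac85ffc0 is in kseg1 and maps to physical
0x0c85ffc0, which is inside 256 MB of RAM but far outside 16 MB.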

Debugging Linux kernel crashes is probably not the best way to learn
the MIPS privileged resource architecture.  I'd strongly recommend
http://www.amazon.com/See-MIPS-Second-Dominic-Sweetman/dp/0120884216/

          Regards,

          Kevin K.

* Re: bcm33xx port
  2008-06-08 18:56       ` Luke -Jr
  2008-06-08 19:53         ` Kevin D. Kissell
@ 2008-06-08 19:59         ` Maciej W. Rozycki
  2008-06-08 20:27           ` Luke -Jr
  1 sibling, 1 reply; 12+ messages in thread
From: Maciej W. Rozycki @ 2008-06-08 19:59 UTC
  To: Luke -Jr; +Cc: linux-kernel, linux-mips

On Sun, 8 Jun 2008, Luke -Jr wrote:

> It's not? Guess it came from the bcm63xx patches OpenWrt has that I'm using as 
> a base for this... Either way, it seems unlikely something claiming to 
> be "VxWorks System Boot" is a standard firmware.

 It would be best if the patches you are referring to got merged with the
mainline.  Otherwise whoever uses them is essentially on their own --
people lack the resources needed to chase random changes out there in
general.

> >  That's grossly wrong.  If you need to preset it for the time being till
> > you debug calibration, then for a MIPS processor assume one instruction
> > per clock tick and two instructions per loop -- that may not be entirely
> > correct, but is a good approximation.  Otherwise you risk peripheral
> > devices are not driven correctly with all sorts of the nasty results.
> 
> Meaning this?
> 	preset_lpj = loops_per_jiffy = 2;

 Not exactly.  Try harder -- this is simple arithmetic and you've got all
the data given above already. :)

> >  Well, you've got the system and I have no crystal ball.  You have means
> > to debug it.  See how control is passed to the RI exception.  Find which
> > of the TLB exceptions happens and how it proceeds.  Etc...
> 
> Unfortunately, I don't understand how to "see how control is passed" or 
> finding TLB exceptions... Could you point me in the right direction to learn 
> about this?

 You can check how the return address is set at the function's entry point 
to see how it's called.

 As to the TLB exceptions -- well, read the MIPS architecture spec first.  
Then -- well, referring you to arch/mips/mm/tlbex.c would be pure cruelty
;) -- but have a look at do_page_fault(), which is where all the
processing important here is done -- the machine code generated from
tlbex.c handles the success paths only.

> > and (b) control being transferred to a block of memory that isn't actually
> > code, as can happen if exception vectors or global pointers-to-functions
> > aren't set up correctly, or if the kernel stack is being corrupted.   When
> > you say "the instruction in question is a store word", how do you know that? 
> 
> The RI error spits out a bunch of info, including epc which presumably points 
> to the instruction causing the problem: ac85ffc0; this is 'sw a1,-64(a0)'

 I have seen that already and wrote these stores in __bzero are protected.  
Perhaps the fixup fails for some reason, but you need to investigate it
and this is why I suggested to see how the RI handler is reached.  Since
this is a known point the failure leads to, you should be able to work
backwards from there quite easily.
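
The idea of "protected" stores can be modelled outside the kernel roughly
as follows.  This is a simplified sketch, not the kernel's implementation:
the struct and function names are hypothetical, any addresses used with it
are made up, and the real kernel keeps a sorted table that it searches from
its fault handling path.  The point is only that a fault at a listed
instruction address resumes at the recorded fixup address, while a fault
anywhere else is a genuine kernel error.

```c
#include <stddef.h>
#include <stdint.h>

/* One protected instruction and the address to resume at if it faults. */
struct ex_entry {
	uint32_t insn;
	uint32_t fixup;
};

/* Linear search for the faulting EPC; returns NULL if it is not listed. */
const struct ex_entry *find_fixup(const struct ex_entry *tbl, size_t n,
				  uint32_t epc)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == epc)
			return &tbl[i];
	return NULL;
}

/*
 * Returns 1 and redirects *epc to the fixup code if the faulting
 * instruction is protected; returns 0 and leaves *epc alone otherwise.
 */
int try_fixup(const struct ex_entry *tbl, size_t n, uint32_t *epc)
{
	const struct ex_entry *e = find_fixup(tbl, n, *epc);

	if (!e)
		return 0;
	*epc = e->fixup;
	return 1;
}
```

If the lookup succeeds, the store simply appears to fail gracefully; ending
up in the RI handler instead suggests something went wrong before or during
this step.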

  Maciej

* Re: bcm33xx port
  2008-06-08 19:53         ` Kevin D. Kissell
@ 2008-06-08 20:14           ` Maciej W. Rozycki
  2008-06-08 20:20           ` Luke -Jr
  1 sibling, 0 replies; 12+ messages in thread
From: Maciej W. Rozycki @ 2008-06-08 20:14 UTC
  To: Kevin D. Kissell; +Cc: Luke -Jr, linux-kernel, linux-mips

On Sun, 8 Jun 2008, Kevin D. Kissell wrote:

> > The RI error spits out a bunch of info, including epc which presumably points 
> > to the instruction causing the problem: ac85ffc0; this is 'sw a1,-64(a0)'
> >   
> But unless the processor itself is actually defective, there is no way that
> a  SW instruction can cause an RI exception.  Sometimes a kernel crash
> is so violent that the kernel stack frame cannot be reliably decoded by
> the crash dump code, and this would appear to be one of those cases.
> I find the address of 0xac85ffc0 to be a bit suspicious, myself.  That's
> a kseg1 (non-cacheable identity map) address for physical address
> 0x0c85ffc0, which would be legitimate (though suspicious) if you had
> 256MB of RAM, but the boot log quote you posted earlier suggests
> that you've only got 16M.  Is there really memory of some kind at
> that address?  Are you calling routines in a boot ROM from Linux?

 Well, 0xac85ffc0 is the instruction word corresponding to 'sw a1,-64(a0)'.
:)  The actual address of the failure is apparently 0x004e010c, which is
pretty much a standard location somewhere within a user executable proper.
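
For anyone wanting to check the decoding by hand, the word can be split
into the MIPS I-type fields as sketched below.  The struct and function
names are illustrative, not taken from any existing tool.

```c
#include <stdint.h>

/* Fields of a MIPS I-type instruction (loads, stores, immediates). */
struct mips_itype {
	unsigned opcode;	/* bits 31..26 */
	unsigned rs;		/* bits 25..21: base register */
	unsigned rt;		/* bits 20..16: register being stored */
	int imm;		/* bits 15..0, sign-extended offset */
};

struct mips_itype decode_itype(uint32_t word)
{
	struct mips_itype d;

	d.opcode = word >> 26;
	d.rs = (word >> 21) & 0x1f;
	d.rt = (word >> 16) & 0x1f;
	d.imm = (int)(word & 0xffff);
	if (d.imm & 0x8000)
		d.imm -= 0x10000;	/* sign-extend the 16-bit offset */
	return d;
}
```

For 0xac85ffc0 this yields opcode 0x2b (sw), rs = 4 (a0), rt = 5 (a1) and
an offset of -64, i.e. 'sw a1,-64(a0)'.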

  Maciej

* Re: bcm33xx port
  2008-06-08 19:53         ` Kevin D. Kissell
  2008-06-08 20:14           ` Maciej W. Rozycki
@ 2008-06-08 20:20           ` Luke -Jr
  1 sibling, 0 replies; 12+ messages in thread
From: Luke -Jr @ 2008-06-08 20:20 UTC
  To: Kevin D. Kissell; +Cc: Maciej W. Rozycki, linux-kernel, linux-mips

On Sunday 08 June 2008, Kevin D. Kissell wrote:
> Luke -Jr wrote:
> > On Sunday 08 June 2008, Kevin D. Kissell wrote:
> >> and (b) control being transferred to a block of memory that isn't
> >> actually code, as can happen if exception vectors or global
> >> pointers-to-functions aren't set up correctly, or if the kernel stack is
> >> being corrupted.   When you say "the instruction in question is a store
> >> word", how do you know that?
> >
> > The RI error spits out a bunch of info, including epc which presumably
> > points to the instruction causing the problem: ac85ffc0; this is 'sw
> > a1,-64(a0)'
>
> But unless the processor itself is actually defective, there is no way that
> a  SW instruction can cause an RI exception. Sometimes a kernel crash 
> is so violent that the kernel stack frame cannot be reliably decoded by
> the crash dump code, and this would appear to be one of those cases.

In that case, wouldn't the "kernel stack" appear to be complete nonsense?
Yet the stack in this case is quite logical and consistent. Furthermore, if I 
skip the bzero stuff (by commenting out the call), it will crash shortly 
thereafter when the ELF loader attempts to write to it in another way.
Is it very unlikely that the bcm3345 is simply raising the wrong exception (or 
perhaps Linux is misinterpreting the exception)?

> I find the address of 0xac85ffc0 to be a bit suspicious, myself.  That's
> a kseg1 (non-cacheable identity map) address for physical address
> 0x0c85ffc0, which would be legitimate (though suspicious) if you had
> 256MB of RAM, but the boot log quote you posted earlier suggests
> that you've only got 16M.  Is there really memory of some kind at
> that address?  Are you calling routines in a boot ROM from Linux?

ac85ffc0 is the instruction for 'sw a1,-64(a0)', not an address.
The board has only 8 MB of RAM, as best I can tell from looking up the RAM
chip (hynix KOREA HY57V641620HG 0229A T-7).

> Debugging Linux kernel crashes is probably not the best way to learn
> the MIPS privileged resource architecture.  I'd strongly recommend
> http://www.amazon.com/See-MIPS-Second-Dominic-Sweetman/dp/0120884216/

Can you recommend any gratis materials to read? I don't have room in my budget 
to spend money on this hobby right now..

Luke

* Re: bcm33xx port
  2008-06-08 19:59         ` Maciej W. Rozycki
@ 2008-06-08 20:27           ` Luke -Jr
  2008-06-08 22:13             ` Maciej W. Rozycki
  0 siblings, 1 reply; 12+ messages in thread
From: Luke -Jr @ 2008-06-08 20:27 UTC
  To: Maciej W. Rozycki; +Cc: linux-kernel, linux-mips

On Sunday 08 June 2008, Maciej W. Rozycki wrote:
> On Sun, 8 Jun 2008, Luke -Jr wrote:
> > the bcm63xx patches OpenWrt has that I'm using as a base for this...
>
>  It would be best if the patches you are referring to got merged with the
> mainline.  Otherwise whoever uses them is essentially on their own --
> people lack the resources needed to chase random changes out there in
> general.

Is merging with mainline something I can help with, being a beginner in this 
area generally and not having any part in writing them?

> > >  That's grossly wrong.  If you need to preset it for the time being
> > > till you debug calibration, then for a MIPS processor assume one
> > > instruction per clock tick and two instructions per loop -- that may
> > > not be entirely correct, but is a good approximation.  Otherwise you
> > > risk peripheral devices are not driven correctly with all sorts of the
> > > nasty results.
> >
> > Meaning this?
> > 	preset_lpj = loops_per_jiffy = 2;
>
>  Not exactly.  Try harder -- this is simple arithmetic and you've got all
> the data given above already. :)

200 / 2? I'm not really sure what a 'jiffy' is..

> > > and (b) control being transferred to a block of memory that isn't
> > > actually code, as can happen if exception vectors or global
> > > pointers-to-functions aren't set up correctly, or if the kernel stack
> > > is being corrupted.   When you say "the instruction in question is a
> > > store word", how do you know that?
> >
> > The RI error spits out a bunch of info, including epc which presumably
> > points to the instruction causing the problem: ac85ffc0; this is 'sw
> > a1,-64(a0)'
>
>  I have seen that already and wrote these stores in __bzero are protected.
> Perhaps the fixup fails for some reason, but you need to investigate it
> and this is why I suggested to see how the RI handler is reached.  Since
> this is a known point the failure leads to, you should be able to work
> backwards from there quite easily.

Ah, so what you're saying is that perhaps the 'sw' is triggering a TLB 
exception, and the handler for *that* is causing the RI problem?

Thanks,

Luke

* Re: bcm33xx port
  2008-06-08 20:27           ` Luke -Jr
@ 2008-06-08 22:13             ` Maciej W. Rozycki
  2008-06-08 23:36               ` Luke -Jr
  2008-06-09  6:05               ` Luke -Jr
  0 siblings, 2 replies; 12+ messages in thread
From: Maciej W. Rozycki @ 2008-06-08 22:13 UTC
  To: Luke -Jr; +Cc: linux-kernel, linux-mips

On Sun, 8 Jun 2008, Luke -Jr wrote:

> Is merging with mainline something I can help with, being a beginner in this 
> area generally and not having any part in writing them?

 Well, you can certainly serve as a messenger telling them if they want
people to get proper support from upstream maintainers they better merge
sooner rather than later.  Otherwise it is them who should really be
bothered with cases like yours.

 The general principle is: "merge as soon as you can, even if code is
incomplete" as you get more attention and perhaps developers involved as a
result, some free support (e.g. with bulk changes done automatically to
all the relevant bits in the tree) and avoid duplicated work; also when at
the time of the merge you are told to rewrite your code differently.

> >  Not exactly.  Try harder -- this is simple arithmetic and you've got all
> > the data given above already. :)
> 
> 200 / 2? I'm not really sure what a 'jiffy' is..

 Hmm, I would have thought it could be inferred from the code involved or failing
that -- Google...  Well, anyway, a jiffy is a tick of the kernel timer or,
specifically in this context and to be more precise, the interval between
such two consecutive ticks or, in other words, 1/HZ.

> >  I have seen that already and wrote these stores in __bzero are protected.
> > Perhaps the fixup fails for some reason, but you need to investigate it
> > and this is why I suggested to see how the RI handler is reached.  Since
> > this is a known point the failure leads to, you should be able to work
> > backwards from there quite easily.
> 
> Ah, so what you're saying is that perhaps the 'sw' is triggering a TLB 
> exception, and the handler for *that* is causing the RI problem?

 This is almost certainly what happens here.  The pointer involved is a
valid (user) address and is correctly aligned, so you cannot get an
address error exception.  A TLB exception is next on the list to check.

 Of course you cannot rule out I-cache corruption or suchlike, but if I
were you, I would start with simple assumptions first.
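
One way to tell which exception was actually taken is to decode the ExcCode
field (bits 6..2) of the CP0 Cause register, assuming a Cause value is
available from the register dump.  A minimal sketch with a hypothetical
helper name, covering only some of the codes defined by the MIPS32
architecture:

```c
#include <stdint.h>

/* Map the ExcCode field of a CP0 Cause value to its mnemonic. */
const char *mips_exccode_name(uint32_t cause)
{
	switch ((cause >> 2) & 0x1f) {
	case 0:  return "Int";	/* interrupt */
	case 1:  return "Mod";	/* TLB modification */
	case 2:  return "TLBL";	/* TLB miss or invalid, load or fetch */
	case 3:  return "TLBS";	/* TLB miss or invalid, store */
	case 4:  return "AdEL";	/* address error, load or fetch */
	case 5:  return "AdES";	/* address error, store */
	case 8:  return "Sys";	/* syscall */
	case 9:  return "Bp";	/* breakpoint */
	case 10: return "RI";	/* reserved instruction */
	case 11: return "CpU";	/* coprocessor unusable */
	case 12: return "Ov";	/* arithmetic overflow */
	default: return "other";
	}
}
```

A store to an unmapped user page would be expected to show up as TLBS (or
Mod for a read-only page), whereas the failure reported here arrives as RI.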

  Maciej

* Re: bcm33xx port
  2008-06-08 22:13             ` Maciej W. Rozycki
@ 2008-06-08 23:36               ` Luke -Jr
  2008-06-09  6:40                 ` Geert Uytterhoeven
  2008-06-09  6:05               ` Luke -Jr
  1 sibling, 1 reply; 12+ messages in thread
From: Luke -Jr @ 2008-06-08 23:36 UTC
  To: Maciej W. Rozycki; +Cc: linux-kernel, linux-mips

On Sunday 08 June 2008, Maciej W. Rozycki wrote:
> On Sun, 8 Jun 2008, Luke -Jr wrote:
> > Is merging with mainline something I can help with, being a beginner in
> > this area generally and not having any part in writing them?
>
>  Well, you can certainly serve as a messenger telling them if they want
> people to get proper support from upstream maintainers they better merge
> sooner rather than later.

Apparently the reason for lack of merge is due to missing (proprietary?) 
drivers for DSL, Ethernet, and WiFi on the bcm63xx platform. I'll pass on 
the "incomplete is ok" message, though, and hopefully that will help :)

>  The general principle is: "merge as soon as you can, even if code is
> incomplete" as you get more attention and perhaps developers involved as a
> result, some free support (e.g. with bulk changes done automatically to
> all the relevant bits in the tree) and avoid duplicated work; also when at
> the time of the merge you are told to rewrite your code differently.

Does this apply even to my trivial/barely begun attempts so far? When bcm63xx 
gets merged, should I be planning to merge my stuff even before it boots?

> > >  Not exactly.  Try harder -- this is simple arithmetic and you've got
> > > all the data given above already. :)
> >
> > 200 / 2? I'm not really sure what a 'jiffy' is..
>
>  Hmm, I have thought it can be inferred from the code involved or failing
> that -- Google...  Well, anyway, a jiffy is a tick of the kernel timer or,
> specifically in this context and to be more precise, the interval between
> such two consecutive ticks or, in other words, 1/HZ.

jiffy = 1 / 200000 HZ = 0.000005 sec/tick
loop = 200000 instructions / 2 instructions per loop = 100000 loops/sec

So 0.00000000005 loops per jiffy? But it can't be, since loops_per_jiffy isn't 
floating point... :/

> > >  I have seen that already and wrote these stores in __bzero are
> > > protected. Perhaps the fixup fails for some reason, but you need to
> > > investigate it and this is why I suggested to see how the RI handler is
> > > reached.  Since this is a known point the failure leads to, you should
> > > be able to work backwards from there quite easily.
> >
> > Ah, so what you're saying is that perhaps the 'sw' is triggering a TLB
> > exception, and the handler for *that* is causing the RI problem?
>
>  This is almost certain what happens here.  The pointer involved is a
> valid (user) address and is correctly aligned, so you cannot get an
> address error exception.  A TLB exception is next on the list to check.

Is there an easy way to printk out a complete trace of the exception stack?

* Re: bcm33xx port
  2008-06-08 22:13             ` Maciej W. Rozycki
  2008-06-08 23:36               ` Luke -Jr
@ 2008-06-09  6:05               ` Luke -Jr
  1 sibling, 0 replies; 12+ messages in thread
From: Luke -Jr @ 2008-06-09  6:05 UTC
  To: Maciej W. Rozycki; +Cc: linux-kernel, linux-mips

On Sunday 08 June 2008, Maciej W. Rozycki wrote:
> On Sun, 8 Jun 2008, Luke -Jr wrote:
> > >  I have seen that already and wrote these stores in __bzero are
> > > protected. Perhaps the fixup fails for some reason, but you need to
> > > investigate it and this is why I suggested to see how the RI handler is
> > > reached.  Since this is a known point the failure leads to, you should
> > > be able to work backwards from there quite easily.
> >
> > Ah, so what you're saying is that perhaps the 'sw' is triggering a TLB
> > exception, and the handler for *that* is causing the RI problem?
>
>  This is almost certain what happens here.  The pointer involved is a
> valid (user) address and is correctly aligned, so you cannot get an
> address error exception.  A TLB exception is next on the list to check.

I added some code to do_ri:
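	/* Kernel-mode RI: dump a raw backtrace from the current stack. */
	if (unlikely(!user_mode(regs))) {
		unsigned long sp;	/* current $sp (reg29), not an EPC */

		asm volatile("move %0, $sp" : "=r"(sp));
		printk("----- LJR -------\n");
		show_raw_backtrace(sp);
		printk("----- LJRx-------\n");
	}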

Which gave me some potentially useful info:
	----- LJR -------
	Call Trace:
	[<80011460>] ret_from_exception+0x0/0x24
	[<80069de4>] vma_link+0x48/0x114
	[<8001b1f0>] blast_icache16+0x0/0xec
	[<800aa27c>] padzero+0x5c/0x74
	[<800c6774>] __bzero+0x38/0x164
	[<800ab04c>] load_elf_binary+0x948/0x145c
	[<800aac6c>] load_elf_binary+0x568/0x145c
	[<80083b80>] __path_lookup_intent_open+0x60/0xe4
	[<80083b50>] __path_lookup_intent_open+0x30/0xe4
	[<80080044>] permission+0x10c/0x148
	[<8007bfd4>] search_binary_handler+0x78/0x18c
	[<800aa15c>] load_script+0x25c/0x270
	[<800aa148>] load_script+0x248/0x270
	[<800aa7b4>] load_elf_binary+0xb0/0x145c
	[<8007c204>] get_arg_page+0x4c/0xc4
	[<8001cab4>] r4k_flush_cache_page+0x1c/0x28
	[<8007bfd4>] search_binary_handler+0x78/0x18c
	[<8007e004>] do_execve+0x18c/0x258
	[<8007dfe4>] do_execve+0x16c/0x258
	[<80081074>] getname+0x24/0x118
	[<8001570c>] sys_execve+0x4c/0x78
	[<80030610>] release_console_sem+0x114/0x358
	[<80018410>] stack_done+0x20/0x3c
	[<80031038>] vprintk+0x368/0x448
	[<8007554c>] get_unused_fd_flags+0x60/0x184
	[<80081074>] getname+0x24/0x118
	[<80010478>] init_post+0x60/0xe8
	[<80015584>] kernel_execve+0x8/0x20
	[<800136cc>] kernel_thread_helper+0x10/0x18
	[<800136bc>] kernel_thread_helper+0x0/0x18
	
	----- LJRx-------

Too tired to debug further tonight, but hopefully this stack will stand out to 
someone :)

Luke

* Re: bcm33xx port
  2008-06-08 23:36               ` Luke -Jr
@ 2008-06-09  6:40                 ` Geert Uytterhoeven
  0 siblings, 0 replies; 12+ messages in thread
From: Geert Uytterhoeven @ 2008-06-09  6:40 UTC
  To: Luke -Jr; +Cc: Maciej W. Rozycki, linux-kernel, linux-mips

On Sun, 8 Jun 2008, Luke -Jr wrote:
> On Sunday 08 June 2008, Maciej W. Rozycki wrote:
> > On Sun, 8 Jun 2008, Luke -Jr wrote:
> > > >  Not exactly.  Try harder -- this is simple arithmetic and you've got
> > > > all the data given above already. :)
> > >
> > > 200 / 2? I'm not really sure what a 'jiffy' is..
> >
> >  Hmm, I have thought it can be inferred from the code involved or failing
> > that -- Google...  Well, anyway, a jiffy is a tick of the kernel timer or,
> > specifically in this context and to be more precise, the interval between
> > such two consecutive ticks or, in other words, 1/HZ.
                                                     ^^
Look at CONFIG_HZ, which is probably 100, 250, or 1000.

> jiffy = 1 / 200000 HZ = 0.000005 sec/tick
> loop = 200000 instructions / 2 instructions per loop = 100000 loops/sec
> 
> So 0.00000000005 loops per jiffy? But it can't be, since loops_per_jiffy isn't 
> floating point... :/

So loops_per_jiffy is approx. CPU clock frequency / CONFIG_HZ.
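
Putting the hints in this thread together (one instruction per clock tick,
two instructions per delay loop, and a jiffy being 1/HZ seconds), the
preset can be estimated as sketched below.  This is only a sketch:
estimate_lpj is a hypothetical helper, the 200 MHz clock is the figure
believed earlier in the thread, and the right HZ is whatever CONFIG_HZ is
set to in the kernel configuration.

```c
#include <stdint.h>

/*
 * Estimate loops_per_jiffy for a simple delay loop.
 *   cpu_hz         CPU clock in Hz (assumes one instruction per tick)
 *   insns_per_loop instructions executed per loop iteration
 *   hz             kernel timer frequency, i.e. CONFIG_HZ
 */
uint32_t estimate_lpj(uint32_t cpu_hz, uint32_t insns_per_loop, uint32_t hz)
{
	return cpu_hz / insns_per_loop / hz;
}
```

For a 200 MHz clock and two instructions per loop this gives 1000000 for
HZ=100, 400000 for HZ=250 and 100000 for HZ=1000.  Dropping the division by
two gives the coarser clock / CONFIG_HZ approximation.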

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds
