linuxppc-dev.lists.ozlabs.org archive mirror
* Why does one "stw" fail with address translation disabled in PPC405EP?
@ 2008-08-22 18:27 Zhou Rui
  2008-08-22 18:42 ` Josh Boyer
  0 siblings, 1 reply; 13+ messages in thread
From: Zhou Rui @ 2008-08-22 18:27 UTC (permalink / raw)
  To: linuxppc-dev

Hi, all:
    I think I have run into an odd problem with the PPC405EP
(PPChameleonEVB board).
    I am running a kernel module which executes a user space
application. The entry point of the application is 0x100000a0. At the
moment when the processor tries to execute the application, 0x100000a0
is not in the TLB (this can be seen from the BDI by printing out the TLB
entries), so DTLBMiss is called automatically, followed by
finish_tlb_load. However, InstructionAccess comes next, and this is
where the problem arises. InstructionAccess starts at 0x400, and after
the instruction "0xc0000434
<InstructionAccess+52>:      stw     r12,64(r11)", a machine check occurs.
This instruction stores the value of r12, which is 0x0 at this
moment, to address 0x03072de0. I am puzzled why this action leads to a
machine check. Is it illegal to store 0x0 to a memory address? Or is
there some other cause of the machine check here?

405EP>r
GPR00: c31c5200 c3072da0 c03a97b0 100000a0
GPR04: c306a000 c306e000 c31c51b8 c306a000
GPR08: c0a64000 c0a64000 40000000 03072da0
GPR12: 00000000 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000
GPR20: 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 00000000
GPR28: 00000000 c31d0000 100000a0 c306a000
CR   : 20000000     MSR: 00001000
405EP>t
    Core number       : 0
    Core state        : debug mode
    Debug entry cause : single step
    Current PC        : 0x00000434
    Current CR        : 0x20000000
    Current MSR       : 0x00001000
    Current LR        : 0xc31c478c
405EP>r
GPR00: c31c5200 c3072da0 c03a97b0 100000a0
GPR04: c306a000 c306e000 c31c51b8 c306a000
GPR08: c0a64000 c0a64000 40000000 03072da0
GPR12: 00000000 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000
GPR20: 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 00000000
GPR28: 00000000 c31d0000 100000a0 c306a000
CR   : 20000000     MSR: 00001000
405EP>t
    Core number       : 0
    Core state        : debug mode
    Debug entry cause : single step
    Current PC        : 0x00000200
    Current CR        : 0x20000000
    Current MSR       : 0x00001000
    Current LR        : 0xc31c478c
405EP>

The error message shows more information. I am also puzzled why the NIP
here is 0x440 rather than 0x434:

Data machine check in kernel mode.
PLB0: BEAR= 0x03072dd4 ACR=   0x00000000 BESR=  0x00c00000
PLB0 to OPB: BEAR= 0x04000000 BESR0= 0x00000000 BESR1= 0x00000000
Oops: machine check, sig: 7 [#1]
NIP: 00000440 LR: C31C478C CTR: 100000A0
REGS: c02a8f50 TRAP: 0202   Not tainted  (2.6.19.2)
MSR: 00021000 <ME>  CR: 20000000  XER: 00000000
TASK = c0399490[987] 'loader.xm' THREAD: c028a000
GPR00: C31C5200 C3072DA0 C0399490 100000A0 C306A000 C306E000 C31C51B8 C306A000
GPR08: C0413000 C0413000 FFFFFFFF 03072DA0 00000000 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 00000000 00000000 C31D0000 100000A0 C306A000
NIP [00000440] 0x440
LR [C31C478C] jump_xm_dom+0x2c/0x48 [xm]
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
Data machine check in kernel mode.
PLB0: BEAR= 0x03072dc0 ACR=   0x00000000 BESR=  0x00800000
PLB0 to OPB: BEAR= 0x04000000 BESR0= 0x00000000 BESR1= 0x00000000
Oops: machine check, sig: 7 [#2]
NIP: C0002EA8 LR: C0002E94 CTR: C31C3094
REGS: c02a8f50 TRAP: 0202   Not tainted  (2.6.19.2)
MSR: 00021030 <ME,IR,DR>  CR: 22002022  XER: 00000000
TASK = c03990d0[905] 'klogd' THREAD: c0e34000
GPR00: C0002E94 C0E35F40 C03990D0 00000FFF 00000001 00000000 00000FFF 00000000
GPR08: 00000000 00000000 00021032 00000000 C0E34000 0804E364 100F0000 00000000
GPR16: 101009E8 1009DF98 100F0000 08046368 08046364 07FEF08C 08046130 08004B74
GPR24: 08004FA4 08046130 08004DB4 08004DB8 08004F70 080466BC 08046358 08046AC0
NIP [C0002EA8] ret_from_syscall+0x14/0x3c
LR [C0002E94] ret_from_syscall+0x0/0x3c
Call Trace:
[C0E35F40] [C0002E94] ret_from_syscall+0x0/0x3c (unreliable)
Instruction dump:
614a9634 5400103a 408000a0 7d4a002e 7d4803a6 39210010 4e800021 7c661b78 
542c0024 3d400002 614a1032 7d400124 <812c0028> 3900fdfc 7120db0f 408201a4

Another question: when 0x100000a0 is missing from the TLB, why are the
kernel functions called in the order DTLBMiss -- finish_tlb_load --
InstructionAccess?

Thanks in advance for any advice!!!

Best Wishes

Zhou Rui
2008-08-22

__________________________________________________
Sign up now for Yahoo's extra-large-capacity free mailbox
http://cn.mail.yahoo.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-22 18:27 Why does one "stw" fail with address translation disabled in PPC405EP? Zhou Rui
@ 2008-08-22 18:42 ` Josh Boyer
  2008-08-23  8:26   ` Zhou Rui
  2008-08-23 21:18   ` Zhou Rui
  0 siblings, 2 replies; 13+ messages in thread
From: Josh Boyer @ 2008-08-22 18:42 UTC (permalink / raw)
  To: Zhou Rui; +Cc: linuxppc-dev

On Fri, Aug 22, 2008 at 08:27:15PM +0200, Zhou Rui wrote:
>Hi, all:
>    I think I meet an odd problem with PPC405EP (PPChameleonEVB Board).

What kernel version are you using?

>    I am running a kernel module which will execute a user space
>application. The entry point of the application is 0x100000a0. At the

That should be the first clue that you are doing it wrong.  Don't do
stuff like that in modules...

>moment when the processor tries to execute the application, 0x100000a0
>is not in TLB (this can be seen from BDI by printing out TLB entries),
>so DTLBMiss is called automatically and then finish_tlb_load. However,
>InstructionAccess is followed and the problem arises here.
>InstructionAccess starts from 0x400, and after instruction "0xc0000434
><InstructionAccess+52>:      stw     r12,64(r11)", machine check occurs.
>This instruction will store the value of r12, which is 0x0 at this
>moment, to address 0x03072de0. I am puzzled why this action leads to
>machine check. Is it illegal to store 0x0 in a memory address? Or is
>there some other cause of the machine check here?

I have no idea if you're using physical or virtual addresses here, so
there isn't much we can do to help you.

Do you have enough DRAM to cover that?  Some of those boards only come
with 32MiB of DRAM.

josh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-22 18:42 ` Josh Boyer
@ 2008-08-23  8:26   ` Zhou Rui
  2008-08-23 22:49     ` Benjamin Herrenschmidt
  2008-08-24 18:55     ` Wolfgang Denk
  2008-08-23 21:18   ` Zhou Rui
  1 sibling, 2 replies; 13+ messages in thread
From: Zhou Rui @ 2008-08-23  8:26 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev



On Fri, 2008-08-22 at 14:42 -0400, Josh Boyer wrote:
> On Fri, Aug 22, 2008 at 08:27:15PM +0200, Zhou Rui wrote:
> >Hi, all:
> >    I think I meet an odd problem with PPC405EP (PPChameleonEVB Board).
> 
> What kernel version are you using?

linux-2.6.19.2 from ELDK4.1
> 
> >    I am running a kernel module which will execute a user space
> >application. The entry point of the application is 0x100000a0. At the
> 
> That should be the first clue that you are doing it wrong.  Don't do
> stuff like that in modules...

Oh, but our project needs a function like that ...
> 
> >moment when the processor tries to execute the application, 0x100000a0
> >is not in TLB (this can be seen from BDI by printing out TLB entries),
> >so DTLBMiss is called automatically and then finish_tlb_load. However,
> >InstructionAccess is followed and the problem arises here.
> >InstructionAccess starts from 0x400, and after instruction "0xc0000434
> ><InstructionAccess+52>:      stw     r12,64(r11)", machine check occurs.
> >This instruction will store the value of r12, which is 0x0 at this
> >moment, to address 0x03072de0. I am puzzled why this action leads to
> >machine check. Is it illegal to store 0x0 in a memory address? Or is
> >there some other cause of the machine check here?
> 
> I have no idea if you're using physical or virtual addresses here, so
> there isn't much we can do to help you.

It is a physical address at this point. Address translation is disabled
automatically (MSR[IR, DR] = [0, 0]) by the TLB Miss exception and the
Instruction Storage exception.
> 
> Do you have enough DRAM to cover that?  Some of those boards only come
> with 32MiB of DRAM.

My board has only 32MB of DRAM. Do you mean 32MB is not enough for that?
The same code runs well on a PPC440EP (Yosemite board), which has
256MB of DRAM. At the beginning of my work, I thought memory size might
be the cause of the failure, but I did not know how to demonstrate it.
So if the 32MB DRAM limit leads to the failure, is there any way to
solve it in the code?

Thank you very much for your reply!

Best Wishes

Zhou Rui
2008-08-23

> 
> josh


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-22 18:42 ` Josh Boyer
  2008-08-23  8:26   ` Zhou Rui
@ 2008-08-23 21:18   ` Zhou Rui
  1 sibling, 0 replies; 13+ messages in thread
From: Zhou Rui @ 2008-08-23 21:18 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev



On Fri, 2008-08-22 at 14:42 -0400, Josh Boyer wrote:
> On Fri, Aug 22, 2008 at 08:27:15PM +0200, Zhou Rui wrote:
> >Hi, all:
> >    I think I meet an odd problem with PPC405EP (PPChameleonEVB Board).
> 
> What kernel version are you using?
> 
> >    I am running a kernel module which will execute a user space
> >application. The entry point of the application is 0x100000a0. At the
> 
> That should be the first clue that you are doing it wrong.  Don't do
> stuff like that in modules...
> 
> >moment when the processor tries to execute the application, 0x100000a0
> >is not in TLB (this can be seen from BDI by printing out TLB entries),
> >so DTLBMiss is called automatically and then finish_tlb_load. However,
> >InstructionAccess is followed and the problem arises here.
> >InstructionAccess starts from 0x400, and after instruction "0xc0000434
> ><InstructionAccess+52>:      stw     r12,64(r11)", machine check occurs.
> >This instruction will store the value of r12, which is 0x0 at this
> >moment, to address 0x03072de0. I am puzzled why this action leads to
> >machine check. Is it illegal to store 0x0 in a memory address? Or is
> >there some other cause of the machine check here?
> 
> I have no idea if you're using physical or virtual addresses here, so
> there isn't much we can do to help you.
> 
> Do you have enough DRAM to cover that?  Some of those boards only come
> with 32MiB of DRAM.

Oh, I think I have just realized that with 32MB of DRAM, physical
addresses must lie between 0x0 and 0x1FFFFFF, but the address 0x03072da0
is outside this range ...
Enlarging the memory is one solution, but I do not think it is
feasible for my board. Is there any solution on the software side? I
hope there is.
Thank you very much!!!

Best Wishes

Zhou Rui
2008-08-23

> 
> josh


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-23  8:26   ` Zhou Rui
@ 2008-08-23 22:49     ` Benjamin Herrenschmidt
  2008-08-24 18:55     ` Wolfgang Denk
  1 sibling, 0 replies; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2008-08-23 22:49 UTC (permalink / raw)
  To: Zhou Rui; +Cc: linuxppc-dev

On Sat, 2008-08-23 at 10:26 +0200, Zhou Rui wrote:
> My board only has 32MB DRAM. Do you mean 32MB is not enough for that?
> The same codes can run well in a PPC440EP (Yosemite Board) which owns
> 256MB DRAM. At the beginning of my work, I thought memory size may be
> the cause of failure. But I did not know how to demonstrate it. So if
> the limitation of 32MB DRAM leads to the failure, are there any methods
> for the codes to solve it?

Well, it looks like the kernel is trying to access memory beyond 32M,
which would mean that the problem is that your kernel port or your
bootloader somehow thinks there is more memory than there really is.

You need to look there... whatever passes the amount of memory to
the kernel at boot needs to be fixed.

Ben.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-23  8:26   ` Zhou Rui
  2008-08-23 22:49     ` Benjamin Herrenschmidt
@ 2008-08-24 18:55     ` Wolfgang Denk
  2008-08-25 19:16       ` Zhou Rui
  1 sibling, 1 reply; 13+ messages in thread
From: Wolfgang Denk @ 2008-08-24 18:55 UTC (permalink / raw)
  To: Zhou Rui; +Cc: linuxppc-dev

Dear Zhou Rui,

In message <1219479992.7565.17.camel@localhost> you wrote:
>
> > >    I am running a kernel module which will execute a user space
> > >application. The entry point of the application is 0x100000a0. At the
> > 
> > That should be the first clue that you are doing it wrong.  Don't do
> > stuff like that in modules...
> 
> Oh, but our project needs a function like that ...

You should really think about this. Why do you think you  need  this?
What  exactly  are  you  trying  to  do?  [Probably  there are better
approaches to solve your problem...]

> It is physical address at this moment. Address translation is disabled
> automatically (MSR[IR, DR] = [0, 0]) because of TLB Miss Exception and
> Instrunction Storage Exception.

Hm.. are you absolutely sure that the 0x100000a0 mentioned above is a
physical address?

> > Do you have enough DRAM to cover that?  Some of those boards only come
> > with 32MiB of DRAM.
> 
> My board only has 32MB DRAM. Do you mean 32MB is not enough for that?

Well, 0x1000'00A0 is above 256 MB, while you  have  only  32  MB  RAM
which is most probably mapped from 0x0000'0000...0x01FF'FFFF... So
what you claim to be a physical address (and I think your claim is
wrong) is far outside available physical memory.

> The same codes can run well in a PPC440EP (Yosemite Board) which owns
> 256MB DRAM. At the beginning of my work, I thought memory size may be
> the cause of failure. But I did not know how to demonstrate it. So if
> the limitation of 32MB DRAM leads to the failure, are there any methods
> for the codes to solve it?

I think you got lost on the wrong track. Please describe  which  task
you  want  to  implement, and there might be another, better approach
for it.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The management question ... is not _whether_ to build a pilot  system
and  throw  it away. You _will_ do that. The only question is whether
to plan in advance to build a throwaway, or to promise to deliver the
throwaway to customers.       - Fred Brooks, "The Mythical Man Month"

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-24 18:55     ` Wolfgang Denk
@ 2008-08-25 19:16       ` Zhou Rui
  2008-08-28 15:53         ` Zhou Rui
  2008-08-31 11:50         ` [Evolves!] " Zhou Rui
  0 siblings, 2 replies; 13+ messages in thread
From: Zhou Rui @ 2008-08-25 19:16 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: linuxppc-dev


Hi,
You may already know the project named XtratuM
(http://www.xtratum.org). I am porting it from x86 to PPC405. The
implementation on PPC440 is basically finished
(ftp://dslab.lzu.edu.cn/pub/xtratum/xtratum-ppc/snapshots/xtratum-ppc-20071205.tar.bz2)
and I know it has been discussed on this mailing list before. XtratuM
is an ADEOS-based nanokernel. It targets real time and is designed to
provide virtual timers, virtual interrupts and memory space separation
for domains. Each domain is loaded by a user space program (the root
domain, by contrast, is a kernel module). The loader loads the PT_LOAD
segments of the domain (a statically linked ELF executable) into memory
and then makes the appropriate system call, passing the loaded data as
a structure, to load the domain via load_domain_sys() of XtratuM. As
the last step of loading the domain, XtratuM jumps to the entry code of
the new domain (an asm-wrapped start() routine), and then everything
should be fine. 0x100000a0 is the entry point of the test domain, and
that is why I need to start execution from it.

Here is my analysis so far of the cause of my problem. Thanks for
pointing out the memory size. Once the XtratuM kernel module is loaded,
its symbols are placed at virtual addresses like 0xc3xxxxxx. In the
normal state address translation is enabled (MSR[IR, DR] = [1, 1]), so
these addresses are fine. However, when loading the domain, the entry
point 0x100000a0 is not in the TLB and has to be loaded, so a Data TLB
Miss exception is raised and DTLBMiss is called. The exception clears
MSR[IR, DR], so address translation is disabled and physical addresses
must be used from this point. To reach something at virtual address
0xc3xxxxxx, we must access a physical address like 0x03xxxxxx. However,
with only 32MB of memory the valid physical address range is 0x0 to
0x1ffffff. Addresses outside this range should therefore not be
accessed during exception handling, but the instructions cannot know
the memory limit in advance and try to access addresses such as
0x03072da0, derived from the virtual address, which leads to the
machine check.
I have tried appending "mem=32M" to the kernel command line, but it did
not help. I think this is because when the kernel is loaded in the
normal state, address translation is enabled and the virtual addresses
are fine. The kernel cannot foresee that there will be a TLB miss
exception and that illegal physical addresses like 0x03xxxxxx may be
accessed.

So any ideas for this problem are welcome.

Thank you very much for your help.

Best Wishes

Zhou Rui
2008-08-25



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-25 19:16       ` Zhou Rui
@ 2008-08-28 15:53         ` Zhou Rui
  2008-08-31 11:50         ` [Evolves!] " Zhou Rui
  1 sibling, 0 replies; 13+ messages in thread
From: Zhou Rui @ 2008-08-28 15:53 UTC (permalink / raw)
  To: Linuxppc-dev


Hi, all:
Well, as described before, the problem happens at "0xc0000434
<InstructionAccess+52>:      stw     r12,64(r11)". At this moment,
address translation is disabled and physical addresses are used, but r11
contains 0x03072da0, a physical address outside the range 0x0 to
0x01ffffff. Tracing backward, I see that the value of r11 comes from r1
via tophys(r11,r1). r1 should hold the stack pointer, whose value has
the form 0xc3xxxxxx when the problem happens.

In this project, we use the concept of a domain for the high-level
OSes running above the XtratuM kernel. Here Linux is the root domain,
and the problem happens when we try to load and switch to a test domain
which just prints some sentences. 0x100000a0 is the entry point of the
test domain, so we need to execute from this address. We define a struct
domain_t for the domain, and the stack pointer is obtained by:
....
  /* DEFAULT_SSTACK_SIZE is 0x1000 */
  d->sstack_st = vmalloc (DEFAULT_SSTACK_SIZE);

  if (!d->sstack_st) {
    destroy_domain (d);
    id = -OUT_OF_MEMORY;
    goto exit_load_domain;
  }

  /* sstack is the new domain's stack pointer that will be moved to r1 */
  d->sstack = (unsigned long *)((unsigned long)d->sstack_st +
                                DEFAULT_SSTACK_SIZE);

After this, sstack gets a value of the form 0xc3xxxxxx.

Shouldn't I use vmalloc here? Or is there any other solution?

Thanks in advance for any advice!

Best Wishes

Zhou Rui
2008-08-28



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-25 19:16       ` Zhou Rui
  2008-08-28 15:53         ` Zhou Rui
@ 2008-08-31 11:50         ` Zhou Rui
  2008-09-01  5:42           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 13+ messages in thread
From: Zhou Rui @ 2008-08-31 11:50 UTC (permalink / raw)
  To: Linuxppc-dev


Hi, all:
    My problem seems to be basically solved.
    We used to call vmalloc() in the memory management part of our
source, but it seems to be the key unreliable point that caused the
problem. vmalloc() always returns virtual addresses whose
corresponding physical addresses are beyond the memory size (there is
only 32MB of DRAM on our 405 board). Once instructions try to access
these illegal physical addresses, a machine check happens.
    We now call kmalloc() instead, and it works basically as
we want. But problems with the memory management still exist, because
there is sometimes a program check exception and always a bad page
state:
....
-bash-3.2# PROGRAM: reason: 0x8000000, nip: 0xc028bf20
Oops: Exception in kernel mode, sig: 4 [#1]
NIP: C028BF20 LR: C028BF20 CTR: C31C6078
REGS: c028be80 TRAP: 0700   Not tainted  (2.6.19.2-eldk-xm.1.0)
MSR: 00029030 <EE,ME,IR,DR>  CR: 00000000  XER: 00000000
TASK = c0228a30[0] 'swapper' THREAD: c028a000
GPR00: 00000000 C028BF30 C0228A30 C034B7B0 C028BF20 00000000 00000001 00000000
GPR08: 00000003 C31D0000 22000082 00029030 2BDD9FE1 C03B3164 0000066F 2B1F1DC8
GPR16: C03B3050 0FFEA478 10010000 C31D0000 C028BEF0 C31CA2E4 00021030 C028A000
GPR24: C028BEF0 C0228B44 C0228468 C03B3050 C028BF10 C31C60C4 00029030 C03B3050
NIP [C028BF20] init_thread_union+0x1f20/0x2000
LR [C028BF20] init_thread_union+0x1f20/0x2000
Call Trace:
[C028BF30] [0FFEA478] 0xffea478 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
Kernel panic - not syncing: Attempted to kill the idle task!
 <0>Rebooting in 180 seconds..

And there is a bad page state:
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Backtrace:
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Bad page state in process 'loader.xm'
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Bad page state in process 'loader.xm'
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: page:c02f0e60 flags:0x00000400 mapping:00000000 mapcount:0 count:1

I will do some tracing to fix these problems.

Could anyone explain the difference between vmalloc() and
kmalloc()? Based on our work, there seems to be a great difference.

Thank you very much!

Best Wishes

Zhou Rui
2008-08-31

On Mon, 2008-08-25 at 21:16 +0200, Zhou Rui wrote:
> Hi,
> I think maybe you have known this project named XtratuM
> (http://www.xtratum.org). I'm porting it from x86 to PPC405. The
> implementation on PPC440 has been basically finished
> (ftp://dslab.lzu.edu.cn/pub/xtratum/xtratum-ppc/snapshots/xtratum-ppc-20071205.tar.bz2) and I know there was discussion about it in this mail list before. XtratuM is an ADEOS based nano kernel. It aims for realtime and is designed to provide virtual timer, virtual interrupt and memory space sperations for domains. Each domain is loaded by a userspace program (instead of the root domain as a kernel module) and the loader will load the domain's (ELF staticly excutable) PT_LOAD section into memory, and then raise a properly system call (passing the structurized loaded data as arguments) to load the domain via load_domain_sys() of XtratuM, and at the last step of loading the domain, xtratum will jump to the entry code of the new domain(asm wrappered start() routine) and then everything should be fine. 0x100000a0 is the entry point of the test domain, and that is why I need to start execution from it.
> 
> I think I can say something of my analysis so far for the cause of my
> problem. Thanks for the mention of memory size. Once the kernel module
> of XtratuM is loaded, the symbols of it are placed to virtual addresses
> like 0xc3xxxxxx. Because in normal state, address translation is enabled
> (MSR[IR, DR] = [1, 1]), these addresses are okay. However, when loading
> the domain, because the entry point 0x100000a0 is not in TLB and it
> should be reloaded, Data TLB Miss Exception arises and DTLBMiss is
> called. The exception clears MSR[IR, DR], so address translation is
> disabled and physical address should be used at this moment. If we want
> something at the virtual address of 0xc3xxxxxx, we must access the
> physical addresses like 0x03xxxxxx. Nevertheless, the limitation of 32MB
> memory makes the valid physical address range from 0x0 to 0x1ffffff.
> Therefore, during the exception handling, the addresses out of range
> should not be accessed, but the instructions cannot know the memory
> limitation in advance and tries to do something in addresses such as
> 0x03072da0 based on the address translation mechanism, which leads to
> machine check.
> I have tried appending "mem=32M" to the kernel command line, but it did
> not help. I think this is because when the kernel is loaded in the normal
> state, address translation is enabled and the virtual addresses are okay.
> The kernel cannot foresee that there is going to be a TLB miss exception
> and that illegal physical addresses like 0x03xxxxxx may be accessed.
> 
> So any ideas for this problem are welcome.
> 
> Thank you very much for taking care.
> 
> Best Wishes
> 
> Zhou Rui
> 2008-08-25
> 
> On Sun, 2008-08-24 at 20:55 +0200, Wolfgang Denk wrote:
> > Dear Zhou Rui,
> > 
> > In message <1219479992.7565.17.camel@localhost> you wrote:
> > >
> > > > >    I am running a kernel module which will execute a user space
> > > > >application. The entry point of the application is 0x100000a0. At the
> > > > 
> > > > That should be the first clue that you are doing it wrong.  Don't do
> > > > stuff like that in modules...
> > > 
> > > Oh, but our project needs a function like that ...
> > 
> > You should really think about this. Why do you think you  need  this?
> > What  exactly  are  you  trying  to  do?  [Probably  there are better
> > approaches to solve your problem...]
> 
> > > It is a physical address at this moment. Address translation is disabled
> > > automatically (MSR[IR, DR] = [0, 0]) because of the TLB Miss Exception and
> > > the Instruction Storage Exception.
> > 
> > Hm.. are you absolutely sure that the 0x100000a0 mentioned above is a
> > physical address?
> > 
> > > > Do you have enough DRAM to cover that?  Some of those boards only come
> > > > with 32MiB of DRAM.
> > > 
> > > My board only has 32MB DRAM. Do you mean 32MB is not enough for that?
> > 
> > Well, 0x1000'00A0 is above 256 MB, while you  have  only  32  MB  RAM
> > which is most probably mapped from 0x0000'0000...0x01FF'FFFF... So
> > what you claim to be a physical address (and I think your claim is
> > wrong) is far outside available physical memory.
> > 
> > > The same code runs well on a PPC440EP (Yosemite board), which has
> > > 256MB DRAM. At the beginning of my work, I thought memory size might be
> > > the cause of the failure, but I did not know how to demonstrate it. If
> > > the 32MB DRAM limit leads to the failure, is there any way for the
> > > code to work around it?
> > 
> > I think you got lost on the wrong track. Please describe  which  task
> > you  want  to  implement, and there might be another, better approach
> > for it.
> > 
> > Best regards,
> > 
> > Wolfgang Denk
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-dev

* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-31 11:50         ` [Evolves!] " Zhou Rui
@ 2008-09-01  5:42           ` Benjamin Herrenschmidt
  2008-09-01  7:22             ` Zhou Rui
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2008-09-01  5:42 UTC (permalink / raw)
  To: Zhou Rui; +Cc: Linuxppc-dev

On Sun, 2008-08-31 at 13:50 +0200, Zhou Rui wrote:
> Hi, all:
>     My problem seems basically solved.
>     We used to call vmalloc() in the memory management part of our
> source, but it seems to be the key unreliable point resulting in the
> problem. vmalloc() always assigns some virtual addresses whose
> corresponding physical addresses are out of memory size (there is only
> 32MB DRAM in our 405 board). Once instructions try to access these
> illegal physical address, machine check happens

That should -never- happen.

Have you verified, as I asked you a while ago, that you are actually
passing the right amount of memory to your kernel from the device-tree
or the bootloader ?

Ben.

* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-09-01  7:22             ` Zhou Rui
@ 2008-09-01  7:17               ` Benjamin Herrenschmidt
  2008-09-01  8:19                 ` Zhou Rui
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2008-09-01  7:17 UTC (permalink / raw)
  To: Zhou Rui; +Cc: Linuxppc-dev

On Mon, 2008-09-01 at 09:22 +0200, Zhou Rui wrote:
> On Mon, 2008-09-01 at 15:42 +1000, Benjamin Herrenschmidt wrote:
> > On Sun, 2008-08-31 at 13:50 +0200, Zhou Rui wrote:
> > > Hi, all:
> > >     My problem seems basically solved.
> > >     We used to call vmalloc() in the memory management part of our
> > > source, but it seems to be the key unreliable point resulting in the
> > > problem. vmalloc() always assigns some virtual addresses whose
> > > corresponding physical addresses are out of memory size (there is only
> > > 32MB DRAM in our 405 board). Once instructions try to access these
> > > illegal physical address, machine check happens
> > 
> > That should -never- happen.
> > 
> > Have you verified, as I asked you a while ago, that you are actually
> > passing the right amount of memory to your kernel from the device-tree
> > or the bootloader ?
> > 
> > Ben.
> 
> I added "mem=32M" to linux command line of the bootloader, and got the
> same machine check.

Paul spotted today that you are actually trying to use __pa() on
addresses returned from vmalloc(). That will not work. Those are virtual
addresses in a non-linear mapping; __pa() only works on the linear
mapping.
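
To illustrate the point with the numbers reported in this thread, here is a
minimal userspace sketch (not kernel code) that models __pa() as a fixed
subtraction. PAGE_OFFSET = 0xC0000000 is assumed to be the usual 32-bit
PowerPC default, RAM_SIZE is the 32MB of DRAM on the board, and the helper
names linear_pa() and pa_in_ram() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: PAGE_OFFSET is the common 32-bit PowerPC default;
 * RAM_SIZE is the 32MB of DRAM reported for the board in this thread. */
#define PAGE_OFFSET 0xC0000000u
#define RAM_SIZE    0x02000000u

/* Hypothetical model of __pa(): a fixed subtraction that is only
 * meaningful for addresses inside the linear mapping. */
static inline uint32_t linear_pa(uint32_t va)
{
    return va - PAGE_OFFSET;
}

/* True if the physical address is backed by DRAM on this board. */
static inline bool pa_in_ram(uint32_t pa)
{
    return pa < RAM_SIZE;
}
```

With these assumptions, linear_pa(0xc3072da0) gives 0x03072da0, which lies
above the last valid physical address 0x01ffffff, while a linear-map address
such as 0xc028bf20 gives 0x0028bf20, which is inside DRAM.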

Ben.

* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-09-01  5:42           ` Benjamin Herrenschmidt
@ 2008-09-01  7:22             ` Zhou Rui
  2008-09-01  7:17               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Zhou Rui @ 2008-09-01  7:22 UTC (permalink / raw)
  To: benh; +Cc: Linuxppc-dev

On Mon, 2008-09-01 at 15:42 +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2008-08-31 at 13:50 +0200, Zhou Rui wrote:
> > Hi, all:
> >     My problem seems basically solved.
> >     We used to call vmalloc() in the memory management part of our
> > source, but it seems to be the key unreliable point resulting in the
> > problem. vmalloc() always assigns some virtual addresses whose
> > corresponding physical addresses are out of memory size (there is only
> > 32MB DRAM in our 405 board). Once instructions try to access these
> > illegal physical address, machine check happens
> 
> That should -never- happen.
> 
> Have you verified, as I asked you a while ago, that you are actually
> passing the right amount of memory to your kernel from the device-tree
> or the bootloader ?
> 
> Ben.

I added "mem=32M" to linux command line of the bootloader, and got the
same machine check.

Best Wishes

Zhou Rui
2008-09-01

* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-09-01  7:17               ` Benjamin Herrenschmidt
@ 2008-09-01  8:19                 ` Zhou Rui
  0 siblings, 0 replies; 13+ messages in thread
From: Zhou Rui @ 2008-09-01  8:19 UTC (permalink / raw)
  To: benh; +Cc: Linuxppc-dev


> Paul spotted today that you are actually trying to use __pa() on
> addresses returned from vmalloc(). That will not work. Those are virtual
> addresses in a non-linear mapping; __pa() only works on the linear
> mapping.
> 
> Ben.

Oh, do you mean that the addresses vmalloc() returns are in a non-linear
mapping? If I use vmalloc(), how do I translate the virtual addresses to
physical ones?
And does kmalloc() return addresses in the linear mapping?
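
As a rough illustration of what "non-linear" means here: a vmalloc() area is
virtually contiguous, but each page may be backed by an unrelated physical
page, so translation needs a per-page lookup rather than a subtraction. The
userspace sketch below models that lookup with a small table; struct
vmap_entry, lookup_pa() and the 4KB page size are assumptions made for the
example. In the kernel itself the lookup is a page-table walk, for which
vmalloc_to_page() exists, whereas kmalloc() memory comes from the linear
mapping.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12u                         /* assumed 4KB pages */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1u)

/* Hypothetical mapping entry: a virtual page number and the physical
 * page frame number that backs it. */
struct vmap_entry {
    uint32_t vpn;
    uint32_t pfn;
};

/* Look up va in the table. Returns 0 and stores the physical address
 * in *pa on success, or -1 if the page is not mapped. */
static int lookup_pa(const struct vmap_entry *tbl, size_t n,
                     uint32_t va, uint32_t *pa)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].vpn == (va >> PAGE_SHIFT)) {
            /* Physical frame plus the byte offset within the page. */
            *pa = (tbl[i].pfn << PAGE_SHIFT) | (va & OFFSET_MASK);
            return 0;
        }
    }
    return -1;
}
```

In this model, two neighbouring virtual pages can map to physical pages that
are far apart, which is why no single offset can translate every address in
the area.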

Thank you!

Best Wishes

Zhou Rui
2008-09-01

end of thread, other threads:[~2008-09-01  7:50 UTC | newest]

Thread overview: 13+ messages
2008-08-22 18:27 Why does one "stw" fail with address translation disabled in PPC405EP? Zhou Rui
2008-08-22 18:42 ` Josh Boyer
2008-08-23  8:26   ` Zhou Rui
2008-08-23 22:49     ` Benjamin Herrenschmidt
2008-08-24 18:55     ` Wolfgang Denk
2008-08-25 19:16       ` Zhou Rui
2008-08-28 15:53         ` Zhou Rui
2008-08-31 11:50         ` [Evolves!] " Zhou Rui
2008-09-01  5:42           ` Benjamin Herrenschmidt
2008-09-01  7:22             ` Zhou Rui
2008-09-01  7:17               ` Benjamin Herrenschmidt
2008-09-01  8:19                 ` Zhou Rui
2008-08-23 21:18   ` Zhou Rui
