* Why does one "stw" fail with address translation disabled in PPC405EP?
@ 2008-08-22 18:27 Zhou Rui
  2008-08-22 18:42 ` Josh Boyer
  0 siblings, 1 reply; 13+ messages in thread
From: Zhou Rui @ 2008-08-22 18:27 UTC
  To: linuxppc-dev

Hi, all:

I think I have run into an odd problem with the PPC405EP
(PPChameleonEVB board).

I am running a kernel module that executes a user-space application.
The entry point of the application is 0x100000a0. At the moment the
processor tries to execute the application, 0x100000a0 is not in the
TLB (this can be seen from the BDI by printing out the TLB entries), so
DTLBMiss is entered automatically, followed by finish_tlb_load.
However, InstructionAccess comes next, and this is where the problem
arises. InstructionAccess starts at 0x400, and a machine check occurs
after the instruction

    0xc0000434 <InstructionAccess+52>: stw r12,64(r11)

This instruction stores the value of r12, which is 0x0 at this moment,
to address 0x03072de0. I am puzzled why this leads to a machine check.
Is it illegal to store 0x0 to a memory address? Or is there some other
cause of the machine check here?

405EP>r
GPR00: c31c5200 c3072da0 c03a97b0 100000a0
GPR04: c306a000 c306e000 c31c51b8 c306a000
GPR08: c0a64000 c0a64000 40000000 03072da0
GPR12: 00000000 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000
GPR20: 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 00000000
GPR28: 00000000 c31d0000 100000a0 c306a000
CR   : 20000000     MSR: 00001000
405EP>t
    Core number       : 0
    Core state        : debug mode
    Debug entry cause : single step
    Current PC        : 0x00000434
    Current CR        : 0x20000000
    Current MSR       : 0x00001000
    Current LR        : 0xc31c478c
405EP>r
GPR00: c31c5200 c3072da0 c03a97b0 100000a0
GPR04: c306a000 c306e000 c31c51b8 c306a000
GPR08: c0a64000 c0a64000 40000000 03072da0
GPR12: 00000000 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000
GPR20: 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 00000000
GPR28: 00000000 c31d0000 100000a0 c306a000
CR   : 20000000     MSR: 00001000
405EP>t
    Core number       : 0
    Core state        : debug mode
    Debug entry cause : single step
    Current PC        : 0x00000200
    Current CR        : 0x20000000
    Current MSR       : 0x00001000
    Current LR        : 0xc31c478c
405EP>

The error message shows more information. I am also puzzled why NIP
here is 0x440 and not 0x434:

Data machine check in kernel mode.
PLB0: BEAR= 0x03072dd4 ACR= 0x00000000 BESR= 0x00c00000
PLB0 to OPB: BEAR= 0x04000000 BESR0= 0x00000000 BESR1= 0x00000000
Oops: machine check, sig: 7 [#1]
NIP: 00000440 LR: C31C478C CTR: 100000A0
REGS: c02a8f50 TRAP: 0202 Not tainted (2.6.19.2)
MSR: 00021000 <ME> CR: 20000000 XER: 00000000
TASK = c0399490[987] 'loader.xm' THREAD: c028a000
GPR00: C31C5200 C3072DA0 C0399490 100000A0 C306A000 C306E000 C31C51B8 C306A000
GPR08: C0413000 C0413000 FFFFFFFF 03072DA0 00000000 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 00000000 00000000 C31D0000 100000A0 C306A000
NIP [00000440] 0x440
LR [C31C478C] jump_xm_dom+0x2c/0x48 [xm]
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Data machine check in kernel mode.
PLB0: BEAR= 0x03072dc0 ACR= 0x00000000 BESR= 0x00800000
PLB0 to OPB: BEAR= 0x04000000 BESR0= 0x00000000 BESR1= 0x00000000
Oops: machine check, sig: 7 [#2]
NIP: C0002EA8 LR: C0002E94 CTR: C31C3094
REGS: c02a8f50 TRAP: 0202 Not tainted (2.6.19.2)
MSR: 00021030 <ME,IR,DR> CR: 22002022 XER: 00000000
TASK = c03990d0[905] 'klogd' THREAD: c0e34000
GPR00: C0002E94 C0E35F40 C03990D0 00000FFF 00000001 00000000 00000FFF 00000000
GPR08: 00000000 00000000 00021032 00000000 C0E34000 0804E364 100F0000 00000000
GPR16: 101009E8 1009DF98 100F0000 08046368 08046364 07FEF08C 08046130 08004B74
GPR24: 08004FA4 08046130 08004DB4 08004DB8 08004F70 080466BC 08046358 08046AC0
NIP [C0002EA8] ret_from_syscall+0x14/0x3c
LR [C0002E94] ret_from_syscall+0x0/0x3c
Call Trace:
[C0E35F40] [C0002E94] ret_from_syscall+0x0/0x3c (unreliable)
Instruction dump:
614a9634 5400103a 408000a0 7d4a002e 7d4803a6 39210010 4e800021 7c661b78
542c0024 3d400002 614a1032 7d400124 <812c0028> 3900fdfc 7120db0f 408201a4

Another question: when 0x100000a0 misses in the TLB, why are the kernel
functions called in the order DTLBMiss -> finish_tlb_load ->
InstructionAccess?

Thanks in advance for any advice!

Best Wishes

Zhou Rui
2008-08-22
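The MSR values in the dumps above can be decoded bit by bit to see
whether address translation is on. The following is only a minimal
sketch in C, not code from this thread: the masks are the 40x MSR bit
definitions as they appear in the Linux PowerPC headers, and the
function names msr_translation_on() and msr_print() are made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* MSR bit masks for 40x-class cores (values as in the Linux headers). */
#define MSR_CE 0x00020000u /* critical interrupt enable */
#define MSR_EE 0x00008000u /* external interrupt enable */
#define MSR_PR 0x00004000u /* problem (user) state */
#define MSR_ME 0x00001000u /* machine check enable */
#define MSR_IR 0x00000020u /* instruction address translation */
#define MSR_DR 0x00000010u /* data address translation */

/* Hypothetical helper: true when both IR and DR are set. */
bool msr_translation_on(uint32_t msr)
{
    return (msr & MSR_IR) != 0 && (msr & MSR_DR) != 0;
}

/* Hypothetical helper: print the names of the flags set in an MSR value. */
void msr_print(uint32_t msr)
{
    printf("MSR %08lx:%s%s%s%s%s%s\n", (unsigned long)msr,
           (msr & MSR_CE) ? " CE" : "",
           (msr & MSR_EE) ? " EE" : "",
           (msr & MSR_PR) ? " PR" : "",
           (msr & MSR_ME) ? " ME" : "",
           (msr & MSR_IR) ? " IR" : "",
           (msr & MSR_DR) ? " DR" : "");
}
```

Calling msr_translation_on() with 0x00001000 (the BDI dump) or
0x00021000 (the first oops) returns false, so the failing store ran
with translation off; with 0x00021030 (the second oops) it returns
true.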
* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-22 18:27 Why does one "stw" fail with address translation disabled in PPC405EP? Zhou Rui
@ 2008-08-22 18:42 ` Josh Boyer
  2008-08-23  8:26   ` Zhou Rui
  2008-08-23 21:18   ` Zhou Rui
  0 siblings, 2 replies; 13+ messages in thread
From: Josh Boyer @ 2008-08-22 18:42 UTC
  To: Zhou Rui; +Cc: linuxppc-dev

On Fri, Aug 22, 2008 at 08:27:15PM +0200, Zhou Rui wrote:
>Hi, all:
>    I think I meet an odd problem with PPC405EP (PPChameleonEVB Board).

What kernel version are you using?

>    I am running a kernel module which will execute a user space
>application. The entry point of the application is 0x100000a0. At the

That should be the first clue that you are doing it wrong.  Don't do
stuff like that in modules...

>moment when the processor tries to execute the application, 0x100000a0
>is not in TLB (this can be seen from BDI by printing out TLB entries),
>so DTLBMiss is called automatically and then finish_tlb_load. However,
>InstructionAccess is followed and the problem arises here.
>InstructionAccess starts from 0x400, and after instruction "0xc0000434
><InstructionAccess+52>: stw r12,64(r11)", machine check occurs.
>This instruction will store the value of r12, which is 0x0 at this
>moment, to address 0x03072de0. I am puzzled why this action leads to
>machine check. Is it illegal to store 0x0 in a memory address? Or is
>there some other cause of the machine check here?

I have no idea if you're using physical or virtual addresses here, so
there isn't much we can do to help you.

Do you have enough DRAM to cover that?  Some of those boards only come
with 32MiB of DRAM.

josh
* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-22 18:42 ` Josh Boyer
@ 2008-08-23  8:26   ` Zhou Rui
  2008-08-23 22:49     ` Benjamin Herrenschmidt
  2008-08-24 18:55     ` Wolfgang Denk
  2008-08-23 21:18   ` Zhou Rui
  1 sibling, 2 replies; 13+ messages in thread
From: Zhou Rui @ 2008-08-23 8:26 UTC
  To: Josh Boyer; +Cc: linuxppc-dev

On Fri, 2008-08-22 at 14:42 -0400, Josh Boyer wrote:
> On Fri, Aug 22, 2008 at 08:27:15PM +0200, Zhou Rui wrote:
> >Hi, all:
> >    I think I meet an odd problem with PPC405EP (PPChameleonEVB Board).
>
> What kernel version are you using?

linux-2.6.19.2 from ELDK 4.1

> >    I am running a kernel module which will execute a user space
> >application. The entry point of the application is 0x100000a0. At the
>
> That should be the first clue that you are doing it wrong.  Don't do
> stuff like that in modules...

Oh, but our project needs a function like that ...

> >moment when the processor tries to execute the application, 0x100000a0
> >is not in TLB (this can be seen from BDI by printing out TLB entries),
> >so DTLBMiss is called automatically and then finish_tlb_load. However,
> >InstructionAccess is followed and the problem arises here.
> >InstructionAccess starts from 0x400, and after instruction "0xc0000434
> ><InstructionAccess+52>: stw r12,64(r11)", machine check occurs.
> >This instruction will store the value of r12, which is 0x0 at this
> >moment, to address 0x03072de0. I am puzzled why this action leads to
> >machine check. Is it illegal to store 0x0 in a memory address? Or is
> >there some other cause of the machine check here?
>
> I have no idea if you're using physical or virtual addresses here, so
> there isn't much we can do to help you.

It is a physical address at this moment. Address translation is
disabled automatically (MSR[IR, DR] = [0, 0]) because of the TLB Miss
exception and the Instruction Storage exception.

> Do you have enough DRAM to cover that?  Some of those boards only come
> with 32MiB of DRAM.

My board has only 32MB of DRAM. Do you mean 32MB is not enough for
that? The same code runs well on a PPC440EP (Yosemite board), which has
256MB of DRAM. At the beginning of my work I thought the memory size
might be the cause of the failure, but I did not know how to
demonstrate it. So if the 32MB DRAM limit leads to the failure, is
there any way for the code to work around it?

Thank you very much for your reply!

Best Wishes

Zhou Rui
2008-08-23

> josh
* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-23  8:26   ` Zhou Rui
@ 2008-08-23 22:49     ` Benjamin Herrenschmidt
  2008-08-24 18:55     ` Wolfgang Denk
  1 sibling, 0 replies; 13+ messages in thread
From: Benjamin Herrenschmidt @ 2008-08-23 22:49 UTC
  To: Zhou Rui; +Cc: linuxppc-dev

On Sat, 2008-08-23 at 10:26 +0200, Zhou Rui wrote:
> My board only has 32MB DRAM. Do you mean 32MB is not enough for that?
> The same codes can run well in a PPC440EP (Yosemite Board) which owns
> 256MB DRAM. At the beginning of my work, I thought memory size may be
> the cause of failure. But I did not know how to demonstrate it. So if
> the limitation of 32MB DRAM leads to the failure, are there any methods
> for the codes to solve it?

Well, it looks like the kernel is trying to access memory beyond 32M,
which would mean that the problem is that your kernel port or your
bootloader somewhat thinks there is more memory than there really is.

You need to look there... whatever passes the amount of memory to the
kernel at boot needs to be fixed.

Ben.
* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-23  8:26   ` Zhou Rui
  2008-08-23 22:49     ` Benjamin Herrenschmidt
@ 2008-08-24 18:55     ` Wolfgang Denk
  2008-08-25 19:16       ` Zhou Rui
  1 sibling, 1 reply; 13+ messages in thread
From: Wolfgang Denk @ 2008-08-24 18:55 UTC
  To: Zhou Rui; +Cc: linuxppc-dev

Dear Zhou Rui,

In message <1219479992.7565.17.camel@localhost> you wrote:
>
> > >    I am running a kernel module which will execute a user space
> > >application. The entry point of the application is 0x100000a0. At the
> >
> > That should be the first clue that you are doing it wrong.  Don't do
> > stuff like that in modules...
>
> Oh, but our project needs a function like that ...

You should really think about this. Why do you think you need this?
What exactly are you trying to do? [Probably there are better
approaches to solve your problem...]

> It is physical address at this moment. Address translation is disabled
> automatically (MSR[IR, DR] = [0, 0]) because of TLB Miss Exception and
> Instruction Storage Exception.

Hm.. are you absolutely sure that the 0x100000a0 mentioned above is a
physical address?

> > Do you have enough DRAM to cover that?  Some of those boards only come
> > with 32MiB of DRAM.
>
> My board only has 32MB DRAM. Do you mean 32MB is not enough for that?

Well, 0x1000'00A0 is above 256 MB, while you have only 32 MB RAM
which is most probably mapped from 0x0000'0000...0x01FF'FFFF... So
what you claim to be a physical address (and I think your claim is
wrong) is far outside available physical memory.

> The same codes can run well in a PPC440EP (Yosemite Board) which owns
> 256MB DRAM. At the beginning of my work, I thought memory size may be
> the cause of failure. But I did not know how to demonstrate it. So if
> the limitation of 32MB DRAM leads to the failure, are there any methods
> for the codes to solve it?

I think you got lost on the wrong track. Please describe which task
you want to implement, and there might be another, better approach
for it.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
The management question ... is not _whether_ to build a pilot system
and throw it away. You _will_ do that. The only question is whether
to plan in advance to build a throwaway, or to promise to deliver the
throwaway to customers.     - Fred Brooks, "The Mythical Man Month"
* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-24 18:55     ` Wolfgang Denk
@ 2008-08-25 19:16       ` Zhou Rui
  2008-08-28 15:53         ` Zhou Rui
  2008-08-31 11:50         ` [Evolves!] " Zhou Rui
  0 siblings, 2 replies; 13+ messages in thread
From: Zhou Rui @ 2008-08-25 19:16 UTC
  To: Wolfgang Denk; +Cc: linuxppc-dev

Hi,

You may already know the XtratuM project (http://www.xtratum.org). I'm
porting it from x86 to PPC405. The implementation on PPC440 is
basically finished
(ftp://dslab.lzu.edu.cn/pub/xtratum/xtratum-ppc/snapshots/xtratum-ppc-20071205.tar.bz2)
and I know it has been discussed on this mailing list before.

XtratuM is an ADEOS-based nanokernel. It targets real-time use and is
designed to provide virtual timers, virtual interrupts and memory space
separation for domains. Each domain is loaded by a userspace program
(unlike the root domain, which is a kernel module). The loader loads
the PT_LOAD segments of the domain, a statically linked ELF
executable, into memory and then makes a system call (passing the
loaded data in a structure as arguments) to load the domain via
load_domain_sys() of XtratuM. As the last step of loading the domain,
XtratuM jumps to the entry code of the new domain (an asm-wrapped
start() routine), and then everything should be fine. 0x100000a0 is the
entry point of the test domain, and that is why I need to start
execution from it.

Here is my analysis of the cause so far. Thanks for mentioning the
memory size. Once the XtratuM kernel module is loaded, its symbols are
placed at virtual addresses like 0xc3xxxxxx. In the normal state
address translation is enabled (MSR[IR, DR] = [1, 1]), so these
addresses are okay. However, when loading the domain, the entry point
0x100000a0 is not in the TLB and has to be loaded, so a Data TLB Miss
exception is raised and DTLBMiss is called. The exception clears
MSR[IR, DR], so address translation is disabled and physical addresses
must be used at this point. If we want something at a virtual address
of the form 0xc3xxxxxx, we must access the physical address 0x03xxxxxx.
However, with only 32MB of memory the valid physical address range is
0x0 to 0x1ffffff. During exception handling, addresses outside this
range should therefore not be accessed, but the instructions cannot
know the memory limit in advance and try to access addresses such as
0x03072da0, obtained from the virtual-to-physical conversion, which
leads to the machine check.

I have tried appending "mem=32M" to the kernel command line, but it did
not help. I think this is because the kernel is loaded in the normal
state, where address translation is enabled and the virtual addresses
are okay. The kernel cannot foresee that there will be a TLB miss
exception and that illegal physical addresses like 0x03xxxxxx may be
accessed.

So any ideas about this problem are welcome.

Thank you very much for your help.

Best Wishes

Zhou Rui
2008-08-25
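The address arithmetic in the analysis above can be checked with a few
lines of C. This is a sketch under stated assumptions rather than
kernel code: KERNELBASE is assumed to be the usual ppc32 value
0xc0000000 (consistent with 0xc3xxxxxx becoming 0x03xxxxxx in the
thread), RAM is assumed to start at physical address 0, and the helper
names linear_tophys() and phys_in_ram() are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define KERNELBASE 0xc0000000u            /* assumed ppc32 kernel base */
#define RAM_SIZE   (32u * 1024u * 1024u)  /* 32 MiB, as on the 405 board */

/* The linear conversion a tophys()-style macro performs:
 * strip the kernel base from a kernel virtual address. */
uint32_t linear_tophys(uint32_t vaddr)
{
    return vaddr - KERNELBASE;
}

/* True if a physical address is backed by DRAM at 0 .. RAM_SIZE - 1. */
bool phys_in_ram(uint32_t paddr)
{
    return paddr < RAM_SIZE;
}
```

With r1 = 0xc3072da0, linear_tophys() gives 0x03072da0; adding the
offset 64 from "stw r12,64(r11)" gives 0x03072de0, and phys_in_ram()
rejects it because it lies above 0x01ffffff. By contrast, a stack in
the linear mapping such as 0xc028a000 (the THREAD value in the first
oops) converts to 0x0028a000, which is inside the 32 MiB.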
* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-25 19:16       ` Zhou Rui
@ 2008-08-28 15:53         ` Zhou Rui
  2008-08-31 11:50         ` [Evolves!] " Zhou Rui
  1 sibling, 0 replies; 13+ messages in thread
From: Zhou Rui @ 2008-08-28 15:53 UTC
  To: Linuxppc-dev

Hi, all:

As described before, the problem happens at

    0xc0000434 <InstructionAccess+52>: stw r12,64(r11)

At this moment address translation is disabled and physical addresses
are used, but r11 contains 0x03072da0, a physical address outside the
range 0x0 to 0x01ffffff. Checking backward, I see that the value of r11
comes from r1 via tophys(r11,r1). r1 should hold the stack pointer,
whose value has the form 0xc3xxxxxx when the problem happens.

In this project we use the concept of a domain for the high-level OSes
running on top of the XtratuM kernel. Here Linux is the root domain,
and the problem happens when we try to load and switch to a test domain
that just prints some sentences. 0x100000a0 is the entry point of the
test domain, so we need to execute from this address. We define a
struct domain_t for the domain, and the stack pointer is obtained by:

    ....
    /* DEFAULT_SSTACK_SIZE is 0x1000 */
    d->sstack_st = vmalloc (DEFAULT_SSTACK_SIZE);
    if (!d->sstack_st) {
        destroy_domain (d);
        id = -OUT_OF_MEMORY;
        goto exit_load_domain;
    }

    /* sstack is the new domain's stack pointer that will be moved to r1 */
    d->sstack = (unsigned long *)((unsigned long)d->sstack_st +
                                  DEFAULT_SSTACK_SIZE);

After this, sstack gets a value of the form 0xc3xxxxxx. Should I not
use vmalloc here? Or is there another solution?

Thanks in advance for any advice!

Best Wishes

Zhou Rui
2008-08-28
* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
  2008-08-25 19:16       ` Zhou Rui
  2008-08-28 15:53         ` Zhou Rui
@ 2008-08-31 11:50         ` Zhou Rui
  2008-09-01  5:42           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 13+ messages in thread
From: Zhou Rui @ 2008-08-31 11:50 UTC
  To: Linuxppc-dev

Hi, all:

My problem seems to be basically solved.

We used to call vmalloc() in the memory management part of our source,
and that seems to be the unreliable point that caused the problem.
vmalloc() always returns virtual addresses whose corresponding physical
addresses (by linear conversion) are beyond the memory size (there is
only 32MB of DRAM on our 405 board). Once instructions try to access
these illegal physical addresses, a machine check happens.

We now call kmalloc() instead, and it basically works as we want. But
problems with the memory management remain: a program check exception
occurs sometimes, and a bad page state is always reported:

....
-bash-3.2# PROGRAM: reason: 0x8000000, nip: 0xc028bf20
Oops: Exception in kernel mode, sig: 4 [#1]
NIP: C028BF20 LR: C028BF20 CTR: C31C6078
REGS: c028be80 TRAP: 0700 Not tainted (2.6.19.2-eldk-xm.1.0)
MSR: 00029030 <EE,ME,IR,DR> CR: 00000000 XER: 00000000
TASK = c0228a30[0] 'swapper' THREAD: c028a000
GPR00: 00000000 C028BF30 C0228A30 C034B7B0 C028BF20 00000000 00000001 00000000
GPR08: 00000003 C31D0000 22000082 00029030 2BDD9FE1 C03B3164 0000066F 2B1F1DC8
GPR16: C03B3050 0FFEA478 10010000 C31D0000 C028BEF0 C31CA2E4 00021030 C028A000
GPR24: C028BEF0 C0228B44 C0228468 C03B3050 C028BF10 C31C60C4 00029030 C03B3050
NIP [C028BF20] init_thread_union+0x1f20/0x2000
LR [C028BF20] init_thread_union+0x1f20/0x2000
Call Trace:
[C028BF30] [0FFEA478] 0xffea478 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Kernel panic - not syncing: Attempted to kill the idle task!
<0>Rebooting in 180 seconds..

And there is the bad page:

Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
405 kernel: Backtrace:
Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
405 kernel: Bad page state in process 'loader.xm'
Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
405 kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
405 kernel: Bad page state in process 'loader.xm'
Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
405 kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
405 kernel: page:c02f0e60 flags:0x00000400 mapping:00000000 mapcount:0 count:1

I will do some tracing to fix those problems.

Could anyone explain the difference between vmalloc() and kmalloc()?
Based on our work, there seems to be a great difference.

Thank you very much!

Best Wishes

Zhou Rui
2008-08-31
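On the vmalloc() versus kmalloc() question raised above: kmalloc()
returns memory from the kernel's linear mapping, so a single
subtraction yields its physical address, whereas vmalloc() returns
addresses in a separate virtual range that is mapped page by page, so
the physical address can only be found through the page tables. The toy
model below sketches that difference in plain C. Everything in it is
hypothetical: the start of the vmalloc area, the four physical frame
numbers, and the function names are invented for illustration and are
not taken from this kernel.

```c
#include <stdint.h>

#define PAGE_SIZE    0x1000u
#define PAGE_OFFSET  0xc0000000u  /* assumed start of the linear mapping */
#define RAM_SIZE     0x02000000u  /* 32 MiB */
#define TOY_VM_START 0xc3000000u  /* hypothetical start of the vmalloc area */
#define TOY_VM_PAGES 4u
#define NO_PHYS      0xffffffffu  /* marker for "no mapping" */

/* Hypothetical page table: physical frame backing each toy vmalloc page.
 * The frames are valid RAM but neither contiguous nor derived from the
 * virtual address. */
static const uint32_t toy_vm_frames[TOY_VM_PAGES] = {
    0x01f3a000u, 0x00c41000u, 0x01807000u, 0x0052c000u
};

/* kmalloc-style address: one subtraction gives the physical address. */
uint32_t linear_to_phys(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

/* vmalloc-style address: look the page up, then add the page offset. */
uint32_t toy_vmalloc_to_phys(uint32_t vaddr)
{
    uint32_t index;

    if (vaddr < TOY_VM_START)
        return NO_PHYS;
    index = (vaddr - TOY_VM_START) / PAGE_SIZE;
    if (index >= TOY_VM_PAGES)
        return NO_PHYS;
    return toy_vm_frames[index] + (vaddr & (PAGE_SIZE - 1u));
}
```

In this model, toy_vmalloc_to_phys(0xc3001da0) finds a frame inside the
32 MiB, while linear_to_phys() on the same address produces 0x03001da0,
which no RAM backs. That is the situation a tophys()-style conversion
runs into when the stack pointer comes from vmalloc() and translation
is off.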
* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
From: Benjamin Herrenschmidt @ 2008-09-01 5:42 UTC
To: Zhou Rui; +Cc: Linuxppc-dev

On Sun, 2008-08-31 at 13:50 +0200, Zhou Rui wrote:
> Hi, all:
> My problem seems basically solved.
> We used to call vmalloc() in the memory management part of our
> source, but it seems to be the key unreliable point resulting in the
> problem. vmalloc() always assigns some virtual addresses whose
> corresponding physical addresses are out of memory size (there is only
> 32MB DRAM in our 405 board). Once instructions try to access these
> illegal physical addresses, machine check happens

That should -never- happen.

Have you verified, as I asked you a while ago, that you are actually
passing the right amount of memory to your kernel from the device-tree
or the bootloader ?

Ben.

> Afterwards, we call kmalloc() instead and it works basically as what
> we want. But problems of the memory management still exist because
> there are sometimes program check exceptions and always bad pages:
> ....
> -bash-3.2# PROGRAM: reason: 0x8000000, nip: 0xc028bf20
> Oops: Exception in kernel mode, sig: 4 [#1]
> NIP: C028BF20 LR: C028BF20 CTR: C31C6078
> REGS: c028be80 TRAP: 0700 Not tainted (2.6.19.2-eldk-xm.1.0)
> MSR: 00029030 <EE,ME,IR,DR> CR: 00000000 XER: 00000000
> TASK = c0228a30[0] 'swapper' THREAD: c028a000
> GPR00: 00000000 C028BF30 C0228A30 C034B7B0 C028BF20 00000000 00000001 00000000
> GPR08: 00000003 C31D0000 22000082 00029030 2BDD9FE1 C03B3164 0000066F 2B1F1DC8
> GPR16: C03B3050 0FFEA478 10010000 C31D0000 C028BEF0 C31CA2E4 00021030 C028A000
> GPR24: C028BEF0 C0228B44 C0228468 C03B3050 C028BF10 C31C60C4 00029030 C03B3050
> NIP [C028BF20] init_thread_union+0x1f20/0x2000
> LR [C028BF20] init_thread_union+0x1f20/0x2000
> Call Trace:
> [C028BF30] [0FFEA478] 0xffea478 (unreliable)
> Instruction dump:
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> Kernel panic - not syncing: Attempted to kill the idle task!
> <0>Rebooting in 180 seconds..
>
> And there is bad page:
> Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
> 405 kernel: Backtrace:
> Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
> 405 kernel: Bad page state in process 'loader.xm'
> Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
> 405 kernel: Trying to fix it up, but a reboot is needed
> Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
> 405 kernel: Bad page state in process 'loader.xm'
> Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
> 405 kernel: Trying to fix it up, but a reboot is needed
> Message from syslogd@ at Thu Jan 1 01:32:00 1970 ...
> 405 kernel: page:c02f0e60 flags:0x00000400 mapping:00000000 mapcount:0 count:1
>
> I will do some traces for fixing those problems.
>
> And could anyone explain the difference between vmalloc() and
> kmalloc()? Based on our work, there seems to be a great difference.
>
> Thank you very much!
>
> Best Wishes
>
> Zhou Rui
> 2008-08-31
>
> On Mon, 2008-08-25 at 21:16 +0200, Zhou Rui wrote:
> > Hi,
> > I think maybe you know the project named XtratuM
> > (http://www.xtratum.org). I'm porting it from x86 to PPC405. The
> > implementation on PPC440 is basically finished
> > (ftp://dslab.lzu.edu.cn/pub/xtratum/xtratum-ppc/snapshots/xtratum-ppc-20071205.tar.bz2)
> > and I know there was discussion about it on this mailing list before.
> > XtratuM is an ADEOS-based nano kernel. It aims for realtime and is
> > designed to provide virtual timers, virtual interrupts and memory space
> > separation for domains. Each domain is loaded by a userspace program
> > (unlike the root domain, which is a kernel module). The loader loads the
> > PT_LOAD segments of the domain (a statically linked ELF executable) into
> > memory and then makes a system call (passing the loaded data as
> > structured arguments) to load the domain via load_domain_sys() of
> > XtratuM. As the last step of loading the domain, XtratuM jumps to the
> > entry code of the new domain (an asm-wrapped start() routine), and then
> > everything should be fine. 0x100000a0 is the entry point of the test
> > domain, and that is why I need to start execution from it.
> >
> > I think I can say something of my analysis so far for the cause of my
> > problem. Thanks for the mention of memory size. Once the kernel module
> > of XtratuM is loaded, its symbols are placed at virtual addresses
> > like 0xc3xxxxxx. Because in normal state address translation is enabled
> > (MSR[IR, DR] = [1, 1]), these addresses are okay. However, when loading
> > the domain, because the entry point 0x100000a0 is not in the TLB and it
> > should be reloaded, a Data TLB Miss Exception arises and DTLBMiss is
> > called. The exception clears MSR[IR, DR], so address translation is
> > disabled and physical addresses should be used at this moment. If we want
> > something at the virtual address 0xc3xxxxxx, we must access
> > physical addresses like 0x03xxxxxx. Nevertheless, with only 32MB of
> > memory the valid physical address range is 0x0 to 0x1ffffff.
> > Therefore, during the exception handling, addresses outside this range
> > should not be accessed, but the instructions cannot know the memory
> > limitation in advance and try to access addresses such as
> > 0x03072da0 based on the address translation mechanism, which leads to
> > machine check.
> > I have tried to append "mem=32M" to the kernel command line, but it did
> > not help. I think it is because when loading the kernel in normal state,
> > address translation is enabled and the virtual addresses are okay. The
> > kernel cannot foresee that there is going to be a TLB miss exception and
> > that illegal physical addresses like 0x03xxxxxx may be accessed.
> >
> > So any ideas for this problem are welcome.
> >
> > Thank you very much for taking care.
> >
> > Best Wishes
> >
> > Zhou Rui
> > 2008-08-25
> >
> > On Sun, 2008-08-24 at 20:55 +0200, Wolfgang Denk wrote:
> > > Dear Zhou Rui,
> > >
> > > In message <1219479992.7565.17.camel@localhost> you wrote:
> > > >
> > > > > > I am running a kernel module which will execute a user space
> > > > > > application. The entry point of the application is 0x100000a0. At the
> > > > >
> > > > > That should be the first clue that you are doing it wrong. Don't do
> > > > > stuff like that in modules...
> > > >
> > > > Oh, but our project needs a function like that ...
> > >
> > > You should really think about this. Why do you think you need this?
> > > What exactly are you trying to do? [Probably there are better
> > > approaches to solve your problem...]
> > >
> > > > It is physical address at this moment. Address translation is disabled
> > > > automatically (MSR[IR, DR] = [0, 0]) because of TLB Miss Exception and
> > > > Instruction Storage Exception.
> > >
> > > Hm.. are you absolutely sure that the 0x100000a0 mentioned above is a
> > > physical address?
> > >
> > > > > Do you have enough DRAM to cover that? Some of those boards only come
> > > > > with 32MiB of DRAM.
> > > >
> > > > My board only has 32MB DRAM. Do you mean 32MB is not enough for that?
> > >
> > > Well, 0x1000'00A0 is above 256 MB, while you have only 32 MB RAM
> > > which is most probably mapped from 0x0000'0000...0x01FF'FFFF... So
> > > what you claim to be a physical address (and I think your claim is
> > > wrong) is far outside available physical memory.
> > >
> > > > The same codes can run well in a PPC440EP (Yosemite Board) which owns
> > > > 256MB DRAM. At the beginning of my work, I thought memory size may be
> > > > the cause of failure. But I did not know how to demonstrate it. So if
> > > > the limitation of 32MB DRAM leads to the failure, are there any methods
> > > > for the codes to solve it?
> > >
> > > I think you got lost on the wrong track. Please describe which task
> > > you want to implement, and there might be another, better approach
> > > for it.
> > >
> > > Best regards,
> > >
> > > Wolfgang Denk
* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
From: Zhou Rui @ 2008-09-01 7:22 UTC
To: benh; +Cc: Linuxppc-dev

On Mon, 2008-09-01 at 15:42 +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2008-08-31 at 13:50 +0200, Zhou Rui wrote:
> > Hi, all:
> > My problem seems basically solved.
> > We used to call vmalloc() in the memory management part of our
> > source, but it seems to be the key unreliable point resulting in the
> > problem. vmalloc() always assigns some virtual addresses whose
> > corresponding physical addresses are out of memory size (there is only
> > 32MB DRAM in our 405 board). Once instructions try to access these
> > illegal physical addresses, machine check happens
>
> That should -never- happen.
>
> Have you verified, as I asked you a while ago, that you are actually
> passing the right amount of memory to your kernel from the device-tree
> or the bootloader ?
>
> Ben.

I added "mem=32M" to the Linux command line in the bootloader, and got the
same machine check.

Best Wishes

Zhou Rui
2008-09-01

> > Afterwards, we call kmalloc() instead and it works basically as what
> > we want. But problems of the memory management still exist because
> > there are sometimes program check exceptions and always bad pages:
> > ....
* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
From: Benjamin Herrenschmidt @ 2008-09-01 7:17 UTC
To: Zhou Rui; +Cc: Linuxppc-dev

On Mon, 2008-09-01 at 09:22 +0200, Zhou Rui wrote:
> On Mon, 2008-09-01 at 15:42 +1000, Benjamin Herrenschmidt wrote:
> > On Sun, 2008-08-31 at 13:50 +0200, Zhou Rui wrote:
> > > Hi, all:
> > > My problem seems basically solved.
> > > We used to call vmalloc() in the memory management part of our
> > > source, but it seems to be the key unreliable point resulting in the
> > > problem. vmalloc() always assigns some virtual addresses whose
> > > corresponding physical addresses are out of memory size (there is only
> > > 32MB DRAM in our 405 board). Once instructions try to access these
> > > illegal physical addresses, machine check happens
> >
> > That should -never- happen.
> >
> > Have you verified, as I asked you a while ago, that you are actually
> > passing the right amount of memory to your kernel from the device-tree
> > or the bootloader ?
> >
> > Ben.
>
> I added "mem=32M" to the Linux command line in the bootloader, and got the
> same machine check.

Paul spotted today that you are actually trying to use __pa() on
addresses returned from vmalloc(); that will not work. Those are virtual
addresses in a non-linear mapping, and __pa() only works on the linear
mapping.

Ben.
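
Ben's point can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: PAGE_OFFSET is assumed to be 0xC0000000 (consistent with the 0xc3xxxxxx to 0x03xxxxxx conversion described in the thread), RAM_SIZE is the 32MB of the board, and the helper names linear_pa(), in_linear_map() and pa_is_ram() are made up for illustration. A __pa()-style conversion is modelled as a plain subtraction, so it only yields a meaningful physical address for virtual addresses inside the linear mapping of RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define PAGE_OFFSET 0xC0000000u /* assumed start of the kernel linear mapping */
#define RAM_SIZE    0x02000000u /* 32MB of DRAM, as on the board in the thread */

/* Model of a __pa()-style conversion: a plain subtraction, nothing more. */
static uint32_t linear_pa(uint32_t va)
{
    assert(va >= PAGE_OFFSET); /* below PAGE_OFFSET the subtraction would wrap */
    return va - PAGE_OFFSET;
}

/* True if va lies inside the linear mapping of the installed RAM. */
static bool in_linear_map(uint32_t va)
{
    return va >= PAGE_OFFSET && (va - PAGE_OFFSET) < RAM_SIZE;
}

/* True if a physical address is backed by DRAM. */
static bool pa_is_ram(uint32_t pa)
{
    return pa < RAM_SIZE;
}
```

With these assumptions, linear_pa(0xC3072DA0) gives 0x03072DA0, in_linear_map(0xC3072DA0) is false, and pa_is_ram(0x03072DA0) is false: the subtraction silently produces an address beyond the end of DRAM. A stack address from the oops above, such as 0xC028BF30, lies inside the modelled linear mapping and converts to 0x0028BF30, which is inside RAM.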
* Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?
From: Zhou Rui @ 2008-09-01 8:19 UTC
To: benh; +Cc: Linuxppc-dev

> Paul spotted today that you are actually trying to use __pa() on
> addresses returned from vmalloc(); that will not work. Those are virtual
> addresses in a non-linear mapping, and __pa() only works on the linear
> mapping.
>
> Ben.

Oh, do you mean the addresses vmalloc() returns are in a non-linear mapping?
So if I use vmalloc(), how do I translate the virtual addresses to physical
ones? And does kmalloc() return addresses in the linear mapping?

Thank you!

Best Wishes

Zhou Rui
2008-09-01
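
As a hedged illustration of what "non-linear mapping" means for the question above, the following userspace sketch models a vmalloc-style area with a tiny lookup table. Everything in it is hypothetical: VMALLOC_BASE, the frame numbers in frame_of_page[] and the function vmalloc_model_to_phys() are invented for the example. The point is that consecutive virtual pages may be backed by scattered physical frames, so translation has to go page by page through a table rather than by subtracting a constant. In the kernel itself, helpers such as vmalloc_to_page() perform the page-table lookup for vmalloc addresses, while kmalloc() memory comes from the linear mapping; it is worth checking both against the kernel tree in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12u
#define PAGE_SIZE    (1u << PAGE_SHIFT)
#define VMALLOC_BASE 0xC3000000u /* hypothetical start of a vmalloc-style area */
#define NPAGES       4u

/* Hypothetical page table: physical frame number backing each virtual page.
 * The frames are deliberately scattered, but all lie inside 32MB of RAM. */
static const uint32_t frame_of_page[NPAGES] = {
    0x01A30u, 0x00712u, 0x01FFEu, 0x00045u
};

/* Translate one address by table lookup. Returns 0 on success, -1 if the
 * address is outside the modelled area. */
static int vmalloc_model_to_phys(uint32_t va, uint32_t *pa)
{
    assert(pa != NULL);
    if (va < VMALLOC_BASE)
        return -1;
    uint32_t index = (va - VMALLOC_BASE) >> PAGE_SHIFT;
    if (index >= NPAGES)
        return -1;
    /* Frame base plus the offset within the page. */
    *pa = (frame_of_page[index] << PAGE_SHIFT) | (va & (PAGE_SIZE - 1u));
    return 0;
}
```

In this model, 0xC3000DA0 translates to 0x01A30DA0 and the next virtual page, 0xC3001000, translates to 0x00712000, so the two physical addresses are unrelated even though the virtual ones are adjacent.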
* Re: Why does one "stw" fail with address translation disabled in PPC405EP?
From: Zhou Rui @ 2008-08-23 21:18 UTC
To: Josh Boyer; +Cc: linuxppc-dev

On Fri, 2008-08-22 at 14:42 -0400, Josh Boyer wrote:
> On Fri, Aug 22, 2008 at 08:27:15PM +0200, Zhou Rui wrote:
> > Hi, all:
> > I think I meet an odd problem with PPC405EP (PPChameleonEVB Board).
>
> What kernel version are you using?
>
> > I am running a kernel module which will execute a user space
> > application. The entry point of the application is 0x100000a0. At the
>
> That should be the first clue that you are doing it wrong. Don't do
> stuff like that in modules...
>
> > moment when the processor tries to execute the application, 0x100000a0
> > is not in TLB (this can be seen from BDI by printing out TLB entries),
> > so DTLBMiss is called automatically and then finish_tlb_load. However,
> > InstructionAccess is followed and the problem arises here.
> > InstructionAccess starts from 0x400, and after instruction "0xc0000434
> > <InstructionAccess+52>: stw r12,64(r11)", machine check occurs.
> > This instruction will store the value of r12, which is 0x0 at this
> > moment, to address 0x03072de0. I am puzzled why this action leads to
> > machine check. Is it illegal to store 0x0 in a memory address? Or is
> > there some other cause of the machine check here?
>
> I have no idea if you're using physical or virtual addresses here, so
> there isn't much we can do to help you.
>
> Do you have enough DRAM to cover that? Some of those boards only come
> with 32MiB of DRAM.

Oh, I have just realized that with 32MB of DRAM, physical addresses
must lie between 0x0 and 0x1FFFFFF, but the address 0x03072da0 is
outside this range ...
Enlarging the memory is one solution, but I do not think it is
feasible for my board. Is there a solution on the software side? I
hope there is.

Thank you very much!!!

Best Wishes

Zhou Rui
2008-08-23

>
> josh
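
The arithmetic behind this realization can be written down as a short sketch. It assumes the register value r11 = 0x03072da0 from the BDI dump in the first message and the 32MB DRAM size stated in the thread; the function names dform_ea() and real_addr_in_ram() are made up for illustration. For a D-form store such as "stw r12,64(r11)", the effective address is the base register value plus the sign-extended 16-bit displacement, and with MSR[DR] = 0 that effective address is used directly as the physical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_SIZE 0x02000000u /* 32MB of DRAM: valid range 0x0 .. 0x1FFFFFF */

/* Effective address of a D-form load/store: base register value plus the
 * sign-extended 16-bit displacement. (The special case where register
 * number 0 means a base of zero is not modelled here.) */
static uint32_t dform_ea(uint32_t ra_value, int16_t disp)
{
    return ra_value + (uint32_t)(int32_t)disp;
}

/* True if an access of `size` bytes at real address `ea` fits inside DRAM. */
static bool real_addr_in_ram(uint32_t ea, uint32_t size)
{
    assert(size > 0 && size <= RAM_SIZE);
    return ea <= RAM_SIZE - size;
}
```

Here dform_ea(0x03072DA0, 64) yields 0x03072DE0, the store address quoted in the first message, and real_addr_in_ram(0x03072DE0, 4) is false because the address lies above 0x1FFFFFF. The value being stored (0x0) plays no part in the check; only the target address matters.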