From mboxrd@z Thu Jan 1 00:00:00 1970
To: Dan Malek
Cc: linuxppc-embedded@lists.linuxppc.org
Subject: Re: 8xx MMU Table Walk Base (was Re: kernel crashes at InstructionTLBMiss )
In-reply-to: Your message of "Mon, 05 Jun 2000 16:37:55 -0400" <393C0FA3.9208BAE1@embeddededge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Tue, 06 Jun 2000 16:31:08 +1000
Message-ID: <23333.960273068@msa.cmst.csiro.au>
From: Murray Jensen
Sender: owner-linuxppc-embedded@lists.linuxppc.org
List-Id:

On Mon, 05 Jun 2000 16:37:55 -0400, Dan Malek writes:
>Murray Jensen wrote:
>
>> Here we come to a dilemma that I have had since I started with this stuff.
>> I have never been able to get an 8xx kernel running without adding a patch
>> to update the Table Walk Base register at the time that a new mm context is
>> activated.
>
>
>After reading your diatribe

Diatribe? Hmm.. Sorry, I didn't mean to offend you - I thought I was being reasonably clear, and definitely polite. I wasn't being at all critical of anyone associated with Linux/PPC or the 8xx embedded version - I think you and they all do a great job, and I am very impressed. In my eagerness I left out some information I should have provided, sorry. I will try to correct that now.

I use the linuxppc_2_3 bitkeeper repository at hq.fsmlabs.com as the base for my local changes. I use a Sun Ultra 60 dual cpu sparc workstation running Solaris 2.7 as my host o/s, with gcc-2.95.2, the latest binutils from the CVS repository at :pserver:anoncvs@anoncvs.cygnus.com:/cvs/src, and glibc-2.1.3 configured as an mpc8xx cross-compiler for Solaris. I build my own root filesystem, based on sources from the net. When I compile the kernel, I build zImage.initrd and download it to the target using the GDB protocol via a serial port.

My hardware is a Cogent CMA102 motherboard, with CMA286-60 CPU module (MPC860 cpu - rev no. XPC860MHZP66C1), and CMA302 I/O module with 8Mb flash.
The motherboard has 32Mb RAM, 2 serial and 1 parallel ports, and LCD display. The cpu module has a 128K boot eprom, which I load with a small ROM monitor I wrote based on the GDB eprom stubs configuration of eCos (embedded cygnus operating system - which supports the cogent platform). The monitor supports downloading via the serial port (at 230400bps) into RAM using the GDB protocol, programming flash from a RAM image, and booting an image that resides in flash, among other things (I call it ELILO :-).

Modifications I make to the kernel are minimal - just drivers for devices on the cogent platform (including the I/O mappings, which are different to the MBX in that they reside in the lower half of the address space which required me to use ioremap() correctly by setting ioremap_base and saving its return value and using this to access my devices) and some other minor changes, which I believe are not relevant. The only major change I have had to make to the kernel is the one I discussed in my previous message. I checked this out again, and one other change was moving most of the code at _start in head_8xx.S to after the exception handlers because the extra mappings required for the Cogent devices caused this code to exceed 0x100 bytes. The other thing I added was making use of the MPC860 watchdog which I could do because I had control of the boot eprom code (if the kernel hangs I get a watchdog reset in some circumstances, depending on the type of hang).

>There are many subtle changes to context switching that happen during
>the minor updates (which could be weekly).

I usually update daily, or every couple of days, a local copy of the bitkeeper repository (using rsync, but I also maintain a read-only anonymous bitkeeper clone which I bk pull at the same time, because I like to use bk sccstool to follow the changes), which I then "import" into a vendor branch of a local CVS repository. My local changes are maintained in the HEAD revision.
I also maintain a "stable" branch which is a working kernel, based on repository as at October 1999.

>There are several patches
>floating around (and probably more kernel sources) that certainly
>are not correct.

I don't use any patches from the net - all changes made are local.

>I don't know where you get your source code, but there
>are exactly two consistent and working kernel sources that I have ever
>provided. One is in ftp://linuxppc.cs.nmt.edu/pub/linuxppc/embedded,
>the mpc8xx-2.2.13.tgz tarball. A better and completely up to date
>kernel is in ftp.mvista.com/pub/CDK/wip/ppc_8xx/RPMS (along with
>everything else to build an 8xx embedded system). Everyone should be
>using the kernel from MontaVista, and if something isn't in there
>that you want, send me patches against that.

These are all 2.2.x, no? I believe I need 2.[34].x because I want to use the latest RT-Linux stuff eventually, which only works with the 2.3.x, or later, kernels.

>There are patches posted against that original tarball, and make sure
>you are not mixing kernel versions and patches.

As I say, I use a pristine 2.[34].x kernel with local changes only.

>Finally, lots of bugs associated with porting to new hardware manifest
>themselves as "problems" in any VM related function. Since many people
>don't understand the subtle interactions of all of these functions (as
>evidenced by your message) you become convinced the problem is associated
>with this complexity and fail to unravel the clues to the real cause.

I don't think I deserve this sort of belittling. Treating potential contributors in this way can only have a negative effect on open source development. I admit I don't yet fully understand the PowerPC architecture, or the MPC8xx implementation of it, but I am learning, and with nearly 20 years experience in computer science I believe I should be able to pick it up eventually (I've "seen it all before" :-).

>This could be as simple as intrusive debugging hardware,

I use kgdb.
>some silicon
>bug not understood,

I included my chip revision above. It appears to be a C1 revision chip.

>or prototype hardware not working correctly.

Definitely.

>There are lots of products and systems in development running this software,
>so you have to approach this generic software from the assumption that
>it is first likely to be working.

I did. I said I was intrigued as to why this problem only affected me. And once I make the described change, the "generic software" works for me also (at least an older revision works - current revisions still crash, something to do with the memory allocation stuff, I believe).

As I said in my previous message, I suspect something else I am doing is triggering this bug (that much is obvious), but there are two possibilities: either I am doing something wrong in my local changes, or the "generic software" has a bug which does not show up in anyone else's implementation. I was wondering whether the latter was the case (I wasn't blaming anyone, I was excited that maybe I had discovered a long existing hidden fault in the software, that may explain some mysterious failure modes, that someone else might be getting - other developers may then post, saying "yeah, that would explain my problem, blah blah", and so the discussion goes on. Upon searching the archives, I found that a similar problem had been discussed for the 2.2.x kernels, so maybe the fix or fixes didn't make their way into the 2.[34].x kernels. I don't know, anything is possible, that's why we have these discussion groups).

>Are there possible bugs? Sure, and you have to provide minimal information
>for the rest of us to help out.

Again, apologies for not providing enough information in my message - I made assumptions I shouldn't have. Obviously, on my first post I should have been completely anal, because no-one knows me from a bar of soap. I can then start to be less exacting after I have been around for a while.

>Where did you get the sources?
>What patches did you apply? What are your hardware details? What
>modifications did you make?

See above.

>As for 2.4.xx, the 8xx still doesn't work correctly. However, I
>discovered it failed to work after the 403 additions, so I am now
>learning about the 403 in an effort to make everything live happily
>together again.

It was my feeling that the problems were to do with the new memory allocation stuff introduced a couple of months ago.

>Note, this has nothing to do with M_TWB......

I know.

Now that we have gotten past treating me like a dill, please can you re-read my original message and see if I am making any sense at all? I would very much appreciate some insights and even constructive criticism. Cheers!

Murray...

PS: I haven't contributed the Cogent platform changes yet, because I wasn't happy that I had done everything properly. This was really my first foray into taking part in the Linux/PPC embedded development community - I can't say it has been particularly successful (despite my good feelings about contributing a small fix a couple of days ago). I will try not to be too discouraged.

--
Murray Jensen, CSIRO Manufacturing Sci & Tech, Phone: +61 3 9662 7763
Locked Bag No. 9, Preston, Vic, 3072, Australia. Fax: +61 3 9662 7853
Internet: Murray.Jensen@cmst.csiro.au (old address was mjj@mlb.dmt.csiro.au)

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/