From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3AE42049.C88B0C12@enst.fr>
Date: Mon, 23 Apr 2001 14:30:01 +0200
From: Stefan Nunninger
Reply-To: nunninger@web.de
MIME-Version: 1.0
To: linuxppc-embedded List
Subject: crashme produces hang of linux
Content-Type: text/plain; charset=us-ascii
Sender: owner-linuxppc-embedded@lists.linuxppc.org
List-Id:

Hello,

I have a MontaVista kernel 2.2.14 running on a custom board with an MPC860 PowerPC. The root file system is mounted over NFS from a PC running an NFS server.

To check the stability of the board I ran several tests. First I used every kind of application I could think of and checked whether it worked. I found no problems that way. This includes basic programs like ls, cd, vi, tar, gzip, top, ftp, ftpd, telnet, telnetd, httpd etc. The board also runs for several days when used as a web server, although not under heavy load. So I felt quite confident that everything works fine.

Then I tried to verify the board's stability using crashme. Crashme is a program that tests the stability of an operating system. It generates random code and executes it. Obviously this produces all kinds of errors, such as segmentation faults, illegal instructions etc. That is fine and is the intended behaviour. However, crashme is expected never to crash or hang the operating system itself.

Unfortunately I found that my system hangs shortly after starting crashme. The kernel still seems to work, as it responds to ping requests. However, it is not possible to connect to the system using telnet or ftp. The console, which is connected via the serial port (minicom), does not respond either. The only solution I found was restarting the system.

Thus this seems to indicate some stability problem on my embedded device. It might be nothing serious, as such a situation should not occur during normal operation. Still, it would be better if the kernel stayed usable even in an extreme situation such as running crashme.
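To illustrate the behaviour such a test relies on, here is a minimal Python sketch. It is not crashme's code and it executes no random instructions: the hypothetical helper `run_faulting_child` forks a child that raises a fatal signal on itself, standing in for a faulting instruction, and the parent then checks that only the child died. On a healthy kernel the parent should see every child terminated by the signal it raised and should keep running.

```python
import os
import signal


def run_faulting_child(sig):
    """Fork a child that dies from `sig`; return the signal the parent observes.

    Hypothetical helper for illustration only, not part of crashme.
    """
    pid = os.fork()
    if pid == 0:
        # Child: restore the default action and raise the signal on itself,
        # standing in for a random instruction that faults.
        signal.signal(sig, signal.SIG_DFL)
        os.kill(os.getpid(), sig)
        os._exit(0)  # only reached if the signal did not terminate the child
    # Parent: wait for the child and decode how it ended.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    for sig in (signal.SIGSEGV, signal.SIGILL, signal.SIGBUS, signal.SIGFPE):
        seen = run_faulting_child(sig)
        print(f"child raised {sig.name}, parent saw signal {seen}")
```

If the parent itself stops responding during such a loop, the problem lies below the test program, which matches the hang described above.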
Surely it would be interesting to know what kind of instruction produces the hang. crashme can write a logfile in which the code that is executed is stored. After a crash, the last line in the logfile should give the instruction that produced the crash. However, to use this, the delayed write-back (sync) mechanism of Linux has to be switched off, because buffering the writes would prevent the data from being written to disk immediately, and the buffered data is lost after the crash. For the embedded device there is a further problem: when the kernel crashes, it probably also takes down the network connection that the NFS mount depends on. So quite likely the last instruction will not be transferred over NFS, and therefore I do not know which instruction produces the hang.

I would be interested in hearing what you think about all this. Do you think crashme is a useful test at all? Should I simply ignore the result and be happy that no other problems have occurred so far? Or is it probable that the board will become unstable in some rare cases? What might be the reason for the hang? Is there anything obvious I should check?

As I have read several times that the memory setup is a difficult task with Linux, I verified the UPMA values I'm using. As I have no logic analyzer at hand, this was only a plausibility check of the values.

And finally, has anybody done similar tests? Which further tests should I do for stability?

I'd also like to measure the performance of the board. I'm especially interested in benchmarks that give an idea of basic values like raw processing speed, file system performance, and memory and network performance. Finally, I'd like to compare my device to other embedded devices and to known PC systems.

Any ideas are welcome - many thanks

Stefan

--
Stefan Nunninger
Ecole nationale superieure des telecommunications
46, Rue Barrault
75634 Paris Cedex 13
Tel: 01 45 81 7507 (bureau)
     01 45 81 7600 (laboratoire)

** Sent via the linuxppc-embedded mail list.
See http://lists.linuxppc.org/