From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Andrew B. Cramer"
Subject: RE: killing a stuborn process
Date: Fri, 10 Jan 2003 01:05:05 -0600
Sender: linux-admin-owner@vger.kernel.org
Message-ID: <3E1E1C41.24507.1DD1FD@localhost>
References:
Reply-To: andrew.cramer@cramer-ts.com
Mime-Version: 1.0
Content-Transfer-Encoding: 7BIT
Return-path:
In-reply-to:
Content-description: Mail message body
List-Id:
Content-Type: text/plain; charset="us-ascii"
To: linux-admin@vger.kernel.org, thomas.steinbrecher@physchem.uni-freiburg.de

Thomas,

If you think it may be a hardware problem, compile the kernel without SMP support and remove the second processor. After a restart, check /proc/cpuinfo to confirm that only one processor is shown. Test. Whether or not it still fails, swap the processors and test again. That will rule the CPUs in or out as the cause.

DO NOT use your working system for parts! Just compare its configuration files and module versions with those on the failing machine.

Best - Andrew

> From: Thomas Steinbrecher [mailto:thomas.steinbrecher@physchem.uni-freiburg.de]
> Sent: Tuesday, January 07, 2003 3:34 PM
> To: linux-admin@vger.kernel.org
> Subject: killing a stuborn process
>
>
> Greetings,
>
> please excuse me if this is a newbie question, I'm no computer expert.
>
> I have several dual Athlon PCs with Suse 8.0 in a Beowulf cluster on which I run molecular dynamics calculations.
>
> On one of my machines the calculations sometimes stop
> without apparent reason and I can't end the processes
> anymore (kill -9 has no effect). The processes are shown as running by ps, but top shows them producing no CPU load.
>
> Is there another way to kill such a process?
>
> I suppose one of the CPUs might be damaged, but how do I
> test whether that is true, and what causes the processes to
> hang?
> (The same calculation runs fine on another PC with
> an identical software and hardware setup.)
>
> Regards,
>
> Thomas
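
P.S. For the /proc/cpuinfo check after rebooting with the non-SMP kernel, here is a minimal sketch in Python. It assumes the usual x86 layout, in which each CPU gets its own "processor : N" line; the function names are made up for illustration and nothing here comes from SuSE's own tools.

```python
def count_processors(cpuinfo_text):
    """Count the 'processor' entries in /proc/cpuinfo-style text.

    Assumption: one 'processor : N' line per CPU, as on x86 Linux.
    """
    count = 0
    for line in cpuinfo_text.splitlines():
        # Split 'key : value'; lines without a colon (blank separators) are skipped.
        key, sep, _value = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


def processors_in_file(path="/proc/cpuinfo"):
    """Read the given file and return how many processors it lists."""
    with open(path) as f:
        return count_processors(f.read())
```

With one CPU pulled and a non-SMP kernel running, processors_in_file() should return 1. If it still returns 2, the machine has most likely booted the old SMP kernel.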