linux-admin.vger.kernel.org archive mirror
* killing a stubborn process
@ 2003-01-07 19:34 Thomas Steinbrecher
  0 siblings, 0 replies; 3+ messages in thread
From: Thomas Steinbrecher @ 2003-01-07 19:34 UTC (permalink / raw)
  To: linux-admin

Greetings,

Please excuse me if this is a newbie question; I'm no computer
expert.

I have several dual-Athlon PCs running SuSE 8.0 in a Beowulf
cluster, on which I run molecular dynamics calculations.

On one of my machines the calculations sometimes stop for no
apparent reason, and I can no longer end the processes
(kill -9 has no effect). ps shows the processes as running,
but top shows them using no CPU.

Is there another way to kill such a process?

I suppose one of the CPUs might be damaged, but how do I
test whether that is true, and what causes the processes to
hang?
(The same calculation runs fine on another PC with an
identical software and hardware setup.)

Regards,

Thomas


* RE: killing a stubborn process
@ 2003-01-08 14:48 Shaw, Marco
  2003-01-10  7:05 ` Andrew B. Cramer
  0 siblings, 1 reply; 3+ messages in thread
From: Shaw, Marco @ 2003-01-08 14:48 UTC (permalink / raw)
  To: linux-admin

Tough one...  Something weird with the process if it runs fine on another machine.  Without knowing what the process does it's difficult to try to figure out what's going on.

I suppose you're trying to do a kill -9 on the PID itself.  Try to track down the PPID and see if you can cleanly kill that one, unless the PPID is "1" (that is init, which must not be killed)!
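
To track down the PPID, and to see what state the kernel reports for the stuck process, something like the following could be used. It is only a sketch: it assumes a Linux /proc filesystem, and the helper names parse_stat and describe are made up for illustration. One possibility (an assumption, nothing in this thread confirms it) is that the process shows state "D", uninterruptible sleep, in which case no signal, not even SIGKILL, is acted on until the kernel wait finishes.

```python
#!/usr/bin/env python3
"""Show the state and parent PID of a process, read from /proc/<pid>/stat."""
import sys


def parse_stat(text):
    """Parse the leading fields of a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split around the LAST ')' in the line.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    fields = text[close_paren + 1:].split()
    state = fields[0]  # R running, S sleeping, D uninterruptible sleep, Z zombie
    ppid = int(fields[1])
    return pid, comm, state, ppid


def describe(pid):
    """Read /proc/<pid>/stat for one process and parse it (hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_stat(handle.read())


if __name__ == "__main__":
    # Usage: pass one or more PIDs on the command line.
    for arg in sys.argv[1:]:
        try:
            pid, comm, state, ppid = describe(int(arg))
        except (OSError, ValueError) as err:
            print(f"{arg}: cannot read process state ({err})")
            continue
        print(f"pid={pid} comm={comm} state={state} ppid={ppid}")
```

If the script is saved as, say, procstate.py, running "python3 procstate.py <PID>" prints one line per PID. A PPID of 1 means the parent is init, so there is no other parent to kill.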

Marco



* RE: killing a stubborn process
  2003-01-08 14:48 Shaw, Marco
@ 2003-01-10  7:05 ` Andrew B. Cramer
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew B. Cramer @ 2003-01-10  7:05 UTC (permalink / raw)
  To: linux-admin, thomas.steinbrecher


Thomas,
	If you think it may be a hardware problem, compile the kernel 
without SMP support and remove the second processor. After a 
restart, check /proc/cpuinfo to confirm that only one processor is 
shown. Test. Whether or not it still fails, swap the processors and 
test again. That will confirm or rule out the CPUs as the cause. DO 
NOT use your working system for parts! Use it only to compare 
configuration files and module versions.

Best - Andrew




