public inbox for linux-kernel@vger.kernel.org
* 2.6.7 SMP trouble?
@ 2004-07-19 19:16 Jason Gauthier
  2004-07-19 20:04 ` Richard B. Johnson
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gauthier @ 2004-07-19 19:16 UTC
  To: linux-kernel

I've found an IBM netfinity (5600) box that was shelved a few years ago.  I
spent $80 and got two processors for it. (P3-667).

I put them in the box, installed Linux (slackware) and upgraded the kernel
to 2.6.7.  I then started installing my software on it.  Nagios, MRTG,
samba, and some other tools we use for network monitoring.  This is going to
be an upgrade to a monitoring server we have.  Well, I went home, came in
the next day and the box was locked hard.  No messages, no console output.
Just dead.

Thinking it was a fluke, I fired it up again.  After several hours of running:
total death.  So, I figured there are two possibilities: either software or
hardware is making it die.  I removed each processor in turn and ran the box
for over 24 hours under HIGH stress (5+ load average).  The system was running
the above-mentioned software, but just to make sure each processor got a
workout I compiled code over and over.  Both processors were rock
solid for the duration of the test.

I then placed both processors in the box and started the same test.  It was
dead within 8 hours.  I am now very suspicious of the kernel.

So, I installed 2.4.22 and ran the same tests.  It went over 48 hours with
no issues.  Now I'm certain it's the kernel.  Can anyone confirm any SMP
issues that might cause this?  

Thanks,

Jason



* Re: 2.6.7 SMP trouble?
  2004-07-19 19:16 2.6.7 SMP trouble? Jason Gauthier
@ 2004-07-19 20:04 ` Richard B. Johnson
  2004-07-20 14:22   ` Zwane Mwaikambo
  2004-07-20 19:57   ` Bill Davidsen
  0 siblings, 2 replies; 7+ messages in thread
From: Richard B. Johnson @ 2004-07-19 20:04 UTC
  To: Jason Gauthier; +Cc: linux-kernel

On Mon, 19 Jul 2004, Jason Gauthier wrote:

> I've found an IBM netfinity (5600) box that was shelved a few years ago.  I
> spent $80 and got two processors for it. (P3-667).
>
> I put them in the box, installed Linux (slackware) and upgraded the kernel
> to 2.6.7.  I then started installing my software on it.  Nagios, MRTG,
> samba, and some other tools we use for network monitoring.  This is going to
> be an upgrade to a monitoring server we have.  Well, I went home, came in
> the next day and the box was locked hard.  No messages, no console output.
> Just dead.
>
> Thinking it was a fluke, I fired it up.  Again, after several hours running;
> total death.  So, I figured I have two options.  Software or hardware is
> making it die.  I removed each processor in turn, and ran the box for over
> 24 hours under HIGH stress. (5+ load average). The system is running the
> above mentioned software.  But, just to make sure this processor gets a
> workout I am compiling code over and over.  Both processors have been rock
> solid for the duration of the test.
>
> I then placed both processors in the box and started the same test.  It was
> dead within 8 hours.  I am now very suspicious of the kernel.
>
> So, I installed 2.4.22 and ran the same tests.  It went over 48 hours with
> no issues.  Now I'm certain it's the kernel.  Can anyone confirm any SMP
> issues that might cause this?
>
> Thanks,
>
> Jason

Another data-point. I haven't been able to run any new (2.6+) kernel
reliably on an SMP machine. They stop. Just like you noted. That's
why all my SMP machines still run 2.4.26. It's rock solid and has
the latest-and-greatest updates (there's a -pre-27 coming out).
Anyway, for production machines, you probably need to run 2.4.26.

If you don't really need anything reliable, you might try to enable
SysRq and see if you can find out where it's stopped. When my
machines stop, the CPUs get cold, just as if their clocks were
shut off! -- another data-point --
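
A minimal sketch of checking whether magic SysRq is switched on before the
next hang, assuming the kernel was built with CONFIG_MAGIC_SYSRQ and /proc is
mounted. The helper name sysrq_enabled is hypothetical; the file
/proc/sys/kernel/sysrq is the usual runtime switch, and writing 1 to it as
root enables the feature.

```python
from pathlib import Path


def sysrq_enabled(path="/proc/sys/kernel/sysrq"):
    """Return True if the sysctl file holds a non-zero integer,
    False if it holds 0, and None if it cannot be read or parsed
    (for example, when the kernel lacks CONFIG_MAGIC_SYSRQ)."""
    try:
        return int(Path(path).read_text().strip()) != 0
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    state = sysrq_enabled()
    if state is None:
        print("SysRq switch not found; is CONFIG_MAGIC_SYSRQ set?")
    elif state:
        print("SysRq is enabled")
    else:
        print("SysRq is disabled; as root, write 1 to /proc/sys/kernel/sysrq")
```

With it enabled, Alt+SysRq+T (task dump) or Alt+SysRq+P (registers) on the
console may show where the machine is stuck, although a lockup with
interrupts disabled may not respond to the keyboard at all.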


Cheers,
Dick Johnson
Penguin : Linux version 2.4.26 on an i686 machine (5570.56 BogoMips).
            Note 96.31% of all statistics are fiction.




* RE: 2.6.7 SMP trouble?
@ 2004-07-20 14:21 Jason Gauthier
  2004-07-20 14:34 ` Zwane Mwaikambo
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gauthier @ 2004-07-20 14:21 UTC
  To: Zwane Mwaikambo; +Cc: Linux Kernel

> It would actually help if you found the exact version which 
> stopped working, then we can get it looked at and fixed.
> 

If it's in the middle of 2.5 development somewhere, that could take me months
:)
Assuming 1 day to download, compile, and start the tests, it's about a day per
kernel.
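
For a rough sense of the cost, here is a small sketch of the arithmetic,
assuming one day per test and that the failure reproduces reliably within that
day (which the intermittent hangs in this thread do not guarantee). The
function names and the figure of 75 untested releases are illustrative
assumptions, not numbers from the thread.

```python
def worst_case_tests(untested):
    """Worst-case number of test runs needed to bisect `untested` releases
    lying strictly between a known-good and a known-bad kernel."""
    if untested < 0:
        raise ValueError("untested must be non-negative")
    # ceil(log2(untested + 1)) equals the bit length of `untested`.
    return untested.bit_length()


def days_needed(untested, days_per_test=1):
    """Worst-case days for a bisection at `days_per_test` days per kernel."""
    return worst_case_tests(untested) * days_per_test


if __name__ == "__main__":
    # Hypothetical: 75 untested releases between the two endpoints.
    print(days_needed(75))  # prints 7
```

Testing every release in order would take up to 75 days under the same
assumptions, so halving the range each time turns months into about a week.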

Jason


* Re: 2.6.7 SMP trouble?
  2004-07-19 20:04 ` Richard B. Johnson
@ 2004-07-20 14:22   ` Zwane Mwaikambo
  2004-07-20 19:57   ` Bill Davidsen
  1 sibling, 0 replies; 7+ messages in thread
From: Zwane Mwaikambo @ 2004-07-20 14:22 UTC
  To: Richard B. Johnson; +Cc: Jason Gauthier, Linux Kernel

On Mon, 19 Jul 2004, Richard B. Johnson wrote:

> Another data-point. I haven't been able to run any new (2.6+) kernel
> reliably in a SMP machine. They stop. Just like you noted. That's
> why all my SMP machines still run 2.4.26. It's rock solid and has
> the latest-and-greatest updates (there's a -pre-27 coming out).
> Anyway, for production machines, you probably need to run 2.4.26.

It would actually help if you found the exact version which stopped
working; then we can get it looked at and fixed.

Ta,
	Zwane



* RE: 2.6.7 SMP trouble?
  2004-07-20 14:21 Jason Gauthier
@ 2004-07-20 14:34 ` Zwane Mwaikambo
  0 siblings, 0 replies; 7+ messages in thread
From: Zwane Mwaikambo @ 2004-07-20 14:34 UTC
  To: Jason Gauthier; +Cc: Linux Kernel

On Tue, 20 Jul 2004, Jason Gauthier wrote:

> > It would actually help if you found the exact version which
> > stopped working, then we can get it looked at and fixed.
> >
>
> If it's in the middle of 2.5 development somewhere that could take me months
> :)
> Assuming 1 day to download compile and start the tests, it's about a day per
> kernel.

Try the following kernels:

2.6.0
2.5.65
2.5.60

Basically, just make large strides; given the lack of other data, this may
be the only way.
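
The large-strides idea amounts to a bisection over an ordered list of
releases. A minimal sketch, assuming the oldest entry is known good, the
newest is known bad, and the failure first appears at a single release and
persists afterwards. The names first_bad and is_bad are hypothetical; in
practice is_bad would be a manual build, boot, and multi-hour stress run, not
a function call.

```python
def first_bad(versions, is_bad):
    """Return the first failing entry of `versions` (ordered oldest to newest).

    Assumes versions[0] is known good and versions[-1] is known bad,
    so neither endpoint is tested again.
    """
    if len(versions) < 2:
        raise ValueError("need a known-good and a known-bad version")
    lo, hi = 0, len(versions) - 1  # lo: newest known good, hi: oldest known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


if __name__ == "__main__":
    # Hypothetical outcome, for illustration only: pretend 2.6.0 broke it.
    kernels = ["2.4.22", "2.5.60", "2.5.65", "2.6.0", "2.6.7"]
    print(first_bad(kernels, lambda v: v in {"2.6.0", "2.6.7"}))  # prints 2.6.0
```

Because the hang is intermittent, a run that survives the test window is only
probably good, so longer runs on the "good" side make the result more
trustworthy.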



* Re: 2.6.7 SMP trouble?
  2004-07-19 20:04 ` Richard B. Johnson
  2004-07-20 14:22   ` Zwane Mwaikambo
@ 2004-07-20 19:57   ` Bill Davidsen
  1 sibling, 0 replies; 7+ messages in thread
From: Bill Davidsen @ 2004-07-20 19:57 UTC
  To: linux-kernel

Richard B. Johnson wrote:
> On Mon, 19 Jul 2004, Jason Gauthier wrote:
> 
> 
>>I've found an IBM netfinity (5600) box that was shelved a few years ago.  I
>>spent $80 and got two processors for it. (P3-667).
>>
>>I put them in the box, installed Linux (slackware) and upgraded the kernel
>>to 2.6.7.  I then started installing my software on it.  Nagios, MRTG,
>>samba, and some other tools we use for network monitoring.  This is going to
>>be an upgrade to a monitoring server we have.  Well, I went home, came in
>>the next day and the box was locked hard.  No messages, no console output.
>>Just dead.
>>
>>Thinking it was a fluke, I fired it up.  Again, after several hours running;
>>total death.  So, I figured I have two options.  Software or hardware is
>>making it die.  I removed each processor in turn, and ran the box for over
>>24 hours under HIGH stress. (5+ load average). The system is running the
>>above mentioned software.  But, just to make sure this processor gets a
>>workout I am compiling code over and over.  Both processors have been rock
>>solid for the duration of the test.
>>
>>I then placed both processors in the box and started the same test.  It was
>>dead within 8 hours.  I am now very suspicious of the kernel.
>>
>>So, I installed 2.4.22 and ran the same tests.  It went over 48 hours with
>>no issues.  Now I'm certain it's the kernel.  Can anyone confirm any SMP
>>issues that might cause this?
>>
>>Thanks,
>>
>>Jason
> 
> 
> Another data-point. I haven't been able to run any new (2.6+) kernel
> reliably in a SMP machine. They stop. Just like you noted. That's
> why all my SMP machines still run 2.4.26. It's rock solid and has
> the latest-and-greatest updates (there's a -pre-27 coming out).
> Anyway, for production machines, you probably need to run 2.4.26.
> 
> If you don't really need anything reliable, you might try to enable
> Sys Req and see if you can find out where it's stopped. When my
> machines stop, the CPUs get cold, just like their clocks were
> shut off! -- another data-point --

I suspect it's a config thing rather than some overall evil; I have
some production machines up 72+ days. These are production news servers
with hundreds of users all day long.

The exact version is 2.6.5aa5, but I had a 2.6.7 up for 30 days or so 
until AS3.0 got a hotfix for my applications.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


* RE: 2.6.7 SMP trouble?
@ 2004-08-02 13:14 Jason Gauthier
  0 siblings, 0 replies; 7+ messages in thread
From: Jason Gauthier @ 2004-08-02 13:14 UTC
  To: Zwane Mwaikambo; +Cc: Linux Kernel

Sorry for the delay in response to this thread.  I've been testing this out,
and it takes a while.

Well, sadly, I was *wrong*.  After a couple of days my 2.4.26 kernel completely
died on me too.
So, I backtracked to 2.4.22, which I had previously thought stable.  But
no, it also died.

So, I started at the beginning.
Here are my results:

2.4.0-2.4.9:   Do not compile.
2.4.10-2.4.15: 'Oops' on boot.
2.4.16-2.4.17: Skipped.
2.4.18:        Same SMP symptoms.

So, I could try 16/17, but I don't think it matters much at this point.

What should my next step be?

Thanks for all the help so far.

Jason



