linux-kernel.vger.kernel.org archive mirror
* System lockup.
@ 2002-10-21  0:02 Bill Leckey
  2002-10-21 11:30 ` Alan Cox
  0 siblings, 1 reply; 4+ messages in thread
From: Bill Leckey @ 2002-10-21  0:02 UTC (permalink / raw)
  To: linux-kernel

I have a terminal server that's supporting up to 240 lines.  It runs a
2.4.17 kernel and squid, and uses the Reiser file system to store log
files, the squid cache and other data.  About every day or so, the
machine locks up: the screen is blank, the keyboard doesn't respond,
the serial console I set up shows no 'dying gasp', and there is nothing
in any of the system logs.

This doesn't appear to be related to load as it has happened both during 
the busiest times and during the low times.

I'm still servicing interrupts from our serial devices (on IRQ 11), so 
it seems interrupts are still happening.

Beyond this, however, I have no idea where to go from here.  If anyone 
has any hints on what the problem might be, or even a way to gather more 
information, I would be grateful.

-- 
Bill Leckey - Senior Software Design Engineer
TPG Research and Development
Ph: +61 2 62851711
Fax: +61 2 62853939



* Re: System lockup.
  2002-10-21  0:02 System lockup Bill Leckey
@ 2002-10-21 11:30 ` Alan Cox
  2002-10-21 21:44   ` Bill Leckey
  2002-10-27 22:26   ` Bill Leckey
  0 siblings, 2 replies; 4+ messages in thread
From: Alan Cox @ 2002-10-21 11:30 UTC (permalink / raw)
  To: Bill Leckey; +Cc: Linux Kernel Mailing List

On Mon, 2002-10-21 at 01:02, Bill Leckey wrote:
> I have a terminal server that's supporting up to 240 lines.  It's a 
> 2.4.17 kernel, and is running squid, and using the reiser file system to 
> store log files, squid cache and other data.  About every day or so, the 
> machine locks up.  The screen is blank, keyboard doesn't respond, the 
> serial console I set up shows no 'dying gasp' and there is nothing in 
> any of the system logs.
> 
> This doesn't appear to be related to load as it has happened both during 
> the busiest times and during the low times.
> 
> I'm still servicing interrupts from our serial devices (on IRQ 11), so 
> it seems interrupts are still happening.
> 
> Beyond this, however, I have no idea where to go from here.  If anyone 
> has any hints on what the problem might be, or even a way to gather more 
> information, I would be grateful.

Hardware details would be a useful starting point. Also whether it's
uniprocessor or SMP. Finally, have you considered 2.4.19? 2.4.17 does
have at least one known small PPP race, which has since been fixed.
With 240 lines I guess you might actually hit that.



* Re: System lockup.
  2002-10-21 11:30 ` Alan Cox
@ 2002-10-21 21:44   ` Bill Leckey
  2002-10-27 22:26   ` Bill Leckey
  1 sibling, 0 replies; 4+ messages in thread
From: Bill Leckey @ 2002-10-21 21:44 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:
> Hardware details would be a useful starting point. Also if its
> uniprocessor or SMP. Finally have you considered 2.4.19 as 2.4.17 does
> have at least one known and fixed small PPP race. With 240 lines I guess
> you might actually hit that


Thanks for the reply Alan, much appreciated.

This system is running on uniprocessor Intel Pentium IIIs or Celerons
(a variety of clock speeds) with either 256 or 512 MB of memory and no
swap (another experiment).  The serial hardware is a proprietary card
(with a driver to cope with that, mostly copied from the standard serial
driver) giving us the 240 serial lines.  I can give more detail on that
and the driver if necessary.

I have been through my driver a few times (with others making
comments/suggestions) to see if it's all my fault (which it may well
still be).  The reason I know interrupts are still running is that on
every interrupt I write a different value to the parallel port, and I
can see those values changing as I would expect even when the system is
otherwise locked up.

I am considering 2.4.19 but haven't had a chance to test with the 
driver/hardware yet.  That will be soon as it seems a good path to go down.

One thing I forgot: the same failure has occurred on a system running
without squid.  It seems, though, that a system running without squid
fails less often (two or more weeks between failures as opposed to a
few days).

I hope I'm giving enough detail to be useful here.

Bill


-- 
Bill Leckey - Senior Software Design Engineer
TPG Research and Development
Ph: +61 2 62851711
Fax: +61 2 62853939



* Re: System lockup.
  2002-10-21 11:30 ` Alan Cox
  2002-10-21 21:44   ` Bill Leckey
@ 2002-10-27 22:26   ` Bill Leckey
  1 sibling, 0 replies; 4+ messages in thread
From: Bill Leckey @ 2002-10-27 22:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Just FYI, I've been running 2.4.19 on three systems for about 3 days 
now.  Where before I would expect each system to hang at least once a 
day, I've had no hangs whatsoever.

I'm not sure if it was the PPP race or something else but 2.4.19 seems 
(at least so far) much more stable.

Thanks for the advice Alan.

Bill


