* System lockup.
From: Bill Leckey @ 2002-10-21 0:02 UTC (permalink / raw)
To: linux-kernel
I have a terminal server that's supporting up to 240 lines. It's a
2.4.17 kernel, and is running squid, and using the reiser file system to
store log files, squid cache and other data. About every day or so, the
machine locks up. The screen is blank, keyboard doesn't respond, the
serial console I set up shows no 'dying gasp' and there is nothing in
any of the system logs.
This doesn't appear to be related to load as it has happened both during
the busiest times and during the low times.
I'm still servicing interrupts from our serial devices (on IRQ 11), so
it seems interrupts are still happening.
Beyond this, however, I have no idea where to go from here. If anyone
has any hints on what the problem might be, or even a way to gather more
information, I would be grateful.
--
Bill Leckey - Senior Software Design Engineer
TPG Research and Development
Ph: +61 2 62851711
Fax: +61 2 62853939
* Re: System lockup.
From: Alan Cox @ 2002-10-21 11:30 UTC (permalink / raw)
To: Bill Leckey; +Cc: Linux Kernel Mailing List
On Mon, 2002-10-21 at 01:02, Bill Leckey wrote:
> I have a terminal server that's supporting up to 240 lines. It's a
> 2.4.17 kernel, and is running squid, and using the reiser file system to
> store log files, squid cache and other data. About every day or so, the
> machine locks up. The screen is blank, keyboard doesn't respond, the
> serial console I set up shows no 'dying gasp' and there is nothing in
> any of the system logs.
>
> This doesn't appear to be related to load as it has happened both during
> the busiest times and during the low times.
>
> I'm still servicing interrupts from our serial devices (on IRQ 11), so
> it seems interrupts are still happening.
>
> Beyond this, however, I have no idea where to go from here. If anyone
> has any hints on what the problem might be, or even a way to gather more
> information, I would be grateful.
Hardware details would be a useful starting point. Also whether it's
uniprocessor or SMP. Finally, have you considered 2.4.19? 2.4.17 does
have at least one known and fixed small PPP race. With 240 lines I guess
you might actually hit that.
* Re: System lockup.
From: Bill Leckey @ 2002-10-21 21:44 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel
Alan Cox wrote:
> On Mon, 2002-10-21 at 01:02, Bill Leckey wrote:
>
>>I have a terminal server that's supporting up to 240 lines. It's a
>>2.4.17 kernel, and is running squid, and using the reiser file system to
>>store log files, squid cache and other data. About every day or so, the
>>machine locks up. The screen is blank, keyboard doesn't respond, the
>>serial console I set up shows no 'dying gasp' and there is nothing in
>>any of the system logs.
>>
>>This doesn't appear to be related to load as it has happened both during
>>the busiest times and during the low times.
>>
>>I'm still servicing interrupts from our serial devices (on IRQ 11), so
>>it seems interrupts are still happening.
>>
>>Beyond this, however, I have no idea where to go from here. If anyone
>>has any hints on what the problem might be, or even a way to gather more
>>information, I would be grateful.
>
>
> Hardware details would be a useful starting point. Also if its
> uniprocessor or SMP. Finally have you considered 2.4.19 as 2.4.17 does
> have at least one known and fixed small PPP race. With 240 lines I guess
> you might actually hit that
Thanks for the reply Alan, much appreciated.
This system is running on uniprocessor Intel Pentium IIIs or Celerons (a
variety of clock speeds) with either 256 or 512 MB of memory and no swap
(another experiment). The serial hardware is a proprietary card (with a
driver to cope with that, mostly copied from the standard serial driver)
giving us the 240 serial lines. I can give more detail on that and the
driver if necessary.
Just offhand I have been through my driver (with others making
comments/suggestions) a few times to see if it's all my fault (which it
may well still be). The reason I know interrupts are still running is
that on every interrupt I stick a different value onto the Parallel port
and can see those changing as I would expect even when the system is
locked up otherwise.
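For readers unfamiliar with the trick described above, here is a minimal
sketch of a per-interrupt "heartbeat" on the parallel port. It is not the
driver from this thread. The names irq_heartbeat, irq_heartbeat_tick and
fake_port_write are hypothetical, the port address 0x378 is an assumption
(the conventional LPT1 data register), and the port write goes through a
function pointer so the logic can be tried in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 0x378 is the conventional data register of the first
 * legacy PC parallel port; the thread does not say which port was used. */
#define LPT_DATA_PORT 0x378u

/* Hypothetical port-write hook. In a real driver this would wrap the
 * kernel's port output call; here it is a function pointer so the logic
 * can be exercised without hardware. */
typedef void (*port_write_fn)(uint16_t port, uint8_t value);

struct irq_heartbeat {
    uint8_t counter;      /* wraps naturally from 255 back to 0 */
    port_write_fn write;  /* how the value reaches the port */
};

/* Call once per interrupt: bump the counter and latch it on the port,
 * so the data pins change for as long as interrupts are being serviced. */
static void irq_heartbeat_tick(struct irq_heartbeat *hb)
{
    assert(hb != NULL && hb->write != NULL);
    hb->counter++;
    hb->write(LPT_DATA_PORT, hb->counter);
}

/* User-space stand-in for the port: records the most recent write. */
static uint16_t fake_port;
static uint8_t fake_value;
static unsigned fake_writes;

static void fake_port_write(uint16_t port, uint8_t value)
{
    fake_port = port;
    fake_value = value;
    fake_writes++;
}
```

In an actual interrupt handler the hook would presumably wrap outb(value,
port), and the changing pins could be watched with LEDs or a scope. If the
pins keep changing while the console is dead, interrupts are still being
delivered even though the rest of the system has stopped making progress.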
I am considering 2.4.19 but haven't had a chance to test with the
driver/hardware yet. That will be soon as it seems a good path to go down.
One thing I forgot was that there was the same failure with a system
running without the squid. It seems though, that a system running
without squid will fail less often (two or more weeks between failures
as opposed to a few days).
I hope I'm giving enough detail to be useful here.
Bill
--
Bill Leckey - Senior Software Design Engineer
TPG Research and Development
Ph: +61 2 62851711
Fax: +61 2 62853939
* Re: System lockup.
From: Bill Leckey @ 2002-10-27 22:26 UTC (permalink / raw)
To: Alan Cox; +Cc: Linux Kernel Mailing List
Just FYI, I've been running 2.4.19 on three systems for about 3 days
now. Where before I would expect each system to hang at least once a
day, I've had no hangs whatsoever.
I'm not sure if it was the PPP race or something else but 2.4.19 seems
(at least so far) much more stable.
Thanks for the advice Alan.
Bill