* System lockup.
@ 2002-10-21  0:02 Bill Leckey
  2002-10-21 11:30 ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Bill Leckey @ 2002-10-21  0:02 UTC (permalink / raw)
  To: linux-kernel

I have a terminal server that's supporting up to 240 lines.  It's a 
2.4.17 kernel, and is running squid, and using the reiser file system to 
store log files, squid cache and other data.  About every day or so, the 
machine locks up.  The screen is blank, keyboard doesn't respond, the 
serial console I set up shows no 'dying gasp' and there is nothing in 
any of the system logs.

This doesn't appear to be related to load as it has happened both during 
the busiest times and during the low times.

I'm still servicing interrupts from our serial devices (on IRQ 11), so 
it seems interrupts are still happening.

Beyond this, however, I have no idea where to go from here.  If anyone 
has any hints on what the problem might be, or even a way to gather more 
information, I would be grateful.

-- 
Bill Leckey - Senior Software Design Engineer
TPG Research and Development
Ph: +61 2 62851711
Fax: +61 2 62853939


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: System lockup.
  2002-10-21  0:02 System lockup Bill Leckey
@ 2002-10-21 11:30 ` Alan Cox
  2002-10-21 21:44   ` Bill Leckey
  2002-10-27 22:26   ` Bill Leckey
  0 siblings, 2 replies; 11+ messages in thread
From: Alan Cox @ 2002-10-21 11:30 UTC (permalink / raw)
  To: Bill Leckey; +Cc: Linux Kernel Mailing List

On Mon, 2002-10-21 at 01:02, Bill Leckey wrote:
> I have a terminal server that's supporting up to 240 lines.  It's a 
> 2.4.17 kernel, and is running squid, and using the reiser file system to 
> store log files, squid cache and other data.  About every day or so, the 
> machine locks up.  The screen is blank, keyboard doesn't respond, the 
> serial console I set up shows no 'dying gasp' and there is nothing in 
> any of the system logs.
> 
> This doesn't appear to be related to load as it has happened both during 
> the busiest times and during the low times.
> 
> I'm still servicing interrupts from our serial devices (on IRQ 11), so 
> it seems interrupts are still happening.
> 
> Beyond this, however, I have no idea where to go from here.  If anyone 
> has any hints on what the problem might be, or even a way to gather more 
> information, I would be grateful.

Hardware details would be a useful starting point, including whether it's
uniprocessor or SMP. Finally, have you considered 2.4.19? 2.4.17 has
at least one known (and since fixed) small PPP race. With 240 lines I guess
you might actually hit that.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: System lockup.
  2002-10-21 11:30 ` Alan Cox
@ 2002-10-21 21:44   ` Bill Leckey
  2002-10-27 22:26   ` Bill Leckey
  1 sibling, 0 replies; 11+ messages in thread
From: Bill Leckey @ 2002-10-21 21:44 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:
> On Mon, 2002-10-21 at 01:02, Bill Leckey wrote:
> 
>>I have a terminal server that's supporting up to 240 lines.  It's a 
>>2.4.17 kernel, and is running squid, and using the reiser file system to 
>>store log files, squid cache and other data.  About every day or so, the 
>>machine locks up.  The screen is blank, keyboard doesn't respond, the 
>>serial console I set up shows no 'dying gasp' and there is nothing in 
>>any of the system logs.
>>
>>This doesn't appear to be related to load as it has happened both during 
>>the busiest times and during the low times.
>>
>>I'm still servicing interrupts from our serial devices (on IRQ 11), so 
>>it seems interrupts are still happening.
>>
>>Beyond this, however, I have no idea where to go from here.  If anyone 
>>has any hints on what the problem might be, or even a way to gather more 
>>information, I would be grateful.
> 
> 
> Hardware details would be a useful starting point. Also if its
> uniprocessor or SMP. Finally have you considered 2.4.19 as 2.4.17 does
> have at least one known and fixed small PPP race. With 240 lines I guess
> you might actually hit that


Thanks for the reply Alan, much appreciated.

This system is running on uniprocessor Intel Pentium IIIs or Celerons (a 
variety of clock speeds) with either 256 or 512 MB of memory and no swap 
(another experiment).  The serial hardware is a proprietary card (with a 
driver to cope with that, mostly copied from the standard serial driver) 
giving us the 240 serial lines.  I can give more detail on that and the 
driver if necessary.

I have been through my driver a few times (with others making 
comments/suggestions) to see if it's all my fault (which it 
may well still be).  The reason I know interrupts are still running is 
that on every interrupt I write a different value to the parallel port 
and can see those values changing as I would expect, even when the system is 
otherwise locked up.
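
A minimal sketch of the heartbeat trick described above, written as ordinary
user-space C so it compiles anywhere. The names heartbeat_tick and
fake_lpt_data are hypothetical and not taken from the driver in question; in
a real 2.4 driver the store would presumably be an outb() to the parallel
port data register from inside the interrupt handler.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the parallel port data register (hypothetical; a real
 * driver would write to the port with outb() instead of to memory). */
static uint8_t fake_lpt_data;

/* Free-running counter, bumped once per interrupt. */
static uint8_t heartbeat_count;

/* Call from the interrupt handler: publish a new value on every call so
 * that LEDs or a probe on the port show whether interrupts are still
 * being serviced. The 8-bit counter wraps from 255 back to 0. */
static void heartbeat_tick(void)
{
    heartbeat_count++;
    fake_lpt_data = heartbeat_count;
}
```

If the value on the port stops changing, interrupts are no longer being
serviced. If it keeps changing while the console is dead, as reported here,
the hang is somewhere other than interrupt delivery.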

I am considering 2.4.19 but haven't had a chance to test with the 
driver/hardware yet.  That will be soon as it seems a good path to go down.

One thing I forgot was that there was the same failure with a system 
running without the squid.  It seems though, that a system running 
without squid will fail less often (two or more weeks between failures 
as opposed to a few days).

I hope I'm giving enough detail to be useful here.

Bill


-- 
Bill Leckey - Senior Software Design Engineer
TPG Research and Development
Ph: +61 2 62851711
Fax: +61 2 62853939


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: System lockup.
  2002-10-21 11:30 ` Alan Cox
  2002-10-21 21:44   ` Bill Leckey
@ 2002-10-27 22:26   ` Bill Leckey
  1 sibling, 0 replies; 11+ messages in thread
From: Bill Leckey @ 2002-10-27 22:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Just FYI, I've been running 2.4.19 on three systems for about 3 days 
now.  Where before I would expect each system to hang at least once a 
day, I've had no hangs whatsoever.

I'm not sure if it was the PPP race or something else but 2.4.19 seems 
(at least so far) much more stable.

Thanks for the advice Alan.

Bill



^ permalink raw reply	[flat|nested] 11+ messages in thread

* system lockup
@ 2004-08-13 22:53 Mike Waychison
  2004-08-13 23:23 ` Ian Pratt
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Waychison @ 2004-08-13 22:53 UTC (permalink / raw)
  To: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi All,


I've recently managed to get a 2.6.7 dom0 to boot without any major
problems.  I had seen the hwclock issues and the nosegfixup issues fly
by and these seem to be all worked out now.


Another issue though is still lingering for me, and I'm a little
clueless as to how to go about debugging it.


It seems that once I log into my gnome session, everything is aok for
the first little while, until eventually I get a segfault pop-up for
wnck-applet.  I've never seen this segfault before, and am not sure if
it is xen or 2.6.7 related (I'm still running a 2.6.1 variant).
However, a few seconds later, whether or not I click the 'ok' in the
segfault dialog, the machine seems to lock hard.


I've tested this w/ & w/o both a) removing the /lib/tls directory and b)
the nosegfixup kernel option.


Has anyone else seen this, or does anyone know of another possible cause?

I will try updating to vanilla 2.6.7 tonight to see whether the issue remains.




Also:


A while ago, I tried building the xenolinux-2.6.7-dom0 kernel with a
pentium II cpu target.  The system appeared to boot up properly, however
X couldn't start as there seemed to be a mysterious SIGBUS being sent.
For now, I'm using a PIV build on this PII.


Thanks,

- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBHUZVdQs4kOxk3/MRAgDsAJ421NUq5IUMdmNKNTvoqsWz1TqeeQCeOU66
A5S+sHksUBrbjtBH2nTlFNk=
=mA/7
-----END PGP SIGNATURE-----


-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: system lockup
  2004-08-13 22:53 system lockup Mike Waychison
@ 2004-08-13 23:23 ` Ian Pratt
  2004-08-13 23:59   ` Mike Waychison
  0 siblings, 1 reply; 11+ messages in thread
From: Ian Pratt @ 2004-08-13 23:23 UTC (permalink / raw)
  To: Mike Waychison; +Cc: xen-devel, Ian.Pratt


> I've recently managed to get a 2.6.7 dom0 to boot without any major
> problems.  I had seen the hwclock issues and the nosegfixup issues fly
> by and these seem to be all worked out now.

That's good to hear.
 
> It seems that once I log into my gnome session, everything is aok for
> the first little while, until eventually I get a segfault pop-up for
> wnck-applet.  I've never seen this segfault before, and am not sure if
> it is xen or 2.6.7 related (I'm still running a 2.6.1 variant).
> However, a few seconds later, whether or not I click the 'ok' in the
> segfault dialog, the machine seems to lock hard.

Is this all in domain 0 or another domain?

Can it be repeated with 2.4, or will your filesystem not boot
with a 2.4 kernel? (BTW: what file system are you using?)
 
> I've tested this w/ & w/o both a) removing the /lib/tls directory and b)
> the nosegfixup kernel option.

It's useful to know it's not a TLS issue.
 
> Has anyone else seen this?  or possibly know of any other cause for this?

Could you hook up a serial console to the machine? If you start
Xen with the appropriate option (e.g. 'com1=115200,8n1'), there's
a fair chance that either Xen or, more likely, domain 0 will
write some sort of crash message to the console when the machine
locks. A 'debug=y' build of Xen might help.
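
For readers unfamiliar with the notation, '115200,8n1' means 115200 baud,
8 data bits, no parity, 1 stop bit. The following is a small user-space
sketch that decomposes such a string. It is not Xen's parser; the struct and
function names are hypothetical, and it only handles the two fields shown in
the example above.

```c
#include <assert.h>
#include <stdio.h>

/* Decoded form of a "<baud>,<data bits><parity><stop bits>" string. */
struct serial_spec {
    long baud;
    int  data_bits;
    char parity;      /* 'n' (none), 'o' (odd) or 'e' (even) */
    int  stop_bits;
};

/* Returns 0 on success, -1 if the string does not have the expected
 * shape. Anything after the stop-bit digit is ignored. */
static int parse_serial_spec(const char *s, struct serial_spec *out)
{
    if (sscanf(s, "%ld,%1d%c%1d", &out->baud, &out->data_bits,
               &out->parity, &out->stop_bits) != 4)
        return -1;
    if (out->baud <= 0 || out->data_bits < 5 || out->data_bits > 8)
        return -1;
    if (out->parity != 'n' && out->parity != 'o' && out->parity != 'e')
        return -1;
    if (out->stop_bits != 1 && out->stop_bits != 2)
        return -1;
    return 0;
}
```

The terminal program on the other end of the cable has to be set to the same
four values, otherwise any crash message arrives as garbage.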
  
> A while ago, I tried building the xenolinux-2.6.7-dom0 kernel with a
> pentium II cpu target.  The system appeared to boot up properly, however
> X couldn't start as there seemed to be a mysterious SIGBUS being sent.
> For now, I'm using a PIV build on this PII..

That's very weird. The other way around I could sort of
understand...

xen/linux 2.6.7 hasn't been as well tested as 2.4.26, but it
seems to survive LTP (Linux Test Project) OK.

Ian
 




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: system lockup
  2004-08-13 23:23 ` Ian Pratt
@ 2004-08-13 23:59   ` Mike Waychison
  2004-08-14  0:20     ` Ian Pratt
  2004-08-14  8:32     ` Keir Fraser
  0 siblings, 2 replies; 11+ messages in thread
From: Mike Waychison @ 2004-08-13 23:59 UTC (permalink / raw)
  To: Ian Pratt; +Cc: xen-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Pratt wrote:
>>I've recently managed to get a 2.6.7 dom0 to boot without any major
>>problems.  I had seen the hwclock issues and the nosegfixup issues fly
>>by and these seem to be all worked out now.
>
>
> That's good to hear.
>
>
>>It seems that once I log into my gnome session, everything is aok for
>>the first little while, until eventually I get a segfault pop-up for
>>wnck-applet.  I've never seen this segfault before, and am not sure if
>>it is xen or 2.6.7 related (I'm still running a 2.6.1 variant).
>>However, a few seconds later, whether or not I click the 'ok' in the
>>segfault dialog, the machine seems to lock hard.
>
>
> Is this all in domain 0 or another domain?

This is all dom0.  I haven't had the chance to load up any other domains
yet.  My main concern at this point is getting my hardware working :)

>
> Can it be repeated with 2.4, or will your filesystem not boot
> with a 2.4 kernel? (BTW: what file system are you using?)
>
>
>>I've tested this w/ & w/o both a) removing the /lib/tls directory and b)
>>the nosegfixup kernel option.
>
>
> It's useful to know its not a tls issue.
>
>
>>Has anyone else seen this?  or possibly know of any other cause for this?
>
>
> Could you hook up a serial console to the machine? If you start
> Xen with the appropriate (e.g. 'com1=115200,8n1') option there's
> a fair chance that either Xen or more likely, domain 0, will
> write some sort of crash message to the console when the machine
> locks. A 'debug=y' build of Xen might help.

I'll hopefully be able to try something eventually.  I just moved, so at
this point I'm living out of cardboard boxes 8)

Out of curiosity, is there any way to get the Xen console/crash dumps on
the console (as opposed to serial)?

>
>
>>A while ago, I tried building the xenolinux-2.6.7-dom0 kernel with a
>>pentium II cpu target.  The system appeared to boot up properly, however
>>X couldn't start as there seemed to be a mysterious SIGBUS being sent.
>>For now, I'm using a PIV build on this PII..
>
>
> That's very weird. The other way around I could sort of
> understand...

Ah, I just checked my latest build, and it is in fact a PII build.  I
saw this issue a couple days ago: I may have had an inconsistent build.
Sorry for the noise :)



- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
http://www.sun.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE:  The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBHVXodQs4kOxk3/MRAu8aAJ0Z5/QGwq2smkZ7dUAO+GJEFpfLqACeOINe
fqiUr1AYwVmCxRfE+XxryMU=
=oQpz
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: system lockup
  2004-08-13 23:59   ` Mike Waychison
@ 2004-08-14  0:20     ` Ian Pratt
  2004-08-14  8:32     ` Keir Fraser
  1 sibling, 0 replies; 11+ messages in thread
From: Ian Pratt @ 2004-08-14  0:20 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Ian Pratt, xen-devel

> > Could you hook up a serial console to the machine? If you start
> > Xen with the appropriate (e.g. 'com1=115200,8n1') option there's
> > a fair chance that either Xen or more likely, domain 0, will
> > write some sort of crash message to the console when the machine
> > locks. A 'debug=y' build of Xen might help.
> 
> Out of curiosity, is there any way to get the Xen console/crash dumps on
> the console (as opposed to serial)?

You can see Xen's boot messages with 'xen_dmesg.py'.

If you're in a graphics screen mode I'm afraid there's no way to
get crash messages out of either dom0 or Xen.

If you're in a text mode, you should see crash dumps from dom0.

You are much better off with a serial line, though.

Ian



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: system lockup
  2004-08-13 23:59   ` Mike Waychison
  2004-08-14  0:20     ` Ian Pratt
@ 2004-08-14  8:32     ` Keir Fraser
  2004-08-16 13:59       ` Mark Williamson
  1 sibling, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2004-08-14  8:32 UTC (permalink / raw)
  To: Mike Waychison; +Cc: Ian Pratt, xen-devel


> > Could you hook up a serial console to the machine? If you start
> > Xen with the appropriate (e.g. 'com1=115200,8n1') option there's
> > a fair chance that either Xen or more likely, domain 0, will
> > write some sort of crash message to the console when the machine
> > locks. A 'debug=y' build of Xen might help.
> 
> I'll hopefully be able to try something eventually.  I just moved, so at
> this point I'm living out of cardboard boxes 8)
> 
> Out of curiosity, is there any way to get the Xen console/crash dumps on
> the console (as opposed to serial)?

DOM0 will of course write crash dumps to dmesg and /var/log/messages,
if it can. And Xen's trace buffer can be seen with xen_dmesg.py.

But if you're locking hard then you want those crash dumps to get
written somewhere synchronously, and there's no better option than a
serial line. :-)
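
To illustrate why a serial line counts as "synchronous": a polled UART write
busy-waits on the hardware and hands over one byte at a time, with no
interrupts, buffers, or filesystem in between. Below is a user-space sketch
against a simulated 16550-style UART. The fake_uart type and helper names
are hypothetical; only the register offsets and the THRE bit follow the
usual 16550 layout. This is not Xen's console code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_THR  0     /* transmit holding register offset */
#define UART_LSR  5     /* line status register offset */
#define LSR_THRE  0x20  /* LSR bit: transmit holding register empty */

/* Simulated UART (hypothetical): records every byte written to THR.
 * Real code would use port or memory-mapped I/O accessors instead. */
struct fake_uart {
    uint8_t regs[8];
    char    sent[64];
    size_t  nsent;
};

static uint8_t uart_read(const struct fake_uart *u, int reg)
{
    return u->regs[reg];
}

static void uart_write(struct fake_uart *u, int reg, uint8_t v)
{
    if (reg == UART_THR && u->nsent < sizeof(u->sent) - 1) {
        u->sent[u->nsent++] = (char)v;
        u->sent[u->nsent] = '\0';
    }
}

/* Spin until the transmitter can accept a byte, then hand it over.
 * When this returns, the byte has already been given to the hardware. */
static void uart_putc_polled(struct fake_uart *u, char c)
{
    while (!(uart_read(u, UART_LSR) & LSR_THRE))
        ;
    uart_write(u, UART_THR, (uint8_t)c);
}

static void uart_puts_polled(struct fake_uart *u, const char *s)
{
    while (*s)
        uart_putc_polled(u, *s++);
}
```

In the simulation the caller must set LSR_THRE in regs[UART_LSR] first,
otherwise the wait loop never ends. Log files, by contrast, depend on the
scheduler, page cache, and disk driver all still working, which is exactly
what cannot be assumed after a hard lock.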

 -- Keir



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: system lockup
  2004-08-14  8:32     ` Keir Fraser
@ 2004-08-16 13:59       ` Mark Williamson
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Williamson @ 2004-08-16 13:59 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Mike Waychison, Ian Pratt, xen-devel, Mark.Williamson

Btw,

Xen boot output can be collected by "xm dmesg" - xen_dmesg.py is dead.

HTH
Mark




^ permalink raw reply	[flat|nested] 11+ messages in thread

* system lockup
@ 2007-02-27  1:20 Roman Mashak
  0 siblings, 0 replies; 11+ messages in thread
From: Roman Mashak @ 2007-02-27  1:20 UTC (permalink / raw)
  To: netdev

Hello,

To learn about device drivers I took 8139too.c to explore. I
split the code into several logical blocks and am now trying to
implement my own simplified version, referring to the original code from
time to time. I have an RTL8139D NIC for experiments. So far I've
accomplished the following stages:
1) detecting the device
2) enabling the PCI device
3) initialising memory-mapped I/O
4) initialising the 'net_device' structure

And I'm stuck on the chip reset. Whenever I load the driver and try to
bring the interface up (ifconfig eth1 up), my system just hangs and the
keyboard locks up; I can't even use the 'SysRq' shortcuts.

I figured out that the problem occurs after I initialise the chip, i.e. in
this routine called from the 'net_device->open' method:


#define CmdTxEnb  (0x04)
...
#define RxOK   (0x01)
#define RxErr   (0x02)
#define TxOK   (0x04)
#define TxErr   (0x08)
#define RxOverFlow  (0x10)
#define RxUnderrun  (0x20)
#define RxFIFOOver  (0x40)
#define CableLen  (0x2000)
#define TimeOut   (0x4000)
#define SysErr   (0x8000)

#define INT_MASK (RxOK | RxErr | TxOK | TxErr | RxOverFlow | \
RxUnderrun | RxFIFOOver | CableLen | TimeOut | SysErr)


static void rtl8139_hw_start(struct net_device *dev)
{
   struct rtl8139_private *tp = dev->priv;
   void *ioaddr = tp->mmio_addr;
   ...
   writeb(CmdTxEnb, ioaddr + REG_COMMAND);
   writel(0x00000600, ioaddr + REG_TX_CONFIG);    /* DMA burst size 1024 */

   /* init TX buffer DMA addresses */
   for (i = 0; i < NUM_TX_DESC; i++) {
       writel(tp->tx_bufs_dma + (tp->tx_buf[i] - tp->tx_bufs),
              ioaddr + REG_TX_ADDR0 + (i * 4));
   }

   /* enable all known interrupts by setting the interrupt mask */
   writew(INT_MASK, ioaddr + REG_INTR_MASK);

   netif_start_queue(dev);
   return;
}


static int rtl8139_open(struct net_device *dev)
{
   int retval;
   struct rtl8139_private *tp = dev->priv;
   ...
   retval = request_irq(dev->irq, rtl8139_interrupt, 0, dev->name, dev);
   if (retval)
       return retval;

   /* Get memory for TX buffers. Memory must be DMA-able */
   tp->tx_bufs = pci_alloc_consistent(tp->pci_dev, TOTAL_TX_BUF_SIZE,
                                      &tp->tx_bufs_dma);
   ...
   rtl8139_init_ring(dev);
   rtl8139_hw_start(dev);

   DPRINTK("init_ring() & hw_start() passed\n");

   return 0;   /* open() returns int, so report success explicitly */
}
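
The TX address expression in rtl8139_hw_start() takes each buffer's offset
within the CPU-visible region and adds it to the region's bus address. The
following is a user-space sketch of that arithmetic. The tx_ring struct, the
init_ring helper, and the buffer size are assumptions made for illustration;
the post does not show rtl8139_init_ring() or the real sizes.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TX_DESC 4      /* assumed number of TX slots */
#define TX_BUF_SIZE 1536   /* assumed size of one TX buffer in bytes */

struct tx_ring {
    unsigned char *tx_bufs;              /* CPU address of the whole region */
    uint32_t       tx_bufs_dma;          /* bus address of the same region */
    unsigned char *tx_buf[NUM_TX_DESC];  /* CPU address of each slot */
};

/* Carve one contiguous region into NUM_TX_DESC equal slots. */
static void init_ring(struct tx_ring *r, unsigned char *cpu, uint32_t dma)
{
    r->tx_bufs = cpu;
    r->tx_bufs_dma = dma;
    for (int i = 0; i < NUM_TX_DESC; i++)
        r->tx_buf[i] = cpu + i * TX_BUF_SIZE;
}

/* Bus address of slot i: region bus address plus the slot's CPU offset. */
static uint32_t tx_slot_dma(const struct tx_ring *r, int i)
{
    return r->tx_bufs_dma + (uint32_t)(r->tx_buf[i] - r->tx_bufs);
}
```

The result is only meaningful if the allocation succeeded and the ring was
initialised first. pci_alloc_consistent() returns NULL on failure, so its
return value is worth checking before any address is written to the chip.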

rtl8139_hw_start() really is invoked and does return, since I'm getting
the printk output. Commenting out 'rtl8139_hw_start(dev);' brings the
interface up successfully, which is why I concluded that the problem
is in the chip initialization routine.

If anybody has any clue, I'd appreciate hearing it and getting advice.
Thanks in advance.
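
One possibility worth ruling out, offered only as a guess since the post does
not show rtl8139_interrupt(): once the interrupt mask is written (INT_MASK
above works out to 0xE07F), a handler that never acknowledges the status
bits leaves the interrupt line asserted, and the handler is re-entered
forever. The following user-space sketch models that behaviour. The fake_nic
type and every function name are hypothetical; on the real chip the
acknowledgement is done by writing the set bits back to the interrupt status
register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a level-triggered device: the interrupt line
 * stays asserted for as long as any unmasked status bit is set. */
struct fake_nic {
    uint16_t intr_status;
    uint16_t intr_mask;
};

static int irq_asserted(const struct fake_nic *n)
{
    return (n->intr_status & n->intr_mask) != 0;
}

/* Handler that acknowledges what it saw: read the status, then clear
 * exactly those bits (models the write-back to the status register). */
static void handler_with_ack(struct fake_nic *n)
{
    uint16_t status = n->intr_status;
    n->intr_status &= (uint16_t)~status;
}

/* Handler that returns without touching the status register. */
static void handler_without_ack(struct fake_nic *n)
{
    (void)n;
}

/* Keep calling the handler while the line is asserted, giving up after
 * 'limit' calls. Returns the number of handler invocations. */
static int run_irq_line(struct fake_nic *n,
                        void (*handler)(struct fake_nic *), int limit)
{
    int calls = 0;
    while (irq_asserted(n) && calls < limit) {
        handler(n);
        calls++;
    }
    return calls;
}
```

With the acknowledging handler the loop ends after one call; without it the
loop only stops at the artificial limit, which on real hardware would look
like a machine too busy with interrupts to answer the keyboard.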

-- 
Roman

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-02-27  1:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-08-13 22:53 system lockup Mike Waychison
2004-08-13 23:23 ` Ian Pratt
2004-08-13 23:59   ` Mike Waychison
2004-08-14  0:20     ` Ian Pratt
2004-08-14  8:32     ` Keir Fraser
2004-08-16 13:59       ` Mark Williamson
  -- strict thread matches above, loose matches on Subject: below --
2007-02-27  1:20 Roman Mashak
2002-10-21  0:02 System lockup Bill Leckey
2002-10-21 11:30 ` Alan Cox
2002-10-21 21:44   ` Bill Leckey
2002-10-27 22:26   ` Bill Leckey
