SiI680 oops/panic

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* SiI680 oops/panic
@ 2003-10-09 21:18 Erik Bourget
  2003-10-09 21:33 ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 2+ messages in thread
From: Erik Bourget @ 2003-10-09 21:18 UTC (permalink / raw)
  To: andre; +Cc: linux-kernel

(Cc to lkml in case anybody else has any intuition).

Hello Andre;

A while ago I spoke with you about my funky SiI680 experiences.  I have since
decided that my Hitachi "DeathStar" 180GXP harddrives were all faulty, and
replaced them with Western Digital..  This was bad.  On the other hand, the
kernel /DID/ oops/panic a few times, and I finally managed to drive to the
datacenter before somebody rebooted it.

And I took a shot with my trusty digital camera.
http://tacos.sus.mcgill.ca/~erik/oops-panic.jpg (24,457b) 
(textual bits have been re-typed below)

Kernel: 2.4.21, SMP (also happened on 2.4.22, also SMP)
(Note that the box has only one CPU, and no 'hyperthreading')

Text from it:

printing eip:
3d3d3d3d
*pde = 00000000
Oops: 0000
CPU:   0
EIP:  0010:[<3d3d3d3d>]  Not tainted
EFLAGS: 00010046
...
Process nfsd (pid: 200)
...
Code: Bad EIP value.
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Hardware:
  Dell "650" 1U server, P4 2.4GHz, 512MB, 2x120-GB Hitachi 180GXP DeskStar
  drives in RAID-1 configuration.

Software: vanilla Debian woody, vanilla kernel.org kernels.

Load on the drives was Constant and Extreme.  The machines serve as mail
storage for our ISP's mail system.  The drives were running a constant synch
process that checked a database to see which users had authenticated with the
mail system (via POP3, IMAP, etc - such that they might have deleted mail) and
ran rsync to an identical machine over a 1000mbit link on their directories.
Literally -

while(1) {
         @userlist = changed_directories();
         foreach (@userlist) {
                 do_rsync(localhost:$_, remotehost:$_);
         }
         sleep(30);
}

Can you make heads or tails of this?  My first thought is that some driver
isn't handling faulty hardware in an error-tolerant way.

Thanks for your time;

- Erik Bourget

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: SiI680 oops/panic
  2003-10-09 21:18 SiI680 oops/panic Erik Bourget
@ 2003-10-09 21:33 ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 2+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-10-09 21:33 UTC (permalink / raw)
  To: Erik Bourget, andre; +Cc: linux-kernel


Please write down your oops and decode it using ksymoops.

thanks,
--bartlomiej

On Thursday 09 of October 2003 23:18, Erik Bourget wrote:
> (Cc to lkml in case anybody else has any intuition).
>
> Hello Andre;
>
> A while ago I spoke with you about my funky SiI680 experiences.  I have
> since decided that my Hitachi "DeathStar" 180GXP harddrives were all
> faulty, and replaced them with Western Digital..  This was bad.  On the
> other hand, the kernel /DID/ oops/panic a few times, and I finally managed
> to drive to the datacenter before somebody rebooted it.
>
> And I took a shot with my trusty digital camera.
> http://tacos.sus.mcgill.ca/~erik/oops-panic.jpg (24,457b)
> (textual bits have been re-typed below)
>
> Kernel: 2.4.21, SMP (also happened on 2.4.22, also SMP)
> (Note that the box has only one CPU, and no 'hyperthreading')
>
> Text from it:
>
> printing eip:
> 3d3d3d3d
> *pde = 00000000
> Oops: 0000
> CPU:   0
> EIP:  0010:[<3d3d3d3d>]  Not tainted
> EFLAGS: 00010046
> ...
> Process nfsd (pid: 200)
> ...
> Code: Bad EIP value.
>  <0>Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing
>
> Hardware:
>   Dell "650" 1U server, P4 2.4GHz, 512MB, 2x120-GB Hitachi 180GXP DeskStar
>   drives in RAID-1 configuration.
>
> Software: vanilla Debian woody, vanilla kernel.org kernels.
>
> Load on the drives was Constant and Extreme.  The machines serve as mail
> storage for our ISP's mail system.  The drives were running a constant
> synch process that checked a database to see which users had authenticated
> with the mail system (via POP3, IMAP, etc - such that they might have
> deleted mail) and ran rsync to an identical machine over a 1000mbit link on
> their directories. Literally -
>
> while(1) {
>          @userlist = changed_directories();
>          foreach (@userlist) {
>                  do_rsync(localhost:$_, remotehost:$_);
>          }
>          sleep(30);
> }
>
> Can you make heads or tails of this?  My first thought is that some driver
> isn't handling faulty hardware in an error-tolerant way.
>
> Thanks for your time;
>
> - Erik Bourget


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2003-10-09 21:30 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-10-09 21:18 SiI680 oops/panic Erik Bourget
2003-10-09 21:33 ` Bartlomiej Zolnierkiewicz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox