From: Dan Merillat <harik.attar@gmail.com>
To: linux-scsi@vger.kernel.org
Subject: Megaraid lockup on 2.6.[7-8]
Date: Fri, 6 Aug 2004 21:55:32 -0400
Message-ID: <c0c067900408061855df4dcae@mail.gmail.com>

This dump is from 2.6.7-rc3.  I'm using the in-kernel megaraid driver
on this Express 500.

0000:00:09.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 02)
        Subsystem: American Megatrends Inc. MegaRAID 475 Express 500/500LC RAID Controller

At this point, I'm really unsure what's going on.  It went from
"working" to "not working" overnight, after a 2-month uptime with no
problems.  Now it lasts between 5 minutes and 3 hours before it locks
up.  Sometimes the lockups are 'soft', in that all disk I/O hangs in
wait.  Other times they're hard (kernel panic and reload).

Due to the sudden nature of the problem, I suspected hardware.
Starting with the SCSI cable, RAID card, RAM, PCI riser card,
motherboard... it's all been replaced.  The megaraid itself doesn't
complain about problems accessing the drives (and the state is always
Optimal on reboot, with no media or "other" errors on any drive).

Software-wise, I've tried 2.6.7-rc3, 2.6.7, 2.6.8-rc2 and 2.6.8-rc3.

I tried the 2.20 series of the driver, but it doesn't appear to
recognize older cards.  Is it for U320 only, or did I do something
wrong?  (2.20 works fine on another box with a U320 controller.)

I'm at a loss.  I guess I could replace all the drives, since that's
the last thing left to try, but A) it's a serious PITA and B) they
"appear" fine, even when doing a full consistency check via the Ctrl-M
BIOS.

This is made even more fun by the fact that it's a LIVE server, or
was, until two days ago.  It's also stored in a colo facility about an
hour from my house.  I'm going to go back tomorrow and try updating
the firmware on the megaraid, but that's about the only thing I can
think of that's left to try.  Perhaps there's some PCI setting I
should try?

Any help would be greatly appreciated!

(The PCI bus/slot below is different because the card was in a
different motherboard when I captured this backtrace.)


megaraid: found 0x101e:0x1960:bus 1:slot 2:func 0
scsi0:Found MegaRAID controller at 0xf8846000, IRQ:27
megaraid: [C170:3.13] detected 1 logical drives.
megaraid: supports extended CDBs.
megaraid: channel[0] is raid.
scsi0 : LSI Logic MegaRAID C170 254 commands 16 targs 4 chans 7 luns
scsi0: scanning scsi channel 0 for logical drives.
  Vendor: MegaRAID  Model: LD 0 RAID5  210G  Rev: C170
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 430116864 512-byte hdwr sectors (220220 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
 /dev/scsi/host0/bus0/target0/lun0: p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0,  type 0
scsi0: scanning scsi channel 1 for logical drives.
scsi0: scanning scsi channel 2 for logical drives.
scsi0: scanning scsi channel 4 [P0] for physical devices.
...
megaraid: ABORTING-12793 cmd=2a <c=0 t=0 l=0>
megaraid: ABORTING-12796 cmd=2a <c=0 t=0 l=0>
megaraid: ABORTING-12797 cmd=2a <c=0 t=0 l=0>
megaraid: ABORTING-12798 cmd=2a <c=0 t=0 l=0>
megaraid: reservation reset failed.
megaraid: RESET-12793 cmd=2a <c=0 t=0 l=0>
megaraid: reservation reset failed.
megaraid: RESET-12793 cmd=2a <c=0 t=0 l=0>
megaraid: reservation reset failed.
megaraid: RESET-12793 cmd=2a <c=0 t=0 l=0>
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
SCSI error : <0 0 0 0> return code = 0x6000000
end_request: I/O error, dev sda, sector 197231
Buffer I/O error on device dm-64, logical block 24598
lost page write due to I/O error on dm-64
scsi0 (0:0): rejecting I/O to offline device
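
A note for reading the log above: cmd=2a is the SCSI WRITE(10) opcode,
and the "return code = 0x6000000" value is the midlayer result word,
which packs status, message, host and driver bytes from low to high.
The Python sketch below splits that word and tallies the abort/reset
lines.  The helper names (split_result, summarize) are made up for
illustration, and the driver-byte table is an assumption taken from the
2.6-era include/scsi/scsi.h, so check it against your own tree.

```python
import re
from collections import Counter

# Assumption: values as in 2.6-era include/scsi/scsi.h; verify locally.
DRIVER_BYTE = {
    0x00: "DRIVER_OK",
    0x06: "DRIVER_TIMEOUT",
    0x08: "DRIVER_SENSE",
}

# Standard SCSI opcodes for 10-byte reads and writes.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Matches lines such as "megaraid: ABORTING-12793 cmd=2a <c=0 t=0 l=0>".
EVENT_RE = re.compile(r"megaraid: (ABORTING|RESET)-(\d+) cmd=([0-9a-fA-F]+)")


def split_result(result):
    """Split a SCSI midlayer result word into its four bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def summarize(log_text):
    """Count abort/reset events, keyed by (event, opcode name)."""
    counts = Counter()
    for event, _serial, cmd in EVENT_RE.findall(log_text):
        opcode = int(cmd, 16)
        counts[(event, OPCODES.get(opcode, hex(opcode)))] += 1
    return counts


# Abort/reset lines copied from the log in this message.
LOG = """\
megaraid: ABORTING-12793 cmd=2a <c=0 t=0 l=0>
megaraid: ABORTING-12796 cmd=2a <c=0 t=0 l=0>
megaraid: ABORTING-12797 cmd=2a <c=0 t=0 l=0>
megaraid: ABORTING-12798 cmd=2a <c=0 t=0 l=0>
megaraid: RESET-12793 cmd=2a <c=0 t=0 l=0>
megaraid: RESET-12793 cmd=2a <c=0 t=0 l=0>
megaraid: RESET-12793 cmd=2a <c=0 t=0 l=0>
"""

fields = split_result(0x6000000)
print(fields)
print(DRIVER_BYTE.get(fields["driver"], hex(fields["driver"])))
print(summarize(LOG))
```

If the table above is right, the only non-zero byte is the driver
byte, and it reads as a timeout.  That would fit writes that were
handed to the controller and never completed, rather than an error
reported back by a drive.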

[-- Attachment #2: typhoon-config.gz --]
[-- Type: application/gzip, Size: 7202 bytes --]
