From: "Brett G. Durrett" <brett@imvu.com>
To: linux-kernel@vger.kernel.org
Cc: "Brett G. Durrett" <brett@imvu.com>,
"David N. Welton" <d.welton@webster.it>
Subject: Re: megaraid_sas waiting for command and then offline
Date: Mon, 13 Nov 2006 13:40:28 -0800
Message-ID: <4558E64C.10309@imvu.com>
In-Reply-To: <453FE9C4.1090504@imvu.com>
Bad news: I just reproduced the failure using EXT3 on a system that had
a complete install 4 days ago, so it looks like the megaraid_sas driver
fails with both XFS and EXT3 (although EXT3 seems more reliable).
I was running EXT3 with read-ahead disabled:
# ./MegaCli -LDGetProp -Cache -L0 -A0
Adapter 0-VD 0: Cache Policy:WriteBack, ReadAheadNone, Direct
# mount
/dev/sda1 on / type ext3 (rw,errors=remount-ro)
# uname -a
Linux AF001158 2.6.18-imvuamd64smpmsastest #1 SMP Mon Oct 9 21:26:46 PDT
2006 x86_64 GNU/Linux
Here are the megasas entries from syslog, sorted newest first by timestamp:
FACILITY DATE TIME MESSAGE
kern-warning 2006-11-13 12:56:25 kernel: megasas[0]: 64 bit SGLs were
sent to FW
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Pending OS cmds in FW :
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x15351800 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe238b77, lba_hi : 0x0, sense_buf addr : 0x1534d900,sge count
: 0x47
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x1535c800 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe23991f, lba_hi : 0x0, sense_buf addr : 0x15356d00,sge count
: 0x50
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x15375000 : <3>megasas[0]: frame count : 0x6, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe23aaaf, lba_hi : 0x0, sense_buf addr : 0x15371800,sge count
: 0x1a
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x15377c00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xae0005f, lba_hi : 0x0, sense_buf addr : 0x15371d80,sge count
: 0x2
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x1537b400 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe208367, lba_hi : 0x0, sense_buf addr : 0x1537a280,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x1537d400 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe239697, lba_hi : 0x0, sense_buf addr : 0x1537a680,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff00000 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe238f17, lba_hi : 0x0, sense_buf addr : 0x1537ac00,sge count
: 0x45
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff01400 : <3>megasas[0]: frame count : 0x7, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe238df7, lba_hi : 0x0, sense_buf addr : 0x1537ae80,sge count
: 0x22
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff06400 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xa68d66f, lba_hi : 0x0, sense_buf addr : 0xcff03680,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff18400 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe239e27, lba_hi : 0x0, sense_buf addr : 0xcff15680,sge count
: 0x50
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff1f000 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe239b9f, lba_hi : 0x0, sense_buf addr : 0xcff1e200,sge count
: 0x50
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff20000 : <3>megasas[0]: frame count : 0x4, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe23c41f, lba_hi : 0x0, sense_buf addr : 0xcff1e400,sge count
: 0xf
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff2b000 : <3>megasas[0]: frame count : 0x3, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe23a377, lba_hi : 0x0, sense_buf addr : 0xcff27800,sge count
: 0xa
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff35c00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xa601697, lba_hi : 0x0, sense_buf addr : 0xcff30b80,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff44400 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe238b6f, lba_hi : 0x0, sense_buf addr : 0xcff42480,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff4cc00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe20a287, lba_hi : 0x0, sense_buf addr : 0xcff4b380,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff4f800 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe23a0f7, lba_hi : 0x0, sense_buf addr : 0xcff4b900,sge count
: 0x38
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff52400 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0x5f4009f, lba_hi : 0x0, sense_buf addr : 0xcff4be80,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff5fc00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe238f0f, lba_hi : 0x0, sense_buf addr : 0xcff5d580,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff60000 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xa6000df, lba_hi : 0x0, sense_buf addr : 0xcff5d600,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff6bc00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe239e1f, lba_hi : 0x0, sense_buf addr : 0xcff66b80,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff75800 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe239197, lba_hi : 0x0, sense_buf addr : 0xcff6fd00,sge count
: 0x50
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff76400 : <3>megasas[0]: frame count : 0x3, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe23a0a7, lba_hi : 0x0, sense_buf addr : 0xcff6fe80,sge count
: 0xa
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff7b400 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe23969f, lba_hi : 0x0, sense_buf addr : 0xcff78680,sge count
: 0x50
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0xcff7e400 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe23aaa7, lba_hi : 0x0, sense_buf addr : 0xcff78c80,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x15391400 : <3>megasas[0]: frame count : 0x2, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xd0c004f, lba_hi : 0x0, sense_buf addr : 0x1538ae80,sge count
: 0x3
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x153a3000 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0x5f40217, lba_hi : 0x0, sense_buf addr : 0x1539ce00,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x153adc00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe2343e7, lba_hi : 0x0, sense_buf addr : 0x153ae180,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x153bdc00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xa601657, lba_hi : 0x0, sense_buf addr : 0x153b7d80,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x153c3000 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xae00057, lba_hi : 0x0, sense_buf addr : 0x153c0600,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x153c4000 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe2324af, lba_hi : 0x0, sense_buf addr : 0x153c0800,sge count
: 0x1
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x153c7400 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe239417, lba_hi : 0x0, sense_buf addr : 0x153c0e80,sge count
: 0x50
kern-warning 2006-11-13 12:56:25 kernel: megasas[0]: Pending Internal
cmds in FW :
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Dumping Done.
kern-err 2006-11-13 12:56:25 kernel: megasas: failed to do reset
kern-notice 2006-11-13 12:56:25 kernel: sd 0:2:0:0: megasas: RESET
-20487153 cmd=2a
kern-err 2006-11-13 12:56:25 kernel: megasas: cannot recover from
previous reset failures
kern-notice 2006-11-13 12:56:25 kernel: sd 0:2:0:0: megasas: RESET
-20487153 cmd=2a
kern-err 2006-11-13 12:56:25 kernel: megasas: cannot recover from
previous reset failures
kern-notice 2006-11-13 12:56:24 kernel: megasas: [100]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [105]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [110]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [115]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [120]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [125]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [130]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [135]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [140]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [145]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [150]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [155]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [160]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [165]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [170]waiting for 32
commands to complete
kern-notice 2006-11-13 12:56:24 kernel: megasas: [175]waiting for 32
commands to complete
kern-warning 2006-11-13 12:56:24 kernel: megasas[0]: Dumping Frame
Phys Address of all pending cmds in FW
kern-err 2006-11-13 12:56:24 kernel: megasas[0]: Total OS Pending cmds
: 32
kern-notice 2006-11-13 12:54:59 kernel: megasas: [95]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:54 kernel: megasas: [90]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:49 kernel: megasas: [85]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:44 kernel: megasas: [80]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:39 kernel: megasas: [75]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:34 kernel: megasas: [70]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:29 kernel: megasas: [65]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:24 kernel: megasas: [60]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:19 kernel: megasas: [55]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:14 kernel: megasas: [50]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:09 kernel: megasas: [45]waiting for 32
commands to complete
kern-notice 2006-11-13 12:54:04 kernel: megasas: [40]waiting for 32
commands to complete
kern-notice 2006-11-13 12:53:59 kernel: megasas: [35]waiting for 32
commands to complete
kern-notice 2006-11-13 12:53:54 kernel: megasas: [30]waiting for 32
commands to complete
kern-notice 2006-11-13 12:53:49 kernel: megasas: [25]waiting for 32
commands to complete
kern-notice 2006-11-13 12:53:44 kernel: megasas: [20]waiting for 32
commands to complete
kern-notice 2006-11-13 12:53:39 kernel: megasas: [15]waiting for 32
commands to complete
kern-notice 2006-11-13 12:53:34 kernel: megasas: [10]waiting for 32
commands to complete
kern-notice 2006-11-13 12:53:29 kernel: megasas: [ 5]waiting for 32
commands to complete
kern-notice 2006-11-13 12:53:24 kernel: sd 0:2:0:0: megasas: RESET
-20487153 cmd=2a
kern-notice 2006-11-13 12:53:24 kernel: megasas: [ 0]waiting for 32
commands to complete
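For anyone who wants to compare dumps across machines, here is a minimal sketch of how the wrapped "Frame addr" entries above could be turned into structured records. It assumes the log has the exact field labels shown in this message; the names `ENTRY_RE` and `parse_pending_cmds` are hypothetical and not part of the driver or of MegaCli. The two sample entries are copied from the dump above.

```python
import re

# Hypothetical helper for illustration only; field names mirror the
# labels printed by the driver in the dump quoted in this message.
ENTRY_RE = re.compile(
    r"Frame addr\s*:\s*(?P<frame_addr>0x[0-9a-f]+).*?"
    r"frame count\s*:\s*(?P<frame_count>0x[0-9a-f]+),\s*"
    r"Cmd\s*:\s*(?P<cmd>0x[0-9a-f]+),\s*"
    r"Tgt id\s*:\s*(?P<tgt_id>0x[0-9a-f]+),\s*"
    r"lba lo\s*:\s*(?P<lba_lo>0x[0-9a-f]+),\s*"
    r"lba_hi\s*:\s*(?P<lba_hi>0x[0-9a-f]+),\s*"
    r"sense_buf addr\s*:\s*(?P<sense_buf_addr>0x[0-9a-f]+),\s*"
    r"sge count\s*:\s*(?P<sge_count>0x[0-9a-f]+)"
)


def parse_pending_cmds(log_text):
    """Return one dict of integer fields per pending-command entry."""
    # The mail client wrapped each entry over several lines, so collapse
    # all whitespace before matching.
    flat = " ".join(log_text.split())
    return [
        {name: int(value, 16) for name, value in m.groupdict().items()}
        for m in ENTRY_RE.finditer(flat)
    ]


# Two entries copied verbatim from the syslog excerpt above.
SAMPLE = """
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x15351800 : <3>megasas[0]: frame count : 0x8, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xe238b77, lba_hi : 0x0, sense_buf addr : 0x1534d900,sge count
: 0x47
kern-err 2006-11-13 12:56:25 kernel: megasas[0]: Frame addr
:0x15377c00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0,
lba lo : 0xae0005f, lba_hi : 0x0, sense_buf addr : 0x15371d80,sge count
: 0x2
"""

entries = parse_pending_cmds(SAMPLE)
for e in entries:
    print(f"lba_lo={e['lba_lo']:#x} frames={e['frame_count']} "
          f"sges={e['sge_count']}")
```

Sorting the resulting records by `lba_lo` or grouping them by `sge_count` may make it easier to see whether the stuck commands cluster in one region of the disk, though that is only a suggestion for further digging, not a diagnosis.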
Brett G. Durrett wrote:
>
> David,
>
> We switched to 2.6.18 (SMP) and applied the latest patches from LSI
> (got them directly from Sumant Patro). Also, he told me to make sure
> "read ahead" was set to "off". This seems to have reduced the
> frequency of the failures to about once per week (across 10+
> machines), down from several times per week.
>
> After I reported an additional failure, Sumant said they were able to
> reproduce the problems with XFS but they have not seen it with EXT3.
> I prefer XFS but I prefer to have reliable databases even more...
>
> I now have a couple of systems running in the new configuration and I
> am slowly migrating others to it as well. I have not seen a failure
> with EXT3 but statistically it would have been unlikely... I won't
> declare victory until I have more systems converted with a few weeks
> of reliable use.
>
> Hope this helps... if anybody solves the root cause I will happily
> offer them a small gift to show my gratitude.
>
> B-
>
>
>
> David N. Welton wrote:
>
>> Hi,
>>
>> I found someone corresponding to your name writing about a problem with
>> the megaraid sas driver/hardware on the LKML:
>>
>> http://lkml.org/lkml/2006/9/6/12
>>
>> We have a Dell (2950, running 2.6.18 #1 SMP) as well, and the way I
>> managed to kill the thing dead in its tracks (symptoms basically what
>> you describe) is with smartctl:
>>
>> root@salgari:~# smartctl --all /dev/sda
>> smartctl version 5.34 [i686-pc-linux-gnu] Copyright (C) 2002-5 Bruce
>> Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> Device: DELL PERC 5/i Version: 1.00
>> Device type: disk
>> Local Time is: Wed Oct 25 10:14:40 2006 CEST
>> Device does not support SMART
>>
>> Error Counter logging not supported
>>
>>
>> Device does not support Self Test logging
>>
>> ----
>>
>> [61101.681857] sd 0:2:0:0: rejecting I/O to offline device
>> [61101.681944] EXT3-fs error (device sda1): ext3_readdir: directory
>> #7553069 contains a hole at offset 0
>> [61103.944794] sd 0:2:0:0: rejecting I/O to offline device
>> [61103.944879] EXT3-fs error (device sda1): ext3_readdir: directory
>> #7553069 contains a hole at offset 0
>> [61104.672212] sd 0:2:0:0: rejecting I/O to offline device
>> [61104.672295] EXT3-fs error (device sda1): ext3_readdir: directory
>> #7553069 contains a hole at offset 0
>> [61105.255981] sd 0:2:0:0: rejecting I/O to offline device
>> [61105.256066] EXT3-fs error (device sda1): ext3_readdir: directory
>> #7553069 contains a hole at offset 0
>>
>> ----
>>
>> Dead in the water. We suspect that in any case there are some disk
>> problems, which is why we were trying to use smartctl in the first
>> place.
>>
>> I was just curious if you managed to figure anything out...
>>
>> Thanks,
>> Dave Welton
>>
>>