linux-scsi.vger.kernel.org archive mirror
* mptsas and ioc0: ERRORs
@ 2008-03-31 13:04 Lars Täuber
  2008-04-03 10:31 ` Lars Täuber
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Täuber @ 2008-03-31 13:04 UTC (permalink / raw)
  To: linux-scsi

Hello,

My name is Lars and I work in the IT department of a German academy.
We recently bought some expensive equipment to build a SAN with Linux.

If this is the wrong place to ask, please excuse me. (Where should I ask instead?)

The hardware is the following:

monosan:~ # cat /etc/SuSE-release 
openSUSE 10.3 (X86-64)
VERSION = 10.3
monosan:~ # uname -a
Linux monosan 2.6.22.17-0.1-default #1 SMP 2008/02/10 20:01:04 UTC x86_64 x86_64 x86_64 GNU/Linux

2x Dual-Core AMD Opteron(tm) Processor 2216

03:04.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 02)

monosan:~ # ls -1 /lib/firmware/
ethp_z8e.dat
eth_z8e.dat
myri10ge_ethp_z8e.dat
myri10ge_eth_z8e.dat
myri10ge_rss_ethp_z8e.dat
myri10ge_rss_eth_z8e.dat
rss_ethp_z8e.dat
rss_eth_z8e.dat

The HBA has two external SFF-8088 connectors, and each one is connected to one extender board of the same Promise VTrak VTJ610sD disc enclosure. This is meant to provide redundancy, so I use multipathing.
The VTrak contains 16 SATA discs, which appear as sda-sdr (and sds-sdah).
There is one software RAID6 across 15 discs plus one hot spare.

I get the following errors:
monosan:~ # fgrep "Mar 28" /var/log/messages | egrep "(scsi|mpt)"
Mar 28 21:45:37 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100395bd1c0)
Mar 28 21:45:48 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810039654700)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810039654700)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81003bb87d80)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81003bb87d80)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100519cc5c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100519cc5c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100083504c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100083504c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100787ccd40)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100787ccd40)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81006ee04240)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81006ee04240)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100787cc100)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8100787cc100)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007cbeb1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007cbeb1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81003bb87a00)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81003bb87a00)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81011bb48300)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81011bb48300)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81011bb484c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81011bb484c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007cbebc40)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007cbebc40)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81003976f0c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81003976f0c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810051a7a1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810051a7a1c0)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810015f93880)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810015f93880)
Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100395bd1c0)
Mar 28 21:46:17 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:46:17 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
Mar 28 21:46:21 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
Mar 28 21:46:36 monosan kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
Mar 28 21:46:36 monosan kernel: mptscsih: ioc0: target reset: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:46:36 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff8100395bd1c0)
Mar 28 21:46:48 monosan kernel: mptbase: ioc0: ERROR - Doorbell INT timeout (count=4999), IntStatus=80000008!
Mar 28 21:46:48 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:46:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
Mar 28 21:46:48 monosan kernel: mptbase: ioc0: ERROR - Doorbell INT timeout (count=4999), IntStatus=0!
Mar 28 21:46:49 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
Mar 28 21:47:05 monosan kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
Mar 28 21:47:05 monosan kernel: mptscsih: ioc0: bus reset: FAILED (sc=ffff8100395bd1c0)
Mar 28 21:47:05 monosan kernel: mptscsih: ioc0: attempting host reset! (sc=ffff8100395bd1c0)
Mar 28 21:47:05 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff8100395bd1c0)
Mar 28 21:47:23 monosan kernel: sd 6:0:26:0: scsi: Device offlined - not ready after error recovery
Mar 28 21:47:23 monosan kernel: scsi 6:0:7:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:4:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:6:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:2:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:1:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:10:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=8380 req_idx_MR=8380 mf=ffff81007db02900 mr=0000000000000000 sc=0000000000000000
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=6680 req_idx_MR=6680 mf=ffff81007db0be80 mr=0000000000000000 sc=019724848808e8c1
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=ce00 req_idx_MR=ce00 mf=ffff81007db0ea00 mr=0000000000000000 sc=ffff81007da92000
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=2900 req_idx_MR=2900 mf=ffff81007db04900 mr=0000000000000000 sc=0000000000000000
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=4900 req_idx_MR=4900 mf=ffff81007db06680 mr=0000000000000000 sc=0000007800000018
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=be80 req_idx_MR=be80 mf=ffff81007db0ce00 mr=0000000000000000 sc=0000000000000000
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=ea00 req_idx_MR=ea00 mf=ffff81007db10b00 mr=0000000000000000 sc=0000000000000000
Mar 28 21:47:23 monosan kernel: scsi 6:0:12:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:11:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:13:0: rejecting I/O to dead device
Mar 28 21:47:23 monosan kernel: scsi 6:0:14:0: rejecting I/O to dead device
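
As a reading aid for a log like the one above, the recovery outcomes can be grouped by command pointer (the `sc=` value) to show how far the error handler escalated for each command. This is only a minimal sketch: the function name `escalation_by_command` and the regular expression are illustrative assumptions based on the message format visible in this log, not part of the mptscsih driver.

```python
import re

# Outcome lines look like:
#   mptscsih: ioc0: task abort: FAILED (sc=ffff8100395bd1c0)
# The "attempting ...!" lines are deliberately not matched.
OUTCOME = re.compile(
    r"mptscsih: ioc\d+: (task abort|target reset|bus reset|host reset): "
    r"(SUCCESS|FAILED) \(sc=([0-9a-f]+)\)"
)


def escalation_by_command(lines):
    """Group recovery outcomes by command pointer, preserving log order."""
    history = {}
    for line in lines:
        match = OUTCOME.search(line)
        if match:
            step, result, sc = match.groups()
            history.setdefault(sc, []).append((step, result))
    return history


# Lines copied from the log above.
sample = [
    "Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff8100395bd1c0)",
    "Mar 28 21:46:07 monosan kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff810039654700)",
    "Mar 28 21:46:36 monosan kernel: mptscsih: ioc0: target reset: FAILED (sc=ffff8100395bd1c0)",
    "Mar 28 21:47:05 monosan kernel: mptscsih: ioc0: bus reset: FAILED (sc=ffff8100395bd1c0)",
    "Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff8100395bd1c0)",
]

history = escalation_by_command(sample)
for sc, steps in history.items():
    print(sc, " -> ".join(f"{step}: {result}" for step, result in steps))
```

Run over the full log, this makes it easy to see that only the first command (sc=ffff8100395bd1c0) went through every escalation step, while the other commands were aborted successfully.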

And 11 discs disappeared simultaneously:
monosan:~ # cat /proc/mdstat 
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] 
md4 : active raid6 dm-9[15](S) dm-8[16](F) dm-7[13] dm-6[17](F) dm-5[18](F) dm-4[19](F) dm-3[20](F) dm-2[21](F) dm-15[22](F) dm-14[23](F) dm-13[5] dm-12[24](F) dm-11[3] dm-10[25](F) dm-1[26](F) dm-0[0]
      12697912448 blocks level 6, 64k chunk, algorithm 2 [15/4] [U__U_U_______U_]
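
To double-check the count of failed members in an mdstat line like the one above, the flags after each member can be tallied. This is a rough sketch: the helper name `classify_members` is hypothetical, and it assumes only the `(F)` (faulty) and `(S)` (spare) flags matter, treating any other or missing flag as active.

```python
import re

# A member looks like "dm-8[16](F)": device name, slot number, optional flag.
MEMBER = re.compile(r"([\w-]+)\[(\d+)\](?:\(([A-Z])\))?")


def classify_members(md_line):
    """Split the members of one /proc/mdstat array line by state flag."""
    states = {"active": [], "failed": [], "spare": []}
    for name, _slot, flag in MEMBER.findall(md_line):
        if flag == "F":
            states["failed"].append(name)
        elif flag == "S":
            states["spare"].append(name)
        else:
            # No flag (or a flag this sketch does not handle): count as active.
            states["active"].append(name)
    return states


# The md4 line copied from the output above.
md_line = (
    "md4 : active raid6 dm-9[15](S) dm-8[16](F) dm-7[13] dm-6[17](F) "
    "dm-5[18](F) dm-4[19](F) dm-3[20](F) dm-2[21](F) dm-15[22](F) "
    "dm-14[23](F) dm-13[5] dm-12[24](F) dm-11[3] dm-10[25](F) dm-1[26](F) dm-0[0]"
)

states = classify_members(md_line)
for state, names in states.items():
    print(f"{state}: {len(names)} ({', '.join(names)})")
```

The four active members agree with the `[15/4]` status on the following line of the mdstat output.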

This is not the first time this has happened, but at first I thought I might have made a mistake somewhere. Now it has happened again, and also, for the third time, on a second machine with the same hardware.
Does this have something to do with multipathing?
Is it strange to have multipathing through the same HBA?
How can I debug this further?

Thanks for any help.

Lars


* Re: mptsas and ioc0: ERRORs
  2008-03-31 13:04 mptsas and ioc0: ERRORs Lars Täuber
@ 2008-04-03 10:31 ` Lars Täuber
  2008-04-03 11:13   ` Bernd Schubert
  2008-04-03 14:01   ` James Bottomley
  0 siblings, 2 replies; 14+ messages in thread
From: Lars Täuber @ 2008-04-03 10:31 UTC (permalink / raw)
  To: linux-scsi

Hi!

OK, I understand. Being ignored means having asked at the wrong place.
Can someone give me a hint where to ask instead?

Thanks
Lars


* Re: mptsas and ioc0: ERRORs
  2008-04-03 10:31 ` Lars Täuber
@ 2008-04-03 11:13   ` Bernd Schubert
  2008-04-04  7:21     ` Lars Täuber
  2008-04-03 14:01   ` James Bottomley
  1 sibling, 1 reply; 14+ messages in thread
From: Bernd Schubert @ 2008-04-03 11:13 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-scsi

Hello Lars,

On Thursday 03 April 2008 12:31:07 Lars Täuber wrote:
> Hi!
>
> OK, I understand. Being ignored means having asked at the wrong place.
> Can someone give me a hint where to ask instead?

No, I think you asked at the right place, but nobody knows the cause of the
problem.
Well, from my point of view linux-scsi is far too quiet in case of errors. I
have patches for 2.6.22 that improve this situation and the error handler in
general, but at present I don't have the time to clean them up and push them
upstream.

If you are willing to use 2.6.22 plus my SCSI patches, we would at least find
out which SCSI commands fail.

Eric also sent me a more recent version of the MPT driver; it might also be
worth trying. It fixed a really troublesome mpt-scsi bug for us, but I don't
know how well it works for SAS.


Cheers,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: mptsas and ioc0: ERRORs
  2008-04-03 10:31 ` Lars Täuber
  2008-04-03 11:13   ` Bernd Schubert
@ 2008-04-03 14:01   ` James Bottomley
  2008-04-04  7:14     ` Lars Täuber
  1 sibling, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-04-03 14:01 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-scsi, Moore, Eric


On Thu, 2008-04-03 at 12:31 +0200, Lars Täuber wrote:
> OK, I understand. Being ignored means having asked at the wrong place.
> Can someone give me a hint where to ask instead?

Actually, LSI support might be better.  This:

> Mar 28 21:45:37 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100395bd1c0)
> Mar 28 21:45:48 monosan kernel: mptbase: Initiating ioc0 recovery
> Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
> Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT

Indicates some type of firmware problem on the card (ioc fault means
fault in the firmware engine).

This:

> Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
> Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=8380 req_idx_MR=8380 mf=ffff81007db02900 mr=0000000000000000 sc=0000000000000000

Indicates a rendezvous failure between the ioc firmware and the driver.
The firmware is sending a response to a task that the driver thinks it
previously sent.

The problem for us is that while the driver is open source, it's really
just a translator for the on board firmware engine.  If the firmware
engine is truly the cause of these faults, there's not much anyone
except LSI can do to help.

James



* Re: mptsas and ioc0: ERRORs
  2008-04-03 14:01   ` James Bottomley
@ 2008-04-04  7:14     ` Lars Täuber
  2008-04-04 16:42       ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Täuber @ 2008-04-04  7:14 UTC (permalink / raw)
  To: linux-scsi

Hi James,

James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> Actually, LSI support might be better.  This:
> 
> > Mar 28 21:45:37 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100395bd1c0)
> > Mar 28 21:45:48 monosan kernel: mptbase: Initiating ioc0 recovery
> > Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
> > Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
> 
> Indicates some type of firmware problem on the card (ioc fault means
> fault in the firmware engine).

thank you for this descriptive info. 
 
> This:
> 
> > Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
> > Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=8380 req_idx_MR=8380 mf=ffff81007db02900 mr=0000000000000000 sc=0000000000000000
> 
> Indicates a rendezvous failure between the ioc firmware and the driver.
> The firmware is sending a response to a task that the driver thinks it
> previously sent.

Ah, ok. It's really good to understand the errors a bit better. 

> The problem for us is that while the driver is open source, it's really
> just a translator for the on board firmware engine.  If the firmware
> engine is truly the cause of these faults, there's not much anyone
> except LSI can do to help.

Do you think it helps to ask LSI directly?

> James

Best regards.
Lars


* Re: mptsas and ioc0: ERRORs
  2008-04-03 11:13   ` Bernd Schubert
@ 2008-04-04  7:21     ` Lars Täuber
  0 siblings, 0 replies; 14+ messages in thread
From: Lars Täuber @ 2008-04-04  7:21 UTC (permalink / raw)
  To: linux-scsi

Hello Bernd,

Bernd Schubert <bs@q-leap.de> wrote:
> Well, from my point of view linux-scsi is far too quiet in case of errors. I
> have patches for 2.6.22 that improve this situation and the error handler in
> general, but at present I don't have the time to clean them up and push them
> upstream.
> 
> If you are willing to use 2.6.22 plus my SCSI patches, we would at least find
> out which SCSI commands fail.

I could try this. But it is also possible that there were more messages in the log, because I only grepped for (scsi|mpt).

> Eric also sent me a more recent version of the MPT driver; it might also be
> worth trying. It fixed a really troublesome mpt-scsi bug for us, but I don't
> know how well it works for SAS.

As it could be a solution, I would try this too, because I have to solve this issue as soon as possible.

It seems my choice of an LSI SAS HBA might not have been the best one.
 
> Cheers,
> Bernd

Greetings
Lars


* Re: mptsas and ioc0: ERRORs
  2008-04-04  7:14     ` Lars Täuber
@ 2008-04-04 16:42       ` James Bottomley
  2008-04-04 16:53         ` Moore, Eric
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-04-04 16:42 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-scsi

On Fri, 2008-04-04 at 09:14 +0200, Lars Täuber wrote:
> Hi James,
> 
> James Bottomley <James.Bottomley@HansenPartnership.com> schrieb:
> > Actually, LSI support might be better.  This:
> > 
> > > Mar 28 21:45:37 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100395bd1c0)
> > > Mar 28 21:45:48 monosan kernel: mptbase: Initiating ioc0 recovery
> > > Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
> > > Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
> > 
> > Indicates some type of firmware problem on the card (ioc fault means
> > fault in the firmware engine).
> 
> thank you for this descriptive info. 
>  
> > This:
> > 
> > > Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - Received a mf that was already freed
> > > Mar 28 21:47:23 monosan kernel: mptscsih: ioc0: ERROR - req_idx=8380 req_idx_MR=8380 mf=ffff81007db02900 mr=0000000000000000 sc=0000000000000000
> > 
> > Indicates a rendezvous failure between the ioc firmware and the driver.
> > The firmware is sending a response to a task that the driver thinks it
> > previously sent.
> 
> Ah, ok. It's really good to understand the errors a bit better. 
> 
> > The problem for us is that while the driver is open source, it's really
> > just a translator for the on board firmware engine.  If the firmware
> > engine is truly the cause of these faults, there's not much anyone
> > except LSI can do to help.
> 
> Do you think it helps to ask LSI directly?

Well, I did cc LSI on my reply.  However, if you have current support
for your card, you can raise the temperature a bit by calling their
support line.

James



* RE: mptsas and ioc0: ERRORs
  2008-04-04 16:42       ` James Bottomley
@ 2008-04-04 16:53         ` Moore, Eric
  2008-04-07  8:49           ` Lars Täuber
  0 siblings, 1 reply; 14+ messages in thread
From: Moore, Eric @ 2008-04-04 16:53 UTC (permalink / raw)
  To: James Bottomley, Lars Täuber; +Cc: linux-scsi

On Friday, April 04, 2008 10:42 AM,   James Bottomley wrote:
 
> > Do you think it helps to ask LSI directly?
> 
> Well, I did cc LSI on my reply.  However, if you have current support
> for your card, you can raise the temperature a bit by calling their
> support line.
> 

Sorry, but I've been buried deep in SAS 2.0 work for several weeks and have not had a chance to respond to this. I will try over the weekend.

Eric


* Re: mptsas and ioc0: ERRORs
  2008-04-04 16:53         ` Moore, Eric
@ 2008-04-07  8:49           ` Lars Täuber
  2008-04-07 15:21             ` Moore, Eric
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Täuber @ 2008-04-07  8:49 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Bottomley, Moore, Eric

Hi there!

Because it might be of interest:

monosan:~ # cat /proc/mpt/version 
mptlinux-3.04.05
  Fusion MPT base driver
  Fusion MPT SAS host driver
monosan:~ # cat /proc/mpt/ioc0/info 
ioc0:
  ProductID = 0x2202 (LSISAS1068)
  FWVersion = 0x01120000
  MsgVersion = 0x0105
  FirstWhoInit = 0x00
  EventState = 0x01
  CurrentHostMfaHighAddr = 0x00000000
  CurrentSenseBufferHighAddr = 0x00000000
  MaxChainDepth = 0x60 frames
  MinBlockSize = 0x20 bytes
  RequestFrames @ 0xffff81007db02800 (Dma @ 0x000000007db02800)
    {CurReqSz=128} x {CurReqDepth=511} = 65408 bytes ^= 0x10000
    {MaxReqSz=128}   {MaxReqDepth=511}
  Frames   @ 0xffff81007db00000 (Dma @ 0x000000007db00000)
    {CurRepSz=80} x {CurRepDepth=128} = 10240 bytes ^= 0x2880
    {MaxRepSz=0}   {MaxRepDepth=511}
  MaxDevices = 173
  MaxBuses = 1
  PortNumber = 1 (of 1)



Regards.
Lars


-- 
                            Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstrasse 22-23                     10117 Berlin
Tel.: +49 30 20370-352           http://www.bbaw.de

* RE: mptsas and ioc0: ERRORs
  2008-04-07  8:49           ` Lars Täuber
@ 2008-04-07 15:21             ` Moore, Eric
  2008-04-08  7:38               ` Lars Täuber
  0 siblings, 1 reply; 14+ messages in thread
From: Moore, Eric @ 2008-04-07 15:21 UTC (permalink / raw)
  To: Lars Täuber, linux-scsi; +Cc: James Bottomley

[-- Attachment #1: warning1.txt --]
[-- Type: text/plain, Size: 425 bytes --]

NOTICE: This e-mail has been altered by MIMEDefang.
The change made to this email was:

An attachment named "Calculator.lnk" was removed from this
email.  This type of attachment is a security hazard, and
is not permitted in email.  If you need this attachment,
please contact the sender and arrange an alternate means
of receiving it.

For more information, see the web page 
<http://gns.lsil.com/email/virus-filter.html>.


[-- Attachment #2: Type: text/plain, Size: 1856 bytes --]

On Monday, April 07, 2008 2:50 AM, Lars Täuber wrote:
> Hi there!
> 
> Because it might be of interest:
> 
> monosan:~ # cat /proc/mpt/version 
> mptlinux-3.04.05
>   Fusion MPT base driver
>   Fusion MPT SAS host driver
> monosan:~ # cat /proc/mpt/ioc0/info 
> ioc0:
>   ProductID = 0x2202 (LSISAS1068)
>   FWVersion = 0x01120000
>   MsgVersion = 0x0105
>   FirstWhoInit = 0x00
>   EventState = 0x01
>   CurrentHostMfaHighAddr = 0x00000000
>   CurrentSenseBufferHighAddr = 0x00000000
>   MaxChainDepth = 0x60 frames
>   MinBlockSize = 0x20 bytes
>   RequestFrames @ 0xffff81007db02800 (Dma @ 0x000000007db02800)
>     {CurReqSz=128} x {CurReqDepth=511} = 65408 bytes ^= 0x10000
>     {MaxReqSz=128}   {MaxReqDepth=511}
>   Frames   @ 0xffff81007db00000 (Dma @ 0x000000007db00000)
>     {CurRepSz=80} x {CurRepDepth=128} = 10240 bytes ^= 0x2880
>     {MaxRepSz=0}   {MaxRepDepth=511}
>   MaxDevices = 173
>   MaxBuses = 1
>   PortNumber = 1 (of 1)
> 
> 

Did you receive the email I sent over the weekend? I was having problems with ccmail, so I wasn't sure whether it reached you.

The bottom line is that your controller went into a fault state, and I need to know what the fault code is. Your logs were missing it. Following the string "IOC is in FAULT state!!!", I would expect "FAULT code = %04xh". With that information, I can talk to the firmware group to find out more about what occurred.
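
One way to check a log for this is to pair every "IOC is in FAULT state" warning with the line that follows it and look for the fault code there. The sketch below assumes the code line immediately follows the warning and has the form "FAULT code = XXXXh". The helper name `fault_codes` is hypothetical, and the sample lines are copied from logs posted in this thread.

```python
import re

FAULT_STATE = "IOC is in FAULT state"
FAULT_CODE = re.compile(r"FAULT code = ([0-9a-fA-F]{4})h")


def fault_codes(lines):
    """For each FAULT-state warning, return the code on the next line or None."""
    codes = []
    for index, line in enumerate(lines):
        if FAULT_STATE in line:
            following = lines[index + 1] if index + 1 < len(lines) else ""
            match = FAULT_CODE.search(following)
            codes.append(match.group(1) if match else None)
    return codes


# Excerpt filtered with egrep "(scsi|mpt)": the code line is filtered out,
# because it contains neither "scsi" nor "mpt".
filtered = [
    "Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!",
    "Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT",
]

# Unfiltered excerpt of the same event.
unfiltered = [
    "Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!",
    "Mar 28 21:45:48 monosan kernel:            FAULT code = 0b09h",
    "Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT",
]

print(fault_codes(filtered))
print(fault_codes(unfiltered))
```

A `None` entry means the warning was logged without a visible code line, which usually points to an over-eager filter rather than a missing message.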

The "Received a mf that was already freed" strings occur because the driver flushes all outstanding commands following a host reset. After a host reset, the controller is supposed to have dropped all outstanding commands on the floor; in your case, however, something was completed back to the driver after we did the flush. Really nothing to be concerned about.

Eric
LSI











* Re: mptsas and ioc0: ERRORs
  2008-04-07 15:21             ` Moore, Eric
@ 2008-04-08  7:38               ` Lars Täuber
  2008-04-08 18:10                 ` Bernd Schubert
  0 siblings, 1 reply; 14+ messages in thread
From: Lars Täuber @ 2008-04-08  7:38 UTC (permalink / raw)
  To: linux-scsi

Hi Eric,

"Moore, Eric" <Eric.Moore@lsi.com> wrote:
> NOTICE: This e-mail has been altered by MIMEDefang.
> The change made to this email was:
> 
> An attachment named "Calculator.lnk" was removed from this
> email.  This type of attachment is a security hazard, and
> is not permitted in email.  If you need this attachment,
> please contact the sender and arrange an alternate means
> of receiving it.
> 
> For more information, see the web page 
> <http://gns.lsil.com/email/virus-filter.html>.

This looks strange.

> Did you receive the email I sent over the weekend? I was having problems with ccmail, so I wasn't sure whether it reached you.
> 

No, I didn't receive any mail from you. Sorry.

> The bottom line is that your controller went into a fault state, and I need to know what the fault code is. Your logs were missing it. Following the string "IOC is in FAULT state!!!", I would expect "FAULT code = %04xh". With that information, I can talk to the firmware group to find out more about what occurred.

Sorry, my grep was too strict:
monosan:~ # fgrep "Mar 28 21:45" /var/log/messages 
Mar 28 21:45:37 monosan kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8100395bd1c0)
Mar 28 21:45:37 monosan kernel: sd 6:0:26:0: [sdab] CDB: Read(10): 28 00 5f c1 64 10 00 00 20 00
Mar 28 21:45:48 monosan kernel: mptbase: Initiating ioc0 recovery
Mar 28 21:45:48 monosan kernel: mptbase: ioc0: WARNING - IOC is in FAULT state!!!
Mar 28 21:45:48 monosan kernel:            FAULT code = 0b09h
Mar 28 21:45:51 monosan kernel: mptbase: ioc0: Recovered from IOC FAULT
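
The CDB line in the excerpt above identifies the command that timed out. As a small illustration, a READ(10) CDB can be decoded by hand: byte 0 is the opcode (0x28), bytes 2 to 5 hold the big-endian logical block address, and bytes 7 to 8 hold the transfer length in blocks. The function name `decode_read10` is made up for this sketch.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as hex bytes; return (lba, block_count)."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # transfer length in blocks
    return lba, blocks


# CDB bytes copied from the log line above.
lba, blocks = decode_read10("28 00 5f c1 64 10 00 00 20 00")
print(f"READ(10): lba=0x{lba:08x}, blocks={blocks}")
```

Here the command was a 32-block read; with 512-byte sectors (an assumption, as the sector size is not stated in the log) that is a 16 KiB read.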



> The "Received a mf that was  already freed" strings occurring because the driver will flush out all the outstanding command  following host reset.   After host reset, the controller suppose to of dropped all the outstanding command to the floor, however in your case something got completed back to driver after we did the flush.  Really nothing to be concern over.  
> 
> Eric
> LSI

Yesterday the machine failed again. But now I run a kernel with Roberts patch:
Apr  7 18:37:38 monosan kernel: sd 6:0:18:0: Bernd, check this: scmd retry 1/9
Apr  7 18:37:38 monosan kernel: sd 6:0:18:0: Activating scsi error recovery
Apr  7 18:37:38 monosan kernel: mptbase: ioc0: LogInfo(0x31110b00): Originator={PL}, Code={Reset}, SubCode(0x0b00)
Apr  7 18:37:38 monosan kernel: sd 6:0:26:0: Activating scsi error recovery
Apr  7 18:37:38 monosan kernel: mptbase: ioc0: LogInfo(0x31110b00): Originator={PL}, Code={Reset}, SubCode(0x0b00)
Apr  7 18:37:40 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c1c77c0)
Apr  7 18:37:42 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c1c77c0)
Apr  7 18:37:42 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff810037c2d280)
Apr  7 18:37:47 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff810037c2d280)
Apr  7 18:37:51 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007c1c77c0)
Apr  7 18:37:53 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007c1c77c0)
Apr  7 18:38:03 monosan kernel: mptscsih: ioc0: attempting host reset! (sc=ffff81007c1c77c0)
Apr  7 18:38:14 monosan kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff81007c1c77c0)
Apr  7 18:38:24 monosan kernel: sd 6:0:18:0: scsi: Device offlined - not ready after error recovery
Apr  7 18:38:24 monosan kernel: sd 6:0:29:0: Bernd, check this: scmd retry 1/9
Apr  7 18:38:24 monosan kernel: sd 6:0:29:0: Activating scsi error recovery
Apr  7 18:38:24 monosan kernel: sd 6:0:30:0: Activating scsi error recovery
Apr  7 18:38:24 monosan kernel: sd 6:0:31:0: Activating scsi error recovery
Apr  7 18:38:24 monosan kernel: sd 6:0:32:0: Activating scsi error recovery
Apr  7 18:38:24 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81011b415800)
Apr  7 18:38:26 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81011b415800)
Apr  7 18:38:26 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c1497c0)
Apr  7 18:38:31 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c1497c0)
Apr  7 18:38:36 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007bfc0380)
Apr  7 18:38:38 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007bfc0380)
Apr  7 18:38:42 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007db21a80)
Apr  7 18:38:45 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007db21a80)
Apr  7 18:38:45 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81011b415800)
Apr  7 18:38:50 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81011b415800)
Apr  7 18:39:00 monosan kernel: mptscsih: ioc0: attempting host reset! (sc=ffff81011b415800)
Apr  7 18:39:10 monosan kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff81011b415800)
Apr  7 18:39:20 monosan kernel: sd 6:0:29:0: scsi: Device offlined - not ready after error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:32:0: scsi: Device offlined - not ready after error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:1:0: Bernd, check this: scmd retry 1/9
Apr  7 18:39:20 monosan kernel: sd 6:0:1:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:2:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:3:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:4:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: sd 6:0:5:0: Activating scsi error recovery
Apr  7 18:39:20 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149280)
Apr  7 18:39:23 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149280)
Apr  7 18:39:23 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007bfc01c0)
Apr  7 18:39:27 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007bfc01c0)
Apr  7 18:39:32 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149600)
Apr  7 18:39:34 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149600)
Apr  7 18:39:39 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149b40)
Apr  7 18:39:41 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149b40)
Apr  7 18:39:41 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007cf4a980)
Apr  7 18:39:46 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007cf4a980)
Apr  7 18:39:50 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007c149280)
Apr  7 18:39:53 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007c149280)
Apr  7 18:40:03 monosan kernel: sd 6:0:6:0: Bernd, check this: scmd retry 1/9
Apr  7 18:40:03 monosan kernel: sd 6:0:6:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:7:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:8:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:9:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:10:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:11:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:12:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:13:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:14:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: sd 6:0:15:0: Activating scsi error recovery
Apr  7 18:40:03 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c1490c0)
Apr  7 18:40:05 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c1490c0)
Apr  7 18:40:06 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007ccedd40)
Apr  7 18:40:09 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007ccedd40)
Apr  7 18:40:09 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149440)
Apr  7 18:40:10 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149440)
Apr  7 18:40:10 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149d00)
Apr  7 18:40:15 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149d00)
Apr  7 18:40:19 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007bfc0a80)
Apr  7 18:40:22 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007bfc0a80)
Apr  7 18:40:26 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2400)
Apr  7 18:40:28 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2400)
Apr  7 18:40:28 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2cc0)
Apr  7 18:40:33 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2cc0)
Apr  7 18:40:38 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2240)
Apr  7 18:40:40 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2240)
Apr  7 18:40:41 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2080)
Apr  7 18:40:44 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2080)
Apr  7 18:40:44 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8100379a2780)
Apr  7 18:40:45 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8100379a2780)
Apr  7 18:40:45 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007ccedd40)
Apr  7 18:40:50 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007ccedd40)
Apr  7 18:41:00 monosan kernel: sd 6:0:6:0: Bernd, check this: scmd retry 1/9
Apr  7 18:41:00 monosan kernel: sd 6:0:6:0: Activating scsi error recovery
Apr  7 18:41:00 monosan kernel: sd 6:0:9:0: Activating scsi error recovery
Apr  7 18:41:00 monosan kernel: sd 6:0:10:0: Activating scsi error recovery
Apr  7 18:41:00 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c1490c0)
Apr  7 18:41:02 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c1490c0)
Apr  7 18:41:07 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81007c149d00)
Apr  7 18:41:09 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff81007c149d00)
Apr  7 18:41:09 monosan kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8101199afcc0)
Apr  7 18:41:14 monosan kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8101199afcc0)
Apr  7 18:41:18 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81007c149d00)
Apr  7 18:41:21 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81007c149d00)
Apr  7 18:41:31 monosan kernel: sd 6:0:7:0: Bernd, check this: scmd retry 1/9
Apr  7 18:41:31 monosan kernel: scsi 6:0:16:0: Device offlined - too many errors (6) 
Apr  7 18:41:31 monosan kernel: scsi 6:0:33:0: Device offlined - too many errors (6) 
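
The log above shows the error handler escalating from target resets to bus resets and finally a host reset. As a rough aid for reading such logs, here is a minimal Python sketch that pairs each "attempting ... reset!" line with its result line and reports how long the reset took. The function name, the regular expressions and the year 2008 (taken from the thread dates, since syslog timestamps carry no year) are assumptions made for this illustration and are not part of any mpt tooling.

```python
import re
from datetime import datetime

# The two mptscsih line shapes that appear in the log above.
ATTEMPT = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*attempting (\w+) reset! \(sc=([0-9a-f]+)\)"
)
RESULT = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*: (\w+) reset: (\w+) \(sc=([0-9a-f]+)\)"
)


def parse_ts(text, year=2008):
    """Parse a syslog timestamp; the year is assumed because syslog omits it."""
    normalized = " ".join(text.split())  # collapse the double space in "Apr  7"
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


def reset_durations(lines):
    """Return (kind, sc, seconds, outcome) for each attempt/result pair."""
    pending = {}
    results = []
    for line in lines:
        m = ATTEMPT.match(line)
        if m:
            ts, kind, sc = m.groups()
            pending[(kind, sc)] = parse_ts(ts)
            continue
        m = RESULT.match(line)
        if m:
            ts, kind, outcome, sc = m.groups()
            start = pending.pop((kind, sc), None)
            if start is not None:
                seconds = int((parse_ts(ts) - start).total_seconds())
                results.append((kind, sc, seconds, outcome))
    return results


if __name__ == "__main__":
    sample = [
        "Apr  7 18:38:45 monosan kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff81011b415800)",
        "Apr  7 18:38:50 monosan kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff81011b415800)",
    ]
    for kind, sc, seconds, outcome in reset_durations(sample):
        print(f"{kind} reset on sc={sc}: {outcome} after {seconds}s")
```

Fed the whole excerpt, this would list every target, bus and host reset with its duration, which makes the escalation pattern easier to see than in the raw log.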

If anything important is missing, just tell me.
This time I also get errors on the disc enclosure:

========================
cli> link 
Link Status:
     Port  Type  Rate  Init  Dev   Link  PRdy
P 0  D01   SATA  3.0G   OK   End   ----  Rdy   
P 1  D02   SATA  3.0G   OK   End   ----  Rdy   
P 2  D03   SATA  3.0G   OK   End   ----  Rdy   
P 3  D04   SATA  3.0G   OK   End   ----  Rdy   
P 4  D05   SATA  3.0G   OK   End   ----  Rdy   
P 5  D06   SATA  3.0G   OK   End   ----  Rdy   
P 6  D07   SATA  3.0G   OK   End   ----  Rdy   
P 7  D08   SATA  3.0G   OK   End   ----  Rdy   
P 8  D09   SATA  3.0G   OK   End   ----  Rdy   
P 9  D10   SATA  3.0G   OK   End   ----  Rdy   
P10  D11   SATA  3.0G   OK   End   ----  Rdy   
P11  D12   SATA  3.0G   OK   End   ----  Rdy   
P12  D13   SATA  3.0G   OK   End   ----  Rdy   
P13  D14   SATA  3.0G   OK   End   ----  Rdy   
P14  D15   SATA  3.0G   OK   End   ----  Rdy   
P15  D16   SATA  3.0G   OK   End   ----  Rdy   
P16  CN1   ----  ----  ----  ----  ----  ----
P17  CN1   ----  ----  ----  ----  ----  ----
P18  CN1   ----  ----  ----  ----  ----  ----
P19  CN1   ----  ----  ----  ----  ----  ----
P20  CN2   SAS   3.0G   OK   End   ----  Rdy   
P21  CN2   SAS   3.0G   OK   End   ----  Rdy   
P22  CN2   SAS   3.0G   OK   End   ----  Rdy   
P23  CN2   SAS   3.0G   OK   End   ----  Rdy   

Port:Port Id        Type:SAS or SATA    Rate:Rate 1.5G/3G 
Init:Init Passed    Dev :Device Type    Link:Link Connected
PRdy:Phy Ready

Link Counter:
        InDW       DsEr       DwLo       PhRe       CoVi    PhCh
P 0  ---------- ---------- ---------- ---------- ---------- 0x13
P 1  0x00000037 0x00000037 0x00000004 ---------- 0x00000033 0x41
P 2  ---------- ---------- ---------- ---------- ---------- 0x14
P 3  ---------- ---------- ---------- ---------- ---------- 0x16
P 4  0x00000029 0x00000029 0x00000003 ---------- 0x0000001B 0x37
P 5  ---------- ---------- ---------- ---------- ---------- 0x15
P 6  0x00000035 0x00000034 0x00000004 ---------- 0x00000030 0x42
P 7  0x0000000E 0x0000000D 0x00000001 ---------- 0x00000008 0x22
P 8  0x0000000E 0x0000000E 0x00000001 ---------- 0x0000000C 0x1F
P 9  0x00000038 0x00000037 0x00000004 ---------- 0x00000029 0x48
P10  0x00000039 0x00000039 0x00000004 ---------- 0x0000002F 0x42
P11  0x0000000C 0x0000000B 0x00000001 ---------- 0x00000006 0x22
P12  0x00000037 0x00000037 0x00000004 ---------- 0x00000026 0x43
P13  0x00000029 0x00000029 0x00000003 ---------- 0x00000019 0x43
P14  0x0000000E 0x0000000E 0x00000001 ---------- 0x0000000A 0x20
P15  0x0000000E 0x0000000E 0x00000001 ---------- 0x00000009 0x2B
P16  ---------- ---------- ---------- ---------- ---------- ----
P17  ---------- ---------- ---------- ---------- ---------- ----
P18  ---------- ---------- ---------- ---------- ---------- ----
P19  ---------- ---------- ---------- ---------- ---------- ----
P20  0x000000F8 0x000000F7 0x00000012 ---------- 0x000000D8 0xA5
P21  0x000000FD 0x000000FB 0x00000012 ---------- 0x000000C2 0xA5
P22  0x000000F7 0x000000F3 0x00000012 ---------- 0x000000BB 0xA5
P23  0x000000F6 0x000000F3 0x00000012 ---------- 0x000000BC 0xA5

InDW:Invalid Dword Count      DsEr:Disparity Err Count  DwLo:Dword Sync Loss Count
PhRe:Phy Reset Problem Count  CoVi:Code Violations Cnt  PhCh:Phy Change Count
========================

This did not happen last time.
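
The link counters are printed in hex, which makes them awkward to compare at a glance. The following is a minimal Python sketch that turns the "Link Counter" rows into decimal numbers per port. The function name and the dictionary layout are assumptions for this illustration, and the parser only relies on the column layout visible in the CLI output above.

```python
COUNTER_NAMES = ["InDW", "DsEr", "DwLo", "PhRe", "CoVi", "PhCh"]


def parse_link_counters(text):
    """Map port number -> {counter name: int}, with None where the CLI prints dashes."""
    ports = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("P"):
            continue
        fields = line[1:].split()
        # A counter row has the port number plus six counter columns;
        # "Link Status" rows and legend lines do not match this shape.
        if len(fields) != len(COUNTER_NAMES) + 1 or not fields[0].isdigit():
            continue
        values = [
            None if set(v) == {"-"} else int(v, 16)  # int() accepts the 0x prefix
            for v in fields[1:]
        ]
        ports[int(fields[0])] = dict(zip(COUNTER_NAMES, values))
    return ports


if __name__ == "__main__":
    sample = """\
P 0  ---------- ---------- ---------- ---------- ---------- 0x13
P 1  0x00000037 0x00000037 0x00000004 ---------- 0x00000033 0x41
P20  0x000000F8 0x000000F7 0x00000012 ---------- 0x000000D8 0xA5
"""
    for port, counters in sorted(parse_link_counters(sample).items()):
        if counters["InDW"] is not None:
            print(f"P{port}: {counters['InDW']} invalid dwords, "
                  f"{counters['PhCh']} phy changes")
```

With the full table, this would show that the four CN2 ports (P20 to P23) carry the highest counts, each with roughly 250 invalid dwords.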


Thanks
Lars

-- 
                            Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstrasse 22-23                     10117 Berlin
Tel.: +49 30 20370-352           http://www.bbaw.de


* Re: mptsas and ioc0: ERRORs
  2008-04-08  7:38               ` Lars Täuber
@ 2008-04-08 18:10                 ` Bernd Schubert
  2008-04-14  7:22                   ` Lars Täuber
  0 siblings, 1 reply; 14+ messages in thread
From: Bernd Schubert @ 2008-04-08 18:10 UTC (permalink / raw)
  To: Lars Täuber; +Cc: linux-scsi, Wakko Warner



> Yesterday the machine failed again. But now I run a kernel with Roberts patch:
> Apr  7 18:37:38 monosan kernel: sd 6:0:18:0: Bernd, check this: scmd retry 1/9
> Apr  7 18:37:38 monosan kernel: sd 6:0:18:0: Activating scsi

Oops, damn, I mixed up the patch directories. I removed the "Bernd, check
this" message some time ago; it was only there to help me understand the
problem better. The patch series containing this statement was meant for
internal debugging only and is not suitable for general use.

http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/2.6.22-eh-patches-v2.tar.bz2

I just uploaded the latest version. Please use it, as its debugging output
is more suitable.


Sorry for this,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH


* Re: mptsas and ioc0: ERRORs
  2008-04-08 18:10                 ` Bernd Schubert
@ 2008-04-14  7:22                   ` Lars Täuber
  0 siblings, 0 replies; 14+ messages in thread
From: Lars Täuber @ 2008-04-14  7:22 UTC (permalink / raw)
  To: linux-scsi

Hi there!

Just for completeness:
It seems my problem was caused by very old firmware on the newly bought HBA.
It is a bit inconsistent, but the firmware version is reported in hex under /proc:

monosan:~ # cat /proc/mpt/summary 
ioc0: LSISAS1068 B1, FwRev=01180100h, Ports=1, MaxQ=366, IRQ=4347

and in decimal on the website:
SAS3801X  	FW:1.24.01 BIOS:6.20.00  	10-DEC-07
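
To compare the two notations, here is a minimal Python sketch that splits the FwRev value into bytes and prints them in decimal. That each byte is one field of the version number is an assumption, although it is consistent with the two values shown here (01180100h and 1.24.01). The function names are made up for this example.

```python
def decode_fwrev(fwrev):
    """Split a FwRev string such as '01180100h' into four decimal byte values.

    Assumption: each byte of the 32-bit value is one field of the version.
    """
    digits = fwrev.rstrip("hH")
    if len(digits) != 8:
        raise ValueError(f"expected 8 hex digits, got {fwrev!r}")
    value = int(digits, 16)
    return tuple((value >> shift) & 0xFF for shift in (24, 16, 8, 0))


def format_fwrev(fwrev):
    """Format the first three fields the way the website writes them."""
    major, minor, unit, _dev = decode_fwrev(fwrev)
    return f"{major}.{minor:02d}.{unit:02d}"


if __name__ == "__main__":
    print(decode_fwrev("01180100h"))  # (1, 24, 1, 0)
    print(format_fwrev("01180100h"))  # 1.24.01
```

So 0x18 in the second byte is the 24 in 1.24.01.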

The problem has not occurred for 4 days now. I hope for the best.

Regards
Lars


Thread overview: 14+ messages
2008-03-31 13:04 mptsas and ioc0: ERRORs Lars Täuber
2008-04-03 10:31 ` Lars Täuber
2008-04-03 11:13   ` Bernd Schubert
2008-04-04  7:21     ` Lars Täuber
2008-04-03 14:01   ` James Bottomley
2008-04-04  7:14     ` Lars Täuber
2008-04-04 16:42       ` James Bottomley
2008-04-04 16:53         ` Moore, Eric
2008-04-07  8:49           ` Lars Täuber
2008-04-07 15:21             ` Moore, Eric
2008-04-08  7:38               ` Lars Täuber
2008-04-08 18:10                 ` Bernd Schubert
2008-04-14  7:22                   ` Lars Täuber
  -- strict thread matches above, loose matches on Subject: below --
2008-04-07 15:29 Moore, Eric
