public inbox for linux-scsi@vger.kernel.org
* [PATCH 0/5] mpt fusion error handler patches
@ 2008-09-12 18:57 Bernd Schubert
From: Bernd Schubert @ 2008-09-12 18:57 UTC
  To: Linux SCSI Mailing List; +Cc: Eric Moore, Sathya Prakash

Hello,

I'm going to submit several error handler patches for the MPT fusion
driver. Their main purpose is to fix errors that occur on the second
port of dual-port 53C1030-based HBAs.
As I complained on this list some time ago, a device failure on one port
of an LSI22320R HBA also causes failures of innocent devices on the
other port of the same HBA. To debug this, Eric Moore sent me a
fusion-tip version of the driver, which we have been using ever since.
However, that version has issues with SAS HBAs and probably won't work
with recent kernel versions either. So I spent quite some time figuring
out why the fusion-tip version (4.x) of the driver doesn't have the issue.

Below are some examples of these issues. Errors on one of the attached
SCSI devices were simulated by using lsiutil to target one of the attached
devices on one of the ports (5 0 4 0).

Unpatched 2.6.26 + a few scsi diagnostics and error handler patches:

[  224.819697] sd 5:0:4:0: last recovery: 4294911483, now: 4294948403
[  224.826142] sd 5:0:4:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
[  224.831676] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 0c 27 2e 98 00 00 04 00 00 00
[  224.842803] sd 5:0:4:0: Activating scsi error recovery (1)
[  224.857824] sd 5:0:4:0: trying to abort command
[  224.865697] mptscsih: ioc1: attempting task abort! (sc=ffff8100f8f10000)
[  224.870572] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 0c 27 2e 98 00 00 04 00 00 00
[  227.047968] mptbase: ioc1: Initiating recovery
[  229.481849] sd 5:0:4:0: mptscsih: ioc1: completing cmds: fw_channel 0, fw_id 4, sc=ffff8100f8fbb180, mf = ffff8100
[...]
[  364.322013] mptscsih: ioc1: bus reset: SUCCESS (sc=ffff8100f8f11b80)
[  371.924342] sd 4:0:2:0: scmd retry 6/6
[  371.928364] sd 4:0:2:0: last recovery: 0, now: 4294985148
[  371.932924] sd 4:0:2:0: [sda] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
[  371.932924] sd 4:0:2:0: [sda] CDB: Write(16): 8a 00 00 00 00 01 31 8b 4a 4e 00 00 00 39 00 00
[  371.932924] sd 4:0:2:0: Activating scsi error recovery (1)
[  371.960382] sd 4:0:2:0: Sending BDR 0xffff81007eaf2538
[  371.984936] sd 4:0:2:0: trying device reset
[  371.989426] mptscsih: ioc0: attempting target reset! (sc=ffff81007eb7c780)

As you can see, target 4 0 2 0, which is on ioc0, suddenly fails as well. In the end:

[  398.596119] sd 4:0:2:0: [sda] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
[  398.605291] end_request: I/O error, dev sda, sector 5126179406
[  398.612360] end_request: I/O error, dev sda, sector 5126179406
[  398.617818]  target4:0:2: Beginning Domain Validation

So the innocent device sda (which is really another device) failed.
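
Spotting this kind of collateral failure in a long kernel log can be automated. The following is only a minimal sketch, assuming the log format shown above; the function names and the SAMPLE list are hypothetical and not part of the patches. It collects every H:C:T:L address that logged "Activating scsi error recovery" and reports those that are not the device the fault was injected on.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [  224.842803] sd 5:0:4:0: Activating scsi error recovery (1)
RECOVERY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+sd (?P<hctl>\d+:\d+:\d+:\d+): "
    r"Activating scsi error recovery"
)


def devices_in_recovery(lines):
    """Map each H:C:T:L address to the timestamps at which recovery started."""
    hits = defaultdict(list)
    for line in lines:
        m = RECOVERY_RE.search(line)
        if m:
            hits[m.group("hctl")].append(float(m.group("ts")))
    return dict(hits)


def collateral_devices(lines, faulted_hctl):
    """Return addresses that entered recovery but are not the faulted device."""
    return sorted(h for h in devices_in_recovery(lines) if h != faulted_hctl)


# Hypothetical sample: three lines copied from the unpatched log above.
SAMPLE = [
    "[  224.842803] sd 5:0:4:0: Activating scsi error recovery (1)",
    "[  224.857824] sd 5:0:4:0: trying to abort command",
    "[  371.932924] sd 4:0:2:0: Activating scsi error recovery (1)",
]

print(collateral_devices(SAMPLE, "5:0:4:0"))
```

For the three sample lines this reports 4:0:2:0, the device on the other port.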

Now the same with the patches applied, but with the soft reset handler deactivated:

[  912.861708] sd 5:0:4:0: last recovery: 4295082734, now: 4295120387
[  912.868130] sd 5:0:4:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_
[  912.873757] sd 5:0:4:0: [sdc] CDB: Write(10): 2a 00 73 11 33 08 00 04 00 00
[  912.873757] sd 5:0:4:0: Activating scsi error recovery (2)
[  912.889492] sd 5:0:4:0: trying to abort command
[  912.894118] mptscsih: ioc1: attempting task abort! (sc=ffff8100e361d180)
[  912.900951] sd 5:0:4:0: [sdc] CDB: Write(10): 2a 00 73 11 33 08 00 04 00 00
[  913.025771] mptscsih: ioc1: task abort: FAILED (sc=ffff8100e361d180)
[  913.032269] sd 5:0:4:0: Sending BDR 0xffff8100f99e1428
[  913.040264] sd 5:0:4:0: trying device reset
[  913.044597] mptscsih: ioc1: attempting target reset! (sc=ffff8100e361d180)
[  913.049955] sd 5:0:4:0: [sdc] CDB: Write(10): 2a 00 73 11 33 08 00 04 00 00
[  913.177284] mptscsih: ioc1: target reset: FAILED (sc=ffff8100e361d180)
[  913.181946] Sending BRST chan: 0
[  913.185945] sd 5:0:4:0: trying bus reset
[  913.189974] mptscsih: ioc1: attempting bus reset! (sc=ffff8100e361d180)
[  913.197310] sd 5:0:4:0: [sdc] CDB: Write(10): 2a 00 73 11 33 08 00 04 00 00
[  913.325079] mptscsih: ioc1: bus reset: FAILED (sc=ffff8100e361d180)
[  913.329668] sd 5:0:4:0: trying host reset
[  913.333864] mptscsih: ioc1: attempting host reset! (sc=ffff8100e361d180)
[  913.341832] mptscsih: ioc1: Skipping hard reset in order to prevent failures on ioc
[  913.349821] mptscsih: ioc1: host reset: FAILED (sc=ffff8100e361d180)
[  913.356704] sd 5:0:4:0: Device offlined - not ready after error recovery
[  913.363199] sd 5:0:4:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK

=> The device was not recovered, but at least 4 0 2 0 didn't fail :)
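
The log above walks through the usual escalation: task abort, target reset, bus reset, host reset. As a rough sketch, assuming the "mptscsih: iocN: <step>: SUCCESS|FAILED" result lines keep the format shown here, the outcome of each step can be extracted like this. The helper names and the SAMPLE list are hypothetical.

```python
import re

# Matches result lines such as:
# [  913.025771] mptscsih: ioc1: task abort: FAILED (sc=ffff8100e361d180)
# The "attempting ..." lines do not match, because the step name must
# directly follow the "iocN: " prefix.
RESULT_RE = re.compile(
    r"mptscsih: (?P<ioc>ioc\d+): "
    r"(?P<step>task abort|target reset|bus reset|host reset): "
    r"(?P<result>SUCCESS|FAILED)"
)


def escalation_summary(lines):
    """Return (ioc, step, result) tuples in the order they were logged."""
    summary = []
    for line in lines:
        m = RESULT_RE.search(line)
        if m:
            summary.append((m.group("ioc"), m.group("step"), m.group("result")))
    return summary


def recovered(summary):
    """True if any escalation step reported SUCCESS."""
    return any(result == "SUCCESS" for _, _, result in summary)


# Hypothetical sample: lines copied from the log above.
SAMPLE = [
    "[  912.894118] mptscsih: ioc1: attempting task abort! (sc=ffff8100e361d180)",
    "[  913.025771] mptscsih: ioc1: task abort: FAILED (sc=ffff8100e361d180)",
    "[  913.177284] mptscsih: ioc1: target reset: FAILED (sc=ffff8100e361d180)",
    "[  913.325079] mptscsih: ioc1: bus reset: FAILED (sc=ffff8100e361d180)",
    "[  913.349821] mptscsih: ioc1: host reset: FAILED (sc=ffff8100e361d180)",
]

for ioc, step, result in escalation_summary(SAMPLE):
    print(f"{ioc}: {step}: {result}")
print("recovered:", recovered(escalation_summary(SAMPLE)))
```

All four steps fail on ioc1 in this run, and no result line for ioc0 appears, which matches the observation that 4 0 2 0 was left alone.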

Now with all patches applied:

[  214.903699] sd 5:0:4:0: last recovery: 0, now: 4294945953
[  214.910652] sd 5:0:4:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_OK,SUGGEST_OK
[  214.918652] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 31 8b 9c e7 00 00 00 39 00 00
[  214.918652] sd 5:0:4:0: Activating scsi error recovery (1)
[  214.934655] sd 5:0:4:0: trying to abort command
[  214.939581] mptscsih: ioc1: attempting task abort! (sc=ffff8100f9be0c80)
[  214.947581] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 31 8b 9c e7 00 00 00 39 00 00
[  215.077430] mptscsih: ioc1: task abort: FAILED (sc=ffff8100f9be0c80)
[  215.083645] sd 5:0:4:0: Sending BDR 0xffff81007eb51428
[  215.090298] sd 5:0:4:0: trying device reset
[  215.094810] mptscsih: ioc1: attempting target reset! (sc=ffff8100f9be0c80)
[  215.101917] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 31 8b 9c e7 00 00 00 39 00 00
[  215.229659] mptscsih: ioc1: target reset: FAILED (sc=ffff8100f9be0c80)
[  215.236367] Sending BRST chan: 0
[  215.240173] sd 5:0:4:0: trying bus reset
[  215.244313] mptscsih: ioc1: attempting bus reset! (sc=ffff8100f9be0c80)
[  215.251731] sd 5:0:4:0: [sdc] CDB: Write(16): 8a 00 00 00 00 01 31 8b 9c e7 00 00 00 39 00 00
[  215.382449] mptscsih: ioc1: bus reset: FAILED (sc=ffff8100f9be0c80)
[  215.388946] sd 5:0:4:0: trying host reset
[  215.393162] mptscsih: ioc1: attempting host reset! (sc=ffff8100f9be0c80)
[  215.400489] sd 5:0:4:0: mptscsih: ioc1: completing cmds: fw_channel 0, fw_id 4, sc=ffff8100f9be0c80, mf = ffff8105
[  217.317914] mptbase: ioc1: SoftResetHandler: completed (1 seconds): SUCCESS
[  217.324924] mptscsih: ioc1: host reset: SUCCESS (sc=ffff8100f9be0c80)
[  227.546452]  target5:0:4: Beginning Domain Validation
[  227.578775]  target5:0:4: Ending Domain Validation
[  227.584099]  target5:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS PCOMP (6.25 ns, offset 127)
[  227.596959]  target5:0:5: Beginning Domain Validation
[  227.651196]  target5:0:5: Ending Domain Validation
[  227.656977]  target5:0:5: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS PCOMP (6.25 ns, offset 127)
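
To put a number on how long recovery takes with all patches applied, the kernel timestamps can be subtracted. This is only an illustrative sketch under the assumption that every line starts with a "[ seconds.micros]" timestamp as above; the function names and the SAMPLE list are hypothetical.

```python
import re

# Leading kernel timestamp, e.g. "[  214.918652]"
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamp(line):
    """Return the leading kernel timestamp in seconds, or None if absent."""
    m = TS_RE.match(line)
    return float(m.group(1)) if m else None


def recovery_duration(lines):
    """Seconds from the first 'Activating scsi error recovery' line to the
    first 'host reset: SUCCESS' line, or None if either is missing."""
    start = end = None
    for line in lines:
        if start is None and "Activating scsi error recovery" in line:
            start = timestamp(line)
        elif start is not None and end is None and "host reset: SUCCESS" in line:
            end = timestamp(line)
    if start is None or end is None:
        return None
    return end - start


# Hypothetical sample: lines copied from the log above.
SAMPLE = [
    "[  214.918652] sd 5:0:4:0: Activating scsi error recovery (1)",
    "[  215.077430] mptscsih: ioc1: task abort: FAILED (sc=ffff8100f9be0c80)",
    "[  217.324924] mptscsih: ioc1: host reset: SUCCESS (sc=ffff8100f9be0c80)",
]

duration = recovery_duration(SAMPLE)
print(f"recovery took {duration:.3f} s")
```

For the sample lines this works out to roughly 2.4 seconds between the start of error recovery and the successful host reset; Domain Validation follows about ten seconds later in the log.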


-- 
Bernd Schubert
Q-Leap Networks GmbH


Thread overview: 9+ messages
2008-09-12 18:57 [PATCH 0/5] mpt fusion error handler patches Bernd Schubert
2008-09-12 18:59 ` [PATCH 1/5] scsi abort Bernd Schubert
2008-09-12 19:00 ` [PATCH 2/5] fusion reset handler Bernd Schubert
2008-09-12 19:01 ` [PATCH 3/5] fusion remove the TMHandler Bernd Schubert
2008-09-12 19:02 ` [PATCH 4/5] fusion prevent DV deadlock Bernd Schubert
2008-09-13 12:24   ` Bernd Schubert
2008-09-12 19:03 ` [PATCH 5/5] fusion disable scsi hard resets Bernd Schubert
2008-09-13  4:32 ` [PATCH 0/5] mpt fusion error handler patches Mr. James W. Laferriere
2008-09-13 12:25 ` Bernd Schubert
