From: Mike Anderson <andmike@linux.vnet.ibm.com>
To: Paul Smith <paul@mad-scientist.net>
Cc: linux-scsi@vger.kernel.org,
James Bottomley <James.Bottomley@HansenPartnership.com>,
Mike Christie <michaelc@cs.wisc.edu>
Subject: Re: [2.6.27.25] Hang in SCSI sync cache when a disk is removed--?
Date: Mon, 6 Jul 2009 23:25:44 -0700 [thread overview]
Message-ID: <20090707062543.GA2459@linux.vnet.ibm.com> (raw)
In-Reply-To: <1246903453.9022.7246.camel@psmith-ubeta.netezza.com>
Paul Smith <paul@mad-scientist.net> wrote:
>
I was expecting a little more output from the error handler thread, but
the log does show a few things.
It would be good if in the failing case you could provide a sysrq "t"
output so I could understand where the reset handler is waiting.
It appears there are a few things going on.
1.) The dm deactivate calling blk_abort_queue is leading to error handler
activation. Similar to a previously described issue.
http://permalink.gmane.org/gmane.linux.kernel.device-mapper.devel/8543
- This kernel does not have DID_TRANSPORT_DISRUPTED so that
avoidance method cannot be used.
2.) The task aborts are completing, but the tur is most likely being
failed with a response of DID_BUS_BUSY leading to continued recovery.
3.) We appear to be inside mpt_HardResetHandler, but need more info to
understand where in the call chain.
T
> kernel log output:
> ------------------
> Jul 6 13:31:18 user.info kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:18 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:27 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:27 user.info kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:35 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:35 user.info kernel: sd 3:0:1:0: [sdn] Result: hostbyte=0x02 driverbyte=0x00
> Jul 6 13:31:35 user.warn kernel: device-mapper: multipath: Failing path 8:208.
> Jul 6 13:31:35 daemon.notice multipathd: 8:208: mark as failed
> Jul 6 13:31:35 daemon.notice multipathd: encl1Slot2: remaining active paths: 1
> Jul 6 13:31:43 user.info kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:43 user.info kernel: sd 2:0:1:0: [sdb] <6>mptscsih: ioc0: attempting task abort! (sc=ffff8804623e4b40)
> Jul 6 13:31:43 user.info kernel: sd 2:0:1:0: [sdb] CDB: cdb[0]=0x12: 12 01 80 00 fe 00
> Jul 6 13:31:43 user.info kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff8804623e4b40)
> Jul 6 13:31:43 user.warn kernel: Result: hostbyte=0x02 driverbyte=0x00
> Jul 6 13:31:43 user.warn kernel: device-mapper: multipath: Failing path 8:16.
> Jul 6 13:31:43 daemon.notice multipathd: 8:16: mark as failed
> Jul 6 13:31:43 daemon.notice multipathd: encl1Slot2: remaining active paths: 0
> Jul 6 13:31:51 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:51 user.info kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:51 user.info kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880463a6b680)
> Jul 6 13:31:51 user.info kernel: sd 2:0:1:0: [sdb] CDB:
> Jul 6 13:31:51 user.info kernel: cdb[0]=0x12
> Jul 6 13:31:51 user.info kernel: : 12
> Jul 6 13:31:51 user.info kernel: 01
> Jul 6 13:31:51 user.info kernel: 80 00
> Jul 6 13:31:51 user.info kernel: fe 00
> Jul 6 13:31:51 user.info kernel:
> Jul 6 13:31:51 user.info kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff880463a6b680)
> Jul 6 13:31:59 daemon.notice multipathd: sdn: readsector0 checker reports path is down
> Jul 6 13:31:59 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:59 user.info kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:31:59 user.info kernel: mptscsih: ioc0: attempting target reset! (sc=ffff8804623e4b40)
> Jul 6 13:31:59 user.info kernel: sd 2:0:1:0: [sdb] CDB: cdb[0]=0x12
> Jul 6 13:31:59 user.info kernel: : 12
> Jul 6 13:31:59 user.info kernel: 01 80
> Jul 6 13:31:59 user.info kernel: 00 fe
> Jul 6 13:31:59 user.info kernel: 00
> Jul 6 13:31:59 user.info kernel:
> Jul 6 13:31:59 user.info kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff8804623e4b40)
> Jul 6 13:32:07 user.info kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:32:07 user.info kernel: mptscsih: ioc0: attempting bus reset! (sc=ffff8804623e4b40)
> Jul 6 13:32:07 user.info kernel: sd 2:0:1:0: [sdb] CDB:
> Jul 6 13:32:07 user.info kernel: cdb[0]=0x12
> Jul 6 13:32:07 user.info kernel: : 12
> Jul 6 13:32:07 user.info kernel: 01
> Jul 6 13:32:07 user.info kernel: 80 00
> Jul 6 13:32:07 user.info kernel: fe
> Jul 6 13:32:07 user.info kernel: 00
> Jul 6 13:32:11 user.info kernel: mptscsih: ioc0: bus reset: SUCCESS (sc=ffff8804623e4b40)
> Jul 6 13:32:23 user.info kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:32:31 user.info kernel: mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:32:31 user.info kernel: mptscsih: ioc0: attempting host reset! (sc=ffff8804623e4b40)
> Jul 6 13:32:42 user.info kernel: mptscsih: ioc0: host reset: SUCCESS (sc=ffff8804623e4b40)
> Jul 6 13:32:43 user.info kernel: mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
> Jul 6 13:32:43 user.info kernel: mptsas: ioc0: removing ssp device: fw_channel 0, fw_id 27, phy 1, sas_addr 0x5000c5000d298821
> Jul 6 13:32:43 user.debug kernel: phy-2:2:79: mptsas: ioc0: delete phy 1, phy-obj (0xffff88046a5e6800)
> Jul 6 13:32:43 user.debug kernel: port-2:2:1: mptsas: ioc0: delete port 1, sas_addr (0x5000c5000d298821)
> Jul 6 13:32:43 user.notice kernel: sd 2:0:1:0: [sdb] Synchronizing SCSI cache
> Jul 6 13:32:52 user.info kernel: sd 2:0:1:0: Device offlined - not ready after error recovery
> Jul 6 13:32:53 user.info kernel: sd 2:0:1:0: [sdb] md: super_written gets error=-5, uptodate=0
> Jul 6 13:32:53 user.alert kernel: raid1: Disk failure on dm-13, disabling device.
> Jul 6 13:32:53 user.alert kernel: raid1: Operation continuing on 1 devices.
> Jul 6 13:32:53 daemon.notice multipathd: sdb: readsector0 checker reports path is down
> Jul 6 13:32:53 user.warn kernel: Result: hostbyte=0x01 driverbyte=0x00
> Jul 6 13:32:53 user.info kernel: mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
> Jul 6 13:32:53 user.warn kernel: RAID1 conf printout:
> Jul 6 13:32:53 user.warn kernel: --- wd:1 rd:2
> Jul 6 13:32:53 user.warn kernel: disk 0, wo:0, o:1, dev:dm-6
> Jul 6 13:32:53 user.warn kernel: disk 1, wo:1, o:0, dev:dm-13
> Jul 6 13:32:53 user.warn kernel: RAID1 conf printout:
> Jul 6 13:32:53 user.warn kernel: --- wd:1 rd:2
> Jul 6 13:32:53 user.warn kernel: disk 0, wo:0, o:1, dev:dm-6
> Jul 6 13:32:55 daemon.notice multipathd: sdn: readsector0 checker reports path is down
> Jul 6 13:32:55 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:33:02 user.info kernel: mptscsih: ioc1: attempting task abort! (sc=ffff880461f6a280)
> Jul 6 13:33:02 user.info kernel: sd 3:0:1:0: CDB:
> Jul 6 13:33:02 user.info kernel: cdb[0]=0x12
> Jul 6 13:33:02 user.info kernel: :
> Jul 6 13:33:02 user.info kernel: 12
> Jul 6 13:33:02 user.info kernel: 00 00
> Jul 6 13:33:02 user.info kernel: 00 24
> Jul 6 13:33:02 user.info kernel: 00
> Jul 6 13:33:02 user.info kernel:
> Jul 6 13:33:02 user.info kernel: mptbase: ioc1: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
> Jul 6 13:33:02 user.info kernel: mptscsih: ioc1: task abort: SUCCESS (sc=ffff880461f6a280)
> Jul 6 13:33:03 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:33:03 user.info kernel: mptscsih: ioc1: attempting task abort! (sc=ffff88046c04a000)
> Jul 6 13:33:03 user.info kernel: sd 3:0:1:0: [sdn] CDB:
> Jul 6 13:33:03 user.info kernel: cdb[0]=0x12
> Jul 6 13:33:03 user.info kernel: : 12
> Jul 6 13:33:03 user.info kernel: 01
> Jul 6 13:33:03 user.info kernel: 80 00
> Jul 6 13:33:03 user.info kernel: fe
> Jul 6 13:33:03 user.info kernel: 00
> Jul 6 13:33:03 user.info kernel:
> Jul 6 13:33:03 user.info kernel: mptscsih: ioc1: task abort: SUCCESS (sc=ffff88046c04a000)
> Jul 6 13:33:04 user.err kernel: scsi 2:0:1:0: rejecting I/O to dead device
> Jul 6 13:33:04 daemon.notice multipathd: sdb: readsector0 checker reports path is down
> Jul 6 13:33:11 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:33:11 user.info kernel: mptscsih: ioc1: attempting task abort! (sc=ffff880463a6adc0)
> Jul 6 13:33:11 user.info kernel: sd 3:0:1:0: [sdn] CDB:
> Jul 6 13:33:11 user.info kernel: cdb[0]=0x12
> Jul 6 13:33:11 user.info kernel: : 12
> Jul 6 13:33:11 user.info kernel: 01
> Jul 6 13:33:11 user.info kernel: 80 00
> Jul 6 13:33:11 user.info kernel: fe
> Jul 6 13:33:11 user.info kernel: 00
> Jul 6 13:33:11 user.info kernel: mptscsih: ioc1: task abort: SUCCESS (sc=ffff880463a6adc0)
> Jul 6 13:33:19 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:33:19 user.info kernel: mptscsih: ioc1: attempting target reset! (sc=ffff880461f6a280)
> Jul 6 13:33:19 user.info kernel: sd 3:0:1:0: CDB:
> Jul 6 13:33:19 user.info kernel: cdb[0]=0x12
> Jul 6 13:33:19 user.info kernel: : 12
> Jul 6 13:33:19 user.info kernel: 00 00
> Jul 6 13:33:19 user.info kernel: 00 24
> Jul 6 13:33:19 user.info kernel: 00
> Jul 6 13:33:19 user.info kernel:
> Jul 6 13:33:19 user.info kernel: mptscsih: ioc1: target reset: SUCCESS (sc=ffff880461f6a280)
> Jul 6 13:33:27 user.info kernel: mptbase: ioc1: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
> Jul 6 13:33:27 user.info kernel: mptscsih: ioc1: attempting bus reset! (sc=ffff880461f6a280)
> Jul 6 13:33:27 user.info kernel: sd 3:0:1:0: CDB:
> Jul 6 13:33:27 user.info kernel: cdb[0]=0x12
> Jul 6 13:33:27 user.info kernel: :
> Jul 6 13:33:27 user.info kernel: 12
> Jul 6 13:33:27 user.info kernel: 00
> Jul 6 13:33:27 user.info kernel: 24
> Jul 6 13:33:27 user.info kernel: 00
> Jul 6 13:33:27 user.info kernel:
> Jul 6 13:33:31 user.info kernel: mptscsih: ioc1: bus reset: SUCCESS (sc=ffff880461f6a280)
> Jul 6 13:33:42 user.info kernel: mptbase: ioc1: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
> Jul 6 13:33:42 user.info kernel: mptscsih: ioc1: attempting host reset! (sc=ffff880461f6a280)
> Jul 6 13:33:42 user.info kernel: mptsas: ioc1: removing ssp device: fw_channel 0, fw_id 62, phy 1, sas_addr 0x5000c5000d298822
> Jul 6 13:33:42 user.debug kernel: phy-3:2:79: mptsas: ioc1: delete phy 1, phy-obj (0xffff88046de9dc00)
> Jul 6 13:33:42 user.debug kernel: port-3:2:1: mptsas: ioc1: delete port 1, sas_addr (0x5000c5000d298822)
> Jul 6 13:33:42 user.notice kernel: sd 3:0:1:0: [sdn] Synchronizing SCSI cache
>
>
>
-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com
next prev parent reply other threads:[~2009-07-07 6:25 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-02 16:22 [2.6.27.25] Hang in SCSI sync cache when a disk is removed--? Paul Smith
2009-07-02 17:41 ` Mike Anderson
2009-07-02 20:12 ` Paul Smith
2009-07-06 18:04 ` Paul Smith
2009-07-07 6:25 ` Mike Anderson [this message]
2009-07-07 13:58 ` James Bottomley
2009-07-07 14:33 ` Paul Smith
2009-07-07 20:24 ` Desai, Kashyap
2009-07-07 20:45 ` Mike Anderson
2009-07-07 21:10 ` Mike Anderson
2009-07-21 10:16 ` Desai, Kashyap
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090707062543.GA2459@linux.vnet.ibm.com \
--to=andmike@linux.vnet.ibm.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=linux-scsi@vger.kernel.org \
--cc=michaelc@cs.wisc.edu \
--cc=paul@mad-scientist.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox