linux-raid.vger.kernel.org archive mirror
* Problems with Seagate 8TB SMR archive drives
@ 2015-08-14 17:52 Tejas Rao
  2015-08-14 18:19 ` Tejas Rao
  2015-08-14 18:22 ` Jeff Johnson
  0 siblings, 2 replies; 4+ messages in thread
From: Tejas Rao @ 2015-08-14 17:52 UTC (permalink / raw)
  To: linux-raid

I am aware that the Seagate 8TB SMR archive drives are not meant to be
used in a RAID environment, as they lack TLER/ERC support.

We are trying to use these drives with mdraid and are seeing problems.
After a few hours of writes, some drives occasionally stop responding
for a few minutes and later recover on their own. (This was expected.)

I have increased the device timeouts to 480 seconds
(/sys/block/<device>/device/timeout). The md device is assembled on top
of dm-multipath devices, and dm-multipath is configured to retry 50 times
(no_path_retry=50). I have also set queue_depth to 1 for each device
(NCQ disabled).
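
For reference, the two per-device settings described above could be
applied with something like the sketch below. It assumes the tunables
live at /sys/block/<device>/device/timeout and
/sys/block/<device>/device/queue_depth on the underlying sd path
devices (e.g. sdlm and sdfb), and that it runs as root. The function
names and the sysfs_root parameter are hypothetical, added only so the
code can be tried against a scratch directory.

```python
from pathlib import Path


def set_scsi_tunables(device, timeout_s=480, queue_depth=1, sysfs_root="/sys"):
    """Write the command timeout (seconds) and queue depth for one sd device.

    sysfs_root is an illustrative parameter so the function can be pointed
    at a scratch directory; on a real system it stays "/sys".
    """
    dev_dir = Path(sysfs_root) / "block" / device / "device"
    (dev_dir / "timeout").write_text(f"{timeout_s}\n")
    (dev_dir / "queue_depth").write_text(f"{queue_depth}\n")
    return dev_dir


def read_scsi_tunables(device, sysfs_root="/sys"):
    """Read both values back as integers to confirm they were accepted."""
    dev_dir = Path(sysfs_root) / "block" / device / "device"
    return {
        "timeout": int((dev_dir / "timeout").read_text().strip()),
        "queue_depth": int((dev_dir / "queue_depth").read_text().strip()),
    }


# Example (as root), once per path device of the multipath map:
#   for dev in ("sdlm", "sdfb"):
#       set_scsi_tunables(dev)
#       print(dev, read_scsi_tunables(dev))
```

Values written this way do not survive a reboot or a device re-add, so
they would need to be reapplied (for example from a udev rule).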

Usually, when I see retries and a drive stops responding, it recovers on
its own after a few minutes and the md layer does not fail the disk.
Occasionally, though, the md layer fails the disk ~15 seconds after the
drive becomes non-responsive. See below.

sdlm and sdfb are the same disk (mpathaw, dm-231, dm-226). Why does the
md layer not wait 480 seconds before failing the disk? As you can see,
the drive recovered after ~8 minutes, but the md layer failed it after
~15 seconds.

What other tunables can I adjust to avoid kicking a drive out early?
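
The ~15 second and ~8 minute figures can be checked against the
timestamps in the log that follows. The sketch below does that
arithmetic; the event labels are only one reading of the log, and the
year is assumed to be 2015 because syslog timestamps omit it.

```python
from datetime import datetime


def parse_syslog(ts, year=2015):
    """Parse a syslog timestamp such as 'Aug 14 13:12:35' (year assumed)."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def seconds_between(start, end):
    """Whole seconds elapsed between two syslog timestamps."""
    return int((parse_syslog(end) - parse_syslog(start)).total_seconds())


# Timestamps copied from the log; the labels are an interpretation.
LAST_HBA_EVENT = "Aug 14 13:12:19"   # last mpt2sas log_info before the errors
MD_FAILS_DISK = "Aug 14 13:12:35"    # "Disk failure on dm-231, disabling device"
TASK_ABORT = "Aug 14 13:20:37"       # "attempting task abort!" / "task abort: SUCCESS"
PATH_BACK = "Aug 14 13:20:39"        # first path reinstated by multipathd

print(seconds_between(LAST_HBA_EVENT, MD_FAILS_DISK))  # 16
print(seconds_between(MD_FAILS_DISK, TASK_ABORT))      # 482
print(seconds_between(MD_FAILS_DISK, PATH_BACK))       # 484
```

The 482 second gap before the task abort is close to the 480 second
device timeout, which may mean the timeout was honoured for the
commands that hung, while the errors md acted on at 13:12:35 were
returned by the drive immediately.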

Aug 14 13:01:12 dc045 kernel: mpt2sas3: log_info(0x31120303): 
originator(PL), code(0x12), sub_code(0x0303)
Aug 14 13:01:12 dc045 kernel: mpt2sas3: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:01:12 dc045 kernel: mpt2sas1: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:05:24 dc045 kernel: mpt2sas3: log_info(0x31120303): 
originator(PL), code(0x12), sub_code(0x0303)
Aug 14 13:05:24 dc045 kernel: mpt2sas3: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:05:24 dc045 kernel: mpt2sas1: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:12:19 dc045 kernel: mpt2sas3: log_info(0x31120303): 
originator(PL), code(0x12), sub_code(0x0303)
Aug 14 13:12:19 dc045 kernel: mpt2sas3: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:12:19 dc045 kernel: mpt2sas1: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:12:35 dc045 kernel: __ratelimit: 2 callbacks suppressed
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 00 
55 6a 5e c8 00 00 b8 00
Aug 14 13:12:35 dc045 kernel: __ratelimit: 14 callbacks suppressed
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 00 
55 6a 5f 80 00 00 10 00
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 00 
55 6a 5f 90 00 03 38 00
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Read(16): 88 00 
00 00 00 01 d5 59 50 a8 00 00 00 08 00 00
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(16): 88 00 
00 00 00 01 d5 59 50 b0 00 00 00 80 00 00
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Read(10): 28 00 
55 6a 78 00 00 00 50 00
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
dm-226, sector 7874367664
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
dm-226, sector 1433040896
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 00 
00 00 08 08 00 00 01 00
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
sdfb, sector 2056
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(10): 28 00 
55 6a 7b e0 00 00 20 00
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
sdlm, sector 1433041888
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
dm-226, sector 1433041888
Aug 14 13:12:35 dc045 kernel: mpt2sas1: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(10): 28 00 
00 00 00 00 00 00 08 00
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
sdlm, sector 0
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
dm-226, sector 2056
Aug 14 13:12:35 dc045 kernel: md: super_written gets error=-121, uptodate=0
Aug 14 13:12:35 dc045 kernel: md/raid:md8: Disk failure on dm-231, 
disabling device.
Aug 14 13:12:35 dc045 kernel: md/raid:md8: Operation continuing on 13 
devices.
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : Hardware 
Error [current]
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
ASCQ=0x0ASC=0x81 ASCQ=0x0
Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 00 
55 6a 62 c8 00 04 00 00
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
sdlm, sector 1433035464
Aug 14 13:12:35 dc045 kernel: mpt2sas3: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
dm-226, sector 1433035464
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
sdfb, sector 1433036488
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
dm-226, sector 1433036488
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
sdlm, sector 1433037512
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
dm-226, sector 1433037512
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
sdfb, sector 2128
Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
dm-226, sector 2128
Aug 14 13:12:35 dc045 kernel: md: super_written gets error=-121, uptodate=0
Aug 14 13:12:36 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:12:36 dc045 multipathd: checker failed path 68:320 in map mpathaw
Aug 14 13:12:36 dc045 multipathd: mpathaw: remaining active paths: 1
Aug 14 13:12:36 dc045 kernel: device-mapper: multipath: Failing path 68:320.
Aug 14 13:13:17 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:13:25 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:13:25 dc045 multipathd: checker failed path 129:208 in map mpathaw
Aug 14 13:13:25 dc045 multipathd: mpathaw: Entering recovery mode: 
max_retries=50
Aug 14 13:13:25 dc045 multipathd: mpathaw: remaining active paths: 0
Aug 14 13:13:25 dc045 kernel: device-mapper: multipath: Failing path 
129:208.
Aug 14 13:13:25 dc045 multipathd: mpathaw: Entering recovery mode: 
max_retries=50
Aug 14 13:13:27 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:13:35 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:13:37 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:13:45 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:13:47 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:13:55 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:13:57 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:14:05 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:14:07 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:14:15 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:14:17 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:14:25 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:14:27 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:14:35 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:14:37 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:14:46 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:14:48 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:14:56 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:14:58 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:15:06 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:15:08 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:15:16 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:15:18 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:15:26 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:15:28 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:15:36 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:15:38 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:15:46 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:15:48 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:15:56 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:15:58 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:16:06 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:16:08 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:16:16 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:16:18 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:16:26 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:16:28 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:16:37 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:16:39 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:16:47 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:16:49 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:16:57 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:16:59 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:17:07 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:17:09 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:17:17 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:17:19 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:17:27 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:17:29 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:17:37 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:17:39 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:17:47 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:17:49 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:17:57 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:17:59 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:18:07 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:18:09 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:18:17 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:18:19 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:18:28 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:18:30 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:18:38 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:18:40 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:18:48 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:18:50 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:18:58 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:19:00 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:19:08 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:19:10 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:19:18 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:19:20 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:19:28 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:19:30 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:19:38 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:19:40 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:19:48 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:19:50 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:19:58 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:20:00 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:20:08 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:20:11 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:20:19 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:20:21 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:20:29 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is down
Aug 14 13:20:31 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is down
Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: attempting task abort! 
scmd(ffff881c6e49aec0)
Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 00 
55 6a 72 c8 00 01 68 00
Aug 14 13:20:37 dc045 kernel: scsi target25:0:42: handle(0x0037), 
sas_address(0x5000c5007b2ee20d), phy(18)
Aug 14 13:20:37 dc045 kernel: scsi target25:0:42: 
enclosure_logical_id(0x500093d00104c000), slot(39)
Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: attempting task abort! 
scmd(ffff880c35fba7c0)
Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 00 
55 6a 6e c8 00 04 00 00
Aug 14 13:20:37 dc045 kernel: scsi target27:0:42: handle(0x0037), 
sas_address(0x5000c5007b2ee20e), phy(18)
Aug 14 13:20:37 dc045 kernel: scsi target27:0:42: 
enclosure_logical_id(0x500093d00104c000), slot(39)
Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: task abort: SUCCESS 
scmd(ffff881c6e49aec0)
Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: task abort: SUCCESS 
scmd(ffff880c35fba7c0)
Aug 14 13:20:37 dc045 kernel: mpt2sas3: 
_scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
Aug 14 13:20:37 dc045 kernel: mpt2sas1: 
_scsih_sas_broadcast_primitive_event: enter: phy number(5), width(8)
Aug 14 13:20:39 dc045 multipathd: mpathaw: sdfb - directio checker 
reports path is up
Aug 14 13:20:39 dc045 multipathd: 129:208: reinstated
Aug 14 13:20:39 dc045 multipathd: mpathaw: queue_if_no_path enabled
Aug 14 13:20:39 dc045 multipathd: mpathaw: Recovered to normal mode
Aug 14 13:20:39 dc045 multipathd: mpathaw: remaining active paths: 1
Aug 14 13:20:41 dc045 multipathd: mpathaw: sdlm - directio checker 
reports path is up
Aug 14 13:20:41 dc045 multipathd: 68:320: reinstated
Aug 14 13:20:41 dc045 multipathd: mpathaw: remaining active paths: 2
Aug 14 13:21:12 dc045 kernel: md: unbind<dm-231>
Aug 14 13:21:12 dc045 kernel: md: export_rdev(dm-231)


* Re: Problems with Seagate 8TB SMR archive drives
From: Tejas Rao @ 2015-08-14 18:19 UTC (permalink / raw)
  To: linux-raid

What does "md: super_written gets error=-121, uptodate=0" mean? Looking
at the md man page, I see the passage below. Does the error mean that a
write failed? Can we make md wait up to 480 seconds for the write to
complete?

If the md driver detects a write error on a device in a RAID1, RAID4,
RAID5, RAID6, or RAID10 array, it immediately disables that device
(marking it as faulty) and continues operation on the remaining
devices.
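
As a starting point for reading that message: the -121 is a negated
Linux errno value, and 121 is EREMOTEIO ("Remote I/O error"). The
sketch below decodes such codes from a small hand-written table. The
table is an assumption limited to a few values from the generic Linux
errno list, and decode_kernel_error is a hypothetical helper, not part
of md or mdadm.

```python
# A few Linux errno values (generic numbering); not an exhaustive table.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    61: ("ENODATA", "No data available"),
    110: ("ETIMEDOUT", "Connection timed out"),
    121: ("EREMOTEIO", "Remote I/O error"),
}


def decode_kernel_error(code):
    """Map a kernel-style negative return code to (number, name, text)."""
    number = abs(code)
    name, text = LINUX_ERRNO.get(number, ("UNKNOWN", "not in this table"))
    return number, name, text


print(decode_kernel_error(-121))  # (121, 'EREMOTEIO', 'Remote I/O error')
```

This only names the code. In the log, the same second also shows
"critical target error" lines and sense data with Sense Key "Hardware
Error" and hostbyte=DID_OK, which suggests the drive answered the
command with an error rather than going silent, so a longer command
timeout may not apply to this particular failure.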

On 8/14/2015 13:52, Tejas Rao wrote:
> I am aware that the Seagate SMR 8tb archive drives are not meant to be 
> used in a RAID environment as they lack TLER/ERC support.
>
> We are trying to use these drives with mdraid and are seeing problems. 
> It seems that after writing to these drives for few hours, 
> occasionally some drives stop responding for few minutes and recover 
> on its own later. (This was expected).
>
> I have increased the device timeouts to 480 seconds now 
> (sys/block/<device>/device/timeout). The md device is assembled on top 
> of dm-multipath devices and dm-multipath is configured to retry 50 
> times (no_path_retry=50). I have also changed queue_depth for each 
> device to 1 (NCQ disabled).
>
> Usually when I see retries and a drive stops responding, it recovers 
> on its own after few minutes and the md layer does not fail the disk. 
> Occasionally though, the md layer fails the disk after ~15 seconds or 
> so of the drive becoming non-responsive. See below.
>
> sdlm and sdfb are the same disk (mpathaw, dm-231,dm-226). Why does the 
> md layer not wait for 480 seconds before failing the disk. As you can 
> see the drive recovered after ~ 8 minutes but the md layer failed it 
> after ~15 seconds.
>
> What other tunables can I tune to avoid kicking a drive out early.
>
> Aug 14 13:01:12 dc045 kernel: mpt2sas3: log_info(0x31120303): 
> originator(PL), code(0x12), sub_code(0x0303)
> Aug 14 13:01:12 dc045 kernel: mpt2sas3: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:01:12 dc045 kernel: mpt2sas1: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:05:24 dc045 kernel: mpt2sas3: log_info(0x31120303): 
> originator(PL), code(0x12), sub_code(0x0303)
> Aug 14 13:05:24 dc045 kernel: mpt2sas3: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:05:24 dc045 kernel: mpt2sas1: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:12:19 dc045 kernel: mpt2sas3: log_info(0x31120303): 
> originator(PL), code(0x12), sub_code(0x0303)
> Aug 14 13:12:19 dc045 kernel: mpt2sas3: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:12:19 dc045 kernel: mpt2sas1: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:12:35 dc045 kernel: __ratelimit: 2 callbacks suppressed
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 
> 00 55 6a 5e c8 00 00 b8 00
> Aug 14 13:12:35 dc045 kernel: __ratelimit: 14 callbacks suppressed
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 
> 00 55 6a 5f 80 00 00 10 00
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 
> 00 55 6a 5f 90 00 03 38 00
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Read(16): 88 
> 00 00 00 00 01 d5 59 50 a8 00 00 00 08 00 00
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(16): 88 
> 00 00 00 00 01 d5 59 50 b0 00 00 00 80 00 00
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Read(10): 28 
> 00 55 6a 78 00 00 00 50 00
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> dm-226, sector 7874367664
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> dm-226, sector 1433040896
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 
> 00 00 00 08 08 00 00 01 00
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> sdfb, sector 2056
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(10): 28 
> 00 55 6a 7b e0 00 00 20 00
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> sdlm, sector 1433041888
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> dm-226, sector 1433041888
> Aug 14 13:12:35 dc045 kernel: mpt2sas1: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(10): 28 
> 00 00 00 00 00 00 00 08 00
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> sdlm, sector 0
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> dm-226, sector 2056
> Aug 14 13:12:35 dc045 kernel: md: super_written gets error=-121, 
> uptodate=0
> Aug 14 13:12:35 dc045 kernel: md/raid:md8: Disk failure on dm-231, 
> disabling device.
> Aug 14 13:12:35 dc045 kernel: md/raid:md8: Operation continuing on 13 
> devices.
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
> Hardware Error [current]
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> ASC=0x81 
> ASCQ=0x0ASC=0x81 ASCQ=0x0
> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 
> 00 55 6a 62 c8 00 04 00 00
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> sdlm, sector 1433035464
> Aug 14 13:12:35 dc045 kernel: mpt2sas3: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> dm-226, sector 1433035464
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> sdfb, sector 1433036488
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> dm-226, sector 1433036488
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> sdlm, sector 1433037512
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> dm-226, sector 1433037512
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> sdfb, sector 2128
> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
> dm-226, sector 2128
> Aug 14 13:12:35 dc045 kernel: md: super_written gets error=-121, 
> uptodate=0
> Aug 14 13:12:36 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:12:36 dc045 multipathd: checker failed path 68:320 in map 
> mpathaw
> Aug 14 13:12:36 dc045 multipathd: mpathaw: remaining active paths: 1
> Aug 14 13:12:36 dc045 kernel: device-mapper: multipath: Failing path 
> 68:320.
> Aug 14 13:13:17 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:13:25 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:13:25 dc045 multipathd: checker failed path 129:208 in map 
> mpathaw
> Aug 14 13:13:25 dc045 multipathd: mpathaw: Entering recovery mode: 
> max_retries=50
> Aug 14 13:13:25 dc045 multipathd: mpathaw: remaining active paths: 0
> Aug 14 13:13:25 dc045 kernel: device-mapper: multipath: Failing path 
> 129:208.
> Aug 14 13:13:25 dc045 multipathd: mpathaw: Entering recovery mode: 
> max_retries=50
> Aug 14 13:13:27 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:13:35 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:13:37 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:13:45 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:13:47 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:13:55 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:13:57 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:14:05 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:14:07 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:14:15 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:14:17 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:14:25 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:14:27 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:14:35 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:14:37 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:14:46 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:14:48 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:14:56 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:14:58 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:15:06 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:15:08 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:15:16 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:15:18 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:15:26 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:15:28 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:15:36 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:15:38 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:15:46 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:15:48 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:15:56 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:15:58 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:16:06 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:16:08 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:16:16 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:16:18 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:16:26 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:16:28 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:16:37 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:16:39 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:16:47 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:16:49 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:16:57 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:16:59 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:17:07 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:17:09 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:17:17 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:17:19 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:17:27 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:17:29 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:17:37 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:17:39 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:17:47 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:17:49 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:17:57 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:17:59 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:18:07 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:18:09 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:18:17 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:18:19 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:18:28 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:18:30 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:18:38 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:18:40 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:18:48 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:18:50 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:18:58 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:19:00 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:19:08 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:19:10 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:19:18 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:19:20 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:19:28 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:19:30 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:19:38 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:19:40 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:19:48 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:19:50 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:19:58 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:20:00 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:20:08 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:20:11 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:20:19 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:20:21 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:20:29 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is down
> Aug 14 13:20:31 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is down
> Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: attempting task abort! 
> scmd(ffff881c6e49aec0)
> Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 
> 00 55 6a 72 c8 00 01 68 00
> Aug 14 13:20:37 dc045 kernel: scsi target25:0:42: handle(0x0037), 
> sas_address(0x5000c5007b2ee20d), phy(18)
> Aug 14 13:20:37 dc045 kernel: scsi target25:0:42: 
> enclosure_logical_id(0x500093d00104c000), slot(39)
> Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: attempting task abort! 
> scmd(ffff880c35fba7c0)
> Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 
> 00 55 6a 6e c8 00 04 00 00
> Aug 14 13:20:37 dc045 kernel: scsi target27:0:42: handle(0x0037), 
> sas_address(0x5000c5007b2ee20e), phy(18)
> Aug 14 13:20:37 dc045 kernel: scsi target27:0:42: 
> enclosure_logical_id(0x500093d00104c000), slot(39)
> Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: task abort: SUCCESS 
> scmd(ffff881c6e49aec0)
> Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: task abort: SUCCESS 
> scmd(ffff880c35fba7c0)
> Aug 14 13:20:37 dc045 kernel: mpt2sas3: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
> Aug 14 13:20:37 dc045 kernel: mpt2sas1: 
> _scsih_sas_broadcast_primitive_event: enter: phy number(5), width(8)
> Aug 14 13:20:39 dc045 multipathd: mpathaw: sdfb - directio checker 
> reports path is up
> Aug 14 13:20:39 dc045 multipathd: 129:208: reinstated
> Aug 14 13:20:39 dc045 multipathd: mpathaw: queue_if_no_path enabled
> Aug 14 13:20:39 dc045 multipathd: mpathaw: Recovered to normal mode
> Aug 14 13:20:39 dc045 multipathd: mpathaw: remaining active paths: 1
> Aug 14 13:20:41 dc045 multipathd: mpathaw: sdlm - directio checker 
> reports path is up
> Aug 14 13:20:41 dc045 multipathd: 68:320: reinstated
> Aug 14 13:20:41 dc045 multipathd: mpathaw: remaining active paths: 2
> Aug 14 13:21:12 dc045 kernel: md: unbind<dm-231>
> Aug 14 13:21:12 dc045 kernel: md: export_rdev(dm-231)
>
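
For reference, the key timestamps in the log above can be lined up with a few lines of Python. This is only a sketch: the event labels are paraphrased from the log, the year 2015 is taken from the thread date (syslog lines carry no year), and the helper name elapsed_since_first is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the log; the labels are paraphrased, not verbatim.
EVENTS = [
    ("Aug 14 13:12:19", "last mpt2sas3 log_info(0x31120303) before the errors"),
    ("Aug 14 13:12:35", "md/raid:md8 reports disk failure on dm-231"),
    ("Aug 14 13:13:25", "mpathaw: remaining active paths: 0"),
    ("Aug 14 13:20:39", "mpathaw: recovered to normal mode"),
    ("Aug 14 13:21:12", "md: unbind<dm-231>"),
]


def elapsed_since_first(events, year=2015):
    """Return (label, seconds since the first event) for each event."""
    times = [
        datetime.strptime(f"{stamp} {year}", "%b %d %H:%M:%S %Y")
        for stamp, _ in events
    ]
    return [
        (label, int((t - times[0]).total_seconds()))
        for t, (_, label) in zip(times, events)
    ]


for label, seconds in elapsed_since_first(EVENTS):
    print(f"{seconds:4d} s  {label}")
```

With these inputs the md failure lands 16 s after the last controller event, while multipath only recovers at 500 s. That matches the "~15 seconds" and "~8 minutes" figures in the original post.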


* Re: Problems with Seagate 8TB SMR archive drives
  2015-08-14 17:52 Problems with Seagate 8TB SMR archive drives Tejas Rao
  2015-08-14 18:19 ` Tejas Rao
@ 2015-08-14 18:22 ` Jeff Johnson
  2015-08-14 18:27   ` Tejas Rao
  1 sibling, 1 reply; 4+ messages in thread
From: Jeff Johnson @ 2015-08-14 18:22 UTC (permalink / raw)
  To: Tejas Rao, linux-raid

Tejas,

You are probably running firmware AR13 or older. The drives are designed 
for very power-efficient archive use and have a very aggressive 
spin-down timer. Once they drop to a slower speed or stop, it takes 
a significant time for them to get back to ready. I would strongly 
suggest updating to firmware AR14 or AR15.

The drives aren't really intended for your use model. Perhaps a cron job 
that does a dd read (direct mode) of a hundred MB or so from the md 
device to /dev/null every couple of minutes might be enough to keep them 
warm and not spun down.
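
A minimal sketch of that keep-warm read, written in Python instead of cron plus dd. It assumes Linux (os.O_DIRECT is not available elsewhere) and that the md device is the right target; the function name keep_warm_read and the default sizes are illustrative, not taken from any existing tool.

```python
import mmap
import os


def keep_warm_read(path, total_bytes=100 * 1024 * 1024,
                   chunk=1024 * 1024, direct=True):
    """Read from the start of path and discard the data.

    Reads whole chunks until at least total_bytes have been read or the
    end of the device is reached. Returns the number of bytes read.
    """
    flags = os.O_RDONLY
    if direct:
        # Bypass the page cache so every run actually reaches the disks.
        flags |= os.O_DIRECT
    fd = os.open(path, flags)
    # An anonymous mmap is page-aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, chunk)
    done = 0
    try:
        while done < total_bytes:
            n = os.readv(fd, [buf])
            if n == 0:  # end of device or file
                break
            done += n
    finally:
        buf.close()
        os.close(fd)
    return done

# Hypothetical use from a script that cron runs every couple of minutes:
#     keep_warm_read("/dev/md8")
```

The direct=False switch is only there so the function can be tried on an ordinary file; without direct I/O the reads would mostly be served from the page cache and never touch the drives.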

--Jeff


-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-performance Computing / Lustre Filesystems / Scale-out Storage


* Re: Problems with Seagate 8TB SMR archive drives
  2015-08-14 18:22 ` Jeff Johnson
@ 2015-08-14 18:27   ` Tejas Rao
  0 siblings, 0 replies; 4+ messages in thread
From: Tejas Rao @ 2015-08-14 18:27 UTC (permalink / raw)
  To: Jeff Johnson, linux-raid

The drives are running AR15.

The md device (md8) has XFS on it and I am transferring 33TB of data 
using the 'cp' command, so I doubt the drives are spinning down, as they 
are being written to continuously. We plan to use them for 
archive/backup purposes only. After the initial ingest (filling up the 
drives), the drives will be accessed only occasionally, for very small 
reads/writes. It would be nice, though, to ingest in a more stable way.
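
Before a long ingest it may be worth confirming that both paths of a multipath device really carry the tunables set earlier in the thread (timeout 480, queue_depth 1). The sketch below reads the two sysfs attributes for each path device. The helper names read_path_tunables and mismatches are made up for illustration, and the sysfs_root parameter exists only so the code can be pointed at a test directory.

```python
import os


def read_path_tunables(devices, sysfs_root="/sys/block"):
    """Return {device: {"timeout": int, "queue_depth": int}} read from sysfs."""
    result = {}
    for dev in devices:
        base = os.path.join(sysfs_root, dev, "device")
        values = {}
        for name in ("timeout", "queue_depth"):
            with open(os.path.join(base, name)) as f:
                values[name] = int(f.read().strip())
        result[dev] = values
    return result


def mismatches(tunables, expected):
    """Return only the devices whose values differ from the expected ones."""
    return {dev: vals for dev, vals in tunables.items() if vals != expected}

# Hypothetical check for the two paths of mpathaw named in the log:
#     current = read_path_tunables(["sdlm", "sdfb"])
#     print(mismatches(current, {"timeout": 480, "queue_depth": 1}))
```

An empty result means both paths match. Anything else points at a path that was re-added or rescanned and has fallen back to its default values.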

On 8/14/2015 14:22, Jeff Johnson wrote:
> Tejas,
>
> You are probably running firmware AR13 or older. The drives are 
> designed for very power-efficient archive use and have a very 
> aggressive spin-down timer. Once they drop to a slower speed or 
> stop, it takes a significant time for them to get back to ready. I 
> would strongly suggest updating to firmware AR14 or AR15.
>
> The drives aren't really intended for your use model. Perhaps a cron 
> job that does a dd read (direct mode) of a hundred MB or so from the 
> md device to /dev/null every couple of minutes might be enough to keep 
> them warm and not spun down.
>
> --Jeff
>
>
> On 8/14/15 10:52 AM, Tejas Rao wrote:
>> I am aware that the Seagate SMR 8TB archive drives are not meant to
>> be used in a RAID environment as they lack TLER/ERC support.
>>
>> We are trying to use these drives with mdraid and are seeing
>> problems. It seems that after writing to these drives for a few hours,
>> some drives occasionally stop responding for a few minutes and then
>> recover on their own. (This was expected.)
>>
>> I have increased the device timeouts to 480 seconds
>> (/sys/block/<device>/device/timeout). The md device is assembled on
>> top of dm-multipath devices, and dm-multipath is configured to retry
>> 50 times (no_path_retry=50). I have also set queue_depth for each
>> device to 1 (NCQ disabled).
>>
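
Roughly, the sysfs part of the settings above amounts to the following.
This is a sketch, not the exact commands that were run: apply_tunables
and its SYSFS_ROOT argument are made-up names, and the root is a
parameter only so the function can be tried against a scratch directory
instead of the real /sys. The values are not persistent across reboots.

```shell
# apply_tunables SYSFS_ROOT DEV...: set the SCSI command timeout (seconds)
# and the queue depth for each listed path device.
apply_tunables() {
    root=$1
    shift
    for dev in "$@"; do
        # 480 s command timeout, as described in the message above.
        echo 480 > "$root/block/$dev/device/timeout" || return 1
        # Queue depth 1, which effectively disables NCQ.
        echo 1 > "$root/block/$dev/device/queue_depth" || return 1
    done
}

# Both paths to the same multipathed drive need the same values, e.g.:
# apply_tunables /sys sdlm sdfb
```
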
>> Usually when I see retries and a drive stops responding, it recovers
>> on its own after a few minutes and the md layer does not fail the disk.
>> Occasionally, though, the md layer fails the disk ~15 seconds or so
>> after the drive becomes non-responsive. See below.
>>
>> sdlm and sdfb are the same disk (mpathaw, dm-231, dm-226). Why does
>> the md layer not wait for 480 seconds before failing the disk? As you
>> can see, the drive recovered after ~8 minutes, but the md layer failed
>> it after ~15 seconds.
>>
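
The two intervals can be checked against the timestamps in the log
below. The helper names to_secs and elapsed are made up for this
sketch, and taking the 13:12:19 log_info line as the start of the stall
is an assumption about how to read the log. The other timestamps are
the md "Disk failure" line (13:12:35), the task aborts (13:20:37) and
the first path reporting up again (13:20:39).

```shell
# to_secs HH:MM:SS: convert a same-day timestamp to seconds since midnight.
# Runs in a subshell so the IFS change stays local; one leading zero is
# stripped from each field so 08 and 09 are not parsed as octal.
to_secs() (
    IFS=:
    set -- $1
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
)

# elapsed START END: seconds between two timestamps on the same day.
elapsed() {
    echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}

elapsed 13:12:19 13:12:35   # prints 16  (stall start to md failing dm-231)
elapsed 13:12:35 13:20:37   # prints 482 (first errors to the task aborts)
elapsed 13:12:35 13:20:39   # prints 484 (first errors to path back up)
```

The 482 seconds to the task aborts is close to the 480-second device
timeout, while the md failure comes after only 16 seconds.
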
>> What other tunables can I adjust to keep a drive from being kicked
>> out early?
>>
>> Aug 14 13:01:12 dc045 kernel: mpt2sas3: log_info(0x31120303): 
>> originator(PL), code(0x12), sub_code(0x0303)
>> Aug 14 13:01:12 dc045 kernel: mpt2sas3: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:01:12 dc045 kernel: mpt2sas1: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:05:24 dc045 kernel: mpt2sas3: log_info(0x31120303): 
>> originator(PL), code(0x12), sub_code(0x0303)
>> Aug 14 13:05:24 dc045 kernel: mpt2sas3: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:05:24 dc045 kernel: mpt2sas1: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:12:19 dc045 kernel: mpt2sas3: log_info(0x31120303): 
>> originator(PL), code(0x12), sub_code(0x0303)
>> Aug 14 13:12:19 dc045 kernel: mpt2sas3: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:12:19 dc045 kernel: mpt2sas1: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:12:35 dc045 kernel: __ratelimit: 2 callbacks suppressed
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 
>> 00 55 6a 5e c8 00 00 b8 00
>> Aug 14 13:12:35 dc045 kernel: __ratelimit: 14 callbacks suppressed
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 
>> 00 55 6a 5f 80 00 00 10 00
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 
>> 00 55 6a 5f 90 00 03 38 00
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Read(16): 88 
>> 00 00 00 00 01 d5 59 50 a8 00 00 00 08 00 00
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(16): 88 
>> 00 00 00 00 01 d5 59 50 b0 00 00 00 80 00 00
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Read(10): 28 
>> 00 55 6a 78 00 00 00 50 00
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> dm-226, sector 7874367664
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> dm-226, sector 1433040896
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 
>> 00 00 00 08 08 00 00 01 00
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> sdfb, sector 2056
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(10): 28 
>> 00 55 6a 7b e0 00 00 20 00
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> sdlm, sector 1433041888
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> dm-226, sector 1433041888
>> Aug 14 13:12:35 dc045 kernel: mpt2sas1: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Read(10): 28 
>> 00 00 00 00 00 00 00 08 00
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> sdlm, sector 0
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> dm-226, sector 2056
>> Aug 14 13:12:35 dc045 kernel: md: super_written gets error=-121, 
>> uptodate=0
>> Aug 14 13:12:35 dc045 kernel: md/raid:md8: Disk failure on dm-231, 
>> disabling device.
>> Aug 14 13:12:35 dc045 kernel: md/raid:md8: Operation continuing on 13 
>> devices.
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Unhandled sense code
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Result: 
>> hostbyte=DID_OK driverbyte=DRIVER_SENSE
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] Sense Key : 
>> Hardware Error [current]
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] <<vendor>> 
>> ASC=0x81 ASCQ=0x0ASC=0x81 ASCQ=0x0
>> Aug 14 13:12:35 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 
>> 00 55 6a 62 c8 00 04 00 00
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> sdlm, sector 1433035464
>> Aug 14 13:12:35 dc045 kernel: mpt2sas3: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> dm-226, sector 1433035464
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> sdfb, sector 1433036488
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> dm-226, sector 1433036488
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> sdlm, sector 1433037512
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> dm-226, sector 1433037512
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> sdfb, sector 2128
>> Aug 14 13:12:35 dc045 kernel: end_request: critical target error, dev 
>> dm-226, sector 2128
>> Aug 14 13:12:35 dc045 kernel: md: super_written gets error=-121, 
>> uptodate=0
>> Aug 14 13:12:36 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:12:36 dc045 multipathd: checker failed path 68:320 in map 
>> mpathaw
>> Aug 14 13:12:36 dc045 multipathd: mpathaw: remaining active paths: 1
>> Aug 14 13:12:36 dc045 kernel: device-mapper: multipath: Failing path 
>> 68:320.
>> Aug 14 13:13:17 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:13:25 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:13:25 dc045 multipathd: checker failed path 129:208 in map 
>> mpathaw
>> Aug 14 13:13:25 dc045 multipathd: mpathaw: Entering recovery mode: 
>> max_retries=50
>> Aug 14 13:13:25 dc045 multipathd: mpathaw: remaining active paths: 0
>> Aug 14 13:13:25 dc045 kernel: device-mapper: multipath: Failing path 
>> 129:208.
>> Aug 14 13:13:25 dc045 multipathd: mpathaw: Entering recovery mode: 
>> max_retries=50
>> Aug 14 13:13:27 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:13:35 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:13:37 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:13:45 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:13:47 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:13:55 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:13:57 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:14:05 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:14:07 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:14:15 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:14:17 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:14:25 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:14:27 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:14:35 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:14:37 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:14:46 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:14:48 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:14:56 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:14:58 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:15:06 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:15:08 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:15:16 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:15:18 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:15:26 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:15:28 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:15:36 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:15:38 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:15:46 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:15:48 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:15:56 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:15:58 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:16:06 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:16:08 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:16:16 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:16:18 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:16:26 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:16:28 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:16:37 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:16:39 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:16:47 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:16:49 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:16:57 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:16:59 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:17:07 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:17:09 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:17:17 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:17:19 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:17:27 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:17:29 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:17:37 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:17:39 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:17:47 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:17:49 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:17:57 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:17:59 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:18:07 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:18:09 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:18:17 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:18:19 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:18:28 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:18:30 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:18:38 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:18:40 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:18:48 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:18:50 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:18:58 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:19:00 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:19:08 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:19:10 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:19:18 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:19:20 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:19:28 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:19:30 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:19:38 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:19:40 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:19:48 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:19:50 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:19:58 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:20:00 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:20:08 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:20:11 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:20:19 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:20:21 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:20:29 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is down
>> Aug 14 13:20:31 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is down
>> Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: attempting task abort! 
>> scmd(ffff881c6e49aec0)
>> Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: [sdfb] CDB: Write(10): 2a 
>> 00 55 6a 72 c8 00 01 68 00
>> Aug 14 13:20:37 dc045 kernel: scsi target25:0:42: handle(0x0037), 
>> sas_address(0x5000c5007b2ee20d), phy(18)
>> Aug 14 13:20:37 dc045 kernel: scsi target25:0:42: 
>> enclosure_logical_id(0x500093d00104c000), slot(39)
>> Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: attempting task abort! 
>> scmd(ffff880c35fba7c0)
>> Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: [sdlm] CDB: Write(10): 2a 
>> 00 55 6a 6e c8 00 04 00 00
>> Aug 14 13:20:37 dc045 kernel: scsi target27:0:42: handle(0x0037), 
>> sas_address(0x5000c5007b2ee20e), phy(18)
>> Aug 14 13:20:37 dc045 kernel: scsi target27:0:42: 
>> enclosure_logical_id(0x500093d00104c000), slot(39)
>> Aug 14 13:20:37 dc045 kernel: sd 25:0:42:0: task abort: SUCCESS 
>> scmd(ffff881c6e49aec0)
>> Aug 14 13:20:37 dc045 kernel: sd 27:0:42:0: task abort: SUCCESS 
>> scmd(ffff880c35fba7c0)
>> Aug 14 13:20:37 dc045 kernel: mpt2sas3: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(7), width(8)
>> Aug 14 13:20:37 dc045 kernel: mpt2sas1: 
>> _scsih_sas_broadcast_primitive_event: enter: phy number(5), width(8)
>> Aug 14 13:20:39 dc045 multipathd: mpathaw: sdfb - directio checker 
>> reports path is up
>> Aug 14 13:20:39 dc045 multipathd: 129:208: reinstated
>> Aug 14 13:20:39 dc045 multipathd: mpathaw: queue_if_no_path enabled
>> Aug 14 13:20:39 dc045 multipathd: mpathaw: Recovered to normal mode
>> Aug 14 13:20:39 dc045 multipathd: mpathaw: remaining active paths: 1
>> Aug 14 13:20:41 dc045 multipathd: mpathaw: sdlm - directio checker 
>> reports path is up
>> Aug 14 13:20:41 dc045 multipathd: 68:320: reinstated
>> Aug 14 13:20:41 dc045 multipathd: mpathaw: remaining active paths: 2
>> Aug 14 13:21:12 dc045 kernel: md: unbind<dm-231>
>> Aug 14 13:21:12 dc045 kernel: md: export_rdev(dm-231)



end of thread, other threads:[~2015-08-14 18:27 UTC | newest]

Thread overview: 4+ messages
2015-08-14 17:52 Problems with Seagate 8TB SMR archive drives Tejas Rao
2015-08-14 18:19 ` Tejas Rao
2015-08-14 18:22 ` Jeff Johnson
2015-08-14 18:27   ` Tejas Rao
