linux-scsi.vger.kernel.org archive mirror
* LSI megasas and duplicate erroneous devices?
@ 2015-08-29 15:00 Thomas Fjellstrom
  2015-08-30  3:37 ` Thomas Fjellstrom
  0 siblings, 1 reply; 2+ messages in thread
From: Thomas Fjellstrom @ 2015-08-29 15:00 UTC (permalink / raw)
  To: linux-scsi

I'm seeing the strangest thing with my M1015/LSI card.

I'm getting regular DID_BAD_TARGET errors in the kernel log for a drive that
was erroneously detected.

Initial detection looks like:


[    0.860853] megasas: 06.806.08.00-rc1
[    0.860877] megasas: 0x1000:0x0073:0x1014:0x03b1: bus 1:slot 0:func 0
[    0.861172] megasas: FW now in Ready state
[    0.861796] libata version 3.00 loaded.
[    0.907832] megasas_init_mfi: fw_support_ieee=67108864
[    0.907902] megasas: INIT adapter done
[    0.955875] megaraid_sas 0000:01:00.0: Controller type: iMR
[    0.955990] scsi host0: LSI SAS based MegaRAID driver
[    0.959560] scsi 0:0:9:0: Direct-Access     ATA      ST3000DM001-1CH1 CC29 PQ: 0 ANSI: 5
[    0.964976] scsi 0:0:10:0: Direct-Access     ATA      ST3000DM001-1CH1 CC29 PQ: 0 ANSI: 5
[    0.970747] scsi 0:0:11:0: Direct-Access     ATA      ST3000DM001-1CH1 CC29 PQ: 0 ANSI: 5
[    0.975890] scsi 0:0:13:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 5
[    0.981458] scsi 0:0:15:0: Direct-Access     ATA      ST3000DM001-1ER1 CC25 PQ: 0 ANSI: 5
[    0.987097] scsi 0:0:16:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 5


Then after a while, the following appears:


[ 2545.701262] scsi 0:0:14:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 5


Note that this drive is a duplicate of either 0:0:13:0 or 0:0:16:0; there are
only two WD Reds in this system. Two of the ports on the card are unpopulated.
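
To make the comparison above concrete, here is a minimal sketch that pulls the
H:C:T:L address and model string out of the inquiry lines and counts devices
per model. The regular expression and the names `INQUIRY_RE`,
`models_by_address` and `SAMPLE_LOG` are assumptions made for illustration;
the log lines are copied from the output above. Drives of the same type report
the same model string, so this only narrows the question. Telling two physical
drives apart would need their serial numbers.

```python
import re
from collections import Counter

# Hypothetical pattern for the "scsi H:C:T:L: Direct-Access ..." inquiry lines.
# Fields: vendor, model (may contain spaces), firmware revision, then "PQ:".
INQUIRY_RE = re.compile(
    r"scsi (?P<hctl>\d+:\d+:\d+:\d+): Direct-Access\s+"
    r"(?P<vendor>\S+)\s+(?P<model>.+?)\s+(?P<rev>\S+)\s+PQ:"
)


def models_by_address(log_text):
    """Return {"H:C:T:L": model} for every Direct-Access inquiry line."""
    found = {}
    for line in log_text.splitlines():
        match = INQUIRY_RE.search(line)
        if match:
            found[match.group("hctl")] = match.group("model")
    return found


# Inquiry lines copied from the kernel log quoted in this message.
SAMPLE_LOG = """\
[    0.959560] scsi 0:0:9:0: Direct-Access     ATA      ST3000DM001-1CH1 CC29 PQ: 0 ANSI: 5
[    0.964976] scsi 0:0:10:0: Direct-Access     ATA      ST3000DM001-1CH1 CC29 PQ: 0 ANSI: 5
[    0.970747] scsi 0:0:11:0: Direct-Access     ATA      ST3000DM001-1CH1 CC29 PQ: 0 ANSI: 5
[    0.975890] scsi 0:0:13:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 5
[    0.981458] scsi 0:0:15:0: Direct-Access     ATA      ST3000DM001-1ER1 CC25 PQ: 0 ANSI: 5
[    0.987097] scsi 0:0:16:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 5
[ 2545.701262] scsi 0:0:14:0: Direct-Access     ATA      WDC WD30EFRX-68E 0A82 PQ: 0 ANSI: 5
"""

models = models_by_address(SAMPLE_LOG)
counts = Counter(models.values())
for model, count in counts.items():
    print(f"{model}: {count} device(s)")
```

Counting the late arrival at 0:0:14:0, the log reports three devices with the
WD Red model string, which is what prompted the question.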

Then I see some errors some time later:


[ 7113.505094] sd 0:0:14:0: [sdi] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 7113.506044] sd 0:0:14:0: [sdi] CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[ 7113.506967] blk_update_request: I/O error, dev sdi, sector 0
[ 7113.508415] sd 0:0:14:0: [sdi] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 7113.509330] sd 0:0:14:0: [sdi] CDB: Read(16) 88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00
[ 7113.510252] blk_update_request: I/O error, dev sdi, sector 5860532992
[ 7113.511567] sd 0:0:14:0: [sdi] FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 7113.512522] sd 0:0:14:0: [sdi] CDB: Read(16) 88 00 00 00 00 01 5d 50 a3 a0 00 00 00 08 00 00
[ 7113.513487] blk_update_request: I/O error, dev sdi, sector 5860533152
[ 7113.515208] sd 0:0:14:0: [sdi] Synchronizing SCSI cache


This keeps happening over and over.
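
For anyone reading the CDB lines above, this is a small sketch of how the
Read(16) bytes map to the sector numbers in the `blk_update_request` lines. It
assumes the standard READ(16) layout (opcode 0x88 in byte 0, a big-endian
64-bit LBA in bytes 2-9, a big-endian 32-bit transfer length in bytes 10-13);
the function name `decode_read16` is made up for this example.

```python
def decode_read16(cdb_hex):
    """Decode a READ(16) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Assumes the standard 16-byte layout:
    byte 0 = opcode 0x88, bytes 2-9 = LBA, bytes 10-13 = transfer length.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    blocks = int.from_bytes(cdb[10:14], "big")
    return lba, blocks


# CDBs copied from the failing commands in the log above.
FAILED_CDBS = [
    "88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00",
    "88 00 00 00 00 01 5d 50 a3 00 00 00 00 08 00 00",
    "88 00 00 00 00 01 5d 50 a3 a0 00 00 00 08 00 00",
]

for cdb in FAILED_CDBS:
    lba, blocks = decode_read16(cdb)
    print(f"LBA {lba}, {blocks} blocks")
```

The decoded LBAs (0, 5860532992 and 5860533152) match the sectors reported in
the I/O error lines, and each failing read is 8 blocks long.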

Attempting to run `smartctl -a` on sdi fails with "no such device", and sdi
does not currently appear in /dev.

This machine is currently running a 4.0.2-1 kernel from Debian sid.

What exactly can cause this, and how can I fix it?

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

* Re: LSI megasas and duplicate erroneous devices?
  2015-08-29 15:00 LSI megasas and duplicate erroneous devices? Thomas Fjellstrom
@ 2015-08-30  3:37 ` Thomas Fjellstrom
  0 siblings, 0 replies; 2+ messages in thread
From: Thomas Fjellstrom @ 2015-08-30  3:37 UTC (permalink / raw)
  To: linux-scsi

On Sat 29 Aug 2015 09:00:44 AM Thomas Fjellstrom wrote:
> I'm seeing the strangest thing with my M1015/LSI card.
> 
> I'm getting regular DID_BAD_TARGET errors in the kernel log for a drive that
> was erroneously detected.
> 
> Initial detection looks like:
> 
> 
[snip]
> 
> 
> Note that this drive is a duplicate of either 0:0:13:0 or 0:0:16:0; there are
> only two WD Reds in this system. Two of the ports on the card are
> unpopulated.

*sigh* It appears there are actually three WD Reds in the system, so the
problem isn't nearly as weird as I thought. I noticed the drive was actually
visible a few minutes ago, and it had a different S/N than the other WDs. I
decided to take a look at the drives and make sure all cables were seated
properly, and while I was in there I took a can of compressed air to the
entire box. Upon boot the drive was detected immediately, and so far it
hasn't dropped off, but it never dropped off right away before either, so
we'll see what happens. I'm running `badblocks -w` on it to see if it
encounters any errors at all. If there are no errors, it'll go back into
service as a spare in a RAID6 array.

I've had to mess with that system so many times over the past year that I
forgot I ended up putting three new WDs in there after a few Seagates failed.

So there we go. Nothing here unless it drops out again. I apologize for the 
noise.

[snip]
> 
> What exactly can cause this, and how can I fix it?

-- 
Thomas Fjellstrom
thomas@fjellstrom.ca
