From: Marc MERLIN <marc@merlins.org>
To: Tejun Heo <tj@kernel.org>, linux-ide@vger.kernel.org
Subject: PMP SMART error recovery and failure code decoding help
Date: Sun, 16 Jan 2011 08:39:50 -0800
Message-ID: <20110116163950.GA691@merlins.org>

I have 2 sets of 5 drives, each set behind a PMP.

- 2.6.36.0 kernel
- sata_sil24 card
- Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9

All 10 drives log errors on a schedule after being queried by some SMART tool
(I have at least hddtemp and smartmontools running).

Error is:
ata10.02: failed command: SMART
ata10.02: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.02: status: { DRDY }
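
For anyone decoding the cmd/res lines above: as far as I understand it, libata
prints the taskfile as command/feature:nsect:lbal:lbam:lbah, then the five
high-order (HOB) bytes, then the device register, and the res line uses the
same layout with status and error in the first two slots. The sketch below is
a minimal Python decoder under that assumption. The lookup tables are
deliberately partial, and the function names (parse_taskfile, describe_cmd,
describe_status) are made up for illustration.

```python
import re

# Field order assumed for libata "cmd"/"res" taskfile dumps.
TF_FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
             "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
             "device")

# Partial lookup tables: only values that appear in this report.
ATA_COMMANDS = {0xB0: "SMART", 0xEC: "IDENTIFY DEVICE"}
SMART_SUBCOMMANDS = {0xD0: "READ DATA", 0xD8: "ENABLE OPERATIONS",
                     0xDA: "RETURN STATUS"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}


def parse_taskfile(text):
    """Split 'b0/d8:00:00:4f:c2/00:00:00:00:00/00' into named byte values."""
    values = [int(tok, 16) for tok in re.split(r"[/:]", text.strip())]
    if len(values) != len(TF_FIELDS):
        raise ValueError(f"expected {len(TF_FIELDS)} bytes, got {len(values)}")
    return dict(zip(TF_FIELDS, values))


def describe_cmd(text):
    """Name the command and, for SMART, the subcommand and signature check."""
    tf = parse_taskfile(text)
    out = {"command": ATA_COMMANDS.get(tf["command"], "unknown")}
    if tf["command"] == 0xB0:
        out["subcommand"] = SMART_SUBCOMMANDS.get(tf["feature"], "unknown")
        # SMART commands carry the 0x4f/0xc2 signature in lbam/lbah.
        out["signature_ok"] = (tf["lbam"], tf["lbah"]) == (0x4F, 0xC2)
    return out


def describe_status(byte):
    """List the status register flags set in a status byte."""
    return [name for bit, name in STATUS_BITS.items() if byte & bit]


if __name__ == "__main__":
    print(describe_cmd("b0/d8:00:00:4f:c2/00:00:00:00:00/00"))
    res = parse_taskfile("40/00:00:00:00:00/00:00:00:00:00/00")
    print(describe_status(res["command"]))  # first res byte is the status
```

Applied to the lines above, this reads the command as SMART with feature 0xd8
(ENABLE OPERATIONS) and a valid signature, and the result status 0x40 as DRDY
with no error bit set, which matches the "status: { DRDY }" line.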

I think it's due to a script that I wrote that uses hdparm -y to spin
drives down after an idle period (because at least 5 of my drives, WDC
WD20EADS, have stupid green firmware that prevents auto spindowns with
hdparm -S).
My swdisksusp script is at:
http://marc.merlins.org/perso/linux/post_2010-08-03_Spinning-Down-WD20EADS-Drives-and-Fixing-Load-Cycle.html

Anyway, the problem happens both with drives that I spin down manually and
with drives that spin down on their own. I don't think it's a 'real' error,
but rather an issue where drives cannot answer some SMART commands while
they are spun down.
That said, is it normal/expected for the PMP code to do a full bus reset
because of a SMART command that couldn't go through?
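
One way to test the spun-down theory would be to skip the SMART query whenever
the drive reports standby. The sketch below only parses the text that
`hdparm -C` prints ("drive state is:  standby" and similar) and decides whether
a query should go ahead. The helper names are hypothetical, and whether the
power-mode check itself gets through cleanly behind this PMP is an open
question. smartmontools has a comparable switch in `smartctl -n standby`.

```python
import re

# States in which the drive is spun down and a SMART query is skipped.
SPUN_DOWN_STATES = ("standby", "sleeping")


def parse_drive_state(hdparm_output):
    """Extract the state word from `hdparm -C` output, or None if absent."""
    match = re.search(r"drive state is:\s*(\S+)", hdparm_output)
    return match.group(1) if match else None


def should_query_smart(hdparm_output):
    """Return True only when a state was reported and it is not spun down.

    An "unknown" state (drive does not support the check) is treated as
    safe to query; that choice is an assumption, not a recommendation.
    """
    state = parse_drive_state(hdparm_output)
    return state is not None and state not in SPUN_DOWN_STATES


if __name__ == "__main__":
    # Sample text in the shape hdparm -C prints; the device name is made up.
    sample = "\n/dev/sdX:\n drive state is:  standby\n"
    print(parse_drive_state(sample), should_query_smart(sample))
```

A wrapper around hddtemp or smartctl could call this first and stay quiet for
drives that are asleep. If the errors stop, that points at the spun-down state
rather than at the PMP.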

Thanks,
Marc

Jan 16 05:54:23 gargamel kernel: ata9.00: failed command: SMART
Jan 16 05:54:31 gargamel kernel: ata9.01: failed command: SMART
Jan 16 05:54:39 gargamel kernel: ata9.02: failed command: SMART
Jan 16 05:54:47 gargamel kernel: ata9.03: failed command: SMART
Jan 16 05:54:55 gargamel kernel: ata9.04: failed command: SMART
Jan 16 06:05:04 gargamel kernel: ata9.00: failed command: SMART
Jan 16 06:05:12 gargamel kernel: ata9.01: failed command: SMART
Jan 16 06:05:20 gargamel kernel: ata9.02: failed command: SMART
Jan 16 06:05:28 gargamel kernel: ata9.03: failed command: SMART
Jan 16 06:05:36 gargamel kernel: ata9.04: failed command: SMART
Jan 16 06:05:44 gargamel kernel: ata10.00: failed command: SMART
Jan 16 06:06:01 gargamel kernel: ata10.01: failed command: SMART
Jan 16 06:06:18 gargamel kernel: ata10.02: failed command: SMART
Jan 16 06:16:35 gargamel kernel: ata9.00: failed command: SMART
Jan 16 06:16:52 gargamel kernel: ata9.01: failed command: SMART
Jan 16 06:17:01 gargamel kernel: ata9.02: failed command: SMART
Jan 16 06:17:08 gargamel kernel: ata9.03: failed command: SMART
Jan 16 06:17:16 gargamel kernel: ata9.04: failed command: SMART
Jan 16 06:27:25 gargamel kernel: ata10.00: failed command: SMART
Jan 16 06:27:42 gargamel kernel: ata10.01: failed command: SMART
Jan 16 06:27:59 gargamel kernel: ata10.02: failed command: SMART
Jan 16 06:38:16 gargamel kernel: ata9.00: failed command: SMART
Jan 16 06:38:24 gargamel kernel: ata9.01: failed command: SMART
Jan 16 06:38:32 gargamel kernel: ata9.02: failed command: SMART
Jan 16 06:38:40 gargamel kernel: ata9.03: failed command: SMART
Jan 16 06:38:48 gargamel kernel: ata9.04: failed command: SMART
Jan 16 06:59:05 gargamel kernel: ata10.00: failed command: SMART
Jan 16 06:59:19 gargamel kernel: ata10.01: failed command: SMART
Jan 16 06:59:36 gargamel kernel: ata10.02: failed command: SMART
Jan 16 07:29:58 gargamel kernel: ata10.00: failed command: SMART
Jan 16 07:30:15 gargamel kernel: ata10.01: failed command: SMART
Jan 16 07:30:32 gargamel kernel: ata10.02: failed command: SMART
Jan 16 08:00:55 gargamel kernel: ata10.00: failed command: SMART
Jan 16 08:01:12 gargamel kernel: ata10.01: failed command: SMART
Jan 16 08:01:29 gargamel kernel: ata10.02: failed command: SMART
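
To see the schedule in a log like the one above, the failures can be grouped
into bursts per port. This is a minimal sketch: the year is not in the syslog
timestamp, so it is passed in as an assumption, and the 120-second gap used to
separate bursts is an arbitrary choice that happens to fit this log. The
function name burst_starts is made up.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# Jan 16 05:54:23 gargamel kernel: ata9.00: failed command: SMART
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (ata\d+)\.\d+: failed command: \w+"
)


def burst_starts(lines, year=2011, gap_seconds=120):
    """Return {port: [datetime of the first failure in each burst]}.

    A new burst starts when a port has been quiet for more than gap_seconds.
    """
    last_seen = {}
    starts = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        port = match.group(2)
        quiet = port not in last_seen or \
            (stamp - last_seen[port]).total_seconds() > gap_seconds
        if quiet:
            starts[port].append(stamp)
        last_seen[port] = stamp
    return dict(starts)


if __name__ == "__main__":
    sample = [
        "Jan 16 05:54:23 gargamel kernel: ata9.00: failed command: SMART",
        "Jan 16 05:54:31 gargamel kernel: ata9.01: failed command: SMART",
        "Jan 16 06:05:04 gargamel kernel: ata9.00: failed command: SMART",
        "Jan 16 06:05:44 gargamel kernel: ata10.00: failed command: SMART",
    ]
    for port, stamps in burst_starts(sample).items():
        print(port, [s.strftime("%H:%M:%S") for s in stamps])
```

In the log above, the ata9 bursts begin at 05:54:23, 06:05:04, 06:16:35 and
06:38:16, and the ata10 bursts at 06:05:44, 06:27:25, 06:59:05, 07:29:58 and
08:00:55, so the later ata10 bursts are about 31 minutes apart.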

A full error looks like this:
ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.15: exception Emask 0x4 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.02: failed command: SMART
ata10.02: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.02: status: { DRDY }
ata10.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.15: hard resetting link
ata10.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.05: limiting SATA link speed to 1.5 Gbps
ata10.00: hard resetting link
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
ata10.01: hard resetting link
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: hard resetting link
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: hard resetting link
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: hard resetting link
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: hard resetting link
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata10.00: configured for UDMA/100
ata10.01: configured for UDMA/100
ata10.02: qc timeout (cmd 0xec)
ata10.02: failed to IDENTIFY (I/O error, err_mask=0x5)
ata10.02: revalidation failed (errno=-5)
ata10.15: hard resetting link
ata10.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata10.00: hard resetting link
ata10.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
ata10.01: hard resetting link
ata10.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.02: hard resetting link
ata10.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.03: hard resetting link
ata10.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.04: hard resetting link
ata10.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10.05: hard resetting link
ata10.05: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata10.00: configured for UDMA/100
ata10.01: configured for UDMA/100
ata10.02: configured for UDMA/100
ata10.03: configured for UDMA/100
ata10.04: configured for UDMA/100
ata10: EH complete
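
The Emask, action, SStatus and SControl values in the log above are bit
fields, so they can be decoded mechanically. The sketch below is a minimal
Python decoder. It assumes the AC_ERR_* and ATA_EH_* bit assignments as I
recall them from include/linux/libata.h (worth checking against the 2.6.36
tree), and the usual SATA SCR layout with DET in bits 3:0, SPD in bits 7:4 and
IPM in bits 11:8. The helper names decode_mask and decode_scr are made up.

```python
# Error mask bits (enum ata_completion_errors), assumed from libata.h.
AC_ERR_BITS = {
    0x001: "AC_ERR_DEV", 0x002: "AC_ERR_HSM", 0x004: "AC_ERR_TIMEOUT",
    0x008: "AC_ERR_MEDIA", 0x010: "AC_ERR_ATA_BUS", 0x020: "AC_ERR_HOST_BUS",
    0x040: "AC_ERR_SYSTEM", 0x080: "AC_ERR_INVALID", 0x100: "AC_ERR_OTHER",
    0x200: "AC_ERR_NODEV_HINT", 0x400: "AC_ERR_NCQ",
}

# EH action bits; only the three lowest are listed here.
EH_ACTION_BITS = {
    0x1: "ATA_EH_REVALIDATE", 0x2: "ATA_EH_SOFTRESET", 0x4: "ATA_EH_HARDRESET",
}

# SPD field values seen in this log.
LINK_SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps"}


def decode_mask(value, table):
    """Return the names of all bits from table that are set in value."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_scr(hex_text):
    """Split an SStatus/SControl value (printed in hex) into DET/SPD/IPM."""
    raw = int(hex_text, 16)
    return {"det": raw & 0xF, "spd": (raw >> 4) & 0xF, "ipm": (raw >> 8) & 0xF}


if __name__ == "__main__":
    for emask in (0x4, 0x40, 0x100, 0x5):
        print(hex(emask), decode_mask(emask, AC_ERR_BITS))
    print("action 0x6:", decode_mask(0x6, EH_ACTION_BITS))
    for sstatus in ("123", "113"):
        fields = decode_scr(sstatus)
        print(sstatus, fields, LINK_SPEEDS.get(fields["spd"], "unknown"))
```

Read this way, Emask 0x4 on ata10.15 is a timeout, Emask 0x40 on the SCR reads
is AC_ERR_SYSTEM, Emask 0x100 on the fan-out links is AC_ERR_OTHER, action 0x6
is soft plus hard reset, and SStatus 113 on ata10.05 is a link that came up at
1.5 Gbps after the speed limit was applied.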


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

Thread overview: 6+ messages
2011-01-16 16:39 Marc MERLIN [this message]
2011-01-17 13:26 ` PMP SMART error recovery and failure code decoding help Tejun Heo
2011-01-17 16:43   ` Marc MERLIN
2011-01-17 17:12     ` Tejun Heo
2011-01-17 17:29       ` Marc MERLIN
2011-06-29 17:14         ` Marc MERLIN
