linux-ide.vger.kernel.org archive mirror
From: Denys Dmytriyenko <denis@denix.org>
To: Tejun Heo <htejun@gmail.com>
Cc: Mark Lord <liml@rtr.ca>, Gabor FUNK <FUNK.Gabor@hunetkft.hu>,
	linux-ide@vger.kernel.org, Jim Paris <jim@jtan.com>
Subject: Re: sata_sil24 stability and performance
Date: Thu, 20 Mar 2008 18:37:36 -0400
Message-ID: <20080320223736.GA19940@denix.org>
In-Reply-To: <47DF63C1.5090205@gmail.com>

Hi,

On Tue, Mar 18, 2008 at 03:40:01PM +0900, Tejun Heo wrote:
> > Error 42 occurred at disk power-on lifetime: 3444 hours (143 days + 12 hours)
> >   When the command that caused the error occurred, the device was in an unknown state.
> > 
> >   After command completion occurred, registers were:
> >   ER ST SC SN CL CH DH
> >   -- -- -- -- -- -- --
> >   84 41 28 ff 46 5a 40
> > 
> >   Commands leading to the command that caused the error were:
> >   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
> >   -- -- -- -- -- -- -- --  ----------------  --------------------
> >   60 08 28 ff 46 5a 40 00   2d+07:38:11.073  READ FPDMA QUEUED
> >   60 08 28 ff 46 5a 40 00   2d+07:38:11.073  READ FPDMA QUEUED
> >   60 08 28 ff 46 5a 40 00   2d+07:38:11.073  READ FPDMA QUEUED
> >   60 10 20 2f 47 5a 40 00   2d+07:38:11.073  READ FPDMA QUEUED
> >   60 08 18 1f 47 5a 40 00   2d+07:38:11.073  READ FPDMA QUEUED
> 
> Error 42 occurred about 21days ago.  Unless your clock is off, I don't
> think this is what you've seen but the error is UNC (uncorrectable media
> error), so it does mean that your drive has some bad sectors which can
> explain the device error you saw.
> 
> > Error 41 occurred at disk power-on lifetime: 3405 hours (141 days + 21 hours)
> >   When the command that caused the error occurred, the device was in an unknown state.
> > 
> >   After command completion occurred, registers were:
> >   ER ST SC SN CL CH DH
> >   -- -- -- -- -- -- --
> >   00 41 01 10 00 00 a0  Error:
> > 
> >   Commands leading to the command that caused the error were:
> >   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
> >   -- -- -- -- -- -- -- --  ----------------  --------------------
> >   2f 00 01 10 00 00 a0 00      12:51:00.112  READ LOG EXT
> >   60 20 20 7f 32 4c 40 00      12:51:00.081  READ FPDMA QUEUED
> >   60 08 18 6f 32 4c 40 00      12:51:00.081  READ FPDMA QUEUED
> >   60 30 10 9f 32 4c 40 00      12:51:00.081  READ FPDMA QUEUED
> >   60 08 08 5f 32 4c 40 00      12:51:00.081  READ FPDMA QUEUED
> 
> Hmm.. this one less clear.  Maybe the device wasn't expecting READ LOG
> EXT as it was still in NCQ command phase and got surprised?
> 
> Currently you're the first and only one to report illegal qc_active
> transition problem.  I'd like to know what precedes the error which
> isn't exactly easy in retrospect.  For now, please keep an eye on those
> errors and report if you can see any pattern.  And just in case, can you
> get 2.6.24 on the machine and see anything changes?

Thanks for the info. As Gabor suggested, I watched UDMA_CRC_Error_Count, and
it slowly grows only on this particular drive. Here is another recent
exception for the same drive, which looks somewhat strange:

Mar 19 22:24:29 [kernel] ata3.00: exception Emask 0x40 SAct 0x3f SErr 0x0 action 0x6 frozen
Mar 19 22:24:29 [kernel] ata3.00: irq_stat 0x00060002, PRB not on qword boundary
Mar 19 22:24:29 [kernel] ata3.00: cmd 60/08:00:27:32:f3/00:00:2c:00:00/40 tag 0 cdb 0x0 data 4096 in
Mar 19 22:24:29 [kernel]          res 50/00:00:00:00:00/00:00:00:00:00/40 Emask 0x40 (internal error)
Mar 19 22:24:29 [kernel] ata3.00: cmd 60/08:08:6f:32:f3/00:00:2c:00:00/40 tag 1 cdb 0x0 data 4096 in
Mar 19 22:24:29 [kernel]          res 50/00:00:00:00:00/00:00:00:00:00/40 Emask 0x40 (internal error)
Mar 19 22:24:29 [kernel] ata3.00: cmd 60/08:10:67:32:f3/00:00:2c:00:00/40 tag 2 cdb 0x0 data 4096 in
Mar 19 22:24:29 [kernel]          res 50/00:00:00:00:00/00:00:00:00:00/40 Emask 0x40 (internal error)
Mar 19 22:24:29 [kernel] ata3.00: cmd 60/08:18:37:32:f3/00:00:2c:00:00/40 tag 3 cdb 0x0 data 4096 in
Mar 19 22:24:29 [kernel]          res 50/00:00:00:00:00/00:00:00:00:00/40 Emask 0x40 (internal error)
Mar 19 22:24:29 [kernel] ata3.00: cmd 60/08:20:47:32:f3/00:00:2c:00:00/40 tag 4 cdb 0x0 data 4096 in
Mar 19 22:24:29 [kernel]          res 50/00:00:00:00:00/00:00:00:00:00/40 Emask 0x40 (internal error)
Mar 19 22:24:29 [kernel] ata3.00: cmd 60/10:28:77:32:f3/00:00:2c:00:00/40 tag 5 cdb 0x0 data 8192 in
Mar 19 22:24:29 [kernel]          res 50/00:00:00:00:00/00:00:00:00:00/40 Emask 0x40 (internal error)
Mar 19 22:24:29 [kernel] ata3: hard resetting port
Mar 19 22:24:31 [kernel] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 19 22:24:31 [kernel] ata3.00: configured for UDMA/100
Mar 19 22:24:31 [kernel] ata3: EH pending after completion, repeating EH (cnt=4)
Mar 19 22:24:31 [kernel] ata3: exception Emask 0x2 SAct 0x0 SErr 0x0 action 0x2
Mar 19 22:24:31 [kernel] ata3: irq_stat 0x00060002, protocol mismatch
Mar 19 22:24:31 [kernel] ata3: soft resetting port
Mar 19 22:24:31 [kernel] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 19 22:24:31 [kernel] ata3.00: configured for UDMA/100
Mar 19 22:24:31 [kernel] ata3: EH complete
Mar 19 22:24:31 [kernel] sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Mar 19 22:24:31 [kernel] sd 2:0:0:0: [sdc] Write Protect is off
Mar 19 22:24:31 [kernel] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Mar 19 22:24:31 [kernel] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 19 22:24:31 [kernel] sd 2:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Mar 19 22:24:31 [kernel] sd 2:0:0:0: [sdc] Write Protect is off
Mar 19 22:24:31 [kernel] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Mar 19 22:24:31 [kernel] sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
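
For reading the "cmd" lines in the exception above, here is a minimal Python
sketch that pulls the NCQ tag, starting LBA and sector count out of one line,
and lists the tags set in SAct. It assumes the usual libata report layout
(cmd CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV)
and the ATA convention that FPDMA QUEUED commands carry the sector count in
the feature registers and the tag in bits 7:3 of the count register. The
function names are made up for illustration.

```python
import re

# Assumed libata layout: cmd CMD/5 taskfile bytes/5 HOB bytes/DEV
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)


def decode_ncq_cmd(line):
    """Decode tag, LBA and sector count from a libata 'cmd' log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd field found")
    command = int(m.group(1), 16)
    if command not in (0x60, 0x61):  # READ / WRITE FPDMA QUEUED
        raise ValueError("not an FPDMA QUEUED command")
    feat, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    hfeat, _hnsect, hlbal, hlbam, hlbah = (
        int(x, 16) for x in m.group(3).split(":")
    )
    # 48-bit LBA: HOB bytes are the upper 24 bits.
    lba = (
        (hlbah << 40) | (hlbam << 32) | (hlbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    # For NCQ the sector count lives in the feature registers; 0 means 65536.
    sectors = ((hfeat << 8) | feat) or 65536
    tag = nsect >> 3  # tag is in bits 7:3 of the count register
    return {"tag": tag, "lba": lba, "sectors": sectors}


def active_tags(sact):
    """Return the NCQ tags whose bits are set in an SAct value."""
    return [bit for bit in range(32) if sact & (1 << bit)]
```

Applied to the log above, SAct 0x3f corresponds to tags 0 through 5, which
matches the six "cmd" lines reported, and the decoded sector counts (8 and 16)
agree with the "data 4096" and "data 8192" sizes for 512-byte sectors.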

Any ideas what this might be? I'll definitely try replacing the cable and see
what happens.
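
To compare UDMA_CRC_Error_Count before and after a cable swap, a small Python
sketch like the following could extract the raw value from saved "smartctl -A"
output. It assumes the standard ten-column attribute table (ID#,
ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED,
RAW_VALUE); the function names are hypothetical, and the text to parse has to
come from the real drive.

```python
def crc_error_count(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count from 'smartctl -A' text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed layout: attribute name in column 2, raw value in column 10.
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


def crc_errors_grew(before_output, after_output):
    """True if the raw CRC error counter increased between two snapshots."""
    before = crc_error_count(before_output)
    after = crc_error_count(after_output)
    if before is None or after is None:
        raise ValueError("UDMA_CRC_Error_Count not found in one snapshot")
    return after > before
```

If the counter stops growing after the cable is replaced, that points at the
link rather than the drive; if it keeps growing, the cable is less likely to
be the cause.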

BTW, issuing "smartctl -a" on a drive in standby throws this exception:

Mar 20 18:16:53 [kernel] ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Mar 20 18:16:53 [kernel] ata10.00: cmd b0/da:00:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 0 
Mar 20 18:16:53 [kernel]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 20 18:16:53 [kernel] ata10: soft resetting port
Mar 20 18:16:54 [kernel] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 20 18:16:54 [kernel] ata10.00: configured for UDMA/100
Mar 20 18:16:54 [kernel] ata10: EH complete
Mar 20 18:16:54 [kernel] sd 9:0:0:0: [sdj] 976773168 512-byte hardware sectors (500108 MB)
Mar 20 18:16:54 [kernel] sd 9:0:0:0: [sdj] Write Protect is off
Mar 20 18:16:54 [kernel] sd 9:0:0:0: [sdj] Mode Sense: 00 3a 00 00
Mar 20 18:16:54 [kernel] sd 9:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
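
One possible way to avoid poking a sleeping drive is smartctl's documented
"-n standby" option, which skips the query when the device reports STANDBY or
SLEEP. The Python sketch below only builds and runs that command line; the
helper names are made up, and whether this actually avoids the timeout on
this controller is an untested assumption, since the power-mode check itself
still sends a command to the drive.

```python
import subprocess


def smartctl_argv(device, skip_if_standby=True):
    """Build a smartctl command line that can skip drives in standby."""
    argv = ["smartctl"]
    if skip_if_standby:
        # '-n standby': do not query the drive if it is in STANDBY or SLEEP.
        argv += ["-n", "standby"]
    argv += ["-a", device]
    return argv


def query_smart(device):
    """Run smartctl on the device and return the CompletedProcess result."""
    return subprocess.run(
        smartctl_argv(device), capture_output=True, text=True, check=False
    )
```

For the drive in the log above this would be called as
query_smart("/dev/sdj"); smartctl reports through its exit status when it
skipped the drive because of the power mode.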

-- 
Denys
