From: Marc MERLIN <marc@merlins.org>
To: Tejun Heo <htejun@gmail.com>
Cc: Tejun Heo <tj@kernel.org>, linux-ide@vger.kernel.org
Subject: Re: help with PMP failures
Date: Tue, 17 Nov 2009 23:41:45 -0800
Message-ID: <20091118074145.GA24488@merlins.org>
In-Reply-To: <4B0371F4.9060703@gmail.com>

On Wed, Nov 18, 2009 at 01:03:00PM +0900, Tejun Heo wrote:
> Hello,
> 
> 11/18/2009 02:39 AM, Marc MERLIN wrote:
> >> This is the actual failure.  Your 6.02 drive reported media error
> >> which combined with the controller errata caused port wide failure.
> >  
> > Ah, I see, so it should be the one for me to focus on.
> > If it hadn't had an error, everything wouldn't have gone down the toilet,
> > next, right?
> 
> Yes, that's my guess.
> 
> > scsi 6:2:0:0: Direct-Access     ATA      Hitachi HDS72101 GKAO PQ: 0 ANSI: 5
> > sd 6:2:0:0: [sdj] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
> >
> > If it's a media error, shouldn't it show up in the smart counters?
> 
> Does smartctl -a output shows any logged errors?

Didn't know about -a, good call.
Yep, it does, and the times are roughly consistent with the RAID going down.

Not sure whether the last 5 errors are enough to give a good clue.

If I can't quite figure it out, I'll just pop in a 16-port Adaptec SATA
board I recently picked up. It'll take PMP out of the equation if I truly
have drives returning read/write errors that don't quite seem to show up
in SMART yet.

  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       18580

 
SMART Error Log Version: 1
ATA Error Count: 47 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
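
For readers comparing the timestamps in the log entries below, here is a minimal Python sketch of the DDd+hh:mm:SS.sss format described in the legend above. The function name `parse_powered_up_time` and the constant `WRAP_MS` are made up for illustration and are not part of smartctl. The wrap constant assumes the counter is a 32-bit millisecond value, which is consistent with the stated 49.710 days but is an assumption here.

```python
import re

# Assumption: the counter is 32 bits of milliseconds, so it wraps at 2**32 ms.
WRAP_MS = 2 ** 32


def parse_powered_up_time(stamp):
    """Convert a 'DDd+hh:mm:SS.sss' string into milliseconds since power on."""
    m = re.fullmatch(r"(\d+)d\+(\d{2}):(\d{2}):(\d{2})\.(\d{3})", stamp)
    if m is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    days, hours, minutes, seconds, millis = (int(g) for g in m.groups())
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_seconds * 1000 + millis


# Length of one wrap period in days (2**32 ms expressed in days).
wrap_days = WRAP_MS / (1000 * 60 * 60 * 24)

# Gap between the failing commands of two log entries, in milliseconds.
gap_ms = (parse_powered_up_time("13d+19:41:56.500")
          - parse_powered_up_time("13d+19:27:26.400"))

if __name__ == "__main__":
    print(f"wrap period: {wrap_days:.3f} days")
    print(f"gap between the two errors: {gap_ms / 1000:.1f} s")
```

Because the value wraps, differences are only meaningful between entries from the same power cycle and the same wrap period.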

Error 47 occurred at disk power-on lifetime: 18547 hours (772 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 59 70 44

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 3f 59 70 40 08  13d+19:41:56.500  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 08  13d+19:41:56.500  FLUSH CACHE EXIT
  ea 00 00 00 00 00 a0 08  13d+19:41:56.200  FLUSH CACHE EXIT
  61 08 00 3f 59 70 40 08  13d+19:41:56.200  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 08  13d+19:41:56.200  FLUSH CACHE EXIT

Error 46 occurred at disk power-on lifetime: 18546 hours (772 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.
  
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 59 70 44
  
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 3f 59 70 40 08  13d+19:27:26.400  WRITE FPDMA QUEUED
  27 00 00 00 00 00 e0 08  13d+19:27:26.400  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 08  13d+19:27:26.400  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 08  13d+19:27:26.400  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 08  13d+19:27:26.400  READ NATIVE MAX ADDRESS EXT

Error 45 occurred at disk power-on lifetime: 18546 hours (772 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.
  
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 59 70 44
  
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 3f 59 70 40 08  13d+19:27:26.100  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 08  13d+19:27:26.100  FLUSH CACHE EXIT
  ea 00 00 00 00 00 a0 08  13d+19:27:25.800  FLUSH CACHE EXIT
  61 08 00 3f 59 70 40 08  13d+19:27:25.800  WRITE FPDMA QUEUED
  27 00 00 00 00 00 e0 08  13d+19:27:25.800  READ NATIVE MAX ADDRESS EXT

Error 44 occurred at disk power-on lifetime: 18546 hours (772 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.
  
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 59 70 44
  
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 3f 59 70 40 08  13d+19:27:24.900  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 08  13d+19:27:24.900  FLUSH CACHE EXIT
  b0 d0 01 00 4f c2 00 08  13d+19:20:22.300  SMART READ DATA
  b0 d8 00 00 4f c2 00 08  13d+19:20:22.100  SMART ENABLE OPERATIONS
  e5 00 00 00 00 00 00 08  13d+19:20:22.100  CHECK POWER MODE

Error 43 occurred at disk power-on lifetime: 18546 hours (772 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.
  
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 46 59 70 44
  
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 00 3f 59 70 40 08  13d+19:12:24.500  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 08  13d+19:12:24.500  FLUSH CACHE EXIT
  ea 00 00 00 00 00 a0 08  13d+19:12:23.600  FLUSH CACHE EXIT
  61 08 00 3f 59 70 40 08  13d+19:12:23.600  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 a0 08  13d+19:12:23.600  FLUSH CACHE EXIT
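
All five entries above report the same registers after completion (ER=84, ST=51, SN/CL/CH/DH = 46 59 70 44). As a reading aid, here is a minimal Python sketch that decodes those bytes using the classic ATA bit names. The helper names are made up for illustration, and the bit tables use the legacy names (several bits are obsolete or command-dependent in newer ATA revisions), so treat the output as a hint rather than a diagnosis. Under that naming, 0x84 decodes to ICRC plus ABRT, i.e. the interface CRC flag rather than UNC, the uncorrectable-media flag. The LBA is rebuilt from the 28 bits the five-entry log keeps, which may be incomplete for 48-bit commands such as WRITE FPDMA QUEUED.

```python
# Legacy ATA bit names, index = bit position (assumption: classic naming).
ERROR_BITS = ["AMNF", "NM", "ABRT", "MCR", "IDNF", "MC", "UNC", "ICRC"]
STATUS_BITS = ["ERR", "IDX", "CORR", "DRQ", "DSC", "DF", "DRDY", "BSY"]


def decode_bits(value, names):
    """Return the names of the set bits in value, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def lba28(sn, cl, ch, dh):
    """Rebuild a 28-bit LBA from the SN, CL, CH and DH register bytes."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes copied from the "After command completion" rows above.
names = "ER ST SC SN CL CH DH".split()
values = [int(x, 16) for x in "84 51 00 46 59 70 44".split()]
regs = dict(zip(names, values))

error_flags = decode_bits(regs["ER"], ERROR_BITS)
status_flags = decode_bits(regs["ST"], STATUS_BITS)
lba = lba28(regs["SN"], regs["CL"], regs["CH"], regs["DH"])

if __name__ == "__main__":
    print("error: ", ", ".join(error_flags))
    print("status:", ", ".join(status_flags))
    print(f"LBA (28-bit view): 0x{lba:08x} = {lba}")
```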

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     18565         -
# 2  Short offline       Completed without error       00%     18535         -
# 3  Short offline       Completed without error       00%     18511         -
# 4  Extended offline    Completed without error       00%     18492         -
# 5  Short offline       Completed without error       00%     18487         -
# 6  Short offline       Completed without error       00%     18463         -
# 7  Short offline       Completed without error       00%     18439         -
# 8  Short offline       Completed without error       00%     18415         -
# 9  Short offline       Completed without error       00%     18391         -
#10  Short offline       Completed without error       00%     18366         -
#11  Short offline       Completed without error       00%     18343         -
#12  Extended offline    Completed without error       00%     18324         -
#13  Short offline       Completed without error       00%     18319         -
#14  Short offline       Completed without error       00%     18295         -
#15  Short offline       Completed without error       00%     18271         -
#16  Short offline       Completed without error       00%     18247         -
#17  Short offline       Completed without error       00%     18223         -
#18  Short offline       Completed without error       00%     18199         -
#19  Short offline       Completed without error       00%     18178         -
#20  Short offline       Completed without error       00%     18178         -
#21  Short offline       Completed without error       00%     18178         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Thanks for looking,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

Thread overview: 5+ messages
     [not found] <20091116184242.GA22250@merlins.org>
     [not found] ` <20091116184853.GA23126@merlins.org>
     [not found]   ` <4B0238EC.6060803@kernel.org>
2009-11-17 17:39     ` help with PMP failures Marc MERLIN
2009-11-18  4:03       ` Tejun Heo
2009-11-18  7:41         ` Marc MERLIN [this message]
2009-11-18  8:33           ` Tejun Heo
2009-11-18 18:29             ` Marc MERLIN
