From: Petr Vandrovec <vandrove@vc.cvut.cz>
To: Tejun Heo <htejun@gmail.com>
Cc: Marc Bejarano <beej@alum.mit.edu>, linux-ide@vger.kernel.org
Subject: Re: errors on shutdown with PMP
Date: Tue, 31 Jul 2007 02:16:40 -0700
Message-ID: <46AEFDF8.9010904@vc.cvut.cz>
In-Reply-To: <46AEBB71.9040003@gmail.com>
Tejun Heo wrote:
> Marc Bejarano wrote:
>> At 03:33 7/28/2007, Tejun Heo wrote:
>>> Device times out write.
>> odd that it would be able to be part of an lv's filesystem that had
>> hundreds of gigabytes recently written to it and then choke on flushing
>> during shutdown.
>>
>>> And then never comes back.
>> asleep at the wheel ;)
>>
>>> Please post the result of 'smartctl -a /dev/sdX' where sdX is the device
>>> which went offline.
>> i suppose i should have seen that coming. here you go:
>> ===
>> [root@dell ~]# /usr/local/sbin/smartctl -a /dev/sdc
>> smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6
>> Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF INFORMATION SECTION ===
>> Model Family: Seagate Barracuda 7200.10 family
>> Device Model: ST3750640AS
> [--snip--]
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   090   079   006    Pre-fail  Always       -       66902364
>>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       31
>>   7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       146651228
>> 195 Hardware_ECC_Recovered  0x001a   056   049   000    Old_age   Always       -       102514302
>> 198 Offline_Uncorrectable   0x0010   099   099   000    Old_age   Offline      -       40
>
> Counters don't look too friendly. Do you happen to have another drive
> of the same model? If so, can you post smartctl -a of the drive?
Offline_Uncorrectable looks bad, as does Reallocated_Sector_Ct. The
large raw values for Raw_Read_Error_Rate, Seek_Error_Rate and
Hardware_ECC_Recovered are just how Seagates work:
gwy:~# for a in /dev/sd[a-f]; do smartctl -a "$a"; done | grep '\(Raw_Read\|Seek_Error\|Hardware_ECC\|Offline_Uncorr\|Reallocated\|^Device M\|^Firmware\)'
Device Model: Hitachi HDT725032VLA380
Firmware Version: V54OA52A
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
Device Model: Hitachi HDS721010KLA330
Firmware Version: GKAOA70F
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
Device Model: ST3750640AS
Firmware Version: 3.AAE
  1 Raw_Read_Error_Rate     0x000f   110   087   006    Pre-fail  Always       -       201790283
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       43520234
195 Hardware_ECC_Recovered  0x001a   059   050   000    Old_age   Always       -       40212951
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
Device Model: Hitachi HDS721010KLA330
Firmware Version: GKAOA70F
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
Device Model: ST3750640AS
Firmware Version: 3.AAD
  1 Raw_Read_Error_Rate     0x000f   114   083   006    Pre-fail  Always       -       121388046
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   078   065   030    Pre-fail  Always       -       78605591
195 Hardware_ECC_Recovered  0x001a   066   050   000    Old_age   Always       -       194670617
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
Device Model: Sans Digital V.36.B0D
Firmware Version: V.36.B0D
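To look only at the two attributes whose raw values matter in the
comparison above (Reallocated_Sector_Ct and Offline_Uncorrectable) and
ignore the noisy Seagate counters, something like the following could
be used. This is only a rough sketch: check_smart_raw is a made-up
helper name, and it assumes each attribute row arrives on a single line
with the raw value as a plain number in the last field, as in the
output above.

```shell
# Hypothetical helper: read smartctl attribute rows on stdin and warn
# only when Reallocated_Sector_Ct or Offline_Uncorrectable has a
# nonzero raw value. Assumes the raw value is the last field of the row.
check_smart_raw() {
    awk '
        $2 == "Reallocated_Sector_Ct" || $2 == "Offline_Uncorrectable" {
            if ($NF + 0 > 0) {
                printf "WARN %s raw=%s\n", $2, $NF
                bad = 1
            }
        }
        # Exit status is 1 if any warning was printed, 0 otherwise.
        END { exit bad }
    '
}
```

It could be fed from smartctl's attribute listing, for example
"smartctl -A /dev/sdc | check_smart_raw". With the counters quoted above,
the failing drive would trip on both attributes while the other two
ST3750640AS drives would pass.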
BTW, sdb-sde are behind a PMP, with no problems on shutdown. The
funniest part is that all these counters are 32-bit, so during the day
it looks like your disk is estimated to die in 5 days, then suddenly the
32-bit counter overflows and your disk is as healthy as possible again.
I did not measure what these counters actually count on these 750GB
drives, but on a 100GB notebook Seagate drive every sector read counts
as 3-5 ECC errors, and every SMART data interrogation as 1...
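The overflow is easy to picture with a bit of shell arithmetic. This is
just an illustration, assuming the counter is a plain unsigned 32-bit
value and that the shell does 64-bit arithmetic (bash does); wrap32 is
a made-up name, not anything the drive or smartctl provides.

```shell
# Hypothetical illustration: add a delta to a counter and keep only the
# low 32 bits, the way an unsigned 32-bit counter would wrap around.
wrap32() {
    echo $(( ($1 + $2) & 0xFFFFFFFF ))
}
```

For example, "wrap32 4294967290 10" prints 4: a counter that was nearly
at its maximum drops back to a tiny raw value after ten more events.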
Petr