From: Petr Vandrovec <vandrove@vc.cvut.cz>
To: Tejun Heo <htejun@gmail.com>
Cc: Marc Bejarano <beej@alum.mit.edu>, linux-ide@vger.kernel.org
Subject: Re: errors on shutdown with PMP
Date: Tue, 31 Jul 2007 02:16:40 -0700
Message-ID: <46AEFDF8.9010904@vc.cvut.cz>
In-Reply-To: <46AEBB71.9040003@gmail.com>
Tejun Heo wrote:
> Marc Bejarano wrote:
>> At 03:33 7/28/2007, Tejun Heo wrote:
>>> Device times out write.
>> odd that it would be able to be part of an lv's filesystem that had
>> hundreds of gigabytes recently written to it and then choke on flushing
>> during shutdown.
>>
>>> And then never comes back.
>> asleep at the wheel ;)
>>
>>> Please post the result of 'smartctl -a /dev/sdX' where sdX is the device
>>> which went offline.
>> i suppose i should have seen that coming. here you go:
>> ===
>> [root@dell ~]# /usr/local/sbin/smartctl -a /dev/sdc
>> smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF INFORMATION SECTION ===
>> Model Family: Seagate Barracuda 7200.10 family
>> Device Model: ST3750640AS
> [--snip--]
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   090   079   006    Pre-fail  Always       -       66902364
>>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       31
>>   7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       146651228
>> 195 Hardware_ECC_Recovered  0x001a   056   049   000    Old_age   Always       -       102514302
>> 198 Offline_Uncorrectable   0x0010   099   099   000    Old_age   Offline      -       40
>
> Counters don't look too friendly. Do you happen to have another drive
> of the same model? If so, can you post smartctl -a of the drive?
Offline_Uncorrectable looks bad, and so does Reallocated_Sector_Ct...
The large raw values for Raw_Read_Error_Rate, Seek_Error_Rate and
Hardware_ECC_Recovered, on the other hand, are just how Seagates work:
gwy:~# for a in /dev/sd[a-f]; do smartctl -a $a; done |
    grep '\(Raw_Read\|Seek_Error\|Hardware_ECC\|Offline_Uncorr\|Reallocated\|^Device M\|^Firmware\)'
Device Model:     Hitachi HDT725032VLA380
Firmware Version: V54OA52A
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
Device Model:     Hitachi HDS721010KLA330
Firmware Version: GKAOA70F
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
Device Model:     ST3750640AS
Firmware Version: 3.AAE
  1 Raw_Read_Error_Rate     0x000f   110   087   006    Pre-fail  Always       -       201790283
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       43520234
195 Hardware_ECC_Recovered  0x001a   059   050   000    Old_age   Always       -       40212951
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
Device Model:     Hitachi HDS721010KLA330
Firmware Version: GKAOA70F
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
Device Model:     ST3750640AS
Firmware Version: 3.AAD
  1 Raw_Read_Error_Rate     0x000f   114   083   006    Pre-fail  Always       -       121388046
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   078   065   030    Pre-fail  Always       -       78605591
195 Hardware_ECC_Recovered  0x001a   066   050   000    Old_age   Always       -       194670617
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
Device Model:     Sans Digital V.36.B0D
Firmware Version: V.36.B0D
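A rough sketch of how one might read output like this: what smartctl
treats as a failing attribute is the normalized VALUE dropping to THRESH
or below, not a large raw number. The helper name smart_margin is made up
for illustration, the sample rows are the ST3750640AS (firmware 3.AAE)
ones from the listing, and the sketch assumes each attribute sits on a
single unwrapped line.

```shell
# Hypothetical helper: read smartctl attribute rows on stdin and print
# "NAME VALUE THRESH MARGIN STATUS", where MARGIN = VALUE - THRESH.
# Assumes the usual 10-column layout, one attribute per line.
# Assumption: a threshold of 0 is treated as "can never fail".
smart_margin() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        value = $4 + 0; thresh = $6 + 0
        status = (thresh > 0 && value <= thresh) ? "FAILING" : "ok"
        printf "%s %d %d %d %s\n", $2, value, thresh, value - thresh, status
    }'
}

# Sample rows copied from the ST3750640AS (firmware 3.AAE) output.
smart_margin <<'EOF'
  1 Raw_Read_Error_Rate     0x000f   110   087   006    Pre-fail  Always       -       201790283
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       43520234
195 Hardware_ECC_Recovered  0x001a   059   050   000    Old_age   Always       -       40212951
EOF
```

On a live system the same function could be fed from
"smartctl -A /dev/sdX" instead of the here-document.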
BTW, sdb-sde are behind a PMP, and I see no problems on shutdown. The
funniest part is that all these counters are 32-bit, so during the day it
looks like your disk is estimated to die in 5 days, then suddenly the
32-bit counter overflows and your disk is as healthy as possible again.
I did not measure what these counters actually count on these 750GB
drives, but on a 100GB notebook Seagate drive every sector read counts as
3-5 ECC errors, and every SMART data interrogation as 1...
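
To put a rough number on the overflow, here is a sketch of the
arithmetic. It assumes a 32-bit raw counter that grows by 4 per sector
read (the middle of the 3-5 seen on the notebook drive; the 750GB drives
may well count differently) and 512-byte sectors. The function names are
made up for illustration.

```shell
# Hypothetical helpers illustrating 32-bit counter wraparound.
# Needs a shell with 64-bit arithmetic (e.g. bash).

# Add an increment to a counter and keep only the low 32 bits.
wrap32() {
    echo $(( ($1 + $2) & 0xFFFFFFFF ))
}

# Number of sector reads until the counter wraps, given the current raw
# value ($1) and an assumed count per sector read ($2), rounded up.
reads_until_wrap() {
    echo $(( (4294967296 - $1 + $2 - 1) / $2 ))
}

wrap32 4294967290 10      # counter passes 2^32 and restarts near zero
reads_until_wrap 0 4      # reads needed starting from a zeroed counter
# The same figure expressed in GiB, assuming 512-byte sectors.
echo $(( $(reads_until_wrap 0 4) * 512 / 1073741824 ))
```

Under these assumptions the counter would wrap after about 512 GiB of
reads, which is less than one full pass over a 750GB drive.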
Petr
Thread overview: 7+ messages
2007-07-27 21:53 errors on shutdown with PMP Marc Bejarano
2007-07-28 7:33 ` Tejun Heo
2007-07-31 3:13 ` Marc Bejarano
2007-07-31 4:32 ` Tejun Heo
2007-07-31 9:16 ` Petr Vandrovec [this message]
2007-07-31 16:23 ` Marc Bejarano
2007-08-01 3:46 ` Tejun Heo