From: Phil Turmel <philip@turmel.org>
To: Stone <stone@heisl.org>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Brocken Raid & LUKS
Date: Tue, 19 Feb 2013 19:31:43 -0500
Message-ID: <5124196F.6090000@turmel.org>
In-Reply-To: <5123FB71.3060509@heisl.org>
You forgot to include linux-raid again. I'm adding the list back to the
CC. Please always use "reply to all" in your email client.
I will look for your detailed reply tomorrow.
Phil
On 02/19/2013 05:23 PM, Stone wrote:
> On 19.02.2013 23:08, Phil Turmel wrote:
>> On 02/19/2013 04:31 PM, Stone wrote:
>>
>> [trim /]
>>
>>>> [trim /]
>>> OK. My system is Ubuntu 12.04.
>>> I can install an older mdadm, or install an older Ubuntu like 11.04,
>>> which has an older mdadm on board.
>> Using the older Ubuntu as a LiveCD should be fine--you don't have to
>> uninstall your current system.
>>
>> [trim /]
>>
>>> OK. Here are my next steps:
>>> I find an older mdadm, or I install an older Ubuntu with an older mdadm
>>> on board.
>>> Then I stop my md2 device and recreate it with: mdadm --create /dev/md2
>>> --assume-clean --verbose --level=5 --raid-devices=4 /dev/sdc1 /dev/sdd1
>>> missing /dev/sdf1
>> Yes. But read all the way through first....
>>
>>> With a little bit of luck, I can then open the device.
>> But *don't* mount it! Use "fsck -n" after you open it to verify it is
>> Ok. If you mount it, and the chunk size is wrong, it will damage your
>> encrypted filesystem.
>>
>>> If not, do I stop md2 and recreate it with the --chunk parameter?
>>> And with what value? Do you have a range for me?
>> The current default chunk size is 512 KiB. The old default was 64 KiB.
>> I'd try 64 if 512 doesn't work. After that you'll have to guess.
> OK, I will test this tomorrow.
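The recreate-and-check sequence discussed above can be sketched as a shell function that only prints the commands for one chunk-size guess, so each attempt can be reviewed before it is run by hand. The device list and order are copied from the mdadm command quoted above; the mapping name md2_crypt is a hypothetical choice. Nothing is mounted: only fsck -n is used, per the warning above.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the steps for one chunk-size guess.
# Chunk size is in KiB: 512 is the current mdadm default, 64 the old one.
# "md2_crypt" is a hypothetical dm-crypt mapping name.
print_chunk_attempt() {
    chunk=$1
    echo "mdadm --stop /dev/md2"
    echo "mdadm --create /dev/md2 --assume-clean --verbose --level=5 --raid-devices=4 --chunk=$chunk /dev/sdc1 /dev/sdd1 missing /dev/sdf1"
    echo "cryptsetup luksOpen /dev/md2 md2_crypt"
    # with e2fsck, -n opens the filesystem read-only and answers "no" to all questions
    echo "fsck -n /dev/mapper/md2_crypt"
    # close the mapping again before trying the next chunk size
    echo "cryptsetup luksClose md2_crypt"
}
```

Usage: run print_chunk_attempt 512 and execute the printed lines one at a time; if fsck -n reports a badly damaged filesystem, repeat with print_chunk_attempt 64.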
>>> Here is the timeout info:
>>> for x in /sys/block/sd*/device/timeout ; do echo $x ; cat $x ; done
>>> /sys/block/sda/device/timeout
>>> 30
>>> /sys/block/sdb/device/timeout
>>> 30
>>> /sys/block/sdc/device/timeout
>>> 30
>>> /sys/block/sdd/device/timeout
>>> 30
>>> /sys/block/sde/device/timeout
>>> 30
>>> /sys/block/sdf/device/timeout
>>> 30
>> OK, these are all the Linux default: 30 seconds.
>>
>>> Here is the SMART info:
>> Uh oh. Two serious issues:
>>
>>> smartctl -x /dev/sdc1
>>> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-23-generic] (local
>>> build)
>>> Copyright (C) 2002-11 by Bruce Allen,
>>> http://smartmontools.sourceforge.net
>> [trim /]
>>
>>>   5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
>>>   7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
>>>   9 Power_On_Hours          -O--CK   078   078   000    -    16219
>>>  10 Spin_Retry_Count        -O--CK   100   100   000    -    0
>>>  11 Calibration_Retry_Count -O--CK   100   253   000    -    0
>>>  12 Power_Cycle_Count       -O--CK   100   100   000    -    84
>>> 192 Power-Off_Retract_Count -O--CK   200   200   000    -    82
>>> 193 Load_Cycle_Count        -O--CK   169   169   000    -    94419
>>> 194 Temperature_Celsius     -O---K   114   106   000    -    36
>>> 196 Reallocated_Event_Count -O--CK   200   200   000    -    0
>>> 197 Current_Pending_Sector  -O--CK   200   200   000    -    2
>> Serious issue #1:
>>
>> You have unreadable sectors on sdc (Current_Pending_Sector is 2). When
>> you hit them during rebuild, sdc will be kicked out (again). They might
>> not be permanent errors, but you can't tell until the drive is given
>> fresh data to write over them.
>>
>> You have two choices:
>>
>> 1) use ddrescue to copy sdc onto a new drive, then use it in place of
>> sdc when you re-create the array, or
>>
>> 2) use badblocks to find the exact locations of the bad sectors, then
>> write zeros to those sectors using dd.
>>
>> Either way, you have lost whatever those sectors used to hold.
>>
>> [trim /]
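For choice 2, the second step can be sketched as a shell function that reads the block numbers badblocks reports and prints, without running, one dd command per block. It assumes badblocks was run read-only with -b 4096, so each number is a 4096-byte block offset that matches dd's bs=4096; the 4096 size is an assumption chosen to match the 4 KiB physical sectors of many WD Green models, so check your drive. The device given to the function must be the same one that was scanned (whole disk or partition), since the offsets are relative to it.

```shell
#!/bin/sh
# Dry-run sketch: turn a badblocks list into dd commands that zero each block.
# Assumes the list came from a read-only scan such as:
#   badblocks -b 4096 -sv -o sdc.bad /dev/sdc
# so each number is a 4096-byte block offset from the start of that device.
print_zero_cmds() {
    dev=$1
    while read -r blk; do
        case $blk in
            ''|*[!0-9]*) continue ;;  # skip blank or non-numeric lines
        esac
        # oflag=direct uses direct I/O so the write goes straight to the drive
        echo "dd if=/dev/zero of=$dev bs=4096 count=1 seek=$blk oflag=direct"
    done
}
```

Usage: print_zero_cmds /dev/sdc < sdc.bad, then review the printed commands carefully before running any of them.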
> Yes, these cheap WD Green drives. I have 4 new, better drives here that
> I will use instead. That means I will get the RAID running and then copy
> all the data onto the new drives.
>>> SCT Status Version: 3
>>> SCT Version (vendor specific): 258 (0x0102)
>>> SCT Support Level: 1
>>> Device State: Active (0)
>>> Current Temperature: 36 Celsius
>>> Power Cycle Min/Max Temperature: 33/37 Celsius
>>> Lifetime Min/Max Temperature: 33/44 Celsius
>>> Under/Over Temperature Limit Count: 0/0
>>> SCT Temperature History Version: 2
>>> Temperature Sampling Period: 1 minute
>>> Temperature Logging Interval: 1 minute
>>> Min/Max recommended Temperature: 0/60 Celsius
>>> Min/Max Temperature Limit: -41/85 Celsius
>>> Temperature History Size (Index): 478 (314)
>>>
>>> Index Estimated Time Temperature Celsius
>>> 315 2013-02-19 14:26 36 *****************
>>> ... ..(476 skipped). .. *****************
>>> 314 2013-02-19 22:23 36 *****************
>>>
>>> Warning: device does not support SCT Error Recovery Control command
>> Serious issue #2:
>>
>> Error timeout mismatch. Your cheap drives do not support Error Recovery
>> Control. That means when they run into unreadable sectors, they will
>> spend a couple of minutes trying "extra hard" to get the data.
>>
>> But Linux is only going to wait 30 seconds. Then it will reset the SATA
>> link and try again. But the drive will *not* give up its error recovery
>> effort, and will not even *talk* to the Linux driver in the meantime, so
>> the Linux driver will disconnect the drive and report errors for all
>> remaining requests. This will cause MD to kick the drive out.
>>
>> You only have one choice:
>>
>> 1) Set a long timeout in the Linux driver for the drives in your array,
>> on every boot. Something like:
>>
>> for x in /sys/block/sd[cdef]/device/timeout ; do echo 180 >$x ; done
>>
>> If you had slightly better drives, SCTERC would be supported. On
>> desktop drives it is disabled at power-up, but you would be able to
>> enable a normal 7.0-second timeout in the drives using smartctl (in a
>> script, on every boot). Enterprise "RAID" drives do this by default.
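Both remedies can be combined in a sketch of a boot-time script: try to enable a 7.0-second SCTERC timeout (smartctl's scterc values are in units of 100 ms, so 70 means 7.0 s), and if the drive refuses, raise the Linux driver timeout to 180 seconds instead. It assumes smartctl exits non-zero when the drive rejects the SCT ERC command; verify that with your smartctl version. The SMARTCTL and SYSFS_ROOT variables are hypothetical overrides added only so the function can be tried without real hardware.

```shell
#!/bin/sh
# Sketch for a boot script (run as root) covering the array members.
# Assumption: smartctl exits non-zero when the drive rejects SCT ERC;
# verify this with your smartctl version before relying on it.
# SMARTCTL and SYSFS_ROOT are hypothetical overrides for testing only.
set_error_timeouts() {
    smartctl_cmd=${SMARTCTL:-smartctl}
    sysfs=${SYSFS_ROOT:-/sys}
    for dev in "$@"; do
        name=${dev##*/}    # /dev/sdc -> sdc
        # scterc values are in units of 100 ms: 70,70 = 7.0 s for read and write
        if "$smartctl_cmd" -l scterc,70,70 "$dev" >/dev/null 2>&1; then
            echo "$name: SCT ERC set to 7.0 seconds"
        else
            # drive cannot limit its own recovery time: make Linux wait longer
            echo 180 > "$sysfs/block/$name/device/timeout"
            echo "$name: no SCT ERC, driver timeout set to 180 seconds"
        fi
    done
}
```

Usage: set_error_timeouts /dev/sdc /dev/sdd /dev/sde /dev/sdf, called from a script that runs on every boot.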
>>
>> [trim /]
>>
>>> smartctl -x /dev/sdd1
>>> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-23-generic] (local
>>> build)
>>> Copyright (C) 2002-11 by Bruce Allen,
>>> http://smartmontools.sourceforge.net
>> [trim /]
>>
>>> SMART Attributes Data Structure revision number: 16
>>> Vendor Specific SMART Attributes with Thresholds:
>>> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>>>   1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    534
>>>   3 Spin_Up_Time            POS--K   172   171   021    -    6383
>>>   4 Start_Stop_Count        -O--CK   100   100   000    -    586
>>>   5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    2
>> You already have two relocations on this drive.
>>
>>>   7 Seek_Error_Rate         -OSR-K   100   253   000    -    0
>>>   9 Power_On_Hours          -O--CK   085   085   000    -    11487
>> In less than two years. You should pay close attention to this.
>>
>> Phil
> I think I need to learn to interpret the SMART values better.
> Thank you.
> I will send you my new info with the older mdadm version tomorrow.
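As a starting point for reading SMART output, a sketch of a filter that prints only the two attributes Phil flagged above when their raw value is non-zero: Reallocated_Sector_Ct (sectors already remapped) and Current_Pending_Sector (unreadable sectors waiting to be rewritten). It assumes the raw value is the last field on each attribute line, which holds for these two attributes in the smartctl output quoted above.

```shell
#!/bin/sh
# Sketch: print the warning attributes from smartctl output whose raw value
# is non-zero. Assumes the raw value is the last field of the attribute line.
smart_warnings() {
    awk '($2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector") && $NF + 0 > 0 {
        print $2 "=" $NF
    }'
}
```

Usage: smartctl -x /dev/sdc | smart_warnings. On the output quoted above, this would report Current_Pending_Sector=2 for sdc and Reallocated_Sector_Ct=2 for sdd.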