linux-raid.vger.kernel.org archive mirror
* Failed during rebuild (raid5)
@ 2013-05-03 11:23 Andreas Boman
  2013-05-03 11:38 ` Benjamin ESTRABAUD
  2013-05-03 12:26 ` Ole Tange
  0 siblings, 2 replies; 24+ messages in thread
From: Andreas Boman @ 2013-05-03 11:23 UTC (permalink / raw)
  To: linux-raid

I have (had?) a raid 5 array, with 5 disks (1.5TB/EA), smartd warned one 
was getting bad, so I replaced it with an identical disk.

I issued mdadm --manage --add /dev/md127 /dev/sdX

The array seemed to be rebuilding, was at around 15% when I went to bed.

This morning I came up to see the array degraded with two missing 
drives, another failed during the rebuild.

I powered the system down, and since I still have the disk smartd flagged 
as bad, I tried to just plug that back in and power up, hoping to see the 
array come back up - no such luck (not enough disks).

I powered the system down again, and now I'm trying to evaluate my best 
options to recover. Hoping to have some good advice in my inbox when I 
get back from the office. I'll be able to boot the thing up and get log 
info this afternoon.

Thanks!
Andreas

(please cc me, not subscribed)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 11:23 Failed during rebuild (raid5) Andreas Boman
@ 2013-05-03 11:38 ` Benjamin ESTRABAUD
  2013-05-03 12:40   ` Robin Hill
  2013-05-03 12:26 ` Ole Tange
  1 sibling, 1 reply; 24+ messages in thread
From: Benjamin ESTRABAUD @ 2013-05-03 11:38 UTC (permalink / raw)
  To: linux-raid; +Cc: Andreas Boman

On 03/05/13 12:23, Andreas Boman wrote:
> I have (had?) a raid 5 array, with 5 disks (1.5TB/EA), smartd warned 
> one was getting bad, so I replaced it with an identical disk.
> I issued mdadm --manage --add /dev/md127 /dev/sdX
>
> The array seemed to be rebuilding, was at around 15% when I went to bed.
>
> This morning I came up to see the array degraded with two missing 
> drives, another failed during the rebuild.
>
> I powered the system down, and since I still have the disk smartd flagged 
> as bad, I tried to just plug that back in and power up, hoping to see the 
> array come back up - no such luck (not enough disks).
>
Unfortunately this happens way too often: your RAID members silently 
fail over time. They will get some bad blocks, and you won't know about 
it until you try to read or write one of the bad blocks. When that 
happens, a disk will get kicked out. At this stage you'll replace the 
disk, not knowing that other areas of the other RAID members have also 
failed. The only sensible option is to run a RAID 6, which dramatically 
reduces the potential for double failure, or to run a RAID 5 but run a 
weekly (at least) check of the entire array for bad blocks, carefully 
monitoring the SMART-reported changes after running the check (trying to 
read the entire array will cause any bad blocks to be detected and 
reallocated).
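
For what it's worth, on most setups such a check can be kicked off by 
hand along these lines (md127 taken from your mail; exact paths may 
differ a little between distros):

    # read every block of the array; md rewrites unreadable sectors from
    # parity, which surfaces and fixes latent bad blocks
    echo check > /sys/block/md127/md/sync_action

    # watch progress
    cat /proc/mdstat

    # number of mismatches found, once the pass has finished
    cat /sys/block/md127/md/mismatch_cnt
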
> I powered the system down again, and now I'm trying to evaluate my 
> best options to recover. Hoping to have some good advice in my inbox 
> when I get back from the office. I'll be able to boot the thing up and 
> get log info this afternoon.
>
I had to recover an array like that twice. The most important thing is 
probably to mitigate the data loss on the second drive that is failing 
right now by "ddrescueing" all of its data onto another drive before it 
gets more damaged (the longer the failing drive stays online, the less 
chance you have). Use GNU ddrescue for that purpose.
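
A typical two-pass invocation would look something like this (sdX being 
the failing member and sdY the brand-new drive here - these names are 
placeholders, double-check yours before running anything):

    # pass 1: copy everything that reads cleanly, skip around errors,
    # and keep a log so the run can be interrupted and resumed
    ddrescue -f -n /dev/sdX /dev/sdY /root/rescue.log

    # pass 2: go back over the bad areas with direct I/O and a few retries
    ddrescue -f -d -r3 /dev/sdX /dev/sdY /root/rescue.log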

Once you have rescued the failing drive onto a new one, you could then 
try to add that new recovered drive in place of the failing one and 
start the resync as you did before.

Note that it would probably be worthwhile to ddrescue the initial drive 
that you took out (if it is still good enough to do so) in case the 
second drive cannot be recovered correctly or is missing some data.

Regards,
Ben.

> Thanks!
> Andreas
>
> (please cc me, not subscribed)
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 11:23 Failed during rebuild (raid5) Andreas Boman
  2013-05-03 11:38 ` Benjamin ESTRABAUD
@ 2013-05-03 12:26 ` Ole Tange
  2013-05-04 11:29   ` Andreas Boman
  2013-05-05 14:00   ` Andreas Boman
  1 sibling, 2 replies; 24+ messages in thread
From: Ole Tange @ 2013-05-03 12:26 UTC (permalink / raw)
  To: Andreas Boman; +Cc: linux-raid

On Fri, May 3, 2013 at 1:23 PM, Andreas Boman <aboman@midgaard.us> wrote:

> This morning I came up to see the array degraded with two missing drives,
> another failed during the rebuild.

I just started this page for dealing with situations like yours:
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID


/Ole

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 11:38 ` Benjamin ESTRABAUD
@ 2013-05-03 12:40   ` Robin Hill
  2013-05-03 13:52     ` John Stoffel
  0 siblings, 1 reply; 24+ messages in thread
From: Robin Hill @ 2013-05-03 12:40 UTC (permalink / raw)
  To: Andreas Boman; +Cc: linux-raid, Benjamin ESTRABAUD

[-- Attachment #1: Type: text/plain, Size: 3092 bytes --]

On Fri May 03, 2013 at 12:38:47PM +0100, Benjamin ESTRABAUD wrote:

> On 03/05/13 12:23, Andreas Boman wrote:
> > I have (had?) a raid 5 array, with 5 disks (1.5TB/EA), smartd warned 
> > one was getting bad, so I replaced it with an identical disk.
> > I issued mdadm --manage --add /dev/md127 /dev/sdX
> >
> > The array seemed to be rebuilding, was at around 15% when I went to bed.
> >
> > This morning I came up to see the array degraded with two missing 
> > drives, another failed during the rebuild.
> >
> > I powered the system down, and since I still have the disk smartd flagged 
> > as bad, I tried to just plug that back in and power up, hoping to see the 
> > array come back up - no such luck (not enough disks).
> >
> Unfortunately this happens way too often: your RAID members silently 
> fail over time. They will get some bad blocks, and you won't know about 
> it until you try to read or write one of the bad blocks. When that 
> happens, a disk will get kicked out. At this stage you'll replace the 
> disk, not knowing that other areas of the other RAID members have also 
> failed. The only sensible option is to run a RAID 6, which dramatically 
> reduces the potential for double failure, or to run a RAID 5 but run a 
> weekly (at least) check of the entire array for bad blocks, carefully 
> monitoring the SMART-reported changes after running the check (trying to 
> read the entire array will cause any bad blocks to be detected and 
> reallocated).
>
I'd recommend running a regular check anyway, whatever RAID level you're
running. I wouldn't choose to run RAID5 with more than 5 disks, or
larger than 1TB members either, but that really comes down to an
individual's cost/benefit view on the situation.

> > I powered the system down again, and now I'm trying to evaluate my 
> > best options to recover. Hoping to have some good advice in my inbox 
> > when I get back from the office. I'll be able to boot the thing up and 
> > get log info this afternoon.
> >
> I had to recover an array like that twice. The most important thing is 
> probably to mitigate the data loss on the second drive that is failing 
> right now by "ddrescueing" all of its data onto another drive before it 
> gets more damaged (the longer the failing drive stays online, the less 
> chance you have). Use GNU ddrescue for that purpose.
> 
> Once you have rescued the failing drive onto a new one, you could then 
> try to add that new recovered drive in place of the failing one and 
> start the resync as you did before.
> 
Not "add", but "force assemble using". Whenever you end up in a
situation with too many failed disks to start your array, ddrescue and
force assemble (mdadm -Af) is your safest option. If that fails you can
look at more extreme measures, but that should always be the first
approach.
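
In practice that looks roughly like this (the member names below are 
only an example - check them against "mdadm --examine" output and your 
ddrescue notes first):

    # stop whatever partial array the kernel may have put together
    mdadm --stop /dev/md127

    # force-assemble from the surviving members plus the rescued copy
    mdadm --assemble --force /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1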

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 12:40   ` Robin Hill
@ 2013-05-03 13:52     ` John Stoffel
  2013-05-03 14:51       ` Phil Turmel
  2013-05-03 16:29       ` Mikael Abrahamsson
  0 siblings, 2 replies; 24+ messages in thread
From: John Stoffel @ 2013-05-03 13:52 UTC (permalink / raw)
  To: Robin Hill; +Cc: Andreas Boman, linux-raid, Benjamin ESTRABAUD


After watching endless threads about RAID5 arrays losing a disk, and
then losing a second during the rebuild, I wonder if it would make
sense to:

- have MD automatically increase all disk timeouts when doing a
  rebuild.  The idea being that we are more tolerant of a bad sector
  when rebuilding?  The idea would be to NOT just evict disks when in
  potentially bad situations without trying really hard.  

- Automatically setup an automatic scrub of the array that happens
  weekly unless you explicitly turn it off.  This would possibly
  require changes from the distros, but if it could be made a core
  part of MD so that all the blocks in the array get read each week,
  that would help with silent failures.
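
For what it's worth, the userspace half of the second point is tiny - a 
hypothetical cron entry along these lines (array name just an example) 
is all a distro would need to ship:

    # /etc/cron.d/md-check (hypothetical): scrub md127 every Sunday at 01:30
    30 1 * * 0  root  echo check > /sys/block/md127/md/sync_action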

We've got all these compute cycles kicking around that could be used
to make things even more reliable, we should be using them in some
smart way.

John

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 13:52     ` John Stoffel
@ 2013-05-03 14:51       ` Phil Turmel
  2013-05-03 16:23         ` John Stoffel
  2013-05-03 16:29       ` Mikael Abrahamsson
  1 sibling, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2013-05-03 14:51 UTC (permalink / raw)
  To: John Stoffel; +Cc: Robin Hill, Andreas Boman, linux-raid, Benjamin ESTRABAUD

On 05/03/2013 09:52 AM, John Stoffel wrote:
> 
> After watching endless threads about RAID5 arrays losing a disk, and
> then losing a second during the rebuild, I wonder if it would make
> sense to:
> 
> - have MD automatically increase all disk timeouts when doing a
>   rebuild.  The idea being that we are more tolerant of a bad sector
>   when rebuilding?  The idea would be to NOT just evict disks when in
>   potentially bad situations without trying really hard.  

This would be counterproductive for those users who actually follow
manufacturer guidelines when selecting drives for their arrays.

Anyways, it's a policy issue that belongs in userspace.  Distros can do
this today if they want.  There's no lack of scripts in this list's
archives.

> - Automatically setup an automatic scrub of the array that happens
>   weekly unless you explicitly turn it off.  This would possibly
>   require changes from the distros, but if it could be made a core
>   part of MD so that all the blocks in the array get read each week,
>   that would help with silent failures.

I understand some distros already do this.

> We've got all these compute cycles kicking around that could be used
> to make things even more reliable, we should be using them in some
> smart way.

But the "smart way" varies with the hardware at hand.  There's no "one
size fits all" solution here.

Phil


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 14:51       ` Phil Turmel
@ 2013-05-03 16:23         ` John Stoffel
  2013-05-03 16:32           ` Roman Mamedov
  0 siblings, 1 reply; 24+ messages in thread
From: John Stoffel @ 2013-05-03 16:23 UTC (permalink / raw)
  To: Phil Turmel
  Cc: John Stoffel, Robin Hill, Andreas Boman, linux-raid,
	Benjamin ESTRABAUD

>>>>> "Phil" == Phil Turmel <philip@turmel.org> writes:

Phil> On 05/03/2013 09:52 AM, John Stoffel wrote:
>> 
>> After watching endless threads about RAID5 arrays losing a disk, and
>> then losing a second during the rebuild, I wonder if it would make
>> sense to:
>> 
>> - have MD automatically increase all disk timeouts when doing a
>> rebuild.  The idea being that we are more tolerant of a bad sector
>> when rebuilding?  The idea would be to NOT just evict disks when in
>> potentially bad situations without trying really hard.  

Phil> This would be counterproductive for those users who actually
Phil> follow manufacturer guidelines when selecting drives for their
Phil> arrays.

Well, for them - that is, drives supporting SCT ERC, etc. - you'd skip
that step.  But for those using consumer drives, it might make sense.
And I didn't say to make this change for all arrays, just for those in
a rebuilding state where losing another disk would be potentially
fatal.

Phil> Anyways, it's a policy issue that belongs in userspace.  Distros
Phil> can do this today if they want.  There's no lack of scripts in
Phil> this list's archives.

Sure, but I'm saying that MD should push the policy to default to
doing this.  You can turn it off if you like and if you know enough.

>> - Automatically setup an automatic scrub of the array that happens
>> weekly unless you explicitly turn it off.  This would possibly
>> require changes from the distros, but if it could be made a core
>> part of MD so that all the blocks in the array get read each week,
>> that would help with silent failures.

Phil> I understand some distros already do this.

>> We've got all these compute cycles kicking around that could be used
>> to make things even more reliable, we should be using them in some
>> smart way.

Phil> But the "smart way" varies with the hardware at hand.  There's
Phil> no "one size fits all" solution here.

What's the common thread?  A RAID5 loses a disk.  While rebuilding,
another disk goes south.  Poof!  The entire array is toast until you
go through a lot of manual steps to re-create it.  All I'm suggesting is
that when in a degraded state, MD automatically becomes more tolerant
of timeouts and errors and tries harder to keep going.

John


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 13:52     ` John Stoffel
  2013-05-03 14:51       ` Phil Turmel
@ 2013-05-03 16:29       ` Mikael Abrahamsson
  2013-05-03 19:29         ` John Stoffel
  1 sibling, 1 reply; 24+ messages in thread
From: Mikael Abrahamsson @ 2013-05-03 16:29 UTC (permalink / raw)
  To: John Stoffel; +Cc: linux-raid

On Fri, 3 May 2013, John Stoffel wrote:

> We've got all these compute cycles kicking around that could be used to 
> make things even more reliable, we should be using them in some smart 
> way.

I would like to see mdadm tell the user that RAID5 is not a recommended 
raid level for component drives over 200 TB in size, recommend that people 
use RAID6 instead, and ask for confirmation from the operator to go ahead 
and create RAID5 anyway if the operator really wants that.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 16:23         ` John Stoffel
@ 2013-05-03 16:32           ` Roman Mamedov
  2013-05-04 14:48             ` maurice
  0 siblings, 1 reply; 24+ messages in thread
From: Roman Mamedov @ 2013-05-03 16:32 UTC (permalink / raw)
  To: John Stoffel
  Cc: Phil Turmel, Robin Hill, Andreas Boman, linux-raid,
	Benjamin ESTRABAUD

[-- Attachment #1: Type: text/plain, Size: 660 bytes --]

On Fri, 3 May 2013 12:23:24 -0400
"John Stoffel" <john@stoffel.org> wrote:

> Sure, but I'm saying that MD should push the policy to default to
> doing this.  You can turn it off if you like and if you know enough.

I wouldn't want mdadm to suddenly change its behavior on my systems to activate
some automated waste of resources, and leave me scrambling to figure out how to
disable it, with the point of this whole change being to pander to stupid and greedy
people (because using RAID5 and not RAID6 with a set-up of six 4 TB drives
is nothing but utter stupidity and greed). Let them lose their arrays once or
twice, and learn.

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 16:29       ` Mikael Abrahamsson
@ 2013-05-03 19:29         ` John Stoffel
  2013-05-04  4:14           ` Mikael Abrahamsson
  0 siblings, 1 reply; 24+ messages in thread
From: John Stoffel @ 2013-05-03 19:29 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: John Stoffel, linux-raid

>>>>> "Mikael" == Mikael Abrahamsson <swmike@swm.pp.se> writes:

Mikael> On Fri, 3 May 2013, John Stoffel wrote:
>> We've got all these compute cycles kicking around that could be used to 
>> make things even more reliable, we should be using them in some smart 
>> way.

Mikael> I would like to see mdadm tell the user that RAID5 is not a recommended 
Mikael> raid level for component drives over 200 TB in size, recommend that people 
Mikael> use RAID6 instead, and ask for confirmation from the operator to go ahead 
Mikael> and create RAID5 anyway if the operator really wants that.

200T?  Isn't that a little big?  *grin*  Actually I think the warning
should be based on the number of devices in the array, the size of the
members, and the speed of the individual devices.

Making a RAID5 with 10 RAM disks that are 10Gb each in size and can
write at 500MB/s might not be that bad a thing to do.  I'm being a little
silly here, but I think the idea is right.  We need to account for all
three factors of a RAID5 array:  number of devices, size, and speed of
each device.

John

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 19:29         ` John Stoffel
@ 2013-05-04  4:14           ` Mikael Abrahamsson
  0 siblings, 0 replies; 24+ messages in thread
From: Mikael Abrahamsson @ 2013-05-04  4:14 UTC (permalink / raw)
  To: John Stoffel; +Cc: linux-raid

On Fri, 3 May 2013, John Stoffel wrote:

> 200T?  Isn't that a little big?  *grin* Actually I think the warning 
> should be based on the number of devices in the array, the size of the 
> members, and the speed of the individual devices.

Yeah, I meant GB.

> Making a RAID5 with 10 RAM disks that are 10Gb each in size and can 
> write at 500MB/s might not be that bad a thing to do.  I'm being a little 
> silly here, but I think the idea is right.  We need to account for all 
> three factors of a RAID5 array:  number of devices, size, and speed of 
> each device.

Well, yes, 3 devices might be ok for RAID5, but for 4 devices or more I 
would like the operator to read a page warning them about the consequences 
of RAID5 on large drives and make a really informed decision.

This page could also inform users about the mismatched SATA timeout (or 
until we actually get the SATA layer default changed from 30 to 180 
seconds), because right now I'd say 30 seconds isn't good for anything. 
RAID drives default to 7 seconds before reporting an error, making 30 
seconds too long, and consumer drives take up to 120 seconds (?), making 
30 seconds way too short.
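
Until then, the usual workaround is something along these lines (sdX is 
a placeholder; repeat for every member, and on many drives it has to be 
reapplied after every power cycle):

    # if the drive supports SCT ERC, cap its error recovery at 7 seconds
    smartctl -l scterc /dev/sdX           # query the current setting
    smartctl -l scterc,70,70 /dev/sdX     # values are in tenths of a second

    # if it does not, raise the kernel's command timeout instead
    echo 180 > /sys/block/sdX/device/timeout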

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 12:26 ` Ole Tange
@ 2013-05-04 11:29   ` Andreas Boman
  2013-05-05 14:00   ` Andreas Boman
  1 sibling, 0 replies; 24+ messages in thread
From: Andreas Boman @ 2013-05-04 11:29 UTC (permalink / raw)
  To: Ole Tange; +Cc: linux-raid

On 05/03/2013 08:26 AM, Ole Tange wrote:
> On Fri, May 3, 2013 at 1:23 PM, Andreas Boman<aboman@midgaard.us>  wrote:
>
>> This morning I came up to see the array degraded with two missing drives,
>> another failed during the rebuild.
> I just started this page for dealing with situations like yours:
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
>
>
> /Ole
Thanks Ole, that is a great resource. Thanks to everybody else who 
offered advice as well. New disks are coming in today and I will try to 
recover.

I didn't mean to set off a contentious debate about the default behavior 
of mdadm. However, having an option where mdadm can 'try harder' to 
complete would be a good thing. It should probably remain an option, to 
prevent further beating on the disk (and thus further data loss) for 
those who would prefer to remove the disk and clone it before trying to 
restore.

/Andreas





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 16:32           ` Roman Mamedov
@ 2013-05-04 14:48             ` maurice
  0 siblings, 0 replies; 24+ messages in thread
From: maurice @ 2013-05-04 14:48 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: linux-raid

On 5/3/2013 10:32 AM, Roman Mamedov wrote:
> ..
> with the point of this whole change being to pander to stupid and greedy
> people (because using RAID5 and not RAID6 with a set-up of six 4 TB drives
> is nothing but utter stupidity and greed).
More likely it is the "cheapness" of using consumer desktop drives, and then 
acting outraged when they do not perform as reliably as enterprise drives.

Of course the drive manufacturers only make those expensive enterprise 
drives to extract more money from us; they are really all the same, and 
it is all one big conspiracy!


-- 
Cheers,
Maurice Hilarius
eMail: /mhilarius@gmail.com/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-03 12:26 ` Ole Tange
  2013-05-04 11:29   ` Andreas Boman
@ 2013-05-05 14:00   ` Andreas Boman
  2013-05-05 17:16     ` Andreas Boman
  1 sibling, 1 reply; 24+ messages in thread
From: Andreas Boman @ 2013-05-05 14:00 UTC (permalink / raw)
  To: Ole Tange; +Cc: linux-raid

On 05/03/2013 08:26 AM, Ole Tange wrote:
> On Fri, May 3, 2013 at 1:23 PM, Andreas Boman<aboman@midgaard.us>  wrote:
>
>> This morning I came up to see the array degraded with two missing drives,
>> another failed during the rebuild.
> I just started this page for dealing with situations like yours:
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
>
>
> /Ole

After having ddrescue running all night, it dropped to copying at a rate 
of 512 B/s. I interrupted it and restarted it; it stays at that speed and 
shows no errors:

    Press Ctrl-C to interrupt
    Initial status (read from logfile)
    rescued:   557052 MB,  errsize:       0 B,  errors:       0
    Current status
    rescued:     1493 GB,  errsize:       0 B,  current rate:      512 B/s
       ipos:   937316 MB,   errors:       0,    average rate:   16431 kB/s
       opos:   937316 MB,     time from last successful read:       0 s
    Copying non-tried blocks...


However that is much too slow...

Then, I decided to take a look at the superblocks and to my horror 
discovered this:

# mdadm --examine /dev/sd[b-g] >>raid.status
mdadm: No md superblock detected on /dev/sdb.
mdadm: No md superblock detected on /dev/sdc.
mdadm: No md superblock detected on /dev/sdd.
mdadm: No md superblock detected on /dev/sde.
mdadm: No md superblock detected on /dev/sdf.
mdadm: No md superblock detected on /dev/sdg.

Can I recover still? What is going on here?

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-05 14:00   ` Andreas Boman
@ 2013-05-05 17:16     ` Andreas Boman
  2013-05-06  1:10       ` Sam Bingner
  2013-05-06  3:21       ` Phil Turmel
  0 siblings, 2 replies; 24+ messages in thread
From: Andreas Boman @ 2013-05-05 17:16 UTC (permalink / raw)
  To: Ole Tange; +Cc: linux-raid

On 05/05/2013 10:00 AM, Andreas Boman wrote:
> On 05/03/2013 08:26 AM, Ole Tange wrote:
>> On Fri, May 3, 2013 at 1:23 PM, Andreas Boman<aboman@midgaard.us>  
>> wrote:
>>
>>> This morning I came up to see the array degraded with two missing 
>>> drives,
>>> another failed during the rebuild.
>> I just started this page for dealing with situations like yours:
>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
>>
>>
>> /Ole
>
> After having ddrescue running all night, it dropped to copying at a 
> rate of 512 B/s. I interrupted it and restarted it; it stays at that 
> speed and shows no errors:
>
>    Press Ctrl-C to interrupt
>    Initial status (read from logfile)
>    rescued:   557052 MB,  errsize:       0 B,  errors:       0
>    Current status
>    rescued:     1493 GB,  errsize:       0 B,  current rate:      512 B/s
>       ipos:   937316 MB,   errors:       0,    average rate:   16431 kB/s
>       opos:   937316 MB,     time from last successful read:       0 s
>    Copying non-tried blocks...
>
>
> However that is much too slow...
>
> Then, I decided to take a look at the superblocks and to my horror 
> discovered this:
>
> # mdadm --examine /dev/sd[b-g] >>raid.status
> mdadm: No md superblock detected on /dev/sdb.
> mdadm: No md superblock detected on /dev/sdc.
> mdadm: No md superblock detected on /dev/sdd.
> mdadm: No md superblock detected on /dev/sde.
> mdadm: No md superblock detected on /dev/sdf.
> mdadm: No md superblock detected on /dev/sdg.
>
> Can I recover still? What is going on here?
>
> Thanks,
> Andreas
>
Turns out the superblocks are there. I ran --examine on the disk instead 
of partition. OOps.

I still have the problem with ddrescue being very slow, it is running at 
512 B/s pretty much no matter what options I use. The ddrescued disk 
does NOT have a md superblock. I tried to ddrescue -i to skip and grab 
the last 3MB or so of the disk, that seemed to work, but I still don't 
have the superblock.

How do I find/recover the superblock from the original disk?

After that is done I'll try to get the array up with 4 disks, then add 
the spare and have it rebuild. After that I'll add a disk to go to raid 6.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-05 17:16     ` Andreas Boman
@ 2013-05-06  1:10       ` Sam Bingner
  2013-05-06  3:21       ` Phil Turmel
  1 sibling, 0 replies; 24+ messages in thread
From: Sam Bingner @ 2013-05-06  1:10 UTC (permalink / raw)
  To: Andreas Boman; +Cc: Ole Tange, linux-raid@vger.kernel.org

On May 5, 2013, at 7:17 AM, "Andreas Boman" <aboman@midgaard.us> wrote:

> On 05/05/2013 10:00 AM, Andreas Boman wrote:
>> On 05/03/2013 08:26 AM, Ole Tange wrote:
>>> On Fri, May 3, 2013 at 1:23 PM, Andreas Boman<aboman@midgaard.us>  wrote:
>>> 
>>>> This morning I came up to see the array degraded with two missing drives,
>>>> another failed during the rebuild.
>>> I just started this page for dealing with situations like yours:
>>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
>>> 
>>> 
>>> /Ole
>> 
>> After having ddrescue running all night, it dropped to copying at a rate of 512 B/s. I interrupted it and restarted it; it stays at that speed and shows no errors:
>> 
>>   Press Ctrl-C to interrupt
>>          Initial status (read from logfile)
>>    rescued:   557052 MB,  errsize:       0 B,  errors:       0
>>    Current status
>>    rescued:     1493 GB,  errsize:       0 B,  current rate:      512 B/s
>>       ipos:   937316 MB,   errors:       0,    average rate:   16431 kB/s
>>          opos:   937316 MB,     time from last successful read:       0 s
>>          Copying non-tried blocks...
>> 
>> 
>> However that is much too slow...
>> 
>> Then, I decided to take a look at the superblocks and to my horror discovered this:
>> 
>> # mdadm --examine /dev/sd[b-g] >>raid.status
>> mdadm: No md superblock detected on /dev/sdb.
>> mdadm: No md superblock detected on /dev/sdc.
>> mdadm: No md superblock detected on /dev/sdd.
>> mdadm: No md superblock detected on /dev/sde.
>> mdadm: No md superblock detected on /dev/sdf.
>> mdadm: No md superblock detected on /dev/sdg.
>> 
>> Can I recover still? What is going on here?
>> 
>> Thanks,
>> Andreas
> Turns out the superblocks are there. I ran --examine on the disk instead of partition. OOps.
> 
> I still have the problem with ddrescue being very slow, it is running at 512 B/s pretty much no matter what options I use. The ddrescued disk does NOT have a md superblock. I tried to ddrescue -i to skip and grab the last 3MB or so of the disk, that seemed to work, but I still don't have the superblock.
> 
> How do I find/recover the superblock from the original disk?
> 
> After that is done I'll try to get the array up with 4 disks, then add the spare and have it rebuild. After that I'll add a disk to go to raid 6.
> 
> Thanks,
> Andreas
> 
You need to just let ddrescue run - it is probably in the area of the disk with problems.  When it gets past that, it should speed up again.

If you just want to get the rest of the disk first, you could add an entry to the log file marking the area around the current position as failed, so that ddrescue skips it and comes back to it later, but I would not do that if I were you.

Sam

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-05 17:16     ` Andreas Boman
  2013-05-06  1:10       ` Sam Bingner
@ 2013-05-06  3:21       ` Phil Turmel
       [not found]         ` <51878BD0.9010809@midgaard.us>
  1 sibling, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2013-05-06  3:21 UTC (permalink / raw)
  To: Andreas Boman; +Cc: Ole Tange, linux-raid

Hi Andreas,

On 05/05/2013 01:16 PM, Andreas Boman wrote:

[trim /]

> Turns out the superblocks are there. I ran --examine on the disk instead
> of partition. OOps.

Please share the "--examine" reports for your array, and "smartctl -x"
for each disk, and anything from dmesg/syslog that relates to your array
or errors on its members.  (Your original post did say you would be able
to get log info.)

> I still have the problem with ddrescue being very slow, it is running at
> 512 B/s pretty much no matter what options I use. The ddrescued disk
> does NOT have a md superblock. I tried to ddrescue -i to skip and grab
> the last 3MB or so of the disk, that seemed to work, but I still don't
> have the superblock.
> 
> How do I find/recover the superblock from the original disk?

Superblocks are either at/near the beginning of the block device (v1.1 &
v1.2) or near the end (v0.90 and v1.0).  If you've already recovered
beginning and end, and it's still not there, then you won't find it.

It may have to be reconstructed as part of "--create --assume-clean",
but that is a dangerous operation.  You haven't yet shared enough
information to get good advice.

> After that is done I'll try to get the array up with 4 disks, then add
> the spare and have it rebuild. After that I'll add a disk to go to raid 6.

It may be wiser to get it running degraded and take a backup, but that
remains to be seen.  You haven't shown that you know why the first
rebuild failed.  Until that is understood and addressed, you probably
won't succeed in rebuilding onto a spare.

Phil

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
       [not found]         ` <51878BD0.9010809@midgaard.us>
@ 2013-05-06 12:36           ` Phil Turmel
       [not found]             ` <5188189D.1060806@midgaard.us>
  0 siblings, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2013-05-06 12:36 UTC (permalink / raw)
  To: Andreas Boman; +Cc: linux-raid

Hi Andreas,

You dropped the list.  Please don't do that.  I added it back, and left
the end of the mail untrimmed so the list can see it.

On 05/06/2013 06:54 AM, Andreas Boman wrote:
> On 05/05/2013 11:21 PM, Phil Turmel wrote:
>> Hi Andreas,
>>
>> On 05/05/2013 01:16 PM, Andreas Boman wrote:
>>
>> [trim /]
>>
>>> Turns out the superblocks are there. I ran --examine on the disk instead
>>> of partition. OOps.
>>
>> Please share the "--examine" reports for your array, and "smartctl -x"
>> for each disk, and anything from dmesg/syslog that relates to your array
>> or errors on its members.  (Your original post did say you would be able
>> to get log info.)
> 
> The --examine for the array (as it is now) and smartctl -x for the
> failed disk are at the end of this mail.
> 
> I pasted some log snippets here:  http://pastebin.com/iqnYje1W
> This should be the interesting part:
> 
> May  2 15:50:14 yggdrasil kernel: [    7.247383] md: md127 stopped.
> May  2 15:50:14 yggdrasil kernel: [    7.794697] raid5: allocated 5334kB
> for md127
> May  2 15:50:14 yggdrasil kernel: [    7.794843] md127: detected
> capacity change from 0 to 6001196793856
> May  2 15:50:14 yggdrasil kernel: [    7.796294]  md127: unknown
> partition table
> May  2 15:54:36 yggdrasil kernel: [  287.180692] md: recovery of RAID
> array md127
> May  2 22:40:26 yggdrasil kernel: [24637.888695] raid5:md127: read error
> not correctable (sector 884472576 on sda1).

Current versions of MD raid in the kernel allow multiple read errors per
hour before kicking out a drive.  What kernel and mdadm versions are
involved here?

> Disk was sda at the time, sdb now - don't ask why it reorders at times, I
> don't know. Sometimes the onboard boot disk is sda, sometimes it is the
> last disk, it seems.

You need to document the device names vs. drive S/Ns so you don't mess
up any "--create" operations.  This is one of the reasons "--create
--assume-clean" is so dangerous.

I recommend my own "lsdrv" @ github.com/pturmel.  But an excerpt from
"ls -l /dev/disk/by-id/" will do.

Use of LABEL= and UUID= syntax in fstab and during boot is intended to
mitigate the fact that the kernel cannot guarantee the order it finds
devices during boot.

> I tried to jump ddrescue to the end of the drive to ensure I get the md
> superblock and then live with some lost data after file system repair.
> 
> ./ddrescue -f -d -n -i1499889899520 /dev/sdb /dev/sdf /root/rescue.log
> 
> ^ That is what I did (I tried to go further and further). These completed
> every time with no error. Also no superblock was copied.

You have a v0.90 array.  The superblock is within 128k of the end of the
partition.
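
A quick way to see where that window sits on the whole disk (a sketch 
only - sdb/sdb1 stand in for whatever your member really is right now):

    # absolute disk offset of the last 128 KiB of the *partition*,
    # which is where a v0.90 superblock lives
    START=$(( $(cat /sys/block/sdb/sdb1/start) * 512 ))   # partition start, bytes
    SIZE=$(blockdev --getsize64 /dev/sdb1)                # partition size, bytes
    echo $(( START + SIZE - 131072 ))                     # feed this to ddrescue -i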

> I'm guessing it's some kind of user error that prevents me from copying
> that superblock.

Yes, something destroyed it.

> I'm still trying to determine if bringing the array up (--assemble
> --force) using this disk with the missing data will be just bad or very
> bad? I've been told that mdadm doesn't care, but what will it do when
> data is missing in a chunk on this disk?

Presuming you mean while using the ddrescued copy, then any bad data
will show up in the array's files.  There's no help for that.

>>> After that is done I'll try to get the array up with 4 disks, then add
>>> the spare and have it rebuild. After that I'll add a disk to go to
>>> raid 6.
>>
>> It may be wiser to get it running degraded and take a backup, but that
>> remains to be seen.  You haven't shown that you know why the first
>> rebuild failed.  Until that is understood and addressed, you probably
>> won't succeed in rebuilding onto a spare.

You only shared one "smartctl -x" report.  Please show the others.  If
the others show pending sectors, you will have more difficulty after
rescuing sdb.  (You will need to use ddrescue on the other drives that
show pending sectors.)

/dev/sdb has six pending sectors--unrecoverable read errors that won't
be resolved until those sectors are rewritten.  They might be normal
transient errors that'll be fine after rewrite.  Or they might be
unwritable, and the drive will have to re-allocate them.  You need
regular "check" scrubs in a non-degraded array to catch these early and
fix them.

Since ddrescue is struggling with this disk starting at 884471083, close
to the point where MD kicked it, you might have a large damage area that
can't be rewritten.

> I have been wondering about that; it would be difficult to do (not to
> mention I'd have to buy a bunch of large disks to back up to), but I
> have considered (and am considering) it.

Be careful selecting drives.  The Samsung drive has ERC--you really want
to pick drives that have it.

If I understand correctly, your current plan is to ddrescue sdb, then
assemble degraded (with --force).  I agree with this plan, and I think
you should not need to use "--create --assume-clean".  You will need to
fsck the filesystem before you mount, and accept that some data will be
lost.  Be sure to remove sdb from the system after you've duplicated it,
as two drives with identical metadata will cause problems for MD.
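
Put together, a sketch of that order of operations (device names are 
examples only - use what --examine actually reports, with the ddrescued 
copy standing in for the original sdb):

    mdadm --assemble --force /dev/md127 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
    fsck -n /dev/md127        # read-only pass first, to gauge the damage
    fsck /dev/md127           # then the actual repair
    mount /dev/md127 /mnt     # mount and copy off anything important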

Phil

> 
> Thanks,
> Andreas
> 
> 
> ---------------------metadata
> /dev/sdb1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 60b8d5d0:00c342d3:59cb281a:834c72d9
>   Creation Time : Sun Oct  3 06:23:33 2010
>      Raid Level : raid5
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 5860543744 (5589.05 GiB 6001.20 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 127
> 
>     Update Time : Thu May  2 22:15:22 2013
>           State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : dd5e9120 - correct
>          Events : 1011948
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     2       8        1        2      active sync   /dev/sda1
> 
>    0     0       8       33        0      active sync   /dev/sdc1
>    1     1       0        0        1      faulty removed
>    2     2       8        1        2      active sync   /dev/sda1
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       65        4      active sync   /dev/sde1
>    5     5       8       17        5      spare   /dev/sdb1
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 60b8d5d0:00c342d3:59cb281a:834c72d9
>   Creation Time : Sun Oct  3 06:23:33 2010
>      Raid Level : raid5
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 5860543744 (5589.05 GiB 6001.20 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 127
> 
>     Update Time : Fri May  3 05:31:49 2013
>           State : clean
>  Active Devices : 3
> Working Devices : 4
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : dd5ef826 - correct
>          Events : 1012026
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     5       8       17        5      spare   /dev/sdb1
> 
>    0     0       8       33        0      active sync   /dev/sdc1
>    1     1       0        0        1      faulty removed
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       65        4      active sync   /dev/sde1
>    5     5       8       17        5      spare   /dev/sdb1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 60b8d5d0:00c342d3:59cb281a:834c72d9
>   Creation Time : Sun Oct  3 06:23:33 2010
>      Raid Level : raid5
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 5860543744 (5589.05 GiB 6001.20 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 127
> 
>     Update Time : Fri May  3 05:31:49 2013
>           State : clean
>  Active Devices : 3
> Working Devices : 4
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : dd5ef832 - correct
>          Events : 1012026
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     0       8       33        0      active sync   /dev/sdc1
> 
>    0     0       8       33        0      active sync   /dev/sdc1
>    1     1       0        0        1      faulty removed
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       65        4      active sync   /dev/sde1
>    5     5       8       17        5      spare   /dev/sdb1
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 60b8d5d0:00c342d3:59cb281a:834c72d9
>   Creation Time : Sun Oct  3 06:23:33 2010
>      Raid Level : raid5
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 5860543744 (5589.05 GiB 6001.20 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 127
> 
>     Update Time : Fri May  3 05:31:49 2013
>           State : clean
>  Active Devices : 3
> Working Devices : 4
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : dd5ef848 - correct
>          Events : 1012026
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     3       8       49        3      active sync   /dev/sdd1
> 
>    0     0       8       33        0      active sync   /dev/sdc1
>    1     1       0        0        1      faulty removed
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       65        4      active sync   /dev/sde1
>    5     5       8       17        5      spare   /dev/sdb1
> /dev/sdg1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 60b8d5d0:00c342d3:59cb281a:834c72d9
>   Creation Time : Sun Oct  3 06:23:33 2010
>      Raid Level : raid5
>   Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
>      Array Size : 5860543744 (5589.05 GiB 6001.20 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 127
> 
>     Update Time : Fri May  3 05:31:49 2013
>           State : clean
>  Active Devices : 3
> Working Devices : 4
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : dd5ef85a - correct
>          Events : 1012026
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     4       8       65        4      active sync   /dev/sde1
> 
>    0     0       8       33        0      active sync   /dev/sdc1
>    1     1       0        0        1      faulty removed
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       65        4      active sync   /dev/sde1
>    5     5       8       17        5      spare   /dev/sdb1
> 
> 
> 
> ---------------------smartctl -x
> 
> 
>  smartctl -x /dev/sdb
> smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     SAMSUNG SpinPoint F2 EG series
> Device Model:     SAMSUNG HD154UI
> Serial Number:    S1Y6J1LZ100168
> Firmware Version: 1AG01118
> User Capacity:    1,500,301,910,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 3b
> Local Time is:    Mon May  6 05:41:26 2013 EDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x00)    Offline data collection activity
>                     was never started.
>                     Auto Offline Data Collection: Disabled.
> Self-test execution status:      ( 114)    The previous self-test
> completed having
>                     the read element of the test failed.
> Total time to complete Offline
> data collection:          (19591) seconds.
> Offline data collection
> capabilities:              (0x7b) SMART execute Offline immediate.
>                     Auto Offline data collection on/off support.
>                     Suspend Offline collection upon new
>                     command.
>                     Offline surface scan supported.
>                     Self-test supported.
>                     Conveyance Self-test supported.
>                     Selective Self-test supported.
> SMART capabilities:            (0x0003)    Saves SMART data before entering
>                     power-saving mode.
>                     Supports SMART auto save timer.
> Error logging capability:        (0x01)    Error logging supported.
>                     General Purpose Logging supported.
> Short self-test routine
> recommended polling time:      (   2) minutes.
> Extended self-test routine
> recommended polling time:      ( 255) minutes.
> Conveyance self-test routine
> recommended polling time:      (  34) minutes.
> SCT capabilities:            (0x003f)    SCT Status supported.
>                     SCT Error Recovery Control supported.
>                     SCT Feature Control supported.
>                     SCT Data Table supported.
> 
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       17
>   3 Spin_Up_Time            0x0007   063   063   011    Pre-fail  Always       -       11770
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       155
>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
>   8 Seek_Time_Performance   0x0025   100   097   015    Pre-fail  Offline      -       14926
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       378
>  10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       67
>  13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       17
> 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       1
> 184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       18
> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   075   068   000    Old_age   Always       -       25 (Lifetime Min/Max 17/32)
> 194 Temperature_Celsius     0x0022   075   066   000    Old_age   Always       -       25 (Lifetime Min/Max 17/34)
> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       463200379
> 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       6
> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
> 199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
> 
> General Purpose Logging (GPL) feature set supported
> General Purpose Log Directory Version 1
> SMART           Log Directory Version 1 [multi-sector log support]
> GP/S  Log at address 0x00 has    1 sectors [Log Directory]
> SMART Log at address 0x01 has    1 sectors [Summary SMART error log]
> SMART Log at address 0x02 has    2 sectors [Comprehensive SMART error log]
> GP    Log at address 0x03 has    2 sectors [Ext. Comprehensive SMART
> error log]
> GP    Log at address 0x04 has    2 sectors [Device Statistics]
> SMART Log at address 0x06 has    1 sectors [SMART self-test log]
> GP    Log at address 0x07 has    2 sectors [Extended self-test log]
> SMART Log at address 0x09 has    1 sectors [Selective self-test log]
> GP    Log at address 0x10 has    1 sectors [NCQ Command Error]
> GP    Log at address 0x11 has    1 sectors [SATA Phy Event Counters]
> GP    Log at address 0x20 has    2 sectors [Streaming performance log]
> GP    Log at address 0x21 has    1 sectors [Write stream error log]
> GP    Log at address 0x22 has    1 sectors [Read stream error log]
> GP/S  Log at address 0x80 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x81 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x82 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x83 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x84 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x85 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x86 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x87 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x88 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x89 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x8a has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x8b has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x8c has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x8d has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x8e has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x8f has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x90 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x91 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x92 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x93 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x94 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x95 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x96 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x97 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x98 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x99 has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x9a has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x9b has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x9c has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x9d has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x9e has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0x9f has   16 sectors [Host vendor specific log]
> GP/S  Log at address 0xe0 has    1 sectors [SCT Command/Status]
> GP/S  Log at address 0xe1 has    1 sectors [SCT Data Transfer]
> 
> SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
> Device Error Count: 6
>     CR     = Command Register
>     FEATR  = Features Register
>     COUNT  = Count (was: Sector Count) Register
>     LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
>     LH     = LBA High (was: Cylinder High) Register    ]   LBA
>     LM     = LBA Mid (was: Cylinder Low) Register      ] Register
>     LL     = LBA Low (was: Sector Number) Register     ]
>     DV     = Device (was: Device/Head) Register
>     DC     = Device Control Register
>     ER     = Error register
>     ST     = Status register
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
> 
> Error 6 [5] occurred at disk power-on lifetime: 329 hours (13 days + 17
> hours)
>   When the command that caused the error occurred, the device was active
> or idle.
> 
>   After command completion occurred, registers were:
>   ER -- ST COUNT  LBA_48  LH LM LL DV DC
>   -- -- -- == -- == == == -- -- -- -- --
>   00 -- 42 00 00 00 00 34 b7 f5 34 40 00
> 
>   Commands leading to the command that caused the error were:
>   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time 
> Command/Feature_Name
>   -- == -- == -- == == == -- -- -- -- --  --------------- 
> --------------------
>   60 00 08 01 00 00 00 00 b7 f4 3f 40 00     06:51:19.350  READ FPDMA
> QUEUED
>   60 00 00 01 00 00 00 00 b7 f5 3f 40 00     06:51:19.350  READ FPDMA
> QUEUED
>   60 00 08 01 00 00 00 00 b7 f7 3f 40 00     06:51:19.350  READ FPDMA
> QUEUED
>   60 00 70 01 00 00 00 00 b7 f6 3f 40 00     06:51:19.350  READ FPDMA
> QUEUED
>   60 00 08 01 00 00 00 00 b7 f8 3f 40 00     06:51:19.350  READ FPDMA
> QUEUED
> 
> Error 5 [4] occurred at disk power-on lifetime: 329 hours (13 days + 17
> hours)
>   When the command that caused the error occurred, the device was active
> or idle.
> 
>   After command completion occurred, registers were:
>   ER -- ST COUNT  LBA_48  LH LM LL DV DC
>   -- -- -- == -- == == == -- -- -- -- --
>   00 -- 42 00 00 00 00 34 b7 f5 35 40 00
> 
>   Commands leading to the command that caused the error were:
>   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time 
> Command/Feature_Name
>   -- == -- == -- == == == -- -- -- -- --  --------------- 
> --------------------
>   60 00 08 01 00 00 00 00 b7 fb 3f 40 00     06:51:13.030  READ FPDMA
> QUEUED
>   60 00 00 01 00 00 00 00 b7 fa 3f 40 00     06:51:13.030  READ FPDMA
> QUEUED
>   60 00 08 00 e8 00 00 00 b7 f9 57 40 00     06:51:13.030  READ FPDMA
> QUEUED
>   60 00 70 00 18 00 00 00 b7 f9 3f 40 00     06:51:13.030  READ FPDMA
> QUEUED
>   60 00 08 01 00 00 00 00 b7 f8 3f 40 00     06:51:13.030  READ FPDMA
> QUEUED
> 
> Error 4 [3] occurred at disk power-on lifetime: 329 hours (13 days + 17
> hours)
>   When the command that caused the error occurred, the device was active
> or idle.
> 
>   After command completion occurred, registers were:
>   ER -- ST COUNT  LBA_48  LH LM LL DV DC
>   -- -- -- == -- == == == -- -- -- -- --
>   00 -- 42 00 00 00 00 34 b7 f5 33 40 00
> 
>   Commands leading to the command that caused the error were:
>   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time 
> Command/Feature_Name
>   -- == -- == -- == == == -- -- -- -- --  --------------- 
> --------------------
>   60 00 08 01 00 00 00 00 b7 f4 3f 40 00     06:51:08.060  READ FPDMA
> QUEUED
>   60 00 00 01 00 00 00 00 b7 f5 3f 40 00     06:51:08.060  READ FPDMA
> QUEUED
>   60 00 08 01 00 00 00 00 b7 f7 3f 40 00     06:51:08.060  READ FPDMA
> QUEUED
>   60 00 70 01 00 00 00 00 b7 f6 3f 40 00     06:51:08.060  READ FPDMA
> QUEUED
>   60 00 08 01 00 00 00 00 b7 f8 3f 40 00     06:51:08.060  READ FPDMA
> QUEUED
> 
> Error 3 [2] occurred at disk power-on lifetime: 329 hours (13 days + 17
> hours)
>   When the command that caused the error occurred, the device was active
> or idle.
> 
>   After command completion occurred, registers were:
>   ER -- ST COUNT  LBA_48  LH LM LL DV DC
>   -- -- -- == -- == == == -- -- -- -- --
>   00 -- 42 00 00 00 00 34 b7 f5 31 40 00
> 
>   Commands leading to the command that caused the error were:
>   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time 
> Command/Feature_Name
>   -- == -- == -- == == == -- -- -- -- --  --------------- 
> --------------------
>   60 00 08 01 00 00 00 00 b7 fb 3f 40 00     06:51:03.650  READ FPDMA
> QUEUED
>   60 00 00 01 00 00 00 00 b7 fa 3f 40 00     06:51:03.650  READ FPDMA
> QUEUED
>   60 00 08 00 e8 00 00 00 b7 f9 57 40 00     06:51:03.650  READ FPDMA
> QUEUED
>   60 00 70 00 18 00 00 00 b7 f9 3f 40 00     06:51:03.650  READ FPDMA
> QUEUED
>   60 00 08 01 00 00 00 00 b7 f8 3f 40 00     06:51:03.650  READ FPDMA
> QUEUED
> 
> Error 2 [1] occurred at disk power-on lifetime: 329 hours (13 days + 17
> hours)
>   When the command that caused the error occurred, the device was active
> or idle.
> 
>   After command completion occurred, registers were:
>   ER -- ST COUNT  LBA_48  LH LM LL DV DC
>   -- -- -- == -- == == == -- -- -- -- --
>   00 -- 42 00 00 00 00 34 b7 f5 34 40 00
> 
>   Commands leading to the command that caused the error were:
>   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
>   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
>   60 00 08 01 00 00 00 00 b7 f4 3f 40 00     06:50:58.240  READ FPDMA QUEUED
>   60 00 00 01 00 00 00 00 b7 f5 3f 40 00     06:50:58.240  READ FPDMA QUEUED
>   60 00 08 01 00 00 00 00 b7 f7 3f 40 00     06:50:58.240  READ FPDMA QUEUED
>   60 00 70 01 00 00 00 00 b7 f6 3f 40 00     06:50:58.240  READ FPDMA QUEUED
>   60 00 08 01 00 00 00 00 b7 f8 3f 40 00     06:50:58.240  READ FPDMA QUEUED
> 
> Error 1 [0] occurred at disk power-on lifetime: 329 hours (13 days + 17
> hours)
>   When the command that caused the error occurred, the device was active
> or idle.
> 
>   After command completion occurred, registers were:
>   ER -- ST COUNT  LBA_48  LH LM LL DV DC
>   -- -- -- == -- == == == -- -- -- -- --
>   00 -- 42 00 00 00 00 34 b7 f5 2f 40 00
> 
>   Commands leading to the command that caused the error were:
>   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
>   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
>   60 00 00 01 00 00 00 00 b7 f4 3f 40 00     06:50:53.300  READ FPDMA QUEUED
>   60 00 08 01 00 00 00 00 b7 f3 3f 40 00     06:50:53.280  READ FPDMA QUEUED
>   60 00 00 01 00 00 00 00 b7 f2 3f 40 00     06:50:53.280  READ FPDMA QUEUED
>   60 00 08 00 e8 00 00 00 b7 f1 57 40 00     06:50:53.280  READ FPDMA QUEUED
>   60 00 00 00 18 00 00 00 b7 f1 3f 40 00     06:50:53.270  READ FPDMA QUEUED
> 
> SMART Extended Self-test Log Version: 1 (2 sectors)
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Short offline       Completed: read failure       20%       375         884471093
> # 2  Short offline       Completed: read failure       20%       351         884471083
> # 3  Extended offline    Completed: read failure       90%       337         884471094
> # 4  Short offline       Completed: read failure       20%       333         884471093
> # 5  Short offline       Completed without error       00%       309         -
> # 6  Short offline       Completed without error       00%       285         -
> # 7  Short offline       Completed without error       00%       261         -
> # 8  Short offline       Completed without error       00%       237         -
> # 9  Short offline       Completed without error       00%       213         -
> #10  Extended offline    Completed: read failure       60%       203         884471093
> #11  Short offline       Completed without error       00%       190         -
> #12  Short offline       Completed without error       00%       166         -
> #13  Short offline       Completed without error       00%       142         -
> #14  Short offline       Completed without error       00%       118         -
> #15  Short offline       Completed without error       00%        94         -
> #16  Short offline       Completed without error       00%        70         -
> 
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> 
> SCT Status Version:                  2
> SCT Version (vendor specific):       256 (0x0100)
> SCT Support Level:                   1
> Device State:                        Active (0)
> Current Temperature:                 25 Celsius
> Power Cycle Max Temperature:         34 Celsius
> Lifetime    Max Temperature:         40 Celsius
> SCT Temperature History Version:     2
> Temperature Sampling Period:         1 minute
> Temperature Logging Interval:        1 minute
> Min/Max recommended Temperature:     -4/72 Celsius
> Min/Max Temperature Limit:           -9/77 Celsius
> Temperature History Size (Index):    128 (113)
> 
> Index    Estimated Time   Temperature Celsius
>  114    2013-05-06 03:34    26  *******
>  115    2013-05-06 03:35    25  ******
>  116    2013-05-06 03:36    26  *******
>  ...    ..(  3 skipped).    ..  *******
>  120    2013-05-06 03:40    26  *******
>  121    2013-05-06 03:41    25  ******
>  122    2013-05-06 03:42    25  ******
>  123    2013-05-06 03:43    26  *******
>  124    2013-05-06 03:44    25  ******
>  ...    ..(  4 skipped).    ..  ******
>    1    2013-05-06 03:49    25  ******
>    2    2013-05-06 03:50    26  *******
>    3    2013-05-06 03:51    26  *******
>    4    2013-05-06 03:52    25  ******
>    5    2013-05-06 03:53    25  ******
>    6    2013-05-06 03:54    25  ******
>    7    2013-05-06 03:55    26  *******
>  ...    ..(  2 skipped).    ..  *******
>   10    2013-05-06 03:58    26  *******
>   11    2013-05-06 03:59    25  ******
>   12    2013-05-06 04:00    26  *******
>   13    2013-05-06 04:01    25  ******
>   14    2013-05-06 04:02    25  ******
>   15    2013-05-06 04:03    26  *******
>   16    2013-05-06 04:04    25  ******
>  ...    ..(  4 skipped).    ..  ******
>   21    2013-05-06 04:09    25  ******
>   22    2013-05-06 04:10    26  *******
>   23    2013-05-06 04:11    25  ******
>  ...    ..( 89 skipped).    ..  ******
>  113    2013-05-06 05:41    25  ******
> 
> SCT Error Recovery Control:
>            Read:     70 (7.0 seconds)
>           Write:     70 (7.0 seconds)
> 
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x000a  2            7  Device-to-host register FISes sent due to a
> COMRESET
> 0x0001  2            0  Command failed due to ICRC error
> 0x0002  2            0  R_ERR response for data FIS
> 0x0003  2            0  R_ERR response for device-to-host data FIS
> 0x0004  2            0  R_ERR response for host-to-device data FIS
> 0x0005  2            0  R_ERR response for non-data FIS
> 0x0006  2            0  R_ERR response for device-to-host non-data FIS
> 0x0007  2            0  R_ERR response for host-to-device non-data FIS
> 0x0008  2            0  Device-to-host non-data FIS retries
> 0x0009  2            7  Transition from drive PhyRdy to drive PhyNRdy
> 0x000b  2            0  CRC errors within host-to-device FIS
> 0x000d  2            0  Non-CRC errors within host-to-device FIS
> 0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
> 0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
> 0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
> 0x0013  2            0  R_ERR response for host-to-device non-data FIS,
> non-CRC
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
       [not found]             ` <5188189D.1060806@midgaard.us>
@ 2013-05-07  0:39               ` Phil Turmel
  2013-05-07  1:14                 ` Andreas Boman
  0 siblings, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2013-05-07  0:39 UTC (permalink / raw)
  To: Andreas Boman; +Cc: linux-raid

On 05/06/2013 04:54 PM, Andreas Boman wrote:
> On 05/06/2013 08:36 AM, Phil Turmel wrote:

[trim /]

>> Current versions of MD raid in the kernel allow multiple read errors per
>> hour before kicking out a drive.  What kernel and mdadm versions are
>> involved here?
> kernel 2.6.32-5-amd64, mdadm 3.1.4 (debian 6.0.7)

Ok.  Missing some neat features, but not a crisis.

>>> Disk was sda at the time, sdb now don't ask why it reorders at times, I
>>> don't know. Sometimes the on board boot disk is sda, sometimes it is the
>>> last disk it seems.
>>
>> You need to document the device names vs. drive S/Ns so you don't mess
>> up any "--create" operations.  This is one of the reasons "--create
>> --assume-clean" is so dangerous. I recommend my own "lsdrv" @
>> github.com/pturmel.  But an excerpt from
>> "ls -l /dev/disk/by-id/" will do.
>>
>> Use of LABEL= and UUID= syntax in fstab and during boot is intended to
>> mitigate the fact that the kernel cannot guarantee the order it finds
>> devices during boot.
>>
> Noted, I'll look into this.

Thanks to smartctl, we now have an index of drive names to serial
numbers.  Whenever you create an array, document which drive holds which
role, just in case.
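
For example, something like this one-liner (just a sketch -- adjust the
device list to your system, and the smartctl output format varies a bit
by drive) gives you a name-to-serial map to keep with your notes:

# for d in /dev/sd[bcdefg]; do echo -n "$d  "; smartctl -i $d | grep -i 'serial number'; done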

[trim /]

>>> I'm guessing its some kind of user error that prevents me from copying
>>> that superblock.
>>
>> Yes, something destroyed it.
> The superblock is available on the original drive, I can do mdadm -E
> /dev/sdb all day long. It just hasn't transferred to the new disk.

Hmmm.  The v0.90 superblock lives at the very end of the member device.
Does your partition go all the way to the end?  Please show your
partition tables:

fdisk -lu /dev/sd[bcdefg]

>>> I'm still trying to determine if bringing the array up (--assemble
>>> --force) using this disk with the missing data will be just bad or very
>>> bad? I've been told that mdadm doesn't care, but what will it do when
>>> data is missing in a chunk on this disk?
>>
>> Presuming you mean while using the ddrescued copy, then any bad data
>> will show up in the array's files.  There's no help for that.
> Right, but the array will come up and fsck (xfs_repair) should be able
> to get it going again with most data available? mdadm won't get wonky
> when expected parity data isn't there, for example?

Yes, xfs_repair will fix what's fixable.  It might not notice file
contents that are no longer correct.
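
If it were me, I'd do a dry run first so you can see what it intends to
change before letting it write anything (the device name below is just a
placeholder -- point it at whatever block device actually holds the XFS
filesystem, e.g. the LV on top of the array if you're using LVM):

# xfs_repair -n /dev/md127

then run it again without -n once you're comfortable with what it reports.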

[trim /]

>> /dev/sdb has six pending sectors--unrecoverable read errors that won't
>> be resolved until those sectors are rewritten.  They might be normal
>> transient errors that'll be fine after rewrite.  Or they might be
>> unwritable, and the drive will have to re-allocate them.  You need
>> regular "check" scrubs in a non-degraded array to catch these early and
>> fix them.
> I have smartd run daily 'short self test' and weekly 'long self test', I
> guess that wasn't enough.

No.  Each drive by itself cannot fix its own errors.  It needs its bad
data *rewritten* by an upper layer.  MD will do this when it encounters
read errors in a non-degraded array.  And it will test-read everything
to trigger these corrections during a "check" scrub.  See the "Scrubbing
and Mismatches" section of the "md" man-page.

As long as these errors aren't bunched together so tightly that MD
exceeds its internal read error limits, the drives with these errors get
fixed and stay online.  More on this below, though.

>> Since ddrescue is struggling with this disk starting at 884471083, close
>> to the point where MD kicked it, you might have a large damage area that
>> can't be rewritten.
>>
>>> I have been wondering about that, it would be difficult to do (not to
>>> mention I'd have to buy a bunch of large disks to backup to), but I have
>>> (am) considered it.
>>
>> Be careful selecting drives.  The Samsung drive has ERC--you really want
>> to pick drives that have it.
> Noted, I'll look into what that is and hope that my new disks have it as
> well.

Your /dev/sdb {SAMSUNG HD154UI S1Y6J1LZ100168} has ERC, and it is set to
the typical 7.0 seconds for RAID duty:

> SCT Error Recovery Control:
>            Read:     70 (7.0 seconds)
>           Write:     70 (7.0 seconds)

Your /dev/sdc {SAMSUNG HD154UI S1XWJX0D300206} also has ERC, but it is
disabled:

> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled

Fortunately, it has no pending sectors (yet).

Your /dev/sdd {SAMSUNG HD154UI S1XWJX0B900500} and /dev/sde {SAMSUNG
HD154UI S1XWJ1KS813588} also have ERC, and are also disabled.

If ERC is available but disabled, it can be enabled by a suitable
script in /etc/local.d/ or in /etc/rc.local (enterprise drives enable it
by default; desktop disks do not), like so:

# smartctl -l scterc,70,70 /dev/sdc
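
To cover all three Samsungs in one go, something like this in rc.local
would do (sketch only -- remember the sdX names can move around between
boots, so key off the /dev/disk/by-id/ links if you want to be safe):

# for d in sdc sdd sde; do smartctl -l scterc,70,70 /dev/$d; done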

Now for the bad news:

Your /dev/sdf {ST3000DM001-1CH166 W1F1LTQY} and /dev/sdg
{ST3000DM001-1CH166 W1F1LTQY} do not have ERC at all.  Modern "green"
drives generally don't:

> Warning: device does not support SCT Error Recovery Control command

Since these cannot be set to a short error timeout, the linux driver's
timeout must be changed to tolerate 2+ minutes of error recovery.  I
recommend 180 seconds.  This must be put in /etc/local.d/ or
/etc/rc.local like so:

# echo 180 >/sys/block/sdf/device/timeout
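
As a rough sketch for rc.local, covering both of the Seagates (same
caveat as above about device names moving between boots):

# for d in sdf sdg; do echo 180 >/sys/block/$d/device/timeout; done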

If you don't do this, "check" scrubbing will fail.  And by fail, I mean
any ordinary URE will kick drives out instead of fixing them.  Search
the archives for "scterc" and you'll find more detailed explanations
(attached to horror stories).

>> If I understand correctly, your current plan is to ddrescue sdb, then
>> assemble degraded (with --force).  I agree with this plan, and I think
>> you should not need to use "--create --assume-clean".  You will need to
>> fsck the filesystem before you mount, and accept that some data will be
>> lost.  Be sure to remove sdb from the system after you've duplicated it,
>> as two drives with identical metadata will cause problems for MD.

> Correct, that is the plan: Assemble degraded with the 3 'good' disks and
> the ddrescued copy, xfs_repair and add the 5th disk back. Allow it to
> resynch the array. Then reshape to raid 6. Allow that to finish, then
> add another disk and grow the array/lvm/filesystem. Looking at a lot of
> beating on the disks reshaping so much, but after that I should be fine
> for a while. I'll probably add a hot spare as well.

I would encourage you to take your backups of critical files as soon as
the array is running, before you add a fifth disk.  Then you can add two
disks and recover/reshape simultaneously.

Phil


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-07  0:39               ` Phil Turmel
@ 2013-05-07  1:14                 ` Andreas Boman
  2013-05-07  1:46                   ` Phil Turmel
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Boman @ 2013-05-07  1:14 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On 05/06/2013 08:39 PM, Phil Turmel wrote:
> On 05/06/2013 04:54 PM, Andreas Boman wrote:
>> On 05/06/2013 08:36 AM, Phil Turmel wrote:
>
> [trim /]
<snip>
>
>
> Hmmm.  v0.90 is at the end of the member device.  Does your partition go
> all the way to the end?  Please show your partition tables:
>
> fdisk -lu /dev/sd[bcdefg]

fdisk -lu /dev/sd[bcdefg]

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x3d1e17f0

    Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63  2930272064  1465136001   fd  Linux raid 
autodetect

Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

    Device Boot      Start         End      Blocks   Id  System
/dev/sdc1              63  2930272064  1465136001   fd  Linux raid 
autodetect

Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

    Device Boot      Start         End      Blocks   Id  System
/dev/sdd1              63  2930272064  1465136001   fd  Linux raid 
autodetect

Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x36cc19da

    Device Boot      Start         End      Blocks   Id  System
/dev/sde1              63  2930272064  1465136001   fd  Linux raid 
autodetect

Disk /dev/sdf: 3000.6 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x3d1e17f0

    Device Boot      Start         End      Blocks   Id  System
/dev/sdf1              63  2930272064  1465136001   fd  Linux raid 
autodetect
Partition 1 does not start on physical sector boundary.

Disk /dev/sdg: 3000.6 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

    Device Boot      Start         End      Blocks   Id  System
/dev/sdg1              63  2930272064  1465136001   fd  Linux raid 
autodetect
Partition 1 does not start on physical sector boundary.

>> Warning: device does not support SCT Error Recovery Control command
>
> Since these cannot be set to a short error timeout, the linux driver's
> timeout must be changed to tolerate 2+ minutes of error recovery.  I
> recommend 180 seconds.  This must be put in /etc/local.d/ or
> /etc/rc.local like so:
>
> # echo 180>/sys/block/sdf/device/timeout
>
> If you don't do this, "check" scrubbing will fail.  And by fail, I mean
> any ordinary URE will kick drives out instead of fixing them.  Search
> the archives for "scterc" and you'll find more detailed explanations
> (attached to horror stories).

Thank you! I had no idea about that, or I obviously would not have bought
those disks...

<snip>
>
> I would encourage you to take your backups of critical files as soon as
> the array is running, before you add a fifth disk.  Then you can add two
> disks and recover/reshape simultaneously.

Hmm.. any hints as to how to do that at the same time? That does sound 
better.

Thanks you for all your help/advice Phil.
Andreas


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-07  1:14                 ` Andreas Boman
@ 2013-05-07  1:46                   ` Phil Turmel
  2013-05-07  2:08                     ` Andreas Boman
  0 siblings, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2013-05-07  1:46 UTC (permalink / raw)
  To: Andreas Boman; +Cc: linux-raid

On 05/06/2013 09:14 PM, Andreas Boman wrote:
> fdisk -lu /dev/sd[bcdefg]
> 
> Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
> 255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x3d1e17f0
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1              63  2930272064  1465136001   fd  Linux raid
> autodetect

Oooo!  That's not good.  Your partitions are not on 4k boundaries, so
they won't play nicely with modern 4k-sector drives.  Modern fdisk puts
the first partition at sector 2048 by default.  (Highly recommended.)
You're stuck with this on the old drives until you can rebuild the
entire array.

[trim /]

> Disk /dev/sdf: 3000.6 GB, 3000592982016 bytes
> 255 heads, 63 sectors/track, 364801 cylinders, total 5860533168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x3d1e17f0
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdf1              63  2930272064  1465136001   fd  Linux raid
> autodetect
> Partition 1 does not start on physical sector boundary.

This is serious.  The drives will run, but every block written to them
will create at least two read-modify-write cycles on 4k sectors.  In
addition to crushing your array's performance, it will prevent scrub
actions from fixing UREs (the read part of the R-M-W will fail).

Fortunately, these new drives are bigger than the originals, so you can
put the partition at sector 2048 and still have it the same size as the
originals.  Warning:  v0.90 metadata has problems with member partitions
larger than 2TB in some kernel versions.  When you are ready to fix your overall
partition alignment issues, you probably want to switch to v1.1 or v1.2
metadata as well.
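
When that day comes, something along these lines would lay out an aligned
partition at least as large as the old members -- a sketch only, so check
the numbers against your own fdisk output first; /dev/sdX is a
placeholder, and note it wipes whatever partition table is already on
that disk:

# parted -s /dev/sdX -- mklabel msdos mkpart primary 2048s 2930274303s set 1 raid on

That starts the partition at 1MiB and leaves it slightly larger than the
old 2930272002-sector members, which MD is happy with.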

[trim /]

>> I would encourage you to take your backups of critical files as soon as
>> the array is running, before you add a fifth disk.  Then you can add two
>> disks and recover/reshape simultaneously.
> 
> Hmm.. any hints as to how to do that at the same time? That does sound
> better.

I believe you would set "sync_max" to "0" before adding the spares, then
issue the "--grow" command to reshape, then set "sync_max" to "max".
Others may want to chime in here.
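
Roughly, and only as a sketch of that sequence (untested here; the array
and partition names are placeholders):

# echo 0 >/sys/block/md127/md/sync_max
# mdadm --manage /dev/md127 --add /dev/sdX1 /dev/sdY1
# mdadm --grow /dev/md127 --level=6 --raid-devices=6 --backup-file=/root/md127-grow.bak
# echo max >/sys/block/md127/md/sync_max

The backup file only matters during the critical section of the reshape,
but it's cheap insurance.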

Phil

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-07  1:46                   ` Phil Turmel
@ 2013-05-07  2:08                     ` Andreas Boman
  2013-05-07  2:16                       ` Phil Turmel
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Boman @ 2013-05-07  2:08 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On 05/06/2013 09:46 PM, Phil Turmel wrote:
> Fortunately, these new drives are bigger than the originals, so you
> can put the partition at sector 2048 and still have it the same size
> as the originals. Warning: v0.90 metadata has problems with member
> partitions larger than 2TB in some kernel versions. When you are ready
> to fix your overall partition alignment issues, you probably want to
> switch to v1.1 or v1.2 metadata as well.

[trim /]
Ok, this is not great news.. I'll have to fix that. Later.

My ddrescue problem remains, however: I'm still unable to get my degraded
array online with the ddrescued disk, since the md superblock is missing
on that disk.

/Andreas

mdadm -E /dev/sdf1
mdadm: No md superblock detected on /dev/sdf1.


./ddrescue -f -d /dev/sdb /dev/sdf /root/rescue.log


GNU ddrescue 1.17-rc3
Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued:     1493 GB,  errsize:       0 B,  errors:       0
Current status
rescued:     1493 GB,  errsize:       0 B,  current rate:      512 B/s
    ipos:   942774 MB,   errors:       0,    average rate:      579 B/s
    opos:   942774 MB,    time since last successful read:       0 s
Copying non-tried blocks..

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-07  2:08                     ` Andreas Boman
@ 2013-05-07  2:16                       ` Phil Turmel
  2013-05-07  2:21                         ` Andreas Boman
  0 siblings, 1 reply; 24+ messages in thread
From: Phil Turmel @ 2013-05-07  2:16 UTC (permalink / raw)
  To: Andreas Boman; +Cc: linux-raid

On 05/06/2013 10:08 PM, Andreas Boman wrote:
> My ddrescue problem remains however, I'm still unable to get my degraded
> array online with the ddrescued disk since I'm missing the md superblock
> on that disk.
> 
> /Andreas
> 
> mdadm -E /dev/sdf1
> mdadm: No md superblock detected on /dev/sdf1.

Try using "blockdev --rereadpt /dev/sdf".  Then check again.

Phil


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Failed during rebuild (raid5)
  2013-05-07  2:16                       ` Phil Turmel
@ 2013-05-07  2:21                         ` Andreas Boman
  0 siblings, 0 replies; 24+ messages in thread
From: Andreas Boman @ 2013-05-07  2:21 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On 05/06/2013 10:16 PM, Phil Turmel wrote:
> On 05/06/2013 10:08 PM, Andreas Boman wrote:
>> My ddrescue problem remains however, I'm still unable to get my degraded
>> array online with the ddrescued disk since I'm missing the md superblock
>> on that disk.
>>
>> /Andreas
>>
>> mdadm -E /dev/sdf1
>> mdadm: No md superblock detected on /dev/sdf1.
> Try using "blockdev --rereadpt /dev/sdf".  Then check again.
>
> Phil
>

No change.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2013-05-07  2:21 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-03 11:23 Failed during rebuild (raid5) Andreas Boman
2013-05-03 11:38 ` Benjamin ESTRABAUD
2013-05-03 12:40   ` Robin Hill
2013-05-03 13:52     ` John Stoffel
2013-05-03 14:51       ` Phil Turmel
2013-05-03 16:23         ` John Stoffel
2013-05-03 16:32           ` Roman Mamedov
2013-05-04 14:48             ` maurice
2013-05-03 16:29       ` Mikael Abrahamsson
2013-05-03 19:29         ` John Stoffel
2013-05-04  4:14           ` Mikael Abrahamsson
2013-05-03 12:26 ` Ole Tange
2013-05-04 11:29   ` Andreas Boman
2013-05-05 14:00   ` Andreas Boman
2013-05-05 17:16     ` Andreas Boman
2013-05-06  1:10       ` Sam Bingner
2013-05-06  3:21       ` Phil Turmel
     [not found]         ` <51878BD0.9010809@midgaard.us>
2013-05-06 12:36           ` Phil Turmel
     [not found]             ` <5188189D.1060806@midgaard.us>
2013-05-07  0:39               ` Phil Turmel
2013-05-07  1:14                 ` Andreas Boman
2013-05-07  1:46                   ` Phil Turmel
2013-05-07  2:08                     ` Andreas Boman
2013-05-07  2:16                       ` Phil Turmel
2013-05-07  2:21                         ` Andreas Boman
