Raid5 Array stopped suddenly, no apparent error messages.

linux-raid.vger.kernel.org archive mirror

* Raid5 Array stopped suddenly, no apparent error messages.
@ 2012-05-09 18:08 Brian McKee
  2012-05-10 10:09 ` John Robinson
  0 siblings, 1 reply; 3+ messages in thread
From: Brian McKee @ 2012-05-09 18:08 UTC (permalink / raw)
  To: linux-raid

If this is not a good place to ask for help, please point me to where I
can ask. Sorry if I offend.

TL;DR: My question is this: is it safe to run mdadm --create
--assume-clean on an existing array? And by safe I mean: is it
guaranteed that the existing ext4 partition's data will not be lost when
I run the command?

Background: I have a 4.5TB Raid 5 Array (96% full) that has been up and
running for almost two years. A couple of months ago I upgraded my
kernel to 3.2.12 and yesterday around 2:00 PM this raid array stopped
working, became read only and started throwing a lot of I/O errors.

I assumed it was a failed drive but was saddened to see that according
to mdadm three drives had been removed from the array. The array
contains four drives, three 1.5 TB drives and one 2.0 TB drive
(partitioned so that it has a 1.5TB first partition to match the other
three). All three of the 1.5TB drives reported as bad according to mdadm.

I ran smartctl and it reported that all four drives were healthy which
gave me hope that something else had gone wrong and the data was okay.

For more details you can read this gentoo thread:
http://forums.gentoo.org/viewtopic-p-7033578.html

Summary: The three drives won't assemble because they are not fresh.

Any help or advice would be greatly appreciated. Mostly I want to know
if it's safe to force mdadm to accept the drives into an array with
--create --assume-clean.

Thanks,

Brian McKee


* Re: Raid5 Array stopped suddenly, no apparent error messages.
  2012-05-09 18:08 Raid5 Array stopped suddenly, no apparent error messages Brian McKee
@ 2012-05-10 10:09 ` John Robinson
  2012-05-10 15:50   ` Brian McKee
  0 siblings, 1 reply; 3+ messages in thread
From: John Robinson @ 2012-05-10 10:09 UTC (permalink / raw)
  To: Brian McKee; +Cc: linux-raid

On 09/05/2012 19:08, Brian McKee wrote:
> If this is not a good place to ask for help, please point me to where I
> can ask. Sorry if I offend.
>
> TL;DR: My question is this: is it safe to run mdadm --create
> --assume-clean on an existing array? And by safe I mean: is it
> guaranteed that the existing ext4 partition's data will not be lost when
> I run the command?

Any --create --assume-clean will only rewrite the metadata. To be able
to see your filesystem afterwards, you would need to get the command
exactly right: the chunk size, the metadata type, and the member
partitions listed in the right order. However...

[...]
> For more details you can read this gentoo thread:
> http://forums.gentoo.org/viewtopic-p-7033578.html
>
> Summary: The three drives won't assemble because they are not fresh.

I don't think you are seeing the recent kernel bug; you can see all the 
correct metadata on all your drives.

The problem you have is that your member partitions have different event 
counts. You can force the assembly, ignoring the different event counts, 
with --assemble --force. You should then run a fsck as there may already 
be some corruption which occurred when the event counts got out of sync.
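
To make the event-count point concrete, here is a minimal Python sketch
(not part of mdadm, just an illustration). It assumes you have already
read each member's event count, for example from the Events line of
mdadm --examine. The function name, device names and counts below are
made up. Members whose count lags the newest one are the ones a normal
assemble rejects as not fresh and --assemble --force accepts anyway.

```python
def stale_members(event_counts):
    """Return members whose event count lags the newest one, sorted by name."""
    newest = max(event_counts.values())
    return sorted(dev for dev, n in event_counts.items() if n < newest)


# Hypothetical device names and event counts, for illustration only.
example = {
    "/dev/sda1": 1200,
    "/dev/sdb1": 1200,
    "/dev/sdc1": 1200,
    "/dev/sdd1": 1234,
}
print(stale_members(example))
```

The further a member lags behind, the more writes it may have missed,
which is why a fsck after the forced assembly is worth running.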

You should also try to track down what caused the issue in the first 
place. Check your logs for ata errors.
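
As a rough aid for that log check, the following Python sketch keeps
only the lines that carry a libata port prefix such as "ata3:" or
"ata3.00:". The function name is hypothetical, and where your kernel
messages end up (dmesg output, a syslog file) depends on the system, so
the function simply takes any iterable of lines.

```python
import re

# Kernel libata messages are prefixed with a port id such as "ata3:" or
# "ata3.00:". The word boundary keeps words like "data1:" from matching.
ATA_PREFIX = re.compile(r"\bata\d+(?:\.\d+)?:")


def ata_lines(lines):
    """Return the log lines that carry a libata port prefix."""
    return [line.rstrip("\n") for line in lines if ATA_PREFIX.search(line)]
```

You could pass it an open log file or the saved output of dmesg, then
read the surviving lines for link resets or speed changes around the
time the array stopped.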

Cheers,

John.


* Re: Raid5 Array stopped suddenly, no apparent error messages.
  2012-05-10 10:09 ` John Robinson
@ 2012-05-10 15:50   ` Brian McKee
  0 siblings, 0 replies; 3+ messages in thread
From: Brian McKee @ 2012-05-10 15:50 UTC (permalink / raw)
  To: John Robinson; +Cc: linux-raid

John,

Your intuition was right on. The log shows that the eSATA link went down
around 2:40 PM. The kernel brought it back up, had problems, slowed it
to 1.5 Gb/s and then started chasing its own tail. I got a few Temp
Celsius and read error rate change messages during the ordeal.

mdadm -A --force /dev/md1 followed by fsck.ext4 /dev/md1 totally worked,
and fsck reported only two "inode count wrong" errors.

Thanks much for taking the time to help me! I really appreciate it.

Brian


