linux-raid.vger.kernel.org archive mirror
* raid5 missing disks during chunk size grow
@ 2014-09-22 11:21 Martin Senebald
  2014-09-22 11:44 ` Mikael Abrahamsson
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Senebald @ 2014-09-22 11:21 UTC (permalink / raw)
  To: linux-raid


Hi,

I have a tricky problem. I have a 6-disk RAID5 setup and started an mdadm grow to a larger chunk size.
During the reshape, 3 disks went missing (controller/kernel crash) and the array failed. The disks themselves are all OK. The backup file for the grow is there, and the array configuration is available, including --examine output for all the disks.

The problem: 3 disks were in a state like

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 84bea3d7:7b697819:b8c1aa5a:a615e3b4
           Name : chenbro.han.daquan.eu:2  (local to host chenbro.han.daquan.eu)
  Creation Time : Sat Sep 20 12:28:57 2014
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5856270336 (2792.49 GiB 2998.41 GB)
     Array Size : 14640675840 (13962.44 GiB 14992.05 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 67cc689a:2cdf3d4d:0c7d03d4:c12c7475

  Reshape pos'n : 557547520 (531.72 GiB 570.93 GB)
  New Chunksize : 512K

    Update Time : Mon Sep 22 09:39:35 2014
       Checksum : 25acf25d - correct
         Events : 109099

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : ...AAA ('A' == active, '.' == missing)

and 3 were in a state like

/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : 84bea3d7:7b697819:b8c1aa5a:a615e3b4
           Name : chenbro.han.daquan.eu:2  (local to host chenbro.han.daquan.eu)
  Creation Time : Sat Sep 20 12:28:57 2014
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5856270336 (2792.49 GiB 2998.41 GB)
     Array Size : 14640675840 (13962.44 GiB 14992.05 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f74a1585:93fb850b:64667d57:42c34e52

  Reshape pos'n : 557547520 (531.72 GiB 570.93 GB)
  New Chunksize : 512K

    Update Time : Mon Sep 22 09:02:59 2014
       Checksum : 2ef3049b - correct
         Events : 109096

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing)

My idea was to first bring the disks back into the array and then continue the grow (using the backup file).
Adding / re-adding the disks did not work, so I assumed that recreating the array with --assume-clean would bring me closer.

The question now is: how do I get the array back into the grow process, at the state it was in when it failed?
I didn't find anything about this (if it is even possible).

Does anyone have an idea?
What would be the best way to tackle this problem?

Thanks

BR Martin



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: raid5 missing disks during chunk size grow
  2014-09-22 11:21 raid5 missing disks during chunk size grow Martin Senebald
@ 2014-09-22 11:44 ` Mikael Abrahamsson
  2014-09-22 11:55   ` Martin Senebald
  0 siblings, 1 reply; 7+ messages in thread
From: Mikael Abrahamsson @ 2014-09-22 11:44 UTC (permalink / raw)
  To: Martin Senebald; +Cc: linux-raid


On Mon, 22 Sep 2014, Martin Senebald wrote:

> My idea was, first bringing back the disks in the array then continue the grow (using the backup file)
> add / re-add disk was not working, so i assumed recreate the array with --assume-clean would bring me closer.

Was this an idea you had that you didn't do, or did you actually execute 
on it?

> What would be the best way to tackle this problem?

Send mdadm --examine from all 6 component drives to the list and let's 
take it from there. Under no circumstances do --create on the components.
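Gathering those reports could be scripted like this; only /dev/sdc1 and /dev/sdf1 are actually named in the thread so far, so the other device names are illustrative:

```shell
# Collect --examine output from all six members. The device list is
# illustrative -- only sdc1 and sdf1 are named in the thread so far.
for d in /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1; do
    echo "== $d =="
    # mdadm --examine "$d"      # uncomment on the affected machine
done
```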

What kernel version and mdadm version do you have?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: raid5 missing disks during chunk size grow
  2014-09-22 11:44 ` Mikael Abrahamsson
@ 2014-09-22 11:55   ` Martin Senebald
  2014-09-22 12:02     ` Mikael Abrahamsson
  2014-09-23  0:32     ` NeilBrown
  0 siblings, 2 replies; 7+ messages in thread
From: Martin Senebald @ 2014-09-22 11:55 UTC (permalink / raw)
  To: linux-raid

On 22.09.2014, at 13:44, Mikael Abrahamsson <swmike@swm.pp.se> wrote:

> On Mon, 22 Sep 2014, Martin Senebald wrote:
> 
>> My idea was, first bringing back the disks in the array then continue the grow (using the backup file)
>> add / re-add disk was not working, so i assumed recreate the array with --assume-clean would bring me closer.
> 
> Was this an idea you had that you didn't do, or did you actually execute on it?

I did .. 

> 
>> What would be the best way to tackle this problem?
> 
> Send mdadm --examine from all 6 component drives to the list and let's take it from there.

the current state of the disks:

https://gist.github.com/daquan/94239614fc3b67789c9a#file-current-state

the state before the --create:

https://gist.github.com/daquan/94239614fc3b67789c9a#file-before-create-assume-clean


> Under no circumstances do --create on the components.

that doesn't sound so promising anymore :-/

> 
> What kernel version and mdadm version do you have?
> 
> -- 
> Mikael Abrahamsson    email: swmike@swm.pp.se
> 


BR Martin

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: raid5 missing disks during chunk size grow
  2014-09-22 11:55   ` Martin Senebald
@ 2014-09-22 12:02     ` Mikael Abrahamsson
  2014-09-22 12:50       ` Martin Senebald
  2014-09-23  0:32     ` NeilBrown
  1 sibling, 1 reply; 7+ messages in thread
From: Mikael Abrahamsson @ 2014-09-22 12:02 UTC (permalink / raw)
  To: Martin Senebald; +Cc: linux-raid


On Mon, 22 Sep 2014, Martin Senebald wrote:

>> Was this an idea you had that you didn't do, or did you actually execute on it?
>
> I did ..

What made you think that was a good idea? I have been rewriting text on 
the linux-raid wiki to discourage people from doing that. Where did you 
get the information from?

> the state before the --create
>
> https://gist.github.com/daquan/94239614fc3b67789c9a#file-before-create-assume-clean

I have no idea how to get you back from there.

>> Under no circumstances do --create on the components.
>
> that doesn't sound so promising anymore :-/

Indeed.

>>
>> What kernel version and mdadm version do you have?

Why didn't you answer this crucial question?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: raid5 missing disks during chunk size grow
  2014-09-22 12:02     ` Mikael Abrahamsson
@ 2014-09-22 12:50       ` Martin Senebald
  2014-09-22 12:59         ` Mikael Abrahamsson
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Senebald @ 2014-09-22 12:50 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: linux-raid



On 22.09.2014, at 14:02, Mikael Abrahamsson <swmike@swm.pp.se> wrote:

> On Mon, 22 Sep 2014, Martin Senebald wrote:
> 
>>> Was this an idea you had that you didn't do, or did you actually execute on it?
>> 
>> I did ..
> 
> What made you think that was a good idea?

Good question; I can't point to a specific source. I guess it was more of an "out of options" decision at the time. (At least it looked that way.)

> I have been rewriting text on the linux-raid wiki to discourage people from doing that. Where did you get the information from?

And obviously I didn't find that. I only looked over the wiki quickly and missed it.

> 
>> the state before the --create
>> 
>> https://gist.github.com/daquan/94239614fc3b67789c9a#file-before-create-assume-clean
> 
> I have no idea how to get you from there.

That is maybe what led me to believe --create might help. I understand that recreating the array was maybe the worst thing to do (it destroyed the reshape state information of the array),
but generally speaking I think the data itself is not yet lost. The reshape position is available for each disk (as I can see from the --examine output). The data on md2 didn't change while the reshape was ongoing (it is an LVM PV and LVM was not active). But I have no clue how to bring the array back to the point in the grow where it failed. I don't understand very well how the grow process for a chunk-size change works.

> 
>>> Under no circumstances do --create on the components.
>> 
>> that doesn't sound so promising anymore :-/
> 
> Indeed.
>>> 
>>> What kernel version and mdadm version do you have?
> 
> Why didn't you answer this crucial question?

I just forgot :) 

	Debian 3.2.60-1+deb7u3 x86_64

> 
> -- 
> Mikael Abrahamsson    email: swmike@swm.pp.se
> 








* Re: raid5 missing disks during chunk size grow
  2014-09-22 12:50       ` Martin Senebald
@ 2014-09-22 12:59         ` Mikael Abrahamsson
  0 siblings, 0 replies; 7+ messages in thread
From: Mikael Abrahamsson @ 2014-09-22 12:59 UTC (permalink / raw)
  To: Martin Senebald; +Cc: linux-raid

On Mon, 22 Sep 2014, Martin Senebald wrote:

> I just forgot :)
>
> 	Debian 3.2.60-1+deb7u3 x86_64

You still didn't provide mdadm version.

I would suggest you look into getting a more recent kernel version and 
mdadm version, because I would guess the mdadm that ships with your 
Debian release is not recent.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: raid5 missing disks during chunk size grow
  2014-09-22 11:55   ` Martin Senebald
  2014-09-22 12:02     ` Mikael Abrahamsson
@ 2014-09-23  0:32     ` NeilBrown
  1 sibling, 0 replies; 7+ messages in thread
From: NeilBrown @ 2014-09-23  0:32 UTC (permalink / raw)
  To: Martin Senebald; +Cc: linux-raid


On Mon, 22 Sep 2014 13:55:24 +0200 Martin Senebald <martin@senebald.de> wrote:

> Am 22.09.2014 um 13:44 schrieb Mikael Abrahamsson <swmike@swm.pp.se>:
> 
> > On Mon, 22 Sep 2014, Martin Senebald wrote:
> > 
> >> My idea was, first bringing back the disks in the array then continue the grow (using the backup file)
> >> add / re-add disk was not working, so i assumed recreate the array with --assume-clean would bring me closer.
> > 
> > Was this an idea you had that you didn't do, or did you actually execute on it?
> 
> I did .. 

oops. Though maybe I should say OOOPS.

Part of your array had one chunk size, part of the array had the other.  By
using "create" you had to choose one chunk size or the other.  Obviously
neither is correct for the whole device.

You are now in a situation where you have made a mess and you need to somehow
recover your data.  It is all there, but how patient and careful can you be?

By far the safest approach would be to find some other storage solution into
which you can recover all the data.  Then you can try to restore the data
there and see if it looks OK.

There are three sections to the data:

 1/ the early part of the array which has been reshaped to the new chunk size.
 2/ the part of the array which is stored in the backup file.
 3/ the late part of the array which has not been reshaped yet.
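As a rough back-of-the-envelope for where those sections sit, assuming the --examine figures above are in KiB (the GiB/GB parentheticals are consistent with that) and that "Reshape pos'n" is an offset into the array as a whole; both assumptions are worth double-checking:

```shell
# Back-of-the-envelope boundaries for the three sections, in KiB.
# Assumes the --examine figures are KiB and that "Reshape pos'n" is an
# array-wide offset -- both are assumptions, not verified facts.
reshape_pos_kib=557547520      # "Reshape pos'n" from --examine
array_size_kib=14640675840     # "Array Size" from --examine
echo "section 1 (new 512K chunks): 0 .. ${reshape_pos_kib} KiB"
echo "section 3 (old 64K chunks):  ends at ${array_size_kib} KiB"
echo "not yet reshaped: $(( array_size_kib - reshape_pos_kib )) KiB"
```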


Depending on which chunk size you used when you created the array (and
assuming that the newly created array has the same data offset as the old
array), either '1' or '3' should be available directly in the newly
created array.  Calculating the exact start and size requires care.  I
suggest you try to work it out and I can check your calculations.

If you copy that out, then 'create' the array with the other chunk size, you
should be able to copy the other large section.
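A sketch of copying the already-reshaped section out of the re-created array, again assuming the reshape position figure is in KiB; the dd line is left commented because it must only ever read from the array, and the output path is a placeholder:

```shell
# Sketch: how much to copy for section 1 (the already-reshaped part),
# assuming "Reshape pos'n" is in KiB. The dd line is left commented;
# /mnt/rescue is a placeholder, and dd must only ever READ the array.
reshape_pos_kib=557547520
count=$(( reshape_pos_kib / 1024 ))   # number of 1 MiB blocks
# dd if=/dev/md2 of=/mnt/rescue/section1.img bs=1M count="$count"
echo "$count"
```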

Getting the data out of the backup file might require careful reading of a
hex dump of that file, to read the 'superblock' and find out exactly what is
stored there and where.  It shouldn't be difficult, but it does need care.

If you do go down this path, please feel free to ask for more specifics and
ask me to check your calculations.

For future reference "--assemble --force" is your friend.
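For the record, a forced assembly with the backup file (what could have been tried before any --create) would have looked roughly like this; only sdc1/sdf1 are named in the thread, so the other device names and the backup-file path are placeholders:

```shell
# Roughly what a forced assembly might have looked like before the
# --create. Only sdc1/sdf1 are named in the thread; the other device
# names and the backup-file path are placeholders.
devices="/dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1"
# mdadm --stop /dev/md2
# mdadm --assemble --force /dev/md2 $devices \
#       --backup-file=/root/md2-grow.backup
echo "$devices" | wc -w
```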

NeilBrown

> 
> > 
> >> What would be the best way to tackle this problem?
> > 
> > Send mdadm --examine from all 6 component drives to the list and let's take it from there.
> 
> the current state of the disks:
> 
> https://gist.github.com/daquan/94239614fc3b67789c9a#file-current-state
> 
> the state before the --create 
> 
> https://gist.github.com/daquan/94239614fc3b67789c9a#file-before-create-assume-clean
> 
> 
> > Under no circumstances do --create on the components.
> 
> that doesn't sound so promising anymore :-/
> 
> > 
> > What kernel version and mdadm version do you have?
> > 
> > -- 
> > Mikael Abrahamsson    email: swmike@swm.pp.se
> > 
> 
> 
> BR Martin
> 



