* Corrupted ext4 filesystem after mdadm manipulation error
From: L.M.J @ 2014-04-24  5:05 UTC
To: linux-raid

Hi,

For the third time I had to replace a failed drive in my home Linux RAID5 box. The previous
replacement went fine, but this time I don't know what I did wrong and I broke my RAID5; at
least, it wouldn't start. /dev/sdb was the failed drive; /dev/sdc and /dev/sdd are OK.

After replacing sdb and creating a new partition on it, I tried to bring the RAID back with
this command:

~# mdadm -Cv /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1
-> '-C' was not a good idea here

I guess I made another mistake there; I should have done this instead:

~# mdadm -Av /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 missing

Maybe this wiped out my data... Going further: pvdisplay, pvscan and vgdisplay all return
empty output.

Google helped me, and I did this:

~# dd if=/dev/md0 bs=512 count=255 skip=1 of=/tmp/md0.txt
[..]
physical_volumes {
        pv0 {
                id = "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW"
                device = "/dev/md0"
                status = ["ALLOCATABLE"]
                flags = []
                dev_size = 7814047360
                pe_start = 384
                pe_count = 953863
        }
}
logical_volumes {

        lvdata {
                id = "JiwAjc-qkvI-58Ru-RO8n-r63Z-ll3E-SJazO7"
                status = ["READ", "WRITE", "VISIBLE"]
                flags = []
                segment_count = 1
[..]

Since I can still see LVM metadata, I guess I haven't lost everything yet...

I tried a last-chance command:

~# pvcreate --uuid "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW" --restorefile /etc/lvm/archive/lvm-raid_00302.vg /dev/md0

Then:

~# vgcfgrestore lvm-raid

~# lvs -a -o +devices
  LV     VG       Attr   LSize   Origin Snap%  Move Log Copy%  Convert Devices
  lvdata lvm-raid -wi-a- 450,00g                                       /dev/md0(148480)
  lvmp   lvm-raid -wi-a-  80,00g                                       /dev/md0(263680)

Then:

~# lvchange -ay /dev/lvm-raid/lv*

I was quite happy up to this point. The problem appears when I try to mount those two LVs
(lvdata & lvmp) as ext4 partitions:

~# mount /home/foo/RAID_mp/

~# mount | grep -i mp
/dev/mapper/lvm--raid-lvmp on /home/foo/RAID_mp type ext4 (rw)

~# df -h /home/foo/RAID_mp
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/lvm--raid-lvmp   79G   61G   19G  77% /home/foo/RAID_mp

Here is the big problem:

~# ls -la /home/foo/RAID_mp
total 0

I took an LVM R/W snapshot of the /dev/mapper/lvm--raid-lvmp LV and ran fsck on it. It
recovered only 50% of the files, all of them placed in the lost+found/ directory with names
starting with #xxxxx.

Is there any last chance to recover my data?

Thanks
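Before any --create is attempted on an array that refuses to assemble, it is worth capturing
what the surviving superblocks still say. A minimal sketch of the read-only checks that would
have helped here, with the member names taken from the message above; none of these commands
write to the disks:

~# mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1 > /root/md0-examine.txt
~# cat /proc/mdstat                      # what the kernel currently sees
~# blkid /dev/sd[bcd]1 /dev/md0          # signature probe only, nothing is modified

The --examine output records each member's role, event count and data offset, which is
exactly the information needed later in this thread to put the drives back in the right
order.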
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: L.M.J @ 2014-04-24 17:48 UTC
To: linux-raid

Up please :-(

On Thu, 24 Apr 2014 07:05:48 +0200, "L.M.J" <linuxmasterjedi@free.fr> wrote:

> [original message quoted in full, trimmed here]
[parent not found: <CAK_KU4a+Ep7=F=NSbb-hqN6Rvayx4QPWm-M2403OHn5-LVaNZw@mail.gmail.com>]
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: L.M.J @ 2014-04-24 18:35 UTC
To: Scott D'Vileskis; +Cc: linux-raid

Hello Scott,

Do you think my data is 100% lost for sure? fsck recovered 50% of the files; don't you think
there is still something to save?

Thanks

On Thu, 24 Apr 2014 14:13:05 -0400, "Scott D'Vileskis" <sdvileskis@gmail.com> wrote:

> NEVER USE "CREATE" ON FILESYSTEMS OR RAID ARRAYS UNLESS YOU KNOW WHAT YOU ARE DOING!
> CREATE destroys things in the creation process, especially with the --force option.
>
> The create argument is only meant to create a new array: it will start with two drives as
> 'good' drives and the last will likely be the degraded drive, so it will start resyncing
> and blowing away data on the last drive. If you used the --assume-clean argument, and it
> DID NOT resync the drives, you might be able to recreate the array with the two good disks,
> provided you know the original order.
>
> If you used the --create option, and didn't have your disks in the same order they were
> originally in, you probably lost your data.
>
> Since you replaced a disk, with no data (or worse, with bad data), you should have
> assembled the array in degraded mode WITHOUT the --assume-clean argument.
>
> If C & D contain your data, and B used to:
> mdadm --assemble /dev/md0 missing /dev/sdc1 /dev/sdd1
> You might have to --force the assembly. If it works, and it runs in degraded mode, mount
> your filesystem and take a backup.
>
> Next, add your replacement drive back in:
> mdadm --add /dev/md0 /dev/sdb1
> (Note: if sdb1 has some superblock data, you might have to --zero-superblock first)
>
> Good luck.
>
> On Thu, Apr 24, 2014 at 1:48 PM, L.M.J <linuxmasterjedi@free.fr> wrote:
> > [earlier messages quoted in full, trimmed here]
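A consolidated sketch of the sequence Scott outlines, assuming /dev/sdc1 and /dev/sdd1 still
hold the original data, that the LVM volume group activates cleanly, and that /mnt and
/backup are hypothetical paths with enough space:

~# mdadm --stop /dev/md0
~# mdadm --assemble --force /dev/md0 /dev/sdc1 /dev/sdd1   # degraded, two members only
~# vgchange -ay lvm-raid
~# mount -o ro /dev/lvm-raid/lvmp /mnt                     # read-only until backed up
~# rsync -a /mnt/ /backup/lvmp/
~# umount /mnt
~# # Only after a verified backup: wipe the stale superblock on the new disk and add it.
~# mdadm --zero-superblock /dev/sdb1
~# mdadm --add /dev/md0 /dev/sdb1

Note that 'missing' is a --create placeholder; with --assemble one simply lists the members
that are actually present, and md runs the array degraded.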
[parent not found: <CAK_KU4Zh-azXEEzW4f1m=boCZDKevqaSHxW0XoAgRdrCbm2PkA@mail.gmail.com>]
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: L.M.J @ 2014-04-24 19:53 UTC
Cc: Scott D'Vileskis, linux-raid

On Thu, 24 Apr 2014 15:39:11 -0400, "Scott D'Vileskis" <sdvileskis@gmail.com> wrote:

> Your data is split 3 ways: 50% on one disk, 50% on another disk, and one disk's worth of
> parity.
>
> Now, it's not that simple, because the data is not continuous. It is written across the
> three drives in chunks, with the parity alternating between the three drives.
>
> If you were able to recover 50%, it probably means one disk contains valid data.
>
> Were you able to recover anything larger than your chunk size? Are larger files (MP3s
> and/or movies) actually playable? Likely not.

I ran fsck on a snapshot of the LVM partition. It recovered about 50% of the files, all of
them located in /lost+found/. Here are the sizes:

5,5M 2013-04-24 17:53 #4456582
5,7M 2013-04-24 17:53 #4456589
 16M 2013-04-24 17:53 #4456590
 25M 2013-04-24 17:53 #4456594
 17M 2013-04-24 17:53 #4456578
 18M 2013-04-24 17:53 #4456580
1,3M 2013-04-24 17:54 #4456597
1,1M 2013-04-24 17:54 #4456596
 17M 2013-04-24 17:54 #4456595
2,1M 2013-04-24 17:54 #4456599
932K 2013-04-24 17:54 #4456598

> You might get lucky trying to assemble the array in degraded mode with the 2 good disks,
> as long as the array didn't resync your new disk + good disk to the other good disk...

I already tried that: re-assembling the array with the good disks and then adding the new
one. It didn't work as expected.

> If added properly, it would have resynced the two good disks with the blank disk. Try
> doing a 'hd /dev/sdb1' to see if there is data on the new disk

~# hd /dev/sdb1
00000000 37 53 2f 78 4b 00 13 6f 41 43 55 5b 45 14 08 16 |7S/xK..oACU[E...|
00000010 01 03 7e 2a 11 63 13 6f 6b 01 64 6b 03 07 1a 06 |..~*.c.ok.dk....|
00000020 04 56 44 00 46 2a 32 6e 02 4d 56 12 6d 54 6d 66 |.VD.F*2n.MV.mTmf|
00000030 4b 06 18 00 41 49 28 27 4c 38 30 6b 27 2d 1f 25 |K...AI('L80k'-.%|
00000040 07 59 22 0c 19 5e 4c 39 25 2f 27 59 2f 7c 79 10 |.Y"..^L9%/'Y/|y.|
00000050 31 7a 4b 6e 53 49 41 56 13 39 15 4b 58 29 0f 15 |1zKnSIAV.9.KX)..|
00000060 0b 18 09 0f 6b 68 48 0e 7f 03 24 17 66 01 45 12 |....khH...$.f.E.|
00000070 31 1b 7e 1d 14 3c 10 0f 19 70 2d 05 10 2e 51 2a |1.~..<...p-...Q*|
00000080 4e 54 3a 29 7f 00 45 5a 4d 3e 4c 26 1a 22 2b 57 |NT:)..EZM>L&."+W|
00000090 33 7e 46 51 41 56 79 2a 4e 45 3c 30 6f 1d 11 56 |3~FQAVy*NE<0o..V|
000000a0 4d 1e 64 07 2b 02 1d 01 31 11 58 49 45 5f 7e 2a |M.d.+...1.XIE_~*|
000000b0 4e 45 57 67 00 16 00 54 4e 0f 55 10 1b 14 1c 00 |NEWg...TN.U.....|
000000c0 7f 58 58 45 54 5b 46 10 0d 2a 3a 7e 1c 08 11 45 |.XXET[F..*:~...E|
000000d0 53 54 7d 10 01 14 1e 07 48 52 54 10 3f 55 58 45 |ST}.....HRT.?UXE|
000000e0 64 61 2b 0a 19 1f 45 1d 1d 02 4b 7e 1d 1b 19 02 |da+...E...K~....|
000000f0 0d 4c 2a 4e 54 50 05 06 01 3e 17 0e 57 64 17 4f |.L*NTP...>..Wd.O|
00000100 4a 7f 42 7d 4c 52 09 49 53 45 43 1e 7c 6e 12 00 |J.B}LR.ISEC.|n..|
00000110 13 36 03 0b 12 50 4e 48 34 7e 7d 3a 45 12 28 51 |.6...PNH4~}:E.(Q|
00000120 2a 48 3e 3a 42 58 51 7a 2e 62 12 7e 4e 32 2a 17 |*H>:BXQz.b.~N2*.|
[...]

PS: Why does 'reply' on this list answer the previous sender instead of the mailing-list
address?

> > [remainder of the earlier exchange quoted in full, trimmed here]
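A quick, read-only way to answer the "is there anything recognisable on this disk?"
question, assuming the device names used above; none of these commands write anything:

~# blkid -p /dev/sdb1         # low-level probe for known signatures (md, LVM, ext4, ...)
~# file -s /dev/sdb1          # same idea with a different tool
~# mdadm --examine /dev/sdb1  # is there an md superblock, and with which role and events?

A freshly partitioned replacement disk should show little more than the md superblock
written when it was added; the random-looking hd output above is consistent with RAID5
data/parity chunks that no longer line up with the other members, rather than with an
intact filesystem.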
[parent not found: <CAK_KU4aDDaUSGgcGBwCeO+yE0Qa_pUmMdAHMu7pqO7dqEEC71g@mail.gmail.com>]
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: L.M.J @ 2014-04-24 19:56 UTC
To: linux-raid; +Cc: Scott D'Vileskis

On Thu, 24 Apr 2014 15:43:33 -0400, "Scott D'Vileskis" <sdvileskis@gmail.com> wrote:

> Note, if you dare to --create the array again with your two previous disks, you'll want to
> create the array with the 'missing' disk in the right place.
>
> mdadm --create /dev/md0 missing /dev/sdc1 /dev/sdd1
> (Assuming your original array was sdb, sdc, sdd)
>
> You'll probably need a force and maybe a start-degraded.
>
> Then, I would try the recovery on the resulting drive.

I think I have messed up my drives enough already; it might be dangerous to recreate the
array again and again :-(
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: Scott D'Vileskis @ 2014-04-24 20:31 UTC
To: L.M.J; +Cc: linux-raid@vger.kernel.org

I have been replying directly to you, not to the mailing list, since your case seems to be a
case of user-screwed-up-his-own-data, and not a problem with mdadm/Linux RAID, nor a problem
whose answer will necessarily help someone else (it is not likely someone will create a mess
in exactly the same manner you have). Also, it is easier to click reply than reply-all and
have to worry about the top-posting police getting on my case.

To summarize:
1) You lost a disk. Even down a disk, you should have been able to run/start the array (in
degraded mode) with only 2 disks, mounted the filesystem, backed up data, etc.
2) You then should have simply partitioned and --add'ed the new disk. mdadm would have
written a superblock to the new disk and resynced the data.

Would have, could have, should have... Hindsight is 20/20. A mistake was made; it happens to
all of us at some point or another. (I've lost arrays and filesystems with careless use of
'dd' once upon a time. Once, I was giving a RAID demo to a friend with loop devices, mistyped
something, and blew something else away.)

Unfortunately, you might have clobbered your drives by recreating the array. I assume your
original disks were in the order sdb, sdc, sdd. If so, you certainly clobbered your
superblocks and changed the order when you did this:

> ~# mdadm -Cv /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1

You changed the order, but because of the --assume-clean, it shouldn't have started a resync
of the data. Your filesystem probably had a fit though, since your data was effectively put
through a 3-piece strip-type paper shredder. You should be able to reorder things, though.

IMPORTANT: At any point did your drives do a resync?

Assuming no, and assuming you haven't done any other writing to your disks (besides
rewriting the superblocks), you can probably correct the order of your drives by reissuing
the --create command with the two original drives, in the proper order, and the missing
drive as the placeholder. (This will rewrite the superblocks again, but hopefully in the
right order.)

mdadm -Cv /dev/md0 --level=5 --raid-devices=3 missing /dev/sdc1 /dev/sdd1

Note, you need the 'missing' drive, so the RAID calculates the missing data instead of
reading chunks from a blank drive.

If you can start that array with 2 devices (it will be degraded with only 2/3 drives) you
should be able to mount and recover your data. You may need to run a full fsck again, since
your last fsck probably made a mess.

Assuming you can mount and copy your data, you can then add your 'new' drive to the array
with the --add argument. (Note, you'll have to clear its superblock or mdadm will object.)
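A sketch of the re-create Scott describes, followed by read-only checks before anything else
is touched; /mnt is a placeholder mount point and the VG/LV names come from earlier in the
thread:

~# mdadm --stop /dev/md0
~# mdadm --create /dev/md0 --level=5 --raid-devices=3 missing /dev/sdc1 /dev/sdd1
~# blkid /dev/md0                        # should report an LVM2 PV if the order is right
~# pvscan
~# vgchange -ay lvm-raid
~# mount -o ro /dev/lvm-raid/lvmp /mnt   # look, don't write

If blkid and pvscan see nothing, stop there rather than running a read/write fsck; as long
as nothing is written to the data area, a wrong guess at the order is still reversible.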
* Why would a recreation cause a different number of blocks??
From: Jeff Wiegley @ 2014-04-24 22:25 UTC
Cc: linux-raid@vger.kernel.org

Still trying to restore my large storage system without totally screwing it up.

There are two different md RAID devices. Both had their superblocks wiped, and one of the
six drives is screwed (the other 5 are fine).

Before the human failure (an OS reinstall; I only deleted the MD devices in the Ubuntu
installer, which I think just zeros the md superblocks of the affected partitions):

Personalities : [raid6] [raid5] [raid4] [raid1] [linear] [multipath] [raid0] [raid10]
md3 : active raid6 sda1[0] sdc1[2] sde1[4] sdb1[1] sdd1[6]
      1073735680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUUUU_]

I recreated the device with:

mdadm --create --assume-clean --level=6 --raid-devices=6 /dev/md0 /dev/sdd1 /dev/sdb1 /dev/sde1 /dev/sdc1 /dev/sda1 missing

and now it reports:

root@nas:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda1[4] sdc1[3] sde1[2] sdb1[1] sdd1[0]
      1073215488 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUUUU_]

Why did my block count change? The disk partitions weren't touched or changed at any point.
Shouldn't I have gotten the same size?

The created device isn't working. There is supposed to be a LUKS-encrypted volume there, but
luksOpen reports there is no LUKS header (and there used to be). Would the odd change in size
indicate total corruption?

- Jeff
* Re: Why would a recreation cause a different number of blocks??
From: Mikael Abrahamsson @ 2014-04-25  3:34 UTC
To: Jeff Wiegley; +Cc: linux-raid@vger.kernel.org

On Thu, 24 Apr 2014, Jeff Wiegley wrote:

> Why did my block count change? The disk partitions weren't touched or changed at any
> point. Shouldn't I have gotten the same size?

Defaults in mdadm have changed over time, so data offsets might be different. In order to
get the exact same data offset you need to use the same mdadm version as was originally
used, or at least know the values it used and use mdadm 3.3, which allows you to specify
them at creation time.

> The created device isn't working. There is supposed to be a LUKS-encrypted volume there,
> but luksOpen reports there is no LUKS header (and there used to be). Would the odd change
> in size indicate total corruption?

No, the change in size indicates that the data offsets are not the same, so the beginning of
your volume is now in the wrong place.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
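A sketch of how the offset difference shows up and how a new enough mdadm can pin it. The
2048-sector figure is only an illustrative value (it requires a surviving superblock to read,
which Jeff no longer has), and the unit mdadm expects for --data-offset should be verified in
mdadm(8) for the version in use:

~# mdadm --examine /dev/sda1 | grep -i 'data offset'
    Data Offset : 2048 sectors           # example output, not real data
~# # mdadm >= 3.3 lets the old offset be forced when recreating:
~# mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=6 \
       --data-offset=1024 /dev/sdd1 /dev/sdb1 /dev/sde1 /dev/sdc1 /dev/sda1 missing
~# # (1024 assumes the option is taken in KiB, i.e. 2048 sectors; check before running.)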
* Re: Why would a recreation cause a different number of blocks??
From: Jeff Wiegley @ 2014-04-25  5:02 UTC
To: Mikael Abrahamsson; +Cc: linux-raid@vger.kernel.org

I'll look into this. I had thought about that sort of thing, so I went and installed Ubuntu
12.04, which I thought was what I started all this with, but I might have set it up earlier
than 12.04 and I might have used Gentoo.

Are the mdadm defaults specific to the mdadm version, or would Ubuntu and Gentoo have
specified different defaults in something like an /etc/defaults/ourmdadm.cfg?

If it's mdadm, could I just grab old copies of the mdadm sources, compile them one version
after the other, and try each one?

Thanks,

- Jeff

On 4/24/2014 8:34 PM, Mikael Abrahamsson wrote:
> [previous message quoted in full, trimmed here]
* Re: Why would a recreation cause a different number of blocks??
From: Mikael Abrahamsson @ 2014-04-25  6:01 UTC
To: Jeff Wiegley; +Cc: linux-raid@vger.kernel.org

On Thu, 24 Apr 2014, Jeff Wiegley wrote:

> If it's mdadm, could I just grab old copies of the mdadm sources, compile them one version
> after the other, and try each one?

As far as I know, it's mdadm-version specific. If you look in the archives I'm sure you'll
be able to find the old offsets, and you can use the latest mdadm with those offsets and
hopefully things will work.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
* Re: Why would a recreation cause a different number of blocks??
From: Jeff Wiegley @ 2014-04-25  6:45 UTC
To: Mikael Abrahamsson; +Cc: linux-raid@vger.kernel.org

Ooooh... making progress. I downloaded and compiled mdadm-3.1.4 and used that to create the
array. The size is the same, and luksOpen recognizes it as LUKS, asks for the passphrase and
accepts it.

However, mount says it needs to be told the filesystem type, and if I add -t xfs it still
fails to mount the filesystem.

Any thoughts on why the recreated array would satisfy and pass cryptsetup's sanity checks,
but the resulting decrypted data is not recognizable as XFS?

- Jeff

On 4/24/2014 11:01 PM, Mikael Abrahamsson wrote:
> [previous message quoted in full, trimmed here]
* Re: Why would a recreation cause a different number of blocks??
From: Mikael Abrahamsson @ 2014-04-25  7:25 UTC
To: Jeff Wiegley; +Cc: linux-raid@vger.kernel.org

On Thu, 24 Apr 2014, Jeff Wiegley wrote:

> Any thoughts on why the recreated array would satisfy and pass cryptsetup's sanity checks,
> but the resulting decrypted data is not recognizable as XFS?

Cryptsetup probably has a very small superblock which fits in one chunk, so if you got the
order of the other drives wrong, the fs will still be garbled while cryptsetup thinks
everything is fine.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
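A way to test one candidate member order without writing to the data, assuming the device
list from the earlier message; 'probe' is just a hypothetical mapping name, and the only
writes are the md superblocks that each --create rewrites:

~# mdadm --stop /dev/md0
~# mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=6 \
       /dev/sdd1 /dev/sdb1 /dev/sde1 /dev/sdc1 /dev/sda1 missing
~# cryptsetup luksOpen --readonly /dev/md0 probe
~# xfs_repair -n /dev/mapper/probe      # -n: check only, never modifies anything
~# cryptsetup luksClose probe

Because the LUKS header fits within the first chunk, which lives on a single member,
luksOpen succeeding only proves that member (and the data offset) is right; a read-only
xfs_repair -n exercises data spread across all the members, so it is the check worth
repeating as the order is permuted.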
* Re: Why would a recreation cause a different number of blocks??
From: Jeff Wiegley @ 2014-04-25  7:05 UTC
To: Mikael Abrahamsson; +Cc: linux-raid@vger.kernel.org

Here's something about my closer call... In the original, mdstat lists:

md3 : active raid6 sda1[0] sdc1[2] sde1[4] sdb1[1] sdd1[6]
      1073735680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUUUU_]

What do the numbers after the drives mean? They appear to match the number in the Device
Role line of an --examine.

When I recreated the array with 3.1.4 I used:

mdadm --create --assume-clean --level=6 --raid-devices=6 /dev/md0 /dev/sdd1 /dev/sdb1 /dev/sde1 /dev/sdc1 /dev/sda1 missing

and now mdstat (which LUKS is happy with) reports:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd1[0] sda1[4] sdc1[3] sde1[2] sdb1[1]
      1073735680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUUUU_]

which are not the same numbers. Now I figure I can reorder my command-line arguments, but
(due to prior drive failures) I think the mapping (from the original) should be:

0: /dev/sda1
1: /dev/sdb1
2: /dev/sdc1
3: ????
4: /dev/sde1
5: ?????
6: /dev/sdd1
7: /dev/sdf1 (this is my current dead drive that I have to leave out, bringing the array up
   degraded, because this drive is probably very out of sync)

Though I know to substitute "missing" for /dev/sdf1 to leave it out, my question is: what do
I do about the active device numbers 3 and 5 on the command line? Also put "missing" for
those? I don't think that will work, because wouldn't it think I have 3 drives dead and
refuse to start the array?

But it does seem like I'm getting closer... and if I can get this partition up, I have a
high probability of recovering the larger, important array that is in /dev/sd[a-o]2.

Thanks again,

- Jeff

On 4/24/2014 11:01 PM, Mikael Abrahamsson wrote:
> [previous message quoted in full, trimmed here]
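For what it is worth, the bracketed numbers in /proc/mdstat are md's internal device
numbers, which can grow past raid-devices as failed members are replaced; the slot a member
actually fills is what --examine reports as the device role. A sketch of that check, with
invented output values, which of course only works while the superblocks are still intact:

~# mdadm --examine /dev/sda1 | grep -E 'Raid Devices|Events|Device Role'
   Raid Devices : 6
         Events : 123456                  # illustrative values only
    Device Role : Active device 0
~# # ...repeat for each member and compare roles and event counts.

At --create time the slots are simply filled in command-line order, with each 'missing'
holding one slot, so what has to be preserved is which surviving member goes in which
position; the old bracket numbers themselves neither can nor need to be reproduced.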
[parent not found: <CAK_KU4YUejncX9yQk4HM5HE=1-qPPxOibuRauFheo3jaBc8SaQ@mail.gmail.com>]
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: L.M.J @ 2014-04-25  5:13 UTC
To: Scott D'Vileskis; +Cc: linux-raid@vger.kernel.org

On Thu, 24 Apr 2014 16:22:49 -0400, "Scott D'Vileskis" <sdvileskis@gmail.com> wrote:

> I have been replying directly to you, not to the mailing list, since your case seems to be
> a case of user-screwed-up-his-own-data, and not a problem with mdadm/linux raid, nor a
> problem that will necessarily help someone else (since it is not likely someone will
> create a mess in exactly the same manner you have).

Ah, OK.

> To summarize:
> 1) You lost a disk. Even down a disk, you should have been able to run/start the array (in
> degraded mode) with only 2 disks, mounted the filesystem, etc.

Yes, of course; it worked with only 2 disks for the last 3 weeks.

> 2) You then should have simply partitioned and then --add'ed the new disk. mdadm would
> have written a superblock to the new disk, and resynced the data.
>
> I assume your original disks were in the order sdb, sdc, sdd.

Exactly.

> Unfortunately, you might have clobbered your drives by recreating the array. You certainly
> clobbered your superblocks and changed the order when you did this:
> ~# mdadm -Cv /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1
>
> You changed the order, but because of the assume-clean, it shouldn't have started a resync
> of the data. Your file system probably had a fit though.
>
> Hindsight is 20/20, a mistake was made, it happens to all of us at some point or another.
>
> IMPORTANT: At any point did your drives do a resync?

Unfortunately: yes, a resync occurred when I...

> Assuming no, and assuming you haven't done any other writing to your disks (besides
> rewriting the superblocks), you can probably correct the order of your drives by reissuing
> the --create command with the two original drives, in the proper order, and the missing
> drive as the placeholder. (This will rewrite the superblocks again, but hopefully in the
> right order.)
> mdadm -Cv /dev/md0 --level=5 --raid-devices=3 missing /dev/sdc1 /dev/sdd1
>
> If you can start that array (it will be degraded with only 2/3 drives) you should be able
> to mount and recover your data. You may need to run a full fsck again since your last fsck
> probably made a mess.

I shut down the computer, removed the old disk and added the new one. Maybe I messed up the
SATA cables too. Unfortunately, I tried to start the degraded array like this:

~# mdadm --assemble --force /dev/sdc1 /dev/sdd1

which didn't work. I created a partition on sdb, and then came the mistake:

~# mdadm --stop /dev/md0
~# mdadm -Cv /dev/md0 --assume-clean --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

That didn't work any better, so then:

~# mdadm --stop /dev/md0
~# mdadm --create /dev/md0 --level=5 --assume-clean --raid-devices=3 /dev/sdc1 /dev/sdd1 missing
~# mdadm --manage /dev/md0 --add /dev/sdb1

Looks even worse, doesn't it?

> Assuming you can mount and copy your data, you can then --add your 'new' drive to the
> array with the --add argument. (Note, you'll have to clear its superblock or mdadm will
> object.)

And what do you think of the files fsck recovered:

5,5M 2013-04-24 17:53 #4456582
5,7M 2013-04-24 17:53 #4456589
 16M 2013-04-24 17:53 #4456590
 25M 2013-04-24 17:53 #4456594
 17M 2013-04-24 17:53 #4456578
 18M 2013-04-24 17:53 #4456580
1,3M 2013-04-24 17:54 #4456597
1,1M 2013-04-24 17:54 #4456596
 17M 2013-04-24 17:54 #4456595
2,1M 2013-04-24 17:54 #4456599
932K 2013-04-24 17:54 #4456598

Well, what should I do now? mkfs everything and restart from scratch?
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: Mikael Abrahamsson @ 2014-04-25  6:04 UTC
To: L.M.J; +Cc: Scott D'Vileskis, linux-raid@vger.kernel.org

On Fri, 25 Apr 2014, L.M.J wrote:

> Well, what should I do now? mkfs everything and restart from scratch?

Most likely this is your only option. First you overwrote the superblocks so the drives came
up in the wrong order (most likely), and then you ran a read/write fsck on it.

The thing to do is to make sure md is read-only and use fsck in read-only mode as a
diagnostic tool to see if everything is right. If you get it wrong and run fsck in
read/write mode, it is going to change things destructively.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se
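A sketch of the read-only diagnostic pass Mikael describes, reusing the VG/LV names from
earlier in the thread and /mnt as a placeholder mount point:

~# mdadm --readonly /dev/md0                     # array refuses all writes from now on
~# vgchange -ay lvm-raid
~# fsck.ext4 -n /dev/lvm-raid/lvmp               # -n: report problems, fix nothing
~# mount -o ro,noload /dev/lvm-raid/lvmp /mnt    # noload skips ext4 journal replay

If the read-only fsck sees a sane superblock and a mostly intact directory tree, the member
order is probably right and a backup can be taken from the read-only mount; if it sees
garbage, at least nothing has been made worse.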
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: L. M. J @ 2014-04-25 11:43 UTC
To: Mikael Abrahamsson; +Cc: Scott D'Vileskis, linux-raid@vger.kernel.org

On 25 April 2014 08:04:13 CEST, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> [previous message quoted in full, trimmed here]

I haven't done a R/W fsck yet, only on an LVM snapshot to test, never on the real data. Does
that change anything?

-- 
May the open source be with you, my young padawan.
Sent from my phone; please excuse the brevity.
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: Scott D'Vileskis @ 2014-04-25 13:36 UTC
To: L. M. J; +Cc: linux-raid@vger.kernel.org

Drive B has bogus data on it, since it was resync'd with C & D in the wrong order.
Fortunately, your --add should only have changed B, not C & D.

As a last-ditch effort, try the --create again but with the two potentially good disks in
the right order:

mdadm --create /dev/md0 --level=5 --raid-devices=3 missing /dev/sdc1 /dev/sdd1

Note: the following is where I have reproduced your problem with loop devices.

#Create 3 200MB files
root@Breadman:/home/scott# mkdir raidtesting
root@Breadman:/home/scott# cd raidtesting/
root@Breadman:/home/scott/raidtesting# fallocate -l200000000 sdb
root@Breadman:/home/scott/raidtesting# fallocate -l200000000 sdc
root@Breadman:/home/scott/raidtesting# fallocate -l200000000 sdd
root@Breadman:/home/scott/raidtesting# losetup /dev/loop2 sdb
root@Breadman:/home/scott/raidtesting# losetup /dev/loop3 sdc
root@Breadman:/home/scott/raidtesting# losetup /dev/loop4 sdd
root@Breadman:/home/scott/raidtesting# mdadm --create /dev/md0 -n3 -l5 /dev/loop2 /dev/loop3 /dev/loop4
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

root@Breadman:/home/scott/raidtesting# cat /proc/mdstat
md0 : active raid5 loop4[3] loop3[1] loop2[0]
      388096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

root@Breadman:/home/scott/raidtesting# mkfs.reiserfs /dev/md0
mkfs.reiserfs 3.6.21 (2009 www.namesys.com)
<SNIP>
ReiserFS is successfully created on /dev/md0.

root@Breadman:/home/scott/raidtesting# mkdir temp
root@Breadman:/home/scott/raidtesting# mount /dev/md0 temp/

#Then I copied a file to it:
root@Breadman:/home/scott/raidtesting# md5sum temp/systemrescuecd-x86-0.4.3.iso
b88ce25b156619a9a344889bc92b1833  temp/systemrescuecd-x86-0.4.3.iso

#And failed a disk
root@Breadman:/home/scott/raidtesting# umount temp/
root@Breadman:/home/scott/raidtesting# mdadm --fail /dev/md0 /dev/loop2
mdadm: set /dev/loop2 faulty in /dev/md0
root@Breadman:/home/scott/raidtesting# cat /proc/mdstat
md0 : active raid5 loop4[3] loop3[1] loop2[0](F)
      388096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]

#Stopped array, removed disk, replaced disk by creating a new file
root@Breadman:/home/scott/raidtesting# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@Breadman:/home/scott/raidtesting# losetup -d /dev/loop2
root@Breadman:/home/scott/raidtesting# rm sdb
root@Breadman:/home/scott/raidtesting# fallocate -l200000000 sdb-new
root@Breadman:/home/scott/raidtesting# losetup /dev/loop2 sdb-new

#WRONG: Create array in wrong order
root@Breadman:/home/scott/raidtesting# mdadm --create /dev/md0 --assume-clean -l5 -n3 /dev/loop3 /dev/loop4 /dev/loop2
mdadm: /dev/loop3 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Apr 25 09:10:31 2014
mdadm: /dev/loop4 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Apr 25 09:10:31 2014
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
root@Breadman:/home/scott/raidtesting# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 loop2[2] loop4[1] loop3[0]
      388096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

root@Breadman:/home/scott/raidtesting# mount /dev/md0 temp/
mount: you must specify the filesystem type

#Nope, doesn't mount, filesystem clobbered, or not?

root@Breadman:/home/scott/raidtesting# mdadm --stop /dev/md0
mdadm: stopped /dev/md0

#Recreate the array, with missing disk in the right place
root@Breadman:/home/scott/raidtesting# mdadm --create /dev/md0 -l5 -n3 missing /dev/loop3 /dev/loop4
mdadm: /dev/loop3 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Apr 25 09:17:38 2014
mdadm: /dev/loop4 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Apr 25 09:17:38 2014
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
root@Breadman:/home/scott/raidtesting# mount /dev/md0 temp/
root@Breadman:/home/scott/raidtesting# ls temp/
systemrescuecd-x86-0.4.3.iso
root@Breadman:/home/scott/raidtesting# md5sum temp/systemrescuecd-x86-0.4.3.iso
b88ce25b156619a9a344889bc92b1833  temp/systemrescuecd-x86-0.4.3.iso

#Notice we are in degraded mode
root@Breadman:/home/scott/raidtesting# cat /proc/mdstat
md0 : active raid5 loop4[2] loop3[1]
      388096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]

#Add our replacement disk:
root@Breadman:/home/scott/raidtesting# mdadm --add /dev/md0 /dev/loop2
mdadm: added /dev/loop2

root@Breadman:/home/scott/raidtesting# cat /proc/mdstat
md0 : active raid5 loop2[3] loop4[2] loop3[1]
      388096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
      [============>........]  recovery = 62.1% (121316/194048) finish=0.0min speed=12132K/sec

#After a while (a short while with 200MB loop devices):
root@Breadman:/home/scott/raidtesting# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid5 loop2[3] loop4[2] loop3[1]
      388096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
* Re: Corrupted ext4 filesystem after mdadm manipulation error
From: L.M.J @ 2014-04-25 14:43 UTC
To: Scott D'Vileskis; +Cc: linux-raid@vger.kernel.org

On Fri, 25 Apr 2014 09:36:12 -0400, "Scott D'Vileskis" <sdvileskis@gmail.com> wrote:

> As a last-ditch effort, try the --create again but with the two potentially good disks in
> the right order:
>
> mdadm --create /dev/md0 --level=5 --raid-devices=3 missing /dev/sdc1 /dev/sdd1

root@gateway:~# mdadm --create /dev/md0 --level=5 --raid-devices=3 missing /dev/sdc1 /dev/sdd1
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Apr 25 16:20:32 2014
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Fri Apr 25 16:20:32 2014
Continue creating array? y
mdadm: array /dev/md0 started.

root@gateway:~# ls -l /dev/md*
brw-rw---- 1 root disk   9, 0 2014-04-25 16:34 /dev/md0
brw-rw---- 1 root disk 254, 0 2014-04-25 16:19 /dev/md_d0
lrwxrwxrwx 1 root root      7 2014-04-25 16:04 /dev/md_d0p1 -> md/d0p1
lrwxrwxrwx 1 root root      7 2014-04-25 16:04 /dev/md_d0p2 -> md/d0p2
lrwxrwxrwx 1 root root      7 2014-04-25 16:04 /dev/md_d0p3 -> md/d0p3
lrwxrwxrwx 1 root root      7 2014-04-25 16:04 /dev/md_d0p4 -> md/d0p4

/dev/md:
total 0
brw------- 1 root root 254, 0 2014-04-25 16:04 d0
brw------- 1 root root 254, 1 2014-04-25 16:04 d0p1
brw------- 1 root root 254, 2 2014-04-25 16:04 d0p2
brw------- 1 root root 254, 3 2014-04-25 16:04 d0p3
brw------- 1 root root 254, 4 2014-04-25 16:04 d0p4

root@gateway:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[2] sdc1[1]
      3907023872 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>

root@gateway:~# pvscan
  No matching physical volumes found
root@gateway:~# pvdisplay
root@gateway:~# dd if=/dev/md0 of=/tmp/md0.dd count=10 bs=1M
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0,271947 s, 38,6 MB/s

In /tmp/md0.dd I can see a lot of binary data and, here and there, some text:

physical_volumes {
        pv0 {
                id = "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW"
                device = "/dev/md0"
                status = ["ALLOCATABLE"]
                flags = []
                dev_size = 7814047360
                pe_start = 384
                pe_count = 953863
        }
}
logical_volumes {
        lvdata {
                id = "JiwAjc-qkvI-58Ru-RO8n-r63Z-ll3E-SJazO7"
                status = ["READ", "WRITE", "VISIBLE"]
                flags = []
                segment_count = 1
                segment1 {
                        start_extent = 0
                        extent_count = 115200
                        type = "striped"
                        stripe_count = 1        # linear
                        stripes = [
[...]
        lvdata_snapshot_J5 {
                id = "Mcvgul-Qo2L-1sPB-LvtI-KuME-fiiM-6DXeph"
                status = ["READ"]
                flags = []
                segment_count = 1
                segment1 {
                        start_extent = 0
                        extent_count = 25600
                        type = "striped"
                        stripe_count = 1        # linear
                        stripes = [
                                "pv0", 284160
                        ]
                }
        }
[...]

lvdata_snapshot_J5 is a snapshot I created a few days before my mdadm chaos, so I'm pretty
sure some data is still on the drives... Am I wrong?

Thanks
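pvscan finding nothing while the dump plainly still contains LVM metadata usually means only
the PV label at the start of /dev/md0 is gone. A hedged sketch of the checks and the same
restore already used in the first message (the UUID and archive file name are the ones
quoted there); only pvcreate and vgcfgrestore write, and only to the LVM label/metadata area
at the start of the PV, not to the filesystem data:

~# pvck /dev/md0                       # read-only check of the LVM label and metadata areas
~# pvcreate --uuid "5DZit9-6o5V-a1vu-1D1q-fnc0-syEj-kVwAnW" \
       --restorefile /etc/lvm/archive/lvm-raid_00302.vg /dev/md0
~# vgcfgrestore lvm-raid
~# vgchange -ay lvm-raid
~# fsck.ext4 -n /dev/lvm-raid/lvmp     # read-only: does this member order yield a filesystem?

Note that the recreated array also has to match the original chunk size and metadata version
(the mdstat above shows a 64k chunk and no 1.2 superblock line, so it is worth checking what
the old array used); a wrong chunk size or data offset scrambles the data just as a wrong
member order does.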
* Is disk order relative or are the numbers absolute?
From: Jeff Wiegley @ 2014-04-25 18:37 UTC
To: linux-raid@vger.kernel.org

I'm still trying to recover my array, and I'm getting close; I think I just have to get the
disk order correct now.

Over the past year I've had a couple of failures and replaced disks. This changed the drive
numbers. mdstat before everything went to hell was:

md4 : active raid6 sdf2[7](F) sda2[0] sdc2[2] sde2[4] sdb2[1] sdd2[6]
      10647314432 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUUUU_]

Does this indicate an order of:

0: sda2
1: sdb2
2: sdc2
3: ???? (previous dead drive I replaced, I'm guessing)
4: sde2
5: ???? (second previously dead/replaced drive)
6: sdd2
7: sdf2 (which is currently dead/failed)

I have to recreate the array due to zeroing the superblocks during the install (though I
have not changed partition tables or ever caused a resync of any drives).

My question is: I know I can get the five good drives recreated into an array, but I don't
know how to give them specific numbers. I can get their relative order correct with:

--create --assume-clean ... /dev/sd{a,b,c,e,d,f}2

but sde2 will be numbered 3, not 4, and sdd2 will be 4, not 5. Will these number changes not
make a difference because only relative order is important? Or do I have to figure out some
way to force absolute positions/numbers onto the drives?

Thank you,

- Jeff

On 4/25/2014 6:36 AM, Scott D'Vileskis wrote:
> [previous message quoted in full, trimmed here]
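As far as I understand md's behaviour, only the slot order matters when recreating: --create
fills the six slots strictly in command-line order, each 'missing' holds one slot, and the
old bracket numbers from mdstat (internal device numbers, not slots) neither can nor need to
be reproduced. A sketch under that assumption, with one guessed placement of the five
survivors followed by a generic read-only check; the order shown is a hypothesis to be
permuted, not the known answer:

~# mdadm --stop /dev/md4 2>/dev/null
~# # Six slots, five survivors, exactly one 'missing'; its position is part of the guess.
~# mdadm --create /dev/md4 --assume-clean --level=6 --raid-devices=6 \
       /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sde2 /dev/sdd2 missing
~# blkid /dev/md4       # then run whatever read-only check matches what sat on top
~# mdadm --stop /dev/md4     # stop before trying the next permutation

The array name /dev/md4 is arbitrary here; what matters is that a 6-device RAID6 with five
working members has exactly one empty slot, so only one 'missing' goes on the command line,
and its position has to be guessed along with the order of the survivors.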
Thread overview: 19+ messages
2014-04-24  5:05 Corrupted ext4 filesystem after mdadm manipulation error L.M.J
2014-04-24 17:48 ` L.M.J
[not found] ` <CAK_KU4a+Ep7=F=NSbb-hqN6Rvayx4QPWm-M2403OHn5-LVaNZw@mail.gmail.com>
2014-04-24 18:35 ` L.M.J
[not found] ` <CAK_KU4Zh-azXEEzW4f1m=boCZDKevqaSHxW0XoAgRdrCbm2PkA@mail.gmail.com>
2014-04-24 19:53 ` L.M.J
[not found] ` <CAK_KU4aDDaUSGgcGBwCeO+yE0Qa_pUmMdAHMu7pqO7dqEEC71g@mail.gmail.com>
2014-04-24 19:56 ` L.M.J
2014-04-24 20:31 ` Scott D'Vileskis
2014-04-24 22:25 ` Why would a recreation cause a different number of blocks?? Jeff Wiegley
2014-04-25  3:34 ` Mikael Abrahamsson
2014-04-25  5:02 ` Jeff Wiegley
2014-04-25  6:01 ` Mikael Abrahamsson
2014-04-25  6:45 ` Jeff Wiegley
2014-04-25  7:25 ` Mikael Abrahamsson
2014-04-25  7:05 ` Jeff Wiegley
[not found] ` <CAK_KU4YUejncX9yQk4HM5HE=1-qPPxOibuRauFheo3jaBc8SaQ@mail.gmail.com>
2014-04-25  5:13 ` Corrupted ext4 filesystem after mdadm manipulation error L.M.J
2014-04-25  6:04 ` Mikael Abrahamsson
2014-04-25 11:43 ` L. M. J
2014-04-25 13:36 ` Scott D'Vileskis
2014-04-25 14:43 ` L.M.J
2014-04-25 18:37 ` Is disk order relative or are the numbers absolute? Jeff Wiegley